freevec.org

  • about
  • benchmarks
Home

Matrix 4x4 Transpose (floats)

markos — Sat, 01/03/2008 - 20:13

Printer-friendly version

(Please see Matrix 4x4 addition/subtraction (floats) for the typedefs and definitions used.)

For the theory behind matrix transposition, please see here.

So, the 4x4 transpose would be:

| A1 | A2 | A3 | A4 |T    | A1 | B1 | C1 | D1 |
| B1 | B2 | B3 | B4 |     | A2 | B2 | C2 | D2 |
| C1 | C2 | C3 | C4 |  =  | A3 | B3 | C3 | D3 |
| D1 | D2 | D3 | C4 |     | A4 | B4 | C4 | D4 |

Basically we want to shuffle the elements in each vector in a particular way to end up with the transposed matrix. We could do it with a series of vec_permute instructions, but AltiVec offers us an easier and faster way: vec_mergeh/vec_mergel. Basically, what they do is this:

| A1 | A2 | A3 | A4 |
| B1 | B2 | B3 | B4 |
        |            \
   vec_mergeh      vec_mergel
        |              \
        v               \
| A1 | B1 | A2 | B2 |    | A3 | B3 | A4 | B4 |

So in effect we do two steps of mergeh/mergel operations:

vec_mergeh(Line 1, Line3) -> | A1 | C1 | A2 | C2 |
vec_mergel(Line 1, Line3) -> | A3 | C3 | A4 | C4 |
vec_mergeh(Line 2, Line4) -> | B1 | D1 | B2 | D2 |
vec_mergel(Line 2, Line4) -> | B3 | D3 | B4 | D4 |

2nd step:

vec_mergeh(Line 1, Line3) -> | A1 | B1 | C1 | D1 |
vec_mergel(Line 1, Line3) -> | A2 | B2 | C2 | D2 |
vec_mergeh(Line 2, Line4) -> | A3 | B3 | C3 | D3 |
vec_mergel(Line 2, Line4) -> | A4 | B4 | C4 | D4 |

And here is the final code:

void Mat44Transp(Mat44 m)
{
        vector float vm_1, vm_2, vm_3, vm_4,
                     vr_1, vr_2, vr_3, vr_4;
        // Load matrix
        LOAD_ALIGNED_MATRIX(m, vm_1, vm_2, vm_3, vm_4);
 
        // Do the transpose, first set of moves
        vr_1 = vec_mergeh(vm_1, vm_3);
        vr_2 = vec_mergel(vm_1, vm_3);
        vr_3 = vec_mergeh(vm_2, vm_4);
        vr_4 = vec_mergel(vm_2, vm_4);
        // Get the resulting vectors
        vm_1 = vec_mergeh(vr_1, vr_3);
        vm_2 = vec_mergel(vr_1, vr_3);
        vm_3 = vec_mergeh(vr_2, vr_4);
        vm_4 = vec_mergel(vr_2, vr_4);
 
        // Store back the result
        STORE_ALIGNED_MATRIX(m1, vm_1, vm_2, vm_3, vm_4);
}

SIMD

  • Matrix
  • AltiVec
  • Transpose
  • float
  • Login to post comments

Primary links

  • About
    • History of libfreevec
  • Benchmarks
    • libfreevec

Search

User login

  • Request new password

Popular tags

Addition Algorithms AltiVec ARM Benchmarking float Inverse libfreevec Matrix Memory Multiply NEON reference Scale SSE String searching Subtraction Translate Transpose Tutorial
more tags

Please donate to libfreevec to ensure its continuing development! Donations are done via Paypal.





  • about
  • benchmarks

Copyright (c)2008 by CODEX.
Powered by Drupal. Using theme Deco.
All Google charts have been created by the CSV Chart and Chart API Drupal modules.