freevec.org

Matrix 4x4 Transpose (floats)

markos — Sat, 01/03/2008 - 20:13

(Please see Matrix 4x4 addition/subtraction (floats) for the typedefs and definitions used.)

For the theory behind matrix transposition, please see here.

So, the 4x4 transpose would be:

| A1 | A2 | A3 | A4 |T    | A1 | B1 | C1 | D1 |
| B1 | B2 | B3 | B4 |     | A2 | B2 | C2 | D2 |
| C1 | C2 | C3 | C4 |  =  | A3 | B3 | C3 | D3 |
| D1 | D2 | D3 | D4 |     | A4 | B4 | C4 | D4 |

We want to shuffle the elements of the four row vectors so that we end up with the transposed matrix. We could do it with a series of vec_perm operations, but AltiVec offers us an easier and faster way: vec_mergeh/vec_mergel. These interleave the high (first) or low (last) halves of two vectors, like this:

| A1 | A2 | A3 | A4 |
| B1 | B2 | B3 | B4 |
        |            \
   vec_mergeh      vec_mergel
        |              \
        v               \
| A1 | B1 | A2 | B2 |    | A3 | B3 | A4 | B4 |
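
To make the diagram concrete, here is a minimal scalar sketch of what the two merges do to a pair of 4-float vectors. The names merge_high4 and merge_low4 are hypothetical stand-ins invented for this illustration, not AltiVec intrinsics or libfreevec functions, and plain float arrays are assumed in place of vector float so the code builds on any C compiler.

```c
#include <assert.h>

/* Hypothetical scalar model of vec_mergeh for two 4-float vectors:
   interleaves the first (high) two elements of a and b. */
void merge_high4(const float a[4], const float b[4], float out[4])
{
    /* Assumption of this sketch: out must not alias an input. */
    assert(out != a && out != b);
    out[0] = a[0]; out[1] = b[0];
    out[2] = a[1]; out[3] = b[1];
}

/* Hypothetical scalar model of vec_mergel for two 4-float vectors:
   interleaves the last (low) two elements of a and b. */
void merge_low4(const float a[4], const float b[4], float out[4])
{
    assert(out != a && out != b);
    out[0] = a[2]; out[1] = b[2];
    out[2] = a[3]; out[3] = b[3];
}
```

With a = | A1 | A2 | A3 | A4 | and b = | B1 | B2 | B3 | B4 |, merge_high4 fills out with | A1 | B1 | A2 | B2 | and merge_low4 with | A3 | B3 | A4 | B4 |, matching the diagram.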

So in effect we do two steps of mergeh/mergel operations. 1st step, where Line N is the Nth row of the original matrix:

vec_mergeh(Line 1, Line3) -> | A1 | C1 | A2 | C2 |
vec_mergel(Line 1, Line3) -> | A3 | C3 | A4 | C4 |
vec_mergeh(Line 2, Line4) -> | B1 | D1 | B2 | D2 |
vec_mergel(Line 2, Line4) -> | B3 | D3 | B4 | D4 |

2nd step, where Line N is now the Nth result of the 1st step:

vec_mergeh(Line 1, Line3) -> | A1 | B1 | C1 | D1 |
vec_mergel(Line 1, Line3) -> | A2 | B2 | C2 | D2 |
vec_mergeh(Line 2, Line4) -> | A3 | B3 | C3 | D3 |
vec_mergel(Line 2, Line4) -> | A4 | B4 | C4 | D4 |
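
The two steps can be checked without AltiVec hardware using a scalar sketch like the one below. Everything named here (merge_high4, merge_low4, transpose4x4_by_merges) is hypothetical and exists only for this illustration; a float[4][4] array is assumed in place of the Mat44 type, whose definition lives on another page.

```c
/* Hypothetical scalar stand-in for vec_mergeh on 4 floats. */
static void merge_high4(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0]; out[1] = b[0];
    out[2] = a[1]; out[3] = b[1];
}

/* Hypothetical scalar stand-in for vec_mergel on 4 floats. */
static void merge_low4(const float a[4], const float b[4], float out[4])
{
    out[0] = a[2]; out[1] = b[2];
    out[2] = a[3]; out[3] = b[3];
}

/* Transpose m in place using the two merge steps described above. */
void transpose4x4_by_merges(float m[4][4])
{
    float t[4][4];  /* results of the 1st step */

    /* 1st step: pair row 1 with row 3, and row 2 with row 4. */
    merge_high4(m[0], m[2], t[0]);  /* | A1 | C1 | A2 | C2 | */
    merge_low4 (m[0], m[2], t[1]);  /* | A3 | C3 | A4 | C4 | */
    merge_high4(m[1], m[3], t[2]);  /* | B1 | D1 | B2 | D2 | */
    merge_low4 (m[1], m[3], t[3]);  /* | B3 | D3 | B4 | D4 | */

    /* 2nd step: merge result 1 with result 3, and result 2 with result 4. */
    merge_high4(t[0], t[2], m[0]);  /* | A1 | B1 | C1 | D1 | */
    merge_low4 (t[0], t[2], m[1]);  /* | A2 | B2 | C2 | D2 | */
    merge_high4(t[1], t[3], m[2]);  /* | A3 | B3 | C3 | D3 | */
    merge_low4 (t[1], t[3], m[3]);  /* | A4 | B4 | C4 | D4 | */
}
```

The temporary t plays the role of the intermediate vectors, so the second step can safely write back into m.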

And here is the final code:

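void Mat44Transp(Mat44 m)
{
        vector float vm_1, vm_2, vm_3, vm_4,
                     vr_1, vr_2, vr_3, vr_4;

        // Load the four rows of the matrix
        LOAD_ALIGNED_MATRIX(m, vm_1, vm_2, vm_3, vm_4);

        // Do the transpose, first set of merges (rows 1 and 3, rows 2 and 4)
        vr_1 = vec_mergeh(vm_1, vm_3);  // | A1 | C1 | A2 | C2 |
        vr_2 = vec_mergel(vm_1, vm_3);  // | A3 | C3 | A4 | C4 |
        vr_3 = vec_mergeh(vm_2, vm_4);  // | B1 | D1 | B2 | D2 |
        vr_4 = vec_mergel(vm_2, vm_4);  // | B3 | D3 | B4 | D4 |

        // Second set of merges, giving the rows of the transposed matrix
        vm_1 = vec_mergeh(vr_1, vr_3);  // | A1 | B1 | C1 | D1 |
        vm_2 = vec_mergel(vr_1, vr_3);  // | A2 | B2 | C2 | D2 |
        vm_3 = vec_mergeh(vr_2, vr_4);  // | A3 | B3 | C3 | D3 |
        vm_4 = vec_mergel(vr_2, vr_4);  // | A4 | B4 | C4 | D4 |

        // Store the result back into the same matrix
        STORE_ALIGNED_MATRIX(m, vm_1, vm_2, vm_3, vm_4);
}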

Copyright (c)2008 by CODEX.