freevec.org

Matrix 4x4 multiplication (floats)

markos — Sat, 01/03/2008 - 20:08

(Please see Matrix 4x4 addition/subtraction (floats) for the typedefs and definitions used.)

Matrix multiplication is done on a column x row basis. Given two input matrices m2 and m3, we multiply them and store the result in an output matrix m1. Hence the function prototype:

void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3);

The calculation of matrix m1 is done one column of m2 at a time. Since we operate on 128-bit vectors, each vector holds four 32-bit floats, i.e. one row of a 4x4 matrix. For column k of m2, we splat element k of each m2 row across a whole vector and multiply it by row k of m3. This gives four 128-bit vectors, which together form only a partial product: the contribution of that one column to every row of m1. To calculate the full matrix product, we add the four partial products (one per column, again four 128-bit vectors each) together, which vec_madd does as part of the multiplication.
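To make the accumulation order concrete, here is a minimal scalar sketch of the same scheme in plain C. It is not part of libfreevec: the name mat44_mul_ref is hypothetical, and the Mat44 typedef is assumed to be a row-major float[4][4] (the real definition is on the addition/subtraction page and may differ).

```c
#include <string.h>

/* Assumption: Mat44 is a row-major 4x4 float array. The real typedef
   lives on the addition/subtraction page and may differ. */
typedef float Mat44[4][4];

/* Hypothetical scalar reference: m1 = m2 x m3, accumulated one
   column of m2 (and the matching row of m3) at a time. */
static void mat44_mul_ref(Mat44 m1, Mat44 m2, Mat44 m3)
{
    float acc[4][4];

    memset(acc, 0, sizeof acc);

    for (int k = 0; k < 4; k++)            /* column k of m2, row k of m3 */
        for (int i = 0; i < 4; i++)        /* result row i (vC1..vC4)     */
            for (int j = 0; j < 4; j++)    /* the four lanes of a vector  */
                acc[i][j] += m2[i][k] * m3[k][j];

    /* Copy out at the end so m1 may alias m2 or m3. */
    memcpy(m1, acc, sizeof acc);
}
```

Each pass of the outer loop corresponds to one group of four vec_madd calls in the vector version; the inner loop over j is what a single 128-bit operation does in one step.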

void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3)
{
        vector float zero;
        vector float vA1, vA2, vA3, vA4, vB1, vB2, vB3, vB4;
        vector float vC1, vC2, vC3, vC4;
 
        // Load matrices and multiply the first row while we wait for the next row
        zero = (vector float) vec_splat_u32(0);
 
        LOAD_ALIGNED_MATRIX(m2, vA1, vA2, vA3, vA4);
        LOAD_ALIGNED_MATRIX(m3, vB1, vB2, vB3, vB4);
 
        // First partial product: element 0 of each m2 row times the first row of m3
        vC1 = vec_madd( vec_splat( vA1, 0 ), vB1, zero );
        vC2 = vec_madd( vec_splat( vA2, 0 ), vB1, zero );
        vC3 = vec_madd( vec_splat( vA3, 0 ), vB1, zero );
        vC4 = vec_madd( vec_splat( vA4, 0 ), vB1, zero );
 
        // By now we should have loaded both matrices and be done with the first row
        // Multiply vA x vB2, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 1 ), vB2, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 1 ), vB2, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 1 ), vB2, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 1 ), vB2, vC4 );
 
        // Multiply vA x vB3, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 2 ), vB3, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 2 ), vB3, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 2 ), vB3, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 2 ), vB3, vC4 );
 
        // Multiply vA x vB4, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 3 ), vB4, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 3 ), vB4, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 3 ), vB4, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 3 ), vB4, vC4 );
 
        // Store back the result
        STORE_ALIGNED_MATRIX(m1, vC1, vC2, vC3, vC4);
}
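
For readers without AltiVec hardware, the instruction sequence above can be traced with portable stand-ins. The sketch below is an illustration under stated assumptions, not the real <altivec.h> API: v4f, emu_splat, emu_madd and emu_mat44_mul are hypothetical names, and matrices are assumed to be passed as four row vectors.

```c
/* Hypothetical stand-in for a 128-bit "vector float". */
typedef struct { float f[4]; } v4f;

/* Models vec_splat: copy lane n of v into all four lanes. */
static v4f emu_splat(v4f v, int n)
{
    v4f r = {{ v.f[n], v.f[n], v.f[n], v.f[n] }};
    return r;
}

/* Models vec_madd: per-lane a * b + c. The real instruction may
   round differently from this two-step version. */
static v4f emu_madd(v4f a, v4f b, v4f c)
{
    v4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] * b.f[i] + c.f[i];
    return r;
}

/* Same sequence of splat + multiply-add as Mat44MulTo, written as
   loops. Assumption: a, b and c each hold four row vectors. */
static void emu_mat44_mul(v4f c[4], const v4f a[4], const v4f b[4])
{
    v4f acc[4] = {{{0}}};                  /* plays the role of "zero" */

    for (int k = 0; k < 4; k++)            /* lane k of vA, row vB(k+1) */
        for (int i = 0; i < 4; i++)        /* vC1..vC4                  */
            acc[i] = emu_madd(emu_splat(a[i], k), b[k], acc[i]);

    for (int i = 0; i < 4; i++)
        c[i] = acc[i];
}
```

Multiplying any matrix by the identity with emu_mat44_mul should return the matrix unchanged, which is a quick way to check that the lane indices and row operands are paired correctly.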

Copyright (c)2008 by CODEX.