freevec.org

  • about
  • benchmarks
Home

Matrix 4x4 multiplication (floats)

markos — Sat, 01/03/2008 - 20:08

Printer-friendly version

(Please see Matrix 4x4 addition/subtraction (floats) for the typedefs and definitions used.)

Matrix multiplication is done on a column x row basis. Given two input matrices m2, m3 we do the multiplication and store the result back to an output matrix m1. Hence the function prototype:

void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3);

The calculation of matrix m1 is done on a per column basis. Since we operate on 128-bit vectors, each vector can hold 4 32-bit floats, or one line of a 4x4 matrix. When working with columns, we need to calculate 4 128-bit vectors, but we only produce results for the first column. To calculate the full matrix product, we add the column results (again 4 128-bit vectors, but each time only the relevant column is non-zero) together.

void Mat44MulTo(Mat44 m1, Mat44 m2, Mat44 m3)
{
        vector float zero;
        vector float vA1, vA2, vA3, vA4, vB1, vB2, vB3, vB4;
        vector float vC1, vC2, vC3, vC4;
 
        // Load matrices and multiply the first row while we wait for the next row
        zero = (vector float) vec_splat_u32(0);
 
        LOAD_ALIGNED_MATRIX(m2, vA1, vA2, vA3, vA4);
        LOAD_ALIGNED_MATRIX(m3, vB1, vB2, vB3, vB4);
 
        // Calculate the first column of m1
        vC1 = vec_madd( vec_splat( vA1, 0 ), vB1, zero );
        vC2 = vec_madd( vec_splat( vA2, 0 ), vB1, zero );
        vC3 = vec_madd( vec_splat( vA3, 0 ), vB1, zero );
        vC4 = vec_madd( vec_splat( vA4, 0 ), vB1, zero );
 
        // By now we should have loaded both matrices and be done with the first row
        // Multiply vA x vB2, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 1 ), vB2, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 1 ), vB2, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 1 ), vB2, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 1 ), vB2, vC4 );
 
        // Multiply vA x vB3, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 2 ), vB3, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 2 ), vB3, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 2 ), vB3, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 2 ), vB3, vC4 );
 
        // Multiply vA x vB3, add to previous results, vC
        vC1 = vec_madd( vec_splat( vA1, 3 ), vB4, vC1 );
        vC2 = vec_madd( vec_splat( vA2, 3 ), vB4, vC2 );
        vC3 = vec_madd( vec_splat( vA3, 3 ), vB4, vC3 );
        vC4 = vec_madd( vec_splat( vA4, 3 ), vB4, vC4 );
 
        // Store back the result
        STORE_ALIGNED_MATRIX(m1, vC1, vC2, vC3, vC4);
}

SIMD

  • Matrix
  • AltiVec
  • Multiply
  • float
  • Login to post comments

Primary links

  • About
    • History of libfreevec
  • Benchmarks
    • libfreevec

Search

User login

  • Request new password

Popular tags

Addition Algorithms AltiVec ARM Benchmarking float Inverse libfreevec Matrix Memory Multiply NEON reference Scale SSE String searching Subtraction Translate Transpose Tutorial
more tags

Please donate to libfreevec to ensure its continuing development! Donations are done via Paypal.





  • about
  • benchmarks

Copyright (c)2008 by CODEX.
Powered by Drupal. Using theme Deco.
All Google charts have been created by the CSV Chart and Chart API Drupal modules.