freevec.org

  • about
  • benchmarks
Home

32-bit *signed* integer multiplication with AltiVec

markos — Sat, 23/08/2008 - 21:55

Printer-friendly version

While completing Eigen2 AltiVec support (should be almost complete now), I noticed that the 32-bit integer multiplication didn't work correctly all of the time. As AltiVec does not really include any instruction to do 32-bit integer multiplication, I used Apple's routine from the Apple Developer's site. But this didn't work and some results were totally off. With some debugging, I found out that this routine works for unsigned 32-bit integers, where Eigen2 uses signed integers! So, I had to search more, and to my surprise, I found no reference of any similar work. So I had 2 choices: a) ditch AltiVec integer vectorisation from Eigen2 (not acceptable!) b) implement my own method! It is obvious which choice I followed :)
UPDATE: Thanks to Matt Sealey, who noticed I could have used vec_abs() instead of vec_sub() and vec_max(). Duh! :D

Anyway, here is the code with some explanations, but just as a quick explanation, I can just say that there were 3 stages to the algorithm:

  • get the signs of the multiplication using xor, in particular, if
    v1 = | A1 | A2 | A3 | A4 |, v2 = | B1 | B2 | B3 | B4 |
     
    (v1 xor v2) -> | sgn(A1*B1) | sgn(A2*B2) | sgn(A3*B3) | sgn(A4*B4) |

    and we just have to use compare each 32-bit quantity with 0 and produce the needed mask.

  • Get the absolute values of each vector and do the multiplication as per the Apple method for unsigned integers.
  • Change the signs of the required elements, according to the mask, using basic binary arithmetic. A negative number is the two's complement +1: -A = ~A +1. So we just NOR the negative elements, add 1 to them and merge the results back to the final vector.

    Here is the code (taken and modified from the Eigen2 source).

    vector int vmulws(const vector int&   a, const vector int&   b)
    {
      v4i bswap, low_prod, high_prod, prod, prod_, a1, b1, v1sel;
     
      // Get the absolute values
      a1  = vec_abs(a);
      b1  = vec_abs(b);
     
      // Get the signs using xor
      v4bi sgn = (v4bi) vec_cmplt(vec_xor(a, b), v0i);
     
      // Do the multiplication for the asbolute values.
      bswap = (v4i) vec_rl((v4ui) b1, (v4ui) v16i_ );
      low_prod = vec_mulo((vector short)a1, (vector short)b1);
      high_prod = vec_msum((vector short)a1, (vector short)bswap, v0i);
      high_prod = (v4i) vec_sl((v4ui) high_prod, (v4ui) v16i_);
      prod = vec_add( low_prod, high_prod );
     
      // NOR the product and select only the negative elements according to the sign mask
      prod_ = vec_nor(prod, prod);
      prod_ = vec_sel(v0i, prod_, sgn);
     
      // Add 1 to the result to get the negative numbers
      v1sel = vec_sel(v0i, v1i, sgn);
      prod_ = vec_add(prod_, v1sel);
     
      // Merge the results back to the final vector.
      prod = vec_sel(prod, prod_, sgn);
      return prod;
    }

  • Algebra
  • AltiVec
  • Code
  • Login to post comments

ppc profiling on linux

sanjay — Mon, 25/08/2008 - 20:50

Hi Markos,

Hopefully, this won't be taken as an unwanted advertisement...but as a note from one PPC hacker to another:
If you're interested in a profiler and code analyzer for Linux PPC, check out: http://www.rotateright.com .

--
Sanjay

  • Login to post comments

I'll definitely check it

markos — Tue, 26/08/2008 - 19:36

I'll definitely check it out! This might be just what I needed!!

Thanks!

  • Login to post comments

Primary links

  • About
    • History of libfreevec
  • Benchmarks
    • libfreevec

Search

User login

  • Request new password

Popular tags

Addition Algorithms AltiVec ARM Benchmarking float Inverse libfreevec Matrix Memory Multiply NEON reference Scale SSE String searching Subtraction Translate Transpose Tutorial
more tags

Please donate to libfreevec to ensure its continuing development! Donations are done via Paypal.





  • about
  • benchmarks

Copyright (c)2008 by CODEX.
Powered by Drupal. Using theme Deco.
All Google charts have been created by the CSV Chart and Chart API Drupal modules.