VSX port added to Eigen!

Being the SIMD fanatic that I am, a few years ago I did the PowerPC Altivec and ARM NEON ports for the Eigen linear algebra library, one of the best, most popular, and most ported libraries around.

Recently I thought it would be a good idea to extend both ports to 64-bit, which would also help me with the SIMD book: VSX in the PowerPC case and ARMv8 NEON (or Advanced SIMD, as ARM likes to call it) in the ARM case. ARMv8 hardware is a bit scarce at the moment, so I thought I'd start with VSX. Being in Debian, I have access to a number of porterboxes on several architectures, and luckily one of those was a Power7 (with VSX) running ppc64. So I started porting, or rather extending, the code to use VSX for 64-bit doubles. Unluckily, I could not test anything, because Debian kernels do not have VSX enabled in wheezy, which is what the porterbox is running, and enabling it is not an option (#758620). So running VSX code turned out to be quite hard.
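
For readers who have not seen VSX code before, here is a minimal sketch of what "VSX for 64-bit doubles" boils down to: a 128-bit register holds two doubles, so one instruction adds two elements at a time. This is not code from the Eigen port. add_doubles is a made-up helper, and the sketch assumes a GCC-style altivec.h that provides vec_xl, vec_xst and vec_add for doubles when VSX is enabled. On any other compiler or architecture it falls back to a plain scalar loop.

```cpp
#include <cstddef>

#if defined(__VSX__)
#include <altivec.h>
// altivec.h may define these as macros, which clashes with C++ code.
#undef vector
#undef bool
#undef pixel
#endif

// Hypothetical helper (not part of Eigen): c[i] = a[i] + b[i] for n doubles.
void add_doubles(const double* a, const double* b, double* c, std::size_t n) {
    std::size_t i = 0;
#if defined(__VSX__)
    // Each 128-bit VSX register holds two 64-bit doubles.
    for (; i + 2 <= n; i += 2) {
        __vector double va = vec_xl(0, a + i);  // load two doubles (unaligned is fine)
        __vector double vb = vec_xl(0, b + i);
        vec_xst(vec_add(va, vb), 0, c + i);     // add and store two results
    }
#endif
    // Scalar loop: handles the leftover element, or everything without VSX.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

On ppc64/ppc64le with GCC, something like -mvsx (or -mcpu=power7 and later) should define __VSX__ and enable the vector path; without it only the scalar loop is compiled.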

Thankfully, Breno Leitao of IBM, maintainer of the newer ppc64le port in Debian, announced, on behalf of the OpenPower project, the availability of ppc64le VMs for Debian developers! I guess I was one of the first to apply for one :)
Anyway, thanks to Breno and OpenPower, I soon had access to a full ppc64le VM (with root access!), and development resumed. Before long I had the port running and started fixing the bugs. Not long after I started the port, one of the Eigen developers pointed me to two bounties from IBM on Eigen! I consider it pure luck, as I had no idea about those bounties and I was already halfway through the port! Suffice it to say, I added myself as working on the tasks :)

Today I actually finished that port: 94% of ~550 tests pass (the failures are not related to VSX), and performance increased by 370% for 32-bit floats and 170% for 64-bit doubles (using bench_gemm). The port is (or should be) working on both little-endian and big-endian VSX (and big-endian Altivec, nothing changed there), though I am waiting to get access to another big-endian VSX VM to test and fix what's needed.

There is actually a performance regression on std::complex, but it's unrelated to VSX, as the older Altivec port suffers from it too. I will need to investigate further.

Anyway, for those who want to give it a shot (assuming you have access to a ppc64le system), you can find the tree either in my eigen fork:

https://bitbucket.org/kmargar/eigen

or get the pull-request from here:

https://bitbucket.org/eigen/eigen/pull-request/84/add-vsx-support

Note#1: Now that I've finished the VSX port, I'll start work on the NEONv8 port, which I expect to be much easier to finish, as there are no endianness issues there :)
Note#2: I learned a few days ago that MIPS has released a new SIMD architecture, the MIPS SIMD Architecture or MSA! I'd love to add support for it in both Eigen and my book. If anyone can give me access to such a system, or if ImgTec can donate a board, that would be great!