bmove512

Submitted by markos on Tue, 01/29/2008 - 17:38.

Description

Technically, bmove512 is not part of POSIX or any libc; it belongs to libstring, which is used in MySQL. When this vectorisation effort started, one of the goals was to vectorise MySQL, so this function was picked as a test case and it has remained part of libfreevec ever since. It is basically an optimised memcpy() that works only on aligned blocks and always on multiples of 512 bytes. With minimal branching, bmove512 is effectively a memcpy() operating at the theoretical bandwidth limits, which makes it a very good indicator of them.
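To illustrate the idea of copying in fixed 512-byte blocks, here is a minimal portable sketch. It is not the libfreevec or MySQL code: the name bmove512_sketch, the 16-byte chunk size (chosen to mirror one 128-bit vector register) and the assert-based length check are all assumptions made for this example, and plain memcpy() of a fixed size stands in for the vector loads and stores a SIMD version would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512  /* bytes copied per outer iteration */
#define CHUNK_SIZE 16   /* assumed: one 128-bit vector register */

/* Hypothetical helper, for illustration only.
 * Assumes len is a multiple of 512 and that the buffers do not overlap.
 * The real function also expects aligned pointers; this scalar stand-in
 * does not depend on alignment for correctness. */
void bmove512_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = len / BLOCK_SIZE;

    assert(len % BLOCK_SIZE == 0);

    while (blocks--) {
        /* Fixed trip count (512 / 16 = 32) with no tail handling:
         * the only data-dependent branch is the outer block loop. */
        for (size_t i = 0; i < BLOCK_SIZE; i += CHUNK_SIZE)
            memcpy(d + i, s + i, CHUNK_SIZE);
        d += BLOCK_SIZE;
        s += BLOCK_SIZE;
    }
}
```

A call such as bmove512_sketch(dst, src, 1024) copies two full blocks. Because the length and alignment are guaranteed by the caller, there is no head or tail handling of the kind a general memcpy() needs, which is where the "minimal branching" comes from.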

Each CPU in detail:

And for comparison, here is the result of the same benchmark run on an Athlon X2 5000 (2.5 GHz), running 32-bit code:

Results/Comments

The same comments made for memcpy() apply here. Since bmove512 operates only on aligned blocks, it reaches optimum performance, as there are no alignment branches to take.

SIMD