memcmp()
markos — Thu, 06/03/2008 - 13:57
Description
According to the man page, the memcmp() function compares the first n bytes of the memory areas s1 and s2. It returns an integer less than, equal to, or greater than zero if the first n bytes of s1 are found, respectively, to be less than, to match, or to be greater than the first n bytes of s2.
memcmp() is one of the few functions that are assembly-optimised on most architectures. It certainly is on x86 and x86_64, but sadly not on PowerPC (glibc does offer a POWER4-only optimised version, but the other PowerPC subarchitectures use the generic, non-optimised one). In any case, the glibc implementation compares 32-bit (or 64-bit, depending on the architecture) blocks where possible. libfreevec does the same, but, as already stated, it also uses modern SIMD units (AltiVec on PowerPC CPUs, and in the future SSE on x86 CPUs). As a result, performance is the same for small sizes but increases dramatically for larger ones.
Each CPU in detail:
And for comparison, here is the result of the same benchmark run on an Athlon X2 5000 (2.5 GHz), running 32-bit code:
Results/Comments
The Athlon has a much faster FSB (800 MHz vs 533 MHz for the G5 and the MPC8610), which explains its superior performance in memcmp(), at least for aligned comparisons. For unaligned comparisons, however, the G5 is actually faster than the Athlon, as for some reason the Athlon's memcmp() performance is almost halved when comparing unaligned blocks. Sadly, performance for unaligned blocks drops on the MPC8610 as well. Perhaps even more striking is the Athlon X2's excellent performance at very small sizes.