memcpy()
markos — Tue, 29/01/2008 - 01:58
Description
According to the man page, the memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap; use memmove(3) if they do. It returns a pointer to dest.
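The contract above is easy to get wrong, so here is a small sketch of where each function is allowed. The helper names `copy_buffer` and `drop_front` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy between two distinct buffers: the areas cannot overlap, so
   memcpy() is allowed. memcpy() returns dest, which is passed through. */
char *copy_buffer(char *dest, const char *src, size_t n)
{
    return memcpy(dest, src, n);
}

/* Drop the first k bytes of a len-byte buffer (k <= len) by sliding the
   rest down. Source and destination overlap, so memmove() is required;
   using memcpy() here would be undefined behaviour. */
void drop_front(char *buf, size_t len, size_t k)
{
    memmove(buf, buf + k, len - k);
}
```

For example, calling `drop_front` on the 7-byte buffer `"abcdef"` (including its terminating NUL) with `k = 2` leaves the string `"cdef"`.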
memcpy() is probably the #1 candidate for architecture-specific optimizations: it is likely the most frequently called function everywhere, so any improvement to it is directly visible and measurable. In glibc it is currently optimised for x86 and x86_64, as well as for most PowerPC variants. The glibc implementation copies 32-bit (or 64-bit, depending on the architecture) blocks where possible. libfreevec does the same, but, as already stated, it also uses modern SIMD units (AltiVec on PowerPC CPUs, and in the future SSE on x86 CPUs). As a result, performance is the same for small sizes but increases dramatically for larger ones.
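To make the "copy word-sized blocks where possible" idea concrete, here is a minimal scalar sketch. It is not glibc's or libfreevec's code, and the name `word_copy` is hypothetical. It copies single bytes until dest is word-aligned, then whole native words, then the remaining tail bytes. The fixed-size memcpy() calls inside the loop are only the portable C way to express one possibly unaligned word load and store without strict-aliasing problems; compilers normally lower them to single instructions. A real libc would use hand-written code instead, and libfreevec adds AltiVec vector loops on top, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t; /* native word: 32 or 64 bits depending on the arch */

/* Hypothetical illustration of a word-at-a-time copy. */
void *word_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Head: copy bytes one at a time until dest sits on a word boundary. */
    while (n > 0 && (uintptr_t)d % sizeof(word_t) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: move one whole word per iteration. src may still be
       misaligned here, which is the slow case on many CPUs. */
    while (n >= sizeof(word_t)) {
        word_t w;
        memcpy(&w, s, sizeof w); /* word load  */
        memcpy(d, &w, sizeof w); /* word store */
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: fewer than sizeof(word_t) bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

For small n almost all the work happens in the byte loops, which is consistent with the observation that wider copies only pay off as sizes grow.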
Each CPU in detail:
And for comparison, here are the results of the same benchmark run on an Athlon X2 5000 (2.5 GHz), running 32-bit code:
Results/Comments
The same comments made for memcmp() apply here, with extra emphasis on the (comparatively) poor performance of memcpy() for unaligned blocks, on both the Athlon and the G5, though not on the 32-bit PowerPC CPUs (G4 and MPC8610). For a function that has received so much optimization for so long, we find this really sad. As for libmotovec, which is also benchmarked here (but only on the G4), we can safely say that libfreevec has reached the point where it easily outperforms Freescale's library :D
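As background on what "unaligned" means for a copy routine, the sketch below shows the check an optimized memcpy() typically has to make. The helper names are hypothetical. If dest and src have the same offset modulo the access width (for example 16 bytes for an AltiVec vector), copying a few head bytes aligns both and the inner loop can use only aligned accesses. If the offsets differ, no head copy can align both, so the loop must either use unaligned loads or, on AltiVec, combine two aligned loads with a permute (the usual `vec_lvsl`/`vec_perm` technique). This is a general illustration and does not claim to explain the specific benchmark numbers above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if dest and src share the same misalignment,
   so both become aligned after the same number of head bytes. */
bool mutually_aligned(const void *dest, const void *src, size_t width)
{
    return (uintptr_t)dest % width == (uintptr_t)src % width;
}

/* Hypothetical helper: how many bytes must be copied one at a time
   before dest reaches a width-byte boundary. */
size_t head_bytes(const void *dest, size_t width)
{
    size_t mis = (uintptr_t)dest % width;
    return mis == 0 ? 0 : width - mis;
}
```

With a 16-byte-aligned buffer `buf`, `head_bytes(buf + 1, 16)` is 15, and `mutually_aligned(buf, buf + 3, 16)` is false: that copy can never run entirely on aligned vector accesses.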