freevec.org

swab()

markos — Tue, 29/01/2008 - 01:59

Description

According to the man page, the swab() function copies n bytes from the array pointed to by from to the array pointed to by to, exchanging adjacent even and odd bytes. This function is used to exchange data between machines that have different low/high byte ordering. This function does nothing when n is negative. When n is positive and odd, it handles n-1 bytes as above, and does something unspecified with the last byte. (In other words, n should be even.) It returns no value.
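To make the man page description concrete, here is a minimal sketch of a one-byte-at-a-time routine with the same behaviour. The name swab_ref is hypothetical and chosen for illustration only; this is not the glibc or libfreevec source. Since the last byte of an odd-length buffer is unspecified, this sketch simply leaves it untouched.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* swab_ref: hypothetical reference routine, for illustration only.
 * Copies n bytes from 'from' to 'to', exchanging each adjacent pair. */
static void swab_ref(const void *from, void *to, ssize_t n)
{
    const unsigned char *src = from;
    unsigned char *dst = to;

    if (n <= 0)
        return;            /* nothing to do for zero or negative n */

    n &= ~(ssize_t)1;      /* odd n: only the first n-1 bytes are handled */

    for (ssize_t i = 0; i < n; i += 2) {
        /* Read both bytes before writing, so from == to also works. */
        unsigned char even = src[i];
        unsigned char odd  = src[i + 1];
        dst[i]     = odd;
        dst[i + 1] = even;
    }
}
```

Calling swab_ref(in, out, 4) on the bytes 1, 2, 3, 4 stores 2, 1, 4, 3 in out.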

swab() is not a very commonly used function, but it is very useful in specific software that deals with manipulation of raw data, e.g. audio software, CD recording software, etc. In particular, it is used extensively wherever endianness is important (e.g. filesystems, CD audio data). This byte-swapping algorithm is a perfect candidate for the AltiVec permute unit, and one look at the benchmarks makes it obvious that an AltiVec-optimised routine was needed.
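The reason a permute unit suits this job can be sketched in portable C. The code below is a scalar model only, not the libfreevec implementation: permute16, swab_pattern and swab_blocks are hypothetical names, and the model selects from a single 16-byte source, whereas the real AltiVec vec_perm selects from two source vectors. It also assumes the two buffers do not overlap and ignores alignment entirely.

```c
#include <stddef.h>

/* Control pattern: output byte i is taken from input byte (i ^ 1),
 * which exchanges every adjacent even/odd pair in a 16-byte block. */
static const unsigned char swab_pattern[16] = {
    1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14
};

/* Scalar stand-in for one single-source 16-byte permute operation. */
static void permute16(const unsigned char *in,
                      const unsigned char *pattern,
                      unsigned char *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = in[pattern[i] & 15];
}

/* Swap n bytes: whole 16-byte blocks go through the permute model,
 * the remaining pairs are swapped one at a time. A trailing odd byte
 * is left untouched. Buffers are assumed not to overlap. */
static void swab_blocks(const unsigned char *from, unsigned char *to, size_t n)
{
    size_t i = 0;

    for (; i + 16 <= n; i += 16)
        permute16(from + i, swab_pattern, to + i);

    for (; i + 1 < n; i += 2) {
        to[i]     = from[i + 1];
        to[i + 1] = from[i];
    }
}
```

On hardware with a vector permute, each call to permute16 in this model would correspond to a single instruction handling 16 bytes at once, instead of sixteen separate byte moves.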

Each CPU in detail:

And for comparison, here is the result of the same benchmark run on an Athlon X2 5000 (2.5GHz), running 32-bit code:

Results/Comments

The Athlon X2 does a very good job, even though the algorithm it uses is a reference one-byte-at-a-time implementation. Unfortunately, the G5 does not benefit much from AltiVec in this case, which has to do with its weaker AltiVec unit (the permute operations in particular suffer the most). Even so, it does gain some performance. Once again, the MPC8610 is the overall winner, with the good old G4 coming a close second.

One thing that is immediately apparent is that the algorithm suffers a lot from misalignment. In fact, the algorithm takes a different path depending on whether the data is 16-bit aligned or not, as the unaligned case requires a carry byte. Still, the performance is better than that of the glibc implementation.

Lastly, as an example, let's assume we are working on 512- or 1024-byte buffers (e.g. as used in CD recording or audio conversion software). The G5 gets ~1500MB/s and the Athlon X2 gets ~1600MB/s, while the G4 gets ~2000-2200MB/s and the MPC8610 ~2750-3000MB/s, which is almost a 2x speedup!

Copyright (c)2008 by CODEX.