The "features" of libfreevec are best described in terms of which functions are implemented, what
their status is, and what performance gain can be expected from each one, along with some comments.
The following table was created for this purpose; use it for reference. Note that each function implemented
uses exactly the same definition as the GLIBC one, i.e. the same return type and the same number and type
of arguments.

| Function name | Part of | Status | Average gain | Comments |
|---|---|---|---|---|
| GNU libc memory functions |
| memcpy | GLIBC | DONE | up to ~4x | best used on sizes > 16 |
| mempcpy | GLIBC | DONE | up to ~4x | same |
| memccpy | GLIBC | DONE | up to ~7x | depends on the first position of 'c' |
| memmove | GLIBC | ALMOST | X | needs a little more testing |
| memcmp | GLIBC | DONE | up to ~4x | depends on the first difference |
| memchr | GLIBC | DONE | up to ~4.5x | depends on the first position of 'c' |
| memrchr | GLIBC | DONE | up to ~4.5x | depends on the last position of 'c' |
| memset | GLIBC | DONE | up to ~3x | libmoto version is twice as fast... |
| memfrob | GLIBC | DONE | up to ~20x | Nice silly function, to show off AltiVec :-) |
| bcopy | GLIBC | DONE | X | implemented using memcpy() |
| bcmp | GLIBC | DONE | X | implemented using memcmp() |
| bzero | GLIBC | DONE | X | implemented using memset() |
| swab | GLIBC | DONE | up to ~9x | ~twice as fast if dest is word aligned |
| GNU libc string functions |
| strlen | GLIBC | DONE | up to ~2x | still needs work, libmoto gets 4x |
| strnlen | GLIBC | DONE | up to ~5x | probably can get more |
| strcpy | GLIBC | ALMOST | up to ~2x | See notes |
| strncpy | GLIBC | IN PROGRESS | X | a memccpy with length limit |
| strcat | GLIBC | TO-DO | X | |
| strncat | GLIBC | TO-DO | X | |
| strcmp | GLIBC | DONE | up to ~7x | See notes |
| strncmp | GLIBC | DONE | up to ~5x | See notes |
| strcasecmp | GLIBC | TO-DO | X | |
| strchr | GLIBC | TO-DO | X | |
| strrchr | GLIBC | TO-DO | X | |
| strstr | GLIBC | TO-DO | X | |
| strcasestr | GLIBC | TO-DO | X | |
| strfry | GLIBC | TO-DO | X | |
| libstring (MySQL) functions |
| bmove512 | libstring | ALMOST | ~1x (!) | something's fishy |
| strfill | libstring | DONE | up to ~7.5x | implemented using memset() |
| strappend | libstring | TO-DO | X | |
| strcend | libstring | TO-DO | X | |
| strcont | libstring | TO-DO | X | |
| strend | libstring | TO-DO | X | |
| strmake | libstring | TO-DO | X | |
| strmov | libstring | TO-DO | X | |
| strnmov | libstring | TO-DO | X | |
| Others |
| adler32 | ? | DONE | up to ~2.5x | Adler32 hashing algorithm |
| 3 hashing alg. | ? | DONE | up to ~7x | hashing algorithms from Berkeley DB |
| Insertion sort | ? | DONE | up to ~4x | separate functions for char, short, long, float; the char version can actually be up to ~50x faster |
| N-way Merge sort | ? | DONE | X | N-way scalar, AltiVec version soon |
| Quick sort | ? | ALMOST | X | Major pita, quite complex, but almost there |

Notes:
- DONE: Function is finished and tested, though it might still benefit from a little further optimization.
- ALMOST: Function is in an almost working state, but fails to produce exactly the same
results as the original scalar code.
- TO-DO: Nothing or very little done yet.
- Testing is done very thoroughly against multiple combinations of alignment (both source and
destination if applicable), sizes and other variables. If a function is reported as DONE, you can be sure that it produces
exactly the same results as the original.
- About strcmp/strncmp: I'm sure these can get even faster; there are things I'm not entirely happy about right now.
Their performance also depends on where str1 ends or where the first difference between the two strings is. It's possible
that they will appear slower in some cases (e.g. when str1 ends within the first few bytes).
- About strcpy: Likewise, I'm not happy with the performance right now. For unaligned data it can get up to ~4x
faster than GLIBC strcpy, but that's because the GLIBC version doesn't handle alignment properly. For aligned data,
GLIBC strcpy is slightly faster. And libmotovec completely trounces both, giving almost 4x the speed of GLIBC.
So it _must_ be possible, and I may be doing something wrong. I'm marking this as ALMOST, even though it works ok,
because I want to improve its performance.