[3/*,v6] Generic comparison functions (strcmp, strncmp, strcasecmp, strncasecmp, strverscmp)

  On Mon, Jun 01, 2015 at 12:01:31AM +0200, Ondřej Bílka wrote:
> And here is updated version of comparing functions.
> 
> Main change here is that I introduced splitting loop. If architecture
> has slow unaligned loads and its faster to create them with bit shifts
> in loop or we emulate these then we will generate two loops, one with
> aligned loads and one with unaligned loads.
> 
> These help for str* functions where inputs are coaligned around half of
> time. Aligned loop is waste of space for memcmp where only around 1/8 of
> inputs is coaligned.
> 
> Also this optimization increases size so we would need to do some live
> profiling to measure impact.
> 
But as I realized that we don't need to use clz for comparison just
mask bytes after first mismatch and do vector comparison.

What I tried to make that branchless I didn't realize that a faster way
is use conditional moves which gcc should do in this case.

For strn?casecmp I did some profiling with dryrun, a summary that I
collected is:

average size   0.7 calls    11118 succeed  87.0%

s1    aligned to 4 bytes  65.4% aligned to 8 bytes  60.7% aligned to 16 bytes  57.7%
s2    aligned to 4 bytes  37.7% aligned to 8 bytes  25.3% aligned to 16 bytes  19.1%
s1-s2 aligned to 4 bytes  36.0% aligned to 8 bytes  24.3% aligned to 16 bytes  17.8%

n <= 0:  84.7% n <= 1:  87.6% n <= 2:  88.0% n <= 3:  93.5%  n <= 4:
96.5% n <= 8:  97.8% n <= 16:  99.6% n <= 32: 100.0% n <= 64: 100.0%

average case mismatches   0.122

Which is problem as on this workload vectorization doesn't make lot of
sense when 84% mismatch is in first byte.

So I added a bytewise checks first, until there are four identical
characters in row when we switch to vector search. That helps problem
that vector search has high startup cost and it would slow us down if it
repetately advanced only few bytes due case mismatch.

I now have doubt about x64 implementation and if converting whole vector
to lowercase is correct or it would be cheaper just handle mismatches
separately and pay price.

 	* sysdeps/generic/diff_skeleton.h: New file.
 	* string/strcmp.c: Use skeleton.
 	* string/strncmp.c: Likewise.
 	* string/strverscmp.c: Likewise.
 	* string/strcasecmp.c: Likewise.
 	* string/strncase.c: Likewise.

[3/*,v6] Generic comparison functions (strcmp, strncmp, strcasecmp, strncasecmp, strverscmp)

Commit Message

Patch