[2/2,neleai/string-x64] Add strcmp with avx2

Message ID 20150620111520.GA12420@domone
State New, archived

Commit Message

Ondrej Bilka June 20, 2015, 11:15 a.m. UTC
  On Sat, Jun 20, 2015 at 10:35:25AM +0200, Ondřej Bílka wrote:
> 
> Hi,
> 
> When I read strcmp again to improve strncmp and add an avx2 strcmp,
> I found that I had made several mistakes, mainly caused by first
> optimizing the C template and then fixing up the assembly.
> 
> The first was my idea to simplify the cross-page check by oring src
> and dest. I recall that I originally had more complex cross-page
> handling where false positives were cheap. Then I found that due to
> its size it had a big overhead, and a simple loop was faster when
> testing with firefox. That turned the original decision into a bad
> one, since with the simple loop false positives are no longer cheap.
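
To make the tradeoff concrete, here is a rough C sketch of the two
checks (illustration only, not the glibc source; it assumes 4096-byte
pages and a 64-byte read per iteration, the threshold in the real code
may differ):

#include <stdint.h>

/* Exact check: test each pointer separately, no false positives.  */
static inline int
cross_page_exact (const char *s, const char *d)
{
  return ((uintptr_t) s & 4095) > 4096 - 64
         || ((uintptr_t) d & 4095) > 4096 - 64;
}

/* Simplified check: or the pointers first.  The or is at least as
   large as either offset, so a real page crossing is never missed,
   but the check can fire spuriously (e.g. offsets 0x800 and 0x7ff).  */
static inline int
cross_page_ored (const char *s, const char *d)
{
  return (((uintptr_t) s | (uintptr_t) d) & 4095) > 4096 - 64;
}

The ored form saves one comparison, but with a simple fallback loop
every false positive now pays the full cross-page cost.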
> 
> The second is to reorganize the loop instructions so that after the
> loop ends I can find the last byte without recalculating much, using
> the trick that the last 16-bit mask can be ored with the previous
> three, as it is only relevant when the previous three are zero.
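
In C terms the trick looks like this (my reconstruction, not code from
the patch; the avx2 loop below applies the same idea with two 32-bit
masks, where the second mask already covers both halves):

#include <stdint.h>

/* mask_low marks hits in the first half; mask_combined has bit i set
   for a hit at byte i or byte i+32.  Oring mask_combined into the
   upper bits is safe: when mask_low is non-zero, bsf stops in the low
   32 bits first, and when mask_low is zero, mask_combined carries no
   first-half bits at all.  */
static inline unsigned
first_hit_index (uint32_t mask_low, uint32_t mask_combined)
{
  uint64_t word = (uint64_t) mask_low | ((uint64_t) mask_combined << 32);
  /* Only called when mask_combined != 0, so the count-trailing-zeros
     (bsfq in the assembly) is well defined.  */
  return __builtin_ctzll (word);
}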
> 
> The third is that gcc generates bad loops with regard to where
> pointers are incremented. The increments should be placed after the
> loads that use them, not at the start of the loop as gcc does. That
> change is responsible for a 10% improvement for large sizes.
> 
> Finally, there are microoptimizations that save a few bytes without
> measurable performance impact, like using eax instead of rax to save
> a byte, or removing zeroing instructions where they are not needed.
> 
> Profile data are here; they will shortly include the avx2 variant
> for haswell that I will submit next.
> 
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html
> 
> OK to commit this?
> 
Here is the avx2 loop that I promised earlier. Luckily it gives a small
2% practical benefit on the gcc workload. It also improves performance
on longer inputs by a factor of two.

	* sysdeps/x86_64/multiarch/Makefile: Add strcmp-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Add __strcmp_avx2.
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Add avx2
	loop.
	* sysdeps/x86_64/multiarch/strcmp.S: Add ifunc.
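
The runtime selection added in strcmp.S below amounts to the following
dispatch, shown here as a purely illustrative C sketch using GCC's
ifunc attribute and __builtin_cpu_supports rather than glibc's internal
__cpu_features bits, and with plain C stand-ins for the two variants
(the real selector also keys on bit_AVX_Fast_Unaligned_Load, not just
AVX2 support):

#include <string.h>

typedef int strcmp_fn (const char *, const char *);

/* Stand-ins for __strcmp_sse2_unaligned and __strcmp_avx2.  */
static int
strcmp_baseline (const char *a, const char *b)
{
  return strcmp (a, b);
}

static int
strcmp_avx2_variant (const char *a, const char *b)
{
  return strcmp (a, b);	/* would be the 64-byte avx2 loop */
}

/* Resolver: runs once at relocation time and picks a variant.  */
static strcmp_fn *
resolve_my_strcmp (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2")
	 ? strcmp_avx2_variant : strcmp_baseline;
}

int my_strcmp (const char *, const char *)
     __attribute__ ((ifunc ("resolve_my_strcmp")));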
  

Patch

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index d01bbbe..bf48283 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -30,7 +30,7 @@  CFLAGS-strspn-c.c += -msse4
 endif
 
 ifeq (yes,$(config-cflags-avx2))
-sysdep_routines += memset-avx2 strcpy-avx2 stpcpy-avx2 memcmp-avx2
+sysdep_routines += memset-avx2 strcpy-avx2 stpcpy-avx2 memcmp-avx2 strcmp-avx2
 endif
 endif
 
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index cc6f9f2..57ce237 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -126,7 +126,7 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strcmp.S.  */
   IFUNC_IMPL (i, name, strcmp,
-	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSE4_2, __strcmp_sse42)
+	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_AVX2, __strcmp_avx2)
 	      IFUNC_IMPL_ADD (array, i, strcmp, HAS_SSSE3, __strcmp_ssse3)
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2))
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
new file mode 100644
index 0000000..b2f8478
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -0,0 +1,3 @@ 
+#define USE_AVX2
+#define __strcmp_sse2_unaligned __strcmp_avx2
+#include "strcmp-sse2-unaligned.S"
diff --git a/sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S b/sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S
index 03d1b11..10bed9a 100644
--- a/sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S
+++ b/sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S
@@ -89,12 +89,35 @@  L(main_loop_header):
 	subq	%rsi, %rcx
 	shrq	$6, %rcx
 	movq	%rcx, %rsi
-
+#ifdef USE_AVX2
+	vpxor	%xmm7, %xmm7, %xmm7
+#endif
 	.p2align 4
 L(loop):
 	add	$-1, %rsi
 	ja	L(loop_cross_page)
 L(back_to_loop):
+#ifdef USE_AVX2
+	vmovdqu	(%rdx), %ymm0
+	vmovdqu	32(%rdx), %ymm1
+	vpcmpeqb (%rax), %ymm0, %ymm0
+	vpminub (%rax), %ymm0, %ymm0
+	vpcmpeqb 32(%rax), %ymm1, %ymm1
+	vpminub 32(%rax), %ymm1, %ymm1
+	vpminub	%ymm0, %ymm1, %ymm2
+	vpcmpeqb %ymm7, %ymm2, %ymm2
+	addq	$64, %rax
+	addq	$64, %rdx
+	vpmovmskb %ymm2, %ecx
+	test	%ecx, %ecx
+	je	L(loop)
+	shl	$32, %rcx
+	vpcmpeqb %ymm7, %ymm0, %ymm0
+	vpmovmskb %ymm0, %esi
+	or	%rsi, %rcx
+	vzeroupper
+#else
+
 	movdqu	(%rdx), %xmm0
 	movdqu	16(%rdx), %xmm1
 	movdqa	(%rax), %xmm2
@@ -132,14 +155,17 @@  L(back_to_loop):
 	orq	%rdi, %rcx
 	sal	$16, %esi
 	orq	%rsi, %rcx
+#endif
 	bsfq	%rcx, %rcx
 	movzbl	-64(%rax, %rcx), %eax
 	movzbl	-64(%rdx, %rcx), %edx
 	subl	%edx, %eax
 	ret
-
 	.p2align 4
 L(loop_cross_page):
+#ifdef USE_AVX2
+	vzeroupper
+#endif
 	xor	%ecx, %ecx
 	movq	%rdx, %r9
 	and	$63, %r9
diff --git a/sysdeps/x86_64/multiarch/strcmp.S b/sysdeps/x86_64/multiarch/strcmp.S
index f50f26c..867e9d4 100644
--- a/sysdeps/x86_64/multiarch/strcmp.S
+++ b/sysdeps/x86_64/multiarch/strcmp.S
@@ -90,6 +90,12 @@  ENTRY(STRCMP)
 	call	__init_cpu_features
 1:
 #ifdef USE_AS_STRCMP
+# ifdef HAVE_AVX2_SUPPORT
+
+	leaq	__strcmp_avx2(%rip), %rax
+	testl   $bit_AVX_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_AVX_Fast_Unaligned_Load(%rip)
+	jnz	3f
+# endif
 	leaq	__strcmp_sse2_unaligned(%rip), %rax
 	testl   $bit_Fast_Unaligned_Load, __cpu_features+FEATURE_OFFSET+index_Fast_Unaligned_Load(%rip)
 	jnz     3f