Rewritten v9/64-bit sparc strcmp.

Message ID 20140429.224046.1363822337091646659.davem@davemloft.net
State Committed

Commit Message

David Miller April 30, 2014, 2:40 a.m. UTC
  From: Aurelien Jarno <aurelien@aurel32.net>
Date: Tue, 29 Apr 2014 11:53:39 +0200

> This method doesn't work when comparing a 0x00 char in string 1 and a
> 0x01 char in string 2. In that case the mask for this byte is 0x01 and
> the corresponding xor is also 0x01. The result of the comparison
> therefore depends on the garbage after the end of the string.
> 
> On Debian [1] this causes, for example, debian-installer to fail to
> build [2], and it might be the source of the random segfaults which we
> have been trying to debug for a few years.
> 
> [1] http://bugs.debian.org/746310
> [2] http://bugs.debian.org/731806

Right you are: if there are any zeros after the first zero, we will
potentially miscompare.
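
To make that concrete, here is a rough C model of the final mask-vs-xor
decision (an illustration only, not the actual assembly; zero_byte_mask
and the byte values are made up for the example, and the mask
construction is just the usual magic-constant trick):

#include <stdint.h>
#include <stdio.h>

/* Illustrative helper: put 0x01 in the byte lanes of W that hold a
   zero byte (classic magic-constant trick; exact for the values used
   below).  */
static uint64_t
zero_byte_mask (uint64_t w)
{
  uint64_t t = (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
  return t >> 7;
}

int
main (void)
{
  /* Big-endian words: "ab\0" vs "ab\1", followed by bytes that happen
     to be zero in both strings.  */
  uint64_t w1 = 0x6162000000000000ULL;
  uint64_t w2 = 0x6162010000000000ULL;
  uint64_t x  = w1 ^ w2;               /* 0x0000010000000000 */
  uint64_t m  = zero_byte_mask (w1);   /* 0x0000010101010101 */

  /* The decision modeled here: report equality iff the terminator
     marker beats the first difference, i.e. m > x.  The extra 0x01
     markers from the zero bytes *after* the terminator tip the
     unsigned comparison, so this prints 1 ("equal") even though
     strcmp must return a negative value for these strings.  */
  printf ("%d\n", m > x);
  return 0;
}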

The only solution I can see at the moment is to clear all except the
topmost bit in the mask.
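
At the C level the idea is roughly the following (illustration only;
keep_topmost_marker is a made-up helper, and it uses a loop where the
patch uses conditional moves to stay branchless):

#include <stdint.h>

/* Illustrative helper, not real glibc code.  */
static uint64_t
keep_topmost_marker (uint64_t mask)
{
  int shift = 0;

  /* Find the highest byte lane that carries a 0x01 marker...  */
  for (int lane = 8; lane <= 56; lane += 8)
    if (mask & (1ULL << lane))
      shift = lane;

  /* ...and clear everything below it, so at most one marker survives
     for the comparison against the xor value.  */
  return (mask >> shift) << shift;
}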

Can you please test this patch?

Meanwhile I'll try to adjust string/test-strcmp.c so that it
explicitly tests this situation.
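
The case to cover is essentially the following (plain C, not the actual
test-strcmp.c harness; the alignment attribute is only there to keep
each string within a single aligned doubleword load):

#include <assert.h>
#include <string.h>

int
main (void)
{
  /* s1 ends in '\0' exactly where s2 has a 0x01 byte, and everything
     after the terminators is zero -- the situation in which the old
     code's result depended on what follows the strings.  */
  char s1[16] __attribute__ ((aligned (16))) = { 'a', 'b', 0 };
  char s2[16] __attribute__ ((aligned (16))) = { 'a', 'b', 1 };

  assert (strcmp (s1, s2) < 0);
  assert (strcmp (s2, s1) > 0);
  return 0;
}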

Thanks!
  

Comments

Aurelien Jarno April 30, 2014, 4:09 p.m. UTC | #1
On Tue, Apr 29, 2014 at 10:40:46PM -0400, David Miller wrote:
> From: Aurelien Jarno <aurelien@aurel32.net>
> Date: Tue, 29 Apr 2014 11:53:39 +0200
> 
> > This method doesn't work when comparing a 0x00 char in string 1 and a
> > 0x01 char in string 2. In that case the mask for this byte is 0x01 and
> > the corresponding xor is also 0x01. The result of the comparison
> > therefore depends on the garbage after the end of the string.
> > 
> > On Debian [1] this causes, for example, debian-installer to fail to
> > build [2], and it might be the source of the random segfaults which we
> > have been trying to debug for a few years.
> > 
> > [1] http://bugs.debian.org/746310
> > [2] http://bugs.debian.org/731806
> 
> Right you are: if there are any zeros after the first zero, we will
> potentially miscompare.
> 
> The only solution I can see at the moment is to clear all except the
> topmost bit in the mask.
> 
> Can you please test this patch?

Thanks a lot for this quick patch. I confirm it works correctly.

That said, as the patch needs to test the mask byte by byte, I do wonder
whether it wouldn't be easier overall to directly test, byte by byte, the
string word loaded in the register. My SPARC assembly knowledge is
relatively limited, so I am not sure whether that is actually the case.
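
What I have in mind is roughly this at the C level (just to illustrate
the question, not a proposal; compare_last_word is a made-up name):

#include <stdint.h>

/* Once the loaded word is known to contain a zero byte or a
   difference, compare it byte by byte, most significant byte first
   (the first string byte on big-endian SPARC).  Illustration only.  */
static int
compare_last_word (uint64_t w1, uint64_t w2)
{
  for (int shift = 56; shift >= 0; shift -= 8)
    {
      unsigned int c1 = (w1 >> shift) & 0xff;
      unsigned int c2 = (w2 >> shift) & 0xff;

      if (c1 != c2)
        return c1 < c2 ? -1 : 1;
      if (c1 == 0)
        return 0;
    }
  return 0;  /* not reached when the word really ends or differs */
}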
  
David Miller April 30, 2014, 5:36 p.m. UTC | #2
From: Aurelien Jarno <aurelien@aurel32.net>
Date: Wed, 30 Apr 2014 18:09:33 +0200

> That said, as the patch needs to test the mask byte by byte, I do wonder
> whether it wouldn't be easier overall to directly test, byte by byte, the
> string word loaded in the register. My SPARC assembly knowledge is
> relatively limited, so I am not sure whether that is actually the case.

There is also the possibility of doing divide-and-conquer, for example
determining that the low 32 bits have no bits set and then checking only
the high 32 bits.
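
In C terms that would look roughly like this (illustration only;
keep_topmost_marker_by_halving is a made-up name and the branches are
written out explicitly):

#include <stdint.h>

/* Illustrative helper: narrow down the highest 0x01 marker by halving
   instead of testing every byte lane.  */
static uint64_t
keep_topmost_marker_by_halving (uint64_t mask)
{
  int shift = 0;

  if (mask >> 32)
    shift += 32;              /* a marker lives in the high half  */
  if (mask >> (shift + 16))
    shift += 16;
  if (mask >> (shift + 8))
    shift += 8;

  return (mask >> shift) << shift;   /* drop everything below it  */
}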

But all such schemes add branches, and the whole idea of this code
is for it to be straight-line and branchless, thus avoiding thread
switching on Niagara.

Anyway, thanks for testing. I'll get this fix merged.
  

Patch

diff --git a/sysdeps/sparc/sparc64/strcmp.S b/sysdeps/sparc/sparc64/strcmp.S
index 8925396..312924a 100644
--- a/sysdeps/sparc/sparc64/strcmp.S
+++ b/sysdeps/sparc/sparc64/strcmp.S
@@ -121,6 +121,37 @@  ENTRY(strcmp)
 	movleu	%xcc, -1, %o0
 	srlx	rTMP1, 7, rTMP1
 
+	/* In order not to be influenced by bytes after the zero byte, we
+	 * have to retain only the highest bit in the mask for the comparison
+	 * with rSTRXOR to work properly.
+	 */
+	mov	0, rTMP2
+	andcc	rTMP1, 0x0100, %g0
+	
+	movne	%xcc, 8, rTMP2
+	sllx	rTMP1, 63 - 16, %o1
+
+	movrlz	%o1, 16, rTMP2
+	sllx	rTMP1, 63 - 24, %o1
+
+	movrlz	%o1, 24, rTMP2
+	sllx	rTMP1, 63 - 32, %o1
+
+	movrlz	%o1, 32, rTMP2
+	sllx	rTMP1, 63 - 40, %o1
+
+	movrlz	%o1, 40, rTMP2
+	sllx	rTMP1, 63 - 48, %o1
+
+	movrlz	%o1, 48, rTMP2
+	sllx	rTMP1, 63 - 56, %o1
+
+	movrlz	%o1, 56, rTMP2
+
+	srlx	rTMP1, rTMP2, rTMP1
+
+	sllx	rTMP1, rTMP2, rTMP1
+
 	cmp	rTMP1, rSTRXOR
 	retl
 	 movgu	%xcc, 0, %o0