x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
Commit Message
I think this revised patch is an improvement because it avoids adding a
branch.
Thanks,
Florian
Comments
On Fri, Jun 23, 2017 at 6:51 AM, Florian Weimer <fweimer@redhat.com> wrote:
> I think this revised patch is an improvement because it avoids adding a
> branch.
>
> Thanks,
> Florian
The memcmp-avx2-movbe.S part is OK. For
(check_result, do_random_tests): Write error messages to standard
output.
any particular reason to replace error with printf?
Thanks.
On 06/23/2017 05:08 PM, H.J. Lu wrote:
> On Fri, Jun 23, 2017 at 6:51 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> I think this revised patch is an improvement because it avoids adding a
>> branch.
>>
>> Thanks,
>> Florian
>
> The memcmp-avx2-movbe.S part is OK.
Thanks, I will commit it shortly.
> For
>
> (check_result, do_random_tests): Write error messages to standard
> output.
>
> any particular reason to replace error with printf?
The error function writes to standard error, so the test harness does
not capture it output in the .out file.
Thanks,
Florian
On Fri, Jun 23, 2017 at 8:12 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 05:08 PM, H.J. Lu wrote:
>> On Fri, Jun 23, 2017 at 6:51 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> I think this revised patch is an improvement because it avoids adding a
>>> branch.
>>>
>>> Thanks,
>>> Florian
>>
>> The memcmp-avx2-movbe.S part is OK.
>
> Thanks, I will commit it shortly.
>
>> For
>>
>> (check_result, do_random_tests): Write error messages to standard
>> output.
>>
>> any particular reason to replace error with printf?
>
> The error function writes to standard error, so the test harness does
> not capture it output in the .out file.
All string tests are like this. We still got
FAIL: string/test-memcmp
[hjl@gnu-6 build-x86_64-linux]$ string/test-memcmp
string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488148660 -168
string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488237565 -168
string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275218947 180
string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275265792 180
simple_memcmp __memcmp_avx2_movbe
__memcmp_sse4_1__memcmp_ssse3 __memcmp_sse2
[hjl@gnu-6 build-x86_64-linux]$
If we want to change this, it should be in a separate patch.
Thanks.
On 06/23/2017 05:43 PM, H.J. Lu wrote:
> On Fri, Jun 23, 2017 at 8:12 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 06/23/2017 05:08 PM, H.J. Lu wrote:
>>> On Fri, Jun 23, 2017 at 6:51 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>> I think this revised patch is an improvement because it avoids adding a
>>>> branch.
>>>>
>>>> Thanks,
>>>> Florian
>>>
>>> The memcmp-avx2-movbe.S part is OK.
>>
>> Thanks, I will commit it shortly.
>>
>>> For
>>>
>>> (check_result, do_random_tests): Write error messages to standard
>>> output.
>>>
>>> any particular reason to replace error with printf?
>>
>> The error function writes to standard error, so the test harness does
>> not capture it output in the .out file.
>
> All string tests are like this.
Right.
> We still got
>
> FAIL: string/test-memcmp
>
> [hjl@gnu-6 build-x86_64-linux]$ string/test-memcmp
> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488148660 -168
> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488237565 -168
> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275218947 180
> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275265792 180
> simple_memcmp __memcmp_avx2_movbe
> __memcmp_sse4_1__memcmp_ssse3 __memcmp_sse2
> [hjl@gnu-6 build-x86_64-linux]$
>
> If we want to change this, it should be in a separate patch.
Agreed, and we should fix all the instances in one go. I'm not going to
work on this immediately.
Florian
On Fri, Jun 23, 2017 at 8:47 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 05:43 PM, H.J. Lu wrote:
>> On Fri, Jun 23, 2017 at 8:12 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 06/23/2017 05:08 PM, H.J. Lu wrote:
>>>> On Fri, Jun 23, 2017 at 6:51 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>>> I think this revised patch is an improvement because it avoids adding a
>>>>> branch.
>>>>>
>>>>> Thanks,
>>>>> Florian
>>>>
>>>> The memcmp-avx2-movbe.S part is OK.
>>>
>>> Thanks, I will commit it shortly.
>>>
>>>> For
>>>>
>>>> (check_result, do_random_tests): Write error messages to standard
>>>> output.
>>>>
>>>> any particular reason to replace error with printf?
>>>
>>> The error function writes to standard error, so the test harness does
>>> not capture it output in the .out file.
>>
>> All string tests are like this.
>
> Right.
>
>> We still got
>>
>> FAIL: string/test-memcmp
>>
>> [hjl@gnu-6 build-x86_64-linux]$ string/test-memcmp
>> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488148660 -168
>> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe 1488237565 -168
>> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275218947 180
>> string/test-memcmp: Wrong result in function __memcmp_avx2_movbe -1275265792 180
>> simple_memcmp __memcmp_avx2_movbe
>> __memcmp_sse4_1__memcmp_ssse3 __memcmp_sse2
>> [hjl@gnu-6 build-x86_64-linux]$
>>
>> If we want to change this, it should be in a separate patch.
>
> Agreed, and we should fix all the instances in one go. I'm not going to
> work on this immediately.
>
> Florian
OK without the error -> printf change.
Thanks.
On 06/23/2017 09:51 AM, Florian Weimer wrote:
> I think this revised patch is an improvement because it avoids adding a
> branch.
OK on the following conditions:
- Split the printf changes out and do that in another pass.
- Clarify if the new testsuite change did catch the regression.
- Adjust comment for L(between_2_3):.
- Fix grammar in assembly comment.
> Thanks,
> Florian
>
>
> memcmp.patch
>
>
> x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
>
> This code:
>
> L(between_2_3):
> /* Load as big endian with overlapping loads and bswap to avoid
> branches. */
> movzwl -2(%rdi, %rdx), %eax
> movzwl -2(%rsi, %rdx), %ecx
> shll $16, %eax
> shll $16, %ecx
> movzwl (%rdi), %edi
> movzwl (%rsi), %esi
> orl %edi, %eax
> orl %esi, %ecx
> bswap %eax
> bswap %ecx
> subl %ecx, %eax
> ret
>
> needs a saturating subtract because the full register is used.
> With this commit, only the lower 24 bits of the register are used,
> so a regular subtraction suffices.
>
> The test case change adds coverage for these kinds of bugs.
Does the change to test-memcmp.c actually catch the bug reported by
the users?
> 2017-06-23 Florian Weimer <fweimer@redhat.com>
>
> [BZ #21662]
> * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
> Peform saturating subtraction.
> * string/test-memcmp.c (check1): Check with different lengths.
> (check_result, do_random_tests): Write error messages to standard
> output.
>
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index a7969ed..93bc972 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -85,8 +85,8 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
> || (exp_result < 0 && result >= 0)
> || (exp_result > 0 && result <= 0))
> {
> - error (0, 0, "Wrong result in function %s %d %d", impl->name,
> - result, exp_result);
> + printf ("error: wrong result in function %s %d %d", impl->name,
> + result, exp_result);
> ret = 1;
> return -1;
> }
> @@ -192,8 +192,10 @@ do_random_tests (void)
> || (r < 0 && result >= 0)
> || (r > 0 && result <= 0))
> {
> - error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> - n, impl->name, align1 * CHARBYTES & 63, align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> + printf ("error: Iteration %zd - wrong result in function %s"
> + " (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> + n, impl->name, align1 * CHARBYTES & 63,
> + align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
Outputting informative messages should be done to stdout not stderr.
However, in the future please don't mix changes like this with a bug fix.
Please commit the above two hunks right away since I can't see anyone
objecting.
> ret = 1;
> }
> }
> @@ -441,11 +443,12 @@ check1 (void)
>
> n = 116;
> for (size_t i = 0; i < n; i++)
> - {
> - exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, n - i);
> - FOR_EACH_IMPL (impl, 0)
> - check_result (impl, s1 + i, s2 + i, n - i, exp_result);
> - }
> + for (size_t len = 0; len <= n - i; ++len)
> + {
> + exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, len);
> + FOR_EACH_IMPL (impl, 0)
> + check_result (impl, s1 + i, s2 + i, len, exp_result);
> + }
OK, so you've added more iterations here over the length of the remaining
substring to compare.
I assume that this catches the regression by ensuring the high values of
the subtraction result in an underflow which results in a positive value
of the subtraction and a wrong answer?
> }
>
> /* This test checks that memcmp doesn't overrun buffers. */
> diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> index 47630dd..4cc9386 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> @@ -139,16 +139,17 @@ L(exit):
> L(between_2_3):
> /* Load as big endian with overlapping loads and bswap to avoid
> branches. */
Was this comment ever accurate? mobzwl is not a BE load.
Given that you changed the code, please adjust the comment.
> - movzwl -2(%rdi, %rdx), %eax
> - movzwl -2(%rsi, %rdx), %ecx
> - shll $16, %eax
> - shll $16, %ecx
> - movzwl (%rdi), %edi
> - movzwl (%rsi), %esi
> - orl %edi, %eax
> - orl %esi, %ecx
> + movzwl (%rdi), %eax
> + movzwl (%rsi), %ecx
OK, move 32-bit with zero-extend to full register.
1,2,0,0
> + shll $8, %eax
> + shll $8, %ecx
OK, shift out the part we don't care about.
0,1,2,0
> bswap %eax
> bswap %ecx
OK, swap, leaving either:
0,2,1,0
> + movzbl -1(%rdi, %rdx), %edi
> + movzbl -1(%rsi, %rdx), %esi
OK, byte 3 or 2 loaded with everything else set to 0 (24 bits).
3,0,0,0
or
2,0,0,0
> + orl %edi, %eax
> + orl %esi, %ecx
> + /* Subtraction is okay because the upper 8 bits a zero. */
s/a zero/are zero/g
We either have:
3,2,1,0
or
2,2,1,0
> subl %ecx, %eax
OK, must use 'l' to subtract full DWORD.
> ret
>
On 06/23/2017 06:38 PM, Carlos O'Donell wrote:
> I assume that this catches the regression by ensuring the high values of
> the subtraction result in an underflow which results in a positive value
> of the subtraction and a wrong answer?
Yes, I thought I said so in the commit message.
> Was this comment ever accurate? mobzwl is not a BE load.
We used bswap, so the register contents before the comparison is in
big-endian format.
>> + orl %edi, %eax
>> + orl %esi, %ecx
>> + /* Subtraction is okay because the upper 8 bits a zero. */
>
> s/a zero/are zero/g
Okay, I'll fix this typo in a follow-up commit.
I won't work on the error → printf cleanup for now. HJ asked that they
should be put into a separate commit.
Thanks,
Florian
On 06/23/2017 06:51 AM, Florian Weimer wrote:
> + shll $8, %eax
> + shll $8, %ecx
> bswap %eax
> bswap %ecx
> + movzbl -1(%rdi, %rdx), %edi
> + movzbl -1(%rsi, %rdx), %esi
> + orl %edi, %eax
> + orl %esi, %ecx
Do you really need to re-load and merge the final byte?
It would appear you could leave the low byte zero.
r~
On 06/23/2017 10:48 PM, Richard Henderson wrote:
> On 06/23/2017 06:51 AM, Florian Weimer wrote:
>> + shll $8, %eax
>> + shll $8, %ecx
>> bswap %eax
>> bswap %ecx
>> + movzbl -1(%rdi, %rdx), %edi
>> + movzbl -1(%rsi, %rdx), %esi
>> + orl %edi, %eax
>> + orl %esi, %ecx
>
> Do you really need to re-load and merge the final byte?
> It would appear you could leave the low byte zero.
Sorry, I don't understand. Is this the same issue as H.J. raised and
improved?
The final/third byte certainly can affect the outcome of the comparison.
Thanks,
Florian
On 06/23/2017 01:52 PM, Florian Weimer wrote:
> On 06/23/2017 10:48 PM, Richard Henderson wrote:
>> On 06/23/2017 06:51 AM, Florian Weimer wrote:
>>> + shll $8, %eax
>>> + shll $8, %ecx
>>> bswap %eax
>>> bswap %ecx
>>> + movzbl -1(%rdi, %rdx), %edi
>>> + movzbl -1(%rsi, %rdx), %esi
>>> + orl %edi, %eax
>>> + orl %esi, %ecx
>>
>> Do you really need to re-load and merge the final byte?
>> It would appear you could leave the low byte zero.
>
> Sorry, I don't understand. Is this the same issue as H.J. raised and
> improved?
>
> The final/third byte certainly can affect the outcome of the comparison.
Obviously I didn't intend to imply that one can completely ignore the final
byte. From the larger context that I didn't quote, it appeared that the last
byte has already been loaded.
That said, I take it all back -- I mis-read the diff.
r~
x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
This code:
L(between_2_3):
/* Load as big endian with overlapping loads and bswap to avoid
branches. */
movzwl -2(%rdi, %rdx), %eax
movzwl -2(%rsi, %rdx), %ecx
shll $16, %eax
shll $16, %ecx
movzwl (%rdi), %edi
movzwl (%rsi), %esi
orl %edi, %eax
orl %esi, %ecx
bswap %eax
bswap %ecx
subl %ecx, %eax
ret
needs a saturating subtract because the full register is used.
With this commit, only the lower 24 bits of the register are used,
so a regular subtraction suffices.
The test case change adds coverage for these kinds of bugs.
2017-06-23 Florian Weimer <fweimer@redhat.com>
[BZ #21662]
* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
Peform saturating subtraction.
* string/test-memcmp.c (check1): Check with different lengths.
(check_result, do_random_tests): Write error messages to standard
output.
@@ -85,8 +85,8 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
|| (exp_result < 0 && result >= 0)
|| (exp_result > 0 && result <= 0))
{
- error (0, 0, "Wrong result in function %s %d %d", impl->name,
- result, exp_result);
+ printf ("error: wrong result in function %s %d %d", impl->name,
+ result, exp_result);
ret = 1;
return -1;
}
@@ -192,8 +192,10 @@ do_random_tests (void)
|| (r < 0 && result >= 0)
|| (r > 0 && result <= 0))
{
- error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
- n, impl->name, align1 * CHARBYTES & 63, align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
+ printf ("error: Iteration %zd - wrong result in function %s"
+ " (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
+ n, impl->name, align1 * CHARBYTES & 63,
+ align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
ret = 1;
}
}
@@ -441,11 +443,12 @@ check1 (void)
n = 116;
for (size_t i = 0; i < n; i++)
- {
- exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, n - i);
- FOR_EACH_IMPL (impl, 0)
- check_result (impl, s1 + i, s2 + i, n - i, exp_result);
- }
+ for (size_t len = 0; len <= n - i; ++len)
+ {
+ exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, len);
+ FOR_EACH_IMPL (impl, 0)
+ check_result (impl, s1 + i, s2 + i, len, exp_result);
+ }
}
/* This test checks that memcmp doesn't overrun buffers. */
@@ -139,16 +139,17 @@ L(exit):
L(between_2_3):
/* Load as big endian with overlapping loads and bswap to avoid
branches. */
- movzwl -2(%rdi, %rdx), %eax
- movzwl -2(%rsi, %rdx), %ecx
- shll $16, %eax
- shll $16, %ecx
- movzwl (%rdi), %edi
- movzwl (%rsi), %esi
- orl %edi, %eax
- orl %esi, %ecx
+ movzwl (%rdi), %eax
+ movzwl (%rsi), %ecx
+ shll $8, %eax
+ shll $8, %ecx
bswap %eax
bswap %ecx
+ movzbl -1(%rdi, %rdx), %edi
+ movzbl -1(%rsi, %rdx), %esi
+ orl %edi, %eax
+ orl %esi, %ecx
+ /* Subtraction is okay because the upper 8 bits a zero. */
subl %ecx, %eax
ret