From patchwork Fri May 19 16:14:30 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 20510 Received: (qmail 68735 invoked by alias); 19 May 2017 16:14:32 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 68092 invoked by uid 89); 19 May 2017 16:14:31 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.1 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM, SPF_PASS autolearn=ham version=3.3.2 spammy=064, 0, 64 X-HELO: mail-qk0-f193.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=MHY1G011VC+CGcz0lDiotT9GPM5gdZ156N4UlykivZ0=; b=IHwxUYDVda8H0LtMwyjycFY9wk79sUx6j3iflzltwcRfjDMKF7oN/eNq+cjz/yI1a+ FLq57Ldr3h6VxyeclgkK3LAyyWPhQ7hfqp/pPO/Xbvzq28NdF23ze5zVXAsHvHQ1Awb3 4R32UR7ANXJ1IeM0NPSWq62Ua6jvOmCjwKfYZMmQyHD9Dj/ERuURBp3OogVBZ1VQ7dnF A5LGDatrFNsfdZURRIWSNsSCit1+hPrq5PXEo1F/as2xbbhlsvHtz36GG/ut9YgecfC+ wUx29cp8YEWv2cwCowU/eAWNVyw1xMVuQrjRL94TbDwIAZ2gnBjdjRobPIhv3E6XpQfp 4X7w== X-Gm-Message-State: AODbwcAQLn3GPqDuIXOgpWIdXf2XUZmSLfoIpoGiyzVtLCuoVxev7MJ9 1hjz4mkij8igw0aITjd9jrzRlTFC+A== X-Received: by 10.55.127.199 with SMTP id a190mr10100817qkd.100.1495210470907; Fri, 19 May 2017 09:14:30 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <5e09e1e1-a4cf-772a-68d7-a532eca8d5be@linaro.org> References: <1489512487-24860-1-git-send-email-adhemerval.zanella@linaro.org> <40ea40ff-763f-e5ff-4bf1-1d24db403df3@linaro.org> <5d284d70-b6e5-b92e-185a-bfede9777ad8@linaro.org> <5e09e1e1-a4cf-772a-68d7-a532eca8d5be@linaro.org> From: "H.J. Lu" Date: Fri, 19 May 2017 09:14:30 -0700 Message-ID: Subject: Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182) To: Adhemerval Zanella Cc: GNU C Library On Fri, May 19, 2017 at 8:10 AM, Adhemerval Zanella wrote: > > > On 19/05/2017 11:45, Adhemerval Zanella wrote: >> >> >> On 19/05/2017 10:44, H.J. Lu wrote: >>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu wrote: >>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella >>>> wrote: >>>>> Ping (with the page_size/2 fix included). >>>>> >>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote: >>>>>> This patch fixes the regression added by 23d2770 for final address >>>>>> overflow calculation. The subtraction of the considered size (16) >>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent >>>>>> overflow check will not take in consideration an invalid size (since >>>>>> the subtraction will be negative). Also, the lea instruction also >>>>>> does not raise the carry flag (CF) that is used in subsequent jbe >>>>>> to check for overflow. >>>>>> >>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow >>>>>> is first check and a sub instruction is issued. In case of resulting >>>>>> negative size, CF will be set by the sub instruction and a NULL >>>>>> result will be returned. The patch also add similar tests reported >>>>>> in bug report. >>>>>> >>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu. >>>>>> >>>>>> [BZ# 21182] >>>>>> * string/test-memchr.c (do_test): Add BZ#21182 checks for address >>>>>> near end of a page. >>>>>> * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix >>>>>> overflow calculation. >>>>>> --- >>>>>> ChangeLog | 7 +++++++ >>>>>> string/test-memchr.c | 6 ++++++ >>>>>> sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +- >>>>>> 3 files changed, 14 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c >>>>>> index d64d10c..87a077b 100644 >>>>>> --- a/string/test-memchr.c >>>>>> +++ b/string/test-memchr.c >>>>>> @@ -210,6 +210,12 @@ test_main (void) >>>>>> do_test (0, i, i + 1, i + 1, 0); >>>>>> } >>>>>> >>>>>> + /* BZ#21182 - wrong overflow calculation for i686 implementation >>>>>> + with address near end of the page. */ >>>>>> + for (i = 2; i < 16; ++i) >>>>>> + /* page_size is in fact getpagesize() * 2. */ >>>>>> + do_test (page_size/2 - i, i, i, 1, 0x9B); >>>>>> + >>>>>> do_random_tests (); >>>>>> return ret; >>>>>> } >>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S >>>>>> index 910679c..e41f324 100644 >>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S >>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S >>>>>> @@ -117,7 +117,6 @@ L(crosscache): >>>>>> >>>>>> # ifndef USE_AS_RAWMEMCHR >>>>>> jnz L(match_case2_prolog1) >>>>>> - lea -16(%edx), %edx >>>>>> /* Calculate the last acceptable address and check for possible >>>>>> addition overflow by using satured math: >>>>>> edx = ecx + edx >>>>>> @@ -125,6 +124,7 @@ L(crosscache): >>>>>> add %ecx, %edx >>>>>> sbb %eax, %eax >>>>>> or %eax, %edx >>>>>> + sub $16, %edx >>>>>> jbe L(return_null) >>>>>> lea 16(%edi), %edi >>>>>> # else >>>>>> >>>> >>>> Looks good. >>>> >>>> >>> >>> Just a thought. Is this approach >>> >>> /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)" >>> instead of "(edx + ecx) - 16" to void possible addition >>> overflow. */ >>> neg %ecx >>> add $16, %ecx >>> sub %ecx, %edx >>> jbe L(return_null) >>> >>> a little bit better? >> >> I do not have a strong preference here, although imho it is simpler to >> understand why the saturated math would work here and it is the same >> strategy used on other arch implementations. May you might outline in >> comment that ecx is always in the range of [0,64) so '-ecx + 16' can >> possible underflow. Is is this change just for micro-optimization? >> > > Sorry, my bad here. I made some confusion, in fact what I meant was ecx > was indeed in range [0,16) (is the and instruction for alignment that > uses 64) and your suggestion *can't* overflow. I would recommend > you to just outline the possible value of 'edx' in comments. Here is a patch for SSE2 memchr. OK for master? From 70c5d56431e54c7ef8dc8b8eb300a343a97ab72d Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Fri, 19 May 2017 09:04:23 -0700 Subject: [PATCH] x86: Optimize SSE2 memchr overflow calculation SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16. Use "edx - (16 - ecx)", instead of satured math, to avoid possible addition overflow. This replaces add %ecx, %edx sbb %eax, %eax or %eax, %edx sub $16, %edx with neg %ecx add $16, %ecx sub %ecx, %edx It is the same for x86_64, except for rcx/rdx, instead of ecx/edx. * sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use "edx + ecx - 16" to avoid possible addition overflow. * sysdeps/x86_64/memchr.S (memchr): Likewise. --- sysdeps/i386/i686/multiarch/memchr-sse2.S | 14 ++++++-------- sysdeps/x86_64/memchr.S | 14 ++++++-------- 2 files changed, 12 insertions(+), 16 deletions(-) diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S index e41f324..172d70d 100644 --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S @@ -117,14 +117,12 @@ L(crosscache): # ifndef USE_AS_RAWMEMCHR jnz L(match_case2_prolog1) - /* Calculate the last acceptable address and check for possible - addition overflow by using satured math: - edx = ecx + edx - edx |= -(edx < ecx) */ - add %ecx, %edx - sbb %eax, %eax - or %eax, %edx - sub $16, %edx + /* "ecx" is less than 16. Calculate "edx + ecx - 16" by using + "edx - (16 - ecx)" instead of "(edx + ecx) - 16" to void + possible addition overflow. */ + neg %ecx + add $16, %ecx + sub %ecx, %edx jbe L(return_null) lea 16(%edi), %edi # else diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S index a205a25..f82e1c5 100644 --- a/sysdeps/x86_64/memchr.S +++ b/sysdeps/x86_64/memchr.S @@ -76,14 +76,12 @@ L(crosscache): .p2align 4 L(unaligned_no_match): - /* Calculate the last acceptable address and check for possible - addition overflow by using satured math: - rdx = rcx + rdx - rdx |= -(rdx < rcx) */ - add %rcx, %rdx - sbb %rax, %rax - or %rax, %rdx - sub $16, %rdx + /* "rcx" is less than 16. Calculate "rdx + rcx - 16" by using + "rdx - (16 - rcx)" instead of "(rdx + rcx) - 16" to void + possible addition overflow. */ + neg %rcx + add $16, %rcx + sub %rcx, %rdx jbe L(return_null) add $16, %rdi sub $64, %rdx -- 2.9.4