| Message ID | adca186b-24e1-20da-9e4d-0acb6754f133@rivosinc.com |
|---|---|
| State | New |
| Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9B38C3858283 for <patchwork@sourceware.org>; Tue, 8 Nov 2022 20:02:42 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by sourceware.org (Postfix) with ESMTPS id EB7213858D3C for <gcc-patches@gcc.gnu.org>; Tue, 8 Nov 2022 20:02:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EB7213858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-qk1-x735.google.com with SMTP id v8so9754384qkg.12 for <gcc-patches@gcc.gnu.org>; Tue, 08 Nov 2022 12:02:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:cc:content-language:to:subject:from :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=vI9CKPcCFEpvuL+oicQElNkVNwQBJIOrvADKk/DWfYM=; b=xk2LHGFUrfLMcVZ952yB8eAwEo+koAAMFGgqrMtVDawogDvALXFV8hkCYBk250jF3r hgqn6ni+ahrVfmVF9JlBB1A2tnulPb/yG6RI+/BEvpxNkr1A8QzkUfTFGR+OzGBOvCPM JGieiHtdkJVC6RseyciIX1LIJ6dAEnlBeelR4PyYiU81nIa8ykeK71wihEcYs7z8QSIb TZ6Qy30s8dCyiuJCNf6fcFlDSP02mYhK+VLFwF5ylOJhijoVHmBx4S4giXhn4ekUn2dO BMPDfUIrFN1XFgUP/fPWJMjGkgCGlHivJBRazmqBwCTCdPLiYt7qXmbAV8jJ5aCq670+ pN9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:cc:content-language:to:subject:from :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=vI9CKPcCFEpvuL+oicQElNkVNwQBJIOrvADKk/DWfYM=; b=CLfUfZO1+kvMxa+wRRA4q6cbdez0lTWuBukNRCurM3pftpYRTTBkWkkUcmw+OIuK/F G9M6EhZe4kKt1UzQcBJXo1+sj+rWgupbqirVuMlWQKZ890p8sCZex8WZhNvmS301zKXy sNDVTgczyPBGTZi8riAVclfQRIIuKGRSoTktV+aKYimJIoq4byABCHqnE3kpHd6dqgZF Y9h/r3XWzNPBENnKXn9pA/mYdRhuYye4RoIF98xpzNMv2tQ6+2FiKHCXOLwnc4mo8wER MyaAKRHUAlPP0NBdCxLaR4BRLtaHfvQqVejC0e6tBKDtMF4owSSJ3Xy8aaazihLWZAzZ oWyg== X-Gm-Message-State: ACrzQf0erwHInzKnGQE7flhK9WP6/hEcwww+icPkBeYSr+CAeE84RVyt tA3JhdXK6/9G28wNjIJyvGbAQ7a4kKIfGw== X-Google-Smtp-Source: AMsMyM6+0lmMFhOa9KtZieTQ5IBUNgcxN+VazGRczxAnnYvO8M5PaDHjIN+KtGY+QoMepb57u2rrXQ== X-Received: by 2002:ae9:f406:0:b0:6fa:395d:1480 with SMTP id y6-20020ae9f406000000b006fa395d1480mr32238331qkl.555.1667937741115; Tue, 08 Nov 2022 12:02:21 -0800 (PST) Received: from [192.168.86.117] ([136.57.172.92]) by smtp.gmail.com with ESMTPSA id n16-20020a05620a295000b006ce0733caebsm10122166qkp.14.2022.11.08.12.02.20 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 08 Nov 2022 12:02:20 -0800 (PST) Message-ID: <adca186b-24e1-20da-9e4d-0acb6754f133@rivosinc.com> Date: Tue, 8 Nov 2022 15:02:20 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2 From: Michael Collison <collison@rivosinc.com> Subject: [PATCH] match.pd: rewrite select to branchless expression To: gcc-patches@gcc.gnu.org Content-Language: en-US Cc: Jeff Law <jlaw@ventanamicro.com>, "jakub@redhat.com >> Jakub Jelinek" <jakub@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
| Series |
match.pd: rewrite select to branchless expression
|
|
Commit Message
Michael Collison
Nov. 8, 2022, 8:02 p.m. UTC
This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 0x1)) & z ) op y. Matching this patterns allows GCC to generate branchless code for one of the functions in coremark. Bootstrapped and tested on x86 and RISC-V. Okay? Michael. 2022-11-08 Michael Collison <collison@rivosinc.com> * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) ) -> (-(and (x , 0x1)) & z ) op y) 2022-11-08 Michael Collison <collison@rivosinc.com> * gcc.dg/tree-ssa/branchless-cond.c: New test. --- gcc/match.pd | 22 ++++++++++++++++ .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++++++++++++++++++ 2 files changed, 48 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
Comments
On Tue, Nov 8, 2022 at 12:02 PM Michael Collison <collison@rivosinc.com> wrote: > > This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into > (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also > transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , > 0x1)) & z ) op y. > > Matching this patterns allows GCC to generate branchless code for one of > the functions in coremark. > > Bootstrapped and tested on x86 and RISC-V. Okay? This seems like a (much) reduced (simplified?) version of https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html . I have not had time for the last year to go through the comments on that patch and resubmit it though. It seems like you are aiming for one specific case in coremarks rather than a more generic fix too. Thanks, Andrew Pinski > > Michael. > > 2022-11-08 Michael Collison <collison@rivosinc.com> > > * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) ) > -> (-(and (x , 0x1)) & z ) op y) > > 2022-11-08 Michael Collison <collison@rivosinc.com> > > * gcc.dg/tree-ssa/branchless-cond.c: New test. > > --- > gcc/match.pd | 22 ++++++++++++++++ > .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++++++++++++++++++ > 2 files changed, 48 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 194ba8f5188..722f517ac6d 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1) > (max @2 @1)) > > +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) > ^ y */ > +(for op (bit_xor bit_ior) > + (simplify > + (cond (eq (bit_and @0 integer_onep@1) > + integer_zerop) > + @2 > + (op:c @3 @2)) > + (if (INTEGRAL_TYPE_P (type) > + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > + > +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) > ^ y */ > +(for op (bit_xor bit_ior) > + (simplify > + (cond (ne (bit_and @0 integer_onep@1) > + integer_zerop) > + (op:c @3 @2) > + @2) > + (if (INTEGRAL_TYPE_P (type) > + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > + > /* Simplifications of shift and rotates. */ > > (for rotate (lrotate rrotate) > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > new file mode 100644 > index 00000000000..68087ae6568 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int f1(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) == 0) ? y : z ^ y; > +} > + > +int f2(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) != 0) ? z ^ y : y; > +} > + > +int f3(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) == 0) ? y : z | y; > +} > + > +int f4(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) != 0) ? z | y : y; > +} > + > +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */ > -- > 2.34.1 > > > >
On Tue, Nov 8, 2022 at 9:02 PM Michael Collison <collison@rivosinc.com> wrote: > > This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into > (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also > transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , > 0x1)) & z ) op y. > > Matching this patterns allows GCC to generate branchless code for one of > the functions in coremark. > > Bootstrapped and tested on x86 and RISC-V. Okay? > > Michael. > > 2022-11-08 Michael Collison <collison@rivosinc.com> > > * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) ) > -> (-(and (x , 0x1)) & z ) op y) > > 2022-11-08 Michael Collison <collison@rivosinc.com> > > * gcc.dg/tree-ssa/branchless-cond.c: New test. > > --- > gcc/match.pd | 22 ++++++++++++++++ > .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++++++++++++++++++ > 2 files changed, 48 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index 194ba8f5188..722f517ac6d 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1) > (max @2 @1)) > > +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) > ^ y */ Please write the match as a C expression in the comment, as present it's a weird mix. So x & 0x1 == 0 ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y > +(for op (bit_xor bit_ior) > + (simplify > + (cond (eq (bit_and @0 integer_onep@1) > + integer_zerop) > + @2 > + (op:c @3 @2)) > + (if (INTEGRAL_TYPE_P (type) > + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) Since you are literally keeping (bit_and @0 @1) and not matching @0 with anything I suspect you could instead use (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ... eventually extending that to cover bit_and with one. Do you need to guard this against 'type' being a signed/unsigned 1-bit precision integer? > + > +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) > ^ y */ > +(for op (bit_xor bit_ior) > + (simplify > + (cond (ne (bit_and @0 integer_onep@1) > + integer_zerop) > + (op:c @3 @2) > + @2) > + (if (INTEGRAL_TYPE_P (type) > + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > + > /* Simplifications of shift and rotates. */ > > (for rotate (lrotate rrotate) > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > new file mode 100644 > index 00000000000..68087ae6568 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int f1(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) == 0) ? y : z ^ y; > +} > + > +int f2(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) != 0) ? z ^ y : y; > +} > + > +int f3(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) == 0) ? y : z | y; > +} > + > +int f4(unsigned int x, unsigned int y, unsigned int z) > +{ > + return ((x & 1) != 0) ? z | y : y; > +} > + > +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */ > -- > 2.34.1 > > > >
Richard,
Thanks for your feedback. I want to make sure I am following what you
are recommending. Are you suggesting changing:
(for op (bit_xor bit_ior)
(simplify
(cond (eq (bit_and @0 integer_onep@1)
integer_zerop)
@2
(op:c @3 @2))
(if (INTEGRAL_TYPE_P (type)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
(op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
to
(for op (bit_xor bit_ior)
(simplify
(cond (eq zero_one_valued_p@0
integer_zerop)
@1
(op:c @2 @1))
(if (INTEGRAL_TYPE_P (type)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
(op (bit_and (negate (convert:type (bit_and @0 { build_one_cst
(type); }))) @2) @1))))
On 11/9/22 02:41, Richard Biener wrote:
> On Tue, Nov 8, 2022 at 9:02 PM Michael Collison <collison@rivosinc.com> wrote:
>> This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
>> (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
>> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
>> 0x1)) & z ) op y.
>>
>> Matching this patterns allows GCC to generate branchless code for one of
>> the functions in coremark.
>>
>> Bootstrapped and tested on x86 and RISC-V. Okay?
>>
>> Michael.
>>
>> 2022-11-08 Michael Collison <collison@rivosinc.com>
>>
>> * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
>> -> (-(and (x , 0x1)) & z ) op y)
>>
>> 2022-11-08 Michael Collison <collison@rivosinc.com>
>>
>> * gcc.dg/tree-ssa/branchless-cond.c: New test.
>>
>> ---
>> gcc/match.pd | 22 ++++++++++++++++
>> .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++++++++++++++++++
>> 2 files changed, 48 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 194ba8f5188..722f517ac6d 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
>> (max @2 @1))
>>
>> +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z )
>> ^ y */
> Please write the match as a C expression in the comment, as present
> it's a weird mix. So x & 0x1 == 0 ? y : z <op> y -> (-(typeof(y))(x &
> 0x1) & z) <op> y
>
>> +(for op (bit_xor bit_ior)
>> + (simplify
>> + (cond (eq (bit_and @0 integer_onep@1)
>> + integer_zerop)
>> + @2
>> + (op:c @3 @2))
>> + (if (INTEGRAL_TYPE_P (type)
>> + && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
>> + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
> Since you are literally keeping (bit_and @0 @1) and not matching @0 with
> anything I suspect you could instead use
>
> (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ...
>
> eventually extending that to cover bit_and with one. Do you need to guard
> this against 'type' being a signed/unsigned 1-bit precision integer?
>
>> +
>> +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z )
>> ^ y */
>> +(for op (bit_xor bit_ior)
>> + (simplify
>> + (cond (ne (bit_and @0 integer_onep@1)
>> + integer_zerop)
>> + (op:c @3 @2)
>> + @2)
>> + (if (INTEGRAL_TYPE_P (type)
>> + && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
>> + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
>> +
>> /* Simplifications of shift and rotates. */
>>
>> (for rotate (lrotate rrotate)
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>> new file mode 100644
>> index 00000000000..68087ae6568
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +int f1(unsigned int x, unsigned int y, unsigned int z)
>> +{
>> + return ((x & 1) == 0) ? y : z ^ y;
>> +}
>> +
>> +int f2(unsigned int x, unsigned int y, unsigned int z)
>> +{
>> + return ((x & 1) != 0) ? z ^ y : y;
>> +}
>> +
>> +int f3(unsigned int x, unsigned int y, unsigned int z)
>> +{
>> + return ((x & 1) == 0) ? y : z | y;
>> +}
>> +
>> +int f4(unsigned int x, unsigned int y, unsigned int z)
>> +{
>> + return ((x & 1) != 0) ? z | y : y;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
>> --
>> 2.34.1
>>
>>
>>
>>
On Wed, Nov 9, 2022 at 10:06 PM Michael Collison <collison@rivosinc.com> wrote: > > Richard, > > Thanks for your feedback. I want to make sure I am following what you > are recommending. Are you suggesting changing: > > (for op (bit_xor bit_ior) > (simplify > (cond (eq (bit_and @0 integer_onep@1) > integer_zerop) > @2 > (op:c @3 @2)) > (if (INTEGRAL_TYPE_P (type) > && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > > > to > > (for op (bit_xor bit_ior) > (simplify > (cond (eq zero_one_valued_p@0 > integer_zerop) > @1 > (op:c @2 @1)) > (if (INTEGRAL_TYPE_P (type) > && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > (op (bit_and (negate (convert:type (bit_and @0 { build_one_cst > (type); }))) @2) @1)))) in the replacement you'd simply use (op (bit_and (negate (convert:type @0)) @2) @1))) that is, convert the [0,1] valued @0 to 'type' directly. At least I can't see how that can go wrong? > > > On 11/9/22 02:41, Richard Biener wrote: > > On Tue, Nov 8, 2022 at 9:02 PM Michael Collison <collison@rivosinc.com> wrote: > >> This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into > >> (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also > >> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , > >> 0x1)) & z ) op y. > >> > >> Matching this patterns allows GCC to generate branchless code for one of > >> the functions in coremark. > >> > >> Bootstrapped and tested on x86 and RISC-V. Okay? > >> > >> Michael. > >> > >> 2022-11-08 Michael Collison <collison@rivosinc.com> > >> > >> * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) ) > >> -> (-(and (x , 0x1)) & z ) op y) > >> > >> 2022-11-08 Michael Collison <collison@rivosinc.com> > >> > >> * gcc.dg/tree-ssa/branchless-cond.c: New test. > >> > >> --- > >> gcc/match.pd | 22 ++++++++++++++++ > >> .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++++++++++++++++++ > >> 2 files changed, 48 insertions(+) > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > >> > >> diff --git a/gcc/match.pd b/gcc/match.pd > >> index 194ba8f5188..722f517ac6d 100644 > >> --- a/gcc/match.pd > >> +++ b/gcc/match.pd > >> @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1) > >> (max @2 @1)) > >> > >> +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) > >> ^ y */ > > Please write the match as a C expression in the comment, as present > > it's a weird mix. So x & 0x1 == 0 ? y : z <op> y -> (-(typeof(y))(x & > > 0x1) & z) <op> y > > > >> +(for op (bit_xor bit_ior) > >> + (simplify > >> + (cond (eq (bit_and @0 integer_onep@1) > >> + integer_zerop) > >> + @2 > >> + (op:c @3 @2)) > >> + (if (INTEGRAL_TYPE_P (type) > >> + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > >> + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > > Since you are literally keeping (bit_and @0 @1) and not matching @0 with > > anything I suspect you could instead use > > > > (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ... > > > > eventually extending that to cover bit_and with one. Do you need to guard > > this against 'type' being a signed/unsigned 1-bit precision integer? > > > >> + > >> +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) > >> ^ y */ > >> +(for op (bit_xor bit_ior) > >> + (simplify > >> + (cond (ne (bit_and @0 integer_onep@1) > >> + integer_zerop) > >> + (op:c @3 @2) > >> + @2) > >> + (if (INTEGRAL_TYPE_P (type) > >> + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) > >> + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) > >> + > >> /* Simplifications of shift and rotates. */ > >> > >> (for rotate (lrotate rrotate) > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > >> new file mode 100644 > >> index 00000000000..68087ae6568 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c > >> @@ -0,0 +1,26 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fdump-tree-optimized" } */ > >> + > >> +int f1(unsigned int x, unsigned int y, unsigned int z) > >> +{ > >> + return ((x & 1) == 0) ? y : z ^ y; > >> +} > >> + > >> +int f2(unsigned int x, unsigned int y, unsigned int z) > >> +{ > >> + return ((x & 1) != 0) ? z ^ y : y; > >> +} > >> + > >> +int f3(unsigned int x, unsigned int y, unsigned int z) > >> +{ > >> + return ((x & 1) == 0) ? y : z | y; > >> +} > >> + > >> +int f4(unsigned int x, unsigned int y, unsigned int z) > >> +{ > >> + return ((x & 1) != 0) ? z | y : y; > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */ > >> +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */ > >> -- > >> 2.34.1 > >> > >> > >> > >>
On 11/8/22 13:15, Andrew Pinski via Gcc-patches wrote: > On Tue, Nov 8, 2022 at 12:02 PM Michael Collison <collison@rivosinc.com> wrote: >> This patches transforms (cond (and (x , 0x1) == 0), y, (z op y)) into >> (-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also >> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , >> 0x1)) & z ) op y. >> >> Matching this patterns allows GCC to generate branchless code for one of >> the functions in coremark. >> >> Bootstrapped and tested on x86 and RISC-V. Okay? > This seems like a (much) reduced (simplified?) version of > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html . > I have not had time for the last year to go through the comments on > that patch and resubmit it though. > It seems like you are aiming for one specific case in coremarks rather > than a more generic fix too. I'm fairly confident it was developed independently. Michael did this transformation for LLVM and reached out to me a month or two ago for suggestions on the GCC implementation. My recollection is I suggested phi-opt or match.pd with a slight preference for phi-opt as I wasn't offhand sure if we'd have a form suitable for match.pd. THe pattern is just a conditional xor/ior with all is said and done. While the inspiration comes from coremark, I don't think it's supposed to be specific to coremark. jeff
diff --git a/gcc/match.pd b/gcc/match.pd index 194ba8f5188..722f517ac6d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1) (max @2 @1)) +/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y */ +(for op (bit_xor bit_ior) + (simplify + (cond (eq (bit_and @0 integer_onep@1) + integer_zerop) + @2 + (op:c @3 @2)) + (if (INTEGRAL_TYPE_P (type) + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) + +/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) ^ y */ +(for op (bit_xor bit_ior) + (simplify + (cond (ne (bit_and @0 integer_onep@1) + integer_zerop) + (op:c @3 @2) + @2) + (if (INTEGRAL_TYPE_P (type) + && (INTEGRAL_TYPE_P (TREE_TYPE (@0)))) + (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2)))) + /* Simplifications of shift and rotates. */ (for rotate (lrotate rrotate) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c new file mode 100644 index 00000000000..68087ae6568 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int f1(unsigned int x, unsigned int y, unsigned int z) +{ + return ((x & 1) == 0) ? y : z ^ y; +} + +int f2(unsigned int x, unsigned int y, unsigned int z) +{ + return ((x & 1) != 0) ? z ^ y : y; +} + +int f3(unsigned int x, unsigned int y, unsigned int z) +{ + return ((x & 1) == 0) ? y : z | y; +} + +int f4(unsigned int x, unsigned int y, unsigned int z) +{ + return ((x & 1) != 0) ? z | y : y; +} + +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */ +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */