Message ID | CALHvHFX5=L5kqWE7mpiCTJ7OAKwM4CPqu-J57uMKikNDirbgsQ@mail.gmail.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 487893858410 for <patchwork@sourceware.org>; Wed, 5 Jan 2022 06:15:08 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 487893858410 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1641363308; bh=2wgVcxpy43ZvEUMC8LmmuQGPJ3p/RNXINgTS0pUOPyg=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=DzAyI82RgO6gdcG88Loa5LN5WpDR9FWqhGeMxPOJE2BCgssns0g5Tb6r0RyViqr+r v6pf7tQ0i5n2UFBAUtzzVcaNcOerF6RczFUn3vd2RGVfIjc8/zZx+F8YjVmK9Q5VS4 rK1TWlIAN2yQq6iJsIcLZrO8/hzXcp0UVYxjqRe8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-io1-xd2b.google.com (mail-io1-xd2b.google.com [IPv6:2607:f8b0:4864:20::d2b]) by sourceware.org (Postfix) with ESMTPS id 8818E3858C2C for <gcc-patches@gcc.gnu.org>; Wed, 5 Jan 2022 06:14:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8818E3858C2C Received: by mail-io1-xd2b.google.com with SMTP id p65so47242952iof.3 for <gcc-patches@gcc.gnu.org>; Tue, 04 Jan 2022 22:14:38 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=EHf3qLYmV/6412ZcV3qWfoUpscM6/7FsNLJr1Z3z/gU=; b=r0R9VDo63AoV2AoTdZuGBCqw67FMvtmNFn8tJc4oXR4aAgT7Z5gV5XDmCpGSWDR4pX Y/jVnBYnDJZ3nfSxftICbqAzl03/CpIf8cSfuYO/ydl9IECnMXyGPu28l3azeUkR+H04 xD9qEo+M134okadJ4wOXguNDVWJyAIFgMZHmLDW0bXhWCQ78CcxPZybxE728jDfTzA0V /hsK3cFjTYCDEkhWRLw8dpconbKyxePU8eqSz1xTOgpOFI7JaqoMg26yFoGfUiQ97Rx8 7Z4VJO7Wpts00ErY2WYumI0sAttuMAKg8GXXprLc91yMFjfQf3v2mTEPhlr8GkRDxbod /m6Q== X-Gm-Message-State: AOAM532xHRMEfS//rUFEQQ6+0c94tunBbur2Ot3pvWtJAKUKF2iIOc3r Er/uclsruTOlTCB/x+22NCjULjjLdiFqvImzp4Vmi7Dr+96kkzmE X-Google-Smtp-Source: ABdhPJxT94Zzv8kBeYuhYwW//6yDkuiPbTPrh6fMqtUsFetzQbW2FYL+Pxeg4OiYazh2ykGauOVbS533mVwFYZ45ROs= X-Received: by 2002:a05:6638:204c:: with SMTP id t12mr23623373jaj.169.1641363277612; Tue, 04 Jan 2022 22:14:37 -0800 (PST) MIME-Version: 1.0 Date: Wed, 5 Jan 2022 14:14:01 +0800 Message-ID: <CALHvHFX5=L5kqWE7mpiCTJ7OAKwM4CPqu-J57uMKikNDirbgsQ@mail.gmail.com> Subject: [PATCH] match.pd: Simplify 1 / X for integer X [PR95424] To: gcc-patches@gcc.gnu.org X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Zhao Wei Liew via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Zhao Wei Liew <zhaoweiliew@gmail.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
match.pd: Simplify 1 / X for integer X [PR95424]
|
|
Commit Message
Zhao Wei Liew
Jan. 5, 2022, 6:14 a.m. UTC
match.pd/95424: Simplify 1 / X for integer X This patch implements an optimization for the following C++ code: int f(int x) { return 1 / x; } int f(unsigned int x) { return 1 / x; } Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly: f(int): xor edx, edx mov eax, 1 idiv edi ret f(unsigned int): xor edx, edx mov eax, 1 div edi ret In comparison, clang++ -std=c++20 -O3 produces the following assembly: f(int): lea ecx, [rdi + 1] xor eax, eax cmp ecx, 3 cmovb eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret Clang's output is more efficient as it avoids expensive div operations. With this patch, GCC now produces the following assembly: f(int): lea eax, [rdi + 1] cmp eax, 2 mov eax, 0 cmovbe eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret which is virtualy identical to Clang's assembly output. Any slight differences in the output for f(int) is related to another possible missed optimization. gcc/ChangeLog: * match.pd: Simplify 1 / X where X is an integer. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/divide-6.c: New test. * gcc.dg/tree-ssa/divide-7.c: New test.
Comments
On Wed, Jan 5, 2022 at 7:15 AM Zhao Wei Liew via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > match.pd/95424: Simplify 1 / X for integer X > > This patch implements an optimization for the following C++ code: > > int f(int x) { > return 1 / x; > } > > int f(unsigned int x) { > return 1 / x; > } > > Before this patch, x86-64 gcc -std=c++20 -O3 produces the following > assembly: > > f(int): > xor edx, edx > mov eax, 1 > idiv edi > ret > f(unsigned int): > xor edx, edx > mov eax, 1 > div edi > ret > > In comparison, clang++ -std=c++20 -O3 produces the following assembly: > > f(int): > lea ecx, [rdi + 1] > xor eax, eax > cmp ecx, 3 > cmovb eax, edi > ret > f(unsigned int): > xor eax, eax > cmp edi, 1 > sete al > ret > > Clang's output is more efficient as it avoids expensive div operations. > > With this patch, GCC now produces the following assembly: > f(int): > lea eax, [rdi + 1] > cmp eax, 2 > mov eax, 0 > cmovbe eax, edi > ret > f(unsigned int): > xor eax, eax > cmp edi, 1 > sete al > ret > > which is virtualy identical to Clang's assembly output. Any slight > differences > in the output for f(int) is related to another possible missed optimization. > > gcc/ChangeLog: > > * match.pd: Simplify 1 / X where X is an integer. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/divide-6.c: New test. > * gcc.dg/tree-ssa/divide-7.c: New test. > > diff --git a/gcc/match.pd b/gcc/match.pd > index 84c9b918041..5edae1818bb 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (div:C @0 (negate @0)) > (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > && TYPE_OVERFLOW_UNDEFINED (type)) > - { build_minus_one_cst (type); }))) > + { build_minus_one_cst (type); })) > + > + /* 1 / X -> X == 1 for unsigned integer X > + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X > + But not for 1 / 0 so that we can get proper warnings and errors. */ > + (simplify > + (div integer_onep@0 @1) > + (switch > + (if (!integer_zerop (@1) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) you can use INTEGRAL_TYPE_P (type), the type of @0, @1 and the result ('type') are the same. > + && TYPE_UNSIGNED (TREE_TYPE (@1))) > + (eq @0 @1)) > + (if (!integer_zerop (@1) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > + && !TYPE_UNSIGNED (TREE_TYPE (@1))) Please refactor as (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1)) (if (TYPE_UNSIGNED (....)) (eq @0 @1) (cond (... > + (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); }) > + (le @1 { build_one_cst (integer_type_node); })) You should use build_[minus_]one_cst (type) to get -1/1 of the correct type here. X >= -1 && X <= 1 is (hopefully?) going to be simplified as (unsigned)X + 1 <= 2, it might be good to simplify it this way here as well? Finally I'm not sure whether 1 /[ceil] 2 isn't 1, similar -1 /[floor] 2 should be -1 so the transform looks to be correct only for TRUNC_DIV_EXPR, not all 'div'? Thanks, Richard. > + @1 { build_zero_cst (type); }))))) > > /* For unsigned integral types, FLOOR_DIV_EXPR is the same as > TRUNC_DIV_EXPR. Rewrite into the latter in this case. */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > new file mode 100644 > index 00000000000..a9fc4c04058 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-optimized" } */ > + > +unsigned int f(unsigned int x) { > + return 1 / x; > +} > + > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > new file mode 100644 > index 00000000000..285279af7c2 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-optimized" } */ > + > +int f(int x) { > + return 1 / x; > +} > + > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */
> X >= -1 && X <= 1 is (hopefully?) going to be simplified > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > here as well? Yup, GCC does simplify it that way in the end, so I didn't really bother to simplify it here. That said, I'm open to simplifying it here as well, but I'm not sure how to do the unsigned cast. Besides that, thanks for the rest of your suggestions! I'm testing the changes and will post a v2 soon. On Wed, 5 Jan 2022 at 16:18, Richard Biener <richard.guenther@gmail.com> wrote: > > On Wed, Jan 5, 2022 at 7:15 AM Zhao Wei Liew via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > match.pd/95424: Simplify 1 / X for integer X > > > > This patch implements an optimization for the following C++ code: > > > > int f(int x) { > > return 1 / x; > > } > > > > int f(unsigned int x) { > > return 1 / x; > > } > > > > Before this patch, x86-64 gcc -std=c++20 -O3 produces the following > > assembly: > > > > f(int): > > xor edx, edx > > mov eax, 1 > > idiv edi > > ret > > f(unsigned int): > > xor edx, edx > > mov eax, 1 > > div edi > > ret > > > > In comparison, clang++ -std=c++20 -O3 produces the following assembly: > > > > f(int): > > lea ecx, [rdi + 1] > > xor eax, eax > > cmp ecx, 3 > > cmovb eax, edi > > ret > > f(unsigned int): > > xor eax, eax > > cmp edi, 1 > > sete al > > ret > > > > Clang's output is more efficient as it avoids expensive div operations. > > > > With this patch, GCC now produces the following assembly: > > f(int): > > lea eax, [rdi + 1] > > cmp eax, 2 > > mov eax, 0 > > cmovbe eax, edi > > ret > > f(unsigned int): > > xor eax, eax > > cmp edi, 1 > > sete al > > ret > > > > which is virtualy identical to Clang's assembly output. Any slight > > differences > > in the output for f(int) is related to another possible missed optimization. > > > > gcc/ChangeLog: > > > > * match.pd: Simplify 1 / X where X is an integer. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/tree-ssa/divide-6.c: New test. > > * gcc.dg/tree-ssa/divide-7.c: New test. > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 84c9b918041..5edae1818bb 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (div:C @0 (negate @0)) > > (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > > && TYPE_OVERFLOW_UNDEFINED (type)) > > - { build_minus_one_cst (type); }))) > > + { build_minus_one_cst (type); })) > > + > > + /* 1 / X -> X == 1 for unsigned integer X > > + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X > > + But not for 1 / 0 so that we can get proper warnings and errors. */ > > + (simplify > > + (div integer_onep@0 @1) > > + (switch > > + (if (!integer_zerop (@1) > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > you can use INTEGRAL_TYPE_P (type), the type of @0, @1 and the > result ('type') are the same. > > > + && TYPE_UNSIGNED (TREE_TYPE (@1))) > > + (eq @0 @1)) > > + (if (!integer_zerop (@1) > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > + && !TYPE_UNSIGNED (TREE_TYPE (@1))) > > Please refactor as > > (if (INTEGRAL_TYPE_P (type) > && !integer_zerop (@1)) > (if (TYPE_UNSIGNED (....)) > (eq @0 @1) > (cond (... > > > + (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); }) > > + (le @1 { build_one_cst (integer_type_node); })) > > You should use build_[minus_]one_cst (type) to get -1/1 of the correct > type here. > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > here as well? > > Finally I'm not sure whether 1 /[ceil] 2 isn't 1, similar -1 /[floor] > 2 should be -1 > so the transform looks to be correct only for TRUNC_DIV_EXPR, not all 'div'? > > Thanks, > Richard. > > > + @1 { build_zero_cst (type); }))))) > > > > /* For unsigned integral types, FLOOR_DIV_EXPR is the same as > > TRUNC_DIV_EXPR. Rewrite into the latter in this case. */ > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > new file mode 100644 > > index 00000000000..a9fc4c04058 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > @@ -0,0 +1,9 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > + > > +unsigned int f(unsigned int x) { > > + return 1 / x; > > +} > > + > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > new file mode 100644 > > index 00000000000..285279af7c2 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > @@ -0,0 +1,9 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > + > > +int f(int x) { > > + return 1 / x; > > +} > > + > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */
On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweiliew@gmail.com> wrote: > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > here as well? > > Yup, GCC does simplify it that way in the end, so I didn't really bother to simplify it here. That said, I'm open to simplifying it here as well, but I'm not sure how to do the unsigned cast. You'd do sth like (with { tree utype = unsigned_type_for (type); } (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { build_int_cst (utype, 2); }) ...) extra tricky will be 1 bit integer types, I guess it might be easiest to exclude them and special case them separately - X / Y is always X for them I think, for signed 1 bit types X / Y is always undefined when X is not 0 (the division overflows or is by zero). Not sure if worth though ;) (might appear when bools end up divided by bools ...) > Besides that, thanks for the rest of your suggestions! I'm testing the changes and will post a v2 soon. > > On Wed, 5 Jan 2022 at 16:18, Richard Biener <richard.guenther@gmail.com> wrote: > > > > On Wed, Jan 5, 2022 at 7:15 AM Zhao Wei Liew via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > match.pd/95424: Simplify 1 / X for integer X > > > > > > This patch implements an optimization for the following C++ code: > > > > > > int f(int x) { > > > return 1 / x; > > > } > > > > > > int f(unsigned int x) { > > > return 1 / x; > > > } > > > > > > Before this patch, x86-64 gcc -std=c++20 -O3 produces the following > > > assembly: > > > > > > f(int): > > > xor edx, edx > > > mov eax, 1 > > > idiv edi > > > ret > > > f(unsigned int): > > > xor edx, edx > > > mov eax, 1 > > > div edi > > > ret > > > > > > In comparison, clang++ -std=c++20 -O3 produces the following assembly: > > > > > > f(int): > > > lea ecx, [rdi + 1] > > > xor eax, eax > > > cmp ecx, 3 > > > cmovb eax, edi > > > ret > > > f(unsigned int): > > > xor eax, eax > > > cmp edi, 1 > > > sete al > > > ret > > > > > > Clang's output is more efficient as it avoids expensive div operations. > > > > > > With this patch, GCC now produces the following assembly: > > > f(int): > > > lea eax, [rdi + 1] > > > cmp eax, 2 > > > mov eax, 0 > > > cmovbe eax, edi > > > ret > > > f(unsigned int): > > > xor eax, eax > > > cmp edi, 1 > > > sete al > > > ret > > > > > > which is virtualy identical to Clang's assembly output. Any slight > > > differences > > > in the output for f(int) is related to another possible missed optimization. > > > > > > gcc/ChangeLog: > > > > > > * match.pd: Simplify 1 / X where X is an integer. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/tree-ssa/divide-6.c: New test. > > > * gcc.dg/tree-ssa/divide-7.c: New test. > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index 84c9b918041..5edae1818bb 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > (div:C @0 (negate @0)) > > > (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > > > && TYPE_OVERFLOW_UNDEFINED (type)) > > > - { build_minus_one_cst (type); }))) > > > + { build_minus_one_cst (type); })) > > > + > > > + /* 1 / X -> X == 1 for unsigned integer X > > > + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X > > > + But not for 1 / 0 so that we can get proper warnings and errors. */ > > > + (simplify > > > + (div integer_onep@0 @1) > > > + (switch > > > + (if (!integer_zerop (@1) > > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > > > you can use INTEGRAL_TYPE_P (type), the type of @0, @1 and the > > result ('type') are the same. > > > > > + && TYPE_UNSIGNED (TREE_TYPE (@1))) > > > + (eq @0 @1)) > > > + (if (!integer_zerop (@1) > > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > > + && !TYPE_UNSIGNED (TREE_TYPE (@1))) > > > > Please refactor as > > > > (if (INTEGRAL_TYPE_P (type) > > && !integer_zerop (@1)) > > (if (TYPE_UNSIGNED (....)) > > (eq @0 @1) > > (cond (... > > > > > + (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); }) > > > + (le @1 { build_one_cst (integer_type_node); })) > > > > You should use build_[minus_]one_cst (type) to get -1/1 of the correct > > type here. > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > here as well? > > > > Finally I'm not sure whether 1 /[ceil] 2 isn't 1, similar -1 /[floor] > > 2 should be -1 > > so the transform looks to be correct only for TRUNC_DIV_EXPR, not all 'div'? > > > > Thanks, > > Richard. > > > > > + @1 { build_zero_cst (type); }))))) > > > > > > /* For unsigned integral types, FLOOR_DIV_EXPR is the same as > > > TRUNC_DIV_EXPR. Rewrite into the latter in this case. */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > new file mode 100644 > > > index 00000000000..a9fc4c04058 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > > + > > > +unsigned int f(unsigned int x) { > > > + return 1 / x; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > new file mode 100644 > > > index 00000000000..285279af7c2 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > > + > > > +int f(int x) { > > > + return 1 / x; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */
On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches wrote: > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweiliew@gmail.com> wrote: > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > here as well? > > > > Yup, GCC does simplify it that way in the end, so I didn't really bother to simplify it here. That said, I'm open to simplifying it here as well, but I'm not sure how to do the unsigned cast. > > You'd do sth like > (with > { tree utype = unsigned_type_for (type); } > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > build_int_cst (utype, 2); }) ...) > > extra tricky will be 1 bit integer types, I guess it might be easiest > to exclude them > and special case them separately - X / Y is always X for them I think, Note, we already have: /* X / bool_range_Y is X. */ (simplify (div @0 SSA_NAME@1) (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) @0)) for those. Jakub
On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches wrote: > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweiliew@gmail.com> wrote: > > > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > > here as well? > > > > > > Yup, GCC does simplify it that way in the end, so I didn't really bother to simplify it here. That said, I'm open to simplifying it here as well, but I'm not sure how to do the unsigned cast. > > > > You'd do sth like > > (with > > { tree utype = unsigned_type_for (type); } > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > > build_int_cst (utype, 2); }) ...) > > > > extra tricky will be 1 bit integer types, I guess it might be easiest > > to exclude them > > and special case them separately - X / Y is always X for them I think, > > Note, we already have: > /* X / bool_range_Y is X. */ > (simplify > (div @0 SSA_NAME@1) > (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > @0)) > for those. Ah, it might not handle the signed : 1 case though since -1 is not in the bool range. We could generalize the above though. Richard. > > Jakub >
On Wed, 5 Jan 2022 at 17:55, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches > wrote: > > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweiliew@gmail.com> > wrote: > > > > > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > > > here as well? > > > > > > > > Yup, GCC does simplify it that way in the end, so I didn't really > bother to simplify it here. That said, I'm open to simplifying it here as > well, but I'm not sure how to do the unsigned cast. > > > > > > You'd do sth like > > > (with > > > { tree utype = unsigned_type_for (type); } > > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > > > build_int_cst (utype, 2); }) ...) > > > > > > extra tricky will be 1 bit integer types, I guess it might be easiest > > > to exclude them > > > and special case them separately - X / Y is always X for them I think, > > > > Note, we already have: > > /* X / bool_range_Y is X. */ > > (simplify > > (div @0 SSA_NAME@1) > > (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > @0)) > > for those. > > Ah, it might not handle the signed : 1 case though since -1 is not in the > bool range. We could generalize the above though. > > Richard. > > > > > Jakub > > > Perhaps it is possible to exclude 1-bit cases and rely on other existing simplifications to deal with them? In particular, GCC seems to already optimally simplify the signed and unsigned 1-bit cases [1]. [1]: struct S { unsigned int x: 1; int y: 1; }; unsigned int fu(S s) { return 1 / s.x; } int fi(S s) { return 1 / s.y; } fu(S): mov eax, 1 ret fi(S): mov eax, -1 ret
On Thu, Jan 06, 2022 at 05:52:03PM +0800, Zhao Wei Liew wrote: > On Wed, 5 Jan 2022 at 17:55, Richard Biener <richard.guenther@gmail.com> > wrote: > > > On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches > > wrote: > > > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweiliew@gmail.com> > > wrote: > > > > > > > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > > > > here as well? > > > > > > > > > > Yup, GCC does simplify it that way in the end, so I didn't really > > bother to simplify it here. That said, I'm open to simplifying it here as > > well, but I'm not sure how to do the unsigned cast. > > > > > > > > You'd do sth like > > > > (with > > > > { tree utype = unsigned_type_for (type); } > > > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > > > > build_int_cst (utype, 2); }) ...) > > > > > > > > extra tricky will be 1 bit integer types, I guess it might be easiest > > > > to exclude them > > > > and special case them separately - X / Y is always X for them I think, > > > > > > Note, we already have: > > > /* X / bool_range_Y is X. */ > > > (simplify > > > (div @0 SSA_NAME@1) > > > (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > > @0)) > > > for those. > > > > Ah, it might not handle the signed : 1 case though since -1 is not in the > > bool range. We could generalize the above though. > > > > Richard. > > > > > > > > Jakub > > > > > > > Perhaps it is possible to exclude 1-bit cases and rely on other existing > simplifications to deal with them? > In particular, GCC seems to already optimally simplify the signed and > unsigned 1-bit cases [1]. Sure, I thought that was what Richi was suggesting, so ad some && TYPE_PRECISION (...) > 1 somewhere (after making sure it is INTEGRAL_TYPE_P), or element_precision if it is ANY_INTEGRAL_TYPE_P and could be vector type. The discussion was about whether we optimize even those : 1 cases sufficiently but that can be done in different simplifications... Jakub
diff --git a/gcc/match.pd b/gcc/match.pd index 84c9b918041..5edae1818bb 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (div:C @0 (negate @0)) (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED (type)) - { build_minus_one_cst (type); }))) + { build_minus_one_cst (type); })) + + /* 1 / X -> X == 1 for unsigned integer X + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X + But not for 1 / 0 so that we can get proper warnings and errors. */ + (simplify + (div integer_onep@0 @1) + (switch + (if (!integer_zerop (@1) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_UNSIGNED (TREE_TYPE (@1))) + (eq @0 @1)) + (if (!integer_zerop (@1) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && !TYPE_UNSIGNED (TREE_TYPE (@1))) + (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); }) + (le @1 { build_one_cst (integer_type_node); })) + @1 { build_zero_cst (type); }))))) /* For unsigned integral types, FLOOR_DIV_EXPR is the same as TRUNC_DIV_EXPR. Rewrite into the latter in this case. */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c new file mode 100644 index 00000000000..a9fc4c04058 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +unsigned int f(unsigned int x) { + return 1 / x; +} + +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c new file mode 100644 index 00000000000..285279af7c2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +int f(int x) { + return 1 / x; +} + +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */