[match.pd] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2)
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Hi Richard,
Many thanks for the review and recommendation to use nop_convert?.
This revised patch implements that suggestion, which required a little
experimentation/tweaking as ranger/EVRP records the ranges on the
useless type conversions rather than the multiplications.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2024-07-14 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplicaitions, such as those inserted
by this transformation.
gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.
Thanks again,
Roger
--
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 10 July 2024 12:34
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> MULT_EXPR recognition.
>
> On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch resolves PR tree-optimization/114661, by generalizing the
> > set of expressions that we canonicalize to multiplication. This
> > extends the
> > optimization(s) contributed (by me) back in July 2021.
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> >
> > The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> > allowed. A subtlety is that for non-wrapping integer types, we
> > actually fold this into (int)((unsigned)X*C3) so that we don't
> > introduce an undefined overflow that wasn't in the original.
> > Unfortunately, this transformation confuses itself, as the type-safe
> > multiplication isn't recognized when further combining bit operations.
> > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > construct multiplications.
> >
> > For the example given in the PR:
> >
> > unsigned mul(unsigned char c) {
> > if (c > 3) __builtin_unreachable();
> > return c << 18 | c << 15 |
> > c << 12 | c << 9 |
> > c << 6 | c << 3 | c;
> > }
> >
> > GCC on x86_64 with -O2 previously generated:
> >
> > mul: movzbl %dil, %edi
> > leal (%rdi,%rdi,8), %edx
> > leal 0(,%rdx,8), %eax
> > movl %edx, %ecx
> > sall $15, %edx
> > orl %edi, %eax
> > sall $9, %ecx
> > orl %ecx, %eax
> > orl %edx, %eax
> > ret
> >
> > with this patch we now generate:
> >
> > mul: movzbl %dil, %eax
> > imull $299593, %eax, %eax
> > ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
>
> I'm looking at the difference between the existing
>
> (simplify
> (op:c (mult:s@0 @1 INTEGER_CST@2)
> (lshift:s@3 @1 INTEGER_CST@4))
> (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
> && tree_int_cst_sgn (@4) > 0
> && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
> (with { wide_int wone = wi::one (TYPE_PRECISION (type));
> wide_int c = wi::add (wi::to_wide (@2),
> wi::lshift (wone, wi::to_wide (@4))); }
> (mult @1 { wide_int_to_tree (type, c); }))))
>
> and
>
> + (simplify
> + (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
> + (lshift:s@4 @2 INTEGER_CST@5))
> + (if (INTEGRAL_TYPE_P (type)
> + && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> + && TREE_TYPE (@2) == type
> + && TYPE_UNSIGNED (TREE_TYPE (@1))
> + && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
> + && tree_int_cst_sgn (@5) > 0
> + && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> + (with { tree t = TREE_TYPE (@1);
> + wide_int wone = wi::one (TYPE_PRECISION (t));
> + wide_int c = wi::add (wi::to_wide (@3),
> + wi::lshift (wone, wi::to_wide (@5))); }
> + (convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })))))
>
> and wonder whether wrapping of the multiplication is required for correctness,
> specifically the former seems to allow signed types with -fwrapv while the latter
> won't. It also looks the patterns could be merged doing
>
> (simplify
> (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
> (lshift:s@3 @1 INTEGER_CST@4))
>
> and by using nop_convert instead of convert simplify the condition?
>
> Richard.
>
> >
> > 2024-07-09 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > PR tree-optimization/114661
> > * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> > multiplications surrounded by casts to an unsigned type and back
> > such as those generated by these transformations.
> >
> > gcc/testsuite/ChangeLog
> > PR tree-optimization/114661
> > * gcc.dg/pr114661.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
Comments
On Sun, Jul 14, 2024 at 3:08 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
> Many thanks for the review and recommendation to use nop_convert?.
> This revised patch implements that suggestion, which required a little
> experimentation/tweaking as ranger/EVRP records the ranges on the
> useless type conversions rather than the multiplications.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
OK.
Thanks,
Richard.
>
> 2024-07-14 Roger Sayle <roger@nextmovesoftware.com>
> Richard Biener <rguenther@suse.de>
>
> gcc/ChangeLog
> PR tree-optimization/114661
> * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
> type conversions around multiplicaitions, such as those inserted
> by this transformation.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/114661
> * gcc.dg/pr114661.c: New test case.
>
>
> Thanks again,
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 10 July 2024 12:34
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> > MULT_EXPR recognition.
> >
> > On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch resolves PR tree-optimization/114661, by generalizing the
> > > set of expressions that we canonicalize to multiplication. This
> > > extends the
> > > optimization(s) contributed (by me) back in July 2021.
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> > >
> > > The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> > > allowed. A subtlety is that for non-wrapping integer types, we
> > > actually fold this into (int)((unsigned)X*C3) so that we don't
> > > introduce an undefined overflow that wasn't in the original.
> > > Unfortunately, this transformation confuses itself, as the type-safe
> > > multiplication isn't recognized when further combining bit operations.
> > > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> > > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > > construct multiplications.
> > >
> > > For the example given in the PR:
> > >
> > > unsigned mul(unsigned char c) {
> > > if (c > 3) __builtin_unreachable();
> > > return c << 18 | c << 15 |
> > > c << 12 | c << 9 |
> > > c << 6 | c << 3 | c;
> > > }
> > >
> > > GCC on x86_64 with -O2 previously generated:
> > >
> > > mul: movzbl %dil, %edi
> > > leal (%rdi,%rdi,8), %edx
> > > leal 0(,%rdx,8), %eax
> > > movl %edx, %ecx
> > > sall $15, %edx
> > > orl %edi, %eax
> > > sall $9, %ecx
> > > orl %ecx, %eax
> > > orl %edx, %eax
> > > ret
> > >
> > > with this patch we now generate:
> > >
> > > mul: movzbl %dil, %eax
> > > imull $299593, %eax, %eax
> > > ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures. Ok for mainline?
> >
> > I'm looking at the difference between the existing
> >
> > (simplify
> > (op:c (mult:s@0 @1 INTEGER_CST@2)
> > (lshift:s@3 @1 INTEGER_CST@4))
> > (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
> > && tree_int_cst_sgn (@4) > 0
> > && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
> > (with { wide_int wone = wi::one (TYPE_PRECISION (type));
> > wide_int c = wi::add (wi::to_wide (@2),
> > wi::lshift (wone, wi::to_wide (@4))); }
> > (mult @1 { wide_int_to_tree (type, c); }))))
> >
> > and
> >
> > + (simplify
> > + (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
> > + (lshift:s@4 @2 INTEGER_CST@5))
> > + (if (INTEGRAL_TYPE_P (type)
> > + && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > + && TREE_TYPE (@2) == type
> > + && TYPE_UNSIGNED (TREE_TYPE (@1))
> > + && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
> > + && tree_int_cst_sgn (@5) > 0
> > + && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> > + (with { tree t = TREE_TYPE (@1);
> > + wide_int wone = wi::one (TYPE_PRECISION (t));
> > + wide_int c = wi::add (wi::to_wide (@3),
> > + wi::lshift (wone, wi::to_wide (@5))); }
> > + (convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })))))
> >
> > and wonder whether wrapping of the multiplication is required for correctness,
> > specifically the former seems to allow signed types with -fwrapv while the latter
> > won't. It also looks the patterns could be merged doing
> >
> > (simplify
> > (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
> > (lshift:s@3 @1 INTEGER_CST@4))
> >
> > and by using nop_convert instead of convert simplify the condition?
> >
> > Richard.
> >
> > >
> > > 2024-07-09 Roger Sayle <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > > PR tree-optimization/114661
> > > * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> > > multiplications surrounded by casts to an unsigned type and back
> > > such as those generated by these transformations.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR tree-optimization/114661
> > > * gcc.dg/pr114661.c: New test case.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
@@ -4156,30 +4156,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
Likewise, handle (X<<C3) and X as legitimate variants of X*C. */
(for op (bit_ior bit_xor)
(simplify
- (op (mult:s@0 @1 INTEGER_CST@2)
- (mult:s@3 @1 INTEGER_CST@4))
- (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
- && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
- (mult @1
- { wide_int_to_tree (type, wi::to_wide (@2) + wi::to_wide (@4)); })))
+ (op (nop_convert?:s@6 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
+ (nop_convert?:s@7 (mult:s@3 (nop_convert? @5) INTEGER_CST@4)))
+ (if (INTEGRAL_TYPE_P (type)
+ && operand_equal_p (@1, @5, 0)
+ && (tree_nonzero_bits (@6) & tree_nonzero_bits (@7)) == 0)
+ (with { tree t = type;
+ if (!TYPE_OVERFLOW_WRAPS (t))
+ t = unsigned_type_for (t);
+ wide_int c = wi::add (wi::to_wide (@2), wi::to_wide (@4)); }
+ (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
(simplify
- (op:c (mult:s@0 @1 INTEGER_CST@2)
+ (op:c (nop_convert?:s@5 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
(lshift:s@3 @1 INTEGER_CST@4))
- (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
+ (if (INTEGRAL_TYPE_P (type)
&& tree_int_cst_sgn (@4) > 0
- && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
- (with { wide_int wone = wi::one (TYPE_PRECISION (type));
+ && (tree_nonzero_bits (@5) & tree_nonzero_bits (@3)) == 0)
+ (with { tree t = type;
+ if (!TYPE_OVERFLOW_WRAPS (t))
+ t = unsigned_type_for (t);
+ wide_int wone = wi::one (TYPE_PRECISION (type));
wide_int c = wi::add (wi::to_wide (@2),
wi::lshift (wone, wi::to_wide (@4))); }
- (mult @1 { wide_int_to_tree (type, c); }))))
+ (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
(simplify
- (op:c (mult:s@0 @1 INTEGER_CST@2)
+ (op:c (nop_convert?:s@3 (mult:s@0 (nop_convert? @1) INTEGER_CST@2))
@1)
- (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
- && (tree_nonzero_bits (@0) & tree_nonzero_bits (@1)) == 0)
- (mult @1
- { wide_int_to_tree (type,
- wi::add (wi::to_wide (@2), 1)); })))
+ (if (INTEGRAL_TYPE_P (type)
+ && (tree_nonzero_bits (@3) & tree_nonzero_bits (@1)) == 0)
+ (with { tree t = type;
+ if (!TYPE_OVERFLOW_WRAPS (t))
+ t = unsigned_type_for (t);
+ wide_int c = wi::add (wi::to_wide (@2), 1); }
+ (convert (mult:t (convert:t @1) { wide_int_to_tree (t, c); })))))
(simplify
(op (lshift:s@0 @1 INTEGER_CST@2)
(lshift:s@3 @1 INTEGER_CST@4))
new file mode 100644
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+unsigned mul(unsigned char c) {
+ if (c > 3) __builtin_unreachable();
+ return c << 18 | c << 15 |
+ c << 12 | c << 9 |
+ c << 6 | c << 3 | c;
+}
+/* { dg-final { scan-tree-dump-times " \\* 299593" 1 "evrp" } } */