match.pd, v2: Avoid introducing UB in the a r<< (32-b) -> a r>> b optimization [PR117927]
Checks

Context                                        | Check   | Description
-----------------------------------------------|---------|-------------
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm     | success | Build passed
Commit Message
On Wed, Jan 08, 2025 at 10:17:59AM +0100, Richard Biener wrote:
> > As mentioned in the PR, the a r<< (bitsize-b) to a r>> b and similar
> > match.pd optimization which was introduced in GCC 15 can introduce
> > UB that wasn't there before: in particular, if b is equal to bitsize
> > at runtime, then a r<< 0 is turned into a r>> bitsize.
> >
> > The following patch fixes it by optimizing it early only if VRP
> > tells us the count isn't equal to the bitsize, and late into
> > a r>> (b & (bitsize - 1)) if bitsize is a power of two and the
> > subtraction has a single use; on various targets the masking then
> > goes away because their rotate instructions do the masking already.
> > The latter can't be done too early though, because then the
> > expr_not_equal_to case is basically useless: we'd introduce the
> > masking always and couldn't find out anymore that there was
> > originally no masking.  Even a cfun->after_inlining check would be
> > too early, as there is a forwprop pass before vrp, so the patch
> > introduces a new PROP for the end of vrp2, after which there is
> > another forwprop.  Or do you have ideas about what other condition
> > could be used for late matching only?
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Maybe instead add PROP_last_full_fold and set that at the start of the
> last forwprop? I think in the end we may want to have a special
> pre-expand (at ISEL time?) folding and have extra patterns enabled
> there (and some disabled eventually), but I think we don't want to
> introduce this now.
>
> With the rest I agree.
So like this? Works in quick testing, ok for trunk if it passes full
bootstrap/regtest?
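For reference, here is a minimal C sketch of the hazard described above;
it mirrors the lrotate32 helper from the new testcase and is not itself
part of the patch:

  static inline unsigned
  lrotate32 (unsigned x, int t)
  {
    /* The usual rotate-left idiom; recognized at GIMPLE level as x r<< t.  */
    return (x << t) | (x >> (-t & 31));
  }

  unsigned
  f (unsigned x, int t)
  {
    /* Recognized as x r<< (32 - t).  The GCC 15 rule folded this to x r>> t
       unconditionally, so for t == 32 the well-defined x r<< 0 became
       x r>> 32, a rotate by the full bitsize that the source never had.  */
    return lrotate32 (x, 32 - t);
  }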
2025-01-08  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <quic_apinski@quicinc.com>

	PR tree-optimization/117927
	* tree-pass.h (PROP_last_full_fold): Define.
	* passes.def: Add last= parameters to pass_forwprop.
	* tree-ssa-forwprop.cc (pass_forwprop): Add last_p non-static
	data member and initialize it in the ctor.
	(pass_forwprop::set_pass_param): New method.
	(pass_forwprop::execute): Set PROP_last_full_fold in curr_properties
	at the start if last_p.
	* match.pd (a rrotate (32-b) -> a lrotate b): Only optimize either
	if @2 is known not to be equal to prec or if during/after the last
	forwprop the subtraction has a single use and prec is a power of
	two; in that case transform it into an orotate by the masked count.

	* gcc.dg/tree-ssa/pr117927.c: New test.
Jakub
Comments
On Wed, 8 Jan 2025, Jakub Jelinek wrote:
> So like this? Works in quick testing, ok for trunk if it passes full
> bootstrap/regtest?
LGTM.
Thanks,
Richard.
On Wed, Jan 8, 2025 at 3:08 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> So like this? Works in quick testing, ok for trunk if it passes full
> bootstrap/regtest?
>
> --- gcc/tree-pass.h.jj 2025-01-08 10:05:09.634203321 +0100
> +++ gcc/tree-pass.h 2025-01-08 11:22:56.356672128 +0100
> @@ -230,6 +230,7 @@ protected:
> #define PROP_assumptions_done (1 << 19) /* Assume function kept
> around. */
> #define PROP_gimple_lbitint (1 << 20) /* lowered large _BitInt */
> +#define PROP_last_full_fold (1 << 21) /* Start of last forwprop. */
>
> #define PROP_gimple \
> (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
> --- gcc/passes.def.jj 2025-01-02 11:23:21.976440640 +0100
> +++ gcc/passes.def 2025-01-08 11:24:47.865096546 +0100
> @@ -80,7 +80,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_ccp, false /* nonzero_p */);
> /* After CCP we rewrite no longer addressed locals into SSA
> form if possible. */
> - NEXT_PASS (pass_forwprop);
> + NEXT_PASS (pass_forwprop, /*last=*/false);
> NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> NEXT_PASS (pass_sra_early);
> /* pass_build_ealias is a dummy pass that ensures that we
> @@ -214,7 +214,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_complete_unrolli);
> NEXT_PASS (pass_backprop);
> NEXT_PASS (pass_phiprop);
> - NEXT_PASS (pass_forwprop);
> + NEXT_PASS (pass_forwprop, /*last=*/false);
> /* pass_build_alias is a dummy pass that ensures that we
> execute TODO_rebuild_alias at this point. */
> NEXT_PASS (pass_build_alias);
> @@ -254,7 +254,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_isolate_erroneous_paths);
> NEXT_PASS (pass_reassoc, true /* early_p */);
> NEXT_PASS (pass_dce);
> - NEXT_PASS (pass_forwprop);
> + NEXT_PASS (pass_forwprop, /*last=*/false);
> NEXT_PASS (pass_phiopt, false /* early_p */);
> NEXT_PASS (pass_ccp, true /* nonzero_p */);
> /* After CCP we rewrite no longer addressed locals into SSA
> @@ -356,7 +356,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /* remove_unused_locals */);
> /* After late DCE we rewrite no longer addressed locals into SSA
> form if possible. */
> - NEXT_PASS (pass_forwprop);
> + NEXT_PASS (pass_forwprop, /*last=*/true);
> NEXT_PASS (pass_sink_code, true /* unsplit edges */);
> NEXT_PASS (pass_phiopt, false /* early_p */);
> NEXT_PASS (pass_fold_builtins);
Since below you set the last_p field to false by default, you only
need to pass the parameter to the one instance where you need it
changed.  But both ways are ok.
Also there is a style difference here about whether the name of the
parameter goes before the value or after.  Both ways are currently
used, so this is something to clean up later on; I filed PR 118366
to remind myself to fix this once stage1 opens up.
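To illustrate, the two styles, both taken verbatim from passes.def as of
this patch:

  NEXT_PASS (pass_forwprop, /*last=*/false);    /* name before value */
  NEXT_PASS (pass_ccp, false /* nonzero_p */);  /* value before name */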
Otherwise LGTM.  Plus PROP_last_full_fold was something I had wanted
to do before but got distracted by other things.
Thanks,
Andrew
Patch

--- gcc/tree-pass.h.jj	2025-01-08 10:05:09.634203321 +0100
+++ gcc/tree-pass.h	2025-01-08 11:22:56.356672128 +0100
@@ -230,6 +230,7 @@ protected:
#define PROP_assumptions_done (1 << 19) /* Assume function kept
around. */
#define PROP_gimple_lbitint (1 << 20) /* lowered large _BitInt */
+#define PROP_last_full_fold (1 << 21) /* Start of last forwprop. */
#define PROP_gimple \
(PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
--- gcc/passes.def.jj	2025-01-02 11:23:21.976440640 +0100
+++ gcc/passes.def	2025-01-08 11:24:47.865096546 +0100
@@ -80,7 +80,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_ccp, false /* nonzero_p */);
/* After CCP we rewrite no longer addressed locals into SSA
form if possible. */
- NEXT_PASS (pass_forwprop);
+ NEXT_PASS (pass_forwprop, /*last=*/false);
NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
NEXT_PASS (pass_sra_early);
/* pass_build_ealias is a dummy pass that ensures that we
@@ -214,7 +214,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_complete_unrolli);
NEXT_PASS (pass_backprop);
NEXT_PASS (pass_phiprop);
- NEXT_PASS (pass_forwprop);
+ NEXT_PASS (pass_forwprop, /*last=*/false);
/* pass_build_alias is a dummy pass that ensures that we
execute TODO_rebuild_alias at this point. */
NEXT_PASS (pass_build_alias);
@@ -254,7 +254,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_isolate_erroneous_paths);
NEXT_PASS (pass_reassoc, true /* early_p */);
NEXT_PASS (pass_dce);
- NEXT_PASS (pass_forwprop);
+ NEXT_PASS (pass_forwprop, /*last=*/false);
NEXT_PASS (pass_phiopt, false /* early_p */);
NEXT_PASS (pass_ccp, true /* nonzero_p */);
/* After CCP we rewrite no longer addressed locals into SSA
@@ -356,7 +356,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /* remove_unused_locals */);
/* After late DCE we rewrite no longer addressed locals into SSA
form if possible. */
- NEXT_PASS (pass_forwprop);
+ NEXT_PASS (pass_forwprop, /*last=*/true);
NEXT_PASS (pass_sink_code, true /* unsplit edges */);
NEXT_PASS (pass_phiopt, false /* early_p */);
NEXT_PASS (pass_fold_builtins);
--- gcc/tree-ssa-forwprop.cc.jj	2025-01-03 17:55:01.067219567 +0100
+++ gcc/tree-ssa-forwprop.cc	2025-01-08 11:46:05.935096203 +0100
@@ -4107,14 +4107,22 @@ class pass_forwprop : public gimple_opt_
{
public:
pass_forwprop (gcc::context *ctxt)
- : gimple_opt_pass (pass_data_forwprop, ctxt)
+ : gimple_opt_pass (pass_data_forwprop, ctxt), last_p (false)
{}
/* opt_pass methods: */
opt_pass * clone () final override { return new pass_forwprop (m_ctxt); }
+ void set_pass_param (unsigned int n, bool param) final override
+ {
+ gcc_assert (n == 0);
+ last_p = param;
+ }
bool gate (function *) final override { return flag_tree_forwprop; }
unsigned int execute (function *) final override;
+ private:
+ /* Determines whether the pass instance should set PROP_last_full_fold. */
+ bool last_p;
}; // class pass_forwprop
unsigned int
@@ -4123,6 +4131,8 @@ pass_forwprop::execute (function *fun)
unsigned int todoflags = 0;
cfg_changed = false;
+ if (last_p)
+ fun->curr_properties |= PROP_last_full_fold;
calculate_dominance_info (CDI_DOMINATORS);
--- gcc/match.pd.jj	2025-01-08 10:05:24.515994525 +0100
+++ gcc/match.pd	2025-01-08 11:30:24.822335431 +0100
@@ -4950,14 +4950,32 @@ (define_operator_list SYNC_FETCH_AND_AND
build_int_cst (TREE_TYPE (@1),
element_precision (type)), @1); }))
-/* a rrotate (32-b) -> a lrotate b */
-/* a lrotate (32-b) -> a rrotate b */
+/* a rrotate (bitsize-b) -> a lrotate b */
+/* a lrotate (bitsize-b) -> a rrotate b */
+/* Only do the above when it is known that b != bitsize.
+ Otherwise canonicalize to a orotate (b & mask) when the subtraction
+ has single use and prec is a power of two. */
(for rotate (lrotate rrotate)
orotate (rrotate lrotate)
(simplify
- (rotate @0 (minus INTEGER_CST@1 @2))
- (if (element_precision (TREE_TYPE (@0)) == wi::to_wide (@1))
- (orotate @0 @2))))
+ (rotate @0 (minus@3 INTEGER_CST@1 @2))
+ (with { auto prec = element_precision (TREE_TYPE (@0)); }
+ (if (prec == wi::to_wide (@1))
+ (switch
+ (if (expr_not_equal_to (@2, wi::uhwi (prec,
+ TYPE_PRECISION (TREE_TYPE (@2)))))
+ (orotate @0 @2))
+ (if (single_use (@3)
+ && pow2p_hwi (prec)
+ /* Defer this case until last forwprop, so that VRP could be run and
+ expr_not_equal_to had a chance to match. Otherwise we'd do
+ pretty much always just the second case. */
+ && cfun
+ && ((cfun->curr_properties & PROP_last_full_fold) != 0
+ || !flag_tree_vrp
+ || optimize_debug))
+ (orotate @0
+ (bit_and @2 { build_int_cst (TREE_TYPE (@2), prec - 1); }))))))))
/* Turn (a OP c1) OP c2 into a OP (c1+c2). */
(for op (lrotate rrotate rshift lshift)
--- gcc/testsuite/gcc.dg/tree-ssa/pr117927.c.jj	2025-01-08 11:22:04.773400985 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr117927.c	2025-01-08 11:22:04.773400985 +0100
@@ -0,0 +1,97 @@
+/* PR tree-optimization/117927 */
+/* { dg-do compile { target { int32 && longlong64 } } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " r<< " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " r>> " 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & 31;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & 63;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32 - " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64 - " "optimized" } } */
+
+static inline
+unsigned lrotate32 (unsigned x, int t)
+{
+ unsigned tl = x << t;
+ unsigned th = x >> (-t & 31);
+ return tl | th;
+}
+
+static inline
+unsigned rrotate32 (unsigned x, int t)
+{
+ unsigned tl = x >> t;
+ unsigned th = x << (-t & 31);
+ return tl | th;
+}
+
+static inline
+unsigned long long lrotate64 (unsigned long long x, int t)
+{
+ unsigned long long tl = x << t;
+ unsigned long long th = x >> (-t & 63);
+ return tl | th;
+}
+
+static inline
+unsigned long long rrotate64 (unsigned long long x, int t)
+{
+ unsigned long long tl = x >> t;
+ unsigned long long th = x << (-t & 63);
+ return tl | th;
+}
+
+unsigned
+f1 (unsigned x, int t)
+{
+ return lrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f2 (unsigned long long x, int t)
+{
+ return lrotate64 (x, 64 - t);
+}
+
+unsigned
+f3 (unsigned x, int t)
+{
+ if (t == 32)
+ __builtin_unreachable ();
+ return lrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f4 (unsigned long long x, int t)
+{
+ if (t == 64)
+ __builtin_unreachable ();
+ return lrotate64 (x, 64 - t);
+}
+
+unsigned
+f5 (unsigned x, int t)
+{
+ return rrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f6 (unsigned long long x, int t)
+{
+ return rrotate64 (x, 64 - t);
+}
+
+unsigned
+f7 (unsigned x, int t)
+{
+ if (t == 32)
+ __builtin_unreachable ();
+ return rrotate32 (x, 32 - t);
+}
+
+unsigned long long
+f8 (unsigned long long x, int t)
+{
+ if (t == 64)
+ __builtin_unreachable ();
+ return rrotate64 (x, 64 - t);
+}
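To tie the test's scan directives to the new match.pd pattern: in f1, where
nothing rules out t == 32, the late (PROP_last_full_fold) case is expected
to emit a masked rotate, roughly (a hand-written expectation, not dump
output from an actual run):

  _1 = t_2(D) & 31;
  _3 = x_4(D) r>> _1;

whereas in f3 the __builtin_unreachable lets VRP prove t != 32, so the
expr_not_equal_to case fires earlier and the count stays unmasked:

  _1 = x_2(D) r>> t_3(D);

That accounts for the four " r<< " and four " r>> " matches (f1-f4 become
right rotates, f5-f8 left rotates) and for the two " & 31;" plus two
" & 63;" matches (only f1/f5 and f2/f6 need the mask).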