combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

Message ID 20240116221914.267015-1-gkm@rivosinc.com
State New
Headers
Series combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm fail Testing failed

Commit Message

Greg McGary Jan. 16, 2024, 10:19 p.m. UTC
  The sign bit of a sign-extending load cannot be known until runtime,
so don't attempt to simplify it in the combiner.

2024-01-11  Greg McGary  <gkm@rivosinc.com>

        PR rtl-optimization/113010
        * combine.cc (expand_compound_operation): Don't simplify
	SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets

        * gcc.c-torture/execute/pr113010.c: New test.
---
 gcc/combine.cc                                 | 5 +++++
 gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +++++++++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c
  

Comments

Richard Biener Jan. 17, 2024, 6:44 a.m. UTC | #1
On Tue, Jan 16, 2024 at 11:20 PM Greg McGary <gkm@rivosinc.com> wrote:
>
> The sign bit of a sign-extending load cannot be known until runtime,
> so don't attempt to simplify it in the combiner.

It feels like this papers over an issue downstream?

> 2024-01-11  Greg McGary  <gkm@rivosinc.com>
>
>         PR rtl-optimization/113010
>         * combine.cc (expand_compound_operation): Don't simplify
>         SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets
>
>         * gcc.c-torture/execute/pr113010.c: New test.
> ---
>  gcc/combine.cc                                 | 5 +++++
>  gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +++++++++
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 812553c091e..ba587184dfc 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -7208,6 +7208,11 @@ expand_compound_operation (rtx x)
>        if (len == 0)
>         return x;
>
> +      /* Sign-extending loads can never be simplified at compile time.  */
> +      if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
> +         && load_extend_op (inner_mode) == SIGN_EXTEND)
> +       return x;
> +
>        break;
>
>      case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> new file mode 100644
> index 00000000000..a95c613c1df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> @@ -0,0 +1,9 @@
> +int minus_1 = -1;
> +
> +int
> +main ()
> +{
> +  if ((0, 0xfffffffful) >= minus_1)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.34.1
>
  
YunQiang Su Jan. 17, 2024, 7:56 a.m. UTC | #2
Greg McGary <gkm@rivosinc.com> 于2024年1月17日周三 06:20写道:
>
> The sign bit of a sign-extending load cannot be known until runtime,
> so don't attempt to simplify it in the combiner.
>
> 2024-01-11  Greg McGary  <gkm@rivosinc.com>
>
>         PR rtl-optimization/113010
>         * combine.cc (expand_compound_operation): Don't simplify
>         SIGN_EXTEND of a MEM on WORD_REGISTER_OPERATIONS targets
>
>         * gcc.c-torture/execute/pr113010.c: New test.
> ---
>  gcc/combine.cc                                 | 5 +++++
>  gcc/testsuite/gcc.c-torture/execute/pr113010.c | 9 +++++++++
>  2 files changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 812553c091e..ba587184dfc 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -7208,6 +7208,11 @@ expand_compound_operation (rtx x)
>        if (len == 0)
>         return x;
>
> +      /* Sign-extending loads can never be simplified at compile time.  */
> +      if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
> +         && load_extend_op (inner_mode) == SIGN_EXTEND)
> +       return x;
> +
>        break;
>
>      case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> new file mode 100644
> index 00000000000..a95c613c1df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
> @@ -0,0 +1,9 @@
> +int minus_1 = -1;
> +
> +int
> +main ()
> +{
> +  if ((0, 0xfffffffful) >= minus_1)

There is a warning option:

-Wsign-compare
Warn when a comparison between signed and unsigned values could
produce an incorrect result when the signed value is converted to unsigned.

> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.34.1
>
  
Greg McGary Jan. 18, 2024, 3:53 a.m. UTC | #3
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener <richard.guenther@gmail.com>
wrote:

> > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary <gkm@rivosinc.com> wrote:

> > >

> > > The sign bit of a sign-extending load cannot be known until runtime,

> > > so don't attempt to simplify it in the combiner.

> >
> It feels like this papers over an issue downstream?

While the code comment is true, perhaps it obscures the primary intent,
which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
to expand into a single memory-load instruction and no simplification is
possible, so why waste time with further analysis or transformation? There
are plenty of other conditions that also short circuit to "do nothing" and
this seems just as straightforward as those others. Efforts to catch this
further downstream add gratuitous complexity.

G
  
Jeff Law Jan. 18, 2024, 4:24 p.m. UTC | #4
On 1/17/24 20:53, Greg McGary wrote:
> On Tue, Jan 16, 2024 at 11:44 PM Richard Biener 
> <richard.guenther@gmail.com <mailto:richard.guenther@gmail.com>> wrote:
> 
>  > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary <gkm@rivosinc.com 
> <mailto:gkm@rivosinc.com>> wrote:
> 
>  > >
> 
>  > > The sign bit of a sign-extending load cannot be known until runtime,
> 
>  > > so don't attempt to simplify it in the combiner.
> 
>  >
>  > It feels like this papers over an issue downstream?
> 
> While the code comment is true, perhaps it obscures the primary intent,
> which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
> to expand into a single memory-load instruction and no simplification is
> possible, so why waste time with further analysis or transformation? There
> are plenty of other conditions that also short circuit to "do nothing" and
> this seems just as straightforward as those others. Efforts to catch this
> further downstream add gratuitous complexity.
Because the real bug is likely still lurking, waiting for something else 
to trigger it.

An early exit is fine when we're just trying to avoid unnecessary work, 
but there's something else going on here we need to understand first.

jeff
  
Greg McGary Feb. 2, 2024, 1:24 a.m. UTC | #5
On 1/18/24 9:24 AM, Jeff Law wrote:
>
> On 1/17/24 20:53, Greg McGary wrote:
>>
>> While the code comment is true, perhaps it obscures the primary intent,
>> which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is 
>> destined
>> to expand into a single memory-load instruction and no simplification is
>> possible, so why waste time with further analysis or transformation? 
>> There
>> are plenty of other conditions that also short circuit to "do 
>> nothing" and
>> this seems just as straightforward as those others. Efforts to catch 
>> this
>> further downstream add gratuitous complexity.
> Because the real bug is likely still lurking, waiting for something 
> else to trigger it.
>
> An early exit is fine when we're just trying to avoid unnecessary 
> work, but there's something else going on here we need to understand 
> first.


expand_compound_operation() wants to evaluate the sign- and 
zero-extension of MEM. It begins by calling gen_lowpart(), which returns 
the same result in both cases: a paradoxical subreg of a MEM (PSoM). 
What is the value of the high part of a PSoM? Can that high part be 
evaluated at compile time?

According to existing code, the high part of a PSoM is statically 0. 
However, for a machine where (WORD_REGISTER_OPERATIONS && load_extend_op 
(inner_mode) == SIGN_EXTEND), the high part of a PSoM is  only known at 
runtime as 0s or 1s. That's the downstream bug. The fix for such 
machines is either (A) forbid static evaluation of the high part of a 
PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM ...) ) into a PSoM. 
My patch does B. Perhaps you prefer A? The trouble with A is that in the 
zero-extend case, it is valid to statically evaluate as 0. It is only 
the sign-extend case that isn't known until runtime. By the time we have 
a PSoM, its upstream ancestor as sign- or zero-extend is already lost.

Does that give you the understanding you desire, or are there deeper 
mysteries to probe?

G
  
Jeff Law Feb. 2, 2024, 5:24 a.m. UTC | #6
On 2/1/24 18:24, Greg McGary wrote:
> On 1/18/24 9:24 AM, Jeff Law wrote:
>>
>> On 1/17/24 20:53, Greg McGary wrote:
>>>
>>> While the code comment is true, perhaps it obscures the primary intent,
>>> which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is 
>>> destined
>>> to expand into a single memory-load instruction and no simplification is
>>> possible, so why waste time with further analysis or transformation? 
>>> There
>>> are plenty of other conditions that also short circuit to "do 
>>> nothing" and
>>> this seems just as straightforward as those others. Efforts to catch 
>>> this
>>> further downstream add gratuitous complexity.
>> Because the real bug is likely still lurking, waiting for something 
>> else to trigger it.
>>
>> An early exit is fine when we're just trying to avoid unnecessary 
>> work, but there's something else going on here we need to understand 
>> first.
> 
> 
> expand_compound_operation() wants to evaluate the sign- and 
> zero-extension of MEM. It begins by calling gen_lowpart(), which returns 
> the same result in both cases: a paradoxical subreg of a MEM (PSoM). 
> What is the value of the high part of a PSoM? Can that high part be 
> evaluated at compile time?
Potentially, yes.   If LOAD_EXTEND_OP returns ZERO_EXTEND, then we know 
at compile time those bits are zero.

I could come up with ways to optimize the SIGN_EXTEND case as well, but 
that would require some state tracking and it doesn't immediately look 
like expand_compound_operation or its children use any of the state 
tracking that's available in combine.  So for the sake of this problem, 
let's consider the SIGN_EXTEND case as not computable at compile-time in 
expand_compound_operation.



> However, for a machine where (WORD_REGISTER_OPERATIONS && load_extend_op 
> (inner_mode) == SIGN_EXTEND), the high part of a PSoM is  only known at 
> runtime as 0s or 1s. That's the downstream bug. The fix for such 
> machines is either (A) forbid static evaluation of the high part of a 
> PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM ...) ) into a PSoM. 
> My patch does B. Perhaps you prefer A? The trouble with A is that in the 
> zero-extend case, it is valid to statically evaluate as 0. It is only 
> the sign-extend case that isn't known until runtime. By the time we have 
> a PSoM, its upstream ancestor as sign- or zero-extend is already lost.
> 
> Does that give you the understanding you desire, or are there deeper 
> mysteries to probe?
It's a good start and I can see what you're trying to do -- and it may 
in fact be correct -- the quick discussion with Palmer Tuesday and your 
follow-up have helped a lot).

But just to be sure, what's the incoming rtl at function entry?  just 
"debug_rtx (x)" should be sufficient.

jeff
  
Greg McGary Feb. 2, 2024, 10:48 p.m. UTC | #7
On 2/1/24 10:24 PM, Jeff Law wrote:
>
> On 2/1/24 18:24, Greg McGary wrote:
>
>> However, for a machine where (WORD_REGISTER_OPERATIONS && 
>> load_extend_op (inner_mode) == SIGN_EXTEND), the high part of a PSoM 
>> is  only known at runtime as 0s or 1s. That's the downstream bug. The 
>> fix for such machines is either (A) forbid static evaluation of the 
>> high part of a PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM 
>> ...) ) into a PSoM. My patch does B. Perhaps you prefer A? The 
>> trouble with A is that in the zero-extend case, it is valid to 
>> statically evaluate as 0. It is only the sign-extend case that isn't 
>> known until runtime. By the time we have a PSoM, its upstream 
>> ancestor as sign- or zero-extend is already lost.
>>
>> Does that give you the understanding you desire, or are there deeper 
>> mysteries to probe?
> It's a good start and I can see what you're trying to do -- and it may 
> in fact be correct -- the quick discussion with Palmer Tuesday and 
> your follow-up have helped a lot).
>
> But just to be sure, what's the incoming rtl at function entry? just 
> "debug_rtx (x)" should be sufficient.

input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
<var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]))

result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
<var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]) 0)

Later, the high part of the PSoM statically evaluates to 0, the code to 
load and test is elided, and the incorrect alternative is emitted 
unconditionally.

G
  
Jeff Law Feb. 5, 2024, 4:58 a.m. UTC | #8
On 2/2/24 15:48, Greg McGary wrote:
> On 2/1/24 10:24 PM, Jeff Law wrote:
>>
>> On 2/1/24 18:24, Greg McGary wrote:
>>
>>> However, for a machine where (WORD_REGISTER_OPERATIONS && 
>>> load_extend_op (inner_mode) == SIGN_EXTEND), the high part of a PSoM 
>>> is  only known at runtime as 0s or 1s. That's the downstream bug. The 
>>> fix for such machines is either (A) forbid static evaluation of the 
>>> high part of a PSoM, or (B) forbid transforming (SIGN_EXTEND (MEM 
>>> ...) ) into a PSoM. My patch does B. Perhaps you prefer A? The 
>>> trouble with A is that in the zero-extend case, it is valid to 
>>> statically evaluate as 0. It is only the sign-extend case that isn't 
>>> known until runtime. By the time we have a PSoM, its upstream 
>>> ancestor as sign- or zero-extend is already lost.
>>>
>>> Does that give you the understanding you desire, or are there deeper 
>>> mysteries to probe?
>> It's a good start and I can see what you're trying to do -- and it may 
>> in fact be correct -- the quick discussion with Palmer Tuesday and 
>> your follow-up have helped a lot).
>>
>> But just to be sure, what's the incoming rtl at function entry? just 
>> "debug_rtx (x)" should be sufficient.
> 
> input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
> <var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]))
> 
> result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
> <var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]) 0)
> 
> Later, the high part of the PSoM statically evaluates to 0, the code to 
> load and test is elided, and the incorrect alternative is emitted 
> unconditionally.
So I think we need to know where that high part gets statically turned 
into a zero.

I'm not happy with the sign_extend->subreg transformation as we 
generally want to avoid (subreg (mem)) for various reasons.  So we'll 
probably want to do something like your patch as well.   But let's chase 
down the static evaluation of the high part to zero first -- that's 
clearly wrong given the defined semantics of (subreg (mem)) in the 
presence of LOAD_EXTEND_OP.

jeff
  
Greg McGary Feb. 8, 2024, 5:36 a.m. UTC | #9
On 2/4/24 9:58 PM, Jeff Law wrote:
>
> On 2/2/24 15:48, Greg McGary wrote:
>>
>> input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 
>> 0x86] <var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]))
>>
>> result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] 
>> <var_decl 0x7ffff761cb40 minus_1>) [1 minus_1+0 S4 A32]) 0)
>>
>> Later, the high part of the PSoM statically evaluates to 0, the code 
>> to load and test is elided, and the incorrect alternative is emitted 
>> unconditionally.
> So I think we need to know where that high part gets statically turned 
> into a zero.
>
> I'm not happy with the sign_extend->subreg transformation as we 
> generally want to avoid (subreg (mem)) for various reasons.  So we'll 
> probably want to do something like your patch as well.   But let's 
> chase down the static evaluation of the high part to zero first -- 
> that's clearly wrong given the defined semantics of (subreg (mem)) in 
> the presence of LOAD_EXTEND_OP.


This prevents the buggy evaluation. What think ye?

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..736206242e1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12550,9 +12550,15 @@ simplify_comparison (enum rtx_code code, rtx 
*pop0, rtx *pop1)
             }

           /* If the inner mode is narrower and we are extracting the 
low part,
-            we can treat the SUBREG as if it were a ZERO_EXTEND. */
+            we can treat the SUBREG as if it were a ZERO_EXTEND ...  */
           if (paradoxical_subreg_p (op0))
-           ;
+           {
+             /* ... except we can't treat as ZERO_EXTEND when a machine
+                automatically sign-extends loads. */
+             if (MEM_P (SUBREG_REG (op0)) && WORD_REGISTER_OPERATIONS
+                 && load_extend_op (inner_mode) == SIGN_EXTEND)
+               break;
+           }
           else if (subreg_lowpart_p (op0)
                    && GET_MODE_CLASS (mode) == MODE_INT
                    && is_int_mode (GET_MODE (SUBREG_REG (op0)), 
&inner_mode)
  

Patch

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 812553c091e..ba587184dfc 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7208,6 +7208,11 @@  expand_compound_operation (rtx x)
       if (len == 0)
 	return x;
 
+      /* Sign-extending loads can never be simplified at compile time.  */
+      if (WORD_REGISTER_OPERATIONS && MEM_P (XEXP (x, 0))
+	  && load_extend_op (inner_mode) == SIGN_EXTEND)
+	return x;
+
       break;
 
     case ZERO_EXTRACT:
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
new file mode 100644
index 00000000000..a95c613c1df
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
@@ -0,0 +1,9 @@ 
+int minus_1 = -1;
+
+int
+main ()
+{
+  if ((0, 0xfffffffful) >= minus_1)
+    __builtin_abort ();
+  return 0;
+}