combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

Message ID ZC6fiaL9vIiqZJ7z@tucnak
State New
Headers
Series combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040] |

Commit Message

Jakub Jelinek April 6, 2023, 10:31 a.m. UTC
  On Thu, Apr 06, 2023 at 12:15:51PM +0200, Eric Botcazou wrote:
> > Originally I didn't really see this as an operation.  But the more and
> > more I ponder it feels like it's an operation and thus should be subject
> > to WORD_REGISTER_OPERATIONS.
> > 
> > While it's not really binding on RTL semantics, if we look at how some
> > architectures implement reg->reg copies, they're actually implemented
> > with an ADD or IOR -- so a real operation under the hood.
> > 
> > If we accept a subreg copy as an operation and thus subject to
> > WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
> > the real problem here.  Otherwise dse is the culprit.
> 
> Yes, I agree that there is an ambiguity for subreg copy operations.  At some 
> point I tried to define what register operations are and added a predicate to 
> that effect (word_register_operation_p ); while it returns true for SUBREG, 
> it's an opt-out predicate so this does not mean much.
> 
> I don't think that DSE does anything wrong: as I wrote in the PR, defining 
> WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.
> 
> I therefore think that the problem is in the combiner and probably in the 
> intermediate step shown by Jakub:
> 
> "Then after that try_combine we do:
> 13325           record_value_for_reg (dest, record_dead_insn,
> 13326                                 WORD_REGISTER_OPERATIONS
> 13327                                 && word_register_operation_p (SET_SRC 
> (setter))
> 13328                                 && paradoxical_subreg_p (SET_DEST 
> (setter))
> 13329                                 ? SET_SRC (setter)
> 13330                                 : gen_lowpart (GET_MODE (dest),
> 13331                                                SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c."
> 
> That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297) 
> and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either 
> be reverted or restricted to the load case documented in its comment.  I can 
> provide testing on SPARC if need be.

If we want to fix it in the combiner, I think the fix would be following.
The optimization is about
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
and IMHO we can only optimize it into
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
if we know that the upper bits of the REG are zeros.  The optimization
is only when the constant second AND operand is the same in both modes.
Otherwise because of nonzero_bits ((reg:SI xxx), SImode) == 0x8084c
we optimize into (subreg:SI (reg:HI xxx) 0).
The original form has the upper 16 bits guaranteed to be all zeros,
while the new form has them whatever value comes from the original
operation.  For !WORD_REGISTER_OPERATIONS, we don't optimize this way
paradoxical SUBREGs at all, then by definition the upper bits are unspecified.

Now, this patch fixes the PR, but certainly generates worse (but correct)
code than the dse.cc patch.  So perhaps we want both of them?

As before, I unfortunately can't test it on riscv-linux (could perhaps try
that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
target, but last my bootstrap attempt there failed miserably because of the
Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
for that once I test it).

2023-04-06  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* combine.cc (simplify_comparison) <case AND>: For swapping AND and
	SUBREG with paradoxical subreg on WORD_REGISTER_OPERATIONS target,
	ensure the upper bits are known to be 0.



	Jakub
  

Comments

Eric Botcazou April 6, 2023, 10:51 a.m. UTC | #1
> If we want to fix it in the combiner, I think the fix would be following.
> The optimization is about
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> and IMHO we can only optimize it into
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> if we know that the upper bits of the REG are zeros.

The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation 
is done on the full word register, in other words that it's in effect:

(subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)

that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

> Now, this patch fixes the PR, but certainly generates worse (but correct)
> code than the dse.cc patch.  So perhaps we want both of them?

What happens if you disable the step I mentioned (patchlet attached)?

> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).

I think that we have one in the Ada compiler too; glad to know we are not 
alone in this boat. ;-)
  
Jakub Jelinek April 6, 2023, 11:37 a.m. UTC | #2
On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > If we want to fix it in the combiner, I think the fix would be following.
> > The optimization is about
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > and IMHO we can only optimize it into
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > if we know that the upper bits of the REG are zeros.
> 
> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation 
> is done on the full word register, in other words that it's in effect:
> 
> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> 
> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

If the
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
to
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
invalid operation is then in
simplify_context::simplify_binary_operation_1
    case AND:
...
      if (HWI_COMPUTABLE_MODE_P (mode))
        {
          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
          HOST_WIDE_INT nzop1;
          if (CONST_INT_P (trueop1))
            {
              HOST_WIDE_INT val1 = INTVAL (trueop1);
              /* If we are turning off bits already known off in OP0, we need
                 not do an AND.  */
              if ((nzop0 & ~val1) == 0)
                return op0;
            }
We have there op0==trueop0 (reg:HI 175) and op1==trueop1 (const_int 2124
[0x84c]).
We then for integral? modes smaller than word_mode would then need to
actually check nonzero_bits in the word_mode (on paradoxical subreg of
trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
nonzero_bits in the wider mode would be all we need (although the
subsequent (nzop1 & nzop0) == 0 case probably wants to have the current
nonzero_bits calls), not really sure what for WORD_REGISTER_OPERATIONS
means AND with a constant which has the most significant bit set for the
upper bits.

So, perhaps just in the return op0; case add further code for
WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
again for the word mode and decide if it is still safe.

> > Now, this patch fixes the PR, but certainly generates worse (but correct)
> > code than the dse.cc patch.  So perhaps we want both of them?
> 
> What happens if you disable the step I mentioned (patchlet attached)?

That patch doesn't change anything at all on the testcase, it is still
miscompiled.

	Jakub
  
Eric Botcazou April 6, 2023, 2:21 p.m. UTC | #3
> If the
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> to
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
> invalid operation is then in
> simplify_context::simplify_binary_operation_1
>     case AND:
> ...
>       if (HWI_COMPUTABLE_MODE_P (mode))
>         {
>           HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
>           HOST_WIDE_INT nzop1;
>           if (CONST_INT_P (trueop1))
>             {
>               HOST_WIDE_INT val1 = INTVAL (trueop1);
>               /* If we are turning off bits already known off in OP0, we
> need not do an AND.  */
>               if ((nzop0 & ~val1) == 0)
>                 return op0;
>             }
> We have there op0==trueop0 (reg:HI 175) and op1==trueop1 (const_int 2124
> [0x84c]).
> We then for integral? modes smaller than word_mode would then need to
> actually check nonzero_bits in the word_mode (on paradoxical subreg of
> trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
> nonzero_bits in the wider mode would be all we need (although the
> subsequent (nzop1 & nzop0) == 0 case probably wants to have the current
> nonzero_bits calls), not really sure what for WORD_REGISTER_OPERATIONS
> means AND with a constant which has the most significant bit set for the
> upper bits.

Yes, I agree that there is a tension between this AND case and the swapping 
done in the combiner for WORD_REGISTER_OPERATIONS.  I also agree that it would 
make sense do call nonzero_bits on word_mode instead of mode here in this case 
because AND is a word_register_operation_p.

> So, perhaps just in the return op0; case add further code for
> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> again for the word mode and decide if it is still safe.

Does it work to just replace mode by word_mode in the calls to nonzero_bits?

> That patch doesn't change anything at all on the testcase, it is still
> miscompiled.

OK, too bad, thanks for trying it!
  
Jeff Law April 6, 2023, 2:35 p.m. UTC | #4
On 4/6/23 04:31, Jakub Jelinek wrote:

> 
> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).
I can do that test.  It'll take most of a day once it starts, so not 
eager to fire it up until we've settled on a patch.

jeff
  
Jeff Law April 6, 2023, 3:06 p.m. UTC | #5
On 4/6/23 04:31, Jakub Jelinek wrote:

> 
> If we want to fix it in the combiner, I think the fix would be following.
> The optimization is about
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> and IMHO we can only optimize it into
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> if we know that the upper bits of the REG are zeros.  
But in WORD_REGISTER_OPERATIONS, that inner AND variant operates on a 
full word.  So I think they're equivalent.  But maybe I'm getting myself 
confused again.



> 
> Now, this patch fixes the PR, but certainly generates worse (but correct)
> code than the dse.cc patch.  So perhaps we want both of them?
I think the dse patch has value independently of this discussion, though 
I think it's more of a gcc-14 thing.

> 
> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).
I can spin it here when the time comes.

jeff
  
Jeff Law April 9, 2023, 12:25 a.m. UTC | #6
On 4/6/23 08:21, Eric Botcazou wrote:

>> So, perhaps just in the return op0; case add further code for
>> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
>> again for the word mode and decide if it is still safe.
> 
> Does it work to just replace mode by word_mode in the calls to nonzero_bits?
It helps marginally -- basically we defer mucking up the code a bit.  We 
then hit this in simplify_and_const_int_1:


   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
      MODE.  */

   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);

That just seems wrong for WORD_REGISTER_OPERATIONS targets.


Hacking both locations in a similar manner fixes the test.
jeff
  
Jeff Law April 9, 2023, 1:15 a.m. UTC | #7
On 4/6/23 05:37, Jakub Jelinek wrote:
> On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
>>> If we want to fix it in the combiner, I think the fix would be following.
>>> The optimization is about
>>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
>>> and IMHO we can only optimize it into
>>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
>>> if we know that the upper bits of the REG are zeros.
>>
>> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
>> is done on the full word register, in other words that it's in effect:
>>
>> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
>>
>> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> 
> If the
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> to
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
I think it is. In both cases the AND wipes the upper 16 bits.


> not really sure what for WORD_REGISTER_OPERATIONS
> means AND with a constant which has the most significant bit set for the
> upper bits.
That's a very good question.  I'm not sure either.  Obviously in the 
non-constant case all the bits up to word_mode get used.  The same thing 
is going to happen in the constant case.

THe fact that constants are sign extended from the mode bit is a gcc-ism 
though and not necessarily indicative of what hardware is going to to.



>>
>> What happens if you disable the step I mentioned (patchlet attached)?
> 
> That patch doesn't change anything at all on the testcase, it is still
> miscompiled.
That may be an artifact of later code in combine coming along and 
mucking things up in a manner similar.  That what I saw after twiddling 
simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls 
to nonzero_bits

jeff
  
Hongtao Liu April 10, 2023, 5:13 a.m. UTC | #8
On Sun, Apr 9, 2023 at 9:15 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/6/23 05:37, Jakub Jelinek wrote:
> > On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> >>> If we want to fix it in the combiner, I think the fix would be following.
> >>> The optimization is about
> >>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> >>> and IMHO we can only optimize it into
> >>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> >>> if we know that the upper bits of the REG are zeros.
> >>
> >> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> >> is done on the full word register, in other words that it's in effect:
> >>
> >> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> >>
> >> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> >
> > If the
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > to
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> I think it is. In both cases the AND wipes the upper 16 bits.
>
>
> > not really sure what for WORD_REGISTER_OPERATIONS
> > means AND with a constant which has the most significant bit set for the
> > upper bits.
> That's a very good question.  I'm not sure either.  Obviously in the
> non-constant case all the bits up to word_mode get used.  The same thing
> is going to happen in the constant case.
>
> THe fact that constants are sign extended from the mode bit is a gcc-ism
> though and not necessarily indicative of what hardware is going to to.
>
>
>
> >>
> >> What happens if you disable the step I mentioned (patchlet attached)?
> >
> > That patch doesn't change anything at all on the testcase, it is still
> > miscompiled.
> That may be an artifact of later code in combine coming along and
> mucking things up in a manner similar.  That what I saw after twiddling
> simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls
> to nonzero_bits
Li pan and I tried to set SUBREG_PROMOTED_UNSIGNED_P for the
(subreg:SI (reg:HI xxx)) after the AND was optimized off.
But it looks RA doesn't handle it as expected,not sure it I understand
the semantics of SUBREG_PROMOTED_UNSIGNED_P correctly.
>
> jeff
  
Hongtao Liu April 10, 2023, 5:15 a.m. UTC | #9
On Mon, Apr 10, 2023 at 1:13 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Sun, Apr 9, 2023 at 9:15 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 4/6/23 05:37, Jakub Jelinek wrote:
> > > On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > >>> If we want to fix it in the combiner, I think the fix would be following.
> > >>> The optimization is about
> > >>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > >>> and IMHO we can only optimize it into
> > >>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > >>> if we know that the upper bits of the REG are zeros.
> > >>
> > >> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> > >> is done on the full word register, in other words that it's in effect:
> > >>
> > >> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> > >>
> > >> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> > >
> > > If the
> > > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > > to
> > > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > I think it is. In both cases the AND wipes the upper 16 bits.
> >
> >
> > > not really sure what for WORD_REGISTER_OPERATIONS
> > > means AND with a constant which has the most significant bit set for the
> > > upper bits.
> > That's a very good question.  I'm not sure either.  Obviously in the
> > non-constant case all the bits up to word_mode get used.  The same thing
> > is going to happen in the constant case.
> >
> > THe fact that constants are sign extended from the mode bit is a gcc-ism
> > though and not necessarily indicative of what hardware is going to to.
> >
> >
> >
> > >>
> > >> What happens if you disable the step I mentioned (patchlet attached)?
> > >
> > > That patch doesn't change anything at all on the testcase, it is still
> > > miscompiled.
> > That may be an artifact of later code in combine coming along and
> > mucking things up in a manner similar.  That what I saw after twiddling
> > simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls
> > to nonzero_bits
> Li pan and I tried to set SUBREG_PROMOTED_UNSIGNED_P for the
I mean SUBREG_PROMOTED_UNSIGNED_SET.
> (subreg:SI (reg:HI xxx)) after the AND was optimized off.
> But it looks RA doesn't handle it as expected,not sure it I understand
> the semantics of SUBREG_PROMOTED_UNSIGNED_P correctly.
> >
> > jeff
>
>
>
> --
> BR,
> Hongtao
  
Jakub Jelinek April 10, 2023, 7:10 a.m. UTC | #10
On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:
> 
> 
> On 4/6/23 08:21, Eric Botcazou wrote:
> 
> > > So, perhaps just in the return op0; case add further code for
> > > WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> > > again for the word mode and decide if it is still safe.
> > 
> > Does it work to just replace mode by word_mode in the calls to nonzero_bits?
> It helps marginally -- basically we defer mucking up the code a bit.  We
> then hit this in simplify_and_const_int_1:
> 
> 
>   /* See what bits may be nonzero in VAROP.  Unlike the general case of
>      a call to nonzero_bits, here we don't care about bits outside
>      MODE.  */
> 
>   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> 
> That just seems wrong for WORD_REGISTER_OPERATIONS targets.
> 
> 
> Hacking both locations in a similar manner fixes the test.

If so, can you post that in patch form and can we go with that version
plus the testcase (e.g. from the first patch I've posted where I've changed
dse)?

	Jakub
  
Jeff Law April 12, 2023, 1:26 a.m. UTC | #11
On 4/10/23 01:10, Jakub Jelinek wrote:
> On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:
>>
>>
>> On 4/6/23 08:21, Eric Botcazou wrote:
>>
>>>> So, perhaps just in the return op0; case add further code for
>>>> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
>>>> again for the word mode and decide if it is still safe.
>>>
>>> Does it work to just replace mode by word_mode in the calls to nonzero_bits?
>> It helps marginally -- basically we defer mucking up the code a bit.  We
>> then hit this in simplify_and_const_int_1:
>>
>>
>>    /* See what bits may be nonzero in VAROP.  Unlike the general case of
>>       a call to nonzero_bits, here we don't care about bits outside
>>       MODE.  */
>>
>>    nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
>>
>> That just seems wrong for WORD_REGISTER_OPERATIONS targets.
>>
>>
>> Hacking both locations in a similar manner fixes the test.
> 
> If so, can you post that in patch form and can we go with that version
> plus the testcase (e.g. from the first patch I've posted where I've changed
> dse)?
So as I mentioned in IRC, I haven't really looked closely at 
simplify_and_const_int_1.  I don't have a high degree of confidence this 
patch is complete, even though it does fix the test for 108947 and 109040.

I did bootstrap on riscv, but not a regression test, that's spinning 
right now.

Jeff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 22bf8e1ec89..c41d8a09b3b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 3b33afa2461..5f6f70491d8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
 	{
-	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+	     nonzero bits in WORD_MODE rather than MODE.  */
+	  HOST_WIDE_INT nzop0
+	    = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : mode);
 	  HOST_WIDE_INT nzop1;
 	  if (CONST_INT_P (trueop1))
 	    {
  
Jakub Jelinek April 12, 2023, 6:21 a.m. UTC | #12
On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:
> I did bootstrap on riscv, but not a regression test, that's spinning right
> now.
> 
> Jeff

> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 22bf8e1ec89..c41d8a09b3b 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
>  
>    /* See what bits may be nonzero in VAROP.  Unlike the general case of
>       a call to nonzero_bits, here we don't care about bits outside
> -     MODE.  */
> +     MODE unless WORD_REGISTER_OPERATIONS is true.  */

I would have expected something like
WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
as the condition to use word_mode, rather than just
WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
used as is, not a narrower word_mode instead of them.

> -  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> +  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
> +  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
>  
>    /* Turn off all bits in the constant that are known to already be zero.
>       Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 3b33afa2461..5f6f70491d8 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>  	return op0;
>        if (HWI_COMPUTABLE_MODE_P (mode))
>  	{
> -	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
> +	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
> +	     nonzero bits in WORD_MODE rather than MODE.  */
> +	  HOST_WIDE_INT nzop0
> +	    = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : mode);
>  	  HOST_WIDE_INT nzop1;
>  	  if (CONST_INT_P (trueop1))
>  	    {

Regarding my earlier comments for this spot, the later code does
          nzop1 = nonzero_bits (trueop1, mode);
          /* If we are clearing all the nonzero bits, the result is zero.  */
          if ((nzop1 & nzop0) == 0
              && !side_effects_p (op0) && !side_effects_p (op1))
            return CONST0_RTX (mode);
and because nonzero_bits in word_mode if it is wider might have just more
bits set above mode, but nzop1 will not have those bits set, I think it is
fine the way you wrote it (except for the precision check).

	Jakub
  
Jeff Law April 12, 2023, 1:29 p.m. UTC | #13
On 4/12/23 00:21, Jakub Jelinek wrote:
> On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:
>> I did bootstrap on riscv, but not a regression test, that's spinning right
>> now.
>>
>> Jeff
> 
>> diff --git a/gcc/combine.cc b/gcc/combine.cc
>> index 22bf8e1ec89..c41d8a09b3b 100644
>> --- a/gcc/combine.cc
>> +++ b/gcc/combine.cc
>> @@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
>>   
>>     /* See what bits may be nonzero in VAROP.  Unlike the general case of
>>        a call to nonzero_bits, here we don't care about bits outside
>> -     MODE.  */
>> +     MODE unless WORD_REGISTER_OPERATIONS is true.  */
> 
> I would have expected something like
> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> as the condition to use word_mode, rather than just
> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> used as is, not a narrower word_mode instead of them.
Agreed.

Jeff
  

Patch

--- gcc/combine.cc.jj	2023-02-28 11:28:56.423182357 +0100
+++ gcc/combine.cc	2023-04-06 12:02:31.694747361 +0200
@@ -12671,7 +12671,11 @@  simplify_comparison (enum rtx_code code,
 		  && ((WORD_REGISTER_OPERATIONS
 		       && mode_width > GET_MODE_PRECISION (tmode)
 		       && mode_width <= BITS_PER_WORD
-		       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
+		       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1
+		       /* Make sure the bits above tmode in the SUBREG_REG
+			  are all known to be 0, see PR109040.  */
+		       && (nonzero_bits (SUBREG_REG (XEXP (op0, 0)), mode)
+			   & ~GET_MODE_MASK (tmode)) == 0)
 		      || (mode_width <= GET_MODE_PRECISION (tmode)
 			  && subreg_lowpart_p (XEXP (op0, 0))))
 		  && mode_width <= HOST_BITS_PER_WIDE_INT