match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

Message ID CAFULd4Z5tQunoxdjn8D9UT2EN31PU-Ajbeg0UgCRUBDaUZ+SMg@mail.gmail.com
State New
Series match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm      fail     Testing failed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64  fail     Testing failed

Commit Message

Uros Bizjak Jan. 8, 2024, 2:43 p.m. UTC
Instead of converting XOR or PLUS of two values, ANDed with two constants that
have no bits in common, to an IOR expression, convert IOR or XOR of said two
ANDed values to a PLUS expression.

If we consider the following testcase:

--cut here--
unsigned int foo (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r + p + 2;
}

unsigned int bar (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r | p | 2;
}
--cut here--

the above testcase compiles (x86_64 -O2) to:

foo:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        leal    2(%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        movl    %edi, %eax
        orl     $2, %eax
        ret

No further simplification is possible in either case: we can't combine
OR with a PLUS in the first case, and we don't have an OR instruction with
multiple inputs in the second case.

If we switch around the logic in the conversion and convert from IOR/XOR
to PLUS, then the resulting assembly reads:

foo:
        andl    $-4, %esi
        andl    $1, %edi
        leal    2(%rsi,%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        leal    (%rdi,%rsi), %eax
        orl     $2, %eax
        ret

On x86, the conversion can now use the LEA instruction, which is much more
usable than the OR instruction.  In the first case, LEA implements a
three-input ADD, while in the second case, even though the instruction
can't be combined with the follow-up OR, the non-destructive LEA avoids a move.
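
The equivalence this conversion relies on can be checked directly: when two
masks share no bits, the masked values never have a set bit in the same
position, so adding them cannot produce a carry and OR, XOR and PLUS all give
the same result.  A minimal sketch in C, reusing the masks from the testcase
(the function names are illustrative and are not part of the patch):

```c
#include <stdint.h>

/* Masks from the testcase: 0x1 and ~0x3 have no bits in common.  */
#define CST1 0x1u
#define CST2 (~0x3u)

/* IOR form of the combination.  */
uint32_t combine_ior (uint32_t a, uint32_t b)
{
  return (a & CST1) | (b & CST2);
}

/* XOR form: identical result, since no bit is set in both operands.  */
uint32_t combine_xor (uint32_t a, uint32_t b)
{
  return (a & CST1) ^ (b & CST2);
}

/* PLUS form: identical result, since the addition cannot carry.  */
uint32_t combine_plus (uint32_t a, uint32_t b)
{
  return (a & CST1) + (b & CST2);
}
```

Which of the three forms is preferable is then purely a question of what the
target can do with it, which is what the rest of the thread discusses.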

    PR target/108477

gcc/ChangeLog:

    * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
    Do not convert PLUS of two values, ANDed with two constants
    that have no bits in common, to IOR expression; convert
    IOR or XOR of said two ANDed values to PLUS expression.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr108477.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
  

Comments

Andrew Pinski Jan. 8, 2024, 4:57 p.m. UTC | #1
On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Instead of converting XOR or PLUS of two values, ANDed with two constants that
> have no bits in common, to IOR expression, convert IOR or XOR of said two
> ANDed values to PLUS expression.

I think this only helps targets which have an leal-like instruction. Also,
I think it is the same issue as the one I recorded as PR 111763.  I suspect
BIT_IOR is more of a canonical form for GIMPLE, while we should handle
this in expand to decide if we want to use PLUS or IOR.

Thanks,
Andrew Pinski

>
> If we consider the following testcase:
>
> --cut here--
> unsigned int foo (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r + p + 2;
> }
>
> unsigned int bar (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r | p | 2;
> }
> --cut here--
>
> the above testcase compiles (x86_64 -O2) to:
>
> foo:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         leal    2(%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         movl    %edi, %eax
>         orl     $2, %eax
>         ret
>
> There is no further simplification possible in any case, we can't combine
> OR with a PLUS in the first case, and we don't have OR instruction with
> multiple inputs in the second case.
>
> If we switch around the logic in the conversion and convert from IOR/XOR
> to PLUS, then the resulting assembly reads:
>
> foo:
>         andl    $-4, %esi
>         andl    $1, %edi
>         leal    2(%rsi,%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         leal    (%rdi,%rsi), %eax
>         orl     $2, %eax
>         ret
>
> On x86, the conversion can now use LEA instruction, which is much more
> usable than OR instruction.  In the first case, LEA implements three input
> ADD instruction, while in the second case, even though the instruction
> can't be combined with a follow-up OR, the non-destructive LEA avoids a move.
>
>     PR target/108477
>
> gcc/ChangeLog:
>
>     * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
>     Do not convert PLUS of two values, ANDed with two constants
>     that have no bits in common to IOR expression, convert
>     IOR or XOR of said two ANDed values to PLUS expression.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/i386/pr108477.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for mainline?
>
> Uros.
  
Jeff Law Jan. 8, 2024, 5:01 p.m. UTC | #2
On 1/8/24 09:57, Andrew Pinski wrote:
> On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> Instead of converting XOR or PLUS of two values, ANDed with two constants that
>> have no bits in common, to IOR expression, convert IOR or XOR of said two
>> ANDed values to PLUS expression.
> 
> I think this only helps targets which have leal like instruction. Also
> I think it is the same issue as I recorded as PR 111763 .  I suspect
> BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> this in expand to decide if we want to use PLUS or IOR.
Actually there's a benefit on RISC-V to using PLUS over IOR/XOR when 
there are no bits in common.  In fact, I've been asked to do that by 
Andrew W. for a case where we know ahead of time there are no bits in 
common in a sequence that currently uses IOR.

Specifically it can allow more use of the compact instructions as the 
compact PLUS allows the full set of hard registers while compact IOR/XOR 
only allow a subset of registers.

jeff
  
Uros Bizjak Jan. 8, 2024, 8:10 p.m. UTC | #3
On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > ANDed values to PLUS expression.
>
> I think this only helps targets which have leal like instruction. Also
> I think it is the same issue as I recorded as PR 111763 .  I suspect
> BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> this in expand to decide if we want to use PLUS or IOR.

For the pr108477.c testcase, expand pass expands:

  r_3 = a_2(D) & 1;
  p_5 = b_4(D) & 4294967292;
  _1 = r_3 | p_5;
  _6 = _1 + 2;
  return _6;

The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
we need to determine values of constants. Is this information
available in the expand pass?
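
To see why the CST1 & CST2 == 0 condition matters, consider masks that
overlap: both masked values can then have the same bit set, the addition
carries, and PLUS no longer agrees with IOR.  A small illustration in plain C
(the helper names are hypothetical and only model the condition; they are not
GCC functions):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the two masks share no set bits, i.e. when IOR/XOR of the
   masked values may be rewritten as PLUS.  */
bool masks_disjoint (uint32_t cst1, uint32_t cst2)
{
  return (cst1 & cst2) == 0;
}

/* (a & cst1) | (b & cst2) for arbitrary masks.  */
uint32_t masked_ior (uint32_t a, uint32_t b, uint32_t cst1, uint32_t cst2)
{
  return (a & cst1) | (b & cst2);
}

/* (a & cst1) + (b & cst2) for arbitrary masks.  */
uint32_t masked_plus (uint32_t a, uint32_t b, uint32_t cst1, uint32_t cst2)
{
  return (a & cst1) + (b & cst2);
}
```

With masks 0x3 and 0x1 and a = b = 1, the IOR form yields 1 while the PLUS
form yields 2, so the rewrite would change the result.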

IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
the shown testcase would be beneficial when constructing control
register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
sequence in this case.

Uros.
  
Richard Biener Jan. 9, 2024, 8:35 a.m. UTC | #4
On Mon, 8 Jan 2024, Jeff Law wrote:

> 
> 
> On 1/8/24 09:57, Andrew Pinski wrote:
> > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >>
> >> Instead of converting XOR or PLUS of two values, ANDed with two constants
> >> that
> >> have no bits in common, to IOR expression, convert IOR or XOR of said two
> >> ANDed values to PLUS expression.
> > 
> > I think this only helps targets which have leal like instruction. Also
> > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > this in expand to decide if we want to use PLUS or IOR.
> Actually there's benefit on RISC-V to using PLUS over IOR/XOR when there's no
> bits in common.  In fact, I've been asked to do that by Andrew W. for a case
> where we know ahead of time there's no bits in common in a sequence that
> currently uses IOR.
> 
> Specifically it can allow more use of the compact instructions as the compact
> PLUS allows the full set of hard registers while compact IOR/XOR only allow a
> subset of registers.

Still the benefit of IOR is no undefined overflow issues and thus better
re-association (on GIMPLE).  Of course in the end what is better
"depends", but when "depends" depends on what the target can do better
it's indeed better to leave that to either RTL expansion or
instruction selection.

Richard.
  
Richard Biener Jan. 9, 2024, 8:53 a.m. UTC | #5
On Mon, 8 Jan 2024, Uros Bizjak wrote:

> On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> >
> > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > ANDed values to PLUS expression.
> >
> > I think this only helps targets which have leal like instruction. Also
> > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > this in expand to decide if we want to use PLUS or IOR.
> 
> For the pr108477.c testcase, expand pass expands:
> 
>   r_3 = a_2(D) & 1;
>  p_5 = b_4(D) & 4294967292;
>  _1 = r_3 | p_5;
>  _6 = _1 + 2;
>  return _6;
> 
> The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> we need to determine values of constants. Is this information
> available in the expand pass?

If there are single uses, then TER makes this info available.

> IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> the shown testcase would be beneficial when constructing control
> register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> sequence in this case.

The other possibility is to expose LEA as an optab and make GIMPLE
instruction selection generate a direct internal function for that
(that would be the "better" way).  There is the LEA-like &TARGET_MEM_REF,
but that has constraints on the addends' mode (ptr_mode) which might
not fit what the target can do?  Otherwise that would be an existing
way to do this computation as well.

Richard.
  
Uros Bizjak Jan. 9, 2024, 9:25 a.m. UTC | #6
On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 8 Jan 2024, Uros Bizjak wrote:
>
> > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > >
> > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > ANDed values to PLUS expression.
> > >
> > > I think this only helps targets which have leal like instruction. Also
> > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > this in expand to decide if we want to use PLUS or IOR.
> >
> > For the pr108477.c testcase, expand pass expands:
> >
> >   r_3 = a_2(D) & 1;
> >  p_5 = b_4(D) & 4294967292;
> >  _1 = r_3 | p_5;
> >  _6 = _1 + 2;
> >  return _6;
> >
> > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > we need to determine values of constants. Is this information
> > available in the expand pass?
>
> If there's single-uses then TER makes this info available.
>
> > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > the shown testcase would be beneficial when constructing control
> > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > sequence in this case.
>
> The other possibility is to expose LEA as optab and making GIMPLE
> instruction selection generate a direct internal function for that
> (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> but that has constraints on the addends mode (ptr_mode) which might
> not fit what the target can do?  Otherwise that would be an existing
> way to do this computation as well.

I think there is no need for a new optab. If we can determine at
expand time that ANDed values are fed to the IOR/XOR expressions, then
we can check the constants and emit a PLUS RTX instead. The RTL combine
pass will then create an LEA instruction from the separate PLUS instructions.

So, we can emit:

op0 = and (a, CST1)
op1 = and (b, CST2)
op2 = plus (op0, op1)

RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0

and

op0 = and (a, CST1)
op1 = plus (op0, CST2)

RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0

The above transformation is valid for IOR and XOR.
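
The second sequence, with a constant as the other operand, can be modelled
the same way.  The sketch below is plain C rather than RTL, and its function
names are hypothetical; it only shows that the three operators agree when
CST1 & CST2 == 0 and disagree otherwise:

```c
#include <stdint.h>

/* (a & cst1) | cst2, the form seen at expand time.  */
uint32_t and_ior_cst (uint32_t a, uint32_t cst1, uint32_t cst2)
{
  return (a & cst1) | cst2;
}

/* (a & cst1) ^ cst2, the XOR variant of the same input.  */
uint32_t and_xor_cst (uint32_t a, uint32_t cst1, uint32_t cst2)
{
  return (a & cst1) ^ cst2;
}

/* (a & cst1) + cst2, the form that would be emitted instead.  */
uint32_t and_plus_cst (uint32_t a, uint32_t cst1, uint32_t cst2)
{
  return (a & cst1) + cst2;
}
```

For a = 0xff, CST1 = ~0x3 and CST2 = 0x2 all three return 0xfe.  With the
overlapping pair CST1 = CST2 = 0x2 and a = 0x2 they return 2, 0 and 4
respectively.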

x86 can't combine IOR/XOR in any meaningful way, but can combine the
sequence of PLUS (together with MULT) RTXes to LEA.

(BTW: I am not versed in the expand stuff, so a disclaimer is at hand ;) )

Uros.
  
Richard Biener Jan. 9, 2024, 9:39 a.m. UTC | #7
On Tue, 9 Jan 2024, Uros Bizjak wrote:

> On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> >
> > > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > >
> > > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > ANDed values to PLUS expression.
> > > >
> > > > I think this only helps targets which have leal like instruction. Also
> > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > this in expand to decide if we want to use PLUS or IOR.
> > >
> > > For the pr108477.c testcase, expand pass expands:
> > >
> > >   r_3 = a_2(D) & 1;
> > >  p_5 = b_4(D) & 4294967292;
> > >  _1 = r_3 | p_5;
> > >  _6 = _1 + 2;
> > >  return _6;
> > >
> > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > we need to determine values of constants. Is this information
> > > available in the expand pass?
> >
> > If there's single-uses then TER makes this info available.
> >
> > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > the shown testcase would be beneficial when constructing control
> > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > sequence in this case.
> >
> > The other possibility is to expose LEA as optab and making GIMPLE
> > instruction selection generate a direct internal function for that
> > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > but that has constraints on the addends mode (ptr_mode) which might
> > not fit what the target can do?  Otherwise that would be an existing
> > way to do this computation as well.
> 
> I think there is no need for a new optab. If we can determine at
> expand time that ANDed values are fed to the IOR/XOR expressions, then
> we can check the constants and emit PLUS RTX instead. RTL combine pass
> will then create LEA instruction from separate PLUS instructions.
> 
> So, we can emit:
> 
> op0 = and (a, CST1)
> op1 = and (b, CST2)
> op2 = plus (op0, op1)
> 
> RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> 
> and
> 
> op0 = and (a, CST1)
> op1 = plus (op0, CST2)
> 
> RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> 
> The above transformation is valid for IOR and XOR.
> 
> x86 can't combine IOR/XOR in any meaningful way, but can combine the
> sequence of PLUS (together with MULT) RTXes to LEA.

Btw, this looks like a three-insn combination even with IOR so a
pattern for this case would work as well?

Deciding whether to use PLUS or IOR at RTL expansion time would
need to somehow figure out what's better for the target in question.
I'm not sure how to do that?

Richard.
  
Jakub Jelinek Jan. 9, 2024, 9:53 a.m. UTC | #8
On Tue, Jan 09, 2024 at 10:39:50AM +0100, Richard Biener wrote:
> > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > sequence of PLUS (together with MULT) RTXes to LEA.
> 
> Btw, this looks like a three-insn combination even with IOR so a
> pattern for this case would work as well?
> 
> Deciding whether to use PLUS or IOR at RTL expansion time would
> need to somehow figure out what's better for the target in question.
> I'm not sure how to do that?

Maybe a new optab which would be used when expanding
BIT_IOR_EXPR/BIT_XOR_EXPR with operands which have no common bits
(say in (get_nonzero_bits (arg1) & get_nonzero_bits (arg2)) == 0
sense)?
x86 and riscv could just emit a PLUS in those cases...
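
The nonzero-bits test generalises the constant check: instead of requiring
literal AND masks, it asks whether any bit could be set in both operands.
The following is a rough model in plain C, not GCC code.  The helpers are
hypothetical stand-ins that track a "may be nonzero" mask by hand, whereas
the real get_nonzero_bits works on GCC's own representation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: for x & cst with nothing known about x, only the
   bits set in cst may be nonzero in the result.  */
uint32_t nonzero_bits_of_and (uint32_t cst)
{
  return cst;
}

/* For x | y, a bit may be nonzero if it may be nonzero in either operand.  */
uint32_t nonzero_bits_of_ior (uint32_t nz1, uint32_t nz2)
{
  return nz1 | nz2;
}

/* The condition from the mail: the operands have no possibly-set bit in
   common, so IOR/XOR and PLUS are interchangeable.  */
bool operands_have_no_common_bits (uint32_t nz_arg1, uint32_t nz_arg2)
{
  return (nz_arg1 & nz_arg2) == 0;
}
```

In this model the testcase's r | p has possibly-nonzero bits 0xfffffffd,
which do not intersect the constant 2, so the trailing + 2 and | 2 are
interchangeable as well.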

	Jakub
  
Richard Biener Jan. 9, 2024, 10:01 a.m. UTC | #9
On Tue, 9 Jan 2024, Uros Bizjak wrote:

> On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> >
> > > On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > >
> > > > > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > >
> > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > ANDed values to PLUS expression.
> > > > > >
> > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > >
> > > > > For the pr108477.c testcase, expand pass expands:
> > > > >
> > > > >   r_3 = a_2(D) & 1;
> > > > >  p_5 = b_4(D) & 4294967292;
> > > > >  _1 = r_3 | p_5;
> > > > >  _6 = _1 + 2;
> > > > >  return _6;
> > > > >
> > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > we need to determine values of constants. Is this information
> > > > > available in the expand pass?
> > > >
> > > > If there's single-uses then TER makes this info available.
> > > >
> > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > the shown testcase would be beneficial when constructing control
> > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > sequence in this case.
> > > >
> > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > instruction selection generate a direct internal function for that
> > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > not fit what the target can do?  Otherwise that would be an existing
> > > > way to do this computation as well.
> > >
> > > I think there is no need for a new optab. If we can determine at
> > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > will then create LEA instruction from separate PLUS instructions.
> > >
> > > So, we can emit:
> > >
> > > op0 = and (a, CST1)
> > > op1 = and (b, CST2)
> > > op2 = plus (op0, op1)
> > >
> > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > >
> > > and
> > >
> > > op0 = and (a, CST1)
> > > op1 = plus (op0, CST2)
> > >
> > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > >
> > > The above transformation is valid for IOR and XOR.
> > >
> > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > sequence of PLUS (together with MULT) RTXes to LEA.
> >
> > Btw, this looks like a three-insn combination even with IOR so a
> > pattern for this case would work as well?
> 
> IIUC the question: x86 does not have three-input IOR, but we want to
> emulate it with LEA (three-input PLUS, but one of the arguments has to
> be constant).

But couldn't you have a define_insn matching LEA but with IOR instead
of PLUS (and with the appropriate constraints?).  Maybe it could
also be combine trying PLUS instead of IOR if that's possible
(looking at the constants).

> We can do the conversion only when masking constants do
> not intersect. *However*, the original problem is that the compiler is
> converting PLUS to IOR, preventing the generation of LEA. As shown in
> the original PR, we would like to *prevent* the conversion from PLUS
> to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3)
> - we would really want to generate LEA in this case.

I'm throwing in the argument that the user could have written
a & CST1 | b & CST2 + CST3 as well.

> > Deciding whether to use PLUS or IOR at RTL expansion time would
> > need to somehow figure out what's better for the target in question.
> > I'm not sure how to do that?
> 
> I think that a target hook would be needed. In addition to x86, Jeff
> presented a very specialized use case for RISC-V that can't be
> described otherwise.
> 
> Uros.
>
  
Uros Bizjak Jan. 9, 2024, 10:02 a.m. UTC | #10
On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 9 Jan 2024, Uros Bizjak wrote:
>
> > On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > >
> > > > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > ANDed values to PLUS expression.
> > > > >
> > > > > I think this only helps targets which have leal like instruction. Also
> > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > this in expand to decide if we want to use PLUS or IOR.
> > > >
> > > > For the pr108477.c testcase, expand pass expands:
> > > >
> > > >   r_3 = a_2(D) & 1;
> > > >  p_5 = b_4(D) & 4294967292;
> > > >  _1 = r_3 | p_5;
> > > >  _6 = _1 + 2;
> > > >  return _6;
> > > >
> > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > we need to determine values of constants. Is this information
> > > > available in the expand pass?
> > >
> > > If there's single-uses then TER makes this info available.
> > >
> > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > the shown testcase would be beneficial when constructing control
> > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > sequence in this case.
> > >
> > > The other possibility is to expose LEA as optab and making GIMPLE
> > > instruction selection generate a direct internal function for that
> > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > but that has constraints on the addends mode (ptr_mode) which might
> > > not fit what the target can do?  Otherwise that would be an existing
> > > way to do this computation as well.
> >
> > I think there is no need for a new optab. If we can determine at
> > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > will then create LEA instruction from separate PLUS instructions.
> >
> > So, we can emit:
> >
> > op0 = and (a, CST1)
> > op1 = and (b, CST2)
> > op2 = plus (op0, op1)
> >
> > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> >
> > and
> >
> > op0 = and (a, CST1)
> > op1 = plus (op0, CST2)
> >
> > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> >
> > The above transformation is valid for IOR and XOR.
> >
> > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > sequence of PLUS (together with MULT) RTXes to LEA.
>
> Btw, this looks like a three-insn combination even with IOR so a
> pattern for this case would work as well?

IIUC the question: x86 does not have three-input IOR, but we want to
emulate it with LEA (three-input PLUS, but one of the arguments has to
be constant). We can do the conversion only when masking constants do
not intersect. *However*, the original problem is that the compiler is
converting PLUS to IOR, preventing the generation of LEA. As shown in
the original PR, we would like to *prevent* the conversion from PLUS
to IOR on x86, because it interferes with (a & CST1 + b & CST2 + CST3)
- we would really want to generate LEA in this case.

> Deciding whether to use PLUS or IOR at RTL expansion time would
> need to somehow figure out what's better for the target in question.
> I'm not sure how to do that?

I think that a target hook would be needed. In addition to x86, Jeff
presented a very specialized use case for RISC-V that can't be
described otherwise.

Uros.
  
Uros Bizjak Jan. 9, 2024, 10:19 a.m. UTC | #11
On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 9 Jan 2024, Uros Bizjak wrote:
>
> > On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> > >
> > > > On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
> > > > >
> > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > > >
> > > > > > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > > ANDed values to PLUS expression.
> > > > > > >
> > > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > > >
> > > > > > For the pr108477.c testcase, expand pass expands:
> > > > > >
> > > > > >   r_3 = a_2(D) & 1;
> > > > > >  p_5 = b_4(D) & 4294967292;
> > > > > >  _1 = r_3 | p_5;
> > > > > >  _6 = _1 + 2;
> > > > > >  return _6;
> > > > > >
> > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > > we need to determine values of constants. Is this information
> > > > > > available in the expand pass?
> > > > >
> > > > > If there's single-uses then TER makes this info available.
> > > > >
> > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > > the shown testcase would be beneficial when constructing control
> > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > > sequence in this case.
> > > > >
> > > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > > instruction selection generate a direct internal function for that
> > > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > > not fit what the target can do?  Otherwise that would be an existing
> > > > > way to do this computation as well.
> > > >
> > > > I think there is no need for a new optab. If we can determine at
> > > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > > will then create LEA instruction from separate PLUS instructions.
> > > >
> > > > So, we can emit:
> > > >
> > > > op0 = and (a, CST1)
> > > > op1 = and (b, CST2)
> > > > op2 = plus (op0, op1)
> > > >
> > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > > >
> > > > and
> > > >
> > > > op0 = and (a, CST1)
> > > > op1 = plus (op0, CST2)
> > > >
> > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > > >
> > > > The above transformation is valid for IOR and XOR.
> > > >
> > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > > sequence of PLUS (together with MULT) RTXes to LEA.
> > >
> > > Btw, this looks like a three-insn combination even with IOR so a
> > > pattern for this case would work as well?
> >
> > IIUC the question: x86 does not have three-input IOR, but we want to
> > emulate it with LEA (three-input PLUS, but one of the arguments has to
> > be constant).
>
> But couldn't you have a define_insn matching LEA but with IOR instead
> of PLUS (and with the appropriate constraints?).  Maybe it could
> also be combine trying PLUS instead of IOR if that's possible
> (looking at the constants).

We would have to include masking ANDs in the define_insn and add
additional conditions regarding mask constants in the insn constraint.

So, this define_insn would look something like:

(ior (ior (and (op1, CST1), and (op2, CST2)), CST3))

with (CST1 & CST2 & CST3) == 0 condition.

and combinations with PLUS / MULT RTXes including all the variants
with two arguments.
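
One point worth spelling out for the three-operand form: the rewrite to PLUS
needs the three constants to be pairwise disjoint.  A zero three-way
intersection is a weaker condition, because for example CST1 = CST2 = 3 with
CST3 = 0 satisfies it while the two masks still overlap.  A minimal sketch in
plain C, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when no two of the three constants share a set bit.  */
bool pairwise_disjoint (uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (cst1 & cst2) == 0 && (cst1 & cst3) == 0 && (cst2 & cst3) == 0;
}

/* (a & cst1) | (b & cst2) | cst3, as in bar () from the testcase.  */
uint32_t ior3 (uint32_t a, uint32_t b,
               uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (a & cst1) | (b & cst2) | cst3;
}

/* (a & cst1) + (b & cst2) + cst3, the shape an LEA can compute.  */
uint32_t plus3 (uint32_t a, uint32_t b,
                uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (a & cst1) + (b & cst2) + cst3;
}
```

The testcase's constants 0x1, ~0x3 and 2 are pairwise disjoint, so foo () and
bar () return the same value for every input.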

Uros.
  
Uros Bizjak Jan. 9, 2024, 10:27 a.m. UTC | #12
On Tue, Jan 9, 2024 at 11:19 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Jan 9, 2024 at 11:06 AM Richard Biener <rguenther@suse.de> wrote:
> >
> > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> >
> > > On Tue, Jan 9, 2024 at 10:44 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Tue, 9 Jan 2024, Uros Bizjak wrote:
> > > >
> > > > > On Tue, Jan 9, 2024 at 9:58 AM Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Mon, 8 Jan 2024, Uros Bizjak wrote:
> > > > > >
> > > > > > > On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski <pinskia@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants that
> > > > > > > > > have no bits in common, to IOR expression, convert IOR or XOR of said two
> > > > > > > > > ANDed values to PLUS expression.
> > > > > > > >
> > > > > > > > I think this only helps targets which have leal like instruction. Also
> > > > > > > > I think it is the same issue as I recorded as PR 111763 .  I suspect
> > > > > > > > BIT_IOR is more of a Canonical form for GIMPLE while we should handle
> > > > > > > > this in expand to decide if we want to use PLUS or IOR.
> > > > > > >
> > > > > > > For the pr108477.c testcase, expand pass expands:
> > > > > > >
> > > > > > >   r_3 = a_2(D) & 1;
> > > > > > >  p_5 = b_4(D) & 4294967292;
> > > > > > >  _1 = r_3 | p_5;
> > > > > > >  _6 = _1 + 2;
> > > > > > >  return _6;
> > > > > > >
> > > > > > > The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so
> > > > > > > we need to determine values of constants. Is this information
> > > > > > > available in the expand pass?
> > > > > >
> > > > > > If there's single-uses then TER makes this info available.
> > > > > >
> > > > > > > IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in
> > > > > > > the shown testcase would be beneficial when constructing control
> > > > > > > register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD
> > > > > > > sequence in this case.
> > > > > >
> > > > > > The other possibility is to expose LEA as optab and making GIMPLE
> > > > > > instruction selection generate a direct internal function for that
> > > > > > (that would be the "better" way).  There is LEA-like &TARGET_MEM_REF
> > > > > > but that has constraints on the addends mode (ptr_mode) which might
> > > > > > not fit what the target can do?  Otherwise that would be an existing
> > > > > > way to do this computation as well.
> > > > >
> > > > > I think there is no need for a new optab. If we can determine at
> > > > > expand time that ANDed values are fed to the IOR/XOR expressions, then
> > > > > we can check the constants and emit PLUS RTX instead. RTL combine pass
> > > > > will then create LEA instruction from separate PLUS instructions.
> > > > >
> > > > > So, we can emit:
> > > > >
> > > > > op0 = and (a, CST1)
> > > > > op1 = and (b, CST2)
> > > > > op2 = plus (op0, op1)
> > > > >
> > > > > RTX sequence for (a & CST1) | (b & CST2) when CST1 & CST2 == 0
> > > > >
> > > > > and
> > > > >
> > > > > op0 = and (a, CST1)
> > > > > op1 = plus (op0, CST2)
> > > > >
> > > > > RTX sequence for (a & CST1) | CST2 when CST1 & CST2 == 0
> > > > >
> > > > > The above transformation is valid for IOR and XOR.
> > > > >
> > > > > x86 can't combine IOR/XOR in any meaningful way, but can combine the
> > > > > sequence of PLUS (together with MULT) RTXes to LEA.
> > > >
> > > > Btw, this looks like a three-insn combination even with IOR so a
> > > > pattern for this case would work as well?
> > >
> > > IIUC the question: x86 does not have three-input IOR, but we want to
> > > emulate it with LEA (three-input PLUS, but one of the arguments has to
> > > be constant).
> >
> > But couldn't you have a define_insn matching LEA but with IOR instead
> > of PLUS (and with the appropriate constraints?).  Maybe it could
> > also be combine trying PLUS instead of IOR if that's possible
> > (looking at the constants).
>
> we would have to include masking ANDs in the define_insn and add
> additional conditions regarding mask constants in the insn constraint.
>
> So, this define_insn would look something like:
>
> (ior (ior (and (op1, CST1), and (op2, CST2)), CST3))
>
> with (CST1 & CST2 & CST3) == 0 condition.
>
> and combinations with PLUS / MULT RTXes including all the variants
> with two arguments.

Oh, and then we would have to split the masking ANDs out of the above
instruction. We can't clobber the inputs, so this would have to be done
with temporary registers before reload...

Uros.
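
For readers following the thread, the identity under discussion (IOR, XOR
and PLUS give the same result once the masked operands share no bits) can be
sketched in plain C. This is only an illustration, not GCC code: the function
names are hypothetical, and the sketch assumes the three constants are
pairwise disjoint, which is a stricter reading than the
(CST1 & CST2 & CST3) == 0 condition as written above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when the two masks have no bits in common. */
int masks_disjoint(uint32_t m1, uint32_t m2)
{
  return (m1 & m2) == 0;
}

/* (a & cst1) | (b & cst2) | cst3, the shape of bar() in the testcase. */
uint32_t combine_ior(uint32_t a, uint32_t b,
                     uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (a & cst1) | (b & cst2) | cst3;
}

/* The same expression with XOR as the combining operation. */
uint32_t combine_xor(uint32_t a, uint32_t b,
                     uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (a & cst1) ^ (b & cst2) ^ cst3;
}

/* The same expression with PLUS, the shape of foo() in the testcase. */
uint32_t combine_plus(uint32_t a, uint32_t b,
                      uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  return (a & cst1) + (b & cst2) + cst3;
}

/* Compare the three forms over a small grid of sample inputs.
   Returns 1 if they agree on every sample, 0 otherwise.  */
int forms_agree(uint32_t cst1, uint32_t cst2, uint32_t cst3)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 3u, 0x55555555u, 0xaaaaaaaau, 0xfffffffcu, 0xffffffffu
  };
  const unsigned n = sizeof samples / sizeof samples[0];

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      {
        uint32_t ior = combine_ior(samples[i], samples[j], cst1, cst2, cst3);
        uint32_t x = combine_xor(samples[i], samples[j], cst1, cst2, cst3);
        uint32_t sum = combine_plus(samples[i], samples[j], cst1, cst2, cst3);
        if (ior != x || ior != sum)
          return 0;
      }
  return 1;
}
```

With the testcase constants (0x1, ~0x3 and 2) no two terms can set the same
bit, so the addition never carries and the three forms coincide. With
overlapping masks such as 0x3 and 0x3 they diverge, even though
CST1 & CST2 & CST3 is still zero when CST3 is 0.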
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b4b15acc41..deac18a7635 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1830,18 +1830,18 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && element_precision (type) <= element_precision (TREE_TYPE (@1)))
    (bit_not (rop (convert @0) (convert @1))))))
 
-/* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing
+/* If we are ORing or XORing two BIT_AND_EXPR's, both of which are and'ing
    with a constant, and the two constants have no bits in common,
-   we should treat this as a BIT_IOR_EXPR since this may produce more
+   we should treat this as a PLUS_EXPR since this may produce more
    simplifications.  */
-(for op (bit_xor plus)
+(for op (bit_ior bit_xor)
  (simplify
   (op (convert1? (bit_and@4 @0 INTEGER_CST@1))
       (convert2? (bit_and@5 @2 INTEGER_CST@3)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
        && tree_nop_conversion_p (type, TREE_TYPE (@2))
        && (wi::to_wide (@1) & wi::to_wide (@3)) == 0)
-   (bit_ior (convert @4) (convert @5)))))
+   (plus (convert @4) (convert @5)))))
 
 /* (X | Y) ^ X -> Y & ~ X*/
 (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr108477.c b/gcc/testsuite/gcc.target/i386/pr108477.c
new file mode 100644
index 00000000000..fb320a84c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108477.c
@@ -0,0 +1,13 @@ 
+/* PR target/108477 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+
+unsigned int foo (unsigned int a, unsigned int b)
+{
+  unsigned int r = a & 0x1;
+  unsigned int p = b & ~0x3;
+
+  return r + p + 2;
+}
+
+/* { dg-final { scan-assembler-not "orl" } } */