[to-be-committed,RISC-V,PR,rtl-optimization/80770] Simplify bit flipping operations down to xor
Checks
| Context |
Check |
Description |
| rivoscibot/toolchain-ci-rivos-apply-patch |
success
|
Patch applied
|
| rivoscibot/toolchain-ci-rivos-lint |
warning
|
Lint failed
|
| rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib |
success
|
Build passed
|
| rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib |
success
|
Build passed
|
| rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib |
success
|
Build passed
|
| rivoscibot/toolchain-ci-rivos-test |
success
|
Testing passed
|
Commit Message
So this is the target independent work to finish resolving pr80770.
It's a combination of Shreya's efforts and my own.
To recap, the basic idea is we want to simplify RTL blobs which
ultimately are just flipping a bit. Consider:
> (set (reg:DI 153)
> (ior:DI (and:DI (reg:DI 140 [ *s_4(D) ])
> (const_int 254 [0xfe]))
> (and:DI (not:DI (reg:DI 140 [ *s_4(D) ]))
> (const_int 1 [0x1]))))
The first operand of the IOR clears the low bit of the source register
leaving everything else unchanged. The second operand of the IOR clears
everything but the low bit and flips the low bit. When we IOR those
together we get the original value with the lowest bit flipped. The key
is to realize we have the same pseudo in both arms and there are no bits
in common for the constants. So this works for an arbitrary bit(s) as
long as the constants have the right form.
That gets us good code on riscv and almost certainly helps other
targets. There is another form which shows up on the H8 and possibly
other targets sub-word arithmetic. op0 and op1 are respectively:
> (gdb) p debug_rtx (op0)
> (and:QI (reg:QI 24 [ *s_4(D) ])
> (const_int 127 [0x7f]))
> $1 = void
> (gdb) p debug_rtx (op1)
> (plus:QI (and:QI (reg:QI 24 [ *s_4(D) ])
> (const_int -128 [0xffffffffffffff80]))
> (const_int -128 [0xffffffffffffff80]))
> $2 = void
Note we're in QImode. op1 just flips the highest QImode bit. If there
are carry-outs, we don't really care about them. The net is we can
capture that case on the H8 by verifying this form flips the highest bit
for the given mode. Otherwise the carry-outs are relevant and our
transformation is incorrect.
Plan is to commit Friday. While it has been tested with the usual
bootstraps as well as testing on various cross platforms, I'm more
comfortable giving folks time to take a looksie to see if Shreya or I
missed anything critical.
For the testcase in question before/afters look like this:
x86:
movzbl (%rdi), %eax
movl %eax, %edx
andl $-2, %eax
andl $1, %edx
xorl $1, %edx
orl %edx, %eax
movb %al, (%rdi)
Turns into:
xorb $1, (%rdi)
RISC-V:
lbu a5,0(a0)
andi a4,a5,1
xori a4,a4,1
andi a5,a5,-2
or a5,a5,a4
sb a5,0(a0)
Turns into:
lbu a5,0(a0)
xori a5,a5,1
sb a5,0(a0)
Jeff
PR rtl-optimization/80770
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Identify and optimize cases where an IOR is just a bit flip.
gcc/testsuite
* gcc.target/riscv/pr80770.c: New test.
Comments
Jeffrey Law <jeffrey.law@oss.qualcomm.com> writes:
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index bf625cdaf608..365bb9db1930 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3897,6 +3897,101 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> && negated_ops_p (XEXP (op0, 0), op1))
> return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>
> + /* (ior (and (A C1) (and (not (A) C2))) can be converted
Bracketing typo: (not A) rather than (not (A))
> + into (and (xor (A C2) (C1 + C2))) when there are no bits
and here: (and (xor A C2) (C1 + C2))
> + in common between C1 and C2. */
> + if (GET_CODE (op0) == AND
> + && GET_CODE (op1) == AND
> + && GET_CODE (XEXP (op1, 0)) == NOT
> + && rtx_equal_p (XEXP (op0, 0), XEXP (XEXP (op1, 0), 0))
> + && CONST_INT_P (XEXP (op0, 1))
> + && CONST_INT_P (XEXP (op1, 1))
> + && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (op0, 0), XEXP (op1, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> + if (tem)
> + return tem;
> + }
> + }
> +
> + /* Same thing, but operand order is reversed for the outer IOR. */
> + if (GET_CODE (op0) == AND
> + && GET_CODE (op1) == AND
> + && GET_CODE (XEXP (op0, 0)) == NOT
> + && rtx_equal_p (XEXP (op1, 0), XEXP (XEXP (op0, 0), 0))
> + && CONST_INT_P (XEXP (op0, 1))
> + && CONST_INT_P (XEXP (op1, 1))
> + && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (op1, 0), XEXP (op0, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> + if (tem)
> + return tem;
> + }
> + }
> +
> + /* Another variant seen on some backends, particularly those with
> + sub-word operations. For these cases we have to know there is no
> + carry from the PLUS into relevant bits. In practice that means
> + it's only valid for the uppermost bit. */
I think this is different enough that it's worth giving the expression
in the comment, i.e.
(ior (and A C1) (plus (and A C1) C2))
It took me a while to work that out from the condition :)
I wondered whether this one could/should be generalised to use
nonzero_bits, rather than checking specifically for (and A C1).
That is,
(ior X (plus X C))
(or the reverse, for simple X) is equivalent to (ior X C)
if the low bit of C is above the highest nonzero bit of X.
In the above example, that would give us:
(ior (and A C1) C2)
which admittedly is different from the expansion in the patch,
but is equally simple. This is one of those unfortunate cases where
there are many ways of writing the same thing...
The (ior X (plus X C)) with nonzero_bits thing works for xor or ior in
place of plus.
> + if (GET_CODE (op0) == AND
> + && GET_CODE (op1) == PLUS
> + && GET_CODE (XEXP (op1, 0)) == AND
> + && rtx_equal_p (XEXP (op0, 0), XEXP (XEXP (op1, 0), 0))
> + && CONST_INT_P (XEXP (op0, 1))
> + && CONST_INT_P (XEXP (op1, 1))
> + && CONST_INT_P (XEXP (XEXP (op1, 0), 1))
> + && INTVAL (XEXP (op1, 1)) == INTVAL (XEXP (XEXP (op1, 0), 1))
> + && GET_MODE_BITSIZE (GET_MODE (op1)).is_constant ()
> + && ((INTVAL (XEXP (op1, 1)) & GET_MODE_MASK (GET_MODE (op1)))
> + == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (op1)).to_constant () - 1))
> + && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (op0, 0), XEXP (op1, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> + if (tem)
> + return tem;
> + }
> + }
> +
> + /* And its variant with the operands of the outer AND reversed. */
> + if (GET_CODE (op1) == AND
> + && GET_CODE (op0) == PLUS
> + && GET_CODE (XEXP (op0, 0)) == AND
> + && rtx_equal_p (XEXP (op1, 0), XEXP (XEXP (op0, 0), 0))
> + && CONST_INT_P (XEXP (op1, 1))
> + && CONST_INT_P (XEXP (op0, 1))
> + && CONST_INT_P (XEXP (XEXP (op0, 0), 1))
> + && INTVAL (XEXP (op0, 1)) == INTVAL (XEXP (XEXP (op0, 0), 1))
> + && GET_MODE_BITSIZE (GET_MODE (op0)).is_constant ()
> + && ((INTVAL (XEXP (op0, 1)) & GET_MODE_MASK (GET_MODE (op0)))
> + == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (op0)).to_constant () - 1))
> + && (INTVAL (XEXP (op1, 1)) & INTVAL (XEXP (op0, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (op1, 1)) + INTVAL (XEXP (op0, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (op1, 0), XEXP (op0, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> + if (tem)
> + return tem;
> + }
> + }
> +
How about putting this in a for loop:
for (int i = 0; i < 2; ++i)
{
}
so that each transform needs to be written once? This seems like a
really large amount of code to cut-&-paste, and it would be easy for
mistakes to be made when editing in future.
Sorry for being awkward :)
Richard
> tem = simplify_with_subreg_not (code, mode, op0, op1);
> if (tem)
> return tem;
> diff --git a/gcc/testsuite/gcc.target/riscv/pr80770.c b/gcc/testsuite/gcc.target/riscv/pr80770.c
> new file mode 100644
> index 000000000000..4dafe3955f05
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr80770.c
> @@ -0,0 +1,150 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=gnu99" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" } } */
> +
> +
> +struct S {
> + _Bool b0: 1;
> + _Bool b1: 1;
> + _Bool b2: 1;
> + _Bool b3: 1;
> + _Bool b4: 1;
> + _Bool b5: 1;
> + _Bool b6: 1;
> + _Bool b7: 1;
> + _Bool b8: 1;
> + _Bool b9: 1;
> + _Bool b10: 1;
> + _Bool b11: 1;
> + _Bool b12: 1;
> + _Bool b13: 1;
> + _Bool b14: 1;
> + _Bool b15: 1;
> + _Bool b16: 1;
> + _Bool b17: 1;
> + _Bool b18: 1;
> + _Bool b19: 1;
> + _Bool b20: 1;
> + _Bool b21: 1;
> + _Bool b22: 1;
> + _Bool b23: 1;
> + _Bool b24: 1;
> + _Bool b25: 1;
> + _Bool b26: 1;
> + _Bool b27: 1;
> + _Bool b28: 1;
> + _Bool b29: 1;
> + _Bool b30: 1;
> + _Bool b31: 1;
> + _Bool b32: 1;
> + _Bool b33: 1;
> + _Bool b34: 1;
> + _Bool b35: 1;
> + _Bool b36: 1;
> + _Bool b37: 1;
> + _Bool b38: 1;
> + _Bool b39: 1;
> + _Bool b40: 1;
> + _Bool b41: 1;
> + _Bool b42: 1;
> + _Bool b43: 1;
> + _Bool b44: 1;
> + _Bool b45: 1;
> + _Bool b46: 1;
> + _Bool b47: 1;
> + _Bool b48: 1;
> + _Bool b49: 1;
> + _Bool b50: 1;
> + _Bool b51: 1;
> + _Bool b52: 1;
> + _Bool b53: 1;
> + _Bool b54: 1;
> + _Bool b55: 1;
> + _Bool b56: 1;
> + _Bool b57: 1;
> + _Bool b58: 1;
> + _Bool b59: 1;
> + _Bool b60: 1;
> + _Bool b61: 1;
> + _Bool b62: 1;
> + _Bool b63: 1;
> +};
> +
> +#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
> +
> +T(0)
> +T(1)
> +T(2)
> +T(3)
> +T(4)
> +T(5)
> +T(6)
> +T(7)
> +T(8)
> +T(9)
> +T(10)
> +T(11)
> +T(12)
> +T(13)
> +T(14)
> +T(15)
> +T(16)
> +T(17)
> +T(18)
> +T(19)
> +T(20)
> +T(21)
> +T(22)
> +T(23)
> +T(24)
> +T(25)
> +T(26)
> +T(27)
> +T(28)
> +T(29)
> +T(30)
> +T(31)
> +#if __riscv_xlen == 64
> +T(32)
> +T(33)
> +T(34)
> +T(35)
> +T(36)
> +T(37)
> +T(38)
> +T(39)
> +T(40)
> +T(41)
> +T(42)
> +T(43)
> +T(44)
> +T(45)
> +T(46)
> +T(47)
> +T(48)
> +T(49)
> +T(50)
> +T(51)
> +T(52)
> +T(53)
> +T(54)
> +T(55)
> +T(56)
> +T(57)
> +T(58)
> +T(59)
> +T(60)
> +T(61)
> +T(62)
> +T(63)
> +#endif
> +
> +/* { dg-final { scan-assembler-times "lbu\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "lbu\t" 32 { target rv32 } } } */
> +
> +/* { dg-final { scan-assembler-times "xori\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "xori\t" 32 { target rv32 } } } */
> +
> +
> +/* { dg-final { scan-assembler-times "sb\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "sb\t" 32 { target rv32 } } } */
On 5/6/2026 2:59 AM, Richard Sandiford wrote:
>> +
>> + /* Another variant seen on some backends, particularly those with
>> + sub-word operations. For these cases we have to know there is no
>> + carry from the PLUS into relevant bits. In practice that means
>> + it's only valid for the uppermost bit. */
> I think this is different enough that it's worth giving the expression
> in the comment, i.e.
>
> (ior (and A C1) (plus (and A C1) C2))
>
> It took me a while to work that out from the condition :)
Quite fair. I actually had to remind myself how it worked after I'd
been away from it for a few months. Both are a good sign that a fresh
comment is warranted.
>
> I wondered whether this one could/should be generalised to use
> nonzero_bits, rather than checking specifically for (and A C1).
> That is,
>
> (ior X (plus X C))
>
> (or the reverse, for simple X) is equivalent to (ior X C)
> if the low bit of C is above the highest nonzero bit of X.
> In the above example, that would give us:
>
> (ior (and A C1) C2)
>
> which admittedly is different from the expansion in the patch,
> but is equally simple. This is one of those unfortunate cases where
> there are many ways of writing the same thing...
>
> The (ior X (plus X C)) with nonzero_bits thing works for xor or ior in
> place of plus.
Seems like it's worth at least exploring to see if it's likely to trigger.
> How about putting this in a for loop:
>
> for (int i = 0; i < 2; ++i)
> {
>
> }
>
> so that each transform needs to be written once? This seems like a
> really large amount of code to cut-&-paste, and it would be easy for
> mistakes to be made when editing in future.
Yea, I'd pondered creating a couple local variables and doing a
std::swap based on some initial shape queries to reduce duplication. In
this form I wouldn't even need to do form testing. Just std::swap on the
2nd iteration and be done with it.
>
> Sorry for being awkward :)
Nothing to apologize for. I'm always happy to get your feedback.
Jeff
On 5/6/2026 2:59 AM, Richard Sandiford wrote:
> I think this is different enough that it's worth giving the expression
> in the comment, i.e.
>
> (ior (and A C1) (plus (and A C1) C2))
>
> It took me a while to work that out from the condition :)
So I fixed the two paren typos, adjusted the comment for the second pair
of cases, then collapsed from 4 cases down to 2 using a loop and
swapping of local variables.
>
> I wondered whether this one could/should be generalised to use
> nonzero_bits, rather than checking specifically for (and A C1).
> That is,
>
> (ior X (plus X C))
>
> (or the reverse, for simple X) is equivalent to (ior X C)
> if the low bit of C is above the highest nonzero bit of X.
> In the above example, that would give us:
>
> (ior (and A C1) C2)
>
> which admittedly is different from the expansion in the patch,
> but is equally simple. This is one of those unfortunate cases where
> there are many ways of writing the same thing...
>
> The (ior X (plus X C)) with nonzero_bits thing works for xor or ior in
> place of plus.
Yea, this can be shown with a trivial test:
>
> int
> foo(int x)
> {
> x &= 0xf;
> x |= (x + 0x80);
> return x;
> }
>
> int
> foo2(int x)
> {
> x &= 0xf;
> x |= (x | 0x80);
> return x;
> }
Which generates this on rv64:
> foo:
> andi a0,a0,15
> addi a5,a0,128
> or a0,a5,a0
> ret
> .size foo, .-foo
> .align 2
> .globl foo2
> .type foo2, @function
> foo2:
> andi a0,a0,15
> ori a0,a0,128
> ret
I think the transform you're looking for really just requires that there
be no bits in common between the nonzero bits in X and the constant.
If there's no common bits, then the PLUS or XOR can turn into an IOR.
That's trivial to include.
Here's the net. I also added a test for the H8 for giggles. H8 isn't
optimal as it's not using "bnot", but that's an independent target issue.
I've spot checked on x86 which looks good as well. For the main
testcase we're typically squashing 7 instructions down to a single xor
or addb RMW operation.
Further suggestions?
Jeff
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index ee3d9ec32082..08ae224847b1 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3900,6 +3900,83 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
&& negated_ops_p (XEXP (op0, 0), op1))
return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+ {
+ rtx top0 = op0;
+ rtx top1 = op1;
+ for (int i = 0; i < 2; i++)
+ {
+ /* The outer IOR is commutative so try op0/op1 in their
+ original position and reversed. */
+ if (i == 1)
+ std::swap (top0, top1);
+
+ /* (ior X (plus/xor X C)) can be simplified into (ior X C) when
+ X and C have no bits in common. */
+ if ((GET_CODE (top1) == PLUS || GET_CODE (top1) == XOR)
+ && rtx_equal_p (top0, XEXP (top1, 0))
+ && ((nonzero_bits (top0, GET_MODE (top0))
+ & nonzero_bits (XEXP (top1, 1), GET_MODE (top1))) == 0)
+ && !side_effects_p (top1))
+ return simplify_gen_binary (IOR, mode, top0, XEXP (top1, 1));
+
+ /* (ior (and A C1) (and (not A) C2)) can be converted
+ into (and (xor A C2) (C1 + C2)) when there are no bits
+ in common between C1 and C2. */
+ if (GET_CODE (top0) == AND
+ && GET_CODE (top1) == AND
+ && GET_CODE (XEXP (top1, 0)) == NOT
+ && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
+ && CONST_INT_P (XEXP (top0, 1))
+ && CONST_INT_P (XEXP (top1, 1))
+ && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP (top1, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+
+ if (tem)
+ return tem;
+ }
+ }
+
+ /* Another variant seen on some targets particularly those with
+ sub-word operations.
+
+ (ior (and A C1) (plus (and A C2) C2)) can be simplified into
+ (and (xor A C2) (C1 + C2)).
+
+ Where C2 is the sign bit for A's mode. So 0x80 for QI,
+ 0x8000 for HI, etc. In this case we know there is no carry
+ from the PLUS into relevant bits of the output. */
+ if (GET_CODE (top0) == AND
+ && GET_CODE (top1) == PLUS
+ && GET_CODE (XEXP (top1, 0)) == AND
+ && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
+ && CONST_INT_P (XEXP (top0, 1))
+ && CONST_INT_P (XEXP (top1, 1))
+ && CONST_INT_P (XEXP (XEXP (top1, 0), 1))
+ && INTVAL (XEXP (top1, 1)) == INTVAL (XEXP (XEXP (top1, 0), 1))
+ && GET_MODE_BITSIZE (GET_MODE (top1)).is_constant ()
+ && ((INTVAL (XEXP (top1, 1)) & GET_MODE_MASK (GET_MODE (top1)))
+ == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (top1)).to_constant () - 1))
+ && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP (top1, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+ if (tem)
+ return tem;
+ }
+ }
+ }
+ }
+
tem = simplify_with_subreg_not (code, mode, op0, op1);
if (tem)
return tem;
diff --git a/gcc/testsuite/gcc.target/h8300/pr80770-2.c b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
new file mode 100644
index 000000000000..d2b491ed24e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
+
+
+int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
+int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
+
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-not "xor" } } */
+/* { dg-final { scan-assembler-times "and" 2 } } */
+/* { dg-final { scan-assembler-times "or" 2 } } */
diff --git a/gcc/testsuite/gcc.target/h8300/pr80770.c b/gcc/testsuite/gcc.target/h8300/pr80770.c
new file mode 100644
index 000000000000..bccf2ff66c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/h8300/pr80770.c
@@ -0,0 +1,75 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
+
+
+struct S {
+ _Bool b0: 1;
+ _Bool b1: 1;
+ _Bool b2: 1;
+ _Bool b3: 1;
+ _Bool b4: 1;
+ _Bool b5: 1;
+ _Bool b6: 1;
+ _Bool b7: 1;
+ _Bool b8: 1;
+ _Bool b9: 1;
+ _Bool b10: 1;
+ _Bool b11: 1;
+ _Bool b12: 1;
+ _Bool b13: 1;
+ _Bool b14: 1;
+ _Bool b15: 1;
+ _Bool b16: 1;
+ _Bool b17: 1;
+ _Bool b18: 1;
+ _Bool b19: 1;
+ _Bool b20: 1;
+ _Bool b21: 1;
+ _Bool b22: 1;
+ _Bool b23: 1;
+ _Bool b24: 1;
+ _Bool b25: 1;
+ _Bool b26: 1;
+ _Bool b27: 1;
+ _Bool b28: 1;
+ _Bool b29: 1;
+ _Bool b30: 1;
+ _Bool b31: 1;
+};
+
+#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
+
+T(0)
+T(1)
+T(2)
+T(3)
+T(4)
+T(5)
+T(6)
+T(7)
+T(8)
+T(9)
+T(10)
+T(11)
+T(12)
+T(13)
+T(14)
+T(15)
+T(16)
+T(17)
+T(18)
+T(19)
+T(20)
+T(21)
+T(22)
+T(23)
+T(24)
+T(25)
+T(26)
+T(27)
+T(28)
+T(29)
+T(30)
+T(31)
+
+/* { dg-final { scan-assembler-times "xor\t|add.b\t|bnot\t" 32 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr80770-2.c b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
new file mode 100644
index 000000000000..1514a49c5605
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=gnu99" } */
+
+
+int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
+int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
+
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-not "xor" } } */
+/* { dg-final { scan-assembler-times "andi\t" 2 } } */
+/* { dg-final { scan-assembler-times "ori\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr80770.c b/gcc/testsuite/gcc.target/riscv/pr80770.c
new file mode 100644
index 000000000000..4dafe3955f05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr80770.c
@@ -0,0 +1,150 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=gnu99" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" } } */
+
+
+struct S {
+ _Bool b0: 1;
+ _Bool b1: 1;
+ _Bool b2: 1;
+ _Bool b3: 1;
+ _Bool b4: 1;
+ _Bool b5: 1;
+ _Bool b6: 1;
+ _Bool b7: 1;
+ _Bool b8: 1;
+ _Bool b9: 1;
+ _Bool b10: 1;
+ _Bool b11: 1;
+ _Bool b12: 1;
+ _Bool b13: 1;
+ _Bool b14: 1;
+ _Bool b15: 1;
+ _Bool b16: 1;
+ _Bool b17: 1;
+ _Bool b18: 1;
+ _Bool b19: 1;
+ _Bool b20: 1;
+ _Bool b21: 1;
+ _Bool b22: 1;
+ _Bool b23: 1;
+ _Bool b24: 1;
+ _Bool b25: 1;
+ _Bool b26: 1;
+ _Bool b27: 1;
+ _Bool b28: 1;
+ _Bool b29: 1;
+ _Bool b30: 1;
+ _Bool b31: 1;
+ _Bool b32: 1;
+ _Bool b33: 1;
+ _Bool b34: 1;
+ _Bool b35: 1;
+ _Bool b36: 1;
+ _Bool b37: 1;
+ _Bool b38: 1;
+ _Bool b39: 1;
+ _Bool b40: 1;
+ _Bool b41: 1;
+ _Bool b42: 1;
+ _Bool b43: 1;
+ _Bool b44: 1;
+ _Bool b45: 1;
+ _Bool b46: 1;
+ _Bool b47: 1;
+ _Bool b48: 1;
+ _Bool b49: 1;
+ _Bool b50: 1;
+ _Bool b51: 1;
+ _Bool b52: 1;
+ _Bool b53: 1;
+ _Bool b54: 1;
+ _Bool b55: 1;
+ _Bool b56: 1;
+ _Bool b57: 1;
+ _Bool b58: 1;
+ _Bool b59: 1;
+ _Bool b60: 1;
+ _Bool b61: 1;
+ _Bool b62: 1;
+ _Bool b63: 1;
+};
+
+#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
+
+T(0)
+T(1)
+T(2)
+T(3)
+T(4)
+T(5)
+T(6)
+T(7)
+T(8)
+T(9)
+T(10)
+T(11)
+T(12)
+T(13)
+T(14)
+T(15)
+T(16)
+T(17)
+T(18)
+T(19)
+T(20)
+T(21)
+T(22)
+T(23)
+T(24)
+T(25)
+T(26)
+T(27)
+T(28)
+T(29)
+T(30)
+T(31)
+#if __riscv_xlen == 64
+T(32)
+T(33)
+T(34)
+T(35)
+T(36)
+T(37)
+T(38)
+T(39)
+T(40)
+T(41)
+T(42)
+T(43)
+T(44)
+T(45)
+T(46)
+T(47)
+T(48)
+T(49)
+T(50)
+T(51)
+T(52)
+T(53)
+T(54)
+T(55)
+T(56)
+T(57)
+T(58)
+T(59)
+T(60)
+T(61)
+T(62)
+T(63)
+#endif
+
+/* { dg-final { scan-assembler-times "lbu\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "lbu\t" 32 { target rv32 } } } */
+
+/* { dg-final { scan-assembler-times "xori\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "xori\t" 32 { target rv32 } } } */
+
+
+/* { dg-final { scan-assembler-times "sb\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "sb\t" 32 { target rv32 } } } */
Jeffrey Law <jeffrey.law@oss.qualcomm.com> writes:
> On 5/6/2026 2:59 AM, Richard Sandiford wrote:
>> I think this is different enough that it's worth giving the expression
>> in the comment, i.e.
>>
>> (ior (and A C1) (plus (and A C1) C2))
>>
>> It took me a while to work that out from the condition :)
> So I fixed the two paren typos, adjusted the comment for the second pair
> of cases, then collapsed from 4 cases down to 2 using a loop and
> swapping of local variables.
>
>
>>
>> I wondered whether this one could/should be generalised to use
>> nonzero_bits, rather than checking specifically for (and A C1).
>> That is,
>>
>> (ior X (plus X C))
>>
>> (or the reverse, for simple X) is equivalent to (ior X C)
>> if the low bit of C is above the highest nonzero bit of X.
>> In the above example, that would give us:
>>
>> (ior (and A C1) C2)
>>
>> which admittedly is different from the expansion in the patch,
>> but is equally simple. This is one of those unfortunate cases where
>> there are many ways of writing the same thing...
>>
>> The (ior X (plus X C)) with nonzero_bits thing works for xor or ior in
>> place of plus.
> Yea, this can be shown with a trivial test:
>>
>> int
>> foo(int x)
>> {
>> x &= 0xf;
>> x |= (x + 0x80);
>> return x;
>> }
>>
>> int
>> foo2(int x)
>> {
>> x &= 0xf;
>> x |= (x | 0x80);
>> return x;
>> }
>
> Which generates this on rv64:
>
>> foo:
>> andi a0,a0,15
>> addi a5,a0,128
>> or a0,a5,a0
>> ret
>> .size foo, .-foo
>> .align 2
>> .globl foo2
>> .type foo2, @function
>> foo2:
>> andi a0,a0,15
>> ori a0,a0,128
>> ret
>
> I think the transform you're looking for really just requires that there
> be no bits in common between the nonzero bits in X and the constant.
> If there's no common bits, then the PLUS or XOR can turn into an IOR.
> That's trivial to include.
Ah, yeah.
> Here's the net. I also added a test for the H8 for giggles. H8 isn't
> optimal as it's not using "bnot", but that's an independent target issue.
>
> I've spot checked on x86 which looks good as well. For the main
> testcase we're typically squashing 7 instructions down to a single xor
> or addb RMW operation.
Nice!
> Further suggestions?
>
> Jeff
>
>
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index ee3d9ec32082..08ae224847b1 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3900,6 +3900,83 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> && negated_ops_p (XEXP (op0, 0), op1))
> return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>
> + {
> + rtx top0 = op0;
> + rtx top1 = op1;
> + for (int i = 0; i < 2; i++)
> + {
> + /* The outer IOR is commutative so try op0/op1 in their
> + original position and reversed. */
> + if (i == 1)
> + std::swap (top0, top1);
FWIW, it would also work to keep op0 and op1 and do the std::swap
at the end of the loop body. Maybe that would be less error prone
wrt top vs op.
> +
> + /* (ior X (plus/xor X C)) can be simplified into (ior X C) when
> + X and C have no bits in common. */
Maybe s/C/Y/ now that we're not requiring a constant.
> + if ((GET_CODE (top1) == PLUS || GET_CODE (top1) == XOR)
> + && rtx_equal_p (top0, XEXP (top1, 0))
> + && ((nonzero_bits (top0, GET_MODE (top0))
> + & nonzero_bits (XEXP (top1, 1), GET_MODE (top1))) == 0)
> + && !side_effects_p (top1))
> + return simplify_gen_binary (IOR, mode, top0, XEXP (top1, 1));
> +
> + /* (ior (and A C1) (and (not A) C2)) can be converted
> + into (and (xor A C2) (C1 + C2)) when there are no bits
> + in common between C1 and C2. */
> + if (GET_CODE (top0) == AND
> + && GET_CODE (top1) == AND
> + && GET_CODE (XEXP (top1, 0)) == NOT
> + && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
> + && CONST_INT_P (XEXP (top0, 1))
> + && CONST_INT_P (XEXP (top1, 1))
> + && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP (top1, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> +
> + if (tem)
> + return tem;
> + }
> + }
> +
> + /* Another variant seen on some targets particularly those with
> + sub-word operations.
> +
> + (ior (and A C1) (plus (and A C2) C2)) can be simplified into
> + (and (xor A C2) (C1 + C2)).
> +
> + Where C2 is the sign bit for A's mode. So 0x80 for QI,
> + 0x8000 for HI, etc. In this case we know there is no carry
> + from the PLUS into relevant bits of the output. */
> + if (GET_CODE (top0) == AND
> + && GET_CODE (top1) == PLUS
> + && GET_CODE (XEXP (top1, 0)) == AND
> + && rtx_equal_p (XEXP (top0, 0), XEXP (XEXP (top1, 0), 0))
> + && CONST_INT_P (XEXP (top0, 1))
> + && CONST_INT_P (XEXP (top1, 1))
> + && CONST_INT_P (XEXP (XEXP (top1, 0), 1))
> + && INTVAL (XEXP (top1, 1)) == INTVAL (XEXP (XEXP (top1, 0), 1))
> + && GET_MODE_BITSIZE (GET_MODE (top1)).is_constant ()
> + && ((INTVAL (XEXP (top1, 1)) & GET_MODE_MASK (GET_MODE (top1)))
> + == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (top1)).to_constant () - 1))
> + && (INTVAL (XEXP (top0, 1)) & INTVAL (XEXP (top1, 1))) == 0)
> + {
> + rtx c = GEN_INT (INTVAL (XEXP (top0, 1)) + INTVAL (XEXP (top1, 1)));
> +
> + tem = simplify_gen_binary (XOR, mode, XEXP (top0, 0), XEXP (top1, 1));
> + if (tem)
> + {
> + tem = simplify_gen_binary (AND, mode, tem, c);
> + if (tem)
> + return tem;
> + }
> + }
Do you still need the last transformation? I was hoping that
nonzero_bits would return C1 for (and A C1), and so the first test
above would handle it.
Otherwise LGTM FWIW.
Thanks,
Richard
> + }
> + }
> +
> tem = simplify_with_subreg_not (code, mode, op0, op1);
> if (tem)
> return tem;
> diff --git a/gcc/testsuite/gcc.target/h8300/pr80770-2.c b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
> new file mode 100644
> index 000000000000..d2b491ed24e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/h8300/pr80770-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
> +
> +
> +int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
> +int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
> +
> +/* { dg-final { scan-assembler-not "add" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-times "and" 2 } } */
> +/* { dg-final { scan-assembler-times "or" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/h8300/pr80770.c b/gcc/testsuite/gcc.target/h8300/pr80770.c
> new file mode 100644
> index 000000000000..bccf2ff66c66
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/h8300/pr80770.c
> @@ -0,0 +1,75 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99 -mint32" } */
> +
> +
> +struct S {
> + _Bool b0: 1;
> + _Bool b1: 1;
> + _Bool b2: 1;
> + _Bool b3: 1;
> + _Bool b4: 1;
> + _Bool b5: 1;
> + _Bool b6: 1;
> + _Bool b7: 1;
> + _Bool b8: 1;
> + _Bool b9: 1;
> + _Bool b10: 1;
> + _Bool b11: 1;
> + _Bool b12: 1;
> + _Bool b13: 1;
> + _Bool b14: 1;
> + _Bool b15: 1;
> + _Bool b16: 1;
> + _Bool b17: 1;
> + _Bool b18: 1;
> + _Bool b19: 1;
> + _Bool b20: 1;
> + _Bool b21: 1;
> + _Bool b22: 1;
> + _Bool b23: 1;
> + _Bool b24: 1;
> + _Bool b25: 1;
> + _Bool b26: 1;
> + _Bool b27: 1;
> + _Bool b28: 1;
> + _Bool b29: 1;
> + _Bool b30: 1;
> + _Bool b31: 1;
> +};
> +
> +#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
> +
> +T(0)
> +T(1)
> +T(2)
> +T(3)
> +T(4)
> +T(5)
> +T(6)
> +T(7)
> +T(8)
> +T(9)
> +T(10)
> +T(11)
> +T(12)
> +T(13)
> +T(14)
> +T(15)
> +T(16)
> +T(17)
> +T(18)
> +T(19)
> +T(20)
> +T(21)
> +T(22)
> +T(23)
> +T(24)
> +T(25)
> +T(26)
> +T(27)
> +T(28)
> +T(29)
> +T(30)
> +T(31)
> +
> +/* { dg-final { scan-assembler-times "xor\t|add.b\t|bnot\t" 32 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr80770-2.c b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
> new file mode 100644
> index 000000000000..1514a49c5605
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr80770-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=gnu99" } */
> +
> +
> +int foop(int x) { x &= 0xf; x |= (x + 0x80); return x; }
> +int foox(int x) { x &= 0xf; x |= (x ^ 0x80); return x; }
> +
> +/* { dg-final { scan-assembler-not "add" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-times "andi\t" 2 } } */
> +/* { dg-final { scan-assembler-times "ori\t" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr80770.c b/gcc/testsuite/gcc.target/riscv/pr80770.c
> new file mode 100644
> index 000000000000..4dafe3955f05
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr80770.c
> @@ -0,0 +1,150 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=gnu99" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" } } */
> +
> +
> +struct S {
> + _Bool b0: 1;
> + _Bool b1: 1;
> + _Bool b2: 1;
> + _Bool b3: 1;
> + _Bool b4: 1;
> + _Bool b5: 1;
> + _Bool b6: 1;
> + _Bool b7: 1;
> + _Bool b8: 1;
> + _Bool b9: 1;
> + _Bool b10: 1;
> + _Bool b11: 1;
> + _Bool b12: 1;
> + _Bool b13: 1;
> + _Bool b14: 1;
> + _Bool b15: 1;
> + _Bool b16: 1;
> + _Bool b17: 1;
> + _Bool b18: 1;
> + _Bool b19: 1;
> + _Bool b20: 1;
> + _Bool b21: 1;
> + _Bool b22: 1;
> + _Bool b23: 1;
> + _Bool b24: 1;
> + _Bool b25: 1;
> + _Bool b26: 1;
> + _Bool b27: 1;
> + _Bool b28: 1;
> + _Bool b29: 1;
> + _Bool b30: 1;
> + _Bool b31: 1;
> + _Bool b32: 1;
> + _Bool b33: 1;
> + _Bool b34: 1;
> + _Bool b35: 1;
> + _Bool b36: 1;
> + _Bool b37: 1;
> + _Bool b38: 1;
> + _Bool b39: 1;
> + _Bool b40: 1;
> + _Bool b41: 1;
> + _Bool b42: 1;
> + _Bool b43: 1;
> + _Bool b44: 1;
> + _Bool b45: 1;
> + _Bool b46: 1;
> + _Bool b47: 1;
> + _Bool b48: 1;
> + _Bool b49: 1;
> + _Bool b50: 1;
> + _Bool b51: 1;
> + _Bool b52: 1;
> + _Bool b53: 1;
> + _Bool b54: 1;
> + _Bool b55: 1;
> + _Bool b56: 1;
> + _Bool b57: 1;
> + _Bool b58: 1;
> + _Bool b59: 1;
> + _Bool b60: 1;
> + _Bool b61: 1;
> + _Bool b62: 1;
> + _Bool b63: 1;
> +};
> +
> +#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
> +
> +T(0)
> +T(1)
> +T(2)
> +T(3)
> +T(4)
> +T(5)
> +T(6)
> +T(7)
> +T(8)
> +T(9)
> +T(10)
> +T(11)
> +T(12)
> +T(13)
> +T(14)
> +T(15)
> +T(16)
> +T(17)
> +T(18)
> +T(19)
> +T(20)
> +T(21)
> +T(22)
> +T(23)
> +T(24)
> +T(25)
> +T(26)
> +T(27)
> +T(28)
> +T(29)
> +T(30)
> +T(31)
> +#if __riscv_xlen == 64
> +T(32)
> +T(33)
> +T(34)
> +T(35)
> +T(36)
> +T(37)
> +T(38)
> +T(39)
> +T(40)
> +T(41)
> +T(42)
> +T(43)
> +T(44)
> +T(45)
> +T(46)
> +T(47)
> +T(48)
> +T(49)
> +T(50)
> +T(51)
> +T(52)
> +T(53)
> +T(54)
> +T(55)
> +T(56)
> +T(57)
> +T(58)
> +T(59)
> +T(60)
> +T(61)
> +T(62)
> +T(63)
> +#endif
> +
> +/* { dg-final { scan-assembler-times "lbu\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "lbu\t" 32 { target rv32 } } } */
> +
> +/* { dg-final { scan-assembler-times "xori\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "xori\t" 32 { target rv32 } } } */
> +
> +
> +/* { dg-final { scan-assembler-times "sb\t" 64 { target rv64 } } } */
> +/* { dg-final { scan-assembler-times "sb\t" 32 { target rv32 } } } */
On 5/9/2026 3:24 PM, Richard Sandiford wrote:
>> Further suggestions?
>>
>> Jeff
>>
>>
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index ee3d9ec32082..08ae224847b1 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -3900,6 +3900,83 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>> && negated_ops_p (XEXP (op0, 0), op1))
>> return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>>
>> + {
>> + rtx top0 = op0;
>> + rtx top1 = op1;
>> + for (int i = 0; i < 2; i++)
>> + {
>> + /* The outer IOR is commutative so try op0/op1 in their
>> + original position and reversed. */
>> + if (i == 1)
>> + std::swap (top0, top1);
> FWIW, it would also work to keep op0 and op1 and do the std::swap
> at the end of the loop body. Maybe that would be less error prone
> wrt top vs op.
I was originally worried about early exits from the loop and thus I
wanted to leave op0/op1 alone. But the only early exits are returns, so
the state of op0/op1 doesn't matter. I could probably make a case for
either form. I suspect we're more likely to inadvertently use op0/op1
rather than the temporaries than we are to introduce a break in the
loop. Or maybe just put all the stuff into a function to avoid these
worries. I don't offhand think of a strong reason why this all needs to
be inlined.
>
>> +
>> + /* (ior X (plus/xor X C)) can be simplified into (ior X C) when
>> + X and C have no bits in common. */
> Maybe s/C/Y/ now that we're not requiring a constant.
Agreed. I'll change that too.
> Do you still need the last transformation? I was hoping that
> nonzero_bits would return C1 for (and A C1), and so the first test
> above would handle it.
I'd hoped for the same, but it was still necessary for some of the H8
cases. Though I did introduce a transient bug while adjusting things.
it's possible I'd tested when I had that transient bug in my tree (which
prevented the first transformation from firing). I'll retest just to be
sure.
>
> Otherwise LGTM FWIW.
It's worth a lot to me. I'll make the two additional adjustments sniff
test and commit.
Thanks!
jeff
Jeffrey Law <jeffrey.law@oss.qualcomm.com> writes:
> On 5/9/2026 3:24 PM, Richard Sandiford wrote:
>>> Further suggestions?
>>>
>>> Jeff
>>>
>>>
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index ee3d9ec32082..08ae224847b1 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -3900,6 +3900,83 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>>> && negated_ops_p (XEXP (op0, 0), op1))
>>> return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>>>
>>> + {
>>> + rtx top0 = op0;
>>> + rtx top1 = op1;
>>> + for (int i = 0; i < 2; i++)
>>> + {
>>> + /* The outer IOR is commutative so try op0/op1 in their
>>> + original position and reversed. */
>>> + if (i == 1)
>>> + std::swap (top0, top1);
>> FWIW, it would also work to keep op0 and op1 and do the std::swap
>> at the end of the loop body. Maybe that would be less error prone
>> wrt top vs op.
> I was originally worried about early exits from the loop and thus I
> wanted to leave op0/op1 alone. But the only early exits are returns, so
> the state of op0/op1 doesn't matter. I could probably make a case for
> either form. I suspect we're more likely to inadvertently use op0/op1
> rather than the temporaries than we are to introduce a break in the
> loop. Or maybe just put all the stuff into a function to avoid these
> worries. I don't offhand think of a strong reason why this all needs to
> be inlined.
Yeah. I suppose a lambda would keep it just as localised, but make it
harder for mistakes to creep in.
Richard
@@ -3897,6 +3897,101 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
&& negated_ops_p (XEXP (op0, 0), op1))
return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+ /* (ior (and (A C1) (and (not (A) C2))) can be converted
+ into (and (xor (A C2) (C1 + C2))) when there are no bits
+ in common between C1 and C2. */
+ if (GET_CODE (op0) == AND
+ && GET_CODE (op1) == AND
+ && GET_CODE (XEXP (op1, 0)) == NOT
+ && rtx_equal_p (XEXP (op0, 0), XEXP (XEXP (op1, 0), 0))
+ && CONST_INT_P (XEXP (op0, 1))
+ && CONST_INT_P (XEXP (op1, 1))
+ && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (op0, 0), XEXP (op1, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+ if (tem)
+ return tem;
+ }
+ }
+
+ /* Same thing, but operand order is reversed for the outer IOR. */
+ if (GET_CODE (op0) == AND
+ && GET_CODE (op1) == AND
+ && GET_CODE (XEXP (op0, 0)) == NOT
+ && rtx_equal_p (XEXP (op1, 0), XEXP (XEXP (op0, 0), 0))
+ && CONST_INT_P (XEXP (op0, 1))
+ && CONST_INT_P (XEXP (op1, 1))
+ && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (op1, 0), XEXP (op0, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+ if (tem)
+ return tem;
+ }
+ }
+
+ /* Another variant seen on some backends, particularly those with
+ sub-word operations. For these cases we have to know there is no
+ carry from the PLUS into relevant bits. In practice that means
+ it's only valid for the uppermost bit. */
+ if (GET_CODE (op0) == AND
+ && GET_CODE (op1) == PLUS
+ && GET_CODE (XEXP (op1, 0)) == AND
+ && rtx_equal_p (XEXP (op0, 0), XEXP (XEXP (op1, 0), 0))
+ && CONST_INT_P (XEXP (op0, 1))
+ && CONST_INT_P (XEXP (op1, 1))
+ && CONST_INT_P (XEXP (XEXP (op1, 0), 1))
+ && INTVAL (XEXP (op1, 1)) == INTVAL (XEXP (XEXP (op1, 0), 1))
+ && GET_MODE_BITSIZE (GET_MODE (op1)).is_constant ()
+ && ((INTVAL (XEXP (op1, 1)) & GET_MODE_MASK (GET_MODE (op1)))
+ == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (op1)).to_constant () - 1))
+ && (INTVAL (XEXP (op0, 1)) & INTVAL (XEXP (op1, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (op0, 1)) + INTVAL (XEXP (op1, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (op0, 0), XEXP (op1, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+ if (tem)
+ return tem;
+ }
+ }
+
+ /* And its variant with the operands of the outer AND reversed. */
+ if (GET_CODE (op1) == AND
+ && GET_CODE (op0) == PLUS
+ && GET_CODE (XEXP (op0, 0)) == AND
+ && rtx_equal_p (XEXP (op1, 0), XEXP (XEXP (op0, 0), 0))
+ && CONST_INT_P (XEXP (op1, 1))
+ && CONST_INT_P (XEXP (op0, 1))
+ && CONST_INT_P (XEXP (XEXP (op0, 0), 1))
+ && INTVAL (XEXP (op0, 1)) == INTVAL (XEXP (XEXP (op0, 0), 1))
+ && GET_MODE_BITSIZE (GET_MODE (op0)).is_constant ()
+ && ((INTVAL (XEXP (op0, 1)) & GET_MODE_MASK (GET_MODE (op0)))
+ == HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (GET_MODE (op0)).to_constant () - 1))
+ && (INTVAL (XEXP (op1, 1)) & INTVAL (XEXP (op0, 1))) == 0)
+ {
+ rtx c = GEN_INT (INTVAL (XEXP (op1, 1)) + INTVAL (XEXP (op0, 1)));
+
+ tem = simplify_gen_binary (XOR, mode, XEXP (op1, 0), XEXP (op0, 1));
+ if (tem)
+ {
+ tem = simplify_gen_binary (AND, mode, tem, c);
+ if (tem)
+ return tem;
+ }
+ }
+
tem = simplify_with_subreg_not (code, mode, op0, op1);
if (tem)
return tem;
new file mode 100644
@@ -0,0 +1,150 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=gnu99" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" } } */
+
+
+struct S {
+ _Bool b0: 1;
+ _Bool b1: 1;
+ _Bool b2: 1;
+ _Bool b3: 1;
+ _Bool b4: 1;
+ _Bool b5: 1;
+ _Bool b6: 1;
+ _Bool b7: 1;
+ _Bool b8: 1;
+ _Bool b9: 1;
+ _Bool b10: 1;
+ _Bool b11: 1;
+ _Bool b12: 1;
+ _Bool b13: 1;
+ _Bool b14: 1;
+ _Bool b15: 1;
+ _Bool b16: 1;
+ _Bool b17: 1;
+ _Bool b18: 1;
+ _Bool b19: 1;
+ _Bool b20: 1;
+ _Bool b21: 1;
+ _Bool b22: 1;
+ _Bool b23: 1;
+ _Bool b24: 1;
+ _Bool b25: 1;
+ _Bool b26: 1;
+ _Bool b27: 1;
+ _Bool b28: 1;
+ _Bool b29: 1;
+ _Bool b30: 1;
+ _Bool b31: 1;
+ _Bool b32: 1;
+ _Bool b33: 1;
+ _Bool b34: 1;
+ _Bool b35: 1;
+ _Bool b36: 1;
+ _Bool b37: 1;
+ _Bool b38: 1;
+ _Bool b39: 1;
+ _Bool b40: 1;
+ _Bool b41: 1;
+ _Bool b42: 1;
+ _Bool b43: 1;
+ _Bool b44: 1;
+ _Bool b45: 1;
+ _Bool b46: 1;
+ _Bool b47: 1;
+ _Bool b48: 1;
+ _Bool b49: 1;
+ _Bool b50: 1;
+ _Bool b51: 1;
+ _Bool b52: 1;
+ _Bool b53: 1;
+ _Bool b54: 1;
+ _Bool b55: 1;
+ _Bool b56: 1;
+ _Bool b57: 1;
+ _Bool b58: 1;
+ _Bool b59: 1;
+ _Bool b60: 1;
+ _Bool b61: 1;
+ _Bool b62: 1;
+ _Bool b63: 1;
+};
+
+#define T(N) void fb##N (struct S *s) { s->b##N = !s->b##N; }
+
+T(0)
+T(1)
+T(2)
+T(3)
+T(4)
+T(5)
+T(6)
+T(7)
+T(8)
+T(9)
+T(10)
+T(11)
+T(12)
+T(13)
+T(14)
+T(15)
+T(16)
+T(17)
+T(18)
+T(19)
+T(20)
+T(21)
+T(22)
+T(23)
+T(24)
+T(25)
+T(26)
+T(27)
+T(28)
+T(29)
+T(30)
+T(31)
+#if __riscv_xlen == 64
+T(32)
+T(33)
+T(34)
+T(35)
+T(36)
+T(37)
+T(38)
+T(39)
+T(40)
+T(41)
+T(42)
+T(43)
+T(44)
+T(45)
+T(46)
+T(47)
+T(48)
+T(49)
+T(50)
+T(51)
+T(52)
+T(53)
+T(54)
+T(55)
+T(56)
+T(57)
+T(58)
+T(59)
+T(60)
+T(61)
+T(62)
+T(63)
+#endif
+
+/* { dg-final { scan-assembler-times "lbu\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "lbu\t" 32 { target rv32 } } } */
+
+/* { dg-final { scan-assembler-times "xori\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "xori\t" 32 { target rv32 } } } */
+
+
+/* { dg-final { scan-assembler-times "sb\t" 64 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "sb\t" 32 { target rv32 } } } */