RISC-V: zero_extend(not) -> xor optimization [PR112398]
Checks
Commit Message
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
... where mask takes 0xffff bits of N mode bitsize
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M, 0xffff))
Patch targets to handle code patterns like:
not a0,a0
andi a0,a0,0xff
to be optimized to:
xori a0,a0,255
Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this
optimization):
no regressions for all cases.
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
---
gcc/simplify-rtx.cc | 23 +++++++++++++++++++++++
gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
2 files changed, 37 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c
Comments
On 11/2/24 2:58 AM, Alexey Merzlyakov wrote:
> This patch adds optimization of the following patterns:
>
> (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
> (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
> ... where mask takes 0xffff bits of N mode bitsize
>
> For the cases when X:M doesn't have any non-zero bits outside of mode N,
> (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
> and whole optimization will be:
>
> (zero_extend:M (subreg:N (not:M (X:M)))) ->
> (xor:M (X:M, 0xffff))
>
> Patch targets to handle code patterns like:
> not a0,a0
> andi a0,a0,0xff
> to be optimized to:
> xori a0,a0,255
Just to be clear we don't need to worry about paradoxical subregs in
this case. To have a paradoxical N would have to be larger than M which
would be invalid for the zero_extend expression.
But as your implementation shows, we do need to be somewhat careful
about bits in X outside of mode M that may be set. Concretely let's
take a HImode inner object with a QImode narrowing subreg
(zero_extend:HI (subreg:QI (not:HI (X:HI)))) where X has the value 0x8000.
That expression says unambiguously that the result is 0xff, even though
the subreg indicates bits outside QI are don't cares, the outer
zero_extend sets them to 0. If we naively changed this to the xor X,
0xff form we'd generate 0x80ff, which would be incorrect.
>
> Change was locally tested for x86_64 and AArch64 (as most common)
> and for RV-64 and MIPS-32 targets (as having an effect from this
> optimization):
> no regressions for all cases.
>
> gcc/ChangeLog:
> * simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
> Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
> doesn't have any non-zero bits outside of SUBREG mode.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/pr112398.c: New test.
>
> Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
> ---
> gcc/simplify-rtx.cc | 23 +++++++++++++++++++++++
> gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
> 2 files changed, 37 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 2c04ce960ee..608ecedbcb8 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1
> (rtx_code code, machine_mode mode,
> & ~GET_MODE_MASK (op_mode)) == 0)
> return SUBREG_REG (op);
>
> + /* Trying to optimize:
> + (zero_extend:M (subreg:N (not:M (X:M)))) ->
> + (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
> + where mask takes 0xffff bits of N mode bitsize.
"where the mask is GET_MODE_MASK (N)" is probably clearer to folks that
have been working on GCC for a while. GET_MODE_MASK is the bits set in
mode N. So for QI -> 0xff, HI would be 0xffff and so-on.
> + For the cases when X:M doesn't have any non-zero bits
> + outside of mode N, (zero_extend:M (subreg:N (X:M))
> + will be simplified to just (X:M)
> + and whole optimization will be -> (xor:M (X:M), 0xffff). */
> + if (GET_CODE (op) == SUBREG
Write this as "SUBREG_P (op)"
> + && GET_CODE (XEXP (op, 0)) == NOT
> + && GET_MODE (XEXP (op, 0)) == mode
> + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
I suspect the last mode check is redundant. We're not supposed to have
the argument of a code like NOT have a different mode than the code.
> + && subreg_lowpart_p (op)
Normally you'd probably need to check !paradoxical_subreg_p as well, but
that case can't happen here because the mode of the outer zero_extend
and thus the inner not must be wider than the mode of the subreg.
> + && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
> + & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
Isn't GET_MODE (XEXP (XEXP (op, 0), 0)) the same as "mode"? Use the
value in the local variable, it's easier to read IMHO.
I suspect the formatting is off as well. We format conditionals like this:
if (a
&& b
&& c)
Note how the "&&" lines up under the first argument to the IF.
> + {
> + const uint64_t mask
> + = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
Probably shouldn't be looking directly at .coeffs field. Instead there
are methods to convert that value to an integer. Look for the
"to_constant ()" method. so
GET_MODE_SIZE (GET_MODE (op)).to_constant ()
Note that this transformation may not work for modes with nonconstant
sizes. So you might need to check the is_constant () method.
> + return simplify_gen_binary (XOR, mode,
> + XEXP (XEXP (op, 0), 0),
> + gen_int_mode (mask, mode));
Formatting looks goofy here too.
Line up each argument under the argument in the prior line ie
foo (XOR, mode
XEXP (XEXP (op, 0), 0)
get_int_mode (mask, mode))
Overall it looks pretty good, but there are a few things you need to fix
up and repost.
Jeff
@@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1
(rtx_code code, machine_mode mode,
& ~GET_MODE_MASK (op_mode)) == 0)
return SUBREG_REG (op);
+ /* Trying to optimize:
+ (zero_extend:M (subreg:N (not:M (X:M)))) ->
+ (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
+ where mask takes 0xffff bits of N mode bitsize.
+ For the cases when X:M doesn't have any non-zero bits
+ outside of mode N, (zero_extend:M (subreg:N (X:M))
+ will be simplified to just (X:M)
+ and whole optimization will be -> (xor:M (X:M), 0xffff). */
+ if (GET_CODE (op) == SUBREG
+ && GET_CODE (XEXP (op, 0)) == NOT
+ && GET_MODE (XEXP (op, 0)) == mode
+ && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+ && subreg_lowpart_p (op)
+ && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
+ & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
+ {
+ const uint64_t mask
+ = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
+ return simplify_gen_binary (XOR, mode,
+ XEXP (XEXP (op, 0), 0),
+ gen_int_mode (mask, mode));
+ }
+
#if defined(POINTERS_EXTEND_UNSIGNED)
/* As we do not know which address space the pointer is
referring to,
we can do this only if the target does not support different pointer
b/gcc/testsuite/gcc.target/riscv/pr112398.c
new file mode 100644
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include <stdint.h>
+
+uint8_t neg_u8 (const uint8_t src)
+{
+ return ~src;
+}
+
+/* { dg-final { scan-assembler-times "xori\t" 1 } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */