RISC-V: zero_extend(not) -> xor optimization [PR112398]

Message ID 3ebcb202-d6e5-4368-9217-033345fa308a@samsung.com
State Superseded
Delegated to: Jeff Law
Headers
Series RISC-V: zero_extend(not) -> xor optimization [PR112398] |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-apply-patch fail Patch failed to apply
rivoscibot/toolchain-ci-rivos-lint warning Lint failed
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Alexey Merzlyakov Nov. 2, 2024, 8:58 a.m. UTC
  This patch adds optimization of the following patterns:

   (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
   (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
     ... where mask takes 0xffff bits of N mode bitsize

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:

   (zero_extend:M (subreg:N (not:M (X:M)))) ->
   (xor:M (X:M, 0xffff))

Patch targets to handle code patterns like:
   not   a0,a0
   andi  a0,a0,0xff
to be optimized to:
   xori    a0,a0,255

Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this 
optimization):
no regressions for all cases.

gcc/ChangeLog:
     * simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
     Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
     doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:
     * gcc.target/riscv/pr112398.c: New test.

Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
---
  gcc/simplify-rtx.cc                       | 23 +++++++++++++++++++++++
  gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
  2 files changed, 37 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c
  

Comments

Jeff Law Nov. 4, 2024, 11:48 p.m. UTC | #1
On 11/2/24 2:58 AM, Alexey Merzlyakov wrote:
> This patch adds optimization of the following patterns:
> 
>     (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
>     (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
>       ... where mask takes 0xffff bits of N mode bitsize
> 
> For the cases when X:M doesn't have any non-zero bits outside of mode N,
> (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
> and whole optimization will be:
> 
>     (zero_extend:M (subreg:N (not:M (X:M)))) ->
>     (xor:M (X:M, 0xffff))
> 
> Patch targets to handle code patterns like:
>     not   a0,a0
>     andi  a0,a0,0xff
> to be optimized to:
>     xori    a0,a0,255
Just to be clear we don't need to worry about paradoxical subregs in 
this case.  To have a paradoxical N would have to be larger than M which 
would be invalid for the zero_extend expression.

But as your implementation shows, we do need to be somewhat careful 
about bits in X outside of mode M that may be set.  Concretely let's 
take a HImode inner object with a QImode narrowing subreg

(zero_extend:HI (subreg:QI (not:HI (X:HI)))) where X has the value 0x8000.

That expression says unambiguously that the result is 0xff, even though 
the subreg indicates bits outside QI are don't cares, the outer 
zero_extend sets them to 0.  If we naively changed this to the xor X, 
0xff form we'd generate 0x80ff, which would be incorrect.



> 
> Change was locally tested for x86_64 and AArch64 (as most common)
> and for RV-64 and MIPS-32 targets (as having an effect from this
> optimization):
> no regressions for all cases.
> 
> gcc/ChangeLog:
>       * simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
>       Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, 0xff...f) when X
>       doesn't have any non-zero bits outside of SUBREG mode.
> 
> gcc/testsuite/ChangeLog:
>       * gcc.target/riscv/pr112398.c: New test.
> 
> Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>
> ---
>    gcc/simplify-rtx.cc                       | 23 +++++++++++++++++++++++
>    gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++++++++++++++
>    2 files changed, 37 insertions(+)
>    create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c
> 
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 2c04ce960ee..608ecedbcb8 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1
> (rtx_code code, machine_mode mode,
>              & ~GET_MODE_MASK (op_mode)) == 0)
>        return SUBREG_REG (op);
> 
> +      /* Trying to optimize:
> +     (zero_extend:M (subreg:N (not:M (X:M)))) ->
> +     (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
> +     where mask takes 0xffff bits of N mode bitsize.
"where the mask is GET_MODE_MASK (N)" is probably clearer to folks that 
have been working on GCC for a while.  GET_MODE_MASK is the bits set in 
mode N.  So for QI -> 0xff, HI would be 0xffff and so-on.


> +     For the cases when X:M doesn't have any non-zero bits
> +     outside of mode N, (zero_extend:M (subreg:N (X:M))
> +     will be simplified to just (X:M)
> +     and whole optimization will be -> (xor:M (X:M), 0xffff). */
> +      if (GET_CODE (op) == SUBREG
Write this as "SUBREG_P (op)"

> +      && GET_CODE (XEXP (op, 0)) == NOT
> +      && GET_MODE (XEXP (op, 0)) == mode
> +      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
I suspect the last mode check is redundant.   We're not supposed to have 
the argument of a code like NOT have a different mode than the code.


> +      && subreg_lowpart_p (op)
Normally you'd probably need to check !paradoxical_subreg_p as well, but 
that case can't happen here because the mode of the outer zero_extend 
and thus the inner not must be wider than the mode of the subreg.


> +      && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
> +          & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
Isn't GET_MODE (XEXP (XEXP (op, 0), 0)) the same as "mode"?  Use the 
value in the local variable, it's easier to read IMHO.

I suspect the formatting is off as well.  We format conditionals like this:

if (a
     && b
     && c)

Note how the "&&" lines up under the first argument to the IF.


> +      {
> +    const uint64_t mask
> +      = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
Probably shouldn't be looking directly at .coeffs field.  Instead there 
are methods to convert that value to an integer.  Look for the 
"to_constant ()" method.  so
GET_MODE_SIZE (GET_MODE (op)).to_constant ()

Note that this transformation may not work for modes with nonconstant 
sizes.  So you might need to check the is_constant () method.

> +    return simplify_gen_binary (XOR, mode,
> +                    XEXP (XEXP (op, 0), 0),
> +                    gen_int_mode (mask, mode));
Formatting looks goofy here too.

Line up each argument under the argument in the prior line ie

   foo (XOR, mode
        XEXP (XEXP (op, 0), 0)
        get_int_mode (mask, mode))


Overall it looks pretty good, but there are a few things you need to fix 
up and repost.


Jeff
  

Patch

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 2c04ce960ee..608ecedbcb8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1842,6 +1842,29 @@  simplify_context::simplify_unary_operation_1 
(rtx_code code, machine_mode mode,
            & ~GET_MODE_MASK (op_mode)) == 0)
      return SUBREG_REG (op);

+      /* Trying to optimize:
+     (zero_extend:M (subreg:N (not:M (X:M)))) ->
+     (xor:M (zero_extend:M (subreg:N (X:M)), 0xffff))
+     where mask takes 0xffff bits of N mode bitsize.
+     For the cases when X:M doesn't have any non-zero bits
+     outside of mode N, (zero_extend:M (subreg:N (X:M))
+     will be simplified to just (X:M)
+     and whole optimization will be -> (xor:M (X:M), 0xffff). */
+      if (GET_CODE (op) == SUBREG
+      && GET_CODE (XEXP (op, 0)) == NOT
+      && GET_MODE (XEXP (op, 0)) == mode
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && subreg_lowpart_p (op)
+      && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
+          & ~GET_MODE_MASK (GET_MODE (XEXP (XEXP (op, 0), 0)))) == 0)
+      {
+    const uint64_t mask
+      = ~((uint64_t)~0 << GET_MODE_BITSIZE (GET_MODE (op)).coeffs[0]);
+    return simplify_gen_binary (XOR, mode,
+                    XEXP (XEXP (op, 0), 0),
+                    gen_int_mode (mask, mode));
+      }
+
  #if defined(POINTERS_EXTEND_UNSIGNED)
        /* As we do not know which address space the pointer is 
referring to,
       we can do this only if the target does not support different pointer
diff --git a/gcc/testsuite/gcc.target/riscv/pr112398.c 
b/gcc/testsuite/gcc.target/riscv/pr112398.c
new file mode 100644
index 00000000000..624a665b76c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr112398.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include <stdint.h>
+
+uint8_t neg_u8 (const uint8_t src)
+{
+  return ~src;
+}
+
+/* { dg-final { scan-assembler-times "xori\t" 1 } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */