RISC-V: Optimize min/max with SImode sources on 64-bit

Message ID 20221228181817.193462-1-rzinsly@ventanamicro.com
State Deferred, archived
Headers
Series RISC-V: Optimize min/max with SImode sources on 64-bit |

Commit Message

Raphael Moreira Zinsly Dec. 28, 2022, 6:18 p.m. UTC
  The Zbb min/max pattern was not matching 32-bit sources when
compiling for 64-bit.
This patch separates the pattern into SImode and DImode, and
use a define_expand to handle SImode on 64-bit.
zbb-min-max-02.c generates different code as a result of the new
expander.  The resulting code is as efficient as the old code.
Furthermore, the special sh1add pattern that appeared in
zbb-min-max-02.c is tested by the zba-shNadd-* tests.

	gcc/ChangeLog:

		* config/riscv/bitmanip.md
		(<bitmanip_optab><mode>3): Divide pattern into
		<bitmanip_optab>si3_insn and <bitmanip_optab>di3.
		(<bitmanip_optab>si3): Handle SImode sources on
		TARGET_64BIT.

	gcc/testsuite:

		* gcc.target/riscv/zbb-abs.c: New test.
		* gcc.target/riscv/zbb-min-max-02.c: Addapt the
		expected output.
---
 gcc/config/riscv/bitmanip.md                  | 38 ++++++++++++++++---
 gcc/testsuite/gcc.target/riscv/zbb-abs.c      | 18 +++++++++
 .../gcc.target/riscv/zbb-min-max-02.c         |  2 +-
 3 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
  

Comments

Jeff Law Dec. 29, 2022, 1:36 a.m. UTC | #1
On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> use a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander.  The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
> 
> 	gcc/ChangeLog:
> 
> 		* config/riscv/bitmanip.md
> 		(<bitmanip_optab><mode>3): Divide pattern into
> 		<bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> 		(<bitmanip_optab>si3): Handle SImode sources on
> 		TARGET_64BIT.
> 
> 	gcc/testsuite:
> 
> 		* gcc.target/riscv/zbb-abs.c: New test.
> 		* gcc.target/riscv/zbb-min-max-02.c: Addapt the
> 		expected output.
So we need to do a bit of x86 debugging.

Given the regressions below on the x86 testsuite, we should assume there 
may be other targets where the optimization might result in testsuite 
regressions.

The good news is we can can use my upstream GCC tester to help identify 
some of these cases.  So I'll put the simplify-rtx change into my tester 
and see what pops out overnight on the embedded targets.

You're also missing a ChangeLog entry for the simplify-rtx change. 
Sorry I didn't catch that sooner.



--- /home/jlaw/gcc.sum  2022-12-28 15:59:52.612140312 -0700
+++ gcc/testsuite/gcc/gcc.sum   2022-12-28 17:14:52.661333526 -0700
@@ -182730,9 +182730,9 @@
  PASS: gcc.target/i386/pr106095.c (test for excess errors)
  PASS: gcc.target/i386/pr106122.c (test for excess errors)
  PASS: gcc.target/i386/pr106231-1.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr106231-2.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr106273.c (test for excess errors)
  PASS: gcc.target/i386/pr106273.c scan-assembler-not andn[ \\t]+%rdi, 
%r11, %rdi
  PASS: gcc.target/i386/pr106303.c (test for excess errors)
@@ -186148,7 +186148,7 @@
  PASS: gcc.target/i386/pr91704.c scan-assembler-not \tvpsubusb\t
  PASS: gcc.target/i386/pr91704.c scan-assembler-times \tvpcmpgtb\t%ymm 1
  PASS: gcc.target/i386/pr91824-1.c (test for excess errors)
-PASS: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr91824-2.c (test for excess errors)
  PASS: gcc.target/i386/pr91824-2.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr91824-2.c scan-assembler-not movl\t%eax, %eax
@@ -186561,9 +186561,9 @@
  PASS: gcc.target/i386/pr95483-7.c (test for excess errors)
  PASS: gcc.target/i386/pr95483-7.c scan-assembler-times vmovdqu[ 
\\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 2
  PASS: gcc.target/i386/pr95535-1.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr95535-2.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
  PASS: gcc.target/i386/pr95740.c (test for excess errors)
  PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incl[\\t ]*%eax 1
  PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incq[\\t ]*%rax 1
  
Raphael Moreira Zinsly Dec. 29, 2022, 12:23 p.m. UTC | #2
On Wed, Dec 28, 2022 at 10:36 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
> > The Zbb min/max pattern was not matching 32-bit sources when
> > compiling for 64-bit.
> > This patch separates the pattern into SImode and DImode, and
> > use a define_expand to handle SImode on 64-bit.
> > zbb-min-max-02.c generates different code as a result of the new
> > expander.  The resulting code is as efficient as the old code.
> > Furthermore, the special sh1add pattern that appeared in
> > zbb-min-max-02.c is tested by the zba-shNadd-* tests.
> >
> >       gcc/ChangeLog:
> >
> >               * config/riscv/bitmanip.md
> >               (<bitmanip_optab><mode>3): Divide pattern into
> >               <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> >               (<bitmanip_optab>si3): Handle SImode sources on
> >               TARGET_64BIT.
> >
> >       gcc/testsuite:
> >
> >               * gcc.target/riscv/zbb-abs.c: New test.
> >               * gcc.target/riscv/zbb-min-max-02.c: Addapt the
> >               expected output.
> So we need to do a bit of x86 debugging.
>
> Given the regressions below on the x86 testsuite, we should assume there
> may be other targets where the optimization might result in testsuite
> regressions.

Thanks for checking it!

> The good news is we can can use my upstream GCC tester to help identify
> some of these cases.  So I'll put the simplify-rtx change into my tester
> and see what pops out overnight on the embedded targets.
>
> You're also missing a ChangeLog entry for the simplify-rtx change.
> Sorry I didn't catch that sooner.

There is no simplify-rtx change in this patch, I think you may be
mixing my patches ;-)
  
Jeff Law Dec. 29, 2022, 2:18 p.m. UTC | #3
On 12/29/22 05:23, Raphael Zinsly wrote:
> On Wed, Dec 28, 2022 at 10:36 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
>>> The Zbb min/max pattern was not matching 32-bit sources when
>>> compiling for 64-bit.
>>> This patch separates the pattern into SImode and DImode, and
>>> use a define_expand to handle SImode on 64-bit.
>>> zbb-min-max-02.c generates different code as a result of the new
>>> expander.  The resulting code is as efficient as the old code.
>>> Furthermore, the special sh1add pattern that appeared in
>>> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>>>
>>>        gcc/ChangeLog:
>>>
>>>                * config/riscv/bitmanip.md
>>>                (<bitmanip_optab><mode>3): Divide pattern into
>>>                <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
>>>                (<bitmanip_optab>si3): Handle SImode sources on
>>>                TARGET_64BIT.
>>>
>>>        gcc/testsuite:
>>>
>>>                * gcc.target/riscv/zbb-abs.c: New test.
>>>                * gcc.target/riscv/zbb-min-max-02.c: Addapt the
>>>                expected output.
>> So we need to do a bit of x86 debugging.
>>
>> Given the regressions below on the x86 testsuite, we should assume there
>> may be other targets where the optimization might result in testsuite
>> regressions.
> 
> Thanks for checking it!
lol, you're right.  I'm mushing together your stuff in my head!  So 
we've got a couple TODOs for the ctz/clz improvements before they can be 
submitted ;-)

jeff
  
Philipp Tomsich Dec. 29, 2022, 2:50 p.m. UTC | #4
On Wed, 28 Dec 2022 at 19:18, Raphael Moreira Zinsly <
rzinsly@ventanamicro.com> wrote:

> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> use a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander.  The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>
>         gcc/ChangeLog:
>
>                 * config/riscv/bitmanip.md
>                 (<bitmanip_optab><mode>3): Divide pattern into
>                 <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
>                 (<bitmanip_optab>si3): Handle SImode sources on
>                 TARGET_64BIT.
>
>         gcc/testsuite:
>
>                 * gcc.target/riscv/zbb-abs.c: New test.
>                 * gcc.target/riscv/zbb-min-max-02.c: Addapt the
>                 expected output.
> ---
>  gcc/config/riscv/bitmanip.md                  | 38 ++++++++++++++++---
>  gcc/testsuite/gcc.target/riscv/zbb-abs.c      | 18 +++++++++
>  .../gcc.target/riscv/zbb-min-max-02.c         |  2 +-
>  3 files changed, 52 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index d17133d58c1..abf08a29e89 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -360,14 +360,42 @@
>    DONE;
>  })
>
> -(define_insn "<bitmanip_optab><mode>3"
> -  [(set (match_operand:X 0 "register_operand" "=r")
> -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> -                          (match_operand:X 2 "register_operand" "r")))]
> -  "TARGET_ZBB"
> +(define_insn "<bitmanip_optab>si3_insn"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +        (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +                            (match_operand:SI 2 "register_operand" "r")))]
> +  "!TARGET_64BIT && TARGET_ZBB"
>    "<bitmanip_insn>\t%0,%1,%2"
>    [(set_attr "type" "bitmanip")])
>
> +(define_insn "<bitmanip_optab>di3"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
> +                            (match_operand:DI 2 "register_operand" "r")))]
> +  "TARGET_64BIT && TARGET_ZBB"
> +  "<bitmanip_insn>\t%0,%1,%2"
> +  [(set_attr "type" "bitmanip")])
> +
> +(define_expand "<bitmanip_optab>si3"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +        (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +                            (match_operand:SI 2 "register_operand" "r")))]
> +  "TARGET_ZBB"
> +  "
> +{
> +  if (TARGET_64BIT)
> +    {
> +      rtx op1_x = gen_reg_rtx (DImode);
> +      emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
> +      rtx op2_x = gen_reg_rtx (DImode);
> +      emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
> +      rtx dst_x = gen_reg_rtx (DImode);
> +      emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
> +      emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
> +      DONE;
> +    }
> +}")
>

We have two issues around min/max here:
1. That it doesn't apply to the SImode abs case (which is due to
expand_abs_nojump() blindly testing for the current mode in smax_optab).
2. That we have to reduce the number of extensions to the least amount.

The above addresses expand_abs_nojump(), but makes the general solution
harder as the middle-end needs to know there is no native SImode min/max
available.
We still plan (proof-of-concept works, but a final patch will likely not be
ready before very late in January) to submit a patch to improve the
expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and
value-ranges to not even create the sign-extensions in the first place.  If
we do the above, the middle-end will blindly emit this sequence with the 2
sign-extensions — which may or may not be eliminated later by combining
with a w-form.
I'll also add an enhancement to expand_abs_nojump() to our list of changes
for the min/max enhancement during the lowering.

Note that, if we decide to go ahead with using this as a temporary solution
until our change is ready, you'll also need to add a cost for the SImode
max.

Philipp.


> +
>  ;; Optimize the common case of a SImode min/max against a constant
>  ;; that is safe both for sign- and zero-extension.
>  (define_insn_and_split "*minmax"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> new file mode 100644
> index 00000000000..6ef7efdbd49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +#define ABS(x) (((x) >= 0) ? (x) : -(x))
> +
> +int
> +foo (int x)
> +{
> +  return ABS(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "neg" 1 } } */
> +/* { dg-final { scan-assembler-times "max" 1 } } */
> +/* { dg-final { scan-assembler-not "sraiw" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-not "subw" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> index b462859f10f..b9db655d55d 100644
> --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> @@ -9,6 +9,6 @@ int f(unsigned int* a)
>  }
>
>  /* { dg-final { scan-assembler-times "minu" 1 } } */
> -/* { dg-final { scan-assembler-times "sext.w" 1 } } */
> +/* { dg-final { scan-assembler-times "sext.w|addw" 1 } } */
>  /* { dg-final { scan-assembler-not "zext.w" } } */
>
> --
> 2.38.1
>
>
  
Jeff Law Dec. 29, 2022, 5:26 p.m. UTC | #5
On 12/29/22 07:50, Philipp Tomsich wrote:

> 
> 
> We have two issues around min/max here:
> 1. That it doesn't apply to the SImode abs case (which is due to 
> expand_abs_nojump() blindly testing for the current mode in smax_optab).
Mode testing is inherent in the optab query interface.

> 2. That we have to reduce the number of extensions to the least amount.
Yes.  I'd already indicated to Raphael that y'all were poking in this 
space and that I consider the two approaches complementary.

Essentially we want to reduce unnecessary extensions by type adjustment 
in gimple when we can prove that safe while at the same time we want the 
RTL bits to generate good code if we're unable to adjust the types.


> 
> The above addresses expand_abs_nojump(), but makes the general solution 
> harder as the middle-end needs to know there is no native SImode min/max 
> available.
I think we can probably deal with this without too much trouble.  We 
just need a reasonable API to ask "is this optab supported directly by 
the target, or is it expanded by the target". I can envision a couple 
ways to do that, one of which would be to query the rtx cost.  If the 
resultant cost is > COSTS_N_INSNS (1), then its synthesized, otherwise 
it's native.

> We still plan (proof-of-concept works, but a final patch will likely not 
> be ready before very late in January) to submit a patch to improve the 
> expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and 
> value-ranges to not even create the sign-extensions in the first place.  
Which would be fantastic.


> If we do the above, the middle-end will blindly emit this sequence with 
> the 2 sign-extensions — which may or may not be eliminated later by 
> combining with a w-form.
I think the question becomes whether or not we think the work you're 
doing would likely subsume what Raphael has done.

Your work and Raphael's work both target the gimple->RTL expansion 
phase.  Yours tries to adjust types so that we have fewer extensions. 
Raphael's work adjusts how we expand when we're asked for a 32bit min/max.

Those seem complementary to me -- if you're unable to adjust the types, 
then go ahead and ask for that 32bit min/max and let Raphael's expander 
generate reasonable code for it.  If there's an API that allows you to 
query for native vs synthesized support, then you should have the 
backend info necessary to drive your type adjustment work.



> 
> Note that, if we decide to go ahead with using this as a temporary 
> solution until our change is ready, you'll also need to add a cost for 
> the SImode max.
Agreed.  I haven't exposed Raphael to rtx_cost yet, but now seems like a 
good a time as any.

Raphael, you're going to want to look in riscv.cc::riscv_rtx_costs. 
We'll want a case for MIN/MAX.  For TARGET_ZBB and word sized 
operations, the cost is probably COSTS_N_INSNS (1), for TARGET_ZBB a 
subword operation (ie SImode for TARGET_64BIT) it'd be COSTS_N_INSNS 
(2).  In both of those cases I think we return true.

For !TARGET_ZBB, COSTS_N_INSNS (N) where N is the number of instructions 
to implement a min/max without TARGET_ZBB would be the right value.  I 
just don't know it offhand.

Jeff
  

Patch

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d17133d58c1..abf08a29e89 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -360,14 +360,42 @@ 
   DONE;
 })
 
-(define_insn "<bitmanip_optab><mode>3"
-  [(set (match_operand:X 0 "register_operand" "=r")
-        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
-			   (match_operand:X 2 "register_operand" "r")))]
-  "TARGET_ZBB"
+(define_insn "<bitmanip_optab>si3_insn"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	 (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+			     (match_operand:SI 2 "register_operand" "r")))]
+  "!TARGET_64BIT && TARGET_ZBB"
   "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn "<bitmanip_optab>di3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	 (bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
+			     (match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_64BIT && TARGET_ZBB"
+  "<bitmanip_insn>\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
+(define_expand "<bitmanip_optab>si3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	 (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+			     (match_operand:SI 2 "register_operand" "r")))]
+  "TARGET_ZBB"
+  "
+{
+  if (TARGET_64BIT)
+    {
+      rtx op1_x = gen_reg_rtx (DImode);
+      emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
+      rtx op2_x = gen_reg_rtx (DImode);
+      emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
+      rtx dst_x = gen_reg_rtx (DImode);
+      emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
+      emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
+      DONE;
+    }
+}")
+
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
 (define_insn_and_split "*minmax"
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
new file mode 100644
index 00000000000..6ef7efdbd49
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#define ABS(x) (((x) >= 0) ? (x) : -(x))
+
+int
+foo (int x)
+{
+  return ABS(x);
+}
+
+/* { dg-final { scan-assembler-times "neg" 1 } } */
+/* { dg-final { scan-assembler-times "max" 1 } } */
+/* { dg-final { scan-assembler-not "sraiw" } } */
+/* { dg-final { scan-assembler-not "xor" } } */
+/* { dg-final { scan-assembler-not "subw" } } */
+
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
index b462859f10f..b9db655d55d 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
@@ -9,6 +9,6 @@  int f(unsigned int* a)
 }
 
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-times "sext.w|addw" 1 } } */
 /* { dg-final { scan-assembler-not "zext.w" } } */