RISC-V: Produce better code with complex constants [PR95632] [PR106602]

Message ID 20221207205517.526182-1-rzinsly@ventanamicro.com
State Deferred, archived
Headers
Series RISC-V: Produce better code with complex constants [PR95632] [PR106602] |

Commit Message

Raphael Moreira Zinsly Dec. 7, 2022, 8:55 p.m. UTC
  Due to RISC-V limitations on operations with big constants combine
is failing to match such operations and is not being able to
produce optimal code as it keeps splitting them. By pretending we
can do those operations we can get more opportunities for
simplification of surrounding instructions.

2022-12-06 Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
           Jeff Law <jlaw@ventanamicro.com>

gcc/Changelog:
	PR target/95632
        PR target/106602
        * config/riscv/riscv.md: New pattern to simulate complex
        const_int loads.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/pr95632.c: New test.
        * gcc.target/riscv/pr106602.c: Likewise.
---
 gcc/config/riscv/riscv.md                 | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/riscv/pr106602.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/riscv/pr95632.c  | 15 +++++++++++++++
 3 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr106602.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr95632.c
  

Comments

Jeff Law Dec. 7, 2022, 9:13 p.m. UTC | #1
On 12/7/22 13:55, Raphael Moreira Zinsly wrote:
> Due to RISC-V limitations on operations with big constants combine
> is failing to match such operations and is not being able to
> produce optimal code as it keeps splitting them. By pretending we
> can do those operations we can get more opportunities for
> simplification of surrounding instructions.
> 
> 2022-12-06 Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
>             Jeff Law <jlaw@ventanamicro.com>
> 
> gcc/Changelog:
> 	PR target/95632
>          PR target/106602
>          * config/riscv/riscv.md: New pattern to simulate complex
>          const_int loads.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/pr95632.c: New test.
>          * gcc.target/riscv/pr106602.c: Likewise.
So to give a little background to others.

The core issue is that when we break down constants early, it can make 
it difficult for combine to reconstruct the constant and simplify code 
using the reconstructed constant -- you end up trying to do 4->3 or 
worse combination sequences which aren't supported by the combiner.

Usually this kind of scenario is handled with a "bridge" pattern.  Those 
are generally defined as patterns that exist solely for combine and may 
not correspond to any real instruction on the target.  "bridge" patterns 
are typically 2->1 or 3->1 combinations and are intermediate steps for 
4->N or even larger combination opportunities.  Obviously if the bridge 
doesn't allow subsequent simplifications, then the bridge pattern must 
generate correct code (either by generating suitable assembly or 
splitting later).

Raphael's patch introduces a bridge pattern that pretends we can load up 
splittable constants in a single insn.  We restrict the bridge pattern 
to be active from the point when CSE is no longer expected through the 
combiner up to the first splitter pass (where we'll break it down again 
if it's still in the IL).

So we get most of the benefit of splitting constants early (CSE, LICM, 
etc) while also getting the benefits of splitting late (combine 
simplifications).

Given I was working with Raphael on the patch, it's probably best for 
someone else to do the review rather than me approving it :-)

Jeff
  
Jakub Jelinek Dec. 7, 2022, 9:30 p.m. UTC | #2
On Wed, Dec 07, 2022 at 05:55:17PM -0300, Raphael Moreira Zinsly wrote:
> Due to RISC-V limitations on operations with big constants combine
> is failing to match such operations and is not being able to
> produce optimal code as it keeps splitting them. By pretending we
> can do those operations we can get more opportunities for
> simplification of surrounding instructions.
> 
> 2022-12-06 Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
>            Jeff Law <jlaw@ventanamicro.com>

Just nits, not a proper review.
2 spaces after date and 2 spaces before <, rather than just 1.

> 
> gcc/Changelog:
> 	PR target/95632
>         PR target/106602
>         * config/riscv/riscv.md: New pattern to simulate complex
>         const_int loads.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/pr95632.c: New test.
>         * gcc.target/riscv/pr106602.c: Likewise.

All lines in the ChangeLog should be tab indented, rather than just some of
them and others with 8 spaces.

> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1667,6 +1667,22 @@
>  		      MAX_MACHINE_MODE, &operands[3], TRUE);
>  })
>  
> +;; Pretend to have the ability to load complex const_int in order to get
> +;; better code generation around them.
> +(define_insn_and_split ""

define_insn_and_split patterns better should have some name, even if it
starts with *.  It makes dumps more readable, and you can refer to it
in the ChangeLog when it is added or changed etc.

> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> +    (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
> +  "cse_not_expected"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +

Why the empty line?

> +{
> +  riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
> +		      <GPR:MODE>mode, TRUE);

You can just use <MODE> if there is only one iterator in the pattern.

	Jakub
  
Palmer Dabbelt Dec. 8, 2022, 5:39 p.m. UTC | #3
On Wed, 07 Dec 2022 13:30:32 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> On Wed, Dec 07, 2022 at 05:55:17PM -0300, Raphael Moreira Zinsly wrote:
>> Due to RISC-V limitations on operations with big constants combine
>> is failing to match such operations and is not being able to
>> produce optimal code as it keeps splitting them. By pretending we
>> can do those operations we can get more opportunities for
>> simplification of surrounding instructions.
>>
>> 2022-12-06 Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
>>            Jeff Law <jlaw@ventanamicro.com>
>
> Just nits, not a proper review.
> 2 spaces after date and 2 spaces before <, rather than just 1.
>
>>
>> gcc/Changelog:
>> 	PR target/95632
>>         PR target/106602
>>         * config/riscv/riscv.md: New pattern to simulate complex
>>         const_int loads.
>>
>> gcc/testsuite/ChangeLog:
>> 	* gcc.target/riscv/pr95632.c: New test.
>>         * gcc.target/riscv/pr106602.c: Likewise.
>
> All lines in the ChangeLog should be tab indented, rather than just some of
> them and others with 8 spaces.

There's alsot contrib/git-commit-mklog.py, which provides a template for 
these (I also have trouble remembering the formatting rules).

>
>> --- a/gcc/config/riscv/riscv.md
>> +++ b/gcc/config/riscv/riscv.md
>> @@ -1667,6 +1667,22 @@
>>  		      MAX_MACHINE_MODE, &operands[3], TRUE);
>>  })
>>
>> +;; Pretend to have the ability to load complex const_int in order to get
>> +;; better code generation around them.
>> +(define_insn_and_split ""
>
> define_insn_and_split patterns better should have some name, even if it
> starts with *.  It makes dumps more readable, and you can refer to it
> in the ChangeLog when it is added or changed etc.
>
>> +  [(set (match_operand:GPR 0 "register_operand" "=r")
>> +    (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
>> +  "cse_not_expected"
>> +  "#"
>> +  "&& 1"
>> +  [(const_int 0)]
>> +
>
> Why the empty line?
>
>> +{
>> +  riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
>> +		      <GPR:MODE>mode, TRUE);
>
> You can just use <MODE> if there is only one iterator in the pattern.
>
> 	Jakub
  
Palmer Dabbelt Dec. 8, 2022, 5:53 p.m. UTC | #4
On Wed, 07 Dec 2022 12:55:17 PST (-0800), rzinsly@ventanamicro.com wrote:
> Due to RISC-V limitations on operations with big constants combine
> is failing to match such operations and is not being able to
> produce optimal code as it keeps splitting them. By pretending we
> can do those operations we can get more opportunities for
> simplification of surrounding instructions.

I saw Jeff's comments.  This is always the kind of thing that worries 
me: we're essentially lying to the optimizer in order to trick it into 
generating better code, which might just make it generate worse code.  
It's always easy to see a small example that improves, but those could 
be wiped out by secondary effects in real code.  So I'd usually want to 
have some benchmarking for a patch like this.

That said, if this is just the standard way of doing things then maybe 
it's just fine?

> 2022-12-06 Raphael Moreira Zinsly <rzinsly@ventanamicro.com>
>            Jeff Law <jlaw@ventanamicro.com>
>
> gcc/Changelog:
> 	PR target/95632
>         PR target/106602
>         * config/riscv/riscv.md: New pattern to simulate complex
>         const_int loads.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/pr95632.c: New test.
>         * gcc.target/riscv/pr106602.c: Likewise.
> ---
>  gcc/config/riscv/riscv.md                 | 16 ++++++++++++++++
>  gcc/testsuite/gcc.target/riscv/pr106602.c | 14 ++++++++++++++
>  gcc/testsuite/gcc.target/riscv/pr95632.c  | 15 +++++++++++++++
>  3 files changed, 45 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr106602.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr95632.c
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index df57e2b0b4a..0a9b5ec22b0 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1667,6 +1667,22 @@
>  		      MAX_MACHINE_MODE, &operands[3], TRUE);
>  })
>
> +;; Pretend to have the ability to load complex const_int in order to get
> +;; better code generation around them.
> +(define_insn_and_split ""
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> +    (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
> +  "cse_not_expected"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +
> +{
> +  riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
> +		      <GPR:MODE>mode, TRUE);
> +  DONE;
> +})

There's some comments from Jakub on this, I don't see any additional 
issues with the code (aside from the "does it help" stuff from above).

> +
>  ;; 64-bit integer moves
>
>  (define_expand "movdi"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr106602.c b/gcc/testsuite/gcc.target/riscv/pr106602.c
> new file mode 100644
> index 00000000000..83b70877012
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr106602.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv64gc" } */

There's a DG hook to limit this to 64-bit targets, that way it'll run 
with whatever target is being tested.

> +
> +unsigned long
> +foo2 (unsigned long a)
> +{
> +  return (unsigned long)(unsigned int) a << 6;
> +}
> +
> +/* { dg-final { scan-assembler-times "slli\t" 1 } } */
> +/* { dg-final { scan-assembler-times "srli\t" 1 } } */
> +/* { dg-final { scan-assembler-not "\tli\t" } } */
> +/* { dg-final { scan-assembler-not "addi\t" } } */
> +/* { dg-final { scan-assembler-not "and\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/pr95632.c b/gcc/testsuite/gcc.target/riscv/pr95632.c
> new file mode 100644
> index 00000000000..bd316ab1d7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr95632.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv32imafc -mabi=ilp32f" } */

Is there a reason to make this rv32-only?  Unless I'm missing something 
this should generate pretty much the same code for rv64.

> +
> +unsigned short
> +foo (unsigned short crc)
> +{
> +  crc ^= 0x4002;
> +  crc >>= 1;
> +  crc |= 0x8000;
> +
> +  return crc;
> +}
> +
> +/* { dg-final { scan-assembler-times "srli\t" 1 } } */
> +/* { dg-final { scan-assembler-not "slli\t" } } */
  
Jeff Law Dec. 8, 2022, 6:15 p.m. UTC | #5
On 12/8/22 10:53, Palmer Dabbelt wrote:
> On Wed, 07 Dec 2022 12:55:17 PST (-0800), rzinsly@ventanamicro.com wrote:
>> Due to RISC-V limitations on operations with big constants combine
>> is failing to match such operations and is not being able to
>> produce optimal code as it keeps splitting them. By pretending we
>> can do those operations we can get more opportunities for
>> simplification of surrounding instructions.
> 
> I saw Jeff's comments.  This is always the kind of thing that worries 
> me: we're essentially lying to the optimizer in order to trick it into 
> generating better code, which might just make it generate worse code. 
> It's always easy to see a small example that improves, but those could 
> be wiped out by secondary effects in real code.  So I'd usually want to 
> have some benchmarking for a patch like this.
> 
> That said, if this is just the standard way of doing things then maybe 
> it's just fine?
Bridge combiner patterns are pretty standard.  The insn's condition of 
cse_not_expected is also in there to minimize the potential for 
surprises by not exposing this too early.

jeff
  
Palmer Dabbelt Dec. 8, 2022, 8:21 p.m. UTC | #6
On Thu, 08 Dec 2022 10:15:47 PST (-0800), gcc-patches@gcc.gnu.org wrote:
>
>
> On 12/8/22 10:53, Palmer Dabbelt wrote:
>> On Wed, 07 Dec 2022 12:55:17 PST (-0800), rzinsly@ventanamicro.com wrote:
>>> Due to RISC-V limitations on operations with big constants combine
>>> is failing to match such operations and is not being able to
>>> produce optimal code as it keeps splitting them. By pretending we
>>> can do those operations we can get more opportunities for
>>> simplification of surrounding instructions.
>>
>> I saw Jeff's comments.  This is always the kind of thing that worries
>> me: we're essentially lying to the optimizer in order to trick it into
>> generating better code, which might just make it generate worse code.
>> It's always easy to see a small example that improves, but those could
>> be wiped out by secondary effects in real code.  So I'd usually want to
>> have some benchmarking for a patch like this.
>>
>> That said, if this is just the standard way of doing things then maybe
>> it's just fine?
> Bridge combiner patterns are pretty standard.  The insn's condition of
> cse_not_expected is also in there to minimize the potential for
> surprises by not exposing this too early.

OK, I'm fine with this, then -- aside from the fairly minor issues 
pointed out.
  

Patch

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index df57e2b0b4a..0a9b5ec22b0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1667,6 +1667,22 @@ 
 		      MAX_MACHINE_MODE, &operands[3], TRUE);
 })
 
+;; Pretend to have the ability to load complex const_int in order to get
+;; better code generation around them.
+(define_insn_and_split ""
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+    (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
+  "cse_not_expected"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+
+{
+  riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
+		      <GPR:MODE>mode, TRUE);
+  DONE;
+})
+
 ;; 64-bit integer moves
 
 (define_expand "movdi"
diff --git a/gcc/testsuite/gcc.target/riscv/pr106602.c b/gcc/testsuite/gcc.target/riscv/pr106602.c
new file mode 100644
index 00000000000..83b70877012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106602.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc" } */
+
+unsigned long
+foo2 (unsigned long a)
+{
+  return (unsigned long)(unsigned int) a << 6;
+}
+
+/* { dg-final { scan-assembler-times "slli\t" 1 } } */
+/* { dg-final { scan-assembler-times "srli\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tli\t" } } */
+/* { dg-final { scan-assembler-not "addi\t" } } */
+/* { dg-final { scan-assembler-not "and\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr95632.c b/gcc/testsuite/gcc.target/riscv/pr95632.c
new file mode 100644
index 00000000000..bd316ab1d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr95632.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32imafc -mabi=ilp32f" } */
+
+unsigned short
+foo (unsigned short crc)
+{
+  crc ^= 0x4002;
+  crc >>= 1;
+  crc |= 0x8000;
+
+  return crc;
+}
+
+/* { dg-final { scan-assembler-times "srli\t" 1 } } */
+/* { dg-final { scan-assembler-not "slli\t" } } */