[1/3] bpf: md: Enable zero_extend{hi, qi}di2 to generate wN regs ...

Message ID 20260329231754.2325557-2-vineet.gupta@linux.dev
State Committed
Commit 2c1a819254cc892877014169f99edce7eaced66f
Series bpf: Enable wN reg codegen for bug-fix and fun

Checks

Context                                         Check  Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm      fail   Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64  fail   Patch failed to apply

Commit Message

Vineet Gupta March 29, 2026, 11:17 p.m. UTC
  ... by adjusting the asm template reg string to 'W', similar to commit
gbd275e81812c for zero_extendsidi2. However, unlike that change, this is
not a bug fix: zero_extendsidi2 relies on the wN reg move to achieve the
32->64 semantics. A {8,16}->64 extension needs to be emulated anyway
by masking out the extra bits, so the reg used is not important.

However, wN reg based codegen is still preferable in general:
 - It is easier on the verifier, as it has to track fewer bits and avoids
   corner cases which could trip it up with false positives.
 - Better native codegen, as upper bits are guaranteed to be zero and can
   leverage target ISA mechanisms to achieve that mostly for free.
 - Better native codegen on 32-bit targets, which need to use 2 regs for
   rN regs.

gcc/ChangeLog:

	* config/bpf/bpf.md (zero_extendhidi2): Add alternates for
	alu32 and use 'W' for reg operands.
	(zero_extendqidi2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/zero-ext.c: Add tests for {HI,QI}mode values.

Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
---
 gcc/config/bpf/bpf.md                   |  8 ++++----
 gcc/testsuite/gcc.target/bpf/zero-ext.c | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 4 deletions(-)
  

Comments

David Faust March 30, 2026, 7:15 p.m. UTC | #1
Hi Vineet,

On 3/29/26 16:17, Vineet Gupta wrote:
> ... by adjusting the asm template reg string to 'W', similar to commit
> gbd275e81812c for zero_extendsidi2. However unike that change, this is
> not a bug fix: zero_extendsidi2 relies on wN reg move to achieve the
> 32->64 semantics. A {8,16}->64 extension needs to be emulated anyways
> by masking out extra bits so the reg used is not imp.
> 
> However wN reg based codegen is still preferable in general:
>  - It is easier on verifier as it has to track fewer bits and avoids
>    corner cases which could trip it up for false positives.
>  - Better native codegen as upper bits are guarateed to be zero and can
>    leverage target ISA mechanisms to achieve that mostly for free.
>  - Better nativ codegen on 32-bit targets which need to use 2 regs for
>    rN regs.

nit: some typos in the commit message would be good to fix before
pushing.

But the patch itself makes sense and lgtm.
OK.

> 
> gcc/ChangeLog:
> 
> 	* config/bpf/bpf.md (zero_extendhidi2): Add alternates for
> 	alu32 and use 'W' for reg operands.
> 	(zero_extendqidi2): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/bpf/zero-ext.c: Add tests for {HI,QI}mode values.
> 
> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
> ---
>  gcc/config/bpf/bpf.md                   |  8 ++++----
>  gcc/testsuite/gcc.target/bpf/zero-ext.c | 16 ++++++++++++++++
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index c2c3152b5d7c..a2bceb8998d7 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -279,8 +279,8 @@
>  	(zero_extend:DI (match_operand:HI 1 "nonimmediate_operand" "0,r,q")))]
>    ""
>    "@
> -   {and\t%0,0xffff|%0 &= 0xffff}
> -   *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
> +   *return bpf_has_alu32 ? \"{and32\t%0,0xffff|%W0 &= 0xffff}\" : \"{and\t%0,0xffff|%0 &= 0xffff}\";
> +   *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xffff|%W0 = %W1;%W0 &= 0xffff}\" : \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
>     *return bpf_output_move (operands, \"{ldxh\t%0,%1|%0 = *(u16 *) %1}\");"
>    [(set_attr "type" "alu,alu,ldx")])
>  
> @@ -289,8 +289,8 @@
>  	(zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "0,r,q")))]
>    ""
>    "@
> -   {and\t%0,0xff|%0 &= 0xff}
> -   *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
> +   *return bpf_has_alu32 ? \"{and32\t%0,0xff|%W0 &= 0xff}\" : \"{and\t%0,0xff|%0 &= 0xff}\";
> +   *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xff|%W0 = %W1;%W0 &= 0xff}\" : \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
>     *return bpf_output_move (operands, \"{ldxb\t%0,%1|%0 = *(u8 *) %1}\");"
>    [(set_attr "type" "alu,alu,ldx")])
>  
> diff --git a/gcc/testsuite/gcc.target/bpf/zero-ext.c b/gcc/testsuite/gcc.target/bpf/zero-ext.c
> index a57c7dc521f0..b6ab7dab8bc0 100644
> --- a/gcc/testsuite/gcc.target/bpf/zero-ext.c
> +++ b/gcc/testsuite/gcc.target/bpf/zero-ext.c
> @@ -2,6 +2,8 @@
>  /* { dg-options "-O2 -mcpu=v4" } */
>  
>  int bar_int(void);
> +short bar_short(void);
> +_Bool bar_bool(void);
>  
>  int foo_int(void) {
>        if (bar_int() != 1) return 0; else return 1;
> @@ -9,3 +11,17 @@ int foo_int(void) {
>  
>  /* { dg-final { scan-assembler-not {r0 = r0} } } */
>  /* { dg-final { scan-assembler-times {w0 = w0} 1 } } */
> +
> +int trigger_zext_hidi(void) {
> +      if (bar_short() != 1) return 0; else return 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {r0 &= 0xffff} } } */
> +/* { dg-final { scan-assembler-times {w0 &= 0xffff} 1 } } */
> +
> +int trigger_zext_qidi(void) {
> +      if (bar_bool() != 0) return 0; else return 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {r0 &= 0xff\n} } } */
> +/* { dg-final { scan-assembler-times {w0 &= 0xff\n} 1 } } */
  
Vineet Gupta March 30, 2026, 8:16 p.m. UTC | #2
On 3/30/26 12:15 PM, David Faust wrote:
> On 3/29/26 16:17, Vineet Gupta wrote:
>> ... by adjusting the asm template reg string to 'W', similar to commit
>> gbd275e81812c for zero_extendsidi2. However unike that change, this is
>> not a bug fix: zero_extendsidi2 relies on wN reg move to achieve the
>> 32->64 semantics. A {8,16}->64 extension needs to be emulated anyways
>> by masking out extra bits so the reg used is not imp.
>>
>> However wN reg based codegen is still preferable in general:
>>   - It is easier on verifier as it has to track fewer bits and avoids
>>     corner cases which could trip it up for false positives.
>>   - Better native codegen as upper bits are guarateed to be zero and can
>>     leverage target ISA mechanisms to achieve that mostly for free.
>>   - Better nativ codegen on 32-bit targets which need to use 2 regs for
>>     rN regs.
> nit: some typos in the commit message would be good to fix before
> pushing.

Done !

> But the patch itself makes sense and lgtm.
> OK.

Committed !

Thx,
-Vineet
  

Patch

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index c2c3152b5d7c..a2bceb8998d7 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -279,8 +279,8 @@ 
 	(zero_extend:DI (match_operand:HI 1 "nonimmediate_operand" "0,r,q")))]
   ""
   "@
-   {and\t%0,0xffff|%0 &= 0xffff}
-   *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
+   *return bpf_has_alu32 ? \"{and32\t%0,0xffff|%W0 &= 0xffff}\" : \"{and\t%0,0xffff|%0 &= 0xffff}\";
+   *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xffff|%W0 = %W1;%W0 &= 0xffff}\" : \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
    *return bpf_output_move (operands, \"{ldxh\t%0,%1|%0 = *(u16 *) %1}\");"
   [(set_attr "type" "alu,alu,ldx")])
 
@@ -289,8 +289,8 @@ 
 	(zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "0,r,q")))]
   ""
   "@
-   {and\t%0,0xff|%0 &= 0xff}
-   *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
+   *return bpf_has_alu32 ? \"{and32\t%0,0xff|%W0 &= 0xff}\" : \"{and\t%0,0xff|%0 &= 0xff}\";
+   *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xff|%W0 = %W1;%W0 &= 0xff}\" : \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
    *return bpf_output_move (operands, \"{ldxb\t%0,%1|%0 = *(u8 *) %1}\");"
   [(set_attr "type" "alu,alu,ldx")])
 
diff --git a/gcc/testsuite/gcc.target/bpf/zero-ext.c b/gcc/testsuite/gcc.target/bpf/zero-ext.c
index a57c7dc521f0..b6ab7dab8bc0 100644
--- a/gcc/testsuite/gcc.target/bpf/zero-ext.c
+++ b/gcc/testsuite/gcc.target/bpf/zero-ext.c
@@ -2,6 +2,8 @@ 
 /* { dg-options "-O2 -mcpu=v4" } */
 
 int bar_int(void);
+short bar_short(void);
+_Bool bar_bool(void);
 
 int foo_int(void) {
       if (bar_int() != 1) return 0; else return 1;
@@ -9,3 +11,17 @@  int foo_int(void) {
 
 /* { dg-final { scan-assembler-not {r0 = r0} } } */
 /* { dg-final { scan-assembler-times {w0 = w0} 1 } } */
+
+int trigger_zext_hidi(void) {
+      if (bar_short() != 1) return 0; else return 1;
+}
+
+/* { dg-final { scan-assembler-not {r0 &= 0xffff} } } */
+/* { dg-final { scan-assembler-times {w0 &= 0xffff} 1 } } */
+
+int trigger_zext_qidi(void) {
+      if (bar_bool() != 0) return 0; else return 1;
+}
+
+/* { dg-final { scan-assembler-not {r0 &= 0xff\n} } } */
+/* { dg-final { scan-assembler-times {w0 &= 0xff\n} 1 } } */
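
A note on the QImode scan patterns above: they end in \n while the HImode
ones do not, presumably because "w0 &= 0xff" is a prefix of "w0 &= 0xffff"
and would otherwise also match the HImode instruction. The sketch below
shows the effect with a plain substring search standing in for the
testsuite's regexp matching. count_occurrences and sample_asm are
hypothetical names, and the assembly text is made up for illustration,
not captured compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical assembler output holding both masks, one per line. */
const char sample_asm[] =
    "\tw0 &= 0xffff\n"
    "\tw0 &= 0xff\n";

/* Counts non-overlapping occurrences of needle in haystack. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t step = strlen(needle);

    if (step == 0)
        return 0;
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + step, needle))
        count++;
    return count;
}

/* Without the trailing newline the QImode pattern also hits the HImode
   line; with it, only the QImode line matches. */
void check_newline_anchor(void)
{
    assert(count_occurrences(sample_asm, "w0 &= 0xff") == 2);
    assert(count_occurrences(sample_asm, "w0 &= 0xff\n") == 1);
}
```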