[1/3] bpf: md: Enable zero_extend{hi, qi}di2 to generate wN regs ...
Commit Message
... by adjusting the asm template reg string to 'W', similar to commit
gbd275e81812c for zero_extendsidi2. However, unlike that change, this is
not a bug fix: zero_extendsidi2 relies on a wN reg move to achieve the
32->64 semantics, whereas a {8,16}->64 extension has to be emulated anyway
by masking out the extra bits, so whether an rN or wN reg is used does not
matter for correctness.
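A small C model makes the "does not matter for correctness" point concrete. It assumes the BPF ISA semantics that a 64-bit ALU op sign-extends its 32-bit immediate, while a 32-bit (alu32) op computes on the low 32 bits and zeroes the upper 32 bits of the destination. The helper names are hypothetical. Because the masks 0xffff and 0xff are non-negative, both forms produce the same zero-extended 64-bit value; they would diverge for a negative immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rN &= imm" (ALU64): the 32-bit immediate is
   sign-extended to 64 bits before the AND. */
static uint64_t bpf_and64_imm(uint64_t dst, int32_t imm)
{
    return dst & (uint64_t)(int64_t)imm;
}

/* Hypothetical model of "wN &= imm" (ALU32): AND the low 32 bits, then
   zero the upper 32 bits of the destination. */
static uint64_t bpf_and32_imm(uint64_t dst, int32_t imm)
{
    return (uint64_t)((uint32_t)dst & (uint32_t)imm);
}

/* Returns 1 if both forms give the plain zero-extension of the low bits
   selected by MASK (0xffff for HImode, 0xff for QImode) for a few register
   values whose upper bits hold garbage. */
static int zext_forms_agree(int32_t mask)
{
    static const uint64_t samples[] = {
        0x0000000000000000ull, 0xffffffffffffffffull,
        0xdeadbeefcafe8001ull, 0x123456789abcdef0ull,
    };

    /* The equivalence only holds for a non-negative mask. */
    assert(mask >= 0);
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint64_t r = samples[i];
        uint64_t via64 = bpf_and64_imm(r, mask);
        uint64_t via32 = bpf_and32_imm(r, mask);
        if (via64 != via32 || via64 != (r & (uint64_t)(uint32_t)mask))
            return 0;
    }
    return 1;
}
```

For example, `bpf_and64_imm(0xdeadbeefcafe8001, 0xffff)` and `bpf_and32_imm(0xdeadbeefcafe8001, 0xffff)` both yield `0x8001`.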
However, wN reg based codegen is still preferable in general:
- It is easier on the verifier, as it has to track fewer bits and avoids
corner cases that could trip it up with false positives.
- Better native codegen, as the upper bits are guaranteed to be zero and
targets can leverage ISA mechanisms to achieve that mostly for free.
- Better native codegen on 32-bit targets, which need to use 2 regs for
each rN reg.
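The last point can be sketched with a hypothetical, deliberately naive 32-bit JIT that keeps each 64-bit BPF register in a pair of native 32-bit registers. An ALU64 AND has to update both halves. An ALU32 AND only computes the low half and writes zero to the high half, and that write can be skipped when nothing later reads the high half (loosely mirroring how the Linux verifier can tell 32-bit JITs where explicit zero-extension is needed). The struct, the functions and the `hi_live` flag are all illustrative assumptions, not any real JIT's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit host view of one 64-bit BPF register. */
struct reg_pair {
    uint32_t lo;
    uint32_t hi;
};

/* "rN &= imm" (ALU64): both halves are live, so both must be updated.
   Returns the number of native 32-bit ops this naive lowering uses. */
static int jit_and64_imm(struct reg_pair *r, int32_t imm)
{
    r->lo &= (uint32_t)imm;
    /* The immediate is sign-extended, so the high mask is all-ones or zero. */
    r->hi &= imm < 0 ? UINT32_MAX : 0u;
    return 2;
}

/* "wN &= imm" (ALU32): compute the low half only; zero the high half
   only if a later instruction reads it (hi_live). */
static int jit_and32_imm(struct reg_pair *r, int32_t imm, bool hi_live)
{
    r->lo &= (uint32_t)imm;
    if (hi_live) {
        r->hi = 0;
        return 2;
    }
    return 1;
}

/* Runs both lowerings on the same input with a zero-extension mask and
   checks that the observable low half agrees while ALU32 uses fewer ops. */
static void check_lowerings(uint32_t lo, uint32_t hi, int32_t imm)
{
    assert(imm >= 0); /* masks used for zero-extension are non-negative */
    struct reg_pair a = { lo, hi }, b = { lo, hi };
    int ops64 = jit_and64_imm(&a, imm);
    int ops32 = jit_and32_imm(&b, imm, false);
    assert(a.lo == b.lo && a.hi == 0u);
    assert(ops32 < ops64);
}
```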
gcc/ChangeLog:
* config/bpf/bpf.md (zero_extendhidi2): Add alternatives for
alu32 and use 'W' for reg operands.
(zero_extendqidi2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/zero-ext.c: Add tests for {HI,QI}mode values.
Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
---
gcc/config/bpf/bpf.md | 8 ++++----
gcc/testsuite/gcc.target/bpf/zero-ext.c | 16 ++++++++++++++++
2 files changed, 20 insertions(+), 4 deletions(-)
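The new tests in zero-ext.c scan for patterns such as `{w0 &= 0xff\n}`. DejaGnu's scan-assembler directives match anywhere in the assembly output, so without the trailing `\n` the QImode pattern `w0 &= 0xff` would also match the HImode line `w0 &= 0xffff`, and `scan-assembler-times ... 1` would see two hits. The sketch below mimics this with plain substring counting; `count_occurrences` is a hypothetical helper and `sample_asm` is made-up text, not real compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE in
   HAYSTACK, roughly what scan-assembler-times does for a literal pattern. */
static size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t n = 0;
    size_t len = strlen(needle);

    assert(len > 0);
    for (const char *p = haystack; (p = strstr(p, needle)) != NULL; p += len)
        n++;
    return n;
}

/* Made-up pseudo-C assembly containing one HImode and one QImode mask. */
static const char sample_asm[] =
    "\tw0 &= 0xffff\n"
    "\tw0 &= 0xff\n";
```

Here `"w0 &= 0xff"` is found twice, while `"w0 &= 0xff\n"` and `"w0 &= 0xffff"` are each found once.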
Comments
Hi Vineet,
On 3/29/26 16:17, Vineet Gupta wrote:
> ... by adjusting the asm template reg string to 'W', similar to commit
> gbd275e81812c for zero_extendsidi2. However unike that change, this is
> not a bug fix: zero_extendsidi2 relies on wN reg move to achieve the
> 32->64 semantics. A {8,16}->64 extension needs to be emulated anyways
> by masking out extra bits so the reg used is not imp.
>
> However wN reg based codegen is still preferable in general:
> - It is easier on verifier as it has to track fewer bits and avoids
> corner cases which could trip it up for false positives.
> - Better native codegen as upper bits are guarateed to be zero and can
> leverage target ISA mechanisms to achieve that mostly for free.
> - Better nativ codegen on 32-bit targets which need to use 2 regs for
> rN regs.
nit: some typos in the commit message would be good to fix before
pushing.
But the patch itself makes sense and lgtm.
OK.
>
> gcc/ChangeLog:
>
> * config/bpf/bpf.md (zero_extendhidi2): Add alternates for
> alu32 and use 'W' for reg operands.
> (zero_extendqidi2): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/bpf/zero-ext.c: Add tests for {HI,QI}mode values.
>
> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
> ---
> gcc/config/bpf/bpf.md | 8 ++++----
> gcc/testsuite/gcc.target/bpf/zero-ext.c | 16 ++++++++++++++++
> 2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index c2c3152b5d7c..a2bceb8998d7 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -279,8 +279,8 @@
> (zero_extend:DI (match_operand:HI 1 "nonimmediate_operand" "0,r,q")))]
> ""
> "@
> - {and\t%0,0xffff|%0 &= 0xffff}
> - *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
> + *return bpf_has_alu32 ? \"{and32\t%0,0xffff|%W0 &= 0xffff}\" : \"{and\t%0,0xffff|%0 &= 0xffff}\";
> + *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xffff|%W0 = %W1;%W0 &= 0xffff}\" : \"{mov\t%0,%1\;and\t%0,0xffff|%0 = %1;%0 &= 0xffff}\");
> *return bpf_output_move (operands, \"{ldxh\t%0,%1|%0 = *(u16 *) %1}\");"
> [(set_attr "type" "alu,alu,ldx")])
>
> @@ -289,8 +289,8 @@
> (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "0,r,q")))]
> ""
> "@
> - {and\t%0,0xff|%0 &= 0xff}
> - *return bpf_output_move (operands, \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
> + *return bpf_has_alu32 ? \"{and32\t%0,0xff|%W0 &= 0xff}\" : \"{and\t%0,0xff|%0 &= 0xff}\";
> + *return bpf_output_move (operands, bpf_has_alu32 ? \"{mov\t%0,%1\;and\t%0,0xff|%W0 = %W1;%W0 &= 0xff}\" : \"{mov\t%0,%1\;and\t%0,0xff|%0 = %1;%0 &= 0xff}\");
> *return bpf_output_move (operands, \"{ldxb\t%0,%1|%0 = *(u8 *) %1}\");"
> [(set_attr "type" "alu,alu,ldx")])
>
> diff --git a/gcc/testsuite/gcc.target/bpf/zero-ext.c b/gcc/testsuite/gcc.target/bpf/zero-ext.c
> index a57c7dc521f0..b6ab7dab8bc0 100644
> --- a/gcc/testsuite/gcc.target/bpf/zero-ext.c
> +++ b/gcc/testsuite/gcc.target/bpf/zero-ext.c
> @@ -2,6 +2,8 @@
> /* { dg-options "-O2 -mcpu=v4" } */
>
> int bar_int(void);
> +short bar_short(void);
> +_Bool bar_bool(void);
>
> int foo_int(void) {
> if (bar_int() != 1) return 0; else return 1;
> @@ -9,3 +11,17 @@ int foo_int(void) {
>
> /* { dg-final { scan-assembler-not {r0 = r0} } } */
> /* { dg-final { scan-assembler-times {w0 = w0} 1 } } */
> +
> +int trigger_zext_hidi(void) {
> + if (bar_short() != 1) return 0; else return 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {r0 &= 0xffff} } } */
> +/* { dg-final { scan-assembler-times {w0 &= 0xffff} 1 } } */
> +
> +int trigger_zext_qidi(void) {
> + if (bar_bool() != 0) return 0; else return 1;
> +}
> +
> +/* { dg-final { scan-assembler-not {r0 &= 0xff\n} } } */
> +/* { dg-final { scan-assembler-times {w0 &= 0xff\n} 1 } } */
On 3/30/26 12:15 PM, David Faust wrote:
> On 3/29/26 16:17, Vineet Gupta wrote:
>> ... by adjusting the asm template reg string to 'W', similar to commit
>> gbd275e81812c for zero_extendsidi2. However unike that change, this is
>> not a bug fix: zero_extendsidi2 relies on wN reg move to achieve the
>> 32->64 semantics. A {8,16}->64 extension needs to be emulated anyways
>> by masking out extra bits so the reg used is not imp.
>>
>> However wN reg based codegen is still preferable in general:
>> - It is easier on verifier as it has to track fewer bits and avoids
>> corner cases which could trip it up for false positives.
>> - Better native codegen as upper bits are guarateed to be zero and can
>> leverage target ISA mechanisms to achieve that mostly for free.
>> - Better nativ codegen on 32-bit targets which need to use 2 regs for
>> rN regs.
> nit: some typos in the commit message would be good to fix before
> pushing.
Done !
> But the patch itself makes sense and lgtm.
> OK.
Committed !
Thx,
-Vineet