[2/3] bpf: md: fix "*movsi" to generate wN regs [PR124688]
Checks
Commit Message
From: Vineet Gupta <gvineet@fb.com>
movsi currently only generates DImode rN regs, despite RTL being SImode.
| (insn 14 5 11 (set (reg:SI 1 %r1 [23])
| (reg:SI 0 %r0)) {*movsi}
| (expr_list:REG_DEAD (reg:SI 0 %r0)
| (nil)))
generates
| r1 = r0
as opposed to
| w1 = w0
This is not just issue of taste or getting more wN regs. As
illustrated by test, this can be a correctness issue where mov
needs to zero out the upper bits.
Again the issue is asm teplmate of pattern missing 'w' specifier,
leading bpf_print_register() to only generate 64-bit rN regs.
Using 'w' allows either wN/rN reg depending on the mode.
PR target/124688
gcc/ChangeLog:
* config/bpf/bpf.md (*movsi): Add 'w' to asm template.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/ret-reuse-arg-1.c: New test.
Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
---
gcc/config/bpf/bpf.md | 10 +++++-----
gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c | 14 ++++++++++++++
2 files changed, 19 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
Comments
On 3/29/26 4:17 PM, Vineet Gupta wrote:
> From: Vineet Gupta <gvineet@fb.com>
>
> movsi currently only generates DImode rN regs, despite RTL being SImode.
>
> | (insn 14 5 11 (set (reg:SI 1 %r1 [23])
> | (reg:SI 0 %r0)) {*movsi}
> | (expr_list:REG_DEAD (reg:SI 0 %r0)
> | (nil)))
>
> generates
>
> | r1 = r0
>
> as opposed to
>
> | w1 = w0
>
> This is not just issue of taste or getting more wN regs. As
> illustrated by test, this can be a correctness issue where mov
> needs to zero out the upper bits.
>
> Again the issue is asm teplmate of pattern missing 'w' specifier,
> leading bpf_print_register() to only generate 64-bit rN regs.
> Using 'w' allows either wN/rN reg depending on the mode.
>
> PR target/124688
>
> gcc/ChangeLog:
>
> * config/bpf/bpf.md (*movsi): Add 'w' to asm template.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/bpf/ret-reuse-arg-1.c: New test.
>
> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
Oh just realized that this needs assembler update. So need to post that
first and also gate this change based on configure time detection of
updated gas.
-Vineet
On 3/29/26 4:35 PM, Vineet Gupta wrote:
> On 3/29/26 4:17 PM, Vineet Gupta wrote:
>> From: Vineet Gupta <gvineet@fb.com>
>>
>> movsi currently only generates DImode rN regs, despite RTL being SImode.
>>
>> | (insn 14 5 11 (set (reg:SI 1 %r1 [23])
>> | (reg:SI 0 %r0)) {*movsi}
>> | (expr_list:REG_DEAD (reg:SI 0 %r0)
>> | (nil)))
>>
>> generates
>>
>> | r1 = r0
>>
>> as opposed to
>>
>> | w1 = w0
>>
>> This is not just issue of taste or getting more wN regs. As
>> illustrated by test, this can be a correctness issue where mov
>> needs to zero out the upper bits.
>>
>> Again the issue is asm teplmate of pattern missing 'w' specifier,
>> leading bpf_print_register() to only generate 64-bit rN regs.
>> Using 'w' allows either wN/rN reg depending on the mode.
>>
>> PR target/124688
>>
>> gcc/ChangeLog:
>>
>> * config/bpf/bpf.md (*movsi): Add 'w' to asm template.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/bpf/ret-reuse-arg-1.c: New test.
>>
>> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
>
> Oh just realized that this needs assembler update. So need to post
> that first and also gate this change based on configure time detection
> of updated gas.
Essentially gcc would now generate following 'w' + 'r' reg variants in
addition to the existing 'r' only form.
wN = *(u32 *)(rM + imm)
*(u32 *) (rM + imm) = wN
Added support to gas, however when asking Claude to generate gas
updates, it reports the encodins for both forms to be same. Further
disassembler will only generate the 'r' output. So to avoid confusion,
it would make sense for gcc mem alternate to not emit the 'w' form
despite SImode.
Thx,
-Vineet
Hi Vineet,
On 3/29/26 16:17, Vineet Gupta wrote:
> From: Vineet Gupta <gvineet@fb.com>
>
> movsi currently only generates DImode rN regs, despite RTL being SImode.
>
> | (insn 14 5 11 (set (reg:SI 1 %r1 [23])
> | (reg:SI 0 %r0)) {*movsi}
> | (expr_list:REG_DEAD (reg:SI 0 %r0)
> | (nil)))
>
> generates
>
> | r1 = r0
>
> as opposed to
>
> | w1 = w0
>
> This is not just issue of taste or getting more wN regs. As
> illustrated by test, this can be a correctness issue where mov
> needs to zero out the upper bits.
>
> Again the issue is asm teplmate of pattern missing 'w' specifier,
> leading bpf_print_register() to only generate 64-bit rN regs.
> Using 'w' allows either wN/rN reg depending on the mode.
>
> PR target/124688
>
> gcc/ChangeLog:
>
> * config/bpf/bpf.md (*movsi): Add 'w' to asm template.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/bpf/ret-reuse-arg-1.c: New test.
>
> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
> ---
> gcc/config/bpf/bpf.md | 10 +++++-----
> gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c | 14 ++++++++++++++
> 2 files changed, 19 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index a2bceb8998d7..e6776a94e7e4 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -383,11 +383,11 @@
> (match_operand:MM 1 "mov_src_operand" " q,rIc,BC,r,I"))]
> ""
> "@
> - *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%0 = *(<smop> *) %1}\");
> - *return bpf_output_move (operands, \"{mov\t%0,%1|%0 = %1}\");
> - *return bpf_output_move (operands, \"{lddw\t%0,%1|%0 = %1 ll}\");
> - *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %0 = %1}\");
> - *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %0 = %1}\");"
> + *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%w0 = *(<smop> *) %w1}\");
> + *return bpf_output_move (operands, \"{mov\t%0,%1|%w0 = %w1}\");
> + *return bpf_output_move (operands, \"{lddw\t%0,%1|%w0 = %w1 ll}\");
> + *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");
> + *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");"
Based on our discussion in the BPF GCC meeting this morning
I think we decided that we should not emit wN for ld/st[x],
(though gas should accommodate them).
And the lddw pattern doesn't need to accommodate w regs since
it is exclusively for immediate values.
So then the only pattern that needs to change here would be
the register-register 'mov' insn itself. But the pattern is
matching a move from MM to MM, i.e. the source and dest must
already be the same mode. So are you sure we would need to emit
w reg in this case?
Maybe I am wrong but to me it seems the behavior in the test
below (and in the pr) must be more related to extendhisi.
> [(set_attr "type" "ldx,alu,alu,stx,st")])
>
> ;;;; Shifts
> diff --git a/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c b/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
> new file mode 100644
> index 000000000000..6d0a4f280cd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
> @@ -0,0 +1,14 @@
> +/* Return value of first call is arg to second call. */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcpu=v4" } */
> +
> +int ret_int ();
> +void arg_int (int);
> +
> +void foo () {
> + arg_int(ret_int ());
> +}
> +
> +/* { dg-final { scan-assembler-not {r1 = r0} } } */
> +/* { dg-final { scan-assembler-times {w1 = w0} 1 } } */
Hi David,
Thanks for quickly reviewing the patches.
On 3/30/26 11:45 AM, David Faust wrote:
>> From: Vineet Gupta <gvineet@fb.com>
This somehow snuck in, will fix in v2.
>> movsi currently only generates DImode rN regs, despite RTL being SImode.
>>
>> | (insn 14 5 11 (set (reg:SI 1 %r1 [23])
>> | (reg:SI 0 %r0)) {*movsi}
>> | (expr_list:REG_DEAD (reg:SI 0 %r0)
>> | (nil)))
>>
>> generates
>>
>> | r1 = r0
>>
>> as opposed to
>>
>> | w1 = w0
[snip...]
>> - *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%0 = *(<smop> *) %1}\");
>> - *return bpf_output_move (operands, \"{mov\t%0,%1|%0 = %1}\");
>> - *return bpf_output_move (operands, \"{lddw\t%0,%1|%0 = %1 ll}\");
>> - *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %0 = %1}\");
>> - *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %0 = %1}\");"
>> + *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%w0 = *(<smop> *) %w1}\");
>> + *return bpf_output_move (operands, \"{mov\t%0,%1|%w0 = %w1}\");
>> + *return bpf_output_move (operands, \"{lddw\t%0,%1|%w0 = %w1 ll}\");
>> + *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");
>> + *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");"
> Based on our discussion in the BPF GCC meeting this morning
> I think we decided that we should not emit wN for ld/st[x],
> (though gas should accommodate them).
Correct, will do that in v2.
> And the lddw pattern doesn't need to accommodate w regs since
> it is exclusively for immediate values.
Correct, will revert that as well.
> So then the only pattern that needs to change here would be
> the register-register 'mov' insn itself. But the pattern is
> matching a move from MM to MM, i.e. the source and dest must
> already be the same mode. So are you sure we would need to emit
> w reg in this case?
Right, the modes not matching is not the issue, it emitting rN reg when
both operands are SImode is the problem which I describe in changelog above.
> Maybe I am wrong but to me it seems the behavior in the test
> below (and in the pr) must be more related to extendhisi.
Yes and No :-)
You are right that for the test below, it starts of with extend, sidi2
to be precise, which gets expanded to 2 shifts (Its on my todo to avoid
doing this for cpuv4 for better combine outcomes, but anyways that's for
later).
(insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
(ashift:DI (subreg:DI (reg:SI 21) 0)
(const_int 32 [0x20])))
(insn 9 8 10 2 (set (reg:DI 19 [ _1 ])
(ashiftrt:DI (reg:DI 19 [ _1 ])
(const_int 32 [0x20])))
(insn 10 9 11 2 (set (reg:SI 1 %r1)
(subreg/s/u:SI (reg:DI 19 [ _1 ]) 0))
Combine is able to see thru all this to generate insn 14
scanning new insn with uid = 14.
...
allowing combination of insns 8 and 9
deferring deletion of insn with uid = 8.
allowing combination of insns 9 and 10
deferring deletion of insn with uid = 9.
rescanning insn with uid = 10.
(insn 14 5 6 2 (set (reg:SI 23)
(reg:SI 0 %r0)) {*movsi}
And the -dP annotated output also confirms movsi involvement for the
actual asm.
#(insn 14 5 11 (set (reg:SI 1 %r1 [23])
# (reg:SI 0 %r0))
"gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c":10:4 35 {*movsi}
# (expr_list:REG_DEAD (reg:SI 0 %r0)
# (nil)))
r1 = r0 # 14 [c=4 l=8] *movsi/1
Spurred by your comment, I added a variant with "short" ret/args and
looks like movhi needs even more fixing as it needs to generate wN =
(s16) wM. I need to see if that needs to fixed in asm pattern or will
PROMOTE_MODE change later be able to handle that in the core.
Thx,
-Vineet
On 3/30/26 13:57, Vineet Gupta wrote:
> Hi David,
>
> Thanks for quickly reviewing the patches.
>
> On 3/30/26 11:45 AM, David Faust wrote:
>>> From: Vineet Gupta <gvineet@fb.com>
>
> This somehow snuck in, will fix in v2.
>
>>> movsi currently only generates DImode rN regs, despite RTL being SImode.
>>>
>>> | (insn 14 5 11 (set (reg:SI 1 %r1 [23])
>>> | (reg:SI 0 %r0)) {*movsi}
>>> | (expr_list:REG_DEAD (reg:SI 0 %r0)
>>> | (nil)))
>>>
>>> generates
>>>
>>> | r1 = r0
>>>
>>> as opposed to
>>>
>>> | w1 = w0
>
> [snip...]
>
>>> - *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%0 = *(<smop> *) %1}\");
>>> - *return bpf_output_move (operands, \"{mov\t%0,%1|%0 = %1}\");
>>> - *return bpf_output_move (operands, \"{lddw\t%0,%1|%0 = %1 ll}\");
>>> - *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %0 = %1}\");
>>> - *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %0 = %1}\");"
>>> + *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%w0 = *(<smop> *) %w1}\");
>>> + *return bpf_output_move (operands, \"{mov\t%0,%1|%w0 = %w1}\");
>>> + *return bpf_output_move (operands, \"{lddw\t%0,%1|%w0 = %w1 ll}\");
>>> + *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");
>>> + *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");"
>> Based on our discussion in the BPF GCC meeting this morning
>> I think we decided that we should not emit wN for ld/st[x],
>> (though gas should accommodate them).
>
> Correct, will do that in v2.
>
>> And the lddw pattern doesn't need to accommodate w regs since
>> it is exclusively for immediate values.
>
> Correct, will revert that as well.
>
>> So then the only pattern that needs to change here would be
>> the register-register 'mov' insn itself. But the pattern is
>> matching a move from MM to MM, i.e. the source and dest must
>> already be the same mode. So are you sure we would need to emit
>> w reg in this case?
>
> Right, the modes not matching is not the issue, it emitting rN reg when
> both operands are SImode is the problem which I describe in changelog above.
Hm...
> this can be a correctness issue where mov needs to zero out the upper bits.
I guess I am missing something but honestly I am confused by this part.
Why would mov need to zero out the upper bits?
I mean, this case is matching only say a (set (reg:SI rD) (reg:SI rS)).
i.e. the source register already has an SImode value. We aren't counting
on the state of the bits outside that SImode value. Moving either the
low 32-bits or all 64, doesn't make a difference to that SImode value.
Explicitly zeroing the upper 32 bits, while it may be achieved by using
a _BPF_ mov32 insn, I don't think is an exact match to the semantics of
a simple reg-to-reg move.
I understood from the testcase/pr that you are saying the issue is with
function return/argument passing and extension. But I don't think this
simple case matching on set reg:MM reg:MM is the proper way to
address that.
>
>> Maybe I am wrong but to me it seems the behavior in the test
>> below (and in the pr) must be more related to extendhisi.
>
> Yes and No :-)
>
> You are right that for the test below, it starts of with extend, sidi2
> to be precise, which gets expanded to 2 shifts (Its on my todo to avoid
> doing this for cpuv4 for better combine outcomes, but anyways that's for
> later).
>
> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
> (ashift:DI (subreg:DI (reg:SI 21) 0)
> (const_int 32 [0x20])))
> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])
> (ashiftrt:DI (reg:DI 19 [ _1 ])
> (const_int 32 [0x20])))
> (insn 10 9 11 2 (set (reg:SI 1 %r1)
> (subreg/s/u:SI (reg:DI 19 [ _1 ]) 0))
>
> Combine is able to see thru all this to generate insn 14
OK, the value is sign extended to DImode, and then truncated
back to SImode. The original value in SImode is unchanged...
>
> scanning new insn with uid = 14.
> ...
> allowing combination of insns 8 and 9
> deferring deletion of insn with uid = 8.
> allowing combination of insns 9 and 10
> deferring deletion of insn with uid = 9.
> rescanning insn with uid = 10.
>
> (insn 14 5 6 2 (set (reg:SI 23)
> (reg:SI 0 %r0)) {*movsi}
... so this result from combine is valid; the net effect
is simply to move the SImode value from one reg to another,
keeping SImode.
I'm not sure exactly what is supposed to happen in the test.
The arg is passed as int, i.e. SImode. The origin of that value
is another int, also SImode. Why do the upper bits of the reg need
to be explicitly zeroed?
>
> And the -dP annotated output also confirms movsi involvement for the
> actual asm.
>
> #(insn 14 5 11 (set (reg:SI 1 %r1 [23])
> # (reg:SI 0 %r0))
> "gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c":10:4 35 {*movsi}
> # (expr_list:REG_DEAD (reg:SI 0 %r0)
> # (nil)))
> r1 = r0 # 14 [c=4 l=8] *movsi/1
>
>
> Spurred by your comment, I added a variant with "short" ret/args and
> looks like movhi needs even more fixing as it needs to generate wN =
> (s16) wM. I need to see if that needs to fixed in asm pattern or will
> PROMOTE_MODE change later be able to handle that in the core.
>
> Thx,
> -Vineet
On 3/31/26 3:42 PM, David Faust wrote:
>>> So then the only pattern that needs to change here would be
>>> the register-register 'mov' insn itself. But the pattern is
>>> matching a move from MM to MM, i.e. the source and dest must
>>> already be the same mode. So are you sure we would need to emit
>>> w reg in this case?
>> Right, the modes not matching is not the issue, it emitting rN reg when
>> both operands are SImode is the problem which I describe in changelog above.
> Hm...
>
>> this can be a correctness issue where mov needs to zero out the upper bits.
> I guess I am missing something but honestly I am confused by this part.
> Why would mov need to zero out the upper bits?
>
> I mean, this case is matching only say a (set (reg:SI rD) (reg:SI rS)).
> i.e. the source register already has an SImode value. We aren't counting
> on the state of the bits outside that SImode value. Moving either the
> low 32-bits or all 64, doesn't make a difference to that SImode value.
I think I'm starting to see where you are coming from.
> Explicitly zeroing the upper 32 bits, while it may be achieved by using
> a _BPF_ mov32 insn, I don't think is an exact match to the semantics of
> a simple reg-to-reg move.
Seems fair.
> I understood from the testcase/pr that you are saying the issue is with
> function return/argument passing and extension. But I don't think this
> simple case matching on set reg:MM reg:MM is the proper way to
> address that.
Right and that's not the intent, this is not claiming to fix the ABI,
that's a totally different change, this is just the precursor, at least
in my mind's eye, but I'm willing to concede that view-point.
So let me step back, all of this originated from ABI change work and
roughly based on HJ's v1 (not v2), I was starting to see the light.
There was a grand total of 1 additional selftest failure which was
attach_probe. The manually reduced test (attached here) showed the
failing case (is essentially beefier version of the test in this patch,
but same in spirit)
So with the ABI hooks and everything in place, what we were seeing was
following RTL stream
(call_insn 7 6 8 2 (set (reg:SI 0 %r0)
(call (mem:DI (symbol_ref:DI ("bpf_copy_from_user_str") [flags
0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>) [0
bpf_copy_from_user_str S8 A64])
(const_int 0 [0])))
(expr_list:REG_CALL_DECL (symbol_ref:DI ("bpf_copy_from_user_str")
[flags 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>)
(nil))
(expr_list:SI (use (reg:SI 1 %r1))
(nil)))
(insn 8 7 9 2 (set (reg/v:SI 19 [ ret ])
(reg:SI 0 %r0))
(nil))
(jump_insn 9 8 10 2 (set (pc)
(if_then_else (ne (reg/v:SI 19 [ ret ])
(const_int 4 [0x4]))
(label_ref:DI 33)
(pc))) {*branch_on_si}
-> 33)
(insn 11 10 12 4 (set (reg:SI 1 %r1)
(const_int 4 [0x4]))
(nil))
cse1 equivalence logic deduces that reg 19 = 4 = SI, guaranteed by
control flow, so we end up with
(insn 11 10 12 3 (set (reg:SI 1 %r1)
(reg/v:SI 19 [ ret ])) {*movsi}
(expr_list:REG_EQUAL (const_int 4 [0x4])
(nil)))
And this carries all the way to the end, which if generates rN is would
be problematic and was the reason for this change.
>>> Maybe I am wrong but to me it seems the behavior in the test
>>> below (and in the pr) must be more related to extendhisi.
>> Yes and No :-)
>>
>> You are right that for the test below, it starts of with extend, sidi2
>> to be precise, which gets expanded to 2 shifts (Its on my todo to avoid
>> doing this for cpuv4 for better combine outcomes, but anyways that's for
>> later).
>>
>> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
>> (ashift:DI (subreg:DI (reg:SI 21) 0)
>> (const_int 32 [0x20])))
>> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])
>> (ashiftrt:DI (reg:DI 19 [ _1 ])
>> (const_int 32 [0x20])))
>> (insn 10 9 11 2 (set (reg:SI 1 %r1)
>> (subreg/s/u:SI (reg:DI 19 [ _1 ]) 0))
>>
>> Combine is able to see thru all this to generate insn 14
> OK, the value is sign extended to DImode, and then truncated
> back to SImode. The original value in SImode is unchanged...
>
>> scanning new insn with uid = 14.
>> ...
>> allowing combination of insns 8 and 9
>> deferring deletion of insn with uid = 8.
>> allowing combination of insns 9 and 10
>> deferring deletion of insn with uid = 9.
>> rescanning insn with uid = 10.
>>
>> (insn 14 5 6 2 (set (reg:SI 23)
>> (reg:SI 0 %r0)) {*movsi}
> ... so this result from combine is valid; the net effect
> is simply to move the SImode value from one reg to another,
> keeping SImode.
>
> I'm not sure exactly what is supposed to happen in the test.
> The arg is passed as int, i.e. SImode. The origin of that value
> is another int, also SImode. Why do the upper bits of the reg need
> to be explicitly zeroed?
You may be right that test case is more ABI leaning (and this pinskia
said something on PR), but even with the ABI change, this adjustment
will still be needed IMO.
Thx,
-Vineet
> On 3/31/26 3:42 PM, David Faust wrote:
>>>> So then the only pattern that needs to change here would be
>>>> the register-register 'mov' insn itself. But the pattern is
>>>> matching a move from MM to MM, i.e. the source and dest must
>>>> already be the same mode. So are you sure we would need to emit
>>>> w reg in this case?
>>> Right, the modes not matching is not the issue, it emitting rN reg when
>>> both operands are SImode is the problem which I describe in changelog above.
>> Hm...
>>
>>> this can be a correctness issue where mov needs to zero out the upper bits.
>> I guess I am missing something but honestly I am confused by this part.
>> Why would mov need to zero out the upper bits?
>>
>> I mean, this case is matching only say a (set (reg:SI rD) (reg:SI rS)).
>> i.e. the source register already has an SImode value. We aren't counting
>> on the state of the bits outside that SImode value. Moving either the
>> low 32-bits or all 64, doesn't make a difference to that SImode value.
>
> I think I'm starting to see where you are coming from.
>
>
>> Explicitly zeroing the upper 32 bits, while it may be achieved by using
>> a _BPF_ mov32 insn, I don't think is an exact match to the semantics of
>> a simple reg-to-reg move.
>
> Seems fair.
>
>> I understood from the testcase/pr that you are saying the issue is with
>> function return/argument passing and extension. But I don't think this
>> simple case matching on set reg:MM reg:MM is the proper way to
>> address that.
>
> Right and that's not the intent, this is not claiming to fix the ABI,
> that's a totally different change, this is just the precursor, at
> least in my mind's eye, but I'm willing to concede that view-point.
>
> So let me step back, all of this originated from ABI change work and
> roughly based on HJ's v1 (not v2), I was starting to see the
> light. There was a grand total of 1 additional selftest failure which
> was attach_probe. The manually reduced test (attached here) showed the
> failing case (is essentially beefier version of the test in this
> patch, but same in spirit)
>
> So with the ABI hooks and everything in place, what we were seeing was
> following RTL stream
>
> (call_insn 7 6 8 2 (set (reg:SI 0 %r0)
> (call (mem:DI (symbol_ref:DI ("bpf_copy_from_user_str") [flags
> 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>) [0
> bpf_copy_from_user_str S8 A64])
> (const_int 0 [0])))
> (expr_list:REG_CALL_DECL (symbol_ref:DI
> ("bpf_copy_from_user_str") [flags 0x41] <function_decl 0x7f10c1547300
> bpf_copy_from_user_str>)
> (nil))
> (expr_list:SI (use (reg:SI 1 %r1))
> (nil)))
>
> (insn 8 7 9 2 (set (reg/v:SI 19 [ ret ])
> (reg:SI 0 %r0))
> (nil))
>
> (jump_insn 9 8 10 2 (set (pc)
> (if_then_else (ne (reg/v:SI 19 [ ret ])
> (const_int 4 [0x4]))
> (label_ref:DI 33)
> (pc))) {*branch_on_si}
> -> 33)
>
> (insn 11 10 12 4 (set (reg:SI 1 %r1)
> (const_int 4 [0x4]))
> (nil))
>
> cse1 equivalence logic deduces that reg 19 = 4 = SI, guaranteed by
> control flow, so we end up with
>
> (insn 11 10 12 3 (set (reg:SI 1 %r1)
> (reg/v:SI 19 [ ret ])) {*movsi}
> (expr_list:REG_EQUAL (const_int 4 [0x4])
> (nil)))
>
> And this carries all the way to the end, which if generates rN is
> would be problematic and was the reason for this change.
>
>>>> Maybe I am wrong but to me it seems the behavior in the test
>>>> below (and in the pr) must be more related to extendhisi.
>>> Yes and No :-)
>>>
>>> You are right that for the test below, it starts of with extend, sidi2
>>> to be precise, which gets expanded to 2 shifts (Its on my todo to avoid
>>> doing this for cpuv4 for better combine outcomes, but anyways that's for
>>> later).
>>>
>>> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
>>> (ashift:DI (subreg:DI (reg:SI 21) 0)
>>> (const_int 32 [0x20])))
>>> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])
>>> (ashiftrt:DI (reg:DI 19 [ _1 ])
>>> (const_int 32 [0x20])))
>>> (insn 10 9 11 2 (set (reg:SI 1 %r1)
>>> (subreg/s/u:SI (reg:DI 19 [ _1 ]) 0))
>>>
>>> Combine is able to see thru all this to generate insn 14
>> OK, the value is sign extended to DImode, and then truncated
>> back to SImode. The original value in SImode is unchanged...
>>
>>> scanning new insn with uid = 14.
>>> ...
>>> allowing combination of insns 8 and 9
>>> deferring deletion of insn with uid = 8.
>>> allowing combination of insns 9 and 10
>>> deferring deletion of insn with uid = 9.
>>> rescanning insn with uid = 10.
>>>
>>> (insn 14 5 6 2 (set (reg:SI 23)
>>> (reg:SI 0 %r0)) {*movsi}
>> ... so this result from combine is valid; the net effect
>> is simply to move the SImode value from one reg to another,
>> keeping SImode.
>>
>> I'm not sure exactly what is supposed to happen in the test.
>> The arg is passed as int, i.e. SImode. The origin of that value
>> is another int, also SImode. Why do the upper bits of the reg need
>> to be explicitly zeroed?
>
> You may be right that test case is more ABI leaning (and this pinskia
> said something on PR), but even with the ABI change, this adjustment
> will still be needed IMO.
Regardless of the context, ABI conventions, or any other consideration,
a move (set (reg:SI) (reg:SI)) should _not_ require mov32, because the
source register is supposed to already have the upper bits zeroed, or it
would not contain a value of mode SI.
The different movement/extension/etc insns conform a sort of a closure:
all the possible ways to put a SI value into a register, be it from
register or memory or immediate, must result into properly handled upper
32/48/56 bits. Moves from registers and immediates with values of
different moves involve expansions of extend* and friends, moves from
memory involves ldx and friends, all of which handle upper unused bits
properly. Given this, moves from registers of _the same_ mode should
not require anything extra than just copying all the 64 bits, i.e. mov.
So I think that if you can contrive an example in which you would need
emitting mov32 for such a move then the problem is somewhere else, and
you probably should ask yourself, why are the source register's contents
invalid.
You probably want to avoid optimizing while investigating this, because
otherwise a truncation + move will very likely be conflated into a
single perhaps-truncating-move and you will not be able to determine
what the source of the problem is..
> diff --git a/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c b/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
> new file mode 100644
> index 000000000000..6d0a4f280cd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/ret-reuse-arg-1.c
> @@ -0,0 +1,14 @@
> +/* Return value of first call is arg to second call. */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcpu=v4" } */
> +
> +int ret_int ();
> +void arg_int (int);
> +
> +void foo () {
> + arg_int(ret_int ());
> +}
> +
> +/* { dg-final { scan-assembler-not {r1 = r0} } } */
> +/* { dg-final { scan-assembler-times {w1 = w0} 1 } } */
As discussed in BZ 124688, these expected results are incorrect. There
is absolutely nothing wrong with the mov.
As things stand now, it is the callee that converts the value returned,
not the caller. Once we change the ABI to match llvm a conversion will
happen after the call, but even then a mov will be still the right
instruction to use to move a value of any mode from register to
register, provided the mode is not changed.
On 3/31/26 16:22, Vineet Gupta wrote:
>
> On 3/31/26 3:42 PM, David Faust wrote:
>>>> So then the only pattern that needs to change here would be
>>>> the register-register 'mov' insn itself. But the pattern is
>>>> matching a move from MM to MM, i.e. the source and dest must
>>>> already be the same mode. So are you sure we would need to emit
>>>> w reg in this case?
>>> Right, the modes not matching is not the issue, it emitting rN reg when
>>> both operands are SImode is the problem which I describe in changelog above.
>> Hm...
>>
>>> this can be a correctness issue where mov needs to zero out the upper bits.
>> I guess I am missing something but honestly I am confused by this part.
>> Why would mov need to zero out the upper bits?
>>
>> I mean, this case is matching only say a (set (reg:SI rD) (reg:SI rS)).
>> i.e. the source register already has an SImode value. We aren't counting
>> on the state of the bits outside that SImode value. Moving either the
>> low 32-bits or all 64, doesn't make a difference to that SImode value.
>
> I think I'm starting to see where you are coming from.
>
>
>> Explicitly zeroing the upper 32 bits, while it may be achieved by using
>> a _BPF_ mov32 insn, I don't think is an exact match to the semantics of
>> a simple reg-to-reg move.
>
> Seems fair.
>
>> I understood from the testcase/pr that you are saying the issue is with
>> function return/argument passing and extension. But I don't think this
>> simple case matching on set reg:MM reg:MM is the proper way to
>> address that.>
> Right and that's not the intent, this is not claiming to fix the ABI,
> that's a totally different change, this is just the precursor, at least
> in my mind's eye, but I'm willing to concede that view-point.
>
> So let me step back, all of this originated from ABI change work and
> roughly based on HJ's v1 (not v2), I was starting to see the light.
> There was a grand total of 1 additional selftest failure which was
> attach_probe. The manually reduced test (attached here) showed the
> failing case (is essentially beefier version of the test in this patch,
> but same in spirit)
Pasting here the attached test for reference:
> /* Extracted from BPF selftests attach_probe: verifier failure.
> Return value of first call is reused as argument of second.
> It either needs to be explicitly truncated to s32 or w regs need
> to be used. */
> /* { dg-do compile } */
> /* { dg-options "-O2 -mcpu=v4" } */
>
> extern int bpf_copy_from_user_str(int dst__sz);
>
> _Bool verify_sleepable_user_copy_str(void)
> {
> int ret;
> char data_short_pad[4];
>
> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
> if (ret != 4)
> return false;
>
> /* ret above is guaranteed to be 4, same as sizeof. */
> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
> if (ret != 4)
> return false;
>
> return true;
> }
>
> /* Don't generate 64-bit mov for arg, do 32-bit zero extended mov. */
> /* { dg-final { scan-assembler-not {r1 = r0} } } */
> /* { dg-final { scan-assembler-not {r1 = \(s32\) r0} } } */
> /* { dg-final { scan-assembler-times {w1 = w0} 1 } } */
I'm confused by the comment at the top.
First:
> Return value of first call is reused as argument of second.
Where? The only calls I see are identical:
ret = bpf_copy_from_user_str (sizeof (data_short_pad));
The return value is not reused as an argument.
Secondly:
> It either needs to be explicitly truncated to s32 or w regs need
> to be used. */
Why? Assuming the value were actually being reused, we would be
taking an int return value and passing it as an int argument to a
function taking an int. There is no conversion needed.
>
> So with the ABI hooks and everything in place, what we were seeing was
> following RTL stream
In the RTL below it's clear that the values are all being treated
in SImode:
>
> (call_insn 7 6 8 2 (set (reg:SI 0 %r0)
So the call return value is SImode.
> (call (mem:DI (symbol_ref:DI ("bpf_copy_from_user_str") [flags
> 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>) [0
> bpf_copy_from_user_str S8 A64])
> (const_int 0 [0])))
> (expr_list:REG_CALL_DECL (symbol_ref:DI ("bpf_copy_from_user_str")
> [flags 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>)
> (nil))
> (expr_list:SI (use (reg:SI 1 %r1))
(use (reg:SI 1 %r1)) : the arg is passed SI mode
> (nil)))
>
> (insn 8 7 9 2 (set (reg/v:SI 19 [ ret ])
> (reg:SI 0 %r0))
Return value copied to pseudo r19 in SImode to be used for
the comparison.
> (nil))
>
> (jump_insn 9 8 10 2 (set (pc)
> (if_then_else (ne (reg/v:SI 19 [ ret ])
> (const_int 4 [0x4]))
> (label_ref:DI 33)
> (pc))) {*branch_on_si}
> -> 33)
> > (insn 11 10 12 4 (set (reg:SI 1 %r1)
> (const_int 4 [0x4]))
> (nil))
>
> cse1 equivalence logic deduces that reg 19 = 4 = SI, guaranteed by
> control flow, so we end up with
Note that r19 is SImode a priori, to match the mode of the value from %r0.
It is not SI mode as a result of cse1 determining that it holds 4.
>
> (insn 11 10 12 3 (set (reg:SI 1 %r1)
> (reg/v:SI 19 [ ret ])) {*movsi}
> (expr_list:REG_EQUAL (const_int 4 [0x4])
> (nil)))
> And this carries all the way to the end, which if generates rN is would
> be problematic and was the reason for this change.
And again, what precisely is problematic about generating rN = rM
for this set?
If %r1 is subsequently used as the int argument to a function call,
then it is valid as is.
If it would be passed as a 'unsigned long' or something else requiring
a conversion then we would rather have e.g. a zero_extend:DI to do that,
not a simple set.
>
>>>> Maybe I am wrong but to me it seems the behavior in the test
>>>> below (and in the pr) must be more related to extendhisi.
>>> Yes and No :-)
>>>
>>> You are right that for the test below, it starts of with extend, sidi2
>>> to be precise, which gets expanded to 2 shifts (Its on my todo to avoid
>>> doing this for cpuv4 for better combine outcomes, but anyways that's for
>>> later).
>>>
>>> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
>>> (ashift:DI (subreg:DI (reg:SI 21) 0)
>>> (const_int 32 [0x20])))
>>> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])
>>> (ashiftrt:DI (reg:DI 19 [ _1 ])
>>> (const_int 32 [0x20])))
>>> (insn 10 9 11 2 (set (reg:SI 1 %r1)
>>> (subreg/s/u:SI (reg:DI 19 [ _1 ]) 0))
>>>
>>> Combine is able to see thru all this to generate insn 14
>> OK, the value is sign extended to DImode, and then truncated
>> back to SImode. The original value in SImode is unchanged...
>>
>>> scanning new insn with uid = 14.
>>> ...
>>> allowing combination of insns 8 and 9
>>> deferring deletion of insn with uid = 8.
>>> allowing combination of insns 9 and 10
>>> deferring deletion of insn with uid = 9.
>>> rescanning insn with uid = 10.
>>>
>>> (insn 14 5 6 2 (set (reg:SI 23)
>>> (reg:SI 0 %r0)) {*movsi}
>> ... so this result from combine is valid; the net effect
>> is simply to move the SImode value from one reg to another,
>> keeping SImode.
>>
>> I'm not sure exactly what is supposed to happen in the test.
>> The arg is passed as int, i.e. SImode. The origin of that value
>> is another int, also SImode. Why do the upper bits of the reg need
>> to be explicitly zeroed?
>
> You may be right that test case is more ABI leaning (and this pinskia
> said something on PR), but even with the ABI change, this adjustment
> will still be needed IMO.
>
> Thx,
> -Vineet
On 4/1/26 11:28 AM, David Faust wrote:
>> attach_probe. The manually reduced test (attached here) showed the
>> failing case (is essentially beefier version of the test in this patch,
>> but same in spirit)
> Pasting here the attached test for reference:
>
>> /* Extracted from BPF selftests attach_probe: verifier failure.
>> Return value of first call is reused as argument of second.
>> It either needs to be explicitly truncated to s32 or w regs need
>> to be used. */
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -mcpu=v4" } */
>>
>> extern int bpf_copy_from_user_str(int dst__sz);
>>
>> _Bool verify_sleepable_user_copy_str(void)
>> {
>> int ret;
>> char data_short_pad[4];
>>
>> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
>> if (ret != 4)
>> return false;
>>
>> /* ret above is guaranteed to be 4, same as sizeof. */
>> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
>> if (ret != 4)
>> return false;
>>
>> return true;
>> }
>>
>> /* Don't generate 64-bit mov for arg, do 32-bit zero extended mov. */
>> /* { dg-final { scan-assembler-not {r1 = r0} } } */
>> /* { dg-final { scan-assembler-not {r1 = \(s32\) r0} } } */
>> /* { dg-final { scan-assembler-times {w1 = w0} 1 } } */
> I'm confused by the comment at the top.
>
> First:
>
>> Return value of first call is reused as argument of second.
> Where? The only calls I see are identical:
> ret = bpf_copy_from_user_str (sizeof (data_short_pad));
>
> The return value is not reused as an argument.
I meant semantically. At the time of second call, @ret from first call
is guaranteed to be 4 by virtue of control flow.
And sizeof (data_short_pad) happens to be 4 as well, hence the reason
gcc:cse1 decides to reuse the reg/value for ret as arg for second call.
> Secondly:
>
>> It either needs to be explicitly truncated to s32 or w regs need
>> to be used. */
> Why? Assuming the value were actually being reused, we would be
> taking an int return value and passing it as an int argument to a
> function taking an int. There is no conversion needed.
Correct they are all int, so yes no conversion required, but we are
forgetting that big boss "verifier" is watching everything and causes
issues, see below...
>> So with the ABI hooks and everything in place, what we were seeing was
>> following RTL stream
> In the RTL below it's clear that the values are all being treated
> in SImode:
>
>> (call_insn 7 6 8 2 (set (reg:SI 0 %r0)
> So the call return value is SImode.
>
>> (call (mem:DI (symbol_ref:DI ("bpf_copy_from_user_str") [flags
>> 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>) [0
>> bpf_copy_from_user_str S8 A64])
>> (const_int 0 [0])))
>> (expr_list:REG_CALL_DECL (symbol_ref:DI ("bpf_copy_from_user_str")
>> [flags 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>)
>> (nil))
>> (expr_list:SI (use (reg:SI 1 %r1))
> (use (reg:SI 1 %r1)) : the arg is passed SI mode
>
>> (nil)))
>>
>> (insn 8 7 9 2 (set (reg/v:SI 19 [ ret ])
>> (reg:SI 0 %r0))
> Return value copied to pseudo r19 in SImode to be used for
> the comparison.
Yeah everything is SI mode and its fine from gcc's codegen point of view.
>
>> (nil))
>>
>> (jump_insn 9 8 10 2 (set (pc)
>> (if_then_else (ne (reg/v:SI 19 [ ret ])
>> (const_int 4 [0x4]))
>> (label_ref:DI 33)
>> (pc))) {*branch_on_si}
>> -> 33)
>>> (insn 11 10 12 4 (set (reg:SI 1 %r1)
>> (const_int 4 [0x4]))
>> (nil))
>>
>> cse1 equivalence logic deduces that reg 19 = 4 = SI, guaranteed by
>> control flow, so we end up with
> Note that r19 is SImode a priori, to match the mode of the value from %r0.
> It is not SI mode as a result of cse1 determining that it holds 4.
Right.
>> (insn 11 10 12 3 (set (reg:SI 1 %r1)
>> (reg/v:SI 19 [ ret ])) {*movsi}
>> (expr_list:REG_EQUAL (const_int 4 [0x4])
>> (nil)))
>> And this carries all the way to the end, which if generates rN is would
>> be problematic and was the reason for this change.
> And again, what precisely is problematic about generating rN = rM
> for this set?
Verifier trips up, #12 attach _probe
(tools/testing/selftests/bpf/progs/test_attach_probe.c)
> If %r1 is subsequently used as the int argument to a function call,
> then it is valid as is.
> If it would be passed as a 'unsigned long' or something else requiring
> a conversion then we would rather have e.g. a zero_extend:DI to do that,
> not a simple set.
The rN based move copies the full 64-bits, but the subsequent arg is
int: verifier wants to make sure top bits are cleared, which is not
guaranteed given the current codegen.
Full log below, relevant snippets extracted:
22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
...
24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
...
29: (85) call bpf_strncmp#182 ; R0=scalar()
30: (55) if r0 != 0x0 goto pc+1 ; R0=0
31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
...
36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
...
39: (85) call bpf_copy_from_user_str#85440
R2 min value is negative, either use unsigned or 'var &= const'
Now, thinking about it, part of the problem could also be #31 using w7
reg for comparison. If it had used the rN reg too, maybe verifier would
be confident about the top bits too and won't trip up. But that again
would point to impedance mismatch between movsi (only using rN regs) and
branch_on_si, which would use wN regs.
I did a quick hack to have branch_on_ generate rN forms (there's no way
this can be productized, since we might have even more issues with
verifier dealing with 64-bits now.
- case EQ: return "{jeq<msuffix>\t%0,%1,%2|if %w0 == %w1 goto %2}"; break;
+ case EQ: return "{jeq<msuffix>\t%0,%1,%2|if %0 == %1 goto %2}"; break;
And now it does generate rN for branch n compare
verify_sleepable_user_copy_str:
r1 = 4
call bpf_copy_from_user_str
if r0 == 4 goto .L5
r0 = 0
exit
.L5:
r1 = r0
call bpf_copy_from_user_str
w0 ^= 4
w0 = w0
r0 += -1
r0 >>= 63
exit
But this is besides the point, just a curiosity, since I'm pretty sure
verifier will hit way more false positives with rN regs in branch and
compare.
Full attach probe log (w/o the above hack) and code to generate it on my
gh portal [1]
[1] git@github.com:vineetgarc/gcc.git #bpf/promotion-n-abi
Thx,
-Vineet
--- #12 attach_probe:
libbpf: elf: skipping unrecognized data section(17) .comment
test_attach_probe:PASS:skel_open 0 nsec
libbpf: prog 'handle_uprobe_byname3_sleepable': BPF program load failed: -EACCES
libbpf: prog 'handle_uprobe_byname3_sleepable': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; @ vmlinux.h:49637
0: (b7) r2 = 9 ; R2=9
; @ test_attach_probe.c:92
1: (18) r6 = 0xffa00000017cd000 ; R6=map_value(map=test_att.bss,ks=4,vs=56)
3: (bf) r1 = r10 ; R1=fp0 R10=fp0
4: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
5: (07) r1 += -24 ; R1=fp-24
6: (85) call bpf_copy_from_user#148 ; R0=scalar() fp-16=???????m fp-24=mmmmmmmm
; @ test_attach_probe.c:93
7: (bf) r1 = r10 ; R1=fp0 R10=fp0
8: (18) r3 = 0xff11000113510930 ; R3=map_value(map=.rodata.str1.1,ks=4,vs=10)
10: (b7) r2 = 9 ; R2=9
11: (07) r1 += -24 ; R1=fp-24
12: (85) call bpf_strncmp#182 ; R0=scalar()
; @ test_attach_probe.c:147
13: (55) if r0 != 0x0 goto pc+3 ; R0=0
; @ test_attach_probe.c:148
14: (18) r0 = 0xffa00000017cd01c ; R0=map_value(map=test_att.bss,ks=4,vs=56,imm=28)
16: (62) *(u32 *)(r0 +0) = 9 ; R0=map_value(map=test_att.bss,ks=4,vs=56,imm=28)
; @ test_attach_probe.c:105
17: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
18: (b7) r2 = 4 ; R2=4
19: (b7) r4 = 0 ; R4=0
20: (bf) r1 = r10 ; R1=fp0 R10=fp0
21: (07) r1 += -88 ; R1=fp-88
22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
; @ test_attach_probe.c:107
23: (bf) r1 = r10 ; R1=fp0 R10=fp0
; @ test_attach_probe.c:105
24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
; @ test_attach_probe.c:107
25: (18) r3 = 0xff11000113514d30 ; R3=map_value(map=test_att.rodata,ks=4,vs=16)
27: (b7) r2 = 4 ; R2=4
28: (07) r1 += -88 ; R1=fp-88
29: (85) call bpf_strncmp#182 ; R0=scalar()
30: (55) if r0 != 0x0 goto pc+1 ; R0=0
31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
; @ test_attach_probe.c:110
34: (b7) r4 = 1 ; R4=1
35: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
37: (bf) r1 = r10 ; R1=fp0 R10=fp0
38: (07) r1 += -80 ; R1=fp-80
39: (85) call bpf_copy_from_user_str#85440
R2 min value is negative, either use unsigned or 'var &= const'
arg#0 arg#1 memory, len pair leads to invalid memory access
processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'handle_uprobe_byname3_sleepable': failed to load: -EACCES
libbpf: failed to load object 'test_attach_probe'
libbpf: failed to load BPF skeleton 'test_attach_probe': -EACCES
test_attach_probe:FAIL:skel_load unexpected error: -13 (errno 13)
test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
#12 attach_probe:FAIL
On 4/1/26 17:02, Vineet Gupta wrote:
> On 4/1/26 11:28 AM, David Faust wrote:
>>> attach_probe. The manually reduced test (attached here) showed the
>>> failing case (is essentially beefier version of the test in this patch,
>>> but same in spirit)
>> Pasting here the attached test for reference:
>>
>>> /* Extracted from BPF selftests attach_probe: verifier failure.
>>> Return value of first call is reused as argument of second.
>>> It either needs to be explicitly truncated to s32 or w regs need
>>> to be used. */
>>> /* { dg-do compile } */
>>> /* { dg-options "-O2 -mcpu=v4" } */
>>>
>>> extern int bpf_copy_from_user_str(int dst__sz);
>>>
>>> _Bool verify_sleepable_user_copy_str(void)
>>> {
>>> int ret;
>>> char data_short_pad[4];
>>>
>>> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
>>> if (ret != 4)
>>> return false;
>>>
>>> /* ret above is guaranteed to be 4, same as sizeof. */
>>> ret = bpf_copy_from_user_str(sizeof(data_short_pad));
>>> if (ret != 4)
>>> return false;
>>>
>>> return true;
>>> }
>>>
>>> /* Don't generate 64-bit mov for arg, do 32-bit zero extended mov. */
>>> /* { dg-final { scan-assembler-not {r1 = r0} } } */
>>> /* { dg-final { scan-assembler-not {r1 = \(s32\) r0} } } */
>>> /* { dg-final { scan-assembler-times {w1 = w0} 1 } } */
>> I'm confused by the comment at the top.
>>
>> First:
>>
>>> Return value of first call is reused as argument of second.
>> Where? The only calls I see are identical:
>> ret = bpf_copy_from_user_str (sizeof (data_short_pad));
>>
>> The return value is not reused as an argument.
>
> I meant semantically. At the time of second call, @ret from first call
> is guaranteed to be 4 by virtue of control flow.
> And sizeof (data_short_pad) happens to be 4 as well, hence the reason
> gcc:cse1 decides to reuse the reg/value for ret as arg for second call.
>
>> Secondly:
>>
>>> It either needs to be explicitly truncated to s32 or w regs need
>>> to be used. */
>> Why? Assuming the value were actually being reused, we would be
>> taking an int return value and passing it as an int argument to a
>> function taking an int. There is no conversion needed.
>
> Correct they are all int, so yes no conversion required, but we are
> forgetting that big boss "verifier" is watching everything and causes
> issues, see below...
>
>>> So with the ABI hooks and everything in place, what we were seeing was
>>> following RTL stream
>> In the RTL below it's clear that the values are all being treated
>> in SImode:
>>
>>> (call_insn 7 6 8 2 (set (reg:SI 0 %r0)
>> So the call return value is SImode.
>>
>>> (call (mem:DI (symbol_ref:DI ("bpf_copy_from_user_str") [flags
>>> 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>) [0
>>> bpf_copy_from_user_str S8 A64])
>>> (const_int 0 [0])))
>>> (expr_list:REG_CALL_DECL (symbol_ref:DI ("bpf_copy_from_user_str")
>>> [flags 0x41] <function_decl 0x7f10c1547300 bpf_copy_from_user_str>)
>>> (nil))
>>> (expr_list:SI (use (reg:SI 1 %r1))
>> (use (reg:SI 1 %r1)) : the arg is passed SI mode
>>
>>> (nil)))
>>>
>>> (insn 8 7 9 2 (set (reg/v:SI 19 [ ret ])
>>> (reg:SI 0 %r0))
>> Return value copied to pseudo r19 in SImode to be used for
>> the comparison.
>
> Yeah everything is SI mode and its fine from gcc's codegen point of view.
>
>>
>>> (nil))
>>>
>>> (jump_insn 9 8 10 2 (set (pc)
>>> (if_then_else (ne (reg/v:SI 19 [ ret ])
>>> (const_int 4 [0x4]))
>>> (label_ref:DI 33)
>>> (pc))) {*branch_on_si}
>>> -> 33)
>>>> (insn 11 10 12 4 (set (reg:SI 1 %r1)
>>> (const_int 4 [0x4]))
>>> (nil))
>>>
>>> cse1 equivalence logic deduces that reg 19 = 4 = SI, guaranteed by
>>> control flow, so we end up with
>> Note that r19 is SImode a priori, to match the mode of the value from %r0.
>> It is not SI mode as a result of cse1 determining that it holds 4.
>
> Right.
>
>>> (insn 11 10 12 3 (set (reg:SI 1 %r1)
>>> (reg/v:SI 19 [ ret ])) {*movsi}
>>> (expr_list:REG_EQUAL (const_int 4 [0x4])
>>> (nil)))
>>> And this carries all the way to the end, which if generates rN is would
>>> be problematic and was the reason for this change.
>> And again, what precisely is problematic about generating rN = rM
>> for this set?
>
> Verifier trips up, #12 attach _probe
> (tools/testing/selftests/bpf/progs/test_attach_probe.c)
>
>> If %r1 is subsequently used as the int argument to a function call,
>> then it is valid as is.
>> If it would be passed as a 'unsigned long' or something else requiring
>> a conversion then we would rather have e.g. a zero_extend:DI to do that,
>> not a simple set.
>
> The rN based move copies the full 64-bits, but the subsequent arg is
> int: verifier wants to make sure top bits are cleared, which is not
> guaranteed given the current codegen.
Ahhhhhh, so it is a verifier issue! Not a code correctness issue.
OK, that makes more sense.
>
> Full log below, relevant snippets extracted:
>
> 22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
> ...
> 24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
> ...
> 29: (85) call bpf_strncmp#182 ; R0=scalar()
> 30: (55) if r0 != 0x0 goto pc+1 ; R0=0
> 31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
> ...
>
> 36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
> ...
> 39: (85) call bpf_copy_from_user_str#85440
> R2 min value is negative, either use unsigned or 'var &= const'
>
> Now, thinking about it, part of the problem could also be #31 using w7
> reg for comparison. If it had used the rN reg too, maybe verifier would
> be confident about the top bits too and won't trip up. But that again
> would point to impedance mismatch between movsi (only using rN regs) and
> branch_on_si, which would use wN regs.
Hmm, looking at these verifier logs (thanks for providing), I think
the issue may stem from the bpf_copy_from_user_str call.
Ultimately the return value propagates r0 -> r7 -> r2.
BUT, I am confused. This test passes on my system compiled with
gcc trunk. After the call, we emit a movs32 when moving the return value
to a different register, which is enough for the verifier to know the
value is restricted to signed integer range.
Are you seeing this with trunk or with the PROMOTE_MODE or some other
change applied?
>
> I did a quick hack to have branch_on_ generate rN forms (there's no way
> this can be productized, since we might have even more issues with
> verifier dealing with 64-bits now.
>
> - case EQ: return "{jeq<msuffix>\t%0,%1,%2|if %w0 == %w1 goto %2}"; break;
> + case EQ: return "{jeq<msuffix>\t%0,%1,%2|if %0 == %1 goto %2}"; break;
>
> And now it does generate rN for branch n compare
>
> verify_sleepable_user_copy_str:
> r1 = 4
> call bpf_copy_from_user_str
> if r0 == 4 goto .L5
> r0 = 0
> exit
> .L5:
> r1 = r0
> call bpf_copy_from_user_str
> w0 ^= 4
> w0 = w0
> r0 += -1
> r0 >>= 63
> exit
>
> But this is besides the point, just a curiosity, since I'm pretty sure
> verifier will hit way more false positives with rN regs in branch and
> compare.
>
>
> Full attach probe log (w/o the above hack) and code to generate it on my
> gh portal [1]
> [1] git@github.com:vineetgarc/gcc.git #bpf/promotion-n-abi
>
>
> Thx,
> -Vineet
>
> --- #12 attach_probe:
> libbpf: elf: skipping unrecognized data section(17) .comment
> test_attach_probe:PASS:skel_open 0 nsec
> libbpf: prog 'handle_uprobe_byname3_sleepable': BPF program load failed: -EACCES
> libbpf: prog 'handle_uprobe_byname3_sleepable': -- BEGIN PROG LOAD LOG --
> 0: R1=ctx() R10=fp0
> ; @ vmlinux.h:49637
> 0: (b7) r2 = 9 ; R2=9
> ; @ test_attach_probe.c:92
> 1: (18) r6 = 0xffa00000017cd000 ; R6=map_value(map=test_att.bss,ks=4,vs=56)
> 3: (bf) r1 = r10 ; R1=fp0 R10=fp0
> 4: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
> 5: (07) r1 += -24 ; R1=fp-24
> 6: (85) call bpf_copy_from_user#148 ; R0=scalar() fp-16=???????m fp-24=mmmmmmmm
> ; @ test_attach_probe.c:93
> 7: (bf) r1 = r10 ; R1=fp0 R10=fp0
> 8: (18) r3 = 0xff11000113510930 ; R3=map_value(map=.rodata.str1.1,ks=4,vs=10)
> 10: (b7) r2 = 9 ; R2=9
> 11: (07) r1 += -24 ; R1=fp-24
> 12: (85) call bpf_strncmp#182 ; R0=scalar()
> ; @ test_attach_probe.c:147
> 13: (55) if r0 != 0x0 goto pc+3 ; R0=0
> ; @ test_attach_probe.c:148
> 14: (18) r0 = 0xffa00000017cd01c ; R0=map_value(map=test_att.bss,ks=4,vs=56,imm=28)
> 16: (62) *(u32 *)(r0 +0) = 9 ; R0=map_value(map=test_att.bss,ks=4,vs=56,imm=28)
> ; @ test_attach_probe.c:105
> 17: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
> 18: (b7) r2 = 4 ; R2=4
> 19: (b7) r4 = 0 ; R4=0
> 20: (bf) r1 = r10 ; R1=fp0 R10=fp0
> 21: (07) r1 += -88 ; R1=fp-88
> 22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
> ; @ test_attach_probe.c:107
> 23: (bf) r1 = r10 ; R1=fp0 R10=fp0
> ; @ test_attach_probe.c:105
> 24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
> ; @ test_attach_probe.c:107
> 25: (18) r3 = 0xff11000113514d30 ; R3=map_value(map=test_att.rodata,ks=4,vs=16)
> 27: (b7) r2 = 4 ; R2=4
> 28: (07) r1 += -88 ; R1=fp-88
> 29: (85) call bpf_strncmp#182 ; R0=scalar()
> 30: (55) if r0 != 0x0 goto pc+1 ; R0=0
> 31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
> ; @ test_attach_probe.c:110
> 34: (b7) r4 = 1 ; R4=1
> 35: (79) r3 = *(u64 *)(r6 +0) ; R3=scalar() R6=map_value(map=test_att.bss,ks=4,vs=56)
> 36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
> 37: (bf) r1 = r10 ; R1=fp0 R10=fp0
> 38: (07) r1 += -80 ; R1=fp-80
> 39: (85) call bpf_copy_from_user_str#85440
> R2 min value is negative, either use unsigned or 'var &= const'
> arg#0 arg#1 memory, len pair leads to invalid memory access
> processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 0
> -- END PROG LOAD LOG --
> libbpf: prog 'handle_uprobe_byname3_sleepable': failed to load: -EACCES
> libbpf: failed to load object 'test_attach_probe'
> libbpf: failed to load BPF skeleton 'test_attach_probe': -EACCES
> test_attach_probe:FAIL:skel_load unexpected error: -13 (errno 13)
> test_attach_probe:PASS:uprobe_ref_ctr_cleanup 0 nsec
> #12 attach_probe:FAIL
>
>
>
On 4/2/26 10:02 AM, David Faust wrote:
>>> And again, what precisely is problematic about generating rN = rM
>>> for this set?
>> Verifier trips up, #12 attach _probe
>> (tools/testing/selftests/bpf/progs/test_attach_probe.c)
>>
>>> If %r1 is subsequently used as the int argument to a function call,
>>> then it is valid as is.
>>> If it would be passed as a 'unsigned long' or something else requiring
>>> a conversion then we would rather have e.g. a zero_extend:DI to do that,
>>> not a simple set.
>> The rN based move copies the full 64-bits, but the subsequent arg is
>> int: verifier wants to make sure top bits are cleared, which is not
>> guaranteed given the current codegen.
> Ahhhhhh, so it is a verifier issue! Not a code correctness issue.
> OK, that makes more sense.
Yeah and my bad for not calling it in the changelog, but it was a
logistics issue as I'd seperated it out of ABI change but needed ABI
changes for the exact context.
>> Full log below, relevant snippets extracted:
>>
>> 22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
>> ...
>> 24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
>> ...
>> 29: (85) call bpf_strncmp#182 ; R0=scalar()
>> 30: (55) if r0 != 0x0 goto pc+1 ; R0=0
>> 31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
>> ...
>>
>> 36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
>> ...
>> 39: (85) call bpf_copy_from_user_str#85440
>> R2 min value is negative, either use unsigned or 'var &= const'
>>
>> Now, thinking about it, part of the problem could also be #31 using w7
>> reg for comparison. If it had used the rN reg too, maybe verifier would
>> be confident about the top bits too and won't trip up. But that again
>> would point to impedance mismatch between movsi (only using rN regs) and
>> branch_on_si, which would use wN regs.
> Hmm, looking at these verifier logs (thanks for providing), I think
> the issue may stem from the bpf_copy_from_user_str call.
> Ultimately the return value propagates r0 -> r7 -> r2.
Exactly.
> BUT, I am confused. This test passes on my system compiled with
> gcc trunk. After the call, we emit a movs32 when moving the return value
> to a different register, which is enough for the verifier to know the
> value is restricted to signed integer range.
>
> Are you seeing this with trunk or with the PROMOTE_MODE or some other
> change applied?
Yeah it's specifically with PROMOTE_MODE part of the ABI change, w/o
touching target hooks.
Thx,
-Vineet
On 4/2/26 13:44, Vineet Gupta wrote:
> On 4/2/26 10:02 AM, David Faust wrote:
>>>> And again, what precisely is problematic about generating rN = rM
>>>> for this set?
>>> Verifier trips up, #12 attach _probe
>>> (tools/testing/selftests/bpf/progs/test_attach_probe.c)
>>>
>>>> If %r1 is subsequently used as the int argument to a function call,
>>>> then it is valid as is.
>>>> If it would be passed as a 'unsigned long' or something else requiring
>>>> a conversion then we would rather have e.g. a zero_extend:DI to do that,
>>>> not a simple set.
>>> The rN based move copies the full 64-bits, but the subsequent arg is
>>> int: verifier wants to make sure top bits are cleared, which is not
>>> guaranteed given the current codegen.
>> Ahhhhhh, so it is a verifier issue! Not a code correctness issue.
>> OK, that makes more sense.
>
> Yeah and my bad for not calling it in the changelog, but it was a
> logistics issue as I'd seperated it out of ABI change but needed ABI
> changes for the exact context.
>
>>> Full log below, relevant snippets extracted:
>>>
>>> 22: (85) call bpf_copy_from_user_str#85440 ; R0=scalar()
>>> ...
>>> 24: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
>>> ...
>>> 29: (85) call bpf_strncmp#182 ; R0=scalar()
>>> 30: (55) if r0 != 0x0 goto pc+1 ; R0=0
>>> 31: (16) if w7 == 0x4 goto pc+2 34: R0=0 R6=map_value(map=test_att.bss,ks=4,vs=56) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R10=fp0 fp-16=???????m fp-24=mmmmmmmm fp-88=????mmmm
>>> ...
>>>
>>> 36: (bf) r2 = r7 ; R2=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000)) R7=scalar(id=1,smin=0x8000000000000004,smax=0x7fffffff00000004,umin=smin32=umin32=4,umax=0xffffffff00000004,smax32=umax32=4,var_off=(0x4; 0xffffffff00000000))
>>> ...
>>> 39: (85) call bpf_copy_from_user_str#85440
>>> R2 min value is negative, either use unsigned or 'var &= const'
>>>
>>> Now, thinking about it, part of the problem could also be #31 using w7
>>> reg for comparison. If it had used the rN reg too, maybe verifier would
>>> be confident about the top bits too and won't trip up. But that again
>>> would point to impedance mismatch between movsi (only using rN regs) and
>>> branch_on_si, which would use wN regs.
>> Hmm, looking at these verifier logs (thanks for providing), I think
>> the issue may stem from the bpf_copy_from_user_str call.
>> Ultimately the return value propagates r0 -> r7 -> r2.
>
> Exactly.
>
>> BUT, I am confused. This test passes on my system compiled with
>> gcc trunk. After the call, we emit a movs32 when moving the return value
>> to a different register, which is enough for the verifier to know the
>> value is restricted to signed integer range.
>>
>> Are you seeing this with trunk or with the PROMOTE_MODE or some other
>> change applied?
>
> Yeah it's specifically with PROMOTE_MODE part of the ABI change, w/o
> touching target hooks.
I see.
At this point it seems clear that there is no correctness issue with the
moves currently generated by gcc trunk.
Rather, the question is about the verifiability of the resulting code.
But verifier issues raised here also do not reproduce with gcc trunk.
Can you please send a reduced example showing precisely what code is
being rejected by the verifier (despite being semantically correct)
_as generated by gcc trunk_ ?
If these issues only appear with a patched gcc, then this patch to
address those issues should be part of a patchset including all other
changes which would necessitate this fix.
You are introducing a lot of confusion by sending a patch against
issues that do not seem to exist in trunk.
>
> Thx,
> -Vineet
On 4/3/26 9:09 AM, David Faust wrote:
>>> BUT, I am confused. This test passes on my system compiled with
>>> gcc trunk. After the call, we emit a movs32 when moving the return value
>>> to a different register, which is enough for the verifier to know the
>>> value is restricted to signed integer range.
>>>
>>> Are you seeing this with trunk or with the PROMOTE_MODE or some other
>>> change applied?
>> Yeah it's specifically with PROMOTE_MODE part of the ABI change, w/o
>> touching target hooks.
> I see.
>
> At this point it seems clear that there is no correctness issue with the
> moves currently generated by gcc trunk.
>
> Rather, the question is about the verifiability of the resulting code.
> But verifier issues raised here also do not reproduce with gcc trunk.
>
> Can you please send a reduced example showing precisely what code is
> being rejected by the verifier (despite being semantically correct)
> _as generated by gcc trunk_ ?
>
> If these issues only appear with a patched gcc, then this patch to
> address those issues should be part of a patchset including all other
> changes which would necessitate this fix.
Yeah that's what I concluded as well on this thread and IRC.
> You are introducing a lot of confusion by sending a patch against
> issues that do not seem to exist in trunk.
Yep.
-Vineet
@@ -383,11 +383,11 @@
(match_operand:MM 1 "mov_src_operand" " q,rIc,BC,r,I"))]
""
"@
- *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%0 = *(<smop> *) %1}\");
- *return bpf_output_move (operands, \"{mov\t%0,%1|%0 = %1}\");
- *return bpf_output_move (operands, \"{lddw\t%0,%1|%0 = %1 ll}\");
- *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %0 = %1}\");
- *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %0 = %1}\");"
+ *return bpf_output_move (operands, \"{ldx<mop>\t%0,%1|%w0 = *(<smop> *) %w1}\");
+ *return bpf_output_move (operands, \"{mov\t%0,%1|%w0 = %w1}\");
+ *return bpf_output_move (operands, \"{lddw\t%0,%1|%w0 = %w1 ll}\");
+ *return bpf_output_move (operands, \"{stx<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");
+ *return bpf_output_move (operands, \"{st<mop>\t%0,%1|*(<smop> *) %w0 = %w1}\");"
[(set_attr "type" "ldx,alu,alu,stx,st")])
;;;; Shifts
new file mode 100644
@@ -0,0 +1,14 @@
+/* Return value of first call is arg to second call. */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=v4" } */
+
+int ret_int ();
+void arg_int (int);
+
+void foo () {
+ arg_int(ret_int ());
+}
+
+/* { dg-final { scan-assembler-not {r1 = r0} } } */
+/* { dg-final { scan-assembler-times {w1 = w0} 1 } } */