[to-be-committed,RISC-V,PR,target/118248] Avoid bogus alloca call in RISC-V backend
Checks
Context |
Check |
Description |
rivoscibot/toolchain-ci-rivos-lint |
success
|
Lint passed
|
rivoscibot/toolchain-ci-rivos-apply-patch |
success
|
Patch applied
|
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-test |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
fail
|
Patch failed to apply
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
fail
|
Patch failed to apply
|
Commit Message
This is Jakub's patch and Ian's testcase for the slightly vexing fault
building the D runtime with an s390x-x-riscv cross compiler.
The core issue is we're allocating a vector to hold temporary registers
unconditionally, including cases where the vector isn't needed because
the loop isn't going to iterate.
In the cases where the vector isn't needed the length is computed with
an expression (x / y) - 1 where x / y will be zero. The alloca(-1) on
the s390 platform triggers a fault. We haven't seen the fault with an
x86 cross, but we can certainly see the bogus value being passed to
alloca with a debugger.
Jakub patch just conditionalizes the whole block in a sensible way. So
it looks larger than it really is. I thought it might be better to do a
bit of manual CSE on this code to make it even more obvious, but I think
we're ultimately OK here.
Ian provided the testcase, collapsed down into equivalent C code.
Again, it doesn't fault on an x86-x-riscv, but I can see the incorrect
behavior with a debugger.
And a shout-out to Stefan for providing a docker based reproducer, it
really helped track this down.
Waiting for the pre-commit tester to do its thing before committing.
Jeff
PR target/118248
gcc/
* config/riscv/riscv-string.cc (riscv_block_move_straight): Only
allocate REGS buffer if it will be needed.
gcc/testsuite
* gcc.target/riscv/pr118248.c: New test.
Comments
On Sun, Feb 16, 2025 at 4:38 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> This is Jakub's patch and Ian's testcase for the slightly vexing fault
> building the D runtime with an s390x-x-riscv cross compiler.
>
> The core issue is we're allocating a vector to hold temporary registers
> unconditionally, including cases where the vector isn't needed because
> the loop isn't going to iterate.
>
> In the cases where the vector isn't needed the length is computed with
> an expression (x / y) - 1 where x / y will be zero. The alloca(-1) on
> the s390 platform triggers a fault. We haven't seen the fault with an
> x86 cross, but we can certainly see the bogus value being passed to
> alloca with a debugger.
I would expect alloca(-1) to trigger a fault with -fstack-clash-protection
even on x86, so we should indeed avoid doing this.
>
> Jakub patch just conditionalizes the whole block in a sensible way. So
> it looks larger than it really is. I thought it might be better to do a
> bit of manual CSE on this code to make it even more obvious, but I think
> we're ultimately OK here.
>
> Ian provided the testcase, collapsed down into equivalent C code.
> Again, it doesn't fault on an x86-x-riscv, but I can see the incorrect
> behavior with a debugger.
>
> And a shout-out to Stefan for providing a docker based reproducer, it
> really helped track this down.
>
> Waiting for the pre-commit tester to do its thing before committing.
>
> Jeff
>
@@ -804,7 +804,7 @@ static void
riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
unsigned HOST_WIDE_INT align)
{
- unsigned HOST_WIDE_INT offset, delta;
+ unsigned HOST_WIDE_INT offset = 0, delta;
unsigned HOST_WIDE_INT bits;
int i;
enum machine_mode mode;
@@ -815,20 +815,25 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
mode = mode_for_size (bits, MODE_INT, 0).require ();
delta = bits / BITS_PER_UNIT;
- /* Allocate a buffer for the temporary registers. */
- regs = XALLOCAVEC (rtx, length / delta - 1);
-
- /* Load as many BITS-sized chunks as possible. Use a normal load if
- the source has enough alignment, otherwise use left/right pairs. */
- for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
+ if (2 * delta <= length)
{
- regs[i] = gen_reg_rtx (mode);
- riscv_emit_move (regs[i], adjust_address (src, mode, offset));
- }
+ /* Allocate a buffer for the temporary registers. */
+ regs = XALLOCAVEC (rtx, length / delta - 1);
+
+ /* Load as many BITS-sized chunks as possible. Use a normal load if
+ the source has enough alignment, otherwise use left/right pairs. */
+ for (offset = 0, i = 0; offset + 2 * delta <= length;
+ offset += delta, i++)
+ {
+ regs[i] = gen_reg_rtx (mode);
+ riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+ }
- /* Copy the chunks to the destination. */
- for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
- riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+ /* Copy the chunks to the destination. */
+ for (offset = 0, i = 0; offset + 2 * delta <= length;
+ offset += delta, i++)
+ riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+ }
/* Mop up any left-over bytes. */
if (offset < length)
new file mode 100644
@@ -0,0 +1,26 @@
+/* This does not fault with an x86 cross to risc-v, but does
+ with s390 cross to risc-v, probably due to rounding of the
+ argument to the alloca call within the risc-v backend. */
+/* { dg-do compile } */
+struct char100
+{
+ char data[100];
+};
+
+struct s118248
+{
+ void **vtbl;
+ struct char100 data;
+};
+
+void sink(struct char100 *buf);
+
+struct s118248 *pr118248(struct s118248 *pthis)
+{
+ struct char100 buf;
+ sink(&buf);
+ pthis->data = buf;
+ return pthis;
+}
+
+