[to-be-committed,RISC-V,PR,target/118248] Avoid bogus alloca call in RISC-V backend

Message ID c31b9773-ba54-49a6-b0b1-cbc904e5c457@gmail.com
State Committed
Delegated to: Jeff Law
Headers
Series [to-be-committed,RISC-V,PR,target/118248] Avoid bogus alloca call in RISC-V backend |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-lint success Lint passed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Patch failed to apply

Commit Message

Jeff Law Feb. 16, 2025, 3:37 p.m. UTC
  This is Jakub's patch and Ian's testcase for the slightly vexing fault 
building the D runtime with an s390x-x-riscv cross compiler.

The core issue is we're allocating a vector to hold temporary registers 
unconditionally, including cases where the vector isn't needed because 
the loop isn't going to iterate.

In the cases where the vector isn't needed the length is computed with 
an expression (x / y) - 1 where x / y will be zero.  The alloca(-1) on 
the s390 platform triggers a fault.  We haven't seen the fault with an 
x86 cross, but we can certainly see the bogus value being passed to 
alloca with a debugger.

Jakub patch just conditionalizes the whole block in a sensible way.  So 
it looks larger than it really is.  I thought it might be better to do a 
bit of manual CSE on this code to make it even more obvious, but I think 
we're ultimately OK here.

Ian provided the testcase, collapsed down into equivalent C code. 
Again, it doesn't fault on an x86-x-riscv, but I can see the incorrect 
behavior with a debugger.

And a shout-out to Stefan for providing a docker based reproducer, it 
really helped track this down.

Waiting for the pre-commit tester to do its thing before committing.

Jeff
PR target/118248
gcc/
	* config/riscv/riscv-string.cc (riscv_block_move_straight): Only
	allocate REGS buffer if it will be needed.

gcc/testsuite
	* gcc.target/riscv/pr118248.c: New test.
  

Comments

Richard Biener Feb. 17, 2025, 9:56 a.m. UTC | #1
On Sun, Feb 16, 2025 at 4:38 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> This is Jakub's patch and Ian's testcase for the slightly vexing fault
> building the D runtime with an s390x-x-riscv cross compiler.
>
> The core issue is we're allocating a vector to hold temporary registers
> unconditionally, including cases where the vector isn't needed because
> the loop isn't going to iterate.
>
> In the cases where the vector isn't needed the length is computed with
> an expression (x / y) - 1 where x / y will be zero.  The alloca(-1) on
> the s390 platform triggers a fault.  We haven't seen the fault with an
> x86 cross, but we can certainly see the bogus value being passed to
> alloca with a debugger.

I would expect alloca(-1) to trigger a fault with -fstack-clash-protection
even on x86, so we should indeed avoid doing this.

>
> Jakub patch just conditionalizes the whole block in a sensible way.  So
> it looks larger than it really is.  I thought it might be better to do a
> bit of manual CSE on this code to make it even more obvious, but I think
> we're ultimately OK here.
>
> Ian provided the testcase, collapsed down into equivalent C code.
> Again, it doesn't fault on an x86-x-riscv, but I can see the incorrect
> behavior with a debugger.
>
> And a shout-out to Stefan for providing a docker based reproducer, it
> really helped track this down.
>
> Waiting for the pre-commit tester to do its thing before committing.
>
> Jeff
>
  

Patch

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 408eb07e87f..90801899ec8 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -804,7 +804,7 @@  static void
 riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
 			   unsigned HOST_WIDE_INT align)
 {
-  unsigned HOST_WIDE_INT offset, delta;
+  unsigned HOST_WIDE_INT offset = 0, delta;
   unsigned HOST_WIDE_INT bits;
   int i;
   enum machine_mode mode;
@@ -815,20 +815,25 @@  riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
 
-  /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta - 1);
-
-  /* Load as many BITS-sized chunks as possible.  Use a normal load if
-     the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
+  if (2 * delta <= length)
     {
-      regs[i] = gen_reg_rtx (mode);
-      riscv_emit_move (regs[i], adjust_address (src, mode, offset));
-    }
+      /* Allocate a buffer for the temporary registers.  */
+      regs = XALLOCAVEC (rtx, length / delta - 1);
+
+      /* Load as many BITS-sized chunks as possible.  Use a normal load if
+	 the source has enough alignment, otherwise use left/right pairs.  */
+      for (offset = 0, i = 0; offset + 2 * delta <= length;
+	   offset += delta, i++)
+	{
+	  regs[i] = gen_reg_rtx (mode);
+	  riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+	}
 
-  /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
-    riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+      /* Copy the chunks to the destination.  */
+      for (offset = 0, i = 0; offset + 2 * delta <= length;
+	   offset += delta, i++)
+	riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+    }
 
   /* Mop up any left-over bytes.  */
   if (offset < length)
diff --git a/gcc/testsuite/gcc.target/riscv/pr118248.c b/gcc/testsuite/gcc.target/riscv/pr118248.c
new file mode 100644
index 00000000000..05bf002abac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr118248.c
@@ -0,0 +1,26 @@ 
+/* This does not fault with an x86 cross to risc-v, but does
+   with s390 cross to risc-v, probably due to rounding of the
+   argument to the alloca call within the risc-v backend.  */
+/* { dg-do compile } */
+struct char100
+{
+    char data[100];
+};
+
+struct s118248
+{
+    void **vtbl;
+    struct char100 data;
+};
+
+void sink(struct char100 *buf);
+
+struct s118248 *pr118248(struct s118248 *pthis)
+{
+    struct char100 buf;
+    sink(&buf);
+    pthis->data = buf;
+    return pthis;
+}
+
+