[v2,FYI] -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]

Message ID orbkaw8vlx.fsf_-_@lxoliva.fsfla.org
State Committed
Commit a8a3d832e609501002dee54150abfd96a28fe532
Headers
Series [v2,FYI] -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm warning Patch is already merged
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 warning Patch is already merged

Commit Message

Alexandre Oliva Dec. 11, 2023, 6:03 p.m. UTC
  On Dec 11, 2023, Richard Biener <richard.guenther@gmail.com> wrote:

> you can use .exists (&move_mode) here to ...

Aah, yeah, and that should help avoid the noisy as_a conversions too,
that I could replace with require(), or drop altogether.

>> +      || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_move_mode))
>> +         != incr * BITS_PER_UNIT))

Unfortunately, here it can't quite be dropped, GET_MODE_SIZE on a
machine_mode isn't suitable for the !=, but with .require() we know it's
a scalar_int_mode and thus != on its bitsize is fine.

> I'll note that int_mode_for_size and smallest_int_mode_for_size
> are not semantically equivalent in what they can return.  In
> particular it seems you are incrementing by iter_incr even when
> formerly smallest_int_mode_for_size would have returned a
> larger than necessary mode, resulting in at least inefficient
> code, copying/comparing pieces multiple times.

If we get a mode that isn't exactly the expected size, we go for
BLKmode, so it should be fine and efficient.  Unless machine modes are
not powers of two multiples of BITS_PER_UNIT, then things may get a
little weird, not so much because of repeated copying/comparing, but
because of inefficiencies in the block copy/compare operations with
block sizes that are not a good fit for such hypothetical (?) GCC
targets.  I guess we can cross that bridge if we ever get to it.

> So int_mode_for_size looks more correct.

*nod*.  IIRC I had it at first (very long ago), and went for the
smallest_ when transitioning to the machine_mode type hierarchy revamp.

> OK with the above change.

Here's what I've regstrapped on x86_64-linux-gnu, and will install
shortly.  Thanks!


smallest_int_mode_for_size may abort when the requested mode is not
available.  Call int_mode_for_size instead, that signals the
unsatisfiable request in a more graceful way.


for  gcc/ChangeLog

	PR middle-end/112784
	* expr.cc (emit_block_move_via_loop): Call int_mode_for_size
	for maybe-too-wide sizes.
	(emit_block_cmp_via_loop): Likewise.

for  gcc/testsuite/ChangeLog

	PR middle-end/112784
	* gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
---
 gcc/expr.cc                                        |   20 +++++++++-----------
 .../i386/avx512cd-inline-stringops-pr112784.c      |   12 ++++++++++++
 2 files changed, 21 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
  

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 6da51f2aca296..076ba706537aa 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2449,15 +2449,14 @@  emit_block_move_via_loop (rtx x, rtx y, rtx size,
     }
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_move_mode
-    = smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_move_mode) != incr * BITS_PER_UNIT)
+  opt_scalar_int_mode int_move_mode
+    = int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_move_mode.exists (&move_mode)
+      || GET_MODE_BITSIZE (int_move_mode.require ()) != incr * BITS_PER_UNIT)
     {
       move_mode = BLKmode;
       gcc_checking_assert (can_move_by_pieces (incr, align));
     }
-  else
-    move_mode = int_move_mode;
 
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
   y_addr = force_operand (XEXP (y, 0), NULL_RTX);
@@ -2701,16 +2700,15 @@  emit_block_cmp_via_loop (rtx x, rtx y, rtx len, tree len_type, rtx target,
   iter = gen_reg_rtx (iter_mode);
   emit_move_insn (iter, iter_init);
 
-  scalar_int_mode int_cmp_mode
-    = smallest_int_mode_for_size (incr * BITS_PER_UNIT);
-  if (GET_MODE_BITSIZE (int_cmp_mode) != incr * BITS_PER_UNIT
-      || !can_compare_p (NE, int_cmp_mode, ccp_jump))
+  opt_scalar_int_mode int_cmp_mode
+    = int_mode_for_size (incr * BITS_PER_UNIT, 1);
+  if (!int_cmp_mode.exists (&cmp_mode)
+      || GET_MODE_BITSIZE (int_cmp_mode.require ()) != incr * BITS_PER_UNIT
+      || !can_compare_p (NE, cmp_mode, ccp_jump))
     {
       cmp_mode = BLKmode;
       gcc_checking_assert (incr != 1);
     }
-  else
-    cmp_mode = int_cmp_mode;
 
   /* Save the base addresses.  */
   x_addr = force_operand (XEXP (x, 0), NULL_RTX);
diff --git a/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
new file mode 100644
index 0000000000000..c81f99c693c24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mavx512cd -finline-stringops" } */
+
+struct S {
+  int e;
+} __attribute__((aligned(128)));
+
+int main() {
+  struct S s1;
+  struct S s2;
+  int v = __builtin_memcmp(&s1, &s2, sizeof(s1));
+}