[rs6000] Clean up pre-checking of expand_block_compare

Message ID e3179721-0e0a-4790-b244-55c95465ecc3@linux.ibm.com
State New
Headers
Series [rs6000] Clean up pre-checking of expand_block_compare |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply

Commit Message

HAO CHEN GUI Dec. 11, 2023, 2:54 a.m. UTC
  Hi,
  This patch cleans up pre-checking of expand_block_compare. It does
1. Assert only P7 above can enter this function as it's already guard
by the expand.
2. Return false when optimizing for size.
3. Remove P7 CPU test as only P7 above can enter this function and P7
LE is excluded by targetm.slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up pre-checking of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	only P7 above can enter this function.  Return false when it's
	optimized for size.  Remove P7 CPU test as only P7 above can enter
	this function and P7 LE is excluded by the checking of
	targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
	* gcc.target/powerpc/memcmp_for_size.c: New.


patch.diff
  

Comments

Kewen.Lin Dec. 12, 2023, 5:51 a.m. UTC | #1
Hi,

on 2023/12/11 10:54, HAO CHEN GUI wrote:
> Hi,
>   This patch cleans up pre-checking of expand_block_compare. It does
> 1. Assert only P7 above can enter this function as it's already guard
> by the expand.
> 2. Return false when optimizing for size.
> 3. Remove P7 CPU test as only P7 above can enter this function and P7
> LE is excluded by targetm.slow_unaligned_access.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Clean up pre-checking of expand_block_compare
> 
> gcc/
> 	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
> 	only P7 above can enter this function.  Return false when it's
> 	optimized for size.  Remove P7 CPU test as only P7 above can enter
> 	this function and P7 LE is excluded by the checking of
> 	targetm.slow_unaligned_access on word_mode.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/memcmp_for_size.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
> index d4030854b2a..dff69e90d0c 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
>  bool
>  expand_block_compare (rtx operands[])
>  {
> +  gcc_assert (TARGET_POPCNTD);

Nit: Add one comment on why we expect TARGET_POPCNTD here.

> +
> +  if (optimize_insn_for_size_p ())
> +    return false;
> +
> +  /* Allow this param to shut off all expansion.  */
> +  if (rs6000_block_compare_inline_limit == 0)
> +    return false;
> +
>    rtx target = operands[0];
>    rtx orig_src1 = operands[1];
>    rtx orig_src2 = operands[2];

Nit: Move these below closer to their uses.

> @@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
>    if (TARGET_32BIT && TARGET_POWERPC64)
>      return false;
> 
> -  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
> -
> -  /* Allow this param to shut off all expansion.  */
> -  if (rs6000_block_compare_inline_limit == 0)
> -    return false;
> -
> -  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
> -     However slow_unaligned_access returns true on P7 even though the
> -     performance of this code is good there.  */
> -  if (!isP7
> -      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
> -	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
> -    return false;
> -
> -  /* Unaligned l*brx traps on P7 so don't do this.  However this should
> -     not affect much because LE isn't really supported on P7 anyway.  */
> -  if (isP7 && !BYTES_BIG_ENDIAN)

IMHO we'd better to keep this check, since users are able to specify
no-strict-align on P7, that is we can't guarantee it's always strict-align
and can be rejected by targetm.slow_unaligned_access below.

> +  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
> +    if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
> +	|| targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
>      return false;

This change makes us respect targetm.slow_unaligned_access more, I like it.

> 
>    /* If this is not a fixed size compare, try generating loop code and
> @@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
>    if (!IN_RANGE (bytes, 1, max_bytes))
>      return expand_compare_loop (operands);
> 
> -  /* The code generated for p7 and older is not faster than glibc
> -     memcmp if alignment is small and length is not short, so bail
> -     out to avoid those conditions.  */
> -  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
> -      && ((base_align == 1 && bytes > 16)
> -	  || (base_align == 2 && bytes > 32)))
> -    return false;

Why did you change this?  I didn't see any explanation above or am I missing?

> -
>    rtx final_label = NULL;
> 
>    if (use_vec)
> diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
> new file mode 100644
> index 00000000000..c7e853ad593
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c

Nit: As the comment in another thread, it can be block-cmp-3.c or similar.

BR,
Kewen
 
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 4);
> +}
  

Patch

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index d4030854b2a..dff69e90d0c 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,6 +1946,15 @@  expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  gcc_assert (TARGET_POPCNTD);
+
+  if (optimize_insn_for_size_p ())
+    return false;
+
+  /* Allow this param to shut off all expansion.  */
+  if (rs6000_block_compare_inline_limit == 0)
+    return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
@@ -1959,23 +1968,9 @@  expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
     return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
-  /* Allow this param to shut off all expansion.  */
-  if (rs6000_block_compare_inline_limit == 0)
-    return false;
-
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
-    return false;
-
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+    if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+	|| targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2018,6 @@  expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
-      && ((base_align == 1 && bytes > 16)
-	  || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
new file mode 100644
index 00000000000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
@@ -0,0 +1,8 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}