tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693]

Message ID ZVdyIFfKpN9rkOWh@tucnak
State New
Headers
Series tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Jakub Jelinek Nov. 17, 2023, 2:01 p.m. UTC
  Hi!

Per the earlier discussions on this PR, the following patch folds
popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
if the corresponding popcount optab isn't implemented (I think any
double-word popcount or call will be necessarily slower than the
above cheap 3 op check and even for -Os larger or same size).

I've noticed e.g. C++ aligned new starts with std::has_single_bit
which does popcount (x) == 1.

As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-17  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/90693
	* tree-ssa-math-opts.cc (match_single_bit_test): New function.
	(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
	and NE_EXPR assignments and GIMPLE_CONDs.

	* gcc.target/i386/pr90693.c: New test.


	Jakub
  

Comments

Jeff Law Nov. 19, 2023, 3:59 a.m. UTC | #1
On 11/17/23 07:01, Jakub Jelinek wrote:
> Hi!
> 
> Per the earlier discussions on this PR, the following patch folds
> popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
> if the corresponding popcount optab isn't implemented (I think any
> double-word popcount or call will be necessarily slower than the
> above cheap 3 op check and even for -Os larger or same size).
> 
> I've noticed e.g. C++ aligned new starts with std::has_single_bit
> which does popcount (x) == 1.
> 
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2023-11-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/90693
> 	* tree-ssa-math-opts.cc (match_single_bit_test): New function.
> 	(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
> 	and NE_EXPR assignments and GIMPLE_CONDs.
> 
> 	* gcc.target/i386/pr90693.c: New test.
OK.

Jeff
  
Richard Biener Nov. 20, 2023, 7:54 a.m. UTC | #2
On Fri, 17 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> Per the earlier discussions on this PR, the following patch folds
> popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
> if the corresponding popcount optab isn't implemented (I think any
> double-word popcount or call will be necessarily slower than the
> above cheap 3 op check and even for -Os larger or same size).
> 
> I've noticed e.g. C++ aligned new starts with std::has_single_bit
> which does popcount (x) == 1.
> 
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Classically this would have been an RTL expansion alternative, given
we want to do less of those the next place would have been the ISEL
pass.  Any particular reason you chose widening-mul for this (guess
that pass just has a bad name and it's the effective "optimize" ISEL pass
we have).

Richard.

> 2023-11-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/90693
> 	* tree-ssa-math-opts.cc (match_single_bit_test): New function.
> 	(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
> 	and NE_EXPR assignments and GIMPLE_CONDs.
> 
> 	* gcc.target/i386/pr90693.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj	2023-11-13 15:51:02.920562303 +0100
> +++ gcc/tree-ssa-math-opts.cc	2023-11-17 11:37:02.255905475 +0100
> @@ -5154,6 +5154,72 @@ match_uaddc_usubc (gimple_stmt_iterator
>    return true;
>  }
>  
> +/* Replace .POPCOUNT (x) == 1 or .POPCOUNT (x) != 1 with
> +   (x & (x - 1)) > x - 1 or (x & (x - 1)) <= x - 1 if .POPCOUNT
> +   isn't a direct optab.  */
> +
> +static void
> +match_single_bit_test (gimple_stmt_iterator *gsi, gimple *stmt)
> +{
> +  tree clhs, crhs;
> +  enum tree_code code;
> +  if (gimple_code (stmt) == GIMPLE_COND)
> +    {
> +      clhs = gimple_cond_lhs (stmt);
> +      crhs = gimple_cond_rhs (stmt);
> +      code = gimple_cond_code (stmt);
> +    }
> +  else
> +    {
> +      clhs = gimple_assign_rhs1 (stmt);
> +      crhs = gimple_assign_rhs2 (stmt);
> +      code = gimple_assign_rhs_code (stmt);
> +    }
> +  if (code != EQ_EXPR && code != NE_EXPR)
> +    return;
> +  if (TREE_CODE (clhs) != SSA_NAME || !integer_onep (crhs))
> +    return;
> +  gimple *call = SSA_NAME_DEF_STMT (clhs);
> +  combined_fn cfn = gimple_call_combined_fn (call);
> +  switch (cfn)
> +    {
> +    CASE_CFN_POPCOUNT:
> +      break;
> +    default:
> +      return;
> +    }
> +  if (!has_single_use (clhs))
> +    return;
> +  tree arg = gimple_call_arg (call, 0);
> +  tree type = TREE_TYPE (arg);
> +  if (!INTEGRAL_TYPE_P (type))
> +    return;
> +  if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH))
> +    return;
> +  tree argm1 = make_ssa_name (type);
> +  gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg,
> +				   build_int_cst (type, -1));
> +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  g = gimple_build_assign (make_ssa_name (type), BIT_XOR_EXPR, arg, argm1);
> +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  if (gcond *cond = dyn_cast <gcond *> (stmt))
> +    {
> +      gimple_cond_set_lhs (cond, gimple_assign_lhs (g));
> +      gimple_cond_set_rhs (cond, argm1);
> +      gimple_cond_set_code (cond, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
> +    }
> +  else
> +    {
> +      gimple_assign_set_rhs1 (stmt, gimple_assign_lhs (g));
> +      gimple_assign_set_rhs2 (stmt, argm1);
> +      gimple_assign_set_rhs_code (stmt, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
> +    }
> +  update_stmt (stmt);
> +  gimple_stmt_iterator gsi2 = gsi_for_stmt (call);
> +  gsi_remove (&gsi2, true);
> +  release_defs (call);
> +}
> +
>  /* Return true if target has support for divmod.  */
>  
>  static bool
> @@ -5807,6 +5873,11 @@ math_opts_dom_walker::after_dom_children
>  	      match_uaddc_usubc (&gsi, stmt, code);
>  	      break;
>  
> +	    case EQ_EXPR:
> +	    case NE_EXPR:
> +	      match_single_bit_test (&gsi, stmt);
> +	      break;
> +
>  	    default:;
>  	    }
>  	}
> @@ -5872,7 +5943,10 @@ math_opts_dom_walker::after_dom_children
>  	    }
>  	}
>        else if (gimple_code (stmt) == GIMPLE_COND)
> -	optimize_spaceship (as_a <gcond *> (stmt));
> +	{
> +	  match_single_bit_test (&gsi, stmt);
> +	  optimize_spaceship (as_a <gcond *> (stmt));
> +	}
>        gsi_next (&gsi);
>      }
>    if (fma_state.m_deferring_p
> --- gcc/testsuite/gcc.target/i386/pr90693.c.jj	2023-11-16 19:31:43.383587048 +0100
> +++ gcc/testsuite/gcc.target/i386/pr90693.c	2023-11-16 19:33:23.676177086 +0100
> @@ -0,0 +1,29 @@
> +/* PR tree-optimization/90693 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-abm -mno-popcnt -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "__builtin_popcount(ll)? \\\(" "optimized" } } */
> +
> +int
> +foo (unsigned int x)
> +{
> +  return __builtin_popcount (x) == 1;
> +}
> +
> +int
> +bar (unsigned int x)
> +{
> +  return (x ^ (x - 1)) > x - 1;
> +}
> +
> +int
> +baz (unsigned long long x)
> +{
> +  return __builtin_popcountll (x) == 1;
> +}
> +
> +int
> +qux (unsigned long long x)
> +{
> +  return (x ^ (x - 1)) > x - 1;
> +}
> 
> 	Jakub
> 
>
  
Jakub Jelinek Nov. 20, 2023, 8:37 a.m. UTC | #3
On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote:
> On Fri, 17 Nov 2023, Jakub Jelinek wrote:
> > Per the earlier discussions on this PR, the following patch folds
> > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
> > if the corresponding popcount optab isn't implemented (I think any
> > double-word popcount or call will be necessarily slower than the
> > above cheap 3 op check and even for -Os larger or same size).
> > 
> > I've noticed e.g. C++ aligned new starts with std::has_single_bit
> > which does popcount (x) == 1.
> > 
> > As a follow-up, I'm considering changing in this routine the popcount
> > call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Classically this would have been an RTL expansion alternative, given
> we want to do less of those the next place would have been the ISEL
> pass.  Any particular reason you chose widening-mul for this (guess
> that pass just has a bad name and it's the effective "optimize" ISEL pass
> we have).

I think the ssa-math-opts pass does far more of this staff than the isel
pass which only deals with vector stuff right now and you've even mentioned
that pass for that in the PR90693 thread.
That said, I can move it into the isel pass as well if you prefer that.

	Jakub
  
Richard Biener Nov. 20, 2023, 8:39 a.m. UTC | #4
On Mon, 20 Nov 2023, Jakub Jelinek wrote:

> On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote:
> > On Fri, 17 Nov 2023, Jakub Jelinek wrote:
> > > Per the earlier discussions on this PR, the following patch folds
> > > popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
> > > if the corresponding popcount optab isn't implemented (I think any
> > > double-word popcount or call will be necessarily slower than the
> > > above cheap 3 op check and even for -Os larger or same size).
> > > 
> > > I've noticed e.g. C++ aligned new starts with std::has_single_bit
> > > which does popcount (x) == 1.
> > > 
> > > As a follow-up, I'm considering changing in this routine the popcount
> > > call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Classically this would have been an RTL expansion alternative, given
> > we want to do less of those the next place would have been the ISEL
> > pass.  Any particular reason you chose widening-mul for this (guess
> > that pass just has a bad name and it's the effective "optimize" ISEL pass
> > we have).
> 
> I think the ssa-math-opts pass does far more of this staff than the isel
> pass which only deals with vector stuff right now and you've even mentioned
> that pass for that in the PR90693 thread.
> That said, I can move it into the isel pass as well if you prefer that.

I think it's fine as you posted (and Jeff approved), I'm just wondering
if we should rename that pass somehow ;)  Note ISEL is more for
required pre-expansion stuff and widen-mul is for expansion related but
optimization parts.

Richard.
  
Jakub Jelinek Nov. 20, 2023, 8:41 a.m. UTC | #5
On Mon, Nov 20, 2023 at 08:39:42AM +0000, Richard Biener wrote:
> I think it's fine as you posted (and Jeff approved), I'm just wondering
> if we should rename that pass somehow ;)  Note ISEL is more for
> required pre-expansion stuff and widen-mul is for expansion related but
> optimization parts.

+1 for renaming the pass.

	Jakub
  
Jeff Law Nov. 20, 2023, 1:03 p.m. UTC | #6
On 11/20/23 01:39, Richard Biener wrote:
> On Mon, 20 Nov 2023, Jakub Jelinek wrote:
> 
>> On Mon, Nov 20, 2023 at 07:54:54AM +0000, Richard Biener wrote:
>>> On Fri, 17 Nov 2023, Jakub Jelinek wrote:
>>>> Per the earlier discussions on this PR, the following patch folds
>>>> popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
>>>> if the corresponding popcount optab isn't implemented (I think any
>>>> double-word popcount or call will be necessarily slower than the
>>>> above cheap 3 op check and even for -Os larger or same size).
>>>>
>>>> I've noticed e.g. C++ aligned new starts with std::has_single_bit
>>>> which does popcount (x) == 1.
>>>>
>>>> As a follow-up, I'm considering changing in this routine the popcount
>>>> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.
>>>>
>>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>>
>>> Classically this would have been an RTL expansion alternative, given
>>> we want to do less of those the next place would have been the ISEL
>>> pass.  Any particular reason you chose widening-mul for this (guess
>>> that pass just has a bad name and it's the effective "optimize" ISEL pass
>>> we have).
>>
>> I think the ssa-math-opts pass does far more of this staff than the isel
>> pass which only deals with vector stuff right now and you've even mentioned
>> that pass for that in the PR90693 thread.
>> That said, I can move it into the isel pass as well if you prefer that.
> 
> I think it's fine as you posted (and Jeff approved), I'm just wondering
> if we should rename that pass somehow ;)  Note ISEL is more for
> required pre-expansion stuff and widen-mul is for expansion related but
> optimization parts.
Yea, a rename seems appropriate given how much stuff in there is no 
longer math-ops related :-)
jeff
  

Patch

--- gcc/tree-ssa-math-opts.cc.jj	2023-11-13 15:51:02.920562303 +0100
+++ gcc/tree-ssa-math-opts.cc	2023-11-17 11:37:02.255905475 +0100
@@ -5154,6 +5154,72 @@  match_uaddc_usubc (gimple_stmt_iterator
   return true;
 }
 
+/* Replace .POPCOUNT (x) == 1 or .POPCOUNT (x) != 1 with
+   (x & (x - 1)) > x - 1 or (x & (x - 1)) <= x - 1 if .POPCOUNT
+   isn't a direct optab.  */
+
+static void
+match_single_bit_test (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  tree clhs, crhs;
+  enum tree_code code;
+  if (gimple_code (stmt) == GIMPLE_COND)
+    {
+      clhs = gimple_cond_lhs (stmt);
+      crhs = gimple_cond_rhs (stmt);
+      code = gimple_cond_code (stmt);
+    }
+  else
+    {
+      clhs = gimple_assign_rhs1 (stmt);
+      crhs = gimple_assign_rhs2 (stmt);
+      code = gimple_assign_rhs_code (stmt);
+    }
+  if (code != EQ_EXPR && code != NE_EXPR)
+    return;
+  if (TREE_CODE (clhs) != SSA_NAME || !integer_onep (crhs))
+    return;
+  gimple *call = SSA_NAME_DEF_STMT (clhs);
+  combined_fn cfn = gimple_call_combined_fn (call);
+  switch (cfn)
+    {
+    CASE_CFN_POPCOUNT:
+      break;
+    default:
+      return;
+    }
+  if (!has_single_use (clhs))
+    return;
+  tree arg = gimple_call_arg (call, 0);
+  tree type = TREE_TYPE (arg);
+  if (!INTEGRAL_TYPE_P (type))
+    return;
+  if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH))
+    return;
+  tree argm1 = make_ssa_name (type);
+  gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg,
+				   build_int_cst (type, -1));
+  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  g = gimple_build_assign (make_ssa_name (type), BIT_XOR_EXPR, arg, argm1);
+  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  if (gcond *cond = dyn_cast <gcond *> (stmt))
+    {
+      gimple_cond_set_lhs (cond, gimple_assign_lhs (g));
+      gimple_cond_set_rhs (cond, argm1);
+      gimple_cond_set_code (cond, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
+    }
+  else
+    {
+      gimple_assign_set_rhs1 (stmt, gimple_assign_lhs (g));
+      gimple_assign_set_rhs2 (stmt, argm1);
+      gimple_assign_set_rhs_code (stmt, code == EQ_EXPR ? GT_EXPR : LE_EXPR);
+    }
+  update_stmt (stmt);
+  gimple_stmt_iterator gsi2 = gsi_for_stmt (call);
+  gsi_remove (&gsi2, true);
+  release_defs (call);
+}
+
 /* Return true if target has support for divmod.  */
 
 static bool
@@ -5807,6 +5873,11 @@  math_opts_dom_walker::after_dom_children
 	      match_uaddc_usubc (&gsi, stmt, code);
 	      break;
 
+	    case EQ_EXPR:
+	    case NE_EXPR:
+	      match_single_bit_test (&gsi, stmt);
+	      break;
+
 	    default:;
 	    }
 	}
@@ -5872,7 +5943,10 @@  math_opts_dom_walker::after_dom_children
 	    }
 	}
       else if (gimple_code (stmt) == GIMPLE_COND)
-	optimize_spaceship (as_a <gcond *> (stmt));
+	{
+	  match_single_bit_test (&gsi, stmt);
+	  optimize_spaceship (as_a <gcond *> (stmt));
+	}
       gsi_next (&gsi);
     }
   if (fma_state.m_deferring_p
--- gcc/testsuite/gcc.target/i386/pr90693.c.jj	2023-11-16 19:31:43.383587048 +0100
+++ gcc/testsuite/gcc.target/i386/pr90693.c	2023-11-16 19:33:23.676177086 +0100
@@ -0,0 +1,29 @@ 
+/* PR tree-optimization/90693 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-abm -mno-popcnt -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_popcount(ll)? \\\(" "optimized" } } */
+
+int
+foo (unsigned int x)
+{
+  return __builtin_popcount (x) == 1;
+}
+
+int
+bar (unsigned int x)
+{
+  return (x ^ (x - 1)) > x - 1;
+}
+
+int
+baz (unsigned long long x)
+{
+  return __builtin_popcountll (x) == 1;
+}
+
+int
+qux (unsigned long long x)
+{
+  return (x ^ (x - 1)) > x - 1;
+}