match: Do not select to branchless expression when target has movcc pattern [PR113095]

Message ID 20240117095055.119853-1-monk.chiang@sifive.com
State New
Headers
Series match: Do not select to branchless expression when target has movcc pattern [PR113095] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Testing failed
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm fail Testing failed

Commit Message

Monk Chiang Jan. 17, 2024, 9:50 a.m. UTC
  This allows the backend to generate movcc instructions, if target
machine has movcc pattern.

branchless-cond.c needs to be updated since some target machines have
conditional move instructions, and the experssion will not change to
branchless expression.

gcc/ChangeLog:
	PR target/113095
	* match.pd (`(zero_one == 0) ? y : z <op> y`,
	`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
	expression, if target machine has conditional move pattern.

gcc/testsuite/ChangeLog:
	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
---
 gcc/match.pd                                  | 30 +++++++++++++++++--
 .../gcc.dg/tree-ssa/branchless-cond.c         |  6 ++--
 2 files changed, 31 insertions(+), 5 deletions(-)
  

Comments

Richard Biener Jan. 17, 2024, 12:14 p.m. UTC | #1
On Wed, 17 Jan 2024, Monk Chiang wrote:

> This allows the backend to generate movcc instructions, if target
> machine has movcc pattern.
> 
> branchless-cond.c needs to be updated since some target machines have
> conditional move instructions, and the experssion will not change to
> branchless expression.

While I agree this pattern should possibly be applied during RTL
expansion or instruction selection on x86 which also has movcc
the multiplication is cheaper.  So I don't think this isn't the way to go.

I'd rather revert the change than trying to "fix" it this way?

Thanks,
Richard.

> gcc/ChangeLog:
> 	PR target/113095
> 	* match.pd (`(zero_one == 0) ? y : z <op> y`,
> 	`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
> 	expression, if target machine has conditional move pattern.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
> ---
>  gcc/match.pd                                  | 30 +++++++++++++++++--
>  .../gcc.dg/tree-ssa/branchless-cond.c         |  6 ++--
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e42ecaf9ec7..a1f90b1cd41 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -       (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +      bool can_movecc_p = false;
> +      if (can_conditionally_move_p (TYPE_MODE (type)))
> +	can_movecc_p = true;
> +
> +      /* Some target only support word_mode for movcc pattern, if type can
> +	 extend to word_mode then use conditional move. Even if there is a
> +	 extend instruction, the cost is lower than branchless.  */
> +      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	  && can_conditionally_move_p (word_mode))
> +	can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>  
>  /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
>  (for op (bit_xor bit_ior plus)
> @@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -       (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +      bool can_movecc_p = false;
> +      if (can_conditionally_move_p (TYPE_MODE (type)))
> +	can_movecc_p = true;
> +
> +      /* Some target only support word_mode for movcc pattern, if type can
> +	 extend to word_mode then use conditional move. Even if there is a
> +	 extend instruction, the cost is lower than branchless.  */
> +      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	  && can_conditionally_move_p (word_mode))
> +	can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>  
>  /* ?: Value replacement. */
>  /* a == 0 ? b : b + a  -> b + a */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> index e063dc4bb5f..c002ed97364 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
>    return ((x & 1) != 0) ? z | y : y;
>  }
>  
> -/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
>
  
Jeff Law Jan. 17, 2024, 2:37 p.m. UTC | #2
On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
> 
>> This allows the backend to generate movcc instructions, if target
>> machine has movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the experssion will not change to
>> branchless expression.
> 
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection on x86 which also has movcc
> the multiplication is cheaper.  So I don't think this isn't the way to go.
> 
> I'd rather revert the change than trying to "fix" it this way?
WRT reverting -- the patch in question's sole purpose was to enable 
branchless sequences for that very same code.  Reverting would regress 
performance on a variety of micro-architectures.  IIUC, the issue is 
that the SiFive part in question has a fusion which allows it to do the 
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion and most likely 
with a RISC-V target twiddle for the micro-archs which have 
short-forward-branch optimizations.

jeff
  
Monk Chiang Jan. 18, 2024, 3:19 a.m. UTC | #3
Thanks for your advice!! I agree it should be fixed in the RISC-V backend
when expansion.


On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 1/17/24 05:14, Richard Biener wrote:
> > On Wed, 17 Jan 2024, Monk Chiang wrote:
> >
> >> This allows the backend to generate movcc instructions, if target
> >> machine has movcc pattern.
> >>
> >> branchless-cond.c needs to be updated since some target machines have
> >> conditional move instructions, and the experssion will not change to
> >> branchless expression.
> >
> > While I agree this pattern should possibly be applied during RTL
> > expansion or instruction selection on x86 which also has movcc
> > the multiplication is cheaper.  So I don't think this isn't the way to
> go.
> >
> > I'd rather revert the change than trying to "fix" it this way?
> WRT reverting -- the patch in question's sole purpose was to enable
> branchless sequences for that very same code.  Reverting would regress
> performance on a variety of micro-architectures.  IIUC, the issue is
> that the SiFive part in question has a fusion which allows it to do the
> branchy sequence cheaply.
>
> ISTM this really needs to be addressed during expansion and most likely
> with a RISC-V target twiddle for the micro-archs which have
> short-forward-branch optimizations.
>
> jeff
>
  
Palmer Dabbelt Jan. 18, 2024, 3:30 a.m. UTC | #4
On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chiang@sifive.com wrote:
> Thanks for your advice!! I agree it should be fixed in the RISC-V backend
> when expansion.
>
>
> On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>>
>>
>> On 1/17/24 05:14, Richard Biener wrote:
>> > On Wed, 17 Jan 2024, Monk Chiang wrote:
>> >
>> >> This allows the backend to generate movcc instructions, if target
>> >> machine has movcc pattern.
>> >>
>> >> branchless-cond.c needs to be updated since some target machines have
>> >> conditional move instructions, and the experssion will not change to
>> >> branchless expression.
>> >
>> > While I agree this pattern should possibly be applied during RTL
>> > expansion or instruction selection on x86 which also has movcc
>> > the multiplication is cheaper.  So I don't think this isn't the way to
>> go.
>> >
>> > I'd rather revert the change than trying to "fix" it this way?
>> WRT reverting -- the patch in question's sole purpose was to enable
>> branchless sequences for that very same code.  Reverting would regress
>> performance on a variety of micro-architectures.  IIUC, the issue is
>> that the SiFive part in question has a fusion which allows it to do the
>> branchy sequence cheaply.
>>
>> ISTM this really needs to be addressed during expansion and most likely
>> with a RISC-V target twiddle for the micro-archs which have
>> short-forward-branch optimizations.

IIRC I ran into some of these middle-end interactions a year or two ago 
and determined that we'd need middle-end changes to get this working 
smoothly -- essentially replacing the expander checks for a MOVCC insn  
with some sort of costing.

Without that, we're just going to end up with some missed optimizations 
that favor one way or the other.

>>
>> jeff
>>
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index e42ecaf9ec7..a1f90b1cd41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4231,7 +4231,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-       (op (mult (convert:type @0) @2) @1))))
+   (with {
+      bool can_movecc_p = false;
+      if (can_conditionally_move_p (TYPE_MODE (type)))
+	can_movecc_p = true;
+
+      /* Some target only support word_mode for movcc pattern, if type can
+	 extend to word_mode then use conditional move. Even if there is a
+	 extend instruction, the cost is lower than branchless.  */
+      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	  && can_conditionally_move_p (word_mode))
+	can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior plus)
@@ -4243,7 +4256,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-       (op (mult (convert:type @0) @2) @1))))
+   (with {
+      bool can_movecc_p = false;
+      if (can_conditionally_move_p (TYPE_MODE (type)))
+	can_movecc_p = true;
+
+      /* Some target only support word_mode for movcc pattern, if type can
+	 extend to word_mode then use conditional move. Even if there is a
+	 extend instruction, the cost is lower than branchless.  */
+      if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	  && can_conditionally_move_p (word_mode))
+	can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* ?: Value replacement. */
 /* a == 0 ? b : b + a  -> b + a */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index e063dc4bb5f..c002ed97364 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@  int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */