ifcvt: Fix PR104153 and PR104198

Message ID 35755ec2-0612-0d94-c2d5-d2527c384c3b@linux.ibm.com
State New
Headers
Series ifcvt: Fix PR104153 and PR104198 |

Commit Message

Robin Dapp Feb. 1, 2022, 2:45 p.m. UTC
  Hi,

this is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).

cond_exec_get_condition () returns the jump condition directly and we
now it to the backend.  The or1k backend modified the condition in-place
but this modification is not reverted when the sequence in question is
discarded.  Therefore this patch copies the RTX instead of using it
directly.

The SPARC problem is due to the backend recreating the initial condition
when being passed a CC comparison.  This causes the sequence
to read from an already overwritten condition operand.  Generally, this
could also happen on other targets.  The workaround is to always first
emit to a temporary.  In a second run of noce_convert_multiple_1 we know
which sequences actually require the comparison and use no
temporaries if all sequences after the current one do not require it.


Before, I used reg_overlap_mentioned_p () to check the generated
instructions against the condition.  The problem with this is that
reg_overlap... only handles a set of rtx_codes while a backend can
theoretically emit everything in an expander.  Is reg_mentioned_p () the
"right thing" to do?  Maybe it is overly conservative but as soon as we
have more than let's say three insns, we are unlikely to succeed anyway.

Bootstrapped and reg-tested on s390x, Power 9, x86 and SPARC.

Regards
 Robin

--

        PR 104198
        PR 104153

gcc/ChangeLog:

        * ifcvt.cc (noce_convert_multiple_sets_1): Copy rtx instead of
using it
        directly.  Rework comparison handling and always perform a
second pass.

gcc/testsuite/ChangeLog:

        * gcc.dg/pr104198.c: New test.
  

Comments

Rainer Orth Feb. 2, 2022, 11:36 a.m. UTC | #1
Hi Robin,

> Bootstrapped and reg-tested on s390x, Power 9, x86 and SPARC.

I've now also tested the patch on sparcv9-sun-solaris2.11: no regressions.

Thanks.
        Rainer
  
Jeff Law Feb. 2, 2022, 8:40 p.m. UTC | #2
On 2/1/2022 7:45 AM, Robin Dapp wrote:
> Hi,
>
> this is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
> broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).
>
> cond_exec_get_condition () returns the jump condition directly and we
> now it to the backend.  The or1k backend modified the condition in-place
> but this modification is not reverted when the sequence in question is
> discarded.  Therefore this patch copies the RTX instead of using it
> directly.
>
> The SPARC problem is due to the backend recreating the initial condition
> when being passed a CC comparison.  This causes the sequence
> to read from an already overwritten condition operand.  Generally, this
> could also happen on other targets.  The workaround is to always first
> emit to a temporary.  In a second run of noce_convert_multiple_1 we know
> which sequences actually require the comparison and use no
> temporaries if all sequences after the current one do not require it.
>
>
> Before, I used reg_overlap_mentioned_p () to check the generated
> instructions against the condition.  The problem with this is that
> reg_overlap... only handles a set of rtx_codes while a backend can
> theoretically emit everything in an expander.  Is reg_mentioned_p () the
> "right thing" to do?  Maybe it is overly conservative but as soon as we
> have more than let's say three insns, we are unlikely to succeed anyway.
>
> Bootstrapped and reg-tested on s390x, Power 9, x86 and SPARC.
>
> Regards
>   Robin
>
> --
>
>          PR 104198
>          PR 104153
>
> gcc/ChangeLog:
>
>          * ifcvt.cc (noce_convert_multiple_sets_1): Copy rtx instead of
> using it
>          directly.  Rework comparison handling and always perform a
> second pass.
>
> gcc/testsuite/ChangeLog:
>
>          * gcc.dg/pr104198.c: New test.
Also note this fixed the or1k issues and it's been tested on a variety 
of the embedded targets.


>
> ifcvt-pr.patch
>
> commit 68489d5729b4879bf2df540753fc7ea8ba1565a5
> Author: Robin Dapp <rdapp@linux.ibm.com>
> Date:   Mon Jan 24 10:28:05 2022 +0100
>
>      ifcvt: Fix PR104153 and PR104198.
>      
>      This is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
>      broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).
>      
>      cond_exec_get_condition () returns the jump condition directly and we now
>      pass it to the backend.  The or1k backend modified the condition in-place
>      (other backends do that as well) but this modification is not reverted
>      when the sequence in question is discarded.  Therefore we copy the RTX
>      instead of using it directly.
>      
>      The SPARC problem is due to the SPARC backend recreating the initial
>      condition when being passed a CC comparison.  This causes the sequence
>      to read from an already overwritten condition operand.  Generally, this
>      could also happen on other targets.  The workaround is to always first
>      emit to a temporary.  In a second run of noce_convert_multiple_1 we know
>      which sequences actually require the comparison and will use no
>      temporaries if all sequences after the current one do not require it.
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index fe250d508e1..92c2b40a45a 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3391,7 +3391,11 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>     rtx cond = noce_get_condition (jump, &cond_earliest, false);
>   
>     rtx cc_cmp = cond_exec_get_condition (jump);
> +  if (cc_cmp)
> +    cc_cmp = copy_rtx (cc_cmp);
>     rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);
> +  if (rev_cc_cmp)
> +    rev_cc_cmp = copy_rtx (rev_cc_cmp);
>   
>     rtx_insn *insn;
>     int count = 0;
> @@ -3515,6 +3519,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>         unsigned cost1 = 0, cost2 = 0;
>         rtx_insn *seq, *seq1, *seq2;
>         rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
> +      bool read_comparison = false;
>   
>         seq1 = try_emit_cmove_seq (if_info, temp, cond,
>   				 new_val, old_val, need_cmov,
> @@ -3524,10 +3529,38 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>   	 as well.  This allows the backend to emit a cmov directly without
>   	 creating an additional compare for each.  If successful, costing
>   	 is easier and this sequence is usually preferred.  */
> -      seq2 = try_emit_cmove_seq (if_info, target, cond,
> +      seq2 = try_emit_cmove_seq (if_info, temp, cond,
Do you need to adjust anything now that this is emitting into TEMP 
rather than TARGET?
jeff
  
Robin Dapp Feb. 3, 2022, 8 a.m. UTC | #3
> Do you need to adjust anything now that this is emitting into TEMP 
> rather than TARGET?

The idea now is to emit to TEMP in the first pass and check if we read
the initial condition.  Overwriting the condition (and of course reading
it in every sequence) is the reason temporaries were needed in the first
place.

In the second pass (always done now) we use TEMP for those
SETs/sequences before the last one that read the condition and TARGET
for those after it.  Ideally it's TARGET for all of them which would
result in the best costing situation.

The whole logic was/is really a bit convoluted for what we want to
achieve but I'd argue it's not much worse than before ;)

Regards
 Robin
  
Jeff Law Feb. 5, 2022, 5:46 p.m. UTC | #4
On 2/3/2022 1:00 AM, Robin Dapp wrote:
>> Do you need to adjust anything now that this is emitting into TEMP
>> rather than TARGET?
> The idea now is to emit to TEMP in the first pass and check if we read
> the initial condition.  Overwriting the condition (and of course reading
> it in every sequence) is the reason temporaries were needed in the first
> place.
>
> In the second pass (always done now) we use TEMP for those
> SETs/sequences before the last one that read the condition and TARGET
> for those after it.  Ideally it's TARGET for all of them which would
> result in the best costing situation.
OK.  Thanks for clarifying.

>
> The whole logic was/is really a bit convoluted for what we want to
> achieve but I'd argue it's not much worse than before ;)
Yea.  And it's not that far off from things we do elsewhere -- we 
generate some sequence then start grubbing around inside for things of 
interest.

OK for the trunk.

Any thoughts on the ARC failure which is triggered by your patch and 
still unresolved?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104154

Jeff
  

Patch

commit 68489d5729b4879bf2df540753fc7ea8ba1565a5
Author: Robin Dapp <rdapp@linux.ibm.com>
Date:   Mon Jan 24 10:28:05 2022 +0100

    ifcvt: Fix PR104153 and PR104198.
    
    This is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
    broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).
    
    cond_exec_get_condition () returns the jump condition directly and we now
    pass it to the backend.  The or1k backend modified the condition in-place
    (other backends do that as well) but this modification is not reverted
    when the sequence in question is discarded.  Therefore we copy the RTX
    instead of using it directly.
    
    The SPARC problem is due to the SPARC backend recreating the initial
    condition when being passed a CC comparison.  This causes the sequence
    to read from an already overwritten condition operand.  Generally, this
    could also happen on other targets.  The workaround is to always first
    emit to a temporary.  In a second run of noce_convert_multiple_1 we know
    which sequences actually require the comparison and will use no
    temporaries if all sequences after the current one do not require it.

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index fe250d508e1..92c2b40a45a 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3391,7 +3391,11 @@  noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
   rtx cond = noce_get_condition (jump, &cond_earliest, false);
 
   rtx cc_cmp = cond_exec_get_condition (jump);
+  if (cc_cmp)
+    cc_cmp = copy_rtx (cc_cmp);
   rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);
+  if (rev_cc_cmp)
+    rev_cc_cmp = copy_rtx (rev_cc_cmp);
 
   rtx_insn *insn;
   int count = 0;
@@ -3515,6 +3519,7 @@  noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       unsigned cost1 = 0, cost2 = 0;
       rtx_insn *seq, *seq1, *seq2;
       rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+      bool read_comparison = false;
 
       seq1 = try_emit_cmove_seq (if_info, temp, cond,
 				 new_val, old_val, need_cmov,
@@ -3524,10 +3529,38 @@  noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	 as well.  This allows the backend to emit a cmov directly without
 	 creating an additional compare for each.  If successful, costing
 	 is easier and this sequence is usually preferred.  */
-      seq2 = try_emit_cmove_seq (if_info, target, cond,
+      seq2 = try_emit_cmove_seq (if_info, temp, cond,
 				 new_val, old_val, need_cmov,
 				 &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
 
+      /* The backend might have created a sequence that uses the
+	 condition.  Check this.  */
+      rtx_insn *walk = seq2;
+      while (walk)
+	{
+	  rtx set = single_set (walk);
+
+	  if (!set || !SET_SRC (set)) {
+	      walk = NEXT_INSN (walk);
+	      continue;
+	  }
+
+	  rtx src = SET_SRC (set);
+
+	  if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
+	    ;
+	  else
+	    {
+	      if (reg_mentioned_p (XEXP (cond, 0), src)
+		  || reg_mentioned_p (XEXP (cond, 1), src))
+		{
+		  read_comparison = true;
+		  break;
+		}
+	    }
+	  walk = NEXT_INSN (walk);
+	}
+
       /* Check which version is less expensive.  */
       if (seq1 != NULL_RTX && (cost1 <= cost2 || seq2 == NULL_RTX))
 	{
@@ -3540,6 +3573,8 @@  noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	{
 	  seq = seq2;
 	  temp_dest = temp_dest2;
+	  if (!second_try && read_comparison)
+	    *last_needs_comparison = count;
 	}
       else
 	{
@@ -3558,6 +3593,12 @@  noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
       unmodified_insns->safe_push (insn);
     }
 
+  /* Even if we did not actually need the comparison, we want to make sure
+     to try a second time in order to get rid of the temporaries.  */
+  if (*last_needs_comparison == -1)
+    *last_needs_comparison = 0;
+
+
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr104198.c b/gcc/testsuite/gcc.dg/pr104198.c
new file mode 100644
index 00000000000..bfc7a777184
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104198.c
@@ -0,0 +1,36 @@ 
+/* Make sure if conversion for two instructions does not break
+   anything (if it runs).  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c99" } */
+
+#include <limits.h>
+#include <assert.h>
+
+__attribute__ ((noinline))
+int foo (int *a, int n)
+{
+  int min = 999999;
+  int bla = 0;
+  for (int i = 0; i < n; i++)
+    {
+      if (a[i] < min)
+	{
+	  min = a[i];
+	  bla = 1;
+	}
+    }
+
+  if (bla)
+    min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a) / sizeof (a[0]));
+
+  assert (res == (INT_MIN + 1));
+}