[v3,7/7] ifcvt: Run second pass if it is possible to omit a temporary.

Message ID 20211206184352.42264-8-rdapp@linux.ibm.com
State New
Headers
Series ifcvt: Convert multiple |

Commit Message

Robin Dapp Dec. 6, 2021, 6:43 p.m. UTC
  If one of the to-be-converted SETs requires the original comparison
(i.e. in order to generate a min/max insn) but no other insn after it
does, we can omit creating temporaries, thus facilitating costing.
---
 gcc/ifcvt.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
  

Comments

Jeff Law Dec. 9, 2021, 1:24 a.m. UTC | #1
On 12/6/2021 11:43 AM, Robin Dapp via Gcc-patches wrote:
> If one of the to-be-converted SETs requires the original comparison
> (i.e. in order to generate a min/max insn) but no other insn after it
> does, we can omit creating temporaries, thus facilitating costing.
> ---
>   gcc/ifcvt.c | 33 +++++++++++++++++++++++++++++++--
>   1 file changed, 31 insertions(+), 2 deletions(-)
I'd generally prefer to refactor the bits between the restart label and 
the goto restart into a function and call it twice, passing in the 
additional bits to allow for better costing.  Can you look into that?  
If it's going to be major surgery, then the goto approach will be OK.

Conceptually I don't have any concerns with the patch.  It'll obviously 
need a ChangeLog entry.

Jeff
  
Robin Dapp Dec. 10, 2021, 3:06 p.m. UTC | #2
Hi Jeff,

> I'd generally prefer to refactor the bits between the restart label and 
> the goto restart into a function and call it twice, passing in the 
> additional bits to allow for better costing.  Can you look into that?  
> If it's going to be major surgery, then the goto approach will be OK.

I transplanted the loop into a separate function
"noce_convert_multiple_sets_1" (for the lack of a better name right
now).  I guess an argument could be made about also moving

+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);

into the function and not care about traversing all instructions
twice/four times (will not be more than a few anyway) but I did not do
that for now.

Does this look better? Not fully tested yet everywhere but a test suite
run on s390 looked good.

Regards
 Robin
  
Jeff Law Dec. 15, 2021, 8:24 p.m. UTC | #3
On 12/10/2021 8:06 AM, Robin Dapp wrote:
> Hi Jeff,
>
>> I'd generally prefer to refactor the bits between the restart label and
>> the goto restart into a function and call it twice, passing in the
>> additional bits to allow for better costing.  Can you look into that?
>> If it's going to be major surgery, then the goto approach will be OK.
> I transplanted the loop into a separate function
> "noce_convert_multiple_sets_1" (for the lack of a better name right
> now).  I guess an argument could be made about also moving
>
> +  rtx cc_cmp = cond_exec_get_condition (jump);
> +  rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);
>
> into the function and not care about traversing all instructions
> twice/four times (will not be more than a few anyway) but I did not do
> that for now.
>
> Does this look better? Not fully tested yet everywhere but a test suite
> run on s390 looked good.
I think it's looks better.  One might argue that a structure rather than 
a half-dozen named arguments or a class would be even better, but I 
think that can wait for a full class-ification of that file.

You probably should move the prototype for noce_convert_multiple_set_1 
into the .c file.  It's static, no no need to expose it in the .h file 
AFAICT.  OK with that change.

jeff
  
Robin Dapp Jan. 10, 2022, 11:18 a.m. UTC | #4
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

	* ifcvt.c (noce_convert_multiple_sets_1): New function.
	(noce_convert_multiple_sets): Call function a second time if we can
	improve the first try.
  

Patch

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index d1e7db1ee27..3be5bb131df 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3262,6 +3262,11 @@  noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
 
+  int last_needs_comparison = -1;
+  bool second_try = false;
+
+restart:
+
   FOR_BB_INSNS (then_bb, insn)
     {
       /* Skip over non-insns.  */
@@ -3305,8 +3310,12 @@  noce_convert_multiple_sets (struct noce_if_info *if_info)
 	 Therefore we introduce a temporary every time we are about to
 	 overwrite a variable used in the check.  Costing of a sequence with
 	 these is going to be inaccurate so only use temporaries when
-	 needed.  */
-      if (reg_overlap_mentioned_p (target, cond))
+	 needed.
+
+	 If performing a second try, we know how many insns require a
+	 temporary.  For the last of these, we can omit creating one.  */
+      if (reg_overlap_mentioned_p (target, cond)
+	  && (!second_try || count < last_needs_comparison))
 	temp = gen_reg_rtx (GET_MODE (target));
       else
 	temp = target;
@@ -3389,6 +3398,8 @@  noce_convert_multiple_sets (struct noce_if_info *if_info)
 	{
 	  seq = seq1;
 	  temp_dest = temp_dest1;
+	  if (!second_try)
+	    last_needs_comparison = count;
 	}
       else if (seq2 != NULL_RTX)
 	{
@@ -3412,6 +3423,24 @@  noce_convert_multiple_sets (struct noce_if_info *if_info)
       unmodified_insns.safe_push (insn);
     }
 
+    /* If there are insns that overwrite part of the initial
+       comparison, we can still omit creating temporaries for
+       the last of them.
+       As the second try will always create a less expensive,
+       valid sequence, we do not need to compare and can discard
+       the first one.  */
+    if (!second_try && last_needs_comparison >= 0)
+      {
+	end_sequence ();
+	start_sequence ();
+	count = 0;
+	targets.truncate (0);
+	temporaries.truncate (0);
+	unmodified_insns.truncate (0);
+	second_try = true;
+	goto restart;
+      }
+
   /* We must have seen some sort of insn to insert, otherwise we were
      given an empty BB to convert, and we can't handle that.  */
   gcc_assert (!unmodified_insns.is_empty ());