Optimize signed DImode -> TImode on power10, PR target/104698

Message ID Yh2RRKVa5D+kKLmY@toto.the-meissners.org
State New
Headers
Series Optimize signed DImode -> TImode on power10, PR target/104698 |

Commit Message

Michael Meissner March 1, 2022, 3:21 a.m. UTC
  Optimize signed DImode -> TImode on power10, PR target/104698.

On power10, GCC tries to optimize the signed conversion from DImode to
TImode by using the vextsd2q instruction.  However to generate this
instruction, it would have to generate 3 direct moves (1 from the GPR
registers to the altivec registers, and 2 from the altivec registers to
the GPR register).

This patch adds code back in to use the shift right immediate instruction
to do the conversion if the target/source is GPR registers.

2022-02-28   Michael Meissner  <meissner@linux.ibm.com>

gcc/
	PR target/104698
	* config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
	(extendditi2): Replace with code to deal with both GPR registers
	and with altivec registers.
---
 gcc/config/rs6000/vsx.md | 73 ++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 21 deletions(-)
  

Comments

will schmidt March 1, 2022, 7:30 p.m. UTC | #1
On Mon, 2022-02-28 at 22:21 -0500, Michael Meissner wrote:
> Optimize signed DImode -> TImode on power10, PR target/104698.
> 

Hi,
  Logic seems OK to me, a few suggestions on the comments intermixed
below.  As always, i defer if there are counter arguments. :-)


> On power10, GCC tries to optimize the signed conversion from DImode to
> TImode by using the vextsd2q instruction.  However to generate this
> instruction, it would have to generate 3 direct moves (1 from the GPR
> registers to the altivec registers, and 2 from the altivec registers to
> the GPR register).
> 
> This patch adds code back in to use the shift right immediate instruction
> to do the conversion if the target/source is GPR registers.


Perhaps drop "back in".   If it's necessary to call out a previous
commit that removed the code for whatever reason, certainly do so. 
It's not clear from context if that was the case.


> 
> 2022-02-28   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	PR target/104698
> 	* config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
> 	(extendditi2): Replace with code to deal with both GPR registers
> 	and with altivec registers.

Perhaps enhance with 
	(extendditi2):  Convert from define_expand to
define_insn_and_split.  Replace with code ...


> ---
>  gcc/config/rs6000/vsx.md | 73 ++++++++++++++++++++++++++++------------
>  1 file changed, 52 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index b53de103872..62464f67f4d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di"
>    DONE;
>  })
> 
> -;; ISA 3.1 vector sign extend
> -;; Move DI value from GPR to TI mode in VSX register, word 1.
> -(define_insn "mtvsrdd_diti_w1"
> -  [(set (match_operand:TI 0 "register_operand" "=wa")
> -	(unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> -		     UNSPEC_MTVSRD_DITI_W1))]
> -  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "mtvsrdd %x0,0,%1"
> -  [(set_attr "type" "vecmove")])
> +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets.  If
> +;; the register allocator prefers the GPRs, we won't have to move the value to
> +;; the altivec registers, do the vextsd2q instruction and move it back.  If we
> +;; aren't compiling for 64-bit power10, don't provide the service and let the
> +;; machine independent code handle the extension.

So, the ".. we won't have to ..." applies to the altivec target path
here?   Describing in a way that indicates what code doesn't do doesn't
seem right. 
If so, and perhaps even if not,  i suggest rearranging the
comment slightly so it can be read as an either or.  
If the register
allocator prefers the GPRS, ... 
Otherwise, for altivec registers we dothe vextsd2q ... 


> +(define_insn_and_split "extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> +	(sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "#"
> +  "&& reload_completed"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> +     a sign extension to the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +    {
> +      rtx dest_hi = gen_highpart (DImode, dest);
> +      rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +      emit_move_insn (dest_lo, src);
> +      emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> +      DONE;
> +    }
ok

> +
> +  /* For conversion to Altivec register, generate either a splat operation or
> +     a load rightmost double word instruction.  Both instructions gets the
> +     DImode value into the lower 64 bits, and then do the vextsd2q
> +     instruction.  */

consider   s/instruction. Both instructions gets/to get/

> +  else if (ALTIVEC_REGNO_P (dest_regno))
> +    {
> +      if (MEM_P (src))
> +	emit_insn (gen_vsx_lxvrdx (dest, src));
> +      else
> +	{
> +	  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +	  emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> +	}
> +
> +      emit_insn (gen_extendditi2_vector (dest, dest));
> +      DONE;
> +    }

ok

lgtm, thanks
-Will

> +
> +  else
> +    gcc_unreachable ();
> +}
> +  [(set_attr "length" "8")])
> 
>  ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
>  (define_insn "extendditi2_vector"
> @@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector"
>    "vextsd2q %0,%1"
>    [(set_attr "type" "vecexts")])
> 
> -(define_expand "extendditi2"
> -  [(set (match_operand:TI 0 "gpc_reg_operand")
> -	(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> -  "TARGET_POWER10"
> -  {
> -    /* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.  */
> -    rtx temp = gen_reg_rtx (TImode);
> -    emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> -    emit_insn (gen_extendditi2_vector (operands[0], temp));
> -    DONE;
> -  })
> -
>  
>  ;; ISA 3.0 Binary Floating-Point Support
> 
> -- 
> 2.35.1
> 
>
  

Patch

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b53de103872..62464f67f4d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5023,15 +5023,58 @@  (define_expand "vsignextend_si_v2di"
   DONE;
 })
 
-;; ISA 3.1 vector sign extend
-;; Move DI value from GPR to TI mode in VSX register, word 1.
-(define_insn "mtvsrdd_diti_w1"
-  [(set (match_operand:TI 0 "register_operand" "=wa")
-	(unspec:TI [(match_operand:DI 1 "register_operand" "r")]
-		     UNSPEC_MTVSRD_DITI_W1))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrdd %x0,0,%1"
-  [(set_attr "type" "vecmove")])
+;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets.  If
+;; the register allocator prefers the GPRs, we won't have to move the value to
+;; the altivec registers, do the vextsd2q instruction and move it back.  If we
+;; aren't compiling for 64-bit power10, don't provide the service and let the
+;; machine independent code handle the extension.
+(define_insn_and_split "extendditi2"
+  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
+	(sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
+   (clobber (reg:DI CA_REGNO))]
+  "TARGET_POWERPC64 && TARGET_POWER10"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  int dest_regno = reg_or_subregno (dest);
+
+  /* Handle conversion to GPR registers.  Load up the low part and then do
+     a sign extension to the upper part.  */
+  if (INT_REGNO_P (dest_regno))
+    {
+      rtx dest_hi = gen_highpart (DImode, dest);
+      rtx dest_lo = gen_lowpart (DImode, dest);
+
+      emit_move_insn (dest_lo, src);
+      emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
+      DONE;
+    }
+
+  /* For conversion to Altivec register, generate either a splat operation or
+     a load rightmost double word instruction.  Both instructions gets the
+     DImode value into the lower 64 bits, and then do the vextsd2q
+     instruction.  */
+  else if (ALTIVEC_REGNO_P (dest_regno))
+    {
+      if (MEM_P (src))
+	emit_insn (gen_vsx_lxvrdx (dest, src));
+      else
+	{
+	  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+	  emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
+	}
+
+      emit_insn (gen_extendditi2_vector (dest, dest));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+}
+  [(set_attr "length" "8")])
 
 ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
 (define_insn "extendditi2_vector"
@@ -5042,18 +5085,6 @@  (define_insn "extendditi2_vector"
   "vextsd2q %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_expand "extendditi2"
-  [(set (match_operand:TI 0 "gpc_reg_operand")
-	(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
-  "TARGET_POWER10"
-  {
-    /* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.  */
-    rtx temp = gen_reg_rtx (TImode);
-    emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
-    emit_insn (gen_extendditi2_vector (operands[0], temp));
-    DONE;
-  })
-
 
 ;; ISA 3.0 Binary Floating-Point Support