rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]

Message ID e7cd06f2-3aca-d4b6-deec-9a67e752e0b9@linux.ibm.com
State New

Commit Message

Peter Bergner May 10, 2022, 8:47 p.m. UTC
  This patch addresses an issue when compiling the MMA optimized DGEMM kernel
in OpenBLAS.  The MMA code uses all 8 accumulators, which overlap all vs0-vs31
vector registers.  Current trunk assigns one of the normal vector inputs to
one of the MMA instructions, which forces us to spill one of the accumulators
to memory, leading to poor performance.  The solution here is to replace the
"wa" constraints for the vector input operands in the MMA instruction patterns
with "v,?d" so that we disparage using vs0-vs31 and prefer using the altivec
registers vs32-vs63 instead, which fixes the dgemm performance issue.

This passed bootstrap and regtesting with no regressions on powerpc64le-linux.
Ok for trunk and after a few days of burn-in to the GCC12 release branch?

Technically, the same issue exists in GCC11 and GCC10, but the RA
assignment is OK with the current code, so unless/until we have a
test case that exhibits the issue, I'm only asking for a backport to
GCC12 which does show the performance problem.

Peter


gcc/
	PR target/105556
	* config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
	mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraint with "v,?d".
  

Comments

Segher Boessenkool May 10, 2022, 10:35 p.m. UTC | #1
Hi!

On Tue, May 10, 2022 at 03:47:40PM -0500, Peter Bergner wrote:
> This patch addresses an issue when compiling the MMA optimized DGEMM kernel

If you want to use this same message as commit message, you shouldn't
say "this patch".  Also, try not to use lines more than 72 positions
wide (which handily is also a good maximum length for email messages,
that way it can be quoted a few times without wrapping).

> in OpenBLAS.  The MMA code uses all 8 accumulators, which overlap all vs0-vs31
> vector registers.  Current trunk assigns one of the normal vector inputs to
> one of the MMA instructions, which forces us to spill one of the accumulators
> to memory, leading to poor performance.  The solution here is to replace the
> "wa" constraints for the vector input operands in the MMA instruction patterns
> with "v,?d" so that we disparage using vs0-vs31 and prefer using the altivec
> registers vs32-vs63 instead, which fixes the dgemm performance issue.

And I assume generated code still looks at least as good otherwise?

> This passed bootstrap and regtesting with no regressions on powerpc64le-linux.
> Ok for trunk and after a few days of burn-in to the GCC12 release branch?
> 
> Technically, the same issue exists in GCC11 and GCC10, but the RA
> assignment is OK with the current code, so unless/until we have a
> test case that exhibits the issue, I'm only asking for a backport to
> GCC12 which does show the performance problem.

So, you put everything that shouldn't be in the commit message at the
end of the mail, easy to delete when applying the patch.  Good good :-)

> gcc/
> 	PR target/105556
> 	* config/rs6000/mma.md (mma_<vv>, mma_<avv>, mma_<pv>, mma_<apv>,
> 	mma_<vvi4i4i8>, mma_<avvi4i4i8>, mma_<vvi4i4i2>, mma_<avvi4i4i2>,
> 	mma_<vvi4i4>, mma_<avvi4i4>, mma_<pvi4i2>, mma_<apvi4i2>,
> 	mma_<vvi4i4i4>, mma_<avvi4i4i4>): Replace "wa" constraint with "v,?d".

> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 907c9d6d516..9c9920870e4 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -490,50 +490,50 @@ (define_insn "mma_xxsetaccz"
>    [(set_attr "type" "mma")])
>  
>  (define_insn "mma_<vv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]

This is more involved than just replacing one constraint with two.  You
should say that in the changelog (and in your message).

Out of interest, did you try using v,?wa (so just two alternatives, not
four)?  Or did you think it would result in measurably worse code?  Or
did you decide it is not such a bad backend code size explosion after
all :-)

Okay for trunk with a slightly better changelog.  Thanks!


Segher
  
Peter Bergner May 10, 2022, 10:46 p.m. UTC | #2
On 5/10/22 5:35 PM, Segher Boessenkool wrote:
> If you want to use this same message as commit message, you shouldn't
> say "this patch".  Also, try not to use lines more than 72 positions
> wide (which handily is also a good maximum length for email messages,
> that way it can be quoted a few times without wrapping).

Sure, I can reword that.


> And I assume generated code still looks at least as good otherwise?

If you are asking whether we get the code we want generated with the
patch, the answer is yes.  Otherwise, I'm not sure what you're asking. :-)

 

> This is more involved than just replacing one constraint with two.  You
> should say that in the changelog (and in your message).

Ok, I can expand on that.


> Out of interest, did you try using v,?wa (so just two alternatives, not
> four)?  Or did you think it would result in measurably worse code?  Or
> did you decide it is not such a bad backend code size explosion after
> all :-)

I have not tried that, but I will and see what happens.
I'm slightly worried that since 'v' is part of 'wa', then saying
'?wa' might just cancel the 'v' usage, but I'll see how it works.
If it works and simplifies the patch, great!  I'll report back.


> Okay for trunk with a slightly better changelog.  Thanks!

If it doesn't work, I'll go with the code as is, modulo the
changes you asked for.  Thanks.


Peter
  
Peter Bergner May 10, 2022, 11:07 p.m. UTC | #3
On 5/10/22 5:46 PM, Peter Bergner wrote:
>> Out of interest, did you try using v,?wa (so just two alternatives, not
>> four)?  Or did you think it would result in measurably worse code?  Or
>> did you decide it is not such a bad backend code size explosion after
>> all :-)
> 
> I have not tried that, but I will and see what happens.
> I'm slightly worried that since 'v' is part of 'wa', then saying
> '?wa' might just cancel the 'v' usage, but I'll see how it works.
> If it works and simplifies the patch, great!  I'll report back.

So using 'v','?wa' for both vector operands does give us the code we want
for this loop.  Yea!

However, looking closer, that 2 alternative version isn't quite as "optimal"
as the version with 4 alternatives 'v','v','?d','?d' and 'v','?d','v','?d'.
That's because the 2 alt version is an all-or-nothing thing: you either get
both 'v' operands or you potentially get both 'd'.  The 4 alt version is
more fine-grained and has alternatives where one gets a 'v' for sure and the
other a 'd'.

That said, if even one operand is forced into a 'd' register, then our chances
of good performance are lost, so the fact we can get one in a 'v' probably
doesn't matter.  Given it's cleaner code with fewer alternatives, let's go
with 'v','?wa'.  Ok with you?

I'll kick off a bootstrap and regtest with that change.

Peter
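
For illustration, here is a minimal sketch of how the first pattern might
look with the two-alternative 'v','?wa' constraints proposed in this reply.
It is adapted from the mma_<vv> hunk of the posted patch and is only an
assumption about the shape of the revised change, not the committed code.
Every operand in a pattern has to list the same number of alternatives, so
the accumulator output is written as "=&d,&d"; the other patterns would
presumably follow the same scheme, with "0,0" for the tied accumulator input
and "n,n" for the immediate operands.

```lisp
;; Sketch only (assumed form of the two-alternative variant).
;; Alternative 0: both vector inputs in Altivec registers (vs32-vs63).
;; Alternative 1: any VSX register, disparaged with '?'.
(define_insn "mma_<vv>"
  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
		    MMA_VV))]
  "TARGET_MMA"
  "<vv> %A0,%x1,%x2"
  [(set_attr "type" "mma")])
```

Compared with the four-alternative form in the patch below, this halves the
number of alternatives per pattern, at the cost of not being able to express
"one input in 'v', the other in 'd'" as its own alternative.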
  

Patch

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 907c9d6d516..9c9920870e4 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -490,50 +490,50 @@  (define_insn "mma_xxsetaccz"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_<vv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]
 		    MMA_VV))]
   "TARGET_MMA"
   "<vv> %A0,%x1,%x2"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_<avv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")]
 		    MMA_AVV))]
   "TARGET_MMA"
   "<avv> %A0,%x2,%x3"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_<pv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]
 		    MMA_PV))]
   "TARGET_MMA"
   "<pv> %A0,%x1,%x2"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_<apv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:OO 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")]
 		    MMA_APV))]
   "TARGET_MMA"
   "<apv> %A0,%x2,%x3"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_<vvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "u8bit_cint_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "u8bit_cint_operand" "n,n,n,n")]
 		    MMA_VVI4I4I8))]
   "TARGET_MMA"
   "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
@@ -541,13 +541,13 @@  (define_insn "mma_<vvi4i4i8>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<avvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n")
-		    (match_operand:SI 6 "u8bit_cint_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 6 "u8bit_cint_operand" "n,n,n,n")]
 		    MMA_AVVI4I4I8))]
   "TARGET_MMA"
   "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
@@ -555,12 +555,12 @@  (define_insn "mma_<avvi4i4i8>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<vvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n,n")]
 		    MMA_VVI4I4I2))]
   "TARGET_MMA"
   "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
@@ -568,13 +568,13 @@  (define_insn "mma_<vvi4i4i2>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<avvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n")
-		    (match_operand:SI 6 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n,n")]
 		    MMA_AVVI4I4I2))]
   "TARGET_MMA"
   "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
@@ -582,11 +582,11 @@  (define_insn "mma_<avvi4i4i2>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<vvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")]
 		    MMA_VVI4I4))]
   "TARGET_MMA"
   "<vvi4i4> %A0,%x1,%x2,%3,%4"
@@ -594,12 +594,12 @@  (define_insn "mma_<vvi4i4>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<avvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")]
 		    MMA_AVVI4I4))]
   "TARGET_MMA"
   "<avvi4i4> %A0,%x2,%x3,%4,%5"
@@ -607,11 +607,11 @@  (define_insn "mma_<avvi4i4>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<pvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n")
-		    (match_operand:SI 4 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n,n")]
 		    MMA_PVI4I2))]
   "TARGET_MMA"
   "<pvi4i2> %A0,%x1,%x2,%3,%4"
@@ -619,12 +619,12 @@  (define_insn "mma_<pvi4i2>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<apvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:OO 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n,n")]
 		    MMA_APVI4I2))]
   "TARGET_MMA"
   "<apvi4i2> %A0,%x2,%x3,%4,%5"
@@ -632,12 +632,12 @@  (define_insn "mma_<apvi4i2>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<vvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")]
 		    MMA_VVI4I4I4))]
   "TARGET_MMA"
   "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
@@ -645,13 +645,13 @@  (define_insn "mma_<vvi4i4i4>"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_<avvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n")
-		    (match_operand:SI 6 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+		    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+		    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n,n")]
 		    MMA_AVVI4I4I4))]
   "TARGET_MMA"
   "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"