Date: Tue, 10 May 2022 15:47:40 -0500
To: Segher Boessenkool, David Edelsohn
Subject: rs6000: Prefer assigning the MMA vector operands to altivec registers [PR105556]
From: Peter Bergner
Cc: GCC Patches

This patch addresses an issue when compiling the MMA-optimized DGEMM kernel in OpenBLAS. The MMA code uses all 8 accumulators, which overlap all of the vs0-vs31 vector registers. Current trunk assigns one of the normal (non-accumulator) vector inputs of one of the MMA instructions to a register in vs0-vs31, which forces us to spill one of the accumulators to memory, leading to poor performance.

The "wa" constraint used for those vector inputs accepts any VSX register (vs0-vs63), so it gives the register allocator no preference between vs0-vs31 and vs32-vs63. The solution here is to replace the "wa" constraints for the vector input operands in the MMA instruction patterns with "v,?d", so that we disparage using vs0-vs31 and prefer using the Altivec registers vs32-vs63 instead, which fixes the DGEMM performance issue.

This passed bootstrap and regtesting with no regressions on powerpc64le-linux. Ok for trunk and, after a few days of burn-in, for the GCC12 release branch?

Technically, the same issue exists in GCC11 and GCC10, but the RA assignment happens to be OK with the current code there, so unless/until we have a test case that exhibits the issue, I'm only asking for a backport to GCC12, which does show the performance problem.
Peter


gcc/
	PR target/105556
	* config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_,
	mma_, mma_, mma_, mma_, mma_, mma_, mma_): Replace "wa" constraint
	with "v,?d".

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 907c9d6d516..9c9920870e4 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -490,50 +490,50 @@ (define_insn "mma_xxsetaccz"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]
                     MMA_VV))]
   "TARGET_MMA"
   " %A0,%x1,%x2"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")]
                     MMA_AVV))]
   "TARGET_MMA"
   " %A0,%x2,%x3"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")]
                     MMA_PV))]
   "TARGET_MMA"
   " %A0,%x1,%x2"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:OO 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:OO 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")]
                     MMA_APV))]
   "TARGET_MMA"
   " %A0,%x2,%x3"
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "u8bit_cint_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "u8bit_cint_operand" "n,n,n,n")]
                     MMA_VVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
@@ -541,13 +541,13 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n")
-                    (match_operand:SI 6 "u8bit_cint_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 6 "u8bit_cint_operand" "n,n,n,n")]
                     MMA_AVVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
@@ -555,12 +555,12 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n,n")]
                     MMA_VVI4I4I2))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
@@ -568,13 +568,13 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n")
-                    (match_operand:SI 6 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n,n")]
                     MMA_AVVI4I4I2))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
@@ -582,11 +582,11 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")]
                     MMA_VVI4I4))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4"
@@ -594,12 +594,12 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")]
                     MMA_AVVI4I4))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5"
@@ -607,11 +607,11 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n")
-                    (match_operand:SI 4 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n,n")]
                     MMA_PVI4I2))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4"
@@ -619,12 +619,12 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:OO 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_3_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:OO 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n,n")]
                     MMA_APVI4I2))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5"
@@ -632,12 +632,12 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")]
                     MMA_VVI4I4I4))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
@@ -645,13 +645,13 @@ (define_insn "mma_"
    (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n")
-                    (match_operand:SI 6 "const_0_to_15_operand" "n")]
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "v,v,?d,?d")
+                    (match_operand:V16QI 3 "vsx_register_operand" "v,?d,v,?d")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n,n")
+                    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n,n")]
                     MMA_AVVI4I4I4))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"