From patchwork Tue Mar 28 03:19:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 66981
Date: Mon, 27 Mar 2023 23:19:55 -0400
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt
Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads
List-Id: Gcc-patches mailing list
From: Michael Meissner
Reply-To: Michael Meissner

I posted a first version of this patch on March 21st and a second version on
March 24th.  This version makes the code changes to genfusion.pl that were
suggested in review of the last 2 patch submissions.  The fusion.md that is
produced by genfusion.pl is the same in all 3 versions.

I changed genfusion.pl to match the suggested code layout.  I also used the
correct comment for each of the instructions (in the 2nd patch, when I rewrote
the comments about ld and lwa being DS format instructions, I had put the ld
comment in the section handling lwa, and vice versa).

I also removed lp64 from the new test.  When I first added the prefixed code,
it was only done for 64-bit, but now it is allowed for 32-bit.  However, the
case that shows up (lwa) would not hit in 32-bit, since 32-bit code only
generates lwz and not lwa.  It also would not generate ld.  But the test does
pass when it is built with -m32.
The issue with the bug is that the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.  Ultimately the code was dying
because the fusion load + compare -1/0/1 patterns did not handle the
possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a DS format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it treats load + compare
immediate patterns like the normal load insns when checking whether the load
of operand[1] needs a prefixed instruction.

I have tested this code on a power9 little endian system (with long double
being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
power8 big endian system, testing both 32-bit and 64-bit code generation.

Can I put this code into the master branch, and after a waiting period, apply
it to the GCC 12 and GCC 11 branches (the bug does show up in those branches,
and the patch applies without change)?

2023-03-27   Michael Meissner

gcc/

	PR target/105325
	* config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
	of the ld and lwa instructions which use the DS encoding instead of D.
	Use the YZ constraint for these loads.  Handle prefixed loads better.
	Set the sign_extend attribute as appropriate.
	* config/rs6000/fusion.md: Regenerate.
	* config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
	instructions to the list of instructions that might have a prefixed
	load instruction.

gcc/testsuite/

	PR target/105325
	* g++.target/powerpc/pr105325.C: New test.
	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
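
As a rough illustration of the addressing rules described above (this is not
part of the patch or the ChangeLog), the sketch below classifies a load offset.
It assumes the usual Power ISA limits: a D-form load takes a signed 16-bit
displacement, a DS-form load (ld, lwa) additionally needs the displacement to
be a multiple of 4, and a prefixed load takes a signed 34-bit displacement.
The names load_form, load_encoding, fits_signed and classify_offset are
hypothetical and are not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; none of this is GCC code.  */
enum load_form { FORM_D, FORM_DS };
enum load_encoding { ENC_TRADITIONAL, ENC_PREFIXED, ENC_OUT_OF_RANGE };

/* Return true if VALUE fits in a signed field that is BITS wide.  */
static bool
fits_signed (int64_t value, int bits)
{
  int64_t limit = (int64_t) 1 << (bits - 1);
  return value >= -limit && value < limit;
}

/* Classify OFFSET for a load whose non-prefixed encoding is FORM.
   Assumption: D-form allows any signed 16-bit offset, DS-form also
   requires a multiple of 4, and a prefixed load allows any signed
   34-bit offset.  */
static enum load_encoding
classify_offset (enum load_form form, int64_t offset)
{
  bool fits16 = fits_signed (offset, 16);
  bool multiple_of_4 = (offset % 4) == 0;

  if (fits16 && (form == FORM_D || multiple_of_4))
    return ENC_TRADITIONAL;
  if (fits_signed (offset, 34))
    return ENC_PREFIXED;
  return ENC_OUT_OF_RANGE;
}
```

In the new test, _metRCTable sits after int _mapTable[10000], so (assuming a
4-byte int) its offset is 40000, which does not fit in 16 bits:
classify_offset (FORM_DS, 40000) yields ENC_PREFIXED.  An offset of 6 is fine
for a D-form load such as lwz, but not for a non-prefixed DS-form load such
as lwa.
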
---
 gcc/config/rs6000/fusion.md                   | 17 +++++----
 gcc/config/rs6000/genfusion.pl                | 36 ++++++++++++++-----
 gcc/config/rs6000/rs6000.md                   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 23 ++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c    |  4 +--
 5 files changed, 64 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -148,7 +148,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +248,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +291,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..05d66c18dd9 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -56,7 +56,7 @@ sub mode_to_ldst_char
 sub gen_ld_cmpi_p10
 {
   my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-      $mempred, $ccmode, $np, $extend, $resultmode);
+      $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
   LMODE: foreach $lmode ('DI','SI','HI','QI') {
     $ldst = mode_to_ldst_char($lmode);
     $clobbermode = $lmode;
@@ -69,23 +69,38 @@ sub gen_ld_cmpi_p10
     # Don't allow EXTQI because that would allow HI result which we can't do.
     $result = "GPR" if $result eq "EXTQI";
     CCMODE: foreach $ccmode ('CC','CCUNS') {
-      $np = "NON_PREFIXED_D";
-      $mempred = "non_update_memory_operand";
       if ( $ccmode eq 'CC' ) {
         next CCMODE if $lmode eq 'QI';
-        if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
-          # ld and lwa are both DS-FORM.
+        if ( $lmode eq 'HI' ) {
+          $np = "NON_PREFIXED_D";
+          $mempred = "non_update_memory_operand";
+          $echr = "a";
+          $constraint = "m";
+        } elsif ( $lmode eq 'SI' ) {
+          # lwa is DS-FORM.
+          $np = "NON_PREFIXED_DS";
+          $mempred = "lwa_operand";
+          $echr = "a";
+          $constraint = "YZ";
+        } elsif ( $lmode eq 'DI' ) {
+          # ld is DS-FORM.
           $np = "NON_PREFIXED_DS";
           $mempred = "ds_form_mem_operand";
+          $echr = "";
+          $constraint = "YZ";
         }
         $cmpl = "";
-        $echr = "a";
         $constpred = "const_m1_to_1_operand";
       } else {
         if ( $lmode eq 'DI' ) {
-          # ld is DS-form, but lwz is not.
+          # ld is DS-form
           $np = "NON_PREFIXED_DS";
           $mempred = "ds_form_mem_operand";
+          $constraint = "YZ";
+        } else {
+          $np = "NON_PREFIXED_D";
+          $mempred = "non_update_memory_operand";
+          $constraint = "m";
         }
         $cmpl = "l";
         $echr = "z";
@@ -108,7 +123,7 @@ sub gen_ld_cmpi_p10
 
       print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
       print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
-      print " (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
+      print " (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
       if ($ccmode eq 'CCUNS') { print " "; }
       print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
       if ($result eq 'clobber') {
@@ -137,6 +152,11 @@ sub gen_ld_cmpi_p10
       print " \"\"\n";
       print " [(set_attr \"type\" \"fused_load_cmpi\")\n";
       print " (set_attr \"cost\" \"8\")\n";
+
+      if ($extend eq "sign") {
+        print " (set_attr \"sign_extend\" \"yes\")\n";
+      }
+
       print " (set_attr \"length\" \"8\")])\n";
       print "\n";
     }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..d836a8a58b3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
               (eq_attr "maybe_prefixed" "no"))
          (const_string "no")
 
-         (eq_attr "type" "load,fpload,vecload")
+         (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
          (if_then_else (match_test "prefixed_load_p (insn)")
                        (const_string "yes")
                        (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..e42c8f9b30f
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,23 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..ca7297375a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -61,7 +61,7 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none" 2 { target lp64 } } } */
 
@@ -73,6 +73,6 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none" 6 { target ilp32 } } } */