Message ID | 20211220082405.2206998-1-luoxhu@linux.ibm.com |
---|---|
State | Committed |
Commit | 460d53f816fe30093653cb22ef0feeb4bddc0895 |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BFAC5385802E for <patchwork@sourceware.org>; Mon, 20 Dec 2021 08:25:17 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BFAC5385802E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1639988717; bh=65liegs4DUD33sU/s1wwse/lU22Niluv3e6m3amY910=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=QbpxtcP2JBYP1ZTiarFgK0zXnAl02L1jFdCRVY2v2qpaP8WNPrs/ANzh0h77A5Px6 wLfAL+8xns9cqMvWjDSoGIAWnEBK1nf/nIvfwvt6uWUEO6NGNJ+QnSA1lexQOLIsuk HOYr6xz+sFlj6Tm/c8XOWBMZBNl4Y5ix4PJ1US5Q= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 7F9783858404; Mon, 20 Dec 2021 08:24:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7F9783858404 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 1BK8I0Xv020214; Mon, 20 Dec 2021 08:24:46 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3d1s6seqyn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Dec 2021 08:24:46 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1BK7p4AE011860; Mon, 20 Dec 2021 08:24:45 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com with ESMTP id 3d1s6seqy3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Dec 2021 08:24:45 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1BK7oXvG002843; Mon, 20 Dec 2021 08:24:43 GMT Received: from b06avi18626390.portsmouth.uk.ibm.com (b06avi18626390.portsmouth.uk.ibm.com [9.149.26.192]) by ppma06ams.nl.ibm.com with ESMTP id 3d16wja9pw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Dec 2021 08:24:42 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06avi18626390.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1BK8GJia34144664 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 20 Dec 2021 08:16:19 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6F2ADA4053; Mon, 20 Dec 2021 08:24:29 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4A615A4059; Mon, 20 Dec 2021 08:24:28 +0000 (GMT) Received: from rain6p1.aus.stglabs.ibm.com (unknown [9.40.203.151]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 20 Dec 2021 08:24:28 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus Date: Mon, 20 Dec 2021 02:24:05 -0600 Message-Id: <20211220082405.2206998-1-luoxhu@linux.ibm.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 06VhHy0CmC-18EBC7QPv6oDXzv93USHk X-Proofpoint-GUID: J9DslfRiD2WFkJDoltvCmhdRefNDYZRk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2021-12-20_04,2021-12-16_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=969 bulkscore=0 clxscore=1015 mlxscore=0 lowpriorityscore=0 priorityscore=1501 impostorscore=0 malwarescore=0 phishscore=0 suspectscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2112200047 X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Xionghu Luo via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Xionghu Luo <luoxhu@linux.ibm.com> Cc: segher@kernel.crashing.org, Xionghu Luo <luoxhu@linux.ibm.com>, wschmidt@linux.ibm.com, linkw@gcc.gnu.org, dje.gcc@gmail.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus
|
|
Commit Message
Xionghu Luo
Dec. 20, 2021, 8:24 a.m. UTC
These four UNSPECS seems could be replaced with native RTL, and why "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): This bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtvscr instruction. The RTL pattern set it to 0 but final ASM doesn't present it? And why not use Clobber VSCR_REGNO instead? Tested pass on P10, OK for master? Thanks. gcc/ChangeLog: * config/rs6000/altivec.md (altivec_vaddu<VI_char>s): Replace UNSPEC_VADDU with us_plus. (altivec_vadds<VI_char>s): Replace UNSPEC_VADDS with ss_plus. (altivec_vsubu<VI_char>s): Replace UNSPEC_VSUBU with us_minus. (altivec_vsubs<VI_char>s): Replace UNSPEC_VSUBS with ss_minus. (altivec_abss_<mode>): Likewise. --- gcc/config/rs6000/altivec.md | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-)
Comments
On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote: > > These four UNSPECS seems could be replaced with native RTL, and why > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): > > This bit is sticky; that is, once set to 1 it > remains set to 1 until it is set to 0 by an > mtvscr instruction. > > The RTL pattern set it to 0 but final ASM doesn't present it? And why > not use Clobber VSCR_REGNO instead? The design came from the early implementation of Altivec: https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html If one later checks for saturation (reads VSCR), one needs a corresponding SET of the value. It's set in an architecture-specific manner that isn't described to GCC, but it's set, not just clobbered and in an undefined state. The RTL does not describe that VSCR is set to the value 0. The (const_int 0) is not the value set. You can think of the (const_int 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at least one argument and the pattern doesn't try to express the argument, so it uses a dummy RTL constant. It's part of a PARALLEL and the plus or minus already expresses the data dependency of the pattern on the input operands. I'm unsure of the meaning of your question "final ASM doesn't present it". The operation on VSCR is implicit and not emitted as an instruction. It's in a PARALLEL, which means that the single Altivec instruction has both effects. Is that what you were asking? > Tested pass on P10, OK for master? This patch is okay. Thanks for updating the machine description and for cleaning up the formatting. > Thanks. > > gcc/ChangeLog: > > * config/rs6000/altivec.md (altivec_vaddu<VI_char>s): Replace > UNSPEC_VADDU with us_plus. > (altivec_vadds<VI_char>s): Replace UNSPEC_VADDS with ss_plus. > (altivec_vsubu<VI_char>s): Replace UNSPEC_VSUBU with us_minus. > (altivec_vsubs<VI_char>s): Replace UNSPEC_VSUBS with ss_minus. > (altivec_abss_<mode>): Likewise. > --- > gcc/config/rs6000/altivec.md | 29 ++++++++++------------------- > 1 file changed, 10 insertions(+), 19 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index a057218aa28..b2909857c34 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -29,8 +29,6 @@ (define_c_enum "unspec" > UNSPEC_VMHADDSHS > UNSPEC_VMHRADDSHS > UNSPEC_VADDCUW > - UNSPEC_VADDU > - UNSPEC_VADDS > UNSPEC_VAVGU > UNSPEC_VAVGS > UNSPEC_VMULEUB > @@ -61,8 +59,6 @@ (define_c_enum "unspec" > UNSPEC_VSR > UNSPEC_VSRO > UNSPEC_VSUBCUW > - UNSPEC_VSUBU > - UNSPEC_VSUBS > UNSPEC_VSUM4UBS > UNSPEC_VSUM4S > UNSPEC_VSUM2SWS > @@ -517,9 +513,8 @@ (define_insn "altivec_vaddcuw" > > (define_insn "altivec_vaddu<VI_char>s" > [(set (match_operand:VI 0 "register_operand" "=v") > - (unspec:VI [(match_operand:VI 1 "register_operand" "v") > - (match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VADDU)) > + (us_plus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] > "<VI_unit>" > "vaddu<VI_char>s %0,%1,%2" > @@ -527,9 +522,8 @@ (define_insn "altivec_vaddu<VI_char>s" > > (define_insn "altivec_vadds<VI_char>s" > [(set (match_operand:VI 0 "register_operand" "=v") > - (unspec:VI [(match_operand:VI 1 "register_operand" "v") > - (match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VADDS)) > + (ss_plus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] > "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" > "vadds<VI_char>s %0,%1,%2" > @@ -563,9 +557,8 @@ (define_insn "altivec_vsubcuw" > > (define_insn "altivec_vsubu<VI_char>s" > [(set (match_operand:VI 0 "register_operand" "=v") > - (unspec:VI [(match_operand:VI 1 "register_operand" "v") > - (match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VSUBU)) > + (us_minus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] > "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" > "vsubu<VI_char>s %0,%1,%2" > @@ -573,9 +566,8 @@ (define_insn "altivec_vsubu<VI_char>s" > > (define_insn "altivec_vsubs<VI_char>s" > [(set (match_operand:VI 0 "register_operand" "=v") > - (unspec:VI [(match_operand:VI 1 "register_operand" "v") > - (match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VSUBS)) > + (ss_minus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] > "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" > "vsubs<VI_char>s %0,%1,%2" > @@ -3480,9 +3472,8 @@ (define_expand "altivec_absv4sf2" > (define_expand "altivec_abss_<mode>" > [(set (match_dup 2) (vec_duplicate:VI (const_int 0))) > (parallel [(set (match_dup 3) > - (unspec:VI [(match_dup 2) > - (match_operand:VI 1 "register_operand" "v")] > - UNSPEC_VSUBS)) > + (ss_minus:VI (match_dup 2) > + (match_operand:VI 1 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) > (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]) > (set (match_operand:VI 0 "register_operand" "=v") > -- > 2.27.0 >
On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote: > On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote: > > These four UNSPECS seems could be replaced with native RTL, and why > > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" > > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): > > > > This bit is sticky; that is, once set to 1 it > > remains set to 1 until it is set to 0 by an > > mtvscr instruction. > > > > The RTL pattern set it to 0 but final ASM doesn't present it? And why > > not use Clobber VSCR_REGNO instead? > > The design came from the early implementation of Altivec: > > https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html > > If one later checks for saturation (reads VSCR), one needs a > corresponding SET of the value. It's set in an architecture-specific > manner that isn't described to GCC, but it's set, not just clobbered > and in an undefined state. Well. RTL clobber and set do exactly the same thing, except with clobber it is not specified *what* value is set. All bits are set, all bits are defined. There is no (direct) way in RTL to say "undetermined". An RTL clobber would work just fine afaics? > The RTL does not describe that VSCR is set to the value 0. The > (const_int 0) is not the value set. You can think of the (const_int > 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at > least one argument and the pattern doesn't try to express the > argument, so it uses a dummy RTL constant. Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is not really more expensive anymore, and many people find it clearer (but not in this case it seems :-) ). > It's part of a PARALLEL > and the plus or minus already expresses the data dependency of the > pattern on the input operands. But they do not describe any dependency on vscr, or output to it. This is the same problem we have with fpscr (most FP insns use some of its fields, most set some, but there is no way to cleanly express that). Explicit clobbers like this help one side of the issue. For vscr, other than the sat bit there is only the nj bit, and we just ignore that :-) > This patch is okay. Thanks for updating the machine description and > for cleaning up the formatting. x2. Thanks! Segher
On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote: > > On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote: > > > These four UNSPECS seems could be replaced with native RTL, and why > > > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" > > > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): > > > > > > This bit is sticky; that is, once set to 1 it > > > remains set to 1 until it is set to 0 by an > > > mtvscr instruction. > > > > > > The RTL pattern set it to 0 but final ASM doesn't present it? And why > > > not use Clobber VSCR_REGNO instead? > > > > The design came from the early implementation of Altivec: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html > > > > If one later checks for saturation (reads VSCR), one needs a > > corresponding SET of the value. It's set in an architecture-specific > > manner that isn't described to GCC, but it's set, not just clobbered > > and in an undefined state. > > Well. RTL clobber and set do exactly the same thing, except with > clobber it is not specified *what* value is set. All bits are set, all > bits are defined. There is no (direct) way in RTL to say > "undetermined". > > An RTL clobber would work just fine afaics? I don't know about the original intention from Aldy, but if one were looking at an RTL dump and the code used the saturation bit from VSCR, it might be confusing to see a CLOBBER instead of a SET. The SET documents that VSCR_REGNO is assigned a specific value; GCC doesn't know about the semantics, but it's not some undefined bit pattern. CLOBBER implies a trash value or a value that one will not query later, i.e., one would want to SET the register to a specific value before using it. > > > The RTL does not describe that VSCR is set to the value 0. The > > (const_int 0) is not the value set. You can think of the (const_int > > 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at > > least one argument and the pattern doesn't try to express the > > argument, so it uses a dummy RTL constant. > > Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is > not really more expensive anymore, and many people find it clearer (but > not in this case it seems :-) ). > > > It's part of a PARALLEL > > and the plus or minus already expresses the data dependency of the > > pattern on the input operands. > > But they do not describe any dependency on vscr, or output to it. This > is the same problem we have with fpscr (most FP insns use some of its > fields, most set some, but there is no way to cleanly express that). It describes that VSCR_REGNO is set, an output. It doesn't describe how it is set nor inform the compiler that the value depends on the input operands in some complicated way unknown to the compiler, but the compiler cannot do anything useful with the additional information. > > Explicit clobbers like this help one side of the issue. For vscr, other > than the sat bit there is only the nj bit, and we just ignore that :-) > > > This patch is okay. Thanks for updating the machine description and > > for cleaning up the formatting. > > x2. Thanks! > > > Segher
On 2021/12/21 09:32, David Edelsohn wrote: > On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> >> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote: >>> On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote: >>>> These four UNSPECS seems could be replaced with native RTL, and why >>>> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" >>>> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): >>>> >>>> This bit is sticky; that is, once set to 1 it >>>> remains set to 1 until it is set to 0 by an >>>> mtvscr instruction. >>>> >>>> The RTL pattern set it to 0 but final ASM doesn't present it? And why >>>> not use Clobber VSCR_REGNO instead? >>> >>> The design came from the early implementation of Altivec: >>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html Thanks, I constructed a testcase for this, cat vadds.c #include <stdio.h> #include <altivec.h> vector signed int foo1 (vector signed int a, vector signed int c) { vector signed int ret = vec_vaddsws (a, c); union {vector unsigned short v; unsigned short s[8];} u; u.v = vec_mfvscr(); printf("foo1: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1], u.s[2], u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]); return a; } vector signed int foo2 (vector signed int a, vector signed int c) { vector signed int ret = vec_vaddsws (a, c); union {vector unsigned short v; unsigned short s[8];} u; u.v = vec_mfvscr(); printf("foo2: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1], u.s[2], u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]); return ret; } int main () { vector signed int a = {0x12345678, 0x22345678, 0x32345678, 0x42345678}; vector signed int c = {0x12345678, 0x22345678, 0x32345678, 0x42345678}; vector signed int b1 = foo1 (a, c); vector signed int b2 = foo2 (a, c); return b1[0]+b2[0]; } The output is: ./a.out foo1: vscr == { 0,0,0,0,0,0,0,0 } foo2: vscr == { 1,0,0,0,0,0,0,0 } So is this expected behavior? Seems doesn't meet with Aldy's description... (foo1's temp is optimized away so no vaddsws produced) " foo() { vector int result; result = vec_adds(blah, blah); __check_for_saturation__ } gcc, will optimize away vec_adds() because the result is a local variable unused later. then when we check the saturation bit in VSCR, we get wrong results. this patch explains to gcc all about VSCR, and adds it as a global register as well." >>> >>> If one later checks for saturation (reads VSCR), one needs a >>> corresponding SET of the value. It's set in an architecture-specific >>> manner that isn't described to GCC, but it's set, not just clobbered >>> and in an undefined state. >> >> Well. RTL clobber and set do exactly the same thing, except with >> clobber it is not specified *what* value is set. All bits are set, all >> bits are defined. There is no (direct) way in RTL to say >> "undetermined". >> >> An RTL clobber would work just fine afaics? > > I don't know about the original intention from Aldy, but if one were > looking at an RTL dump and the code used the saturation bit from VSCR, > it might be confusing to see a CLOBBER instead of a SET. The SET > documents that VSCR_REGNO is assigned a specific value; GCC doesn't > know about the semantics, but it's not some undefined bit pattern. > CLOBBER implies a trash value or a value that one will not query > later, i.e., one would want to SET the register to a specific value > before using it. > >> >>> The RTL does not describe that VSCR is set to the value 0. The >>> (const_int 0) is not the value set. You can think of the (const_int >>> 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at >>> least one argument and the pattern doesn't try to express the >>> argument, so it uses a dummy RTL constant. >> >> Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is >> not really more expensive anymore, and many people find it clearer (but >> not in this case it seems :-) ). >> >>> It's part of a PARALLEL >>> and the plus or minus already expresses the data dependency of the >>> pattern on the input operands. >> >> But they do not describe any dependency on vscr, or output to it. This >> is the same problem we have with fpscr (most FP insns use some of its >> fields, most set some, but there is no way to cleanly express that). > > It describes that VSCR_REGNO is set, an output. It doesn't describe > how it is set nor inform the compiler that the value depends on the > input operands in some complicated way unknown to the compiler, but > the compiler cannot do anything useful with the additional > information. > >> >> Explicit clobbers like this help one side of the issue. For vscr, other >> than the sat bit there is only the nj bit, and we just ignore that :-) >> >>> This patch is okay. Thanks for updating the machine description and >>> for cleaning up the formatting. >> >> x2. Thanks! >> >> >> Segher
On 2021/12/21 10:19, Xionghu Luo via Gcc-patches wrote: > > > On 2021/12/21 09:32, David Edelsohn wrote: >> On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool >> <segher@kernel.crashing.org> wrote: >>> >>> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote: >>>> On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote: >>>>> These four UNSPECS seems could be replaced with native RTL, and why >>>>> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" >>>>> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): >>>>> >>>>> This bit is sticky; that is, once set to 1 it >>>>> remains set to 1 until it is set to 0 by an >>>>> mtvscr instruction. >>>>> >>>>> The RTL pattern set it to 0 but final ASM doesn't present it? And why >>>>> not use Clobber VSCR_REGNO instead? >>>> >>>> The design came from the early implementation of Altivec: >>>> >>>> https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html > > Thanks, I constructed a testcase for this, > > cat vadds.c > #include <stdio.h> > #include <altivec.h> > vector signed int foo1 (vector signed int a, vector signed int c) { > vector signed int ret = vec_vaddsws (a, c); > union {vector unsigned short v; unsigned short s[8];} u; > u.v = vec_mfvscr(); > printf("foo1: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1], > u.s[2], > u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]); > return a; > } > > vector signed int foo2 (vector signed int a, vector signed int c) { > vector signed int ret = vec_vaddsws (a, c); > union {vector unsigned short v; unsigned short s[8];} u; > u.v = vec_mfvscr(); > printf("foo2: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1], > u.s[2], > u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]); > return ret; > } > > int main () > { > vector signed int a = {0x12345678, 0x22345678, 0x32345678, 0x42345678}; > vector signed int c = {0x12345678, 0x22345678, 0x32345678, 0x42345678}; > vector signed int b1 = foo1 (a, c); > vector signed int b2 = foo2 (a, c); > return b1[0]+b2[0]; > } > > The output is: > > ./a.out > foo1: vscr == { 0,0,0,0,0,0,0,0 } > foo2: vscr == { 1,0,0,0,0,0,0,0 } > > > So is this expected behavior? Seems doesn't meet with Aldy's > description... (foo1's temp is optimized away so no vaddsws produced) Just realized that foo1's "ret" is optimized way in gimple already, so no such RTL and vaddsws generated in foo1, only foo2's vaddsws will set VSCR explicitly. > > " foo() > { > vector int result; > > result = vec_adds(blah, blah); > __check_for_saturation__ > } > > gcc, will optimize away vec_adds() because the result is a local > variable unused later. then when we check the saturation bit in VSCR, > we get wrong results. > > this patch explains to gcc all about VSCR, and adds it as a global > register as well." > > >>>> >>>> If one later checks for saturation (reads VSCR), one needs a >>>> corresponding SET of the value. It's set in an architecture-specific >>>> manner that isn't described to GCC, but it's set, not just clobbered >>>> and in an undefined state. >>> >>> Well. RTL clobber and set do exactly the same thing, except with >>> clobber it is not specified *what* value is set. All bits are set, all >>> bits are defined. There is no (direct) way in RTL to say >>> "undetermined". >>> >>> An RTL clobber would work just fine afaics? >> >> I don't know about the original intention from Aldy, but if one were >> looking at an RTL dump and the code used the saturation bit from VSCR, >> it might be confusing to see a CLOBBER instead of a SET. The SET >> documents that VSCR_REGNO is assigned a specific value; GCC doesn't >> know about the semantics, but it's not some undefined bit pattern. >> CLOBBER implies a trash value or a value that one will not query >> later, i.e., one would want to SET the register to a specific value >> before using it. Agree that SET is better than CLOBBER. Thank you! >> >>> >>>> The RTL does not describe that VSCR is set to the value 0. The >>>> (const_int 0) is not the value set. You can think of the (const_int >>>> 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at >>>> least one argument and the pattern doesn't try to express the >>>> argument, so it uses a dummy RTL constant. >>> >>> Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is >>> not really more expensive anymore, and many people find it clearer (but >>> not in this case it seems :-) ). >>> >>>> It's part of a PARALLEL >>>> and the plus or minus already expresses the data dependency of the >>>> pattern on the input operands. >>> >>> But they do not describe any dependency on vscr, or output to it. This >>> is the same problem we have with fpscr (most FP insns use some of its >>> fields, most set some, but there is no way to cleanly express that). >> >> It describes that VSCR_REGNO is set, an output. It doesn't describe >> how it is set nor inform the compiler that the value depends on the >> input operands in some complicated way unknown to the compiler, but >> the compiler cannot do anything useful with the additional >> information. >> >>> >>> Explicit clobbers like this help one side of the issue. For vscr, other >>> than the sat bit there is only the nj bit, and we just ignore that :-) >>> >>>> This patch is okay. Thanks for updating the machine description and >>>> for cleaning up the formatting. >>> >>> x2. Thanks! >>> >>> >>> Segher >
On 2021/12/21 09:32, David Edelsohn wrote: > Explicit clobbers like this help one side of the issue. For vscr, other > than the sat bit there is only the nj bit, and we just ignore that :-) > >> This patch is okay. Thanks for updating the machine description and >> for cleaning up the formatting. > x2. Thanks! Committed to r12-6084.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index a057218aa28..b2909857c34 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -29,8 +29,6 @@ (define_c_enum "unspec" UNSPEC_VMHADDSHS UNSPEC_VMHRADDSHS UNSPEC_VADDCUW - UNSPEC_VADDU - UNSPEC_VADDS UNSPEC_VAVGU UNSPEC_VAVGS UNSPEC_VMULEUB @@ -61,8 +59,6 @@ (define_c_enum "unspec" UNSPEC_VSR UNSPEC_VSRO UNSPEC_VSUBCUW - UNSPEC_VSUBU - UNSPEC_VSUBS UNSPEC_VSUM4UBS UNSPEC_VSUM4S UNSPEC_VSUM2SWS @@ -517,9 +513,8 @@ (define_insn "altivec_vaddcuw" (define_insn "altivec_vaddu<VI_char>s" [(set (match_operand:VI 0 "register_operand" "=v") - (unspec:VI [(match_operand:VI 1 "register_operand" "v") - (match_operand:VI 2 "register_operand" "v")] - UNSPEC_VADDU)) + (us_plus:VI (match_operand:VI 1 "register_operand" "v") + (match_operand:VI 2 "register_operand" "v"))) (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] "<VI_unit>" "vaddu<VI_char>s %0,%1,%2" @@ -527,9 +522,8 @@ (define_insn "altivec_vaddu<VI_char>s" (define_insn "altivec_vadds<VI_char>s" [(set (match_operand:VI 0 "register_operand" "=v") - (unspec:VI [(match_operand:VI 1 "register_operand" "v") - (match_operand:VI 2 "register_operand" "v")] - UNSPEC_VADDS)) + (ss_plus:VI (match_operand:VI 1 "register_operand" "v") + (match_operand:VI 2 "register_operand" "v"))) (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "vadds<VI_char>s %0,%1,%2" @@ -563,9 +557,8 @@ (define_insn "altivec_vsubcuw" (define_insn "altivec_vsubu<VI_char>s" [(set (match_operand:VI 0 "register_operand" "=v") - (unspec:VI [(match_operand:VI 1 "register_operand" "v") - (match_operand:VI 2 "register_operand" "v")] - UNSPEC_VSUBU)) + (us_minus:VI (match_operand:VI 1 "register_operand" "v") + (match_operand:VI 2 "register_operand" "v"))) (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "vsubu<VI_char>s %0,%1,%2" @@ -573,9 +566,8 @@ (define_insn "altivec_vsubu<VI_char>s" (define_insn "altivec_vsubs<VI_char>s" [(set (match_operand:VI 0 "register_operand" "=v") - (unspec:VI [(match_operand:VI 1 "register_operand" "v") - (match_operand:VI 2 "register_operand" "v")] - UNSPEC_VSUBS)) + (ss_minus:VI (match_operand:VI 1 "register_operand" "v") + (match_operand:VI 2 "register_operand" "v"))) (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "vsubs<VI_char>s %0,%1,%2" @@ -3480,9 +3472,8 @@ (define_expand "altivec_absv4sf2" (define_expand "altivec_abss_<mode>" [(set (match_dup 2) (vec_duplicate:VI (const_int 0))) (parallel [(set (match_dup 3) - (unspec:VI [(match_dup 2) - (match_operand:VI 1 "register_operand" "v")] - UNSPEC_VSUBS)) + (ss_minus:VI (match_dup 2) + (match_operand:VI 1 "register_operand" "v"))) (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]) (set (match_operand:VI 0 "register_operand" "=v")