Message ID | patch-16679-tamar@arm.com |
---|---|
State | Committed |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9DEEB3986424 for <patchwork@sourceware.org>; Thu, 8 Dec 2022 16:40:06 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9DEEB3986424 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1670517606; bh=QYxedN2l5ipvMOiCpOxCDdM8QTX4f0xCcPpzDX2l6MQ=; h=Date:To:Cc:Subject:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=cEXhg0/XflFOmajLZ8MGzYO/51jXtDvCdvcImQGHmEbaDx5O99/Pg/3gUqfreMb+v HebD4idjrpsLL3Pt4GlJw0/QAyBLcXgmKmemI1XdyFb2hToi8EqOfvlR8+AN9LAspI kuUvQVeVsqwk7YVpMTZNVZYe1uEDTrWZSRosNxUE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2046.outbound.protection.outlook.com [40.107.20.46]) by sourceware.org (Postfix) with ESMTPS id 8331638B5276 for <gcc-patches@gcc.gnu.org>; Thu, 8 Dec 2022 16:39:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8331638B5276 Received: from AM5PR0601CA0074.eurprd06.prod.outlook.com (2603:10a6:206::39) by VE1PR08MB5807.eurprd08.prod.outlook.com (2603:10a6:800:1b2::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14; Thu, 8 Dec 2022 16:39:28 +0000 Received: from AM7EUR03FT042.eop-EUR03.prod.protection.outlook.com (2603:10a6:206:0:cafe::32) by AM5PR0601CA0074.outlook.office365.com (2603:10a6:206::39) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14 via Frontend Transport; Thu, 8 Dec 2022 16:39:28 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT042.mail.protection.outlook.com (100.127.140.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5901.17 via Frontend Transport; Thu, 8 Dec 2022 16:39:26 +0000 Received: ("Tessian outbound 6c699027a257:v130"); Thu, 08 Dec 2022 16:39:26 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 26272579ae20b29f X-CR-MTA-TID: 64aa7808 Received: from 9fa57a0dc921.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id D23B8E4A-E128-4EC7-A44A-951CF95FCD79.1; Thu, 08 Dec 2022 16:39:19 +0000 Received: from EUR03-DBA-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 9fa57a0dc921.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 08 Dec 2022 16:39:19 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=F16Akgw03sqsU9yxZNb+fMThn63hCohYUETnH5E1JLad34J9Sf600zqmEe9mtD6V82kOFJR1+a3X3hxiQ13Mt/u8mwbUbPTPDevRbb1+8TbkISexjADzLrHJDcclvez6gA9IB9F6+BKaQ7/bBwBMBwhF7ZRMtMmnM3926JLB1S0EcakO8blaQ2XIpSqMy4GiYaCiy2ILrYaTnee0nMgJOsUXy7y0gCmxkq8BFly9I6hHemqX7MWog2m7tjSKLrQvZUFrHeYneZ0UVn7estyH9AIctQLJNreDHcm/TXYRddNI3cU4DwiTUdp970+FT+EWw5yHSR9+uOozqdYEKksi7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=QYxedN2l5ipvMOiCpOxCDdM8QTX4f0xCcPpzDX2l6MQ=; b=hkLVuNAEowDPoszaSjq+oTNnIlJ958S2aROwsZLrFw/dsi920EMXzM70ucle3VxPyMeRFaKs57lWO7OIMeWlB5EMoz3iFBHGvLrwSD2SzU4GyWYoDrUlQat6R0bxeyi7YwH9DGb+hsymkBi/V8JICvPbZYpD49Dv3vcrGuAzZ9pTbqaFJwbxpqYg8KMGk60xe7A8Mer+E6j+ow/7mQOCG29OhMCXrrBGXopx2gL3gjT8aWfZJj/a77Ga3GbnjQVXTxfAiy5yzVlkPMoguz9LcocE2P1gPBrdv2abhe/hYFh0cRjchi5dcf3/sCJT83JYfRJecVzEezFbinBjxGzMdQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by AS4PR08MB7877.eurprd08.prod.outlook.com (2603:10a6:20b:51c::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.11; Thu, 8 Dec 2022 16:39:16 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd2a:aff9:b1a0:2fc7]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd2a:aff9:b1a0:2fc7%4]) with mapi id 15.20.5880.014; Thu, 8 Dec 2022 16:39:16 +0000 Date: Thu, 8 Dec 2022 16:39:13 +0000 To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Subject: [PATCH]AArch64 div-by-255, ensure that arguments are registers. [PR107988] Message-ID: <patch-16679-tamar@arm.com> Content-Type: multipart/mixed; boundary="dFyyMZgkSJUVzmz0" Content-Disposition: inline X-ClientProxiedBy: LO6P265CA0017.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:2ff::8) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|AS4PR08MB7877:EE_|AM7EUR03FT042:EE_|VE1PR08MB5807:EE_ X-MS-Office365-Filtering-Correlation-Id: e625aac7-4c62-4bc0-0dad-08dad93ac568 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 8mEpXMQpWJMqBT7GoL8I8o5YBFi5y7YME2/jO7gujyFXQT0lAUnuDikf6ND2cLQCfV/aoCNKmEO2eVo6B5lLeV31qkd5CwwPnE4JaUP6cJoUHpw8KsYF/quEZy7+sIhyHbHsJ6pRVZ4gBci7dX4kJxaQASlYCyBQbgPYm8poEFlMhgjX42DEKwZN8EvZpgLm72YVg/Hf4eFMgsRzmfHIC8FQX6GcAN3/EIl2kGsbBE8Wm8AeVWS0Yd40NpCP9rmcxhXJJnMMd8lBW1cuYFUmprNpBURP/Kw3eI/n0l4YRJUqObLUs6OktQUeyGgplEEBWSgbSig21Nb82VPifPLyQrmP+mLT6CHBqPdOAp9P3zSNavBmSHGnpD3yIoZK9Ti/oDEDUx0kDVhOV8WYOL6bmYLWjAwyDIj62Em0MLVTyBCaRVsgouey/Vxvcj2wibqwnnnXZtENFR3AGR4PFH6w/71epbvKnj1zMHPj6ZvWcE+DZc5CV0/fo3Yr6Y0q9C5mByG8G8d7yJxPGm2qGsR9ChbTWKdyrAcoZSV9ZvLuF02Hj0y4XHnMSrgcFuAY20nSSvsTaaKVnHDvmB1go8eOEpBLRyZDhKYKdgXOCDuut1j+Hlx4dfd1aQLJXUMY8nQ87psmFW4DL4vxuxRHvroHalp8WLXfFcxGLULMtPGfq6l1E5kFexGdsAH+1tLiDPiVldlHUAgUzbxnNfA1CNFSozlB3Id5UNeDRXbWj4tTSDy2wAeLXM2czzEr0efNPH7Y X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230022)(4636009)(136003)(376002)(346002)(366004)(396003)(39860400002)(451199015)(235185007)(2906002)(26005)(6666004)(44144004)(6506007)(6512007)(86362001)(33964004)(38100700002)(6486002)(5660300002)(44832011)(8936002)(36756003)(6916009)(316002)(2616005)(186003)(66556008)(66946007)(66476007)(478600001)(4326008)(8676002)(84970400001)(41300700001)(4216001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS4PR08MB7877 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM7EUR03FT042.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 1bcce1ec-87fa-421d-4274-08dad93abe9f X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 1Lm6/FYUd3htupP05gctSRr33zdxog0YnRJTe4Pzi6GM2f3WJHLhLqon8rE1BzlN8HhsN96sZwlbH0jbv2WKudeAtFiqAgoh2N3zBDIvZlHqwLy3BNtdI7fny9bLxQHXezb9C+P2TVP4LXegxIn3iDw6y799VYJkTDgUgzAcOW3+JoUB4wbBTaLEwEzb+lgY6nqshM2+2bHxP9vvYWorlTrWqxFP8INj4KYavDIrMgQ+fjDOJyM3CkKhYcMwKqPNse4c4qaDFRXm0bOcRh4cvk/TCDPcZOh0M4wuHRyv0MHSIdMw6IVEua79M5JUYU6yt2fGeHpp4FPF/lWR/XVr9uGs085Sn7VidzKCLo4DhX1y3o7t4fEw+ZKakQTQvxWqEsTfwvaRo0S+exYY6aUt6pPOYqAxq48bwKdTwef4vuHUe6YpdEiU2YV4pWdbddLGXD4MrOSthlLFPFlAPOGsXHmE88AVHA4oG/UI2wr9rEkslIbAaWTSA1r4xwvsj8ClmVXwGbvg6dSe5TaTbjXiFfOazcAWLiiKd+09PgC3kYf6hbcm+IVU2LO5jwC2Xp4DndZDl8tULdnbu2SNitkk5zAR3igC7HvgQH7OrgYydKUIeELp4w6+NlNdxWYvfCkOtWcj37QyUxqBkgU7mJ3OKEdnZNMyEsq/Y5ENyWpbRCHb4bl00pveZatQddyF2uP0BiV6jVQcegDdn9m81Dr0FbQ/tDzXbgQmOn4bHPYnagkM2KzFaydcQVKzvsEOqyxjSgTrKzFWJEN0FeFxRKzp3wYj4T/LgrgqJ+Iv1ytPzow= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230022)(4636009)(396003)(39860400002)(346002)(376002)(136003)(451199015)(36840700001)(40470700004)(46966006)(2906002)(84970400001)(5660300002)(235185007)(44832011)(40460700003)(86362001)(41300700001)(47076005)(478600001)(36756003)(8936002)(70586007)(70206006)(40480700001)(4326008)(356005)(8676002)(6916009)(316002)(36860700001)(44144004)(6512007)(6506007)(33964004)(6666004)(6486002)(82740400003)(26005)(81166007)(336012)(82310400005)(2616005)(186003)(4216001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 08 Dec 2022 16:39:26.8670 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e625aac7-4c62-4bc0-0dad-08dad93ac568 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT042.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB5807 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Tamar Christina <tamar.christina@arm.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
AArch64 div-by-255, ensure that arguments are registers. [PR107988]
|
|
Commit Message
Tamar Christina
Dec. 8, 2022, 4:39 p.m. UTC
Hi All, At -O0 (as opposed to e.g. volatile) we can get into the situation where the in0 and result RTL arguments passed to the division function are memory locations instead of registers. I think we could reject these early on by checking that the gimple values are GIMPLE registers, but I think it's better to handle it. As such I force them to registers and emit a move to the memory locations and leave it up to reload to handle. This fixes the ICE and still allows the optimization in these cases, which improves the code quality a lot. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR target/107988 * config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): Ensure input and output RTL are registers. gcc/testsuite/ChangeLog: PR target/107988 * gcc.target/aarch64/pr107988-1.c: New test. --- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241fcb3aa97025225 100644 -- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241fcb3aa97025225 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24337,12 +24337,27 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, if (!VECTOR_TYPE_P (vectype)) return false; + if (!REG_P (in0)) + in0 = force_reg (GET_MODE (in0), in0); + gcc_assert (output); - if (!*output) - *output = gen_reg_rtx (TYPE_MODE (vectype)); + rtx res = NULL_RTX; + + /* Once e get to this point we cannot reject the RTL, if it's not a reg then + Create a new reg and write the result to the output afterwards. */ + if (!*output || !REG_P (*output)) + res = gen_reg_rtx (TYPE_MODE (vectype)); + else + res = *output; + + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), res, in0, in1)); + + if (*output && res != *output) + emit_move_insn (*output, res); + else + *output = res; - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c new file mode 100644 index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O0" } */ +typedef unsigned short __attribute__((__vector_size__ (16))) V; + +V +foo (V v) +{ + v /= 255; + return v; +}
Comments
On 08/12/2022 16:39, Tamar Christina via Gcc-patches wrote: > Hi All, > > At -O0 (as opposed to e.g. volatile) we can get into the situation where the > in0 and result RTL arguments passed to the division function are memory > locations instead of registers. I think we could reject these early on by > checking that the gimple values are GIMPLE registers, but I think it's better to > handle it. > > As such I force them to registers and emit a move to the memory locations and > leave it up to reload to handle. This fixes the ICE and still allows the > optimization in these cases, which improves the code quality a lot. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > > > gcc/ChangeLog: > > PR target/107988 > * config/aarch64/aarch64.cc > (aarch64_vectorize_can_special_div_by_constant): Ensure input and output > RTL are registers. > > gcc/testsuite/ChangeLog: > > PR target/107988 > * gcc.target/aarch64/pr107988-1.c: New test. > > --- inline copy of patch -- > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241fcb3aa97025225 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24337,12 +24337,27 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > if (!VECTOR_TYPE_P (vectype)) > return false; > > + if (!REG_P (in0)) > + in0 = force_reg (GET_MODE (in0), in0); > + > gcc_assert (output); > > - if (!*output) > - *output = gen_reg_rtx (TYPE_MODE (vectype)); > + rtx res = NULL_RTX; > + > + /* Once e get to this point we cannot reject the RTL, if it's not a reg then > + Create a new reg and write the result to the output afterwards. */ > + if (!*output || !REG_P (*output)) > + res = gen_reg_rtx (TYPE_MODE (vectype)); > + else > + res = *output; Why not write rtx res = *output if (!res || !REG_P (res)) res = gen_reg_rtx... then you don't need either the else clause or the dead NULL_RTX assignment. > + > + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), res, in0, in1)); > + > + if (*output && res != *output) > + emit_move_insn (*output, res); > + else > + *output = res; > > - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); > return true; > } > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c > new file mode 100644 > index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O0" } */ > +typedef unsigned short __attribute__((__vector_size__ (16))) V; > + > +V > +foo (V v) > +{ > + v /= 255; > + return v; > +} > > > > Otherwise OK. R.
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > On 08/12/2022 16:39, Tamar Christina via Gcc-patches wrote: >> Hi All, >> >> At -O0 (as opposed to e.g. volatile) we can get into the situation where the >> in0 and result RTL arguments passed to the division function are memory >> locations instead of registers. I think we could reject these early on by >> checking that the gimple values are GIMPLE registers, but I think it's better to >> handle it. >> >> As such I force them to registers and emit a move to the memory locations and >> leave it up to reload to handle. This fixes the ICE and still allows the >> optimization in these cases, which improves the code quality a lot. >> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> >> Ok for master? >> >> Thanks, >> Tamar >> >> >> >> gcc/ChangeLog: >> >> PR target/107988 >> * config/aarch64/aarch64.cc >> (aarch64_vectorize_can_special_div_by_constant): Ensure input and output >> RTL are registers. >> >> gcc/testsuite/ChangeLog: >> >> PR target/107988 >> * gcc.target/aarch64/pr107988-1.c: New test. >> >> --- inline copy of patch -- >> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc >> index b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241fcb3aa97025225 100644 >> --- a/gcc/config/aarch64/aarch64.cc >> +++ b/gcc/config/aarch64/aarch64.cc >> @@ -24337,12 +24337,27 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, >> if (!VECTOR_TYPE_P (vectype)) >> return false; >> >> + if (!REG_P (in0)) >> + in0 = force_reg (GET_MODE (in0), in0); >> + >> gcc_assert (output); >> >> - if (!*output) >> - *output = gen_reg_rtx (TYPE_MODE (vectype)); >> + rtx res = NULL_RTX; >> + >> + /* Once e get to this point we cannot reject the RTL, if it's not a reg then >> + Create a new reg and write the result to the output afterwards. */ >> + if (!*output || !REG_P (*output)) >> + res = gen_reg_rtx (TYPE_MODE (vectype)); >> + else >> + res = *output; > > Why not write > rtx res = *output > if (!res || !REG_P (res)) > res = gen_reg_rtx... > > then you don't need either the else clause or the dead NULL_RTX assignment. I'd prefer that we use the expand_insn interface, which already has logic for coercing inputs and outputs to predicates. Something like: machine_mode mode = TYPE_MODE (vectype); unsigned int flags = aarch64_classify_vector_mode (mode); if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) return false; ... expand_operand ops[3]; create_output_operand (&ops[0], *output, mode); create_input_operand (&ops[1], in0, mode); create_fixed_operand (&ops[2], in1); expand_insn (insn_code, 3, ops); *output = ops[0].value; return true; On this function: why do we have the VECTOR_TYPE_P condition in: /* We can use the optimized pattern. */ if (in0 == NULL_RTX && in1 == NULL_RTX) return true; if (!VECTOR_TYPE_P (vectype)) return false; ? It seems odd to be returning false after we have decided (in the non-generating case) that everything is OK. When would we see a vector mode that has an associated division instruction (checked above this), and yet not have a vector type? Thanks, Richard >> + >> + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), res, in0, in1)); >> + >> + if (*output && res != *output) >> + emit_move_insn (*output, res); >> + else >> + *output = res; >> >> - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); >> return true; >> } >> >> diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c >> new file mode 100644 >> index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-additional-options "-O0" } */ >> +typedef unsigned short __attribute__((__vector_size__ (16))) V; >> + >> +V >> +foo (V v) >> +{ >> + v /= 255; >> + return v; >> +} >> >> >> >> > > Otherwise OK. > > R.
> -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Friday, December 9, 2022 7:08 AM > To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com> > Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; > nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; > Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov > <Kyrylo.Tkachov@arm.com> > Subject: Re: [PATCH]AArch64 div-by-255, ensure that arguments are > registers. [PR107988] > > Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: > > On 08/12/2022 16:39, Tamar Christina via Gcc-patches wrote: > >> Hi All, > >> > >> At -O0 (as opposed to e.g. volatile) we can get into the situation > >> where the > >> in0 and result RTL arguments passed to the division function are > >> memory locations instead of registers. I think we could reject these > >> early on by checking that the gimple values are GIMPLE registers, but > >> I think it's better to handle it. > >> > >> As such I force them to registers and emit a move to the memory > >> locations and leave it up to reload to handle. This fixes the ICE > >> and still allows the optimization in these cases, which improves the code > quality a lot. > >> > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > >> Ok for master? > >> > >> Thanks, > >> Tamar > >> > >> > >> > >> gcc/ChangeLog: > >> > >> PR target/107988 > >> * config/aarch64/aarch64.cc > >> (aarch64_vectorize_can_special_div_by_constant): Ensure input and > output > >> RTL are registers. > >> > >> gcc/testsuite/ChangeLog: > >> > >> PR target/107988 > >> * gcc.target/aarch64/pr107988-1.c: New test. > >> > >> --- inline copy of patch -- > >> diff --git a/gcc/config/aarch64/aarch64.cc > >> b/gcc/config/aarch64/aarch64.cc index > >> > b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241f > >> cb3aa97025225 100644 > >> --- a/gcc/config/aarch64/aarch64.cc > >> +++ b/gcc/config/aarch64/aarch64.cc > >> @@ -24337,12 +24337,27 @@ > aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > >> if (!VECTOR_TYPE_P (vectype)) > >> return false; > >> > >> + if (!REG_P (in0)) > >> + in0 = force_reg (GET_MODE (in0), in0); > >> + > >> gcc_assert (output); > >> > >> - if (!*output) > >> - *output = gen_reg_rtx (TYPE_MODE (vectype)); > >> + rtx res = NULL_RTX; > >> + > >> + /* Once e get to this point we cannot reject the RTL, if it's not a reg > then > >> + Create a new reg and write the result to the output afterwards. > >> + */ if (!*output || !REG_P (*output)) > >> + res = gen_reg_rtx (TYPE_MODE (vectype)); else > >> + res = *output; > > > > Why not write > > rtx res = *output > > if (!res || !REG_P (res)) > > res = gen_reg_rtx... > > > > then you don't need either the else clause or the dead NULL_RTX > assignment. > > I'd prefer that we use the expand_insn interface, which already has logic for > coercing inputs and outputs to predicates. Something like: > > machine_mode mode = TYPE_MODE (vectype); > unsigned int flags = aarch64_classify_vector_mode (mode); > if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) > return false; > > ... > > expand_operand ops[3]; > create_output_operand (&ops[0], *output, mode); > create_input_operand (&ops[1], in0, mode); > create_fixed_operand (&ops[2], in1); > expand_insn (insn_code, 3, ops); > *output = ops[0].value; > return true; > > On this function: why do we have the VECTOR_TYPE_P condition in: > It was left over after checking for optabs support. It's superfluous now. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR target/107988 * config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): Ensure input and output RTL are registers. gcc/testsuite/ChangeLog: PR target/107988 * gcc.target/aarch64/pr107988-1.c: New test. --- inline copy of patch --- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 7bb0b7602ff6474410059494dd86b7be1621dde5..e1f34ef5da170ef11727e0c99a5bd42849c5d185 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24395,7 +24395,8 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, || !TYPE_UNSIGNED (vectype)) return false; - unsigned int flags = aarch64_classify_vector_mode (TYPE_MODE (vectype)); + machine_mode mode = TYPE_MODE (vectype); + unsigned int flags = aarch64_classify_vector_mode (mode); if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) return false; @@ -24411,15 +24412,14 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, if (in0 == NULL_RTX && in1 == NULL_RTX) return true; - if (!VECTOR_TYPE_P (vectype)) - return false; - gcc_assert (output); - if (!*output) - *output = gen_reg_rtx (TYPE_MODE (vectype)); - - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); + expand_operand ops[3]; + create_output_operand (&ops[0], *output, mode); + create_input_operand (&ops[1], in0, mode); + create_fixed_operand (&ops[2], in1); + expand_insn (insn_code, 3, ops); + *output = ops[0].value; return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c new file mode 100644 index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O0" } */ +typedef unsigned short __attribute__((__vector_size__ (16))) V; + +V +foo (V v) +{ + v /= 255; + return v; +}
Tamar Christina <Tamar.Christina@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandiford@arm.com> >> Sent: Friday, December 9, 2022 7:08 AM >> To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com> >> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; >> nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; >> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov >> <Kyrylo.Tkachov@arm.com> >> Subject: Re: [PATCH]AArch64 div-by-255, ensure that arguments are >> registers. [PR107988] >> >> Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes: >> > On 08/12/2022 16:39, Tamar Christina via Gcc-patches wrote: >> >> Hi All, >> >> >> >> At -O0 (as opposed to e.g. volatile) we can get into the situation >> >> where the >> >> in0 and result RTL arguments passed to the division function are >> >> memory locations instead of registers. I think we could reject these >> >> early on by checking that the gimple values are GIMPLE registers, but >> >> I think it's better to handle it. >> >> >> >> As such I force them to registers and emit a move to the memory >> >> locations and leave it up to reload to handle. This fixes the ICE >> >> and still allows the optimization in these cases, which improves the code >> quality a lot. >> >> >> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> >> >> >> Ok for master? >> >> >> >> Thanks, >> >> Tamar >> >> >> >> >> >> >> >> gcc/ChangeLog: >> >> >> >> PR target/107988 >> >> * config/aarch64/aarch64.cc >> >> (aarch64_vectorize_can_special_div_by_constant): Ensure input and >> output >> >> RTL are registers. >> >> >> >> gcc/testsuite/ChangeLog: >> >> >> >> PR target/107988 >> >> * gcc.target/aarch64/pr107988-1.c: New test. >> >> >> >> --- inline copy of patch -- >> >> diff --git a/gcc/config/aarch64/aarch64.cc >> >> b/gcc/config/aarch64/aarch64.cc index >> >> >> b8dc3f070c8afc47c85fa18768c4da92c774338f..9f96424993c4fcccce90e1b241f >> >> cb3aa97025225 100644 >> >> --- a/gcc/config/aarch64/aarch64.cc >> >> +++ b/gcc/config/aarch64/aarch64.cc >> >> @@ -24337,12 +24337,27 @@ >> aarch64_vectorize_can_special_div_by_constant (enum tree_code code, >> >> if (!VECTOR_TYPE_P (vectype)) >> >> return false; >> >> >> >> + if (!REG_P (in0)) >> >> + in0 = force_reg (GET_MODE (in0), in0); >> >> + >> >> gcc_assert (output); >> >> >> >> - if (!*output) >> >> - *output = gen_reg_rtx (TYPE_MODE (vectype)); >> >> + rtx res = NULL_RTX; >> >> + >> >> + /* Once e get to this point we cannot reject the RTL, if it's not a reg >> then >> >> + Create a new reg and write the result to the output afterwards. >> >> + */ if (!*output || !REG_P (*output)) >> >> + res = gen_reg_rtx (TYPE_MODE (vectype)); else >> >> + res = *output; >> > >> > Why not write >> > rtx res = *output >> > if (!res || !REG_P (res)) >> > res = gen_reg_rtx... >> > >> > then you don't need either the else clause or the dead NULL_RTX >> assignment. >> >> I'd prefer that we use the expand_insn interface, which already has logic for >> coercing inputs and outputs to predicates. Something like: >> >> machine_mode mode = TYPE_MODE (vectype); >> unsigned int flags = aarch64_classify_vector_mode (mode); >> if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) >> return false; >> >> ... >> >> expand_operand ops[3]; >> create_output_operand (&ops[0], *output, mode); >> create_input_operand (&ops[1], in0, mode); >> create_fixed_operand (&ops[2], in1); >> expand_insn (insn_code, 3, ops); >> *output = ops[0].value; >> return true; >> >> On this function: why do we have the VECTOR_TYPE_P condition in: >> > > It was left over after checking for optabs support. It's superfluous now. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? OK, thanks. Richard > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/107988 > * config/aarch64/aarch64.cc > (aarch64_vectorize_can_special_div_by_constant): Ensure input and output > RTL are registers. > > gcc/testsuite/ChangeLog: > > PR target/107988 > * gcc.target/aarch64/pr107988-1.c: New test. > > --- inline copy of patch --- > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 7bb0b7602ff6474410059494dd86b7be1621dde5..e1f34ef5da170ef11727e0c99a5bd42849c5d185 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24395,7 +24395,8 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > || !TYPE_UNSIGNED (vectype)) > return false; > > - unsigned int flags = aarch64_classify_vector_mode (TYPE_MODE (vectype)); > + machine_mode mode = TYPE_MODE (vectype); > + unsigned int flags = aarch64_classify_vector_mode (mode); > if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) > return false; > > @@ -24411,15 +24412,14 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > if (in0 == NULL_RTX && in1 == NULL_RTX) > return true; > > - if (!VECTOR_TYPE_P (vectype)) > - return false; > - > gcc_assert (output); > > - if (!*output) > - *output = gen_reg_rtx (TYPE_MODE (vectype)); > - > - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); > + expand_operand ops[3]; > + create_output_operand (&ops[0], *output, mode); > + create_input_operand (&ops[1], in0, mode); > + create_fixed_operand (&ops[2], in1); > + expand_insn (insn_code, 3, ops); > + *output = ops[0].value; > return true; > } > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c > new file mode 100644 > index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O0" } */ > +typedef unsigned short __attribute__((__vector_size__ (16))) V; > + > +V > +foo (V v) > +{ > + v /= 255; > + return v; > +}
--- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24337,12 +24337,27 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, if (!VECTOR_TYPE_P (vectype)) return false; + if (!REG_P (in0)) + in0 = force_reg (GET_MODE (in0), in0); + gcc_assert (output); - if (!*output) - *output = gen_reg_rtx (TYPE_MODE (vectype)); + rtx res = NULL_RTX; + + /* Once e get to this point we cannot reject the RTL, if it's not a reg then + Create a new reg and write the result to the output afterwards. */ + if (!*output || !REG_P (*output)) + res = gen_reg_rtx (TYPE_MODE (vectype)); + else + res = *output; + + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), res, in0, in1)); + + if (*output && res != *output) + emit_move_insn (*output, res); + else + *output = res; - emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/pr107988-1.c b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c new file mode 100644 index 0000000000000000000000000000000000000000..c4fd290271b738345173b569bdc58c092fba7fe9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr107988-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O0" } */ +typedef unsigned short __attribute__((__vector_size__ (16))) V; + +V +foo (V v) +{ + v /= 255; + return v; +}