Message ID | 20211130164824.mmgllowvv5nsczf2@arm.com |
---|---|
State | New |
Headers |
Date: Tue, 30 Nov 2021 16:48:25 +0000 To: gcc-patches@gcc.gnu.org Subject: [PATCH] Fix alignment of stack slots for overaligned types [PR103500] Message-ID: <20211130164824.mmgllowvv5nsczf2@arm.com> List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> From: Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Alex Coplan <alex.coplan@arm.com> |
Series |
Fix alignment of stack slots for overaligned types [PR103500]
Commit Message
Alex Coplan
Nov. 30, 2021, 4:48 p.m. UTC
Hi,

This fixes PR103500, i.e. it ensures that stack slots for passed-by-reference overaligned types are appropriately aligned. For the testcase:

  typedef struct __attribute__((aligned(32))) {
    long x,y;
  } S;
  S x;
  void f(S);
  void g(void) { f(x); }

on AArch64, we currently generate (at -O2):

  g:
          adrp    x1, .LANCHOR0
          add     x1, x1, :lo12:.LANCHOR0
          stp     x29, x30, [sp, -48]!
          mov     x29, sp
          ldp     q0, q1, [x1]
          add     x0, sp, 16
          stp     q0, q1, [sp, 16]
          bl      f
          ldp     x29, x30, [sp], 48
          ret

so the stack slot for the passed-by-reference copy of the structure is at sp + 16, and sp is only guaranteed to be 16-byte aligned, so the structure is only 16-byte aligned. The PCS requires the structure to be 32-byte aligned. After this patch, we generate:

  g:
          adrp    x1, .LANCHOR0
          add     x1, x1, :lo12:.LANCHOR0
          stp     x29, x30, [sp, -64]!
          mov     x29, sp
          add     x0, sp, 47
          ldp     q0, q1, [x1]
          and     x0, x0, -32
          stp     q0, q1, [x0]
          bl      f
          ldp     x29, x30, [sp], 64
          ret

i.e. we ensure 32-byte alignment for the struct.

The approach taken here is similar to that in function.c:assign_parm_setup_block, where it handles the case of DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is similar to the approach taken in cfgexpand.c:expand_stack_vars (where the function calls get_dynamic_stack_size), which is the code that handles the alignment of overaligned structures as addressable local variables (see the related case discussed in the PR).

This patch also updates the aapcs64 test mentioned in the PR to avoid the frontend folding away the alignment check. I've confirmed that the execution test actually fails on aarch64-linux-gnu prior to the patch being applied and passes afterwards.

Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and arm-linux-gnueabihf: no regressions.

I'd appreciate any feedback. Is it OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

	PR middle-end/103500
	* function.c (get_stack_local_alignment): Align BLKmode overaligned
	types to the alignment required by the type.
	(assign_stack_temp_for_type): Handle BLKmode overaligned stack
	slots by allocating a larger-than-necessary buffer and aligning
	the address within appropriately.

gcc/testsuite/ChangeLog:

	PR middle-end/103500
	* gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref):
	Prevent the frontend from folding our alignment check away by
	using snprintf to store the pointer into a string and recovering
	it with sscanf.
Comments
* Alex Coplan via Gcc-patches:

> Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
> arm-linux-gnueabihf: no regressions.
>
> I'd appreciate any feedback. Is it OK for trunk?

Does this need an ABI warning?

Thanks,
Florian
On 30/11/2021 19:38, Florian Weimer wrote:
> * Alex Coplan via Gcc-patches:
>
> > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
> > arm-linux-gnueabihf: no regressions.
> >
> > I'd appreciate any feedback. Is it OK for trunk?
>
> Does this need an ABI warning?

That's a good question. I'm not sure exactly what changes merit an ABI warning in GCC. In this case, we aren't changing how the parameter is passed, in the sense that the callee can still locate it in the same way (by following the pointer in the appropriate GPR); we're simply increasing the alignment of the stack slot used for the indirected copy. So on this basis I would guess that it doesn't need a warning, but I'm happy to be corrected.

Thanks,
Alex
Hi,

I'd just like to ping this for review:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585785.html

Thanks,
Alex

On 30/11/2021 16:48, Alex Coplan via Gcc-patches wrote:
> [original patch quoted in full - snipped]
Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> [...]
>
> The approach taken here is similar to that in
> function.c:assign_parm_setup_block where it handles the case for
> DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is
> similar to the approach taken in cfgexpand.c:expand_stack_vars (where
> the function calls get_dynamic_stack_size) which is the code that
> handles the alignment for overaligned structures as addressable local
> variables (see the related case discussed in the PR).

A difference with the latter is that cfgexpand (AFAICT) always honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly only increasing the alignment for efficiency reasons (rather than decreasing it). So…

> @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode)
>    unsigned int alignment;
>
>    if (mode == BLKmode)
> -    alignment = BIGGEST_ALIGNMENT;
> +    alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT)
> +      ? TYPE_ALIGN (type)
> +      : BIGGEST_ALIGNMENT;

…I'm not sure about this calculation. Why do we only honour TYPE_ALIGN if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the way back to BIGGEST_ALIGNMENT otherwise? It looks like on nvptx this would have the effect of honouring (say) 2048-byte alignment, but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT and MAX_SUPPORTED_STACK_ALIGNMENT).

Also, what about the non-BLKmode case? A type's mode cannot be more aligned than the type, but a type is allowed to be more aligned than its mode. E.g.:

  typedef unsigned int V __attribute__((vector_size(32)));
  typedef struct __attribute__((aligned(32))) { V x; } S;
  S x;
  void f(S);
  void g(void) { f(x); }

doesn't seem to be handled by the patch. It's possible to create variations on this on aarch64 up to 2048 bits using SVE and -msve-vector-bits=N.

Perhaps we should treat the old alignment calculation as a minimum and take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given.

> +  if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT)
> +    {
> +      rtx allocsize = gen_int_mode (size, Pmode);
> +      get_dynamic_stack_size (&allocsize, 0, align, NULL);
> +      gcc_assert (CONST_INT_P (allocsize));
> +      size = UINTVAL (allocsize);

Since the size coming in is a poly_int64, I think we should treat the result in the same way, using rtx_to_poly_int64 rather than requiring a CONST_INT. We might not handle that case properly yet, but rtx_to_poly_int64 would trap mistakes, and using it now would mean fewer surprises later.

Thanks,
Richard
Hi Richard, Thanks for the review. On 20/12/2021 13:19, Richard Sandiford wrote: > Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > Hi, > > > > This fixes PR103500 i.e. ensuring that stack slots for > > passed-by-reference overaligned types are appropriately aligned. For the > > testcase: > > > > typedef struct __attribute__((aligned(32))) { > > long x,y; > > } S; > > S x; > > void f(S); > > void g(void) { f(x); } > > > > on AArch64, we currently generate (at -O2): > > > > g: > > adrp x1, .LANCHOR0 > > add x1, x1, :lo12:.LANCHOR0 > > stp x29, x30, [sp, -48]! > > mov x29, sp > > ldp q0, q1, [x1] > > add x0, sp, 16 > > stp q0, q1, [sp, 16] > > bl f > > ldp x29, x30, [sp], 48 > > ret > > > > so the stack slot for the passed-by-reference copy of the structure is > > at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the > > structure is only 16-byte aligned. The PCS requires the structure to be > > 32-byte aligned. After this patch, we generate: > > > > g: > > adrp x1, .LANCHOR0 > > add x1, x1, :lo12:.LANCHOR0 > > stp x29, x30, [sp, -64]! > > mov x29, sp > > add x0, sp, 47 > > ldp q0, q1, [x1] > > and x0, x0, -32 > > stp q0, q1, [x0] > > bl f > > ldp x29, x30, [sp], 64 > > ret > > > > i.e. we ensure 32-byte alignment for the struct. > > > > The approach taken here is similar to that in > > function.c:assign_parm_setup_block where it handles the case for > > DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is > > similar to the approach taken in cfgexpand.c:expand_stack_vars (where > > the function calls get_dynamic_stack_size) which is the code that > > handles the alignment for overaligned structures as addressable local > > variables (see the related case discussed in the PR). > > A difference with the latter is that cfgexpand (AFAICT) always > honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly > only increasing the alignment for efficiency reasons (rather than > decreasing it). 
> > So… > > > This patch also updates the aapcs64 test mentioned in the PR to avoid > > the frontend folding away the alignment check. I've confirmed that the > > execution test actually fails on aarch64-linux-gnu prior to the patch > > being applied and passes afterwards. > > > > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and > > arm-linux-gnueabihf: no regressions. > > > > I'd appreciate any feedback. Is it OK for trunk? > > > > Thanks, > > Alex > > > > gcc/ChangeLog: > > > > PR middle-end/103500 > > * function.c (get_stack_local_alignment): Align BLKmode overaligned > > types to the alignment required by the type. > > (assign_stack_temp_for_type): Handle BLKmode overaligned stack > > slots by allocating a larger-than-necessary buffer and aligning > > the address within appropriately. > > > > gcc/testsuite/ChangeLog: > > > > PR middle-end/103500 > > * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref): > > Prevent the frontend from folding our alignment check away by > > using snprintf to store the pointer into a string and recovering > > it with sscanf. > > > > diff --git a/gcc/function.c b/gcc/function.c > > index 61b3bd036b8..5ed722ab959 100644 > > --- a/gcc/function.c > > +++ b/gcc/function.c > > @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode) > > unsigned int alignment; > > > > if (mode == BLKmode) > > - alignment = BIGGEST_ALIGNMENT; > > + alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT) > > + ? TYPE_ALIGN (type) > > + : BIGGEST_ALIGNMENT; > > …I'm not sure about this calculation. Why do we only honour TYPE_ALIGN > if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the > way back to BIGGEST_ALIGNMENT otherwise? It looks like on nvptx > this would have the effect of honouring (say) 2048-byte alignment, > but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT > and MAX_SUPPORTED_STACK_ALIGNMENT). 
So, to be honest, this was a bit of a bodge to try and work around issues on x86. My original attempt at solving the PR used the more obvious calculation you suggest below, i.e. the max of BIGGEST_ALIGNMENT and TYPE_ALIGN (type). The problem with that is, on x86, MAX_SUPPORTED_STACK_ALIGNMENT has a huge value (2^31). explow.c:get_dynamic_stack_size has: if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0) { size = round_push (size); [...] } *psize = size; so inevitably we end up calling round_push on x86 which in turn ends up going down the else branch: else { /* If crtl->preferred_stack_boundary might still grow, use virtual_preferred_stack_boundary_rtx instead. This will be substituted by the right value in vregs pass and optimized during combine. */ align_rtx = virtual_preferred_stack_boundary_rtx; alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX); } and we end up returning a size expression involving virtual_preferred_stack_boundary_rtx, which clearly cannot be a CONST_INT (required for the approach in the patch to work: we need a constant to pass to assign_stack_local_1). So the idea with that calculation was to only use the new, larger alignment, if it was larger than MAX_SUPPORTED_STACK_ALIGNMENT (which should never be the case on x86 for any reasonable value of TYPE_ALIGN). Otherwise, it would fall back to the old behaviour of just using BIGGEST_ALIGNMENT. I realise that this is a rather inelegant hack, but I wasn't sure how else to approach the problem. > > Also, what about the non-BLKmode case? A type's mode cannot be more > aligned than the type, but a type is allowed to be more aligned than > its mode. E.g.: Yes, we should probably try and handle the non-BLKmode case as well. I was just trying to avoid changing too much at once at the risk of introducing regressions. 
> typedef unsigned int V __attribute__((vector_size(32)));
> typedef struct __attribute__((aligned(32))) { V x; } S;
> S x;
> void f(S);
> void g(void) { f(x); }
>
> doesn't seem to be handled by the patch.  It's possible to create
> variations on this on aarch64 up to 2048 bits using SVE and
> -msve-vector-bits=N.
>
> Perhaps we should treat the old alignment calculation as a minimum
> and take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given.

As discussed above, this doesn't work on x86, unfortunately. If you have
any alternative suggestions as to how to approach the problem, it would
be great to hear them.

Thanks,
Alex

> >    else
> >      alignment = GET_MODE_ALIGNMENT (mode);
> >  
> > @@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type)
> >  
> >    p = ggc_alloc<temp_slot> ();
> >  
> > -  /* We are passing an explicit alignment request to assign_stack_local.
> > -     One side effect of that is assign_stack_local will not round SIZE
> > -     to ensure the frame offset remains suitably aligned.
> > -
> > -     So for requests which depended on the rounding of SIZE, we go ahead
> > -     and round it now.  We also make sure ALIGNMENT is at least
> > -     BIGGEST_ALIGNMENT.  */
> > -  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
> > -  p->slot = assign_stack_local_1 (mode,
> > -                                  (mode == BLKmode
> > -                                   ? aligned_upper_bound (size,
> > -                                                          (int) align
> > -                                                          / BITS_PER_UNIT)
> > -                                   : size),
> > -                                  align, 0);
> > +  if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT)
> > +    {
> > +      rtx allocsize = gen_int_mode (size, Pmode);
> > +      get_dynamic_stack_size (&allocsize, 0, align, NULL);
> > +      gcc_assert (CONST_INT_P (allocsize));
> > +      size = UINTVAL (allocsize);
>
> Since the size coming in is a poly_int64, I think we should treat
> the result in the same way, using rtx_to_poly_int64 rather than
> requiring a CONST_INT.
> We might not handle that case properly yet, but rtx_to_poly_int64
> would trap mistakes, and using it now would mean fewer surprises
> later.
>
> Thanks,
> Richard

> > +      p->slot = assign_stack_local_1 (mode,
> > +                                      size,
> > +                                      BIGGEST_ALIGNMENT, 0);
> > +      rtx addr = align_dynamic_address (XEXP (p->slot, 0), align);
> > +      mark_reg_pointer (addr, align);
> > +      p->slot = gen_rtx_MEM (GET_MODE (p->slot), addr);
> > +      MEM_NOTRAP_P (p->slot) = 1;
> > +    }
> > +  else
> > +    /* We are passing an explicit alignment request to assign_stack_local.
> > +       One side effect of that is assign_stack_local will not round SIZE
> > +       to ensure the frame offset remains suitably aligned.
> > +
> > +       So for requests which depended on the rounding of SIZE, we go ahead
> > +       and round it now.  We also make sure ALIGNMENT is at least
> > +       BIGGEST_ALIGNMENT.  */
> > +    p->slot = assign_stack_local_1 (mode,
> > +                                    (mode == BLKmode
> > +                                     ? aligned_upper_bound (size,
> > +                                                            (int) align
> > +                                                            / BITS_PER_UNIT)
> > +                                     : size),
> > +                                    align, 0);
> >  
> >    p->align = align;
> >  
> > diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> > index c9351802191..0b0ac0b9b97 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
> > @@ -21,10 +21,18 @@ test_pass_by_ref (int x0, overaligned x1, int x2)
> >      abort ();
> >    if (memcmp ((void *) &x1, (void *)&a, sizeof (overaligned)))
> >      abort ();
> > -  long addr = ((long) &x1) & 31;
> > +
> > +  char buf[64];
> > +  void *ptr;
> > +
> > +  __builtin_snprintf (buf, sizeof(buf), "%p", &x1);
> > +  if (__builtin_sscanf(buf, "%p", &ptr) != 1)
> > +    abort ();
> > +
> > +  long addr = ((long) ptr) & 31;
> >    if (addr != 0)
> >      {
> > -      __builtin_printf ("Alignment was %d\n", addr);
> > +      __builtin_printf ("Alignment was %ld\n", addr);
> >        abort ();
> >      }
> >  }
On Fri, Jan 7, 2022 at 7:20 AM Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi Richard, > > Thanks for the review. > > On 20/12/2021 13:19, Richard Sandiford wrote: > > Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > > Hi, > > > > > > This fixes PR103500 i.e. ensuring that stack slots for > > > passed-by-reference overaligned types are appropriately aligned. For the > > > testcase: > > > > > > typedef struct __attribute__((aligned(32))) { > > > long x,y; > > > } S; > > > S x; > > > void f(S); > > > void g(void) { f(x); } > > > > > > on AArch64, we currently generate (at -O2): > > > > > > g: > > > adrp x1, .LANCHOR0 > > > add x1, x1, :lo12:.LANCHOR0 > > > stp x29, x30, [sp, -48]! > > > mov x29, sp > > > ldp q0, q1, [x1] > > > add x0, sp, 16 > > > stp q0, q1, [sp, 16] > > > bl f > > > ldp x29, x30, [sp], 48 > > > ret > > > > > > so the stack slot for the passed-by-reference copy of the structure is > > > at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the > > > structure is only 16-byte aligned. The PCS requires the structure to be > > > 32-byte aligned. After this patch, we generate: > > > > > > g: > > > adrp x1, .LANCHOR0 > > > add x1, x1, :lo12:.LANCHOR0 > > > stp x29, x30, [sp, -64]! > > > mov x29, sp > > > add x0, sp, 47 > > > ldp q0, q1, [x1] > > > and x0, x0, -32 > > > stp q0, q1, [x0] > > > bl f > > > ldp x29, x30, [sp], 64 > > > ret > > > > > > i.e. we ensure 32-byte alignment for the struct. > > > > > > The approach taken here is similar to that in > > > function.c:assign_parm_setup_block where it handles the case for > > > DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is > > > similar to the approach taken in cfgexpand.c:expand_stack_vars (where > > > the function calls get_dynamic_stack_size) which is the code that > > > handles the alignment for overaligned structures as addressable local > > > variables (see the related case discussed in the PR). 
> > > > A difference with the latter is that cfgexpand (AFAICT) always > > honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly > > only increasing the alignment for efficiency reasons (rather than > > decreasing it). > > > > So… > > > > > This patch also updates the aapcs64 test mentioned in the PR to avoid > > > the frontend folding away the alignment check. I've confirmed that the > > > execution test actually fails on aarch64-linux-gnu prior to the patch > > > being applied and passes afterwards. > > > > > > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and > > > arm-linux-gnueabihf: no regressions. > > > > > > I'd appreciate any feedback. Is it OK for trunk? > > > > > > Thanks, > > > Alex > > > > > > gcc/ChangeLog: > > > > > > PR middle-end/103500 > > > * function.c (get_stack_local_alignment): Align BLKmode overaligned > > > types to the alignment required by the type. > > > (assign_stack_temp_for_type): Handle BLKmode overaligned stack > > > slots by allocating a larger-than-necessary buffer and aligning > > > the address within appropriately. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR middle-end/103500 > > > * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref): > > > Prevent the frontend from folding our alignment check away by > > > using snprintf to store the pointer into a string and recovering > > > it with sscanf. > > > > > > diff --git a/gcc/function.c b/gcc/function.c > > > index 61b3bd036b8..5ed722ab959 100644 > > > --- a/gcc/function.c > > > +++ b/gcc/function.c > > > @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode) > > > unsigned int alignment; > > > > > > if (mode == BLKmode) > > > - alignment = BIGGEST_ALIGNMENT; > > > + alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT) > > > + ? TYPE_ALIGN (type) > > > + : BIGGEST_ALIGNMENT; > > > > …I'm not sure about this calculation. 
> > Why do we only honour TYPE_ALIGN
> > if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the
> > way back to BIGGEST_ALIGNMENT otherwise?  It looks like on nvptx
> > this would have the effect of honouring (say) 2048-byte alignment,
> > but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT
> > and MAX_SUPPORTED_STACK_ALIGNMENT).
>
> So, to be honest, this was a bit of a bodge to try and work around
> issues on x86. My original attempt at solving the PR used the more
> obvious calculation you suggest below, i.e. the max of BIGGEST_ALIGNMENT
> and TYPE_ALIGN (type). The problem with that is, on x86,
> MAX_SUPPORTED_STACK_ALIGNMENT has a huge value (2^31).

On x86, the stack alignment limit is the limit of the stack.  You can
align the stack to any value on x86 as long as your stack allows it.

> explow.c:get_dynamic_stack_size has:
>
>   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
>     {
>       size = round_push (size);
>       [...]
>     }
>   *psize = size;
>
> so inevitably we end up calling round_push on x86 which in turn ends up
> going down the else branch:
>
>   else
>     {
>       /* If crtl->preferred_stack_boundary might still grow, use
>          virtual_preferred_stack_boundary_rtx instead.  This will be
>          substituted by the right value in vregs pass and optimized
>          during combine.  */
>       align_rtx = virtual_preferred_stack_boundary_rtx;
>       alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1),
>                                    NULL_RTX);
>     }
>
> and we end up returning a size expression involving
> virtual_preferred_stack_boundary_rtx, which clearly cannot be a
> CONST_INT (required for the approach in the patch to work: we need a
> constant to pass to assign_stack_local_1).
>
> So the idea with that calculation was to only use the new, larger
> alignment, if it was larger than MAX_SUPPORTED_STACK_ALIGNMENT (which
> should never be the case on x86 for any reasonable value of TYPE_ALIGN).
> Otherwise, it would fall back to the old behaviour of just using
> BIGGEST_ALIGNMENT.
> > I realise that this is a rather inelegant hack, but I wasn't sure how > else to approach the problem. > > > > > Also, what about the non-BLKmode case? A type's mode cannot be more > > aligned than the type, but a type is allowed to be more aligned than > > its mode. E.g.: > > Yes, we should probably try and handle the non-BLKmode case as well. I > was just trying to avoid changing too much at once at the risk of > introducing regressions. > > > > > typedef unsigned int V __attribute__((vector_size(32))); > > typedef struct __attribute__((aligned(32))) { V x; } S; > > S x; > > void f(S); > > void g(void) { f(x); } > > > > doesn't seem to be handled by the patch. It's possible to create > > variations on this on aarch64 up to 2048 bits using SVE and > > -msve-vector-bits=N. > > > > Perhaps we should treat the old alignment calculation as a minimum > > and take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given. > > As discussed above, this doesn't work on x86, unfortunately. If you have > any alternative suggestions as to how to approach the problem, it would > be great to hear them. > > Thanks, > Alex > > > > > > else > > > alignment = GET_MODE_ALIGNMENT (mode); > > > > > > @@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type) > > > > > > p = ggc_alloc<temp_slot> (); > > > > > > - /* We are passing an explicit alignment request to assign_stack_local. > > > - One side effect of that is assign_stack_local will not round SIZE > > > - to ensure the frame offset remains suitably aligned. > > > - > > > - So for requests which depended on the rounding of SIZE, we go ahead > > > - and round it now. We also make sure ALIGNMENT is at least > > > - BIGGEST_ALIGNMENT. */ > > > - gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT); > > > - p->slot = assign_stack_local_1 (mode, > > > - (mode == BLKmode > > > - ? 
aligned_upper_bound (size, > > > - (int) align > > > - / BITS_PER_UNIT) > > > - : size), > > > - align, 0); > > > + if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT) > > > + { > > > + rtx allocsize = gen_int_mode (size, Pmode); > > > + get_dynamic_stack_size (&allocsize, 0, align, NULL); > > > + gcc_assert (CONST_INT_P (allocsize)); > > > + size = UINTVAL (allocsize); > > > > Since the size coming in is a poly_int64, I think we should treat > > the result in the same way, using rtx_to_poly_int64 rather than > > requiring a CONST_INT. We might not handle that case properly yet, > > but rtx_to_poly_int64 would trap mistakes, and using it now would > > avoid fewer surprises later. > > > > Thanks, > > Richard > > > > > + p->slot = assign_stack_local_1 (mode, > > > + size, > > > + BIGGEST_ALIGNMENT, 0); > > > + rtx addr = align_dynamic_address (XEXP (p->slot, 0), align); > > > + mark_reg_pointer (addr, align); > > > + p->slot = gen_rtx_MEM (GET_MODE (p->slot), addr); > > > + MEM_NOTRAP_P (p->slot) = 1; > > > + } > > > + else > > > + /* We are passing an explicit alignment request to assign_stack_local. > > > + One side effect of that is assign_stack_local will not round SIZE > > > + to ensure the frame offset remains suitably aligned. > > > + > > > + So for requests which depended on the rounding of SIZE, we go ahead > > > + and round it now. We also make sure ALIGNMENT is at least > > > + BIGGEST_ALIGNMENT. */ > > > + p->slot = assign_stack_local_1 (mode, > > > + (mode == BLKmode > > > + ? 
aligned_upper_bound (size, > > > + (int) align > > > + / BITS_PER_UNIT) > > > + : size), > > > + align, 0); > > > > > > p->align = align; > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > index c9351802191..0b0ac0b9b97 100644 > > > --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c > > > @@ -21,10 +21,18 @@ test_pass_by_ref (int x0, overaligned x1, int x2) > > > abort (); > > > if (memcmp ((void *) &x1, (void *)&a, sizeof (overaligned))) > > > abort (); > > > - long addr = ((long) &x1) & 31; > > > + > > > + char buf[64]; > > > + void *ptr; > > > + > > > + __builtin_snprintf (buf, sizeof(buf), "%p", &x1); > > > + if (__builtin_sscanf(buf, "%p", &ptr) != 1) > > > + abort (); > > > + > > > + long addr = ((long) ptr) & 31; > > > if (addr != 0) > > > { > > > - __builtin_printf ("Alignment was %d\n", addr); > > > + __builtin_printf ("Alignment was %ld\n", addr); > > > abort (); > > > } > > > }
Sorry for the slow response. Alex Coplan <alex.coplan@arm.com> writes: > On 20/12/2021 13:19, Richard Sandiford wrote: >> Alex Coplan via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > Hi, >> > >> > This fixes PR103500 i.e. ensuring that stack slots for >> > passed-by-reference overaligned types are appropriately aligned. For the >> > testcase: >> > >> > typedef struct __attribute__((aligned(32))) { >> > long x,y; >> > } S; >> > S x; >> > void f(S); >> > void g(void) { f(x); } >> > >> > on AArch64, we currently generate (at -O2): >> > >> > g: >> > adrp x1, .LANCHOR0 >> > add x1, x1, :lo12:.LANCHOR0 >> > stp x29, x30, [sp, -48]! >> > mov x29, sp >> > ldp q0, q1, [x1] >> > add x0, sp, 16 >> > stp q0, q1, [sp, 16] >> > bl f >> > ldp x29, x30, [sp], 48 >> > ret >> > >> > so the stack slot for the passed-by-reference copy of the structure is >> > at sp + 16, and the sp is only guaranteed to be 16-byte aligned, so the >> > structure is only 16-byte aligned. The PCS requires the structure to be >> > 32-byte aligned. After this patch, we generate: >> > >> > g: >> > adrp x1, .LANCHOR0 >> > add x1, x1, :lo12:.LANCHOR0 >> > stp x29, x30, [sp, -64]! >> > mov x29, sp >> > add x0, sp, 47 >> > ldp q0, q1, [x1] >> > and x0, x0, -32 >> > stp q0, q1, [x0] >> > bl f >> > ldp x29, x30, [sp], 64 >> > ret >> > >> > i.e. we ensure 32-byte alignment for the struct. >> > >> > The approach taken here is similar to that in >> > function.c:assign_parm_setup_block where it handles the case for >> > DECL_ALIGN (parm) > MAX_SUPPORTED_STACK_ALIGNMENT. This in turn is >> > similar to the approach taken in cfgexpand.c:expand_stack_vars (where >> > the function calls get_dynamic_stack_size) which is the code that >> > handles the alignment for overaligned structures as addressable local >> > variables (see the related case discussed in the PR). 
>> >> A difference with the latter is that cfgexpand (AFAICT) always >> honours the DECL/TYPE_ALIGN, with LOCAL_DECL_ALIGNMENT supposedly >> only increasing the alignment for efficiency reasons (rather than >> decreasing it). >> >> So… >> >> > This patch also updates the aapcs64 test mentioned in the PR to avoid >> > the frontend folding away the alignment check. I've confirmed that the >> > execution test actually fails on aarch64-linux-gnu prior to the patch >> > being applied and passes afterwards. >> > >> > Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and >> > arm-linux-gnueabihf: no regressions. >> > >> > I'd appreciate any feedback. Is it OK for trunk? >> > >> > Thanks, >> > Alex >> > >> > gcc/ChangeLog: >> > >> > PR middle-end/103500 >> > * function.c (get_stack_local_alignment): Align BLKmode overaligned >> > types to the alignment required by the type. >> > (assign_stack_temp_for_type): Handle BLKmode overaligned stack >> > slots by allocating a larger-than-necessary buffer and aligning >> > the address within appropriately. >> > >> > gcc/testsuite/ChangeLog: >> > >> > PR middle-end/103500 >> > * gcc.target/aarch64/aapcs64/rec_align-8.c (test_pass_by_ref): >> > Prevent the frontend from folding our alignment check away by >> > using snprintf to store the pointer into a string and recovering >> > it with sscanf. >> > >> > diff --git a/gcc/function.c b/gcc/function.c >> > index 61b3bd036b8..5ed722ab959 100644 >> > --- a/gcc/function.c >> > +++ b/gcc/function.c >> > @@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode) >> > unsigned int alignment; >> > >> > if (mode == BLKmode) >> > - alignment = BIGGEST_ALIGNMENT; >> > + alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT) >> > + ? TYPE_ALIGN (type) >> > + : BIGGEST_ALIGNMENT; >> >> …I'm not sure about this calculation. 
Why do we only honour TYPE_ALIGN >> if it's greater than MAX_SUPPORTED_STACK_ALIGNMENT, and fall all the >> way back to BIGGEST_ALIGNMENT otherwise? It looks like on nvptx >> this would have the effect of honouring (say) 2048-byte alignment, >> but not 32-byte alignment (which falls between BIGGEST_ALIGNMENT >> and MAX_SUPPORTED_STACK_ALIGNMENT). > > So, to be honest, this was a bit of a bodge to try and work around > issues on x86. My original attempt at solving the PR used the more > obvious calculation you suggest below, i.e. the max of BIGGEST_ALIGNMENT > and TYPE_ALIGN (type). The problem with that is, on x86, > MAX_SUPPORTED_STACK_ALIGNMENT has a huge value (2^31). > explow.c:get_dynamic_stack_size has: > > if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0) > { > size = round_push (size); > [...] > } > *psize = size; > > so inevitably we end up calling round_push on x86 which in turn ends up > going down the else branch: > > else > { > /* If crtl->preferred_stack_boundary might still grow, use > virtual_preferred_stack_boundary_rtx instead. This will be > substituted by the right value in vregs pass and optimized > during combine. */ > align_rtx = virtual_preferred_stack_boundary_rtx; > alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), > NULL_RTX); > } > > and we end up returning a size expression involving > virtual_preferred_stack_boundary_rtx, which clearly cannot be a > CONST_INT (required for the approach in the patch to work: we need a > constant to pass to assign_stack_local_1). > > So the idea with that calculation was to only use the new, larger > alignment, if it was larger than MAX_SUPPORTED_STACK_ALIGNMENT (which > should never be the case on x86 for any reasonable value of TYPE_ALIGN). > Otherwise, it would fall back to the old behaviour of just using > BIGGEST_ALIGNMENT. > > I realise that this is a rather inelegant hack, but I wasn't sure how > else to approach the problem. 
Yeah, I think this shows that get_dynamic_stack_size probably isn't
the right interface to use here.  That's designed for cases in which
we're doing an independent alloca(), but here we want to do a normal
static allocation with an increased size.

>> Also, what about the non-BLKmode case?  A type's mode cannot be more
>> aligned than the type, but a type is allowed to be more aligned than
>> its mode.  E.g.:
>
> Yes, we should probably try and handle the non-BLKmode case as well. I
> was just trying to avoid changing too much at once at the risk of
> introducing regressions.

I think it'd be good to tackle both cases at the same time.

>> typedef unsigned int V __attribute__((vector_size(32)));
>> typedef struct __attribute__((aligned(32))) { V x; } S;
>> S x;
>> void f(S);
>> void g(void) { f(x); }
>>
>> doesn't seem to be handled by the patch.  It's possible to create
>> variations on this on aarch64 up to 2048 bits using SVE and
>> -msve-vector-bits=N.
>>
>> Perhaps we should treat the old alignment calculation as a minimum
>> and take the max of that and TYPE_ALIGN, where TYPE_ALIGN is given.
>
> As discussed above, this doesn't work on x86, unfortunately. If you have
> any alternative suggestions as to how to approach the problem, it would
> be great to hear them.

How about instead:

(1) Define a new ASLK_* flag for assign_stack_local_1.

(2) When the flag is set, make:

      if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
        {
          alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
          alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
        }

    increase the size by (new_align - old_align).

(3) When the flag is set, and the new alignment is smaller than the
    original alignment, call align_dynamic_address before creating
    the MEM, as in your original patch.

There are probably other details, but it looks like that should cope
with the x86 dynamic stack realignment scheme even with the max-based
alignment calculation discussed above.

Thanks,
Richard
Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> How about instead:
>
> (1) Define a new ASLK_* flag for assign_stack_local_1.
>
> (2) When the flag is set, make:
>
>       if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>         {
>           alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>           alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>         }
>
>     increase the size by (new_align - old_align).

Er, I meant old_align - new_align here :-)

> (3) When the flag is set, and the new alignment is smaller than the
>     original alignment, call align_dynamic_address before creating
>     the MEM, as in your original patch.
>
> There are probably other details, but it looks like that should cope
> with the x86 dynamic stack realignment scheme even with the max-based
> alignment calculation discussed above.
>
> Thanks,
> Richard
diff --git a/gcc/function.c b/gcc/function.c
index 61b3bd036b8..5ed722ab959 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -278,7 +278,9 @@ get_stack_local_alignment (tree type, machine_mode mode)
   unsigned int alignment;
 
   if (mode == BLKmode)
-    alignment = BIGGEST_ALIGNMENT;
+    alignment = (type && TYPE_ALIGN (type) > MAX_SUPPORTED_STACK_ALIGNMENT)
+      ? TYPE_ALIGN (type)
+      : BIGGEST_ALIGNMENT;
   else
     alignment = GET_MODE_ALIGNMENT (mode);
 
@@ -872,21 +874,35 @@ assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type)
 
   p = ggc_alloc<temp_slot> ();
 
-  /* We are passing an explicit alignment request to assign_stack_local.
-     One side effect of that is assign_stack_local will not round SIZE
-     to ensure the frame offset remains suitably aligned.
-
-     So for requests which depended on the rounding of SIZE, we go ahead
-     and round it now.  We also make sure ALIGNMENT is at least
-     BIGGEST_ALIGNMENT.  */
-  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
-  p->slot = assign_stack_local_1 (mode,
-                                  (mode == BLKmode
-                                   ? aligned_upper_bound (size,
-                                                          (int) align
-                                                          / BITS_PER_UNIT)
-                                   : size),
-                                  align, 0);
+  if (mode == BLKmode && align > MAX_SUPPORTED_STACK_ALIGNMENT)
+    {
+      rtx allocsize = gen_int_mode (size, Pmode);
+      get_dynamic_stack_size (&allocsize, 0, align, NULL);
+      gcc_assert (CONST_INT_P (allocsize));
+      size = UINTVAL (allocsize);
+      p->slot = assign_stack_local_1 (mode,
+                                      size,
+                                      BIGGEST_ALIGNMENT, 0);
+      rtx addr = align_dynamic_address (XEXP (p->slot, 0), align);
+      mark_reg_pointer (addr, align);
+      p->slot = gen_rtx_MEM (GET_MODE (p->slot), addr);
+      MEM_NOTRAP_P (p->slot) = 1;
+    }
+  else
+    /* We are passing an explicit alignment request to assign_stack_local.
+       One side effect of that is assign_stack_local will not round SIZE
+       to ensure the frame offset remains suitably aligned.
+
+       So for requests which depended on the rounding of SIZE, we go ahead
+       and round it now.  We also make sure ALIGNMENT is at least
+       BIGGEST_ALIGNMENT.  */
+    p->slot = assign_stack_local_1 (mode,
+                                    (mode == BLKmode
+                                     ? aligned_upper_bound (size,
+                                                            (int) align
+                                                            / BITS_PER_UNIT)
+                                     : size),
+                                    align, 0);
 
   p->align = align;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
index c9351802191..0b0ac0b9b97 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c
@@ -21,10 +21,18 @@ test_pass_by_ref (int x0, overaligned x1, int x2)
     abort ();
   if (memcmp ((void *) &x1, (void *)&a, sizeof (overaligned)))
     abort ();
-  long addr = ((long) &x1) & 31;
+
+  char buf[64];
+  void *ptr;
+
+  __builtin_snprintf (buf, sizeof(buf), "%p", &x1);
+  if (__builtin_sscanf(buf, "%p", &ptr) != 1)
+    abort ();
+
+  long addr = ((long) ptr) & 31;
   if (addr != 0)
     {
-      __builtin_printf ("Alignment was %d\n", addr);
+      __builtin_printf ("Alignment was %ld\n", addr);
       abort ();
     }
 }