| Message ID | 20260506092124.2542192-1-stefansf@linux.ibm.com |
|---|---|
| State | New |
| Headers | From: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>; To: Vladimir Makarov <vmakarov@redhat.com>, gcc-patches@gcc.gnu.org; Cc: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>; Subject: [PATCH] lra: Reloading section anchors; Date: Wed, 6 May 2026 11:21:24 +0200; Message-ID: <20260506092124.2542192-1-stefansf@linux.ibm.com>; X-Mailer: git-send-email 2.53.0 |
| Series | lra: Reloading section anchors |
Commit Message
Stefan Schulze Frielinghaus
May 6, 2026, 9:21 a.m. UTC
From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
Currently an "entire" address is reloaded even in cases where section
anchors are involved. This makes it harder to share section anchors
which is the whole point of them. For example, in cases where
offsetable MEMs are ok do not reload .LANCHOR42+offset but only
.LANCHOR42 and replace the address with the resulting reload register
and the offset. As a consequence subsequent passes only have to deal
with register equivalences in order to share section anchors. For
example

  double x;
  double y;

  double foo ()
  {
    return x + y;
  }

With this patch, after LRA we have

  20: %r1:DI=`*.LANCHOR0'
  17: %f0:DF=[%r1:DI]
  19: %r1:DI=`*.LANCHOR0'
  12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}

and after postreload

  20: %r1:DI=`*.LANCHOR0'
  17: %f0:DF=[%r1:DI]
  12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
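
As a source-level analogy for the dumps above, the following minimal sketch materializes one base address and reaches both operands through offsets from it, which is what sharing a section anchor amounts to. The names `anchor_block`, `load_double` and `foo_via_anchor` are hypothetical and exist only for this illustration; real section anchors are created by the compiler, not written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the data covered by one anchor: x at
   offset 0 and y right after it, as in the example above.  */
struct anchor_block
{
  double x;
  double y;
};

static struct anchor_block block = { 1.5, 2.25 };

/* Load a double from BASE + OFFSET, i.e. base register plus
   displacement.  memcpy keeps the access well defined.  */
static double
load_double (const unsigned char *base, size_t offset)
{
  double value;
  assert (offset + sizeof value <= sizeof block);
  memcpy (&value, base + offset, sizeof value);
  return value;
}

/* Analogue of foo (): the "anchor" address is computed once and both
   operands are addressed relative to it.  */
double
foo_via_anchor (void)
{
  const unsigned char *anchor = (const unsigned char *) &block;
  return load_double (anchor, offsetof (struct anchor_block, x))
         + load_double (anchor, offsetof (struct anchor_block, y));
}
```

The single `anchor` variable plays the role of %r1 after postreload: one address load, two uses with different displacements.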

Of course, this was a lucky case since both reloads referred to the very
same register, which allows for trivial removal of insn 19. At least in
cases like the one demonstrated by the new test section-anchors-4.c, we
are guaranteed to re-use the reload within a single insn.

Before testing this patch for multiple targets, I'm wondering whether
there is even a way to re-use reloads during LRA across insns (like an
equiv) such that we wouldn't depend on subsequent passes?
---
gcc/lra-constraints.cc | 30 +++++++++++++++++++
.../gcc.target/s390/section-anchors-4.c | 25 ++++++++++++++++
2 files changed, 55 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
Comments
On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
>
> Currently an "entire" address is reloaded even in cases where section
> anchors are involved. This makes it harder to share section anchors
> which is the whole point of them. For example, in cases where
> offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> .LANCHOR42 and replace the address with the resulting reload register
> and the offset. As a consequence subsequent passes only have to deal
> with register equivalences in order to share section anchors. For
> example

I thought about how to fix this in another place, as LRA is already too
complicated. It could be fixed in some machine-dependent pass or in
split1. Adding a pass is overkill, and a fix in split1 would be ok only
if the target worked with memory via a few insns (e.g. only via
load/store insns). So LRA is probably the best place to fix this problem.

> double x;
> double y;
>
> double foo ()
> {
>   return x + y;
> }
>
> With this patch, after LRA we have
>
>   20: %r1:DI=`*.LANCHOR0'
>   17: %f0:DF=[%r1:DI]
>   19: %r1:DI=`*.LANCHOR0'
>   12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
>
> and after postreload
>
>   20: %r1:DI=`*.LANCHOR0'
>   17: %f0:DF=[%r1:DI]
>   12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
>
> Of course, this was a lucky case since both reloads referred to the very
> same register which allows for trivial removal of insn 19. At least in
> cases like demonstrated by the new test section-anchors-4.c we are
> guaranteed to re-use the reload for a single insn.
>
> Before testing this patch for multiple targets, I'm wondering whether
> there is even a way to re-use reloads during LRA across insns (like an
> equiv) such that we wouldn't depend on subsequent passes?

LRA reuses the reload pseudo generated for one insn (please see the usage
of curr_insn_input_reloads). The reason the reload pseudo is not reused
in the above example is that the reloads of the anchor come from
different insns. First LRA reloads [*.LANCHOR0] (insn 17, generated to
satisfy the reg constraint in insn 12), then the anchor (from insn 17),
and then *.LANCHOR0 from the 2nd operand of insn 12. But I would not
worry that the reload pseudos get different hard regs. Knowing how
assigning hard regs works in LRA, I see a very small probability that the
pseudos get different regs.

BTW I could not reproduce the testcase situation w/o -march options (the
anchor in this case is already in a pseudo before RA). But -march=z13
reproduces it. So you probably need to add this option to the test.

So the patch (with -march=z13, or another option reproducing the
situation, as an additional option for the test) is ok for me. Thank you.
On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > Before testing this patch for multiple targets, I'm wondering whether
> > there is even a way to re-use reloads during LRA across insns (like an
> > equiv) such that we wouldn't depend on subsequent passes?

Meanwhile successfully bootstrapped and regtested on

- aarch64-unknown-linux-gnu
- s390x-ibm-linux-gnu
- x86_64-pc-linux-gnu

For powerpc64le-unknown-linux-gnu there is a new test failure:

FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in extract_constrain_insn, at recog.cc:2795)

Previously the whole address was reloaded, which means we ended up with:

  18: %2:DI=const(`*.LANCHOR0'+0x4)
   7: %3:SI=bswap([%2:DI])

and with this patch we bail out during LRA for

  18: %2:DI=`*.LANCHOR0'
   7: %3:SI=bswap([%2:DI+0x4])

with

pr94740.c:11:1: error: insn does not satisfy its constraints:
   11 | }
      | ^
(insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
        (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
                    (const_int 4 [0x4])) [1 array[1]+0 S4 A32]))) "pr94740.c":10:10 143 {bswapsi2_load}
     (nil))

The insn definition is:

(define_insn "bswap<mode>2_load"
  [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
        (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
  ""
  "l<wd>brx %0,%y1"
  [(set_attr "type" "load")])

and the important part of the constraint Z is

(define_special_predicate "indexed_or_indirect_address"
  (and (match_test "REG_P (op)
                    || (GET_CODE (op) == PLUS
                        /* Omit testing REG_P (XEXP (op, 0)). */
                        && REG_P (XEXP (op, 1)))")
       (match_operand 0 "address_operand")))

which doesn't accept offsettable addresses. At this point I'm not
entirely sure what the contract between targets and LRA is. My reading
so far was that due to goal_alt_offmemok[i] it is safe to use
offsettable addresses.

> LRA reuses the reload pseudo generated for one insn (please see the usage
> of curr_insn_input_reloads). The reason the reload pseudo is not reused
> in the above example is that the reloads of the anchor come from
> different insns. First LRA reloads [*.LANCHOR0] (insn 17, generated to
> satisfy the reg constraint in insn 12), then the anchor (from insn 17),
> and then *.LANCHOR0 from the 2nd operand of insn 12. But I would not
> worry that the reload pseudos get different hard regs. Knowing how
> assigning hard regs works in LRA, I see a very small probability that the
> pseudos get different regs.
>
> BTW I could not reproduce the testcase situation w/o -march options (the
> anchor in this case is already in a pseudo before RA). But -march=z13
> reproduces it. So you probably need to add this option to the test.

Ahh right, I tested this patch on top of a private one where the test
case is successful for many archs. Currently it is failing because
the last alternative of insn cmpdi_cct is rejected for vanilla and
accepted for my private patch. It will take me some time to polish up
the private patch. Thus, I think we have two options here: proceed
with this patch without a test case, or wait until the other patch is
ready. I'm fine either way.

I was afraid of a flaky test and deliberately chose this test since it
guarantees that the section anchor ends up in the very same reload
register. Maybe I should be more brave here ;-)

Thanks,
Stefan

> So the patch (with -march=z13, or another option reproducing the
> situation, as an additional option for the test) is ok for me. Thank you.
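
The reuse of a reload pseudo within one insn, which the reviewer attributes to curr_insn_input_reloads and which the test relies on, can be pictured with the toy cache below. This is only a sketch of the idea under simplifying assumptions: values are plain strings, and all `toy_*` names are hypothetical and are not LRA's data structures or functions.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_INPUT_RELOADS 8

/* Toy record of one input reload: the reloaded value and the pseudo
   register number chosen for it.  */
struct toy_input_reload
{
  const char *value;
  int regno;
};

static struct toy_input_reload toy_reloads[TOY_MAX_INPUT_RELOADS];
static int toy_reloads_num;
static int toy_next_regno = 100;

/* Called when a new insn is processed: the cache lives per insn.  */
void
toy_start_insn (void)
{
  toy_reloads_num = 0;
}

/* Return the reload pseudo for VALUE.  *CREATED is set to 1 if a new
   pseudo was made (a move would then have to be emitted), or to 0 if
   an earlier reload of the same value in the current insn is reused.  */
int
toy_get_reload_reg (const char *value, int *created)
{
  for (int i = 0; i < toy_reloads_num; i++)
    if (strcmp (toy_reloads[i].value, value) == 0)
      {
        *created = 0;
        return toy_reloads[i].regno;
      }

  assert (toy_reloads_num < TOY_MAX_INPUT_RELOADS);
  toy_reloads[toy_reloads_num].value = value;
  toy_reloads[toy_reloads_num].regno = toy_next_regno++;
  *created = 1;
  return toy_reloads[toy_reloads_num++].regno;
}
```

In this model two operands of the same insn that both need `*.LANCHOR0` get the same pseudo, while the same request from a later insn creates a fresh one, which corresponds to the insn 17 / insn 12 situation discussed in the thread.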
On Mon, May 11, 2026 at 09:40:15PM +0200, Stefan Schulze Frielinghaus wrote:
> and the important part of the constraint Z is
>
> (define_special_predicate "indexed_or_indirect_address"
>   (and (match_test "REG_P (op)
>                     || (GET_CODE (op) == PLUS
>                         /* Omit testing REG_P (XEXP (op, 0)). */
>                         && REG_P (XEXP (op, 1)))")
>        (match_operand 0 "address_operand")))
>
> which doesn't accept offsettable addresses. At this point I'm not
> entirely sure what the contract between targets and LRA is. My reading
> so far was that due to goal_alt_offmemok[i] it is safe to use
> offsettable addresses.

goal_alt_offmemok[i] is derived from offmemok, which is computed in
process_alt_operands(), e.g. here:

    case CT_MEMORY:
    case CT_RELAXED_MEMORY:
      if (MEM_P (op)
          && satisfies_memory_constraint_p (op, cn))
        win = true;
      else if (spilled_pseudo_p (op))
        win = true;

      /* If we didn't already win, we can reload constants
         via force_const_mem or put the pseudo value into
         memory, or make other memory by reloading the
         address like for 'o'. */
      if (CONST_POOL_OK_P (mode, op)
          || MEM_P (op) || REG_P (op)
          /* We can restore the equiv insn by a
             reload. */
          || equiv_substition_p[nop])
        badop = false;
      constmemok = true;
      offmemok = true;
      break;

Here offmemok is set without explicitly checking whether the constraint
accepts offsettable memory or not. So far this was no problem because
inside of

    if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))

every resulting address was a single register.

I'm a bit puzzled. Either goal_alt_offmemok[i] doesn't mean that
offsettable addresses are ok, or another check is required before
setting offmemok to true. Any thoughts?

Cheers,
Stefan
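
To make the mismatch concrete, here is a small sketch that models the match_test part of the quoted predicate and one possible kind of guard: use the reg+offset form only if the operand's address predicate accepts it, otherwise fall back to reloading the whole address. The guard is purely an illustrative assumption and not something proposed in the thread; all `toy_*` names are hypothetical, and the `address_operand` half of the real predicate is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy address forms; real RTL has many more codes.  */
enum toy_addr_code { TOY_ADDR_REG, TOY_ADDR_CONST_INT, TOY_ADDR_PLUS };

struct toy_addr
{
  enum toy_addr_code code;
  const struct toy_addr *op0;   /* Operands, used only for PLUS.  */
  const struct toy_addr *op1;
};

/* Mirrors the quoted match_test: a bare register, or a PLUS whose
   second operand is a register.  */
bool
toy_indexed_or_indirect_address (const struct toy_addr *op)
{
  assert (op != NULL);
  return op->code == TOY_ADDR_REG
         || (op->code == TOY_ADDR_PLUS && op->op1->code == TOY_ADDR_REG);
}

enum toy_reload_kind
{
  TOY_RELOAD_ANCHOR_ONLY,       /* reg = anchor; address is reg + offset.  */
  TOY_RELOAD_WHOLE_ADDRESS      /* reg = anchor + offset; address is reg.  */
};

/* Hypothetical guard: pick the split form only if ACCEPTS, the address
   predicate of the operand's constraint, takes SPLIT_ADDR.  */
enum toy_reload_kind
toy_choose_reload (const struct toy_addr *split_addr,
                   bool (*accepts) (const struct toy_addr *))
{
  assert (accepts != NULL);
  return accepts (split_addr) ? TOY_RELOAD_ANCHOR_ONLY
                              : TOY_RELOAD_WHOLE_ADDRESS;
}
```

With this model, an address of the form reg + 4 is rejected by the predicate, so the guard would keep the previous behaviour of reloading the whole address for the bswap insn, while reg and reg + reg are accepted.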
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index ccd68efc956..6779dfee020 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
             new_reg = emit_inc (rclass, *loc,
                                 /* This value does not matter for MODIFY. */
                                 GET_MODE_SIZE (GET_MODE (op)));
+          /* Try to pull out section anchors. For example, instead of
+             reloading an "entire" address like .LANCHOR42+offset only reload
+             .LANCHOR42 and use the new reload register as the base register.
+             This allows following optimizations to share section anchors and
+             remove redundant loads. */
+          else if (GET_CODE (*loc) == CONST
+                   && GET_CODE (XEXP (*loc, 0)) == PLUS
+                   && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
+                   && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
+                   && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
+                   /* Some offsets are valid in conjunction with a symbol and
+                      invalid in conjunction with a register. Thus, pull out
+                      the anchor only in case the offset is a valid anchor
+                      offset. */
+                   && INTVAL (XEXP (XEXP (*loc, 0), 1))
+                      >= targetm.min_anchor_offset
+                   && INTVAL (XEXP (XEXP (*loc, 0), 1))
+                      <= targetm.max_anchor_offset)
+            {
+              rtx anchor = XEXP (XEXP (*loc, 0), 0);
+              rtx offset = XEXP (XEXP (*loc, 0), 1);
+
+              if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
+                                  NULL, false, false,
+                                  "offsetable address", &new_reg))
+                lra_emit_move (new_reg, anchor);
+
+              new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
+              lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE (op)));
+            }
           else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
                                    NULL, false, false,
                                    "offsetable address", &new_reg))
diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
new file mode 100644
index 00000000000..0b4cd081c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
+/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
+/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)" "ira" } } */
+/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" "reload" } } */
+
+/* Ensure that we get the same reload register for the second memory operand.
+   Prior LRA we have
+
+   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
+
+   and after LRA
+
+   59: %r1:DI=`*.LANCHOR0'
+   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8]) */
+
+long x, y;
+
+long
+foo (void)
+{
+  if (x == y)
+    return 42;
+  return 0;
+}
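
For readers less used to RTL accessors, the shape test in the new else-if branch can be restated on a toy tree. The sketch below is a standalone model, not GCC code: `toy_rtx`, its fields and `toy_split_anchor_address` are hypothetical names, and the `min_off`/`max_off` parameters stand in for targetm.min_anchor_offset and targetm.max_anchor_offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_CONST, TOY_PLUS, TOY_SYMBOL_REF, TOY_CONST_INT };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;
  const struct toy_rtx *op1;
  long value;           /* Integer value for TOY_CONST_INT.  */
  bool anchor_p;        /* Set for section anchor symbols.  */
};

/* If ADDR has the shape (const (plus (symbol_ref anchor) (const_int N)))
   with MIN_OFF <= N <= MAX_OFF, store the anchor and N and return true.
   Otherwise leave the outputs untouched and return false.  */
bool
toy_split_anchor_address (const struct toy_rtx *addr, long min_off,
                          long max_off, const struct toy_rtx **anchor,
                          long *offset)
{
  assert (addr != NULL && anchor != NULL && offset != NULL);
  if (addr->code != TOY_CONST || addr->op0->code != TOY_PLUS)
    return false;

  const struct toy_rtx *sym = addr->op0->op0;
  const struct toy_rtx *off = addr->op0->op1;
  if (sym->code != TOY_SYMBOL_REF || !sym->anchor_p
      || off->code != TOY_CONST_INT
      || off->value < min_off || off->value > max_off)
    return false;

  *anchor = sym;
  *offset = off->value;
  return true;
}
```

When the function returns true, the caller would reload only `*anchor` into a register and form the new address as that register plus `*offset`; in every other case the existing path that reloads the whole address applies.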