lra: Reloading section anchors

Message ID 20260506092124.2542192-1-stefansf@linux.ibm.com
State New
Headers
Series lra: Reloading section anchors |

Commit Message

Stefan Schulze Frielinghaus May 6, 2026, 9:21 a.m. UTC
  From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>

Currently an "entire" address is reloaded even in cases where section
anchors are involved.  This makes it harder to share section anchors
which is the whole point of them.  For example, in cases where
offsetable MEMs are ok do not reload .LANCHOR42+offset but only
.LANCHOR42 and replace the address with the resulting reload register
and the offset.  As a consequence subsequent passes only have to deal
with register equivalences in order to share section anchors.  For
example

double x;
double y;

double foo ()
{
  return x + y;
}

With this patch, after LRA we have

   20: %r1:DI=`*.LANCHOR0'
   17: %f0:DF=[%r1:DI]
   19: %r1:DI=`*.LANCHOR0'
   12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}

and after postreload

   20: %r1:DI=`*.LANCHOR0'
   17: %f0:DF=[%r1:DI]
   12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}

Of course, this was a lucky case since both reloads referred to the very
same register which allows for trivial removal of insn 19.  At least in
cases like demonstrated by the new test section-anchors-4.c we are
guaranteed to re-use the reload for a single insn.

Before testing this patch for multiple targets, I'm wondering whether
there is even a way to re-use reloads during LRA across insns (like an
equiv) such that we wouldn't depend on subsequent passes?
---
 gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
 .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
  

Comments

Vladimir Makarov May 8, 2026, 12:23 p.m. UTC | #1
On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
>
> Currently an "entire" address is reloaded even in cases where section
> anchors are involved.  This makes it harder to share section anchors
> which is the whole point of them.  For example, in cases where
> offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> .LANCHOR42 and replace the address with the resulting reload register
> and the offset.  As a consequence subsequent passes only have to deal
> with register equivalences in order to share section anchors.  For
> example


I thought how to fix this in another place as LRA is already too 
complicated.  It could be fixed in some machined-dependent pass or in 
split1.  Adding the pass is overkill and fix in split1 would be ok if 
the target would work with memory via few insns (e.g. only via 
load/store insns).  So probably LRA is the best place to fix this problem.


> double x;
> double y;
>
> double foo ()
> {
>    return x + y;
> }
>
> With this patch, after LRA we have
>
>     20: %r1:DI=`*.LANCHOR0'
>     17: %f0:DF=[%r1:DI]
>     19: %r1:DI=`*.LANCHOR0'
>     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
>
> and after postreload
>
>     20: %r1:DI=`*.LANCHOR0'
>     17: %f0:DF=[%r1:DI]
>     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
>
> Of course, this was a lucky case since both reloads referred to the very
> same register which allows for trivial removal of insn 19.  At least in
> cases like demonstrated by the new test section-anchors-4.c we are
> guaranteed to re-use the reload for a single insn.
>
> Before testing this patch for multiple targets, I'm wondering whether
> there is even a way to re-use reloads during LRA across insns (like an
> equiv) such that we wouldn't depend on subsequent passes?

LRA reuses the reload pseudo generated for one insn (please see usage of 
curr_insn_input_reloads).  The problem in not reusing reload pseudo for 
the above example is that because reload for anchor occurs from 
different insns.  First LRA reloads [*.LANCHOR0] (insn 17 generated to 
satisfy reg constraint in insn 12), then the anchor (from insn 17 ), and 
then reload *.LANCHOR0 from 2nd op of insn 12.  But I'd not be worry 
that the reload pseudos get different hard regs.  Knowing how assigning 
hard regs works in LRA I see very small probability that the pseduos get 
different regs.

BTW I did not reproduce the testcase situation w/o -march options (the 
anchor in this case already in a pseudo before RA).  But -march=z13 
reproduces it.  So probably you need to add this option to the test.

So the patch (with -march=z13 or other one reproducing the situation as 
additional option for the test) is ok for me. Thank you.

> ---
>   gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
>   .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
>   2 files changed, 55 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index ccd68efc956..6779dfee020 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
>   	    new_reg = emit_inc (rclass, *loc,
>   				/* This value does not matter for MODIFY.  */
>   				GET_MODE_SIZE (GET_MODE (op)));
> +	  /* Try to pull out section anchors.  For example, instead of
> +	     reloading an "entire" address like .LANCHOR42+offset only reload
> +	     .LANCHOR42 and use the new reload register as the base register.
> +	     This allows following optimizations to share section anchors and
> +	     remove redundant loads.  */
> +	  else if (GET_CODE (*loc) == CONST
> +		   && GET_CODE (XEXP (*loc, 0)) == PLUS
> +		   && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> +		   && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> +		   && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> +		   /* Some offsets are valid in conjunction with a symbol and
> +		      invalid in conjunction with a register.  Thus, pull out
> +		      the anchor only in case the offset is a valid anchor
> +		      offset.  */
> +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> +		      >= targetm.min_anchor_offset
> +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> +		      <= targetm.max_anchor_offset)
> +	    {
> +	       rtx anchor = XEXP (XEXP (*loc, 0), 0);
> +	       rtx offset = XEXP (XEXP (*loc, 0), 1);
> +
> +	       if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> +				   NULL, false, false,
> +				   "offsetable address", &new_reg))
> +		  lra_emit_move (new_reg, anchor);
> +
> +		new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> +		lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE (op)));
> +	    }
>   	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
>   				   NULL, false, false,
>   				   "offsetable address", &new_reg))
> diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> new file mode 100644
> index 00000000000..0b4cd081c61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)" "ira" } } */
> +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" "reload" } } */
> +
> +/* Ensure that we get the same reload register for the second memory operand.
> +   Prior LRA we have
> +
> +   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> +
> +   and after LRA
> +
> +   59: %r1:DI=`*.LANCHOR0'
> +   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
> +
> +long x, y;
> +
> +long
> +foo (void)
> +{
> +  if (x == y)
> +    return 42;
> +  return 0;
> +}
  
Stefan Schulze Frielinghaus May 11, 2026, 7:40 p.m. UTC | #2
On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> 
> On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
> > 
> > Currently an "entire" address is reloaded even in cases where section
> > anchors are involved.  This makes it harder to share section anchors
> > which is the whole point of them.  For example, in cases where
> > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > .LANCHOR42 and replace the address with the resulting reload register
> > and the offset.  As a consequence subsequent passes only have to deal
> > with register equivalences in order to share section anchors.  For
> > example
> 
> 
> I thought how to fix this in another place as LRA is already too
> complicated.  It could be fixed in some machined-dependent pass or in
> split1.  Adding the pass is overkill and fix in split1 would be ok if the
> target would work with memory via few insns (e.g. only via load/store
> insns).  So probably LRA is the best place to fix this problem.
> 
> 
> > double x;
> > double y;
> > 
> > double foo ()
> > {
> >    return x + y;
> > }
> > 
> > With this patch, after LRA we have
> > 
> >     20: %r1:DI=`*.LANCHOR0'
> >     17: %f0:DF=[%r1:DI]
> >     19: %r1:DI=`*.LANCHOR0'
> >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > 
> > and after postreload
> > 
> >     20: %r1:DI=`*.LANCHOR0'
> >     17: %f0:DF=[%r1:DI]
> >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > 
> > Of course, this was a lucky case since both reloads referred to the very
> > same register which allows for trivial removal of insn 19.  At least in
> > cases like demonstrated by the new test section-anchors-4.c we are
> > guaranteed to re-use the reload for a single insn.
> > 
> > Before testing this patch for multiple targets, I'm wondering whether
> > there is even a way to re-use reloads during LRA across insns (like an
> > equiv) such that we wouldn't depend on subsequent passes?

Meanwhile successfully bootstrapped and regtested on

- aarch64-unknown-linux-gnu
- s390x-ibm-linux-gnu
- x86_64-pc-linux-gnu

For powerpc64le-unknown-linux-gnu there is a new test failure:

FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in extract_constrain_insn, at recog.cc:2795)

Previously the whole address was reloaded which means we ended up with:

   18: %2:DI=const(`*.LANCHOR0'+0x4)
    7: %3:SI=bswap([%2:DI])

and with this patch we bail out during LRA for

   18: %2:DI=`*.LANCHOR0'
    7: %3:SI=bswap([%2:DI+0x4])

with

pr94740.c:11:1: error: insn does not satisfy its constraints:
   11 | }
      | ^
(insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
        (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
                    (const_int 4 [0x4])) [1 array[1]+0 S4 A32]))) "pr94740.c":10:10 143 {bswapsi2_load}
     (nil))

The insn definition is:

(define_insn "bswap<mode>2_load"
  [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
        (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
  ""
  "l<wd>brx %0,%y1"
  [(set_attr "type" "load")])

and the important part of the constraint Z is

(define_special_predicate "indexed_or_indirect_address"
  (and (match_test "REG_P (op)
                    || (GET_CODE (op) == PLUS
                        /* Omit testing REG_P (XEXP (op, 0)).  */
                        && REG_P (XEXP (op, 1)))")
       (match_operand 0 "address_operand")))

which doesn't accept offsettable addresses.  At this point I'm not
entirely sure what the contract between targets and LRA is.  My reading
so far was that due to goal_alt_offmemok[i] it is safe to use
offsettable addresses.

> 
> LRA reuses the reload pseudo generated for one insn (please see usage of
> curr_insn_input_reloads).  The problem in not reusing reload pseudo for the
> above example is that because reload for anchor occurs from different
> insns.  First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> constraint in insn 12), then the anchor (from insn 17 ), and then reload
> *.LANCHOR0 from 2nd op of insn 12.  But I'd not be worry that the reload
> pseudos get different hard regs.  Knowing how assigning hard regs works in
> LRA I see very small probability that the pseduos get different regs.
> 
> BTW I did not reproduce the testcase situation w/o -march options (the
> anchor in this case already in a pseudo before RA).  But -march=z13
> reproduces it.  So probably you need to add this option to the test.

Ahh right, I tested this patch on top of a private one where the test
case is successful for many archs.  Currently it is failing because
the last alternative of insn cmpdi_cct is rejected for vanilla and
accepted for my private patch.  It will take me some time to polish up
the private patch.  Thus, I think we have two options here.  Proceed
with this patch without a test case, or wait until the other patch is
ready.  I'm fine either way.

I was afraid of a flaky test and deliberately chose this test since it
guarantees that the section anchor ends up in the very same reload
register.  Maybe I should be more brave here ;-)

Thanks,
Stefan

> 
> So the patch (with -march=z13 or other one reproducing the situation as
> additional option for the test) is ok for me. Thank you.
> 
> > ---
> >   gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
> >   .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
> >   2 files changed, 55 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > 
> > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > index ccd68efc956..6779dfee020 100644
> > --- a/gcc/lra-constraints.cc
> > +++ b/gcc/lra-constraints.cc
> > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> >   	    new_reg = emit_inc (rclass, *loc,
> >   				/* This value does not matter for MODIFY.  */
> >   				GET_MODE_SIZE (GET_MODE (op)));
> > +	  /* Try to pull out section anchors.  For example, instead of
> > +	     reloading an "entire" address like .LANCHOR42+offset only reload
> > +	     .LANCHOR42 and use the new reload register as the base register.
> > +	     This allows following optimizations to share section anchors and
> > +	     remove redundant loads.  */
> > +	  else if (GET_CODE (*loc) == CONST
> > +		   && GET_CODE (XEXP (*loc, 0)) == PLUS
> > +		   && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > +		   && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > +		   && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > +		   /* Some offsets are valid in conjunction with a symbol and
> > +		      invalid in conjunction with a register.  Thus, pull out
> > +		      the anchor only in case the offset is a valid anchor
> > +		      offset.  */
> > +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > +		      >= targetm.min_anchor_offset
> > +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > +		      <= targetm.max_anchor_offset)
> > +	    {
> > +	       rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > +	       rtx offset = XEXP (XEXP (*loc, 0), 1);
> > +
> > +	       if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > +				   NULL, false, false,
> > +				   "offsetable address", &new_reg))
> > +		  lra_emit_move (new_reg, anchor);
> > +
> > +		new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > +		lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE (op)));
> > +	    }
> >   	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> >   				   NULL, false, false,
> >   				   "offsetable address", &new_reg))
> > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > new file mode 100644
> > index 00000000000..0b4cd081c61
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)" "ira" } } */
> > +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" "reload" } } */
> > +
> > +/* Ensure that we get the same reload register for the second memory operand.
> > +   Prior LRA we have
> > +
> > +   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > +
> > +   and after LRA
> > +
> > +   59: %r1:DI=`*.LANCHOR0'
> > +   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
> > +
> > +long x, y;
> > +
> > +long
> > +foo (void)
> > +{
> > +  if (x == y)
> > +    return 42;
> > +  return 0;
> > +}
>
  
Stefan Schulze Frielinghaus May 13, 2026, 3:39 p.m. UTC | #3
On Mon, May 11, 2026 at 09:40:15PM +0200, Stefan Schulze Frielinghaus wrote:
> On Fri, May 08, 2026 at 08:23:46AM -0400, Vladimir Makarov wrote:
> > 
> > On 5/6/26 5:21 AM, Stefan Schulze Frielinghaus wrote:
> > > From: Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
> > > 
> > > Currently an "entire" address is reloaded even in cases where section
> > > anchors are involved.  This makes it harder to share section anchors
> > > which is the whole point of them.  For example, in cases where
> > > offsetable MEMs are ok do not reload .LANCHOR42+offset but only
> > > .LANCHOR42 and replace the address with the resulting reload register
> > > and the offset.  As a consequence subsequent passes only have to deal
> > > with register equivalences in order to share section anchors.  For
> > > example
> > 
> > 
> > I thought how to fix this in another place as LRA is already too
> > complicated.  It could be fixed in some machined-dependent pass or in
> > split1.  Adding the pass is overkill and fix in split1 would be ok if the
> > target would work with memory via few insns (e.g. only via load/store
> > insns).  So probably LRA is the best place to fix this problem.
> > 
> > 
> > > double x;
> > > double y;
> > > 
> > > double foo ()
> > > {
> > >    return x + y;
> > > }
> > > 
> > > With this patch, after LRA we have
> > > 
> > >     20: %r1:DI=`*.LANCHOR0'
> > >     17: %f0:DF=[%r1:DI]
> > >     19: %r1:DI=`*.LANCHOR0'
> > >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > > 
> > > and after postreload
> > > 
> > >     20: %r1:DI=`*.LANCHOR0'
> > >     17: %f0:DF=[%r1:DI]
> > >     12: {%f0:DF=%f0:DF+[%r1:DI+0x8];clobber %cc:CC;}
> > > 
> > > Of course, this was a lucky case since both reloads referred to the very
> > > same register which allows for trivial removal of insn 19.  At least in
> > > cases like demonstrated by the new test section-anchors-4.c we are
> > > guaranteed to re-use the reload for a single insn.
> > > 
> > > Before testing this patch for multiple targets, I'm wondering whether
> > > there is even a way to re-use reloads during LRA across insns (like an
> > > equiv) such that we wouldn't depend on subsequent passes?
> 
> Meanwhile successfully bootstrapped and regtested on
> 
> - aarch64-unknown-linux-gnu
> - s390x-ibm-linux-gnu
> - x86_64-pc-linux-gnu
> 
> For powerpc64le-unknown-linux-gnu there is a new test failure:
> 
> FAIL: gcc.target/powerpc/pr94740.c (internal compiler error: in extract_constrain_insn, at recog.cc:2795)
> 
> Previously the whole address was reloaded which means we ended up with:
> 
>    18: %2:DI=const(`*.LANCHOR0'+0x4)
>     7: %3:SI=bswap([%2:DI])
> 
> and with this patch we bail out during LRA for
> 
>    18: %2:DI=`*.LANCHOR0'
>     7: %3:SI=bswap([%2:DI+0x4])
> 
> with
> 
> pr94740.c:11:1: error: insn does not satisfy its constraints:
>    11 | }
>       | ^
> (insn 7 18 13 2 (set (reg:SI 3 3 [orig:119 _3 ] [119])
>         (bswap:SI (mem/c:SI (plus:DI (reg:DI 2 2 [127])
>                     (const_int 4 [0x4])) [1 array[1]+0 S4 A32]))) "pr94740.c":10:10 143 {bswapsi2_load}
>      (nil))
> 
> The insn definition is:
> 
> (define_insn "bswap<mode>2_load"
>   [(set (match_operand:HSI 0 "gpc_reg_operand" "=r")
>         (bswap:HSI (match_operand:HSI 1 "memory_operand" "Z")))]
>   ""
>   "l<wd>brx %0,%y1"
>   [(set_attr "type" "load")])
> 
> and the important part of the constraint Z is
> 
> (define_special_predicate "indexed_or_indirect_address"
>   (and (match_test "REG_P (op)
>                     || (GET_CODE (op) == PLUS
>                         /* Omit testing REG_P (XEXP (op, 0)).  */
>                         && REG_P (XEXP (op, 1)))")
>        (match_operand 0 "address_operand")))
> 
> which doesn't accept offsettable addresses.  At this point I'm not
> entirely sure what the contract between targets and LRA is.  My reading
> so far was that due to goal_alt_offmemok[i] it is safe to use
> offsettable addresses.

goal_alt_offmemok[i] is derived from offmemok which is computed in
process_alt_operands() as e.g. here:

        case CT_MEMORY:
        case CT_RELAXED_MEMORY:
          if (MEM_P (op)
              && satisfies_memory_constraint_p (op, cn))
            win = true;
          else if (spilled_pseudo_p (op))
            win = true;

          /* If we didn't already win, we can reload constants
             via force_const_mem or put the pseudo value into
             memory, or make other memory by reloading the
             address like for 'o'.  */
          if (CONST_POOL_OK_P (mode, op)
              || MEM_P (op) || REG_P (op)
              /* We can restore the equiv insn by a
                 reload.  */
              || equiv_substition_p[nop])
            badop = false;
          constmemok = true;
          offmemok = true;
          break;

Here offmemok is set without explicitly checking whether the constraint
accepts offsettable memory or not.  So far this was no problem because
inside of

  if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))

every resulting address was a single register.  I'm a bit puzzled.
Either goal_alt_offmemok[i] doesn't mean that offsettable addresses are
ok, or another check is required before setting offmemok to true.  Any
thoughts?

Cheers,
Stefan


> 
> > 
> > LRA reuses the reload pseudo generated for one insn (please see usage of
> > curr_insn_input_reloads).  The problem in not reusing reload pseudo for the
> > above example is that because reload for anchor occurs from different
> > insns.  First LRA reloads [*.LANCHOR0] (insn 17 generated to satisfy reg
> > constraint in insn 12), then the anchor (from insn 17 ), and then reload
> > *.LANCHOR0 from 2nd op of insn 12.  But I'd not be worry that the reload
> > pseudos get different hard regs.  Knowing how assigning hard regs works in
> > LRA I see very small probability that the pseduos get different regs.
> > 
> > BTW I did not reproduce the testcase situation w/o -march options (the
> > anchor in this case already in a pseudo before RA).  But -march=z13
> > reproduces it.  So probably you need to add this option to the test.
> 
> Ahh right, I tested this patch on top of a private one where the test
> case is successful for many archs.  Currently it is failing because
> the last alternative of insn cmpdi_cct is rejected for vanilla and
> accepted for my private patch.  It will take me some time to polish up
> the private patch.  Thus, I think we have two options here.  Proceed
> with this patch without a test case, or wait until the other patch is
> ready.  I'm fine either way.
> 
> I was afraid of a flaky test and deliberately chose this test since it
> guarantees that the section anchor ends up in the very same reload
> register.  Maybe I should be more brave here ;-)
> 
> Thanks,
> Stefan
> 
> > 
> > So the patch (with -march=z13 or other one reproducing the situation as
> > additional option for the test) is ok for me. Thank you.
> > 
> > > ---
> > >   gcc/lra-constraints.cc                        | 30 +++++++++++++++++++
> > >   .../gcc.target/s390/section-anchors-4.c       | 25 ++++++++++++++++
> > >   2 files changed, 55 insertions(+)
> > >   create mode 100644 gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > 
> > > diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> > > index ccd68efc956..6779dfee020 100644
> > > --- a/gcc/lra-constraints.cc
> > > +++ b/gcc/lra-constraints.cc
> > > @@ -4839,6 +4839,36 @@ curr_insn_transform (bool check_only_p)
> > >   	    new_reg = emit_inc (rclass, *loc,
> > >   				/* This value does not matter for MODIFY.  */
> > >   				GET_MODE_SIZE (GET_MODE (op)));
> > > +	  /* Try to pull out section anchors.  For example, instead of
> > > +	     reloading an "entire" address like .LANCHOR42+offset only reload
> > > +	     .LANCHOR42 and use the new reload register as the base register.
> > > +	     This allows following optimizations to share section anchors and
> > > +	     remove redundant loads.  */
> > > +	  else if (GET_CODE (*loc) == CONST
> > > +		   && GET_CODE (XEXP (*loc, 0)) == PLUS
> > > +		   && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
> > > +		   && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
> > > +		   && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
> > > +		   /* Some offsets are valid in conjunction with a symbol and
> > > +		      invalid in conjunction with a register.  Thus, pull out
> > > +		      the anchor only in case the offset is a valid anchor
> > > +		      offset.  */
> > > +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > +		      >= targetm.min_anchor_offset
> > > +		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
> > > +		      <= targetm.max_anchor_offset)
> > > +	    {
> > > +	       rtx anchor = XEXP (XEXP (*loc, 0), 0);
> > > +	       rtx offset = XEXP (XEXP (*loc, 0), 1);
> > > +
> > > +	       if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
> > > +				   NULL, false, false,
> > > +				   "offsetable address", &new_reg))
> > > +		  lra_emit_move (new_reg, anchor);
> > > +
> > > +		new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
> > > +		lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE (op)));
> > > +	    }
> > >   	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
> > >   				   NULL, false, false,
> > >   				   "offsetable address", &new_reg))
> > > diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > new file mode 100644
> > > index 00000000000..0b4cd081c61
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
> > > +/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
> > > +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)" "ira" } } */
> > > +/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" "reload" } } */
> > > +
> > > +/* Ensure that we get the same reload register for the second memory operand.
> > > +   Prior LRA we have
> > > +
> > > +   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
> > > +
> > > +   and after LRA
> > > +
> > > +   59: %r1:DI=`*.LANCHOR0'
> > > +   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
> > > +
> > > +long x, y;
> > > +
> > > +long
> > > +foo (void)
> > > +{
> > > +  if (x == y)
> > > +    return 42;
> > > +  return 0;
> > > +}
> >
  

Patch

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index ccd68efc956..6779dfee020 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -4839,6 +4839,36 @@  curr_insn_transform (bool check_only_p)
 	    new_reg = emit_inc (rclass, *loc,
 				/* This value does not matter for MODIFY.  */
 				GET_MODE_SIZE (GET_MODE (op)));
+	  /* Try to pull out section anchors.  For example, instead of
+	     reloading an "entire" address like .LANCHOR42+offset only reload
+	     .LANCHOR42 and use the new reload register as the base register.
+	     This allows following optimizations to share section anchors and
+	     remove redundant loads.  */
+	  else if (GET_CODE (*loc) == CONST
+		   && GET_CODE (XEXP (*loc, 0)) == PLUS
+		   && GET_CODE (XEXP (XEXP (*loc, 0), 0)) == SYMBOL_REF
+		   && SYMBOL_REF_ANCHOR_P (XEXP (XEXP (*loc, 0), 0))
+		   && CONST_INT_P (XEXP (XEXP (*loc, 0), 1))
+		   /* Some offsets are valid in conjunction with a symbol and
+		      invalid in conjunction with a register.  Thus, pull out
+		      the anchor only in case the offset is a valid anchor
+		      offset.  */
+		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
+		      >= targetm.min_anchor_offset
+		   && INTVAL (XEXP (XEXP (*loc, 0), 1))
+		      <= targetm.max_anchor_offset)
+	    {
+	       rtx anchor = XEXP (XEXP (*loc, 0), 0);
+	       rtx offset = XEXP (XEXP (*loc, 0), 1);
+
+	       if (get_reload_reg (OP_IN, Pmode, anchor, rclass,
+				   NULL, false, false,
+				   "offsetable address", &new_reg))
+		  lra_emit_move (new_reg, anchor);
+
+		new_reg = gen_rtx_PLUS (Pmode, new_reg, offset);
+		lra_assert (valid_address_p (Pmode, new_reg, MEM_ADDR_SPACE (op)));
+	    }
 	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
 				   NULL, false, false,
 				   "offsetable address", &new_reg))
diff --git a/gcc/testsuite/gcc.target/s390/section-anchors-4.c b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
new file mode 100644
index 00000000000..0b4cd081c61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/section-anchors-4.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ira-slim -fdump-rtl-reload-slim" } */
+/* { dg-final { scan-assembler-times "\tlarl\t" 1 } } */
+/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[`\\\*.LANCHOR0'\\\],\\\[const\\\(`\\\*.LANCHOR0'\\\+0x8\\\)\\\]\\\)" "ira" } } */
+/* { dg-final { scan-rtl-dump "%cc:CCZ=cmp\\\(\\\[(%r\[1-9\]\[0-9\]?):DI\\\],\\\[\\1:DI\\\+0x8\\\]\\\)" "reload" } } */
+
+/* Ensure that we get the same reload register for the second memory operand.
+   Prior LRA we have
+
+   55: %cc:CCZ=cmp([`*.LANCHOR0'],[const(`*.LANCHOR0'+0x8)])
+
+   and after LRA
+
+   59: %r1:DI=`*.LANCHOR0'
+   55: %cc:CCZ=cmp([%r1:DI],[%r1:DI+0x8])  */
+
+long x, y;
+
+long
+foo (void)
+{
+  if (x == y)
+    return 42;
+  return 0;
+}