[pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

Message ID 20220211235904.25765-1-iain@sandoe.co.uk
State Committed
Headers
Series [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117]. |

Commit Message

Iain Sandoe Feb. 11, 2022, 11:59 p.m. UTC
  Two issues resulted in this PR, which manifests when we force a constant into
memory in LRA (in PIC code on Darwin).  The presence of such forced constants
is quite dependent on other RTL optimisations, and it is easy for the issue to
become latent for a specific case.

First, in the Darwin-specific rs6000 backend code, we were not being careful
enough in rejecting invalid symbolic addresses.  Specifically, when generating
PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.

Second, LRA was attempting to load a register using an invalid lo_sum address.

The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
Darwin-specific (although, of course, any observations are welcome).

Tested on several lo_sum targets and x86_64 all languages except as noted:
powerpc64-linux (m32/m64) -D
powerpc64le-linux  -D
powerpc64-aix -Ada -Go -D
aarch64-linux -Ada -D
x86_64-linux all langs -D
powerpc-darwin9 (master and 11.2) -D -Go.

pushed to master, thanks,
Iain

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>

	PR target/104117

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
	emitting PIC code.
	(legitimate_lo_sum_address_p): Likewise.
	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
	load from an invalid lo_sum address.
---
 gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
 gcc/lra-constraints.cc      | 17 ++---------------
 2 files changed, 38 insertions(+), 17 deletions(-)
  

Comments

Richard Sandiford Feb. 14, 2022, 9:44 a.m. UTC | #1
Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Two issues resulted in this PR, which manifests when we force a constant into
> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
> is quite dependent on other RTL optimisations, and it is easy for the issue to
> become latent for a specific case.
>
> First, in the Darwin-specific rs6000 backend code, we were not being careful
> enough in rejecting invalid symbolic addresses.  Specifically, when generating
> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>
> Second, LRA was attempting to load a register using an invalid lo_sum address.
>
> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
> Darwin-specific (although, of course, any observations are welcome).
>
> Tested on several lo_sum targets and x86_64 all languages except as noted:
> powerpc64-linux (m32/m64) -D
> powerpc64le-linux  -D
> powerpc64-aix -Ada -Go -D
> aarch64-linux -Ada -D
> x86_64-linux all langs -D
> powerpc-darwin9 (master and 11.2) -D -Go.
>
> pushed to master, thanks,
> Iain
>
> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
> Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
>
> 	PR target/104117
>
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
> 	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
> 	emitting PIC code.
> 	(legitimate_lo_sum_address_p): Likewise.
> 	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
> 	load from an invalid lo_sum address.
> ---
>  gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>  gcc/lra-constraints.cc      | 17 ++---------------
>  2 files changed, 38 insertions(+), 17 deletions(-)
>
> […]
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index fdff9e0720a..c700c3f4578 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>  		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>  		  if (!valid_address_p (op, &ad, cn))
>  		    {
> -		      /* Try to put lo_sum into register.  */
> -		      insn = emit_insn (gen_rtx_SET
> -					(new_reg,
> -					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
> -		      code = recog_memoized (insn);
> -		      if (code >= 0)
> -			{
> -			  *ad.inner = new_reg;
> -			  if (!valid_address_p (op, &ad, cn))
> -			    {
> -			      *ad.inner = addr;
> -			      code = -1;
> -			    }
> -			}
> -
> +		      *ad.inner = addr; /* Punt.  */
> +		      code = -1;
>  		    }
>  		}
>  	      if (code < 0)

Could you go into more details about this?  Why is it OK to continue
to try:

  (lo_sum new_reg addr)

directly as an address (the context at the top of the hunk), but not try
moving the lo_sum into a register?  They should be semantically equivalent,
so it seems that if one is wrong, the other would be too.

Thanks,
Richard
  
Vladimir Makarov Feb. 14, 2022, 2:36 p.m. UTC | #2
On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
> Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Two issues resulted in this PR, which manifests when we force a constant into
>> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
>> is quite dependent on other RTL optimisations, and it is easy for the issue to
>> become latent for a specific case.
>>
>> First, in the Darwin-specific rs6000 backend code, we were not being careful
>> enough in rejecting invalid symbolic addresses.  Specifically, when generating
>> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>>
>> Second, LRA was attempting to load a register using an invalid lo_sum address.
>>
>> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
>> Darwin-specific (although, of course, any observations are welcome).
>>
>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>> powerpc64-linux (m32/m64) -D
>> powerpc64le-linux  -D
>> powerpc64-aix -Ada -Go -D
>> aarch64-linux -Ada -D
>> x86_64-linux all langs -D
>> powerpc-darwin9 (master and 11.2) -D -Go.
>>
>> pushed to master, thanks,
>> Iain
>>
>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>> Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
>>
>> 	PR target/104117
>>
>> gcc/ChangeLog:
>>
>> 	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>> 	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>> 	emitting PIC code.
>> 	(legitimate_lo_sum_address_p): Likewise.
>> 	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>> 	load from an invalid lo_sum address.
>> ---
>>   gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>>   gcc/lra-constraints.cc      | 17 ++---------------
>>   2 files changed, 38 insertions(+), 17 deletions(-)
>>
>> […]
>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>> index fdff9e0720a..c700c3f4578 100644
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>   		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>   		  if (!valid_address_p (op, &ad, cn))
>>   		    {
>> -		      /* Try to put lo_sum into register.  */
>> -		      insn = emit_insn (gen_rtx_SET
>> -					(new_reg,
>> -					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
>> -		      code = recog_memoized (insn);
>> -		      if (code >= 0)
>> -			{
>> -			  *ad.inner = new_reg;
>> -			  if (!valid_address_p (op, &ad, cn))
>> -			    {
>> -			      *ad.inner = addr;
>> -			      code = -1;
>> -			    }
>> -			}
>> -
>> +		      *ad.inner = addr; /* Punt.  */
>> +		      code = -1;
>>   		    }
>>   		}
>>   	      if (code < 0)
> Could you go into more details about this?  Why is it OK to continue
> to try:
>
>    (lo_sum new_reg addr)
>
> directly as an address (the context at the top of the hunk), but not try
> moving the lo_sum into a register?  They should be semantically equivalent,
> so it seems that if one is wrong, the other would be too.
>
Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If 
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
will help for any existing target.  As machine-dependent code for any 
target most probably (for ppc64 darwin it is exactly the case) checks 
address only in memory, it can wrongly accept wrong address by reloading 
it into reg and use it in memory. So these are my arguments for the 
remove this code from process_address_1.
  
Richard Sandiford Feb. 14, 2022, 4 p.m. UTC | #3
Hi Vlad,

Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>> Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> Two issues resulted in this PR, which manifests when we force a constant into
>>> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
>>> is quite dependent on other RTL optimisations, and it is easy for the issue to
>>> become latent for a specific case.
>>>
>>> First, in the Darwin-specific rs6000 backend code, we were not being careful
>>> enough in rejecting invalid symbolic addresses.  Specifically, when generating
>>> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>>>
>>> Second, LRA was attempting to load a register using an invalid lo_sum address.
>>>
>>> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
>>> Darwin-specific (although, of course, any observations are welcome).
>>>
>>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>>> powerpc64-linux (m32/m64) -D
>>> powerpc64le-linux  -D
>>> powerpc64-aix -Ada -Go -D
>>> aarch64-linux -Ada -D
>>> x86_64-linux all langs -D
>>> powerpc-darwin9 (master and 11.2) -D -Go.
>>>
>>> pushed to master, thanks,
>>> Iain
>>>
>>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>>> Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
>>>
>>> 	PR target/104117
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>>> 	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>>> 	emitting PIC code.
>>> 	(legitimate_lo_sum_address_p): Likewise.
>>> 	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>>> 	load from an invalid lo_sum address.
>>> ---
>>>   gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>>>   gcc/lra-constraints.cc      | 17 ++---------------
>>>   2 files changed, 38 insertions(+), 17 deletions(-)
>>>
>>> […]
>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>> index fdff9e0720a..c700c3f4578 100644
>>> --- a/gcc/lra-constraints.cc
>>> +++ b/gcc/lra-constraints.cc
>>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>>   		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>>   		  if (!valid_address_p (op, &ad, cn))
>>>   		    {
>>> -		      /* Try to put lo_sum into register.  */
>>> -		      insn = emit_insn (gen_rtx_SET
>>> -					(new_reg,
>>> -					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
>>> -		      code = recog_memoized (insn);
>>> -		      if (code >= 0)
>>> -			{
>>> -			  *ad.inner = new_reg;
>>> -			  if (!valid_address_p (op, &ad, cn))
>>> -			    {
>>> -			      *ad.inner = addr;
>>> -			      code = -1;
>>> -			    }
>>> -			}
>>> -
>>> +		      *ad.inner = addr; /* Punt.  */
>>> +		      code = -1;
>>>   		    }
>>>   		}
>>>   	      if (code < 0)
>> Could you go into more details about this?  Why is it OK to continue
>> to try:
>>
>>    (lo_sum new_reg addr)
>>
>> directly as an address (the context at the top of the hunk), but not try
>> moving the lo_sum into a register?  They should be semantically equivalent,
>> so it seems that if one is wrong, the other would be too.
>>
> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>
> I think there is no need for this code and it is misleading.  If 
> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
> will help for any existing target.  As machine-dependent code for any 
> target most probably (for ppc64 darwin it is exactly the case) checks 
> address only in memory, it can wrongly accept wrong address by reloading 
> it into reg and use it in memory. So these are my arguments for the 
> remove this code from process_address_1.

I'm probably making too much of this, but:

I think the code is potentially useful in that existing targets do forbid
forbid lo_sum addresses in certain contexts (due to limited offset range)
while still wanting lo_sum to be used to be load the address.  If we
handle the high/lo_sum split in generic code then we have more chance
of being able to optimise things.  So it feels like this is setting an
unfortunate precedent.

I still don't understand what went wrong before though (the PR trail
was a bit too long to process :-)).  Is there a case where
(lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

Thanks,
Richard
  
Iain Sandoe Feb. 14, 2022, 4:21 p.m. UTC | #4
Hi Richard,

(hopefully, my take won’t cloud the issue ….)

> On 14 Feb 2022, at 16:00, Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi Vlad,
> 
> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>>> Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> Two issues resulted in this PR, which manifests when we force a constant into
>>>> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
>>>> is quite dependent on other RTL optimisations, and it is easy for the issue to
>>>> become latent for a specific case.
>>>> 
>>>> First, in the Darwin-specific rs6000 backend code, we were not being careful
>>>> enough in rejecting invalid symbolic addresses.  Specifically, when generating
>>>> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>>>> 
>>>> Second, LRA was attempting to load a register using an invalid lo_sum address.
>>>> 
>>>> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
>>>> Darwin-specific (although, of course, any observations are welcome).
>>>> 
>>>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>>>> powerpc64-linux (m32/m64) -D
>>>> powerpc64le-linux  -D
>>>> powerpc64-aix -Ada -Go -D
>>>> aarch64-linux -Ada -D
>>>> x86_64-linux all langs -D
>>>> powerpc-darwin9 (master and 11.2) -D -Go.
>>>> 
>>>> pushed to master, thanks,
>>>> Iain
>>>> 
>>>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>>>> Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
>>>> 
>>>> 	PR target/104117
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>> 	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>>>> 	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>>>> 	emitting PIC code.
>>>> 	(legitimate_lo_sum_address_p): Likewise.
>>>> 	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>>>> 	load from an invalid lo_sum address.
>>>> ---
>>>>  gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>>>>  gcc/lra-constraints.cc      | 17 ++---------------
>>>>  2 files changed, 38 insertions(+), 17 deletions(-)
>>>> 
>>>> […]
>>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>>> index fdff9e0720a..c700c3f4578 100644
>>>> --- a/gcc/lra-constraints.cc
>>>> +++ b/gcc/lra-constraints.cc
>>>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>>>  		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>>>  		  if (!valid_address_p (op, &ad, cn))
>>>>  		    {
>>>> -		      /* Try to put lo_sum into register.  */
>>>> -		      insn = emit_insn (gen_rtx_SET
>>>> -					(new_reg,
>>>> -					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
>>>> -		      code = recog_memoized (insn);
>>>> -		      if (code >= 0)
>>>> -			{
>>>> -			  *ad.inner = new_reg;
>>>> -			  if (!valid_address_p (op, &ad, cn))
>>>> -			    {
>>>> -			      *ad.inner = addr;
>>>> -			      code = -1;
>>>> -			    }
>>>> -			}
>>>> -
>>>> +		      *ad.inner = addr; /* Punt.  */
>>>> +		      code = -1;
>>>>  		    }
>>>>  		}
>>>>  	      if (code < 0)
>>> Could you go into more details about this?  Why is it OK to continue
>>> to try:
>>> 
>>>   (lo_sum new_reg addr)
>>> 
>>> directly as an address (the context at the top of the hunk), but not try
>>> moving the lo_sum into a register?  They should be semantically equivalent,
>>> so it seems that if one is wrong, the other would be too.
>>> 
>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>> 
>> I think there is no need for this code and it is misleading.  If 
>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
>> will help for any existing target.  As machine-dependent code for any 
>> target most probably (for ppc64 darwin it is exactly the case) checks 
>> address only in memory, it can wrongly accept wrong address by reloading 
>> it into reg and use it in memory. So these are my arguments for the 
>> remove this code from process_address_1.
> 
> I'm probably making too much of this, but:
> 
> I think the code is potentially useful in that existing targets do forbid
> forbid lo_sum addresses in certain contexts (due to limited offset range)
> while still wanting lo_sum to be used to be load the address.  If we
> handle the high/lo_sum split in generic code then we have more chance
> of being able to optimise things.  So it feels like this is setting an
> unfortunate precedent.
> 
> I still don't understand what went wrong before though (the PR trail
> was a bit too long to process :-)).  Is there a case where
> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.

If X is an invalid address (in this case for PIC code a SYMBOL_REF is not
valid without an UNSPEC wrapper) - then (high X) and (lo_sum (high X) X) 
are also invalid.  AFAICT the target never (this late in the process) gets
the opportunity to apply a transform to validate the address.

> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

once again if X is plain invalid - loading it into a register or splitting it
does not solve that - we’d need to call the target’s address legitimizer.

Of course, if we conclude that it is a target bug, I’ll try to fix it up .

thanks
Iain

> 
> Thanks,
> Richard
  
Richard Sandiford Feb. 14, 2022, 4:35 p.m. UTC | #5
Iain Sandoe <iain@sandoe.co.uk> writes:
> Hi Richard,
>
> (hopefully, my take won’t cloud the issue ….)
>
>> On 14 Feb 2022, at 16:00, Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> 
>> Hi Vlad,
>> 
>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>>>> Iain Sandoe via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> Two issues resulted in this PR, which manifests when we force a constant into
>>>>> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
>>>>> is quite dependent on other RTL optimisations, and it is easy for the issue to
>>>>> become latent for a specific case.
>>>>> 
>>>>> First, in the Darwin-specific rs6000 backend code, we were not being careful
>>>>> enough in rejecting invalid symbolic addresses.  Specifically, when generating
>>>>> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>>>>> 
>>>>> Second, LRA was attempting to load a register using an invalid lo_sum address.
>>>>> 
>>>>> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
>>>>> Darwin-specific (although, of course, any observations are welcome).
>>>>> 
>>>>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>>>>> powerpc64-linux (m32/m64) -D
>>>>> powerpc64le-linux  -D
>>>>> powerpc64-aix -Ada -Go -D
>>>>> aarch64-linux -Ada -D
>>>>> x86_64-linux all langs -D
>>>>> powerpc-darwin9 (master and 11.2) -D -Go.
>>>>> 
>>>>> pushed to master, thanks,
>>>>> Iain
>>>>> 
>>>>> Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
>>>>> Co-authored-by: Vladimir Makarov <vmakarov@redhat.com>
>>>>> 
>>>>> 	PR target/104117
>>>>> 
>>>>> gcc/ChangeLog:
>>>>> 
>>>>> 	* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>>>>> 	Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>>>>> 	emitting PIC code.
>>>>> 	(legitimate_lo_sum_address_p): Likewise.
>>>>> 	* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>>>>> 	load from an invalid lo_sum address.
>>>>> ---
>>>>>  gcc/config/rs6000/rs6000.cc | 38 +++++++++++++++++++++++++++++++++++--
>>>>>  gcc/lra-constraints.cc      | 17 ++---------------
>>>>>  2 files changed, 38 insertions(+), 17 deletions(-)
>>>>> 
>>>>> […]
>>>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>>>> index fdff9e0720a..c700c3f4578 100644
>>>>> --- a/gcc/lra-constraints.cc
>>>>> +++ b/gcc/lra-constraints.cc
>>>>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>>>>  		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>>>>  		  if (!valid_address_p (op, &ad, cn))
>>>>>  		    {
>>>>> -		      /* Try to put lo_sum into register.  */
>>>>> -		      insn = emit_insn (gen_rtx_SET
>>>>> -					(new_reg,
>>>>> -					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
>>>>> -		      code = recog_memoized (insn);
>>>>> -		      if (code >= 0)
>>>>> -			{
>>>>> -			  *ad.inner = new_reg;
>>>>> -			  if (!valid_address_p (op, &ad, cn))
>>>>> -			    {
>>>>> -			      *ad.inner = addr;
>>>>> -			      code = -1;
>>>>> -			    }
>>>>> -			}
>>>>> -
>>>>> +		      *ad.inner = addr; /* Punt.  */
>>>>> +		      code = -1;
>>>>>  		    }
>>>>>  		}
>>>>>  	      if (code < 0)
>>>> Could you go into more details about this?  Why is it OK to continue
>>>> to try:
>>>> 
>>>>   (lo_sum new_reg addr)
>>>> 
>>>> directly as an address (the context at the top of the hunk), but not try
>>>> moving the lo_sum into a register?  They should be semantically equivalent,
>>>> so it seems that if one is wrong, the other would be too.
>>>> 
>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>> 
>>> I think there is no need for this code and it is misleading.  If 
>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
>>> will help for any existing target.  As machine-dependent code for any 
>>> target most probably (for ppc64 darwin it is exactly the case) checks 
>>> address only in memory, it can wrongly accept wrong address by reloading 
>>> it into reg and use it in memory. So these are my arguments for the 
>>> remove this code from process_address_1.
>> 
>> I'm probably making too much of this, but:
>> 
>> I think the code is potentially useful in that existing targets do forbid
>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>> while still wanting lo_sum to be used to be load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting an
>> unfortunate precedent.
>> 
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>
> If X is an invalid address (in this case for PIC code a SYMBOL_REF is not
> valid without an UNSPEC wrapper) - then (high X) and (lo_sum (high X) X) 
> are also invalid.  AFAICT the target never (this late in the process) gets
> the opportunity to apply a transform to validate the address.
>
>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>
> once again if X is plain invalid - loading it into a register or splitting it
> does not solve that - we’d need to call the target’s address legitimizer.
>
> Of course, if we conclude that it is a target bug, I’ll try to fix it up .

But if the target treats:

   (set R2 (high X))
   (set R1 (lo_sum R2 X))

as an invalid way of setting R1 to X (e.g. because X is plain invalid),
then recog should fail on at least one instruction, preferably both.
So yeah, this does feel like a target bug to me.

We shouldn't assume that lo_sum/high splits will only be generated by
the target.  Target-independent code can generate them “spontaneously”,
provided that it validates the result.  (Combine does this too,
for example.)

It looks like the LRA code might have been missing a HIGH in the
fallback case, but it doesn't sound like that was the bug that was
being fixed.

Thanks,
Richard
  
Vladimir Makarov Feb. 14, 2022, 4:58 p.m. UTC | #6
On 2022-02-14 11:00, Richard Sandiford wrote:
> Hi Vlad,
>
> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>
>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>
>> I think there is no need for this code and it is misleading.  If
>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
>> will help for any existing target.  As machine-dependent code for any
>> target most probably (for ppc64 darwin it is exactly the case) checks
>> address only in memory, it can wrongly accept wrong address by reloading
>> it into reg and use it in memory. So these are my arguments for the
>> remove this code from process_address_1.
> I'm probably making too much of this, but:
>
> I think the code is potentially useful in that existing targets do forbid
> forbid lo_sum addresses in certain contexts (due to limited offset range)
> while still wanting lo_sum to be used to be load the address.  If we
> handle the high/lo_sum split in generic code then we have more chance
> of being able to optimise things.  So it feels like this is setting an
> unfortunate precedent.
>
> I still don't understand what went wrong before though (the PR trail
> was a bit too long to process :-)).  Is there a case where
> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>
Sometimes it is hard to make a line where an RA bug is a bug in 
machine-dependent code or in RA itself.

For this case I would say it is a bug in the both parts.

Low-sum is generated by LRA and it does not know that it should be 
wrapped by unspec for darwin. Generally speaking we could avoid the 
change in LRA but it would require to do non-trivial analysis in machine 
dependent code to find cases when 'reg=low_sum ... mem[reg]' is 
incorrect code for darwin (PIC) target (and may be some other PIC 
targets too). Therefore I believe the change in LRA is a good solution 
even if the change can potentially result in less optimized code for 
some cases.  Taking your concern into account we could probably improve 
the patch by introducing a hook (I never liked such solutions as we 
already have too many hooks directing RA) or better to make the LRA 
change working only for PIC target. Something like this (it probably 
needs better recognition of pic target):

--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
           if (HAVE_lo_sum)
             {
               /* addr => lo_sum (new_base, addr), case (2) above.  */
               insn = emit_insn (gen_rtx_SET
                                 (new_reg,
                                  gen_rtx_HIGH (Pmode, copy_rtx (addr))));
               code = recog_memoized (insn);
               if (code >= 0)
                 {
                   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
-                 if (!valid_address_p (op, &ad, cn))
+                 if (!valid_address_p (op, &ad, cn) && !flag_pic)
                     {
                       /* Try to put lo_sum into register.  */
                       insn = emit_insn (gen_rtx_SET
                                         (new_reg,
                                          gen_rtx_LO_SUM (Pmode, 
new_reg, addr)));
                       code = recog_memoized (insn);
                       if (code >= 0)
                         {
                           *ad.inner = new_reg;
                           if (!valid_address_p (op, &ad, cn))
  
Iain Sandoe Feb. 20, 2022, 5:34 p.m. UTC | #7
Hi Folks.

> On 14 Feb 2022, at 16:58, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 2022-02-14 11:00, Richard Sandiford wrote:

>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> 
>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>> 
>>> I think there is no need for this code and it is misleading.  If
>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
>>> will help for any existing target.  As machine-dependent code for any
>>> target most probably (for ppc64 darwin it is exactly the case) checks
>>> address only in memory, it can wrongly accept wrong address by reloading
>>> it into reg and use it in memory. So these are my arguments for the
>>> remove this code from process_address_1.
>> I'm probably making too much of this, but:
>> 
>> I think the code is potentially useful in that existing targets do forbid
>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>> while still wanting lo_sum to be used to be load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting an
>> unfortunate precedent.
>> 
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>> 
> Sometimes it is hard to make a line where an RA bug is a bug in machine-dependent code or in RA itself.
> 
> For this case I would say it is a bug in the both parts.
> 
> Low-sum is generated by LRA and it does not know that it should be wrapped by unspec for darwin. Generally speaking we could avoid the change in LRA but it would require to do non-trivial analysis in machine dependent code to find cases when 'reg=low_sum ... mem[reg]' is incorrect code for darwin (PIC) target (and may be some other PIC targets too). Therefore I believe the change in LRA is a good solution even if the change can potentially result in less optimized code for some cases.  Taking your concern into account we could probably improve the patch by introducing a hook (I never liked such solutions as we already have too many hooks directing RA) or better to make the LRA change working only for PIC target. Something like this (it probably needs better recognition of pic target):
> 
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
>           if (HAVE_lo_sum)
>             {
>               /* addr => lo_sum (new_base, addr), case (2) above.  */
>               insn = emit_insn (gen_rtx_SET
>                                 (new_reg,
>                                  gen_rtx_HIGH (Pmode, copy_rtx (addr))));
>               code = recog_memoized (insn);
>               if (code >= 0)
>                 {
>                   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
> -                 if (!valid_address_p (op, &ad, cn))
> +                 if (!valid_address_p (op, &ad, cn) && !flag_pic)

IMO the PIC aspect of this is possibly misleading
 - the issue is that we have an invalid address, and that such addresses in this case need to be legitimised by wrapping them in an UNSPEC. 
- My concern about the generic code was that I would not expect Darwin to be the only platform that might need to wrap an invlaid address in an unspec  [TOC, TLS, small data etc. spring to mind].

I need some help understanding the expected pathway through this code that could be useful.

we start with an invalid address.

1. we build (set reg (high invalid_address))
 - Darwin was allowing this (and the lo_sum) [eveywhere, not just here] on the basis that the target legitimizer would be called later to fix it up.  (that is why the initial recog passes) - but AFAICT we never call the target’s address legitimizer.

 - I am curious about what (other) circumstance there would be where a (high of an invalid address would be useful.

2. …  assuming the we allowed the build of the (high invalid)

 - we now build the lo_sum and check to see if it is valid.

3. if it is _not_ valid, we load it into a reg

  - I am not sure (outside the comment about about post-legitimiizer use) about how an invalid lo_sum can be used in this way.

  - assuming we accept this, we then test to see if the register is a valid address (my guess is that test will pass pretty much everywhere, since we picked a suitable register in the first place).

^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….

[ part of me wonders why we do not just call the target’s address legitimizer when we have an illegal address ]

——— current WIP:

So .. I have split the Darwin patterns into a hi/lo pair for non-PIC and a hi/lo pair for PIC.

I added a predicate for the PIC case that requires it to match (unspec [….] UNSPEC_MACHOPIC_OFFSET).

Thus both the attempts (high and (lo_sum will fail recog now which has the same effect as punting on the bad lo_sum (and the original testcase now generates correct code) …

… the hardware is slow (well, it isn’t really - faster than a Cray XMP .. but …) .. so regstraps on 11.2 (to check the fix works) and master will take some time.

Then, hopefully, there is a target-local fix (and the LRA change could be reverted if that’s the concensus about the best solution) ...

cheers
Iain
  
Richard Sandiford Feb. 21, 2022, 10:55 a.m. UTC | #8
Iain Sandoe <iain@sandoe.co.uk> writes:
> Hi Folks.
>> On 14 Feb 2022, at 16:58, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 2022-02-14 11:00, Richard Sandiford wrote:
>
>>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> 
>>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>>> 
>>>> I think there is no need for this code and it is misleading.  If
>>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
>>>> will help for any existing target.  As machine-dependent code for any
>>>> target most probably (for ppc64 darwin it is exactly the case) checks
>>>> address only in memory, it can wrongly accept wrong address by reloading
>>>> it into reg and use it in memory. So these are my arguments for the
>>>> remove this code from process_address_1.
>>> I'm probably making too much of this, but:
>>> 
>>> I think the code is potentially useful in that existing targets do forbid
>>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>>> while still wanting lo_sum to be used to be load the address.  If we
>>> handle the high/lo_sum split in generic code then we have more chance
>>> of being able to optimise things.  So it feels like this is setting an
>>> unfortunate precedent.
>>> 
>>> I still don't understand what went wrong before though (the PR trail
>>> was a bit too long to process :-)).  Is there a case where
>>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>>> 
>> Sometimes it is hard to make a line where an RA bug is a bug in machine-dependent code or in RA itself.
>> 
>> For this case I would say it is a bug in the both parts.
>> 
>> Low-sum is generated by LRA and it does not know that it should be wrapped by unspec for darwin. Generally speaking we could avoid the change in LRA but it would require to do non-trivial analysis in machine dependent code to find cases when 'reg=low_sum ... mem[reg]' is incorrect code for darwin (PIC) target (and may be some other PIC targets too). Therefore I believe the change in LRA is a good solution even if the change can potentially result in less optimized code for some cases.  Taking your concern into account we could probably improve the patch by introducing a hook (I never liked such solutions as we already have too many hooks directing RA) or better to make the LRA change working only for PIC target. Something like this (it probably needs better recognition of pic target):
>> 
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
>>           if (HAVE_lo_sum)
>>             {
>>               /* addr => lo_sum (new_base, addr), case (2) above.  */
>>               insn = emit_insn (gen_rtx_SET
>>                                 (new_reg,
>>                                  gen_rtx_HIGH (Pmode, copy_rtx (addr))));
>>               code = recog_memoized (insn);
>>               if (code >= 0)
>>                 {
>>                   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>> -                 if (!valid_address_p (op, &ad, cn))
>> +                 if (!valid_address_p (op, &ad, cn) && !flag_pic)
>
> IMO the PIC aspect of this is possibly misleading
>  - the issue is that we have an invalid address, and that such addresses in this case need to be legitimised by wrapping them in an UNSPEC. 
> - My concern about the generic code was that I would not expect Darwin to be the only platform that might need to wrap an invlaid address in an unspec  [TOC, TLS, small data etc. spring to mind].

Yeah, that part is pretty common.

> I need some help understanding the expected pathway through this code that could be useful.
>
> we start with an invalid address.
>
> 1. we build (set reg (high invalid_address))
>  - Darwin was allowing this (and the lo_sum) [eveywhere, not just here] on the basis that the target legitimizer would be called later to fix it up.  (that is why the initial recog passes) - but AFAICT we never call the target’s address legitimizer.

I think it's really the other way around: legitimisers are (or should be)
called before recog, with the legitimisers producing instructions that
recog is known to accept.

Once something has been recog()ed, the target has to be prepared
to generate code for it, either directly from an asm template or
indirectly from a define_split.  There's no backing out beyond
that point.

For:

  (set … (mem (lo_sum (reg Y) (…))))

the lo_sum is an address that needs to be legitimised (because it's
in a mem) but in:

  (set (reg X) (lo_sum (reg Y) (…)))    // A
  (set … (mem (reg X)))

it isn't: it's just an ordinary bit of arithmetic whose result happens
to be used as an address later.  If the target accepts A then it must be
prepared to generate code for it, without other hooks being called first.

That distinction also (unfortunately) applies to the canonical form
of multiplication by 2^N.  In an address (i.e. inside a mem) it's:

  (mult … (const_int 2^N))

whereas outside of an address (outside of a mem) it's:

  (lshift … (const_int N))

Also, the address legitimisation hooks are just optimisations.  It should
always be valid to legitimise an address by moving the address into a
register via force_operand instead (at least when can_create_psuedos_p).

So the real work of legitimising symbolic references is done by the move
patterns.  Any legitimisation done by the address hooks is just an
optimisation on top of that.

>  - I am curious about what (other) circumstance there would be where a (high of an invalid address would be useful.
>
> 2. …  assuming the we allowed the build of the (high invalid)
>
>  - we now build the lo_sum and check to see if it is valid.
>
> 3. if it is _not_ valid, we load it into a reg
>
>   - I am not sure (outside the comment about about post-legitimiizer use) about how an invalid lo_sum can be used in this way.
>
>   - assuming we accept this, we then test to see if the register is a valid address (my guess is that test will pass pretty much everywhere, since we picked a suitable register in the first place).
>
> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….
>
> [ part of me wonders why we do not just call the target’s address legitimizer when we have an illegal address ]
>
> ——— current WIP:
>
> So .. I have split the Darwin patterns into a hi/lo pair for non-PIC and a hi/lo pair for PIC.
>
> I added a predicate for the PIC case that requires it to match (unspec [….] UNSPEC_MACHOPIC_OFFSET).
>
> Thus both the attempts (high and (lo_sum will fail recog now which has the same effect as punting on the bad lo_sum (and the original testcase now generates correct code) …

I don't know the Darwin code, so I can't speak to whether splitting
the PIC patterns out makes sense, but: yeah, the end result sounds
right to me FWIW.

Thanks,
Richard

> … the hardware is slow (well, it isn’t really - faster than a Cray XMP .. but …) .. so regstraps on 11.2 (to check the fix works) and master will take some time.
>
> Then, hopefully, there is a target-local fix (and the LRA change could be reverted if that’s the concensus about the best solution) ...
>
> cheers
> Iain
  
Vladimir Makarov Feb. 22, 2022, 2:44 p.m. UTC | #9
On 2022-02-20 12:34, Iain Sandoe wrote:
>
> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….
>
I'd be really glad if you do manage to fix this w/o changing LRA. 
Richard has a legitimate point that my proposed change in LRA 
prohibiting `...;reg=low_sum; ...mem[reg]` might force LRA to generate 
less optimized code or even might make LRA to generate unrecognized 
insns `reg = orginal addr` for some ports requiring further fixes in 
machine-dependent code of the ports.
  
Iain Sandoe Feb. 24, 2022, 4:02 p.m. UTC | #10
Folks,

> On 22 Feb 2022, at 14:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
> 
> 
> On 2022-02-20 12:34, Iain Sandoe wrote:
>> 
>> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….
>> 
> I'd be really glad if you do manage to fix this w/o changing LRA. Richard has a legitimate point that my proposed change in LRA prohibiting `...;reg=low_sum; ...mem[reg]` might force LRA to generate less optimized code or even might make LRA to generate unrecognized insns `reg = orginal addr` for some ports requiring further fixes in machine-dependent code of the ports.

I think this is within my remit to push without further review - however I’d very much welcome any comment you folks have: I’d like to push this before my weekly Darwin test run - which is usually started just after the daily bump on Saturday morning.

The other RS6000 changes remain, as Vlad pointed out we were not being picky enough there - despite getting away with it for longer than I’ve been on the project ;)

I tested that the patch fixes the problem on 11.2 (for the testcases provided, the bug is latent on master) and causes no regressions on powerpc-darwin9 (master).

cheers
Iain


[PATCH] LRA, rs6000, Darwin: Revise lo_sum use for forced constants  [PR104117].

Follow up discussion to the initial patch for this PR identified that it is
preferable to avoid the LRA change, and arrange for the target to reject the
hi and lo_sum selections when presented with an invalid address.

We split the Darwin high/low selectors into two:
 1. One that handles non-PIC addresses (kernel mode, mdynamic-no-pic).
 2. One that handles PIC addresses and rejects SYMBOL_REFs unless they are
    suitably wrapped in the MACHOPIC_OFFSET unspec.

The second case is handled by providing a new predicate (macho_pic_address)
that checks the requirements.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

	PR target/PR104117

gcc/ChangeLog:

	* config/rs6000/darwin.md (@machopic_high_<mode>): New.
	(@machopic_low_<mode>): New.
	* config/rs6000/predicates.md (macho_pic_address): New.
	* config/rs6000/rs6000.cc (rs6000_legitimize_address): Do not
	apply the TLS processing to Darwin.
	* lra-constraints.cc (process_address_1): Revert the changes
	in r12-7209.
---
 gcc/config/rs6000/darwin.md     | 19 +++++++++++++++----
 gcc/config/rs6000/predicates.md | 14 ++++++++++++++
 gcc/config/rs6000/rs6000.cc     |  2 +-
 gcc/lra-constraints.cc          | 17 +++++++++++++++--
 4 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 8443585df00..e73d59e8066 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -121,21 +121,32 @@ You should have received a copy of the GNU General Public License
    stw %0,lo16(%2)(%1)"
   [(set_attr "type" "store")])
 
-;; 64-bit MachO load/store support
-
 ;; Mach-O PIC.
 
 (define_insn "@macho_high_<mode>"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b*r")
 	(high:P (match_operand 1 "" "")))]
-  "TARGET_MACHO && (DEFAULT_ABI == ABI_DARWIN)"
+  "TARGET_MACHO && (DEFAULT_ABI == ABI_DARWIN) && !flag_pic"
   "lis %0,ha16(%1)")
 
 (define_insn "@macho_low_<mode>"
   [(set (match_operand:P 0 "gpc_reg_operand" "=r")
 	(lo_sum:P (match_operand:P 1 "gpc_reg_operand" "b")
 		   (match_operand 2 "" "")))]
-   "TARGET_MACHO && (DEFAULT_ABI == ABI_DARWIN)"
+   "TARGET_MACHO && (DEFAULT_ABI == ABI_DARWIN) && !flag_pic"
+   "la %0,lo16(%2)(%1)")
+
+(define_insn "@machopic_high_<mode>"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b*r")
+	(high:P (match_operand 1 "macho_pic_address" "")))]
+  "TARGET_MACHO && flag_pic"
+  "lis %0,ha16(%1)")
+
+(define_insn "@machopic_low_<mode>"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=r")
+	(lo_sum:P (match_operand:P 1 "gpc_reg_operand" "b")
+		   (match_operand 2 "macho_pic_address" "")))]
+   "TARGET_MACHO && flag_pic"
    "la %0,lo16(%2)(%1)")
 
 (define_split
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index c65dfb91f3d..28f6e9883cb 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -2045,3 +2045,17 @@
  (if_then_else (match_test "TARGET_VSX")
   (match_operand 0 "reg_or_cint_operand")
   (match_operand 0 "const_int_operand")))
+
+;; Return true if the operand is a valid Mach-O pic address.
+;;
+(define_predicate "macho_pic_address"
+  (match_code "const,unspec")
+{
+  if (GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+
+  if (GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_MACHOPIC_OFFSET)
+    return CONSTANT_P (XVECEXP (op, 0, 0));
+  else
+    return false;
+})
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a855e8c4c72..9dbab1fc644 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -9028,7 +9028,7 @@ rs6000_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
       else
 	return force_reg (Pmode, x);
     }
-  if (SYMBOL_REF_P (x))
+  if (SYMBOL_REF_P (x) && !TARGET_MACHO)
     {
       enum tls_model model = SYMBOL_REF_TLS_MODEL (x);
       if (model != 0)
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index b2c4590153c..90b2c56d245 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3625,8 +3625,21 @@ process_address_1 (int nop, bool check_only_p,
 		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
 		  if (!valid_address_p (op, &ad, cn))
 		    {
-		      *ad.inner = addr; /* Punt.  */
-		      code = -1;
+		      /* Try to put lo_sum into register.  */
+		      insn = emit_insn (gen_rtx_SET
+					(new_reg,
+					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
+		      code = recog_memoized (insn);
+		      if (code >= 0)
+			{
+			  *ad.inner = new_reg;
+			  if (!valid_address_p (op, &ad, cn))
+			    {
+			      *ad.inner = addr;
+			      code = -1;
+			    }
+			}
+
 		    }
 		}
 	      if (code < 0)
--
  
Segher Boessenkool Feb. 26, 2022, 4:52 p.m. UTC | #11
Hi Iain,

On Thu, Feb 24, 2022 at 04:02:30PM +0000, Iain Sandoe wrote:
> > On 22 Feb 2022, at 14:44, Vladimir Makarov <vmakarov@redhat.com> wrote:
> > On 2022-02-20 12:34, Iain Sandoe wrote:
> >> 
> >> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….
> >> 
> > I'd be really glad if you do manage to fix this w/o changing LRA. Richard has a legitimate point that my proposed change in LRA prohibiting `...;reg=low_sum; ...mem[reg]` might force LRA to generate less optimized code or even might make LRA to generate unrecognized insns `reg = orginal addr` for some ports requiring further fixes in machine-dependent code of the ports.
> 
> I think this is within my remit to push without further review - however I’d very much welcome any comment you folks have: I’d like to push this before my weekly Darwin test run - which is usually started just after the daily bump on Saturday morning.
> 
> The other RS6000 changes remain, as Vlad pointed out we were not being picky enough there - despite getting away with it for longer than I’ve been on the project ;)
> 
> I tested that the patch fixes the problem on 11.2 (for the testcases provided, the bug is latent on master) and causes no regressions on powerpc-darwin9 (master).

Nothing in the patch does anything if TARGET_MACHO isn't true, so it is
all fine with me.  It does look good to me fwiw (the empty constraints
are a bit nasty, but they aren't new).  Okay for trunk wrt rs6000.
Thanks!


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eaba9a2d698..bc3ef0721a4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -8317,8 +8317,14 @@  darwin_rs6000_legitimate_lo_sum_const_p (rtx x, machine_mode mode)
   if (GET_CODE (x) == CONST)
     x = XEXP (x, 0);
 
+  /* If we are building PIC code, then any symbol must be wrapped in an
+     UNSPEC_MACHOPIC_OFFSET so that it will get the picbase subtracted.  */
+  bool machopic_offs_p = false;
   if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_MACHOPIC_OFFSET)
-    x =  XVECEXP (x, 0, 0);
+    {
+      x =  XVECEXP (x, 0, 0);
+      machopic_offs_p = true;
+    }
 
   rtx sym = NULL_RTX;
   unsigned HOST_WIDE_INT offset = 0;
@@ -8349,6 +8355,9 @@  darwin_rs6000_legitimate_lo_sum_const_p (rtx x, machine_mode mode)
   if (sym)
     {
       tree decl = SYMBOL_REF_DECL (sym);
+      /* As noted above, PIC code cannot use a bare SYMBOL_REF.  */
+      if (TARGET_MACHO && flag_pic && !machopic_offs_p)
+	return false;
 #if TARGET_MACHO
       if (MACHO_SYMBOL_INDIRECTION_P (sym))
       /* The decl in an indirection symbol is the original one, which might
@@ -8936,7 +8945,7 @@  legitimate_lo_sum_address_p (machine_mode mode, rtx x, int strict)
     return false;
   x = XEXP (x, 1);
 
-  if (TARGET_ELF || TARGET_MACHO)
+  if (TARGET_ELF)
     {
       bool large_toc_ok;
 
@@ -8962,7 +8971,32 @@  legitimate_lo_sum_address_p (machine_mode mode, rtx x, int strict)
 
       return CONSTANT_P (x) || large_toc_ok;
     }
+  else if (TARGET_MACHO)
+    {
+      if (GET_MODE_NUNITS (mode) != 1)
+	return false;
+      if (GET_MODE_SIZE (mode) > UNITS_PER_WORD
+	  && !(/* see above  */
+	       TARGET_HARD_FLOAT && (mode == DFmode || mode == DDmode)))
+	return false;
+#if TARGET_MACHO
+      if (MACHO_DYNAMIC_NO_PIC_P || !flag_pic)
+	return CONSTANT_P (x);
+#endif
+      /* Macho-O PIC code from here.  */
+      if (GET_CODE (x) == CONST)
+	x = XEXP (x, 0);
+
+      /* SYMBOL_REFs need to be wrapped in an UNSPEC_MACHOPIC_OFFSET.  */
+      if (SYMBOL_REF_P (x))
+	return false;
 
+      /* So this is OK if the wrapped object is const.  */
+      if (GET_CODE (x) == UNSPEC
+	  && XINT (x, 1) == UNSPEC_MACHOPIC_OFFSET)
+	return CONSTANT_P (XVECEXP (x, 0, 0));
+      return CONSTANT_P (x);
+    }
   return false;
 }
 
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index fdff9e0720a..c700c3f4578 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3625,21 +3625,8 @@  process_address_1 (int nop, bool check_only_p,
 		  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
 		  if (!valid_address_p (op, &ad, cn))
 		    {
-		      /* Try to put lo_sum into register.  */
-		      insn = emit_insn (gen_rtx_SET
-					(new_reg,
-					 gen_rtx_LO_SUM (Pmode, new_reg, addr)));
-		      code = recog_memoized (insn);
-		      if (code >= 0)
-			{
-			  *ad.inner = new_reg;
-			  if (!valid_address_p (op, &ad, cn))
-			    {
-			      *ad.inner = addr;
-			      code = -1;
-			    }
-			}
-
+		      *ad.inner = addr; /* Punt.  */
+		      code = -1;
 		    }
 		}
 	      if (code < 0)