middle-end: Use rtx_equal_p in notice_stack_pointer_modification_1 [PR117359]

Message ID CAFULd4YN9BYzNp6b2r0zxOJ5CiPoumPpv6nAZjq8_DKtsT4VrA@mail.gmail.com

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed

Commit Message

Uros Bizjak Nov. 1, 2024, 11:34 a.m. UTC
  Stack pointer modifications in asm are currently not flagged in
crtl->sp_is_unchanging due to RTX pointer comparison in
notice_stack_pointer_modification_1.  Pointer comparison does not detect
that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
RTXes actually both correspond to the same stack pointer register.

Due to the above omission, the compiler does not detect that the asm RTX
manipulates the stack pointer in the following construct:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

  asm volatile ("pushfq; push %1; pop %0; popfq"
        : "=r" (y), ASM_CALL_CONSTRAINT
        : "e" (-1)
        : "memory");
--cut here--

which corresponds to the following RTX:

(insn:TI 17 15 3 2 (parallel [
            (set (reg:DI 0 ax [orig:114 y ] [114])
                (asm_operands/v:DI ("pushfq; push %1; pop %0; popfq") ("=r") 0 [
                        (const_int -1 [0xffffffffffffffff])
                        (reg/v:DI 7 sp [ current_stack_pointer ])
                    ]
                     [
                        (asm_input:SI ("e") pr117359.c:23)
                        (asm_input:DI ("1") pr117359.c:23)
                    ]
                     [] pr117359.c:23))
            (set (reg/v:DI 7 sp [ current_stack_pointer ])
                (asm_operands/v:DI ("pushfq; push %1; pop %0; popfq") ("=r") 1 [
                        (const_int -1 [0xffffffffffffffff])
                        (reg/v:DI 7 sp [ current_stack_pointer ])
                    ]
                     [
                        (asm_input:SI ("e") pr117359.c:23)
                        (asm_input:DI ("1") pr117359.c:23)
                    ]
                     [] pr117359.c:23))
            (clobber (mem:BLK (scratch) [0  A8]))
            (clobber (reg:CC 17 flags))
        ]) "pr117359.c":23:3 -1
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_UNUSED (reg:DI 0 ax [orig:114 y ] [114])
            (nil))))

The discussion in PR117359 (and its dependent PR117312) suggested
adding "rsp" to the list of clobbered registers instead. However, in
this case the compiler warns with:

warning: listing the stack pointer register ‘rsp’ in a clobber list is
deprecated [-Wdeprecated]
   23 |   asm volatile ("pushfq; push %1; pop %0; popfq"
      |   ^~~
note: the value of the stack pointer after an ‘asm’ statement must be
the same as it was before the statement

But in the above case, "(reg/f:DI 7 sp)" is indeed passed to
notice_stack_pointer_modification_1, where it satisfies the pointer
comparison, so crtl->sp_is_unchanging is cleared and consequently the
redzone for the x86 target is not constructed.

The original source, however, does *not* clobber the stack pointer: its
value after the asm statement is still the same as it was before the
statement (as the note requests), so the clobber would be misleading. The
asm does change memory locations that depend on the stack pointer, but to
quote Jakub from PR117312c#19 regarding "memory" and "rsp" clobbers:

--q--
"memory" clobber is IMO about possibly changing any user var in memory
behind the back of the compiler, not about changing whatever compiler
internals stored somewhere on the stack in stack slots that don't have
address taken and could escape to the inline asm.
So, compiler doesn't need to assume the inline asm could have say changed
the saved frame pointer or return address on the stack, or stuff temporarily
pushed into the red zone, stack canary, pseudos pushed temporarily to stack
during reload, ...
Because there is no way the compiler can cope with that being changed,
reloading is exactly what the compiler does if say inline asm clobbers
a register in which something that needs to live across the inline asm
is stored.
--/q--

x86 enables the redzone based on crtl->sp_is_unchanging in:

  if (ix86_using_red_zone ()
      && crtl->sp_is_unchanging
      && crtl->is_leaf
      && !ix86_pc_thunk_call_expanded
      && !ix86_current_function_calls_tls_descriptor)

and when a builtin function that manipulates the stack is used
(e.g. __builtin_ia32_readeflags_u64, which pushes eflags to the stack
and pops the value from the stack into a register),
notice_stack_pointer_modification_1 detects a write to a stack pointer
and clears crtl->sp_is_unchanging.

Based on the above discussion, the patch improves the detection of the
stack pointer RTX in notice_stack_pointer_modification_1 so that
crtl->sp_is_unchanging is correctly cleared even when the stack pointer
is declared in the source and used as an output of the asm.
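The failure mode can be illustrated with a tiny stand-alone model; mini_reg and mini_reg_equal_p below are hypothetical names invented for illustration and are not GCC internals.

```c
#include <stdbool.h>

/* Two distinct objects can describe the same hard register.  A pointer
   comparison ("a == b") misses the match; a field-wise structural
   comparison, in the spirit of rtx_equal_p, catches it.  */
struct mini_reg
{
  int mode;   /* e.g. a code standing in for DImode */
  int regno;  /* e.g. 7 for sp on x86-64 */
};

static bool
mini_reg_equal_p (const struct mini_reg *a, const struct mini_reg *b)
{
  return a == b || (a->mode == b->mode && a->regno == b->regno);
}
```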

    PR middle-end/117359

gcc/ChangeLog:

    * stack-ptr-mod.cc (notice_stack_pointer_modification_1):
    Use rtx_equal_p to check x for stack_pointer_rtx.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr117359.c: New test.
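The patch itself is not included in this excerpt; based on the ChangeLog entry, the change presumably amounts to something along these lines (a sketch only, with the function's surrounding context elided):

```diff
--- a/gcc/stack-ptr-mod.cc
+++ b/gcc/stack-ptr-mod.cc
 static void
 notice_stack_pointer_modification_1 (rtx x, ...)
 {
-  if (x == stack_pointer_rtx
+  if (rtx_equal_p (x, stack_pointer_rtx)
       ...)
     crtl->sp_is_unchanging = 0;
 }
```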

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and backports?

Thanks,
Uros.
  

Comments

Jeff Law Nov. 1, 2024, 1:18 p.m. UTC | #1
On 11/1/24 5:34 AM, Uros Bizjak wrote:
> Stack pointer modifications in asm are currently not flagged in
> crtl->sp_is_unchanging due to RTX pointer comparison in
> notice_stack_pointer_modification_1.  Pointer comparison does not detect
> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
> RTXes actually all correspond to the same stack pointer register.
> 
> Due to the above omission, the compiler does not detect that asm RTX
> manipulates stack pointer in the following construct:
But how did you get two distinct RTXs for the stack pointer?  That's not 
supposed to happen IIRC.


Jeff
  
Uros Bizjak Nov. 1, 2024, 2:45 p.m. UTC | #2
On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 11/1/24 5:34 AM, Uros Bizjak wrote:
> > Stack pointer modifications in asm are currently not flagged in
> > crtl->sp_is_unchanging due to RTX pointer comparison in
> > notice_stack_pointer_modification_1.  Pointer comparison does not detect
> > that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
> > RTXes actually all correspond to the same stack pointer register.
> >
> > Due to the above omission, the compiler does not detect that asm RTX
> > manipulates stack pointer in the following construct:
> But how did you get two distinct RTXs for the stack pointer?  That's not
> supposed to happen IIRC.

Please see the testcase in the patch:

+register unsigned long current_stack_pointer asm ("%rsp");
+#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

[...]

+  asm volatile ("pushfq; push %1; pop %0; popfq"
+ : "=r" (y), ASM_CALL_CONSTRAINT
+ : "e" (-1));

When this is compiled for x86_64, a stack pointer RTX that is different
from the generic stack_pointer_rtx is created.

Uros.
  
Jeff Law Nov. 1, 2024, 2:51 p.m. UTC | #3
On 11/1/24 8:45 AM, Uros Bizjak wrote:
> On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 11/1/24 5:34 AM, Uros Bizjak wrote:
>>> Stack pointer modifications in asm are currently not flagged in
>>> crtl->sp_is_unchanging due to RTX pointer comparison in
>>> notice_stack_pointer_modification_1.  Pointer comparison does not detect
>>> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
>>> RTXes actually all correspond to the same stack pointer register.
>>>
>>> Due to the above omission, the compiler does not detect that asm RTX
>>> manipulates stack pointer in the following construct:
>> But how did you get two distinct RTXs for the stack pointer?  That's not
>> supposed to happen IIRC.
> 
> Please see the testcase in the patch:
> 
> +register unsigned long current_stack_pointer asm ("%rsp");
> +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> 
> [...]
> 
> +  asm volatile ("pushfq; push %1; pop %0; popfq"
> + : "=r" (y), ASM_CALL_CONSTRAINT
> + : "e" (-1));
> 
> When compiled for x86_64, the stack pointer RTX that is different from
> the generic stack pointer RTX is created.
If we're allowing that, then I suspect there's a good number of places 
that will ultimately need to be fixed.  It's baked in pretty deep that 
there is only one stack pointer.

So I think we need to conclude whether or not the testcase is valid or 
not first.

jeff
  
Uros Bizjak Nov. 1, 2024, 4:44 p.m. UTC | #4
On Fri, Nov 1, 2024 at 3:51 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 11/1/24 8:45 AM, Uros Bizjak wrote:
> > On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 11/1/24 5:34 AM, Uros Bizjak wrote:
> >>> Stack pointer modifications in asm are currently not flagged in
> >>> crtl->sp_is_unchanging due to RTX pointer comparison in
> >>> notice_stack_pointer_modification_1.  Pointer comparison does not detect
> >>> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
> >>> RTXes actually all correspond to the same stack pointer register.
> >>>
> >>> Due to the above omission, the compiler does not detect that asm RTX
> >>> manipulates stack pointer in the following construct:
> >> But how did you get two distinct RTXs for the stack pointer?  That's not
> >> supposed to happen IIRC.
> >
> > Please see the testcase in the patch:
> >
> > +register unsigned long current_stack_pointer asm ("%rsp");
> > +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> >
> > [...]
> >
> > +  asm volatile ("pushfq; push %1; pop %0; popfq"
> > + : "=r" (y), ASM_CALL_CONSTRAINT
> > + : "e" (-1));
> >
> > When compiled for x86_64, the stack pointer RTX that is different from
> > the generic stack pointer RTX is created.
> If we're allowing that, then I suspect there's a good number of places
> that will ultimately need to be fixed.  It's baked in pretty deep that
> there is only one stack pointer.

grepping for "== stack_pointer_rtx" in gcc/ and its subdirectories
returns 178 hits, which doesn't look that bad.

> So I think we need to conclude whether or not the testcase is valid or
> not first.

This is what x86 linux kernel uses (arch/x86/include/asm/asm.h) as a
scheduling barrier to prevent asm from being scheduled before stack
frame is constructed:

--cut here--
/*
 * This output constraint should be used for any inline asm which has a "call"
 * instruction.  Otherwise the asm may be inserted before the frame pointer
 * gets set up by the containing function.  If you forget to do this, objtool
 * may print a "call without frame pointer save/setup" warning.
 */
register unsigned long current_stack_pointer asm(_ASM_SP);
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
--cut here--

since "%rbp" can't be used as a global register variable when
-fno-omit-frame-pointer is in effect:

--cut here--
register unsigned long current_frame_pointer asm ("%rbp");
#define ASM_CALL_CONSTRAINT "+r" (current_frame_pointer)
--cut here--

In function ‘asm_clobbers_redzone’:
error: frame pointer required, but reserved
   8 | asm_clobbers_redzone (void)
     | ^~~~~~~~~~~~~~~~~~~~
note: for ‘current_frame_pointer’
   1 | register unsigned long current_frame_pointer asm ("%rbp");
     |                        ^~~~~~~~~~~~~~~~~~~~~

OTOH, declaring and using %rsp as a global register variable doesn't
error out, so one would expect that it would be used throughout the
compiler as an alias to the internal stack pointer. The presented
testcase exposes one place where this assumption doesn't hold.

Uros.
  
Jeff Law Nov. 5, 2024, 12:08 a.m. UTC | #5
On 11/1/24 10:44 AM, Uros Bizjak wrote:
> On Fri, Nov 1, 2024 at 3:51 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>> On 11/1/24 8:45 AM, Uros Bizjak wrote:
>>> On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/1/24 5:34 AM, Uros Bizjak wrote:
>>>>> Stack pointer modifications in asm are currently not flagged in
>>>>> crtl->sp_is_unchanging due to RTX pointer comparison in
>>>>> notice_stack_pointer_modification_1.  Pointer comparison does not detect
>>>>> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
>>>>> RTXes actually all correspond to the same stack pointer register.
>>>>>
>>>>> Due to the above omission, the compiler does not detect that asm RTX
>>>>> manipulates stack pointer in the following construct:
>>>> But how did you get two distinct RTXs for the stack pointer?  That's not
>>>> supposed to happen IIRC.
>>>
>>> Please see the testcase in the patch:
>>>
>>> +register unsigned long current_stack_pointer asm ("%rsp");
>>> +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
>>>
>>> [...]
>>>
>>> +  asm volatile ("pushfq; push %1; pop %0; popfq"
>>> + : "=r" (y), ASM_CALL_CONSTRAINT
>>> + : "e" (-1));
>>>
>>> When compiled for x86_64, the stack pointer RTX that is different from
>>> the generic stack pointer RTX is created.
>> If we're allowing that, then I suspect there's a good number of places
>> that will ultimately need to be fixed.  It's baked in pretty deep that
>> there is only one stack pointer.
> 
> grepping for "== stack_pointer_rtx" in gcc/ and its subdirectories
> returns 178 hits, which doesn't look that bad.
But if we allow this case, then we probably need to fix the frame 
pointer, hard frame pointer and arg pointer cases too.

> 
>> So I think we need to conclude whether or not the testcase is valid or
>> not first.
> 
> This is what x86 linux kernel uses (arch/x86/include/asm/asm.h) as a
> scheduling barrier to prevent asm from being scheduled before stack
> frame is constructed:
Just because the kernel does it, doesn't make it right :-)

~~~~~~~~~~~~~~
> 
> OTOH, declaring and using %rsp as a global register variable doesn't
> error out, so one would expect that it would be used throughout the
> compiler as an alias to the internal stack pointer. The presented
> testcase exposes one place where this assumption doesn't hold.
But maybe it should error out or should be transformed internally to 
reference the singleton rtx object.

Jeff
  
Uros Bizjak Nov. 5, 2024, 7:53 a.m. UTC | #6
On Tue, Nov 5, 2024 at 1:08 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 11/1/24 10:44 AM, Uros Bizjak wrote:
> > On Fri, Nov 1, 2024 at 3:51 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >> On 11/1/24 8:45 AM, Uros Bizjak wrote:
> >>> On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 11/1/24 5:34 AM, Uros Bizjak wrote:
> >>>>> Stack pointer modifications in asm are currently not flagged in
> >>>>> crtl->sp_is_unchanging due to RTX pointer comparison in
> >>>>> notice_stack_pointer_modification_1.  Pointer comparison does not detect
> >>>>> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
> >>>>> RTXes actually all correspond to the same stack pointer register.
> >>>>>
> >>>>> Due to the above omission, the compiler does not detect that asm RTX
> >>>>> manipulates stack pointer in the following construct:
> >>>> But how did you get two distinct RTXs for the stack pointer?  That's not
> >>>> supposed to happen IIRC.
> >>>
> >>> Please see the testcase in the patch:
> >>>
> >>> +register unsigned long current_stack_pointer asm ("%rsp");
> >>> +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> >>>
> >>> [...]
> >>>
> >>> +  asm volatile ("pushfq; push %1; pop %0; popfq"
> >>> + : "=r" (y), ASM_CALL_CONSTRAINT
> >>> + : "e" (-1));
> >>>
> >>> When compiled for x86_64, the stack pointer RTX that is different from
> >>> the generic stack pointer RTX is created.
> >> If we're allowing that, then I suspect there's a good number of places
> >> that will ultimately need to be fixed.  It's baked in pretty deep that
> >> there is only one stack pointer.
> >
> > grepping for "== stack_pointer_rtx" in gcc/ and its subdirectories
> > returns 178 hits, which doesn't look that bad.
> But if we allow this case, then we probably need to fix the frame
> pointer, hard frame pointer and arg pointer cases too.

I don't think it is possible to declare a global register that would
conflict with "%frame", "%argp", or "%rbp" (with
-fno-omit-frame-pointer). Declaring one of these as a global register
in the following source:

--cut here--
register unsigned long hardreg asm ("%frame");
#define ASM_CONSTRAINT "+r" (hardreg)

void foo (void)
{
  asm volatile ("#" : ASM_CONSTRAINT);
}
--cut here--

errors out with:

error: register specified for ‘hardreg’ is an internal GCC implementation detail

while declaring global register with "%rbp" errors out with:

error: frame pointer required, but reserved

when -fno-omit-frame-pointer is in effect.

> >
> >> So I think we need to conclude whether or not the testcase is valid or
> >> not first.
> >
> > This is what x86 linux kernel uses (arch/x86/include/asm/asm.h) as a
> > scheduling barrier to prevent asm from being scheduled before stack
> > frame is constructed:
> Just because the kernel does it, doesn't make it right :-)

True, it was not my intention to declare the kernel's usage the correct
one ;). There is in fact a PR [1] asking what the correct construct is
to prevent scheduling in front of the frame pointer initialization (and
other issues with changing SP) in the case when the assembly changes
(not clobbers!) the stack pointer.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117311

> ~~~~~~~~~~~~~~
> >
> > OTOH, declaring and using %rsp as a global register variable doesn't
> > error out, so one would expect that it would be used throughout the
> > compiler as an alias to the internal stack pointer. The presented
> > testcase exposes one place where this assumption doesn't hold.
> But maybe it should error out or should be transformed internally to
> reference the singleton rtx object.

Adding a SP clobber to asm:

void bar (void)
{
  asm volatile ("#" : : : "rsp");
}

results in:

warning: listing the stack pointer register ‘rsp’ in a clobber list is
deprecated [-Wdeprecated]
note: the value of the stack pointer after an ‘asm’ statement must be
the same as it was before the statement

But what is the non-deprecated way to communicate to the compiler the
fact that SP changes in the asm, and that the asm possibly clobbers the
stack?

Uros.
  
Richard Biener Nov. 5, 2024, 9:38 a.m. UTC | #7
On Tue, Nov 5, 2024 at 8:54 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 1:08 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >
> >
> >
> > On 11/1/24 10:44 AM, Uros Bizjak wrote:
> > > On Fri, Nov 1, 2024 at 3:51 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> > >>
> > >> On 11/1/24 8:45 AM, Uros Bizjak wrote:
> > >>> On Fri, Nov 1, 2024 at 2:18 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 11/1/24 5:34 AM, Uros Bizjak wrote:
> > >>>>> Stack pointer modifications in asm are currently not flagged in
> > >>>>> crtl->sp_is_unchanging due to RTX pointer comparison in
> > >>>>> notice_stack_pointer_modification_1.  Pointer comparison does not detect
> > >>>>> that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)"
> > >>>>> RTXes actually all correspond to the same stack pointer register.
> > >>>>>
> > >>>>> Due to the above omission, the compiler does not detect that asm RTX
> > >>>>> manipulates stack pointer in the following construct:
> > >>>> But how did you get two distinct RTXs for the stack pointer?  That's not
> > >>>> supposed to happen IIRC.
> > >>>
> > >>> Please see the testcase in the patch:
> > >>>
> > >>> +register unsigned long current_stack_pointer asm ("%rsp");
> > >>> +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> > >>>
> > >>> [...]
> > >>>
> > >>> +  asm volatile ("pushfq; push %1; pop %0; popfq"
> > >>> + : "=r" (y), ASM_CALL_CONSTRAINT
> > >>> + : "e" (-1));
> > >>>
> > >>> When compiled for x86_64, the stack pointer RTX that is different from
> > >>> the generic stack pointer RTX is created.
> > >> If we're allowing that, then I suspect there's a good number of places
> > >> that will ultimately need to be fixed.  It's baked in pretty deep that
> > >> there is only one stack pointer.
> > >
> > > grepping for "== stack_pointer_rtx" in gcc/ and its subdirectories
> > > returns 178 hits, which doesn't look that bad.
> > But if we allow this case, then we probably need to fix the frame
> > pointer, hard frame pointer and arg pointer cases too.
>
> I don't think it is possible to declare a global register that would
> conflict with "%frame" or "%argp" and "%rbp" (with
> -fno-omit-frame-pointer). Declaring these as global register in the
> following source:
>
> --cut here--
> register unsigned long hardreg asm ("%frame");
> #define ASM_CONSTRAINT "+r" (hardreg)
>
> void foo (void)
> {
>   asm volatile ("#" : ASM_CONSTRAINT);
> }
> --cut here--
>
> errors out with:
>
> error: register specified for ‘hardreg’ is an internal GCC implementation detail
>
> while declaring global register with "%rbp" errors out with:
>
> error: frame pointer required, but reserved
>
> when -fno-omit-frame-pointer is in effect.
>
> > >
> > >> So I think we need to conclude whether or not the testcase is valid or
> > >> not first.
> > >
> > > This is what x86 linux kernel uses (arch/x86/include/asm/asm.h) as a
> > > scheduling barrier to prevent asm from being scheduled before stack
> > > frame is constructed:
> > Just because the kernel does it, doesn't make it right :-)
>
> True, it was not my intention to declare the usage in the kernel the
> correct one ;) . There is in fact PR [1] with a request for what is
> the correct construct to use to prevent scheduling in front of the
> frame pointer initialization (and other issues with changing SP) in
> case when assembly changes (not clobbers!) the stack pointer.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117311
>
> > ~~~~~~~~~~~~~~
> > >
> > > OTOH, declaring and using %rsp as a global register variable doesn't
> > > error out, so one would expect that it would be used throughout the
> > > compiler as an alias to the internal stack pointer. The presented
> > > testcase exposes one place where this assumption doesn't hold.
> > But maybe it should error out or should be transformed internally to
> > reference the singleton rtx object.
>
> Adding a SP clobber to asm:
>
> void bar (void)
> {
>   asm volatile ("#" : : : "rsp");
> }
>
> results in:
>
> warning: listing the stack pointer register ‘rsp’ in a clobber list is
> deprecated [-Wdeprecated]
> note: the value of the stack pointer after an ‘asm’ statement must be
> the same as it was before the statement
>
> But what is the non-deprecated way to communicate the fact that SP
> changes, and possibly clobbers stack in the asm to the compiler?

I don't think there's an existing way to tell the compiler that "arbitrary stack
locations" are clobbered (SP doesn't really change).  Only stack portions
allocated to variables are considered clobbered with a "memory" clobber.

I suggest writing an assembly routine if people want to scribble on
random %sp-based addresses.

I'll note that those "random" (not variable-backed) stack locations
might be used by the compiler, for example for spill slots; there's
also no way to
prevent or communicate the allocation of stack for such purposes, so the
asm() would have to preserve all stack content and not only the stack
pointer.

Richard.

>
> Uros.
  
Andreas Schwab Nov. 5, 2024, 10:42 a.m. UTC | #8
On Nov 05 2024, Uros Bizjak wrote:

> But what is the non-deprecated way to communicate the fact that SP
> changes, and possibly clobbers stack in the asm to the compiler?

Since an asm statement is not allowed to change SP there is no need for
that.
  
Uros Bizjak Nov. 5, 2024, 10:49 a.m. UTC | #9
On Tue, Nov 5, 2024 at 10:38 AM Richard Biener
<richard.guenther@gmail.com> wrote:

> > > >> So I think we need to conclude whether or not the testcase is valid or
> > > >> not first.
> > > >
> > > > This is what x86 linux kernel uses (arch/x86/include/asm/asm.h) as a
> > > > scheduling barrier to prevent asm from being scheduled before stack
> > > > frame is constructed:
> > > Just because the kernel does it, doesn't make it right :-)
> >
> > True, it was not my intention to declare the usage in the kernel the
> > correct one ;) . There is in fact PR [1] with a request for what is
> > the correct construct to use to prevent scheduling in front of the
> > frame pointer initialization (and other issues with changing SP) in
> > case when assembly changes (not clobbers!) the stack pointer.
> >
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117311
> >
> > > ~~~~~~~~~~~~~~
> > > >
> > > > OTOH, declaring and using %rsp as a global register variable doesn't
> > > > error out, so one would expect that it would be used throughout the
> > > > compiler as an alias to the internal stack pointer. The presented
> > > > testcase exposes one place where this assumption doesn't hold.
> > > But maybe it should error out or should be transformed internally to
> > > reference the singleton rtx object.
> >
> > Adding a SP clobber to asm:
> >
> > void bar (void)
> > {
> >   asm volatile ("#" : : : "rsp");
> > }
> >
> > results in:
> >
> > warning: listing the stack pointer register ‘rsp’ in a clobber list is
> > deprecated [-Wdeprecated]
> > note: the value of the stack pointer after an ‘asm’ statement must be
> > the same as it was before the statement
> >
> > But what is the non-deprecated way to communicate the fact that SP
> > changes, and possibly clobbers stack in the asm to the compiler?
>
> I don't think there's an existing way to tell the compiler that "arbitrary stack
> locations" are clobbered (SP doesn't really change).  Only stack portions
> allocated to variables are considered clobbered with a "memory" clobber.

Perhaps I was not clear enough; the asm is not allowed to, and will
not, clobber the frame that is allocated and used by the function. IOW,
when the function allocates its frame, the asm will not touch it. When
the frame is set up, we have this situation:

---- previous SP
function frame
---- current SP
<- we are here

Here, the asm pushes some value to the stack, either directly using
push/pop or indirectly using call insn.

Considering the case without redzone, if the function frame is
correctly set up, then the asm can push and pop values at will,
assuming that at the end it sets the SP value to its original value.
To prevent unwanted scheduling of the asm (where the asm is moved before
stack frame creation or after stack frame teardown), the Linux kernel
uses ASM_CALL_CONSTRAINT to make the asm depend on RSP.

Now, we would like to compile the (kernel) source with redzone support
(-mred-zone). In this case we have to prevent redzone creation when the
function uses an asm that "hides" a call insn (or push/pop). A function
that uses an asm with a hidden call instruction is *not* a leaf anymore,
and if it is still classified as one, the call or push/pop combo inside
the asm clobbers the *redzone* of the supposed leaf function.

The compiler already detects when push/pop RTX is used in the
instruction stream and cancels redzone creation (via
crtl->sp_is_unchanging). However, the compiler currently doesn't
detect when SP is used as an output register of the asm (when
ASM_CALL_CONSTRAINT is used), because it uses *pointer comparison*
with stack_pointer_rtx instead of rtx_equal_p (reg,
stack_pointer_rtx). Pointer comparison does not detect that "(reg/v:DI
7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)" RTXes actually
both correspond to the same stack pointer register.

The above pointer comparisons with stack_pointer_rtx are used
throughout the compiler source, so if one is able to manually create
an RTX that corresponds to the same stack pointer register number, the
compiler won't detect equality. This is a deficiency I would like to
point out.

Uros.
  
Uros Bizjak Nov. 5, 2024, 11 a.m. UTC | #10
On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> On Nov 05 2024, Uros Bizjak wrote:
>
> > But what is the non-deprecated way to communicate the fact that SP
> > changes, and possibly clobbers stack in the asm to the compiler?
>
> Since an asm statement is not allowed to change SP there is no need for
> that.

asm volatile ("pushfq; popfq %0" : "=r"(x));

doesn't change SP at any sequence point, doesn't clobber the function
frame, but the function using it shouldn't create redzone. How to
communicate this to the compiler?

Uros.
  
Uros Bizjak Nov. 5, 2024, 11:03 a.m. UTC | #11
On Tue, Nov 5, 2024 at 12:00 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
> >
> > On Nov 05 2024, Uros Bizjak wrote:
> >
> > > But what is the non-deprecated way to communicate the fact that SP
> > > changes, and possibly clobbers stack in the asm to the compiler?
> >
> > Since an asm statement is not allowed to change SP there is no need for
> > that.
>
> asm volatile ("pushfq; popfq %0" : "=r"(x));
>
> doesn't change SP at any sequence point, doesn't clobber the function
> frame, but the function using it shouldn't create redzone. How to
> communicate this to the compiler?

"shouldn't use redzone"

Uros.
  
Jakub Jelinek Nov. 5, 2024, 11:19 a.m. UTC | #12
On Tue, Nov 05, 2024 at 12:00:24PM +0100, Uros Bizjak wrote:
> On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
> >
> > On Nov 05 2024, Uros Bizjak wrote:
> >
> > > But what is the non-deprecated way to communicate the fact that SP
> > > changes, and possibly clobbers stack in the asm to the compiler?
> >
> > Since an asm statement is not allowed to change SP there is no need for
> > that.
> 
> asm volatile ("pushfq; popfq %0" : "=r"(x));
> 
> doesn't change SP at any sequence point, doesn't clobber the function
> frame, but the function using it shouldn't create redzone. How to
> communicate this to the compiler?

One possibility would be to make __attribute__((target ("no-red-zone")))
work (I think currently the option isn't TargetSave or isn't listed
somewhere, so it doesn't work).

And the other one is IMHO making it explicit that it clobbers the red zone,
so "redzone" or "red-zone" clobber (or some other word, but can't be easily
stack-below-sp because we have hppa with stack growing in the wrong
direction).

	Jakub
  
Andreas Schwab Nov. 5, 2024, 11:49 a.m. UTC | #13
On Nov 05 2024, Uros Bizjak wrote:

> asm volatile ("pushfq; popfq %0" : "=r"(x));
>
> doesn't change SP at any sequence point, doesn't clobber the function
> frame, but the function using it shouldn't create redzone. How to
> communicate this to the compiler?

Saying that SP is changed (when it isn't) certainly isn't the right way.
  
Uros Bizjak Nov. 5, 2024, 1:47 p.m. UTC | #14
On Tue, Nov 5, 2024 at 12:19 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 12:00:24PM +0100, Uros Bizjak wrote:
> > On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > >
> > > On Nov 05 2024, Uros Bizjak wrote:
> > >
> > > > But what is the non-deprecated way to communicate the fact that SP
> > > > changes, and possibly clobbers stack in the asm to the compiler?
> > >
> > > Since an asm statement is not allowed to change SP there is no need for
> > > that.
> >
> > asm volatile ("pushfq; popfq %0" : "=r"(x));
> >
> > doesn't change SP at any sequence point, doesn't clobber the function
> > frame, but the function using it shouldn't create redzone. How to
> > communicate this to the compiler?
>
> One possibility would be make __attribute__((target ("no-red-zone")))
> working (I think currently the option isn't TargetSave or isn't listed
> somewhere, so it doesn't work).

This won't work if we have a macro that is used in several functions.
We would have to manually add the above attribute to all functions that
inline the macro with the redzone-clobbering asm.

> And the other  one is IMHO making it explicit that it clobbers the red zone,
> so "redzone" or "red-zone" clobber (or some other word, but can't be easily
> stack-below-sp because we have hppa with stack growing in the wrong
> direction).

We would have to define a memory clobber that would correspond to
redzone and attach it to the asm parallel RTX, perhaps:

(clobber (mem:BLK (reg:DI sp)))

that would ultimately clear "crtl->is_leaf" when detected.

Uros.
  
Jakub Jelinek Nov. 5, 2024, 2:05 p.m. UTC | #15
On Tue, Nov 05, 2024 at 02:47:15PM +0100, Uros Bizjak wrote:
> We would have to define a memory clobber that would correspond to
> redzone and attach it to the asm parallel RTX, perhaps:
> 
> (clobber (mem:BLK (reg:DI sp)))

That wouldn't represent what is actually clobbered.
That says everything from sp up is clobbered, while red zone clobber
means everything below it is, so it could be
(clobber (mem:BLK (plus:DI (reg:DI sp) (const_int -red_zone_size))))
with the appropriate red_zone_size as MEM_SIZE.
> 
> that would ultimately clear "crtl->is_leaf" when detected.

	Jakub
  
Richard Biener Nov. 5, 2024, 2:49 p.m. UTC | #16
On Tue, Nov 5, 2024 at 2:47 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 12:19 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Nov 05, 2024 at 12:00:24PM +0100, Uros Bizjak wrote:
> > > On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > >
> > > > On Nov 05 2024, Uros Bizjak wrote:
> > > >
> > > > > But what is the non-deprecated way to communicate the fact that SP
> > > > > changes, and possibly clobbers stack in the asm to the compiler?
> > > >
> > > > Since an asm statement is not allowed to change SP there is no need for
> > > > that.
> > >
> > > asm volatile ("pushfq; popfq %0" : "=r"(x));
> > >
> > > doesn't change SP at any sequence point, doesn't clobber the function
> > > frame, but the function using it shouldn't create redzone. How to
> > > communicate this to the compiler?
> >
> > One possibility would be make __attribute__((target ("no-red-zone")))
> > working (I think currently the option isn't TargetSave or isn't listed
> > somewhere, so it doesn't work).
>
> This won't work if we have a macro that is used in several functions.
> We will have to manually add the above attribute to all functions that
> inline the macro with redzone clobbering asm.
>
> > And the other  one is IMHO making it explicit that it clobbers the red zone,
> > so "redzone" or "red-zone" clobber (or some other word, but can't be easily
> > stack-below-sp because we have hppa with stack growing in the wrong
> > direction).
>
> We would have to define a memory clobber that would correspond to
> redzone and attach it to the asm parallel RTX, perhaps:
>
> (clobber (mem:BLK (reg:DI sp)))
>
> that would ultimately clear "crtl->is_leaf" when detected.

Maybe never make functions having any volatile asm() "leaf"?  Thus
require 'volatile' to be present - aka the asm has side-effects that
are not fully encoded in the constraints.

Richard.

> Uros.
  
Uros Bizjak Nov. 5, 2024, 4:56 p.m. UTC | #17
On Tue, Nov 5, 2024 at 3:49 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 2:47 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 12:19 PM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Tue, Nov 05, 2024 at 12:00:24PM +0100, Uros Bizjak wrote:
> > > > On Tue, Nov 5, 2024 at 11:43 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > > >
> > > > > On Nov 05 2024, Uros Bizjak wrote:
> > > > >
> > > > > > But what is the non-deprecated way to communicate the fact that SP
> > > > > > changes, and possibly clobbers stack in the asm to the compiler?
> > > > >
> > > > > Since an asm statement is not allowed to change SP there is no need for
> > > > > that.
> > > >
> > > > asm volatile ("pushfq; popfq %0" : "=r"(x));
> > > >
> > > > doesn't change SP at any sequence point, doesn't clobber the function
> > > > frame, but the function using it shouldn't create redzone. How to
> > > > communicate this to the compiler?
> > >
> > > One possibility would be make __attribute__((target ("no-red-zone")))
> > > working (I think currently the option isn't TargetSave or isn't listed
> > > somewhere, so it doesn't work).
> >
> > This won't work if we have a macro that is used in several functions.
> > We will have to manually add the above attribute to all functions that
> > inline the macro with redzone clobbering asm.
> >
> > > And the other  one is IMHO making it explicit that it clobbers the red zone,
> > > so "redzone" or "red-zone" clobber (or some other word, but can't be easily
> > > stack-below-sp because we have hppa with stack growing in the wrong
> > > direction).
> >
> > We would have to define a memory clobber that would correspond to
> > redzone and attach it to the asm parallel RTX, perhaps:
> >
> > (clobber (mem:BLK (reg:DI sp)))
> >
> > that would ultimately clear "crtl->is_leaf" when detected.
>
> Maybe never make functions having any volatile asm() "leaf"?  Thus
> require 'volatile' to be present - aka the asm has side-effects that
> are not fully encoded in the constraints.

I think this would work!

Although this *might* be somewhat of a big-hammer approach, disabling
the redzone optimization for leaf functions with volatile asm is not that
problematic. Nowadays, small functions are inlined, so setting up and
tearing down a frame is not that performance critical.

And as you said - conceptually, volatile covers "unspecified" side
effects. asm volatile might change the stack (not the stack pointer!) as a
side effect, so a function with asm volatile cannot be leaf.

We can document that any stack manipulation inside asm requires the asm to
be volatile; that would also satisfy PR117311.

Uros.
  
Jakub Jelinek Nov. 5, 2024, 5:08 p.m. UTC | #18
On Tue, Nov 05, 2024 at 05:56:09PM +0100, Uros Bizjak wrote:
> > Maybe never make functions having any volatile asm() "leaf"?  Thus
> > require 'volatile' to be present - aka the asm has side-effects that
> > are not fully encoded in the constraints.
> 
> I think this would work!
> 
> Although this *might* be somehow a big hammer approach, disabling a

I think it will unnecessarily punish lots of code in the wild.
Most asm volatile doesn't do calls in it, and those that do have learned
in the past 20 years that on x86_64 there is a red zone and that they need
to first subtract the red zone size from sp and restore it at the end.

	Jakub
  
Richard Biener Nov. 5, 2024, 5:41 p.m. UTC | #19
> Am 05.11.2024 um 18:08 schrieb Jakub Jelinek <jakub@redhat.com>:
> 
> On Tue, Nov 05, 2024 at 05:56:09PM +0100, Uros Bizjak wrote:
>>> Maybe never make functions having any volatile asm() "leaf"?  Thus
>>> require 'volatile' to be present - aka the asm has side-effects that
>>> are not fully encoded in the constraints.
>> 
>> I think this would work!
>> 
>> Although this *might* be somehow a big hammer approach, disabling a
> 
> I think it will unnecessarily punish lots of code in the wild.

I don’t think it will matter in practice?  What else should volatile be for, and why should it exclude this particular side effect that cannot be expressed by constraints?  IMHO correctness trumps optimization here, unless we want to retroactively document that the red zone is off limits for any asm()?

Richard 

> Most asm volatile doesn't do calls in it, or if they do, they have learned
> in the past 20 years that on x86_64 there is a red zone and they need to
> first subtract red zone size from sp and restore it at the end.
> 
>    Jakub
>
  
Uros Bizjak Nov. 5, 2024, 5:55 p.m. UTC | #20
On Tue, Nov 5, 2024 at 6:08 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 05:56:09PM +0100, Uros Bizjak wrote:
> > > Maybe never make functions having any volatile asm() "leaf"?  Thus
> > > require 'volatile' to be present - aka the asm has side-effects that
> > > are not fully encoded in the constraints.
> >
> > I think this would work!
> >
> > Although this *might* be somehow a big hammer approach, disabling a
>
> I think it will unnecessarily punish lots of code in the wild.
> Most asm volatile doesn't do calls in it, or if they do, they have learned
> in the past 20 years that on x86_64 there is a red zone and they need to
> first subtract red zone size from sp and restore it at the end.

The difference between redzoned and non-redzoned code is *two*
instructions that decrease and increase the stack pointer, which I think
is an acceptable compromise between correctness and performance. Also
important is that the approach works for all targets. My two
eurocents...

Attached RFC patch is all it is needed for the testcase to work correctly.

Uros.
diff --git a/gcc/final.cc b/gcc/final.cc
index 11141f2b56c..7d8733a2d7d 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -4044,7 +4044,8 @@ asm_fprintf (FILE *file, const char *p, ...)
   va_end (argptr);
 }
 
-/* Return true if this function has no function calls.  */
+/* Return true if this function has no function calls,
+   volatile asms and UNSPEC_VOLATILE instructions.  */
 
 bool
 leaf_function_p (void)
@@ -4070,6 +4071,9 @@ leaf_function_p (void)
 	  && CALL_P (XVECEXP (PATTERN (insn), 0, 0))
 	  && ! SIBLING_CALL_P (XVECEXP (PATTERN (insn), 0, 0)))
 	return false;
+      if (NONJUMP_INSN_P (insn)
+	  && volatile_insn_p (PATTERN(insn)))
+	return false;
     }
 
   return true;
  
Jakub Jelinek Nov. 5, 2024, 6:08 p.m. UTC | #21
On Tue, Nov 05, 2024 at 06:41:24PM +0100, Richard Biener wrote:
> I don’t think it will matter in practice?

IMHO it will.  Many projects have many inline asms; e.g. glibc has several
hundred of them, none of which needs this treatment, and many are used in leaf
functions, including really small functions which will be affected most.
The kernel has over 2000 volatile inline asms, and maybe needs it in 10-20
of those (just guessing).

> What else should volatile be for and why should it exclude this particular
> side-effect that cannot be expressed by constraints?  IMHO correctness
> trumps optimization here unless we want to retroactively document the red
> zone is off limits for any asm()?

volatile primarily means the asm won't be DCEd if its outputs are unused.
It doesn't imply a "memory" clobber, or clobbering of some other memory
like in this specific case.
GCC never disabled leaf functions because of volatile asms, and the red-zone
vs. inline asm issue isn't something that appeared in the last year; we've
been telling users about it for 20 years, the kernel is well aware of it and
just wants some guaranteed way to mark it.  Right now they probably use
either the rsp clobbering or __builtin_frame_address (0), both of which
happen to work, but we want to move them away from that.  We don't need to
replace it with nothing; the current way of requesting no red zone can be
replaced with some other.
And if some other project suffers from this, it can do so as well.
This isn't a silent miscompilation that wouldn't show up for years; if
a function uses the red zone and something pushes something on the stack, or
even calls some function which also uses several stack slots below the return
address, one will see very quickly that it doesn't work as expected.

	Jakub
  
Jakub Jelinek Nov. 5, 2024, 6:12 p.m. UTC | #22
On Tue, Nov 05, 2024 at 06:55:33PM +0100, Uros Bizjak wrote:
> The difference between redzoned and non-redzoned code is *two*
> instructions that decrease and increase stack pointer, which I think
> is an acceptable compromise between correctness and performance. Also

It is not, because this price is paid for the 99.9% of inline asms that
don't really need it.  It is much better to pay it portably by adding
the two instructions to the inline asms that really need them.  That
will work not just with new GCC versions, but with all past
releases as well.
And adding a new clobber allows better fine-tuning (e.g. if one knows they
use inline asm that needs to do calls many times in the same otherwise leaf
function, they can use the clobber to force it).

	Jakub
  
Uros Bizjak Nov. 5, 2024, 6:24 p.m. UTC | #23
On Tue, Nov 5, 2024 at 7:12 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 06:55:33PM +0100, Uros Bizjak wrote:
> > The difference between redzoned and non-redzoned code is *two*
> > instructions that decrease and increase stack pointer, which I think
> > is an acceptable compromise between correctness and performance. Also
>
> It is not.  Because this price is paid for the 99.9% of inline asms that
> don't really need it.  It is much better to pay it portably by adding
> the two instructions to the inline asm that really need it.  That is
> something that will not work just with new GCC versions, but all past
> releases as well.
> And by adding a new clobber allow better fine tuning (e.g. if one knows they
> use inline asm that needs to do calls many times in the same otherwise leaf
> function, they can use the clobber to force it).

Maybe inventing "asm call", an implicitly volatile asm variant, could
solve this issue? This kind of asm would signal to the compiler that the
function manipulates the stack and should not be marked as leaf.

Uros.
  
Jakub Jelinek Nov. 5, 2024, 6:36 p.m. UTC | #24
On Tue, Nov 05, 2024 at 07:24:42PM +0100, Uros Bizjak wrote:
> On Tue, Nov 5, 2024 at 7:12 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Tue, Nov 05, 2024 at 06:55:33PM +0100, Uros Bizjak wrote:
> > > The difference between redzoned and non-redzoned code is *two*
> > > instructions that decrease and increase stack pointer, which I think
> > > is an acceptable compromise between correctness and performance. Also
> >
> > It is not.  Because this price is paid for the 99.9% of inline asms that
> > don't really need it.  It is much better to pay it portably by adding
> > the two instructions to the inline asm that really need it.  That is
> > something that will not work just with new GCC versions, but all past
> > releases as well.
> > And by adding a new clobber allow better fine tuning (e.g. if one knows they
> > use inline asm that needs to do calls many times in the same otherwise leaf
> > function, they can use the clobber to force it).
> 
> Maybe inventing "asm call", implicitly volatile asm variant could
> solve this issue? This kind of asm would signal the compiler that the
> function manipulates stack and should not be marked as leaf.

It doesn't have to be a call.
asm ("pushf; popq %0" : "=r" (x));
needs it too, unless one subtracts 128 from rsp first and adds it back
afterwards.  And I really think clobbers are the best way to express it;
the problem is that the push or call in there clobbers the red zone.
Most targets don't have something like a red zone; I'm aware of just
x86-64 and some ABIs on rs6000, anything else?  Inventing new keywords
in the FE just because of this looks like overkill to me, while we already
have an easy way to add further clobbers: just parse them in
decode_reg_name_and_count, return -5, build something for it
in expand_asm_stmt (perhaps with a target hook which also determines if
the clobber is meaningful on the arch) and then deal with it in the
backends as needed.
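Purely as pseudocode, the shape outlined above might look like this (the -5 sentinel and the function names decode_reg_name_and_count / expand_asm_stmt come from the message itself; the hook name and the exact plumbing are assumptions, not working GCC code):

```
/* In decode_reg_name_and_count (sketch): recognize the new clobber
   name and return a new sentinel, analogous to the existing special
   return values.  */
if (strcmp (asmspec, "redzone") == 0)
  return -5;

/* In expand_asm_stmt (sketch): when a clobber decodes to -5, ask the
   target whether the clobber is meaningful on this arch and, if so,
   push a target-provided clobber RTX, e.g. a MEM just below the stack
   pointer sized to the red zone.  */
if (j == -5)
  {
    rtx c = targetm.redzone_clobber ();   /* hypothetical hook */
    if (c)
      clobber_rvec.safe_push (c);
  }
```

Backends without a red zone would simply return NULL from the hook, making the clobber a portable no-op.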

	Jakub
  
Uros Bizjak Nov. 6, 2024, 9:44 a.m. UTC | #25
Hello!

After some more thinking and considering all the recent discussion
(thanks!), I am convinced that a slightly simplified version of the
original patch (attached), now a one-liner, is the way to go.

Let's look at the following test:

--cut here--
unsigned long foo (void)
{
  return __builtin_ia32_readeflags_u64 ();
}
--cut here--

When the above builtin is present in the function, the function
doesn't use the redzone. As discussed in the original submission, x86
creates the redzone based on crtl->sp_is_unchanging; a cleared flag cancels
redzone creation. The flag is cleared in stack-ptr-mod.cc,
notice_stack_pointer_modification_1, when a) the stack pointer is
explicitly mentioned or b) an RTX_AUTOINC with SP is detected *in
the output* of the RTX. The final RTX dump of the above function reads
as:

(insn:TI 5 2 6 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
        (unspec:DI [
                (reg:CC 17 flags)
            ] UNSPEC_PUSHFL)) "unchan.c":6:10 80 {pushfldi2}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (nil)))
(insn 6 5 12 2 (set (reg:DI 0 ax [99])
        (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0  S8 A8]))
"unchan.c":6:10 76 {*popdi1}
     (nil))

and notice_stack_pointer_modification_1 triggers on the first instruction.

Now consider the following test:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

unsigned long bar (void)
{
  unsigned long res;

  asm ("pushfq" : ASM_CALL_CONSTRAINT);
  asm ("popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
  return res;
}
--cut here--

The above function results in exactly the same asm as the one in the
first test. However, notice_stack_pointer_modification_1 does not
detect the stack pointer access here, but it should. Here we *do* change
SP, so IMHO this case addresses Andreas' remark that we shouldn't say
we changed SP if we didn't. Please note that due to the inout
constraint on %rsp, the scheduler won't reschedule insns even without
the "volatile" asm qualifier.

Going a bit further, we can join the asm to form a new test:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

unsigned long baz (void)
{
  unsigned long res;

  asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
  return res;
}
--cut here--

here we inform the compiler that we read and write %rsp, but do not
necessarily change it - maybe something like "add %rsp, $0". As above,
notice_stack_pointer_modification_1 does not trigger here.

When we consider the very first testcase,
notice_stack_pointer_modification_1 is interested only in the PUSH-type
instruction. It is not concerned with the POP instruction, where SP gets
modified, too. The *stack_ptr_mod pass in fact
only checks whether SP is mentioned in the output of the RTX, not what
happens to it, and records its detection by clearing
crtl->sp_is_unchanging.

This brings us to the logic of my now one-liner patch. There is no
difference in the assembly for any of the above tests, but in the
latter two tests the compiler does not detect that RSP is changed,
although the access is correctly modeled with inout constraints (so
the SP hard register is mentioned in the output). There is a condition in
notice_stack_pointer_modification_1 that tries to detect this
case, but it assumes that only one SP RTX exists and uses pointer
comparison. Pointer comparison does not detect that the "(reg/v:DI 7 sp [
current_stack_pointer ])" and "(reg/f:DI 7 sp)" RTXes actually both
correspond to the same stack pointer register.

The patch fixes this wrong assumption. The change is localized
specifically to the *stack_ptr_mod pass and doesn't affect anything else,
but it clears crtl->sp_is_unchanging for RTXes where the global register
"%rsp" is mentioned in the output, including:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

void test (void)
{
  current_stack_pointer += 1;
}
--cut here--

which is currently not detected as a function where SP is changing,
even though we have:

(insn:TI 6 5 16 2 (parallel [
            (set (reg/v:DI 7 sp [ current_stack_pointer ])
                (plus:DI (reg/v:DI 7 sp [ current_stack_pointer ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "unchan.c":6:25 284 {*adddi_1}

Also, Jeff asked about the validity of:

register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

If we want to confine e.g.:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

void c (void);

void test (void)
{
  asm ("call %c1" : ASM_CALL_CONSTRAINT : "i"(c));
}
--cut here--

between frame setup and teardown (when compiled with -fno-omit-frame-pointer):

0000000000000010 <test>:
 10:   55                      push   %rbp
 11:   48 89 e5                mov    %rsp,%rbp
 14:   e8 00 00 00 00          call   19 <test+0x9>
 19:   c9                      leave
 1a:   c3                      ret

then we have to make the asm depend on %rsp to prevent the asm from being
moved above insn @11 (output dependence) or below insn @19 (== mov
%rbp,%rsp; input dependence). There is no other way to prevent the
scheduler from moving the asm; we can't declare %rbp as a global
register when -fno-omit-frame-pointer is in effect.

Based on the above discussion, I propose a simplified patch (attached)
that effectively prevents redzone creation when ASM_CALL_CONSTRAINT is
used. I have tested it with a GCC bootstrap and regtest, and, since Linux
is a heavy user of the ASM_CALL_CONSTRAINT construct, also by
compiling and booting the latest mainline kernel (Linux version
6.12.0-rc6-00004-g67f14dce9061). I have no means to test with a
FRED-enabled kernel, but with the proposed change at least correctly tagged
asm that changes SP won't be problematic when the kernel is
compiled with -mred-zone.

Uros.
diff --git a/gcc/stack-ptr-mod.cc b/gcc/stack-ptr-mod.cc
index ba38de54bb1..a522a67c0b4 100644
--- a/gcc/stack-ptr-mod.cc
+++ b/gcc/stack-ptr-mod.cc
@@ -34,7 +34,7 @@ static void
 notice_stack_pointer_modification_1 (rtx x, const_rtx pat ATTRIBUTE_UNUSED,
 				     void *data ATTRIBUTE_UNUSED)
 {
-  if (x == stack_pointer_rtx
+  if (rtx_equal_p (x, stack_pointer_rtx)
       /* The stack pointer is only modified indirectly as the result
 	 of a push until later.  See the comments in rtl.texi
 	 regarding Embedded Side-Effects on Addresses.  */
  
Jakub Jelinek Nov. 6, 2024, 10:57 a.m. UTC | #26
On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote:
> After some more thinking and considering all recent discussion
> (thanks!), I am convinced that a slightly simplified original patch
> (attached), now one-liner, is the way to go.
> 
> Let's look at the following test:
> 
> --cut here--
> unsigned long foo (void)
> {
>   return __builtin_ia32_readeflags_u64 ();
> }
> --cut here--

In theory we could handle this one not by disabling the red zone altogether,
but just by forcing the backend to reserve the uppermost 64 bits of the red
zone if it sees such an instruction.  But it is probably not worth it.  It
could work fine if the push is enforced to be adjacent to the pop (though
having them in the same insn will not result in correct .eh_frame); if the
push can be separated from the pop and some memory loads/stores can
appear in between, it could get nasty.

> Now consider the following test:
> 
> --cut here--
> register unsigned long current_stack_pointer asm ("%rsp");
> #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> 
> unsigned long bar (void)
> {
>   unsigned long res;
> 
>   asm ("pushfq" : ASM_CALL_CONSTRAINT);
>   asm ("popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
>   return res;
> }
> --cut here--

This test is certainly invalid.
Modifying the stack pointer in the inline asm results in something the
compiler can't deal with; e.g. if it needs to reload something across
the asm and isn't using frame pointers, it just can't recover, as it doesn't
know where what it saved earlier will be, because the stack pointer changed.
Similarly, it would be invalid to do current_stack_pointer = buf;
or similar.

> --cut here--
> register unsigned long current_stack_pointer asm ("%rsp");
> #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> 
> unsigned long baz (void)
> {
>   unsigned long res;
> 
>   asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
>   return res;
> }
> --cut here--
> 
> here we inform the compiler that we read and write the rsp, not
> necessarily change it, maybe something like "add %rsp, $0". As above,
> notice_stack_pointer_modification_1 does not trigger here.

While this happened to work, we've discussed several times that this isn't
something we want to recommend to people.
Generally, if inline asm temporarily changes some register and restores it
before returning to C code, there is no need to mention that register
in the clobbers or in a "+r" operand.  And the "+r" (current_stack_pointer)
says that the stack pointer was or could have been changed, which is wrong;
valid inline asm can never change the stack pointer.  It can change it
temporarily and then restore it, if it knows what it is doing.

> --cut here--
> register unsigned long current_stack_pointer asm ("%rsp");
> #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> 
> void test (void)
> {
>   current_stack_pointer += 1;
> }
> --cut here

As I said above, this is invalid testcase too.

	Jakub
  
Uros Bizjak Nov. 6, 2024, 3:27 p.m. UTC | #27
On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote:
> > After some more thinking and considering all recent discussion
> > (thanks!), I am convinced that a slightly simplified original patch
> > (attached), now one-liner, is the way to go.
>
> > --cut here--> > register unsigned long current_stack_pointer asm ("%rsp");
> > #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> >
> > unsigned long baz (void)
> > {
> >   unsigned long res;
> >
> >   asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
> >   return res;
> > }
> > --cut here--
> >
> > here we inform the compiler that we read and write the rsp, not
> > necessarily change it, maybe something like "add %rsp, $0". As above,
> > notice_stack_pointer_modification_1 does not trigger here.
>
> While this happened to work, we've discussed several times this isn't
> something we want to recommend to people.
> Generally, if inline asm temporarily changes some register and restores it
> back before returning to C code, there is no need to mention that register
> in clobbers or in "+r" operand.  And the "+r" (current_stack_pointer)
> says that the stack pointer was or could have been changed, which is wrong,
> valid inline asm can never change the stack pointer.  It can change it
> temporarily and then restore if it knows what it is doing.

I see. While my solution would fit nicely with the above
ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
is wrong by itself.

Oh, well.

Anyway, I guess the "redzone" clobber you proposed does not remove the
need to use ASM_CALL_CONSTRAINT if we want to confine the "asm with a
call inside" between frame setup/teardown insns? Is it possible to
invent a similar clobber (unimaginatively called "stack", perhaps) that
would prevent the scheduler from moving the asm to the wrong place?

Uros.
  
H. Peter Anvin Nov. 6, 2024, 3:45 p.m. UTC | #28
On November 6, 2024 7:27:51 AM PST, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Wed, Nov 6, 2024 at 11:57 AM Jakub Jelinek <jakub@redhat.com> wrote:
>>
>> On Wed, Nov 06, 2024 at 10:44:54AM +0100, Uros Bizjak wrote:
>> > After some more thinking and considering all recent discussion
>> > (thanks!), I am convinced that a slightly simplified original patch
>> > (attached), now one-liner, is the way to go.
>>
>> > --cut here--> > register unsigned long current_stack_pointer asm ("%rsp");
>> > #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
>> >
>> > unsigned long baz (void)
>> > {
>> >   unsigned long res;
>> >
>> >   asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
>> >   return res;
>> > }
>> > --cut here--
>> >
>> > here we inform the compiler that we read and write the rsp, not
>> > necessarily change it, maybe something like "add %rsp, $0". As above,
>> > notice_stack_pointer_modification_1 does not trigger here.
>>
>> While this happened to work, we've discussed several times this isn't
>> something we want to recommend to people.
>> Generally, if inline asm temporarily changes some register and restores it
>> back before returning to C code, there is no need to mention that register
>> in clobbers or in "+r" operand.  And the "+r" (current_stack_pointer)
>> says that the stack pointer was or could have been changed, which is wrong,
>> valid inline asm can never change the stack pointer.  It can change it
>> temporarily and then restore if it knows what it is doing.
>
>I see. While my solution would fit nicely with the above
>ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
>is wrong by itself.
>
>Oh, well.
>
>Anyway, I guess "redzone" clobber you proposed does not remove the
>need to use ASM_CALL_CONSTRAINT if we want to confine the "asm with
>call inside" between frame setup/teardown insns? Is it possible to
>invent similar clobber (unimaginatively called "stack", perhaps) that
>would prevent scheduler from moving asm to the wrong place?
>
>Uros.
>

I suggested __builtin_frame_address(0) as an input constraint (which already works in gcc and clang) and the "red-zone" clobber (new) for this exact reason (Andrew Pinski, however, summarily closed those BRs).

Semantically, using the "+r" (rsp) constraint obviously doesn't make any sense, but we don't care: if it is what the gcc community wants to support going forward, we'll use it. But it really, really needs to be documented, as it is incredibly non-obvious.
  
Jakub Jelinek Nov. 6, 2024, 3:52 p.m. UTC | #29
On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote:
> I see. While my solution would fit nicely with the above
> ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
> is wrong by itself.
> 
> Oh, well.
> 
> Anyway, I guess "redzone" clobber you proposed does not remove the
> need to use ASM_CALL_CONSTRAINT if we want to confine the "asm with
> call inside" between frame setup/teardown insns? Is it possible to
> invent similar clobber (unimaginatively called "stack", perhaps) that
> would prevent scheduler from moving asm to the wrong place?

I don't understand why one would want to avoid moving such inline asms
there; the profiler can be called from there too, and if a function
doesn't use a red zone and the inline asm restores all registers except
those explicitly listed in the clobbers, I think such calls can be done
pretty much anywhere.  Sure, frame pointers might miss one frame, but
frame pointer reliability for backtraces is abysmal anyway.
But I'm not aware of a testcase where we would actually move an inline asm
there either (at least not on x86).  Is that some known problem (and if so,
why?) or just a theoretical issue?

Or do you want a new clobber which will automatically add all call-clobbered
registers to the clobbers?  That might be useful, although in the case where
one compiler can support multiple call ABIs, one needs to pick one; would
that stand for the ABI of the current function, or something else?

	Jakub
  
Jakub Jelinek Nov. 6, 2024, 3:54 p.m. UTC | #30
On Wed, Nov 06, 2024 at 07:45:54AM -0800, H. Peter Anvin wrote:
> I suggested __builtin_frame_address(0) as an input constraint (which
> already works in gcc and clang) and the "red-zone" clobber (new) for this
> exact reason (Andrew Pinski, however, summarily closed those BRs.)

I posted a patch for "redzone" clobber in
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667781.html

	Jakub
  
Uros Bizjak Nov. 6, 2024, 4:05 p.m. UTC | #31
On Wed, Nov 6, 2024 at 4:53 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Nov 06, 2024 at 04:27:51PM +0100, Uros Bizjak wrote:
> > I see. While my solution would fit nicely with the above
> > ASM_CALL_CONSTRAINT approach, the approach using ASM_CALL_CONSTRAINT
> > is wrong by itself.
> >
> > Oh, well.
> >
> > Anyway, I guess "redzone" clobber you proposed does not remove the
> > need to use ASM_CALL_CONSTRAINT if we want to confine the "asm with
> > call inside" between frame setup/teardown insns? Is it possible to
> > invent similar clobber (unimaginatively called "stack", perhaps) that
> > would prevent scheduler from moving asm to the wrong place?
>
> I don't understand why would one want to avoid moving such inline asms
> there, the profiler can be called from there too and if a function
> doesn't use a red zone and the inline asm restores all registers except
> those explicitly clobbered in the clobbers, I think such calls can be done
> pretty much everywhere.  Sure, frame pointers might miss one frame, but
> frame pointer reliability for backtraces is abysmal anyway.
> But I'm not aware of a testcase where we would actually move inline asm
> there either (at least not on x86).  Is that some known problem (and then
> why?) or just theoretical issue?

Please see [1]:

/*
 * This output constraint should be used for any inline asm which has a "call"
 * instruction.  Otherwise the asm may be inserted before the frame pointer
 * gets set up by the containing function.  If you forget to do this, objtool
 * may print a "call without frame pointer save/setup" warning.
 */
register unsigned long current_stack_pointer asm(_ASM_SP);
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

Perhaps HPA can say some more?

[1] https://elixir.bootlin.com/linux/v6.11.6/source/arch/x86/include/asm/asm.h#L223

> Or do you want a new clobber which will automatically add all call clobbered
> registers to the clobbers?  That might be useful, although in case where
> one compiler can support multiple call ABIs one needs to pick one; would
> that stand for the ABI of the current function, or something else?

No, functions called from asm are not ABI-compliant functions.

Uros.
  
Jakub Jelinek Nov. 6, 2024, 4:23 p.m. UTC | #32
On Wed, Nov 06, 2024 at 05:05:54PM +0100, Uros Bizjak wrote:
> Please see [1]:
> 
> /*
>  * This output constraint should be used for any inline asm which has a "call"
>  * instruction.  Otherwise the asm may be inserted before the frame pointer
>  * gets set up by the containing function.  If you forget to do this, objtool
>  * may print a "call without frame pointer save/setup" warning.
>  */
> register unsigned long current_stack_pointer asm(_ASM_SP);
> #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> 
> Perhaps HPA can say some more?

So a workaround for issues in some kernel tool?
Not sure if gcc needs to provide a workaround for that.
Just do the call in a separate .subsection, or add some magic labels around
it that the tool can check to disable the warning.  Perhaps just a label
at the start of the call insn, with some special prefix followed by %=,
that the tool can use to distinguish calls coming from inline asm from
calls coming from C code.

	Jakub
  
Uros Bizjak Nov. 6, 2024, 4:31 p.m. UTC | #33
On Wed, Nov 6, 2024 at 5:23 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Nov 06, 2024 at 05:05:54PM +0100, Uros Bizjak wrote:
> > Please see [1]:
> >
> > /*
> >  * This output constraint should be used for any inline asm which has a "call"
> >  * instruction.  Otherwise the asm may be inserted before the frame pointer
> >  * gets set up by the containing function.  If you forget to do this, objtool
> >  * may print a "call without frame pointer save/setup" warning.
> >  */
> > register unsigned long current_stack_pointer asm(_ASM_SP);
> > #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
> >
> > Perhaps HPA can say some more?
>
> So workaround about issues in some kernel tool?
> Not sure if gcc needs to provide workaround for that.
> Just do the call in a separate .subsection or add some magic labels around
> it that the tool can check and disable the warning.  Perhaps just a label
> at the start of the call insn with some special prefix followed by %=
> that the tool can use to find out calls coming from inline asm vs. calls
> from C code.

Maybe even tag the call with a "cs;" prefix.

Uros.
  
Jakub Jelinek Nov. 6, 2024, 4:55 p.m. UTC | #34
On Wed, Nov 06, 2024 at 05:31:53PM +0100, Uros Bizjak wrote:
> > So workaround about issues in some kernel tool?
> > Not sure if gcc needs to provide workaround for that.
> > Just do the call in a separate .subsection or add some magic labels around
> > it that the tool can check and disable the warning.  Perhaps just a label
> > at the start of the call insn with some special prefix followed by %=
> > that the tool can use to find out calls coming from inline asm vs. calls
> > from C code.
> 
> Maybe even prefix the call with "cs;" prefix.

I'd think the "redzone" clobber would actually also likely prevent moving
the asm across stack-pointer-changing instructions, as it has a stack
pointer + offset based MEM.
Of course, for ia32 or other targets the "redzone" clobber won't be
supported or will do nothing, but then say "=m" (dummy) or "m" (dummy) for
a char dummy; automatic var or something similar could prevent the moving
as well.

	Jakub
  
H. Peter Anvin Nov. 6, 2024, 6:03 p.m. UTC | #35
On November 6, 2024 8:31:53 AM PST, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Wed, Nov 6, 2024 at 5:23 PM Jakub Jelinek <jakub@redhat.com> wrote:
>>
>> On Wed, Nov 06, 2024 at 05:05:54PM +0100, Uros Bizjak wrote:
>> > Please see [1]:
>> >
>> > /*
>> >  * This output constraint should be used for any inline asm which has a "call"
>> >  * instruction.  Otherwise the asm may be inserted before the frame pointer
>> >  * gets set up by the containing function.  If you forget to do this, objtool
>> >  * may print a "call without frame pointer save/setup" warning.
>> >  */
>> > register unsigned long current_stack_pointer asm(_ASM_SP);
>> > #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
>> >
>> > Perhaps HPA can say some more?
>>
>> So workaround about issues in some kernel tool?
>> Not sure if gcc needs to provide workaround for that.
>> Just do the call in a separate .subsection or add some magic labels around
>> it that the tool can check and disable the warning.  Perhaps just a label
>> at the start of the call insn with some special prefix followed by %=
>> that the tool can use to find out calls coming from inline asm vs. calls
>> from C code.
>
>Maybe even prefix the call with "cs;" prefix.
>
>Uros.
>

The issue is that we want the frame pointer chain to be maintained, even across alternatives. This isn't to quiet a spurious warning; rather, objtool is throwing a legitimate warning that a low-level environment invariant is being violated.
  
Jakub Jelinek Nov. 6, 2024, 6:15 p.m. UTC | #36
On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote:
> The issue is that we want the frame pointer chain to be maintained, even
> across alternatives.

If the current function doesn't have frame pointer set up yet (or is in the
epilogue after it got restored already), then the chain is still maintained,
just one frame will be missing from it.  I don't see why that would be a big
deal.

>  This isn't to quiet a spurious warning, rather
> objtool is throwing a legitimate warning that a level environment
> invariant is being violated.

Anyway, do you have a testcase where you saw problematic inline asm being
moved into the prologue or after the epilogue, or is that just theoretical?
So that we can test whether the "redzone" clobber prevents that, or whether
"=m" (dummy) does, or similar.
The rsp register var in the outputs is something we should warn about at
least; it isn't really something the kernel should keep using.

	Jakub
  
H. Peter Anvin Nov. 6, 2024, 6:17 p.m. UTC | #37
On November 6, 2024 10:15:13 AM PST, Jakub Jelinek <jakub@redhat.com> wrote:
>On Wed, Nov 06, 2024 at 10:03:25AM -0800, H. Peter Anvin wrote:
>> The issue is that we want the frame pointer chain to be maintained, even
>> across alternatives.
>
>If the current function doesn't have frame pointer set up yet (or is in the
>epilogue after it got restored already), then the chain is still maintained,
>just one frame will be missing from it.  I don't see why that would be a big
>deal.
>
>>  This isn't to quiet a spurious warning, rather
>> objtool is throwing a legitimate warning that a level environment
>> invariant is being violated.
>
>Anyway, do you have a testcase where you saw problematic inline asm being
>moved into the prologue or after the epilogue or is that just theoretical?
>So that we can test if "redzone" clobber prevents that or if "=m" (dummy)
>does or similar?
>The rsp register var in output is something we should warn about at least,
>it isn't really something kernel should keep using.
>
>	Jakub
>

Not theoretical. I don't have an example *right now* because I'm about to step out the door for an internal trip.
  

Patch

diff --git a/gcc/stack-ptr-mod.cc b/gcc/stack-ptr-mod.cc
index ba38de54bb1..9a205c58707 100644
--- a/gcc/stack-ptr-mod.cc
+++ b/gcc/stack-ptr-mod.cc
@@ -34,13 +34,13 @@  static void
 notice_stack_pointer_modification_1 (rtx x, const_rtx pat ATTRIBUTE_UNUSED,
 				     void *data ATTRIBUTE_UNUSED)
 {
-  if (x == stack_pointer_rtx
+  if (rtx_equal_p (x, stack_pointer_rtx)
       /* The stack pointer is only modified indirectly as the result
 	 of a push until later.  See the comments in rtl.texi
 	 regarding Embedded Side-Effects on Addresses.  */
       || (MEM_P (x)
 	  && GET_RTX_CLASS (GET_CODE (XEXP (x, 0))) == RTX_AUTOINC
-	  && XEXP (XEXP (x, 0), 0) == stack_pointer_rtx))
+	  && rtx_equal_p (XEXP (XEXP (x, 0), 0), stack_pointer_rtx)))
     crtl->sp_is_unchanging = 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/pr117359.c b/gcc/testsuite/gcc.target/i386/pr117359.c
new file mode 100644
index 00000000000..7c996d611db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr117359.c
@@ -0,0 +1,44 @@ 
+/* PR middle-end/117359 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mred-zone" } */
+
+register unsigned long current_stack_pointer asm ("%rsp");
+#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
+
+#define BLOBS 16		/* 64 bytes */
+
+unsigned long
+__attribute__((noipa))
+asm_clobbers_redzone (void)
+{
+  unsigned int blob[BLOBS];
+  unsigned long y, sum = 0;
+
+  /* Write some data to an array on the stack */
+  /* gcc puts this at [-72..-8](%rsp) */
+  for (int i = 0; i < BLOBS; i++)
+    blob[i] = 0x55555555;
+
+  /* Clobbers [-16..-1](%rsp) */
+  asm volatile ("pushfq; push %1; pop %0; popfq"
+		: "=r" (y), ASM_CALL_CONSTRAINT
+		: "e" (-1));
+
+  /* Make sure the data on the stack is used */
+  for (int i = 0; i < BLOBS; i++)
+    sum += blob[i];
+
+  return sum;
+}
+
+int main ()
+{
+  unsigned long a;
+
+  a = asm_clobbers_redzone ();
+
+  if (a != 0x555555550UL)
+    __builtin_abort();
+
+  return 0;
+}