i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

Message ID Zd34cK6OVqHvMN9J@tucnak
State New
Headers
Series i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Jakub Jelinek Feb. 27, 2024, 2:57 p.m. UTC
  On Tue, Feb 27, 2024 at 01:09:09PM +0100, Jakub Jelinek wrote:
> So, IMHO either revert the changes altogether, or guard on -mcmodel=kernel
> (but talk to kernel people on linux-toolchains if that is what they actually
> want).

Here is a patch which guards this by non-default option, so kernel and other
users can choose if they want this or not.  On top of the PR114116 patch.

Only lightly tested so far.

2024-02-27  Jakub Jelinek  <jakub@redhat.com>

	PR target/38534
	* config/i386/i386.opt (mnoreturn-no-callee-saved-registers): New
	option.
	* config/i386/i386-options.cc (ix86_set_func_type): Don't use
	TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP unless
	ix86_noreturn_no_callee_saved_registers is enabled.
	* doc/invoke.texi (-mnoreturn-no-callee-saved-registers): Document.

	* gcc.target/i386/pr38534-1.c: Add -mnoreturn-no-callee-saved-registers
	to dg-options.
	* gcc.target/i386/pr38534-2.c: Likewise.
	* gcc.target/i386/pr38534-3.c: Likewise.
	* gcc.target/i386/pr38534-4.c: Likewise.
	* gcc.target/i386/pr38534-5.c: Likewise.
	* gcc.target/i386/pr38534-6.c: Likewise.
	* gcc.target/i386/pr114097-1.c: Likewise.
	* gcc.target/i386/stack-check-17.c: Likewise.



	Jakub
  

Comments

Jakub Jelinek Feb. 28, 2024, 8 a.m. UTC | #1
On Tue, Feb 27, 2024 at 03:57:52PM +0100, Jakub Jelinek wrote:
> On Tue, Feb 27, 2024 at 01:09:09PM +0100, Jakub Jelinek wrote:
> > So, IMHO either revert the changes altogether, or guard on -mcmodel=kernel
> > (but talk to kernel people on linux-toolchains if that is what they actually
> > want).
> 
> Here is a patch which guards this by non-default option, so kernel and other
> users can choose if they want this or not.  On top of the PR114116 patch.
> 
> Only lightly tested so far.

Successfully bootstrapped/regtested on x86_64-linux and i686-linux now.
Ok for trunk (if the PR114116 patch is approved too)?

> 2024-02-27  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/38534
> 	* config/i386/i386.opt (mnoreturn-no-callee-saved-registers): New
> 	option.
> 	* config/i386/i386-options.cc (ix86_set_func_type): Don't use
> 	TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP unless
> 	ix86_noreturn_no_callee_saved_registers is enabled.
> 	* doc/invoke.texi (-mnoreturn-no-callee-saved-registers): Document.
> 
> 	* gcc.target/i386/pr38534-1.c: Add -mnoreturn-no-callee-saved-registers
> 	to dg-options.
> 	* gcc.target/i386/pr38534-2.c: Likewise.
> 	* gcc.target/i386/pr38534-3.c: Likewise.
> 	* gcc.target/i386/pr38534-4.c: Likewise.
> 	* gcc.target/i386/pr38534-5.c: Likewise.
> 	* gcc.target/i386/pr38534-6.c: Likewise.
> 	* gcc.target/i386/pr114097-1.c: Likewise.
> 	* gcc.target/i386/stack-check-17.c: Likewise.

	Jakub
  
Jakub Jelinek Feb. 28, 2024, 8:53 a.m. UTC | #2
Hi!

Adding Hongtao and Honza into the loop as the ones who acked the original
patch.

The no_callee_saved_registers by default for noreturn functions change can
break in-process backtrace(3) or backtraces from debugger or other process
(quite often, any time the noreturn function decides to use the bp register
and any of the parent frames uses a frame pointer; the unwinder just crashes
in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
like to save bp register in that case:

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html

and additionally the no_callee_saved_registers by default for noreturn
functions change can make debugging harder, again not localized to the
noreturn function, but any of its callers.  So, if say glibc abort function
implementation needs a lot of normally callee-saved registers, no matter how
users recompile their apps, they will see garbage or optimized out
vars/parameters in their code unless they rebuild their glibc with -O0.
So, I think we should guard that by a non-default option:

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html

Plus we need to somehow make sure to emit DW_CFA_undefined for the modified
but not saved normally callee-saved registers, so that we at least don't get
garbage in debug info.  H.J. posted some patches for that, so far I wasn't
happy about the implementation but the actual change is desirable.

Your thoughts on this?

	Jakub
  
Hongtao Liu Feb. 29, 2024, 6:20 a.m. UTC | #3
On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> Adding Hongtao and Honza into the loop as the ones who acked the original
> patch.
>
> The no_callee_saved_registers by default for noreturn functions change can
> break in-process backtrace(3) or backtraces from debugger or other process
> (quite often, any time the noreturn function decides to use the bp register
> and any of the parent frames uses a frame pointer; the unwinder just crashes
> in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> like to save bp register in that case:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
I think this patch makes sense and LGTM, we save and restore frame
pointer for noreturn.
>
> and additionally the no_callee_saved_registers by default for noreturn
> functions change can make debugging harder, again not localized to the
> noreturn function, but any of its callers.  So, if say glibc abort function
> implementation needs a lot of normally callee-saved registers, no matter how
> users recompile their apps, they will see garbage or optimized out
> vars/parameters in their code unless they rebuild their glibc with -O0.
> So, I think we should guard that by a non-default option:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
So it turns off the optimization for noreturn functions by default,
I'm not sure about this.
Any comments, H.J?
>
> Plus we need to somehow make sure to emit DW_CFA_undefined for the modified
> but not saved normally callee-saved registers, so that we at least don't get
> garbage in debug info.  H.J. posted some patches for that, so far I wasn't
> happy about the implementation but the actual change is desirable.
>
> Your thoughts on this?
>
>         Jakub
>
  
H.J. Lu Feb. 29, 2024, 12:26 p.m. UTC | #4
On Wed, Feb 28, 2024 at 10:20 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > Hi!
> >
> > Adding Hongtao and Honza into the loop as the ones who acked the original
> > patch.
> >
> > The no_callee_saved_registers by default for noreturn functions change can
> > break in-process backtrace(3) or backtraces from debugger or other process
> > (quite often, any time the noreturn function decides to use the bp register
> > and any of the parent frames uses a frame pointer; the unwinder just crashes
> > in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> > like to save bp register in that case:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> I think this patch makes sense and LGTM, we save and restore frame
> pointer for noreturn.
> >
> > and additionally the no_callee_saved_registers by default for noreturn
> > functions change can make debugging harder, again not localized to the
> > noreturn function, but any of its callers.  So, if say glibc abort function
> > implementation needs a lot of normally callee-saved registers, no matter how
> > users recompile their apps, they will see garbage or optimized out
> > vars/parameters in their code unless they rebuild their glibc with -O0.
> > So, I think we should guard that by a non-default option:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> So it turns off the optimization for noreturn functions by default,
> I'm not sure about this.
> Any comments, H.J?

We need BP for backtrace.  I don't think we need to save other
registers.  True, GDB may not see function parameters.  But
optimization always has this impact.  When I need to debug a
program, I always use -O0 or -Og.

> >
> > Plus we need to somehow make sure to emit DW_CFA_undefined for the modified
> > but not saved normally callee-saved registers, so that we at least don't get
> > garbage in debug info.  H.J. posted some patches for that, so far I wasn't
> > happy about the implementation but the actual change is desirable.
> >
> > Your thoughts on this?
> >
> >         Jakub
> >
>
>
> --
> BR,
> Hongtao
  
Jakub Jelinek Feb. 29, 2024, 12:47 p.m. UTC | #5
On Thu, Feb 29, 2024 at 04:26:00AM -0800, H.J. Lu wrote:
> > > Adding Hongtao and Honza into the loop as the ones who acked the original
> > > patch.
> > >
> > > The no_callee_saved_registers by default for noreturn functions change can
> > > break in-process backtrace(3) or backtraces from debugger or other process
> > > (quite often, any time the noreturn function decides to use the bp register
> > > and any of the parent frames uses a frame pointer; the unwinder just crashes
> > > in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> > > like to save bp register in that case:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> > I think this patch makes sense and LGTM, we save and restore frame
> > pointer for noreturn.
> > >
> > > and additionally the no_callee_saved_registers by default for noreturn
> > > functions change can make debugging harder, again not localized to the
> > > noreturn function, but any of its callers.  So, if say glibc abort function
> > > implementation needs a lot of normally callee-saved registers, no matter how
> > > users recompile their apps, they will see garbage or optimized out
> > > vars/parameters in their code unless they rebuild their glibc with -O0.
> > > So, I think we should guard that by a non-default option:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> > So it turns off the optimization for noreturn functions by default,
> > I'm not sure about this.
> > Any comments, H.J?
> 
> We need BP for backtrace.  I don't think we need to save other
> registers.  True, GDB may not see function parameters.  But
> optimization always has this impact.  When I need to debug a
> program, I always use -O0 or -Og.

The problem is that it doesn't help in this case.
If some optimization makes debugging of some function harder, normally it is
enough to recompile the translation unit that defines it with -O0/-Og, or
add optimize attribute on the function.
While in this case, the optimization interferes with debugging of other
functions, not necessarily from the same translation unit, not necessarily
even from the same library or binary, or even from the same package.
As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
%r15 and this optimization is applied on it.  That means debugging of any
application (-O2, -Og or even -O0 compiled) to understand what went wrong
and why it aborted will be harder.  Including core file analysis.
Recompiling those apps with -O0/-Og will not help.  The only thing that
would help is to recompile glibc with -O0/-Og.
Doesn't have to be abort, doesn't have to be glibc.  Any library which
exports some noreturn APIs may be affected.
And there is not even a workaround other than to recompile with -O0/-Og the
noreturn functions, no way to disable this optimization.

Given that most users just will not be aware of this, even adding the option
but defaulting to on would mean a problem for a lot of users.  Most of them
will not know the problem is that some noreturn function 10 frames deep in
the call stack was optimized this way.

If people only call the noreturn functions from within the same package,
for some strange reason care about performance of noreturn functions (they
don't return, so unless you longjmp out of them or something similar
which is costly on its own already, they should be entered exactly once)
and are willing to pay the price in worse debugging in that case, let them
use the option.  But if they provide libraries that other packages then
consume, I'd say it wouldn't be a good idea.

	Jakub
  
Richard Biener Feb. 29, 2024, 1:24 p.m. UTC | #6
On Thu, Feb 29, 2024 at 1:47 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Feb 29, 2024 at 04:26:00AM -0800, H.J. Lu wrote:
> > > > Adding Hongtao and Honza into the loop as the ones who acked the original
> > > > patch.
> > > >
> > > > The no_callee_saved_registers by default for noreturn functions change can
> > > > break in-process backtrace(3) or backtraces from debugger or other process
> > > > (quite often, any time the noreturn function decides to use the bp register
> > > > and any of the parent frames uses a frame pointer; the unwinder just crashes
> > > > in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> > > > like to save bp register in that case:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> > > I think this patch makes sense and LGTM, we save and restore frame
> > > pointer for noreturn.
> > > >
> > > > and additionally the no_callee_saved_registers by default for noreturn
> > > > functions change can make debugging harder, again not localized to the
> > > > noreturn function, but any of its callers.  So, if say glibc abort function
> > > > implementation needs a lot of normally callee-saved registers, no matter how
> > > > users recompile their apps, they will see garbage or optimized out
> > > > vars/parameters in their code unless they rebuild their glibc with -O0.
> > > > So, I think we should guard that by a non-default option:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> > > So it turns off the optimization for noreturn functions by default,
> > > I'm not sure about this.
> > > Any comments, H.J?
> >
> > We need BP for backtrace.  I don't think we need to save other
> > registers.  True, GDB may not see function parameters.  But
> > optimization always has this impact.  When I need to debug a
> > program, I always use -O0 or -Og.
>
> The problem is that it doesn't help in this case.
> If some optimization makes debugging of some function harder, normally it is
> enough to recompile the translation unit that defines it with -O0/-Og, or
> add optimize attribute on the function.
> While in this case, the optimization interferes with debugging of other
> functions, not necessarily from the same translation unit, not necessarily
> even from the same library or binary, or even from the same package.
> As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
> a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
> %r15 and this optimization is applied on it.  That means debugging of any
> application (-O2, -Og or even -O0 compiled) to understand what went wrong
> and why it aborted will be harder.  Including core file analysis.
> Recompiling those apps with -O0/-Og will not help.  The only thing that
> would help is to recompile glibc with -O0/-Og.
> Doesn't have to be abort, doesn't have to be glibc.  Any library which
> exports some noreturn APIs may be affected.
> And there is not even a workaround other than to recompile with -O0/-Og the
> noreturn functions, no way to disable this optimization.
>
> Given that most users just will not be aware of this, even adding the option
> but defaulting to on would mean a problem for a lot of users.  Most of them
> will not know the problem is that some noreturn function 10 frames deep in
> the call stack was optimized this way.
>
> If people only call the noreturn functions from within the same package,
> for some strange reason care about performance of noreturn functions (they
> don't return, so unless you longjmp out of them or something similar
> which is costly on its own already, they should be entered exactly once)
> and are willing to pay the price in worse debugging in that case, let them
> use the option.  But if they provide libraries that other packages then
> consume, I'd say it wouldn't be a good idea.

+1

I'll definitely patch this by-default behavior out if we as upstream keep it.
Debugging customer core dumps is more important than optimizing
glibc abort/assert.

I do hope such patch will be at least easy, like flipping the default of an
option.

Richard.

>         Jakub
>
  
Jan Hubicka Feb. 29, 2024, 1:31 p.m. UTC | #7
> >
> > The problem is that it doesn't help in this case.
> > If some optimization makes debugging of some function harder, normally it is
> > enough to recompile the translation unit that defines it with -O0/-Og, or
> > add optimize attribute on the function.
> > While in this case, the optimization interferes with debugging of other
> > functions, not necessarily from the same translation unit, not necessarily
> > even from the same library or binary, or even from the same package.
> > As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
> > a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
> > %r15 and this optimization is applied on it.  That means debugging of any
> > application (-O2, -Og or even -O0 compiled) to understand what went wrong
> > and why it aborted will be harder.  Including core file analysis.
> > Recompiling those apps with -O0/-Og will not help.  The only thing that
> > would help is to recompile glibc with -O0/-Og.
> > Doesn't have to be abort, doesn't have to be glibc.  Any library which
> > exports some noreturn APIs may be affected.
> > And there is not even a workaround other than to recompile with -O0/-Og the
> > noreturn functions, no way to disable this optimization.
> >
> > Given that most users just will not be aware of this, even adding the option
> > but defaulting to on would mean a problem for a lot of users.  Most of them
> > will not know the problem is that some noreturn function 10 frames deep in
> > the call stack was optimized this way.
> >
> > If people only call the noreturn functions from within the same package,
> > for some strange reason care about performance of noreturn functions (they
> > don't return, so unless you longjmp out of them or something similar
> > which is costly on its own already, they should be entered exactly once)
> > and are willing to pay the price in worse debugging in that case, let them
> > use the option.  But if they provide libraries that other packages then
> > consume, I'd say it wouldn't be a good idea.
> 
> +1
> 
> I'll definitely patch this by-default behavior out if we as upstream keep it.
> Debugging customer core dumps is more important than optimizing
> glibc abort/assert.
> 
> I do hope such patch will be at least easy, like flipping the default of an
> option.

I agree that debugability of user core dumps is important here.

I guess an ideal solution would be to change codegen of noreturn functions
to callee save all registers. Performance of prologue of noreturn
function is not too important. THen we can stop caller saving registers
and still get reasonable backtraces.

This is essentially an ABI change (though kind of conservative one since
nothing except debugging really depends on it).  Perhaps it would make
sense to make the optimization non-default option now and also implement
the callee save logic. Then we should be able to flip release or two
later. Maybe synchroniation with LLVM would be desirable here if we
decide to go this route.

Honza
> 
> Richard.
> 
> >         Jakub
> >
  
Jakub Jelinek Feb. 29, 2024, 1:56 p.m. UTC | #8
On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> I agree that debugability of user core dumps is important here.
> 
> I guess an ideal solution would be to change codegen of noreturn functions
> to callee save all registers. Performance of prologue of noreturn
> function is not too important. THen we can stop caller saving registers
> and still get reasonable backtraces.

I don't think that is possible.
While both C and C++ require that if [[noreturn]] attribute is used on
some function declaration, it must be used on the first declaration and
also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
in all other TUs which declare the same function.
But, we have no such requirement for __attribute__((noreturn)), there it
is a pure optimization, it can be declared just on the caller side as an
optimization hint the function will not return, or just on the callee side
where the compiler will actually verify it doesn't return, or both.
And, the attribute is not part of function type, so even in standard C/C++,
one can use
extern void bar ();
[[noreturn]] void foo ()
{
  for (;;) bar ();
}
void (*fn) () = foo;
void baz ()
{
  fn ();
}
As you can call the noreturn function directly or indirectly, changing
calling conventions based on noreturn vs. no-noreturn is IMHO not possible.

	Jakub
  
Jan Hubicka Feb. 29, 2024, 2:15 p.m. UTC | #9
> On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> > I agree that debugability of user core dumps is important here.
> > 
> > I guess an ideal solution would be to change codegen of noreturn functions
> > to callee save all registers. Performance of prologue of noreturn
> > function is not too important. THen we can stop caller saving registers
> > and still get reasonable backtraces.
> 
> I don't think that is possible.
> While both C and C++ require that if [[noreturn]] attribute is used on
> some function declaration, it must be used on the first declaration and
> also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
> in all other TUs which declare the same function.
> But, we have no such requirement for __attribute__((noreturn)), there it
> is a pure optimization, it can be declared just on the caller side as an
> optimization hint the function will not return, or just on the callee side
> where the compiler will actually verify it doesn't return, or both.
> And, the attribute is not part of function type, so even in standard C/C++,
> one can use
> extern void bar ();
> [[noreturn]] void foo ()
> {
>   for (;;) bar ();
> }
> void (*fn) () = foo;
> void baz ()
> {
>   fn ();
> }
> As you can call the noreturn function directly or indirectly, changing
> calling conventions based on noreturn vs. no-noreturn is IMHO not possible.

I am not wed to the idea (just it appeared to me as an option to
disabling this optimization by default). I still think it may make sense.

Making noreturn calles to save caller saved register is compatible with
the default ABI.  If noreturn is missing on caller side, then caller will
save reigsters as usual. Noreturn callee will save them again, which is
pointless, but everything should work as usual and extra cost of saving
should not matter in practice.  This is also the case of indirect call
of noreturn function where you miss annotation on caller side.

If noreturn is missing on callee side, we will lose information on
functions arguments in backtrace, but the code will still work
(especially if we save BP register to make code backtraceable).  This is
scenario that probably can be avoided in practice where it matters (such
as in glibc abort whose implementation is annotated).  

Noreturn already leads to some information loss in backtraces. I tend to
get surprised from time to time to see whrong call to abort due to tail
merging. So it may be acceptable to lose info in a situation where user
does sily thing and only annotates caller.

Since we auto-detect noreturn, we may need to be extra careful about noreturn
comdats. Here auto-detection of prevailing def may have different
outcome than auto-detection of prevailed defs. So we may want to disable
the optimization for auto-detected comdats.

Honza
> 
> 	Jakub
>
  
H.J. Lu Feb. 29, 2024, 2:28 p.m. UTC | #10
On Thu, Feb 29, 2024 at 6:15 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> > > I agree that debugability of user core dumps is important here.
> > >
> > > I guess an ideal solution would be to change codegen of noreturn functions
> > > to callee save all registers. Performance of prologue of noreturn
> > > function is not too important. THen we can stop caller saving registers
> > > and still get reasonable backtraces.
> >
> > I don't think that is possible.
> > While both C and C++ require that if [[noreturn]] attribute is used on
> > some function declaration, it must be used on the first declaration and
> > also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
> > in all other TUs which declare the same function.
> > But, we have no such requirement for __attribute__((noreturn)), there it
> > is a pure optimization, it can be declared just on the caller side as an
> > optimization hint the function will not return, or just on the callee side
> > where the compiler will actually verify it doesn't return, or both.
> > And, the attribute is not part of function type, so even in standard C/C++,
> > one can use
> > extern void bar ();
> > [[noreturn]] void foo ()
> > {
> >   for (;;) bar ();
> > }
> > void (*fn) () = foo;
> > void baz ()
> > {
> >   fn ();
> > }
> > As you can call the noreturn function directly or indirectly, changing
> > calling conventions based on noreturn vs. no-noreturn is IMHO not possible.
>
> I am not wed to the idea (just it appeared to me as an option to
> disabling this optimization by default). I still think it may make sense.
>
> Making noreturn calles to save caller saved register is compatible with
> the default ABI.  If noreturn is missing on caller side, then caller will
> save reigsters as usual. Noreturn callee will save them again, which is
> pointless, but everything should work as usual and extra cost of saving
> should not matter in practice.  This is also the case of indirect call
> of noreturn function where you miss annotation on caller side.
>
> If noreturn is missing on callee side, we will lose information on
> functions arguments in backtrace, but the code will still work
> (especially if we save BP register to make code backtraceable).  This is
> scenario that probably can be avoided in practice where it matters (such
> as in glibc abort whose implementation is annotated).
>
> Noreturn already leads to some information loss in backtraces. I tend to
> get surprised from time to time to see whrong call to abort due to tail
> merging. So it may be acceptable to lose info in a situation where user
> does sily thing and only annotates caller.
>
> Since we auto-detect noreturn, we may need to be extra careful about noreturn
> comdats. Here auto-detection of prevailing def may have different
> outcome than auto-detection of prevailed defs. So we may want to disable
> the optimization for auto-detected comdats.
>

There are 2 kinds of noreturns.  One is abort which may require backtrace.
The other is a normal exit from the previous frame.  The latter case doesn't
require backtrace and can be performance critical.  Which one is more
important for users?
  
Jakub Jelinek Feb. 29, 2024, 3:10 p.m. UTC | #11
On Thu, Feb 29, 2024 at 03:15:30PM +0100, Jan Hubicka wrote:
> I am not wed to the idea (just it appeared to me as an option to
> disabling this optimization by default). I still think it may make sense.

Maybe I misunderstood your idea.
So, you are basically suggesting to go in a completely opposite direction
from H.J.'s changes, instead of saving less in noreturn prologues, save
more, most likely not change anything in code generation on the caller's
side (nothing is kept alive across visible unconditional noreturn calls)
and maybe after some time use normally caller-saved registers in say call
frame debug info for possible use in DW_OP_entry_value (but in that case it
is an ABI change) and improve debuggability of vars which live in normally
caller-saved registers right before a noreturn call in that frame?
What about registers in which function arguments are passed?

If it is done as a change with no changes at all on the caller side and
just on the callee side, it could again be guarded by some option (not sure
what the default would be) where the user could increase debuggability of
the noreturn caller (in that case always necessarily just a single immediate
one; while not doing the callee saved register saves improves debuggability
in perhaps multiple routines in the call stack, depending on which registers
wouldn't be saved and which registers are used in each of the caller frames;
e.g. noreturn function could save/not save all of %rbx, %r1[2345], one
caller somewhere use %r12, another %rbx and %r13, yet another one %r14 and
%r15).

	Jakub
  
Jan Hubicka Feb. 29, 2024, 3:26 p.m. UTC | #12
> On Thu, Feb 29, 2024 at 03:15:30PM +0100, Jan Hubicka wrote:
> > I am not wed to the idea (just it appeared to me as an option to
> > disabling this optimization by default). I still think it may make sense.
> 
> Maybe I misunderstood your idea.
> So, you are basically suggesting to go in a completely opposite direction
> from H.J.'s changes, instead of saving less in noreturn prologues, save
> more, most likely not change anything in code generation on the caller's
> side (nothing is kept alive across visible unconditional noreturn calls)
> and maybe after some time use normally caller-saved registers in say call
> frame debug info for possible use in DW_OP_entry_value (but in that case it
> is an ABI change) and improve debuggability of vars which live in normally
> caller-saved registers right before a noreturn call in that frame?
> What about registers in which function arguments are passed?

Hmm, you are right - I got it backwards. 
> 
> If it is done as a change with no changes at all on the caller side and
> just on the callee side, it could again be guarded by some option (not sure
> what the default would be) where the user could increase debuggability of
> the noreturn caller (in that case always necessarily just a single immediate
> one; while not doing the callee saved register saves improves debuggability
> in perhaps multiple routines in the call stack, depending on which registers
> wouldn't be saved and which registers are used in each of the caller frames;
> e.g. noreturn function could save/not save all of %rbx, %r1[2345], one
> caller somewhere use %r12, another %rbx and %r13, yet another one %r14 and
> %r15).

I am not sure how much practical value this would get, but in any case
it is indpeendent of the discussed patch.

Sorry for the confussion,
Honza
> 
> 	Jakub
>
  
Hongtao Liu March 5, 2024, 4:52 a.m. UTC | #13
On Thu, Feb 29, 2024 at 2:20 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > Hi!
> >
> > Adding Hongtao and Honza into the loop as the ones who acked the original
> > patch.
> >
> > The no_callee_saved_registers by default for noreturn functions change can
> > break in-process backtrace(3) or backtraces from debugger or other process
> > (quite often, any time the noreturn function decides to use the bp register
> > and any of the parent frames uses a frame pointer; the unwinder just crashes
> > in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> > like to save bp register in that case:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> I think this patch makes sense and LGTM, we save and restore frame
> pointer for noreturn.
> >
> > and additionally the no_callee_saved_registers by default for noreturn
> > functions change can make debugging harder, again not localized to the
> > noreturn function, but any of its callers.  So, if say glibc abort function
> > implementation needs a lot of normally callee-saved registers, no matter how
> > users recompile their apps, they will see garbage or optimized out
> > vars/parameters in their code unless they rebuild their glibc with -O0.
> > So, I think we should guard that by a non-default option:
From what has been discussed so far, I am inclined to this proposal.
If there are no additional objections(or concerns) in a few days, ok
for the trunk.
> >
>
>
> --
> BR,
> Hongtao
  

Patch

--- gcc/config/i386/i386.opt.jj	2024-01-10 12:19:07.694681189 +0100
+++ gcc/config/i386/i386.opt	2024-02-27 14:18:34.439240869 +0100
@@ -659,6 +659,10 @@  mstore-max=
 Target RejectNegative Joined Var(ix86_store_max) Enum(prefer_vector_width) Init(PVW_NONE) Save
 Maximum number of bits that can be stored to memory efficiently.
 
+mnoreturn-no-callee-saved-registers
+Target Var(ix86_noreturn_no_callee_saved_registers)
+Optimize noreturn functions by not saving callee-saved registers used in the function.
+
 ;; ISA support
 
 m32
--- gcc/config/i386/i386-options.cc.jj	2024-02-27 14:20:59.972228314 +0100
+++ gcc/config/i386/i386-options.cc	2024-02-27 14:23:26.042208182 +0100
@@ -3384,7 +3384,8 @@  ix86_set_func_type (tree fndecl)
 {
   /* No need to save and restore callee-saved registers for a noreturn
      function with nothrow or compiled with -fno-exceptions unless when
-     compiling with -O0 or -Og.  So that backtrace works for those at least
+     compiling with -O0 or -Og, except that it interferes with debugging
+     of callers.  So that backtrace works for those at least
      in most cases, save the bp register if it is used, because it often
      is used in callers to compute CFA.
 
@@ -3401,7 +3402,8 @@  ix86_set_func_type (tree fndecl)
   if (lookup_attribute ("no_callee_saved_registers",
 			TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))
     no_callee_saved_registers = TYPE_NO_CALLEE_SAVED_REGISTERS;
-  else if (TREE_THIS_VOLATILE (fndecl)
+  else if (ix86_noreturn_no_callee_saved_registers
+	   && TREE_THIS_VOLATILE (fndecl)
 	   && optimize
 	   && !optimize_debug
 	   && (TREE_NOTHROW (fndecl) || !flag_exceptions)
--- gcc/doc/invoke.texi.jj	2024-02-23 11:34:34.278287553 +0100
+++ gcc/doc/invoke.texi	2024-02-27 14:29:18.071339182 +0100
@@ -1450,6 +1450,7 @@  See RS/6000 and PowerPC Options.
 -mvzeroupper  -mprefer-avx128  -mprefer-vector-width=@var{opt}
 -mpartial-vector-fp-math
 -mmove-max=@var{bits} -mstore-max=@var{bits}
+-mnoreturn-no-callee-saved-registers
 -mmmx  -msse  -msse2  -msse3  -mssse3  -msse4.1  -msse4.2  -msse4  -mavx
 -mavx2  -mavx512f  -mavx512pf  -mavx512er  -mavx512cd  -mavx512vl
 -mavx512bw  -mavx512dq  -mavx512ifma  -mavx512vbmi  -msha  -maes
@@ -35376,6 +35377,15 @@  Prefer 256-bit vector width for instruct
 Prefer 512-bit vector width for instructions.
 @end table
 
+@opindex mnoreturn-no-callee-saved-registers
+@item -mnoreturn-no-callee-saved-registers
+This option optimizes functions with @code{noreturn} attribute or
+@code{_Noreturn} specifier by not saving in the function prologue callee-saved
+registers which are used in the function (except for the @code{BP}
+register).  This option can interfere with debugging of the caller of the
+@code{noreturn} function or any function further up in the call stack, so it
+is not enabled by default.
+
 @opindex mcx16
 @item -mcx16
 This option enables GCC to generate @code{CMPXCHG16B} instructions in 64-bit
--- gcc/testsuite/gcc.target/i386/pr38534-1.c.jj	2024-02-27 14:21:00.385222600 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-1.c	2024-02-27 15:39:44.687716915 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 
 #define ARRAY_SIZE 256
 
--- gcc/testsuite/gcc.target/i386/pr38534-2.c.jj	2024-02-27 14:21:00.385222600 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-2.c	2024-02-27 15:39:51.569621585 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 
 extern void bar (void) __attribute__ ((no_callee_saved_registers));
 extern void fn (void) __attribute__ ((noreturn));
--- gcc/testsuite/gcc.target/i386/pr38534-3.c.jj	2024-02-27 14:21:00.385222600 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-3.c	2024-02-27 15:39:57.420540547 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 
 typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
 extern fn_t bar;
--- gcc/testsuite/gcc.target/i386/pr38534-4.c.jj	2024-02-27 14:21:00.385222600 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-4.c	2024-02-27 15:40:08.185391436 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 
 typedef void (*fn_t) (void) __attribute__ ((no_callee_saved_registers));
 extern void fn (void) __attribute__ ((noreturn));
--- gcc/testsuite/gcc.target/i386/pr38534-5.c.jj	2024-01-30 08:45:06.904842201 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-5.c	2024-02-27 15:49:17.382784286 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -mnoreturn-no-callee-saved-registers" } */
 
 #define ARRAY_SIZE 256
 
--- gcc/testsuite/gcc.target/i386/pr38534-6.c.jj	2024-01-30 08:45:06.904842201 +0100
+++ gcc/testsuite/gcc.target/i386/pr38534-6.c	2024-02-27 15:49:32.123580145 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -mnoreturn-no-callee-saved-registers" } */
 
 #define ARRAY_SIZE 256
 
--- gcc/testsuite/gcc.target/i386/pr114097-1.c.jj	2024-02-27 14:21:00.386222586 +0100
+++ gcc/testsuite/gcc.target/i386/pr114097-1.c	2024-02-27 15:41:12.758496992 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 
 #define ARRAY_SIZE 256
 
--- gcc/testsuite/gcc.target/i386/stack-check-17.c.jj	2024-02-27 14:21:00.386222586 +0100
+++ gcc/testsuite/gcc.target/i386/stack-check-17.c	2024-02-27 15:41:42.269088224 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstack-clash-protection -mtune=generic -fomit-frame-pointer" } */
+/* { dg-options "-O2 -fstack-clash-protection -mtune=generic -fomit-frame-pointer -mnoreturn-no-callee-saved-registers" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
 /* { dg-skip-if "" { *-*-* } { "-fstack-protector*" } { "" } } */