diff mbox series

[v2] powerpc64: Workaround sigtramp vdso return call

Message ID 20210127192305.pw74xougw2ejihuz@work-tp
State Committed
Commit 5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1
Delegated to: Tulio Magno Quites Machado Filho
Headers show
Series [v2] powerpc64: Workaround sigtramp vdso return call | expand

Commit Message

Raoni Fassina Firmino Jan. 27, 2021, 7:23 p.m. UTC
Changes since v1[1]:
  - Fixed comments length and formatting;
  - Changed comment wording to the one suggested by Adhermeval;
  - Changed if logic to the one suggested by Adhermeval.

Original message:
  There was some initial discussions on the mailing list about the
  failing misc/tst-sigcontext-get_pc[0], and I made some suggestions
  of possible solutions.  As the window for the release is closing I
  want to sent the more simple one of them ASAP for consideration
  (others would not make it in time)

Tested on top of master (01cdcf783a666481133d4975b1980624b0ef4799)
on the following platforms with no regression:
  - powerpc64le-linux-gnu (Power 9) kernel 5.4.0-59
  - powerpc64le-linux-gnu (Power 9) kernel 5.9.14-1
  - powerpc64-linux-gnu (Power 9) kernel 5.10.0-1

[0] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
[1] https://sourceware.org/pipermail/libc-alpha/2021-January/121933.html

---- 8< ----

A not so recent kernel change[1] changed how the trampoline
`__kernel_sigtramp_rt64` is used to call signal handlers.

This was exposed on the test misc/tst-sigcontext-get_pc

Before kernel 5.9, the kernel set LR to the trampoline address and
jumped directly to the signal handler, and at the end the signal
handler, as any other function, would `blr` to the address set.  In
other words, the trampoline was executed just at the end of the signal
handler and the only thing it did was call sigreturn.  But since
kernel 5.9 the kernel set CTRL to the signal handler and calls to the
trampoline code, the trampoline then `bctrl` to the address in CTRL,
setting the LR to the next instruction in the middle of the
trampoline, when the signal handler returns, the rest of the
trampoline code executes the same code as before.

Here is the full trampoline code as of kernel 5.11.0-rc5 for
reference:

    V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
    .Lsigrt_start:
            bctrl▸  /* call the handler */
            addi▸   r1, r1, __SIGNAL_FRAMESIZE
            li▸     r0,__NR_rt_sigreturn
            sc
    .Lsigrt_end:
    V_FUNCTION_END(__kernel_sigtramp_rt64)

This new behavior breaks how `backtrace()` uses to detect the
trampoline frame to correctly reconstruct the stack frame when it is
called from inside a signal handling.

This workaround rely on the fact that the trampoline code is at very
least two (maybe 3?) instructions in size (as it is in the 32 bits
version, only on `li` and `sc`), so it is safe to check the return
address be in the range __kernel_sigtramp_rt64 .. + 4.

[1] subject: powerpc/64/signal: Balance return predictor stack in signal trampoline
    commit: 0138ba5783ae0dcc799ad401a1e8ac8333790df9
    url: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0138ba5783ae0dcc799ad401a1e8ac8333790df9
---
 sysdeps/powerpc/powerpc64/backtrace.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Adhemerval Zanella Jan. 27, 2021, 7:33 p.m. UTC | #1
On 27/01/2021 16:23, Raoni Fassina Firmino wrote:
> Changes since v1[1]:
>   - Fixed comments length and formatting;
>   - Changed comment wording to the one suggested by Adhermeval;
>   - Changed if logic to the one suggested by Adhermeval.
> 
> Original message:
>   There was some initial discussions on the mailing list about the
>   failing misc/tst-sigcontext-get_pc[0], and I made some suggestions
>   of possible solutions.  As the window for the release is closing I
>   want to sent the more simple one of them ASAP for consideration
>   (others would not make it in time)
> 
> Tested on top of master (01cdcf783a666481133d4975b1980624b0ef4799)
> on the following platforms with no regression:
>   - powerpc64le-linux-gnu (Power 9) kernel 5.4.0-59
>   - powerpc64le-linux-gnu (Power 9) kernel 5.9.14-1
>   - powerpc64-linux-gnu (Power 9) kernel 5.10.0-1
> 
> [0] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
> [1] https://sourceware.org/pipermail/libc-alpha/2021-January/121933.html
> 
> ---- 8< ----
> 
> A not so recent kernel change[1] changed how the trampoline
> `__kernel_sigtramp_rt64` is used to call signal handlers.
> 
> This was exposed on the test misc/tst-sigcontext-get_pc
> 
> Before kernel 5.9, the kernel set LR to the trampoline address and
> jumped directly to the signal handler, and at the end the signal
> handler, as any other function, would `blr` to the address set.  In
> other words, the trampoline was executed just at the end of the signal
> handler and the only thing it did was call sigreturn.  But since
> kernel 5.9 the kernel set CTRL to the signal handler and calls to the
> trampoline code, the trampoline then `bctrl` to the address in CTRL,
> setting the LR to the next instruction in the middle of the
> trampoline, when the signal handler returns, the rest of the
> trampoline code executes the same code as before.
> 
> Here is the full trampoline code as of kernel 5.11.0-rc5 for
> reference:
> 
>     V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
>     .Lsigrt_start:
>             bctrl▸  /* call the handler */
>             addi▸   r1, r1, __SIGNAL_FRAMESIZE
>             li▸     r0,__NR_rt_sigreturn
>             sc
>     .Lsigrt_end:
>     V_FUNCTION_END(__kernel_sigtramp_rt64)
> 
> This new behavior breaks how `backtrace()` uses to detect the
> trampoline frame to correctly reconstruct the stack frame when it is
> called from inside a signal handling.
> 
> This workaround rely on the fact that the trampoline code is at very
> least two (maybe 3?) instructions in size (as it is in the 32 bits
> version, only on `li` and `sc`), so it is safe to check the return
> address be in the range __kernel_sigtramp_rt64 .. + 4.
> 
> [1] subject: powerpc/64/signal: Balance return predictor stack in signal trampoline
>     commit: 0138ba5783ae0dcc799ad401a1e8ac8333790df9
>     url: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0138ba5783ae0dcc799ad401a1e8ac8333790df9

LGTM, it is ok for 2.33.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  sysdeps/powerpc/powerpc64/backtrace.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/sysdeps/powerpc/powerpc64/backtrace.c b/sysdeps/powerpc/powerpc64/backtrace.c
> index ae64c5d7a5..37de9b5bdd 100644
> --- a/sysdeps/powerpc/powerpc64/backtrace.c
> +++ b/sysdeps/powerpc/powerpc64/backtrace.c
> @@ -54,11 +54,22 @@ struct signal_frame_64 {
>    /* We don't care about the rest, since the IP value is at 'uc' field.  */
>  };
>  
> +/* Test if the address match to the inside the trampoline code.
> +   Up to and including kernel 5.8, returning from an interrupt or syscall to a
> +   signal handler starts execution directly at the handler's entry point, with
> +   LR set to address of the sigreturn trampoline (the vDSO symbol).
> +   Newer kernels will branch to signal handler from the trampoline instead, so
> +   checking the stacktrace against the vDSO entrypoint does not work in such
> +   case.
> +   The vDSO branches with a 'bctrl' instruction, so checking either the
> +   vDSO address itself and the next instruction should cover all kernel
> +   versions.  */
>  static inline bool
>  is_sigtramp_address (void *nip)
>  {
>  #ifdef HAVE_SIGTRAMP_RT64
> -  if (nip == GLRO (dl_vdso_sigtramp_rt64))
> +  if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
> +      nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
>      return true;
>  #endif
>    return false;
>
Tulio Magno Quites Machado Filho Jan. 28, 2021, 4:59 p.m. UTC | #2
Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> writes:

> On 27/01/2021 16:23, Raoni Fassina Firmino wrote:
>> 
>> A not so recent kernel change[1] changed how the trampoline
>> `__kernel_sigtramp_rt64` is used to call signal handlers.
>> 
>> This was exposed on the test misc/tst-sigcontext-get_pc
>> 
>> Before kernel 5.9, the kernel set LR to the trampoline address and
>> jumped directly to the signal handler, and at the end the signal
>> handler, as any other function, would `blr` to the address set.  In
>> other words, the trampoline was executed just at the end of the signal
>> handler and the only thing it did was call sigreturn.  But since
>> kernel 5.9 the kernel set CTRL to the signal handler and calls to the
>> trampoline code, the trampoline then `bctrl` to the address in CTRL,
>> setting the LR to the next instruction in the middle of the
>> trampoline, when the signal handler returns, the rest of the
>> trampoline code executes the same code as before.
>> 
>> Here is the full trampoline code as of kernel 5.11.0-rc5 for
>> reference:
>> 
>>     V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
>>     .Lsigrt_start:
>>             bctrl▸  /* call the handler */
>>             addi▸   r1, r1, __SIGNAL_FRAMESIZE
>>             li▸     r0,__NR_rt_sigreturn
>>             sc
>>     .Lsigrt_end:
>>     V_FUNCTION_END(__kernel_sigtramp_rt64)
>> 
>> This new behavior breaks how `backtrace()` uses to detect the
>> trampoline frame to correctly reconstruct the stack frame when it is
>> called from inside a signal handling.
>> 
>> This workaround rely on the fact that the trampoline code is at very
>> least two (maybe 3?) instructions in size (as it is in the 32 bits
>> version, only on `li` and `sc`), so it is safe to check the return
>> address be in the range __kernel_sigtramp_rt64 .. + 4.
>> 
>> [1] subject: powerpc/64/signal: Balance return predictor stack in signal trampoline
>>     commit: 0138ba5783ae0dcc799ad401a1e8ac8333790df9
>>     url: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0138ba5783ae0dcc799ad401a1e8ac8333790df9
>
> LGTM, it is ok for 2.33.
>
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Pushed as 5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1

Thanks!
Florian Weimer Feb. 11, 2021, 9:55 p.m. UTC | #3
* Tulio Magno Quites Machado Filho via Libc-alpha:

>> LGTM, it is ok for 2.33.
>>
>> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
>
> Pushed as 5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1

Why isn't this handled as a kernel regression?

Thanks,
Florian
Adhemerval Zanella Feb. 12, 2021, 2:14 a.m. UTC | #4
On 11/02/2021 18:55, Florian Weimer wrote:
> * Tulio Magno Quites Machado Filho via Libc-alpha:
> 
>>> LGTM, it is ok for 2.33.
>>>
>>> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
>>
>> Pushed as 5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1
> 
> Why isn't this handled as a kernel regression?

I don't have a strong opinion about it, but this issue is presented on
at least two already released versions (5.9 and 5.10) and it falls on
the category where it might characterize as ABI abuse: backtrace is
relying on a specific semantic where kernel/vDSO does provide a working
solution that does not suffer from this inherent limitation (unwinder
information produced by CFI).

It see this similar to how some sanitizer code expects some invariant
glibc internals and need to handle with ad-hoc code (such as struct
pthread internal size).  This issue for the sanitize case is slight
worse because there is proper solution.

That's why I suggested in the original thread to get rid of the powerpc
optimization and implement the backtrace as other architectures does
(through libgcc). I was hopping that IBM could work on this one.
Raoni Fassina Firmino Feb. 15, 2021, 5:51 p.m. UTC | #5
On Thu, Feb 11, 2021 at 10:55:54PM +0100, AL glibc-alpha wrote:
> * Tulio Magno Quites Machado Filho via Libc-alpha:
> 
> >> LGTM, it is ok for 2.33.
> >>
> >> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
> >
> > Pushed as 5ee506ed35a2c9184bcb1fb5e79b6cceb9bb0dd1
> 
> Why isn't this handled as a kernel regression?

I don't know if you mean a specific workflow to flag this as a
regression in any bugtrack or something like that.  Apart from that IMHO
It is is a regression as I mentioned in some prior email[1] and I
characterized it as such in my kernel patch[2] that was include for the
5.11 release and backported for 5.10.16.  At least no one questioned
this characterization.

o/
Raoni

[1] https://sourceware.org/pipermail/libc-alpha/2021-January/121951.html
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-February/223585.html
diff mbox series

Patch

diff --git a/sysdeps/powerpc/powerpc64/backtrace.c b/sysdeps/powerpc/powerpc64/backtrace.c
index ae64c5d7a5..37de9b5bdd 100644
--- a/sysdeps/powerpc/powerpc64/backtrace.c
+++ b/sysdeps/powerpc/powerpc64/backtrace.c
@@ -54,11 +54,22 @@  struct signal_frame_64 {
   /* We don't care about the rest, since the IP value is at 'uc' field.  */
 };
 
+/* Test if the address match to the inside the trampoline code.
+   Up to and including kernel 5.8, returning from an interrupt or syscall to a
+   signal handler starts execution directly at the handler's entry point, with
+   LR set to address of the sigreturn trampoline (the vDSO symbol).
+   Newer kernels will branch to signal handler from the trampoline instead, so
+   checking the stacktrace against the vDSO entrypoint does not work in such
+   case.
+   The vDSO branches with a 'bctrl' instruction, so checking either the
+   vDSO address itself and the next instruction should cover all kernel
+   versions.  */
 static inline bool
 is_sigtramp_address (void *nip)
 {
 #ifdef HAVE_SIGTRAMP_RT64
-  if (nip == GLRO (dl_vdso_sigtramp_rt64))
+  if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
+      nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
     return true;
 #endif
   return false;