RISC-V: Fix elfutils testsuite unwind failures.

Message ID 20190113234809.4818-1-jimw@sifive.com
State Committed
Headers

Commit Message

Jim Wilson Jan. 13, 2019, 11:48 p.m. UTC
  The clone.S patch fixes 2 elfutils testsuite unwind failures, where the
backtrace gets stuck repeating __thread_start until we hit the backtrace
limit.  This was confirmed by building and installing a patched glibc and
then building elfutils and running its testsuite.

Unfortunately, the testcase isn't working as expected and I don't know why.
The testcase passes even when my clone.S patch is not installed.  The testcase
looks logically similarly to the elfutils testcases that are failing.  Maybe
there is a subtle difference in how the glibc unwinding works versus the
elfutils unwinding?  I don't have good gdb pthread support yet, so I haven't
found a way to debug this.  Anyways, I don't know if the testcase is useful or
not.  If the testcase isn't useful then maybe the clone.S patch is OK without
a testcase?

Jim

	[BZ #24040] 
	* elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
	* elf/tst-unwind-main.c: If USE_PTHEADS, include pthread.h and error.h
	(func): New.
	(main): If USE_PTHREADS, call pthread_create to run func.  Otherwise
	call func directly.
	* nptl/Makefile (tests): Add tst-unwind-thread.
	(CFLAGS-tst-unwind-thread.c): Define.
	* nptl/tst-unwind-thread.c: New file.
	* sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
	as undefined.
---
 elf/Makefile                          |  2 +-
 elf/tst-unwind-main.c                 | 28 ++++++++++++++++++++++++---
 nptl/Makefile                         |  4 +++-
 nptl/tst-unwind-thread.c              |  2 ++
 sysdeps/unix/sysv/linux/riscv/clone.S |  5 +++++
 5 files changed, 36 insertions(+), 5 deletions(-)
 create mode 100644 nptl/tst-unwind-thread.c
  

Comments

Florian Weimer Jan. 14, 2019, 1:38 p.m. UTC | #1
* Jim Wilson:

> -int
> -main (void)
> +static void *
> +func (void *a)
>  {
>    /* Arrange for this test to be killed if _Unwind_Backtrace runs into an
>       endless loop.  We cannot use the test driver because the complete
>       call chain needs to be compiled with -funwind-tables so that
> -     _Unwind_Backtrace is able to reach _start.  */
> +     _Unwind_Backtrace is able to reach the start routine.  */
>    alarm (DEFAULT_TIMEOUT);
>    _Unwind_Backtrace (callback, 0);
> +  return a;
> +}
> +
> +int
> +main (void)
> +{
> +#if USE_PTHREADS
> +  pthread_t thr;
> +  int rc = pthread_create (&thr, NULL, &func, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_create");
> +  rc = pthread_join (thr, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_join");
> +#else
> +  func (NULL);
> +#endif

Do you expect func to be inlined, to preserve the pattern of the
original test?

I wonder if it is preferable to have separate, new file for this.

Thanks,
Florian
  
Jim Wilson Jan. 14, 2019, 5:29 p.m. UTC | #2
On Mon, Jan 14, 2019 at 5:38 AM Florian Weimer <fweimer@redhat.com> wrote:
> Do you expect func to be inlined, to preserve the pattern of the
> original test?

It shouldn't be necessary for func to be inlined to preserve the
intent of the original test.  We are just unwinding to reach the end
of the stack, and if func isn't inlined it is just one more function
to unwind through.

Jim
  
Mark Wielaard Jan. 17, 2019, 5:05 p.m. UTC | #3
Hi,

On Sun, 2019-01-13 at 15:48 -0800, Jim Wilson wrote:
> The clone.S patch fixes 2 elfutils testsuite unwind failures, where the
> backtrace gets stuck repeating __thread_start until we hit the backtrace
> limit.  This was confirmed by building and installing a patched glibc and
> then building elfutils and running its testsuite.

Obviously I would really like this patch to go in because it makes the
riscv64 elfutils testsuite zero fail. Unfortunately my riscv64 setup is
fairly slow (qemu/libvirt based) and so I haven't actually rebuild
glibc myself with your patch. So the following feedback is mostly
theoretical.

> Unfortunately, the testcase isn't working as expected and I don't know why.
> The testcase passes even when my clone.S patch is not installed.  The testcase
> looks logically similarly to the elfutils testcases that are failing.  Maybe
> there is a subtle difference in how the glibc unwinding works versus the
> elfutils unwinding?  I don't have good gdb pthread support yet, so I haven't
> found a way to debug this.  Anyways, I don't know if the testcase is useful or
> not.  If the testcase isn't useful then maybe the clone.S patch is OK without
> a testcase?

For what it is worth on my setup with glibc-2.28.9000-24.fc30.riscv64
your testcase goes into an infinite loop with -DUSE_PTHREADS=1, but
ends quickly without. So it does seem useful.

There might indeed be subtle differences between the elfutils unwinder,
which is out of process and might take a bit more "risks" to keep
unwinding, compared to the libgcc unwinder which is in-process and so
might be very conservative and stop on any frame that looks suspicious.

You might be able to see how it exactly unwinds by simply adding
something like printf ("%p\n", (void *) _Unwind_GetIP (ctx));
in the callback () function.

> diff --git a/sysdeps/unix/sysv/linux/riscv/clone.S b/sysdeps/unix/sysv/linux/riscv/clone.S
> index c079c1fb9f..0ff9ab3fd9 100644
> --- a/sysdeps/unix/sysv/linux/riscv/clone.S
> +++ b/sysdeps/unix/sysv/linux/riscv/clone.S
> @@ -69,6 +69,11 @@ L (error):
>  
>  ENTRY (__thread_start)
>  L (thread_start):
> +	/* Terminate call stack by noting ra is undefined.  Use a dummy
> +	   .cfi_label to force starting the FDE.  */
> +	.cfi_label .Ldummy
> +	cfi_undefined (ra)
> +
>  	/* Restore the arg for user's function.  */
>  	REG_L		a1,0(sp)	/* Function pointer.  */
>  	REG_L		a0,SZREG(sp)	/* Argument pointer.  */

Why not use cfi_startproc/cfi_endproc instead of the cfi_label?

Cheers,

Mark
  
Jim Wilson Jan. 21, 2019, 5:55 a.m. UTC | #4
On Thu, Jan 17, 2019 at 9:05 AM Mark Wielaard <mark@klomp.org> wrote:
> For what it is worth on my setup with glibc-2.28.9000-24.fc30.riscv64
> your testcase goes into an infinite loop with -DUSE_PTHREADS=1, but
> ends quickly without. So it does seem useful.

Experimenting a bit, I've figured out that if I use the system
libpthread.so, then the testcase behaves as I expect.  It fails
without the libc.so patch, it works with my libc.so patch.  If I use
the libpthread.so I just built as part of the glibc build, then the
testcase always passes.

> You might be able to see how it exactly unwinds by simply adding
> something like printf ("%p\n", (void *) _Unwind_GetIP (ctx));
> in the callback () function.

Trying this, I see that with the system libpthread.so, the backtrace
reaches __thread_start, and then loops infinitely without my patch or
exits with a 0 with my patch.  With the just built libpthread.so, the
backtrace only gets to start_thread, and then exits.  start_thread has
unwind info and a valid return address pointing back to
__thread_start, so it isn't obvious why the unwind stops there.
libpthread does have its own personality routine, maybe something
changed there.  Or maybe one of the missing binutils/gcc/glibc/linux
kernel/etc features we have implemented since the system libpthread.so
was built is causing things to work slightly differently now.  Or
maybe my glibc is configured slightly differently than the system
glibc and that is effecting the result.

I do think the __thead_start change and the new testcase are correct.
Other ports have similar changes to __thread_start, and verifying that
unwind to top of stack in a thread works if a good test to do.

> Why not use cfi_startproc/cfi_endproc instead of the cfi_label?

I used the exact same solution that is already used in the RISC-V
_start function.  I don't care what solution is used here, but I do
think that _start and __thread_start should use the exact same
solution.

Jim
  
Mark Wielaard Jan. 22, 2019, 1:28 p.m. UTC | #5
On Sun, 2019-01-20 at 21:55 -0800, Jim Wilson wrote:
> Or maybe my glibc is configured slightly differently than the
> system glibc and that is effecting the result.

The distro might build with -fasynchronous-unwind-tables which would
make sure that only the CFI unwinder is used, while without it, it
might be that there is a frame that might have to be handled by the
fallback unwinder.

> I do think the __thead_start change and the new testcase are correct.
> Other ports have similar changes to __thread_start, and verifying
> that unwind to top of stack in a thread works is a good test to do.

Agreed,

Mark
  
Jim Wilson Jan. 23, 2019, 6:58 a.m. UTC | #6
On Tue, Jan 22, 2019 at 5:28 AM Mark Wielaard <mark@klomp.org> wrote:
> The distro might build with -fasynchronous-unwind-tables which would
> make sure that only the CFI unwinder is used, while without it, it
> might be that there is a frame that might have to be handled by the
> fallback unwinder.

Thanks.  That indeed turns out to be the problem here.  I see in the
fedora glibc rpm that it is using CFLAGS="-O2 -g
-fasynchronous-unwind-tables".  When I configure and build glibc that
way, the testcase behaves the way I expect.  It fails without my
__thread_start patch and works with my patch.

Jim
  
Palmer Dabbelt Jan. 24, 2019, 2:51 a.m. UTC | #7
On Sun, 13 Jan 2019 15:48:09 PST (-0800), Jim Wilson wrote:
> The clone.S patch fixes 2 elfutils testsuite unwind failures, where the
> backtrace gets stuck repeating __thread_start until we hit the backtrace
> limit.  This was confirmed by building and installing a patched glibc and
> then building elfutils and running its testsuite.
>
> Unfortunately, the testcase isn't working as expected and I don't know why.
> The testcase passes even when my clone.S patch is not installed.  The testcase
> looks logically similarly to the elfutils testcases that are failing.  Maybe
> there is a subtle difference in how the glibc unwinding works versus the
> elfutils unwinding?  I don't have good gdb pthread support yet, so I haven't
> found a way to debug this.  Anyways, I don't know if the testcase is useful or
> not.  If the testcase isn't useful then maybe the clone.S patch is OK without
> a testcase?
>
> Jim
>
> 	[BZ #24040]
> 	* elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
> 	* elf/tst-unwind-main.c: If USE_PTHEADS, include pthread.h and error.h
> 	(func): New.
> 	(main): If USE_PTHREADS, call pthread_create to run func.  Otherwise
> 	call func directly.
> 	* nptl/Makefile (tests): Add tst-unwind-thread.
> 	(CFLAGS-tst-unwind-thread.c): Define.
> 	* nptl/tst-unwind-thread.c: New file.
> 	* sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
> 	as undefined.

I'm happy with the fix and can merge that.  I'm not sure if I can actually 
merge the test suite changes, though.

Does anyone oppose me merging the whole patch?

> ---
>  elf/Makefile                          |  2 +-
>  elf/tst-unwind-main.c                 | 28 ++++++++++++++++++++++++---
>  nptl/Makefile                         |  4 +++-
>  nptl/tst-unwind-thread.c              |  2 ++
>  sysdeps/unix/sysv/linux/riscv/clone.S |  5 +++++
>  5 files changed, 36 insertions(+), 5 deletions(-)
>  create mode 100644 nptl/tst-unwind-thread.c
>
> diff --git a/elf/Makefile b/elf/Makefile
> index 9cf5cd8dfd..815bbd2041 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -1497,4 +1497,4 @@ $(objpfx)tst-big-note: $(objpfx)tst-big-note-lib.so
>
>  $(objpfx)tst-unwind-ctor: $(objpfx)tst-unwind-ctor-lib.so
>
> -CFLAGS-tst-unwind-main.c += -funwind-tables
> +CFLAGS-tst-unwind-main.c += -funwind-tables -DUSE_PTHREADS=0
> diff --git a/elf/tst-unwind-main.c b/elf/tst-unwind-main.c
> index 4089e7e907..0c345dc31f 100644
> --- a/elf/tst-unwind-main.c
> +++ b/elf/tst-unwind-main.c
> @@ -20,19 +20,41 @@
>  #include <unistd.h>
>  #include <support/test-driver.h>
>
> +#if USE_PTHREADS
> +# include <pthread.h>
> +# include <error.h>
> +#endif
> +
>  static _Unwind_Reason_Code
>  callback (struct _Unwind_Context *ctx, void *arg)
>  {
>    return _URC_NO_REASON;
>  }
>
> -int
> -main (void)
> +static void *
> +func (void *a)
>  {
>    /* Arrange for this test to be killed if _Unwind_Backtrace runs into an
>       endless loop.  We cannot use the test driver because the complete
>       call chain needs to be compiled with -funwind-tables so that
> -     _Unwind_Backtrace is able to reach _start.  */
> +     _Unwind_Backtrace is able to reach the start routine.  */
>    alarm (DEFAULT_TIMEOUT);
>    _Unwind_Backtrace (callback, 0);
> +  return a;
> +}
> +
> +int
> +main (void)
> +{
> +#if USE_PTHREADS
> +  pthread_t thr;
> +  int rc = pthread_create (&thr, NULL, &func, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_create");
> +  rc = pthread_join (thr, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_join");
> +#else
> +  func (NULL);
> +#endif
>  }
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 340282c6cb..07c2f49235 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -319,7 +319,7 @@ tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
>  	tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
>  	tst-cnd-timedwait tst-thrd-detach tst-mtx-basic tst-thrd-sleep \
>  	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock \
> -	tst-rwlock-pwn
> +	tst-rwlock-pwn tst-unwind-thread
>
>  tests-internal := tst-rwlock19 tst-rwlock20 \
>  		  tst-sem11 tst-sem12 tst-sem13 \
> @@ -709,6 +709,8 @@ $(objpfx)tst-audit-threads: $(objpfx)tst-audit-threads-mod2.so
>  $(objpfx)tst-audit-threads.out: $(objpfx)tst-audit-threads-mod1.so
>  tst-audit-threads-ENV = LD_AUDIT=$(objpfx)tst-audit-threads-mod1.so
>
> +CFLAGS-tst-unwind-thread.c += -funwind-tables
> +
>  # The tests here better do not run in parallel
>  ifneq ($(filter %tests,$(MAKECMDGOALS)),)
>  .NOTPARALLEL:
> diff --git a/nptl/tst-unwind-thread.c b/nptl/tst-unwind-thread.c
> new file mode 100644
> index 0000000000..d5c38e3709
> --- /dev/null
> +++ b/nptl/tst-unwind-thread.c
> @@ -0,0 +1,2 @@
> +#define USE_PTHREADS 1
> +#include "../elf/tst-unwind-main.c"
> diff --git a/sysdeps/unix/sysv/linux/riscv/clone.S b/sysdeps/unix/sysv/linux/riscv/clone.S
> index c079c1fb9f..0ff9ab3fd9 100644
> --- a/sysdeps/unix/sysv/linux/riscv/clone.S
> +++ b/sysdeps/unix/sysv/linux/riscv/clone.S
> @@ -69,6 +69,11 @@ L (error):
>
>  ENTRY (__thread_start)
>  L (thread_start):
> +	/* Terminate call stack by noting ra is undefined.  Use a dummy
> +	   .cfi_label to force starting the FDE.  */
> +	.cfi_label .Ldummy
> +	cfi_undefined (ra)
> +
>  	/* Restore the arg for user's function.  */
>  	REG_L		a1,0(sp)	/* Function pointer.  */
>  	REG_L		a0,SZREG(sp)	/* Argument pointer.  */
  
Jim Wilson Jan. 28, 2019, 9:18 p.m. UTC | #8
On Wed, Jan 23, 2019 at 6:51 PM Palmer Dabbelt <palmer@sifive.com> wrote:
> I'm happy with the fix and can merge that.  I'm not sure if I can actually
> merge the test suite changes, though.
>
> Does anyone oppose me merging the whole patch?

ping

I need a global glibc maintainer review for the testcase.  Andreas, is
this something you can help with?

Original message:
https://sourceware.org/ml/libc-alpha/2019-01/msg00278.html

Palmer, if no one reviews the testcase, I would suggest just checking
in the RISC-V specific fix, and then I can just put the testcase into
its own bug report and maybe someone will look at it there.  Or maybe
propose me as a glibc RISC-V maintainer and then I will check in the
RISC-V part of the fix myself.  David Abdurachmanov found another
glibc problem, so I probably need to write another glibc patch to fix
it.

Jim
  
Jim Wilson Jan. 28, 2019, 11:10 p.m. UTC | #9
On Mon, Jan 28, 2019 at 2:44 PM DJ Delorie <dj@redhat.com> wrote:
>
> Jim Wilson <jimw@sifive.com> writes:
> > ping
> >
> > I need a global glibc maintainer review for the testcase.  Andreas, is
> > this something you can help with?
>
> I was planning on pushing at least the risc-v portion of this after the
> freeze.  I can worry about the testcase consensus if you've got other
> bugs to debug still, but this week is going to be a difficult week to
> get anyone's attention as we're all doing release-related stuff.

OK, thanks, that helps.  I can wait for that.

Jim
  
Jim Wilson Feb. 4, 2019, 11:01 p.m. UTC | #10
On Mon, Feb 4, 2019 at 2:36 PM DJ Delorie <dj@redhat.com> wrote:
> main() should return something in the fall-through case, be it zero
> (main should exit) or 1 (code shouldn't be reached), now that main()
> doesn't end with _Unwind_Backtrace() itself, if for no reason than to
> avoid future warnings.

The original testcase doesn't end with _Unwind_Backtrace().
_Unwind_Backtrace returns, and then it falls through the bottom of
main.  Just like my testcase.  I was just following the same style.  I
don't know if that was intentional or not.  By the way, the ISO C
standard allows programs to fall through the bottom of main.  This is
defined as exactly equivalent to returning 0.  Looking at the ISO C
2011 standard, section 5.1.2.2.3 Program termination, "reaching the }
that terminates the main function returns a value of 0".  The testcase
fails if _Unwind_Backtrace() does not return, in which case the alarm
triggers, and the alarm signal handler exits with an error.  If
_Unwind_Backtrace() returns then the testcase passed.  FYI
elf/tst-initorder.c has the same style, a main function without an
explicit return.

Anyways, adding a return 0 at the bottom of main is fine with me, I
write my own code that way, and it doesn't change the effect of the
testcase.

I don't have write access to glibc.  Do you want me to send an updated
patch?  Or just fix it yourself and commit it?

Jim
  
Jim Wilson Feb. 12, 2019, 8:08 p.m. UTC | #11
Original message:
https://sourceware.org/ml/libc-alpha/2019-01/msg00278.html

On Sun, Jan 13, 2019 at 3:49 PM Jim Wilson <jimw@sifive.com> wrote:
>
> The clone.S patch fixes 2 elfutils testsuite unwind failures, where the
> backtrace gets stuck repeating __thread_start until we hit the backtrace
> limit.  This was confirmed by building and installing a patched glibc and
> then building elfutils and running its testsuite.
>
> Unfortunately, the testcase isn't working as expected and I don't know why.
> The testcase passes even when my clone.S patch is not installed.  The testcase
> looks logically similarly to the elfutils testcases that are failing.  Maybe
> there is a subtle difference in how the glibc unwinding works versus the
> elfutils unwinding?  I don't have good gdb pthread support yet, so I haven't
> found a way to debug this.  Anyways, I don't know if the testcase is useful or
> not.  If the testcase isn't useful then maybe the clone.S patch is OK without
> a testcase?
>
> Jim
>
>         [BZ #24040]
>         * elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
>         * elf/tst-unwind-main.c: If USE_PTHEADS, include pthread.h and error.h
>         (func): New.
>         (main): If USE_PTHREADS, call pthread_create to run func.  Otherwise
>         call func directly.
>         * nptl/Makefile (tests): Add tst-unwind-thread.
>         (CFLAGS-tst-unwind-thread.c): Define.
>         * nptl/tst-unwind-thread.c: New file.
>         * sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
>         as undefined.
> ---
>  elf/Makefile                          |  2 +-
>  elf/tst-unwind-main.c                 | 28 ++++++++++++++++++++++++---
>  nptl/Makefile                         |  4 +++-
>  nptl/tst-unwind-thread.c              |  2 ++
>  sysdeps/unix/sysv/linux/riscv/clone.S |  5 +++++
>  5 files changed, 36 insertions(+), 5 deletions(-)
>  create mode 100644 nptl/tst-unwind-thread.c
>
> diff --git a/elf/Makefile b/elf/Makefile
> index 9cf5cd8dfd..815bbd2041 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -1497,4 +1497,4 @@ $(objpfx)tst-big-note: $(objpfx)tst-big-note-lib.so
>
>  $(objpfx)tst-unwind-ctor: $(objpfx)tst-unwind-ctor-lib.so
>
> -CFLAGS-tst-unwind-main.c += -funwind-tables
> +CFLAGS-tst-unwind-main.c += -funwind-tables -DUSE_PTHREADS=0
> diff --git a/elf/tst-unwind-main.c b/elf/tst-unwind-main.c
> index 4089e7e907..0c345dc31f 100644
> --- a/elf/tst-unwind-main.c
> +++ b/elf/tst-unwind-main.c
> @@ -20,19 +20,41 @@
>  #include <unistd.h>
>  #include <support/test-driver.h>
>
> +#if USE_PTHREADS
> +# include <pthread.h>
> +# include <error.h>
> +#endif
> +
>  static _Unwind_Reason_Code
>  callback (struct _Unwind_Context *ctx, void *arg)
>  {
>    return _URC_NO_REASON;
>  }
>
> -int
> -main (void)
> +static void *
> +func (void *a)
>  {
>    /* Arrange for this test to be killed if _Unwind_Backtrace runs into an
>       endless loop.  We cannot use the test driver because the complete
>       call chain needs to be compiled with -funwind-tables so that
> -     _Unwind_Backtrace is able to reach _start.  */
> +     _Unwind_Backtrace is able to reach the start routine.  */
>    alarm (DEFAULT_TIMEOUT);
>    _Unwind_Backtrace (callback, 0);
> +  return a;
> +}
> +
> +int
> +main (void)
> +{
> +#if USE_PTHREADS
> +  pthread_t thr;
> +  int rc = pthread_create (&thr, NULL, &func, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_create");
> +  rc = pthread_join (thr, NULL);
> +  if (rc)
> +    error (1, rc, "pthread_join");
> +#else
> +  func (NULL);
> +#endif
>  }
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 340282c6cb..07c2f49235 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -319,7 +319,7 @@ tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
>         tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
>         tst-cnd-timedwait tst-thrd-detach tst-mtx-basic tst-thrd-sleep \
>         tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock \
> -       tst-rwlock-pwn
> +       tst-rwlock-pwn tst-unwind-thread
>
>  tests-internal := tst-rwlock19 tst-rwlock20 \
>                   tst-sem11 tst-sem12 tst-sem13 \
> @@ -709,6 +709,8 @@ $(objpfx)tst-audit-threads: $(objpfx)tst-audit-threads-mod2.so
>  $(objpfx)tst-audit-threads.out: $(objpfx)tst-audit-threads-mod1.so
>  tst-audit-threads-ENV = LD_AUDIT=$(objpfx)tst-audit-threads-mod1.so
>
> +CFLAGS-tst-unwind-thread.c += -funwind-tables
> +
>  # The tests here better do not run in parallel
>  ifneq ($(filter %tests,$(MAKECMDGOALS)),)
>  .NOTPARALLEL:
> diff --git a/nptl/tst-unwind-thread.c b/nptl/tst-unwind-thread.c
> new file mode 100644
> index 0000000000..d5c38e3709
> --- /dev/null
> +++ b/nptl/tst-unwind-thread.c
> @@ -0,0 +1,2 @@
> +#define USE_PTHREADS 1
> +#include "../elf/tst-unwind-main.c"
> diff --git a/sysdeps/unix/sysv/linux/riscv/clone.S b/sysdeps/unix/sysv/linux/riscv/clone.S
> index c079c1fb9f..0ff9ab3fd9 100644
> --- a/sysdeps/unix/sysv/linux/riscv/clone.S
> +++ b/sysdeps/unix/sysv/linux/riscv/clone.S
> @@ -69,6 +69,11 @@ L (error):
>
>  ENTRY (__thread_start)
>  L (thread_start):
> +       /* Terminate call stack by noting ra is undefined.  Use a dummy
> +          .cfi_label to force starting the FDE.  */
> +       .cfi_label .Ldummy
> +       cfi_undefined (ra)
> +
>         /* Restore the arg for user's function.  */
>         REG_L           a1,0(sp)        /* Function pointer.  */
>         REG_L           a0,SZREG(sp)    /* Argument pointer.  */
> --
> 2.19.2
>
  
Andreas Schwab Feb. 13, 2019, 12:03 p.m. UTC | #12
On Jan 13 2019, Jim Wilson <jimw@sifive.com> wrote:

> Unfortunately, the testcase isn't working as expected and I don't know why.
> The testcase passes even when my clone.S patch is not installed.  The testcase
> looks logically similarly to the elfutils testcases that are failing.  Maybe
> there is a subtle difference in how the glibc unwinding works versus the
> elfutils unwinding?  I don't have good gdb pthread support yet, so I haven't
> found a way to debug this.

You need to point libthread-db-search-path to the nptl_db directory so
that gdb is able to find the correct libthread_db.

Andreas.
  
Andreas Schwab Feb. 13, 2019, 4:33 p.m. UTC | #13
On Jan 13 2019, Jim Wilson <jimw@sifive.com> wrote:

> Unfortunately, the testcase isn't working as expected and I don't know why.
> The testcase passes even when my clone.S patch is not installed.

I don't see that here, with the current tools from openSUSE Factory.
There many be some (bad?) luck involved, since it's undefined territory.

> 	[BZ #24040] 
> 	* elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
> 	* elf/tst-unwind-main.c: If USE_PTHEADS, include pthread.h and error.h
> 	(func): New.
> 	(main): If USE_PTHREADS, call pthread_create to run func.  Otherwise
> 	call func directly.
> 	* nptl/Makefile (tests): Add tst-unwind-thread.
> 	(CFLAGS-tst-unwind-thread.c): Define.
> 	* nptl/tst-unwind-thread.c: New file.
> 	* sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
> 	as undefined.

Ok.

Andreas.
  
Palmer Dabbelt Feb. 13, 2019, 10:25 p.m. UTC | #14
On Wed, 13 Feb 2019 08:33:34 PST (-0800), schwab@suse.de wrote:
> On Jan 13 2019, Jim Wilson <jimw@sifive.com> wrote:
>
>> Unfortunately, the testcase isn't working as expected and I don't know why.
>> The testcase passes even when my clone.S patch is not installed.
>
> I don't see that here, with the current tools from openSUSE Factory.
> There many be some (bad?) luck involved, since it's undefined territory.
>
>> 	[BZ #24040]
>> 	* elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
>> 	* elf/tst-unwind-main.c: If USE_PTHEADS, include pthread.h and error.h
>> 	(func): New.
>> 	(main): If USE_PTHREADS, call pthread_create to run func.  Otherwise
>> 	call func directly.
>> 	* nptl/Makefile (tests): Add tst-unwind-thread.
>> 	(CFLAGS-tst-unwind-thread.c): Define.
>> 	* nptl/tst-unwind-thread.c: New file.
>> 	* sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
>> 	as undefined.
>
> Ok.

Thanks.  I committed this.
  
Andreas Schwab Feb. 18, 2019, 9:56 a.m. UTC | #15
Please don't forget to update the bug page.

Andreas.
  

Patch

diff --git a/elf/Makefile b/elf/Makefile
index 9cf5cd8dfd..815bbd2041 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -1497,4 +1497,4 @@  $(objpfx)tst-big-note: $(objpfx)tst-big-note-lib.so
 
 $(objpfx)tst-unwind-ctor: $(objpfx)tst-unwind-ctor-lib.so
 
-CFLAGS-tst-unwind-main.c += -funwind-tables
+CFLAGS-tst-unwind-main.c += -funwind-tables -DUSE_PTHREADS=0
diff --git a/elf/tst-unwind-main.c b/elf/tst-unwind-main.c
index 4089e7e907..0c345dc31f 100644
--- a/elf/tst-unwind-main.c
+++ b/elf/tst-unwind-main.c
@@ -20,19 +20,41 @@ 
 #include <unistd.h>
 #include <support/test-driver.h>
 
+#if USE_PTHREADS
+# include <pthread.h>
+# include <error.h>
+#endif
+
 static _Unwind_Reason_Code
 callback (struct _Unwind_Context *ctx, void *arg)
 {
   return _URC_NO_REASON;
 }
 
-int
-main (void)
+static void *
+func (void *a)
 {
   /* Arrange for this test to be killed if _Unwind_Backtrace runs into an
      endless loop.  We cannot use the test driver because the complete
      call chain needs to be compiled with -funwind-tables so that
-     _Unwind_Backtrace is able to reach _start.  */
+     _Unwind_Backtrace is able to reach the start routine.  */
   alarm (DEFAULT_TIMEOUT);
   _Unwind_Backtrace (callback, 0);
+  return a;
+}
+
+int
+main (void)
+{
+#if USE_PTHREADS
+  pthread_t thr;
+  int rc = pthread_create (&thr, NULL, &func, NULL);
+  if (rc)
+    error (1, rc, "pthread_create");
+  rc = pthread_join (thr, NULL);
+  if (rc)
+    error (1, rc, "pthread_join");
+#else
+  func (NULL);
+#endif
 }
diff --git a/nptl/Makefile b/nptl/Makefile
index 340282c6cb..07c2f49235 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -319,7 +319,7 @@  tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
 	tst-cnd-basic tst-mtx-trylock tst-cnd-broadcast \
 	tst-cnd-timedwait tst-thrd-detach tst-mtx-basic tst-thrd-sleep \
 	tst-mtx-recursive tst-tss-basic tst-call-once tst-mtx-timedlock \
-	tst-rwlock-pwn
+	tst-rwlock-pwn tst-unwind-thread
 
 tests-internal := tst-rwlock19 tst-rwlock20 \
 		  tst-sem11 tst-sem12 tst-sem13 \
@@ -709,6 +709,8 @@  $(objpfx)tst-audit-threads: $(objpfx)tst-audit-threads-mod2.so
 $(objpfx)tst-audit-threads.out: $(objpfx)tst-audit-threads-mod1.so
 tst-audit-threads-ENV = LD_AUDIT=$(objpfx)tst-audit-threads-mod1.so
 
+CFLAGS-tst-unwind-thread.c += -funwind-tables
+
 # The tests here better do not run in parallel
 ifneq ($(filter %tests,$(MAKECMDGOALS)),)
 .NOTPARALLEL:
diff --git a/nptl/tst-unwind-thread.c b/nptl/tst-unwind-thread.c
new file mode 100644
index 0000000000..d5c38e3709
--- /dev/null
+++ b/nptl/tst-unwind-thread.c
@@ -0,0 +1,2 @@ 
+#define USE_PTHREADS 1
+#include "../elf/tst-unwind-main.c"
diff --git a/sysdeps/unix/sysv/linux/riscv/clone.S b/sysdeps/unix/sysv/linux/riscv/clone.S
index c079c1fb9f..0ff9ab3fd9 100644
--- a/sysdeps/unix/sysv/linux/riscv/clone.S
+++ b/sysdeps/unix/sysv/linux/riscv/clone.S
@@ -69,6 +69,11 @@  L (error):
 
 ENTRY (__thread_start)
 L (thread_start):
+	/* Terminate call stack by noting ra is undefined.  Use a dummy
+	   .cfi_label to force starting the FDE.  */
+	.cfi_label .Ldummy
+	cfi_undefined (ra)
+
 	/* Restore the arg for user's function.  */
 	REG_L		a1,0(sp)	/* Function pointer.  */
 	REG_L		a0,SZREG(sp)	/* Argument pointer.  */