[RFC,0/3] Fix attaching to process when it has zombie threads

Message ID 20240321231149.519549-1-thiago.bauermann@linaro.org

Message

Thiago Jung Bauermann March 21, 2024, 11:11 p.m. UTC
  Hello,

This patch series fixes a GDB hang when attaching to a multi-threaded
inferior which happens often (but not always) on aarch64-linux and
powerpc64le-linux, as described in PR 31312.  See patch 3 for a detailed
description of the problem.

Patches 1 and 2 are preparatory: I want to reuse the existing code that
parses the /proc/PID/stat file to get the thread's starttime value, so
that GDB and gdbserver aren't fooled by PID reuse.
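
To illustrate the idea (this is only a minimal sketch, not the code in the
patches), the starttime lookup could look roughly like the following.  The
names stat_field and same_starttime are hypothetical and are not GDB
functions.  The sketch assumes the field numbering of procfs(5), where
comm is field 2 and starttime is field 22.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not GDB's linux_find_proc_stat_field): copy the
// FIELD-th entry of a /proc/PID/stat line into OUT, with fields numbered
// from 1 as in procfs(5).  Returns false if the line has no such field.
static bool
stat_field (const std::string &line, int field, std::string &out)
{
  assert (field >= 1);

  // Field 2 (comm) is parenthesized and may itself contain spaces or
  // parentheses, so find its end by searching for the last ')'.
  std::string::size_type open = line.find ('(');
  std::string::size_type close = line.rfind (')');
  if (open == std::string::npos || close == std::string::npos
      || close < open)
    return false;

  if (field == 1)
    {
      std::istringstream pid (line.substr (0, open));
      return static_cast<bool> (pid >> out);
    }

  if (field == 2)
    {
      out = line.substr (open + 1, close - open - 1);
      return true;
    }

  // The remaining fields are separated by whitespace; the first one
  // after the closing parenthesis is field 3 (state).
  std::istringstream rest (line.substr (close + 1));
  std::string token;
  for (int i = 3; rest >> token; i++)
    if (i == field)
      {
        out = token;
        return true;
      }

  return false;
}

// Assumption of this sketch: two snapshots describe the same thread only
// if their starttime (field 22) matches, since a recycled PID is expected
// to show a different starttime.
static bool
same_starttime (const std::string &before, const std::string &after)
{
  std::string a, b;
  return (stat_field (before, 22, a) && stat_field (after, 22, b)
          && a == b);
}
```

A caller would read /proc/PID/stat once when it first sees the thread and
again later, and treat the PID as reused if same_starttime returns false.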

This patch series was tested on native and extended-remote aarch64-linux
and armv8l-linux-gnueabihf and no regressions were found, except for the
following:

When running gdb.threads/detach-step-over.exp on armv8l-linux-gnueabihf
extended-remote, sometimes GDBserver dies with:

  builtin_spawn /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over
  Remote debugging from host 127.0.0.1, port 56624
  Process /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over created; pid = 840876
  Attached; pid = 840821
  Detaching from process 840821
  Attached; pid = 840821
  /home/thiago.bauermann/src/binutils-gdb/gdbserver/linux-low.cc:1956: A problem internal to GDBserver has been detected.
  unsuspend LWP 840821, suspended=-1

The assertion triggered is this one:

  /* Decrement LWP's suspend count.  */

  static void
  lwp_suspended_decr (struct lwp_info *lwp)
  {
    lwp->suspended--;

    if (lwp->suspended < 0)
      {
        struct thread_info *thread = get_lwp_thread (lwp);

        internal_error ("unsuspend LWP %ld, suspended=%d\n", lwpid_of (thread),
                        lwp->suspended);
      }
  }
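
For readers unfamiliar with this counter, here is a toy model of what
"suspended=-1" means: one more decrement than increments.  It is only a
sketch under stated assumptions.  toy_lwp and the toy_* functions are
hypothetical stand-ins, not gdbserver code, and the sketch throws an
exception where gdbserver calls internal_error.  It says nothing about
which code path performs the unmatched decrement in the failing test.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy stand-in for gdbserver's lwp_info; only the suspend count is
// modeled here.
struct toy_lwp
{
  explicit toy_lwp (long id) : lwpid (id), suspended (0) {}

  long lwpid;
  int suspended;
};

// Increment the suspend count.  A healthy counter is never negative
// on entry.
static void
toy_suspended_incr (toy_lwp &lwp)
{
  assert (lwp.suspended >= 0);
  lwp.suspended++;
}

// Decrement the suspend count, reporting an unmatched decrement the
// same way the check above does, but by throwing.
static void
toy_suspended_decr (toy_lwp &lwp)
{
  lwp.suspended--;

  if (lwp.suspended < 0)
    throw std::logic_error ("unsuspend LWP " + std::to_string (lwp.lwpid)
                            + ", suspended="
                            + std::to_string (lwp.suspended));
}
```

A balanced toy_suspended_incr/toy_suspended_decr pair leaves the count at
zero; a second toy_suspended_decr with no matching increment drives it to
-1 and raises the error.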

Unfortunately, for the moment I don't have time to debug this problem
further, and I didn't want to keep sitting on these patches until I can
come back to this issue.

Note that of all the testcases in the GDB testsuite, only
detach-step-over.exp triggers the GDBserver internal error, so it's a
localized problem.

This is why I'm posting the patch series as an RFC. Considering that it
fixes a problem that is causing instability in the testsuite results for
aarch64-linux and powerpc64le-linux, does it make sense to commit it as is,
and then investigate the GDBserver internal error on armv8l-linux-gnueabihf
later?

Thiago Jung Bauermann (3):
  gdb/nat: Use procfs(5) indexes in linux_common_core_of_thread
  gdb/nat: Factor linux_find_proc_stat_field out of
    linux_common_core_of_thread
  gdb/nat/linux: Fix attaching to process when it has zombie threads

 gdb/nat/linux-osdata.c | 65 +++++++++++++++++++++++++++++++++---------
 gdb/nat/linux-osdata.h |  7 +++++
 gdb/nat/linux-procfs.c | 19 ++++++++++++
 3 files changed, 77 insertions(+), 14 deletions(-)


base-commit: b42aa684f6ff2bce9b8bc58aa89574723f17f1ce
  

Comments

Christophe Lyon March 22, 2024, 10:17 a.m. UTC | #1
Hi!


On Fri, 22 Mar 2024 at 00:12, Thiago Jung Bauermann
<thiago.bauermann@linaro.org> wrote:
>
> Hello,
>
> This patch series fixes a GDB hang when attaching to a multi-threaded
> inferior which happens often (but not always) on aarch64-linux and
> powerpc64le-linux, as described in PR 31312.  See patch 3 for a detailed
> description of the problem.
>
> Patches 1 and 2 are preparatory patches because I want to use existing code
> to parse the /proc/PID/stat file to get the thread's starttime value, so
> that GDB and gdbserver aren't fooled by PID reuse.
>
> This patch series was tested on native and extended-remote aarch64-linux
> and armv8l-linux-gnueabihf and no regressions were found, except for the
> following:
>
> When running gdb.threads/detach-step-over.exp on armv8l-linux-gnueabihf
> extended-remote, sometimes GDBserver dies with:
>
>   builtin_spawn /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over
>   Remote debugging from host 127.0.0.1, port 56624
>   Process /home/thiago.bauermann/.cache/builds/gdb-native-aarch32/gdb/testsuite/outputs/gdb.threads/detach-step-over/detach-step-over created; pid = 840876
>   Attached; pid = 840821
>   Detaching from process 840821
>   Attached; pid = 840821
>   /home/thiago.bauermann/src/binutils-gdb/gdbserver/linux-low.cc:1956: A problem internal to GDBserver has been detected.
>   unsuspend LWP 840821, suspended=-1
>
> The assertion triggered is this one:
>
>   /* Decrement LWP's suspend count.  */
>
>   static void
>   lwp_suspended_decr (struct lwp_info *lwp)
>   {
>     lwp->suspended--;
>
>     if (lwp->suspended < 0)
>       {
>         struct thread_info *thread = get_lwp_thread (lwp);
>
>         internal_error ("unsuspend LWP %ld, suspended=%d\n", lwpid_of (thread),
>                       lwp->suspended);
>       }
>   }
>
> Unfortunately for the moment I don't have time to further debug this
> problem and I didn't want to keep sitting on these patches until I can come
> back to this issue.
>
> Note that of all the testcases in the GDB testsuite, only
> detach-step-over.exp triggers the GDBserver internal error so it's a
> localized problem.
>
> This is why I'm posting the patch series as an RFC. Considering that it
> fixes a problem that is causing instability in the testsuite results for
> aarch64-linux and powerpc64le-linux, does it make sense to commit it as is,
> and then investigate the GDBserver internal error on armv8l-linux-gnueabihf
> later?

I quickly looked at the series: patches 1 and 2 LGTM. I would say
patch 3 does too, but it seems to be causing the random failures you
mention :-(
However, I think your rationale is OK: trading many failures for a
single, localized one.
But of course, I'm not a maintainer :-)

Thanks,

Christophe
