[v3] Fix sporadic XFAILs in gdb.threads/attach-many-short-lived-threads.exp

Message ID AS8P193MB12856B1F3CB608F1B03A9484E4012@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
State New
Headers
Series [v3] Fix sporadic XFAILs in gdb.threads/attach-many-short-lived-threads.exp |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Bernd Edlinger April 7, 2024, 8:42 a.m. UTC
  This is about random test failures like those:

XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 6: attach (EPERM)
XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 7: attach (EPERM)
XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: attach (EPERM)
XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 10: attach (EPERM)

The reason for this effect is apparently as follows:

There is a race condition when gdb tries to attach a thread but the
thread exits at the same time.  Normally when that happens the return
code of ptrace(PTRACE_ATTACH, x) is EPERM, which could also have other
reasons.  To detect the true reason, we try to open /proc/<pid>/status
which normally fails in that situation, but it may happen that the
fopen succeeds, and the thread disappears while reading the content,
then linux_proc_pid_get_state fails to find the "State:" line.
Since there is no other possible reason why this can happen,
use that as an indication that the thread has most likely exited.
---
 gdb/nat/linux-procfs.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

v2: from kernel code review, it seems the missing "State:"
 can only happen if the thread disappeared, so no need to
 look at errno at all here.

v3: update commit message.
  

Comments

Thiago Jung Bauermann April 10, 2024, 3:59 a.m. UTC | #1
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> This is about random test failures like those:
>
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 6: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 7: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
> XFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 10: attach (EPERM)
>
> The reason for this effect is apparently as follows:
>
> There is a race condition when gdb tries to attach a thread but the
> thread exits at the same time.  Normally when that happens the return
> code of ptrace(PTRACE_ATTACH, x) is EPERM, which could also have other
> reasons.  To detect the true reason, we try to open /proc/<pid>/status
> which normally fails in that situation, but it may happen that the
> fopen succeeds, and the thread disappears while reading the content,
> then linux_proc_pid_get_state fails to find the "State:" line.
> Since there is no other possible reason why this can happen,
> use that as an indication that the thread has most likely exited.
> ---
>  gdb/nat/linux-procfs.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> v2: from kernel code review, it seems the missing "State:"
>  can only happen if the thread disappeared, so no need to
>  look at errno at all here.

As I said in v1 of this patch, my reading of the kernel code (function
task_state in linux/fs/proc/array.c) agrees with the statement above,
but I'm not confident in my conclusion.

The reason is that it's tricky to understand the effect of kernel
preemption as well as simultaneous execution of "competing" code in a
different CPU. Case in point: in the beginning of task_state, there's a
block enclosed in rcu_read_lock *and* inside that block there's also
some code within task_lock. Why both are necessary? I don't know. And
the weirdest thing to me is that the part which reads the task state for
the "State:" line isn't protected by either of those. My uninformed
assumption would be that it would be best if it were.

So unfortunately I don't feel comfortable enough giving my Reviewed-by:
to this patch, even though I think it's probably right.

I would defer to Pedro, who wrote this code originally, in commit
8784d56326e7 ("Linux: on attach, attach to lwps listed under
/proc/$pid/task/"). It was a long time ago, but perhaps he remembers why
he decided to assume that there could be a live task with no "State:"
line in its status file.

> v3: update commit message.
>
> diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
> index e2086952ce6..8d46d5bf289 100644
> --- a/gdb/nat/linux-procfs.c
> +++ b/gdb/nat/linux-procfs.c
> @@ -157,17 +157,12 @@ linux_proc_pid_is_gone (pid_t pid)
>    enum proc_state state;
>
>    have_state = linux_proc_pid_get_state (pid, 0, &state);
> -  if (have_state < 0)
> +  if (have_state <= 0)
>      {
> -      /* If we can't open the status file, assume the thread has
> -	 disappeared.  */
> +      /* If we can't open the status file or there is no "State:" line,
> +	 assume the thread has disappeared.  */
>        return 1;
>      }
> -  else if (have_state == 0)
> -    {
> -      /* No "State:" line, assume thread is alive.  */
> -      return 0;
> -    }
>    else
>      return (state == PROC_STATE_ZOMBIE || state == PROC_STATE_DEAD);
>  }

--
Thiago
  

Patch

diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
index e2086952ce6..8d46d5bf289 100644
--- a/gdb/nat/linux-procfs.c
+++ b/gdb/nat/linux-procfs.c
@@ -157,17 +157,12 @@  linux_proc_pid_is_gone (pid_t pid)
   enum proc_state state;
 
   have_state = linux_proc_pid_get_state (pid, 0, &state);
-  if (have_state < 0)
+  if (have_state <= 0)
     {
-      /* If we can't open the status file, assume the thread has
-	 disappeared.  */
+      /* If we can't open the status file or there is no "State:" line,
+	 assume the thread has disappeared.  */
       return 1;
     }
-  else if (have_state == 0)
-    {
-      /* No "State:" line, assume thread is alive.  */
-      return 0;
-    }
   else
     return (state == PROC_STATE_ZOMBIE || state == PROC_STATE_DEAD);
 }