[v2,1/3,gdb/tdep] Add syscall number cache

Message ID 20231127202054.22070-1-tdevries@suse.de
State New
Headers
Series [v2,1/3,gdb/tdep] Add syscall number cache |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Testing passed

Commit Message

Tom de Vries Nov. 27, 2023, 8:20 p.m. UTC
  When running test-case gdb.base/catch-syscall.exp on powerpc64le-linux, we run
into an xfail:
...
(gdb) catch syscall execve^M
Catchpoint 18 (syscall 'execve' [11])^M
(gdb) PASS: gdb.base/catch-syscall.exp: execve: \
  catch syscall with arguments (execve)
  ...
continue^M
Continuing.^M
^M
Catchpoint 18 (call to syscall execve), 0x00007ffff7d7f18c in execve () from \
  /lib64/libc.so.6^M
(gdb) PASS: gdb.base/catch-syscall.exp: execve: program has called execve
continue^M
Continuing.^M
process 60484 is executing new program: catch-syscall^M
^M
Breakpoint 17, main (argc=1, argv=0x7fffffffe618) at catch-syscall.c:54^M
54              char buf1[2] = "a";^M
(gdb) XFAIL: gdb.base/catch-syscall.exp: execve: syscall execve has returned
...

The problem is that the catchpoint "(return from syscall execve)" doesn't
trigger.

This is caused by ppc_linux_get_syscall_number returning 0 at execve
syscall-exit-stop, while it should return 11.

This is a problem that was fixed in linux kernel version v5.19, by commit
ec6d0dde71d7 ("powerpc: Enable execve syscall exit tracepoint"), but the
machine I'm running the tests on has v4.18.0.

An approach was discussed in the PR where ppc_linux_get_syscall_number would
try to detect an execve syscall-exit-stop based on the register state, but
that was considered too fragile.

Fix this by caching the syscall number at syscall-enter-stop, and reusing it
at syscall-exit-stop.

This is sufficient to stop triggering the xfail, so remove it.

It's good to point out that this doesn't always eliminate the need to get the
syscall number at a syscall-exit-stop.

The test-case has an example called mid-vfork, where we do:
- catch vfork
- continue
- catch syscall
- continue.

The following things happen:
- the "catch vfork" specifies that we capture the PTRACE_EVENT_VFORK event.
- the first continue runs into the event
- the "catch syscall" specifies that we capture syscall-enter-stop and
  syscall-exit-stop events.
- the second continue runs into the syscall-exit-stop.  At that point there's
  no syscall number value cached, because no corresponding syscall-enter-stop
  was observed.

We can address this issue somewhat by translating events into syscalls.  A
followup patch in this series use this approach (though not for vfork).

PR tdep/28623
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28623
---
 gdb/linux-nat.c                          | 53 ++++++++++++++++++++++--
 gdb/linux-nat.h                          |  3 ++
 gdb/testsuite/gdb.base/catch-syscall.exp |  8 +---
 3 files changed, 53 insertions(+), 11 deletions(-)


base-commit: f1b8ee6f2b4381bc46a0ad4c233b6eddc1e135b5
  

Comments

John Baldwin Nov. 27, 2023, 10:33 p.m. UTC | #1
On 11/27/23 12:20 PM, Tom de Vries wrote:
> When running test-case gdb.base/catch-syscall.exp on powerpc64le-linux, we run
> into an xfail:
> ...
> (gdb) catch syscall execve^M
> Catchpoint 18 (syscall 'execve' [11])^M
> (gdb) PASS: gdb.base/catch-syscall.exp: execve: \
>    catch syscall with arguments (execve)
>    ...
> continue^M
> Continuing.^M
> ^M
> Catchpoint 18 (call to syscall execve), 0x00007ffff7d7f18c in execve () from \
>    /lib64/libc.so.6^M
> (gdb) PASS: gdb.base/catch-syscall.exp: execve: program has called execve
> continue^M
> Continuing.^M
> process 60484 is executing new program: catch-syscall^M
> ^M
> Breakpoint 17, main (argc=1, argv=0x7fffffffe618) at catch-syscall.c:54^M
> 54              char buf1[2] = "a";^M
> (gdb) XFAIL: gdb.base/catch-syscall.exp: execve: syscall execve has returned
> ...
> 
> The problem is that the catchpoint "(return from syscall execve)" doesn't
> trigger.
> 
> This is caused by ppc_linux_get_syscall_number returning 0 at execve
> syscall-exit-stop, while it should return 11.
> 
> This is a problem that was fixed in linux kernel version v5.19, by commit
> ec6d0dde71d7 ("powerpc: Enable execve syscall exit tracepoint"), but the
> machine I'm running the tests on has v4.18.0.
> 
> An approach was discussed in the PR where ppc_linux_get_syscall_number would
> try to detect an execve syscall-exit-stop based on the register state, but
> that was considered too fragile.
> 
> Fix this by caching the syscall number at syscall-enter-stop, and reusing it
> at syscall-exit-stop.
> 
> This is sufficient to stop triggering the xfail, so remove it.
> 
> It's good to point out that this doesn't always eliminate the need to get the
> syscall number at a syscall-exit-stop.
> 
> The test-case has an example called mid-vfork, where we do:
> - catch vfork
> - continue
> - catch syscall
> - continue.
> 
> The following things happen:
> - the "catch vfork" specifies that we capture the PTRACE_EVENT_VFORK event.
> - the first continue runs into the event
> - the "catch syscall" specifies that we capture syscall-enter-stop and
>    syscall-exit-stop events.
> - the second continue runs into the syscall-exit-stop.  At that point there's
>    no syscall number value cached, because no corresponding syscall-enter-stop
>    was observed.
> 
> We can address this issue somewhat by translating events into syscalls.  A
> followup patch in this series use this approach (though not for vfork).
> 
> PR tdep/28623
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28623

All 3 of these LGTM modulo one nit below.

Reviewed-by: John Baldwin <jhb@FreeBSD.org>

> ---
>   gdb/linux-nat.c                          | 53 ++++++++++++++++++++++--
>   gdb/linux-nat.h                          |  3 ++
>   gdb/testsuite/gdb.base/catch-syscall.exp |  8 +---
>   3 files changed, 53 insertions(+), 11 deletions(-)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 7b0562cf89b..89c4622160a 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1508,6 +1508,17 @@ linux_resume_one_lwp_throw (struct lwp_info *lp, int step,
>     else
>       lp->stop_pc = 0;
>   
> +  if (catch_syscall_enabled () > 0)
> +    {
> +      /* Function inf_ptrace_target::resume uses PT_SYSCALL.  */
> +    }

ICYMI, Simon just changed catch_syscall_enabled to return a bool a couple of
hours before your mail.
  

Patch

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 7b0562cf89b..89c4622160a 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -1508,6 +1508,17 @@  linux_resume_one_lwp_throw (struct lwp_info *lp, int step,
   else
     lp->stop_pc = 0;
 
+  if (catch_syscall_enabled () > 0)
+    {
+      /* Function inf_ptrace_target::resume uses PT_SYSCALL.  */
+    }
+  else
+    {
+      /* Function inf_ptrace_target::resume uses PT_CONTINUE.
+	 Invalidate syscall_number cache.  */
+      lp->syscall_number = -1;
+    }
+
   linux_target->low_prepare_to_resume (lp);
   linux_target->low_resume (lp->ptid, step, signo);
 
@@ -1762,7 +1773,31 @@  linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
   struct target_waitstatus *ourstatus = &lp->waitstatus;
   struct gdbarch *gdbarch = target_thread_architecture (lp->ptid);
   thread_info *thread = linux_target->find_thread (lp->ptid);
-  int syscall_number = (int) gdbarch_get_syscall_number (gdbarch, thread);
+
+  enum target_waitkind new_syscall_state
+    = (lp->syscall_state == TARGET_WAITKIND_SYSCALL_ENTRY
+       ? TARGET_WAITKIND_SYSCALL_RETURN
+       : TARGET_WAITKIND_SYSCALL_ENTRY);
+
+  int syscall_number;
+  if (new_syscall_state == TARGET_WAITKIND_SYSCALL_RETURN
+      && lp->syscall_number != -1)
+    {
+      /* Calling gdbarch_get_syscall_number for TARGET_WAITKIND_SYSCALL_RETURN
+	 is unreliable on some targets for some syscalls, use the syscall
+	 detected at TARGET_WAITKIND_SYSCALL_ENTRY instead.  */
+      syscall_number = lp->syscall_number;
+      linux_nat_debug_printf
+	(_("Using syscall number %d supplied by syscall_number cache"),
+	 syscall_number);
+    }
+  else
+    {
+      syscall_number = (int) gdbarch_get_syscall_number (gdbarch, thread);
+      linux_nat_debug_printf
+	(_("Using syscall number %d supplied by architecture hook"),
+	 syscall_number);
+    }
 
   if (stopping)
     {
@@ -1791,6 +1826,7 @@  linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
 	 "PTRACE_CONT for SIGSTOP", syscall_number, lp->ptid.lwp ());
 
       lp->syscall_state = TARGET_WAITKIND_IGNORE;
+      lp->syscall_number = -1;
       ptrace (PTRACE_CONT, lp->ptid.lwp (), 0, 0);
       lp->stopped = 0;
       return 1;
@@ -1801,9 +1837,18 @@  linux_handle_syscall_trap (struct lwp_info *lp, int stopping)
      the user could install a new catchpoint for this syscall
      between syscall enter/return, and we'll need to know to
      report a syscall return if that happens.  */
-  lp->syscall_state = (lp->syscall_state == TARGET_WAITKIND_SYSCALL_ENTRY
-		       ? TARGET_WAITKIND_SYSCALL_RETURN
-		       : TARGET_WAITKIND_SYSCALL_ENTRY);
+  lp->syscall_state = new_syscall_state;
+
+  if (lp->syscall_state == TARGET_WAITKIND_SYSCALL_ENTRY)
+    {
+      /* Save to use in TARGET_WAITKIND_SYSCALL_RETURN.  */
+      lp->syscall_number = syscall_number;
+    }
+  else
+    {
+      /* Reset to prevent stale values.  */
+      lp->syscall_number = -1;
+    }
 
   if (catch_syscall_enabled ())
     {
diff --git a/gdb/linux-nat.h b/gdb/linux-nat.h
index 428bb9f1628..b17037400a3 100644
--- a/gdb/linux-nat.h
+++ b/gdb/linux-nat.h
@@ -277,6 +277,9 @@  struct lwp_info : intrusive_list_node<lwp_info>
      - TARGET_WAITKIND_SYSCALL_RETURN */
   enum target_waitkind syscall_state;
 
+  /* Syscall number corresponding to syscall_state.  */
+  int syscall_number = -1;
+
   /* The processor core this LWP was last seen on.  */
   int core = -1;
 
diff --git a/gdb/testsuite/gdb.base/catch-syscall.exp b/gdb/testsuite/gdb.base/catch-syscall.exp
index 0588cb35d87..d8ea466cf00 100644
--- a/gdb/testsuite/gdb.base/catch-syscall.exp
+++ b/gdb/testsuite/gdb.base/catch-syscall.exp
@@ -134,13 +134,7 @@  proc check_return_from_syscall { syscall { pattern "" } } {
 		return 1
 	    }
 	    -re -wrap ".*Breakpoint $decimal, main .*" {
-		# On Powerpc the kernel does not report the returned from
-		# syscall as expected by the test.  GDB bugzilla 28623.
-		if { [istarget "powerpc64*-linux*"] } {
-		    xfail $thistest
-		} else {
-		    fail $thistest
-		}
+		fail $thistest
 		return 0
 	    }
 	}