[RFC] Set process affinity in test to work around ARM ptrace bug

Message ID 86a8hxzni8.fsf@gmail.com
State New, archived
Headers

Commit Message

Yao Qi July 4, 2016, 10:49 a.m. UTC
  Pedro Alves <palves@redhat.com> writes:

> I also think that whatever workaround, if any, should be limited
> to known-broken kernels.  Otherwise, this is likely to mask
> other problems going forward.  Maybe all we have is the version
> number to work with, but that's still better than unconditionally
> enabling this on arm.

The updated version adds a linux kernel version check.
  

Comments

Yao Qi July 25, 2016, 1:22 p.m. UTC | #1
Ping.

On Mon, Jul 4, 2016 at 11:49 AM, Yao Qi <qiyaoltc@gmail.com> wrote:
> Pedro Alves <palves@redhat.com> writes:
>
>> I also think that whatever workaround, if any, should be limited
>> to known-broken kernels.  Otherwise, this is likely to mask
>> other problems going forward.  Maybe all we have is the version
>> number to work with, but that's still better than unconditionally
>> enabling this on arm.
>
> The updated version adds a linux kernel version check.
>
> --
> Yao (齐尧)
> From 27fe094e6a99929f8f281d88beaa599771550025 Mon Sep 17 00:00:00 2001
> From: Yao Qi <yao.qi@linaro.org>
> Date: Mon, 27 Jun 2016 08:45:16 +0100
> Subject: [PATCH] Set process affinity in test to work around ARM ptrace bug
>
> We recently found a ARM kernel ptrace bug
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-May/431962.html
> As a result of this bug, after GDB ptrace set VFP registers, the hardware
> registers may not be updated.  This bug causes some intermittent fails in
> tests, like return.exp, call-rt-st.exp, callfuncs.exp, etc.
>
> The bug was introduced by 8130b9d7b9d858aa04ce67805e8951e3cb6e9b2f
> in 2012 and is fixed in e2dfb4b880146bfd4b6aa8e138c0205407cebbaf in May.
> The bug is fixed in ARM kernel tree, but it is impractical to upgrade
> linux kernel from git tree or most recently release.  I am wondering
> we can workaround this kernel bug somehow.
>
> My first attempt is to workaround it in GDB, so that GDB still writes
> the VFP registers and sync them to hardware.  The kernel patch is quite
> simple, which moves vfp_flush_hwstate one line below.  Probably, we can
> call ptrace set vfp registers twice, and then the second vfp set can
> flush the state correctly.  Unfortunately, it doesn't work, because
> every time of ptrace set, kernel loads VFP registers from hardware first,
> which might be out of date after the first ptrace set.  That is to say,
> we can't workaround this kernel bug in GDB.
>
> Then, I am thinking we can workaround this bug in testing, because the
> intermittent fails are confusing in comparing test results.  We can bind
> both tracer and tracee on the same core.  For example, we can start GDB
> or GDBserver with "taskset -c 0 ", but this is a global change, may
> have some affects on gdb.threads tests.  I also think about doing
> "taskset -p PID -c 0" in test harness after the inferior is started,
> and do the same to the parent process of inferior (which is either GDB
> or GDBserver), but don't know how to get GDB (in remote host) and
> GDBserver's process id.
>
> The approach in this patch is to have a small c function which sets
> both process affinity and its parent's affinity to core 0 if the target
> is arm linux and the kernel version is known broken having the ptrace
> bug setting VFP registers.  The function set_process_affinity should
> be called in these tests explicitly, but other tests are not affected
> at all.
>
> Note that this kernel bug only exists between commits
> 8130b9d7b9d858aa04ce67805e8951e3cb6e9b2f and e2dfb4b880146bfd4b6aa8e138c0205407cebbaf
> However, a certain commit will be merged to many branches and releases,
> which makes version checks complicated.  I checked all released kernels,
> and get a list of versions that this bug is fixed.  Not all longterm
> kernels on kernel.org have this bug fix, I don't know why, for example,
> some 3.x kernels doesn't have this bug fix.
>
> Secondly, kernels older than 8130b9d7b9d858aa04ce67805e8951e3cb6e9b2f
> are not affected by this bug, so the official kernel releases older
> than 3.0.21 or 3.2.6 are not affected by this bug, but I think the
> distro may backport the commit to their older kernel, so it makes few
> sense to check kernel is older than some versions (3.0.21 and 3.2.6).
>
> gdb/testsuite:
>
> 2016-07-04  Yao Qi  <yao.qi@linaro.org>
>
>         * lib/set_process_affinity.c: New file.
>
>         * gdb.arch/arm-neon.c: Include lib/set_process_affinity.c.
>         (main): Call set_process_affinity.
>         * gdb.base/callfuncs.c: Likewise.
>         * gdb.base/call-rt-st.c: Likewise.
>         * gdb.base/gnu_vector.c: Likewise.
>         * gdb.base/return.c: Likewise.
>         * gdb.base/return2.c: Likewise.
>         * gdb.base/store.c: Likewise.
>         * gdb.base/structs.c: Likewise.
>         * gdb.arch/arm-neon.exp: Set breakpoint and continue to
>         breakpoint.
>         * gdb.base/gnu_vector.exp: Likewise.
>
  
Pedro Alves July 25, 2016, 2:28 p.m. UTC | #2
On 07/25/2016 02:22 PM, Yao Qi wrote:
> Ping.

Thanks.

Hmm.  Seeing that the kernel fix was backported to so many
stable releases (positively) surprised me.  In that case, I question
the testsuite workaround a bit harder.  If this was a workaround in
gdb or gdbserver themselves, then it be more clear to me that the workaround
would be going to a broad set of users for whom updating the kernel is not easy.

But since this is only for when running the testsuite alone, I could argue that
this masks the problem and thus makes it look like gdb works better on an
affected system than it really does.  I think if I were working on gdb/gdbserver
on arm, I'd much prefer if gdb told me my system had a broken ptrace, so I
could act on it, rather than masking it off and pretend all is well.
How about we make gdb / gdbserver detect bad kernel version, and output a
warning to the effect?  We already have precedent in nat/linux-ptrace.c.
I think we should probably do that regardless of any testsuite workaround.

How bad would it be to push for people to update their kernels?


From a testsuite workaround angle, instead of sprinkling 
set_process_affinity calls around, what if we we added a new proc
that would be called at the top of the .exp files:

gdb_caching_proc skip_arm_vfp_tests {} {

  if arm && linux && broken linux versions {
     return 1
  }
  
  return 0
}

This would skip tests instead of making them pass, but how bad would
that be?  I assume that people doing gdb development/testing on arm will
be able to update their kernels, and will very much want to do that.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/testsuite/gdb.arch/arm-neon.c b/gdb/testsuite/gdb.arch/arm-neon.c
index c67191c..f090b63 100644
--- a/gdb/testsuite/gdb.arch/arm-neon.c
+++ b/gdb/testsuite/gdb.arch/arm-neon.c
@@ -15,6 +15,7 @@ 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
+#include "../lib/set_process_affinity.c"
 #include <arm_neon.h>
 
 #define DEF_FUNC1(N, TYPE, VALUE...)	\
@@ -98,5 +99,6 @@  DEF_FUNC2 (3)
 int
 main (void)
 {
-  return 0;
+  set_process_affinity ();
+  return 0; /* breakpoint here */
 }
diff --git a/gdb/testsuite/gdb.arch/arm-neon.exp b/gdb/testsuite/gdb.arch/arm-neon.exp
index 053170f..d7a149d 100644
--- a/gdb/testsuite/gdb.arch/arm-neon.exp
+++ b/gdb/testsuite/gdb.arch/arm-neon.exp
@@ -31,6 +31,9 @@  if ![runto_main] {
     return -1
 }
 
+gdb_breakpoint [gdb_get_line_number "breakpoint here"]
+gdb_continue_to_breakpoint "breakpoint here"
+
 # Test passing vectors in function argument in the inferior call.
 
 for {set i 1} {$i <= 18} {incr i} {
diff --git a/gdb/testsuite/gdb.base/call-rt-st.c b/gdb/testsuite/gdb.base/call-rt-st.c
index 072ea86..ad97e28 100644
--- a/gdb/testsuite/gdb.base/call-rt-st.c
+++ b/gdb/testsuite/gdb.base/call-rt-st.c
@@ -1,3 +1,4 @@ 
+#include "../lib/set_process_affinity.c"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -565,6 +566,7 @@  int main ()  {
    struct two_floats_t      *f3;
 
   gdb_unbuffer_output ();
+  set_process_affinity ();
 
   /* Allocate space for large structures 
    */
diff --git a/gdb/testsuite/gdb.base/callfuncs.c b/gdb/testsuite/gdb.base/callfuncs.c
index 317e7c4..cbc1977 100644
--- a/gdb/testsuite/gdb.base/callfuncs.c
+++ b/gdb/testsuite/gdb.base/callfuncs.c
@@ -25,6 +25,7 @@ 
 #define PARAMS(paramlist) paramlist
 #endif
 
+#include "../lib/set_process_affinity.c"
 # include <stdlib.h>
 # include <string.h>
 
@@ -644,7 +645,7 @@  voidfunc (void)
 
 int main ()
 {
-  void *p = malloc (1);
+  void *p = malloc (1); set_process_affinity ();
   t_double_values(double_val1, double_val2);
   t_structs_c(struct_val1);
   free (p);
diff --git a/gdb/testsuite/gdb.base/gnu_vector.c b/gdb/testsuite/gdb.base/gnu_vector.c
index ee03ac1..8e0d6a8 100644
--- a/gdb/testsuite/gdb.base/gnu_vector.c
+++ b/gdb/testsuite/gdb.base/gnu_vector.c
@@ -18,6 +18,7 @@ 
    Contributed by Ken Werner <ken.werner@de.ibm.com>  */
 
 #include <stdarg.h>
+#include "../lib/set_process_affinity.c"
 
 #define VECTOR(n, type)					\
   type __attribute__ ((vector_size (n * sizeof(type))))
@@ -137,7 +138,8 @@  main ()
 {
   int4 res;
 
-  res = add_some_intvecs (i4a, i4a + i4b, i4b);
+  set_process_affinity ();
+  res = add_some_intvecs (i4a, i4a + i4b, i4b); /* breakpoint here */
 
   res = add_some_intvecs (i4a, i4a + i4b, i4b);
 
diff --git a/gdb/testsuite/gdb.base/gnu_vector.exp b/gdb/testsuite/gdb.base/gnu_vector.exp
index aafaedd..1e57a26 100644
--- a/gdb/testsuite/gdb.base/gnu_vector.exp
+++ b/gdb/testsuite/gdb.base/gnu_vector.exp
@@ -55,6 +55,9 @@  gdb_test_multiple "show endian" "show endian" {
     }
 }
 
+gdb_breakpoint [gdb_get_line_number "breakpoint here"]
+gdb_continue_to_breakpoint "breakpoint here"
+
 # Test printing of character vector types
 gdb_test "print c4" "\\\$$decimal = \\{1, 2, 3, 4\\}"
 gdb_test "print c4\[2\]" "\\\$$decimal = 3"
diff --git a/gdb/testsuite/gdb.base/return.c b/gdb/testsuite/gdb.base/return.c
index c365e88..6ff38e6 100644
--- a/gdb/testsuite/gdb.base/return.c
+++ b/gdb/testsuite/gdb.base/return.c
@@ -15,6 +15,7 @@ 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
+#include "../lib/set_process_affinity.c"
 #include <stdio.h>
 /*  Test "return" command.  */
 
@@ -40,6 +41,7 @@  double tmp3;
 
 int main ()
 {
+  set_process_affinity ();
   func1 ();
   printf("in main after func1\n");
   tmp2 = func2 ();
diff --git a/gdb/testsuite/gdb.base/return2.c b/gdb/testsuite/gdb.base/return2.c
index ced472a..53e292f 100644
--- a/gdb/testsuite/gdb.base/return2.c
+++ b/gdb/testsuite/gdb.base/return2.c
@@ -15,6 +15,7 @@ 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
+#include "../lib/set_process_affinity.c"
 /* Test gdb's "return" command.  */
 
 int void_test = 0;
@@ -90,6 +91,7 @@  int main (int argc, char **argv)
   double double_resultval;
   int i;
 
+  set_process_affinity ();
   /* A "test load" that will insure that the function really returns 
      a ${type} (as opposed to just a truncated or part of a ${type}).  */
   for (i = 0; i < sizeof (testval.ffff); i++)
diff --git a/gdb/testsuite/gdb.base/store.c b/gdb/testsuite/gdb.base/store.c
index 545515d..d878142 100644
--- a/gdb/testsuite/gdb.base/store.c
+++ b/gdb/testsuite/gdb.base/store.c
@@ -7,6 +7,8 @@ 
    function calls within main even when no optimization flags were
    passed.  */
 
+#include "../lib/set_process_affinity.c"
+
 typedef signed char charest;
 
 charest
@@ -254,6 +256,7 @@  wack_field_4 (void)
 int
 main ()
 {
+  set_process_affinity ();
   /* These calls are for current frame test.  */
   wack_charest (-1, -2);
   wack_short (-1, -2);
diff --git a/gdb/testsuite/gdb.base/structs.c b/gdb/testsuite/gdb.base/structs.c
index b5832cc..7be1fe0 100644
--- a/gdb/testsuite/gdb.base/structs.c
+++ b/gdb/testsuite/gdb.base/structs.c
@@ -15,6 +15,8 @@ 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
+#include "../lib/set_process_affinity.c"
+
 /* Useful abreviations.  */
 typedef void t;
 typedef char tc;
@@ -313,6 +315,8 @@  int main()
 {
   int i;
 
+  set_process_affinity ();
+
   for (i = 0; i < 256; i++)
     chartest[i].c = i;
   chartest[0].c = 0;  /* chartest-done */
diff --git a/gdb/testsuite/lib/set_process_affinity.c b/gdb/testsuite/lib/set_process_affinity.c
new file mode 100644
index 0000000..1d2a0e4
--- /dev/null
+++ b/gdb/testsuite/lib/set_process_affinity.c
@@ -0,0 +1,98 @@ 
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#if defined(__arm__) && defined(__linux__)
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <stdlib.h>
+
+struct version
+{
+  long major;
+  long minor;
+  long patch;
+};
+
+/* Probe the kernel version into V, and return 0 on success.  */
+
+static int
+probe_kernel_version (struct version *v)
+{
+  struct utsname buffer;
+
+  if (uname (&buffer) == 0)
+    {
+      char *start, *end;
+
+      start = buffer.release;
+      v->major = strtol (start, &end, 10);
+
+      start = end + 1;
+      v->minor = strtol (start, &end, 10);
+
+      start = end + 1;
+      v->patch = strtol (start, &end, 10);
+      return 0;
+    }
+  else
+    return -1;
+}
+
+#define VERSION_NEWER_THAN(VER, MAJOR, MINOR, PATCH) \
+  VER.major == MAJOR && VER.minor == MINOR && VER.patch >= PATCH
+
+#endif
+
+static void
+set_process_affinity (void)
+{
+#if defined(__arm__) && defined(__linux__)
+  struct version kernel;
+  cpu_set_t my_set;
+
+  if (probe_kernel_version (&kernel))
+    {
+      /* Can't get kernel version, do nothing.  */
+      return;
+    }
+
+  if (kernel.major >= 5
+      || (kernel.major == 4 && kernel.minor >= 7) /* 4.7 and later */
+      || VERSION_NEWER_THAN (kernel, 4, 6, 3)
+      || VERSION_NEWER_THAN (kernel, 4, 4, 14)
+      || VERSION_NEWER_THAN (kernel, 4, 1, 27)
+      || VERSION_NEWER_THAN (kernel, 3, 18, 36)
+      || VERSION_NEWER_THAN (kernel, 3, 14, 73))
+    {
+      /* Kernel is new enough to have bug fixed, do nothing.  */
+      return;
+    }
+
+  /* Set both process and parent process (GDB)'s affinity on core 0 to
+     workaround ARM linux kernel ptrace bug which doesn't flush the VFP
+     state to hardware after ptrace set VFP registers.  */
+
+  CPU_ZERO (&my_set);
+  CPU_SET (0, &my_set);
+
+  sched_setaffinity (0, sizeof(cpu_set_t), &my_set);
+  sched_setaffinity (getppid (), sizeof(cpu_set_t), &my_set);
+#endif
+}