[RFC] Set process affinity in test to work around ARM ptrace bug

Message ID 1467295036-2816-1-git-send-email-yao.qi@linaro.org
State New, archived
Headers

Commit Message

Yao Qi June 30, 2016, 1:57 p.m. UTC
  We recently found a ARM kernel ptrace bug
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-May/431962.html
As a result of this bug, after GDB ptrace set VFP registers, the hardware
registers may not be updated.  This bug causes some intermittent fails in
tests, like return.exp, call-rt-st.exp, callfuncs.exp, etc.

The bug is fixed in ARM kernel tree, but it is impractical to upgrade
linux kernel from git tree or most recently release (I don't know when
the fix can be shipped in the mainline kernel release).  I am wondering
we can workaround this kernel bug somehow.

My first attempt is to workaround it in GDB, so that GDB still writes
the VFP registers and sync them to hardware.  The kernel patch is quite
simple, which moves vfp_flush_hwstate one line below.  Probably, we can
call ptrace set vfp registers twice, and then the second vfp set can
flush the state correctly.  Unfortunately, it doesn't work, because
every time of ptrace set, kernel loads VFP registers from hardware first,
which might be out of date after the first ptrace set.  That is to say,
we can't workaround this kernel bug in GDB.

Then, I am thinking we can workaround this bug in testing, because the
intermittent fails are confusing in comparing test results, by binding
both tracer and tracee on the same core.  For example, we can start GDB
or GDBserver with "taskset -c 0 ", but this is a global change, may
have some affects on gdb.threads tests.  I also think about doing
"taskset -p PID -c 0" in test harness after the inferior is started,
and do the same to the parent process of inferior (which is either GDB
or GDBserver).

The approach in this patch is to have a small c function which sets
both process affinity and its parent's affinity to core 0.  This
function should be called in these tests explicitly, but other tests
are not affected at all.  This patch is posted to get comments on the
necessity of workaround this kernel bug, and the proper to workaround
this bug.  There are still some test cases affected by this kernel bug,
but this patch doesn't touch them yet.

gdb/testsuite:

2016-06-30  Yao Qi  <yao.qi@linaro.org>

	* lib/set_process_affinity.c: New file.
	* gdb.base/call-rt-st.c: Include lib/set_process_affinity.c.
	(main): Call set_process_affinity.
	* gdb.base/gnu_vector.c: Likewise.
	* gdb.base/return.c: Likewise.
	* gdb.base/gnu_vector.exp: Adjust test.
---
 gdb/testsuite/gdb.base/call-rt-st.c      |  2 ++
 gdb/testsuite/gdb.base/gnu_vector.c      |  4 +++-
 gdb/testsuite/gdb.base/gnu_vector.exp    |  3 +++
 gdb/testsuite/gdb.base/return.c          |  2 ++
 gdb/testsuite/lib/set_process_affinity.c | 41 ++++++++++++++++++++++++++++++++
 5 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/lib/set_process_affinity.c
  

Comments

Antoine Tremblay June 30, 2016, 2:19 p.m. UTC | #1
Yao Qi writes:

> We recently found a ARM kernel ptrace bug
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-May/431962.html
> As a result of this bug, after GDB ptrace set VFP registers, the hardware
> registers may not be updated.  This bug causes some intermittent fails in
> tests, like return.exp, call-rt-st.exp, callfuncs.exp, etc.
>
> The bug is fixed in ARM kernel tree, but it is impractical to upgrade
> linux kernel from git tree or most recently release (I don't know when
> the fix can be shipped in the mainline kernel release).  I am wondering
> we can workaround this kernel bug somehow.
>
> My first attempt is to workaround it in GDB, so that GDB still writes
> the VFP registers and sync them to hardware.  The kernel patch is quite
> simple, which moves vfp_flush_hwstate one line below.  Probably, we can
> call ptrace set vfp registers twice, and then the second vfp set can
> flush the state correctly.  Unfortunately, it doesn't work, because
> every time of ptrace set, kernel loads VFP registers from hardware first,
> which might be out of date after the first ptrace set.  That is to say,
> we can't workaround this kernel bug in GDB.
>
> Then, I am thinking we can workaround this bug in testing, because the
> intermittent fails are confusing in comparing test results, by binding
> both tracer and tracee on the same core.  For example, we can start GDB
> or GDBserver with "taskset -c 0 ", but this is a global change, may
> have some affects on gdb.threads tests.  I also think about doing
> "taskset -p PID -c 0" in test harness after the inferior is started,
> and do the same to the parent process of inferior (which is either GDB
> or GDBserver).
>
> The approach in this patch is to have a small c function which sets
> both process affinity and its parent's affinity to core 0.  This
> function should be called in these tests explicitly, but other tests
> are not affected at all.  This patch is posted to get comments on the
> necessity of workaround this kernel bug, and the proper to workaround
> this bug.  There are still some test cases affected by this kernel bug,
> but this patch doesn't touch them yet.
>

I like the idea, this has been a pain for a while however from my
testing there is a lot of intermitent tests and I'm not sure if this
ptrace fix fixes them all.

I think we just make sure that we don't hide other ptrace bugs so that
we can find them. I had another bug in the Odroid UX4 SoC causing
similar problems.

Also to consider is that this could apply to a lot of tests here's my
list of intermittent test from about 40 runs with Sergio's script:

argv0-symlink.exp array_bounds.exp array_ptr_renaming.exp
array_subscript_addr.exp auxv.exp bp-permanent.exp bp_enum_homonym.exp
bp_range_type.exp branch-to-self.exp break-precsave.exp
breakpoint-in-ro-region.exp catch_ex.exp char_enum.exp class2.exp
consecutive-precsave.exp converts.exp coredump-filter.exp dot_all.exp
exprs.exp fin_fun_out.exp finish-precsave.exp finish-reverse-bkpt.exp
finish-reverse.exp fixed_points.exp float_param.exp frame-args.exp
fstatat-reverse.exp fun_overload_menu.exp fun_renaming.exp
funcall_char.exp gcore-buffer-overflow.exp gcore-relro-pie.exp
gcore-relro.exp gcore.exp gdb-index.exp gdb1555.exp
getresuid-reverse.exp gnu-ifunc.exp gnu_vector.exp info-proc.exp
info-threads.exp interrupted-hand-call.exp jmisc.exp jprint.exp jump.exp
lang_switch.exp machinestate-precsave.exp mi_dyn_arr.exp
mi_interface.exp mi_task_arg.exp mi_task_info.exp mi_var_array.exp
multi-forks.exp next-while-other-thread-longjmps.exp operators.exp
optim_drec.exp out_of_line_in_inlined.exp pckd_arr_ren.exp
pipe-reverse.exp print-symbol-loading.exp print_chars.exp
process-dies-while-handling-bp.exp pthreads.exp py-strfns.exp python.exp
queue-signal.exp readv-reverse.exp rec_return.exp sigall-precsave.exp
sigall-reverse.exp siginfo-obj.exp siginfo-thread.exp skip-solib.exp
small_reg_param.exp solib-precsave.exp solib-reverse.exp str_uninit.exp
taft_type.exp task_bp.exp type_coercion.exp until-reverse.exp
watch-precsave.exp whatis_array_val.exp watch-bitfields.exp
packed_array.exp formatted_ref.exp vec_comps.exp solib-intra-step.exp
waitpid-reverse.exp mi-tsv-changed.exp"

These a from a few weeks ago and I think a lof of reverse tests may not
be valid... not sure still it's quite a list.

I'll retest with the patched kernel over the weekend see how many go
away...

Thanks for looking into this!
Antoine
  
Pedro Alves June 30, 2016, 3:31 p.m. UTC | #2
On 06/30/2016 02:57 PM, Yao Qi wrote:

> Then, I am thinking we can workaround this bug in testing, because the
> intermittent fails are confusing in comparing test results, by binding
> both tracer and tracee on the same core.  For example, we can start GDB
> or GDBserver with "taskset -c 0 ", but this is a global change, may
> have some affects on gdb.threads tests.
> I also think about doing
> "taskset -p PID -c 0" in test harness after the inferior is started,
> and do the same to the parent process of inferior (which is either GDB
> or GDBserver).
> 
> The approach in this patch is to have a small c function which sets
> both process affinity and its parent's affinity to core 0.  This
> function should be called in these tests explicitly, but other tests
> are not affected at all.  This patch is posted to get comments on the
> necessity of workaround this kernel bug, and the proper to workaround
> this bug.  There are still some test cases affected by this kernel bug,
> but this patch doesn't touch them yet.

Pushing people to update their kernels would be better, but I
understand how that's complicated on ARM, given that in many cases
it's not even possible to have access to the kernel's sources...

Still, it'd think that a fix in gdb/gdbserver itself would be
better  for _users_.

Also having to manually determine whether a test is misbehaving
because of this problem or not seems like recipe for continued pain.

I also think that whatever workaround, if any, should be limited
to known-broken kernels.  Otherwise, this is likely to mask
other problems going forward.  Maybe all we have is the version
number to work with, but that's still better than unconditionally
enabling this on arm.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/testsuite/gdb.base/call-rt-st.c b/gdb/testsuite/gdb.base/call-rt-st.c
index 072ea86..ad97e28 100644
--- a/gdb/testsuite/gdb.base/call-rt-st.c
+++ b/gdb/testsuite/gdb.base/call-rt-st.c
@@ -1,3 +1,4 @@ 
+#include "../lib/set_process_affinity.c"
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -565,6 +566,7 @@  int main ()  {
    struct two_floats_t      *f3;
 
   gdb_unbuffer_output ();
+  set_process_affinity ();
 
   /* Allocate space for large structures 
    */
diff --git a/gdb/testsuite/gdb.base/gnu_vector.c b/gdb/testsuite/gdb.base/gnu_vector.c
index ee03ac1..8e0d6a8 100644
--- a/gdb/testsuite/gdb.base/gnu_vector.c
+++ b/gdb/testsuite/gdb.base/gnu_vector.c
@@ -18,6 +18,7 @@ 
    Contributed by Ken Werner <ken.werner@de.ibm.com>  */
 
 #include <stdarg.h>
+#include "../lib/set_process_affinity.c"
 
 #define VECTOR(n, type)					\
   type __attribute__ ((vector_size (n * sizeof(type))))
@@ -137,7 +138,8 @@  main ()
 {
   int4 res;
 
-  res = add_some_intvecs (i4a, i4a + i4b, i4b);
+  set_process_affinity ();
+  res = add_some_intvecs (i4a, i4a + i4b, i4b); /* breakpoint here */
 
   res = add_some_intvecs (i4a, i4a + i4b, i4b);
 
diff --git a/gdb/testsuite/gdb.base/gnu_vector.exp b/gdb/testsuite/gdb.base/gnu_vector.exp
index aafaedd..1e57a26 100644
--- a/gdb/testsuite/gdb.base/gnu_vector.exp
+++ b/gdb/testsuite/gdb.base/gnu_vector.exp
@@ -55,6 +55,9 @@  gdb_test_multiple "show endian" "show endian" {
     }
 }
 
+gdb_breakpoint [gdb_get_line_number "breakpoint here"]
+gdb_continue_to_breakpoint "breakpoint here"
+
 # Test printing of character vector types
 gdb_test "print c4" "\\\$$decimal = \\{1, 2, 3, 4\\}"
 gdb_test "print c4\[2\]" "\\\$$decimal = 3"
diff --git a/gdb/testsuite/gdb.base/return.c b/gdb/testsuite/gdb.base/return.c
index c365e88..6ff38e6 100644
--- a/gdb/testsuite/gdb.base/return.c
+++ b/gdb/testsuite/gdb.base/return.c
@@ -15,6 +15,7 @@ 
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
 
+#include "../lib/set_process_affinity.c"
 #include <stdio.h>
 /*  Test "return" command.  */
 
@@ -40,6 +41,7 @@  double tmp3;
 
 int main ()
 {
+  set_process_affinity ();
   func1 ();
   printf("in main after func1\n");
   tmp2 = func2 ();
diff --git a/gdb/testsuite/lib/set_process_affinity.c b/gdb/testsuite/lib/set_process_affinity.c
new file mode 100644
index 0000000..2615965
--- /dev/null
+++ b/gdb/testsuite/lib/set_process_affinity.c
@@ -0,0 +1,41 @@ 
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#if defined(__arm__) && defined(__linux__)
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#include <sys/types.h>
+#endif
+
+static int
+set_process_affinity (void)
+{
+#if defined(__arm__) && defined(__linux__)
+  cpu_set_t my_set;
+
+  /* Set both process and parent process (GDB)'s affinity on core 0
+     to workaround ARM linux kernel ptrace bug which doesn't flush the
+     VFP state to hardware after ptrace set VFP registers.  */
+
+  CPU_ZERO (&my_set);
+  CPU_SET (0, &my_set);
+
+  sched_setaffinity (0, sizeof(cpu_set_t), &my_set);
+  sched_setaffinity (getppid (), sizeof(cpu_set_t), &my_set);
+#endif
+}