[gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

Message ID 20190924131612.GA24708@delia
State New, archived
Headers

Commit Message

Tom de Vries Sept. 24, 2019, 1:16 p.m. UTC
  Hi,

On my openSUSE Leap 15.1 x86_64 Skylake system with the default (4.12) kernel,
I run into:
...
FAIL: gdb.base/gcore.exp: corefile restored all registers
...

The problem is that there's a difference in the mxcsr register value before
and after the gcore command:
...
- mxcsr          0x0                 [ ]
+ mxcsr          0x400440            [ DAZ OM ]
...

This can be traced back to amd64_linux_nat_target::fetch_registers, where
xstateregs is partially initialized by the ptrace call:
...
          char xstateregs[X86_XSTATE_MAX_SIZE];
          struct iovec iov;

          amd64_collect_xsave (regcache, -1, xstateregs, 0);
          iov.iov_base = xstateregs;
          iov.iov_len = sizeof (xstateregs);
          if (ptrace (PTRACE_GETREGSET, tid,
                      (unsigned int) NT_X86_XSTATE, (long) &iov) < 0)
            perror_with_name (_("Couldn't get extended state status"));

          amd64_supply_xsave (regcache, -1, xstateregs);
...
after which amd64_supply_xsave is called.

The amd64_supply_xsave call is supposed to only use initialized parts of
xstateregs, but due to a kernel bug on intel skylake (fixed from 4.14 onwards
by commit 0852b374173b "x86/fpu: Add FPU state copying quirk to handle XRSTOR
failure on Intel Skylake CPUs") it can happen that the mxcsr part of
xstateregs is not initialized, while amd64_supply_xsave expects it to be
initialized, which explains the FAIL mentioned above.

Fix the undetermined behaviour by initializing xstateregs before calling
ptrace, which makes sure we get a 0x0 for mxcsr when this kernel bug occurs,
and which also happens to fix the FAIL.

Furthermore, add an xfail for this FAIL which triggers the same kernel bug:
...
FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: \
  check new value of MXCSR is still in place
...

Both FAILs pass when using a 5.3 kernel instead on the system mentioned above.

Tested on x86_64-linux.

OK for trunk?

Thanks,
- Tom

[gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

gdb/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/23815
	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

gdb/testsuite/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/24598
	* gdb.arch/amd64-init-x87-values.exp: Add xfail.

---
 gdb/amd64-linux-nat.c                            |  1 +
 gdb/testsuite/gdb.arch/amd64-init-x87-values.exp | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)
  

Comments

Tom Tromey Sept. 24, 2019, 8:06 p.m. UTC | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> [gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

Tom> gdb/ChangeLog:

Tom> 2019-09-24  Tom de Vries  <tdevries@suse.de>

Tom> 	PR gdb/23815
Tom> 	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
Tom> 	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

Thanks for the patch.

I think this is reasonable, but it could use a comment before the memset
explaining why it is there.  Maybe some future gdb hacker will want to
remove the memset, and a comment would explain why not to, or when it
would be ok to.

thanks,
Tom
  

Patch

diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
index 4f1c98a0d1..984b3d7c0f 100644
--- a/gdb/amd64-linux-nat.c
+++ b/gdb/amd64-linux-nat.c
@@ -238,6 +238,7 @@  amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
 	  char xstateregs[X86_XSTATE_MAX_SIZE];
 	  struct iovec iov;
 
+	  memset (xstateregs, 0, sizeof (xstateregs));
 	  iov.iov_base = xstateregs;
 	  iov.iov_len = sizeof (xstateregs);
 	  if (ptrace (PTRACE_GETREGSET, tid,
diff --git a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
index cdf92dcd37..5fd18dbb79 100644
--- a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
+++ b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
@@ -116,7 +116,7 @@  proc_with_prefix check_x87_regs_around_init {} {
 # nop that does not enable any FP features).  Finally check that the
 # mxcsr register still has the value we set.
 proc_with_prefix check_setting_mxcsr_before_enable {} {
-    global binfile
+    global binfile gdb_prompt
 
     clean_restart ${binfile}
 
@@ -127,7 +127,22 @@  proc_with_prefix check_setting_mxcsr_before_enable {} {
 
     gdb_test_no_output "set \$mxcsr=0x9f80" "set a new value for MXCSR"
     gdb_test "stepi" "fwait" "step forward one instruction for mxcsr test"
-    gdb_test "p/x \$mxcsr" " = 0x9f80" "check new value of MXCSR is still in place"
+
+    set test "check new value of MXCSR is still in place"
+    set pass_pattern " = 0x9f80"
+    # Pre-4.14 kernels have a bug (fixed by commit 0852b374173b "x86/fpu:
+    # Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake
+    # CPUs") that causes mxcsr not to be copied, in which case we get 0 instead of
+    # the just saved value.
+    set xfail_pattern " = 0x0"
+    gdb_test_multiple "p/x \$mxcsr" $test {
+	-re "\[\r\n\]*(?:$pass_pattern)\[\r\n\]+$gdb_prompt $" {
+	    pass $test
+	}
+	-re "\[\r\n\]*(?:$xfail_pattern)\[\r\n\]+$gdb_prompt $" {
+	    xfail $test
+	}
+    }
 }
 
 # Start the test file, all FP features will be disabled.  Set new