gdb/x86: Handle kernels using compact xsave format

Message ID 20180314005707.9009-1-andrew.burgess@embecosm.com
State New, archived
Headers

Commit Message

Andrew Burgess March 14, 2018, 12:57 a.m. UTC
  I noticed that some gdb tests that involved making inferior calls
would sometimes fail with SIGFPE on one particular machine, but would
apparently never fail on other machines.  Both machines were x86-64,
running variants of Fedora GNU/Linux.  The difference turned out to be
what floating point features were available on each machine (and that
the kernel was taking advantage of).

Looking in the boot log for a non-failing machine I'd see these
messages:

    TIME/DATE HOSTNAME kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.

While on the failing machine I'd see these messages:

    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
    TIME/DATE HOSTNAME kernel: x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
    TIME/DATE HOSTNAME kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    TIME/DATE HOSTNAME kernel: x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
    TIME/DATE HOSTNAME kernel: x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
    TIME/DATE HOSTNAME kernel: x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.

The important part is the change from 'standard' format to the
'compact' format.

Looking in the kernel at how these two formats are handled, the
details of which can be found in (as of 4.16):

    arch/x86/kernel/fpu/regset.c:xstateregs_get

If the kernel is using compact format then the registers are copied to
the user with a call to copy_xstate_to_user, while if using standard
format a call to user_regset_copyout is used.

The difference is that the 'standard' format's user_regset_copyout
call copies the entire register block, including the registers for all
floating point features that have not yet been enabled.  In order to
not leak random kernel data to the user, the kernel ensures that all
registers are initialised before copying out.  Copying in one block
like this is possible because the standard format used by the kernel
is the same format as is used by the ptrace interface to pass theh
regiters back to the user.

In contrast the compact format within the kernel is different to the
format used by ptrace.  As such, each floating point feature must be
expanded from the compact kernel format into the standard format when
the registers are requested though ptrace.  To save time the kernel
only expands those features that are actually enabled.  Features that
are not enabled are not written out to the user buffer, the output
buffer contents are left unmodified.

What this means is that, on machines using the compact floating point
format, if, for example, the x87 unit has not been used then a ptrace
request for the floating point registers will leave the user buffer
corresponding to the x87 registers unmodified.

There is a bit mask in the floating point data header that tells us
which units are in use.  We use this bit mask in GDB to prevent
reading uninitialised data from the floating point register buffer.
However, we don't guard reading the x87 control registers, these are
read from the register buffer even if they were (potentially) not
written by the kernel.

We can confirm this theory by applying a patch like this:

    diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
    index ad942435b8f..3e72ea0fedd 100644
    --- a/gdb/amd64-linux-nat.c
    +++ b/gdb/amd64-linux-nat.c
    @@ -163,6 +163,8 @@ amd64_linux_fetch_inferior_registers (struct target_ops *ops,
              char xstateregs[X86_XSTATE_MAX_SIZE];
              struct iovec iov;

    +         memset (xstateregs, 0x12, sizeof (xstateregs));
    +
              iov.iov_base = xstateregs;
              iov.iov_len = sizeof (xstateregs);
              if (ptrace (PTRACE_GETREGSET, tid,

With this in place then start up on an inferior on a machine using
'compact' format, and you'll see the pattern 0x12 leak through into
the values of the x87 control registers.  Without the memset in place
then the values in these registers are just random values from the GDB
stack.

The problem that leads to the SIGFPE is this:

  1. New inferior starts, floating point unit (FPU) is disabled.
  2. User makes an inferior call, so:
  3. GDB reads current register values, including the random
     values in the x87 control registers.
  4. Inferior call touches FPU, kernel enables the FPU and writes
     default values into all floating point registers, including the
     x87 control registers.
  5. Inferior call completes, and control passes back to GDB, so:
  6. GDB restores the previous register values, including the random
     values that were in the x87 control registers.  However, the
     kernel still thinks the FPU is initialised.
  7. User makes a new inferior call, so:
  8. As before, GDB backs up register values, including the random
     values in the x87 control registers we helpfully restored.
  9. Inferior call uses the FPU, however, as it is already enabled,
     the kernel does not get involved, the control registers keep
     their random values, and a SIGFPE might be generated depending on
     what happened to be on the GDB stack.

The solution is to treat the x87 control register exactly how we treat
all of the other floating point registers.  If the x87 floating point
unit is not yet in use then we don't read the control registers out of
the user ptrace register buffer, instead GDB supplies its own default
values that match the value described in the x86 manual (and in the
kernel).

The two tests where I've seen random SIGFPE failures are
gdb.base/structs.exp and gdb.base/callfuncs.exp.  With this patch in
place these issues seem to be resolved.

This patch has been tested on two x86-64 GNU/Linux machines one using
standard format, and one using compact format.

gdb/ChangeLog:

	* i387-tdep.c (i387_supply_xsave): Supply initial values when
	registers have not been initialised by the kernel yet.
	(i387_collect_xsave): Update xstate_bv after check if any control
	registers have changed.  Updating the control registers should
	result in xstate_bv being updated.
---
 gdb/ChangeLog   |   8 ++++
 gdb/i387-tdep.c | 117 +++++++++++++++++++++++++++++++++++---------------------
 2 files changed, 81 insertions(+), 44 deletions(-)
  

Comments

Pedro Alves March 26, 2018, 3:42 p.m. UTC | #1
On 03/14/2018 12:57 AM, Andrew Burgess wrote:

> 
>

[snip excellent description]

> 
> gdb/ChangeLog:
> 
> 	* i387-tdep.c (i387_supply_xsave): Supply initial values when
> 	registers have not been initialised by the kernel yet.
> 	(i387_collect_xsave): Update xstate_bv after check if any control
> 	registers have changed.  Updating the control registers should
> 	result in xstate_bv being updated.

> -	    memcpy (FXSAVE_ADDR (tdep, regs, i), buf, 2);
> +	    p = FXSAVE_ADDR (tdep, regs, i);
> +	    if (memcmp (p, buf, 2) == 0)
> +	      {
> +		xstate_bv |= X86_XSTATE_X87;
> +		memcpy (p, buf, 2);
> +	      }

I don't understand this though -- if memcmp returns 0, then
p and buf hold the same bytes, so the memcpy is useless.
I think you meant to do "memcmp ... != 0" ?

Curious no test catches that.

Thanks,
Pedro Alves
  
Pedro Alves March 26, 2018, 3:57 p.m. UTC | #2
Also, meant to ask but forgot -- does gdbserver/i387-fp.c need
any fixing too?

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/i387-tdep.c b/gdb/i387-tdep.c
index 19387929c59..b671a3fe58d 100644
--- a/gdb/i387-tdep.c
+++ b/gdb/i387-tdep.c
@@ -1246,10 +1246,23 @@  i387_supply_xsave (struct regcache *regcache, int regnum,
   for (i = I387_FCTRL_REGNUM (tdep); i < I387_XMM0_REGNUM (tdep); i++)
     if (regnum == -1 || regnum == i)
       {
+	if (clear_bv & X86_XSTATE_X87)
+	  {
+	    if (i == I387_FCTRL_REGNUM (tdep))
+	      {
+		/* Initial value for fctrl register, as defined in the X86
+		   manual, and confirmed in the (Linux) kernel source.  */
+		static const gdb_byte fctrl[4] = { 0x7f, 0x03, 0x00, 0x00 };
+
+		regcache_raw_supply (regcache, i, fctrl);
+	      }
+	    else
+	      regcache_raw_supply (regcache, i, zero);
+	  }
 	/* Most of the FPU control registers occupy only 16 bits in
 	   the xsave extended state.  Give those a special treatment.  */
-	if (i != I387_FIOFF_REGNUM (tdep)
-	    && i != I387_FOOFF_REGNUM (tdep))
+	else if (i != I387_FIOFF_REGNUM (tdep)
+		 && i != I387_FOOFF_REGNUM (tdep))
 	  {
 	    gdb_byte val[4];
 
@@ -1690,46 +1703,6 @@  i387_collect_xsave (const struct regcache *regcache, int regnum,
 	    }
 	}
 
-      /* Update the corresponding bits in `xstate_bv' if any SSE/AVX
-	 registers are changed.  */
-      if (xstate_bv)
-	{
-	  /* The supported bits in `xstat_bv' are 8 bytes.  */
-	  initial_xstate_bv |= xstate_bv;
-	  store_unsigned_integer (XSAVE_XSTATE_BV_ADDR (regs),
-				  8, byte_order,
-				  initial_xstate_bv);
-
-	  switch (regclass)
-	    {
-	    default:
-	      internal_error (__FILE__, __LINE__,
-			      _("invalid i387 regclass"));
-
-	    case all:
-	      break;
-
-	    case x87:
-	    case sse:
-	    case avxh:
-	    case mpx:
-	    case avx512_k:
-	    case avx512_zmm_h:
-	    case avx512_ymmh_avx512:
-	    case avx512_xmm_avx512:
-	    case pkeys:
-	      /* Register REGNUM has been updated.  Return.  */
-	      return;
-	    }
-	}
-      else
-	{
-	  /* Return if REGNUM isn't changed.  */
-	  if (regclass != all)
-	    return;
-	}
-    }
-
   /* Only handle x87 control registers.  */
   for (i = I387_FCTRL_REGNUM (tdep); i < I387_XMM0_REGNUM (tdep); i++)
     if (regnum == -1 || regnum == i)
@@ -1769,12 +1742,68 @@  i387_collect_xsave (const struct regcache *regcache, int regnum,
 		      buf[0] |= (1 << fpreg);
 		  }
 	      }
-	    memcpy (FXSAVE_ADDR (tdep, regs, i), buf, 2);
+	    p = FXSAVE_ADDR (tdep, regs, i);
+	    if (memcmp (p, buf, 2) == 0)
+	      {
+		xstate_bv |= X86_XSTATE_X87;
+		memcpy (p, buf, 2);
+	      }
 	  }
 	else
-	  regcache_raw_collect (regcache, i, FXSAVE_ADDR (tdep, regs, i));
+	  {
+	    int regsize;
+
+	    regcache_raw_collect (regcache, i, raw);
+	    regsize = regcache_register_size (regcache, i);
+	    p = FXSAVE_ADDR (tdep, regs, i);
+	    if (memcmp (raw, p, regsize) == 0)
+	      {
+		xstate_bv |= X86_XSTATE_X87;
+		memcpy (p, raw, regsize);
+	      }
+	  }
       }
 
+      /* Update the corresponding bits in `xstate_bv' if any SSE/AVX
+	 registers are changed.  */
+      if (xstate_bv)
+	{
+	  /* The supported bits in `xstat_bv' are 8 bytes.  */
+	  initial_xstate_bv |= xstate_bv;
+	  store_unsigned_integer (XSAVE_XSTATE_BV_ADDR (regs),
+				  8, byte_order,
+				  initial_xstate_bv);
+
+	  switch (regclass)
+	    {
+	    default:
+	      internal_error (__FILE__, __LINE__,
+			      _("invalid i387 regclass"));
+
+	    case all:
+	      break;
+
+	    case x87:
+	    case sse:
+	    case avxh:
+	    case mpx:
+	    case avx512_k:
+	    case avx512_zmm_h:
+	    case avx512_ymmh_avx512:
+	    case avx512_xmm_avx512:
+	    case pkeys:
+	      /* Register REGNUM has been updated.  Return.  */
+	      return;
+	    }
+	}
+      else
+	{
+	  /* Return if REGNUM isn't changed.  */
+	  if (regclass != all)
+	    return;
+	}
+    }
+
   if (regnum == I387_MXCSR_REGNUM (tdep) || regnum == -1)
     regcache_raw_collect (regcache, I387_MXCSR_REGNUM (tdep),
 			  FXSAVE_MXCSR_ADDR (regs));