Fix amd64->i386 linux syscall restart problem

Message ID 20190409192916.79fcb539@f29-4.lan
State New, archived
Headers

Commit Message

Kevin Buettner April 10, 2019, 2:29 a.m. UTC
  On Tue, 9 Apr 2019 18:46:45 +0100
Pedro Alves <palves@redhat.com> wrote:

> > +	  void *ptr = (gdb_byte *) gregs 
> > +		    + amd64_linux_gregset32_reg_offset[I386_EAX_REGNUM];  
> 
> Need to wrap the expression that spawns two lines in ()s.

I've made this change.  And the other one too.

> > +
> > +  /* Sign extend EAX value to avoid potential syscall restart problems.  */  
> 
> I'd rather see both implementations use the same comment.  Could you
> merge them?

I started to merge them and then decided to write a more detailed
comment based on the text that I wrote for the commit comment.  I
have, for the moment anyway, copied the comment to both locations with
only slight changes which reflect where the comment is located.  The
problem with having copies of the same long comment in two or more
places is making sure that if one gets updated, then the rest do too. 
It might be better to have one refer to the other.

I'm thinking that it might be preferable to have something like this
in gdb/gdbserver/linux-x86-low.c:

  /* Sign extend EAX value to avoid potential syscall restart problems.
  
     See amd64_linux_collect_native_gregset() in gdb/amd64-linux-nat.c
     for a detailed explanation.  */

Below is a diff showing the new comments.  It also includes the
changes which wrap the multi-line expressions in parens.
  

Comments

Pedro Alves April 10, 2019, 11:42 a.m. UTC | #1
On 4/10/19 3:29 AM, Kevin Buettner wrote:
> On Tue, 9 Apr 2019 18:46:45 +0100
> Pedro Alves <palves@redhat.com> wrote:
> 
>>> +	  void *ptr = (gdb_byte *) gregs 
>>> +		    + amd64_linux_gregset32_reg_offset[I386_EAX_REGNUM];  
>>
>> Need to wrap the expression that spawns two lines in ()s.
> 
> I've made this change.  And the other one too.
> 
>>> +
>>> +  /* Sign extend EAX value to avoid potential syscall restart problems.  */  
>>
>> I'd rather see both implementations use the same comment.  Could you
>> merge them?
> 
> I started to merge them and then decided to write a more detailed
> comment based on the text that I wrote for the commit comment.  I
> have, for the moment anyway, copied the comment to both locations with
> only slight changes which reflect where the comment is located.  The
> problem with having copies of the same long comment in two or more
> places is making sure that if one gets updated, then the rest do too. 
> It might be better to have one refer to the other.
> 
> I'm thinking that it might be preferable to have something like this
> in gdb/gdbserver/linux-x86-low.c:
> 
>   /* Sign extend EAX value to avoid potential syscall restart problems.
>   
>      See amd64_linux_collect_native_gregset() in gdb/amd64-linux-nat.c
>      for a detailed explanation.  */

I'm OK with that.  My concern with different comments in two places
is that at some point, like we managed to merge the x86
watchpoints code from gdb and gdbserver under gdb/nat/, we might be
able to merge this code as well.  And different comments make the job
a bit harder for the person doing that, since the person _then_ has to
decide what the resulting comment will be like.  If we can do it now,
it's preferable.  A pointer from one end to the other, like you're
proposing, works for me too.

> Below is a diff showing the new comments.  It also includes the
> changes which wrap the multi-line expressions in parens.

Thanks, that new version of the comment looks great.

Pedro Alves
  
Kevin Buettner April 11, 2019, 12:16 a.m. UTC | #2
On Wed, 10 Apr 2019 12:42:30 +0100
Pedro Alves <palves@redhat.com> wrote:

> > Below is a diff showing the new comments.  It also includes the
> > changes which wrap the multi-line expressions in parens.  
> 
> Thanks, that new version of the comment looks great.

It's in now.  Thanks for the review.

Kevin
  
Tom de Vries May 21, 2019, 12:59 p.m. UTC | #3
On 11-04-19 02:16, Kevin Buettner wrote:
> On Wed, 10 Apr 2019 12:42:30 +0100
> Pedro Alves <palves@redhat.com> wrote:
> 
>>> Below is a diff showing the new comments.  It also includes the
>>> changes which wrap the multi-line expressions in parens.  
>>
>> Thanks, that new version of the comment looks great.
> 
> It's in now.  Thanks for the review.

Hi,

the tests fixed by this commit fail on the 8.3 branch (filed as PR24592).

The commit applies cleanly on the 8.3 branch, and make the tests pass.

OK to backport?

Thanks,
- Tom
  
Tom de Vries June 21, 2019, 6:34 a.m. UTC | #4
On 21-05-19 14:59, Tom de Vries wrote:
> On 11-04-19 02:16, Kevin Buettner wrote:
>> On Wed, 10 Apr 2019 12:42:30 +0100
>> Pedro Alves <palves@redhat.com> wrote:
>>
>>>> Below is a diff showing the new comments.  It also includes the
>>>> changes which wrap the multi-line expressions in parens.  
>>>
>>> Thanks, that new version of the comment looks great.
>>
>> It's in now.  Thanks for the review.
> 
> Hi,
> 
> the tests fixed by this commit fail on the 8.3 branch (filed as PR24592).
> 
> The commit applies cleanly on the 8.3 branch, and make the tests pass.
> 
> OK to backport?

Hi,

[ Ping. ]

I'm probably missing some context here. It seems sofar there are no
commits to the 8.3 branch. AFAIU, a respin release 8.3.1 is targeted for
3 months after 8.3, so around 11th of August.

What kind of fixes are acceptable for the respin? Do they have to be
regression fixes? Or are functionality fixes like this one also allowed?
Who's responsibility is it to backport fixes?

Thanks,
- Tom
  
Kevin Buettner July 8, 2019, 5 p.m. UTC | #5
On Fri, 21 Jun 2019 08:34:46 +0200
Tom de Vries <tdevries@suse.de> wrote:

> On 21-05-19 14:59, Tom de Vries wrote:
> > On 11-04-19 02:16, Kevin Buettner wrote:  
> >> On Wed, 10 Apr 2019 12:42:30 +0100
> >> Pedro Alves <palves@redhat.com> wrote:
> >>  
> >>>> Below is a diff showing the new comments.  It also includes the
> >>>> changes which wrap the multi-line expressions in parens.    
> >>>
> >>> Thanks, that new version of the comment looks great.  
> >>
> >> It's in now.  Thanks for the review.  
> > 
> > Hi,
> > 
> > the tests fixed by this commit fail on the 8.3 branch (filed as PR24592).
> > 
> > The commit applies cleanly on the 8.3 branch, and make the tests pass.
> > 
> > OK to backport?  
> 
> Hi,
> 
> [ Ping. ]
> 
> I'm probably missing some context here. It seems sofar there are no
> commits to the 8.3 branch. AFAIU, a respin release 8.3.1 is targeted for
> 3 months after 8.3, so around 11th of August.
> 
> What kind of fixes are acceptable for the respin? Do they have to be
> regression fixes? Or are functionality fixes like this one also allowed?
> Who's responsibility is it to backport fixes?
> 
> Thanks,
> - Tom

I'm okay with it going into the next point release for 8.3.

Kevin
  

Patch

diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
index 06018c8f1c..8d0e8eb35c 100644
--- a/gdb/amd64-linux-nat.c
+++ b/gdb/amd64-linux-nat.c
@@ -106,14 +106,51 @@  amd64_linux_collect_native_gregset (const struct regcache *regcache,
   struct gdbarch *gdbarch = regcache->arch ();
   if (gdbarch_bfd_arch_info (gdbarch)->bits_per_word == 32)
     {
-      /* Sign-extend %eax as during return from a syscall it is being checked
-	 for -ERESTART* values -512 being above 0xfffffffffffffe00; tested by
-	 interrupt.exp.	 */
+      /* Sign extend EAX value to avoid potential syscall restart
+	 problems.  
+
+	 On Linux, when a syscall is interrupted by a signal, the
+	 (kernel function implementing the) syscall may return
+	 -ERESTARTSYS when a signal occurs.  Doing so indicates that
+	 the syscall is restartable.  Then, depending on settings
+	 associated with the signal handler, and after the signal
+	 handler is called, the kernel can then either return -EINTR
+	 or it can cause the syscall to be restarted.  We are
+	 concerned with the latter case here.
+	 
+	 On (32-bit) i386, the status (-ERESTARTSYS) is placed in the
+	 EAX register.  When debugging a 32-bit process from a 64-bit
+	 (amd64) GDB, the debugger fetches 64-bit registers even
+	 though the process being debugged is only 32-bit.  The
+	 register cache is only 32 bits wide though; GDB discards the
+	 high 32 bits when placing 64-bit values in the 32-bit
+	 regcache.  Normally, this is not a problem since the 32-bit
+	 process should only care about the lower 32-bit portions of
+	 these registers.  That said, it can happen that the 64-bit
+	 value being restored will be different from the 64-bit value
+	 that was originally retrieved from the kernel.  The one place
+	 (that we know of) where it does matter is in the kernel's
+	 syscall restart code.  The kernel's code for restarting a
+	 syscall after a signal expects to see a negative value
+	 (specifically -ERESTARTSYS) in the 64-bit RAX register in
+	 order to correctly cause a syscall to be restarted.
+	 
+	 The call to amd64_collect_native_gregset, above, is setting
+	 the high 32 bits of RAX (and other registers too) to 0.  For
+	 syscall restart, we need to sign extend EAX so that RAX will
+	 appear as a negative value when EAX is set to -ERESTARTSYS. 
+	 This in turn will cause the signal handling code in the
+	 kernel to recognize -ERESTARTSYS which will in turn cause the
+	 syscall to be restarted.
+
+	 The test case gdb.base/interrupt.exp tests for this problem.
+	 Without this sign extension code in place, it'll show
+	 a number of failures when testing against unix/-m32.  */
 
       if (regnum == -1 || regnum == I386_EAX_REGNUM)
 	{
-	  void *ptr = (gdb_byte *) gregs 
-		    + amd64_linux_gregset32_reg_offset[I386_EAX_REGNUM];
+	  void *ptr = ((gdb_byte *) gregs 
+		       + amd64_linux_gregset32_reg_offset[I386_EAX_REGNUM]);
 
 	  *(int64_t *) ptr = *(int32_t *) ptr;
 	}
diff --git a/gdb/gdbserver/linux-x86-low.c b/gdb/gdbserver/linux-x86-low.c
index 20f369c496..9a97cdf5c0 100644
--- a/gdb/gdbserver/linux-x86-low.c
+++ b/gdb/gdbserver/linux-x86-low.c
@@ -339,11 +339,48 @@  x86_fill_gregset (struct regcache *regcache, void *buf)
   collect_register_by_name (regcache, "orig_eax",
 			    ((char *) buf) + ORIG_EAX * REGSIZE);
 
-  /* Sign extend EAX value to avoid potential syscall restart problems.  */
+  /* Sign extend EAX value to avoid potential syscall restart
+     problems. 
+
+     On Linux, when a syscall is interrupted by a signal, the (kernel
+     function implementing the) syscall may return -ERESTARTSYS when a
+     signal occurs.  Doing so indicates that the syscall is
+     restartable.  Then, depending on settings associated with the
+     signal handler, and after the signal handler is called, the
+     kernel can then either return -EINTR or it can cause the syscall
+     to be restarted.  We are concerned with the latter case here.
+     
+     On (32-bit) i386, the status (-ERESTARTSYS) is placed in the EAX
+     register.  When debugging a 32-bit process from a 64-bit (amd64)
+     GDB, the debugger fetches 64-bit registers even though the
+     process being debugged is only 32-bit.  The register cache is
+     only 32 bits wide though; GDB discards the high 32 bits when
+     placing 64-bit values in the 32-bit regcache.  Normally, this is
+     not a problem since the 32-bit process should only care about the
+     lower 32-bit portions of these registers.  That said, it can
+     happen that the 64-bit value being restored will be different
+     from the 64-bit value that was originally retrieved from the
+     kernel.  The one place (that we know of) where it does matter is
+     in the kernel's syscall restart code.  The kernel's code for
+     restarting a syscall after a signal expects to see a negative
+     value (specifically -ERESTARTSYS) in the 64-bit RAX register in
+     order to correctly cause a syscall to be restarted.
+     
+     The call to collect_register, above, is setting the high 32 bits
+     of RAX (and other registers too) to 0.  For syscall restart, we
+     need to sign extend EAX so that RAX will appear as a negative
+     value when EAX is set to -ERESTARTSYS.  This in turn will cause
+     the signal handling code in the kernel to recognize -ERESTARTSYS
+     which will in turn cause the syscall to be restarted.
+
+     The test case gdb.base/interrupt.exp tests for this problem. 
+     Without this sign extension code in place, it'll show a number of
+     failures when testing against native-gdbserver/-m32.  */
+
   if (register_size (regcache->tdesc, 0) == 4)
     {
-      void *ptr = (gdb_byte *) buf
-                + i386_regmap[find_regno (regcache->tdesc, "eax")];
+      void *ptr = ((gdb_byte *) buf
+                   + i386_regmap[find_regno (regcache->tdesc, "eax")]);
 
       *(int64_t *) ptr = *(int32_t *) ptr;
     }