Fix interrupt.exp fails with m32 in x86_64

Message ID 53554475.3010201@mentor.com
State Superseded

Commit Message

Hui Zhu April 21, 2014, 4:16 p.m. UTC
  make check RUNTESTFLAGS="GDB_TESTCASE_OPTIONS=\"-m32\" interrupt.exp" on
x86_64 produces some failures:
FAIL: gdb.base/interrupt.exp: signal SIGINT (the program is no longer running)
FAIL: gdb.base/interrupt.exp: echo more data (timeout)
FAIL: gdb.base/interrupt.exp: send end of file
The issue can be reproduced:
#uname -p
x86_64

#gcc -g -m32 gdb.base/interrupt.c
#gdb ./a.out
(gdb) r
Starting program: /home/teawater/gdb/binutils-gdb/gdb/testsuite/a.out
talk to me baby
data
data
^C
Program received signal SIGINT, Interrupt.
0xf7ffd430 in __kernel_vsyscall ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0xf7ffd430 in __kernel_vsyscall ()
(gdb) p func1()
$1 = 4
(gdb) c
Continuing.
Unknown error 512
[Inferior 1 (process 7953) exited with code 01]

       nbytes = read (0, &x, 1);
       if (nbytes < 0)
     {
#ifdef EINTR
       if (errno != EINTR)
#endif
After GDB calls the function "func1()" by hand, "read" fails with
errno 512 (ERESTARTSYS), which should have been handled inside the
Linux kernel.
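
As a rough illustration of why the test program exits here, the check in
the excerpt above can be reduced to the small predicate below.  This is
only a sketch: the names KERNEL_ERESTARTSYS and read_error_is_benign are
made up for this example and do not appear in interrupt.c; the value 512
is taken from the "Unknown error 512" output above.

```c
#include <assert.h>   /* assert() is used for the sanity checks noted after the block */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical name for the kernel-internal restart code.  It is not
   exported to user space, so it is spelled out here.  */
#define KERNEL_ERESTARTSYS 512

/* Mirrors the test in the excerpt: a failed read() is tolerated only
   when errno is EINTR.  Any other value takes the error path.  */
static bool read_error_is_benign(int err)
{
#ifdef EINTR
    return err == EINTR;
#else
    (void)err;
    return false;
#endif
}
```

With these definitions, read_error_is_benign(EINTR) is true and
read_error_is_benign(KERNEL_ERESTARTSYS) is false, which is consistent
with the program reporting the error and exiting with code 01.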

The root cause of this issue is:
When the user stops the inferior with ctrl-c, the signal is handled in
the Linux kernel function "do_signal" in arch/x86/kernel/signal.c.
The inferior is stopped by the function "ptrace_stop".  The call trace is:
#0  freezable_schedule () at include/linux/freezer.h:172
#1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
     clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
     at kernel/signal.c:1920
#2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
     at kernel/signal.c:2157
#3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
     return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
     cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
#4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
     at arch/x86/kernel/signal.c:696
#5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
     unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
#6  <signal handler called>
#7  0x0000000000000000 in irq_stack_union ()

When GDB runs "call func1()", it has to make the inferior execute the
function func1() and then go back to the old ip.  To do that, GDB sets
all the registers through the GDB function "amd64_collect_native_gregset",
which zero-extends most of the 32-bit registers to 64 bits before
writing them to the inferior.
The inferior then resumes from ptrace_stop and goes back to do_signal.
The TS_COMPAT bit in current_thread_info()->status is cleared by the
function "int_with_check" when it returns to user space.

When GDB runs "continue", the inferior resumes from ptrace_stop and goes
back to do_signal again.
Because this signal interrupted a syscall, do_signal uses the function
"syscall_get_error" to check whether the syscall returned an error:
static inline long syscall_get_error(struct task_struct *task,
                      struct pt_regs *regs)
{
     unsigned long error = regs->ax;
#ifdef CONFIG_IA32_EMULATION
     /*
      * TS_COMPAT is set for 32-bit syscall entries and then
      * remains set until we return to user mode.
      */
     if (task_thread_info(task)->status & TS_COMPAT)
         /*
          * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
          * and will match correctly in comparisons.
          */
         error = (long) (int) error;
#endif
     return IS_ERR_VALUE(error) ? error : 0;
}
At this point ax holds a 32-bit value that needs to be sign-extended to
64 bits.  But the TS_COMPAT bit in current_thread_info()->status was
cleared when GDB ran "call func1()".
The Linux kernel does not know this is a 32-bit task and does not
sign-extend the value.
So -ERESTARTSYS is not handled and goes back to user space.

The syscall "read" then fails with errno ERESTARTSYS.
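
The effect of the missing TS_COMPAT bit can be reproduced in user space
with a small model of syscall_get_error.  This is only a sketch under
stated assumptions: model_syscall_get_error, is_err_value and the boolean
compat parameter are invented for this example and stand in for the
kernel function, IS_ERR_VALUE and the TS_COMPAT status bit.  The 4095
threshold is the kernel's MAX_ERRNO.

```c
#include <assert.h>   /* assert() is used for the sanity checks noted after the block */
#include <stdbool.h>
#include <stdint.h>

/* The kernel treats the top 4095 values of an unsigned long as errors.  */
#define MAX_ERRNO 4095

/* Stand-in for the kernel's IS_ERR_VALUE on a 64-bit value.  */
static bool is_err_value(uint64_t x)
{
    return x >= (uint64_t)-MAX_ERRNO;
}

/* User-space model of syscall_get_error.  AX is the 64-bit register
   slot; COMPAT plays the role of the TS_COMPAT status bit.  */
static int64_t model_syscall_get_error(uint64_t ax, bool compat)
{
    uint64_t error = ax;

    if (compat)
    {
        /* Sign-extend the low 32 bits, like (long) (int) error.  */
        error = (uint64_t)(int64_t)(int32_t)(uint32_t)error;
    }
    return is_err_value(error) ? (int64_t)error : 0;
}
```

With 0xfffffe00 in ax, the model returns -512 when compat is true and 0
when compat is false, so without the flag the zero-extended value is not
recognized as -ERESTARTSYS.  A fully sign-extended 0xfffffffffffffe00 is
recognized either way.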

I made a patch that sign-extends eax in the function
amd64_collect_native_gregset, which handles this issue.
It fixes the issue and passes the regression test.
Please help me review it.

I also made another patch for the Linux kernel that handles this issue
too.  I will post it to LKML later.

Thanks,
Hui

2014-04-21  Hui Zhu  <hui@codesourcery.com>

     * amd64-nat.c (amd64_collect_native_gregset): Make %eax sign-extended.
      }
@@ -156,7 +158,24 @@ amd64_collect_native_gregset (const stru
        int offset = amd64_native_gregset_reg_offset (gdbarch, i);

        if (offset != -1)
-        regcache_raw_collect (regcache, i, regs + offset);
+        {
+          if (i == I386_EAX_REGNUM
+          && gdbarch_bfd_arch_info (gdbarch)->bits_per_word == 32)
+            {
+          /* Make sure %eax get sign-extended to 64 bits. */
+          LONGEST val;
+
+          regcache_raw_collect (regcache, I386_EAX_REGNUM,
+                    regs + offset);
+          val = extract_signed_integer ((gdb_byte *)(regs + offset),
+                        4,
+                        gdbarch_byte_order (gdbarch));
+          store_signed_integer ((gdb_byte *)(regs + offset), 8,
+                    gdbarch_byte_order (gdbarch), val);
+        }
+          else
+        regcache_raw_collect (regcache, i, regs + offset);
+        }
      }
      }
  }
  

Comments

Mark Kettenis April 21, 2014, 6:27 p.m. UTC | #1
> Date: Tue, 22 Apr 2014 00:16:53 +0800
> From: Hui Zhu <hui_zhu@mentor.com>
> 
> I make a patch that let eax sign-extend in function 
> amd64_collect_native_gregset
> that can handle this issue.
> It can handle the issue and pass the regression test.
> Please help me review it.

I don't think the generic amd64 target code is the proper place to
work around Linux kernel bugs.  If you really want to work around this
bug in GDB, it should probably be done in the Linux-specific
i386/amd64 native code.

Mark

> 2014-04-21  Hui Zhu  <hui@codesourcery.com>
> 
>      * amd64-nat.c(amd64_collect_native_gregset): Make %eax sign-extended.
> --- a/gdb/amd64-nat.c
> +++ b/gdb/amd64-nat.c
> @@ -131,10 +131,12 @@ amd64_collect_native_gregset (const stru
>       {
>         num_regs = amd64_native_gregset32_num_regs;
> 
> -      /* Make sure %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
> +      /* Make sure %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
>            %eip get zero-extended to 64 bits.  */
>         for (i = 0; i <= I386_EIP_REGNUM; i++)
>       {
> +      if (i == I386_EAX_REGNUM)
> +        continue;
>         if (regnum == -1 || regnum == i)
>           memset (regs + amd64_native_gregset_reg_offset (gdbarch, i), 
> 0, 8);
>       }
> @@ -156,7 +158,24 @@ amd64_collect_native_gregset (const stru
>         int offset = amd64_native_gregset_reg_offset (gdbarch, i);
> 
>         if (offset != -1)
> -        regcache_raw_collect (regcache, i, regs + offset);
> +        {
> +          if (i == I386_EAX_REGNUM
> +          && gdbarch_bfd_arch_info (gdbarch)->bits_per_word == 32)
> +            {
> +          /* Make sure %eax get sign-extended to 64 bits. */
> +          LONGEST val;
> +
> +          regcache_raw_collect (regcache, I386_EAX_REGNUM,
> +                    regs + offset);
> +          val = extract_signed_integer ((gdb_byte *)(regs + offset),
> +                        4,
> +                        gdbarch_byte_order (gdbarch));
> +          store_signed_integer ((gdb_byte *)(regs + offset), 8,
> +                    gdbarch_byte_order (gdbarch), val);
> +        }
> +          else
> +        regcache_raw_collect (regcache, i, regs + offset);
> +        }
>       }
>       }
>   }
> 
>
  
Hui Zhu April 29, 2014, 3:56 p.m. UTC | #2
I am sorry, my description of the root cause was partly wrong.
The correct root cause is:
When the inferior makes the 32-bit syscall "read", the Linux kernel
function "ia32_cstar_target" sets TS_COMPAT in current_thread_info->status.

The syscall read is interrupted by ctrl-c.  $rax is then set to the
64-bit value of errno -512.
The inferior is stopped by the Linux kernel function ptrace_stop;
the call trace is:
#0  freezable_schedule () at include/linux/freezer.h:172
#1  ptrace_stop (exit_code=exit_code@entry=5, why=why@entry=262148,
    clear_code=clear_code@entry=0, info=info@entry=0xffff88001d833e78)
    at kernel/signal.c:1920
#2  0xffffffff8107ec33 in ptrace_signal (info=0xffff88001d833e78, signr=5)
    at kernel/signal.c:2157
#3  get_signal_to_deliver (info=info@entry=0xffff88001d833e78,
    return_ka=return_ka@entry=0xffff88001d833e58, regs=<optimized out>,
    cookie=cookie@entry=0x0 <irq_stack_union>) at kernel/signal.c:2269
#4  0xffffffff81013438 in do_signal (regs=regs@entry=0xffff88001d833f58)
    at arch/x86/kernel/signal.c:696
#5  0xffffffff81013a40 in do_notify_resume (regs=0xffff88001d833f58,
    unused=<optimized out>, thread_info_flags=4) at arch/x86/kernel/signal.c:747
#6  <signal handler called>
#7  0x0000000000000000 in irq_stack_union ()

After that, GDB can control the stopped inferior.
To call the inferior function "func1()", GDB needs to:
Step 1, save the current values of the registers ($rax 0xfffffffffffffe00
(64-bit -512) is truncated to 0xfffffe00 (32-bit -512) because the
inferior is a 32-bit program).
Step 2, change the values of the registers.
Step 3, push a dummy frame onto the stack.
Step 4, set a breakpoint at the return address.
When GDB resumes the inferior, it continues executing from ptrace_stop
with the new register values set by GDB.
TS_COMPAT in current_thread_info->status is cleared when the
inferior switches back to user space.

When the function "func1()" returns, the inferior hits the breakpoint and
is stopped by the Linux kernel function "ptrace_stop" again.
TS_COMPAT is not set in current_thread_info->status when the inferior
switches from user space to kernel space, because the breakpoint handler
"int3" has no code for that.

GDB then writes the saved register values back to the inferior using the
function "amd64_collect_native_gregset".  Because this function only
zero-extends each 32-bit value to 64 bits before writing it to the
inferior, $rax is set to 0xfffffe00 (32-bit -512) rather than
0xfffffffffffffe00 (64-bit -512).
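
The difference between the two ways of filling the 8-byte register slot
can be sketched in plain C.  This is not GDB code: store_le64, load_le64,
collect_zero_extended and collect_sign_extended are names invented for
this example, and the slot is written in little-endian byte order to
match x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write V into an 8-byte slot in little-endian order.  */
static void store_le64(unsigned char slot[8], uint64_t v)
{
    assert(slot != NULL);
    for (int i = 0; i < 8; i++)
        slot[i] = (unsigned char)(v >> (8 * i));
}

/* Read the slot back as the 64-bit value the kernel would see.  */
static uint64_t load_le64(const unsigned char slot[8])
{
    uint64_t v = 0;

    assert(slot != NULL);
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)slot[i] << (8 * i);
    return v;
}

/* Existing behaviour: the upper 4 bytes of the slot are zero.  */
static void collect_zero_extended(unsigned char slot[8], uint32_t eax)
{
    store_le64(slot, (uint64_t)eax);
}

/* Behaviour proposed for %eax: the upper 4 bytes replicate bit 31.  */
static void collect_sign_extended(unsigned char slot[8], uint32_t eax)
{
    store_le64(slot, (uint64_t)(int64_t)(int32_t)eax);
}
```

For eax = 0xfffffe00, the zero-extended slot reads back as
0x00000000fffffe00 and the sign-extended slot as 0xfffffffffffffe00.
Only the latter is the 64-bit encoding of -512.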

When GDB continues the syscall "read" that was interrupted by ctrl-c, the
inferior resumes from ptrace_stop without TS_COMPAT.
In the Linux kernel function "syscall_get_error", current_thread_info->status
does not have TS_COMPAT and $rax is 0xfffffe00 (32-bit -512), so
do_signal does not handle this -ERESTARTSYS.

-ERESTARTSYS is returned to the inferior, which is why the inferior gets
the errno ERESTARTSYS.

On Tue, Apr 22, 2014 at 2:27 AM, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>> Date: Tue, 22 Apr 2014 00:16:53 +0800
>> From: Hui Zhu <hui_zhu@mentor.com>
>>
>> I make a patch that let eax sign-extend in function
>> amd64_collect_native_gregset
>> that can handle this issue.
>> It can handle the issue and pass the regression test.
>> Please help me review it.
>
> I don't think the generic amd64 target code is the proper place to
> work around Linux kernel bugs.  If you really want to work around this
> bug in GDB, it should probably be done in the Linux-specific
> i386/amd64 native code.
>
> Mark
>
>> 2014-04-21  Hui Zhu  <hui@codesourcery.com>
>>
>>      * amd64-nat.c(amd64_collect_native_gregset): Make %eax sign-extended.
>> --- a/gdb/amd64-nat.c
>> +++ b/gdb/amd64-nat.c
>> @@ -131,10 +131,12 @@ amd64_collect_native_gregset (const stru
>>       {
>>         num_regs = amd64_native_gregset32_num_regs;
>>
>> -      /* Make sure %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
>> +      /* Make sure %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
>>            %eip get zero-extended to 64 bits.  */
>>         for (i = 0; i <= I386_EIP_REGNUM; i++)
>>       {
>> +      if (i == I386_EAX_REGNUM)
>> +        continue;
>>         if (regnum == -1 || regnum == i)
>>           memset (regs + amd64_native_gregset_reg_offset (gdbarch, i),
>> 0, 8);
>>       }
>> @@ -156,7 +158,24 @@ amd64_collect_native_gregset (const stru
>>         int offset = amd64_native_gregset_reg_offset (gdbarch, i);
>>
>>         if (offset != -1)
>> -        regcache_raw_collect (regcache, i, regs + offset);
>> +        {
>> +          if (i == I386_EAX_REGNUM
>> +          && gdbarch_bfd_arch_info (gdbarch)->bits_per_word == 32)
>> +            {
>> +          /* Make sure %eax get sign-extended to 64 bits. */
>> +          LONGEST val;
>> +
>> +          regcache_raw_collect (regcache, I386_EAX_REGNUM,
>> +                    regs + offset);
>> +          val = extract_signed_integer ((gdb_byte *)(regs + offset),
>> +                        4,
>> +                        gdbarch_byte_order (gdbarch));
>> +          store_signed_integer ((gdb_byte *)(regs + offset), 8,
>> +                    gdbarch_byte_order (gdbarch), val);
>> +        }
>> +          else
>> +        regcache_raw_collect (regcache, i, regs + offset);
>> +        }
>>       }
>>       }
>>   }
>>
>>
  
Pedro Alves April 29, 2014, 5:05 p.m. UTC | #3
On 04/29/2014 04:56 PM, Hui Zhu wrote:
> I am sorry that the root cause of issue has something wrong.
> The right root cause is:
> When inferior call 32 bits syscall "read", Linux kernel function
> "ia32_cstar_target" will set TS_COMPAT to current_thread_info->status.

Thanks a lot for tracking this stuff down.  I appreciate the effort.

It'd be great if we got an ack on the kernel side on what's
going on before we considered working around it in GDB.

Thanks,
  

Patch

--- a/gdb/amd64-nat.c
+++ b/gdb/amd64-nat.c
@@ -131,10 +131,12 @@  amd64_collect_native_gregset (const stru
      {
        num_regs = amd64_native_gregset32_num_regs;

-      /* Make sure %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
+      /* Make sure %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp and
           %eip get zero-extended to 64 bits.  */
        for (i = 0; i <= I386_EIP_REGNUM; i++)
      {
+      if (i == I386_EAX_REGNUM)
+        continue;
        if (regnum == -1 || regnum == i)
          memset (regs + amd64_native_gregset_reg_offset (gdbarch, i), 0, 8);