x86_64: Expand CFI to cover clone after the syscall

Message ID 20161212052909.GA11750@juliacomputing.com
State New, archived
Headers

Commit Message

Keno Fischer Dec. 12, 2016, 5:29 a.m. UTC
  The CFI used to terminate after the syscall, leaving unwinding for the
rest of the function up to the debugger's imagination. The comment states
that the reason for this is that the unwind info would be incorrect in the
child. However, without unwind info, both the child and the parent process
may fail to unwind properly after the syscall (though some debuggers still
have heuristics that work for the parent case). To improve this situation,
use a DWARF impression that checks %rax, and if non-zero (i.e. we're in
the parent, behaves like the CFI for the rest of the function). If %rax==0,
the CFI indicates to unwind to thread_start. Ideally, I would have liked
to have it undefine %rip, but it does not look like that is possible.
However, even if not entirely correct, I think this version is a strict
improvement over what was there before.

E.g. in GDB:
Backtrace before:
 #0  0x00007f14a1119081 in clone () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x00007f14a13df640 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
 #2  0x00007f14a0e0c700 in ?? ()
 #3  0x0000000000000000 in ?? ()

Backtrace after:
 #0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
 #1  0x00007f377330485b in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:120

Another option would be to create another global symbol e.g.
`new_thread_from_clone` and have the unwind pretend to unwind there.
---
I'm on a somewhat long-term quest here to improve the accuracy of unwind info
in the basic libraries and the toolchain. Right now we're in a situation where
e.g. invalid memory accesses by unwinders are often ignored, because there are
sufficiently many places without unwind info, or with incorrect unwind info that
it's more likely that than an unwinder bug. I'd like to get into a position I can
consider an invalid memory access during unwinding a bug, but first, I need to
cleanup some of the more common cases where one runs into invalid unwind info.

 sysdeps/unix/sysv/linux/x86_64/clone.S | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)
  

Comments

Adhemerval Zanella Netto Dec. 14, 2016, 7:56 p.m. UTC | #1
On 12/12/2016 03:29, Keno Fischer wrote:
> The CFI used to terminate after the syscall, leaving unwinding for the
> rest of the function up to the debugger's imagination. The comment states
> that the reason for this is that the unwind info would be incorrect in the
> child. However, without unwind info, both the child and the parent process
> may fail to unwind properly after the syscall (though some debuggers still
> have heuristics that work for the parent case). To improve this situation,
> use a DWARF impression that checks %rax, and if non-zero (i.e. we're in
> the parent, behaves like the CFI for the rest of the function). If %rax==0,
> the CFI indicates to unwind to thread_start. Ideally, I would have liked
> to have it undefine %rip, but it does not look like that is possible.
> However, even if not entirely correct, I think this version is a strict
> improvement over what was there before.
> 
> E.g. in GDB:
> Backtrace before:
>  #0  0x00007f14a1119081 in clone () from /lib/x86_64-linux-gnu/libc.so.6
>  #1  0x00007f14a13df640 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>  #2  0x00007f14a0e0c700 in ?? ()
>  #3  0x0000000000000000 in ?? ()
> 
> Backtrace after:
>  #0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>  #1  0x00007f377330485b in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:120
> 
> Another option would be to create another global symbol e.g.
> `new_thread_from_clone` and have the unwind pretend to unwind there.
> ---
> I'm on a somewhat long-term quest here to improve the accuracy of unwind info
> in the basic libraries and the toolchain. Right now we're in a situation where
> e.g. invalid memory accesses by unwinders are often ignored, because there are
> sufficiently many places without unwind info, or with incorrect unwind info that
> it's more likely that than an unwinder bug. I'd like to get into a position I can
> consider an invalid memory access during unwinding a bug, but first, I need to
> cleanup some of the more common cases where one runs into invalid unwind info.

I tried to create a testcase [1] where unwind using backtrace() would trigger
this issue you pointed out, but even by calling clone direct libgcc unwind
(which for x86_64 backtrace is based) at least seems to get this right.  It
could be case where this issue appear only with a more complex stackframe
mixing symbols from different modules and DSO (I am far from a CFI expert).

In any case I would like to have at least have a workable testcase to actually
check this fix and have a way to check if this faulty behaviour also triggers
in other architectures than x86_64.  I am not really sure how would be a correct
way to check, maybe implementing a very simple backtrack clone using libgcc
as most of the architecture already does.

[1] https://paste.ubuntu.com/23630310/

> 
>  sysdeps/unix/sysv/linux/x86_64/clone.S | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/x86_64/clone.S b/sysdeps/unix/sysv/linux/x86_64/clone.S
> index 66f4b11..1bd2eed 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/clone.S
> +++ b/sysdeps/unix/sysv/linux/x86_64/clone.S
> @@ -72,16 +72,38 @@ ENTRY (__clone)
>  	mov	8(%rsp), %R10_LP
>  	movl	$SYS_ify(clone),%eax
>  
> -	/* End FDE now, because in the child the unwind info will be
> -	   wrong.  */
> -	cfi_endproc;
>  	syscall
>  
> +  /* Best effort unwind info that works for both the parent and the child.
> +     Ideally, we'd have cfi_undefiend(%rip) in the child and keep everything
> +     the same in the parent, but we can't do that (since there's no way for
> +     an expression to return undefined). Instead, we pretend that the child
> +     came from thread_start. This isn't quite correct, but at least better than
> +     whatever the debugger heuristics would have come up with in the absence
> +     of unwind info */
> +
> +  /* Encodes: parent %rip = %rax == 0 ? %rip + (2f - .) : *%rsp
> +     DW_CFA_val_expression %rip DW_OP_breg16(2f-1b) DW_OP_lit0 DW_OP_breg0(0)
> +     DW_OP_eq DW_OP_bra(0x0003) DW_OP_breg7(0) DW_OP_deref */
> +#define CFI_CHILD_RIP_IS_THREAD_START \
> +	.cfi_escape 0x16, 0x10, 0xc, 0x80, 2f-., 0x30, 0x70, 0x0, 0x29,\
> +		0x28, 0x03, 0x0, 0x77, 0x0, 0x6;
> +
> +  /* encode parent %rsp = %rsp + (%rax != 0 ? 8 : 0)
> +     DW_CFA_val_expression %rsp DW_OP_breg7(0) DW_OP_lit0 DW_OP_breg0(0)
> +     DW_OP_ne DW_OP_lit3 DW_OP_shl DW_OP_plus*/
> +	.cfi_escape 0x16, 0x7, 0x9, 0x77, 0x0, 0x30, 0x70, 0x00, 0x2e, 0x33,
> +		0x24, 0x22;
> +
> +	CFI_CHILD_RIP_IS_THREAD_START
>  	testq	%rax,%rax
> +	CFI_CHILD_RIP_IS_THREAD_START
>  	jl	SYSCALL_ERROR_LABEL
> + 	CFI_CHILD_RIP_IS_THREAD_START
>  	jz	L(thread_start)
>  
>  	ret
> +  cfi_endproc;
>  
>  L(thread_start):
>  	cfi_startproc;
> @@ -90,7 +112,7 @@ L(thread_start):
>  	/* Clear the frame pointer.  The ABI suggests this be done, to mark
>  	   the outermost frame obviously.  */
>  	xorl	%ebp, %ebp
> -
> +2:
>  	andq	$CLONE_VM, %rdi
>  	jne	1f
>  	movl	$SYS_ify(getpid), %eax
>
  

Patch

diff --git a/sysdeps/unix/sysv/linux/x86_64/clone.S b/sysdeps/unix/sysv/linux/x86_64/clone.S
index 66f4b11..1bd2eed 100644
--- a/sysdeps/unix/sysv/linux/x86_64/clone.S
+++ b/sysdeps/unix/sysv/linux/x86_64/clone.S
@@ -72,16 +72,38 @@  ENTRY (__clone)
 	mov	8(%rsp), %R10_LP
 	movl	$SYS_ify(clone),%eax
 
-	/* End FDE now, because in the child the unwind info will be
-	   wrong.  */
-	cfi_endproc;
 	syscall
 
+  /* Best effort unwind info that works for both the parent and the child.
+     Ideally, we'd have cfi_undefiend(%rip) in the child and keep everything
+     the same in the parent, but we can't do that (since there's no way for
+     an expression to return undefined). Instead, we pretend that the child
+     came from thread_start. This isn't quite correct, but at least better than
+     whatever the debugger heuristics would have come up with in the absence
+     of unwind info */
+
+  /* Encodes: parent %rip = %rax == 0 ? %rip + (2f - .) : *%rsp
+     DW_CFA_val_expression %rip DW_OP_breg16(2f-1b) DW_OP_lit0 DW_OP_breg0(0)
+     DW_OP_eq DW_OP_bra(0x0003) DW_OP_breg7(0) DW_OP_deref */
+#define CFI_CHILD_RIP_IS_THREAD_START \
+	.cfi_escape 0x16, 0x10, 0xc, 0x80, 2f-., 0x30, 0x70, 0x0, 0x29,\
+		0x28, 0x03, 0x0, 0x77, 0x0, 0x6;
+
+  /* encode parent %rsp = %rsp + (%rax != 0 ? 8 : 0)
+     DW_CFA_val_expression %rsp DW_OP_breg7(0) DW_OP_lit0 DW_OP_breg0(0)
+     DW_OP_ne DW_OP_lit3 DW_OP_shl DW_OP_plus*/
+	.cfi_escape 0x16, 0x7, 0x9, 0x77, 0x0, 0x30, 0x70, 0x00, 0x2e, 0x33,
+		0x24, 0x22;
+
+	CFI_CHILD_RIP_IS_THREAD_START
 	testq	%rax,%rax
+	CFI_CHILD_RIP_IS_THREAD_START
 	jl	SYSCALL_ERROR_LABEL
+ 	CFI_CHILD_RIP_IS_THREAD_START
 	jz	L(thread_start)
 
 	ret
+  cfi_endproc;
 
 L(thread_start):
 	cfi_startproc;
@@ -90,7 +112,7 @@  L(thread_start):
 	/* Clear the frame pointer.  The ABI suggests this be done, to mark
 	   the outermost frame obviously.  */
 	xorl	%ebp, %ebp
-
+2:
 	andq	$CLONE_VM, %rdi
 	jne	1f
 	movl	$SYS_ify(getpid), %eax