[RFC,7/9] hurd: Generalize init-first.c to support x86_64

Message ID 20230218203717.373211-8-bugaevc@gmail.com
State Changes Requested, archived
Headers
Series More x86_64-gnu glibc work |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Sergey Bugaev Feb. 18, 2023, 8:37 p.m. UTC
  Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
---
 sysdeps/mach/hurd/{i386 => x86}/init-first.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)
 rename sysdeps/mach/hurd/{i386 => x86}/init-first.c (96%)
  

Comments

Samuel Thibault Feb. 20, 2023, 12:01 a.m. UTC | #1
Sergey Bugaev, le sam. 18 févr. 2023 23:37:15 +0300, a ecrit:
> Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
> ---
>  sysdeps/mach/hurd/{i386 => x86}/init-first.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>  rename sysdeps/mach/hurd/{i386 => x86}/init-first.c (96%)
> 
> diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/x86/init-first.c
> similarity index 96%
> rename from sysdeps/mach/hurd/i386/init-first.c
> rename to sysdeps/mach/hurd/x86/init-first.c
> index a558da16..75ac1ff2 100644
> --- a/sysdeps/mach/hurd/i386/init-first.c
> +++ b/sysdeps/mach/hurd/x86/init-first.c
> @@ -227,10 +227,15 @@ init (int *data)
>     values we set just above.  We have stashed in %eax the user code
>     return address.  Push it on the top of the stack so it acts as
>     init1's return address, and then jump there.  */
> +#ifdef __x86_64__
> +asm ("call_init1:\n"
> +     "  push %rax\n"
> +     "  jmp *%rcx\n");
> +#else
>  asm ("call_init1:\n"
>       "	push %eax\n"
>       "	jmp *%ecx\n");
> -
> +#endif
>  
>  /* Do the first essential initializations that must precede all else.  */
>  static inline void
> @@ -242,7 +247,7 @@ first_init (void)
>  #ifndef SHARED
>    /* In the static case, we need to set up TLS early so that the stack
>       protection guard can be read at gs:0x14 by the gcc-generated snippets.  */
> -  _hurd_tls_init(&__init1_tcbhead);
> +  _hurd_tls_init (&__init1_tcbhead);
>    asm ("movw %%gs,%w0" : "=m" (__init1_desc));
>  #endif
>  
> @@ -300,7 +305,7 @@ _hurd_stack_setup (void)
>  	{
>  	  /* If we use ``__builtin_frame_address (0) + 2'' here, GCC gets
>  	     confused.  */
> -	  init ((int *) &argc);
> +	  init (&argc);

That won't work on x86_64: there, arguments are passed mostly through
registers, so &argc won't actually give you the address of arguments on
the stack. This can probably be checked on linux x86_64 to make sure how
arguments are passed.

>  	}
>  
>        /* Push the user return address after the argument data, and then
> @@ -308,9 +313,15 @@ _hurd_stack_setup (void)
>  	 caller had called `doinit1' with the argument data already on the
>  	 stack.  */
>        *--data = caller;
> +# ifdef __x86_64__
> +      asm volatile ("movq %0, %%rsp\n" /* Switch to new outermost stack.  */
> +                    "movq $0, %%rbp\n" /* Clear outermost frame pointer.  */
> +                    "jmp *%1" : : "r" (data), "r" (&doinit1));
> +# else
>        asm volatile ("movl %0, %%esp\n" /* Switch to new outermost stack.  */
>  		    "movl $0, %%ebp\n" /* Clear outermost frame pointer.  */
>  		    "jmp *%1" : : "r" (data), "r" (&doinit1));
> +# endif
>        /* NOTREACHED */
>      }
>  
> -- 
> 2.39.2
> 
>
  
Sergey Bugaev Feb. 20, 2023, 4:16 p.m. UTC | #2
On Mon, Feb 20, 2023 at 3:01 AM Samuel Thibault <samuel.thibault@gnu.org> wrote:
> That won't work on x86_64: there, arguments are passed mostly through
> registers, so &argc won't actually give you the address of arguments on
> the stack.

Right, good point.

I wish I had a better understanding of just what's going on in this
file. Maybe a lot of the hacks can be rewritten in a nicer way. For
instance, do we really need to hijack the return addresses and jump to
init1 in this weird way, only to enable it to access argc/arg0? Since
we know where they are on our stack (__builtin_frame_address (0) + 2
or something like that), can't we just pass it a pointer?

Let me actually try just that...

Sergey
  
Sergey Bugaev Feb. 20, 2023, 5:10 p.m. UTC | #3
> I wish I had a better understanding of just what's going on in this
> file. Maybe a lot of the hacks can be rewritten in a nicer way. For
> instance, do we really need to hijack the return addresses and jump to
> init1 in this weird way, only to enable it to access argc/arg0? Since
> we know where they are on our stack (__builtin_frame_address (0) + 2
> or something like that), can't we just pass it a pointer?
>
> Let me actually try just that...

Eh, no, let's start even earlier than that. Please correct me if I
(inevitably) get things wrong.

_dl_init_first is Hurd-specific. It is called from the assembly code
in dl-machine.h, specifically there's a RTLD_START_SPECIAL_INIT macro
that's defined to call _dl_init_first on the Hurd, and to nothing
otherwise. This RTLD_START_SPECIAL_INIT is used ("invoked") in
i386/dl-machine.h, s390/s390-{32,64}/dl-machine.h, ia64/dl-machine.h,
and alpha/dl-machine.h (but notably not in x86_64/dl-machine.h). In
all cases, it's emphasized that "The special initializer gets called
with the stack just as the application's entry point will see it; it
can switch stacks if it moves these contents over."

But I conclude that:
- s390-gnu, ia64-gnu, and alpha-gnu ports are nonexistent, nor are
they realistically ever going to happen, so we can ignore them
completely
- the implementation does not seem to actually switch stacks (in fact,
I have removed the unused switch_stacks function in the last commit)
-- so the "gets called with the stack just as the application's entry
point will see it" property may not be important anymore?

The only thing it really needs, it seems, is a pointer to
argc/argv/envp & Hurd data block *somewhere*. It does not have to be
on the stack (though where else would it be), or immediately preceding
its call frame -- all that really matters is that there's a pointer.

So my thinking goes, why don't we just hook into _dl_start, which
already has this very pointer? And in fact, _dl_start calls
_dl_sysdep_start, for which there is already a Hurd version. Can't we
just call our logic from there, and not worry about the stack layout
and overwriting return addresses?

That would work for the SHARED case; we also need to do something for
the static case. In that case, we are invoked by static-start.S; do I
understand it right that the argc/argv/whatever is still located
on-stack even on x86_64 in this case, and not passed in registers
according to the calling convention? Then again, we should be able to
just use __builtin_frame_address (0) + 2 and avoid most of the hacks?

Please tell me if this makes any sense.

Sergey
  
Noah Goldstein Feb. 20, 2023, 5:27 p.m. UTC | #4
On Sat, Feb 18, 2023 at 2:40 PM Sergey Bugaev via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
> ---
>  sysdeps/mach/hurd/{i386 => x86}/init-first.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>  rename sysdeps/mach/hurd/{i386 => x86}/init-first.c (96%)
>
> diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/x86/init-first.c
> similarity index 96%
> rename from sysdeps/mach/hurd/i386/init-first.c
> rename to sysdeps/mach/hurd/x86/init-first.c
> index a558da16..75ac1ff2 100644
> --- a/sysdeps/mach/hurd/i386/init-first.c
> +++ b/sysdeps/mach/hurd/x86/init-first.c
> @@ -227,10 +227,15 @@ init (int *data)
>     values we set just above.  We have stashed in %eax the user code
>     return address.  Push it on the top of the stack so it acts as
>     init1's return address, and then jump there.  */
> +#ifdef __x86_64__
> +asm ("call_init1:\n"
> +     "  push %rax\n"
> +     "  jmp *%rcx\n");
> +#else
>  asm ("call_init1:\n"
>       " push %eax\n"
>       " jmp *%ecx\n");
> -
> +#endif
>
>  /* Do the first essential initializations that must precede all else.  */
>  static inline void
> @@ -242,7 +247,7 @@ first_init (void)
>  #ifndef SHARED
>    /* In the static case, we need to set up TLS early so that the stack
>       protection guard can be read at gs:0x14 by the gcc-generated snippets.  */
> -  _hurd_tls_init(&__init1_tcbhead);
> +  _hurd_tls_init (&__init1_tcbhead);
>    asm ("movw %%gs,%w0" : "=m" (__init1_desc));
>  #endif
>
> @@ -300,7 +305,7 @@ _hurd_stack_setup (void)
>         {
>           /* If we use ``__builtin_frame_address (0) + 2'' here, GCC gets
>              confused.  */
> -         init ((int *) &argc);
> +         init (&argc);
>         }
>
>        /* Push the user return address after the argument data, and then
> @@ -308,9 +313,15 @@ _hurd_stack_setup (void)
>          caller had called `doinit1' with the argument data already on the
>          stack.  */
>        *--data = caller;
> +# ifdef __x86_64__
> +      asm volatile ("movq %0, %%rsp\n" /* Switch to new outermost stack.  */
> +                    "movq $0, %%rbp\n" /* Clear outermost frame pointer.  */
Unless you are intentionally preserving flags, `xorl %%ebp, %%ebp` is
better way to
zero a register.
> +                    "jmp *%1" : : "r" (data), "r" (&doinit1));
> +# else
>        asm volatile ("movl %0, %%esp\n" /* Switch to new outermost stack.  */
>                     "movl $0, %%ebp\n" /* Clear outermost frame pointer.  */
>                     "jmp *%1" : : "r" (data), "r" (&doinit1));
> +# endif
>        /* NOTREACHED */
>      }
>
> --
> 2.39.2
>
  
Samuel Thibault Feb. 20, 2023, 5:31 p.m. UTC | #5
Sergey Bugaev, le lun. 20 févr. 2023 20:10:29 +0300, a ecrit:
> - the implementation does not seem to actually switch stacks (in fact,
> I have removed the unused switch_stacks function in the last commit)
> -- so the "gets called with the stack just as the application's entry
> point will see it" property may not be important anymore?

That's possible, yes.

> So my thinking goes, why don't we just hook into _dl_start, which
> already has this very pointer? And in fact, _dl_start calls
> _dl_sysdep_start, for which there is already a Hurd version. Can't we
> just call our logic from there, and not worry about the stack layout
> and overwriting return addresses?

Possibly, yes.

> That would work for the SHARED case; we also need to do something for
> the static case. In that case, we are invoked by static-start.S; do I
> understand it right that the argc/argv/whatever is still located
> on-stack even on x86_64 in this case, and not passed in registers
> according to the calling convention?

I don't know. But you can check on Linux, we'll probably want to use the
same ABI.

Samuel
  

Patch

diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/x86/init-first.c
similarity index 96%
rename from sysdeps/mach/hurd/i386/init-first.c
rename to sysdeps/mach/hurd/x86/init-first.c
index a558da16..75ac1ff2 100644
--- a/sysdeps/mach/hurd/i386/init-first.c
+++ b/sysdeps/mach/hurd/x86/init-first.c
@@ -227,10 +227,15 @@  init (int *data)
    values we set just above.  We have stashed in %eax the user code
    return address.  Push it on the top of the stack so it acts as
    init1's return address, and then jump there.  */
+#ifdef __x86_64__
+asm ("call_init1:\n"
+     "  push %rax\n"
+     "  jmp *%rcx\n");
+#else
 asm ("call_init1:\n"
      "	push %eax\n"
      "	jmp *%ecx\n");
-
+#endif
 
 /* Do the first essential initializations that must precede all else.  */
 static inline void
@@ -242,7 +247,7 @@  first_init (void)
 #ifndef SHARED
   /* In the static case, we need to set up TLS early so that the stack
      protection guard can be read at gs:0x14 by the gcc-generated snippets.  */
-  _hurd_tls_init(&__init1_tcbhead);
+  _hurd_tls_init (&__init1_tcbhead);
   asm ("movw %%gs,%w0" : "=m" (__init1_desc));
 #endif
 
@@ -300,7 +305,7 @@  _hurd_stack_setup (void)
 	{
 	  /* If we use ``__builtin_frame_address (0) + 2'' here, GCC gets
 	     confused.  */
-	  init ((int *) &argc);
+	  init (&argc);
 	}
 
       /* Push the user return address after the argument data, and then
@@ -308,9 +313,15 @@  _hurd_stack_setup (void)
 	 caller had called `doinit1' with the argument data already on the
 	 stack.  */
       *--data = caller;
+# ifdef __x86_64__
+      asm volatile ("movq %0, %%rsp\n" /* Switch to new outermost stack.  */
+                    "movq $0, %%rbp\n" /* Clear outermost frame pointer.  */
+                    "jmp *%1" : : "r" (data), "r" (&doinit1));
+# else
       asm volatile ("movl %0, %%esp\n" /* Switch to new outermost stack.  */
 		    "movl $0, %%ebp\n" /* Clear outermost frame pointer.  */
 		    "jmp *%1" : : "r" (data), "r" (&doinit1));
+# endif
       /* NOTREACHED */
     }