[v2,1/4] hurd: Simplify init-first.c further

Message ID 20230221211932.296459-2-bugaevc@gmail.com
State Superseded, archived
Headers
Series More x86_64-gnu glibc work |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Sergey Bugaev Feb. 21, 2023, 9:19 p.m. UTC
  This drops all of the return address rewriting kludges. The only
remaining hack is the jump out of a call stack while adjusting the
stack pointer.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
---
 sysdeps/mach/hurd/dl-sysdep.c       |   5 +-
 sysdeps/mach/hurd/i386/dl-machine.h |   7 -
 sysdeps/mach/hurd/i386/init-first.c | 200 ++++++++++------------------
 3 files changed, 75 insertions(+), 137 deletions(-)
 delete mode 100644 sysdeps/mach/hurd/i386/dl-machine.h
  

Comments

Samuel Thibault Feb. 22, 2023, 11:26 p.m. UTC | #1
Hello,

Thanks for this!

Did you try to run make check?

Sergey Bugaev via Libc-alpha, le mer. 22 févr. 2023 00:19:29 +0300, a ecrit:
> This drops all of the return address rewriting kludges. The only
> remaining hack is the jump out of a call stack while adjusting the
> stack pointer.

Is this hack really still needed? Since we don't switch stack any more,
we could as well just return normally?

> --- a/sysdeps/mach/hurd/dl-sysdep.c
> +++ b/sysdeps/mach/hurd/dl-sysdep.c
> @@ -207,6 +207,9 @@ _dl_sysdep_start (void **start_argptr,
>  	    }
>  	}
>  
> +      extern void _dl_init_first (void *data);

Please put extern function declaration into a header, dl-sysdep.h for
instance.

> diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/i386/init-first.c
> index a558da16..34e8dcc0 100644
> --- a/sysdeps/mach/hurd/i386/init-first.c
> +++ b/sysdeps/mach/hurd/i386/init-first.c
> +  {
> +    /* Check if the stack we are now on is different from
> +       the one described by _hurd_stack_{base,size}.  */
>  
> +    char dummy;
> +    const vm_address_t newsp = (vm_address_t) &dummy;
> +
> +    if (d->stack_size != 0 && (newsp < d->stack_base
> +			       || newsp - d->stack_base > d->stack_size))
> +      /* The new stack pointer does not intersect with the
> +	 stack the exec server set up for us, so free that stack.  */
> +      __vm_deallocate (__mach_task_self (), d->stack_base, d->stack_size);
> +  }

Again, I don't think this is needed any more since we don't switch stack
any more?

Samuel
  
Sergey Bugaev Feb. 23, 2023, 1:54 p.m. UTC | #2
On Thu, Feb 23, 2023 at 2:26 AM Samuel Thibault <samuel.thibault@gnu.org> wrote:
> Did you try to run make check?

No.

I'm cross-compiling, so I don't think make check would be able to run
any tests. And from what I remember from building glibc on the Hurd
itself back in 2021, make check takes a very long time and either
never really completes or brings the system into some weird state.

If you're able to run make check on your end, please do so (but wait
until I send v3 with the changes you've requested below).

Are there specific tests for the various combinations of startup
variants? (shared vs static, args already on the stack vs not, exec
server present vs not) If so, maybe I could run just them and not the
full thing; that would be easier on my system.

> Sergey Bugaev via Libc-alpha, le mer. 22 févr. 2023 00:19:29 +0300, a ecrit:
> > This drops all of the return address rewriting kludges. The only
> > remaining hack is the jump out of a call stack while adjusting the
> > stack pointer.
>
> Is this hack really still needed? Since we don't switch stack any more,
> we could as well just return normally?

No, we can't just return normally, we have to adjust the stack pointer
and jump out. We don't _switch_ the stack as in use an entirely
different area of memory for the stack, but we do adjust the stack
pointer within the existing stack area.

Are the comments I have added not clear enough about this? If so,
maybe they should be expanded further. The reason we have to return in
this weird way is that:
1. The normal startup code (_start1) expects to find args/env on the
top of the stack. Like, literally, argc at sp[0], argv[0] at sp[1] and
so on.
2. _hurd_startup, which is the code that receives args/env from the
exec server, allocates them in its own stack frame, with alloca. So it
cannot return, and neither can _hurd_stack_setup.
Instead of returning, _hurd_startup invokes a callback (doinit) that
(eventually) just sets the stack pointer to point to this data (so it
now is on the top of the stack, just as _start1 expects) and jumps to
_hurd_stack_setup's caller (i.e. to _start). This would very much
break things if _start was itself using the stack for anything, since
the stack pointer is now different, but the only thing _start does is
it calls _hurd_startup and then jumps to _start1, so that works.

And the same is done in the SHARED startup path by _dl_sysdep_start,
except that it uses the RETURN_TO macro instead of direct inline
assembly. That macro only really differs in that it does *not* zero
out ebp/rbp, so I wonder how come that doesn't break backtraces.

> > --- a/sysdeps/mach/hurd/dl-sysdep.c
> > +++ b/sysdeps/mach/hurd/dl-sysdep.c
> > @@ -207,6 +207,9 @@ _dl_sysdep_start (void **start_argptr,
> >           }
> >       }
> >
> > +      extern void _dl_init_first (void *data);
>
> Please put extern function declaration into a header, dl-sysdep.h for
> instance.

Makes sense, thanks.

>
> > diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/i386/init-first.c
> > index a558da16..34e8dcc0 100644
> > --- a/sysdeps/mach/hurd/i386/init-first.c
> > +++ b/sysdeps/mach/hurd/i386/init-first.c
> > +  {
> > +    /* Check if the stack we are now on is different from
> > +       the one described by _hurd_stack_{base,size}.  */
> >
> > +    char dummy;
> > +    const vm_address_t newsp = (vm_address_t) &dummy;
> > +
> > +    if (d->stack_size != 0 && (newsp < d->stack_base
> > +                            || newsp - d->stack_base > d->stack_size))
> > +      /* The new stack pointer does not intersect with the
> > +      stack the exec server set up for us, so free that stack.  */
> > +      __vm_deallocate (__mach_task_self (), d->stack_base, d->stack_size);
> > +  }
>
> Again, I don't think this is needed any more since we don't switch stack
> any more?

Good point, most likely not. I'll drop it and see if anything breaks.

Sergey
  
Samuel Thibault Feb. 24, 2023, 1:08 a.m. UTC | #3
Sergey Bugaev, le jeu. 23 févr. 2023 16:54:05 +0300, a ecrit:
> And from what I remember from building glibc on the Hurd
> itself back in 2021, make check takes a very long time and either
> never really completes or brings the system into some weird state.

Some checks pose problem indeed, see the "unsupported" lines in debian's
debian/testsuite-xfail-debian.mk

> If you're able to run make check on your end, please do so (but wait
> until I send v3 with the changes you've requested below).

I'll let it run through the night.

> Are there specific tests for the various combinations of startup
> variants? (shared vs static, args already on the stack vs not, exec
> server present vs not)

Basically it's the elf/ directory that tests things. I don't know which
ones would exactly test what you'd want, there are various tests for
various situations.

> Instead of returning, _hurd_startup invokes a callback (doinit) that
> (eventually) just sets the stack pointer to point to this data (so it
> now is on the top of the stack, just as _start1 expects) and jumps to
> _hurd_stack_setup's caller (i.e. to _start).

Ah, ok, I had misunderstood and I hadn't really dived into the code. I
thought _hurd_startup was eventually calling main() itself, but actually
no it's sorta-returning to _start, but lower in the stack.

Your current comments are enough, I was just stuck with older
assumptions with the old stack switching code.

Thanks!
Samuel
  
Samuel Thibault Feb. 24, 2023, 7:43 p.m. UTC | #4
Samuel Thibault, le ven. 24 févr. 2023 02:08:09 +0100, a ecrit:
> Sergey Bugaev, le jeu. 23 févr. 2023 16:54:05 +0300, a ecrit:
> > And from what I remember from building glibc on the Hurd
> > itself back in 2021, make check takes a very long time and either
> > never really completes or brings the system into some weird state.
> 
> Some checks pose problem indeed, see the "unsupported" lines in debian's
> debian/testsuite-xfail-debian.mk

I have added the bugreports on bugzilla, and added unsupported lines in
sysdeps/hurd/mach/Makefile.

> > If you're able to run make check on your end, please do so (but wait
> > until I send v3 with the changes you've requested below).
> 
> I'll let it run through the night.

It went fine! I also rebuilt hurd to exercise ext2fs.static, and that
went fine too. I thus pushed your commit, many thanks!

Samuel
  

Patch

diff --git a/sysdeps/mach/hurd/dl-sysdep.c b/sysdeps/mach/hurd/dl-sysdep.c
index 9e591708..18b35ddb 100644
--- a/sysdeps/mach/hurd/dl-sysdep.c
+++ b/sysdeps/mach/hurd/dl-sysdep.c
@@ -207,6 +207,9 @@  _dl_sysdep_start (void **start_argptr,
 	    }
 	}
 
+      extern void _dl_init_first (void *data);
+      _dl_init_first (argdata);
+
       {
 	extern void _dl_start_user (void);
 	/* Unwind the stack to ARGDATA and simulate a return from _dl_start
@@ -793,7 +796,7 @@  _dl_show_auxv (void)
 
 
 void weak_function
-_dl_init_first (int argc, ...)
+_dl_init_first (void *p)
 {
   /* This no-op definition only gets used if libc is not linked in.  */
 }
diff --git a/sysdeps/mach/hurd/i386/dl-machine.h b/sysdeps/mach/hurd/i386/dl-machine.h
deleted file mode 100644
index 40f2ff29..00000000
--- a/sysdeps/mach/hurd/i386/dl-machine.h
+++ /dev/null
@@ -1,7 +0,0 @@ 
-/* Dynamic linker magic for Hurd/i386.
-   This file just gets us a call to _dl_first_init inserted
-   into the asm in sysdeps/i386/dl-machine.h that contains
-   the initializer code.  */
-
-#define RTLD_START_SPECIAL_INIT "call _dl_init_first@PLT; movl (%esp), %edx"
-#include_next "dl-machine.h"
diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/i386/init-first.c
index a558da16..34e8dcc0 100644
--- a/sysdeps/mach/hurd/i386/init-first.c
+++ b/sysdeps/mach/hurd/i386/init-first.c
@@ -24,8 +24,6 @@ 
 #include <sysdep.h>
 #include <set-hooks.h>
 #include "hurdstartup.h"
-#include "hurdmalloc.h"		/* XXX */
-#include "../locale/localeinfo.h"
 
 #include <ldsodefs.h>
 #include <fpu_control.h>
@@ -87,68 +85,13 @@  posixland_init (int argc, char **argv, char **envp)
   __init_misc (argc, argv, envp);
 }
 
-
 static void
-init1 (int argc, char *arg0, ...)
-{
-  char **argv = &arg0;
-  char **envp = &argv[argc + 1];
-  struct hurd_startup_data *d;
-
-  while (*envp)
-    ++envp;
-  d = (void *) ++envp;
-
-  if ((void *) d == argv[0])
-    {
-      /* No Hurd data block to process.  */
-#ifndef SHARED
-      __libc_enable_secure = 0;
-#endif
-      return;
-    }
-
-#ifndef SHARED
-  __libc_enable_secure = d->flags & EXEC_SECURE;
-#endif
-
-  _hurd_init_dtable = d->dtable;
-  _hurd_init_dtablesize = d->dtablesize;
-
-  {
-    /* Check if the stack we are now on is different from
-       the one described by _hurd_stack_{base,size}.  */
-
-    char dummy;
-    const vm_address_t newsp = (vm_address_t) &dummy;
-
-    if (d->stack_size != 0 && (newsp < d->stack_base
-			       || newsp - d->stack_base > d->stack_size))
-      /* The new stack pointer does not intersect with the
-	 stack the exec server set up for us, so free that stack.  */
-      __vm_deallocate (__mach_task_self (), d->stack_base, d->stack_size);
-  }
-
-  if (d->portarray || d->intarray)
-    /* Initialize library data structures, start signal processing, etc.  */
-    _hurd_init (d->flags, argv,
-		d->portarray, d->portarraysize,
-		d->intarray, d->intarraysize);
-}
-
-
-static inline void
 init (int *data)
 {
-  /* data is the address of the argc parameter to _dl_init_first or
-     doinit1 in _hurd_stack_setup, so the array subscripts are
-     undefined.  */
-  DIAG_PUSH_NEEDS_COMMENT;
-  DIAG_IGNORE_NEEDS_COMMENT (10, "-Warray-bounds");
-
   int argc = *data;
   char **argv = (void *) (data + 1);
   char **envp = &argv[argc + 1];
+  struct hurd_startup_data *d;
 
   /* Since the cthreads initialization code uses malloc, and the
      malloc initialization code needs to get at the environment, make
@@ -157,18 +100,18 @@  init (int *data)
      stored.  */
   __environ = envp;
 
-#ifndef SHARED
-  struct hurd_startup_data *d;
-
   while (*envp)
     ++envp;
   d = (void *) ++envp;
 
+#ifndef SHARED
+
   /* If we are the bootstrap task started by the kernel,
      then after the environment pointers there is no Hurd
      data block; the argument strings start there.  */
   if ((void *) d == argv[0] || d->phdr == 0)
     {
+      __libc_enable_secure = 0;
       /* With a new enough linker (binutils-2.23 or better),
          the magic __ehdr_start symbol will be available and
          __libc_start_main will have done this that way already.  */
@@ -186,51 +129,39 @@  init (int *data)
     }
   else
     {
+      __libc_enable_secure = d->flags & EXEC_SECURE;
       _dl_phdr = (ElfW(Phdr) *) d->phdr;
       _dl_phnum = d->phdrsz / sizeof (ElfW(Phdr));
       assert (d->phdrsz % sizeof (ElfW(Phdr)) == 0);
     }
 #endif
 
-  /* Call `init1' (above) with the user code as the return address, and the
-     argument data immediately above that on the stack.  */
-
-  void *usercode, **ret_address;
-
-  void call_init1 (void);
-
-  /* The argument data is just above the stack frame we will unwind by
-     returning.  Mutate our own return address to run the code below.  */
-  /* The following expression would typically be written as
-     ``__builtin_return_address (0)''.  But, for example, GCC 4.4.6 doesn't
-     recognize that this read operation may alias the following write
-     operation, and thus is free to reorder the two, clobbering the
-     original return address.  */
-  ret_address = (void **) __builtin_frame_address (0) + 1;
-  usercode = *ret_address;
-  /* GCC 4.4.6 also wants us to force loading USERCODE already here.  */
-  asm volatile ("# %0" : : "X" (usercode));
-  *ret_address = &call_init1;
-  /* Force USERCODE into %eax and &init1 into %ecx, which are not
-     restored by function return.  */
-  asm volatile ("# a %0 c %1" : : "a" (usercode), "c" (&init1));
-
-  DIAG_POP_NEEDS_COMMENT;	/* -Warray-bounds.  */
-}
+  if ((void *) d == argv[0])
+    return;
 
-/* These bits of inline assembler used to be located inside `init'.
-   However they were optimized away by gcc 2.95.  */
+  _hurd_init_dtable = d->dtable;
+  _hurd_init_dtablesize = d->dtablesize;
 
-/* The return address of `init' above, was redirected to here, so at
-   this point our stack is unwound and callers' registers restored.
-   Only %ecx and %eax are call-clobbered and thus still have the
-   values we set just above.  We have stashed in %eax the user code
-   return address.  Push it on the top of the stack so it acts as
-   init1's return address, and then jump there.  */
-asm ("call_init1:\n"
-     "	push %eax\n"
-     "	jmp *%ecx\n");
+  {
+    /* Check if the stack we are now on is different from
+       the one described by _hurd_stack_{base,size}.  */
 
+    char dummy;
+    const vm_address_t newsp = (vm_address_t) &dummy;
+
+    if (d->stack_size != 0 && (newsp < d->stack_base
+			       || newsp - d->stack_base > d->stack_size))
+      /* The new stack pointer does not intersect with the
+	 stack the exec server set up for us, so free that stack.  */
+      __vm_deallocate (__mach_task_self (), d->stack_base, d->stack_size);
+  }
+
+  if (d->portarray || d->intarray)
+    /* Initialize library data structures, start signal processing, etc.  */
+    _hurd_init (d->flags, argv,
+		d->portarray, d->portarraysize,
+		d->intarray, d->intarraysize);
+}
 
 /* Do the first essential initializations that must precede all else.  */
 static inline void
@@ -242,7 +173,7 @@  first_init (void)
 #ifndef SHARED
   /* In the static case, we need to set up TLS early so that the stack
      protection guard can be read at gs:0x14 by the gcc-generated snippets.  */
-  _hurd_tls_init(&__init1_tcbhead);
+  _hurd_tls_init (&__init1_tcbhead);
   asm ("movw %%gs,%w0" : "=m" (__init1_desc));
 #endif
 
@@ -252,21 +183,15 @@  first_init (void)
 #ifdef SHARED
 /* This function is called specially by the dynamic linker to do early
    initialization of the shared C library before normal initializers
-   expecting a Posixoid environment can run.  It gets called with the
-   stack set up just as the user will see it, so it can switch stacks.  */
+   expecting a Posixoid environment can run.  */
 
 void
-_dl_init_first (int argc, ...)
+_dl_init_first (void **p)
 {
   first_init ();
-
-  /* If we use ``__builtin_frame_address (0) + 2'' here, GCC gets confused.  */
-  init (&argc);
+  init ((int *) p);
 }
-#endif
-
 
-#ifdef SHARED
 /* The regular posixland initialization is what goes into libc's
    normal initializer.  */
 /* NOTE!  The linker notices the magical name `_init' and sets the DT_INIT
@@ -280,9 +205,10 @@  __libc_init_first (int argc, char **argv, char **envp)
 {
   /* Everything was done in the shared library initializer, _init.  */
 }
-#else
-strong_alias (posixland_init, __libc_init_first);
 
+#else /* SHARED */
+
+strong_alias (posixland_init, __libc_init_first);
 
 /* XXX This is all a crock and I am not happy with it.
    This poorly-named function is called by static-start.S,
@@ -291,32 +217,48 @@  void
 inhibit_stack_protector
 _hurd_stack_setup (void)
 {
-  intptr_t caller = (intptr_t) __builtin_return_address (0);
+  /* This is the very first C code that runs in a statically linked
+     executable -- calling this function is the first thing that _start in
+     static-start.S does.  Once this function returns, the unusual way that it
+     does (see below), _start jumps to _start1, the regular start-up code.
+
+     _start1 expects the arguments, environment, and a Hurd data block to be
+     located at the top of the stack.  The data may already be located there,
+     or we may need to receive it from the exec server.  */
+  void *caller = __builtin_extract_return_addr (__builtin_return_address (0));
+  /* If the arguments and environment are already located on the stack, this is
+     where they are, just above our call frame.  Note that this may not be a
+     valid pointer in case we're supposed to receive the arguments from the exec
+     server, so we can not dereference it yet.  */
+  void **p = (void **) __builtin_frame_address (0) + 2;
+
+  /* Init the essential things.  */
+  first_init ();
 
   void doinit (intptr_t *data)
     {
-      /* This function gets called with the argument data at TOS.  */
-      void doinit1 (int argc, ...)
-	{
-	  /* If we use ``__builtin_frame_address (0) + 2'' here, GCC gets
-	     confused.  */
-	  init ((int *) &argc);
-	}
-
-      /* Push the user return address after the argument data, and then
-	 jump to `doinit1' (above), so it is as if __libc_init_first's
-	 caller had called `doinit1' with the argument data already on the
-	 stack.  */
-      *--data = caller;
+      init ((int *) data);
       asm volatile ("movl %0, %%esp\n" /* Switch to new outermost stack.  */
-		    "movl $0, %%ebp\n" /* Clear outermost frame pointer.  */
-		    "jmp *%1" : : "r" (data), "r" (&doinit1));
-      /* NOTREACHED */
+		    "xorl %%ebp, %%ebp\n" /* Clear outermost frame pointer.  */
+		    "jmp *%1" : : "r" (data), "r" (caller));
+      __builtin_unreachable ();
     }
 
-  first_init ();
-
-  _hurd_startup ((void **) __builtin_frame_address (0) + 2, &doinit);
+  /* _hurd_startup () will attempt to receive the data block from the exec
+     server; or if that is not possible, will take the data from the pointer
+     we pass it here.  The important point here is that the data
+     _hurd_startup () collects may be allocated in its stack frame (with
+     alloca), which is why _hurd_startup () does not return the normal way.
+     Instead, it invokes a callback (which is not expected to return normally
+     either).
+
+     Our callback not only passes the data pointer to init (), but also jumps
+     out of the call stack back to our caller (i.e. to _start1), while setting
+     the stack pointer to the data (which is somewhere on teh current stack
+     anyway).  This way, _start1 find the data on the top of the stack, just as
+     it expects to.  */
+  _hurd_startup (p, &doinit);
+  __builtin_unreachable ();
 }
 #endif