Patchwork [05/14,v6] Open-code the memcpy() at static TLS initialization time.

login
register
mail settings
Submitter Nix
Date June 7, 2016, 11:06 a.m.
Message ID <1465297576-10981-6-git-send-email-nix@esperi.org.uk>
Download mbox | patch
Permalink /patch/12837/
State New
Headers show

Comments

Nix - June 7, 2016, 11:06 a.m.
From: Nick Alcock <nick.alcock@oracle.com>

This one is a bit nasty.  Now that we are initializing TLS earlier for
the stack canary's sake, existing memcpy() implementations become
problematic.  We can use the multiarch implementations, but they might
not always be present, and even if they are present they might not always
be in assembler, so might be compiled with stack-protection.  We cannot
use posix/memcpy.c without marking both it and */wordcopy.c as non-stack-
protected, which for memcpy() of all things seems like a seriously bad
idea: if any function in glibc should be stack-protected, it's memcpy()
(though stack-protecting the many optimized assembly versions is not done
in this patch series).

So we have two real options: hack up the guts of posix/memcpy.c and
*/wordcopy.c so that they can be #included (renamed and declared static)
inside libc-tls.c, or simply open-code the memcpy().  For simplicity's
sake, this patch open-codes it, on the grounds that static binaries are
relatively rare and quasi-deprecated anyway, and static binaries with
large TLS sections are yet rarer, and not worth the complexity of hacking
up all the arch-dependent wordcopy files.  If the arch provides an inline
assembler memcpy() implementation, we can use that in preference, for
speed; also, of course, if stack protection is not enabled at all, we
can still use a normal memcpy() as before.

(This was not revealed when testing on x86 because on that platform
GCC was open-coding the memcpy() for us.)

v2: New, lets us remove the memcpy() -fno-stack-protection, which wasn't
    enough in any case.
v4: Add an inhibit_loop_to_libcall to prevent GCC from turning the loop
    back into a memcpy() again.  Wrap long lines.
v6: Only do this if stack-protected.  Use the inline assembler
    ARCH_memcpy if available.

	* csu/libc-tls.c (__libc_setup_tls): Add inhibit_loop_to_libcall
	to avoid calls to potentially ifunced or stack-protected memcpy.
	Optionally open-code the TLS-initialization memcpy.
---
 csu/libc-tls.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
Florian Weimer - June 24, 2016, 12:42 p.m.
On 06/07/2016 01:06 PM, Nix wrote:
> From: Nick Alcock <nick.alcock@oracle.com>
>
> This one is a bit nasty.  Now that we are initializing TLS earlier for
> the stack canary's sake, existing memcpy() implementations become
> problematic.  We can use the multiarch implementations, but they might
> not always be present, and even if they are present they might not always
> be in assembler, so might be compiled with stack-protection.  We cannot
> use posix/memcpy.c without marking both it and */wordcopy.c as non-stack-
> protected, which for memcpy() of all things seems like a seriously bad
> idea: if any function in glibc should be stack-protected, it's memcpy()
> (though stack-protecting the many optimized assembly versions is not done
> in this patch series).

Some of the pathnames in this explanation appear to be incorrect.

I looked at the memcpy.c and wordcopy.c implementations, and I do not 
see why these functions would need the stack protector.  They should not 
have addressable local variables, they should all be in registers.  As a 
result, the implementation should never derive a pointer from the 
current stack frame.

These functions might be passed pointers into the callers stack frame, 
but the memory looks like this (addresses increase from bottom to top, 
so the stack grows downwards on most architectures (except HPPA)):

     :                             :
     +-----------------------------+
     |  Caller stack frame         |
     |                             |
     |  (end of buf)               |
     |    :                        |
     |    :                        |
     |  char buf[64];              |
     |                             |
     +-----------------------------+
     | Return address              |
     | memcpy stack frame          |
     |                             |
     +-(current top of stack)------+
     :                             :
     +=============================+
     | XXX guard page XXXXXXXXXXXX |
     :                             :


As a result, a buffer overflow will run away from the return address 
associated with the memcpy activation frame.  To avoid that, you'd need 
a memcpy call with a destination address which points below the current 
top of stack before memcpy is called, that is an invalid, dangling 
pointer.  A random pointer with an excessive size is unlikely to work 
because the copying operation would eventually hit the guard page below 
the stack.  This is not a straight stack-based overflow anymore, and not 
something stack protector can help with.

(Attacks against runaway memcpy calls (with size arguments approaching 
SIZE_MAX) work against some memcpy implementations not because these 
implementations lack stack protector, but because they load variables 
from the stack in the inner loop, which may have been overwritten by 
attacker-controlled values.)

Anyway, if the above analysis is right, it should be safe to disable 
stack protector for functions such as memcpy (essentially doing manually 
what -fstack-protector-strong does automatically).  This would be my 
preferred approach here.

Thanks,
Florian
Nix - June 24, 2016, 1:10 p.m.
On 24 Jun 2016, Florian Weimer said:

> On 06/07/2016 01:06 PM, Nix wrote:
>> From: Nick Alcock <nick.alcock@oracle.com>
>>
>> This one is a bit nasty.  Now that we are initializing TLS earlier for
>> the stack canary's sake, existing memcpy() implementations become
>> problematic.  We can use the multiarch implementations, but they might
>> not always be present, and even if they are present they might not always
>> be in assembler, so might be compiled with stack-protection.  We cannot
>> use posix/memcpy.c without marking both it and */wordcopy.c as non-stack-
>> protected, which for memcpy() of all things seems like a seriously bad
>> idea: if any function in glibc should be stack-protected, it's memcpy()
>> (though stack-protecting the many optimized assembly versions is not done
>> in this patch series).
>
> Some of the pathnames in this explanation appear to be incorrect.

Oh, it's string/memcpy.c, isn't it? Sorry.

> I looked at the memcpy.c and wordcopy.c implementations, and I do not
> see why these functions would need the stack protector. They should
> not have addressable local variables, they should all be in registers.

Now we're depending on the details of compiler register allocation for
security? What if the compiler changes?

> As a result, the implementation should never derive a pointer from the
> current stack frame.

Oh, they almost certainly don't, but my attitude here is that I'm less
smart than the attackers and I should just try to protect everything
possible (that doesn't require horrific pain like ld.so would).

> These functions might be passed pointers into the callers stack frame,

That was my worry. I've had exactly that sort of obviously-problematic
bug in the past, as have we all...

> but the memory looks like this (addresses increase from bottom to top,
> so the stack grows downwards on most architectures (except HPPA)):

You see, there are "except"s creeping in already :) every one is a
possible mistake...

>     :                             :
>     +-----------------------------+
>     |  Caller stack frame         |
>     |                             |
>     |  (end of buf)               |
>     |    :                        |
>     |    :                        |
>     |  char buf[64];              |
>     |                             |
>     +-----------------------------+
>     | Return address              |
>     | memcpy stack frame          |
>     |                             |
>     +-(current top of stack)------+
>     :                             :
>     +=============================+
>     | XXX guard page XXXXXXXXXXXX |
>     :                             :
>
>
> As a result, a buffer overflow will run away from the return address associated with the memcpy activation frame.  To avoid that,

Hm, good point. The caller would need stack-protection, but it's got it.

Do we ever see crashes inside memcpy() from return address overwrite?
You're right, I'm not sure I've ever seen one. (But then, I don't use
HP-PA.)

> Anyway, if the above analysis is right, it should be safe to disable
> stack protector for functions such as memcpy (essentially doing
> manually what -fstack-protector-strong does automatically). This would
> be my preferred approach here.

... which would mean we could drop this patch, too.
Florian Weimer - June 24, 2016, 1:18 p.m.
On 06/24/2016 03:10 PM, Nix wrote:
> On 24 Jun 2016, Florian Weimer said:
>
>> On 06/07/2016 01:06 PM, Nix wrote:
>>> From: Nick Alcock <nick.alcock@oracle.com>
>>>
>>> This one is a bit nasty.  Now that we are initializing TLS earlier for
>>> the stack canary's sake, existing memcpy() implementations become
>>> problematic.  We can use the multiarch implementations, but they might
>>> not always be present, and even if they are present they might not always
>>> be in assembler, so might be compiled with stack-protection.  We cannot
>>> use posix/memcpy.c without marking both it and */wordcopy.c as non-stack-
>>> protected, which for memcpy() of all things seems like a seriously bad
>>> idea: if any function in glibc should be stack-protected, it's memcpy()
>>> (though stack-protecting the many optimized assembly versions is not done
>>> in this patch series).
>>
>> Some of the pathnames in this explanation appear to be incorrect.
>
> Oh, it's string/memcpy.c, isn't it? Sorry.

Yes, that seems better.

>> I looked at the memcpy.c and wordcopy.c implementations, and I do not
>> see why these functions would need the stack protector. They should
>> not have addressable local variables, they should all be in registers.
>
> Now we're depending on the details of compiler register allocation for
> security? What if the compiler changes?

The stack protector already has such a dependency on compiler internals. 
  I'm not too worried about this aspect.

>> but the memory looks like this (addresses increase from bottom to top,
>> so the stack grows downwards on most architectures (except HPPA)):
>
> You see, there are "except"s creeping in already :) every one is a
> possible mistake...

HPPA is just very special (and very, very dead).

>> Anyway, if the above analysis is right, it should be safe to disable
>> stack protector for functions such as memcpy (essentially doing
>> manually what -fstack-protector-strong does automatically). This would
>> be my preferred approach here.
>
> ... which would mean we could drop this patch, too.

Wouldn't you have to supply $(no-stack-protector), for the sake of the 
dynamic linker?  Or does the rtld recompilation take care of that?

Thanks,
Florian
Nix - June 24, 2016, 1:25 p.m.
On 24 Jun 2016, Florian Weimer said:

> On 06/24/2016 03:10 PM, Nix wrote:
>> On 24 Jun 2016, Florian Weimer said:
>>> I looked at the memcpy.c and wordcopy.c implementations, and I do not
>>> see why these functions would need the stack protector. They should
>>> not have addressable local variables, they should all be in registers.
>>
>> Now we're depending on the details of compiler register allocation for
>> security? What if the compiler changes?
>
> The stack protector already has such a dependency on compiler internals. I'm not too worried about this aspect.

The stack protector is *part* of the compiler, though. That's a bit
different, in my eyes.

>>> Anyway, if the above analysis is right, it should be safe to disable
>>> stack protector for functions such as memcpy (essentially doing
>>> manually what -fstack-protector-strong does automatically). This would
>>> be my preferred approach here.
>>
>> ... which would mean we could drop this patch, too.
>
> Wouldn't you have to supply $(no-stack-protector), for the sake of the dynamic linker?  Or does the rtld recompilation take care of
> that?

Yeah, that's handled by the elide-stack-protector stuff in elf/Makefile.
(If we had to de-stack-protect everything ld.so used by hand, the patch
really would be unmaintainable.)
Florian Weimer - June 24, 2016, 1:53 p.m.
On 06/24/2016 03:25 PM, Nix wrote:

> Yeah, that's handled by the elide-stack-protector stuff in elf/Makefile.
> (If we had to de-stack-protect everything ld.so used by hand, the patch
> really would be unmaintainable.)

Okay, then let's drop this.  Thanks.

Florian

Patch

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 3d67a64..4d81113 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -102,6 +102,7 @@  init_static_tls (size_t memsz, size_t align)
 }
 
 void
+inhibit_loop_to_libcall
 __libc_setup_tls (size_t tcbsize, size_t tcbalign)
 {
   void *tlsblock;
@@ -176,8 +177,23 @@  __libc_setup_tls (size_t tcbsize, size_t tcbalign)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
   _dl_static_dtv[2].pointer.is_static = true;
-  /* sbrk gives us zero'd memory, so we don't need to clear the remainder.  */
+
+  /* sbrk gives us zero'd memory, so we don't need to clear the remainder.
+
+     When stack-protecting, use inlined asm implementation if available:
+     otherwise, copy by hand, because memcpy() is stack-protected and is often
+     multiarch too.  */
+
+#if defined _HAVE_STRING_ARCH_memcpy || !defined STACK_PROTECTOR_LEVEL
   memcpy (_dl_static_dtv[2].pointer.val, initimage, filesz);
+#else
+  char *dst = (char *) _dl_static_dtv[2].pointer.val;
+  char *src = (char *) initimage;
+  size_t i;
+
+  for (i = 0; i < filesz; dst++, src++, i++)
+    *dst = *src;
+#endif
 
   /* Install the pointer to the dtv.  */