Patchwork: pthread wastes memory with mlockall(MCL_FUTURE)

Submitter Balazs Kezes
Date Sept. 20, 2015, 1:27 p.m.
Message ID <20150920132712.GA7569@eper>
Permalink /patch/8808/
State New

Comments

Balazs Kezes - Sept. 20, 2015, 1:27 p.m.
On 2015-09-18 19:22 -0400, Rich Felker wrote:
> If this works, I think it's only due to a kernel bug of failing to
> apply the lock after mprotect. It's also going to be considerably
> slower, I think. What I had in mind was switching around the existing
> mmap/mprotect order, not adding an extra mprotect.

Here's a better patch. It re-adds the permissions in two steps: first
only for this pthread structure at the end of the allocation, and then,
at a later step, for the rest of the stack (I didn't touch that part,
which is why this patch is short).

I don't really like this two-step mprotect call but unless the stack
grows down, you can't avoid it (and I didn't special case that option).

But please review carefully, because I don't have a better way to test
this than my little test application and make xcheck.
Rich Felker - Sept. 20, 2015, 11:32 p.m.
On Sun, Sep 20, 2015 at 02:27:12PM +0100, Balazs Kezes wrote:
> On 2015-09-18 19:22 -0400, Rich Felker wrote:
> > If this works, I think it's only due to a kernel bug of failing to
> > apply the lock after mprotect. It's also going to be considerably
> > slower, I think. What I had in mind was switching around the existing
> > mmap/mprotect order, not adding an extra mprotect.
> 
> Here's a better patch. It re-adds the permissions in two steps: first
> only for this pthread structure at the end of the allocation, and then,
> at a later step, for the rest of the stack (I didn't touch that part,
> which is why this patch is short).
> 
> I don't really like this two-step mprotect call but unless the stack
> grows down, you can't avoid it (and I didn't special case that option).
> 
> But please review carefully, because I don't have a better way to test
> this than my little test application and make xcheck.
> 
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 753da61..2959816 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -499,11 +499,11 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  #if MULTI_PAGE_ALIASING != 0
>  	  if ((size % MULTI_PAGE_ALIASING) == 0)
>  	    size += pagesize_m1 + 1;
>  #endif
>  
> -	  mem = mmap (NULL, size, prot,
> +	  mem = mmap (NULL, size, PROT_NONE,
>  		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>  
>  	  if (__glibc_unlikely (mem == MAP_FAILED))
>  	    return errno;
>  
> @@ -542,13 +542,25 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  				    - __static_tls_size)
>  				    & ~__static_tls_align_m1)
>  				   - TLS_PRE_TCB_SIZE);
>  #endif
>  
> +	  /* We allocated the memory without permissions in order for the kernel
> +	     to not allocate the guard pages in case of mlockall(MCL_FUTURE).
> +	     Now readd the permissions for pd's page.  */

This comment misses the real point: avoiding committing memory.
MCL_FUTURE is just a particular side effect of this.

> +	  void *pdpage = (void*) ((uintptr_t) pd & ~pagesize_m1);
> +	  size_t pdsize = mem + size - pdpage;
> +	  if (__glibc_unlikely (mprotect (pdpage, pdsize, prot) != 0))
> +	    {
> +	      (void) munmap (mem, size);
> +	      return errno;
> +	    }
> +

You're still adding a new mprotect rather than just inverting the one
that's already there (somewhere else) adding the PROT_NONE for the
guard page. Is there a reason this is hard to do right in glibc?

Rich
Balazs Kezes - Sept. 21, 2015, 5:41 p.m.
On 2015-09-20 19:32 -0400, Rich Felker wrote:
> You're still adding a new mprotect rather than just inverting the one
> that's already there (somewhere else) adding the PROT_NONE for the
> > guard page. Is there a reason this is hard to do right in glibc?

Even if I manage to fix it properly (hah, I wish!), I won't be able to
test it, as I lack the various hardware this function supports. I've
opened a bug instead so that more experienced people can pick it up:

https://sourceware.org/bugzilla/show_bug.cgi?id=18988

Thanks!
Rich Felker - Sept. 21, 2015, 5:44 p.m.
On Mon, Sep 21, 2015 at 06:41:23PM +0100, Balazs Kezes wrote:
> On 2015-09-20 19:32 -0400, Rich Felker wrote:
> > You're still adding a new mprotect rather than just inverting the one
> > that's already there (somewhere else) adding the PROT_NONE for the
> > > guard page. Is there a reason this is hard to do right in glibc?
> 
> Even if I manage to fix it properly (hah, I wish!), I won't be able to
> test it, as I lack the various hardware this function supports. I've
> opened a bug instead so that more experienced people can pick it up:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=18988
> 
> Thanks!

Thanks! This looks like the best approach to get some more attention
on it from people who might have better ideas for what a fix would
look like.

Rich

Patch

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 753da61..2959816 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -499,11 +499,11 @@  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 #if MULTI_PAGE_ALIASING != 0
 	  if ((size % MULTI_PAGE_ALIASING) == 0)
 	    size += pagesize_m1 + 1;
 #endif
 
-	  mem = mmap (NULL, size, prot,
+	  mem = mmap (NULL, size, PROT_NONE,
 		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
 
 	  if (__glibc_unlikely (mem == MAP_FAILED))
 	    return errno;
 
@@ -542,13 +542,25 @@  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 				    - __static_tls_size)
 				    & ~__static_tls_align_m1)
 				   - TLS_PRE_TCB_SIZE);
 #endif
 
+	  /* We allocated the memory without permissions in order for the kernel
+	     to not allocate the guard pages in case of mlockall(MCL_FUTURE).
+	     Now readd the permissions for pd's page.  */
+	  void *pdpage = (void*) ((uintptr_t) pd & ~pagesize_m1);
+	  size_t pdsize = mem + size - pdpage;
+	  if (__glibc_unlikely (mprotect (pdpage, pdsize, prot) != 0))
+	    {
+	      (void) munmap (mem, size);
+	      return errno;
+	    }
+
 	  /* Remember the stack-related values.  */
 	  pd->stackblock = mem;
 	  pd->stackblock_size = size;
+	  pd->guardsize = size;
 
 	  /* We allocated the first block thread-specific data array.
 	     This address will not change for the lifetime of this
 	     descriptor.  */
 	  pd->specific[0] = pd->specific_1stblock;
@@ -621,12 +633,11 @@  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 
 	  /* Note that all of the stack and the thread descriptor is
 	     zeroed.  This means we do not have to initialize fields
 	     with initial value zero.  This is specifically true for
 	     the 'tid' field which is always set back to zero once the
-	     stack is not used anymore and for the 'guardsize' field
-	     which will be read next.  */
+	     stack is not used anymore.  */
 	}
 
       /* Create or resize the guard area if necessary.  */
       if (__glibc_unlikely (guardsize > pd->guardsize))
 	{