Message ID | 20150918102734.GA27881@eper |
---|---|
State | Superseded |
Headers |
Date: Fri, 18 Sep 2015 11:27:34 +0100 From: Balazs Kezes <rlblaster@gmail.com> To: libc-alpha@sourceware.org Subject: pthread wastes memory with mlockall(MCL_FUTURE) Message-ID: <20150918102734.GA27881@eper> |
Commit Message
Balazs Kezes
Sept. 18, 2015, 10:27 a.m. UTC
Hi!

I've run into the following problem: whenever a new thread is created, pthread creates some guard pages next to its stack. These guard pages are usually empty zero pages with all their permissions removed -- nothing can read, write or execute on these pages.

The problem is that the application I use has a large number of threads and uses mlockall(MCL_FUTURE), so this messes up the memory usage accounting (RSS) for the application, which then leads to wasted memory.

Would it make sense for glibc to munlock these pages? I'm thinking of something like the patch below (although I haven't tested it yet).

Thanks!
Comments
On Fri, Sep 18, 2015 at 11:27:34AM +0100, Balazs Kezes wrote:
> Hi!
>
> I've run into the following problem: Whenever a new thread is created,
> pthread creates some guard pages next to its stack. These guard pages
> are usually empty zero pages, and have all their permissions removed --
> nothing can read/write/execute on these pages.
>
> The problem is that the application I use has a large number of threads
> and uses mlockall(MCL_FUTURE) so this messes up the memory usage
> calculation (rss) for the application which then leads to memory wasted.
>
> Would it make sense for glibc to munlock these pages? I'm thinking
> something like this (although I haven't tested it yet):
>
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 753da61..1fc715c 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -659,6 +659,11 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>
>        return errno;
>      }
> +  /* The guard pages shouldn't be locked into memory. A lot of memory
> +     would be unnecessarily wasted if you have a lot of threads and
> +     mlockall(MCL_FUTURE) set otherwise. We ignore the errors because
> +     we can't do anything about them anyway. */
> +  (void) munlock (guard, guardsize);

I would say it's a kernel bug for PROT_NONE pages to actually occupy
resources when locked, if they actually do? How did you test/measure
this?

Rich
On 2015-09-18 10:38 -0400, Rich Felker wrote:
> I would say it's a kernel bug for PROT_NONE pages to actually occupy
> resources when locked, if they actually do?

It could make sense that you have some pages with data in them and at a
later stage you remove the permissions to trap data accesses. I think
some debuggers (e.g. for watchpoints) and some cache simulators work
this way.

> How did you test/measure this?

There's a pthread_attr_setguardsize function. Basically I set it to a
large value (to amplify the effect of this behavior), then created a
bunch of threads and checked what happens. I could put together a simple
repro if needed.
On Fri, Sep 18, 2015 at 05:38:42PM +0100, Balazs Kezes wrote:
> On 2015-09-18 10:38 -0400, Rich Felker wrote:
> > I would say it's a kernel bug for PROT_NONE pages to actually occupy
> > resources when locked, if they actually do?
>
> It could make sense that you have some pages with some data in them but
> in a later stage you remove the permissions to trap data accesses. I
> think some debuggers (e.g. watchpoints) or some cache simulators work
> this way.

I'm talking about new PROT_NONE pages. The kernel certainly accounts for
them differently as commit charge. New PROT_NONE pages consume no commit
charge. Anonymous pages with data in them, which would become available
again if you mprotect them readable, do consume commit charge. (For this
reason, you have to mmap MAP_FIXED+PROT_NONE to uncommit memory rather
than just using mprotect PROT_NONE, even if you already used madvise
MADV_DONTNEED on it.)

> > How did you test/measure this?
>
> There's a pthread_attr_setguardsize function. Basically I've set it to
> large (to expand the effect of this behavior) and then created a bunch
> of threads and checked what happens. I could try to create a simple
> repro if needed.

But were you able to measure them consuming physical resources? Or might
it possibly just be bad accounting? Utilities like 'top' are notorious
for misrepresenting the memory usage of a process.

Rich
On 2015-09-18 13:08 -0400, Rich Felker wrote:
> I'm talking about new PROT_NONE pages.

That's not how pthread does the allocation: it mmaps read/write first,
and then does an mprotect(..., ..., PROT_NONE).

> The kernel certainly accounts for them differently as commit charge.
> New PROT_NONE pages consume no commit charge. Anonymous pages with
> data in them, which would become available again if you mprotect them
> readable, do consume commit charge. (For this reason, you have to mmap
> MAP_FIXED+PROT_NONE to uncommit memory rather than just using mprotect
> PROT_NONE, even if you already used madvise MADV_DONTNEED on it.)

So while working on the repro I've looked deeper and created a simple
app which demonstrates the mmap behavior:

// gcc -Wall -Wextra -std=c99 mapping.c -o mapping
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int r;
	r = mlockall(MCL_CURRENT | (getenv("M") ? MCL_FUTURE : 0));
	assert(r == 0);

	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *mem = mmap(NULL, 8LL << 30, PROT_WRITE, flags, -1, 0);
	assert(mem != MAP_FAILED);
	sleep(100);

	return 0;
}

All it does is mmap some memory, and if I have the envvar M set then it
also does the mlocking part. When I run this application without
mlocking, it barely uses any RSS. However, when I set M, I can see in
htop that RSS is 8 GB, and "cat /proc/meminfo | grep MemAvailable"
shows 8 GB less memory.

Actually, when I look at the number of minor page faults I get this:

$ /usr/bin/time -f %R ./mapping
102
$ M=1 /usr/bin/time -f %R ./mapping
4709

So I think the kernel preallocates all the memory in this case. However,
if I set the protection to PROT_NONE then the kernel doesn't do the
preallocation. Interestingly, it does *not* preallocate even if I mmap
with PROT_NONE first and then do an mprotect(mem, 8LL<<30, PROT_WRITE).
I do see the page faults if I do a memset(mem, 0, 8LL<<30) afterwards
though.
So here's what I think pthreads should do: first mmap with PROT_NONE,
and only then mprotect the stack pages read/write.

Does that sound reasonable?

Thanks!
On Fri, Sep 18, 2015 at 08:29:52PM +0100, Balazs Kezes wrote:
> On 2015-09-18 13:08 -0400, Rich Felker wrote:
> > I'm talking about new PROT_NONE pages.
>
> That's not how pthread does the allocation: it mmaps read/write first,
> and then does an mprotect(..., ..., PROT_NONE).

Ah, that explains it then. I did the opposite in musl for exactly this
reason: first mmap PROT_NONE, then mprotect the non-guard part
PROT_READ|PROT_WRITE.

> > The kernel certainly accounts for them differently as commit charge.
> > New PROT_NONE pages consume no commit charge. Anonymous pages with
> > data in them, which would become available again if you mprotect them
> > readable, do consume commit charge. (For this reason, you have to mmap
> > MAP_FIXED+PROT_NONE to uncommit memory rather than just using mprotect
> > PROT_NONE, even if you already used madvise MADV_DONTNEED on it.)
>
> So while working on the repro I've looked deeper and created a simple
> app which demonstrates the mmap behavior:
>
> // gcc -Wall -Wextra -std=c99 mapping.c -o mapping
> #define _GNU_SOURCE
> #include <assert.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	int r;
> 	r = mlockall(MCL_CURRENT | (getenv("M") ? MCL_FUTURE : 0));
> 	assert(r == 0);
>
> 	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> 	void *mem = mmap(NULL, 8LL << 30, PROT_WRITE, flags, -1, 0);
> 	assert(mem != MAP_FAILED);
> 	sleep(100);
>
> 	return 0;
> }
>
> All it does is mmap some memory, and if I have the envvar M set then it
> also does the mlocking part. When I run this application without
> mlocking then it barely uses any RSS memory. However when I set M then I
> can see in htop that RSS is 8GB and that
> "cat /proc/meminfo | grep MemAvailable" shows 8 GB less memory.
>
> Actually when I look at the number of minor pagefaults I get this:
>
> $ /usr/bin/time -f %R ./mapping
> 102
> $ M=1 /usr/bin/time -f %R ./mapping
> 4709
>
> So I think the kernel preallocates all the memory in this case.
>
> However if I set the protection to PROT_NONE then the kernel doesn't do
> the preallocation.
>
> Interestingly it does *not* preallocate even if I mmap with PROT_NONE
> first and then do a mprotect(mem, 8LL<<30, PROT_WRITE). I do see the
> pagefaults if I do a memset(mem, 0, 8LL<<30) afterwards though.
>
> So here's what I think pthreads should do: First mmap with PROT_NONE and
> only then should mprotect read/write the stack pages.
>
> Does that sound reasonable?

Yes.

Rich
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 753da61..1fc715c 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -659,6 +659,11 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,

 	      return errno;
 	    }
+	  /* The guard pages shouldn't be locked into memory.  A lot of
+	     memory would otherwise be unnecessarily wasted if you have a
+	     lot of threads and mlockall (MCL_FUTURE) set.  We ignore the
+	     errors because we can't do anything about them anyway.  */
+	  (void) munlock (guard, guardsize);

 	  pd->guardsize = guardsize;
 	}