libgomp, openmp: pinned memory

Message ID f5260c95-6c71-99a7-3bf2-774380444082@codesourcery.com
State New
Series: libgomp, openmp: pinned memory

Commit Message

Andrew Stubbs Jan. 4, 2022, 3:32 p.m. UTC
This patch implements the OpenMP pinned memory trait for Linux hosts. On 
other hosts and on devices the trait becomes a no-op (instead of being 
rejected).

The memory is locked via the mlock syscall, which is both the "correct" 
way to do it on Linux, and a problem because the default ulimit for 
pinned memory is very small (and most users don't have permission to 
increase it (much?)). Therefore the code emits a non-fatal warning 
message if locking fails.
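For reference, the limit in question is RLIMIT_MEMLOCK (what "ulimit -l"
reports); a standalone sketch, not part of the patch, that queries it:

#include <stdio.h>
#include <sys/resource.h>

int
main (void)
{
  struct rlimit limit;
  if (getrlimit (RLIMIT_MEMLOCK, &limit))
    return 1;
  /* rlim_cur is the soft limit that mlock checks; 64KB is a common
     distribution default.  */
  printf ("RLIMIT_MEMLOCK soft=%llu hard=%llu\n",
	  (unsigned long long) limit.rlim_cur,
	  (unsigned long long) limit.rlim_max);
  return 0;
}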

Another approach might be to use cudaHostAlloc to allocate the memory in 
the first place, which bypasses the ulimit somehow, but this would not 
help non-NVidia users.

The tests work on Linux and will xfail on other hosts; neither libgomp 
nor the test knows how to allocate or query pinned memory elsewhere.

The patch applies on top of the text of my previously submitted patches, 
but does not actually depend on the functionality of those patches.

OK for stage 1?

I'll commit a backport to OG11 shortly.

Andrew
libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_PIN): New macro.
	(xmlock): New function.
	(omp_init_allocator): Don't disallow the pinned trait.
	(omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.

Comments

Jakub Jelinek Jan. 4, 2022, 3:55 p.m. UTC | #1
On Tue, Jan 04, 2022 at 03:32:17PM +0000, Andrew Stubbs wrote:
> This patch implements the OpenMP pinned memory trait for Linux hosts. On
> other hosts and on devices the trait becomes a no-op (instead of being
> rejected).
> 
> The memory is locked via the mlock syscall, which is both the "correct" way
> to do it on Linux, and a problem because the default ulimit for pinned
> memory is very small (and most users don't have permission to increase it
> (much?)). Therefore the code emits a non-fatal warning message if locking
> fails.
> 
> Another approach might be to use cudaHostAlloc to allocate the memory in the
> first place, which bypasses the ulimit somehow, but this would not help
> non-NVidia users.
> 
> The tests work on Linux and will xfail on other hosts; neither libgomp nor
> the test knows how to allocate or query pinned memory elsewhere.
> 
> The patch applies on top of the text of my previously submitted patches, but
> does not actually depend on the functionality of those patches.
> 
> OK for stage 1?
> 
> I'll commit a backport to OG11 shortly.
> 
> Andrew

> libgomp: pinned memory
> 
> Implement the OpenMP pinned memory trait on Linux hosts using the mlock
> syscall.
> 
> libgomp/ChangeLog:
> 
> 	* allocator.c (MEMSPACE_PIN): New macro.
> 	(xmlock): New function.
> 	(omp_init_allocator): Don't disallow the pinned trait.
> 	(omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
> 	(omp_aligned_calloc): Likewise.
> 	(omp_realloc): Likewise.
> 	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
> 	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
> 
> diff --git a/libgomp/allocator.c b/libgomp/allocator.c
> index b1f5fe0a5e2..671b91e7ff8 100644
> --- a/libgomp/allocator.c
> +++ b/libgomp/allocator.c
> @@ -51,6 +51,25 @@
>  #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
>    ((void)MEMSPACE, (void)SIZE, free (ADDR))
>  #endif
> +#ifndef MEMSPACE_PIN
> +/* Only define this on supported host platforms.  */
> +#ifdef __linux__
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, xmlock (ADDR, SIZE))
> +
> +#include <sys/mman.h>
> +#include <stdio.h>
> +void
> +xmlock (void *addr, size_t size)
> +{
> +  if (mlock (addr, size))
> +      perror ("libgomp: failed to pin memory (ulimit too low?)");
> +}
> +#else
> +#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
> +  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
> +#endif
> +#endif

The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead to add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.
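
Schematically, the convention looks like this (just a sketch; the real file
would define every macro it needs to override):

/* libgomp/config/linux/allocator.c (sketch) */
#include <sys/mman.h>

/* Define the overrides, then pull in the generic implementation.  */
#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
  ((void) (MEMSPACE), mlock (ADDR, SIZE))

#include "../../allocator.c"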

I think perror is the wrong thing to do; omp_alloc etc. has a well-defined
interface for what to do in such cases - the allocation should just fail (not
be allocated) and, depending on the user's choice, that can be fatal, or
return NULL, or chain to some other allocator with other properties etc.

Other issues in the patch are that it doesn't munlock on deallocation, and
that, because of that deallocation, we need to figure out what to do about
page boundaries.  As documented, mlock can be passed an address and/or
address + size that aren't at page boundaries, and pinning happens even for
partially touched pages.  But munlock also unpins even the partially
overlapping pages, and we don't know at that point whether some other pinned
allocations appear in those pages.
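
A standalone sketch of the hazard (assuming both allocations land in the
same page):

#include <sys/mman.h>
#include <unistd.h>

int
main (void)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  char *page = mmap (NULL, pagesize, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED)
    return 1;

  char *a = page;        /* first "allocation" */
  char *b = page + 128;  /* second "allocation", same page */

  mlock (a, 64);    /* pins the whole page */
  mlock (b, 64);    /* effectively a no-op; locks don't stack */
  munlock (a, 64);  /* unpins the whole page, including b's bytes */
  return 0;
}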
Some bad options are: only pin pages wholly contained within the allocation
and don't pin partial pages around it; force at least page alignment and size
so that everything can be pinned; somehow ensure that we never allocate more
than one pinned allocation in such partial pages (but can allocate non-pinned
allocations there); or e.g. use some internal data structure to track how
many pinned allocations are on the partial pages (say a hash map from page
start address to a counter of how many pinned allocations are there; if it
goes to 0, munlock even that page, otherwise munlock just the wholly
contained pages); or perhaps use page-size-aligned allocation and size and
just remember in some data structure that the partial pages could be used
for other pinned (small) allocations.
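
For concreteness, the page-counter option could look roughly like the sketch
below; the allocation path would call page_ref for each page it pins, and the
free path munlocks a page only when its last pinned user goes away.  A toy
linear table stands in for a real hash map, and all names are hypothetical,
not existing libgomp API:

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Toy table mapping page start -> number of pinned allocations touching
   that page; a real implementation would use a hash map plus a lock.  */
#define NPAGES 1024
static struct { uintptr_t page; int count; } page_refs[NPAGES];

static void
page_ref (uintptr_t page)
{
  int free_slot = -1;
  for (int i = 0; i < NPAGES; i++)
    {
      if (page_refs[i].count > 0 && page_refs[i].page == page)
	{
	  page_refs[i].count++;
	  return;
	}
      if (free_slot < 0 && page_refs[i].count == 0)
	free_slot = i;
    }
  page_refs[free_slot].page = page;  /* assume the toy table never fills */
  page_refs[free_slot].count = 1;
}

static void
page_unref (uintptr_t page, size_t pagesize)
{
  for (int i = 0; i < NPAGES; i++)
    if (page_refs[i].count > 0 && page_refs[i].page == page)
      {
	/* Unpin the page only when its last pinned allocation is freed.  */
	if (--page_refs[i].count == 0)
	  munlock ((void *) page, pagesize);
	return;
      }
}

void
pinned_range_freed (void *addr, size_t size)
{
  uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t first = (uintptr_t) addr & ~(pagesize - 1);
  uintptr_t last = ((uintptr_t) addr + size - 1) & ~(pagesize - 1);
  for (uintptr_t p = first; p <= last; p += pagesize)
    page_unref (p, pagesize);
}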

	Jakub
Andrew Stubbs Jan. 4, 2022, 4:58 p.m. UTC | #2
On 04/01/2022 15:55, Jakub Jelinek wrote:
> The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
> instead add libgomp/config/linux/allocator.c that includes some headers,
> defines some macros and then includes the generic allocator.c.

OK, good point, I can do that.

> I think perror is the wrong thing to do; omp_alloc etc. has a well-defined
> interface for what to do in such cases - the allocation should just fail (not
> be allocated) and, depending on the user's choice, that can be fatal, or
> return NULL, or chain to some other allocator with other properties etc.

I did it this way because pinning feels more like an optimization, and 
falling back to "just works" seemed like what users would want to 
happen. The perror was added because it turns out the default ulimit is 
tiny and I wanted to hint at the solution.

I guess you're right that the consistent behaviour would be to silently 
switch to the fallback allocator, but it still feels like users will be 
left in the dark about why it failed.

> Other issues in the patch are that it doesn't munlock on deallocation, and
> that, because of that deallocation, we need to figure out what to do about
> page boundaries.  As documented, mlock can be passed an address and/or
> address + size that aren't at page boundaries, and pinning happens even for
> partially touched pages.  But munlock also unpins even the partially
> overlapping pages, and we don't know at that point whether some other pinned
> allocations appear in those pages.

Right, it doesn't munlock because of these issues. I don't know of any 
way to solve this that wouldn't involve building tables of locked ranges 
(and knowing what the page size is).

I considered using mmap with the lock flag instead, but the failure mode 
looked unhelpful. I guess we could mmap with the regular flags, then 
mlock after. That should bypass the regular heap and ensure each 
allocation has its own page. I'm not sure what the unintended
side-effects of that might be.
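
A sketch of that approach (roughly what a config/linux implementation could
do; the name is a placeholder):

#include <stddef.h>
#include <sys/mman.h>

static void *
pinned_alloc (size_t size)
{
  /* Bypass the heap: a fresh anonymous mapping never shares a page
     with another allocation.  */
  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;
  /* Unlike MAP_LOCKED, a failed mlock is reported cleanly here.  */
  if (mlock (addr, size))
    {
      munmap (addr, size);
      return NULL;
    }
  return addr;
}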

> Some bad options are: only pin pages wholly contained within the allocation
> and don't pin partial pages around it; force at least page alignment and size
> so that everything can be pinned; somehow ensure that we never allocate more
> than one pinned allocation in such partial pages (but can allocate non-pinned
> allocations there); or e.g. use some internal data structure to track how
> many pinned allocations are on the partial pages (say a hash map from page
> start address to a counter of how many pinned allocations are there; if it
> goes to 0, munlock even that page, otherwise munlock just the wholly
> contained pages); or perhaps use page-size-aligned allocation and size and
> just remember in some data structure that the partial pages could be used
> for other pinned (small) allocations.

Bad options indeed. If any part of the memory block is not pinned I 
expect no performance gains whatsoever. And all this other business adds 
complexity and runtime overhead.

For version 1.0 it feels reasonable to omit the unlock step and hope 
that a) pinned data will be long-lived, or b) short-lived pinned data 
will be replaced with more data that -- most likely -- occupies the same 
pages.

Similarly, it seems likely that serious HPC applications will run on 
devices with lots of RAM, and if not, any page swapping will destroy the 
performance gains of using OpenMP.

For now I'll just fix the architectural issues.

Andrew
Jakub Jelinek Jan. 4, 2022, 6:28 p.m. UTC | #3
On Tue, Jan 04, 2022 at 04:58:19PM +0000, Andrew Stubbs wrote:
> > I think perror is the wrong thing to do; omp_alloc etc. has a well-defined
> > interface for what to do in such cases - the allocation should just fail (not
> > be allocated) and, depending on the user's choice, that can be fatal, or
> > return NULL, or chain to some other allocator with other properties etc.
> 
> I did it this way because pinning feels more like an optimization, and
> falling back to "just works" seemed like what users would want to happen.
> The perror was added because it turns out the default ulimit is tiny and I
> wanted to hint at the solution.

Something like perror might be acceptable for GOMP_DEBUG mode, but not
normal operation.  So perhaps use gomp_debug there instead?

If it is just an optimization for the user, they should be using the fallback
chaining to the corresponding allocator without the pinning, to make it clear
what they want and also be standard conforming.

> > Other issues in the patch are that it doesn't munlock on deallocation, and
> > that, because of that deallocation, we need to figure out what to do about
> > page boundaries.  As documented, mlock can be passed an address and/or
> > address + size that aren't at page boundaries, and pinning happens even for
> > partially touched pages.  But munlock also unpins even the partially
> > overlapping pages, and we don't know at that point whether some other pinned
> > allocations appear in those pages.
> 
> Right, it doesn't munlock because of these issues. I don't know of any way
> to solve this that wouldn't involve building tables of locked ranges (and
> knowing what the page size is).
> 
> I considered using mmap with the lock flag instead, but the failure mode
> looked unhelpful. I guess we could mmap with the regular flags, then mlock
> after. That should bypass the regular heap and ensure each allocation has
> its own page. I'm not sure what the unintended side-effects of that might
> be.

But the munlock is even more important because of the low ulimit -l: if
munlock isn't done on deallocation, the default limit (64KB, I think) will
be reached even much earlier.  If most users have just a 64KB limit on
pinned memory per process, then that most likely calls for grabbing such
memory in whole pages and doing memory management on that resource, because
wasting that precious memory on partial pages, which will most likely get
non-pinned allocations, is a big waste when we have just 16 such pages.

	Jakub
Jakub Jelinek Jan. 4, 2022, 6:47 p.m. UTC | #4
On Tue, Jan 04, 2022 at 07:28:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > Other issues in the patch are that it doesn't munlock on deallocation, and
> > > that, because of that deallocation, we need to figure out what to do about
> > > page boundaries.  As documented, mlock can be passed an address and/or
> > > address + size that aren't at page boundaries, and pinning happens even for
> > > partially touched pages.  But munlock also unpins even the partially
> > > overlapping pages, and we don't know at that point whether some other pinned
> > > allocations appear in those pages.
> > 
> > Right, it doesn't munlock because of these issues. I don't know of any way
> > to solve this that wouldn't involve building tables of locked ranges (and
> > knowing what the page size is).
> > 
> > I considered using mmap with the lock flag instead, but the failure mode
> > looked unhelpful. I guess we could mmap with the regular flags, then mlock
> > after. That should bypass the regular heap and ensure each allocation has
> > its own page. I'm not sure what the unintended side-effects of that might
> > be.
> 
> But the munlock is even more important because of the low ulimit -l: if
> munlock isn't done on deallocation, the default limit (64KB, I think) will
> be reached even much earlier.  If most users have just a 64KB limit on
> pinned memory per process, then that most likely calls for grabbing such
> memory in whole pages and doing memory management on that resource, because
> wasting that precious memory on partial pages, which will most likely get
> non-pinned allocations, is a big waste when we have just 16 such pages.

E.g. if we start using (dynamically, via dlopen/dlsym etc.) the memkind
library for some of the allocators, for the pinned memory we could use
e.g. the memkind_create_fixed API - on the first pinned allocation, check
what the ulimit -l is and, if it is fairly small, mmap PROT_NONE the whole
pinned size (but don't pin it whole at the start, just whatever we need as
we go).
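
The reservation part of that idea might look like the following sketch (bump
allocation only; the memkind wiring, locking, and RLIM_INFINITY handling are
omitted, and all names are hypothetical):

#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static char *pin_pool;
static size_t pin_pool_size, pin_pool_used;

/* Reserve (but don't pin) an address range bounded by RLIMIT_MEMLOCK,
   then commit and pin pages only as they are handed out.  */
static void *
pin_pool_alloc (size_t size)
{
  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);

  if (pin_pool == NULL)
    {
      struct rlimit limit;
      if (getrlimit (RLIMIT_MEMLOCK, &limit))
	return NULL;
      pin_pool_size = (size_t) limit.rlim_cur & ~(pagesize - 1);
      /* PROT_NONE: address space only; nothing committed or pinned.  */
      void *p = mmap (NULL, pin_pool_size, PROT_NONE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
	return NULL;
      pin_pool = p;
    }

  size = (size + pagesize - 1) & ~(pagesize - 1);
  if (pin_pool_used + size > pin_pool_size)
    return NULL;

  char *addr = pin_pool + pin_pool_used;
  /* Commit and pin just the pages actually needed, as we go.  */
  if (mprotect (addr, size, PROT_READ | PROT_WRITE) || mlock (addr, size))
    return NULL;
  pin_pool_used += size;
  return addr;
}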

	Jakub
Andrew Stubbs Jan. 5, 2022, 5:07 p.m. UTC | #5
On 04/01/2022 18:47, Jakub Jelinek wrote:
> On Tue, Jan 04, 2022 at 07:28:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
>>>> Other issues in the patch are that it doesn't munlock on deallocation, and
>>>> that, because of that deallocation, we need to figure out what to do about
>>>> page boundaries.  As documented, mlock can be passed an address and/or
>>>> address + size that aren't at page boundaries, and pinning happens even for
>>>> partially touched pages.  But munlock also unpins even the partially
>>>> overlapping pages, and we don't know at that point whether some other pinned
>>>> allocations appear in those pages.
>>>
>>> Right, it doesn't munlock because of these issues. I don't know of any way
>>> to solve this that wouldn't involve building tables of locked ranges (and
>>> knowing what the page size is).
>>>
>>> I considered using mmap with the lock flag instead, but the failure mode
>>> looked unhelpful. I guess we could mmap with the regular flags, then mlock
>>> after. That should bypass the regular heap and ensure each allocation has
>>> its own page. I'm not sure what the unintended side-effects of that might
>>> be.
>>
>> But the munlock is even more important because of the low ulimit -l: if
>> munlock isn't done on deallocation, the default limit (64KB, I think) will
>> be reached even much earlier.  If most users have just a 64KB limit on
>> pinned memory per process, then that most likely calls for grabbing such
>> memory in whole pages and doing memory management on that resource, because
>> wasting that precious memory on partial pages, which will most likely get
>> non-pinned allocations, is a big waste when we have just 16 such pages.
> 
> E.g. if we start using (dynamically, via dlopen/dlsym etc.) the memkind
> library for some of the allocators, for the pinned memory we could use
> e.g. the memkind_create_fixed API - on the first pinned allocation, check
> what the ulimit -l is and, if it is fairly small, mmap PROT_NONE the whole
> pinned size (but don't pin it whole at the start, just whatever we need as
> we go).

I don't believe 64KB will be anything like enough for any real HPC 
application. Is it really worth optimizing for this case?

Anyway, I'm working on an implementation using mmap instead of malloc 
for pinned allocations. I figure that will simplify the unpin algorithm 
(because it'll be munmap) and optimize for large allocations such as I 
imagine HPC applications will use. It won't fix the ulimit issue.

Andrew
Andrew Stubbs Jan. 13, 2022, 1:53 p.m. UTC | #6
On 05/01/2022 17:07, Andrew Stubbs wrote:
> I don't believe 64KB will be anything like enough for any real HPC 
> application. Is it really worth optimizing for this case?
> 
> Anyway, I'm working on an implementation using mmap instead of malloc 
> for pinned allocations. I figure that will simplify the unpin algorithm 
> (because it'll be munmap) and optimize for large allocations such as I 
> imagine HPC applications will use. It won't fix the ulimit issue.

Here's my new patch.

This version is intended to apply on top of the latest version of my 
low-latency allocator patch, although the dependency is mostly textual.

Pinned memory is allocated via mmap + mlock, and allocation fails 
(returns NULL) if the lock fails and there's no fallback configured.

This means that large allocations will now be page aligned and therefore 
pin the smallest number of pages for the size requested, and that that 
memory will be unpinned automatically when freed via munmap, or moved 
via mremap.

Obviously this is not ideal for allocations much smaller than one page. 
If that turns out to be a problem in the real world then we can add a 
special case fairly straightforwardly, and incur the extra page 
tracking expense in those cases only, or maybe implement our own 
pinned-memory heap (something like already proposed for low-latency 
memory, perhaps).

Also new is a realloc implementation that works better when reallocation 
fails. This is confirmed by the new testcases.

OK for stage 1?

Thanks

Andrew
libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	(xmlock): New function.
	(omp_init_allocator): Don't disallow the pinned trait.
	(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	(omp_free): Likewise.
	* config/linux/allocator.c: New file.
	* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
	* testsuite/libgomp.c/alloc-pinned-3.c: New test.
	* testsuite/libgomp.c/alloc-pinned-4.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 1cc7486fc4c..5ab161b6314 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -36,16 +36,20 @@
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (SIZE))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, SIZE))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  (PIN ? NULL : free (ADDR))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -208,7 +212,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
     data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
     return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -309,7 +313,8 @@ retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
-      ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+      ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			    allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -329,7 +334,8 @@ retry:
 	= (allocator_data
 	   ? allocator_data->memspace
 	   : predefined_alloc_mapping[allocator]);
-      ptr = MEMSPACE_ALLOC (memspace, new_size);
+      ptr = MEMSPACE_ALLOC (memspace, new_size,
+			    allocator_data && allocator_data->pinned);
       if (ptr == NULL)
 	goto fail;
     }
@@ -356,9 +362,9 @@ fail:
     {
     case omp_atv_default_mem_fb:
       if ((new_alignment > sizeof (void *) && new_alignment > alignment)
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -410,6 +416,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
   struct omp_mem_header *data;
   omp_memspace_handle_t memspace __attribute__((unused))
     = omp_default_mem_space;
+  int pinned __attribute__((unused)) = false;
 
   if (ptr == NULL)
     return;
@@ -432,11 +439,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	}
 
       memspace = allocator_data->memspace;
+      pinned = allocator_data->pinned;
     }
   else
     memspace = predefined_alloc_mapping[data->allocator];
 
-  MEMSPACE_FREE (memspace, data->ptr, data->size);
+  MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
 }
 
 ialias (omp_free)
@@ -524,7 +532,8 @@ retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
-      ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size);
+      ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size,
+			     allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -544,7 +553,8 @@ retry:
 	= (allocator_data
 	   ? allocator_data->memspace
 	   : predefined_alloc_mapping[allocator]);
-      ptr = MEMSPACE_CALLOC (memspace, new_size);
+      ptr = MEMSPACE_CALLOC (memspace, new_size,
+			     allocator_data && allocator_data->pinned);
       if (ptr == NULL)
 	goto fail;
     }
@@ -571,9 +581,9 @@ fail:
     {
     case omp_atv_default_mem_fb:
       if ((new_alignment > sizeof (void *) && new_alignment > alignment)
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -710,9 +720,13 @@ retry:
 #endif
       if (prev_size)
 	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-				    data->size, new_size);
+				    data->size, new_size,
+				    (free_allocator_data
+				     && free_allocator_data->pinned),
+				    allocator_data->pinned);
       else
-	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+				  allocator_data->pinned);
       if (new_ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -744,9 +758,13 @@ retry:
 	= (allocator_data
 	   ? allocator_data->memspace
 	   : predefined_alloc_mapping[allocator]);
-      new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
+      new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size,
+				  (free_allocator_data
+				   && free_allocator_data->pinned),
+				  allocator_data && allocator_data->pinned);
       if (new_ptr == NULL)
 	goto fail;
+
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
       ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
       ((struct omp_mem_header *) ret)[-1].size = new_size;
@@ -759,7 +777,8 @@ retry:
 	= (allocator_data
 	   ? allocator_data->memspace
 	   : predefined_alloc_mapping[allocator]);
-      new_ptr = MEMSPACE_ALLOC (memspace, new_size);
+      new_ptr = MEMSPACE_ALLOC (memspace, new_size,
+				allocator_data && allocator_data->pinned);
       if (new_ptr == NULL)
 	goto fail;
     }
@@ -802,9 +821,9 @@ fail:
     {
     case omp_atv_default_mem_fb:
       if (new_alignment > sizeof (void *)
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
new file mode 100644
index 00000000000..5f3ae491f07
--- /dev/null
+++ b/libgomp/config/linux/allocator.c
@@ -0,0 +1,124 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implement malloc routines that can handle pinned memory on Linux.
+   
+   It's possible to use mlock on any heap memory, but using munlock is
+   problematic if there are multiple pinned allocations on the same page.
+   Tracking all that manually would be possible, but adds overhead. This may
+   be worth it if there are a lot of small allocations getting pinned, but
+   this seems less likely in a HPC application.
+
+   Instead we optimize for large pinned allocations, and use mmap to ensure
+   that two pinned allocations don't share the same page.  This also means
+   that large allocations don't pin extra pages by being poorly aligned.  */
+
+#define _GNU_SOURCE
+#include <sys/mman.h>
+#include <string.h>
+#include "libgomp.h"
+
+static void *
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    {
+      void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+      if (addr == MAP_FAILED)
+	return NULL;
+
+      if (mlock (addr, size))
+	{
+	  gomp_debug (0, "libgomp: failed to pin memory (ulimit too low?)\n");
+	  munmap (addr, size);
+	  return NULL;
+	}
+
+      return addr;
+    }
+  else
+    return malloc (size);
+}
+
+static void *
+linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  if (pin)
+    return linux_memspace_alloc (memspace, size, pin);
+  else
+    return calloc (1, size);
+}
+
+static void
+linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
+		     int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    munmap (addr, size);
+  else
+    free (addr);
+}
+
+static void *
+linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+			size_t oldsize, size_t size, int oldpin, int pin)
+{
+  if (oldpin && pin)
+    {
+      void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
+      if (newaddr == MAP_FAILED)
+	return NULL;
+
+      return newaddr;
+    }
+  else if (oldpin || pin)
+    {
+      void *newaddr = linux_memspace_alloc (memspace, size, pin);
+      if (newaddr)
+	{
+	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+	  linux_memspace_free (memspace, addr, oldsize, oldpin);
+	}
+
+      return newaddr;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_alloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_calloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  linux_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  linux_memspace_free (MEMSPACE, ADDR, SIZE, PIN)
+
+#include "../../allocator.c"
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6bc2ea48043..f740b97f6ac 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,13 +358,13 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_calloc (MEMSPACE, SIZE)
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 00000000000..0a6360cda29
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
new file mode 100644
index 00000000000..8fdb4ff5cfd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works (pool_size code path).  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space,
+							 2, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+  if (!p)
+    abort ();
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+  if (!p)
+    abort ();
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
new file mode 100644
index 00000000000..943dfea5c9b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -0,0 +1,125 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit ()
+{
+}
+#endif
+
+#include <omp.h>
+
+/* This should be large enough to cover multiple pages.  */
+#define SIZE 10000*1024
+
+int
+main ()
+{
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 2, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 2, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to no realloc needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
new file mode 100644
index 00000000000..d9cb8dfe1fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -0,0 +1,127 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly, pool_size code path.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit ()
+{
+}
+#endif
+
+#include <omp.h>
+
+/* This should be large enough to cover multiple pages.  */
+#define SIZE 10000*1024
+
+int
+main ()
+{
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 3, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 3, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to no realloc needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
Patch

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b1f5fe0a5e2..671b91e7ff8 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -51,6 +51,25 @@ 
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   ((void)MEMSPACE, (void)SIZE, free (ADDR))
 #endif
+#ifndef MEMSPACE_PIN
+/* Only define this on supported host platforms.  */
+#ifdef __linux__
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, xmlock (ADDR, SIZE))
+
+#include <sys/mman.h>
+#include <stdio.h>
+void
+xmlock (void *addr, size_t size)
+{
+  if (mlock (addr, size))
+      perror ("libgomp: failed to pin memory (ulimit too low?)");
+}
+#else
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
+#endif
+#endif
 
 /* Map the predefined allocators to the correct memory space.
    The index to this table is the omp_allocator_handle_t enum value.  */
@@ -212,7 +231,7 @@  omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
     data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
     return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -326,6 +345,9 @@  retry:
 #endif
 	  goto fail;
 	}
+
+      if (allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
   else
     {
@@ -335,6 +357,9 @@  retry:
       ptr = MEMSPACE_ALLOC (memspace, new_size);
       if (ptr == NULL)
 	goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
 
   if (new_alignment > sizeof (void *))
@@ -539,6 +564,9 @@  retry:
 #endif
 	  goto fail;
 	}
+
+      if (allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
   else
     {
@@ -548,6 +576,9 @@  retry:
       ptr = MEMSPACE_CALLOC (memspace, new_size);
       if (ptr == NULL)
 	goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
 
   if (new_alignment > sizeof (void *))
@@ -727,7 +758,11 @@  retry:
 #endif
 	  goto fail;
 	}
-      else if (prev_size)
+
+      if (allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, new_ptr, new_size);
+
+      if (prev_size)
 	{
 	  ret = (char *) new_ptr + sizeof (struct omp_mem_header);
 	  ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
@@ -747,6 +782,10 @@  retry:
       new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
       if (new_ptr == NULL)
 	goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+	MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
+
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
       ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
       ((struct omp_mem_header *) ret)[-1].size = new_size;
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 00000000000..0a6360cda29
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,81 @@ 
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
new file mode 100644
index 00000000000..8fdb4ff5cfd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -0,0 +1,87 @@ 
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works (pool_size code path).  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space,
+							 2, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+  if (!p)
+    abort ();
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+  if (!p)
+    abort ();
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}