[committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

Message ID YqGs96PqQhyxj0yV@tucnak
State Committed
Series [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

Commit Message

Jakub Jelinek June 9, 2022, 8:19 a.m. UTC
  Hi!

This patch adds support for dlopening libmemkind.so on Linux and uses it
for some kinds of allocations (but not yet e.g. pinned memory).
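
For reference, usage from the OpenMP side looks roughly like this (just a
sketch; whether the allocation really ends up in high-bandwidth memory
depends on libmemkind.so being available at run time and on the hardware):

#include <omp.h>

int
main ()
{
  /* Request the high bandwidth memory space; with this patch libgomp tries
     MEMKIND_HBW_PREFERRED (or MEMKIND_HBW_INTERLEAVE with an interleaved
     partition trait) through libmemkind.so and otherwise returns
     omp_null_allocator.  */
  omp_alloctrait_t traits[1] = { { omp_atk_fallback, omp_atv_null_fb } };
  omp_allocator_handle_t a
    = omp_init_allocator (omp_high_bw_mem_space, 1, traits);
  if (a == omp_null_allocator)
    return 0;  /* No high bandwidth memory support available.  */
  void *p = omp_alloc (1024, a);
  if (p)
    omp_free (p, a);
  omp_destroy_allocator (a);
  return 0;
}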

Bootstrapped/regtested on x86_64-linux and i686-linux (with libmemkind
around) and compile tested with LIBGOMP_USE_MEMKIND undefined, committed
to trunk.

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined.
	(enum gomp_memkind_kind): New type.
	(struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND
	is defined.
	(struct gomp_memkind_data): New type.
	(memkind_data, memkind_data_once): New variables.
	(gomp_init_memkind, gomp_get_memkind): New functions.
	(omp_init_allocator): Initialize data.memkind, don't fail for
	omp_high_bw_mem_space if libmemkind supports it.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	memkind support if LIBGOMP_USE_MEMKIND is defined.
	* config/linux/allocator.c: New file.


	Jakub
  

Comments

Thomas Schwinge June 9, 2022, 10:11 a.m. UTC | #1
Hi Jakub!

On 2022-06-09T10:19:03+0200, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> This patch adds support for dlopening libmemkind.so

Instead of 'dlopen'ing literally 'libmemkind.so':

> --- libgomp/allocator.c.jj    2022-06-08 08:21:03.099446883 +0200
> +++ libgomp/allocator.c       2022-06-08 13:41:45.647133610 +0200

> +  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);

..., shouldn't this instead 'dlopen' 'libmemkind.so.0'?  At least for
Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
package:

    $ apt-file list libmemkind0 | grep -F libmemkind.so
    libmemkind0: /usr/lib/x86_64-linux-gnu/libmemkind.so.0
    libmemkind0: /usr/lib/x86_64-linux-gnu/libmemkind.so.0.0.1

..., but the former ('libmemkind.so') only in the "development" package:

    $ apt-file list libmemkind-dev | grep -F libmemkind.so
    libmemkind-dev: /usr/lib/x86_64-linux-gnu/libmemkind.so

..., which users of GCC/libgomp shouldn't have to care about.


Any plans about test cases for this?  (Not trivial, I suppose?)

Or, at least some 'gomp_debug' logging, what's happening behind the
scenes?
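
For example, something as simple as this in 'gomp_init_memkind' (just a
sketch using the existing 'gomp_debug' interface; the exact wording of the
messages doesn't matter):

  if (!handle)
    gomp_debug (0, "libmemkind.so could not be loaded;"
                   " memkind support disabled\n");
  else
    gomp_debug (0, "libmemkind.so loaded; using memkind allocators\n");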


> --- libgomp/config/linux/allocator.c.jj       2022-06-08 08:58:23.197078191 +0200
> +++ libgomp/config/linux/allocator.c  2022-06-08 09:39:15.108410730 +0200
> @@ -0,0 +1,36 @@

> +#define _GNU_SOURCE
> +#include "libgomp.h"
> +#if defined(PLUGIN_SUPPORT) && defined(LIBGOMP_USE_PTHREADS)
> +#define LIBGOMP_USE_MEMKIND
> +#endif
> +
> +#include "../../../allocator.c"

Given this use of 'PLUGIN_SUPPORT' (and thus 'dlopen' etc.) for something
different than libgomp plugins (offloading), might move 'DL_LIBS',
'PLUGIN_SUPPORT' from 'libgomp/plugin/configfrag.ac' into
'libgomp/configure.ac', and 'libgomp_la_LIBADD += $(DL_LIBS)' from
'libgomp/plugin/Makefrag.am' into 'libgomp/Makefile.am'.


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek June 9, 2022, 11:57 a.m. UTC | #2
On Thu, Jun 09, 2022 at 12:11:28PM +0200, Thomas Schwinge wrote:
> On 2022-06-09T10:19:03+0200, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > This patch adds support for dlopening libmemkind.so
> 
> Instead of 'dlopen'ing literally 'libmemkind.so':
> 
> > --- libgomp/allocator.c.jj    2022-06-08 08:21:03.099446883 +0200
> > +++ libgomp/allocator.c       2022-06-08 13:41:45.647133610 +0200
> 
> > +  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);
> 
> ..., shouldn't this instead 'dlopen' 'libmemkind.so.0'?  At least for
> Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
> package:

I agree, and I actually noticed it too right before committing, but I thought
I'd investigate and tweak incrementally, because "libmemkind.so" is what
I've actually tested (it is what llvm libomp uses).

> 
>     $ apt-file list libmemkind0 | grep -F libmemkind.so
>     libmemkind0: /usr/lib/x86_64-linux-gnu/libmemkind.so.0
>     libmemkind0: /usr/lib/x86_64-linux-gnu/libmemkind.so.0.0.1
> 
> ..., but the former ('libmemkind.so') only in the "development" package:
> 
>     $ apt-file list libmemkind-dev | grep -F libmemkind.so
>     libmemkind-dev: /usr/lib/x86_64-linux-gnu/libmemkind.so
> 
> ..., which users of GCC/libgomp shouldn't have to care about.

Similarly in Fedora memkind package provides just
/usr/lib64/libautohbw.so.0
/usr/lib64/libautohbw.so.0.0.0
/usr/lib64/libmemkind.so.0
/usr/lib64/libmemkind.so.0.0.1
/usr/lib64/libmemtier.so.0
/usr/lib64/libmemtier.so.0.0.0
and
/usr/lib64/libautohbw.so
/usr/lib64/libmemkind.so
/usr/lib64/libmemtier.so
comes from memkind-devel.

> Any plans about test cases for this?  (Not trivial, I suppose?)

That is the hard part.
All the testing I've done so far was for omp_atv_interleaved:
#include <omp.h>

int
main ()
{
  omp_alloctrait_t traits[3]
    = { { omp_atk_alignment, 64 },
        { omp_atk_fallback, omp_atv_null_fb },
        { omp_atk_partition, omp_atv_interleaved } };
  omp_allocator_handle_t a;

  a = omp_init_allocator (omp_default_mem_space, 3, traits);
  if (a == omp_null_allocator)
    return 1;
  void *p = omp_alloc (128, a);
  if (!p)
    return 2;
  void *q = omp_realloc (p, 256, a, a);
  if (!q)
    return 3;
  void *r = omp_calloc (1, 512, a);
  if (!r)
    return 4;
  omp_free (q, a);
  omp_free (r, a);
  return 0;
}
because that is something that works even on my devel WS, though from the
testcase one can't tell whether memkind was actually available and whether
the memory was indeed interleaved or not, just that it works (I could
certainly also store some data and read them back after the realloc, and
also test a variant without omp_atk_alignment, which effectively prevents
memkind_realloc from being called and uses allocation + deallocation), but
that is it.  I've actually stepped through in the debugger to verify that
memkind_* is called...
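
(For completeness, a sketch of the store/read-back variant mentioned above,
on top of the testcase; it needs <string.h> and still can't tell whether
memkind was actually used:

  void *p = omp_alloc (128, a);
  if (!p)
    return 2;
  memset (p, 0x55, 128);
  void *q = omp_realloc (p, 256, a, a);
  if (!q)
    return 3;
  for (int i = 0; i < 128; i++)
    if (((unsigned char *) q)[i] != 0x55)
      return 5;

would replace the corresponding part of main above; the return value 5 is
just an arbitrary new exit code.)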

Now for HBW memory, some googling around and brief look at the memkind
source shows that it probably supports just Intel Xeon Phi HBW memory,
I'm waiting for access to such a box right now but it might take a few days.

For the DAX stuff, I admit I don't know exactly what it is (what kind of hw
it needs).

> > --- libgomp/config/linux/allocator.c.jj       2022-06-08 08:58:23.197078191 +0200
> > +++ libgomp/config/linux/allocator.c  2022-06-08 09:39:15.108410730 +0200
> > @@ -0,0 +1,36 @@
> 
> > +#define _GNU_SOURCE
> > +#include "libgomp.h"
> > +#if defined(PLUGIN_SUPPORT) && defined(LIBGOMP_USE_PTHREADS)
> > +#define LIBGOMP_USE_MEMKIND
> > +#endif
> > +
> > +#include "../../../allocator.c"
> 
> Given this use of 'PLUGIN_SUPPORT' (and thus 'dlopen' etc.) for something
> different than libgomp plugins (offloading), might move 'DL_LIBS',
> 'PLUGIN_SUPPORT' from 'libgomp/plugin/configfrag.ac' into
> 'libgomp/configure.ac', and 'libgomp_la_LIBADD += $(DL_LIBS)' from
> 'libgomp/plugin/Makefrag.am' into 'libgomp/Makefile.am'.

Maybe, but libgomp/plugin/configfrag.ac is included unconditionally, and the
memkind support is some kind of plugin too, just not an offloading plugin
but an allocator plugin...
I didn't want to spend too much time on it, and PLUGIN_SUPPORT right now
solely means that dlsym exists and that -ldl works and has been added.

	Jakub
  
Richard Sandiford June 9, 2022, 5:07 p.m. UTC | #3
Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi!
>
> This patch adds support for dlopening libmemkind.so on Linux and uses it
> for some kinds of allocations (but not yet e.g. pinned memory).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (with libmemkind
> around) and compile tested with LIBGOMP_USE_MEMKIND undefined, committed
> to trunk.
>
> 2022-06-09  Jakub Jelinek  <jakub@redhat.com>
>
> 	* allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined.
> 	(enum gomp_memkind_kind): New type.
> 	(struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND
> 	is defined.
> 	(struct gomp_memkind_data): New type.
> 	(memkind_data, memkind_data_once): New variables.
> 	(gomp_init_memkind, gomp_get_memkind): New functions.
> 	(omp_init_allocator): Initialize data.memkind, don't fail for
> 	omp_high_bw_mem_space if libmemkind supports it.
> 	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
> 	memkind support if LIBGOMP_USE_MEMKIND is defined.
> 	* config/linux/allocator.c: New file.

Dunno if this has already been reported, but I'm getting:

.../libgomp/config/linux/allocator.c:36:10: fatal error: ../../../allocator.c: No such file or directory
   36 | #include "../../../allocator.c"
      |          ^~~~~~~~~~~~~~~~~~~~~~

Should there be one less "../"?

Richard

> --- libgomp/allocator.c.jj	2022-06-08 08:21:03.099446883 +0200
> +++ libgomp/allocator.c	2022-06-08 13:41:45.647133610 +0200
> @@ -31,9 +31,28 @@
>  #include "libgomp.h"
>  #include <stdlib.h>
>  #include <string.h>
> +#ifdef LIBGOMP_USE_MEMKIND
> +#include <dlfcn.h>
> +#endif
>  
>  #define omp_max_predefined_alloc omp_thread_mem_alloc
>  
> +enum gomp_memkind_kind
> +{
> +  GOMP_MEMKIND_NONE = 0,
> +#define GOMP_MEMKIND_KINDS \
> +  GOMP_MEMKIND_KIND (HBW_INTERLEAVE),		\
> +  GOMP_MEMKIND_KIND (HBW_PREFERRED),		\
> +  GOMP_MEMKIND_KIND (DAX_KMEM_ALL),		\
> +  GOMP_MEMKIND_KIND (DAX_KMEM),			\
> +  GOMP_MEMKIND_KIND (INTERLEAVE),		\
> +  GOMP_MEMKIND_KIND (DEFAULT)
> +#define GOMP_MEMKIND_KIND(kind) GOMP_MEMKIND_##kind
> +  GOMP_MEMKIND_KINDS,
> +#undef GOMP_MEMKIND_KIND
> +  GOMP_MEMKIND_COUNT
> +};
> +
>  struct omp_allocator_data
>  {
>    omp_memspace_handle_t memspace;
> @@ -46,6 +65,9 @@ struct omp_allocator_data
>    unsigned int fallback : 8;
>    unsigned int pinned : 1;
>    unsigned int partition : 7;
> +#ifdef LIBGOMP_USE_MEMKIND
> +  unsigned int memkind : 8;
> +#endif
>  #ifndef HAVE_SYNC_BUILTINS
>    gomp_mutex_t lock;
>  #endif
> @@ -59,13 +81,95 @@ struct omp_mem_header
>    void *pad;
>  };
>  
> +struct gomp_memkind_data
> +{
> +  void *memkind_handle;
> +  void *(*memkind_malloc) (void *, size_t);
> +  void *(*memkind_calloc) (void *, size_t, size_t);
> +  void *(*memkind_realloc) (void *, void *, size_t);
> +  void (*memkind_free) (void *, void *);
> +  int (*memkind_check_available) (void *);
> +  void **kinds[GOMP_MEMKIND_COUNT];
> +};
> +
> +#ifdef LIBGOMP_USE_MEMKIND
> +static struct gomp_memkind_data *memkind_data;
> +static pthread_once_t memkind_data_once = PTHREAD_ONCE_INIT;
> +
> +static void
> +gomp_init_memkind (void)
> +{
> +  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);
> +  struct gomp_memkind_data *data;
> +  int i;
> +  static const char *kinds[] = {
> +    NULL,
> +#define GOMP_MEMKIND_KIND(kind) "MEMKIND_" #kind
> +    GOMP_MEMKIND_KINDS
> +#undef GOMP_MEMKIND_KIND
> +  };
> +
> +  data = calloc (1, sizeof (struct gomp_memkind_data));
> +  if (data == NULL)
> +    {
> +      if (handle)
> +	dlclose (handle);
> +      return;
> +    }
> +  if (!handle)
> +    {
> +      __atomic_store_n (&memkind_data, data, MEMMODEL_RELEASE);
> +      return;
> +    }
> +  data->memkind_handle = handle;
> +  data->memkind_malloc
> +    = (__typeof (data->memkind_malloc)) dlsym (handle, "memkind_malloc");
> +  data->memkind_calloc
> +    = (__typeof (data->memkind_calloc)) dlsym (handle, "memkind_calloc");
> +  data->memkind_realloc
> +    = (__typeof (data->memkind_realloc)) dlsym (handle, "memkind_realloc");
> +  data->memkind_free
> +    = (__typeof (data->memkind_free)) dlsym (handle, "memkind_free");
> +  data->memkind_check_available
> +    = (__typeof (data->memkind_check_available))
> +      dlsym (handle, "memkind_check_available");
> +  if (data->memkind_malloc
> +      && data->memkind_calloc
> +      && data->memkind_realloc
> +      && data->memkind_free
> +      && data->memkind_check_available)
> +    for (i = 1; i < GOMP_MEMKIND_COUNT; ++i)
> +      {
> +	data->kinds[i] = (void **) dlsym (handle, kinds[i]);
> +	if (data->kinds[i] && data->memkind_check_available (*data->kinds[i]))
> +	  data->kinds[i] = NULL;
> +      }
> +  __atomic_store_n (&memkind_data, data, MEMMODEL_RELEASE);
> +}
> +
> +static struct gomp_memkind_data *
> +gomp_get_memkind (void)
> +{
> +  struct gomp_memkind_data *data
> +    = __atomic_load_n (&memkind_data, MEMMODEL_ACQUIRE);
> +  if (data)
> +    return data;
> +  pthread_once (&memkind_data_once, gomp_init_memkind);
> +  return __atomic_load_n (&memkind_data, MEMMODEL_ACQUIRE);
> +}
> +#endif
> +
>  omp_allocator_handle_t
>  omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
>  		    const omp_alloctrait_t traits[])
>  {
>    struct omp_allocator_data data
>      = { memspace, 1, ~(uintptr_t) 0, 0, 0, omp_atv_contended, omp_atv_all,
> -	omp_atv_default_mem_fb, omp_atv_false, omp_atv_environment };
> +	omp_atv_default_mem_fb, omp_atv_false, omp_atv_environment,
> +#ifdef LIBGOMP_USE_MEMKIND
> +	GOMP_MEMKIND_NONE
> +#endif
> +      };
>    struct omp_allocator_data *ret;
>    int i;
>  
> @@ -179,8 +283,48 @@ omp_init_allocator (omp_memspace_handle_
>    if (data.alignment < sizeof (void *))
>      data.alignment = sizeof (void *);
>  
> -  /* No support for these so far (for hbw will use memkind).  */
> -  if (data.pinned || data.memspace == omp_high_bw_mem_space)
> +  switch (memspace)
> +    {
> +    case omp_high_bw_mem_space:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      struct gomp_memkind_data *memkind_data;
> +      memkind_data = gomp_get_memkind ();
> +      if (data.partition == omp_atv_interleaved
> +	  && memkind_data->kinds[GOMP_MEMKIND_HBW_INTERLEAVE])
> +	{
> +	  data.memkind = GOMP_MEMKIND_HBW_INTERLEAVE;
> +	  break;
> +	}
> +      else if (memkind_data->kinds[GOMP_MEMKIND_HBW_PREFERRED])
> +	{
> +	  data.memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +	  break;
> +	}
> +#endif
> +      return omp_null_allocator;
> +    case omp_large_cap_mem_space:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind_data = gomp_get_memkind ();
> +      if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM_ALL])
> +	data.memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      else if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM])
> +	data.memkind = GOMP_MEMKIND_DAX_KMEM;
> +#endif
> +      break;
> +    default:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (data.partition == omp_atv_interleaved)
> +	{
> +	  memkind_data = gomp_get_memkind ();
> +	  if (memkind_data->kinds[GOMP_MEMKIND_INTERLEAVE])
> +	    data.memkind = GOMP_MEMKIND_INTERLEAVE;
> +	}
> +#endif
> +      break;
> +    }
> +
> +  /* No support for this so far.  */
> +  if (data.pinned)
>      return omp_null_allocator;
>  
>    ret = gomp_malloc (sizeof (struct omp_allocator_data));
> @@ -213,6 +357,9 @@ omp_aligned_alloc (size_t alignment, siz
>    struct omp_allocator_data *allocator_data;
>    size_t new_size, new_alignment;
>    void *ptr, *ret;
> +#ifdef LIBGOMP_USE_MEMKIND
> +  enum gomp_memkind_kind memkind;
> +#endif
>  
>    if (__builtin_expect (size == 0, 0))
>      return NULL;
> @@ -232,12 +379,28 @@ retry:
>        allocator_data = (struct omp_allocator_data *) allocator;
>        if (new_alignment < allocator_data->alignment)
>  	new_alignment = allocator_data->alignment;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = allocator_data->memkind;
> +#endif
>      }
>    else
>      {
>        allocator_data = NULL;
>        if (new_alignment < sizeof (void *))
>  	new_alignment = sizeof (void *);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = GOMP_MEMKIND_NONE;
> +      if (allocator == omp_high_bw_mem_alloc)
> +	memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +      else if (allocator == omp_large_cap_mem_alloc)
> +	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  if (!memkind_data->kinds[memkind])
> +	    memkind = GOMP_MEMKIND_NONE;
> +	}
> +#endif
>      }
>  
>    new_size = sizeof (struct omp_mem_header);
> @@ -281,7 +444,16 @@ retry:
>        allocator_data->used_pool_size = used_pool_size;
>        gomp_mutex_unlock (&allocator_data->lock);
>  #endif
> -      ptr = malloc (new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  ptr = memkind_data->memkind_malloc (kind, new_size);
> +	}
> +      else
> +#endif
> +	ptr = malloc (new_size);
>        if (ptr == NULL)
>  	{
>  #ifdef HAVE_SYNC_BUILTINS
> @@ -297,7 +469,16 @@ retry:
>      }
>    else
>      {
> -      ptr = malloc (new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  ptr = memkind_data->memkind_malloc (kind, new_size);
> +	}
> +      else
> +#endif
> +	ptr = malloc (new_size);
>        if (ptr == NULL)
>  	goto fail;
>      }
> @@ -321,6 +502,9 @@ fail:
>  	{
>  	case omp_atv_default_mem_fb:
>  	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
> +#ifdef LIBGOMP_USE_MEMKIND
> +	      || memkind
> +#endif
>  	      || (allocator_data
>  		  && allocator_data->pool_size < ~(uintptr_t) 0))
>  	    {
> @@ -393,7 +577,36 @@ omp_free (void *ptr, omp_allocator_handl
>  	  gomp_mutex_unlock (&allocator_data->lock);
>  #endif
>  	}
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (allocator_data->memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[allocator_data->memkind];
> +	  memkind_data->memkind_free (kind, data->ptr);
> +	  return;
> +	}
> +#endif
> +    }
> +#ifdef LIBGOMP_USE_MEMKIND
> +  else
> +    {
> +      enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE;
> +      if (data->allocator == omp_high_bw_mem_alloc)
> +	memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +      else if (data->allocator == omp_large_cap_mem_alloc)
> +	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  if (memkind_data->kinds[memkind])
> +	    {
> +	      void *kind = *memkind_data->kinds[memkind];
> +	      memkind_data->memkind_free (kind, data->ptr);
> +	      return;
> +	    }
> +	}
>      }
> +#endif
>    free (data->ptr);
>  }
>  
> @@ -412,6 +625,9 @@ omp_aligned_calloc (size_t alignment, si
>    struct omp_allocator_data *allocator_data;
>    size_t new_size, size_temp, new_alignment;
>    void *ptr, *ret;
> +#ifdef LIBGOMP_USE_MEMKIND
> +  enum gomp_memkind_kind memkind;
> +#endif
>  
>    if (__builtin_expect (size == 0 || nmemb == 0, 0))
>      return NULL;
> @@ -431,12 +647,28 @@ retry:
>        allocator_data = (struct omp_allocator_data *) allocator;
>        if (new_alignment < allocator_data->alignment)
>  	new_alignment = allocator_data->alignment;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = allocator_data->memkind;
> +#endif
>      }
>    else
>      {
>        allocator_data = NULL;
>        if (new_alignment < sizeof (void *))
>  	new_alignment = sizeof (void *);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = GOMP_MEMKIND_NONE;
> +      if (allocator == omp_high_bw_mem_alloc)
> +	memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +      else if (allocator == omp_large_cap_mem_alloc)
> +	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  if (!memkind_data->kinds[memkind])
> +	    memkind = GOMP_MEMKIND_NONE;
> +	}
> +#endif
>      }
>  
>    new_size = sizeof (struct omp_mem_header);
> @@ -482,7 +714,16 @@ retry:
>        allocator_data->used_pool_size = used_pool_size;
>        gomp_mutex_unlock (&allocator_data->lock);
>  #endif
> -      ptr = calloc (1, new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  ptr = memkind_data->memkind_calloc (kind, 1, new_size);
> +	}
> +      else
> +#endif
> +	ptr = calloc (1, new_size);
>        if (ptr == NULL)
>  	{
>  #ifdef HAVE_SYNC_BUILTINS
> @@ -498,7 +739,16 @@ retry:
>      }
>    else
>      {
> -      ptr = calloc (1, new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  ptr = memkind_data->memkind_calloc (kind, 1, new_size);
> +	}
> +      else
> +#endif
> +	ptr = calloc (1, new_size);
>        if (ptr == NULL)
>  	goto fail;
>      }
> @@ -522,6 +772,9 @@ fail:
>  	{
>  	case omp_atv_default_mem_fb:
>  	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
> +#ifdef LIBGOMP_USE_MEMKIND
> +	      || memkind
> +#endif
>  	      || (allocator_data
>  		  && allocator_data->pool_size < ~(uintptr_t) 0))
>  	    {
> @@ -562,6 +815,9 @@ omp_realloc (void *ptr, size_t size, omp
>    size_t new_size, old_size, new_alignment, old_alignment;
>    void *new_ptr, *ret;
>    struct omp_mem_header *data;
> +#ifdef LIBGOMP_USE_MEMKIND
> +  enum gomp_memkind_kind memkind, free_memkind;
> +#endif
>  
>    if (__builtin_expect (ptr == NULL, 0))
>      return ialias_call (omp_aligned_alloc) (1, size, allocator);
> @@ -585,13 +841,51 @@ retry:
>        allocator_data = (struct omp_allocator_data *) allocator;
>        if (new_alignment < allocator_data->alignment)
>  	new_alignment = allocator_data->alignment;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = allocator_data->memkind;
> +#endif
>      }
>    else
> -    allocator_data = NULL;
> +    {
> +      allocator_data = NULL;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind = GOMP_MEMKIND_NONE;
> +      if (allocator == omp_high_bw_mem_alloc)
> +	memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +      else if (allocator == omp_large_cap_mem_alloc)
> +	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  if (!memkind_data->kinds[memkind])
> +	    memkind = GOMP_MEMKIND_NONE;
> +	}
> +#endif
> +    }
>    if (free_allocator > omp_max_predefined_alloc)
> -    free_allocator_data = (struct omp_allocator_data *) free_allocator;
> +    {
> +      free_allocator_data = (struct omp_allocator_data *) free_allocator;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      free_memkind = free_allocator_data->memkind;
> +#endif
> +    }
>    else
> -    free_allocator_data = NULL;
> +    {
> +      free_allocator_data = NULL;
> +#ifdef LIBGOMP_USE_MEMKIND
> +      free_memkind = GOMP_MEMKIND_NONE;
> +      if (free_allocator == omp_high_bw_mem_alloc)
> +	free_memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +      else if (free_allocator == omp_large_cap_mem_alloc)
> +	free_memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      if (free_memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  if (!memkind_data->kinds[free_memkind])
> +	    free_memkind = GOMP_MEMKIND_NONE;
> +	}
> +#endif
> +    }
>    old_alignment = (uintptr_t) ptr - (uintptr_t) (data->ptr);
>  
>    new_size = sizeof (struct omp_mem_header);
> @@ -659,6 +953,19 @@ retry:
>        allocator_data->used_pool_size = used_pool_size;
>        gomp_mutex_unlock (&allocator_data->lock);
>  #endif
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  if (prev_size)
> +	    new_ptr = memkind_data->memkind_realloc (kind, data->ptr,
> +						     new_size);
> +	  else
> +	    new_ptr = memkind_data->memkind_malloc (kind, new_size);
> +	}
> +      else
> +#endif
>        if (prev_size)
>  	new_ptr = realloc (data->ptr, new_size);
>        else
> @@ -687,10 +994,23 @@ retry:
>      }
>    else if (new_alignment == sizeof (void *)
>  	   && old_alignment == sizeof (struct omp_mem_header)
> +#ifdef LIBGOMP_USE_MEMKIND
> +	   && memkind == free_memkind
> +#endif
>  	   && (free_allocator_data == NULL
>  	       || free_allocator_data->pool_size == ~(uintptr_t) 0))
>      {
> -      new_ptr = realloc (data->ptr, new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  new_ptr = memkind_data->memkind_realloc (kind, data->ptr,
> +						   new_size);
> +	}
> +      else
> +#endif
> +	new_ptr = realloc (data->ptr, new_size);
>        if (new_ptr == NULL)
>  	goto fail;
>        ret = (char *) new_ptr + sizeof (struct omp_mem_header);
> @@ -701,7 +1021,16 @@ retry:
>      }
>    else
>      {
> -      new_ptr = malloc (new_size);
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (memkind)
> +	{
> +	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +	  void *kind = *memkind_data->kinds[memkind];
> +	  new_ptr = memkind_data->memkind_malloc (kind, new_size);
> +	}
> +      else
> +#endif
> +	new_ptr = malloc (new_size);
>        if (new_ptr == NULL)
>  	goto fail;
>      }
> @@ -731,6 +1060,15 @@ retry:
>        gomp_mutex_unlock (&free_allocator_data->lock);
>  #endif
>      }
> +#ifdef LIBGOMP_USE_MEMKIND
> +  if (free_memkind)
> +    {
> +      struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
> +      void *kind = *memkind_data->kinds[free_memkind];
> +      memkind_data->memkind_free (kind, data->ptr);
> +      return ret;
> +    }
> +#endif
>    free (data->ptr);
>    return ret;
>  
> @@ -741,6 +1079,9 @@ fail:
>  	{
>  	case omp_atv_default_mem_fb:
>  	  if (new_alignment > sizeof (void *)
> +#ifdef LIBGOMP_USE_MEMKIND
> +	      || memkind
> +#endif
>  	      || (allocator_data
>  		  && allocator_data->pool_size < ~(uintptr_t) 0))
>  	    {
> --- libgomp/config/linux/allocator.c.jj	2022-06-08 08:58:23.197078191 +0200
> +++ libgomp/config/linux/allocator.c	2022-06-08 09:39:15.108410730 +0200
> @@ -0,0 +1,36 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Jakub Jelinek <jakub@redhat.com>.
> +
> +   This file is part of the GNU Offloading and Multi Processing Library
> +   (libgomp).
> +
> +   Libgomp is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* This file contains wrappers for the system allocation routines.  Most
> +   places in the OpenMP API do not make any provision for failure, so in
> +   general we cannot allow memory allocation to fail.  */
> +
> +#define _GNU_SOURCE
> +#include "libgomp.h"
> +#if defined(PLUGIN_SUPPORT) && defined(LIBGOMP_USE_PTHREADS)
> +#define LIBGOMP_USE_MEMKIND
> +#endif
> +
> +#include "../../../allocator.c"
>
> 	Jakub
  
Jakub Jelinek June 9, 2022, 5:48 p.m. UTC | #4
On Thu, Jun 09, 2022 at 06:07:20PM +0100, Richard Sandiford wrote:
> Dunno if this has already been reported, but I'm getting:
> 
> .../libgomp/config/linux/allocator.c:36:10: fatal error: ../../../allocator.c: No such file or directory
>    36 | #include "../../../allocator.c"
>       |          ^~~~~~~~~~~~~~~~~~~~~~
> 
> Should there be one less "../"?

Ouch, you're right.
I'm configuring with ../configure, dunno if that is the reason why it
happened to work for me.

Fixed up now, sorry.

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/allocator.c: Fix up #include directive.

--- libgomp/config/linux/allocator.c.jj
+++ libgomp/config/linux/allocator.c
@@ -33,4 +33,4 @@
 #define LIBGOMP_USE_MEMKIND
 #endif
 
-#include "../../../allocator.c"
+#include "../../allocator.c"


	Jakub
  
Jakub Jelinek June 10, 2022, 7:23 p.m. UTC | #5
On Thu, Jun 09, 2022 at 01:57:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jun 09, 2022 at 12:11:28PM +0200, Thomas Schwinge wrote:
> > On 2022-06-09T10:19:03+0200, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > This patch adds support for dlopening libmemkind.so
> > 
> > Instead of 'dlopen'ing literally 'libmemkind.so':
> > 
> > > --- libgomp/allocator.c.jj    2022-06-08 08:21:03.099446883 +0200
> > > +++ libgomp/allocator.c       2022-06-08 13:41:45.647133610 +0200
> > 
> > > +  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);
> > 
> > ..., shouldn't this instead 'dlopen' 'libmemkind.so.0'?  At least for
> > Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
> > package:
> 
> I agree, and I actually noticed it too right before committing, but I thought
> I'd investigate and tweak incrementally, because "libmemkind.so" is what
> I've actually tested (it is what llvm libomp uses).

And here is what I've committed after bootstrapping/regtesting it on
x86_64-linux and i686-linux.

2022-06-10  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c (gomp_init_memkind): Call dlopen with "libmemkind.so.0"
	rather than "libmemkind.so".

--- libgomp/allocator.c.jj	2022-06-09 10:14:33.470973961 +0200
+++ libgomp/allocator.c	2022-06-09 14:05:33.665803457 +0200
@@ -99,7 +99,7 @@ static pthread_once_t memkind_data_once
 static void
 gomp_init_memkind (void)
 {
-  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);
+  void *handle = dlopen ("libmemkind.so.0", RTLD_LAZY);
   struct gomp_memkind_data *data;
   int i;
   static const char *kinds[] = {


	Jakub
  
Andrew Stubbs June 28, 2022, 9:29 p.m. UTC | #6
On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote:
> +  switch (memspace)
> +    {
> +    case omp_high_bw_mem_space:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      struct gomp_memkind_data *memkind_data;
> +      memkind_data = gomp_get_memkind ();
> +      if (data.partition == omp_atv_interleaved
> +	  && memkind_data->kinds[GOMP_MEMKIND_HBW_INTERLEAVE])
> +	{
> +	  data.memkind = GOMP_MEMKIND_HBW_INTERLEAVE;
> +	  break;
> +	}
> +      else if (memkind_data->kinds[GOMP_MEMKIND_HBW_PREFERRED])
> +	{
> +	  data.memkind = GOMP_MEMKIND_HBW_PREFERRED;
> +	  break;
> +	}
> +#endif
> +      return omp_null_allocator;
> +    case omp_large_cap_mem_space:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      memkind_data = gomp_get_memkind ();
> +      if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM_ALL])
> +	data.memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> +      else if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM])
> +	data.memkind = GOMP_MEMKIND_DAX_KMEM;
> +#endif
> +      break;
> +    default:
> +#ifdef LIBGOMP_USE_MEMKIND
> +      if (data.partition == omp_atv_interleaved)
> +	{
> +	  memkind_data = gomp_get_memkind ();
> +	  if (memkind_data->kinds[GOMP_MEMKIND_INTERLEAVE])
> +	    data.memkind = GOMP_MEMKIND_INTERLEAVE;
> +	}
> +#endif
> +      break;
> +    }
> +

This conflicts with my and Abid's patches to implement the device
allocators and host unified shared memory via the overridable
"MEMSPACE_ALLOC" hooks. I can still use those for the "else" case, but
they'll be inactive on any configuration where libmemkind exists. That's
fine for the device code, and may be OK for USM (given that libmemkind
won't have an option for that). There's a problem for the
NVidia-specific host-memory pinning I have planned, though.

How do you propose we resolve this conflict, please?

Ideally it will be in such a way that the conditional is resolved at 
compile time and the routine can be inlined (so no fake-OO function 
pointers in structs, I think).
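
To illustrate what I mean (only a sketch; the hook names follow our
patches' "MEMSPACE_ALLOC" convention, but the exact set and signatures
may differ there):

/* In the generic allocator.c, default to the libc routines...  */
#ifndef MEMSPACE_ALLOC
#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
#endif
#ifndef MEMSPACE_FREE
#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
#endif

/* ...so that a config/<target>/allocator.c can #define the hooks to the
   device/USM/pinned implementations before #include-ing the generic file,
   and the choice is resolved at compile time and can be inlined.  */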

Thanks

Andrew
  
Jakub Jelinek June 29, 2022, 10:45 a.m. UTC | #7
On Tue, Jun 28, 2022 at 10:29:53PM +0100, Andrew Stubbs wrote:
> On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote:
> > +  switch (memspace)
> > +    {
> > +    case omp_high_bw_mem_space:
> > +#ifdef LIBGOMP_USE_MEMKIND
> > +      struct gomp_memkind_data *memkind_data;
> > +      memkind_data = gomp_get_memkind ();
> > +      if (data.partition == omp_atv_interleaved
> > +	  && memkind_data->kinds[GOMP_MEMKIND_HBW_INTERLEAVE])
> > +	{
> > +	  data.memkind = GOMP_MEMKIND_HBW_INTERLEAVE;
> > +	  break;
> > +	}
> > +      else if (memkind_data->kinds[GOMP_MEMKIND_HBW_PREFERRED])
> > +	{
> > +	  data.memkind = GOMP_MEMKIND_HBW_PREFERRED;
> > +	  break;
> > +	}
> > +#endif
> > +      return omp_null_allocator;
> > +    case omp_large_cap_mem_space:
> > +#ifdef LIBGOMP_USE_MEMKIND
> > +      memkind_data = gomp_get_memkind ();
> > +      if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM_ALL])
> > +	data.memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
> > +      else if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM])
> > +	data.memkind = GOMP_MEMKIND_DAX_KMEM;
> > +#endif
> > +      break;
> > +    default:
> > +#ifdef LIBGOMP_USE_MEMKIND
> > +      if (data.partition == omp_atv_interleaved)
> > +	{
> > +	  memkind_data = gomp_get_memkind ();
> > +	  if (memkind_data->kinds[GOMP_MEMKIND_INTERLEAVE])
> > +	    data.memkind = GOMP_MEMKIND_INTERLEAVE;
> > +	}
> > +#endif
> > +      break;
> > +    }
> > +
> 
> This conflicts with mine and Abid's patches to implement the device
> allocators and host unified shared memory via the overridable
> "MEMSPACE_ALLOC" hooks. I can still use those for the "else" case, but
> they'll be inactive on any configuration where libmemkind exists. That's
> fine for the device code, and may be OK for USM (given that libmemkind won't
> have an option for that). There's a problem for the NVidia-specific
> host-memory pinning I have planned though.
> 
> How do you propose we resolve this conflict, please?
> 
> Ideally it will be in such a way that the conditional is resolved at compile
> time and the routine can be inlined (so no fake-OO function pointers in
> structs, I think).

memkind isn't used just because it exists, but because it supports some
requested trait that isn't otherwise supported.
Using callbacks instead of selecting the allocator to call using ifs is
possible if all those callbacks have the same arguments or if we add enough
dummy arguments to the callbacks such that some wrappers can handle it all.

Right now, memkind is used if ->memkind in the allocator data isn't
GOMP_MEMKIND_NONE, so when it is GOMP_MEMKIND_NONE you can certainly call
some callback instead (a performance question might be whether to always go
through the callback in that case, or to call malloc etc. directly if the
callback is NULL).
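
E.g. something along these lines in the allocation paths (only a sketch;
"custom_alloc" is a made-up name for whatever hook your patches add):

#ifdef LIBGOMP_USE_MEMKIND
      if (memkind)
        {
          struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
          void *kind = *memkind_data->kinds[memkind];
          ptr = memkind_data->memkind_malloc (kind, new_size);
        }
      else
#endif
      if (allocator_data && allocator_data->custom_alloc)
        ptr = allocator_data->custom_alloc (allocator_data->memspace,
                                            new_size);
      else
        ptr = malloc (new_size);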

And omp_init_allocator needs to decide what to do if one asks for features
that need memkind as well as for features that need whatever you/Abid have
been working on.  A possible resolution is punt (return omp_null_allocator),
or prefer one feature over the other one or vice versa.
Of the features currently handled by memkind, even before my changes for
consistency with llvm's libomp, omp_high_bw_mem_space was considered a
hard request, ditto omp_atk_pinned with omp_atv_true, but the rest was just
taken as an optimization hint.

	Jakub
  
Andrew Stubbs June 29, 2022, 12:24 p.m. UTC | #8
On 29/06/2022 11:45, Jakub Jelinek wrote:
> And omp_init_allocator needs to decide what to do if one asks for features
> that need memkind as well as for features that need whatever you/Abid have
> been working on.  A possible resolution is punt (return omp_null_allocator),
> or prefer one feature over the other one or vice versa.
>  From the features currently handled by memkind, even before my changes for
> consistency with libomp from llvm omp_high_bw_mem_space was considered a
> hard request, ditto omp_atk_pinned omp_atv_true, but the rest was just taken
> as optimization hint.

Right now I'm rebasing our patches so that they build and pass our tests 
again.

I don't know what to do if someone requests pinned high-bandwidth 
memory. I don't even know if that's a thing that can exist. Right now, 
as I have my dev branch, the high-bandwidth will take precedence simply 
because memkind is checked first. I presume that if memkind ever grew 
the ability to allocate pinned memory you would still have the choose 
one "kind" or the other.

Andrew
  

Patch

--- libgomp/allocator.c.jj	2022-06-08 08:21:03.099446883 +0200
+++ libgomp/allocator.c	2022-06-08 13:41:45.647133610 +0200
@@ -31,9 +31,28 @@ 
 #include "libgomp.h"
 #include <stdlib.h>
 #include <string.h>
+#ifdef LIBGOMP_USE_MEMKIND
+#include <dlfcn.h>
+#endif
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+enum gomp_memkind_kind
+{
+  GOMP_MEMKIND_NONE = 0,
+#define GOMP_MEMKIND_KINDS \
+  GOMP_MEMKIND_KIND (HBW_INTERLEAVE),		\
+  GOMP_MEMKIND_KIND (HBW_PREFERRED),		\
+  GOMP_MEMKIND_KIND (DAX_KMEM_ALL),		\
+  GOMP_MEMKIND_KIND (DAX_KMEM),			\
+  GOMP_MEMKIND_KIND (INTERLEAVE),		\
+  GOMP_MEMKIND_KIND (DEFAULT)
+#define GOMP_MEMKIND_KIND(kind) GOMP_MEMKIND_##kind
+  GOMP_MEMKIND_KINDS,
+#undef GOMP_MEMKIND_KIND
+  GOMP_MEMKIND_COUNT
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -46,6 +65,9 @@  struct omp_allocator_data
   unsigned int fallback : 8;
   unsigned int pinned : 1;
   unsigned int partition : 7;
+#ifdef LIBGOMP_USE_MEMKIND
+  unsigned int memkind : 8;
+#endif
 #ifndef HAVE_SYNC_BUILTINS
   gomp_mutex_t lock;
 #endif
@@ -59,13 +81,95 @@  struct omp_mem_header
   void *pad;
 };
 
+struct gomp_memkind_data
+{
+  void *memkind_handle;
+  void *(*memkind_malloc) (void *, size_t);
+  void *(*memkind_calloc) (void *, size_t, size_t);
+  void *(*memkind_realloc) (void *, void *, size_t);
+  void (*memkind_free) (void *, void *);
+  int (*memkind_check_available) (void *);
+  void **kinds[GOMP_MEMKIND_COUNT];
+};
+
+#ifdef LIBGOMP_USE_MEMKIND
+static struct gomp_memkind_data *memkind_data;
+static pthread_once_t memkind_data_once = PTHREAD_ONCE_INIT;
+
+static void
+gomp_init_memkind (void)
+{
+  void *handle = dlopen ("libmemkind.so", RTLD_LAZY);
+  struct gomp_memkind_data *data;
+  int i;
+  static const char *kinds[] = {
+    NULL,
+#define GOMP_MEMKIND_KIND(kind) "MEMKIND_" #kind
+    GOMP_MEMKIND_KINDS
+#undef GOMP_MEMKIND_KIND
+  };
+
+  data = calloc (1, sizeof (struct gomp_memkind_data));
+  if (data == NULL)
+    {
+      if (handle)
+	dlclose (handle);
+      return;
+    }
+  if (!handle)
+    {
+      __atomic_store_n (&memkind_data, data, MEMMODEL_RELEASE);
+      return;
+    }
+  data->memkind_handle = handle;
+  data->memkind_malloc
+    = (__typeof (data->memkind_malloc)) dlsym (handle, "memkind_malloc");
+  data->memkind_calloc
+    = (__typeof (data->memkind_calloc)) dlsym (handle, "memkind_calloc");
+  data->memkind_realloc
+    = (__typeof (data->memkind_realloc)) dlsym (handle, "memkind_realloc");
+  data->memkind_free
+    = (__typeof (data->memkind_free)) dlsym (handle, "memkind_free");
+  data->memkind_check_available
+    = (__typeof (data->memkind_check_available))
+      dlsym (handle, "memkind_check_available");
+  if (data->memkind_malloc
+      && data->memkind_calloc
+      && data->memkind_realloc
+      && data->memkind_free
+      && data->memkind_check_available)
+    for (i = 1; i < GOMP_MEMKIND_COUNT; ++i)
+      {
+	data->kinds[i] = (void **) dlsym (handle, kinds[i]);
+	if (data->kinds[i] && data->memkind_check_available (*data->kinds[i]))
+	  data->kinds[i] = NULL;
+      }
+  __atomic_store_n (&memkind_data, data, MEMMODEL_RELEASE);
+}
+
+static struct gomp_memkind_data *
+gomp_get_memkind (void)
+{
+  struct gomp_memkind_data *data
+    = __atomic_load_n (&memkind_data, MEMMODEL_ACQUIRE);
+  if (data)
+    return data;
+  pthread_once (&memkind_data_once, gomp_init_memkind);
+  return __atomic_load_n (&memkind_data, MEMMODEL_ACQUIRE);
+}
+#endif
+
 omp_allocator_handle_t
 omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 		    const omp_alloctrait_t traits[])
 {
   struct omp_allocator_data data
     = { memspace, 1, ~(uintptr_t) 0, 0, 0, omp_atv_contended, omp_atv_all,
-	omp_atv_default_mem_fb, omp_atv_false, omp_atv_environment };
+	omp_atv_default_mem_fb, omp_atv_false, omp_atv_environment,
+#ifdef LIBGOMP_USE_MEMKIND
+	GOMP_MEMKIND_NONE
+#endif
+      };
   struct omp_allocator_data *ret;
   int i;
 
@@ -179,8 +283,48 @@  omp_init_allocator (omp_memspace_handle_
   if (data.alignment < sizeof (void *))
     data.alignment = sizeof (void *);
 
-  /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  switch (memspace)
+    {
+    case omp_high_bw_mem_space:
+#ifdef LIBGOMP_USE_MEMKIND
+      struct gomp_memkind_data *memkind_data;
+      memkind_data = gomp_get_memkind ();
+      if (data.partition == omp_atv_interleaved
+	  && memkind_data->kinds[GOMP_MEMKIND_HBW_INTERLEAVE])
+	{
+	  data.memkind = GOMP_MEMKIND_HBW_INTERLEAVE;
+	  break;
+	}
+      else if (memkind_data->kinds[GOMP_MEMKIND_HBW_PREFERRED])
+	{
+	  data.memkind = GOMP_MEMKIND_HBW_PREFERRED;
+	  break;
+	}
+#endif
+      return omp_null_allocator;
+    case omp_large_cap_mem_space:
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind_data = gomp_get_memkind ();
+      if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM_ALL])
+	data.memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      else if (memkind_data->kinds[GOMP_MEMKIND_DAX_KMEM])
+	data.memkind = GOMP_MEMKIND_DAX_KMEM;
+#endif
+      break;
+    default:
+#ifdef LIBGOMP_USE_MEMKIND
+      if (data.partition == omp_atv_interleaved)
+	{
+	  memkind_data = gomp_get_memkind ();
+	  if (memkind_data->kinds[GOMP_MEMKIND_INTERLEAVE])
+	    data.memkind = GOMP_MEMKIND_INTERLEAVE;
+	}
+#endif
+      break;
+    }
+
+  /* No support for this so far.  */
+  if (data.pinned)
     return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -213,6 +357,9 @@  omp_aligned_alloc (size_t alignment, siz
   struct omp_allocator_data *allocator_data;
   size_t new_size, new_alignment;
   void *ptr, *ret;
+#ifdef LIBGOMP_USE_MEMKIND
+  enum gomp_memkind_kind memkind;
+#endif
 
   if (__builtin_expect (size == 0, 0))
     return NULL;
@@ -232,12 +379,28 @@  retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = allocator_data->memkind;
+#endif
     }
   else
     {
       allocator_data = NULL;
       if (new_alignment < sizeof (void *))
 	new_alignment = sizeof (void *);
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = GOMP_MEMKIND_NONE;
+      if (allocator == omp_high_bw_mem_alloc)
+	memkind = GOMP_MEMKIND_HBW_PREFERRED;
+      else if (allocator == omp_large_cap_mem_alloc)
+	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  if (!memkind_data->kinds[memkind])
+	    memkind = GOMP_MEMKIND_NONE;
+	}
+#endif
     }
 
   new_size = sizeof (struct omp_mem_header);
@@ -281,7 +444,16 @@  retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
-      ptr = malloc (new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  ptr = memkind_data->memkind_malloc (kind, new_size);
+	}
+      else
+#endif
+	ptr = malloc (new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +469,16 @@  retry:
     }
   else
     {
-      ptr = malloc (new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  ptr = memkind_data->memkind_malloc (kind, new_size);
+	}
+      else
+#endif
+	ptr = malloc (new_size);
       if (ptr == NULL)
 	goto fail;
     }
@@ -321,6 +502,9 @@  fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+#ifdef LIBGOMP_USE_MEMKIND
+	      || memkind
+#endif
 	      || (allocator_data
 		  && allocator_data->pool_size < ~(uintptr_t) 0))
 	    {
@@ -393,7 +577,36 @@  omp_free (void *ptr, omp_allocator_handl
 	  gomp_mutex_unlock (&allocator_data->lock);
 #endif
 	}
+#ifdef LIBGOMP_USE_MEMKIND
+      if (allocator_data->memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[allocator_data->memkind];
+	  memkind_data->memkind_free (kind, data->ptr);
+	  return;
+	}
+#endif
+    }
+#ifdef LIBGOMP_USE_MEMKIND
+  else
+    {
+      enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE;
+      if (data->allocator == omp_high_bw_mem_alloc)
+	memkind = GOMP_MEMKIND_HBW_PREFERRED;
+      else if (data->allocator == omp_large_cap_mem_alloc)
+	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  if (memkind_data->kinds[memkind])
+	    {
+	      void *kind = *memkind_data->kinds[memkind];
+	      memkind_data->memkind_free (kind, data->ptr);
+	      return;
+	    }
+	}
     }
+#endif
   free (data->ptr);
 }
 
@@ -412,6 +625,9 @@  omp_aligned_calloc (size_t alignment, si
   struct omp_allocator_data *allocator_data;
   size_t new_size, size_temp, new_alignment;
   void *ptr, *ret;
+#ifdef LIBGOMP_USE_MEMKIND
+  enum gomp_memkind_kind memkind;
+#endif
 
   if (__builtin_expect (size == 0 || nmemb == 0, 0))
     return NULL;
@@ -431,12 +647,28 @@  retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = allocator_data->memkind;
+#endif
     }
   else
     {
       allocator_data = NULL;
       if (new_alignment < sizeof (void *))
 	new_alignment = sizeof (void *);
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = GOMP_MEMKIND_NONE;
+      if (allocator == omp_high_bw_mem_alloc)
+	memkind = GOMP_MEMKIND_HBW_PREFERRED;
+      else if (allocator == omp_large_cap_mem_alloc)
+	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  if (!memkind_data->kinds[memkind])
+	    memkind = GOMP_MEMKIND_NONE;
+	}
+#endif
     }
 
   new_size = sizeof (struct omp_mem_header);
@@ -482,7 +714,16 @@  retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
-      ptr = calloc (1, new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  ptr = memkind_data->memkind_calloc (kind, 1, new_size);
+	}
+      else
+#endif
+	ptr = calloc (1, new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -498,7 +739,16 @@  retry:
     }
   else
     {
-      ptr = calloc (1, new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  ptr = memkind_data->memkind_calloc (kind, 1, new_size);
+	}
+      else
+#endif
+	ptr = calloc (1, new_size);
       if (ptr == NULL)
 	goto fail;
     }
@@ -522,6 +772,9 @@  fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+#ifdef LIBGOMP_USE_MEMKIND
+	      || memkind
+#endif
 	      || (allocator_data
 		  && allocator_data->pool_size < ~(uintptr_t) 0))
 	    {
@@ -562,6 +815,9 @@  omp_realloc (void *ptr, size_t size, omp
   size_t new_size, old_size, new_alignment, old_alignment;
   void *new_ptr, *ret;
   struct omp_mem_header *data;
+#ifdef LIBGOMP_USE_MEMKIND
+  enum gomp_memkind_kind memkind, free_memkind;
+#endif
 
   if (__builtin_expect (ptr == NULL, 0))
     return ialias_call (omp_aligned_alloc) (1, size, allocator);
@@ -585,13 +841,51 @@  retry:
       allocator_data = (struct omp_allocator_data *) allocator;
       if (new_alignment < allocator_data->alignment)
 	new_alignment = allocator_data->alignment;
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = allocator_data->memkind;
+#endif
     }
   else
-    allocator_data = NULL;
+    {
+      allocator_data = NULL;
+#ifdef LIBGOMP_USE_MEMKIND
+      memkind = GOMP_MEMKIND_NONE;
+      if (allocator == omp_high_bw_mem_alloc)
+	memkind = GOMP_MEMKIND_HBW_PREFERRED;
+      else if (allocator == omp_large_cap_mem_alloc)
+	memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  if (!memkind_data->kinds[memkind])
+	    memkind = GOMP_MEMKIND_NONE;
+	}
+#endif
+    }
   if (free_allocator > omp_max_predefined_alloc)
-    free_allocator_data = (struct omp_allocator_data *) free_allocator;
+    {
+      free_allocator_data = (struct omp_allocator_data *) free_allocator;
+#ifdef LIBGOMP_USE_MEMKIND
+      free_memkind = free_allocator_data->memkind;
+#endif
+    }
   else
-    free_allocator_data = NULL;
+    {
+      free_allocator_data = NULL;
+#ifdef LIBGOMP_USE_MEMKIND
+      free_memkind = GOMP_MEMKIND_NONE;
+      if (free_allocator == omp_high_bw_mem_alloc)
+	free_memkind = GOMP_MEMKIND_HBW_PREFERRED;
+      else if (free_allocator == omp_large_cap_mem_alloc)
+	free_memkind = GOMP_MEMKIND_DAX_KMEM_ALL;
+      if (free_memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  if (!memkind_data->kinds[free_memkind])
+	    free_memkind = GOMP_MEMKIND_NONE;
+	}
+#endif
+    }
   old_alignment = (uintptr_t) ptr - (uintptr_t) (data->ptr);
 
   new_size = sizeof (struct omp_mem_header);
@@ -659,6 +953,19 @@  retry:
       allocator_data->used_pool_size = used_pool_size;
       gomp_mutex_unlock (&allocator_data->lock);
 #endif
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  if (prev_size)
+	    new_ptr = memkind_data->memkind_realloc (kind, data->ptr,
+						     new_size);
+	  else
+	    new_ptr = memkind_data->memkind_malloc (kind, new_size);
+	}
+      else
+#endif
       if (prev_size)
 	new_ptr = realloc (data->ptr, new_size);
       else
@@ -687,10 +994,23 @@  retry:
     }
   else if (new_alignment == sizeof (void *)
 	   && old_alignment == sizeof (struct omp_mem_header)
+#ifdef LIBGOMP_USE_MEMKIND
+	   && memkind == free_memkind
+#endif
 	   && (free_allocator_data == NULL
 	       || free_allocator_data->pool_size == ~(uintptr_t) 0))
     {
-      new_ptr = realloc (data->ptr, new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  new_ptr = memkind_data->memkind_realloc (kind, data->ptr,
+						   new_size);
+	}
+      else
+#endif
+	new_ptr = realloc (data->ptr, new_size);
       if (new_ptr == NULL)
 	goto fail;
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
@@ -701,7 +1021,16 @@  retry:
     }
   else
     {
-      new_ptr = malloc (new_size);
+#ifdef LIBGOMP_USE_MEMKIND
+      if (memkind)
+	{
+	  struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+	  void *kind = *memkind_data->kinds[memkind];
+	  new_ptr = memkind_data->memkind_malloc (kind, new_size);
+	}
+      else
+#endif
+	new_ptr = malloc (new_size);
       if (new_ptr == NULL)
 	goto fail;
     }
@@ -731,6 +1060,15 @@  retry:
       gomp_mutex_unlock (&free_allocator_data->lock);
 #endif
     }
+#ifdef LIBGOMP_USE_MEMKIND
+  if (free_memkind)
+    {
+      struct gomp_memkind_data *memkind_data = gomp_get_memkind ();
+      void *kind = *memkind_data->kinds[free_memkind];
+      memkind_data->memkind_free (kind, data->ptr);
+      return ret;
+    }
+#endif
   free (data->ptr);
   return ret;
 
@@ -741,6 +1079,9 @@  fail:
 	{
 	case omp_atv_default_mem_fb:
 	  if (new_alignment > sizeof (void *)
+#ifdef LIBGOMP_USE_MEMKIND
+	      || memkind
+#endif
 	      || (allocator_data
 		  && allocator_data->pool_size < ~(uintptr_t) 0))
 	    {
--- libgomp/config/linux/allocator.c.jj	2022-06-08 08:58:23.197078191 +0200
+++ libgomp/config/linux/allocator.c	2022-06-08 09:39:15.108410730 +0200
@@ -0,0 +1,36 @@ 
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Jakub Jelinek <jakub@redhat.com>.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains wrappers for the system allocation routines.  Most
+   places in the OpenMP API do not make any provision for failure, so in
+   general we cannot allow memory allocation to fail.  */
+
+#define _GNU_SOURCE
+#include "libgomp.h"
+#if defined(PLUGIN_SUPPORT) && defined(LIBGOMP_USE_PTHREADS)
+#define LIBGOMP_USE_MEMKIND
+#endif
+
+#include "../../../allocator.c"