From patchwork Thu Jul  7 10:34:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55822
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 01/17] libgomp, nvptx: low-latency memory allocator
Date: Thu, 7 Jul 2022 11:34:32 +0100
Message-ID: 
 <400092d8ce44340cece0e2e38f88edbad6400b03.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm.
The allocator is thread safe, and the size of the low-latency heap can be
configured using the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.
	(predefined_alloc_mapping): New constant array.
	(omp_alloc): Use MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	(omp_free): Use MEMSPACE_FREE.
	(omp_calloc): Use MEMSPACE_CALLOC.
	Implement fall-backs for predefined allocators.
	(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
	(__nvptx_lowlat_pool): New asm variable.
	(gomp_nvptx_main): Initialize the low-latency heap.
	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
	* config/nvptx/allocator.c: New file.
	* testsuite/libgomp.c/allocators-1.c: New test.
	* testsuite/libgomp.c/allocators-2.c: New test.
	* testsuite/libgomp.c/allocators-3.c: New test.
	* testsuite/libgomp.c/allocators-4.c: New test.
	* testsuite/libgomp.c/allocators-5.c: New test.
	* testsuite/libgomp.c/allocators-6.c: New test.

Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
---
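A minimal sketch of the GOMP_NVPTX_LOWLAT_POOL parsing step, written as a
stand-alone host function.  This is not the code in the patch:
parse_pool_size is a hypothetical helper name, and the sketch assumes two
behaviours the patch does not implement, namely clearing errno before
calling strtoul and rejecting an empty string.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the patch): parse a decimal pool size.
   Returns true and stores the value in *OUT on success; returns false and
   leaves *OUT untouched on any parse error.  */
static bool
parse_pool_size (const char *text, unsigned *out)
{
  assert (text != NULL && out != NULL);

  char *endptr;
  errno = 0;  /* Assumption: clear stale errno so ERANGE is meaningful.  */
  unsigned long val = strtoul (text, &endptr, 10);

  if (endptr == text	    /* No digits were consumed.  */
      || *endptr != '\0'    /* Trailing characters after the number.  */
      || errno != 0	    /* Out of range for unsigned long.  */
      || val > UINT_MAX)    /* Does not fit the "unsigned" pool size.  */
    return false;

  *out = (unsigned) val;
  return true;
}
```

With this helper, "8192" yields 8192, while "", "12k", and a value too
large for unsigned long are all rejected without changing the output.
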
 libgomp/allocator.c                        | 235 ++++++++-----
 libgomp/config/nvptx/allocator.c           | 370 +++++++++++++++++++++
 libgomp/config/nvptx/team.c                |  28 ++
 libgomp/plugin/plugin-nvptx.c              |  23 +-
 libgomp/testsuite/libgomp.c/allocators-1.c |  56 ++++
 libgomp/testsuite/libgomp.c/allocators-2.c |  64 ++++
 libgomp/testsuite/libgomp.c/allocators-3.c |  42 +++
 libgomp/testsuite/libgomp.c/allocators-4.c | 196 +++++++++++
 libgomp/testsuite/libgomp.c/allocators-5.c |  63 ++++
 libgomp/testsuite/libgomp.c/allocators-6.c | 117 +++++++
 10 files changed, 1110 insertions(+), 84 deletions(-)
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c

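A host-side sketch of the free-chain scheme documented at the top of
config/nvptx/allocator.c: each descriptor packs a 16-bit offset and a
16-bit size into one 32-bit word, the chain ends at a zero-size descriptor,
and allocation is first-fit in 8-byte units.  This is an illustrative model
under stated assumptions, not the patch's code: POOL_SIZE, pool, root,
heap_init, and heap_alloc are hypothetical names, an ordinary array stands
in for .shared memory, and the atomic lock on the root is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 256	/* Assumed size; must fit in 16 bits.  */

/* Same layout as the patch's heapdesc: offset and size in one word.  */
typedef union {
  uint32_t raw;
  struct { uint16_t offset; uint16_t size; } desc;
} heapdesc;

static unsigned char pool[POOL_SIZE];	/* Stand-in for .shared memory.  */
static heapdesc root;			/* Head of the free chain.  */

/* Read or write the "next free chunk" descriptor stored in a free chunk.  */
static uint32_t
load_desc (uint16_t offset)
{
  uint32_t raw;
  assert (offset + sizeof raw <= POOL_SIZE);
  memcpy (&raw, pool + offset, sizeof raw);
  return raw;
}

static void
store_desc (uint16_t offset, uint32_t raw)
{
  assert (offset + sizeof raw <= POOL_SIZE);
  memcpy (pool + offset, &raw, sizeof raw);
}

/* One free chunk covering the whole pool, followed by a terminator.  */
static void
heap_init (void)
{
  root.desc.offset = 0;
  root.desc.size = POOL_SIZE;
  store_desc (0, 0);
}

/* First-fit allocation; returns NULL when no free chunk is big enough.  */
static void *
heap_alloc (size_t size)
{
  size = (size + 7) & ~(size_t) 7;	/* 8-byte granularity.  */

  heapdesc chunk = root;
  heapdesc onward = { load_desc (chunk.desc.offset) };
  int have_prev = 0;
  uint16_t prev_offset = 0;
  while (chunk.desc.size != 0 && size > chunk.desc.size)
    {
      have_prev = 1;
      prev_offset = chunk.desc.offset;
      chunk = onward;
      onward.raw = load_desc (chunk.desc.offset);
    }
  if (chunk.desc.size == 0)
    return NULL;			/* Reached the terminator.  */

  /* Split the chunk: the tail stays free, or the chunk is used up.  */
  heapdesc rest = chunk;
  rest.desc.offset = (uint16_t) (rest.desc.offset + size);
  rest.desc.size = (uint16_t) (rest.desc.size - size);
  if (rest.desc.size == 0)
    rest = onward;
  else
    store_desc (rest.desc.offset, onward.raw);

  /* The previous free chunk, or the root, now points at the remainder.  */
  if (have_prev)
    store_desc (prev_offset, rest.raw);
  else
    root = rest;
  return pool + chunk.desc.offset;
}
```

For example, after heap_init (), heap_alloc (10) returns the start of the
pool and consumes 16 bytes, so a following heap_alloc (8) returns pool + 16
and leaves the root describing offset 24 with POOL_SIZE - 24 bytes free.
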
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b04820b8cf9..9b33bcf529b 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -37,6 +37,34 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config/<target>/allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 enum gomp_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
@@ -453,7 +481,7 @@ retry:
 	}
       else
 #endif
-	ptr = malloc (new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -478,7 +506,13 @@ retry:
 	}
       else
 #endif
-	ptr = malloc (new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size);
+	}
       if (ptr == NULL)
 	goto fail;
     }
@@ -496,35 +530,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+    case omp_atv_default_mem_fb:
+      if ((new_alignment > sizeof (void *) && new_alignment > alignment)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) size);
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent).  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) size);
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
@@ -557,6 +594,8 @@ void
 omp_free (void *ptr, omp_allocator_handle_t allocator)
 {
   struct omp_mem_header *data;
+  omp_memspace_handle_t memspace __attribute__((unused))
+    = omp_default_mem_space;
 
   if (ptr == NULL)
     return;
@@ -586,10 +625,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	  return;
 	}
 #endif
+
+      memspace = allocator_data->memspace;
     }
-#ifdef LIBGOMP_USE_MEMKIND
   else
     {
+#ifdef LIBGOMP_USE_MEMKIND
       enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE;
       if (data->allocator == omp_high_bw_mem_alloc)
 	memkind = GOMP_MEMKIND_HBW_PREFERRED;
@@ -605,9 +646,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 	      return;
 	    }
 	}
-    }
 #endif
-  free (data->ptr);
+
+      memspace = predefined_alloc_mapping[data->allocator];
+    }
+
+  MEMSPACE_FREE (memspace, data->ptr, data->size);
 }
 
 ialias (omp_free)
@@ -723,7 +767,7 @@ retry:
 	}
       else
 #endif
-	ptr = calloc (1, new_size);
+	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -748,7 +792,13 @@ retry:
 	}
       else
 #endif
-	ptr = calloc (1, new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size);
+	}
       if (ptr == NULL)
 	goto fail;
     }
@@ -766,35 +816,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+    case omp_atv_default_mem_fb:
+      if ((new_alignment > sizeof (void *) && new_alignment > alignment)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) (size * nmemb));
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent).  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) (size * nmemb));
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
@@ -967,9 +1020,10 @@ retry:
       else
 #endif
       if (prev_size)
-	new_ptr = realloc (data->ptr, new_size);
+	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+				    data->size, new_size);
       else
-	new_ptr = malloc (new_size);
+	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
       if (new_ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -1010,7 +1064,13 @@ retry:
 	}
       else
 #endif
-	new_ptr = realloc (data->ptr, new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
+	}
       if (new_ptr == NULL)
 	goto fail;
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
@@ -1030,7 +1090,13 @@ retry:
 	}
       else
 #endif
-	new_ptr = malloc (new_size);
+	{
+	  omp_memspace_handle_t memspace __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->memspace
+	       : predefined_alloc_mapping[allocator]);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
+	}
       if (new_ptr == NULL)
 	goto fail;
     }
@@ -1073,35 +1139,38 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+		  ? allocator_data->fallback
+		  : allocator == omp_default_mem_alloc
+		  ? omp_atv_null_fb
+		  : omp_atv_default_mem_fb);
+  switch (fallback)
     {
-      switch (allocator_data->fallback)
-	{
-	case omp_atv_default_mem_fb:
-	  if (new_alignment > sizeof (void *)
+    case omp_atv_default_mem_fb:
+      if (new_alignment > sizeof (void *)
 #ifdef LIBGOMP_USE_MEMKIND
-	      || memkind
+	  || memkind
 #endif
-	      || (allocator_data
-		  && allocator_data->pool_size < ~(uintptr_t) 0))
-	    {
-	      allocator = omp_default_mem_alloc;
-	      goto retry;
-	    }
-	  /* Otherwise, we've already performed default mem allocation
-	     and if that failed, it won't succeed again (unless it was
-	     intermittent.  Return NULL then, as that is the fallback.  */
-	  break;
-	case omp_atv_null_fb:
-	  break;
-	default:
-	case omp_atv_abort_fb:
-	  gomp_fatal ("Out of memory allocating %lu bytes",
-		      (unsigned long) size);
-	case omp_atv_allocator_fb:
-	  allocator = allocator_data->fb_data;
+	  || (allocator_data
+	      && allocator_data->pool_size < ~(uintptr_t) 0)
+	  || !allocator_data)
+	{
+	  allocator = omp_default_mem_alloc;
 	  goto retry;
 	}
+      /* Otherwise, we've already performed default mem allocation
+	 and if that failed, it won't succeed again (unless it was
+	 intermittent).  Return NULL then, as that is the fallback.  */
+      break;
+    case omp_atv_null_fb:
+      break;
+    default:
+    case omp_atv_abort_fb:
+      gomp_fatal ("Out of memory allocating %lu bytes",
+		  (unsigned long) size);
+    case omp_atv_allocator_fb:
+      allocator = allocator_data->fb_data;
+      goto retry;
     }
   return NULL;
 }
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
new file mode 100644
index 00000000000..6bc2ea48043
--- /dev/null
+++ b/libgomp/config/nvptx/allocator.c
@@ -0,0 +1,370 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The low-latency allocators use space reserved in .shared memory when the
+   kernel is launched.  The heap is initialized in gomp_nvptx_main and all
+   allocations are forgotten when the kernel exits.  Allocations to other
+   memory spaces all use the system malloc syscall.
+
+   The root heap descriptor is stored elsewhere in shared memory, and each
+   free chunk contains a similar descriptor for the next free chunk in the
+   chain.
+
+   The descriptor is two 16-bit values: offset and size, which describe the
+   location of a chunk of memory available for allocation. The offset is
+   relative to the base of the heap.  The special value 0xffff, 0xffff
+   indicates that the heap is locked.  The descriptor is encoded into a
+   single 32-bit integer so that it may be easily accessed atomically.
+
+   Memory is allocated to the first free chunk that fits.  The free chain
+   is always stored in order of the offset to assist coalescing adjacent
+   chunks.  */
+
+#include "libgomp.h"
+#include <stdlib.h>
+
+/* There should be some .shared space reserved for us.  There's no way to
+   express this magic extern sizeless array in C so use asm.  */
+asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
+
+extern uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
+
+typedef union {
+  uint32_t raw;
+  struct {
+    uint16_t offset;
+    uint16_t size;
+  } desc;
+} heapdesc;
+
+static void *
+nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain.  */
+      heapdesc chunk = {root.raw};
+      uint32_t *prev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && (uint32_t)size > chunk.desc.size)
+	{
+	  chunk.raw = onward_chain.raw;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      void *result = NULL;
+      if (chunk.desc.size != 0)
+	{
+	  /* Allocation successful.  */
+	  result = chunkptr;
+
+	  /* Update the free chain.  */
+	  heapdesc stillfree = {chunk.raw};
+	  stillfree.desc.offset += size;
+	  stillfree.desc.size -= size;
+	  uint32_t *stillfreeptr = (uint32_t*)(shared_pool
+					       + stillfree.desc.offset);
+
+	  if (stillfree.desc.size == 0)
+	    /* The whole chunk was used.  */
+	    stillfree.raw = onward_chain.raw;
+	  else
+	    /* The chunk was split, so restore the onward chain.  */
+	    stillfreeptr[0] = onward_chain.raw;
+
+	  /* The previous free slot or root now points to stillfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = stillfree.raw;
+	  else
+	    root.raw = stillfree.raw;
+	}
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+      return result;
+    }
+  else
+    return malloc (size);
+}
+
+static void *
+nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      uint64_t *result = nvptx_memspace_alloc (memspace, size);
+      if (result)
+	/* Inline memset in which we know size is a multiple of 8.  */
+	for (unsigned i = 0; i < (unsigned)size/8; i++)
+	  result[i] = 0;
+
+      return result;
+    }
+  else
+    return calloc (1, size);
+}
+
+static void
+nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      size = (size + 7) & ~7;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain to find where to insert a new entry.  */
+      heapdesc chunk = {root.raw}, prev_chunk;
+      uint32_t *prev_chunkptr = NULL, *prevprev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && addr > (void*)chunkptr)
+	{
+	  prev_chunk.raw = chunk.raw;
+	  chunk.raw = onward_chain.raw;
+	  prevprev_chunkptr = prev_chunkptr;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      /* Create the new chunk descriptor.  */
+      heapdesc newfreechunk;
+      newfreechunk.desc.offset = (uint16_t)((uintptr_t)addr
+					    - (uintptr_t)shared_pool);
+      newfreechunk.desc.size = (uint16_t)size;
+
+      /* Coalesce adjacent free chunks.  */
+      if (newfreechunk.desc.offset + size == chunk.desc.offset)
+	{
+	  /* Free chunk follows.  */
+	  newfreechunk.desc.size += chunk.desc.size;
+	  chunk.raw = onward_chain.raw;
+	}
+      if (prev_chunkptr)
+	{
+	  if (prev_chunk.desc.offset + prev_chunk.desc.size
+	      == newfreechunk.desc.offset)
+	    {
+	      /* Free chunk precedes.  */
+	      newfreechunk.desc.offset = prev_chunk.desc.offset;
+	      newfreechunk.desc.size += prev_chunk.desc.size;
+	      addr = shared_pool + prev_chunk.desc.offset;
+	      prev_chunkptr = prevprev_chunkptr;
+	    }
+	}
+
+      /* Update the free chain in the new and previous chunks.  */
+      ((uint32_t*)addr)[0] = chunk.raw;
+      if (prev_chunkptr)
+	prev_chunkptr[0] = newfreechunk.raw;
+      else
+	root.raw = newfreechunk.raw;
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+    }
+  else
+    free (addr);
+}
+
+static void *
+nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+			size_t oldsize, size_t size)
+{
+  if (memspace == omp_low_lat_mem_space)
+    {
+      char *shared_pool;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+
+      /* Memory is allocated in 8-byte granularity.  */
+      oldsize = (oldsize + 7) & ~7;
+      size = (size + 7) & ~7;
+
+      if (oldsize == size)
+	return addr;
+
+      /* Acquire a lock on the low-latency heap.  */
+      heapdesc root;
+      do
+	{
+	  root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
+					  0xffffffff, MEMMODEL_ACQUIRE);
+	  if (root.raw != 0xffffffff)
+	    break;
+	  /* Spin.  */
+	}
+      while (1);
+
+      /* Walk the free chain.  */
+      heapdesc chunk = {root.raw};
+      uint32_t *prev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && (void*)chunkptr < addr)
+	{
+	  chunk.raw = onward_chain.raw;
+	  prev_chunkptr = chunkptr;
+	  chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+	  onward_chain.raw = chunkptr[0];
+	}
+
+      void *result = NULL;
+      if (size < oldsize)
+	{
+	  /* The new allocation is smaller than the old; we can always
+	     shrink an allocation in place.  */
+	  result = addr;
+
+	  uint32_t *nowfreeptr = (uint32_t*)(addr + size);
+
+	  /* Update the free chain.  */
+	  heapdesc nowfree;
+	  nowfree.desc.offset = (char*)nowfreeptr - shared_pool;
+	  nowfree.desc.size = oldsize - size;
+
+	  if (nowfree.desc.offset + nowfree.desc.size == chunk.desc.offset)
+	    {
+	      /* Coalesce following free chunk.  */
+	      nowfree.desc.size += chunk.desc.size;
+	      nowfreeptr[0] = onward_chain.raw;
+	    }
+	  else
+	    nowfreeptr[0] = chunk.raw;
+
+	  /* The previous free slot or root now points to nowfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = nowfree.raw;
+	  else
+	    root.raw = nowfree.raw;
+	}
+      else if (chunk.desc.size != 0
+	       && (char *)addr + oldsize == (char *)chunkptr
+	       && chunk.desc.size >= size-oldsize)
+	{
+	  /* The new allocation is larger than the old, and we found a
+	     large enough free block right after the existing block,
+	     so we extend into that space.  */
+	  result = addr;
+
+	  uint16_t delta = size-oldsize;
+
+	  /* Update the free chain.  */
+	  heapdesc stillfree = {chunk.raw};
+	  stillfree.desc.offset += delta;
+	  stillfree.desc.size -= delta;
+	  uint32_t *stillfreeptr = (uint32_t*)(shared_pool
+					       + stillfree.desc.offset);
+
+	  if (stillfree.desc.size == 0)
+	    /* The whole chunk was used.  */
+	    stillfree.raw = onward_chain.raw;
+	  else
+	    /* The chunk was split, so restore the onward chain.  */
+	    stillfreeptr[0] = onward_chain.raw;
+
+	  /* The previous free slot or root now points to stillfree.  */
+	  if (prev_chunkptr)
+	    prev_chunkptr[0] = stillfree.raw;
+	  else
+	    root.raw = stillfree.raw;
+	}
+      /* Else realloc in-place has failed and result remains NULL.  */
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+      if (result == NULL)
+	{
+	  /* The allocation could not be extended in place, so we simply
+	     allocate fresh memory and move the data.  If we can't allocate
+	     from low-latency memory then we leave the original allocation
+	     intact and return NULL.
+	     We could do a fall-back to main memory, but we don't know what
+	     the fall-back trait said to do.  */
+	  result = nvptx_memspace_alloc (memspace, size);
+	  if (result != NULL)
+	    {
+	      /* Inline memcpy in which we know oldsize is a multiple of 8.  */
+	      uint64_t *from = addr, *to = result;
+	      for (unsigned i = 0; i < (unsigned)oldsize/8; i++)
+		to[i] = from[i];
+
+	      nvptx_memspace_free (memspace, addr, oldsize);
+	    }
+	}
+      return result;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_alloc (MEMSPACE, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_calloc (MEMSPACE, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+
+#include "../../allocator.c"
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 6923416fb4e..65a7af3417b 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -33,9 +33,13 @@
 
 struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
 int __gomp_team_num __attribute__((shared,nocommon));
+uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
+/* There should be some .shared space reserved for us.  There's no way to
+   express this magic extern sizeless array in C so use asm.  */
+asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
 
 /* This externally visible function handles target region entry.  It
    sets up a per-team thread pool and transfers control by calling FN (FN_DATA)
@@ -63,6 +67,30 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
       nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
       memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
 
+      /* Find the low-latency heap details ....  */
+      uint32_t *shared_pool;
+      uint32_t shared_pool_size = 0;
+      asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+#if __PTX_ISA_VERSION_MAJOR__ > 4 \
+    || (__PTX_ISA_VERSION_MAJOR__ == 4 && __PTX_ISA_VERSION_MINOR__ >= 1)
+      asm ("mov.u32\t%0, %%dynamic_smem_size;\n"
+	   : "=r"(shared_pool_size));
+#endif
+
+      /* ... and initialize it with an empty free-chain.  */
+      union {
+	uint32_t raw;
+	struct {
+	  uint16_t offset;
+	  uint16_t size;
+	} desc;
+      } root;
+      root.desc.offset = 0;		 /* The first byte is free.  */
+      root.desc.size = shared_pool_size; /* The whole space is free.  */
+      shared_pool[0] = 0;		 /* Terminate free chain.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+      /* Initialize the thread pool.  */
       struct gomp_thread_pool *pool = alloca (sizeof (*pool));
       pool->threads = alloca (ntids * sizeof (*pool->threads));
       for (tid = 0; tid < ntids; tid++)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..40739ba592d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,6 +334,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+/* OpenMP kernels reserve a small amount of ".shared" space for use by
+   omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
+   default is set here.  */
+static unsigned lowlat_pool_size = 8*1024;
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -1205,6 +1210,22 @@ GOMP_OFFLOAD_init_device (int n)
       instantiated_devices++;
     }
 
+  const char *var_name = "GOMP_NVPTX_LOWLAT_POOL";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var != NULL)
+    {
+      char *endptr;
+      unsigned long val = strtoul (env_var, &endptr, 10);
+      if (endptr == NULL || *endptr != '\0'
+	  || errno == ERANGE || errno == EINVAL
+	  || val > UINT_MAX)
+	GOMP_PLUGIN_error ("Error parsing %s", var_name);
+      else
+	lowlat_pool_size = val;
+    }
+
   pthread_mutex_unlock (&ptx_dev_lock);
 
   return dev != NULL;
@@ -2030,7 +2051,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 		     " [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n",
 		     __FUNCTION__, fn_name, teams, threads);
   r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
-			 32, threads, 1, 0, NULL, NULL, config);
+			 32, threads, 1, lowlat_pool_size, NULL, NULL, config);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-1.c b/libgomp/testsuite/libgomp.c/allocators-1.c
new file mode 100644
index 00000000000..04968e4c83d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+
+/* Test that omp_alloc returns usable memory.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int *a;
+    a = (int *) omp_alloc(n*sizeof(int), allocator);
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      a[i] = i;
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    omp_free(a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit
+  test (100000, omp_default_mem_alloc);
+  test (100000, omp_large_cap_mem_alloc);
+  test (100000, omp_const_mem_alloc);
+  test (100000, omp_high_bw_mem_alloc);
+  test (100000, omp_low_lat_mem_alloc);
+  test (100000, omp_cgroup_mem_alloc);
+  test (100000, omp_pteam_mem_alloc);
+  test (100000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-2.c b/libgomp/testsuite/libgomp.c/allocators-2.c
new file mode 100644
index 00000000000..a98f1b4c05e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+/* Test concurrent and repeated allocations.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int **a;
+    a = (int **) omp_alloc(n*sizeof(int*), allocator);
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      {
+	/* Use 10x to ensure we do activate low-latency fall-back.  */
+	a[i] = omp_alloc(sizeof(int)*10, allocator);
+	a[i][0] = i;
+      }
+
+    for (int i = 0; i < n; i++)
+      if (a[i][0] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      omp_free(a[i], allocator);
+
+    omp_free (a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit (on aggregate)
+  test (1000, omp_default_mem_alloc);
+  test (1000, omp_large_cap_mem_alloc);
+  test (1000, omp_const_mem_alloc);
+  test (1000, omp_high_bw_mem_alloc);
+  test (1000, omp_low_lat_mem_alloc);
+  test (1000, omp_cgroup_mem_alloc);
+  test (1000, omp_pteam_mem_alloc);
+  test (1000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-3.c b/libgomp/testsuite/libgomp.c/allocators-3.c
new file mode 100644
index 00000000000..45514c2a088
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-3.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+/* Stress-test omp_alloc/omp_free under concurrency.  */
+
+#include <omp.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma omp requires dynamic_allocators
+
+#define N 1000
+
+void
+test (omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:allocator)
+  {
+    #pragma omp parallel for
+    for (int i = 0; i < N; i++)
+      for (int j = 0; j < N; j++)
+	{
+	  int *p = omp_alloc(sizeof(int), allocator);
+	  omp_free(p, allocator);
+	}
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (omp_default_mem_alloc);
+  test (omp_large_cap_mem_alloc);
+  test (omp_const_mem_alloc);
+  test (omp_high_bw_mem_alloc);
+  test (omp_low_lat_mem_alloc);
+  test (omp_cgroup_mem_alloc);
+  test (omp_pteam_mem_alloc);
+  test (omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c
new file mode 100644
index 00000000000..9fa6aa1624f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-4.c
@@ -0,0 +1,196 @@
+/* { dg-do run } */
+
+/* Test that low-latency free chains are sound.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+check (int cond, const char *msg)
+{
+  if (!cond)
+    {
+      __builtin_printf ("%s\n", msg);
+      __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							1, traits);
+
+    int size = 4;
+
+    char *a = omp_alloc(size, lowlat);
+    char *b = omp_alloc(size, lowlat);
+    char *c = omp_alloc(size, lowlat);
+    char *d = omp_alloc(size, lowlat);
+
+    /* There are headers and padding to account for.  */
+    int size2 = size + (b-a);
+    int size3 = size + (c-a);
+    int size4 = size + (d-a) + 100; /* Random larger amount.  */
+
+    check (a != NULL && b != NULL && c != NULL && d != NULL,
+	   "omp_alloc returned NULL");
+
+    omp_free(a, lowlat);
+    char *p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not reuse first chunk");
+
+    omp_free(b, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not reuse second chunk");
+
+    omp_free(c, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not reuse third chunk");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == a, "allocate did not coalesce first two chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2)");
+
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == b, "allocate did not coalesce middle two chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2)");
+
+    omp_free(b, lowlat);
+    omp_free(a, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == a, "allocate did not coalesce first two chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), reverse free");
+
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size2, lowlat);
+    check (p == b, "allocate did not coalesce second two chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), reverse free");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3)");
+
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    omp_free(d, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce last three chunks");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1)");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2)");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3)");
+
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    omp_free(a, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3), reverse free");
+
+    omp_free(d, lowlat);
+    omp_free(c, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce second three chunks, reverse free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), reverse free");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3), reverse free");
+
+    omp_free(c, lowlat);
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == a, "allocate did not coalesce first three chunks, mixed free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == a, "allocate did not split first chunk (1), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split first chunk (2), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split first chunk (3), mixed free");
+
+    omp_free(d, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    p = omp_alloc (size3, lowlat);
+    check (p == b, "allocate did not coalesce second three chunks, mixed free");
+
+    omp_free(p, lowlat);
+    p = omp_alloc (size, lowlat);
+    check (p == b, "allocate did not split second chunk (1), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == c, "allocate did not split second chunk (2), mixed free");
+    p = omp_alloc (size, lowlat);
+    check (p == d, "allocate did not split second chunk (3), mixed free");
+
+    omp_free(a, lowlat);
+    omp_free(b, lowlat);
+    omp_free(c, lowlat);
+    omp_free(d, lowlat);
+    p = omp_alloc(size4, lowlat);
+    check (p == a, "allocate did not coalesce all memory");
+  }
+
+  return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.c/allocators-5.c b/libgomp/testsuite/libgomp.c/allocators-5.c
new file mode 100644
index 00000000000..9694010cf1f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-5.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+/* Test that omp_calloc returns usable, zeroed memory.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int *a;
+    a = (int *) omp_calloc(n, sizeof(int), allocator);
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != 0)
+	{
+	  __builtin_printf ("memory not zeroed at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      a[i] = i;
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != i)
+	{
+	  __builtin_printf ("data mismatch at %i\n", i);
+	  __builtin_abort ();
+	}
+
+    omp_free(a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit
+  test (100000, omp_default_mem_alloc);
+  test (100000, omp_large_cap_mem_alloc);
+  test (100000, omp_const_mem_alloc);
+  test (100000, omp_high_bw_mem_alloc);
+  test (100000, omp_low_lat_mem_alloc);
+  test (100000, omp_cgroup_mem_alloc);
+  test (100000, omp_pteam_mem_alloc);
+  test (100000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c
new file mode 100644
index 00000000000..90bf73095ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-6.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+
+/* Test that low-latency realloc and free chains are sound.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+check (int cond, const char *msg)
+{
+  if (!cond)
+    {
+      __builtin_printf ("%s\n", msg);
+      __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							1, traits);
+
+    int size = 16;
+
+    char *a = (char *)omp_alloc(size, lowlat);
+    char *b = (char *)omp_alloc(size, lowlat);
+    char *c = (char *)omp_alloc(size, lowlat);
+    char *d = (char *)omp_alloc(size, lowlat);
+
+    /* There are headers and padding to account for.  */
+    int size2 = size + (b-a);
+    int size3 = size + (c-a);
+    int size4 = size + (d-a) + 100; /* Random larger amount.  */
+
+    check (a != NULL && b != NULL && c != NULL && d != NULL,
+	   "omp_alloc returned NULL");
+
+    char *p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse same size chunk, no space after");
+
+    p = omp_realloc (b, size-8, lowlat, lowlat);
+    check (p == b, "realloc did not reuse smaller chunk, no space after");
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse original size chunk, no space after");
+
+    /* Make space after b.  */
+    omp_free(c, lowlat);
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse same size chunk");
+
+    p = omp_realloc (b, size-8, lowlat, lowlat);
+    check (p == b, "realloc did not reuse smaller chunk");
+
+    p = omp_realloc (b, size, lowlat, lowlat);
+    check (p == b, "realloc did not reuse original size chunk");
+
+    p = omp_realloc (b, size+8, lowlat, lowlat);
+    check (p == b, "realloc did not extend in place by a little");
+
+    p = omp_realloc (b, size2, lowlat, lowlat);
+    check (p == b, "realloc did not extend into whole next chunk");
+
+    p = omp_realloc (b, size3, lowlat, lowlat);
+    check (p != b, "realloc did not move b elsewhere");
+    omp_free (p, lowlat);
+
+
+    p = omp_realloc (a, size, lowlat, lowlat);
+    check (p == a, "realloc did not reuse same size chunk, first position");
+
+    p = omp_realloc (a, size-8, lowlat, lowlat);
+    check (p == a, "realloc did not reuse smaller chunk, first position");
+
+    p = omp_realloc (a, size, lowlat, lowlat);
+    check (p == a, "realloc did not reuse original size chunk, first position");
+
+    p = omp_realloc (a, size+8, lowlat, lowlat);
+    check (p == a, "realloc did not extend in place by a little, first position");
+
+    p = omp_realloc (a, size3, lowlat, lowlat);
+    check (p == a, "realloc did not extend into whole next chunk, first position");
+
+    p = omp_realloc (a, size4, lowlat, lowlat);
+    check (p != a, "realloc did not move a elsewhere, first position");
+    omp_free (p, lowlat);
+
+
+    p = omp_realloc (d, size, lowlat, lowlat);
+    check (p == d, "realloc did not reuse same size chunk, last position");
+
+    p = omp_realloc (d, size-8, lowlat, lowlat);
+    check (p == d, "realloc did not reuse smaller chunk, last position");
+
+    p = omp_realloc (d, size, lowlat, lowlat);
+    check (p == d, "realloc did not reuse original size chunk, last position");
+
+    p = omp_realloc (d, size+8, lowlat, lowlat);
+    check (p == d, "realloc did not extend in place by a little, last position");
+
+    /* Larger than low-latency memory.  */
+    p = omp_realloc(d, 100000000, lowlat, lowlat);
+    check (p == NULL, "realloc did not fail on OOM");
+  }
+
+  return 0;
+}
+

From patchwork Thu Jul  7 10:34:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55821
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 600733851157
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:35:36 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id A08CF3857410
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:35:16 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A08CF3857410
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112661"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:35:16 -0800
IronPort-SDR: 
 cP5AheROrDqrZXTC8loXu/qO7zMWONe2sLqaOwItn77ptqJ+JTExcHM128d3JCDs7AWXyMm2E0
 8tL+mPUCPQ6EjjfguEgBAonE2ap8tuhHyCl1i2rtwQm0MUD26vAQ2mRoqnrEFRTvud9I3LoGt6
 DuFY3cEaDkz8K1RWmr9X89AwFuxUByCorofK3PHg7hgRiLbGx23gcA8f5WZxRNYhigZdYa+S7x
 qt5l6EuOP34rbLNE9FqeZd/6e+UOp782SDkB2/XXL+1WeClVBYrvhoYlKGemNmmG7lAQcdFPt7
 5KA=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 02/17] libgomp: pinned memory
Date: Thu, 7 Jul 2022 11:34:33 +0100
Message-ID: 
 <fdd8ab97564dca31c8c1c1cc54b7b437981bea3c.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
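
As a rough illustration of the approach (a minimal sketch only, not the code
added by this patch; the helper names pinned_alloc and pinned_free are
hypothetical), a pinned allocation that never shares a page with another
allocation can be built from mmap and mlock, and released with munmap:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: return SIZE bytes of zero-filled, pinned memory on
   its own pages, or NULL if the mapping or the lock fails.  */
static void *
pinned_alloc (size_t size)
{
  /* mmap rejects zero-length mappings, so treat that as a caller bug.  */
  assert (size > 0);

  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;

  if (mlock (addr, size) != 0)
    {
      /* Most likely RLIMIT_MEMLOCK is too low; give the pages back.  */
      munmap (addr, size);
      return NULL;
    }

  return addr;
}

/* Hypothetical helper: SIZE must be the size passed to pinned_alloc.  */
static void
pinned_free (void *addr, size_t size)
{
  /* Unmapping discards the pages and, with them, the lock, so no other
     allocation can be unpinned by accident.  */
  munmap (addr, size);
}
```

A caller would pair pinned_alloc (len) with pinned_free (p, len) and has to
remember the length itself, which is why the MEMSPACE_FREE macro takes a
SIZE argument.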

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
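	(omp_init_allocator): Don't disallow the pinned trait.
	(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	(omp_free): Likewise.
	* config/linux/allocator.c (_GNU_SOURCE): Define.
	Include sys/mman.h, string.h, and libgomp.h.
	(linux_memspace_alloc): New function.
	(linux_memspace_calloc): New function.
	(linux_memspace_free): New function.
	(linux_memspace_realloc): New function.
	(MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.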
	* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
	(MEMSPACE_CALLOC): Add PIN.
	(MEMSPACE_REALLOC): Add PIN.
	(MEMSPACE_FREE): Add PIN.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.
	* testsuite/libgomp.c/alloc-pinned-3.c: New test.
	* testsuite/libgomp.c/alloc-pinned-4.c: New test.
---
 libgomp/allocator.c                          |  67 ++++++----
 libgomp/config/linux/allocator.c             |  99 ++++++++++++++
 libgomp/config/nvptx/allocator.c             |   8 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  95 +++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 101 ++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 130 ++++++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 132 +++++++++++++++++++
 7 files changed, 602 insertions(+), 30 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 9b33bcf529b..54310ab93ca 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -39,16 +39,20 @@
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  ((PIN) ? NULL : malloc (SIZE))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  ((PIN) ? NULL : calloc (1, SIZE))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  ((PIN) ? (void) 0 : free (ADDR))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -351,10 +355,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
       break;
     }
 
-  /* No support for this so far.  */
-  if (data.pinned)
-    return omp_null_allocator;
-
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
   *ret = data;
 #ifndef HAVE_SYNC_BUILTINS
@@ -481,7 +481,8 @@ retry:
 	}
       else
 #endif
-	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			      allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -511,7 +512,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size,
+				allocator_data && allocator_data->pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -542,9 +544,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -596,6 +598,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
   struct omp_mem_header *data;
   omp_memspace_handle_t memspace __attribute__((unused))
     = omp_default_mem_space;
+  int pinned __attribute__((unused)) = false;
 
   if (ptr == NULL)
     return;
@@ -627,6 +630,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
       memspace = allocator_data->memspace;
+      pinned = allocator_data->pinned;
     }
   else
     {
@@ -651,7 +655,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
       memspace = predefined_alloc_mapping[data->allocator];
     }
 
-  MEMSPACE_FREE (memspace, data->ptr, data->size);
+  MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
 }
 
 ialias (omp_free)
@@ -767,7 +771,8 @@ retry:
 	}
       else
 #endif
-	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size,
+			       allocator_data->pinned);
       if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -797,7 +802,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size,
+				 allocator_data && allocator_data->pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -828,9 +834,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
@@ -1021,9 +1027,13 @@ retry:
 #endif
       if (prev_size)
 	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-				    data->size, new_size);
+				    data->size, new_size,
+				    (free_allocator_data
+				     && free_allocator_data->pinned),
+				    allocator_data->pinned);
       else
-	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+				  allocator_data->pinned);
       if (new_ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -1069,10 +1079,14 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
+	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size,
+				      (free_allocator_data
+				       && free_allocator_data->pinned),
+				      allocator_data && allocator_data->pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
+
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
       ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
       ((struct omp_mem_header *) ret)[-1].size = new_size;
@@ -1095,7 +1109,8 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_ALLOC (memspace, new_size);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size,
+				    allocator_data && allocator_data->pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1151,9 +1166,9 @@ fail:
 #ifdef LIBGOMP_USE_MEMKIND
 	  || memkind
 #endif
-	  || (allocator_data
-	      && allocator_data->pool_size < ~(uintptr_t) 0)
-	  || !allocator_data)
+	  || !allocator_data
+	  || allocator_data->pool_size < ~(uintptr_t) 0
+	  || allocator_data->pinned)
 	{
 	  allocator = omp_default_mem_alloc;
 	  goto retry;
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index b73acce9121..1496e41875c 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -33,4 +33,103 @@
 #define LIBGOMP_USE_MEMKIND
 #endif
 
+/* Implement malloc routines that can handle pinned memory on Linux.
+
+   It's possible to use mlock on any heap memory, but using munlock is
+   problematic if there are multiple pinned allocations on the same page.
+   Tracking all that manually would be possible, but adds overhead.  This may
+   be worth it if there are a lot of small allocations getting pinned, but
+   this seems less likely in an HPC application.
+
+   Instead we optimize for large pinned allocations, and use mmap to ensure
+   that two pinned allocations don't share the same page.  This also means
+   that large allocations don't pin extra pages by being poorly aligned.  */
+
+#define _GNU_SOURCE
+#include <sys/mman.h>
+#include <string.h>
+#include "libgomp.h"
+
+static void *
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    {
+      void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
+			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+      if (addr == MAP_FAILED)
+	return NULL;
+
+      if (mlock (addr, size))
+	{
+	  gomp_debug (0, "libgomp: failed to pin memory (ulimit too low?)\n");
+	  munmap (addr, size);
+	  return NULL;
+	}
+
+      return addr;
+    }
+  else
+    return malloc (size);
+}
+
+static void *
+linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
+{
+  if (pin)
+    return linux_memspace_alloc (memspace, size, pin);
+  else
+    return calloc (1, size);
+}
+
+static void
+linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
+		     int pin)
+{
+  (void)memspace;
+
+  if (pin)
+    munmap (addr, size);
+  else
+    free (addr);
+}
+
+static void *
+linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+			size_t oldsize, size_t size, int oldpin, int pin)
+{
+  if (oldpin && pin)
+    {
+      void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
+      if (newaddr == MAP_FAILED)
+	return NULL;
+
+      return newaddr;
+    }
+  else if (oldpin || pin)
+    {
+      void *newaddr = linux_memspace_alloc (memspace, size, pin);
+      if (newaddr)
+	{
+	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+	  linux_memspace_free (memspace, addr, oldsize, oldpin);
+	}
+
+      return newaddr;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_alloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  linux_memspace_calloc (MEMSPACE, SIZE, PIN)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  linux_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  linux_memspace_free (MEMSPACE, ADDR, SIZE, PIN)
+
 #include "../../allocator.c"
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6bc2ea48043..f740b97f6ac 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,13 +358,13 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_calloc (MEMSPACE, SIZE)
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 00000000000..79792b16d83
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
new file mode 100644
index 00000000000..228c656b715
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works (pool_size code path).  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space,
+							 2, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+  if (!p)
+    abort ();
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+  if (!p)
+    abort ();
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
new file mode 100644
index 00000000000..90539ffe3e0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -0,0 +1,130 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 2, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 2, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to no realloc needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
new file mode 100644
index 00000000000..534e49eefc4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -0,0 +1,132 @@
+/* { dg-do run } */
+
+/* Test that pinned memory fails correctly, pool_size code path.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Pinned memory, no fallback.  */
+  const omp_alloctrait_t traits1[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_null_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 3, traits1);
+
+  /* Pinned memory, plain memory fallback.  */
+  const omp_alloctrait_t traits2[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_fallback, omp_atv_default_mem_fb },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 3, traits2);
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, allocator1);
+  if (p)
+    abort ();
+
+  // Should fall back
+  p = omp_alloc (SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fall back
+  p = omp_calloc (1, SIZE, allocator2);
+  if (!p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // Should fall back to no realloc needed
+  p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc);
+  if (p != notpinned)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}

From patchwork Thu Jul  7 10:34:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55824
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc
Date: Thu, 7 Jul 2022 11:34:34 +0100
Message-ID: 
 <d148064484d92246255382c7aa00e4869c57c12b.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
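
As a rough illustration of that equivalence, the sketch below models how
the patch resolves the pinned flag and the fallback for a predefined
handle versus a custom allocator.  It is not libgomp code: every demo_*
name is a hypothetical stand-in for the omp.h types, and only the two
conditional expressions are taken from the hunks in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the omp.h enumerators; not the real API.  */
typedef enum
{
  demo_default_mem_alloc = 1,
  demo_thread_mem_alloc = 8,
  demo_pinned_mem_alloc = 9	/* Plays the role of ompx_pinned_mem_alloc.  */
} demo_allocator_t;

typedef enum { demo_null_fb, demo_default_mem_fb } demo_fallback_t;

/* Traits of a custom allocator.  A NULL pointer means "predefined".  */
struct demo_allocator_data
{
  bool pinned;
  demo_fallback_t fallback;
};

/* Mirrors the "pinned" conditional added to omp_aligned_alloc and friends.
   In this model the handle is ignored when DATA is non-NULL.  */
static bool
demo_pinned (const struct demo_allocator_data *data, demo_allocator_t a)
{
  return data ? data->pinned : a == demo_pinned_mem_alloc;
}

/* Mirrors the fallback selection at the "fail:" labels.  */
static demo_fallback_t
demo_fallback (const struct demo_allocator_data *data, demo_allocator_t a)
{
  if (data)
    return data->fallback;
  return ((a == demo_default_mem_alloc || a == demo_pinned_mem_alloc)
	  ? demo_null_fb : demo_default_mem_fb);
}

/* The predefined handle should behave like a custom allocator created
   with the pinned trait and the null fallback trait.  */
static void
demo_check_equivalence (void)
{
  const struct demo_allocator_data custom = { true, demo_null_fb };

  assert (demo_pinned (&custom, demo_default_mem_alloc)
	  == demo_pinned (NULL, demo_pinned_mem_alloc));
  assert (demo_fallback (&custom, demo_default_mem_alloc)
	  == demo_fallback (NULL, demo_pinned_mem_alloc));

  /* Other predefined allocators stay unpinned.  */
  assert (!demo_pinned (NULL, demo_thread_mem_alloc));
}
```

Calling demo_check_equivalence () from a small main is enough to exercise
the model; real code would simply pass ompx_pinned_mem_alloc to omp_alloc,
as the new tests do.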

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
	(omp_free): Likewise.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
	* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
	* testsuite/libgomp.c/alloc-pinned-5.c: New test.
	* testsuite/libgomp.c/alloc-pinned-6.c: New test.
	* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
---
 libgomp/allocator.c                           |  60 +++++++----
 libgomp/omp.h.in                              |   1 +
 libgomp/omp_lib.f90.in                        |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  |  90 ++++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 ++++++++++++++++++
 .../libgomp.fortran/alloc-pinned-1.f90        |  16 +++
 6 files changed, 252 insertions(+), 18 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 54310ab93ca..029d0d40a36 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include <dlfcn.h>
 #endif
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -67,6 +67,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+  omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
 };
 
 enum gomp_memkind_kind
@@ -512,8 +513,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_ALLOC (memspace, new_size,
-				allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -534,7 +538,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -653,6 +658,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
 #endif
 
       memspace = predefined_alloc_mapping[data->allocator];
+      pinned = (data->allocator == ompx_pinned_mem_alloc);
     }
 
   MEMSPACE_FREE (memspace, data->ptr, data->size, pinned);
@@ -802,8 +808,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  ptr = MEMSPACE_CALLOC (memspace, new_size,
-				 allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  ptr = MEMSPACE_CALLOC (memspace, new_size, pinned);
 	}
       if (ptr == NULL)
 	goto fail;
@@ -824,7 +833,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1026,11 +1036,15 @@ retry:
       else
 #endif
       if (prev_size)
-	new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
-				    data->size, new_size,
-				    (free_allocator_data
-				     && free_allocator_data->pinned),
-				    allocator_data->pinned);
+	{
+	  int was_pinned __attribute__((unused))
+	    = (free_allocator_data
+	       ? free_allocator_data->pinned
+	       : free_allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr,
+				      data->size, new_size, was_pinned,
+				      allocator_data->pinned);
+	}
       else
 	new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
 				  allocator_data->pinned);
@@ -1079,10 +1093,16 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
+	  int was_pinned __attribute__((unused))
+	    = (free_allocator_data
+	       ? free_allocator_data->pinned
+	       : free_allocator == ompx_pinned_mem_alloc);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
 	  new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size,
-				      (free_allocator_data
-				       && free_allocator_data->pinned),
-				      allocator_data && allocator_data->pinned);
+				      was_pinned, pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1109,8 +1129,11 @@ retry:
 	    = (allocator_data
 	       ? allocator_data->memspace
 	       : predefined_alloc_mapping[allocator]);
-	  new_ptr = MEMSPACE_ALLOC (memspace, new_size,
-				    allocator_data && allocator_data->pinned);
+	  int pinned __attribute__((unused))
+	    = (allocator_data
+	       ? allocator_data->pinned
+	       : allocator == ompx_pinned_mem_alloc);
+	  new_ptr = MEMSPACE_ALLOC (memspace, new_size, pinned);
 	}
       if (new_ptr == NULL)
 	goto fail;
@@ -1156,7 +1179,8 @@ retry:
 fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
-		  : allocator == omp_default_mem_alloc
+		  : (allocator == omp_default_mem_alloc
+		     || allocator == ompx_pinned_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 925a650135e..eb071aa2e00 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_cgroup_mem_alloc = 6,
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 9,
   __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
 } omp_allocator_handle_t;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 7ba115f3a1a..10610d64cfe 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -158,6 +158,8 @@
                  parameter :: omp_pteam_mem_alloc = 7
         integer (kind=omp_allocator_handle_kind), &
                  parameter :: omp_thread_mem_alloc = 8
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_pinned_mem_alloc = 9
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_default_mem_space = 0
         integer (omp_memspace_handle_kind), &
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-5.c b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
new file mode 100644
index 00000000000..315c7161a39
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that ompx_pinned_mem_alloc works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, ompx_pinned_mem_alloc, ompx_pinned_mem_alloc);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
new file mode 100644
index 00000000000..bbe20c04875
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+
+/* Test that ompx_pinned_mem_alloc fails correctly.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val) == 1)
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+
+void
+set_pin_limit (int size)
+{
+  struct rlimit limit;
+  if (getrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+  limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size);
+  if (setrlimit (RLIMIT_MEMLOCK, &limit))
+    abort ();
+}
+#else
+#define PAGE_SIZE 10000*1024 /* unknown */
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+void
+set_pin_limit (int size)
+{
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* This needs to be large enough to cover multiple pages.  */
+  const int SIZE = PAGE_SIZE*4;
+
+  /* Ensure that the limit is smaller than the allocation.  */
+  set_pin_limit (SIZE/2);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  // Should fail
+  void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc);
+  if (p)
+    abort ();
+
+  // Should fail
+  p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc);
+  if (p)
+    abort ();
+
+  // Should fail to realloc
+  void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc);
+  p = omp_realloc (notpinned, SIZE, ompx_pinned_mem_alloc, omp_default_mem_alloc);
+  if (!notpinned || p)
+    abort ();
+
+  // No memory should have been pinned
+  int amount = get_pinned_mem ();
+  if (amount != 0)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
new file mode 100644
index 00000000000..798dc3d5a12
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
@@ -0,0 +1,16 @@
+! Ensure that the ompx_pinned_mem_alloc predefined allocator is present and
+! accepted.  The majority of the functionality testing lives in the C tests.
+!
+! { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } }
+
+program main
+  use omp_lib
+  use ISO_C_Binding
+  implicit none (external, type)
+
+  type(c_ptr) :: p
+
+  p = omp_alloc (10_c_size_t, ompx_pinned_mem_alloc);
+  if (.not. c_associated (p)) stop 1
+  call omp_free (p, ompx_pinned_mem_alloc);
+end program main

From patchwork Thu Jul  7 10:34:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55823
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 04/17] openmp, nvptx: low-lat memory access traits
Date: Thu, 7 Jul 2022 11:34:35 +0100
Message-ID: 
 <2810723bd4e98723e5b9eca476eb7e981590c81a.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

The NVPTX low-latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator now implies the "pteam" access trait.
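
The rule can be modelled in a few lines.  The sketch below is an
illustration only: the demo_* enums are hypothetical stand-ins for
omp_memspace_handle_t and the access trait values, and the predicate copies
the condition used by nvptx_memspace_validate.  In this patch the check
runs only when allocator_data is non-NULL, so predefined allocators are not
rejected by it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real enumerators live in omp.h.  */
typedef enum
{
  demo_default_mem_space,
  demo_low_lat_mem_space
} demo_memspace_t;

typedef enum
{
  demo_atv_all,
  demo_atv_cgroup,
  demo_atv_pteam,
  demo_atv_thread
} demo_access_t;

/* Same condition as nvptx_memspace_validate: low-latency memory is
   refused only when the allocator must be accessible by all threads.  */
static bool
demo_memspace_validate (demo_memspace_t memspace, demo_access_t access)
{
  return memspace != demo_low_lat_mem_space || access != demo_atv_all;
}

/* Walk through the cases that matter for the updated tests.  */
static void
demo_check_validate (void)
{
  /* Low-latency memory with "all" access is rejected.  */
  assert (!demo_memspace_validate (demo_low_lat_mem_space, demo_atv_all));

  /* Narrower access traits are accepted.  */
  assert (demo_memspace_validate (demo_low_lat_mem_space, demo_atv_pteam));
  assert (demo_memspace_validate (demo_low_lat_mem_space, demo_atv_thread));

  /* Other memory spaces are unaffected.  */
  assert (demo_memspace_validate (demo_default_mem_space, demo_atv_all));
}
```

Because a custom allocator's access trait defaults to "all" in OpenMP, a
low-latency allocator on NVPTX has to request "pteam" explicitly, which is
why allocators-4.c and allocators-6.c gain that trait.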

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-7.c: New test.
---
 libgomp/allocator.c                        | 15 +++++
 libgomp/config/nvptx/allocator.c           | 11 ++++
 libgomp/testsuite/libgomp.c/allocators-4.c |  7 ++-
 libgomp/testsuite/libgomp.c/allocators-6.c |  7 ++-
 libgomp/testsuite/libgomp.c/allocators-7.c | 68 ++++++++++++++++++++++
 5 files changed, 102 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 029d0d40a36..48ab0782e6b 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -54,6 +54,9 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   (PIN ? NULL : free (ADDR))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1
+#endif
 
 /* Map the predefined allocators to the correct memory space.
    The index to this table is the omp_allocator_handle_t enum value.  */
@@ -438,6 +441,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -733,6 +740,10 @@ retry:
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -964,6 +975,10 @@ retry:
     goto fail;
   old_size = data->size;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index f740b97f6ac..0102680b717 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+  /* Disallow use of low-latency memory when it must be accessible by
+     all threads.  */
+  return (memspace != omp_low_lat_mem_space
+	  || access != omp_atv_all);
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
@@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  nvptx_memspace_validate (MEMSPACE, ACCESS)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c
index 9fa6aa1624f..cae27ea33c1 100644
--- a/libgomp/testsuite/libgomp.c/allocators-4.c
+++ b/libgomp/testsuite/libgomp.c/allocators-4.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-							1, traits);
+							2, traits);
 
     int size = 4;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c
index 90bf73095ef..c03233df582 100644
--- a/libgomp/testsuite/libgomp.c/allocators-6.c
+++ b/libgomp/testsuite/libgomp.c/allocators-6.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-							1, traits);
+							2, traits);
 
     int size = 16;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-7.c b/libgomp/testsuite/libgomp.c/allocators-7.c
new file mode 100644
index 00000000000..a0a738b1d1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-7.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+
+/* { dg-require-effective-target offload_device } */
+/* { dg-xfail-if "not implemented" { ! offload_target_nvptx } } */
+
+/* Test that GPU low-latency allocation is limited to team access.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+	  { omp_atk_access, omp_atv_pteam } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+							2, traits);
+
+    omp_alloctrait_t traits_all[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+	  { omp_atk_access, omp_atv_all } };
+    omp_allocator_handle_t lowlat_all
+      = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
+
+    omp_alloctrait_t traits_default[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat_default
+      = omp_init_allocator (omp_low_lat_mem_space, 1, traits_default);
+
+    void *a = omp_alloc(1, lowlat);	    // good
+    void *b = omp_alloc(1, lowlat_all);     // bad
+    void *c = omp_alloc(1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_calloc(1, 1, lowlat);	  // good
+    b = omp_calloc(1, 1, lowlat_all);     // bad
+    c = omp_calloc(1, 1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_realloc(NULL, 1, lowlat, lowlat);		      // good
+    b = omp_realloc(NULL, 1, lowlat_all, lowlat_all);	      // bad
+    c = omp_realloc(NULL, 1, lowlat_default, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+  }
+
+  return 0;
+}
+

From patchwork Thu Jul  7 10:34:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55827
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 0A0FD384D140
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:37:09 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 98273385041B
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:36:48 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 98273385041B
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448537"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:09 -0800
IronPort-SDR: 
 T/40XxOBYa2Z3segwK1sItDTInTzFWbheLHNam0GuLuleXVHluq2EsBB2W2kyDqEbyXl+50J0l
 yYtF3oi2FG6zMqY+Y1VsmXxBWbHr+msNoq6wThnHPDyNsOQ1OPSQEnAC5Q7LDjd4VFpzdgi8oC
 YpoPVU7pxAVffjewua55OqtGYA8470Moe6E0xDfKZd9y1CBmSBcoqNAbkBpYb11z7zp15j3wJ0
 FHLsj5rrkIXd+EawypN//DZFvWT5S/uX00yKD83FLQ8jw4XpqrqVA7tv3c6OCGg0xA92bbZ3dh
 9I8=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc
Date: Thu, 7 Jul 2022 11:34:36 +0100
Message-ID: 
 <ef374d055251b2bc65b97d7e54a0a72d811b869d.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This adds support for using CUDA Managed Memory with omp_alloc.  It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).
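
As an illustration of the fallback change listed in the ChangeLog below, here
is a minimal, self-contained sketch of how the default fallback is chosen for
the predefined allocators once ompx_host_mem_alloc joins the "null fallback"
group.  The sketch_ names are hypothetical; they only mirror the handle values
from omp.h.in and are not the libgomp implementation.

```c
/* Hypothetical mirror of some predefined allocator handles from omp.h.in
   (values as in this patch); the real definitions live in <omp.h>.  */
enum sketch_allocator
{
  SKETCH_DEFAULT_MEM_ALLOC = 1,
  SKETCH_PTEAM_MEM_ALLOC = 7,
  SKETCH_PINNED_MEM_ALLOC = 9,
  SKETCH_UNIFIED_SHARED_MEM_ALLOC = 10,
  SKETCH_HOST_MEM_ALLOC = 11
};

/* Hypothetical stand-ins for omp_atv_null_fb and omp_atv_default_mem_fb.  */
enum sketch_fallback
{
  SKETCH_NULL_FB,
  SKETCH_DEFAULT_MEM_FB
};

/* Fallback used when a predefined allocator (one with no allocator_data)
   fails: allocators that already target ordinary host memory return NULL,
   all others retry in the default memory space.  */
enum sketch_fallback
sketch_predefined_fallback (enum sketch_allocator allocator)
{
  switch (allocator)
    {
    case SKETCH_DEFAULT_MEM_ALLOC:
    case SKETCH_PINNED_MEM_ALLOC:
    case SKETCH_HOST_MEM_ALLOC:
      return SKETCH_NULL_FB;
    default:
      return SKETCH_DEFAULT_MEM_FB;
    }
}
```

Under this model a failed ompx_host_mem_alloc request yields NULL, while a
failed ompx_unified_shared_mem_alloc request is retried in default memory.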

The nvptx plugin is modified to make the necessary CUDA calls, and libgomp
is modified to switch to shared-memory mode for USM-allocated mappings.
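
To illustrate the shared-memory mode, the sketch below models the per-entry
decision added to gomp_map_vars_internal: an address that the optional plugin
hook reports as USM is left in place, and everything else still goes through
the normal copy path.  All sketch_ names are hypothetical, including the toy
predicate that stands in for GOMP_OFFLOAD_is_usm_ptr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the optional plugin hook.  It may be NULL,
   because the plugin is not required to provide it.  */
typedef bool (*sketch_is_usm_ptr_fn) (void *);

/* Toy "managed" region used only by this sketch.  */
char sketch_managed_pool[64];

/* Toy predicate: treat addresses inside sketch_managed_pool as USM.  */
bool
sketch_toy_is_usm_ptr (void *ptr)
{
  uintptr_t p = (uintptr_t) ptr;
  uintptr_t lo = (uintptr_t) sketch_managed_pool;
  return p >= lo && p < lo + sizeof sketch_managed_pool;
}

/* Mark each entry as inlined (visible from host and device, nothing to
   move) or not, and return how many entries still need a device copy.  */
size_t
sketch_plan_mappings (sketch_is_usm_ptr_fn is_usm_ptr, void **hostaddrs,
		      size_t n, bool *inlined)
{
  size_t to_copy = 0;
  for (size_t i = 0; i < n; i++)
    {
      inlined[i] = is_usm_ptr != NULL && is_usm_ptr (hostaddrs[i]);
      if (!inlined[i])
	to_copy++;
    }
  return to_copy;
}
```

With a NULL hook every entry is copied as before, so plugins without USM
support keep their existing behaviour in this model.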

include/ChangeLog:

	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	(CUmemAttach_flags): New.
	(CUpointer_attribute): New.
	(cuMemAllocManaged): New prototype.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fall back for ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
	(GOMP_OFFLOAD_usm_free): New prototype.
	(GOMP_OFFLOAD_is_usm_ptr): New prototype.
	* libgomp.h (gomp_usm_alloc): New prototype.
	(gomp_usm_free): New prototype.
	(gomp_is_usm_ptr): New prototype.
	(struct gomp_device_descr): Add USM functions.
	* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
	and ompx_host_mem_space.
	(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
	ompx_host_mem_alloc.
	* omp_lib.f90.in: Likewise.
	* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
	(cuPointerGetAttribute): Likewise.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
	Call cuMemAllocManaged as appropriate.
	(GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS
	and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(GOMP_OFFLOAD_alloc): Move internals to ...
	(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
	(GOMP_OFFLOAD_usm_alloc): New function.
	(GOMP_OFFLOAD_usm_free): New function.
	(GOMP_OFFLOAD_is_usm_ptr): New function.
	* target.c (gomp_map_vars_internal): Add USM support.
	(gomp_usm_alloc): New function.
	(gomp_usm_free): New function.
	(gomp_load_plugin_for_device): Load the optional USM functions.
	* testsuite/libgomp.c/usm-1.c: New test.
	* testsuite/libgomp.c/usm-2.c: New test.
	* testsuite/libgomp.c/usm-3.c: New test.
	* testsuite/libgomp.c/usm-4.c: New test.
	* testsuite/libgomp.c/usm-5.c: New test.

co-authored-by: Kwok Cheung Yeung  <kcy@codesourcery.com>

---
 include/cuda/cuda.h                 | 12 ++++++
 libgomp/allocator.c                 | 13 ++++--
 libgomp/config/linux/allocator.c    | 48 ++++++++++++++--------
 libgomp/config/nvptx/allocator.c    |  6 +++
 libgomp/libgomp-plugin.h            |  3 ++
 libgomp/libgomp.h                   |  6 +++
 libgomp/omp.h.in                    |  4 ++
 libgomp/omp_lib.f90.in              |  8 ++++
 libgomp/plugin/cuda-lib.def         |  2 +
 libgomp/plugin/plugin-nvptx.c       | 47 ++++++++++++++++++---
 libgomp/target.c                    | 64 +++++++++++++++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-1.c | 24 +++++++++++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++++
 libgomp/testsuite/libgomp.c/usm-3.c | 35 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 ++++++++++++++++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 +++++++++++++
 16 files changed, 340 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 3938d05d150..8135e7c9247 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -77,9 +77,19 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
+typedef enum {
+  CU_MEM_ATTACH_GLOBAL = 0x1
+} CUmemAttach_flags;
+
+typedef enum {
+  CU_POINTER_ATTRIBUTE_IS_MANAGED = 8
+} CUpointer_attribute;
+
 enum {
   CU_EVENT_DEFAULT = 0,
   CU_EVENT_DISABLE_TIMING = 2
@@ -169,6 +179,7 @@ CUresult cuMemGetInfo (size_t *, size_t *);
 CUresult cuMemAlloc (CUdeviceptr *, size_t);
 #define cuMemAllocHost cuMemAllocHost_v2
 CUresult cuMemAllocHost (void **, size_t);
+CUresult cuMemAllocManaged(CUdeviceptr *, size_t, unsigned int);
 CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t);
 #define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2
 CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream);
@@ -195,6 +206,7 @@ CUresult cuModuleLoadData (CUmodule *, const void *);
 CUresult cuModuleUnload (CUmodule);
 CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 					  CUoccupancyB2DSize, size_t, int);
+CUresult cuPointerGetAttribute(void *, CUpointer_attribute, CUdeviceptr);
 typedef void (*CUstreamCallback)(CUstream, CUresult, void *);
 CUresult cuStreamAddCallback(CUstream, CUstreamCallback, void *, unsigned int);
 CUresult cuStreamCreate (CUstream *, unsigned);
diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 48ab0782e6b..ec31f8841a3 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include <dlfcn.h>
 #endif
 
-#define omp_max_predefined_alloc ompx_pinned_mem_alloc
+#define omp_max_predefined_alloc ompx_host_mem_alloc
 
 /* These macros may be overridden in config/<target>/allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -71,6 +71,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
   omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+  ompx_unified_shared_mem_space,  /* ompx_unified_shared_mem_alloc. */
+  ompx_host_mem_space,     /* ompx_host_mem_alloc.  */
 };
 
 enum gomp_memkind_kind
@@ -546,7 +548,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -845,7 +848,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -1195,7 +1199,8 @@ fail:
   int fallback = (allocator_data
 		  ? allocator_data->fallback
 		  : (allocator == omp_default_mem_alloc
-		     || allocator == ompx_pinned_mem_alloc)
+		     || allocator == ompx_pinned_mem_alloc
+		     || allocator == ompx_host_mem_alloc)
 		  ? omp_atv_null_fb
 		  : omp_atv_default_mem_fb);
   switch (fallback)
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 1496e41875c..18235f59775 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -53,9 +53,11 @@
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      return gomp_usm_alloc (size);
+    }
+  else if (pin)
     {
       void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
 			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
@@ -78,7 +80,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    {
+      void *ret = gomp_usm_alloc (size);
+      memset (ret, 0, size);
+      return ret;
+    }
+  else if (memspace == ompx_unified_shared_mem_space
+      || pin)
     return linux_memspace_alloc (memspace, size, pin);
   else
     return calloc (1, size);
@@ -88,9 +97,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    gomp_usm_free (addr);
+  else if (pin)
     munmap (addr, size);
   else
     free (addr);
@@ -100,7 +109,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
-  if (oldpin && pin)
+  if (memspace == ompx_unified_shared_mem_space)
+    goto manual_realloc;
+  else if (oldpin && pin)
     {
       void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
       if (newaddr == MAP_FAILED)
@@ -109,18 +120,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
       return newaddr;
     }
   else if (oldpin || pin)
-    {
-      void *newaddr = linux_memspace_alloc (memspace, size, pin);
-      if (newaddr)
-	{
-	  memcpy (newaddr, addr, oldsize < size ? oldsize : size);
-	  linux_memspace_free (memspace, addr, oldsize, oldpin);
-	}
-
-      return newaddr;
-    }
+    goto manual_realloc;
   else
     return realloc (addr, size);
+
+manual_realloc:
+  void *newaddr = linux_memspace_alloc (memspace, size, pin);
+  if (newaddr)
+    {
+      memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+      linux_memspace_free (memspace, addr, oldsize, oldpin);
+    }
+
+  return newaddr;
 }
 
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 0102680b717..c1a73511623 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
       __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return malloc (size);
 }
@@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
 
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return calloc (1, size);
 }
@@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 	}
       return result;
     }
+  else if (memspace == ompx_host_mem_space)
+    return NULL;
   else
     return realloc (addr, size);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index ab3ed638475..3e609bd3894 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -134,6 +134,9 @@ extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *,
 extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *);
 extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
+extern void *GOMP_OFFLOAD_usm_alloc (int, size_t);
+extern bool GOMP_OFFLOAD_usm_free (int, void *);
+extern bool GOMP_OFFLOAD_is_usm_ptr (void *);
 extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_host2dev (int, void *, const void *, size_t);
 extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c243c4d6cf4..3fdce301372 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1014,6 +1014,9 @@ extern int gomp_pause_host (void);
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
 extern bool gomp_target_task_fn (void *);
+extern void *gomp_usm_alloc (size_t size);
+extern void gomp_usm_free (void *device_ptr);
+extern bool gomp_is_usm_ptr (void *ptr);
 
 /* Splay tree definitions.  */
 typedef struct splay_tree_node_s *splay_tree_node;
@@ -1239,6 +1242,9 @@ struct gomp_device_descr
   __typeof (GOMP_OFFLOAD_unload_image) *unload_image_func;
   __typeof (GOMP_OFFLOAD_alloc) *alloc_func;
   __typeof (GOMP_OFFLOAD_free) *free_func;
+  __typeof (GOMP_OFFLOAD_usm_alloc) *usm_alloc_func;
+  __typeof (GOMP_OFFLOAD_usm_free) *usm_free_func;
+  __typeof (GOMP_OFFLOAD_is_usm_ptr) *is_usm_ptr_func;
   __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func;
   __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func;
   __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func;
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index eb071aa2e00..eea019ad88d 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -120,6 +120,8 @@ typedef enum omp_memspace_handle_t __GOMP_UINTPTR_T_ENUM
   omp_const_mem_space = 2,
   omp_high_bw_mem_space = 3,
   omp_low_lat_mem_space = 4,
+  ompx_unified_shared_mem_space = 5,
+  ompx_host_mem_space = 6,
   __omp_memspace_handle_t_max__ = __UINTPTR_MAX__
 } omp_memspace_handle_t;
 
@@ -135,6 +137,8 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
   omp_pteam_mem_alloc = 7,
   omp_thread_mem_alloc = 8,
   ompx_pinned_mem_alloc = 9,
+  ompx_unified_shared_mem_alloc = 10,
+  ompx_host_mem_alloc = 11,
   __omp_allocator_handle_t_max__ = __UINTPTR_MAX__
 } omp_allocator_handle_t;
 
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 10610d64cfe..39a58b4bc4d 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -160,6 +160,10 @@
                  parameter :: omp_thread_mem_alloc = 8
         integer (kind=omp_allocator_handle_kind), &
                  parameter :: ompx_pinned_mem_alloc = 9
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_unified_shared_mem_alloc = 10
+        integer (kind=omp_allocator_handle_kind), &
+                 parameter :: ompx_host_mem_alloc = 11
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_default_mem_space = 0
         integer (omp_memspace_handle_kind), &
@@ -170,6 +174,10 @@
                  parameter :: omp_high_bw_mem_space = 3
         integer (omp_memspace_handle_kind), &
                  parameter :: omp_low_lat_mem_space = 4
+        integer (omp_memspace_handle_kind), &
+                 parameter :: ompx_unified_shared_mem_space = 5
+        integer (omp_memspace_handle_kind), &
+                 parameter :: ompx_host_mem_space = 6
         integer, parameter :: omp_initial_device = -1
         integer, parameter :: omp_invalid_device = -4
 
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index cd91b39b1d2..b6d03290f35 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -29,6 +29,7 @@ CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2)
 CUDA_ONE_CALL (cuLinkDestroy)
 CUDA_ONE_CALL (cuMemAlloc)
 CUDA_ONE_CALL (cuMemAllocHost)
+CUDA_ONE_CALL (cuMemAllocManaged)
 CUDA_ONE_CALL (cuMemcpy)
 CUDA_ONE_CALL (cuMemcpyDtoDAsync)
 CUDA_ONE_CALL (cuMemcpyDtoH)
@@ -46,6 +47,7 @@ CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
 CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
+CUDA_ONE_CALL (cuPointerGetAttribute)
 CUDA_ONE_CALL (cuStreamAddCallback)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 40739ba592d..2800c0dce6d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1046,11 +1046,13 @@ nvptx_stacks_free (struct ptx_device *ptx_dev, bool force)
 }
 
 static void *
-nvptx_alloc (size_t s, bool suppress_errors)
+nvptx_alloc (size_t s, bool suppress_errors, bool usm)
 {
   CUdeviceptr d;
 
-  CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &d, s);
+  CUresult r = (usm ? CUDA_CALL_NOCHECK (cuMemAllocManaged, &d, s,
+					 CU_MEM_ATTACH_GLOBAL)
+		: CUDA_CALL_NOCHECK (cuMemAlloc, &d, s));
   if (suppress_errors && r == CUDA_ERROR_OUT_OF_MEMORY)
     return NULL;
   else if (r != CUDA_SUCCESS)
@@ -1185,6 +1187,8 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   int num_devices = nvptx_get_num_devices ();
   /* Return -1 if no omp_requires_mask cannot be fulfilled but
      devices were present.  */
+  omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+			 | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
   if (num_devices > 0 && omp_requires_mask != 0)
     return -1;
   return num_devices;
@@ -1432,8 +1436,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   return ret;
 }
 
-void *
-GOMP_OFFLOAD_alloc (int ord, size_t size)
+static void *
+GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool usm)
 {
   if (!nvptx_attach_host_thread_to_device (ord))
     return NULL;
@@ -1456,7 +1460,7 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       blocks = tmp;
     }
 
-  void *d = nvptx_alloc (size, true);
+  void *d = nvptx_alloc (size, true, usm);
   if (d)
     return d;
   else
@@ -1464,10 +1468,22 @@ GOMP_OFFLOAD_alloc (int ord, size_t size)
       /* Memory allocation failed.  Try freeing the stacks block, and
 	 retrying.  */
       nvptx_stacks_free (ptx_dev, true);
-      return nvptx_alloc (size, false);
+      return nvptx_alloc (size, false, usm);
     }
 }
 
+void *
+GOMP_OFFLOAD_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, false);
+}
+
+void *
+GOMP_OFFLOAD_usm_alloc (int ord, size_t size)
+{
+  return GOMP_OFFLOAD_alloc_1 (ord, size, true);
+}
+
 bool
 GOMP_OFFLOAD_free (int ord, void *ptr)
 {
@@ -1475,6 +1491,25 @@ GOMP_OFFLOAD_free (int ord, void *ptr)
 	  && nvptx_free (ptr, ptx_devices[ord]));
 }
 
+bool
+GOMP_OFFLOAD_usm_free (int ord, void *ptr)
+{
+  return GOMP_OFFLOAD_free (ord, ptr);
+}
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  bool managed = false;
+  /* The call has three possible outcomes:
+     CUDA_ERROR_INVALID_VALUE    - Not a CUDA-allocated pointer.
+     CUDA_SUCCESS, managed:false - CUDA-allocated, but not USM.
+     CUDA_SUCCESS, managed:true  - USM.  */
+  CUDA_CALL_NOCHECK (cuPointerGetAttribute, &managed,
+		     CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr)ptr);
+  return managed;
+}
+
 void
 GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum,
 			   void **hostaddrs, void **devaddrs,
diff --git a/libgomp/target.c b/libgomp/target.c
index 4dac81862d7..4e203ae3c06 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1049,6 +1049,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	    tgt->list[i].offset = 0;
 	  continue;
 	}
+      else if (devicep->is_usm_ptr_func
+	       && devicep->is_usm_ptr_func (hostaddrs[i]))
+	{
+	  /* The memory is visible from both host and target
+	     so nothing needs to be moved.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = OFFSET_INLINED;
+	  continue;
+	}
       else if ((kind & typemask) == GOMP_MAP_STRUCT)
 	{
 	  size_t first = i + 1;
@@ -1524,6 +1533,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  continue;
 		}
 	      default:
+		if (tgt->list[i].offset == OFFSET_INLINED)
+		  continue;
 		break;
 	      }
 	    splay_tree_key k = &array->key;
@@ -3401,6 +3412,56 @@ omp_target_free (void *device_ptr, int device_num)
   gomp_mutex_unlock (&devicep->lock);
 }
 
+void *
+gomp_usm_alloc (size_t size)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+						      false);
+  if (devicep == NULL)
+    return NULL;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    return malloc (size);
+
+  void *ret = NULL;
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_alloc_func)
+    ret = devicep->usm_alloc_func (devicep->target_id, size);
+  gomp_mutex_unlock (&devicep->lock);
+  return ret;
+}
+
+void
+gomp_usm_free (void *device_ptr)
+{
+  if (device_ptr == NULL)
+    return;
+
+  struct gomp_task_icv *icv = gomp_icv (false);
+  struct gomp_device_descr *devicep = resolve_device (icv->default_device_var,
+						      false);
+  if (devicep == NULL)
+    return;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    {
+      free (device_ptr);
+      return;
+    }
+
+  gomp_mutex_lock (&devicep->lock);
+  if (devicep->usm_free_func
+      && !devicep->usm_free_func (devicep->target_id, device_ptr))
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      gomp_fatal ("error in freeing device memory block at %p", device_ptr);
+    }
+  gomp_mutex_unlock (&devicep->lock);
+}
+
 int
 omp_target_is_present (const void *ptr, int device_num)
 {
@@ -4041,6 +4102,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (unload_image);
   DLSYM (alloc);
   DLSYM (free);
+  DLSYM_OPT (usm_alloc, usm_alloc);
+  DLSYM_OPT (usm_free, usm_free);
+  DLSYM_OPT (is_usm_ptr, is_usm_ptr);
   DLSYM (dev2host);
   DLSYM (host2dev);
   device->capabilities = device->get_caps_func ();
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
new file mode 100644
index 00000000000..1b35f19c45b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  *a = 42;
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target is_device_ptr(a)
+    {
+      if (*a != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
new file mode 100644
index 00000000000..689cee7e456
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+  #pragma omp target map(a[0])
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  #pragma omp target map(a[1])
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
new file mode 100644
index 00000000000..2ca66afe93f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target data map(a[0:2])
+    {
+#pragma omp target
+	{
+	  if (a[0] != 42 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+
+#pragma omp target
+	{
+	  if (a[1] != 43 || a_p != (uintptr_t)a)
+	    __builtin_abort ();
+	}
+    }
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
new file mode 100644
index 00000000000..753908c8440
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+
+#include <omp.h>
+#include <stdint.h>
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  omp_free(a, ompx_unified_shared_mem_alloc);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
new file mode 100644
index 00000000000..4d8b3cf71b1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target offload_device } */
+
+#include <omp.h>
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) omp_alloc(sizeof(int), ompx_host_mem_alloc);
+  if (!a)
+    __builtin_abort ();
+
+  a[0] = 42;
+
+  uintptr_t a_p = (uintptr_t)a;
+
+#pragma omp target map(a[0:1])
+    {
+      if (a[0] != 42 || a_p == (uintptr_t)a)
+	__builtin_abort ();
+    }
+
+  omp_free(a, ompx_host_mem_alloc);
+  return 0;
+}

From patchwork Thu Jul  7 10:34:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55826
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id AC57B384A898
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:37:06 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 61883386189C
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:36:50 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 61883386189C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448557"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:10 -0800
IronPort-SDR: 
 PNFfJiC1/82ycrwTl/4J6pik4cefm9fQVMsCkGf0sF97f3+i1+gvO+ZM75wpbNwcOlW/xjmwEq
 kcqkCg4rttg3jibkgdDety4Yum4nGedZvFnrBFZUe4Gy/mBvor/kaY2oY333FGDDyI/i+xAbqk
 zTBbflh1TNmDJIKvbyScarfV9Nwr5LOMfBQrFXCGoBk+8XIMZzqCb7q+o2wvZAe++sqaGYfoje
 6BmvsKtzB3VR0bPHp9oQlo6MaWISvOcz/+nS8tY/uSGUpG3Q2vQKr3PAaIbL+waw9spw0rFRz7
 S2A=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 06/17] openmp: Add -foffload-memory
Date: Thu, 7 Jul 2022 11:34:37 +0100
Message-ID: 
 <07f0cd465555c2bac53cbb2248baab02002cc4ef.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Add a new option.  It's inactive until I add some follow-up patches.

gcc/ChangeLog:

	* common.opt: Add -foffload-memory and its enum values.
	* coretypes.h (enum offload_memory): New.
	* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt      | 16 ++++++++++++++++
 gcc/coretypes.h     |  7 +++++++
 gcc/doc/invoke.texi | 16 +++++++++++++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index e7a51e882ba..8d76980fbbb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2213,6 +2213,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 08b9ac9094c..dd52d5bb113 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -206,6 +206,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d5ff1018372..3df39bb06e3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
--foffload=@var{arg}  -foffload-options=@var{arg} @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2708,6 +2708,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
 @end smallexample
 
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming

From patchwork Thu Jul  7 10:34:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55829
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id A5EEE383B79A
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:37:39 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 2FFFA385021A
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:36:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FFFA385021A
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448603"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:14 -0800
IronPort-SDR: 
 ZH7p72jl7djkUOuPVqepPxnZ3a3duvKkalXit5sBEnGTh+XVitxwMI/ta8dCFLq9AZ0lAIgdN+
 RhjMfpIACSqsm/1R8dzMDIefVZ9b71Dia+q20yhJp2GtcfpzY4mdJ661LJGul9YT5KuYaXKfOp
 81G680Yrv+6VB9eLF0DmD5pT2+99jjSZLEnF+iILXouuYiP/V+o3JMNcbA7BDnJO96Ne3EUseh
 YycT+e5i6WvLH+E1TmiKVaU1r8L/L+VdnAjgGkAOM9AbGqd8t3NbX1AnqZ5VRzRTmxhrwdLlpX
 g+4=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 07/17] openmp: allow requires unified_shared_memory
Date: Thu, 7 Jul 2022 11:34:38 +0100
Message-ID: 
 <902d406d18cad52d683e4e1cb2dbb19ea1afd81a.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.

It also checks that -foffload-memory isn't set to an incompatible mode.
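
For illustration only, the compatibility rule the front ends apply can be
sketched in isolation as below.  The helper names requires_unified_ok and
requires_unified_self_check are hypothetical and do not appear in the patch;
the enum mirrors the one added to coretypes.h earlier in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors enum offload_memory from coretypes.h in this series.  */
enum offload_memory {
  OFFLOAD_MEMORY_NONE,
  OFFLOAD_MEMORY_UNIFIED,
  OFFLOAD_MEMORY_PINNED
};

/* Hypothetical helper (not in the patch): return true if a
   "requires unified_address" or "requires unified_shared_memory" clause is
   compatible with the current -foffload-memory mode.  As in the front-end
   hunks, the mode is switched to UNIFIED whether or not the check passes;
   the real code reports an error in the failing case.  */
static bool
requires_unified_ok (enum offload_memory *mode)
{
  bool ok = (*mode == OFFLOAD_MEMORY_NONE
	     || *mode == OFFLOAD_MEMORY_UNIFIED);
  *mode = OFFLOAD_MEMORY_UNIFIED;
  return ok;
}

/* Small self-check covering the three possible starting modes.  */
void
requires_unified_self_check (void)
{
  enum offload_memory m = OFFLOAD_MEMORY_NONE;
  assert (requires_unified_ok (&m) && m == OFFLOAD_MEMORY_UNIFIED);

  m = OFFLOAD_MEMORY_UNIFIED;
  assert (requires_unified_ok (&m) && m == OFFLOAD_MEMORY_UNIFIED);

  m = OFFLOAD_MEMORY_PINNED;
  assert (!requires_unified_ok (&m) && m == OFFLOAD_MEMORY_UNIFIED);
}
```

In other words, only -foffload-memory=pinned conflicts with these two
requires clauses; the default (none) is silently upgraded to unified.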

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_requires): Allow "requires
	unified_shared_memory" and "unified_address".

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-1.c: New test.
	* c-c++-common/gomp/usm-4.c: New test.
	* gfortran.dg/gomp/usm-1.f90: New test.
	* gfortran.dg/gomp/usm-4.f90: New test.
---
 gcc/c/c-parser.cc                        | 22 ++++++++++++++++++++--
 gcc/cp/parser.cc                         | 22 ++++++++++++++++++++--
 gcc/fortran/openmp.cc                    | 13 +++++++++++++
 gcc/testsuite/c-c++-common/gomp/usm-1.c  |  4 ++++
 gcc/testsuite/c-c++-common/gomp/usm-4.c  |  4 ++++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 |  6 ++++++
 gcc/testsuite/gfortran.dg/gomp/usm-4.f90 |  6 ++++++
 7 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9c02141e2c6..c30f67cd2da 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -22726,9 +22726,27 @@ c_parser_omp_requires (c_parser *parser)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_address%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_shared_memory%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index df657a3fb2b..3deafc7c928 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -46860,9 +46860,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	  enum omp_requires this_req = (enum omp_requires) 0;
 
 	  if (!strcmp (p, "unified_address"))
-	    this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_address%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "unified_shared_memory"))
-	    this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+	    {
+	      this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+	      if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+		  && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+		error_at (cloc,
+			  "%<unified_shared_memory%> is incompatible with the "
+			  "selected %<-foffload-memory%> option");
+	      flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+	    }
 	  else if (!strcmp (p, "dynamic_allocators"))
 	    this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
 	  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index bd4ff259fe0..91bf8a3c50d 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
+#include "options.h"
 
 /* Match an end of OpenMP directive.  End of OpenMP directive is optional
    whitespace, followed by '\n' or comment '!'.  */
@@ -5556,6 +5557,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_ADDRESS;
 	  if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_address at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[2]) == MATCH_YES)
 	{
@@ -5563,6 +5570,12 @@ gfc_match_omp_requires (void)
 	  requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY;
 	  if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY)
 	    goto duplicate_clause;
+
+	  if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+	      && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+	    gfc_error_now ("unified_shared_memory at %C is incompatible with "
+			   "the selected -foffload-memory option");
+	  flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
 	}
       else if (gfc_match (clauses[3]) == MATCH_YES)
 	{
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-1.c b/gcc/testsuite/c-c++-common/gomp/usm-1.c
new file mode 100644
index 00000000000..8d2ba62aba3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_shared_memory  /* { dg-error ".unified_shared_memory. is incompatible with the selected .-foffload-memory. option" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-4.c b/gcc/testsuite/c-c++-common/gomp/usm-4.c
new file mode 100644
index 00000000000..84f6f785079
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-4.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+#pragma omp requires unified_address        /* { dg-error ".unified_address. is incompatible with the selected .-foffload-memory. option" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
new file mode 100644
index 00000000000..340f6bb50a5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_shared_memory  ! { dg-error "unified_shared_memory at .* is incompatible with the selected -foffload-memory option" }
+
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
new file mode 100644
index 00000000000..725b07f2f88
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=pinned" }
+
+!$omp requires unified_address  ! { dg-error "unified_address at .* is incompatible with the selected -foffload-memory option" }
+
+end

From patchwork Thu Jul  7 10:34:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55832
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C1254386CE75
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:38:09 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 6E3593864877
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:36:56 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6E3593864877
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448636"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:18 -0800
IronPort-SDR: 
 7lxpei2NkP0mIbJnHx3Y+JJ9mpGDshFuFVLXe0aIkRT8Nqn6oNr21LInSOFFCRlU1oA7v80DxB
 NZk3KEQ6IXu4aV0Ht1zEj7MI+0P0XY3zUZkDnaTKNXlZc499rVAjQTmnnFlDjc6DpgjpgjthYS
 Sp1CSr72Dru/HZsXYryNb4jwDy+xexYa93HcN9kTqkPSOtawXE5kGuBroR42GPel8R8EEwaNrh
 Sj3GMdB06ycgGcAVzZ7qfncrapskn980mdn1x5xVeAXdzkqmZCs3erET959xHYLmpYCOsJDsq5
 lHg=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 08/17] openmp: -foffload-memory=pinned
Date: Thu, 7 Jul 2022 11:34:39 +0100
Message-ID: 
 <8011a994bb38db60f37127880b0fc682564f6e8d.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature currently works only on Linux, where it simply calls mlockall to
enable always-on memory pinning.  It requires the locked-memory limit
(ulimit -l) to be high enough to accommodate all of the program's memory usage.

In this mode the explicit pinning done for ompx_pinned_mem_alloc is disabled,
as it is not needed and may conflict.
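
As a rough, standalone sketch of the runtime behaviour described above (not
the libgomp code itself): try_enable_pinned_mode, effective_pin and
pinned_mode_self_check are hypothetical names used only for this
illustration, and the error reporting via perror stands in for libgomp's own
diagnostics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>

/* Set once the whole process has been pinned.  */
static bool always_pinned = false;

/* Hypothetical stand-in for GOMP_enable_pinned_mode: lock all current and
   future pages, and remember whether that succeeded.  */
bool
try_enable_pinned_mode (void)
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    {
      perror ("mlockall (is 'ulimit -l' too low?)");
      return false;
    }
  always_pinned = true;
  return true;
}

/* Decide whether an allocation still needs explicit pinning: a
   per-allocation request is redundant once everything is pinned.  */
int
effective_pin (int pin)
{
  return pin && !always_pinned;
}

/* Small self-check of the two helpers above; it passes whether or not
   the locked-memory limit allows mlockall to succeed.  */
void
pinned_mode_self_check (void)
{
  assert (effective_pin (0) == 0);
  bool ok = try_enable_pinned_mode ();
  assert (ok == always_pinned);
  assert (effective_pin (1) == (ok ? 0 : 1));
}
```

If mlockall fails, the sketch leaves per-allocation pinning in effect, which
matches the intent of only setting always_pinned_mode on success.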

gcc/ChangeLog:

	* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
	* omp-low.cc (omp_enable_pinned_mode): New function.
	(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

	* config/linux/allocator.c (always_pinned_mode): New variable.
	(GOMP_enable_pinned_mode): New function.
	(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
	(linux_memspace_calloc): Likewise.
	(linux_memspace_free): Likewise.
	(linux_memspace_realloc): Likewise.
	* libgomp.map: Add GOMP_enable_pinned_mode.
	* testsuite/libgomp.c/alloc-pinned-7.c: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def                          |  3 +
 gcc/omp-low.cc                                | 66 +++++++++++++++++++
 .../c-c++-common/gomp/alloc-pinned-1.c        | 28 ++++++++
 libgomp/config/linux/allocator.c              | 26 ++++++++
 libgomp/libgomp.map                           |  1 +
 libgomp/target.c                              |  4 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++++++++++++++++++
 7 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index ee5213eedcf..276dd7812f2 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
 		  BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+		  "GOMP_enable_pinned_mode",
+		  BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..ba612e5c67d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+    return;
+  visited = true;
+
+  /* Create a new function like this:
+     
+       static void __attribute__((constructor))
+       __set_pinned_mode ()
+       {
+         GOMP_enable_pinned_mode ();
+       }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+				      NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+		       void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14676,6 +14738,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
     finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+    omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
new file mode 100644
index 00000000000..e0e08019bff
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+#if __cplusplus
+#define EXTERNC extern "C"
+#else
+#define EXTERNC
+#endif
+
+/* Intercept the libgomp initialization call to check it happens.  */
+
+int good = 0;
+
+EXTERNC void
+GOMP_enable_pinned_mode ()
+{
+  good = 1;
+}
+
+int
+main ()
+{
+  if (!good)
+    __builtin_exit (1);
+
+  return 0;
+}
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 18235f59775..e7fe6c3c49a 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -50,9 +50,26 @@
 #include <string.h>
 #include "libgomp.h"
 
+static bool always_pinned_mode = false;
+
+/* This function is called by the compiler when -foffload-memory=pinned
+   is used.  */
+
+void
+GOMP_enable_pinned_mode ()
+{
+  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
+    gomp_error ("failed to pin all memory (ulimit too low?)");
+  else
+    always_pinned_mode = true;
+}
+
 static void *
 linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       return gomp_usm_alloc (size);
@@ -80,6 +97,9 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     {
       void *ret = gomp_usm_alloc (size);
@@ -97,6 +117,9 @@ static void
 linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
 		     int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     gomp_usm_free (addr);
   else if (pin)
@@ -109,6 +132,9 @@ static void *
 linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 			size_t oldsize, size_t size, int oldpin, int pin)
 {
+  /* Explicit pinning may not be required.  */
+  pin = pin && !always_pinned_mode;
+
   if (memspace == ompx_unified_shared_mem_space)
     goto manual_realloc;
   else if (oldpin && pin)
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 46d5f10f3e1..c86734f15e2 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -400,6 +400,7 @@ GOMP_5.0.1 {
   global:
 	GOMP_alloc;
 	GOMP_free;
+	GOMP_enable_pinned_mode;
 } GOMP_5.0;
 
 GOMP_5.1 {
diff --git a/libgomp/target.c b/libgomp/target.c
index 4e203ae3c06..3dd09b7afbd 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1,3 +1,4 @@
+#include <stdio.h>
 /* Copyright (C) 2013-2022 Free Software Foundation, Inc.
    Contributed by Jakub Jelinek <jakub@redhat.com>.
 
@@ -1533,7 +1534,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  continue;
 		}
 	      default:
-		if (tgt->list[i].offset == OFFSET_INLINED)
+		if (tgt->list[i].offset == OFFSET_INLINED
+		    && !array)
 		  continue;
 		break;
 	      }
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
new file mode 100644
index 00000000000..8dc19055038
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+
+#define mlockall(...) 0
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  // Sanity check
+  if (get_pinned_mem () == 0)
+    {
+      /* -foffload-memory=pinned has failed, but maybe that's because
+	 insufficient pinned memory was available.  */
+      if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
+	abort ();
+    }
+
+  return 0;
+}

From patchwork Thu Jul  7 10:34:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55825
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 62EC5385142A
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:36:37 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 551973857BBB
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:36:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 551973857BBB
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112718"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:18 -0800
IronPort-SDR: 
 CQvw7pNDYnLEBkV1RTR2QQIprZaPY2R7RW4MJv/KC0uMRGtyZTXqEQZdbzEmPwdLjO9s16ghO+
 2g/6Uzwa0Q2SnVtJudcVfINJ9JiZ9F+irn2SgtcGmvqbxMKpfVYJ15tfKXBDKRZEJfVYZx0mXt
 PMAbRYvYdWvT26t7VB6Oqxr1t0RzIObQlx34M8pKdKbYRBb1ZU6SbV0wwt4BlKFWEqxLuRpLTf
 LbaGRgY969AVr4JAVxunIx6VnfXVYEbifdyW5fQIsLS+l8Nw4lXfVP8i8gshvQatf+LJY1GJ4V
 gOY=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 09/17] openmp: Use libgomp memory allocation functions with
 unified shared memory.
Date: Thu, 7 Jul 2022 11:34:40 +0100
Message-ID: 
 <4c5987af7ca4f9de5ce05d2f2297e862c8b83596.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

This patch replaces calls to malloc, calloc, realloc, aligned_alloc, free,
omp_target_alloc, omp_target_free, and the replaceable operators new and
delete with calls to the corresponding libgomp memory allocation functions,
passing allocator=ompx_unified_shared_mem_alloc.  This lets existing code
benefit from unified shared memory.  libgomp does the correct thing with all
the mapping constructs, and no memory is copied if the pointer points to
unified shared memory.

Only the replaceable operator new is replaced; class-member and placement new
are left alone.
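
As a rough illustration only (this is not code from the patch), the rewrite
done by usm_transform can be modelled as a lookup table from the matched call
to its libgomp replacement.  The struct and function names below are
hypothetical; the argument counts follow the calls built in omp-low.cc, where
the allocator value 10 (ompx_unified_shared_mem_alloc) is appended once, or
twice for omp_realloc.  Operator new and delete are matched by decl flags
rather than by name, so they are left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the name-based part of the mapping.  */
struct usm_mapping
{
  const char *from;     /* Call matched by the pass.  */
  const char *to;       /* libgomp function that replaces it.  */
  int kept_args;        /* Leading original arguments that are kept.  */
  int allocator_args;   /* Allocator arguments appended at the end.  */
};

static const struct usm_mapping usm_map[] = {
  { "malloc",           "omp_alloc",         1, 1 },
  { "omp_target_alloc", "omp_alloc",         1, 1 },  /* Device arg dropped.  */
  { "aligned_alloc",    "omp_aligned_alloc", 2, 1 },
  { "calloc",           "omp_calloc",        2, 1 },
  { "realloc",          "omp_realloc",       2, 2 },
  { "free",             "omp_free",          1, 1 },
  { "omp_target_free",  "omp_free",          1, 1 },  /* Device arg dropped.  */
};

/* Return the mapping for NAME, or NULL if the call is left untouched.  */
static const struct usm_mapping *
usm_lookup (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof usm_map / sizeof usm_map[0]; i++)
    if (strcmp (usm_map[i].from, name) == 0)
      return &usm_map[i];
  return NULL;
}
```

For example, usm_lookup ("realloc") yields omp_realloc with two kept arguments
and two allocator arguments, which matches the "omp_realloc (.*, 30, 10, 10)"
pattern checked in the new tests.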

gcc/ChangeLog:

	* omp-low.cc (usm_transform): New function.
	(make_pass_usm_transform): Likewise.
	(class pass_usm_transform): New.
	* passes.def: Add pass_usm_transform.
	* tree-pass.h (make_pass_usm_transform): New declaration.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/usm-2.c: New test.
	* c-c++-common/gomp/usm-3.c: New test.
	* g++.dg/gomp/usm-1.C: New test.
	* g++.dg/gomp/usm-2.C: New test.
	* g++.dg/gomp/usm-3.C: New test.
	* gfortran.dg/gomp/usm-2.f90: New test.
	* gfortran.dg/gomp/usm-3.f90: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.c/usm-6.c: New test.
	* testsuite/libgomp.c++/usm-1.C: Likewise.

co-authored-by: Andrew Stubbs  <ams@codesourcery.com>
---
 gcc/omp-low.cc                           | 174 +++++++++++++++++++++++
 gcc/passes.def                           |   1 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c  |  46 ++++++
 gcc/testsuite/c-c++-common/gomp/usm-3.c  |  44 ++++++
 gcc/testsuite/g++.dg/gomp/usm-1.C        |  32 +++++
 gcc/testsuite/g++.dg/gomp/usm-2.C        |  30 ++++
 gcc/testsuite/g++.dg/gomp/usm-3.C        |  38 +++++
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 |  16 +++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 |  13 ++
 gcc/tree-pass.h                          |   1 +
 libgomp/testsuite/libgomp.c++/usm-1.C    |  54 +++++++
 libgomp/testsuite/libgomp.c/usm-6.c      |  92 ++++++++++++
 12 files changed, 541 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index ba612e5c67d..cdadd6f0c96 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -15097,6 +15097,180 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt)
 {
   return new pass_diagnose_omp_blocks (ctxt);
 }
+
+/* Provide the transformation required for using unified shared memory
+   by replacing calls to standard memory allocation functions with
+   the functions provided by libgomp.  */
+
+static tree
+usm_transform (gimple_stmt_iterator *gsi_p, bool *,
+	       struct walk_stmt_info *wi)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  /* ompx_unified_shared_mem_alloc is 10.  */
+  const unsigned int unified_shared_mem_alloc = 10;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      {
+	gcall *gs = as_a <gcall *> (stmt);
+	tree fndecl = gimple_call_fndecl (gs);
+	if (fndecl)
+	  {
+	    tree allocator = build_int_cst (pointer_sized_int_node,
+					    unified_shared_mem_alloc);
+	    const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+	    if ((strcmp (name, "malloc") == 0)
+		 || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+		     && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
+		 || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
+		 || strcmp (name, "omp_target_alloc") == 0)
+	      {
+		tree omp_alloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_alloc", omp_alloc_type);
+		tree size = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if (strcmp (name, "aligned_alloc") == 0)
+	      {
+		/* Maybe we can also use this for operator new with
+		   a std::align_val_t parameter.  */
+		tree omp_alloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_aligned_alloc",
+					   omp_alloc_type);
+		tree align = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 3, align, size,
+					       allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "calloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC))
+	      {
+		tree omp_calloc_type
+		  = build_function_type_list (ptr_type_node, size_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_calloc", omp_calloc_type);
+		tree num = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 3, num, size, allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "realloc") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_REALLOC))
+	      {
+		tree omp_realloc_type
+		  = build_function_type_list (ptr_type_node, ptr_type_node,
+					      size_type_node,
+					      pointer_sized_int_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_realloc", omp_realloc_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		tree size = gimple_call_arg (gs, 1);
+		gimple *g = gimple_build_call (repl, 4, ptr, size, allocator,
+					       allocator);
+		gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	    else if ((strcmp (name, "free") == 0)
+		      || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+			  && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE)
+		      || (DECL_IS_OPERATOR_DELETE_P (fndecl)
+			  && DECL_IS_REPLACEABLE_OPERATOR (fndecl))
+		      || strcmp (name, "omp_target_free") == 0)
+	      {
+		tree omp_free_type
+		  = build_function_type_list (void_type_node, ptr_type_node,
+					      pointer_sized_int_node,
+					      NULL_TREE);
+		tree repl = build_fn_decl ("omp_free", omp_free_type);
+		tree ptr = gimple_call_arg (gs, 0);
+		gimple *g = gimple_build_call (repl, 2, ptr, allocator);
+		gimple_set_location (g, gimple_location (stmt));
+		gsi_replace (gsi_p, g, true);
+	      }
+	  }
+      }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL_TREE;
+}
+
+namespace {
+
+const pass_data pass_data_usm_transform =
+{
+  GIMPLE_PASS, /* type */
+  "usm_transform", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_usm_transform : public gimple_opt_pass
+{
+public:
+  pass_usm_transform (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_usm_transform, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openmp || flag_openmp_simd)
+	    && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+		|| omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS);
+  }
+  virtual unsigned int execute (function *)
+  {
+    struct walk_stmt_info wi;
+    gimple_seq body = gimple_body (current_function_decl);
+
+    memset (&wi, 0, sizeof (wi));
+    walk_gimple_seq (body, usm_transform, NULL, &wi);
+
+    return 0;
+  }
+
+}; // class pass_usm_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_usm_transform (gcc::context *ctxt)
+{
+  return new pass_usm_transform (ctxt);
+}
 
 
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 375d3d62d51..7f838bfc96a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_diagnose_tm_blocks);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
+  NEXT_PASS (pass_usm_transform);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
   NEXT_PASS (pass_refactor_eh);
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c
new file mode 100644
index 00000000000..8c20ef94e69
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-usm_transform" } */
+
+#pragma omp requires unified_shared_memory
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
+  free (p2);
+  free (p3);
+  free (p4);
+  omp_target_free (p5, 1);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c
new file mode 100644
index 00000000000..2b0cbb45e27
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void *malloc (__SIZE_TYPE__);
+void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__);
+void *calloc(__SIZE_TYPE__, __SIZE_TYPE__);
+void *realloc(void *, __SIZE_TYPE__);
+void free (void *);
+void *omp_target_alloc (__SIZE_TYPE__, int);
+void omp_target_free (void *, int);
+
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo ()
+{
+  void *p1 = malloc(20);
+  void *p2 = realloc(p1, 30);
+  void *p3 = calloc(4, 15);
+  void *p4 = aligned_alloc(16, 40);
+  void *p5 = omp_target_alloc(50, 1);
+  free (p2);
+  free (p3);
+  free (p4);
+  omp_target_free (p5, 1);
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " free"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " aligned_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " malloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not " omp_target_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-1.C b/gcc/testsuite/g++.dg/gomp/usm-1.C
new file mode 100644
index 00000000000..bd70a81b5bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-1.C
@@ -0,0 +1,32 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-2.C b/gcc/testsuite/g++.dg/gomp/usm-2.C
new file mode 100644
index 00000000000..f6ab155c6de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-2.C
@@ -0,0 +1,30 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -foffload-memory=unified -fdump-tree-usm_transform" }
+
+struct t1
+{
+  int a;
+  int b;
+};
+
+typedef unsigned char uint8_t;
+
+void
+foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y)
+{
+  uint8_t *p1 = new uint8_t;
+  uint8_t *p2 = new uint8_t[20];
+  t1 *p3 = new t1;
+  t1 *p4 = new t1[y];
+  delete p1;
+  delete p3;
+  delete [] p2;
+  delete [] p4;
+}
+
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator new"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "operator delete"  "usm_transform"  } } */
diff --git a/gcc/testsuite/g++.dg/gomp/usm-3.C b/gcc/testsuite/g++.dg/gomp/usm-3.C
new file mode 100644
index 00000000000..50ac9302c8b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/usm-3.C
@@ -0,0 +1,38 @@
+// { dg-do compile }
+// { dg-options "-fopenmp -fdump-tree-usm_transform" }
+
+#pragma omp requires unified_shared_memory
+
+#include <new>
+
+
+struct X {
+    static void* operator new(std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void* operator new[](std::size_t count)
+    {
+      static char buf[10];
+      return &buf[0];
+    }
+    static void operator delete(void*)
+    {
+    }
+    static void operator delete[](void*)
+    {
+    }
+};
+void foo() {
+  X* p1 = new X;
+  delete p1;
+  X* p2 = new X[10];
+  delete[] p2;
+  unsigned char buf[24] ;
+  int *p3 = new (buf) int(3);
+  p3[0] = 1;
+}
+
+/* { dg-final { scan-tree-dump-not "omp_alloc"  "usm_transform"  } } */
+/* { dg-final { scan-tree-dump-not "omp_free"  "usm_transform"  } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
new file mode 100644
index 00000000000..dc775260cb7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-usm_transform" }
+
+!$omp requires unified_shared_memory
+end
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
new file mode 100644
index 00000000000..7983444ebff
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" }
+
+subroutine foo()
+  implicit none
+  integer, allocatable :: var1
+
+  allocate(var1)
+
+end subroutine
+
+! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform"  } } 
+! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform"  } } 
\ No newline at end of file
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 606d1d60b85..494a9662afa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_usm_transform (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
new file mode 100644
index 00000000000..fea25e5f10b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+#include <stdint.h>
+
+#pragma omp requires unified_shared_memory
+
+int g1 = 0;
+
+struct s1
+{
+  s1() { a = g1++;}
+  ~s1() { g1--;}
+  int a;
+};
+
+int
+main ()
+{
+  s1 *p1 = new s1;
+  s1 *p2 = new s1[10];
+
+  if (!p1 || !p2 || p1->a != 0)
+    __builtin_abort ();
+
+  for (int i = 0; i < 10; i++)
+    if (p2[i].a != i+1)
+      __builtin_abort ();
+
+  uintptr_t pp1 = (uintptr_t)p1;
+  uintptr_t pp2 = (uintptr_t)p2;
+
+#pragma omp target firstprivate(pp1, pp2)
+    {
+      s1 *t1 = (s1*)pp1;
+      s1 *t2 = (s1*)pp2;
+      if (t1->a != 0)
+	__builtin_abort ();
+
+      for (int i = 0; i < 10; i++)
+	if (t2[i].a != i+1)
+	  __builtin_abort ();
+
+      t1->a = 42;
+    }
+
+  if (p1->a != 42)
+    __builtin_abort ();
+
+  delete [] p2;
+  delete p1;
+  if (g1 != 0)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
new file mode 100644
index 00000000000..c207140092a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include <omp.h>
+
+/* On old systems, the declaration may not be present in stdlib.h which
+   will generate a warning.  This function is going to be replaced with
+   omp_aligned_alloc so the purpose of this declaration is to avoid that
+   warning.  */
+void *aligned_alloc(size_t alignment, size_t size);
+
+#pragma omp requires unified_shared_memory
+
+int
+main ()
+{
+  int *a = (int *) malloc(sizeof(int)*2);
+  int *b = (int *) calloc(sizeof(int), 3);
+  int *c = (int *) realloc(NULL, sizeof(int) * 4);
+  int *d = (int *) aligned_alloc(32, sizeof(int));
+  int *e = (int *) omp_target_alloc(sizeof(int), 1);
+  if (!a || !b || !c || !d || !e)
+    __builtin_abort ();
+
+  a[0] = 42;
+  a[1] = 43;
+  b[0] = 52;
+  b[1] = 53;
+  b[2] = 54;
+  c[0] = 62;
+  c[1] = 63;
+  c[2] = 64;
+  c[3] = 65;
+
+  uintptr_t a_p = (uintptr_t)a;
+  uintptr_t b_p = (uintptr_t)b;
+  uintptr_t c_p = (uintptr_t)c;
+  uintptr_t d_p = (uintptr_t)d;
+  uintptr_t e_p = (uintptr_t)e;
+
+  if ((d_p & 31) != 0)
+    __builtin_abort ();
+
+#pragma omp target enter data map(to:a[0:2])
+
+#pragma omp target is_device_ptr(c)
+    {
+      if (a[0] != 42 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[0] != 52 || b[2] != 54 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[0] != 62 || c[3] != 65 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      if (d_p != (uintptr_t)d)
+	__builtin_abort ();
+      if (e_p != (uintptr_t)e)
+	__builtin_abort ();
+      a[0] = 72;
+      b[0] = 82;
+      c[0] = 92;
+      e[0] = 102;
+    }
+
+#pragma omp target
+    {
+      if (a[1] != 43 || a_p != (uintptr_t)a)
+	__builtin_abort ();
+      if (b[1] != 53 || b_p != (uintptr_t)b)
+	__builtin_abort ();
+      if (c[1] != 63 || c[2] != 64 || c_p != (uintptr_t)c)
+	__builtin_abort ();
+      a[1] = 73;
+      b[1] = 83;
+      c[1] = 93;
+    }
+
+#pragma omp target exit data map(delete:a[0:2])
+
+  if (a[0] != 72 || a[1] != 73
+      || b[0] != 82 || b[1] != 83
+      || c[0] != 92 || c[1] != 93
+      || e[0] != 102)
+	__builtin_abort ();
+  free(a);
+  free(b);
+  free(c);
+  omp_target_free(e, 1);
+  return 0;
+}

From patchwork Thu Jul  7 10:34:41 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55834
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)
Date: Thu, 7 Jul 2022 11:34:41 +0100
Message-ID: 
 <c00649080f9127a0eeabb45536a2846ffc4c3fa7.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

Currently we only make use of this directive when it is associated
with an allocate statement.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE.
	(show_code_node): Likewise.
	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
	(OMP_LIST_ALLOCATOR): New enum value.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
	* match.h (gfc_match_omp_allocate): New function.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_ALLOCATOR.
	(OMP_ALLOCATE_CLAUSES): New define.
	(gfc_match_omp_allocate): New function.
	(resolve_omp_clauses): Add ALLOCATOR in clause_names.
	(omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
	(EMPTY_VAR_LIST): New define.
	(check_allocate_directive_restrictions): New function.
	(gfc_resolve_omp_allocate): Likewise.
	(gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
	* parse.cc (decode_omp_directive): Handle ST_OMP_ALLOCATE.
	(next_statement): Likewise.
	(gfc_ascii_statement): Likewise.
	* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
	* st.cc (gfc_free_statement): Likewise.
	* trans.cc (trans_code): Likewise.
---
 gcc/fortran/dump-parse-tree.cc                |   3 +
 gcc/fortran/gfortran.h                        |   4 +-
 gcc/fortran/match.h                           |   1 +
 gcc/fortran/openmp.cc                         | 199 +++++++++++++++++-
 gcc/fortran/parse.cc                          |  10 +-
 gcc/fortran/resolve.cc                        |   1 +
 gcc/fortran/st.cc                             |   1 +
 gcc/fortran/trans.cc                          |   1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 +++++++
 10 files changed, 400 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 5352008a63d..e0c6c0d9d96 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -2003,6 +2003,7 @@ show_omp_node (int level, gfc_code *c)
     case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break;
     case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break;
     case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break;
+    case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break;
     case EXEC_OMP_ATOMIC: name = "ATOMIC"; break;
     case EXEC_OMP_BARRIER: name = "BARRIER"; break;
     case EXEC_OMP_CANCEL: name = "CANCEL"; break;
@@ -2204,6 +2205,7 @@ show_omp_node (int level, gfc_code *c)
       || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA
       || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN
       || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR
+      || c->op == EXEC_OMP_ALLOCATE
       || (c->op == EXEC_OMP_ORDERED && c->block == NULL))
     return;
   if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS)
@@ -3329,6 +3331,7 @@ show_code_node (int level, gfc_code *c)
     case EXEC_OACC_CACHE:
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
+    case EXEC_OMP_ALLOCATE:
     case EXEC_OMP_ATOMIC:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 696aadd7db6..755469185a6 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -259,7 +259,7 @@ enum gfc_statement
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
   ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL,
   ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
-  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
+  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
   ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS,
@@ -1398,6 +1398,7 @@ enum
   OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NONTEMPORAL,
   OMP_LIST_ALLOCATE,
+  OMP_LIST_ALLOCATOR,
   OMP_LIST_HAS_DEVICE_ADDR,
   OMP_LIST_ENTER,
   OMP_LIST_NUM /* Must be the last.  */
@@ -2908,6 +2909,7 @@ enum gfc_exec_op
   EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
   EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
   EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE,
+  EXEC_OMP_ALLOCATE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 495c93e0b5c..fe43d4b3fd3 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -149,6 +149,7 @@ match gfc_match_oacc_routine (void);
 
 /* OpenMP directive matchers.  */
 match gfc_match_omp_eos_error (void);
+match gfc_match_omp_allocate (void);
 match gfc_match_omp_atomic (void);
 match gfc_match_omp_barrier (void);
 match gfc_match_omp_cancel (void);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 91bf8a3c50d..38003890bb0 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -986,6 +986,7 @@ enum omp_mask2
   OMP_CLAUSE_FINALIZE,
   OMP_CLAUSE_ATTACH,
   OMP_CLAUSE_NOHOST,
+  OMP_CLAUSE_ALLOCATOR,
   OMP_CLAUSE_HAS_DEVICE_ADDR,  /* OpenMP 5.1  */
   OMP_CLAUSE_ENTER, /* OpenMP 5.2 */
   /* This must come last.  */
@@ -3784,6 +3785,7 @@ cleanup:
 }
 
 
+#define OMP_ALLOCATE_CLAUSES (omp_mask (OMP_CLAUSE_ALLOCATOR))
 #define OMP_PARALLEL_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
    | OMP_CLAUSE_SHARED | OMP_CLAUSE_COPYIN | OMP_CLAUSE_REDUCTION	\
@@ -6001,6 +6003,64 @@ gfc_match_omp_ordered_depend (void)
   return match_omp (EXEC_OMP_ORDERED, omp_mask (OMP_CLAUSE_DEPEND));
 }
 
+/* omp allocate (list) [clause-list]
+   - clause-list:  allocator
+*/
+
+match
+gfc_match_omp_allocate (void)
+{
+  gfc_omp_clauses *c = gfc_get_omp_clauses ();
+  gfc_expr *allocator = NULL;
+  match m;
+
+  m = gfc_match (" (");
+  if (m == MATCH_YES)
+    {
+      m = gfc_match_omp_variable_list ("", &c->lists[OMP_LIST_ALLOCATOR],
+				       true, NULL);
+
+      if (m != MATCH_YES)
+	{
+	  /* If the list was empty, we must find closing ')'.  */
+	  m = gfc_match (")");
+	  if (m != MATCH_YES)
+	    return m;
+	}
+    }
+
+  if (gfc_match (" allocator ( ") == MATCH_YES)
+    {
+      m = gfc_match_expr (&allocator);
+      if (m != MATCH_YES)
+	{
+	  gfc_error ("Expected allocator at %C");
+	  return MATCH_ERROR;
+	}
+      if (gfc_match (" ) ") != MATCH_YES)
+	{
+	  gfc_error ("Expected ')' at %C");
+	  gfc_free_expr (allocator);
+	  return MATCH_ERROR;
+	}
+    }
+
+  if (gfc_match_omp_eos () != MATCH_YES)
+    {
+      gfc_free_expr (allocator);
+      gfc_error ("Unexpected junk after $OMP allocate at %C");
+      return MATCH_ERROR;
+    }
+  gfc_omp_namelist *n;
+  for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next)
+      n->expr = gfc_copy_expr (allocator);
+
+  new_st.op = EXEC_OMP_ALLOCATE;
+  new_st.ext.omp_clauses = c;
+  gfc_free_expr (allocator);
+  return MATCH_YES;
+}
+
 
 /* omp atomic [clause-list]
    - atomic-clause:  read | write | update
@@ -6482,7 +6542,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	"IN_REDUCTION", "TASK_REDUCTION",
 	"DEVICE_RESIDENT", "LINK", "USE_DEVICE",
 	"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR",
-	"NONTEMPORAL", "ALLOCATE", "HAS_DEVICE_ADDR", "ENTER" };
+	"NONTEMPORAL", "ALLOCATE", "ALLOCATOR", "HAS_DEVICE_ADDR", "ENTER" };
   STATIC_ASSERT (ARRAY_SIZE (clause_names) == OMP_LIST_NUM);
 
   if (omp_clauses == NULL)
@@ -9006,6 +9066,8 @@ omp_code_to_statement (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OMP_ALLOCATE:
+      return ST_OMP_ALLOCATE;
     case EXEC_OMP_PARALLEL:
       return ST_OMP_PARALLEL;
     case EXEC_OMP_PARALLEL_MASKED:
@@ -9486,6 +9548,138 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
     }
 }
 
+static void
+check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
+				       gfc_namespace *ns, locus loc)
+{
+  if (sym->attr.save != SAVE_NONE || sym->attr.in_common == 1
+      || sym->module != NULL)
+    {
+      int tmp;
+      /*  The assumption here is that if we can extract an integer, then
+	  it is a predefined allocator.  */
+      if (!omp_al || gfc_extract_int (omp_al, &tmp))
+	  gfc_error ("%qs should use predefined allocator at %L", sym->name,
+		     &loc);
+    }
+  if (ns != sym->ns)
+    gfc_error ("%qs is not in the same scope as %<allocate%>"
+	       " directive at %L", sym->name, &loc);
+}
+
+#define EMPTY_VAR_LIST(node) \
+  (node->ext.omp_clauses->lists[OMP_LIST_ALLOCATOR] == NULL)
+
+static void
+gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
+{
+  gfc_alloc *al;
+  gfc_omp_namelist *n = NULL;
+  gfc_omp_namelist *cn = NULL;
+  gfc_omp_namelist *p, *tail;
+  gfc_code *cur;
+  hash_set<gfc_symbol*> vars;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+  cn = clauses->lists[OMP_LIST_ALLOCATOR];
+  gfc_expr *omp_al = cn ? cn->expr : NULL;
+
+  if (omp_al && (omp_al->ts.type != BT_INTEGER
+      || omp_al->ts.kind != gfc_c_intptr_kind))
+    gfc_error ("Expected integer expression of the "
+	       "%<omp_allocator_handle_kind%> kind at %L", &omp_al->where);
+
+  /* Check that variables in this allocate directive are not duplicated
+     in this directive or others coming directly after it.  */
+  for (cur = code; cur != NULL && cur->op == EXEC_OMP_ALLOCATE;
+      cur = cur->next)
+    {
+      gfc_omp_clauses *c = cur->ext.omp_clauses;
+      gcc_assert (c);
+      for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next)
+	{
+	  if (vars.contains (n->sym))
+	    gfc_error ("%qs is used in multiple %<allocate%> "
+		       "directives at %L", n->sym->name, &cur->loc);
+	  /* This helps us avoid duplicate error messages.  */
+	  if (cur == code)
+	    vars.add (n->sym);
+	}
+    }
+
+  if (cur == NULL || cur->op != EXEC_ALLOCATE)
+    {
+      /*  There is no allocate statement right after allocate directive.
+	  We don't support this case at the moment.  */
+      for (n = cn; n != NULL; n = n->next)
+	{
+	  gfc_symbol *sym = n->sym;
+	  if (sym->attr.allocatable == 1)
+	    gfc_error ("%qs with ALLOCATABLE attribute is not allowed in "
+		       "%<allocate%> directive at %L as this directive is not"
+		       " associated with an %<allocate%> statement.",
+		       sym->name, &code->loc);
+	}
+      sorry_at (code->loc.lb->location, "%<allocate%> directive that is "
+		"not associated with an %<allocate%> statement is not "
+		"supported.");
+      return;
+    }
+
+  /* If there is another allocate directive right after this one, check
+     that neither of them is empty.  Checking pairwise like this covers
+     any number of consecutive directives and reports the error at the
+     right location.  */
+  if (code->next && code->next->op == EXEC_OMP_ALLOCATE
+      && (EMPTY_VAR_LIST (code) || EMPTY_VAR_LIST (code->next)))
+    gfc_error ("Empty variable list is not allowed at %L when multiple "
+	       "%<allocate%> directives are associated with an "
+	       "%<allocate%> statement.",
+	       EMPTY_VAR_LIST (code) ? &code->loc : &code->next->loc);
+
+  if (EMPTY_VAR_LIST (code))
+    {
+      /* Empty namelist means allocate directive applies to all
+	 variables in allocate statement.  'cur' points to associated
+	 allocate statement.  */
+      for (al = cur->ext.alloc.list; al != NULL; al = al->next)
+	if (al->expr && al->expr->symtree && al->expr->symtree->n.sym)
+	  {
+	    check_allocate_directive_restrictions (al->expr->symtree->n.sym,
+						   omp_al, ns, code->loc);
+	    p = gfc_get_omp_namelist ();
+	    p->sym = al->expr->symtree->n.sym;
+	    p->expr = omp_al;
+	    p->where = code->loc;
+	    if (cn == NULL)
+	      cn = tail = p;
+	    else
+	      {
+		tail->next = p;
+		tail = tail->next;
+	      }
+	  }
+      clauses->lists[OMP_LIST_ALLOCATOR]= cn;
+    }
+  else
+    {
+      for (n = cn; n != NULL; n = n->next)
+	{
+	  for (al = cur->ext.alloc.list; al != NULL; al = al->next)
+	    if (al->expr && al->expr->symtree && al->expr->symtree->n.sym
+		&& al->expr->symtree->n.sym == n->sym)
+	      break;
+	  if (al == NULL)
+	    gfc_error ("%qs in %<allocate%> directive at %L is not present "
+		       "in associated %<allocate%> statement.",
+		       n->sym->name, &code->loc);
+	  check_allocate_directive_restrictions (n->sym, omp_al, ns,
+						 code->loc);
+	}
+    }
+}
+
 
 void
 gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED)
@@ -9627,6 +9821,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns)
       code->ext.omp_clauses->if_present = false;
       resolve_omp_clauses (code, code->ext.omp_clauses, ns);
       break;
+    case EXEC_OMP_ALLOCATE:
+      gfc_resolve_omp_allocate (code, ns);
+      break;
     default:
       break;
     }
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 0b4c596996c..97d182d46ad 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -886,6 +886,7 @@ decode_omp_directive (void)
     {
     case 'a':
       matcho ("atomic", gfc_match_omp_atomic, ST_OMP_ATOMIC);
+      matcho ("allocate", gfc_match_omp_allocate, ST_OMP_ALLOCATE);
       break;
     case 'b':
       matcho ("barrier", gfc_match_omp_barrier, ST_OMP_BARRIER);
@@ -1673,9 +1674,9 @@ next_statement (void)
   case ST_OMP_CANCEL: case ST_OMP_CANCELLATION_POINT: case ST_OMP_DEPOBJ: \
   case ST_OMP_TARGET_UPDATE: case ST_OMP_TARGET_ENTER_DATA: \
   case ST_OMP_TARGET_EXIT_DATA: case ST_OMP_ORDERED_DEPEND: case ST_OMP_ERROR: \
-  case ST_ERROR_STOP: case ST_OMP_SCAN: case ST_SYNC_ALL: \
-  case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: case ST_UNLOCK: \
-  case ST_FORM_TEAM: case ST_CHANGE_TEAM: \
+  case ST_OMP_ALLOCATE: case ST_ERROR_STOP: case ST_OMP_SCAN: \
+  case ST_SYNC_ALL: case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: \
+  case ST_UNLOCK: case ST_FORM_TEAM: case ST_CHANGE_TEAM: \
   case ST_END_TEAM: case ST_SYNC_TEAM: \
   case ST_EVENT_POST: case ST_EVENT_WAIT: case ST_FAIL_IMAGE: \
   case ST_OACC_UPDATE: case ST_OACC_WAIT: case ST_OACC_CACHE: \
@@ -2352,6 +2353,9 @@ gfc_ascii_statement (gfc_statement st)
     case ST_OACC_END_ATOMIC:
       p = "!$ACC END ATOMIC";
       break;
+    case ST_OMP_ALLOCATE:
+      p = "!$OMP ALLOCATE";
+      break;
     case ST_OMP_ATOMIC:
       p = "!$OMP ATOMIC";
       break;
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 2ebf076f730..65f24b88067 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -12368,6 +12368,7 @@ start:
 	  gfc_resolve_oacc_directive (code, ns);
 	  break;
 
+	case EXEC_OMP_ALLOCATE:
 	case EXEC_OMP_ATOMIC:
 	case EXEC_OMP_BARRIER:
 	case EXEC_OMP_CANCEL:
diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc
index 73f30c2137f..7b282e96c3d 100644
--- a/gcc/fortran/st.cc
+++ b/gcc/fortran/st.cc
@@ -214,6 +214,7 @@ gfc_free_statement (gfc_code *p)
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
     case EXEC_OACC_ROUTINE:
+    case EXEC_OMP_ALLOCATE:
     case EXEC_OMP_ATOMIC:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 912a206f2ed..a9d5714be22 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -2174,6 +2174,7 @@ trans_code (gfc_code * code, tree cond)
 	  res = gfc_trans_dt_end (code);
 	  break;
 
+	case EXEC_OMP_ALLOCATE:
 	case EXEC_OMP_ATOMIC:
 	case EXEC_OMP_BARRIER:
 	case EXEC_OMP_CANCEL:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
new file mode 100644
index 00000000000..3f512d66495
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
@@ -0,0 +1,112 @@
+! { dg-do compile }
+
+module test
+  integer, allocatable :: mvar1
+  integer, allocatable :: mvar2
+  integer, allocatable :: mvar3
+end module
+
+subroutine foo(x, y)
+  use omp_lib
+  implicit none
+  integer  :: x
+  integer  :: y
+  
+  integer, allocatable :: var1(:)
+  integer, allocatable :: var2(:)
+  integer, allocatable :: var3(:)
+  integer, allocatable :: var4(:)
+  integer, allocatable :: var5(:)
+  integer, allocatable :: var6(:)
+  integer, allocatable :: var7(:)
+  integer, allocatable :: var8(:)
+  integer, allocatable :: var9(:)
+
+  !$omp allocate (var1) allocator(10) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." }
+  allocate (var1(x))
+
+  !$omp allocate (var2)  ! { dg-error "'var2' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  allocate (var3(x))
+
+  !$omp allocate (x) ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." }
+  x = 2
+
+  !$omp allocate (var4) ! { dg-error "'var4' with ALLOCATABLE attribute is not allowed in 'allocate' directive at .1. as this directive is not associated with an 'allocate' statement." } 
+  ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." "" { target *-*-* } .-1 }
+  y = 2
+
+  !$omp allocate (var5)
+  !$omp allocate  ! { dg-error "Empty variable list is not allowed at .1. when multiple 'allocate' directives are associated with an 'allocate' statement." }
+  allocate (var5(x))
+
+  !$omp allocate (var6)
+  !$omp allocate (var7)  ! { dg-error "'var7' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  !$omp allocate (var8)  ! { dg-error "'var8' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
+  allocate (var6(x))
+
+  !$omp allocate (var9)
+  !$omp allocate (var9)  ! { dg-error "'var9' is used in multiple 'allocate' directives at .1." }
+  allocate (var9(x))
+
+end subroutine
+
+function outer(a)
+  IMPLICIT NONE
+
+  integer :: outer, a
+  integer, allocatable :: var1
+
+  outer = inner(a) + 5
+  return
+
+  contains
+
+    integer function inner(x)
+    integer :: x
+    integer, allocatable :: var2
+
+    !$omp allocate (var1, var2) ! { dg-error "'var1' is not in the same scope as 'allocate' directive at .1." }
+    allocate (var1, var2)
+
+    inner = x + 10
+    return
+    end function inner
+
+end function outer
+
+subroutine bar(s)
+  use omp_lib
+  use test
+  integer  :: s
+  integer, save, allocatable :: svar1
+  integer, save, allocatable :: svar2
+  integer, save, allocatable :: svar3
+
+  type (omp_alloctrait) :: traits(3)
+  integer (omp_allocator_handle_kind) :: a
+
+  traits = [omp_alloctrait (omp_atk_alignment, 64), &
+            omp_alloctrait (omp_atk_fallback, omp_atv_null_fb), &
+            omp_alloctrait (omp_atk_pool_size, 8192)]
+  a = omp_init_allocator (omp_default_mem_space, 3, traits)
+  if (a == omp_null_allocator) stop 1
+
+  !$omp allocate (mvar1) allocator(a) ! { dg-error "'mvar1' should use predefined allocator at .1." }
+  allocate (mvar1)
+
+  !$omp allocate (mvar2) ! { dg-error "'mvar2' should use predefined allocator at .1." }
+  allocate (mvar2)
+
+  !$omp allocate (mvar3) allocator(omp_low_lat_mem_alloc)
+  allocate (mvar3)
+
+  !$omp allocate (svar1)  allocator(a) ! { dg-error "'svar1' should use predefined allocator at .1." }
+  allocate (svar1)
+
+  !$omp allocate (svar2) ! { dg-error "'svar2' should use predefined allocator at .1." }
+  allocate (svar2)
+
+  !$omp allocate (svar3) allocator(omp_low_lat_mem_alloc)
+  allocate (svar3)
+end subroutine
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
new file mode 100644
index 00000000000..761b6dede28
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
@@ -0,0 +1,73 @@
+! { dg-do compile }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_thread_mem_alloc = 8
+end module
+
+subroutine foo(x, y)
+  use omp_lib_kinds
+  implicit none
+  integer  :: x
+  integer  :: y
+
+  integer, allocatable :: var1(:)
+  integer, allocatable :: var2(:)
+  integer, allocatable :: var3(:)
+  integer, allocatable :: var4(:,:)
+  integer, allocatable :: var5(:)
+  integer, allocatable :: var6(:)
+  integer, allocatable :: var7(:)
+  integer, allocatable :: var8(:)
+  integer, allocatable :: var9(:)
+  integer, allocatable :: var10(:)
+  integer, allocatable :: var11(:)
+  integer, allocatable :: var12(:)
+
+  !$omp allocate (var1) allocator(omp_default_mem_alloc)
+  allocate (var1(x))
+  
+  !$omp allocate (var2)
+  allocate (var2(x))
+
+  !$omp allocate (var3, var4) allocator(omp_large_cap_mem_alloc)
+  allocate (var3(x),var4(x,y))
+
+  !$omp allocate()
+  allocate (var5(x))
+
+  !$omp allocate
+  allocate (var6(x))
+
+  !$omp allocate () allocator(omp_default_mem_alloc)
+  allocate (var7(x))
+
+  !$omp allocate allocator(omp_default_mem_alloc)
+  allocate (var8(x))
+
+  !$omp allocate (var9) allocator(omp_default_mem_alloc)
+  !$omp allocate (var10) allocator(omp_large_cap_mem_alloc)
+  allocate (var9(x), var10(x))
+
+end subroutine

From patchwork Thu Jul  7 10:34:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55830
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 11/17] Translate allocate directive (OpenMP 5.0).
Date: Thu, 7 Jul 2022 11:34:42 +0100
Message-ID: 
 <6a5caebc7e24c68f4bf788ae2cd5ee2faf868051.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
	(gfc_trans_omp_allocate): New function.
	(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
	(dump_generic_node): Handle OMP_ALLOCATE.
	* tree.def (OMP_ALLOCATE): New.
	* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
	(OMP_ALLOCATE_DECL): Likewise.
	(OMP_ALLOCATE_ALLOCATOR): Likewise.
	* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: New test.
---
 gcc/fortran/trans-openmp.cc                   | 44 ++++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++++++++++++++++++
 gcc/tree-core.h                               |  3 +
 gcc/tree-pretty-print.cc                      | 19 +++++
 gcc/tree.cc                                   |  1 +
 gcc/tree.def                                  |  4 ++
 gcc/tree.h                                    | 11 +++
 7 files changed, 154 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index de27ed52c02..3ee63e416ed 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2728,6 +2728,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  }
 	      }
 	  break;
+	case OMP_LIST_ALLOCATOR:
+	  for (; n != NULL; n = n->next)
+	    if (n->sym->attr.referenced)
+	      {
+		tree t = gfc_trans_omp_variable (n->sym, false);
+		if (t != error_mark_node)
+		  {
+		    tree node = build_omp_clause (input_location,
+						  OMP_CLAUSE_ALLOCATOR);
+		    OMP_ALLOCATE_DECL (node) = t;
+		    if (n->expr)
+		      {
+			tree allocator_;
+			gfc_init_se (&se, NULL);
+			gfc_conv_expr (&se, n->expr);
+			allocator_ = gfc_evaluate_now (se.expr, block);
+			OMP_ALLOCATE_ALLOCATOR (node) = allocator_;
+		      }
+		    omp_clauses = gfc_trans_add_clause (node, omp_clauses);
+		  }
+	      }
+	  break;
 	case OMP_LIST_LINEAR:
 	  {
 	    gfc_expr *last_step_expr = NULL;
@@ -4982,6 +5004,26 @@ gfc_trans_omp_atomic (gfc_code *code)
   return gfc_finish_block (&block);
 }
 
+static tree
+gfc_trans_omp_allocate (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+
+  gfc_start_block (&block);
+  stmt = make_node (OMP_ALLOCATE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
+						       code->loc, false,
+						       true);
+  gfc_add_expr_to_block (&block, stmt);
+  gfc_merge_block_scope (&block);
+  return gfc_finish_block (&block);
+}
+
 static tree
 gfc_trans_omp_barrier (void)
 {
@@ -7488,6 +7530,8 @@ gfc_trans_omp_directive (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OMP_ALLOCATE:
+      return gfc_trans_omp_allocate (code);
     case EXEC_OMP_ATOMIC:
       return gfc_trans_omp_atomic (code);
     case EXEC_OMP_BARRIER:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
new file mode 100644
index 00000000000..2de2b52ee44
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -0,0 +1,72 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_thread_mem_alloc = 8
+end module
+
+
+subroutine foo(x, y, al)
+  use omp_lib_kinds
+  implicit none
+  
+type :: my_type
+  integer :: i
+  integer :: j
+  real :: x
+end type
+
+  integer  :: x
+  integer  :: y
+  integer (kind=omp_allocator_handle_kind) :: al
+
+  integer, allocatable :: var1
+  integer, allocatable :: var2
+  real, allocatable :: var3(:,:)
+  type (my_type), allocatable :: var4
+  integer, pointer :: pii, parr(:)
+
+  character, allocatable :: str1a, str1aarr(:) 
+  character(len=5), allocatable :: str5a, str5aarr(:)
+  
+  !$omp allocate
+  allocate(str1a, str1aarr(10), str5a, str5aarr(10))
+
+  !$omp allocate (var1) allocator(omp_default_mem_alloc)
+  !$omp allocate (var2) allocator(omp_large_cap_mem_alloc)
+  allocate (var1, var2)
+
+  !$omp allocate (var4)  allocator(omp_low_lat_mem_alloc)
+  allocate (var4)
+  var4%i = 5
+
+  !$omp allocate (var3)  allocator(omp_low_lat_mem_alloc)
+  allocate (var3(x,y))
+
+  !$omp allocate
+  allocate(pii, parr(5))
+end subroutine
+
+! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ab5fa01e5cb..774bf0d7658 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -522,6 +522,9 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: allocator.  */
+  OMP_CLAUSE_ALLOCATOR
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 47371d8bcbe..4d21babbd34 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -767,6 +767,20 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE_ALLOCATOR:
+      pp_string (pp, "(");
+      dump_generic_node (pp, OMP_ALLOCATE_DECL (clause),
+			 spc, flags, false);
+      if (OMP_ALLOCATE_ALLOCATOR (clause))
+	{
+	  pp_string (pp, ":allocator(");
+	  dump_generic_node (pp, OMP_ALLOCATE_ALLOCATOR (clause),
+			     spc, flags, false);
+	  pp_right_paren (pp);
+	}
+      pp_right_paren (pp);
+      break;
+
     case OMP_CLAUSE_ALLOCATE:
       pp_string (pp, "allocate(");
       if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (clause))
@@ -3525,6 +3539,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_omp_clauses (pp, OACC_CACHE_CLAUSES (node), spc, flags);
       break;
 
+    case OMP_ALLOCATE:
+      pp_string (pp, "#pragma omp allocate ");
+      dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags);
+      break;
+
     case OMP_PARALLEL:
       pp_string (pp, "#pragma omp parallel");
       dump_omp_clauses (pp, OMP_PARALLEL_CLAUSES (node), spc, flags);
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 84000dd8b69..6dc1cf4d9b3 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -351,6 +351,7 @@ unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
+  2, /* OMP_CLAUSE_ALLOCATOR */
 };
 
 const char * const omp_clause_code_name[] =
diff --git a/gcc/tree.def b/gcc/tree.def
index 62650b6934b..b4d2f7a575d 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1307,6 +1307,10 @@ DEFTREECODE (OMP_ATOMIC_READ, "omp_atomic_read", tcc_statement, 1)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_OLD, "omp_atomic_capture_old", tcc_statement, 2)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_NEW, "omp_atomic_capture_new", tcc_statement, 2)
 
+/* OpenMP - #pragma omp allocate
+   Operand 0: Clauses.  */
+DEFTREECODE (OMP_ALLOCATE, "omp_allocate", tcc_statement, 1)
+
 /* OpenMP clauses.  */
 DEFTREECODE (OMP_CLAUSE, "omp_clause", tcc_exceptional, 0)
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 6f6ad5a3a5f..b2575c18693 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1466,6 +1466,8 @@ class auto_suppress_location_wrappers
 #define OACC_UPDATE_CLAUSES(NODE) \
   TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0)
 
+#define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0)
+
 #define OMP_PARALLEL_BODY(NODE)    TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0)
 #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1)
 
@@ -1872,6 +1874,15 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_ALLOCATE_ALIGN(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE), 2)
 
+/* Maybe we can use OMP_CLAUSE_DECL, but I am not sure where to place
+   OMP_CLAUSE_ALLOCATOR in omp_clause_code.  */
+
+#define OMP_ALLOCATE_DECL(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 0)
+
+#define OMP_ALLOCATE_ALLOCATOR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 1)
+
 /* True if an ALLOCATE clause was present on a combined or composite
    construct and the code for splitting the clauses has already performed
    checking if the listed variable has explicit privatization on the

From patchwork Thu Jul  7 10:34:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55828
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).
Date: Thu, 7 Jul 2022 11:34:43 +0100
Message-ID: 
 <acf09be17780d6703540938e35207e0db4af450e.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

Currently we only handle an omp allocate directive that is associated
with an allocate statement.  That statement results in malloc and free calls.
The malloc calls are easy to find because they are in the same block as the
allocate directive, but the free calls are emitted in a separate cleanup
block.  To help later passes find them, an allocate directive with kind=free
is generated in the cleanup block.  The normal allocate directive is given
kind=allocate.
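
As an illustration only, the two kinds can be modelled outside GCC as two
flag bits on a node plus the prefix that the tree dump prints.  The names
mock_omp_allocate and mock_dump_prefix below are hypothetical and are not
part of this patch; the sketch only assumes the flag-to-string mapping
used in the tree-pretty-print.cc hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two tree_base flag bits that the patch
// reuses on OMP_ALLOCATE nodes (public_flag and private_flag).
struct mock_omp_allocate
{
  bool public_flag = false;   // models OMP_ALLOCATE_KIND_ALLOCATE
  bool private_flag = false;  // models OMP_ALLOCATE_KIND_FREE
};

// Follows the order of checks in the dump_generic_node hunk: the
// "allocate" flag is tested first, then the "free" flag.
std::string
mock_dump_prefix (const mock_omp_allocate &node)
{
  std::string out = "#pragma omp allocate ";
  if (node.public_flag)
    out += "(kind=allocate) ";
  else if (node.private_flag)
    out += "(kind=free) ";
  return out;
}
```

A node with neither flag set prints the bare directive, which is why the
testsuite patterns match on the explicit "(kind=...)" text.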

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_symbol): Declare new members
	omp_allocated and omp_allocated_end.
	* openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to
	NULL.
	(prepare_omp_allocated_var_list_for_cleanup): New function.
	(gfc_resolve_omp_allocate): Call it.
	* trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated.
	* trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt
	generated for allocate directive.

gcc/ChangeLog:

	* tree-core.h (struct tree_base): Add comments.
	* tree-pretty-print.cc (dump_generic_node): Handle allocate directive
	kind.
	* tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define.
	(OMP_ALLOCATE_KIND_FREE): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive.
---
 gcc/fortran/gfortran.h                        |  1 +
 gcc/fortran/openmp.cc                         | 30 +++++++++++++++++++
 gcc/fortran/trans-decl.cc                     | 20 +++++++++++++
 gcc/fortran/trans-openmp.cc                   |  6 ++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  3 +-
 gcc/tree-core.h                               |  6 ++++
 gcc/tree-pretty-print.cc                      |  4 +++
 gcc/tree.h                                    |  4 +++
 8 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 755469185a6..c6f58341cf3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1829,6 +1829,7 @@ typedef struct gfc_symbol
   gfc_array_spec *as;
   struct gfc_symbol *result;	/* function result symbol */
   gfc_component *components;	/* Derived type components */
+  gfc_omp_namelist *omp_allocated, *omp_allocated_end;
 
   /* Defined only for Cray pointees; points to their pointer.  */
   struct gfc_symbol *cp_pointer;
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 38003890bb0..4c94bc763b5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -6057,6 +6057,7 @@ gfc_match_omp_allocate (void)
 
   new_st.op = EXEC_OMP_ALLOCATE;
   new_st.ext.omp_clauses = c;
+  new_st.resolved_sym = NULL;
   gfc_free_expr (allocator);
   return MATCH_YES;
 }
@@ -9548,6 +9549,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
     }
 }
 
+static void
+prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc)
+{
+  gfc_symbol *proc = cn->sym->ns->proc_name;
+  gfc_omp_namelist *p, *n;
+
+  for (n = cn; n; n = n->next)
+    {
+      if (n->sym->attr.allocatable && !n->sym->attr.save
+	  && !n->sym->attr.result && !proc->attr.is_main_program)
+	{
+	  p = gfc_get_omp_namelist ();
+	  p->sym = n->sym;
+	  p->expr = gfc_copy_expr (n->expr);
+	  p->where = loc;
+	  p->next = NULL;
+	  if (proc->omp_allocated == NULL)
+	    proc->omp_allocated_end = proc->omp_allocated = p;
+	  else
+	    {
+	      proc->omp_allocated_end->next = p;
+	      proc->omp_allocated_end = p;
+	    }
+
+	}
+    }
+}
+
 static void
 check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
 				       gfc_namespace *ns, locus loc)
@@ -9678,6 +9707,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
 						 code->loc);
 	}
     }
+  prepare_omp_allocated_var_list_for_cleanup (cn, code->loc);
 }
 
 
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 6493cc2f6b1..326365f22fc 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -4588,6 +4588,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 	  }
     }
 
+  /* Generate a dummy allocate pragma with free kind so that cleanup
+     of those variables which were allocated using the allocate statement
+     associated with an allocate directive happens correctly.  */
+
+  if (proc_sym->omp_allocated)
+    {
+      gfc_clear_new_st ();
+      new_st.op = EXEC_OMP_ALLOCATE;
+      gfc_omp_clauses *c = gfc_get_omp_clauses ();
+      c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated;
+      new_st.ext.omp_clauses = c;
+      /* This is just a hacky way to convey to handler that we are
+	 dealing with cleanup here.  Saves us from using another field
+	 for it.  */
+      new_st.resolved_sym = proc_sym->omp_allocated->sym;
+      gfc_add_init_cleanup (block, NULL,
+			    gfc_trans_omp_directive (&new_st));
+      gfc_free_omp_clauses (c);
+      proc_sym->omp_allocated = NULL;
+    }
 
   /* Initialize the INTENT(OUT) derived type dummy arguments.  This
      should be done here so that the offsets and lbounds of arrays
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 3ee63e416ed..ab3c0c620b7 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5019,6 +5019,12 @@ gfc_trans_omp_allocate (gfc_code *code)
   OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
 						       code->loc, false,
 						       true);
+  if (code->next == NULL && code->block == NULL
+      && code->resolved_sym != NULL)
+    OMP_ALLOCATE_KIND_FREE (stmt) = 1;
+  else
+    OMP_ALLOCATE_KIND_ALLOCATE (stmt) = 1;
+
   gfc_add_expr_to_block (&block, stmt);
   gfc_merge_block_scope (&block);
   return gfc_finish_block (&block);
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 2de2b52ee44..0eb35178e03 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -69,4 +69,5 @@ end type
   allocate(pii, parr(5))
 end subroutine
 
-! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
+! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 774bf0d7658..b0d5c074552 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1257,6 +1257,9 @@ struct GTY(()) tree_base {
        EXPR_LOCATION_WRAPPER_P in
 	   NON_LVALUE_EXPR, VIEW_CONVERT_EXPR
 
+       OMP_ALLOCATE_KIND_ALLOCATE in
+	   OMP_ALLOCATE
+
    private_flag:
 
        TREE_PRIVATE in
@@ -1283,6 +1286,9 @@ struct GTY(()) tree_base {
        ENUM_IS_OPAQUE in
 	   ENUMERAL_TYPE
 
+       OMP_ALLOCATE_KIND_FREE in
+	   OMP_ALLOCATE
+
    protected_flag:
 
        TREE_PROTECTED in
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4d21babbd34..23dd45de556 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -3541,6 +3541,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 
     case OMP_ALLOCATE:
       pp_string (pp, "#pragma omp allocate ");
+      if (OMP_ALLOCATE_KIND_ALLOCATE (node))
+	pp_string (pp, "(kind=allocate) ");
+      else if (OMP_ALLOCATE_KIND_FREE (node))
+	pp_string (pp, "(kind=free) ");
       dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags);
       break;
 
diff --git a/gcc/tree.h b/gcc/tree.h
index b2575c18693..1b67505f974 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1467,6 +1467,10 @@ class auto_suppress_location_wrappers
   TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0)
 
 #define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0)
+#define OMP_ALLOCATE_KIND_ALLOCATE(NODE) \
+  (OMP_ALLOCATE_CHECK (NODE)->base.public_flag)
+#define OMP_ALLOCATE_KIND_FREE(NODE) \
+  (OMP_ALLOCATE_CHECK (NODE)->base.private_flag)
 
 #define OMP_PARALLEL_BODY(NODE)    TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0)
 #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1)

From patchwork Thu Jul  7 10:34:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55833
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id CACC1382E81C
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:38:15 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 5B41E383F969
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:37:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B41E383F969
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448958"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:16 -0800
IronPort-SDR: 
 OJkQtFoiTcsRZP1TmIlZCK5bjbpXzqj+/X0wZUEFkKoNxHpC7FxfMmwhB9wbBwmk03+JfoV1Dn
 2luaPbzAitG225lbfRDCzySwjg+DvxoSW+P0yRtTW31ZHY7N+o86AKBabAMDZ21QPSbTB3ZuaI
 o7T+nK0V4WfgQJBdx0zNSgGrF/R4RKQiHzt67/4w7bN6fKbgz1HR4WlMhQBejFZwt+IQUQ6dxL
 s8fZ6NGQfBY+N0ZQjFK814gpi9OySzE83e/xODO04/LHaw07cCeNWwzQgJ9bHouz6a/eYS6M6I
 tKA=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).
Date: Thu, 7 Jul 2022 11:34:44 +0100
Message-ID: 
 <9e2ae3ebed095b6a0b3f57fc93c5ee8f8f3d0a45.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

gcc/ChangeLog:

	* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
	* gimple-pretty-print.cc (dump_gimple_omp_allocate): New function.
	(pp_gimple_stmt_1): Call it.
	* gimple.cc (gimple_build_omp_allocate): New function.
	* gimple.def (GIMPLE_OMP_ALLOCATE): New node.
	* gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK,
	GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE.
	(struct gomp_allocate): New.
	(is_a_helper <gomp_allocate *>::test): New.
	(is_a_helper <const gomp_allocate *>::test): New.
	(gimple_build_omp_allocate): Declare.
	(gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with
	GIMPLE_OMP_ALLOCATE.
	(gimple_omp_allocate_set_clauses): New.
	(gimple_omp_allocate_set_kind): Likewise.
	(gimple_omp_allocate_clauses): Likewise.
	(gimple_omp_allocate_kind): Likewise.
	(CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE.
	* gimplify.cc (gimplify_omp_allocate): New.
	(gimplify_expr): Call it.
	* gsstruct.def (GSS_OMP_ALLOCATE): Define.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Add tests.
---
 gcc/doc/gimple.texi                           | 38 +++++++++++-
 gcc/gimple-pretty-print.cc                    | 37 ++++++++++++
 gcc/gimple.cc                                 | 12 ++++
 gcc/gimple.def                                |  6 ++
 gcc/gimple.h                                  | 60 ++++++++++++++++++-
 gcc/gimplify.cc                               | 19 ++++++
 gcc/gsstruct.def                              |  1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  4 +-
 8 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index dd9149377f3..67b9061f3a7 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and
      + gomp_continue
      |        layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE
      |
+     + gomp_allocate
+     |        layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE
+     |
      + gomp_atomic_load
      |        layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD
      |
@@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set.
 @item @code{GIMPLE_GOTO}		@tab x			@tab x
 @item @code{GIMPLE_LABEL}		@tab x			@tab x
 @item @code{GIMPLE_NOP}			@tab x			@tab x
+@item @code{GIMPLE_OMP_ALLOCATE}	@tab x			@tab x
 @item @code{GIMPLE_OMP_ATOMIC_LOAD}	@tab x			@tab x
 @item @code{GIMPLE_OMP_ATOMIC_STORE}	@tab x			@tab x
 @item @code{GIMPLE_OMP_CONTINUE}	@tab x			@tab x
@@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}.
 * @code{GIMPLE_LABEL}::
 * @code{GIMPLE_GOTO}::
 * @code{GIMPLE_NOP}::
+* @code{GIMPLE_OMP_ALLOCATE}::
 * @code{GIMPLE_OMP_ATOMIC_LOAD}::
 * @code{GIMPLE_OMP_ATOMIC_STORE}::
 * @code{GIMPLE_OMP_CONTINUE}::
@@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement.
 Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}.
 @end deftypefn
 
+@node @code{GIMPLE_OMP_ALLOCATE}
+@subsection @code{GIMPLE_OMP_ALLOCATE}
+@cindex @code{GIMPLE_OMP_ALLOCATE}
+
+@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @
+tree clauses, int kind)
+Build a @code{GIMPLE_OMP_ALLOCATE} statement.  @code{CLAUSES} are the clauses
+associated with this node.  @code{KIND} is the enumeration value
+@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory
+or @code{GF_OMP_ALLOCATE_KIND_FREE} if it deallocates memory.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @
+gomp_allocate *g, tree clauses)
+Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} tree gimple_omp_allocate_clauses ( @
+const gomp_allocate *g)
+Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @
+gomp_allocate *g, int kind)
+Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} int gimple_omp_allocate_kind ( @
+const gomp_allocate *g)
+Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
 @node @code{GIMPLE_OMP_ATOMIC_LOAD}
 @subsection @code{GIMPLE_OMP_ATOMIC_LOAD}
 @cindex @code{GIMPLE_OMP_ATOMIC_LOAD}
@@ -1760,7 +1797,6 @@ const gomp_atomic_load *g)
 Get the @code{RHS} of an atomic set.
 @end deftypefn
 
-
 @node @code{GIMPLE_OMP_ATOMIC_STORE}
 @subsection @code{GIMPLE_OMP_ATOMIC_STORE}
 @cindex @code{GIMPLE_OMP_ATOMIC_STORE}
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index ebd87b20a0a..bb961a900df 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *buffer, const gomp_critical *gs,
     }
 }
 
+static void
+dump_gimple_omp_allocate (pretty_printer *buffer, const gomp_allocate *gs,
+			  int spc, dump_flags_t flags)
+{
+  if (flags & TDF_RAW)
+    {
+      const char *kind = "";
+      switch (gimple_omp_allocate_kind (gs))
+	{
+	case GF_OMP_ALLOCATE_KIND_ALLOCATE:
+	  kind = "allocate";
+	  break;
+	case GF_OMP_ALLOCATE_KIND_FREE:
+	  kind = "free";
+	  break;
+	}
+      dump_gimple_fmt (buffer, spc, flags, "%G <kind:%s CLAUSES <", gs, kind);
+      dump_omp_clauses (buffer, gimple_omp_allocate_clauses (gs), spc, flags);
+      dump_gimple_fmt (buffer, spc, flags, " > >");
+    }
+  else
+    {
+      pp_string (buffer, "#pragma omp allocate ");
+      if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_ALLOCATE)
+	pp_string (buffer, "(kind=allocate) ");
+      else if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_FREE)
+	pp_string (buffer, "(kind=free) ");
+
+      dump_omp_clauses (buffer, gimple_omp_allocate_clauses (gs), spc, flags);
+    }
+}
+
 /* Dump a GIMPLE_OMP_ORDERED tuple on the pretty_printer BUFFER.  */
 
 static void
@@ -2823,6 +2855,11 @@ pp_gimple_stmt_1 (pretty_printer *buffer, const gimple *gs, int spc,
 				flags);
       break;
 
+    case GIMPLE_OMP_ALLOCATE:
+      dump_gimple_omp_allocate (buffer, as_a <const gomp_allocate *> (gs), spc,
+				flags);
+      break;
+
     case GIMPLE_CATCH:
       dump_gimple_catch (buffer, as_a <const gcatch *> (gs), spc, flags);
       break;
diff --git a/gcc/gimple.cc b/gcc/gimple.cc
index 9b156399ba1..a8b29f85d3d 100644
--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc
@@ -1280,6 +1280,18 @@ gimple_build_omp_atomic_store (tree val, enum omp_memory_order mo)
   return p;
 }
 
+/* Build a GIMPLE_OMP_ALLOCATE statement.  */
+
+gomp_allocate *
+gimple_build_omp_allocate (tree clauses, int kind)
+{
+  gomp_allocate *p
+    = as_a <gomp_allocate *> (gimple_alloc (GIMPLE_OMP_ALLOCATE, 0));
+  gimple_omp_allocate_set_clauses (p, clauses);
+  gimple_omp_allocate_set_kind (p, kind);
+  return p;
+}
+
 /* Build a GIMPLE_TRANSACTION statement.  */
 
 gtransaction *
diff --git a/gcc/gimple.def b/gcc/gimple.def
index 296c73c2d52..079565c3920 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -388,6 +388,12 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT)
    CHILD_FN and DATA_ARG like for GIMPLE_OMP_PARALLEL.  */
 DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_PARALLEL_LAYOUT)
 
+/* GIMPLE_OMP_ALLOCATE <CLAUSES> represents
+   #pragma omp allocate
+   CLAUSES is an OMP_CLAUSE chain holding the associated clauses which hold
+   variables to be allocated.  */
+DEFGSCODE(GIMPLE_OMP_ALLOCATE, "gimple_omp_allocate", GSS_OMP_ALLOCATE)
+
 /* GIMPLE_OMP_ORDERED <BODY, CLAUSES> represents #pragma omp ordered.
    BODY is the sequence of statements to execute in the ordered section.
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 1d15ff98ac2..aa0ae4078ad 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -150,6 +150,9 @@ enum gf_mask {
     GF_CALL_BY_DESCRIPTOR	= 1 << 10,
     GF_CALL_NOCF_CHECK		= 1 << 11,
     GF_CALL_FROM_NEW_OR_DELETE	= 1 << 12,
+    GF_OMP_ALLOCATE_KIND_MASK	= (1 << 2) - 1,
+    GF_OMP_ALLOCATE_KIND_ALLOCATE = 1,
+    GF_OMP_ALLOCATE_KIND_FREE = 2,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
     GF_OMP_TASK_TASKLOOP	= 1 << 0,
     GF_OMP_TASK_TASKWAIT	= 1 << 1,
@@ -796,6 +799,17 @@ struct GTY((tag("GSS_OMP_ATOMIC_LOAD")))
   tree rhs, lhs;
 };
 
+/* GSS_OMP_ALLOCATE.  */
+
+struct GTY((tag("GSS_OMP_ALLOCATE")))
+  gomp_allocate : public gimple
+{
+  /* [ WORD 1-6 ] : base class */
+
+  /* [ WORD 7 ]  */
+  tree clauses;
+};
+
 /* GIMPLE_OMP_ATOMIC_STORE.
    See note on GIMPLE_OMP_ATOMIC_LOAD.  */
 
@@ -1129,6 +1143,14 @@ is_a_helper <gomp_atomic_store *>::test (gimple *gs)
   return gs->code == GIMPLE_OMP_ATOMIC_STORE;
 }
 
+template <>
+template <>
+inline bool
+is_a_helper <gomp_allocate *>::test (gimple *gs)
+{
+  return gs->code == GIMPLE_OMP_ALLOCATE;
+}
+
 template <>
 template <>
 inline bool
@@ -1371,6 +1393,14 @@ is_a_helper <const gomp_atomic_store *>::test (const gimple *gs)
   return gs->code == GIMPLE_OMP_ATOMIC_STORE;
 }
 
+template <>
+template <>
+inline bool
+is_a_helper <const gomp_allocate *>::test (const gimple *gs)
+{
+  return gs->code == GIMPLE_OMP_ALLOCATE;
+}
+
 template <>
 template <>
 inline bool
@@ -1572,6 +1602,7 @@ gomp_sections *gimple_build_omp_sections (gimple_seq, tree);
 gimple *gimple_build_omp_sections_switch (void);
 gomp_single *gimple_build_omp_single (gimple_seq, tree);
 gomp_target *gimple_build_omp_target (gimple_seq, int, tree);
+gomp_allocate *gimple_build_omp_allocate (tree, int);
 gomp_teams *gimple_build_omp_teams (gimple_seq, tree);
 gomp_atomic_load *gimple_build_omp_atomic_load (tree, tree,
 						enum omp_memory_order);
@@ -2312,7 +2343,7 @@ static inline unsigned
 gimple_omp_subcode (const gimple *s)
 {
   gcc_gimple_checking_assert (gimple_code (s) >= GIMPLE_OMP_ATOMIC_LOAD
-			      && gimple_code (s) <= GIMPLE_OMP_TEAMS);
+			      && gimple_code (s) <= GIMPLE_OMP_ALLOCATE);
   return s->subcode;
 }
 
@@ -6365,6 +6396,30 @@ gimple_omp_sections_set_control (gimple *gs, tree control)
   omp_sections_stmt->control = control;
 }
 
+static inline void
+gimple_omp_allocate_set_clauses (gomp_allocate *gs, tree c)
+{
+  gs->clauses = c;
+}
+
+static inline void
+gimple_omp_allocate_set_kind (gomp_allocate *gs, int kind)
+{
+  gs->subcode = (gs->subcode & ~GF_OMP_ALLOCATE_KIND_MASK)
+		      | (kind & GF_OMP_ALLOCATE_KIND_MASK);
+}
+
+static inline tree
+gimple_omp_allocate_clauses (const gomp_allocate *gs)
+{
+  return gs->clauses;
+}
+
+static inline int
+gimple_omp_allocate_kind (const gomp_allocate *gs)
+{
+  return (gimple_omp_subcode (gs) & GF_OMP_ALLOCATE_KIND_MASK);
+}
 
 /* Set the value being stored in an atomic store.  */
 
@@ -6648,7 +6703,8 @@ gimple_return_set_retval (greturn *gs, tree retval)
     case GIMPLE_OMP_RETURN:			\
     case GIMPLE_OMP_ATOMIC_LOAD:		\
     case GIMPLE_OMP_ATOMIC_STORE:		\
-    case GIMPLE_OMP_CONTINUE
+    case GIMPLE_OMP_CONTINUE:			\
+    case GIMPLE_OMP_ALLOCATE
 
 static inline bool
 is_gimple_omp (const gimple *stmt)
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 04990ad91a6..1119ee3bc42 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -14356,6 +14356,21 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
   *expr_p = NULL_TREE;
 }
 
+static void
+gimplify_omp_allocate (tree *expr_p, gimple_seq *pre_p)
+{
+  tree expr = *expr_p;
+  int kind;
+  if (OMP_ALLOCATE_KIND_ALLOCATE (expr))
+    kind = GF_OMP_ALLOCATE_KIND_ALLOCATE;
+  else
+    kind = GF_OMP_ALLOCATE_KIND_FREE;
+  gimple *stmt = gimple_build_omp_allocate (OMP_ALLOCATE_CLAUSES (expr),
+					    kind);
+  gimplify_seq_add_stmt (pre_p, stmt);
+  *expr_p = NULL_TREE;
+}
+
 /* Gimplify the gross structure of OpenACC enter/exit data, update, and OpenMP
    target update constructs.  */
 
@@ -15755,6 +15770,10 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  gimplify_omp_target_update (expr_p, pre_p);
 	  ret = GS_ALL_DONE;
 	  break;
+	case OMP_ALLOCATE:
+	  gimplify_omp_allocate (expr_p, pre_p);
+	  ret = GS_ALL_DONE;
+	  break;
 
 	case OMP_SECTION:
 	case OMP_MASTER:
diff --git a/gcc/gsstruct.def b/gcc/gsstruct.def
index 19e1088b718..9c7526596e8 100644
--- a/gcc/gsstruct.def
+++ b/gcc/gsstruct.def
@@ -50,4 +50,5 @@ DEFGSSTRUCT(GSS_OMP_SINGLE_LAYOUT, gimple_statement_omp_single_layout, false)
 DEFGSSTRUCT(GSS_OMP_CONTINUE, gomp_continue, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_LOAD, gomp_atomic_load, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_STORE_LAYOUT, gomp_atomic_store, false)
+DEFGSSTRUCT(GSS_OMP_ALLOCATE, gomp_allocate, false)
 DEFGSSTRUCT(GSS_TRANSACTION, gtransaction, false)
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 0eb35178e03..6957bc55da0 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" }
 
 module omp_lib_kinds
   use iso_c_binding, only: c_int, c_intptr_t
@@ -71,3 +71,5 @@ end subroutine
 
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } }
+! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } }

From patchwork Thu Jul  7 10:34:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55831
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 04ED73838202
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:37:50 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 136DA3844079
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:37:16 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 136DA3844079
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112774"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:16 -0800
IronPort-SDR: 
 4b4n0proXL+0z8LjrW7Rl5WDJ6QgZ7MeKod31ztvsV4f7YzrB0lJ36QUs3M1//x/lfYNdNnLTM
 cDsIlLDyJlq+KmjndHNywszOvqXUE9uk6pXL5c9Aa9ITCLmQyrnOBpGY0hTRtxkqUCpIj7ccYl
 dCzEOGLqZVGkkPE/5uePdbLebvD2eVj8c4YwxukZVb0uHxNtzGmdNP7NXVEMjdYpIHPYsO0Ojq
 7NlSJy7T+ZC8iT1t6OPq8UcQphBn7Quxa4r5SOR3L0gb1LjhZjnlhnGaxR1ftj+0ozvuvCg3dY
 kTw=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 14/17] Lower allocate directive (OpenMP 5.0).
Date: Thu, 7 Jul 2022 11:34:45 +0100
Message-ID: 
 <0f75f3d2a2b5bf11ec30ed989d6237e438d94f77.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This patch looks for malloc/free calls that were generated by an allocate
statement associated with an allocate directive and replaces them with
GOMP_alloc and GOMP_free.
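
A much simplified model of that scan, for illustration only: statements
are reduced to a function name and the variable they act on, and
mock_call and mock_lower_allocate are hypothetical names that do not
exist in omp-low.cc.  The sketch assumes that, once the malloc for a
variable has been replaced, the walk keeps looking for a matching free
(as the "allocate = false" step in lower_omp_allocate suggests).  It
does not model the argument lists of the replacement calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified statement: a call to FN that either defines
// or consumes the pointer held in VAR.
struct mock_call
{
  std::string fn;   // "malloc", "free", or anything else
  std::string var;  // variable the call allocates for or frees
};

// Walk the statements that follow the directive.  For kind=allocate,
// rewrite the first malloc for VAR and then keep scanning for a matching
// free.  For kind=free, rewrite matching free calls only.
void
mock_lower_allocate (std::vector<mock_call> &stmts, const std::string &var,
                     bool allocate)
{
  for (mock_call &s : stmts)
    {
      if (s.var != var)
        continue;
      if (allocate && s.fn == "malloc")
        {
          s.fn = "GOMP_alloc";
          allocate = false;
        }
      else if (!allocate && s.fn == "free")
        s.fn = "GOMP_free";
    }
}
```

Calls that act on other variables are left alone, so a malloc that is
not named in the directive still ends up as a plain malloc.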

gcc/ChangeLog:

	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
	(scan_omp_allocate): New.
	(scan_omp_1_stmt): Call it.
	(lower_omp_allocate): New function.
	(lower_omp_1): Call it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: Add tests.
	* gfortran.dg/gomp/allocate-7.f90: New test.
	* gfortran.dg/gomp/allocate-8.f90: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-2.f90: New test.
---
 gcc/omp-low.cc                                | 139 ++++++++++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |   9 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 |  13 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 |  15 ++
 .../testsuite/libgomp.fortran/allocate-2.f90  |  48 ++++++
 5 files changed, 224 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index cdadd6f0c96..7d1a2a0d795 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1746,6 +1746,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_FINALIZE:
 	case OMP_CLAUSE_TASK_REDUCTION:
 	case OMP_CLAUSE_ALLOCATE:
+	case OMP_CLAUSE_ALLOCATOR:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -1963,6 +1964,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_FINALIZE:
 	case OMP_CLAUSE_FILTER:
 	case OMP_CLAUSE__CONDTEMP_:
+	case OMP_CLAUSE_ALLOCATOR:
 	  break;
 
 	case OMP_CLAUSE__CACHE_:
@@ -3033,6 +3035,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt,
   maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true;
 }
 
+/* Scan an OpenMP allocate directive.  */
+
+static void
+scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx)
+{
+  omp_context *ctx;
+  ctx = new_omp_context (stmt, outer_ctx);
+  scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -4332,6 +4344,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	    insert_decl_map (&ctx->cb, var, var);
       }
       break;
+    case GIMPLE_OMP_ALLOCATE:
+      scan_omp_allocate (as_a <gomp_allocate *> (stmt), ctx);
+      break;
     default:
       *handled_ops_p = false;
       break;
@@ -8768,6 +8783,125 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p)
   gimple_seq_add_stmt (pre_p, gimple_build_label (flabel));
 }
 
+static void
+lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *ctx)
+{
+  gomp_allocate *st = as_a <gomp_allocate *> (gsi_stmt (*gsi_p));
+  tree clauses = gimple_omp_allocate_clauses (st);
+  int kind = gimple_omp_allocate_kind (st);
+  gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE
+	      || kind == GF_OMP_ALLOCATE_KIND_FREE);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR)
+	continue;
+
+      bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE);
+      /* The allocate directives that appear in a target region must specify
+	 an allocator clause unless a requires directive with the
+	 dynamic_allocators clause is present in the same compilation unit.  */
+      if (OMP_ALLOCATE_ALLOCATOR (c) == NULL_TREE
+	  && ((omp_requires_mask & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0)
+	  && omp_maybe_offloaded_ctx (ctx))
+	error_at (OMP_CLAUSE_LOCATION (c), "%<allocate%> directive must"
+		  " specify an allocator here");
+
+      tree var = OMP_ALLOCATE_DECL (c);
+
+      gimple_stmt_iterator gsi = *gsi_p;
+      for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+
+	  if (gimple_code (stmt) != GIMPLE_CALL
+	      || (allocate && gimple_call_fndecl (stmt)
+		  != builtin_decl_explicit (BUILT_IN_MALLOC))
+	      || (!allocate && gimple_call_fndecl (stmt)
+		  != builtin_decl_explicit (BUILT_IN_FREE)))
+	    continue;
+	  const gcall *gs = as_a <const gcall *> (stmt);
+	  tree allocator = OMP_ALLOCATE_ALLOCATOR (c)
+			   ? OMP_ALLOCATE_ALLOCATOR (c)
+			   : integer_zero_node;
+	  if (allocate)
+	    {
+	      tree lhs = gimple_call_lhs (gs);
+	      if (lhs && TREE_CODE (lhs) == SSA_NAME)
+		{
+		  gimple_stmt_iterator gsi2 = gsi;
+		  gsi_next (&gsi2);
+		  gimple *assign = gsi_end_p (gsi2) ? NULL : gsi_stmt (gsi2);
+		  if (assign && gimple_code (assign) == GIMPLE_ASSIGN)
+		    {
+		      lhs = gimple_assign_lhs (as_a <const gassign *> (assign));
+		      if (lhs == NULL_TREE
+			  || TREE_CODE (lhs) != COMPONENT_REF)
+			continue;
+		      lhs = TREE_OPERAND (lhs, 0);
+		    }
+		}
+
+	      if (lhs == var)
+		{
+		  unsigned HOST_WIDE_INT ialign = 0;
+		  tree align;
+		  if (TYPE_P (var))
+		    ialign = TYPE_ALIGN_UNIT (var);
+		  else
+		    ialign = DECL_ALIGN_UNIT (var);
+		  align = build_int_cst (size_type_node, ialign);
+		  tree repl = builtin_decl_explicit (BUILT_IN_GOMP_ALLOC);
+		  tree size = gimple_call_arg (gs, 0);
+		  gimple *g = gimple_build_call (repl, 3, align, size,
+						 allocator);
+		  gimple_call_set_lhs (g, gimple_call_lhs (gs));
+		  gimple_set_location (g, gimple_location (stmt));
+		  gsi_replace (&gsi, g, true);
+		  /* The malloc call has been replaced.  Now see if there is
+		     any free call due to deallocate statement and replace
+		     that too.  */
+		  allocate = false;
+		}
+	    }
+	  else
+	    {
+	      tree arg = gimple_call_arg (gs, 0);
+	      if (arg && TREE_CODE (arg) == SSA_NAME)
+		{
+		  gimple_stmt_iterator gsi2 = gsi;
+		  gsi_prev (&gsi2);
+		  if (!gsi_end_p (gsi2))
+		    {
+		      gimple *gs = gsi_stmt (gsi2);
+		      if (gimple_code (gs) == GIMPLE_ASSIGN)
+			{
+			  const gassign *assign = as_a <const gassign *> (gs);
+			  tree rhs = gimple_assign_rhs1 (assign);
+			  tree lhs = gimple_assign_lhs (assign);
+			  if (lhs == arg && rhs
+			      && TREE_CODE (rhs) == COMPONENT_REF)
+			    arg = TREE_OPERAND (rhs, 0);
+			}
+		    }
+		}
+
+	      if (arg == var)
+		{
+		  tree repl = builtin_decl_explicit (BUILT_IN_GOMP_FREE);
+		  gimple *g = gimple_build_call (repl, 2,
+						 gimple_call_arg (gs, 0),
+						 allocator);
+		  gimple_set_location (g, gimple_location (stmt));
+		  gsi_replace (&gsi, g, true);
+		  break;
+		}
+	    }
+	}
+    }
+  gsi_replace (gsi_p, gimple_build_nop (), true);
+}
+
 
 /* A subroutine of lower_omp_single.  Expand the simple form of
    a GIMPLE_OMP_SINGLE, with a copyprivate clause:
@@ -14431,6 +14565,11 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gcc_assert (ctx);
       lower_omp_scope (gsi_p, ctx);
       break;
+    case GIMPLE_OMP_ALLOCATE:
+      ctx = maybe_lookup_ctx (stmt);
+      gcc_assert (ctx);
+      lower_omp_allocate (gsi_p, ctx);
+      break;
     case GIMPLE_OMP_SINGLE:
       ctx = maybe_lookup_ctx (stmt);
       gcc_assert (ctx);
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
index 6957bc55da0..738d9936f6a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -1,5 +1,6 @@
 ! { dg-do compile }
 ! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" }
+! { dg-additional-options "-fdump-tree-omplower" }
 
 module omp_lib_kinds
   use iso_c_binding, only: c_int, c_intptr_t
@@ -47,6 +48,7 @@ end type
   real, allocatable :: var3(:,:)
   type (my_type), allocatable :: var4
   integer, pointer :: pii, parr(:)
+  integer, allocatable :: var
 
   character, allocatable :: str1a, str1aarr(:) 
   character(len=5), allocatable :: str5a, str5aarr(:)
@@ -67,9 +69,16 @@ end type
 
   !$omp allocate
   allocate(pii, parr(5))
+
+  ! allocate statement not associated with an allocate directive
+  allocate(var)
 end subroutine
 
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } }
 ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } }
 ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } }
+! { dg-final { scan-tree-dump-times "builtin_malloc" 11 "original" } }
+! { dg-final { scan-tree-dump-times "builtin_free" 9 "original" } }
+! { dg-final { scan-tree-dump-times "GOMP_alloc" 10 "omplower" } }
+! { dg-final { scan-tree-dump-times "GOMP_free" 8 "omplower" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
new file mode 100644
index 00000000000..db76e901c08
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+
+subroutine bar(a)
+  implicit none
+  integer  :: a
+  integer, allocatable :: var
+!$omp target
+  !$omp allocate (var) ! { dg-error "'allocate' directive must specify an allocator here" }
+  allocate (var)
+!$omp end target
+
+end subroutine
+
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
new file mode 100644
index 00000000000..699a3b80878
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+
+
+subroutine bar(a)
+  implicit none
+  integer  :: a
+  integer, allocatable :: var
+!$omp requires dynamic_allocators
+!$omp target
+  !$omp allocate (var)
+  allocate (var)
+!$omp end target
+
+end subroutine
+
diff --git a/libgomp/testsuite/libgomp.fortran/allocate-2.f90 b/libgomp/testsuite/libgomp.fortran/allocate-2.f90
new file mode 100644
index 00000000000..2219f107fe7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/allocate-2.f90
@@ -0,0 +1,48 @@
+! { dg-do run }
+! { dg-additional-sources allocate-1.c }
+! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
+
+module m
+  use omp_lib
+  use iso_c_binding
+  implicit none
+  interface
+    integer(c_int) function is_64bit_aligned (a) bind(C)
+      import :: c_int
+      integer  :: a
+    end
+  end interface
+
+contains
+
+subroutine foo (x, y, h)
+  use omp_lib
+  integer  :: x
+  integer  :: y
+  integer (kind=omp_allocator_handle_kind) :: h
+  integer, allocatable :: var1
+
+  !$omp allocate (var1)  allocator(h)
+  allocate (var1)
+
+  if (is_64bit_aligned(var1) == 0) then
+    stop 19
+  end if
+
+  deallocate(var1)
+end subroutine
+end module m
+
+program main
+  use omp_lib
+  use m
+  type (omp_alloctrait) :: traits(2)
+  integer (omp_allocator_handle_kind) :: a
+
+  traits = [omp_alloctrait (omp_atk_alignment, 64), &
+            omp_alloctrait (omp_atk_fallback, omp_atv_null_fb)]
+  a = omp_init_allocator (omp_default_mem_space, 2, traits)
+  if (a == omp_null_allocator) stop 1
+  call foo (42, 12, a);
+  call omp_destroy_allocator (a);
+end

From patchwork Thu Jul  7 10:34:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55836
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3C1DC3955628
	for <patchwork@sourceware.org>; Thu,  7 Jul 2022 10:38:47 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 300213838238
 for <gcc-patches@gcc.gnu.org>; Thu,  7 Jul 2022 10:38:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 300213838238
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112835"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:38:12 -0800
IronPort-SDR: 
 hEjIAa4QkQzPHJwGrx+6hzuEgV0js3fkSPeCOfuiSwjJQfbl70sdBY8V0wcFthqrh2jxnE3njD
 WlcxzFmfwXV+Hx+iWynepmH9t+LyqRfPo6I3DFYVI1OaUe8cpSLe4rP017e9rokM725Ukf4DU1
 4dBq/0Oj0h+2IXIC76nPeTyLy9Jdnoql3A/0gVO0XaIXJNiQ1L452rIwr0azTDyNzuKZ1kDfU2
 +v2ez3IeTmUftWJtlLP4GPT1kcdGLXvrurFaU9abewaDazJSQfZi4hs+ZUkqZuc517zbHuBqOc
 sX4=
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 15/17] amdgcn: Support XNACK mode
Date: Thu, 7 Jul 2022 11:34:46 +0100
Message-ID: 
 <f8527d622db92f64123282e7bdca1e404c1c316f.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_STOCKGEN,
 SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt.  This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.

To support the feature we must be able to set the appropriate metadata and
mark the destinations of load instructions as early-clobber, so that a load
cannot overwrite its own address registers before it is restarted.  When the
port supports scheduling of s_waitcnt instructions there will be further
requirements.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
	(ASM_SPEC): Use XNACKOPT.
	* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
	(enum hsaco_attr_type): ... this, and generalize the names.
	(TARGET_XNACK): New macro.
	* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
	Add xnack compatible alternatives.
	(gather<mode>_insn_2offsets<exec>): Likewise.
	* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
	other than Fiji.
	(gcn_expand_epilogue): Remove early-clobber problems.
	(output_file_start): Emit xnack attributes.
	(gcn_hsa_declare_function_name): Obey -mxnack setting.
	* config/gcn/gcn.md (xnack): New attribute.
	(enabled): Rework to include "xnack" attribute.
	(*movbi): Add xnack compatible alternatives.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*mov<mode>_insn): Likewise.
	(*movti_insn): Likewise.
	* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
	(sram_ecc_type): Rename to ...
	(hsaco_attr_type): ... this.
	* config/gcn/mkoffload.cc (SET_XNACK_ANY): New macro.
	(TEST_XNACK): Delete.
	(TEST_XNACK_ANY): New macro.
	(TEST_XNACK_ON): New macro.
	(main): Support the new -mxnack=on/off/any syntax.
---
 gcc/config/gcn/gcn-hsa.h    |   3 +-
 gcc/config/gcn/gcn-opts.h   |  10 ++--
 gcc/config/gcn/gcn-valu.md  |  29 ++++-----
 gcc/config/gcn/gcn.cc       |  34 ++++++-----
 gcc/config/gcn/gcn.md       | 113 +++++++++++++++++++++++-------------
 gcc/config/gcn/gcn.opt      |  18 +++---
 gcc/config/gcn/mkoffload.cc |  19 ++++--
 7 files changed, 140 insertions(+), 86 deletions(-)
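
Reviewer note (not part of the commit): the reworked "enabled" attribute in
gcn.md can be read as the plain-C predicate below.  This is only a sketch;
the enum and function names are invented for illustration, and the two
boolean parameters stand in for TARGET_GCN5_PLUS and TARGET_XNACK.

```c
#include <stdbool.h>

/* Hypothetical C mirrors of the "gcn_version" and "xnack" insn attributes.  */
enum gcn_version_attr { GCN_VERSION_GCN3, GCN_VERSION_GCN5 };
enum xnack_attr { XNACK_NA, XNACK_OFF, XNACK_ON };

/* Return true if an alternative carrying these attribute values would be
   enabled.  Each test corresponds to one arm of the (cond ...) expression;
   the fall-through result is "enabled".  */
bool
alternative_enabled (enum gcn_version_attr version, enum xnack_attr xnack,
		     bool target_gcn5_plus, bool target_xnack)
{
  /* GCN5-only alternatives are disabled on older devices.  */
  if (version == GCN_VERSION_GCN5 && !target_gcn5_plus)
    return false;
  /* The non-early-clobber variant is disabled when XNACK may be active.  */
  if (xnack == XNACK_OFF && target_xnack)
    return false;
  /* The early-clobber variant is disabled when XNACK is known to be off.  */
  if (xnack == XNACK_ON && !target_xnack)
    return false;
  return true;
}
```

Read this way, a pair of alternatives tagged "off,on" exposes exactly one of
the two for any given -mxnack setting, while alternatives marked "*" keep the
default "na" and are never filtered on XNACK.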

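Reviewer note (not part of the commit): the three-state -mxnack= value ends
up in two places, the target-ID suffix written by output_file_start and the
ELF flags maintained by mkoffload.  The sketch below shows both mappings in
isolation.  The helper names are hypothetical, the parser is case-sensitive
for brevity, and the numeric constants are assumed to match the
EF_AMDGPU_FEATURE_XNACK_*_V4 definitions that mkoffload.cc already uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same three states as enum hsaco_attr_type in gcn-opts.h.  */
enum hsaco_attr_type { HSACO_ATTR_OFF, HSACO_ATTR_ON, HSACO_ATTR_ANY };

/* Assumed values of the HSACOv4 XNACK field in the ELF e_flags.  */
#define XNACK_MASK 0x300u
#define XNACK_ANY  0x100u
#define XNACK_OFF  0x200u
#define XNACK_ON   0x300u

/* Hypothetical helper: map the text after "-mxnack=" to a state.
   Returns false for anything other than on/off/any.  */
bool
parse_xnack (const char *value, enum hsaco_attr_type *out)
{
  if (strcmp (value, "on") == 0)
    *out = HSACO_ATTR_ON;
  else if (strcmp (value, "off") == 0)
    *out = HSACO_ATTR_OFF;
  else if (strcmp (value, "any") == 0)
    *out = HSACO_ATTR_ANY;
  else
    return false;
  return true;
}

/* Target-ID suffix, following the selection in output_file_start:
   "any" emits no attribute at all.  */
const char *
xnack_target_id_suffix (enum hsaco_attr_type state)
{
  switch (state)
    {
    case HSACO_ATTR_ON:
      return ":xnack+";
    case HSACO_ATTR_OFF:
      return ":xnack-";
    case HSACO_ATTR_ANY:
      return "";
    }
  assert (!"invalid hsaco_attr_type");
  return "";
}

/* Replace the XNACK field of ELF_FLAGS, in the manner of the
   SET_XNACK_ON, SET_XNACK_OFF and SET_XNACK_ANY macros.  */
unsigned
set_xnack_elf_flags (unsigned elf_flags, enum hsaco_attr_type state)
{
  unsigned field = (state == HSACO_ATTR_ON ? XNACK_ON
		    : state == HSACO_ATTR_OFF ? XNACK_OFF
		    : XNACK_ANY);
  return (elf_flags & ~XNACK_MASK) | field;
}
```

Clearing the whole field before setting it is what keeps a later -mxnack=
argument from leaving stale bits behind when options are repeated.
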
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index b3079cebb43..fd08947574f 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
    configuration.  The name of the attribute also changed.  */
 #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc"
+#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack"
 
 /* Use LLVM assembler and linker options.  */
 #define ASM_SPEC  "-triple=amdgcn--amdhsa "  \
 		  "%:last_arg(%{march=*:-mcpu=%*}) " \
 		  "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \
-		  "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \
+		  "%{" NO_XNACK XNACKOPT "} " \
 		  "%{" NO_SRAM_ECC SRAMOPT "} " \
 		  "-filetype=obj"
 #define LINK_SPEC "--pie --export-dynamic"
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index b62dfb45f59..07ddc79cda3 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -48,11 +48,13 @@ extern enum gcn_isa {
 #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
 #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
 
-enum sram_ecc_type
+#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
+
+enum hsaco_attr_type
 {
-  SRAM_ECC_OFF,
-  SRAM_ECC_ON,
-  SRAM_ECC_ANY
+  HSACO_ATTR_OFF,
+  HSACO_ATTR_ON,
+  HSACO_ATTR_ANY
 };
 
 #endif
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index abe46201344..ec114db9dd1 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -741,13 +741,13 @@ (define_expand "gather<mode>_expr<exec>"
     {})
 
 (define_insn "gather<mode>_insn_1offset<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		   "=v,&v")
 	(unspec:V_ALL
-	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v")
+	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v, v")
 			(vec_duplicate:<VnDI>
-			  (match_operand 2 "immediate_operand"	   " n")))
-	   (match_operand 3 "immediate_operand"			   " n")
-	   (match_operand 4 "immediate_operand"			   " n")
+			  (match_operand 2 "immediate_operand"	   " n, n")))
+	   (match_operand 3 "immediate_operand"			   " n, n")
+	   (match_operand 4 "immediate_operand"			   " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_FLAT_P (INTVAL (operands[3]))
@@ -777,7 +777,8 @@ (define_insn "gather<mode>_insn_1offset<exec>"
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_insn "gather<mode>_insn_1offset_ds<exec>"
   [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
@@ -802,17 +803,18 @@ (define_insn "gather<mode>_insn_1offset_ds<exec>"
    (set_attr "length" "12")])
 
 (define_insn "gather<mode>_insn_2offsets<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"			"=v")
+  [(set (match_operand:V_ALL 0 "register_operand"		     "=v,&v")
 	(unspec:V_ALL
 	  [(plus:<VnDI>
 	     (plus:<VnDI>
 	       (vec_duplicate:<VnDI>
-		 (match_operand:DI 1 "register_operand"			"Sv"))
+		 (match_operand:DI 1 "register_operand"		     "Sv,Sv"))
 	       (sign_extend:<VnDI>
-		 (match_operand:<VnSI> 2 "register_operand"		" v")))
-	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand" " n")))
-	   (match_operand 4 "immediate_operand"				" n")
-	   (match_operand 5 "immediate_operand"				" n")
+		 (match_operand:<VnSI> 2 "register_operand"	     " v, v")))
+	     (vec_duplicate:<VnDI> (match_operand 3 "immediate_operand"
+								     " n, n")))
+	   (match_operand 4 "immediate_operand"			     " n, n")
+	   (match_operand 5 "immediate_operand"			     " n, n")
 	   (mem:BLK (scratch))]
 	  UNSPEC_GATHER))]
   "(AS_GLOBAL_P (INTVAL (operands[4]))
@@ -831,7 +833,8 @@ (define_insn "gather<mode>_insn_2offsets<exec>"
     return buf;
   }
   [(set_attr "type" "flat")
-   (set_attr "length" "12")])
+   (set_attr "length" "12")
+   (set_attr "xnack" "off,on")])
 
 (define_expand "scatter_store<mode><vnsi>"
   [(match_operand:DI 0 "register_operand")
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fc20d3f659..4df05453604 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -170,9 +170,14 @@ gcn_option_override (void)
 	acc_lds_size = 32768;
     }
 
-  /* The xnack option is a placeholder, for now.  */
-  if (flag_xnack)
-    sorry ("XNACK support");
+  /* gfx803 "Fiji" does not support XNACK.  */
+  if (gcn_arch == PROCESSOR_FIJI)
+    {
+      if (flag_xnack == HSACO_ATTR_ON)
+	error ("-mxnack=on is incompatible with -march=fiji");
+      /* Allow HSACO_ATTR_ANY silently because that's the default.  */
+      flag_xnack = HSACO_ATTR_OFF;
+    }
 }
 
 /* }}}  */
@@ -3188,17 +3193,19 @@ gcn_expand_epilogue (void)
       /* Assume that an exit value compatible with gcn-run is expected.
          That is, the third input parameter is an int*.
 
-         We can't allocate any new registers, but the kernarg_reg is
-         dead after this, so we'll use that.  */
+         We can't allocate any new registers, but the dispatch_ptr and
+	 kernarg_reg are dead after this, so we'll use those.  */
+      rtx dispatch_ptr_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
+					  [DISPATCH_PTR_ARG]);
       rtx kernarg_reg = gen_rtx_REG (DImode, cfun->machine->args.reg
 				     [KERNARG_SEGMENT_PTR_ARG]);
       rtx retptr_mem = gen_rtx_MEM (DImode,
 				    gen_rtx_PLUS (DImode, kernarg_reg,
 						  GEN_INT (16)));
       set_mem_addr_space (retptr_mem, ADDR_SPACE_SCALAR_FLAT);
-      emit_move_insn (kernarg_reg, retptr_mem);
+      emit_move_insn (dispatch_ptr_reg, retptr_mem);
 
-      rtx retval_mem = gen_rtx_MEM (SImode, kernarg_reg);
+      rtx retval_mem = gen_rtx_MEM (SImode, dispatch_ptr_reg);
       set_mem_addr_space (retval_mem, ADDR_SPACE_SCALAR_FLAT);
       emit_move_insn (retval_mem,
 		      gen_rtx_REG (SImode, SGPR_REGNO (RETURN_VALUE_REG)));
@@ -5250,11 +5257,12 @@ static void
 output_file_start (void)
 {
   /* In HSACOv4 no attribute setting means the binary supports "any" hardware
-     configuration.  In GCC binaries, this is true for SRAM ECC, but not
-     XNACK.  */
-  const char *xnack = (flag_xnack ? ":xnack+" : ":xnack-");
-  const char *sram_ecc = (flag_sram_ecc == SRAM_ECC_ON ? ":sramecc+"
-			  : flag_sram_ecc == SRAM_ECC_OFF ? ":sramecc-"
+     configuration.  */
+  const char *xnack = (flag_xnack == HSACO_ATTR_ON ? ":xnack+"
+		       : flag_xnack == HSACO_ATTR_OFF ? ":xnack-"
+		       : "");
+  const char *sram_ecc = (flag_sram_ecc == HSACO_ATTR_ON ? ":sramecc+"
+			  : flag_sram_ecc == HSACO_ATTR_OFF ? ":sramecc-"
 			  : "");
 
   const char *cpu;
@@ -5298,7 +5306,7 @@ void
 gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
 {
   int sgpr, vgpr;
-  bool xnack_enabled = false;
+  bool xnack_enabled = TARGET_XNACK;
 
   fputs ("\n\n", file);
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 033c1708e88..0f9381c9194 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -277,12 +277,19 @@ (define_attr "length" ""
 
 (define_attr "gcn_version" "gcn3,gcn5" (const_string "gcn3"))
 
+(define_attr "xnack" "na,off,on" (const_string "na"))
+
 (define_attr "enabled" ""
-  (cond [(eq_attr "gcn_version" "gcn3") (const_int 1)
-	 (and (eq_attr "gcn_version" "gcn5")
-	      (ne (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
-	   (const_int 1)]
-	(const_int 0)))
+  (cond [(and (eq_attr "gcn_version" "gcn5")
+	      (eq (symbol_ref "TARGET_GCN5_PLUS") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "off")
+	      (ne (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)
+	 (and (eq_attr "xnack" "on")
+	      (eq (symbol_ref "TARGET_XNACK") (const_int 0)))
+	   (const_int 0)]
+	(const_int 1)))
 
 ; We need to be able to identify v_readlane and v_writelane with
 ; SGPR lane selection in order to handle "Manually Inserted Wait States".
@@ -472,9 +479,9 @@ (define_split
 
 (define_insn "*movbi"
   [(set (match_operand:BI 0 "nonimmediate_operand"
-				    "=Sg,   v,Sg,cs,cV,cV,Sm,RS, v,RF, v,RM")
+			  "=Sg,   v,Sg,cs,cV,cV,Sm,&Sm,RS, v,&v,RF, v,&v,RM")
 	(match_operand:BI 1 "gcn_load_operand"
-				    "SSA,vSvA, v,SS, v,SS,RS,Sm,RF, v,RM, v"))]
+			  "SSA,vSvA, v,SS, v,SS,RS, RS,Sm,RF,RF, v,RM,RM, v"))]
   ""
   {
     /* SCC as an operand is currently not accepted by the LLVM assembler, so
@@ -501,66 +508,77 @@ (define_insn "*movbi"
       return "s_mov_b32\tvcc_lo, %1\;"
 	     "s_mov_b32\tvcc_hi, 0";
     case 6:
-      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 7:
-      return "s_store_dword\t%1, %A0";
+      return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)";
     case 8:
-      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
+      return "s_store_dword\t%1, %A0";
     case 9:
-      return "flat_store_dword\t%A0, %1%O0%g0";
     case 10:
-      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+      return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0";
     case 11:
+      return "flat_store_dword\t%A0, %1%O0%g0";
+    case 12:
+    case 13:
+      return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)";
+    case 14:
       return "global_store_dword\t%A0, %1%O0%g0";
     default:
       gcc_unreachable ();
     }
   }
-  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,flat,flat,
-		     flat,flat")
-   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,smem,flat,flat,
+		     flat,flat,flat,flat")
+   (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,off,on,*,off,on,*")])
 
 ; 32bit move pattern
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:SISF 0 "nonimmediate_operand"
-		  "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG,   v,SD, v,RM")
+     "=SD,SD,SD,SD,&SD,RB,Sm,&Sm,RS,v,Sg, v, v,&v,RF,v,RLRG,   v,SD, v,&v,RM")
 	(match_operand:SISF 1 "gcn_load_operand"
-		  "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B,   v,RLRG, Y,RM, v"))]
+    "SSA, J, B,RB, RB,Sm,RS, RS,Sm,v, v,Sv,RF,RF, v,B,   v,RLRG, Y,RM,RM, v"))]
   ""
   "@
   s_mov_b32\t%0, %1
   s_movk_i32\t%0, %1
   s_mov_b32\t%0, %1
   s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
+  s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0)
   s_buffer_store%s1\t%1, s[0:3], %0
   s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   s_store_dword\t%1, %A0
   v_mov_b32\t%0, %1
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dword\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   s_mov_b32\t%0, %1
   global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dword\t%A0, %1%O0%g0"
-  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,vop1,vop3a,vop3a,flat,
-		     flat,vop1,ds,ds,sop1,flat,flat")
-   (set_attr "exec" "*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,12,12,12,12,4,8,8,12,12,8,12,12,8,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,smem,smem,vop1,vop3a,
+	      vop3a,flat,flat,flat,vop1,ds,ds,sop1,flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length"
+	     "4,4,8,12,12,12,12,12,12,4,8,8,12,12,12,8,12,12,8,12,12,12")
+   (set_attr "xnack"
+	     "*,*,*,off,on,*,off,on,*,*,*,*,off,on,*,*,*,*,*,off,on,*")])
 
 ; 8/16bit move pattern
 ; TODO: implement combined load and zero_extend, but *only* for -msram-ecc=on
 
 (define_insn "*mov<mode>_insn"
   [(set (match_operand:QIHI 0 "nonimmediate_operand"
-				 "=SD,SD,SD,v,Sg, v, v,RF,v,RLRG,   v, v,RM")
+			   "=SD,SD,SD,v,Sg, v, v,&v,RF,v,RLRG,   v, v,&v,RM")
 	(match_operand:QIHI 1 "gcn_load_operand"
-				 "SSA, J, B,v, v,Sv,RF, v,B,   v,RLRG,RM, v"))]
+			   "SSA, J, B,v, v,Sv,RF,RF, v,B,   v,RLRG,RM,RM, v"))]
   "gcn_valid_move_p (<MODE>mode, operands[0], operands[1])"
   "@
   s_mov_b32\t%0, %1
@@ -570,24 +588,27 @@ (define_insn "*mov<mode>_insn"
   v_readlane_b32\t%0, %1, 0
   v_writelane_b32\t%0, %1, 0
   flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store%s0\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
   ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store%s0\t%A0, %1%O0%g0"
-  [(set_attr "type"
-	     "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat")
-   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")])
+  [(set_attr "type" "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,flat,vop1,ds,ds,
+	             flat,flat,flat")
+   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,8,4,4,4,12,12,12,8,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,*,*,off,on,*,*,*,*,off,on,*")])
 
 ; 64bit move pattern
 
 (define_insn_and_split "*mov<mode>_insn"
   [(set (match_operand:DIDF 0 "nonimmediate_operand"
-			  "=SD,SD,SD,RS,Sm,v, v,Sg, v, v,RF,RLRG,   v, v,RM")
+		"=SD,SD,SD,RS,Sm,&Sm,v, v,Sg, v, v,&v,RF,RLRG,   v, v,&v,RM")
 	(match_operand:DIDF 1 "general_operand"
-			  "SSA, C,DB,Sm,RS,v,DB, v,Sv,RF, v,   v,RLRG,RM, v"))]
+		"SSA, C,DB,Sm,RS, RS,v,DB, v,Sv,RF,RF, v,   v,RLRG,RM,RM, v"))]
   "GET_CODE(operands[1]) != SYMBOL_REF"
   "@
   s_mov_b64\t%0, %1
@@ -595,15 +616,18 @@ (define_insn_and_split "*mov<mode>_insn"
   #
   s_store_dwordx2\t%1, %A0
   s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   #
   #
   #
   #
   flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dwordx2\t%A0, %1%O0%g0
   ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dwordx2\t%A0, %1%O0%g0"
   "reload_completed
    && ((!MEM_P (operands[0]) && !MEM_P (operands[1])
@@ -634,29 +658,33 @@ (define_insn_and_split "*mov<mode>_insn"
 	operands[3] = inhi;
       }
   }
-  [(set_attr "type" "sop1,sop1,mult,smem,smem,vmult,vmult,vmult,vmult,flat,
-		     flat,ds,ds,flat,flat")
-   (set_attr "length" "4,8,*,12,12,*,*,*,*,12,12,12,12,12,12")])
+  [(set_attr "type" "sop1,sop1,mult,smem,smem,smem,vmult,vmult,vmult,vmult,
+	      flat,flat,flat,ds,ds,flat,flat,flat")
+   (set_attr "length" "4,8,*,12,12,12,*,*,*,*,12,12,12,12,12,12,12,12")
+   (set_attr "xnack" "*,*,*,*,off,on,*,*,*,*,off,on,*,*,*,off,on,*")])
 
 ; 128-bit move.
 
 (define_insn_and_split "*movti_insn"
   [(set (match_operand:TI 0 "nonimmediate_operand"
-				      "=SD,RS,Sm,RF, v,v, v,SD,RM, v,RL, v")
-	(match_operand:TI 1 "general_operand"  
-				      "SSB,Sm,RS, v,RF,v,Sv, v, v,RM, v,RL"))]
+			     "=SD,RS,Sm,&Sm,RF, v,&v,v, v,SD,RM, v,&v,RL, v")
+	(match_operand:TI 1 "general_operand"
+			     "SSB,Sm,RS, RS, v,RF,RF,v,Sv, v, v,RM,RM, v,RL"))]
   ""
   "@
   #
   s_store_dwordx4\t%1, %A0
   s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
+  s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0)
   flat_store_dwordx4\t%A0, %1%O0%g0
   flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
+  flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0
   #
   #
   #
   global_store_dwordx4\t%A0, %1%O0%g0
   global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
+  global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)"
   "reload_completed
@@ -678,10 +706,11 @@ (define_insn_and_split "*movti_insn"
     operands[0] = gcn_operand_part (TImode, operands[0], 0);
     operands[1] = gcn_operand_part (TImode, operands[1], 0);
   }
-  [(set_attr "type" "mult,smem,smem,flat,flat,vmult,vmult,vmult,flat,flat,\
-		     ds,ds")
-   (set_attr "delayeduse" "*,*,yes,*,*,*,*,*,yes,*,*,*")
-   (set_attr "length" "*,12,12,12,12,*,*,*,12,12,12,12")])
+  [(set_attr "type" "mult,smem,smem,smem,flat,flat,flat,vmult,vmult,vmult,flat,
+	             flat,flat,ds,ds")
+   (set_attr "delayeduse" "*,*,yes,yes,*,*,*,*,*,*,*,yes,*,*,*")
+   (set_attr "length" "*,12,12,12,12,12,12,*,*,*,12,12,12,12,12")
+   (set_attr "xnack" "*,*,off,on,*,off,on,*,*,*,*,off,on,*,*")])
 
 ;; }}}
 ;; {{{ Prologue/Epilogue
@@ -844,6 +873,8 @@ (define_insn "movdi_symbol"
   (clobber (reg:BI SCC_REG))]
  "GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF"
   {
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       DGPUs are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "s_getpc_b64\t%0\;"
@@ -868,6 +899,8 @@ (define_insn "movdi_symbol_save_scc"
   {
     /* !!! These sequences clobber CC_SAVE_REG.  */
 
+    /* This s_load may not be XNACK-safe on devices where the GOT may fault.
+       DGPUs are most likely fine.  */
     if (SYMBOL_REF_P (operands[1])
 	&& SYMBOL_REF_WEAK (operands[1]))
 	return "s_mov_b32\ts22, scc\;"
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..759f7a064c9 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -81,23 +81,23 @@ Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
 
-mxnack
-Target Var(flag_xnack) Init(0)
-Compile for devices requiring XNACK enabled. Default off.
-
 Enum
-Name(sram_ecc_type) Type(enum sram_ecc_type)
+Name(hsaco_attr_type) Type(enum hsaco_attr_type)
 SRAM-ECC modes:
 
 EnumValue
-Enum(sram_ecc_type) String(off) Value(SRAM_ECC_OFF)
+Enum(hsaco_attr_type) String(off) Value(HSACO_ATTR_OFF)
 
 EnumValue
-Enum(sram_ecc_type) String(on) Value(SRAM_ECC_ON)
+Enum(hsaco_attr_type) String(on) Value(HSACO_ATTR_ON)
 
 EnumValue
-Enum(sram_ecc_type) String(any) Value(SRAM_ECC_ANY)
+Enum(hsaco_attr_type) String(any) Value(HSACO_ATTR_ANY)
+
+mxnack=
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_ANY)
+Compile for devices requiring XNACK enabled. Default \"any\".
 
 msram-ecc=
-Target RejectNegative Joined ToLower Enum(sram_ecc_type) Var(flag_sram_ecc) Init(SRAM_ECC_ANY)
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY)
 Compile for devices with the SRAM ECC feature enabled, or not. Default \"any\".
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index b8b3fecfcb4..cb8903c27cb 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -72,10 +72,14 @@
 
 #define SET_XNACK_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				 | EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define SET_XNACK_ANY(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
+				  | EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define SET_XNACK_OFF(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \
 				  | EF_AMDGPU_FEATURE_XNACK_OFF_V4)
-#define TEST_XNACK(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
-			 == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_ANY(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
+#define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -884,9 +888,11 @@ main (int argc, char **argv)
 	fPIC = true;
       else if (strcmp (argv[i], "-fpic") == 0)
 	fpic = true;
-      else if (strcmp (argv[i], "-mxnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=on") == 0)
 	SET_XNACK_ON (elf_flags);
-      else if (strcmp (argv[i], "-mno-xnack") == 0)
+      else if (strcmp (argv[i], "-mxnack=any") == 0)
+	SET_XNACK_ANY (elf_flags);
+      else if (strcmp (argv[i], "-mxnack=off") == 0)
 	SET_XNACK_OFF (elf_flags);
       else if (strcmp (argv[i], "-msram-ecc=on") == 0)
 	SET_SRAM_ECC_ON (elf_flags);
@@ -1045,8 +1051,9 @@ main (int argc, char **argv)
       obstack_ptr_grow (&ld_argv_obstack, gcn_s2_name);
       obstack_ptr_grow (&ld_argv_obstack, "-lgomp");
       obstack_ptr_grow (&ld_argv_obstack,
-			(TEST_XNACK (elf_flags)
-			 ? "-mxnack" : "-mno-xnack"));
+			(TEST_XNACK_ON (elf_flags) ? "-mxnack=on"
+			 : TEST_XNACK_ANY (elf_flags) ? "-mxnack=any"
+			 : "-mxnack=off"));
       obstack_ptr_grow (&ld_argv_obstack,
 			(TEST_SRAM_ECC_ON (elf_flags) ? "-msram-ecc=on"
 			 : TEST_SRAM_ECC_ANY (elf_flags) ? "-msram-ecc=any"

From patchwork Thu Jul  7 10:34:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55837
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
Date: Thu, 7 Jul 2022 11:34:47 +0100
Message-ID: 
 <c73170e6b71179306d40ce6a04a3c02cb4c3cef6.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that the
information is passed all the way to the backend.  The backend then places a
marker in the assembler code for mkoffload to find.  Finally, mkoffload places
a constructor function into the final program to ensure that the HSA_XNACK
environment variable passes the correct mode to the GPU.
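
As a rough illustration of the marker handshake described above, the detection
side can be sketched in standalone C.  This is not the mkoffload code itself;
the helper name line_has_usm_marker is hypothetical, and only the marker text
and the sscanf idiom are taken from this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (the name is an assumption, not part of the patch):
   return true if LINE carries the "MKOFFLOAD OPTIONS: USM+" marker comment.
   The trailing %c only converts once the whole literal prefix has matched,
   so a positive return value from sscanf means the marker was found.  A
   line that ends immediately after "USM+" yields EOF and is not matched.  */
bool
line_has_usm_marker (const char *line)
{
  char dummy;
  return sscanf (line, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0;
}
```

The backend emits the marker with a trailing newline, so there is always one
character left for %c to consume on a matching line.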

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.
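
For reference, this is approximately what the generated constructor looks like
once the format arguments are substituted for the case where Unified Shared
Memory is enabled (a sketch for illustration; the _POSIX_C_SOURCE define is an
addition here so that setenv is declared in a strict standalone build, and is
not emitted by mkoffload).

```c
/* Assumption for standalone builds only: request the POSIX declarations
   so that setenv is visible under strict -std= modes.  */
#define _POSIX_C_SOURCE 200112L

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Runs before main, and therefore before the HSA runtime can be loaded.  */
static __attribute__((constructor))
void configure_xnack (void)
{
  const char *val = getenv ("HSA_XNACK");

  /* Unset or empty: select XNACK-on mode on the user's behalf.  */
  if (!val || val[0] == '\0')
    setenv ("HSA_XNACK", "1", true);
  /* Any user value other than exactly "1" conflicts with USM.  */
  else if (val[0] != '1' || val[1] != '\0')
    {
      fprintf (stderr, "error: HSA_XNACK=%s is incompatible; "
	       "please unset\n", val);
      exit (1);
    }
}
```

When XNACK is forced on or off without USM, the same template is used with the
value and the comparison adjusted accordingly.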

gcc/ChangeLog:

	* config/gcn/gcn.cc (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.cc (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.cc (create_omp_child_function): Add attribute "omp unified
	memory".
---
 gcc/config/gcn/gcn.cc       | 28 +++++++++++++++++++++++++++-
 gcc/config/gcn/mkoffload.cc | 37 ++++++++++++++++++++++++++++++++++++-
 gcc/omp-low.cc              |  4 ++++
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 4df05453604..88cc505597e 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0;
 
 enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
 
+/* Record whether the host compiler added "omp unified memory" attributes to
+   any functions.  We can then pass this on to mkoffload to ensure xnack is
+   compatible there too.  */
+static bool unified_shared_memory_enabled = false;
+
 /* Reserve this much space for LDS (for propagating variables from
    worker-single mode to worker-partitioned mode), per workgroup.  Global
    analysis could calculate an exact bound, but we don't do that yet.
@@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if (fndecl && lookup_attribute ("omp unified memory",
+				  DECL_ATTRIBUTES (fndecl)))
+    {
+      unified_shared_memory_enabled = true;
+
+      switch (gcn_arch)
+	{
+	case PROCESSOR_FIJI:
+	case PROCESSOR_VEGA10:
+	case PROCESSOR_VEGA20:
+	  error ("GPU architecture does not support Unified Shared Memory");
+	default:
+	  ;
+	}
+
+      if (flag_xnack == HSACO_ATTR_OFF)
+	error ("Unified Shared Memory is enabled, but XNACK is disabled");
+    }
+
   reinit_regs ();
 }
 
@@ -5458,12 +5482,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree)
   assemble_name (file, name);
   fputs (":\n", file);
 
-  /* This comment is read by mkoffload.  */
+  /* These comments are read by mkoffload.  */
   if (flag_openacc)
     fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n",
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name);
+  if (unified_shared_memory_enabled)
+    fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n");
 }
 
 /* Implement TARGET_ASM_SELECT_SECTION.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index cb8903c27cb..5741d0a917b 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -80,6 +80,8 @@
 			     == EF_AMDGPU_FEATURE_XNACK_ANY_V4)
 #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
 			    == EF_AMDGPU_FEATURE_XNACK_ON_V4)
+#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \
+			     == EF_AMDGPU_FEATURE_XNACK_OFF_V4)
 
 #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 				    | EF_AMDGPU_FEATURE_SRAMECC_ON_V4)
@@ -474,6 +476,7 @@ static void
 process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
+  bool unified_shared_memory_enabled = false;
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 		dims_count++;
 	      }
 
+	    if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0)
+	      unified_shared_memory_enabled = true;
+
 	    break;
 	  }
 	case IN_METADATA:
@@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	  }
 	}
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
@@ -617,6 +623,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fprintf (cfile, "#include <stdlib.h>\n");
   fprintf (cfile, "#include <stdint.h>\n");
   fprintf (cfile, "#include <stdbool.h>\n\n");
+  fprintf (cfile, "#include <stdio.h>\n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
 
@@ -657,6 +664,34 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
     }
   fprintf (cfile, "\n};\n\n");
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (except the empty string), but we do emit
+     a useful diagnostic in the wrong mode (the ROCr message is not good).  */
+  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
+    fatal_error (input_location,
+		 "conflicting settings; XNACK is forced off but Unified "
+		 "Shared Memory is on");
+  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)
+    fprintf (cfile,
+	     "static __attribute__((constructor))\n"
+	     "void configure_xnack (void)\n"
+	     "{\n"
+	     "  const char *val = getenv (\"HSA_XNACK\");\n"
+	     "  if (!val || val[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+	     "  else if (%s)\n"
+	     "    {\n"
+	     "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
+			    "please unset\\n\", val);\n"
+	     "      exit (1);\n"
+	     "    }\n"
+	     "}\n\n",
+	     unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags),
+	     (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags)
+	      ? "val[0] != '1' || val[1] != '\\0'"
+	      : "val[0] == '1' && val[1] == '\\0'"));
+
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 7d1a2a0d795..239446beb52 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2107,6 +2107,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
+      if (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED)
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("omp unified memory"),
+		       NULL_TREE, DECL_ATTRIBUTES (decl));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),

From patchwork Thu Jul  7 10:34:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 55835
From: Andrew Stubbs <ams@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH 17/17] amdgcn: libgomp plugin USM implementation
Date: Thu, 7 Jul 2022 11:34:48 +0100
Message-ID: 
 <9d8aca9014fe40a76e1f389daf94351b522ab73b.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <cover.1657188329.git.ams@codesourcery.com>
References: <cover.1657188329.git.ams@codesourcery.com>

Implement the Unified Shared Memory API calls in the GCN plugin.

The allocate and free functions are straightforward because all "target"
memory allocations are compatible with USM on the right hardware.  However,
there is no known way to check after the fact which memory region was used, so
we use a splay tree to record allocations and answer "is_usm_ptr" queries later.
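
The bookkeeping idea can be sketched in standalone C as follows.  Note that
this sketch swaps in a singly linked list in place of libgomp's splay tree to
stay self-contained, and all names in it (usm_range, usm_track, usm_untrack,
usm_is_tracked) are hypothetical.  Like the plugin code, it treats any pointer
inside a recorded allocation as a match and does no locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one tracked allocation.  */
struct usm_range
{
  uintptr_t start;
  size_t size;
  struct usm_range *next;
};

struct usm_range *usm_ranges = NULL;

/* Return the link that points at the range containing ADDR, or the
   terminating NULL link if no recorded range contains it.  */
struct usm_range **
usm_find_link (uintptr_t addr)
{
  struct usm_range **link = &usm_ranges;
  while (*link
	 && !(addr >= (*link)->start
	      && addr - (*link)->start < (*link)->size))
    link = &(*link)->next;
  return link;
}

/* Record the allocation [PTR, PTR + SIZE).  Returns false if out of
   memory, in which case nothing is recorded.  */
bool
usm_track (void *ptr, size_t size)
{
  struct usm_range *r = malloc (sizeof *r);
  if (!r)
    return false;
  r->start = (uintptr_t) ptr;
  r->size = size;
  r->next = usm_ranges;
  usm_ranges = r;
  return true;
}

/* Forget the allocation containing PTR, if there is one.  */
void
usm_untrack (void *ptr)
{
  struct usm_range **link = usm_find_link ((uintptr_t) ptr);
  struct usm_range *r = *link;
  if (r)
    {
      *link = r->next;
      free (r);
    }
}

/* True if PTR lies inside any recorded allocation.  */
bool
usm_is_tracked (const void *ptr)
{
  return *usm_find_link ((uintptr_t) ptr) != NULL;
}
```

A linked list makes each lookup linear in the number of live allocations; the
splay tree used by the patch avoids that cost for frequently queried pointers.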

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Allow
	GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(struct usm_splay_tree_key_s): New.
	(usm_splay_compare): New.
	(splay_tree_prefix): New.
	(GOMP_OFFLOAD_usm_alloc): New.
	(GOMP_OFFLOAD_usm_free): New.
	(GOMP_OFFLOAD_is_usm_ptr): New.
	(GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold.
	Add GOMP_REQUIRES_UNIFIED_ADDRESS and
	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY.
	(gomp_fatal): New.
	(splay_tree_c): New.
	* testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New.
	* testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target.
	* testsuite/libgomp.c-c++-common/requires-1.c: Likewise.
	* testsuite/libgomp.c/usm-1.c: Likewise.
	* testsuite/libgomp.c/usm-2.c: Likewise.
	* testsuite/libgomp.c/usm-3.c: Likewise.
	* testsuite/libgomp.c/usm-4.c: Likewise.
	* testsuite/libgomp.c/usm-5.c: Likewise.
	* testsuite/libgomp.c/usm-6.c: Likewise.
---
 libgomp/plugin/plugin-gcn.c                   | 104 +++++++++++++++++-
 libgomp/testsuite/lib/libgomp.exp             |  22 ++++
 libgomp/testsuite/libgomp.c++/usm-1.C         |   2 +-
 .../libgomp.c-c++-common/requires-1.c         |   1 +
 libgomp/testsuite/libgomp.c/usm-1.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-2.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-3.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-4.c           |   1 +
 libgomp/testsuite/libgomp.c/usm-5.c           |   2 +-
 libgomp/testsuite/libgomp.c/usm-6.c           |   2 +-
 10 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index ea327bf2ca0..6a9ff5cd93e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3226,7 +3226,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (!init_hsa_context ())
     return 0;
   /* Return -1 if no omp_requires_mask cannot be fulfilled but
-     devices were present.  */
+     devices were present.
+     Note: not all devices support USM, but the compiler refuses to create
+     binaries for those that don't anyway.  */
+  omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+			 | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
   if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
     return -1;
   return hsa_context.agent_count;
@@ -3810,6 +3814,89 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
 		       GOMP_PLUGIN_target_task_completion, async_data);
 }
 
+/* Use a splay tree to track USM allocations.  */
+
+typedef struct usm_splay_tree_node_s *usm_splay_tree_node;
+typedef struct usm_splay_tree_s *usm_splay_tree;
+typedef struct usm_splay_tree_key_s *usm_splay_tree_key;
+
+struct usm_splay_tree_key_s {
+  void *addr;
+  size_t size;
+};
+
+static inline int
+usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y)
+{
+  if ((x->addr <= y->addr && x->addr + x->size > y->addr)
+      || (y->addr <= x->addr && y->addr + y->size > x->addr))
+    return 0;
+
+  return (x->addr > y->addr ? 1 : -1);
+}
+
+#define splay_tree_prefix usm
+#include "../splay-tree.h"
+
+static struct usm_splay_tree_s usm_map = { NULL };
+
+/* Allocate memory suitable for Unified Shared Memory.
+
+   In fact, AMD memory need only be "coarse grained", which target
+   allocations already are.  We do need to track allocations so that
+   GOMP_OFFLOAD_is_usm_ptr can look them up.  */
+
+void *
+GOMP_OFFLOAD_usm_alloc (int device, size_t size)
+{
+  void *ptr = GOMP_OFFLOAD_alloc (device, size);
+
+  usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s));
+  node->key.addr = ptr;
+  node->key.size = size;
+  node->left = NULL;
+  node->right = NULL;
+  usm_splay_tree_insert (&usm_map, node);
+
+  return ptr;
+}
+
+/* Free memory allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_usm_free (int device, void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key);
+  if (node)
+    {
+      usm_splay_tree_remove (&usm_map, &key);
+      free (node);
+    }
+
+  return GOMP_OFFLOAD_free (device, ptr);
+}
+
+/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc.  */
+
+bool
+GOMP_OFFLOAD_is_usm_ptr (void *ptr)
+{
+  struct usm_splay_tree_key_s key = { ptr, 1 };
+  return usm_splay_tree_lookup (&usm_map, &key);
+}
+
+/* Indicate which GOMP_REQUIRES_* features are supported.  */
+
+bool
+GOMP_OFFLOAD_supported_features (unsigned int *mask)
+{
+  *mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+             | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+
+  return (*mask == 0);
+}
+
 /* }}} */
 /* {{{ OpenACC Plugin API  */
 
@@ -4084,3 +4171,18 @@ GOMP_OFFLOAD_openacc_destroy_thread_data (void *data)
 }
 
 /* }}} */
+/* {{{ USM splay tree */
+
+/* Include this now so that splay-tree.c doesn't include it later.  This
+   avoids a conflict with splay_tree_prefix.  */
+#include "libgomp.h"
+
+/* This allows splay-tree.c to call gomp_fatal in this context.  The splay
+   tree code doesn't use the variadic arguments right now.  */
+#define gomp_fatal(MSG, ...) GOMP_PLUGIN_fatal (MSG)
+
+/* Include the splay tree code inline, with the prefixes added.  */
+#define splay_tree_prefix usm
+#define splay_tree_c
+#include "../splay-tree.h"
+/* }}}  */
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 891f90929d2..dce1af279e1 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -536,3 +536,25 @@ int main() {
     return 0;
 } } "-lcuda -lcudart" ]
 }
+
+# return 1 if OpenMP Unified Shared Memory is supported
+
+proc check_effective_target_omp_usm { } {
+    if { [libgomp_check_effective_target_offload_target "nvptx"] } {
+	return 1
+    }
+
+    if { [libgomp_check_effective_target_offload_target "amdgcn"] } {
+	return [check_no_compiler_messages omp_usm executable {
+           #pragma omp requires unified_shared_memory
+	   int main () {
+	     #pragma omp target
+	       ;
+	     return 0;
+	   }
+	}]
+    }
+
+    return 0
+}
+
diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C
index fea25e5f10b..6e88f90d61f 100644
--- a/libgomp/testsuite/libgomp.c++/usm-1.C
+++ b/libgomp/testsuite/libgomp.c++/usm-1.C
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 #include <stdint.h>
 
 #pragma omp requires unified_shared_memory
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
index fedf9779769..b760d5ebaf7 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
@@ -1,5 +1,6 @@
 /* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */
 /* { dg-additional-sources requires-1-aux.c } */
+/* { dg-require-effective-target omp_usm } */
 
 /* Check diagnostic by device-compiler's lto1.
    Other file uses: 'requires unified_address'.  */
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
index 1b35f19c45b..e73f1816f9a 100644
--- a/libgomp/testsuite/libgomp.c/usm-1.c
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
index 689cee7e456..31f2bae7145 100644
--- a/libgomp/testsuite/libgomp.c/usm-2.c
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
index 2ca66afe93f..2c78a0d8ced 100644
--- a/libgomp/testsuite/libgomp.c/usm-3.c
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
index 753908c8440..1ac5498f73f 100644
--- a/libgomp/testsuite/libgomp.c/usm-4.c
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c
index 4d8b3cf71b1..563397f941a 100644
--- a/libgomp/testsuite/libgomp.c/usm-5.c
+++ b/libgomp/testsuite/libgomp.c/usm-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <omp.h>
 #include <stdint.h>
diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c
index c207140092a..bd14f8197b3 100644
--- a/libgomp/testsuite/libgomp.c/usm-6.c
+++ b/libgomp/testsuite/libgomp.c/usm-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */
+/* { dg-require-effective-target omp_usm } */
 
 #include <stdint.h>
 #include <stdlib.h>