From: Andrew Stubbs
Subject: [PATCH 01/17] libgomp, nvptx: low-latency memory allocator
Date: Thu, 7 Jul 2022 11:34:32 +0100
Message-ID: <400092d8ce44340cece0e2e38f88edbad6400b03.1657188329.git.ams@codesourcery.com>
X-Patchwork-Id: 55822

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm.
The allocator is thread safe, and the size of the low-latency heap can be
configured using the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_ALLOC): New macro.
	(MEMSPACE_CALLOC): New macro.
	(MEMSPACE_REALLOC): New macro.
	(MEMSPACE_FREE): New macro.
	(dynamic_smem_size): New constants.
	(omp_alloc): Use MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	(omp_free): Use MEMSPACE_FREE.
	(omp_calloc): Use MEMSPACE_CALLOC.
	Implement fall-backs for predefined allocators.
	(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
	Implement fall-backs for predefined allocators.
	* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
	(__nvptx_lowlat_pool): New asm variable.
	(gomp_nvptx_main): Initialize the low-latency heap.
	* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
	(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
	(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
	* config/nvptx/allocator.c: New file.
	* testsuite/libgomp.c/allocators-1.c: New test.
	* testsuite/libgomp.c/allocators-2.c: New test.
	* testsuite/libgomp.c/allocators-3.c: New test.
	* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test. * testsuite/libgomp.c/allocators-6.c: New test. co-authored-by: Kwok Cheung Yeung --- libgomp/allocator.c | 235 ++++++++----- libgomp/config/nvptx/allocator.c | 370 +++++++++++++++++++++ libgomp/config/nvptx/team.c | 28 ++ libgomp/plugin/plugin-nvptx.c | 23 +- libgomp/testsuite/libgomp.c/allocators-1.c | 56 ++++ libgomp/testsuite/libgomp.c/allocators-2.c | 64 ++++ libgomp/testsuite/libgomp.c/allocators-3.c | 42 +++ libgomp/testsuite/libgomp.c/allocators-4.c | 196 +++++++++++ libgomp/testsuite/libgomp.c/allocators-5.c | 63 ++++ libgomp/testsuite/libgomp.c/allocators-6.c | 117 +++++++ 10 files changed, 1110 insertions(+), 84 deletions(-) create mode 100644 libgomp/config/nvptx/allocator.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index b04820b8cf9..9b33bcf529b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -37,6 +37,34 @@ #define omp_max_predefined_alloc omp_thread_mem_alloc +/* These macros may be overridden in config//allocator.c. */ +#ifndef MEMSPACE_ALLOC +#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#endif +#ifndef MEMSPACE_CALLOC +#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#endif +#ifndef MEMSPACE_REALLOC +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#endif +#ifndef MEMSPACE_FREE +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#endif + +/* Map the predefined allocators to the correct memory space. + The index to this table is the omp_allocator_handle_t enum value. 
*/ +static const omp_memspace_handle_t predefined_alloc_mapping[] = { + omp_default_mem_space, /* omp_null_allocator. */ + omp_default_mem_space, /* omp_default_mem_alloc. */ + omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */ + omp_default_mem_space, /* omp_const_mem_alloc. */ + omp_high_bw_mem_space, /* omp_high_bw_mem_alloc. */ + omp_low_lat_mem_space, /* omp_low_lat_mem_alloc. */ + omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ + omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ + omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ +}; + enum gomp_memkind_kind { GOMP_MEMKIND_NONE = 0, @@ -453,7 +481,7 @@ retry: } else #endif - ptr = malloc (new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -478,7 +506,13 @@ retry: } else #endif - ptr = malloc (new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? allocator_data->memspace + : predefined_alloc_mapping[allocator]); + ptr = MEMSPACE_ALLOC (memspace, new_size); + } if (ptr == NULL) goto fail; } @@ -496,35 +530,38 @@ retry: return ret; fail: - if (allocator_data) + int fallback = (allocator_data + ? allocator_data->fallback + : allocator == omp_default_mem_alloc + ? omp_atv_null_fb + : omp_atv_default_mem_fb); + switch (fallback) { - switch (allocator_data->fallback) - { - case omp_atv_default_mem_fb: - if ((new_alignment > sizeof (void *) && new_alignment > alignment) + case omp_atv_default_mem_fb: + if ((new_alignment > sizeof (void *) && new_alignment > alignment) #ifdef LIBGOMP_USE_MEMKIND - || memkind + || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0)) - { - allocator = omp_default_mem_alloc; - goto retry; - } - /* Otherwise, we've already performed default mem allocation - and if that failed, it won't succeed again (unless it was - intermittent. Return NULL then, as that is the fallback. 
*/ - break; - case omp_atv_null_fb: - break; - default: - case omp_atv_abort_fb: - gomp_fatal ("Out of memory allocating %lu bytes", - (unsigned long) size); - case omp_atv_allocator_fb: - allocator = allocator_data->fb_data; + || (allocator_data + && allocator_data->pool_size < ~(uintptr_t) 0) + || !allocator_data) + { + allocator = omp_default_mem_alloc; goto retry; } + /* Otherwise, we've already performed default mem allocation + and if that failed, it won't succeed again (unless it was + intermittent. Return NULL then, as that is the fallback. */ + break; + case omp_atv_null_fb: + break; + default: + case omp_atv_abort_fb: + gomp_fatal ("Out of memory allocating %lu bytes", + (unsigned long) size); + case omp_atv_allocator_fb: + allocator = allocator_data->fb_data; + goto retry; } return NULL; } @@ -557,6 +594,8 @@ void omp_free (void *ptr, omp_allocator_handle_t allocator) { struct omp_mem_header *data; + omp_memspace_handle_t memspace __attribute__((unused)) + = omp_default_mem_space; if (ptr == NULL) return; @@ -586,10 +625,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) return; } #endif + + memspace = allocator_data->memspace; } -#ifdef LIBGOMP_USE_MEMKIND else { +#ifdef LIBGOMP_USE_MEMKIND enum gomp_memkind_kind memkind = GOMP_MEMKIND_NONE; if (data->allocator == omp_high_bw_mem_alloc) memkind = GOMP_MEMKIND_HBW_PREFERRED; @@ -605,9 +646,12 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) return; } } - } #endif - free (data->ptr); + + memspace = predefined_alloc_mapping[data->allocator]; + } + + MEMSPACE_FREE (memspace, data->ptr, data->size); } ialias (omp_free) @@ -723,7 +767,7 @@ retry: } else #endif - ptr = calloc (1, new_size); + ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -748,7 +792,13 @@ retry: } else #endif - ptr = calloc (1, new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? 
allocator_data->memspace + : predefined_alloc_mapping[allocator]); + ptr = MEMSPACE_CALLOC (memspace, new_size); + } if (ptr == NULL) goto fail; } @@ -766,35 +816,38 @@ retry: return ret; fail: - if (allocator_data) + int fallback = (allocator_data + ? allocator_data->fallback + : allocator == omp_default_mem_alloc + ? omp_atv_null_fb + : omp_atv_default_mem_fb); + switch (fallback) { - switch (allocator_data->fallback) - { - case omp_atv_default_mem_fb: - if ((new_alignment > sizeof (void *) && new_alignment > alignment) + case omp_atv_default_mem_fb: + if ((new_alignment > sizeof (void *) && new_alignment > alignment) #ifdef LIBGOMP_USE_MEMKIND - || memkind + || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0)) - { - allocator = omp_default_mem_alloc; - goto retry; - } - /* Otherwise, we've already performed default mem allocation - and if that failed, it won't succeed again (unless it was - intermittent. Return NULL then, as that is the fallback. */ - break; - case omp_atv_null_fb: - break; - default: - case omp_atv_abort_fb: - gomp_fatal ("Out of memory allocating %lu bytes", - (unsigned long) (size * nmemb)); - case omp_atv_allocator_fb: - allocator = allocator_data->fb_data; + || (allocator_data + && allocator_data->pool_size < ~(uintptr_t) 0) + || !allocator_data) + { + allocator = omp_default_mem_alloc; goto retry; } + /* Otherwise, we've already performed default mem allocation + and if that failed, it won't succeed again (unless it was + intermittent. Return NULL then, as that is the fallback. 
*/ + break; + case omp_atv_null_fb: + break; + default: + case omp_atv_abort_fb: + gomp_fatal ("Out of memory allocating %lu bytes", + (unsigned long) (size * nmemb)); + case omp_atv_allocator_fb: + allocator = allocator_data->fb_data; + goto retry; } return NULL; } @@ -967,9 +1020,10 @@ retry: else #endif if (prev_size) - new_ptr = realloc (data->ptr, new_size); + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, + data->size, new_size); else - new_ptr = malloc (new_size); + new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); if (new_ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -1010,7 +1064,13 @@ retry: } else #endif - new_ptr = realloc (data->ptr, new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? allocator_data->memspace + : predefined_alloc_mapping[allocator]); + new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size); + } if (new_ptr == NULL) goto fail; ret = (char *) new_ptr + sizeof (struct omp_mem_header); @@ -1030,7 +1090,13 @@ retry: } else #endif - new_ptr = malloc (new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? allocator_data->memspace + : predefined_alloc_mapping[allocator]); + new_ptr = MEMSPACE_ALLOC (memspace, new_size); + } if (new_ptr == NULL) goto fail; } @@ -1073,35 +1139,38 @@ retry: return ret; fail: - if (allocator_data) + int fallback = (allocator_data + ? allocator_data->fallback + : allocator == omp_default_mem_alloc + ? 
omp_atv_null_fb + : omp_atv_default_mem_fb); + switch (fallback) { - switch (allocator_data->fallback) - { - case omp_atv_default_mem_fb: - if (new_alignment > sizeof (void *) + case omp_atv_default_mem_fb: + if (new_alignment > sizeof (void *) #ifdef LIBGOMP_USE_MEMKIND - || memkind + || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0)) - { - allocator = omp_default_mem_alloc; - goto retry; - } - /* Otherwise, we've already performed default mem allocation - and if that failed, it won't succeed again (unless it was - intermittent. Return NULL then, as that is the fallback. */ - break; - case omp_atv_null_fb: - break; - default: - case omp_atv_abort_fb: - gomp_fatal ("Out of memory allocating %lu bytes", - (unsigned long) size); - case omp_atv_allocator_fb: - allocator = allocator_data->fb_data; + || (allocator_data + && allocator_data->pool_size < ~(uintptr_t) 0) + || !allocator_data) + { + allocator = omp_default_mem_alloc; goto retry; } + /* Otherwise, we've already performed default mem allocation + and if that failed, it won't succeed again (unless it was + intermittent. Return NULL then, as that is the fallback. */ + break; + case omp_atv_null_fb: + break; + default: + case omp_atv_abort_fb: + gomp_fatal ("Out of memory allocating %lu bytes", + (unsigned long) size); + case omp_atv_allocator_fb: + allocator = allocator_data->fb_data; + goto retry; } return NULL; } diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c new file mode 100644 index 00000000000..6bc2ea48043 --- /dev/null +++ b/libgomp/config/nvptx/allocator.c @@ -0,0 +1,370 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). 
+ + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* The low-latency allocators use space reserved in .shared memory when the + kernel is launched. The heap is initialized in gomp_nvptx_main and all + allocations are forgotten when the kernel exits. Allocations to other + memory spaces all use the system malloc syscall. + + The root heap descriptor is stored elsewhere in shared memory, and each + free chunk contains a similar descriptor for the next free chunk in the + chain. + + The descriptor is two 16-bit values: offset and size, which describe the + location of a chunk of memory available for allocation. The offset is + relative to the base of the heap. The special value 0xffff, 0xffff + indicates that the heap is locked. The descriptor is encoded into a + single 32-bit integer so that it may be easily accessed atomically. + + Memory is allocated to the first free chunk that fits. The free chain + is always stored in order of the offset to assist coalescing adjacent + chunks. */ + +#include "libgomp.h" +#include + +/* There should be some .shared space reserved for us. There's no way to + express this magic extern sizeless array in C so use asm. 
*/ +asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n"); + +extern uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon)); + +typedef union { + uint32_t raw; + struct { + uint16_t offset; + uint16_t size; + } desc; +} heapdesc; + +static void * +nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size) +{ + if (memspace == omp_low_lat_mem_space) + { + char *shared_pool; + asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool)); + + /* Memory is allocated in 8-byte granularity. */ + size = (size + 7) & ~7; + + /* Acquire a lock on the low-latency heap. */ + heapdesc root; + do + { + root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root, + 0xffffffff, MEMMODEL_ACQUIRE); + if (root.raw != 0xffffffff) + break; + /* Spin. */ + } + while (1); + + /* Walk the free chain. */ + heapdesc chunk = {root.raw}; + uint32_t *prev_chunkptr = NULL; + uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset); + heapdesc onward_chain = {chunkptr[0]}; + while (chunk.desc.size != 0 && (uint32_t)size > chunk.desc.size) + { + chunk.raw = onward_chain.raw; + prev_chunkptr = chunkptr; + chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset); + onward_chain.raw = chunkptr[0]; + } + + void *result = NULL; + if (chunk.desc.size != 0) + { + /* Allocation successful. */ + result = chunkptr; + + /* Update the free chain. */ + heapdesc stillfree = {chunk.raw}; + stillfree.desc.offset += size; + stillfree.desc.size -= size; + uint32_t *stillfreeptr = (uint32_t*)(shared_pool + + stillfree.desc.offset); + + if (stillfree.desc.size == 0) + /* The whole chunk was used. */ + stillfree.raw = onward_chain.raw; + else + /* The chunk was split, so restore the onward chain. */ + stillfreeptr[0] = onward_chain.raw; + + /* The previous free slot or root now points to stillfree. */ + if (prev_chunkptr) + prev_chunkptr[0] = stillfree.raw; + else + root.raw = stillfree.raw; + } + + /* Update the free chain root and release the lock. 
*/ + __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); + return result; + } + else + return malloc (size); +} + +static void * +nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size) +{ + if (memspace == omp_low_lat_mem_space) + { + /* Memory is allocated in 8-byte granularity. */ + size = (size + 7) & ~7; + + uint64_t *result = nvptx_memspace_alloc (memspace, size); + if (result) + /* Inline memset in which we know size is a multiple of 8. */ + for (unsigned i = 0; i < (unsigned)size/8; i++) + result[i] = 0; + + return result; + } + else + return calloc (1, size); +} + +static void +nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size) +{ + if (memspace == omp_low_lat_mem_space) + { + char *shared_pool; + asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool)); + + /* Memory is allocated in 8-byte granularity. */ + size = (size + 7) & ~7; + + /* Acquire a lock on the low-latency heap. */ + heapdesc root; + do + { + root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root, + 0xffffffff, MEMMODEL_ACQUIRE); + if (root.raw != 0xffffffff) + break; + /* Spin. */ + } + while (1); + + /* Walk the free chain to find where to insert a new entry. */ + heapdesc chunk = {root.raw}, prev_chunk; + uint32_t *prev_chunkptr = NULL, *prevprev_chunkptr = NULL; + uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset); + heapdesc onward_chain = {chunkptr[0]}; + while (chunk.desc.size != 0 && addr > (void*)chunkptr) + { + prev_chunk.raw = chunk.raw; + chunk.raw = onward_chain.raw; + prevprev_chunkptr = prev_chunkptr; + prev_chunkptr = chunkptr; + chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset); + onward_chain.raw = chunkptr[0]; + } + + /* Create the new chunk descriptor. */ + heapdesc newfreechunk; + newfreechunk.desc.offset = (uint16_t)((uintptr_t)addr + - (uintptr_t)shared_pool); + newfreechunk.desc.size = (uint16_t)size; + + /* Coalesce adjacent free chunks. 
*/ + if (newfreechunk.desc.offset + size == chunk.desc.offset) + { + /* Free chunk follows. */ + newfreechunk.desc.size += chunk.desc.size; + chunk.raw = onward_chain.raw; + } + if (prev_chunkptr) + { + if (prev_chunk.desc.offset + prev_chunk.desc.size + == newfreechunk.desc.offset) + { + /* Free chunk precedes. */ + newfreechunk.desc.offset = prev_chunk.desc.offset; + newfreechunk.desc.size += prev_chunk.desc.size; + addr = shared_pool + prev_chunk.desc.offset; + prev_chunkptr = prevprev_chunkptr; + } + } + + /* Update the free chain in the new and previous chunks. */ + ((uint32_t*)addr)[0] = chunk.raw; + if (prev_chunkptr) + prev_chunkptr[0] = newfreechunk.raw; + else + root.raw = newfreechunk.raw; + + /* Update the free chain root and release the lock. */ + __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); + } + else + free (addr); +} + +static void * +nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, + size_t oldsize, size_t size) +{ + if (memspace == omp_low_lat_mem_space) + { + char *shared_pool; + asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool)); + + /* Memory is allocated in 8-byte granularity. */ + oldsize = (oldsize + 7) & ~7; + size = (size + 7) & ~7; + + if (oldsize == size) + return addr; + + /* Acquire a lock on the low-latency heap. */ + heapdesc root; + do + { + root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root, + 0xffffffff, MEMMODEL_ACQUIRE); + if (root.raw != 0xffffffff) + break; + /* Spin. */ + } + while (1); + + /* Walk the free chain. 
*/
+      heapdesc chunk = {root.raw};
+      uint32_t *prev_chunkptr = NULL;
+      uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+      heapdesc onward_chain = {chunkptr[0]};
+      while (chunk.desc.size != 0 && (void*)chunkptr < addr)
+        {
+          chunk.raw = onward_chain.raw;
+          prev_chunkptr = chunkptr;
+          chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
+          onward_chain.raw = chunkptr[0];
+        }
+
+      void *result = NULL;
+      if (size < oldsize)
+        {
+          /* The new allocation is smaller than the old; we can always
+             shrink an allocation in place.  */
+          result = addr;
+
+          uint32_t *nowfreeptr = (uint32_t*)(addr + size);
+
+          /* Update the free chain.  */
+          heapdesc nowfree;
+          nowfree.desc.offset = (char*)nowfreeptr - shared_pool;
+          nowfree.desc.size = oldsize - size;
+
+          if (nowfree.desc.offset + nowfree.desc.size == chunk.desc.offset)
+            {
+              /* Coalesce following free chunk.  */
+              nowfree.desc.size += chunk.desc.size;
+              nowfreeptr[0] = onward_chain.raw;
+            }
+          else
+            nowfreeptr[0] = chunk.raw;
+
+          /* The previous free slot or root now points to nowfree.  */
+          if (prev_chunkptr)
+            prev_chunkptr[0] = nowfree.raw;
+          else
+            root.raw = nowfree.raw;
+        }
+      else if (chunk.desc.size != 0
+               && (char *)addr + oldsize == (char *)chunkptr
+               && chunk.desc.size >= size-oldsize)
+        {
+          /* The new allocation is larger than the old, and we found a
+             large enough free block right after the existing block,
+             so we extend into that space.  */
+          result = addr;
+
+          uint16_t delta = size-oldsize;
+
+          /* Update the free chain.  */
+          heapdesc stillfree = {chunk.raw};
+          stillfree.desc.offset += delta;
+          stillfree.desc.size -= delta;
+          uint32_t *stillfreeptr = (uint32_t*)(shared_pool
+                                               + stillfree.desc.offset);
+
+          if (stillfree.desc.size == 0)
+            /* The whole chunk was used.  */
+            stillfree.raw = onward_chain.raw;
+          else
+            /* The chunk was split, so restore the onward chain.  */
+            stillfreeptr[0] = onward_chain.raw;
+
+          /* The previous free slot or root now points to stillfree.  */
+          if (prev_chunkptr)
+            prev_chunkptr[0] = stillfree.raw;
+          else
+            root.raw = stillfree.raw;
+        }
+      /* Else realloc in-place has failed and result remains NULL.  */
+
+      /* Update the free chain root and release the lock.  */
+      __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+      if (result == NULL)
+        {
+          /* The allocation could not be extended in place, so we simply
+             allocate fresh memory and move the data.  If we can't allocate
+             from low-latency memory then we leave the original allocation
+             intact and return NULL.
+             We could do a fall-back to main memory, but we don't know what
+             the fall-back trait said to do.  */
+          result = nvptx_memspace_alloc (memspace, size);
+          if (result != NULL)
+            {
+              /* Inline memcpy in which we know oldsize is a multiple of 8.  */
+              uint64_t *from = addr, *to = result;
+              for (unsigned i = 0; i < (unsigned)oldsize/8; i++)
+                to[i] = from[i];
+
+              nvptx_memspace_free (memspace, addr, oldsize);
+            }
+        }
+      return result;
+    }
+  else
+    return realloc (addr, size);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_alloc (MEMSPACE, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  nvptx_memspace_calloc (MEMSPACE, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+
+#include "../../allocator.c"
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 6923416fb4e..65a7af3417b 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -33,9 +33,13 @@

 struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
 int __gomp_team_num __attribute__((shared,nocommon));
+uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));

 static void gomp_thread_start (struct gomp_thread_pool *);

+/* There should be some .shared space reserved for us.  There's no way to
+   express this magic extern sizeless array in C so use asm.  */
+asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");

 /* This externally visible function handles target region entry.  It
    sets up a per-team thread pool and transfers control by calling FN (FN_DATA)
@@ -63,6 +67,30 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
   nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
   memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));

+  /* Find the low-latency heap details ....  */
+  uint32_t *shared_pool;
+  uint32_t shared_pool_size = 0;
+  asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
+#if __PTX_ISA_VERSION_MAJOR__ > 4 \
+    || (__PTX_ISA_VERSION_MAJOR__ == 4 && __PTX_ISA_VERSION_MINOR__ >= 1)
+  asm ("mov.u32\t%0, %%dynamic_smem_size;\n"
+       : "=r"(shared_pool_size));
+#endif
+
+  /* ... and initialize it with an empty free-chain.  */
+  union {
+    uint32_t raw;
+    struct {
+      uint16_t offset;
+      uint16_t size;
+    } desc;
+  } root;
+  root.desc.offset = 0;                 /* The first byte is free.  */
+  root.desc.size = shared_pool_size;    /* The whole space is free.  */
+  shared_pool[0] = 0;                   /* Terminate free chain.  */
+  __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+
+  /* Initialize the thread pool.  */
   struct gomp_thread_pool *pool = alloca (sizeof (*pool));
   pool->threads = alloca (ntids * sizeof (*pool->threads));
   for (tid = 0; tid < ntids; tid++)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc63e274cdf..40739ba592d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,6 +334,11 @@ struct ptx_device

 static struct ptx_device **ptx_devices;

+/* OpenMP kernels reserve a small amount of ".shared" space for use by
+   omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
+   default is set here.
*/ +static unsigned lowlat_pool_size = 8*1024; + static inline struct nvptx_thread * nvptx_thread (void) { @@ -1205,6 +1210,22 @@ GOMP_OFFLOAD_init_device (int n) instantiated_devices++; } + const char *var_name = "GOMP_NVPTX_LOWLAT_POOL"; + const char *env_var = secure_getenv (var_name); + notify_var (var_name, env_var); + + if (env_var != NULL) + { + char *endptr; + unsigned long val = strtoul (env_var, &endptr, 10); + if (endptr == NULL || *endptr != '\0' + || errno == ERANGE || errno == EINVAL + || val > UINT_MAX) + GOMP_PLUGIN_error ("Error parsing %s", var_name); + else + lowlat_pool_size = val; + } + pthread_mutex_unlock (&ptx_dev_lock); return dev != NULL; @@ -2030,7 +2051,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args) " [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n", __FUNCTION__, fn_name, teams, threads); r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1, - 32, threads, 1, 0, NULL, NULL, config); + 32, threads, 1, lowlat_pool_size, NULL, NULL, config); if (r != CUDA_SUCCESS) GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r)); diff --git a/libgomp/testsuite/libgomp.c/allocators-1.c b/libgomp/testsuite/libgomp.c/allocators-1.c new file mode 100644 index 00000000000..04968e4c83d --- /dev/null +++ b/libgomp/testsuite/libgomp.c/allocators-1.c @@ -0,0 +1,56 @@ +/* { dg-do run } */ + +/* Test that omp_alloc returns usable memory. 
*/
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int *a;
+    a = (int *) omp_alloc(n*sizeof(int), allocator);
+
+    #pragma omp parallel
+    for (int i = 0; i < n; i++)
+      a[i] = i;
+
+    for (int i = 0; i < n; i++)
+      if (a[i] != i)
+        {
+          __builtin_printf ("data mismatch at %i\n", i);
+          __builtin_abort ();
+        }
+
+    omp_free(a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit
+  test (100000, omp_default_mem_alloc);
+  test (100000, omp_large_cap_mem_alloc);
+  test (100000, omp_const_mem_alloc);
+  test (100000, omp_high_bw_mem_alloc);
+  test (100000, omp_low_lat_mem_alloc);
+  test (100000, omp_cgroup_mem_alloc);
+  test (100000, omp_pteam_mem_alloc);
+  test (100000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-2.c b/libgomp/testsuite/libgomp.c/allocators-2.c
new file mode 100644
index 00000000000..a98f1b4c05e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-2.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+/* Test concurrent and repeated allocations.  */
+
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+void
+test (int n, omp_allocator_handle_t allocator)
+{
+  #pragma omp target map(to:n) map(to:allocator)
+  {
+    int **a;
+    a = (int **) omp_alloc(n*sizeof(int*), allocator);
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      {
+        /* Use 10x to ensure we do activate low-latency fall-back.  */
+        a[i] = omp_alloc(sizeof(int)*10, allocator);
+        a[i][0] = i;
+      }
+
+    for (int i = 0; i < n; i++)
+      if (a[i][0] != i)
+        {
+          __builtin_printf ("data mismatch at %i\n", i);
+          __builtin_abort ();
+        }
+
+    #pragma omp parallel for
+    for (int i = 0; i < n; i++)
+      omp_free(a[i], allocator);
+
+    omp_free (a, allocator);
+  }
+}
+
+int
+main ()
+{
+  // Smaller than low-latency memory limit
+  test (10, omp_default_mem_alloc);
+  test (10, omp_large_cap_mem_alloc);
+  test (10, omp_const_mem_alloc);
+  test (10, omp_high_bw_mem_alloc);
+  test (10, omp_low_lat_mem_alloc);
+  test (10, omp_cgroup_mem_alloc);
+  test (10, omp_pteam_mem_alloc);
+  test (10, omp_thread_mem_alloc);
+
+  // Larger than low-latency memory limit (on aggregate)
+  test (1000, omp_default_mem_alloc);
+  test (1000, omp_large_cap_mem_alloc);
+  test (1000, omp_const_mem_alloc);
+  test (1000, omp_high_bw_mem_alloc);
+  test (1000, omp_low_lat_mem_alloc);
+  test (1000, omp_cgroup_mem_alloc);
+  test (1000, omp_pteam_mem_alloc);
+  test (1000, omp_thread_mem_alloc);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/allocators-3.c b/libgomp/testsuite/libgomp.c/allocators-3.c
new file mode 100644
index 00000000000..45514c2a088
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-3.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+
+/* Stress-test omp_alloc/omp_free under concurrency.
*/ + +#include +#include +#include + +#pragma omp requires dynamic_allocators + +#define N 1000 + +void +test (omp_allocator_handle_t allocator) +{ + #pragma omp target map(to:allocator) + { + #pragma omp parallel for + for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + { + int *p = omp_alloc(sizeof(int), allocator); + omp_free(p, allocator); + } + } +} + +int +main () +{ + // Smaller than low-latency memory limit + test (omp_default_mem_alloc); + test (omp_large_cap_mem_alloc); + test (omp_const_mem_alloc); + test (omp_high_bw_mem_alloc); + test (omp_low_lat_mem_alloc); + test (omp_cgroup_mem_alloc); + test (omp_pteam_mem_alloc); + test (omp_thread_mem_alloc); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c new file mode 100644 index 00000000000..9fa6aa1624f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/allocators-4.c @@ -0,0 +1,196 @@ +/* { dg-do run } */ + +/* Test that low-latency free chains are sound. */ + +#include +#include + +#pragma omp requires dynamic_allocators + +void +check (int cond, const char *msg) +{ + if (!cond) + { + __builtin_printf ("%s\n", msg); + __builtin_abort (); + } +} + +int +main () +{ + #pragma omp target + { + /* Ensure that the memory we get *is* low-latency with a null-fallback. */ + omp_alloctrait_t traits[1] + = { { omp_atk_fallback, omp_atv_null_fb } }; + omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, + 1, traits); + + int size = 4; + + char *a = omp_alloc(size, lowlat); + char *b = omp_alloc(size, lowlat); + char *c = omp_alloc(size, lowlat); + char *d = omp_alloc(size, lowlat); + + /* There are headers and padding to account for. */ + int size2 = size + (b-a); + int size3 = size + (c-a); + int size4 = size + (d-a) + 100; /* Random larger amount. 
*/ + + check (a != NULL && b != NULL && c != NULL && d != NULL, + "omp_alloc returned NULL\n"); + + omp_free(a, lowlat); + char *p = omp_alloc (size, lowlat); + check (p == a, "allocate did not reuse first chunk"); + + omp_free(b, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not reuse second chunk"); + + omp_free(c, lowlat); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not reuse third chunk"); + + omp_free(a, lowlat); + omp_free(b, lowlat); + p = omp_alloc (size2, lowlat); + check (p == a, "allocate did not coalesce first two chunks"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == a, "allocate did not split first chunk (1)"); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split first chunk (2)"); + + omp_free(b, lowlat); + omp_free(c, lowlat); + p = omp_alloc (size2, lowlat); + check (p == b, "allocate did not coalesce middle two chunks"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split second chunk (1)"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split second chunk (2)"); + + omp_free(b, lowlat); + omp_free(a, lowlat); + p = omp_alloc (size2, lowlat); + check (p == a, "allocate did not coalesce first two chunks, reverse free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == a, "allocate did not split first chunk (1), reverse free"); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split first chunk (2), reverse free"); + + omp_free(c, lowlat); + omp_free(b, lowlat); + p = omp_alloc (size2, lowlat); + check (p == b, "allocate did not coalesce second two chunks, reverse free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split second chunk (1), reverse free"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split second chunk (2), reverse free"); + + omp_free(a, lowlat); + omp_free(b, lowlat); + 
omp_free(c, lowlat); + p = omp_alloc (size3, lowlat); + check (p == a, "allocate did not coalesce first three chunks"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == a, "allocate did not split first chunk (1)"); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split first chunk (2)"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split first chunk (3)"); + + omp_free(b, lowlat); + omp_free(c, lowlat); + omp_free(d, lowlat); + p = omp_alloc (size3, lowlat); + check (p == b, "allocate did not coalesce last three chunks"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split second chunk (1)"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split second chunk (2)"); + p = omp_alloc (size, lowlat); + check (p == d, "allocate did not split second chunk (3)"); + + omp_free(c, lowlat); + omp_free(b, lowlat); + omp_free(a, lowlat); + p = omp_alloc (size3, lowlat); + check (p == a, "allocate did not coalesce first three chunks, reverse free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == a, "allocate did not split first chunk (1), reverse free"); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split first chunk (2), reverse free"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split first chunk (3), reverse free"); + + omp_free(d, lowlat); + omp_free(c, lowlat); + omp_free(b, lowlat); + p = omp_alloc (size3, lowlat); + check (p == b, "allocate did not coalesce second three chunks, reverse free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split second chunk (1), reverse free"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split second chunk (2), reverse free"); + p = omp_alloc (size, lowlat); + check (p == d, "allocate did not split second chunk (3), reverse free"); + + omp_free(c, lowlat); + omp_free(a, lowlat); 
+ omp_free(b, lowlat); + p = omp_alloc (size3, lowlat); + check (p == a, "allocate did not coalesce first three chunks, mixed free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == a, "allocate did not split first chunk (1), mixed free"); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split first chunk (2), mixed free"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split first chunk (3), mixed free"); + + omp_free(d, lowlat); + omp_free(b, lowlat); + omp_free(c, lowlat); + p = omp_alloc (size3, lowlat); + check (p == b, "allocate did not coalesce second three chunks, mixed free"); + + omp_free(p, lowlat); + p = omp_alloc (size, lowlat); + check (p == b, "allocate did not split second chunk (1), mixed free"); + p = omp_alloc (size, lowlat); + check (p == c, "allocate did not split second chunk (2), mixed free"); + p = omp_alloc (size, lowlat); + check (p == d, "allocate did not split second chunk (3), mixed free"); + + omp_free(a, lowlat); + omp_free(b, lowlat); + omp_free(c, lowlat); + omp_free(d, lowlat); + p = omp_alloc(size4, lowlat); + check (p == a, "allocate did not coalesce all memory"); + } + +return 0; +} + diff --git a/libgomp/testsuite/libgomp.c/allocators-5.c b/libgomp/testsuite/libgomp.c/allocators-5.c new file mode 100644 index 00000000000..9694010cf1f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/allocators-5.c @@ -0,0 +1,63 @@ +/* { dg-do run } */ + +/* Test calloc with omp_alloc. 
*/ + +#include + +#pragma omp requires dynamic_allocators + +void +test (int n, omp_allocator_handle_t allocator) +{ + #pragma omp target map(to:n) map(to:allocator) + { + int *a; + a = (int *) omp_calloc(n, sizeof(int), allocator); + + for (int i = 0; i < n; i++) + if (a[i] != 0) + { + __builtin_printf ("memory not zeroed at %i\n", i); + __builtin_abort (); + } + + #pragma omp parallel + for (int i = 0; i < n; i++) + a[i] = i; + + for (int i = 0; i < n; i++) + if (a[i] != i) + { + __builtin_printf ("data mismatch at %i\n", i); + __builtin_abort (); + } + + omp_free(a, allocator); + } +} + +int +main () +{ + // Smaller than low-latency memory limit + test (10, omp_default_mem_alloc); + test (10, omp_large_cap_mem_alloc); + test (10, omp_const_mem_alloc); + test (10, omp_high_bw_mem_alloc); + test (10, omp_low_lat_mem_alloc); + test (10, omp_cgroup_mem_alloc); + test (10, omp_pteam_mem_alloc); + test (10, omp_thread_mem_alloc); + + // Larger than low-latency memory limit + test (100000, omp_default_mem_alloc); + test (100000, omp_large_cap_mem_alloc); + test (100000, omp_const_mem_alloc); + test (100000, omp_high_bw_mem_alloc); + test (100000, omp_low_lat_mem_alloc); + test (100000, omp_cgroup_mem_alloc); + test (100000, omp_pteam_mem_alloc); + test (100000, omp_thread_mem_alloc); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c new file mode 100644 index 00000000000..90bf73095ef --- /dev/null +++ b/libgomp/testsuite/libgomp.c/allocators-6.c @@ -0,0 +1,117 @@ +/* { dg-do run } */ + +/* Test that low-latency realloc and free chains are sound. */ + +#include +#include + +#pragma omp requires dynamic_allocators + +void +check (int cond, const char *msg) +{ + if (!cond) + { + __builtin_printf ("%s\n", msg); + __builtin_abort (); + } +} + +int +main () +{ + #pragma omp target + { + /* Ensure that the memory we get *is* low-latency with a null-fallback. 
*/ + omp_alloctrait_t traits[1] + = { { omp_atk_fallback, omp_atv_null_fb } }; + omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, + 1, traits); + + int size = 16; + + char *a = (char *)omp_alloc(size, lowlat); + char *b = (char *)omp_alloc(size, lowlat); + char *c = (char *)omp_alloc(size, lowlat); + char *d = (char *)omp_alloc(size, lowlat); + + /* There are headers and padding to account for. */ + int size2 = size + (b-a); + int size3 = size + (c-a); + int size4 = size + (d-a) + 100; /* Random larger amount. */ + + check (a != NULL && b != NULL && c != NULL && d != NULL, + "omp_alloc returned NULL\n"); + + char *p = omp_realloc (b, size, lowlat, lowlat); + check (p == b, "realloc did not reuse same size chunk, no space after"); + + p = omp_realloc (b, size-8, lowlat, lowlat); + check (p == b, "realloc did not reuse smaller chunk, no space after"); + + p = omp_realloc (b, size, lowlat, lowlat); + check (p == b, "realloc did not reuse original size chunk, no space after"); + + /* Make space after b. 
*/ + omp_free(c, lowlat); + + p = omp_realloc (b, size, lowlat, lowlat); + check (p == b, "realloc did not reuse same size chunk"); + + p = omp_realloc (b, size-8, lowlat, lowlat); + check (p == b, "realloc did not reuse smaller chunk"); + + p = omp_realloc (b, size, lowlat, lowlat); + check (p == b, "realloc did not reuse original size chunk"); + + p = omp_realloc (b, size+8, lowlat, lowlat); + check (p == b, "realloc did not extend in place by a little"); + + p = omp_realloc (b, size2, lowlat, lowlat); + check (p == b, "realloc did not extend into whole next chunk"); + + p = omp_realloc (b, size3, lowlat, lowlat); + check (p != b, "realloc did not move b elsewhere"); + omp_free (p, lowlat); + + + p = omp_realloc (a, size, lowlat, lowlat); + check (p == a, "realloc did not reuse same size chunk, first position"); + + p = omp_realloc (a, size-8, lowlat, lowlat); + check (p == a, "realloc did not reuse smaller chunk, first position"); + + p = omp_realloc (a, size, lowlat, lowlat); + check (p == a, "realloc did not reuse original size chunk, first position"); + + p = omp_realloc (a, size+8, lowlat, lowlat); + check (p == a, "realloc did not extend in place by a little, first position"); + + p = omp_realloc (a, size3, lowlat, lowlat); + check (p == a, "realloc did not extend into whole next chunk, first position"); + + p = omp_realloc (a, size4, lowlat, lowlat); + check (p != a, "realloc did not move a elsewhere, first position"); + omp_free (p, lowlat); + + + p = omp_realloc (d, size, lowlat, lowlat); + check (p == d, "realloc did not reuse same size chunk, last position"); + + p = omp_realloc (d, size-8, lowlat, lowlat); + check (p == d, "realloc did not reuse smaller chunk, last position"); + + p = omp_realloc (d, size, lowlat, lowlat); + check (p == d, "realloc did not reuse original size chunk, last position"); + + p = omp_realloc (d, size+8, lowlat, lowlat); + check (p == d, "realloc did not extend in place by a little, last position"); + + /* Larger than low 
latency memory. */ + p = omp_realloc(d, 100000000, lowlat, lowlat); + check (p == NULL, "realloc did not fail on OOM"); + } + +return 0; +} + From patchwork Thu Jul 7 10:34:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55821 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 600733851157 for ; Thu, 7 Jul 2022 10:35:36 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id A08CF3857410 for ; Thu, 7 Jul 2022 10:35:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A08CF3857410 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112661" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:35:16 -0800 IronPort-SDR: cP5AheROrDqrZXTC8loXu/qO7zMWONe2sLqaOwItn77ptqJ+JTExcHM128d3JCDs7AWXyMm2E0 8tL+mPUCPQ6EjjfguEgBAonE2ap8tuhHyCl1i2rtwQm0MUD26vAQ2mRoqnrEFRTvud9I3LoGt6 DuFY3cEaDkz8K1RWmr9X89AwFuxUByCorofK3PHg7hgRiLbGx23gcA8f5WZxRNYhigZdYa+S7x qt5l6EuOP34rbLNE9FqeZd/6e+UOp782SDkB2/XXL+1WeClVBYrvhoYlKGemNmmG7lAQcdFPt7 5KA= From: Andrew Stubbs To: Subject: [PATCH 02/17] libgomp: pinned memory Date: Thu, 7 Jul 2022 11:34:33 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, 
GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (xmlock): New function. (omp_init_allocator): Don't disallow the pinned trait. (omp_aligned_alloc): Add pinning to all MEMSPACE_* calls. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. (omp_free): Likewise. * config/linux/allocator.c: New file. * config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. * testsuite/libgomp.c/alloc-pinned-1.c: New test. * testsuite/libgomp.c/alloc-pinned-2.c: New test. * testsuite/libgomp.c/alloc-pinned-3.c: New test. * testsuite/libgomp.c/alloc-pinned-4.c: New test. 
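For readers skimming the series, the mmap-plus-mlock approach described above can be sketched in isolation. This is a minimal illustration only, not the code added by this patch: pin_alloc and pin_free are hypothetical names, and the sketch assumes a Linux host whose RLIMIT_MEMLOCK allows at least the requested size.

```c
/* Sketch only: pin_alloc and pin_free are hypothetical helper names,
   not part of libgomp.  Assumes Linux.  */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Map SIZE bytes on pages of their own and lock them into RAM.
   Returns NULL if either the mapping or the lock fails.  */
static void *
pin_alloc (size_t size)
{
  void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (addr == MAP_FAILED)
    return NULL;

  if (mlock (addr, size) != 0)
    {
      /* Typically the lock limit is too low; give the pages back.  */
      munmap (addr, size);
      return NULL;
    }

  return addr;
}

/* Unmapping the range also removes the lock on it.
   Returns 0 on success, -1 on failure, as munmap does.  */
static int
pin_free (void *addr, size_t size)
{
  return munmap (addr, size);
}
```

Because each pinned block has a mapping of its own, releasing it cannot unpin a neighbouring allocation that happens to share a page, which is the reason given above for not using malloc.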
--- libgomp/allocator.c | 67 ++++++---- libgomp/config/linux/allocator.c | 99 ++++++++++++++ libgomp/config/nvptx/allocator.c | 8 +- libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 95 +++++++++++++ libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 101 ++++++++++++++ libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 130 ++++++++++++++++++ libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 132 +++++++++++++++++++ 7 files changed, 602 insertions(+), 30 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 9b33bcf529b..54310ab93ca 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -39,16 +39,20 @@ /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC -#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? NULL : malloc (SIZE)) #endif #ifndef MEMSPACE_CALLOC -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? NULL : calloc (1, SIZE)) #endif #ifndef MEMSPACE_REALLOC -#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \ + ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE)) #endif #ifndef MEMSPACE_FREE -#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ + (PIN ? NULL : free (ADDR)) #endif /* Map the predefined allocators to the correct memory space. @@ -351,10 +355,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits, break; } - /* No support for this so far. 
*/ - if (data.pinned) - return omp_null_allocator; - ret = gomp_malloc (sizeof (struct omp_allocator_data)); *ret = data; #ifndef HAVE_SYNC_BUILTINS @@ -481,7 +481,8 @@ retry: } else #endif - ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, + allocator_data->pinned); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -511,7 +512,8 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size); + ptr = MEMSPACE_ALLOC (memspace, new_size, + allocator_data && allocator_data->pinned); } if (ptr == NULL) goto fail; @@ -542,9 +544,9 @@ fail: #ifdef LIBGOMP_USE_MEMKIND || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0) - || !allocator_data) + || !allocator_data + || allocator_data->pool_size < ~(uintptr_t) 0 + || allocator_data->pinned) { allocator = omp_default_mem_alloc; goto retry; @@ -596,6 +598,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) struct omp_mem_header *data; omp_memspace_handle_t memspace __attribute__((unused)) = omp_default_mem_space; + int pinned __attribute__((unused)) = false; if (ptr == NULL) return; @@ -627,6 +630,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = allocator_data->memspace; + pinned = allocator_data->pinned; } else { @@ -651,7 +655,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) memspace = predefined_alloc_mapping[data->allocator]; } - MEMSPACE_FREE (memspace, data->ptr, data->size); + MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); } ialias (omp_free) @@ -767,7 +771,8 @@ retry: } else #endif - ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size); + ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size, + allocator_data->pinned); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -797,7 +802,8 @@ retry: = (allocator_data ? 
allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_CALLOC (memspace, new_size); + ptr = MEMSPACE_CALLOC (memspace, new_size, + allocator_data && allocator_data->pinned); } if (ptr == NULL) goto fail; @@ -828,9 +834,9 @@ fail: #ifdef LIBGOMP_USE_MEMKIND || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0) - || !allocator_data) + || !allocator_data + || allocator_data->pool_size < ~(uintptr_t) 0 + || allocator_data->pinned) { allocator = omp_default_mem_alloc; goto retry; @@ -1021,9 +1027,13 @@ retry: #endif if (prev_size) new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, - data->size, new_size); + data->size, new_size, + (free_allocator_data + && free_allocator_data->pinned), + allocator_data->pinned); else - new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); + new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, + allocator_data->pinned); if (new_ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -1069,10 +1079,14 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size); + new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size, + (free_allocator_data + && free_allocator_data->pinned), + allocator_data && allocator_data->pinned); } if (new_ptr == NULL) goto fail; + ret = (char *) new_ptr + sizeof (struct omp_mem_header); ((struct omp_mem_header *) ret)[-1].ptr = new_ptr; ((struct omp_mem_header *) ret)[-1].size = new_size; @@ -1095,7 +1109,8 @@ retry: = (allocator_data ? 
allocator_data->memspace : predefined_alloc_mapping[allocator]); - new_ptr = MEMSPACE_ALLOC (memspace, new_size); + new_ptr = MEMSPACE_ALLOC (memspace, new_size, + allocator_data && allocator_data->pinned); } if (new_ptr == NULL) goto fail; @@ -1151,9 +1166,9 @@ fail: #ifdef LIBGOMP_USE_MEMKIND || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0) - || !allocator_data) + || !allocator_data + || allocator_data->pool_size < ~(uintptr_t) 0 + || allocator_data->pinned) { allocator = omp_default_mem_alloc; goto retry; diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c index b73acce9121..1496e41875c 100644 --- a/libgomp/config/linux/allocator.c +++ b/libgomp/config/linux/allocator.c @@ -33,4 +33,103 @@ #define LIBGOMP_USE_MEMKIND #endif +/* Implement malloc routines that can handle pinned memory on Linux. + + It's possible to use mlock on any heap memory, but using munlock is + problematic if there are multiple pinned allocations on the same page. + Tracking all that manually would be possible, but adds overhead. This may + be worth it if there are a lot of small allocations getting pinned, but + this seems less likely in a HPC application. + + Instead we optimize for large pinned allocations, and use mmap to ensure + that two pinned allocations don't share the same page. This also means + that large allocations don't pin extra pages by being poorly aligned. 
*/ + +#define _GNU_SOURCE +#include +#include +#include "libgomp.h" + +static void * +linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) +{ + (void)memspace; + + if (pin) + { + void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (addr == MAP_FAILED) + return NULL; + + if (mlock (addr, size)) + { + gomp_debug (0, "libgomp: failed to pin memory (ulimit too low?)\n"); + munmap (addr, size); + return NULL; + } + + return addr; + } + else + return malloc (size); +} + +static void * +linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) +{ + if (pin) + return linux_memspace_alloc (memspace, size, pin); + else + return calloc (1, size); +} + +static void +linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size, + int pin) +{ + (void)memspace; + + if (pin) + munmap (addr, size); + else + free (addr); +} + +static void * +linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, + size_t oldsize, size_t size, int oldpin, int pin) +{ + if (oldpin && pin) + { + void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE); + if (newaddr == MAP_FAILED) + return NULL; + + return newaddr; + } + else if (oldpin || pin) + { + void *newaddr = linux_memspace_alloc (memspace, size, pin); + if (newaddr) + { + memcpy (newaddr, addr, oldsize < size ? 
oldsize : size); + linux_memspace_free (memspace, addr, oldsize, oldpin); + } + + return newaddr; + } + else + return realloc (addr, size); +} + +#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ + linux_memspace_alloc (MEMSPACE, SIZE, PIN) +#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ + linux_memspace_calloc (MEMSPACE, SIZE, PIN) +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \ + linux_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ + linux_memspace_free (MEMSPACE, ADDR, SIZE, PIN) + #include "../../allocator.c" diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index 6bc2ea48043..f740b97f6ac 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -358,13 +358,13 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return realloc (addr, size); } -#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \ +#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ nvptx_memspace_alloc (MEMSPACE, SIZE) -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \ +#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ nvptx_memspace_calloc (MEMSPACE, SIZE) -#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \ +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \ nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE) -#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \ +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ nvptx_memspace_free (MEMSPACE, ADDR, SIZE) #include "../../allocator.c" diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c new file mode 100644 index 00000000000..79792b16d83 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c @@ -0,0 +1,95 @@ +/* { dg-do run } */ + +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +/* Test that pinned memory works. 
*/ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) +#define CHECK_SIZE(SIZE) { \ + struct rlimit limit; \ + if (getrlimit (RLIMIT_MEMLOCK, &limit) \ + || limit.rlim_cur <= SIZE) \ + fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \ + } + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} +#else +#define PAGE_SIZE 1 /* unknown */ +#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n"); + +int +get_pinned_mem () +{ + return 0; +} +#endif + +#include + +int +main () +{ + /* Allocate at least a page each time, but stay within the ulimit. */ + const int SIZE = PAGE_SIZE; + CHECK_SIZE (SIZE*3); + + const omp_alloctrait_t traits[] = { + { omp_atk_pinned, 1 } + }; + omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits); + + // Sanity check + if (get_pinned_mem () != 0) + abort (); + + void *p = omp_alloc (SIZE, allocator); + if (!p) + abort (); + + int amount = get_pinned_mem (); + if (amount == 0) + abort (); + + p = omp_realloc (p, SIZE*2, allocator, allocator); + + int amount2 = get_pinned_mem (); + if (amount2 <= amount) + abort (); + + p = omp_calloc (1, SIZE, allocator); + + if (get_pinned_mem () <= amount2) + abort (); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c new file mode 100644 index 00000000000..228c656b715 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c @@ -0,0 +1,101 @@ +/* { dg-do run } */ + +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +/* Test that pinned memory works (pool_size code path). 
*/ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) +#define CHECK_SIZE(SIZE) { \ + struct rlimit limit; \ + if (getrlimit (RLIMIT_MEMLOCK, &limit) \ + || limit.rlim_cur <= SIZE) \ + fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \ + } + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} +#else +#define PAGE_SIZE 1 /* unknown */ +#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n"); + +int +get_pinned_mem () +{ + return 0; +} +#endif + +#include + +int +main () +{ + /* Allocate at least a page each time, but stay within the ulimit. */ + const int SIZE = PAGE_SIZE; + CHECK_SIZE (SIZE*3); + + const omp_alloctrait_t traits[] = { + { omp_atk_pinned, 1 }, + { omp_atk_pool_size, SIZE*8 } + }; + omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, + 2, traits); + + // Sanity check + if (get_pinned_mem () != 0) + abort (); + + void *p = omp_alloc (SIZE, allocator); + if (!p) + abort (); + + int amount = get_pinned_mem (); + if (amount == 0) + abort (); + + p = omp_realloc (p, SIZE*2, allocator, allocator); + if (!p) + abort (); + + int amount2 = get_pinned_mem (); + if (amount2 <= amount) + abort (); + + p = omp_calloc (1, SIZE, allocator); + if (!p) + abort (); + + if (get_pinned_mem () <= amount2) + abort (); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c new file mode 100644 index 00000000000..90539ffe3e0 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c @@ -0,0 +1,130 @@ +/* { dg-do run } */ + +/* Test that pinned memory fails correctly. 
*/ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} + +void +set_pin_limit (int size) +{ + struct rlimit limit; + if (getrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); + limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size); + if (setrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); +} +#else +int +#define PAGE_SIZE 10000*1024 /* unknown */ + +get_pinned_mem () +{ + return 0; +} + +void +set_pin_limit () +{ +} +#endif + +#include + +int +main () +{ + /* This needs to be large enough to cover multiple pages. */ + const int SIZE = PAGE_SIZE*4; + + /* Pinned memory, no fallback. */ + const omp_alloctrait_t traits1[] = { + { omp_atk_pinned, 1 }, + { omp_atk_fallback, omp_atv_null_fb } + }; + omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 2, traits1); + + /* Pinned memory, plain memory fallback. */ + const omp_alloctrait_t traits2[] = { + { omp_atk_pinned, 1 }, + { omp_atk_fallback, omp_atv_default_mem_fb } + }; + omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 2, traits2); + + /* Ensure that the limit is smaller than the allocation. 
*/ + set_pin_limit (SIZE/2); + + // Sanity check + if (get_pinned_mem () != 0) + abort (); + + // Should fail + void *p = omp_alloc (SIZE, allocator1); + if (p) + abort (); + + // Should fail + p = omp_calloc (1, SIZE, allocator1); + if (p) + abort (); + + // Should fall back + p = omp_alloc (SIZE, allocator2); + if (!p) + abort (); + + // Should fall back + p = omp_calloc (1, SIZE, allocator2); + if (!p) + abort (); + + // Should fail to realloc + void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc); + p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc); + if (!notpinned || p) + abort (); + + // Should fall back to no realloc needed + p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc); + if (p != notpinned) + abort (); + + // No memory should have been pinned + int amount = get_pinned_mem (); + if (amount != 0) + abort (); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c new file mode 100644 index 00000000000..534e49eefc4 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c @@ -0,0 +1,132 @@ +/* { dg-do run } */ + +/* Test that pinned memory fails correctly, pool_size code path. */ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} + +void +set_pin_limit (int size) +{ + struct rlimit limit; + if (getrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); + limit.rlim_cur = (limit.rlim_max < size ? 
limit.rlim_max : size); + if (setrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); +} +#else +int +#define PAGE_SIZE 10000*1024 /* unknown */ + +get_pinned_mem () +{ + return 0; +} + +void +set_pin_limit () +{ +} +#endif + +#include + +int +main () +{ + /* This needs to be large enough to cover multiple pages. */ + const int SIZE = PAGE_SIZE*4; + + /* Pinned memory, no fallback. */ + const omp_alloctrait_t traits1[] = { + { omp_atk_pinned, 1 }, + { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_pool_size, SIZE*8 } + }; + omp_allocator_handle_t allocator1 = omp_init_allocator (omp_default_mem_space, 3, traits1); + + /* Pinned memory, plain memory fallback. */ + const omp_alloctrait_t traits2[] = { + { omp_atk_pinned, 1 }, + { omp_atk_fallback, omp_atv_default_mem_fb }, + { omp_atk_pool_size, SIZE*8 } + }; + omp_allocator_handle_t allocator2 = omp_init_allocator (omp_default_mem_space, 3, traits2); + + /* Ensure that the limit is smaller than the allocation. */ + set_pin_limit (SIZE/2); + + // Sanity check + if (get_pinned_mem () != 0) + abort (); + + // Should fail + void *p = omp_alloc (SIZE, allocator1); + if (p) + abort (); + + // Should fail + p = omp_calloc (1, SIZE, allocator1); + if (p) + abort (); + + // Should fall back + p = omp_alloc (SIZE, allocator2); + if (!p) + abort (); + + // Should fall back + p = omp_calloc (1, SIZE, allocator2); + if (!p) + abort (); + + // Should fail to realloc + void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc); + p = omp_realloc (notpinned, SIZE, allocator1, omp_default_mem_alloc); + if (!notpinned || p) + abort (); + + // Should fall back to no realloc needed + p = omp_realloc (notpinned, SIZE, allocator2, omp_default_mem_alloc); + if (p != notpinned) + abort (); + + // No memory should have been pinned + int amount = get_pinned_mem (); + if (amount != 0) + abort (); + + return 0; +} From patchwork Thu Jul 7 10:34:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55824 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 02C3C386CE4C for ; Thu, 7 Jul 2022 10:36:02 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id C0ECB38560AE for ; Thu, 7 Jul 2022 10:35:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C0ECB38560AE Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112664" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:35:18 -0800 IronPort-SDR: v/rK1YJEbJa/pV+Ai/xezDpFL0NLno4m23JXl4l7geY2oNNYrRnT3XqJ3Yul8aOPu3QOwKlYs0 YdVGQAKJdymlP5QcaqG2rxIF9PFMsSW3uVpFF4eCejh8tzBnWiyJzT98ij7IWpSXM96G7y99DT ciE2ZAXbR0NiKR8znGaKQHZ84DqYq9W+jFMu+yR3J2Tw3vHL7cIArOTTptocKYpFRhHFLueYaU wAfsn33Jdt3SVm+/vxqttGajVKcSRS074oHm4s3wmnqzpXyQ8wGAQkMp5Atcyn87KYnjsqe2BV vs4= From: Andrew Stubbs To: Subject: [PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc Date: Thu, 7 Jul 2022 11:34:34 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: 
gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.  The allocator is equivalent to using a custom allocator with
the pinned trait and the null fallback trait.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
	(omp_free): Likewise.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
	* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
	* testsuite/libgomp.c/alloc-pinned-5.c: New test.
	* testsuite/libgomp.c/alloc-pinned-6.c: New test.
	* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
---
 libgomp/allocator.c                           |  60 +++++++----
 libgomp/omp.h.in                              |   1 +
 libgomp/omp_lib.f90.in                        |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  |  90 ++++++++++++++++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 101 ++++++++++++++++++
 .../libgomp.fortran/alloc-pinned-1.f90        |  16 +++
 6 files changed, 252 insertions(+), 18 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 54310ab93ca..029d0d40a36 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -35,7 +35,7 @@
 #include
 #endif
 
-#define omp_max_predefined_alloc omp_thread_mem_alloc
+#define omp_max_predefined_alloc ompx_pinned_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
*/ #ifndef MEMSPACE_ALLOC @@ -67,6 +67,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = { omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ + omp_default_mem_space, /* ompx_pinned_mem_alloc. */ }; enum gomp_memkind_kind @@ -512,8 +513,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size, - allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_ALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -534,7 +538,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -653,6 +658,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = predefined_alloc_mapping[data->allocator]; + pinned = (data->allocator == ompx_pinned_mem_alloc); } MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); @@ -802,8 +808,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_CALLOC (memspace, new_size, - allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_CALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -824,7 +833,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? 
omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -1026,11 +1036,15 @@ retry: else #endif if (prev_size) - new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, - data->size, new_size, - (free_allocator_data - && free_allocator_data->pinned), - allocator_data->pinned); + { + int was_pinned __attribute__((unused)) + = (free_allocator_data + ? free_allocator_data->pinned + : free_allocator == ompx_pinned_mem_alloc); + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, + data->size, new_size, was_pinned, + allocator_data->pinned); + } else new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, allocator_data->pinned); @@ -1079,10 +1093,16 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); + int was_pinned __attribute__((unused)) + = (free_allocator_data + ? free_allocator_data->pinned + : free_allocator == ompx_pinned_mem_alloc); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size, - (free_allocator_data - && free_allocator_data->pinned), - allocator_data && allocator_data->pinned); + was_pinned, pinned); } if (new_ptr == NULL) goto fail; @@ -1109,8 +1129,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - new_ptr = MEMSPACE_ALLOC (memspace, new_size, - allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + new_ptr = MEMSPACE_ALLOC (memspace, new_size, pinned); } if (new_ptr == NULL) goto fail; @@ -1156,7 +1179,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? 
omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in index 925a650135e..eb071aa2e00 100644 --- a/libgomp/omp.h.in +++ b/libgomp/omp.h.in @@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM omp_cgroup_mem_alloc = 6, omp_pteam_mem_alloc = 7, omp_thread_mem_alloc = 8, + ompx_pinned_mem_alloc = 9, __omp_allocator_handle_t_max__ = __UINTPTR_MAX__ } omp_allocator_handle_t; diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in index 7ba115f3a1a..10610d64cfe 100644 --- a/libgomp/omp_lib.f90.in +++ b/libgomp/omp_lib.f90.in @@ -158,6 +158,8 @@ parameter :: omp_pteam_mem_alloc = 7 integer (kind=omp_allocator_handle_kind), & parameter :: omp_thread_mem_alloc = 8 + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_pinned_mem_alloc = 9 integer (omp_memspace_handle_kind), & parameter :: omp_default_mem_space = 0 integer (omp_memspace_handle_kind), & diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-5.c b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c new file mode 100644 index 00000000000..315c7161a39 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-5.c @@ -0,0 +1,90 @@ +/* { dg-do run } */ + +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +/* Test that ompx_pinned_mem_alloc works. 
*/
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/resource.h>
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+#define CHECK_SIZE(SIZE) { \
+  struct rlimit limit; \
+  if (getrlimit (RLIMIT_MEMLOCK, &limit) \
+      || limit.rlim_cur <= SIZE) \
+    fprintf (stderr, "insufficient lockable memory; please increase ulimit\n"); \
+  }
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+	{
+	  fclose (proc);
+	  return val;
+	}
+    }
+  abort ();
+}
+#else
+#define PAGE_SIZE 1 /* unknown */
+#define CHECK_SIZE(SIZE) fprintf (stderr, "OS unsupported\n");
+
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+int
+main ()
+{
+  /* Allocate at least a page each time, but stay within the ulimit.  */
+  const int SIZE = PAGE_SIZE;
+  CHECK_SIZE (SIZE*3);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, ompx_pinned_mem_alloc, ompx_pinned_mem_alloc);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
new file mode 100644
index 00000000000..bbe20c04875
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+
+/* Test that ompx_pinned_mem_alloc fails correctly.
*/ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include +#include + +#define PAGE_SIZE sysconf(_SC_PAGESIZE) + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} + +void +set_pin_limit (int size) +{ + struct rlimit limit; + if (getrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); + limit.rlim_cur = (limit.rlim_max < size ? limit.rlim_max : size); + if (setrlimit (RLIMIT_MEMLOCK, &limit)) + abort (); +} +#else +#define PAGE_SIZE 10000*1024 /* unknown */ + +int +get_pinned_mem () +{ + return 0; +} + +void +set_pin_limit () +{ +} +#endif + +#include + +int +main () +{ + /* Allocate at least a page each time, but stay within the ulimit. */ + const int SIZE = PAGE_SIZE*4; + + /* Ensure that the limit is smaller than the allocation. */ + set_pin_limit (SIZE/2); + + // Sanity check + if (get_pinned_mem () != 0) + abort (); + + // Should fail + void *p = omp_alloc (SIZE, ompx_pinned_mem_alloc); + if (p) + abort (); + + // Should fail + p = omp_calloc (1, SIZE, ompx_pinned_mem_alloc); + if (p) + abort (); + + // Should fail to realloc + void *notpinned = omp_alloc (SIZE, omp_default_mem_alloc); + p = omp_realloc (notpinned, SIZE, ompx_pinned_mem_alloc, omp_default_mem_alloc); + if (!notpinned || p) + abort (); + + // No memory should have been pinned + int amount = get_pinned_mem (); + if (amount != 0) + abort (); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 new file mode 100644 index 00000000000..798dc3d5a12 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 @@ -0,0 +1,16 @@ +! Ensure that the ompx_pinned_mem_alloc predefined allocator is present and +! accepted. 
The majority of the functionality testing lives in the C tests. +! +! { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } + +program main + use omp_lib + use ISO_C_Binding + implicit none (external, type) + + type(c_ptr) :: p + + p = omp_alloc (10_c_size_t, ompx_pinned_mem_alloc); + if (.not. c_associated (p)) stop 1 + call omp_free (p, ompx_pinned_mem_alloc); +end program main From patchwork Thu Jul 7 10:34:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55823 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 119C6384B0FB for ; Thu, 7 Jul 2022 10:35:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 157233856090 for ; Thu, 7 Jul 2022 10:35:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 157233856090 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112666" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:35:21 -0800 IronPort-SDR: 965zD6ldZlw6BwBbbE3HZAj35EDnJBqmyVPG1YQBdmoWtChW9TZwKsw20tQex225RnREYCt2Ox qKIp26JqI6pOfzkyGwWD/Cw5vtD80t33XChHNgnQh7vAvCg/wknH0ADWcTvx0EpJ8a9y8sCaLm 3cCdO8BEpuUIsUxHAzgO4Eck87gncPuJpW95HooBTOHosP4IGt0w0UB9KXwdIMxXj3mO/J+sBP qtiEP4oUA7J+3dmyT+2iax2HdqBNfpge2ujRlgtiAOcS4XmvNFRndYizYtBovuAyw5+hUos9UY iCM= From: Andrew Stubbs To: Subject: [PATCH 04/17] openmp, nvptx: low-lat memory access traits Date: Thu, 7 Jul 2022 11:34:35 +0100 Message-ID: 
<2810723bd4e98723e5b9eca476eb7e981590c81a.1657188329.git.ams@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To:
References:
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To
 svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"

The NVPTX low-latency memory is not accessible outside the team that
allocates it, and should therefore be unavailable to allocators with the
access trait "all".  This change means that the omp_low_lat_mem_alloc
predefined allocator now implies the "pteam" trait.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-7.c: New test.
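
For illustration only, and not part of the patch itself: the access rule
described above can be sketched as a small standalone predicate.  The enum
names below are hypothetical stand-ins chosen so the sketch compiles without
libgomp; real code would use omp_low_lat_mem_space and omp_atv_all from
omp.h.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the OpenMP memory-space and access-trait
   values; these names are assumptions made for this sketch only.  */
enum sketch_memspace
{
  SKETCH_DEFAULT_MEM_SPACE,
  SKETCH_LOW_LAT_MEM_SPACE
};

enum sketch_access
{
  SKETCH_ACCESS_ALL,
  SKETCH_ACCESS_CGROUP,
  SKETCH_ACCESS_PTEAM,
  SKETCH_ACCESS_THREAD
};

/* Return true if an allocator with the given access trait may use the
   given memory space.  Low-latency memory is rejected only when the
   allocator requires access from all threads; every other combination
   is accepted.  */
static bool
sketch_memspace_validate (enum sketch_memspace memspace,
			  enum sketch_access access)
{
  return (memspace != SKETCH_LOW_LAT_MEM_SPACE
	  || access != SKETCH_ACCESS_ALL);
}
```

Under this rule, a low-latency allocator created without an explicit access
trait is rejected, assuming (as the new allocators-7.c test expects) that the
default access is "all"; requesting "pteam" explicitly is what makes the
allocation succeed.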
--- libgomp/allocator.c | 15 +++++ libgomp/config/nvptx/allocator.c | 11 ++++ libgomp/testsuite/libgomp.c/allocators-4.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-6.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-7.c | 68 ++++++++++++++++++++++ 5 files changed, 102 insertions(+), 6 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 029d0d40a36..48ab0782e6b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -54,6 +54,9 @@ #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ (PIN ? NULL : free (ADDR)) #endif +#ifndef MEMSPACE_VALIDATE +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1 +#endif /* Map the predefined allocators to the correct memory space. The index to this table is the omp_allocator_handle_t enum value. */ @@ -438,6 +441,10 @@ retry: if (__builtin_add_overflow (size, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) + goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -733,6 +740,10 @@ retry: if (__builtin_add_overflow (size_temp, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) + goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -964,6 +975,10 @@ retry: goto fail; old_size = data->size; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) + goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index f740b97f6ac..0102680b717 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return realloc (addr, 
size); } +static inline int +nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access) +{ + /* Disallow use of low-latency memory when it must be accessible by + all threads. */ + return (memspace != omp_low_lat_mem_space + || access != omp_atv_all); +} + #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ nvptx_memspace_alloc (MEMSPACE, SIZE) #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ @@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE) #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ nvptx_memspace_free (MEMSPACE, ADDR, SIZE) +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \ + nvptx_memspace_validate (MEMSPACE, ACCESS) #include "../../allocator.c" diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c index 9fa6aa1624f..cae27ea33c1 100644 --- a/libgomp/testsuite/libgomp.c/allocators-4.c +++ b/libgomp/testsuite/libgomp.c/allocators-4.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. */ - omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; + omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_pteam } }; omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, - 1, traits); + 2, traits); int size = 4; diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c index 90bf73095ef..c03233df582 100644 --- a/libgomp/testsuite/libgomp.c/allocators-6.c +++ b/libgomp/testsuite/libgomp.c/allocators-6.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. 
*/ - omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; + omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_pteam } }; omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, - 1, traits); + 2, traits); int size = 16; diff --git a/libgomp/testsuite/libgomp.c/allocators-7.c b/libgomp/testsuite/libgomp.c/allocators-7.c new file mode 100644 index 00000000000..a0a738b1d1d --- /dev/null +++ b/libgomp/testsuite/libgomp.c/allocators-7.c @@ -0,0 +1,68 @@ +/* { dg-do run } */ + +/* { dg-require-effective-target offload_device } */ +/* { dg-xfail-if "not implemented" { ! offload_target_nvptx } } */ + +/* Test that GPU low-latency allocation is limited to team access. */ + +#include +#include + +#pragma omp requires dynamic_allocators + +int +main () +{ + #pragma omp target + { + /* Ensure that the memory we get *is* low-latency with a null-fallback. */ + omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_pteam } }; + omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, + 2, traits); + + omp_alloctrait_t traits_all[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_all } }; + omp_allocator_handle_t lowlat_all + = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all); + + omp_alloctrait_t traits_default[1] + = { { omp_atk_fallback, omp_atv_null_fb } }; + omp_allocator_handle_t lowlat_default + = omp_init_allocator (omp_low_lat_mem_space, 1, traits_default); + + void *a = omp_alloc(1, lowlat); // good + void *b = omp_alloc(1, lowlat_all); // bad + void *c = omp_alloc(1, lowlat_default); // bad + + if (!a || b || c) + __builtin_abort (); + + omp_free (a, lowlat); + + + a = omp_calloc(1, 1, lowlat); // good + b = omp_calloc(1, 1, lowlat_all); // bad + c = omp_calloc(1, 1, lowlat_default); // bad + + if (!a || b || c) + __builtin_abort (); + + omp_free (a, lowlat); + + + a = omp_realloc(NULL, 
1, lowlat, lowlat); // good + b = omp_realloc(NULL, 1, lowlat_all, lowlat_all); // bad + c = omp_realloc(NULL, 1, lowlat_default, lowlat_default); // bad + + if (!a || b || c) + __builtin_abort (); + + omp_free (a, lowlat); + } + +return 0; +} + From patchwork Thu Jul 7 10:34:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55827 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0A0FD384D140 for ; Thu, 7 Jul 2022 10:37:09 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 98273385041B for ; Thu, 7 Jul 2022 10:36:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 98273385041B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448537" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:09 -0800 IronPort-SDR: T/40XxOBYa2Z3segwK1sItDTInTzFWbheLHNam0GuLuleXVHluq2EsBB2W2kyDqEbyXl+50J0l yYtF3oi2FG6zMqY+Y1VsmXxBWbHr+msNoq6wThnHPDyNsOQ1OPSQEnAC5Q7LDjd4VFpzdgi8oC YpoPVU7pxAVffjewua55OqtGYA8470Moe6E0xDfKZd9y1CBmSBcoqNAbkBpYb11z7zp15j3wJ0 FHLsj5rrkIXd+EawypN//DZFvWT5S/uX00yKD83FLQ8jw4XpqrqVA7tv3c6OCGg0xA92bbZ3dh 9I8= From: Andrew Stubbs To: Subject: [PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc Date: Thu, 7 Jul 2022 11:34:36 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com 
(139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11)
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"

This adds support for using CUDA Managed Memory with omp_alloc.  It will
serve as the underpinning for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).

The nvptx plugin is modified to make the necessary CUDA calls, and libgomp
is modified to switch to shared-memory mode for USM-allocated mappings.

include/ChangeLog:

	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	(CUmemAttach_flags): New.
	(CUpointer_attribute): New.
	(cuMemAllocManaged): New prototype.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	* allocator.c (omp_max_predefined_alloc): Update.
	(omp_aligned_alloc): Don't fall back for ompx_host_mem_alloc.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
	(linux_memspace_calloc): Handle USM.
	(linux_memspace_free): Handle USM.
	(linux_memspace_realloc): Handle USM.
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
	ompx_host_mem_alloc.
(nvptx_memspace_calloc): Likewise. (nvptx_memspace_realloc): Likewise. * libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype. (GOMP_OFFLOAD_usm_free): New prototype. (GOMP_OFFLOAD_is_usm_ptr): New prototype. * libgomp.h (gomp_usm_alloc): New prototype. (gomp_usm_free): New prototype. (gomp_is_usm_ptr): New prototype. (struct gomp_device_descr): Add USM functions. * omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space and ompx_host_mem_space. (omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and ompx_host_mem_alloc. * omp_lib.f90.in: Likewise. * plugin/cuda-lib.def (cuMemAllocManaged): Add new call. (cuPointerGetAttribute): Likewise. * plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter. Call cuMemAllocManaged as appropriate. (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (GOMP_OFFLOAD_alloc): Move internals to ... (GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter. (GOMP_OFFLOAD_usm_alloc): New function. (GOMP_OFFLOAD_usm_free): New function. (GOMP_OFFLOAD_is_usm_ptr): New function. * target.c (gomp_map_vars_internal): Add USM support. (gomp_usm_alloc): New function. (gomp_usm_free): New function. (gomp_load_plugin_for_device): New function. * testsuite/libgomp.c/usm-1.c: New test. * testsuite/libgomp.c/usm-2.c: New test. * testsuite/libgomp.c/usm-3.c: New test. * testsuite/libgomp.c/usm-4.c: New test. * testsuite/libgomp.c/usm-5.c: New test. co-authored-by: Kwok Cheung Yeung squash! 
openmp, nvptx: ompx_unified_shared_mem_alloc --- include/cuda/cuda.h | 12 ++++++ libgomp/allocator.c | 13 ++++-- libgomp/config/linux/allocator.c | 48 ++++++++++++++-------- libgomp/config/nvptx/allocator.c | 6 +++ libgomp/libgomp-plugin.h | 3 ++ libgomp/libgomp.h | 6 +++ libgomp/omp.h.in | 4 ++ libgomp/omp_lib.f90.in | 8 ++++ libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-nvptx.c | 47 ++++++++++++++++++--- libgomp/target.c | 64 +++++++++++++++++++++++++++++ libgomp/testsuite/libgomp.c/usm-1.c | 24 +++++++++++ libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++++ libgomp/testsuite/libgomp.c/usm-3.c | 35 ++++++++++++++++ libgomp/testsuite/libgomp.c/usm-4.c | 36 ++++++++++++++++ libgomp/testsuite/libgomp.c/usm-5.c | 28 +++++++++++++ 16 files changed, 340 insertions(+), 28 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 3938d05d150..8135e7c9247 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -77,9 +77,19 @@ typedef enum { CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39, CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82 } CUdevice_attribute; +typedef enum { + CU_MEM_ATTACH_GLOBAL = 0x1 +} CUmemAttach_flags; + +typedef enum { + CU_POINTER_ATTRIBUTE_IS_MANAGED = 8 +} CUpointer_attribute; + enum { CU_EVENT_DEFAULT = 0, CU_EVENT_DISABLE_TIMING = 2 @@ -169,6 +179,7 @@ CUresult cuMemGetInfo (size_t *, size_t *); CUresult cuMemAlloc (CUdeviceptr *, size_t); #define cuMemAllocHost cuMemAllocHost_v2 CUresult cuMemAllocHost (void **, size_t); 
+CUresult cuMemAllocManaged(CUdeviceptr *, size_t, unsigned int); CUresult cuMemcpy (CUdeviceptr, CUdeviceptr, size_t); #define cuMemcpyDtoDAsync cuMemcpyDtoDAsync_v2 CUresult cuMemcpyDtoDAsync (CUdeviceptr, CUdeviceptr, size_t, CUstream); @@ -195,6 +206,7 @@ CUresult cuModuleLoadData (CUmodule *, const void *); CUresult cuModuleUnload (CUmodule); CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction, CUoccupancyB2DSize, size_t, int); +CUresult cuPointerGetAttribute(void *, CUpointer_attribute, CUdeviceptr); typedef void (*CUstreamCallback)(CUstream, CUresult, void *); CUresult cuStreamAddCallback(CUstream, CUstreamCallback, void *, unsigned int); CUresult cuStreamCreate (CUstream *, unsigned); diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 48ab0782e6b..ec31f8841a3 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -35,7 +35,7 @@ #include #endif -#define omp_max_predefined_alloc ompx_pinned_mem_alloc +#define omp_max_predefined_alloc ompx_host_mem_alloc /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC @@ -71,6 +71,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = { omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ omp_default_mem_space, /* ompx_pinned_mem_alloc. */ + ompx_unified_shared_mem_space, /* ompx_unified_shared_mem_alloc. */ + ompx_host_mem_space, /* ompx_host_mem_alloc. */ }; enum gomp_memkind_kind @@ -546,7 +548,8 @@ fail: int fallback = (allocator_data ? allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -845,7 +848,8 @@ fail: int fallback = (allocator_data ? 
allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -1195,7 +1199,8 @@ fail: int fallback = (allocator_data ? allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c index 1496e41875c..18235f59775 100644 --- a/libgomp/config/linux/allocator.c +++ b/libgomp/config/linux/allocator.c @@ -53,9 +53,11 @@ static void * linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) { - (void)memspace; - - if (pin) + if (memspace == ompx_unified_shared_mem_space) + { + return gomp_usm_alloc (size); + } + else if (pin) { void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); @@ -78,7 +80,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) { - if (pin) + if (memspace == ompx_unified_shared_mem_space) + { + void *ret = gomp_usm_alloc (size); + memset (ret, 0, size); + return ret; + } + else if (memspace == ompx_unified_shared_mem_space + || pin) return linux_memspace_alloc (memspace, size, pin); else return calloc (1, size); @@ -88,9 +97,9 @@ static void linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size, int pin) { - (void)memspace; - - if (pin) + if (memspace == ompx_unified_shared_mem_space) + gomp_usm_free (addr); + else if (pin) munmap (addr, size); else free (addr); @@ -100,7 +109,9 @@ static void * linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, size_t oldsize, size_t size, int oldpin, int pin) { 
- if (oldpin && pin) + if (memspace == ompx_unified_shared_mem_space) + goto manual_realloc; + else if (oldpin && pin) { void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE); if (newaddr == MAP_FAILED) @@ -109,18 +120,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return newaddr; } else if (oldpin || pin) - { - void *newaddr = linux_memspace_alloc (memspace, size, pin); - if (newaddr) - { - memcpy (newaddr, addr, oldsize < size ? oldsize : size); - linux_memspace_free (memspace, addr, oldsize, oldpin); - } - - return newaddr; - } + goto manual_realloc; else return realloc (addr, size); + +manual_realloc: + void *newaddr = linux_memspace_alloc (memspace, size, pin); + if (newaddr) + { + memcpy (newaddr, addr, oldsize < size ? oldsize : size); + linux_memspace_free (memspace, addr, oldsize, oldpin); + } + + return newaddr; } #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index 0102680b717..c1a73511623 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size) __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return malloc (size); } @@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size) return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return calloc (1, size); } @@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, } return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return realloc (addr, size); } diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index ab3ed638475..3e609bd3894 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -134,6 +134,9 @@ extern int GOMP_OFFLOAD_load_image 
(int, unsigned, const void *, extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *); extern void *GOMP_OFFLOAD_alloc (int, size_t); extern bool GOMP_OFFLOAD_free (int, void *); +extern void *GOMP_OFFLOAD_usm_alloc (int, size_t); +extern bool GOMP_OFFLOAD_usm_free (int, void *); +extern bool GOMP_OFFLOAD_is_usm_ptr (void *); extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t); extern bool GOMP_OFFLOAD_host2dev (int, void *, const void *, size_t); extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t); diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index c243c4d6cf4..3fdce301372 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1014,6 +1014,9 @@ extern int gomp_pause_host (void); extern void gomp_init_targets_once (void); extern int gomp_get_num_devices (void); extern bool gomp_target_task_fn (void *); +extern void * gomp_usm_alloc (size_t size); +extern void gomp_usm_free (void *device_ptr); +extern bool gomp_is_usm_ptr (void *ptr); /* Splay tree definitions. 
*/ typedef struct splay_tree_node_s *splay_tree_node; @@ -1239,6 +1242,9 @@ struct gomp_device_descr __typeof (GOMP_OFFLOAD_unload_image) *unload_image_func; __typeof (GOMP_OFFLOAD_alloc) *alloc_func; __typeof (GOMP_OFFLOAD_free) *free_func; + __typeof (GOMP_OFFLOAD_usm_alloc) *usm_alloc_func; + __typeof (GOMP_OFFLOAD_usm_free) *usm_free_func; + __typeof (GOMP_OFFLOAD_is_usm_ptr) *is_usm_ptr_func; __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func; __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func; __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func; diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in index eb071aa2e00..eea019ad88d 100644 --- a/libgomp/omp.h.in +++ b/libgomp/omp.h.in @@ -120,6 +120,8 @@ typedef enum omp_memspace_handle_t __GOMP_UINTPTR_T_ENUM omp_const_mem_space = 2, omp_high_bw_mem_space = 3, omp_low_lat_mem_space = 4, + ompx_unified_shared_mem_space = 5, + ompx_host_mem_space = 6, __omp_memspace_handle_t_max__ = __UINTPTR_MAX__ } omp_memspace_handle_t; @@ -135,6 +137,8 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM omp_pteam_mem_alloc = 7, omp_thread_mem_alloc = 8, ompx_pinned_mem_alloc = 9, + ompx_unified_shared_mem_alloc = 10, + ompx_host_mem_alloc = 11, __omp_allocator_handle_t_max__ = __UINTPTR_MAX__ } omp_allocator_handle_t; diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in index 10610d64cfe..39a58b4bc4d 100644 --- a/libgomp/omp_lib.f90.in +++ b/libgomp/omp_lib.f90.in @@ -160,6 +160,10 @@ parameter :: omp_thread_mem_alloc = 8 integer (kind=omp_allocator_handle_kind), & parameter :: ompx_pinned_mem_alloc = 9 + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_unified_shared_mem_alloc = 10 + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_host_mem_alloc = 11 integer (omp_memspace_handle_kind), & parameter :: omp_default_mem_space = 0 integer (omp_memspace_handle_kind), & @@ -170,6 +174,10 @@ parameter :: omp_high_bw_mem_space = 3 integer (omp_memspace_handle_kind), & parameter :: omp_low_lat_mem_space = 
4 + integer (omp_memspace_handle_kind), & + parameter :: ompx_unified_shared_mem_space = 5 + integer (omp_memspace_handle_kind), & + parameter :: ompx_host_mem_space = 6 integer, parameter :: omp_initial_device = -1 integer, parameter :: omp_invalid_device = -4 diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def index cd91b39b1d2..b6d03290f35 100644 --- a/libgomp/plugin/cuda-lib.def +++ b/libgomp/plugin/cuda-lib.def @@ -29,6 +29,7 @@ CUDA_ONE_CALL_MAYBE_NULL (cuLinkCreate_v2) CUDA_ONE_CALL (cuLinkDestroy) CUDA_ONE_CALL (cuMemAlloc) CUDA_ONE_CALL (cuMemAllocHost) +CUDA_ONE_CALL (cuMemAllocManaged) CUDA_ONE_CALL (cuMemcpy) CUDA_ONE_CALL (cuMemcpyDtoDAsync) CUDA_ONE_CALL (cuMemcpyDtoH) @@ -46,6 +47,7 @@ CUDA_ONE_CALL (cuModuleLoad) CUDA_ONE_CALL (cuModuleLoadData) CUDA_ONE_CALL (cuModuleUnload) CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize) +CUDA_ONE_CALL (cuPointerGetAttribute) CUDA_ONE_CALL (cuStreamAddCallback) CUDA_ONE_CALL (cuStreamCreate) CUDA_ONE_CALL (cuStreamDestroy) diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 40739ba592d..2800c0dce6d 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1046,11 +1046,13 @@ nvptx_stacks_free (struct ptx_device *ptx_dev, bool force) } static void * -nvptx_alloc (size_t s, bool suppress_errors) +nvptx_alloc (size_t s, bool suppress_errors, bool usm) { CUdeviceptr d; - CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &d, s); + CUresult r = (usm ? CUDA_CALL_NOCHECK (cuMemAllocManaged, &d, s, + CU_MEM_ATTACH_GLOBAL) + : CUDA_CALL_NOCHECK (cuMemAlloc, &d, s)); if (suppress_errors && r == CUDA_ERROR_OUT_OF_MEMORY) return NULL; else if (r != CUDA_SUCCESS) @@ -1185,6 +1187,8 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) int num_devices = nvptx_get_num_devices (); /* Return -1 if no omp_requires_mask cannot be fulfilled but devices were present. 
*/ + omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY); if (num_devices > 0 && omp_requires_mask != 0) return -1; return num_devices; @@ -1432,8 +1436,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data) return ret; } -void * -GOMP_OFFLOAD_alloc (int ord, size_t size) +static void * +GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool usm) { if (!nvptx_attach_host_thread_to_device (ord)) return NULL; @@ -1456,7 +1460,7 @@ GOMP_OFFLOAD_alloc (int ord, size_t size) blocks = tmp; } - void *d = nvptx_alloc (size, true); + void *d = nvptx_alloc (size, true, usm); if (d) return d; else @@ -1464,10 +1468,22 @@ GOMP_OFFLOAD_alloc (int ord, size_t size) /* Memory allocation failed. Try freeing the stacks block, and retrying. */ nvptx_stacks_free (ptx_dev, true); - return nvptx_alloc (size, false); + return nvptx_alloc (size, false, usm); } } +void * +GOMP_OFFLOAD_alloc (int ord, size_t size) +{ + return GOMP_OFFLOAD_alloc_1 (ord, size, false); +} + +void * +GOMP_OFFLOAD_usm_alloc (int ord, size_t size) +{ + return GOMP_OFFLOAD_alloc_1 (ord, size, true); +} + bool GOMP_OFFLOAD_free (int ord, void *ptr) { @@ -1475,6 +1491,25 @@ GOMP_OFFLOAD_free (int ord, void *ptr) && nvptx_free (ptr, ptx_devices[ord])); } +bool +GOMP_OFFLOAD_usm_free (int ord, void *ptr) +{ + return GOMP_OFFLOAD_free (ord, ptr); +} + +bool +GOMP_OFFLOAD_is_usm_ptr (void *ptr) +{ + bool managed = false; + /* This returns 3 outcomes ... + CUDA_ERROR_INVALID_VALUE - Not a Cuda allocated pointer. + CUDA_SUCCESS, managed:false - Cuda allocated, but not USM. + CUDA_SUCCESS, managed:true - USM. 
*/ + CUDA_CALL_NOCHECK (cuPointerGetAttribute, &managed, + CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr)ptr); + return managed; +} + void GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum, void **hostaddrs, void **devaddrs, diff --git a/libgomp/target.c b/libgomp/target.c index 4dac81862d7..4e203ae3c06 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1049,6 +1049,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, tgt->list[i].offset = 0; continue; } + else if (devicep->is_usm_ptr_func + && devicep->is_usm_ptr_func (hostaddrs[i])) + { + /* The memory is visible from both host and target + so nothing needs to be moved. */ + tgt->list[i].key = NULL; + tgt->list[i].offset = OFFSET_INLINED; + continue; + } else if ((kind & typemask) == GOMP_MAP_STRUCT) { size_t first = i + 1; @@ -1524,6 +1533,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, continue; } default: + if (tgt->list[i].offset == OFFSET_INLINED) + continue; break; } splay_tree_key k = &array->key; @@ -3401,6 +3412,56 @@ omp_target_free (void *device_ptr, int device_num) gomp_mutex_unlock (&devicep->lock); } +void * +gomp_usm_alloc (size_t size) +{ + struct gomp_task_icv *icv = gomp_icv (false); + struct gomp_device_descr *devicep = resolve_device (icv->default_device_var, + false); + if (devicep == NULL) + return NULL; + + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + return malloc (size); + + void *ret = NULL; + gomp_mutex_lock (&devicep->lock); + if (devicep->usm_alloc_func) + ret = devicep->usm_alloc_func (devicep->target_id, size); + gomp_mutex_unlock (&devicep->lock); + return ret; +} + +void +gomp_usm_free (void *device_ptr) +{ + if (device_ptr == NULL) + return; + + struct gomp_task_icv *icv = gomp_icv (false); + struct gomp_device_descr *devicep = resolve_device (icv->default_device_var, + false); + if (devicep == NULL) + return; + + if (!(devicep->capabilities & 
GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + { + free (device_ptr); + return; + } + + gomp_mutex_lock (&devicep->lock); + if (devicep->usm_free_func + && !devicep->usm_free_func (devicep->target_id, device_ptr)) + { + gomp_mutex_unlock (&devicep->lock); + gomp_fatal ("error in freeing device memory block at %p", device_ptr); + } + gomp_mutex_unlock (&devicep->lock); +} + int omp_target_is_present (const void *ptr, int device_num) { @@ -4041,6 +4102,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device, DLSYM (unload_image); DLSYM (alloc); DLSYM (free); + DLSYM_OPT (usm_alloc, usm_alloc); + DLSYM_OPT (usm_free, usm_free); + DLSYM_OPT (is_usm_ptr, is_usm_ptr); DLSYM (dev2host); DLSYM (host2dev); device->capabilities = device->get_caps_func (); diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c new file mode 100644 index 00000000000..1b35f19c45b --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-1.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ + +#include <omp.h> +#include <stdint.h> + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int), ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + *a = 42; + uintptr_t a_p = (uintptr_t)a; + + #pragma omp target is_device_ptr(a) + { + if (*a != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c new file mode 100644 index 00000000000..689cee7e456 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-2.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ + +#include <omp.h> +#include <stdint.h> + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + + #pragma omp target map(a[0]) + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + #pragma omp target map(a[1]) + { + if 
(a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c new file mode 100644 index 00000000000..2ca66afe93f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-3.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ + +#include <omp.h> +#include <stdint.h> + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target data map(a[0:2]) + { +#pragma omp target + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c new file mode 100644 index 00000000000..753908c8440 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-4.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ + +#include <omp.h> +#include <stdint.h> + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target enter data map(to:a[0:2]) + +#pragma omp target + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target exit data map(delete:a[0:2]) + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c new file mode 100644 index 00000000000..4d8b3cf71b1 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-5.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-require-effective-target offload_device } */ + +#include <omp.h> +#include <stdint.h> + +#pragma omp requires 
unified_shared_memory + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int), ompx_host_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target map(a[0:1]) + { + if (a[0] != 42 || a_p == (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_host_mem_alloc); + return 0; +} From patchwork Thu Jul 7 10:34:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55826 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AC57B384A898 for ; Thu, 7 Jul 2022 10:37:06 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 61883386189C for ; Thu, 7 Jul 2022 10:36:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 61883386189C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448557" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:10 -0800 IronPort-SDR: PNFfJiC1/82ycrwTl/4J6pik4cefm9fQVMsCkGf0sF97f3+i1+gvO+ZM75wpbNwcOlW/xjmwEq kcqkCg4rttg3jibkgdDety4Yum4nGedZvFnrBFZUe4Gy/mBvor/kaY2oY333FGDDyI/i+xAbqk zTBbflh1TNmDJIKvbyScarfV9Nwr5LOMfBQrFXCGoBk+8XIMZzqCb7q+o2wvZAe++sqaGYfoje 6BmvsKtzB3VR0bPHp9oQlo6MaWISvOcz/+nS8tY/uSGUpG3Q2vQKr3PAaIbL+waw9spw0rFRz7 S2A= From: Andrew Stubbs To: Subject: [PATCH 06/17] openmp: Add -foffload-memory Date: Thu, 7 Jul 2022 11:34:37 +0100 Message-ID: <07f0cd465555c2bac53cbb2248baab02002cc4ef.1657188329.git.ams@codesourcery.com> 
X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/invoke.texi | 16 +++++++++++++++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index e7a51e882ba..8d76980fbbb 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2213,6 +2213,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32) EnumValue Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) +foffload-memory= +Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE) +-foffload-memory=[none|unified|pinned] Use an offload memory optimization. 
+ +Enum +Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs) + +EnumValue +Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE) + +EnumValue +Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED) + +EnumValue +Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED) + fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 08b9ac9094c..dd52d5bb113 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -206,6 +206,13 @@ enum offload_abi { OFFLOAD_ABI_ILP32 }; +/* Types of memory optimization for an offload device. */ +enum offload_memory { + OFFLOAD_MEMORY_NONE, + OFFLOAD_MEMORY_UNIFIED, + OFFLOAD_MEMORY_PINNED +}; + /* Types of profile update methods. */ enum profile_update { PROFILE_UPDATE_SINGLE, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d5ff1018372..3df39bb06e3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -202,7 +202,7 @@ in the following sections. -fno-builtin -fno-builtin-@var{function} -fcond-mismatch @gol -ffreestanding -fgimple -fgnu-tm -fgnu89-inline -fhosted @gol -flax-vector-conversions -fms-extensions @gol --foffload=@var{arg} -foffload-options=@var{arg} @gol +-foffload=@var{arg} -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol -fopenacc -fopenacc-dim=@var{geom} @gol -fopenmp -fopenmp-simd @gol -fpermitted-flt-eval-methods=@var{standard} @gol @@ -2708,6 +2708,20 @@ Typical command lines are -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm @end smallexample +@item -foffload-memory=none +@itemx -foffload-memory=unified +@itemx -foffload-memory=pinned +@opindex foffload-memory +@cindex OpenMP offloading memory modes +Enable a memory optimization mode to use with OpenMP. The default behavior, +@option{-foffload-memory=none}, is to do nothing special (unless enabled via +a requires directive in the code). 
@option{-foffload-memory=unified} is +equivalent to @code{#pragma omp requires unified_shared_memory}. +@option{-foffload-memory=pinned} forces all host memory to be pinned (this +mode may require the user to increase the ulimit setting for locked memory). +All translation units must select the same setting to avoid undefined +behavior. + @item -fopenacc @opindex fopenacc @cindex OpenACC accelerator programming From patchwork Thu Jul 7 10:34:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55829 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A5EEE383B79A for ; Thu, 7 Jul 2022 10:37:39 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 2FFFA385021A for ; Thu, 7 Jul 2022 10:36:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FFFA385021A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448603" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:14 -0800 IronPort-SDR: ZH7p72jl7djkUOuPVqepPxnZ3a3duvKkalXit5sBEnGTh+XVitxwMI/ta8dCFLq9AZ0lAIgdN+ RhjMfpIACSqsm/1R8dzMDIefVZ9b71Dia+q20yhJp2GtcfpzY4mdJ661LJGul9YT5KuYaXKfOp 81G680Yrv+6VB9eLF0DmD5pT2+99jjSZLEnF+iILXouuYiP/V+o3JMNcbA7BDnJO96Ne3EUseh YycT+e5i6WvLH+E1TmiKVaU1r8L/L+VdnAjgGkAOM9AbGqd8t3NbX1AnqZ5VRzRTmxhrwdLlpX g+4= From: Andrew Stubbs To: Subject: [PATCH 07/17] openmp: allow requires unified_shared_memory Date: Thu, 7 Jul 2022 11:34:38 +0100 
Message-ID: <902d406d18cad52d683e4e1cb2dbb19ea1afd81a.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-1.c: New test. * c-c++-common/gomp/usm-4.c: New test. * gfortran.dg/gomp/usm-1.f90: New test. * gfortran.dg/gomp/usm-4.f90: New test. 
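For reviewers, the compatibility check that each front end gains can be sketched in isolation as below. This is only an illustration: the enum mirrors the one added to gcc/coretypes.h in patch 06/17, but apply_requires_usm is a hypothetical helper name that does not appear in the patch, and the real parsers report the conflict through error_at or gfc_error_now rather than a return value.

```c
#include <stdbool.h>

/* Mirrors the enum added to gcc/coretypes.h by patch 06/17.  */
enum offload_memory {
  OFFLOAD_MEMORY_NONE,
  OFFLOAD_MEMORY_UNIFIED,
  OFFLOAD_MEMORY_PINNED
};

/* Hypothetical helper, not part of the patch.  It models what happens
   when "requires unified_address" or "requires unified_shared_memory"
   is parsed: the selected mode must be "none" or "unified", and the
   flag is set to "unified" afterwards in either case.  Returns false
   where the front ends would emit the "incompatible" error.  */
static bool
apply_requires_usm (enum offload_memory *flag)
{
  bool compatible = (*flag == OFFLOAD_MEMORY_UNIFIED
                     || *flag == OFFLOAD_MEMORY_NONE);
  *flag = OFFLOAD_MEMORY_UNIFIED;
  return compatible;
}
```

With -foffload-memory=pinned the helper returns false, which corresponds to the dg-error expected by the new usm-1 and usm-4 tests.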
--- gcc/c/c-parser.cc | 22 ++++++++++++++++++++-- gcc/cp/parser.cc | 22 ++++++++++++++++++++-- gcc/fortran/openmp.cc | 13 +++++++++++++ gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 ++++ gcc/testsuite/c-c++-common/gomp/usm-4.c | 4 ++++ gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 ++++++ gcc/testsuite/gfortran.dg/gomp/usm-4.f90 | 6 ++++++ 7 files changed, 73 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90 diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 9c02141e2c6..c30f67cd2da 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -22726,9 +22726,27 @@ c_parser_omp_requires (c_parser *parser) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "%<unified_address%> is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "%<unified_shared_memory%> is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index df657a3fb2b..3deafc7c928 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -46860,9 +46860,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token 
*pragma_tok) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "%<unified_address%> is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "%<unified_shared_memory%> is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index bd4ff259fe0..91bf8a3c50d 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic.h" #include "gomp-constants.h" #include "target-memory.h" /* For gfc_encode_character. */ +#include "options.h" /* Match an end of OpenMP directive. End of OpenMP directive is optional whitespace, followed by '\n' or comment '!'. 
*/ @@ -5556,6 +5557,12 @@ gfc_match_omp_requires (void) requires_clause = OMP_REQ_UNIFIED_ADDRESS; if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS) goto duplicate_clause; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + gfc_error_now ("unified_address at %C is incompatible with " + "the selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; } else if (gfc_match (clauses[2]) == MATCH_YES) { @@ -5563,6 +5570,12 @@ gfc_match_omp_requires (void) requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY; if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY) goto duplicate_clause; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + gfc_error_now ("unified_shared_memory at %C is incompatible with " + "the selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; } else if (gfc_match (clauses[3]) == MATCH_YES) { diff --git a/gcc/testsuite/c-c++-common/gomp/usm-1.c b/gcc/testsuite/c-c++-common/gomp/usm-1.c new file mode 100644 index 00000000000..8d2ba62aba3 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-1.c @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ + +#pragma omp requires unified_shared_memory /* { dg-error ".unified_shared_memory. is incompatible with the selected .-foffload-memory. option" } */ diff --git a/gcc/testsuite/c-c++-common/gomp/usm-4.c b/gcc/testsuite/c-c++-common/gomp/usm-4.c new file mode 100644 index 00000000000..84f6f785079 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-4.c @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ + +#pragma omp requires unified_address /* { dg-error ".unified_address. is incompatible with the selected .-foffload-memory. 
option" } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 new file mode 100644 index 00000000000..340f6bb50a5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 @@ -0,0 +1,6 @@ +! { dg-do compile } +! { dg-additional-options "-foffload-memory=pinned" } + +!$omp requires unified_shared_memory ! { dg-error "unified_shared_memory at .* is incompatible with the selected -foffload-memory option" } + +end diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 new file mode 100644 index 00000000000..725b07f2f88 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-4.f90 @@ -0,0 +1,6 @@ +! { dg-do compile } +! { dg-additional-options "-foffload-memory=pinned" } + +!$omp requires unified_address ! { dg-error "unified_address at .* is incompatible with the selected -foffload-memory option" } + +end From patchwork Thu Jul 7 10:34:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55832 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C1254386CE75 for ; Thu, 7 Jul 2022 10:38:09 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 6E3593864877 for ; Thu, 7 Jul 2022 10:36:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6E3593864877 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448636" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com 
with ESMTP; 07 Jul 2022 02:36:18 -0800 IronPort-SDR: 7lxpei2NkP0mIbJnHx3Y+JJ9mpGDshFuFVLXe0aIkRT8Nqn6oNr21LInSOFFCRlU1oA7v80DxB NZk3KEQ6IXu4aV0Ht1zEj7MI+0P0XY3zUZkDnaTKNXlZc499rVAjQTmnnFlDjc6DpgjpgjthYS Sp1CSr72Dru/HZsXYryNb4jwDy+xexYa93HcN9kTqkPSOtawXE5kGuBroR42GPel8R8EEwaNrh Sj3GMdB06ycgGcAVzZ7qfncrapskn980mdn1x5xVeAXdzkqmZCs3erET959xHYLmpYCOsJDsq5 lHg= From: Andrew Stubbs To: Subject: [PATCH 08/17] openmp: -foffload-memory=pinned Date: Thu, 7 Jul 2022 11:34:39 +0100 Message-ID: <8011a994bb38db60f37127880b0fc682564f6e8d.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlockall to enable always-on memory pinning. It requires that the ulimit feature is set high enough to accommodate all the program's memory usage. In this mode the ompx_pinned_memory_alloc feature is disabled as it is not needed and may conflict. 
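For illustration, the start-up behaviour described above can be sketched in plain C. This is a minimal sketch under stated assumptions, not the libgomp code: the helper names try_enable_pinned_mode and memlock_limit_is_unlimited are hypothetical, and only mlockall, getrlimit and RLIMIT_MEMLOCK are real POSIX/Linux interfaces. It shows why a low locked-memory limit (what "ulimit -l" reports) can make the request fail.

```c
/* Sketch only: the function names below are hypothetical and are not
   part of libgomp.  */
#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Report whether the soft RLIMIT_MEMLOCK limit is unlimited.  A finite
   limit may be too small to lock the whole address space.  */
static bool
memlock_limit_is_unlimited (void)
{
  struct rlimit rl;
  if (getrlimit (RLIMIT_MEMLOCK, &rl) != 0)
    return false;
  return rl.rlim_cur == RLIM_INFINITY;
}

/* Ask the kernel to lock all current and future pages of the process.
   Returns true when the request succeeded; on failure the process simply
   carries on with ordinary pageable memory.  */
static bool
try_enable_pinned_mode (void)
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    {
      perror ("mlockall");
      return false;
    }
  return true;
}
```

A caller that gets false back from try_enable_pinned_mode keeps running unpinned; raising the limit (for example with "ulimit -l unlimited") before launching the program is one way to avoid that failure.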
gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/alloc-pinned-1.c: New test. --- gcc/omp-builtins.def | 3 + gcc/omp-low.cc | 66 +++++++++++++++++++ .../c-c++-common/gomp/alloc-pinned-1.c | 28 ++++++++ libgomp/config/linux/allocator.c | 26 ++++++++ libgomp/libgomp.map | 1 + libgomp/target.c | 4 +- libgomp/testsuite/libgomp.c/alloc-pinned-7.c | 63 ++++++++++++++++++ 7 files changed, 190 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index ee5213eedcf..276dd7812f2 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error", BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE, + "GOMP_enable_pinned_mode", + BT_FN_VOID, ATTR_NOTHROW_LIST) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d73c165f029..ba612e5c67d 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. 
Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned_mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () +{ + static bool visited = false; + if (visited) + return; + visited = true; + + /* Create a new function like this: + + static void __attribute__((constructor)) + __set_pinned_mode () + { + GOMP_enable_pinned_mode (); + } + */ + + tree name = get_identifier ("__set_pinned_mode"); + tree voidfntype = build_function_type_list (void_type_node, NULL_TREE); + tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype); + + TREE_STATIC (decl) = 1; + TREE_USED (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 0; + TREE_PUBLIC (decl) = 0; + DECL_UNINLINABLE (decl) = 1; + DECL_EXTERNAL (decl) = 0; + DECL_CONTEXT (decl) = NULL_TREE; + DECL_INITIAL (decl) = make_node (BLOCK); + BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl; + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"), + NULL_TREE, NULL_TREE); + + tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, + void_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_CONTEXT (t) = decl; + DECL_RESULT (decl) = t; + + push_struct_function (decl); + init_tree_ssa (cfun); + + tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE); + gcall *call = gimple_build_call (calldecl, 0); + + gimple_seq seq = NULL; + gimple_seq_add_stmt (&seq, call); + gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL)); + + cfun->function_end_locus = UNKNOWN_LOCATION; + cfun->curr_properties |= PROP_gimple_any; + pop_cfun (); + cgraph_node::add_new_function (decl, true); +} + /* Main entry point. 
*/ static unsigned int @@ -14676,6 +14738,10 @@ execute_lower_omp (void) for (auto task_stmt : task_cpyfns) finalize_task_copyfn (task_stmt); task_cpyfns.release (); + + if (flag_offload_memory == OFFLOAD_MEMORY_PINNED) + omp_enable_pinned_mode (); + return 0; } diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c new file mode 100644 index 00000000000..e0e08019bff --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +#if __cplusplus +#define EXTERNC extern "C" +#else +#define EXTERNC +#endif + +/* Intercept the libgomp initialization call to check it happens. */ + +int good = 0; + +EXTERNC void +GOMP_enable_pinned_mode () +{ + good = 1; +} + +int +main () +{ + if (!good) + __builtin_exit (1); + + return 0; +} diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c index 18235f59775..e7fe6c3c49a 100644 --- a/libgomp/config/linux/allocator.c +++ b/libgomp/config/linux/allocator.c @@ -50,9 +50,26 @@ #include #include "libgomp.h" +static bool always_pinned_mode = false; + +/* This function is called by the compiler when -foffload-memory=pinned + is used. */ + +void +GOMP_enable_pinned_mode () +{ + if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0) + gomp_error ("failed to pin all memory (ulimit too low?)"); + else + always_pinned_mode = true; +} + static void * linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) { + /* Explicit pinning may not be required. 
*/ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) { return gomp_usm_alloc (size); @@ -80,6 +97,9 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) { void *ret = gomp_usm_alloc (size); @@ -97,6 +117,9 @@ static void linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) gomp_usm_free (addr); else if (pin) @@ -109,6 +132,9 @@ static void * linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, size_t oldsize, size_t size, int oldpin, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) goto manual_realloc; else if (oldpin && pin) diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index 46d5f10f3e1..c86734f15e2 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -400,6 +400,7 @@ GOMP_5.0.1 { global: GOMP_alloc; GOMP_free; + GOMP_enable_pinned_mode; } GOMP_5.0; GOMP_5.1 { diff --git a/libgomp/target.c b/libgomp/target.c index 4e203ae3c06..3dd09b7afbd 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1,3 +1,4 @@ +#include /* Copyright (C) 2013-2022 Free Software Foundation, Inc. Contributed by Jakub Jelinek . 
@@ -1533,7 +1534,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, continue; } default: - if (tgt->list[i].offset == OFFSET_INLINED) + if (tgt->list[i].offset == OFFSET_INLINED + && !array) continue; break; } diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c new file mode 100644 index 00000000000..8dc19055038 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c @@ -0,0 +1,63 @@ +/* { dg-do run } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ + +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +/* Test that pinned memory works. */ + +#include +#include + +#ifdef __linux__ +#include +#include + +#include + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + fclose (proc); + return val; + } + } + abort (); +} +#else +int +get_pinned_mem () +{ + return 0; +} + +#define mlockall(...) 0 +#endif + +#include + +int +main () +{ + // Sanity check + if (get_pinned_mem () == 0) + { + /* -foffload-memory=pinned has failed, but maybe that's because + insufficient pinned memory was available. 
*/ + if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0) + abort (); + } + + return 0; +} From patchwork Thu Jul 7 10:34:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55825 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 62EC5385142A for ; Thu, 7 Jul 2022 10:36:37 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 551973857BBB for ; Thu, 7 Jul 2022 10:36:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 551973857BBB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112718" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:36:18 -0800 IronPort-SDR: CQvw7pNDYnLEBkV1RTR2QQIprZaPY2R7RW4MJv/KC0uMRGtyZTXqEQZdbzEmPwdLjO9s16ghO+ 2g/6Uzwa0Q2SnVtJudcVfINJ9JiZ9F+irn2SgtcGmvqbxMKpfVYJ15tfKXBDKRZEJfVYZx0mXt PMAbRYvYdWvT26t7VB6Oqxr1t0RzIObQlx34M8pKdKbYRBb1ZU6SbV0wwt4BlKFWEqxLuRpLTf LbaGRgY969AVr4JAVxunIx6VnfXVYEbifdyW5fQIsLS+l8Nw4lXfVP8i8gshvQatf+LJY1GJ4V gOY= From: Andrew Stubbs To: Subject: [PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory. 
Date: Thu, 7 Jul 2022 11:34:40 +0100 Message-ID: <4c5987af7ca4f9de5ce05d2f2297e862c8b83596.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch changes calls to malloc/free/calloc/realloc and operator new into calls to the memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This lets existing code benefit from unified shared memory. Libgomp does the correct thing with all the mapping constructs, and there are no memory copies if the pointer points to unified shared memory. We only replace the replaceable operator new, not class-member or placement new. gcc/ChangeLog: * omp-low.cc (usm_transform): New function. (make_pass_usm_transform): Likewise. (class pass_usm_transform): New. * passes.def: Add pass_usm_transform. * tree-pass.h (make_pass_usm_transform): New declaration. gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-2.c: New test. * c-c++-common/gomp/usm-3.c: New test. * g++.dg/gomp/usm-1.C: New test. * g++.dg/gomp/usm-2.C: New test. * g++.dg/gomp/usm-3.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.c/usm-6.c: New test. 
* testsuite/libgomp.c++/usm-1.C: Likewise. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++++++++++++++++++++++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++++++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++++++ gcc/testsuite/g++.dg/gomp/usm-1.C | 32 +++++ gcc/testsuite/g++.dg/gomp/usm-2.C | 30 ++++ gcc/testsuite/g++.dg/gomp/usm-3.C | 38 +++++ gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 +++ gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 ++ gcc/tree-pass.h | 1 + libgomp/testsuite/libgomp.c++/usm-1.C | 54 +++++++ libgomp/testsuite/libgomp.c/usm-6.c | 92 ++++++++++++ 12 files changed, 541 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index ba612e5c67d..cdadd6f0c96 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -15097,6 +15097,180 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt) { return new pass_diagnose_omp_blocks (ctxt); } + +/* Provide transformation required for using unified shared memory + by replacing calls to standard memory allocation functions with + function provided by the libgomp. */ + +static tree +usm_transform (gimple_stmt_iterator *gsi_p, bool *, + struct walk_stmt_info *wi) +{ + gimple *stmt = gsi_stmt (*gsi_p); + /* ompx_unified_shared_mem_alloc is 10. 
*/ + const unsigned int unified_shared_mem_alloc = 10; + + switch (gimple_code (stmt)) + { + case GIMPLE_CALL: + { + gcall *gs = as_a <gcall *> (stmt); + tree fndecl = gimple_call_fndecl (gs); + if (fndecl) + { + tree allocator = build_int_cst (pointer_sized_int_node, + unified_shared_mem_alloc); + const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl)); + if ((strcmp (name, "malloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC) + || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl) + || strcmp (name, "omp_target_alloc") == 0) + { + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_alloc", omp_alloc_type); + tree size = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if (strcmp (name, "aligned_alloc") == 0) + { + /* Maybe we can also use this for new operator with + std::align_val_t parameter. 
*/ + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_aligned_alloc", + omp_alloc_type); + tree align = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 3, align, size, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "calloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC)) + { + tree omp_calloc_type + = build_function_type_list (ptr_type_node, size_type_node, + size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_calloc", omp_calloc_type); + tree num = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 3, num, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "realloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_REALLOC)) + { + tree omp_realloc_type + = build_function_type_list (ptr_type_node, ptr_type_node, + size_type_node, + pointer_sized_int_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_realloc", omp_realloc_type); + tree ptr = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 4, ptr, size, allocator, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "free") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE) + || (DECL_IS_OPERATOR_DELETE_P (fndecl) + && 
DECL_IS_REPLACEABLE_OPERATOR (fndecl)) + || strcmp (name, "omp_target_free") == 0) + { + tree omp_free_type + = build_function_type_list (void_type_node, ptr_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_free", omp_free_type); + tree ptr = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, ptr, allocator); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + } + } + break; + + default: + break; + } + + return NULL_TREE; +} + +namespace { + +const pass_data pass_data_usm_transform = +{ + GIMPLE_PASS, /* type */ + "usm_transform", /* name */ + OPTGROUP_OMP, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_gimple_any, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_usm_transform : public gimple_opt_pass +{ +public: + pass_usm_transform (gcc::context *ctxt) + : gimple_opt_pass (pass_data_usm_transform, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return (flag_openmp || flag_openmp_simd) + && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED + || omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY + || omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS); + } + virtual unsigned int execute (function *) + { + struct walk_stmt_info wi; + gimple_seq body = gimple_body (current_function_decl); + + memset (&wi, 0, sizeof (wi)); + walk_gimple_seq (body, usm_transform, NULL, &wi); + + return 0; + } + +}; // class pass_usm_transform + +} // anon namespace + +gimple_opt_pass * +make_pass_usm_transform (gcc::context *ctxt) +{ + return new pass_usm_transform (ctxt); +} #include "gt-omp-low.h" diff --git a/gcc/passes.def b/gcc/passes.def index 375d3d62d51..7f838bfc96a 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_diagnose_tm_blocks); NEXT_PASS (pass_omp_oacc_kernels_decompose); NEXT_PASS (pass_lower_omp); + NEXT_PASS (pass_usm_transform); NEXT_PASS (pass_lower_cf); NEXT_PASS (pass_lower_tm); NEXT_PASS (pass_refactor_eh); diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c new file mode 100644 index 00000000000..8c20ef94e69 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-usm_transform" } */ + +#pragma omp requires unified_shared_memory + +#ifdef __cplusplus +extern "C" { +#endif + +void *malloc (__SIZE_TYPE__); +void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__); +void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); +void *realloc(void *, __SIZE_TYPE__); +void free (void *); +void *omp_target_alloc (__SIZE_TYPE__, int); +void omp_target_free (void *, int); + +#ifdef __cplusplus +} +#endif + +void +foo () +{ + void *p1 = malloc(20); + void *p2 = realloc(p1, 30); + void *p3 = calloc(4, 15); + void *p4 = aligned_alloc(16, 40); + void *p5 = omp_target_alloc(50, 1); + free (p2); + free (p3); + free (p4); + omp_target_free (p5, 1); +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " free" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " aligned_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " malloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " 
omp_target_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " omp_target_free" "usm_transform" } } */ diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c new file mode 100644 index 00000000000..2b0cbb45e27 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } */ + +#ifdef __cplusplus +extern "C" { +#endif + +void *malloc (__SIZE_TYPE__); +void *aligned_alloc (__SIZE_TYPE__, __SIZE_TYPE__); +void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); +void *realloc(void *, __SIZE_TYPE__); +void free (void *); +void *omp_target_alloc (__SIZE_TYPE__, int); +void omp_target_free (void *, int); + +#ifdef __cplusplus +} +#endif + +void +foo () +{ + void *p1 = malloc(20); + void *p2 = realloc(p1, 30); + void *p3 = calloc(4, 15); + void *p4 = aligned_alloc(16, 40); + void *p5 = omp_target_alloc(50, 1); + free (p2); + free (p3); + free (p4); + omp_target_free (p5, 1); +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_aligned_alloc \\(16, 40, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(50, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " free" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " aligned_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " malloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " omp_target_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " omp_target_free" "usm_transform" } } */ diff --git 
a/gcc/testsuite/g++.dg/gomp/usm-1.C b/gcc/testsuite/g++.dg/gomp/usm-1.C new file mode 100644 index 00000000000..bd70a81b5bb --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-1.C @@ -0,0 +1,32 @@ +// { dg-do compile } +// { dg-options "-fopenmp -fdump-tree-usm_transform" } + +#pragma omp requires unified_shared_memory + +struct t1 +{ + int a; + int b; +}; + +typedef unsigned char uint8_t; + +void +foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y) +{ + uint8_t *p1 = new uint8_t; + uint8_t *p2 = new uint8_t[20]; + t1 *p3 = new t1; + t1 *p4 = new t1[y]; + delete p1; + delete p3; + delete [] p2; + delete [] p4; +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator new" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator delete" "usm_transform" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/usm-2.C b/gcc/testsuite/g++.dg/gomp/usm-2.C new file mode 100644 index 00000000000..f6ab155c6de --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-2.C @@ -0,0 +1,30 @@ +// { dg-do compile } +// { dg-options "-fopenmp -foffload-memory=unified -fdump-tree-usm_transform" } + +struct t1 +{ + int a; + int b; +}; + +typedef unsigned char uint8_t; + +void +foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y) +{ + uint8_t *p1 = new uint8_t; + uint8_t *p2 = new uint8_t[20]; + t1 *p3 = new t1; + t1 *p4 = new t1[y]; + delete p1; + delete p3; + delete [] p2; + delete [] p4; +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } 
} */ +/* { dg-final { scan-tree-dump-not "operator new" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator delete" "usm_transform" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/usm-3.C b/gcc/testsuite/g++.dg/gomp/usm-3.C new file mode 100644 index 00000000000..50ac9302c8b --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-3.C @@ -0,0 +1,38 @@ +// { dg-do compile } +// { dg-options "-fopenmp -fdump-tree-usm_transform" } + +#pragma omp requires unified_shared_memory + +#include + + +struct X { + static void* operator new(std::size_t count) + { + static char buf[10]; + return &buf[0]; + } + static void* operator new[](std::size_t count) + { + static char buf[10]; + return &buf[0]; + } + static void operator delete(void*) + { + } + static void operator delete[](void*) + { + } +}; +void foo() { + X* p1 = new X; + delete p1; + X* p2 = new X[10]; + delete[] p2; + unsigned char buf[24] ; + int *p3 = new (buf) int(3); + p3[0] = 1; +} + +/* { dg-final { scan-tree-dump-not "omp_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "omp_free" "usm_transform" } } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 new file mode 100644 index 00000000000..dc775260cb7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-usm_transform" } + +!$omp requires unified_shared_memory +end + +subroutine foo() + implicit none + integer, allocatable :: var1 + + allocate(var1) + +end subroutine + +! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform" } } +! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform" } } \ No newline at end of file diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 new file mode 100644 index 00000000000..7983444ebff --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 @@ -0,0 +1,13 @@ +! { dg-do compile } +! 
{ dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } + +subroutine foo() + implicit none + integer, allocatable :: var1 + + allocate(var1) + +end subroutine + +! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform" } } +! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform" } } \ No newline at end of file diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 606d1d60b85..494a9662afa 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_usm_transform (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt); diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C new file mode 100644 index 00000000000..fea25e5f10b --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/usm-1.C @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-skip-if "Only valid for nvptx" { ! 
offload_target_nvptx } } */ +#include + +#pragma omp requires unified_shared_memory + +int g1 = 0; + +struct s1 +{ + s1() { a = g1++;} + ~s1() { g1--;} + int a; +}; + +int +main () +{ + s1 *p1 = new s1; + s1 *p2 = new s1[10]; + + if (!p1 || !p2 || p1->a != 0) + __builtin_abort (); + + for (int i = 0; i < 10; i++) + if (p2[i].a != i+1) + __builtin_abort (); + + uintptr_t pp1 = (uintptr_t)p1; + uintptr_t pp2 = (uintptr_t)p2; + +#pragma omp target firstprivate(pp1, pp2) + { + s1 *t1 = (s1*)pp1; + s1 *t2 = (s1*)pp2; + if (t1->a != 0) + __builtin_abort (); + + for (int i = 0; i < 10; i++) + if (t2[i].a != i+1) + __builtin_abort (); + + t1->a = 42; + } + + if (p1->a != 42) + __builtin_abort (); + + delete [] p2; + delete p1; + if (g1 != 0) + __builtin_abort (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c new file mode 100644 index 00000000000..c207140092a --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-6.c @@ -0,0 +1,92 @@ +/* { dg-do run } */ +/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */ + +#include +#include + +#include + +/* On old systems, the declaration may not be present in stdlib.h which + will generate a warning. This function is going to be replaced with + omp_aligned_alloc so the purpose of this declaration is to avoid that + warning. 
*/ +void *aligned_alloc(size_t alignment, size_t size); + +#pragma omp requires unified_shared_memory + +int +main () +{ + int *a = (int *) malloc(sizeof(int)*2); + int *b = (int *) calloc(sizeof(int), 3); + int *c = (int *) realloc(NULL, sizeof(int) * 4); + int *d = (int *) aligned_alloc(32, sizeof(int)); + int *e = (int *) omp_target_alloc(sizeof(int), 1); + if (!a || !b || !c || !d || !e) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + b[0] = 52; + b[1] = 53; + b[2] = 54; + c[0] = 62; + c[1] = 63; + c[2] = 64; + c[3] = 65; + + uintptr_t a_p = (uintptr_t)a; + uintptr_t b_p = (uintptr_t)b; + uintptr_t c_p = (uintptr_t)c; + uintptr_t d_p = (uintptr_t)d; + uintptr_t e_p = (uintptr_t)e; + + if ((d_p & 31) != 0) + __builtin_abort (); + +#pragma omp target enter data map(to:a[0:2]) + +#pragma omp target is_device_ptr(c) + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + if (b[0] != 52 || b[2] != 54 || b_p != (uintptr_t)b) + __builtin_abort (); + if (c[0] != 62 || c[3] != 65 || c_p != (uintptr_t)c) + __builtin_abort (); + if (d_p != (uintptr_t)d) + __builtin_abort (); + if (e_p != (uintptr_t)e) + __builtin_abort (); + a[0] = 72; + b[0] = 82; + c[0] = 92; + e[0] = 102; + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + if (b[1] != 53 || b_p != (uintptr_t)b) + __builtin_abort (); + if (c[1] != 63 || c[2] != 64 || c_p != (uintptr_t)c) + __builtin_abort (); + a[1] = 73; + b[1] = 83; + c[1] = 93; + } + +#pragma omp target exit data map(delete:a[0:2]) + + if (a[0] != 72 || a[1] != 73 + || b[0] != 82 || b[1] != 83 + || c[0] != 92 || c[1] != 93 + || e[0] != 102) + __builtin_abort (); + free(a); + free(b); + free(c); + omp_target_free(e, 1); + return 0; +} From patchwork Thu Jul 7 10:34:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55834 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: 
patchwork@sourceware.org
From: Andrew Stubbs
Subject: [PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)
Date: Thu, 7 Jul 2022 11:34:41 +0100
List-Id: Gcc-patches mailing list

Currently we only make use of this directive when it is associated with an
allocate statement.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE.
	(show_code_node): Likewise.
	* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
	(OMP_LIST_ALLOCATOR): New enum value.
	(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
	* match.h (gfc_match_omp_allocate): New function.
	* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_ALLOCATOR.
	(OMP_ALLOCATE_CLAUSES): New define.
	(gfc_match_omp_allocate): New function.
	(resolve_omp_clauses): Add ALLOCATOR in clause_names.
	(omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
	(EMPTY_VAR_LIST): New define.
	(check_allocate_directive_restrictions): New function.
	(gfc_resolve_omp_allocate): Likewise.
	(gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
	* parse.cc (decode_omp_directive): Handle ST_OMP_ALLOCATE.
	(next_statement): Likewise.
	(gfc_ascii_statement): Likewise.
	* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
	* st.cc (gfc_free_statement): Likewise.
* trans.cc (trans_code): Likewise --- gcc/fortran/dump-parse-tree.cc | 3 + gcc/fortran/gfortran.h | 4 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 199 +++++++++++++++++- gcc/fortran/parse.cc | 10 +- gcc/fortran/resolve.cc | 1 + gcc/fortran/st.cc | 1 + gcc/fortran/trans.cc | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++++++++++ gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 | 73 +++++++ 10 files changed, 400 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 5352008a63d..e0c6c0d9d96 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2003,6 +2003,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break; case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break; case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break; + case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break; case EXEC_OMP_ATOMIC: name = "ATOMIC"; break; case EXEC_OMP_BARRIER: name = "BARRIER"; break; case EXEC_OMP_CANCEL: name = "CANCEL"; break; @@ -2204,6 +2205,7 @@ show_omp_node (int level, gfc_code *c) || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR + || c->op == EXEC_OMP_ALLOCATE || (c->op == EXEC_OMP_ORDERED && c->block == NULL)) return; if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS) @@ -3329,6 +3331,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OACC_CACHE: case EXEC_OACC_ENTER_DATA: case EXEC_OACC_EXIT_DATA: + case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_CANCEL: case EXEC_OMP_CANCELLATION_POINT: diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 696aadd7db6..755469185a6 100644 --- 
a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -259,7 +259,7 @@ enum gfc_statement ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP, ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL, ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE, - ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, + ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE, ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC, ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED, ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS, @@ -1398,6 +1398,7 @@ enum OMP_LIST_USE_DEVICE_ADDR, OMP_LIST_NONTEMPORAL, OMP_LIST_ALLOCATE, + OMP_LIST_ALLOCATOR, OMP_LIST_HAS_DEVICE_ADDR, OMP_LIST_ENTER, OMP_LIST_NUM /* Must be the last. */ @@ -2908,6 +2909,7 @@ enum gfc_exec_op EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA, EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE, + EXEC_OMP_ALLOCATE, EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER, EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO, EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE, diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 495c93e0b5c..fe43d4b3fd3 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -149,6 +149,7 @@ match gfc_match_oacc_routine (void); /* OpenMP directive matchers. 
*/ match gfc_match_omp_eos_error (void); +match gfc_match_omp_allocate (void); match gfc_match_omp_atomic (void); match gfc_match_omp_barrier (void); match gfc_match_omp_cancel (void); diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 91bf8a3c50d..38003890bb0 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -986,6 +986,7 @@ enum omp_mask2 OMP_CLAUSE_FINALIZE, OMP_CLAUSE_ATTACH, OMP_CLAUSE_NOHOST, + OMP_CLAUSE_ALLOCATOR, OMP_CLAUSE_HAS_DEVICE_ADDR, /* OpenMP 5.1 */ OMP_CLAUSE_ENTER, /* OpenMP 5.2 */ /* This must come last. */ @@ -3784,6 +3785,7 @@ cleanup: } +#define OMP_ALLOCATE_CLAUSES (omp_mask (OMP_CLAUSE_ALLOCATOR)) #define OMP_PARALLEL_CLAUSES \ (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE \ | OMP_CLAUSE_SHARED | OMP_CLAUSE_COPYIN | OMP_CLAUSE_REDUCTION \ @@ -6001,6 +6003,64 @@ gfc_match_omp_ordered_depend (void) return match_omp (EXEC_OMP_ORDERED, omp_mask (OMP_CLAUSE_DEPEND)); } +/* omp allocate (list) [clause-list] + - clause-list: allocator +*/ + +match +gfc_match_omp_allocate (void) +{ + gfc_omp_clauses *c = gfc_get_omp_clauses (); + gfc_expr *allocator = NULL; + match m; + + m = gfc_match (" ("); + if (m == MATCH_YES) + { + m = gfc_match_omp_variable_list ("", &c->lists[OMP_LIST_ALLOCATOR], + true, NULL); + + if (m != MATCH_YES) + { + /* If the list was empty, we must find closing ')'. 
*/ + m = gfc_match (")"); + if (m != MATCH_YES) + return m; + } + } + + if (gfc_match (" allocator ( ") == MATCH_YES) + { + m = gfc_match_expr (&allocator); + if (m != MATCH_YES) + { + gfc_error ("Expected allocator at %C"); + return MATCH_ERROR; + } + if (gfc_match (" ) ") != MATCH_YES) + { + gfc_error ("Expected ')' at %C"); + gfc_free_expr (allocator); + return MATCH_ERROR; + } + } + + if (gfc_match_omp_eos () != MATCH_YES) + { + gfc_free_expr (allocator); + gfc_error ("Unexpected junk after $OMP allocate at %C"); + return MATCH_ERROR; + } + gfc_omp_namelist *n; + for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next) + n->expr = gfc_copy_expr (allocator); + + new_st.op = EXEC_OMP_ALLOCATE; + new_st.ext.omp_clauses = c; + gfc_free_expr (allocator); + return MATCH_YES; +} + /* omp atomic [clause-list] - atomic-clause: read | write | update @@ -6482,7 +6542,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, "IN_REDUCTION", "TASK_REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE", "CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR", - "NONTEMPORAL", "ALLOCATE", "HAS_DEVICE_ADDR", "ENTER" }; + "NONTEMPORAL", "ALLOCATE", "HAS_DEVICE_ADDR", "ENTER", "ALLOCATOR" }; STATIC_ASSERT (ARRAY_SIZE (clause_names) == OMP_LIST_NUM); if (omp_clauses == NULL) @@ -9006,6 +9066,8 @@ omp_code_to_statement (gfc_code *code) { switch (code->op) { + case EXEC_OMP_ALLOCATE: + return ST_OMP_ALLOCATE; case EXEC_OMP_PARALLEL: return ST_OMP_PARALLEL; case EXEC_OMP_PARALLEL_MASKED: @@ -9486,6 +9548,138 @@ gfc_resolve_oacc_routines (gfc_namespace *ns) } } +static void +check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al, + gfc_namespace *ns, locus loc) +{ + if (sym->attr.save != SAVE_NONE || sym->attr.in_common == 1 + || sym->module != NULL) + { + int tmp; + /* Assumption here is that we can extract an integer then + it is a predefined thing. 
*/ + if (!omp_al || gfc_extract_int (omp_al, &tmp)) + gfc_error ("%qs should use predefined allocator at %L", sym->name, + &loc); + } + if (ns != sym->ns) + gfc_error ("%qs is not in the same scope as %<allocate%>" + " directive at %L", sym->name, &loc); +} + +#define EMPTY_VAR_LIST(node) \ + (node->ext.omp_clauses->lists[OMP_LIST_ALLOCATOR] == NULL) + +static void +gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns) +{ + gfc_alloc *al; + gfc_omp_namelist *n = NULL; + gfc_omp_namelist *cn = NULL; + gfc_omp_namelist *p, *tail; + gfc_code *cur; + hash_set<gfc_symbol*> vars; + + gfc_omp_clauses *clauses = code->ext.omp_clauses; + gcc_assert (clauses); + cn = clauses->lists[OMP_LIST_ALLOCATOR]; + gfc_expr *omp_al = cn ? cn->expr : NULL; + + if (omp_al && (omp_al->ts.type != BT_INTEGER + || omp_al->ts.kind != gfc_c_intptr_kind)) + gfc_error ("Expected integer expression of the " + "%<omp_allocator_handle_kind%> kind at %L", &omp_al->where); + + /* Check that variables in this allocate directive are not duplicated + in this directive or others coming directly after it. */ + for (cur = code; cur != NULL && cur->op == EXEC_OMP_ALLOCATE; + cur = cur->next) + { + gfc_omp_clauses *c = cur->ext.omp_clauses; + gcc_assert (c); + for (n = c->lists[OMP_LIST_ALLOCATOR]; n; n = n->next) + { + if (vars.contains (n->sym)) + gfc_error ("%qs is used in multiple %<allocate%> " + "directives at %L", n->sym->name, &cur->loc); + /* This helps us avoid duplicate error messages. */ + if (cur == code) + vars.add (n->sym); + } + } + + if (cur == NULL || cur->op != EXEC_ALLOCATE) + { + /* There is no allocate statement right after allocate directive. + We don't support this case at the moment. 
*/ + for (n = cn; n != NULL; n = n->next) + { + gfc_symbol *sym = n->sym; + if (sym->attr.allocatable == 1) + gfc_error ("%qs with ALLOCATABLE attribute is not allowed in " + "%<allocate%> directive at %L as this directive is not" + " associated with an %<allocate%> statement.", + sym->name, &code->loc); + } + sorry_at (code->loc.lb->location, "%<allocate%> directive that is " + "not associated with an %<allocate%> statement is not " + "supported."); + return; + } + + /* If there is another allocate directive right after this one, check + that none of them is empty. Doing it this way, we can check this + thing even when multiple directives are together and generate + error at right location. */ + if (code->next && code->next->op == EXEC_OMP_ALLOCATE + && (EMPTY_VAR_LIST (code) || EMPTY_VAR_LIST (code->next))) + gfc_error ("Empty variable list is not allowed at %L when multiple " + "%<allocate%> directives are associated with an " + "%<allocate%> statement.", + EMPTY_VAR_LIST (code) ? &code->loc : &code->next->loc); + + if (EMPTY_VAR_LIST (code)) + { + /* Empty namelist means allocate directive applies to all + variables in allocate statement. 'cur' points to associated + allocate statement. 
*/ + for (al = cur->ext.alloc.list; al != NULL; al = al->next) + if (al->expr && al->expr->symtree && al->expr->symtree->n.sym) + { + check_allocate_directive_restrictions (al->expr->symtree->n.sym, + omp_al, ns, code->loc); + p = gfc_get_omp_namelist (); + p->sym = al->expr->symtree->n.sym; + p->expr = omp_al; + p->where = code->loc; + if (cn == NULL) + cn = tail = p; + else + { + tail->next = p; + tail = tail->next; + } + } + clauses->lists[OMP_LIST_ALLOCATOR]= cn; + } + else + { + for (n = cn; n != NULL; n = n->next) + { + for (al = cur->ext.alloc.list; al != NULL; al = al->next) + if (al->expr && al->expr->symtree && al->expr->symtree->n.sym + && al->expr->symtree->n.sym == n->sym) + break; + if (al == NULL) + gfc_error ("%qs in %<allocate%> directive at %L is not present " + "in associated %<allocate%> statement.", + n->sym->name, &code->loc); + check_allocate_directive_restrictions (n->sym, omp_al, ns, + code->loc); + } + } +} + void gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED) @@ -9627,6 +9821,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns) code->ext.omp_clauses->if_present = false; resolve_omp_clauses (code, code->ext.omp_clauses, ns); break; + case EXEC_OMP_ALLOCATE: + gfc_resolve_omp_allocate (code, ns); + break; default: break; } diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index 0b4c596996c..97d182d46ad 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -886,6 +886,7 @@ decode_omp_directive (void) { case 'a': matcho ("atomic", gfc_match_omp_atomic, ST_OMP_ATOMIC); + matcho ("allocate", gfc_match_omp_allocate, ST_OMP_ALLOCATE); break; case 'b': matcho ("barrier", gfc_match_omp_barrier, ST_OMP_BARRIER); @@ -1673,9 +1674,9 @@ next_statement (void) case ST_OMP_CANCEL: case ST_OMP_CANCELLATION_POINT: case ST_OMP_DEPOBJ: \ case ST_OMP_TARGET_UPDATE: case ST_OMP_TARGET_ENTER_DATA: \ case ST_OMP_TARGET_EXIT_DATA: case ST_OMP_ORDERED_DEPEND: case ST_OMP_ERROR: \ - case ST_ERROR_STOP: case ST_OMP_SCAN: case 
ST_SYNC_ALL: \ - case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: case ST_UNLOCK: \ - case ST_FORM_TEAM: case ST_CHANGE_TEAM: \ + case ST_OMP_ALLOCATE: case ST_ERROR_STOP: case ST_OMP_SCAN: \ + case ST_SYNC_ALL: case ST_SYNC_IMAGES: case ST_SYNC_MEMORY: case ST_LOCK: \ + case ST_UNLOCK: case ST_FORM_TEAM: case ST_CHANGE_TEAM: \ case ST_END_TEAM: case ST_SYNC_TEAM: \ case ST_EVENT_POST: case ST_EVENT_WAIT: case ST_FAIL_IMAGE: \ case ST_OACC_UPDATE: case ST_OACC_WAIT: case ST_OACC_CACHE: \ @@ -2352,6 +2353,9 @@ gfc_ascii_statement (gfc_statement st) case ST_OACC_END_ATOMIC: p = "!$ACC END ATOMIC"; break; + case ST_OMP_ALLOCATE: + p = "!$OMP ALLOCATE"; + break; case ST_OMP_ATOMIC: p = "!$OMP ATOMIC"; break; diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 2ebf076f730..65f24b88067 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -12368,6 +12368,7 @@ start: gfc_resolve_oacc_directive (code, ns); break; + case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_BARRIER: case EXEC_OMP_CANCEL: diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc index 73f30c2137f..7b282e96c3d 100644 --- a/gcc/fortran/st.cc +++ b/gcc/fortran/st.cc @@ -214,6 +214,7 @@ gfc_free_statement (gfc_code *p) case EXEC_OACC_ENTER_DATA: case EXEC_OACC_EXIT_DATA: case EXEC_OACC_ROUTINE: + case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_CANCEL: case EXEC_OMP_CANCELLATION_POINT: diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc index 912a206f2ed..a9d5714be22 100644 --- a/gcc/fortran/trans.cc +++ b/gcc/fortran/trans.cc @@ -2174,6 +2174,7 @@ trans_code (gfc_code * code, tree cond) res = gfc_trans_dt_end (code); break; + case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_BARRIER: case EXEC_OMP_CANCEL: diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 new file mode 100644 index 00000000000..3f512d66495 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 @@ -0,0 
+1,112 @@ +! { dg-do compile } + +module test + integer, allocatable :: mvar1 + integer, allocatable :: mvar2 + integer, allocatable :: mvar3 +end module + +subroutine foo(x, y) + use omp_lib + implicit none + integer :: x + integer :: y + + integer, allocatable :: var1(:) + integer, allocatable :: var2(:) + integer, allocatable :: var3(:) + integer, allocatable :: var4(:) + integer, allocatable :: var5(:) + integer, allocatable :: var6(:) + integer, allocatable :: var7(:) + integer, allocatable :: var8(:) + integer, allocatable :: var9(:) + + !$omp allocate (var1) allocator(10) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." } + allocate (var1(x)) + + !$omp allocate (var2) ! { dg-error "'var2' in 'allocate' directive at .1. is not present in associated 'allocate' statement." } + allocate (var3(x)) + + !$omp allocate (x) ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." } + x = 2 + + !$omp allocate (var4) ! { dg-error "'var4' with ALLOCATABLE attribute is not allowed in 'allocate' directive at .1. as this directive is not associated with an 'allocate' statement." } + ! { dg-message "sorry, unimplemented: 'allocate' directive that is not associated with an 'allocate' statement is not supported." "" { target *-*-* } .-1 } + y = 2 + + !$omp allocate (var5) + !$omp allocate ! { dg-error "Empty variable list is not allowed at .1. when multiple 'allocate' directives are associated with an 'allocate' statement." } + allocate (var5(x)) + + !$omp allocate (var6) + !$omp allocate (var7) ! { dg-error "'var7' in 'allocate' directive at .1. is not present in associated 'allocate' statement." } + !$omp allocate (var8) ! { dg-error "'var8' in 'allocate' directive at .1. is not present in associated 'allocate' statement." } + allocate (var6(x)) + + !$omp allocate (var9) + !$omp allocate (var9) ! 
{ dg-error "'var9' is used in multiple 'allocate' directives at .1." } + allocate (var9(x)) + +end subroutine + +function outer(a) + IMPLICIT NONE + + integer :: outer, a + integer, allocatable :: var1 + + outer = inner(a) + 5 + return + + contains + + integer function inner(x) + integer :: x + integer, allocatable :: var2 + + !$omp allocate (var1, var2) ! { dg-error "'var1' is not in the same scope as 'allocate' directive at .1." } + allocate (var1, var2) + + inner = x + 10 + return + end function inner + +end function outer + +subroutine bar(s) + use omp_lib + use test + integer :: s + integer, save, allocatable :: svar1 + integer, save, allocatable :: svar2 + integer, save, allocatable :: svar3 + + type (omp_alloctrait) :: traits(3) + integer (omp_allocator_handle_kind) :: a + + traits = [omp_alloctrait (omp_atk_alignment, 64), & + omp_alloctrait (omp_atk_fallback, omp_atv_null_fb), & + omp_alloctrait (omp_atk_pool_size, 8192)] + a = omp_init_allocator (omp_default_mem_space, 3, traits) + if (a == omp_null_allocator) stop 1 + + !$omp allocate (mvar1) allocator(a) ! { dg-error "'mvar1' should use predefined allocator at .1." } + allocate (mvar1) + + !$omp allocate (mvar2) ! { dg-error "'mvar2' should use predefined allocator at .1." } + allocate (mvar2) + + !$omp allocate (mvar3) allocator(omp_low_lat_mem_alloc) + allocate (mvar3) + + !$omp allocate (svar1) allocator(a) ! { dg-error "'svar1' should use predefined allocator at .1." } + allocate (svar1) + + !$omp allocate (svar2) ! { dg-error "'svar2' should use predefined allocator at .1." } + allocate (svar2) + + !$omp allocate (svar3) allocator(omp_low_lat_mem_alloc) + allocate (svar3) +end subroutine + diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 new file mode 100644 index 00000000000..761b6dede28 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 @@ -0,0 +1,73 @@ +! 
{ dg-do compile } + +module omp_lib_kinds + use iso_c_binding, only: c_int, c_intptr_t + implicit none + private :: c_int, c_intptr_t + integer, parameter :: omp_allocator_handle_kind = c_intptr_t + + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_null_allocator = 0 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_default_mem_alloc = 1 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_large_cap_mem_alloc = 2 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_const_mem_alloc = 3 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_high_bw_mem_alloc = 4 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_low_lat_mem_alloc = 5 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_cgroup_mem_alloc = 6 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_pteam_mem_alloc = 7 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_thread_mem_alloc = 8 +end module + +subroutine foo(x, y) + use omp_lib_kinds + implicit none + integer :: x + integer :: y + + integer, allocatable :: var1(:) + integer, allocatable :: var2(:) + integer, allocatable :: var3(:) + integer, allocatable :: var4(:,:) + integer, allocatable :: var5(:) + integer, allocatable :: var6(:) + integer, allocatable :: var7(:) + integer, allocatable :: var8(:) + integer, allocatable :: var9(:) + integer, allocatable :: var10(:) + integer, allocatable :: var11(:) + integer, allocatable :: var12(:) + + !$omp allocate (var1) allocator(omp_default_mem_alloc) + allocate (var1(x)) + + !$omp allocate (var2) + allocate (var2(x)) + + !$omp allocate (var3, var4) allocator(omp_large_cap_mem_alloc) + allocate (var3(x),var4(x,y)) + + !$omp allocate() + allocate (var5(x)) + + !$omp allocate + allocate (var6(x)) + + !$omp allocate () allocator(omp_default_mem_alloc) + allocate (var7(x)) + + !$omp allocate allocator(omp_default_mem_alloc) + allocate (var8(x)) + + !$omp allocate (var9) 
allocator(omp_default_mem_alloc) + !$omp allocate (var10) allocator(omp_large_cap_mem_alloc) + allocate (var9(x), var10(x)) + +end subroutine From patchwork Thu Jul 7 10:34:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55830 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 25D5F383664F for ; Thu, 7 Jul 2022 10:37:45 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id B2A25384189D for ; Thu, 7 Jul 2022 10:37:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B2A25384189D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448931" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:10 -0800 IronPort-SDR: 6pRn9wCFnfMbwJM35bgfDGTPCNrVLgfJ/73714OGJceq+s+6tVLU1YV4G/gCvjTjj02VB08WQK deuWuDNtcXAIogdN6TTnjQYSiaZwbegPL7pqWe9ZBb7M84BW0sOgQF2qPXapSAsm3DV3JIDnwz cxyJcV0sBfS1QRR3LwvqGxvvRt2CkmtmlMHc2DWO2hPIJupjOu9Rj+GftzBt93Wg6tusnje6cy /eFRbPM3iORzzs0FdlPmcdnZZeS/l6opyPj7na9V5Mh9+jA4PPgQOCJe5beWlkFwsUk8uu3/Tz JME= From: Andrew Stubbs To: Subject: [PATCH 11/17] Translate allocate directive (OpenMP 5.0). 
Date: Thu, 7 Jul 2022 11:34:42 +0100
Message-ID: <6a5caebc7e24c68f4bf788ae2cd5ee2faf868051.1657188329.git.ams@codesourcery.com>

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
	(gfc_trans_omp_allocate): New function.
	(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
	(dump_generic_node): Handle OMP_ALLOCATE.
	* tree.def (OMP_ALLOCATE): New.
	* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
	(OMP_ALLOCATE_DECL): Likewise.
	(OMP_ALLOCATE_ALLOCATOR): Likewise.
	* tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: New test.
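
For anyone reading the -fdump-tree-original output that allocate-6.f90 scans, here is a rough C model of the text shape produced by the new OMP_CLAUSE_ALLOCATOR and OMP_ALLOCATE pretty-printer cases in this patch. It is only a sketch: struct allocator_clause, format_allocator_clause and format_omp_allocate are names made up for illustration and are not GCC functions, and the single space placed between clauses is an assumption, since the exact spacing depends on dump_omp_clauses.

```c
#include <stdio.h>

/* Toy stand-in for an OMP_CLAUSE_ALLOCATOR node (hypothetical type):
   operand 0 is the variable, operand 1 the optional allocator
   expression, NULL when no allocator clause was given.  */
struct allocator_clause
{
  const char *decl;
  const char *allocator;
};

/* Mirror the shape printed for one clause: "(decl)" or
   "(decl:allocator(expr))".  Returns what snprintf returns.  */
static int
format_allocator_clause (char *buf, size_t len,
                         const struct allocator_clause *c)
{
  if (c->allocator)
    return snprintf (buf, len, "(%s:allocator(%s))", c->decl, c->allocator);
  return snprintf (buf, len, "(%s)", c->decl);
}

/* Build one dump line for a directive with N clauses.  The single
   space between clauses is an assumption of this sketch.  */
static void
format_omp_allocate (char *buf, size_t len,
                     const struct allocator_clause *clauses, size_t n)
{
  size_t used = (size_t) snprintf (buf, len, "#pragma omp allocate ");
  for (size_t i = 0; i < n && used < len; i++)
    {
      if (i > 0 && used + 1 < len)
        {
          buf[used++] = ' ';
          buf[used] = '\0';
        }
      used += (size_t) format_allocator_clause (buf + used, len - used,
                                                &clauses[i]);
    }
}
```

Calling format_omp_allocate with one clause { "var1", "1" } gives a line of the form "#pragma omp allocate (var1:allocator(1))"; a directive without an allocator clause prints only the parenthesised variable.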
--- gcc/fortran/trans-openmp.cc | 44 ++++++++++++ gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++++++++++++++++++ gcc/tree-core.h | 3 + gcc/tree-pretty-print.cc | 19 +++++ gcc/tree.cc | 1 + gcc/tree.def | 4 ++ gcc/tree.h | 11 +++ 7 files changed, 154 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index de27ed52c02..3ee63e416ed 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -2728,6 +2728,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } } break; + case OMP_LIST_ALLOCATOR: + for (; n != NULL; n = n->next) + if (n->sym->attr.referenced) + { + tree t = gfc_trans_omp_variable (n->sym, false); + if (t != error_mark_node) + { + tree node = build_omp_clause (input_location, + OMP_CLAUSE_ALLOCATOR); + OMP_ALLOCATE_DECL (node) = t; + if (n->expr) + { + tree allocator_; + gfc_init_se (&se, NULL); + gfc_conv_expr (&se, n->expr); + allocator_ = gfc_evaluate_now (se.expr, block); + OMP_ALLOCATE_ALLOCATOR (node) = allocator_; + } + omp_clauses = gfc_trans_add_clause (node, omp_clauses); + } + } + break; case OMP_LIST_LINEAR: { gfc_expr *last_step_expr = NULL; @@ -4982,6 +5004,26 @@ gfc_trans_omp_atomic (gfc_code *code) return gfc_finish_block (&block); } +static tree +gfc_trans_omp_allocate (gfc_code *code) +{ + stmtblock_t block; + tree stmt; + + gfc_omp_clauses *clauses = code->ext.omp_clauses; + gcc_assert (clauses); + + gfc_start_block (&block); + stmt = make_node (OMP_ALLOCATE); + TREE_TYPE (stmt) = void_type_node; + OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses, + code->loc, false, + true); + gfc_add_expr_to_block (&block, stmt); + gfc_merge_block_scope (&block); + return gfc_finish_block (&block); +} + static tree gfc_trans_omp_barrier (void) { @@ -7488,6 +7530,8 @@ gfc_trans_omp_directive (gfc_code *code) { switch (code->op) { + case EXEC_OMP_ALLOCATE: + return gfc_trans_omp_allocate 
(code); case EXEC_OMP_ATOMIC: return gfc_trans_omp_atomic (code); case EXEC_OMP_BARRIER: diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 new file mode 100644 index 00000000000..2de2b52ee44 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -0,0 +1,72 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-original" } + +module omp_lib_kinds + use iso_c_binding, only: c_int, c_intptr_t + implicit none + private :: c_int, c_intptr_t + integer, parameter :: omp_allocator_handle_kind = c_intptr_t + + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_null_allocator = 0 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_default_mem_alloc = 1 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_large_cap_mem_alloc = 2 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_const_mem_alloc = 3 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_high_bw_mem_alloc = 4 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_low_lat_mem_alloc = 5 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_cgroup_mem_alloc = 6 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_pteam_mem_alloc = 7 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_thread_mem_alloc = 8 +end module + + +subroutine foo(x, y, al) + use omp_lib_kinds + implicit none + +type :: my_type + integer :: i + integer :: j + real :: x +end type + + integer :: x + integer :: y + integer (kind=omp_allocator_handle_kind) :: al + + integer, allocatable :: var1 + integer, allocatable :: var2 + real, allocatable :: var3(:,:) + type (my_type), allocatable :: var4 + integer, pointer :: pii, parr(:) + + character, allocatable :: str1a, str1aarr(:) + character(len=5), allocatable :: str5a, str5aarr(:) + + !$omp allocate + allocate(str1a, str1aarr(10), str5a, str5aarr(10)) + + !$omp allocate (var1) allocator(omp_default_mem_alloc) + 
!$omp allocate (var2) allocator(omp_large_cap_mem_alloc) + allocate (var1, var2) + + !$omp allocate (var4) allocator(omp_low_lat_mem_alloc) + allocate (var4) + var4%i = 5 + + !$omp allocate (var3) allocator(omp_low_lat_mem_alloc) + allocate (var3(x,y)) + + !$omp allocate + allocate(pii, parr(5)) +end subroutine + +! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index ab5fa01e5cb..774bf0d7658 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -522,6 +522,9 @@ enum omp_clause_code { /* OpenACC clause: nohost. */ OMP_CLAUSE_NOHOST, + + /* OpenMP clause: allocator. */ + OMP_CLAUSE_ALLOCATOR }; #undef DEFTREESTRUCT diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 47371d8bcbe..4d21babbd34 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -767,6 +767,20 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) pp_right_paren (pp); break; + case OMP_CLAUSE_ALLOCATOR: + pp_string (pp, "("); + dump_generic_node (pp, OMP_ALLOCATE_DECL (clause), + spc, flags, false); + if (OMP_ALLOCATE_ALLOCATOR (clause)) + { + pp_string (pp, ":allocator("); + dump_generic_node (pp, OMP_ALLOCATE_ALLOCATOR (clause), + spc, flags, false); + pp_right_paren (pp); + } + pp_right_paren (pp); + break; + case OMP_CLAUSE_ALLOCATE: pp_string (pp, "allocate("); if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (clause)) @@ -3525,6 +3539,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, dump_omp_clauses (pp, OACC_CACHE_CLAUSES (node), spc, flags); break; + case OMP_ALLOCATE: + pp_string (pp, "#pragma omp allocate "); + dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags); + break; + case OMP_PARALLEL: pp_string (pp, "#pragma omp parallel"); dump_omp_clauses (pp, OMP_PARALLEL_CLAUSES (node), spc, flags); diff --git a/gcc/tree.cc b/gcc/tree.cc index 84000dd8b69..6dc1cf4d9b3 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -351,6 +351,7 @@ 
unsigned const char omp_clause_num_ops[] = 0, /* OMP_CLAUSE_IF_PRESENT */ 0, /* OMP_CLAUSE_FINALIZE */ 0, /* OMP_CLAUSE_NOHOST */ + 2, /* OMP_CLAUSE_ALLOCATOR */ }; const char * const omp_clause_code_name[] = diff --git a/gcc/tree.def b/gcc/tree.def index 62650b6934b..b4d2f7a575d 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1307,6 +1307,10 @@ DEFTREECODE (OMP_ATOMIC_READ, "omp_atomic_read", tcc_statement, 1) DEFTREECODE (OMP_ATOMIC_CAPTURE_OLD, "omp_atomic_capture_old", tcc_statement, 2) DEFTREECODE (OMP_ATOMIC_CAPTURE_NEW, "omp_atomic_capture_new", tcc_statement, 2) +/* OpenMP - #pragma omp allocate + Operand 0: Clauses. */ +DEFTREECODE (OMP_ALLOCATE, "omp allocate", tcc_statement, 1) + /* OpenMP clauses. */ DEFTREECODE (OMP_CLAUSE, "omp_clause", tcc_exceptional, 0) diff --git a/gcc/tree.h b/gcc/tree.h index 6f6ad5a3a5f..b2575c18693 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1466,6 +1466,8 @@ class auto_suppress_location_wrappers #define OACC_UPDATE_CLAUSES(NODE) \ TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0) +#define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0) + #define OMP_PARALLEL_BODY(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0) #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1) @@ -1872,6 +1874,15 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_ALLOCATE_ALIGN(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE), 2) +/* Maybe we can use OMP_CLAUSE_DECL, but I am not sure where to place + OMP_CLAUSE_ALLOCATOR in omp_clause_code. 
*/ + +#define OMP_ALLOCATE_DECL(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 0) + +#define OMP_ALLOCATE_ALLOCATOR(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 1) + /* True if an ALLOCATE clause was present on a combined or composite construct and the code for splitting the clauses has already performed checking if the listed variable has explicit privatization on the From patchwork Thu Jul 7 10:34:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55828 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 117673842AFD for ; Thu, 7 Jul 2022 10:37:29 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 5ED2238485B9 for ; Thu, 7 Jul 2022 10:37:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5ED2238485B9 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112769" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:11 -0800 IronPort-SDR: L7hVmE722WAMiMUrGc5hHCzlUz3QX5IJoWhqN0ohKaM7KCuF09KW00pTNsjRUFqdXZoyBIcgYV Rfw3tin8eFBK+dGr/16KetTdKvx7EnhQBxgGBkeT8iWp4OavTRXh/X2h72I7oO/lkgayA5zXdh 84TEs5yTMi2BE3qToV8ioRqLZh7oq0RcbnUGZNV2UDrwXYuCXmdKB899iug1LqyoaLz/HdZAMG Z9tVzsdSiFtAkb1/t111Vh4QAVL+WyS0mGFeCZxk9unnSTMZdjaYWUiQ2FrepC2jvmtl0Oc3c7 5T0= From: Andrew Stubbs To: Subject: [PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 
5.0). Date: Thu, 7 Jul 2022 11:34:43 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Currently we only handle an omp allocate directive that is associated with an allocate statement. That statement results in malloc and free calls. The malloc calls are easy to find because they are in the same block as the allocate directive, but the free calls end up in a separate cleanup block. To help later passes find them, an allocate directive with kind=free is generated in the cleanup block. The normal allocate directive is given kind=allocate. gcc/fortran/ChangeLog: * gfortran.h (struct gfc_symbol): Declare new members omp_allocated and omp_allocated_end. * openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to NULL. (prepare_omp_allocated_var_list_for_cleanup): New function. (gfc_resolve_omp_allocate): Call it. * trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated. * trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt generated for the allocate directive. gcc/ChangeLog: * tree-core.h (struct tree_base): Add comments. * tree-pretty-print.cc (dump_generic_node): Handle allocate directive kind. * tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define. (OMP_ALLOCATE_KIND_FREE): Likewise. 
gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive. --- gcc/fortran/gfortran.h | 1 + gcc/fortran/openmp.cc | 30 +++++++++++++++++++ gcc/fortran/trans-decl.cc | 20 +++++++++++++ gcc/fortran/trans-openmp.cc | 6 ++++ gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 3 +- gcc/tree-core.h | 6 ++++ gcc/tree-pretty-print.cc | 4 +++ gcc/tree.h | 4 +++ 8 files changed, 73 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 755469185a6..c6f58341cf3 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1829,6 +1829,7 @@ typedef struct gfc_symbol gfc_array_spec *as; struct gfc_symbol *result; /* function result symbol */ gfc_component *components; /* Derived type components */ + gfc_omp_namelist *omp_allocated, *omp_allocated_end; /* Defined only for Cray pointees; points to their pointer. */ struct gfc_symbol *cp_pointer; diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 38003890bb0..4c94bc763b5 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -6057,6 +6057,7 @@ gfc_match_omp_allocate (void) new_st.op = EXEC_OMP_ALLOCATE; new_st.ext.omp_clauses = c; + new_st.resolved_sym = NULL; gfc_free_expr (allocator); return MATCH_YES; } @@ -9548,6 +9549,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns) } } +static void +prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc) +{ + gfc_symbol *proc = cn->sym->ns->proc_name; + gfc_omp_namelist *p, *n; + + for (n = cn; n; n = n->next) + { + if (n->sym->attr.allocatable && !n->sym->attr.save + && !n->sym->attr.result && !proc->attr.is_main_program) + { + p = gfc_get_omp_namelist (); + p->sym = n->sym; + p->expr = gfc_copy_expr (n->expr); + p->where = loc; + p->next = NULL; + if (proc->omp_allocated == NULL) + proc->omp_allocated_end = proc->omp_allocated = p; + else + { + proc->omp_allocated_end->next = p; + proc->omp_allocated_end = p; + } + + } + } +} + static void 
check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al, gfc_namespace *ns, locus loc) @@ -9678,6 +9707,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns) code->loc); } } + prepare_omp_allocated_var_list_for_cleanup (cn, code->loc); } diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 6493cc2f6b1..326365f22fc 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -4588,6 +4588,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block) } } + /* Generate a dummy allocate pragma with free kind so that cleanup + of those variables which were allocated using the allocate statement + associated with an allocate clause happens correctly. */ + + if (proc_sym->omp_allocated) + { + gfc_clear_new_st (); + new_st.op = EXEC_OMP_ALLOCATE; + gfc_omp_clauses *c = gfc_get_omp_clauses (); + c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated; + new_st.ext.omp_clauses = c; + /* This is just a hacky way to convey to handler that we are + dealing with cleanup here. Saves us from using another field + for it. */ + new_st.resolved_sym = proc_sym->omp_allocated->sym; + gfc_add_init_cleanup (block, NULL, + gfc_trans_omp_directive (&new_st)); + gfc_free_omp_clauses (c); + proc_sym->omp_allocated = NULL; + } /* Initialize the INTENT(OUT) derived type dummy arguments. 
This should be done here so that the offsets and lbounds of arrays diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 3ee63e416ed..ab3c0c620b7 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -5019,6 +5019,12 @@ gfc_trans_omp_allocate (gfc_code *code) OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses, code->loc, false, true); + if (code->next == NULL && code->block == NULL + && code->resolved_sym != NULL) + OMP_ALLOCATE_KIND_FREE (stmt) = 1; + else + OMP_ALLOCATE_KIND_ALLOCATE (stmt) = 1; + gfc_add_expr_to_block (&block, stmt); gfc_merge_block_scope (&block); return gfc_finish_block (&block); diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 index 2de2b52ee44..0eb35178e03 100644 --- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -69,4 +69,5 @@ end type allocate(pii, parr(5)) end subroutine -! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } } +! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } } +! 
{ dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 774bf0d7658..b0d5c074552 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1257,6 +1257,9 @@ struct GTY(()) tree_base { EXPR_LOCATION_WRAPPER_P in NON_LVALUE_EXPR, VIEW_CONVERT_EXPR + OMP_ALLOCATE_KIND_ALLOCATE in + OMP_ALLOCATE + private_flag: TREE_PRIVATE in @@ -1283,6 +1286,9 @@ struct GTY(()) tree_base { ENUM_IS_OPAQUE in ENUMERAL_TYPE + OMP_ALLOCATE_KIND_FREE in + OMP_ALLOCATE + protected_flag: TREE_PROTECTED in diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 4d21babbd34..23dd45de556 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -3541,6 +3541,10 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, case OMP_ALLOCATE: pp_string (pp, "#pragma omp allocate "); + if (OMP_ALLOCATE_KIND_ALLOCATE (node)) + pp_string (pp, "(kind=allocate) "); + else if (OMP_ALLOCATE_KIND_FREE (node)) + pp_string (pp, "(kind=free) "); dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags); break; diff --git a/gcc/tree.h b/gcc/tree.h index b2575c18693..1b67505f974 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1467,6 +1467,10 @@ class auto_suppress_location_wrappers TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0) #define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0) +#define OMP_ALLOCATE_KIND_ALLOCATE(NODE) \ + (OMP_ALLOCATE_CHECK (NODE)->base.public_flag) +#define OMP_ALLOCATE_KIND_FREE(NODE) \ + (OMP_ALLOCATE_CHECK (NODE)->base.private_flag) #define OMP_PARALLEL_BODY(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0) #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1) From patchwork Thu Jul 7 10:34:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55833 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: 
patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CACC1382E81C for ; Thu, 7 Jul 2022 10:38:15 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 5B41E383F969 for ; Thu, 7 Jul 2022 10:37:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B41E383F969 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="78448958" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:16 -0800 IronPort-SDR: OJkQtFoiTcsRZP1TmIlZCK5bjbpXzqj+/X0wZUEFkKoNxHpC7FxfMmwhB9wbBwmk03+JfoV1Dn 2luaPbzAitG225lbfRDCzySwjg+DvxoSW+P0yRtTW31ZHY7N+o86AKBabAMDZ21QPSbTB3ZuaI o7T+nK0V4WfgQJBdx0zNSgGrF/R4RKQiHzt67/4w7bN6fKbgz1HR4WlMhQBejFZwt+IQUQ6dxL s8fZ6NGQfBY+N0ZQjFK814gpi9OySzE83e/xODO04/LHaw07cCeNWwzQgJ9bHouz6a/eYS6M6I tKA= From: Andrew Stubbs To: Subject: [PATCH 13/17] Gimplify allocate directive (OpenMP 5.0). 
Date: Thu, 7 Jul 2022 11:34:44 +0100 Message-ID: <9e2ae3ebed095b6a0b3f57fc93c5ee8f8f3d0a45.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE. * gimple-pretty-print.cc (dump_gimple_omp_allocate): New function. (pp_gimple_stmt_1): Call it. * gimple.cc (gimple_build_omp_allocate): New function. * gimple.def (GIMPLE_OMP_ALLOCATE): New node. * gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK, GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE. (struct gomp_allocate): New. (is_a_helper <gomp_allocate *>::test): New. (is_a_helper <const gomp_allocate *>::test): New. (gimple_build_omp_allocate): Declare. (gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with GIMPLE_OMP_ALLOCATE. (gimple_omp_allocate_set_clauses): New. (gimple_omp_allocate_set_kind): Likewise. (gimple_omp_allocate_clauses): Likewise. (gimple_omp_allocate_kind): Likewise. (CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE. * gimplify.cc (gimplify_omp_allocate): New. (gimplify_expr): Call it. * gsstruct.def (GSS_OMP_ALLOCATE): Define. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. 
--- gcc/doc/gimple.texi | 38 +++++++++++- gcc/gimple-pretty-print.cc | 37 ++++++++++++ gcc/gimple.cc | 12 ++++ gcc/gimple.def | 6 ++ gcc/gimple.h | 60 ++++++++++++++++++- gcc/gimplify.cc | 19 ++++++ gcc/gsstruct.def | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 4 +- 8 files changed, 173 insertions(+), 4 deletions(-) diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index dd9149377f3..67b9061f3a7 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and + gomp_continue | layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE | + + gomp_allocate + | layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE + | + gomp_atomic_load | layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD | @@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set. @item @code{GIMPLE_GOTO} @tab x @tab x @item @code{GIMPLE_LABEL} @tab x @tab x @item @code{GIMPLE_NOP} @tab x @tab x +@item @code{GIMPLE_OMP_ALLOCATE} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_LOAD} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_STORE} @tab x @tab x @item @code{GIMPLE_OMP_CONTINUE} @tab x @tab x @@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}. * @code{GIMPLE_LABEL}:: * @code{GIMPLE_GOTO}:: * @code{GIMPLE_NOP}:: +* @code{GIMPLE_OMP_ALLOCATE}:: * @code{GIMPLE_OMP_ATOMIC_LOAD}:: * @code{GIMPLE_OMP_ATOMIC_STORE}:: * @code{GIMPLE_OMP_CONTINUE}:: @@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement. Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}. @end deftypefn +@node @code{GIMPLE_OMP_ALLOCATE} +@subsection @code{GIMPLE_OMP_ALLOCATE} +@cindex @code{GIMPLE_OMP_ALLOCATE} + +@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @ +tree clauses, int kind) +Build a @code{GIMPLE_OMP_ALLOCATE} statement. @code{CLAUSES} is the clauses +associated with this node. 
@code{KIND} is the enumeration value +@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory +or @code{GF_OMP_ALLOCATE_KIND_FREE} if it deallocates. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @ +gomp_allocate *g, tree clauses) +Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} tree gimple_omp_allocate_clauses ( @ +const gomp_allocate *g) +Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @ +gomp_allocate *g, int kind) +Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} int gimple_omp_allocate_kind ( @ +const gomp_allocate *g) +Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + @node @code{GIMPLE_OMP_ATOMIC_LOAD} @subsection @code{GIMPLE_OMP_ATOMIC_LOAD} @cindex @code{GIMPLE_OMP_ATOMIC_LOAD} @@ -1760,7 +1797,6 @@ const gomp_atomic_load *g) Get the @code{RHS} of an atomic set. 
@end deftypefn - @node @code{GIMPLE_OMP_ATOMIC_STORE} @subsection @code{GIMPLE_OMP_ATOMIC_STORE} @cindex @code{GIMPLE_OMP_ATOMIC_STORE} diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc index ebd87b20a0a..bb961a900df 100644 --- a/gcc/gimple-pretty-print.cc +++ b/gcc/gimple-pretty-print.cc @@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *buffer, const gomp_critical *gs, } } +static void +dump_gimple_omp_allocate (pretty_printer *buffer, const gomp_allocate *gs, + int spc, dump_flags_t flags) +{ + if (flags & TDF_RAW) + { + const char *kind=""; + switch (gimple_omp_allocate_kind (gs)) + { + case GF_OMP_ALLOCATE_KIND_ALLOCATE: + kind = "allocate"; + break; + case GF_OMP_ALLOCATE_KIND_FREE: + kind = "free"; + break; + } + dump_gimple_fmt (buffer, spc, flags, "%G >"); + } + else + { + pp_string (buffer, "#pragma omp allocate "); + if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_ALLOCATE) + pp_string (buffer, "(kind=allocate) "); + else if (gimple_omp_allocate_kind (gs) == GF_OMP_ALLOCATE_KIND_FREE) + pp_string (buffer, "(kind=free) "); + + dump_omp_clauses (buffer, gimple_omp_allocate_clauses (gs), spc, flags); + } +} + /* Dump a GIMPLE_OMP_ORDERED tuple on the pretty_printer BUFFER. */ static void @@ -2823,6 +2855,11 @@ pp_gimple_stmt_1 (pretty_printer *buffer, const gimple *gs, int spc, flags); break; + case GIMPLE_OMP_ALLOCATE: + dump_gimple_omp_allocate (buffer, as_a (gs), spc, + flags); + break; + case GIMPLE_CATCH: dump_gimple_catch (buffer, as_a (gs), spc, flags); break; diff --git a/gcc/gimple.cc b/gcc/gimple.cc index 9b156399ba1..a8b29f85d3d 100644 --- a/gcc/gimple.cc +++ b/gcc/gimple.cc @@ -1280,6 +1280,18 @@ gimple_build_omp_atomic_store (tree val, enum omp_memory_order mo) return p; } +/* Build a GIMPLE_OMP_ALLOCATE statement. 
*/ + +gomp_allocate * +gimple_build_omp_allocate (tree clauses, int kind) +{ + gomp_allocate *p + = as_a (gimple_alloc (GIMPLE_OMP_ALLOCATE, 0)); + gimple_omp_allocate_set_clauses (p, clauses); + gimple_omp_allocate_set_kind (p, kind); + return p; +} + /* Build a GIMPLE_TRANSACTION statement. */ gtransaction * diff --git a/gcc/gimple.def b/gcc/gimple.def index 296c73c2d52..079565c3920 100644 --- a/gcc/gimple.def +++ b/gcc/gimple.def @@ -388,6 +388,12 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT) CHILD_FN and DATA_ARG like for GIMPLE_OMP_PARALLEL. */ DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_PARALLEL_LAYOUT) +/* GIMPLE_OMP_ALLOCATE represents + #pragma omp allocate + CLAUSES is an OMP_CLAUSE chain holding the associated clauses which hold + variables to be allocated. */ +DEFGSCODE(GIMPLE_OMP_ALLOCATE, "gimple_omp_allocate", GSS_OMP_ALLOCATE) + /* GIMPLE_OMP_ORDERED represents #pragma omp ordered. BODY is the sequence of statements to execute in the ordered section. CLAUSES is an OMP_CLAUSE chain holding the associated clauses. */ diff --git a/gcc/gimple.h b/gcc/gimple.h index 1d15ff98ac2..aa0ae4078ad 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -150,6 +150,9 @@ enum gf_mask { GF_CALL_BY_DESCRIPTOR = 1 << 10, GF_CALL_NOCF_CHECK = 1 << 11, GF_CALL_FROM_NEW_OR_DELETE = 1 << 12, + GF_OMP_ALLOCATE_KIND_MASK = (1 << 2) - 1, + GF_OMP_ALLOCATE_KIND_ALLOCATE = 1, + GF_OMP_ALLOCATE_KIND_FREE = 2, GF_OMP_PARALLEL_COMBINED = 1 << 0, GF_OMP_TASK_TASKLOOP = 1 << 0, GF_OMP_TASK_TASKWAIT = 1 << 1, @@ -796,6 +799,17 @@ struct GTY((tag("GSS_OMP_ATOMIC_LOAD"))) tree rhs, lhs; }; +/* GSS_OMP_ALLOCATE. */ + +struct GTY((tag("GSS_OMP_ALLOCATE"))) + gomp_allocate : public gimple +{ + /* [ WORD 1-6 ] : base class */ + + /* [ WORD 7 ] */ + tree clauses; +}; + /* GIMPLE_OMP_ATOMIC_STORE. See note on GIMPLE_OMP_ATOMIC_LOAD. 
*/ @@ -1129,6 +1143,14 @@ is_a_helper ::test (gimple *gs) return gs->code == GIMPLE_OMP_ATOMIC_STORE; } +template <> +template <> +inline bool +is_a_helper <gomp_allocate *>::test (gimple *gs) +{ + return gs->code == GIMPLE_OMP_ALLOCATE; +} + template <> template <> inline bool @@ -1371,6 +1393,14 @@ is_a_helper ::test (const gimple *gs) return gs->code == GIMPLE_OMP_ATOMIC_STORE; } +template <> +template <> +inline bool +is_a_helper <const gomp_allocate *>::test (const gimple *gs) +{ + return gs->code == GIMPLE_OMP_ALLOCATE; +} + template <> template <> inline bool @@ -1572,6 +1602,7 @@ gomp_sections *gimple_build_omp_sections (gimple_seq, tree); gimple *gimple_build_omp_sections_switch (void); gomp_single *gimple_build_omp_single (gimple_seq, tree); gomp_target *gimple_build_omp_target (gimple_seq, int, tree); +gomp_allocate *gimple_build_omp_allocate (tree, int); gomp_teams *gimple_build_omp_teams (gimple_seq, tree); gomp_atomic_load *gimple_build_omp_atomic_load (tree, tree, enum omp_memory_order); @@ -2312,7 +2343,7 @@ static inline unsigned gimple_omp_subcode (const gimple *s) { gcc_gimple_checking_assert (gimple_code (s) >= GIMPLE_OMP_ATOMIC_LOAD - && gimple_code (s) <= GIMPLE_OMP_TEAMS); + && gimple_code (s) <= GIMPLE_OMP_ALLOCATE); return s->subcode; } @@ -6365,6 +6396,30 @@ gimple_omp_sections_set_control (gimple *gs, tree control) omp_sections_stmt->control = control; } +static inline void +gimple_omp_allocate_set_clauses (gomp_allocate *gs, tree c) +{ + gs->clauses = c; +} + +static inline void +gimple_omp_allocate_set_kind (gomp_allocate *gs, int kind) +{ + gs->subcode = (gs->subcode & ~GF_OMP_ALLOCATE_KIND_MASK) + | (kind & GF_OMP_ALLOCATE_KIND_MASK); +} + +static inline tree +gimple_omp_allocate_clauses (const gomp_allocate *gs) +{ + return gs->clauses; +} + +static inline int +gimple_omp_allocate_kind (const gomp_allocate *gs) +{ + return (gimple_omp_subcode (gs) & GF_OMP_ALLOCATE_KIND_MASK); +} /* Set the value being stored in an atomic store. 
*/ @@ -6648,7 +6703,8 @@ gimple_return_set_retval (greturn *gs, tree retval) case GIMPLE_OMP_RETURN: \ case GIMPLE_OMP_ATOMIC_LOAD: \ case GIMPLE_OMP_ATOMIC_STORE: \ - case GIMPLE_OMP_CONTINUE + case GIMPLE_OMP_CONTINUE: \ + case GIMPLE_OMP_ALLOCATE static inline bool is_gimple_omp (const gimple *stmt) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 04990ad91a6..1119ee3bc42 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -14356,6 +14356,21 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) *expr_p = NULL_TREE; } +static void +gimplify_omp_allocate (tree *expr_p, gimple_seq *pre_p) +{ + tree expr = *expr_p; + int kind; + if (OMP_ALLOCATE_KIND_ALLOCATE (expr)) + kind = GF_OMP_ALLOCATE_KIND_ALLOCATE; + else + kind = GF_OMP_ALLOCATE_KIND_FREE; + gimple *stmt = gimple_build_omp_allocate (OMP_ALLOCATE_CLAUSES (expr), + kind); + gimplify_seq_add_stmt (pre_p, stmt); + *expr_p = NULL_TREE; +} + /* Gimplify the gross structure of OpenACC enter/exit data, update, and OpenMP target update constructs. 
*/ @@ -15755,6 +15770,10 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, gimplify_omp_target_update (expr_p, pre_p); ret = GS_ALL_DONE; break; + case OMP_ALLOCATE: + gimplify_omp_allocate (expr_p, pre_p); + ret = GS_ALL_DONE; + break; case OMP_SECTION: case OMP_MASTER: diff --git a/gcc/gsstruct.def b/gcc/gsstruct.def index 19e1088b718..9c7526596e8 100644 --- a/gcc/gsstruct.def +++ b/gcc/gsstruct.def @@ -50,4 +50,5 @@ DEFGSSTRUCT(GSS_OMP_SINGLE_LAYOUT, gimple_statement_omp_single_layout, false) DEFGSSTRUCT(GSS_OMP_CONTINUE, gomp_continue, false) DEFGSSTRUCT(GSS_OMP_ATOMIC_LOAD, gomp_atomic_load, false) DEFGSSTRUCT(GSS_OMP_ATOMIC_STORE_LAYOUT, gomp_atomic_store, false) +DEFGSSTRUCT(GSS_OMP_ALLOCATE, gomp_allocate, false) DEFGSSTRUCT(GSS_TRANSACTION, gtransaction, false) diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 index 0eb35178e03..6957bc55da0 100644 --- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" } module omp_lib_kinds use iso_c_binding, only: c_int, c_intptr_t @@ -71,3 +71,5 @@ end subroutine ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } } ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } } +! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } } +! 
{ dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } } From patchwork Thu Jul 7 10:34:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55831 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 04ED73838202 for ; Thu, 7 Jul 2022 10:37:50 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 136DA3844079 for ; Thu, 7 Jul 2022 10:37:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 136DA3844079 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112774" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:37:16 -0800 IronPort-SDR: 4b4n0proXL+0z8LjrW7Rl5WDJ6QgZ7MeKod31ztvsV4f7YzrB0lJ36QUs3M1//x/lfYNdNnLTM cDsIlLDyJlq+KmjndHNywszOvqXUE9uk6pXL5c9Aa9ITCLmQyrnOBpGY0hTRtxkqUCpIj7ccYl dCzEOGLqZVGkkPE/5uePdbLebvD2eVj8c4YwxukZVb0uHxNtzGmdNP7NXVEMjdYpIHPYsO0Ojq 7NlSJy7T+ZC8iT1t6OPq8UcQphBn7Quxa4r5SOR3L0gb1LjhZjnlhnGaxR1ftj+0ozvuvCg3dY kTw= From: Andrew Stubbs To: Subject: [PATCH 14/17] Lower allocate directive (OpenMP 5.0). 
Date: Thu, 7 Jul 2022 11:34:45 +0100 Message-ID: <0f75f3d2a2b5bf11ec30ed989d6237e438d94f77.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch looks for the malloc/free calls generated by an allocate statement that is associated with an allocate directive and replaces them with calls to GOMP_alloc and GOMP_free. gcc/ChangeLog: * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR. (scan_omp_allocate): New. (scan_omp_1_stmt): Call it. (lower_omp_allocate): New function. (lower_omp_1): Call it. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. * gfortran.dg/gomp/allocate-7.f90: New test. * gfortran.dg/gomp/allocate-8.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-2.f90: New test. 
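As a rough illustration of the rewrite described above (not part of the patch), the C sketch below contrasts the malloc/free pair emitted for an allocate statement with the shape of the calls after lowering. The helpers sketch_gomp_alloc and sketch_gomp_free are hypothetical stand-ins: they only mirror the argument order this patch passes to BUILT_IN_GOMP_ALLOC (alignment, size, allocator) and BUILT_IN_GOMP_FREE (pointer, allocator), they ignore the alignment and the allocator handle, and they are not libgomp's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Call counters, present only so the sketch can be checked.  */
static int sketch_alloc_calls, sketch_free_calls;

/* Hypothetical stand-in with the (alignment, size, allocator) argument
   order used when the malloc call is replaced.  Alignment and allocator
   are ignored here; plain malloc does the work.  */
static void *
sketch_gomp_alloc (size_t align, size_t size, uintptr_t allocator)
{
  (void) align;
  (void) allocator;
  sketch_alloc_calls++;
  return malloc (size);
}

/* Hypothetical stand-in with the (pointer, allocator) argument order
   used when the free call is replaced.  */
static void
sketch_gomp_free (void *ptr, uintptr_t allocator)
{
  (void) allocator;
  sketch_free_calls++;
  free (ptr);
}

/* Before lowering: "allocate (var)" amounts to a malloc call, and the
   cleanup block holds the matching free call.  */
static int
before_lowering (void)
{
  int *var = malloc (sizeof *var);
  if (var == NULL)
    return -1;
  *var = 5;
  int value = *var;
  free (var);
  return value;
}

/* After lowering: the same two calls, redirected to the allocator-aware
   entry points.  The alignment comes from the variable's type.  */
static int
after_lowering (uintptr_t allocator)
{
  int *var = sketch_gomp_alloc (_Alignof (int), sizeof *var, allocator);
  if (var == NULL)
    return -1;
  *var = 5;
  int value = *var;
  sketch_gomp_free (var, allocator);
  return value;
}
```

An allocator value of 0 in this sketch corresponds to the integer_zero_node that the lowering passes when the directive carries no allocator clause.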
--- gcc/omp-low.cc | 139 ++++++++++++++++++ gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 9 ++ gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 | 13 ++ gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 | 15 ++ .../testsuite/libgomp.fortran/allocate-2.f90 | 48 ++++++ 5 files changed, 224 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90 diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index cdadd6f0c96..7d1a2a0d795 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -1746,6 +1746,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_ALLOCATE: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE_ALIGNED: @@ -1963,6 +1964,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_FILTER: case OMP_CLAUSE__CONDTEMP_: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE__CACHE_: @@ -3033,6 +3035,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt, maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true; } +/* Scan an OpenMP allocate directive. */ + +static void +scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx) +{ + omp_context *ctx; + ctx = new_omp_context (stmt, outer_ctx); + scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx); +} + /* Scan an OpenMP sections directive. 
*/ static void @@ -4332,6 +4344,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p, insert_decl_map (&ctx->cb, var, var); } break; + case GIMPLE_OMP_ALLOCATE: + scan_omp_allocate (as_a <gomp_allocate *> (stmt), ctx); + break; default: *handled_ops_p = false; break; @@ -8768,6 +8783,125 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p) gimple_seq_add_stmt (pre_p, gimple_build_label (flabel)); } +static void +lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *ctx) +{ + gomp_allocate *st = as_a <gomp_allocate *> (gsi_stmt (*gsi_p)); + tree clauses = gimple_omp_allocate_clauses (st); + int kind = gimple_omp_allocate_kind (st); + gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE + || kind == GF_OMP_ALLOCATE_KIND_FREE); + + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + { + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR) + continue; + + bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE); + /* The allocate directives that appear in a target region must specify + an allocator clause unless a requires directive with the + dynamic_allocators clause is present in the same compilation unit. */ + if (OMP_ALLOCATE_ALLOCATOR (c) == NULL_TREE + && ((omp_requires_mask & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0) + && omp_maybe_offloaded_ctx (ctx)) + error_at (OMP_CLAUSE_LOCATION (c), "%<allocate%> directive must" + " specify an allocator here"); + + tree var = OMP_ALLOCATE_DECL (c); + + gimple_stmt_iterator gsi = *gsi_p; + for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) != GIMPLE_CALL + || (allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_MALLOC)) + || (!allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_FREE))) + continue; + const gcall *gs = as_a <const gcall *> (stmt); + tree allocator = OMP_ALLOCATE_ALLOCATOR (c) + ? 
OMP_ALLOCATE_ALLOCATOR (c) + : integer_zero_node; + if (allocate) + { + tree lhs = gimple_call_lhs (gs); + if (lhs && TREE_CODE (lhs) == SSA_NAME) + { + gimple_stmt_iterator gsi2 = gsi; + gsi_next (&gsi2); + gimple *assign = gsi_stmt (gsi2); + if (gimple_code (assign) == GIMPLE_ASSIGN) + { + lhs = gimple_assign_lhs (as_a (assign)); + if (lhs == NULL_TREE + || TREE_CODE (lhs) != COMPONENT_REF) + continue; + lhs = TREE_OPERAND (lhs, 0); + } + } + + if (lhs == var) + { + unsigned HOST_WIDE_INT ialign = 0; + tree align; + if (TYPE_P (var)) + ialign = TYPE_ALIGN_UNIT (var); + else + ialign = DECL_ALIGN_UNIT (var); + align = build_int_cst (size_type_node, ialign); + tree repl = builtin_decl_explicit (BUILT_IN_GOMP_ALLOC); + tree size = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 3, align, size, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (&gsi, g, true); + /* The malloc call has been replaced. Now see if there is + any free call due to deallocate statement and replace + that too. */ + allocate = false; + } + } + else + { + tree arg = gimple_call_arg (gs, 0); + if (arg && TREE_CODE (arg) == SSA_NAME) + { + gimple_stmt_iterator gsi2 = gsi; + gsi_prev (&gsi2); + if (!gsi_end_p (gsi2)) + { + gimple *gs = gsi_stmt (gsi2); + if (gimple_code (gs) == GIMPLE_ASSIGN) + { + const gassign *assign = as_a (gs); + tree rhs = gimple_assign_rhs1 (assign); + tree lhs = gimple_assign_lhs (assign); + if (lhs == arg && rhs + && TREE_CODE (rhs) == COMPONENT_REF) + arg = TREE_OPERAND (rhs, 0); + } + } + } + + if (arg == var) + { + tree repl = builtin_decl_explicit (BUILT_IN_GOMP_FREE); + gimple *g = gimple_build_call (repl, 2, + gimple_call_arg (gs, 0), + allocator); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (&gsi, g, true); + break; + } + } + } + } + gsi_replace (gsi_p, gimple_build_nop (), true); +} + /* A subroutine of lower_omp_single. 
Expand the simple form of a GIMPLE_OMP_SINGLE, with a copyprivate clause: @@ -14431,6 +14565,11 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) gcc_assert (ctx); lower_omp_scope (gsi_p, ctx); break; + case GIMPLE_OMP_ALLOCATE: + ctx = maybe_lookup_ctx (stmt); + gcc_assert (ctx); + lower_omp_allocate (gsi_p, ctx); + break; case GIMPLE_OMP_SINGLE: ctx = maybe_lookup_ctx (stmt); gcc_assert (ctx); diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 index 6957bc55da0..738d9936f6a 100644 --- a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -1,5 +1,6 @@ ! { dg-do compile } ! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple" } +! { dg-additional-options "-fdump-tree-omplower" } module omp_lib_kinds use iso_c_binding, only: c_int, c_intptr_t @@ -47,6 +48,7 @@ end type real, allocatable :: var3(:,:) type (my_type), allocatable :: var4 integer, pointer :: pii, parr(:) + integer, allocatable :: var character, allocatable :: str1a, str1aarr(:) character(len=5), allocatable :: str5a, str5aarr(:) @@ -67,9 +69,16 @@ end type !$omp allocate allocate(pii, parr(5)) + + ! allocate statement not associated with an allocate directive + allocate(var) end subroutine ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "original" } } ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "original" } } ! { dg-final { scan-tree-dump-times "#pragma omp allocate \\(kind=allocate\\)" 6 "gimple" } } ! { dg-final { scan-tree-dump "#pragma omp allocate \\(kind=free\\)" "gimple" } } +! { dg-final { scan-tree-dump-times "builtin_malloc" 11 "original" } } +! { dg-final { scan-tree-dump-times "builtin_free" 9 "original" } } +! { dg-final { scan-tree-dump-times "GOMP_alloc" 10 "omplower" } } +! 
{ dg-final { scan-tree-dump-times "GOMP_free" 8 "omplower" } } diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 new file mode 100644 index 00000000000..db76e901c08 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 @@ -0,0 +1,13 @@ +! { dg-do compile } + +subroutine bar(a) + implicit none + integer :: a + integer, allocatable :: var +!$omp target + !$omp allocate (var) ! { dg-error "'allocate' directive must specify an allocator here" } + allocate (var) +!$omp end target + +end subroutine + diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 new file mode 100644 index 00000000000..699a3b80878 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } + + +subroutine bar(a) + implicit none + integer :: a + integer, allocatable :: var +!$omp requires dynamic_allocators +!$omp target + !$omp allocate (var) + allocate (var) +!$omp end target + +end subroutine + diff --git a/libgomp/testsuite/libgomp.fortran/allocate-2.f90 b/libgomp/testsuite/libgomp.fortran/allocate-2.f90 new file mode 100644 index 00000000000..2219f107fe7 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/allocate-2.f90 @@ -0,0 +1,48 @@ +! { dg-do run } +! { dg-additional-sources allocate-1.c } +! 
{ dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" } + +module m + use omp_lib + use iso_c_binding + implicit none + interface + integer(c_int) function is_64bit_aligned (a) bind(C) + import :: c_int + integer :: a + end + end interface + +contains + +subroutine foo (x, y, h) + use omp_lib + integer :: x + integer :: y + integer (kind=omp_allocator_handle_kind) :: h + integer, allocatable :: var1 + + !$omp allocate (var1) allocator(h) + allocate (var1) + + if (is_64bit_aligned(var1) == 0) then + stop 19 + end if + + deallocate(var1) +end subroutine +end module m + +program main + use omp_lib + use m + type (omp_alloctrait) :: traits(2) + integer (omp_allocator_handle_kind) :: a + + traits = [omp_alloctrait (omp_atk_alignment, 64), & + omp_alloctrait (omp_atk_fallback, omp_atv_null_fb)] + a = omp_init_allocator (omp_default_mem_space, 2, traits) + if (a == omp_null_allocator) stop 1 + call foo (42, 12, a); + call omp_destroy_allocator (a); +end From patchwork Thu Jul 7 10:34:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55836 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3C1DC3955628 for ; Thu, 7 Jul 2022 10:38:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 300213838238 for ; Thu, 7 Jul 2022 10:38:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 300213838238 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; 
d="scan'208";a="81112835" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:38:12 -0800 IronPort-SDR: hEjIAa4QkQzPHJwGrx+6hzuEgV0js3fkSPeCOfuiSwjJQfbl70sdBY8V0wcFthqrh2jxnE3njD WlcxzFmfwXV+Hx+iWynepmH9t+LyqRfPo6I3DFYVI1OaUe8cpSLe4rP017e9rokM725Ukf4DU1 4dBq/0Oj0h+2IXIC76nPeTyLy9Jdnoql3A/0gVO0XaIXJNiQ1L452rIwr0azTDyNzuKZ1kDfU2 +v2ez3IeTmUftWJtlLP4GPT1kcdGLXvrurFaU9abewaDazJSQfZi4hs+ZUkqZuc517zbHuBqOc sX4= From: Andrew Stubbs To: Subject: [PATCH 15/17] amdgcn: Support XNACK mode Date: Thu, 7 Jul 2022 11:34:46 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_STOCKGEN, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instructions to early-clobber. When the port supports scheduling of s_waitcnt instructions there will be further requirements. gcc/ChangeLog: * config/gcn/gcn-hsa.h (XNACKOPT): New macro. (ASM_SPEC): Use XNACKOPT. 
* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ... (enum hsaco_attr_type): ... this, and generalize the names. (TARGET_XNACK): New macro. * config/gcn/gcn-valu.md (gather_insn_1offset): Add xnack compatible alternatives. (gather_insn_2offsets): Likewise. * config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices other than Fiji. (gcn_expand_epilogue): Remove early-clobber problems. (output_file_start): Emit xnack attributes. (gcn_hsa_declare_function_name): Obey -mxnack setting. * config/gcn/gcn.md (xnack): New attribute. (enabled): Rework to include "xnack" attribute. (*movbi): Add xnack compatible alternatives. (*mov_insn): Likewise. (*mov_insn): Likewise. (*mov_insn): Likewise. (*movti_insn): Likewise. * config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax. (sram_ecc_type): Rename to ... (hsaco_attr_type): ... this. * config/gcn/mkoffload.cc (SET_XNACK_ANY): New macro. (TEST_XNACK): Delete. (TEST_XNACK_ANY): New macro. (TEST_XNACK_ON): New macro. (main): Support the new -mxnack=on/off/any syntax. --- gcc/config/gcn/gcn-hsa.h | 3 +- gcc/config/gcn/gcn-opts.h | 10 ++-- gcc/config/gcn/gcn-valu.md | 29 ++++----- gcc/config/gcn/gcn.cc | 34 ++++++----- gcc/config/gcn/gcn.md | 113 +++++++++++++++++++++++------------- gcc/config/gcn/gcn.opt | 18 +++--- gcc/config/gcn/mkoffload.cc | 19 ++++-- 7 files changed, 140 insertions(+), 86 deletions(-) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index b3079cebb43..fd08947574f 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name); /* In HSACOv4 no attribute setting means the binary supports "any" hardware configuration. The name of the attribute also changed. */ #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc" +#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack" /* Use LLVM assembler and linker options. 
*/ #define ASM_SPEC "-triple=amdgcn--amdhsa " \ "%:last_arg(%{march=*:-mcpu=%*}) " \ "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \ - "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \ + "%{" NO_XNACK XNACKOPT "}" \ "%{" NO_SRAM_ECC SRAMOPT "} " \ "-filetype=obj" #define LINK_SPEC "--pie --export-dynamic" diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index b62dfb45f59..07ddc79cda3 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -48,11 +48,13 @@ extern enum gcn_isa { #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) -enum sram_ecc_type +#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) + +enum hsaco_attr_type { - SRAM_ECC_OFF, - SRAM_ECC_ON, - SRAM_ECC_ANY + HSACO_ATTR_OFF, + HSACO_ATTR_ON, + HSACO_ATTR_ANY }; #endif diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index abe46201344..ec114db9dd1 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -741,13 +741,13 @@ (define_expand "gather_expr" {}) (define_insn "gather_insn_1offset" - [(set (match_operand:V_ALL 0 "register_operand" "=v") + [(set (match_operand:V_ALL 0 "register_operand" "=v,&v") (unspec:V_ALL - [(plus: (match_operand: 1 "register_operand" " v") + [(plus: (match_operand: 1 "register_operand" " v, v") (vec_duplicate: - (match_operand 2 "immediate_operand" " n"))) - (match_operand 3 "immediate_operand" " n") - (match_operand 4 "immediate_operand" " n") + (match_operand 2 "immediate_operand" " n, n"))) + (match_operand 3 "immediate_operand" " n, n") + (match_operand 4 "immediate_operand" " n, n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_FLAT_P (INTVAL (operands[3])) @@ -777,7 +777,8 @@ (define_insn "gather_insn_1offset" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "xnack" "off,on")]) (define_insn "gather_insn_1offset_ds" [(set (match_operand:V_ALL 0 "register_operand" "=v") @@ -802,17 
+803,18 @@ (define_insn "gather_insn_1offset_ds" (set_attr "length" "12")]) (define_insn "gather_insn_2offsets" - [(set (match_operand:V_ALL 0 "register_operand" "=v") + [(set (match_operand:V_ALL 0 "register_operand" "=v,&v") (unspec:V_ALL [(plus: (plus: (vec_duplicate: - (match_operand:DI 1 "register_operand" "Sv")) + (match_operand:DI 1 "register_operand" "Sv,Sv")) (sign_extend: - (match_operand: 2 "register_operand" " v"))) - (vec_duplicate: (match_operand 3 "immediate_operand" " n"))) - (match_operand 4 "immediate_operand" " n") - (match_operand 5 "immediate_operand" " n") + (match_operand: 2 "register_operand" " v, v"))) + (vec_duplicate: (match_operand 3 "immediate_operand" + " n, n"))) + (match_operand 4 "immediate_operand" " n, n") + (match_operand 5 "immediate_operand" " n, n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_GLOBAL_P (INTVAL (operands[4])) @@ -831,7 +833,8 @@ (define_insn "gather_insn_2offsets" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "xnack" "off,on")]) (define_expand "scatter_store" [(match_operand:DI 0 "register_operand") diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 6fc20d3f659..4df05453604 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -170,9 +170,14 @@ gcn_option_override (void) acc_lds_size = 32768; } - /* The xnack option is a placeholder, for now. */ - if (flag_xnack) - sorry ("XNACK support"); + /* gfx803 "Fiji" does not support XNACK. */ + if (gcn_arch == PROCESSOR_FIJI) + { + if (flag_xnack == HSACO_ATTR_ON) + error ("-mxnack=on is incompatible with -march=fiji"); + /* Allow HSACO_ATTR_ANY silently because that's the default. */ + flag_xnack = HSACO_ATTR_OFF; + } } /* }}} */ @@ -3188,17 +3193,19 @@ gcn_expand_epilogue (void) /* Assume that an exit value compatible with gcn-run is expected. That is, the third input parameter is an int*. 
- We can't allocate any new registers, but the kernarg_reg is - dead after this, so we'll use that. */ + We can't allocate any new registers, but the dispatch_ptr and + kernarg_reg are dead after this, so we'll use those. */ + rtx dispatch_ptr_reg = gen_rtx_REG (DImode, cfun->machine->args.reg + [DISPATCH_PTR_ARG]); rtx kernarg_reg = gen_rtx_REG (DImode, cfun->machine->args.reg [KERNARG_SEGMENT_PTR_ARG]); rtx retptr_mem = gen_rtx_MEM (DImode, gen_rtx_PLUS (DImode, kernarg_reg, GEN_INT (16))); set_mem_addr_space (retptr_mem, ADDR_SPACE_SCALAR_FLAT); - emit_move_insn (kernarg_reg, retptr_mem); + emit_move_insn (dispatch_ptr_reg, retptr_mem); - rtx retval_mem = gen_rtx_MEM (SImode, kernarg_reg); + rtx retval_mem = gen_rtx_MEM (SImode, dispatch_ptr_reg); set_mem_addr_space (retval_mem, ADDR_SPACE_SCALAR_FLAT); emit_move_insn (retval_mem, gen_rtx_REG (SImode, SGPR_REGNO (RETURN_VALUE_REG))); @@ -5250,11 +5257,12 @@ static void output_file_start (void) { /* In HSACOv4 no attribute setting means the binary supports "any" hardware - configuration. In GCC binaries, this is true for SRAM ECC, but not - XNACK. */ - const char *xnack = (flag_xnack ? ":xnack+" : ":xnack-"); - const char *sram_ecc = (flag_sram_ecc == SRAM_ECC_ON ? ":sramecc+" - : flag_sram_ecc == SRAM_ECC_OFF ? ":sramecc-" + configuration. */ + const char *xnack = (flag_xnack == HSACO_ATTR_ON ? ":xnack+" + : flag_xnack == HSACO_ATTR_OFF ? ":xnack-" + : ""); + const char *sram_ecc = (flag_sram_ecc == HSACO_ATTR_ON ? ":sramecc+" + : flag_sram_ecc == HSACO_ATTR_OFF ? 
":sramecc-" : ""); const char *cpu; @@ -5298,7 +5306,7 @@ void gcn_hsa_declare_function_name (FILE *file, const char *name, tree) { int sgpr, vgpr; - bool xnack_enabled = false; + bool xnack_enabled = TARGET_XNACK; fputs ("\n\n", file); diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md index 033c1708e88..0f9381c9194 100644 --- a/gcc/config/gcn/gcn.md +++ b/gcc/config/gcn/gcn.md @@ -277,12 +277,19 @@ (define_attr "length" "" (define_attr "gcn_version" "gcn3,gcn5" (const_string "gcn3")) +(define_attr "xnack" "na,off,on" (const_string "na")) + (define_attr "enabled" "" - (cond [(eq_attr "gcn_version" "gcn3") (const_int 1) - (and (eq_attr "gcn_version" "gcn5") - (ne (symbol_ref "TARGET_GCN5_PLUS") (const_int 0))) - (const_int 1)] - (const_int 0))) + (cond [(and (eq_attr "gcn_version" "gcn5") + (eq (symbol_ref "TARGET_GCN5_PLUS") (const_int 0))) + (const_int 0) + (and (eq_attr "xnack" "off") + (ne (symbol_ref "TARGET_XNACK") (const_int 0))) + (const_int 0) + (and (eq_attr "xnack" "on") + (eq (symbol_ref "TARGET_XNACK") (const_int 0))) + (const_int 0)] + (const_int 1))) ; We need to be able to identify v_readlane and v_writelane with ; SGPR lane selection in order to handle "Manually Inserted Wait States". 
@@ -472,9 +479,9 @@ (define_split (define_insn "*movbi" [(set (match_operand:BI 0 "nonimmediate_operand" - "=Sg, v,Sg,cs,cV,cV,Sm,RS, v,RF, v,RM") + "=Sg, v,Sg,cs,cV,cV,Sm,&Sm,RS, v,&v,RF, v,&v,RM") (match_operand:BI 1 "gcn_load_operand" - "SSA,vSvA, v,SS, v,SS,RS,Sm,RF, v,RM, v"))] + "SSA,vSvA, v,SS, v,SS,RS, RS,Sm,RF,RF, v,RM,RM, v"))] "" { /* SCC as an operand is currently not accepted by the LLVM assembler, so @@ -501,66 +508,77 @@ (define_insn "*movbi" return "s_mov_b32\tvcc_lo, %1\;" "s_mov_b32\tvcc_hi, 0"; case 6: - return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)"; case 7: - return "s_store_dword\t%1, %A0"; + return "s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0)"; case 8: - return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0"; + return "s_store_dword\t%1, %A0"; case 9: - return "flat_store_dword\t%A0, %1%O0%g0"; case 10: - return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)"; + return "flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0"; case 11: + return "flat_store_dword\t%A0, %1%O0%g0"; + case 12: + case 13: + return "global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)"; + case 14: return "global_store_dword\t%A0, %1%O0%g0"; default: gcc_unreachable (); } } - [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,flat,flat, - flat,flat") - (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*") - (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12")]) + [(set_attr "type" "sop1,vop1,vop3a,sopk,vopc,mult,smem,smem,smem,flat,flat, + flat,flat,flat,flat") + (set_attr "exec" "*,*,none,*,*,*,*,*,*,*,*,*,*,*,*") + (set_attr "length" "4,4,4,4,4,8,12,12,12,12,12,12,12,12,12") + (set_attr "xnack" "*,*,*,*,*,*,off,on,*,off,on,*,off,on,*")]) ; 32bit move pattern (define_insn "*mov_insn" [(set (match_operand:SISF 0 "nonimmediate_operand" - "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG, v,SD, v,RM") + "=SD,SD,SD,SD,&SD,RB,Sm,&Sm,RS,v,Sg, v, v,&v,RF,v,RLRG, v,SD, v,&v,RM") (match_operand:SISF 1 "gcn_load_operand" - "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B, v,RLRG, 
Y,RM, v"))] + "SSA, J, B,RB, RB,Sm,RS, RS,Sm,v, v,Sv,RF,RF, v,B, v,RLRG, Y,RM,RM, v"))] "" "@ s_mov_b32\t%0, %1 s_movk_i32\t%0, %1 s_mov_b32\t%0, %1 s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0) + s_buffer_load%s0\t%0, s[0:3], %1\;s_waitcnt\tlgkmcnt(0) s_buffer_store%s1\t%1, s[0:3], %0 s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + s_load_dword\t%0, %A1\;s_waitcnt\tlgkmcnt(0) s_store_dword\t%1, %A0 v_mov_b32\t%0, %1 v_readlane_b32\t%0, %1, 0 v_writelane_b32\t%0, %1, 0 flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0 + flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0 flat_store_dword\t%A0, %1%O0%g0 v_mov_b32\t%0, %1 ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) s_mov_b32\t%0, %1 global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) global_store_dword\t%A0, %1%O0%g0" - [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,vop1,vop3a,vop3a,flat, - flat,vop1,ds,ds,sop1,flat,flat") - (set_attr "exec" "*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*") - (set_attr "length" "4,4,8,12,12,12,12,4,8,8,12,12,8,12,12,8,12,12")]) + [(set_attr "type" "sop1,sopk,sop1,smem,smem,smem,smem,smem,smem,vop1,vop3a, + vop3a,flat,flat,flat,vop1,ds,ds,sop1,flat,flat,flat") + (set_attr "exec" "*,*,*,*,*,*,*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*") + (set_attr "length" + "4,4,8,12,12,12,12,12,12,4,8,8,12,12,12,8,12,12,8,12,12,12") + (set_attr "xnack" + "*,*,*,off,on,*,off,on,*,*,*,*,off,on,*,*,*,*,*,off,on,*")]) ; 8/16bit move pattern ; TODO: implement combined load and zero_extend, but *only* for -msram-ecc=on (define_insn "*mov_insn" [(set (match_operand:QIHI 0 "nonimmediate_operand" - "=SD,SD,SD,v,Sg, v, v,RF,v,RLRG, v, v,RM") + "=SD,SD,SD,v,Sg, v, v,&v,RF,v,RLRG, v, v,&v,RM") (match_operand:QIHI 1 "gcn_load_operand" - "SSA, J, B,v, v,Sv,RF, v,B, v,RLRG,RM, v"))] + "SSA, J, B,v, v,Sv,RF,RF, v,B, v,RLRG,RM,RM, v"))] "gcn_valid_move_p (mode, operands[0], operands[1])" "@ s_mov_b32\t%0, %1 @@ 
-570,24 +588,27 @@ (define_insn "*mov_insn" v_readlane_b32\t%0, %1, 0 v_writelane_b32\t%0, %1, 0 flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0 + flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0 flat_store%s0\t%A0, %1%O0%g0 v_mov_b32\t%0, %1 ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) global_store%s0\t%A0, %1%O0%g0" - [(set_attr "type" - "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat") - (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*") - (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")]) + [(set_attr "type" "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,flat,vop1,ds,ds, + flat,flat,flat") + (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*") + (set_attr "length" "4,4,8,4,4,4,12,12,12,8,12,12,12,12,12") + (set_attr "xnack" "*,*,*,*,*,*,off,on,*,*,*,*,off,on,*")]) ; 64bit move pattern (define_insn_and_split "*mov_insn" [(set (match_operand:DIDF 0 "nonimmediate_operand" - "=SD,SD,SD,RS,Sm,v, v,Sg, v, v,RF,RLRG, v, v,RM") + "=SD,SD,SD,RS,Sm,&Sm,v, v,Sg, v, v,&v,RF,RLRG, v, v,&v,RM") (match_operand:DIDF 1 "general_operand" - "SSA, C,DB,Sm,RS,v,DB, v,Sv,RF, v, v,RLRG,RM, v"))] + "SSA, C,DB,Sm,RS, RS,v,DB, v,Sv,RF,RF, v, v,RLRG,RM,RM, v"))] "GET_CODE(operands[1]) != SYMBOL_REF" "@ s_mov_b64\t%0, %1 @@ -595,15 +616,18 @@ (define_insn_and_split "*mov_insn" # s_store_dwordx2\t%1, %A0 s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + s_load_dwordx2\t%0, %A1\;s_waitcnt\tlgkmcnt(0) # # # # flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0 + flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0 flat_store_dwordx2\t%A0, %1%O0%g0 ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0) global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) global_store_dwordx2\t%A0, %1%O0%g0" "reload_completed && ((!MEM_P (operands[0]) && !MEM_P 
(operands[1]) @@ -634,29 +658,33 @@ (define_insn_and_split "*mov_insn" operands[3] = inhi; } } - [(set_attr "type" "sop1,sop1,mult,smem,smem,vmult,vmult,vmult,vmult,flat, - flat,ds,ds,flat,flat") - (set_attr "length" "4,8,*,12,12,*,*,*,*,12,12,12,12,12,12")]) + [(set_attr "type" "sop1,sop1,mult,smem,smem,smem,vmult,vmult,vmult,vmult, + flat,flat,flat,ds,ds,flat,flat,flat") + (set_attr "length" "4,8,*,12,12,12,*,*,*,*,12,12,12,12,12,12,12,12") + (set_attr "xnack" "*,*,*,*,off,on,*,*,*,*,off,on,*,*,*,off,on,*")]) ; 128-bit move. (define_insn_and_split "*movti_insn" [(set (match_operand:TI 0 "nonimmediate_operand" - "=SD,RS,Sm,RF, v,v, v,SD,RM, v,RL, v") - (match_operand:TI 1 "general_operand" - "SSB,Sm,RS, v,RF,v,Sv, v, v,RM, v,RL"))] + "=SD,RS,Sm,&Sm,RF, v,&v,v, v,SD,RM, v,&v,RL, v") + (match_operand:TI 1 "general_operand" + "SSB,Sm,RS, RS, v,RF,RF,v,Sv, v, v,RM,RM, v,RL"))] "" "@ # s_store_dwordx4\t%1, %A0 s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0) + s_load_dwordx4\t%0, %A1\;s_waitcnt\tlgkmcnt(0) flat_store_dwordx4\t%A0, %1%O0%g0 flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0 + flat_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\t0 # # # global_store_dwordx4\t%A0, %1%O0%g0 global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) + global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0) ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0) ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)" "reload_completed @@ -678,10 +706,11 @@ (define_insn_and_split "*movti_insn" operands[0] = gcn_operand_part (TImode, operands[0], 0); operands[1] = gcn_operand_part (TImode, operands[1], 0); } - [(set_attr "type" "mult,smem,smem,flat,flat,vmult,vmult,vmult,flat,flat,\ - ds,ds") - (set_attr "delayeduse" "*,*,yes,*,*,*,*,*,yes,*,*,*") - (set_attr "length" "*,12,12,12,12,*,*,*,12,12,12,12")]) + [(set_attr "type" "mult,smem,smem,smem,flat,flat,flat,vmult,vmult,vmult,flat, + flat,flat,ds,ds") + (set_attr "delayeduse" "*,*,yes,yes,*,*,*,*,*,*,*,yes,*,*,*") + (set_attr "length" 
"*,12,12,12,12,12,12,*,*,*,12,12,12,12,12") + (set_attr "xnack" "*,*,off,on,*,off,on,*,*,*,*,off,on,*,*")]) ;; }}} ;; {{{ Prologue/Epilogue @@ -844,6 +873,8 @@ (define_insn "movdi_symbol" (clobber (reg:BI SCC_REG))] "GET_CODE (operands[1]) == SYMBOL_REF || GET_CODE (operands[1]) == LABEL_REF" { + /* This s_load may not be XNACK-safe on devices where the GOT may fault. + DGPUs are most likely fine. */ if (SYMBOL_REF_P (operands[1]) && SYMBOL_REF_WEAK (operands[1])) return "s_getpc_b64\t%0\;" @@ -868,6 +899,8 @@ (define_insn "movdi_symbol_save_scc" { /* !!! These sequences clobber CC_SAVE_REG. */ + /* This s_load may not be XNACK-safe on devices where the GOT may fault. + DGPUs are most likely fine. */ if (SYMBOL_REF_P (operands[1]) && SYMBOL_REF_WEAK (operands[1])) return "s_mov_b32\ts22, scc\;" diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt index 9606aaf0b1a..759f7a064c9 100644 --- a/gcc/config/gcn/gcn.opt +++ b/gcc/config/gcn/gcn.opt @@ -81,23 +81,23 @@ Wopenacc-dims Target Var(warn_openacc_dims) Warning Warn about invalid OpenACC dimensions. -mxnack -Target Var(flag_xnack) Init(0) -Compile for devices requiring XNACK enabled. Default off. - Enum -Name(sram_ecc_type) Type(enum sram_ecc_type) +Name(hsaco_attr_type) Type(enum hsaco_attr_type) SRAM-ECC modes: EnumValue -Enum(sram_ecc_type) String(off) Value(SRAM_ECC_OFF) +Enum(hsaco_attr_type) String(off) Value(HSACO_ATTR_OFF) EnumValue -Enum(sram_ecc_type) String(on) Value(SRAM_ECC_ON) +Enum(hsaco_attr_type) String(on) Value(HSACO_ATTR_ON) EnumValue -Enum(sram_ecc_type) String(any) Value(SRAM_ECC_ANY) +Enum(hsaco_attr_type) String(any) Value(HSACO_ATTR_ANY) + +mxnack= +Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_ANY) +Compile for devices requiring XNACK enabled. Default off. 
msram-ecc= -Target RejectNegative Joined ToLower Enum(sram_ecc_type) Var(flag_sram_ecc) Init(SRAM_ECC_ANY) +Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY) Compile for devices with the SRAM ECC feature enabled, or not. Default \"any\". diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index b8b3fecfcb4..cb8903c27cb 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -72,10 +72,14 @@ #define SET_XNACK_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \ | EF_AMDGPU_FEATURE_XNACK_ON_V4) +#define SET_XNACK_ANY(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \ + | EF_AMDGPU_FEATURE_XNACK_ANY_V4) #define SET_XNACK_OFF(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_XNACK_V4) \ | EF_AMDGPU_FEATURE_XNACK_OFF_V4) -#define TEST_XNACK(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ - == EF_AMDGPU_FEATURE_XNACK_ON_V4) +#define TEST_XNACK_ANY(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ + == EF_AMDGPU_FEATURE_XNACK_ANY_V4) +#define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ + == EF_AMDGPU_FEATURE_XNACK_ON_V4) #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \ | EF_AMDGPU_FEATURE_SRAMECC_ON_V4) @@ -884,9 +888,11 @@ main (int argc, char **argv) fPIC = true; else if (strcmp (argv[i], "-fpic") == 0) fpic = true; - else if (strcmp (argv[i], "-mxnack") == 0) + else if (strcmp (argv[i], "-mxnack=on") == 0) SET_XNACK_ON (elf_flags); - else if (strcmp (argv[i], "-mno-xnack") == 0) + else if (strcmp (argv[i], "-mxnack=any") == 0) + SET_XNACK_ANY (elf_flags); + else if (strcmp (argv[i], "-mxnack=off") == 0) SET_XNACK_OFF (elf_flags); else if (strcmp (argv[i], "-msram-ecc=on") == 0) SET_SRAM_ECC_ON (elf_flags); @@ -1045,8 +1051,9 @@ main (int argc, char **argv) obstack_ptr_grow (&ld_argv_obstack, gcn_s2_name); obstack_ptr_grow (&ld_argv_obstack, "-lgomp"); obstack_ptr_grow (&ld_argv_obstack, - (TEST_XNACK (elf_flags) - ? 
"-mxnack" : "-mno-xnack")); + (TEST_XNACK_ON (elf_flags) ? "-mxnack=on" + : TEST_XNACK_ANY (elf_flags) ? "-mxnack=any" + : "-mxnack=off")); obstack_ptr_grow (&ld_argv_obstack, (TEST_SRAM_ECC_ON (elf_flags) ? "-msram-ecc=on" : TEST_SRAM_ECC_ANY (elf_flags) ? "-msram-ecc=any" From patchwork Thu Jul 7 10:34:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55837 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9A3E9385274D for ; Thu, 7 Jul 2022 10:38:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id B88AB3839C77 for ; Thu, 7 Jul 2022 10:38:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B88AB3839C77 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112839" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:38:15 -0800 IronPort-SDR: iUlIJXuB2Xhm3SEGu51/kG539DbAZlXkrOoKr0udZMPfn/yf2i4JGV/AvpvTul823mj+2q8emx +lotecVOiRmjkBoa8Bn7QkIeRlhD8Cwi5z2vJoPOuUT/k3f+3nd5EaZmqf00S2D99Y112del52 rma8UCPJNNYTzj1PJnp0IxcJfcsKq/x00yK3lsOF8x8UbUSs9Du/NEEJ1Y1j0qad/1pAnDhlGw ZxZoTNkoaR77EiuoA/HiGTGj+t6K+6nJgveKGK97hpsJjPVgEHTvelmOm/QMIUCICzXYnPjCDA 4w8= From: Andrew Stubbs To: Subject: [PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK Date: Thu, 7 Jul 2022 11:34:47 +0100 Message-ID: X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] 
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches"

The AMD GCN runtime must be set to the correct mode for Unified Shared
Memory to work, but the required mode is not always known at compile and
link time, due to the split nature of the offload compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that
the information is passed all the way to the backend.  The backend then
places a marker in the assembler code for mkoffload to find.  Finally,
mkoffload places a constructor function into the final program to ensure
that the HSA_XNACK environment variable passes the correct mode to the GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded,
so it makes more sense to set it within the constructor than at some later
point within libgomp or the GCN plugin.

gcc/ChangeLog:

	* config/gcn/gcn.cc (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.cc (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.cc (create_omp_child_function): Add attribute "omp unified
	memory".
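
For readers following the description above, the behaviour of the emitted constructor can be sketched in plain C. This is only an illustration under stated assumptions, not the code that mkoffload generates: the function name `configure_xnack_sketch`, its `wanted` parameter, the return-code convention, and in particular the compatibility test (a plain string comparison) are all hypothetical. The real constructor takes no arguments, calls exit on a conflict, and uses the check found in the mkoffload.cc hunk.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv/unsetenv */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the generated constructor.  WANTED is the
   XNACK mode the program needs, "0" or "1".  Returns 0 if HSA_XNACK is
   (now) set compatibly, 1 if the user's existing setting conflicts.  */
static int
configure_xnack_sketch (const char *wanted)
{
  assert (wanted != NULL);

  const char *val = getenv ("HSA_XNACK");

  /* Unset or empty: nothing to override, so install the wanted mode.  */
  if (!val || val[0] == '\0')
    return setenv ("HSA_XNACK", wanted, 1) == 0 ? 0 : 1;

  /* Assumed compatibility rule: the user's value must match exactly.  */
  if (strcmp (val, wanted) != 0)
    {
      fprintf (stderr, "error: HSA_XNACK=%s is incompatible; "
	       "please unset\n", val);
      return 1;
    }

  /* A compatible user value is left untouched.  */
  return 0;
}
```

In a real program the equivalent logic would run from a function marked `__attribute__((constructor))`, so that the variable is in place before the HSA runtime is loaded.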
--- gcc/config/gcn/gcn.cc | 28 +++++++++++++++++++++++++++- gcc/config/gcn/mkoffload.cc | 37 ++++++++++++++++++++++++++++++++++++- gcc/omp-low.cc | 4 ++++ 3 files changed, 67 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 4df05453604..88cc505597e 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0; enum gcn_isa gcn_isa = ISA_GCN3; /* Default to GCN3. */ +/* Record whether the host compiler added "omp unified memory" attributes to + any functions. We can then pass this on to mkoffload to ensure xnack is + compatible there too. */ +static bool unified_shared_memory_enabled = false; + /* Reserve this much space for LDS (for propagating variables from worker-single mode to worker-partitioned mode), per workgroup. Global analysis could calculate an exact bound, but we don't do that yet. @@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ , if (!caller && cfun->machine->normal_function) gcn_detect_incoming_pointer_arg (fndecl); + if (fndecl && lookup_attribute ("omp unified memory", + DECL_ATTRIBUTES (fndecl))) + { + unified_shared_memory_enabled = true; + + switch (gcn_arch) + { + case PROCESSOR_FIJI: + case PROCESSOR_VEGA10: + case PROCESSOR_VEGA20: + error ("GPU architecture does not support Unified Shared Memory"); + default: + ; + } + + if (flag_xnack == HSACO_ATTR_OFF) + error ("Unified Shared Memory is enabled, but XNACK is disabled"); + } + reinit_regs (); } @@ -5458,12 +5482,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) assemble_name (file, name); fputs (":\n", file); - /* This comment is read by mkoffload. */ + /* These comments are read by mkoffload. 
*/ if (flag_openacc) fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n", oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name); + if (unified_shared_memory_enabled) + fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n"); } /* Implement TARGET_ASM_SELECT_SECTION. diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index cb8903c27cb..5741d0a917b 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -80,6 +80,8 @@ == EF_AMDGPU_FEATURE_XNACK_ANY_V4) #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ == EF_AMDGPU_FEATURE_XNACK_ON_V4) +#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ + == EF_AMDGPU_FEATURE_XNACK_OFF_V4) #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \ | EF_AMDGPU_FEATURE_SRAMECC_ON_V4) @@ -474,6 +476,7 @@ static void process_asm (FILE *in, FILE *out, FILE *cfile) { int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0; + bool unified_shared_memory_enabled = false; struct obstack fns_os, dims_os, regcounts_os; obstack_init (&fns_os); obstack_init (&dims_os); @@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile) fn_count += 2; char buf[1000]; + char dummy; enum { IN_CODE, IN_METADATA, @@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile) dims_count++; } + if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0) + unified_shared_memory_enabled = true; + break; } case IN_METADATA: @@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile) } } - char dummy; if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0) { state = IN_VARS; @@ -617,6 +623,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile) fprintf (cfile, "#include \n"); fprintf (cfile, "#include \n"); fprintf (cfile, "#include \n\n"); + fprintf (cfile, "#include \n\n"); fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", 
var_count); @@ -657,6 +664,34 @@ process_asm (FILE *in, FILE *out, FILE *cfile) } fprintf (cfile, "\n};\n\n"); + /* Emit a constructor function to set the HSA_XNACK environment variable. + This must be done before the ROCr runtime library is loaded. + We never override a user value (except empty string), but we do emit a + useful diagnostic in the wrong mode (the ROCr message is not good). */ + if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled) + fatal_error (input_location, + "conflicting settings; XNACK is forced off but Unified " + "Shared Memory is on"); + if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled) + fprintf (cfile, + "static __attribute__((constructor))\n" + "void configure_xnack (void)\n" + "{\n" + " const char *val = getenv (\"HSA_XNACK\");\n" + " if (!val || val[0] == '\\0')\n" + " setenv (\"HSA_XNACK\", \"%d\", true);\n" + " else if (%s)\n" + " {\n" + " fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; " + "please unset\\n\", val);\n" + " exit (1);\n" + " }\n" + "}\n\n", + unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags), + (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags) + ?
"val[0] != '1' || val[1] != '\\0'" + : "val[0] == '1' && val[1] == '\\0'")); + obstack_free (&fns_os, NULL); for (i = 0; i < dims_count; i++) free (dims[i].name); diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 7d1a2a0d795..239446beb52 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -2107,6 +2107,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy) DECL_ATTRIBUTES (decl) = tree_cons (get_identifier (target_attr), NULL_TREE, DECL_ATTRIBUTES (decl)); + if (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED) + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("omp unified memory"), + NULL_TREE, DECL_ATTRIBUTES (decl)); } t = build_decl (DECL_SOURCE_LOCATION (decl), From patchwork Thu Jul 7 10:34:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55835 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EA0883852766 for ; Thu, 7 Jul 2022 10:38:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 4B5953851162 for ; Thu, 7 Jul 2022 10:38:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4B5953851162 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,252,1650960000"; d="scan'208";a="81112840" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 07 Jul 2022 02:38:17 -0800 IronPort-SDR: hsmDtHG4c9ob87hnU+MLMl6M5znrCHpz//VKymun+IF49fgdI9s6nfDa9Z1iY69+oYeLjmPZGy Hn6oRaMHaKtBUlylJRrs3Xr+Rd4zdecF6ajEDKme25VTSEXfw0vqDWMZ5Q41I7SHWk4LkRLITN 
EfqY7oexLJiyvc02946oNiPeSux1F8H/ImUQk+bmMBw96mgj5Gm+6z9PMY41s0XuZpITJs9LfX CAxE4WdXlIbOQkMDl3zQm2jSWLx8Jj6bbK6W/viICJq6RToiXN10RqR7JHIOq2dOFa24GohOHG arU= From: Andrew Stubbs To: Subject: [PATCH 17/17] amdgcn: libgomp plugin USM implementation Date: Thu, 7 Jul 2022 11:34:48 +0100 Message-ID: <9d8aca9014fe40a76e1f389daf94351b522ab73b.1657188329.git.ams@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Implement the Unified Shared Memory API calls in the GCN plugin. The allocate and free are pretty straight-forward because all "target" memory allocations are compatible with USM, on the right hardware. However, there's no known way to check what memory region was used, after the fact, so we use a splay tree to record allocations so we can answer "is_usm_ptr" later. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (struct usm_splay_tree_key_s): New. (usm_splay_compare): New. (splay_tree_prefix): New. (GOMP_OFFLOAD_usm_alloc): New. (GOMP_OFFLOAD_usm_free): New. (GOMP_OFFLOAD_is_usm_ptr): New. (GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold. 
Add GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (gomp_fatal): New. (splay_tree_c): New. * testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New. * testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target. * testsuite/libgomp.c-c++-common/requires-1.c: Likewise. * testsuite/libgomp.c/usm-1.c: Likewise. * testsuite/libgomp.c/usm-2.c: Likewise. * testsuite/libgomp.c/usm-3.c: Likewise. * testsuite/libgomp.c/usm-4.c: Likewise. * testsuite/libgomp.c/usm-5.c: Likewise. * testsuite/libgomp.c/usm-6.c: Likewise. --- libgomp/plugin/plugin-gcn.c | 104 +++++++++++++++++- libgomp/testsuite/lib/libgomp.exp | 22 ++++ libgomp/testsuite/libgomp.c++/usm-1.C | 2 +- .../libgomp.c-c++-common/requires-1.c | 1 + libgomp/testsuite/libgomp.c/usm-1.c | 1 + libgomp/testsuite/libgomp.c/usm-2.c | 1 + libgomp/testsuite/libgomp.c/usm-3.c | 1 + libgomp/testsuite/libgomp.c/usm-4.c | 1 + libgomp/testsuite/libgomp.c/usm-5.c | 2 +- libgomp/testsuite/libgomp.c/usm-6.c | 2 +- 10 files changed, 133 insertions(+), 4 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index ea327bf2ca0..6a9ff5cd93e 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3226,7 +3226,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) if (!init_hsa_context ()) return 0; /* Return -1 if no omp_requires_mask cannot be fulfilled but - devices were present. */ + devices were present. + Note: not all devices support USM, but the compiler refuses to create + binaries for those that don't anyway. */ + omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY); if (hsa_context.agent_count > 0 && omp_requires_mask != 0) return -1; return hsa_context.agent_count; @@ -3810,6 +3814,89 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, GOMP_PLUGIN_target_task_completion, async_data); } +/* Use a splay tree to track USM allocations. 
*/ + +typedef struct usm_splay_tree_node_s *usm_splay_tree_node; +typedef struct usm_splay_tree_s *usm_splay_tree; +typedef struct usm_splay_tree_key_s *usm_splay_tree_key; + +struct usm_splay_tree_key_s { + void *addr; + size_t size; +}; + +static inline int +usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y) +{ + if ((x->addr <= y->addr && x->addr + x->size > y->addr) + || (y->addr <= x->addr && y->addr + y->size > x->addr)) + return 0; + + return (x->addr > y->addr ? 1 : -1); +} + +#define splay_tree_prefix usm +#include "../splay-tree.h" + +static struct usm_splay_tree_s usm_map = { NULL }; + +/* Allocate memory suitable for Unified Shared Memory. + + In fact, AMD memory need only be "coarse grained", which target + allocations already are. We do need to track allocations so that + GOMP_OFFLOAD_is_usm_ptr can look them up. */ + +void * +GOMP_OFFLOAD_usm_alloc (int device, size_t size) +{ + void *ptr = GOMP_OFFLOAD_alloc (device, size); + + usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s)); + node->key.addr = ptr; + node->key.size = size; + node->left = NULL; + node->right = NULL; + usm_splay_tree_insert (&usm_map, node); + + return ptr; +} + +/* Free memory allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_usm_free (int device, void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key); + if (node) + { + usm_splay_tree_remove (&usm_map, &key); + free (node); + } + + return GOMP_OFFLOAD_free (device, ptr); +} + +/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_is_usm_ptr (void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + return usm_splay_tree_lookup (&usm_map, &key); +} + +/* Indicate which GOMP_REQUIRES_* features are supported. 
*/ + +bool +GOMP_OFFLOAD_supported_features (unsigned int *mask) +{ + *mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY); + + return (*mask == 0); +} + /* }}} */ /* {{{ OpenACC Plugin API */ @@ -4084,3 +4171,18 @@ GOMP_OFFLOAD_openacc_destroy_thread_data (void *data) } /* }}} */ +/* {{{ USM splay tree */ + +/* Include this now so that splay-tree.c doesn't include it later. This + avoids a conflict with splay_tree_prefix. */ +#include "libgomp.h" + +/* This allows splay-tree.c to call gomp_fatal in this context. The splay + tree code doesn't use the variadic arguments right now. */ +#define gomp_fatal(MSG, ...) GOMP_PLUGIN_fatal (MSG) + +/* Include the splay tree code inline, with the prefixes added. */ +#define splay_tree_prefix usm +#define splay_tree_c +#include "../splay-tree.h" +/* }}} */ diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index 891f90929d2..dce1af279e1 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -536,3 +536,25 @@ int main() { return 0; } } "-lcuda -lcudart" ] } + +# return 1 if OpenMP Unified Shared Memory is supported + +proc check_effective_target_omp_usm { } { + if { [libgomp_check_effective_target_offload_target "nvptx"] } { + return 1 + } + + if { [libgomp_check_effective_target_offload_target "amdgcn"] } { + return [check_no_compiler_messages omp_usm executable { + #pragma omp requires unified_shared_memory + int main () { + #pragma omp target + ; + return 0; + } + }] + } + + return 0 +} + diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C index fea25e5f10b..6e88f90d61f 100644 --- a/libgomp/testsuite/libgomp.c++/usm-1.C +++ b/libgomp/testsuite/libgomp.c++/usm-1.C @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-skip-if "Only valid for nvptx" { !
offload_target_nvptx } } */ +/* { dg-require-effective-target omp_usm } */ #include #pragma omp requires unified_shared_memory diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c index fedf9779769..b760d5ebaf7 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c @@ -1,5 +1,6 @@ /* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */ /* { dg-additional-sources requires-1-aux.c } */ +/* { dg-require-effective-target omp_usm } */ /* Check diagnostic by device-compiler's lto1. Other file uses: 'requires unified_address'. */ diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c index 1b35f19c45b..e73f1816f9a 100644 --- a/libgomp/testsuite/libgomp.c/usm-1.c +++ b/libgomp/testsuite/libgomp.c/usm-1.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target omp_usm } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c index 689cee7e456..31f2bae7145 100644 --- a/libgomp/testsuite/libgomp.c/usm-2.c +++ b/libgomp/testsuite/libgomp.c/usm-2.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target omp_usm } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c index 2ca66afe93f..2c78a0d8ced 100644 --- a/libgomp/testsuite/libgomp.c/usm-3.c +++ b/libgomp/testsuite/libgomp.c/usm-3.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target omp_usm } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c index 753908c8440..1ac5498f73f 100644 --- a/libgomp/testsuite/libgomp.c/usm-4.c +++ b/libgomp/testsuite/libgomp.c/usm-4.c @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-effective-target omp_usm } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/usm-5.c 
b/libgomp/testsuite/libgomp.c/usm-5.c index 4d8b3cf71b1..563397f941a 100644 --- a/libgomp/testsuite/libgomp.c/usm-5.c +++ b/libgomp/testsuite/libgomp.c/usm-5.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target omp_usm } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c index c207140092a..bd14f8197b3 100644 --- a/libgomp/testsuite/libgomp.c/usm-6.c +++ b/libgomp/testsuite/libgomp.c/usm-6.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */ +/* { dg-require-effective-target omp_usm } */ #include #include
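A note on usm_splay_compare in the plugin-gcn.c hunk above: it returns 0 whenever two address ranges overlap, so a one-byte probe key { ptr, 1 } matches any recorded allocation that contains ptr. The following is a minimal sketch of that idea, assuming a plain array scan in place of the splay tree and uintptr_t in place of void * arithmetic. The names range_key, range_compare and range_contains are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct usm_splay_tree_key_s.  */
struct range_key
{
  uintptr_t addr;
  size_t size;
};

/* Same shape as usm_splay_compare: overlapping ranges compare equal,
   otherwise order by start address.  */
int
range_compare (const struct range_key *x, const struct range_key *y)
{
  if ((x->addr <= y->addr && x->addr + x->size > y->addr)
      || (y->addr <= x->addr && y->addr + y->size > x->addr))
    return 0;

  return x->addr > y->addr ? 1 : -1;
}

/* Linear scan used here only to keep the sketch short; the patch uses
   libgomp's splay tree with the comparison above.  */
bool
range_contains (const struct range_key *table, size_t n, uintptr_t ptr)
{
  struct range_key probe = { ptr, 1 };
  for (size_t i = 0; i < n; i++)
    if (range_compare (&probe, &table[i]) == 0)
      return true;
  return false;
}
```

Because recorded allocations never overlap one another, treating overlap as equality still gives a consistent ordering for the tree, and interior pointers resolve to the allocation that contains them.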