From patchwork Thu Jan 27 16:41:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 50492
Date: Thu, 27 Jan 2022 16:41:26 +0000
From: Andrew Stubbs
Subject: [PATCH] openmp, nvptx: low-lat memory access traits
To: "gcc-patches@gcc.gnu.org"

This patch adjusts the NVPTX low-latency allocator that I have
previously posted (awaiting re-review). The patch assumes that all my
previously posted patches are applied already.

Given that any memory allocated from the low-latency memory space cannot
support the "access=all" allocator trait (because the hardware does not
provide any means to do so) it seems reasonable that omp_alloc should
fail (or fall back) when asked for such memory.

Unfortunately the "all" setting is the default when the trait is not
specified explicitly, so the allocation must fail in that case too.

This patch implements the restriction accordingly. The validation
applies only to the NVPTX configuration, so some future implementation
for another target can do whatever it needs with "access".

Although nothing says so explicitly, this change means that the
omp_low_lat_mem_alloc predefined allocator now implies "access=pteam"
(at least on NVPTX).

OK for stage 1?

Thanks

Andrew

openmp, nvptx: low-lat memory access traits

The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".
This change means that the omp_low_lat_mem_alloc predefined allocator now
implies the "pteam" access trait.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_VALIDATE): New macro.
	(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
	(MEMSPACE_VALIDATE): New macro.
	* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
	* testsuite/libgomp.c/allocators-7.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b1f41ccc0d4..000ccc2dd9c 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -51,6 +51,9 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   (PIN ? NULL : free (ADDR))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1
+#endif
 
 /* Map the predefined allocators to the correct memory space.
    The index to this table is the omp_allocator_handle_t enum value.  */
@@ -279,6 +282,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
                         && allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -505,6 +512,10 @@ retry:
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
     goto fail;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
                         && allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
@@ -672,6 +683,10 @@ retry:
     goto fail;
   old_size = data->size;
 
+  if (allocator_data
+      && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+    goto fail;
+
   if (__builtin_expect (allocator_data
                         && allocator_data->pool_size < ~(uintptr_t) 0, 0))
     {
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index f740b97f6ac..0102680b717 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
     return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+  /* Disallow use of low-latency memory when it must be accessible by
+     all threads.  */
+  return (memspace != omp_low_lat_mem_space
+          || access != omp_atv_all);
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
@@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  nvptx_memspace_validate (MEMSPACE, ACCESS)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c
index 9fa6aa1624f..cae27ea33c1 100644
--- a/libgomp/testsuite/libgomp.c/allocators-4.c
+++ b/libgomp/testsuite/libgomp.c/allocators-4.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-                                                        1, traits);
+                                                        2, traits);
 
     int size = 4;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c
index 90bf73095ef..c03233df582 100644
--- a/libgomp/testsuite/libgomp.c/allocators-6.c
+++ b/libgomp/testsuite/libgomp.c/allocators-6.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
     /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-    omp_alloctrait_t traits[1]
-      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
     omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-                                                        1, traits);
+                                                        2, traits);
 
     int size = 16;
 
diff --git a/libgomp/testsuite/libgomp.c/allocators-7.c b/libgomp/testsuite/libgomp.c/allocators-7.c
new file mode 100644
index 00000000000..a0a738b1d1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/allocators-7.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+
+/* { dg-require-effective-target offload_device } */
+/* { dg-xfail-if "not implemented" { ! offload_target_nvptx } } */
+
+/* Test that GPU low-latency allocation is limited to team access.  */
+
+#include <stddef.h>
+#include <omp.h>
+
+#pragma omp requires dynamic_allocators
+
+int
+main ()
+{
+  #pragma omp target
+  {
+    /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
+    omp_alloctrait_t traits[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_pteam } };
+    omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
+                                                        2, traits);
+
+    omp_alloctrait_t traits_all[2]
+      = { { omp_atk_fallback, omp_atv_null_fb },
+          { omp_atk_access, omp_atv_all } };
+    omp_allocator_handle_t lowlat_all
+      = omp_init_allocator (omp_low_lat_mem_space, 2, traits_all);
+
+    omp_alloctrait_t traits_default[1]
+      = { { omp_atk_fallback, omp_atv_null_fb } };
+    omp_allocator_handle_t lowlat_default
+      = omp_init_allocator (omp_low_lat_mem_space, 1, traits_default);
+
+    void *a = omp_alloc(1, lowlat); // good
+    void *b = omp_alloc(1, lowlat_all); // bad
+    void *c = omp_alloc(1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_calloc(1, 1, lowlat); // good
+    b = omp_calloc(1, 1, lowlat_all); // bad
+    c = omp_calloc(1, 1, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+
+
+    a = omp_realloc(NULL, 1, lowlat, lowlat); // good
+    b = omp_realloc(NULL, 1, lowlat_all, lowlat_all); // bad
+    c = omp_realloc(NULL, 1, lowlat_default, lowlat_default); // bad
+
+    if (!a || b || c)
+      __builtin_abort ();
+
+    omp_free (a, lowlat);
+  }
+
+  return 0;
+}
+
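
Illustration only, not part of the patch: the rule enforced by nvptx_memspace_validate can be modelled on the host without libgomp. The sketch below assumes stand-in enums (every demo_* and DEMO_* name is hypothetical and does not exist in omp.h) and reproduces only the predicate, including the case where an omitted access trait defaults to "all".

```c
/* Standalone model of the NVPTX validation rule; this is not libgomp code.
   Every demo_* / DEMO_* name is a hypothetical stand-in for the omp.h
   enumerators used by the patch.  */
#include <assert.h>
#include <stdbool.h>

typedef enum
{
  DEMO_DEFAULT_MEM_SPACE,
  DEMO_LOW_LAT_MEM_SPACE
} demo_memspace_t;

typedef enum
{
  DEMO_ATV_ALL,     /* The default when no access trait is given.  */
  DEMO_ATV_CGROUP,
  DEMO_ATV_PTEAM,
  DEMO_ATV_THREAD
} demo_access_t;

/* Same shape as the predicate in the patch: only the combination of
   low-latency memory and access "all" is rejected.  */
static bool
demo_memspace_validate (demo_memspace_t memspace, demo_access_t access)
{
  return (memspace != DEMO_LOW_LAT_MEM_SPACE
          || access != DEMO_ATV_ALL);
}

/* Models a trait list that omits "access", which behaves as "all".  */
static bool
demo_validate_default (demo_memspace_t memspace)
{
  return demo_memspace_validate (memspace, DEMO_ATV_ALL);
}

/* Checks the three cases exercised by allocators-7.c.  */
static void
demo_self_check (void)
{
  assert (demo_memspace_validate (DEMO_LOW_LAT_MEM_SPACE, DEMO_ATV_PTEAM));
  assert (!demo_memspace_validate (DEMO_LOW_LAT_MEM_SPACE, DEMO_ATV_ALL));
  assert (!demo_validate_default (DEMO_LOW_LAT_MEM_SPACE));
}
```

Calling demo_self_check from a main function exercises the same good/bad/bad pattern as the new test; in this model other memory spaces pass for every access value.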