From patchwork Tue Jan 4 15:32:17 2022
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 49531
Date: Tue, 4 Jan 2022 15:32:17 +0000
From: Andrew Stubbs
Subject: [PATCH] libgomp, openmp: pinned memory
To: "gcc-patches@gcc.gnu.org"

This patch implements the OpenMP pinned memory trait for Linux hosts. On
other hosts and on devices the trait becomes a no-op (instead of being
rejected).

The memory is locked via the mlock syscall, which is both the "correct"
way to do it on Linux, and a problem because the default ulimit for
pinned memory is very small (and most users don't have permission to
increase it (much?)). Therefore the code emits a non-fatal warning
message if locking fails.

Another approach might be to use cudaHostAlloc to allocate the memory in
the first place, which bypasses the ulimit somehow, but this would not
help non-NVidia users.

The tests work on Linux and will xfail on other hosts; neither libgomp
nor the test knows how to allocate or query pinned memory elsewhere.

The patch applies on top of the text of my previously submitted patches,
but does not actually depend on the functionality of those patches.

OK for stage 1?

I'll commit a backport to OG11 shortly.

Andrew

libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.

libgomp/ChangeLog:

	* allocator.c (MEMSPACE_PIN): New macro.
	(xmlock): New function.
	(omp_init_allocator): Don't disallow the pinned trait.
	(omp_aligned_alloc): Add pinning via MEMSPACE_PIN.
	(omp_aligned_calloc): Likewise.
	(omp_realloc): Likewise.
	* testsuite/libgomp.c/alloc-pinned-1.c: New test.
	* testsuite/libgomp.c/alloc-pinned-2.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b1f5fe0a5e2..671b91e7ff8 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -51,6 +51,25 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   ((void)MEMSPACE, (void)SIZE, free (ADDR))
 #endif
+#ifndef MEMSPACE_PIN
+/* Only define this on supported host platforms.  */
+#ifdef __linux__
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, xmlock (ADDR, SIZE))
+
+#include <sys/mman.h>
+#include <stdio.h>
+void
+xmlock (void *addr, size_t size)
+{
+  if (mlock (addr, size))
+      perror ("libgomp: failed to pin memory (ulimit too low?)");
+}
+#else
+#define MEMSPACE_PIN(MEMSPACE, ADDR, SIZE) \
+  ((void)MEMSPACE, (void)ADDR, (void)SIZE)
+#endif
+#endif
 
 /* Map the predefined allocators to the correct memory space.
    The index to this table is the omp_allocator_handle_t enum value.  */
@@ -212,7 +231,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
     data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
     return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -326,6 +345,9 @@ retry:
 #endif
           goto fail;
         }
+
+      if (allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
   else
     {
@@ -335,6 +357,9 @@ retry:
       ptr = MEMSPACE_ALLOC (memspace, new_size);
       if (ptr == NULL)
         goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
 
   if (new_alignment > sizeof (void *))
@@ -539,6 +564,9 @@ retry:
 #endif
           goto fail;
         }
+
+      if (allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
   else
     {
@@ -548,6 +576,9 @@ retry:
       ptr = MEMSPACE_CALLOC (memspace, new_size);
       if (ptr == NULL)
         goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, ptr, new_size);
     }
 
   if (new_alignment > sizeof (void *))
@@ -727,7 +758,11 @@ retry:
 #endif
           goto fail;
         }
-      else if (prev_size)
+
+      if (allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, new_ptr, new_size);
+
+      if (prev_size)
         {
           ret = (char *) new_ptr + sizeof (struct omp_mem_header);
           ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
@@ -747,6 +782,10 @@ retry:
       new_ptr = MEMSPACE_REALLOC (memspace, data->ptr, data->size, new_size);
       if (new_ptr == NULL)
         goto fail;
+
+      if (allocator_data && allocator_data->pinned)
+        MEMSPACE_PIN (allocator_data->memspace, new_ptr, new_size);
+
       ret = (char *) new_ptr + sizeof (struct omp_mem_header);
       ((struct omp_mem_header *) ret)[-1].ptr = new_ptr;
       ((struct omp_mem_header *) ret)[-1].size = new_size;
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
new file mode 100644
index 00000000000..0a6360cda29
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works.  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+        {
+          fclose (proc);
+          return val;
+        }
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space, 1, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
new file mode 100644
index 00000000000..8fdb4ff5cfd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+
+/* Test that pinned memory works (pool_size code path).  */
+
+#ifdef __linux__
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <sys/mman.h>
+
+int
+get_pinned_mem ()
+{
+  int pid = getpid ();
+  char buf[100];
+  sprintf (buf, "/proc/%d/status", pid);
+
+  FILE *proc = fopen (buf, "r");
+  if (!proc)
+    abort ();
+  while (fgets (buf, 100, proc))
+    {
+      int val;
+      if (sscanf (buf, "VmLck: %d", &val))
+        {
+          fclose (proc);
+          return val;
+        }
+    }
+  abort ();
+}
+#else
+int
+get_pinned_mem ()
+{
+  return 0;
+}
+#endif
+
+#include <omp.h>
+
+/* Allocate more than a page each time, but stay within the ulimit.  */
+#define SIZE 10*1024
+
+int
+main ()
+{
+  const omp_alloctrait_t traits[] = {
+      { omp_atk_pinned, 1 },
+      { omp_atk_pool_size, SIZE*8 }
+  };
+  omp_allocator_handle_t allocator = omp_init_allocator (omp_default_mem_space,
+                                                         2, traits);
+
+  // Sanity check
+  if (get_pinned_mem () != 0)
+    abort ();
+
+  void *p = omp_alloc (SIZE, allocator);
+  if (!p)
+    abort ();
+
+  int amount = get_pinned_mem ();
+  if (amount == 0)
+    abort ();
+
+  p = omp_realloc (p, SIZE*2, allocator, allocator);
+  if (!p)
+    abort ();
+
+  int amount2 = get_pinned_mem ();
+  if (amount2 <= amount)
+    abort ();
+
+  p = omp_calloc (1, SIZE, allocator);
+  if (!p)
+    abort ();
+
+  if (get_pinned_mem () <= amount2)
+    abort ();
+
+  return 0;
+}
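
For anyone who wants to look at the ulimit problem in isolation, here is a
rough sketch that is separate from the patch and not part of libgomp. It
assumes a Linux host. The helper names memlock_soft_limit and try_pin are
made up for this illustration; only getrlimit, RLIMIT_MEMLOCK, and mlock are
real interfaces. Like xmlock above, the pinning helper warns on failure and
carries on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: store the soft RLIMIT_MEMLOCK (in bytes) in *LIMIT.
   Returns 0 on success, -1 if the limit cannot be queried.
   A value of RLIM_INFINITY means there is no limit.  */
int
memlock_soft_limit (rlim_t *limit)
{
  struct rlimit rl;

  assert (limit != NULL);
  if (getrlimit (RLIMIT_MEMLOCK, &rl) != 0)
    return -1;
  *limit = rl.rlim_cur;
  return 0;
}

/* Hypothetical helper: try to lock SIZE bytes at ADDR into RAM.
   Returns 1 if the range was pinned, 0 otherwise.  Failure is reported
   on stderr but is not fatal.  */
int
try_pin (void *addr, size_t size)
{
  assert (addr != NULL && size > 0);
  if (mlock (addr, size) == 0)
    return 1;
  fprintf (stderr, "warning: mlock of %zu bytes failed: %s\n",
           size, strerror (errno));
  return 0;
}
```

A caller could compare the requested size against the value reported by
memlock_soft_limit before calling try_pin, for example to print a more
specific hint than the generic "ulimit too low?" message.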