From patchwork Tue Mar 8 11:30:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abid Qadeer X-Patchwork-Id: 51780 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A6BBA385DC06 for ; Tue, 8 Mar 2022 11:32:28 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id A9388385AC3E; Tue, 8 Mar 2022 11:31:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A9388385AC3E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,164,1643702400"; d="scan'208";a="72843961" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa4.mentor.iphmx.com with ESMTP; 08 Mar 2022 03:31:17 -0800 IronPort-SDR: Czvvb23/91oWodnP7C9YjKWouLKN7nbbqNOYzL1TmVhRFmtGl0KMRvWXD7ezy7zf7XL0VSfMQ2 CCgJZC4GHvtYdAe6QA70JMgIeh548oR8BjS+Br8t7pZMh5sO9Z9U4fE2RuUedfmWYiG7ahbrij wKXkdTEJClQVFeul2DJ8NT0qUI4OLwbPNb33KtojvJmdOriAwcMTHEDmQ9D5UGsEilvF03xlHm l2VyVuX8syVhIK+J3XesoHbvkbZ4EXDsxRRGxXKnDNnfzrfbrvjJnZJhD5u0mcSirCBxhCIZgM 54A= From: Hafiz Abid Qadeer To: , Subject: [PATCH 1/5] openmp: Add -foffload-memory Date: Tue, 8 Mar 2022 11:30:55 +0000 Message-ID: <20220308113059.688551-2-abidh@codesourcery.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220308113059.688551-1-abidh@codesourcery.com> References: <20220308113059.688551-1-abidh@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, ams@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs Add a new option. It will be used in follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 ++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/invoke.texi | 16 +++++++++++++++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index 8b6513de47c..17426523e23 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2182,6 +2182,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32) EnumValue Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) +foffload-memory= +Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE) +-foffload-memory=[none|unified|pinned] Use an offload memory optimization. + +Enum +Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs) + +EnumValue +Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE) + +EnumValue +Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED) + +EnumValue +Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED) + fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 08b9ac9094c..dd52d5bb113 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -206,6 +206,13 @@ enum offload_abi { OFFLOAD_ABI_ILP32 }; +/* Types of memory optimization for an offload device. */ +enum offload_memory { + OFFLOAD_MEMORY_NONE, + OFFLOAD_MEMORY_UNIFIED, + OFFLOAD_MEMORY_PINNED +}; + /* Types of profile update methods. */ enum profile_update { PROFILE_UPDATE_SINGLE, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 248ed534aee..d16019fc8c3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -202,7 +202,7 @@ in the following sections. -fno-builtin -fno-builtin-@var{function} -fcond-mismatch @gol -ffreestanding -fgimple -fgnu-tm -fgnu89-inline -fhosted @gol -flax-vector-conversions -fms-extensions @gol --foffload=@var{arg} -foffload-options=@var{arg} @gol +-foffload=@var{arg} -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol -fopenacc -fopenacc-dim=@var{geom} @gol -fopenmp -fopenmp-simd @gol -fpermitted-flt-eval-methods=@var{standard} @gol @@ -2694,6 +2694,20 @@ Typical command lines are -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm @end smallexample +@item -foffload-memory=none +@itemx -foffload-memory=unified +@itemx -foffload-memory=pinned +@opindex foffload-memory +@cindex OpenMP offloading memory modes +Enable a memory optimization mode to use with OpenMP. The default behavior, +@option{-foffload-memory=none}, is to do nothing special (unless enabled via +a requires directive in the code). @option{-foffload-memory=unified} is +equivalent to @code{#pragma omp requires unified_shared_memory}. +@option{-foffload-memory=pinned} forces all host memory to be pinned (this +mode may require the user to increase the ulimit setting for locked memory). +All translation units must select the same setting to avoid undefined +behavior. + @item -fopenacc @opindex fopenacc @cindex OpenACC accelerator programming From patchwork Tue Mar 8 11:30:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abid Qadeer X-Patchwork-Id: 51781 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6229D3857417 for ; Tue, 8 Mar 2022 11:33:12 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 9D65D385C404; Tue, 8 Mar 2022 11:31:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9D65D385C404 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,164,1643702400"; d="scan'208";a="72879779" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 08 Mar 2022 03:31:30 -0800 IronPort-SDR: NYLTWVrXQOt5LePQ9qhytV8qYERX9n43jbYMrCPPBkNFD2cnQIrs5RIF+00aA2oXgxnET2cSNq 7QMqWmsvyq3Gp2n74IUQf96o8j5dvz/EVcCegYLJyPuNS09jLLQgMU0WN+sVL8L/okK+4x1wgU /r7+kkXdw3V6i658cBXe8keEtafODEDTlJkDYnhnK3tPUez+zDiTLWoNxtABUmYr9wCcWc633h 24/JQSdWFYYXiNjAYi87zB8Enoepq9xwD+B/Pql/LW3h0ct6wic3wegBfaZaDqe586nkf/vO2l /10= From: Hafiz Abid Qadeer To: , Subject: [PATCH 2/5] openmp: allow requires unified_shared_memory Date: Tue, 8 Mar 2022 11:30:56 +0000 Message-ID: <20220308113059.688551-3-abidh@codesourcery.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220308113059.688551-1-abidh@codesourcery.com> References: <20220308113059.688551-1-abidh@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, ams@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented message" in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Allow "requires unified_share_memory". gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Allow "requires unified_share_memory". gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Allow "requires unified_share_memory". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-1.c: New test. * gfortran.dg/gomp/usm-1.f90: New test. --- gcc/c/c-parser.cc | 13 ++++++++++++- gcc/cp/parser.cc | 13 ++++++++++++- gcc/fortran/openmp.cc | 10 +++++++++- gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 ++++ gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 ++++++ 5 files changed, 43 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 84deac04c44..dc834158d1c 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -22542,7 +22542,16 @@ c_parser_omp_requires (c_parser *parser) if (!strcmp (p, "unified_address")) this_req = OMP_REQUIRES_UNIFIED_ADDRESS; else if (!strcmp (p, "unified_shared_memory")) + { this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) @@ -22609,7 +22618,9 @@ c_parser_omp_requires (c_parser *parser) c_parser_skip_to_pragma_eol (parser, false); return; } - if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS) + if (p + && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS + && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY) sorry_at (cloc, "%qs clause on % directive not " "supported yet", p); if (p) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 03d99aba13e..ba263152aaf 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -46464,7 +46464,16 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) if (!strcmp (p, "unified_address")) this_req = OMP_REQUIRES_UNIFIED_ADDRESS; else if (!strcmp (p, "unified_shared_memory")) + { this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) @@ -46537,7 +46546,9 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) cp_parser_skip_to_pragma_eol (parser, pragma_tok); return false; } - if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS) + if (p + && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS + && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY) sorry_at (cloc, "%qs clause on % directive not " "supported yet", p); if (p) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 16cd03a3d67..1f434857719 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic.h" #include "gomp-constants.h" #include "target-memory.h" /* For gfc_encode_character. */ +#include "options.h" /* Match an end of OpenMP directive. End of OpenMP directive is optional whitespace, followed by '\n' or comment '!'. */ @@ -5373,6 +5374,12 @@ gfc_match_omp_requires (void) requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY; if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY) goto duplicate_clause; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + gfc_error_now ("unified_shared_memory at %C is incompatible with " + "the selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; } else if (gfc_match (clauses[3]) == MATCH_YES) { @@ -5412,7 +5419,8 @@ gfc_match_omp_requires (void) goto error; if (requires_clause & ~(OMP_REQ_ATOMIC_MEM_ORDER_MASK - | OMP_REQ_DYNAMIC_ALLOCATORS)) + | OMP_REQ_DYNAMIC_ALLOCATORS + | OMP_REQ_UNIFIED_SHARED_MEMORY)) gfc_error_now ("Sorry, %qs clause at %L on REQUIRES directive is not " "yet supported", clause, &old_loc); if (!gfc_omp_requires_add_clause (requires_clause, clause, &old_loc, NULL)) diff --git a/gcc/testsuite/c-c++-common/gomp/usm-1.c b/gcc/testsuite/c-c++-common/gomp/usm-1.c new file mode 100644 index 00000000000..619c21a83f4 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-1.c @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ + +#pragma omp requires unified_shared_memory /* { dg-error "unified_shared_memory is incompatible with the selected -foffload-memory option" } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 new file mode 100644 index 00000000000..340f6bb50a5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-1.f90 @@ -0,0 +1,6 @@ +! { dg-do compile } +! { dg-additional-options "-foffload-memory=pinned" } + +!$omp requires unified_shared_memory ! { dg-error "unified_shared_memory at .* is incompatible with the selected -foffload-memory option" } + +end From patchwork Tue Mar 8 11:30:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abid Qadeer X-Patchwork-Id: 51782 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 64C71385DC26 for ; Tue, 8 Mar 2022 11:33:41 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id A1AD0385C416; Tue, 8 Mar 2022 11:31:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A1AD0385C416 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,164,1643702400"; d="scan'208";a="72879783" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 08 Mar 2022 03:31:31 -0800 IronPort-SDR: JZZHiNP3GpYlzWLIMpdljhrJDj4fjyewPy8zBspVPpQY/hKqa2U0Z/PIDLpeQla8T3YWjk74+4 w79hCE53jJV/kpapt9GEOiDYNDnRNuL5pdMBeVr23OD7T/1paQwkoAQpv1xxnHJL0+8I3swGCu 7nINlPYlWZpD3XbMIHfJQEe447A/83LwF0gVzTR5YzC9AabEyRQkNO39ueUYWZpzifjfpeRP2l 2B+xHuC+BDdn7aWS0YaQ/CMPxrc9ox9UqAXEZM0VFx+B5KuQG/KVGkZNVdLI0qkqN3jSu08tk1 LRM= From: Hafiz Abid Qadeer To: , Subject: [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc Date: Tue, 8 Mar 2022 11:30:57 +0000 Message-ID: <20220308113059.688551-4-abidh@codesourcery.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220308113059.688551-1-abidh@codesourcery.com> References: <20220308113059.688551-1-abidh@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, ams@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to allocate memory in the "managed" space and explicitly on the host (it is intended that "malloc" will be intercepted by the compiler). The nvptx plugin is modified to make the necessary Cuda calls, and libgomp is modified to switch to shared-memory mode for USM allocated mappings. libgomp/ChangeLog: * allocator.c (omp_max_predefined_alloc): Update. (omp_aligned_alloc): Don't fallback ompx_host_mem_alloc. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/linux/allocator.c (linux_memspace_alloc): Handle USM. (linux_memspace_calloc): Handle USM. (linux_memspace_free): Handle USM. (linux_memspace_realloc): Handle USM. * config/nvptx/allocator.c (nvptx_memspace_alloc): Reject ompx_host_mem_alloc. (nvptx_memspace_calloc): Likewise. (nvptx_memspace_realloc): Likewise. * libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype. (GOMP_OFFLOAD_usm_free): New prototype. (GOMP_OFFLOAD_is_usm_ptr): New prototype. * libgomp.h (gomp_usm_alloc): New prototype. (gomp_usm_free): New prototype. (gomp_is_usm_ptr): New prototype. (struct gomp_device_descr): Add USM functions. * omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space and ompx_host_mem_space. (omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and ompx_host_mem_alloc. * omp_lib.f90.in: Likewise. * plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter. Call cuMemAllocManaged as appropriate. (GOMP_OFFLOAD_alloc): Move internals to ... (GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter. (GOMP_OFFLOAD_usm_alloc): New function. (GOMP_OFFLOAD_usm_free): New function. (GOMP_OFFLOAD_is_usm_ptr): New function. * target.c (gomp_map_vars_internal): Add USM support. (gomp_usm_alloc): New function. (gomp_usm_free): New function. (gomp_load_plugin_for_device): New function. * testsuite/libgomp.c/usm-1.c: New test. * testsuite/libgomp.c/usm-2.c: New test. * testsuite/libgomp.c/usm-3.c: New test. * testsuite/libgomp.c/usm-4.c: New test. * testsuite/libgomp.c/usm-5.c: New test. --- libgomp/allocator.c | 13 ++++-- libgomp/config/linux/allocator.c | 48 ++++++++++++-------- libgomp/config/nvptx/allocator.c | 6 +++ libgomp/libgomp-plugin.h | 3 ++ libgomp/libgomp.h | 6 +++ libgomp/omp.h.in | 4 ++ libgomp/omp_lib.f90.in | 8 ++++ libgomp/plugin/plugin-nvptx.c | 45 ++++++++++++++++--- libgomp/target.c | 70 +++++++++++++++++++++++++++++ libgomp/testsuite/libgomp.c/usm-1.c | 24 ++++++++++ libgomp/testsuite/libgomp.c/usm-2.c | 32 +++++++++++++ libgomp/testsuite/libgomp.c/usm-3.c | 35 +++++++++++++++ libgomp/testsuite/libgomp.c/usm-4.c | 36 +++++++++++++++ libgomp/testsuite/libgomp.c/usm-5.c | 28 ++++++++++++ 14 files changed, 330 insertions(+), 28 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 000ccc2dd9c..18045dbe0c4 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -32,7 +32,7 @@ #include #include -#define omp_max_predefined_alloc ompx_pinned_mem_alloc +#define omp_max_predefined_alloc ompx_host_mem_alloc /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC @@ -68,6 +68,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = { omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ omp_default_mem_space, /* ompx_pinned_mem_alloc. */ + ompx_unified_shared_mem_space, /* ompx_unified_shared_mem_alloc. */ + ompx_host_mem_space, /* ompx_host_mem_alloc. */ }; struct omp_allocator_data @@ -367,7 +369,8 @@ fail: int fallback = (allocator_data ? allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -597,7 +600,8 @@ fail: int fallback = (allocator_data ? allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -855,7 +859,8 @@ fail: int fallback = (allocator_data ? allocator_data->fallback : (allocator == omp_default_mem_alloc - || allocator == ompx_pinned_mem_alloc) + || allocator == ompx_pinned_mem_alloc + || allocator == ompx_host_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c index 5f3ae491f07..face524259c 100644 --- a/libgomp/config/linux/allocator.c +++ b/libgomp/config/linux/allocator.c @@ -42,9 +42,11 @@ static void * linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) { - (void)memspace; - - if (pin) + if (memspace == ompx_unified_shared_mem_space) + { + return gomp_usm_alloc (size, GOMP_DEVICE_ICV); + } + else if (pin) { void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); @@ -67,7 +69,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) { - if (pin) + if (memspace == ompx_unified_shared_mem_space) + { + void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV); + memset (ret, 0, size); + return ret; + } + else if (memspace == ompx_unified_shared_mem_space + || pin) return linux_memspace_alloc (memspace, size, pin); else return calloc (1, size); @@ -77,9 +86,9 @@ static void linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size, int pin) { - (void)memspace; - - if (pin) + if (memspace == ompx_unified_shared_mem_space) + gomp_usm_free (addr, GOMP_DEVICE_ICV); + else if (pin) munmap (addr, size); else free (addr); @@ -89,7 +98,9 @@ static void * linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, size_t oldsize, size_t size, int oldpin, int pin) { - if (oldpin && pin) + if (memspace == ompx_unified_shared_mem_space) + goto manual_realloc; + else if (oldpin && pin) { void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE); if (newaddr == MAP_FAILED) @@ -98,18 +109,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return newaddr; } else if (oldpin || pin) - { - void *newaddr = linux_memspace_alloc (memspace, size, pin); - if (newaddr) - { - memcpy (newaddr, addr, oldsize < size ? oldsize : size); - linux_memspace_free (memspace, addr, oldsize, oldpin); - } - - return newaddr; - } + goto manual_realloc; else return realloc (addr, size); + +manual_realloc: + void *newaddr = linux_memspace_alloc (memspace, size, pin); + if (newaddr) + { + memcpy (newaddr, addr, oldsize < size ? oldsize : size); + linux_memspace_free (memspace, addr, oldsize, oldpin); + } + + return newaddr; } #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index 0102680b717..c1a73511623 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size) __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return malloc (size); } @@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size) return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return calloc (1, size); } @@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, } return result; } + else if (memspace == ompx_host_mem_space) + return NULL; else return realloc (addr, size); } diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index 07ab700b80c..104f375bc1b 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -134,6 +134,9 @@ extern int GOMP_OFFLOAD_load_image (int, unsigned, const void *, extern bool GOMP_OFFLOAD_unload_image (int, unsigned, const void *); extern void *GOMP_OFFLOAD_alloc (int, size_t); extern bool GOMP_OFFLOAD_free (int, void *); +extern void *GOMP_OFFLOAD_usm_alloc (int, size_t); +extern bool GOMP_OFFLOAD_usm_free (int, void *); +extern bool GOMP_OFFLOAD_is_usm_ptr (void *); extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t); extern bool GOMP_OFFLOAD_host2dev (int, void *, const void *, size_t); extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, size_t); diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index b9e03919993..1cbde607794 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1013,6 +1013,9 @@ extern int gomp_pause_host (void); extern void gomp_init_targets_once (void); extern int gomp_get_num_devices (void); extern bool gomp_target_task_fn (void *); +extern void * gomp_usm_alloc (size_t size, int device_num); +extern void gomp_usm_free (void *device_ptr, int device_num); +extern bool gomp_is_usm_ptr (void *ptr); /* Splay tree definitions. */ typedef struct splay_tree_node_s *splay_tree_node; @@ -1238,6 +1241,9 @@ struct gomp_device_descr __typeof (GOMP_OFFLOAD_unload_image) *unload_image_func; __typeof (GOMP_OFFLOAD_alloc) *alloc_func; __typeof (GOMP_OFFLOAD_free) *free_func; + __typeof (GOMP_OFFLOAD_usm_alloc) *usm_alloc_func; + __typeof (GOMP_OFFLOAD_usm_free) *usm_free_func; + __typeof (GOMP_OFFLOAD_is_usm_ptr) *is_usm_ptr_func; __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func; __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func; __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func; diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in index 1d002d36aae..4ec4475306b 100644 --- a/libgomp/omp.h.in +++ b/libgomp/omp.h.in @@ -120,6 +120,8 @@ typedef enum omp_memspace_handle_t __GOMP_UINTPTR_T_ENUM omp_const_mem_space = 2, omp_high_bw_mem_space = 3, omp_low_lat_mem_space = 4, + ompx_unified_shared_mem_space = 5, + ompx_host_mem_space = 6, __omp_memspace_handle_t_max__ = __UINTPTR_MAX__ } omp_memspace_handle_t; @@ -135,6 +137,8 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM omp_pteam_mem_alloc = 7, omp_thread_mem_alloc = 8, ompx_pinned_mem_alloc = 9, + ompx_unified_shared_mem_alloc = 10, + ompx_host_mem_alloc = 11, __omp_allocator_handle_t_max__ = __UINTPTR_MAX__ } omp_allocator_handle_t; diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in index a095dad8962..e1c32aa78d2 100644 --- a/libgomp/omp_lib.f90.in +++ b/libgomp/omp_lib.f90.in @@ -160,6 +160,10 @@ parameter :: omp_thread_mem_alloc = 8 integer (kind=omp_allocator_handle_kind), & parameter :: ompx_pinned_mem_alloc = 9 + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_unified_shared_mem_alloc = 10 + integer (kind=omp_allocator_handle_kind), & + parameter :: ompx_host_mem_alloc = 11 integer (omp_memspace_handle_kind), & parameter :: omp_default_mem_space = 0 integer (omp_memspace_handle_kind), & @@ -170,6 +174,10 @@ parameter :: omp_high_bw_mem_space = 3 integer (omp_memspace_handle_kind), & parameter :: omp_low_lat_mem_space = 4 + integer (omp_memspace_handle_kind), & + parameter :: omp_unified_shared_mem_space = 5 + integer (omp_memspace_handle_kind), & + parameter :: omp_host_mem_space = 6 type omp_alloctrait integer (kind=omp_alloctrait_key_kind) key diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 1b9a5e95c07..b664d652a45 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1042,11 +1042,13 @@ nvptx_stacks_free (struct ptx_device *ptx_dev, bool force) } static void * -nvptx_alloc (size_t s, bool suppress_errors) +nvptx_alloc (size_t s, bool suppress_errors, bool usm) { CUdeviceptr d; - CUresult r = CUDA_CALL_NOCHECK (cuMemAlloc, &d, s); + CUresult r = (usm ? CUDA_CALL_NOCHECK (cuMemAllocManaged, &d, s, + CU_MEM_ATTACH_GLOBAL) + : CUDA_CALL_NOCHECK (cuMemAlloc, &d, s)); if (suppress_errors && r == CUDA_ERROR_OUT_OF_MEMORY) return NULL; else if (r != CUDA_SUCCESS) @@ -1423,8 +1425,8 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data) return ret; } -void * -GOMP_OFFLOAD_alloc (int ord, size_t size) +static void * +GOMP_OFFLOAD_alloc_1 (int ord, size_t size, bool usm) { if (!nvptx_attach_host_thread_to_device (ord)) return NULL; @@ -1447,7 +1449,7 @@ GOMP_OFFLOAD_alloc (int ord, size_t size) blocks = tmp; } - void *d = nvptx_alloc (size, true); + void *d = nvptx_alloc (size, true, usm); if (d) return d; else @@ -1455,10 +1457,22 @@ GOMP_OFFLOAD_alloc (int ord, size_t size) /* Memory allocation failed. Try freeing the stacks block, and retrying. */ nvptx_stacks_free (ptx_dev, true); - return nvptx_alloc (size, false); + return nvptx_alloc (size, false, usm); } } +void * +GOMP_OFFLOAD_alloc (int ord, size_t size) +{ + return GOMP_OFFLOAD_alloc_1 (ord, size, false); +} + +void * +GOMP_OFFLOAD_usm_alloc (int ord, size_t size) +{ + return GOMP_OFFLOAD_alloc_1 (ord, size, true); +} + bool GOMP_OFFLOAD_free (int ord, void *ptr) { @@ -1466,6 +1480,25 @@ GOMP_OFFLOAD_free (int ord, void *ptr) && nvptx_free (ptr, ptx_devices[ord])); } +bool +GOMP_OFFLOAD_usm_free (int ord, void *ptr) +{ + return GOMP_OFFLOAD_free (ord, ptr); +} + +bool +GOMP_OFFLOAD_is_usm_ptr (void *ptr) +{ + bool managed = false; + /* This returns 3 outcomes ... + CUDA_ERROR_INVALID_VALUE - Not a Cuda allocated pointer. + CUDA_SUCCESS, managed:false - Cuda allocated, but not USM. + CUDA_SUCCESS, managed:true - USM. */ + CUDA_CALL_NOCHECK (cuPointerGetAttribute, &managed, + CU_POINTER_ATTRIBUTE_IS_MANAGED, (CUdeviceptr)ptr); + return managed; +} + void GOMP_OFFLOAD_openacc_exec (void (*fn) (void *), size_t mapnum, void **hostaddrs, void **devaddrs, diff --git a/libgomp/target.c b/libgomp/target.c index 9017458885e..f98e8da2526 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1030,6 +1030,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, tgt->list[i].offset = 0; continue; } + else if (devicep->is_usm_ptr_func + && devicep->is_usm_ptr_func (hostaddrs[i])) + { + /* The memory is visible from both host and target + so nothing needs to be moved. */ + tgt->list[i].key = NULL; + tgt->list[i].offset = OFFSET_INLINED; + continue; + } else if ((kind & typemask) == GOMP_MAP_STRUCT) { size_t first = i + 1; @@ -1488,6 +1497,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, continue; } default: + if (tgt->list[i].offset == OFFSET_INLINED + && !array) + continue; break; } splay_tree_key k = &array->key; @@ -3323,6 +3335,61 @@ omp_target_free (void *device_ptr, int device_num) gomp_mutex_unlock (&devicep->lock); } +void * +gomp_usm_alloc (size_t size, int device_num) +{ + if (device_num == gomp_get_num_devices ()) + return malloc (size); + + struct gomp_device_descr *devicep = resolve_device (device_num); + if (devicep == NULL) + return NULL; + + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + return malloc (size); + + void *ret = NULL; + gomp_mutex_lock (&devicep->lock); + if (devicep->usm_alloc_func) + ret = devicep->usm_alloc_func (devicep->target_id, size); + gomp_mutex_unlock (&devicep->lock); + return ret; +} + +void +gomp_usm_free (void *device_ptr, int device_num) +{ + if (device_ptr == NULL) + return; + + if (device_num == gomp_get_num_devices ()) + { + free (device_ptr); + return; + } + + struct gomp_device_descr *devicep = resolve_device (device_num); + if (devicep == NULL) + return; + + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) + { + free (device_ptr); + return; + } + + gomp_mutex_lock (&devicep->lock); + if (devicep->usm_free_func + && !devicep->usm_free_func (devicep->target_id, device_ptr)) + { + gomp_mutex_unlock (&devicep->lock); + gomp_fatal ("error in freeing device memory block at %p", device_ptr); + } + gomp_mutex_unlock (&devicep->lock); +} + int omp_target_is_present (const void *ptr, int device_num) { @@ -3740,6 +3807,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device, DLSYM (unload_image); DLSYM (alloc); DLSYM (free); + DLSYM_OPT (usm_alloc, usm_alloc); + DLSYM_OPT (usm_free, usm_free); + DLSYM_OPT (is_usm_ptr, is_usm_ptr); DLSYM (dev2host); DLSYM (host2dev); device->capabilities = device->get_caps_func (); diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c new file mode 100644 index 00000000000..1b35f19c45b --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-1.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ + +#include +#include + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int), ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + *a = 42; + uintptr_t a_p = (uintptr_t)a; + + #pragma omp target is_device_ptr(a) + { + if (*a != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c new file mode 100644 index 00000000000..689cee7e456 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-2.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ + +#include +#include + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + + #pragma omp target map(a[0]) + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + #pragma omp target map(a[1]) + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c new file mode 100644 index 00000000000..2ca66afe93f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-3.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ + +#include +#include + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target data map(a[0:2]) + { +#pragma omp target + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + } + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c new file mode 100644 index 00000000000..753908c8440 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-4.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ + +#include +#include + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int)*2, ompx_unified_shared_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target enter data map(to:a[0:2]) + +#pragma omp target + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + } + +#pragma omp target exit data map(delete:a[0:2]) + + omp_free(a, ompx_unified_shared_mem_alloc); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-5.c b/libgomp/testsuite/libgomp.c/usm-5.c new file mode 100644 index 00000000000..4d8b3cf71b1 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-5.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-require-effective-target offload_device } */ + +#include +#include + +#pragma omp requires unified_shared_memory + +int +main () +{ + int *a = (int *) omp_alloc(sizeof(int), ompx_host_mem_alloc); + if (!a) + __builtin_abort (); + + a[0] = 42; + + uintptr_t a_p = (uintptr_t)a; + +#pragma omp target map(a[0:1]) + { + if (a[0] != 42 || a_p == (uintptr_t)a) + __builtin_abort (); + } + + omp_free(a, ompx_host_mem_alloc); + return 0; +} From patchwork Tue Mar 8 11:30:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abid Qadeer X-Patchwork-Id: 51783 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5C15D385DC06 for ; Tue, 8 Mar 2022 11:35:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 42E7D385480B; Tue, 8 Mar 2022 11:31:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 42E7D385480B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,164,1643702400"; d="scan'208";a="72701968" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 08 Mar 2022 03:31:47 -0800 IronPort-SDR: K3zjtVa7cQwY4bUGhM6eq5I5r8xG5h0YnncEaMu56EhYvvT3bRTfouD5Wg3JCrWl8AjMH9KB+F N5uD/c5fXBJe2/ME+np0CfiFK2vipRBBuT79NQjNtOuj8cFqxfqHlIkgypUf4kCs36YuDxV0kk F95gqMjoMdwm6exMP3egH5D17qc4LqDev3+PcU4PWdEGq8n5Kfiu+xPnDVGnjESBWaRoW1y7UF fwLxkGUmzKn0uF7e4fE+NJ8Rqp77nX5Yg9Jm7YKdfuMSUR6kAGwtBb+TEz4K/lnEh3iVtzbq7n DAw= From: Hafiz Abid Qadeer To: , Subject: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory. Date: Tue, 8 Mar 2022 11:30:58 +0000 Message-ID: <20220308113059.688551-5-abidh@codesourcery.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220308113059.688551-1-abidh@codesourcery.com> References: <20220308113059.688551-1-abidh@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, ams@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patches changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This helps existing code to benefit from the unified shared memory. The libgomp does the correct thing with all the mapping constructs and there is no memory copies if the pointer is pointing to unified shared memory. We only replace replacable new operator and not the class member or placement new. gcc/ChangeLog: * omp-low.cc (usm_transform): New function. (make_pass_usm_transform): Likewise. (class pass_usm_transform): New. * passes.def: Add pass_usm_transform. * tree-pass.h (make_pass_usm_transform): New declaration. gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-2.c: New test. * c-c++-common/gomp/usm-3.c: New test. * g++.dg/gomp/usm-1.C: New test. * g++.dg/gomp/usm-2.C: New test. * g++.dg/gomp/usm-3.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.c/usm-6.c: New test. * testsuite/libgomp.c++/usm-1.C: Likewise. --- gcc/omp-low.cc | 152 +++++++++++++++++++++++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 34 +++++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 32 +++++ gcc/testsuite/g++.dg/gomp/usm-1.C | 32 +++++ gcc/testsuite/g++.dg/gomp/usm-2.C | 30 +++++ gcc/testsuite/g++.dg/gomp/usm-3.C | 38 ++++++ gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 +++ gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 ++ gcc/tree-pass.h | 1 + libgomp/testsuite/libgomp.c++/usm-1.C | 54 ++++++++ libgomp/testsuite/libgomp.c/usm-6.c | 70 +++++++++++ 12 files changed, 473 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 5ce3a50709a..ec08d59f676 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14849,6 +14849,158 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt) { return new pass_diagnose_omp_blocks (ctxt); } + +/* Provide transformation required for using unified shared memory + by replacing calls to standard memory allocation functions with + function provided by the libgomp. */ + +static tree +usm_transform (gimple_stmt_iterator *gsi_p, bool *, + struct walk_stmt_info *wi) +{ + gimple *stmt = gsi_stmt (*gsi_p); + /* ompx_unified_shared_mem_alloc is 10. */ + const unsigned int unified_shared_mem_alloc = 10; + + switch (gimple_code (stmt)) + { + case GIMPLE_CALL: + { + gcall *gs = as_a (stmt); + tree fndecl = gimple_call_fndecl (gs); + if (fndecl) + { + tree allocator = build_int_cst (pointer_sized_int_node, + unified_shared_mem_alloc); + const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl)); + if ((strcmp (name, "malloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC) + || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)) + { + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_alloc", omp_alloc_type); + tree size = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "calloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC)) + { + tree omp_calloc_type + = build_function_type_list (ptr_type_node, size_type_node, + size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_calloc", omp_calloc_type); + tree num = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 3, num, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "realloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_REALLOC)) + { + tree omp_realloc_type + = build_function_type_list (ptr_type_node, ptr_type_node, + size_type_node, + pointer_sized_int_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_realloc", omp_realloc_type); + tree ptr = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 4, ptr, size, allocator, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "free") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FREE) + || (DECL_IS_OPERATOR_DELETE_P (fndecl) + && DECL_IS_REPLACEABLE_OPERATOR (fndecl))) + { + tree omp_free_type + = build_function_type_list (void_type_node, ptr_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_free", omp_free_type); + tree ptr = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, ptr, allocator); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + } + } + break; + + default: + break; + } + + return NULL_TREE; +} + +namespace { + +const pass_data pass_data_usm_transform = +{ + GIMPLE_PASS, /* type */ + "usm_transform", /* name */ + OPTGROUP_OMP, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_gimple_any, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_usm_transform : public gimple_opt_pass +{ +public: + pass_usm_transform (gcc::context *ctxt) + : gimple_opt_pass (pass_data_usm_transform, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return (flag_openmp || flag_openmp_simd) + && (flag_offload_memory == OFFLOAD_MEMORY_UNIFIED + || omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY); + } + virtual unsigned int execute (function *) + { + struct walk_stmt_info wi; + gimple_seq body = gimple_body (current_function_decl); + + memset (&wi, 0, sizeof (wi)); + walk_gimple_seq (body, usm_transform, NULL, &wi); + + return 0; + } + +}; // class pass_usm_transform + +} // anon namespace + +gimple_opt_pass * +make_pass_usm_transform (gcc::context *ctxt) +{ + return new pass_usm_transform (ctxt); +} #include "gt-omp-low.h" diff --git a/gcc/passes.def b/gcc/passes.def index f7718181038..98c7736bb8b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_diagnose_tm_blocks); NEXT_PASS (pass_omp_oacc_kernels_decompose); NEXT_PASS (pass_lower_omp); + NEXT_PASS (pass_usm_transform); NEXT_PASS (pass_lower_cf); NEXT_PASS (pass_lower_tm); NEXT_PASS (pass_refactor_eh); diff --git a/gcc/testsuite/c-c++-common/gomp/usm-2.c b/gcc/testsuite/c-c++-common/gomp/usm-2.c new file mode 100644 index 00000000000..2f3f986012c --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-2.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-fdump-tree-usm_transform" } */ + +#pragma omp requires unified_shared_memory + +#ifdef __cplusplus +extern "C" { +#endif + +void *malloc (__SIZE_TYPE__); +void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); +void *realloc(void *, __SIZE_TYPE__); +void free (void *); + +#ifdef __cplusplus +} +#endif + +void +foo () +{ + void *p1 = malloc(20); + void *p2 = realloc(p1, 30); + void *p3 = calloc(4, 15); + free (p2); + free (p3); +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 2 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " free" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " malloc" "usm_transform" } } */ diff --git a/gcc/testsuite/c-c++-common/gomp/usm-3.c b/gcc/testsuite/c-c++-common/gomp/usm-3.c new file mode 100644 index 00000000000..c8230e7ff7c --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/usm-3.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } */ + +#ifdef __cplusplus +extern "C" { +#endif + +void *malloc (__SIZE_TYPE__); +void *calloc(__SIZE_TYPE__, __SIZE_TYPE__); +void *realloc(void *, __SIZE_TYPE__); +void free (void *); + +#ifdef __cplusplus +} +#endif + +void +foo () +{ + void *p1 = malloc(20); + void *p2 = realloc(p1, 30); + void *p3 = calloc(4, 15); + free (p2); + free (p3); +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_realloc \\(.*, 30, 10, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_calloc \\(4, 15, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 2 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " free" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not " malloc" "usm_transform" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/usm-1.C b/gcc/testsuite/g++.dg/gomp/usm-1.C new file mode 100644 index 00000000000..bd70a81b5bb --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-1.C @@ -0,0 +1,32 @@ +// { dg-do compile } +// { dg-options "-fopenmp -fdump-tree-usm_transform" } + +#pragma omp requires unified_shared_memory + +struct t1 +{ + int a; + int b; +}; + +typedef unsigned char uint8_t; + +void +foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y) +{ + uint8_t *p1 = new uint8_t; + uint8_t *p2 = new uint8_t[20]; + t1 *p3 = new t1; + t1 *p4 = new t1[y]; + delete p1; + delete p3; + delete [] p2; + delete [] p4; +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator new" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator delete" "usm_transform" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/usm-2.C b/gcc/testsuite/g++.dg/gomp/usm-2.C new file mode 100644 index 00000000000..f6ab155c6de --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-2.C @@ -0,0 +1,30 @@ +// { dg-do compile } +// { dg-options "-fopenmp -foffload-memory=unified -fdump-tree-usm_transform" } + +struct t1 +{ + int a; + int b; +}; + +typedef unsigned char uint8_t; + +void +foo (__SIZE_TYPE__ x, __SIZE_TYPE__ y) +{ + uint8_t *p1 = new uint8_t; + uint8_t *p2 = new uint8_t[20]; + t1 *p3 = new t1; + t1 *p4 = new t1[y]; + delete p1; + delete p3; + delete [] p2; + delete [] p4; +} + +/* { dg-final { scan-tree-dump-times "omp_alloc \\(1, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc \\(20, 10\\)" 1 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_alloc" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-times "omp_free" 4 "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator new" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "operator delete" "usm_transform" } } */ diff --git a/gcc/testsuite/g++.dg/gomp/usm-3.C b/gcc/testsuite/g++.dg/gomp/usm-3.C new file mode 100644 index 00000000000..50ac9302c8b --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/usm-3.C @@ -0,0 +1,38 @@ +// { dg-do compile } +// { dg-options "-fopenmp -fdump-tree-usm_transform" } + +#pragma omp requires unified_shared_memory + +#include + + +struct X { + static void* operator new(std::size_t count) + { + static char buf[10]; + return &buf[0]; + } + static void* operator new[](std::size_t count) + { + static char buf[10]; + return &buf[0]; + } + static void operator delete(void*) + { + } + static void operator delete[](void*) + { + } +}; +void foo() { + X* p1 = new X; + delete p1; + X* p2 = new X[10]; + delete[] p2; + unsigned char buf[24] ; + int *p3 = new (buf) int(3); + p3[0] = 1; +} + +/* { dg-final { scan-tree-dump-not "omp_alloc" "usm_transform" } } */ +/* { dg-final { scan-tree-dump-not "omp_free" "usm_transform" } } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 new file mode 100644 index 00000000000..dc775260cb7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-2.f90 @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-usm_transform" } + +!$omp requires unified_shared_memory +end + +subroutine foo() + implicit none + integer, allocatable :: var1 + + allocate(var1) + +end subroutine + +! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform" } } +! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform" } } \ No newline at end of file diff --git a/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 new file mode 100644 index 00000000000..7983444ebff --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/usm-3.f90 @@ -0,0 +1,13 @@ +! { dg-do compile } +! { dg-additional-options "-foffload-memory=unified -fdump-tree-usm_transform" } + +subroutine foo() + implicit none + integer, allocatable :: var1 + + allocate(var1) + +end subroutine + +! { dg-final { scan-tree-dump-times "omp_alloc" 1 "usm_transform" } } +! { dg-final { scan-tree-dump-times "omp_free" 1 "usm_transform" } } \ No newline at end of file diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 606d1d60b85..494a9662afa 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -424,6 +424,7 @@ extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_usm_transform (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt); diff --git a/libgomp/testsuite/libgomp.c++/usm-1.C b/libgomp/testsuite/libgomp.c++/usm-1.C new file mode 100644 index 00000000000..fea25e5f10b --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/usm-1.C @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */ +#include + +#pragma omp requires unified_shared_memory + +int g1 = 0; + +struct s1 +{ + s1() { a = g1++;} + ~s1() { g1--;} + int a; +}; + +int +main () +{ + s1 *p1 = new s1; + s1 *p2 = new s1[10]; + + if (!p1 || !p2 || p1->a != 0) + __builtin_abort (); + + for (int i = 0; i < 10; i++) + if (p2[i].a != i+1) + __builtin_abort (); + + uintptr_t pp1 = (uintptr_t)p1; + uintptr_t pp2 = (uintptr_t)p2; + +#pragma omp target firstprivate(pp1, pp2) + { + s1 *t1 = (s1*)pp1; + s1 *t2 = (s1*)pp2; + if (t1->a != 0) + __builtin_abort (); + + for (int i = 0; i < 10; i++) + if (t2[i].a != i+1) + __builtin_abort (); + + t1->a = 42; + } + + if (p1->a != 42) + __builtin_abort (); + + delete [] p2; + delete p1; + if (g1 != 0) + __builtin_abort (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/usm-6.c b/libgomp/testsuite/libgomp.c/usm-6.c new file mode 100644 index 00000000000..d98da68a1ed --- /dev/null +++ b/libgomp/testsuite/libgomp.c/usm-6.c @@ -0,0 +1,70 @@ +/* { dg-do run } */ +/* { dg-skip-if "Only valid for nvptx" { ! offload_target_nvptx } } */ + +#include +#include + +#pragma omp requires unified_shared_memory + +int +main () +{ + int *a = (int *) malloc(sizeof(int)*2); + int *b = (int *) calloc(sizeof(int), 3); + int *c = (int *) realloc(NULL, sizeof(int) * 4); + if (!a || !b || !c) + __builtin_abort (); + + a[0] = 42; + a[1] = 43; + b[0] = 52; + b[1] = 53; + b[2] = 54; + c[0] = 62; + c[1] = 63; + c[2] = 64; + c[3] = 65; + + uintptr_t a_p = (uintptr_t)a; + uintptr_t b_p = (uintptr_t)b; + uintptr_t c_p = (uintptr_t)c; + +#pragma omp target enter data map(to:a[0:2]) + +#pragma omp target is_device_ptr(c) + { + if (a[0] != 42 || a_p != (uintptr_t)a) + __builtin_abort (); + if (b[0] != 52 || b[2] != 54 || b_p != (uintptr_t)b) + __builtin_abort (); + if (c[0] != 62 || c[3] != 65 || c_p != (uintptr_t)c) + __builtin_abort (); + a[0] = 72; + b[0] = 82; + c[0] = 92; + } + +#pragma omp target + { + if (a[1] != 43 || a_p != (uintptr_t)a) + __builtin_abort (); + if (b[1] != 53 || b_p != (uintptr_t)b) + __builtin_abort (); + if (c[1] != 63 || c[2] != 64 || c_p != (uintptr_t)c) + __builtin_abort (); + a[1] = 73; + b[1] = 83; + c[1] = 93; + } + +#pragma omp target exit data map(delete:a[0:2]) + + if (a[0] != 72 || a[1] != 73 + || b[0] != 82 || b[1] != 83 + || c[0] != 92 || c[1] != 93) + __builtin_abort (); + free(a); + free(b); + free(c); + return 0; +} From patchwork Tue Mar 8 11:30:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abid Qadeer X-Patchwork-Id: 51784 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9B627385DC10 for ; Tue, 8 Mar 2022 11:35:32 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 55D19385C421; Tue, 8 Mar 2022 11:31:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 55D19385C421 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.90,164,1643702400"; d="scan'208";a="72701969" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa3.mentor.iphmx.com with ESMTP; 08 Mar 2022 03:31:47 -0800 IronPort-SDR: VysnImb/mOnku8KOVHXKgv+cbZjziKIobGLyDBPGdUyxKb6qpL1nahNmH8YXIAaYhg/zTNUi8S eUsv2Hiw6mGMWIwLqw2RSL4WTQ+lzwZEH2XH1pOnxtfJi1otPmqTv05RauPovi0YzfUl2KR67B P2EUnL5Lt2CjuGwTI8vqPu5y14qYPmIJ0Rlf3+HM+B+g8wPf1IqnyoDT22vtMuxbkwHaMdtmTV WSMbwlNEh8P/0xBzSxyCFCw36sj6BUORBb9BLeACZqDHFcazMrkv6zHBX1+EpTjMq0Bwq1gJBf Qg0= From: Hafiz Abid Qadeer To: , Subject: [PATCH 5/5] openmp: -foffload-memory=pinned Date: Tue, 8 Mar 2022 11:30:59 +0000 Message-ID: <20220308113059.688551-6-abidh@codesourcery.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220308113059.688551-1-abidh@codesourcery.com> References: <20220308113059.688551-1-abidh@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jakub@redhat.com, ams@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlockall to enable always-on memory pinning. It requires that the ulimit feature is set high enough to accommodate all the program's memory usage. In this mode the ompx_pinned_memory_alloc feature is disabled as it is not needed and may conflict. gcc/ChangeLog: * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map (GOMP_5.1.1): New version space with GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/alloc-pinned-1.c: New test. --- gcc/omp-low.cc | 68 +++++++++++++++++++ .../c-c++-common/gomp/alloc-pinned-1.c | 28 ++++++++ libgomp/config/linux/allocator.c | 26 +++++++ libgomp/libgomp.map | 5 ++ libgomp/testsuite/libgomp.c/alloc-pinned-7.c | 66 ++++++++++++++++++ 5 files changed, 193 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index ec08d59f676..ce21b3bd6f8 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14441,6 +14441,70 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () +{ + static bool visited = false; + if (visited) + return; + visited = true; + + /* Create a new function like this: + + static void __attribute__((constructor)) + __set_pinned_mode () + { + GOMP_enable_pinned_mode (); + } + */ + + tree name = get_identifier ("__set_pinned_mode"); + tree voidfntype = build_function_type_list (void_type_node, NULL_TREE); + tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype); + + TREE_STATIC (decl) = 1; + TREE_USED (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 0; + TREE_PUBLIC (decl) = 0; + DECL_UNINLINABLE (decl) = 1; + DECL_EXTERNAL (decl) = 0; + DECL_CONTEXT (decl) = NULL_TREE; + DECL_INITIAL (decl) = make_node (BLOCK); + BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl; + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"), + NULL_TREE, NULL_TREE); + + tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, + void_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_CONTEXT (t) = decl; + DECL_RESULT (decl) = t; + + push_struct_function (decl); + init_tree_ssa (cfun); + + tree callname = get_identifier ("GOMP_enable_pinned_mode"); + tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname, + voidfntype); + gcall *call = gimple_build_call (calldecl, 0); + + gimple_seq seq = NULL; + gimple_seq_add_stmt (&seq, call); + gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL)); + + cfun->function_end_locus = UNKNOWN_LOCATION; + cfun->curr_properties |= PROP_gimple_any; + pop_cfun (); + cgraph_node::add_new_function (decl, true); +} + /* Main entry point. */ static unsigned int @@ -14497,6 +14561,10 @@ execute_lower_omp (void) for (auto task_stmt : task_cpyfns) finalize_task_copyfn (task_stmt); task_cpyfns.release (); + + if (flag_offload_memory == OFFLOAD_MEMORY_PINNED) + omp_enable_pinned_mode (); + return 0; } diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c new file mode 100644 index 00000000000..e0e08019bff --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +#if __cplusplus +#define EXTERNC extern "C" +#else +#define EXTERNC +#endif + +/* Intercept the libgomp initialization call to check it happens. */ + +int good = 0; + +EXTERNC void +GOMP_enable_pinned_mode () +{ + good = 1; +} + +int +main () +{ + if (!good) + __builtin_exit (1); + + return 0; +} diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c index face524259c..4bd5bd6b930 100644 --- a/libgomp/config/linux/allocator.c +++ b/libgomp/config/linux/allocator.c @@ -39,9 +39,26 @@ #include #include "libgomp.h" +static bool always_pinned_mode = false; + +/* This function is called by the compiler when -foffload-memory=pinned + is used. */ + +void +GOMP_enable_pinned_mode () +{ + if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0) + gomp_error ("failed to pin all memory (ulimit too low?)"); + else + always_pinned_mode = true; +} + static void * linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) { return gomp_usm_alloc (size, GOMP_DEVICE_ICV); @@ -69,6 +86,9 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin) static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) { void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV); @@ -86,6 +106,9 @@ static void linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) gomp_usm_free (addr, GOMP_DEVICE_ICV); else if (pin) @@ -98,6 +121,9 @@ static void * linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr, size_t oldsize, size_t size, int oldpin, int pin) { + /* Explicit pinning may not be required. */ + pin = pin && !always_pinned_mode; + if (memspace == ompx_unified_shared_mem_space) goto manual_realloc; else if (oldpin && pin) diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index 2ac58094169..40402dc9893 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -402,6 +402,11 @@ GOMP_5.1 { GOMP_teams4; } GOMP_5.0.1; +GOMP_5.1.1 { + global: + GOMP_enable_pinned_mode; +} GOMP_5.1; + OACC_2.0 { global: acc_get_num_devices; diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-7.c b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c new file mode 100644 index 00000000000..6fd19b46a5c --- /dev/null +++ b/libgomp/testsuite/libgomp.c/alloc-pinned-7.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-additional-options "-foffload-memory=pinned" } */ + +/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */ + +/* Test that pinned memory works. */ + +#ifdef __linux__ +#include +#include +#include +#include + +#include + +int +get_pinned_mem () +{ + int pid = getpid (); + char buf[100]; + sprintf (buf, "/proc/%d/status", pid); + + FILE *proc = fopen (buf, "r"); + if (!proc) + abort (); + while (fgets (buf, 100, proc)) + { + int val; + if (sscanf (buf, "VmLck: %d", &val)) + { + printf ("lock %d\n", val); + fclose (proc); + return val; + } + } + abort (); +} +#else +int +get_pinned_mem () +{ + return 0; +} + +#define mlockall(...) 0 +#endif + +#include + +/* Allocate more than a page each time, but stay within the ulimit. */ +#define SIZE 10*1024 + +int +main () +{ + // Sanity check + if (get_pinned_mem () == 0) + { + /* -foffload-memory=pinned has failed, but maybe that's because + isufficient pinned memory was available. */ + if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0) + abort (); + } + + return 0; +}