From patchwork Thu Aug 25 17:30:14 2022
X-Patchwork-Submitter: Tobias Burnus
X-Patchwork-Id: 57063
Date: Thu, 25 Aug 2022 19:30:14 +0200
Subject: [Patch][2/3] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup
From: Tobias Burnus
To: gcc-patches, Jakub Jelinek, Tom de Vries
In-Reply-To: <40563a1c-49ef-a185-3c01-9f717cd48fc5@codesourcery.com>

On 25.08.22 16:54, Tobias Burnus wrote:
> The attached patch prepares for reverse-offload device->host
> function-address lookup by requesting (if needed) the on-device address.

This patch adds the actual implementation for NVPTX.

Having array[] = {fn1,fn2}; works with nvptx only since sm_35; hence, if
a file contains a reverse-offload function and sm_30 is used, there will
be a compile-time error.

To avoid incompatibilities, I compile the generated function table with
the same PTX ISA .version and sm_XX target as the (last) file that
contains the reverse offload. While it should not matter, some newer CUDA
versions might not support, e.g., sm_35 or might reject a specific ISA
version; thus, that seemed to be safer.

This is currently effectively a no-op: with the [1/3] patch, NULL is
always passed, and GOMP_OFFLOAD_get_num_devices returns <= 0 as soon as
'omp requires reverse_offload' has been specified.

OK for mainline?

Tobias

nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

Add support to nvptx for reverse lookup of function name to prepare for
'omp target device(ancestor:1)'.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (record_id): Strip quotation marks
	from function name.
	(process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is
	at least sm_35, create '$offload_func_table' global array and init
	with reverse-offload function addresses.
	* config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New
	force_public argument to force .visible.
	(nvptx_declare_function_name): For "omp target
	device_ancestor_nohost" attribute, force .visible/TREE_PUBLIC.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload
	function address table '$offload_func_table' if rev_fn_table
	is not NULL.

 gcc/config/nvptx/mkoffload.cc | 104 ++++++++++++++++++++++++++++++++++++++++--
 gcc/config/nvptx/nvptx.cc     |  20 +++++---
 libgomp/plugin/plugin-nvptx.c |  19 +++++++-
 3 files changed, 131 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 3eea0a8f138..c496766b1cc 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -108,12 +108,14 @@ xputenv (const char *string)
 static void
 record_id (const char *p1, id_map ***where)
 {
+  gcc_assert (p1[0] == '"');
+  p1++;
   const char *end = strchr (p1, '\n');
   if (!end)
     fatal_error (input_location, "malformed ptx file");

   id_map *v = XNEW (id_map);
-  size_t len = end - p1;
+  size_t len = end - p1 - 1;  /* remove trailing '"' */
   v->ptx_name = XNEWVEC (char, len + 1);
   memcpy (v->ptx_name, p1, len);
   v->ptx_name[len] = '\0';
@@ -242,6 +244,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   id_map const *id;
   unsigned obj_count = 0;
   unsigned ix;
+  const char *sm_ver = NULL, *version = NULL;
+  const char *sm_ver2 = NULL, *version2 = NULL;
+  size_t file_cnt = 0;
+  size_t *file_idx = XALLOCAVEC (size_t, len);

   fprintf (out, "#include \n\n");

@@ -250,6 +256,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   for (size_t i = 0; i != len;)
     {
       char c;
+      bool output_fn_ptr = false;
+      file_idx[file_cnt++] = i;

       fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
       while ((c = input[i++]))
@@ -261,6 +269,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	    case '\n':
 	      fprintf (out, "\\n\"\n\t\"");
 	      /* Look for mappings on subsequent lines.  */
+	      if (UNLIKELY (startswith (input + i, ".target sm_")))
+		{
+		  sm_ver = input + i + strlen (".target sm_");
+		  continue;
+		}
+	      if (UNLIKELY (startswith (input + i, ".version ")))
+		{
+		  version = input + i + strlen (".version ");
+		  continue;
+		}
 	      while (startswith (input + i, "//:"))
 		{
 		  i += 3;
@@ -268,7 +286,10 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		  if (startswith (input + i, "VAR_MAP "))
 		    record_id (input + i + 8, &vars_tail);
 		  else if (startswith (input + i, "FUNC_MAP "))
-		    record_id (input + i + 9, &funcs_tail);
+		    {
+		      output_fn_ptr = true;
+		      record_id (input + i + 9, &funcs_tail);
+		    }
 		  else
 		    abort ();
 		  /* Skip to next line. */
@@ -286,6 +307,81 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	  putc (c, out);
 	}
       fprintf (out, "\";\n\n");
+      if (output_fn_ptr
+	  && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+	{
+	  if (sm_ver && sm_ver[0] == '3' && sm_ver[1] == '0'
+	      && sm_ver[2] == '\n')
+	    fatal_error (input_location,
+			 "% requires at least "
+			 "% for %<-misa=%>");
+	  sm_ver2 = sm_ver;
+	  version2 = version;
+	}
+    }
+
+  /* Create function-pointer array, required for reverse
+     offload function-pointer lookup.  */
+
+  if (func_ids && (omp_requires & GOMP_REQUIRES_REVERSE_OFFLOAD) != 0)
+    {
+      const char needle[] = "// BEGIN GLOBAL FUNCTION DECL: ";
+      fprintf (out, "static const char ptx_code_%u[] =\n", obj_count++);
+      fprintf (out, "\t\".version ");
+      for (size_t i = 0; version2[i] != '\0' && version2[i] != '\n'; i++)
+	fputc (version2[i], out);
+      fprintf (out, "\"\n\t\".target sm_");
+      for (size_t i = 0; sm_ver2[i] != '\0' && sm_ver2[i] != '\n'; i++)
+	fputc (sm_ver2[i], out);
+      fprintf (out, "\"\n\t\".file 1 \\\"\\\"\"\n");
+
+      size_t fidx = 0;
+      for (id = func_ids; id; id = id->next)
+	{
+	  /* Only 'nohost' functions are needed - use NULL for the rest.
+	     Alternatively, besides searching for 'BEGIN FUNCTION DECL',
+	     checking for '.visible .entry ' + id->ptx_name would be
+	     required.  */
+	  if (!endswith (id->ptx_name, "$nohost"))
+	    continue;
+	  fprintf (out, "\t\".extern ");
+	  const char *p = input + file_idx[fidx];
+	  while (true)
+	    {
+	      p = strstr (p, needle);
+	      if (!p)
+		{
+		  fidx++;
+		  if (fidx >= file_cnt)
+		    break;
+		  p = input + file_idx[fidx];
+		  continue;
+		}
+	      p += strlen (needle);
+	      if (!startswith (p, id->ptx_name))
+		continue;
+	      p += strlen (id->ptx_name);
+	      if (*p != '\n')
+		continue;
+	      p++;
+	      gcc_assert (startswith (p, ".visible "));
+	      p += strlen (".visible ");
+	      for (; *p != '\0' && *p != '\n'; p++)
+		fputc (*p, out);
+	      break;
+	    }
+	  fprintf (out, "\"\n");
+	  if (fidx == file_cnt)
+	    fatal_error (input_location,
+			 "Cannot find function declaration for %qs",
+			 id->ptx_name);
+	}
+      fprintf (out, "\t\".visible .global .align 8 .u64 "
+		    "$offload_func_table[] = {");
+      for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+	fprintf (out, "%s\"\n\t\t\"%s", comma,
+		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+      fprintf (out, "};\\n\";\n\n");
     }

   /* Dump out array of pointers to ptx object strings.  */
@@ -300,7 +396,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
   /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+    fprintf (out, "%s\n\t\"%s\"", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");

   /* Dump out function idents.  */
@@ -309,7 +405,7 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	   "  unsigned short dim[%d];\n"
 	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
-    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+    fprintf (out, "%s\n\t{\"%s\"}", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");

   fprintf (out,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e4297e2d6c3..3293c096822 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -989,15 +989,15 @@ write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)

 static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
-		  const char *name, const_tree decl)
+		  const char *name, const_tree decl, bool force_public)
 {
   if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
+    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);

   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
     s << ".extern ";
-  else if (TREE_PUBLIC (decl))
+  else if (TREE_PUBLIC (decl) || force_public)
     s << (DECL_WEAK (decl) ? ".weak " : ".visible ");
   s << (write_as_kernel (DECL_ATTRIBUTES (decl)) ? ".entry " : ".func ");

@@ -1086,7 +1086,7 @@ write_fn_proto_1 (std::stringstream &s, bool is_defn,

 static void
 write_fn_proto (std::stringstream &s, bool is_defn,
-		const char *name, const_tree decl)
+		const char *name, const_tree decl, bool force_public=false)
 {
   const char *replacement = nvptx_name_replacement (name);
   char *replaced_dots = NULL;
@@ -1103,9 +1103,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,

   if (is_defn)
     /* Emit a declaration.  The PTX assembler gets upset without it.  */
-    write_fn_proto_1 (s, false, name, decl);
+    write_fn_proto_1 (s, false, name, decl, force_public);

-  write_fn_proto_1 (s, is_defn, name, decl);
+  write_fn_proto_1 (s, is_defn, name, decl, force_public);

   if (replaced_dots)
     XDELETE (replaced_dots);
@@ -1481,7 +1481,13 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   tree fntype = TREE_TYPE (decl);
   tree result_type = TREE_TYPE (fntype);
   int argno = 0;
+  bool force_public = false;

+  /* For reverse-offload 'nohost' functions: In order to be collectable in
+     '$offload_func_table', cf. mkoffload.cc, the function has to be visible. */
+  if (lookup_attribute ("omp target device_ancestor_nohost",
+			DECL_ATTRIBUTES (decl)))
+    force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
@@ -1493,7 +1499,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
   /* We construct the initial part of the function into a string
      stream, in order to share the prototype writing code.  */
   std::stringstream s;
-  write_fn_proto (s, true, name, decl);
+  write_fn_proto (s, true, name, decl, force_public);
   s << "{\n";

   bool return_in_mem = write_return_type (s, false, result_type);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index d130665ed19..ac400fc2a1d 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1273,7 +1273,7 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 int
 GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 			 struct addr_pair **target_table,
-			 uint64_t **rev_fn_table __attribute__((unused)))
+			 uint64_t **rev_fn_table)
 {
   CUmodule module;
   const char *const *var_names;
@@ -1376,6 +1376,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     targ_tbl->start = targ_tbl->end = 0;
   targ_tbl++;

+  if (rev_fn_table && fn_entries == 0)
+    *rev_fn_table = NULL;
+  else if (rev_fn_table)
+    {
+      CUdeviceptr var;
+      size_t bytes;
+      r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
+			     "$offload_func_table");
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+      assert (bytes == sizeof (uint64_t) * fn_entries);
+      *rev_fn_table = GOMP_PLUGIN_malloc (sizeof (uint64_t) * fn_entries);
+      r = CUDA_CALL_NOCHECK (cuMemcpyDtoH, *rev_fn_table, var, bytes);
+      if (r != CUDA_SUCCESS)
+	GOMP_PLUGIN_fatal ("cuMemcpyDtoH error: %s", cuda_error (r));
+    }
+
   nvptx_set_clocktick (module, dev);

   return fn_entries + var_entries + other_entries;
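
Reading aid for the plugin-nvptx.c hunk: after GOMP_OFFLOAD_load_image has
copied '$offload_func_table' to the host, the table holds one 64-bit
device address per entry of func_mappings, with 0 for every function that
is not 'nohost'. How libgomp consumes the table is not part of this patch.
The following is only a minimal sketch, assuming a plain linear scan; the
names rev_fn_lookup and REV_FN_NOT_FOUND are hypothetical and do not exist
in libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for "no such entry"; not part of libgomp.  */
#define REV_FN_NOT_FOUND ((size_t) -1)

/* Return the index of DEV_ADDR in TABLE (N entries), or REV_FN_NOT_FOUND.
   Entries equal to 0 stand for functions that have no device address in
   the table and therefore never match.  */
static size_t
rev_fn_lookup (const uint64_t *table, size_t n, uint64_t dev_addr)
{
  assert (table != NULL || n == 0);
  if (dev_addr == 0)
    return REV_FN_NOT_FOUND;
  for (size_t i = 0; i < n; i++)
    if (table[i] == dev_addr)
      return i;
  return REV_FN_NOT_FOUND;
}
```

Because the table is emitted in func_ids order, the same order as
func_mappings, the returned index would identify the matching function
entry on the host side. A linear scan is assumed here for brevity only.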