From patchwork Tue Jul 12 14:16:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 55969 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 021533857407 for ; Tue, 12 Jul 2022 14:16:59 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 6A89B3857BA3 for ; Tue, 12 Jul 2022 14:16:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6A89B3857BA3 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,265,1650960000"; d="scan'208";a="81358736" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 12 Jul 2022 06:16:40 -0800 IronPort-SDR: P3OW6WaHae2JMwVvUdSkiwRIOx+gP7LCeE+CwIQyh35cNiG5DJwqMxMsj1wcI+l9wy0gxRURtD j7jzjWIyBdkGNI/8hAyAZHZGoBbaIsPGYm6oX9GTTkDC7hztGpFI5fEVlySpZykrBW+CYQ9r6h KQG1k3WXv2o/TBPb6jgcI7TWs2H30CnNEcKUbkHXUlftnsTTbnvuwDDuuYTuD1mi+ATkwNmfOV 4DgdhivoaAAMZaJINiFv3ZicqxuDc/lFSNrtEI4JScLbQ4smLhiOO2zUWhgh+oinVlUK0pTvrj BE4= Message-ID: <0e1a740e-46d5-ebfa-36f4-9a069ddf8620@codesourcery.com> Date: Tue, 12 Jul 2022 15:16:35 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Content-Language: en-GB From: Andrew Stubbs Subject: [PATCH] openmp: fix max_vf setting for amdgcn offloading To: "gcc-patches@gcc.gnu.org" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch ensures that the maximum vectorization factor used to set the "safelen" attribute on "omp simd" constructs is suitable for all the configured offload devices. Right now it makes the proper adjustment for NVPTX, but otherwise just uses a value suitable for the host system (always x86_64 in the case of amdgcn). This typically ends up being 16 where 64 is the minimum for vectorization to work properly on GCN. There is a potential problem that one "safelen" must be set for *all* offload devices, which means it can't be perfect for all devices. However I believe that too big is always OK (at least for powers of two?) whereas too small is not OK, so this code always selects the largest value of max_vf, regardless of where it comes from. The existing target VF function, omp_max_simt_vf, is tangled up with the notion of whether SIMT is available or not, so I couldn't add amdgcn in there. It's tempting to have omp_max_vf do some kind of autodetect what VF to choose, but the current implementation in omp-general.cc doesn't have access to the context in a convenient way, and nor do all the callers, so I couldn't easily do that. Instead, I have opted to add a new function, omp_max_simd_vf, which can check for the presence of amdgcn. While reviewing the callers of omp_max_vf I found one other case that looks like it ought to be tuned for the device, not just the host. In that case it's not clear how to achieve that and in fact, at least on x86_64, the way it is coded the actual value from omp_max_vf is always ignored in favour of a much larger "minimum", so I have added a comment for the next person to touch that spot and left it alone. This change gives a 10x performance improvement on the BabelStream "dot" benchmark on amdgcn and is not harmful on nvptx. OK for mainline? I will commit a backport to OG12 shortly. Andrew openmp: fix max_vf setting for amdgcn offloading Ensure that the "max_vf" figure used for the "safelen" attribute is large enough for the largest configured offload device. This change gives ~10x speed improvement on the Bablestream "dot" benchmark for AMD GCN. gcc/ChangeLog: * gimple-loop-versioning.cc (loop_versioning::loop_versioning): Add comment. * omp-general.cc (omp_max_simd_vf): New function. * omp-general.h (omp_max_simd_vf): New prototype. * omp-low.cc (lower_rec_simd_input_clauses): Select largest from omp_max_vf, omp_max_simt_vf, and omp_max_simd_vf. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_amdgcn_offloading_enabled): New. (check_effective_target_nvptx_offloading_enabled): New. * gcc.dg/gomp/target-vf.c: New test. diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc index 6bcf6eba691..e908c27fc44 100644 --- a/gcc/gimple-loop-versioning.cc +++ b/gcc/gimple-loop-versioning.cc @@ -555,7 +555,10 @@ loop_versioning::loop_versioning (function *fn) unvectorizable code, since it is the largest size that can be handled efficiently by scalar code. omp_max_vf calculates the maximum number of bytes in a vector, when such a value is relevant - to loop optimization. */ + to loop optimization. + FIXME: this probably needs to use omp_max_simd_vf when in a target + region, but how to tell? (And MAX_FIXED_MODE_SIZE is large enough that + it doesn't actually matter.) */ m_maximum_scale = estimated_poly_value (omp_max_vf ()); m_maximum_scale = MAX (m_maximum_scale, MAX_FIXED_MODE_SIZE); } diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index a406c578f33..8c6fcebc4b3 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -994,6 +994,24 @@ omp_max_simt_vf (void) return 0; } +/* Return maximum SIMD width if offloading may target SIMD hardware. */ + +int +omp_max_simd_vf (void) +{ + if (!optimize) + return 0; + if (ENABLE_OFFLOADING) + for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;) + { + if (startswith (c, "amdgcn")) + return 64; + else if ((c = strchr (c, ':'))) + c++; + } + return 0; +} + /* Store the construct selectors as tree codes from last to first, return their number. */ diff --git a/gcc/omp-general.h b/gcc/omp-general.h index 74e90e1a71a..410343e45fa 100644 --- a/gcc/omp-general.h +++ b/gcc/omp-general.h @@ -104,6 +104,7 @@ extern gimple *omp_build_barrier (tree lhs); extern tree find_combined_omp_for (tree *, int *, void *); extern poly_uint64 omp_max_vf (void); extern int omp_max_simt_vf (void); +extern int omp_max_simd_vf (void); extern int omp_constructor_traits_to_codes (tree, enum tree_code *); extern tree omp_check_context_selector (location_t loc, tree ctx); extern void omp_mark_declare_variant (location_t loc, tree variant, diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d73c165f029..1a9a509adb9 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -4646,7 +4646,14 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, { if (known_eq (sctx->max_vf, 0U)) { - sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf (); + /* If we are compiling for multiple devices choose the largest VF. */ + sctx->max_vf = omp_max_vf (); + if (omp_maybe_offloaded_ctx (ctx)) + { + if (sctx->is_simt) + sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simt_vf ()); + sctx->max_vf = ordered_max (sctx->max_vf, omp_max_simd_vf ()); + } if (maybe_gt (sctx->max_vf, 1U)) { tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt), diff --git a/gcc/testsuite/gcc.dg/gomp/target-vf.c b/gcc/testsuite/gcc.dg/gomp/target-vf.c new file mode 100644 index 00000000000..14cea45e53c --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/target-vf.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -O2 -fdump-tree-omplower" } */ + +/* Ensure that the omp_max_vf, omp_max_simt_vf, and omp_max_simd_vf are working + properly to set the OpenMP vectorization factor for the offload target, and + not just for the host. */ + +float +foo (float * __restrict x, float * __restrict y) +{ + float sum = 0.0; + +#pragma omp target teams distribute parallel for simd map(tofrom: sum) reduction(+:sum) + for (int i=0; i<1024; i++) + sum += x[i] * y[i]; + + return sum; +} + +/* { dg-final { scan-tree-dump "safelen\\(64\\)" "omplower" { target amdgcn_offloading_enabled } } } */ +/* { dg-final { scan-tree-dump "safelen\\(32\\)" "omplower" { target { { nvptx_offloading_enabled } && { ! amdgcn_offloading_enabled } } } } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4ed7b25b9a4..363354be461 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1025,6 +1025,16 @@ proc check_effective_target_offloading_enabled {} { return [check_configured_with "--enable-offload-targets"] } +# Return 1 if compiled with --enable-offload-targets=amdgcn +proc check_effective_target_amdgcn_offloading_enabled {} { + return [check_configured_with {--enable-offload-targets=[^ ]*amdgcn}] +} + +# Return 1 if compiled with --enable-offload-targets=amdgcn +proc check_effective_target_nvptx_offloading_enabled {} { + return [check_configured_with {--enable-offload-targets=[^ ]*nvptx}] +} + # Return 1 if compilation with -fopenacc is error-free for trivial # code, 0 otherwise.