From: Frederik Harwath
Subject: [PATCH 01/40] Kernels loops annotation: C and C++.
Date: Wed, 15 Dec 2021 16:54:08 +0100
Message-ID: <20211215155447.19379-2-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: Sandra Loosemore, thomas@codesourcery.com, joseph@codesourcery.com, nathan@acm.org

From: Sandra Loosemore

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default. -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.

	gcc/c-family/
	* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
	* c-omp.c: Include tree-iterator.h
	(enum annotation_state): New.
	(struct annotation_info): New.
	(do_not_annotate_loop): New.
	(do_not_annotate_loop_nest): New.
	(annotation_error): New.
	(c_finish_omp_for_internal): Split from c_finish_omp_for. Use
	annotation_error function. Code refactoring to avoid destructive
	changes that cannot be undone in case of error.
	(is_local_var): New.
	(lang_specific_unwrap_initializer): New.
	(annotate_for_loop): New.
	(check_and_annotate_for_loop): New.
	(annotate_loops_in_kernels_regions): New.
	(c_oacc_annotate_loops_in_kernels_regions): New.
	* c.opt (Wopenacc-kernels-annotate-loops): New.
	(fopenacc-kernels-annotate-loops): New.

	gcc/c/
	* c-decl.c (c_unwrap_for_init): New.
	(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

	gcc/cp/
	* decl.c (cp_unwrap_for_init): New.
	(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

	gcc/
	* doc/invoke.texi (Option Summary): Add entries for
	-Wopenacc-kernels-annotate-loops and
	-fno-openacc-kernels-annotate-loops.
	(Warning Options): Document -Wopenacc-kernels-annotate-loops.
	(Optimization Options): Document
	-fno-openacc-kernels-annotate-loops.

	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
	-fno-openacc-kernels-annotate-loops option.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise.
	* c-c++-common/goacc/kernels-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
	* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
---
 gcc/c-family/c-common.h                       |   1 +
 gcc/c-family/c-omp.c                          | 799 ++++++++++++++++--
 gcc/c-family/c.opt                            |   8 +
 gcc/c/c-decl.c                                |  28 +
 gcc/cp/decl.c                                 |  44 +
 gcc/doc/invoke.texi                           |  32 +-
 .../goacc/classify-kernels-unparallelized.c   |   1 +
 .../c-c++-common/goacc/classify-kernels.c     |   3 +-
 .../kernels-counter-var-redundant-load.c      |   1 +
 .../kernels-counter-vars-function-scope.c     |   1 +
 .../goacc/kernels-double-reduction-n.c        |   1 +
 .../goacc/kernels-double-reduction.c          |   1 +
 .../c-c++-common/goacc/kernels-loop-2.c       |   1 +
 .../c-c++-common/goacc/kernels-loop-3.c       |   1 +
 .../goacc/kernels-loop-annotation-1.c         |  26 +
 .../goacc/kernels-loop-annotation-10.c        |  32 +
 .../goacc/kernels-loop-annotation-11.c        |  27 +
 .../goacc/kernels-loop-annotation-12.c        |  28 +
 .../goacc/kernels-loop-annotation-13.c        |  27 +
 .../goacc/kernels-loop-annotation-14.c        |  22 +
 .../goacc/kernels-loop-annotation-15.c        |  22 +
 .../goacc/kernels-loop-annotation-16.c        |  26 +
 .../goacc/kernels-loop-annotation-17.c        |  26 +
 .../goacc/kernels-loop-annotation-2.c         |  21 +
 .../goacc/kernels-loop-annotation-3.c         |  24 +
 .../goacc/kernels-loop-annotation-4.c         |  34 +
 .../goacc/kernels-loop-annotation-5.c         |  27 +
 .../goacc/kernels-loop-annotation-6.c         |  27 +
 .../goacc/kernels-loop-annotation-7.c         |  26 +
 .../goacc/kernels-loop-annotation-8.c         |  27 +
 .../goacc/kernels-loop-annotation-9.c         |  26 +
 .../c-c++-common/goacc/kernels-loop-data-2.c  |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.c    |   1 +
 .../goacc/kernels-loop-data-enter-exit.c      |   1 +
 .../goacc/kernels-loop-data-update.c          |   1 +
 .../c-c++-common/goacc/kernels-loop-data.c    |   1 +
 .../c-c++-common/goacc/kernels-loop-g.c       |   1 +
 .../goacc/kernels-loop-mod-not-zero.c         |   1 +
 .../c-c++-common/goacc/kernels-loop-n.c       |   1 +
 .../c-c++-common/goacc/kernels-loop-nest.c    |   1 +
 .../c-c++-common/goacc/kernels-loop.c         |   1 +
 .../goacc/kernels-one-counter-var.c           |   1 +
 .../kernels-parallel-loop-data-enter-exit.c   |   1 +
 .../c-c++-common/goacc/kernels-reduction.c    |   1 +
 44 files changed, 1322 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c

--
2.33.0

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f60714e34160..f8b414401a5d 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1247,6 +1247,7 @@ extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
 extern enum omp_clause_defaultmap_kind c_omp_predetermined_mapping (tree);
 extern tree c_omp_check_context_selector (location_t, tree);
 extern void c_omp_mark_declare_variant (location_t, tree, tree);
+extern void c_oacc_annotate_loops_in_kernels_regions (tree, tree (*) (tree));
 extern const char *c_omp_map_clause_name (tree, bool);
 extern void c_omp_adjust_map_clauses (tree, bool);
 
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index fad060670b65..fad50da8fbc4 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -37,7 +37,7 @@ along with GCC; see the file COPYING3. If not see
 #include "langhooks.h"
 #include "bitmap.h"
 #include "gimple-fold.h"
-
+#include "tree-iterator.h"
 
 /* Complete a #pragma oacc wait construct. LOC is the location of
    the #pragma. */
@@ -918,6 +918,110 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
   return incr;
 }
 
+/* State of annotation traversal for FOR loops in kernels regions,
+   used to control processing and diagnostic messages that are deferred until
+   the entire loop has been scanned.
*/ +enum annotation_state { + as_outer, + as_in_kernels_region, + as_in_kernels_loop, + /* The remaining state values represent conversion failures caught + while in as_in_kernels_loop state. To test whether the traversal is + in the body of a kernels loop, use (state >= as_in_kernels_loop). */ + as_invalid_variable_type, + as_missing_initializer, + as_invalid_initializer, + as_missing_predicate, + as_invalid_predicate, + as_missing_increment, + as_invalid_increment, + as_explicit_annotation, + as_invalid_control_flow, + as_invalid_break, + as_invalid_return, + as_invalid_call, + as_invalid_modification +}; + +/* Structure used to hold state for automatic annotation of FOR loops + in kernels regions. LOOP is the nearest enclosing loop, or + NULL_TREE if outside of a loop context. VARS is a tree_list + containing the variables controlling LOOP's termination (the + induction variable and a possible limit variable). STATE keeps + track of whether loop satisfies all criteria making it legal to + parallelize. Otherwise, REASON is a statement that blocks + automatic parallelization, such as an unstructured jump or an + assignment to a variable in VARS, used for printing diagnostics. + + These structures are chained through NEXT, which points to the + next-closest enclosing loop's or the kernels region's annotation info, if + any. */ + +struct annotation_info +{ + tree loop; + tree vars; + bool break_ok; + enum annotation_state state; + tree reason; + struct annotation_info *next; +}; + +/* Mark the current loop's INFO as not OK to annotate, recording STATE + and REASON for producing diagnostics later. */ + +static void +do_not_annotate_loop (struct annotation_info *info, + enum annotation_state state, tree reason) +{ + if (info->state == as_in_kernels_loop) + { + info->state = state; + info->reason = reason; + } +} + +/* Mark the current loop identified by INFO and all of its ancestors (i.e., + enclosing loops) as not OK to annotate. 
Arguments are the same as + for do_not_annotate_loop. */ + +static void +do_not_annotate_loop_nest (struct annotation_info *info, + enum annotation_state state, tree reason) +{ + while (info != NULL) + { + do_not_annotate_loop (info, state, reason); + info = info->next; + } +} + +/* If INFO is non-null, call do_not_annotate_loop with STATE and REASON + to record info for diagnosing an error later. Otherwise emit an error now + at ELOCUS with message MSG and the optional arguments. */ + +static void annotation_error (struct annotation_info *, + enum annotation_state, tree, location_t, + const char *, ...) ATTRIBUTE_GCC_DIAG(5,6); +static +void annotation_error (struct annotation_info *info, + enum annotation_state state, + tree reason, + location_t elocus, + const char *msg, ...) +{ + if (info) + do_not_annotate_loop (info, state, reason); + else + { + auto_diagnostic_group d; + va_list ap; + va_start (ap, msg); + emit_diagnostic_valist (DK_ERROR, elocus, -1, msg, &ap); + va_end (ap); + } +} + /* Validate and generate OMP_FOR. DECLV is a vector of iteration variables, for each collapsed loop. @@ -927,12 +1031,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr) INITV, CONDV and INCRV are vectors containing initialization expressions, controlling predicates and increment expressions. BODY is the body of the loop and PRE_BODY statements that go before - the loop. */ + the loop. FINAL_P is true if not inside a C++ template. -tree -c_finish_omp_for (location_t locus, enum tree_code code, tree declv, - tree orig_declv, tree initv, tree condv, tree incrv, - tree body, tree pre_body, bool final_p) + INFO is null if called to parse an explicitly-annotated OMP for + loop, otherwise it holds state information for automatically + annotating a regular FOR loop in a kernels region. In the former case, + malformed loops are hard errors; otherwise we just record the annotation + failure in INFO. 
*/ + +static tree +c_finish_omp_for_internal (location_t locus, enum tree_code code, tree declv, + tree orig_declv, tree initv, tree condv, tree incrv, + tree body, tree pre_body, bool final_p, + struct annotation_info *info) { location_t elocus; bool fail = false; @@ -956,12 +1067,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, if (!INTEGRAL_TYPE_P (TREE_TYPE (decl)) && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE) { - error_at (elocus, "invalid type for iteration variable %qE", decl); + annotation_error (info, as_invalid_variable_type, decl, elocus, + "invalid type for iteration variable %qE", decl); fail = true; } else if (TYPE_ATOMIC (TREE_TYPE (decl))) { - error_at (elocus, "%<_Atomic%> iteration variable %qE", decl); + annotation_error (info, as_invalid_variable_type, decl, elocus, + "%<_Atomic%> iteration variable %qE", decl); fail = true; /* _Atomic iterator confuses stuff too much, so we risk ICE trying to diagnose it further. */ @@ -977,7 +1090,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, init = DECL_INITIAL (decl); if (init == NULL) { - error_at (elocus, "%qE is not initialized", decl); + annotation_error (info, as_missing_initializer, decl, elocus, + "%qE is not initialized", decl); init = integer_zero_node; fail = true; } @@ -998,7 +1112,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, if (cond == NULL_TREE) { - error_at (elocus, "missing controlling predicate"); + annotation_error (info, as_missing_predicate, NULL_TREE, elocus, + "missing controlling predicate"); fail = true; } else @@ -1014,12 +1129,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, if (EXPR_HAS_LOCATION (cond)) elocus = EXPR_LOCATION (cond); - if (TREE_CODE (cond) == LT_EXPR - || TREE_CODE (cond) == LE_EXPR - || TREE_CODE (cond) == GT_EXPR - || TREE_CODE (cond) == GE_EXPR - || TREE_CODE (cond) == NE_EXPR - || TREE_CODE (cond) == EQ_EXPR) + enum tree_code condcode = TREE_CODE 
(cond); + + if (condcode == LT_EXPR + || condcode == LE_EXPR + || condcode == GT_EXPR + || condcode == GE_EXPR + || condcode == NE_EXPR + || condcode == EQ_EXPR) { tree op0 = TREE_OPERAND (cond, 0); tree op1 = TREE_OPERAND (cond, 1); @@ -1039,79 +1156,88 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, if (TREE_CODE (op0) == NOP_EXPR && decl == TREE_OPERAND (op0, 0)) { - TREE_OPERAND (cond, 0) = TREE_OPERAND (op0, 0); - TREE_OPERAND (cond, 1) - = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl), - TREE_OPERAND (cond, 1)); + op0 = TREE_OPERAND (op0, 0); + op1 = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl), + op1); } else if (TREE_CODE (op1) == NOP_EXPR && decl == TREE_OPERAND (op1, 0)) { - TREE_OPERAND (cond, 1) = TREE_OPERAND (op1, 0); - TREE_OPERAND (cond, 0) - = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl), - TREE_OPERAND (cond, 0)); + op1 = TREE_OPERAND (op1, 0); + op0 = fold_build1_loc (elocus, NOP_EXPR, TREE_TYPE (decl), + op0); } - if (decl == TREE_OPERAND (cond, 0)) + if (decl == op0) cond_ok = true; - else if (decl == TREE_OPERAND (cond, 1)) + else if (decl == op1) { - TREE_SET_CODE (cond, - swap_tree_comparison (TREE_CODE (cond))); - TREE_OPERAND (cond, 1) = TREE_OPERAND (cond, 0); - TREE_OPERAND (cond, 0) = decl; + condcode = swap_tree_comparison (condcode); + op1 = op0; + op0 = decl; cond_ok = true; } - if (TREE_CODE (cond) == NE_EXPR - || TREE_CODE (cond) == EQ_EXPR) + if (condcode == NE_EXPR || condcode == EQ_EXPR) { if (!INTEGRAL_TYPE_P (TREE_TYPE (decl))) { - if (code == OACC_LOOP || TREE_CODE (cond) == EQ_EXPR) + if (code == OACC_LOOP || condcode == EQ_EXPR) cond_ok = false; } - else if (operand_equal_p (TREE_OPERAND (cond, 1), + else if (operand_equal_p (op1, TYPE_MIN_VALUE (TREE_TYPE (decl)), 0)) - TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR - ? GT_EXPR : LE_EXPR); - else if (operand_equal_p (TREE_OPERAND (cond, 1), + condcode = (condcode == NE_EXPR ? 
GT_EXPR : LE_EXPR); + else if (operand_equal_p (op1, TYPE_MAX_VALUE (TREE_TYPE (decl)), 0)) - TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR - ? LT_EXPR : GE_EXPR); - else if (code == OACC_LOOP || TREE_CODE (cond) == EQ_EXPR) + condcode = (condcode == NE_EXPR ? LT_EXPR : GE_EXPR); + else if (code == OACC_LOOP || condcode == EQ_EXPR) cond_ok = false; } - if (cond_ok && TREE_VEC_ELT (condv, i) != cond) + if (cond_ok) { - tree ce = NULL_TREE, *pce = &ce; - tree type = TREE_TYPE (TREE_OPERAND (cond, 1)); - for (tree c = TREE_VEC_ELT (condv, i); c != cond; - c = TREE_OPERAND (c, 1)) + /* We postponed destructive changes to canonicalize + cond until we're sure it is OK. In the !error_p + case where we are trying to transform a regular FOR_STMT + to OMP_FOR, we don't want to destroy the original + condition if we aren't going to be able to do the + transformation anyway. */ + TREE_SET_CODE (cond, condcode); + TREE_OPERAND (cond, 0) = op0; + TREE_OPERAND (cond, 1) = op1; + + if (TREE_VEC_ELT (condv, i) != cond) { - *pce = build2 (COMPOUND_EXPR, type, TREE_OPERAND (c, 0), - TREE_OPERAND (cond, 1)); - pce = &TREE_OPERAND (*pce, 1); + tree ce = NULL_TREE, *pce = &ce; + tree type = TREE_TYPE (op1); + for (tree c = TREE_VEC_ELT (condv, i); c != cond; + c = TREE_OPERAND (c, 1)) + { + *pce = build2 (COMPOUND_EXPR, type, + TREE_OPERAND (c, 0), op1); + pce = &TREE_OPERAND (*pce, 1); + } + op1 = ce; + TREE_VEC_ELT (condv, i) = cond; } - TREE_OPERAND (cond, 1) = ce; - TREE_VEC_ELT (condv, i) = cond; } } if (!cond_ok) { - error_at (elocus, "invalid controlling predicate"); + annotation_error (info, as_invalid_predicate, cond, elocus, + "invalid controlling predicate"); fail = true; } } if (incr == NULL_TREE) { - error_at (elocus, "missing increment expression"); + annotation_error (info, as_missing_increment, NULL_TREE, elocus, + "missing increment expression"); fail = true; } else @@ -1210,9 +1336,11 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, if (i == 
NULL_TREE || !operand_equal_p (unit, i, 0)) { - error_at (elocus, - "increment is not constant 1 or " - "-1 for %<!=%> condition"); + annotation_error (info, + as_invalid_increment, + incr, elocus, + "increment is not constant 1 or " + "-1 for %<!=%> condition"); fail = true; } } @@ -1228,9 +1356,10 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, { if (!integer_onep (i) && !integer_minus_onep (i)) { - error_at (elocus, - "increment is not constant 1 or -1 for" - " %<!=%> condition"); + annotation_error (info, as_invalid_increment, + incr, elocus, + "increment is not constant 1 or -1 for" + " %<!=%> condition"); fail = true; } } @@ -1242,7 +1371,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, } if (!incr_ok) { - error_at (elocus, "invalid increment expression"); + annotation_error (info, as_invalid_increment, incr, + elocus, "invalid increment expression"); fail = true; } } @@ -1270,6 +1400,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv, } } +/* External entry point to c_finish_omp_for_internal, called from the + parsers. See above for description of the arguments. */ + +tree +c_finish_omp_for (location_t locus, enum tree_code code, tree declv, + tree orig_declv, tree initv, tree condv, tree incrv, + tree body, tree pre_body, bool final_p) +{ + return c_finish_omp_for_internal (locus, code, declv, + orig_declv, initv, condv, incrv, + body, pre_body, final_p, NULL); +} + + /* Type for passing data in between c_omp_check_loop_iv and c_omp_check_loop_iv_r. */ @@ -3000,6 +3144,543 @@ c_omp_map_clause_name (tree clause, bool oacc) return omp_clause_code_name[OMP_CLAUSE_CODE (clause)]; } +/* The following functions implement automatic recognition and annotation of + for loops in OpenACC kernels regions. 
Inside a kernels region, a nest of + for loops that does not contain any annotated OpenACC loops, nor break + or goto statements or assignments to the variables controlling loop + termination, is converted to an OMP_FOR node with an "acc loop auto" + annotation on each loop. This feature is controlled by + flag_openacc_kernels_annotate_loops. */ + +/* Check whether DECL is the declaration of a local variable (or function + parameter) of integral type that does not have its address taken. */ + +static bool +is_local_var (tree decl) +{ + return ((TREE_CODE (decl) == VAR_DECL || TREE_CODE (decl) == PARM_DECL) + && DECL_CONTEXT (decl) != NULL + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && INTEGRAL_TYPE_P (TREE_TYPE (decl)) + && !TREE_ADDRESSABLE (decl)); +} + +/* The initializer for a FOR_STMT is sometimes wrapped in various other + language-specific tree structures. We need a hook to unwrap them. + This function takes a tree argument and should return either a + MODIFY_EXPR, VAR_DECL, or NULL_TREE. */ + +static tree (*lang_specific_unwrap_initializer) (tree); + +/* Try to annotate the given NODE, which must be a FOR_STMT, with a + "#pragma acc loop auto" annotation. In practice, this means + building an OMP_FOR node for it. PREV_STMT is the statement + immediately before the loop, which may be used as the loop's + initialization statement. Annotating the loop may fail, in which + case INFO is used to record the cause of the failure and the + original loop remains unchanged. This function returns the + transformed loop if the transformation succeeded, the original node + otherwise. 
*/ + +static tree +annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi, + struct annotation_info *info) +{ + gcc_checking_assert (TREE_CODE (node) == FOR_STMT); + + location_t loc = EXPR_LOCATION (node); + tree cond = FOR_COND (node); + gcc_assert (cond); + tree decl = TREE_OPERAND (cond, 0); + gcc_assert (decl && TREE_CODE (decl) == VAR_DECL); + tree init = FOR_INIT_STMT (node); + tree prev_stmt = NULL_TREE; + bool unlink_prev = false; + bool fix_decl = false; + + + /* Both the C and C++ front ends normally put the initializer in the + statement list just before the FOR_STMT instead of in FOR_INIT_STMT. + If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out + because the code below won't handle it. */ + if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR) + { + do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE); + return node; + } + + /* Examine the statement before the loop to see if it is a + valid initializer. It must be either a MODIFY_EXPR or VAR_DECL, + possibly wrapped in language-specific structure. */ + if (init == NULL_TREE && prev_tsi != NULL) + { + prev_stmt = tsi_stmt (*prev_tsi); + + /* Call the language-specific hook to unwrap prev_stmt. */ + if (prev_stmt) + prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt); + + /* See if we have a valid MODIFY_EXPR. */ + if (prev_stmt + && TREE_CODE (prev_stmt) == MODIFY_EXPR + && TREE_OPERAND (prev_stmt, 0) == decl + && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1))) + { + init = prev_stmt; + unlink_prev = true; + } + else if (prev_stmt == decl + && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl))) + { + /* If the preceding statement is the declaration of the loop + variable with its initialization, build an assignment + expression for the loop's initializer. */ + init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl, + DECL_INITIAL (decl)); + /* We need to remove the initializer from the decl if we + end up using the init we just built instead. 
*/ + fix_decl = true; + } + } + + if (init == NULL_TREE) + /* There is nothing we can do to find the correct init statement for + this loop, but c_finish_omp_for insists on having one and would fail + otherwise. In that case, we would just return node. Do that + directly, here. */ + { + do_not_annotate_loop (info, as_missing_initializer, NULL_TREE); + return node; + } + + tree incr = FOR_EXPR (node); + + /* The C++ frontend can wrap the increment two levels deep inside a + cleanup expression, but c_finish_omp_for does not care about that. */ + if (incr != NULL_TREE && TREE_CODE (incr) == CLEANUP_POINT_EXPR) + incr = TREE_OPERAND (TREE_OPERAND (incr, 0), 0); + tree body = FOR_BODY (node); + + tree declv = make_tree_vec (1); + tree initv = make_tree_vec (1); + tree condv = make_tree_vec (1); + tree incrv = make_tree_vec (1); + TREE_VEC_ELT (declv, 0) = decl; + TREE_VEC_ELT (initv, 0) = init; + TREE_VEC_ELT (condv, 0) = cond; + TREE_VEC_ELT (incrv, 0) = incr; + + /* Do the actual transformation. This can still fail because + c_finish_omp_for has some stricter checks than we have performed up to + this point. */ + tree omp_for = c_finish_omp_for_internal (loc, OACC_LOOP, declv, NULL_TREE, + initv, condv, incrv, body, + NULL_TREE, false, info); + if (omp_for != NULL_TREE) + { + if (unlink_prev) + /* We don't need the previous statement that we consumed as an + initializer in the new OMP_FOR any more. */ + tsi_delink (prev_tsi); + + if (fix_decl) + /* We no longer need the initializer expression on the decl of + the loop variable and don't want to duplicate it. The + kernels conversion pass would interpret it as a stray + assignment in a gang-single region. */ + DECL_INITIAL (prev_stmt) = NULL_TREE; + + /* Add an auto clause, then return the new loop. 
*/ + tree auto_clause = build_omp_clause (loc, OMP_CLAUSE_AUTO); + OMP_CLAUSE_CHAIN (auto_clause) = OMP_FOR_CLAUSES (omp_for); + OMP_FOR_CLAUSES (omp_for) = auto_clause; + return omp_for; + } + + return node; +} + +/* Forward declaration. */ +static tree annotate_loops_in_kernels_regions (tree *, int *, void *); + +/* Given a FOR_STMT NODE that is a candidate for parallelization, check its + body for validity, then try to annotate it with + "#pragma oacc loop auto", possibly modifying the current node in place. + The INFO argument contains the traversal state at the point the loop + appears. */ + +static void +check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi, + struct annotation_info *info) +{ + tree node = *nodeptr; + gcc_assert (TREE_CODE (node) == FOR_STMT); + + /* This structure describes the current loop statement. */ + struct annotation_info loop_info + = { node, NULL_TREE, false, as_in_kernels_loop, NULL_TREE, info }; + tree cond = FOR_COND (node); + + /* If we are in the body of an explicitly-annotated loop, do not add + annotations to this loop or any other nested loops. */ + if (info->state == as_explicit_annotation) + do_not_annotate_loop (&loop_info, as_explicit_annotation, info->reason); + + /* We need to find the controlling variable for the loop in order + to detect whether it is modified in the body of the loop. + That is why we are doing some checks on the loop condition + that duplicate what c_finish_omp_for is doing. */ + + /* The loop condition must be a comparison. */ + else if (cond == NULL_TREE) + do_not_annotate_loop (&loop_info, as_missing_predicate, NULL_TREE); + else if (TREE_CODE_CLASS (TREE_CODE (cond)) != tcc_comparison) + do_not_annotate_loop (&loop_info, as_invalid_predicate, cond); + else + { + /* The condition's LHS must be a local variable that does not + have its address taken. Its RHS must also be such a local + variable or a constant. 
*/ + tree induction_var = TREE_OPERAND (cond, 0); + tree limit_var = TREE_OPERAND (cond, 1); + if (!is_local_var (induction_var) + || (!is_local_var (limit_var) + && (TREE_CODE_CLASS (TREE_CODE (limit_var)) + != tcc_constant))) + do_not_annotate_loop (&loop_info, as_invalid_predicate, cond); + else + { + /* These variables must not be assigned to in the loop. */ + loop_info.vars = tree_cons (NULL_TREE, induction_var, + loop_info.vars); + if (TREE_CODE_CLASS (TREE_CODE (limit_var)) != tcc_constant) + loop_info.vars = tree_cons (NULL_TREE, limit_var, loop_info.vars); + } + } + + /* Walk the body. This will process any nested loops, so we have to do it + even if we have already rejected this loop as a candidate for + annotation. */ + walk_tree (&FOR_BODY (node), annotate_loops_in_kernels_regions, + (void *) &loop_info, NULL); + + if (loop_info.state == as_in_kernels_loop) + { + /* If the traversal of the loop and all nested loops didn't hit + any problems, attempt the actual transformation. If it + succeeds, replace this node with the annotated loop. */ + tree result = annotate_for_loop (node, prev_tsi, &loop_info); + if (result != node) + { + /* Success! */ + *nodeptr = result; + return; + } + } + + /* If we got here, we have a FOR_STMT we could not convert to an + OMP loop. */ + + if (loop_info.state == as_invalid_return) + /* This is diagnosed elsewhere as a hard error, so no warning is + needed here. */ + return; + + /* Issue warnings about other problems. 
*/ + auto_diagnostic_group d; + if (warning_at (EXPR_LOCATION (node), + OPT_Wopenacc_kernels_annotate_loops, + "loop cannot be annotated for OpenACC parallelization")) + { + location_t locus; + if (loop_info.reason && EXPR_HAS_LOCATION (loop_info.reason)) + locus = EXPR_LOCATION (loop_info.reason); + else + locus = EXPR_LOCATION (node); + switch (loop_info.state) + { + case as_invalid_variable_type: + inform (locus, "invalid type for iteration variable %qE", + loop_info.reason); + break; + case as_missing_initializer: + inform (locus, "missing iteration variable initializer"); + break; + case as_invalid_initializer: + inform (locus, "unrecognized initializer"); + break; + case as_missing_predicate: + inform (locus, "missing controlling predicate"); + break; + case as_invalid_predicate: + inform (locus, "invalid controlling predicate"); + break; + case as_missing_increment: + inform (locus, "missing increment expression"); + break; + case as_invalid_increment: + inform (locus, "invalid increment expression"); + break; + case as_explicit_annotation: + inform (locus, "explicit OpenACC annotation in loop nest"); + break; + case as_invalid_control_flow: + inform (locus, "loop contains unstructured control flow"); + break; + case as_invalid_break: + inform (locus, "loop contains %<break%> statement"); + break; + case as_invalid_call: + inform (locus, "loop contains call to non-oacc function"); + break; + case as_invalid_modification: + inform (locus, "invalid modification of controlling variable"); + break; + default: + gcc_unreachable (); + } + } +} + +/* Traversal function for walk_tree. Visit the tree, finding OpenACC + kernels regions. DATA is a pointer to the annotation_info struct for the + enclosing context; its state is as_outer if we are outside of a kernels + region. If the traversal encounters a for loop inside a + kernels region that is a candidate for parallelization, annotate it + with OpenACC loop directives.
*/ + +static tree +annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees, + void *data) +{ + tree node = *nodeptr; + struct annotation_info *info = (struct annotation_info *) data; + gcc_assert (info); + + switch (TREE_CODE (node)) + { + case OACC_KERNELS: + /* Recursively process the body of the kernels region in a new info + scope. */ + if (info->state == as_outer) + { + struct annotation_info nested_info + = { NULL_TREE, NULL_TREE, true, + as_in_kernels_region, NULL_TREE, info }; + walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions, + (void *) &nested_info, NULL); + *walk_subtrees = 0; + } + break; + + case OACC_LOOP: + /* Do not try to add automatic OpenACC annotations inside manually + annotated loops. Presumably, the user avoided doing it on + purpose; for example, all available levels of parallelism may + have been used up. */ + { + struct annotation_info nested_info + = { NULL_TREE, NULL_TREE, false, as_explicit_annotation, + node, info }; + if (info->state >= as_in_kernels_region) + do_not_annotate_loop_nest (info, as_explicit_annotation, + node); + walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions, + (void *) &nested_info, NULL); + *walk_subtrees = 0; + } + break; + + case FOR_STMT: + /* Try to annotate the loop if we are in a kernels region. + This will do a recursive traversal of the loop body in a new + info scope. */ + if (info->state >= as_in_kernels_region) + { + check_and_annotate_for_loop (nodeptr, NULL, info); + *walk_subtrees = 0; + } + break; + + case LABEL_EXPR: + /* Possibly unstructured control flow. Unless we perform further + analyses, we must assume that such control flow may enter the + current loop. In this case, we must not parallelize the loop. */ + if (info->state >= as_in_kernels_loop + && TREE_USED (LABEL_EXPR_LABEL (node))) + do_not_annotate_loop_nest (info, as_invalid_control_flow, node); + break; + + case GOTO_EXPR: + /* Possibly unstructured control flow. 
Unless we perform further + analyses, we must assume that such control flow may leave the + current loop. In this case, we must not parallelize the loop. */ + if (info->state >= as_in_kernels_loop) + do_not_annotate_loop_nest (info, as_invalid_control_flow, node); + break; + + case BREAK_STMT: + /* A break statement. Whether or not this is valid depends on the + enclosing context. */ + if (info->state >= as_in_kernels_loop && !info->break_ok) + do_not_annotate_loop (info, as_invalid_break, node); + break; + + case RETURN_EXPR: + /* A return leaves the entire loop nest. */ + if (info->state >= as_in_kernels_loop) + do_not_annotate_loop_nest (info, as_invalid_return, node); + break; + + case CALL_EXPR: + /* Direct function calls to functions marked as OpenACC routines are + allowed. Reject indirect calls or calls to non-routines. */ + if (info->state >= as_in_kernels_loop) + { + tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE; + if (fn != NULL_TREE && TREE_CODE (fn) == FUNCTION_DECL) + fn_decl = fn; + else if (fn != NULL_TREE && TREE_CODE (fn) == ADDR_EXPR) + { + tree fn_op = TREE_OPERAND (fn, 0); + if (fn_op != NULL_TREE && TREE_CODE (fn_op) == FUNCTION_DECL) + fn_decl = fn_op; + } + if (fn_decl == NULL_TREE) + do_not_annotate_loop_nest (info, as_invalid_call, node); + else if (!lookup_attribute ("oacc function", + DECL_ATTRIBUTES (fn_decl))) + do_not_annotate_loop_nest (info, as_invalid_call, node); + } + break; + + case MODIFY_EXPR: + /* See if this assignment's LHS is one of the variables that must + not be modified in the loop body because they control termination + of the loop (or an enclosing loop in the nest). */ + if (info->state >= as_in_kernels_loop) + { + tree lhs = TREE_OPERAND (node, 0); + if (!is_local_var (lhs)) + /* Early exit: This cannot be a variable we care about. */ + break; + /* Walk up the loop stack. Invalidate the ones controlled by this + variable. 
There may be several, if this variable is the common + iteration limit for several nested loops. */ + for (struct annotation_info *outer_loop = info; outer_loop != NULL; + outer_loop = outer_loop->next) + for (tree t = outer_loop->vars; t != NULL_TREE; t = TREE_CHAIN (t)) + if (TREE_VALUE (t) == lhs) + { + do_not_annotate_loop (outer_loop, + as_invalid_modification, + node); + break; + } + } + break; + + case SWITCH_STMT: + /* Needs special handling to allow break in the body. */ + if (info->state >= as_in_kernels_loop) + { + bool save_break_ok = info->break_ok; + + walk_tree (&SWITCH_STMT_COND (node), + annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = true; + walk_tree (&SWITCH_STMT_BODY (node), + annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = save_break_ok; + *walk_subtrees = 0; + } + break; + + case WHILE_STMT: + /* Needs special handling to allow break in the body. */ + if (info->state >= as_in_kernels_loop) + { + bool save_break_ok = info->break_ok; + + walk_tree (&WHILE_COND (node), annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = true; + walk_tree (&WHILE_BODY (node), annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = save_break_ok; + *walk_subtrees = 0; + } + break; + + case DO_STMT: + /* Needs special handling to allow break in the body. */ + if (info->state >= as_in_kernels_loop) + { + bool save_break_ok = info->break_ok; + + walk_tree (&DO_COND (node), annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = true; + walk_tree (&DO_BODY (node), annotate_loops_in_kernels_regions, + (void *) info, NULL); + info->break_ok = save_break_ok; + *walk_subtrees = 0; + } + break; + + case STATEMENT_LIST: + /* We iterate over these explicitly so that we can track the previous + statement in the chain. It may be the initializer for a following + FOR_STMT node. 
*/ + if (info->state >= as_in_kernels_region) + { + tree_stmt_iterator i = tsi_start (node); + tree_stmt_iterator prev, *prev_tsi = NULL; + while (!tsi_end_p (i)) + { + tree *stmtptr = tsi_stmt_ptr (i); + if (TREE_CODE (*stmtptr) == FOR_STMT) + { + check_and_annotate_for_loop (stmtptr, prev_tsi, info); + *walk_subtrees = 0; + } + else + walk_tree (stmtptr, annotate_loops_in_kernels_regions, + (void *) info, NULL); + prev = i; + prev_tsi = &prev; + tsi_next (&i); + } + *walk_subtrees = 0; + } + break; + + default: + break; + } + + return NULL_TREE; +} + +/* Find for loops in OpenACC kernels regions that do not have OpenACC + annotations but look like they might benefit from automatic + parallelization. Convert them from FOR_STMT to OMP_FOR nodes and + add the equivalent of "#pragma acc loop auto" annotations for them. + Assumes flag_openacc_kernels_annotate_loops is set. */ + +void +c_oacc_annotate_loops_in_kernels_regions (tree decl, + tree (*unwrap_fn) (tree)) +{ + struct annotation_info info + = { NULL_TREE, NULL_TREE, true, as_outer, NULL_TREE, NULL }; + lang_specific_unwrap_initializer = unwrap_fn; + walk_tree (&DECL_SAVED_TREE (decl), annotate_loops_in_kernels_regions, + (void *) &info, NULL); +} + /* Used to merge map clause information in c_omp_adjust_map_clauses. */ struct map_clause { diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 06457ac739e4..a0f43d6d325f 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1074,6 +1074,10 @@ Wopenacc-parallelism C C++ Var(warn_openacc_parallelism) Warning Warn about potentially suboptimal choices related to OpenACC parallelism. +Wopenacc-kernels-annotate-loops +C ObjC C++ ObjC++ Warning Var(warn_openacc_kernels_annotate_loops) Init(0) +Warn about loops in OpenACC kernels regions that cannot be parallelized. + Wopenmp-simd C C++ Var(warn_openmp_simd) Warning LangEnabledBy(C C++,Wall) Warn if a simd directive is overridden by the vectorizer cost model. 
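The break handling in annotate_loops_in_kernels_regions (a break is acceptable inside a nested switch, while or do statement, but not directly in the candidate for loop) can be summarized with a small standalone sketch. This is not GCC code: the toy_* types and functions are hypothetical, and break_ok is passed down as a parameter instead of being saved and restored in annotation_info as the patch does. The sketch also assumes that a nested for loop is judged as its own candidate, so a break inside it does not count against the outer loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature statement kinds; these are NOT GCC tree codes.  */
enum toy_kind { TOY_FOR, TOY_SWITCH, TOY_WHILE, TOY_DO, TOY_BREAK, TOY_OTHER };

struct toy_stmt
{
  enum toy_kind kind;
  const struct toy_stmt *body;  /* First statement of the body, or NULL.  */
  const struct toy_stmt *next;  /* Next sibling statement, or NULL.  */
};

/* Return true if the statement list S contains a break that would leave
   the candidate loop.  BREAK_OK is true when a break at this point binds
   to an enclosing switch/while/do rather than to the candidate loop.  */
static bool
toy_has_invalid_break (const struct toy_stmt *s, bool break_ok)
{
  for (; s != NULL; s = s->next)
    switch (s->kind)
      {
      case TOY_BREAK:
        if (!break_ok)
          return true;
        break;

      case TOY_SWITCH:
      case TOY_WHILE:
      case TOY_DO:
        /* A break inside these constructs binds to the construct itself.  */
        if (toy_has_invalid_break (s->body, true))
          return true;
        break;

      case TOY_FOR:
        /* Assumption: a nested for loop is a separate candidate, so its
           body is not checked on behalf of the outer loop.  */
        break;

      default:
        break;
      }
  return false;
}

/* Return true if LOOP (a TOY_FOR) passes the break check.  */
static bool
toy_loop_can_be_annotated (const struct toy_stmt *loop)
{
  assert (loop != NULL && loop->kind == TOY_FOR);
  return !toy_has_invalid_break (loop->body, false);
}
```

A for loop whose body is a switch containing a break passes this check, while a for loop with a break directly in its body does not, which mirrors what kernels-loop-annotation-4.c and kernels-loop-annotation-6.c expect from the real implementation.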
@@ -1910,6 +1914,10 @@ fopenacc-dim= C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims) Specify default OpenACC compute dimensions. +fopenacc-kernels-annotate-loops +C ObjC C++ ObjC++ LTO Optimization Var(flag_openacc_kernels_annotate_loops) Init(1) +Automatically parallelize unannotated loops in OpenACC kernels regions. + fopenmp C ObjC C++ ObjC++ LTO Var(flag_openmp) Enable OpenMP (implies -frecursive in Fortran). diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index 186fa1692c16..467b3425b9a4 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -10230,6 +10230,29 @@ temp_pop_parm_decls (void) pop_scope (); } +/* Function passed to c_oacc_annotate_loops_in_kernels_regions to do + language-specific unwrapping of an initializer expression. */ +static tree +c_unwrap_for_init (tree x) +{ + if (!x) + return NULL_TREE; + + while (true) + switch (TREE_CODE (x)) + { + case MODIFY_EXPR: + case VAR_DECL: + return x; + + case DECL_EXPR: + x = TREE_OPERAND (x, 0); + break; + + default: + return NULL_TREE; + } +} /* Finish up a function declaration and compile that function all the way to assembler language output. Then free the storage @@ -10332,6 +10355,11 @@ finish_function (location_t end_loc) if (warn_unused_parameter) do_warn_unused_parameter (fndecl); + /* If requested, automatically annotate suitable loops in OpenACC kernels + regions with OpenACC loop annotations to allow auto-parallelization. */ + if (flag_openacc && flag_openacc_kernels_annotate_loops) + c_oacc_annotate_loops_in_kernels_regions (fndecl, c_unwrap_for_init); + /* Store the end of the function, so that we get good line number info for the epilogue.
*/ cfun->function_end_locus = end_loc; diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 7c2a134e4061..17f14d1f6742 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -17528,6 +17528,45 @@ emit_coro_helper (tree helper) expand_or_defer_fn (helper); } + +/* Function passed to c_oacc_annotate_loops_in_kernels_regions to do + language-specific unwrapping of an initializer expression. */ +static tree +cp_unwrap_for_init (tree x) +{ + if (!x) + return NULL_TREE; + + while (true) + switch (TREE_CODE (x)) + { + case MODIFY_EXPR: + case VAR_DECL: + return x; + + case CLEANUP_POINT_EXPR: + x = TREE_OPERAND (x, 0); + break; + + case EXPR_STMT: + x = TREE_OPERAND (x, 0); + break; + + case DECL_EXPR: + x = TREE_OPERAND (x, 0); + break; + + case CONVERT_EXPR: + if (TREE_TYPE (x) != void_type_node) + return NULL_TREE; + x = TREE_OPERAND (x, 0); + break; + + default: + return NULL_TREE; + } +} + /* Finish up a function declaration and compile that function all the way to assembler language output. The free the storage for the function definition. INLINE_P is TRUE if we just @@ -17832,6 +17871,11 @@ finish_function (bool inline_p) && !DECL_CLONED_FUNCTION_P (fndecl)) do_warn_unused_parameter (fndecl); + /* If requested, automatically annotate suitable loops in OpenACC kernels + regions with OpenACC loop annotations to allow auto-parallelization. */ + if (flag_openacc && flag_openacc_kernels_annotate_loops) + c_oacc_annotate_loops_in_kernels_regions (fndecl, cp_unwrap_for_init); + /* Genericize before inlining. */ if (!processing_template_decl && !DECL_IMMEDIATE_FUNCTION_P (fndecl) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9fb74d349203..e0f09610408c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -371,6 +371,7 @@ Objective-C and Objective-C++ Dialects}.
-Wnull-dereference -Wno-odr @gol -Wopenacc-parallelism @gol -Wopenmp-simd @gol +-Wopenacc-kernels-annotate-loops @gol -Wno-overflow -Woverlength-strings -Wno-override-init-side-effects @gol -Wpacked -Wno-packed-bitfield-compat -Wpacked-not-aligned -Wpadded @gol -Wparentheses -Wno-pedantic-ms-format @gol @@ -533,7 +534,8 @@ Objective-C and Objective-C++ Dialects}. -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol -fmove-loop-invariants -fmove-loop-stores -fno-branch-count-reg @gol -fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse @gol --fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole @gol +-fno-guess-branch-probability -fno-inline -fno-math-errno @gol +-fno-openacc-kernels-annotate-loops -fno-peephole @gol -fno-peephole2 -fno-printf-return-value -fno-sched-interblock @gol -fno-sched-spec -fno-signed-zeros @gol -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol @@ -8957,6 +8959,13 @@ Enabled by default. @cindex OpenACC accelerator programming Warn about potentially suboptimal choices related to OpenACC parallelism. +@item -Wopenacc-kernels-annotate-loops +@opindex Wopenacc-kernels-annotate-loops +@opindex Wno-openacc-kernels-annotate-loops +Warn about @code{for} (C/C++) or @code{DO} (Fortran) loops in OpenACC +kernels regions that cannot be automatically annotated for +parallelization with @option{-fopenacc-kernels-annotate-loops}. + @item -Wopenmp-simd @opindex Wopenmp-simd @opindex Wno-openmp-simd @@ -14835,6 +14844,27 @@ SIMD iterations. @end table +@item -fno-openacc-kernels-annotate-loops +@opindex fno-openacc-kernels-annotate-loops +@opindex fopenacc-kernels-annotate-loops +@cindex kernels regions, OpenACC +Disable automatic parallelization of unannotated loops in OpenACC +kernels regions. The default is to attempt to add implicit +@code{acc loop auto} annotations to loops in kernels regions if +@option{-fopenacc} is enabled.
+ +Note that you can use @option{-Wopenacc-kernels-annotate-loops} to +diagnose @code{for} loops that cannot be automatically annotated +(@pxref{Warning Options}). Reasons why automatic loop annotations +cannot be applied include premature exits, calls to functions without +an @code{openacc routine} annotation, or unstructured control flow in +the loop body. In C and C++, the loop variable initialization, end +test, and increment expressions must additionally conform to +restrictions similar to those for explicitly-annotated loops, and the +loop variable must not be otherwise modified in the body of the loop. +An explicit @code{acc loop} annotation disables automatic annotations +on any nested or containing loops. + @end table @node Instrumentation Options diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c index 1d12658790d1..e391184f403d 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c @@ -2,6 +2,7 @@ OpenACC kernels. */ /* { dg-additional-options "-O2" } + { dg-additional-options "-fno-openacc-kernels-annotate-loops" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c index bdf7b4a06410..779e2b0a24db 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c @@ -1,7 +1,8 @@ /* Check offloaded function's attributes and classification for OpenACC - kernels. */ + 'kernels' (parloops version). 
*/ /* { dg-additional-options "-O2" } + { dg-additional-options "-fno-openacc-kernels-annotate-loops" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c index 030425475495..c37152c74041 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-dom3" } */ #include diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c index c475333f1aef..b1f43029af7c 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c index 8f7f415b58d8..e87aab3295c7 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fopt-info-optimized-omp" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ 
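The MODIFY_EXPR case in annotate_loops_in_kernels_regions rejects every loop on the enclosing-loop stack whose induction or limit variable is assigned in the body. The following standalone sketch illustrates that walk. It is not GCC code: struct toy_loop and toy_note_assignment are hypothetical stand-ins for annotation_info and the patch's loop over outer_loop->vars, and variables are compared by name here, whereas the patch compares tree nodes by identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 2

/* Hypothetical stand-in for annotation_info: one record per enclosing
   loop, linked from the innermost loop outwards.  */
struct toy_loop
{
  const char *vars[TOY_MAX_VARS];  /* Control variables; unused slots NULL.  */
  bool can_annotate;               /* Cleared once the loop is rejected.  */
  struct toy_loop *next;           /* Enclosing loop, or NULL.  */
};

/* Record an assignment to NAME seen in the body of INNERMOST.  Every loop
   on the stack that is controlled by NAME is rejected; there may be
   several if NAME is a limit shared by nested loops.  */
static void
toy_note_assignment (struct toy_loop *innermost, const char *name)
{
  assert (name != NULL);
  for (struct toy_loop *l = innermost; l != NULL; l = l->next)
    for (size_t i = 0; i < TOY_MAX_VARS; i++)
      if (l->vars[i] != NULL && strcmp (l->vars[i], name) == 0)
        {
          l->can_annotate = false;
          break;
        }
}
```

With an outer loop controlled by i and n and an inner loop controlled by j and i, an assignment to j in the inner body rejects only the inner loop, and an assignment to n in the outer body rejects only the outer loop. This matches the outcomes that kernels-loop-annotation-12.c and kernels-loop-annotation-13.c check for in the real implementation.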
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c index c11d36fb4373..2323857fb4ad 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fopt-info-optimized-omp" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c index acef6a1a1793..adca30bf2cd7 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c index 75e2bb78cea4..5f16085ff386 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c new file mode 100644 index 000000000000..c7b5ac882195 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-1.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { 
dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that all loops in the nest are annotated. */ + +void f (float a[16][16], float b[16][16], float c[16][16]) +{ + int i, j, k; + +#pragma acc kernels copyin(a[0:16][0:16], b[0:16][0:16]) copyout(c[0:16][0:16]) + { + for (i = 0; i < 16; i++) { + for (j = 0; j < 16; j++) { + float t = 0; + for (k = 0; k < 16; k++) + t += a[i][k] * b[k][j]; + c[i][j] = t; + } + } + } + +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c new file mode 100644 index 000000000000..58b41d20e232 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-10.c @@ -0,0 +1,32 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a random goto in the body can't be annotated. 
*/ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0) + { + t = 0; + goto bad; + } + t += a[i] * b[i]; + } + bad: + ; + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c new file mode 100644 index 000000000000..e9d2ef48611a --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-11.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a random label in the body triggers a warning. */ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i = n - 1; + +#pragma acc kernels + { + goto spaghetti; + for (i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + spaghetti: + t += a[i] * b[i]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c new file mode 100644 index 000000000000..ba408bc3634d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-12.c @@ -0,0 +1,28 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that in a situation with nested loops, a problem that prevents + annotation of the inner loop only still allows the outer loop to be + annotated. 
*/ + +float f (float *a, float *b, int n) +{ + float t = 0; + +#pragma acc kernels + { + for (int i = 0; i < n; i++) + for (int j = 0; j <= i; j++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0 || b[j] < 0) + j = i; + else + t += a[i] * b[j]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c new file mode 100644 index 000000000000..64433e816ed4 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-13.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that in a situation with nested loops, a problem that prevents + annotation of the outer loop only still allows the inner loop to be + annotated. 
*/ + +float f (float *a, float *b, int n) +{ + float t = 0; + +#pragma acc kernels + { + for (int i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0) + n = i; + for (int j = 0; j <= i; j++) + t += a[i] * b[j]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c new file mode 100644 index 000000000000..379e6baf97c3 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-14.c @@ -0,0 +1,22 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that an explicit annotation on an outer loop suppresses annotation + of inner loops, and produces a diagnostic. */ + +void f (float *a, float *b) +{ + float t = 0; + +#pragma acc kernels + { +#pragma acc loop seq + for (int l = 0; l < 20; l++) + for (int m = 0; m < 20; m++) /* { dg-warning "loop cannot be annotated" } */ + b[m] = a[m]; + } +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c new file mode 100644 index 000000000000..9a2a7cabde5d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-15.c @@ -0,0 +1,22 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that an explicit annotation on an inner loop suppresses annotation + of outer loops, and produces a diagnostic. 
*/ + +void f (float *a, float *b) +{ + float t = 0; + +#pragma acc kernels + { + for (int l = 0; l < 20; l++) /* { dg-warning "loop cannot be annotated" } */ +#pragma acc loop seq + for (int m = 0; m < 20; m++) + b[m] = a[m]; + } +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c new file mode 100644 index 000000000000..075f897fad4a --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-16.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a modification of the loop variable in the + body cannot be annotated. */ + +float f (float *a, float *b, int n) +{ + float t = 0; + +#pragma acc kernels + { + for (int i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0 || b[i] < 0) + i = n; + else + t += a[i] * b[i]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c new file mode 100644 index 000000000000..507678965b4d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-17.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a modification of the loop iteration count + variable in the body cannot be annotated. 
*/ + +float f (float *a, float *b, int n) +{ + float t = 0; + +#pragma acc kernels + { + for (int i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0 || b[i] < 0) + n = i; + else + t += a[i] * b[i]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c new file mode 100644 index 000000000000..9e0a946828ff --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-2.c @@ -0,0 +1,21 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a variable bound can be annotated. */ + +float f (float *a, float *b, int n) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + t += a[i] * b[i]; + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c new file mode 100644 index 000000000000..f60070e27961 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-3.c @@ -0,0 +1,24 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a conditional in the body can be annotated. 
*/ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + if (a[i] > 0 && b[i] > 0) + t += a[i] * b[i]; + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c new file mode 100644 index 000000000000..949871cc42ec --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-4.c @@ -0,0 +1,34 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a switch and break in the body can be annotated. */ + +#define n 16 + +float f (float *a, float *b, int state) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + switch (state) + { + case 0: + default: + t += a[i] * b[i]; + break; + + case 1: + if (a[i] > 0 && b[i] > 0) + t += a[i] * b[i]; + break; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c new file mode 100644 index 000000000000..03dfe8fbcd40 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-5.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a continue statement in the body can be annotated. 
*/ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + { + if (a[i] < 0 || b[i] < 0) + continue; + t += a[i] * b[i]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c new file mode 100644 index 000000000000..ede6b3c8cd67 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-6.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a break statement in the body cannot be annotated. */ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + { + if (a[i] < 0 || b[i] < 0) + break; + t += a[i] * b[i]; + } + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c new file mode 100644 index 000000000000..20ee29989665 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-7.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with a random function call in the body cannot be + annotated. 
*/ + +extern float g (float); + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) /* { dg-warning "loop cannot be annotated" } */ + t += g (a[i] * b[i]); + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c new file mode 100644 index 000000000000..796f048d67ca --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-8.c @@ -0,0 +1,27 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a loop with an openacc function call in the body can be + annotated. */ + +#pragma acc routine worker +extern float g (float); + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + t += g (a[i] * b[i]); + } + return t; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c new file mode 100644 index 000000000000..048f1b09a84d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-9.c @@ -0,0 +1,26 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that a kernels loop with a return in the body triggers a hard + error. 
*/ + +#define n 16 + +float f (float *a, float *b) +{ + float t = 0; + int i; + +#pragma acc kernels + { + for (i = 0; i < n; i++) + { + if (a[i] < 0 || b[i] < 0) + return 0.0; /* { dg-error "invalid branch" } */ + t += a[i] * b[i]; + } + } + return t; +} diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c index 71800217991a..9a97de6f6e13 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c index 0c9f83312408..31e8378e3d74 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c index 0bd21b68d317..ad591551b979 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c 
b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c index dd5a84146a8e..4acffef41ba1 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c index a658182de904..327aa0570c9c 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c index 73b469d70610..26c65fe742aa 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c @@ -1,5 +1,6 @@ /* { dg-additional-options "-O2" } */ /* { dg-additional-options "-g" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c index 55926230d578..8955cf29224b 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { 
dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c index e86be1b1cdc0..d88a61dbab51 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c index 2b0e186ae297..5943d56a5bbe 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c index 9619d53b43d7..ad525cdbe141 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c index 69539b24a78d..f799baffd8df 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* 
{ dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c index 81b0fee5a44c..b8093b54dec8 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c index 5921b88920fd..105cbcf3ba2e 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c @@ -1,4 +1,5 @@ /* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fdump-tree-parloops1-all" } */ /* { dg-additional-options "-fdump-tree-optimized" } */
From patchwork Wed Dec 15 15:54:09 2021
From: Frederik Harwath
Subject: [PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.
Date: Wed, 15 Dec 2021 16:54:09 +0100
Message-ID: <20211215155447.19379-3-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: Sandra Loosemore, thomas@codesourcery.com
From: Sandra Loosemore 2020-03-27 Sandra Loosemore gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-2.c: Add -fno-openacc-kernels-annotate-loops. --- gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 + 1 file changed, 1 insertion(+) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c index cdf85d4bafae..0f2d2f0a757b 100644 --- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c +++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c @@ -1,5 +1,6 @@ /* Test OpenACC 'kernels' construct decomposition. 
*/ +/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */ /* { dg-additional-options "-fopt-info-omp-all" } */ /* { dg-additional-options "--param=openacc-kernels=decompose" } /* { dg-additional-options "-O2" } for 'parloops'. */
From patchwork Wed Dec 15 15:54:10 2021
From: Frederik Harwath
Subject: [PATCH 03/40] Kernels loops annotation: Fortran.
Date: Wed, 15 Dec 2021 16:54:10 +0100
Message-ID: <20211215155447.19379-4-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: tobias@codesourcery.com, Sandra Loosemore, Gergö Barany, thomas@codesourcery.com, fortran@gcc.gnu.org
From: Sandra Loosemore This patch implements the Fortran support for adding "#pragma acc loop auto" annotations to loops in OpenACC kernels regions. It implements the same -fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options that were previously added (and documented) for the C/C++ front ends. Co-Authored-By: Gergö Barany gcc/fortran/ * gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare. * lang.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. * openmp.c: Include options.h. (enum annotation_state, enum annotation_result): New. (check_code_for_invalid_calls): New. (check_expr_for_invalid_calls): New. (check_for_invalid_calls): New. 
(annotate_do_loop): New. (annotate_do_loops_in_kernels): New. (compute_goto_targets): New. (gfc_oacc_annotate_loops_in_kernels_regions): New. * parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops. gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add -fno-openacc-kernels-annotate-loops option. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data.f95: Likewise. * gfortran.dg/goacc/kernels-loop-n.f95: Likewise. * gfortran.dg/goacc/kernels-loop.f95: Likewise. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-1.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-2.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-3.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-4.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-5.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-6.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-7.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-8.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-9.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-10.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-11.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-12.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-13.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-14.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-15.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-16.f95: New. 
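For illustration only (not part of the patch): a minimal C sketch of what the annotation amounts to, under the assumption that it is equivalent to writing the "acc loop auto" directive by hand. The loop is the one from c-c++-common/goacc/kernels-loop-annotation-2.c; the function names dot_plain and dot_annotated are hypothetical. The first function is what the user writes, the second spells out the directive that the pass is described as adding.

```c
/* Illustrative sketch only; dot_plain and dot_annotated are hypothetical
   names, not functions from the patch.  */

/* The loop as the user writes it: a kernels region with an unannotated
   loop that has a variable bound.  */
float dot_plain (const float *a, const float *b, int n)
{
  float t = 0;
  int i;

#pragma acc kernels
  {
    for (i = 0; i < n; i++)
      t += a[i] * b[i];
  }
  return t;
}

/* Hand-written form with the directive that the annotation pass is
   described as inserting automatically (assumed equivalent).  */
float dot_annotated (const float *a, const float *b, int n)
{
  float t = 0;
  int i;

#pragma acc kernels
  {
#pragma acc loop auto
    for (i = 0; i < n; i++)
      t += a[i] * b[i];
  }
  return t;
}
```

Built without -fopenacc, the pragmas are ignored and both functions compute the same dot product sequentially. With -fopenacc -fopenacc-kernels-annotate-loops -fdump-tree-original, the C test case for this loop expects "acc loop auto" to appear once in the "original" dump of the unannotated form.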
--- gcc/fortran/gfortran.h | 1 + gcc/fortran/lang.opt | 8 + gcc/fortran/openmp.c | 364 ++++++++++++++++++ gcc/fortran/parse.c | 9 + .../goacc/classify-kernels-unparallelized.f95 | 1 + .../gfortran.dg/goacc/classify-kernels.f95 | 1 + .../gfortran.dg/goacc/common-block-3.f90 | 1 + .../gfortran.dg/goacc/kernels-loop-2.f95 | 1 + .../goacc/kernels-loop-annotation-1.f95 | 33 ++ .../goacc/kernels-loop-annotation-10.f95 | 32 ++ .../goacc/kernels-loop-annotation-11.f95 | 34 ++ .../goacc/kernels-loop-annotation-12.f95 | 39 ++ .../goacc/kernels-loop-annotation-13.f95 | 38 ++ .../goacc/kernels-loop-annotation-14.f95 | 35 ++ .../goacc/kernels-loop-annotation-15.f95 | 35 ++ .../goacc/kernels-loop-annotation-16.f95 | 34 ++ .../goacc/kernels-loop-annotation-2.f95 | 32 ++ .../goacc/kernels-loop-annotation-3.f95 | 33 ++ .../goacc/kernels-loop-annotation-4.f95 | 34 ++ .../goacc/kernels-loop-annotation-5.f95 | 35 ++ .../goacc/kernels-loop-annotation-6.f95 | 34 ++ .../goacc/kernels-loop-annotation-7.f95 | 48 +++ .../goacc/kernels-loop-annotation-8.f95 | 50 +++ .../goacc/kernels-loop-annotation-9.f95 | 34 ++ .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit.f95 | 1 + .../goacc/kernels-loop-data-update.f95 | 1 + .../gfortran.dg/goacc/kernels-loop-data.f95 | 1 + .../gfortran.dg/goacc/kernels-loop-n.f95 | 1 + .../gfortran.dg/goacc/kernels-loop.f95 | 1 + .../kernels-parallel-loop-data-enter-exit.f95 | 1 + 32 files changed, 974 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95 -- 2.33.0 diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index f7662c59a5df..50db768ce0fc 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3545,6 +3545,7 @@ void gfc_resolve_oacc_declare (gfc_namespace *); void gfc_resolve_oacc_parallel_loop_blocks (gfc_code *, gfc_namespace *); void gfc_resolve_oacc_blocks (gfc_code *, gfc_namespace *); void gfc_resolve_oacc_routines (gfc_namespace *); +void gfc_oacc_annotate_loops_in_kernels_regions (gfc_namespace *); /* expr.c */ void gfc_free_actual_arglist (gfc_actual_arglist *); diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index 6db01c736be1..a202c04c4a25 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -289,6 +289,10 @@ Wopenacc-parallelism Fortran ; Documented in C +Wopenacc-kernels-annotate-loops +Fortran +; Documented in C + Wopenmp-simd Fortran ; 
Documented in C @@ -695,6 +699,10 @@ fopenacc-dim= Fortran LTO Joined Var(flag_openacc_dims) ; Documented in C +fopenacc-kernels-annotate-loops +Fortran LTO Optimization +; Documented in C + fopenmp Fortran LTO ; Documented in C diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index dcf22ac2c2f3..243b5e0a9ac6 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic.h" #include "gomp-constants.h" #include "target-memory.h" /* For gfc_encode_character. */ +#include "options.h" /* Match an end of OpenMP directive. End of OpenMP directive is optional whitespace, followed by '\n' or comment '!'. */ @@ -9090,3 +9091,366 @@ gfc_resolve_omp_udrs (gfc_symtree *st) for (omp_udr = st->n.omp_udr; omp_udr; omp_udr = omp_udr->next) gfc_resolve_omp_udr (omp_udr); } + + +/* The following functions implement automatic recognition and annotation of + DO loops in OpenACC kernels regions. Inside a kernels region, a nest of + DO loops that does not contain any annotated OpenACC loops, nor EXIT + or GOTO statements, gets an automatic "acc loop auto" annotation + on each loop. + This feature is controlled by flag_openacc_kernels_annotate_loops. */ + + +/* State of the annotation traversal for DO loops in kernels regions. */ +enum annotation_state { + as_outer, + as_in_kernels_region, + as_in_kernels_loop, + as_in_kernels_inner_loop +}; + +/* Return status of annotation traversal. */ +enum annotation_result { + ar_ok, + ar_invalid_loop, + ar_invalid_nest +}; + +/* Code walk function for check_for_invalid_calls. */ + +static int +check_code_for_invalid_calls (gfc_code **codep, int *walk_subtrees, + void *data ATTRIBUTE_UNUSED) +{ + gfc_code *code = *codep; + switch (code->op) + { + case EXEC_CALL: + /* Calls to OpenACC routines are permitted. */ + if (code->resolved_sym + && (code->resolved_sym->attr.oacc_routine_lop + != OACC_ROUTINE_LOP_NONE)) + return 0; + /* Else fall through. 
*/ + + case EXEC_CALL_PPC: + case EXEC_ASSIGN_CALL: + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Subroutine call at %L prevents annotation of loop nest", + &code->loc); + *walk_subtrees = 0; + return 1; + + default: + return 0; + } +} + +/* Expr walk function for check_for_invalid_calls. */ + +static int +check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees, + void *data ATTRIBUTE_UNUSED) +{ + gfc_expr *expr = *exprp; + switch (expr->expr_type) + { + case EXPR_FUNCTION: + if (expr->value.function.esym + && (expr->value.function.esym->attr.oacc_routine_lop + != OACC_ROUTINE_LOP_NONE)) + return 0; + /* Else fall through. */ + + case EXPR_COMPCALL: + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Function call at %L prevents annotation of loop nest", + &expr->where); + *walk_subtrees = 0; + return 1; + + default: + return 0; + } +} + +/* Return TRUE if the DO loop CODE contains function or procedure + calls that ought to prohibit annotation. This traversal is + separate from the main annotation tree walk because we need to walk + expressions as well as executable statements. */ + +static bool +check_for_invalid_calls (gfc_code *code) +{ + gcc_assert (code->op == EXEC_DO); + return gfc_code_walker (&code, check_code_for_invalid_calls, + check_expr_for_invalid_calls, NULL); +} + +/* Annotate DO loop CODE with OpenACC "loop auto". */ + +static void +annotate_do_loop (gfc_code *code, gfc_code *parent) +{ + + /* A DO loop's body is another phony DO node whose next pointer starts + the actual body. */ + gcc_assert (code->op == EXEC_DO); + gcc_assert (code->block->op == EXEC_DO); + + /* Build the "acc loop auto" annotation and add the loop as its + body. 
*/ + gfc_omp_clauses *clauses = gfc_get_omp_clauses (); + clauses->par_auto = 1; + gfc_code *oacc_loop = gfc_get_code (EXEC_OACC_LOOP); + oacc_loop->block = gfc_get_code (EXEC_OACC_LOOP); + oacc_loop->block->next = code; + oacc_loop->ext.omp_clauses = clauses; + oacc_loop->loc = code->loc; + oacc_loop->block->loc = code->loc; + + /* Splice the annotation into the place of the original loop. */ + if (parent->block == code) + parent->block = oacc_loop; + else + { + gfc_code *prev = parent->block; + while (prev != code && prev->next != code) + { + prev = prev->next; + gcc_assert (prev != NULL); + } + prev->next = oacc_loop; + } + oacc_loop->next = code->next; + code->next = NULL; +} + +/* Recursively traverse CODE in block PARENT, finding OpenACC kernels + regions. GOTO_TARGETS keeps track of statement labels that are + targets of gotos in the current function, while STATE keeps track + of the current context of the traversal. If the traversal + encounters a DO loop inside a kernels region, annotate it with + OpenACC loop directives if appropriate. Return the status of the + traversal. */ + +static enum annotation_result +annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent, + hash_set <gfc_st_label *> *goto_targets, + annotation_state state) +{ + gfc_code *next_code = NULL; + enum annotation_result retval = ar_ok; + + for ( ; code; code = next_code) + { + bool walk_block = true; + next_code = code->next; + + if (state >= as_in_kernels_loop + && code->here && goto_targets->contains (code->here)) + /* This statement has a label that is the target of a GOTO or some + other jump. Do not try to sort out the details, just reject + this loop nest. */ + { + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Possible control transfer to label at %L " + "prevents annotation of loop nest", + &code->loc); + return ar_invalid_nest; + } + + switch (code->op) + { + case EXEC_OACC_KERNELS: + /* Enter kernels region. 
*/ + annotate_do_loops_in_kernels (code->block->next, code, + goto_targets, + as_in_kernels_region); + walk_block = false; + break; + + case EXEC_OACC_PARALLEL_LOOP: + case EXEC_OACC_PARALLEL: + case EXEC_OACC_KERNELS_LOOP: + case EXEC_OACC_LOOP: + /* Do not try to add automatic OpenACC annotations inside manually + annotated loops. Presumably, the user avoided doing it on + purpose; for example, all available levels of parallelism may + have been used up. */ + if (state >= as_in_kernels_region) + { + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Explicit loop annotation at %L " + "prevents annotation of loop nest", + &code->loc); + return ar_invalid_nest; + } + walk_block = false; + break; + + case EXEC_DO: + if (state >= as_in_kernels_region) + { + /* A DO loop's body is another phony DO node whose next + pointer starts the actual body. Skip the phony node. */ + gcc_assert (code->block->op == EXEC_DO); + enum annotation_result result + = annotate_do_loops_in_kernels (code->block->next, code, + goto_targets, + as_in_kernels_loop); + /* Check for function/procedure calls in the body of the + loop that would prevent parallelization. Unlike in C/C++, + we do not have to check that there is no modification of + the loop variable or loop count since they are already + handled by the semantics of DO loops in the FORTRAN + language. */ + if (result != ar_invalid_nest && check_for_invalid_calls (code)) + result = ar_invalid_nest; + if (result == ar_ok) + annotate_do_loop (code, parent); + else if (result == ar_invalid_nest + && state >= as_in_kernels_loop) + /* The outer loop is invalid, too, so stop traversal. */ + return result; + walk_block = false; + } + break; + + case EXEC_DO_WHILE: + case EXEC_DO_CONCURRENT: + /* Traverse the body in a special state to allow EXIT statements + from these loops. 
*/ + if (state >= as_in_kernels_loop) + { + enum annotation_result result + = annotate_do_loops_in_kernels (code->block, code, + goto_targets, + as_in_kernels_inner_loop); + if (result == ar_invalid_nest) + return result; + else if (result != ar_ok) + retval = result; + walk_block = false; + } + break; + + case EXEC_GOTO: + case EXEC_ARITHMETIC_IF: + case EXEC_STOP: + case EXEC_ERROR_STOP: + /* A jump that may leave this loop. */ + if (state >= as_in_kernels_loop) + { + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Possible unstructured control flow at %L " + "prevents annotation of loop nest", + &code->loc); + return ar_invalid_nest; + } + break; + + case EXEC_RETURN: + /* A return from a kernels region is diagnosed elsewhere as a + hard error, so no warning is needed here. */ + if (state >= as_in_kernels_loop) + return ar_invalid_nest; + break; + + case EXEC_EXIT: + if (state == as_in_kernels_loop) + { + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "Exit at %L prevents annotation of loop", + &code->loc); + retval = ar_invalid_loop; + } + break; + + case EXEC_BACKSPACE: + case EXEC_CLOSE: + case EXEC_ENDFILE: + case EXEC_FLUSH: + case EXEC_INQUIRE: + case EXEC_OPEN: + case EXEC_READ: + case EXEC_REWIND: + case EXEC_WRITE: + /* Executing side-effecting I/O statements in parallel doesn't + make much sense. If this is what users want, they can always + add explicit annotations on the loop nest. */ + if (state >= as_in_kernels_loop) + { + gfc_warning (OPT_Wopenacc_kernels_annotate_loops, + "I/O statement at %L prevents annotation of loop", + &code->loc); + return ar_invalid_nest; + } + break; + + default: + break; + } + + /* Visit nested statements, if any, returning early if we hit + any problems. 
*/ + if (walk_block) + { + enum annotation_result result + = annotate_do_loops_in_kernels (code->block, code, + goto_targets, state); + if (result == ar_invalid_nest) + return result; + else if (result != ar_ok) + retval = result; + } + } + return retval; +} + +/* Traverse CODE to find all the labels referenced by GOTO and similar + statements and store them in GOTO_TARGETS. */ + +static void +compute_goto_targets (gfc_code *code, hash_set <gfc_st_label *> *goto_targets) +{ + for ( ; code; code = code->next) + { + switch (code->op) + { + case EXEC_GOTO: + case EXEC_LABEL_ASSIGN: + goto_targets->add (code->label1); + gcc_fallthrough (); + + case EXEC_ARITHMETIC_IF: + goto_targets->add (code->label2); + goto_targets->add (code->label3); + gcc_fallthrough (); + + default: + /* Visit nested statements, if any. */ + if (code->block != NULL) + compute_goto_targets (code->block, goto_targets); + } + } +} + +/* Find DO loops in OpenACC kernels regions that do not have OpenACC + annotations but look like they might benefit from automatic + parallelization. Add "acc loop auto" annotations for them. Assumes + flag_openacc_kernels_annotate_loops is set. */ + +void +gfc_oacc_annotate_loops_in_kernels_regions (gfc_namespace *ns) +{ + if (ns->proc_name) + { + hash_set <gfc_st_label *> goto_targets; + compute_goto_targets (ns->code, &goto_targets); + annotate_do_loops_in_kernels (ns->code, NULL, &goto_targets, as_outer); + } + + for (ns = ns->contained; ns; ns = ns->sibling) + gfc_oacc_annotate_loops_in_kernels_regions (ns); +} diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 12aa80ec45ca..04e9d2450b16 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -6912,6 +6912,15 @@ done: if (flag_c_prototypes || flag_c_prototypes_external) fprintf (stdout, "\n#ifdef __cplusplus\n}\n#endif\n"); + /* Add annotations on loops in OpenACC kernels regions if requested. This + is most easily done on this representation close to the source code.
*/ + if (flag_openacc && flag_openacc_kernels_annotate_loops) + { + gfc_current_ns = gfc_global_ns_list; + for (; gfc_current_ns; gfc_current_ns = gfc_current_ns->sibling) + gfc_oacc_annotate_loops_in_kernels_regions (gfc_current_ns); + } + /* Do the translation. */ translate_all_program_units (gfc_global_ns_list); diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 index 3fb48b321f2f..2ceae2088070 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 @@ -2,6 +2,7 @@ ! OpenACC kernels. ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fopt-info-optimized-omp" } ! { dg-additional-options "-fdump-tree-ompexp" } ! { dg-additional-options "-fdump-tree-parloops1-all" } diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 index 6c8d298e236d..d061a241074b 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 @@ -2,6 +2,7 @@ ! kernels. ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fopt-info-optimized-omp" } ! { dg-additional-options "-fdump-tree-ompexp" } ! { dg-additional-options "-fdump-tree-parloops1-all" } diff --git a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 index 5defe2ea85de..d2816c3e9364 100644 --- a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 +++ b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 @@ -1,4 +1,5 @@ ! { dg-options "-fopenacc -fdump-tree-omplower" } +! 
{ dg-additional-options "-fno-openacc-kernels-annotate-loops" } module consts integer, parameter :: n = 100 diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 index ef53324dd2a0..63774ffb5aff 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 new file mode 100644 index 000000000000..41f6307dbb17 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 @@ -0,0 +1,33 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that all loops in the nest are annotated. + +subroutine f (a, b, c) + implicit none + + real, intent (in), dimension(16,16) :: a + real, intent (in), dimension(16,16) :: b + real, intent (out), dimension(16,16) :: c + + integer :: i, j, k + real :: t + +!$acc kernels copyin(a(1:16,1:16), b(1:16,1:16)) copyout(c(1:16,1:16)) + + do i = 1, 16 + do j = 1, 16 + t = 0 + do k = 1, 16 + t = t + a(i,k) * b(k,j) + end do + c(i,j) = t; + end do + end do + +!$acc end kernels +end subroutine f + +! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 new file mode 100644 index 000000000000..f612c5beb963 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 @@ -0,0 +1,32 @@ +! 
{ dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a random goto in the body can't be annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) < 0 .or. b(i) < 0) then + go to 10 ! { dg-warning "Possible unstructured control flow" } + end if + t = t + a(i) * b(i) + end do + +10 f = t + +!$acc end kernels + +end function f diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 new file mode 100644 index 000000000000..d51482e4685d --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 @@ -0,0 +1,34 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-std=legacy" } +! { dg-do compile } + +! Test that a loop with a random label in the body cannot be annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + goto 10 + + do i = 1, 16 +10 t = t + a(i) * b(i) ! { dg-warning "Possible control transfer to label" } + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 new file mode 100644 index 000000000000..3c4956d70775 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 @@ -0,0 +1,39 @@ +! 
{ dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that in a situation with nested loops, a problem that prevents +! annotation of the inner loop only still allows the outer loop to be +! annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i, j + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + do j = 1, 16 + if (a(i) < 0 .or. b(j) < 0) then + exit ! { dg-warning "Exit" } + else + t = t + a(i) * b(j) + end if + end do + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 new file mode 100644 index 000000000000..3ec459f0a8df --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 @@ -0,0 +1,38 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that in a situation with nested loops, a problem that prevents +! annotation of the outer loop only still allows the inner loop to be +! annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i, j + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) < 0) then + exit ! { dg-warning "Exit" } + end if + do j = 1, 16 + t = t + a(i) * b(j) + end do + end do + + f = t + +!$acc end kernels + +end function f + +! 
{ dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 new file mode 100644 index 000000000000..91f431cca432 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 @@ -0,0 +1,35 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that an explicit annotation on an outer loop suppresses annotation +! of inner loops, and produces a diagnostic. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i, j + real :: t + + t = 0.0 + +!$acc kernels + +!$acc loop seq ! { dg-warning "Explicit loop annotation" } + do i = 1, 16 + do j = 1, 16 + t = t + a(i) * b(j) + end do + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 new file mode 100644 index 000000000000..570c12d3ad70 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 @@ -0,0 +1,35 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that an explicit annotation on an inner loop suppresses annotation +! of the outer loop, and produces a diagnostic. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i, j + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + !$acc loop seq ! 
{ dg-warning "Explicit loop annotation" } + do j = 1, 16 + t = t + a(i) * b(j) + end do + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 new file mode 100644 index 000000000000..6e44a304b28b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 @@ -0,0 +1,34 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that loops containing I/O statements can't be annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i, j + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + do j = 1, 16 + print *, " i =", i, " j =", j ! { dg-warning "I/O statement" } + t = t + a(i) * b(j) + end do + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 new file mode 100644 index 000000000000..4624a05247d9 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 @@ -0,0 +1,32 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a variable bound can be annotated. 
+ +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (:) :: a, b + + integer :: i, n + real :: t + + t = 0.0 + n = size (a) + +!$acc kernels + + do i = 1, n + t = t + a(i) * b(i) + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 new file mode 100644 index 000000000000..daed8f7f6e9d --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 @@ -0,0 +1,33 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a conditional in the body can be annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) > 0 .and. b(i) > 0) then + t = t + a(i) * b(i) + end if + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 new file mode 100644 index 000000000000..0c4ad256b7eb --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 @@ -0,0 +1,34 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a case construct in the body can be annotated. 
+ +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + +!$acc kernels + + do i = 1, 16 + select case (i) + case (1) + t = a(i) * b(i) + case default + t = t + a(i) * b(i) + end select + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 new file mode 100644 index 000000000000..1c3f87eed6e4 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 @@ -0,0 +1,35 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a cycle statement in the body can be annotated. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) < 0 .or. b(i) < 0) then + cycle + end if + t = t + a(i) * b(i) + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } + diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 new file mode 100644 index 000000000000..43173a70df24 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 @@ -0,0 +1,34 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with an exit statement in the body cannot be annotated.
+ +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) < 0 .or. b(i) < 0) then + exit ! { dg-warning "Exit" } + end if + t = t + a(i) * b(i) + end do + + f = t + +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 new file mode 100644 index 000000000000..ec42213220e7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 @@ -0,0 +1,48 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a random function call in the body cannot +! be annotated. + + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + interface + function g (x) + real :: g + real, intent (in) :: x + end function g + + subroutine h (x) + real, intent (in) :: x + end subroutine h + end interface + + t = 0.0 + +!$acc kernels + do i = 1, 16 + t = t + g (a(i) * b(i)) ! { dg-warning "Function call" } + end do + + do i = 1, 16 + call h (t) ! { dg-warning "Subroutine call" } + t = t + a(i) * b(i) + end do + + f = t +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } + diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 new file mode 100644 index 000000000000..9188f70d9664 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 @@ -0,0 +1,50 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! 
{ dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a call to a declared openacc function/subroutine +! can be annotated. + + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + interface + function g (x) + !$acc routine worker + real :: g + real, intent (in) :: x + end function g + + subroutine h (x) + !$acc routine worker + real, intent (in) :: x + end subroutine h + end interface + + t = 0.0 + +!$acc kernels + do i = 1, 16 + t = t + g (a(i) * b(i)) + end do + + do i = 1, 16 + call h (t) + t = t + a(i) * b(i) + end do + + f = t +!$acc end kernels + +end function f + +! { dg-final { scan-tree-dump-times "acc loop private\\(i\\) auto" 2 "original" } } + diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95 new file mode 100644 index 000000000000..f5aa5a0f43b5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-9.f95 @@ -0,0 +1,34 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with a return statement in the body gives a hard +! error. + +function f (a, b) + implicit none + + real :: f + real, intent (in), dimension (16) :: a, b + + integer :: i + real :: t + + t = 0.0 + +!$acc kernels + + do i = 1, 16 + if (a(i) < 0 .or. b(i) < 0) then + f = 0.0 + return ! 
{ dg-error "invalid branch" } + end if + t = t + a(i) * b(i) + end do + + f = t + +!$acc end kernels + +end function f diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 index 2f1dcd603a14..c1f6ef8df600 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 index 447e85d64483..313e3df7f63d 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 index 4edb2889b7b1..26671064ba27 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! 
{ dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 index fc113e1f6602..d79ed796c366 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 index 94522f586362..d8ef52af2e6a 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 index b9c4aea074d7..6b7334144c87 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 index 6dc7b2e0f28f..aadfcfc41448 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! 
{ dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 index 48c20b999423..0d45c5cf4338 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 @@ -1,4 +1,5 @@ ! { dg-additional-options "-O2" } +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fdump-tree-parloops1-all" } ! { dg-additional-options "-fdump-tree-optimized" } From patchwork Wed Dec 15 15:54:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48946 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E73C73858428 for ; Wed, 15 Dec 2021 15:58:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id BE7F7385801E; Wed, 15 Dec 2021 15:55:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE7F7385801E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: MsmJumfmodvUnOzHrpWiangHCz/uwL/3KeH6SL8FlK6ZdfUKfunYJN3CuQPl0U2oLiyABkJMvJ efWy1SFazs6Rdk9lslL0OEe3schT9mR4RW3p2yvoLwq9M8IJ96wF5x51ZyIZ8IntcisE54nV2s yOffmkuaWoE81PpI6GWBqDVWY8TS5y7oW1H76QxH5EDPiZ74J26PAdqcOFKRLAyvfoCUsYFbvj B0tF3uAsg+3Wh5utPrIzqB3YVzgTJmbzQ4uujNIxGgHvmBZ6HF7klGwX81xlp3LCHV3W/DjnLg kdTTahojddc/U6NuV34FNmtn X-IronPort-AV: 
E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736533" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:14 -0800 IronPort-SDR: Th46J/ZALpO5UxfeMMLXMNtnqz6KKXVcAB7AF9I7lsQD+fAFLJT+ej+aaStNUMo9OD67mGez2J UW3IBFjW6iU6MXDSv/pPZzID7oc9SMMebuQ0+BVwcBm6TnKIzW6vJ+Cy3Ff4awCmJKRSA2zV0R VUy9FtqCJQWFescwPUWnvfIBW0z9RrdYBEPXCWniJXP8gOggs3BZ6e70oykKckU3qOAuU+5SlP 4j1tTB7HNkPYH0wH9srr9ERIyeVlsp0GGey9S/SVFevwTQ72nD5tCbpeTLo1DULvyMcH4ujXdD HRw= From: Frederik Harwath To: Subject: [PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass. Date: Wed, 15 Dec 2021 16:54:11 +0100 Message-ID: <20211215155447.19379-5-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: tobias@codesourcery.com, Sandra Loosemore , thomas@codesourcery.com, fortran@gcc.gnu.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Sandra Loosemore 2020-03-27 Sandra Loosemore gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust line numbering. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Add -fno-openacc-kernels-annotate-loops. 
--- .../gfortran.dg/goacc/classify-kernels-unparallelized.f95 | 5 +++-- gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 5 +++-- gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 | 1 + 3 files changed, 7 insertions(+), 4 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 index 2ceae2088070..00aac9aa94ea 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 @@ -23,8 +23,9 @@ program main call setup(a, b) - !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" } - do i = 0, n - 1 + !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) + do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" } + ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 24 } c(i) = a(f (i)) + b(f (i)) end do !$acc end kernels diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 index d061a241074b..ba815319abf2 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 @@ -19,8 +19,9 @@ program main call setup(a, b) - !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" } - do i = 0, n - 1 + !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) + do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop parallelism" } + ! 
{ dg-message "beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 20 } c(i) = a(i) + b(i) end do !$acc end kernels diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 index 238482b91a49..04c998d11dad 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 @@ -1,5 +1,6 @@ ! Test OpenACC 'kernels' construct decomposition. +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fopt-info-omp-all" } ! { dg-additional-options "--param=openacc-kernels=decompose" } ! { dg-additional-options "-O2" } for 'parloops'. From patchwork Wed Dec 15 15:54:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48947 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E87233857823 for ; Wed, 15 Dec 2021 15:58:46 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id BA266385802B for ; Wed, 15 Dec 2021 15:55:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA266385802B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: KXrBP8k8GqzmrDoeY5ts2zus/ZLS1FElFxxVdXAz0HXOzzGRwn7xDrz/bTIOajjWvqWN+L8JhG JSRMwMWhGPkmT4sAuB3OdK1fR30DzMWF8lAJJVNQ55mDT7VR61kWMPFM/n7IddZQUjP2Xk8Bol GKODYMuTG0e2JPmAQw7hmWqIlwgWkGIeurhU1PnYKG9QQ74+zmXUazD7qwc/mkay7K6TQrhqvu Pgxbc9o0LsmWrIGCGiuZ3DJ9sYEEjbwLpNYXYhNPfXGxS3nBuEGE+aVpyVYwK2W01Og4YsU53i 
SCwXLIzA2Vnm9h85uMAnHSHY X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736536" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:17 -0800 IronPort-SDR: EAEGbk4N26KmbB2x5f3fLR8hOgRU7xTZOHKzGybzqRyRZTKENBAgokvpdsRveqjHJfTvPkLyKP eUv6t4WA05LemRO/AaRvwvDNhLuQoRP7ylzn3A92SMPd6Pay62LXvBbeDbkqv8vTInW+pk1q9y rFoG2jJot7CW96nnlLMjVyAbIGHcLFe/nSIg4IJIAsQytNr6T8ZK4qpZl863xQ+BAncYGkocYQ qHLnVWD/GDeP6rq92oZnzioZvmrNRYK0DwdxI6z1G+/m7qILMe59ZQ0F9NAQQaiLmV1mQ0tg8W ok4= From: Frederik Harwath To: Subject: [PATCH 05/40] Fix bug in processing of array dimensions in data clauses. Date: Wed, 15 Dec 2021 16:54:12 +0100 Message-ID: <20211215155447.19379-6-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sandra Loosemore , thomas@codesourcery.com, nathan@acm.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Sandra Loosemore The g++ front end wraps the array length and low_bound values in NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail. 
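As a rough illustration of that failure mode, the toy model below shows a constant hidden behind a NON_LVALUE_EXPR-style wrapper failing a direct INTEGER_CST check until the wrapper is stripped. It is not GCC code: every `toy_` name is hypothetical, and the real STRIP_NOPS also checks type compatibility before peeling a wrapper.

```c
#include <stddef.h>

/* Toy stand-ins for GCC tree codes; these are NOT the real definitions. */
enum toy_code { TOY_INTEGER_CST, TOY_NON_LVALUE_EXPR, TOY_NOP_EXPR };

struct toy_node
{
  enum toy_code code;
  long value;                /* meaningful only for TOY_INTEGER_CST */
  struct toy_node *operand;  /* meaningful only for wrapper nodes */
};

/* Peel wrapper nodes, loosely modelled on STRIP_NOPS.  Assumption: the
   type-compatibility check done by the real macro is ignored here.  */
static struct toy_node *
toy_strip_nops (struct toy_node *t)
{
  while (t && (t->code == TOY_NON_LVALUE_EXPR || t->code == TOY_NOP_EXPR))
    t = t->operand;
  return t;
}

/* Models a "TREE_CODE (x) == INTEGER_CST" style test.  */
static int
toy_is_integer_cst (const struct toy_node *t)
{
  return t != NULL && t->code == TOY_INTEGER_CST;
}
```

In this model, a check applied to the wrapped node takes the "not a constant" path, which corresponds to falling through to the dynamic array code.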
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was tickling this bug and giving bogus errors in g++ because it was falling through to dynamic array code instead of recognizing the constant bounds. This patch was posted upstream here https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html but not yet committed. It may be that some other fix for this problem is implemented on mainline instead; check before merging this patch. 2020-03-31 Sandra Loosemore gcc/cp/ * semantics.c (handle_omp_array_sections_1): Call STRIP_NOPS on length and low_bound. (handle_omp_array_sections): Likewise. --- gcc/cp/semantics.c | 9 +++++++++ 1 file changed, 9 insertions(+) -- 2.33.0 diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 2443d0327498..c2643d0a7a24 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -5145,6 +5145,10 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types, if (length) length = mark_rvalue_use (length); /* We need to reduce to real constant-values for checks below.
*/ + if (length) + STRIP_NOPS (length); + if (low_bound) + STRIP_NOPS (low_bound); if (length) length = fold_simple (length); if (low_bound) @@ -5457,6 +5461,11 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) tree low_bound = TREE_PURPOSE (t); tree length = TREE_VALUE (t); + if (length) + STRIP_NOPS (length); + if (low_bound) + STRIP_NOPS (low_bound); + i--; if (low_bound && TREE_CODE (low_bound) == INTEGER_CST From patchwork Wed Dec 15 15:54:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48948 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 00D743857C77 for ; Wed, 15 Dec 2021 15:59:25 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 908E1385842F; Wed, 15 Dec 2021 15:55:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 908E1385842F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: aLzVlocf82sWQygCJ+bv+FlK66JvYsc4v22g4vSHgwMhnrz6PjnbSzJz7RBjPHvcqg7rZ44NAe RXTpeCoN2a/Ymv/I1JKtQOj7GZw0+3kMsrVODY1s8R23jtRLNQje4SC/NdumfPfkYqYpdr1Knd PqpkyOO8AybwP9nPRcCwwvUB/tcK9hz4uFpP0iAqyVKgEhlCykZs5WHNqNk7xicWQk0GFUHzn+ /j9M+iYFIoaTxfTIUiEz+Z6h9YNEV4uGMusTrRcTEbUITv0krjrDNHBk6XeRODaNL6tkRoH03e /X6Pt5ZzqMdsrfUwqapihaEf X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736538" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:19 -0800 IronPort-SDR: ALLs/6P9Sk/TXEZe7ZJG0VQk0GyPxrjHfBB8ho6YGtKXsG+jIL4I/QtZbDtcQ6m+vFHB7KgslS 
BFWtoS2CXiSwzZjjnhKYTxhjrCAivOyFoSpr0vnGyD0vQ0p9dA9kOi3DSgqHss7RyB2SrEkUPj n5ek/c3VrCOLQnO5kbdOSxYqzpwRfMLv/yfHBG5xXJCPpu+2+LI2fU614RsL/4P1h6Tzkvwr/F 77dM/EDTCOegRf8X0iwzbj1e730Lgse77nPsM/AWgtzxZguVhfAm8xFcKzJnsXs+8lhgxk4rB2 TZg= From: Frederik Harwath To: Subject: [PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives. Date: Wed, 15 Dec 2021 16:54:13 +0100 Message-ID: <20211215155447.19379-7-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP, UNWANTED_LANGUAGE_BODY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: fortran@gcc.gnu.org, nathan@acm.org, Sandra Loosemore , tobias@codesourcery.com, thomas@codesourcery.com, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Sandra Loosemore 2020-08-19 Sandra Loosemore gcc/ * tree.h (OACC_LOOP_COMBINED): New. gcc/c/ * c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED. gcc/cp/ * parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_do): Add combined parameter, use it to set OACC_LOOP_COMBINED. Update all call sites. 
--- gcc/c/c-parser.c | 3 +++ gcc/cp/parser.c | 3 +++ gcc/fortran/trans-openmp.c | 34 +++++++++++++++++++++------------- gcc/tree.h | 5 +++++ 4 files changed, 32 insertions(+), 13 deletions(-) -- 2.33.0 diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 80dd61d599ef..1258b48693de 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name, omp_clause_mask mask, tree *cclauses, bool *if_p) { bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1; + bool is_combined = (cclauses != NULL); strcat (p_name, " loop"); mask |= OACC_LOOP_CLAUSE_MASK; @@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name, tree block = c_begin_compound_stmt (true); tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL, if_p); + if (stmt && stmt != error_mark_node) + OACC_LOOP_COMBINED (stmt) = is_combined; block = c_end_compound_stmt (loc, block, true); add_stmt (block); diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 4c2075742d6a..c834d25b028f 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name, omp_clause_mask mask, tree *cclauses, bool *if_p) { bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1; + bool is_combined = (cclauses != NULL); strcat (p_name, " loop"); mask |= OACC_LOOP_CLAUSE_MASK; @@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name, tree block = begin_omp_structured_block (); int save = cp_parser_begin_omp_structured_block (parser); tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL,
if_p); + if (stmt && stmt != error_mark_node) + OACC_LOOP_COMBINED (stmt) = is_combined; cp_parser_end_omp_structured_block (parser, save); add_stmt (finish_omp_structured_block (block)); diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index e81c5588c53c..618e106791e5 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -4855,7 +4855,8 @@ typedef struct dovar_init_d { static tree gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, - gfc_omp_clauses *do_clauses, tree par_clauses) + gfc_omp_clauses *do_clauses, tree par_clauses, + bool combined) { gfc_se se; tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls; @@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break; case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break; case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break; - case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break; + case EXEC_OACC_LOOP: + stmt = make_node (OACC_LOOP); + OACC_LOOP_COMBINED (stmt) = combined; + break; default: gcc_unreachable (); } @@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code) pblock = &block; else pushlevel (); - stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL); + stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL, + true); protected_set_expr_location (stmt, loc); if (TREE_CODE (stmt) != BIND_EXPR) stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); @@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t *pblock, omp_do_clauses = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO], code->loc); body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ?
pblock : &block, - &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses); + &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false); if (pblock == NULL) { if (TREE_CODE (body) != BIND_EXPR) @@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, stmtblock_t *pblock, } stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO, new_pblock, &clausesa[GFC_OMP_SPLIT_DO], - omp_clauses); + omp_clauses, false); if (pblock == NULL) { if (TREE_CODE (stmt) != BIND_EXPR) @@ -6496,7 +6501,8 @@ gfc_trans_omp_distribute (gfc_code *code, gfc_omp_clauses *clausesa) case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block, - &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE); + &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE, + false); if (TREE_CODE (stmt) != BIND_EXPR) stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); else @@ -6555,13 +6561,13 @@ gfc_trans_omp_teams (gfc_code *code, gfc_omp_clauses *clausesa, case EXEC_OMP_TEAMS_DISTRIBUTE: stmt = gfc_trans_omp_do (code, EXEC_OMP_DISTRIBUTE, NULL, &clausesa[GFC_OMP_SPLIT_DISTRIBUTE], - NULL); + NULL, false); break; case EXEC_OMP_TARGET_TEAMS_LOOP: case EXEC_OMP_TEAMS_LOOP: stmt = gfc_trans_omp_do (code, EXEC_OMP_LOOP, NULL, &clausesa[GFC_OMP_SPLIT_DO], - NULL); + NULL, false); break; default: stmt = gfc_trans_omp_distribute (code, clausesa); @@ -6641,7 +6647,8 @@ gfc_trans_omp_target (gfc_code *code) break; case EXEC_OMP_TARGET_SIMD: stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block, - &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE); + &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE, + false); if (TREE_CODE (stmt) != BIND_EXPR) stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); else @@ -6712,7 +6719,8 @@ gfc_trans_omp_taskloop (gfc_code *code, gfc_exec_op op) break; case EXEC_OMP_TASKLOOP_SIMD: stmt = gfc_trans_omp_do (code, EXEC_OMP_SIMD, &block, - &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE); + &clausesa[GFC_OMP_SPLIT_SIMD], NULL_TREE, + false); if 
(TREE_CODE (stmt) != BIND_EXPR) stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); else @@ -6756,7 +6764,7 @@ gfc_trans_omp_master_masked_taskloop (gfc_code *code, gfc_exec_op op) stmt = gfc_trans_omp_do (code, EXEC_OMP_TASKLOOP, NULL, code->op != EXEC_OMP_MASTER_TASKLOOP ? &clausesa[GFC_OMP_SPLIT_TASKLOOP] - : code->ext.omp_clauses, NULL); + : code->ext.omp_clauses, NULL, false); } if (TREE_CODE (stmt) != BIND_EXPR) stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); @@ -7119,7 +7127,7 @@ gfc_trans_oacc_directive (gfc_code *code) return gfc_trans_oacc_construct (code); case EXEC_OACC_LOOP: return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses, - NULL); + NULL, false); case EXEC_OACC_UPDATE: case EXEC_OACC_CACHE: case EXEC_OACC_ENTER_DATA: @@ -7159,7 +7167,7 @@ gfc_trans_omp_directive (gfc_code *code) case EXEC_OMP_SIMD: case EXEC_OMP_TASKLOOP: return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses, - NULL); + NULL, false); case EXEC_OMP_DISTRIBUTE_PARALLEL_DO: case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_DISTRIBUTE_SIMD: diff --git a/gcc/tree.h b/gcc/tree.h index 7542d97ce121..15e5147f40b0 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1524,6 +1524,11 @@ class auto_suppress_location_wrappers #define OMP_MASKED_COMBINED(NODE) \ (OMP_MASKED_CHECK (NODE)->base.private_flag) +/* True on an OACC_LOOP statement if it is part of a combined construct, + for example "#pragma acc kernels loop". */ +#define OACC_LOOP_COMBINED(NODE) \ + (OACC_LOOP_CHECK (NODE)->base.private_flag) + /* Memory order for OMP_ATOMIC*. 
*/ #define OMP_ATOMIC_MEMORY_ORDER(NODE) \ (TREE_RANGE_CHECK (NODE, OMP_ATOMIC, \ From patchwork Wed Dec 15 15:54:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48949 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AE866385AC1D for ; Wed, 15 Dec 2021 15:59:54 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id E9975385802B for ; Wed, 15 Dec 2021 15:55:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E9975385802B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: BlF9C89zElbXZJpSbHatRGltKk0zXjkliH0LRjwYz8yYSmHQDfBA36b8zxudMGKK5NRf7eqONI dK2UQP8ljYMaEk4eNSlP6qAnnyfxe13HfgBeQXoA2A+XloL0MSWF4RveW+Cba/Yfp19kVJr8JF hzjIjoZpKl7L1NSwu+E/GTps4TnPb7wWEqGYxZkuanpQJmjZNvJJZxX28azbOx6wHf8QyepPfv xrRb9GpGfF7vEaWyTZQ6hIzKClLRCnlRya0ar+5bEe4hQ4dbAEQbBzQZGpaqmZS1F9yChZyxGn 129wDyOdmO2WGaF6WzZTv6xM X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736543" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:22 -0800 IronPort-SDR: 3whl+JL5uVbHSQwlU4P0ct6jFfbFVwPSD9MYSYBeqLUNP6Htc+b59lwZVJ5zksQfnq6A5nzZS2 6mQuBWHrfatuIZEeEb23MGQMu+TD6kHFbDrztdqqzob1I54+9Y+ZfhiLcRu4SZ8cxXryb5rSkt N03chzgs97iIaU2JB4Mu1fNZjJaO+FJ10rPB/10sI9zAxmdF0bfDw7JYlaCFcAbLnoNyuBosPh FHlXUWhJiWCsFthr3Ef4w9EuIIEB7ENgq+SLcarwuThlkd5302iTql3hAEOakG8bmmzVVPE+dZ 7GA= From: Frederik Harwath To: Subject: [PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++). 
Date: Wed, 15 Dec 2021 16:54:14 +0100 Message-ID: <20211215155447.19379-8-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sandra Loosemore , thomas@codesourcery.com, joseph@codesourcery.com, nathan@acm.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Sandra Loosemore Normally explicit loop directives in a kernels region inhibit automatic annotation of other loops in the same nest, on the theory that users have indicated they want manual control over that section of code. However there seems to be an expectation in user code that the combined "kernels loop" directive should still allow annotation of inner loops. This patch implements this behavior for C and C++. 2020-08-19 Sandra Loosemore gcc/c-family/ * c-omp.c (annotate_loops_in_kernels_regions): Process inner loops in combined "acc kernels loop" directives. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-annotation-18.c: New. * c-c++-common/goacc/kernels-loop-annotation-19.c: New. * c-c++-common/goacc/combined-directives.c: Adjust expected patterns. 
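The behavior described in the commit message above can be sketched as a small decision function. This is a simplified model rather than the c-omp.c code: the `toy_` enum, its members and both function names are assumptions made for illustration, and the real walker also records why a loop nest was not annotated.

```c
/* Toy walker states.  Assumption: these only loosely mirror the
   annotation_state values used in c-omp.c.  */
enum toy_state
{
  TOY_OUTSIDE_KERNELS,      /* not inside any kernels region */
  TOY_IN_KERNELS_REGION,    /* directly inside a kernels region */
  TOY_EXPLICIT_ANNOTATION   /* inside a manually annotated loop */
};

/* Return the state used while walking the body of an explicit
   "acc loop".  IS_COMBINED models the OACC_LOOP_COMBINED flag.  */
static enum toy_state
toy_state_for_loop_body (enum toy_state state, int is_combined)
{
  if (state == TOY_IN_KERNELS_REGION && is_combined)
    return state;                  /* "acc kernels loop": keep going */
  return TOY_EXPLICIT_ANNOTATION;  /* plain "acc loop": leave nest alone */
}

/* Inner loops are candidates for "acc loop auto" only while the walker
   is still in the kernels-region state.  */
static int
toy_may_annotate (enum toy_state state)
{
  return state == TOY_IN_KERNELS_REGION;
}
```

Under this model, only a loop directive that is both directly inside the kernels region and part of a combined construct lets the walk continue in the annotating state.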
--- gcc/c-family/c-omp.c | 36 ++++++++++++------- .../c-c++-common/goacc/combined-directives.c | 2 +- .../goacc/kernels-loop-annotation-18.c | 18 ++++++++++ .../goacc/kernels-loop-annotation-19.c | 19 ++++++++++ 4 files changed, 62 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c -- 2.33.0 diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index fad50da8fbc4..30757877eafe 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3477,18 +3477,30 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees, /* Do not try to add automatic OpenACC annotations inside manually annotated loops. Presumably, the user avoided doing it on purpose; for example, all available levels of parallelism may - have been used up. */ - { - struct annotation_info nested_info - = { NULL_TREE, NULL_TREE, false, as_explicit_annotation, - node, info }; - if (info->state >= as_in_kernels_region) - do_not_annotate_loop_nest (info, as_explicit_annotation, - node); - walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions, - (void *) &nested_info, NULL); - *walk_subtrees = 0; - } + have been used up. However, assume that the combined construct + "#pragma acc kernels loop" means to try to process the whole + loop nest. + Note that a single OACC_LOOP construct represents an entire set + of collapsed loops so we do not have to deal explicitly with the + collapse clause here, as the Fortran front end does.
*/ + if (info->state == as_in_kernels_region && OACC_LOOP_COMBINED (node)) + { + walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions, + (void *) info, NULL); + *walk_subtrees = 0; + } + else + { + struct annotation_info nested_info + = { NULL_TREE, NULL_TREE, false, as_explicit_annotation, + node, info }; + if (info->state >= as_in_kernels_region) + do_not_annotate_loop_nest (info, as_explicit_annotation, + node); + walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions, + (void *) &nested_info, NULL); + *walk_subtrees = 0; + } break; case FOR_STMT: diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c index c2a3c57b48b8..2519f23d49f0 100644 --- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c +++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c @@ -110,7 +110,7 @@ test () // { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } } // { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } } // { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } } -// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } } +// { dg-final { scan-tree-dump-times "acc loop auto" 6 "gimple" } } // { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } } // { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 "gimple" } } // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } } diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c new file mode 100644 index 000000000000..89ec6447625f --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c @@ -0,0 +1,18 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that "acc 
kernels loop" directive causes annotation of the entire + loop nest. */ + +void f (float *a, float *b) +{ +#pragma acc kernels loop + for (int k = 0; k < 20; k++) + for (int l = 0; l < 20; l++) + for (int m = 0; m < 20; m++) + b[m] = a[m]; +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c new file mode 100644 index 000000000000..77a3b7a9136d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c @@ -0,0 +1,19 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that "acc kernels loop" directive causes annotation of the entire + loop nest in the presence of a collapse clause. */ + +void f (float *a, float *b) +{ +#pragma acc kernels loop collapse(2) + for (int k = 0; k < 20; k++) + for (int l = 0; l < 20; l++) + for (int m = 0; m < 20; m++) + b[m] = a[m]; +} + +/* { dg-final { scan-tree-dump-times "acc loop collapse.2." 
1 "original" } } */ +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ From patchwork Wed Dec 15 15:54:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48950 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 692C83857C5F for ; Wed, 15 Dec 2021 16:00:30 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 528E8385842E; Wed, 15 Dec 2021 15:55:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 528E8385842E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: SWVa975c/qLkZtYxQaOKR5H5ccW/93Rd73gsPNTX9svUe4/rwPtarp5hWlImB+aKWovro/6FXJ 9BVx5qAH9vpBMnruSY9ewYNmy1UDKltJffm3UrSP/pdHroliFVGmB+X0S95cDSSm+5mSnCvubM YVeNse+5emsdkYwmgteNoAjgjlkLSLh+RqY5eViySrgAp1tMbyaNJq5KOBNyAv5dY6gk72TNti UNO1le+3Z/X/W+Y9Fvn1HLUjFlhhjWcidiZSqW8kRIi/4d4ppUtEPiGrljKK26faBP8R/NUaQA uc6P+qKnMtRsUaU1woHcmwDn X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738357" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:32 -0800 IronPort-SDR: LPQy8uZoKzy2ji5WtEO1jji7bJygHO1TIgnRqmC77ngLMt2T0pwQPEWDS0pteboXKo4oOtmSjS goxW4abRjAekJLjzBG9dBky/ZdwmZU/mO/KATk6vk0x6W9x0C2tgQ9N5csgO/mV7M/BUuazH+L Ba59zteKt6thU0kR0RRoCrRdKHqi4gqcGsnS0CuFk58h8YwX5B3i61HI75ulI2XtgNhxbljM15 uZO4t000zChixVjS3YOpNq/4V9K3dgSCEEwajnk0UTjTVA9PyCj3uCsCgEWzzeds+DoTrfJ8Hw Vv8= From: Frederik Harwath To: Subject: [PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran). 
Date: Wed, 15 Dec 2021 16:54:15 +0100 Message-ID: <20211215155447.19379-9-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-02.mgc.mentorg.com (139.181.222.2) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: tobias@codesourcery.com, Sandra Loosemore , thomas@codesourcery.com, fortran@gcc.gnu.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Sandra Loosemore Normally explicit loop directives in a kernels region inhibit automatic annotation of other loops in the same nest, on the theory that users have indicated they want manual control over that section of code. However there seems to be an expectation in user code that the combined "kernels loop" directive should still allow annotation of inner loops. This patch implements this behavior in Fortran. 2020-08-19 Sandra Loosemore gcc/fortran/ * openmp.c (annotate_do_loops_in_kernels): Handle EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner loops in a combined "acc kernels loop" directive. gcc/testsuite/ * gfortran.dg/goacc/kernels-loop-annotation-18.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-19.f95: New. * gfortran.dg/goacc/combined-directives.f90: Adjust expected patterns. * gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise. 
* gfortran.dg/goacc/private-predetermined-kernels-1.f95: Likewise. --- gcc/fortran/openmp.c | 50 ++++++++++++++++++- .../gfortran.dg/goacc/combined-directives.f90 | 19 +++++-- .../goacc/kernels-loop-annotation-18.f95 | 28 +++++++++++ .../goacc/kernels-loop-annotation-19.f95 | 29 +++++++++++ .../goacc/private-explicit-kernels-1.f95 | 7 ++- .../goacc/private-predetermined-kernels-1.f95 | 7 ++- 6 files changed, 131 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95 -- 2.33.0 diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 243b5e0a9ac6..b0b68b494778 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent, case EXEC_OACC_PARALLEL_LOOP: case EXEC_OACC_PARALLEL: - case EXEC_OACC_KERNELS_LOOP: case EXEC_OACC_LOOP: /* Do not try to add automatic OpenACC annotations inside manually annotated loops. Presumably, the user avoided doing it on @@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent, } break; + case EXEC_OACC_KERNELS_LOOP: + /* This is a combined "acc kernels loop" directive. We want to + leave the outer loop alone but try to annotate any nested + loops in the body. The expected structure nesting here is + EXEC_OACC_KERNELS_LOOP + EXEC_OACC_KERNELS_LOOP + EXEC_DO + EXEC_DO + ...body... */ + if (code->block) + /* Might be empty?
*/ + { + gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP); + gfc_omp_clauses *clauses = code->ext.omp_clauses; + int collapse = clauses->collapse; + gfc_expr_list *tile = clauses->tile_list; + gfc_code *inner = code->block->next; + + gcc_assert (inner->op == EXEC_DO); + gcc_assert (inner->block->op == EXEC_DO); + + /* We need to skip over nested loops covered by "collapse" or + "tile" clauses. "Tile" takes precedence + (see gfc_trans_omp_do). */ + if (tile) + { + collapse = 0; + for (gfc_expr_list *el = tile; el; el = el->next) + collapse++; + } + if (clauses->orderedc) + collapse = clauses->orderedc; + if (collapse <= 0) + collapse = 1; + for (int i = 1; i < collapse; i++) + { + gcc_assert (inner->op == EXEC_DO); + gcc_assert (inner->block->op == EXEC_DO); + inner = inner->block->next; + } + if (inner) + /* Loop might have empty body? */ + annotate_do_loops_in_kernels (inner->block->next, + inner, goto_targets, + as_in_kernels_region); + } + walk_block = false; + break; + case EXEC_DO_WHILE: case EXEC_DO_CONCURRENT: /* Traverse the body in a special state to allow EXIT statements diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 index 956349204f4d..562a4e40cd7d 100644 --- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 @@ -139,10 +139,21 @@ end subroutine test ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } } ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } } -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" } } -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" } } -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" } } -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" } } + +! 
These are the parallel loop variants. +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 1 "gimple" } } + +! These are the kernels loop variants. Here the inner loops are annotated +! separately. +! { dg-final { scan-tree-dump-times "acc loop private.i. worker" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. vector" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. seq" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop private.i. auto" 1 "gimple" } } +! { dg-final { scan-tree-dump-times "acc loop auto private.j." 4 "gimple" } } + ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } } ! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } } ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95 new file mode 100644 index 000000000000..e4e210a92dbb --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95 @@ -0,0 +1,28 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that "acc kernels loop" directive causes annotation of the entire +! loop nest. 
+ +subroutine f (a, b) + + implicit none + real, intent (in), dimension(20) :: a + real, intent (out), dimension(20) :: b + integer :: k, l, m + +!$acc kernels loop + do k = 1, 20 + do l = 1, 20 + do m = 1, 20 + b(m) = a(m); + end do + end do + end do + +end subroutine f + +! { dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } } + diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95 new file mode 100644 index 000000000000..5dd6e7f538a6 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95 @@ -0,0 +1,29 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that "acc kernels loop" directive causes annotation of the entire +! loop nest in the presence of a collapse clause. + +subroutine f (a, b) + + implicit none + real, intent (in), dimension(20) :: a + real, intent (out), dimension(20) :: b + integer :: k, l, m + +!$acc kernels loop collapse(2) + do k = 1, 20 + do l = 1, 20 + do m = 1, 20 + b(m) = a(m); + end do + end do + end do + +end subroutine f + +! { dg-final { scan-tree-dump-times "acc loop .*collapse.2." 1 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } + diff --git a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 index 5d563d226b0c..0c47045df9c8 100644 --- a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 @@ -73,8 +73,9 @@ program test !$acc kernels loop private(i2_1_c, j2_1_c) independent ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "original" } } - ! 
{ dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "gimple" } } + ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) independent" 1 "gimple" } } do i2_1_c = 1, 100 + ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j2_1_c\\)" 1 "gimple" } } do j2_1_c = 1, 100 end do end do @@ -130,9 +131,11 @@ program test !$acc kernels loop private(i3_1_c, j3_1_c, k3_1_c) independent ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "original" } } - ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "gimple" } } + ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) independent" 1 "gimple" } } do i3_1_c = 1, 100 + ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j3_1_c\\)" 1 "gimple" } } do j3_1_c = 1, 100 + ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(k3_1_c\\)" 1 "gimple" } } do k3_1_c = 1, 100 end do end do diff --git a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 index 12a7854526a9..3357a20263e7 100644 --- a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 @@ -73,8 +73,9 @@ program test !$acc kernels loop independent ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "original" } } - ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) private\\(j2_1_c\\) independent" 1 "gimple" } } + ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i2_1_c\\) independent" 1 "gimple" } } do i2_1_c = 1, 100 + ! 
{ dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j2_1_c\\)" 1 "gimple" } } do j2_1_c = 1, 100 end do end do @@ -130,9 +131,11 @@ program test !$acc kernels loop independent ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "original" } } - ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) private\\(j3_1_c\\) private\\(k3_1_c\\) independent" 1 "gimple" } } + ! { dg-final { scan-tree-dump-times "#pragma acc loop private\\(i3_1_c\\) independent" 1 "gimple" } } do i3_1_c = 1, 100 + ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(j3_1_c\\)" 1 "gimple" } } do j3_1_c = 1, 100 + ! { dg-final { scan-tree-dump-times "#pragma acc loop auto private\\(k3_1_c\\)" 1 "gimple" } } do k3_1_c = 1, 100 end do end do

From patchwork Wed Dec 15 15:54:16 2021
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48951
From: Frederik Harwath
Subject: [PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.
Date: Wed, 15 Dec 2021 16:54:16 +0100
Message-ID: <20211215155447.19379-10-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: fortran@gcc.gnu.org, nathan@acm.org, Sandra Loosemore, tobias@codesourcery.com, thomas@codesourcery.com, joseph@codesourcery.com
From: Sandra Loosemore

This tweak to the OpenACC kernels loop annotation
relaxes the restrictions on function calls in the loop body. Normally calls to functions not explicitly marked with a parallelism attribute are not permitted, but C/C++ builtins and Fortran intrinsics have known semantics, so we can generally permit those without restriction. If any turn out to be problematic, we can add code to recognize them here or in the processing of the "auto" annotations.

2020-08-22 Sandra Loosemore

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Test for calls to builtins.

gcc/fortran/
* openmp.c (check_expr_for_invalid_calls): Check for intrinsic functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c | 10 ++++---
 gcc/fortran/openmp.c | 9 ++++---
 .../goacc/kernels-loop-annotation-20.c | 23 ++++++++++++++++
 .../goacc/kernels-loop-annotation-20.f95 | 26 +++++++++++++++++++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
--
2.33.0

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index 30757877eafe..e7c27f45e888 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees, break; case CALL_EXPR: - /* Direct function calls to functions marked as OpenACC routines are - allowed. Reject indirect calls or calls to non-routines. */ + /* Direct function calls to builtins and functions marked as + OpenACC routines are allowed. Reject indirect calls or calls + to non-routines.
*/ if (info->state >= as_in_kernels_loop) { tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE; @@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees, } if (fn_decl == NULL_TREE) do_not_annotate_loop_nest (info, as_invalid_call, node); - else if (!lookup_attribute ("oacc function", - DECL_ATTRIBUTES (fn_decl))) + else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL) + && !lookup_attribute ("oacc function", + DECL_ATTRIBUTES (fn_decl))) do_not_annotate_loop_nest (info, as_invalid_call, node); } break; diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index b0b68b494778..d5d996e378d7 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees, switch (expr->expr_type) { case EXPR_FUNCTION: - if (expr->value.function.esym - && (expr->value.function.esym->attr.oacc_routine_lop - != OACC_ROUTINE_LOP_NONE)) + /* Permit calls to Fortran intrinsic functions and to routines + with an explicitly declared parallelism level. */ + if (expr->value.function.isym + || (expr->value.function.esym + && (expr->value.function.esym->attr.oacc_routine_lop + != OACC_ROUTINE_LOP_NONE))) return 0; /* Else fall through. */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c new file mode 100644 index 000000000000..5e3f02845713 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c @@ -0,0 +1,23 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test that calls to built-in functions don't inhibit kernels loop + annotation. 
*/ + +void foo (int n, int *input, int *out1, int *out2) +{ +#pragma acc kernels + { + int i; + + for (i = 0; i < n; i++) + { + out1[i] = __builtin_clz (input[i]); + out2[i] = __builtin_popcount (input[i]); + } + } +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */ diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 new file mode 100644 index 000000000000..5169a0a1676d --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 @@ -0,0 +1,26 @@ +! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } +! { dg-additional-options "-Wopenacc-kernels-annotate-loops" } +! { dg-additional-options "-fdump-tree-original" } +! { dg-do compile } + +! Test that a loop with calls to intrinsics in the body can be annotated. + +subroutine f (n, input, out1, out2) + implicit none + integer :: n + integer, intent (in), dimension (n) :: input + integer, intent (out), dimension (n) :: out1, out2 + + integer :: i + +!$acc kernels + + do i = 1, n + out1(i) = min (i, input(i)) + out2(i) = not (input(i)) + end do +!$acc end kernels + +end subroutine f + +! 
{ dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }

From patchwork Wed Dec 15 15:54:17 2021
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48952
From: Frederik Harwath
Subject: [PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.
Date: Wed, 15 Dec 2021 16:54:17 +0100
Message-ID: <20211215155447.19379-11-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: tobias@codesourcery.com, Sandra Loosemore, thomas@codesourcery.com, fortran@gcc.gnu.org
From: Sandra Loosemore

Several of the Fortran tests for kernels loop annotation were failing due to changes in the formatting of "acc loop" constructs in the dump file. Now the "auto" clause appears first, instead of after "private".

2020-08-23 Sandra Loosemore

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)
--
2.33.0

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 index 41f6307dbb17..42e751dbfb83 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 @@ -30,4 +30,4 @@ subroutine f (a, b, c) !$acc end kernels end subroutine f -!
{ dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 index d51482e4685d..6e2e2c41172b 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 @@ -31,4 +31,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 index 3c4956d70775..03c4234ce7cd 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 @@ -36,4 +36,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 index 3ec459f0a8df..6aeb3f2fe4d0 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 @@ -35,4 +35,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! 
{ dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 index 91f431cca432..7d1cff64a3d9 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 @@ -32,4 +32,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 index 570c12d3ad70..dab0d4030d03 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 @@ -32,4 +32,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 index 6e44a304b28b..15ef670e246d 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 @@ -31,4 +31,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 index 4624a05247d9..2baaa594be18 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95 @@ -29,4 +29,4 @@ function f (a, b) end function f -! 
{ dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 index daed8f7f6e9d..e629891e31f9 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95 @@ -30,4 +30,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 index 0c4ad256b7eb..6c3300b70537 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95 @@ -31,4 +31,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 index 1c3f87eed6e4..52a9e7e7a85b 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95 @@ -31,5 +31,5 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } } +! 
{ dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 index 43173a70df24..60eb245a22a9 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95 @@ -31,4 +31,4 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 index ec42213220e7..438a13acee18 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95 @@ -44,5 +44,5 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } } +! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 index 9188f70d9664..aa97e37c054c 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95 @@ -46,5 +46,5 @@ function f (a, b) end function f -! { dg-final { scan-tree-dump-times "acc loop private\\(i\\) auto" 2 "original" } } +! 
{ dg-final { scan-tree-dump-times "acc loop auto" 2 "original" } }

From patchwork Wed Dec 15 15:54:18 2021
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48953
From: Frederik Harwath
Subject: [PATCH 11/40] Clean up loop variable extraction in OpenACC kernels loop annotation.
Date: Wed, 15 Dec 2021 16:54:18 +0100
Message-ID: <20211215155447.19379-12-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: Sandra Loosemore, thomas@codesourcery.com, joseph@codesourcery.com, nathan@acm.org
From: Sandra Loosemore

The code for identifying annotatable loops in OpenACC kernels regions previously looked for the loop variable as the left-hand side of the comparison in the loop end test. However, front end optimizations sometimes switch the sense of the comparison, making this method unreliable. In particular, it's ambiguous when both operands to the end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable from the initializer, rather than the end test. The processing of the end test then just checks that one of the operands to the comparison matches the variable appearing in the initializer.

Much of the patch is code refactoring, moving the initializer analysis out of annotate_for_loop to check_and_annotate_for_loop so it can be performed earlier.
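
As a rough illustration of the check described above, here is a toy model only. The names `struct toy_loop` and `toy_loop_variable` are hypothetical and are not part of c-omp.c or of GCC's tree API; the real code works on FOR_STMT trees. The sketch takes the loop variable from the initializer and only requires the end test to mention that variable as either operand:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a for loop; this is NOT GCC's tree
   representation, just enough structure to show the idea.  */
struct toy_loop
{
  const char *init_var;  /* Variable set by the initializer, or NULL.  */
  const char *cond_lhs;  /* Left operand of the end test.  */
  const char *cond_rhs;  /* Right operand of the end test.  */
};

/* Return the loop variable if the loop looks annotatable in this toy
   model, otherwise NULL.  The variable comes from the initializer;
   the end test only has to use it as one of its two operands, so a
   comparison whose operands were swapped is still accepted.  */
static const char *
toy_loop_variable (const struct toy_loop *loop)
{
  if (loop->init_var == NULL)
    return NULL;  /* No usable initializer: do not annotate.  */

  if (strcmp (loop->cond_lhs, loop->init_var) == 0
      || strcmp (loop->cond_rhs, loop->init_var) == 0)
    return loop->init_var;

  return NULL;  /* End test does not involve the initialized variable.  */
}
```

In this model a loop written as `for (i = 0; i < n; i++)` and one whose end test has been rewritten to `n > i` both yield `i`, while a loop with no usable initializer, or whose end test mentions neither operand, is left unannotated.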

2020-08-30 Sandra Loosemore

gcc/c-family/
* c-omp.c (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here. Allow the loop variable as either operand to the condition.
---
 gcc/c-family/c-omp.c | 196 +++++++++++++++++++++----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)
--
2.33.0

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index e7c27f45e888..e73fb5d01f7e 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3174,86 +3174,26 @@ static tree (*lang_specific_unwrap_initializer) (tree); /* Try to annotate the given NODE, which must be a FOR_STMT, with a "#pragma acc loop auto" annotation. In practice, this means - building an OMP_FOR node for it. PREV_STMT is the statement - immediately before the loop, which may be used as the loop's - initialization statement. Annotating the loop may fail, in which - case INFO is used to record the cause of the failure and the - original loop remains unchanged. This function returns the - transformed loop if the transformation succeeded, the original node - otherwise. */ + building an OMP_FOR node for it. DECL and INIT are the + previously-verified iteration variable and initializer. Annotating + the loop may fail, in which case INFO is used to record the cause + of the failure and the original loop remains unchanged. This + function returns the transformed loop if the transformation + succeeded, the original node otherwise.
*/ static tree -annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi, +annotate_for_loop (tree node, tree decl, tree init, struct annotation_info *info) { gcc_checking_assert (TREE_CODE (node) == FOR_STMT); location_t loc = EXPR_LOCATION (node); tree cond = FOR_COND (node); + tree incr = FOR_EXPR (node); + + gcc_assert (decl); gcc_assert (cond); - tree decl = TREE_OPERAND (cond, 0); gcc_assert (decl && TREE_CODE (decl) == VAR_DECL); - tree init = FOR_INIT_STMT (node); - tree prev_stmt = NULL_TREE; - bool unlink_prev = false; - bool fix_decl = false; - - - /* Both the C and C++ front ends normally put the initializer in the - statement list just before the FOR_STMT instead of in FOR_INIT_STMT. - If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out - because the code below won't handle it. */ - if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR) - { - do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE); - return node; - } - - /* Examine the statement before the loop to see if it is a - valid initializer. It must be either a MODIFY_EXPR or VAR_DECL, - possibly wrapped in language-specific structure. */ - if (init == NULL_TREE && prev_tsi != NULL) - { - prev_stmt = tsi_stmt (*prev_tsi); - - /* Call the language-specific hook to unwrap prev_stmt. */ - if (prev_stmt) - prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt); - - /* See if we have a valid MODIFY_EXPR. */ - if (prev_stmt - && TREE_CODE (prev_stmt) == MODIFY_EXPR - && TREE_OPERAND (prev_stmt, 0) == decl - && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1))) - { - init = prev_stmt; - unlink_prev = true; - } - else if (prev_stmt == decl - && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl))) - { - /* If the preceding statement is the declaration of the loop - variable with its initialization, build an assignment - expression for the loop's initializer. 
*/ - init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl, - DECL_INITIAL (decl)); - /* We need to remove the initializer from the decl if we - end up using the init we just built instead. */ - fix_decl = true; - } - } - - if (init == NULL_TREE) - /* There is nothing we can do to find the correct init statement for - this loop, but c_finish_omp_for insists on having one and would fail - otherwise. In that case, we would just return node. Do that - directly, here. */ - { - do_not_annotate_loop (info, as_missing_initializer, NULL_TREE); - return node; - } - - tree incr = FOR_EXPR (node); /* The C++ frontend can wrap the increment two levels deep inside a cleanup expression, but c_finish_omp_for does not care about that. */ @@ -3278,18 +3218,6 @@ annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi, NULL_TREE, false, info); if (omp_for != NULL_TREE) { - if (unlink_prev) - /* We don't need the previous statement that we consumed as an - initializer in the new OMP_FOR any more. */ - tsi_delink (prev_tsi); - - if (fix_decl) - /* We no longer need the initializer expression on the decl of - the loop variable and don't want to duplicate it. The - kernels conversion pass would interpret it as a stray - assignment in a gang-single region. */ - DECL_INITIAL (prev_stmt) = NULL_TREE; - /* Add an auto clause, then return the new loop. */ tree auto_clause = build_omp_clause (loc, OMP_CLAUSE_AUTO); OMP_CLAUSE_CHAIN (auto_clause) = OMP_FOR_CLAUSES (omp_for); @@ -3315,11 +3243,16 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi, { tree node = *nodeptr; gcc_assert (TREE_CODE (node) == FOR_STMT); + tree init = FOR_INIT_STMT (node); + tree cond = FOR_COND (node); + tree prev_stmt = NULL_TREE; + tree decl = NULL_TREE; + bool unlink_prev = false; + bool fix_decl = false; /* This structure describes the current loop statement. 
*/ struct annotation_info loop_info = { node, NULL_TREE, false, as_in_kernels_loop, NULL_TREE, info }; - tree cond = FOR_COND (node); /* If we are in the body of an explicitly-annotated loop, do not add annotations to this loop or any other nested loops. */ @@ -3331,30 +3264,84 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi, That is why we are doing some checks on the loop condition that duplicate what c_finish_omp_for is doing. */ - /* The loop condition must be a comparison. */ + /* First we need to find the decl and initializer for the + controlling variable. Both the C and C++ front ends normally put + the initializer in the statement list just before the FOR_STMT + instead of in FOR_INIT_STMT. If FOR_INIT_STMT happens to exist + but isn't a MODIFY_EXPR, give up because the code below won't + handle it. */ + + else if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR) + do_not_annotate_loop (&loop_info, as_invalid_initializer, NULL_TREE); + + /* Examine the statement before the loop to see if it is a + valid initializer. It must be either a MODIFY_EXPR or VAR_DECL, + possibly wrapped in language-specific structure. */ + else if (init == NULL_TREE && prev_tsi != NULL && tsi_stmt (*prev_tsi)) + { + prev_stmt = tsi_stmt (*prev_tsi); + + /* Call the language-specific hook to unwrap prev_stmt. */ + prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt); + + /* See if we have a valid MODIFY_EXPR. */ + if (TREE_CODE (prev_stmt) == MODIFY_EXPR + && is_local_var (TREE_OPERAND (prev_stmt, 0)) + && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1))) + { + decl = TREE_OPERAND (prev_stmt, 0); + init = prev_stmt; + unlink_prev = true; + } + else if (is_local_var (prev_stmt) + && !TREE_SIDE_EFFECTS (DECL_INITIAL (prev_stmt))) + { + /* If the preceding statement is the declaration of the loop + variable with its initialization, build an assignment + expression for the loop's initializer.
*/ + decl = prev_stmt; + init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl, + DECL_INITIAL (decl)); + /* We need to remove the initializer from the decl if we + end up using the init we just built instead. */ + fix_decl = true; + } + } + + if (init == NULL_TREE || decl == NULL_TREE) + /* There is nothing we can do to find the correct init statement for + this loop. */ + do_not_annotate_loop (&loop_info, as_missing_initializer, NULL_TREE); + + /* The condition must be a comparison of the decl we found in + the initializer against an expression that can be hoisted + outside the loop. */ + if (loop_info.state > as_in_kernels_loop) + /* Skip validating condition if we've already got an error. */ + ; else if (cond == NULL_TREE) do_not_annotate_loop (&loop_info, as_missing_predicate, NULL_TREE); else if (TREE_CODE_CLASS (TREE_CODE (cond)) != tcc_comparison) do_not_annotate_loop (&loop_info, as_invalid_predicate, cond); else { - /* The condition's LHS must be a local variable that does not - have its address taken. Its RHS must also be such a local - variable or a constant. */ - tree induction_var = TREE_OPERAND (cond, 0); - tree limit_var = TREE_OPERAND (cond, 1); - if (!is_local_var (induction_var) - || (!is_local_var (limit_var) - && (TREE_CODE_CLASS (TREE_CODE (limit_var)) - != tcc_constant))) + tree limit_exp = NULL_TREE; + + if (TREE_OPERAND (cond, 0) == decl) + limit_exp = TREE_OPERAND (cond, 1); + else if (TREE_OPERAND (cond, 1) == decl) + limit_exp = TREE_OPERAND (cond, 0); + + if (!limit_exp + || (!is_local_var (limit_exp) + && (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant))) do_not_annotate_loop (&loop_info, as_invalid_predicate, cond); else { /* These variables must not be assigned to in the loop. 
*/ - loop_info.vars = tree_cons (NULL_TREE, induction_var, - loop_info.vars); - if (TREE_CODE_CLASS (TREE_CODE (limit_var)) != tcc_constant) - loop_info.vars = tree_cons (NULL_TREE, limit_var, loop_info.vars); + loop_info.vars = tree_cons (NULL_TREE, decl, loop_info.vars); + if (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant) + loop_info.vars = tree_cons (NULL_TREE, limit_exp, loop_info.vars); } } @@ -3369,11 +3356,24 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi, /* If the traversal of the loop and all nested loops didn't hit any problems, attempt the actual transformation. If it succeeds, replace this node with the annotated loop. */ - tree result = annotate_for_loop (node, prev_tsi, &loop_info); + tree result = annotate_for_loop (node, decl, init, &loop_info); if (result != node) { /* Success! */ *nodeptr = result; + + if (unlink_prev) + /* We don't need the previous statement that we consumed + as an initializer in the new OMP_FOR any more. */ + tsi_delink (prev_tsi); + + if (fix_decl) + /* We no longer need the initializer expression on the + decl of the loop variable and don't want to duplicate + it. The kernels conversion pass would interpret it as + a stray assignment in a gang-single region. 
*/ + DECL_INITIAL (decl) = NULL_TREE; + + return; } }
From patchwork Wed Dec 15 15:54:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48954
From: Frederik Harwath
Subject: [PATCH 12/40] Relax some restrictions on the loop bound in kernels loop annotation.
Date: Wed, 15 Dec 2021 16:54:19 +0100
Message-ID: <20211215155447.19379-13-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
MIME-Version: 1.0
Cc: Sandra Loosemore, thomas@codesourcery.com, joseph@codesourcery.com, nathan@acm.org

From: Sandra Loosemore

OpenACC loop semantics require that the loop bound be computable before entering the loop, unlike C/C++ semantics, where the end test is evaluated on every iteration. Formerly the kernels loop annotator permitted only constants and variables not modified in the loop body in the loop bound expression. This patch relaxes those restrictions somewhat to allow many forms of expressions involving such constants and variables, including calls to constant functions.

2020-08-30 Sandra Loosemore

gcc/c-family/
* c-omp.c (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.
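To illustrate the semantic difference described above, here is a minimal standalone sketch in plain C. It is not part of the patch and uses no OpenACC; all names (bound, bound_calls, trips_*) are hypothetical and exist only for this illustration. The "hoisted" variants model by hand what happens when the bound is computed once on entry to the loop.

```c
#include <assert.h>

/* Hypothetical helper for illustration only: counts how often the
   loop bound is evaluated.  */
static int bound_calls;

static int
bound (int n)
{
  bound_calls++;
  return n;
}

/* C/C++ semantics: the end test, and thus bound (), runs before every
   iteration, including the final failing test.  */
static int
trips_c_semantics (int n)
{
  int trips = 0;
  for (int i = 0; i < bound (n); i++)
    trips++;
  return trips;
}

/* Hand-written model of a precomputed bound: evaluated once on entry.  */
static int
trips_hoisted (int n)
{
  int trips = 0;
  int limit = bound (n);
  for (int i = 0; i < limit; i++)
    trips++;
  return trips;
}

/* The bound reads a variable that the loop body modifies.  */
static int
trips_variant_c (int n)
{
  int j = n, trips = 0;
  for (int i = 0; i < j; i++)
    {
      trips++;
      j--;
    }
  return trips;
}

/* Same loop with the bound hoisted: the trip count differs, which is
   why such a loop must not be annotated.  */
static int
trips_variant_hoisted (int n)
{
  int j = n, trips = 0;
  int limit = j;
  for (int i = 0; i < limit; i++)
    {
      trips++;
      j--;
    }
  return trips;
}

/* Checks the claims made in the comments above.  */
static void
check_examples (void)
{
  bound_calls = 0;
  assert (trips_c_semantics (4) == 4 && bound_calls == 5);
  bound_calls = 0;
  assert (trips_hoisted (4) == 4 && bound_calls == 1);
  assert (trips_variant_c (10) == 5);
  assert (trips_variant_hoisted (10) == 10);
}
```

When the bound is invariant, hoisting only changes how often it is evaluated. When the bound depends on state written in the body, hoisting changes the trip count, so the annotation has to be rejected in that case.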
--- gcc/c-family/c-omp.c | 120 ++++++++++++++++-- .../goacc/kernels-loop-annotation-21.c | 42 ++++++ .../goacc/kernels-loop-annotation-22.c | 41 ++++++ 3 files changed, 194 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c -- 2.33.0 diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index e73fb5d01f7e..dc63d304ca67 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3165,6 +3165,116 @@ is_local_var (tree decl) && !TREE_ADDRESSABLE (decl)); } +/* EXP is a loop bound expression for a comparison against local + variable DECL. Check whether this is potentially valid in an OpenACC loop + context, namely that it can be precomputed when entering the loop + construct per the OpenACC specification. Local variables referenced + in both DECL and EXP that may not be modified in the body of the loop + are added to the list in INFO to be checked later. + + FIXME: Ideally we would like to make this test permissive rather than + restrictive, and allow the later conversion of the "auto" attribute to + either "seq" or "independent" to make the determination using dataflow, + alias analysis, etc rather than a tree traversal. But presently it does + not do that and always just hoists the loop bound expression. So the + current implementation only considers expressions involving unmodified + local variables and constants, using a tree walk.
*/ + +static tree +end_test_ok_for_annotation_r (tree *tp, int *walk_subtrees, + void *data) +{ + tree exp = *tp; + struct annotation_info *info = (struct annotation_info *) data; + + switch (TREE_CODE_CLASS (TREE_CODE (exp))) + { + case tcc_constant: + /* Constants are trivially known to be invariant. */ + return NULL_TREE; + + case tcc_declaration: + if (is_local_var (exp)) + { + tree t; + /* Add it to the list of variables that can't be modified in the + loop, only if not already present. */ + for (t = info->vars; t && TREE_VALUE (t) != exp; + t = TREE_CHAIN (t)) + ; + if (!t) + info->vars = tree_cons (NULL_TREE, exp, info->vars); + return NULL_TREE; + } + else if (TREE_CODE (exp) == VAR_DECL && TREE_READONLY (exp)) + return NULL_TREE; + else if (TREE_CODE (exp) == FUNCTION_DECL) + return NULL_TREE; + break; + + case tcc_unary: + case tcc_binary: + case tcc_comparison: + /* Allow arithmetic expressions and comparisons provided + that the operands are good. */ + return NULL_TREE; + + default: + /* Handle some special cases. */ + switch (TREE_CODE (exp)) + { + case COND_EXPR: + case TRUTH_ANDIF_EXPR: + case TRUTH_ORIF_EXPR: + case TRUTH_AND_EXPR: + case TRUTH_OR_EXPR: + case TRUTH_XOR_EXPR: + case TRUTH_NOT_EXPR: + /* ?: and boolean operators are OK. */ + return NULL_TREE; + + case CALL_EXPR: + /* Allow calls to constant functions with invariant operands. */ + { + tree fndecl = get_callee_fndecl (exp); + if (fndecl && TREE_READONLY (fndecl)) + return NULL_TREE; + } + break; + + case ADDR_EXPR: + /* We can expect addresses of things to be invariant. */ + return NULL_TREE; + + default: + break; + } + } + + /* Reject anything else. */ + *walk_subtrees = 0; + return exp; +} + +static bool +end_test_ok_for_annotation (tree decl, tree exp, + struct annotation_info *info) +{ + /* Traversal returns NULL_TREE if all is well. */ + if (!walk_tree (&exp, end_test_ok_for_annotation_r, info, NULL)) + { + /* So far, so good. 
Check the decl against any variables collected + in the exp. */ + tree t; + for (t = info->vars; t; t = TREE_CHAIN (t)) + if (TREE_VALUE (t) == decl) + return false; + info->vars = tree_cons (NULL_TREE, decl, info->vars); + return true; + } + return false; +} + /* The initializer for a FOR_STMT is sometimes wrapped in various other language-specific tree structures. We need a hook to unwrap them. This function takes a tree argument and should return either a @@ -3333,16 +3443,8 @@ check_and_annotate_for_loop (tree *nodeptr, tree_stmt_iterator *prev_tsi, limit_exp = TREE_OPERAND (cond, 0); if (!limit_exp - || (!is_local_var (limit_exp) - && (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant))) + || !end_test_ok_for_annotation (decl, limit_exp, &loop_info)) do_not_annotate_loop (&loop_info, as_invalid_predicate, cond); - else - { - /* These variables must not be assigned to in the loop. */ - loop_info.vars = tree_cons (NULL_TREE, decl, loop_info.vars); - if (TREE_CODE_CLASS (TREE_CODE (limit_exp)) != tcc_constant) - loop_info.vars = tree_cons (NULL_TREE, limit_exp, loop_info.vars); - } } /* Walk the body. This will process any nested loops, so we have to do it diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c new file mode 100644 index 000000000000..f87444ede4b4 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c @@ -0,0 +1,42 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test for rejecting annotation on loops that have various subexpressions + in the loop end test that are not loop-invariant. 
*/ + +extern int g (int); +extern int x; +extern int gg (int, int) __attribute__ ((const)); + +void f (float *a, float *b, int n) +{ + + int j; +#pragma acc kernels + { + /* Non-constant function call. */ + for (int i = 0; i < g(n); i++) /* { dg-warning "loop cannot be annotated" } */ + a[i] = b[i]; + + /* Global variable. */ + for (int i = x; i < n + x; i++) /* { dg-warning "loop cannot be annotated" } */ + a[i] = b[i]; + + /* Explicit reference to the loop variable. */ + for (int i = 0; i < gg (i, n); i++) /* { dg-warning "loop cannot be annotated" } */ + a[i] = b[i]; + + /* Reference to a variable that is modified in the body of the loop. */ + j = 0; + for (int i = 0; i < gg (j, n); i++) /* { dg-warning "loop cannot be annotated" } */ + { + a[i] = b[i]; + j = i; + } + + } +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c new file mode 100644 index 000000000000..6a5099d2ff9d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c @@ -0,0 +1,41 @@ +/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */ +/* { dg-additional-options "-fdump-tree-original" } */ +/* { dg-do compile } */ + +/* Test for accepting annotation on loops that have various forms of + loop-invariant expressions in their end test. */ + +extern const int x; +extern int g (int) __attribute__ ((const)); + +void f (float *a, float *b, int n) +{ + + int j; +#pragma acc kernels + { + /* Reversed form of comparison. */ + for (int i = 0; n >= i; i++) + a[i] = b[i]; + + /* Constant function call. */ + for (int i = 0; i < g(n); i++) + a[i] = b[i]; + + /* Constant global variable. */ + for (int i = 0; i < x; i++) + a[i] = b[i]; + + /* Complicated expression involving conditionals, etc. */ + for (int i = 0; i < ((x == 4) ? 
(n << 2) : (n << 3)); i++) + a[i] = b[i]; + + /* Reference to a local variable not modified in the loop. */ + j = ((x == 4) ? (n << 2) : (n << 3)); + for (int i = 0; i < j; i++) + a[i] = b[i]; + } +} + +/* { dg-final { scan-tree-dump-times "acc loop auto" 5 "original" } } */
From patchwork Wed Dec 15 15:54:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48955
From: Frederik Harwath
Subject: [PATCH 13/40] Fortran: Delinearize array accesses
Date: Wed, 15 Dec 2021 16:54:20 +0100
Message-ID: <20211215155447.19379-14-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
MIME-Version: 1.0
Cc: tobias@codesourcery.com, rguenther@suse.de, thomas@codesourcery.com, fortran@gcc.gnu.org

The Fortran front end presently linearizes accesses to multi-dimensional arrays by combining the indices for the various dimensions into a series of explicit multiplies and adds with refactoring to allow CSE of invariant parts of the computation. Unfortunately this representation interferes with Graphite-based loop optimizations. It is difficult to recover the original multi-dimensional form of the access by the time loop optimizations run because parts of it have already been optimized away or into a form that is not easily recognizable, so it seems better to have the Fortran front end produce delinearized accesses to begin with, a set of nested ARRAY_REFs similar to the existing behavior of the C and C++ front ends.
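As a rough illustration of the two representations, the following standalone C sketch contrasts a linearized access with the equivalent nested one. It is not taken from the patch: the extents NI, NJ, NK and the function names are hypothetical, indices are zero-based, and the layout is assumed to be contiguous and column-major as for a Fortran array a(i, j, k).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extents, chosen only for this illustration.  */
enum { NI = 3, NJ = 4, NK = 5 };

/* Linearized form: a single flat index built from explicit multiplies
   and adds, with the leftmost Fortran index varying fastest.  */
static double
get_linearized (const double *base, size_t i, size_t j, size_t k)
{
  const size_t stride[3] = { 1, NI, NI * NJ };
  return base[i * stride[0] + j * stride[1] + k * stride[2]];
}

/* Delinearized form: nested array references.  Fortran a(i, j, k)
   corresponds to C a[k][j][i].  */
static double
get_nested (double (*a)[NJ][NI], size_t i, size_t j, size_t k)
{
  return a[k][j][i];
}

/* Both forms must address the same element for every valid index.  */
static void
check_same_element (void)
{
  static double flat[NI * NJ * NK];
  for (size_t n = 0; n < NI * NJ * NK; n++)
    flat[n] = (double) n;

  /* View the flat storage as a nested array type.  */
  double (*nested)[NJ][NI] = (double (*)[NJ][NI]) flat;

  for (size_t k = 0; k < NK; k++)
    for (size_t j = 0; j < NJ; j++)
      for (size_t i = 0; i < NI; i++)
        assert (get_linearized (flat, i, j, k)
                == get_nested (nested, i, j, k));
}
```

In the nested form the per-dimension subscripts remain visible in the expression itself, which is the property the loop optimizers rely on; in the flat form they have to be recovered from the index arithmetic.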
This is a long-standing problem that has previously been discussed, e.g., in PR 14741 and PR 61000. This patch is an initial implementation for explicit array accesses only; it doesn't handle the accesses generated during scalarization of whole-array or array-section operations, which follow a different code path.

Co-Authored-By: Tobias Burnus

gcc/ChangeLog:

* expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

* lang.opt: Document -param=delinearize.
* trans-array.c (get_class_array_vptr): New function.
(get_array_lbound): New function.
(get_array_ubound): New function.
(gfc_conv_array_ref): Implement main delinearization logic.
(build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/assumed_type_2.f90: Adjust test expectations.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/graphite/block-2.f: Likewise.
* gfortran.dg/graphite/block-3.f90: Likewise.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/graphite/id-9.f: Likewise.
* gfortran.dg/inline_matmul_16.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Likewise.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
--- gcc/expr.c | 1 + gcc/fortran/lang.opt | 4 + gcc/fortran/trans-array.c | 321 +++++++++++++----- gcc/testsuite/gfortran.dg/assumed_type_2.f90 | 6 +- .../gfortran.dg/goacc/kernels-loop-inner.f95 | 2 +- .../gfortran.dg/gomp/affinity-clause-1.f90 | 2 +- gcc/testsuite/gfortran.dg/graphite/block-2.f | 9 +- .../gfortran.dg/graphite/block-3.f90 | 2 +- .../gfortran.dg/graphite/block-4.f90 | 2 +- gcc/testsuite/gfortran.dg/graphite/id-9.f | 2 +- .../gfortran.dg/inline_matmul_16.f90 | 2 + .../gfortran.dg/inline_matmul_24.f90 | 2 +- gcc/testsuite/gfortran.dg/no_arg_check_2.f90 | 6 +- gcc/testsuite/gfortran.dg/pr32921.f | 2 +- gcc/testsuite/gfortran.dg/reassoc_4.f | 2 +- .../gfortran.dg/vect/fast-math-mgrid-resid.f | 1 + 16 files changed, 270 insertions(+), 96 deletions(-) -- 2.33.0 diff --git a/gcc/expr.c b/gcc/expr.c index eb33643bd770..188905b4fe4d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize, break; case VIEW_CONVERT_EXPR: + case NOP_EXPR: break; case MEM_REF: diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index a202c04c4a25..25c5a5a32c41 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -521,6 +521,10 @@ fdefault-real-16 Fortran Var(flag_default_real_16) Set the default real kind to an 16 byte wide type. +-param=delinearize= +Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization +Delinearize array references. + fdollar-ok Fortran Var(flag_dollar_ok) Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index 5ceb261b6989..e84b4cb55f05 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t) } } - static tree -build_array_ref (tree desc, tree offset, tree decl, tree vptr) +get_class_array_vptr (tree desc, tree vptr) { - tree tmp; tree type; tree cdesc; @@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr) && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type))) vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0)); } + return vptr; +} +static tree +build_array_ref (tree desc, tree offset, tree decl, tree vptr) +{ + tree tmp; + vptr = get_class_array_vptr (desc, vptr); tmp = gfc_conv_array_data (desc); tmp = build_fold_indirect_ref_loc (input_location, tmp); tmp = gfc_build_array_ref (tmp, offset, decl, vptr); return tmp; } +/* Get the declared lower bound for rank N of array DECL which might + be either a bare array or a descriptor. This differs from + gfc_conv_array_lbound because it gets information for temporary array + objects from AR instead of the descriptor (they can differ). */ + +static tree +get_array_lbound (tree decl, int n, gfc_symbol *sym, + gfc_array_ref *ar, gfc_se *se) +{ + if (sym->attr.temporary) + { + gfc_se tmpse; + gfc_init_se (&tmpse, se); + gfc_conv_expr_type (&tmpse, ar->as->lower[n], gfc_array_index_type); + gfc_add_block_to_block (&se->pre, &tmpse.pre); + return tmpse.expr; + } + else + return gfc_conv_array_lbound (decl, n); +} + +/* Similarly for the upper bound. 
*/ +static tree +get_array_ubound (tree decl, int n, gfc_symbol *sym, + gfc_array_ref *ar, gfc_se *se) +{ + if (sym->attr.temporary) + { + gfc_se tmpse; + gfc_init_se (&tmpse, se); + gfc_conv_expr_type (&tmpse, ar->as->upper[n], gfc_array_index_type); + gfc_add_block_to_block (&se->pre, &tmpse.pre); + return tmpse.expr; + } + else + return gfc_conv_array_ubound (decl, n); +} + /* Build an array reference. se->expr already holds the array descriptor. This should be either a variable, indirect variable reference or component reference. For arrays which do not have a descriptor, se->expr will be the data pointer. - a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]*/ + + There are two strategies here. In the traditional case, multidimensional + arrays are explicitly linearized into a one-dimensional array, with the + index computed as if by + a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]] + + However, we can often get better code using the Graphite framework + and scalar evolutions in the middle end, which expects to see + multidimensional array accesses represented as nested ARRAY_REFs, similar + to what the C/C++ front ends produce. Delinearization is controlled + by flag_delinearize_aref. */ void gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, @@ -3798,11 +3851,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, tree tmp; tree stride; tree decl = NULL_TREE; + tree cooked_decl = NULL_TREE; + tree vptr = se->class_vptr; gfc_se indexse; gfc_se tmpse; gfc_symbol * sym = expr->symtree->n.sym; char *var_name = NULL; + tree aref = NULL_TREE; + tree atype = NULL_TREE; + /* Handle coarrays. */ if (ar->dimen == 0) { gcc_assert (ar->codimen || sym->attr.select_rank_temporary @@ -3862,15 +3920,160 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, } } + /* Per comments above, DECL is not always a declaration. 
It may be + either a variable, indirect variable reference, or component + reference. It may have array or pointer type, or it may be a + descriptor with RECORD_TYPE. */ decl = se->expr; if (IS_CLASS_ARRAY (sym) && sym->attr.dummy && ar->as->type != AS_DEFERRED) decl = sym->backend_decl; - cst_offset = offset = gfc_index_zero_node; - add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl)); + /* A pointer array component can be detected from its field decl. Fix + the descriptor, mark the resulting variable decl and store it in + COOKED_DECL to pass to gfc_build_array_ref. */ + if (get_CFI_desc (sym, expr, &cooked_decl, ar)) + cooked_decl = build_fold_indirect_ref_loc (input_location, cooked_decl); + if (!expr->ts.deferred && !sym->attr.codimension + && is_pointer_array (se->expr)) + { + if (TREE_CODE (se->expr) == COMPONENT_REF) + cooked_decl = se->expr; + else if (TREE_CODE (se->expr) == INDIRECT_REF) + cooked_decl = TREE_OPERAND (se->expr, 0); + else + cooked_decl = se->expr; + } + else if (expr->ts.deferred + || (sym->ts.type == BT_CHARACTER + && sym->attr.select_type_temporary)) + { + if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))) + { + cooked_decl = se->expr; + if (TREE_CODE (cooked_decl) == INDIRECT_REF) + cooked_decl = TREE_OPERAND (cooked_decl, 0); + } + else + cooked_decl = sym->backend_decl; + } + else if (sym->ts.type == BT_CLASS) + { + if (UNLIMITED_POLY (sym)) + { + gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr); + gfc_init_se (&tmpse, NULL); + gfc_conv_expr (&tmpse, class_expr); + if (!se->class_vptr) + vptr = gfc_class_vptr_get (tmpse.expr); + gfc_free_expr (class_expr); + cooked_decl = tmpse.expr; + } + else + cooked_decl = NULL_TREE; + } + + /* Find the base of the array; this normally has ARRAY_TYPE. */ + tree base = build_fold_indirect_ref_loc (input_location, + gfc_conv_array_data (se->expr)); + tree type = TREE_TYPE (base); - /* Calculate the offsets from all the dimensions. 
Make sure to associate - the final offset so that we form a chain of loop invariant summands. */ + /* Handle special cases, copied from gfc_build_array_ref. After we get + through this, we know TYPE definitely is an ARRAY_TYPE. */ + if (GFC_ARRAY_TYPE_P (type) && GFC_TYPE_ARRAY_RANK (type) == 0) + { + gcc_assert (GFC_TYPE_ARRAY_CORANK (type) > 0); + se->expr = fold_convert (TYPE_MAIN_VARIANT (type), base); + return; + } + if (TREE_CODE (type) != ARRAY_TYPE) + { + gcc_assert (cooked_decl == NULL_TREE); + se->expr = base; + return; + } + + /* Check for cases where we cannot delinearize. */ + + bool delinearize = flag_delinearize_aref; + + /* There is no point in trying to delinearize 1-dimensional arrays. */ + if (ar->dimen == 1) + delinearize = false; + + if (delinearize + && (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)) + || (DECL_P (se->expr) + && DECL_LANG_SPECIFIC (se->expr) + && GFC_DECL_SAVED_DESCRIPTOR (se->expr)))) + { + /* Descriptor arrays that may not be contiguous cannot + be delinearized without using the stride in the descriptor, + which generally involves introducing a division operation. + That's unlikely to produce optimal code, so avoid doing it. */ + tree desc = se->expr; + if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))) + desc = GFC_DECL_SAVED_DESCRIPTOR (se->expr); + tree tmptype = TREE_TYPE (desc); + if (POINTER_TYPE_P (tmptype)) + tmptype = TREE_TYPE (tmptype); + enum gfc_array_kind akind = GFC_TYPE_ARRAY_AKIND (tmptype); + if (akind != GFC_ARRAY_ASSUMED_SHAPE_CONT + && akind != GFC_ARRAY_ASSUMED_RANK_CONT + && akind != GFC_ARRAY_ALLOCATABLE + && akind != GFC_ARRAY_POINTER_CONT) + delinearize = false; + } + + /* See gfc_build_array_ref in trans.c. If we have a cooked_decl or + vptr, then we most likely have to do pointer arithmetic using a + linearized array offset. 
*/ + if (delinearize && cooked_decl) + delinearize = false; + else if (delinearize && get_class_array_vptr (se->expr, vptr)) + delinearize = false; + + if (!delinearize) + { + /* Initialize the offset from the array descriptor. This accounts + for the array base being something other than zero. */ + cst_offset = offset = gfc_index_zero_node; + add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl)); + } + else + { + /* If we are delinearizing, build up the nested array type using the + dimension information we have for each rank. */ + atype = TREE_TYPE (type); + for (n = 0; n < ar->dimen; n++) + { + /* We're working from the outermost nested array reference inward + in this step. ATYPE is the element type for the access in + this rank; build the new array type based on the bounds + information and store it back into ATYPE for the next rank's + processing. */ + tree lbound = get_array_lbound (decl, n, sym, ar, se); + tree ubound = get_array_ubound (decl, n, sym, ar, se); + tree dimen = build_range_type (TREE_TYPE (lbound), + lbound, ubound); + atype = build_array_type (atype, dimen); + + /* Emit a DECL_EXPR for the array type so the gimplification of + its type sizes works correctly. */ + if (! TYPE_NAME (atype)) + TYPE_NAME (atype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL, + NULL_TREE, atype); + gfc_add_expr_to_block (&se->pre, + build1 (DECL_EXPR, atype, + TYPE_NAME (atype))); + } + + /* Cast base to the innermost array type. */ + if (DECL_P (base)) + TREE_ADDRESSABLE (base) = 1; + aref = build1 (NOP_EXPR, atype, base); + } + + /* Process indices in reverse order. */ for (n = ar->dimen - 1; n >= 0; n--) { /* Calculate the index for this dimension. */ @@ -3888,16 +4091,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, indexse.expr = save_expr (indexse.expr); /* Lower bound. 
*/ - tmp = gfc_conv_array_lbound (decl, n); - if (sym->attr.temporary) - { - gfc_init_se (&tmpse, se); - gfc_conv_expr_type (&tmpse, ar->as->lower[n], - gfc_array_index_type); - gfc_add_block_to_block (&se->pre, &tmpse.pre); - tmp = tmpse.expr; - } - + tmp = get_array_lbound (decl, n, sym, ar, se); cond = fold_build2_loc (input_location, LT_EXPR, logical_type_node, indexse.expr, tmp); msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' " @@ -3912,16 +4106,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, arrays. */ if (n < ar->dimen - 1 || ar->as->type != AS_ASSUMED_SIZE) { - tmp = gfc_conv_array_ubound (decl, n); - if (sym->attr.temporary) - { - gfc_init_se (&tmpse, se); - gfc_conv_expr_type (&tmpse, ar->as->upper[n], - gfc_array_index_type); - gfc_add_block_to_block (&se->pre, &tmpse.pre); - tmp = tmpse.expr; - } - + tmp = get_array_ubound (decl, n, sym, ar, se); cond = fold_build2_loc (input_location, GT_EXPR, logical_type_node, indexse.expr, tmp); msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' " @@ -3934,65 +4119,41 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, } } - /* Multiply the index by the stride. */ - stride = gfc_conv_array_stride (decl, n); - tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, - indexse.expr, stride); - - /* And add it to the total. */ - add_to_offset (&cst_offset, &offset, tmp); - } - - if (!integer_zerop (cst_offset)) - offset = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, offset, cst_offset); - - /* A pointer array component can be detected from its field decl. Fix - the descriptor, mark the resulting variable decl and pass it to - build_array_ref. 
*/ - decl = NULL_TREE; - if (get_CFI_desc (sym, expr, &decl, ar)) - decl = build_fold_indirect_ref_loc (input_location, decl); - if (!expr->ts.deferred && !sym->attr.codimension - && is_pointer_array (se->expr)) - { - if (TREE_CODE (se->expr) == COMPONENT_REF) - decl = se->expr; - else if (TREE_CODE (se->expr) == INDIRECT_REF) - decl = TREE_OPERAND (se->expr, 0); - else - decl = se->expr; - } - else if (expr->ts.deferred - || (sym->ts.type == BT_CHARACTER - && sym->attr.select_type_temporary)) - { - if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))) + if (!delinearize) { - decl = se->expr; - if (TREE_CODE (decl) == INDIRECT_REF) - decl = TREE_OPERAND (decl, 0); + /* Multiply the index by the stride. */ + stride = gfc_conv_array_stride (decl, n); + tmp = fold_build2_loc (input_location, MULT_EXPR, + gfc_array_index_type, + indexse.expr, stride); + + /* And add it to the total. */ + add_to_offset (&cst_offset, &offset, tmp); } else - decl = sym->backend_decl; - } - else if (sym->ts.type == BT_CLASS) - { - if (UNLIMITED_POLY (sym)) { - gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr); - gfc_init_se (&tmpse, NULL); - gfc_conv_expr (&tmpse, class_expr); - if (!se->class_vptr) - se->class_vptr = gfc_class_vptr_get (tmpse.expr); - gfc_free_expr (class_expr); - decl = tmpse.expr; + /* Peel off a layer of array nesting from ATYPE to + get the result type of the new ARRAY_REF. */ + atype = TREE_TYPE (atype); + aref = build4 (ARRAY_REF, atype, aref, indexse.expr, + NULL_TREE, NULL_TREE); } - else - decl = NULL_TREE; } - se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr); + if (!delinearize) + { + /* Build a linearized array reference using the offset from all + dimensions.
*/ + if (!integer_zerop (cst_offset)) + offset = fold_build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, offset, cst_offset); + se->class_vptr = vptr; + vptr = get_class_array_vptr (se->expr, vptr); + se->expr = gfc_build_array_ref (base, offset, cooked_decl, vptr); + } + else + /* Return the outermost ARRAY_REF we already built. */ + se->expr = aref; } diff --git a/gcc/testsuite/gfortran.dg/assumed_type_2.f90 b/gcc/testsuite/gfortran.dg/assumed_type_2.f90 index 5d3cd7eaece9..07be87ef1eb6 100644 --- a/gcc/testsuite/gfortran.dg/assumed_type_2.f90 +++ b/gcc/testsuite/gfortran.dg/assumed_type_2.f90 @@ -147,12 +147,12 @@ end ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } } +! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } } +! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } } +! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } } ! 
{ dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 index a3ad591f926c..d8d14c42be01 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 @@ -7,7 +7,7 @@ program main integer :: a(100,100), b(100,100) integer :: i, j, d - !$acc kernels ! { dg-message "optimized: assigned OpenACC seq loop parallelism" } + !$acc kernels ! { dg-message "optimized: assigned OpenACC gang loop parallelism" } do i=1,100 do j=1,100 a(i,j) = 1 diff --git a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 index 13bdd36d0b4d..51c6013565a1 100644 --- a/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/affinity-clause-1.f90 @@ -22,7 +22,7 @@ end ! { dg-final { scan-tree-dump-times "D\\.\[0-9\]+ = .integer.kind=4.. __builtin_cosf ..real.kind=4.. a \\+ 1.0e\\+0\\);" 2 "original" } } -! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &d\\\[\\(.*jj \\* 5 \\+ .* ?\\) \\+ -6\\\]\\)" 1 "original" } } +! { dg-final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &b\\\[.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) jj=2:5:2, integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &\\(\\(integer\\(kind.*?d\\).*?$" 1 "original" } } ! 
{ dg final { scan-tree-dump-times "#pragma omp task affinity\\(iterator\\(integer\\(kind=4\\) i=D.3938:5:1\\):\\*\\(c_char \\*\\) &b\\\[\\(.* ? \\+ -1\\\]\\) affinity\\(iterator\\(integer\\(kind=4\\) i=D\\.\[0-9\]+:5:1\\):\\*\\(c_char \\*\\) &d\\\[\\(\\(integer\\(kind=8\\)\\) i \\+ -1\\) \\* 6\\\]\\)" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/graphite/block-2.f b/gcc/testsuite/gfortran.dg/graphite/block-2.f index bea8ddeb8267..266da378c5d9 100644 --- a/gcc/testsuite/gfortran.dg/graphite/block-2.f +++ b/gcc/testsuite/gfortran.dg/graphite/block-2.f @@ -1,5 +1,11 @@ ! { dg-do compile } ! { dg-additional-options "-std=legacy" } + +! ldist introduces a __builtin_memset for the first loop and hence +! breaks the testcase's assumption regarding the number of SCoPs +! because Graphite cannot deal with the call. +! { dg-additional-options "-fdisable-tree-ldist" } + SUBROUTINE MATRIX_MUL_UNROLLED (A, B, C, L, M, N) DIMENSION A(L,M), B(M,N), C(L,N) @@ -18,5 +24,4 @@ RETURN END -! Disabled for now as it requires delinearization. -! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" { xfail *-*-* } } } +! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" } } diff --git a/gcc/testsuite/gfortran.dg/graphite/block-3.f90 b/gcc/testsuite/gfortran.dg/graphite/block-3.f90 index 452de7349050..0edca92bb894 100644 --- a/gcc/testsuite/gfortran.dg/graphite/block-3.f90 +++ b/gcc/testsuite/gfortran.dg/graphite/block-3.f90 @@ -12,6 +12,6 @@ enddo end subroutine matrix_multiply -! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } } +! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } } !
{ dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } } diff --git a/gcc/testsuite/gfortran.dg/graphite/block-4.f90 b/gcc/testsuite/gfortran.dg/graphite/block-4.f90 index 42af5b62444e..f2aed98bcf82 100644 --- a/gcc/testsuite/gfortran.dg/graphite/block-4.f90 +++ b/gcc/testsuite/gfortran.dg/graphite/block-4.f90 @@ -15,6 +15,6 @@ enddo end subroutine matrix_multiply -! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } } +! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } } ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } } diff --git a/gcc/testsuite/gfortran.dg/graphite/id-9.f b/gcc/testsuite/gfortran.dg/graphite/id-9.f index c93937088972..885a9dfaa1bb 100644 --- a/gcc/testsuite/gfortran.dg/graphite/id-9.f +++ b/gcc/testsuite/gfortran.dg/graphite/id-9.f @@ -8,7 +8,7 @@ do l=1,3 do k=1,l enddo - bar(k,l)=bar(k,l)+(v3b-1.d0) + bar(k,l)=bar(k,l)+(v3b-1.d0) ! { dg-bogus ".*iteration 2 invokes undefined behavior" "TODO" { xfail *-*-* } } enddo enddo do m=1,ne diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_16.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_16.f90 index 580cb1ac9393..2a7f63b9c963 100644 --- a/gcc/testsuite/gfortran.dg/inline_matmul_16.f90 +++ b/gcc/testsuite/gfortran.dg/inline_matmul_16.f90 @@ -1,5 +1,7 @@ ! { dg-do run } ! { dg-options "-ffrontend-optimize -fdump-tree-optimized -Wrealloc-lhs -finline-matmul-limit=1000 -O" } +! { dg-additional-options "--param delinearize=0" } TODO + ! 
PR 66094: Check functionality for MATMUL(TRANSPOSE(A),B)) for two-dimensional arrays program main implicit none diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 index 3168d5f10064..8d84f3cdb01b 100644 --- a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 +++ b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 @@ -39,4 +39,4 @@ program testMATMUL call abort() end if end program testMATMUL -! { dg-final { scan-tree-dump-times "gamma5\\\[__var_1_do \\* 4 \\+ __var_2_do\\\]|gamma5\\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } } +! { dg-final { scan-tree-dump-times "gamma5.*\\\[NON_LVALUE_EXPR <__var_1_do>\\\]\\\[NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 index 3570b9719ebb..0900dd82646f 100644 --- a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 +++ b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 @@ -129,12 +129,12 @@ end ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } } +! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } } +! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } } -! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } } +! 
{ dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } } ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } } diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f index 0661208edde5..853438609c43 100644 --- a/gcc/testsuite/gfortran.dg/pr32921.f +++ b/gcc/testsuite/gfortran.dg/pr32921.f @@ -45,4 +45,4 @@ RETURN END -! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } } +! { dg-final { scan-tree-dump-times "ubound" 4 "lim2" } } diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f b/gcc/testsuite/gfortran.dg/reassoc_4.f index fdcb46e835cf..2368b76aecb2 100644 --- a/gcc/testsuite/gfortran.dg/reassoc_4.f +++ b/gcc/testsuite/gfortran.dg/reassoc_4.f @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200" } +! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200 --param delinearize=0" } subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight) integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1 real*8 w(3,3),vo(3,3),anisox(3,3,3,3),s(60,60),weight diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f index 08965cc5e202..6c469b1964c6 100644 --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f @@ -3,6 +3,7 @@ ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" } ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } } ! 
{ dg-additional-options "-mzarch" { target { s390*-*-* } } } +! { dg-additional-options "--param delinearize=0" } TODO ******* RESID COMPUTES THE RESIDUAL: R = V - AU * From patchwork Wed Dec 15 15:54:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48957 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 938E8385802D for ; Wed, 15 Dec 2021 16:04:58 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id CDB45385842C for ; Wed, 15 Dec 2021 15:55:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CDB45385842C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: bMv0WzH332lWgwmSSmY8dZBab5tSpPDUqxSK2VOxG1uYFnf9hnU4Ewreukk/OHPL9L9aLTL8mk BDaLL2KsY8sNTsgsDxu5CpqmsBsjvs2VJx+BAYxZv1gsPzuwexg15MjzQYyB9LZzk6seI1Ih6M jCJgPpc/zLGmLmHtmvy+Yon1kicrPhfVIpGCAECw9EfM/WZX6gRHLWSVnohHnOTMSa5zEtZjd3 MeyYiIsLv7JlbzNeZh0fdY+bd1f7yEsDQXuIBtJRs7df4YCkxBT+osU8jlly0lbekskLcXsyYY 95JrMPwUERRIyrkBOz/WyheG X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736560" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:53 -0800 IronPort-SDR: gWy2gxaCFU4fDk1O2tZ/jIQrX17jWOn1rBaSFhBGJYWu3zFZXXa1GEodHVbILjhfBj9kG6lDHR APeaYp5cWt0p/uQBo6+FKuuQVBEL5lFA85FYay6uThGf5hZUZP9o7T+AiI5KzFZXka3lMW53Xu Cu1PmtaaKmU3nPO2rP1kvd/uKmrpDlzX3KK6+5865ntW37oybeaFN4tv6YQw6+sWYFaCy85sv5 LtnCj4M+SGKa1ryQZE8uuwzGUH9l3J3vVye8c7jzQVkYZHMgFaGKZ5rbMQYCWhHqHXNCyl/nyY lUA= From: Frederik Harwath To: Subject: 
[PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite Date: Wed, 15 Dec 2021 16:54:21 +0100 Message-ID: <20211215155447.19379-15-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The OpenACC device lowering pass must run after the Graphite pass to allow for the use of Graphite for automatic parallelization of kernels regions in the future. Experimentation has shown that it is best, performancewise, to run pass_oacc_device_lower together with the related passes pass_oacc_loop_designation and pass_oacc_gimple_workers early after pass_graphite in pass_tree_loop, at least if the other tree loop passes are not adjusted. In particular, to enable vectorization which is crucial for GCN offloading, device lowering should happen before pass_vectorize. To bring the loops contained in the offloading functions into the shape expected by the loop vectorizer, we have to make sure that some passes that previously were executed only once before pass_tree_loop are also executed on the offloading functions. 
To ensure that pass_oacc_device_lower still executes when pass_tree_loop does not (because the function contains no loops or because optimization is disabled), we add two further copies of the pass to the pipeline: one in pass_tree_no_loop and one in the new pass_no_loop_optimizations. gcc/ChangeLog: * omp-general.c (oacc_get_fn_dim_size): Return 0 on missing "dims". * omp-oacc-neuter-broadcast.cc: Make pass_omp_oacc_neuter_broadcast clonable. * omp-offload.c (pass_oacc_loop_designation::clone): New member function. (pass_oacc_gimple_workers::clone): Likewise. (pass_oacc_device_lower::clone): Likewise. * passes.c (pass_data_no_loop_optimizations): New pass_data. (class pass_no_loop_optimizations): New pass. (make_pass_no_loop_optimizations): New function. * passes.def: Move pass_oacc_{loop_designation, gimple_workers, device_lower} into tree_loop, and add copies to pass_tree_no_loop and to new pass_no_loop_optimizations. Add copies of passes pass_ccp, pass_ipa_warn, pass_complete_unrolli, pass_backprop, pass_phiprop, pass_fix_loops after the OpenACC passes in pass_tree_loop. * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New member function. (pass_complete_unrolli::clone): Likewise. * tree-ssa-loop.c (pass_fix_loops::clone): Likewise. (pass_tree_loop_init::clone): Likewise. (pass_tree_loop_done::clone): Likewise. * tree-ssa-phiprop.c (pass_phiprop::clone): Likewise. * tree-pass.h (make_pass_oacc_only): New declaration. (make_pass_oacc_functions_only): New declaration. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/goacc/loop-processing-1.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/routine-nohost-1.c: Likewise. * c-c++-common/unroll-1.c: Likewise. * c-c++-common/unroll-4.c: Likewise. * gcc.dg/tree-ssa/backprop-1.c: Likewise. * gcc.dg/tree-ssa/backprop-2.c: Likewise. * gcc.dg/tree-ssa/backprop-3.c: Likewise. * gcc.dg/tree-ssa/backprop-4.c: Likewise. * gcc.dg/tree-ssa/backprop-5.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/cunroll-1.c: Likewise. * gcc.dg/tree-ssa/cunroll-3.c: Likewise. * gcc.dg/tree-ssa/cunroll-9.c: Likewise. * gcc.dg/tree-ssa/ldist-17.c: Likewise. * gcc.dg/tree-ssa/loop-38.c: Likewise. * gcc.dg/tree-ssa/pr21463.c: Likewise. * gcc.dg/tree-ssa/pr45427.c: Likewise. * gcc.dg/tree-ssa/pr61743-1.c: Likewise. * gcc.dg/unroll-2.c: Likewise. * gcc.dg/unroll-3.c: Likewise. * gcc.dg/unroll-4.c: Likewise. * gcc.dg/unroll-5.c: Likewise. * gcc.dg/vect/vect-profile-1.c: Likewise. * gcc.dg/tree-ssa/loopclosedphi.c: Likewise. * gcc.dg/tree-ssa/pr59597.c: Likewise. * gcc.dg/vect/bb-slp-59.c: Likewise. * c-c++-common/goacc/device-lowering-debug-optimization.c: New test. * c-c++-common/goacc/device-lowering-no-loops.c: New test. * c-c++-common/goacc/device-lowering-no-optimization.c: New test.
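The `clone` member functions listed in the ChangeLog above exist because a pass that is scheduled in more than one place needs a separate object per slot. The following is only a toy sketch of that idea, not GCC code: `toy_pass`, `toy_device_lower`, `toy_pipeline` and `demo_dump_names` are hypothetical names invented for illustration, and the numbered names merely mimic the instance suffixes that the adjusted tests expect (for example "oaccloops" becoming "oaccloops1").

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a compiler pass; NOT GCC's real opt_pass class.
struct toy_pass {
  explicit toy_pass(std::string n) : name(std::move(n)) {}
  virtual ~toy_pass() = default;
  // Each pipeline slot needs its own object, so a pass that is scheduled
  // more than once must know how to copy itself.
  virtual std::unique_ptr<toy_pass> clone() const = 0;
  std::string name;
  int instance = 0;  // position among same-named passes, set when scheduled
};

// Hypothetical pass that is scheduled several times.
struct toy_device_lower : toy_pass {
  toy_device_lower() : toy_pass("oaccdevlow") {}
  std::unique_ptr<toy_pass> clone() const override {
    return std::make_unique<toy_device_lower>();
  }
};

// Hypothetical scheduler: every slot receives a fresh clone of the
// prototype, numbered by how often that pass name has been used so far.
struct toy_pipeline {
  std::vector<std::unique_ptr<toy_pass>> slots;
  std::map<std::string, int> uses;

  void add(const toy_pass &prototype) {
    std::unique_ptr<toy_pass> copy = prototype.clone();
    copy->instance = ++uses[prototype.name];
    slots.push_back(std::move(copy));
  }
};

// Schedule the same pass three times and return the numbered names.
std::vector<std::string> demo_dump_names() {
  toy_device_lower prototype;
  toy_pipeline pipeline;
  pipeline.add(prototype);  // e.g. inside the loop pass group
  pipeline.add(prototype);  // e.g. inside the "no loops" group
  pipeline.add(prototype);  // e.g. inside the "no loop optimizations" group

  std::vector<std::string> names;
  for (const auto &p : pipeline.slots)
    names.push_back(p->name + std::to_string(p->instance));
  return names;
}
```

In this sketch the three scheduled copies are distinct objects, so each can carry its own instance number; that is the property the added `clone` functions are meant to provide for the real passes.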
Co-Authored-By: Thomas Schwinge --- gcc/omp-general.c | 8 +- gcc/omp-oacc-neuter-broadcast.cc | 2 + gcc/omp-offload.c | 6 ++ gcc/passes.c | 42 ++++++++ gcc/passes.def | 44 ++++++++- .../goacc/classify-kernels-unparallelized.c | 8 +- .../c-c++-common/goacc/classify-kernels.c | 8 +- .../c-c++-common/goacc/classify-parallel.c | 8 +- .../c-c++-common/goacc/classify-routine.c | 22 ++--- .../device-lowering-debug-optimization.c | 29 ++++++ .../goacc/device-lowering-no-loops.c | 17 ++++ .../goacc/device-lowering-no-optimization.c | 30 ++++++ .../c-c++-common/goacc/routine-nohost-1.c | 6 +- gcc/testsuite/c-c++-common/unroll-1.c | 8 +- gcc/testsuite/c-c++-common/unroll-4.c | 4 +- .../gcc.dg/goacc/loop-processing-1.c | 5 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/loop-38.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr21463.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr45427.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr59597.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c | 2 +- gcc/testsuite/gcc.dg/unroll-2.c | 2 +- gcc/testsuite/gcc.dg/unroll-3.c | 4 +- gcc/testsuite/gcc.dg/unroll-4.c | 4 +- gcc/testsuite/gcc.dg/unroll-5.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-59.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-profile-1.c | 2 +- gcc/tree-pass.h | 2 + gcc/tree-ssa-loop-ivcanon.c | 2 + gcc/tree-ssa-loop.c | 99 +++++++++++++++++++ gcc/tree-ssa-phiprop.c | 2 + .../libgomp.oacc-c-c++-common/pr85486-2.c | 2 +- .../vector-length-128-1.c | 2 +- .../vector-length-128-2.c | 
3 +- .../vector-length-128-3.c | 2 +- .../vector-length-128-4.c | 2 +- .../vector-length-128-5.c | 2 +- .../vector-length-128-6.c | 2 +- .../vector-length-128-7.c | 2 +- 50 files changed, 363 insertions(+), 88 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c -- 2.33.0 diff --git a/gcc/omp-general.c b/gcc/omp-general.c index 445275524134..27a1bc8092c8 100644 --- a/gcc/omp-general.c +++ b/gcc/omp-general.c @@ -2954,7 +2954,13 @@ oacc_get_fn_dim_size (tree fn, int axis) while (axis--) dims = TREE_CHAIN (dims); - int size = TREE_INT_CST_LOW (TREE_VALUE (dims)); + tree v = TREE_VALUE (dims); + /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to + avoid ICE for some OpenACC 'kernels' ("parloops") constructs.
*/ + if (v == NULL_TREE) + return 0; + + int size = TREE_INT_CST_LOW (v); return size; } diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc index e43338f3abf2..94ecdc4d4e9a 100644 --- a/gcc/omp-oacc-neuter-broadcast.cc +++ b/gcc/omp-oacc-neuter-broadcast.cc @@ -1992,6 +1992,8 @@ public: return execute_omp_oacc_neuter_broadcast (); } + opt_pass * clone () { return new pass_omp_oacc_neuter_broadcast (m_ctxt); } + }; // class pass_omp_oacc_neuter_broadcast } // anon namespace diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 833f7ddea58f..e99aaac0e515 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -2444,6 +2444,8 @@ public: return execute_oacc_loop_designation (); } + opt_pass * clone () { return new pass_oacc_loop_designation (m_ctxt); } + }; // class pass_oacc_loop_designation const pass_data pass_data_oacc_device_lower = @@ -2467,12 +2469,16 @@ public: {} /* opt_pass methods: */ + /* TODO If this were gated on something like '!(fun->curr_properties & + PROP_gimple_oaccdevlow)', then we could easily have several instances + in the pass pipeline? */ virtual bool gate (function *) { return flag_openacc; }; virtual unsigned int execute (function *) { return execute_oacc_device_lower (); } + opt_pass * clone () { return new pass_oacc_device_lower (m_ctxt); } }; // class pass_oacc_device_lower diff --git a/gcc/passes.c b/gcc/passes.c index 64550b00b43c..4a1f4a4b5900 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -620,6 +620,48 @@ make_pass_all_optimizations_g (gcc::context *ctxt) namespace { +const pass_data pass_data_no_loop_optimizations = +{ + GIMPLE_PASS, /* type */ + "*no_loop_optimizations", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_OPTIMIZE, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +/* This pass runs if loop optimizations are disabled + at the current optimization level. 
*/ + +class pass_no_loop_optimizations : public gimple_opt_pass +{ +public: + pass_no_loop_optimizations (gcc::context *ctxt) + : gimple_opt_pass (pass_data_no_loop_optimizations, ctxt) + {} + + /* opt_pass methods: */ + virtual bool + gate (function *) + { + return !optimize || optimize_debug; + } + +}; // class pass_no_loop_optimizations + +} // anon namespace + +static gimple_opt_pass * +make_pass_no_loop_optimizations (gcc::context *ctxt) +{ + return new pass_no_loop_optimizations (ctxt); +} + +namespace { + const pass_data pass_data_rest_of_compilation = { RTL_PASS, /* type */ diff --git a/gcc/passes.def b/gcc/passes.def index 0f541454e7f1..5b9bb422d281 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -183,9 +183,6 @@ along with GCC; see the file COPYING3. If not see INSERT_PASSES_AFTER (all_passes) NEXT_PASS (pass_fixup_cfg); NEXT_PASS (pass_lower_eh_dispatch); - NEXT_PASS (pass_oacc_loop_designation); - NEXT_PASS (pass_omp_oacc_neuter_broadcast); - NEXT_PASS (pass_oacc_device_lower); NEXT_PASS (pass_omp_device_lower); NEXT_PASS (pass_omp_target_link); NEXT_PASS (pass_adjust_alignment); @@ -292,6 +289,35 @@ along with GCC; see the file COPYING3. If not see POP_INSERT_PASSES () NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */); NEXT_PASS (pass_expand_omp_ssa); + + /* Interrupt pass_tree_loop for OpenACC device lowering. */ + NEXT_PASS (pass_oacc_only); + PUSH_INSERT_PASSES_WITHIN (pass_oacc_only) + NEXT_PASS (pass_tree_loop_done); + NEXT_PASS (pass_oacc_loop_designation); + NEXT_PASS (pass_omp_oacc_neuter_broadcast); + NEXT_PASS (pass_oacc_device_lower); + + NEXT_PASS (pass_oacc_functions_only); + PUSH_INSERT_PASSES_WITHIN (pass_oacc_functions_only) + /* Repeat some passes on OpenACC functions after device lowering. */ + /* Lower complex instructions arising from OpenACC + reductions. 
*/ + NEXT_PASS (pass_lower_complex); + /* Those passes are necessary here to allow the loop vectorizer to + work on the offloading functions which is important for AMD GCN + offloading. */ + NEXT_PASS (pass_ccp, true /* nonzero_p */); + NEXT_PASS (pass_complete_unrolli); + NEXT_PASS (pass_backprop); + NEXT_PASS (pass_phiprop); + NEXT_PASS (pass_fix_loops); + POP_INSERT_PASSES () + + /* Continue pass_tree_loop after OpenACC device lowering. */ + NEXT_PASS (pass_tree_loop_init); + POP_INSERT_PASSES () + NEXT_PASS (pass_ch_vect); NEXT_PASS (pass_if_conversion); /* pass_vectorize must immediately follow pass_if_conversion. @@ -311,15 +337,21 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_loop_prefetch); /* Run IVOPTs after the last pass that uses data-reference analysis as that doesn't handle TARGET_MEM_REFs. */ + NEXT_PASS (pass_iv_optimize); NEXT_PASS (pass_lim); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () + + /* Pass group that runs when pass_tree_loop is disabled or there are no loops in the function. */ NEXT_PASS (pass_tree_no_loop); PUSH_INSERT_PASSES_WITHIN (pass_tree_no_loop) NEXT_PASS (pass_slp_vectorize); + NEXT_PASS (pass_oacc_loop_designation); + NEXT_PASS (pass_omp_oacc_neuter_broadcast); + NEXT_PASS (pass_oacc_device_lower); POP_INSERT_PASSES () NEXT_PASS (pass_simduid_cleanup); NEXT_PASS (pass_lower_vector_ssa); @@ -397,6 +429,12 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_local_pure_const); NEXT_PASS (pass_modref); POP_INSERT_PASSES () + NEXT_PASS (pass_no_loop_optimizations); + PUSH_INSERT_PASSES_WITHIN (pass_no_loop_optimizations) + NEXT_PASS (pass_oacc_loop_designation); + NEXT_PASS (pass_omp_oacc_neuter_broadcast); + NEXT_PASS (pass_oacc_device_lower); + POP_INSERT_PASSES () NEXT_PASS (pass_tm_init); PUSH_INSERT_PASSES_WITHIN (pass_tm_init) NEXT_PASS (pass_tm_mark); diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c index e391184f403d..338676aa20ff 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c @@ -6,7 +6,7 @@ { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } - { dg-additional-options "-fdump-tree-oaccloops" } */ + { dg-additional-options "-fdump-tree-oaccloops1" } */ /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting aspects of that functionality. */ @@ -39,6 +39,6 @@ void KERNELS () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). 
- { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c index 779e2b0a24db..37e2a57455d1 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c @@ -6,7 +6,7 @@ { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } { dg-additional-options "-fdump-tree-parloops1-all" } - { dg-additional-options "-fdump-tree-oaccloops" } */ + { dg-additional-options "-fdump-tree-oaccloops1" } */ /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting aspects of that functionality. */ @@ -35,6 +35,6 @@ void KERNELS () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). 
- { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c index 9056aa69dad6..82b70ae280cd 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c @@ -4,7 +4,7 @@ /* { dg-additional-options "-O2" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } - { dg-additional-options "-fdump-tree-oaccloops" } */ + { dg-additional-options "-fdump-tree-oaccloops1" } */ /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting aspects of that functionality. */ @@ -27,6 +27,6 @@ void PARALLEL () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). 
- { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } } */ + { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c index f7f0454009bf..cd539370dbbf 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c @@ -4,7 +4,7 @@ /* { dg-additional-options "-O2" } { dg-additional-options "-fopt-info-optimized-omp" } { dg-additional-options "-fdump-tree-ompexp" } - { dg-additional-options "-fdump-tree-oaccloops" } */ + { dg-additional-options "-fdump-tree-oaccloops1" } */ /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting aspects of that functionality. */ @@ -29,14 +29,14 @@ void ROUTINE () /* Check the offloaded function's classification and compute dimensions (will always be 1 x 1 x 1 for non-offloading compilation). - { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' doesn't have 'nohost' clause" 1 "oaccloops" { target c } } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops" { target { c++ && { ! 
offloading_enabled } } } } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops" { target { c++ && offloading_enabled } } } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' not discarded" 1 "oaccloops" { target c } } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' not discarded" 1 "oaccloops" { target { c++ && { ! offloading_enabled } } } } } - { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' not discarded" 1 "oaccloops" { target { c++ && offloading_enabled } } } } + { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' doesn't have 'nohost' clause" 1 "oaccloops1" { target c } } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops1" { target { c++ && { ! offloading_enabled } } } } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' doesn't have 'nohost' clause" 1 "oaccloops1" { target { c++ && offloading_enabled } } } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE' not discarded" 1 "oaccloops1" { target c } } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'void ROUTINE\\(\\)' not discarded" 1 "oaccloops1" { target { c++ && { ! offloading_enabled } } } } } + { dg-final { scan-tree-dump-times "(?n)OpenACC routine 'ROUTINE\\(\\)' not discarded" 1 "oaccloops1" { target { c++ && offloading_enabled } } } } TODO See PR101551 for 'offloading_enabled' differences. 
- { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops" } } - { dg-final { scan-tree-dump-times "(?n)void ROUTINE \\(\\)" 1 "oaccloops" } } */ + { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops1" } } + { dg-final { scan-tree-dump-times "(?n)void ROUTINE \\(\\)" 1 "oaccloops1" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c new file mode 100644 index 000000000000..5bf37cc61580 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c @@ -0,0 +1,29 @@ +/* Verify that OpenACC device lowering executes with "-Og". The actual logic in + the test function does not matter. */ + +/* { dg-additional-options "-Og -fdump-tree-oaccdevlow" } */ + +int main() +{ + int i, j; + int ina[1024], out[1024], acc; + + for (j = 0; j < 32; j++) + for (i = 0; i < 32; i++) + ina[j * 32 + i] = (i == j) ? 
2 : 0; + + acc = 0; +#pragma acc parallel loop copy(acc, ina, out) + for (j = 0; j < 32; j++) + { +#pragma acc loop reduction(+:acc) + for (i = 0; i < 32; i++) + acc += ina[i]; + + out[j] = acc; + } + + return 0; +} + +/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c new file mode 100644 index 000000000000..193b5620de1d --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c @@ -0,0 +1,17 @@ +/* Verify that OpenACC device lowering executes even if there are no OpenACC + loops. */ + +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */ + +int main() +{ + int x; +#pragma acc parallel copy(x) + { + asm volatile(""); + } + + return 0; +} + +/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow2" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c new file mode 100644 index 000000000000..69e2b22d73ba --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c @@ -0,0 +1,30 @@ +/* Verify that OpenACC device lowering executes with "-O0". The actual + logic in the test function does not matter. */ + +/* { dg-additional-options "-O0 -fdump-tree-oaccdevlow" } */ + +int main() +{ + + int i, j; + int ina[1024], out[1024], acc; + + for (j = 0; j < 32; j++) + for (i = 0; i < 32; i++) + ina[j * 32 + i] = (i == j) ? 
2 : 0; + + acc = 0; +#pragma acc parallel loop copy(acc, ina, out) + for (j = 0; j < 32; j++) + { +#pragma acc loop reduction(+:acc) + for (i = 0; i < 32; i++) + acc += ina[i]; + + out[j] = acc; + } + + return 0; +} + +/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c index 59ebb2bc5a9f..4f9a3a333570 100644 --- a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c +++ b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c @@ -13,7 +13,7 @@ int THREE(void) #pragma acc routine nohost extern int THREE(void); -/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*THREE[^']*' has 'nohost' clause\.$} 1 oaccloops } } */ +/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*THREE[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */ #pragma acc routine nohost @@ -30,7 +30,7 @@ extern void NOTHING(void); #pragma acc routine (NOTHING) nohost -/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*NOTHING[^']*' has 'nohost' clause\.$} 1 oaccloops } } */ +/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*NOTHING[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */ extern float ADD(float, float); @@ -47,4 +47,4 @@ extern float ADD(float, float); #pragma acc routine (ADD) nohost -/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*ADD[^']*' has 'nohost' clause\.$} 1 oaccloops } } */ +/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine '[^']*ADD[^']*' has 'nohost' clause\.$} 1 "oaccloops*" } } */ diff --git a/gcc/testsuite/c-c++-common/unroll-1.c b/gcc/testsuite/c-c++-common/unroll-1.c index fe7f4f31912c..8e57a44be231 100644 --- a/gcc/testsuite/c-c++-common/unroll-1.c +++ b/gcc/testsuite/c-c++-common/unroll-1.c @@ -1,5 +1,5 @@ -/* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */ +/* { dg-do compile } */ +/* { dg-options "-O2
-fdump-tree-cunrolli1-details -fdump-rtl-loop2_unroll-details" } */ extern void bar (int); @@ -10,12 +10,12 @@ void test (void) #pragma GCC unroll 8 for (unsigned long i = 1; i <= 8; ++i) bar(i); - /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli" } } */ + /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli1" } } */ #pragma GCC unroll 8 for (unsigned long i = 1; i <= 7; ++i) bar(i); - /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli" } } */ + /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli1" } } */ #pragma GCC unroll 8 for (unsigned long i = 1; i <= 15; ++i) diff --git a/gcc/testsuite/c-c++-common/unroll-4.c b/gcc/testsuite/c-c++-common/unroll-4.c index 1c1988174ba7..fe7f9e10626e 100644 --- a/gcc/testsuite/c-c++-common/unroll-4.c +++ b/gcc/testsuite/c-c++-common/unroll-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */ +/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli1-details" } */ extern void bar (int); @@ -17,6 +17,6 @@ void test (void) for (unsigned long i = 1; i <= j; ++i) bar(i); - /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */ + /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli1" } } */ /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ } diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c index 78b9aed89beb..c191125b7951 100644 --- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c +++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c @@ -1,5 +1,4 @@ -/* Make sure that OpenACC loop processing happens. 
*/ -/* { dg-additional-options "-O2 -fdump-tree-oaccloops" } */ +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow*" } */ extern int place (); @@ -15,4 +14,4 @@ void vector_1 (int *ary, int size) } } -/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops" } } */ +/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 
1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops*" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c index 302fdb570b63..b6b11bf30afa 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -g -fdump-tree-backprop-details" } */ +/* { dg-options "-O -g -fdump-tree-backprop1-details" } */ /* Test a simple case of non-looping code in which both uses ignore the sign and both definitions are sign ops. 
*/ @@ -18,5 +18,5 @@ TEST_FUNCTION (float, f) TEST_FUNCTION (double, ) TEST_FUNCTION (long double, l) -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */ -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR = 2; } virtual unsigned int execute (function *); + opt_pass * clone () { return new pass_complete_unrolli (m_ctxt); } }; // class pass_complete_unrolli diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 1bbf2f1fb2c8..8d5572033f7b 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -70,6 +70,8 @@ public: virtual bool gate (function *) { return flag_tree_loop_optimize; } virtual unsigned int execute (function *fn); + + opt_pass * clone () { return new pass_fix_loops (m_ctxt); } }; // class pass_fix_loops unsigned int @@ -136,6 +138,8 @@ public: /* opt_pass methods: */ virtual bool gate (function *fn) { return gate_loop (fn); } + + opt_pass * clone () { return new pass_tree_loop (m_ctxt); } }; // class pass_tree_loop } // anon namespace @@ -200,6 +204,97 @@ make_pass_oacc_kernels (gcc::context *ctxt) { return new pass_oacc_kernels (ctxt); } +/* A superpass that runs its subpasses on OpenACC functions only. 
*/ + +namespace { + +const pass_data pass_data_oacc_functions_only = +{ + GIMPLE_PASS, /* type */ + "*oacc_fns_only", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_LOOP, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_oacc_functions_only: public gimple_opt_pass +{ +public: + pass_oacc_functions_only (gcc::context *ctxt) + : gimple_opt_pass (pass_data_oacc_functions_only, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *fn) { + if (!flag_openacc) + return false; + + if (!oacc_get_fn_attrib (fn->decl)) + return false; + + return true; + } + +}; // class pass_oacc_functions_only + +} // anon namespace + +gimple_opt_pass * +make_pass_oacc_functions_only (gcc::context *ctxt) +{ + return new pass_oacc_functions_only (ctxt); +} + +/* A superpass that runs its subpasses only if compiling for OpenACC. */ + +namespace { + +const pass_data pass_data_oacc_only = +{ + GIMPLE_PASS, /* type */ + "*oacc_only", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_LOOP, /* tv_id */ + 0, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_oacc_only: public gimple_opt_pass +{ +public: + pass_oacc_only (gcc::context *ctxt) + : gimple_opt_pass (pass_data_oacc_only, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *fn) { + if (!flag_openacc) + return false; + + if (!oacc_get_fn_attrib (fn->decl)) + return false; + + return true; + } + +}; // class pass_oacc_only + +} // anon namespace + +gimple_opt_pass * +make_pass_oacc_only (gcc::context *ctxt) +{ + return new pass_oacc_only (ctxt); +} + + /* The ipa oacc superpass. 
*/ @@ -343,6 +438,8 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *); + opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); } + }; // class pass_tree_loop_init unsigned int @@ -556,6 +653,8 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *) { return tree_ssa_loop_done (); } + opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); } + }; // class pass_tree_loop_done } // anon namespace diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c index 78b0461c839d..f138f766286b 100644 --- a/gcc/tree-ssa-phiprop.c +++ b/gcc/tree-ssa-phiprop.c @@ -479,6 +479,8 @@ public: virtual bool gate (function *) { return flag_tree_phiprop; } virtual unsigned int execute (function *); + opt_pass * clone () { return new pass_phiprop (m_ctxt); } + }; // class pass_phiprop unsigned int diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c index 17cc9bd663e5..4438f6c24fed 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c @@ -7,5 +7,5 @@ #include "pr85486.c" -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops1" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c index 5158bb5eb89e..c0a29c7556f9 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c @@ -34,5 +34,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops" } } */ +/* { dg-final { 
scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c index a3e44ebfbcb4..326f6d8dc31a 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c @@ -35,5 +35,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c index a85400d09c50..efc9297acdee 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c @@ -38,5 +38,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c index 24c078f377c3..1c83ec0cc18d 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c @@ -36,5 +36,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c index fcca9f593bb2..f2391dca7272 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c @@ -37,5 +37,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c index 0807eab7eee4..8ddaaf592cc1 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c @@ -37,5 +37,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, 
vectors=128" } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c index 4a8c1bf549e9..97abbfc20986 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c @@ -36,5 +36,5 @@ main (void) return 0; } -/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops" } } */ +/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccloops*" } } */ /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */ From patchwork Wed Dec 15 15:54:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48956 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DF6D4385E00B for ; Wed, 15 Dec 2021 16:04:27 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id E9E843858023 for ; Wed, 15 Dec 2021 15:55:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E9E843858023 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: DKEYJaF2socQpg6RxLMdzrvijhtDz5jfFu+s+m2Lpf0US1xCi2ShPZyIj8uH5iNq4XZWZZ3br2 /f6TrKTi1Lk+1STKMaRgbD1mqNIVuM0wAc/E0CAHEZp89L9q8W/jaDu7TO6C3yfjQckFeAf2Di QGeuy322Tt66HEnC0M2VWGfzUaZ4+L0GeE8S7FA4T9de71jPmyw5p7yF4aVFhNgN09VT8E11ds XcHI09ee7SzTk7Ak4VouYSqtzV596R9/rPJh7p5aYwwAJyyVO0PKBKeq+XiIYY3z3/IjiMXp/e bvlBeCEQ00/tR1lyIfBCoHCL 
X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736561" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:55:54 -0800 IronPort-SDR: JyDgyaPVaFpKhZSIJMWtvdCgASHGlnqa5UMcy5pvQ6crTn+rbLmroV/vOxMebapdilXfJef37q PrQWhBPr33+3NHusixLSULu8vQZS7fbkbBKuV041ZqNt/kCTuzcjHdk8iZVrmjkHZxxDsfq2dn xV2dte6g/K4MeyP4yVynuv/kmCsZ3KfvhBuNwPLx72Bdh1VLbsy9WeDN5i5E5+2TEI8sbcq6KF uCl27d4/QTAycPQWNu1b2BoBdhOv7x1opjvcshaEHxlJ4LGcJSDFgliQe6Zq4rYYQMBszSgyal tQo= From: Frederik Harwath To: Subject: [PATCH 15/40] graphite: Extend SCoP detection dump output Date: Wed, 15 Dec 2021 16:54:22 +0100 Message-ID: <20211215155447.19379-16-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Extend the dump output to make it easier (for GCC developers) to understand why Graphite refuses to include a loop in a SCoP. ChangeLog: * graphite-scop-detection.c (scop_detection::can_represent_loop): Output reason for failure to dump file. (scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise. (scop_detection::stmt_has_simple_data_refs_p): Likewise. (scop_detection::stmt_simple_for_scop_p): Likewise. (print_sese_loop_numbers): New function. (scop_detection::build_scop_depth): Use it to print the loops of a discarded SCoP. (scop_detection::add_scop): Print the loops contained in the added SCoP. --- gcc/graphite-scop-detection.c | 188 +++++++++++++++++++++++++++++----- 1 file changed, 165 insertions(+), 23 deletions(-) -- 2.33.0 diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 3e729b159b09..46c470210d05 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -69,12 +69,27 @@ public: fprintf (output.dump_file, "%d", i); return output; } + friend debug_printer & operator<< (debug_printer &output, const char *s) { fprintf (output.dump_file, "%s", s); return output; } + + friend debug_printer & + operator<< (debug_printer &output, gimple* stmt) + { + print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS); + return output; + } + + friend debug_printer & + operator<< (debug_printer &output, tree t) + { + print_generic_expr (output.dump_file, t, TDF_SLIM); + return output; + } } dp; #define DEBUG_PRINT(args) do \ @@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const return combined; } +/* Print the loop numbers of the loops contained + in SESE to FILE. */ + +static void +print_sese_loop_numbers (FILE *file, sese_l sese) +{ + loop_p loop; + bool printed = false; + FOR_EACH_LOOP (loop, 0) + if (loop_in_sese_p (loop, sese)) + { + fprintf (file, "%d, ", loop->num); + printed = true; + } + if (printed) + fprintf (file, "\b\b"); +} + /* Build scop outer->inner if possible.
*/ void @@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop) if (! next || harmful_loop_in_region (next)) { - if (s) - add_scop (s); + if (next) + DEBUG_PRINT ( + dp << "[scop-detection] Discarding SCoP on loops "; + print_sese_loop_numbers (dump_file, next); + dp << " because of harmful loops\n";); + if (s) + add_scop (s); build_scop_depth (loop); s = invalid_sese; } @@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop) || !single_pred_p (loop->latch) || exit->src != single_pred (loop->latch) || !empty_block_p (loop->latch)) - return false; + { + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n"); + return false; + } + + bool edge_irreducible + = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP; + if (edge_irreducible) + { + DEBUG_PRINT ( + dp << "[can_represent_loop-fail] Loop is not a natural loop.\n"); + return false; + } + + bool niter_is_unconditional = number_of_iterations_exit (loop, + single_exit (loop), + &niter_desc, false); - return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP) - && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false) - && niter_desc.control.no_overflow - && (niter = number_of_latch_executions (loop)) - && !chrec_contains_undetermined (niter) - && graphite_can_represent_expr (scop, loop, niter); + if (!niter_is_unconditional) + { + DEBUG_PRINT ( + dp << "[can_represent_loop-fail] Loop niter not unconditional.\n" + << "Condition: " << niter_desc.assumptions << "\n"); + return false; + } + + niter = number_of_latch_executions (loop); + if (!niter) + { + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n"); + return false; + } + if (!niter_desc.control.no_overflow) + { + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n"); + return false; + } + + bool undetermined_coefficients = chrec_contains_undetermined (niter); + if (undetermined_coefficients) + { + DEBUG_PRINT (dp << 
"[can_represent_loop-fail] " + << "Loop niter chrec contains undetermined coefficients.\n"); + return false; + } + + bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter); + if (!can_represent_expr) + { + DEBUG_PRINT (dp << "[can_represent_loop-fail] " + << "Loop niter expression cannot be represented: " + << niter << "\n"); + return false; + } + + return true; } /* Return true when BEGIN is the preheader edge of a loop with a single exit @@ -640,6 +726,16 @@ scop_detection::add_scop (sese_l s) scops.safe_push (s); DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s)); + + if (dump_file && dump_flags & TDF_DETAILS) + { + loop_p loop; + fprintf (dump_file, "Loops in SCoP: "); + FOR_EACH_LOOP (loop, 0) + if (loop_in_sese_p (loop, s)) + fprintf (dump_file, "%d ", loop->num); + fprintf (dump_file, "\n"); + } } /* Return true when a statement in SCOP cannot be represented by Graphite. */ @@ -665,7 +761,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const /* The basic block should not be part of an irreducible loop. */ if (bb->flags & BB_IRREDUCIBLE_LOOP) - return true; + { + DEBUG_PRINT (dp << "[scop-detection-fail] Found bb in irreducible " + "loop.\n"); + return true; + } /* Check for unstructured control flow: CFG not generated by structured if-then-else. */ @@ -676,7 +776,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const FOR_EACH_EDGE (e, ei, bb->succs) if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest) && !dominated_by_p (CDI_DOMINATORS, e->dest, bb)) - return true; + { + DEBUG_PRINT (dp << "[scop-detection-fail] Found unstructured " + "control flow.\n"); + return true; + } } /* Collect all loops in the current region. 
*/ @@ -688,7 +792,10 @@ scop_detection::harmful_loop_in_region (sese_l scop) const for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb)) - return true; + { + DEBUG_PRINT (dp << "[scop-detection-fail] Found harmful statement.\n"); + return true; + } for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb); dom; @@ -731,9 +838,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const && ! loop_nest_has_data_refs (loop)) { DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num - << "does not have any data reference.\n"); + << " does not have any data reference.\n"); return true; } + + DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n"); } return false; @@ -922,7 +1031,21 @@ scop_detection::graphite_can_represent_expr (sese_l scop, loop_p loop, tree expr) { tree scev = cached_scalar_evolution_in_region (scop, loop, expr); - return graphite_can_represent_scev (scop, scev); + bool can_represent = graphite_can_represent_scev (scop, scev); + + if (!can_represent) + { + if (dump_file) + { + fprintf (dump_file, "[graphite_can_represent_expr] Cannot represent " + "scev \""); + print_generic_expr (dump_file, scev, TDF_SLIM); + fprintf (dump_file, "\" of expression "); + print_generic_expr (dump_file, expr, TDF_SLIM); + fprintf (dump_file, " in loop %d\n", loop->num); + } + } + return can_represent; } /* Return true if the data references of STMT can be represented by Graphite. @@ -938,7 +1061,11 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt) auto_vec drs; if (! 
graphite_find_data_references_in_stmt (nest, loop, stmt, &drs)) - return false; + { + DEBUG_PRINT (dp << + "[stmt_has_simple_data_refs_p] Unanalyzable statement.\n"); + return false; + } int j; data_reference_p dr; @@ -946,7 +1073,12 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt) { for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i) if (! graphite_can_represent_scev (scop, DR_ACCESS_FN (dr, i))) - return false; + { + DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] Cannot " + "represent access function SCEV: " + << DR_ACCESS_FN (dr, i) << "\n"); + return false; + } } return true; @@ -1027,14 +1159,24 @@ scop_detection::stmt_simple_for_scop_p (sese_l scop, gimple *stmt, for (unsigned i = 0; i < 2; ++i) { tree op = gimple_op (stmt, i); - if (!graphite_can_represent_expr (scop, loop, op) - /* We can only constrain on integer type. */ - || ! INTEGRAL_TYPE_P (TREE_TYPE (op))) + if (!graphite_can_represent_expr (scop, loop, op)) + { + DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt, + "[scop-detection-fail] " + "Graphite cannot represent cond " + "stmt operator expression.\n")); + DEBUG_PRINT (dp << op << "\n"); + + return false; + } + + if (! INTEGRAL_TYPE_P (TREE_TYPE (op))) { - DEBUG_PRINT (dp << "[scop-detection-fail] " - << "Graphite cannot represent stmt:\n"; - print_gimple_stmt (dump_file, stmt, 0, - TDF_VOPS | TDF_MEMSYMS)); + DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt, + "[scop-detection-fail] " + "Graphite cannot represent cond " + "statement operator. 
" + "Type must be integral.\n")); return false; } } From patchwork Wed Dec 15 15:54:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48958 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5A925385C41D for ; Wed, 15 Dec 2021 16:05:36 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id A4D27385843B for ; Wed, 15 Dec 2021 15:56:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A4D27385843B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: QyCjiAqyr7k+y0pFqZud91Ko6yAJPvNs8PaFYt6OyjLC1m+wEyAUrtKCRvSKyTsOImU7KapKHZ ZKeD+WTL0+gcOdcYQGyckJqDShNFRRei3ORCK9AUyxrzPA8GNEbH5Da1yrKu8PQ0s9zJv+kyB3 TGrQXMJ9Ek/n84k0GBiqmwKW+BMo5mNqzh7lrBeRq5Z6kvSuKw7PJTYuAUAIGt9tJEMjN4+Nu3 1Q3yb7Jg0NnNrNuu2H26KeQHFdKAYAr7grgOx4Yqc8D6tVmYwP7di6j1KKMQNXmilazJ28vhy6 43AkNAcevHBFlSPj7eY6Kcm3 X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738376" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:03 -0800 IronPort-SDR: eBhrp8VIo52c/ud8hR80ZA+oZFd7AXnKzU6aNBfJ7IqCA92JJIIg6kCw6PbFwDBQP5LhbyqJvt sb8VhGbTRkqMm7+DXaLdkHUb+uRJ43x/PC2yuUJvc0VmZ/9yKjb5rJXAJW8mGo06sS40BNOqjm YDCpw5H02qfsUZ2DvJd6RJxFF30wob0nx9I/gW6ZO3wSwjyU2bNYw4fGvLm5e45s4Eq4P5eqgp vRNdqi1sL5O+oFmYBryCgcBaKj+nV4LWKK6bp1lVfry0iL1yM6cWDGSSL1jMYsTH68WKnGuXTz m2w= From: Frederik Harwath To: Subject: [PATCH 16/40] graphite: Rename isl_id_for_ssa_name Date: Wed, 15 Dec 2021 16:54:23 +0100 Message-ID: 
<20211215155447.19379-17-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The SSA names for which this function gets used are always SCoP parameters and hence "isl_id_for_parameter" is a better name. It also explains the prefix "P_" for those names in the ISL representation. gcc/ChangeLog: * graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ... (isl_id_for_parameter): ... this new function name. (build_scop_context): Adjust function use. 
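For illustration only, the naming scheme described above can be reproduced outside of GCC. The following is a minimal standalone sketch, not code from this patch: parameter_name is a hypothetical helper, and the plain unsigned argument stands in for SSA_NAME_VERSION (p). A 14-byte buffer, as used in the patch, is enough for "P_", up to ten decimal digits and the terminating NUL.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GCC): format the "P_<version>" name
   that the isl representation uses for a SCoP parameter.  VERSION stands
   in for SSA_NAME_VERSION (p).  Returns what snprintf returns, i.e. the
   length the full name would have.  */
static int
parameter_name (char *buf, size_t size, unsigned version)
{
  return snprintf (buf, size, "P_%u", version);
}
```

With a 14-byte buffer and version 42, the buffer holds the string P_42.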
--- gcc/graphite-sese-to-poly.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 99ea0327b1a7..204d382ed4cc 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space) return isl_pw_aff_mul (lhs, rhs); } -/* Return an isl identifier from the name of the ssa_name E. */ +/* Return an isl identifier for the parameter P. */ static isl_id * -isl_id_for_ssa_name (scop_p s, tree e) +isl_id_for_parameter (scop_p s, tree p) { - char name1[14]; - snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e)); - return isl_id_alloc (s->isl_context, name1, e); + gcc_checking_assert (TREE_CODE (p) == SSA_NAME); + char name[14]; + snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p)); + return isl_id_alloc (s->isl_context, name, p); } /* Return an isl identifier for the data reference DR. Data references and @@ -898,15 +899,15 @@ build_scop_context (scop_p scop) isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0); unsigned i; - tree e; - FOR_EACH_VEC_ELT (region->params, i, e) + tree p; + FOR_EACH_VEC_ELT (region->params, i, p) space = isl_space_set_dim_id (space, isl_dim_param, i, - isl_id_for_ssa_name (scop, e)); + isl_id_for_parameter (scop, p)); scop->param_context = isl_set_universe (space); - FOR_EACH_VEC_ELT (region->params, i, e) - add_param_constraints (scop, i, e); + FOR_EACH_VEC_ELT (region->params, i, p) + add_param_constraints (scop, i, p); } /* Return true when loop A is nested in loop B. 
*/ From patchwork Wed Dec 15 15:54:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48959 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F38E3385C017 for ; Wed, 15 Dec 2021 16:06:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 5C4E03858027 for ; Wed, 15 Dec 2021 15:56:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5C4E03858027 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 7THf6bMF6i7Vfq21n3AXuhP1AYGMi+XEDYY3g+4gNqTvLnJBxa5cGupyo5IurbZt1qlHcBWJGg 11vlR0Qhj5zsAmuCme5+ewq+PcKZ7TEGu2t795QNF2K5y7uMcR6L3xy8RSzHH9JSFqVxmw89tF ItofJrzhwlIBMSSAJ69y2o3bnw5ep7imRw9nYIfIYbaTYgB4IvYBk+UZZ3LOB/yGawP0lDJVVM KB8r8bcqYrZSJtjXPa66h9PFzWOqXaaAke3Mdv08+awFpPuhCNvI38BYah33kpRnxqUsMl9fIM wDRbCmrcELM5kkdmuSTuKY5p X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738378" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:04 -0800 IronPort-SDR: OmQ+u66p7yPHBoVyd115JGlAtYd9tL2+7Oh2Kvgi+RPmfXegAJxoAN7AiBfBmHnjQWBT5JOVJ7 FRBuRJDf4dn6A5Qp6rxw71sN7VontmTd6o7QtSH9R7e4DT+raCuWf4JarFEPuPxp9WY819UyvA AYFF6ftY8YOZt9wJRIPVFxNAzuZf9gQ3MG50a3JLp/q4eHf0Nb5zfc8M29HsdtWCblY9fjOCqw YweX9a573WWEvkKqYfdSwbFMrGrRV2hV6ewORZ/HHtA+GDZMdikj9f/7Humw1UM7dVFlcBrYTZ gMc= From: Frederik Harwath To: Subject: [PATCH 17/40] graphite: Fix minor mistakes in comments Date: Wed, 15 Dec 2021 16:54:24 +0100 Message-ID: <20211215155447.19379-18-frederik@codesourcery.com> 
X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and a reference to a variable which does not exist. * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo in comment. 
--- gcc/graphite-isl-ast-to-gimple.c | 2 +- gcc/graphite-sese-to-poly.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 1ad68a1d4735..0712d85b67a6 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq) basic_block begin_bb = get_entry_bb (codegen_region); /* Inserting the gimple statements in a vector because gimple_seq behave - in strage ways when inserting the stmts from it into different basic + in strange ways when inserting the stmts from it into different basic blocks one at a time. */ auto_vec stmts; for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi); diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 204d382ed4cc..33d6a98327b8 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -649,14 +649,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind, isl_map *acc, isl_set *subscript_sizes) { scop_p scop = PBB_SCOP (pbb); - /* Each scalar variables has a unique alias set number starting from + /* Each scalar variable has a unique alias set number starting from the maximum alias set assigned to a dr. */ int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var); subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0, alias_set); /* Add a constrain to the ACCESSES polyhedron for the alias set of - data reference DR. 
*/ + the reference */ isl_constraint *c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc))); c = isl_constraint_set_constant_si (c, -alias_set); From patchwork Wed Dec 15 15:54:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48960 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4DDAB385AC23 for ; Wed, 15 Dec 2021 16:06:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id F0E6E3857C4B for ; Wed, 15 Dec 2021 15:56:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F0E6E3857C4B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: lVBWamfN3EvW7+QcoaIf+hQJ0ZXRcyZj2FKaf8EBOtpalH4BzKWCCTHgYN+HmEE+Qo3j8MURWE ZUXuZv1yh4psSVLnopW5WlQfT0RvDPjHweq+yMyIW3Ze/Xx+yf+FvQzKV+/WgV3h+8393tuyRe xHi9VKjjuwq9s/mIGdwmV1yE/rdynMeMxb/EU8KTNwOrqdPHwDVSlAYR4iMvSaimwlbOjaYd9S WrutAX6+pRaJ2zOoXYEIrbrACw8d/HX5n2OPrEIrq0P7aM32NPSvXPC+eE3sH8fuvrIf1/YHwX Jvobovx8z8ovghJxeFO+1QU/ X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738380" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:08 -0800 IronPort-SDR: vnJ/0lxN4JJ9eSBwgnj9TxuLahkUPydQb+VbW/R/vUNQkywnziTe6bjqjtN+0rNXHRqo3mz7R1 y/8aGJYEgBfeAUywM7h98KJ9NBRq3Pa326j/e7qrWA28AryqJzSLW+vtynrLnwFttnFz9uWJSv 5kIEJoiZIU1+hh/pqZHLrXYQy7+JAoXDgrdgajlNIz4h9x/3fcTAGTp33hA9N3BKcgpV4r7WUc 7VTo/3tCsGdKhNZFqIblp5zP+4muM127xjJgm55iah4IWTVPGI1Nc7dFDL/7QKfiRP1XJayPey c2s= From: Frederik Harwath To: Subject: 
[PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c Date: Wed, 15 Dec 2021 16:54:25 +0100 Message-ID: <20211215155447.19379-19-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Move this function from tree-loop-distribution.c to tree-data-ref.c and make it non-static to enable its use from other parts of GCC. gcc/ChangeLog: * tree-loop-distribution.c (data_ref_segment_size): Remove function. (latch_dominated_by_data_ref): Likewise. (compute_alias_check_pairs): Likewise. * tree-data-ref.c (data_ref_segment_size): New function, copied from tree-loop-distribution.c (compute_alias_check_pairs): Likewise. (latch_dominated_by_data_ref): Likewise. * tree-data-ref.h (compute_alias_check_pairs): New declaration. 
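As a reading aid, the segment length arithmetic of the moved functions can be modelled with plain integers. This is a simplified sketch under stated assumptions, not the GCC code: segment_size and segment_for_ref are hypothetical names, steps are assumed to be positive byte counts, and the loop is assumed to execute its latch at least once. It mirrors what the diff below does: data_ref_segment_size computes (niters - 1) * step, and compute_alias_check_pairs passes the latch execution count plus one when the reference's statement dominates the source block of the loop's single exit.

```c
#include <stddef.h>

/* Models data_ref_segment_size: the number of bytes between the first
   and the last access of a reference with stride STEP over NITERS
   iterations.  Assumes NITERS >= 1.  */
static size_t
segment_size (size_t step, size_t niters)
{
  return (niters - 1) * step;
}

/* Hypothetical helper modelling the choice made in
   compute_alias_check_pairs: a reference whose statement dominates the
   exit source block is executed once more than the latch.  */
static size_t
segment_for_ref (size_t step, size_t latch_executions, int dominates_exit_src)
{
  size_t niters = dominates_exit_src ? latch_executions + 1
				     : latch_executions;
  return segment_size (step, niters);
}
```

For a 4-byte stride and 9 latch executions this gives a segment of 36 bytes for a dominating reference and 32 bytes otherwise.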
--- gcc/tree-data-ref.c | 87 ++++++++++++++++++++++++++++++++++++ gcc/tree-data-ref.h | 3 ++ gcc/tree-loop-distribution.c | 87 ------------------------------------ 3 files changed, 90 insertions(+), 87 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 46f4ffedb483..6a3659dc490c 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -2636,6 +2636,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr, dump_printf (MSG_NOTE, "using an address-based overlap test\n"); } +/* Compute and return an expression whose value is the segment length which + will be accessed by DR in NITERS iterations. */ + +static tree +data_ref_segment_size (struct data_reference *dr, tree niters) +{ + niters = size_binop (MINUS_EXPR, + fold_convert (sizetype, niters), + size_one_node); + return size_binop (MULT_EXPR, + fold_convert (sizetype, DR_STEP (dr)), + fold_convert (sizetype, niters)); +} + +/* Return true if LOOP's latch is dominated by statement for data reference + DR. */ + +static inline bool +latch_dominated_by_data_ref (class loop *loop, data_reference *dr) +{ + return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, + gimple_bb (DR_STMT (dr))); +} + +/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's + data dependence relations ALIAS_DDRS. 
*/ + +void +compute_alias_check_pairs (class loop *loop, vec *alias_ddrs, + vec *comp_alias_pairs) +{ + unsigned int i; + unsigned HOST_WIDE_INT factor = 1; + tree niters_plus_one, niters = number_of_latch_executions (loop); + + gcc_assert (niters != NULL_TREE && niters != chrec_dont_know); + niters = fold_convert (sizetype, niters); + niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node); + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Creating alias check pairs:\n"); + + /* Iterate all data dependence relations and compute alias check pairs. */ + for (i = 0; i < alias_ddrs->length (); i++) + { + ddr_p ddr = (*alias_ddrs)[i]; + struct data_reference *dr_a = DDR_A (ddr); + struct data_reference *dr_b = DDR_B (ddr); + tree seg_length_a, seg_length_b; + + if (latch_dominated_by_data_ref (loop, dr_a)) + seg_length_a = data_ref_segment_size (dr_a, niters_plus_one); + else + seg_length_a = data_ref_segment_size (dr_a, niters); + + if (latch_dominated_by_data_ref (loop, dr_b)) + seg_length_b = data_ref_segment_size (dr_b, niters_plus_one); + else + seg_length_b = data_ref_segment_size (dr_b, niters); + + unsigned HOST_WIDE_INT access_size_a + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a)))); + unsigned HOST_WIDE_INT access_size_b + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b)))); + unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a))); + unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b))); + + dr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a), + dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b), + /* ??? Would WELL_ORDERED be safe? */ + dr_with_seg_len_pair_t::REORDERED); + + comp_alias_pairs->safe_push (dr_with_seg_len_pair); + } + + if (tree_fits_uhwi_p (niters)) + factor = tree_to_uhwi (niters); + + /* Prune alias check pairs. 
*/ + prune_runtime_alias_test_list (comp_alias_pairs, factor); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Improved number of alias checks from %d to %d\n", + alias_ddrs->length (), comp_alias_pairs->length ()); +} + /* Create a conditional expression that represents the run-time checks for overlapping of address ranges represented by a list of data references pairs passed in ALIAS_PAIRS. Data references are in LOOP. The returned diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 74f579c9f3f2..4929b059ddea 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -582,6 +582,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool); extern int data_ref_compare_tree (tree, tree); extern void prune_runtime_alias_test_list (vec *, poly_uint64); + +extern void compute_alias_check_pairs (class loop *, vec *, + vec *); extern void create_runtime_alias_checks (class loop *, const vec *, tree*); diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 583c01a42d86..ed6f2c2974f1 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -2582,93 +2582,6 @@ loop_distribution::break_alias_scc_partitions (struct graph *rdg, } } -/* Compute and return an expression whose value is the segment length which - will be accessed by DR in NITERS iterations. */ - -static tree -data_ref_segment_size (struct data_reference *dr, tree niters) -{ - niters = size_binop (MINUS_EXPR, - fold_convert (sizetype, niters), - size_one_node); - return size_binop (MULT_EXPR, - fold_convert (sizetype, DR_STEP (dr)), - fold_convert (sizetype, niters)); -} - -/* Return true if LOOP's latch is dominated by statement for data reference - DR. 
*/ - -static inline bool -latch_dominated_by_data_ref (class loop *loop, data_reference *dr) -{ - return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, - gimple_bb (DR_STMT (dr))); -} - -/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's - data dependence relations ALIAS_DDRS. */ - -static void -compute_alias_check_pairs (class loop *loop, vec *alias_ddrs, - vec *comp_alias_pairs) -{ - unsigned int i; - unsigned HOST_WIDE_INT factor = 1; - tree niters_plus_one, niters = number_of_latch_executions (loop); - - gcc_assert (niters != NULL_TREE && niters != chrec_dont_know); - niters = fold_convert (sizetype, niters); - niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node); - - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Creating alias check pairs:\n"); - - /* Iterate all data dependence relations and compute alias check pairs. */ - for (i = 0; i < alias_ddrs->length (); i++) - { - ddr_p ddr = (*alias_ddrs)[i]; - struct data_reference *dr_a = DDR_A (ddr); - struct data_reference *dr_b = DDR_B (ddr); - tree seg_length_a, seg_length_b; - - if (latch_dominated_by_data_ref (loop, dr_a)) - seg_length_a = data_ref_segment_size (dr_a, niters_plus_one); - else - seg_length_a = data_ref_segment_size (dr_a, niters); - - if (latch_dominated_by_data_ref (loop, dr_b)) - seg_length_b = data_ref_segment_size (dr_b, niters_plus_one); - else - seg_length_b = data_ref_segment_size (dr_b, niters); - - unsigned HOST_WIDE_INT access_size_a - = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a)))); - unsigned HOST_WIDE_INT access_size_b - = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b)))); - unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a))); - unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b))); - - dr_with_seg_len_pair_t dr_with_seg_len_pair - (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a), - dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b), - /* ??? 
Would WELL_ORDERED be safe? */ - dr_with_seg_len_pair_t::REORDERED); - - comp_alias_pairs->safe_push (dr_with_seg_len_pair); - } - - if (tree_fits_uhwi_p (niters)) - factor = tree_to_uhwi (niters); - - /* Prune alias check pairs. */ - prune_runtime_alias_test_list (comp_alias_pairs, factor); - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, - "Improved number of alias checks from %d to %d\n", - alias_ddrs->length (), comp_alias_pairs->length ()); -} - /* Given data dependence relations in ALIAS_DDRS, generate runtime alias checks and version LOOP under condition of these runtime alias checks. */ From patchwork Wed Dec 15 15:54:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48961 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1B536385C8B0 for ; Wed, 15 Dec 2021 16:07:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 9D2DD3857C42 for ; Wed, 15 Dec 2021 15:56:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9D2DD3857C42 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 8LoNeDCKtb3JqtxNkW0Cv7tevLA52PHxpq7+AXYvGC4teZ357V1aBbVs25yYRTWxgSBpClHO8C VKR5kMUfLFY4tN2HjwTLtbRPY9JL/XO6OUiinyoCde4rbRY0PR0/7eCgWdNiDqiaSFAwPsJKNX kk4pw1FXLDXGCpPSZQJu6o2o1ZGq9a4RJTas3WGVu1U5J6iNTWivPcHGJgQD4qHgXJJPKSE31R N03V7spGctrvx4PqqZU1tOF4ZJwnhMDKNrWVSl1SdKbH/1JusEKjs03jgxbW5tcSO90wi09OUN uIZBX0OobRamgNdknYGv2Ob9 X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738381" Received: from 
orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:09 -0800 IronPort-SDR: 8FHEtOr663iljXIDOXAzcrmHOsz9eEvkx9Bk+E+BuVKmlLS2ej0tMmTceKANkBoTUk+GDMunm2 PBg3XjBOldcWX5dMIRtvvaMk/bJx7zt85UHqZJW60wi7/jmyDw09RBjdIvexHlhZQV66XgsrLH DDXEZ24RAXzaON7FpDmIiPXs2FuxkqonGgieSRMaSaM29uT8k/puzY/XKlMspT3ZVGETYkb+bb /n2NjSd9ur6q9lKCTtvHzczIAt8zQVhfXcI/P6pdkEQlrOUbhYDR1ezGSWzLXAnk6PxOFvjAz+ eZo= From: Frederik Harwath To: Subject: [PATCH 19/40] graphite: Add runtime alias checking Date: Wed, 15 Dec 2021 16:54:26 +0100 Message-ID: <20211215155447.19379-20-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-08.mgc.mentorg.com (139.181.222.8) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Graphite rejects a SCoP if it contains a pair of data references for which it cannot determine statically if they may alias. This happens very often, for instance in C code which does not use explicit "restrict". This commit adds the possibility to analyze a SCoP nevertheless and perform an alias check at runtime. Then, if aliasing is detected, the execution will fall back to the unoptimized SCoP. 
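The effect resembles loop versioning under a runtime condition. The following hand-written C is only a sketch of that idea and not what Graphite emits: ranges_disjoint and add_one are hypothetical names, and the overlap test is a simplified address range comparison rather than the condition built by create_runtime_alias_checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return nonzero if the byte ranges
   [a, a + len_a) and [b, b + len_b) do not overlap.  */
static int
ranges_disjoint (const void *a, size_t len_a, const void *b, size_t len_b)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  return pa + len_a <= pb || pb + len_b <= pa;
}

/* Without "restrict", DST and SRC may alias.  The check selects between
   a version that an optimizer could transform freely and the original
   loop, which is kept as the fallback.  */
void
add_one (int *dst, const int *src, size_t n)
{
  if (ranges_disjoint (dst, n * sizeof *dst, src, n * sizeof *src))
    {
      /* No aliasing: stands in for the optimized SCoP.  */
      for (size_t i = 0; i < n; i++)
	dst[i] = src[i] + 1;
    }
  else
    {
      /* Possible aliasing: stands in for the unoptimized SCoP.  */
      for (size_t i = 0; i < n; i++)
	dst[i] = src[i] + 1;
    }
}
```

Both branches compute the same result here; the point is only that the first branch may be compiled under the assumption that the two ranges are independent.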
TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

	* common.opt: Add fgraphite-runtime-alias-checks.
	* graphite-isl-ast-to-gimple.c (generate_alias_cond): New function.
	(graphite_regenerate_ast_isl): Use from here.
	* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
	(free_scop): and release here.
	* graphite-scop-detection.c (dr_defs_outside_region): New function.
	(dr_well_analyzed_for_runtime_alias_check_p): New function.
	(graphite_runtime_alias_check_p): New function.
	(build_alias_set): Record unhandled alias ddrs for later alias check creation if flag_graphite_runtime_alias_checks is true instead of failing.
	* graphite.h (struct scop): Add field unhandled_alias_ddrs.
	* sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt | 4 +
 gcc/graphite-isl-ast-to-gimple.c | 60 ++++++
 gcc/graphite-poly.c | 2 +
 gcc/graphite-scop-detection.c | 241 +++++++++++++++++++++---
 gcc/graphite.h | 4 +
 gcc/sese.h | 18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 +++
 7 files changed, 328 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

--
2.33.0

diff --git a/gcc/common.opt b/gcc/common.opt
index 1a5b9bfcca91..b6c46ab63e34 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1673,6 +1673,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.
 
+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+ fhoist-adjacent-loads Common Var(flag_hoist_adjacent_loads) Optimization Enable hoisting adjacent loads to encourage generating conditional move diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 0712d85b67a6..073b471775de 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry, } } +/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of + aliasing. */ + +static tree +generate_alias_cond (vec &alias_ddrs, loop_p context_loop) +{ + gcc_checking_assert (flag_graphite_runtime_alias_checks + && alias_ddrs.length () > 0); + gcc_checking_assert (context_loop); + + auto_vec check_pairs; + compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs); + gcc_checking_assert (check_pairs.length () > 0); + + tree alias_cond = NULL_TREE; + create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond); + gcc_checking_assert (alias_cond); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Generated runtime alias check: "); + print_generic_expr (dump_file, alias_cond, dump_flags); + fprintf (dump_file, "\n"); + } + + return alias_cond; +} + /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP. Return true if code generation succeeded. */ @@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop) region->if_region = if_region; loop_p context_loop = region->region.entry->src->loop_father; + gcc_checking_assert (context_loop); edge e = single_succ_edge (if_region->true_region->region.entry->dest); basic_block bb = split_edge (e); /* Update the true_region exit edge. */ region->if_region->true_region->region.exit = single_succ_edge (bb); + if (flag_graphite_runtime_alias_checks + && scop->unhandled_alias_ddrs.length () > 0) + { + /* SCoP detection has failed to handle the aliasing between some data + references of the SCoP statically. 
Generate an alias check that selects + the newly generated version of the SCoP in the true-branch of the + conditional if aliasing can be ruled out at runtime and the original + version of the SCoP, otherwise. */ + + loop_p loop + = find_common_loop (scop->scop_info->region.entry->dest->loop_father, + scop->scop_info->region.exit->src->loop_father); + tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop); + tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); + set_ifsese_condition (region->if_region, non_alias_cond); + + /* The loop-nest vec is shared by all DDRs. */ + DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release (); + + unsigned int i; + struct data_dependence_relation *ddr; + + FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr) + if (ddr) + free_dependence_relation (ddr); + scop->unhandled_alias_ddrs.truncate (0); + } + + if (dump_file) + fprintf (dump_file, "[codegen] isl AST to Gimple succeeded.\n"); + t.translate_isl_ast (context_loop, root_node, e, ip); if (! t.codegen_error_p ()) { diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c index 1dfc28e6caea..a7aabcb33c99 100644 --- a/gcc/graphite-poly.c +++ b/gcc/graphite-poly.c @@ -255,6 +255,7 @@ new_scop (edge entry, edge exit) scop_set_region (s, region); s->pbbs.create (3); s->drs.create (3); + s->unhandled_alias_ddrs.create (1); s->dependence = NULL; return s; } @@ -272,6 +273,7 @@ free_scop (scop_p scop) scop->pbbs.release (); scop->drs.release (); + scop->unhandled_alias_ddrs.release (); isl_set_free (scop->param_context); scop->param_context = NULL; diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 46c470210d05..924004e3f3c4 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -1542,6 +1542,125 @@ try_generate_gimple_bb (scop_p scop, basic_block bb) return new_gimple_poly_bb (bb, drs, reads, writes); } +/* Checks if all parts of DR are defined outside of REGION. 
This allows an + alias check involving DR to be placed in front of the region. */ + +static opt_result +dr_defs_outside_region (const sese_l &region, data_reference_p dr) +{ + static const char *pre + = "cannot create alias check for SCoP. Data reference's"; + static const char *suf = "uses definitions from SCoP.\n"; + opt_result res = opt_result::success (); + + if (has_operands_from_region_p (DR_BASE_OBJECT (dr), region)) + res = opt_result::failure_at (DR_STMT (dr), "%s base %s", pre, suf); + else if (has_operands_from_region_p (DR_INIT (dr), region)) + res = opt_result::failure_at (DR_STMT (dr), "%s constant offset %s", pre, + suf); + else if (has_operands_from_region_p (DR_STEP (dr), region)) + res = opt_result::failure_at (DR_STMT (dr), "%s step %s", pre, suf); + else if (has_operands_from_region_p (DR_OFFSET (dr), region)) + res = opt_result::failure_at (DR_STMT (dr), "%s loop-invariant offset %s", + pre, suf); + else if (has_operands_from_region_p (DR_BASE_ADDRESS (dr), region)) + res = opt_result::failure_at (DR_STMT (dr), "%s base address %s", pre, + suf); + else + for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i) + if (has_operands_from_region_p (DR_ACCESS_FN (dr, i), region)) + { + res = opt_result::failure_at ( + DR_STMT (dr), "%s %u-th access function %s", pre, i + 1, suf); + break; + } + + return res; +} + +/* Check that all constituents of DR that are used by the + "compute_alias_check_pairs" function have been analyzed as required. */ + +static opt_result +dr_well_analyzed_for_runtime_alias_check_p (data_reference_p dr) +{ + static const char* error = + "data-reference not well-analyzed for runtime check."; + gimple* stmt = DR_STMT (dr); + opt_result res = opt_result::success (); + + if (! DR_BASE_ADDRESS (dr)) + res = opt_result::failure_at (stmt, "%s no base address.\n", error); + else if (! DR_OFFSET (dr)) + res = opt_result::failure_at (stmt, "%s no offset.\n", error); + else if (!
DR_INIT (dr)) + res = opt_result::failure_at (stmt, "%s no init.\n", error); + else if (! DR_STEP (dr)) + res = opt_result::failure_at (stmt, "%s no step.\n", error); + else if (! tree_fits_uhwi_p (DR_STEP (dr))) + res = opt_result::failure_at (stmt, "%s step too large.\n", error); + + if (!res) + DEBUG_PRINT (dump_data_reference (dump_file, dr)); + + return res; +} + +/* Return TRUE if it is possible to create a runtime alias check for + data-references DR1 and DR2 from LOOP and place it in front of REGION. */ + +static opt_result +graphite_runtime_alias_check_p (data_reference_p dr1, data_reference_p dr2, + class loop *loop, const sese_l &region) +{ + gcc_checking_assert (loop); + gcc_checking_assert (dr1); + gcc_checking_assert (dr2); + + if (dump_file) + { + fprintf (dump_file, + "Attempting runtime alias check creation for DRs:\n"); + dump_data_reference (dump_file, dr1); + dump_data_reference (dump_file, dr2); + } + + if (!optimize_loop_for_speed_p (loop)) + return opt_result::failure_at (DR_STMT (dr1), + "runtime alias check not supported when" + " optimizing for size.\n"); + + /* Verify that we have enough information about the data-references and + context loop to construct a runtime alias check expression with + "compute_alias_check_pairs". */ + tree niters = number_of_latch_executions (loop); + if (niters == NULL_TREE || niters == chrec_dont_know) + return opt_result::failure_at (DR_STMT (dr1), + "failed to obtain number of iterations of " + "loop %d.\n", loop->num); + + opt_result ok = dr_well_analyzed_for_runtime_alias_check_p (dr1); + if (!ok) + return ok; + + ok = dr_well_analyzed_for_runtime_alias_check_p (dr2); + if (!ok) + return ok; + + /* The runtime alias check would be placed before REGION and hence it cannot + use definitions made within REGION.
*/ + + ok = dr_defs_outside_region (region, dr1); + if (!ok) + return ok; + + ok = dr_defs_outside_region (region, dr2); + if (!ok) + return ok; + + return opt_result::success (); +} + /* Compute alias-sets for all data references in DRS. */ static bool @@ -1549,7 +1668,7 @@ build_alias_set (scop_p scop) { int num_vertices = scop->drs.length (); struct graph *g = new_graph (num_vertices); - dr_info *dr1, *dr2; + dr_info *dri1, *dri2; int i, j; int *all_vertices; @@ -1557,33 +1676,110 @@ build_alias_set (scop_p scop) = find_common_loop (scop->scop_info->region.entry->dest->loop_father, scop->scop_info->region.exit->src->loop_father); - FOR_EACH_VEC_ELT (scop->drs, i, dr1) - for (j = i+1; scop->drs.iterate (j, &dr2); j++) - if (dr_may_alias_p (dr1->dr, dr2->dr, nest)) - { - /* Dependences in the same alias set need to be handled - by just looking at DR_ACCESS_FNs. */ - if (DR_NUM_DIMENSIONS (dr1->dr) == 0 - || DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr) - || ! operand_equal_p (DR_BASE_OBJECT (dr1->dr), - DR_BASE_OBJECT (dr2->dr), - OEP_ADDRESS_OF) - || ! types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1->dr)), - TREE_TYPE (DR_BASE_OBJECT (dr2->dr)))) - { - free_graph (g); - return false; - } - add_edge (g, i, j); - add_edge (g, j, i); - } + gcc_checking_assert (nest); + + vec nest_vec; + nest_vec.create (1); + if (flag_graphite_runtime_alias_checks) + nest_vec.safe_push (nest); + + FOR_EACH_VEC_ELT (scop->drs, i, dri1) + { + data_reference_p dr1 = dri1->dr; + + for (j = i + 1; scop->drs.iterate (j, &dri2); j++) + { + + data_reference_p dr2 = dri2->dr; + if (!(DR_IS_READ (dr1) && DR_IS_READ (dr2)) + && dr_may_alias_p (dr1, dr2, nest)) + { + /* Dependences in the same alias set need to be handled + by just looking at DR_ACCESS_FNs. 
*/ + bool dimension_zero = DR_NUM_DIMENSIONS (dr1) == 0; + bool different_dimensions + = DR_NUM_DIMENSIONS (dr1) != DR_NUM_DIMENSIONS (dr2); + bool different_base_objects = !operand_equal_p ( + DR_BASE_OBJECT (dr1), DR_BASE_OBJECT (dr2), OEP_ADDRESS_OF); + bool incompatible_types + = !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1)), + TREE_TYPE (DR_BASE_OBJECT (dr2))); + bool ddr_can_be_handled + = !(dimension_zero || different_dimensions + || different_base_objects || incompatible_types); + + if (!ddr_can_be_handled) + { + DEBUG_PRINT ( + dp << "[build_alias_set] " + "Cannot handle aliasing between data references:\n"; + print_gimple_stmt (dump_file, dr1->stmt, 2, TDF_DETAILS); + print_gimple_stmt (dump_file, dr2->stmt, 2, TDF_DETAILS); + dp << "\n"); + if (dimension_zero) + DEBUG_PRINT (dp << "DR1 has dimension 0.\n"); + if (different_base_objects) + DEBUG_PRINT (dp << "DRs have different base objects.\n"); + if (different_dimensions) + DEBUG_PRINT (dp << "DRs have different dimensions.\n"); + if (incompatible_types) + DEBUG_PRINT (dp << + "DRs have incompatible base object types.\n"); + } + + if (ddr_can_be_handled) + { + add_edge (g, i, j); + add_edge (g, j, i); + continue; + } + + loop_p common_loop + = find_common_loop ((DR_STMT (dr1))->bb->loop_father, + (DR_STMT (dr2))->bb->loop_father); + edge scop_entry = scop->scop_info->region.entry; + dr1 = create_data_ref (scop_entry, common_loop, DR_REF (dr1), + DR_STMT (dr1), DR_IS_READ (dr1), + DR_IS_CONDITIONAL_IN_STMT (dr1)); + dr2 = create_data_ref (scop_entry, common_loop, DR_REF (dr2), + DR_STMT (dr2), DR_IS_READ (dr2), + DR_IS_CONDITIONAL_IN_STMT (dr2)); + + if (flag_graphite_runtime_alias_checks + && graphite_runtime_alias_check_p (dr1, dr2, nest, + scop->scop_info->region)) + { + ddr_p ddr = initialize_data_dependence_relation (dr1, dr2, + nest_vec); + scop->unhandled_alias_ddrs.safe_push (ddr); + } + else + { + if (flag_graphite_runtime_alias_checks) + { + unsigned int i; + struct 
data_dependence_relation *ddr; + + FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr) + if (ddr) + free_dependence_relation (ddr); + scop->unhandled_alias_ddrs.truncate (0); + } + + nest_vec.release (); + free_graph (g); + return false; + } + } + } + } all_vertices = XNEWVEC (int, num_vertices); for (i = 0; i < num_vertices; i++) all_vertices[i] = i; scop->max_alias_set - = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1; + = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1; free (all_vertices); for (i = 0; i < g->n_vertices; i++) @@ -1703,7 +1899,6 @@ gather_bbs::after_dom_children (basic_block bb) } } - /* Compute sth like an execution order, dominator order with first executing edges that stay inside the current loop, delaying processing exit edges. */ diff --git a/gcc/graphite.h b/gcc/graphite.h index 6464d2f50ce7..03febfa39986 100644 --- a/gcc/graphite.h +++ b/gcc/graphite.h @@ -368,6 +368,10 @@ struct scop /* The maximum alias set as assigned to drs by build_alias_sets. */ unsigned max_alias_set; + /* A set of ddrs that were rejected by build_alias_set during scop detection + and that must be handled by other means (runtime checking). */ + vec unhandled_alias_ddrs; + /* All the basic blocks in this scop that contain memory references and that will be represented as statements in the polyhedral representation. */ diff --git a/gcc/sese.h b/gcc/sese.h index cd19e6010196..c51ea68bfb47 100644 --- a/gcc/sese.h +++ b/gcc/sese.h @@ -153,6 +153,24 @@ defined_in_sese_p (tree name, const sese_l &r) return stmt_in_sese_p (SSA_NAME_DEF_STMT (name), r); } +/* Returns true if EXPR has operands that are defined in REGION. 
*/ + +static bool +has_operands_from_region_p (tree expr, const sese_l &region) +{ + if (!expr || is_gimple_min_invariant (expr)) + return false; + + if (TREE_CODE (expr) == SSA_NAME) + return defined_in_sese_p (expr, region); + + for (int i = 0; i < TREE_OPERAND_LENGTH (expr); i++) + if (has_operands_from_region_p (TREE_OPERAND (expr, i), region)) + return true; + + return false; +} + /* Returns true when LOOP is in REGION. */ static inline bool diff --git a/gcc/testsuite/gcc.dg/graphite/alias-1.c b/gcc/testsuite/gcc.dg/graphite/alias-1.c new file mode 100644 index 000000000000..ee80dae1df33 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/alias-1.c @@ -0,0 +1,22 @@ +/* This test demonstrates a loop nest that Graphite cannot handle + because of aliasing. It should be possible to handle this loop nest + by creating a runtime alias check like in the very similar test + alias-0-runtime-check.c. However, since Graphite analyses the data + reference with respect to the innermost loop that contains the data + reference, the variable "i" remains uninstantiated (in contrast to + "j"), and consequently the alias check cannot be placed outside of + the SCoP since "i" is not defined there.
*/ + +/* { dg-options "-O2 -fgraphite-identity -fgraphite-runtime-alias-checks -fdump-tree-graphite-details" } */ + +void sum(int *x, int *y, unsigned *sum) +{ + unsigned i,j; + *sum = 0; + + for (i = 0; i < 10000; i=i+1) + for (j = 0; j < 22222; j=j+1) + *sum += x[i] + y[j]; +} + +/* { dg-final { scan-tree-dump "number of SCoPs: 1" "graphite" { xfail *-*-* } } } */

From patchwork Wed Dec 15 15:54:27 2021
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48964
From: Frederik Harwath
Subject: [PATCH 20/40] openacc: Use Graphite for dependence analysis in "kernels" regions
Date: Wed, 15 Dec 2021 16:54:27 +0100
Message-ID: <20211215155447.19379-21-frederik@codesourcery.com>
In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com>
References: <20211215155447.19379-1-frederik@codesourcery.com>
Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de

This commit changes the handling of OpenACC "kernels" to use Graphite for dependence analysis. To this end, it first introduces a new internal representation for "kernels" regions which should be analyzed by Graphite in pass_omp_oacc_kernels_decompose. This is now the default for all "kernels" regions, but the old handling is still available through the command line parameter "--param=openacc-kernels=decompose-parloops". The handling of this new region type in the omp lowering and omp offloading passes follows the existing handling for "parallel" regions.
This replaces the specialized handling for "kernels" regions that was previously used and which was limited in many ways.

Graphite is adjusted to be able to analyze the OpenACC functions that get outlined from the "kernels" regions. It can now handle the internal function calls that contain information about OpenACC constructs. In some places where function calls would be rejected by Graphite, those calls need to be ignored. In other places, information about the loop step, bounds etc. needs to be extracted from the calls. The goal is to enable an analysis of the original loop parameters although the omp lowering and expansion steps have already modified the loop structure. Some parallelization-enabling constructs such as OpenACC "reduction" and "private"/"firstprivate" clauses must be recognized and the data-dependences must be adjusted to reflect the semantics of those constructs.

The data-dependence analysis step in Graphite has so far been tied to the code generation step. This commit introduces a separate data-dependence analysis step that avoids the code generation. This is necessary because adjusting the code generation to create a correct OpenACC loop structure would require very considerable effort, and the goal of this commit is to implement the dependence analysis only. The ability to use Graphite for dependence analysis without its code generation might be of independent interest, but it is so far used for OpenACC purposes only. In general, all changes to Graphite try to avoid affecting other uses of Graphite as much as possible.

gcc/ChangeLog:

	* Makefile.in: Add graphite-oacc.o.
	* cfgloop.c (alloc_loop): Set can_be_parallel_valid_p to false.
	* cfgloop.h: Add can_be_parallel_valid_p field.
	* cfgloopmanip.c (copy_loop_info): Copy can_be_parallel_valid_p.
	* config/nvptx/nvptx.c (nvptx_goacc_reduction_setup): Add assert.
	* doc/invoke.texi: Adjust param openacc-kernels description.
	* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
	* flag-types.h (enum openacc_kernels): Add OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
	* gimple-pretty-print.c (dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
	* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and widen GF_OMP_TARGET_KIND_MASK.
	(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
	(is_gimple_omp_offloaded): Likewise.
	* graphite-dependences.c (scop_get_reads_and_writes): Handle "kills" and "reduction" PDRs.
	(apply_schedule_on_deps): Add dump output for intermediate steps of the dependence computation to enable understanding of unexpected dependences.
	(carries_deps): Likewise.
	(scop_get_dependences): Handle "kill" operations and add dump output.
	* graphite-isl-ast-to-gimple.c (visit_schedule_loop_node): New function.
	(graphite_oacc_analyze_scop): New function.
	* graphite-optimize-isl.c (optimize_isl): Remove "static" and add argument to identify OpenACC use; don't fail on unchanged schedule in this case.
	* graphite-poly.c (new_poly_dr): Handle "kills".
	(print_pdr): Likewise.
	(new_gimple_poly_bb): Likewise.
	(free_gimple_poly_bb): Likewise.
	(new_scop): Handle "reduction", "private", and "firstprivate" hash sets.
	(free_scop): Likewise.
	(print_isl_space): New function.
	(debug_isl_space): New function.
	* graphite-scop-detection.c (scop_detection::can_represent_loop): Don't fail if niter is 0 in OpenACC functions.
	(scop_detection::add_scop): Don't reject regions with only one loop in OpenACC functions.
	(ignored_oacc_internal_call_p): New function.
	(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
	(stmt_has_side_effects): Ignore internal OpenACC function calls.
	(add_write): Likewise.
	(add_read): Likewise.
	(add_kill): New function.
	(add_kills): New function.
	(add_oacc_kills): New function.
	(try_generate_gimple_bb): Kill false dependences for OpenACC "private"/"firstprivate" vars.
	(gather_bbs::gather_bbs): Determine OpenACC "private"/"firstprivate" vars in region.
	(gather_bbs::before_dom_children): Add assert.
	(determine_openacc_reductions): New function.
	(build_scops): Determine OpenACC "reduction" vars in SCoP.
	* graphite-sese-to-poly.c (oacc_ifn_call_extract): New declaration.
	(oacc_internal_call_p): New function.
	(build_poly_dr): Ignore internal OpenACC function calls, handle "reduction" refs.
	(build_poly_sr): Likewise; handle "kill" operations.
	* graphite.c (graphite_transform_loops): Accept functions with only a single loop.
	(oacc_enable_graphite_p): New function.
	(gate_graphite_transforms): Enable pass on OpenACC functions.
	* graphite.h (enum poly_dr_type): Add PDR_KILL.
	(struct poly_dr): Add "is_reduction" field.
	(new_poly_dr): Add argument to declaration.
	(pdr_kill_p): New function.
	(print_isl_space): New declaration.
	(debug_isl_space): New declaration.
	(struct scop): Add fields "reductions_vars", "oacc_firstprivate_vars", and "oacc_private_scalars".
	(optimize_isl): New declaration.
	(graphite_oacc_analyze_scop): New declaration.
	* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
	* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE.
	* omp-expand.c (struct omp_region): Adjust comment.
	(expand_omp_for): Add asserts about expected "kernels" region types.
	(mark_loops_in_oacc_kernels_region): Likewise.
	(expand_omp_target): Likewise; handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
	(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
	(omp_make_gimple_edges): Likewise.
	* omp-general.c (oacc_get_kernels_attrib): New function.
	(oacc_get_fn_dim_size): Allow argument to be NULL.
	* omp-general.h (oacc_get_kernels_attrib): New declaration.
	* omp-low.c (struct omp_context): Add fields "oacc_firstprivate_vars" and "oacc_private_scalars".
	(was_originally_oacc_kernels): New function.
	(is_oacc_kernels_decomposed_graphite_part): New function.
	(new_omp_context): Allocate "oacc_firstprivate_vars" and "oacc_private_scalars" ...
	(delete_omp_context): ... and free from here.
	(oacc_record_firstprivate_var_clauses): New function.
	(oacc_record_private_scalars): New function.
	(scan_sharing_clauses): Call functions to record "private" scalars and "firstprivate" variables.
	(check_oacc_kernel_gwv): Add assert.
	(ctx_in_oacc_kernels_region): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
	(scan_omp_for): Likewise.
	(check_omp_nesting_restrictions): Likewise.
	(lower_oacc_head_mark): Likewise.
	(lower_omp_for): Likewise.
	(lower_omp_target): Create "private" and "firstprivate" marker call statements.
	(lower_oacc_head_tail): Adjust "private" and "firstprivate" marker calls.
	(lower_oacc_reductions): Emit "private" and "firstprivate" marker call statements.
	(make_oacc_firstprivate_vars_marker): New function.
	(make_oacc_private_scalars_marker): New function.
	* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn): Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to region using the new "kernels" handling.
	(make_region_seq): Adjust default region type for new "kernels" handling; no more exceptions, let Graphite handle everything.
	(make_region_loop_nest): Likewise; add dump output and assert.
	(adjust_nested_loop_clauses): Stop creating "auto" clauses if loop has "independent", "gang" etc.
	(transform_kernels_loop_clauses): Likewise.
	* omp-offload.c (oacc_extract_loop_call): New function.
	(oacc_loop_get_cfg_loop): New function.
	(can_be_parallel_str): New function.
	(oacc_loop_can_be_parallel_p): New function.
	(oacc_parallel_kernels_graphite_fun_p): New function.
	(oacc_parallel_fun_p): New function.
	(oacc_loop_transform_auto_into_independent): New function, ...
	(oacc_loop_fixed_partitions): ... called from here to transfer the result of Graphite's analysis to the loop.
	(execute_oacc_loop_designation): Handle OpenACC functions with "parallel_kernels_graphite" attribute.
	(execute_oacc_device_lower): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
	* omp-offload.h (oacc_extract_loop_call): Add declaration.
	* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
	* sese.c (scalar_evolution_in_region): "Redirect" SCEV analysis to outer loop for IFN_GOACC_LOOP calls.
	* sese.h: Add field "kill_scalar_refs".
	* tree-chrec.c (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR like CASE_CONVERT.
	* tree-data-ref.c (dump_data_reference): Include DR_BASE_ADDRESS and DR_OFFSET in dump output.
	(get_references_in_stmt): Don't reject OpenACC internal function calls.
	(graphite_find_data_references_in_stmt): Remove unused variable.
	* tree-parloops.c (pass_parallelize_loops::execute): Disable pass with the new kernels handling, enable if requested explicitly.
	* tree-scalar-evolution.c (set_scev_analyze_openacc_calls): Set flag to enable the analysis of internal OpenACC function calls (use for Graphite only).
	(oacc_call_analyzable_p): New function.
	(oacc_ifn_call_extract): New function.
	(oacc_simplify): New function.
	(add_to_evolution): Simplify OpenACC internal function calls if applicable.
	(follow_ssa_edge_binary): Likewise.
	(follow_ssa_edge_expr): Likewise.
	(follow_copies_to_constant): Likewise.
	(analyze_initial_condition): Likewise.
	(interpret_loop_phi): Likewise.
	(interpret_gimple_call): New function.
	(interpret_rhs_expr): Likewise.
	(instantiate_scev_name): Likewise.
	(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
	(expression_expensive_p): Consider internal OpenACC calls to be cheap.
	* tree-scalar-evolution.h (set_scev_analyze_openacc_calls): New declaration.
	(oacc_call_analyzable_p): New declaration.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Mark lhs of internal OpenACC function calls necessary.
	* tree-ssa-loop-niter.c (oacc_call_analyzable_p): New function.
	(oacc_ifn_call_extract): New declaration.
	(interpret_gimple_call): New declaration.
	(expand_simple_operations): Handle internal OpenACC function calls.
	* tree-ssa-loop.c (gate_oacc_kernels): Disable for new "kernels" handling.
	* graphite-oacc.c: New file.
	* graphite-oacc.h: New file.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
	* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/classify-kernels.c: Adjust.
	* gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
	* gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
	* gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge
---
 gcc/Makefile.in | 1 +
 gcc/cfgloop.c | 1 +
 gcc/cfgloop.h | 6 +
 gcc/cfgloopmanip.c | 1 +
 gcc/config/nvptx/nvptx.c | 7 +
 gcc/doc/invoke.texi | 20 +-
 gcc/doc/passes.texi | 6 +-
 gcc/flag-types.h | 1 +
 gcc/gimple-pretty-print.c | 3 +
 gcc/gimple.h | 5 +
 gcc/graphite-dependences.c | 220 ++++--
 gcc/graphite-isl-ast-to-gimple.c | 93 ++-
 gcc/graphite-oacc.c | 688 ++++++++++++++++++
 gcc/graphite-oacc.h | 55 ++
 gcc/graphite-optimize-isl.c | 7 +-
 gcc/graphite-poly.c | 39 +-
 gcc/graphite-scop-detection.c | 190 ++++-
 gcc/graphite-sese-to-poly.c | 65 +-
 gcc/graphite.c | 120 ++-
 gcc/graphite.h | 35 +-
 gcc/internal-fn.c | 4 +
 gcc/internal-fn.h | 4 +-
 gcc/omp-expand.c | 65 +-
 gcc/omp-general.c | 21 +-
 gcc/omp-general.h | 1 +
 gcc/omp-low.c | 389 ++++++++--
 gcc/omp-oacc-kernels-decompose.cc | 145 ++--
 gcc/omp-offload.c | 483 +++++++++++-
 gcc/omp-offload.h | 2 +
 gcc/params.opt | 7 +-
 gcc/sese.c | 25 +-
 gcc/sese.h | 1 +
 .../c-c++-common/goacc/classify-kernels.c | 2 +-
 .../goacc/loop-auto-transfer-2.f90 | 47 ++
 .../goacc/loop-auto-transfer-3.f90 | 103 +++
 .../goacc/loop-auto-transfer-4.f90 | 323 ++++++++
 gcc/tree-chrec.c | 3 +
 gcc/tree-data-ref.c | 20 +-
 gcc/tree-parloops.c | 18 +-
 gcc/tree-scalar-evolution.c | 177 ++++-
 gcc/tree-scalar-evolution.h | 3 +
 gcc/tree-ssa-dce.c | 23 +
 gcc/tree-ssa-loop-niter.c | 6 +
 gcc/tree-ssa-loop.c                           |  11 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c |   2 +
 .../kernels-independent.f90                   |   1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |   1 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   1 +
 48 files changed, 3123 insertions(+), 328 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90

-- 
2.33.0

diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 571e9c28e29d..debd8047cc85 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1433,6 +1433,7 @@ OBJS = \ graphite-poly.o \ graphite-scop-detection.o \ graphite-sese-to-poly.o \ + graphite-oacc.o \ gtype-desc.o \ haifa-sched.o \ hash-map-tests.o \ diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c index 2ba9918bfa2a..a15c2c84c3ca 100644 --- a/gcc/cfgloop.c +++ b/gcc/cfgloop.c @@ -349,6 +349,7 @@ alloc_loop (void) loop->exits = ggc_cleared_alloc (); loop->exits->next = loop->exits->prev = loop->exits; loop->can_be_parallel = false; + loop->can_be_parallel_valid_p = false; loop->constraints = 0; loop->nb_iterations_upper_bound = 0; loop->nb_iterations_likely_upper_bound = 0; diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 0f71a6bf18f2..866ea23c8369 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -213,6 +213,12 @@ public: /* True if the loop can be parallel. */ unsigned can_be_parallel : 1; + /* True if the can_be_parallel flag is valid, i.e. the + parallelizability of the loop has been analyzed. This can be + used to distinguish between unparallelizable loops and a failed + analysis, e.g.
to provide better diagnostic messages. */ + unsigned can_be_parallel_valid_p : 1; + /* True if -Waggressive-loop-optimizations warned about this loop already. */ unsigned warned_aggressive_loop_optimizations : 1; diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c index aa538a221e1f..05c381123f65 100644 --- a/gcc/cfgloopmanip.c +++ b/gcc/cfgloopmanip.c @@ -952,6 +952,7 @@ copy_loop_info (class loop *loop, class loop *target) target->simdlen = loop->simdlen; target->constraints = loop->constraints; target->can_be_parallel = loop->can_be_parallel; + target->can_be_parallel_valid_p = loop->can_be_parallel_valid_p; target->warned_aggressive_loop_optimizations |= loop->warned_aggressive_loop_optimizations; target->dont_vectorize = loop->dont_vectorize; diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 951252e598a2..faec06f2af7c 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -6368,7 +6368,14 @@ nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa) } if (lhs) + { + //TODO Earlier check for ICE as reported in . + //TODO Not sure if this makes too much sense to have (just) here -- should probably be moved (way) further up in the pipeline? + if (TREE_CODE (TREE_TYPE (lhs)) == REFERENCE_TYPE) + gcc_checking_assert (is_gimple_addressable (var)); + gimplify_assign (lhs, var, &seq); + } pop_gimplify_context (NULL); gsi_replace_with_seq (&gsi, seq, true); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e0f09610408c..f58cdd8724d7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14775,14 +14775,22 @@ Maximum depth of logical expression evaluation ranger will look through when evaluating outgoing edge ranges. @item openacc-kernels -Specify mode of OpenACC `kernels' constructs handling. -With @option{--param=openacc-kernels=decompose}, OpenACC `kernels' +Specify mode of OpenACC `kernels' constructs handling. 
With +@option{--param=openacc-kernels=decompose}, OpenACC `kernels' constructs are decomposed into parts, a sequence of compute -constructs, each then handled individually. -This is work in progress. +constructs, each then handled individually. The data dependence +analysis that is necessary to determine if loops can be parallelized +is performed by the Graphite pass. +This is the default. +With @option{--param=openacc-kernels=decompose-parloops}, OpenACC +`kernels' constructs are decomposed into parts, a sequence of compute +constructs, each then handled individually by the @samp{parloops} +pass. +This is deprecated. With @option{--param=openacc-kernels=parloops}, OpenACC `kernels' -constructs are handled by the @samp{parloops} pass, en bloc. -This is the current default. +constructs are handled by the @samp{parloops} pass, en bloc. This is +deprecated. +This is deprecated. @item openacc-privatization Specify mode of OpenACC privatization diagnostics for diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi index 9046cbed2d90..2649e01cc945 100644 --- a/gcc/doc/passes.texi +++ b/gcc/doc/passes.texi @@ -248,9 +248,9 @@ constraints in order to generate the points-to sets. It is located in This is a pass group for processing OpenACC kernels regions. It is a subpass of the IPA OpenACC pass group that runs on offloaded functions -containing OpenACC kernels loops. It is located in -@file{tree-ssa-loop.c} and is described by -@code{pass_ipa_oacc_kernels}. +containing OpenACC kernels loops if @samp{parloops} based handling of +kernels regions is used. It is located in @file{tree-ssa-loop.c} and +is described by @code{pass_ipa_oacc_kernels}. 
@item Target clone diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 7cf8c28933b2..bc118308f929 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -481,6 +481,7 @@ enum vrp_mode enum openacc_kernels { OPENACC_KERNELS_DECOMPOSE, + OPENACC_KERNELS_DECOMPOSE_PARLOOPS, OPENACC_KERNELS_PARLOOPS }; diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index 1cd1597359e8..9f4dea184cf3 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -1784,6 +1784,9 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs, case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: kind = " oacc_parallel_kernels_gang_single"; break; + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: + kind = " oacc_parallel_kernels_graphite"; + break; case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: kind = " oacc_data_kernels"; break; diff --git a/gcc/gimple.h b/gcc/gimple.h index 3cde3cde7fee..412efff5fa44 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -185,6 +185,9 @@ enum gf_mask { /* A 'GF_OMP_TARGET_KIND_OACC_DATA' representing an OpenACC 'kernels' decomposed parts' 'data' construct. */ GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 16, + /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels' + construct, for Graphite to analyze. 
*/ + GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE = 17, GF_OMP_TEAMS_HOST = 1 << 0, /* True on an GIMPLE_OMP_RETURN statement if the return does not require @@ -6652,6 +6655,7 @@ is_gimple_omp_oacc (const gimple *stmt) case GF_OMP_TARGET_KIND_OACC_DECLARE: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: return true; @@ -6681,6 +6685,7 @@ is_gimple_omp_offloaded (const gimple *stmt) case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: return true; default: return false; diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c index 9f2eda34add3..24b081624c72 100644 --- a/gcc/graphite-dependences.c +++ b/gcc/graphite-dependences.c @@ -38,6 +38,9 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "tree-data-ref.h" #include "graphite.h" +#include "graphite-oacc.h" +#include "gimple-pretty-print.h" + /* Add the constraints from the set S to the domain of MAP. */ @@ -63,71 +66,108 @@ add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb) return constrain_domain (x, isl_set_copy (pbb->domain)); } -/* Returns an isl description of all memory operations in SCOP. The memory - reads are returned in READS and writes in MUST_WRITES and MAY_WRITES. */ +/* Returns an isl description of all memory operations in SCOP. The + memory reads are returned in READS and writes in MUST_WRITES and + MAY_WRITES, kills go to KILLS. 
*/ static void scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads, isl_union_map *&must_writes, - isl_union_map *&may_writes) + isl_union_map *&may_writes, + isl_union_map *&kills) { int i, j; poly_bb_p pbb; poly_dr_p pdr; FOR_EACH_VEC_ELT (scop->pbbs, i, pbb) + { + FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr) { - FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr) { - if (pdr_read_p (pdr)) - { - if (dump_file) - { - fprintf (dump_file, "Adding read to depedence graph: "); - print_pdr (dump_file, pdr); - } - isl_union_map *um - = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); - reads = isl_union_map_union (reads, um); - if (dump_file) - { - fprintf (dump_file, "Reads depedence graph: "); - print_isl_union_map (dump_file, reads); - } - } - else if (pdr_write_p (pdr)) - { - if (dump_file) - { - fprintf (dump_file, "Adding must write to depedence graph: "); - print_pdr (dump_file, pdr); - } - isl_union_map *um - = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); - must_writes = isl_union_map_union (must_writes, um); - if (dump_file) - { - fprintf (dump_file, "Must writes depedence graph: "); - print_isl_union_map (dump_file, must_writes); - } - } - else if (pdr_may_write_p (pdr)) - { - if (dump_file) - { - fprintf (dump_file, "Adding may write to depedence graph: "); - print_pdr (dump_file, pdr); - } - isl_union_map *um - = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); - may_writes = isl_union_map_union (may_writes, um); - if (dump_file) - { - fprintf (dump_file, "May writes depedence graph: "); - print_isl_union_map (dump_file, may_writes); - } - } - } + isl_union_map *um = NULL; + + if (pdr->is_reduction) + { + if (dump_file) + { + fprintf (dump_file, + "Skipped reduction variable %s in statement .\n", + pdr_write_p (pdr) ? 
"read" : "write"); + print_gimple_stmt (dump_file, pdr->stmt, 0, dump_flags); + fprintf (dump_file, "\n"); + } + continue; + } + + if (pdr_read_p (pdr)) + { + if (dump_file) + { + fprintf (dump_file, "Adding %sread to dependence graph: ", + pdr->is_reduction ? "reduction " : ""); + print_pdr (dump_file, pdr); + isl_map* tmp = add_pdr_constraints (pdr, pbb); + print_isl_map (dump_file, tmp); + isl_map_free (tmp); + } + um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); + + reads = isl_union_map_union (reads, um); + if (dump_file) + { + fprintf (dump_file, "Reads dependence graph: "); + print_isl_union_map (dump_file, reads); + } + } + else if (pdr_write_p (pdr)) + { + if (dump_file) + { + fprintf (dump_file, "Adding %smust write to dependence graph: ", + pdr->is_reduction ? "reduction " : ""); + print_pdr (dump_file, pdr); + } + + + um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); + + must_writes = isl_union_map_union (must_writes, um); + } + else if (pdr_may_write_p (pdr)) + { + if (dump_file) + { + fprintf (dump_file, "Adding %smay write to dependence graph: ", + pdr->is_reduction ? "reduction " : ""); + print_pdr (dump_file, pdr); + } + um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); + + may_writes = isl_union_map_union (may_writes, um); + if (dump_file) + { + fprintf (dump_file, "May writes dependence graph: "); + print_isl_union_map (dump_file, may_writes); + } + } + else if (pdr_kill_p (pdr)) + { + if (dump_file) + { + fprintf (dump_file, "Adding kill to dependence graph: "); + print_pdr (dump_file, pdr); + } + um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb)); + + kills = isl_union_map_union (kills, um); + if (dump_file) + { + fprintf (dump_file, "Kills: "); + print_isl_union_map (dump_file, kills); + } + } } + } } /* Helper function used on each MAP of a isl_union_map. 
Computes the @@ -203,7 +243,19 @@ apply_schedule_on_deps (__isl_keep isl_union_map *schedule, isl_union_map *trans = extend_schedule (isl_union_map_copy (schedule)); isl_union_map *ux = isl_union_map_copy (deps); ux = isl_union_map_apply_domain (ux, isl_union_map_copy (trans)); + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Applied domain map to dependences:\n"); + print_isl_union_map (dump_file, ux); + } ux = isl_union_map_apply_range (ux, trans); + + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Applied range map:\n"); + print_isl_union_map (dump_file, ux); + } + ux = isl_union_map_coalesce (ux); if (!isl_union_map_is_empty (ux)) @@ -230,6 +282,12 @@ carries_deps (__isl_keep isl_union_map *schedule, if (x == NULL) return false; + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Applied schedule on dependences:\n"); + print_isl_map (dump_file, x); + } + isl_space *space = isl_map_get_space (x); isl_map *lex = isl_map_lex_le (isl_space_range (space)); isl_constraint *ineq = isl_inequality_alloc @@ -244,7 +302,22 @@ carries_deps (__isl_keep isl_union_map *schedule, ineq = isl_constraint_set_constant_si (ineq, -1); lex = isl_map_add_constraint (lex, ineq); lex = isl_map_coalesce (lex); + + + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Lex: \n"); + print_isl_map (dump_file, lex); + } + x = isl_map_intersect (x, lex); + + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Intersect: \n"); + print_isl_map (dump_file, x); + } + bool res = !isl_map_is_empty (x); isl_map_free (x); @@ -265,8 +338,9 @@ scop_get_dependences (scop_p scop) isl_space *space = isl_set_get_space (scop->param_context); isl_union_map *reads = isl_union_map_empty (isl_space_copy (space)); isl_union_map *must_writes = isl_union_map_empty (isl_space_copy (space)); - isl_union_map *may_writes = isl_union_map_empty (space); - scop_get_reads_and_writes (scop, reads, must_writes, may_writes); 
+ isl_union_map *may_writes = isl_union_map_empty (isl_space_copy (space)); + isl_union_map *kills = isl_union_map_empty (space); + scop_get_reads_and_writes (scop, reads, must_writes, may_writes, kills); if (dump_file) { @@ -282,10 +356,11 @@ scop_get_dependences (scop_p scop) fprintf (dump_file, " [1, i0] is a 'memref' with alias set 1" " and first subscript access i0.\n"); fprintf (dump_file, " [106] is a 'scalar reference' which is the sum of" - " SSA_NAME_VERSION 6" - " and --param graphite-max-arrays-per-scop=100\n"); + " SSA_NAME_VERSION 6 and scop->max_alias_set whose value\n is 100" + " in this example.\n"); fprintf (dump_file, "-----------------------\n\n"); + fprintf (dump_file, "max_alias_set: %d\n", scop->max_alias_set); fprintf (dump_file, "data references (\n"); fprintf (dump_file, " reads: "); print_isl_union_map (dump_file, reads); @@ -293,31 +368,59 @@ scop_get_dependences (scop_p scop) print_isl_union_map (dump_file, must_writes); fprintf (dump_file, " may_writes: "); print_isl_union_map (dump_file, may_writes); + fprintf (dump_file, " kills: "); + print_isl_union_map (dump_file, kills); fprintf (dump_file, ")\n"); } gcc_assert (scop->original_schedule); + isl_union_access_info *ai; ai = isl_union_access_info_from_sink (isl_union_map_copy (reads)); ai = isl_union_access_info_set_must_source (ai, isl_union_map_copy (must_writes)); ai = isl_union_access_info_set_may_source (ai, may_writes); + ai = isl_union_access_info_set_kill (ai, isl_union_map_copy (kills)); ai = isl_union_access_info_set_schedule (ai, isl_schedule_copy (scop->original_schedule)); isl_union_flow *flow = isl_union_access_info_compute_flow (ai); isl_union_map *raw = isl_union_flow_get_must_dependence (flow); + + if (dump_file) + { + fprintf (dump_file, "raw dependences (\n"); + print_isl_union_map (dump_file, raw); + fprintf (dump_file, ")\n"); + } + isl_union_flow_free (flow); ai = isl_union_access_info_from_sink (isl_union_map_copy (must_writes)); ai = 
isl_union_access_info_set_must_source (ai, must_writes); ai = isl_union_access_info_set_may_source (ai, reads); + ai = isl_union_access_info_set_kill (ai, kills); ai = isl_union_access_info_set_schedule (ai, isl_schedule_copy (scop->original_schedule)); flow = isl_union_access_info_compute_flow (ai); isl_union_map *waw = isl_union_flow_get_must_dependence (flow); + + if (dump_file) + { + fprintf (dump_file, "waw dependences (\n"); + print_isl_union_map (dump_file, waw); + fprintf (dump_file, ")\n"); + } isl_union_map *war = isl_union_flow_get_may_dependence (flow); war = isl_union_map_subtract (war, isl_union_map_copy (waw)); + + if (dump_file) + { + fprintf (dump_file, "war dependences (\n"); + print_isl_union_map (dump_file, war); + fprintf (dump_file, ")\n"); + } + isl_union_flow_free (flow); raw = isl_union_map_coalesce (raw); @@ -331,6 +434,9 @@ scop_get_dependences (scop_p scop) if (dump_file) { + fprintf (dump_file, "(space: " ); + print_isl_space (dump_file, space); + fprintf (dump_file, ")\n"); fprintf (dump_file, "data dependences (\n"); print_isl_union_map (dump_file, dependences); fprintf (dump_file, ")\n"); diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 073b471775de..e820e2c32202 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "tree-vectorizer.h" #include "graphite.h" +#include "graphite-oacc.h" +#include "stdlib.h" struct ast_build_info { @@ -1456,8 +1458,8 @@ generate_entry_out_of_ssa_copies (edge false_entry, } } -/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of - aliasing. */ +/* Create a condition that evaluates to TRUE if all ALIAS_DDRS + are free of aliasing. 
*/ static tree generate_alias_cond (vec &alias_ddrs, loop_p context_loop) @@ -1617,4 +1619,91 @@ graphite_regenerate_ast_isl (scop_p scop) return !t.codegen_error_p (); } +/* A callback for traversing a schedule tree that visits the band + nodes of a schedule which correspond to loops. Checks if the local + schedule carries any dependencies and marks the corresponding CFG + loops as being parallelizable accordingly. */ + +static isl_bool +visit_schedule_loop_node (__isl_keep isl_schedule_node *node, void *user) +{ + isl_bool visit_children = isl_bool_true; + + if (isl_schedule_node_get_type (node) != isl_schedule_node_band) + return visit_children; + + isl_union_map *dependences = (isl_union_map *)user; + isl_union_map *schedule + = isl_schedule_node_band_get_partial_schedule_union_map (node); + isl_space *space = isl_schedule_node_band_get_space (node); + + isl_id *id = isl_space_get_tuple_id (space, isl_dim_out); + const char *name = isl_id_get_name (id); + /* Expect format set by add_loop_schedule, i.e. 
"L_n" */ + gcc_checking_assert (name[0] == 'L' && name[1] == '_'); + int loop_num = atoi (name + 2); + isl_id_free (id); + + int dimension = isl_space_dim (space, isl_dim_out); + loop_p loop = get_loop (cfun, loop_num); + + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "CFG loop %d:\n", loop_num); + print_isl_union_map (dump_file, schedule); + fprintf (dump_file, "Schedule dimension: %d\n", dimension); + + fprintf (dump_file, "Schedule node space:\n"); + print_isl_space (dump_file, space); + fprintf (dump_file, "data dependences (\n"); + print_isl_union_map (dump_file, dependences); + fprintf (dump_file, ")\n"); + } + + bool has_deps = carries_deps (schedule, dependences, dimension); + + loop->can_be_parallel = !has_deps; + loop->can_be_parallel_valid_p = true; + + if (dump_file && dump_flags & TDF_DETAILS) + { + dump_user_location_t loc = find_loop_location (loop); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, + "loop %s data-dependences.\n", + has_deps ? "has" : "has no"); + + fprintf (dump_file, ")\n"); + } + + isl_union_map_free (schedule); + isl_space_free (space); + + + return visit_children; +} + +/* This function performs data-dependence analysis on the SCoP without using + Graphite's code generation. This is meant for OpenACC use since the code + generator is unable to reconstruct the OpenACC loop structure. */ + +bool +graphite_oacc_analyze_scop (scop_p scop) +{ + timevar_push (TV_GRAPHITE_CODE_GEN); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "[graphite_oacc_analyze_scop] schedule:\n"); + print_isl_schedule (dump_file, scop->original_schedule); + } + + /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. 
*/ + isl_schedule_foreach_schedule_node_top_down ( + scop->original_schedule, visit_schedule_loop_node, scop->dependence); + + timevar_pop (TV_GRAPHITE_CODE_GEN); + + return true; +} + #endif /* HAVE_isl */ diff --git a/gcc/graphite-oacc.c b/gcc/graphite-oacc.c new file mode 100644 index 000000000000..9b3dc7998401 --- /dev/null +++ b/gcc/graphite-oacc.c @@ -0,0 +1,688 @@ +/* Functions for analyzing the OpenACC loop structure from Graphite. + + Copyright (C) 2021 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. 
*/ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "cfghooks.h" +#include "tree.h" +#include "gimple.h" +#include "cfgloop.h" + +#include "internal-fn.h" +#include "gimple.h" +#include "tree-cfg.h" +#include "tree-pretty-print.h" +#include "gimple-pretty-print.h" +#include "print-tree.h" + +#include "gimple-ssa.h" +#include "gimple-iterator.h" +#include "tree-phinodes.h" +#include "tree-ssa-operands.h" +#include "ssa-iterators.h" +#include "omp-general.h" +#include "graphite-oacc.h" + +unsigned +gimple_call_internal_kind (gimple *call) +{ + return TREE_INT_CST_LOW (gimple_call_arg (call, 0)); +} + +static bool inline gimple_call_ifn_unique_p (gimple *call, + enum ifn_unique_kind kind) +{ + if (!gimple_call_internal_p (call, IFN_UNIQUE)) + return false; + + return kind == gimple_call_internal_kind (call); +} + +static bool inline goacc_reduction_call_p (gimple *call) +{ + return gimple_call_internal_p (call, IFN_GOACC_REDUCTION); +} + +static bool inline goacc_reduction_call_p (gimple *call, + enum ifn_goacc_reduction_kind kind) +{ + return gimple_call_internal_p (call, IFN_GOACC_REDUCTION) + && gimple_call_internal_kind (call) == kind; +} + +/* Check if VAR is private in the OpenACC loop that encloses the cfg LOOP. The + function returns TRUE if there is an IFN_UNIQUE_OACC_PRIVATE call in the + head sequence that precedes the CFG loop. 
*/ + +bool +is_oacc_private (tree var, loop_p loop) +{ + return false; + + if (TREE_CODE (var) == SSA_NAME) + { + if (!SSA_NAME_VAR (var)) + return false; + + var = SSA_NAME_VAR (var); + } + + gcc_checking_assert (TREE_CODE (var) == VAR_DECL); + + if (!loop) + return false; + + basic_block bb = loop->header; + basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); + + while (bb != entry_bb) + { + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + gimple *stmt = last_stmt (bb); + if (!stmt) + continue; + + /* We are looking for the sequence of IFN_UNIQUE calls at the + head of the current OpenACC loop. */ + if (!gimple_call_internal_p (stmt, IFN_UNIQUE)) + continue; + + enum ifn_unique_kind kind + = (enum ifn_unique_kind)TREE_INT_CST_LOW (gimple_call_arg (stmt, 0)); + + /* The head mark that starts the current OpenACC loop. + Private calls above here are irrelevant. Stop. */ + if (kind == IFN_UNIQUE_OACC_HEAD_MARK && gimple_call_num_args (stmt) > 2) + break; + + if (kind != IFN_UNIQUE_OACC_PRIVATE) + continue; + + tree private_var = gimple_call_arg (stmt, 3); + + if (TREE_CODE (private_var) == ADDR_EXPR) + private_var = TREE_OPERAND (private_var, 0); + + if (var == private_var) + return true; + } + + return false; +} + +void +oacc_add_private_var_kills (loop_p loop, vec *kills) +{ + gcc_checking_assert (loop); + + basic_block bb = loop->header; + basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); + + while (bb != entry_bb) + { + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + + gimple *stmt = last_stmt (bb); + if (!stmt) + continue; + + /* We are looking for the sequence of IFN_UNIQUE calls at the head of the + current OpenACC loop. */ + + if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK)) + continue; + + /* The head mark that starts the current OpenACC loop. + Private calls above here are irrelevant. Stop. 
*/ + if (gimple_call_num_args (stmt) > 2) + break; + + if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_PRIVATE)) + continue; + + tree private_var = gimple_call_arg (stmt, 3); + + gcc_checking_assert (TREE_CODE (private_var) == ADDR_EXPR); + private_var = TREE_OPERAND (private_var, 0); + kills->safe_push (private_var); + } +} + +typedef std::pair gcall_pair; + +/* Returns a pair that contains the internal function calls that start + and end the head sequence of the OpenACC loop enclosing the cfg + loop LOOP or a pair of NULL pointers if LOOP is not enclosed in a + OpenACC LOOP. */ + +gcall_pair +find_oacc_head_marks (loop_p loop) +{ + basic_block bb = loop->header; + basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); + + gcall *top_head_mark = NULL; + gcall *bottom_head_mark = NULL; + + while (bb != entry_bb) + { + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + + gimple *stmt = last_stmt (bb); + if (!stmt) + continue; + + /* Look for IFN_UNIQUE calls in the head of OpenACC loop. */ + if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK)) + continue; + + if (!bottom_head_mark) + { + bottom_head_mark = as_a (stmt); + continue; + } + + /* The head mark that starts the current OpenACC loop can be + recognized by the number of call arguments, cf. omp-low.c. */ + if (gimple_call_num_args (stmt) > 3) + { + top_head_mark = as_a (stmt); + break; + } + } + + gcc_checking_assert ((top_head_mark && bottom_head_mark) + || (!top_head_mark && !bottom_head_mark)); + + return gcall_pair (top_head_mark, bottom_head_mark); +} + +/* Returns the internal function call that starts the tail sequence of the + OpenACC loop that encloses the CFG loop LOOP or NULL if LOOP is not + contained in an OpenACC loop. 
*/ + +gcall * +find_oacc_top_tail_mark (loop_p loop) +{ + gcall_pair head_marks = find_oacc_head_marks (loop); + + if (!head_marks.first || !head_marks.second) + return NULL; + + tree data_dep = gimple_call_lhs (head_marks.second); + gcc_checking_assert (has_single_use (data_dep)); + + gimple *tail_mark; + use_operand_p use_p; + single_imm_use (data_dep, &use_p, &tail_mark); + + return as_a (tail_mark); +} + +/* Returns a pair containing the internal function calls that start and end the + tail sequence of the OpenACC loop that encloses the cfg loop LOOP or a pair + of NULL pointers if LOOP does not belong to an OpenACC loop. */ + +gcall_pair +find_oacc_tail_marks (loop_p loop) +{ + gcall *top_tail_mark = find_oacc_top_tail_mark (loop); + + if (!top_tail_mark) + return gcall_pair (NULL, NULL); + + tree data_dep = gimple_call_lhs (top_tail_mark); + gimple *stmt = top_tail_mark; + + while (has_single_use (data_dep)) + { + use_operand_p use_p; + single_imm_use (data_dep, &use_p, &stmt); + data_dep = gimple_call_lhs (stmt); + + gcc_checking_assert (gimple_call_internal_p (stmt)); + } + + gcall *end_tail_mark = as_a (stmt); + + gcc_checking_assert ( + gimple_call_ifn_unique_p (end_tail_mark, IFN_UNIQUE_OACC_TAIL_MARK)); + + return gcall_pair (top_tail_mark, end_tail_mark); +} + +/* Add all ssa names to VARS that can be reached from PHI by a + phi node walk. 
*/ + +static void +collect_oacc_reduction_vars_phi_walk (gphi *phi, hash_set &vars) +{ + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES) + { + tree use = USE_FROM_PTR (use_p); + if (TREE_CODE (use) != SSA_NAME) + continue; + + if (vars.contains (use)) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (use); + vars.add (use); + + gphi *use_phi = dyn_cast (def_stmt); + if (use_phi) + { + collect_oacc_reduction_vars_phi_walk (use_phi, vars); + + continue; + } + } +} + +/* Returns true iff following the immediate use chain from the + IFN_GOACC_REDUCTION call CALL leads out of loop that contains CALL. */ + +static bool +reduction_use_in_outer_loop_p (gcall *call) +{ + gcc_checking_assert (goacc_reduction_call_p (call)); + + tree data_dep = gimple_call_lhs (call); + + /* The IFN_GOACC_REDUCTION_CALLS are linked in a chain through + immediate uses. Move to the end of this chain. */ + gimple *stmt = call; + while (has_single_use (data_dep)) + { + use_operand_p use_p; + single_imm_use (data_dep, &use_p, &stmt); + + if (!goacc_reduction_call_p (stmt)) + return true; + + data_dep = gimple_call_lhs (stmt); + } + + gcc_checking_assert (goacc_reduction_call_p (stmt)); + + /* Call starting further reduction use in outer loop. */ + if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_SETUP)) + return true; + + /* Reduction use ends with last internal call in present loop. */ + if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_TEARDOWN)) + return false; + gcc_unreachable (); +} + +/* Add all ssa names to VARS that can be reached from BB by walking + through the phi nodes which start at the result of an OpenACC + reduction computation in BB. 
*/ + +static void +collect_oacc_reduction_vars_in_bb (basic_block bb, hash_set &vars) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (!goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI)) + continue; + + tree var = gimple_call_arg (stmt, 2); + gcc_checking_assert (TREE_CODE (var) == SSA_NAME); + + if (vars.contains (var)) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (var); + + if (gimple_code (def_stmt) != GIMPLE_PHI) + { + gcc_checking_assert (goacc_reduction_call_p (def_stmt)); + + continue; + } + + gcc_checking_assert ( + goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI)); + gcc_checking_assert (gimple_code (def_stmt) == GIMPLE_PHI); + + if (reduction_use_in_outer_loop_p (as_a (stmt))) + vars.add (var); + + collect_oacc_reduction_vars_phi_walk (static_cast (def_stmt), + vars); + } +} + +/* Add all ssa names to VARS that are defined by phi nodes in the header of LOOP + such that at least one argument of the phi belongs to VARS. */ + +static void +collect_oacc_reduction_vars_in_loop_header (loop_p loop, hash_set &vars) +{ + for (gphi_iterator gpi = gsi_start_phis (loop->header); !gsi_end_p (gpi); + gsi_next (&gpi)) + { + gphi *phi = const_cast (gpi.phi ()); + + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES) + { + tree use = USE_FROM_PTR (use_p); + if (vars.contains (use)) + vars.add (gimple_phi_result (phi)); + } + } +} + +/* Find the ssa names that belong to an OpenACC reduction in the OpenACC loop + that surrounds the cfg loop LOOP and add them to VARS. LOOP must be + contained in an OpenACC loop. 
+ + Since the reductions have not and cannot be lowered before execution of the + Graphite pass because their lowering is device dependent, Graphite needs to + simulate the privatization of the reduction variables by removing + dependences between the iteration instances of the loop and the dependences + arising from copying the initial value of the reduction variable in and the + result out. + + The OpenACC lowering will copy the results of reduction computations at the + IFN_GOACC_REDUCTION_FINI calls. The main reduction statement can thus be + identified by walking from those calls through all encountered phi nodes + until we reach a gimple assignment statement. The ssa name defined by this + statement as well as the ssa_names encountered in the phis along the way are + recorded in VARS. In addition, the ssa name defined by each phi which uses a + previously identified reduction variable in LOOP's header will also be added + to VARS. */ + +void +collect_oacc_reduction_vars (loop_p loop, hash_set &vars) +{ + gcall_pair tail = find_oacc_tail_marks (loop); + bool in_openacc_loop = tail.first != NULL; + + if (!in_openacc_loop) + return; + + const gcall *top_mark = tail.first; + const gcall *bottom_mark = tail.second; + + basic_block bb = top_mark->bb; + gcc_checking_assert (single_succ_p (bb)); + + do + { + bb = single_succ (bb); + collect_oacc_reduction_vars_in_bb (bb, vars); + } + while (bb != bottom_mark->bb && single_succ_p (bb)); + + collect_oacc_reduction_vars_in_loop_header (loop, vars); +} + +static void collect_oacc_privatized_vars_phi_walk_visit_phi_uses ( + tree var, hash_set &vars, hash_set &visited); + +/* Add all ssa names to VARS that can be reached from PHI by a phi node walk. 
*/ + +static void +collect_oacc_privatized_vars_phi_walk (gphi *phi, hash_set &vars, + hash_set &visited) +{ + tree var = PHI_RESULT (phi); + bool existed = vars.add (var); + if (existed) + return; + + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES) + { + tree use = USE_FROM_PTR (use_p); + if (TREE_CODE (use) != SSA_NAME) + continue; + + if (visited.contains (use)) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (use); + gphi *use_phi = dyn_cast (def_stmt); + if (use_phi) + { + collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited); + visited.add (use); + continue; + } + + vars.add (use); + + /* Visit the uses of USE in other phi nodes. This is used to get from loop + exit phis in inner loops to the loop entry phis. */ + + collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars, visited); + visited.add (use); + } +} + +/* Records all uses of VAR in phis in VARS and continues the phi walk on each + such use. */ + +static void +collect_oacc_privatized_vars_phi_walk_visit_phi_uses (tree var, + hash_set &vars, + hash_set &visited) +{ + imm_use_iterator iter; + use_operand_p use_p; + FOR_EACH_IMM_USE_FAST (use_p, iter, var) + { + tree use = USE_FROM_PTR (use_p); + if (TREE_CODE (use) != SSA_NAME) + continue; + + if (visited.contains (use)) + continue; + + gimple *use_stmt = USE_STMT (use_p); + gphi *use_phi = dyn_cast (use_stmt); + + if (use_phi) + { + visited.add (PHI_RESULT (use_phi)); + collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited); + continue; + } + + if (TREE_CODE (use) == SSA_NAME + && SSA_NAME_VAR (use) == SSA_NAME_VAR (var)) + { + if (!vars.add (use)) + collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars, + visited); + continue; + } + } + + return; +} + +/* Return the first IFN_UNIQUE call with the given KIND that follows the tail + sequence of the OpenACC loop surrounding LOOP. 
*/ + +static gcall * +find_ifn_unique_call_below (loop_p loop, enum ifn_unique_kind kind) +{ + gcall_pair tail = find_oacc_tail_marks (loop); + bool in_openacc_loop = tail.first != NULL; + + if (!in_openacc_loop) + return NULL; + + edge exit = single_exit (loop); + basic_block bb = exit->dest; + while ((bb = get_immediate_dominator (CDI_POST_DOMINATORS, bb))) + { + gimple *stmt = last_stmt (bb); + + if (!stmt) + continue; + + if (gimple_call_ifn_unique_p (stmt, kind)) + return static_cast (stmt); + } + + return NULL; +} + +/* Return the IFN_UNIQUE_OACC_PRIVATE_SCALAR call which follows the tail + sequence of the OpenACC loop surrounding LOOP. */ + +gcall * +get_oacc_private_scalars_call (loop_p loop) +{ + return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_PRIVATE_SCALAR); +} + +/* Return the IFN_UNIQUE_OACC_FIRSTPRIVATE call which follows the tail + sequence of the OpenACC loop surrounding LOOP. */ + +gcall * +get_oacc_firstprivate_call (loop_p loop) +{ + return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_FIRSTPRIVATE); +} + +/* Find the ssa names that belong to the computation of variables that are + "private" in the OpenACC loop that surrounds the CFG loop LOOP and add them + to VARS. LOOP must be contained in an OpenACC loop. + + The CFG loop structure of OpenACC loops does not directly reflect the + privatization of the variable since the original loop has been enclosed in a + "chunking" loop. The "private" scalars variables are alive in those two + outermost CFG loops and the corresponding phis must be ignored by Graphite in + order to recognize the parallelizability of the loop. 
Omp-low.c places a + special internal function call after the outermost loop of a parallel region + whose arguments list the "private" variables that are considered here */ + +void +collect_oacc_privatized_vars (gcall *marker, hash_set &vars) +{ + if (!marker) + return; + + gcc_checking_assert (marker->bb->loop_father->num == 0); + + /* Search for phis that can be reached from the vars listed in the + PRIVATE_SCALARS_CALL's arguments. */ + + const unsigned n = gimple_call_num_args (marker); + for (unsigned i = 1; i < n; ++i) + { + tree arg = gimple_call_arg (marker, i); + + if (TREE_CODE (arg) != SSA_NAME) + continue; + + gimple *def_stmt = SSA_NAME_DEF_STMT (arg); + gphi *phi = dyn_cast (def_stmt); + if (!phi) + { + /* If the argument does not point to a phi, then it must be some value + defined outside of any OpenACC loop nest, i.e. a parameter of the + loop-nest. */ + gcc_checking_assert (!def_stmt->bb + || def_stmt->bb->loop_father->num == 0); + continue; + } + + hash_set visited; + collect_oacc_privatized_vars_phi_walk (phi, vars, visited); + } +} + +/* Return true if LOOP is an OpenACC loop with an "auto" clause, false otherwise. */ + +static bool +oacc_loop_with_auto_clause_p (loop_p loop) +{ + gcall_pair head_marks = find_oacc_head_marks (loop); + + if (!head_marks.first) + return false; + + unsigned flags = TREE_INT_CST_LOW (gimple_call_arg (head_marks.first, 3)); + return flags & OLF_AUTO; +} + +/* Return true if FUN is an outlined OpenACC function that contains loops with + "auto" clauses. */ + +static bool +function_has_auto_loops_p (function *fun) +{ + gcc_checking_assert (oacc_function_p (fun)); + + for (auto loop : loops_list (fun, 0)) + if (oacc_loop_with_auto_clause_p (loop)) + return true; + + return false; +} + +/* Return true if Graphite might analyze outlined OpenACC functions for the kind + of target region for which FUN was created. The actual decision whether + Graphite runs on FUN may be subject to further restrictions. 
*/ + +bool +graphite_analyze_oacc_target_region_type_p (function *fun) +{ + gcc_checking_assert (oacc_function_p (fun)); + + bool is_oacc_parallel + = lookup_attribute ("oacc parallel", + DECL_ATTRIBUTES (current_function_decl)) + != NULL; + + bool is_oacc_parallel_kernels_graphite + = lookup_attribute ("oacc parallel_kernels_graphite", + DECL_ATTRIBUTES (current_function_decl)) + != NULL; + + return is_oacc_parallel || is_oacc_parallel_kernels_graphite; +} + +/* Return true if FUN is an outlined OpenACC function that is going to be + analyzed by Graphite. */ + +bool +graphite_analyze_oacc_function_p (function *fun) +{ + gcc_checking_assert (oacc_function_p (fun)); + + return graphite_analyze_oacc_target_region_type_p (cfun) + && function_has_auto_loops_p (cfun); +} diff --git a/gcc/graphite-oacc.h b/gcc/graphite-oacc.h new file mode 100644 index 000000000000..458e8de24dac --- /dev/null +++ b/gcc/graphite-oacc.h @@ -0,0 +1,55 @@ +/* Functions for analyzing the OpenACC loop structure from Graphite. + + Copyright (C) 2021 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. 
*/ + +#ifndef GCC_GRAPHITE_OACC_H +#define GCC_GRAPHITE_OACC_H + +#include "stringpool.h" +#include "omp-general.h" +#include "attribs.h" +#include "cfgloop.h" +#include "tree-pretty-print.h" +#include "print-tree.h" + +static inline bool oacc_function_p (function *fun) +{ + return oacc_get_fn_attrib (fun->decl); +} + +extern bool is_oacc_private (tree var, loop_p loop); +extern void oacc_add_private_var_kills (loop_p loop, vec *kills); + +extern const gcall* find_oacc_head_mark (loop_p loop, bool last = false); + +extern void collect_oacc_reduction_vars (loop_p loop, hash_set &vars); +extern void collect_oacc_firstprivate_vars (loop_p loop, hash_set &vars); +extern void collect_oacc_private_scalars (loop_p loop, hash_set &vars); +extern void collect_oacc_privatized_vars (gcall *marker, hash_set &vars); + +extern gcall* get_oacc_firstprivate_call (loop_p loop); +extern gcall* get_oacc_private_scalars_call (loop_p loop); + +extern bool graphite_analyze_oacc_function_p (function *fun); +extern bool graphite_analyze_oacc_target_region_type_p (function *fun); + +extern gcall* get_oacc_firstprivate_call (loop_p loop); +extern gcall* get_oacc_private_scalars_call (loop_p loop); + +#endif /* GCC_GRAPHITE_OACC_H */ diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c index 6928f3e33dca..019452700a49 100644 --- a/gcc/graphite-optimize-isl.c +++ b/gcc/graphite-optimize-isl.c @@ -109,8 +109,8 @@ scop_get_domains (scop_p scop) /* Compute the schedule for SCOP based on its parameters, domain and set of constraints. Then apply the schedule to SCOP. 
*/ -static bool -optimize_isl (scop_p scop) +bool +optimize_isl (scop_p scop, bool oacc_enabled_graphite) { int old_err = isl_options_get_on_error (scop->isl_context); int old_max_operations = isl_ctx_get_max_operations (scop->isl_context); @@ -196,7 +196,8 @@ optimize_isl (scop_p scop) print_schedule_ast (dump_file, scop->original_schedule, scop); isl_schedule_free (scop->transformed_schedule); scop->transformed_schedule = isl_schedule_copy (scop->original_schedule); - return flag_graphite_identity || flag_loop_parallelize_all; + return flag_graphite_identity || flag_loop_parallelize_all + || oacc_enabled_graphite; } return true; diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c index a7aabcb33c99..810f7a9918bc 100644 --- a/gcc/graphite-poly.c +++ b/gcc/graphite-poly.c @@ -89,7 +89,8 @@ debug_iteration_domains (scop_p scop) void new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type, - isl_map *acc, isl_set *subscript_sizes) + isl_map *acc, isl_set *subscript_sizes, + bool is_reduction) { static int id = 0; poly_dr_p pdr = XNEW (struct poly_dr); @@ -102,10 +103,12 @@ new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type, pdr->subscript_sizes = subscript_sizes; PDR_TYPE (pdr) = type; PBB_DRS (pbb).safe_push (pdr); + pdr->is_reduction = is_reduction; if (dump_file) { - fprintf (dump_file, "Converting dr: "); + fprintf (dump_file, "Converting%sdr: ", + is_reduction ? 
" reduction " : " "); print_pdr (dump_file, pdr); fprintf (dump_file, "To polyhedral representation:\n"); fprintf (dump_file, " - access functions: "); @@ -181,6 +184,10 @@ print_pdr (FILE *file, poly_dr_p pdr) fprintf (file, "may_write \n"); break; + case PDR_KILL: + fprintf (file, "kill \n"); + break; + default: gcc_unreachable (); } @@ -206,13 +213,15 @@ debug_pdr (poly_dr_p pdr) gimple_poly_bb_p new_gimple_poly_bb (basic_block bb, vec drs, - vec reads, vec writes) + vec reads, vec writes, + vec kills) { gimple_poly_bb_p gbb = XNEW (struct gimple_poly_bb); GBB_BB (gbb) = bb; GBB_DATA_REFS (gbb) = drs; gbb->read_scalar_refs = reads; gbb->write_scalar_refs = writes; + gbb->kill_scalar_refs = kills; GBB_CONDITIONS (gbb).create (0); GBB_CONDITION_CASES (gbb).create (0); @@ -229,6 +238,7 @@ free_gimple_poly_bb (gimple_poly_bb_p gbb) GBB_CONDITION_CASES (gbb).release (); gbb->read_scalar_refs.release (); gbb->write_scalar_refs.release (); + gbb->kill_scalar_refs.release (); XDELETE (gbb); } @@ -255,6 +265,9 @@ new_scop (edge entry, edge exit) scop_set_region (s, region); s->pbbs.create (3); s->drs.create (3); + s->reduction_vars = new hash_set(1); + s->oacc_firstprivate_vars = new hash_set(1); + s->oacc_private_scalars = new hash_set(1); s->unhandled_alias_ddrs.create (1); s->dependence = NULL; return s; @@ -273,6 +286,9 @@ free_scop (scop_p scop) scop->pbbs.release (); scop->drs.release (); + delete scop->reduction_vars; + delete scop->oacc_firstprivate_vars; + delete scop->oacc_private_scalars; scop->unhandled_alias_ddrs.release (); isl_set_free (scop->param_context); @@ -529,6 +545,23 @@ debug_isl_map (__isl_keep isl_map *map) print_isl_map (stderr, map); } + +void +print_isl_space (FILE *f, __isl_keep isl_space *space) +{ + isl_printer *p = isl_printer_to_file (the_isl_ctx, f); + p = isl_printer_set_yaml_style (p, ISL_YAML_STYLE_BLOCK); + p = isl_printer_print_space (p, space); + p = isl_printer_print_str (p, "\n"); + isl_printer_free (p); +} + +DEBUG_FUNCTION 
void +debug_isl_space (__isl_keep isl_space *space) +{ + print_isl_space (stderr, space); +} + void print_isl_union_map (FILE *f, __isl_keep isl_union_map *map) { diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 924004e3f3c4..234dbe0ec729 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -49,6 +49,10 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "cfganal.h" #include "graphite.h" +#include "omp-general.h" +#include "graphite-oacc.h" +#include "print-tree.h" +#include "internal-fn.h" class debug_printer { @@ -527,14 +531,13 @@ scop_detection::merge_sese (sese_l first, sese_l second) const static void print_sese_loop_numbers (FILE *file, sese_l sese) { - loop_p loop; bool printed = false; - FOR_EACH_LOOP (loop, 0) - { - if (loop_in_sese_p (loop, sese)) - fprintf (file, "%d, ", loop->num); - printed = true; - } + for (auto loop : loops_list (cfun, 0)) + { + if (loop_in_sese_p (loop, sese)) + fprintf (file, "%d, ", loop->num); + printed = true; + } if (printed) fprintf (file, "\b\b"); } @@ -630,7 +633,9 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop) DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n"); return false; } - if (!niter_desc.control.no_overflow) + /* TODO The zero niter can probably be allowed in general */ + if (!niter_desc.control.no_overflow + && !(oacc_function_p (cfun) && integer_zerop (niter))) { DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n"); return false; @@ -701,8 +706,7 @@ scop_detection::add_scop (sese_l s) s.exit = single_succ_edge (s.exit->dest); } - /* Do not add scops with only one loop. 
*/ - if (region_has_one_loop (s)) + if (!oacc_function_p (cfun) && region_has_one_loop (s)) { DEBUG_PRINT (dp << "[scop-detection-fail] Discarding one loop SCoP: "; print_sese (dump_file, s)); @@ -729,11 +733,10 @@ scop_detection::add_scop (sese_l s) if (dump_file && dump_flags & TDF_DETAILS) { - loop_p loop; fprintf (dump_file, "Loops in SCoP: "); - FOR_EACH_LOOP (loop, 0) - if (loop_in_sese_p (loop, s)) - fprintf (dump_file, "%d ", loop->num); + for (auto loop : loops_list (cfun, 0)) + if (loop_in_sese_p (loop, s)) + fprintf (dump_file, "%d ", loop->num); fprintf (dump_file, "\n"); } } @@ -1084,6 +1087,17 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt) return true; } +/* Check if STMT is an internal OpenACC function call that should be ignored when + Graphite checks side effects. */ + +static inline bool +ignored_oacc_internal_call_p (gimple *stmt) +{ + return is_gimple_call (stmt) + && (gimple_call_internal_p (stmt, IFN_UNIQUE) + || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION)); +} + /* GIMPLE_ASM and GIMPLE_CALL may embed arbitrary side effects. Calls have side-effects, except those to const or pure functions.
*/ @@ -1091,6 +1105,9 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt) static bool stmt_has_side_effects (gimple *stmt) { + if (ignored_oacc_internal_call_p (stmt)) + return false; + if (gimple_has_volatile_ops (stmt) || (gimple_code (stmt) == GIMPLE_CALL && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))) @@ -1288,6 +1305,7 @@ scan_tree_for_params (sese_info_p s, tree e) case NEGATE_EXPR: case BIT_NOT_EXPR: CASE_CONVERT: + case VIEW_CONVERT_EXPR: case NON_LVALUE_EXPR: scan_tree_for_params (s, TREE_OPERAND (e, 0)); break; @@ -1362,6 +1380,9 @@ find_scop_parameters (scop_p scop) static void add_write (vec *writes, tree def) { + if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def))) + return; + writes->safe_push (def); DEBUG_PRINT (dp << "Adding scalar write: "; print_generic_expr (dump_file, def); @@ -1370,9 +1391,27 @@ add_write (vec *writes, tree def) SSA_NAME_DEF_STMT (def), 0)); } +static void +add_kill (vec *kills, tree def) +{ + if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def))) + return; + + kills->safe_push (def); + DEBUG_PRINT (dp << "Adding scalar kill: "; + print_generic_expr (dump_file, def); + dp << "\n"); +} + static void add_read (vec *reads, tree use, gimple *use_stmt) { + gcc_assert (TREE_CODE (use) == SSA_NAME); + + if ((use_stmt && ignored_oacc_internal_call_p (use_stmt)) + || ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (use))) + return; + DEBUG_PRINT (dp << "Adding scalar read: "; print_generic_expr (dump_file, use); dp << "\nFrom stmt: "; @@ -1428,6 +1467,58 @@ build_cross_bb_scalars_use (scop_p scop, tree use, gimple *use_stmt, add_read (reads, use, use_stmt); } +/* Add kills for all ssa names in vector FROM to vector KILLS. 
*/ + +static void add_kills (hash_set* from, vec &kills) +{ + hash_set::iterator end = from->end(); + hash_set::iterator it = from->begin (); + for (; it != end; ++it) + { + tree var = *it; + add_kill (&kills, var); + } +} + +/* Add kill operations for the privatized OpenACC variables that have been + recorded for SCOP for the basic block BB into the vector KILLS. */ + +static void +add_oacc_kills (scop_p scop, basic_block bb, vec &kills) +{ + + loop_p loop = bb->loop_father; + + /* Right now we only handle "firstprivate" and "private" variables that occur + on an OpenACC compute region. Those affect only the outermost and hence - + because of the "chunking" loop created in omp-expand.c around the original + loop - the two outermost CFG loops. */ + if (loop_depth (loop) > 2) + return; + + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (e->src == loop->header) + { + add_kills (scop->oacc_private_scalars, kills); + add_kills (scop->oacc_firstprivate_vars, kills); + break; + } + } + + FOR_EACH_EDGE (e, ei, bb->succs) + { + if (e->dest == loop->header) + { + add_kills (scop->oacc_private_scalars, kills); + add_kills (scop->oacc_firstprivate_vars, kills); + break; + } + } +} + /* Generates a polyhedral black box only if the bb contains interesting information.
*/ @@ -1436,6 +1527,7 @@ try_generate_gimple_bb (scop_p scop, basic_block bb) { vec drs = vNULL; vec writes = vNULL; + vec kills = vNULL; vec reads = vNULL; sese_l region = scop->scop_info->region; @@ -1497,10 +1589,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb) gsi_next (&psi)) { gphi *phi = psi.phi (); - tree res = gimple_phi_result (phi); - if (virtual_operand_p (res)) - continue; - /* To simulate out-of-SSA the predecessor of edges into PHI nodes + tree res = gimple_phi_result (phi); + if (virtual_operand_p (res)) + continue; + + if (scop->oacc_private_scalars->contains (res) + || scop->oacc_firstprivate_vars->contains (res)) + continue; + + /* To simulate out-of-SSA the predecessor of edges into PHI nodes has a copy from the PHI argument to the PHI destination. */ if (! scev_analyzable_p (res, scop->scop_info->region)) add_write (&writes, res); @@ -1536,10 +1633,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb) } } - if (drs.is_empty () && writes.is_empty () && reads.is_empty ()) + if (loop && /* i.e. BB belongs to SCOP. */ + oacc_function_p (cfun)) + add_oacc_kills (scop, bb, kills); + + if (drs.is_empty () && writes.is_empty () && reads.is_empty () + && kills.is_empty ()) return NULL; - return new_gimple_poly_bb (bb, drs, reads, writes); + return new_gimple_poly_bb (bb, drs, reads, writes, kills); } /* Checks if all parts of DR are defined outside of REGION. 
This allows an @@ -1802,10 +1904,21 @@ private: auto_vec conditions, cases; scop_p scop; }; -} + gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo) - : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop) + : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop) { + if (oacc_function_p (cfun)) + { + edge scop_entry = scop->scop_info->region.entry; + loop_p loop = scop_entry->dest->loop_father; + gcall *firstprivate_call = get_oacc_firstprivate_call (loop); + collect_oacc_privatized_vars (firstprivate_call, + *scop->oacc_firstprivate_vars); + + gcall *private_call = get_oacc_private_scalars_call (loop); + collect_oacc_privatized_vars (private_call, *scop->oacc_private_scalars); + } } /* Call-back for dom_walk executed before visiting the dominated @@ -1864,6 +1977,8 @@ gather_bbs::before_dom_children (basic_block bb) data_reference_p dr; FOR_EACH_VEC_ELT (gbb->data_refs, i, dr) { + gcc_checking_assert (! ignored_oacc_internal_call_p (DR_STMT (dr))); + DEBUG_PRINT (dp << "Adding memory "; if (dr->is_read) dp << "read: "; @@ -1899,6 +2014,8 @@ gather_bbs::after_dom_children (basic_block bb) } } +} + /* Compute sth like an execution order, dominator order with first executing edges that stay inside the current loop, delaying processing exit edges. */ @@ -1921,6 +2038,21 @@ cmp_pbbs (const void *pa, const void *pb) return 0; } +/* Analyze the OpenACC loop structure surrounding SCOP to determine the ssa + names that belong to OpenACC reduction computations. */ + +static void +determine_openacc_reductions (scop_p scop) +{ + for (auto loop : loops_list (cfun, 0)) + { + if (!loop_in_sese_p (loop, scop->scop_info->region)) + continue; + + collect_oacc_reduction_vars (loop, *scop->reduction_vars); + } +} + /* Find Static Control Parts (SCoP) in the current function and pushes them to SCOPS. */ @@ -1956,11 +2088,12 @@ build_scops (vec *scops) /* Sort pbbs after execution order for initial schedule generation. 
*/ scop->pbbs.qsort (cmp_pbbs); - if (! build_alias_set (scop)) - { - DEBUG_PRINT (dp << "[scop-detection-fail] cannot handle dependences\n"); - free_scop (scop); - continue; + if (!build_alias_set (scop)) + { + DEBUG_PRINT (dp + << "[scop-detection-fail] cannot handle dependences\n"); + free_scop (scop); + continue; } /* Do not optimize a scop containing only PBBs that do not belong @@ -1997,6 +2130,9 @@ build_scops (vec *scops) continue; } + if (oacc_function_p (cfun)) + determine_openacc_reductions (scop); + scops->safe_push (scop); } diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 33d6a98327b8..e6aced3b0004 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. If not see #include "gimplify.h" #include "gimplify-me.h" #include "tree-cfg.h" +#include "graphite-oacc.h" #include "tree-ssa-loop-manip.h" #include "tree-ssa-loop-niter.h" #include "tree-ssa-loop.h" @@ -46,6 +47,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "domwalk.h" #include "tree-ssa-propagate.h" +#include "tree-pretty-print.h" +#include "gimple-pretty-print.h" +#include "internal-fn.h" #include "graphite.h" /* Return an isl identifier for the polyhedral basic block PBB. */ @@ -201,6 +205,8 @@ parameter_index_in_region (tree name, sese_info_p region) return -1; } +tree oacc_ifn_call_extract (gimple*); + /* Extract an affine expression from the tree E in the scop S. */ static isl_pw_aff * @@ -604,6 +610,21 @@ pdr_add_data_dimensions (isl_set *subscript_sizes, scop_p scop, return isl_set_coalesce (subscript_sizes); } +static inline bool +oacc_internal_call_p (gimple *stmt) +{ + if (!stmt || !is_gimple_call (stmt)) + return false; + + /* graphite-scop-detection.c should filter out those calls. */ + gcc_assert (!gimple_call_internal_p (stmt, IFN_UNIQUE)); + + /* Should be handled by scalar evolution analysis. 
*/ + gcc_assert (!gimple_call_internal_p (stmt, IFN_GOACC_LOOP)); + + return false; +} + /* Build data accesses for DRI. */ static void @@ -640,13 +661,18 @@ build_poly_dr (dr_info &dri) subscript_sizes = pdr_add_data_dimensions (subscript_sizes, scop, dr); } - new_poly_dr (pbb, DR_STMT (dr), DR_IS_READ (dr) ? PDR_READ : PDR_WRITE, - acc, subscript_sizes); + if (oacc_internal_call_p (DR_STMT (dr))) + return; + + bool is_reduction = scop->reduction_vars->contains (DR_BASE_ADDRESS (dr)); + enum poly_dr_type dr_type = DR_IS_READ (dr) ? PDR_READ : PDR_WRITE; + + new_poly_dr (pbb, DR_STMT (dr), dr_type, acc, subscript_sizes, is_reduction); } static void build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind, - isl_map *acc, isl_set *subscript_sizes) + isl_map *acc, isl_set *subscript_sizes, bool is_reduction) { scop_p scop = PBB_SCOP (pbb); /* Each scalar variable has a unique alias set number starting from @@ -663,7 +689,7 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind, c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1); new_poly_dr (pbb, stmt, kind, isl_map_add_constraint (acc, c), - subscript_sizes); + subscript_sizes, is_reduction); } /* Record all cross basic block scalar variables in PBB. 
*/ @@ -675,6 +701,7 @@ build_poly_sr (poly_bb_p pbb) gimple_poly_bb_p gbb = PBB_BLACK_BOX (pbb); vec &reads = gbb->read_scalar_refs; vec &writes = gbb->write_scalar_refs; + vec &kills = gbb->kill_scalar_refs; isl_space *dc = isl_set_get_space (pbb->domain); int nb_out = 1; @@ -689,13 +716,39 @@ build_poly_sr (poly_bb_p pbb) int i; tree var; FOR_EACH_VEC_ELT (writes, i, var) + { + if (oacc_internal_call_p (SSA_NAME_DEF_STMT (var))) + continue; + + bool is_reduction = scop->reduction_vars->contains (var); + build_poly_sr_1 (pbb, SSA_NAME_DEF_STMT (var), var, PDR_WRITE, - isl_map_copy (acc), isl_set_copy (subscript_sizes)); + isl_map_copy (acc), isl_set_copy (subscript_sizes), + is_reduction); + } + + FOR_EACH_VEC_ELT (kills, i, var) + { + build_poly_sr_1 (pbb, NULL, var, PDR_KILL, + isl_map_copy (acc), isl_set_copy (subscript_sizes), + false); + } scalar_use *use; FOR_EACH_VEC_ELT (reads, i, use) + { + tree use_var = use->second; + gcc_checking_assert (TREE_CODE (use_var) == SSA_NAME); + + if (oacc_internal_call_p (use->first) + || oacc_internal_call_p (SSA_NAME_DEF_STMT (use->second))) + continue; + + bool is_reduction = scop->reduction_vars->contains (use->second); + build_poly_sr_1 (pbb, use->first, use->second, PDR_READ, isl_map_copy (acc), - isl_set_copy (subscript_sizes)); + isl_set_copy (subscript_sizes), is_reduction); + } isl_map_free (acc); isl_set_free (subscript_sizes); diff --git a/gcc/graphite.c b/gcc/graphite.c index 0060caea22ed..293d5425ff15 100644 --- a/gcc/graphite.c +++ b/gcc/graphite.c @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3. If not see #include "cfghooks.h" #include "tree.h" #include "gimple.h" +#include "gimple-iterator.h" +#include "gimplify-me.h" #include "ssa.h" #include "fold-const.h" #include "gimple-iterator.h" @@ -58,6 +60,14 @@ along with GCC; see the file COPYING3. 
If not see #include "tree-ssa.h" #include "tree-into-ssa.h" #include "graphite.h" +#include "graphite-oacc.h" +#include "cgraph.h" +#include "gimple-pretty-print.h" +#include "print-tree.h" +#include "tree-pretty-print.h" +#include "internal-fn.h" + +static bool have_isl = true; /* Print global statistics to FILE. */ @@ -416,9 +426,12 @@ graphite_transform_loops (void) vec scops = vNULL; isl_ctx *ctx; - /* If a function is parallel it was most probably already run through graphite - once. No need to run again. */ - if (parallelized_function_p (cfun->decl)) + /* If a function is parallel it was most probably already run through + graphite once. No need to run again. This is not true for OpenACC + functions. The function was created for offloading, but we still might have + to figure out which loops may be parallelized. */ + + if (parallelized_function_p (cfun->decl) && !oacc_function_p (cfun)) return; calculate_dominance_info (CDI_DOMINATORS); @@ -444,6 +457,7 @@ graphite_transform_loops (void) seir_cache = new hash_map; calculate_dominance_info (CDI_POST_DOMINATORS); + set_scev_analyze_openacc_calls (oacc_function_p (cfun)); build_scops (&scops); free_dominance_info (CDI_POST_DOMINATORS); @@ -457,26 +471,50 @@ graphite_transform_loops (void) print_global_statistics (dump_file); } - FOR_EACH_VEC_ELT (scops, i, scop) - if (dbg_cnt (graphite_scop)) - { - scop->isl_context = ctx; - if (!build_poly_scop (scop)) - continue; - - if (!apply_poly_transforms (scop)) - continue; - - changed = true; - if (graphite_regenerate_ast_isl (scop) - && dump_enabled_p ()) - { - dump_user_location_t loc = find_loop_location - (scops[i]->scop_info->region.entry->dest->loop_father); - dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, - "loop nest optimized\n"); - } - } + if (oacc_function_p (cfun)) + { + /* OpenACC uses Graphite for dependence analysis only. + Code generation would need to understand the + OpenACC internal function calls before it could be + enabled.
*/ + + FOR_EACH_VEC_ELT (scops, i, scop) + if (dbg_cnt (graphite_scop)) + { + scop->isl_context = ctx; + if (!build_poly_scop (scop)) + continue; + + if (!optimize_isl (scop, true)) + continue; + + graphite_oacc_analyze_scop (scop); + changed = true; + } + set_scev_analyze_openacc_calls (false); + } + else // Non-OpenACC-functions + { + FOR_EACH_VEC_ELT (scops, i, scop) + if (dbg_cnt (graphite_scop)) + { + scop->isl_context = ctx; + if (!build_poly_scop (scop)) + continue; + + if (!apply_poly_transforms (scop)) + continue; + + changed = true; + if (graphite_regenerate_ast_isl (scop) && dump_enabled_p ()) + { + dump_user_location_t loc = find_loop_location ( + scops[i]->scop_info->region.entry->dest->loop_father); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, + "loop nest optimized\n"); + } + } + } delete seir_cache; seir_cache = NULL; @@ -518,6 +556,8 @@ graphite_transform_loops (void) #else /* If isl is not available: #ifndef HAVE_isl. */ +static bool have_isl = false; + static void graphite_transform_loops (void) { @@ -530,7 +570,10 @@ graphite_transform_loops (void) static unsigned int graphite_transforms (struct function *fun) { - if (number_of_loops (fun) <= 1) + + unsigned num_loops = number_of_loops (fun); + if (num_loops == 0 + || (num_loops == 1 && !oacc_function_p (cfun))) return 0; graphite_transform_loops (); @@ -538,14 +581,35 @@ graphite_transforms (struct function *fun) return 0; } +/* Return TRUE if fun is an OpenACC outlined function that should be analyzed + by Graphite. */ + +static inline bool oacc_enable_graphite_p (function *fun) +{ + if (!flag_openacc || !oacc_get_fn_attrib (fun->decl)) + return false; + + if (!graphite_analyze_oacc_target_region_type_p (fun)) + return false; + + bool optimizing = global_options.x_optimize <= 0; + /* Enabling Graphite if isl is not available aborts compilation. Prefer to + skip it and emit a warning, unless optimizations are enabled. 
*/ + if (!have_isl && !optimizing) + warning (OPT_Wall, "Unable to analyze OpenACC regions with Graphite; isl " + "is not available."); + return true; +} + static bool -gate_graphite_transforms (void) +gate_graphite_transforms (function *fun) { /* Enable -fgraphite pass if any one of the graphite optimization flags is turned on. */ if (flag_graphite_identity || flag_loop_parallelize_all - || flag_loop_nest_optimize) + || flag_loop_nest_optimize + || oacc_enable_graphite_p (fun)) flag_graphite = 1; return flag_graphite != 0; @@ -574,7 +638,7 @@ public: {} /* opt_pass methods: */ - virtual bool gate (function *) { return gate_graphite_transforms (); } + virtual bool gate (function *fun) { return gate_graphite_transforms (fun); } }; // class pass_graphite @@ -609,7 +673,7 @@ public: {} /* opt_pass methods: */ - virtual bool gate (function *) { return gate_graphite_transforms (); } + virtual bool gate (function *fun) { return gate_graphite_transforms (fun); } virtual unsigned int execute (function *fun) { return graphite_transforms (fun); } }; // class pass_graphite_transforms diff --git a/gcc/graphite.h b/gcc/graphite.h index 03febfa39986..9c508f31109f 100644 --- a/gcc/graphite.h +++ b/gcc/graphite.h @@ -42,7 +42,8 @@ enum poly_dr_type /* PDR_MAY_READs are represented using PDR_READS. This does not limit the expressiveness. */ PDR_WRITE, - PDR_MAY_WRITE + PDR_MAY_WRITE, + PDR_KILL }; struct poly_dr @@ -61,6 +62,9 @@ struct poly_dr enum poly_dr_type type; + /* Indicates that this PDR is part of an OpenACC "reduction" computation. */ + bool is_reduction; + /* The access polyhedron contains the polyhedral space this data reference will access. 
@@ -185,7 +189,7 @@ struct poly_dr #define PDR_ACCESSES(PDR) (NULL) void new_poly_dr (poly_bb_p, gimple *, enum poly_dr_type, - isl_map *, isl_set *); + isl_map *, isl_set *, bool); void debug_pdr (poly_dr_p); void print_pdr (FILE *, poly_dr_p); @@ -211,6 +215,14 @@ pdr_may_write_p (poly_dr_p pdr) return PDR_TYPE (pdr) == PDR_MAY_WRITE; } +/* Returns true when PDR is a "kill". */ + +static inline bool +pdr_kill_p (poly_dr_p pdr) +{ + return PDR_TYPE (pdr) == PDR_KILL; +} + /* POLY_BB represents a blackbox in the polyhedral model. */ struct poly_bb @@ -281,6 +293,8 @@ extern void print_isl_aff (FILE *, isl_aff *); extern void print_isl_constraint (FILE *, isl_constraint *); extern void print_isl_schedule (FILE *, isl_schedule *); extern void debug_isl_schedule (isl_schedule *); +extern void print_isl_space (FILE *, isl_space *); +extern void debug_isl_space (isl_space *); extern void print_isl_ast (FILE *, isl_ast_node *); extern void debug_isl_ast (isl_ast_node *); extern void debug_isl_set (isl_set *); @@ -380,6 +394,18 @@ struct scop /* All the data references in this scop. */ vec drs; + /* This set contains the ssa names that are OpenACC "reduction" variables + in the loops from SCOP using them. */ + hash_set *reduction_vars; + + /* If SCOP is contained in an OpenACC compute region, this is the set of + ssa names that are "firstprivate" in this region. */ + hash_set *oacc_firstprivate_vars; + + /* If SCOP is contained in an OpenACC compute region, this is the set of + ssa names that are "private" in this region. */ + hash_set *oacc_private_scalars; + /* The context describes known restrictions concerning the parameters and relations in between the parameters. 
@@ -411,7 +437,8 @@ struct scop extern scop_p new_scop (edge, edge); extern void free_scop (scop_p); extern gimple_poly_bb_p new_gimple_poly_bb (basic_block, vec, - vec, vec); + vec, vec, vec); +extern bool optimize_isl (scop_p, bool = false); extern bool apply_poly_transforms (scop_p); /* Set the region of SCOP to REGION. */ @@ -447,10 +474,10 @@ carries_deps (__isl_keep isl_union_map *schedule, extern bool build_poly_scop (scop_p); extern bool graphite_regenerate_ast_isl (scop_p); +extern bool graphite_oacc_analyze_scop (scop_p); extern void build_scops (vec *); extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree); extern void dot_all_sese (FILE *, vec &); extern void dot_sese (sese_l &); extern void dot_cfg (); - #endif diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 0cba95411a63..8a96f7600f68 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -3004,6 +3004,10 @@ expand_UNIQUE (internal_fn, gcall *stmt) else gcc_unreachable (); break; + case IFN_UNIQUE_OACC_PRIVATE: + case IFN_UNIQUE_OACC_PRIVATE_SCALAR: + case IFN_UNIQUE_OACC_FIRSTPRIVATE: + break; } if (pattern) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 19d0f849a5ad..d1028f05b0d8 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -40,7 +40,9 @@ along with GCC; see the file COPYING3. If not see DEF(UNSPEC), \ DEF(OACC_FORK), DEF(OACC_JOIN), \ DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK), \ - DEF(OACC_PRIVATE) + DEF(OACC_PRIVATE), \ + DEF(OACC_PRIVATE_SCALAR), \ + DEF(OACC_FIRSTPRIVATE) enum ifn_unique_kind { #define DEF(X) IFN_UNIQUE_##X diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index 70957a66da83..365d167b6428 100644 --- a/gcc/omp-expand.c +++ b/gcc/omp-expand.c @@ -108,6 +108,10 @@ struct omp_region /* The ordered stmt if type is GIMPLE_OMP_ORDERED and it has a depend clause. */ gomp_ordered *ord_stmt; + + /* True if this is nested inside an OpenACC kernels construct that + will be handled by the "parloops" pass. 
*/ + bool inside_kernels_p; }; static struct omp_region *root_omp_region; @@ -8110,7 +8114,24 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt) expand_omp_simd (region, &fd); else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP) { - gcc_assert (!inner_stmt && !fd.non_rect); + struct omp_region *target_region; + for (target_region = region->outer; target_region; + target_region = target_region->outer) + { + if (region->type == GIMPLE_OMP_TARGET) + { + gomp_target *entry_stmt + = as_a (last_stmt (target_region->entry)); + + if (gimple_omp_target_kind (entry_stmt) + == GF_OMP_TARGET_KIND_OACC_KERNELS) + gcc_checking_assert ( + param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS + && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS); + } + } + + gcc_assert (!inner_stmt); expand_oacc_for (region, &fd); } else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_TASKLOOP) @@ -9515,6 +9536,10 @@ static void mark_loops_in_oacc_kernels_region (basic_block region_entry, basic_block region_exit) { + gcc_checking_assert (param_openacc_kernels + == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + class loop *outer = region_entry->loop_father; gcc_assert (region_exit == NULL || outer == region_exit->loop_father); @@ -9679,24 +9704,29 @@ expand_omp_target (struct omp_region *region) entry_stmt = as_a (last_stmt (region->entry)); target_kind = gimple_omp_target_kind (entry_stmt); + if (!(param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)) + gcc_checking_assert (target_kind != GF_OMP_TARGET_KIND_OACC_KERNELS); + new_bb = region->entry; offloaded = is_gimple_omp_offloaded (entry_stmt); switch (target_kind) { + case GF_OMP_TARGET_KIND_OACC_PARALLEL: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: + 
case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_REGION: case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: case GF_OMP_TARGET_KIND_EXIT_DATA: - case GF_OMP_TARGET_KIND_OACC_PARALLEL: case GF_OMP_TARGET_KIND_OACC_KERNELS: - case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_UPDATE: case GF_OMP_TARGET_KIND_OACC_ENTER_DATA: case GF_OMP_TARGET_KIND_OACC_EXIT_DATA: case GF_OMP_TARGET_KIND_OACC_DECLARE: - case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: - case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: case GF_OMP_TARGET_KIND_DATA: case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: @@ -9736,6 +9766,12 @@ expand_omp_target (struct omp_region *region) NULL_TREE, DECL_ATTRIBUTES (child_fn)); break; case GF_OMP_TARGET_KIND_OACC_KERNELS: + gcc_checking_assert ( + param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + + mark_loops_in_oacc_kernels_region (region->entry, region->exit); + DECL_ATTRIBUTES (child_fn) = tree_cons (get_identifier ("oacc kernels"), NULL_TREE, DECL_ATTRIBUTES (child_fn)); @@ -9755,6 +9791,11 @@ expand_omp_target (struct omp_region *region) = tree_cons (get_identifier ("oacc parallel_kernels_gang_single"), NULL_TREE, DECL_ATTRIBUTES (child_fn)); break; + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: + DECL_ATTRIBUTES (child_fn) + = tree_cons (get_identifier ("oacc parallel_kernels_graphite"), + NULL_TREE, DECL_ATTRIBUTES (child_fn)); + break; default: /* Make sure we don't miss any. 
*/ gcc_checking_assert (!(is_gimple_omp_oacc (entry_stmt) @@ -9967,6 +10008,7 @@ expand_omp_target (struct omp_region *region) case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: start_ix = BUILT_IN_GOACC_PARALLEL; break; case GF_OMP_TARGET_KIND_OACC_DATA: @@ -10448,14 +10490,15 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent, case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: - break; + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: + case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: + break; case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: case GF_OMP_TARGET_KIND_EXIT_DATA: case GF_OMP_TARGET_KIND_DATA: case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: - case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: case GF_OMP_TARGET_KIND_OACC_UPDATE: case GF_OMP_TARGET_KIND_OACC_ENTER_DATA: case GF_OMP_TARGET_KIND_OACC_EXIT_DATA: @@ -10638,7 +10681,10 @@ public: /* opt_pass methods: */ virtual bool gate (function *fun) { - return !(fun->curr_properties & PROP_gimple_eomp); + return !(fun->curr_properties & PROP_gimple_eomp) + && (!oacc_get_kernels_attrib (cfun->decl) + || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); } virtual unsigned int execute (function *) { return execute_expand_omp (); } opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); } @@ -10708,6 +10754,8 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region, case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: + case 
GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: break; case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: @@ -10715,7 +10763,6 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region, case GF_OMP_TARGET_KIND_DATA: case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: - case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: case GF_OMP_TARGET_KIND_OACC_UPDATE: case GF_OMP_TARGET_KIND_OACC_ENTER_DATA: case GF_OMP_TARGET_KIND_OACC_EXIT_DATA: diff --git a/gcc/omp-general.c b/gcc/omp-general.c index 27a1bc8092c8..1940c96a200c 100644 --- a/gcc/omp-general.c +++ b/gcc/omp-general.c @@ -2929,6 +2929,15 @@ oacc_get_fn_attrib (tree fn) return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn)); } +/* Retrieve the oacc kernels attrib and return it. Non-oacc + functions will return NULL. */ + +tree +oacc_get_kernels_attrib (tree fn) +{ + return lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)); +} + /* Return true if FN is an OpenMP or OpenACC offloading function. */ bool @@ -2955,10 +2964,16 @@ oacc_get_fn_dim_size (tree fn, int axis) dims = TREE_CHAIN (dims); tree v = TREE_VALUE (dims); - /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to - avoid ICE for some OpenACC 'kernels' ("parloops") constructs. */ + /* TODO-kernels With 'pass_oacc_device_lower' moved "later", this is necessary + to avoid ICE for some OpenACC 'kernels' ("parloops") constructs. 
*/ if (v == NULL_TREE) - return 0; + { + gcc_checking_assert ( + param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + + return 0; + } int size = TREE_INT_CST_LOW (v); diff --git a/gcc/omp-general.h b/gcc/omp-general.h index 8fe744c6a7af..28584ed8d56e 100644 --- a/gcc/omp-general.h +++ b/gcc/omp-general.h @@ -119,6 +119,7 @@ extern int oacc_verify_routine_clauses (tree, tree *, location_t, const char *); extern tree oacc_build_routine_dims (tree clauses); extern tree oacc_get_fn_attrib (tree fn); +extern tree oacc_get_kernels_attrib (tree fn); extern bool offloading_function_p (tree fn); extern int oacc_get_fn_dim_size (tree fn, int axis); extern int oacc_get_ifn_dim_arg (const gimple *stmt); diff --git a/gcc/omp-low.c b/gcc/omp-low.c index f58a191e014c..afd6061ae1e9 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -154,6 +154,12 @@ struct omp_context /* True if this construct can be cancelled. */ bool cancellable; + /* "firstprivate" variables in this context */ + hash_set *oacc_firstprivate_vars; + + /* Scalar "private" variables in this context. */ + hash_set *oacc_private_scalars; + /* True if lower_omp_1 should look up lastprivate conditional in parent context. 
*/ bool combined_into_simd_safelen1; @@ -213,10 +219,30 @@ is_oacc_parallel_or_serial (omp_context *ctx) { enum gimple_code outer_type = gimple_code (ctx->stmt); return ((outer_type == GIMPLE_OMP_TARGET) - && ((gimple_omp_target_kind (ctx->stmt) - == GF_OMP_TARGET_KIND_OACC_PARALLEL) - || (gimple_omp_target_kind (ctx->stmt) - == GF_OMP_TARGET_KIND_OACC_SERIAL))); + && ((gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_SERIAL) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE))); +} + +/* Return true if CTX corresponds to an oacc region that was generated from + an original kernels region that has been lowered to parallel regions. */ + +static bool +was_originally_oacc_kernels (omp_context *ctx) +{ + enum gimple_code outer_type = gimple_code (ctx->stmt); + return ((outer_type == GIMPLE_OMP_TARGET) + && ((gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS))); } /* Return whether CTX represents an OpenACC 'kernels' construct. @@ -242,10 +268,34 @@ is_oacc_kernels_decomposed_part (omp_context *ctx) == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED) || (gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE) + || (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) || (gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS))); } +/* Return whether CTX represents an OpenACC 'kernels' decomposed part that will + be analyzed by Graphite. 
*/ + +static bool +is_oacc_kernels_decomposed_graphite_part (omp_context *ctx) +{ + return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE; +} + + +/* Return whether CTX represents an OpenACC 'kernels' data part. */ + +static bool +is_oacc_data_kernels_part (omp_context *ctx) +{ + return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS; +} + /* Return true if STMT corresponds to an OpenMP target region. */ static bool is_omp_target (gimple *stmt) @@ -1011,6 +1061,9 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb.decl_map = new hash_map; + ctx->oacc_firstprivate_vars = new hash_set (); + ctx->oacc_private_scalars = new hash_set (); + return ctx; } @@ -1093,6 +1146,8 @@ delete_omp_context (splay_tree_value value) delete ctx->lastprivate_conditional_map; delete ctx->allocate_map; + delete ctx->oacc_firstprivate_vars; + delete ctx->oacc_private_scalars; XDELETE (ctx); } @@ -1155,6 +1210,43 @@ fixup_child_record_type (omp_context *ctx) = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT); } +static void +oacc_record_firstprivate_var_clauses (omp_context *ctx, tree clauses) +{ + tree c; + + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE) + { + tree decl = OMP_CLAUSE_DECL (c); + + if (TREE_ADDRESSABLE (decl)) + continue; + + ctx->oacc_firstprivate_vars->add (decl); + } +} + +static void +oacc_record_private_scalars (omp_context *ctx, tree clauses) +{ + tree c; + + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE) + { + tree decl = OMP_CLAUSE_DECL (c); + if (!(VAR_P (decl) + && !(TREE_READONLY (decl) + && (TREE_STATIC (decl) || DECL_EXTERNAL (decl))))) + continue; + + if (TREE_ADDRESSABLE (decl)) + continue; + ctx->oacc_private_scalars->add (decl); + } +} + /* Instantiate 
decls as necessary in CTX to satisfy the data sharing specified by CLAUSES. */ @@ -1726,9 +1818,15 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) break; /* FALLTHRU */ - case OMP_CLAUSE_FIRSTPRIVATE: - case OMP_CLAUSE_PRIVATE: - case OMP_CLAUSE_LINEAR: + case OMP_CLAUSE_FIRSTPRIVATE: + if (is_oacc_kernels_decomposed_graphite_part (ctx)) + oacc_record_firstprivate_var_clauses (ctx, c); + gcc_fallthrough (); + case OMP_CLAUSE_PRIVATE: + if (is_oacc_kernels_decomposed_graphite_part (ctx)) + oacc_record_private_scalars (ctx, c); + gcc_fallthrough (); + case OMP_CLAUSE_LINEAR: case OMP_CLAUSE_IS_DEVICE_PTR: decl = OMP_CLAUSE_DECL (c); if (is_variable_sized (decl)) @@ -2591,12 +2689,21 @@ enclosing_target_ctx (omp_context *ctx) static bool ctx_in_oacc_kernels_region (omp_context *ctx) { + gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE + || param_openacc_kernels + == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + for (;ctx != NULL; ctx = ctx->outer) { gimple *stmt = ctx->stmt; - if (gimple_code (stmt) == GIMPLE_OMP_TARGET - && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS) - return true; + if (gimple_code (stmt) != GIMPLE_OMP_TARGET) + continue; + + int target_kind = gimple_omp_target_kind (stmt); + if (target_kind == GF_OMP_TARGET_KIND_OACC_KERNELS + || target_kind == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) + return true; } return false; @@ -2610,6 +2717,10 @@ ctx_in_oacc_kernels_region (omp_context *ctx) static unsigned check_oacc_kernel_gwv (gomp_for *stmt, omp_context *ctx) { + gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + bool checking = true; unsigned outer_mask = 0; unsigned this_mask = 0; @@ -2681,9 +2792,11 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx) { omp_context *tgt = enclosing_target_ctx 
(outer_ctx); - if (!(tgt && is_oacc_kernels (tgt))) - for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) - { + if (!tgt + || (is_oacc_parallel_or_serial (tgt) + && !was_originally_oacc_kernels (tgt))) + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + { tree c_op0; switch (OMP_CLAUSE_CODE (c)) { @@ -3101,26 +3214,31 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) inside an OpenACC CTX. */ if (gimple_code (stmt) == GIMPLE_OMP_ATOMIC_LOAD || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE) - /* ..., except for the atomic codes that OpenACC shares with OpenMP. */ + /* ..., except for the atomic codes that OpenACC shares with OpenMP */ + ; + else if (gimple_code (stmt) == GIMPLE_OMP_TARGET + + && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) + /* ... and except for target regions introduced for kernels. */ + ; - else if (!(is_gimple_omp (stmt) - && is_gimple_omp_oacc (stmt))) + + else if (!(is_gimple_omp (stmt) && is_gimple_omp_oacc (stmt))) { if (oacc_get_fn_attrib (cfun->decl) != NULL) - { - error_at (gimple_location (stmt), - "non-OpenACC construct inside of OpenACC routine"); - return false; - } + { + error_at (gimple_location (stmt), + "non-OpenACC construct inside of OpenACC routine"); + return false; + } else - for (omp_context *octx = ctx; octx != NULL; octx = octx->outer) - if (is_gimple_omp (octx->stmt) - && is_gimple_omp_oacc (octx->stmt)) - { - error_at (gimple_location (stmt), - "non-OpenACC construct inside of OpenACC region"); - return false; - } + for (omp_context *octx = ctx; octx != NULL; octx = octx->outer) + if (is_gimple_omp (octx->stmt) && is_gimple_omp_oacc (octx->stmt)) + { + error_at (gimple_location (stmt), + "non-OpenACC construct inside of OpenACC region"); + return false; + } } if (ctx != NULL) @@ -3275,6 +3393,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case 
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: ok = true; break; @@ -3774,6 +3893,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) break; case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: /* OpenACC 'kernels' decomposed parts. */ stmt_name = "kernels"; break; @@ -3794,6 +3914,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) ctx_stmt_name = "host_data"; break; case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS: /* OpenACC 'kernels' decomposed parts. */ ctx_stmt_name = "kernels"; break; @@ -3801,10 +3922,12 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) } /* OpenACC/OpenMP mismatch? */ - if (is_gimple_omp_oacc (stmt) - != is_gimple_omp_oacc (ctx->stmt)) - { - error_at (gimple_location (stmt), + if (is_gimple_omp_oacc (stmt) != is_gimple_omp_oacc (ctx->stmt) + && (gimple_code (stmt) != GIMPLE_OMP_TARGET + || gimple_omp_target_kind (stmt) + != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)) + { + error_at (gimple_location (stmt), "%s %qs construct inside of %s %qs region", (is_gimple_omp_oacc (stmt) ? "OpenACC" : "OpenMP"), stmt_name, @@ -3812,7 +3935,16 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx) ? 
"OpenACC" : "OpenMP"), ctx_stmt_name); return false; } - if (is_gimple_omp_offloaded (ctx->stmt)) + + if ((gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS) + && (gimple_code (stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (stmt) + == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)) + ; + + else if (is_gimple_omp_offloaded (ctx->stmt)) { /* No GIMPLE_OMP_TARGET inside offloaded OpenACC CTX. */ if (is_gimple_omp_oacc (ctx->stmt)) @@ -7373,9 +7505,11 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p, static void lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, - gcall *fork, gcall *private_marker, gcall *join, - gimple_seq *fork_seq, gimple_seq *join_seq, - omp_context *ctx) + gcall *fork, gcall *private_marker, + gcall *private_scalars_marker, + gcall *firstprivate_marker, gcall *join, + gimple_seq *fork_seq, gimple_seq *join_seq, + omp_context *ctx) { gimple_seq before_fork = NULL; gimple_seq after_fork = NULL; @@ -7391,7 +7525,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, /* No 'reduction' clauses on OpenACC 'kernels'. */ gcc_checking_assert (!is_oacc_kernels (ctx)); /* Likewise, on OpenACC 'kernels' decomposed parts. 
*/ - gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx)); + gcc_checking_assert ( + !is_oacc_kernels_decomposed_part (ctx) + || is_oacc_kernels_decomposed_graphite_part (ctx)); tree orig = OMP_CLAUSE_DECL (c); tree var = maybe_lookup_decl (orig, ctx); @@ -7585,7 +7721,12 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, gimple_seq_add_stmt (fork_seq, fork); gimple_seq_add_seq (fork_seq, after_fork); + if (private_scalars_marker) + gimple_seq_add_stmt (join_seq, private_scalars_marker); + if (firstprivate_marker) + gimple_seq_add_stmt (join_seq, firstprivate_marker); gimple_seq_add_seq (join_seq, before_join); + if (join) gimple_seq_add_stmt (join_seq, join); gimple_seq_add_seq (join_seq, after_join); @@ -8294,16 +8435,29 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses, else gcc_unreachable (); - /* In a parallel region, loops are implicitly INDEPENDENT. */ - if (!tgt || is_oacc_parallel_or_serial (tgt)) - tag |= OLF_INDEPENDENT; + /* In a parallel region, loops without auto and seq clauses are + implicitly INDEPENDENT. */ + if ((!tgt + || (is_oacc_parallel_or_serial (tgt) + && !is_oacc_kernels_decomposed_graphite_part (tgt))) + && !(tag & (OLF_SEQ | OLF_AUTO))) + { + tag |= OLF_INDEPENDENT; + } /* Loops inside OpenACC 'kernels' decomposed parts' regions are expected to have an explicit 'seq' or 'independent' clause, and no 'auto' clause. */ - if (tgt && is_oacc_kernels_decomposed_part (tgt)) + if (tgt && is_oacc_kernels_decomposed_part (tgt) + && !is_oacc_kernels_decomposed_graphite_part (tgt)) { - gcc_assert (tag & (OLF_SEQ | OLF_INDEPENDENT)); - gcc_assert (!(tag & OLF_AUTO)); + tag |= OLF_INDEPENDENT; + + gcc_checking_assert ( + gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET + /* Loops in kernels regions that will be handled by Graphite should + have been made 'auto' by "pass_convert_oacc_kernels". 
*/ + || gimple_omp_target_kind (ctx->stmt) + != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE); } if (tag & OLF_TILE) @@ -8358,7 +8512,9 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head, static void lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, - gimple_seq *head, gimple_seq *tail, omp_context *ctx) + gcall *private_scalars_marker, + gcall *firstprivate_marker, gimple_seq *head, + gimple_seq *tail, omp_context *ctx) { bool inner = false; tree ddvar = create_tmp_var (integer_type_node, ".data_dep"); @@ -8373,6 +8529,20 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, gimple_call_set_arg (private_marker, 1, ddvar); } + if (private_scalars_marker) + { + gimple_set_location (private_scalars_marker, loc); + gimple_call_set_lhs (private_scalars_marker, ddvar); + gimple_call_set_arg (private_scalars_marker, 1, ddvar); + } + + if (firstprivate_marker) + { + gimple_set_location (firstprivate_marker, loc); + gimple_call_set_lhs (firstprivate_marker, ddvar); + gimple_call_set_arg (firstprivate_marker, 1, ddvar); + } + tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK); tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN); @@ -8402,9 +8572,10 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker, build_int_cst (integer_type_node, done), &join_seq); - lower_oacc_reductions (loc, clauses, place, inner, - fork, (count == 1) ? private_marker : NULL, - join, &fork_seq, &join_seq, ctx); + lower_oacc_reductions (loc, clauses, place, inner, fork, + (count == 1) ? private_marker : NULL, + private_scalars_marker, firstprivate_marker, join, + &fork_seq, &join_seq, ctx); /* Append this level to head. 
*/ gimple_seq_add_seq (head, fork_seq); @@ -11531,6 +11702,76 @@ lower_oacc_private_marker (omp_context *ctx) return gimple_build_call_internal_vec (IFN_UNIQUE, args); } +/* Return an internal function call that contains a list of variables which are + "firstprivate" in the compute region represented by CTX. This call is used + to help Graphite identify those variables. */ + +static gcall * +make_oacc_firstprivate_vars_marker (omp_context *ctx) +{ + auto_vec args; + + args.quick_push ( + build_int_cst (integer_type_node, IFN_UNIQUE_OACC_FIRSTPRIVATE)); + + /* TODO Change the data structure/iteration to ensure that the ordering of the + variables remains stable between GCC runs. */ + hash_set::iterator end = ctx->oacc_firstprivate_vars->end (); + hash_set::iterator it = ctx->oacc_firstprivate_vars->begin (); + for (; it != end; ++it) + { + tree decl = *it; + for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer) + { + tree inner_decl = maybe_lookup_decl (decl, thisctx); + if (inner_decl) + { + decl = inner_decl; + break; + } + } + + args.safe_push (decl); + } + + return gimple_build_call_internal_vec (IFN_UNIQUE, args); +} + +/* Return an internal function call that contains a list of scalar variables + which are "private" in the compute region represented by CTX. This call is + used to help Graphite identify those variables. */ + +static gcall * +make_oacc_private_scalars_marker (omp_context *ctx) +{ + auto_vec args; + + args.quick_push ( + build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE_SCALAR)); + + /* TODO Change the data structure/iteration to ensure that the ordering of + the variables remains stable between GCC runs.
*/ + hash_set::iterator end = ctx->oacc_private_scalars->end (); + hash_set::iterator it = ctx->oacc_private_scalars->begin (); + for (; it != end; ++it) + { + tree decl = *it; + for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer) + { + tree inner_decl = maybe_lookup_decl (decl, thisctx); + if (inner_decl) + { + decl = inner_decl; + break; + } + } + + args.safe_push (decl); + } + + return gimple_build_call_internal_vec (IFN_UNIQUE, args); +} + /* Lower code for an OMP loop directive. */ static void @@ -11739,11 +11980,16 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) /* Once lowered, extract the bounds and clauses. */ omp_extract_for_data (stmt, &fd, NULL); - if (is_gimple_omp_oacc (ctx->stmt) - && !ctx_in_oacc_kernels_region (ctx)) - lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), private_marker, - &oacc_head, &oacc_tail, ctx); + bool oacc_kernels_parloops = false; + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS) + oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx); + if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops) + { + lower_oacc_head_tail (gimple_location (stmt), + gimple_omp_for_clauses (stmt), private_marker, + NULL, NULL, &oacc_head, &oacc_tail, ctx); + } /* Add OpenACC partitioning and reduction markers just before the loop. 
*/ if (oacc_head) @@ -12559,6 +12805,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) case GF_OMP_TARGET_KIND_OACC_DECLARE: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED: case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE: + case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE: data_region = false; break; case GF_OMP_TARGET_KIND_DATA: @@ -12751,13 +12998,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) break; case OMP_CLAUSE_FIRSTPRIVATE: - gcc_checking_assert (offloaded); - if (is_gimple_omp_oacc (ctx->stmt)) - { + gcc_checking_assert (offloaded || is_oacc_data_kernels_part (ctx)); + if (is_gimple_omp_oacc (ctx->stmt)) + { /* No 'firstprivate' clauses on OpenACC 'kernels'. */ gcc_checking_assert (!is_oacc_kernels (ctx)); - /* Likewise, on OpenACC 'kernels' decomposed parts. */ - gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx)); goto oacc_firstprivate; } @@ -12785,13 +13030,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) break; case OMP_CLAUSE_PRIVATE: + gcc_checking_assert (offloaded || is_oacc_data_kernels_part (ctx)); gcc_checking_assert (offloaded); if (is_gimple_omp_oacc (ctx->stmt)) { /* No 'private' clauses on OpenACC 'kernels'. */ gcc_checking_assert (!is_oacc_kernels (ctx)); - /* Likewise, on OpenACC 'kernels' decomposed parts. 
*/ - gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx)); break; } @@ -13066,7 +13310,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) } else if (is_gimple_reg (var)) { - gcc_assert (offloaded); + gcc_assert (offloaded || is_oacc_data_kernels_part (ctx)); tree avar = create_tmp_var (TREE_TYPE (var)); mark_addressable (avar); enum gomp_map_kind map_kind = OMP_CLAUSE_MAP_KIND (c); @@ -13846,13 +14090,26 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) gcall *private_marker = lower_oacc_private_marker (ctx); - if (private_marker) + gcall *firstprivate_marker = NULL; + gcall *private_scalars_marker = NULL; + + /* The markers for "private" and "firstprivate" scalars are only used + to help "Graphite" identify those variables for which it has to + adjust some dependences. */ + if (is_oacc_kernels_decomposed_graphite_part (ctx)) + { + firstprivate_marker = make_oacc_firstprivate_vars_marker (ctx); + private_scalars_marker = make_oacc_private_scalars_marker (ctx); + } + + if (private_marker) gimple_call_set_arg (private_marker, 2, level); - lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level, - false, NULL, private_marker, NULL, &fork_seq, - &join_seq, ctx); - } + lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level, + false, NULL, private_marker, + private_scalars_marker, firstprivate_marker, + NULL, &fork_seq, &join_seq, ctx); + } gimple_seq_add_seq (&new_body, fork_seq); gimple_seq_add_seq (&new_body, tgt_body); diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc index 4ba5758a9067..c96207d96250 100644 --- a/gcc/omp-oacc-kernels-decompose.cc +++ b/gcc/omp-oacc-kernels-decompose.cc @@ -176,8 +176,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p, compiler logic to analyze this, so can't parallelize it here, so we'd very likely be running into a performance problem if we were to execute this unparallelized, thus forward the whole loop - nest to 
'parloops'. */ - *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS; + nest to Graphite/"parloops". */ + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE) + *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE; + else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS) + *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS; + else + gcc_unreachable (); /* Terminate: final decision for this region. */ *handled_ops_p = true; return integer_zero_node; @@ -197,8 +202,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p, the compiler logic to analyze this, so can't parallelize it here, so we'd very likely be running into a performance problem if we were to execute this unparallelized, thus forward the whole thing to - 'parloops'. */ - *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS; + Graphite/"parloops". */ + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE) + *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE; + else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS) + *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS; + else + gcc_unreachable (); /* Terminate: final decision for this region. */ *handled_ops_p = true; return integer_zero_node; @@ -309,7 +319,9 @@ make_region_seq (location_t loc, gimple_seq stmts, /* Figure out the region code for this region. */ /* Optimistic default: assume "setup code", no looping; thus not performance-critical. */ - int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE; + int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE + ? GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE + : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE; adjust_region_code (stmts, &region_code); if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE) @@ -330,6 +342,13 @@ make_region_seq (location_t loc, gimple_seq stmts, loops nested inside this sequentially executed statement.
*/ make_loops_gang_single (stmts); } + else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, loc_stmts_first, + "beginning % part in OpenACC" + " % region\n"); + } else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS) { if (dump_enabled_p ()) @@ -437,21 +456,24 @@ adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *, tree *outer_clause_ptr = NULL; switch (OMP_CLAUSE_CODE (loop_clause)) { - case OMP_CLAUSE_GANG: - outer_clause_ptr = wi_info->loop_gang_clause_ptr; - break; - case OMP_CLAUSE_WORKER: - outer_clause_ptr = wi_info->loop_worker_clause_ptr; - break; - case OMP_CLAUSE_VECTOR: - outer_clause_ptr = wi_info->loop_vector_clause_ptr; - break; - case OMP_CLAUSE_SEQ: - case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_AUTO: - add_auto_clause = false; - default: - break; + case OMP_CLAUSE_GANG: + outer_clause_ptr = wi_info->loop_gang_clause_ptr; + add_auto_clause = false; + break; + case OMP_CLAUSE_WORKER: + outer_clause_ptr = wi_info->loop_worker_clause_ptr; + add_auto_clause = false; + break; + case OMP_CLAUSE_VECTOR: + outer_clause_ptr = wi_info->loop_vector_clause_ptr; + add_auto_clause = false; + break; + case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_INDEPENDENT: + case OMP_CLAUSE_AUTO: + add_auto_clause = false; + default: + break; } if (outer_clause_ptr != NULL) { @@ -525,30 +547,34 @@ transform_kernels_loop_clauses (gimple *omp_for, loop_clause = OMP_CLAUSE_CHAIN (loop_clause)) { bool found_num_clause = false; - tree *clause_ptr, clause_to_check; + tree *clause_ptr; + tree clause_to_check = NULL_TREE; switch (OMP_CLAUSE_CODE (loop_clause)) - { - case OMP_CLAUSE_GANG: - found_num_clause = true; - clause_ptr = &loop_gang_clause; - clause_to_check = num_gangs_clause; - break; - case OMP_CLAUSE_WORKER: - found_num_clause = true; - clause_ptr = &loop_worker_clause; - clause_to_check = num_workers_clause; - break; - case OMP_CLAUSE_VECTOR: - found_num_clause = true; - clause_ptr = 
&loop_vector_clause; - clause_to_check = vector_length_clause; - break; - case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_SEQ: - case OMP_CLAUSE_AUTO: - add_auto_clause = false; - default: - break; + { + case OMP_CLAUSE_GANG: + found_num_clause = true; + add_auto_clause = false; + clause_ptr = &loop_gang_clause; + clause_to_check = num_gangs_clause; + break; + case OMP_CLAUSE_WORKER: + found_num_clause = true; + add_auto_clause = false; + clause_ptr = &loop_worker_clause; + clause_to_check = num_workers_clause; + break; + case OMP_CLAUSE_VECTOR: + found_num_clause = true; + add_auto_clause = false; + clause_ptr = &loop_vector_clause; + clause_to_check = vector_length_clause; + break; + case OMP_CLAUSE_INDEPENDENT: + case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_AUTO: + add_auto_clause = false; + default: + break; } if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL) { @@ -646,10 +672,13 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts, clauses = unshare_expr (clauses); /* Figure out the region code for this region. */ - /* Optimistic default: assume that the loop nest is parallelizable - (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause, - and no un-annotated loops). */ - int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED; + /* For "parloops", use an optimistic default: assume that the loop nest is + parallelizable (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) + 'auto' clause, and no un-annotated loops). */ + int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE + ? 
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE + : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED; + adjust_region_code (stmts, &region_code); if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED) @@ -661,6 +690,19 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts, "parallelized loop nest" " in OpenACC % region\n"); + clauses = transform_kernels_loop_clauses (omp_for, + num_gangs_clause, + num_workers_clause, + vector_length_clause, + clauses); + } + else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, omp_for, + "forwarded loop nest in OpenACC % region" + " to % for analysis\n"); + clauses = transform_kernels_loop_clauses (omp_for, num_gangs_clause, num_workers_clause, @@ -1526,8 +1568,13 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return (flag_openacc - && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE); + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE + || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS) + return flag_openacc; + else if (param_openacc_kernels == OPENACC_KERNELS_PARLOOPS) + return false; + else + gcc_unreachable (); } virtual unsigned int execute (function *) { diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index e99aaac0e515..2743e90f79a3 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -746,6 +746,198 @@ oacc_xform_loop (gcall *call) gsi_replace_with_seq (&gsi, seq, true); } +/* This is used for expanding the loop calls to "fake" values that mimic the + values used for host execution during scalar evolution analysis in + Graphite. The function has been derived from oacc_xform_loop which could not + be used because it rewrites the code directly. + + TODO This function can either be simplified significantly (cf. the fixed + values for number_of_threads, thread_index, chunking, striding) or unified + with oacc_xform_loop.
*/ + +tree +oacc_extract_loop_call (gcall *call) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (call); + enum ifn_goacc_loop_kind code + = (enum ifn_goacc_loop_kind)TREE_INT_CST_LOW (gimple_call_arg (call, 0)); + tree dir = gimple_call_arg (call, 1); + tree range = gimple_call_arg (call, 2); + tree step = gimple_call_arg (call, 3); + tree chunk_size = NULL_TREE; + unsigned mask = (unsigned)TREE_INT_CST_LOW (gimple_call_arg (call, 5)); + tree lhs = gimple_call_lhs (call); + tree type = NULL_TREE; + tree diff_type = TREE_TYPE (range); + tree r = NULL_TREE; + bool chunking = false, striding = true; + unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning + + gcc_checking_assert (lhs); + + type = TREE_TYPE (lhs); + + tree number_of_threads = integer_one_node; + tree thread_index = integer_zero_node; + + /* striding=true, chunking=true + -> invalid. + striding=true, chunking=false + -> chunks=1 + striding=false,chunking=true + -> chunks=ceil (range/(chunksize*threads*step)) + striding=false,chunking=false + -> chunk_size=ceil(range/(threads*step)),chunks=1 */ + + switch (code) + { + default: + gcc_unreachable (); + + case IFN_GOACC_LOOP_CHUNKS: + if (!chunking) + r = build_int_cst (type, 1); + else + { + /* chunk_max + = (range - dir) / (chunks * step * num_threads) + dir */ + tree per = number_of_threads; + per = fold_convert (type, per); + chunk_size = fold_convert (type, chunk_size); + per = fold_build2 (MULT_EXPR, type, per, chunk_size); + per = fold_build2 (MULT_EXPR, type, per, step); + r = fold_build2 (MINUS_EXPR, type, range, dir); + r = fold_build2 (PLUS_EXPR, type, r, per); + r = fold_build2 (TRUNC_DIV_EXPR, type, r, per); + } + break; + + case IFN_GOACC_LOOP_STEP: + { + /* If striding, step by the entire compute volume, otherwise + step by the inner volume. */ + r = number_of_threads; + r = fold_build2 (MULT_EXPR, type, fold_convert (type, r), step); + } + break; + + case IFN_GOACC_LOOP_OFFSET: + /* Enable vectorization on non-SIMT targets. 
*/ + if (!targetm.simt.vf + && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR) + /* If not -fno-tree-loop-vectorize, hint that we want to vectorize + the loop. */ + && (flag_tree_loop_vectorize + || !global_options_set.x_flag_tree_loop_vectorize)) + { + basic_block bb = gsi_bb (gsi); + class loop *parent = bb->loop_father; + class loop *body = parent->inner; + + parent->force_vectorize = true; + parent->safelen = INT_MAX; + + /* "Chunking loops" may have inner loops. */ + if (parent->inner) + { + body->force_vectorize = true; + body->safelen = INT_MAX; + } + + cfun->has_force_vectorize_loops = true; + } + if (striding) + { + r = thread_index; + r = fold_convert (diff_type, r); + } + else + { + tree inner_size = number_of_threads; + tree outer_size = number_of_threads; + tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), + inner_size, outer_size); + + volume = fold_convert (diff_type, volume); + if (chunking) + chunk_size = fold_convert (diff_type, chunk_size); + else + { + tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); + + chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir); + chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per); + chunk_size + = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); + } + + tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size, + fold_convert (diff_type, inner_size)); + r = thread_index; + r = fold_convert (diff_type, r); + r = fold_build2 (MULT_EXPR, diff_type, r, span); + + tree inner = thread_index; + inner = fold_convert (diff_type, inner); + r = fold_build2 (PLUS_EXPR, diff_type, r, inner); + + if (chunking) + { + tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6)); + tree per + = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size); + per = fold_build2 (MULT_EXPR, diff_type, per, chunk); + + r = fold_build2 (PLUS_EXPR, diff_type, r, per); + } + } + r = fold_build2 (MULT_EXPR, diff_type, r, step); + if (type != diff_type) + r = fold_convert (type, r); + break; 
+ + case IFN_GOACC_LOOP_BOUND: + if (striding) + r = range; + else + { + tree inner_size = number_of_threads; + tree outer_size = number_of_threads; + tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), + inner_size, outer_size); + + volume = fold_convert (diff_type, volume); + if (chunking) + chunk_size = fold_convert (diff_type, chunk_size); + else + { + tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); + + chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir); + chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per); + chunk_size + = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); + } + + tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size, + fold_convert (diff_type, inner_size)); + + r = fold_build2 (MULT_EXPR, diff_type, span, step); + + tree offset = gimple_call_arg (call, 6); + r = fold_build2 (PLUS_EXPR, diff_type, r, + fold_convert (diff_type, offset)); + r = fold_build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type, + r, range); + } + if (diff_type != type) + r = fold_convert (type, r); + break; + } + + return r; +} + /* Transform a GOACC_TILE call. Determines the element loop span for the specified loop of the nest. This is 1 if we're not tiling. @@ -936,7 +1128,8 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) #endif if (check && warn_openacc_parallelism - && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))) + && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)) + && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn))) { static char const *const axes[] = /* Must be kept in sync with GOMP_DIM enumeration. */ @@ -1435,7 +1628,219 @@ oacc_loop_process (oacc_loop *loop) oacc_loop_process (loop->sibling); } -/* Walk the OpenACC loop heirarchy checking and assigning the +/* Return the outermost CFG loop that is enclosed between the head and + tail mark calls for LOOP, or NULL if there is no such CFG loop. 
+ + The outermost CFG loop is a loop that is used for "chunking" the + original loop from the user's code. The lower_omp_for function + in omp-low.c which creates the head and tail mark sequence and + the expand_oacc_for function in omp-expand.c are relevant for + understanding the structure that we expect to find here. But note + that the passes implemented in those files do not operate on CFG + loops and hence the correspondence to the CFG loop structure is + not directly visible there and has to be inferred. */ + +static loop_p +oacc_loop_get_cfg_loop (oacc_loop *loop) +{ + loop_p enclosed_cfg_loop = NULL; + for (unsigned dim = 0; dim < GOMP_DIM_MAX; ++dim) + { + gcall *tail_mark = loop->tails[dim]; + gimple *head_mark = loop->heads[dim]; + if (!tail_mark) + continue; + + if (dump_file && (dump_flags & TDF_DETAILS)) + dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, "%G", + tail_mark); + + loop_p mark_cfg_loop = tail_mark->bb->loop_father; + loop_p current_cfg_loop = mark_cfg_loop; + + /* Ascend from TAIL_MARK until a different CFG loop is reached. + + From the way that OpenACC loops are treated in omp-low.c, we + could expect the tail marker to be immediately preceded by a + loop exit. But loop optimizations (e.g. store-motion in + pass_lim) can change this. */ + basic_block bb = tail_mark->bb; + bool empty_loop = false; + while (current_cfg_loop == mark_cfg_loop) + { + /* If the OpenACC loop becomes empty due to optimizations, + there is no CFG loop at all enclosed between head and + tail mark */ + if (bb == head_mark->bb) + { + empty_loop = true; + break; + } + + bb = get_immediate_dominator (CDI_DOMINATORS, bb); + current_cfg_loop = bb->loop_father; + } + + if (empty_loop) + continue; + + /* We expect to find the same CFG loop enclosed between all head + and tail mark pairs. Hence we actually need to look at only + the first available pair. But we consider all for + verification purposes. 
+ if (enclosed_cfg_loop) + { + gcc_assert (current_cfg_loop == enclosed_cfg_loop); + continue; + } + + enclosed_cfg_loop = current_cfg_loop; + + gcc_checking_assert (dominated_by_p ( + CDI_DOMINATORS, enclosed_cfg_loop->header, head_mark->bb)); + } + + return enclosed_cfg_loop; +} + +static const char* +can_be_parallel_str (loop_p loop) +{ + if (!loop->can_be_parallel_valid_p) + return "not analyzed"; + + return loop->can_be_parallel ? "can be parallel" : "cannot be parallel"; +} + +/* Returns true if LOOP is known to be parallelizable and false + otherwise. The decision is based on the dependence analysis + that must have been previously performed by Graphite on the CFG + loops contained in the OpenACC loop LOOP. The value of ANALYZED is + set to true if all relevant CFG loops have been analyzed. */ + +static bool +oacc_loop_can_be_parallel_p (oacc_loop *loop, bool& analyzed) +{ + /* Graphite will not run without enabled optimizations, so we cannot + expect to find any parallelizability information on the CFG loops. */ + if (!optimize) + return false; + + const dump_user_location_t loc + = dump_user_location_t::from_location_t (loop->loc); + + if (dump_file && (dump_flags & TDF_DETAILS)) + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc, + "Inspecting CFG-loops for OpenACC loop.\n"); + + /* Search for the CFG loops that are enclosed between the head and + tail mark calls for LOOP. The two outer CFG loops are considered + to belong to the OpenACC loop and hence the CAN_BE_PARALLEL flags + on those loops will be used to determine the return value. */ + bool can_be_parallel = false; + loop_p enclosed_cfg_loop = oacc_loop_get_cfg_loop (loop); + + if (enclosed_cfg_loop + /* The inner loop may have been removed in degenerate cases, e.g. + if an infinite "for (; ;)" gets optimized in an OpenACC loop nest.
*/ + && enclosed_cfg_loop->inner) + { + gcc_assert (enclosed_cfg_loop->inner != NULL); + gcc_assert (enclosed_cfg_loop->inner->next == NULL); + + can_be_parallel = enclosed_cfg_loop->can_be_parallel + && enclosed_cfg_loop->inner->can_be_parallel; + + analyzed = enclosed_cfg_loop->can_be_parallel_valid_p + && enclosed_cfg_loop->inner->can_be_parallel_valid_p; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, + "\tOuter loop <%d> preceding tail mark %s.\n" + "\tInner loop <%d> %s.\n", + enclosed_cfg_loop->num, + can_be_parallel_str (enclosed_cfg_loop), + enclosed_cfg_loop->inner->num, + can_be_parallel_str (enclosed_cfg_loop->inner)); + } + } + else if (dump_file && (dump_flags & TDF_DETAILS)) + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc, + "Empty OpenACC loop.\n"); + + return can_be_parallel; +} + +static bool +oacc_parallel_kernels_graphite_fun_p () +{ + return lookup_attribute ("oacc parallel_kernels_graphite", + DECL_ATTRIBUTES (cfun->decl)); +} + +static bool +oacc_parallel_fun_p () +{ + return lookup_attribute ("oacc parallel", + DECL_ATTRIBUTES (cfun->decl)); +} + +/* If LOOP is an "auto" loop for which dependence analysis has determined that + it can be parallelized, make it "independent" by adjusting its FLAGS field + and return true. Otherwise, return false. */ + +static bool +oacc_loop_transform_auto_into_independent (oacc_loop *loop) +{ + if (!optimize) + return false; + + /* This function is only relevant on "kernels" + regions that have been explicitly designated + to be analyzed by Graphite and on "auto" + loops in "parallel" regions.
*/ + if (!oacc_parallel_kernels_graphite_fun_p () && + !oacc_parallel_fun_p ()) + return false; + + if (loop->routine) + return false; + + if (!(loop->flags & OLF_AUTO)) + return false; + + bool analyzed = false; + bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed); + dump_user_location_t loc = dump_user_location_t::from_location_t (loop->loc); + + if (dump_enabled_p ()) + { + if (!analyzed) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "'auto' loop has not been analyzed (cf. 'graphite' " + "dumps for more information).\n"); + } + if (!can_be_parallel) + return false; + + loop->flags |= OLF_INDEPENDENT; + + /* We need to keep the OLF_AUTO flag for now. + oacc_loop_fixed_partitions and oacc_loop_auto_partitions + interpret "independent auto" as "this loop can be parallel, + please determine the dimensions" which seems to correspond to the + meaning of those clauses in an old OpenACC version. We rely on + this behaviour to assign the dimensions for this loop. + + TODO Use a different flag to indicate that the dimensions must be assigned. */ + + // loop->flags &= ~OLF_AUTO; + + return true; +} + +/* Walk the OpenACC loop hierarchy checking and assigning the programmer-specified partitionings. OUTER_MASK is the partitioning this loop is contained within. Return mask of partitioning encountered. 
If any auto loops are discovered, set GOMP_DIM_MAX @@ -1491,6 +1896,9 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask) loop->flags |= OLF_AUTO; mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX); } + + if (oacc_loop_transform_auto_into_independent (loop)) + mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX); } if (this_mask & outer_mask) @@ -1932,24 +2340,29 @@ execute_oacc_loop_designation () flag_openacc_dims = (char *)&flag_openacc_dims; } - bool is_oacc_parallel - = (lookup_attribute ("oacc parallel", - DECL_ATTRIBUTES (current_function_decl)) != NULL); bool is_oacc_kernels = (lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (current_function_decl)) != NULL); + bool is_oacc_parallel + = (lookup_attribute ("oacc parallel", + DECL_ATTRIBUTES (current_function_decl)) != NULL); bool is_oacc_serial = (lookup_attribute ("oacc serial", DECL_ATTRIBUTES (current_function_decl)) != NULL); bool is_oacc_parallel_kernels_parallelized - = (lookup_attribute ("oacc parallel_kernels_parallelized", - DECL_ATTRIBUTES (current_function_decl)) != NULL); + = (lookup_attribute ("oacc parallel_kernels_parallelized", + DECL_ATTRIBUTES (current_function_decl)) + != NULL); + bool is_oacc_parallel_kernels_graphite + = (lookup_attribute ("oacc parallel_kernels_graphite", + DECL_ATTRIBUTES (current_function_decl)) != NULL); bool is_oacc_parallel_kernels_gang_single = (lookup_attribute ("oacc parallel_kernels_gang_single", DECL_ATTRIBUTES (current_function_decl)) != NULL); int fn_level = oacc_fn_attrib_level (attrs); bool is_oacc_routine = (fn_level >= 0); gcc_checking_assert (is_oacc_parallel + + is_oacc_parallel_kernels_graphite + is_oacc_kernels + is_oacc_serial + is_oacc_parallel_kernels_parallelized @@ -1957,31 +2370,50 @@ execute_oacc_loop_designation () + is_oacc_routine == 1); - bool is_oacc_kernels_parallelized - = (lookup_attribute ("oacc kernels parallelized", - DECL_ATTRIBUTES (current_function_decl)) != NULL); - if (is_oacc_kernels_parallelized) - gcc_checking_assert 
(is_oacc_kernels); + if (is_oacc_parallel_kernels_parallelized) + { + gcc_checking_assert (!is_oacc_kernels); + gcc_checking_assert (!is_oacc_parallel_kernels_gang_single); + } + if (is_oacc_parallel_kernels_parallelized) + { + gcc_checking_assert (!is_oacc_kernels); + gcc_checking_assert (!is_oacc_parallel_kernels_gang_single); + } + if (is_oacc_parallel_kernels_gang_single) + { + gcc_checking_assert (!is_oacc_kernels); + gcc_checking_assert (!is_oacc_parallel_kernels_parallelized); + } + if (is_oacc_parallel_kernels_graphite) + { + gcc_checking_assert (!is_oacc_kernels); + gcc_checking_assert (!is_oacc_parallel_kernels_gang_single); + gcc_checking_assert (!is_oacc_parallel_kernels_parallelized); + } if (dump_file) { - if (is_oacc_parallel) - fprintf (dump_file, "Function is OpenACC parallel offload\n"); + if (fn_level >= 0) + fprintf (dump_file, "Function is OpenACC routine level %d\n", + fn_level); else if (is_oacc_kernels) fprintf (dump_file, "Function is %s OpenACC kernels offload\n", - (is_oacc_kernels_parallelized + (is_oacc_parallel_kernels_parallelized ? 
"parallelized" : "unparallelized")); - else if (is_oacc_serial) - fprintf (dump_file, "Function is OpenACC serial offload\n"); else if (is_oacc_parallel_kernels_parallelized) fprintf (dump_file, "Function is %s OpenACC kernels offload\n", "parallel_kernels_parallelized"); else if (is_oacc_parallel_kernels_gang_single) fprintf (dump_file, "Function is %s OpenACC kernels offload\n", "parallel_kernels_gang_single"); - else if (is_oacc_routine) - fprintf (dump_file, "Function is OpenACC routine level %d\n", - fn_level); + else if (is_oacc_parallel_kernels_graphite) + fprintf (dump_file, "Function is %s OpenACC kernels offload\n", + "parallel_kernels_graphite"); + else if (is_oacc_serial) + fprintf (dump_file, "Function is OpenACC serial offload\n"); + else if (is_oacc_parallel) + fprintf (dump_file, "Function is OpenACC parallel offload\n"); else gcc_unreachable (); } @@ -2027,7 +2459,7 @@ execute_oacc_loop_designation () /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1 kernels, so remove the parallelism dimensions function attributes potentially set earlier on. */ - if (is_oacc_kernels && !is_oacc_kernels_parallelized) + if (is_oacc_kernels && !is_oacc_parallel_kernels_parallelized) { oacc_set_fn_attrib (current_function_decl, NULL, NULL); attrs = oacc_get_fn_attrib (current_function_decl); @@ -2042,8 +2474,10 @@ execute_oacc_loop_designation () unsigned used_mask = oacc_loop_partition (loops, outer_mask); /* OpenACC kernels constructs are special: they currently don't use the generic oacc_loop infrastructure and attribute/dimension processing. */ - if (is_oacc_kernels && is_oacc_kernels_parallelized) + if (is_oacc_kernels && is_oacc_parallel_kernels_parallelized) { + gcc_checking_assert (!is_oacc_parallel_kernels_graphite); + /* Parallelized OpenACC kernels constructs use gang parallelism. See also tree-parloops.c:create_parallel_loop. 
*/ used_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG); @@ -2192,6 +2626,11 @@ execute_oacc_device_lower () remove = true; break; + case IFN_UNIQUE_OACC_PRIVATE_SCALAR: + case IFN_UNIQUE_OACC_FIRSTPRIVATE: + remove = true; + break; + case IFN_UNIQUE_OACC_PRIVATE: { dump_flags_t l_dump_flags diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h index b91d08cd2182..cacc8ea7614d 100644 --- a/gcc/omp-offload.h +++ b/gcc/omp-offload.h @@ -31,5 +31,7 @@ extern GTY(()) vec *offload_vars; extern void omp_finish_file (void); extern void omp_discover_implicit_declare_target (void); +extern tree oacc_extract_loop_call (gcall *call); + #endif /* GCC_OMP_DEVICE_H */ diff --git a/gcc/params.opt b/gcc/params.opt index 8c5948f7a84d..52de12617cbe 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -794,8 +794,8 @@ Common Joined UInteger Var(param_min_vect_loop_bound) Param Optimization If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization. -param=openacc-kernels= -Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Param ---param=openacc-kernels=[decompose|parloops] Specify mode of OpenACC 'kernels' constructs handling. +Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_DECOMPOSE) Param +--param=openacc-kernels=[decompose|decompose-parloops|parloops] Specify mode of OpenACC 'kernels' constructs handling. 
Enum Name(openacc_kernels) Type(enum openacc_kernels) @@ -803,6 +803,9 @@ Name(openacc_kernels) Type(enum openacc_kernels) EnumValue Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE) +EnumValue +Enum(openacc_kernels) String(decompose-parloops) Value(OPENACC_KERNELS_DECOMPOSE_PARLOOPS) + EnumValue Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS) diff --git a/gcc/sese.c b/gcc/sese.c index ca88f9bbfdf1..50bdde6c537a 100644 --- a/gcc/sese.c +++ b/gcc/sese.c @@ -448,8 +448,29 @@ scalar_evolution_in_region (const sese_l &region, loop_p loop, tree t) if (!loop_in_sese_p (loop, region)) loop = NULL; - return instantiate_scev (region.entry, loop, - analyze_scalar_evolution (loop, t)); + tree chrec = analyze_scalar_evolution (loop, t); + + /* The IFN_GOACC_LOOP calls may evolve to an ssa name that is defined outside + of LOOP. To avoid failing the scev analysis, we need this special + handling. */ + if (TREE_CODE (t) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (t); + basic_block def_bb = def_stmt->bb; + if (is_gimple_call (def_stmt) + && gimple_call_internal_p (def_stmt, IFN_GOACC_LOOP) + && TREE_CODE (chrec) == SSA_NAME && def_bb + && SSA_NAME_DEF_STMT (chrec)->bb) + { + loop_p outer_loop = SSA_NAME_DEF_STMT (chrec)->bb->loop_father; + loop_p inner_loop = def_bb->loop_father; + + if (outer_loop != inner_loop) + return scalar_evolution_in_region (region, outer_loop, chrec); + } + } + + return instantiate_scev (region.entry, loop, chrec); } /* Return true if BB is empty, contains only DEBUG_INSNs.
*/ diff --git a/gcc/sese.h b/gcc/sese.h index c51ea68bfb47..114bb9b0c0b4 100644 --- a/gcc/sese.h +++ b/gcc/sese.h @@ -280,6 +280,7 @@ typedef struct gimple_poly_bb vec data_refs; vec read_scalar_refs; vec write_scalar_refs; + vec kill_scalar_refs; } *gimple_poly_bb_p; #define GBB_BB(GBB) (GBB)->bb diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c index 37e2a57455d1..8430cb868157 100644 --- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c @@ -20,7 +20,7 @@ extern unsigned int *__restrict c; void KERNELS () { #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */ - for (unsigned int i = 0; i < N; i++) + for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */ c[i] = a[i] + b[i]; } diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 new file mode 100644 index 000000000000..bba67dcf7cbc --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 @@ -0,0 +1,47 @@ +! Verify that Graphite's analysis of the CFG loops gets correctly +! transferred to the OpenACC loop structure for loop-nests of depth 1 + +! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details -fopt-info-optimized -fopt-info-missed" } +! { dg-additional-options "--param max-isl-operations=0" } +! { dg-additional-options "-O2" } +! { dg-prune-output ".*not inlinable.*" } + +module test_module + + real, allocatable :: array1(:) + real, allocatable :: array2(:) + + contains + +subroutine test_loop_nest_depth_1 () + implicit none + + integer :: i,n + + if (size (array1) /= size (array2)) return + n = size(array1) + + !$acc parallel loop auto copy(array1, array2) ! 
{ dg-message "assigned OpenACC gang vector loop parallelism" } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 } + ! { dg-message ".auto. loop can be parallel" "" {target *-*-*} .-2 } + do i=1, n + array2(i) = array1(i) ! { dg-message "loop has no data-dependences" } + end do + + + !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC seq loop parallelism" } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-2 } + do i=1, n-1 + array1(i+1) = array1(i) + 10 ! { dg-message "loop has data-dependences" } + array2(i) = array1(i) + end do + + return +end subroutine test_loop_nest_depth_1 + + + +end module test_module + +! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 2 "graphite" } } diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 new file mode 100644 index 000000000000..d635cc5e4fe0 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 @@ -0,0 +1,103 @@ +! Verify that Graphite's analysis of the CFG loops gets correctly +! transferred to the OpenACC loop structure for loop-nests of depth 2 + +! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" } +! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" } +! { dg-additional-options "-O2" } +! { dg-prune-output ".*not inlinable.*" } + +module test_module + implicit none + + integer, parameter :: n = 100 + integer, parameter :: m = 100 + +contains + + subroutine test_loop_nest_depth_2 (array) + integer :: i, j + real :: array (2, n, m) + + ! Perfect loop-nest, inner and outer loop can be parallel + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! 
{ dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, m + array (1, i, j) = array(2, i, j) ! { dg-message "loop has no data-dependences" } + end do + end do + !$acc end parallel + + ! Imperfect loop-nest, inner and outer loop can be parallel + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n) = array(1, i, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, m + array (1, i, j) = array (2, i,j) ! { dg-message "loop has no data-dependences" } + end do + end do + !$acc end parallel + + ! Imperfect loop-nest, inner loop can be parallel, outer loop cannot be parallel + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do i=1, n-1 + array (1, i+1, 1) = array (2, i, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! 
{ dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, m + array (1, i, j) = array (2, i, j) ! { dg-message "loop has no data-dependences" } + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest, outer loop can be parallel, inner loop cannot be parallel + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n) = array (1, i, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do j=1, m-1 + array (1, i, j+1) = array (1, i, j) ! { dg-message "loop has data-dependences" } + end do + end do + !$acc end parallel + return + end subroutine test_loop_nest_depth_2 + +end module test_module + + +! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 4 "graphite" } } One function per kernel, all should be analyzed +! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite" } } Original function should not be analyzed diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 new file mode 100644 index 000000000000..97acecd8807b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 @@ -0,0 +1,323 @@ +! Verify that Graphite's analysis of the CFG loops gets correctly +! transferred to the OpenACC loop structure for loop-nests of depth 3 + +! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" } +! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" } +!
{ dg-additional-options "-O2" } +! { dg-prune-output ".*not inlinable.*" } + +module test_module + implicit none + + integer, parameter :: n = 100 + +contains + + subroutine test_loop_nest_depth_3 (array) + integer :: i, j, k + real :: array (2, n, n, n) + + ! Perfect loop-nest. Can be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do k=1, n + array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + ! Perfect loop-nest. Innermost loop cannot be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! 
{ dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Perfect loop-nest. Cannot be parallel because it contains no + ! data-reference and is hence not analyzed by Graphite. This is + ! expected: empty loops should not be parallel either cf. e.g. + ! "../../gfortran.dg/goacc/note-parallelism.f90". + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 } + do i=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 } + do j=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-bogus "loop has no data-dependences" "OpenACC internal chunking CFG loop not analyzed" {target *-*-*} .-2 } + ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-3 } + do k=1, n + array (1, i, j, k) = array(1, i, j, k) ! { dg-bogus "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. All levels can be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! 
{ dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n-1 + array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level can be parallel, second level + ! can be parallel, third level cannot be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! 
{ dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n-1 + array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level can be parallel, second level + ! cannot be parallel, third level can be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do j=1, n-1 + array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! 
{ dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level can be parallel, second and + ! third level cannot be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do i=1, n + array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do j=1, n-1 + array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level cannot be parallel, second and + ! third levels can be parallel + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! 
{ dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do i=1, n - 1 + array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do k=1, n + array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level cannot be parallel, second + ! level can be parallel, third level cannot be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do i=1, n - 1 + array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do j=1, n + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! 
{ dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do k=1, n - 1 + array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. First level cannot be parallel, second + ! level cannot be parallel, third level can be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do i=1, n - 1 + array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do j=1, n - 1 + array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 } + do k=1, n + array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" } + end do + end do + end do + !$acc end parallel + + + ! Imperfect loop-nest. All levels cannot be parallel. + + !$acc parallel copy(array) + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! 
{ dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do i=1, n-1 + array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do j=1, n-1 + array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" } + !$acc loop auto + ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 } + ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 } + ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 } + do k=1, n-1 + array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" } + end do + end do + end do + !$acc end parallel + + return + end subroutine test_loop_nest_depth_3 + +end module test_module + + +! Outlined functions for all kernels but the one without data-references should be analyzed. +! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 10 "graphite" } } +! Original test function and one outlined kernel function should not be analyzed +! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" } } diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c index eeb67ded3dcf..8170265a8d6e 100644 --- a/gcc/tree-chrec.c +++ b/gcc/tree-chrec.c @@ -249,6 +249,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type, return chrec_fold_plus_poly_poly (code, type, op0, op1); CASE_CONVERT: + case VIEW_CONVERT_EXPR: { /* We can strip sign-conversions to signed by performing the operation in unsigned.
*/ @@ -282,6 +283,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type, } CASE_CONVERT: + case VIEW_CONVERT_EXPR: { /* We can strip sign-conversions to signed by performing the operation in unsigned. */ @@ -323,6 +325,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type, : build_int_cst_type (type, -1))); CASE_CONVERT: + case VIEW_CONVERT_EXPR: if (tree_contains_chrecs (op1, NULL)) return chrec_dont_know; /* FALLTHRU */ diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 6a3659dc490c..2b97c5043ac1 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -100,6 +100,8 @@ along with GCC; see the file COPYING3. If not see #include "vr-values.h" #include "range-op.h" #include "tree-ssa-loop-ivopts.h" +#include "print-tree.h" +#include "graphite-oacc.h" static struct datadep_stats { @@ -225,7 +227,10 @@ dump_data_reference (FILE *outf, print_generic_stmt (outf, DR_REF (dr)); fprintf (outf, "# base_object: "); print_generic_stmt (outf, DR_BASE_OBJECT (dr)); - + fprintf (outf, "# base_address: "); + print_generic_stmt (outf, DR_BASE_ADDRESS (dr)); + fprintf (outf, "# loop-invariant offset: "); + print_generic_stmt (outf, DR_OFFSET (dr)); for (i = 0; i < DR_NUM_DIMENSIONS (dr); i++) { fprintf (outf, "# Access function %d: ", i); @@ -5865,9 +5870,13 @@ get_references_in_stmt (gimple *stmt, vec *references) if (gimple_call_internal_p (stmt)) switch (gimple_call_internal_fn (stmt)) { - case IFN_GOMP_SIMD_LANE: - { - class loop *loop = gimple_bb (stmt)->loop_father; + case IFN_UNIQUE: + case IFN_GOACC_REDUCTION: + case IFN_GOACC_LOOP: + return false; + case IFN_GOMP_SIMD_LANE: + { + class loop *loop = gimple_bb (stmt)->loop_father; tree uid = gimple_call_arg (stmt, 0); gcc_assert (TREE_CODE (uid) == SSA_NAME); if (loop == NULL @@ -6042,7 +6051,6 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt, vec *datarefs) { auto_vec references; - bool ret = true; data_reference_p dr; if (get_references_in_stmt (stmt, &references)) @@ -6056,7 
+6064,7 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt, datarefs->safe_push (dr); } - return ret; + return true; } /* Search the data references in LOOP, and record the information into diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 5e64d5ed7a38..6c4bec69e7d0 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -4173,7 +4173,16 @@ public: virtual bool gate (function *) { if (oacc_kernels_p) - return flag_openacc; + { + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE) + return false; + + gcc_checking_assert ( + param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + + return flag_openacc; + } else return flag_tree_parallelize_loops > 1; } @@ -4192,6 +4201,13 @@ public: unsigned pass_parallelize_loops::execute (function *fun) { + if (oacc_kernels_p) + { + gcc_checking_assert ( + param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + } + tree nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS); if (nthreads == NULL_TREE) return 0; diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index dbdfe8ffa721..00ad0bc6a4c5 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -264,6 +264,8 @@ along with GCC; see the file COPYING3. If not see #include "gimple.h" #include "ssa.h" #include "gimple-pretty-print.h" +#include "tree-pretty-print.h" +#include "print-tree.h" #include "fold-const.h" #include "gimplify.h" #include "gimple-iterator.h" @@ -276,6 +278,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "cfgloop.h" #include "tree-chrec.h" +#include "internal-fn.h" +#include "graphite-oacc.h" #include "tree-affine.h" #include "tree-scalar-evolution.h" #include "dumpfile.h" @@ -284,6 +288,8 @@ along with GCC; see the file COPYING3. 
If not see #include "tree-into-ssa.h" #include "builtins.h" #include "case-cfn-macros.h" +#include "omp-offload.h" +#include "internal-fn.h" static tree analyze_scalar_evolution_1 (class loop *, tree); static tree analyze_scalar_evolution_for_address_of (class loop *loop, @@ -311,7 +317,19 @@ struct scev_info_hasher : ggc_ptr_hash static GTY (()) hash_table *scalar_evolution_info; - +/* This flag indicates that internal OpenACC calls should be analyzed. + The analysis is not valid in general. It is used to allow Graphite + to analyze the partially lowered OpenACC loops as if it was seeing + the unlowered loops. */ + +static bool analyze_openacc_calls = false; + +void set_scev_analyze_openacc_calls (bool analyze) +{ + analyze_openacc_calls = analyze; +} + + /* Constructs a new SCEV_INFO_STR structure for VAR and INSTANTIATED_BELOW. */ static inline struct scev_info_str * @@ -577,6 +595,51 @@ get_scalar_evolution (basic_block instantiated_below, tree scalar) return res; } +bool +oacc_call_analyzable_p (gimple *stmt) +{ + return analyze_openacc_calls + && gimple_call_internal_p (stmt, IFN_GOACC_LOOP); +} + +bool +oacc_call_analyzable_p (tree t) +{ + return TREE_CODE (t) == SSA_NAME + && oacc_call_analyzable_p (SSA_NAME_DEF_STMT (t)); +} + +/* Extract loop information from an OpenACC internal function call. */ + +tree +oacc_ifn_call_extract (gimple *stmt) +{ + if (oacc_call_analyzable_p (stmt)) + { + gcc_assert (gimple_call_internal_p (stmt, IFN_GOACC_LOOP)); + return oacc_extract_loop_call (as_a (stmt)); + } + + return chrec_dont_know; +} + +/* If EXPR is an analyzable internal OpenACC function call, + return the result of its analysis; otherwise return EXPR. */ + +tree +oacc_simplify (tree expr) +{ + if (expr == NULL || TREE_CODE (expr) != SSA_NAME) + return expr; + + gimple *def = SSA_NAME_DEF_STMT (expr); + + if (oacc_call_analyzable_p (def)) + return oacc_ifn_call_extract (def); + + return expr; +} + /* Helper function for add_to_evolution.
Returns the evolution function for an assignment of the form "a = b + c", where "a" and "b" are on the strongly connected component. CHREC_BEFORE is the @@ -794,6 +857,8 @@ add_to_evolution (unsigned loop_nb, tree chrec_before, enum tree_code code, if (to_add == NULL_TREE) return chrec_before; + to_add = oacc_simplify (to_add); + /* TO_ADD is either a scalar, or a parameter. TO_ADD is not instantiated at this point. */ if (TREE_CODE (to_add) == POLYNOMIAL_CHREC) @@ -966,6 +1031,7 @@ follow_ssa_edge_binary (class loop *loop, gimple *at_stmt, res = t_false; } + *evolution_of_loop = oacc_simplify (*evolution_of_loop); return res; } @@ -1116,6 +1182,8 @@ follow_ssa_edge_inner_loop_phi (class loop *outer_loop, evolution_of_loop, limit); } +tree interpret_gimple_call (class loop *loop, gimple *call); + /* Follow the ssa edge into the expression EXPR. Return true if the strongly connected component has been found. */ @@ -1124,8 +1192,11 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr, gphi *halting_phi, tree *evolution_of_loop, int limit) { - enum tree_code code; - tree type, rhs0, rhs1 = NULL_TREE; + enum tree_code code = LAST_AND_UNUSED_TREE_CODE; + tree type = NULL_TREE; + tree rhs0 = NULL_TREE; + tree rhs1 = NULL_TREE; + /* The EXPR is one of the following cases: - an SSA_NAME, @@ -1140,6 +1211,7 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr, PHI nodes and otherwise expand appropriately for the expression handling below. */ tail_recurse: + expr = oacc_simplify (expr); if (TREE_CODE (expr) == SSA_NAME) { gimple *def = SSA_NAME_DEF_STMT (expr); @@ -1187,28 +1259,37 @@ tail_recurse: return t_false; } - /* At this level of abstraction, the program is just a set - of GIMPLE_ASSIGNs and PHI_NODEs. In principle there is no - other def to be handled. */ - if (!is_gimple_assign (def)) - return t_false; + /* At this level of abstraction, the program is just a set of + GIMPLE_ASSIGNs and PHI_NODEs. 
In principle there is no other def to + be handled except for OpenACC internal function calls. */ + if (is_gimple_assign (def)) + { + code = gimple_assign_rhs_code (def); + + switch (get_gimple_rhs_class (code)) + { + case GIMPLE_BINARY_RHS: + rhs0 = gimple_assign_rhs1 (def); + rhs1 = gimple_assign_rhs2 (def); + break; + case GIMPLE_UNARY_RHS: + case GIMPLE_SINGLE_RHS: + rhs0 = gimple_assign_rhs1 (def); + break; + default: + return t_false; + } + type = TREE_TYPE (gimple_assign_lhs (def)); + at_stmt = def; + } + else if (oacc_call_analyzable_p (expr)) { + // TODO-kernels Is this still needed here? + rhs0 = interpret_gimple_call (loop, def); + type = TREE_TYPE (gimple_call_lhs (def)); + at_stmt = def; + } + else return t_false; - code = gimple_assign_rhs_code (def); - switch (get_gimple_rhs_class (code)) - { - case GIMPLE_BINARY_RHS: - rhs0 = gimple_assign_rhs1 (def); - rhs1 = gimple_assign_rhs2 (def); - break; - case GIMPLE_UNARY_RHS: - case GIMPLE_SINGLE_RHS: - rhs0 = gimple_assign_rhs1 (def); - break; - default: - return t_false; - } - type = TREE_TYPE (gimple_assign_lhs (def)); - at_stmt = def; } else { @@ -1473,6 +1554,7 @@ follow_copies_to_constant (tree var) else break; } + res = oacc_simplify (res); if (CONSTANT_CLASS_P (res)) return res; return var; @@ -1506,6 +1588,7 @@ analyze_initial_condition (gphi *loop_phi_node) tree branch = PHI_ARG_DEF (loop_phi_node, i); basic_block bb = gimple_phi_arg_edge (loop_phi_node, i)->src; + branch = oacc_simplify (branch); /* When the branch is oriented to the loop's body, it does not contribute to the initial condition. */ if (flow_bb_inside_loop_p (loop, bb)) @@ -1533,6 +1616,7 @@ analyze_initial_condition (gphi *loop_phi_node) /* We may not have fully constant propagated IL. Handle degenerate PHIs here to not miss important early loop unrollings. 
*/ init_cond = follow_copies_to_constant (init_cond); + init_cond = oacc_simplify (init_cond); if (dump_file && (dump_flags & TDF_SCEV)) { @@ -1558,6 +1642,7 @@ interpret_loop_phi (class loop *loop, gphi *loop_phi_node) /* Otherwise really interpret the loop phi. */ init_cond = analyze_initial_condition (loop_phi_node); res = analyze_evolution_in_loop (loop_phi_node, init_cond); + init_cond = analyze_initial_condition (loop_phi_node); /* Verify we maintained the correct initial condition throughout possible conversions in the SSA chain. */ @@ -1630,8 +1715,11 @@ interpret_rhs_expr (class loop *loop, gimple *at_stmt, return chrec_convert (type, rhs1, at_stmt); if (code == SSA_NAME) - return chrec_convert (type, analyze_scalar_evolution (loop, rhs1), - at_stmt); + { + rhs1 = oacc_simplify (rhs1); + return chrec_convert (type, analyze_scalar_evolution (loop, rhs1), + at_stmt); + } if (code == ASSERT_EXPR) { @@ -1920,7 +2008,25 @@ interpret_gimple_assign (class loop *loop, gimple *stmt) gimple_assign_rhs2 (stmt)); } - +/* Interpret a gimple call statement. */ + +tree +interpret_gimple_call (class loop *loop __attribute__ ((__unused__)), gimple *call) +{ + + /* Information about OpenACC loops is encoded in internal function calls. + Extract loop information from those calls. Ignore other calls for now. 
*/ + if (!oacc_call_analyzable_p (call)) + return chrec_dont_know; + + tree expr = oacc_ifn_call_extract (call); + tree analyzed = expr; + + tree lhs = gimple_call_lhs (call); + gcc_assert (lhs); + + return chrec_convert (TREE_TYPE (lhs), analyzed, call); +} /* This section contains all the entry points: - number_of_iterations_in_loop, @@ -1943,6 +2049,8 @@ analyze_scalar_evolution_1 (class loop *loop, tree var) def = SSA_NAME_DEF_STMT (var); bb = gimple_bb (def); + if (!bb) + return chrec_dont_know; def_loop = bb->loop_father; if (!flow_bb_inside_loop_p (loop, bb)) @@ -1969,6 +2077,10 @@ analyze_scalar_evolution_1 (class loop *loop, tree var) res = interpret_gimple_assign (loop, def); break; + case GIMPLE_CALL: + res = interpret_gimple_call (loop, def); + break; + case GIMPLE_PHI: if (loop_phi_node_p (def)) res = interpret_loop_phi (loop, as_a (def)); @@ -2261,6 +2373,14 @@ instantiate_scev_name (edge instantiate_below, class loop *def_loop; basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (chrec)); + if (oacc_call_analyzable_p (chrec)) + { + tree res + = interpret_gimple_call (evolution_loop, SSA_NAME_DEF_STMT (chrec)); + + return res; + } + /* A parameter, nothing to do. 
*/ if (!def_bb || !dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest)) @@ -3376,6 +3496,9 @@ expression_expensive_p (tree expr, hash_map &cache, return true; } + if (oacc_call_analyzable_p (expr)) + return false; + bool visited_p; uint64_t &local_cost = cache.get_or_insert (expr, &visited_p); if (visited_p) diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h index d679f7285b30..f35bfcd80417 100644 --- a/gcc/tree-scalar-evolution.h +++ b/gcc/tree-scalar-evolution.h @@ -42,6 +42,9 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *, bool); extern bool iv_can_overflow_p (class loop *, tree, tree, tree); extern tree compute_overall_effect_of_inner_loop (class loop *, tree); +extern void set_scev_analyze_openacc_calls (bool); +extern bool oacc_call_analyzable_p (gimple *); +extern bool oacc_call_analyzable_p (tree); /* Returns the basic block preceding LOOP, or the CFG entry block when the loop is function's body. */ diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c index 1281e67489c0..132e17251de0 100644 --- a/gcc/tree-ssa-dce.c +++ b/gcc/tree-ssa-dce.c @@ -242,6 +242,26 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive) && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee)) return; + /* Most, but not all function calls are required. Function calls that + produce no result and have no side effects (i.e. const pure + functions) are unnecessary. */ + if (gimple_has_side_effects (stmt)) + { + mark_stmt_necessary (stmt, true); + + /* The lhs of the OpenACC loop and reduction calls is necessary, + cf. the lowering in omp-offload.c. */ + if (gimple_call_internal_p (stmt, IFN_UNIQUE) + || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION)) + { + tree lhs = gimple_call_lhs (stmt); + if (lhs) + mark_operand_necessary (lhs); + } + + return; + } + /* IFN_GOACC_LOOP calls are necessary in that they are used to represent parameter (i.e. step, bound) of a lowered OpenACC partitioned loop.
But this kind of partitioned loop might not @@ -251,6 +271,9 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive) if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP)) { mark_stmt_necessary (stmt, true); + tree lhs = gimple_call_lhs (stmt); + gcc_assert (lhs); + mark_operand_necessary (lhs); return; } break; diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 75109407124f..b2689d348d64 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2039,6 +2039,9 @@ simplify_replace_tree (tree expr, tree old, tree new_tree, return (ret ? (do_fold ? fold (ret) : ret) : expr); } +bool oacc_call_analyzable_p (gimple* stmt); +tree interpret_gimple_call (class loop *loop, gimple *call); + /* Expand definitions of ssa names in EXPR as long as they are simple enough, and return the new expression. If STOP is specified, stop expanding if EXPR equals to it. */ @@ -2054,6 +2057,9 @@ expand_simple_operations (tree expr, tree stop, hash_map &cache) if (expr == NULL_TREE) return expr; + if (oacc_call_analyzable_p (expr)) + expr = interpret_gimple_call (NULL, SSA_NAME_DEF_STMT (expr)); + if (is_gimple_min_invariant (expr)) return expr; diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 8d5572033f7b..168bd348a6f2 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -155,6 +155,13 @@ make_pass_tree_loop (gcc::context *ctxt) static bool gate_oacc_kernels (function *fn) { + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE) + return false; + + gcc_checking_assert (param_openacc_kernels + == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS); + if (!flag_openacc) return false; @@ -323,6 +330,10 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { + if (param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS + && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS) + return false; + return (optimize && flag_openacc /* Don't bother doing anything if the program 
has errors. */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c index 9392e1d88c58..086645b3ac3d 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c @@ -3,6 +3,8 @@ /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting aspects of that functionality. */ +/* { dg-additional-options "-O2" } for Graphite/"kernels". */ + /* See also '../libgomp.oacc-fortran/parallel-dims.f90'. */ diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 index 5a47aca2dba2..f79d01ccc419 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 @@ -1,5 +1,6 @@ ! { dg-do run } ! { dg-additional-options "-cpp" } +! { dg-additional-options "-O2" } for Graphite #define N (1024 * 512) diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 index 37aa0ac4f632..5d35bdf9d6ff 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 @@ -1,6 +1,7 @@ ! Exercise the auto, independent, seq and tile loop clauses inside ! kernels regions. +! { dg-additional-options "-O2" } for Graphite ! { dg-do run } program loops diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 index cf1d0e569278..74ee6fde84f8 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 @@ -1,6 +1,7 @@ ! { dg-do run } ! { dg-additional-options "-fopt-info-omp-all" } ! { dg-additional-options "--param=openacc-kernels=decompose" } +! { dg-additional-options "-O2" } for Graphite ! 
It's only with Tcl 8.5 (released in 2007) that "the variable 'varName' ! passed to 'incr' may be unset, and in that case, it will be set to [...]", From patchwork Wed Dec 15 15:54:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48962 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9AED2385AC2D for ; Wed, 15 Dec 2021 16:08:23 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 753583858410 for ; Wed, 15 Dec 2021 15:56:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 753583858410 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: ZSSMg779MAA9YZ5woxMTkpJnrCHSRSGUEwSzKxOS126nOOTx26FU9HVqw4MwHNAhjNEFaswyPz U3B1/nhcykppHWfgcLiK4fqZ/r1SvgzaXuBXe/u1dJcszwQwcfELRs308dwzLD4oxo9yS/fpjw ecweqVmqP4E1on28nF5dLKyM0F+QgwNmfkHZyR9yQluA+USKeoCKHq5cqen5ezIqgNvpFfQAGT E+/OvWRAIS0Il1uiWRDrC8vGHbc9rOJpFTQWatDHGbNIIaQMqw/KBL4z8ICZrAv94x0AJKuvvb YvZgsaeomo/vSied1Mx8Ww1N X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738391" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:23 -0800 IronPort-SDR: 04j43a0d2+llMMHlChcPPsN6BapzcaCwymwxUN9buWJHT9nMVPKahPXGQDSbt09A61c61eVWmP RbZM34Zns6ZtHjno/z9o8Hk0d6OC0chtJTQ/jVTFfdX3nXlYS/1O7RLR6uVlkyVyzZN074HRg5 iurDV0UPUTQVPgW3dwb2nLdCNsc2u+Y9Bh+pjLHDqYBOIK4QigStJ/ZgceQhHFQItylWgDswF9 prbyVyKOg1k/YlY1Ds9TWfrkRHOlXhnznSbyX3aHqZc4J8OoXo+y1/wY697VS88yt195FF8bmZ Ny8= From: Frederik Harwath To: Subject: [PATCH 21/40] 
openacc: Add "can_be_parallel" flag info to "graph" dumps Date: Wed, 15 Dec 2021 16:54:28 +0100 Message-ID: <20211215155447.19379-22-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * graph.c (oacc_get_fn_attrib): New declaration. (find_loop_location): New declaration. (draw_cfg_nodes_for_loop): Print value of the can_be_parallel flag at the top of loops in OpenACC functions. 
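For illustration only, the label selection added to draw_cfg_nodes_for_loop can be sketched in standalone C. The function name format_loop_label and its integer flag parameters are hypothetical stand-ins for the pretty-printer call, the oacc_get_fn_attrib check and loop->can_be_parallel; this is a sketch under those assumptions, not the code from graph.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the cluster label for loop LOOP_NUM into BUF.
   IN_OACC_FN stands in for "oacc_get_fn_attrib (cfun->decl) is non-null",
   CAN_BE_PARALLEL for "loop->can_be_parallel".  Returns what snprintf
   returns.  */
static int
format_loop_label (char *buf, size_t size, int loop_num,
                   int in_oacc_fn, int can_be_parallel)
{
  assert (buf != NULL && size > 0);

  /* Outside OpenACC functions the suffix is empty; since the format keeps
     the separating space, the label then ends in a blank.  */
  const char *suffix = !in_oacc_fn ? ""
                       : can_be_parallel ? "(can_be_parallel = true)"
                       : "(can_be_parallel = false)";

  return snprintf (buf, size, "loop %d %s", loop_num, suffix);
}
```

With a 64-byte buffer, calling the helper for loop 3 in an OpenACC function with the flag set should give the label "loop 3 (can_be_parallel = true)".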
--- gcc/graph.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graph.c b/gcc/graph.c index 9acd1d5b95e4..a34356e8a7ec 100644 --- a/gcc/graph.c +++ b/gcc/graph.c @@ -192,6 +192,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun) } } + +extern tree oacc_get_fn_attrib (tree); +extern dump_user_location_t find_loop_location (class loop *); + /* Draw all the basic blocks in LOOP. Print the blocks in breath-first order to get a good ranking of the nodes. This function is recursive: It first prints inner loops, then the body of LOOP itself. */ @@ -206,17 +210,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no, if (loop->header != NULL && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun)) - pp_printf (pp, - "\tsubgraph cluster_%d_%d {\n" - "\tstyle=\"filled\";\n" - "\tcolor=\"darkgreen\";\n" - "\tfillcolor=\"%s\";\n" - "\tlabel=\"loop %d\";\n" - "\tlabeljust=l;\n" - "\tpenwidth=2;\n", - funcdef_no, loop->num, - fillcolors[(loop_depth (loop) - 1) % 3], - loop->num); + { + pp_printf (pp, + "\tsubgraph cluster_%d_%d {\n" + "\tstyle=\"filled\";\n" + "\tcolor=\"darkgreen\";\n" + "\tfillcolor=\"%s\";\n" + "\tlabel=\"loop %d %s\";\n" + "\tlabeljust=l;\n" + "\tpenwidth=2;\n", + funcdef_no, loop->num, + fillcolors[(loop_depth (loop) - 1) % 3], loop->num, + /* This is only meaningful for loops that have been processed + by Graphite. + + TODO Use can_be_parallel_valid_p? */ + !oacc_get_fn_attrib (cfun->decl) + ? "" + : loop->can_be_parallel ? 
"(can_be_parallel = true)" + : "(can_be_parallel = false)"); + } for (class loop *inner = loop->inner; inner; inner = inner->next) draw_cfg_nodes_for_loop (pp, funcdef_no, inner); From patchwork Wed Dec 15 15:54:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48963 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0AEFF3853801 for ; Wed, 15 Dec 2021 16:08:53 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 34BC7385843B for ; Wed, 15 Dec 2021 15:56:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 34BC7385843B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: y95AOQpl1I/r6Sl7QR1XL3m7qpizKm5VYm0c2ZtFo3WOFRrtPXgo7z1i9kF6L7L2pPOaz8HZ8w ucP5BDqrGAC8qmDZrr3UvKewMfKvMOUlqUAQx5jalcY60K5UTN5dZFXdMAFRwkyEnfiI/FbEKF CkrVVJkv0dqDpjJAs4fG1yFn+MHINMwv1RIORtYz5MjQ0AU7XS85ZZ5umFs9U8KRWJiqBldjig m2Bm60LX0hqy6WoHOGLtM4lPKX56E/pI28wRF70ch95/76yWTQTSA3qJ6gjjZXeuqHdr7iM/vN gtTWzX6skKyRJ+lqr/n/he0U X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738392" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:23 -0800 IronPort-SDR: ekwk1Pai37vzf0rQNMUtvelSrHZHYmd65CTR8XtlvGBRaUEyYfStby4OLz6acjKQ8w63URf+jn VJ5yvha/mV+nD5Jrsr6OIJLnAG4/CuxuNS5MFWlcaAwoUMly+gpGOrHAoGwZwTFpnKSBn6CEvV XWYxe649cewOz7IbbfHtla/LY0IuBWfJyTzcgaRyFx3V39X4fWwS1z/4nt3W7ycw7gAywkVKqI IRySJC6BWJBe6ZR+YxYA0SbFUIDrOAr6al/w9qSCT6bagsdHpkEBVCt82+Un/AuJ2CgVSSqIc0 3aY= From: Frederik Harwath 
To: Subject: [PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions Date: Wed, 15 Dec 2021 16:54:29 +0100 Message-ID: <20211215155447.19379-23-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" With the old "kernels" handling, unparallelized regions would get executed with 1x1x1 partitioning even if the user provided explicit num_gangs, num_workers clauses etc. This commit restores this behavior by removing unused partitioning after assigning the parallelism dimensions to loops. gcc/ChangeLog: * omp-offload.c (oacc_remove_unused_partitioning): New function for removing partitioning that is not used by any loop. (oacc_validate_dims): Call oacc_remove_unused_partitioning and enable warnings about unused partitioning. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust expectations. 
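As a rough illustration of the removal step described above, here is a minimal standalone C sketch. DIM_MAX, DIM_MASK and remove_unused_partitioning are hypothetical stand-ins for GOMP_DIM_MAX, GOMP_DIM_MASK and oacc_remove_unused_partitioning; unlike the real function, the sketch emits no warning and simply reports which dimensions it reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GOMP_DIM enumeration and GOMP_DIM_MASK.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(X) (1u << (X))

/* Reset to -1 (meaning "use the default") every dimension from LEVEL
   onwards that has an explicit size in DIMS but whose bit is not set in
   USED.  Returns a mask of the dimensions that were reset.  */
static unsigned
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != NULL);

  unsigned removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        dims[ix] = -1;
        removed |= DIM_MASK (ix);
      }
  return removed;
}
```

For a region launched with dims { 30, 3, 5 } that only contains gang and vector partitioned loops, the sketch resets the worker entry to -1 and leaves the other two untouched, which mirrors the "real_num_workers = 1" expectation in the adjusted test once the remaining dimensions are defaulted.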
--- gcc/omp-offload.c | 51 +++++++++++++++++-- .../acc_prof-kernels-1.c | 18 ++++--- 2 files changed, 58 insertions(+), 11 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 2743e90f79a3..392ca56b1f4f 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -1097,6 +1097,39 @@ oacc_parse_default_dims (const char *dims) targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0); } +/* Remove parallelism dimensions below LEVEL which are not set in USED + from DIMS and emit a warning pointing to the location of FN. */ + +static void +oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used) +{ + + bool host_compiler = true; +#ifdef ACCEL_COMPILER + host_compiler = false; +#endif + + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM enumeration. */ + { "gang", "worker", "vector" }; + + char removed_partitions[20] = "\0"; + for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++) + if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0) + { + if (host_compiler) + { + strcat (removed_partitions, axes[ix]); + strcat (removed_partitions, " "); + } + dims[ix] = -1; + } + if (removed_partitions[0] != '\0') + warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism, + "removed %spartitioning from %<kernels%> region", + removed_partitions); +} + /* Validate and update the dimensions for offloaded FN. ATTRS is the raw attribute. DIMS is an array of dimensions, which is filled in. 
LEVEL is the partitioning level of a routine, or -1 for an offload @@ -1117,6 +1150,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) for (ix = 0; ix != GOMP_DIM_MAX; ix++) { purpose[ix] = TREE_PURPOSE (pos); + tree val = TREE_VALUE (pos); dims[ix] = val ? TREE_INT_CST_LOW (val) : -1; pos = TREE_CHAIN (pos); @@ -1126,14 +1160,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) #ifdef ACCEL_COMPILER check = false; #endif + + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM enumeration. */ + { "gang", "worker", "vector" }; + if (check && warn_openacc_parallelism - && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)) - && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn))) + && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))) { - static char const *const axes[] = - /* Must be kept in sync with GOMP_DIM enumeration. */ - { "gang", "worker", "vector" }; for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++) if (dims[ix] < 0) ; /* Defaulting axis. */ @@ -1144,14 +1179,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) "region contains %s partitioned code but" " is not %s partitioned", axes[ix], axes[ix]); else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1) + { /* The dimension is explicitly partitioned to non-unity, but no use is made within the region. */ warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism, "region is %s partitioned but" " does not contain %s partitioned code", axes[ix], axes[ix]); + } } + if (lookup_attribute ("oacc parallel_kernels_graphite", + DECL_ATTRIBUTES (fn))) + oacc_remove_unused_partitioning (fn, dims, level, used); + bool changed = targetm.goacc.validate_dims (fn, dims, level, used); /* Default anything left to 1 or a partitioned default. 
*/ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c index ad33f72e2fb6..65c83dce01c9 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c @@ -7,6 +7,8 @@ #include +/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */ +/* { dg-additional-options "-Wopenacc-parallelism" } */ /* Use explicit 'copyin' clauses, to work around "'firstprivate' optimizations", which will cause the value at the point of call to be used @@ -95,12 +97,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e assert (event_info->launch_event.num_workers >= 1); else { -#ifdef __OPTIMIZE__ - assert (event_info->launch_event.num_workers == num_workers); -#else - /* See 'num_gangs' above. */ - assert (event_info->launch_event.num_workers == 1); -#endif + /* Unused partitioning levels get removed from "kernels" region. */ + assert (event_info->launch_event.num_workers == real_num_workers); } if (vector_length < 1) assert (event_info->launch_event.vector_length >= 1); @@ -183,6 +181,7 @@ int main() STATE_OP (state, = 0); num_gangs = 30; num_workers = 3; + real_num_workers = 1; vector_length = 5; { #define N 100 @@ -192,6 +191,8 @@ int main() /* { dg-prune-output "using vector_length \\(32\\), ignoring 5" } */ { for (int i = 0; i < N; ++i) + /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */ + /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */ x[i] = i * i; } if (acc_device_type == acc_device_host) @@ -208,6 +209,9 @@ int main() STATE_OP (state, = 0); num_gangs = 22; num_workers = 5; + /* No worker loop and hence, in a kernels region, worker partitioning + should be removed. 
*/ + real_num_workers = 1; vector_length = 7; { #define N 100 @@ -217,6 +221,8 @@ int main() /* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */ { for (int i = 0; i < N; ++i) + /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */ + /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */ x[i] = i * i; } if (acc_device_type == acc_device_host) From patchwork Wed Dec 15 15:54:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48965 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8E3B33858001 for ; Wed, 15 Dec 2021 16:10:50 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id C4497385843E for ; Wed, 15 Dec 2021 15:56:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C4497385843E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: VMvI8ZbRfpguSIkaZ60jiWX6PVtn38OFoZpdLHkbwvZSp2BFyno6jXmHdmsKi4qatwJF/zAi2v hgWxwW9x/1Odysgh6r2AGbLd+zDUPaFCVDYDj8bZaXHb821/NN/o8QOXEyotSiwGqfl0q8fmtB j7vzmSjM7O47tyi9Hq+7ytNT8/MDXMo2b9pDRy54Na/lY97Owoeae7jxaEhaZtZ0nK/lOnT3DB abNReA6LuAOkc25a+j3+ULNHlV+FQ9TR6zgLKOCg8x+wG8dWKixZW79ylGIryShO/13Ii4oPTf 0AQSvbGwWrMFgEeFOy5nwKgp X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69738395" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa4.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:26 -0800 IronPort-SDR: 
wgP8Gv6i5ksAj57JUkg7nMO1Z0+EOfz1etSMwJkM4qv4mDFIi2e7NdTXy2lbmFpv1HSSqVwdvE pXMcSiBOrJR+r2Lje3iKwCAxPH932pycd4cKsLW7PBt/iSlc4C1eaQTqW41dhaS8Bsftz3nav7 oKUFpfURn1Zom1QRdKB0qQl38U5Jp59bRWyrbAT5s9nASf8xt1d7BgnS3yV1nKL39UzmsikeFC SgPx/s+8l+e3eZKuCiMIjYdzOoZfjz2jlwbM/C55p6vC7Ei97ayOes4ad6ARtng2KqI6q16K7+ Xgg= From: Frederik Harwath To: Subject: [PATCH 23/40] Add function for printing a single OMP_CLAUSE Date: Wed, 15 Dec 2021 16:54:30 +0100 Message-ID: <20211215155447.19379-24-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain") changed the dumping behavior for OMP_CLAUSEs. The old behavior is required for a follow-up commit ("openacc: Add data optimization pass") that optimizes single OMP_CLAUSEs. gcc/ChangeLog: * tree-pretty-print.c (print_omp_clause_to_str): Add new function. * tree-pretty-print.h (print_omp_clause_to_str): Add declaration. 
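To make the difference between printing a single clause and printing the whole chain concrete, here is a small standalone C sketch. The struct clause type and the functions clause_to_str and chain_to_str are hypothetical; they only mimic the contract described above (head clause only versus the full chain, with the caller freeing the result) and do not use GCC's pretty-printer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one node of an OMP_CLAUSE chain.  */
struct clause
{
  const char *text;
  struct clause *next;
};

/* Return a newly allocated string for the clause at the head of C only,
   ignoring the rest of the chain.  The caller must free the result.  */
static char *
clause_to_str (const struct clause *c)
{
  assert (c != NULL);
  size_t len = strlen (c->text);
  char *s = malloc (len + 1);
  if (s)
    memcpy (s, c->text, len + 1);
  return s;
}

/* Return a newly allocated string for every clause in the chain C,
   separated by single spaces.  The caller must free the result.  */
static char *
chain_to_str (const struct clause *c)
{
  size_t len = 1;                       /* Terminating NUL.  */
  for (const struct clause *p = c; p; p = p->next)
    len += strlen (p->text) + 1;        /* Clause text plus separator.  */

  char *s = malloc (len);
  if (!s)
    return NULL;
  s[0] = '\0';
  for (const struct clause *p = c; p; p = p->next)
    {
      strcat (s, p->text);
      if (p->next)
        strcat (s, " ");
    }
  return s;
}
```

For a two-element chain "copy(x)" followed by "copyin(y)", clause_to_str yields "copy(x)" while chain_to_str yields "copy(x) copyin(y)"; a pass that reports on one clause at a time wants the former.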
--- gcc/tree-pretty-print.c | 11 +++++++++++ gcc/tree-pretty-print.h | 1 + 2 files changed, 12 insertions(+) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 275dc7d8af73..e85370cfe722 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -1360,6 +1360,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) } } +/* Print the single clause at the top of the clause chain C to a string and + return it. Note that print_generic_expr_to_str prints the whole clause chain + instead. The caller must free the returned memory. */ + +char * +print_omp_clause_to_str (tree c) +{ + pretty_printer pp; + dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS); + return xstrdup (pp_formatted_text (&pp)); +} /* Dump chain of OMP clauses. 
diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h index dacd256302b2..f9ff0ee1ce0b 100644 --- a/gcc/tree-pretty-print.h +++ b/gcc/tree-pretty-print.h @@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE); extern char *print_generic_expr_to_str (tree); extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t, bool = true); +extern char *print_omp_clause_to_str (tree); extern void dump_omp_atomic_memory_order (pretty_printer *, enum omp_memory_order); extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int, From patchwork Wed Dec 15 15:54:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48966 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4D9B1385843E for ; Wed, 15 Dec 2021 16:11:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 1F341385801D for ; Wed, 15 Dec 2021 15:56:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1F341385801D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: yAg0PViWUxn2Tghf3s0A+VwCf0ADItUXxMZXrXRpgeG00jjsoXIK7ATlan9cYzTVyyZujS/SOt t5lUa20Ke1oG2mskefZPAEzUQ6i2DXR+QtOW7DDFd2zwa2Er/EFcWuIYUvSLe+yoxyxKhjsM8y u1mKQhJyWsFlsiV3r/xRLNIYNcpBRnYH7cWCmvldFSWEwG6BIHVsime11PquWpYRF7KlJ3eS4L OxyiMbFB1i0AzSH6ojkAejWDapAgOccrxk5WVWU968h8gkdF24B9WBqpMlt4BHnzQ4U/0QNq4f GPuMNkrEDBz+5Ix0rwiydt+r X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="72258708" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) 
by esa1.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:37 -0800 IronPort-SDR: PvxyiGg2x/Sg5Dw+idPike15O7/3J+VO2U97S9fivh840SqhqkM9OTF5FR4VE+mwLA0XBCJ17j kSAPCDFLmqm3kIi4cpMQv+NdgCnauZDcW9kVDIgqwwZZx+Hpwk9XHmXnBvrhTLtq52rhLh1lEq 5NKOw3WVft7UAq4K+U6iAKAQllZ3isxWvXKJE3U4iP1+lzrCROtyQRasK81sH1ltI1FPVVtMxZ A9C8aw++znKK6YH/JCkplFk49JCnFKnCXCo04iX+u3hxfYOjxG4EEOqdW0WlBMLTw2cR0IOElf ins= From: Frederik Harwath To: Subject: [PATCH 24/40] openacc: Add data optimization pass Date: Wed, 15 Dec 2021 16:54:31 +0100 Message-ID: <20211215155447.19379-25-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Stubbs , rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs Address PR90591 "Avoid unnecessary data transfer out of OMP construct", for simple (but common) cases. This commit adds a pass that optimizes data mapping clauses. Currently, it can optimize copy/map(tofrom) clauses involving scalars to copyin/map(to) and further to "private". The pass is restricted to "kernels" regions but could be extended to other types of regions. gcc/ChangeLog: * Makefile.in: Add pass. * doc/gimple.texi: TODO. 
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking. * gimple-walk.h (struct walk_stmt_info): Add field. * passes.def: Add new pass. * tree-pass.h (make_pass_omp_data_optimize): New declaration. * omp-data-optimize.cc: New file. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Expect optimization messages. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/goacc/uninit-copy-clause.c: Likewise. * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise. * c-c++-common/goacc/omp_data_optimize-1.c: New test. * g++.dg/goacc/omp_data_optimize-1.C: New test. * gfortran.dg/goacc/omp_data_optimize-1.f90: New test. Co-Authored-By: Thomas Schwinge --- gcc/Makefile.in | 1 + gcc/doc/gimple.texi | 2 + gcc/gimple-walk.c | 15 +- gcc/gimple-walk.h | 6 + gcc/omp-data-optimize.cc | 951 ++++++++++++++++++ gcc/passes.def | 1 + .../c-c++-common/goacc/omp_data_optimize-1.c | 677 +++++++++++++ .../c-c++-common/goacc/uninit-copy-clause.c | 6 + .../g++.dg/goacc/omp_data_optimize-1.C | 169 ++++ .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++++++++++ .../gfortran.dg/goacc/uninit-copy-clause.f95 | 2 + gcc/tree-pass.h | 1 + .../kernels-decompose-1.c | 2 + .../libgomp.oacc-fortran/pr94358-1.f90 | 4 + 14 files changed, 2422 insertions(+), 3 deletions(-) create mode 100644 gcc/omp-data-optimize.cc create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index debd8047cc85..e876e6ec993c 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in 
@@ -1515,6 +1515,7 @@ OBJS = \ omp-oacc-kernels-decompose.o \ omp-oacc-neuter-broadcast.o \ omp-simd-clone.o \ + omp-data-optimize.o \ opt-problem.o \ optabs.o \ optabs-libfuncs.o \ diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 5d89dbcc68d5..c8f0b8b2a826 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -2770,4 +2770,6 @@ calling @code{walk_gimple_stmt} on each one. @code{WI} is as in @code{walk_gimple_stmt}. If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk is stopped and the value returned. Otherwise, all the statements are walked and @code{NULL_TREE} returned. + +TODO update for forward vs. backward. @end deftypefn diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c index e15fd4697ba1..b6add4394ab2 100644 --- a/gcc/gimple-walk.c +++ b/gcc/gimple-walk.c @@ -32,6 +32,8 @@ along with GCC; see the file COPYING3. If not see /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt on each one. WI is as in walk_gimple_stmt. + TODO update for forward vs. backward. + If walk_gimple_stmt returns non-NULL, the walk is stopped, and the value is stored in WI->CALLBACK_RESULT. Also, the statement that produced the value is returned if this statement has not been @@ -44,9 +46,10 @@ gimple * walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt, walk_tree_fn callback_op, struct walk_stmt_info *wi) { - gimple_stmt_iterator gsi; + bool forward = !(wi && wi->backward); - for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); ) + gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq); + for (; !gsi_end_p (gsi); ) { tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi); if (ret) @@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt, } if (!wi->removed_stmt) - gsi_next (&gsi); + { + if (forward) + gsi_next (&gsi); + else //TODO Correct? 
+ gsi_prev (&gsi); + //TODO This could do with some unit testing (see other 'gcc/*-tests.c' files for inspiration), to make sure all the corner cases (removing first/last, for example) work correctly. + } } if (wi) diff --git a/gcc/gimple-walk.h b/gcc/gimple-walk.h index f471f10088df..4ebc71d73ddf 100644 --- a/gcc/gimple-walk.h +++ b/gcc/gimple-walk.h @@ -71,6 +71,12 @@ struct walk_stmt_info /* True if we've removed the statement that was processed. */ BOOL_BITFIELD removed_stmt : 1; + + /*TODO True if we're walking backward instead of forward. */ + //TODO This flag is only applicable for 'walk_gimple_seq'. + //TODO Instead of this somewhat mis-placed (?) flag here, may be able to factor out the walking logic out of 'walk_gimple_stmt', and do the backward walking in a separate function? + //TODO + BOOL_BITFIELD backward : 1; }; /* Callback for walk_gimple_stmt. Called for every statement found diff --git a/gcc/omp-data-optimize.cc b/gcc/omp-data-optimize.cc new file mode 100644 index 000000000000..31f615c1d2bd --- /dev/null +++ b/gcc/omp-data-optimize.cc @@ -0,0 +1,951 @@ +/* OMP data optimize + + Copyright (C) 2021 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +/* This pass tries to optimize OMP data movement. + + The purpose is two-fold: (1) simply avoid redundant data movement, and (2) + as an enabler for other compiler optimizations. 
+ + Currently, the focus is on OpenACC 'kernels' constructs, but this may be + done more generally later: other compute constructs, but also structured + 'data' constructs, for example. + + Currently, this implements: + - Convert "copy/map(tofrom)" to "copyin/map(to)", where the variable is + known to be dead on exit. + - Further optimize to "private" where the variable is also known to be + dead on entry. + + Future improvements may include: + - Optimize mappings that do not start as "copy/map(tofrom)". + - Optimize mappings to "copyout/map(from)" where the variable is dead on + entry, but not exit. + - Improved data liveness checking. + - Etc. + + As long as we make sure to not violate user-expected OpenACC semantics, we + may do "anything". + + The pass runs too early to use the full data flow analysis tools, so this + uses some simplified rules. The analysis could certainly be improved. + + A variable is dead on exit if + 1. Nothing reads it between the end of the target region and the end + of the function. + 2. It is not global, static, external, or otherwise persistent. + 3. It is not addressable (and therefore cannot be aliased). + 4. There are no backward jumps following the target region (and therefore + there can be no loop around the target region). + + A variable is dead on entry if the first occurrence of the variable within + the target region is a write. The algorithm attempts to check all possible + code paths, but may give up where control flow is too complex. No attempt + is made to evaluate conditionals, so it is likely that it will miss cases + where the user might declare private manually. + + Future improvements: + 1. Allow backward jumps (loops) where the target is also after the end of + the target region. + 2. Detect dead-on-exit variables when there is a write following the + target region (tricky, in the presence of conditionals). + 3. Ignore reads in the "else" branch of conditionals where the target + region is in the "then" branch. 
+ 4. Optimize global/static/external variables that are provably dead on + entry or exit. + (Most of this can be achieved by unifying the two DF algorithms in this + file; the one for scanning inside the target regions had to be made more + capable, with propagation of live state across blocks, but that's more + effort than I have time right now to do the rework.) +*/ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree-pass.h" +#include "options.h" +#include "tree.h" +#include "function.h" +#include "basic-block.h" +#include "gimple.h" +#include "gimplify.h" +#include "gimple-iterator.h" +#include "gimple-walk.h" +#include "gomp-constants.h" +#include "gimple-pretty-print.h" + +#define DUMP_LOC(STMT) \ + dump_user_location_t::from_location_t (OMP_CLAUSE_LOCATION (STMT)) + +/* These types track why we could *not* optimize a variable mapping. The + main reason for differentiating the different reasons is diagnostics. */ + +enum inhibit_kinds { + INHIBIT_NOT, // "optimize" + INHIBIT_USE, + INHIBIT_JMP, + INHIBIT_BAD +}; + +struct inhibit_descriptor +{ + enum inhibit_kinds kind; + gimple *stmt; +}; + +/* OMP Data Optimize walk state tables. */ +struct ODO_State { + hash_map<tree, inhibit_descriptor> candidates; + hash_set<tree> visited_labels; + bool lhs_scanned; +}; + +/* These types track whether a variable can be fully private, or not. + + These are ORDERED in ascending precedence; when combining two values + (at a conditional or switch), the higher value is used. */ + +enum access_kinds { + ACCESS_NONE, /* Variable not accessed. */ + ACCESS_DEF_FIRST, /* Variable is defined before use. */ + ACCESS_UNKNOWN, /* Status is yet to be determined. */ + ACCESS_UNSUPPORTED, /* Variable is array or reference. */ + ACCESS_USE_FIRST /* Variable is used without definition (live on entry). */ +}; + +struct ODO_BB { + access_kinds access; + gimple *foot_stmt; +}; + +struct ODO_Target_state { + tree var; + + const void *bb_id; /* A unique id for the BB (use a convenient pointer).
*/ + ODO_BB bb; + bool lhs_scanned; + bool can_short_circuit; + + hash_map<const void *, ODO_BB> scanned_bb; +}; + +/* Classify a newly discovered variable, and add it to the candidate list. */ + +static void +omp_data_optimize_add_candidate (const dump_user_location_t &loc, tree var, + ODO_State *state) +{ + inhibit_descriptor in; + in.stmt = NULL; + + if (DECL_EXTERNAL (var)) + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, loc, + " -> unsuitable variable: %<%T%> is external\n", var); + + in.kind = INHIBIT_BAD; + } + else if (TREE_STATIC (var)) + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, loc, + " -> unsuitable variable: %<%T%> is static\n", var); + + in.kind = INHIBIT_BAD; + } + else if (TREE_ADDRESSABLE (var)) + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, loc, + " -> unsuitable variable: %<%T%> is addressable\n", + var); + + in.kind = INHIBIT_BAD; + } + else + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, loc, " -> candidate variable: %<%T%>\n", + var); + + in.kind = INHIBIT_NOT; + } + + if (state->candidates.put (var, in)) + gcc_unreachable (); +} + +/* Add all the variables in a gimple bind statement to the list of + optimization candidates. */ + +static void +omp_data_optimize_stmt_bind (const gbind *bind, ODO_State *state) +{ + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, bind, "considering scope\n"); + + tree vars = gimple_bind_vars (bind); + for (tree var = vars; var; var = TREE_CHAIN (var)) + omp_data_optimize_add_candidate (bind, var, state); +} + +/* Assess a control flow statement to see if it prevents us from optimizing + OMP variable mappings. A conditional jump usually won't, but a loop + means a much more complicated liveness algorithm than this would be needed + to reason effectively.
*/ + +static void +omp_data_optimize_stmt_jump (gimple *stmt, ODO_State *state) +{ + /* In the general case, in presence of looping/control flow, we cannot make + any promises about (non-)uses of 'var's -- so we have to inhibit + optimization. */ + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, stmt, "loop/control encountered: %G\n", stmt); + + bool forward = false; + switch (gimple_code (stmt)) + { + case GIMPLE_COND: + if (state->visited_labels.contains (gimple_cond_true_label + (as_a <gcond *> (stmt))) + && state->visited_labels.contains (gimple_cond_false_label + (as_a <gcond *> (stmt)))) + forward = true; + break; + case GIMPLE_GOTO: + if (state->visited_labels.contains (gimple_goto_dest + (as_a <ggoto *> (stmt)))) + forward = true; + break; + case GIMPLE_SWITCH: + { + gswitch *sw = as_a <gswitch *> (stmt); + forward = true; + for (unsigned i = 0; i < gimple_switch_num_labels (sw); i++) + if (!state->visited_labels.contains (CASE_LABEL + (gimple_switch_label (sw, + i)))) + { + forward = false; + break; + } + break; + } + case GIMPLE_ASM: + { + gasm *asm_stmt = as_a <gasm *> (stmt); + forward = true; + for (unsigned i = 0; i < gimple_asm_nlabels (asm_stmt); i++) + if (!state->visited_labels.contains (TREE_VALUE + (gimple_asm_label_op + (asm_stmt, i)))) + { + forward = false; + break; + } + break; + } + default: + gcc_unreachable (); + } + if (forward) + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, stmt, + " -> forward jump; candidates remain valid\n"); + + return; + } + + /* If we get here then control flow has invalidated all current optimization + candidates.
*/ + for (hash_map<tree, inhibit_descriptor>::iterator it = state->candidates.begin (); + it != state->candidates.end (); + ++it) + { + if ((*it).second.kind == INHIBIT_BAD) + continue; + + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, stmt, " -> discarding candidate: %T\n", + (*it).first); + + /* We're walking backward: this earlier instance ("earlier" in + 'gimple_seq' forward order) overrides what we may have had before. */ + (*it).second.kind = INHIBIT_JMP; + (*it).second.stmt = stmt; + } +} + +/* A helper callback for omp_data_optimize_can_be_private. + Check if an operand matches the specific one we're looking for, and + assess the context in which it appears. */ + +static tree +omp_data_optimize_scan_target_op (tree *tp, int *walk_subtrees, void *data) +{ + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; + ODO_Target_state *state = (ODO_Target_state *)wi->info; + tree op = *tp; + + if (wi->is_lhs && !state->lhs_scanned + && state->bb.access != ACCESS_USE_FIRST) + { + /* We're at the top level of the LHS operand. Anything we scan inside + (array indices etc.) should be treated as RHS. */ + state->lhs_scanned = 1; + + /* Writes to arrays and references are unhandled, as yet. */ + tree base = get_base_address (op); + if (base && base != op && base == state->var) + { + state->bb.access = ACCESS_UNSUPPORTED; + *walk_subtrees = 0; + } + /* Write to scalar variable. */ + else if (op == state->var) + { + state->bb.access = ACCESS_DEF_FIRST; + *walk_subtrees = 0; + } + } + else if (op == state->var) + { + state->bb.access = ACCESS_USE_FIRST; + *walk_subtrees = 0; + } + return NULL; +} + +/* A helper callback for omp_data_optimize_can_be_private, this assesses a + statement inside a target region to see how it affects the data flow of the + operands. A set of basic blocks is recorded, each with the observed access + details for the given variable.
*/ + +static tree +omp_data_optimize_scan_target_stmt (gimple_stmt_iterator *gsi_p, + bool *handled_ops_p, + struct walk_stmt_info *wi) +{ + ODO_Target_state *state = (ODO_Target_state *) wi->info; + gimple *stmt = gsi_stmt (*gsi_p); + + /* If an access was found in the previous statement then we're done. */ + if (state->bb.access != ACCESS_NONE && state->can_short_circuit) + { + *handled_ops_p = true; + return (tree)1; /* Return non-NULL, otherwise ignored. */ + } + + /* If the first def/use is already found then don't check more operands. */ + *handled_ops_p = state->bb.access != ACCESS_NONE; + + switch (gimple_code (stmt)) + { + /* These will be the last statement in a basic block, and will always + be followed by a label or the end of scope. */ + case GIMPLE_COND: + case GIMPLE_GOTO: + case GIMPLE_SWITCH: + if (state->bb.access == ACCESS_NONE) + state->bb.access = ACCESS_UNKNOWN; + state->bb.foot_stmt = stmt; + state->can_short_circuit = false; + break; + + /* asm goto statements are not necessarily followed by a label. */ + case GIMPLE_ASM: + if (gimple_asm_nlabels (as_a <gasm *> (stmt)) > 0) + { + if (state->bb.access == ACCESS_NONE) + state->bb.access = ACCESS_UNKNOWN; + state->bb.foot_stmt = stmt; + state->scanned_bb.put (state->bb_id, state->bb); + + /* Start a new fake BB using the asm string as a unique id. */ + state->bb_id = gimple_asm_string (as_a <gasm *> (stmt)); + state->bb.access = ACCESS_NONE; + state->bb.foot_stmt = NULL; + state->can_short_circuit = false; + } + break; + + /* A label is the beginning of a new basic block, and possibly the end + of the previous, in the case of a fall-through.
*/ + case GIMPLE_LABEL: + if (state->bb.foot_stmt == NULL) + state->bb.foot_stmt = stmt; + if (state->bb.access == ACCESS_NONE) + state->bb.access = ACCESS_UNKNOWN; + state->scanned_bb.put (state->bb_id, state->bb); + + state->bb_id = gimple_label_label (as_a <glabel *> (stmt)); + state->bb.access = ACCESS_NONE; + state->bb.foot_stmt = NULL; + break; + + /* These should not occur inside target regions?? */ + case GIMPLE_RETURN: + gcc_unreachable (); + + default: + break; + } + + /* Now walk the operands. */ + state->lhs_scanned = false; + return NULL; +} + +/* Check every operand under a gimple statement to see if a specific variable + is dead on entry to an OMP TARGET statement. If so, then we can make the + variable mapping PRIVATE. */ + +static bool +omp_data_optimize_can_be_private (tree var, gimple *target_stmt) +{ + ODO_Target_state state; + state.var = var; + void *root_id = var; /* Any non-null pointer will do for the unique ID. */ + state.bb_id = root_id; + state.bb.access = ACCESS_NONE; + state.bb.foot_stmt = NULL; + state.lhs_scanned = false; + state.can_short_circuit = true; + + struct walk_stmt_info wi; + memset (&wi, 0, sizeof (wi)); + wi.info = &state; + + /* Walk the target region and build the BB list. */ + gimple_seq target_body = *gimple_omp_body_ptr (target_stmt); + walk_gimple_seq (target_body, omp_data_optimize_scan_target_stmt, + omp_data_optimize_scan_target_op, &wi); + + /* Calculate the liveness data for the whole region. */ + if (state.can_short_circuit) + ; /* state.bb.access has the answer already. */ + else + { + /* There's some control flow to navigate. */ + + /* First enter the final BB into the table. */ + state.scanned_bb.put (state.bb_id, state.bb); + + /* Propagate the known access findings to the parent BBs. + + For each BB that does not have a known liveness value, combine + the liveness data from its descendent BBs, if known. Repeat until + there are no more changes to make.
*/ + bool changed; + do { + changed = false; + for (hash_map<const void *, ODO_BB>::iterator it = state.scanned_bb.begin (); + it != state.scanned_bb.end (); + ++it) + { + ODO_BB *bb = &(*it).second; + tree label; + const void *bb_id1, *bb_id2; + ODO_BB *chain_bb1, *chain_bb2; + unsigned num_labels; + + /* The foot statement is NULL, in the exit block. + Blocks that already have liveness data are done. */ + if (bb->foot_stmt == NULL + || bb->access != ACCESS_UNKNOWN) + continue; + + /* If we get here then bb->access == ACCESS_UNKNOWN. */ + switch (gimple_code (bb->foot_stmt)) + { + /* If the final statement of a block is the label statement + then we have a fall-through. The liveness data can be simply + copied from the next block. */ + case GIMPLE_LABEL: + bb_id1 = gimple_label_label (as_a <glabel *> (bb->foot_stmt)); + chain_bb1 = state.scanned_bb.get (bb_id1); + if (chain_bb1->access != ACCESS_UNKNOWN) + { + bb->access = chain_bb1->access; + changed = true; + } + break; + + /* Combine the liveness data from both branches of a conditional + statement. The access values are ordered such that the + higher value takes precedence. */ + case GIMPLE_COND: + bb_id1 = gimple_cond_true_label (as_a <gcond *> + (bb->foot_stmt)); + bb_id2 = gimple_cond_false_label (as_a <gcond *> + (bb->foot_stmt)); + chain_bb1 = state.scanned_bb.get (bb_id1); + chain_bb2 = state.scanned_bb.get (bb_id2); + bb->access = (chain_bb1->access > chain_bb2->access + ? chain_bb1->access + : chain_bb2->access); + if (bb->access != ACCESS_UNKNOWN) + changed = true; + break; + + /* Copy the liveness data from the destination block. */ + case GIMPLE_GOTO: + bb_id1 = gimple_goto_dest (as_a <ggoto *> (bb->foot_stmt)); + chain_bb1 = state.scanned_bb.get (bb_id1); + if (chain_bb1->access != ACCESS_UNKNOWN) + { + bb->access = chain_bb1->access; + changed = true; + } + break; + + /* Combine the liveness data from all the branches of a switch + statement. The access values are ordered such that the + highest value takes precedence.
*/ + case GIMPLE_SWITCH: + num_labels = gimple_switch_num_labels (as_a <gswitch *> + (bb->foot_stmt)); + bb->access = ACCESS_NONE; /* Lowest precedence value. */ + for (unsigned i = 0; i < num_labels; i++) + { + label = gimple_switch_label (as_a <gswitch *> + (bb->foot_stmt), i); + chain_bb1 = state.scanned_bb.get (CASE_LABEL (label)); + bb->access = (bb->access > chain_bb1->access + ? bb->access + : chain_bb1->access); + } + if (bb->access != ACCESS_UNKNOWN) + changed = true; + break; + + /* Combine the liveness data from all the branches of an asm goto + statement. The access values are ordered such that the + highest value takes precedence. */ + case GIMPLE_ASM: + num_labels = gimple_asm_nlabels (as_a <gasm *> (bb->foot_stmt)); + bb->access = ACCESS_NONE; /* Lowest precedence value. */ + /* Loop through all the labels and the fall-through block. */ + for (unsigned i = 0; i < num_labels + 1; i++) + { + if (i < num_labels) + bb_id1 = TREE_VALUE (gimple_asm_label_op + (as_a <gasm *> (bb->foot_stmt), i)); + else + /* The fall-through fake-BB uses the string for an ID. */ + bb_id1 = gimple_asm_string (as_a <gasm *> + (bb->foot_stmt)); + chain_bb1 = state.scanned_bb.get (bb_id1); + bb->access = (bb->access > chain_bb1->access + ? bb->access + : chain_bb1->access); + } + if (bb->access != ACCESS_UNKNOWN) + changed = true; + break; + + /* No other statement kinds should appear as foot statements. */ + default: + gcc_unreachable (); + } + } + } while (changed); + + /* The access status should now be readable from the initial BB, + if one could be determined. */ + state.bb = *state.scanned_bb.get (root_id); + } + + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + { + for (hash_map<const void *, ODO_BB>::iterator it = state.scanned_bb.begin (); + it != state.scanned_bb.end (); + ++it) + { + ODO_BB *bb = &(*it).second; + dump_printf_loc (MSG_NOTE, bb->foot_stmt, + "%<%T%> is %s on entry to block ending here\n", var, + (bb->access == ACCESS_NONE + || bb->access == ACCESS_DEF_FIRST ? "dead" + : bb->access == ACCESS_USE_FIRST ?
"live" + : bb->access == ACCESS_UNSUPPORTED + ? "unknown (unsupported op)" + : "unknown (complex control flow)")); + } + /* If the answer was found early then then the last BB to be scanned + will not have been entered into the table. */ + if (state.can_short_circuit) + dump_printf_loc (MSG_NOTE, target_stmt, + "%<%T%> is %s on entry to target region\n", var, + (state.bb.access == ACCESS_NONE + || state.bb.access == ACCESS_DEF_FIRST ? "dead" + : state.bb.access == ACCESS_USE_FIRST ? "live" + : state.bb.access == ACCESS_UNSUPPORTED + ? "unknown (unsupported op)" + : "unknown (complex control flow)")); + } + + if (state.bb.access != ACCESS_DEF_FIRST + && dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf_loc (MSG_NOTE, target_stmt, "%<%T%> is not suitable" + " for private optimization; %s\n", var, + (state.bb.access == ACCESS_USE_FIRST + ? "live on entry" + : state.bb.access == ACCESS_UNKNOWN + ? "complex control flow" + : "unknown reason")); + + return state.bb.access == ACCESS_DEF_FIRST; +} + +/* Inspect a tree operand, from a gimple walk, and check to see if it is a + variable use that might mean the variable is not a suitable candidate for + optimization in a prior target region. + + This algorithm is very basic and can be easily fooled by writes with + subsequent reads, but it should at least err on the safe side. */ + +static void +omp_data_optimize_inspect_op (tree op, ODO_State *state, bool is_lhs, + gimple *stmt) +{ + if (is_lhs && !state->lhs_scanned) + { + /* We're at the top level of the LHS operand. + Anything we scan inside should be treated as RHS. */ + state->lhs_scanned = 1; + + /* Writes to variables are not yet taken into account, beyond not + invalidating the optimization, but not everything on the + left-hand-side is a write (array indices, etc.), and if one element of + an array is written to then we should assume the rest is live. 
*/ + tree base = get_base_address (op); + if (base && base == op) + return; /* Writes to scalars are not a "use". */ + } + + if (!DECL_P (op)) + return; + + /* If we get here then we have found a use of a variable. */ + tree var = op; + + inhibit_descriptor *id = state->candidates.get (var); + if (id && id->kind != INHIBIT_BAD) + { + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + { + if (gimple_code (stmt) == GIMPLE_OMP_TARGET) + dump_printf_loc (MSG_NOTE, id->stmt, + "encountered variable use in target stmt\n"); + else + dump_printf_loc (MSG_NOTE, id->stmt, + "encountered variable use: %G\n", stmt); + dump_printf_loc (MSG_NOTE, id->stmt, + " -> discarding candidate: %T\n", op); + } + + /* We're walking backward: this earlier instance ("earlier" in + 'gimple_seq' forward order) overrides what we may have had before. */ + id->kind = INHIBIT_USE; + id->stmt = stmt; + } +} + +/* Optimize the data mappings of a target region, where our backward gimple + walk has identified that the variable is definitely dead on exit. */ + +static void +omp_data_optimize_stmt_target (gimple *stmt, ODO_State *state) +{ + for (tree *pc = gimple_omp_target_clauses_ptr (stmt); *pc; + pc = &OMP_CLAUSE_CHAIN (*pc)) + { + if (OMP_CLAUSE_CODE (*pc) != OMP_CLAUSE_MAP) + continue; + + tree var = OMP_CLAUSE_DECL (*pc); + if (OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_FORCE_TOFROM + || OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_TOFROM) + { + /* The dump_printf_loc format code %T does not print + the head clause of a clause chain but the whole chain. + Print the last considered clause manually. */ + char *c_s_prev = NULL; + if (dump_enabled_p ()) + c_s_prev = print_omp_clause_to_str (*pc); + + inhibit_descriptor *id = state->candidates.get (var); + if (!id) { + /* The variable was not a parameter or named in any bind, so it + must be in an external scope, and therefore live-on-exit. 
*/ + if (dump_enabled_p ()) + dump_printf_loc(MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc), + "%qs not optimized: %<%T%> is unsuitable" + " for privatization\n", + c_s_prev, var); + continue; + } + + switch (id->kind) + { + case INHIBIT_NOT: /* Don't inhibit optimization. */ + + /* Change map type from "tofrom" to "to". */ + OMP_CLAUSE_SET_MAP_KIND (*pc, GOMP_MAP_TO); + + if (dump_enabled_p ()) + { + char *c_s_opt = print_omp_clause_to_str (*pc); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc), + "%qs optimized to %qs\n", c_s_prev, c_s_opt); + free (c_s_prev); + c_s_prev = c_s_opt; + } + + /* Variables that are dead-on-entry and dead-on-loop can be + further optimized to private. */ + if (omp_data_optimize_can_be_private (var, stmt)) + { + tree c_f = (build_omp_clause + (OMP_CLAUSE_LOCATION (*pc), + OMP_CLAUSE_PRIVATE)); + OMP_CLAUSE_DECL (c_f) = var; + OMP_CLAUSE_CHAIN (c_f) = OMP_CLAUSE_CHAIN (*pc); + //TODO Copy "implicit" flag from 'var'. + *pc = c_f; + + if (dump_enabled_p ()) + { + char *c_s_opt = print_omp_clause_to_str (*pc); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc), + "%qs further optimized to %qs\n", + c_s_prev, c_s_opt); + free (c_s_prev); + c_s_prev = c_s_opt; + } + } + break; + + case INHIBIT_USE: /* Optimization inhibited by a variable use. */ + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc), + "%qs not optimized: %<%T%> used...\n", + c_s_prev, var); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt, + "... here\n"); + } + break; + + case INHIBIT_JMP: /* Optimization inhibited by control flow. */ + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc), + "%qs not optimized: %<%T%> disguised by" + " looping/control flow...\n", c_s_prev, var); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt, + "... here\n"); + } + break; + + case INHIBIT_BAD: /* Optimization inhibited by properties. 
*/ + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc), + "%qs not optimized: %<%T%> is unsuitable" + " for privatization\n", c_s_prev, var); + } + break; + + default: + gcc_unreachable (); + } + + if (dump_enabled_p ()) + free (c_s_prev); + } + } + + /* Variables used by target regions cannot be optimized from earlier + target regions. */ + for (tree c = *gimple_omp_target_clauses_ptr (stmt); + c; c = OMP_CLAUSE_CHAIN (c)) + { + /* This needs to include all the mapping clauses listed in + OMP_TARGET_CLAUSE_MASK in c-parser.c. */ + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP + && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_PRIVATE + && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE) + continue; + + tree var = OMP_CLAUSE_DECL (c); + omp_data_optimize_inspect_op (var, state, false, stmt); + } +} + +/* Call back for gimple walk. Scan the statement for target regions and + variable uses or control flow that might prevent us optimizing offload + data copies. */ + +static tree +omp_data_optimize_callback_stmt (gimple_stmt_iterator *gsi_p, + bool *handled_ops_p, + struct walk_stmt_info *wi) +{ + ODO_State *state = (ODO_State *) wi->info; + + *handled_ops_p = false; + state->lhs_scanned = false; + + gimple *stmt = gsi_stmt (*gsi_p); + + switch (gimple_code (stmt)) + { + /* A bind introduces a new variable scope that might include optimizable + variables. */ + case GIMPLE_BIND: + omp_data_optimize_stmt_bind (as_a <gbind *> (stmt), state); + break; + + /* Tracking labels allows us to understand control flow better. */ + case GIMPLE_LABEL: + state->visited_labels.add (gimple_label_label (as_a <glabel *> (stmt))); + break; + + /* Statements that might constitute some looping/control flow pattern + may inhibit optimization of target mappings. */ + case GIMPLE_COND: + case GIMPLE_GOTO: + case GIMPLE_SWITCH: + case GIMPLE_ASM: + omp_data_optimize_stmt_jump (stmt, state); + break; + + /* A target statement that will have variables for us to optimize.
*/ + case GIMPLE_OMP_TARGET: + /* For now, only look at OpenACC 'kernels' constructs. */ + if (gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS) + { + omp_data_optimize_stmt_target (stmt, state); + + /* Don't walk inside the target region; use of private variables + inside the private region does not stop them being private! + NOTE: we *do* want to walk target statement types that are not + (yet) handled by omp_data_optimize_stmt_target as the uses there + must not be missed. */ + // TODO add tests for mixed kernels/parallels + *handled_ops_p = true; + } + break; + + default: + break; + } + + return NULL; +} + +/* Call back for gimple walk. Scan the operand for variable uses. */ + +static tree +omp_data_optimize_callback_op (tree *tp, int *walk_subtrees, void *data) +{ + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; + + omp_data_optimize_inspect_op (*tp, (ODO_State *)wi->info, wi->is_lhs, + wi->stmt); + + *walk_subtrees = 1; + return NULL; +} + +/* Main pass entry point. See comments at head of file. */ + +static unsigned int +omp_data_optimize (void) +{ + /* Capture the function arguments so that they can be optimized. */ + ODO_State state; + for (tree decl = DECL_ARGUMENTS (current_function_decl); + decl; + decl = DECL_CHAIN (decl)) + { + const dump_user_location_t loc = dump_user_location_t::from_function_decl (decl); + omp_data_optimize_add_candidate (loc, decl, &state); + } + + /* Scan and optimize the function body, from bottom to top. 
*/ + struct walk_stmt_info wi; + memset (&wi, 0, sizeof (wi)); + wi.backward = true; + wi.info = &state; + gimple_seq body = gimple_body (current_function_decl); + walk_gimple_seq (body, omp_data_optimize_callback_stmt, + omp_data_optimize_callback_op, &wi); + + return 0; +} + + +namespace { + +const pass_data pass_data_omp_data_optimize = +{ + GIMPLE_PASS, /* type */ + "omp_data_optimize", /* name */ + OPTGROUP_OMP, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_gimple_any, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_omp_data_optimize : public gimple_opt_pass +{ +public: + pass_omp_data_optimize (gcc::context *ctxt) + : gimple_opt_pass (pass_data_omp_data_optimize, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return (flag_openacc + && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE); + } + virtual unsigned int execute (function *) + { + return omp_data_optimize (); + } + +}; // class pass_omp_data_optimize + +} // anon namespace + +gimple_opt_pass * +make_pass_omp_data_optimize (gcc::context *ctxt) +{ + return new pass_omp_data_optimize (ctxt); +} diff --git a/gcc/passes.def b/gcc/passes.def index 5b9bb422d281..681392f8f79f 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_warn_unused_result); NEXT_PASS (pass_diagnose_omp_blocks); NEXT_PASS (pass_diagnose_tm_blocks); + NEXT_PASS (pass_omp_data_optimize); NEXT_PASS (pass_omp_oacc_kernels_decompose); NEXT_PASS (pass_lower_omp); NEXT_PASS (pass_lower_cf); diff --git a/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c new file mode 100644 index 000000000000..c90031a40b71 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c @@ -0,0 +1,677 @@ +/* Test 'gcc/omp-data-optimize.cc'.
*/ + +/* { dg-additional-options "-fdump-tree-gimple-raw" } */ +/* { dg-additional-options "-fopt-info-omp-all" } */ + +/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName' + passed to 'incr' may be unset, and in that case, it will be set to [...]", + so to maintain compatibility with earlier Tcl releases, we manually + initialize counter variables: + { dg-line l_compute[variable c_compute 0] } + { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid + "WARNING: dg-line var l_compute defined, but not used". + { dg-line l_use[variable c_use 0] } + { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid + "WARNING: dg-line var l_use defined, but not used". + { dg-line l_lcf[variable c_lcf 0] } + { dg-message "dummy" "" { target iN-VAl-Id } l_lcf } to avoid + "WARNING: dg-line var l_lcf defined, but not used". */ + +extern int ef1(int); + + +/* Optimization happens. */ + +long opt_1_gvar1; +extern short opt_1_evar1; +static long opt_1_svar1; + +static int opt_1(int opt_1_pvar1) +{ + int opt_1_lvar1; + extern short opt_1_evar2; + static long opt_1_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + int dummy1 = opt_1_pvar1; + int dummy2 = opt_1_lvar1; + int dummy3 = opt_1_evar2; + int dummy4 = opt_1_svar2; + + int dummy5 = opt_1_gvar1; + int dummy6 = opt_1_evar1; + int dummy7 = opt_1_svar1; + } + + return 0; + +/* { dg-optimized {'map\(force_tofrom:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed 
{'map\(force_tofrom:opt_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +long opt_2_gvar1; +extern short opt_2_evar1; +static long opt_2_svar1; + +static int opt_2(int opt_2_pvar1) +{ + int opt_2_lvar1; + extern short opt_2_evar2; + static long opt_2_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + int dummy1 = opt_2_pvar1; + int dummy2 = opt_2_lvar1; + int dummy3 = opt_2_evar2; + int dummy4 = opt_2_svar2; + + int dummy5 = opt_2_gvar1; + int dummy6 = opt_2_evar1; + int dummy7 = opt_2_svar1; + } + + /* A write does not inhibit optimization. 
*/ + + opt_2_pvar1 = 0; + opt_2_lvar1 = 1; + opt_2_evar2 = 2; + opt_2_svar2 = 3; + + opt_2_gvar1 = 10; + opt_2_evar1 = 11; + opt_2_svar1 = 12; + + return 0; + +/* { dg-optimized {'map\(force_tofrom:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +long opt_3_gvar1; +extern short opt_3_evar1; +static long opt_3_svar1; + +static int opt_3(int opt_3_pvar1) +{ + int opt_3_lvar1; + extern short opt_3_evar2; + static long opt_3_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* A write inside the kernel inhibits optimization to firstprivate. + TODO: optimize to private where the variable is dead-on-entry. 
*/ + + opt_3_pvar1 = 1; + opt_3_lvar1 = 2; + opt_3_evar2 = 3; + opt_3_svar2 = 4; + + opt_3_gvar1 = 5; + opt_3_evar1 = 6; + opt_3_svar1 = 7; + } + + return 0; + +/* { dg-optimized {'map\(force_tofrom:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void opt_4() +{ + int opt_4_larray1[10]; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + int dummy1 = opt_4_larray1[4]; + int dummy2 = opt_4_larray1[8]; + } + +/* { dg-optimized 
{'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'firstprivate\(opt_4_larray1\)'} "" { target *-*-* } l_compute$c_compute } */ +} + +static void opt_5 (int opt_5_pvar1) +{ + int opt_5_larray1[10]; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + opt_5_larray1[opt_5_pvar1] = 1; + opt_5_pvar1[opt_5_larray1] = 2; + } + +/* { dg-optimized {'map\(force_tofrom:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ + +/* TODO: this probably should be optimizable. */ +/* { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + + +/* Similar, but with optimization inhibited because of variable use. */ + +static int use_1(int use_1_pvar1) +{ + float use_1_lvar1; + extern char use_1_evar2; + static double use_1_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + use_1_pvar1 = 0; + use_1_lvar1 = 1; + use_1_evar2 = 2; + use_1_svar2 = 3; + } + + int s = 0; + s += use_1_pvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */ + s += use_1_lvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */ + s += use_1_evar2; /* { dg-bogus {note: \.\.\. here} "" { target *-*-* } } */ + s += use_1_svar2; /* { dg-bogus {note: \.\.\. 
here} "" { target *-*-* } } */ + + return s; + +/* { dg-missed {'map\(force_tofrom:use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:use_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:use_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +extern int use_2_a1[]; + +static int use_2(int use_2_pvar1) +{ + int use_2_lvar1; + extern int use_2_evar2; + static int use_2_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + use_2_pvar1 = 0; + use_2_lvar1 = 1; + use_2_evar2 = 2; + use_2_svar2 = 3; + } + + int s = 0; + s += use_2_a1[use_2_pvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */ + s += use_2_a1[use_2_lvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */ + s += use_2_a1[use_2_evar2]; + s += use_2_a1[use_2_svar2]; + + return s; + +/*TODO The following GIMPLE dump scanning may be too fragile (across + different GCC configurations)? The idea is to verify that we're indeed + doing the "deep scanning", as discussed in + . 
*/ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-missed {'map\(force_tofrom:use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times {(?n) gimple_assign $} 1 "gimple" } } */ +/* { dg-missed {'map\(force_tofrom:use_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:use_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void use_3 () +{ + int use_5_lvar1; + int use_5_larray1[10]; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + use_5_lvar1 = 5; + } + + use_5_larray1[use_5_lvar1] = 1; /* { dg-line l_use[incr c_use] } */ + +/* { dg-missed {'map\(force_tofrom:use_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_5_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */ +} + + +/* Similar, but with the optimization inhibited because of looping/control flow. 
*/ + +static void lcf_1(int lcf_1_pvar1) +{ + float lcf_1_lvar1; + extern char lcf_1_evar2; + static double lcf_1_svar2; + + for (int i = 0; i < ef1(i); ++i) /* { dg-line l_lcf[incr c_lcf] } */ + { +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_1_pvar1 = 0; + lcf_1_lvar1 = 1; + lcf_1_evar2 = 2; + lcf_1_svar2 = 3; + } + } + +/* { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. 
here} "" { target *-*-* } l_lcf$c_lcf } */ +} + +static void lcf_2(int lcf_2_pvar1) +{ + float lcf_2_lvar1; + extern char lcf_2_evar2; + static double lcf_2_svar2; + + if (ef1 (0)) + return; + + repeat: +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_2_pvar1 = 0; + lcf_2_lvar1 = 1; + lcf_2_evar2 = 2; + lcf_2_svar2 = 3; + } + + goto repeat; /* { dg-line l_lcf[incr c_lcf] } */ + +/* { dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } +/* { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. 
here} "" { target *-*-* } l_lcf$c_lcf } */ +} + +static void lcf_3(int lcf_3_pvar1) +{ + float lcf_3_lvar1; + extern char lcf_3_evar2; + static double lcf_3_svar2; + + if (ef1 (0)) + return; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_3_pvar1 = 0; + lcf_3_lvar1 = 1; + lcf_3_evar2 = 2; + lcf_3_svar2 = 3; + } + + // Backward jump after kernel + repeat: + goto repeat; /* { dg-line l_lcf[incr c_lcf] } */ + +/* { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } +/* { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. 
here} "" { target *-*-* } l_lcf$c_lcf } */ +} + +static void lcf_4(int lcf_4_pvar1) +{ + float lcf_4_lvar1; + extern char lcf_4_evar2; + static double lcf_4_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_4_pvar1 = 0; + lcf_4_lvar1 = 1; + lcf_4_evar2 = 2; + lcf_4_svar2 = 3; + } + + // Forward jump after kernel + goto out; + + out: + return; + +/* { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +} + +static void lcf_5(int lcf_5_pvar1) +{ + float lcf_5_lvar1; + extern char lcf_5_evar2; + static double lcf_5_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_5_pvar1 = 0; + lcf_5_lvar1 = 1; + lcf_5_evar2 = 2; + lcf_5_svar2 = 3; + } + + if (ef1 (-1)) + ; + + return; + +/* { dg-optimized {'map\(force_tofrom:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_5_pvar1 
\[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void lcf_6(int lcf_6_pvar1) +{ + float lcf_6_lvar1; + extern char lcf_6_evar2; + static double lcf_6_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_6_pvar1 = 0; + lcf_6_lvar1 = 1; + lcf_6_evar2 = 2; + lcf_6_svar2 = 3; + } + + int x = ef1 (-2) ? 
1 : -1; + + return; + +/* { dg-optimized {'map\(force_tofrom:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void lcf_7(int lcf_7_pvar1) +{ + float lcf_7_lvar1; + extern char lcf_7_evar2; + static double lcf_7_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_7_pvar1 = 0; + lcf_7_lvar1 = 1; + lcf_7_evar2 = 2; + lcf_7_svar2 = 3; + } + + switch (ef1 (-2)) + { + case 0: ef1 (10); break; + case 2: ef1 (11); break; + default: ef1 (12); break; + } + + return; + +/* { dg-optimized {'map\(force_tofrom:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" 
{ target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_7_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_7_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void lcf_8(int lcf_8_pvar1) +{ + float lcf_8_lvar1; + extern char lcf_8_evar2; + static double lcf_8_svar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + lcf_8_pvar1 = 0; + lcf_8_lvar1 = 1; + lcf_8_evar2 = 2; + lcf_8_svar2 = 3; + } + + asm goto ("" :::: out); + +out: + return; + +/* { dg-optimized {'map\(force_tofrom:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_8_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:lcf_8_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +/* Ensure that variables 
are promoted to private properly. */ + +static void priv_1 () +{ + int priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4, priv_1_lvar5; + int priv_1_lvar6, priv_1_lvar7, priv_1_lvar8, priv_1_lvar9, priv_1_lvar10; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + priv_1_lvar1 = 1; + int dummy = priv_1_lvar2; + + if (priv_1_lvar2) + { + priv_1_lvar3 = 1; + } + else + { + priv_1_lvar3 = 2; + } + + priv_1_lvar5 = priv_1_lvar3; + + if (priv_1_lvar2) + { + priv_1_lvar4 = 1; + int dummy = priv_1_lvar4; + } + + switch (priv_1_lvar2) + { + case 0: + priv_1_lvar5 = 1; + dummy = priv_1_lvar6; + break; + case 1: + priv_1_lvar5 = 2; + priv_1_lvar6 = 3; + break; + default: + break; + } + + asm goto ("" :: "r"(priv_1_lvar7) :: label1, label2); + if (0) + { +label1: + priv_1_lvar8 = 1; + priv_1_lvar9 = 2; + } + if (0) + { +label2: + dummy = priv_1_lvar9; + dummy = priv_1_lvar10; + } + } + +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized 
{'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar6\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar7\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar8\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:priv_1_lvar9 \[len: 
[0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar9\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-bogus {'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar10\)'} "" { target *-*-* } l_compute$c_compute } */ +} + +static void multiple_kernels_1 () +{ +#pragma acc kernels + { + int multiple_kernels_1_lvar1 = 1; + } + + int multiple_kernels_2_lvar1; +#pragma acc kernels + { + int multiple_kernels_2_lvar1 = 1; + } + +#pragma acc parallel + { + multiple_kernels_2_lvar1++; + } +} + +static int ref_1 () +{ + int *ref_1_ref1; + int ref_1_lvar1; + + ref_1_ref1 = &ref_1_lvar1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + ref_1_lvar1 = 1; + } + + return *ref_1_ref1; + +/* { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static int ref_2 () +{ + int *ref_2_ref1; + int ref_2_lvar1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + ref_2_lvar1 = 1; + } + + ref_2_ref1 = &ref_2_lvar1; + return *ref_2_ref1; + +/* { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void ref_3 () +{ + int ref_3_lvar1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + // FIXME: could be optimized + { + int *ref_3_ref1 = &ref_3_lvar1; + ref_3_lvar1 = 1; + } + +/* { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void ref_4 () +{ + int ref_4_lvar1; + 
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + // FIXME: could be optimized + { + int *ref_4_ref1 = &ref_4_lvar1; + *ref_4_ref1 = 1; + } + +/* { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static void conditional_1 (int conditional_1_pvar1) +{ + int conditional_1_lvar1 = 1; + + if (conditional_1_pvar1) + { + // TODO: should be optimizable, but isn't due to later usage in the + // linear scan. +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + int dummy = conditional_1_lvar1; + } + } + else + { + int dummy = conditional_1_lvar1; /* { dg-line l_use[incr c_use] } */ + } + +/* { dg-missed {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'conditional_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */ +} + +static void conditional_2 (int conditional_2_pvar1) +{ + int conditional_2_lvar1 = 1; + + if (conditional_2_pvar1) + { + int dummy = conditional_2_lvar1; + } + else + { +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + int dummy = conditional_2_lvar1; + } + } + +/* { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +} diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c index b3cc4459328f..628b84940a1c 100644 --- a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c +++ b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c @@ -7,6 +7,12 @@ foo (void) int i; #pragma acc kernels + /* { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 } */ + /*TODO With the 'copy' -> 'firstprivate' optimization, the original 
implicit 'copy(i)' clause gets optimized into a 'firstprivate(i)' clause -- and the expected (?) warning diagnostic appears. + Have to read up the history behind these test cases. + Should this test remain here in this file even if now testing 'firstprivate'? + Or, should the optimization be disabled for such testing? + Or, the testing be duplicated for both variants? */ { i = 1; } diff --git a/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C new file mode 100644 index 000000000000..5483e5682410 --- /dev/null +++ b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C @@ -0,0 +1,169 @@ +/* Test 'gcc/omp-data-optimize.c'. */ + +/* { dg-additional-options "-std=c++11" } */ +/* { dg-additional-options "-fdump-tree-gimple-raw" } */ +/* { dg-additional-options "-fopt-info-omp-all" } */ + +/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName' + passed to 'incr' may be unset, and in that case, it will be set to [...]", + so to maintain compatibility with earlier Tcl releases, we manually + initialize counter variables: + { dg-line l_compute[variable c_compute 0] } + { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid + "WARNING: dg-line var l_compute defined, but not used". + { dg-line l_use[variable c_use 0] } + { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid + "WARNING: dg-line var l_use defined, but not used". 
*/ + +static int closure_1 (int closure_1_pvar1) +{ + int closure_1_lvar1 = 1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_1_lvar1 = closure_1_pvar1; + } + + auto lambda = [closure_1_lvar1]() {return closure_1_lvar1;}; /* { dg-line l_use[incr c_use] } */ + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:closure_1_lvar1 \[len: [0-9]\]\[implicit\]\)' not optimized: 'closure_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */ +} + +static int closure_2 (int closure_2_pvar1) +{ + int closure_2_lvar1 = 1; + + auto lambda = [closure_2_lvar1]() {return closure_2_lvar1;}; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_2_lvar1 = closure_2_pvar1; + } + + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +} + +static int closure_3 (int closure_3_pvar1) +{ + int closure_3_lvar1 = 1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: 
beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_3_lvar1 = closure_3_pvar1; + } + + auto lambda = [&]() {return closure_3_lvar1;}; + + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {map\(force_tofrom:closure_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static int closure_4 (int closure_4_pvar1) +{ + int closure_4_lvar1 = 1; + + auto lambda = [&]() {return closure_4_lvar1;}; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_4_lvar1 = closure_4_pvar1; + } + + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {map\(force_tofrom:closure_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */ +} + +static int closure_5 (int closure_5_pvar1) +{ + int closure_5_lvar1 = 1; + + auto lambda = [=]() {return closure_5_lvar1;}; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_5_lvar1 = closure_5_pvar1; + } + + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-optimized {'map\(force_tofrom:closure_5_lvar1 \[len: 
[0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + { dg-optimized {'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */ +} + +static int closure_6 (int closure_6_pvar1) +{ + int closure_6_lvar1 = 1; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + closure_6_lvar1 = closure_6_pvar1; + } + + auto lambda = [=]() {return closure_6_lvar1;}; /* { dg-line l_use[incr c_use] } */ + + return lambda(); + +/* { dg-optimized {'map\(force_tofrom:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */ +/* { dg-missed {'map\(force_tofrom:closure_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_6_lvar1' used...} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */ +} + +static int try_1 () +{ + int try_1_lvar1, try_1_lvar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + try_1_lvar1 = 1; + } + + try { + try_1_lvar2 = try_1_lvar1; /* { dg-line l_use[incr c_use] } */ + } catch (...) {} + + return try_1_lvar2; + +/* { dg-missed {'map\(force_tofrom:try_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. 
here} "" { target *-*-* } l_use$c_use } */ +} + +static int try_2 () +{ + int try_2_lvar1, try_2_lvar2; + +#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + { + /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } */ + try_2_lvar1 = 1; + } + + try { + try_2_lvar2 = 1; + } catch (...) { + try_2_lvar2 = try_2_lvar1; /* { dg-line l_use[incr c_use] } */ + } + + return try_2_lvar2; + +/* { dg-missed {'map\(force_tofrom:try_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute } + { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */ +} diff --git a/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 new file mode 100644 index 000000000000..ce3e556faf26 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 @@ -0,0 +1,588 @@ +! { dg-additional-options "-fdump-tree-gimple-raw" } +! { dg-additional-options "-fopt-info-omp-all" } + +! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName' +! passed to 'incr' may be unset, and in that case, it will be set to [...]", +! so to maintain compatibility with earlier Tcl releases, we manually +! initialize counter variables: +! { dg-line l_compute[variable c_compute 0] } +! { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid +! "WARNING: dg-line var l_compute defined, but not used". +! { dg-line l_use[variable c_use 0] } +! { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid +! "WARNING: dg-line var l_use defined, but not used". 
+ +module globals + use ISO_C_BINDING + implicit none + integer :: opt_1_gvar1 = 1 + integer(C_INT), bind(C) :: opt_1_evar1 + integer :: opt_2_gvar1 = 1 + integer(C_INT), bind(C) :: opt_2_evar1 + integer :: opt_3_gvar1 = 1 + integer(C_INT), bind(C) :: opt_3_evar1 + integer :: use_1_gvar1 = 1 + integer(C_INT), bind(C) :: use_1_evar1 + integer :: use_2_gvar1 = 1 + integer(C_INT), bind(C) :: use_2_evar1 + integer :: use_2_a1(100) + integer(C_INT), bind(C) :: lcf_1_evar2 + integer(C_INT), bind(C) :: lcf_2_evar2 + integer(C_INT), bind(C) :: lcf_3_evar2 + integer(C_INT), bind(C) :: lcf_4_evar2 + integer(C_INT), bind(C) :: lcf_5_evar2 + integer(C_INT), bind(C) :: lcf_6_evar2 + save +end module globals + +subroutine opt_1 (opt_1_pvar1) + use globals + implicit none + integer :: opt_1_pvar1 + integer :: opt_1_lvar1 + integer, save :: opt_1_svar1 = 3 + integer :: dummy1, dummy2, dummy3, dummy4, dummy5 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + dummy1 = opt_1_pvar1; + dummy2 = opt_1_lvar1; + + dummy3 = opt_1_gvar1; + dummy4 = opt_1_evar1; + dummy5 = opt_1_svar1; + !$acc end kernels + +! Parameter is pass-by-reference +! { dg-missed {'map\(force_tofrom:\*opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! +! { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! 
{ dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! +! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine opt_1 + +subroutine opt_2 (opt_2_pvar1) + use globals + implicit none + integer :: opt_2_pvar1 + integer :: opt_2_lvar1 + integer, save :: opt_2_svar1 = 3 + integer :: dummy1, dummy2, dummy3, dummy4, dummy5 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + dummy1 = opt_2_pvar1; + dummy2 = opt_2_lvar1; + + dummy3 = opt_2_gvar1; + dummy4 = opt_2_evar1; + dummy5 = opt_2_svar1; + !$acc end kernels + + ! A write does not inhibit optimization. + opt_2_pvar1 = 0; + opt_2_lvar1 = 1; + + opt_2_gvar1 = 10; + opt_2_evar1 = 11; + opt_2_svar1 = 12; + +! { dg-missed {'map\(force_tofrom:\*opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine opt_2 + +subroutine opt_3 (opt_3_pvar1) + use globals + implicit none + integer :: opt_3_pvar1 + integer :: opt_3_lvar1 + integer, save :: opt_3_svar1 = 3 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + opt_3_pvar1 = 0; + opt_3_lvar1 = 1; + + opt_3_gvar1 = 10; + opt_3_evar1 = 11; + opt_3_svar1 = 12; + !$acc end kernels + +! Parameter is pass-by-reference +! { dg-missed {'map\(force_tofrom:\*opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! 
{ dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +! +! { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } + +! { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine opt_3 + +subroutine opt_4 () + implicit none + integer, dimension(10) :: opt_4_larray1 + integer :: dummy1, dummy2 + + ! TODO Fortran local arrays are addressable (and may be visible to nested + ! functions, etc.) so they are not optimizable yet. + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + dummy1 = opt_4_larray1(4) + dummy2 = opt_4_larray1(8) + !$acc end kernels + +! { dg-missed {'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_4_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! +! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +!
{ dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine opt_4 + +subroutine opt_5 (opt_5_pvar1) + implicit none + integer, dimension(10) :: opt_5_larray1 + integer :: opt_5_lvar1, opt_5_pvar1 + + opt_5_lvar1 = opt_5_pvar1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + opt_5_larray1(opt_5_lvar1) = 1 + !$acc end kernels + +! { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! +! { dg-optimized {'map\(force_tofrom:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine opt_5 + +subroutine use_1 (use_1_pvar1) + use globals + implicit none + integer :: use_1_pvar1 + integer :: use_1_lvar1 + integer, save :: use_1_svar1 = 3 + integer :: s + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + use_1_pvar1 = 0; + use_1_lvar1 = 1; + + ! FIXME: svar is optimized: should not be + use_1_gvar1 = 10; + use_1_evar1 = 11; + use_1_svar1 = 12; + !$acc end kernels + + s = 0 + s = s + use_1_pvar1 + s = s + use_1_lvar1 ! { dg-missed {\.\.\. here} "" { target *-*-* } } + s = s + use_1_gvar1 + s = s + use_1_evar1 + s = s + use_1_svar1 + +! { dg-missed {'map\(force_tofrom:\*use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-missed {'map\(force_tofrom:use_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine use_1 + +subroutine use_2 (use_2_pvar1) + use globals + implicit none + integer :: use_2_pvar1 + integer :: use_2_lvar1 + integer, save :: use_2_svar1 = 3 + integer :: s + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + use_2_pvar1 = 0; + use_2_lvar1 = 1; + use_2_gvar1 = 10; + use_2_evar1 = 11; + use_2_svar1 = 12; + !$acc end kernels + + s = 0 + s = s + use_2_a1(use_2_pvar1) + s = s + use_2_a1(use_2_lvar1) ! { dg-missed {\.\.\. here} "" { target *-*-* } } + s = s + use_2_a1(use_2_gvar1) + s = s + use_2_a1(use_2_evar1) + s = s + use_2_a1(use_2_svar1) + +! { dg-missed {'map\(force_tofrom:\*use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:use_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine use_2 + +! Optimization inhibited because of looping/control flow. 
+ +subroutine lcf_1 (lcf_1_pvar1, iter) + use globals + implicit none + real :: lcf_1_pvar1 + real :: lcf_1_lvar1 + real, save :: lcf_1_svar2 + integer :: i, iter + + do i = 1, iter ! { dg-line l_use[incr c_use] } + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_1_pvar1 = 0 + lcf_1_lvar1 = 1 + lcf_1_evar2 = 2 + lcf_1_svar2 = 3 + !$acc end kernels + end do + +! { dg-missed {'map\(force_tofrom:\*lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } +end subroutine lcf_1 + +subroutine lcf_2 (lcf_2_pvar1) + use globals + implicit none + real :: lcf_2_pvar1 + real :: lcf_2_lvar1 + real, save :: lcf_2_svar2 + integer :: dummy + +10 dummy = 1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_2_pvar1 = 0 + lcf_2_lvar1 = 1 + lcf_2_evar2 = 2 + lcf_2_svar2 = 3 + !$acc end kernels + + go to 10 ! { dg-line l_use[incr c_use] } + +! { dg-missed {'map\(force_tofrom:\*lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } +end subroutine lcf_2 + +subroutine lcf_3 (lcf_3_pvar1) + use globals + implicit none + real :: lcf_3_pvar1 + real :: lcf_3_lvar1 + real, save :: lcf_3_svar2 + integer :: dummy + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_3_pvar1 = 0 + lcf_3_lvar1 = 1 + lcf_3_evar2 = 2 + lcf_3_svar2 = 3 + !$acc end kernels + + ! Backward jump after kernel +10 dummy = 1 + go to 10 ! { dg-line l_use[incr c_use] } + +! { dg-missed {'map\(force_tofrom:\*lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } +end subroutine lcf_3 + +subroutine lcf_4 (lcf_4_pvar1) + use globals + implicit none + real :: lcf_4_pvar1 + real :: lcf_4_lvar1 + real, save :: lcf_4_svar2 + integer :: dummy + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_4_pvar1 = 0 + lcf_4_lvar1 = 1 + lcf_4_evar2 = 2 + lcf_4_svar2 = 3 + !$acc end kernels + + ! 
Forward jump after kernel + go to 10 +10 dummy = 1 + +! { dg-missed {'map\(force_tofrom:\*lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_4_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine lcf_4 + +subroutine lcf_5 (lcf_5_pvar1, lcf_5_pvar2) + use globals + implicit none + real :: lcf_5_pvar1 + real :: lcf_5_pvar2 + real :: lcf_5_lvar1 + real, save :: lcf_5_svar2 + integer :: dummy + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_5_pvar1 = 0 + lcf_5_lvar1 = 1 + lcf_5_evar2 = 2 + lcf_5_svar2 = 3 + !$acc end kernels + + if (lcf_5_pvar2 > 0) then + dummy = 1 + end if + +! { dg-missed {'map\(force_tofrom:\*lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_5_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine lcf_5 + +subroutine lcf_6 (lcf_6_pvar1, lcf_6_pvar2) + use globals + implicit none + real :: lcf_6_pvar1 + real :: lcf_6_pvar2 + real :: lcf_6_lvar1 + real, save :: lcf_6_svar2 + integer :: dummy + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + lcf_6_pvar1 = 0 + lcf_6_lvar1 = 1 + lcf_6_evar2 = 2 + lcf_6_svar2 = 3 + !$acc end kernels + + dummy = merge(1,0, lcf_6_pvar2 > 0) + +! { dg-missed {'map\(force_tofrom:\*lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_6_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine lcf_6 + +subroutine priv_1 () + implicit none + integer :: priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4 + integer :: priv_1_lvar5, priv_1_lvar6, dummy + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + ! 
{ dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 } + priv_1_lvar1 = 1 + dummy = priv_1_lvar2 + + if (priv_1_lvar2 > 0) then + priv_1_lvar3 = 1 + else + priv_1_lvar3 = 2 + end if + + priv_1_lvar5 = priv_1_lvar3 + + if (priv_1_lvar2 > 0) then + priv_1_lvar4 = 1 + dummy = priv_1_lvar4 + end if + !$acc end kernels + +! { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +!
{ dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(force_tofrom:dummy \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine priv_1 + +subroutine multiple_kernels_1 () + implicit none + integer :: multiple_kernels_1_lvar1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + multiple_kernels_1_lvar1 = 1 + !$acc end kernels + + !$acc kernels ! { dg-line l_use[incr c_use] } + multiple_kernels_1_lvar1 = multiple_kernels_1_lvar1 + 1 + !$acc end kernels + +! { dg-missed {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } + +! { dg-optimized {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_use$c_use } +end subroutine multiple_kernels_1 + +subroutine multiple_kernels_2 () + implicit none + integer :: multiple_kernels_2_lvar1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + multiple_kernels_2_lvar1 = 1 + !$acc end kernels + + !$acc parallel + multiple_kernels_2_lvar1 = multiple_kernels_2_lvar1 + 1 ! { dg-line l_use[incr c_use] } + !$acc end parallel + +! { dg-missed {'map\(force_tofrom:multiple_kernels_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {\.\.\. 
here} "" { target *-*-* } l_use$c_use } +end subroutine multiple_kernels_2 + +integer function ref_1 () + implicit none + integer, target :: ref_1_lvar1 + integer, target :: ref_1_lvar2 + integer, pointer :: ref_1_ref1 + + ref_1_ref1 => ref_1_lvar1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + ref_1_lvar1 = 1 + ! FIXME: currently considered unsuitable; but could be optimized + ref_1_lvar2 = 2 + !$acc end kernels + + ref_1 = ref_1_ref1 + +! { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:ref_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end function ref_1 + +integer function ref_2 () + implicit none + integer, target :: ref_2_lvar1 + integer, target :: ref_2_lvar2 + integer, pointer :: ref_2_ref1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + ref_2_lvar1 = 1 + ! FIXME: currently considered unsuitable, but could be optimized + ref_2_lvar2 = 2 + !$acc end kernels + + ref_2_ref1 => ref_2_lvar1 + ref_2 = ref_2_ref1 + +! { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:ref_2_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end function ref_2 + +subroutine ref_3 () + implicit none + integer, target :: ref_3_lvar1 + integer, pointer :: ref_3_ref1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + ref_3_ref1 => ref_3_lvar1 + + ! FIXME: currently considered unsuitable, but could be optimized + ref_3_lvar1 = 1 + !$acc end kernels + +! 
{ dg-missed {'map\(force_tofrom:\*ref_3_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_3_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine ref_3 + +subroutine ref_4 () + implicit none + integer, target :: ref_4_lvar1 + integer, pointer :: ref_4_ref1 + + !$acc kernels ! { dg-line l_compute[incr c_compute] } + ref_4_ref1 => ref_4_lvar1 + + ! FIXME: currently considered unsuitable, but could be optimized + ref_4_ref1 = 1 + !$acc end kernels + +! { dg-missed {'map\(force_tofrom:\*ref_4_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_4_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +! { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } +end subroutine ref_4 + +subroutine conditional_1 (conditional_1_pvar1) + implicit none + integer :: conditional_1_pvar1 + integer :: conditional_1_lvar1 + + conditional_1_lvar1 = 1 + + if (conditional_1_pvar1 > 0) then + !$acc kernels ! { dg-line l_compute[incr c_compute] } + conditional_1_lvar1 = 2 + !$acc end kernels + else + conditional_1_lvar1 = 3 + end if + +! { dg-optimized {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! 
{ dg-optimized {'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine conditional_1 + +subroutine conditional_2 (conditional_2_pvar1) + implicit none + integer :: conditional_2_pvar1 + integer :: conditional_2_lvar1 + + conditional_2_lvar1 = 1 + + if (conditional_2_pvar1 > 0) then + conditional_2_lvar1 = 3 + else + !$acc kernels ! { dg-line l_compute[incr c_compute] } + conditional_2_lvar1 = 2 + !$acc end kernels + end if + +! { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } +! { dg-optimized {'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute } +end subroutine conditional_2 diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 index b2aae1df5229..97fbe1268b73 100644 --- a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 @@ -5,6 +5,8 @@ subroutine foo integer :: i !$acc kernels + ! { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 } + !TODO See discussion in '../../c-c++-common/goacc/uninit-copy-clause.c'. 
i = 1 !$acc end kernels diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index ebaa3c86694f..7a48091f4286 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -423,6 +423,7 @@ extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt); extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_omp_data_optimize (gcc::context *ctxt); extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt); diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c index e08cfa56e3c9..88742a3bfdf4 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c @@ -29,6 +29,8 @@ int main() int b[N] = { 0 }; #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ + /* { dg-missed {'map\(tofrom:b [^)]+\)' not optimized: 'b' is unsuitable for privatization} "" { target *-*-* } .-1 } + { dg-missed {'map\(force_tofrom:a [^)]+\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } .-2 } */ { int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */ /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 index 74ee6fde84f8..994a8a35110f 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 @@ -17,6 +17,10 @@ subroutine 
kernel(lo, hi, a, b, c) real, dimension(lo:hi) :: a, b, c !$acc kernels copyin(lo, hi) + ! { dg-optimized {'map\(force_tofrom:offset.[0-9]+ [^)]+\)' optimized to 'map\(to:offset.[0-9]+ [^)]+\)'} "" {target *-*-* } .-1 } + ! { dg-missed {'map\(tofrom:\*c [^)]+\)' not optimized: '\*c' is unsuitable for privatization} "" { target *-*-* } .-2 } + ! { dg-missed {'map\(tofrom:\*b [^)]+\)' not optimized: '\*b' is unsuitable for privatization} "" { target *-*-* } .-3 } + ! { dg-missed {'map\(tofrom:\*a [^)]+\)' not optimized: '\*a' is unsuitable for privatization} "" { target *-*-* } .-4 } !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] } ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } From patchwork Wed Dec 15 15:54:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48968 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CEBB3385800D for ; Wed, 15 Dec 2021 16:13:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 311F93858034 for ; Wed, 15 Dec 2021 15:56:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 311F93858034 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: qfk+3fMvFi+yThTSC/SCaP8nHn8fCZEpgCwgZHXn7CTh9eNiv1AuW1oEiyL90da/Nckofi7/0F Si6VpoDvHsLdMzamaZ2LBwrJaQWCLmYu3XR/4a0sZe/ahwZonobKQNzsU1nIr1s8LH2i7AMHlm 
yAAQszozOKf3wpVF1dG4jdJyf0fPY2oj1ESmT+lmFtW5iU9GZ9IxwBbr1wsftKtuTX81946I6o g/6ZEJVWN6dhSH+uk+ft4cHLAeKShDF0JZId1vw6vRsxUffKOYe9X63sV8VoEge90FnHCEF8ys FWqMGsSG3s1x6y2el6NmU3zV X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="72258710" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:38 -0800 IronPort-SDR: l8EZ/cefRwaYOLqrWauw3Pmlr+TCY/EDAXJdItfOskuE/PreWs4/YloA1Ll9y7Cqkxs6ZSXirg DIGAa12bAQR5NC8xoM/r0ogO8SWv416jTMSccld2aMhsfEDsElFXB/cdP4Sm08WWR7l1IWIH5z oUasKCsh5HDJFRHRDGlZbEjaki8krkxI2liEebbXLgP8kytq5JaVGRGtPoZdtnxuLD3FfD/8RH GxYbTcLgRsRCjDVeshwoc5Vwk4sDTj8J3i4QGqKFw1CZuOg+0IDp1ACmPlKL29PEItq31NLpIS zdg= From: Frederik Harwath To: Subject: [PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels Date: Wed, 15 Dec 2021 16:54:32 +0100 Message-ID: <20211215155447.19379-26-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Stubbs , rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Andrew Stubbs This commit adds the code generation for the runtime alias checks for OpenACC loops 
that have been analyzed by Graphite. The runtime alias check condition gets generated in Graphite. It is evaluated by the code generated for the IFN_GOACC_LOOP internal function calls. If aliasing is detected at runtime, the execution dimensions get adjusted to execute the affected loops sequentially. gcc/ChangeLog: * graphite-isl-ast-to-gimple.c: Include internal-fn.h. (graphite_oacc_analyze_scop): Implement runtime alias checks. * omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter to GOACC_LOOP internal calls, and initialise it to integer_one_node. * omp-offload.c (oacc_xform_loop): Integrate the runtime alias check into the GOACC_LOOP expansion. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test. --- gcc/graphite-isl-ast-to-gimple.c | 122 ++++++++ gcc/omp-expand.c | 37 +-- gcc/omp-offload.c | 271 ++++++++++-------- .../runtime-alias-check-1.c | 79 +++++ .../runtime-alias-check-2.c | 90 ++++++ 5 files changed, 457 insertions(+), 142 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index e820e2c32202..010adaabb000 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3. 
If not see #include "graphite.h" #include "graphite-oacc.h" #include "stdlib.h" +#include "internal-fn.h" struct ast_build_info { @@ -1697,6 +1698,127 @@ graphite_oacc_analyze_scop (scop_p scop) print_isl_schedule (dump_file, scop->original_schedule); } + if (flag_graphite_runtime_alias_checks + && scop->unhandled_alias_ddrs.length () > 0) + { + sese_info_p region = scop->scop_info; + + /* Usually there will be a chunking loop with the actual work loop + inside it. In some corner cases there may only be one loop. */ + loop_p top_loop = region->region.entry->dest->loop_father; + loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop; + tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop); + + /* Walk back to GOACC_LOOP block. */ + basic_block goacc_loop_block = region->region.entry->src; + + /* Find the GOACC_LOOP calls. If there aren't any then this is not an + OpenACC kernels loop and will need different handling. */ + gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block); + while (!gsi_end_p (gsitop) + && (!is_gimple_call (gsi_stmt (gsitop)) + || !gimple_call_internal_p (gsi_stmt (gsitop)) + || (gimple_call_internal_fn (gsi_stmt (gsitop)) + != IFN_GOACC_LOOP))) + gsi_next (&gsitop); + + if (!gsi_end_p (gsitop)) + { + /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted + statements. There ought not be any problematic dependencies because + the chunk size and step are only computed for very specific purposes. + They may not be at the very top of the block, but they should be + found together (the asserts test this assumption). 
*/ + gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block); + gsi_move_after (&gsitop, &gsibottom); + gimple_stmt_iterator gsiinsert = gsibottom; + gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop)) + && gimple_call_internal_p (gsi_stmt (gsitop)) + && (gimple_call_internal_fn (gsi_stmt (gsitop)) + == IFN_GOACC_LOOP)); + gsi_move_after (&gsitop, &gsibottom); + + /* Insert "noalias_p = COND" before the GOACC_LOOP statements. + Note that these likely depend on some of the hoisted statements. */ + tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL, + true, GSI_NEW_STMT); + + /* Insert the cond_val into each GOACC_LOOP call in the region. */ + for (int n = -1; n < (int)region->bbs.length (); n++) + { + /* Cover the region plus goacc_loop_block. */ + basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n]; + + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (!is_gimple_call (stmt) + || !gimple_call_internal_p (stmt)) + continue; + + gcall *goacc_call = as_a <gcall *> (stmt); + if (gimple_call_internal_fn (goacc_call) != IFN_GOACC_LOOP) + continue; + + enum ifn_goacc_loop_kind code = (enum ifn_goacc_loop_kind) + TREE_INT_CST_LOW (gimple_call_arg (goacc_call, 0)); + int argno = 0; + switch (code) + { + case IFN_GOACC_LOOP_CHUNKS: + case IFN_GOACC_LOOP_STEP: + argno = 6; + break; + + case IFN_GOACC_LOOP_OFFSET: + case IFN_GOACC_LOOP_BOUND: + argno = 7; + break; + + default: + gcc_unreachable (); + } + + gimple_call_set_arg (goacc_call, argno, cond_val); + update_stmt (goacc_call); + + if (dump_enabled_p () && dump_flags & TDF_DETAILS) + dump_printf (MSG_NOTE, + "Runtime alias condition applied to: %G", + goacc_call); + } + } + } + else + { + /* There weren't any GOACC_LOOP calls where we expected to find them, + therefore this isn't an OpenACC parallel loop. If it runs + sequentially then there's no need to worry about aliasing, so + nothing much to do here. 
*/ + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, "Runtime alias check *not* inserted for" + " bb %d (GOACC_LOOP not found)", goacc_loop_block->index); + + /* Unset can_be_parallel, in case something else might use it. */ + for (unsigned int i = 0; i < region->bbs.length (); i++) + if (region->bbs[i]->loop_father) + region->bbs[i]->loop_father->can_be_parallel = 0; + } + + /* The loop-nest vec is shared by all DDRs. */ + DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release (); + + unsigned int i; + struct data_dependence_relation *ddr; + + FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr) + if (ddr) + free_dependence_relation (ddr); + scop->unhandled_alias_ddrs.truncate (0); + } + /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. */ isl_schedule_foreach_schedule_node_top_down ( scop->original_schedule, visit_schedule_loop_node, scop->dependence); diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index 365d167b6428..585ce798ee15 100644 --- a/gcc/omp-expand.c +++ b/gcc/omp-expand.c @@ -7719,10 +7719,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) ass = gimple_build_assign (chunk_no, expr); gsi_insert_before (&gsi, ass, GSI_SAME_STMT); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 6, + call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, build_int_cst (integer_type_node, IFN_GOACC_LOOP_CHUNKS), - dir, range, s, chunk_size, gwv); + dir, range, s, chunk_size, gwv, + integer_one_node); gimple_call_set_lhs (call, chunk_max); gimple_set_location (call, loc); gsi_insert_before (&gsi, call, GSI_SAME_STMT); @@ -7730,10 +7731,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) else chunk_size = chunk_no; - call = gimple_build_call_internal (IFN_GOACC_LOOP, 6, + call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, build_int_cst (integer_type_node, IFN_GOACC_LOOP_STEP), - dir, range, s, chunk_size, gwv); + dir, range, s, chunk_size, gwv, + integer_one_node); gimple_call_set_lhs (call, step); gimple_set_location 
(call, loc); gsi_insert_before (&gsi, call, GSI_SAME_STMT); @@ -7767,20 +7769,20 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) /* Loop offset & bound go into head_bb. */ gsi = gsi_start_bb (head_bb); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, + call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, build_int_cst (integer_type_node, IFN_GOACC_LOOP_OFFSET), - dir, range, s, - chunk_size, gwv, chunk_no); + dir, range, s, chunk_size, gwv, chunk_no, + integer_one_node); gimple_call_set_lhs (call, offset_init); gimple_set_location (call, loc); gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, + call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, build_int_cst (integer_type_node, IFN_GOACC_LOOP_BOUND), - dir, range, s, - chunk_size, gwv, offset_init); + dir, range, s, chunk_size, gwv, + offset_init, integer_one_node); gimple_call_set_lhs (call, bound); gimple_set_location (call, loc); gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING); @@ -7830,22 +7832,25 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) tree chunk = build_int_cst (diff_type, 0); /* Never chunked. 
*/ t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_OFFSET); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range, - element_s, chunk, e_gwv, chunk); + call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range, + element_s, chunk, e_gwv, chunk, + integer_one_node); gimple_call_set_lhs (call, e_offset); gimple_set_location (call, loc); gsi_insert_before (&gsi, call, GSI_SAME_STMT); t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_BOUND); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range, - element_s, chunk, e_gwv, e_offset); + call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range, + element_s, chunk, e_gwv, e_offset, + integer_one_node); gimple_call_set_lhs (call, e_bound); gimple_set_location (call, loc); gsi_insert_before (&gsi, call, GSI_SAME_STMT); t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_STEP); - call = gimple_build_call_internal (IFN_GOACC_LOOP, 6, t, dir, e_range, - element_s, chunk, e_gwv); + call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range, + element_s, chunk, e_gwv, + integer_one_node); gimple_call_set_lhs (call, e_step); gimple_set_location (call, loc); gsi_insert_before (&gsi, call, GSI_SAME_STMT); diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 392ca56b1f4f..3458a1acbceb 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -555,6 +555,7 @@ oacc_xform_loop (gcall *call) bool chunking = false, striding = true; unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning unsigned inner_mask = mask & ~outer_mask; // Inner partitioning (if any) + tree noalias = NULL_TREE; /* Skip lowering if return value of IFN_GOACC_LOOP call is not used. 
*/ if (!lhs) @@ -596,147 +597,165 @@ oacc_xform_loop (gcall *call) switch (code) { - default: gcc_unreachable (); + default: + gcc_unreachable (); case IFN_GOACC_LOOP_CHUNKS: + noalias = gimple_call_arg (call, 6); if (!chunking) - r = build_int_cst (type, 1); + r = build_int_cst (type, 1); else - { - /* chunk_max - = (range - dir) / (chunks * step * num_threads) + dir */ - tree per = oacc_thread_numbers (false, mask, &seq); - per = fold_convert (type, per); - chunk_size = fold_convert (type, chunk_size); - per = fold_build2 (MULT_EXPR, type, per, chunk_size); - per = fold_build2 (MULT_EXPR, type, per, step); - r = build2 (MINUS_EXPR, type, range, dir); - r = build2 (PLUS_EXPR, type, r, per); - r = build2 (TRUNC_DIV_EXPR, type, r, per); - } + { + /* chunk_max + = (range - dir) / (chunks * step * num_threads) + dir */ + tree per = oacc_thread_numbers (false, mask, &seq); + per = fold_convert (type, per); + noalias = fold_convert (type, noalias); + per = fold_build2 (MULT_EXPR, type, per, noalias); + per = fold_build2 (MAX_EXPR, type, per, fold_convert (type, integer_one_node)); + chunk_size = fold_convert (type, chunk_size); + per = fold_build2 (MULT_EXPR, type, per, chunk_size); + per = fold_build2 (MULT_EXPR, type, per, step); + r = fold_build2 (MINUS_EXPR, type, range, dir); + r = fold_build2 (PLUS_EXPR, type, r, per); + r = build2 (TRUNC_DIV_EXPR, type, r, per); + } break; case IFN_GOACC_LOOP_STEP: + noalias = gimple_call_arg (call, 6); { /* If striding, step by the entire compute volume, otherwise - step by the inner volume. */ + step by the inner volume. */ unsigned volume = striding ? 
mask : inner_mask; + noalias = fold_convert (type, noalias); r = oacc_thread_numbers (false, volume, &seq); + r = fold_convert (type, r); + r = build2 (MULT_EXPR, type, r, noalias); + r = build2 (MAX_EXPR, type, r, fold_convert (type, fold_convert (type, integer_one_node))); r = build2 (MULT_EXPR, type, fold_convert (type, r), step); + break; } - break; - - case IFN_GOACC_LOOP_OFFSET: - /* Enable vectorization on non-SIMT targets. */ - if (!targetm.simt.vf - && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR) - /* If not -fno-tree-loop-vectorize, hint that we want to vectorize - the loop. */ - && (flag_tree_loop_vectorize - || !OPTION_SET_P (flag_tree_loop_vectorize))) - { - basic_block bb = gsi_bb (gsi); - class loop *parent = bb->loop_father; - class loop *body = parent->inner; - - parent->force_vectorize = true; - parent->safelen = INT_MAX; - - /* "Chunking loops" may have inner loops. */ - if (parent->inner) - { - body->force_vectorize = true; - body->safelen = INT_MAX; - } - - cfun->has_force_vectorize_loops = true; - } - if (striding) - { - r = oacc_thread_numbers (true, mask, &seq); - r = fold_convert (diff_type, r); - } - else - { - tree inner_size = oacc_thread_numbers (false, inner_mask, &seq); - tree outer_size = oacc_thread_numbers (false, outer_mask, &seq); - tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), - inner_size, outer_size); - - volume = fold_convert (diff_type, volume); - if (chunking) - chunk_size = fold_convert (diff_type, chunk_size); - else - { - tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); - chunk_size = build2 (MINUS_EXPR, diff_type, range, dir); - chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per); - chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); - } - - tree span = build2 (MULT_EXPR, diff_type, chunk_size, - fold_convert (diff_type, inner_size)); - r = oacc_thread_numbers (true, outer_mask, &seq); - r = fold_convert (diff_type, r); - r = build2 (MULT_EXPR, diff_type, r, span); - - 
tree inner = oacc_thread_numbers (true, inner_mask, &seq); - inner = fold_convert (diff_type, inner); - r = fold_build2 (PLUS_EXPR, diff_type, r, inner); - - if (chunking) - { - tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6)); - tree per - = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size); - per = build2 (MULT_EXPR, diff_type, per, chunk); - - r = build2 (PLUS_EXPR, diff_type, r, per); - } - } - r = fold_build2 (MULT_EXPR, diff_type, r, step); - if (type != diff_type) - r = fold_convert (type, r); - break; - - case IFN_GOACC_LOOP_BOUND: - if (striding) - r = range; - else - { - tree inner_size = oacc_thread_numbers (false, inner_mask, &seq); - tree outer_size = oacc_thread_numbers (false, outer_mask, &seq); - tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), - inner_size, outer_size); - - volume = fold_convert (diff_type, volume); - if (chunking) - chunk_size = fold_convert (diff_type, chunk_size); - else - { - tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); - - chunk_size = build2 (MINUS_EXPR, diff_type, range, dir); - chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per); - chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); - } - - tree span = build2 (MULT_EXPR, diff_type, chunk_size, - fold_convert (diff_type, inner_size)); - - r = fold_build2 (MULT_EXPR, diff_type, span, step); - - tree offset = gimple_call_arg (call, 6); - r = build2 (PLUS_EXPR, diff_type, r, - fold_convert (diff_type, offset)); - r = build2 (integer_onep (dir) ? 
MIN_EXPR : MAX_EXPR, - diff_type, r, range); - } - if (diff_type != type) - r = fold_convert (type, r); - break; + case IFN_GOACC_LOOP_OFFSET: + noalias = gimple_call_arg (call, 7); + if (striding) + { + r = oacc_thread_numbers (true, mask, &seq); + r = fold_convert (diff_type, r); + tree tmp1 = build2 (NE_EXPR, boolean_type_node, r, + fold_convert (diff_type, integer_zero_node)); + tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias, + boolean_false_node); + tree tmp3 = build2 (BIT_AND_EXPR, diff_type, + fold_convert (diff_type, tmp1), + fold_convert (diff_type, tmp2)); + tree tmp4 = build2 (MULT_EXPR, diff_type, tmp3, range); + r = build2 (PLUS_EXPR, diff_type, r, tmp4); + } + else + { + tree inner_size = oacc_thread_numbers (false, inner_mask, &seq); + tree outer_size = oacc_thread_numbers (false, outer_mask, &seq); + tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), + inner_size, outer_size); + + volume = fold_convert (diff_type, volume); + if (chunking) + chunk_size = fold_convert (diff_type, chunk_size); + else + { + tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); + /* chunk_size = (range + per - 1) / per. */ + chunk_size = build2 (MINUS_EXPR, diff_type, range, dir); + chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per); + chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); + } + + /* Curtail the range in all but one thread when there may be + aliasing to prevent parallelization. 
*/ + tree n = oacc_thread_numbers (true, mask, &seq); + n = fold_convert (diff_type, n); + tree tmp1 = build2 (NE_EXPR, boolean_type_node, n, + fold_convert (diff_type, integer_zero_node)); + tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias, + boolean_false_node); + tree tmp3 = build2 (BIT_AND_EXPR, diff_type, + fold_convert (diff_type, tmp1), + fold_convert (diff_type, tmp2)); + range = build2 (MULT_EXPR, diff_type, tmp3, range); + + tree span = build2 (MULT_EXPR, diff_type, chunk_size, + fold_convert (diff_type, inner_size)); + r = oacc_thread_numbers (true, outer_mask, &seq); + r = fold_convert (diff_type, r); + r = build2 (PLUS_EXPR, diff_type, r, range); + r = build2 (MULT_EXPR, diff_type, r, span); + + tree inner = oacc_thread_numbers (true, inner_mask, &seq); + + inner = fold_convert (diff_type, inner); + r = fold_build2 (PLUS_EXPR, diff_type, r, inner); + + if (chunking) + { + tree chunk + = fold_convert (diff_type, gimple_call_arg (call, 6)); + tree per + = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size); + per = build2 (MULT_EXPR, diff_type, per, chunk); + + r = build2 (PLUS_EXPR, diff_type, r, per); + } + } + r = fold_build2 (MULT_EXPR, diff_type, r, step); + if (type != diff_type) + r = fold_convert (type, r); + break; + + case IFN_GOACC_LOOP_BOUND: + if (striding) + r = range; + else + { + noalias = fold_convert (diff_type, gimple_call_arg (call, 7)); + + tree inner_size = oacc_thread_numbers (false, inner_mask, &seq); + tree outer_size = oacc_thread_numbers (false, outer_mask, &seq); + tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size), + inner_size, outer_size); + + volume = fold_convert (diff_type, volume); + volume = fold_build2 (MULT_EXPR, diff_type, volume, noalias); + volume + = fold_build2 (MAX_EXPR, diff_type, volume, fold_convert (diff_type, integer_one_node)); + if (chunking) + chunk_size = fold_convert (diff_type, chunk_size); + else + { + tree per = fold_build2 (MULT_EXPR, diff_type, volume, step); + /* chunk_size = 
(range + per - 1) / per. */ + chunk_size = build2 (MINUS_EXPR, diff_type, range, dir); + chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per); + chunk_size + = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per); + } + + tree span = build2 (MULT_EXPR, diff_type, chunk_size, + fold_convert (diff_type, inner_size)); + + r = fold_build2 (MULT_EXPR, diff_type, span, step); + + tree offset = gimple_call_arg (call, 6); + r = build2 (PLUS_EXPR, diff_type, r, + fold_convert (diff_type, offset)); + r = build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type, r, + range); + } + if (diff_type != type) + r = fold_convert (type, r); + break; } gimplify_assign (lhs, r, &seq); diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c new file mode 100644 index 000000000000..2fb1c712beb3 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c @@ -0,0 +1,79 @@ +/* Test that a simple array copy does the right thing when the input and + output data overlap. The GPU kernel should automatically switch to + a sequential operation mode in order to give the expected results. */ + +#include +#include + +void f(int *data, int n, int to, int from, int count) +{ + /* We cannot use copyin for two overlapping arrays because we get an error + that the memory is already present. We also cannot do the pointer + arithmetic inside the kernels region because it just ends up using + host pointers (bug?). Using enter data with a single array, and + acc_deviceptr solves the problem. */ +#pragma acc enter data copyin(data[0:n]) + + int *a = (int*)acc_deviceptr (data+to); + int *b = (int*)acc_deviceptr (data+from); + +#pragma acc kernels + for (int i = 0; i < count; i++) + a[i] = b[i]; + +#pragma acc exit data copyout(data[0:n]) +} + +#define N 2000 + +int data[N]; + +int +main () +{ + for (int i=0; i < N; i++) + data[i] = i; + + /* Baseline test; no aliasing. 
The high part of the data is copied to + the lower part. */ + int to = 0; + int from = N/2; + int count = N/2; + f (data, N, to, from, count); + for (int i=0; i < N; i++) + if (data[i] != (i%count)+count) + exit (1); + + /* Check various amounts of data overlap. */ + int tests[] = {1, 10, N/4, N/2-10, N/2-1}; + for (int t = 0; t < sizeof (tests)/sizeof(tests[0]); t++) + { + for (int i=0; i < N; i++) + data[i] = i; + + /* Output overlaps the latter part of input; expect the initial no-aliased + part of the input to repeat throughout the aliased portion. */ + to = tests[t]; + from = 0; + count = N-tests[t]; + f (data, N, to, from, count); + for (int i=0; i < N; i++) + if (data[i] != i%tests[t]) + exit (2); + + for (int i=0; i < N; i++) + data[i] = i; + + /* Input overlaps the latter part of the output; expect the copy to work + in the obvious manner. */ + to = 0; + from = tests[t]; + count = N-tests[t]; + f (data, N, to, from, count); + for (int i=0; i < count; i++) + if (data[i+to] != i+tests[t]) + exit (3); + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c new file mode 100644 index 000000000000..96c03297d5b4 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c @@ -0,0 +1,90 @@ +/* Test that a simple array copy does the right thing when the input and + output data overlap. The GPU kernel should automatically switch to + a sequential operation mode in order to give the expected results. + + This test does not check the correctness of the output (there are other + tests for that), but checks that the code really does select the faster + path, when it can, by comparing the timing. */ + +/* No optimization means no issue with aliasing. 
+ { dg-skip-if "" { *-*-* } { "-O0" } { "" } } + { dg-skip-if "" { *-*-* } { "-foffload=disable" } { "" } } */ + +#include +#include +#include + +void f(int *data, int n, int to, int from, int count) +{ + int *a = (int*)acc_deviceptr (data+to); + int *b = (int*)acc_deviceptr (data+from); + +#pragma acc kernels + for (int i = 0; i < count; i++) + a[i] = b[i]; +} + +#define N 1000000 +int data[N]; + +int +main () +{ + struct timeval start, stop, difference; + long basetime, aliastime; + + for (int i=0; i < N; i++) + data[i] = i; + + /* Ensure that the data copies are outside the timed zone. */ +#pragma acc enter data copyin(data[0:N]) + + /* Baseline test; no aliasing. The high part of the data is copied to + the lower part. */ + int to = 0; + int from = N/2; + int count = N/2; + gettimeofday (&start, NULL); + f (data, N, to, from, count); + gettimeofday (&stop, NULL); + timersub (&stop, &start, &difference); + basetime = difference.tv_sec * 1000000 + difference.tv_usec; + + /* Check various amounts of data overlap. */ + int tests[] = {1, 10, N/4, N/2-10, N/2-1}; + for (int i = 0; i < sizeof (tests)/sizeof(tests[0]); i++) + { + to = 0; + from = N/2 - tests[i]; + gettimeofday (&start, NULL); + f (data, N, to, from, count); + gettimeofday (&stop, NULL); + timersub (&stop, &start, &difference); + aliastime = difference.tv_sec * 1000000 + difference.tv_usec; + + /* If the aliased runtime is less than 200% of the non-aliased runtime + then the runtime alias check probably selected the wrong path. + (Actually we expect the difference to be far greater than that.) */ + if (basetime*2 > aliastime) + exit (1); + } + + /* Repeat the baseline check just to make sure it didn't also get slower + after the first run. 
*/ + to = 0; + from = N/2; + gettimeofday (&start, NULL); + f (data, N, to, from, count); + gettimeofday (&stop, NULL); + timersub (&stop, &start, &difference); + int controltime = difference.tv_sec * 1000000 + difference.tv_usec; + + /* The two times should be roughly the same, but we just check it wouldn't + pass the aliastime test above. */ + if (basetime*2 <= controltime) + exit (2); + +#pragma acc exit data copyout(data[0:N]) + + return 0; +} From patchwork Wed Dec 15 15:54:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48967 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B05013858410 for ; Wed, 15 Dec 2021 16:12:24 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 5CEBF3858439 for ; Wed, 15 Dec 2021 15:56:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5CEBF3858439 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 5Rr22kLpTPqLi9/gjBCQvXcMEdNeunpAHB0XJMB5r61m7Nx2Zb3yxJ/QTCef6l9/xpEIffXmxm ps+Dct2txVE/cSuL1TEMLnrMPhZn3DITAgDbl4/QUyt821y2H7i4PcvIiDqZSYBnNdkLIHXzFt MiOJtlZx0vlOg/jBuQutu0qlXgGDzUluj8VuWZxSFQZ3so1/fU5ecltxs+xpyNTuZ9Q3vK2D8G KvEIhG6pePxHqTNXfgd3VjtZwZ7gh0wNXcX3/fh0podsH+8NXfMiz4f4MmzBpLx5q6iv461oUg hRDXAmLhxyA6i6K99Jtm1t8r X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="72258711" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:42 -0800 IronPort-SDR: 
7iMKgjJivqEjZ6h71BwdcxUEoc8/zshTQto09JU4jTU7+pw7UUzazptf8Uq68hQaqRtjdIYTbk 4xHLjn3acr7h1TKifN0Xf08L4F/eZvQKkcIVG0Xhp41MPjexf4I8Mqy7W5abxVhtCUJtFyiXEC 8X1YjVCJdFnJWfBrMovOQxwYNENLAx1SOz96LGEJstFP/Hdq9efUuhy5cqgaKRTUk5tuoKiOrt z5dE7IqFhNkyGzK9kgk5Wqxk2+Z2/yknzClu/VuPk9rASW1/WqKeXTZoOTSfECbWqRakosJO0Z 73o= From: Frederik Harwath To: Subject: [PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences Date: Wed, 15 Dec 2021 16:54:33 +0100 Message-ID: <20211215155447.19379-27-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This commit concerns loops in OpenACC "kernels" regions that have been marked up with an explicit "independent" clause by the user, but for which Graphite found data dependences. A discussion on the private internal OpenACC mailing list suggested that warning the user about the dependences would be a more acceptable solution than reverting the user's decision. This behavior is implemented by the present commit. gcc/ChangeLog: * common.opt: Add flag Wopenacc-false-independent. * omp-offload.c (oacc_loop_warn_if_false_independent): New function. 
(oacc_loop_fixed_partitions): Call from here. --- gcc/common.opt | 5 +++++ gcc/omp-offload.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/common.opt b/gcc/common.opt index b6c46ab63e34..ec76a88f14e3 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -850,6 +850,11 @@ Wtsan Common Var(warn_tsan) Init(1) Warning Warn about unsupported features in ThreadSanitizer. +Wopenacc-false-independent +Common Var(warn_openacc_false_independent) Init(1) Warning +Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\" +clause but analysis shows that it has loop-carried dependences. + Xassembler Driver Separate diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 3458a1acbceb..36dde11f5955 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -1900,6 +1900,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop) return true; } +/* Emit a warning if LOOP has an "independent" clause but Graphite's + analysis shows that it has data dependences. Note that we respect + the user's explicit decision to parallelize the loop but we + nevertheless warn that this decision could be wrong. */ + +static void +oacc_loop_warn_if_false_independent (oacc_loop *loop) +{ + if (!optimize) + return; + + if (loop->routine) + return; + + /* TODO Warn about "auto" & "independent" in "parallel" regions? 
*/ + if (!oacc_parallel_kernels_graphite_fun_p ()) + return; + + if (!(loop->flags & OLF_INDEPENDENT)) + return; + + bool analyzed = false; + bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed); + loop_p cfg_loop = oacc_loop_get_cfg_loop (loop); + + if (cfg_loop && cfg_loop->inner && !analyzed) + { + if (dump_enabled_p ()) + { + const dump_user_location_t loc + = dump_user_location_t::from_location_t (loop->loc); + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "'independent' loop in 'kernels' region has not been " + "analyzed (cf. 'graphite' " + "dumps for more information).\n"); + } + return; + } + + if (!can_be_parallel) + warning_at (loop->loc, 0, + "loop has \"independent\" clause but data dependences were " + "found."); +} + /* Walk the OpenACC loop hierarchy checking and assigning the programmer-specified partitionings. OUTER_MASK is the partitioning this loop is contained within. Return mask of partitioning @@ -1951,6 +1996,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask) } } + /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? 
*/ + if (warn_openacc_false_independent) + oacc_loop_warn_if_false_independent (loop); + if (maybe_auto && (loop->flags & OLF_INDEPENDENT)) { loop->flags |= OLF_AUTO; From patchwork Wed Dec 15 15:54:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48969 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 83A0E3858418 for ; Wed, 15 Dec 2021 16:13:33 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 362E4385801D for ; Wed, 15 Dec 2021 15:56:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 362E4385801D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: LRpl2+8ROrBmKSTf1w3b6FBJtunXKfvUOaU8cKyARHbr0GB2hdCuQ8+smvu6rZcCkX0g8fXWXW uMP1sggR60DQDC2B/SdExayABKk/5hoe+b3x/2xG18N06MWp0atEu6ZzE0gacELfIM/7z2CvsH 6i3ugSuTjGhn437ScX9G750iB7HI0EzmVZMkLRuxN1PYHcgqoFTvlVLTUrfKRy3X5rGxF+ljZr cxbBZhRhZ2Nfv+IgHX0Uhxduad22rmdmeXXO42ZrICqcn/qzxqRNB/RS/4hGeuo9d9mTzN+fk/ Ir4rruPRMCvVKm+EsWQcbYrf X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="72258713" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa1.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:43 -0800 IronPort-SDR: WpUemnJXL/P/7lLh6TPC6pI7T4PZ/ehdh5lYp3rJApBMcGD5XRwKE8iE6Un/BpFnBi57TrlFnp hqGFDb1eFGW8mz4aVouRObeyRtcPrjhBKO40TUk0bLkGeo2yOhngzXQxMEbu5SFxZipvA7JzJp m/5MSRi5wMH0+XiCboyAmOesYQ2zpebV8C1Ed/S5PI6iXkGE94fz/QXyatfI12UBDtLWiBMveN PFJ08zXA/Kt1lMUHfd2ZaJwlatW7NICzMQmMTpCYAMmux7wGFMo2cTfjtOzLMTQdNRaAYWkDvy HME= From: Frederik Harwath To: Subject: 
[PATCH 27/40] openacc: Handle internal function calls in pass_lim Date: Wed, 15 Dec 2021 16:54:34 +0100 Message-ID: <20211215155447.19379-28-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The loop invariant motion pass correctly refuses to move statements out of a loop if any other statement in the loop is unanalyzable. The pass does not know how to handle the OpenACC internal function calls; this was not necessary until recently, when the OpenACC device lowering pass was moved to a later position in the pass pipeline. This commit changes pass_lim to ignore the OpenACC internal function calls, which do not contain any memory references. The hoisting enabled by this change can be useful for the data-dependence analysis in Graphite; for instance, in the outlined functions for OpenACC regions, all invariant accesses to the ".omp_data_i" struct should be hoisted out of the OpenACC loop. This is particularly important for variables that were scalars in the original loop and which have been turned into accesses to the struct by the outlining process.
Not hoisting those can prevent scalar evolution analysis which is crucial for Graphite. Since any hoisting that introduces intermediate names - and hence, "fake" dependences - inside the analyzed nest can be harmful to data-dependence analysis, a flag to restrict the hoisting in OpenACC functions is added to the pass. The pass instance that executes before Graphite now runs with this flag set to true and the pass instance after Graphite runs unrestricted. A more precise way of selecting the statements for which hoisting should be enabled is left for a future improvement. gcc/ChangeLog: * passes.def: Set restrict_oacc_hoisting to true for the early pass_lim instance. * tree-ssa-loop-im.c (movement_possibility): Add restrict_oacc_hoisting flag to function; restrict movement if set. (compute_invariantness): Add restrict_oacc_hoisting flag and pass it on. (gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE calls. (loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and pass it on. (pass_lim::execute): Pass on new flags. * tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration. * gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to loop_invariant_motion_in_fun. 
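As a rough illustration of the hoisting described in this commit message, the following hand-written C sketch contrasts a loop that reloads invariant struct fields on every iteration with one where those loads have been moved in front of the loop. It is not compiler output: the struct omp_data_sketch, its fields, and both function names are hypothetical stand-ins for the ".omp_data_i" struct created by outlining, chosen only for this example.

```c
/* Hypothetical stand-in for the ".omp_data_i" struct that outlining
   creates; the type and field names are invented for this sketch.  */
struct omp_data_sketch
{
  int *a;
  int n;
  int scale;
};

/* Before hoisting: the loop bound, the array base and the scalar are
   all read through the struct pointer inside the loop, so a variable
   that was a plain scalar in the user's code now looks like a memory
   access.  */
long
sum_unhoisted (const struct omp_data_sketch *d)
{
  long s = 0;
  for (int i = 0; i < d->n; i++)
    s += (long) d->a[i] * d->scale;
  return s;
}

/* After hoisting: the invariant loads are performed once in front of
   the loop, leaving a loop over local scalars with a simple bound.  */
long
sum_hoisted (const struct omp_data_sketch *d)
{
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;
  long s = 0;
  for (int i = 0; i < n; i++)
    s += (long) a[i] * scale;
  return s;
}
```

Both functions compute the same value; the second form is merely the shape that the early pass_lim instance is meant to produce for invariant struct accesses, under the assumptions stated above.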
--- gcc/gimple-loop-interchange.cc | 2 +- gcc/passes.def | 2 +- gcc/tree-ssa-loop-im.c | 57 ++++++++++++++++++++++++++++------ gcc/tree-ssa-loop-manip.h | 2 +- 4 files changed, 51 insertions(+), 12 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index ccd5083145f8..7c9b7b2345fa 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -2107,7 +2107,7 @@ pass_linterchange::execute (function *fun) if (changed_p) { unsigned todo = TODO_update_ssa_only_virtuals; - todo |= loop_invariant_motion_in_fun (cfun, false); + todo |= loop_invariant_motion_in_fun (cfun, false, false); scev_reset (); return todo; } diff --git a/gcc/passes.def b/gcc/passes.def index 681392f8f79f..1da9382bac53 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -250,7 +250,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_cse_sincos); NEXT_PASS (pass_optimize_bswap); NEXT_PASS (pass_laddress); - NEXT_PASS (pass_lim); + NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */); NEXT_PASS (pass_walloca, false); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 4b187c2cdafe..466dc494fb52 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -47,6 +47,8 @@ along with GCC; see the file COPYING3. If not see #include "builtins.h" #include "tree-dfa.h" #include "dbgcnt.h" +#include "graphite-oacc.h" +#include "internal-fn.h" /* TODO: Support for predicated code motion. I.e. @@ -327,11 +329,23 @@ enum move_pos Otherwise return MOVE_IMPOSSIBLE. 
*/ enum move_pos -movement_possibility (gimple *stmt) +movement_possibility (gimple *stmt, bool restrict_oacc_hoisting) { tree lhs; enum move_pos ret = MOVE_POSSIBLE; + if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl) + && gimple_code (stmt) == GIMPLE_ASSIGN) + { + tree rhs = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR) + rhs = TREE_OPERAND (rhs, 0); + + if (TREE_CODE (rhs) == ARRAY_REF) + return MOVE_IMPOSSIBLE; + } + if (flag_unswitch_loops && gimple_code (stmt) == GIMPLE_COND) { @@ -981,7 +995,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi) statements. */ static void -compute_invariantness (basic_block bb) +compute_invariantness (basic_block bb, bool restrict_oacc_hoisting) { enum move_pos pos; gimple_stmt_iterator bsi; @@ -1009,7 +1023,7 @@ compute_invariantness (basic_block bb) { stmt = gsi_stmt (bsi); - pos = movement_possibility (stmt); + pos = movement_possibility (stmt, restrict_oacc_hoisting); if (pos == MOVE_IMPOSSIBLE) continue; @@ -1040,7 +1054,7 @@ compute_invariantness (basic_block bb) { stmt = gsi_stmt (bsi); - pos = movement_possibility (stmt); + pos = movement_possibility (stmt, restrict_oacc_hoisting); if (pos == MOVE_IMPOSSIBLE) { if (nonpure_call_p (stmt)) @@ -1465,6 +1479,13 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt) if (!gimple_vuse (stmt)) return; + /* The expansion of those OpenACC internal function calls which occurs in a + * later pass does not introduce any memory references. Hence it is safe to + * ignore them. 
*/ + if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP) + || gimple_call_internal_p (stmt, IFN_UNIQUE)) + return; + mem = simple_mem_ref_in_stmt (stmt, &is_stored); if (!mem && is_gimple_assign (stmt)) { @@ -1506,7 +1527,7 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt) ao_ref_alias_set (&aor); HOST_WIDE_INT offset, size, max_size; poly_int64 saved_maxsize = aor.max_size, mem_off; - tree mem_base; + tree mem_base = NULL; bool ref_decomposed; if (aor.max_size_known_p () && aor.offset.is_constant (&offset) @@ -3244,7 +3265,8 @@ tree_ssa_lim_finalize (void) Only perform store motion if STORE_MOTION is true. */ unsigned int -loop_invariant_motion_in_fun (function *fun, bool store_motion) +loop_invariant_motion_in_fun (function *fun, bool store_motion, + bool restrict_oacc_hoisting) { unsigned int todo = 0; @@ -3262,7 +3284,7 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion) /* For each statement determine the outermost loop in that it is invariant and cost for computing the invariant. */ for (int i = 0; i < n; ++i) - compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i])); + compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i]), restrict_oacc_hoisting); /* Execute store motion. Force the necessary invariants to be moved out of the loops as well. 
*/ @@ -3309,13 +3331,21 @@ class pass_lim : public gimple_opt_pass { public: pass_lim (gcc::context *ctxt) - : gimple_opt_pass (pass_data_lim, ctxt) + : gimple_opt_pass (pass_data_lim, ctxt), restrict_oacc_hoisting (false) {} + void set_pass_param (unsigned int n, bool param) + { + gcc_assert (n == 0); + restrict_oacc_hoisting = param; + } + /* opt_pass methods: */ opt_pass * clone () { return new pass_lim (m_ctxt); } virtual bool gate (function *) { return flag_tree_loop_im != 0; } virtual unsigned int execute (function *); +private: + bool restrict_oacc_hoisting; }; // class pass_lim @@ -3328,7 +3358,16 @@ pass_lim::execute (function *fun) if (number_of_loops (fun) <= 1) return 0; - unsigned int todo = loop_invariant_motion_in_fun (fun, flag_move_loop_stores); + + bool store_motion = flag_move_loop_stores; + /* TODO Enabling store motion in OpenACC kernel functions requires further + handling of the OpenACC internal function calls. It can also be harmful + to data-dependence analysis. Keep it disabled for now. 
*/ + if (oacc_function_p (cfun) && graphite_analyze_oacc_target_region_type_p (cfun)) + store_motion = false; + + unsigned int todo = loop_invariant_motion_in_fun (fun, store_motion, + restrict_oacc_hoisting); if (!in_loop_pipeline) loop_optimizer_finalize (); diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h index 4f604e1bd24a..864fb9f1d355 100644 --- a/gcc/tree-ssa-loop-manip.h +++ b/gcc/tree-ssa-loop-manip.h @@ -53,7 +53,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned, transform_callback, void *); extern void tree_unroll_loop (class loop *, unsigned, tree_niter_desc *); extern tree canonicalize_loop_ivs (class loop *, tree *, bool); -extern unsigned int loop_invariant_motion_in_fun (function *, bool); +extern unsigned int loop_invariant_motion_in_fun (function *, bool, bool); #endif /* GCC_TREE_SSA_LOOP_MANIP_H */ From patchwork Wed Dec 15 15:54:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48970 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 405793858033 for ; Wed, 15 Dec 2021 16:14:09 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 199B43857C44 for ; Wed, 15 Dec 2021 15:56:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 199B43857C44 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 7xIfPcTPUmTGZwfDzgFSjWgOhYJBC1SoDWnkEY72Hk7g13cRqYOQl4Hh1VQs6zneBl4WXMxu0g h/OlyJT+tXNF+XspGAb5pjLz6SvgTUxj01W9aMW5w1U9EpqHeX1e7182MEFCNc5PuGBFDDa04p 
O9mU1Kk4eHAs/44+ZApc0s171aksKtkeiFya7yC0Sfnv1AOzZmZmGjUGPdLKGYOL+89T17rxcV LDRBWLCHijfIa7n6zO0cs1+OkBpxUsu5Catf2LyFRm3lk//ffuWTlZ8Lzdp0UcQEwSaiiTb+eM aqtduSDbB1sSM3SVcDd2DnAa X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736613" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:51 -0800 IronPort-SDR: jxEAtj+s/gjD6yw54ymPGIcFI7BhFP1jDVE09y+ylZmoPS5u2kiNRhhm0ztIPCOQf7sV9AoOZx j1afaZEIsd6zg/AesjlXUkYlxXxUcgVg/B/cz58mDD22TNKFH1ZgCzpqxZWIHlRWLxaa3ur4vW DshbsYEx4CxeuDA3scGAuBf7Ms0HyJ3maXzCSW0KZAvVSZsjr6Ov9tC8nnLHcdM34KW3PHw+fS LEL932oUUN/d3lbKtzMJ5uBmK/AG4LVI5Huzz2cdA/39ciGlN45BvrXnOHvEWBUYnhhA/RzgKX XN0= From: Frederik Harwath To: Subject: [PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite Date: Wed, 15 Dec 2021 16:54:35 +0100 Message-ID: <20211215155447.19379-29-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The additional dependences introduced by partial redundancy elimination proper and by the code hoisting step of the pass very often cause Graphite to 
fail on OpenACC functions. On the other hand, the pass can also enable the analysis of OpenACC loops (cf. e.g. the loop-auto-transfer-4.f90 testcase), for instance, because full redundancy elimination removes definitions that would otherwise prevent the creation of runtime alias checks outside of the SCoP. This commit disables the actual partial redundancy elimination step as well as the code hoisting step of pass_pre on OpenACC functions that might be handled by Graphite. gcc/ChangeLog: * tree-ssa-pre.c (insert): Skip any insertions in OpenACC functions that might be processed by Graphite. --- gcc/tree-ssa-pre.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index dc55d868cc19..d61210fc2ee9 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-cfgcleanup.h" #include "alias.h" #include "gimple-range.h" +#include "graphite-oacc.h" /* Even though this file is called tree-ssa-pre.c, we actually implement a bit more than just PRE here. All of them piggy-back @@ -3742,6 +3743,22 @@ do_hoist_insertion (basic_block block) static void insert (void) { + + /* The additional dependences introduced by the code insertions + can cause Graphite's dependence analysis to fail. Without + special handling of those dependences in Graphite, it seems + better to skip this step if OpenACC loops that need to be handled + by Graphite are found.
Note that the full redundancy elimination + step of this pass is useful for the purpose of dependence + analysis, for instance, because it can remove definitions from + SCoPs that would otherwise prevent the creation of runtime alias + checks since those may only use definitions that are available + before the SCoP. */ + + if (oacc_function_p (cfun) + && ::graphite_analyze_oacc_function_p (cfun)) + return; + basic_block bb; FOR_ALL_BB_FN (bb, cfun) From patchwork Wed Dec 15 15:54:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48971 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 97556385781A for ; Wed, 15 Dec 2021 16:14:38 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id DBC1F3858439 for ; Wed, 15 Dec 2021 15:56:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DBC1F3858439 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: L3Nm97Y64IGadsqRVAqPVER9Bcran469yyd1qnMzelvafHX9xe2D+LSuC8ueZRgffHILBTQM0d O7LKiGZTTHWr6QZbn0pPbS89o/IKL7IAuUee3GENG5+fuJxIZgOn8sUws5pe6iHtgHarkfvrGZ f9iR/833tXvxAFjspyhPjkRJ/fYR5i5WGBT8ZADnCZtcFEpLkmZGNZ/ukupjyTAxz+rSjybQam ftEChdww2hfKFfFvGVNUPlojHGtbLu9xf62sHFteiw1dj6FSHQy7cW42GzWs5DvQiH8W23aHqk rlVnoDaJyIRFn8RW2IACaHiw X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736621" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:55 -0800 IronPort-SDR: 
7CB3EG7gTnXXmpR3G7SBIfYEoDRUak7zRxBtU+M7Lm8yrPbQpo9W/qN3huWVT85pH4egQD+x29 KHyLuFUTm/BEXmXtBcDd/jQWrH1mhJ2EOVUdlg73CB3C1NspXvVhlOkUmxvQtgyjHxcGRi/Ee7 swojQqKRV6XOrE/JpAbg+HhSEntgm5GIqDjh6OpWWktF0x5bxRIrGprrvpP166m2U7UfNwoTHj mE6yDK+AZ7Y4SVRdjv74hwRMbIzATIwQ4sKa4+BKpehoXaW/Ol3RZRHXSGgt14P2xk9yLSNGMO jAY= From: Frederik Harwath To: Subject: [PATCH 29/40] graphite: Tune parameters for OpenACC use Date: Wed, 15 Dec 2021 16:54:36 +0100 Message-ID: <20211215155447.19379-30-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The default values of some parameters that restrict Graphite's resource usage are too low for many OpenACC codes. Furthermore, exceeding the limits does not always lead to user-visible diagnostic messages. This commit increases the parameter values on OpenACC functions. The values were chosen to allow for the analysis of all "kernels" regions in the SPEC ACCEL v1.3 benchmark suite. Warnings about exceeded Graphite-related limits are added to the -fopt-info-missed output.
Those warnings are phrased in a uniform way that intentionally refers to the "data-dependence analysis" of "OpenACC loops" instead of "a failure in Graphite" to make them easier to understand for users. gcc/ChangeLog: * graphite-optimize-isl.c (optimize_isl): Adjust param_max_isl_operations value for OpenACC functions and add special warnings if value gets exceeded. * graphite-scop-detection.c (build_scops): Likewise for param_graphite_max_arrays_per_scop. gcc/testsuite/ChangeLog: * gcc.dg/goacc/graphite-parameter-1.c: New test. * gcc.dg/goacc/graphite-parameter-2.c: New test. --- gcc/graphite-optimize-isl.c | 35 ++++++++++++++++--- gcc/graphite-scop-detection.c | 28 ++++++++++++++- .../gcc.dg/goacc/graphite-parameter-1.c | 21 +++++++++++ .../gcc.dg/goacc/graphite-parameter-2.c | 23 ++++++++++++ 4 files changed, 101 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c index 019452700a49..4eecbd20b740 100644 --- a/gcc/graphite-optimize-isl.c +++ b/gcc/graphite-optimize-isl.c @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "dumpfile.h" #include "tree-vectorizer.h" #include "graphite.h" +#include "graphite-oacc.h" /* get_schedule_for_node_st - Improve schedule for the schedule node. 
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite) int old_err = isl_options_get_on_error (scop->isl_context); int old_max_operations = isl_ctx_get_max_operations (scop->isl_context); int max_operations = param_max_isl_operations; + + /* The default value for param_max_isl_operations is easily exceeded + by "kernels" loops in existing OpenACC codes. Raise the values + significantly since analyzing those loops is crucial. */ + if (param_max_isl_operations == 350000 /* default value */ + && oacc_function_p (cfun)) + max_operations = 2000000; + if (max_operations) isl_ctx_set_max_operations (scop->isl_context, max_operations); isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE); @@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite) dump_user_location_t loc = find_loop_location (scop->scop_info->region.entry->dest->loop_father); if (isl_ctx_last_error (scop->isl_context) == isl_error_quota) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, - "loop nest not optimized, optimization timed out " - "after %d operations [--param max-isl-operations]\n", - max_operations); - else + { + if (oacc_function_p (cfun)) + { + /* Special casing for OpenACC to unify diagnostic messages + here and in graphite-scop-detection.c. 
*/ + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "data-dependence analysis of OpenACC loop " + "nest " + "failed; try increasing the value of " + "--param=" + "max-isl-operations=%d.\n", + max_operations); + } + else + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "loop nest not optimized, optimization timed " + "out after %d operations [--param " + "max-isl-operations]\n", + max_operations); + } + else dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, "loop nest not optimized, ISL signalled an error\n"); } diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 234dbe0ec729..9a5e43a5bfc6 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -2053,6 +2053,9 @@ determine_openacc_reductions (scop_p scop) } } + +extern dump_user_location_t find_loop_location (class loop *); + /* Find Static Control Parts (SCoP) in the current function and pushes them to SCOPS. */ @@ -2106,6 +2109,11 @@ build_scops (vec *scops) } unsigned max_arrays = param_graphite_max_arrays_per_scop; + + if (oacc_function_p (cfun) + && param_graphite_max_arrays_per_scop == 100 /* default value */) + max_arrays = 200; + if (max_arrays > 0 && scop->drs.length () >= max_arrays) { @@ -2113,7 +2121,16 @@ build_scops (vec *scops) << scop->drs.length () << " is larger than --param graphite-max-arrays-per-scop=" << max_arrays << ".\n"); - free_scop (scop); + + if (dump_enabled_p () && oacc_function_p (cfun)) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + find_loop_location (s->entry->dest->loop_father), + "data-dependence analysis of OpenACC loop nest " + "failed; try increasing the value of --param=" + "graphite-max-arrays-per-scop=%d.\n", + max_arrays); + + free_scop (scop); continue; } @@ -2126,6 +2143,15 @@ build_scops (vec *scops) << scop_nb_params (scop) << " larger than --param graphite-max-nb-scop-params=" << max_dim << ".\n"); + + if (dump_enabled_p () && oacc_function_p (cfun)) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + find_loop_location 
(s->entry->dest->loop_father), + "data-dependence analysis of OpenACC loop nest " + "failed; try increasing the value of --param=" + "graphite-max-nb-scop-params=%d.\n", + max_dim); + free_scop (scop); continue; } diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c new file mode 100644 index 000000000000..45adbb3f0e85 --- /dev/null +++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c @@ -0,0 +1,21 @@ +/* Verify that a warning about an exceeded Graphite parameter gets + output as optimization information and not only as a dump message + for OpenACC functions. */ + +/* { dg-additional-options "-O2 -fopt-info-missed --param=graphite-max-arrays-per-scop=1" } */ + +extern int a[1000]; +extern int b[1000]; + +void test () +{ +#pragma acc parallel loop auto +/* { dg-missed {data-dependence analysis of OpenACC loop nest failed\; try increasing the value of --param=graphite-max-arrays-per-scop=1.} "" { target *-*-* } .-1 } */ +/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */ +/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */ + for (int i = 1; i < 995; i++) + a[i] = b[i + 5] + b[i - 1]; +} + + +/* { dg-prune-output ".*not inlinable.*"} */ diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c new file mode 100644 index 000000000000..f2830cd62db0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c @@ -0,0 +1,23 @@ +/* Verify that a warning about an exceeded Graphite parameter gets + output as optimization information and not only as a dump message + for OpenACC functions. 
*/ + +/* { dg-additional-options "-O2 -fopt-info-missed --param=max-isl-operations=1" } */ + +void test (int* restrict a, int *restrict b) +{ + int i = 1; + int j = 1; + int m = 0; + +#pragma acc parallel loop auto copyin(b) copyout(a) reduction(max:m) +/* { dg-missed {data-dependence analysis of OpenACC loop nest failed; try increasing the value of --param=max-isl-operations=1.} "" { target *-*-* } .-1 } */ +/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */ +/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */ + for (i = 1; i < 995; i++) + { + int x = b[i] * 2; + for (j = 1; j < 995; j++) + m = m + a[i] + x; + } +} From patchwork Wed Dec 15 15:54:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48972 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 453DA3857C65 for ; Wed, 15 Dec 2021 16:15:14 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 6A8713858C3A for ; Wed, 15 Dec 2021 15:56:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6A8713858C3A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: TmfV1brw0h+sFE1rwsNe6RXNR3BbuFg+UVhxEw8dop90HnXVWL0d7uiMKLCAsxJV+/zEfXzfpM KPGrPIJgKve9g7OUcIVlnuFFJu8hYmyZAnXYkUV8K7f17aOwE28JWolRLeQSh4/NWqX0cXnxSa iOpjkGk9q80/ahlIclnnMWvP4G0uhFFYL3p1/gcdU/2lwKE2ZFzYXh1bMwJjf3/XwmxRvvXmWz d7pL4KnG1ec+3ueF56WEtrvDdbV/Mq1Fs5fQcg7Z7OevoiRbWWFTiQE6iJDoMXZQEcYY1hjbLN 9UNWJGVoyUno71j0vtBGcLX9 
X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736624" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:56:56 -0800 IronPort-SDR: XKi7jHDdVProZRuLe1xGMKr8iY3+vdakvTmEAfk6HAM5amRzOV5D7XxzdE47Mxn+23WWZntfPH I7j5/nYDtLB/Vc9h120DNhXtoMx9kymJnmogVgte3qrYXeOsVnNinkPihqm+iuWmu6aNAOwD69 nwKx7mFkbaJMmYtKe4BKwQuqWAhT0qMc3yB2lcSLw4/rcfPQMPX1NcEJNWDYCsBO2LYaIGESqg rdOjwRF+Cven+aPAOKCNizyEvfgekqwKYUAvY5v6TLu/M1x4BnE0DGkI/6fRg6PtMzp2kt5X+a h00= From: Frederik Harwath To: Subject: [PATCH 30/40] graphite: Adjust scop loop-nest choice Date: Wed, 15 Dec 2021 16:54:37 +0100 Message-ID: <20211215155447.19379-31-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The find_common_loop function is used in Graphite to obtain a common super-loop of all loops inside a SCoP. The function is applied to the loop of the destination block of the edge that leads into the SESE region and the loop of the source block of the edge that exits the region. 
The exit block is usually introduced by the canonicalization of the loop structure that Graphite does to support its code generation. If it is empty, it may happen that it belongs to the outer fake loop. This way, build_alias_set may end up analysing data-references with respect to this loop although there may exist a proper super-loop of the SCoP loops. This does not seem to be correct in general and it leads to problems with runtime alias check creation which fails if executed on a loop without niter information. gcc/ChangeLog: * graphite-scop-detection.c (scop_context_loop): New function. (build_alias_set): Use scop_context_loop instead of find_common_loop. * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise. * graphite.h (scop_context_loop): New declaration. --- gcc/graphite-isl-ast-to-gimple.c | 4 +--- gcc/graphite-scop-detection.c | 21 ++++++++++++++++++--- gcc/graphite.h | 1 + 3 files changed, 20 insertions(+), 6 deletions(-) -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 010adaabb000..acadf544fadd 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop) conditional if aliasing can be ruled out at runtime and the original version of the SCoP, otherwise. 
*/ - loop_p loop - = find_common_loop (scop->scop_info->region.entry->dest->loop_father, - scop->scop_info->region.exit->src->loop_father); + loop_p loop = scop_context_loop (scop); tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop); tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); set_ifsese_condition (region->if_region, non_alias_cond); diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 9a5e43a5bfc6..f173e6c4f890 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb) return NULL; } + +/* Return the innermost loop that encloses all loops in SCOP. */ + +loop_p +scop_context_loop (scop_p scop) +{ + edge scop_entry = scop->scop_info->region.entry; + edge scop_exit = scop->scop_info->region.exit; + basic_block exit_bb = scop_exit->src; + + while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb)) + exit_bb = single_pred (exit_bb); + + loop_p entry_loop = scop_entry->dest->loop_father; + return find_common_loop (entry_loop, exit_bb->loop_father); +} + namespace { @@ -1774,9 +1791,7 @@ build_alias_set (scop_p scop) int i, j; int *all_vertices; - struct loop *nest - = find_common_loop (scop->scop_info->region.entry->dest->loop_father, - scop->scop_info->region.exit->src->loop_father); + struct loop *nest = scop_context_loop (scop); gcc_checking_assert (nest); diff --git a/gcc/graphite.h b/gcc/graphite.h index 9c508f31109f..dacb27a9073c 100644 --- a/gcc/graphite.h +++ b/gcc/graphite.h @@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree); extern void dot_all_sese (FILE *, vec &); extern void dot_sese (sese_l &); extern void dot_cfg (); +extern loop_p scop_context_loop (scop_p); #endif From patchwork Wed Dec 15 15:54:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath 
X-Patchwork-Id: 48973 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9AA02385781E for ; Wed, 15 Dec 2021 16:15:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 2B1D6385800B for ; Wed, 15 Dec 2021 15:57:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2B1D6385800B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: TXJXRqhymGi+Z+EYqdoI/m2/FtOP+Q06ygo+VW+NVsfBytMZpSCtrmOaVMQzM+InYuLE0l86fJ fTHHOm5YC1iDBaN7r3ovWsuA1G75DcFTqBNxmQi//r0RWFCbCeHIL0mWFY8ouRv2QJr8GO5uwW hs7ij9uE5z3Sg0c+qCtX78juBg2jMacWPmrGscDcRpC1v7ly1WTwHW5w+SwAoUaXmGhWfTH2ab T2ryqs13yj5kevaIWdV/GQZFMAXTYmQfBeBNspZaaGVuqQn2DM3ARtIyxxaOkDvox47fp4VGbI t22gcBpmxblEanFiKSvqfYLq X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69736628" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:00 -0800 IronPort-SDR: lFZSWeokgWB8a//NbQRLhsWnRaDqFmjxu26r4ihoU+r30VxNw0YMOBlev0rkgJsyjCraDIdK7r w1wXmz6E/+uLFjxRBlw/EIkmwXxBGS8TfBueGZCLcwdVBSYUFRCbF3/cX1YwnygPQClvGQ8UzY Z5lgWlLACi3WtkEOfRBAq+PvR/Rame9wnJkyNu/sTpMiXG3Dp59ucXu4gtDC1jyBapYdPachje i6W05BNScgtyxDayN3n71CVMM6FX65t2OvXbD4yvdSRuM0njLC/paP/5qzWQFVOf4ZnROEXDSx PVs= From: Frederik Harwath To: Subject: [PATCH 31/40] graphite: Accept loops without data references Date: Wed, 15 Dec 2021 16:54:38 +0100 Message-ID: <20211215155447.19379-32-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 
X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: rguenther@suse.de, sebpop@gmail.com, thomas@codesourcery.com, grosser@fim.uni-passau.de Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" It seems that the check that rejects loops without data references is only included to avoid handling non-profitable loops. Including those loops in Graphite's analysis enables more consistent diagnostic messages in OpenACC "kernels" code and does not introduce any testsuite regressions. If executing Graphite on loops without data references leads to noticeable compile time slow-downs for non-OpenACC users of Graphite, the check can be re-introduced but restricted to non-OpenACC functions. gcc/ChangeLog: * graphite-scop-detection.c (scop_detection::harmful_loop_in_region): Remove check for loops without data references. 
--- gcc/graphite-scop-detection.c | 13 ------------- 1 file changed, 13 deletions(-) -- 2.33.0 diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index f173e6c4f890..2dcb85508a3d 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -849,19 +849,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const return true; } - /* Check if all loop nests have at least one data reference. - ??? This check is expensive and loops premature at this point. - If important to retain we can pre-compute this for all innermost - loops and reject those when we build a SESE region for a loop - during SESE discovery. */ - if (! loop->inner - && ! loop_nest_has_data_refs (loop)) - { - DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num - << " does not have any data reference.\n"); - return true; - } - DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n"); } From patchwork Wed Dec 15 15:54:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48974 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2086F3858023 for ; Wed, 15 Dec 2021 16:16:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 7AC703857C44 for ; Wed, 15 Dec 2021 15:57:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7AC703857C44 Authentication-Results:
sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: B+iovd7VK8qjSLtmnUZ/TDg71EANI6pWZzaKoQh79sjjKv0kqA4UsxsrpAdLJoqE94XJKOtcui AD1nlkTDxAC5ZdhJ4Edl7coV1XFNc88682UlQ1M+Vy8rINzIZ4gUEIqF6auGNSDLdJUQGDkcgB 2gBtqsK6P1cndOhGTAUo0YP6GLo7PJz2Wkzj7FJld51zhIqAla17elMdoZvgKUA3rB+RpqFUzG Wt+aDGRxxCCTmlCZkPauTcj/4IoA2IqvVL05AEOV1XP0ZAZs8+1ZTlLSB0hNoi1/XJ1JErlZ9S yFHt1MaSSQkg/MzB3jhqdkiL X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584644" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:09 -0800 IronPort-SDR: 3faYh5RoAn+nMD77+wAyPw8jaDx1HBYu9Fy9mUSPOgJQL7guqapvmRXTqk9wdl9O8dCaQEEVx0 SXfcCXnGq6mazYcV4nfw/Nquhvc+YrLWcVf/tLqxAhJVseQIc1tplnXuLl31kju0lx4KOK8QHh Cf6m/qyKGqWhGil0wQCLxny0Fp/35R6st5RWKsTwDVxLkc1nz3BkM5WC4wDeQkc/pdO4IgCMOT RaafNPbTkcZXtGWQ9jt1mWPHc6f7vSsi2mftZ9Gkebn/fI08EKIQWlZ3mxIvHZAJhyjaJLz4Ng EfE= From: Frederik Harwath To: Subject: [PATCH 32/40] Reference reduction localization Date: Wed, 15 Dec 2021 16:54:39 +0100 Message-ID: <20211215155447.19379-33-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julian 
Brown , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Julian Brown gcc/ * gimplify.c (privatize_reduction): New struct. (localize_reductions_r, localize_reductions): New functions. (gimplify_omp_for): Call localize_reductions. (gimplify_omp_workshare): Likewise. * omp-low.c (lower_oacc_reductions): Handle localized reductions. Create fewer temp vars. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL documentation. * tree.c (omp_clause_num_ops): Bump number of ops for OMP_CLAUSE_REDUCTION to 6. (walk_tree_1): Adjust accordingly. * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro. --- gcc/gimplify.c | 102 +++++++++++++++++++++++++++++++++++ gcc/omp-low.c | 45 +++++----------- gcc/tree-core.h | 4 +- gcc/tree.c | 137 +++++++++++++++++++++++++++++++++++++++++++++--- gcc/tree.h | 2 + 5 files changed, 250 insertions(+), 40 deletions(-) -- 2.33.0 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index c2ab96e7e182..9a4331c70d6e 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -240,6 +240,11 @@ struct gimplify_omp_ctx int defaultmap[5]; }; +struct privatize_reduction +{ + tree ref_var, local_var; +}; + static struct gimplify_ctx *gimplify_ctxp; static struct gimplify_omp_ctx *gimplify_omp_ctxp; static bool in_omp_construct; @@ -11900,6 +11905,80 @@ gimplify_omp_taskloop_expr (tree type, tree *tp, gimple_seq *pre_p, OMP_FOR_CLAUSES (orig_for_stmt) = c; } +/* Helper function for localize_reductions. Replace all uses of REF_VAR with + LOCAL_VAR.
*/ + +static tree +localize_reductions_r (tree *tp, int *walk_subtrees, void *data) +{ + enum tree_code tc = TREE_CODE (*tp); + struct privatize_reduction *pr = (struct privatize_reduction *) data; + + if (TYPE_P (*tp)) + *walk_subtrees = 0; + + switch (tc) + { + case INDIRECT_REF: + case MEM_REF: + if (TREE_OPERAND (*tp, 0) == pr->ref_var) + *tp = pr->local_var; + + *walk_subtrees = 0; + break; + + case VAR_DECL: + case PARM_DECL: + case RESULT_DECL: + if (*tp == pr->ref_var) + *tp = pr->local_var; + + *walk_subtrees = 0; + break; + + default: + break; + } + + return NULL_TREE; +} + +/* OpenACC worker and vector loop state propagation requires reductions + to be inside local variables. This function replaces all reference-type + reduction variables associated with the loop with a local copy. It is + also used to create private copies of reduction variables for those + which are not associated with acc loops. */ + +static void +localize_reductions (tree clauses, tree body) +{ + tree c, var, type, new_var; + struct privatize_reduction pr; + + for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION) + { + var = OMP_CLAUSE_DECL (c); + + if (!lang_hooks.decls.omp_privatize_by_reference (var)) + { + OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL; + continue; + } + + type = TREE_TYPE (TREE_TYPE (var)); + new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var))); + + pr.ref_var = var; + pr.local_var = new_var; + + walk_tree (&body, localize_reductions_r, &pr, NULL); + + OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var; + } +} + + /* Gimplify the gross structure of an OMP_FOR statement.
*/ static enum gimplify_status @@ -12126,6 +12205,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) gcc_unreachable (); } + if (ort == ORT_ACC) + { + gimplify_omp_ctx *outer = gimplify_omp_ctxp; + + while (outer + && outer->region_type != ORT_ACC_PARALLEL + && outer->region_type != ORT_ACC_KERNELS) + outer = outer->outer_context; + + /* FIXME: Reductions only work in parallel regions at present. We avoid + doing the reduction localization transformation in kernels regions + here, because the code to remove reductions in kernels regions cannot + handle that. */ + if (outer && outer->region_type == ORT_ACC_PARALLEL) + localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p)); + } + /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear clause for the IV. */ if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1) @@ -13654,6 +13750,12 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) || (ort & ORT_HOST_TEAMS) == ORT_HOST_TEAMS) { push_gimplify_context (); + + /* FIXME: Reductions are not supported in kernels regions yet. 
*/ + if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL) + localize_reductions (OMP_TARGET_CLAUSES (*expr_p), + OMP_TARGET_BODY (*expr_p)); + gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body); if (gimple_code (g) == GIMPLE_BIND) pop_gimplify_context (g); diff --git a/gcc/omp-low.c b/gcc/omp-low.c index afd6061ae1e9..ae5cdfc5e260 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -7530,9 +7530,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, || is_oacc_kernels_decomposed_graphite_part (ctx)); tree orig = OMP_CLAUSE_DECL (c); - tree var = maybe_lookup_decl (orig, ctx); + tree var; tree ref_to_res = NULL_TREE; - tree incoming, outgoing, v1, v2, v3; + tree incoming, outgoing; bool is_private = false; enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c); @@ -7544,6 +7544,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, rcode = BIT_IOR_EXPR; tree op = build_int_cst (unsigned_type_node, rcode); + var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c); + if (!var) + var = maybe_lookup_decl (orig, ctx); if (!var) var = orig; @@ -7636,34 +7639,11 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, if (omp_privatize_by_reference (orig)) { - tree type = TREE_TYPE (var); - const char *id = IDENTIFIER_POINTER (DECL_NAME (var)); - - if (!inner) - { - tree x = create_tmp_var (TREE_TYPE (type), id); - gimplify_assign (var, build_fold_addr_expr (x), fork_seq); - } - - v1 = create_tmp_var (type, id); - v2 = create_tmp_var (type, id); - v3 = create_tmp_var (type, id); - - gimplify_assign (v1, var, fork_seq); - gimplify_assign (v2, var, fork_seq); - gimplify_assign (v3, var, fork_seq); - - var = build_simple_mem_ref (var); - v1 = build_simple_mem_ref (v1); - v2 = build_simple_mem_ref (v2); - v3 = build_simple_mem_ref (v3); outgoing = build_simple_mem_ref (outgoing); if (!TREE_CONSTANT (incoming)) incoming = build_simple_mem_ref (incoming); } - else - v1 = v2 = v3 = var; /* Determine 
position in reduction buffer, which may be used by target. The parser has ensured that this is not a @@ -7696,20 +7676,21 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner, = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, TREE_TYPE (var), 6, init_code, unshare_expr (ref_to_res), - v1, level, op, off); + var, level, op, off); tree fini_call = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, TREE_TYPE (var), 6, fini_code, unshare_expr (ref_to_res), - v2, level, op, off); + var, level, op, off); tree teardown_call = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION, - TREE_TYPE (var), 6, teardown_code, - ref_to_res, v3, level, op, off); + TREE_TYPE (var), 6, + teardown_code, ref_to_res, var, + level, op, off); - gimplify_assign (v1, setup_call, &before_fork); - gimplify_assign (v2, init_call, &after_fork); - gimplify_assign (v3, fini_call, &before_join); + gimplify_assign (var, setup_call, &before_fork); + gimplify_assign (var, init_call, &after_fork); + gimplify_assign (var, fini_call, &before_join); gimplify_assign (outgoing, teardown_call, &after_join); } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index f0c65a25f070..980bdee6c285 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -269,7 +269,9 @@ enum omp_clause_code { placeholder used in OMP_CLAUSE_REDUCTION_{INIT,MERGE}. Operand 4: OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER: Another dummy VAR_DECL placeholder, used like the above for C/C++ array - reductions. */ + reductions. + Operand 5: OMP_CLAUSE_REDUCTION_PRIVATE_DECL: A private VAR_DECL of + the original DECL associated with the reduction clause. */ OMP_CLAUSE_REDUCTION, /* OpenMP clause: task_reduction (operator:variable_list). 
*/ diff --git a/gcc/tree.c b/gcc/tree.c index 7bfd64160f4e..08f5a3e884bf 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -283,7 +283,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_SHARED */ 1, /* OMP_CLAUSE_FIRSTPRIVATE */ 2, /* OMP_CLAUSE_LASTPRIVATE */ - 5, /* OMP_CLAUSE_REDUCTION */ + 6, /* OMP_CLAUSE_REDUCTION */ 5, /* OMP_CLAUSE_TASK_REDUCTION */ 5, /* OMP_CLAUSE_IN_REDUCTION */ 1, /* OMP_CLAUSE_COPYIN */ @@ -11134,12 +11134,135 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data, break; case OMP_CLAUSE: - { - int len = omp_clause_num_ops[OMP_CLAUSE_CODE (*tp)]; - for (int i = 0; i < len; i++) - WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); - WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); - } + switch (OMP_CLAUSE_CODE (*tp)) + { + case OMP_CLAUSE_GANG: + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1)); + /* FALLTHRU */ + + case OMP_CLAUSE_ASYNC: + case OMP_CLAUSE_WAIT: + case OMP_CLAUSE_WORKER: + case OMP_CLAUSE_VECTOR: + case OMP_CLAUSE_NUM_GANGS: + case OMP_CLAUSE_NUM_WORKERS: + case OMP_CLAUSE_VECTOR_LENGTH: + case OMP_CLAUSE_PRIVATE: + case OMP_CLAUSE_SHARED: + case OMP_CLAUSE_FIRSTPRIVATE: + case OMP_CLAUSE_COPYIN: + case OMP_CLAUSE_COPYPRIVATE: + case OMP_CLAUSE_FILTER: + case OMP_CLAUSE_FINAL: + case OMP_CLAUSE_IF: + case OMP_CLAUSE_NUM_THREADS: + case OMP_CLAUSE_SCHEDULE: + case OMP_CLAUSE_UNIFORM: + case OMP_CLAUSE_DEPEND: + case OMP_CLAUSE_NONTEMPORAL: + case OMP_CLAUSE_NUM_TEAMS: + case OMP_CLAUSE_THREAD_LIMIT: + case OMP_CLAUSE_DEVICE: + case OMP_CLAUSE_DIST_SCHEDULE: + case OMP_CLAUSE_SAFELEN: + case OMP_CLAUSE_SIMDLEN: + case OMP_CLAUSE_ORDERED: + case OMP_CLAUSE_PRIORITY: + case OMP_CLAUSE_GRAINSIZE: + case OMP_CLAUSE_NUM_TASKS: + case OMP_CLAUSE_HINT: + case OMP_CLAUSE_TO_DECLARE: + case OMP_CLAUSE_LINK: + case OMP_CLAUSE_DETACH: + case OMP_CLAUSE_USE_DEVICE_PTR: + case OMP_CLAUSE_USE_DEVICE_ADDR: + case OMP_CLAUSE_IS_DEVICE_PTR: + case OMP_CLAUSE_INCLUSIVE: + case OMP_CLAUSE_EXCLUSIVE: + case OMP_CLAUSE__LOOPTEMP_: + case 
OMP_CLAUSE__REDUCTEMP_: + case OMP_CLAUSE__CONDTEMP_: + case OMP_CLAUSE__SCANTEMP_: + case OMP_CLAUSE__SIMDUID_: + case OMP_CLAUSE_AFFINITY: + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 0)); + /* FALLTHRU */ + + case OMP_CLAUSE_INDEPENDENT: + case OMP_CLAUSE_NOWAIT: + case OMP_CLAUSE_DEFAULT: + case OMP_CLAUSE_UNTIED: + case OMP_CLAUSE_MERGEABLE: + case OMP_CLAUSE_PROC_BIND: + case OMP_CLAUSE_DEVICE_TYPE: + case OMP_CLAUSE_INBRANCH: + case OMP_CLAUSE_NOTINBRANCH: + case OMP_CLAUSE_FOR: + case OMP_CLAUSE_PARALLEL: + case OMP_CLAUSE_SECTIONS: + case OMP_CLAUSE_TASKGROUP: + case OMP_CLAUSE_NOGROUP: + case OMP_CLAUSE_THREADS: + case OMP_CLAUSE_SIMD: + case OMP_CLAUSE_DEFAULTMAP: + case OMP_CLAUSE_ORDER: + case OMP_CLAUSE_BIND: + case OMP_CLAUSE_AUTO: + case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_NOHOST: + case OMP_CLAUSE_TILE: + case OMP_CLAUSE__SIMT_: + case OMP_CLAUSE_IF_PRESENT: + case OMP_CLAUSE_FINALIZE: + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + + case OMP_CLAUSE_LASTPRIVATE: + WALK_SUBTREE (OMP_CLAUSE_DECL (*tp)); + WALK_SUBTREE (OMP_CLAUSE_LASTPRIVATE_STMT (*tp)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + + case OMP_CLAUSE_COLLAPSE: + { + int i; + for (i = 0; i < 3; i++) + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + } + + case OMP_CLAUSE_LINEAR: + WALK_SUBTREE (OMP_CLAUSE_DECL (*tp)); + WALK_SUBTREE (OMP_CLAUSE_LINEAR_STEP (*tp)); + WALK_SUBTREE (OMP_CLAUSE_LINEAR_STMT (*tp)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + + case OMP_CLAUSE_ALIGNED: + case OMP_CLAUSE_ALLOCATE: + case OMP_CLAUSE_FROM: + case OMP_CLAUSE_TO: + case OMP_CLAUSE_MAP: + case OMP_CLAUSE__CACHE_: + WALK_SUBTREE (OMP_CLAUSE_DECL (*tp)); + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + + case OMP_CLAUSE_REDUCTION: + { + for (int i = 0; i < 6; i++) + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + } + + case OMP_CLAUSE_TASK_REDUCTION: + case 
OMP_CLAUSE_IN_REDUCTION: + { + for (int i = 0; i < 5; i++) + WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, i)); + WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp)); + } + + default: + gcc_unreachable (); + } break; case TARGET_EXPR: diff --git a/gcc/tree.h b/gcc/tree.h index 15e5147f40b0..5ee1c33f4e15 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1746,6 +1746,8 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_REDUCTION_DECL_PLACEHOLDER(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (NODE, OMP_CLAUSE_REDUCTION, \ OMP_CLAUSE_IN_REDUCTION), 4) +#define OMP_CLAUSE_REDUCTION_PRIVATE_DECL(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_REDUCTION), 5) /* True if a REDUCTION clause may reference the original list item (omp_orig) in its OMP_CLAUSE_REDUCTION_{,GIMPLE_}INIT. */ From patchwork Wed Dec 15 15:54:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48975 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A08493857C45 for ; Wed, 15 Dec 2021 16:16:42 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id BC5103858412 for ; Wed, 15 Dec 2021 15:57:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BC5103858412 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: ljcwdh0vtvHKgOaM8S+GlTznQ+buPKAsVtkkCLppYsSKudZmUV6RgxJi0xMdAwNBMNshiF2xSu Jq3Ubcg0LqhO8p22Xe1xdO9AvcKYXXZuC0skj9wIX4QVBf/lZHrVyzWVP0cSCjYBL/9+ZZ+h3l HL6hKsXWQgU1SHbsbvB4szU6nzHfLOR4+VgVs9EkvS5L2aHjjF1t8xC5EA3AHvpqQHcXGGhTKT 
gbeKAqky79KFYXeQ4SgMEEXvwpBmyVhebtz71IkqtZnd89m4FSRMWnHtrhWFYWao8/0pHsVEyH hw/vYCsxKEOOc/0DwCHRm9yR X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584645" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:09 -0800 IronPort-SDR: xlusxeUb+AvzQc1MHeptw08wN8zTJPXcQax7CWPxlzk0XFAD7BMNhhPR1DVHLJcUd7g93TJ5JC BjnbOfChBltDzku0Jq1VHrXDeDcRVFj9aVLvLeEQl85En0sRHJmcGnPWbflrnOTvsHiJdZ2Mt3 z36qrnuGTsBBXLjfcLdw90sdoo+rv1LilODuQh18oS++/3qRZVf6/ee+VIgl153Bt3gautd5/5 O1ASnUSCJ+EUXAjb+qdMTxD9N7ACw+IXTRJf+37md/yuW3wjzMsrUh1m8W3E4DC3TYvqfmbVr+ tlE= From: Frederik Harwath To: Subject: [PATCH 33/40] Fix tree check failure with reduction localization Date: Wed, 15 Dec 2021 16:54:40 +0100 Message-ID: <20211215155447.19379-34-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julian Brown , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Julian Brown gcc/ * gimplify.c (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY. 
--- gcc/gimplify.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- 2.33.0 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 9a4331c70d6e..04ffbc256442 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -13753,8 +13753,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) /* FIXME: Reductions are not supported in kernels regions yet. */ if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL) - localize_reductions (OMP_TARGET_CLAUSES (*expr_p), - OMP_TARGET_BODY (*expr_p)); + localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr)); gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body); if (gimple_code (g) == GIMPLE_BIND) From patchwork Wed Dec 15 15:54:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48976 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EDE953857C50 for ; Wed, 15 Dec 2021 16:17:17 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 32583385843B for ; Wed, 15 Dec 2021 15:57:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 32583385843B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: MO8RkHKGErXUv/EwHuFC75t9Hlqbrq/9/knsGCtJmzioMPTFIAW4g7FgmSq1AQY1DCvsx9sXn4
AHc09YbprmSDSbw3Y0jgz4zJ9Iashm9lRIdqa3KR412/nDbiCe/1F9kZMljvTRisy61IcIoARw QO7DzmBGgIZdl5P9e9Jm7a8jC3MGV8avudRJYYC639Ss7w3EtPIP8RIjVZavMeFdf+GxzCslXZ 6fH14J7N3l69tBnl2xSnvO+gNnEG9NjQapf/aJKq9KqTHmqZ2+vO3D/oP9BhrsKrSvJMoHNN+A FGLqhjnIC3rBLz+BAa+A+KtJ X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584648" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:14 -0800 IronPort-SDR: zzmq+STVrRARhYLkdoeyOlZPZvxxa+0WKwBRJ/xm4VkzUSx0Q82mMnc6MuGpfnptOqce0pjX3o faA0XV3avienhoSlDy+Sknqe5hBgsiNWH9zZ9WeAbJ6JuUlGjQzoPNgcnsQHGXDCrWSEaiWqSd KNN0rHucl40CZBBWM3DsvNNcDSOnv04MMh5MLhax4RMfEeaTnUEaYfvPwzOM8kZUkkLtOn3xJj WuZW0UU1bpGGbHBDGKJoRlMkRpOpdi/aQgHzl0DdskPPVzmEOrcHXgh44DYBQ4dCJclUQk/SFg Mbc= From: Frederik Harwath To: Subject: [PATCH 34/40] Use more appropriate var in localize_reductions call Date: Wed, 15 Dec 2021 16:54:41 +0100 Message-ID: <20211215155447.19379-35-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julian Brown , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Julian Brown gcc/ * gimplify.c (gimplify_omp_for): Use for_stmt in call to 
localize_reductions. --- gcc/gimplify.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.33.0 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 04ffbc256442..daa69ccf6202 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -12219,7 +12219,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) here, because the code to remove reductions in kernels regions cannot handle that. */ if (outer && outer->region_type == ORT_ACC_PARALLEL) - localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p)); + localize_reductions (OMP_FOR_CLAUSES (for_stmt), + OMP_FOR_BODY (for_stmt)); } /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear From patchwork Wed Dec 15 15:54:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48977 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4DCBB385843D for ; Wed, 15 Dec 2021 16:17:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 5737D3858412 for ; Wed, 15 Dec 2021 15:57:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5737D3858412 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: JUKLvXHAKihT1hhwvBX33mjMw0cfIQl7goEbDEBD4E8v0MYX151SVb7Kn4AFg6MtLjPxipzzu7
8kMwY6EaGbD+gRrnoAELnp83Plo83HxI+DSLi5z3meCP7h8p2fQYIEoZ2wwIN9oSKZOgt6weOR T/dzVot77ygg9+LfzGszZRqp+eYW6GpQ5ydEGNfXrgVn9Mzth24YuDDM3ha9B2HmCQQWLxc+xt ISyZNKI57UIGqvcV+eDvpVCDjXNioa71Oc0sp7qH1R6kY/9GNWCSwdV/fcDYg96h8ihXa6MNtl GWl3CU+oT0OqTff4BhxhpYuI X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584649" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:14 -0800 IronPort-SDR: AUYoQDBehf9lRYP4rCMpKuOVxQ5DNAaXUhPVo6Y9WKO5r59px+MqM7k0gGWRdgFkLhbNjPegiv IIG0tQUWjBZwYMt7XbvQm2aJyZVeK7Q8iErbEBs6HOqcT+iT/fxUPQ8Fe2tSDJjGJuSw3+UgCN Do/rUyBVMWBhCfJsGxveUmT695zLUhFBONA189951zHr9aG5TgsRdSCrl9G/Fah6dAXz3pDV0B /c1wVWp6YJcEGrKS1V+kcA5y9iT3xmgcZTbSbZk3CeNFMIUuxVSVqfNWZty5v4L4U/ZvlzxDFg hp0= From: Frederik Harwath To: Subject: [PATCH 35/40] Handle references in OpenACC "private" clauses Date: Wed, 15 Dec 2021 16:54:42 +0100 Message-ID: <20211215155447.19379-36-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Julian Brown , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Julian Brown gcc/ * gimplify.c (localize_reductions): Rewrite references for OMP_CLAUSE_PRIVATE 
also. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test. * testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test. --- gcc/gimplify.c | 15 ++++ .../libgomp.oacc-c++/privatized-ref-2.C | 64 +++++++++++++++++ .../libgomp.oacc-c++/privatized-ref-3.C | 64 +++++++++++++++++ .../libgomp.oacc-fortran/privatized-ref-1.f95 | 71 +++++++++++++++++++ 4 files changed, 214 insertions(+) create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 -- 2.33.0 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index daa69ccf6202..bf37388f947c 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -11976,6 +11976,21 @@ localize_reductions (tree clauses, tree body) OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var; } + else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE) + { + var = OMP_CLAUSE_DECL (c); + + if (!lang_hooks.decls.omp_privatize_by_reference (var)) + continue; + + type = TREE_TYPE (TREE_TYPE (var)); + new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var))); + + pr.ref_var = var; + pr.local_var = new_var; + + walk_tree (&body, localize_reductions_r, &pr, NULL); + } } diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C new file mode 100644 index 000000000000..3884f163132c --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C @@ -0,0 +1,64 @@ +/* { dg-do run } */ + +#include + +void workers (void) +{ + double res[65536]; + int i; + +#pragma acc parallel copyout(res) num_gangs(64) 
num_workers(64) + { + int i, j; +#pragma acc loop gang + for (i = 0; i < 256; i++) + { +#pragma acc loop worker + for (j = 0; j < 256; j++) + { + int tmpvar; + int &tmpref = tmpvar; + tmpref = (i * 256 + j) * 99; + res[i * 256 + j] = tmpref; + } + } + } + + for (i = 0; i < 65536; i++) + if (res[i] != i * 99) + abort (); +} + +void vectors (void) +{ + double res[65536]; + int i; + +#pragma acc parallel copyout(res) num_gangs(64) num_workers(64) + { + int i, j; +#pragma acc loop gang worker + for (i = 0; i < 256; i++) + { +#pragma acc loop vector + for (j = 0; j < 256; j++) + { + int tmpvar; + int &tmpref = tmpvar; + tmpref = (i * 256 + j) * 101; + res[i * 256 + j] = tmpref; + } + } + } + + for (i = 0; i < 65536; i++) + if (res[i] != i * 101) + abort (); +} + +int main (int argc, char *argv[]) +{ + workers (); + vectors (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C new file mode 100644 index 000000000000..c1a10cba31b3 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C @@ -0,0 +1,64 @@ +/* { dg-do run } */ + +#include + +void workers (void) +{ + double res[65536]; + int i; + +#pragma acc parallel copyout(res) num_gangs(64) num_workers(64) + { + int i, j; + int tmpvar; + int &tmpref = tmpvar; +#pragma acc loop gang + for (i = 0; i < 256; i++) + { +#pragma acc loop worker private(tmpref) + for (j = 0; j < 256; j++) + { + tmpref = (i * 256 + j) * 99; + res[i * 256 + j] = tmpref; + } + } + } + + for (i = 0; i < 65536; i++) + if (res[i] != i * 99) + abort (); +} + +void vectors (void) +{ + double res[65536]; + int i; + +#pragma acc parallel copyout(res) num_gangs(64) num_workers(64) + { + int i, j; + int tmpvar; + int &tmpref = tmpvar; +#pragma acc loop gang worker + for (i = 0; i < 256; i++) + { +#pragma acc loop vector private(tmpref) + for (j = 0; j < 256; j++) + { + tmpref = (i * 256 + j) * 101; + res[i * 256 + j] = tmpref; + } + } + } + + for (i = 0; 
i < 65536; i++) + if (res[i] != i * 101) + abort (); +} + +int main (int argc, char *argv[]) +{ + workers (); + vectors (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 new file mode 100644 index 000000000000..fe1520a8078c --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 @@ -0,0 +1,71 @@ +! { dg-do run } + +program main + implicit none + integer :: myint + integer :: i + real :: res(65536), tmp + + res(:) = 0.0 + + myint = 5 + call workers(myint, res) + + do i=1,65536 + tmp = i * 99 + if (res(i) .ne. tmp) stop 1 + end do + + res(:) = 0.0 + + myint = 7 + call vectors(myint, res) + + do i=1,65536 + tmp = i * 101 + if (res(i) .ne. tmp) stop 2 + end do + +contains + + subroutine workers(t1, res) + implicit none + integer :: t1 + integer :: i, j + real, intent(out) :: res(:) + + !$acc parallel copyout(res) num_gangs(64) num_workers(64) ! { dg-warning "using num_workers \\(32\\), ignoring 64" "" { target openacc_nvidia_accel_selected } } + + !$acc loop gang + do i=0,255 + !$acc loop worker private(t1) + do j=1,256 + t1 = (i * 256 + j) * 99 + res(i * 256 + j) = t1 + end do + end do + + !$acc end parallel + end subroutine workers + + subroutine vectors(t1, res) + implicit none + integer :: t1 + integer :: i, j + real, intent(out) :: res(:) + + !$acc parallel copyout(res) num_gangs(64) num_workers(64) ! 
{ dg-warning "using num_workers \\(32\\), ignoring 64" "" { target openacc_nvidia_accel_selected } } + + !$acc loop gang worker + do i=0,255 + !$acc loop vector private(t1) + do j=1,256 + t1 = (i * 256 + j) * 101 + res(i * 256 + j) = t1 + end do + end do + + !$acc end parallel + end subroutine vectors + +end program main From patchwork Wed Dec 15 15:54:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48978 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A0F63857C4B for ; Wed, 15 Dec 2021 16:18:16 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 36612385802B for ; Wed, 15 Dec 2021 15:57:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 36612385802B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 3ak5CTWWK+DJilwSvmt4Dnvv+//p2aj72RPNNNxvXAHGjGu69Nw+CjUOkUK6AQG9I3puiicVaD gyKrybtMdvwhaWRPA+tThTh6/ln1ebDs5ejAfvJE3Ej3OdcDLeSqWclVh22d0MjNTU2lc0CpuN ifyNM1Vklm7xxSu1hhxcKT2zm0Y9sh+i6pZDWJzk1WcuRur1pxR25jcSDjhq8yNcPRcIp+sWUG VwZEHY3A5h2dU6z6sDySWItEu30Cv03wv/PXNzbEDxAgqQH4HYzO7dLNpNYKGYfA3NPtcwSiJE 0vKDUBSo12UFe5ssDl8C/lq8 X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584653" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:23 -0800 IronPort-SDR: 6Sddxl63vu+dZ6plm50qdWMJd3pGQUgd5OjL+PKGwzyC9B8JFCfsXU4P4lkADSeXpD6VM6HQxa Z5S97chOpRoeRSz9vs/TQN44SwEMk4BTPUyRiBeu5+5N8HjjTbRbar9pg5WmQZR+zlFsgqSvoE 
IZU3Ddy7nkK6ZIy/oI5uL98fJ8ILp7fg2JEbILpNwBjPKWABecibmc6U+ejxDLB/VxE+UGYTMT 2NeRz2ZrduiIkQeslLx/Ert1kNijN/7Z9U4bK6Icf2t9KtA+i5E28rgP1/SG5lyOmoSEOIQ9VT 4K4= From: Frederik Harwath To: Subject: [PATCH 36/40] openacc: Enable reduction variable localization for "kernels" Date: Wed, 15 Dec 2021 16:54:43 +0100 Message-ID: <20211215155447.19379-37-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-09.mgc.mentorg.com (139.181.222.9) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * gimplify.c (gimplify_omp_for): Enable localization on "kernels" regions. (gimplify_omp_workshare): Likewise. 
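
For orientation, here is a user-level sketch of the kind of code this change affects: a loop with an explicit reduction clause inside a "kernels" region. The function name sum_in_kernels is hypothetical and not part of the patch. The sketch assumes a build along the lines of -fopenacc --param=openacc-kernels=decompose, since the patch only enables localization for ORT_ACC_KERNELS when param_openacc_kernels is OPENACC_KERNELS_DECOMPOSE. Without -fopenacc the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cassert>

// Hypothetical example, not taken from the patch or the testsuite.
// Sums 0 .. n-1 inside an OpenACC "kernels" region.  The reduction
// clause on the inner loop is the construct that localize_reductions
// processes once it is enabled for "kernels" regions.
long sum_in_kernels (int n)
{
  long sum = 0;
#pragma acc kernels copy(sum)
  {
#pragma acc loop reduction(+:sum)
    for (int i = 0; i < n; i++)
      sum += i;
  }
  return sum;
}
```

Calling sum_in_kernels (100) should give the same result with and without offloading, which makes it a convenient check when comparing the two "kernels" handling modes.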
---
 gcc/gimplify.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--
2.33.0

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index bf37388f947c..a0137089496b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12229,11 +12229,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
              && outer->region_type != ORT_ACC_KERNELS)
        outer = outer->outer_context;
 
-      /* FIXME: Reductions only work in parallel regions at present.  We avoid
-         doing the reduction localization transformation in kernels regions
-         here, because the code to remove reductions in kernels regions cannot
-         handle that.  */
-      if (outer && outer->region_type == ORT_ACC_PARALLEL)
+      if (outer && (outer->region_type == ORT_ACC_PARALLEL
+                    || (outer->region_type == ORT_ACC_KERNELS
+                        && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)))
        localize_reductions (OMP_FOR_CLAUSES (for_stmt),
                             OMP_FOR_BODY (for_stmt));
     }
@@ -13767,8 +13765,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     {
       push_gimplify_context ();
 
-      /* FIXME: Reductions are not supported in kernels regions yet.
*/ - if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL) + if (ort == ORT_ACC_PARALLEL + || (ort == ORT_ACC_KERNELS + && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)) localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr)); gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body); From patchwork Wed Dec 15 15:54:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48979 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9912C3857C4B for ; Wed, 15 Dec 2021 16:18:45 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 7FF2A3858420 for ; Wed, 15 Dec 2021 15:57:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7FF2A3858420 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: 9Eh0D7v/9JzNQug90Znhsvuoi2pK0r/jCk04MxjWoUY3bXcZ9fx2CHsk3Wq5hwXdaZDl22ZD6I w0ltOprbnT4Awk20VcSQzAnnrIRfemfDbvst5M7vQ1wW0IlETeokKUi639h16htDRP9Iu/RH1v CayaCptvJ8ayfyFQrHgPNPRkwe8ss8kACXKhrHWaGBDYlHuU2UrirfCtMlr5j+kKJ4Sj5oZy1A Ndw/wtYMVB1LGSGDgf74Z9ttturuPxhd8R8O+RxD8LcUtlZZ/xAUs2aQ+MmlODTGqLmysBvEH8 FKDpAv8UUir9Ii9GjZh8e2Sc X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584655" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:25 -0800 IronPort-SDR: CE7Q0vcU39ZxLn3BrYMMdkn+eGTdSPm+7C3ncMi6Pt1t3578DZezQZko6uhaetcx/KZGQOkTlg grGUFXFGUeXI+XQCS6MDvIKsdMM811Or/CT15A1ZRQnmo1wF81hqIIUyKMr2qkcl3JzuQmNTHa 
ab1GJIIzkETNzRMKVdRVjW2ujCOM0jEDONCcHI/BHLUfnv7viaKIRt7ZOFNOuc+1615nBzFxFX XGL7yymYwt06SEMJMkCHpTL5EzwzQbIhAa230I3M73WEW16yW9gPl3Fh5Qyy502e5hv15+Qp4c ruE= From: Frederik Harwath To: Subject: [PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels' Date: Wed, 15 Dec 2021 16:54:44 +0100 Message-ID: <20211215155447.19379-38-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-09.mgc.mentorg.com (139.181.222.9) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tobias Burnus , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Tobias Burnus Nearly all variable mapping is moved from 'kernels' to a surrounding 'data kernels' and then 'force_present' mapped for the 'kernels'. However, as libgomp.oacc-c-c++-common/declare-vla.c shows, moving 'int i, N' will fail as there is a special case for is_gimple_reg in mapping and that fails badly if outside a target region (e.g. offloading = false). As those are transferred by value and not as a pointer, it makes more sense to only map them at 'kernels' and ignore them for 'data kernels'. Additionally, as e.g. 
libgomp.oacc-c-c++-common/kernels-decompose-1.c shows, one still has to
handle 'kernels'-declared variables, which are now declared in
'kernels data' and can be handled as is_gimple_reg.

gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	is_gimple_reg vars are not yet mapped, fall through to map them as
	before the transformation.
	(omp_oacc_kernels_decompose_1): Don't map is_gimple_reg vars.
	(decompose_kernels_region_body): Use tofrom for is_gimple_reg vars.
	(omp_oacc_kernels_decompose_1): Handle is_gimple_reg vars as without
	data kernels.

gcc/testsuite/
	* gfortran.dg/goacc/declare-3.f95: Update scan-tree-dump-times.
---
 gcc/omp-oacc-kernels-decompose.cc             | 9 +++++++--
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

--
2.33.0

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c96207d96250..a6be1f1ed238 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -873,7 +873,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
          else
            inner_bind_vars = next;
        }
-      else
+      else if (!is_gimple_reg (v))
        {
          /* Otherwise, build the map clause.  */
          tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
@@ -1222,7 +1222,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
        {
          tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
-         OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+         OMP_CLAUSE_SET_MAP_KIND (present_clause,
+                                  is_gimple_reg (var)
+                                  ?
GOMP_MAP_TOFROM : GOMP_MAP_FORCE_PRESENT);
          OMP_CLAUSE_DECL (present_clause) = var;
          OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
          OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
@@ -1437,6 +1439,9 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
                 region causes runtime errors.  */
              break;
+           if (is_gimple_reg (decl))
+             break;
+
            /* For non-artificial variables, and for non-declaration
               expressions like A[0:n], copy the clause to the data
               region.  */
diff --git a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95 b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
index 9127cba6600d..2a1fe0a68465 100644
--- a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
@@ -39,7 +39,7 @@ program test
   use mod_d
   use mod_e
-  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(force_to:b\) map\(force_alloc:a\)$} original } }
+  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(to:b\) map\(alloc:a\)$} original } }
 end program test
 !
{ dg-final { scan-tree-dump-times {#pragma acc data} 1 original } } From patchwork Wed Dec 15 15:54:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 48980 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DA8E1385840B for ; Wed, 15 Dec 2021 16:19:14 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id DF6A83857C48 for ; Wed, 15 Dec 2021 15:57:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DF6A83857C48 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: +vaZjWmXAtWHvbJIX2B9I2Vi01xHP9X+U9hDUJAhrukqVCXMlAjbN54lsazu25Wo1WEzo8GqKz HJMqw1T4gvq4YBUF3lMpKPoSMb7tiKep38D4NKoMCp7/vEKEIaKY5b2bcgpQVnMFgJTRdAEnu/ JMVpw3fvn667N3fRbw3prdwz8MCTYbMBNsk6IYEwHUDmQIYeH6LVFJ4+yqAj+wARRaIgRT+4HZ z4MS3avDT9t0BFpCl5NPIseQifWhS1bB5+F7SCRO9IEpBBXzj/GxxEXsCdm7JREfKr3TrDCffo BHW6Vv7z/pQIIbLOx0nsxZrC X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584657" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:28 -0800 IronPort-SDR: x4QSE3kaDv8+t4xAJdAudtSK+026LeXz5WJMIN+xd/WONMxjPfb4LhXz8GG0N5PW9179OZcyCZ 7pipuULen2Hkc7x0+6jBuHbDhiNTvh9R+E6syBNew9o6zjE3FjX70qgwUU/2l8Rm+0bk5O3qGI o6eqJ7EEfS0YbFr+P2jfJcZmI2dRzIHtdjBbnVpFicju8Y1OQISOB0vqOo/iYofBtBvqalaTA7 yi+NasH2NGxBp3aBV3ezdAJZTBkHD1npYYbCHjx0Upffc/r11Sn4yrtQi6fijg5OlBWLOO+VFx gVI= From: Frederik Harwath To: Subject: [PATCH 38/40] openacc: fix privatization of by-reference arrays Date: Wed, 15 Dec 2021 16:54:45 
+0100 Message-ID: <20211215155447.19379-39-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-09.mgc.mentorg.com (139.181.222.9) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tobias Burnus , thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: Tobias Burnus Replacing of a by-reference variable in a private clause by a local variable makes sense; however, for arrays, the size is not directly known by the type. This causes an ICE via create_tmp_var which indirectly invokes force_constant_size in this case - but the latter only handled Ada. gcc/ChangeLog: * gimplify.c (localize_reductions): Do not create local variable for privatized arrays. 
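
To make the scalar case concrete, here is a user-level sketch of what replacing a by-reference variable with a local variable amounts to. It is an illustration only: the function names fill_by_reference and fill_localized are hypothetical, and the second function is a hand-written approximation of the idea, not the code GCC actually generates. For a reference to an array the patch skips this replacement, because the size of the local copy cannot be taken from the referenced type. Without -fopenacc the pragmas are ignored and both loops run serially.

```cpp
#include <cassert>

// Hypothetical illustration: what the user writes.  A reference is
// named in a "private" clause, so every access in the loop body goes
// through the reference.
void fill_by_reference (double *res, int n)
{
  int tmpvar = 0;
  int &tmpref = tmpvar;
#pragma acc parallel loop private(tmpref) copyout(res[0:n])
  for (int i = 0; i < n; i++)
    {
      tmpref = i * 99;
      res[i] = tmpref;
    }
}

// Hand-written approximation of the localized form: the accesses use a
// plain local of the referenced type (int) instead of the reference.
void fill_localized (double *res, int n)
{
#pragma acc parallel loop copyout(res[0:n])
  for (int i = 0; i < n; i++)
    {
      int tmp_local = i * 99;
      res[i] = tmp_local;
    }
}
```

Both functions are expected to fill res with the same values; the difference is only in how the temporary is accessed inside the loop body.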
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.33.0

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a0137089496b..952bc449a7db 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11982,8 +11982,9 @@ localize_reductions (tree clauses, tree body)
 
          if (!lang_hooks.decls.omp_privatize_by_reference (var))
            continue;
-
          type = TREE_TYPE (TREE_TYPE (var));
+         if (TREE_CODE (type) == ARRAY_TYPE)
+           continue;
          new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
 
          pr.ref_var = var;

From patchwork Wed Dec 15 15:54:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 48981
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 24E6F3858039 for ; Wed, 15 Dec 2021 16:19:50 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 3448D3857C71 for ; Wed, 15 Dec 2021 15:57:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3448D3857C71
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 2PjFo07jSKfgsjy6ivcMX7rSbS9rCcnVq0YlChVH+zBhOxL55dg0jUO+Rlh7qggqYh3JZu60qQ 0EDVRX84s6/nGx7i6iWdayJ3WyJ5yF5Zt4d2nZYACOXRogkpbArZrrTIGEyX3+GgcAuoXew8TY qfXKNfyCUliVM7K8PBXcQo863warYdMwU64TJb5KC1g2Udebopu5+Enqytgf37TnjOOXXMJ2iQ
+0BqVMPpKE30/TYa9Stjcy/tiSbfEQena0uynXKeudssVayywnaYLGvr7R9WP/nOb4Xx8EaD4Q qOyzda4s6DE3tQ/k3wwWkoul X-IronPort-AV: E=Sophos;i="5.88,207,1635235200"; d="scan'208";a="69584658" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 15 Dec 2021 07:57:30 -0800 IronPort-SDR: dmDL4DxklDNEh3F+b5INb3R9AhEskCYwYRv++6Z3tbu9D7RmxF+As+2oy5ZnFUfpbt+9ORBsgI g8sHIj+qQ3k+7UpaC4gVbC3YExSS67sF3ZMFQqsl6i30Zy/zOdN1/eoLftBC6j2dqLet1mruf8 zKEjkN3yyJcVom7NezXmmuBeCV46oHleYxm8GRntcueQWIdO9I+Ti/VqXiKvmP+66Qa/JE1Xfa 5U/ZpCtPO6xO9VnKFD+ElPsRK7ulXdY2790IY0laV/EYjHc+c9AkPf1AltuyAB6nKthDcTzERN BAs= From: Frederik Harwath To: Subject: [PATCH 39/40] openacc: Check type for references in reduction lowering Date: Wed, 15 Dec 2021 16:54:46 +0100 Message-ID: <20211215155447.19379-40-frederik@codesourcery.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211215155447.19379-1-frederik@codesourcery.com> References: <20211215155447.19379-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-09.mgc.mentorg.com (139.181.222.9) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: thomas@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/ChangeLog: * omp-low.c (lower_oacc_reductions): Only create a reference if variable has pointer type. 
---
 gcc/omp-low.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--
2.33.0

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ae5cdfc5e260..2b8b848ec03a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7639,9 +7639,10 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
        if (omp_privatize_by_reference (orig))
          {
-           outgoing = build_simple_mem_ref (outgoing);
+           if (POINTER_TYPE_P (TREE_TYPE (outgoing)))
+             outgoing = build_simple_mem_ref (outgoing);
 
-           if (!TREE_CONSTANT (incoming))
+           if (POINTER_TYPE_P (TREE_TYPE (incoming)))
              incoming = build_simple_mem_ref (incoming);
          }