From patchwork Fri Mar 24 15:30:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Frederik Harwath X-Patchwork-Id: 66862 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 005123888833 for ; Fri, 24 Mar 2023 15:52:30 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 4641A3858C50; Fri, 24 Mar 2023 15:51:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4641A3858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,288,1673942400"; d="scan'208";a="274531" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 24 Mar 2023 07:31:21 -0800 IronPort-SDR: 7JtpxPoxVWZGGnlJ8/XVAKNkjDX4KRJWys6wszrpD70XOKGncTINR1M48i5Jbyk+lhEqnH8mlQ wbXWV0qe5KnSXzffU2A5yyffn43EgGqBOqD+RGC7z4SBwcT37rx9OgiWcjSN9ILenMzjD6wEWj WIlFm3lvg98muRN6FL+QPR1yt5cELEG1hz139gmj5ThhJUUF3NRH7rdOZdUxIOnWgMsmu7BBuv 1VTTUk+3mJq3NfWOuvpqlZno1SENZ06sUphyzi+PWjd5ZDNJNcVUAY/SGcLuJVqWzRW+qfg3bl CvQ= From: Frederik Harwath To: , , , Subject: [PATCH 4/7] openmp: Add Fortran support for "omp tile" Date: Fri, 24 Mar 2023 16:30:42 +0100 Message-ID: <20230324153046.3996092-5-frederik@codesourcery.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20230324153046.3996092-1-frederik@codesourcery.com> References: <20230324153046.3996092-1-frederik@codesourcery.com> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This commit implements the Fortran front end support for the "omp tile" directive and the corresponding middle end transformation. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE. (enum gfc_exec_op): Add EXEC_OMP_TILE. (loop_transform_p): New declaration. (struct gfc_omp_clauses): Add "tile_sizes" field. * dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping. (show_omp_node): Handle EXEC_OMP_TILE. (show_code_node): Likewise. * match.h (gfc_match_omp_tile): New declaration. * openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field. (match_tile_sizes): New function. (OMP_TILE_CLAUSES): New macro. (gfc_match_omp_tile): New function. (resolve_omp_do): Handle EXEC_OMP_TILE. (resolve_omp_tile): New function. (omp_code_to_statement): Handle EXEC_OMP_TILE. (gfc_resolve_omp_directive): Likewise. * parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE and ST_OMP_TILE. (next_statement): Handle ST_OMP_TILE. (gfc_ascii_statement): Likewise. (parse_omp_do): Likewise. (parse_executable): Likewise. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE. (gfc_resolve_code): Likewise. * st.cc (gfc_free_statement): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field. (loop_transform_p): New function. (gfc_expr_list_len): New function. (gfc_trans_omp_do): Handle EXEC_OMP_TILE. (gfc_trans_omp_directive): Likewise. * trans.cc (trans_code): Likewise. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE. (gimplify_adjust_omp_clauses): Likewise. (gimplify_omp_loop): Likewise. * omp-transform-loops.cc (walk_omp_for_loops): New declaration. (subst_var_in_op): New function. (subst_var): New function. (gomp_for_number_of_iterations): Adjust. (gomp_for_iter_count_type): New function. (gimple_assign_rhs_to_tree): New function. (subst_defs): New function. (gomp_for_uncollapse): Adjust. (transformation_clause_p): Add OMP_CLAUSE_TILE. (tile): New function. (transform_gomp_for): Handle OMP_CLAUSE_TILE. (optimize_transformation_clauses): Handle OMP_CLAUSE_TILE. * omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE. * tree.cc: Add OMP_CLAUSE_TILE. * tree.h (OMP_CLAUSE_TILE_SIZES): New macro. libgomp/ChangeLog: * testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/loop-transforms/tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-2.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-3.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-4.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test. --- gcc/fortran/dump-parse-tree.cc | 17 +- gcc/fortran/gfortran.h | 7 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 373 +++++++++++++----- gcc/fortran/parse.cc | 15 + gcc/fortran/resolve.cc | 3 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-openmp.cc | 86 ++-- gcc/fortran/trans.cc | 1 + gcc/gimplify.cc | 3 + gcc/omp-general.cc | 2 +- gcc/omp-transform-loops.cc | 340 +++++++++++++++- .../gomp/loop-transforms/tile-1.f90 | 163 ++++++++ .../gomp/loop-transforms/tile-1a.f90 | 10 + .../gomp/loop-transforms/tile-2.f90 | 80 ++++ .../gomp/loop-transforms/tile-3.f90 | 18 + .../gomp/loop-transforms/tile-4.f90 | 95 +++++ .../gomp/loop-transforms/tile-unroll-1.f90 | 57 +++ .../gomp/loop-transforms/unroll-tile-1.f90 | 37 ++ .../gomp/loop-transforms/unroll-tile-2.f90 | 41 ++ gcc/tree-core.h | 3 + gcc/tree-pretty-print.cc | 8 + gcc/tree.cc | 7 +- gcc/tree.h | 3 + .../loop-transforms/unroll-full-tile.C | 84 ++++ .../loop-transforms/tile-1.f90 | 71 ++++ .../loop-transforms/tile-2.f90 | 117 ++++++ .../loop-transforms/tile-unroll-1.f90 | 112 ++++++ .../loop-transforms/tile-unroll-2.f90 | 71 ++++ .../loop-transforms/tile-unroll-3.f90 | 77 ++++ .../loop-transforms/tile-unroll-4.f90 | 75 ++++ .../loop-transforms/unroll-tile-1.f90 | 112 ++++++ .../loop-transforms/unroll-tile-2.f90 | 71 ++++ 33 files changed, 2042 insertions(+), 119 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 -- 2.36.1 ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index e069aca1f1d..82183285954 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2062,6 +2062,18 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) if (omp_clauses->unroll_partial_factor > 0) fprintf (dumpfile, "(%u)", omp_clauses->unroll_partial_factor); } + if (omp_clauses->tile_sizes) + { + gfc_expr_list *sizes; + fputs (" TILE SIZES(", dumpfile); + for (sizes = omp_clauses->tile_sizes; sizes; sizes = sizes->next) + { + show_expr (sizes->expr); + if (sizes->next) + fputs (", ", dumpfile); + } + fputc (')', dumpfile); + } } /* Show a single OpenMP or OpenACC directive node and everything underneath it @@ -2172,6 +2184,7 @@ show_omp_node (int level, gfc_code *c) name = "TEAMS DISTRIBUTE PARALLEL DO SIMD"; break; case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: name = "TEAMS DISTRIBUTE SIMD"; break; case EXEC_OMP_TEAMS_LOOP: name = "TEAMS LOOP"; break; + case EXEC_OMP_TILE: name = "TILE"; break; case EXEC_OMP_UNROLL: name = "UNROLL"; break; case EXEC_OMP_WORKSHARE: name = "WORKSHARE"; break; default: @@ -2249,6 +2262,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: omp_clauses = c->ext.omp_clauses; @@ -2311,7 +2325,7 @@ show_omp_node (int level, gfc_code *c) d = d->block; } } - else if (c->op == EXEC_OMP_UNROLL) + else if (c->op == EXEC_OMP_UNROLL || c->op == EXEC_OMP_TILE) show_code (level + 1, c->block != NULL ? c->block->next : c->next); else show_code (level + 1, c->block->next); @@ -3491,6 +3505,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: show_omp_node (level, c); diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 5ef4a8907b0..8b4eadf9b4d 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -320,7 +320,8 @@ enum gfc_statement ST_OMP_ERROR, ST_OMP_ASSUME, ST_OMP_END_ASSUME, ST_OMP_ASSUMES, /* Note: gfc_match_omp_nothing returns ST_NONE. */ ST_OMP_NOTHING, ST_NONE, - ST_OMP_UNROLL, ST_OMP_END_UNROLL + ST_OMP_UNROLL, ST_OMP_END_UNROLL, + ST_OMP_TILE, ST_OMP_END_TILE }; /* Types of interfaces that we can have. Assignment interfaces are @@ -1550,6 +1551,7 @@ typedef struct gfc_omp_clauses struct gfc_expr *dist_chunk_size; struct gfc_expr *message; struct gfc_omp_assumptions *assume; + struct gfc_expr_list *tile_sizes; const char *critical_name; enum gfc_omp_default_sharing default_sharing; enum gfc_omp_atomic_op atomic_op; @@ -2977,7 +2979,7 @@ enum gfc_exec_op EXEC_OMP_TARGET_TEAMS_LOOP, EXEC_OMP_MASKED, EXEC_OMP_PARALLEL_MASKED, EXEC_OMP_PARALLEL_MASKED_TASKLOOP, EXEC_OMP_PARALLEL_MASKED_TASKLOOP_SIMD, EXEC_OMP_MASKED_TASKLOOP, EXEC_OMP_MASKED_TASKLOOP_SIMD, EXEC_OMP_SCOPE, - EXEC_OMP_UNROLL, + EXEC_OMP_UNROLL, EXEC_OMP_TILE, EXEC_OMP_ERROR }; @@ -3874,6 +3876,7 @@ bool gfc_inline_intrinsic_function_p (gfc_expr *); /* trans-openmp.cc */ bool loop_transform_p (gfc_exec_op op); +int gfc_expr_list_len (gfc_expr_list *); /* bbt.cc */ typedef int (*compare_fn) (void *, void *); diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 5640c725f09..d04e1cd66a4 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -226,6 +226,7 @@ match gfc_match_omp_teams_distribute_parallel_do_simd (void); match gfc_match_omp_teams_distribute_simd (void); match gfc_match_omp_teams_loop (void); match gfc_match_omp_threadprivate (void); +match gfc_match_omp_tile (void); match gfc_match_omp_unroll (void); match gfc_match_omp_workshare (void); match gfc_match_omp_end_critical (void); diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index ec707d977cd..1de61029768 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -191,6 +191,7 @@ gfc_free_omp_clauses (gfc_omp_clauses *c) i == OMP_LIST_ALLOCATE); gfc_free_expr_list (c->wait_list); gfc_free_expr_list (c->tile_list); + gfc_free_expr_list (c->tile_sizes); free (CONST_CAST (char *, c->critical_name)); if (c->assume) { @@ -977,6 +978,76 @@ cleanup: return MATCH_ERROR; } +static match +match_tile_sizes (gfc_expr_list **list) +{ + gfc_expr_list *head, *tail, *p; + locus old_loc; + gfc_expr *expr; + match m; + + head = tail = NULL; + + old_loc = gfc_current_locus; + + m = gfc_match_char ('('); + if (m != MATCH_YES) + goto syntax; + + for (;;) + { + m = gfc_match_expr (&expr); + if (m == MATCH_YES) + { + p = gfc_get_expr_list (); + if (head == NULL) + head = tail = p; + else + { + tail->next = p; + tail = tail->next; + } + int size = 0; + if (m == MATCH_YES) + { + if (gfc_extract_int (expr, &size, 1)) + goto cleanup; + else if (size < 1) + { + gfc_error_now ("tile size not constant " + "positive integer at %C"); + goto cleanup; + } + tail->expr = expr; + } + goto next_item; + } + if (m == MATCH_ERROR) + goto cleanup; + goto syntax; + + next_item: + if (gfc_match_char (')') == MATCH_YES) + break; + if (gfc_match_char (',') != MATCH_YES) + goto syntax; + } + + while (*list) + list = &(*list)->next; + + *list = head; + return MATCH_YES; + +syntax: + gfc_error ("Syntax error in 'tile sizes' list at %C"); + +cleanup: + gfc_free_expr_list (head); + gfc_current_locus = old_loc; + return MATCH_ERROR; +} + /* OpenMP clauses. */ enum omp_mask1 { @@ -1054,6 +1125,7 @@ enum omp_mask2 OMP_CLAUSE_UNROLL_FULL, /* OpenMP 5.1. */ OMP_CLAUSE_UNROLL_NONE, /* OpenMP 5.1. */ OMP_CLAUSE_UNROLL_PARTIAL, /* OpenMP 5.1. */ + OMP_CLAUSE_TILE, /* OpenMP 5.1. */ OMP_CLAUSE_ASYNC, OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, @@ -4310,7 +4382,8 @@ cleanup: omp_mask (OMP_CLAUSE_NOWAIT) #define OMP_UNROLL_CLAUSES \ (omp_mask (OMP_CLAUSE_UNROLL_FULL) | OMP_CLAUSE_UNROLL_PARTIAL) - +#define OMP_TILE_CLAUSES \ + (omp_mask (OMP_CLAUSE_TILE)) static match match_omp (gfc_exec_op op, const omp_mask mask) @@ -6409,6 +6482,16 @@ gfc_match_omp_teams_distribute_simd (void) | OMP_SIMD_CLAUSES); } +match +gfc_match_omp_tile (void) +{ + gfc_omp_clauses *c = gfc_get_omp_clauses(); + new_st.op = EXEC_OMP_TILE; + new_st.ext.omp_clauses = c; + + return match_tile_sizes (&c->tile_sizes); +} + match gfc_match_omp_unroll (void) { @@ -9289,75 +9372,6 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause) } } - -static bool -omp_unroll_removes_loop_nest (gfc_code *code) -{ - gcc_assert (code->op == EXEC_OMP_UNROLL); - if (!code->ext.omp_clauses) - return true; - - if (code->ext.omp_clauses->unroll_none) - { - gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loop " - "into a non-loop", - &code->loc); - return true; - } - if (code->ext.omp_clauses->unroll_full) - { - gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into a " - "non-loop", - &code->loc); - return true; - } - return false; -} - -static void -resolve_loop_transform_generic (gfc_code *code, const char *descr) -{ - gcc_assert (code->block); - - if (code->block->op == EXEC_OMP_UNROLL - && !omp_unroll_removes_loop_nest (code->block)) - return; - - if (code->block->next->op == EXEC_OMP_UNROLL - && !omp_unroll_removes_loop_nest (code->block->next)) - return; - - if (code->block->next->op == EXEC_DO_WHILE) - { - gfc_error ("%s invalid around DO WHILE or DO without loop " - "control at %L", descr, &code->loc); - return; - } - if (code->block->next->op == EXEC_DO_CONCURRENT) - { - gfc_error ("%s invalid around DO CONCURRENT loop at %L", - descr, &code->loc); - return; - } - - gfc_error ("missing canonical loop nest after %s at %L", - descr, &code->loc); - -} - -static void -resolve_omp_unroll (gfc_code *code) -{ - if (!code->block || code->block->op == EXEC_DO) - return; - - if (code->block->next->op == EXEC_DO) - return; - - resolve_loop_transform_generic (code, "!$OMP UNROLL"); -} - - static void handle_local_var (gfc_symbol *sym) { @@ -9488,6 +9502,106 @@ bound_expr_is_canonical (gfc_code *code, int depth, gfc_expr *expr, return false; } +static bool +omp_unroll_removes_loop_nest (gfc_code *code) +{ + gcc_assert (code->op == EXEC_OMP_UNROLL); + if (!code->ext.omp_clauses) + return true; + + if (code->ext.omp_clauses->unroll_none) + { + gfc_warning (0, "!$OMP UNROLL without PARTIAL clause at %L turns loop " + "into a non-loop", + &code->loc); + return true; + } + if (code->ext.omp_clauses->unroll_full) + { + gfc_warning (0, "!$OMP UNROLL with FULL clause at %L turns loop into a " + "non-loop", + &code->loc); + return true; + } + return false; +} + +static gfc_code * +resolve_nested_loop_transforms (gfc_code *code, const char *name, + int required_depth, locus *loc) +{ + if (!code) + return code; + + bool error = false; + while (loop_transform_p (code->op)) + { + if (!error && code->op == EXEC_OMP_UNROLL) + { + if (omp_unroll_removes_loop_nest (code)) + { + gfc_error ("missing canonical loop nest after %s at %L", name, + loc); + error = true; + } + else if (required_depth > 1) + { + gfc_error ("loop nest depth after !$OMP UNROLL at %L is insufficient " + "for outer %s", &code->loc, name); + error = true; + } + } + else if (!error && code->op == EXEC_OMP_TILE + && required_depth > gfc_expr_list_len (code->ext.omp_clauses->tile_sizes)) + { + gfc_error ("loop nest depth after !$OMP TILE at %L is insufficient " + "for outer %s", &code->loc, name); + error = true; + } + + if (code->block) + code = code->block->next; + else + code = code->next; + } + gcc_assert (!loop_transform_p (code->op)); + + return code; +} + +static void +resolve_omp_unroll (gfc_code *code) +{ + const char *descr = "!$OMP UNROLL"; + locus *loc = &code->loc; + + if (!code->block || code->block->op == EXEC_DO) + return; + + code = resolve_nested_loop_transforms (code->block->next, descr, 1, + &code->loc); + + if (code->op == EXEC_DO) + return; + + if (code->op == EXEC_DO_WHILE) + { + gfc_error ("%s invalid around DO WHILE or DO without loop " + "control at %L", descr, loc); + return; + } + + if (code->op == EXEC_DO_CONCURRENT) + { + gfc_error ("%s invalid around DO CONCURRENT loop at %L", + descr, loc); + return; + } + + gfc_error ("missing canonical loop nest after %s at %L", + descr, loc); +} + static void resolve_omp_do (gfc_code *code) { @@ -9592,30 +9706,13 @@ resolve_omp_do (gfc_code *code) break; case EXEC_OMP_TEAMS_LOOP: name = "!$OMP TEAMS LOOP"; break; case EXEC_OMP_UNROLL: name = "!$OMP UNROLL"; break; + case EXEC_OMP_TILE: name = "!$OMP TILE"; break; default: gcc_unreachable (); } if (code->ext.omp_clauses) resolve_omp_clauses (code, code->ext.omp_clauses, NULL); - do_code = code->block->next; - /* Move forward over any loop transformation directives to find the loop. */ - bool error = false; - while (do_code->op == EXEC_OMP_UNROLL) - { - if (!error && omp_unroll_removes_loop_nest (do_code)) - { - gfc_error ("missing canonical loop nest after %s at %L", name, - &code->loc); - error = true; - } - if (do_code->block) - do_code = do_code->block->next; - else - do_code = do_code->next; - } - gcc_assert (do_code->op != EXEC_OMP_UNROLL); - if (code->ext.omp_clauses->orderedc) collapse = code->ext.omp_clauses->orderedc; else @@ -9630,6 +9727,9 @@ resolve_omp_do (gfc_code *code) depth and treats any further inner loops as the final-loop-body. So here we also check canonical loop nest form only for the number of outer loops specified by the COLLAPSE clause too. */ + do_code = resolve_nested_loop_transforms (code->block->next, name, collapse, + &code->loc); + for (i = 1; i <= collapse; i++) { gfc_symbol *start_var = NULL, *end_var = NULL; @@ -9745,6 +9845,98 @@ resolve_omp_do (gfc_code *code) } } +static void +resolve_omp_tile (gfc_code *code) +{ + gfc_code *do_code, *c; + gfc_symbol *dovar; + const char *name = "!$OMP TILE"; + + unsigned num_loops = 0; + gcc_assert (code->ext.omp_clauses->tile_sizes); + for (gfc_expr_list *el = code->ext.omp_clauses->tile_sizes; el; + el = el->next) + num_loops++; + + do_code = resolve_nested_loop_transforms (code, name, num_loops, &code->loc); + + for (unsigned i = 1; i <= num_loops; i++) + { + if (do_code->op == EXEC_DO_WHILE) + { + gfc_error ("%s cannot be a DO WHILE or DO without loop control " + "at %L", name, &do_code->loc); + break; + } + if (do_code->op == EXEC_DO_CONCURRENT) + { + gfc_error ("%s cannot be a DO CONCURRENT loop at %L", name, + &do_code->loc); + break; + } + if (do_code->op != EXEC_DO) + { + gfc_error ("%s must be DO loop at %L", name, + &do_code->loc); + break; + } + + gcc_assert (do_code->op != EXEC_OMP_UNROLL); + gcc_assert (do_code->op == EXEC_DO); + dovar = do_code->ext.iterator->var->symtree->n.sym; + if (i > 1) + { + gfc_code *do_code2 = code; + while (loop_transform_p (do_code2->op)) + { + if (do_code2->block) + do_code2 = do_code2->block->next; + else + do_code2 = do_code2->next; + } + gcc_assert (!loop_transform_p (do_code2->op)); + + for (unsigned j = 1; j < i; j++) + { + gfc_symbol *ivar = do_code2->ext.iterator->var->symtree->n.sym; + if (dovar == ivar + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->start) + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->end) + || gfc_find_sym_in_expr (ivar, do_code->ext.iterator->step)) + { + gfc_error ("%s loops don't form rectangular " + "iteration space at %L", name, &do_code->loc); + break; + } + do_code2 = do_code2->block->next; + } + } + for (c = do_code->next; c; c = c->next) + if (c->op != EXEC_NOP && c->op != EXEC_CONTINUE) + { + gfc_error ("%s loops not perfectly nested at %L", + name, &c->loc); + break; + } + if (i == num_loops || c) + break; + do_code = do_code->block; + if (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE) + { + gfc_error ("not enough DO loops for %s at %L", + name, &code->loc); + break; + } + do_code = do_code->next; + if (do_code == NULL + || (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE)) + { + gfc_error ("not enough DO loops for %s at %L", + name, &code->loc); + break; + } + } +} static gfc_statement omp_code_to_statement (gfc_code *code) @@ -9889,6 +10081,8 @@ omp_code_to_statement (gfc_code *code) return ST_OMP_PARALLEL_LOOP; case EXEC_OMP_DEPOBJ: return ST_OMP_DEPOBJ; + case EXEC_OMP_TILE: + return ST_OMP_TILE; case EXEC_OMP_UNROLL: return ST_OMP_UNROLL; default: @@ -10320,6 +10514,9 @@ gfc_resolve_omp_directive (gfc_code *code, gfc_namespace *ns) case EXEC_OMP_TEAMS_LOOP: resolve_omp_do (code); break; + case EXEC_OMP_TILE: + resolve_omp_tile (code); + break; case EXEC_OMP_UNROLL: resolve_omp_unroll (code); break; diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index 094678436b4..1cc5200f35a 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -1009,6 +1009,7 @@ decode_omp_directive (void) matcho ("end teams loop", gfc_match_omp_eos_error, ST_OMP_END_TEAMS_LOOP); matcho ("end teams", gfc_match_omp_eos_error, ST_OMP_END_TEAMS); matchs ("end unroll", gfc_match_omp_eos_error, ST_OMP_END_UNROLL); + matchs ("end tile", gfc_match_omp_eos_error, ST_OMP_END_TILE); matcho ("end workshare", gfc_match_omp_end_nowait, ST_OMP_END_WORKSHARE); break; @@ -1137,6 +1138,7 @@ decode_omp_directive (void) matcho ("teams", gfc_match_omp_teams, ST_OMP_TEAMS); matchdo ("threadprivate", gfc_match_omp_threadprivate, ST_OMP_THREADPRIVATE); + matchs ("tile sizes", gfc_match_omp_tile, ST_OMP_TILE); break; case 'u': matchs ("unroll", gfc_match_omp_unroll, ST_OMP_UNROLL); @@ -1729,6 +1731,7 @@ next_statement (void) case ST_OMP_TARGET_PARALLEL_LOOP: case ST_OMP_TARGET_TEAMS_LOOP: \ case ST_OMP_ASSUME: \ case ST_OMP_UNROLL: \ + case ST_OMP_TILE: \ case ST_CRITICAL: \ case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS: \ case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \ @@ -2774,6 +2777,9 @@ gfc_ascii_statement (gfc_statement st, bool strip_sentinel) case ST_OMP_THREADPRIVATE: p = "!$OMP THREADPRIVATE"; break; + case ST_OMP_TILE: + p = "!$OMP TILE"; + break; case ST_OMP_UNROLL: p = "!$OMP UNROLL"; break; @@ -5214,6 +5220,11 @@ parse_omp_do (gfc_statement omp_st) num_unroll++; continue; } + else if (st == ST_OMP_TILE) + { + accept_statement (st); + continue; + } else unexpected_statement (st); } @@ -5338,6 +5349,9 @@ parse_omp_do (gfc_statement omp_st) case ST_OMP_TEAMS_LOOP: omp_end_st = ST_OMP_END_TEAMS_LOOP; break; + case ST_OMP_TILE: + omp_end_st = ST_OMP_END_TILE; + break; case ST_OMP_UNROLL: omp_end_st = ST_OMP_END_UNROLL; break; @@ -6025,6 +6039,7 @@ parse_executable (gfc_statement st) case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case ST_OMP_TEAMS_DISTRIBUTE_SIMD: case ST_OMP_TEAMS_LOOP: + case ST_OMP_TILE: case ST_OMP_UNROLL: st = parse_omp_do (st); if (st == ST_IMPLIED_ENDDO) diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 46988ff281d..182aa18053c 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -11041,6 +11041,7 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_LOOP: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: break; @@ -12198,6 +12199,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns) case EXEC_OMP_LOOP: case EXEC_OMP_SIMD: case EXEC_OMP_TARGET_SIMD: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: gfc_resolve_omp_do_blocks (code, ns); break; @@ -12695,6 +12697,7 @@ start: case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: gfc_resolve_omp_directive (code, ns); diff --git a/gcc/fortran/st.cc b/gcc/fortran/st.cc index 6112831e621..cea874e4474 100644 --- a/gcc/fortran/st.cc +++ b/gcc/fortran/st.cc @@ -277,6 +277,7 @@ gfc_free_statement (gfc_code *p) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: gfc_free_omp_clauses (p->ext.omp_clauses); diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 73c416c951d..6936cd7f5ee 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -3913,6 +3913,24 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, omp_clauses = gfc_trans_add_clause (c, omp_clauses); } + if (clauses->tile_sizes) + { + vec *tvec; + gfc_expr_list *el; + + vec_alloc (tvec, 4); + + for (el = clauses->tile_sizes; el; el = el->next) + vec_safe_push (tvec, gfc_convert_expr_to_tree (block, el->expr)); + + c = build_omp_clause (gfc_get_location (&where), + OMP_CLAUSE_TILE); + OMP_CLAUSE_TILE_SIZES (c) = build_tree_list_vec (tvec); + omp_clauses = gfc_trans_add_clause (c, omp_clauses); + + tvec->truncate (0); + } + if (clauses->ordered) { c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_ORDERED); @@ -5106,7 +5124,7 @@ gfc_trans_omp_cancel (gfc_code *code) bool loop_transform_p (gfc_exec_op op) { - return op == EXEC_OMP_UNROLL; + return op == EXEC_OMP_UNROLL || op == EXEC_OMP_TILE; } static tree @@ -5280,6 +5298,16 @@ gfc_nonrect_loop_expr (stmtblock_t *pblock, gfc_se *sep, int loop_n, return true; } +int +gfc_expr_list_len (gfc_expr_list *list) +{ + unsigned len = 0; + for (; list; list = list->next) + len++; + + return len; +} + static tree gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, gfc_omp_clauses *do_clauses, tree par_clauses) @@ -5295,25 +5323,14 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, dovar_init *di; unsigned ix; vec *saved_doacross_steps = doacross_steps; - gfc_expr_list *tile = do_clauses ? do_clauses->tile_list : clauses->tile_list; gfc_code *orig_code = code; locus top_loc = code->loc; - - /* Both collapsed and tiled loops are lowered the same way. In - OpenACC, those clauses are not compatible, so prioritize the tile - clause, if present. */ - if (tile) - { - collapse = 0; - for (gfc_expr_list *el = tile; el; el = el->next) - collapse++; - } - - doacross_steps = NULL; - if (clauses->orderedc) - collapse = clauses->orderedc; - if (collapse <= 0) - collapse = 1; + gfc_expr_list *oacc_tile + = do_clauses ? do_clauses->tile_list : clauses->tile_list; + gfc_expr_list *omp_tile + = do_clauses ? do_clauses->tile_sizes : clauses->tile_sizes; + gcc_assert (!omp_tile || op == EXEC_OMP_TILE); + gcc_assert (!(oacc_tile && omp_tile)); if (pblock == NULL) { @@ -5321,21 +5338,42 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, pblock = █ } code = code->block->next; - gcc_assert (code->op == EXEC_DO || code->op == EXEC_OMP_UNROLL); + gcc_assert (code->op == EXEC_DO || loop_transform_p (code->op)); /* Loop transformation directives surrounding the associated loop of an "omp do" (or similar directive) are represented as clauses on the "omp do". */ loop_transform_clauses = NULL; - while (code->op == EXEC_OMP_UNROLL) + int omp_tile_depth = gfc_expr_list_len (omp_tile); + while (loop_transform_p (code->op)) { tree clauses = gfc_trans_omp_clauses (pblock, code->ext.omp_clauses, code->loc); - loop_transform_clauses = chainon (loop_transform_clauses, clauses); + /* There might be several "!$omp tile" transformations surrounding the + loop. Use the innermost one which must have the largest tiling depth. + If an inner directive has a smaller tiling depth than an outer + directive, an error will be emitted in pass-omp_transform_loops. */ + omp_tile_depth = gfc_expr_list_len (code->ext.omp_clauses->tile_sizes); + + loop_transform_clauses = chainon (loop_transform_clauses, clauses); code = code->block ? code->block->next : code->next; } - gcc_assert (code->op != EXEC_OMP_UNROLL); + gcc_assert (!loop_transform_p (code->op)); gcc_assert (code->op == EXEC_DO); + /* Both collapsed and tiled loops are lowered the same way. In + OpenACC, those clauses are not compatible, so prioritize the tile + clause, if present. */ + if (oacc_tile) + collapse = gfc_expr_list_len (oacc_tile); + + doacross_steps = NULL; + if (clauses->orderedc) + collapse = clauses->orderedc; + if (collapse <= 0) + collapse = 1; + + collapse = MAX (collapse, omp_tile_depth); + init = make_tree_vec (collapse); cond = make_tree_vec (collapse); incr = make_tree_vec (collapse); @@ -5346,7 +5384,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, on the simd construct and DO's clauses are translated elsewhere. */ do_clauses->sched_simd = false; - if (op == EXEC_OMP_UNROLL) + if (loop_transform_p (op)) { /* This is a loop transformation on a loop which is not associated with any other directive. Use the directive location instead of the loop @@ -5695,6 +5733,7 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock, case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break; case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break; case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break; + case EXEC_OMP_TILE: stmt = make_node (OMP_LOOP_TRANS); break; case EXEC_OMP_UNROLL: stmt = make_node (OMP_LOOP_TRANS); break; default: gcc_unreachable (); } @@ -7793,6 +7832,7 @@ gfc_trans_omp_directive (gfc_code *code) case EXEC_OMP_LOOP: case EXEC_OMP_SIMD: case EXEC_OMP_TASKLOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: return gfc_trans_omp_do (code, code->op, NULL, code->ext.omp_clauses, NULL); diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc index 56ec59fe80e..94b23c3b77a 100644 --- a/gcc/fortran/trans.cc +++ b/gcc/fortran/trans.cc @@ -2520,6 +2520,7 @@ trans_code (gfc_code * code, tree cond) case EXEC_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD: case EXEC_OMP_TEAMS_DISTRIBUTE_SIMD: case EXEC_OMP_TEAMS_LOOP: + case EXEC_OMP_TILE: case EXEC_OMP_UNROLL: case EXEC_OMP_WORKSHARE: res = gfc_trans_omp_directive (code); diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 14616eb5316..4d504a12451 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -12105,6 +12105,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: case OMP_CLAUSE_UNROLL_PARTIAL: + case OMP_CLAUSE_TILE: break; case OMP_CLAUSE_NOHOST: default: @@ -13076,6 +13077,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_INCLUSIVE: case OMP_CLAUSE_EXCLUSIVE: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: case OMP_CLAUSE_UNROLL_PARTIAL: @@ -15134,6 +15136,7 @@ gimplify_omp_loop (tree *expr_p, gimple_seq *pre_p) } pc = &OMP_CLAUSE_CHAIN (*pc); break; + case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNROLL_PARTIAL: case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index 0f326128874..e568ba0703e 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -2264,7 +2264,7 @@ omp_loop_transform_clause_p (tree c) enum omp_clause_code code = OMP_CLAUSE_CODE (c); return (code == OMP_CLAUSE_UNROLL_FULL || code == OMP_CLAUSE_UNROLL_PARTIAL - || code == OMP_CLAUSE_UNROLL_NONE); + || code == OMP_CLAUSE_UNROLL_NONE || code == OMP_CLAUSE_TILE); } /* Try to resolve declare variant, return the variant decl if it should diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc index d845d0e4798..858a271261a 100644 --- a/gcc/omp-transform-loops.cc +++ b/gcc/omp-transform-loops.cc @@ -211,6 +211,9 @@ gomp_for_constant_iterations_p (gomp_for *omp_for, return true; } +static gimple_seq +expand_transformed_loop (gomp_for *omp_for); + /* Split a gomp_for that represents a collapsed loop-nest into single loops. The result is a gomp_for of the same kind which is not collapsed (i.e. gimple_omp_for_collapse (OMP_FOR) == 1) and which contains nested, @@ -220,7 +223,7 @@ gomp_for_constant_iterations_p (gomp_for *omp_for, FROM_DEPTH are left collapsed. */ static gomp_for* -gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0) +gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0, bool expand = false) { int collapse = gimple_omp_for_collapse (omp_for); gcc_assert (from_depth < collapse); @@ -251,7 +254,11 @@ gomp_for_uncollapse (gomp_for *omp_for, int from_depth = 0) gimple_omp_for_set_index (level_omp_for, 0, gimple_omp_for_index (omp_for, level)); - body = level_omp_for; + + if (expand) + body = expand_transformed_loop (level_omp_for); + else + body = level_omp_for; } omp_for->collapse = from_depth; @@ -808,6 +815,316 @@ canonicalize_conditions (gomp_for *omp_for) return new_decls; } +/* Execute the tiling transformation for OMP_FOR with the given TILE_SIZES and + return the resulting gimple bind. TILE_SIZES must be a non-empty tree chain + of integer constants and the collapse of OMP_FOR must be at least the length + of TILE_SIZES. TRANSFORMATION_CLAUSES are the loop transformations that + must be applied to OMP_FOR. Those are applied on the result of the tiling + transformation. LOC is the location for diagnostic messages. + + Example 1 + --------- + --------- + + Original loop + ------------- + + #pragma omp for + #pragma omp tile sizes(3) + for (i = 1; i <= n; i = i + 1) + { + body; + } + + Internally, the tile directive is represented as a clause on the + omp for, i.e. as #pragma omp for tile_sizes(3). + + Transformed loop + ---------------- + + #pragma omp for + for (.omp_tile_index = 1; .omp_tile_index < ceil(n/3); .omp_tile_index = .omp_tile_index + 3) + { + D.4287 = .omp_tile_index + 3 + 1 + #pragma omp loop_transform + for (i = .omp_tile_index; i < D.4287; i = i + 1) + { + if (i.0 > n) + goto L.0 + body; + } + L_0: + } + + The outer loop is the "floor loop" and the inner loop is the "tile + loop". The tile loop is never in canonical loop nest form and + hence it cannot be associated with any loop construct. The + GCC-internal "omp loop transform" construct will be lowered after + the tiling transformation. + */ + +static gimple_seq +tile (gomp_for *omp_for, location_t loc, tree tile_sizes, + tree transformation_clauses, walk_ctx *ctx) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE | MSG_PRIORITY_INTERNALS, + dump_user_location_t::from_location_t (loc), + "Executing tile transformation %T:\n %G\n", + transformation_clauses, static_cast (omp_for)); + + gimple_seq tile_loops = copy_gimple_seq_and_replace_locals (omp_for); + gimple_seq floor_loops = copy_gimple_seq_and_replace_locals (omp_for); + + size_t collapse = gimple_omp_for_collapse (omp_for); + size_t tiling_depth = list_length (tile_sizes); + tree clauses = gimple_omp_for_clauses (omp_for); + size_t clause_collapse = 1; + tree collapse_clause = NULL; + + if (tree c = omp_find_clause (clauses, OMP_CLAUSE_ORDERED)) + { + error_at (OMP_CLAUSE_LOCATION (c), + "% invalid in conjunction with %"); + return omp_for; + } + + if (tree c = omp_find_clause (clauses, OMP_CLAUSE_COLLAPSE)) + { + tree expr = OMP_CLAUSE_COLLAPSE_EXPR (c); + clause_collapse = tree_to_uhwi (expr); + collapse_clause = c; + } + + /* The 'omp tile' construct creates a canonical loop-nest whose nesting depth + equals tiling_depth. The whole loop-nest has depth at least 2 * + omp_tile_depth, but the 'tile loops' at levels + omp_tile_depth+1...2*omp_tile_depth are not in canonical loop-nest form + and hence cannot be associated with a loop construct. */ + if (clause_collapse > tiling_depth) + { + error_at (OMP_CLAUSE_LOCATION (collapse_clause), + "collapse cannot extend below the floor loops " + "generated by the % construct"); + OMP_CLAUSE_COLLAPSE_EXPR (collapse_clause) + = build_int_cst (unsigned_type_node, tiling_depth); + return transform_gomp_for (omp_for, NULL, ctx); + } + + if (tiling_depth > collapse) + return transform_gomp_for (omp_for, NULL, ctx); + + gcc_assert (collapse >= clause_collapse); + + push_gimplify_context (); + + /* Create the index variables for iterating the tiles in the floor + loops first tiling_depth loops transformed loop nest. */ + gimple_seq floor_loops_pre_body = NULL; + size_t tile_level = 0; + auto_vec sizes_vec; + for (tree el = tile_sizes; el; el = TREE_CHAIN (el), tile_level++) + { + size_t nest_level = tile_level; + tree index = gimple_omp_for_index (omp_for, nest_level); + tree init = gimple_omp_for_initial (omp_for, nest_level); + tree incr = gimple_omp_for_incr (omp_for, nest_level); + tree step = TREE_OPERAND (incr, 1); + + /* Initialize original index variables in the pre-body. The + loop lowering will not initialize them because of the changed + index variables. */ + gimplify_assign (index, init, &floor_loops_pre_body); + + tree tile_size = fold_convert (TREE_TYPE (step), TREE_VALUE (el)); + sizes_vec.safe_push (tile_size); + tree tile_index = create_tmp_var (TREE_TYPE (index), ".omp_tile_index"); + gimplify_assign (tile_index, init, &floor_loops_pre_body); + + /* Floor loops */ + step = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, tile_size); + tree tile_step = step; + /* For combined constructs, step will be gimplified on the outer + gomp_for. */ + if (!gimple_omp_for_combined_into_p (omp_for) && !TREE_CONSTANT (step)) + { + tile_step = create_tmp_var (TREE_TYPE (step), ".omp_tile_step"); + gimplify_assign (tile_step, step, &floor_loops_pre_body); + } + incr = fold_build2 (TREE_CODE (incr), TREE_TYPE (incr), tile_index, + tile_step); + gimple_omp_for_set_incr (floor_loops, nest_level, incr); + gimple_omp_for_set_index (floor_loops, nest_level, tile_index); + } + gbind *result_bind = gimple_build_bind (NULL, NULL, NULL); + pop_gimplify_context (result_bind); + gimple_seq_add_seq (gimple_omp_for_pre_body_ptr (floor_loops), + floor_loops_pre_body); + + /* The tiling loops will not form a perfect loop nest because the + loop for each tiling dimension needs to check if the current tile + is incomplete and this check is intervening code. Since OpenMP + 5.1 does not allow the collapse of the loop-nest to extend beyond + the floor loops, this is not a problem. + + "Uncollapse" the tiling loop nest, i.e. split the loop nest into + nested separate gomp_for structures for each level. This allows + to add the incomplete tile checks to each level loop. */ + + tile_loops = gomp_for_uncollapse (as_a (tile_loops)); + gimple_omp_for_set_kind (as_a (tile_loops), + GF_OMP_FOR_KIND_TRANSFORM_LOOP); + gimple_omp_for_set_clauses (tile_loops, NULL_TREE); + gimple_omp_for_set_pre_body (tile_loops, NULL); + + /* Transform the loop bodies of the "uncollapsed" tiling loops and + add them to the body of the floor loops. At this point, the + loop nest consists of perfectly nested gimple_omp_for constructs, + each representing a single loop. */ + gimple_seq floor_loops_body = NULL; + gimple *level_loop = tile_loops; + gimple_seq_add_stmt (&floor_loops_body, tile_loops); + gimple_seq *surrounding_seq = &floor_loops_body; + + push_gimplify_context (); + + tree break_label = create_artificial_label (UNKNOWN_LOCATION); + gimple_seq_add_stmt (surrounding_seq, gimple_build_label (break_label)); + for (size_t level = 0; level < tiling_depth; level++) + { + tree original_index = gimple_omp_for_index (omp_for, level); + tree original_final = gimple_omp_for_final (omp_for, level); + + tree tile_index = gimple_omp_for_index (floor_loops, level); + tree tile_size = sizes_vec[level]; + tree type = TREE_TYPE (tile_index); + tree plus_type = type; + + tree incr = gimple_omp_for_incr (omp_for, level); + tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr); + + gimple_seq *pre_body = gimple_omp_for_pre_body_ptr (level_loop); + gimple_seq level_body = gimple_omp_body (level_loop); + gcc_assert (gimple_omp_for_collapse (level_loop) == 1); + tree_code original_cond = gimple_omp_for_cond (omp_for, level); + + gimple_omp_for_set_initial (level_loop, 0, tile_index); + + tree tile_final = create_tmp_var (type); + tree scaled_tile_size = fold_build2 (MULT_EXPR, TREE_TYPE (tile_size), + tile_size, step); + + tree_code plus_code = PLUS_EXPR; + if (POINTER_TYPE_P (TREE_TYPE (tile_index))) + { + plus_code = POINTER_PLUS_EXPR; + int unsignedp = TYPE_UNSIGNED (TREE_TYPE (scaled_tile_size)); + plus_type = signed_or_unsigned_type_for (unsignedp, ptrdiff_type_node); + } + + scaled_tile_size = fold_convert (plus_type, scaled_tile_size); + gimplify_assign (tile_final, + fold_build2 (plus_code, type, + tile_index, scaled_tile_size), + pre_body); + gimple_omp_for_set_final (level_loop, 0, tile_final); + + /* Redefine the original loop index variable of OMP_FOR in terms of the + floor loop and the tiling loop index variable for the current + dimension/level at the top of the loop. */ + gimple_seq level_preamble = NULL; + + push_gimplify_context (); + + tree body_label = create_artificial_label (UNKNOWN_LOCATION); + + /* Handle partial tiles, i.e. add a check that breaks from the tile loop + if the new index value does not belong to the iteration space of the + original loop. */ + gimple_seq_add_stmt (&level_preamble, + gimple_build_cond (original_cond, original_index, + original_final, body_label, + break_label)); + gimple_seq_add_stmt (&level_preamble, gimple_build_label (body_label)); + + auto gsi = gsi_start (level_body); + gsi_insert_seq_before (&gsi, level_preamble, GSI_SAME_STMT); + gbind *level_bind = gimple_build_bind (NULL, NULL, NULL); + pop_gimplify_context (level_bind); + gimple_bind_set_body (level_bind, level_body); + gimple_omp_set_body (level_loop, level_bind); + + surrounding_seq = &level_body; + level_loop = gsi_stmt (gsi); + + /* The label for jumping out of the loop at the next nesting + level. For the outermost level, the label is put after the + loop-nest, for the last one it is not necessary. */ + if (level != tiling_depth - 1) + { + break_label = create_artificial_label (UNKNOWN_LOCATION); + gsi_insert_after (&gsi, gimple_build_label (break_label), + GSI_NEW_STMT); + } + } + + gbind *tile_loops_bind; + tile_loops_bind = gimple_build_bind (NULL, tile_loops, NULL); + pop_gimplify_context (tile_loops_bind); + + gimple_omp_set_body (floor_loops, tile_loops_bind); + + tree remaining_clauses = OMP_CLAUSE_CHAIN (transformation_clauses); + + /* Collapsing of the OMP_FOR is used both for the "omp tile" + implementation and for the actual "collapse" clause. If the + tiling depth was greater than the collapse depth required by the + clauses on OMP_FOR, the collapse of OMP_FOR must be adjusted to + the latter value and all loops below the new collapse depth must + be transformed to GF_OMP_FOR_KIND_TRANSFORM_LOOP to ensure their + lowering in this pass. */ + size_t new_collapse = clause_collapse; + + /* Keep the omp_for collapsed if there are further transformations */ + if (remaining_clauses) + { + size_t next_transform_depth = 1; + if (OMP_CLAUSE_CODE (remaining_clauses) == OMP_CLAUSE_TILE) + next_transform_depth + = list_length (OMP_CLAUSE_TILE_SIZES (remaining_clauses)); + + /* The current "omp tile" transformation reduces the nesting depth + of the canonical loop-nest to TILING_DEPTH. + Hence the following "omp tile" transformation is invalid if + it requires a greater nesting depth. */ + gcc_assert (next_transform_depth <= tiling_depth); + if (next_transform_depth > new_collapse) + new_collapse = next_transform_depth; + } + + if (collapse > new_collapse) + floor_loops = gomp_for_uncollapse (as_a (floor_loops), + new_collapse, true); + + /* Lower the uncollapsed tile loops. */ + walk_omp_for_loops (gimple_bind_body_ptr (tile_loops_bind), ctx); + + gcc_assert (remaining_clauses || !collapse_clause + || gimple_omp_for_collapse (floor_loops) + == (size_t)clause_collapse); + + if (gimple_omp_for_combined_into_p (omp_for)) + ctx->inner_combined_loop = as_a (floor_loops); + + /* Apply remaining transformation clauses and assemble the transformation + result. */ + gimple_bind_set_body (result_bind, + transform_gomp_for (as_a (floor_loops), + remaining_clauses, ctx)); + + return result_bind; +} + /* Combined distribute or taskloop constructs are represented by two or more nested gomp_for constructs which are created during gimplification. Loop transformations on the combined construct are @@ -999,6 +1316,10 @@ transform_gomp_for (gomp_for *omp_for, tree transformation, walk_ctx *ctx) ctx); } break; + case OMP_CLAUSE_TILE: + result = tile (omp_for, loc, OMP_CLAUSE_TILE_SIZES (transformation), + transformation, ctx); + break; default: gcc_unreachable (); } @@ -1177,6 +1498,21 @@ optimize_transformation_clauses (tree clauses) unroll_partial = c; } break; + case OMP_CLAUSE_TILE: + { + /* No optimization for those clauses yet, but they end any chain of + "unroll partial" clauses. */ + if (merged_unroll_partial && dump_enabled_p ()) + print_optimized_unroll_partial_msg (unroll_partial); + + if (unroll_partial) + OMP_CLAUSE_CHAIN (unroll_partial) = c; + + unroll_partial = NULL; + merged_unroll_partial = false; + last_non_unroll = c; + } + break; default: gcc_unreachable (); } diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 new file mode 100644 index 00000000000..84ea93300fa --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1.f90 @@ -0,0 +1,163 @@ +subroutine test + implicit none + integer :: i, j, k + + !$omp tile sizes(1) + do i = 1,100 + call dummy(i) + end do + + !$omp tile sizes(1) + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2+3) + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(-21) ! { dg-error {tile size not constant positive integer at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(0) ! { dg-error {tile size not constant positive integer at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(i) ! { dg-error {Constant expression required at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes( ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2 ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes() ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(2,) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(,2) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(,i) ! { dg-error {Syntax error in 'tile sizes' list at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(i,) ! { dg-error {Constant expression required at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + end do + end do + !$end omp tile + + !$omp tile sizes(1,2) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} } + do i = 1,100 + call dummy(i) + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} } + do i = 1,100 + do j = 1,100 + call dummy(i) + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i = 1,100 + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i = 1,100 + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + end do + call dummy(i) ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} } + end do + !$end omp tile + + !$omp tile sizes(1,2,1) + do i = 1,100 + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + call dummy(j) ! { dg-error {\!\$OMP TILE loops not perfectly nested at \(1\)} } + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} } + do i = 1,100 + call dummy(i) + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} } + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 new file mode 100644 index 00000000000..29d7532bc37 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-1a.f90 @@ -0,0 +1,10 @@ + +subroutine test + !$omp tile sizes(1,2,1) ! { dg-error {not enough DO loops for \!\$OMP TILE at \(1\)} } + do i = 1,100 + do j = 1,100 + call dummy(i) + end do + end do + !$end omp tile +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 new file mode 100644 index 00000000000..8a5eae3a188 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-2.f90 @@ -0,0 +1,80 @@ +subroutine test1 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + + !$omp tile sizes (8) + !$omp tile sizes (1,2) + !$omp tile sizes (1,2,3) + do i = 1,100 + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile +end subroutine test1 + +subroutine test2 + implicit none + integer :: i, j, k + + !$omp taskloop collapse(2) + !$omp tile sizes (3,4) + !$omp tile sizes (1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop + + !$omp taskloop simd + !$omp tile sizes (8) + !$omp tile sizes (1,2) + !$omp tile sizes (1,2,3) + do i = 1,100 + do j = 1,100 + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop simd +end subroutine test2 + +subroutine test3 + implicit none + integer :: i, j, k + + !$omp taskloop collapse(3) ! { dg-error {not enough DO loops for collapsed \!\$OMP TASKLOOP at \(1\)} } + !$omp tile sizes (1,2) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TASKLOOP} } + !$omp tile sizes (1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$omp end taskloop +end subroutine test3 diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 new file mode 100644 index 00000000000..eaa7895eaa0 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-3.f90 @@ -0,0 +1,18 @@ +subroutine test + implicit none + integer :: i, j, k + + !$omp parallel do collapse(2) ordered(2) + !$omp tile sizes (1,2) + do i = 1,100 ! { dg-error {'ordered' invalid in conjunction with 'omp tile'} } + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$end omp target + +end subroutine test diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 new file mode 100644 index 00000000000..b2dca0bbec6 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-4.f90 @@ -0,0 +1,95 @@ + +subroutine test1 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} } + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test1 + +subroutine test2 + implicit none + integer :: i, j, k + + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} } + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test2 + +subroutine test3 + implicit none + integer :: i, j, k + + !$omp target teams distribute + !$omp tile sizes (1,2) + !$omp tile sizes (1) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TILE} } + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test3 + +subroutine test4 + implicit none + integer :: i, j, k + + !$omp target teams distribute collapse(2) + !$omp tile sizes (8) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP TARGET TEAMS DISTRIBUTE} } + !$omp tile sizes (1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + +end subroutine test4 + +subroutine test5 + implicit none + integer :: i, j, k + + !$omp parallel do collapse(2) ordered(2) + !$omp tile sizes (8) ! { dg-error {loop nest depth after \!\$OMP TILE at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} } + !$omp tile sizes (1,2) + do i = 1,100 + do j = 1,100 + call dummy(j) + do k = 1,100 + call dummy(i) + end do + end do + end do + !$end omp tile + !$end omp tile + !$end omp target + +end subroutine test5 diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 new file mode 100644 index 00000000000..27920701b36 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90 @@ -0,0 +1,57 @@ +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + + !$omp parallel do collapse(2) + !$omp tile sizes (8,8) + !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} } + ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP PARALLEL DO} "" { target *-*-*} .-1 } + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do + + !$omp tile sizes (8,8) + !$omp unroll partial(2) ! { dg-error {loop nest depth after \!\$OMP UNROLL at \(1\) is insufficient for outer \!\$OMP TILE} } + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do + + !$omp tile sizes (8) + !$omp unroll partial(1) + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do + + !$omp parallel do collapse(2) ! { dg-error {missing canonical loop nest after \!\$OMP PARALLEL DO at \(1\)} } + !$omp tile sizes (8,8) ! { dg-error {missing canonical loop nest after \!\$OMP TILE at \(1\)} } + !$omp unroll full ! { dg-warning {\!\$OMP UNROLL with FULL clause at \(1\) turns loop into a non-loop} } + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do +end function mult diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 new file mode 100644 index 00000000000..cda878f3037 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90 @@ -0,0 +1,37 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-fdump-tree-omp_transform_loops" } + +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + + !$omp parallel do + !$omp unroll partial(1) + !$omp tile sizes (8,8) + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do +end function mult + +! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(1\) tile sizes\(8, 8\)} 1 "original" } } +! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } } + +! Tiling adds two floor and two tile loops. + +! Number of conditional statements after tiling: +! 5 +! = 2 (lowering of 2 tile loops) +! + 1 (partial tile handling in 2 tile loops) +! + 1 (lowering of non-associated floor loop) + +! The unrolling with unroll factor 1 currently gets executed (TODO could/should be skipped?) + +! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 5 "omp_transform_loops" } } diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 new file mode 100644 index 00000000000..00615011856 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90 @@ -0,0 +1,41 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-additional-options "-fdump-tree-omp_transform_loops" } + +function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + c = 0 + + !$omp target + !$omp parallel do + !$omp unroll partial(2) + !$omp tile sizes (8,8,4) + do i = 1,m + do j = 1,n + do k = 1, n + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + !$omp end target +end function mult + +! { dg-final { scan-tree-dump-times {#pragma omp for nowait unroll_partial\(2\) tile sizes\(8, 8, 4\)} 1 "original" } } +! { dg-final { scan-tree-dump-not "#pragma omp loop_transform unroll_partial" "omp_transform_loops" } } + +! Check the number of loops + +! Tiling adds three tile and three floor loops. +! The outermost floor loop is associated with the "!$omp parallel do" +! and hence it isn't lowered in the transformation pass. +! Number of conditional statements after tiling: +! 8 +! = 2 (inner floor loop lowering) +! + 3 (partial tile handling in 3 tile loops) +! + 3 (lowering of 3 tile loops) +! +! Unrolling creates 2 copies of the tiled loop nest. + +! { dg-final { scan-tree-dump-times {if \([A-Za-z0-9_.]+ < } 16 "omp_transform_loops" } } diff --git a/gcc/tree-core.h b/gcc/tree-core.h index f1429824158..b241e144515 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -534,6 +534,9 @@ enum omp_clause_code { /* Internal representation for an "omp unroll partial" directive. */ OMP_CLAUSE_UNROLL_PARTIAL, + + /* Represents a "tile" directive internally. */ + OMP_CLAUSE_TILE }; #undef DEFTREESTRUCT diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index cae81719e68..02c207d87a0 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -521,6 +521,14 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) pp_right_paren (pp); } break; + case OMP_CLAUSE_TILE: + pp_string (pp, "tile sizes"); + pp_left_paren (pp); + gcc_assert (OMP_CLAUSE_TILE_SIZES (clause)); + dump_generic_node (pp, OMP_CLAUSE_TILE_SIZES (clause), spc, flags, + false); + pp_right_paren (pp); + break; case OMP_CLAUSE__LOOPTEMP_: name = "_looptemp_"; goto print_remap; diff --git a/gcc/tree.cc b/gcc/tree.cc index fc7e22d352f..893f509fa3a 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -327,8 +327,10 @@ unsigned const char omp_clause_num_ops[] = 0, /* OMP_CLAUSE_FINALIZE */ 0, /* OMP_CLAUSE_NOHOST */ 0, /* OMP_CLAUSE_UNROLL_FULL */ + 0, /* OMP_CLAUSE_UNROLL_NONE */ - 1 /* OMP_CLAUSE_UNROLL_PARTIAL */ + 1, /* OMP_CLAUSE_UNROLL_PARTIAL */ + 1 /* OMP_CLAUSE_TILE */ }; const char * const omp_clause_code_name[] = @@ -422,7 +424,8 @@ const char * const omp_clause_code_name[] = "nohost", "unroll_full", "unroll_none", - "unroll_partial" + "unroll_partial", + "tile" }; /* Unless specific to OpenACC, we tend to internally maintain OpenMP-centric diff --git a/gcc/tree.h b/gcc/tree.h index 6f7a6e7017a..8f4d2761d1a 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1790,6 +1790,9 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_UNROLL_PARTIAL_EXPR(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_UNROLL_PARTIAL), 0) +#define OMP_CLAUSE_TILE_SIZES(NODE) \ + OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0) + #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind) diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C new file mode 100644 index 00000000000..8970bfa7fd8 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/loop-transforms/unroll-full-tile.C @@ -0,0 +1,84 @@ +#include +#include + +template +int sum () +{ + int sum = 0; +#pragma omp unroll full +#pragma omp tile sizes(dim0, dim1) + for (unsigned i = 0; i < 4; i++) + for (unsigned j = 0; j < 5; j++) + sum++; + + return sum; +} + +int main () +{ + if (sum <1,1> () != 20) + __builtin_abort (); + if (sum <1,2> () != 20) + __builtin_abort (); + if (sum <1,3> () != 20) + __builtin_abort (); + if (sum <1,4> () != 20) + __builtin_abort (); + if (sum <1,5> () != 20) + __builtin_abort (); + + if (sum <2,1> () != 20) + __builtin_abort (); + if (sum <2,2> () != 20) + __builtin_abort (); + if (sum <2,3> () != 20) + __builtin_abort (); + if (sum <2,4> () != 20) + __builtin_abort (); + if (sum <2,5> () != 20) + __builtin_abort (); + + if (sum <3,1> () != 20) + __builtin_abort (); + if (sum <3,2> () != 20) + __builtin_abort (); + if (sum <3,3> () != 20) + __builtin_abort (); + if (sum <3,4> () != 20) + __builtin_abort (); + if (sum <3,5> () != 20) + __builtin_abort (); + + if (sum <4,1> () != 20) + __builtin_abort (); + if (sum <4,2> () != 20) + __builtin_abort (); + if (sum <4,3> () != 20) + __builtin_abort (); + if (sum <4,4> () != 20) + __builtin_abort (); + if (sum <4,5> () != 20) + __builtin_abort (); + + if (sum <5,1> () != 20) + __builtin_abort (); + if (sum <5,2> () != 20) + __builtin_abort (); + if (sum <5,3> () != 20) + __builtin_abort (); + if (sum <5,4> () != 20) + __builtin_abort (); + if (sum <5,5> () != 20) + __builtin_abort (); + + if (sum <6,1> () != 20) + __builtin_abort (); + if (sum <6,2> () != 20) + __builtin_abort (); + if (sum <6,3> () != 20) + __builtin_abort (); + if (sum <6,4> () != 20) + __builtin_abort (); + if (sum <6,5> () != 20) + __builtin_abort (); +} diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 new file mode 100644 index 00000000000..bb48c31224e --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-1.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n = 10 + integer :: m = 10 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + !$omp parallel do collapse(2) private(inner) + !$omp tile sizes (8, 1) + do i = 1,m + do j = 1,n + inner = 0 + do k = 1, n + inner = inner + a(k, i) * b(j, k) + end do + c(j, i) = inner + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = merge(1,0, i.eq.j) + b(j,i) = j + end do + end do + + c = mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 new file mode 100644 index 00000000000..6aedbf4724f --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 @@ -0,0 +1,117 @@ +! { dg-additional-options "-fdump-tree-original" } +! { dg-do run } + +module test_functions + contains + integer function compute_sum1() result(sum) + implicit none + + integer :: i,j + + sum = 0 + !$omp do + do i = 1,10,3 + !$omp tile sizes(2) + do j = 1,10,3 + sum = sum + 1 + end do + end do + end function + + integer function compute_sum2() result(sum) + implicit none + + integer :: i,j + + sum = 0 + !$omp do + do i = 1,10,3 + !$omp tile sizes(16) + do j = 1,10,3 + sum = sum + 1 + end do + end do + end function + + integer function compute_sum3() result(sum) + implicit none + + integer :: i,j + + sum = 0 + !$omp do + do i = 1,10,3 + !$omp tile sizes(100) + do j = 1,10,3 + sum = sum + 1 + end do + end do + end function + + integer function compute_sum4() result(sum) + implicit none + + integer :: i,j + + sum = 0 + !$omp do + !$omp tile sizes(6,10) + do i = 1,10,3 + do j = 1,10,3 + sum = sum + 1 + end do + end do + end function + + integer function compute_sum5() result(sum) + implicit none + + integer :: i,j + + sum = 0 + !$omp parallel do collapse(2) + !$omp tile sizes(6,10) + do i = 1,10,3 + do j = 1,10,3 + sum = sum + 1 + end do + end do + end function +end module test_functions + +program test + use test_functions + implicit none + + integer :: result + + result = compute_sum1 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result = compute_sum2 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result = compute_sum3 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result = compute_sum4 () + write (*,*) result + if (result .ne. 16) then + call abort + end if + + result = compute_sum5 () + write (*,*) result + if (result .ne. 16) then + call abort + end if +end program diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 new file mode 100644 index 00000000000..2f2f014ead9 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90 @@ -0,0 +1,112 @@ +module matrix + implicit none + integer :: n = 10 + integer :: m = 10 + +contains + + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(10) + !$omp tile sizes(1, 3) + do i = 1,10 + do j = 1,n + do k = 1, n + write (*,*) i, j, k + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + function mult2 (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes(1,2) + do i = 1,10 + do j = 1,n + do k = 1, n + write (*,*) i, j, k + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult2 + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = merge(1,0, i.eq.j) + b(j,i) = j + end do + end do + + ! c = mult (a, b) + + ! call print_matrix (a) + ! call print_matrix (b) + ! call print_matrix (c) + + ! do i = 1,n + ! do j = 1,m + ! if (b(i,j) .ne. c(i,j)) call abort () + ! end do + ! end do + + + c = mult2 (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 new file mode 100644 index 00000000000..1b5b623b838 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n = 10 + integer :: m = 10 + +contains + + function copy (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes (1,5) + do i = 1,10 + do j = 1,n + c(j,i) = c(j,i) + a(j, i) + end do + end do + end function copy + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = 1 + end do + end do + + c = copy (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (c(i,j) .ne. a(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 new file mode 100644 index 00000000000..518968f1335 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90 @@ -0,0 +1,77 @@ +module matrix + implicit none + integer :: n = 4 + integer :: m = 4 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + ! omp do private(inner) + do i = 1,m + !$omp unroll partial(4) + !$omp tile sizes (5) + do j = 1,n + do k = 1, n + write (*,*) "i", i, "j", j, "k", k + if (k == 1) then + inner = 0 + endif + inner = inner + a(k, i) * b(j, k) + if (k == n) then + c(j, i) = inner + endif + end do + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = merge(1,0, i.eq.j) + b(j,i) = j + end do + end do + + c = mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 new file mode 100644 index 00000000000..807135df5e8 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90 @@ -0,0 +1,75 @@ +module matrix + implicit none + integer :: n = 4 + integer :: m = 4 + +contains + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,m + do j = 1,n + c(j, i) = 0 + end do + end do + + !$omp parallel do + do i = 1,m + !$omp tile sizes (5,2) + do j = 1,n + do k = 1, n + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = merge(1,0, i.eq.j) + b(j,i) = j + end do + end do + + c = mult (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 new file mode 100644 index 00000000000..2f2f014ead9 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90 @@ -0,0 +1,112 @@ +module matrix + implicit none + integer :: n = 10 + integer :: m = 10 + +contains + + function mult (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(10) + !$omp tile sizes(1, 3) + do i = 1,10 + do j = 1,n + do k = 1, n + write (*,*) i, j, k + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult + + function mult2 (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes(1,2) + do i = 1,10 + do j = 1,n + do k = 1, n + write (*,*) i, j, k + c(j,i) = c(j,i) + a(k, i) * b(j, k) + end do + end do + end do + end function mult2 + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine + +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = merge(1,0, i.eq.j) + b(j,i) = j + end do + end do + + ! c = mult (a, b) + + ! call print_matrix (a) + ! call print_matrix (b) + ! call print_matrix (c) + + ! do i = 1,n + ! do j = 1,m + ! if (b(i,j) .ne. c(i,j)) call abort () + ! end do + ! end do + + + c = mult2 (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (b(i,j) .ne. c(i,j)) call abort () + end do + end do + +end program main diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 new file mode 100644 index 00000000000..1b5b623b838 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90 @@ -0,0 +1,71 @@ +module matrix + implicit none + integer :: n = 10 + integer :: m = 10 + +contains + + function copy (a, b) result (c) + integer, allocatable, dimension (:,:) :: a,b,c + integer :: i, j, k, inner + + allocate(c( n, m )) + do i = 1,10 + do j = 1,n + c(j,i) = 0 + end do + end do + + !$omp unroll partial(2) + !$omp tile sizes (1,5) + do i = 1,10 + do j = 1,n + c(j,i) = c(j,i) + a(j, i) + end do + end do + end function copy + + subroutine print_matrix (m) + integer, allocatable :: m(:,:) + integer :: i, j, n + + n = size (m, 1) + do i = 1,n + do j = 1,n + write (*, fmt="(i4)", advance='no') m(j, i) + end do + write (*, *) "" + end do + write (*, *) "" + end subroutine +end module matrix + +program main + use matrix + implicit none + + integer, allocatable :: a(:,:),b(:,:),c(:,:) + integer :: i,j + + allocate(a( n, m )) + allocate(b( n, m )) + + do i = 1,n + do j = 1,m + a(j,i) = 1 + end do + end do + + c = copy (a, b) + + call print_matrix (a) + call print_matrix (b) + call print_matrix (c) + + do i = 1,n + do j = 1,m + if (c(i,j) .ne. a(i,j)) call abort () + end do + end do + +end program main