OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]
Commit Message
Hi!
On 2019-05-08T14:51:57+0100, Julian Brown <julian@codesourcery.com> wrote:
> - The "addressable" bit is set during the kernels conversion pass for
> variables that have "create" (alloc) clauses created for them in the
> synthesised outer data region (instead of in the front-end, etc.,
> where it can't be done accurately). Such variables actually have
> their address taken during transformations made in a later pass
> (omp-low, I think), but there's a phase-ordering problem that means
> the flag should be set earlier.
The actual issue is a bit different, but yes, there is a problem.
The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
"ICE in lower_omp_target, at omp-low.c:12287". (And I'm confused why we
didn't run into that with the OpenACC 'kernels' decomposition
originally.) I've pushed to master branch
commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized data
clauses as addressable [PR100280]", see attached.
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Comments
Hi!
Jakub, need your review/approval here, please:
On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown <julian@codesourcery.com> wrote:
>> - The "addressable" bit is set during the kernels conversion pass for
>> variables that have "create" (alloc) clauses created for them in the
>> synthesised outer data region (instead of in the front-end, etc.,
>> where it can't be done accurately). Such variables actually have
>> their address taken during transformations made in a later pass
>> (omp-low, I think), but there's a phase-ordering problem that means
>> the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287". (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.) I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]"
> ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
>
> 13073 else if (is_gimple_reg (var))
> 13074 {
> 13075 gcc_assert (offloaded);
> 13076 tree avar = create_tmp_var (TREE_TYPE (var));
> 13077 mark_addressable (avar);
>
> ..., which (a) is only implemented for actualy *offloaded* regions (but not
> data regions), and (b) the subsequently synthesized code for writing to and
> later reading back from the temporary fundamentally conflicts with OpenACC
> 'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial
> to make work, so let's just avoid this case.
> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
> /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> - variable. If INNER_CLEANUP is present, add a try-finally statement with
> + variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> + If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block. Return the new data region, or
> the original BODY if no data region was needed. */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> + /* See <https://gcc.gnu.org/PR100280>. */
> + TREE_ADDRESSABLE (v) = 1;
> }
> }
So, that's too simple. ;-) ... and gives rise to workaround patches like
we have on the og11 development branch:
- "Avoid introducing 'create' mapping clauses for loop index variables in kernels regions",
- "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously",
- "Fix for is_gimple_reg vars to 'data kernels'"
We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
because that may easily violate GIMPLE invariants, leading to ICEs later.
There are a few open PRs, which my following changes are addressing. To
make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
discussion of <https://gcc.gnu.org/PR102330>. (PR102330 turned out to be
unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a
different patch for it.)
I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
above (plus one or two additional ones to be introduced in later
patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
'make_addressable_vars' for later fix-up.
(In reply to Jakub Jelinek from comment #9)
> Whether you can use the same bitmap or need to add another bitmap next to
> task_shared_vars is something hard to guess without diving into it deeply.
Per my understanding of the code, the only place where I had doubts is
'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that
what this is doing is either a no-op in the
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the
original 'task_shared_vars' handling has been. Either way: I couldn't
come up with a way (test case) that we'd actually run into this case;
you'd have to have the relevant OpenMP constructs inside an OpenACC
'kernels' region, which isn't permitted per
'gcc/omp-low.cc:check_omp_nesting_restrictions'.
OK to proceed in this way?
Grüße
Thomas
--- gcc/omp-low.cc
+++ gcc/omp-low.cc
@@ -188,7 +188,7 @@ struct omp_context
static splay_tree all_contexts;
static int taskreg_nesting_level;
static int target_nesting_level;
-static bitmap task_shared_vars;
+static bitmap make_addressable_vars;
static bitmap global_nonaddressable_vars;
static vec<omp_context *> taskreg_contexts;
static vec<gomp_task *> task_cpyfns;
@@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
/* Taking address of OUTER in lower_send_shared_vars
might need regimplification of everything that uses the
variable. */
- if (!task_shared_vars)
- task_shared_vars = BITMAP_ALLOC (NULL);
- bitmap_set_bit (task_shared_vars, DECL_UID (outer));
+ if (!make_addressable_vars)
+ make_addressable_vars = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (make_addressable_vars, DECL_UID (outer));
TREE_ADDRESSABLE (outer) = 1;
}
return true;
@@ -601,13 +601,13 @@ omp_copy_decl_2 (tree var, tree name, tree type, omp_context *ctx)
else
record_vars (copy);
- /* If VAR is listed in task_shared_vars, it means it wasn't
- originally addressable and is just because task needs to take
- it's address. But we don't need to take address of privatizations
+ /* If VAR is listed in make_addressable_vars, it wasn't
+ originally addressable, but was only later made so.
+ We don't need to take address of privatizations
from that var. */
if (TREE_ADDRESSABLE (var)
- && ((task_shared_vars
- && bitmap_bit_p (task_shared_vars, DECL_UID (var)))
+ && ((make_addressable_vars
+ && bitmap_bit_p (make_addressable_vars, DECL_UID (var)))
|| (global_nonaddressable_vars
&& bitmap_bit_p (global_nonaddressable_vars, DECL_UID (var)))))
TREE_ADDRESSABLE (copy) = 0;
@@ -1495,6 +1495,21 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
if (ctx->outer)
scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
decl = OMP_CLAUSE_DECL (c);
+ /* If requested, make 'decl' addressable. */
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+ && OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c))
+ {
+ gcc_checking_assert (DECL_P (decl));
+
+ gcc_checking_assert (!TREE_ADDRESSABLE (decl));
+ if (!make_addressable_vars)
+ make_addressable_vars = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (make_addressable_vars, DECL_UID (decl));
+ TREE_ADDRESSABLE (decl) = 1;
+
+ /* Done. */
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (c) = 0;
+ }
/* Global variables with "omp declare target" attribute
don't need to be copied, the receiver side will use them
directly. However, global variables with "omp declare target link"
@@ -2371,11 +2405,11 @@ finish_taskreg_scan (omp_context *ctx)
if (ctx->record_type == NULL_TREE)
return;
- /* If any task_shared_vars were needed, verify all
+ /* If any make_addressable_vars were needed, verify all
OMP_CLAUSE_SHARED clauses on GIMPLE_OMP_{PARALLEL,TASK,TEAMS}
statements if use_pointer_for_field hasn't changed
because of that. If it did, update field types now. */
- if (task_shared_vars)
+ if (make_addressable_vars)
{
tree c;
@@ -2390,7 +2424,7 @@ finish_taskreg_scan (omp_context *ctx)
the receiver side will use them directly. */
if (is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx)))
continue;
- if (!bitmap_bit_p (task_shared_vars, DECL_UID (decl))
+ if (!bitmap_bit_p (make_addressable_vars, DECL_UID (decl))
|| !use_pointer_for_field (decl, ctx))
continue;
tree field = lookup_field (decl, ctx);
@@ -14040,7 +14074,7 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
/* Callback for lower_omp_1. Return non-NULL if *tp needs to be
regimplified. If DATA is non-NULL, lower_omp_1 is outside
- of OMP context, but with task_shared_vars set. */
+ of OMP context, but with make_addressable_vars set. */
static tree
lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
@@ -14054,9 +14088,9 @@ lower_omp_regimplify_p (tree *tp, int *walk_subtrees,
&& DECL_HAS_VALUE_EXPR_P (t))
return t;
- if (task_shared_vars
+ if (make_addressable_vars
&& DECL_P (t)
- && bitmap_bit_p (task_shared_vars, DECL_UID (t)))
+ && bitmap_bit_p (make_addressable_vars, DECL_UID (t)))
return t;
/* If a global variable has been privatized, TREE_CONSTANT on
@@ -14141,7 +14175,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (gimple_has_location (stmt))
input_location = gimple_location (stmt);
- if (task_shared_vars)
+ if (make_addressable_vars)
memset (&wi, '\0', sizeof (wi));
/* If we have issued syntax errors, avoid doing any heavy lifting.
@@ -14158,7 +14192,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
case GIMPLE_COND:
{
gcond *cond_stmt = as_a <gcond *> (stmt);
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& (walk_tree (gimple_cond_lhs_ptr (cond_stmt),
lower_omp_regimplify_p,
ctx ? NULL : &wi, NULL)
@@ -14250,7 +14284,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
lower_omp_critical (gsi_p, ctx);
break;
case GIMPLE_OMP_ATOMIC_LOAD:
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& walk_tree (gimple_omp_atomic_load_rhs_ptr (
as_a <gomp_atomic_load *> (stmt)),
lower_omp_regimplify_p, ctx ? NULL : &wi, NULL))
@@ -14371,7 +14405,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
default:
regimplify:
- if ((ctx || task_shared_vars)
+ if ((ctx || make_addressable_vars)
&& walk_gimple_op (stmt, lower_omp_regimplify_p,
ctx ? NULL : &wi))
{
@@ -14435,10 +14469,10 @@ execute_lower_omp (void)
if (all_contexts->root)
{
- if (task_shared_vars)
+ if (make_addressable_vars)
push_gimplify_context ();
lower_omp (&body, NULL);
- if (task_shared_vars)
+ if (make_addressable_vars)
pop_gimplify_context (NULL);
}
@@ -14447,7 +14481,7 @@ execute_lower_omp (void)
splay_tree_delete (all_contexts);
all_contexts = NULL;
}
- BITMAP_FREE (task_shared_vars);
+ BITMAP_FREE (make_addressable_vars);
BITMAP_FREE (global_nonaddressable_vars);
/* If current function is a method, remove artificial dummy VAR_DECL created
--- gcc/omp-oacc-kernels-decompose.cc
+++ gcc/omp-oacc-kernels-decompose.cc
@@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
prev_mapped_var = v;
/* See <https://gcc.gnu.org/PR100280>. */
- TREE_ADDRESSABLE (v) = 1;
+ if (!TREE_ADDRESSABLE (v))
+ {
+ /* Request that OMP lowering make 'v' addressable. */
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
+ }
}
}
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -1145,6 +1145,9 @@ struct GTY(()) tree_base {
PREDICT_EXPR_OUTCOME in
PREDICT_EXPR
+ OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE in
+ OMP_CLAUSE
+
static_flag:
TREE_STATIC in
--- gcc/tree.h
+++ gcc/tree.h
@@ -1695,6 +1695,11 @@ class auto_suppress_location_wrappers
#define OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag)
+/* Flag that 'OMP_CLAUSE_DECL (NODE)' is to be made addressable during OMP
+ lowering. */
+#define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
+ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.addressable_flag)
+
/* True on an OMP_CLAUSE_USE_DEVICE_PTR with an OpenACC 'if_present'
clause. */
#define OMP_CLAUSE_USE_DEVICE_PTR_IF_PRESENT(NODE) \
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
On Tue, Mar 01, 2022 at 05:46:20PM +0100, Thomas Schwinge wrote:
> OK to proceed in this way?
With a suitable ChangeLog entry and one nit fixed yes.
> --- gcc/omp-low.cc
> +++ gcc/omp-low.cc
> @@ -188,7 +188,7 @@ struct omp_context
> static splay_tree all_contexts;
> static int taskreg_nesting_level;
> static int target_nesting_level;
> -static bitmap task_shared_vars;
> +static bitmap make_addressable_vars;
> static bitmap global_nonaddressable_vars;
> static vec<omp_context *> taskreg_contexts;
> static vec<gomp_task *> task_cpyfns;
> @@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
> /* Taking address of OUTER in lower_send_shared_vars
> might need regimplification of everything that uses the
> variable. */
> - if (!task_shared_vars)
> - task_shared_vars = BITMAP_ALLOC (NULL);
> - bitmap_set_bit (task_shared_vars, DECL_UID (outer));
> + if (!make_addressable_vars)
> + make_addressable_vars = BITMAP_ALLOC (NULL);
> + bitmap_set_bit (make_addressable_vars, DECL_UID (outer));
Has the MUA replaced tabs with spaces?
> --- gcc/omp-oacc-kernels-decompose.cc
> +++ gcc/omp-oacc-kernels-decompose.cc
> @@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> prev_mapped_var = v;
>
> /* See <https://gcc.gnu.org/PR100280>. */
> - TREE_ADDRESSABLE (v) = 1;
> + if (!TREE_ADDRESSABLE (v))
> + {
> + /* Request that OMP lowering make 'v' addressable. */
> + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
> + }
That is a single statement body, so shouldn't have {}s around it.
Jakub
From 9b32c1669aad5459dd053424f9967011348add83 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 16 Dec 2021 22:02:37 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
13073 else if (is_gimple_reg (var))
13074 {
13075 gcc_assert (offloaded);
13076 tree avar = create_tmp_var (TREE_TYPE (var));
13077 mark_addressable (avar);
..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial
to make work, so let's just avoid this case.
gcc/
PR middle-end/100280
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Mark variables used in synthesized data clauses as addressable.
gcc/testsuite/
PR middle-end/100280
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
'--param openacc-kernels=decompose'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Update.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
'--param openacc-kernels=decompose'.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
libgomp/
PR middle-end/100280
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Update.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
Suggested-by: Julian Brown <julian@codesourcery.com>
---
gcc/omp-oacc-kernels-decompose.cc | 6 +-
.../goacc/classify-kernels-parloops.c | 41 +++++++
...classify-kernels-unparallelized-parloops.c | 45 +++++++
.../goacc/classify-kernels-unparallelized.c | 5 +-
.../c-c++-common/goacc/classify-kernels.c | 5 +-
.../c-c++-common/goacc/kernels-decompose-2.c | 16 ++-
.../goacc/kernels-decompose-ice-1.c | 114 ------------------
.../goacc/kernels-decompose-ice-2.c | 22 ----
.../goacc/kernels-decompose-pr100280-1.c | 19 +++
.../goacc/classify-kernels-parloops.f95 | 43 +++++++
...assify-kernels-unparallelized-parloops.f95 | 47 ++++++++
.../goacc/classify-kernels-unparallelized.f95 | 5 +-
.../gfortran.dg/goacc/classify-kernels.f95 | 5 +-
.../declare-vla-kernels-decompose-ice-1.c | 2 +-
.../libgomp.oacc-c-c++-common/f-asyncwait-1.c | 53 ++++----
.../kernels-decompose-1.c | 6 +-
16 files changed, 264 insertions(+), 170 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100280-1.c
create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95
create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
@@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
/* If INNER_BIND_VARS holds variables, build an OpenACC data region with
location LOC containing BODY and having 'create (var)' clauses for each
- variable. If INNER_CLEANUP is present, add a try-finally statement with
+ variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
+ If INNER_CLEANUP is present, add a try-finally statement with
this cleanup code in the finally block. Return the new data region, or
the original BODY if no data region was needed. */
@@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
inner_data_clauses = new_clause;
prev_mapped_var = v;
+
+ /* See <https://gcc.gnu.org/PR100280>. */
+ TREE_ADDRESSABLE (v) = 1;
}
}
new file mode 100644
@@ -0,0 +1,41 @@
+/* Check offloaded function's attributes and classification for OpenACC
+ kernels. */
+
+/* { dg-additional-options "--param openacc-kernels=parloops" } */
+
+/* { dg-additional-options "-O2" }
+ { dg-additional-options "-fopt-info-optimized-omp" }
+ { dg-additional-options "-fdump-tree-ompexp" }
+ { dg-additional-options "-fdump-tree-parloops1-all" }
+ { dg-additional-options "-fdump-tree-oaccloops" } */
+
+/* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+ aspects of that functionality. */
+
+#define N 1024
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+ for (unsigned int i = 0; i < N; i++)
+ c[i] = a[i] + b[i];
+}
+
+/* Check the offloaded function's attributes.
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that exactly one OpenACC kernels construct is analyzed, and that it
+ can be parallelized.
+ { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+ { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+ always be 1 x 1 x 1 for non-offloading compilation).
+ { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
+ { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
new file mode 100644
@@ -0,0 +1,45 @@
+/* Check offloaded function's attributes and classification for unparallelized
+ OpenACC kernels. */
+
+/* { dg-additional-options "--param openacc-kernels=parloops" } */
+
+/* { dg-additional-options "-O2" }
+ { dg-additional-options "-fopt-info-optimized-omp" }
+ { dg-additional-options "-fdump-tree-ompexp" }
+ { dg-additional-options "-fdump-tree-parloops1-all" }
+ { dg-additional-options "-fdump-tree-oaccloops" } */
+
+/* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+ aspects of that functionality. */
+
+#define N 1024
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+/* An "extern"al mapping of loop iterations/array indices makes the loop
+ unparallelizable. */
+extern unsigned int f (unsigned int);
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ for (unsigned int i = 0; i < N; i++)
+ c[i] = a[f (i)] + b[f (i)];
+}
+
+/* Check the offloaded function's attributes.
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that exactly one OpenACC kernels construct is analyzed, and that it
+ can't be parallelized.
+ { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+ { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+ always be 1 x 1 x 1 for non-offloading compilation).
+ { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
+ { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
+ { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
@@ -1,8 +1,10 @@
/* Check offloaded function's attributes and classification for unparallelized
OpenACC kernels. */
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
/* { dg-additional-options "-O2" }
- { dg-additional-options "-fopt-info-optimized-omp" }
+ { dg-additional-options "-fopt-info-all-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
{ dg-additional-options "-fdump-tree-oaccloops" } */
@@ -23,6 +25,7 @@ extern unsigned int f (unsigned int);
void KERNELS ()
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
for (unsigned int i = 0; i < N; i++)
c[i] = a[f (i)] + b[f (i)];
}
@@ -1,8 +1,10 @@
/* Check offloaded function's attributes and classification for OpenACC
kernels. */
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
/* { dg-additional-options "-O2" }
- { dg-additional-options "-fopt-info-optimized-omp" }
+ { dg-additional-options "-fopt-info-all-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
{ dg-additional-options "-fdump-tree-oaccloops" } */
@@ -19,6 +21,7 @@ extern unsigned int *__restrict c;
void KERNELS ()
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
for (unsigned int i = 0; i < N; i++)
c[i] = a[i] + b[i];
}
@@ -55,7 +55,7 @@ main ()
;
}
- { /*TODO Instead of using 'for (int i = 0; [...])', move 'int i' outside, to work around for ICE detailed in 'kernels-decompose-ice-1.c'. */
+ {
int i;
#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
/* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_compute$c_compute } */
@@ -64,6 +64,20 @@ main ()
a[i] = 0;
}
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+ /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } l_compute$c_compute } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
+ {
+ int i;
+ }
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
+ /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+ for (int i = 0; i < N; i++)
+ a[i] = 0;
+
#pragma acc kernels loop /* { dg-line l_loop_i[incr c_loop_i] } */
/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
deleted file mode 100644
@@ -1,114 +0,0 @@
-/* Test OpenACC 'kernels' construct decomposition. */
-
-/* { dg-additional-options "-fopt-info-omp-all" } */
-
-/* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
-/* { dg-ice "TODO" }
- { dg-prune-output "during GIMPLE pass: omplower" } */
-
-/* { dg-additional-options "--param=openacc-privatization=noisy" } */
-
-/* Reduced from 'kernels-decompose-2.c'.
- (Hopefully) similar instances:
- - 'kernels-decompose-ice-2.c'
- - 'libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c'
- - 'libgomp.oacc-c-c++-common/kernels-decompose-1.c'
-*/
-
-int
-main ()
-{
-#define N 10
-
-#pragma acc kernels
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
- /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
- for (int i = 0; i < N; i++)
- ;
-
- return 0;
-}
-
-/*
- In 'gimple' we've got:
-
- main ()
- {
- int D.2087;
-
- {
- int a[10];
-
- try
- {
- #pragma omp target oacc_kernels map(tofrom:a [len: 40])
- {
- {
- int i;
-
- i = 0;
- goto <D.2085>;
- [...]
-
- ..., which in 'omp_oacc_kernels_decompose' we turn into:
-
- main ()
- {
- int D.2087;
-
- {
- int a[10];
-
- try
- {
- #pragma omp target oacc_data_kernels map(tofrom:a [len: 40])
- {
- try
- {
- {
- int i;
-
- #pragma omp target oacc_data_kernels map(alloc:i [len: 4])
- {
- try
- {
- {
- #pragma omp target oacc_kernels async(-1) map(force_present:i [len: 4]) map(force_present:a [len: 40])
- {
- i = 0;
- goto <D.2085>;
- [...]
-
- ..., which results in ICE in:
-
- #1 0x0000000000d2247b in lower_omp_target (gsi_p=gsi_p@entry=0x7fffffffbc90, ctx=ctx@entry=0x2c994c0) at [...]/gcc/omp-low.c:11981
- 11981 gcc_assert (offloaded);
- (gdb) list
- 11976 talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
- 11977 gimplify_assign (x, var, &ilist);
- 11978 }
- 11979 else if (is_gimple_reg (var))
- 11980 {
- 11981 gcc_assert (offloaded);
- 11982 tree avar = create_tmp_var (TREE_TYPE (var));
- 11983 mark_addressable (avar);
- 11984 enum gomp_map_kind map_kind = OMP_CLAUSE_MAP_KIND (c);
- 11985 if (GOMP_MAP_COPY_TO_P (map_kind)
- (gdb) call debug_tree(var)
- <var_decl 0x7ffff7feebd0 i
- type <integer_type 0x7ffff67be5e8 int sizes-gimplified public SI
- size <integer_cst 0x7ffff67a5f18 constant 32>
- unit-size <integer_cst 0x7ffff67a5f30 constant 4>
- align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff67be5e8 precision:32 min <integer_cst 0x7ffff67a5ed0 -2147483648> max <integer_cst 0x7ffff67a5ee8 2147483647>
- pointer_to_this <pointer_type 0x7ffff67c69d8>>
- used read SI [...]:15:12 size <integer_cst 0x7ffff67a5f18 32> unit-size <integer_cst 0x7ffff67a5f30 4>
- align:32 warn_if_not_align:0 context <function_decl 0x7ffff68eea00 main>>
-
- Just defusing the 'assert' is not sufficient:
-
- libgomp: present clause: !acc_is_present (0x7ffe29cba3ec, 4 (0x4))
-
- TODO Can't the 'omp_oacc_kernels_decompose' transformation be much simpler, such that we avoid the intermediate 'data' if we've got just one compute construct inside it?
- TODO But it's not clear if that'd just resolve one simple instance of the general problem?
-
-*/
deleted file mode 100644
@@ -1,22 +0,0 @@
-/* Test OpenACC 'kernels' construct decomposition. */
-
-/* { dg-additional-options "-fopt-info-omp-all" } */
-
-/* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
-/* { dg-ice "TODO" }
- { dg-prune-output "during GIMPLE pass: omplower" } */
-
-/* { dg-additional-options "--param=openacc-privatization=noisy" } */
-
-/* Reduced from 'kernels-decompose-ice-1.c'. */
-
-int
-main ()
-{
-#pragma acc kernels
- /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .-1 } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
- {
- int i;
- }
-}
new file mode 100644
@@ -0,0 +1,19 @@
+/* Reduced from 'libgomp.oacc-c-c++-common/kernels-loop-2.c'. */
+
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+void
+foo (void) /* { dg-line l_f_1 } */
+{
+#pragma acc kernels /* { dg-line l_k_1 } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_k_1 } */
+ /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_k_1 } */
+ /* { dg-bogus {note: beginning 'parloops' part in OpenACC 'kernels' region} {TODO location} { xfail *-*-* } l_f_1 }
+ { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} TODO { xfail *-*-* } .+1 } */
+ for (int i;;)
+ ;
+}
new file mode 100644
@@ -0,0 +1,43 @@
+! Check offloaded function's attributes and classification for OpenACC
+! kernels.
+
+! { dg-additional-options "--param openacc-kernels=parloops" }
+
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-fdump-tree-ompexp" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-oaccloops" }
+
+! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+! aspects of that functionality.
+
+program main
+ implicit none
+ integer, parameter :: n = 1024
+ integer, dimension (0:n-1) :: a, b, c
+ integer :: i
+
+ call setup(a, b)
+
+ !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+ do i = 0, n - 1
+ c(i) = a(i) + b(i)
+ end do
+ !$acc end kernels
+end program main
+
+! Check the offloaded function's attributes.
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+
+! Check that exactly one OpenACC kernels construct is analyzed, and that it
+! can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+
+! Check the offloaded function's classification and compute dimensions (will
+! always be 1 x 1 x 1 for non-offloading compilation).
+! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
new file mode 100644
@@ -0,0 +1,47 @@
+! Check offloaded function's attributes and classification for unparallelized
+! OpenACC kernels.
+
+! { dg-additional-options "--param openacc-kernels=parloops" }
+
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-fdump-tree-ompexp" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-oaccloops" }
+
+! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+! aspects of that functionality.
+
+program main
+ implicit none
+ integer, parameter :: n = 1024
+ integer, dimension (0:n-1) :: a, b, c
+ integer :: i
+
+ ! An "external" mapping of loop iterations/array indices makes the loop
+ ! unparallelizable.
+ integer, external :: f
+
+ call setup(a, b)
+
+ !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+ do i = 0, n - 1
+ c(i) = a(f (i)) + b(f (i))
+ end do
+ !$acc end kernels
+end program main
+
+! Check the offloaded function's attributes.
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+
+! Check that exactly one OpenACC kernels construct is analyzed, and that it
+! can't be parallelized.
+! { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } }
+
+! Check the offloaded function's classification and compute dimensions (will
+! always be 1 x 1 x 1 for non-offloading compilation).
+! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
@@ -1,8 +1,10 @@
! Check offloaded function's attributes and classification for unparallelized
! OpenACC kernels.
+! { dg-additional-options "--param openacc-kernels=decompose" }
+
! { dg-additional-options "-O2" }
-! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-fopt-info-all-omp" }
! { dg-additional-options "-fdump-tree-ompexp" }
! { dg-additional-options "-fdump-tree-parloops1-all" }
! { dg-additional-options "-fdump-tree-oaccloops" }
@@ -23,6 +25,7 @@ program main
call setup(a, b)
!$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+ ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
do i = 0, n - 1
c(i) = a(f (i)) + b(f (i))
end do
@@ -1,8 +1,10 @@
! Check offloaded function's attributes and classification for OpenACC
! kernels.
+! { dg-additional-options "--param openacc-kernels=decompose" }
+
! { dg-additional-options "-O2" }
-! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-fopt-info-all-omp" }
! { dg-additional-options "-fdump-tree-ompexp" }
! { dg-additional-options "-fdump-tree-parloops1-all" }
! { dg-additional-options "-fdump-tree-oaccloops" }
@@ -19,6 +21,7 @@ program main
call setup(a, b)
!$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+ ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
do i = 0, n - 1
c(i) = a(i) + b(i)
end do
@@ -1,5 +1,5 @@
/* { dg-additional-options "--param=openacc-kernels=decompose" } */
-/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
+/* ICE similar to PR100280, but not the same.
{ dg-ice "TODO" }
TODO { dg-prune-output "during GIMPLE pass: omplower" }
TODO { dg-do link } */
@@ -3,7 +3,7 @@
/* Based on '../libgomp.oacc-fortran/asyncwait-1.f90'. */
/* { dg-additional-options "--param=openacc-kernels=decompose" } */
-/* TODO To avoid PR100280 ICE { dg-additional-options "--param=openacc-kernels=parloops" } */
+/* { dg-xfail-run-if TODO { openacc_radeon_accel_selected } } */
/* { dg-additional-options "-fopt-info-all-omp" }
{ dg-additional-options "-foffload=-fopt-info-all-omp" } */
@@ -202,11 +202,12 @@ main (void)
#pragma acc data copy (a[0:N]) copy (b[0:N])
{
-#pragma acc kernels async /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels async
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
b[i] = a[i];
@@ -229,11 +230,12 @@ main (void)
#pragma acc data copy (a[0:N]) copy (b[0:N])
{
-#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels async (1)
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
b[i] = a[i];
@@ -259,24 +261,27 @@ main (void)
#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N])
{
#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } */
/* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
{ dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
for (int i = 0; i < N; ++i)
b[i] = (a[i] * a[i] * a[i]) / a[i];
#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } */
/* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
{ dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
for (int i = 0; i < N; ++i)
c[i] = (a[i] * 4) / a[i];
-#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels async (1)
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
@@ -307,33 +312,37 @@ main (void)
#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N])
{
#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+ /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } */
/* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
{ dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
+ /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
for (int i = 0; i < N; ++i)
b[i] = (a[i] * a[i] * a[i]) / a[i];
-#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels async (1)
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
c[i] = (a[i] * 4) / a[i];
-#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels async (1)
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
-#pragma acc kernels wait (1) async (1) /* { dg-line l_compute[incr c_compute] } */
- /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
- /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc kernels wait (1) async (1)
#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+ /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+ /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
for (int i = 0; i < N; ++i)
e[i] = a[i] + b[i] + c[i] + d[i];
@@ -32,11 +32,7 @@ int main()
{
/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
int c = 234;
- /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute }
- { dg-note {variable 'c\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
-
- /*TODO Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'. */
- (volatile int *) &c;
+ /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } */
#pragma acc loop independent gang /* { dg-line l_loop_i[incr c_loop_i] } */
/* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
--
2.34.1