From patchwork Wed Nov 17 16:03:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47815
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 087293858D28
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:10:00 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id D9DC53858432
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:03:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D9DC53858432
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68604012"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa2.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:03:42 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 01/22] Fortran: delinearize multi-dimensional
 array accesses
Date: Wed, 17 Nov 2021 17:03:10 +0100
Message-ID: <20211117160330.20029-2-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

From: Sandra Loosemore <sandra@codesourcery.com>

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds, arranged so
that the constant and loop-invariant parts of the computation can be
CSEd.  Unfortunately, this representation interferes with
Graphite-based loop optimizations.  By the time the loop optimizations
run, it is difficult to recover the original multi-dimensional form of
the access, because parts of it have already been optimized away or
transformed into a form that is not easily recognizable.  It therefore
seems better to have the Fortran front end produce delinearized
accesses in the first place: a set of nested ARRAY_REFs, similar to
what the C and C++ front ends already generate.  This is a
long-standing problem that has been discussed previously, for example
in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.

        gcc/
        * expr.c (get_inner_reference): Handle NOP_EXPR like
        VIEW_CONVERT_EXPR.

        gcc/fortran/
        * lang.opt (-param=delinearize=): New.
        * trans-array.c (get_class_array_vptr): New, split from...
        (build_array_ref): ...here.
        (get_array_lbound, get_array_ubound): New, split from...
        (gfc_conv_array_ref): ...here.  Additional code refactoring
        plus support for delinearization of the array access.

        gcc/testsuite/
        * gfortran.dg/assumed_type_2.f90: Adjust patterns.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
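        * gfortran.dg/graphite/block-2.f: Disable ldist; remove xfail.
        * gfortran.dg/graphite/block-3.f90: Remove xfailed SCoP count check.
        * gfortran.dg/graphite/block-4.f90: Likewise.
        * gfortran.dg/graphite/id-9.f: Add xfailed dg-bogus for a
        spurious undefined-behavior diagnostic.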
        * gfortran.dg/inline_matmul_24.f90: Adjust patterns.
        * gfortran.dg/no_arg_check_2.f90: Likewise.
        * gfortran.dg/pr32921.f: Likewise.
        * gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus  <tobias@codesourcery.com>
---
 gcc/expr.c                                    |   1 +
 gcc/fortran/lang.opt                          |   4 +
 gcc/fortran/trans-array.c                     | 321 +++++++++++++-----
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90          |   1 -
 .../gfortran.dg/graphite/block-4.f90          |   1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f     |   2 +-
 .../gfortran.dg/inline_matmul_24.f90          |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f           |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f         |   2 +-
 13 files changed, 264 insertions(+), 95 deletions(-)

--
2.33.0

diff --git a/gcc/expr.c b/gcc/expr.c
index 21b7e96ed62e..c7ee800c4d4f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7539,6 +7539,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
          break;

        case VIEW_CONVERT_EXPR:
+       case NOP_EXPR:
          break;

        case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index dba333448c11..1548d56278a4 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index b7d949929722..3eb9a1778173 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
     }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
          && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
        vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
     }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+                 gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->lower[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_lbound (decl, n);
+}
+
+/* Similarly for the upper bound.  */
+static tree
+get_array_ubound (tree decl, int n, gfc_symbol *sym,
+                 gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->upper[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_ubound (decl, n);
+}
+

 /* Build an array reference.  se->expr already holds the array descriptor.
    This should be either a variable, indirect variable reference or component
    reference.  For arrays which do not have a descriptor, se->expr will be
    the data pointer.
-   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]*/
+
+   There are two strategies here.  In the traditional case, multidimensional
+   arrays are explicitly linearized into a one-dimensional array, with the
+   index computed as if by
+   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]
+
+   However, we can often get better code using the Graphite framework
+   and scalar evolutions in the middle end, which expect to see
+   multidimensional array accesses represented as nested ARRAY_REFs, similar
+   to what the C/C++ front ends produce.  Delinearization is controlled
+   by flag_delinearize_aref.  */

 void
 gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
@@ -3798,11 +3851,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
   tree tmp;
   tree stride;
   tree decl = NULL_TREE;
+  tree cooked_decl = NULL_TREE;
+  tree vptr = se->class_vptr;
   gfc_se indexse;
   gfc_se tmpse;
   gfc_symbol * sym = expr->symtree->n.sym;
   char *var_name = NULL;
+  tree aref = NULL_TREE;
+  tree atype = NULL_TREE;

+  /* Handle coarrays.  */
   if (ar->dimen == 0)
     {
       gcc_assert (ar->codimen || sym->attr.select_rank_temporary
@@ -3862,15 +3920,160 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
        }
     }

+  /* Per comments above, DECL is not always a declaration.  It may be
+     either a variable, indirect variable reference, or component
+     reference.  It may have array or pointer type, or it may be a
+     descriptor with RECORD_TYPE.  */
   decl = se->expr;
   if (IS_CLASS_ARRAY (sym) && sym->attr.dummy && ar->as->type != AS_DEFERRED)
     decl = sym->backend_decl;

-  cst_offset = offset = gfc_index_zero_node;
-  add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl));
+  /* A pointer array component can be detected from its field decl. Fix
+     the descriptor, mark the resulting variable decl and store it in
+     COOKED_DECL to pass to gfc_build_array_ref.  */
+  if (get_CFI_desc (sym, expr, &cooked_decl, ar))
+    cooked_decl = build_fold_indirect_ref_loc (input_location, cooked_decl);
+  if (!expr->ts.deferred && !sym->attr.codimension
+      && is_pointer_array (se->expr))
+    {
+      if (TREE_CODE (se->expr) == COMPONENT_REF)
+       cooked_decl = se->expr;
+      else if (TREE_CODE (se->expr) == INDIRECT_REF)
+       cooked_decl = TREE_OPERAND (se->expr, 0);
+      else
+       cooked_decl = se->expr;
+    }
+  else if (expr->ts.deferred
+          || (sym->ts.type == BT_CHARACTER
+              && sym->attr.select_type_temporary))
+    {
+      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+       {
+         cooked_decl = se->expr;
+         if (TREE_CODE (cooked_decl) == INDIRECT_REF)
+           cooked_decl = TREE_OPERAND (cooked_decl, 0);
+       }
+      else
+       cooked_decl = sym->backend_decl;
+    }
+  else if (sym->ts.type == BT_CLASS)
+    {
+      if (UNLIMITED_POLY (sym))
+       {
+         gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
+         gfc_init_se (&tmpse, NULL);
+         gfc_conv_expr (&tmpse, class_expr);
+         if (!se->class_vptr)
+           vptr = gfc_class_vptr_get (tmpse.expr);
+         gfc_free_expr (class_expr);
+         cooked_decl = tmpse.expr;
+       }
+      else
+       cooked_decl = NULL_TREE;
+    }
+
+  /* Find the base of the array; this normally has ARRAY_TYPE.  */
+  tree base = build_fold_indirect_ref_loc (input_location,
+                                          gfc_conv_array_data (se->expr));
+  tree type = TREE_TYPE (base);

-  /* Calculate the offsets from all the dimensions.  Make sure to associate
-     the final offset so that we form a chain of loop invariant summands.  */
+  /* Handle special cases, copied from gfc_build_array_ref.  After we get
+     through this, we know TYPE definitely is an ARRAY_TYPE.  */
+  if (GFC_ARRAY_TYPE_P (type) && GFC_TYPE_ARRAY_RANK (type) == 0)
+    {
+      gcc_assert (GFC_TYPE_ARRAY_CORANK (type) > 0);
+      se->expr = fold_convert (TYPE_MAIN_VARIANT (type), base);
+      return;
+    }
+  if (TREE_CODE (type) != ARRAY_TYPE)
+    {
+      gcc_assert (cooked_decl == NULL_TREE);
+      se->expr = base;
+      return;
+    }
+
+  /* Check for cases where we cannot delinearize.  */
+
+  bool delinearize = flag_delinearize_aref;
+
+  /* There is no point in trying to delinearize 1-dimensional arrays.  */
+  if (ar->dimen == 1)
+    delinearize = false;
+
+  if (delinearize
+      && (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))
+         || (DECL_P (se->expr)
+             && DECL_LANG_SPECIFIC (se->expr)
+             && GFC_DECL_SAVED_DESCRIPTOR (se->expr))))
+    {
+      /* Descriptor arrays that may not be contiguous cannot
+        be delinearized without using the stride in the descriptor,
+        which generally involves introducing a division operation.
+        That's unlikely to produce optimal code, so avoid doing it.  */
+      tree desc = se->expr;
+      if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+       desc = GFC_DECL_SAVED_DESCRIPTOR (se->expr);
+      tree tmptype = TREE_TYPE (desc);
+      if (POINTER_TYPE_P (tmptype))
+       tmptype = TREE_TYPE (tmptype);
+      enum gfc_array_kind akind = GFC_TYPE_ARRAY_AKIND (tmptype);
+      if (akind != GFC_ARRAY_ASSUMED_SHAPE_CONT
+         && akind != GFC_ARRAY_ASSUMED_RANK_CONT
+         && akind != GFC_ARRAY_ALLOCATABLE
+         && akind != GFC_ARRAY_POINTER_CONT)
+       delinearize = false;
+    }
+
+  /* See gfc_build_array_ref in trans.c.  If we have a cooked_decl or
+     vptr, then we most likely have to do pointer arithmetic using a
+     linearized array offset.  */
+  if (delinearize && cooked_decl)
+    delinearize = false;
+  else if (delinearize && get_class_array_vptr (se->expr, vptr))
+    delinearize = false;
+
+  if (!delinearize)
+    {
+      /* Initialize the offset from the array descriptor.  This accounts
+        for the array base being something other than zero.  */
+      cst_offset = offset = gfc_index_zero_node;
+      add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl));
+    }
+  else
+    {
+      /* If we are delinearizing, build up the nested array type using the
+        dimension information we have for each rank.  */
+      atype = TREE_TYPE (type);
+      for (n = 0; n < ar->dimen; n++)
+       {
+         /* We're working from the outermost nested array reference inward
+            in this step.  ATYPE is the element type for the access in
+            this rank; build the new array type based on the bounds
+            information and store it back into ATYPE for the next rank's
+            processing.  */
+         tree lbound = get_array_lbound (decl, n, sym, ar, se);
+         tree ubound = get_array_ubound (decl, n, sym, ar, se);
+         tree dimen = build_range_type (TREE_TYPE (lbound),
+                                        lbound, ubound);
+         atype = build_array_type (atype, dimen);
+
+         /* Emit a DECL_EXPR for the array type so the gimplification of
+            its type sizes works correctly.  */
+         if (! TYPE_NAME (atype))
+           TYPE_NAME (atype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+                                           NULL_TREE, atype);
+         gfc_add_expr_to_block (&se->pre,
+                                build1 (DECL_EXPR, atype,
+                                        TYPE_NAME (atype)));
+       }
+
+      /* Cast base to the innermost array type.  */
+      if (DECL_P (base))
+       TREE_ADDRESSABLE (base) = 1;
+      aref = build1 (NOP_EXPR, atype, base);
+    }
+
+  /* Process indices in reverse order.  */
   for (n = ar->dimen - 1; n >= 0; n--)
     {
       /* Calculate the index for this dimension.  */
@@ -3888,16 +4091,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
          indexse.expr = save_expr (indexse.expr);

          /* Lower bound.  */
-         tmp = gfc_conv_array_lbound (decl, n);
-         if (sym->attr.temporary)
-           {
-             gfc_init_se (&tmpse, se);
-             gfc_conv_expr_type (&tmpse, ar->as->lower[n],
-                                 gfc_array_index_type);
-             gfc_add_block_to_block (&se->pre, &tmpse.pre);
-             tmp = tmpse.expr;
-           }
-
+         tmp = get_array_lbound (decl, n, sym, ar, se);
          cond = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
                                  indexse.expr, tmp);
          msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3912,16 +4106,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
             arrays.  */
          if (n < ar->dimen - 1 || ar->as->type != AS_ASSUMED_SIZE)
            {
-             tmp = gfc_conv_array_ubound (decl, n);
-             if (sym->attr.temporary)
-               {
-                 gfc_init_se (&tmpse, se);
-                 gfc_conv_expr_type (&tmpse, ar->as->upper[n],
-                                     gfc_array_index_type);
-                 gfc_add_block_to_block (&se->pre, &tmpse.pre);
-                 tmp = tmpse.expr;
-               }
-
+             tmp = get_array_ubound (decl, n, sym, ar, se);
              cond = fold_build2_loc (input_location, GT_EXPR,
                                      logical_type_node, indexse.expr, tmp);
              msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3934,65 +4119,41 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
            }
        }

-      /* Multiply the index by the stride.  */
-      stride = gfc_conv_array_stride (decl, n);
-      tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-                            indexse.expr, stride);
-
-      /* And add it to the total.  */
-      add_to_offset (&cst_offset, &offset, tmp);
-    }
-
-  if (!integer_zerop (cst_offset))
-    offset = fold_build2_loc (input_location, PLUS_EXPR,
-                             gfc_array_index_type, offset, cst_offset);
-
-  /* A pointer array component can be detected from its field decl. Fix
-     the descriptor, mark the resulting variable decl and pass it to
-     build_array_ref.  */
-  decl = NULL_TREE;
-  if (get_CFI_desc (sym, expr, &decl, ar))
-    decl = build_fold_indirect_ref_loc (input_location, decl);
-  if (!expr->ts.deferred && !sym->attr.codimension
-      && is_pointer_array (se->expr))
-    {
-      if (TREE_CODE (se->expr) == COMPONENT_REF)
-       decl = se->expr;
-      else if (TREE_CODE (se->expr) == INDIRECT_REF)
-       decl = TREE_OPERAND (se->expr, 0);
-      else
-       decl = se->expr;
-    }
-  else if (expr->ts.deferred
-          || (sym->ts.type == BT_CHARACTER
-              && sym->attr.select_type_temporary))
-    {
-      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+      if (!delinearize)
        {
-         decl = se->expr;
-         if (TREE_CODE (decl) == INDIRECT_REF)
-           decl = TREE_OPERAND (decl, 0);
+         /* Multiply the index by the stride.  */
+         stride = gfc_conv_array_stride (decl, n);
+         tmp = fold_build2_loc (input_location, MULT_EXPR,
+                                gfc_array_index_type,
+                                indexse.expr, stride);
+
+         /* And add it to the total.  */
+         add_to_offset (&cst_offset, &offset, tmp);
        }
       else
-       decl = sym->backend_decl;
-    }
-  else if (sym->ts.type == BT_CLASS)
-    {
-      if (UNLIMITED_POLY (sym))
        {
-         gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
-         gfc_init_se (&tmpse, NULL);
-         gfc_conv_expr (&tmpse, class_expr);
-         if (!se->class_vptr)
-           se->class_vptr = gfc_class_vptr_get (tmpse.expr);
-         gfc_free_expr (class_expr);
-         decl = tmpse.expr;
+         /* Peel off a layer of array nesting from ATYPE to
+            get the result type of the new ARRAY_REF.  */
+         atype = TREE_TYPE (atype);
+         aref = build4 (ARRAY_REF, atype, aref, indexse.expr,
+                        NULL_TREE, NULL_TREE);
        }
-      else
-       decl = NULL_TREE;
     }

-  se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr);
+  if (!delinearize)
+    {
+      /* Build a linearized array reference using the offset from all
+        dimensions.  */
+      if (!integer_zerop (cst_offset))
+       offset = fold_build2_loc (input_location, PLUS_EXPR,
+                                 gfc_array_index_type, offset, cst_offset);
+      se->class_vptr = vptr;
+      vptr = get_class_array_vptr (se->expr, vptr);
+      se->expr = gfc_build_array_ref (base, offset, cooked_decl, vptr);
+    }
+  else
+    /* Return the outermost ARRAY_REF we already built.  */
+    se->expr = aref;
 }


diff --git a/gcc/testsuite/gfortran.dg/assumed_type_2.f90 b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
index 5d3cd7eaece9..07be87ef1eb6 100644
--- a/gcc/testsuite/gfortran.dg/assumed_type_2.f90
+++ b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
@@ -147,12 +147,12 @@ end

 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }

-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index b2eb98959f80..23e64d29ab3e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -9,7 +9,7 @@ program main
    integer :: a(100,100), b(100,100)
    integer :: i, j, d

-   !$acc kernels ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+   !$acc kernels ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
    do i=1,100
      do j=1,100
        a(i,j) = 1
diff --git a/gcc/testsuite/gfortran.dg/graphite/block-2.f b/gcc/testsuite/gfortran.dg/graphite/block-2.f
index bea8ddeb8267..266da378c5d9 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-2.f
+++ b/gcc/testsuite/gfortran.dg/graphite/block-2.f
@@ -1,5 +1,11 @@
 ! { dg-do compile }
 ! { dg-additional-options "-std=legacy" }
+
+! ldist introduces a __builtin_memset for the first loop and hence
+! breaks the testcase's assumption regarding the number of SCoPs
+! because Graphite cannot deal with the call.
+! { dg-additional-options "-fdisable-tree-ldist" }
+
       SUBROUTINE MATRIX_MUL_UNROLLED (A, B, C, L, M, N)
       DIMENSION A(L,M), B(M,N), C(L,N)

@@ -18,5 +24,4 @@
       RETURN
       END

-! Disabled for now as it requires delinearization.
-! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" } }
diff --git a/gcc/testsuite/gfortran.dg/graphite/block-3.f90 b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
index 452de7349050..60c7952c3654 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-3.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
@@ -12,6 +12,5 @@ enddo

 end subroutine matrix_multiply

-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }

diff --git a/gcc/testsuite/gfortran.dg/graphite/block-4.f90 b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
index 42af5b62444e..1bc7a1bb2ae1 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-4.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
@@ -15,6 +15,5 @@ enddo

 end subroutine matrix_multiply

-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }

diff --git a/gcc/testsuite/gfortran.dg/graphite/id-9.f b/gcc/testsuite/gfortran.dg/graphite/id-9.f
index c93937088972..a43f02bdd42c 100644
--- a/gcc/testsuite/gfortran.dg/graphite/id-9.f
+++ b/gcc/testsuite/gfortran.dg/graphite/id-9.f
@@ -8,7 +8,7 @@
                   do l=1,3
                      do k=1,l
                      enddo
-                     bar(k,l)=bar(k,l)+(v3b-1.d0)
+                     bar(k,l)=bar(k,l)+(v3b-1.d0) ! { dg-bogus ".*iteration 2 invokes undefined behavior" "TODO-kernels Caused by delinearization patch" { xfail *-*-* }   }
                   enddo
             enddo
             do m=1,ne
diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
index 3168d5f10064..8d84f3cdb01b 100644
--- a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
+++ b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
@@ -39,4 +39,4 @@ program testMATMUL
       call abort()
     end if
 end program testMATMUL
-! { dg-final { scan-tree-dump-times "gamma5\\\[__var_1_do \\* 4 \\+ __var_2_do\\\]|gamma5\\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
+! { dg-final { scan-tree-dump-times "gamma5.*\\\[NON_LVALUE_EXPR <__var_1_do>\\\]\\\[NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
index 3570b9719ebb..0900dd82646f 100644
--- a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
+++ b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
@@ -129,12 +129,12 @@ end

 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }

-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f
index 0661208edde5..853438609c43 100644
--- a/gcc/testsuite/gfortran.dg/pr32921.f
+++ b/gcc/testsuite/gfortran.dg/pr32921.f
@@ -45,4 +45,4 @@

       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { scan-tree-dump-times "ubound" 4 "lim2" } }
diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f b/gcc/testsuite/gfortran.dg/reassoc_4.f
index fdcb46e835cf..2368b76aecb2 100644
--- a/gcc/testsuite/gfortran.dg/reassoc_4.f
+++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200" }
+! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200 --param delinearize=0" }
       subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
       integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
       real*8 w(3,3),vo(3,3),anisox(3,3,3,3),s(60,60),weight

From patchwork Wed Nov 17 16:03:11 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47817
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id EC217385841C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:11:08 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 3F754385800C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:03:50 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3F754385800C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68604015"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa2.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:03:46 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 02/22] openacc: Move pass_oacc_device_lower
 after pass_graphite
Date: Wed, 17 Nov 2021 17:03:11 +0100
Message-ID: <20211117160330.20029-3-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)

The OpenACC device lowering pass must run after the Graphite pass so
that Graphite can be used for the automatic parallelization of kernels
regions in the future.  Experimentation has shown that, performance-wise,
it is best to run pass_oacc_device_lower, together with the related
passes pass_oacc_loop_designation and pass_oacc_gimple_workers, early
in pass_tree_loop, shortly after pass_graphite, at least as long as the
other tree loop passes are not adjusted.  In particular, device lowering
should happen before pass_vectorize to enable vectorization, which is
crucial for AMD GCN offloading.  To bring the loops contained in the
offloading functions into the shape expected by the loop vectorizer, we
have to make sure that some passes that previously were executed only
once, before pass_tree_loop, are also executed on the offloading
functions after device lowering.  To ensure that pass_oacc_device_lower
still executes when pass_tree_loop does not (no loops, or no
optimization), we add two further copies of the pass to the pipeline:
one in pass_tree_no_loop, which runs if there are no loops, and one in
the new pass_no_loop_optimizations, which runs at -O0 and -Og.
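The claim that exactly one copy of the device lowering passes runs for each function can be checked with a small toy model.  This is a sketch, not GCC code: PipelineConfig, DeviceLowerSite, select_site and the other names are hypothetical.  The gates paraphrase GCC's: pass_all_optimizations (which contains pass_tree_loop and pass_tree_no_loop) runs for -O1 and above except -Og, pass_tree_loop additionally needs -ftree-loop-optimize and at least one loop besides the function body, pass_tree_no_loop is its complement, and pass_no_loop_optimizations uses the '!optimize || optimize_debug' gate from this patch.

```cpp
#include <cassert>

// Hypothetical description of one compilation of one function.
struct PipelineConfig
{
  int optimize;                  // -O level; 0 means -O0
  bool optimize_debug;           // true for -Og
  bool flag_tree_loop_optimize;  // cleared by -fno-tree-loop-optimize
  bool has_loops;                // loops besides the function body itself
};

// The three places where this patch schedules pass_oacc_device_lower.
enum class DeviceLowerSite
{
  TreeLoop,            // inside pass_tree_loop
  TreeNoLoop,          // inside pass_tree_no_loop (oaccdevlow2 in the tests)
  NoLoopOptimizations  // inside pass_no_loop_optimizations (oaccdevlow3)
};

// Counts how many of the three sites would execute for CFG.
int count_running_sites (const PipelineConfig &cfg)
{
  bool all_optimizations = cfg.optimize >= 1 && !cfg.optimize_debug;
  bool tree_loop = cfg.flag_tree_loop_optimize && cfg.has_loops;
  int n = 0;
  if (all_optimizations && tree_loop)
    n++;
  if (all_optimizations && !tree_loop)
    n++;
  if (!cfg.optimize || cfg.optimize_debug)
    n++;
  return n;
}

// Returns the site that executes for CFG.
DeviceLowerSite select_site (const PipelineConfig &cfg)
{
  if (!cfg.optimize || cfg.optimize_debug)
    return DeviceLowerSite::NoLoopOptimizations;
  if (cfg.flag_tree_loop_optimize && cfg.has_loops)
    return DeviceLowerSite::TreeLoop;
  return DeviceLowerSite::TreeNoLoop;
}

// Exhaustively checks that the gates are mutually exclusive and complete.
void check_exactly_one_site ()
{
  for (int opt = 0; opt <= 3; opt++)
    for (int dbg = 0; dbg <= 1; dbg++)
      for (int tlo = 0; tlo <= 1; tlo++)
        for (int loops = 0; loops <= 1; loops++)
          {
            PipelineConfig cfg = { opt, dbg != 0, tlo != 0, loops != 0 };
            assert (count_running_sites (cfg) == 1);
          }
}
```

In this model, -O2 on a loop-free function selects TreeNoLoop, and both -O0 and -Og select NoLoopOptimizations, which matches the dump instances scanned by the new device-lowering tests.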

gcc/ChangeLog:

        * omp-general.c (oacc_get_fn_dim_size): Return 0 on
        missing "dims".
        * omp-offload.c (pass_oacc_loop_designation::clone): New
        member function.
        (pass_oacc_gimple_workers::clone): Likewise.
        (pass_oacc_device_lower::clone): Likewise.
        * passes.c (pass_data_no_loop_optimizations): New pass_data.
        (class pass_no_loop_optimizations): New pass.
        (make_pass_no_loop_optimizations): New function.
        * passes.def: Move pass_oacc_{loop_designation,
        gimple_workers, device_lower} into tree_loop, and add
        copies to pass_tree_no_loop and to new
        pass_no_loop_optimizations.  Add copies of passes pass_lower_complex,
        pass_ccp, pass_complete_unrolli, pass_backprop,
        pass_phiprop, pass_fix_loops after the OpenACC passes
        in pass_tree_loop.
        * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
        New member function.
        (pass_complete_unrolli::clone): Likewise.
        * tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
        (pass_tree_loop_init::clone): Likewise.
        (pass_tree_loop_done::clone): Likewise.
        * tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
        expected output to pass name changes due to the pass
        reordering and cloning.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

        * gcc.dg/goacc/loop-processing-1.c: Adjust expected output
        to pass name changes due to the pass reordering and cloning.
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
        * c-c++-common/goacc/classify-kernels.c: Likewise.
        * c-c++-common/goacc/classify-parallel.c: Likewise.
        * c-c++-common/goacc/classify-routine.c: Likewise.
        * c-c++-common/goacc/routine-nohost-1.c: Likewise.
        * c-c++-common/unroll-1.c: Likewise.
        * c-c++-common/unroll-4.c: Likewise.
        * gcc.dg/tree-ssa/backprop-1.c: Likewise.
        * gcc.dg/tree-ssa/backprop-2.c: Likewise.
        * gcc.dg/tree-ssa/backprop-3.c: Likewise.
        * gcc.dg/tree-ssa/backprop-4.c: Likewise.
        * gcc.dg/tree-ssa/backprop-5.c: Likewise.
        * gcc.dg/tree-ssa/backprop-6.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-1.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-3.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-9.c: Likewise.
        * gcc.dg/tree-ssa/ldist-17.c: Likewise.
        * gcc.dg/tree-ssa/loop-38.c: Likewise.
        * gcc.dg/tree-ssa/pr21463.c: Likewise.
        * gcc.dg/tree-ssa/pr45427.c: Likewise.
        * gcc.dg/tree-ssa/pr61743-1.c: Likewise.
        * gcc.dg/unroll-2.c: Likewise.
        * gcc.dg/unroll-3.c: Likewise.
        * gcc.dg/unroll-4.c: Likewise.
        * gcc.dg/unroll-5.c: Likewise.
        * gcc.dg/vect/vect-profile-1.c: Likewise.
        * c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
        * c-c++-common/goacc/device-lowering-no-loops.c: New test.
        * c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/omp-general.c                             |  8 +-
 gcc/omp-offload.c                             |  8 ++
 gcc/passes.c                                  | 42 ++++++++
 gcc/passes.def                                | 44 ++++++++-
 .../goacc/classify-kernels-unparallelized.c   |  8 +-
 .../c-c++-common/goacc/classify-kernels.c     |  8 +-
 .../c-c++-common/goacc/classify-parallel.c    |  8 +-
 .../c-c++-common/goacc/classify-routine.c     |  8 +-
 .../device-lowering-debug-optimization.c      | 29 ++++++
 .../goacc/device-lowering-no-loops.c          | 17 ++++
 .../goacc/device-lowering-no-optimization.c   | 30 ++++++
 .../c-c++-common/goacc/routine-nohost-1.c     |  2 +-
 gcc/testsuite/c-c++-common/unroll-1.c         |  8 +-
 gcc/testsuite/c-c++-common/unroll-4.c         |  4 +-
 .../gcc.dg/goacc/loop-processing-1.c          |  7 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c    |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c    |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c     |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c     |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c      |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c       |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c     |  2 +-
 gcc/testsuite/gcc.dg/unroll-2.c               |  2 +-
 gcc/testsuite/gcc.dg/unroll-3.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-4.c               |  4 +-
 gcc/testsuite/gcc.dg/unroll-5.c               |  4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c         |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c    |  2 +-
 gcc/tree-pass.h                               |  2 +
 gcc/tree-ssa-loop-ivcanon.c                   |  2 +
 gcc/tree-ssa-loop.c                           | 99 +++++++++++++++++++
 gcc/tree-ssa-phiprop.c                        |  2 +
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |  2 +-
 .../vector-length-128-1.c                     |  4 +-
 .../vector-length-128-2.c                     |  4 +-
 .../vector-length-128-3.c                     |  4 +-
 .../vector-length-128-4.c                     |  4 +-
 .../vector-length-128-5.c                     |  4 +-
 .../vector-length-128-6.c                     |  4 +-
 .../vector-length-128-7.c                     |  4 +-
 48 files changed, 360 insertions(+), 86 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c

--
2.33.0


diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 5b34584ad051..694c14af7b9e 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -2954,7 +2954,13 @@ oacc_get_fn_dim_size (tree fn, int axis)
   while (axis--)
     dims = TREE_CHAIN (dims);

-  int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+  tree v = TREE_VALUE (dims);
+  /* TODO: With 'pass_oacc_device_lower' moved "later", this is necessary to
+     avoid an ICE for some OpenACC 'kernels' ("parloops") constructs.  */
+  if (v == NULL_TREE)
+    return 0;
+
+  int size = TREE_INT_CST_LOW (v);

   return size;
 }
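A minimal sketch of the guarded lookup, using a toy list instead of GCC trees.  DimEntry and get_fn_dim_size are hypothetical stand-ins for the TREE_LIST chain and oacc_get_fn_dim_size; the empty entries are meant to model attributes such as the 'oacc function (, , )' that classify-kernels.c expects, which appear to carry no values.

```cpp
#include <cassert>
#include <optional>

// One entry per parallelism axis, chained like a TREE_LIST.
struct DimEntry
{
  std::optional<int> value;  // stands in for TREE_VALUE; empty = NULL_TREE
  const DimEntry *next;      // stands in for TREE_CHAIN
};

// Returns the size along AXIS (0 = gang, 1 = worker, 2 = vector), or 0
// when the entry carries no value, mirroring the patched behaviour.
int get_fn_dim_size (const DimEntry *dims, int axis)
{
  while (axis--)
    {
      assert (dims != nullptr);
      dims = dims->next;
    }
  assert (dims != nullptr);
  if (!dims->value)
    return 0;
  return *dims->value;
}
```

Returning 0 lets callers treat an unset dimension like an unknown size instead of dereferencing a null value.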
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index b4592594ee49..bbdcc5207880 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2594,6 +2594,8 @@ public:
       return execute_oacc_loop_designation ();
     }

+  opt_pass * clone () { return new pass_oacc_loop_designation (m_ctxt); }
+
 }; // class pass_oacc_loop_designation

 const pass_data pass_data_oacc_gimple_workers =
@@ -2628,6 +2630,8 @@ public:
     {
       return execute_oacc_gimple_workers ();
     }
+
+  opt_pass * clone () { return new pass_oacc_gimple_workers (m_ctxt); }

 }; // class pass_oacc_gimple_workers

@@ -2652,12 +2656,16 @@ public:
   {}

   /* opt_pass methods: */
+  /* TODO If this were gated on something like '!(fun->curr_properties &
+     PROP_gimple_oaccdevlow)', then we could easily have several instances
+     in the pass pipeline? */
   virtual bool gate (function *) { return flag_openacc; };

   virtual unsigned int execute (function *)
     {
       return execute_oacc_device_lower ();
     }
+  opt_pass * clone () { return new pass_oacc_device_lower (m_ctxt); }

 }; // class pass_oacc_device_lower

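The clone () overrides are needed because a pass listed several times in passes.def is created once through its make_* function and then copied with clone () for each further occurrence; the default opt_pass::clone reports an internal error.  Once a pass has several instances, each dump name gets a numeric suffix, which is why the tests change "oaccloops" to "oaccloops1".  The toy model below sketches both points; ToyPass, ToyDeviceLower, instantiate and dump_name are hypothetical and do not use GCC's classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ToyPass
{
  explicit ToyPass (std::string n) : name (std::move (n)) {}
  virtual ~ToyPass () = default;
  // Each pass class that may appear more than once must provide this.
  virtual std::unique_ptr<ToyPass> clone () const = 0;
  std::string name;
  int instance = 0;  // 1-based position among the instances of this pass
};

struct ToyDeviceLower : ToyPass
{
  ToyDeviceLower () : ToyPass ("oaccdevlow") {}
  std::unique_ptr<ToyPass> clone () const override
  {
    return std::make_unique<ToyDeviceLower> ();
  }
};

// The first occurrence is created directly; every later occurrence is a
// clone () of the first, as in the pass manager's NEXT_PASS handling.
std::vector<std::unique_ptr<ToyPass>> instantiate (int count)
{
  assert (count >= 1);
  std::vector<std::unique_ptr<ToyPass>> passes;
  passes.push_back (std::make_unique<ToyDeviceLower> ());
  for (int i = 1; i < count; i++)
    passes.push_back (passes.front ()->clone ());
  for (int i = 0; i < count; i++)
    passes[i]->instance = i + 1;
  return passes;
}

// A single-instance pass dumps under its plain name; with several
// instances each one, including the first, gets a numeric suffix.
std::string dump_name (const ToyPass &pass, int total_instances)
{
  if (total_instances == 1)
    return pass.name;
  return pass.name + std::to_string (pass.instance);
}
```
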
diff --git a/gcc/passes.c b/gcc/passes.c
index 64550b00b43c..4a1f4a4b5900 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -620,6 +620,48 @@ make_pass_all_optimizations_g (gcc::context *ctxt)

 namespace {

+const pass_data pass_data_no_loop_optimizations =
+{
+  GIMPLE_PASS, /* type */
+  "*no_loop_optimizations", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_OPTIMIZE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+/* This pass runs if loop optimizations are disabled at the current
+   optimization level, i.e. at -O0 and -Og.  */
+
+class pass_no_loop_optimizations : public gimple_opt_pass
+{
+public:
+  pass_no_loop_optimizations (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_no_loop_optimizations, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool
+  gate (function *)
+  {
+    return !optimize || optimize_debug;
+  }
+
+}; // class pass_no_loop_optimizations
+
+} // anon namespace
+
+static gimple_opt_pass *
+make_pass_no_loop_optimizations (gcc::context *ctxt)
+{
+  return new pass_no_loop_optimizations (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_rest_of_compilation =
 {
   RTL_PASS, /* type */
diff --git a/gcc/passes.def b/gcc/passes.def
index f6e99ac1f4ed..9220fdc8ca75 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -183,9 +183,6 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
-  NEXT_PASS (pass_oacc_loop_designation);
-  NEXT_PASS (pass_oacc_gimple_workers);
-  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_omp_device_lower);
   NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_adjust_alignment);
@@ -288,6 +285,35 @@ along with GCC; see the file COPYING3.  If not see
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */);
          NEXT_PASS (pass_expand_omp_ssa);
+
+         /* Interrupt pass_tree_loop for OpenACC device lowering. */
+         NEXT_PASS (pass_oacc_only);
+         PUSH_INSERT_PASSES_WITHIN (pass_oacc_only)
+           NEXT_PASS (pass_tree_loop_done);
+           NEXT_PASS (pass_oacc_loop_designation);
+           NEXT_PASS (pass_oacc_gimple_workers);
+           NEXT_PASS (pass_oacc_device_lower);
+
+           NEXT_PASS (pass_oacc_functions_only);
+           PUSH_INSERT_PASSES_WITHIN (pass_oacc_functions_only)
+               /* Repeat some passes on OpenACC functions after device lowering. */
+               /* Lower complex instructions arising from OpenACC
+                  reductions.  */
+               NEXT_PASS (pass_lower_complex);
+               /* These passes are necessary here to allow the loop vectorizer
+                  to work on the offloading functions, which is important for
+                  AMD GCN offloading.  */
+               NEXT_PASS (pass_ccp, true /* nonzero_p */);
+               NEXT_PASS (pass_complete_unrolli);
+               NEXT_PASS (pass_backprop);
+               NEXT_PASS (pass_phiprop);
+               NEXT_PASS (pass_fix_loops);
+           POP_INSERT_PASSES ()
+
+         /* Continue pass_tree_loop after OpenACC device lowering.  */
+         NEXT_PASS (pass_tree_loop_init);
+         POP_INSERT_PASSES ()
+
          NEXT_PASS (pass_ch_vect);
          NEXT_PASS (pass_if_conversion);
          /* pass_vectorize must immediately follow pass_if_conversion.
@@ -307,15 +333,21 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_loop_prefetch);
          /* Run IVOPTs after the last pass that uses data-reference analysis
             as that doesn't handle TARGET_MEM_REFs.  */
+
          NEXT_PASS (pass_iv_optimize);
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
+
+
       /* Pass group that runs when pass_tree_loop is disabled or there
          are no loops in the function.  */
       NEXT_PASS (pass_tree_no_loop);
       PUSH_INSERT_PASSES_WITHIN (pass_tree_no_loop)
          NEXT_PASS (pass_slp_vectorize);
+         NEXT_PASS (pass_oacc_loop_designation);
+         NEXT_PASS (pass_oacc_gimple_workers);
+         NEXT_PASS (pass_oacc_device_lower);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_simduid_cleanup);
       NEXT_PASS (pass_lower_vector_ssa);
@@ -393,6 +425,12 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_local_pure_const);
       NEXT_PASS (pass_modref);
   POP_INSERT_PASSES ()
+  NEXT_PASS (pass_no_loop_optimizations);
+  PUSH_INSERT_PASSES_WITHIN (pass_no_loop_optimizations)
+      NEXT_PASS (pass_oacc_loop_designation);
+      NEXT_PASS (pass_oacc_gimple_workers);
+      NEXT_PASS (pass_oacc_device_lower);
+  POP_INSERT_PASSES ()
   NEXT_PASS (pass_tm_init);
   PUSH_INSERT_PASSES_WITHIN (pass_tm_init)
       NEXT_PASS (pass_tm_mark);
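The nesting added to pass_tree_loop can be pictured with a toy pass tree in which a group whose gate fails is skipped together with everything inside it, as with PUSH_INSERT_PASSES_WITHIN.  ASSUMPTION: pass_oacc_only is gated on -fopenacc and pass_oacc_functions_only on the current function being an OpenACC function; their real gates live in tree-ssa-loop.c, which this excerpt does not show.  Only a few neighbouring passes of pass_tree_loop are modelled, and all names (FunctionInfo, PassNode, leaf, trace_for) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct FunctionInfo
{
  bool flag_openacc;      // compiled with -fopenacc (assumed gate)
  bool is_oacc_function;  // an OpenACC function (assumed gate)
};

struct PassNode
{
  std::string name;
  std::function<bool (const FunctionInfo &)> gate;
  std::vector<PassNode> children;  // sub-passes, run only if gate holds
};

// Depth-first walk recording every pass (or group) that executes.
void run (const PassNode &node, const FunctionInfo &fn,
          std::vector<std::string> &trace)
{
  assert (node.gate);  // every node in this model carries a gate
  if (!node.gate (fn))
    return;
  trace.push_back (node.name);
  for (const PassNode &child : node.children)
    run (child, fn, trace);
}

PassNode leaf (const std::string &name)
{
  return { name, [] (const FunctionInfo &) { return true; }, {} };
}

PassNode tree_loop_excerpt ()
{
  PassNode functions_only {
    "oacc_functions_only",
    [] (const FunctionInfo &f) { return f.is_oacc_function; },
    { leaf ("lower_complex"), leaf ("ccp"), leaf ("complete_unrolli"),
      leaf ("backprop"), leaf ("phiprop"), leaf ("fix_loops") } };
  PassNode oacc_only {
    "oacc_only",
    [] (const FunctionInfo &f) { return f.flag_openacc; },
    { leaf ("tree_loop_done"), leaf ("oacc_loop_designation"),
      leaf ("oacc_gimple_workers"), leaf ("oacc_device_lower"),
      functions_only, leaf ("tree_loop_init") } };
  PassNode tree_loop = leaf ("tree_loop");
  tree_loop.children = { leaf ("expand_omp_ssa"), oacc_only,
                         leaf ("ch_vect"), leaf ("vectorize") };
  return tree_loop;
}

std::vector<std::string> trace_for (const FunctionInfo &fn)
{
  std::vector<std::string> trace;
  run (tree_loop_excerpt (), fn, trace);
  return trace;
}
```

In the model, pass_tree_loop_done and pass_tree_loop_init bracket the OpenACC passes, so the loop state is only torn down and rebuilt when the group actually runs, and device lowering always precedes the vectorizer.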
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index d8706b9a0a0a..7ce42a469ad3 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -6,7 +6,7 @@
    { dg-additional-options "-fopt-info-note-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -40,6 +40,6 @@ void KERNELS ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index e3dc5c01a29b..de7525e67f14 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -6,7 +6,7 @@
    { dg-additional-options "-fopt-info-note-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
    { dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -35,6 +35,6 @@ void KERNELS ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index 6225a4381dd4..68deb4fdfaf6 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -27,6 +27,6 @@ void PARALLEL ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
index 3454771ed92b..dcd2522be1de 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -29,6 +29,6 @@ void ROUTINE ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\), oacc function \\(0 1, 1 0, 1 0\\)\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
new file mode 100644
index 000000000000..5bf37cc61580
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
@@ -0,0 +1,29 @@
+/* Verify that OpenACC device lowering executes with "-Og". The actual logic in
+   the test function does not matter. */
+
+/* { dg-additional-options "-Og -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+  int i, j;
+  int ina[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      ina[j * 32 + i] = (i == j) ? 2 : 0;
+
+  acc = 0;
+#pragma acc parallel loop copy(acc, ina, out)
+      for (j = 0; j < 32; j++)
+        {
+#pragma acc loop reduction(+:acc)
+         for (i = 0; i < 32; i++)
+              acc += ina[i];
+
+         out[j] = acc;
+        }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
new file mode 100644
index 000000000000..193b5620de1d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
@@ -0,0 +1,17 @@
+/* Verify that OpenACC device lowering executes even if there are no OpenACC
+   loops. */
+
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+  int x;
+#pragma acc parallel copy(x)
+  {
+    asm volatile("");
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow2" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
new file mode 100644
index 000000000000..69e2b22d73ba
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
@@ -0,0 +1,30 @@
+/* Verify that OpenACC device lowering executes with "-O0".  The actual
+   logic in the test function does not matter. */
+
+/* { dg-additional-options "-O0 -fdump-tree-oaccdevlow" } */
+
+int main()
+{
+
+  int i, j;
+  int ina[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      ina[j * 32 + i] = (i == j) ? 2 : 0;
+
+  acc = 0;
+#pragma acc parallel loop copy(acc, ina, out)
+      for (j = 0; j < 32; j++)
+        {
+#pragma acc loop reduction(+:acc)
+         for (i = 0; i < 32; i++)
+              acc += ina[i];
+
+         out[j] = acc;
+        }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump ".omp_fn" "oaccdevlow3" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
index ebeaadb0b811..480c57feb05f 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
@@ -25,4 +25,4 @@ float ADD(float x, float y)
   return x + y;
 }

-/* { dg-final { scan-tree-dump-times "Discarding function" 3 "oaccloops" } } */
+/* { dg-final { scan-tree-dump-times "Discarding function" 3 "oaccloops*" } } */
diff --git a/gcc/testsuite/c-c++-common/unroll-1.c b/gcc/testsuite/c-c++-common/unroll-1.c
index fe7f4f31912c..8e57a44be231 100644
--- a/gcc/testsuite/c-c++-common/unroll-1.c
+++ b/gcc/testsuite/c-c++-common/unroll-1.c
@@ -1,5 +1,5 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdump-rtl-loop2_unroll-details" } */

 extern void bar (int);

@@ -10,12 +10,12 @@ void test (void)
   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 8; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "11:.*: loop with 8 iterations completely unrolled" "cunrolli1" } } */

   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 7; ++i)
     bar(i);
-  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "16:.*: loop with 7 iterations completely unrolled" "cunrolli1" } } */

   #pragma GCC unroll 8
   for (unsigned long i = 1; i <= 15; ++i)
diff --git a/gcc/testsuite/c-c++-common/unroll-4.c b/gcc/testsuite/c-c++-common/unroll-4.c
index 1c1988174ba7..fe7f9e10626e 100644
--- a/gcc/testsuite/c-c++-common/unroll-4.c
+++ b/gcc/testsuite/c-c++-common/unroll-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli1-details" } */

 extern void bar (int);

@@ -17,6 +17,6 @@ void test (void)
   for (unsigned long i = 1; i <= j; ++i)
     bar(i);

-  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */
+  /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli1" } } */
   /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */
 }
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index 7533c4fe0e88..6979cce71b05 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -1,5 +1,4 @@
-/* Make sure that OpenACC loop processing happens.  */
-/* { dg-additional-options "-O2 -fdump-tree-oaccloops" } */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow*" } */

 extern int place ();

@@ -15,5 +14,5 @@ void vector_1 (int *ary, int size)
   }
 }

-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop [0-9]{2}\(1\).*.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 68\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 68\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops" } } */
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop [0-9]{2}\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow*" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
index 302fdb570b63..b6b11bf30afa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which both uses ignore
    the sign and both definitions are sign ops.  */
@@ -18,5 +18,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
index d54fd36e2fb3..bef921be500b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which both uses ignore
    the sign but only one definition is a sign op.  */
@@ -18,4 +18,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
index a244b4af2ac2..1b76ce05cbef 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple case of non-looping code in which one use ignores
    the sign but another doesn't.  */
@@ -18,4 +18,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -x} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
index 54355009c744..02223fd9f23b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a simple reduction loop in which all inputs are sign ops and
    the consumer doesn't care about the sign.  */
@@ -17,5 +17,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 3 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
index e4f0f856ff6b..9dd04408b3a8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -g -fdump-tree-backprop1-details" }  */

 /* Test a loop that does both a multiplication and addition.  The addition
    should prevent any sign ops from being removed.  */
@@ -17,4 +17,4 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = __builtin_copysign} 0 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 31f05716f149..1d17c7328036 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-backprop-details" }  */
+/* { dg-options "-O -fdump-tree-backprop1-details" }  */

 void start (void *);
 void end (void *);
@@ -26,5 +26,5 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)

-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop1" } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
index bcafbfe86b50..110c6cd8635e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O3 -fdump-tree-cunrolli1-details" } */
 int a[2];
 void
 test(int c)
@@ -9,5 +9,5 @@ test(int c)
     a[i]=5;
 }
 /* Array bounds says the loop will not roll much.  */
-/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
-/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli1"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
index e25c638ac514..f8ab47cebf08 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[1];
 void
 test(int c)
@@ -12,4 +12,4 @@ test(int c)
 }
 /* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */

-/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli"} } */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
index 886dc147ad1a..f93db92ab384 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdisable-tree-evrp" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fdisable-tree-evrp" } */
 void abort (void);
 int q (void);
 int a[10];
@@ -20,4 +20,4 @@ t (int n)
     }
   return sum;
 }
-/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "Removed pointless exit:" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
index b3617f685a1d..86c84606ce51 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli" } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli1" } */

 typedef int mad_fixed_t;
 struct mad_pcm
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
index 7ca1e4709751..f8f04ffaa456 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */
 int a[10];
 int b[11];
 int q (void);
@@ -15,4 +15,4 @@ t(int n)
        sum+=b[i];
   return sum;
 }
-/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli" } } */
+/* { dg-final { scan-tree-dump "Loop 1 iterates at most 11 times" "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
index d71b757fbca5..482c19ea1485 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
@@ -18,4 +18,4 @@ t6 (int qz, int wh)
     qz = jl * wh;
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
index ed0829a038c4..c6f1226d6834 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-phiprop-details" } */
+/* { dg-options "-O -fdump-tree-phiprop1-details" } */

 struct f
 {
@@ -16,4 +16,4 @@ int g(int i, int c, struct f *ff, int g)
   return *t;
 }

-/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop" } } */
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
index 2f86f02a30ce..3e8a13cd40c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details" } */

 extern void abort (void);
 int __attribute__((noinline,noclone))
@@ -25,4 +25,4 @@ int main()
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli"} } */
+/* { dg-final { scan-tree-dump-times "bounded by 0x0\[^0-9a-f\]" 0 "cunrolli1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index 669d357045cb..069df138bcbe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -50,4 +50,4 @@ int foo1 (e_u8 a[4][N], int b1, int b2, e_u8 b[M+1][4][N])


 /* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunroll" } } */

 /* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 2 "cunroll" } } */

-/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli" } } */

+/* { dg-final { scan-tree-dump-not "completely unrolled" "cunrolli1" } } */

diff --git a/gcc/testsuite/gcc.dg/unroll-2.c b/gcc/testsuite/gcc.dg/unroll-2.c
index 8baceaac1699..f94174f0f1d3 100644
--- a/gcc/testsuite/gcc.dg/unroll-2.c
+++ b/gcc/testsuite/gcc.dg/unroll-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details=stderr -fno-peel-loops -fno-tree-vrp  -fdisable-tree-cunroll -fenable-tree-cunrolli1" } */

 /* Blank lines can occur in the output of
    -fdump-tree-cunrolli-details=stderr.  */
diff --git a/gcc/testsuite/gcc.dg/unroll-3.c b/gcc/testsuite/gcc.dg/unroll-3.c
index 10bf59b9a2e7..0284378b9c5c 100644
--- a/gcc/testsuite/gcc.dg/unroll-3.c
+++ b/gcc/testsuite/gcc.dg/unroll-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli=foo -fenable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunrolli1=foo -fenable-tree-cunrolli1=foo" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-4.c b/gcc/testsuite/gcc.dg/unroll-4.c
index 17f194212279..d62e2e7afa0a 100644
--- a/gcc/testsuite/gcc.dg/unroll-4.c
+++ b/gcc/testsuite/gcc.dg/unroll-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo -fdisable-tree-cunrolli=foo2" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo -fdisable-tree-cunrolli1=foo2" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/unroll-5.c b/gcc/testsuite/gcc.dg/unroll-5.c
index f3bdebe9882f..c81467cd4202 100644
--- a/gcc/testsuite/gcc.dg/unroll-5.c
+++ b/gcc/testsuite/gcc.dg/unroll-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli=foo2 -fdisable-tree-cunrolli=foo" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli1-details -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fenable-tree-cunrolli1=foo2 -fdisable-tree-cunrolli1=foo" } */

 unsigned a[100], b[100];
 inline void bar()
@@ -28,4 +28,4 @@ int foo2(void)
   return 1;
 }

-/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */
+/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
index 815b44e1f7cf..2f7c17d803eb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-59.c
@@ -22,5 +22,5 @@ void foo (void)
 /* We should be able to vectorize the cycle in one SLP attempt including
    both load groups and do only one permutation.  */
 /* { dg-final { scan-tree-dump-times "transform load" 2 "slp1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "loopdone2" } } */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
index 922f965806f9..a8b3ffb87d06 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-profile-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli" } */
+/* { dg-additional-options "-fdump-tree-vect-details-blocks -fdisable-tree-cunrolli1" } */

 /* At least one of these should correspond to a full vector.  */

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 4a575a54c045..5ffd2799ac2c 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -482,6 +482,8 @@ extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_only (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_functions_only (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_nonnull_compare (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 3f9e9d0869f2..430c0520736a 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1587,6 +1587,7 @@ public:

   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unroll (m_ctxt); }

 }; // class pass_complete_unroll

@@ -1646,6 +1647,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return optimize >= 2; }
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_complete_unrolli (m_ctxt); }

 }; // class pass_complete_unrolli

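The `clone ()` overrides in this patch let the pass manager create extra instances of a pass that is listed more than once in the pipeline. GCC's default `opt_pass::clone` refuses to clone and reports an internal error. The following is a minimal sketch of that idiom under stated assumptions: `context`, `pass`, `complete_unroll_pass`, `single_use_pass` and `supports_cloning` are hypothetical stand-ins, not GCC's classes, and the refusal is modelled with a C++ exception instead of `internal_error`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for gcc::context; only its address matters here.
struct context {};

// Simplified model of opt_pass: the base clone () refuses to clone
// (GCC reports an internal error; this model throws instead).
class pass
{
public:
  pass (const char *name, context *ctxt) : m_name (name), m_ctxt (ctxt) {}
  virtual ~pass () = default;

  virtual pass *clone ()
  {
    throw std::logic_error (std::string ("pass ") + m_name
                            + " does not support cloning");
  }

  const char *name () const { return m_name; }
  context *ctxt () const { return m_ctxt; }

protected:
  const char *m_name;
  context *m_ctxt;
};

// Cloneable pass: each clone is a fresh object bound to the same context,
// mirroring "return new pass_complete_unroll (m_ctxt);".
class complete_unroll_pass : public pass
{
public:
  explicit complete_unroll_pass (context *ctxt) : pass ("cunroll", ctxt) {}
  pass *clone () override { return new complete_unroll_pass (m_ctxt); }
};

// A pass that does not override clone ().
class single_use_pass : public pass
{
public:
  explicit single_use_pass (context *ctxt) : pass ("single_use", ctxt) {}
};

// Returns true if P can be instantiated a second time.
bool supports_cloning (pass &p)
{
  try
    {
      std::unique_ptr<pass> copy (p.clone ());
      return copy != nullptr;
    }
  catch (const std::logic_error &)
    {
      return false;
    }
}
```

Returning a fresh object that shares only the context keeps each instance's per-pass state (such as its pass number) separate.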
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 957ac0f3baab..21961200db66 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -70,6 +70,8 @@ public:
   virtual bool gate (function *) { return flag_tree_loop_optimize; }

   virtual unsigned int execute (function *fn);
+
+  opt_pass * clone () { return new pass_fix_loops (m_ctxt); }
 }; // class pass_fix_loops

 unsigned int
@@ -136,6 +138,8 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fn) { return gate_loop (fn); }

+
+  opt_pass * clone () { return new pass_tree_loop (m_ctxt); }
 }; // class pass_tree_loop

 } // anon namespace
@@ -201,6 +205,97 @@ make_pass_oacc_kernels (gcc::context *ctxt)
 {
   return new pass_oacc_kernels (ctxt);
 }
+/* A superpass that runs its subpasses on OpenACC functions only.  */
+
+namespace {
+
+const pass_data pass_data_oacc_functions_only =
+{
+  GIMPLE_PASS, /* type */
+  "*oacc_fns_only", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_functions_only: public gimple_opt_pass
+{
+public:
+  pass_oacc_functions_only (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_functions_only, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) {
+    if (!flag_openacc)
+      return false;
+
+    if (!oacc_get_fn_attrib (fn->decl))
+      return false;
+
+    return true;
+  }
+
+}; // class pass_oacc_functions_only
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_functions_only (gcc::context *ctxt)
+{
+  return new pass_oacc_functions_only (ctxt);
+}
+
+/* A superpass that runs its subpasses only if compiling for OpenACC.  */
+
+namespace {
+
+const pass_data pass_data_oacc_only =
+{
+  GIMPLE_PASS, /* type */
+  "*oacc_only", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_only: public gimple_opt_pass
+{
+public:
+  pass_oacc_only (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_only, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) {
+    if (!flag_openacc)
+      return false;
+
+    if (!oacc_get_fn_attrib (fn->decl))
+      return false;
+
+    return true;
+  }
+
+}; // class pass_oacc_only
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_only (gcc::context *ctxt)
+{
+  return new pass_oacc_only (ctxt);
+}
+
+

 /* The ipa oacc superpass.  */

@@ -344,6 +439,8 @@ public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *);

+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
+
 }; // class pass_tree_loop_init

 unsigned int
@@ -558,6 +655,8 @@ public:
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }

+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
+
 }; // class pass_tree_loop_done

 } // anon namespace
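`pass_oacc_functions_only` and `pass_oacc_only` define no `execute` method. They act as superpasses whose `gate` decides whether the subpasses nested under them run at all. The sketch below shows that control flow, assuming a simplified pass list in which a false gate skips both the pass and its subpasses. `model_pass`, `function_info`, `model_flag_openacc` and `run_passes` are hypothetical, and `function_info::has_oacc_attrib` stands in for `oacc_get_fn_attrib (fn->decl)`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-function data; has_oacc_attrib stands in for
// oacc_get_fn_attrib (fn->decl) returning non-NULL.
struct function_info
{
  bool has_oacc_attrib;
};

// Hypothetical stand-in for flag_openacc.
bool model_flag_openacc = false;

// A pass with an optional gate and a list of nested subpasses.
struct model_pass
{
  std::string name;
  std::function<bool (const function_info &)> gate;  // empty: always run
  std::vector<model_pass> sub;
};

// Mirrors the gate of pass_oacc_functions_only.
bool oacc_functions_only_gate (const function_info &fn)
{
  if (!model_flag_openacc)
    return false;
  return fn.has_oacc_attrib;
}

// Runs LIST on FN, recording executed pass names in TRACE.  A false gate
// skips the pass together with all of its subpasses.
void run_passes (const std::vector<model_pass> &list, const function_info &fn,
                 std::vector<std::string> &trace)
{
  for (const model_pass &p : list)
    {
      if (p.gate && !p.gate (fn))
        continue;
      trace.push_back (p.name);
      run_passes (p.sub, fn, trace);
    }
}
```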
diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 64d6eda5f6c2..2b727ed0d013 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -479,6 +479,8 @@ public:
   virtual bool gate (function *) { return flag_tree_phiprop; }
   virtual unsigned int execute (function *);

+  opt_pass * clone () { return new pass_phiprop (m_ctxt); }
+
 }; // class pass_phiprop

 unsigned int
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index d45326488cd8..bc55d158a81f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -7,5 +7,5 @@

 #include "pr85486.c"

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
index 18d77cc5ecb1..22891a243e14 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -34,5 +34,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
index 8b5b2a4a92d5..30418f378f93 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=::128" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -35,5 +35,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
index 59be37a7c27e..754964d60100 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* We default to warp size 32 for the vector length, so the GOMP_OPENACC_DIM has
    no effect.  */
 /* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
@@ -38,5 +38,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
index e5d1df09b8a3..44364cbc51a7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -36,5 +36,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
index e60f1c28db4a..5e387c6ced61 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=:2:128" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -37,5 +37,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
index a1f67622f84d..d32f4e4417ab 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-set-target-env-var "GOMP_OPENACC_DIM" ":2:" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -37,5 +37,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
index c419f6499b53..df5cb09df712 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -36,5 +36,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow1" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */

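Cloning a pass changes the names of its dump files. This is why the tests above switch from `backprop`, `cunrolli`, `phiprop` and `oaccdevlow` to `backprop1`, `cunrolli1`, `phiprop1` and `oaccdevlow1`, and from `loopdone` to `loopdone2`. The sketch below is a simplified model of the naming rule, not the code in GCC's pass manager. It assumes that a pass occurring once keeps its plain name, while each of several instances gets a suffix counting from 1 in the order the instances appear. `dump_names` and the example pipeline are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the dump name of each entry of PIPELINE, in order.
std::vector<std::string> dump_names (const std::vector<std::string> &pipeline)
{
  // First count how many instances of each pass exist.
  std::map<std::string, int> total;
  for (const std::string &name : pipeline)
    ++total[name];

  // Then number the instances of every pass that occurs more than once.
  std::map<std::string, int> seen;
  std::vector<std::string> result;
  for (const std::string &name : pipeline)
    {
      int instance = ++seen[name];
      if (total[name] == 1)
        result.push_back (name);  // unique pass: no numeric suffix
      else
        result.push_back (name + std::to_string (instance));
    }
  return result;
}
```

Under this model, adding a second instance of a previously unique pass renames the dump of the existing instance as well, which matches the test updates in this patch.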
From patchwork Wed Nov 17 16:03:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47816
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 03/22] graphite: Extend SCoP detection dump
 output
Date: Wed, 17 Nov 2021 17:03:12 +0100
Message-ID: <20211117160330.20029-4-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>

Extend the dump output to make it easier for GCC developers to
understand why Graphite refuses to include a loop in a SCoP.
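The extra output is produced by streaming values into the `dp` debug printer. The patch adds `operator<<` overloads for `gimple *` and `tree` so that statements and expressions can be printed inline with a rejection reason. A minimal sketch of that pattern follows, assuming a `std::ostringstream` sink instead of GCC's `dump_file` and a hypothetical `stmt` type in place of `gimple`. The message text and `can_represent_loop_model` are illustrative only.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a GIMPLE statement; GCC would call
// print_gimple_stmt on a real gimple *.
struct stmt
{
  std::string text;
};

// Simplified model of debug_printer: chained operator<< overloads,
// one per kind of object, all writing to a single sink.
struct debug_printer
{
  std::ostringstream out;  // GCC writes to dump_file with fprintf instead
};

debug_printer &operator<< (debug_printer &dp, int i) { dp.out << i; return dp; }
debug_printer &operator<< (debug_printer &dp, const char *s) { dp.out << s; return dp; }
debug_printer &operator<< (debug_printer &dp, const stmt *s) { dp.out << s->text; return dp; }

// Hypothetical check that records why a loop is rejected.
bool can_represent_loop_model (debug_printer &dp, int loop_num,
                               const stmt *exit_cond, bool exit_cond_is_affine)
{
  if (!exit_cond_is_affine)
    {
      dp << "Loop " << loop_num
         << " cannot be represented: non-affine exit condition "
         << exit_cond << "\n";
      return false;
    }
  return true;
}
```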

ChangeLog:

        * graphite-scop-detection.c (debug_printer::operator<<): New
        overloads for gimple * and tree.
        (scop_detection::can_represent_loop):
        Output reason for failure to dump file.
        (scop_detection::harmful_loop_in_region): Likewise.
        (scop_detection::graphite_can_represent_expr): Likewise.
        (scop_detection::stmt_has_simple_data_refs_p): Likewise.
        (scop_detection::stmt_simple_for_scop_p): Likewise.
        (print_sese_loop_numbers): New function.
        (scop_detection::build_scop_depth): Use it to print the loops of
        a discarded SCoP.
        (scop_detection::add_scop): Print the loops in the added SCoP.
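A common way to print a comma-separated list such as "1, 4, 7" without a
trailing separator is to emit the separator before every element except
the first. This avoids having to erase a trailing ", " afterwards (writing
"\b\b" to a dump file inserts two backspace characters; it does not delete
anything). The sketch below shows the idea. `loop_info` and
`loop_numbers_in_region` are hypothetical, and the result is built as a
std::string rather than written to a FILE * so that it is easy to check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified description of a loop: its number and whether
// it lies inside the region being reported.
struct loop_info
{
  int num;
  bool in_region;
};

// Join the numbers of the in-region loops as "1, 4, 7".  The separator is
// emitted before every number except the first, so there is never a
// trailing ", " to remove, and a region without loops yields "".
static std::string
loop_numbers_in_region (const std::vector<loop_info> &loops)
{
  std::string out;
  bool first = true;
  for (const loop_info &l : loops)
    {
      if (!l.in_region)
        continue;
      if (!first)
        out += ", ";
      out += std::to_string (l.num);
      first = false;  // set only once a number has actually been emitted
    }
  return out;
}
```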
---
 gcc/graphite-scop-detection.c | 188 +++++++++++++++++++++++++++++-----
 1 file changed, 165 insertions(+), 23 deletions(-)

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
     fprintf (output.dump_file, "%d", i);
     return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
     fprintf (output.dump_file, "%s", s);
     return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+    print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+    return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+    print_generic_expr (output.dump_file, t, TDF_SLIM);
+    return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the numbers of the loops contained in SESE to FILE,
+   separated by ", ".  */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool first = true;
+  FOR_EACH_LOOP (loop, 0)
+  {
+    if (!loop_in_sese_p (loop, sese))
+      continue;
+    /* Emit the separator before every number but the first.  */
+    fprintf (file, "%s%d", first ? "" : ", ", loop->num);
+    first = false;
+  }
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
       if (! next
          || harmful_loop_in_region (next))
        {
-         if (s)
-           add_scop (s);
+          if (next)
+            DEBUG_PRINT (
+                dp << "[scop-detection] Discarding SCoP on loops ";
+                print_sese_loop_numbers (dump_file, next);
+                dp << " because of harmful loops\n";);
+          if (s)
+            add_scop (s);
          build_scop_depth (loop);
          s = invalid_sese;
        }
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       || !single_pred_p (loop->latch)
       || exit->src != single_pred (loop->latch)
       || !empty_block_p (loop->latch))
-    return false;
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+      return false;
+    }
+
+  bool edge_irreducible
+      = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+    {
+      DEBUG_PRINT (
+          dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+      return false;
+    }
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+                                                          single_exit (loop),
+                                                          &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-    && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-    && niter_desc.control.no_overflow
-    && (niter = number_of_latch_executions (loop))
-    && !chrec_contains_undetermined (niter)
-    && graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+    {
+      DEBUG_PRINT (
+          dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+             << "Condition: " << niter_desc.assumptions << "\n");
+      return false;
+    }
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+      return false;
+    }
+  if (!niter_desc.control.no_overflow)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+      return false;
+    }
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+                  << "Loop niter chrec contains undetermined coefficients.\n");
+      return false;
+    }
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+    {
+      DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+                  << "Loop niter expression cannot be represented: "
+                  << niter << "\n");
+      return false;
+    }
+
+  return true;
 }

 /* Return true when BEGIN is the preheader edge of a loop with a single exit
@@ -640,6 +726,16 @@ scop_detection::add_scop (sese_l s)

   scops.safe_push (s);
   DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s));
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      loop_p loop;
+      fprintf (dump_file, "Loops in SCoP: ");
+      FOR_EACH_LOOP (loop, 0)
+      if (loop_in_sese_p (loop, s))
+        fprintf (dump_file, "%d ", loop->num);
+      fprintf (dump_file, "\n");
+    }
 }

 /* Return true when a statement in SCOP cannot be represented by Graphite.  */
@@ -665,7 +761,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const

       /* The basic block should not be part of an irreducible loop.  */
       if (bb->flags & BB_IRREDUCIBLE_LOOP)
-       return true;
+       {
+          DEBUG_PRINT (dp << "[scop-detection-fail] Found bb in irreducible "
+                      "loop.\n");
+          return true;
+        }

       /* Check for unstructured control flow: CFG not generated by structured
         if-then-else.  */
@@ -676,7 +776,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          FOR_EACH_EDGE (e, ei, bb->succs)
            if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)
                && !dominated_by_p (CDI_DOMINATORS, e->dest, bb))
-             return true;
+             {
+                DEBUG_PRINT (dp << "[scop-detection-fail] Found unstructured "
+                                   "control flow.\n");
+                return true;
+              }
        }

       /* Collect all loops in the current region.  */
@@ -688,7 +792,10 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
       for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
           !gsi_end_p (gsi); gsi_next (&gsi))
        if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb))
-         return true;
+         {
+           DEBUG_PRINT (dp << "[scop-detection-fail] Found harmful statement.\n");
+           return true;
+         }

       for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb);
           dom;
@@ -731,9 +838,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          && ! loop_nest_has_data_refs (loop))
        {
          DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-                      << "does not have any data reference.\n");
+                      << " does not have any data reference.\n");
          return true;
        }
+
+      DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }

   return false;
@@ -922,7 +1031,21 @@ scop_detection::graphite_can_represent_expr (sese_l scop, loop_p loop,
                                             tree expr)
 {
   tree scev = cached_scalar_evolution_in_region (scop, loop, expr);
-  return graphite_can_represent_scev (scop, scev);
+  bool can_represent = graphite_can_represent_scev (scop, scev);
+
+  if (!can_represent)
+    {
+      if (dump_file)
+       {
+          fprintf (dump_file, "[graphite_can_represent_expr] Cannot represent "
+                  "scev \"");
+          print_generic_expr (dump_file, scev, TDF_SLIM);
+          fprintf (dump_file, "\" of expression ");
+          print_generic_expr (dump_file, expr, TDF_SLIM);
+          fprintf (dump_file, " in loop %d\n", loop->num);
+        }
+    }
+  return can_represent;
 }

 /* Return true if the data references of STMT can be represented by Graphite.
@@ -938,7 +1061,11 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)

   auto_vec<data_reference_p> drs;
   if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
-    return false;
+    {
+      DEBUG_PRINT (dp <<
+                  "[stmt_has_simple_data_refs_p] Unanalyzable statement.\n");
+      return false;
+    }

   int j;
   data_reference_p dr;
@@ -946,7 +1073,12 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
     {
       for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
        if (! graphite_can_represent_scev (scop, DR_ACCESS_FN (dr, i)))
-         return false;
+         {
+            DEBUG_PRINT (dp << "[stmt_has_simple_data_refs_p] Cannot "
+                               "represent access function SCEV: "
+                            << DR_ACCESS_FN (dr, i) << "\n");
+            return false;
+          }
     }

   return true;
@@ -1027,14 +1159,24 @@ scop_detection::stmt_simple_for_scop_p (sese_l scop, gimple *stmt,
        for (unsigned i = 0; i < 2; ++i)
          {
            tree op = gimple_op (stmt, i);
-           if (!graphite_can_represent_expr (scop, loop, op)
-               /* We can only constrain on integer type.  */
-               || ! INTEGRAL_TYPE_P (TREE_TYPE (op)))
+           if (!graphite_can_represent_expr (scop, loop, op))
+             {
+               DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+                                             "[scop-detection-fail] "
+                                             "Graphite cannot represent cond "
+                                             "stmt operand expression.\n"));
+               DEBUG_PRINT (dp << op << "\n");
+
+               return false;
+             }
+
+             if (! INTEGRAL_TYPE_P (TREE_TYPE (op)))
              {
-               DEBUG_PRINT (dp << "[scop-detection-fail] "
-                               << "Graphite cannot represent stmt:\n";
-                            print_gimple_stmt (dump_file, stmt, 0,
-                                               TDF_VOPS | TDF_MEMSYMS));
+               DEBUG_PRINT (dump_printf_loc (MSG_MISSED_OPTIMIZATION, stmt,
+                                             "[scop-detection-fail] "
+                                             "Graphite cannot represent cond "
+                                             "statement operand. "
+                                             "Type must be integral.\n"));
                return false;
              }
          }

From patchwork Wed Nov 17 16:03:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47818
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id EE982385E007
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:11:46 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 404003858412
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 404003858412
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 LeYFzHUO8NSM6hEqzzrXcOFWtEx3QnVL15uUZP6oRVXjp/BXucYWdbrf3KBVvKTT7xEuVRCCJQ
 Ye12eu7JqzWuQudwZRyQzce2i6KvaVcHVaUV/A3Piadri7l8p2FYEltJIeq62qONqFXeRKsAIu
 vjwqXQ85qd/CcT9GDLktSvMc36mjC8uuhNZvNseTdVC/WRQP/BfIuW1Ayzw/2ywiSDhcFx5A1o
 tpqxBWC8JUCckl8WjdX1ZXPhF+UCzLX7Ze98wd0cnLiDd8TlZUR3LRhepxEd+8i4h6CpJjgbRa
 tdoX9BXPswdh0nhmRkFmAIXo
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445324"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:00 -0800
IronPort-SDR: 
 ZzVjkaBJcOKRlkVEo9bA7BeZzWerH5g+YFSZaBJXAN1wUoht4lNji9c6NqxbJoc7Y9ijWPi9Ts
 dqAatGHcBZYOBV8+HRAMHQ+RGc/9r6xUORxQmFgCx5c1i+YdXOiHg4iGgEKX7Eo7pymt5dkhwW
 WPbHYUo82pBwYfkgRj7nH2SDwW/Mpx+OQnCri7CHhG0hXAPXBvm3irx4Xpf2GcY1ZiA02fmsUD
 z/ClzC4UsGmkM61RPLCY97GtGX6LJt9MRln+Gn+B0gP0n7PW5YJh+sc0sIA5nwhDTVCtodpPSY
 Fwc=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 04/22] graphite: Rename isl_id_for_ssa_name
Date: Wed, 17 Nov 2021 17:03:13 +0100
Message-ID: <20211117160330.20029-5-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The SSA names for which this function is used are always SCoP
parameters, hence "isl_id_for_parameter" is a better name.  The new
name also explains the "P_" prefix of those names in the isl
representation.
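For illustration, the naming scheme amounts to "P_" followed by the SSA
version number, formatted into a 14-byte buffer. The helper below is a
hypothetical sketch, not part of the patch. Because SSA version numbers
are unsigned, it uses "%u" where the patch uses "%d". Assuming a 32-bit
unsigned int, the longest possible result, "P_4294967295", is 12
characters and fits in the buffer together with the terminating NUL.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: the isl parameter name for the SSA name with
// version VERSION, built the way isl_id_for_parameter builds it
// ("P_" + version) in a 14-byte buffer.  snprintf never writes past
// sizeof (name), so even an unexpectedly long value is truncated safely.
static std::string
parameter_name (unsigned version)
{
  char name[14];
  std::snprintf (name, sizeof (name), "P_%u", version);
  return std::string (name);
}
```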

gcc/ChangeLog:

        * graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
        (isl_id_for_parameter): ... this.  Assert that the argument is
        an SSA_NAME.
        (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

--
2.33.0


diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index eebf2e02cfca..195851cb540a 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -893,15 +894,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
     space = isl_space_set_dim_id (space, isl_dim_param, i,
-                                  isl_id_for_ssa_name (scop, e));
+                                  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-    add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+    add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */

From patchwork Wed Nov 17 16:03:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47819
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3A796385B83B
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:12:16 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id D8F0F3858412
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D8F0F3858412
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 5QE7qHW/xgoJnk4dNT3i8Pqno63nkMbr7pbgzQpwUGmXBxSWTmunk2/d8dIfOYRKE6oNF+iAK6
 7a2e/P57xY1iylfzj6xY3kIteJFdn8Bep28vNcsH/DN4OMopzPRq03wHPCFgL7q3/DC5jDKVT3
 YY9zXRxRhrnLPuaf1SHHGuTgD8COrhZ1KDQ6ilsSetTUg/WPRx10UN2Y0caEGqh5jghXmnVRFN
 dkO5YI/QX5sytbSzwY/mGvq8xEulegEOTQ5qBbuiZ1Bob9CMGO7ayzpL0GWEIoVTUd2YCvCqdH
 ZSKQ0T/XbJJauY+t72WXj6VC
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445327"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:02 -0800
IronPort-SDR: 
 Pq3nAJuU4e8V1wERxKcPRtOlWOwV/nmflrx3+a+vWD/SklkP4EYyikhB+9ku3n3WxCMdoHAdA5
 /egUn4OVtbYsQ0Yb7bbr2rxWnAqBLi5aGPByeCkrFaYoXFy9Ba8mZHyMJH1edxUhjCXUZdz6aT
 7JzuT16U6nLRWwrwcjtLSNwBIiEU73FTCcTnO+d9sjh9Wqm8Zk42HpiUbDGIi+JfQ+NrtsCqlF
 +YTDslvWZBf/t98dTiHnW0dCM1o3mspN58rdWW+sZ2aYqA7j1g5Kp6NLFZxc8U/3A6gVL0Ligw
 f8Q=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 05/22] graphite: Fix minor mistakes in
 comments
Date: Wed, 17 Nov 2021 17:03:14 +0100
Message-ID: <20211117160330.20029-6-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

gcc/ChangeLog:

        * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
          a reference to a variable which does not exist.
        * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
          in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c      | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--
2.33.0


diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c202213f39b3..44c06016f1a2 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
-     in strage ways when inserting the stmts from it into different basic
+     in strange ways when inserting the stmts from it into different basic
      blocks one at a time.  */
   auto_vec<gimple *, 3> stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 195851cb540a..12fa2d669b3c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -644,14 +644,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
                 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
      the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
                                    alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
-     data reference DR.  */
+     the reference.  */
   isl_constraint *c
     = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);

From patchwork Wed Nov 17 16:03:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47820
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 86408385DC0C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:12:45 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 5C232385841C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5C232385841C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 qharzFcyP6MoWQvzoPnW62rf0THNx3gs3gd0p15bQdgvPIzHeu8BU1mDObmcS9sYTZqZpE+RWp
 AXPDZFFHFxW/BMDLDl369+Hnkd0n+8eKmorafhX1ivZSFu+NGOOZlrL/kqAOExjqhmOjIyBvMB
 CB/TqgXMbvlB8iRhxswOlTt8HED+v5jf1bRhQULwK3U3kXlSFD0U9qgpnSsKmLjKU3nm4oX78L
 5CNLOJeF8B9Eb85XqaPtHvCnli5Lrs28OQeDQTuU//ZUAw3pnIuQtuJVjn5IsHB7+LyvVtuXDL
 mfe9wBiasMQfakdwdBPAueQ+
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445333"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:07 -0800
IronPort-SDR: 
 BDkKMMEoTgo9AEcq2K+nWSuzT0SnbZI/yV25ByFe/UoUzdqNE2dxM8Ivxs6Jfji6PTCl79K3iH
 MYLI1wmQCkCCWLjeyv9Ad+iFzocmN9EbJce333N0bGvo+aigT4CXs1mkDldvBX8GZ49HG7Z0dT
 CsDOmIojuCZx+1uySFjt9YFxOSpbbE3Jmp0f8jZRccnpmEXHttDfzXhPdl3st27mEj/8O2zhBm
 oEaUIaKEy2ArpwKAJPY+vuW3EOr/KBZp8NDQFAN7846onXTYtBCjtfMrXpVOlHSiMbuUqYPg6Q
 4Sk=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 07/22] Move compute_alias_check_pairs to
 tree-data-ref.c
Date: Wed, 17 Nov 2021 17:03:15 +0100
Message-ID: <20211117160330.20029-7-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.
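The core arithmetic of the moved helpers is simple. A reference that
advances by a fixed step per execution covers step * (executions - 1)
bytes between its first and last access. compute_alias_check_pairs takes
the execution count to be the number of latch executions plus one when the
reference's block dominates the source of the loop's single exit, and the
number of latch executions otherwise. The sketch below models only that
computation. `segment_size` and `segment_length` are hypothetical names,
and plain signed 64-bit integers stand in for GCC's sizetype trees, which
wrap on negative steps instead.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of data_ref_segment_size: a reference that advances by
// STEP bytes per execution and runs EXECS times covers STEP * (EXECS - 1)
// bytes between its first and last access (the access size is accounted
// for separately).
static std::int64_t
segment_size (std::int64_t step, std::int64_t execs)
{
  return step * (execs - 1);
}

// Hypothetical wrapper mirroring how compute_alias_check_pairs chooses
// EXECS.  LATCH_EXECS is the number of latch executions of the loop.
// DOMINATES_EXIT models latch_dominated_by_data_ref: true when the
// reference's block dominates the source block of the single exit, in
// which case the reference runs LATCH_EXECS + 1 times.
static std::int64_t
segment_length (std::int64_t step, std::int64_t latch_execs,
                bool dominates_exit)
{
  assert (latch_execs >= 0);
  return segment_size (step, dominates_exit ? latch_execs + 1 : latch_execs);
}
```

For example, a 4-byte stride in a loop whose latch runs 9 times (10
iterations) gives a segment of 36 bytes when the reference dominates the
exit, and 32 bytes otherwise.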

gcc/ChangeLog:
        * tree-loop-distribution.c (data_ref_segment_size): Remove function.
        (latch_dominated_by_data_ref): Likewise.
        (compute_alias_check_pairs): Likewise.

        * tree-data-ref.c (data_ref_segment_size): New function,
        moved from tree-loop-distribution.c.
        (latch_dominated_by_data_ref): Likewise.
        (compute_alias_check_pairs): Likewise.  Make non-static.

        * tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c          | 87 ++++++++++++++++++++++++++++++++++++
 gcc/tree-data-ref.h          |  3 ++
 gcc/tree-loop-distribution.c | 87 ------------------------------------
 3 files changed, 90 insertions(+), 87 deletions(-)

--
2.33.0


diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index d04e95f7c285..71f8d790e618 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2645,6 +2645,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
     dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+                      fold_convert (sizetype, niters),
+                      size_one_node);
+  return size_binop (MULT_EXPR,
+                    fold_convert (sizetype, DR_STEP (dr)),
+                    fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+                        gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+                          vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+    {
+      ddr_p ddr = (*alias_ddrs)[i];
+      struct data_reference *dr_a = DDR_A (ddr);
+      struct data_reference *dr_b = DDR_B (ddr);
+      tree seg_length_a, seg_length_b;
+
+      if (latch_dominated_by_data_ref (loop, dr_a))
+       seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+      else
+       seg_length_a = data_ref_segment_size (dr_a, niters);
+
+      if (latch_dominated_by_data_ref (loop, dr_b))
+       seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+      else
+       seg_length_b = data_ref_segment_size (dr_b, niters);
+
+      unsigned HOST_WIDE_INT access_size_a
+       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+      unsigned HOST_WIDE_INT access_size_b
+       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+      unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+      unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+      dr_with_seg_len_pair_t dr_with_seg_len_pair
+       (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+        /* ??? Would WELL_ORDERED be safe?  */
+        dr_with_seg_len_pair_t::REORDERED);
+
+      comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+    }
+
+  if (tree_fits_uhwi_p (niters))
+    factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file,
+            "Improved number of alias checks from %d to %d\n",
+            alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
    overlapping of address ranges represented by a list of data references
    pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518..5016ec926b1d 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -577,6 +577,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *,
                                           poly_uint64);
+
+extern void compute_alias_check_pairs (class loop *, vec<ddr_p> *,
+                                      vec<dr_with_seg_len_pair_t> *);
 extern void create_runtime_alias_checks (class loop *,
                                         vec<dr_with_seg_len_pair_t> *, tree*);
 extern tree dr_direction_indicator (struct data_reference *);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 65aa1df4abae..d987cdb34424 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2559,93 +2559,6 @@ loop_distribution::break_alias_scc_partitions (struct graph *rdg,
     }
 }

-/* Compute and return an expression whose value is the segment length which
-   will be accessed by DR in NITERS iterations.  */
-
-static tree
-data_ref_segment_size (struct data_reference *dr, tree niters)
-{
-  niters = size_binop (MINUS_EXPR,
-                      fold_convert (sizetype, niters),
-                      size_one_node);
-  return size_binop (MULT_EXPR,
-                    fold_convert (sizetype, DR_STEP (dr)),
-                    fold_convert (sizetype, niters));
-}
-
-/* Return true if LOOP's latch is dominated by statement for data reference
-   DR.  */
-
-static inline bool
-latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
-{
-  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
-                        gimple_bb (DR_STMT (dr)));
-}
-
-/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
-   data dependence relations ALIAS_DDRS.  */
-
-static void
-compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
-                          vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
-{
-  unsigned int i;
-  unsigned HOST_WIDE_INT factor = 1;
-  tree niters_plus_one, niters = number_of_latch_executions (loop);
-
-  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
-  niters = fold_convert (sizetype, niters);
-  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Creating alias check pairs:\n");
-
-  /* Iterate all data dependence relations and compute alias check pairs.  */
-  for (i = 0; i < alias_ddrs->length (); i++)
-    {
-      ddr_p ddr = (*alias_ddrs)[i];
-      struct data_reference *dr_a = DDR_A (ddr);
-      struct data_reference *dr_b = DDR_B (ddr);
-      tree seg_length_a, seg_length_b;
-
-      if (latch_dominated_by_data_ref (loop, dr_a))
-       seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
-      else
-       seg_length_a = data_ref_segment_size (dr_a, niters);
-
-      if (latch_dominated_by_data_ref (loop, dr_b))
-       seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
-      else
-       seg_length_b = data_ref_segment_size (dr_b, niters);
-
-      unsigned HOST_WIDE_INT access_size_a
-       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
-      unsigned HOST_WIDE_INT access_size_b
-       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
-      unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
-      unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
-
-      dr_with_seg_len_pair_t dr_with_seg_len_pair
-       (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
-        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
-        /* ??? Would WELL_ORDERED be safe?  */
-        dr_with_seg_len_pair_t::REORDERED);
-
-      comp_alias_pairs->safe_push (dr_with_seg_len_pair);
-    }
-
-  if (tree_fits_uhwi_p (niters))
-    factor = tree_to_uhwi (niters);
-
-  /* Prune alias check pairs.  */
-  prune_runtime_alias_test_list (comp_alias_pairs, factor);
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file,
-            "Improved number of alias checks from %d to %d\n",
-            alias_ddrs->length (), comp_alias_pairs->length ());
-}
-
 /* Given data dependence relations in ALIAS_DDRS, generate runtime alias
    checks and version LOOP under condition of these runtime alias checks.  */


From patchwork Wed Nov 17 16:03:16 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47821
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 41348385800C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:13:15 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 6B93E385841C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:09 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6B93E385841C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445335"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:09 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 08/22] graphite: Add runtime alias checking
Date: Wed, 17 Nov 2021 17:03:16 +0100
Message-ID: <20211117160330.20029-8-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically whether they may alias.  This
happens very often, for instance in C code that does not use explicit
"restrict".  This commit makes it possible to analyze such a SCoP
nevertheless and to perform an alias check at runtime.  If aliasing
is detected, execution falls back to the unoptimized SCoP.
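
The versioning scheme described above can be sketched in plain C.  This
is a simplified model, not the code Graphite generates, and it rests on
several assumptions:

- Both references walk forward with a positive constant byte step.
- Each reference's byte range is a segment length of STEP * (NITERS - 1)
  plus the access size.  This mirrors data_ref_segment_size and the
  access size recorded by compute_alias_check_pairs.
- A plain interval-disjointness test stands in for the expression that
  create_runtime_alias_checks builds.

The names segment_bytes, ranges_disjoint_p and scaled_add* are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by a reference with positive byte STEP that executes
   NITERS times (NITERS >= 1), each access being ACCESS_SIZE bytes wide:
   the last access starts STEP * (NITERS - 1) bytes after the first.  */
static size_t
segment_bytes (size_t step, size_t niters, size_t access_size)
{
  return step * (niters - 1) + access_size;
}

/* True if [A, A + LEN_A) and [B, B + LEN_B) do not overlap.  Compare as
   integers because relational comparison of pointers into different
   objects is undefined in C.  */
static bool
ranges_disjoint_p (const void *a, size_t len_a, const void *b, size_t len_b)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  return pa + len_a <= pb || pb + len_b <= pa;
}

/* "Optimized" version: only valid if DST and SRC do not overlap.  */
static void
scaled_add_fast (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += 2 * src[i];
}

/* Original loop, kept as the fallback.  */
static void
scaled_add_orig (int *dst, const int *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += 2 * src[i];
}

/* Version the loop on a runtime alias check.  Returns true if the
   optimized version was executed.  */
static bool
scaled_add (int *dst, const int *src, size_t n)
{
  if (n == 0)
    return false;
  size_t len = segment_bytes (sizeof (int), n, sizeof (int));
  if (ranges_disjoint_p (dst, len, src, len))
    {
      scaled_add_fast (dst, src, n);
      return true;
    }
  scaled_add_orig (dst, src, n);
  return false;
}
```

Calling scaled_add (c + 1, c, 7) on one array takes the fallback path,
while two distinct arrays take the optimized path.  Keeping the original
loop as the fallback matches the behavior described above.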

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

        * common.opt: Add fgraphite-runtime-alias-checks.
        * graphite-isl-ast-to-gimple.c
        (generate_alias_cond): New function.
        (graphite_regenerate_ast_isl): Use from here.
        * graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
        (free_scop): and release here.
        * graphite-scop-detection.c (dr_defs_outside_region): New function.
        (dr_well_analyzed_for_runtime_alias_check_p): New function.
        (graphite_runtime_alias_check_p): New function.
        (build_alias_set): Record unhandled alias ddrs for later alias check
        creation if flag_graphite_runtime_alias_checks is true instead
        of failing.
        * graphite.h (struct scop): Add field unhandled_alias_ddrs.
        * sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

        * gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt                          |   4 +
 gcc/graphite-isl-ast-to-gimple.c        |  60 ++++++
 gcc/graphite-poly.c                     |   2 +
 gcc/graphite-scop-detection.c           | 239 +++++++++++++++++++++---
 gcc/graphite.h                          |   4 +
 gcc/sese.h                              |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 326 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/common.opt b/gcc/common.opt
index 771398bc03de..aa695e56dc48 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1636,6 +1636,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 44c06016f1a2..caa0160b9bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
     }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+                       && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Generated runtime alias check: ");
+      print_generic_expr (dump_file, alias_cond, dump_flags);
+      fprintf (dump_file, "\n");
+    }
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
    Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+      && scop->unhandled_alias_ddrs.length () > 0)
+    {
+      /* SCoP detection has failed to handle the aliasing between some data
+        references of the SCoP statically. Generate an alias check that selects
+        the newly generated version of the SCoP in the true-branch of the
+        conditional if aliasing can be ruled out at runtime and the original
+        version of the SCoP, otherwise. */
+
+      loop_p loop
+          = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+                              scop->scop_info->region.exit->src->loop_father);
+      tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+      tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+      set_ifsese_condition (region->if_region, non_alias_cond);
+
+      /* The loop-nest vec is shared by all DDRs. */
+      DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release ();
+
+      unsigned int i;
+      struct data_dependence_relation *ddr;
+
+      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+       if (ddr)
+         free_dependence_relation (ddr);
+      scop->unhandled_alias_ddrs.truncate (0);
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "[codegen] isl AST to Gimple succeeded.\n");
+
   t.translate_isl_ast (context_loop, root_node, e, ip);
   if (! t.codegen_error_p ())
     {
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 2e31b2782c24..27d5e43af125 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -264,6 +264,7 @@ new_scop (edge entry, edge exit)
   scop_set_region (s, region);
   s->pbbs.create (3);
   s->drs.create (3);
+  s->unhandled_alias_ddrs.create (1);
   s->dependence = NULL;
   return s;
 }
@@ -284,6 +285,7 @@ free_scop (scop_p scop)

   scop->pbbs.release ();
   scop->drs.release ();
+  scop->unhandled_alias_ddrs.release ();

   isl_set_free (scop->param_context);
   scop->param_context = NULL;
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 46c470210d05..26ba61d1d601 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1542,6 +1542,123 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
   return new_gimple_poly_bb (bb, drs, reads, writes);
 }

+/* Checks if all parts of DR are defined outside of REGION.  This allows an
+   alias check involving DR to be placed in front of the region. */
+
+static opt_result
+dr_defs_outside_region (const sese_l &region, data_reference_p dr)
+{
+  static const char *pre
+      = "cannot create alias check for SCoP. Data reference's";
+  static const char *suf = "uses definitions from SCoP.\n";
+  opt_result res = opt_result::success ();
+
+  if (has_operands_from_region_p (DR_BASE_OBJECT (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s base %s", pre, suf);
+  else if (has_operands_from_region_p (DR_INIT (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s constant offset %s", pre,
+                                  suf);
+  else if (has_operands_from_region_p (DR_STEP (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s step %s", pre, suf);
+  else if (has_operands_from_region_p (DR_OFFSET (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s loop-invariant offset %s",
+                                  pre, suf);
+  else if (has_operands_from_region_p (DR_BASE_ADDRESS (dr), region))
+    res = opt_result::failure_at (DR_STMT (dr), "%s base address %s", pre,
+                                  suf);
+  else
+    for (unsigned i = 0; i < DR_NUM_DIMENSIONS (dr); ++i)
+      if (has_operands_from_region_p (DR_ACCESS_FN (dr, i), region))
+        {
+          res = opt_result::failure_at (
+              DR_STMT (dr), "%s %d-th access function %s", pre, i + 1, suf);
+          break;
+        }
+
+  return res;
+}
+
+/* Check that all constituents of DR that are used by the
+   "compute_alias_check_pairs" function have been analyzed as required. */
+
+static opt_result
+dr_well_analyzed_for_runtime_alias_check_p (data_reference_p dr)
+{
+  static const char* error =
+    "data-reference not well-analyzed for runtime check.";
+  gimple* stmt = DR_STMT (dr);
+
+  if (! DR_BASE_ADDRESS (dr))
+    return opt_result::failure_at (stmt, "%s no base address.\n", error);
+  else if (! DR_OFFSET (dr))
+    return opt_result::failure_at (stmt, "%s no offset.\n", error);
+  else if (! DR_INIT (dr))
+    return opt_result::failure_at (stmt, "%s no init.\n", error);
+  else if (! DR_STEP (dr))
+    return opt_result::failure_at (stmt, "%s no step.\n", error);
+  else if (! tree_fits_uhwi_p (DR_STEP (dr)))
+    return opt_result::failure_at (stmt, "%s step too large.\n", error);
+
+  DEBUG_PRINT (dump_data_reference (dump_file, dr));
+
+  return opt_result::success ();
+}
+
+/* Return TRUE if it is possible to create a runtime alias check for
+   data-references DR1 and DR2 from LOOP and place it in front of REGION. */
+
+static opt_result
+graphite_runtime_alias_check_p (data_reference_p dr1, data_reference_p dr2,
+                                class loop *loop, const sese_l &region)
+{
+  gcc_checking_assert (loop);
+  gcc_checking_assert (dr1);
+  gcc_checking_assert (dr2);
+
+  if (dump_file)
+    {
+      fprintf (dump_file,
+               "Attempting runtime alias check creation for DRs:\n");
+      dump_data_reference (dump_file, dr1);
+      dump_data_reference (dump_file, dr2);
+    }
+
+  if (!optimize_loop_for_speed_p (loop))
+    return opt_result::failure_at (DR_STMT (dr1),
+                                   "runtime alias check not supported when"
+                                   " optimizing for size.\n");
+
+  /* Verify that we have enough information about the data-references and
+     context loop to construct a runtime alias check expression with
+     "compute_alias_check_pairs". */
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    return opt_result::failure_at (DR_STMT (dr1),
+                                  "failed to obtain number of iterations of "
+                                  "loop %d.\n", loop->num);
+
+  opt_result ok = dr_well_analyzed_for_runtime_alias_check_p (dr1);
+  if (!ok)
+    return ok;
+
+  ok = dr_well_analyzed_for_runtime_alias_check_p (dr2);
+  if (!ok)
+    return ok;
+
+  /* The runtime alias check would be placed before REGION and hence it cannot
+     use definitions made within REGION. */
+
+  ok = dr_defs_outside_region (region, dr1);
+  if (!ok)
+    return ok;
+
+  ok = dr_defs_outside_region (region, dr2);
+  if (!ok)
+    return ok;
+
+  return opt_result::success ();
+}
+
 /* Compute alias-sets for all data references in DRS.  */

 static bool
@@ -1549,7 +1666,7 @@ build_alias_set (scop_p scop)
 {
   int num_vertices = scop->drs.length ();
   struct graph *g = new_graph (num_vertices);
-  dr_info *dr1, *dr2;
+  dr_info *dri1, *dri2;
   int i, j;
   int *all_vertices;

@@ -1557,33 +1674,110 @@ build_alias_set (scop_p scop)
     = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
                        scop->scop_info->region.exit->src->loop_father);

-  FOR_EACH_VEC_ELT (scop->drs, i, dr1)
-    for (j = i+1; scop->drs.iterate (j, &dr2); j++)
-      if (dr_may_alias_p (dr1->dr, dr2->dr, nest))
-       {
-         /* Dependences in the same alias set need to be handled
-            by just looking at DR_ACCESS_FNs.  */
-         if (DR_NUM_DIMENSIONS (dr1->dr) == 0
-             || DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr)
-             || ! operand_equal_p (DR_BASE_OBJECT (dr1->dr),
-                                   DR_BASE_OBJECT (dr2->dr),
-                                   OEP_ADDRESS_OF)
-             || ! types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1->dr)),
-                                      TREE_TYPE (DR_BASE_OBJECT (dr2->dr))))
-           {
-             free_graph (g);
-             return false;
-           }
-         add_edge (g, i, j);
-         add_edge (g, j, i);
-       }
+  gcc_checking_assert (nest);
+
+  vec<loop_p> nest_vec;
+  nest_vec.create (1);
+  if (flag_graphite_runtime_alias_checks)
+    nest_vec.safe_push (nest);
+
+  FOR_EACH_VEC_ELT (scop->drs, i, dri1)
+    {
+      data_reference_p dr1 = dri1->dr;
+
+      for (j = i + 1; scop->drs.iterate (j, &dri2); j++)
+        {
+
+          data_reference_p dr2 = dri2->dr;
+          if (!(DR_IS_READ (dr1) && DR_IS_READ (dr2))
+              && dr_may_alias_p (dr1, dr2, nest))
+            {
+              /* Dependences in the same alias set need to be handled
+                 by just looking at DR_ACCESS_FNs.  */
+              bool dimension_zero = DR_NUM_DIMENSIONS (dr1) == 0;
+              bool different_dimensions
+                  = DR_NUM_DIMENSIONS (dr1) != DR_NUM_DIMENSIONS (dr2);
+              bool different_base_objects = !operand_equal_p (
+                  DR_BASE_OBJECT (dr1), DR_BASE_OBJECT (dr2), OEP_ADDRESS_OF);
+              bool incompatible_types
+                  = !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (dr1)),
+                                         TREE_TYPE (DR_BASE_OBJECT (dr2)));
+              bool ddr_can_be_handled
+                  = !(dimension_zero || different_dimensions
+                      || different_base_objects || incompatible_types);
+
+              if (!ddr_can_be_handled)
+                {
+                  DEBUG_PRINT (
+                      dp << "[build_alias_set] "
+                            "Cannot handle aliasing between data references:\n";
+                      print_gimple_stmt (dump_file, dr1->stmt, 2, TDF_DETAILS);
+                      print_gimple_stmt (dump_file, dr2->stmt, 2, TDF_DETAILS);
+                      dp << "\n");
+                  if (dimension_zero)
+                    DEBUG_PRINT (dp << "DR1 has dimension 0.\n");
+                  if (different_base_objects)
+                    DEBUG_PRINT (dp << "DRs have different base objects.\n");
+                  if (different_dimensions)
+                    DEBUG_PRINT (dp << "DRs have different dimensions.\n");
+                  if (incompatible_types)
+                    DEBUG_PRINT (dp <<
+                                "DRs have incompatible base object types.\n");
+                }
+
+              if (ddr_can_be_handled)
+                {
+                  add_edge (g, i, j);
+                  add_edge (g, j, i);
+                  continue;
+                }
+
+              loop_p common_loop
+                  = find_common_loop ((DR_STMT (dr1))->bb->loop_father,
+                                      (DR_STMT (dr2))->bb->loop_father);
+              edge scop_entry = scop->scop_info->region.entry;
+              dr1 = create_data_ref (scop_entry, common_loop, DR_REF (dr1),
+                                     DR_STMT (dr1), DR_IS_READ (dr1),
+                                     DR_IS_CONDITIONAL_IN_STMT (dr1));
+              dr2 = create_data_ref (scop_entry, common_loop, DR_REF (dr2),
+                                     DR_STMT (dr2), DR_IS_READ (dr2),
+                                     DR_IS_CONDITIONAL_IN_STMT (dr2));
+
+              if (flag_graphite_runtime_alias_checks
+                  && graphite_runtime_alias_check_p (dr1, dr2, nest,
+                                                     scop->scop_info->region))
+                {
+                  ddr_p ddr = initialize_data_dependence_relation (dr1, dr2,
+                                                                   nest_vec);
+                  scop->unhandled_alias_ddrs.safe_push (ddr);
+                }
+              else
+                {
+                  if (flag_graphite_runtime_alias_checks)
+                    {
+                      unsigned int i;
+                      struct data_dependence_relation *ddr;
+
+                      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+                        if (ddr)
+                          free_dependence_relation (ddr);
+                      scop->unhandled_alias_ddrs.truncate (0);
+                    }
+
+                  nest_vec.release ();
+                  free_graph (g);
+                  return false;
+                }
+            }
+        }
+    }

   all_vertices = XNEWVEC (int, num_vertices);
   for (i = 0; i < num_vertices; i++)
     all_vertices[i] = i;

   scop->max_alias_set
-    = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1;
+      = graphds_dfs (g, all_vertices, num_vertices, NULL, true, NULL) + 1;
   free (all_vertices);

   for (i = 0; i < g->n_vertices; i++)
@@ -1703,7 +1897,6 @@ gather_bbs::after_dom_children (basic_block bb)
     }
 }

-
 /* Compute sth like an execution order, dominator order with first executing
    edges that stay inside the current loop, delaying processing exit edges.  */

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 6464d2f50ce7..03febfa39986 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -368,6 +368,10 @@ struct scop
   /* The maximum alias set as assigned to drs by build_alias_sets.  */
   unsigned max_alias_set;

+  /* A set of ddrs that were rejected by build_alias_set during scop detection
+     and that must be handled by other means (runtime checking). */
+  vec<ddr_p> unhandled_alias_ddrs;
+
   /* All the basic blocks in this scop that contain memory references
      and that will be represented as statements in the polyhedral
      representation.  */
diff --git a/gcc/sese.h b/gcc/sese.h
index cd19e6010196..c51ea68bfb47 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -153,6 +153,24 @@ defined_in_sese_p (tree name, const sese_l &r)
   return stmt_in_sese_p (SSA_NAME_DEF_STMT (name), r);
 }

+/* Returns true if EXPR has operands that are defined in REGION.  */
+
+static bool
+has_operands_from_region_p (tree expr, const sese_l &region)
+{
+  if (!expr || is_gimple_min_invariant (expr))
+    return false;
+
+  if (TREE_CODE (expr) == SSA_NAME)
+    return defined_in_sese_p (expr, region);
+
+  for (int i = 0; i < TREE_OPERAND_LENGTH (expr); i++)
+    if (has_operands_from_region_p (TREE_OPERAND (expr, i), region))
+      return true;
+
+  return false;
+}
+
 /* Returns true when LOOP is in REGION.  */

 static inline bool
diff --git a/gcc/testsuite/gcc.dg/graphite/alias-1.c b/gcc/testsuite/gcc.dg/graphite/alias-1.c
new file mode 100644
index 000000000000..ee80dae1df33
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/alias-1.c
@@ -0,0 +1,22 @@
+/* This test demonstrates a loop nest that Graphite cannot handle
+   because of aliasing.  It should be possible to handle this loop nest
+   by creating a runtime alias check like in the very similar test
+   alias-0-runtime-check.c.  However, Graphite analyzes the data
+   reference with respect to the innermost loop that contains it, so
+   the variable "i" remains uninstantiated (in contrast to "j").
+   Consequently, the alias check cannot be placed outside of the
+   SCoP since "i" is not defined there.  */
+
+/* { dg-options "-O2 -fgraphite-identity -fgraphite-runtime-alias-checks -fdump-tree-graphite-details" } */
+
+void sum(int *x, int *y, unsigned *sum)
+{
+  unsigned i,j;
+  *sum = 0;
+
+  for (i = 0; i < 10000; i=i+1)
+    for (j = 0; j < 22222; j=j+1)
+      *sum +=  x[i] + y[j];
+}
+
+/* { dg-final { scan-tree-dump "number of SCoPs: 1" "graphite" { xfail *-*-* } } } */

From patchwork Wed Nov 17 16:03:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47823
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id B4481385840A
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:14:50 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 54127385AC0A
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 54127385AC0A
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445345"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:19 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 09/22] openacc: Use Graphite for dependence
 analysis in "kernels" regions
Date: Wed, 17 Nov 2021 17:03:17 +0100
Message-ID: <20211117160330.20029-9-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose.  This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc-kernels=decompose-parloops".  The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions.  This replaces the
specialized handling for "kernels" regions that was previously used
and which was limited in many ways.

Graphite is adjusted to analyze the OpenACC functions that get
outlined from the "kernels" regions.  It is extended to handle the
internal function calls that carry information about OpenACC
constructs.  Where Graphite would otherwise reject such a function
call, the call must be ignored.  In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure.  Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs.  The data-dependence analysis step in
Graphite has so far been tied to the code generation step.  This
commit introduces a separate data-dependence analysis step that avoids
the code generation.  This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
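As a hypothetical illustration of why these clauses matter for the analysis, consider the loop below (not part of this patch).  Taken literally, every iteration writes the scalar "t" and reads and writes "sum", which a plain dependence analysis would report as loop-carried dependences.  According to the ChangeLog, writes to OpenACC "private"/"firstprivate" variables are modelled with kill operations (add_oacc_kills), so that values of "t" from earlier iterations no longer create dependences, and accesses to "reduction" variables are marked and skipped when the read/write relations are built (scop_get_reads_and_writes).  The function name and clauses are assumptions chosen for the example.

```c
/* Hypothetical example: a "kernels" loop with a "private" scalar and a
   "reduction" variable.  "t" is iteration-local, so its earlier values
   are dead at the start of each iteration; the updates of "sum" form a
   "+" reduction.  Without -fopenacc the pragmas are ignored and the
   function computes the sum sequentially.  */
float
sum_of_squares (int n, const float *restrict a)
{
  float t;
  float sum = 0.0f;
#pragma acc kernels copyin(a[0:n])
  {
#pragma acc loop private(t) reduction(+:sum)
    for (int i = 0; i < n; i++)
      {
        t = a[i] * a[i];   /* write to a "private" scalar */
        sum += t;          /* read-modify-write of a reduction variable */
      }
  }
  return sum;
}
```

Both accesses are exactly the kind that must be discounted for the loop's "auto" status to be turned into "independent".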

gcc/ChangeLog:

        * Makefile.in (OBJS): Add graphite-oacc.o.
        * cfgloop.c (alloc_loop): Set can_be_parallel_valid_p to false.
        * cfgloop.h: Add can_be_parallel_valid_p field.
        * cfgloopmanip.c (copy_loop_info): Add assert.
        * config/nvptx/nvptx.c (nvptx_goacc_reduction_setup): Add
        checking assert for reference-typed lhs.
        * doc/invoke.texi: Adjust param openacc-kernels description.
        * doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
        * flag-types.h (enum openacc_kernels): Add
        OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
        * gimple-pretty-print.c (dump_gimple_omp_target): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        * gimple.h (enum gf_mask): Add
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
        widen GF_OMP_TARGET_KIND_MASK.
        (is_gimple_omp_oacc): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (is_gimple_omp_offloaded): Likewise.
        * gimplify.c (gimplify_omp_for): Enable reduction localization
        for "kernels" regions.
        (gimplify_omp_workshare): Likewise.
        * graphite-dependences.c (scop_get_reads_and_writes): Handle
        "kills" and "reduction" PDRs.
        (apply_schedule_on_deps): Add dump output for intermediate
        steps of the dependence computation to enable understanding
        of unexpected dependences.
        (carries_deps): Likewise.
        (scop_get_dependences): Handle "kill" operations and add dump
        output.
        * graphite-isl-ast-to-gimple.c (visit_schedule_loop_node): New function.
        (graphite_oacc_analyze_scop): New function.
        * graphite-optimize-isl.c (optimize_isl): Remove "static" and
        add argument to identify OpenACC use; don't fail on unchanged
        schedule in this case.
        * graphite-poly.c (new_poly_dr): Handle "kills".
        (print_pdr): Likewise.
        (new_gimple_poly_bb): Likewise.
        (free_gimple_poly_bb): Likewise.
        (new_scop): Handle "reduction", "private", and "firstprivate"
        hash sets.
        (free_scop): Likewise.
        (print_isl_space): New function.
        (debug_isl_space): New function.
        * graphite-scop-detection.c (scop_detection::can_represent_loop):
        Don't fail if niter is 0 in OpenACC functions.
        (scop_detection::add_scop): Don't reject regions with only one
        loop in OpenACC functions.
        (ignored_oacc_internal_call_p): New function.
        (scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
        (stmt_has_side_effects): Ignore internal OpenACC function calls.
        (add_write): Likewise.
        (add_read): Likewise.
        (add_kill): New function.
        (add_kills): New function.
        (add_oacc_kills): New function.
        (try_generate_gimple_bb): Kill false dependences for OpenACC
        "private"/"firstprivate" vars.
        (gather_bbs::gather_bbs): Determine OpenACC
        "private"/"firstprivate" vars in region.
        (gather_bbs::before_dom_children): Add assert.
        (determine_openacc_reductions): New function.
        (build_scops): Determine OpenACC "reduction" vars in SCoP.
        * graphite-sese-to-poly.c (oacc_ifn_call_extract): New declaration.
        (oacc_internal_call_p): New function.
        (build_poly_dr): Ignore internal OpenACC function calls;
        handle "reduction" refs.
        (build_poly_sr): Likewise; handle "kill" operations.
        * graphite.c (graphite_transform_loops): Accept functions with
        only a single loop.
        (oacc_enable_graphite_p): New function.
        (gate_graphite_transforms): Enable pass on OpenACC functions.
        * graphite.h (enum poly_dr_type): Add PDR_KILL.
        (struct poly_dr): Add "is_reduction" field.
        (new_poly_dr): Add argument to declaration.
        (pdr_kill_p): New function.
        (print_isl_space): New declaration.
        (debug_isl_space): New declaration.
        (struct scop): Add fields "reductions_vars",
        "oacc_firstprivate_vars", and "oacc_private_scalars".
        (optimize_isl): New declaration.
        (graphite_oacc_analyze_scop): New declaration.
        * internal-fn.c (expand_UNIQUE): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE.
        * omp-expand.c (struct omp_region): Adjust comment.
        (expand_omp_taskloop_for_inner):
        (expand_omp_for): Add asserts about expected "kernels" region types.
        (mark_loops_in_oacc_kernels_region): Likewise.
        (expand_omp_target): Likewise; handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (build_omp_regions_1): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        Likewise.
        (omp_make_gimple_edges): Likewise.
        * omp-general.c (oacc_get_kernels_attrib): New function.
        (oacc_get_fn_dim_size): Allow argument to be NULL.
        * omp-general.h (oacc_get_kernels_attrib): New declaration.
        * omp-low.c (struct omp_context): Add fields
        "oacc_firstprivate_vars" and "oacc_private_scalars".
        (was_originally_oacc_kernels): New function.
        (is_oacc_kernels):
        (is_oacc_kernels_decomposed_graphite_part): New function.
        (new_omp_context): Allocate "oacc_firstprivate_vars" and
        "oacc_private_scalars" ...
        (delete_omp_context): ... and free from here.
        (oacc_record_firstprivate_var_clauses): New function.
        (oacc_record_private_scalars): New function.
        (scan_sharing_clauses): Call functions to record "private"
        scalars and "firstprivate" variables.
        (check_oacc_kernel_gwv): Add assert.
        (ctx_in_oacc_kernels_region): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (scan_omp_for): Likewise.
        (check_omp_nesting_restrictions): Likewise.
        (lower_oacc_head_mark): Likewise.
        (lower_omp_for): Likewise.
        (lower_omp_target): Create "private" and "firstprivate" marker
        call statements.
        (lower_oacc_head_tail): Adjust "private" and "firstprivate"
        marker calls.
        (lower_oacc_reductions): Emit "private" and "firstprivate"
        marker call statements.
        (make_oacc_firstprivate_vars_marker): New function.
        (make_oacc_private_scalars_marker): New function.
        * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
        Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
        region using the new "kernels" handling.
        (make_region_seq): Adjust default region type for new
        "kernels" handling; no more exceptions, let Graphite handle everything.
        (make_region_loop_nest): Likewise; add dump output and assert.
        (adjust_nested_loop_clauses): Stop creating "auto" clauses if
        loop has "independent", "gang" etc.
        (transform_kernels_loop_clauses): Likewise.
        * omp-offload.c (oacc_extract_loop_call): New function.
        (oacc_loop_get_cfg_loop): New function.
        (can_be_parallel_str): New function.
        (oacc_loop_can_be_parallel_p): New function.
        (oacc_parallel_kernels_graphite_fun_p): New function.
        (oacc_parallel_fun_p): New function.
        (oacc_loop_transform_auto_into_independent): New function, ...
        (oacc_loop_fixed_partitions): ... called from here to transfer
        the result of Graphite's analysis to the loop.
        (execute_oacc_loop_designation): Handle functions with the
        "oacc parallel_kernels_graphite" attribute.
        (execute_oacc_device_lower): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * omp-offload.h (oacc_extract_loop_call): Add declaration.
        * params.opt: Add "param=openacc-kernels" value "decompose-parloops".
        * sese.c (scalar_evolution_in_region): "Redirect" SCEV
        analysis to outer loop for IFN_GOACC_LOOP calls.
        * sese.h: Add field "kill_scalar_refs".
        * tree-chrec.c (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
        like CASE_CONVERT.
        * tree-data-ref.c (dump_data_reference): Include
        DR_BASE_ADDRESS and DR_OFFSET in dump output.
        (get_references_in_stmt): Don't reject OpenACC internal function
        calls.
        (graphite_find_data_references_in_stmt): Remove unused variable.
        * tree-parloops.c (pass_parallelize_loops::execute): Disable
        pass with the new kernels handling, enable if requested explicitly.
        * tree-scalar-evolution.c (set_scev_analyze_openacc_calls):
        Set flag to enable the analysis of internal OpenACC function
        calls (used for Graphite only).
        (oacc_call_analyzable_p): New function.
        (oacc_ifn_call_extract): New function.
        (oacc_simplify): New function.
        (add_to_evolution): Simplify OpenACC internal function calls
        if applicable.
        (follow_ssa_edge_binary): Likewise.
        (follow_ssa_edge_expr): Likewise.
        (follow_copies_to_constant): Likewise.
        (analyze_initial_condition): Likewise.
        (interpret_loop_phi): Likewise.
        (interpret_gimple_call): New function.
        (interpret_rhs_expr): Likewise.
        (instantiate_scev_name): Likewise.
        (analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
        (expression_expensive_p): Consider internal OpenACC calls to
        be cheap.
        * tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
        New declaration.
        (oacc_call_analyzable_p): New declaration.
        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Mark
        lhs of internal OpenACC function calls necessary.
        * tree-ssa-ifcombine.c (recognize_if_then_else):
        * tree-ssa-loop-niter.c (oacc_call_analyzable_p): New declaration.
        (oacc_ifn_call_extract): New declaration.
        (interpret_gimple_call): New declaration.
        (expand_simple_operations): Handle internal OpenACC function calls.
        * tree-ssa-loop.c (gate_oacc_kernels): Disable for new
        "kernels" handling.
        * graphite-oacc.c: New file.
        * graphite-oacc.h: New file.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/classify-kernels.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
        * c-c++-common/goacc/kernels-reduction.c: Removed.
        * gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/Makefile.in                               |   1 +
 gcc/cfgloop.c                                 |   1 +
 gcc/cfgloop.h                                 |   6 +
 gcc/cfgloopmanip.c                            |   1 +
 gcc/config/nvptx/nvptx.c                      |   7 +
 gcc/doc/invoke.texi                           |  20 +-
 gcc/doc/passes.texi                           |   6 +-
 gcc/flag-types.h                              |   1 +
 gcc/gimple-pretty-print.c                     |   3 +
 gcc/gimple.h                                  |   7 +-
 gcc/gimplify.c                                |  13 +-
 gcc/graphite-dependences.c                    | 220 ++++--
 gcc/graphite-isl-ast-to-gimple.c              |  93 ++-
 gcc/graphite-oacc.c                           | 689 ++++++++++++++++++
 gcc/graphite-oacc.h                           |  55 ++
 gcc/graphite-optimize-isl.c                   |   7 +-
 gcc/graphite-poly.c                           |  39 +-
 gcc/graphite-scop-detection.c                 | 171 ++++-
 gcc/graphite-sese-to-poly.c                   |  65 +-
 gcc/graphite.c                                | 120 ++-
 gcc/graphite.h                                |  35 +-
 gcc/internal-fn.c                             |   2 +
 gcc/internal-fn.h                             |   4 +-
 gcc/omp-expand.c                              |  73 +-
 gcc/omp-general.c                             |  21 +-
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.c                                 | 321 ++++++--
 gcc/omp-oacc-kernels-decompose.cc             | 145 ++--
 gcc/omp-offload.c                             | 512 +++++++++++--
 gcc/omp-offload.h                             |   2 +
 gcc/params.opt                                |   5 +-
 gcc/sese.c                                    |  25 +-
 gcc/sese.h                                    |   1 +
 .../goacc/classify-kernels-unparallelized.c   |  45 --
 .../c-c++-common/goacc/classify-kernels.c     |   2 +-
 .../c-c++-common/goacc/kernels-reduction.c    |  36 -
 ...kernels-conditional-loop-independent_seq.c |   2 +-
 .../goacc/note-parallelism-1-kernels-loops.c  |   4 +-
 .../goacc/note-parallelism-kernels-loops.c    |  14 +-
 .../goacc/loop-auto-transfer-2.f90            |  47 ++
 .../goacc/loop-auto-transfer-3.f90            | 103 +++
 .../goacc/loop-auto-transfer-4.f90            | 323 ++++++++
 gcc/tree-chrec.c                              |   3 +
 gcc/tree-data-ref.c                           |  20 +-
 gcc/tree-parloops.c                           |  18 +-
 gcc/tree-scalar-evolution.c                   | 179 ++++-
 gcc/tree-scalar-evolution.h                   |   3 +
 gcc/tree-ssa-dce.c                            |  14 +
 gcc/tree-ssa-loop-niter.c                     |   6 +
 gcc/tree-ssa-loop.c                           |  11 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c |   2 +
 .../gangprivate-attrib-1.f90                  |   2 +-
 .../kernels-independent.f90                   |   1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |   1 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   1 +
 55 files changed, 3089 insertions(+), 420 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 304434cbb4b0..4ebdcdbc5f8c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1426,6 +1426,7 @@ OBJS = \
        graphite-poly.o \
        graphite-scop-detection.o \
        graphite-sese-to-poly.o \
+       graphite-oacc.o \
        gtype-desc.o \
        haifa-sched.o \
        hash-map-tests.o \
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 4e227cd0891e..996a38fca894 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -351,6 +351,7 @@ alloc_loop (void)
   loop->exits = ggc_cleared_alloc<loop_exit> ();
   loop->exits->next = loop->exits->prev = loop->exits;
   loop->can_be_parallel = false;
+  loop->can_be_parallel_valid_p = false;
   loop->constraints = 0;
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 113241da130a..f067bfec539e 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -213,6 +213,12 @@ public:
   /* True if the loop can be parallel.  */
   unsigned can_be_parallel : 1;

+  /* True if the can_be_parallel flag is valid, i.e. the
+     parallelizability of the loop has been analyzed.  This can be
+     used to distinguish between unparallelizable loops and a failed
+     analysis, e.g. to provide better diagnostic messages.  */
+  unsigned can_be_parallel_valid_p : 1;
+
   /* True if -Waggressive-loop-optimizations warned about this loop
      already.  */
   unsigned warned_aggressive_loop_optimizations : 1;
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 99a88b855e11..8305f6a75a29 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1017,6 +1017,7 @@ copy_loop_info (class loop *loop, class loop *target)
   target->simdlen = loop->simdlen;
   target->constraints = loop->constraints;
   target->can_be_parallel = loop->can_be_parallel;
+  target->can_be_parallel_valid_p = loop->can_be_parallel_valid_p;
   target->warned_aggressive_loop_optimizations
     |= loop->warned_aggressive_loop_optimizations;
   target->dont_vectorize = loop->dont_vectorize;
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index e23c3902306d..15f6fc821328 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5941,7 +5941,14 @@ nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
     }

   if (lhs)
+    {
+      //TODO Earlier check for ICE as reported in <http://mid.mail-archive.com/878s9zgir3.fsf@euler.schwinge.homeip.net>.
+      //TODO Not sure if this makes too much sense to have (just) here -- should probably be moved (way) further up in the pipeline?
+      if (TREE_CODE (TREE_TYPE (lhs)) == REFERENCE_TYPE)
+       gcc_checking_assert (is_gimple_addressable (var));
+
     gimplify_assign (lhs, var, &seq);
+    }

   pop_gimplify_context (NULL);
   gsi_replace_with_seq (&gsi, seq, true);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7f938e30d3aa..ef55ee595fc4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14434,14 +14434,21 @@ Maximum depth of logical expression evaluation ranger will look through
 when evaluating outgoing edge ranges.

 @item openacc-kernels
-Specify mode of OpenACC `kernels' constructs handling.
-With @option{--param=openacc-kernels=decompose}, OpenACC `kernels'
+Specify mode of OpenACC `kernels' constructs handling.  With
+@option{--param=openacc-kernels=decompose}, OpenACC `kernels'
 constructs are decomposed into parts, a sequence of compute
-constructs, each then handled individually.
-This is work in progress.
+constructs, each then handled individually.  The data dependence
+analysis that is necessary to determine if loops can be parallelized
+is performed by the Graphite pass.
+This is the default.
+With @option{--param=openacc-kernels=decompose-parloops}, OpenACC
+`kernels' constructs are decomposed into parts, a sequence of compute
+constructs, each then handled individually by the @samp{parloops}
+pass.
+This is deprecated.
 With @option{--param=openacc-kernels=parloops}, OpenACC `kernels'
-constructs are handled by the @samp{parloops} pass, en bloc.
-This is the current default.
+constructs are handled by the @samp{parloops} pass, en bloc.  This is
+deprecated.

 @end table

diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 9046cbed2d90..2649e01cc945 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets.  It is located in

 This is a pass group for processing OpenACC kernels regions.  It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops.  It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops}-based handling of
+kernels regions is used.  It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.

 @item Target clone

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a038c8fb738f..db803eb19c87 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -424,6 +424,7 @@ enum evrp_mode
 enum openacc_kernels
 {
   OPENACC_KERNELS_DECOMPOSE,
+  OPENACC_KERNELS_DECOMPOSE_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2618b39c031d..03d9010e044a 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1769,6 +1769,9 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       kind = " oacc_parallel_kernels_gang_single";
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+      kind = " oacc_parallel_kernels_graphite";
+      break;
     case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       kind = " oacc_data_kernels";
       break;
diff --git a/gcc/gimple.h b/gcc/gimple.h
index ab41d851de74..988956242820 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -161,7 +161,7 @@ enum gf_mask {
     GF_OMP_FOR_KIND_SIMD       = 5,
     GF_OMP_FOR_COMBINED                = 1 << 3,
     GF_OMP_FOR_COMBINED_INTO   = 1 << 4,
-    GF_OMP_TARGET_KIND_MASK    = (1 << 4) - 1,
+    GF_OMP_TARGET_KIND_MASK    = (1 << 5) - 1,
     GF_OMP_TARGET_KIND_REGION  = 0,
     GF_OMP_TARGET_KIND_DATA    = 1,
     GF_OMP_TARGET_KIND_UPDATE  = 2,
@@ -184,6 +184,9 @@ enum gf_mask {
     /* A 'GF_OMP_TARGET_KIND_OACC_DATA' representing an OpenACC 'kernels'
        decomposed parts' 'data' construct.  */
     GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 15,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, for Graphite to analyze.  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE = 16,
     GF_OMP_TEAMS_HOST          = 1 << 0,

     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6619,6 +6622,7 @@ is_gimple_omp_oacc (const gimple *stmt)
        case GF_OMP_TARGET_KIND_OACC_DECLARE:
        case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
        case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
          return true;
@@ -6648,6 +6652,7 @@ is_gimple_omp_offloaded (const gimple *stmt)
        case GF_OMP_TARGET_KIND_OACC_SERIAL:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
          return true;
        default:
          return false;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 24ce0e0fbe94..3291c030aca5 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12934,11 +12934,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
             && outer->region_type != ORT_ACC_KERNELS)
        outer = outer->outer_context;

-      /* FIXME: Reductions only work in parallel regions at present.  We avoid
-        doing the reduction localization transformation in kernels regions
-        here, because the code to remove reductions in kernels regions cannot
-        handle that.  */
-      if (outer && outer->region_type == ORT_ACC_PARALLEL)
+      if (outer && (outer->region_type == ORT_ACC_PARALLEL
+                   || (outer->region_type == ORT_ACC_KERNELS
+                       && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)))
        localize_reductions (OMP_FOR_CLAUSES (for_stmt),
                             OMP_FOR_BODY (for_stmt));
     }
@@ -14472,8 +14470,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     {
       push_gimplify_context ();

-      /* FIXME: Reductions are not supported in kernels regions yet.  */
-      if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+      if (ort == ORT_ACC_PARALLEL
+          || (ort == ORT_ACC_KERNELS
+              && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE))
         localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

       gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 9f2eda34add3..24b081624c72 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -38,6 +38,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "gimple-pretty-print.h"
+

 /* Add the constraints from the set S to the domain of MAP.  */

@@ -63,71 +66,108 @@ add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb)
   return constrain_domain (x, isl_set_copy (pbb->domain));
 }

-/* Returns an isl description of all memory operations in SCOP.  The memory
-   reads are returned in READS and writes in MUST_WRITES and MAY_WRITES.  */
+/* Returns an isl description of all memory operations in SCOP.  The
+   memory reads are returned in READS and writes in MUST_WRITES and
+   MAY_WRITES, kills go to KILLS. */

 static void
 scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
                           isl_union_map *&must_writes,
-                          isl_union_map *&may_writes)
+                          isl_union_map *&may_writes,
+                          isl_union_map *&kills)
 {
   int i, j;
   poly_bb_p pbb;
   poly_dr_p pdr;

   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
+  {
+    FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr)
     {
-      FOR_EACH_VEC_ELT (PBB_DRS (pbb), j, pdr) {
-       if (pdr_read_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding read to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           reads = isl_union_map_union (reads, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "Reads depedence graph: ");
-               print_isl_union_map (dump_file, reads);
-             }
-         }
-       else if (pdr_write_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding must write to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           must_writes = isl_union_map_union (must_writes, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "Must writes depedence graph: ");
-               print_isl_union_map (dump_file, must_writes);
-             }
-         }
-       else if (pdr_may_write_p (pdr))
-         {
-           if (dump_file)
-             {
-               fprintf (dump_file, "Adding may write to depedence graph: ");
-               print_pdr (dump_file, pdr);
-             }
-           isl_union_map *um
-             = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
-           may_writes = isl_union_map_union (may_writes, um);
-           if (dump_file)
-             {
-               fprintf (dump_file, "May writes depedence graph: ");
-               print_isl_union_map (dump_file, may_writes);
-             }
-         }
-      }
+      isl_union_map *um = NULL;
+
+      if (pdr->is_reduction)
+       {
+         if (dump_file)
+           {
+              fprintf (dump_file,
+                       "Skipped reduction variable %s in statement:\n",
+                       pdr_write_p (pdr) ? "write" : "read");
+             print_gimple_stmt (dump_file, pdr->stmt, 0, dump_flags);
+             fprintf (dump_file, "\n");
+            }
+          continue;
+       }
+
+      if (pdr_read_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %sread to dependence graph: ",
+                   pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+             isl_map* tmp = add_pdr_constraints (pdr, pbb);
+             print_isl_map (dump_file, tmp);
+             isl_map_free (tmp);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          reads = isl_union_map_union (reads, um);
+          if (dump_file)
+           {
+              fprintf (dump_file, "Reads dependence graph: ");
+              print_isl_union_map (dump_file, reads);
+            }
+        }
+      else if (pdr_write_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %smust write to dependence graph: ",
+                      pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+            }
+
+
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          must_writes = isl_union_map_union (must_writes, um);
+        }
+      else if (pdr_may_write_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding %smay write to dependence graph: ",
+                      pdr->is_reduction ? "reduction " : "");
+              print_pdr (dump_file, pdr);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          may_writes = isl_union_map_union (may_writes, um);
+          if (dump_file)
+            {
+              fprintf (dump_file, "May writes dependence graph: ");
+              print_isl_union_map (dump_file, may_writes);
+            }
+        }
+      else if (pdr_kill_p (pdr))
+        {
+          if (dump_file)
+            {
+              fprintf (dump_file, "Adding kill to dependence graph: ");
+              print_pdr (dump_file, pdr);
+            }
+          um = isl_union_map_from_map (add_pdr_constraints (pdr, pbb));
+
+          kills = isl_union_map_union (kills, um);
+          if (dump_file)
+            {
+              fprintf (dump_file, "Kills: ");
+              print_isl_union_map (dump_file, kills);
+            }
+        }
     }
+  }
 }

 /* Helper function used on each MAP of a isl_union_map.  Computes the
@@ -203,7 +243,19 @@ apply_schedule_on_deps (__isl_keep isl_union_map *schedule,
   isl_union_map *trans = extend_schedule (isl_union_map_copy (schedule));
   isl_union_map *ux = isl_union_map_copy (deps);
   ux = isl_union_map_apply_domain (ux, isl_union_map_copy (trans));
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied domain map to dependences:\n");
+      print_isl_union_map (dump_file, ux);
+    }
   ux = isl_union_map_apply_range (ux, trans);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied range map:\n");
+      print_isl_union_map (dump_file, ux);
+    }
+
   ux = isl_union_map_coalesce (ux);

   if (!isl_union_map_is_empty (ux))
@@ -230,6 +282,12 @@ carries_deps (__isl_keep isl_union_map *schedule,
   if (x == NULL)
     return false;

+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Applied schedule on dependences:\n");
+      print_isl_map (dump_file, x);
+    }
+
   isl_space *space = isl_map_get_space (x);
   isl_map *lex = isl_map_lex_le (isl_space_range (space));
   isl_constraint *ineq = isl_inequality_alloc
@@ -244,7 +302,22 @@ carries_deps (__isl_keep isl_union_map *schedule,
   ineq = isl_constraint_set_constant_si (ineq, -1);
   lex = isl_map_add_constraint (lex, ineq);
   lex = isl_map_coalesce (lex);
+
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Lex: \n");
+      print_isl_map (dump_file, lex);
+    }
+
   x = isl_map_intersect (x, lex);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "Intersect: \n");
+      print_isl_map (dump_file, x);
+    }
+
   bool res = !isl_map_is_empty (x);

   isl_map_free (x);
@@ -265,8 +338,9 @@ scop_get_dependences (scop_p scop)
   isl_space *space = isl_set_get_space (scop->param_context);
   isl_union_map *reads = isl_union_map_empty (isl_space_copy (space));
   isl_union_map *must_writes = isl_union_map_empty (isl_space_copy (space));
-  isl_union_map *may_writes = isl_union_map_empty (space);
-  scop_get_reads_and_writes (scop, reads, must_writes, may_writes);
+  isl_union_map *may_writes = isl_union_map_empty (isl_space_copy (space));
+  isl_union_map *kills = isl_union_map_empty (space);
+  scop_get_reads_and_writes (scop, reads, must_writes, may_writes, kills);

   if (dump_file)
     {
@@ -282,10 +356,11 @@ scop_get_dependences (scop_p scop)
       fprintf (dump_file, "  [1, i0] is a 'memref' with alias set 1"
               " and first subscript access i0.\n");
       fprintf (dump_file, "  [106] is a 'scalar reference' which is the sum of"
-              " SSA_NAME_VERSION 6"
-              " and --param graphite-max-arrays-per-scop=100\n");
+              " SSA_NAME_VERSION 6 and scop->max_alias_set,"
+              " whose value is 100 in this example.\n");
       fprintf (dump_file, "-----------------------\n\n");

+      fprintf (dump_file, "max_alias_set: %d\n", scop->max_alias_set);
       fprintf (dump_file, "data references (\n");
       fprintf (dump_file, "  reads: ");
       print_isl_union_map (dump_file, reads);
@@ -293,31 +368,59 @@ scop_get_dependences (scop_p scop)
       print_isl_union_map (dump_file, must_writes);
       fprintf (dump_file, "  may_writes: ");
       print_isl_union_map (dump_file, may_writes);
+      fprintf (dump_file, "  kills: ");
+      print_isl_union_map (dump_file, kills);
       fprintf (dump_file, ")\n");
     }

   gcc_assert (scop->original_schedule);

+
   isl_union_access_info *ai;
   ai = isl_union_access_info_from_sink (isl_union_map_copy (reads));
   ai = isl_union_access_info_set_must_source (ai, isl_union_map_copy (must_writes));
   ai = isl_union_access_info_set_may_source (ai, may_writes);
+  ai = isl_union_access_info_set_kill (ai, isl_union_map_copy (kills));
   ai = isl_union_access_info_set_schedule
     (ai, isl_schedule_copy (scop->original_schedule));
   isl_union_flow *flow = isl_union_access_info_compute_flow (ai);
   isl_union_map *raw = isl_union_flow_get_must_dependence (flow);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "raw dependences (\n");
+      print_isl_union_map (dump_file, raw);
+      fprintf (dump_file, ")\n");
+    }
+
   isl_union_flow_free (flow);

   ai = isl_union_access_info_from_sink (isl_union_map_copy (must_writes));
   ai = isl_union_access_info_set_must_source (ai, must_writes);
   ai = isl_union_access_info_set_may_source (ai, reads);
+  ai = isl_union_access_info_set_kill (ai, kills);
   ai = isl_union_access_info_set_schedule
     (ai, isl_schedule_copy (scop->original_schedule));
   flow = isl_union_access_info_compute_flow (ai);

   isl_union_map *waw = isl_union_flow_get_must_dependence (flow);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "waw dependences (\n");
+      print_isl_union_map (dump_file, waw);
+      fprintf (dump_file, ")\n");
+    }
   isl_union_map *war = isl_union_flow_get_may_dependence (flow);
   war = isl_union_map_subtract (war, isl_union_map_copy (waw));
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "war dependences (\n");
+      print_isl_union_map (dump_file, war);
+      fprintf (dump_file, ")\n");
+    }
+
   isl_union_flow_free (flow);

   raw = isl_union_map_coalesce (raw);
@@ -331,6 +434,9 @@ scop_get_dependences (scop_p scop)

   if (dump_file)
     {
+      fprintf (dump_file, "(space: ");
+      print_isl_space (dump_file, space);
+      fprintf (dump_file, ")\n");
       fprintf (dump_file, "data dependences (\n");
       print_isl_union_map (dump_file, dependences);
       fprintf (dump_file, ")\n");
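Both `isl_union_access_info_compute_flow` calls in `scop_get_dependences` now receive the `kills` map. Roughly, a kill behaves like a definite write that is never itself reported as a dependence source. As a result, a write that precedes a kill can no longer reach reads that follow it. The C++ sketch below shows only that effect, for RAW dependences on a single scalar in straight-line code. `Access` and `flow_dependences` are hypothetical names, and this is not how isl computes flow over polyhedral sets.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy model (not isl): each statement accesses one scalar.
enum class Access { Read, Write, Kill };

// Return the (source, sink) statement indices of read-after-write (flow)
// dependences in a straight-line sequence of accesses to one scalar.
// A kill behaves like a definite write that is never reported as a source,
// so it hides every earlier write from the reads that follow it.
std::vector<std::pair<int, int>>
flow_dependences (const std::vector<Access> &stmts)
{
  std::vector<std::pair<int, int>> deps;
  std::optional<int> last_write;  // Most recent write that still reaches.
  for (int i = 0; i < static_cast<int> (stmts.size ()); ++i)
    switch (stmts[i])
      {
      case Access::Read:
        if (last_write)
          deps.emplace_back (*last_write, i);
        break;
      case Access::Write:
        last_write = i;
        break;
      case Access::Kill:
        last_write.reset ();
        break;
      }
  return deps;
}
```

For the sequence write, read, kill, read, the sketch reports the single dependence (0, 1). The kill at position 2 hides the write from the final read.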
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index caa0160b9bce..c516170d9493 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "stdlib.h"

 struct ast_build_info
 {
@@ -1456,8 +1458,8 @@ generate_entry_out_of_ssa_copies (edge false_entry,
     }
 }

-/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
-   aliasing. */
+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS
+   are free of aliasing. */

 static tree
 generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
@@ -1618,4 +1620,91 @@ graphite_regenerate_ast_isl (scop_p scop)
   return !t.codegen_error_p ();
 }

+/* A callback for traversing a schedule tree that visits the band
+   nodes of a schedule, which correspond to loops.  Checks whether the
+   local schedule carries any dependences and marks the corresponding
+   CFG loop as parallelizable accordingly.  */
+
+static isl_bool
+visit_schedule_loop_node (__isl_keep isl_schedule_node *node, void *user)
+{
+  isl_bool visit_children = isl_bool_true;
+
+  if (isl_schedule_node_get_type (node) != isl_schedule_node_band)
+    return visit_children;
+
+  isl_union_map *dependences = (isl_union_map *)user;
+  isl_union_map *schedule
+      = isl_schedule_node_band_get_partial_schedule_union_map (node);
+  isl_space *space = isl_schedule_node_band_get_space (node);
+
+  isl_id *id = isl_space_get_tuple_id (space, isl_dim_out);
+  const char *name = isl_id_get_name (id);
+  /* Expect the format set by add_loop_schedule, i.e. "L_n".  */
+  gcc_checking_assert (name[0] == 'L' && name[1] == '_');
+  int loop_num = atoi (name + 2);
+  isl_id_free (id);
+
+  int dimension = isl_space_dim (space, isl_dim_out);
+  loop_p loop = get_loop (cfun, loop_num);
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      fprintf (dump_file, "CFG loop %d:\n", loop_num);
+      print_isl_union_map (dump_file, schedule);
+      fprintf (dump_file, "Schedule dimension: %d\n", dimension);
+
+      fprintf (dump_file, "Schedule node space:\n");
+      print_isl_space (dump_file, space);
+      fprintf (dump_file, "data dependences (\n");
+      print_isl_union_map (dump_file, dependences);
+      fprintf (dump_file, ")\n");
+    }
+
+  bool has_deps = carries_deps (schedule, dependences, dimension);
+
+  loop->can_be_parallel = !has_deps;
+  loop->can_be_parallel_valid_p = true;
+
+  if (dump_file && dump_flags & TDF_DETAILS)
+    {
+      dump_user_location_t loc = find_loop_location (loop);
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                       "loop %s data-dependences.\n",
+                      has_deps ? "has" : "has no");
+
+      fprintf (dump_file, ")\n");
+    }
+
+  isl_union_map_free (schedule);
+  isl_space_free (space);
+
+
+  return visit_children;
+}
+
+/* Perform data-dependence analysis on SCOP without using Graphite's code
+   generator.  This is meant for OpenACC, since the code generator is unable
+   to reconstruct the OpenACC loop structure.  */
+
+bool
+graphite_oacc_analyze_scop (scop_p scop)
+{
+  timevar_push (TV_GRAPHITE_CODE_GEN);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "[graphite_oacc_analyze_scop] schedule:\n");
+      print_isl_schedule (dump_file, scop->original_schedule);
+    }
+
+  /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. */
+  isl_schedule_foreach_schedule_node_top_down (
+      scop->original_schedule, visit_schedule_loop_node, scop->dependence);
+
+  timevar_pop (TV_GRAPHITE_CODE_GEN);
+
+  return true;
+}
+
 #endif  /* HAVE_isl */
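`visit_schedule_loop_node` passes each band's partial schedule and its output dimension to `carries_deps`. That function intersects the scheduled dependences with a relation requiring `in + 1 <= out` at dimension `depth - 1` (the constraint with constant -1 in the hunk above). This sketch also assumes, as in upstream GCC's `carries_deps`, that the outer `depth - 1` dimensions are required to be equal (those lines are not shown in the hunk). The C++ toy below applies the same test to an explicit list of dependence instances instead of symbolic isl maps. `TimeVec`, `DepInstance` and `carries_dep_at` are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model: one dependence instance is a pair of scheduled
// time vectors (source, sink) of equal length.
using TimeVec = std::vector<long>;
using DepInstance = std::pair<TimeVec, TimeVec>;

// Return true if some instance is carried by loop level DEPTH (1-based):
// the first DEPTH-1 schedule dimensions of source and sink are equal and
// sink[DEPTH-1] >= source[DEPTH-1] + 1.  Requires 1 <= DEPTH <= length.
bool
carries_dep_at (const std::vector<DepInstance> &deps, std::size_t depth)
{
  for (const auto &dep : deps)
    {
      const TimeVec &src = dep.first;
      const TimeVec &sink = dep.second;
      bool outer_equal = true;
      for (std::size_t i = 0; i + 1 < depth; ++i)
        if (src[i] != sink[i])
          {
            outer_equal = false;
            break;
          }
      if (outer_equal && sink[depth - 1] >= src[depth - 1] + 1)
        return true;
    }
  return false;
}
```

For `A[i][j] = A[i-1][j]`, the instances (i-1, j) -> (i, j) are carried at depth 1 but not at depth 2. In that case only the inner loop would be marked `can_be_parallel`.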
diff --git a/gcc/graphite-oacc.c b/gcc/graphite-oacc.c
new file mode 100644
index 000000000000..94df2bc19c73
--- /dev/null
+++ b/gcc/graphite-oacc.c
@@ -0,0 +1,689 @@
+/* Functions for analyzing the OpenACC loop structure from Graphite.
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "cfghooks.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfgloop.h"
+
+#include "internal-fn.h"
+#include "gimple.h"
+#include "tree-cfg.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
+
+#include "gimple-ssa.h"
+#include "gimple-iterator.h"
+#include "tree-phinodes.h"
+#include "tree-ssa-operands.h"
+#include "ssa-iterators.h"
+#include "omp-general.h"
+#include "graphite-oacc.h"
+
+unsigned
+gimple_call_internal_kind (gimple *call)
+{
+  return TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+}
+
+static inline bool gimple_call_ifn_unique_p (gimple *call,
+                                             enum ifn_unique_kind kind)
+{
+  if (!gimple_call_internal_p (call, IFN_UNIQUE))
+    return false;
+
+  return kind == gimple_call_internal_kind (call);
+}
+
+static inline bool goacc_reduction_call_p (gimple *call)
+{
+  return gimple_call_internal_p (call, IFN_GOACC_REDUCTION);
+}
+
+static inline bool goacc_reduction_call_p (gimple *call,
+                                           enum ifn_goacc_reduction_kind kind)
+{
+  return gimple_call_internal_p (call, IFN_GOACC_REDUCTION)
+         && gimple_call_internal_kind (call) == kind;
+}
+
+/* Check if VAR is private in the OpenACC loop that encloses the cfg LOOP. The
+   function returns TRUE if there is an IFN_UNIQUE_OACC_PRIVATE call in the
+   head sequence that precedes the CFG loop. */
+
+bool
+is_oacc_private (tree var, loop_p loop)
+{
+  return false;
+
+  if (TREE_CODE (var) == SSA_NAME)
+    {
+      if (!SSA_NAME_VAR (var))
+        return false;
+
+      var = SSA_NAME_VAR (var);
+    }
+
+  gcc_checking_assert (TREE_CODE (var) == VAR_DECL);
+
+  if (!loop)
+    return false;
+
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* We are looking for the sequence of IFN_UNIQUE calls at the
+          head of the current OpenACC loop. */
+      if (!gimple_call_internal_p (stmt, IFN_UNIQUE))
+        continue;
+
+      enum ifn_unique_kind kind
+          = (enum ifn_unique_kind)TREE_INT_CST_LOW (gimple_call_arg (stmt, 0));
+
+      /* The head mark that starts the current OpenACC loop.
+          Private calls above here are irrelevant. Stop. */
+      if (kind == IFN_UNIQUE_OACC_HEAD_MARK && gimple_call_num_args (stmt) > 2)
+        break;
+
+      if (kind != IFN_UNIQUE_OACC_PRIVATE)
+        continue;
+
+      tree private_var = gimple_call_arg (stmt, 3);
+
+      if (TREE_CODE (private_var) == ADDR_EXPR)
+        private_var = TREE_OPERAND (private_var, 0);
+
+      if (var == private_var)
+        return true;
+    }
+
+  return false;
+}
+
+void
+oacc_add_private_var_kills (loop_p loop, vec<tree> *kills)
+{
+  gcc_checking_assert (loop);
+
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* We are looking for the sequence of IFN_UNIQUE calls at the head of the
+         current OpenACC loop. */
+
+      if (!gimple_call_internal_p (stmt, IFN_UNIQUE))
+        continue;
+
+      /* The head mark that starts the current OpenACC loop.
+         Private calls above here are irrelevant.  Stop.  */
+      if (gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK)
+          && gimple_call_num_args (stmt) > 2)
+        break;
+      if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_PRIVATE))
+        continue;
+
+      tree private_var = gimple_call_arg (stmt, 3);
+
+      gcc_checking_assert (TREE_CODE (private_var) == ADDR_EXPR);
+      private_var = TREE_OPERAND (private_var, 0);
+      kills->safe_push (private_var);
+    }
+}
+
+typedef std::pair<gcall *, gcall *> gcall_pair;
+
+/* Returns a pair that contains the internal function calls that start
+   and end the head sequence of the OpenACC loop enclosing the CFG
+   loop LOOP, or a pair of NULL pointers if LOOP is not enclosed in an
+   OpenACC loop.  */
+
+gcall_pair
+find_oacc_head_marks (loop_p loop)
+{
+  basic_block bb = loop->header;
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  gcall *top_head_mark = NULL;
+  gcall *bottom_head_mark = NULL;
+
+  while (bb != entry_bb)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+      gimple *stmt = last_stmt (bb);
+      if (!stmt)
+        continue;
+
+      /* Look for IFN_UNIQUE calls in the head of the OpenACC loop.  */
+      if (!gimple_call_ifn_unique_p (stmt, IFN_UNIQUE_OACC_HEAD_MARK))
+        continue;
+
+      if (!bottom_head_mark)
+        {
+          bottom_head_mark = as_a<gcall *> (stmt);
+          continue;
+        }
+
+      /* The head mark that starts the current OpenACC loop can be
+         recognized by the number of call arguments, cf. omp-low.c.  */
+      if (gimple_call_num_args (stmt) > 3)
+        {
+          top_head_mark = as_a<gcall *> (stmt);
+          break;
+        }
+    }
+
+  gcc_checking_assert ((top_head_mark && bottom_head_mark)
+                       || (!top_head_mark && !bottom_head_mark));
+
+  return gcall_pair (top_head_mark, bottom_head_mark);
+}
+
+/* Returns the internal function call that starts the tail sequence of the
+   OpenACC loop that encloses the CFG loop LOOP or NULL if LOOP is not
+   contained in an OpenACC loop. */
+
+gcall *
+find_oacc_top_tail_mark (loop_p loop)
+{
+  gcall_pair head_marks = find_oacc_head_marks (loop);
+
+  if (!head_marks.first || !head_marks.second)
+    return NULL;
+
+  tree data_dep = gimple_call_lhs (head_marks.second);
+  gcc_checking_assert (has_single_use (data_dep));
+
+  gimple *tail_mark;
+  use_operand_p use_p;
+  single_imm_use (data_dep, &use_p, &tail_mark);
+
+  return as_a<gcall *> (tail_mark);
+}
+
+/* Returns a pair containing the internal function calls that start and end the
+   tail sequence of the OpenACC loop that encloses the cfg loop LOOP or a pair
+   of NULL pointers if LOOP does not belong to an OpenACC loop. */
+
+gcall_pair
+find_oacc_tail_marks (loop_p loop)
+{
+  gcall *top_tail_mark = find_oacc_top_tail_mark (loop);
+
+  if (!top_tail_mark)
+    return gcall_pair (NULL, NULL);
+
+  tree data_dep = gimple_call_lhs (top_tail_mark);
+  gimple *stmt = top_tail_mark;
+
+  while (has_single_use (data_dep))
+    {
+      use_operand_p use_p;
+      single_imm_use (data_dep, &use_p, &stmt);
+      gcc_checking_assert (gimple_call_internal_p (stmt));
+
+      data_dep = gimple_call_lhs (stmt);
+    }
+
+  gcall *end_tail_mark = as_a<gcall *> (stmt);
+
+  gcc_checking_assert (
+      gimple_call_ifn_unique_p (end_tail_mark, IFN_UNIQUE_OACC_TAIL_MARK));
+
+  return gcall_pair (top_tail_mark, end_tail_mark);
+}
+
+/* Add all ssa names to VARS that can be reached from PHI by a
+   phi node walk. */
+
+static void
+collect_oacc_reduction_vars_phi_walk (gphi *phi, hash_set<tree> &vars)
+{
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (vars.contains (use))
+      continue;
+
+    gimple *def_stmt = SSA_NAME_DEF_STMT (use);
+    vars.add (use);
+
+    gphi *use_phi = dyn_cast<gphi *> (def_stmt);
+    if (use_phi)
+      {
+        collect_oacc_reduction_vars_phi_walk (use_phi, vars);
+
+        continue;
+      }
+  }
+}
+
+/* Returns true iff following the immediate use chain from the
+   IFN_GOACC_REDUCTION call CALL leads out of the loop that contains CALL.  */
+
+static bool
+reduction_use_in_outer_loop_p (gcall *call)
+{
+  gcc_checking_assert (goacc_reduction_call_p (call));
+
+  tree data_dep = gimple_call_lhs (call);
+
+  /* The IFN_GOACC_REDUCTION calls are linked in a chain through
+     immediate uses.  Move to the end of this chain.  */
+  gimple *stmt = call;
+  while (has_single_use (data_dep))
+    {
+      use_operand_p use_p;
+      single_imm_use (data_dep, &use_p, &stmt);
+
+      if (!goacc_reduction_call_p (stmt))
+        return true;
+
+      data_dep = gimple_call_lhs (stmt);
+    }
+
+  gcc_checking_assert (goacc_reduction_call_p (stmt));
+
+  /* Call starting further reduction use in outer loop. */
+  if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_SETUP))
+    return true;
+
+  /* Reduction use ends with last internal call in present loop. */
+  if (goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_TEARDOWN))
+    return false;
+  gcc_unreachable ();
+}
+
+/* Add all ssa names to VARS that can be reached from BB by walking
+   through the phi nodes which start at the result of an OpenACC
+   reduction computation in BB. */
+
+static void
+collect_oacc_reduction_vars_in_bb (basic_block bb, hash_set<tree> &vars)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (!goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI))
+        continue;
+
+      tree var = gimple_call_arg (stmt, 2);
+      gcc_checking_assert (TREE_CODE (var) == SSA_NAME);
+
+      if (vars.contains (var))
+        continue;
+
+      gimple *def_stmt = SSA_NAME_DEF_STMT (var);
+
+      if (gimple_code (def_stmt) != GIMPLE_PHI)
+        {
+          gcc_checking_assert (goacc_reduction_call_p (def_stmt));
+
+          continue;
+        }
+
+      gcc_checking_assert (
+          goacc_reduction_call_p (stmt, IFN_GOACC_REDUCTION_FINI));
+      gcc_checking_assert (gimple_code (def_stmt) == GIMPLE_PHI);
+
+      if (reduction_use_in_outer_loop_p (as_a<gcall *> (stmt)))
+        vars.add (var);
+
+      collect_oacc_reduction_vars_phi_walk (static_cast<gphi *> (def_stmt),
+                                            vars);
+    }
+}
+
+/* Add all ssa names to VARS that are defined by phi nodes in the header of LOOP
+   such that at least one argument of the phi belongs to VARS. */
+
+static void
+collect_oacc_reduction_vars_in_loop_header (loop_p loop, hash_set<tree> &vars)
+{
+  for (gphi_iterator gpi = gsi_start_phis (loop->header); !gsi_end_p (gpi);
+       gsi_next (&gpi))
+    {
+      gphi *phi = const_cast<gphi *> (gpi.phi ());
+
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+      {
+        tree use = USE_FROM_PTR (use_p);
+        if (vars.contains (use))
+          vars.add (gimple_phi_result (phi));
+      }
+    }
+}
+
+/* Find the ssa names that belong to an OpenACC reduction in the OpenACC loop
+   that surrounds the cfg loop LOOP and add them to VARS.  LOOP must be
+   contained in an OpenACC loop.
+
+   Since the reductions have not been, and cannot be, lowered before the
+   Graphite pass runs (their lowering is device dependent), Graphite needs to
+   simulate the privatization of the reduction variables by removing the
+   dependences between the iteration instances of the loop and the
+   dependences arising from copying the initial value of the reduction
+   variable in and the result out.
+
+   The OpenACC lowering will copy the results of reduction computations at the
+   IFN_GOACC_REDUCTION_FINI calls.  The main reduction statement can thus be
+   identified by walking from those calls through all encountered phi nodes
+   until we reach a gimple assignment statement.  The SSA name defined by this
+   statement, as well as the SSA names encountered in the phis along the way,
+   are recorded in VARS.  In addition, the SSA name defined by each phi in
+   LOOP's header that uses a previously identified reduction variable is also
+   added to VARS.  */
+
+void
+collect_oacc_reduction_vars (loop_p loop, hash_set<tree> &vars)
+{
+  gcall_pair tail = find_oacc_tail_marks (loop);
+  bool in_openacc_loop = tail.first != NULL;
+
+  if (!in_openacc_loop)
+    return;
+
+  const gcall *top_mark = tail.first;
+  const gcall *bottom_mark = tail.second;
+
+  basic_block bb = top_mark->bb;
+  gcc_checking_assert (single_succ_p (bb));
+
+  do
+    {
+      bb = single_succ (bb);
+      collect_oacc_reduction_vars_in_bb (bb, vars);
+    }
+  while (bb != bottom_mark->bb && single_succ_p (bb));
+
+  collect_oacc_reduction_vars_in_loop_header (loop, vars);
+}
+
+static void collect_oacc_privatized_vars_phi_walk_visit_phi_uses (
+    tree var, hash_set<tree> &vars, hash_set<tree> &visited);
+
+/* Add all ssa names to VARS that can be reached from PHI by a phi node walk. */
+
+static void
+collect_oacc_privatized_vars_phi_walk (gphi *phi, hash_set<tree> &vars,
+                                       hash_set<tree> &visited)
+{
+  tree var = PHI_RESULT (phi);
+  bool existed = vars.add (var);
+  if (existed)
+    return;
+
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_PHI_ARG (use_p, phi, iter, SSA_OP_ALL_USES)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (visited.contains (use))
+      continue;
+
+    gimple *def_stmt = SSA_NAME_DEF_STMT (use);
+    gphi *use_phi = dyn_cast<gphi *> (def_stmt);
+    if (use_phi)
+      {
+        collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited);
+        visited.add (use);
+        continue;
+      }
+
+    vars.add (use);
+
+    /* Visit the uses of USE in other phi nodes. This is used to get from loop
+       exit phis in inner loops to the loop entry phis. */
+
+    collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars, visited);
+    visited.add (use);
+  }
+}
+
+/* Records all uses of VAR in phis in VARS and continues the phi walk on each
+   such use. */
+
+static void
+collect_oacc_privatized_vars_phi_walk_visit_phi_uses (tree var,
+                                                      hash_set<tree> &vars,
+                                                      hash_set<tree> &visited)
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
+  {
+    tree use = USE_FROM_PTR (use_p);
+    if (TREE_CODE (use) != SSA_NAME)
+      continue;
+
+    if (visited.contains (use))
+      continue;
+
+    gimple *use_stmt = USE_STMT (use_p);
+    gphi *use_phi = dyn_cast<gphi *> (use_stmt);
+
+    if (use_phi)
+      {
+        visited.add (PHI_RESULT (use_phi));
+        collect_oacc_privatized_vars_phi_walk (use_phi, vars, visited);
+        continue;
+      }
+
+    if (TREE_CODE (use) == SSA_NAME
+        && SSA_NAME_VAR (use) == SSA_NAME_VAR (var))
+      {
+        if (!vars.add (use))
+          collect_oacc_privatized_vars_phi_walk_visit_phi_uses (use, vars,
+                                                                visited);
+        continue;
+      }
+  }
+
+  return;
+}
+
+/* Return the first IFN_UNIQUE call with the given KIND that follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+static gcall *
+find_ifn_unique_call_below (loop_p loop, enum ifn_unique_kind kind)
+{
+  gcall_pair tail = find_oacc_tail_marks (loop);
+  bool in_openacc_loop = tail.first != NULL;
+
+  if (!in_openacc_loop)
+    return NULL;
+
+  edge exit = single_exit (loop);
+  basic_block bb = exit->dest;
+  while ((bb = get_immediate_dominator (CDI_POST_DOMINATORS, bb)))
+    {
+      gimple *stmt = last_stmt (bb);
+
+      if (!stmt)
+        continue;
+
+      if (gimple_call_ifn_unique_p (stmt, kind))
+        return static_cast<gcall *> (stmt);
+    }
+
+  return NULL;
+}
+
+/* Return the IFN_UNIQUE_OACC_PRIVATE_SCALAR call which follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+gcall *
+get_oacc_private_scalars_call (loop_p loop)
+{
+  return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_PRIVATE_SCALAR);
+}
+
+/* Return the IFN_UNIQUE_OACC_FIRSTPRIVATE call which follows the tail
+   sequence of the OpenACC loop surrounding LOOP. */
+
+gcall *
+get_oacc_firstprivate_call (loop_p loop)
+{
+  return find_ifn_unique_call_below (loop, IFN_UNIQUE_OACC_FIRSTPRIVATE);
+}
+
+/* Find the SSA names that belong to the computation of the "private"
+   variables listed in the arguments of the internal function call MARKER
+   and add them to VARS.
+
+   The CFG loop structure of OpenACC loops does not directly reflect the
+   privatization of the variables since the original loop has been enclosed
+   in a "chunking" loop.  The "private" scalar variables are alive in those
+   two outermost CFG loops and the corresponding phis must be ignored by
+   Graphite in order to recognize the parallelizability of the loop.
+   Omp-low.c places a special internal function call after the outermost loop
+   of a parallel region whose arguments list the "private" variables.  */
+
+void
+collect_oacc_privatized_vars (gcall *marker, hash_set<tree> &vars)
+{
+  if (!marker)
+    return;
+
+  gcc_checking_assert (marker->bb->loop_father->num == 0);
+
+  /* Search for phis that can be reached from the vars listed in
+     MARKER's arguments.  */
+
+  const unsigned n = gimple_call_num_args (marker);
+  for (unsigned i = 1; i < n; ++i)
+    {
+      tree arg = gimple_call_arg (marker, i);
+
+      if (TREE_CODE (arg) != SSA_NAME)
+        continue;
+
+      gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
+      gphi *phi = dyn_cast<gphi *> (def_stmt);
+      if (!phi)
+        {
+          /* If the argument is not defined by a phi, then it must be some
+             value defined outside of any OpenACC loop nest, i.e. a parameter
+             of the loop nest.  */
+          gcc_checking_assert (!def_stmt->bb
+                               || def_stmt->bb->loop_father->num == 0);
+          continue;
+        }
+
+      hash_set<tree> visited;
+      collect_oacc_privatized_vars_phi_walk (phi, vars, visited);
+    }
+}
+
+/* Return true if LOOP is an OpenACC loop with an "auto" clause.  */
+
+static bool
+oacc_loop_with_auto_clause_p (loop_p loop)
+{
+  gcall_pair head_marks = find_oacc_head_marks (loop);
+
+  if (!head_marks.first)
+    return false;
+
+  unsigned flags = TREE_INT_CST_LOW (gimple_call_arg (head_marks.first, 3));
+  return flags & OLF_AUTO;
+}
+
+/* Return true if FUN is an outlined OpenACC function that contains loops with
+   "auto" clauses. */
+
+static bool
+function_has_auto_loops_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  loop_p loop;
+  FOR_EACH_LOOP_FN (fun, loop, 0)
+    if (oacc_loop_with_auto_clause_p (loop))
+      return true;
+
+  return false;
+}
+
+/* Return true if Graphite might analyze outlined OpenACC functions for the kind
+   of target region for which FUN was created. The actual decision whether
+   Graphite runs on FUN may be subject to further restrictions. */
+
+bool
+graphite_analyze_oacc_target_region_type_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  bool is_oacc_parallel
+      = lookup_attribute ("oacc parallel",
+                          DECL_ATTRIBUTES (fun->decl))
+        != NULL;
+
+  bool is_oacc_parallel_kernels_graphite
+      = lookup_attribute ("oacc parallel_kernels_graphite",
+                          DECL_ATTRIBUTES (fun->decl))
+        != NULL;
+
+  return is_oacc_parallel || is_oacc_parallel_kernels_graphite;
+}
+
+/* Return true if FUN is an outlined OpenACC function that is going to be
+   analyzed by Graphite. */
+
+bool
+graphite_analyze_oacc_function_p (function *fun)
+{
+  gcc_checking_assert (oacc_function_p (fun));
+
+  return graphite_analyze_oacc_target_region_type_p (fun)
+         && function_has_auto_loops_p (fun);
+}
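`find_oacc_head_marks`, `is_oacc_private` and `oacc_add_private_var_kills` all climb the dominator tree from the loop header towards the entry block and inspect the last statement of each dominator. The C++ sketch below models only the head-mark walk of `find_oacc_head_marks` on a hypothetical array-based CFG. `ToyBlock` and `toy_find_head_marks` are illustrative names, not GCC API. In GCC the blocks are `basic_block`s, and the climb uses `get_immediate_dominator (CDI_DOMINATORS, bb)`.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy basic block: index of the immediate dominator (-1 for the
// entry block) and, if the block ends in an IFN_UNIQUE OACC_HEAD_MARK call,
// that call's argument count.
struct ToyBlock
{
  int idom;
  std::optional<unsigned> head_mark_nargs;
};

// Climb the dominator tree from block HEADER towards the entry block.  The
// first head mark met becomes the bottom mark; the first later mark with
// more than three arguments becomes the top mark.  Returns {top, bottom},
// with -1 for a mark that was not found.
std::pair<int, int>
toy_find_head_marks (const std::vector<ToyBlock> &blocks, int header)
{
  int top = -1;
  int bottom = -1;
  int bb = header;
  while (blocks[bb].idom != -1)
    {
      bb = blocks[bb].idom;
      if (!blocks[bb].head_mark_nargs)
        continue;
      if (bottom == -1)
        {
          bottom = bb;
          continue;
        }
      if (*blocks[bb].head_mark_nargs > 3)
        {
          top = bb;
          break;
        }
    }
  return {top, bottom};
}
```

The real function inspects `last_stmt (bb)` instead of a precomputed field. It also asserts that either both marks or neither are found, whereas the toy simply reports what it saw.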
diff --git a/gcc/graphite-oacc.h b/gcc/graphite-oacc.h
new file mode 100644
index 000000000000..458e8de24dac
--- /dev/null
+++ b/gcc/graphite-oacc.h
@@ -0,0 +1,52 @@
+/* Functions for analyzing the OpenACC loop structure from Graphite.
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_GRAPHITE_OACC_H
+#define GCC_GRAPHITE_OACC_H
+
+#include "stringpool.h"
+#include "omp-general.h"
+#include "attribs.h"
+#include "cfgloop.h"
+#include "tree-pretty-print.h"
+#include "print-tree.h"
+
+static inline bool oacc_function_p (function *fun)
+{
+  return oacc_get_fn_attrib (fun->decl);
+}
+
+extern bool is_oacc_private (tree var, loop_p loop);
+extern void oacc_add_private_var_kills (loop_p loop, vec<tree> *kills);
+
+extern const gcall* find_oacc_head_mark (loop_p loop, bool last = false);
+
+extern void collect_oacc_reduction_vars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_firstprivate_vars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_private_scalars (loop_p loop, hash_set<tree> &vars);
+extern void collect_oacc_privatized_vars (gcall *marker, hash_set<tree> &vars);
+
+extern gcall* get_oacc_firstprivate_call (loop_p loop);
+extern gcall* get_oacc_private_scalars_call (loop_p loop);
+
+extern bool graphite_analyze_oacc_function_p (function *fun);
+extern bool graphite_analyze_oacc_target_region_type_p (function *fun);
+
+#endif /* GCC_GRAPHITE_OACC_H */
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 6928f3e33dca..019452700a49 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -109,8 +109,8 @@ scop_get_domains (scop_p scop)
 /* Compute the schedule for SCOP based on its parameters, domain and set of
    constraints.  Then apply the schedule to SCOP.  */

-static bool
-optimize_isl (scop_p scop)
+bool
+optimize_isl (scop_p scop, bool oacc_enabled_graphite)
 {
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
@@ -196,7 +196,8 @@ optimize_isl (scop_p scop)
        print_schedule_ast (dump_file, scop->original_schedule, scop);
       isl_schedule_free (scop->transformed_schedule);
       scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
-      return flag_graphite_identity || flag_loop_parallelize_all;
+      return flag_graphite_identity || flag_loop_parallelize_all
+             || oacc_enabled_graphite;
     }

   return true;
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 27d5e43af125..1de376532ef1 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -92,7 +92,8 @@ debug_iteration_domains (scop_p scop)

 void
 new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type,
-            isl_map *acc, isl_set *subscript_sizes)
+            isl_map *acc, isl_set *subscript_sizes,
+            bool is_reduction)
 {
   static int id = 0;
   poly_dr_p pdr = XNEW (struct poly_dr);
@@ -105,10 +106,12 @@ new_poly_dr (poly_bb_p pbb, gimple *stmt, enum poly_dr_type type,
   pdr->subscript_sizes = subscript_sizes;
   PDR_TYPE (pdr) = type;
   PBB_DRS (pbb).safe_push (pdr);
+  pdr->is_reduction = is_reduction;

   if (dump_file)
     {
-      fprintf (dump_file, "Converting dr: ");
+      fprintf (dump_file, "Converting%sdr: ",
+              is_reduction ? " reduction " : " ");
       print_pdr (dump_file, pdr);
       fprintf (dump_file, "To polyhedral representation:\n");
       fprintf (dump_file, "  - access functions: ");
@@ -187,6 +190,10 @@ print_pdr (FILE *file, poly_dr_p pdr)
       fprintf (file, "may_write \n");
       break;

+    case PDR_KILL:
+      fprintf (file, "kill \n");
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -212,13 +219,15 @@ debug_pdr (poly_dr_p pdr)

 gimple_poly_bb_p
 new_gimple_poly_bb (basic_block bb, vec<data_reference_p> drs,
-                   vec<scalar_use> reads, vec<tree> writes)
+                   vec<scalar_use> reads, vec<tree> writes,
+                   vec<tree> kills)
 {
   gimple_poly_bb_p gbb = XNEW (struct gimple_poly_bb);
   GBB_BB (gbb) = bb;
   GBB_DATA_REFS (gbb) = drs;
   gbb->read_scalar_refs = reads;
   gbb->write_scalar_refs = writes;
+  gbb->kill_scalar_refs = kills;
   GBB_CONDITIONS (gbb).create (0);
   GBB_CONDITION_CASES (gbb).create (0);

@@ -235,6 +244,7 @@ free_gimple_poly_bb (gimple_poly_bb_p gbb)
   GBB_CONDITION_CASES (gbb).release ();
   gbb->read_scalar_refs.release ();
   gbb->write_scalar_refs.release ();
+  gbb->kill_scalar_refs.release ();
   XDELETE (gbb);
 }

@@ -264,6 +274,9 @@ new_scop (edge entry, edge exit)
   scop_set_region (s, region);
   s->pbbs.create (3);
   s->drs.create (3);
+  s->reduction_vars = new hash_set<tree>(1);
+  s->oacc_firstprivate_vars = new hash_set<tree>(1);
+  s->oacc_private_scalars = new hash_set<tree>(1);
   s->unhandled_alias_ddrs.create (1);
   s->dependence = NULL;
   return s;
@@ -285,6 +298,9 @@ free_scop (scop_p scop)

   scop->pbbs.release ();
   scop->drs.release ();
+  delete scop->reduction_vars;
+  delete scop->oacc_firstprivate_vars;
+  delete scop->oacc_private_scalars;
   scop->unhandled_alias_ddrs.release ();

   isl_set_free (scop->param_context);
@@ -550,6 +566,23 @@ debug_isl_map (__isl_keep isl_map *map)
   print_isl_map (stderr, map);
 }

+
+void
+print_isl_space (FILE *f, __isl_keep isl_space *space)
+{
+  isl_printer *p = isl_printer_to_file (the_isl_ctx, f);
+  p = isl_printer_set_yaml_style (p, ISL_YAML_STYLE_BLOCK);
+  p = isl_printer_print_space (p, space);
+  p = isl_printer_print_str (p, "\n");
+  isl_printer_free (p);
+}
+
+DEBUG_FUNCTION void
+debug_isl_space (__isl_keep isl_space *space)
+{
+  print_isl_space (stderr, space);
+}
+
 void
 print_isl_union_map (FILE *f, __isl_keep isl_union_map *map)
 {
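The new dump label in new_poly_dr uses the format string "Converting%sdr: ", where the string passed for %s carries its own surrounding spaces. The standalone sketch below reproduces only that formatting with snprintf, so the two possible labels are easy to see. The helper name format_dr_label is hypothetical and is not part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper reproducing the label printed by new_poly_dr.
   The %s argument supplies its own surrounding spaces, so both variants
   end up with single spaces between words.  */
static void
format_dr_label (char *buf, size_t size, bool is_reduction)
{
  snprintf (buf, size, "Converting%sdr: ",
            is_reduction ? " reduction " : " ");
}
```

With is_reduction set, the label reads "Converting reduction dr: "; otherwise it is the unchanged "Converting dr: ".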
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 26ba61d1d601..3d4ee30e8250 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "cfganal.h"
 #include "graphite.h"
+#include "omp-general.h"
+#include "graphite-oacc.h"
+#include "print-tree.h"
+#include "internal-fn.h"

 class debug_printer
 {
@@ -630,7 +634,9 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
       DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
       return false;
     }
-  if (!niter_desc.control.no_overflow)
+  /* TODO: A zero niter can probably be allowed in general.  */
+  if (!niter_desc.control.no_overflow
+      && !(oacc_function_p (cfun) && integer_zerop (niter)))
     {
       DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
       return false;
@@ -701,8 +707,7 @@ scop_detection::add_scop (sese_l s)
       s.exit = single_succ_edge (s.exit->dest);
     }

-  /* Do not add scops with only one loop.  */
-  if (region_has_one_loop (s))
+  if (!oacc_function_p (cfun) && region_has_one_loop (s))
     {
       DEBUG_PRINT (dp << "[scop-detection-fail] Discarding one loop SCoP: ";
                   print_sese (dump_file, s));
@@ -1084,6 +1089,17 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
   return true;
 }

+/* Return true if STMT is an internal OpenACC function call that should be
+   ignored when Graphite checks for side effects.  */
+
+static inline bool
+ignored_oacc_internal_call_p (gimple *stmt)
+{
+  return is_gimple_call (stmt)
+         && (gimple_call_internal_p (stmt, IFN_UNIQUE)
+             || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION));
+}
+
 /* GIMPLE_ASM and GIMPLE_CALL may embed arbitrary side effects.
    Calls have side-effects, except those to const or pure
    functions.  */
@@ -1091,6 +1107,9 @@ scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 static bool
 stmt_has_side_effects (gimple *stmt)
 {
+  if (ignored_oacc_internal_call_p (stmt))
+    return false;
+
   if (gimple_has_volatile_ops (stmt)
       || (gimple_code (stmt) == GIMPLE_CALL
          && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)))
@@ -1288,6 +1307,7 @@ scan_tree_for_params (sese_info_p s, tree e)
     case NEGATE_EXPR:
     case BIT_NOT_EXPR:
     CASE_CONVERT:
+    case VIEW_CONVERT_EXPR:
     case NON_LVALUE_EXPR:
       scan_tree_for_params (s, TREE_OPERAND (e, 0));
       break;
@@ -1362,6 +1382,9 @@ find_scop_parameters (scop_p scop)
 static void
 add_write (vec<tree> *writes, tree def)
 {
+  if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def)))
+    return;
+
   writes->safe_push (def);
   DEBUG_PRINT (dp << "Adding scalar write: ";
               print_generic_expr (dump_file, def);
@@ -1370,9 +1393,27 @@ add_write (vec<tree> *writes, tree def)
                                  SSA_NAME_DEF_STMT (def), 0));
 }

+static void
+add_kill (vec<tree> *kills, tree def)
+{
+  if (ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (def)))
+    return;
+
+  kills->safe_push (def);
+  DEBUG_PRINT (dp << "Adding scalar kill: ";
+              print_generic_expr (dump_file, def);
+              dp << "\n");
+}
+
 static void
 add_read (vec<scalar_use> *reads, tree use, gimple *use_stmt)
 {
+  gcc_assert (TREE_CODE (use) == SSA_NAME);
+
+  if ((use_stmt && ignored_oacc_internal_call_p (use_stmt))
+      || ignored_oacc_internal_call_p (SSA_NAME_DEF_STMT (use)))
+    return;
+
   DEBUG_PRINT (dp << "Adding scalar read: ";
               print_generic_expr (dump_file, use);
               dp << "\nFrom stmt: ";
@@ -1428,6 +1469,58 @@ build_cross_bb_scalars_use (scop_p scop, tree use, gimple *use_stmt,
     add_read (reads, use, use_stmt);
 }

+/* Add kills for all SSA names in the set FROM to the vector KILLS.  */
+
+static void add_kills (hash_set<tree>* from, vec<tree> &kills)
+{
+  hash_set<tree>::iterator end = from->end();
+  hash_set<tree>::iterator it = from->begin ();
+  for (; it != end; ++it)
+    {
+      tree var = *it;
+      add_kill (&kills, var);
+    }
+}
+
+/* Add kill operations for the privatized OpenACC variables that have been
+   recorded for SCOP for the basic block BB into the vector KILLS. */
+
+static void
+add_oacc_kills (scop_p scop, basic_block bb, vec<tree> &kills)
+{
+
+  loop_p loop = bb->loop_father;
+
+  /* Right now we only handle "firstprivate" and "private" variables that occur
+     on an OpenACC compute region.  Those affect only the outermost loop and
+     hence, because of the "chunking" loop created in omp-expand.c around the
+     original loop, the two outermost CFG loops.  */
+  if (loop_depth (loop) > 2)
+    return;
+
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+  {
+    if (e->src == loop->header)
+      {
+        add_kills (scop->oacc_private_scalars, kills);
+        add_kills (scop->oacc_firstprivate_vars, kills);
+        break;
+      }
+  }
+
+  FOR_EACH_EDGE (e, ei, bb->succs)
+  {
+    if (e->dest == loop->header)
+      {
+        add_kills (scop->oacc_private_scalars, kills);
+        add_kills (scop->oacc_firstprivate_vars, kills);
+        break;
+      }
+  }
+}
+
 /* Generates a polyhedral black box only if the bb contains interesting
    information.  */

@@ -1436,6 +1529,7 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
 {
   vec<data_reference_p> drs = vNULL;
   vec<tree> writes = vNULL;
+  vec<tree> kills = vNULL;
   vec<scalar_use> reads = vNULL;

   sese_l region = scop->scop_info->region;
@@ -1497,10 +1591,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
               gsi_next (&psi))
            {
              gphi *phi = psi.phi ();
-             tree res = gimple_phi_result (phi);
-             if (virtual_operand_p (res))
-               continue;
-             /* To simulate out-of-SSA the predecessor of edges into PHI nodes
+              tree res = gimple_phi_result (phi);
+              if (virtual_operand_p (res))
+                continue;
+
+              if (scop->oacc_private_scalars->contains (res)
+                  || scop->oacc_firstprivate_vars->contains (res))
+                continue;
+
+              /* To simulate out-of-SSA the predecessor of edges into PHI nodes
                 has a copy from the PHI argument to the PHI destination.  */
              if (! scev_analyzable_p (res, scop->scop_info->region))
                add_write (&writes, res);
@@ -1536,10 +1635,15 @@ try_generate_gimple_bb (scop_p scop, basic_block bb)
        }
     }

-  if (drs.is_empty () && writes.is_empty () && reads.is_empty ())
+  if (loop &&    /* i.e. BB belongs to SCOP. */
+      oacc_function_p (cfun))
+    add_oacc_kills (scop, bb, kills);
+
+  if (drs.is_empty () && writes.is_empty () && reads.is_empty ()
+      && kills.is_empty ())
     return NULL;

-  return new_gimple_poly_bb (bb, drs, reads, writes);
+  return new_gimple_poly_bb (bb, drs, reads, writes, kills);
 }

 /* Checks if all parts of DR are defined outside of REGION.  This allows an
@@ -1800,10 +1904,21 @@ private:
   auto_vec<gimple *, 3> conditions, cases;
   scop_p scop;
 };
-}
+
 gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
-  : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop)
+    : dom_walker (direction, ALL_BLOCKS, bb_to_rpo), scop (scop)
 {
+  if (oacc_function_p (cfun))
+    {
+      edge scop_entry = scop->scop_info->region.entry;
+      loop_p loop = scop_entry->dest->loop_father;
+      gcall *firstprivate_call = get_oacc_firstprivate_call (loop);
+      collect_oacc_privatized_vars (firstprivate_call,
+                                    *scop->oacc_firstprivate_vars);
+
+      gcall *private_call = get_oacc_private_scalars_call (loop);
+      collect_oacc_privatized_vars (private_call, *scop->oacc_private_scalars);
+    }
 }

 /* Call-back for dom_walk executed before visiting the dominated
@@ -1862,6 +1977,8 @@ gather_bbs::before_dom_children (basic_block bb)
   data_reference_p dr;
   FOR_EACH_VEC_ELT (gbb->data_refs, i, dr)
     {
+      gcc_checking_assert (! ignored_oacc_internal_call_p (DR_STMT (dr)));
+
       DEBUG_PRINT (dp << "Adding memory ";
                   if (dr->is_read)
                     dp << "read: ";
@@ -1897,6 +2014,8 @@ gather_bbs::after_dom_children (basic_block bb)
     }
 }

+}
+
 /* Compute sth like an execution order, dominator order with first executing
    edges that stay inside the current loop, delaying processing exit edges.  */

@@ -1919,6 +2038,22 @@ cmp_pbbs (const void *pa, const void *pb)
     return 0;
 }

+/* Analyze the OpenACC loop structure surrounding SCOP to determine the ssa
+   names that belong to OpenACC reduction computations. */
+
+static void
+determine_openacc_reductions (scop_p scop)
+{
+  loop_p loop;
+  FOR_EACH_LOOP (loop, 0)
+  {
+    if (!loop_in_sese_p (loop, scop->scop_info->region))
+      continue;
+
+    collect_oacc_reduction_vars (loop, *scop->reduction_vars);
+  }
+}
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
    them to SCOPS.  */

@@ -1954,11 +2089,12 @@ build_scops (vec<scop_p> *scops)
       /* Sort pbbs after execution order for initial schedule generation.  */
       scop->pbbs.qsort (cmp_pbbs);

-      if (! build_alias_set (scop))
-       {
-         DEBUG_PRINT (dp << "[scop-detection-fail] cannot handle dependences\n");
-         free_scop (scop);
-         continue;
+      if (!build_alias_set (scop))
+        {
+          DEBUG_PRINT (dp
+                      << "[scop-detection-fail] cannot handle dependences\n");
+          free_scop (scop);
+          continue;
        }

       /* Do not optimize a scop containing only PBBs that do not belong
@@ -1995,6 +2131,9 @@ build_scops (vec<scop_p> *scops)
          continue;
        }

+      if (oacc_function_p (cfun))
+        determine_openacc_reductions (scop);
+
       scops->safe_push (scop);
     }

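add_oacc_kills emits kills for the privatized OpenACC scalars only in blocks that lie at most two loops deep and that are either entered directly from their loop's header or branch back to it (for example, the latch). The sketch below models only that placement test on a hypothetical toy CFG record; toy_bb and needs_oacc_kills_p are illustrative names, not GCC API. Note that the patch itself calls add_kills once from each edge loop, so a block that both follows and precedes the header receives the kills twice; the sketch only answers whether kills are needed at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG block standing in for GCC's basic_block.  */
struct toy_bb
{
  int id;
  int loop_depth;       /* Depth of the innermost enclosing loop.  */
  int loop_header_id;   /* Id of that loop's header block.  */
  const int *preds;     /* Ids of predecessor blocks.  */
  size_t n_preds;
  const int *succs;     /* Ids of successor blocks.  */
  size_t n_succs;
};

static bool
contains_id (const int *ids, size_t n, int id)
{
  for (size_t i = 0; i < n; i++)
    if (ids[i] == id)
      return true;
  return false;
}

/* Mirror of the decision in add_oacc_kills: skip blocks nested deeper
   than the two outermost loops, otherwise require an edge from or to
   the loop header.  */
static bool
needs_oacc_kills_p (const struct toy_bb *bb)
{
  if (bb->loop_depth > 2)
    return false;
  return contains_id (bb->preds, bb->n_preds, bb->loop_header_id)
         || contains_id (bb->succs, bb->n_succs, bb->loop_header_id);
}
```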
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 12fa2d669b3c..1ee48e5a7aa5 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "gimplify-me.h"
 #include "tree-cfg.h"
+#include "graphite-oacc.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -46,6 +47,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "domwalk.h"
 #include "tree-ssa-propagate.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "internal-fn.h"
 #include "graphite.h"

 /* Return an isl identifier for the polyhedral basic block PBB.  */
@@ -201,6 +205,8 @@ parameter_index_in_region (tree name, sese_info_p region)
   return -1;
 }

+tree oacc_ifn_call_extract (gimple*);
+
 /* Extract an affine expression from the tree E in the scop S.  */

 static isl_pw_aff *
@@ -599,6 +605,21 @@ pdr_add_data_dimensions (isl_set *subscript_sizes, scop_p scop,
   return isl_set_coalesce (subscript_sizes);
 }

+static inline bool
+oacc_internal_call_p (gimple *stmt)
+{
+  if (!stmt || !is_gimple_call (stmt))
+    return false;
+
+  /* graphite-scop-detection.c should filter out those calls. */
+  gcc_assert (!gimple_call_internal_p (stmt, IFN_UNIQUE));
+
+  /* Should be handled by scalar evolution analysis. */
+  gcc_assert (!gimple_call_internal_p (stmt, IFN_GOACC_LOOP));
+
+  return false;
+}
+
 /* Build data accesses for DRI.  */

 static void
@@ -635,13 +656,18 @@ build_poly_dr (dr_info &dri)
     subscript_sizes = pdr_add_data_dimensions (subscript_sizes, scop, dr);
   }

-  new_poly_dr (pbb, DR_STMT (dr), DR_IS_READ (dr) ? PDR_READ : PDR_WRITE,
-              acc, subscript_sizes);
+  if (oacc_internal_call_p (DR_STMT (dr)))
+    return;
+
+  bool is_reduction = scop->reduction_vars->contains (DR_BASE_ADDRESS (dr));
+  enum poly_dr_type dr_type = DR_IS_READ (dr) ? PDR_READ : PDR_WRITE;
+
+  new_poly_dr (pbb, DR_STMT (dr), dr_type, acc, subscript_sizes, is_reduction);
 }

 static void
 build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
-                isl_map *acc, isl_set *subscript_sizes)
+                 isl_map *acc, isl_set *subscript_sizes, bool is_reduction)
 {
   scop_p scop = PBB_SCOP (pbb);
   /* Each scalar variable has a unique alias set number starting from
@@ -658,7 +684,7 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
   c = isl_constraint_set_coefficient_si (c, isl_dim_out, 0, 1);

   new_poly_dr (pbb, stmt, kind, isl_map_add_constraint (acc, c),
-              subscript_sizes);
+               subscript_sizes, is_reduction);
 }

 /* Record all cross basic block scalar variables in PBB.  */
@@ -670,6 +696,7 @@ build_poly_sr (poly_bb_p pbb)
   gimple_poly_bb_p gbb = PBB_BLACK_BOX (pbb);
   vec<scalar_use> &reads = gbb->read_scalar_refs;
   vec<tree> &writes = gbb->write_scalar_refs;
+  vec<tree> &kills = gbb->kill_scalar_refs;

   isl_space *dc = isl_set_get_space (pbb->domain);
   int nb_out = 1;
@@ -684,13 +711,39 @@ build_poly_sr (poly_bb_p pbb)
   int i;
   tree var;
   FOR_EACH_VEC_ELT (writes, i, var)
+  {
+    if (oacc_internal_call_p (SSA_NAME_DEF_STMT (var)))
+      continue;
+
+    bool is_reduction = scop->reduction_vars->contains (var);
+
     build_poly_sr_1 (pbb, SSA_NAME_DEF_STMT (var), var, PDR_WRITE,
-                    isl_map_copy (acc), isl_set_copy (subscript_sizes));
+                     isl_map_copy (acc), isl_set_copy (subscript_sizes),
+                     is_reduction);
+  }
+
+  FOR_EACH_VEC_ELT (kills, i, var)
+  {
+    build_poly_sr_1 (pbb, NULL, var, PDR_KILL,
+                     isl_map_copy (acc), isl_set_copy (subscript_sizes),
+                     false);
+  }

   scalar_use *use;
   FOR_EACH_VEC_ELT (reads, i, use)
+  {
+    tree use_var = use->second;
+    gcc_checking_assert (TREE_CODE (use_var) == SSA_NAME);
+
+    if (oacc_internal_call_p (use->first)
+       || oacc_internal_call_p (SSA_NAME_DEF_STMT (use->second)))
+      continue;
+
+    bool is_reduction = scop->reduction_vars->contains (use->second);
+
     build_poly_sr_1 (pbb, use->first, use->second, PDR_READ, isl_map_copy (acc),
-                    isl_set_copy (subscript_sizes));
+                    isl_set_copy (subscript_sizes), is_reduction);
+  }

   isl_map_free (acc);
   isl_set_free (subscript_sizes);
diff --git a/gcc/graphite.c b/gcc/graphite.c
index 6c4fb42282b6..19a31beaa5fe 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "ssa.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -58,6 +60,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "tree-into-ssa.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "cgraph.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
+#include "tree-pretty-print.h"
+#include "internal-fn.h"
+
+static bool have_isl = true;

 /* Print global statistics to FILE.  */

@@ -417,9 +427,12 @@ graphite_transform_loops (void)
   vec<scop_p> scops = vNULL;
   isl_ctx *ctx;

-  /* If a function is parallel it was most probably already run through graphite
-     once. No need to run again.  */
-  if (parallelized_function_p (cfun->decl))
+  /* If a function is parallel it was most probably already run through
+     graphite once, so there is no need to run again.  This is not true for
+     OpenACC functions: the function was created for offloading, but we may
+     still have to figure out which loops can be parallelized.  */
+
+  if (parallelized_function_p (cfun->decl) && !oacc_function_p (cfun))
     return;

   calculate_dominance_info (CDI_DOMINATORS);
@@ -445,6 +458,7 @@ graphite_transform_loops (void)
   seir_cache = new hash_map<sese_scev_hash, tree>;

   calculate_dominance_info (CDI_POST_DOMINATORS);
+  set_scev_analyze_openacc_calls (oacc_function_p (cfun));
   build_scops (&scops);
   free_dominance_info (CDI_POST_DOMINATORS);

@@ -458,26 +472,50 @@ graphite_transform_loops (void)
       print_global_statistics (dump_file);
     }

-  FOR_EACH_VEC_ELT (scops, i, scop)
-    if (dbg_cnt (graphite_scop))
-      {
-       scop->isl_context = ctx;
-       if (!build_poly_scop (scop))
-         continue;
-
-       if (!apply_poly_transforms (scop))
-         continue;
-
-       changed = true;
-       if (graphite_regenerate_ast_isl (scop)
-           && dump_enabled_p ())
-         {
-           dump_user_location_t loc = find_loop_location
-             (scops[i]->scop_info->region.entry->dest->loop_father);
-           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-                            "loop nest optimized\n");
-         }
-      }
+  if (oacc_function_p (cfun))
+    {
+      /* OpenACC uses Graphite for dependence analysis only.
+         Code generation would first need to understand the
+         OpenACC internal function calls before it could be
+         enabled.  */
+
+      FOR_EACH_VEC_ELT (scops, i, scop)
+      if (dbg_cnt (graphite_scop))
+        {
+          scop->isl_context = ctx;
+          if (!build_poly_scop (scop))
+            continue;
+
+          if (!optimize_isl (scop, true))
+           continue;
+
+          graphite_oacc_analyze_scop (scop);
+          changed = true;
+        }
+      set_scev_analyze_openacc_calls (false);
+    }
+  else /* Non-OpenACC functions.  */
+    {
+      FOR_EACH_VEC_ELT (scops, i, scop)
+      if (dbg_cnt (graphite_scop))
+        {
+          scop->isl_context = ctx;
+          if (!build_poly_scop (scop))
+            continue;
+
+          if (!apply_poly_transforms (scop))
+            continue;
+
+          changed = true;
+          if (graphite_regenerate_ast_isl (scop) && dump_enabled_p ())
+            {
+              dump_user_location_t loc = find_loop_location (
+                  scops[i]->scop_info->region.entry->dest->loop_father);
+              dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                               "loop nest optimized\n");
+            }
+        }
+    }

   delete seir_cache;
   seir_cache = NULL;
@@ -520,6 +558,8 @@ graphite_transform_loops (void)

 #else /* If isl is not available: #ifndef HAVE_isl.  */

+static bool have_isl = false;
+
 static void
 graphite_transform_loops (void)
 {
@@ -532,7 +572,10 @@ graphite_transform_loops (void)
 static unsigned int
 graphite_transforms (struct function *fun)
 {
-  if (number_of_loops (fun) <= 1)
+
+  unsigned num_loops = number_of_loops (fun);
+  if (num_loops == 0
+      || (num_loops == 1 && !oacc_function_p (cfun)))
     return 0;

   graphite_transform_loops ();
@@ -540,14 +583,35 @@ graphite_transforms (struct function *fun)
   return 0;
 }

+/* Return TRUE if FUN is an OpenACC outlined function that should be analyzed
+   by Graphite. */
+
+static inline bool oacc_enable_graphite_p (function *fun)
+{
+  if (!flag_openacc || !oacc_get_fn_attrib (fun->decl))
+    return false;
+
+  if (!graphite_analyze_oacc_target_region_type_p (fun))
+    return false;
+
+  bool optimizing = global_options.x_optimize <= 0;
+  /* Enabling Graphite if isl is not available aborts compilation. Prefer to
+     skip it and emit a warning, unless optimizations are enabled. */
+  if (!have_isl && !optimizing)
+    warning (OPT_Wall, "unable to analyze OpenACC regions with Graphite; isl "
+                       "is not available");
+  return true;
+}
+
 static bool
-gate_graphite_transforms (void)
+gate_graphite_transforms (function *fun)
 {
   /* Enable -fgraphite pass if any one of the graphite optimization flags
      is turned on.  */
   if (flag_graphite_identity
       || flag_loop_parallelize_all
-      || flag_loop_nest_optimize)
+      || flag_loop_nest_optimize
+      || oacc_enable_graphite_p (fun))
     flag_graphite = 1;

   return flag_graphite != 0;
@@ -576,7 +640,7 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return gate_graphite_transforms (); }
+  virtual bool gate (function *fun) { return gate_graphite_transforms (fun); }

 }; // class pass_graphite

@@ -611,7 +675,7 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return gate_graphite_transforms (); }
+  virtual bool gate (function *fun) { return gate_graphite_transforms (fun); }
   virtual unsigned int execute (function *fun) { return graphite_transforms (fun); }

 }; // class pass_graphite_transforms
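The new early exit in graphite_transforms keeps OpenACC outlined functions that have only one entry in the loop tree, while other functions are still skipped exactly as before. The sketch isolates that predicate. graphite_skip_function_p is a hypothetical name, and the comment's assumption that the count includes the root pseudo-loop follows the usual meaning of number_of_loops in cfgloop.h.

```c
#include <stdbool.h>

/* Hypothetical model of the early exit in graphite_transforms.
   NUM_LOOPS is assumed to count every entry in the loop tree, including
   the root pseudo-loop that stands for the whole function body, so a
   value of 1 means the function has no real loops.  */
static bool
graphite_skip_function_p (unsigned num_loops, bool oacc_function)
{
  /* Before the patch this was simply: num_loops <= 1.  */
  return num_loops == 0 || (num_loops == 1 && !oacc_function);
}
```

Only the (1, OpenACC) case changes behavior; every other combination returns the same answer as the old num_loops <= 1 test.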
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 03febfa39986..9c508f31109f 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -42,7 +42,8 @@ enum poly_dr_type
   /* PDR_MAY_READs are represented using PDR_READS.  This does not
      limit the expressiveness.  */
   PDR_WRITE,
-  PDR_MAY_WRITE
+  PDR_MAY_WRITE,
+  PDR_KILL
 };

 struct poly_dr
@@ -61,6 +62,9 @@ struct poly_dr

   enum poly_dr_type type;

+  /* Indicates that this PDR is part of an OpenACC "reduction" computation. */
+  bool is_reduction;
+
   /* The access polyhedron contains the polyhedral space this data
      reference will access.

@@ -185,7 +189,7 @@ struct poly_dr
 #define PDR_ACCESSES(PDR) (NULL)

 void new_poly_dr (poly_bb_p, gimple *, enum poly_dr_type,
-                 isl_map *, isl_set *);
+                 isl_map *, isl_set *, bool);
 void debug_pdr (poly_dr_p);
 void print_pdr (FILE *, poly_dr_p);

@@ -211,6 +215,14 @@ pdr_may_write_p (poly_dr_p pdr)
   return PDR_TYPE (pdr) == PDR_MAY_WRITE;
 }

+/* Returns true when PDR is a "kill".  */
+
+static inline bool
+pdr_kill_p (poly_dr_p pdr)
+{
+  return PDR_TYPE (pdr) == PDR_KILL;
+}
+
 /* POLY_BB represents a blackbox in the polyhedral model.  */

 struct poly_bb
@@ -281,6 +293,8 @@ extern void print_isl_aff (FILE *, isl_aff *);
 extern void print_isl_constraint (FILE *, isl_constraint *);
 extern void print_isl_schedule (FILE *, isl_schedule *);
 extern void debug_isl_schedule (isl_schedule *);
+extern void print_isl_space (FILE *, isl_space *);
+extern void debug_isl_space (isl_space *);
 extern void print_isl_ast (FILE *, isl_ast_node *);
 extern void debug_isl_ast (isl_ast_node *);
 extern void debug_isl_set (isl_set *);
@@ -380,6 +394,18 @@ struct scop
   /* All the data references in this scop.  */
   vec<dr_info> drs;

+  /* The set of SSA names that are OpenACC "reduction" variables in the
+     loops of SCOP that use them.  */
+  hash_set<tree> *reduction_vars;
+
+  /* If SCOP is contained in an OpenACC compute region, this is the set of
+     ssa names that are "firstprivate" in this region. */
+  hash_set<tree> *oacc_firstprivate_vars;
+
+  /* If SCOP is contained in an OpenACC compute region, this is the set of
+     ssa names that are "private" in this region. */
+  hash_set<tree> *oacc_private_scalars;
+
   /* The context describes known restrictions concerning the parameters
      and relations in between the parameters.

@@ -411,7 +437,8 @@ struct scop
 extern scop_p new_scop (edge, edge);
 extern void free_scop (scop_p);
 extern gimple_poly_bb_p new_gimple_poly_bb (basic_block, vec<data_reference_p>,
-                                           vec<scalar_use>, vec<tree>);
+                                           vec<scalar_use>, vec<tree>, vec<tree>);
+extern bool optimize_isl (scop_p, bool = false);
 extern bool apply_poly_transforms (scop_p);

 /* Set the region of SCOP to REGION.  */
@@ -447,10 +474,10 @@ carries_deps (__isl_keep isl_union_map *schedule,

 extern bool build_poly_scop (scop_p);
 extern bool graphite_regenerate_ast_isl (scop_p);
+extern bool graphite_oacc_analyze_scop (scop_p);
 extern void build_scops (vec<scop_p> *);
 extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec<sese_l> &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
-
 #endif
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index d92080c80771..36c1c71cd41b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2970,6 +2970,8 @@ expand_UNIQUE (internal_fn, gcall *stmt)
        gcc_unreachable ();
       break;
     case IFN_UNIQUE_OACC_PRIVATE:
+    case IFN_UNIQUE_OACC_PRIVATE_SCALAR:
+    case IFN_UNIQUE_OACC_FIRSTPRIVATE:
       break;
     }

diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 9004840e0f51..3d57cf5e643d 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -37,7 +37,9 @@ along with GCC; see the file COPYING3.  If not see
   DEF(UNSPEC), \
     DEF(OACC_FORK), DEF(OACC_JOIN),            \
     DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK),  \
-    DEF(OACC_PRIVATE)
+    DEF(OACC_PRIVATE),  \
+    DEF(OACC_PRIVATE_SCALAR),  \
+    DEF(OACC_FIRSTPRIVATE)

 enum ifn_unique_kind {
 #define DEF(X) IFN_UNIQUE_##X
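The ifn_unique_kind change relies on a DEF-list (X-macro) pattern: because of `#define DEF(X) IFN_UNIQUE_##X`, appending DEF(OACC_PRIVATE_SCALAR) and DEF(OACC_FIRSTPRIVATE) to the list is what creates the IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE enumerators handled in expand_UNIQUE. The self-contained sketch below mirrors that pattern with hypothetical TOY_ names; it is not the real internal-fn.h macro. It also expands the same list into a name table to show how a single list keeps parallel tables in sync. Whether GCC reuses its list that way is not shown in this patch.

```c
#include <string.h>

/* Hypothetical cut-down DEF list in the style of ifn_unique_kind.  */
#define TOY_UNIQUE_CODES \
  DEF(UNSPEC), \
  DEF(OACC_PRIVATE), \
  DEF(OACC_PRIVATE_SCALAR), \
  DEF(OACC_FIRSTPRIVATE)

/* First expansion: one enumerator per DEF entry.  */
enum toy_unique_kind {
#define DEF(X) TOY_UNIQUE_##X
  TOY_UNIQUE_CODES
#undef DEF
};

/* Second expansion: the matching name strings, in the same order.  */
static const char *const toy_unique_names[] = {
#define DEF(X) #X
  TOY_UNIQUE_CODES
#undef DEF
};

/* Map a name back to its enumerator; return -1 if it is unknown.  */
static int
toy_unique_kind_from_name (const char *name)
{
  for (size_t i = 0; i < sizeof toy_unique_names / sizeof *toy_unique_names;
       i++)
    if (strcmp (toy_unique_names[i], name) == 0)
      return (int) i;
  return -1;
}
```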
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 54c2d65369ad..7a40ea2da1a0 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -108,7 +108,8 @@ struct omp_region
      a depend clause.  */
   gomp_ordered *ord_stmt;

-  /* True if this is nested inside an OpenACC kernels construct.  */
+  /* True if this is nested inside an OpenACC kernels construct that
+     will be handled by the "parloops" pass.  */
   bool inside_kernels_p;
 };

@@ -8153,13 +8154,35 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt)
     loops_state_set (LOOPS_NEED_FIXUP);

   if (region->inside_kernels_p)
-    expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
-                           NULL_TREE, inner_stmt);
+    {
+      gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+      expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
+                             NULL_TREE, inner_stmt);
+    }
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
     {
-      gcc_assert (!inner_stmt && !fd.non_rect);
+      struct omp_region *target_region;
+      for (target_region = region->outer; target_region;
+           target_region = target_region->outer)
+        {
+          if (target_region->type == GIMPLE_OMP_TARGET)
+            {
+              gomp_target *entry_stmt
+                  = as_a<gomp_target *> (last_stmt (target_region->entry));
+
+              if (gimple_omp_target_kind (entry_stmt)
+                  == GF_OMP_TARGET_KIND_OACC_KERNELS)
+                gcc_checking_assert (
+                    param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                    && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS);
+            }
+        }
+
+      gcc_assert (!inner_stmt);
       expand_oacc_for (region, &fd);
     }
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_TASKLOOP)
@@ -9564,6 +9587,10 @@ static void
 mark_loops_in_oacc_kernels_region (basic_block region_entry,
                                   basic_block region_exit)
 {
+  gcc_checking_assert (param_openacc_kernels
+                           == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   class loop *outer = region_entry->loop_father;
   gcc_assert (region_exit == NULL || outer == region_exit->loop_father);

@@ -9728,23 +9755,28 @@ expand_omp_target (struct omp_region *region)

   entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
   target_kind = gimple_omp_target_kind (entry_stmt);
+  if (!(param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+        || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS))
+    gcc_checking_assert (target_kind != GF_OMP_TARGET_KIND_OACC_KERNELS);
+
   new_bb = region->entry;

   offloaded = is_gimple_omp_offloaded (entry_stmt);
   switch (target_kind)
     {
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_REGION:
     case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
-    case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
-    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
@@ -9784,6 +9816,12 @@ expand_omp_target (struct omp_region *region)
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+      mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+
       DECL_ATTRIBUTES (child_fn)
        = tree_cons (get_identifier ("oacc kernels"),
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
@@ -9803,6 +9841,11 @@ expand_omp_target (struct omp_region *region)
        = tree_cons (get_identifier ("oacc parallel_kernels_gang_single"),
                     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+      DECL_ATTRIBUTES (child_fn)
+          = tree_cons (get_identifier ("oacc parallel_kernels_graphite"),
+                       NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
     default:
       /* Make sure we don't miss any.  */
       gcc_checking_assert (!(is_gimple_omp_oacc (entry_stmt)
@@ -10015,6 +10058,7 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
@@ -10517,14 +10561,15 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
                case GF_OMP_TARGET_KIND_OACC_SERIAL:
                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
-                 break;
+                case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+                case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+                  break;
                case GF_OMP_TARGET_KIND_UPDATE:
                case GF_OMP_TARGET_KIND_ENTER_DATA:
                case GF_OMP_TARGET_KIND_EXIT_DATA:
                case GF_OMP_TARGET_KIND_DATA:
                case GF_OMP_TARGET_KIND_OACC_DATA:
                case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
-               case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
                case GF_OMP_TARGET_KIND_OACC_UPDATE:
                case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
                case GF_OMP_TARGET_KIND_OACC_DECLARE:
@@ -10706,7 +10751,10 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return !(fun->curr_properties & PROP_gimple_eomp);
+      return !(fun->curr_properties & PROP_gimple_eomp)
+             && (!oacc_get_kernels_attrib (fun->decl)
+                 || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                 || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
     }
   virtual unsigned int execute (function *) { return execute_expand_omp (); }
   opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
@@ -10776,6 +10824,8 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
        case GF_OMP_TARGET_KIND_OACC_SERIAL:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
        case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+       case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
+       case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
          break;
        case GF_OMP_TARGET_KIND_UPDATE:
        case GF_OMP_TARGET_KIND_ENTER_DATA:
@@ -10783,7 +10833,6 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
        case GF_OMP_TARGET_KIND_DATA:
        case GF_OMP_TARGET_KIND_OACC_DATA:
        case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
-       case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
        case GF_OMP_TARGET_KIND_OACC_UPDATE:
        case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
        case GF_OMP_TARGET_KIND_OACC_DECLARE:
diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 694c14af7b9e..c8aec1b18b58 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -2929,6 +2929,15 @@ oacc_get_fn_attrib (tree fn)
   return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
 }

+/* Return the "oacc kernels" attribute of FN, or NULL_TREE if FN does not
+   have one.  */
+
+tree
+oacc_get_kernels_attrib (tree fn)
+{
+  return lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn));
+}
+
 /* Return true if FN is an OpenMP or OpenACC offloading function.  */

 bool
@@ -2955,10 +2964,16 @@ oacc_get_fn_dim_size (tree fn, int axis)
     dims = TREE_CHAIN (dims);

   tree v = TREE_VALUE (dims);
-  /* TODO With 'pass_oacc_device_lower' moved "later", this is necessary to
-     avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
+  /* TODO-kernels With 'pass_oacc_device_lower' moved "later", this is necessary
+     to avoid ICE for some OpenACC 'kernels' ("parloops") constructs.  */
   if (v == NULL_TREE)
-    return 0;
+    {
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+      return 0;
+    }

   int size = TREE_INT_CST_LOW (v);

diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 956931522272..b27dc5e94096 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -120,6 +120,7 @@ extern int oacc_verify_routine_clauses (tree, tree *, location_t,
                                        const char *);
 extern tree oacc_build_routine_dims (tree clauses);
 extern tree oacc_get_fn_attrib (tree fn);
+extern tree oacc_get_kernels_attrib (tree fn);
 extern bool offloading_function_p (tree fn);
 extern int oacc_get_fn_dim_size (tree fn, int axis);
 extern int oacc_get_ifn_dim_arg (const gimple *stmt);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 43fababb5a37..d64db62cc35a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -157,6 +157,12 @@ struct omp_context
   /* Addressable variable decls in this context.  */
   vec<tree> *oacc_addressable_var_decls;

+  /* "firstprivate" variables in this context.  */
+  hash_set<tree> *oacc_firstprivate_vars;
+
+  /* Scalar "private" variables in this context.  */
+  hash_set<tree> *oacc_private_scalars;
+
   /* True if lower_omp_1 should look up lastprivate conditional in parent
      context.  */
   bool combined_into_simd_safelen1;
@@ -220,7 +226,27 @@ is_oacc_parallel_or_serial (omp_context *ctx)
              || (gimple_omp_target_kind (ctx->stmt)
                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
              || (gimple_omp_target_kind (ctx->stmt)
-                 == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)));
+                 == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+             || (gimple_omp_target_kind (ctx->stmt)
+                 == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)));
+}
+
+/* Return true if CTX corresponds to an OpenACC region that was generated
+   from an original 'kernels' region by 'kernels' decomposition.  */
+
+static bool
+was_originally_oacc_kernels (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+          && ((gimple_omp_target_kind (ctx->stmt)
+               == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+              || (gimple_omp_target_kind (ctx->stmt)
+                  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
 }

 /* Return whether CTX represents an OpenACC 'kernels' construct.
@@ -246,10 +272,23 @@ is_oacc_kernels_decomposed_part (omp_context *ctx)
               == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
              || (gimple_omp_target_kind (ctx->stmt)
                  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+             || (gimple_omp_target_kind (ctx->stmt)
+                 == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
              || (gimple_omp_target_kind (ctx->stmt)
                  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
 }

+/* Return whether CTX represents an OpenACC 'kernels' decomposed part that will
+   be analyzed by Graphite.  */
+
+static bool
+is_oacc_kernels_decomposed_graphite_part (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+         && gimple_omp_target_kind (ctx->stmt)
+                == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+}
+
 /* Return true if STMT corresponds to an OpenMP target region.  */
 static bool
 is_omp_target (gimple *stmt)
@@ -1139,6 +1178,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.decl_map = new hash_map<tree, tree>;

   ctx->oacc_addressable_var_decls = new vec<tree> ();
+  ctx->oacc_firstprivate_vars = new hash_set<tree> ();
+  ctx->oacc_private_scalars = new hash_set<tree> ();

   return ctx;
 }
@@ -1224,6 +1265,8 @@ delete_omp_context (splay_tree_value value)
   delete ctx->allocate_map;

   delete ctx->oacc_addressable_var_decls;
+  delete ctx->oacc_firstprivate_vars;
+  delete ctx->oacc_private_scalars;

   XDELETE (ctx);
 }
@@ -1286,6 +1329,43 @@ fixup_child_record_type (omp_context *ctx)
     = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }

+static void
+oacc_record_firstprivate_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+      {
+        tree decl = OMP_CLAUSE_DECL (c);
+
+        if (TREE_ADDRESSABLE (decl))
+          continue;
+
+        ctx->oacc_firstprivate_vars->add (decl);
+      }
+}
+
+static void
+oacc_record_private_scalars (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+        tree decl = OMP_CLAUSE_DECL (c);
+        if (!(VAR_P (decl)
+              && !(TREE_READONLY (decl)
+                   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))))
+          continue;
+
+        if (TREE_ADDRESSABLE (decl))
+          continue;
+        ctx->oacc_private_scalars->add (decl);
+      }
+}
+
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
    specified by CLAUSES.  If BASE_POINTERS_RESTRICT, install var field with
    restrict.  */
@@ -1901,9 +1981,15 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
            break;
          /* FALLTHRU */

-       case OMP_CLAUSE_FIRSTPRIVATE:
-       case OMP_CLAUSE_PRIVATE:
-       case OMP_CLAUSE_LINEAR:
+        case OMP_CLAUSE_FIRSTPRIVATE:
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            oacc_record_firstprivate_var_clauses (ctx, c);
+          gcc_fallthrough ();
+        case OMP_CLAUSE_PRIVATE:
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            oacc_record_private_scalars (ctx, c);
+          gcc_fallthrough ();
+        case OMP_CLAUSE_LINEAR:
        case OMP_CLAUSE_IS_DEVICE_PTR:
          decl = OMP_CLAUSE_DECL (c);
          if (is_variable_sized (decl))
@@ -2766,12 +2852,21 @@ enclosing_target_ctx (omp_context *ctx)
 static bool
 ctx_in_oacc_kernels_region (omp_context *ctx)
 {
+  gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                       || param_openacc_kernels
+                              == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   for (;ctx != NULL; ctx = ctx->outer)
     {
       gimple *stmt = ctx->stmt;
-      if (gimple_code (stmt) == GIMPLE_OMP_TARGET
-         && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
-       return true;
+      if (gimple_code (stmt) != GIMPLE_OMP_TARGET)
+       continue;
+
+      int target_kind = gimple_omp_target_kind (stmt);
+      if (target_kind == GF_OMP_TARGET_KIND_OACC_KERNELS
+          || target_kind == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+        return true;
     }

   return false;
@@ -2785,6 +2880,10 @@ ctx_in_oacc_kernels_region (omp_context *ctx)
 static unsigned
 check_oacc_kernel_gwv (gomp_for *stmt, omp_context *ctx)
 {
+  gcc_checking_assert (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                      || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                      || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   bool checking = true;
   unsigned outer_mask = 0;
   unsigned this_mask = 0;
@@ -2856,9 +2955,11 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
     {
       omp_context *tgt = enclosing_target_ctx (outer_ctx);

-      if (!(tgt && is_oacc_kernels (tgt)))
-       for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-         {
+      if (!tgt
+          || (is_oacc_parallel_or_serial (tgt)
+              && !was_originally_oacc_kernels (tgt)))
+        for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+          {
            tree c_op0;
            switch (OMP_CLAUSE_CODE (c))
              {
@@ -3393,11 +3494,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)

   /* No nesting of non-OpenACC STMT (that is, an OpenMP one, or a GOMP builtin)
      inside an OpenACC CTX.  */
-  if (!(is_gimple_omp (stmt)
-       && is_gimple_omp_oacc (stmt))
+  if (!(is_gimple_omp (stmt) && is_gimple_omp_oacc (stmt))
       /* Except for atomic codes that we share with OpenMP.  */
       && !(gimple_code (stmt) == GIMPLE_OMP_ATOMIC_LOAD
-          || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE))
+           || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE)
+      /* Except for target regions introduced for kernels.  */
+      && (gimple_code (stmt) != GIMPLE_OMP_TARGET
+          || gimple_omp_target_kind (stmt)
+                 != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE))
     {
       if (oacc_get_fn_attrib (cfun->decl) != NULL)
        {
@@ -3568,6 +3672,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
                  case GF_OMP_TARGET_KIND_OACC_SERIAL:
                  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
                  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+                 case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
                    ok = true;
                    break;

@@ -4065,6 +4170,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
              break;
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+           case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
            case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
              /* OpenACC 'kernels' decomposed parts.  */
              stmt_name = "kernels"; break;
@@ -4085,6 +4191,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
              ctx_stmt_name = "host_data"; break;
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
            case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+           case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
            case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
              /* OpenACC 'kernels' decomposed parts.  */
              ctx_stmt_name = "kernels"; break;
@@ -4092,10 +4199,12 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
            }

          /* OpenACC/OpenMP mismatch?  */
-         if (is_gimple_omp_oacc (stmt)
-             != is_gimple_omp_oacc (ctx->stmt))
-           {
-             error_at (gimple_location (stmt),
+          if (is_gimple_omp_oacc (stmt) != is_gimple_omp_oacc (ctx->stmt)
+              && (gimple_code (stmt) != GIMPLE_OMP_TARGET
+                  || gimple_omp_target_kind (stmt)
+                         != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE))
+            {
+              error_at (gimple_location (stmt),
                        "%s %qs construct inside of %s %qs region",
                        (is_gimple_omp_oacc (stmt)
                         ? "OpenACC" : "OpenMP"), stmt_name,
@@ -7673,9 +7782,11 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p,

 static void
 lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
-                      gcall *fork, gcall *private_marker, gcall *join,
-                      gimple_seq *fork_seq, gimple_seq *join_seq,
-                      omp_context *ctx)
+                       gcall *fork, gcall *private_marker,
+                       gcall *private_scalars_marker,
+                       gcall *firstprivate_marker, gcall *join,
+                       gimple_seq *fork_seq, gimple_seq *join_seq,
+                       omp_context *ctx)
 {
   gimple_seq before_fork = NULL;
   gimple_seq after_fork = NULL;
@@ -7691,9 +7802,11 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
        /* No 'reduction' clauses on OpenACC 'kernels'.  */
        gcc_checking_assert (!is_oacc_kernels (ctx));
        /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-       gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
+        gcc_checking_assert (
+            !is_oacc_kernels_decomposed_part (ctx)
+            || is_oacc_kernels_decomposed_graphite_part (ctx));

-       tree orig = OMP_CLAUSE_DECL (c);
+        tree orig = OMP_CLAUSE_DECL (c);
        tree orig_clause;
        tree var;
        tree ref_to_res = NULL_TREE;
@@ -7896,7 +8009,12 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
     gimple_seq_add_stmt (fork_seq, fork);
   gimple_seq_add_seq (fork_seq, after_fork);

+  if (private_scalars_marker)
+    gimple_seq_add_stmt (join_seq, private_scalars_marker);
+  if (firstprivate_marker)
+    gimple_seq_add_stmt (join_seq, firstprivate_marker);
   gimple_seq_add_seq (join_seq, before_join);
+
   if (join)
     gimple_seq_add_stmt (join_seq, join);
   gimple_seq_add_seq (join_seq, after_join);
@@ -8609,16 +8727,27 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,

   /* In a parallel region, loops without auto and seq clauses are
      implicitly INDEPENDENT.  */
-  if ((!tgt || is_oacc_parallel_or_serial (tgt))
+  if ((!tgt
+       || (is_oacc_parallel_or_serial (tgt)
+           && !is_oacc_kernels_decomposed_graphite_part (tgt)))
       && !(tag & (OLF_SEQ | OLF_AUTO)))
-    tag |= OLF_INDEPENDENT;
+    {
+      tag |= OLF_INDEPENDENT;
+    }

   /* Loops inside OpenACC 'kernels' decomposed parts' regions are expected to
      have an explicit 'seq' or 'independent' clause, and no 'auto' clause.  */
-  if (tgt && is_oacc_kernels_decomposed_part (tgt))
+  if (tgt && is_oacc_kernels_decomposed_part (tgt)
+      && !is_oacc_kernels_decomposed_graphite_part (tgt))
     {
-      gcc_assert (tag & (OLF_SEQ | OLF_INDEPENDENT));
-      gcc_assert (!(tag & OLF_AUTO));
+      tag |= OLF_INDEPENDENT;
+
+      gcc_checking_assert (
+          gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+          /* Loops in kernels regions that will be handled by Graphite should
+             have been made 'auto' by "pass_convert_oacc_kernels".  */
+          || gimple_omp_target_kind (ctx->stmt)
+                 != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE);
     }

   if (tag & OLF_TILE)
@@ -8673,7 +8802,9 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head,

 static void
 lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
-                     gimple_seq *head, gimple_seq *tail, omp_context *ctx)
+                      gcall *private_scalars_marker,
+                      gcall *firstprivate_marker, gimple_seq *head,
+                      gimple_seq *tail, omp_context *ctx)
 {
   bool inner = false;
   tree ddvar = create_tmp_var (integer_type_node, ".data_dep");
@@ -8688,6 +8819,20 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
       gimple_call_set_arg (private_marker, 1, ddvar);
     }

+  if (private_scalars_marker)
+    {
+      gimple_set_location (private_scalars_marker, loc);
+      gimple_call_set_lhs (private_scalars_marker, ddvar);
+      gimple_call_set_arg (private_scalars_marker, 1, ddvar);
+    }
+
+  if (firstprivate_marker)
+    {
+      gimple_set_location (firstprivate_marker, loc);
+      gimple_call_set_lhs (firstprivate_marker, ddvar);
+      gimple_call_set_arg (firstprivate_marker, 1, ddvar);
+    }
+
   tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK);
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);

@@ -8718,9 +8863,10 @@ lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
                              build_int_cst (integer_type_node, done),
                              &join_seq);

-      lower_oacc_reductions (loc, clauses, place, inner,
-                            fork, (count == 1) ? private_marker : NULL,
-                            join, &fork_seq, &join_seq,  ctx);
+      lower_oacc_reductions (loc, clauses, place, inner, fork,
+                             (count == 1) ? private_marker : NULL,
+                             private_scalars_marker, firstprivate_marker, join,
+                             &fork_seq, &join_seq, ctx);

       /* Append this level to head. */
       gimple_seq_add_seq (head, fork_seq);
@@ -11721,6 +11867,76 @@ make_oacc_private_marker (omp_context *ctx)
   return gimple_build_call_internal_vec (IFN_UNIQUE, args);
 }

+/* Return an internal function call that contains a list of variables which are
+   "firstprivate" in the compute region represented by CTX.  This call is used
+   to help Graphite identify those variables.  */
+
+static gcall *
+make_oacc_firstprivate_vars_marker (omp_context *ctx)
+{
+  auto_vec<tree, 5> args;
+
+  args.quick_push (
+      build_int_cst (integer_type_node, IFN_UNIQUE_OACC_FIRSTPRIVATE));
+
+  /* TODO Change the data structure/iteration to ensure that the ordering of the
+     variables remains stable between GCC runs.  */
+  hash_set<tree>::iterator end = ctx->oacc_firstprivate_vars->end ();
+  hash_set<tree>::iterator it = ctx->oacc_firstprivate_vars->begin ();
+  for (; it != end; ++it)
+    {
+      tree decl = *it;
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+       {
+         tree inner_decl = maybe_lookup_decl (decl, thisctx);
+         if (inner_decl)
+           {
+             decl = inner_decl;
+             break;
+           }
+       }
+
+      args.safe_push (decl);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
+/* Return an internal function call that contains a list of scalar variables
+   which are "private" in the compute region represented by CTX.  This call is
+   used to help Graphite identify those variables.  */
+
+static gcall *
+make_oacc_private_scalars_marker (omp_context *ctx)
+{
+  auto_vec<tree, 5> args;
+
+  args.quick_push (
+      build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE_SCALAR));
+
+  /* TODO Change the data structure/iteration to ensure that the ordering of
+     the variables remains stable between GCC runs.  */
+  hash_set<tree>::iterator end = ctx->oacc_private_scalars->end ();
+  hash_set<tree>::iterator it = ctx->oacc_private_scalars->begin ();
+  for (; it != end; ++it)
+    {
+      tree decl = *it;
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        {
+          tree inner_decl = maybe_lookup_decl (decl, thisctx);
+          if (inner_decl)
+            {
+              decl = inner_decl;
+              break;
+            }
+        }
+
+      args.safe_push (decl);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
 /* Lower code for an OMP loop directive.  */

 static void
@@ -11929,11 +12145,16 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   /* Once lowered, extract the bounds and clauses.  */
   omp_extract_for_data (stmt, &fd, NULL);

-  if (is_gimple_omp_oacc (ctx->stmt)
-      && !ctx_in_oacc_kernels_region (ctx))
-    lower_oacc_head_tail (gimple_location (stmt),
-                         gimple_omp_for_clauses (stmt), private_marker,
-                         &oacc_head, &oacc_tail, ctx);
+  bool oacc_kernels_parloops = false;
+  if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+      || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
+    oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx);
+  if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops)
+    {
+      lower_oacc_head_tail (gimple_location (stmt),
+                            gimple_omp_for_clauses (stmt), private_marker,
+                            NULL, NULL, &oacc_head, &oacc_tail, ctx);
+    }

   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
@@ -12833,6 +13054,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE:
       data_region = false;
       break;
     case GF_OMP_TARGET_KIND_DATA:
@@ -13073,8 +13295,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
          {
            /* No 'firstprivate' clauses on OpenACC 'kernels'.  */
            gcc_checking_assert (!is_oacc_kernels (ctx));
-           /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-           gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));

            goto oacc_firstprivate;
          }
@@ -13107,8 +13327,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
          {
            /* No 'private' clauses on OpenACC 'kernels'.  */
            gcc_checking_assert (!is_oacc_kernels (ctx));
-           /* Likewise, on OpenACC 'kernels' decomposed parts.  */
-           gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));

            break;
          }
@@ -14259,13 +14477,26 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)

          gcall *private_marker = make_oacc_private_marker (ctx);

-         if (private_marker)
+         gcall *firstprivate_marker = NULL;
+         gcall *private_scalars_marker = NULL;
+
+          /* The markers for "private" scalars and "firstprivate" variables are
+             only used to help Graphite identify the variables for which it
+             has to adjust some dependences.  */
+          if (is_oacc_kernels_decomposed_graphite_part (ctx))
+            {
+              firstprivate_marker = make_oacc_firstprivate_vars_marker (ctx);
+              private_scalars_marker = make_oacc_private_scalars_marker (ctx);
+            }
+
+          if (private_marker)
            gimple_call_set_arg (private_marker, 2, level);

-         lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
-                                false, NULL, private_marker, NULL, &fork_seq,
-                                &join_seq, ctx);
-       }
+          lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
+                                 false, NULL, private_marker,
+                                 private_scalars_marker, firstprivate_marker,
+                                 NULL, &fork_seq, &join_seq, ctx);
+        }

       gimple_seq_add_seq (&new_body, fork_seq);
       gimple_seq_add_seq (&new_body, tgt_body);
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 6acb6367a7f1..c8fdc3b6e5fd 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -176,8 +176,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
               compiler logic to analyze this, so can't parallelize it here, so
               we'd very likely be running into a performance problem if we
               were to execute this unparallelized, thus forward the whole loop
-              nest to 'parloops'.  */
-           *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+              nest to Graphite/"parloops".  */
+           if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+             *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+           else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+             *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+           else
+             gcc_unreachable ();
            /* Terminate: final decision for this region.  */
            *handled_ops_p = true;
            return integer_zero_node;
@@ -197,8 +202,13 @@ adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
         the compiler logic to analyze this, so can't parallelize it here, so
         we'd very likely be running into a performance problem if we were to
         execute this unparallelized, thus forward the whole thing to
-        'parloops'.  */
-      *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+        Graphite/"parloops".  */
+      if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+       *region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE;
+      else if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+       *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+      else
+        gcc_unreachable ();
       /* Terminate: final decision for this region.  */
       *handled_ops_p = true;
       return integer_zero_node;
@@ -309,7 +319,9 @@ make_region_seq (location_t loc, gimple_seq stmts,
   /* Figure out the region code for this region.  */
   /* Optimistic default: assume "setup code", no looping; thus not
      performance-critical.  */
-  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                        ? GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE
+                        : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
   adjust_region_code (stmts, &region_code);

   if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
@@ -330,6 +342,13 @@ make_region_seq (location_t loc, gimple_seq stmts,
         loops nested inside this sequentially executed statement.  */
       make_loops_gang_single (stmts);
     }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_NOTE, loc_stmts_first,
+                        "beginning %<Graphite%> part in OpenACC"
+                        " %<kernels%> region\n");
+    }
   else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
     {
       if (dump_enabled_p ())
@@ -437,21 +456,24 @@ adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
          tree *outer_clause_ptr = NULL;
          switch (OMP_CLAUSE_CODE (loop_clause))
            {
-           case OMP_CLAUSE_GANG:
-             outer_clause_ptr = wi_info->loop_gang_clause_ptr;
-             break;
-           case OMP_CLAUSE_WORKER:
-             outer_clause_ptr = wi_info->loop_worker_clause_ptr;
-             break;
-           case OMP_CLAUSE_VECTOR:
-             outer_clause_ptr = wi_info->loop_vector_clause_ptr;
-             break;
-           case OMP_CLAUSE_SEQ:
-           case OMP_CLAUSE_INDEPENDENT:
-           case OMP_CLAUSE_AUTO:
-             add_auto_clause = false;
-           default:
-             break;
+             case OMP_CLAUSE_GANG:
+               outer_clause_ptr = wi_info->loop_gang_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_WORKER:
+               outer_clause_ptr = wi_info->loop_worker_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_VECTOR:
+               outer_clause_ptr = wi_info->loop_vector_clause_ptr;
+               add_auto_clause = false;
+               break;
+             case OMP_CLAUSE_SEQ:
+             case OMP_CLAUSE_INDEPENDENT:
+             case OMP_CLAUSE_AUTO:
+               add_auto_clause = false;
+             default:
+               break;
            }
          if (outer_clause_ptr != NULL)
            {
@@ -525,30 +547,34 @@ transform_kernels_loop_clauses (gimple *omp_for,
        loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
     {
       bool found_num_clause = false;
-      tree *clause_ptr, clause_to_check;
+      tree *clause_ptr;
+      tree clause_to_check = NULL_TREE;
       switch (OMP_CLAUSE_CODE (loop_clause))
-       {
-       case OMP_CLAUSE_GANG:
-         found_num_clause = true;
-         clause_ptr = &loop_gang_clause;
-         clause_to_check = num_gangs_clause;
-         break;
-       case OMP_CLAUSE_WORKER:
-         found_num_clause = true;
-         clause_ptr = &loop_worker_clause;
-         clause_to_check = num_workers_clause;
-         break;
-       case OMP_CLAUSE_VECTOR:
-         found_num_clause = true;
-         clause_ptr = &loop_vector_clause;
-         clause_to_check = vector_length_clause;
-         break;
-       case OMP_CLAUSE_INDEPENDENT:
-       case OMP_CLAUSE_SEQ:
-       case OMP_CLAUSE_AUTO:
-         add_auto_clause = false;
-       default:
-         break;
+        {
+         case OMP_CLAUSE_GANG:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_gang_clause;
+           clause_to_check = num_gangs_clause;
+           break;
+         case OMP_CLAUSE_WORKER:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_worker_clause;
+           clause_to_check = num_workers_clause;
+           break;
+         case OMP_CLAUSE_VECTOR:
+           found_num_clause = true;
+           add_auto_clause = false;
+           clause_ptr = &loop_vector_clause;
+           clause_to_check = vector_length_clause;
+           break;
+         case OMP_CLAUSE_INDEPENDENT:
+         case OMP_CLAUSE_SEQ:
+         case OMP_CLAUSE_AUTO:
+           add_auto_clause = false;
+         default:
+           break;
        }
       if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL)
        {
@@ -646,10 +672,13 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
   clauses = unshare_expr (clauses);

   /* Figure out the region code for this region.  */
-  /* Optimistic default: assume that the loop nest is parallelizable
-     (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause,
-     and no un-annotated loops).  */
-  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+  /* For "parloops", use an optimistic default: assume that the loop nest is
+     parallelizable (essentially, no GIMPLE_OMP_FOR with (explicit or implicit)
+     'auto' clause, and no un-annotated loops).  */
+  int region_code = param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+                       ? GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE
+                       : GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+
   adjust_region_code (stmts, &region_code);

   if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
@@ -661,6 +690,19 @@ make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
                         "parallelized loop nest"
                         " in OpenACC %<kernels%> region\n");

+      clauses = transform_kernels_loop_clauses (omp_for,
+                                               num_gangs_clause,
+                                               num_workers_clause,
+                                               vector_length_clause,
+                                               clauses);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_NOTE, omp_for,
+                        "forwarded loop nest in OpenACC %<kernels%> region"
+                        " to %<Graphite%> for analysis\n");
+
       clauses = transform_kernels_loop_clauses (omp_for,
                                                num_gangs_clause,
                                                num_workers_clause,
@@ -1651,8 +1693,13 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
-    return (flag_openacc
-           && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+    if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE
+       || param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+      return flag_openacc;
+    else if (param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
+      return false;
+    else
+      gcc_unreachable ();
   }
   virtual unsigned int execute (function *)
   {
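The hunks above make the region kind chosen by the kernels decomposition depend on `--param=openacc-kernels`. The C model below is a rough sketch of that selection, not GCC code. All enumerators and function names are hypothetical stand-ins for the `OPENACC_KERNELS_*` modes and the `GF_OMP_TARGET_KIND_OACC_*` region kinds. The model covers only three things: the pass gate, the initial defaults, and the final kind for forwarded regions. It ignores any other refinement that `adjust_region_code` applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the OPENACC_KERNELS_* modes.  */
enum kernels_mode
{
  MODE_DECOMPOSE,          /* --param=openacc-kernels=decompose (default) */
  MODE_DECOMPOSE_PARLOOPS, /* --param=openacc-kernels=decompose-parloops */
  MODE_PARLOOPS            /* --param=openacc-kernels=parloops */
};

/* Hypothetical stand-ins for the GF_OMP_TARGET_KIND_OACC_* region kinds.  */
enum region_kind
{
  KIND_GANG_SINGLE,  /* ..._PARALLEL_KERNELS_GANG_SINGLE */
  KIND_PARALLELIZED, /* ..._PARALLEL_KERNELS_PARALLELIZED */
  KIND_GRAPHITE,     /* ..._PARALLEL_KERNELS_GRAPHITE */
  KIND_KERNELS       /* ..._KERNELS, handed on to "parloops" */
};

/* Pass gate: the decomposition runs for both "decompose" modes.  */
static bool
decompose_gate (enum kernels_mode mode, bool flag_openacc)
{
  return mode != MODE_PARLOOPS && flag_openacc;
}

/* Initial kind for a statement sequence (cf. make_region_seq).  */
static enum region_kind
seq_default_kind (enum kernels_mode mode)
{
  return mode == MODE_DECOMPOSE ? KIND_GRAPHITE : KIND_GANG_SINGLE;
}

/* Initial kind for a loop nest (cf. make_region_loop_nest).  */
static enum region_kind
loop_nest_default_kind (enum kernels_mode mode)
{
  return mode == MODE_DECOMPOSE ? KIND_GRAPHITE : KIND_PARALLELIZED;
}

/* Final kind when the statement walk forwards a region instead of
   parallelizing it (cf. adjust_region_code_walk_stmt_fn).  */
static enum region_kind
forwarded_kind (enum kernels_mode mode)
{
  /* The patch uses gcc_unreachable () for any other mode.  */
  assert (mode == MODE_DECOMPOSE || mode == MODE_DECOMPOSE_PARLOOPS);
  return mode == MODE_DECOMPOSE ? KIND_GRAPHITE : KIND_KERNELS;
}
```

Under the default `decompose` mode, both statement sequences and loop nests therefore start out as Graphite candidates. `decompose-parloops` keeps the defaults that the removed lines used.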
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index bbdcc5207880..f5cb222efd8c 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -853,6 +853,202 @@ oacc_xform_loop (gcall *call)
   gsi_replace_with_seq (&gsi, seq, true);
 }

+/* This is used for expanding the loop calls to "fake" values that mimic the
+   values used for host execution during scalar evolution analysis in
+   Graphite.  The function has been derived from oacc_xform_loop, which could
+   not be used here because it rewrites the code directly.
+
+   TODO This function can either be simplified significantly (cf. the fixed
+   values for number_of_threads, thread_index, chunking, striding) or unified
+   with oacc_xform_loop. */
+
+tree
+oacc_extract_loop_call (gcall *call)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+  enum ifn_goacc_loop_kind code
+      = (enum ifn_goacc_loop_kind)TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+  tree dir = gimple_call_arg (call, 1);
+  tree range = gimple_call_arg (call, 2);
+  tree step = gimple_call_arg (call, 3);
+  tree chunk_size = NULL_TREE;
+  unsigned mask = (unsigned)TREE_INT_CST_LOW (gimple_call_arg (call, 5));
+  tree lhs = gimple_call_lhs (call);
+  tree type = NULL_TREE;
+  tree diff_type = TREE_TYPE (range);
+  tree r = NULL_TREE;
+  bool chunking = false, striding = true;
+  unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning
+  /* unsigned inner_mask = mask & ~outer_mask; // Inner partitioning (if any)
+   */
+
+  gcc_checking_assert (lhs);
+
+  type = TREE_TYPE (lhs);
+
+  tree number_of_threads = integer_one_node;
+  tree thread_index = integer_zero_node;
+
+  /* striding=true, chunking=true
+       -> invalid.
+     striding=true, chunking=false
+       -> chunks=1
+     striding=false,chunking=true
+       -> chunks=ceil (range/(chunksize*threads*step))
+     striding=false,chunking=false
+       -> chunk_size=ceil(range/(threads*step)),chunks=1  */
+
+  switch (code)
+    {
+    default:
+      gcc_unreachable ();
+
+    case IFN_GOACC_LOOP_CHUNKS:
+      if (!chunking)
+        r = build_int_cst (type, 1);
+      else
+        {
+          /* chunk_max
+             = (range - dir) / (chunks * step * num_threads) + dir  */
+          tree per = number_of_threads;
+          per = fold_convert (type, per);
+          chunk_size = fold_convert (type, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, step);
+          r = fold_build2 (MINUS_EXPR, type, range, dir);
+          r = fold_build2 (PLUS_EXPR, type, r, per);
+          r = fold_build2 (TRUNC_DIV_EXPR, type, r, per);
+        }
+      break;
+
+    case IFN_GOACC_LOOP_STEP:
+      {
+        /* If striding, step by the entire compute volume, otherwise
+           step by the inner volume.  */
+        /* unsigned volume = striding ? mask : inner_mask; */
+
+        r = number_of_threads;
+        r = fold_build2 (MULT_EXPR, type, fold_convert (type, r), step);
+      }
+      break;
+
+    case IFN_GOACC_LOOP_OFFSET:
+      /* Enable vectorization on non-SIMT targets.  */
+      if (!targetm.simt.vf
+          && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+          /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+             the loop.  */
+          && (flag_tree_loop_vectorize
+              || !global_options_set.x_flag_tree_loop_vectorize))
+        {
+          basic_block bb = gsi_bb (gsi);
+          class loop *parent = bb->loop_father;
+          class loop *body = parent->inner;
+
+          parent->force_vectorize = true;
+          parent->safelen = INT_MAX;
+
+          /* "Chunking loops" may have inner loops.  */
+          if (parent->inner)
+            {
+              body->force_vectorize = true;
+              body->safelen = INT_MAX;
+            }
+
+          cfun->has_force_vectorize_loops = true;
+        }
+      if (striding)
+        {
+          r = thread_index;
+          r = fold_convert (diff_type, r);
+        }
+      else
+        {
+          tree inner_size = number_of_threads;
+          tree outer_size = number_of_threads;
+          tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                     inner_size, outer_size);
+
+          volume = fold_convert (diff_type, volume);
+          if (chunking)
+            chunk_size = fold_convert (diff_type, chunk_size);
+          else
+            {
+              tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+
+              chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir);
+              chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per);
+              chunk_size
+                  = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+            }
+
+          tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size,
+                                   fold_convert (diff_type, inner_size));
+          r = thread_index;
+          r = fold_convert (diff_type, r);
+          r = fold_build2 (MULT_EXPR, diff_type, r, span);
+
+          tree inner = thread_index;
+          inner = fold_convert (diff_type, inner);
+          r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
+
+          if (chunking)
+            {
+              tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
+              tree per
+                  = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
+              per = fold_build2 (MULT_EXPR, diff_type, per, chunk);
+
+              r = fold_build2 (PLUS_EXPR, diff_type, r, per);
+            }
+        }
+      r = fold_build2 (MULT_EXPR, diff_type, r, step);
+      if (type != diff_type)
+        r = fold_convert (type, r);
+      break;
+
+    case IFN_GOACC_LOOP_BOUND:
+      if (striding)
+        r = range;
+      else
+        {
+          tree inner_size = number_of_threads;
+          tree outer_size = number_of_threads;
+          tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                     inner_size, outer_size);
+
+          volume = fold_convert (diff_type, volume);
+          if (chunking)
+            chunk_size = fold_convert (diff_type, chunk_size);
+          else
+            {
+              tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+
+              chunk_size = fold_build2 (MINUS_EXPR, diff_type, range, dir);
+              chunk_size = fold_build2 (PLUS_EXPR, diff_type, chunk_size, per);
+              chunk_size
+                  = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+            }
+
+          tree span = fold_build2 (MULT_EXPR, diff_type, chunk_size,
+                                   fold_convert (diff_type, inner_size));
+
+          r = fold_build2 (MULT_EXPR, diff_type, span, step);
+
+          tree offset = gimple_call_arg (call, 6);
+          r = fold_build2 (PLUS_EXPR, diff_type, r,
+                           fold_convert (diff_type, offset));
+          r = fold_build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type,
+                           r, range);
+        }
+      if (diff_type != type)
+        r = fold_convert (type, r);
+      break;
+    }
+
+  return r;
+}
+
 /* Transform a GOACC_TILE call.  Determines the element loop span for
    the specified loop of the nest.  This is 1 if we're not tiling.

@@ -1050,7 +1246,8 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #endif
   if (check
       && warn_openacc_parallelism
-      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
+      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
+      && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
     {
       static char const *const axes[] =
       /* Must be kept in sync with GOMP_DIM enumeration.  */
@@ -1550,7 +1747,219 @@ oacc_loop_process (oacc_loop *loop)
     oacc_loop_process (loop->sibling);
 }

-/* Walk the OpenACC loop heirarchy checking and assigning the
+/* Return the outermost CFG loop that is enclosed between the head and
+   tail mark calls for LOOP, or NULL if there is no such CFG loop.
+
+   The outermost CFG loop is a loop that is used for "chunking" the
+   original loop from the user's code.  The lower_omp_for function
+   in omp-low.c which creates the head and tail mark sequence and
+   the expand_oacc_for function in omp-expand.c are relevant for
+   understanding the structure that we expect to find here. But note
+   that the passes implemented in those files do not operate on CFG
+   loops and hence the correspondence to the CFG loop structure is
+   not directly visible there and has to be inferred. */
+
+static loop_p
+oacc_loop_get_cfg_loop (oacc_loop *loop)
+{
+  loop_p enclosed_cfg_loop = NULL;
+  for (unsigned dim = 0; dim < GOMP_DIM_MAX; ++dim)
+    {
+      gcall *tail_mark = loop->tails[dim];
+      gimple *head_mark = loop->heads[dim];
+      if (!tail_mark)
+        continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, "%G",
+                     tail_mark);
+
+      loop_p mark_cfg_loop = tail_mark->bb->loop_father;
+      loop_p current_cfg_loop = mark_cfg_loop;
+
+      /* Ascend from TAIL_MARK until a different CFG loop is reached.
+
+         From the way that OpenACC loops are treated in omp-low.c, we
+         could expect the tail marker to be immediately preceded by a
+         loop exit. But loop optimizations (e.g. store-motion in
+         pass_lim) can change this. */
+      basic_block bb = tail_mark->bb;
+      bool empty_loop = false;
+      while (current_cfg_loop == mark_cfg_loop)
+        {
+          /* If the OpenACC loop becomes empty due to optimizations,
+             there is no CFG loop at all enclosed between head and
+             tail mark.  */
+          if (bb == head_mark->bb)
+            {
+              empty_loop = true;
+              break;
+            }
+
+          bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+          current_cfg_loop = bb->loop_father;
+        }
+
+      if (empty_loop)
+        continue;
+
+      /* We expect to find the same CFG loop enclosed between all head
+         and tail mark pairs. Hence we actually need to look at only
+         the first available pair. But we consider all for
+         verification purposes. */
+      if (enclosed_cfg_loop)
+        {
+          gcc_assert (current_cfg_loop == enclosed_cfg_loop);
+          continue;
+        }
+
+      enclosed_cfg_loop = current_cfg_loop;
+
+      gcc_checking_assert (dominated_by_p (
+          CDI_DOMINATORS, enclosed_cfg_loop->header, head_mark->bb));
+    }
+
+  return enclosed_cfg_loop;
+}
+
+static const char*
+can_be_parallel_str (loop_p loop)
+{
+  if (!loop->can_be_parallel_valid_p)
+    return "not analyzed";
+
+  return loop->can_be_parallel ? "can be parallel" : "cannot be parallel";
+}
+
+/* Returns true if LOOP is known to be parallelizable and false
+   otherwise.  The decision is based on the dependence analysis
+   that must have been previously performed by Graphite on the CFG
+   loops contained in the OpenACC loop LOOP.  The value of ANALYZED is
+   set to true if all relevant CFG loops have been analyzed. */
+
+static bool
+oacc_loop_can_be_parallel_p (oacc_loop *loop, bool& analyzed)
+{
+  /* Graphite will not run without enabled optimizations, so we cannot
+     expect to find any parallelizability information on the CFG loops. */
+  if (!optimize)
+    return false;
+
+  const dump_user_location_t loc
+      = dump_user_location_t::from_location_t (loop->loc);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc,
+                     "Inspecting CFG-loops for OpenACC loop.\n");
+
+  /* Search for the CFG loops that are enclosed between the head and
+     tail mark calls for LOOP. The two outer CFG loops are considered
+     to belong to the OpenACC loop and hence the CAN_BE_PARALLEL flags
+     on those loops will be used to determine the return value. */
+  bool can_be_parallel = false;
+  loop_p enclosed_cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (enclosed_cfg_loop
+      /* The inner loop may have been removed in degenerate cases, e.g.
+         if an infinite "for (; ;)" gets optimized in an OpenACC loop nest. */
+      && enclosed_cfg_loop->inner)
+    {
+      gcc_assert (enclosed_cfg_loop->inner != NULL);
+      gcc_assert (enclosed_cfg_loop->inner->next == NULL);
+
+      can_be_parallel = enclosed_cfg_loop->can_be_parallel
+                        && enclosed_cfg_loop->inner->can_be_parallel;
+
+      analyzed = enclosed_cfg_loop->can_be_parallel_valid_p
+                 && enclosed_cfg_loop->inner->can_be_parallel_valid_p;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          dump_printf (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS,
+                       "\tOuter loop <%d> preceding tail mark %s.\n"
+                       "\tInner loop <%d> %s.\n",
+                       enclosed_cfg_loop->num,
+                       can_be_parallel_str (enclosed_cfg_loop),
+                       enclosed_cfg_loop->inner->num,
+                       can_be_parallel_str (enclosed_cfg_loop->inner));
+        }
+    }
+  else if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | MSG_PRIORITY_INTERNALS, loc,
+                     "Empty OpenACC loop.\n");
+
+  return can_be_parallel;
+}
+
+static bool
+oacc_parallel_kernels_graphite_fun_p ()
+{
+  return lookup_attribute ("oacc parallel_kernels_graphite",
+                           DECL_ATTRIBUTES (cfun->decl));
+}
+
+static bool
+oacc_parallel_fun_p ()
+{
+  return lookup_attribute ("oacc parallel",
+                           DECL_ATTRIBUTES (cfun->decl));
+}
+
+/* If LOOP is an "auto" loop for which dependence analysis has determined that
+   it can be parallelized, make it "independent" by adjusting its FLAGS field
+   and return true. Otherwise, return false. */
+
+static bool
+oacc_loop_transform_auto_into_independent (oacc_loop *loop)
+{
+  if (!optimize)
+    return false;
+
+  /* This function is only relevant for "kernels"
+     regions that have been explicitly designated
+     to be analyzed by Graphite and for "auto"
+     loops in "parallel" regions.  */
+  if (!oacc_parallel_kernels_graphite_fun_p ()
+      && !oacc_parallel_fun_p ())
+    return false;
+
+  if (loop->routine)
+    return false;
+
+  if (!(loop->flags & OLF_AUTO))
+    return false;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  dump_user_location_t loc = dump_user_location_t::from_location_t (loop->loc);
+
+  if (dump_enabled_p ())
+    {
+      if (!analyzed)
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                         "'auto' loop has not been analyzed (cf. 'graphite' "
+                         "dumps for more information).\n");
+    }
+  if (!can_be_parallel)
+    return false;
+
+  loop->flags |= OLF_INDEPENDENT;
+
+  /* We need to keep the OLF_AUTO flag for now.
+     oacc_loop_fixed_partitions and oacc_loop_auto_partitions
+     interpret "independent auto" as "this loop can be parallel,
+     please determine the dimensions" which seems to correspond to the
+     meaning of those clauses in an old OpenACC version.  We rely on
+     this behaviour to assign the dimensions for this loop.
+
+     TODO Use a different flag to indicate that the dimensions must be assigned. */
+
+  // loop->flags &= ~OLF_AUTO;
+
+  return true;
+}
+
+/* Walk the OpenACC loop hierarchy checking and assigning the
    programmer-specified partitionings.  OUTER_MASK is the partitioning
    this loop is contained within.  Return mask of partitioning
    encountered.  If any auto loops are discovered, set GOMP_DIM_MAX
@@ -1606,6 +2015,9 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
          loop->flags |= OLF_AUTO;
          mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX);
        }
+
+      if (oacc_loop_transform_auto_into_independent (loop))
+        mask_all |= GOMP_DIM_MASK (GOMP_DIM_MAX);
     }

   if (this_mask & outer_mask)
@@ -2077,81 +2489,88 @@ execute_oacc_loop_designation ()
       flag_openacc_dims = (char *)&flag_openacc_dims;
     }

-  bool is_oacc_parallel
-    = (lookup_attribute ("oacc parallel",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_kernels
     = (lookup_attribute ("oacc kernels",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_parallel
+    = (lookup_attribute ("oacc parallel",
+                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_serial
     = (lookup_attribute ("oacc serial",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
-  bool is_oacc_parallel_kernels_parallelized
-    = (lookup_attribute ("oacc parallel_kernels_parallelized",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
-  bool is_oacc_parallel_kernels_gang_single
-    = (lookup_attribute ("oacc parallel_kernels_gang_single",
-                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
-  int fn_level = oacc_fn_attrib_level (attr);
-  bool is_oacc_routine = (fn_level >= 0);
-  gcc_checking_assert (is_oacc_parallel
-                      + is_oacc_kernels
-                      + is_oacc_serial
-                      + is_oacc_parallel_kernels_parallelized
-                      + is_oacc_parallel_kernels_gang_single
-                      + is_oacc_routine
-                      == 1);
-
   bool is_oacc_kernels_parallelized
     = (lookup_attribute ("oacc kernels parallelized",
                         DECL_ATTRIBUTES (current_function_decl)) != NULL);
   if (is_oacc_kernels_parallelized)
     gcc_checking_assert (is_oacc_kernels);
+  bool is_oacc_parallel_kernels_parallelized
+      = (lookup_attribute ("oacc parallel_kernels_parallelized",
+                           DECL_ATTRIBUTES (current_function_decl))
+         != NULL);
+  if (is_oacc_parallel_kernels_parallelized)
+    gcc_checking_assert (!is_oacc_kernels);
+  bool is_oacc_parallel_kernels_gang_single
+    = (lookup_attribute ("oacc parallel_kernels_gang_single",
+                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_gang_single)
+    gcc_checking_assert (!is_oacc_kernels);
+  gcc_checking_assert (!(is_oacc_parallel_kernels_parallelized
+                        && is_oacc_parallel_kernels_gang_single));
+  bool is_oacc_parallel_kernels_graphite
+    = (lookup_attribute ("oacc parallel_kernels_graphite",
+                        DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_graphite)
+    gcc_checking_assert (!is_oacc_kernels
+                         && !is_oacc_parallel_kernels_gang_single);
+
+  /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
+     kernels, so remove the parallelism dimensions function attributes
+     potentially set earlier on.  */
+  if (is_oacc_kernels && !is_oacc_kernels_parallelized)
+    {
+      gcc_checking_assert (!is_oacc_parallel_kernels_graphite);
+      oacc_set_fn_attrib (current_function_decl, NULL, NULL);
+      attr = oacc_get_fn_attrib (current_function_decl);
+    }
+
+  /* Discover, partition and process the loops.  */
+  oacc_loop *loops = oacc_loop_discovery ();
+  int fn_level = oacc_fn_attrib_level (attr);

   if (dump_file)
     {
-      if (is_oacc_parallel)
-       fprintf (dump_file, "Function is OpenACC parallel offload\n");
+      if (fn_level >= 0)
+       fprintf (dump_file, "Function is OpenACC routine level %d\n",
+                fn_level);
       else if (is_oacc_kernels)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
                 (is_oacc_kernels_parallelized
                  ? "parallelized" : "unparallelized"));
-      else if (is_oacc_serial)
-       fprintf (dump_file, "Function is OpenACC serial offload\n");
       else if (is_oacc_parallel_kernels_parallelized)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
                 "parallel_kernels_parallelized");
       else if (is_oacc_parallel_kernels_gang_single)
        fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
                 "parallel_kernels_gang_single");
-      else if (is_oacc_routine)
-       fprintf (dump_file, "Function is OpenACC routine level %d\n",
-                fn_level);
+      else if (is_oacc_parallel_kernels_graphite)
+       fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+                "parallel_kernels_graphite");
+      else if (is_oacc_serial)
+       fprintf (dump_file, "Function is OpenACC serial offload\n");
+      else if (is_oacc_parallel)
+       fprintf (dump_file, "Function is OpenACC parallel offload\n");
       else
        gcc_unreachable ();
     }

-  /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
-     kernels, so remove the parallelism dimensions function attributes
-     potentially set earlier on.  */
-  if (is_oacc_kernels && !is_oacc_kernels_parallelized)
-    {
-      oacc_set_fn_attrib (current_function_decl, NULL, NULL);
-      attr = oacc_get_fn_attrib (current_function_decl);
-    }
-
-  /* Discover, partition and process the loops.  */
-  oacc_loop *loops = oacc_loop_discovery ();
-  fn_level = oacc_fn_attrib_level (attr);
-
-  unsigned outer_mask = 0;
-  if (is_oacc_routine)
-    outer_mask = GOMP_DIM_MASK (fn_level) - 1;
+  unsigned outer_mask = fn_level >= 0 ? GOMP_DIM_MASK (fn_level) - 1 : 0;
   unsigned used_mask = oacc_loop_partition (loops, outer_mask);
   /* OpenACC kernels constructs are special: they currently don't use the
      generic oacc_loop infrastructure and attribute/dimension processing.  */
   if (is_oacc_kernels && is_oacc_kernels_parallelized)
     {
+      gcc_checking_assert (!is_oacc_parallel_kernels_graphite);
+
       /* Parallelized OpenACC kernels constructs use gang parallelism.  See
         also tree-parloops.c:create_parallel_loop.  */
       used_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG);
@@ -2410,6 +2829,11 @@ execute_oacc_device_lower ()
                  remove = true;
                  break;

+               case IFN_UNIQUE_OACC_PRIVATE_SCALAR:
+               case IFN_UNIQUE_OACC_FIRSTPRIVATE:
+                 remove = true;
+                 break;
+
                case IFN_UNIQUE_OACC_PRIVATE:
                  {
                    HOST_WIDE_INT level
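`oacc_loop_transform_auto_into_independent` marks an 'auto' loop as independent only under one condition: Graphite must have flagged both CFG loops enclosed by the loop's head and tail marks as parallelizable. These are the chunking loop and its inner loop. The following C model is a simplified sketch of that decision. The structure, flag names and function name are hypothetical stand-ins for `oacc_loop`, `class loop` and the `OLF_*` flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for OLF_AUTO and OLF_INDEPENDENT.  */
enum { FLAG_AUTO = 1 << 0, FLAG_INDEPENDENT = 1 << 1 };

/* Hypothetical summary of an OpenACC loop and the two CFG loops that
   oacc_loop_get_cfg_loop finds between its head and tail marks.  */
struct acc_loop_model
{
  unsigned flags;
  bool routine;               /* loop->routine is set */
  bool has_cfg_loop_pair;     /* outer CFG loop with an inner loop found */
  bool outer_can_be_parallel; /* Graphite's can_be_parallel, outer loop */
  bool inner_can_be_parallel; /* Graphite's can_be_parallel, inner loop */
};

/* Mark an 'auto' loop independent if Graphite proved both CFG loops
   parallel.  FLAG_AUTO is kept so that the partitioning code still
   chooses the dimensions.  */
static bool
auto_to_independent (struct acc_loop_model *loop, bool optimizing,
                     bool graphite_kernels_fn, bool parallel_fn)
{
  assert (loop != NULL);
  if (!optimizing)                    /* Graphite does not run at -O0.  */
    return false;
  if (!graphite_kernels_fn && !parallel_fn)
    return false;
  if (loop->routine || !(loop->flags & FLAG_AUTO))
    return false;

  bool can_be_parallel = loop->has_cfg_loop_pair
                         && loop->outer_can_be_parallel
                         && loop->inner_can_be_parallel;
  if (!can_be_parallel)
    return false;

  loop->flags |= FLAG_INDEPENDENT;
  return true;
}
```

When this returns true, the caller in `oacc_loop_fixed_partitions` also sets `GOMP_DIM_MASK (GOMP_DIM_MAX)` in its mask. This is the same signal it uses for discovered auto loops.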
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index a6f26a7c9628..34df72cefc84 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -32,5 +32,7 @@ extern GTY(()) vec<tree, va_gc> *offload_vars;
 extern int oacc_fn_attrib_level (tree attr);
 extern void omp_finish_file (void);
 extern void omp_discover_implicit_declare_target (void);
+extern tree oacc_extract_loop_call (gcall *call);
+

 #endif /* GCC_OMP_DEVICE_H */
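`oacc_extract_loop_call`, declared above, folds each `IFN_GOACC_LOOP` call to a "fake" value that mimics host execution, for Graphite's scalar evolution analysis. The function fixes `number_of_threads = 1`, `thread_index = 0`, `striding = true` and `chunking = false`, so the results collapse to simple expressions.

The plain-integer model below is a sketch under those assumptions. The enum and function names are hypothetical, and it ignores tree types and the `force_vectorize` hint set in the OFFSET case. It also reproduces the non-striding `chunk_size` formula. This relies on one assumption, suggested by the MIN_EXPR/MAX_EXPR choice on `dir`: RANGE and STEP are positive for ascending loops (`dir = 1`) and negative for descending ones (`dir = -1`).

```c
#include <assert.h>

/* Hypothetical stand-ins for the IFN_GOACC_LOOP_* kinds.  */
enum goacc_loop_kind { LOOP_CHUNKS, LOOP_STEP, LOOP_OFFSET, LOOP_BOUND };

/* Integer model of the values oacc_extract_loop_call folds, given its fixed
   settings: one thread, thread index 0, striding, no chunking.  */
static long
host_goacc_loop_value (enum goacc_loop_kind kind, long range, long step)
{
  const long number_of_threads = 1;
  const long thread_index = 0;

  switch (kind)
    {
    case LOOP_CHUNKS:
      return 1;                         /* No chunking: a single chunk.  */
    case LOOP_STEP:
      return number_of_threads * step;  /* Step by the compute volume.  */
    case LOOP_OFFSET:
      return thread_index * step;       /* Striding: start at own index.  */
    case LOOP_BOUND:
      return range;                     /* Striding: iterate whole range.  */
    }
  return 0;
}

/* Chunk size of the non-striding, non-chunking case:
   (range - dir + per) / per with per = volume * step, i.e. range / per
   rounded up, for ascending and descending loops alike.  */
static long
nonstriding_chunk_size (long dir, long range, long step, long volume)
{
  long per = volume * step;
  assert (per != 0);
  return (range - dir + per) / per;
}
```

With one thread, the offset is 0, the step equals the user's step and the bound is the whole range. The folded values thus describe a single thread executing the entire loop.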
diff --git a/gcc/params.opt b/gcc/params.opt
index a9c12264244b..e3116bb67d27 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -788,7 +788,7 @@ If -ftree-vectorize is used, the minimal loop bound of a loop to be considered f

 -param=openacc-kernels=
 Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_DECOMPOSE) Param
---param=openacc-kernels=[decompose|parloops]   Specify mode of OpenACC 'kernels' constructs handling.
+--param=openacc-kernels=[decompose|decompose-parloops|parloops]        Specify mode of OpenACC 'kernels' constructs handling.

 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -796,6 +796,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)

+EnumValue
+Enum(openacc_kernels) String(decompose-parloops) Value(OPENACC_KERNELS_DECOMPOSE_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

diff --git a/gcc/sese.c b/gcc/sese.c
index ca88f9bbfdf1..50bdde6c537a 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -448,8 +448,29 @@ scalar_evolution_in_region (const sese_l &region, loop_p loop, tree t)
   if (!loop_in_sese_p (loop, region))
     loop = NULL;

-  return instantiate_scev (region.entry, loop,
-                          analyze_scalar_evolution (loop, t));
+  tree chrec = analyze_scalar_evolution (loop, t);
+
+  /* The IFN_GOACC_LOOP calls may evolve to an SSA name that is defined outside
+     of LOOP.  To avoid failing the SCEV analysis, analyze that SSA name in
+     the loop that defines it.  */
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (t);
+      basic_block def_bb = def_stmt->bb;
+      if (is_gimple_call (def_stmt)
+          && gimple_call_internal_p (def_stmt, IFN_GOACC_LOOP)
+          && TREE_CODE (chrec) == SSA_NAME && def_bb
+          && SSA_NAME_DEF_STMT (chrec)->bb)
+        {
+          loop_p outer_loop = SSA_NAME_DEF_STMT (chrec)->bb->loop_father;
+          loop_p inner_loop = def_bb->loop_father;
+
+          if (outer_loop != inner_loop)
+            return scalar_evolution_in_region (region, outer_loop, chrec);
+        }
+    }
+
+  return instantiate_scev (region.entry, loop, chrec);
 }

 /* Return true if BB is empty, contains only DEBUG_INSNs.  */
diff --git a/gcc/sese.h b/gcc/sese.h
index c51ea68bfb47..114bb9b0c0b4 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -280,6 +280,7 @@ typedef struct gimple_poly_bb
   vec<data_reference_p> data_refs;
   vec<scalar_use> read_scalar_refs;
   vec<tree> write_scalar_refs;
+  vec<tree> kill_scalar_refs;
 } *gimple_poly_bb_p;

 #define GBB_BB(GBB) (GBB)->bb
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
deleted file mode 100644
index 7ce42a469ad3..000000000000
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* Check offloaded function's attributes and classification for unparallelized
-   OpenACC 'kernels'.  */
-
-/* { dg-additional-options "-O2" }
-   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-   { dg-additional-options "-fopt-info-note-optimized-omp" }
-   { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccloops1" } */
-
-/* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
-   aspects of that functionality.  */
-
-#define N 1024
-
-extern unsigned int *__restrict a;
-extern unsigned int *__restrict b;
-extern unsigned int *__restrict c;
-
-extern unsigned int f (unsigned int);
-#pragma acc routine (f) seq
-
-void KERNELS ()
-{
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
-    /* An "extern"al mapping of loop iterations/array indices makes the loop
-       unparallelizable.  */
-    c[i] = a[f (i)] + b[f (i)];
-}
-
-/* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
-
-/* Check that exactly one OpenACC kernels construct is analyzed, and that it
-   can't be parallelized.
-   { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } } */
-
-/* Check the offloaded function's classification and compute dimensions (will
-   always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index de7525e67f14..7aaebeff2828 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -20,7 +20,7 @@ extern unsigned int *__restrict c;
 void KERNELS ()
 {
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     c[i] = a[i] + b[i];
 }

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
deleted file mode 100644
index 1449f7a066d4..000000000000
--- a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ /dev/null
@@ -1,36 +0,0 @@
-/* { dg-additional-options "--param=openacc-kernels=parloops" } as this is
-   specifically testing "parloops" handling.  */
-/* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
-/* { dg-additional-options "-fdump-tree-parloops1-all" } */
-/* { dg-additional-options "-fdump-tree-optimized" } */
-
-#include <stdlib.h>
-
-#define n 10000
-
-unsigned int a[n];
-
-void  __attribute__((noinline,noclone))
-foo (void)
-{
-  int i;
-  unsigned int sum = 1;
-
-#pragma acc kernels copyin (a[0:n]) copy (sum)
-  {
-    for (i = 0; i < n; ++i)
-      sum += a[i];
-  }
-
-  if (sum != 5001)
-    abort ();
-}
-
-/* Check that only one loop is analyzed, and that it can be parallelized.  */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone, noinline\\)\\)" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
-
-/* Check that the loop has been split off into a function.  */
-/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
index 9e53b2490192..b3f4e24173af 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
@@ -16,7 +16,7 @@ main ()

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
  /* Strangely indented to keep this similar to other test cases.  */
-  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  if (c) /* { dg-message "optimized: beginning .Graphite. region in OpenACC .kernels. construct" } */
  {
 #pragma acc loop seq
   /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
index 3c78f2bf2911..3bcb7f430f4d 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
@@ -2,7 +2,7 @@
    construct containing loops.  */

 /* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
-/* { dg-additional-options "-fopt-info-note-optimized-omp" } */
+/* { dg-additional-options "-fopt-info-optimized-omp-note" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
@@ -15,7 +15,7 @@ main ()
 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
  /* Strangely indented to keep this similar to other test cases.  */
  {
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     ;

   for (x = 0; x < 10; x++)
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
index 5ecd9378ee8a..8d82c21c1aa9 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
@@ -13,36 +13,36 @@ main ()
   int x, y, z;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     for (y = 0; y < 10; y++)
       ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
new file mode 100644
index 000000000000..bba67dcf7cbc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
@@ -0,0 +1,47 @@
+! Verify that Graphite's analysis of the CFG loops is correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 1.
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details -fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "--param max-isl-operations=0" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+
+  real, allocatable :: array1(:)
+  real, allocatable :: array2(:)
+
+  contains
+
+subroutine test_loop_nest_depth_1 ()
+  implicit none
+
+  integer :: i,n
+
+  if (size (array1) /= size (array2)) return
+  n = size(array1)
+
+  !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC gang vector loop parallelism" }
+  ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
+  ! { dg-message ".auto. loop can be parallel" "" {target *-*-*} .-2 }
+  do i=1, n
+     array2(i) = array1(i) ! { dg-message "loop has no data-dependences" }
+  end do
+
+
+  !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC seq loop parallelism" }
+  ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
+  ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-2 }
+  do i=1, n-1
+     array1(i+1) = array1(i) + 10 ! { dg-message "loop has data-dependences" }
+     array2(i) = array1(i)
+  end do
+
+  return
+end subroutine test_loop_nest_depth_1
+
+
+
+end module test_module
+
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 2 "graphite" } }
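The two loops in loop-auto-transfer-2.f90 differ only in whether different iterations touch the same array element. As a reading aid, the following minimal C sketch spells out what "loop has data-dependences" means for a single-index loop: two *different* iterations access the same element, and at least one of the two accesses is a write. The names `affine_ref` and `has_loop_carried_dependence` are hypothetical. Graphite performs this analysis symbolically with isl and does not enumerate iterations, so this brute-force search only illustrates the definition and is only suitable for small bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one access to a single array inside
   "do i = lo, hi": the accessed element is coef * i + offset.  */
struct affine_ref
{
  long coef;
  long offset;
  bool is_write;
};

static long
subscript (const struct affine_ref *ref, long i)
{
  return ref->coef * i + ref->offset;
}

/* Return true if two different iterations i1 != i2 of "do i = lo, hi"
   access the same element and at least one of the two accesses is a
   write.  In that case the loop carries a dependence and its iterations
   cannot simply run in parallel.  This is an exhaustive
   O(nrefs^2 * (hi - lo + 1)^2) search, meant only for small bounds.  */
bool
has_loop_carried_dependence (const struct affine_ref *refs, size_t nrefs,
                             long lo, long hi)
{
  assert (refs != NULL || nrefs == 0);
  for (size_t a = 0; a < nrefs; a++)
    for (size_t b = 0; b < nrefs; b++)
      {
        if (!refs[a].is_write && !refs[b].is_write)
          continue;  /* Read/read pairs never conflict.  */
        for (long i1 = lo; i1 <= hi; i1++)
          for (long i2 = lo; i2 <= hi; i2++)
            if (i1 != i2
                && subscript (&refs[a], i1) == subscript (&refs[b], i2))
              return true;
      }
  return false;
}
```

For the second loop (`do i=1, n-1`, using a small arbitrary `n`), the accesses to `array1` can be modeled as `{1, 1, true}` for `array1(i+1)` and `{1, 0, false}` for each read of `array1(i)`. Iteration 1 writes the element that iteration 2 reads, so the function returns true. This agrees with the expected "loop has data-dependences" note. In the first loop, each array is accessed only at index `i`, so no two iterations collide.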
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
new file mode 100644
index 000000000000..d635cc5e4fe0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
@@ -0,0 +1,103 @@
+! Verify that Graphite's analysis of the CFG loops is correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 2.
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+  implicit none
+
+  integer, parameter :: n = 100
+  integer, parameter :: m = 100
+
+contains
+
+  subroutine test_loop_nest_depth_2 (array)
+    integer :: i, j
+    real :: array (2, n, m)
+
+    ! Perfect loop-nest, inner and outer loop can be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array(2, i, j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+    ! Imperfect loop-nest, inner and outer loop can be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n) = array(1, i, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array (2, i,j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+    ! Imperfect loop-nest, inner loop can be parallel, outer loop cannot be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n-1
+       array (1, i+1, 1) = array (2, i, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, m
+          array (1, i, j) = array (2, i, j) ! { dg-message "loop has no data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest, outer loop can be parallel, inner loop cannot be parallel
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n) = array (1, i, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, m-1
+          array (1, i, j+1) = array (1, i, j) ! { dg-message "loop has data-dependences" }
+       end do
+    end do
+    !$acc end parallel
+    return
+  end subroutine test_loop_nest_depth_2
+
+end module test_module
+
+
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 4 "graphite"  } } One function per kernel, all should be analyzed
+! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite" } } Original function should not be analyzed
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
new file mode 100644
index 000000000000..97acecd8807b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
@@ -0,0 +1,323 @@
+! Verify that Graphite's analysis of the CFG loops is correctly
+! transferred to the OpenACC loop structure for loop-nests of depth 3.
+
+! { dg-additional-options "-fdump-tree-graphite-details -fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-fopt-info-optimized -fopt-info-missed" }
+! { dg-additional-options "-O2" }
+! { dg-prune-output ".*not inlinable.*" }
+
+module test_module
+  implicit none
+
+  integer, parameter :: n = 100
+
+contains
+
+  subroutine test_loop_nest_depth_3 (array)
+    integer :: i, j, k
+    real :: array (2, n, n, n)
+
+    ! Perfect loop-nest. Can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+    ! Perfect loop-nest. Innermost loop cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Perfect loop-nest. Cannot be parallel because it contains no
+    ! data references and is hence not analyzed by Graphite. This is
+    ! expected: empty loops should not be parallel either; cf., e.g.,
+    ! "../../gfortran.dg/goacc/note-parallelism.f90".
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 }
+    do i=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-2 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-bogus "loop has no data-dependences" "OpenACC internal chunking CFG loop not analyzed" {target *-*-*} .-2 }
+       ! { dg-missed ".auto. loop has not been analyzed .cf. .graphite. dumps for more information.." "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(1, i, j, k) ! { dg-bogus "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. All levels can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second level
+    ! can be parallel, third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second level
+    ! cannot be parallel, third level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level can be parallel, second and
+    ! third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+    do i=1, n
+       array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second and
+    ! third levels can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second
+    ! level can be parallel, third level cannot be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+       do j=1, n
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n - 1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. First level cannot be parallel, second
+    ! level cannot be parallel, third level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n - 1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n - 1
+          array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
+          do k=1, n
+             array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+
+    ! Imperfect loop-nest. No level can be parallel.
+
+    !$acc parallel copy(array)
+    !$acc loop auto
+    ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+    ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+    do i=1, n-1
+       array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
+       !$acc loop auto
+       ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+       ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+       do j=1, n-1
+          array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
+          !$acc loop auto
+          ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
+          ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
+          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
+          do k=1, n-1
+             array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
+          end do
+       end do
+    end do
+    !$acc end parallel
+
+    return
+  end subroutine test_loop_nest_depth_3
+
+end module test_module
+
+
+!  Outlined functions for all kernels but the one without data-references should be analyzed.
+! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 10 "graphite"  } }
+! The original test function and one outlined kernel function should not be analyzed
+! { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" } }
diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c
index eeb67ded3dcf..8170265a8d6e 100644
--- a/gcc/tree-chrec.c
+++ b/gcc/tree-chrec.c
@@ -249,6 +249,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
          return chrec_fold_plus_poly_poly (code, type, op0, op1);

        CASE_CONVERT:
+       case VIEW_CONVERT_EXPR:
          {
            /* We can strip sign-conversions to signed by performing the
               operation in unsigned.  */
@@ -282,6 +283,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
        }

     CASE_CONVERT:
+    case VIEW_CONVERT_EXPR:
       {
        /* We can strip sign-conversions to signed by performing the
           operation in unsigned.  */
@@ -323,6 +325,7 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
                                    : build_int_cst_type (type, -1)));

        CASE_CONVERT:
+       case VIEW_CONVERT_EXPR:
          if (tree_contains_chrecs (op1, NULL))
            return chrec_dont_know;
          /* FALLTHRU */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 71f8d790e618..1a29d2b81c0f 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -99,6 +99,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "range-op.h"
 #include "vr-values.h"
+#include "print-tree.h"
+#include "graphite-oacc.h"

 static struct datadep_stats
 {
@@ -227,7 +229,10 @@ dump_data_reference (FILE *outf,
   print_generic_stmt (outf, DR_REF (dr));
   fprintf (outf, "#  base_object: ");
   print_generic_stmt (outf, DR_BASE_OBJECT (dr));
-
+  fprintf (outf, "#  base_address: ");
+  print_generic_stmt (outf, DR_BASE_ADDRESS (dr));
+  fprintf (outf, "#  loop-invariant offset: ");
+  print_generic_stmt (outf, DR_OFFSET (dr));
   for (i = 0; i < DR_NUM_DIMENSIONS (dr); i++)
     {
       fprintf (outf, "#  Access function %d: ", i);
@@ -5833,9 +5838,13 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
       if (gimple_call_internal_p (stmt))
        switch (gimple_call_internal_fn (stmt))
          {
-         case IFN_GOMP_SIMD_LANE:
-           {
-             class loop *loop = gimple_bb (stmt)->loop_father;
+         case IFN_UNIQUE:
+         case IFN_GOACC_REDUCTION:
+          case IFN_GOACC_LOOP:
+              return false;
+          case IFN_GOMP_SIMD_LANE:
+            {
+              class loop *loop = gimple_bb (stmt)->loop_father;
              tree uid = gimple_call_arg (stmt, 0);
              gcc_assert (TREE_CODE (uid) == SSA_NAME);
              if (loop == NULL
@@ -6014,7 +6023,6 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt,
   unsigned i;
   auto_vec<data_ref_loc, 2> references;
   data_ref_loc *ref;
-  bool ret = true;
   data_reference_p dr;

   if (get_references_in_stmt (stmt, &references))
@@ -6028,7 +6036,7 @@ graphite_find_data_references_in_stmt (edge nest, loop_p loop, gimple *stmt,
       datarefs->safe_push (dr);
     }

-  return ret;
+  return true;
 }

 /* Search the data references in LOOP, and record the information into
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index deff2d5e08b1..c1fa96d4acde 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -4174,7 +4174,16 @@ public:
   virtual bool gate (function *)
   {
     if (oacc_kernels_p)
-      return flag_openacc;
+      {
+       if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+         return false;
+
+        gcc_checking_assert (
+            param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+            || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
+        return flag_openacc;
+      }
     else
       return flag_tree_parallelize_loops > 1;
   }
@@ -4193,6 +4202,13 @@ public:
 unsigned
 pass_parallelize_loops::execute (function *fun)
 {
+  if (oacc_kernels_p)
+    {
+      gcc_checking_assert (
+          param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+          || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+    }
+
   tree nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS);
   if (nthreads == NULL_TREE)
     return 0;
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index ff052be1021f..b21aff0dc3a1 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -264,6 +264,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "ssa.h"
 #include "gimple-pretty-print.h"
+#include "tree-pretty-print.h"
+#include "print-tree.h"
 #include "fold-const.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
@@ -276,6 +278,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-chrec.h"
+#include "internal-fn.h"
+#include "graphite-oacc.h"
 #include "tree-affine.h"
 #include "tree-scalar-evolution.h"
 #include "dumpfile.h"
@@ -284,6 +288,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "omp-offload.h"
+#include "internal-fn.h"

 static tree analyze_scalar_evolution_1 (class loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (class loop *loop,
@@ -311,7 +317,19 @@ struct scev_info_hasher : ggc_ptr_hash<scev_info_str>

 static GTY (()) hash_table<scev_info_hasher> *scalar_evolution_info;

-
+/* This flag indicates that internal OpenACC calls should be analyzed.
+   The analysis is not valid in general.  It is used to allow Graphite
+   to analyze the partially lowered OpenACC loops as if it were seeing
+   the unlowered loops.  */
+
+static bool analyze_openacc_calls = false;
+
+void set_scev_analyze_openacc_calls (bool analyze)
+{
+  analyze_openacc_calls = analyze;
+}
+
+
 /* Constructs a new SCEV_INFO_STR structure for VAR and INSTANTIATED_BELOW.  */

 static inline struct scev_info_str *
@@ -577,6 +595,53 @@ get_scalar_evolution (basic_block instantiated_below, tree scalar)
   return res;
 }

+bool
+oacc_call_analyzable_p (gimple *stmt)
+{
+  return analyze_openacc_calls
+         && gimple_call_internal_p (stmt, IFN_GOACC_LOOP);
+}
+
+bool
+oacc_call_analyzable_p (tree t)
+{
+  return TREE_CODE (t) == SSA_NAME
+         && oacc_call_analyzable_p (SSA_NAME_DEF_STMT (t));
+}
+
+/* Extract loop information from an OpenACC internal function call.  */
+
+tree
+oacc_ifn_call_extract (gimple *stmt)
+{
+  gcall *call = as_a<gcall *> (stmt);
+
+  if (oacc_call_analyzable_p (stmt))
+    {
+      gcc_assert (gimple_call_internal_p (stmt, IFN_GOACC_LOOP));
+      return oacc_extract_loop_call (call);
+    }
+
+  return chrec_dont_know;
+}
+
+/* If EXPR is an SSA name defined by an analyzable internal OpenACC
+   call, return the result of its analysis; otherwise, return EXPR.  */
+
+tree
+oacc_simplify (tree expr)
+{
+  if (expr == NULL || TREE_CODE (expr) != SSA_NAME)
+    return expr;
+
+  gimple *def = SSA_NAME_DEF_STMT (expr);
+
+  if (oacc_call_analyzable_p (def))
+    return oacc_ifn_call_extract (def);
+
+  return expr;
+}
+
 /* Helper function for add_to_evolution.  Returns the evolution
    function for an assignment of the form "a = b + c", where "a" and
    "b" are on the strongly connected component.  CHREC_BEFORE is the
@@ -794,6 +859,8 @@ add_to_evolution (unsigned loop_nb, tree chrec_before, enum tree_code code,
   if (to_add == NULL_TREE)
     return chrec_before;

+  to_add = oacc_simplify (to_add);
+
   /* TO_ADD is either a scalar, or a parameter.  TO_ADD is not
      instantiated at this point.  */
   if (TREE_CODE (to_add) == POLYNOMIAL_CHREC)
@@ -966,6 +1033,7 @@ follow_ssa_edge_binary (class loop *loop, gimple *at_stmt,
       res = t_false;
     }

+  *evolution_of_loop = oacc_simplify (*evolution_of_loop);
   return res;
 }

@@ -1116,6 +1184,8 @@ follow_ssa_edge_inner_loop_phi (class loop *outer_loop,
                               evolution_of_loop, limit);
 }

+tree interpret_gimple_call (class loop *loop, gimple *call);
+
 /* Follow the ssa edge into the expression EXPR.
    Return true if the strongly connected component has been found.  */

@@ -1124,8 +1194,11 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr,
                      gphi *halting_phi, tree *evolution_of_loop,
                      int limit)
 {
-  enum tree_code code;
-  tree type, rhs0, rhs1 = NULL_TREE;
+  enum tree_code code = LAST_AND_UNUSED_TREE_CODE;
+  tree type = NULL_TREE;
+  tree rhs0 = NULL_TREE;
+  tree rhs1 = NULL_TREE;
+

   /* The EXPR is one of the following cases:
      - an SSA_NAME,
@@ -1140,6 +1213,7 @@ follow_ssa_edge_expr (class loop *loop, gimple *at_stmt, tree expr,
      PHI nodes and otherwise expand appropriately for the expression
      handling below.  */
 tail_recurse:
+  expr = oacc_simplify (expr);
   if (TREE_CODE (expr) == SSA_NAME)
     {
       gimple *def = SSA_NAME_DEF_STMT (expr);
@@ -1187,28 +1261,37 @@ tail_recurse:
          return t_false;
        }

-      /* At this level of abstraction, the program is just a set
-        of GIMPLE_ASSIGNs and PHI_NODEs.  In principle there is no
-        other def to be handled.  */
-      if (!is_gimple_assign (def))
-       return t_false;
+      /* At this level of abstraction, the program is just a set of
+         GIMPLE_ASSIGNs and PHI_NODEs.  In principle there is no other def to
+         be handled except for OpenACC internal function calls. */
+      if (is_gimple_assign (def))
+        {
+          code = gimple_assign_rhs_code (def);
+
+          switch (get_gimple_rhs_class (code))
+            {
+            case GIMPLE_BINARY_RHS:
+              rhs0 = gimple_assign_rhs1 (def);
+              rhs1 = gimple_assign_rhs2 (def);
+              break;
+            case GIMPLE_UNARY_RHS:
+            case GIMPLE_SINGLE_RHS:
+              rhs0 = gimple_assign_rhs1 (def);
+              break;
+            default:
+              return t_false;
+            }
+          type = TREE_TYPE (gimple_assign_lhs (def));
+          at_stmt = def;
+        }
+      else if (oacc_call_analyzable_p (expr)) {
+       /* TODO-kernels: Is this still needed here?  */
+       rhs0 = interpret_gimple_call (loop, def);
+       type = TREE_TYPE (gimple_call_lhs (def));
+       at_stmt = def;
+      }
+      else return t_false;

-      code = gimple_assign_rhs_code (def);
-      switch (get_gimple_rhs_class (code))
-       {
-       case GIMPLE_BINARY_RHS:
-         rhs0 = gimple_assign_rhs1 (def);
-         rhs1 = gimple_assign_rhs2 (def);
-         break;
-       case GIMPLE_UNARY_RHS:
-       case GIMPLE_SINGLE_RHS:
-         rhs0 = gimple_assign_rhs1 (def);
-         break;
-       default:
-         return t_false;
-       }
-      type = TREE_TYPE (gimple_assign_lhs (def));
-      at_stmt = def;
     }
   else
     {
@@ -1473,6 +1556,7 @@ follow_copies_to_constant (tree var)
       else
        break;
     }
+  res = oacc_simplify (res);
   if (CONSTANT_CLASS_P (res))
     return res;
   return var;
@@ -1506,6 +1590,7 @@ analyze_initial_condition (gphi *loop_phi_node)
       tree branch = PHI_ARG_DEF (loop_phi_node, i);
       basic_block bb = gimple_phi_arg_edge (loop_phi_node, i)->src;

+      branch = oacc_simplify (branch);
       /* When the branch is oriented to the loop's body, it does
         not contribute to the initial condition.  */
       if (flow_bb_inside_loop_p (loop, bb))
@@ -1533,6 +1618,7 @@ analyze_initial_condition (gphi *loop_phi_node)
   /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
      to not miss important early loop unrollings.  */
   init_cond = follow_copies_to_constant (init_cond);
+  init_cond = oacc_simplify (init_cond);

   if (dump_file && (dump_flags & TDF_SCEV))
     {
@@ -1558,6 +1644,7 @@ interpret_loop_phi (class loop *loop, gphi *loop_phi_node)
   /* Otherwise really interpret the loop phi.  */
   init_cond = analyze_initial_condition (loop_phi_node);
   res = analyze_evolution_in_loop (loop_phi_node, init_cond);
+  init_cond = analyze_initial_condition (loop_phi_node);

   /* Verify we maintained the correct initial condition throughout
      possible conversions in the SSA chain.  */
@@ -1630,8 +1717,11 @@ interpret_rhs_expr (class loop *loop, gimple *at_stmt,
        return chrec_convert (type, rhs1, at_stmt);

       if (code == SSA_NAME)
-       return chrec_convert (type, analyze_scalar_evolution (loop, rhs1),
-                             at_stmt);
+       {
+          rhs1 = oacc_simplify (rhs1);
+          return chrec_convert (type, analyze_scalar_evolution (loop, rhs1),
+                                at_stmt);
+        }

       if (code == ASSERT_EXPR)
        {
@@ -1920,7 +2010,25 @@ interpret_gimple_assign (class loop *loop, gimple *stmt)
                             gimple_assign_rhs2 (stmt));
 }

-
+/* Interpret a GIMPLE call statement.  */
+
+tree
+interpret_gimple_call (class loop *, gimple *call)
+{
+
+  /* Information about OpenACC loops is encoded in internal function calls.
+     Extract loop information from those calls. Ignore other calls for now. */
+  if (!oacc_call_analyzable_p (call))
+    return chrec_dont_know;
+
+  tree expr = oacc_ifn_call_extract (call);
+  tree analyzed = expr;
+
+  tree lhs = gimple_call_lhs (call);
+  gcc_assert (lhs);
+
+  return chrec_convert (TREE_TYPE (lhs), analyzed, call);
+}

 /* This section contains all the entry points:
    - number_of_iterations_in_loop,
@@ -1943,6 +2051,8 @@ analyze_scalar_evolution_1 (class loop *loop, tree var)

   def = SSA_NAME_DEF_STMT (var);
   bb = gimple_bb (def);
+  if (!bb)
+    return chrec_dont_know;
   def_loop = bb->loop_father;

   if (!flow_bb_inside_loop_p (loop, bb))
@@ -1969,6 +2079,10 @@ analyze_scalar_evolution_1 (class loop *loop, tree var)
       res = interpret_gimple_assign (loop, def);
       break;

+    case GIMPLE_CALL:
+      res = interpret_gimple_call (loop, def);
+      break;
+
     case GIMPLE_PHI:
       if (loop_phi_node_p (def))
        res = interpret_loop_phi (loop, as_a <gphi *> (def));
@@ -2261,6 +2375,14 @@ instantiate_scev_name (edge instantiate_below,
   class loop *def_loop;
   basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (chrec));

+  if (oacc_call_analyzable_p (chrec))
+    {
+      tree res
+          = interpret_gimple_call (evolution_loop, SSA_NAME_DEF_STMT (chrec));
+
+      return res;
+    }
+
   /* A parameter, nothing to do.  */
   if (!def_bb
       || !dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest))
@@ -3375,6 +3497,9 @@ expression_expensive_p (tree expr, hash_map<tree, uint64_t> &cache,
        return true;
     }

+  if (oacc_call_analyzable_p (expr))
+    return false;
+
   bool visited_p;
   uint64_t &local_cost = cache.get_or_insert (expr, &visited_p);
   if (visited_p)
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index d679f7285b30..f35bfcd80417 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -42,6 +42,9 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
                       bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void set_scev_analyze_openacc_calls (bool);
+extern bool oacc_call_analyzable_p (gimple *);
+extern bool oacc_call_analyzable_p (tree);

 /* Returns the basic block preceding LOOP, or the CFG entry block when
    the loop is function's body.  */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index c027230acdc0..31637577cb7e 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -256,6 +256,17 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
        if (gimple_has_side_effects (stmt))
          {
            mark_stmt_necessary (stmt, true);
+
+            /* The lhs of the OpenACC loop and reduction calls is also
+              necessary, cf. the lowering in omp-offload.c.  */
+            if (gimple_call_internal_p (stmt, IFN_UNIQUE)
+                || gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION))
+              {
+               tree lhs = gimple_call_lhs (stmt);
+               if (lhs)
+                  mark_operand_necessary (lhs);
+              }
+
            return;
          }
        /* IFN_GOACC_LOOP calls are necessary in that they are used to
@@ -267,6 +278,9 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
        if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP))
          {
            mark_stmt_necessary (stmt, true);
+           tree lhs = gimple_call_lhs (stmt);
+           gcc_assert (lhs);
+           mark_operand_necessary (lhs);
            return;
          }
        if (!gimple_call_lhs (stmt))
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 3817ec423e7c..c0f26ac75685 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1980,6 +1980,9 @@ simplify_replace_tree (tree expr, tree old, tree new_tree,
   return (ret ? (do_fold ? fold (ret) : ret) : expr);
 }

+bool oacc_call_analyzable_p (gimple *stmt);
+tree interpret_gimple_call (class loop *loop, gimple *call);
+
 /* Expand definitions of ssa names in EXPR as long as they are simple
    enough, and return the new expression.  If STOP is specified, stop
    expanding if EXPR equals to it.  */
@@ -1995,6 +1998,9 @@ expand_simple_operations (tree expr, tree stop, hash_map<tree, tree> &cache)
   if (expr == NULL_TREE)
     return expr;

+  if (oacc_call_analyzable_p (expr))
+    expr = interpret_gimple_call (NULL, SSA_NAME_DEF_STMT (expr));
+
   if (is_gimple_min_invariant (expr))
     return expr;

diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 21961200db66..e080c436c63a 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -155,6 +155,13 @@ make_pass_tree_loop (gcc::context *ctxt)
 static bool
 gate_oacc_kernels (function *fn)
 {
+  if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)
+    return false;
+
+  gcc_checking_assert (param_openacc_kernels
+                           == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+                       || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS);
+
   if (!flag_openacc)
     return false;

@@ -324,6 +331,10 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
+    if (param_openacc_kernels != OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+        && param_openacc_kernels != OPENACC_KERNELS_PARLOOPS)
+      return false;
+
     return (optimize
            && flag_openacc
            /* Don't bother doing anything if the program has errors.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 99d4333cdc80..16ec7172c448 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -3,6 +3,8 @@

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
+/* { dg-additional-options "-O2" } for Graphite/"kernels". */
+

 /* See also '../libgomp.oacc-fortran/parallel-dims.f90'.  */

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
index 985db81d9014..0d5ea73813de 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -7,7 +7,7 @@
 program main
   integer :: w, arr(0:31)

-  !$acc parallel num_gangs(32) num_workers(32) copyout(arr) ! { dg-warning "region is worker partitioned" }
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr) ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" }
     !$acc loop gang private(w)
     do j = 0, 31
       w = 0
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
index 5a47aca2dba2..f79d01ccc419 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-independent.f90
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-O2" } for Graphite

 #define N (1024 * 512)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
index 37aa0ac4f632..5d35bdf9d6ff 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-1.f90
@@ -1,6 +1,7 @@
 ! Exercise the auto, independent, seq and tile loop clauses inside
 ! kernels regions.

+! { dg-additional-options "-O2" } for Graphite
 ! { dg-do run }

 program loops
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index cf1d0e569278..74ee6fde84f8 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,6 +1,7 @@
 ! { dg-do run }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
+! { dg-additional-options "-O2" } for Graphite

 ! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
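
The gate changes to tree-parloops.c and tree-ssa-loop.c in the patch above follow one rule. With --param=openacc-kernels=decompose, the old parloops-based "kernels" passes are skipped. In the two parloops modes, they keep their previous -fopenacc gating, guarded by a checking assert. The following C sketch models only that rule. The enum `toy_openacc_kernels` and the function `toy_parloops_kernels_gate` are hypothetical stand-ins for `param_openacc_kernels` and the real `gate` methods. Those methods also test further conditions, such as `optimize`, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the param_openacc_kernels values referenced in
   the patch (OPENACC_KERNELS_DECOMPOSE, OPENACC_KERNELS_DECOMPOSE_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS); names and ordering are illustrative only.  */
enum toy_openacc_kernels
{
  TOY_KERNELS_DECOMPOSE,
  TOY_KERNELS_DECOMPOSE_PARLOOPS,
  TOY_KERNELS_PARLOOPS
};

/* Sketch of the gating rule for the parloops-based "kernels" passes.  */
static bool
toy_parloops_kernels_gate (enum toy_openacc_kernels mode, bool flag_openacc)
{
  /* Plain "decompose" mode: the old passes never run.  */
  if (mode == TOY_KERNELS_DECOMPOSE)
    return false;

  /* Plays the role of the gcc_checking_assert added by the patch.  */
  assert (mode == TOY_KERNELS_DECOMPOSE_PARLOOPS
          || mode == TOY_KERNELS_PARLOOPS);

  return flag_openacc;
}
```

Keeping the early `return false` ahead of the assert mirrors the patch's ordering. The assert then only documents which of the remaining modes may reach the old `flag_openacc` check.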

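The scalar-evolution changes in the patch above rest on one mechanism. While a static flag is set, an SSA name defined by an IFN_GOACC_LOOP call is replaced by the loop information extracted from that call. Every other expression is returned unchanged. The following C sketch models just that control flow (flag, analyzability check, extraction, simplification). All `toy_` types and functions are hypothetical stand-ins for GCC's `tree`/`gimple` machinery and for `set_scev_analyze_openacc_calls`, `oacc_call_analyzable_p`, `oacc_ifn_call_extract` and `oacc_simplify`. They are not GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC trees.  */
enum toy_kind { TOY_CONST, TOY_SSA_NAME, TOY_DONT_KNOW };

struct toy_expr;

/* Models the statement that defines an SSA name.  */
struct toy_def
{
  bool is_goacc_loop_call;          /* Models an IFN_GOACC_LOOP call.  */
  const struct toy_expr *extracted; /* Models oacc_extract_loop_call's result.  */
};

struct toy_expr
{
  enum toy_kind kind;
  long value;                /* Meaningful for TOY_CONST only.  */
  const struct toy_def *def; /* Meaningful for TOY_SSA_NAME only.  */
};

/* Models chrec_dont_know.  */
static const struct toy_expr toy_dont_know = { TOY_DONT_KNOW, 0, NULL };

/* Models the static flag toggled by set_scev_analyze_openacc_calls.  */
static bool toy_analyze_openacc_calls = false;

static void
toy_set_analyze_openacc_calls (bool analyze)
{
  toy_analyze_openacc_calls = analyze;
}

/* Models oacc_call_analyzable_p (gimple *).  */
static bool
toy_call_analyzable_p (const struct toy_def *def)
{
  return toy_analyze_openacc_calls && def != NULL && def->is_goacc_loop_call;
}

/* Models oacc_ifn_call_extract: "don't know" unless analyzable.  */
static const struct toy_expr *
toy_call_extract (const struct toy_def *def)
{
  if (!toy_call_analyzable_p (def))
    return &toy_dont_know;
  assert (def->extracted != NULL); /* Plays the role of gcc_assert.  */
  return def->extracted;
}

/* Models oacc_simplify: only SSA names defined by analyzable calls
   are replaced; everything else is returned unchanged.  */
static const struct toy_expr *
toy_oacc_simplify (const struct toy_expr *expr)
{
  if (expr == NULL || expr->kind != TOY_SSA_NAME)
    return expr;
  if (toy_call_analyzable_p (expr->def))
    return toy_call_extract (expr->def);
  return expr;
}
```

While the flag is clear, the hook is a no-op. That is why the comment in the patch warns that the analysis "is not valid in general" and should be enabled only while Graphite analyzes the partially lowered OpenACC loops.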
From patchwork Wed Nov 17 16:03:18 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47822
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 10/22] openacc: Add "can_be_parallel" flag
 info to "graph" dumps
Date: Wed, 17 Nov 2021 17:03:18 +0100
Message-ID: <20211117160330.20029-10-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>

gcc/ChangeLog:

        * graph.c (oacc_get_fn_attrib): New declaration.
        (find_loop_location): New declaration.
        (draw_cfg_nodes_for_loop): Print value of the
        can_be_parallel flag at the top of loops in OpenACC
        functions.
---
 gcc/graph.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

--
2.33.0


diff --git a/gcc/graph.c b/gcc/graph.c
index ce8de33ffe10..3ad07be3b309 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -191,6 +191,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
     }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
    order to get a good ranking of the nodes.  This function is recursive:
    It first prints inner loops, then the body of LOOP itself.  */
@@ -205,17 +209,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
       && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-    pp_printf (pp,
-              "\tsubgraph cluster_%d_%d {\n"
-              "\tstyle=\"filled\";\n"
-              "\tcolor=\"darkgreen\";\n"
-              "\tfillcolor=\"%s\";\n"
-              "\tlabel=\"loop %d\";\n"
-              "\tlabeljust=l;\n"
-              "\tpenwidth=2;\n",
-              funcdef_no, loop->num,
-              fillcolors[(loop_depth (loop) - 1) % 3],
-              loop->num);
+    {
+      pp_printf (pp,
+                 "\tsubgraph cluster_%d_%d {\n"
+                 "\tstyle=\"filled\";\n"
+                 "\tcolor=\"darkgreen\";\n"
+                 "\tfillcolor=\"%s\";\n"
+                 "\tlabel=\"loop %d %s\";\n"
+                 "\tlabeljust=l;\n"
+                 "\tpenwidth=2;\n",
+                 funcdef_no, loop->num,
+                 fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+                 /* This is only meaningful for loops that have been processed
+                    by Graphite.
+
+                    TODO Use can_be_parallel_valid_p? */
+                 !oacc_get_fn_attrib (cfun->decl)
+                     ? ""
+                     : loop->can_be_parallel ? "(can_be_parallel = true)"
+                                             : "(can_be_parallel = false)");
+    }

   for (class loop *inner = loop->inner; inner; inner = inner->next)
     draw_cfg_nodes_for_loop (pp, funcdef_no, inner);

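The graph.c change above appends a can_be_parallel marker to a loop's cluster label, but only in functions that carry an OpenACC function attribute. The following C sketch reproduces just that string selection and the patch's `"loop %d %s"` format. `toy_can_be_parallel_suffix` and `toy_format_loop_label` are hypothetical helpers. The `is_oacc_fn` argument stands in for the `oacc_get_fn_attrib (cfun->decl)` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: choose the suffix appended to a loop's label.
   IS_OACC_FN stands in for "oacc_get_fn_attrib (cfun->decl) != NULL".  */
static const char *
toy_can_be_parallel_suffix (bool is_oacc_fn, bool can_be_parallel)
{
  if (!is_oacc_fn)
    return "";
  return can_be_parallel ? "(can_be_parallel = true)"
                         : "(can_be_parallel = false)";
}

/* Hypothetical helper: build the label with the same "loop %d %s"
   format string as the patch.  Returns snprintf's result.  */
static int
toy_format_loop_label (char *buf, size_t size, int loop_num,
                       bool is_oacc_fn, bool can_be_parallel)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "loop %d %s", loop_num,
                   toy_can_be_parallel_suffix (is_oacc_fn, can_be_parallel));
}
```

Because a single format string serves both cases, labels in non-OpenACC functions keep a trailing space (for example "loop 3 "), which is harmless in the generated dot output.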
From patchwork Wed Nov 17 16:03:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47825
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 11/22] openacc: Add further kernels tests
Date: Wed, 17 Nov 2021 17:03:19 +0100
Message-ID: <20211117160330.20029-11-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Add copies of some tests so that the old "parloops"-based "kernels"
implementation remains covered until it is removed from GCC, and add
further tests for the new Graphite-based implementation.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
        New test.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
        New test.
        * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
        New test.
        * c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
        * c-c++-common/goacc/kernels-reduction-parloops.c: New test.
        * c-c++-common/goacc/loop-auto-reductions.c: New test.
        * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
        New test.
        * c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
        * c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
        New test.
        * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
        New test.
        * gfortran.dg/goacc/kernels-conversion.f95: New test.
        * gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
        * gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
        * gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
        * gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
        * gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
        * gfortran.dg/goacc/kernels-reductions.f90: New test.
---
 ...classify-kernels-unparallelized-graphite.c |  41 +++++
 ...classify-kernels-unparallelized-parloops.c |  47 ++++++
 .../goacc/kernels-decompose-1-parloops.c      | 125 ++++++++++++++
 .../goacc/kernels-reduction-parloops.c        |  36 ++++
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 +++
 ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++++++++++++++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 +++++++
 .../note-parallelism-kernels-loops-parloops.c |  53 ++++++
 ...assify-kernels-unparallelized-parloops.f95 |  44 +++++
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 ++++++
 .../goacc/kernels-decompose-1-parloops.f95    | 121 ++++++++++++++
 .../goacc/kernels-decompose-parloops-2.f95    | 154 ++++++++++++++++++
 .../goacc/kernels-loop-data-parloops-2.f95    |  52 ++++++
 .../goacc/kernels-loop-parloops-2.f95         |  45 +++++
 .../goacc/kernels-loop-parloops.f95           |  39 +++++
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +++++
 .../parallel-loop-auto-reduction-2.f90        |  98 +++++++++++
 17 files changed, 1155 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90

--
2.33.0

diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
new file mode 100644
index 000000000000..77f4524907a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
@@ -0,0 +1,41 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC 'kernels' with Graphite handling (default).  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   { dg-additional-options "-fopt-info-optimized-omp" }
+   { dg-additional-options "-fopt-info-note-omp" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-graphite-details" }
+   { dg-additional-options "-fdump-tree-oaccloops1" }
+   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose-details" } */
+
+#define N 1024
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+extern unsigned int f (unsigned int);
+#pragma acc routine (f) seq
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
+    /* An "extern"al mapping of loop iterations/array indices makes the loop
+       unparallelizable.  */
+    c[i] = a[f (i)] + b[f (i)]; /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that Graphite can handle neither the original nor the offloaded region.
+   { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+   always be 1 x 1 x 1 for non-offloading compilation).
+   { dg-final { scan-tree-dump-times "(?n)Function is parallel_kernels_graphite OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
new file mode 100644
index 000000000000..252ab8eb87b7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
@@ -0,0 +1,47 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC 'kernels' with "parloops" handling.  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "--param openacc-kernels=decompose-parloops" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   { dg-additional-options "-fopt-info-note-optimized-omp" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-parloops1-all" }
+   { dg-additional-options "-fdump-tree-oaccloops1" }
+   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose-details" } */
+
+/* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+   aspects of that functionality.  */
+
+#define N 1024
+
+extern unsigned int *__restrict a;
+extern unsigned int *__restrict b;
+extern unsigned int *__restrict c;
+
+extern unsigned int f (unsigned int);
+#pragma acc routine (f) seq
+
+void KERNELS ()
+{
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    /* An "extern"al mapping of loop iterations/array indices makes the loop
+       unparallelizable.  */
+    c[i] = a[f (i)] + b[f (i)];
+}
+
+/* Check the offloaded function's attributes.
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+
+/* Check that exactly one OpenACC kernels construct is analyzed, and that it
+   can't be parallelized.
+   { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } } */
+
+/* Check the offloaded function's classification and compute dimensions (will
+   always be 1 x 1 x 1 for non-offloading compilation).
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
new file mode 100644
index 000000000000..76d528a6d8e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
@@ -0,0 +1,125 @@
+/* Test OpenACC 'kernels' region decomposition with
+   "decompose-parloops" handling.  */
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-Wopenacc-parallelism" } */
+/* { dg-additional-options "-O2" } for "parloops".  */
+
+/* See also "../../gfortran.dg/goacc/kernels-decompose-1.f95".  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+#define N 10
+  int a[N], b[N], c[N];
+
+#pragma acc kernels
+  {
+    x = 0; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+    y = x < 10;
+    z = x++;
+    ;
+  }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  for (int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    a[i] = 0;
+
+#pragma acc kernels loop /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (int i = 0; i < N; i++)
+    b[i] = a[N - i - 1];
+
+#pragma acc kernels
+  {
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      b[i] = a[N - i - 1];
+
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] = a[i] * b[i];
+
+    a[z] = 0; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] += a[i];
+
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
+    for (int i = 0 + 1; i < N; i++)
+      c[i] += c[i - 1];
+  }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+  {
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; ++i)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+      for (int j = 0; j < N; ++j)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+        /* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 } */
+       for (int k = 0; k < N; ++k)
+         a[(i + j + k) % N]
+           = b[j]
+           + f_v (c[k]); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+
+    //TODO Should the following turn into "gang-single" instead of "parloops"?
+    //TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+    if (y < 5)
+#pragma acc loop independent /* { dg-missed "unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" } */
+      for (int j = 0; j < N; ++j)
+       b[j] = f_w (c[j]);
+  }
+
+#pragma acc kernels /* { dg-warning "region contains gang partitioned code but is not gang partitioned" } */
+  {
+    /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" "" { target *-*-* } .+1 } */
+    y = f_g (a[5]); /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_w (c[j]); /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+  }
+
+#pragma acc kernels
+  {
+    y = 3; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_v (c[j]); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+
+    z = 2; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+  }
+
+#pragma acc kernels /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+  ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
new file mode 100644
index 000000000000..1449f7a066d4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "--param=openacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define n 10000
+
+unsigned int a[n];
+
+void  __attribute__((noinline,noclone))
+foo (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin (a[0:n]) copy (sum)
+  {
+    for (i = 0; i < n; ++i)
+      sum += a[i];
+  }
+
+  if (sum != 5001)
+    abort ();
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone, noinline\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c b/gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
new file mode 100644
index 000000000000..4d033ccff2d9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
@@ -0,0 +1,22 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-graphite-details" } */
+
+#include <stdlib.h>
+
+#define n 10000
+
+unsigned int a[n];
+
+void  __attribute__((noinline,noclone))
+foo (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc parallel copyin (a[0:n])
+  {
+#pragma acc loop auto reduction(+:sum) /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism"} */
+    for (i = 0; i < n; ++i)
+      sum += a[i];
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
new file mode 100644
index 000000000000..4889c398c06a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
@@ -0,0 +1,128 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing 'loop' constructs with explicit or implicit 'auto'
+   clause that are handled by "parloops".  */
+
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-fopt-info-note-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+#pragma acc loop
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc loop auto
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop auto
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc loop
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc loop auto
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc loop auto
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc loop
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+       ;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
new file mode 100644
index 000000000000..0cd2b9de1743
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
@@ -0,0 +1,61 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC "kernels"
+   construct containing loops.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-fopt-info-note-omp" } */
+/* { dg-additional-options "-O2" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++)  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    for (y = 0; y < 10; y++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+       ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    for (y = 0; y < 10; y++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    for (y = 0; y < 10; y++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+       ;
+
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
+    for (y = 0; y < 10; y++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+       ;
+
+  return 0;
+}
+
+/* { dg-prune-output ".auto. loop cannot be parallel" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
new file mode 100644
index 000000000000..a3fea483a951
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
@@ -0,0 +1,53 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing loops.  */
+
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-fopt-info-note-omp" } */
+/* { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose-details" } */
+// TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+       ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+       ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
new file mode 100644
index 000000000000..c9e24449db16
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
@@ -0,0 +1,44 @@
+! Check offloaded function's attributes and classification for unparallelized
+! OpenACC kernels that are handled by "parloops".
+
+! { dg-additional-options "--param openacc-kernels=decompose-parloops" }
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+! { dg-additional-options "-fopt-info-optimized-note-omp" }
+! { dg-additional-options "-fdump-tree-ompexp" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-oaccloops1" }
+
+program main
+  implicit none
+  integer, parameter :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer :: i
+
+  ! A function call in a data-reference makes the loop unparallelizable.
+  integer, external :: f
+
+  call setup(a, b)
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 0, n - 1
+                  ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } .-1 }
+     c(i) = a(f (i)) + b(f (i))
+  end do
+  !$acc end kernels
+end program main
+
+! Check the offloaded function's attributes.
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+
+! Check that exactly one OpenACC kernels construct is analyzed, and that it
+! can't be parallelized.
+! { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } }
+
+! Check the offloaded function's classification and compute dimensions (will
+! always be 1 x 1 x 1 for non-offloading compilation).
+! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
new file mode 100644
index 000000000000..fe287c38c387
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -0,0 +1,52 @@
+! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
+
+program main
+  implicit none
+  integer, parameter         :: N = 1024
+  integer, dimension (1:N)   :: a
+  integer                    :: i, sum
+
+  !$acc kernels copyin(a(1:N)) copy(sum)
+
+  ! converted to "oacc_kernels"
+  !$acc loop
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  ! converted to "oacc_parallel_kernels_gang_single"
+  sum = sum + 1
+  a(1) = a(1) + 1
+
+  ! converted to "oacc_parallel_kernels_parallelized"
+  !$acc loop independent
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  ! converted to "oacc_kernels"
+  if (sum .gt. 10) then
+    !$acc loop
+    do i = 1, N
+      sum = sum + a(i)
+    end do
+  end if
+
+  ! converted to "oacc_kernels"
+  !$acc loop auto
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  !$acc end kernels
+end program main
+
+! Check that the kernels region is split into a data region and enclosed
+! parallel regions.
+! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_graphite " 5 "omp_oacc_kernels_decompose" } }
+
+! Each of the parallel regions is async, and there is a final call to
+! __builtin_GOACC_wait.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_graphite async\\(-1\\)" 5 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "omp_oacc_kernels_decompose" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
new file mode 100644
index 000000000000..3ecf84da8367
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
@@ -0,0 +1,121 @@
+! Test OpenACC 'kernels' construct decomposition with "decompose-parloops"
+! handling
+
+! { dg-additional-options "--param openacc-kernels=decompose-parloops" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-Wopenacc-parallelism" }
+! { dg-additional-options "-O2" } for "parloops".
+
+! See also "../../c-c++-common/goacc/kernels-decompose-1.c".
+
+program main
+  implicit none
+
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+
+  !$acc kernels
+  x = 0
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  !$acc end kernels
+
+  !$acc kernels
+  do i = 1, N
+     ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } .-1 }
+     a(i) = 0
+  end do
+  !$acc end kernels
+
+  !$acc kernels loop ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc kernels
+  !$acc loop ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc loop ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     c(i) = a(i) * b(i)
+  end do
+
+  a(z) = 0
+
+  !$acc loop ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     c(i) = c(i) + a(i)
+  end do
+
+  !$acc loop seq ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
+  !$acc end kernels
+
+  !$acc kernels ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     !$acc loop independent ! { dg-optimized "assigned OpenACC worker loop parallelism" }
+     do j = 1, N
+        !$acc loop independent ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        ! { dg-bogus "optimized: assigned OpenACC vector loop parallelism" "" { target *-*-* } .-2 }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        end do
+     end do
+  end do
+
+  !TODO Should the following turn into "gang-single" instead of "parloops"?
+  !TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+  if (y < 5) then
+     !$acc loop independent
+     do j = 1, N
+        b(j) = f_w (c(j))
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels  ! { dg-warning "region contains gang partitioned code but is not gang partitioned" }
+  y = f_g (a(5)) ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" }
+
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  y = 3
+
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+  end do
+
+  z = 2
+  !$acc end kernels
+
+  !$acc kernels
+  !$acc end kernels
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
new file mode 100644
index 000000000000..fc126ea5e037
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
@@ -0,0 +1,154 @@
+! Test OpenACC 'kernels' construct decomposition.
+
+! { dg-additional-options "-fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fopt-info-omp-all" }
+! { dg-additional-options "--param=openacc-kernels=decompose-parloops" }
+! { dg-additional-options "-O2" } for 'parloops'.
+
+! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
+! aspects of that functionality.
+
+! See also '../../c-c++-common/goacc/kernels-decompose-2.c'.
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_loop_i 0 c_loop_j 0 c_loop_k 0 c_part 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
+program main
+  implicit none
+
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+
+  !$acc kernels
+  x = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  ;
+  !$acc end kernels
+
+  !$acc kernels
+  do i = 1, N  ! { dg-line l_loop_i[incr c_loop_i] }
+     ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+     ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+     a(i) = 0
+  end do
+  !$acc end kernels
+
+  !$acc kernels loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc kernels
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     c(i) = a(i) * b(i)
+  end do
+
+  a(z) = 0
+
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     c(i) = c(i) + a(i)
+  end do
+
+  !$acc loop seq ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
+  !$acc end kernels
+
+  !$acc kernels ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+     ! { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+     do j = 1, N
+        !$acc loop independent ! { dg-line l_loop_k[incr c_loop_k] }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k }
+        ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        end do
+     end do
+  end do
+
+  !TODO Should the following turn into "gang-single" instead of "parloops"?
+  !TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".
+  if (y < 5) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+     !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+     ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j }
+     do j = 1, N
+        b(j) = f_w (c(j))
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels
+  ! { dg-bogus "\[Ww\]arning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } .-1 }
+  y = f_g (a(5)) ! { dg-line l_part[incr c_part] }
+  !TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang parallelism, instead of "gang-single"?
+  ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part }
+
+  !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  y = 3
+
+  !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+  end do
+
+  z = 2
+  !$acc end kernels
+
+  !$acc kernels
+  !$acc end kernels
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
new file mode 100644
index 000000000000..c92ad4ccf6f2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
@@ -0,0 +1,52 @@
+! { dg-additional-options "--param=openacc-kernels=decompose-parloops" } as this is
+! specifically testing "parloops" handling.
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc data copyout (a(0:n-1))
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyout (b(0:n-1))
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) STOP 1
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
new file mode 100644
index 000000000000..634445ad4a1b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
@@ -0,0 +1,45 @@
+! { dg-additional-options "--param openacc-kernels=decompose-parloops" } as this is
+! specifically testing "parloops" handling.
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc kernels copyout (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyout (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) STOP 1
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
new file mode 100644
index 000000000000..c6fa14f5920f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
@@ -0,0 +1,39 @@
+! { dg-additional-options "--param openacc-kernels=decompose-parloops" } as this is
+! specifically testing "parloops" handling.
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) STOP 1
+  end do
+
+end program main
+
+! Check that only one loop is analyzed, and that it can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90 b/gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
new file mode 100644
index 000000000000..2036395bf594
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
@@ -0,0 +1,37 @@
+! { dg-additional-options "--param openacc-kernels=decompose" }
+
+! A regression test checking that the reduction clause lowering does
+! not fail if a subroutine argument is used as a reduction variable in
+! a kernels region.
+
+! This was fine ...
+subroutine reduction_var_not_argument(res)
+  real res
+  real tmp
+  integer i
+
+  !$acc kernels
+  !$acc loop reduction(+:tmp)
+  do i=0,n-1
+     tmp = tmp + 1
+  end do
+  !$acc end kernels
+
+  res = tmp
+end subroutine reduction_var_not_argument
+
+! ... but this led to problems because ARG
+! was a pointer type that did not get dereferenced.
+subroutine reduction_var_as_argument(arg)
+  real arg
+  integer i
+
+  !$acc kernels
+  !$acc loop reduction(+:arg)
+  do i=0,n-1
+     arg = arg + 1
+  end do
+  !$acc end kernels
+end subroutine reduction_var_as_argument
+
+
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90
new file mode 100644
index 000000000000..0e9da426d998
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90
@@ -0,0 +1,98 @@
+! Check that the Graphite-based "auto" loop and "kernels" handling
+! is able to assign the parallelism dimensions correctly for a simple
+! loop-nest with reductions. All loops should be parallelized.
+
+! { dg-additional-options "-O2 -g" }
+! { dg-additional-options "-foffload=-fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-foffload=-fopt-info-optimized" }
+! { dg-additional-options "-fdump-tree-oaccloops1-details" }
+! { dg-additional-options "-fopt-info-optimized" }
+
+module test
+  implicit none
+
+  integer, parameter :: n = 10000
+  integer :: a(n,n)
+  integer :: sums(n,n)
+
+contains
+  function sum_loop_auto() result(sum)
+    integer :: i, j
+    integer :: sum, max_val
+
+    sum = 0
+    max_val = 0
+
+    !$acc parallel copyin (a) reduction(+:sum)
+    !$acc loop auto reduction(+:sum) reduction(max:max_val) ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
+    ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
+    do i = 1,size (a, 1)
+       !$acc loop auto reduction(max:max_val) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+       ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
+       do j = 1,size(a, 2)
+          max_val = a(i,j)
+       end do
+       sum = sum + max_val
+    end do
+    !$acc end parallel
+  end function sum_loop_auto
+
+  function sum_kernels() result(sum)
+    integer :: i, j
+    integer :: sum, max_val
+
+    sum = 0
+    max_val = 0
+
+    !$acc kernels
+    ! { dg-optimized {'map\(force_tofrom:max_val [^)]+\)' optimized to 'map\(to:max_val [^)]+\)'} "" { target *-*-* } .-1 }
+    !$acc loop reduction(+:sum) reduction(max:max_val) ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
+    ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
+    ! { dg-optimized "forwarded loop nest in OpenACC .kernels. construct to .Graphite." "" { target *-*-* } .-2 }
+    do i = 1,size (a, 1)
+       !$acc loop reduction(max:max_val) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+       ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
+       do j = 1,size(a, 2)
+          max_val = a(i,j)
+       end do
+       sum = sum + max_val
+    end do
+    !$acc end kernels
+  end function sum_kernels
+end module test
+
+program main
+  use test
+
+  implicit none
+
+  integer :: result, i, j
+
+  ! We sum the maxima of n rows, each containing numbers
+  ! 1..n
+  integer, parameter :: expected_sum = n * n
+
+  do i = 1, size (a, 1) ! { dg-optimized "loop nest optimized" }
+     do j = 1, size (a, 2)
+        a(i, j) = j
+     end do
+  end do
+
+
+  result = sum_loop_auto()
+  if (result /= expected_sum) then
+     write (*, *) "Wrong result:", result
+     call abort()
+  endif
+
+  result = sum_kernels()
+  if (result /= expected_sum) then
+     write (*, *) "Wrong result:", result
+     call abort()
+  endif
+end program main
+
+! The opt-info output goes both to stderr and to the dump files, so these
+! scans check that the dg-optimized assertions above hold for host and offload compilers.
+! { dg-final { scan-offload-tree-dump-times "optimized: assigned OpenACC .*? parallelism" 4 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "optimized: assigned OpenACC .*? parallelism" 4 "oaccloops1" } }
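The expected value in parallel-loop-auto-reduction-2.f90 follows from the initialization loop in the main program: row i holds a(i, j) = j for j = 1..n, so every row maximum is n, and summing over the n rows gives:

```latex
\texttt{expected\_sum} = \sum_{i=1}^{n} \max_{1 \le j \le n} a_{ij}
                       = \sum_{i=1}^{n} n = n^{2} = 10^{8} \qquad (n = 10000)
```

This still fits in a default (32-bit) Fortran integer, whose maximum is 2^31 - 1, about 2.1 x 10^9.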

From patchwork Wed Nov 17 16:03:20 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47824
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 9F47B385AC35
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:16:05 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id C3EF5385AC35
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:26 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C3EF5385AC35
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 Jv2mBx3wxnX6ex/qmYiPWmalo3RYPFA1YSq1/4z6cPyS3qwUk/d5+G/D2xaLU/jbQV8z/v0daC
 tNAWKMLoo7ywwZvOESWjuNUuUEBnZ69z13T3stZZaXv2U4PEE+mKDw0INNeHTzeof/bhZfDghB
 buJhQsfY167X3p/LegI8+CEM+1RFpdpxEz0l4awWUh1GbsXGUaxLVTUbClUSlzKi8QoGp9NeRb
 4q3EKt3YFBYaQF5koqx+v0Unyl34p4oAH7Ku+p0f/Vi6sqozrcPHd2fdkQjWNxW1hIQskkmG4r
 B1nLUeCbydLol71T0hWTrjvc
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445352"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:24 -0800
IronPort-SDR: 
 a913eyI0WQSZWMYU6UvVUJ8AAFcxNJeNy9eI8ucafcj9Vo1cd8Wtq2gDFqztfSKGbN2HiHdBWZ
 FnP9M2CuI/dErNjYBIsVc6POySyaSse0tZQzBzZUh3Sj7y1phksLtCkN2cnJUfTr2hvEk2vfJQ
 eKh1U8YEB57ETg0/Q2jt1l2Yyqc+GVFmYEYoYZfAfSEi9Z5XVXcwfqTEITxunPtNInSC3tuBXz
 uQGvcx4m72bi/zZcLSlgFTBzHR/Jw1xtBpEmBxjm6VESUSlL6AQNyiXsQ/UXOUDc6ksN7FI2qG
 XwA=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 12/22] openacc: Remove unused partitioning in
 "kernels" regions
Date: Wed, 17 Nov 2021 17:03:20 +0100
Message-ID: <20211117160330.20029-12-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

With the old "kernels" handling, unparallelized regions were executed
with 1x1x1 partitioning even if the user provided explicit num_gangs,
num_workers, or vector_length clauses.

This commit restores this behavior by removing partitioning that is not
used by any loop after the parallelism dimensions have been assigned to
the loops.
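The removal step can be illustrated with a small standalone C sketch. It is not the GCC implementation: DIM_MASK, the DIM_* enumerators and remove_unused_partitioning are hypothetical stand-ins, assumed to mirror GOMP_DIM_MASK, the GOMP_DIM enumeration and oacc_remove_unused_partitioning. The warning is reduced to recording the removed axes in a buffer (the real function only builds that string and warns in the host compiler).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, assumed to mirror the GOMP_DIM enumeration and
   GOMP_DIM_MASK: gang = 0, worker = 1, vector = 2.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(X) (1u << (X))

/* Reset each explicitly set dimension (>= 0) at or below LEVEL whose bit
   is clear in USED to -1 ("defaulting"), and record the removed axes in
   REMOVED.  The longest possible result, "gang worker vector ", is 19
   characters, so a 20-byte buffer suffices.  */
static void
remove_unused_partitioning (int dims[DIM_MAX], int level, unsigned used,
                            char removed[20])
{
  static const char *const axes[DIM_MAX] = { "gang", "worker", "vector" };

  assert (level < DIM_MAX);
  removed[0] = '\0';
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        strcat (removed, axes[ix]);
        strcat (removed, " ");
        dims[ix] = -1;
      }
}
```

For example, an offload region (LEVEL -1) with num_gangs (30) num_workers (3) vector_length (5) whose loops use only gang and vector partitioning ends up with dims {30, -1, 5} and "worker " recorded as removed. In oacc_validate_dims, a dimension left at -1 for an unused axis is later defaulted to the minimum permissible size (normally 1), which matches the real_num_workers = 1 expectation in the adjusted acc_prof-kernels-1.c.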

gcc/ChangeLog:

        * omp-offload.c (oacc_remove_unused_partitioning): New function
        for removing partitioning that is not used by any loop.
        (oacc_validate_dims): Call oacc_remove_unused_partitioning and
        enable warnings about unused partitioning.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
        expectations.
---
 gcc/omp-offload.c                             | 51 +++++++++++++++++--
 .../acc_prof-kernels-1.c                      | 19 ++++---
 2 files changed, 59 insertions(+), 11 deletions(-)

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index f5cb222efd8c..68cc5a9d9e5d 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1215,6 +1215,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Reset to -1 each set dimension in DIMS at or below LEVEL whose bit is
+   clear in USED; in the host compiler, also warn at the location of FN.  */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+      /* Must be kept in sync with GOMP_DIM enumeration.  */
+      { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+    if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+      {
+        if (host_compiler)
+          {
+            strcat (removed_partitions, axes[ix]);
+            strcat (removed_partitions, " ");
+          }
+        dims[ix] = -1;
+      }
+  if (removed_partitions[0] != '\0')
+    warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+                "removed %spartitioning from %<kernels%> region",
+                removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
    raw attribute.  DIMS is an array of dimensions, which is filled in.
    LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1235,6 +1268,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
     {
       purpose[ix] = TREE_PURPOSE (pos);
+
       tree val = TREE_VALUE (pos);
       dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
       pos = TREE_CHAIN (pos);
@@ -1244,14 +1278,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+      /* Must be kept in sync with GOMP_DIM enumeration.  */
+      { "gang", "worker", "vector" };
+
   if (check
       && warn_openacc_parallelism
-      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-      && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+      && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
     {
-      static char const *const axes[] =
-      /* Must be kept in sync with GOMP_DIM enumeration.  */
-       { "gang", "worker", "vector" };
       for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
        if (dims[ix] < 0)
          ; /* Defaulting axis.  */
@@ -1262,14 +1297,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
                      "region contains %s partitioned code but"
                      " is not %s partitioned", axes[ix], axes[ix]);
        else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+         {
          /* The dimension is explicitly partitioned to non-unity, but
             no use is made within the region.  */
          warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
                      "region is %s partitioned but"
                      " does not contain %s partitioned code",
                      axes[ix], axes[ix]);
+          }
     }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+                         DECL_ATTRIBUTES (fn)))
+    oacc_remove_unused_partitioning (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 4a9b11a3d3fe..d398b3463617 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include <acc_prof.h>

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */
+/* { dg-additional-options "-Wopenacc-parallelism" } */

 /* Use explicit 'copyin' clauses, to work around "'firstprivate'
    optimizations", which will cause the value at the point of call to be used
@@ -41,6 +43,7 @@ static int state = -1;
 static acc_device_t acc_device_type;
 static int acc_device_num;
 static int num_gangs, num_workers, vector_length;
+static int real_num_workers;
 static int async;


@@ -96,12 +99,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
     assert (event_info->launch_event.num_workers >= 1);
   else
     {
-#ifdef __OPTIMIZE__
-      assert (event_info->launch_event.num_workers == num_workers);
-#else
-      /* See 'num_gangs' above.  */
-      assert (event_info->launch_event.num_workers == 1);
-#endif
+      /* Unused partitioning levels get removed from "kernels" regions.  */
+      assert (event_info->launch_event.num_workers == real_num_workers);
     }
   if (vector_length < 1)
     assert (event_info->launch_event.vector_length >= 1);
@@ -186,6 +185,7 @@ int main()
   /* Parallelism dimensions: literal.  */
   num_gangs = 30;
   num_workers = 3;
+  real_num_workers = 1;
   vector_length = 5;
   {
 #define N 100
@@ -196,6 +196,8 @@ int main()
     /* { dg-prune-output "using vector_length \\(32\\), ignoring 5" } */
     {
       for (int i = 0; i < N; ++i)
+      /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */
+      /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */
        x[i] = i * i;
     }
     if (acc_device_type == acc_device_host)
@@ -214,6 +216,9 @@ int main()
   /* Parallelism dimensions: variable.  */
   num_gangs = 22;
   num_workers = 5;
+  /* No worker loop and hence, in a kernels region, worker partitioning
+     should be removed. */
+  real_num_workers = 1;
   vector_length = 7;
   {
 #define N 100
@@ -224,6 +229,8 @@ int main()
     /* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
     {
       for (int i = 0; i < N; ++i)
+      /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 } */
+      /* { dg-warning "removed worker partitioning from 'kernels' region" "" { target *-*-* } .-2 } */
        x[i] = i * i;
     }
     if (acc_device_type == acc_device_host)
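
The following is a rough sketch of the idea behind the `oacc_remove_unused_partitioning` call above. Its body is not part of this excerpt, so this is not the patch's implementation. The sketch assumes the same conventions as `oacc_validate_dims`: `dims` holds one size per gang/worker/vector axis, and `used` has one bit set for each axis the region actually partitions. The names `DIM_*`, `DIM_MASK` and `remove_unused_partitioning` are hypothetical stand-ins for `GOMP_DIM_*` and `GOMP_DIM_MASK`. The sketch reproduces what the updated test expects: a worker size of 3 requested for a region without worker loops ends up as 1.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's GOMP_DIM_* axes and GOMP_DIM_MASK.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(X) (1u << (X))

/* Reset every axis at or above LEVEL that the region does not use (per the
   USED bit mask) to a size of 1.  Return the number of axes changed, so a
   caller could report "removed ... partitioning" for each of them.  */
static int
remove_unused_partitioning (int dims[DIM_MAX], int level, unsigned used)
{
  assert (level < DIM_MAX);
  int removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] != 1)
      {
        dims[ix] = 1;
        removed++;
      }
  return removed;
}
```

In this sketch every unused axis is forced to 1, including a defaulted axis (negative size). Whether the real function treats defaulted axes the same way is an assumption.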

From patchwork Wed Nov 17 16:03:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47826
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 35168385843D
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:17:12 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 74BD2385AC32
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 74BD2385AC32
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="71081306"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa1.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:34 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 13/22] Add function for printing a single
 OMP_CLAUSE
Date: Wed, 17 Nov 2021 17:03:21 +0100
Message-ID: <20211117160330.20029-13-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org

Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs so that the whole clause chain is printed.  The old
behavior, printing only a single clause, is required by a follow-up
commit ("openacc: Add data optimization pass") that optimizes
individual OMP_CLAUSEs.  Add print_omp_clause_to_str to provide it.
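
The following sketch illustrates the difference between the two behaviors with a simplified model. A hypothetical `struct clause` list stands in for a clause chain linked by OMP_CLAUSE_CHAIN, and none of the names below are GCC APIs. The sketch prints either only the head of the chain or the whole chain into a freshly allocated string that the caller must free, which mirrors the contract of print_omp_clause_to_str.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a chain of OMP clauses: each node carries its
   already-formatted text and a link to the next clause.  */
struct clause
{
  const char *text;
  struct clause *next;
};

/* Print only the clause at the head of the chain C.  The caller must free
   the returned string.  */
static char *
clause_to_str (const struct clause *c)
{
  assert (c != NULL);
  size_t len = strlen (c->text) + 1;
  char *buf = malloc (len);
  if (buf)
    memcpy (buf, c->text, len);
  return buf;
}

/* Print the whole chain starting at C, separated by single spaces.  The
   caller must free the returned string.  */
static char *
chain_to_str (const struct clause *c)
{
  assert (c != NULL);
  size_t len = 1;
  for (const struct clause *p = c; p; p = p->next)
    len += strlen (p->text) + 1;
  char *buf = malloc (len);
  if (!buf)
    return NULL;
  buf[0] = '\0';
  for (const struct clause *p = c; p; p = p->next)
    {
      strcat (buf, p->text);
      if (p->next)
        strcat (buf, " ");
    }
  return buf;
}
```

For a chain `map(tofrom:x)` -> `map(to:y)`, clause_to_str returns only the first clause, whereas chain_to_str returns both.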

gcc/ChangeLog:

        * tree-pretty-print.c (print_omp_clause_to_str): Add new function.
        * tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++++++++++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d769cd8f07c5..2e0255176c76 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1402,6 +1402,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index cafe9aa95989..3368cb9f1544 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
                              bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
                                          enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,

From patchwork Wed Nov 17 16:03:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47828
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 4A2A6385842C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:18:24 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 584B23857C66
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 584B23857C66
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="71081310"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa1.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:38 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 14/22] openacc: Add data optimization pass
Date: Wed, 17 Nov 2021 17:03:22 +0100
Message-ID: <20211117160330.20029-14-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org

From: Andrew Stubbs <ams@codesourcery.com>

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted to
"kernels" regions but could be extended to other types of regions.

gcc/ChangeLog:

        * Makefile.in: Add pass.
        * doc/gimple.texi: TODO.
        * gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
        * gimple-walk.h (struct walk_stmt_info): Add field.
        * passes.def: Add new pass.
        * tree-pass.h (make_pass_omp_data_optimize): New declaration.
        * omp-data-optimize.cc: New file.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
        Expect optimization messages.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/note-parallelism-1-kernels-loops.c:
        Expect optimization messages.
        * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
        Likewise.
        * c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
        * c-c++-common/goacc/uninit-copy-clause.c: Likewise.
        * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
        * c-c++-common/goacc/omp_data_optimize-1.c: New test.
        * g++.dg/goacc/omp_data_optimize-1.C: New test.
        * gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
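The access_kinds enumeration in omp-data-optimize.cc is ordered by precedence. When the states of two paths are combined (at a conditional or switch), the higher value wins. The following small sketch shows that merge rule. `combine_paths` is a hypothetical helper, not a function from the patch, and the enumerators simply copy the patch's ordering.

```c
#include <assert.h>

/* Same ordering as access_kinds in omp-data-optimize.cc: a higher value
   takes precedence when path states are merged.  */
enum access_kind
{
  ACCESS_NONE,        /* Variable not accessed.  */
  ACCESS_DEF_FIRST,   /* Defined before use.  */
  ACCESS_UNKNOWN,     /* Not yet determined.  */
  ACCESS_UNSUPPORTED, /* Array or reference.  */
  ACCESS_USE_FIRST    /* Used before definition (live on entry).  */
};

/* Merge the access states of N alternative paths (for example the arms of
   a conditional or the cases of a switch) by taking the maximum.  */
static enum access_kind
combine_paths (const enum access_kind *paths, int n)
{
  assert (n > 0);
  enum access_kind result = paths[0];
  for (int i = 1; i < n; i++)
    if (paths[i] > result)
      result = paths[i];
  return result;
}
```

Under this rule, a single path that reads the variable before writing it makes the merged state ACCESS_USE_FIRST, so the variable is treated as live on entry.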
 gcc/Makefile.in                               |   1 +
 gcc/doc/gimple.texi                           |   2 +
 gcc/gimple-walk.c                             |  15 +-
 gcc/gimple-walk.h                             |   6 +
 gcc/omp-data-optimize.cc                      | 951 ++++++++++++++++++
 gcc/passes.def                                |   1 +
 .../goacc/note-parallelism-1-kernels-loops.c  |   7 +-
 ...note-parallelism-1-kernels-straight-line.c |   9 +-
 .../goacc/note-parallelism-kernels-loops.c    |  10 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +++++++++++++
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C        | 169 ++++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++++++++++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h                               |   1 +
 .../kernels-decompose-1.c                     |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   4 +
 17 files changed, 2444 insertions(+), 7 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

--
2.33.0


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4ebdcdbc5f8c..8c02b85d2a96 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
        omp-low.o \
        omp-oacc-kernels-decompose.o \
        omp-simd-clone.o \
+       omp-data-optimize.o \
        opt-problem.o \
        optabs.o \
        optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e3..a83e17f71a40 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index cd287860994e..66fd491844d7 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
    on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
    If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
    value is stored in WI->CALLBACK_RESULT.  Also, the statement that
    produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
                     walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
     {
       tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
       if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
        }

       if (!wi->removed_stmt)
-       gsi_next (&gsi);
+       {
+         if (forward)
+           gsi_next (&gsi);
+         else //TODO Correct?  <http://mid.mail-archive.com/CAFiYyc1rFrh1tnCBgKWwLrCpkpLQ4_pXCT8K+dai2UtC0XezKQ@mail.gmail.com>
+           gsi_prev (&gsi);
+         //TODO This could do with some unit testing (see other 'gcc/*-tests.c' files for inspiration), to make sure all the corner cases (removing first/last, for example) work correctly.
+       }
     }

   if (wi)
diff --git a/gcc/gimple-walk.h b/gcc/gimple-walk.h
index f471f10088df..4ebc71d73ddf 100644
--- a/gcc/gimple-walk.h
+++ b/gcc/gimple-walk.h
@@ -71,6 +71,12 @@ struct walk_stmt_info

   /* True if we've removed the statement that was processed.  */
   BOOL_BITFIELD removed_stmt : 1;
+
+  /*TODO True if we're walking backward instead of forward.  */
+  //TODO This flag is only applicable for 'walk_gimple_seq'.
+  //TODO Instead of this somewhat mis-placed (?) flag here, we may be able to factor the walking logic out of 'walk_gimple_stmt', and do the backward walking in a separate function?
+  //TODO <http://mid.mail-archive.com/874kh863d6.fsf@euler.schwinge.homeip.net>
+  BOOL_BITFIELD backward : 1;
 };

 /* Callback for walk_gimple_stmt.  Called for every statement found
diff --git a/gcc/omp-data-optimize.cc b/gcc/omp-data-optimize.cc
new file mode 100644
index 000000000000..31f615c1d2bd
--- /dev/null
+++ b/gcc/omp-data-optimize.cc
@@ -0,0 +1,951 @@
+/* OMP data optimize
+
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This pass tries to optimize OMP data movement.
+
+   The purpose is two-fold: (1) simply avoid redundant data movement, and (2)
+   enable other compiler optimizations.
+
+   Currently, the focus is on OpenACC 'kernels' constructs, but this may be
+   done more generally later: other compute constructs, but also structured
+   'data' constructs, for example.
+
+   Currently, this implements:
+    - Convert "copy/map(tofrom)" to "copyin/map(to)", where the variable is
+      known to be dead on exit.
+    - Further optimize to "private" where the variable is also known to be
+      dead on entry.
+
+   Future improvements may include:
+    - Optimize mappings that do not start as "copy/map(tofrom)".
+    - Optimize mappings to "copyout/map(from)" where the variable is dead on
+      entry, but not exit.
+    - Improved data liveness checking.
+    - Etc.
+
+   As long as we make sure to not violate user-expected OpenACC semantics, we
+   may do "anything".
+
+   The pass runs too early to use the full data flow analysis tools, so this
+   uses some simplified rules.  The analysis could certainly be improved.
+
+   A variable is dead on exit if
+    1. Nothing reads it between the end of the target region and the end
+       of the function.
+    2. It is not global, static, external, or otherwise persistent.
+    3. It is not addressable (and therefore cannot be aliased).
+    4. There are no backward jumps following the target region (and therefore
+       there can be no loop around the target region).
+
+   A variable is dead on entry if the first occurrence of the variable within
+   the target region is a write.  The algorithm attempts to check all possible
+   code paths, but may give up where control flow is too complex. No attempt
+   is made to evaluate conditionals, so it is likely that it will miss cases
+   where the user might declare private manually.
+
+   Future improvements:
+    1. Allow backward jumps (loops) where the target is also after the end of
+       the target region.
+    2. Detect dead-on-exit variables when there is a write following the
+       target region (tricky, in the presence of conditionals).
+    3. Ignore reads in the "else" branch of conditionals where the target
+       region is in the "then" branch.
+    4. Optimize global/static/external variables that are provably dead on
+       entry or exit.
+   (Most of this can be achieved by unifying the two DF algorithms in this
+   file; the one for scanning inside the target regions would have to be made
+   more capable, with propagation of live state across blocks, but that
+   rework is more effort than I have time for right now.)
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree-pass.h"
+#include "options.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gomp-constants.h"
+#include "gimple-pretty-print.h"
+
+#define DUMP_LOC(STMT) \
+  dump_user_location_t::from_location_t (OMP_CLAUSE_LOCATION (STMT))
+
+/* These types track why we could *not* optimize a variable mapping.  The
+   different reasons are distinguished mainly for diagnostics.  */
+
+enum inhibit_kinds {
+  INHIBIT_NOT, // "optimize"
+  INHIBIT_USE,
+  INHIBIT_JMP,
+  INHIBIT_BAD
+};
+
+struct inhibit_descriptor
+{
+  enum inhibit_kinds kind;
+  gimple *stmt;
+};
+
+/* OMP Data Optimize walk state tables.  */
+struct ODO_State {
+  hash_map<tree, inhibit_descriptor> candidates;
+  hash_set<tree> visited_labels;
+  bool lhs_scanned;
+};
+
+/* These types track whether a variable can be fully private, or not.
+
+   These are ORDERED in ascending precedence; when combining two values
+   (at a conditional or switch), the higher value is used.   */
+
+enum access_kinds {
+  ACCESS_NONE,      /* Variable not accessed.  */
+  ACCESS_DEF_FIRST, /* Variable is defined before use.  */
+  ACCESS_UNKNOWN,   /* Status is yet to be determined.  */
+  ACCESS_UNSUPPORTED, /* Variable is array or reference.  */
+  ACCESS_USE_FIRST  /* Variable is used without definition (live on entry).  */
+};
+
+struct ODO_BB {
+  access_kinds access;
+  gimple *foot_stmt;
+};
+
+struct ODO_Target_state {
+  tree var;
+
+  const void *bb_id;  /* A unique id for the BB (use a convenient pointer).  */
+  ODO_BB bb;
+  bool lhs_scanned;
+  bool can_short_circuit;
+
+  hash_map<const void*,ODO_BB> scanned_bb;
+};
+
+/* Classify a newly discovered variable, and add it to the candidate list.  */
+
+static void
+omp_data_optimize_add_candidate (const dump_user_location_t &loc, tree var,
+                                ODO_State *state)
+{
+  inhibit_descriptor in;
+  in.stmt = NULL;
+
+  if (DECL_EXTERNAL (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is external\n", var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else if (TREE_STATIC (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is static\n", var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else if (TREE_ADDRESSABLE (var))
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc,
+                        " -> unsuitable variable: %<%T%> is addressable\n",
+                        var);
+
+      in.kind = INHIBIT_BAD;
+    }
+  else
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, loc, " -> candidate variable: %<%T%>\n",
+                        var);
+
+      in.kind = INHIBIT_NOT;
+    }
+
+  if (state->candidates.put (var, in))
+    gcc_unreachable ();
+}
+
+/* Add all the variables in a gimple bind statement to the list of
+   optimization candidates.  */
+
+static void
+omp_data_optimize_stmt_bind (const gbind *bind, ODO_State *state)
+{
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, bind, "considering scope\n");
+
+  tree vars = gimple_bind_vars (bind);
+  for (tree var = vars; var; var = TREE_CHAIN (var))
+    omp_data_optimize_add_candidate (bind, var, state);
+}
+
+/* Assess a control flow statement to see if it prevents us from optimizing
+   OMP variable mappings.  A conditional jump usually won't, but reasoning
+   effectively about a loop would need a much more complicated liveness
+   algorithm than this one.  */
+
+static void
+omp_data_optimize_stmt_jump (gimple *stmt, ODO_State *state)
+{
+  /* In the general case, in presence of looping/control flow, we cannot make
+     any promises about (non-)uses of 'var's -- so we have to inhibit
+     optimization.  */
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, stmt, "loop/control encountered: %G\n", stmt);
+
+  bool forward = false;
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_COND:
+      if (state->visited_labels.contains (gimple_cond_true_label
+                                         (as_a <gcond*> (stmt)))
+         && state->visited_labels.contains (gimple_cond_false_label
+                                            (as_a <gcond*> (stmt))))
+       forward = true;
+      break;
+    case GIMPLE_GOTO:
+      if (state->visited_labels.contains (gimple_goto_dest
+                                         (as_a <ggoto*> (stmt))))
+       forward = true;
+      break;
+    case GIMPLE_SWITCH:
+       {
+         gswitch *sw = as_a <gswitch*> (stmt);
+         forward = true;
+         for (unsigned i = 0; i < gimple_switch_num_labels (sw); i++)
+           if (!state->visited_labels.contains (CASE_LABEL
+                                                (gimple_switch_label (sw,
+                                                                      i))))
+             {
+               forward = false;
+               break;
+             }
+         break;
+       }
+    case GIMPLE_ASM:
+       {
+         gasm *asm_stmt = as_a <gasm*> (stmt);
+         forward = true;
+         for (unsigned i = 0; i < gimple_asm_nlabels (asm_stmt); i++)
+           if (!state->visited_labels.contains (TREE_VALUE
+                                                (gimple_asm_label_op
+                                                 (asm_stmt, i))))
+             {
+               forward = false;
+               break;
+             }
+         break;
+       }
+    default:
+      gcc_unreachable ();
+    }
+  if (forward)
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, stmt,
+                        " -> forward jump; candidates remain valid\n");
+
+      return;
+    }
+
+  /* If we get here then control flow has invalidated all current optimization
+     candidates.  */
+  for (hash_map<tree, inhibit_descriptor>::iterator it = state->candidates.begin ();
+       it != state->candidates.end ();
+       ++it)
+    {
+      if ((*it).second.kind == INHIBIT_BAD)
+       continue;
+
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       dump_printf_loc (MSG_NOTE, stmt, " -> discarding candidate: %T\n",
+                        (*it).first);
+
+      /* We're walking backward: this earlier instance ("earlier" in
+        'gimple_seq' forward order) overrides what we may have had before.  */
+      (*it).second.kind = INHIBIT_JMP;
+      (*it).second.stmt = stmt;
+    }
+}
+
+/* A helper callback for omp_data_optimize_can_be_private.
+   Check if an operand matches the specific one we're looking for, and
+   assess the context in which it appears.  */
+
+static tree
+omp_data_optimize_scan_target_op (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  ODO_Target_state *state = (ODO_Target_state *)wi->info;
+  tree op = *tp;
+
+  if (wi->is_lhs && !state->lhs_scanned
+      && state->bb.access != ACCESS_USE_FIRST)
+    {
+      /* We're at the top level of the LHS operand.  Anything we scan inside
+        (array indices etc.) should be treated as RHS.  */
+      state->lhs_scanned = 1;
+
+      /* Writes to arrays and references are unhandled, as yet.  */
+      tree base = get_base_address (op);
+      if (base && base != op && base == state->var)
+       {
+         state->bb.access = ACCESS_UNSUPPORTED;
+         *walk_subtrees = 0;
+       }
+      /* Write to scalar variable.  */
+      else if (op == state->var)
+       {
+         state->bb.access = ACCESS_DEF_FIRST;
+         *walk_subtrees = 0;
+       }
+    }
+  else if (op == state->var)
+    {
+      state->bb.access = ACCESS_USE_FIRST;
+      *walk_subtrees = 0;
+    }
+  return NULL;
+}
+
+/* A helper callback for omp_data_optimize_can_be_private, this assesses a
+   statement inside a target region to see how it affects the data flow of the
+   operands.  A set of basic blocks is recorded, each with the observed access
+   details for the given variable.  */
+
+static tree
+omp_data_optimize_scan_target_stmt (gimple_stmt_iterator *gsi_p,
+                                   bool *handled_ops_p,
+                                   struct walk_stmt_info *wi)
+{
+  ODO_Target_state *state = (ODO_Target_state *) wi->info;
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  /* If an access was found in the previous statement then we're done.  */
+  if (state->bb.access != ACCESS_NONE && state->can_short_circuit)
+    {
+      *handled_ops_p = true;
+      return (tree)1;  /* Return non-NULL, otherwise ignored.  */
+    }
+
+  /* If the first def/use is already found then don't check more operands.  */
+  *handled_ops_p = state->bb.access != ACCESS_NONE;
+
+  switch (gimple_code (stmt))
+    {
+    /* These will be the last statement in a basic block, and will always
+       be followed by a label or the end of scope.  */
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+      if (state->bb.access == ACCESS_NONE)
+       state->bb.access = ACCESS_UNKNOWN;
+      state->bb.foot_stmt = stmt;
+      state->can_short_circuit = false;
+      break;
+
+    /* asm goto statements are not necessarily followed by a label.  */
+    case GIMPLE_ASM:
+      if (gimple_asm_nlabels (as_a <gasm*> (stmt)) > 0)
+       {
+         if (state->bb.access == ACCESS_NONE)
+           state->bb.access = ACCESS_UNKNOWN;
+         state->bb.foot_stmt = stmt;
+         state->scanned_bb.put (state->bb_id, state->bb);
+
+         /* Start a new fake BB using the asm string as a unique id.  */
+         state->bb_id = gimple_asm_string (as_a <gasm*> (stmt));
+         state->bb.access = ACCESS_NONE;
+         state->bb.foot_stmt = NULL;
+         state->can_short_circuit = false;
+       }
+      break;
+
+    /* A label is the beginning of a new basic block, and possibly the end
+       of the previous, in the case of a fall-through.  */
+    case GIMPLE_LABEL:
+      if (state->bb.foot_stmt == NULL)
+       state->bb.foot_stmt = stmt;
+      if (state->bb.access == ACCESS_NONE)
+       state->bb.access = ACCESS_UNKNOWN;
+      state->scanned_bb.put (state->bb_id, state->bb);
+
+      state->bb_id = gimple_label_label (as_a <glabel*> (stmt));
+      state->bb.access = ACCESS_NONE;
+      state->bb.foot_stmt = NULL;
+      break;
+
+    /* These should not occur inside target regions??  */
+    case GIMPLE_RETURN:
+      gcc_unreachable ();
+
+    default:
+      break;
+    }
+
+  /* Now walk the operands.  */
+  state->lhs_scanned = false;
+  return NULL;
+}
+
+/* Check every operand under a gimple statement to see if a specific variable
+   is dead on entry to an OMP TARGET statement.  If so, then we can make the
+   variable mapping PRIVATE.  */
+
+static bool
+omp_data_optimize_can_be_private (tree var, gimple *target_stmt)
+{
+  ODO_Target_state state;
+  state.var = var;
+  void *root_id = var;  /* Any non-null pointer will do for the unique ID.  */
+  state.bb_id = root_id;
+  state.bb.access = ACCESS_NONE;
+  state.bb.foot_stmt = NULL;
+  state.lhs_scanned = false;
+  state.can_short_circuit = true;
+
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = &state;
+
+  /* Walk the target region and build the BB list.  */
+  gimple_seq target_body = *gimple_omp_body_ptr (target_stmt);
+  walk_gimple_seq (target_body, omp_data_optimize_scan_target_stmt,
+                  omp_data_optimize_scan_target_op, &wi);
+
+  /* Calculate the liveness data for the whole region.  */
+  if (state.can_short_circuit)
+    ; /* state.bb.access has the answer already.  */
+  else
+    {
+      /* There's some control flow to navigate.  */
+
+      /* First enter the final BB into the table.  */
+      state.scanned_bb.put (state.bb_id, state.bb);
+
+      /* Propagate the known access findings to the predecessor BBs.
+
+        For each BB that does not have a known liveness value, combine
+        the liveness data from its successor BBs, if known.  Repeat until
+        there are no more changes to make.  */
+      bool changed;
+      do {
+       changed = false;
+       for (hash_map<const void*,ODO_BB>::iterator it = state.scanned_bb.begin ();
+            it != state.scanned_bb.end ();
+            ++it)
+         {
+           ODO_BB *bb = &(*it).second;
+           tree label;
+           const void *bb_id1, *bb_id2;
+           ODO_BB *chain_bb1, *chain_bb2;
+           unsigned num_labels;
+
+           /* The foot statement is NULL only in the exit block.
+              Blocks that already have liveness data are done.  */
+           if (bb->foot_stmt == NULL
+               || bb->access != ACCESS_UNKNOWN)
+             continue;
+
+           /* If we get here then bb->access == ACCESS_UNKNOWN.  */
+           switch (gimple_code (bb->foot_stmt))
+             {
+             /* If the foot statement of a block is a label, i.e. the head
+                of the next block, then we have a fall-through.  The
+                liveness data can simply be copied from that block.  */
+             case GIMPLE_LABEL:
+               bb_id1 = gimple_label_label (as_a <glabel*> (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               if (chain_bb1->access != ACCESS_UNKNOWN)
+                 {
+                   bb->access = chain_bb1->access;
+                   changed = true;
+                 }
+               break;
+
+             /* Combine the liveness data from both branches of a conditional
+                statement.  The access values are ordered such that the
+                higher value takes precedence.  */
+             case GIMPLE_COND:
+               bb_id1 = gimple_cond_true_label (as_a <gcond*>
+                                                (bb->foot_stmt));
+               bb_id2 = gimple_cond_false_label (as_a <gcond*>
+                                                 (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               chain_bb2 = state.scanned_bb.get (bb_id2);
+               bb->access = (chain_bb1->access > chain_bb2->access
+                             ? chain_bb1->access
+                             : chain_bb2->access);
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* Copy the liveness data from the destination block.  */
+             case GIMPLE_GOTO:
+               bb_id1 = gimple_goto_dest (as_a <ggoto*> (bb->foot_stmt));
+               chain_bb1 = state.scanned_bb.get (bb_id1);
+               if (chain_bb1->access != ACCESS_UNKNOWN)
+                 {
+                   bb->access = chain_bb1->access;
+                   changed = true;
+                 }
+               break;
+
+             /* Combine the liveness data from all the branches of a switch
+                statement.  The access values are ordered such that the
+                highest value takes precedence.  */
+             case GIMPLE_SWITCH:
+               num_labels = gimple_switch_num_labels (as_a <gswitch*>
+                                                      (bb->foot_stmt));
+               bb->access = ACCESS_NONE;  /* Lowest precedence value.  */
+               for (unsigned i = 0; i < num_labels; i++)
+                 {
+                   label = gimple_switch_label (as_a <gswitch*>
+                                                (bb->foot_stmt), i);
+                   chain_bb1 = state.scanned_bb.get (CASE_LABEL (label));
+                   bb->access = (bb->access > chain_bb1->access
+                                 ? bb->access
+                                 : chain_bb1->access);
+                 }
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* Combine the liveness data from all the branches of an asm goto
+                statement.  The access values are ordered such that the
+                highest value takes precedence.  */
+             case GIMPLE_ASM:
+               num_labels = gimple_asm_nlabels (as_a <gasm*> (bb->foot_stmt));
+               bb->access = ACCESS_NONE;  /* Lowest precedence value.  */
+               /* Loop through all the labels and the fall-through block.  */
+               for (unsigned i = 0; i < num_labels + 1; i++)
+                 {
+                   if (i < num_labels)
+                     bb_id1 = TREE_VALUE (gimple_asm_label_op
+                                          (as_a <gasm*> (bb->foot_stmt), i));
+                   else
+                     /* The fall-through fake-BB uses the string for an ID. */
+                     bb_id1 = gimple_asm_string (as_a <gasm*>
+                                                 (bb->foot_stmt));
+                   chain_bb1 = state.scanned_bb.get (bb_id1);
+                   bb->access = (bb->access > chain_bb1->access
+                                 ? bb->access
+                                 : chain_bb1->access);
+                 }
+               if (bb->access != ACCESS_UNKNOWN)
+                 changed = true;
+               break;
+
+             /* No other statement kinds should appear as foot statements.  */
+             default:
+               gcc_unreachable ();
+             }
+         }
+      } while (changed);
+
+      /* The access status should now be readable from the initial BB,
+        if one could be determined.  */
+      state.bb = *state.scanned_bb.get (root_id);
+    }
+
+  if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+    {
+      for (hash_map<const void*,ODO_BB>::iterator it = state.scanned_bb.begin ();
+          it != state.scanned_bb.end ();
+          ++it)
+       {
+         ODO_BB *bb = &(*it).second;
+         dump_printf_loc (MSG_NOTE, bb->foot_stmt,
+                          "%<%T%> is %s on entry to block ending here\n", var,
+                          (bb->access == ACCESS_NONE
+                           || bb->access == ACCESS_DEF_FIRST ? "dead"
+                           : bb->access == ACCESS_USE_FIRST ? "live"
+                           : bb->access == ACCESS_UNSUPPORTED
+                           ? "unknown (unsupported op)"
+                           : "unknown (complex control flow)"));
+       }
+      /* If the answer was found early then the last BB to be scanned
+        will not have been entered into the table.  */
+      if (state.can_short_circuit)
+       dump_printf_loc (MSG_NOTE, target_stmt,
+                        "%<%T%> is %s on entry to target region\n", var,
+                        (state.bb.access == ACCESS_NONE
+                         || state.bb.access == ACCESS_DEF_FIRST ? "dead"
+                         : state.bb.access == ACCESS_USE_FIRST ? "live"
+                         : state.bb.access == ACCESS_UNSUPPORTED
+                         ? "unknown (unsupported op)"
+                         : "unknown (complex control flow)"));
+    }
+
+  if (state.bb.access != ACCESS_DEF_FIRST
+      && dump_enabled_p () && dump_flags & TDF_DETAILS)
+    dump_printf_loc (MSG_NOTE, target_stmt, "%<%T%> is not suitable"
+                    " for private optimization; %s\n", var,
+                    (state.bb.access == ACCESS_USE_FIRST
+                     ? "live on entry"
+                     : state.bb.access == ACCESS_UNKNOWN
+                     ? "complex control flow"
+                     : "unknown reason"));
+
+  return state.bb.access == ACCESS_DEF_FIRST;
+}
+
+/* Inspect a tree operand, from a gimple walk, and check to see if it is a
+   variable use that might mean the variable is not a suitable candidate for
+   optimization in a prior target region.
+
+   This algorithm is very basic and can easily be fooled by writes with
+   subsequent reads, but it should at least err on the safe side.  */
+
+static void
+omp_data_optimize_inspect_op (tree op, ODO_State *state, bool is_lhs,
+                             gimple *stmt)
+{
+  if (is_lhs && !state->lhs_scanned)
+    {
+      /* We're at the top level of the LHS operand.
+         Anything we scan inside should be treated as RHS.  */
+      state->lhs_scanned = 1;
+
+      /* Writes to variables are not yet taken into account, beyond not
+        invalidating the optimization.  But not everything on the left-hand
+        side is a write (array indices, etc.), and if one element of an array
+        is written to then we should assume the rest is live.  */
+      tree base = get_base_address (op);
+      if (base && base == op)
+       return;  /* Writes to scalars are not a "use".  */
+    }
+
+  if (!DECL_P (op))
+    return;
+
+  /* If we get here then we have found a use of a variable.  */
+  tree var = op;
+
+  inhibit_descriptor *id = state->candidates.get (var);
+  if (id && id->kind != INHIBIT_BAD)
+    {
+      if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+       {
+         if (gimple_code (stmt) == GIMPLE_OMP_TARGET)
+           dump_printf_loc (MSG_NOTE, id->stmt,
+                            "encountered variable use in target stmt\n");
+         else
+           dump_printf_loc (MSG_NOTE, id->stmt,
+                            "encountered variable use: %G\n", stmt);
+         dump_printf_loc (MSG_NOTE, id->stmt,
+                          " -> discarding candidate: %T\n", op);
+       }
+
+      /* We're walking backward: this earlier instance ("earlier" in
+        'gimple_seq' forward order) overrides what we may have had before.  */
+      id->kind = INHIBIT_USE;
+      id->stmt = stmt;
+    }
+}
+
+/* Optimize the data mappings of a target region for the variables that our
+   backward gimple walk has identified as definitely dead on exit.  */
+
+static void
+omp_data_optimize_stmt_target (gimple *stmt, ODO_State *state)
+{
+  for (tree *pc = gimple_omp_target_clauses_ptr (stmt); *pc;
+       pc = &OMP_CLAUSE_CHAIN (*pc))
+    {
+      if (OMP_CLAUSE_CODE (*pc) != OMP_CLAUSE_MAP)
+       continue;
+
+      tree var = OMP_CLAUSE_DECL (*pc);
+      if (OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_FORCE_TOFROM
+         || OMP_CLAUSE_MAP_KIND (*pc) == GOMP_MAP_TOFROM)
+       {
+         /* The dump_printf_loc format code %T does not print
+            the head clause of a clause chain but the whole chain.
+            Print the last considered clause manually.  */
+         char *c_s_prev = NULL;
+         if (dump_enabled_p ())
+           c_s_prev = print_omp_clause_to_str (*pc);
+
+         inhibit_descriptor *id = state->candidates.get (var);
+         if (!id)
+           {
+             /* External (not a parameter or bind-local), so live on exit.  */
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                "%qs not optimized: %<%T%> is unsuitable"
+                                " for privatization\n", c_s_prev, var);
+             free (c_s_prev);
+             continue;
+           }
+
+         switch (id->kind)
+           {
+           case INHIBIT_NOT:  /* Don't inhibit optimization.  */
+
+             /* Change map type from "tofrom" to "to".  */
+             OMP_CLAUSE_SET_MAP_KIND (*pc, GOMP_MAP_TO);
+
+             if (dump_enabled_p ())
+               {
+                 char *c_s_opt = print_omp_clause_to_str (*pc);
+                 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc),
+                                  "%qs optimized to %qs\n", c_s_prev, c_s_opt);
+                 free (c_s_prev);
+                 c_s_prev = c_s_opt;
+               }
+
+             /* Variables that are also dead on entry to the region can be
+                further optimized to private.  */
+             if (omp_data_optimize_can_be_private (var, stmt))
+               {
+                 tree c_f = (build_omp_clause
+                             (OMP_CLAUSE_LOCATION (*pc),
+                              OMP_CLAUSE_PRIVATE));
+                 OMP_CLAUSE_DECL (c_f) = var;
+                 OMP_CLAUSE_CHAIN (c_f) = OMP_CLAUSE_CHAIN (*pc);
+                 //TODO Copy "implicit" flag from 'var'.
+                 *pc = c_f;
+
+                 if (dump_enabled_p ())
+                   {
+                     char *c_s_opt = print_omp_clause_to_str (*pc);
+                     dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, DUMP_LOC (*pc),
+                                      "%qs further optimized to %qs\n",
+                                      c_s_prev, c_s_opt);
+                     free (c_s_prev);
+                     c_s_prev = c_s_opt;
+                   }
+               }
+             break;
+
+           case INHIBIT_USE:  /* Optimization inhibited by a variable use.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> used...\n",
+                                  c_s_prev, var);
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt,
+                                  "... here\n");
+               }
+             break;
+
+           case INHIBIT_JMP:  /* Optimization inhibited by control flow.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> disguised by"
+                                  " looping/control flow...\n", c_s_prev, var);
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, id->stmt,
+                                  "... here\n");
+               }
+             break;
+
+           case INHIBIT_BAD:  /* Optimization inhibited by properties.  */
+             if (dump_enabled_p ())
+               {
+                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, DUMP_LOC (*pc),
+                                  "%qs not optimized: %<%T%> is unsuitable"
+                                  " for privatization\n", c_s_prev, var);
+               }
+             break;
+
+           default:
+             gcc_unreachable ();
+           }
+
+         if (dump_enabled_p ())
+           free (c_s_prev);
+       }
+    }
+
+  /* Variables used by target regions cannot be optimized from earlier
+     target regions.  */
+  for (tree c = *gimple_omp_target_clauses_ptr (stmt);
+       c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      /* This needs to include all the mapping clauses listed in
+        OMP_TARGET_CLAUSE_MASK in c-parser.c.  */
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+         && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_PRIVATE
+         && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_FIRSTPRIVATE)
+       continue;
+
+      tree var = OMP_CLAUSE_DECL (c);
+      omp_data_optimize_inspect_op (var, state, false, stmt);
+    }
+}
+
+/* Callback for the gimple walk.  Scan the statement for target regions and
+   for variable uses or control flow that might prevent us from optimizing
+   offload data copies.  */
+
+static tree
+omp_data_optimize_callback_stmt (gimple_stmt_iterator *gsi_p,
+                                bool *handled_ops_p,
+                                struct walk_stmt_info *wi)
+{
+  ODO_State *state = (ODO_State *) wi->info;
+
+  *handled_ops_p = false;
+  state->lhs_scanned = false;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  switch (gimple_code (stmt))
+    {
+    /* A bind introduces a new variable scope that might include optimizable
+       variables.  */
+    case GIMPLE_BIND:
+      omp_data_optimize_stmt_bind (as_a <gbind *> (stmt), state);
+      break;
+
+    /* Tracking labels allows us to understand control flow better.  */
+    case GIMPLE_LABEL:
+      state->visited_labels.add (gimple_label_label (as_a <glabel *> (stmt)));
+      break;
+
+    /* Statements that might constitute some looping/control flow pattern
+       may inhibit optimization of target mappings.  */
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    case GIMPLE_ASM:
+      omp_data_optimize_stmt_jump (stmt, state);
+      break;
+
+    /* A target statement that will have variables for us to optimize.  */
+    case GIMPLE_OMP_TARGET:
+      /* For now, only look at OpenACC 'kernels' constructs.  */
+      if (gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+       {
+         omp_data_optimize_stmt_target (stmt, state);
+
+         /* Don't walk inside the target region; use of private variables
+            inside the target region does not stop them being private!
+            NOTE: we *do* want to walk target statement types that are not
+            (yet) handled by omp_data_optimize_stmt_target as the uses there
+            must not be missed.  */
+         // TODO add tests for mixed kernels/parallels
+         *handled_ops_p = true;
+       }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Callback for the gimple walk.  Scan the operand for variable uses.  */
+
+static tree
+omp_data_optimize_callback_op (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+
+  omp_data_optimize_inspect_op (*tp, (ODO_State *)wi->info, wi->is_lhs,
+                               wi->stmt);
+
+  *walk_subtrees = 1;
+  return NULL;
+}
+
+/* Main pass entry point.  See comments at head of file.  */
+
+static unsigned int
+omp_data_optimize (void)
+{
+  /* Capture the function arguments so that they can be optimized.  */
+  ODO_State state;
+  for (tree decl = DECL_ARGUMENTS (current_function_decl);
+       decl;
+       decl = DECL_CHAIN (decl))
+    {
+      const dump_user_location_t loc = dump_user_location_t::from_function_decl (decl);
+      omp_data_optimize_add_candidate (loc, decl, &state);
+    }
+
+  /* Scan and optimize the function body, from bottom to top.  */
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.backward = true;
+  wi.info = &state;
+  gimple_seq body = gimple_body (current_function_decl);
+  walk_gimple_seq (body, omp_data_optimize_callback_stmt,
+                  omp_data_optimize_callback_op, &wi);
+
+  return 0;
+}
+
+
+namespace {
+
+const pass_data pass_data_omp_data_optimize =
+{
+  GIMPLE_PASS, /* type */
+  "omp_data_optimize", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_omp_data_optimize : public gimple_opt_pass
+{
+public:
+  pass_omp_data_optimize (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_data_optimize, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openacc
+           && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+  }
+  virtual unsigned int execute (function *)
+  {
+    return omp_data_optimize ();
+  }
+
+}; // class pass_omp_data_optimize
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_data_optimize (gcc::context *ctxt)
+{
+  return new pass_omp_data_optimize (ctxt);
+}
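For readers of omp_data_optimize_can_be_private above, here is a minimal standalone sketch of its fixed-point step, under stated assumptions. The block structure, the explicit successor array and the ACC_* names are hypothetical; the patch instead looks successors up by label in a hash_map and distinguishes GIMPLE_LABEL, GIMPLE_COND, GIMPLE_GOTO, GIMPLE_SWITCH and GIMPLE_ASM foot statements, but each case reduces to taking the highest-precedence access over the successors. The numeric order of the enumerators is also an assumption, since the patch only states that higher values take precedence.

```c
#include <assert.h>

/* Hypothetical stand-in for the patch's access values.  The order is an
   assumption: a live or unsupported path outranks an unresolved one, and
   an unresolved path outranks a dead one.  */
enum access
{
  ACC_NONE,        /* Not accessed (treated as dead).  */
  ACC_DEF_FIRST,   /* Written before any read: dead on entry.  */
  ACC_UNKNOWN,     /* Not resolved yet.  */
  ACC_USE_FIRST,   /* Read before any write: live on entry.  */
  ACC_UNSUPPORTED  /* Accessed in a way the scan cannot model.  */
};

#define MAX_SUCC 4

struct block
{
  enum access access;  /* First access seen in the block, or ACC_UNKNOWN.  */
  int nsucc;           /* Number of successors; 0 for the exit block.  */
  int succ[MAX_SUCC];  /* Indices of the successor blocks.  */
};

/* Resolve ACC_UNKNOWN blocks from their successors, repeating until a
   whole pass over the blocks changes nothing.  A block stays ACC_UNKNOWN
   while any successor is unresolved, unless another successor already
   forces a higher value; so a loop that never touches the variable stays
   unresolved, which is the conservative outcome.  */
static void
propagate_access (struct block *bbs, int n)
{
  int changed;
  do
    {
      changed = 0;
      for (int i = 0; i < n; i++)
        {
          struct block *bb = &bbs[i];
          if (bb->nsucc == 0 || bb->access != ACC_UNKNOWN)
            continue;

          assert (bb->nsucc <= MAX_SUCC);
          enum access combined = ACC_NONE;  /* Lowest precedence.  */
          for (int s = 0; s < bb->nsucc; s++)
            if (bbs[bb->succ[s]].access > combined)
              combined = bbs[bb->succ[s]].access;

          if (combined != ACC_UNKNOWN)
            {
              bb->access = combined;
              changed = 1;
            }
        }
    }
  while (changed);
}
```

With this ordering, a conditional whose branches are "written first" and "read first" resolves to read-first (live on entry), so the variable is not made private; a loop that never touches the variable stays unresolved and is likewise left alone.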
diff --git a/gcc/passes.def b/gcc/passes.def
index 9220fdc8ca75..48c9821011f0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_unused_result);
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_omp_data_optimize);
   NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
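The pass registered above walks each function body backward (wi.backward = true in omp_data_optimize) and relaxes 'tofrom' to 'to' for a local variable only when nothing later in the function reads it. The sketch below models just that rule, assuming a flat statement list with no control flow (which the patch handles separately, by inhibiting the optimization); the stmt structure, optimize_mappings and the STMT_*/MAP_* names are hypothetical, not GCC API. Like the patch, it treats a region's own mappings as uses for earlier regions, and it does not let a write clear an inhibition caused by a read further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the backward scan.  */
enum stmt_kind { STMT_USE, STMT_DEF, STMT_TARGET };
enum map_kind { MAP_TOFROM, MAP_TO };

#define MAX_VARS 8

struct stmt
{
  enum stmt_kind kind;
  int var;                      /* Variable read or written (USE/DEF).  */
  int nmapped;                  /* Number of mapped variables (TARGET).  */
  int mapped[MAX_VARS];         /* Variables mapped 'tofrom' by the region.  */
  enum map_kind map[MAX_VARS];  /* Resulting map kind for each of them.  */
};

/* CANDIDATE[v] is true if variable V is local to the function (a parameter
   or declared in a bind); other variables may be live after the function
   returns and are never optimized.  */
static void
optimize_mappings (struct stmt *stmts, size_t n, const bool *candidate)
{
  bool used_later[MAX_VARS] = { false };

  for (size_t i = n; i-- > 0;)  /* Last statement first.  */
    {
      struct stmt *s = &stmts[i];
      switch (s->kind)
        {
        case STMT_USE:
          assert (s->var >= 0 && s->var < MAX_VARS);
          used_later[s->var] = true;
          break;
        case STMT_DEF:
          /* Writes neither inhibit the optimization nor cancel a read
             seen further down (conservative, as in the patch).  */
          break;
        case STMT_TARGET:
          assert (s->nmapped <= MAX_VARS);
          for (int k = 0; k < s->nmapped; k++)
            {
              int v = s->mapped[k];
              s->map[k] = (candidate[v] && !used_later[v]
                           ? MAP_TO : MAP_TOFROM);
            }
          /* The region's own mappings count as uses for earlier regions.  */
          for (int k = 0; k < s->nmapped; k++)
            used_later[s->mapped[k]] = true;
          break;
        }
    }
}
```

For instance, a region mapping a parameter that is only assigned after the region gets 'to', while one that is read afterwards keeps 'tofrom'; non-candidates such as globals and statics are never changed, in line with the dg-missed expectations in omp_data_optimize-1.c.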
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
index 3bcb7f430f4d..6cf51904e7ad 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
@@ -12,7 +12,12 @@ main ()
 {
   int x, y, z;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
+/* { dg-optimized {'map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:z \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-1 } */
+/* { dg-optimized {'map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:y \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-2 } */
+/* { dg-optimized {'map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:x \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-3 } */
+/* { dg-optimized {'map\(to:x \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(x\)'} "" { target *-*-* } .-4 } */
+
  /* Strangely indented to keep this similar to other test cases.  */
  {
   for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
index 829811e1ec7f..d4cb2364737c 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
@@ -28,7 +28,14 @@ main ()
 {
   int x, y, z;

-#pragma acc kernels /* { dg-warning "region contains gang partitioned code but is not gang partitioned" } */
+#pragma acc kernels /* { dg-line l_pragma_kernels } */
+  /* The variables aren't loop variables (on explicit or implicit 'loop' directives), so they don't get (implicit) 'private' clauses; instead they're (implicit) 'copy', which we then see get optimized: */
+  /* { dg-optimized {'map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:z \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_pragma_kernels } */
+  /* { dg-optimized {'map\(to:z \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(z\)'} "" { target *-*-* } l_pragma_kernels } */
+  /* { dg-optimized {'map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:y \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_pragma_kernels } */
+  /* { dg-optimized {'map\(to:y \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(y\)'} "" { target *-*-* } l_pragma_kernels } */
+  /* { dg-optimized {'map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:x \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_pragma_kernels } */
+  /* { dg-optimized {'map\(to:x \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(x\)'} "" { target *-*-* } l_pragma_kernels } */
   {
     x = 0; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
     y = x < 10;
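The expectations above show x, y and z going from 'force_tofrom' to 'to' and then to 'private' in a straight-line kernels region. A minimal sketch of the per-variable decision for such straight-line code follows, assuming each statement writes at most one variable and that, as in a GIMPLE walk of an assignment, the operands it reads are visited before its left-hand side. classify_first_access, choose_clause and the FA_*/CL_* names are hypothetical. Mirroring omp_data_optimize_can_be_private, which returns true only for ACCESS_DEF_FIRST, only a variable written before it is read becomes private; one that the region never touches stays 'to'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a straight-line region body: each statement writes
   at most one variable (LHS, or -1 for none) and reads up to two.  */
struct assign
{
  int lhs;
  int nrhs;
  int rhs[2];
};

enum first_access { FA_NONE, FA_DEF_FIRST, FA_USE_FIRST };
enum clause { CL_TOFROM, CL_TO, CL_PRIVATE };

/* Classify how VAR is first accessed in BODY.  The operands read by a
   statement are checked before its LHS, so 'x = x + 1' counts as a use
   first.  */
static enum first_access
classify_first_access (const struct assign *body, size_t n, int var)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (body[i].nrhs >= 0 && body[i].nrhs <= 2);
      for (int k = 0; k < body[i].nrhs; k++)
        if (body[i].rhs[k] == var)
          return FA_USE_FIRST;
      if (body[i].lhs == var)
        return FA_DEF_FIRST;
    }
  return FA_NONE;
}

/* A variable that is dead on exit can drop the copy-out ('to'); if it is
   also written before it is read, it can become 'private'.  */
static enum clause
choose_clause (bool dead_on_exit, enum first_access fa)
{
  if (!dead_on_exit)
    return CL_TOFROM;
  return fa == FA_DEF_FIRST ? CL_PRIVATE : CL_TO;
}
```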
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
index 8d82c21c1aa9..92accdf27fa2 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
@@ -41,8 +41,14 @@ main ()
       for (z = 0; z < 10; z++)
        ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  /* { dg-optimized {'map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:x \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-1 } */
+  /* { dg-optimized {'map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:y \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-2 } */
+  /* { dg-optimized {'map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:z \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-3 } */
+  /* { dg-optimized {'map\(to:x \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(x\)'} "" { target *-*-* } .-4 } */
+  /* { dg-optimized {'map\(to:y \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(y\)'} "" { target *-*-* } .-5 } */
+  /* { dg-optimized {'map\(to:z \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(z\)'} "TODO-kernels z not privatized?" { xfail *-*-* } .-6 } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;
diff --git a/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
new file mode 100644
index 000000000000..c90031a40b71
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
@@ -0,0 +1,677 @@
+/* Test 'gcc/omp-data-optimize.c'.  */
+
+/* { dg-additional-options "-fdump-tree-gimple-raw" } */
+/* { dg-additional-options "-fopt-info-omp-all" } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_compute[variable c_compute 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+   "WARNING: dg-line var l_compute defined, but not used".
+   { dg-line l_use[variable c_use 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+   "WARNING: dg-line var l_use defined, but not used".
+   { dg-line l_lcf[variable c_lcf 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_lcf } to avoid
+   "WARNING: dg-line var l_lcf defined, but not used".  */
+
+extern int ef1(int);
+
+
+/* Optimization happens.  */
+
+long opt_1_gvar1;
+extern short opt_1_evar1;
+static long opt_1_svar1;
+
+static int opt_1(int opt_1_pvar1)
+{
+  int opt_1_lvar1;
+  extern short opt_1_evar2;
+  static long opt_1_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    int dummy1 = opt_1_pvar1;
+    int dummy2 = opt_1_lvar1;
+    int dummy3 = opt_1_evar2;
+    int dummy4 = opt_1_svar2;
+
+    int dummy5 = opt_1_gvar1;
+    int dummy6 = opt_1_evar1;
+    int dummy7 = opt_1_svar1;
+  }
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+long opt_2_gvar1;
+extern short opt_2_evar1;
+static long opt_2_svar1;
+
+static int opt_2(int opt_2_pvar1)
+{
+  int opt_2_lvar1;
+  extern short opt_2_evar2;
+  static long opt_2_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    int dummy1 = opt_2_pvar1;
+    int dummy2 = opt_2_lvar1;
+    int dummy3 = opt_2_evar2;
+    int dummy4 = opt_2_svar2;
+
+    int dummy5 = opt_2_gvar1;
+    int dummy6 = opt_2_evar1;
+    int dummy7 = opt_2_svar1;
+  }
+
+  /* A write does not inhibit optimization.  */
+
+  opt_2_pvar1 = 0;
+  opt_2_lvar1 = 1;
+  opt_2_evar2 = 2;
+  opt_2_svar2 = 3;
+
+  opt_2_gvar1 = 10;
+  opt_2_evar1 = 11;
+  opt_2_svar1 = 12;
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+long opt_3_gvar1;
+extern short opt_3_evar1;
+static long opt_3_svar1;
+
+static int opt_3(int opt_3_pvar1)
+{
+  int opt_3_lvar1;
+  extern short opt_3_evar2;
+  static long opt_3_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    /* A write inside the kernel inhibits optimization to firstprivate.
+       TODO: optimize to private where the variable is dead-on-entry.  */
+
+    opt_3_pvar1 = 1;
+    opt_3_lvar1 = 2;
+    opt_3_evar2 = 3;
+    opt_3_svar2 = 4;
+
+    opt_3_gvar1 = 5;
+    opt_3_evar1 = 6;
+    opt_3_svar1 = 7;
+  }
+
+  return 0;
+
+/* { dg-optimized {'map\(force_tofrom:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_pvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:opt_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void opt_4()
+{
+  int opt_4_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      int dummy1 = opt_4_larray1[4];
+      int dummy2 = opt_4_larray1[8];
+    }
+
+/* { dg-optimized {'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-bogus {'map\(to:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'firstprivate\(opt_4_larray1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void opt_5 (int opt_5_pvar1)
+{
+  int opt_5_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      opt_5_larray1[opt_5_pvar1] = 1;
+      opt_5_pvar1[opt_5_larray1] = 2;
+    }
+
+/* { dg-optimized {'map\(force_tofrom:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+
+/* TODO: this probably should be optimizable.  */
+/* { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+
+/* Similar, but with optimization inhibited because of variable use.  */
+
+static int use_1(int use_1_pvar1)
+{
+  float use_1_lvar1;
+  extern char use_1_evar2;
+  static double use_1_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    use_1_pvar1 = 0;
+    use_1_lvar1 = 1;
+    use_1_evar2 = 2;
+    use_1_svar2 = 3;
+  }
+
+  int s = 0;
+  s += use_1_pvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_1_lvar1; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_1_evar2; /* { dg-bogus {note: \.\.\. here} "" { target *-*-* } }  */
+  s += use_1_svar2; /* { dg-bogus {note: \.\.\. here} "" { target *-*-* } }  */
+
+  return s;
+
+/* { dg-missed {'map\(force_tofrom:use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+extern int use_2_a1[];
+
+static int use_2(int use_2_pvar1)
+{
+  int use_2_lvar1;
+  extern int use_2_evar2;
+  static int use_2_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    use_2_pvar1 = 0;
+    use_2_lvar1 = 1;
+    use_2_evar2 = 2;
+    use_2_svar2 = 3;
+  }
+
+  int s = 0;
+  s += use_2_a1[use_2_pvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_2_a1[use_2_lvar1]; /* { dg-missed {\.\.\. here} "" { target *-*-* } } */
+  s += use_2_a1[use_2_evar2];
+  s += use_2_a1[use_2_svar2];
+
+  return s;
+
+/*TODO The following GIMPLE dump scanning may be too fragile (across
+  different GCC configurations)?  The idea is to verify that we're indeed
+  doing the "deep scanning", as discussed in
+  <http://mid.mail-archive.com/877dm463sc.fsf@euler.schwinge.homeip.net>.  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_pvar1\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_pvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_lvar1\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <var_decl, use_2_evar2\.[^,]+, use_2_evar2, NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_evar2\.[^\]]+\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <var_decl, use_2_svar2\.[^,]+, use_2_svar2, NULL, NULL>$} 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times {(?n)  gimple_assign <array_ref, [^,]+, use_2_a1\[use_2_svar2\.[^\]]+\], NULL, NULL>$} 1 "gimple" } } */
+/* { dg-missed {'map\(force_tofrom:use_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:use_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void use_3 ()
+{
+  int use_5_lvar1;
+  int use_5_larray1[10];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      use_5_lvar1 = 5;
+    }
+
+  use_5_larray1[use_5_lvar1] = 1; /* { dg-line l_use[incr c_use] } */
+
+/* { dg-missed {'map\(force_tofrom:use_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_5_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+
+/* Similar, but with the optimization inhibited because of looping/control flow.  */
+
+static void lcf_1(int lcf_1_pvar1)
+{
+  float lcf_1_lvar1;
+  extern char lcf_1_evar2;
+  static double lcf_1_svar2;
+
+  for (int i = 0; i < ef1(i); ++i) /* { dg-line l_lcf[incr c_lcf] } */
+ {
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_1_pvar1 = 0;
+    lcf_1_lvar1 = 1;
+    lcf_1_evar2 = 2;
+    lcf_1_svar2 = 3;
+  }
+ }
+
+/* { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_2(int lcf_2_pvar1)
+{
+  float lcf_2_lvar1;
+  extern char lcf_2_evar2;
+  static double lcf_2_svar2;
+
+  if (ef1 (0))
+    return;
+
+ repeat:
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_2_pvar1 = 0;
+    lcf_2_lvar1 = 1;
+    lcf_2_evar2 = 2;
+    lcf_2_svar2 = 3;
+  }
+
+  goto repeat; /* { dg-line l_lcf[incr c_lcf] } */
+
+/* { dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_3(int lcf_3_pvar1)
+{
+  float lcf_3_lvar1;
+  extern char lcf_3_evar2;
+  static double lcf_3_svar2;
+
+  if (ef1 (0))
+    return;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_3_pvar1 = 0;
+    lcf_3_lvar1 = 1;
+    lcf_3_evar2 = 2;
+    lcf_3_svar2 = 3;
+  }
+
+  // Backward jump after kernel
+ repeat:
+  goto repeat; /* { dg-line l_lcf[incr c_lcf] } */
+
+/* { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_pvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_lcf$c_lcf } */
+}
+
+static void lcf_4(int lcf_4_pvar1)
+{
+  float lcf_4_lvar1;
+  extern char lcf_4_evar2;
+  static double lcf_4_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_4_pvar1 = 0;
+    lcf_4_lvar1 = 1;
+    lcf_4_evar2 = 2;
+    lcf_4_svar2 = 3;
+  }
+
+  // Forward jump after kernel
+  goto out;
+
+    out:
+  return;
+
+/* { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void lcf_5(int lcf_5_pvar1)
+{
+  float lcf_5_lvar1;
+  extern char lcf_5_evar2;
+  static double lcf_5_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_5_pvar1 = 0;
+    lcf_5_lvar1 = 1;
+    lcf_5_evar2 = 2;
+    lcf_5_svar2 = 3;
+  }
+
+  if (ef1 (-1))
+    ;
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_6(int lcf_6_pvar1)
+{
+  float lcf_6_lvar1;
+  extern char lcf_6_evar2;
+  static double lcf_6_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_6_pvar1 = 0;
+    lcf_6_lvar1 = 1;
+    lcf_6_evar2 = 2;
+    lcf_6_svar2 = 3;
+  }
+
+  int x = ef1 (-2) ? 1 : -1;
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_7(int lcf_7_pvar1)
+{
+  float lcf_7_lvar1;
+  extern char lcf_7_evar2;
+  static double lcf_7_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_7_pvar1 = 0;
+    lcf_7_lvar1 = 1;
+    lcf_7_evar2 = 2;
+    lcf_7_svar2 = 3;
+  }
+
+  switch (ef1 (-2))
+    {
+    case 0: ef1 (10); break;
+    case 2: ef1 (11); break;
+    default: ef1 (12); break;
+    }
+
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_7_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_7_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_7_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_7_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_7_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_7_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void lcf_8(int lcf_8_pvar1)
+{
+  float lcf_8_lvar1;
+  extern char lcf_8_evar2;
+  static double lcf_8_svar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  {
+    lcf_8_pvar1 = 0;
+    lcf_8_lvar1 = 1;
+    lcf_8_evar2 = 2;
+    lcf_8_svar2 = 3;
+  }
+
+  asm goto ("" :::: out);
+
+out:
+  return;
+
+/* { dg-optimized {'map\(force_tofrom:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_8_pvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_pvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:lcf_8_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_8_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {'map\(force_tofrom:lcf_8_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:lcf_8_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_8_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+/* Ensure that variables are promoted to private properly.  */
+
+static void priv_1 ()
+{
+  int priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4, priv_1_lvar5;
+  int priv_1_lvar6, priv_1_lvar7, priv_1_lvar8, priv_1_lvar9, priv_1_lvar10;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      priv_1_lvar1 = 1;
+      int dummy = priv_1_lvar2;
+
+      if (priv_1_lvar2)
+       {
+         priv_1_lvar3 = 1;
+       }
+      else
+       {
+         priv_1_lvar3 = 2;
+       }
+
+      priv_1_lvar5 = priv_1_lvar3;
+
+      if (priv_1_lvar2)
+       {
+         priv_1_lvar4 = 1;
+         int dummy = priv_1_lvar4;
+       }
+
+      switch (priv_1_lvar2)
+       {
+       case 0:
+         priv_1_lvar5 = 1;
+         dummy = priv_1_lvar6;
+         break;
+       case 1:
+         priv_1_lvar5 = 2;
+         priv_1_lvar6 = 3;
+         break;
+       default:
+         break;
+       }
+
+      asm goto ("" :: "r"(priv_1_lvar7) :: label1, label2);
+      if (0)
+       {
+label1:
+         priv_1_lvar8 = 1;
+         priv_1_lvar9 = 2;
+       }
+      if (0)
+       {
+label2:
+         dummy = priv_1_lvar9;
+         dummy = priv_1_lvar10;
+       }
+    }
+
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar6 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar6\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar7 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar7\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-optimized {'map\(to:priv_1_lvar8 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar8\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar9 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar9\)'} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-optimized {'map\(force_tofrom:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+ { dg-bogus {'map\(to:priv_1_lvar10 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar10\)'} "" { target *-*-* } l_compute$c_compute } */
+}
+
+static void multiple_kernels_1 ()
+{
+#pragma acc kernels
+    {
+      int multiple_kernels_1_lvar1 = 1;
+    }
+
+    int multiple_kernels_2_lvar1;
+#pragma acc kernels
+    {
+      int multiple_kernels_2_lvar1 = 1;
+    }
+
+#pragma acc parallel
+    {
+      multiple_kernels_2_lvar1++;
+    }
+}
+
+static int ref_1 ()
+{
+  int *ref_1_ref1;
+  int ref_1_lvar1;
+
+  ref_1_ref1 = &ref_1_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      ref_1_lvar1 = 1;
+    }
+
+  return *ref_1_ref1;
+
+/* { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int ref_2 ()
+{
+  int *ref_2_ref1;
+  int ref_2_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      ref_2_lvar1 = 1;
+    }
+
+  ref_2_ref1 = &ref_2_lvar1;
+  return *ref_2_ref1;
+
+/* { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void ref_3 ()
+{
+  int ref_3_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  // FIXME: could be optimized
+    {
+      int *ref_3_ref1 = &ref_3_lvar1;
+      ref_3_lvar1 = 1;
+    }
+
+/* { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void ref_4 ()
+{
+  int ref_4_lvar1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  // FIXME: could be optimized
+    {
+      int *ref_4_ref1 = &ref_4_lvar1;
+      *ref_4_ref1 = 1;
+    }
+
+/* { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static void conditional_1 (int conditional_1_pvar1)
+{
+  int conditional_1_lvar1 = 1;
+
+  if (conditional_1_pvar1)
+    {
+      // TODO: should be optimizable, but isn't due to later usage in the
+      // linear scan.
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+       {
+         int dummy = conditional_1_lvar1;
+       }
+    }
+  else
+    {
+      int dummy = conditional_1_lvar1; /* { dg-line l_use[incr c_use] } */
+    }
+
+/* { dg-missed {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'conditional_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static void conditional_2 (int conditional_2_pvar1)
+{
+  int conditional_2_lvar1 = 1;
+
+  if (conditional_2_pvar1)
+    {
+      int dummy = conditional_2_lvar1;
+    }
+  else
+    {
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+       {
+         int dummy = conditional_2_lvar1;
+       }
+    }
+
+/* { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute } */
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
index b3cc4459328f..628b84940a1c 100644
--- a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
@@ -7,6 +7,12 @@ foo (void)
   int i;

 #pragma acc kernels
+  /* { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 } */
+  /*TODO With the 'copy' -> 'firstprivate' optimization, the original implicit 'copy(i)' clause gets optimized into a 'firstprivate(i)' clause -- and the expected (?) warning diagnostic appears.
+    Have to read up on the history behind these test cases.
+    Should this test remain here in this file even if now testing 'firstprivate'?
+    Or, should the optimization be disabled for such testing?
+    Or, should the testing be duplicated for both variants?  */
   {
     i = 1;
   }
diff --git a/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
new file mode 100644
index 000000000000..5483e5682410
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
@@ -0,0 +1,169 @@
+/* Test 'gcc/omp-data-optimize.c'.  */
+
+/* { dg-additional-options "-std=c++11" } */
+/* { dg-additional-options "-fdump-tree-gimple-raw" } */
+/* { dg-additional-options "-fopt-info-omp-all" } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_compute[variable c_compute 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+   "WARNING: dg-line var l_compute defined, but not used".
+   { dg-line l_use[variable c_use 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+   "WARNING: dg-line var l_use defined, but not used".  */
+
+static int closure_1 (int closure_1_pvar1)
+{
+  int closure_1_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_1_lvar1 = closure_1_pvar1;
+    }
+
+  auto lambda = [closure_1_lvar1]() {return closure_1_lvar1;}; /* { dg-line l_use[incr c_use] } */
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_1_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_1_lvar1' used\.\.\.} "" { target *-*-* } l_compute$c_compute } */
+/* { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int closure_2 (int closure_2_pvar1)
+{
+  int closure_2_lvar1 = 1;
+
+  auto lambda = [closure_2_lvar1]() {return closure_2_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_2_lvar1 = closure_2_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:closure_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_3 (int closure_3_pvar1)
+{
+  int closure_3_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_3_lvar1 = closure_3_pvar1;
+    }
+
+  auto lambda = [&]() {return closure_3_lvar1;};
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_3_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_4 (int closure_4_pvar1)
+{
+  int closure_4_lvar1 = 1;
+
+  auto lambda = [&]() {return closure_4_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_4_lvar1 = closure_4_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_4_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_5 (int closure_5_pvar1)
+{
+  int closure_5_lvar1 = 1;
+
+  auto lambda = [=]() {return closure_5_lvar1;};
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_5_lvar1 = closure_5_pvar1;
+    }
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-optimized {'map\(force_tofrom:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+   { dg-optimized {'map\(to:closure_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(closure_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute }  */
+}
+
+static int closure_6 (int closure_6_pvar1)
+{
+  int closure_6_lvar1 = 1;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      closure_6_lvar1 = closure_6_pvar1;
+    }
+
+  auto lambda = [=]() {return closure_6_lvar1;}; /* { dg-line l_use[incr c_use] } */
+
+  return lambda();
+
+/* { dg-optimized {'map\(force_tofrom:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:closure_6_pvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }  */
+/* { dg-missed {'map\(force_tofrom:closure_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'closure_6_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int try_1 ()
+{
+  int try_1_lvar1, try_1_lvar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      try_1_lvar1 = 1;
+    }
+
+  try {
+    try_1_lvar2 = try_1_lvar1; /* { dg-line l_use[incr c_use] } */
+  } catch (...) {}
+
+  return try_1_lvar2;
+
+/* { dg-missed {'map\(force_tofrom:try_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
+
+static int try_2 ()
+{
+  int try_2_lvar1, try_2_lvar2;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    {
+      /* { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }  */
+      try_2_lvar1 = 1;
+    }
+
+  try {
+    try_2_lvar2 = 1;
+  } catch (...) {
+    try_2_lvar2 = try_2_lvar1; /* { dg-line l_use[incr c_use] } */
+  }
+
+  return try_2_lvar2;
+
+/* { dg-missed {'map\(force_tofrom:try_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'try_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+   { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use } */
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
new file mode 100644
index 000000000000..ce3e556faf26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
@@ -0,0 +1,588 @@
+! { dg-additional-options "-fdump-tree-gimple-raw" }
+! { dg-additional-options "-fopt-info-omp-all" }
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_compute[variable c_compute 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_compute } to avoid
+! "WARNING: dg-line var l_compute defined, but not used".
+! { dg-line l_use[variable c_use 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_use } to avoid
+! "WARNING: dg-line var l_use defined, but not used".
+
+module globals
+  use ISO_C_BINDING
+  implicit none
+  integer :: opt_1_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_1_evar1
+  integer :: opt_2_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_2_evar1
+  integer :: opt_3_gvar1 = 1
+  integer(C_INT), bind(C) :: opt_3_evar1
+  integer :: use_1_gvar1 = 1
+  integer(C_INT), bind(C) :: use_1_evar1
+  integer :: use_2_gvar1 = 1
+  integer(C_INT), bind(C) :: use_2_evar1
+  integer :: use_2_a1(100)
+  integer(C_INT), bind(C) :: lcf_1_evar2
+  integer(C_INT), bind(C) :: lcf_2_evar2
+  integer(C_INT), bind(C) :: lcf_3_evar2
+  integer(C_INT), bind(C) :: lcf_4_evar2
+  integer(C_INT), bind(C) :: lcf_5_evar2
+  integer(C_INT), bind(C) :: lcf_6_evar2
+  save
+end module globals
+
+subroutine opt_1 (opt_1_pvar1)
+  use globals
+  implicit none
+  integer :: opt_1_pvar1
+  integer :: opt_1_lvar1
+  integer, save :: opt_1_svar1 = 3
+  integer :: dummy1, dummy2, dummy3, dummy4, dummy5
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_1_pvar1;
+    dummy2 = opt_1_lvar1;
+
+    dummy3 = opt_1_gvar1;
+    dummy4 = opt_1_evar1;
+    dummy5 = opt_1_svar1;
+  !$acc end kernels
+
+! Parameter is pass-by-reference
+! { dg-missed {'map\(force_tofrom:\*opt_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-missed {'map\(force_tofrom:opt_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_1_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_1
+
+subroutine opt_2 (opt_2_pvar1)
+  use globals
+  implicit none
+  integer :: opt_2_pvar1
+  integer :: opt_2_lvar1
+  integer, save :: opt_2_svar1 = 3
+  integer :: dummy1, dummy2, dummy3, dummy4, dummy5
+
+  !$acc kernels    ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_2_pvar1;
+    dummy2 = opt_2_lvar1;
+
+    dummy3 = opt_2_gvar1;
+    dummy4 = opt_2_evar1;
+    dummy5 = opt_2_svar1;
+  !$acc end kernels
+
+  ! A write does not inhibit optimization.
+  opt_2_pvar1 = 0;
+  opt_2_lvar1 = 1;
+
+  opt_2_gvar1 = 10;
+  opt_2_evar1 = 11;
+  opt_2_svar1 = 12;
+
+! { dg-missed {'map\(force_tofrom:\*opt_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_2_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy5\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_2
+
+subroutine opt_3 (opt_3_pvar1)
+  use globals
+  implicit none
+  integer :: opt_3_pvar1
+  integer :: opt_3_lvar1
+  integer, save :: opt_3_svar1 = 3
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    opt_3_pvar1 = 0;
+    opt_3_lvar1 = 1;
+
+    opt_3_gvar1 = 10;
+    opt_3_evar1 = 11;
+    opt_3_svar1 = 12;
+  !$acc end kernels
+
+! Parameter is pass-by-reference
+! { dg-missed {'map\(force_tofrom:\*opt_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*opt_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-optimized {'map\(force_tofrom:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:opt_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(opt_3_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-missed {'map\(force_tofrom:opt_3_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_3_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_3_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+
+! { dg-missed {'map\(force_tofrom:opt_3_svar1 \[len: 4\]\[implicit\]\)' not optimized: 'opt_3_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_3
+
+subroutine opt_4 ()
+  implicit none
+  integer, dimension(10) :: opt_4_larray1
+  integer :: dummy1, dummy2
+
+  ! TODO Fortran local arrays are addressable (and may be visible to nested
+  ! functions, etc.) so they are not optimizable yet.
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    dummy1 = opt_4_larray1(4)
+    dummy2 = opt_4_larray1(8)
+  !$acc end kernels
+
+! { dg-missed {'map\(tofrom:opt_4_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_4_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:dummy1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy2\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_4
+
+subroutine opt_5 (opt_5_pvar1)
+  implicit none
+  integer, dimension(10) :: opt_5_larray1
+  integer :: opt_5_lvar1, opt_5_pvar1
+
+  opt_5_lvar1 = opt_5_pvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    opt_5_larray1(opt_5_lvar1) = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(tofrom:opt_5_larray1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'opt_5_larray1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+!
+! { dg-optimized {'map\(force_tofrom:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:opt_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine opt_5
+
+subroutine use_1 (use_1_pvar1)
+  use globals
+  implicit none
+  integer :: use_1_pvar1
+  integer :: use_1_lvar1
+  integer, save :: use_1_svar1 = 3
+  integer :: s
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    use_1_pvar1 = 0;
+    use_1_lvar1 = 1;
+
+    ! FIXME: svar is optimized: should not be
+    use_1_gvar1 = 10;
+    use_1_evar1 = 11;
+    use_1_svar1 = 12;
+  !$acc end kernels
+
+  s = 0
+  s = s + use_1_pvar1
+  s = s + use_1_lvar1 ! { dg-missed {\.\.\. here} "" { target *-*-* } }
+  s = s + use_1_gvar1
+  s = s + use_1_evar1
+  s = s + use_1_svar1
+
+! { dg-missed {'map\(force_tofrom:\*use_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_1_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_1_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine use_1
+
+subroutine use_2 (use_2_pvar1)
+  use globals
+  implicit none
+  integer :: use_2_pvar1
+  integer :: use_2_lvar1
+  integer, save :: use_2_svar1 = 3
+  integer :: s
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    use_2_pvar1 = 0;
+    use_2_lvar1 = 1;
+    use_2_gvar1 = 10;
+    use_2_evar1 = 11;
+    use_2_svar1 = 12;
+  !$acc end kernels
+
+  s = 0
+  s = s + use_2_a1(use_2_pvar1)
+  s = s + use_2_a1(use_2_lvar1) ! { dg-missed {\.\.\. here} "" { target *-*-* } }
+  s = s + use_2_a1(use_2_gvar1)
+  s = s + use_2_a1(use_2_evar1)
+  s = s + use_2_a1(use_2_svar1)
+
+! { dg-missed {'map\(force_tofrom:\*use_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*use_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_gvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_gvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_evar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_evar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:use_2_svar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'use_2_svar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine use_2
+
+! Optimization inhibited because of looping/control flow.
+
+subroutine lcf_1 (lcf_1_pvar1, iter)
+  use globals
+  implicit none
+  real :: lcf_1_pvar1
+  real :: lcf_1_lvar1
+  real, save :: lcf_1_svar2
+  integer :: i, iter
+
+  do i = 1, iter ! { dg-line l_use[incr c_use] }
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      lcf_1_pvar1 = 0
+      lcf_1_lvar1 = 1
+      lcf_1_evar2 = 2
+      lcf_1_svar2 = 3
+    !$acc end kernels
+  end do
+
+! { dg-missed {'map\(force_tofrom:\*lcf_1_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_1_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_1_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_1_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_1
+
+subroutine lcf_2 (lcf_2_pvar1)
+  use globals
+  implicit none
+  real :: lcf_2_pvar1
+  real :: lcf_2_lvar1
+  real, save :: lcf_2_svar2
+  integer :: dummy
+
+10 dummy = 1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_2_pvar1 = 0
+    lcf_2_lvar1 = 1
+    lcf_2_evar2 = 2
+    lcf_2_svar2 = 3
+  !$acc end kernels
+
+  go to 10 ! { dg-line l_use[incr c_use] }
+
+! { dg-missed {'map\(force_tofrom:\*lcf_2_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_2_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_2_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_2_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_2
+
+subroutine lcf_3 (lcf_3_pvar1)
+  use globals
+  implicit none
+  real :: lcf_3_pvar1
+  real :: lcf_3_lvar1
+  real, save :: lcf_3_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_3_pvar1 = 0
+    lcf_3_lvar1 = 1
+    lcf_3_evar2 = 2
+    lcf_3_svar2 = 3
+  !$acc end kernels
+
+  ! Backward jump after kernel
+10 dummy = 1
+  go to 10 ! { dg-line l_use[incr c_use] }
+
+! { dg-missed {'map\(force_tofrom:\*lcf_3_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_3_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_lvar1' disguised by looping/control flow...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_3_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_3_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine lcf_3
+
+subroutine lcf_4 (lcf_4_pvar1)
+  use globals
+  implicit none
+  real :: lcf_4_pvar1
+  real :: lcf_4_lvar1
+  real, save :: lcf_4_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_4_pvar1 = 0
+    lcf_4_lvar1 = 1
+    lcf_4_evar2 = 2
+    lcf_4_svar2 = 3
+  !$acc end kernels
+
+  ! Forward jump after kernel
+  go to 10
+10 dummy = 1
+
+! { dg-missed {'map\(force_tofrom:\*lcf_4_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_4_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_4_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_4_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_4_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_4_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_4
+
+subroutine lcf_5 (lcf_5_pvar1, lcf_5_pvar2)
+  use globals
+  implicit none
+  real :: lcf_5_pvar1
+  real :: lcf_5_pvar2
+  real :: lcf_5_lvar1
+  real, save :: lcf_5_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_5_pvar1 = 0
+    lcf_5_lvar1 = 1
+    lcf_5_evar2 = 2
+    lcf_5_svar2 = 3
+  !$acc end kernels
+
+  if (lcf_5_pvar2 > 0) then
+    dummy = 1
+  end if
+
+! { dg-missed {'map\(force_tofrom:\*lcf_5_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_5_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_5_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_5_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_5_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_5_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_5_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_5
+
+subroutine lcf_6 (lcf_6_pvar1, lcf_6_pvar2)
+  use globals
+  implicit none
+  real :: lcf_6_pvar1
+  real :: lcf_6_pvar2
+  real :: lcf_6_lvar1
+  real, save :: lcf_6_svar2
+  integer :: dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    lcf_6_pvar1 = 0
+    lcf_6_lvar1 = 1
+    lcf_6_evar2 = 2
+    lcf_6_svar2 = 3
+  !$acc end kernels
+
+  dummy = merge(1,0, lcf_6_pvar2 > 0)
+
+! { dg-missed {'map\(force_tofrom:\*lcf_6_pvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*lcf_6_pvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:lcf_6_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(lcf_6_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_6_evar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_evar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:lcf_6_svar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'lcf_6_svar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine lcf_6
+
+subroutine priv_1 ()
+  implicit none
+  integer :: priv_1_lvar1, priv_1_lvar2, priv_1_lvar3, priv_1_lvar4
+  integer :: priv_1_lvar5, priv_1_lvar6, dummy
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ! { dg-message {note: beginning 'Graphite' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
+    priv_1_lvar1 = 1
+    dummy = priv_1_lvar2
+
+    if (priv_1_lvar2 > 0) then
+        priv_1_lvar3 = 1
+    else
+        priv_1_lvar3 = 2
+    end if
+
+    priv_1_lvar5 = priv_1_lvar3
+
+    if (priv_1_lvar2 > 0) then
+        priv_1_lvar4 = 1
+        dummy = priv_1_lvar4
+    end if
+  !$acc end kernels
+
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-bogus {'map\(to:priv_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar2\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar3 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar3\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar4 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar4\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:priv_1_lvar5 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(priv_1_lvar5\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(force_tofrom:dummy \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:dummy \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(dummy\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine priv_1
+
+subroutine multiple_kernels_1 ()
+  implicit none
+  integer :: multiple_kernels_1_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    multiple_kernels_1_lvar1 = 1
+  !$acc end kernels
+
+  !$acc kernels ! { dg-line l_use[incr c_use] }
+    multiple_kernels_1_lvar1 = multiple_kernels_1_lvar1 + 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_1_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+
+! { dg-optimized {'map\(force_tofrom:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:multiple_kernels_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_use$c_use }
+end subroutine multiple_kernels_1
+
+subroutine multiple_kernels_2 ()
+  implicit none
+  integer :: multiple_kernels_2_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    multiple_kernels_2_lvar1 = 1
+  !$acc end kernels
+
+  !$acc parallel
+    multiple_kernels_2_lvar1 = multiple_kernels_2_lvar1 + 1 ! { dg-line l_use[incr c_use] }
+  !$acc end parallel
+
+! { dg-missed {'map\(force_tofrom:multiple_kernels_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'multiple_kernels_2_lvar1' used...} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {\.\.\. here} "" { target *-*-* } l_use$c_use }
+end subroutine multiple_kernels_2
+
+integer function ref_1 ()
+  implicit none
+  integer, target :: ref_1_lvar1
+  integer, target :: ref_1_lvar2
+  integer, pointer :: ref_1_ref1
+
+  ref_1_ref1 => ref_1_lvar1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_1_lvar1 = 1
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_1_lvar2 = 2
+  !$acc end kernels
+
+  ref_1 = ref_1_ref1
+
+! { dg-missed {'map\(force_tofrom:ref_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_1_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_1_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end function ref_1
+
+integer function ref_2 ()
+  implicit none
+  integer, target :: ref_2_lvar1
+  integer, target :: ref_2_lvar2
+  integer, pointer :: ref_2_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_2_lvar1 = 1
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_2_lvar2 = 2
+  !$acc end kernels
+
+  ref_2_ref1 => ref_2_lvar1
+  ref_2 = ref_2_ref1
+
+! { dg-missed {'map\(force_tofrom:ref_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_2_lvar2 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_2_lvar2' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end function ref_2
+
+subroutine ref_3 ()
+  implicit none
+  integer, target :: ref_3_lvar1
+  integer, pointer :: ref_3_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_3_ref1 => ref_3_lvar1
+
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_3_lvar1 = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:\*ref_3_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_3_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_3_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_3_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine ref_3
+
+subroutine ref_4 ()
+  implicit none
+  integer, target :: ref_4_lvar1
+  integer, pointer :: ref_4_ref1
+
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+    ref_4_ref1 => ref_4_lvar1
+
+    ! FIXME: currently considered unsuitable, but could be optimized
+    ref_4_ref1 = 1
+  !$acc end kernels
+
+! { dg-missed {'map\(force_tofrom:\*ref_4_ref1 \[len: [0-9]+\]\[implicit\]\)' not optimized: '\*ref_4_ref1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+! { dg-missed {'map\(force_tofrom:ref_4_lvar1 \[len: [0-9]+\]\[implicit\]\)' not optimized: 'ref_4_lvar1' is unsuitable for privatization} "" { target *-*-* } l_compute$c_compute }
+end subroutine ref_4
+
+subroutine conditional_1 (conditional_1_pvar1)
+  implicit none
+  integer :: conditional_1_pvar1
+  integer :: conditional_1_lvar1
+
+  conditional_1_lvar1 = 1
+
+  if (conditional_1_pvar1 > 0) then
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      conditional_1_lvar1 = 2
+    !$acc end kernels
+  else
+    conditional_1_lvar1 = 3
+  end if
+
+! { dg-optimized {'map\(force_tofrom:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:conditional_1_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_1_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine conditional_1
+
+subroutine conditional_2 (conditional_2_pvar1)
+  implicit none
+  integer :: conditional_2_pvar1
+  integer :: conditional_2_lvar1
+
+  conditional_2_lvar1 = 1
+
+  if (conditional_2_pvar1 > 0) then
+    conditional_2_lvar1 = 3
+  else
+    !$acc kernels ! { dg-line l_compute[incr c_compute] }
+      conditional_2_lvar1 = 2
+    !$acc end kernels
+  end if
+
+! { dg-optimized {'map\(force_tofrom:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_compute$c_compute }
+! { dg-optimized {'map\(to:conditional_2_lvar1 \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(conditional_2_lvar1\)'} "" { target *-*-* } l_compute$c_compute }
+end subroutine conditional_2
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
index b2aae1df5229..97fbe1268b73 100644
--- a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
@@ -5,6 +5,8 @@ subroutine foo
   integer :: i

   !$acc kernels
+  ! { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 }
+  !TODO See discussion in '../../c-c++-common/goacc/uninit-copy-clause.c'.
   i = 1
   !$acc end kernels

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 5ffd2799ac2c..6fde66bdd8de 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -419,6 +419,7 @@ extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_data_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index dd83557b6aa2..25d545559b15 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -23,6 +23,8 @@ int main()
   int b[N] = { 0 };

 #pragma acc kernels
+  /* { dg-missed {'map\(tofrom:b [^)]+\)' not optimized: 'b' is unsuitable for privatization} "" { target *-*-* } .-1 }
+     { dg-missed {'map\(force_tofrom:a [^)]+\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } .-2 } */
   {
     int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 74ee6fde84f8..994a8a35110f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -17,6 +17,10 @@ subroutine kernel(lo, hi, a, b, c)
   real, dimension(lo:hi) :: a, b, c

   !$acc kernels copyin(lo, hi)
+  ! { dg-optimized {'map\(force_tofrom:offset.[0-9]+ [^)]+\)' optimized to 'map\(to:offset.[0-9]+ [^)]+\)'} "" { target *-*-* } .-1 }
+  ! { dg-missed {'map\(tofrom:\*c [^)]+\)' not optimized: '\*c' is unsuitable for privatization} "" { target *-*-* } .-2 }
+  ! { dg-missed {'map\(tofrom:\*b [^)]+\)' not optimized: '\*b' is unsuitable for privatization} "" { target *-*-* } .-3 }
+  ! { dg-missed {'map\(tofrom:\*a [^)]+\)' not optimized: '\*a' is unsuitable for privatization} "" { target *-*-* } .-4 }
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
   ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }

From patchwork Wed Nov 17 16:03:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47827
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id E46163858406
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:17:41 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 1D1B9385842C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:39 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1D1B9385842C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa1.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:39 -0800
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 15/22] openacc: Add runtime alias checking
 for OpenACC kernels
Date: Wed, 17 Nov 2021 17:03:23 +0100
Message-ID: <20211117160330.20029-15-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0

From: Andrew Stubbs <ams@codesourcery.com>

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition is generated in Graphite and passed as an additional
argument to the IFN_GOACC_LOOP internal function calls, whose expansion
evaluates it.  If aliasing is detected at run time, the execution
dimensions are adjusted so that the affected loops execute sequentially.
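For illustration, the kind of test a runtime alias check performs can be
sketched in C.  This is a simplified, hypothetical model, not the
condition that generate_alias_cond actually builds: it assumes that each
access covers one contiguous byte range and treats two accesses as
independent only if their ranges are disjoint.  The helper names
ranges_noalias and choose_workers are invented for this sketch; the
result plays the role of the "noalias" value handed to the GOACC_LOOP
calls, with 0 forcing a single worker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if the byte ranges [a, a + a_len) and
   [b, b + b_len) do not overlap, 0 otherwise.  Addresses are compared as
   integers because relational comparison of pointers into different
   objects is undefined in ISO C.  */
static int
ranges_noalias (const void *a, size_t a_len, const void *b, size_t b_len)
{
  uintptr_t a_lo = (uintptr_t) a, a_hi = a_lo + a_len;
  uintptr_t b_lo = (uintptr_t) b, b_hi = b_lo + b_len;
  return a_hi <= b_lo || b_hi <= a_lo;
}

/* Hypothetical: choose the number of workers for a loop such as
   "for (i = 0; i < n; i++) a[i] = b[i] + 1;" whose accesses to A and B
   could not be disambiguated at compile time.  */
static unsigned
choose_workers (int *a, const int *b, size_t n, unsigned num_threads)
{
  int noalias = ranges_noalias (a, n * sizeof *a, b, n * sizeof *b);
  /* Aliasing detected: fall back to one worker, i.e. sequential.  */
  return noalias ? num_threads : 1;
}
```

With int buf[16], choose_workers (buf, buf + 8, 8, 32) keeps all 32
workers because the two halves are disjoint, while
choose_workers (buf + 1, buf, 8, 32) detects the overlap and returns 1.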

gcc/ChangeLog:

        * graphite-isl-ast-to-gimple.c: Include internal-fn.h.
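        (graphite_oacc_analyze_scop): Implement runtime alias checks.
        * graphite-scop-detection.c (dr_defs_outside_region): Return the
        computed result instead of unconditional success.
        (dr_well_analyzed_for_runtime_alias_check_p): Store the result in a
        local variable and dump the data reference only on failure.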
        * omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
        to GOACC_LOOP internal calls, and initialise it to integer_one_node.
        * omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
        into the GOACC_LOOP expansion.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
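The effect of the new "noalias" argument on the GOACC_LOOP expansion can
be modelled with plain integer arithmetic.  The following is a simplified
sketch, not the compiler code: it mirrors only the non-vectorizer STEP
case and the chunking CHUNKS case of oacc_xform_loop.  The function names
and the NUM_THREADS parameter (standing in for the value computed by
oacc_thread_numbers) are hypothetical, and NOALIAS is assumed to be 0 or 1.

```c
/* Hypothetical helper: the MAX_EXPR used by the expansion.  */
static long
max_long (long a, long b)
{
  return a > b ? a : b;
}

/* IFN_GOACC_LOOP_STEP, non-vectorizer case:
   step * MAX (num_threads * noalias, 1).  */
static long
goacc_loop_step (long num_threads, long noalias, long step)
{
  return max_long (num_threads * noalias, 1) * step;
}

/* IFN_GOACC_LOOP_CHUNKS with chunking enabled:
   (range - dir + per) / per, where
   per = MAX (num_threads * noalias, 1) * chunk_size * step.  */
static long
goacc_loop_chunks (long range, long dir, long step, long chunk_size,
                   long num_threads, long noalias)
{
  long per = max_long (num_threads * noalias, 1) * chunk_size * step;
  return (range - dir + per) / per;
}
```

With 32 threads, a unit step, 1024 iterations and a chunk size of 4, a
passing check (NOALIAS = 1) gives a step of 32 and 8 chunks.  A failing
check (NOALIAS = 0) collapses the thread count to 1, giving a step of 1
and 256 chunks; the OFFSET and BOUND expansions, which also receive
NOALIAS, are not modelled here.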
---
 gcc/graphite-isl-ast-to-gimple.c              | 122 ++++++
 gcc/graphite-scop-detection.c                 |  18 +-
 gcc/omp-expand.c                              |  37 +-
 gcc/omp-offload.c                             | 413 ++++++++++--------
 .../runtime-alias-check-1.c                   |  79 ++++
 .../runtime-alias-check-2.c                   |  90 ++++
 6 files changed, 550 insertions(+), 209 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c516170d9493..bdabe588c3d8 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1698,6 +1699,127 @@ graphite_oacc_analyze_scop (scop_p scop)
       print_isl_schedule (dump_file, scop->original_schedule);
     }

+  if (flag_graphite_runtime_alias_checks
+      && scop->unhandled_alias_ddrs.length () > 0)
+    {
+      sese_info_p region = scop->scop_info;
+
+      /* Usually there will be a chunking loop with the actual work loop
+        inside it.  In some corner cases there may only be one loop.  */
+      loop_p top_loop = region->region.entry->dest->loop_father;
+      loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+      tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+      /* Walk back to GOACC_LOOP block.  */
+      basic_block goacc_loop_block = region->region.entry->src;
+
+      /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+        OpenACC kernels loop and will need different handling.  */
+      gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+      while (!gsi_end_p (gsitop)
+            && (!is_gimple_call (gsi_stmt (gsitop))
+                || !gimple_call_internal_p (gsi_stmt (gsitop))
+                || (gimple_call_internal_fn (gsi_stmt (gsitop))
+                    != IFN_GOACC_LOOP)))
+       gsi_next (&gsitop);
+
+      if (!gsi_end_p (gsitop))
+       {
+         /* Move the GOACC_LOOP CHUNKS and STEP calls to after any hoisted
+            statements.  There ought not to be any problematic dependencies
+            because the chunk size and step are only computed for very
+            specific purposes.  They may not be at the very top of the block,
+            but they should be found together (the assert tests this
+            assumption).  */
+         gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+         gsi_move_after (&gsitop, &gsibottom);
+         gimple_stmt_iterator gsiinsert = gsibottom;
+         gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+                              && gimple_call_internal_p (gsi_stmt (gsitop))
+                              && (gimple_call_internal_fn (gsi_stmt (gsitop))
+                                  == IFN_GOACC_LOOP));
+         gsi_move_after (&gsitop, &gsibottom);
+
+         /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+            Note that these likely depend on some of the hoisted statements.  */
+         tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL,
+                                                   true, GSI_NEW_STMT);
+
+         /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+         for (int n = -1; n < (int)region->bbs.length (); n++)
+           {
+             /* Cover the region plus goacc_loop_block.  */
+             basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+             for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+                  !gsi_end_p (gsi);
+                  gsi_next (&gsi))
+               {
+                 gimple *stmt = gsi_stmt (gsi);
+                 if (!is_gimple_call (stmt)
+                     || !gimple_call_internal_p (stmt))
+                   continue;
+
+                 gcall *goacc_call = as_a <gcall*> (stmt);
+                 if (gimple_call_internal_fn (goacc_call) != IFN_GOACC_LOOP)
+                   continue;
+
+                 enum ifn_goacc_loop_kind code = (enum ifn_goacc_loop_kind)
+                   TREE_INT_CST_LOW (gimple_call_arg (goacc_call, 0));
+                 int argno = 0;
+                 switch (code)
+                   {
+                   case IFN_GOACC_LOOP_CHUNKS:
+                   case IFN_GOACC_LOOP_STEP:
+                     argno = 6;
+                     break;
+
+                   case IFN_GOACC_LOOP_OFFSET:
+                   case IFN_GOACC_LOOP_BOUND:
+                     argno = 7;
+                     break;
+
+                   default:
+                     gcc_unreachable ();
+                   }
+
+                 gimple_call_set_arg (goacc_call, argno, cond_val);
+                 update_stmt (goacc_call);
+
+                 if (dump_enabled_p () && (dump_flags & TDF_DETAILS))
+                   dump_printf (MSG_NOTE,
+                                "Runtime alias condition applied to: %G",
+                                goacc_call);
+               }
+           }
+       }
+      else
+       {
+         /* There weren't any GOACC_LOOP calls where we expected to find them,
+            therefore this isn't an OpenACC parallel loop.  If it runs
+            sequentially then there's no need to worry about aliasing, so
+            nothing much to do here.  */
+         if (dump_enabled_p ())
+           dump_printf (MSG_NOTE, "Runtime alias check *not* inserted for"
+                        " bb %d (GOACC_LOOP not found)",
+                        goacc_loop_block->index);
+
+         /* Unset can_be_parallel, in case something else might use it.  */
+         for (unsigned int i = 0; i < region->bbs.length (); i++)
+           if (region->bbs[i]->loop_father)
+             region->bbs[i]->loop_father->can_be_parallel = 0;
+       }
+
+      /* The loop-nest vec is shared by all DDRs. */
+      DDR_LOOP_NEST (scop->unhandled_alias_ddrs[0]).release ();
+
+      unsigned int i;
+      struct data_dependence_relation *ddr;
+
+      FOR_EACH_VEC_ELT (scop->unhandled_alias_ddrs, i, ddr)
+       if (ddr)
+         free_dependence_relation (ddr);
+      scop->unhandled_alias_ddrs.truncate (0);
+    }
+
   /* Analyze dependences in SCoP and mark loops as parallelizable accordingly. */
   isl_schedule_foreach_schedule_node_top_down (
       scop->original_schedule, visit_schedule_loop_node, scop->dependence);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3d4ee30e8250..8b41044bce5e 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1679,7 +1679,7 @@ dr_defs_outside_region (const sese_l &region, data_reference_p dr)
           break;
         }

-  return opt_result::success ();
+  return res;
 }

 /* Check that all constituents of DR that are used by the
@@ -1691,21 +1691,23 @@ dr_well_analyzed_for_runtime_alias_check_p (data_reference_p dr)
   static const char* error =
     "data-reference not well-analyzed for runtime check.";
   gimple* stmt = DR_STMT (dr);
+  opt_result res = opt_result::success ();

   if (! DR_BASE_ADDRESS (dr))
-    return opt_result::failure_at (stmt, "%s no base address.\n", error);
+    res = opt_result::failure_at (stmt, "%s no base address.\n", error);
   else if (! DR_OFFSET (dr))
-    return opt_result::failure_at (stmt, "%s no offset.\n", error);
+    res = opt_result::failure_at (stmt, "%s no offset.\n", error);
   else if (! DR_INIT (dr))
-    return opt_result::failure_at (stmt, "%s no init.\n", error);
+    res = opt_result::failure_at (stmt, "%s no init.\n", error);
   else if (! DR_STEP (dr))
-    return opt_result::failure_at (stmt, "%s no step.\n", error);
+    res = opt_result::failure_at (stmt, "%s no step.\n", error);
   else if (! tree_fits_uhwi_p (DR_STEP (dr)))
-    return opt_result::failure_at (stmt, "%s step too large.\n", error);
+    res = opt_result::failure_at (stmt, "%s step too large.\n", error);

-  DEBUG_PRINT (dump_data_reference (dump_file, dr));
+  if (!res)
+    DEBUG_PRINT (dump_data_reference (dump_file, dr));

-  return opt_result::success ();
+  return res;
 }

 /* Return TRUE if it is possible to create a runtime alias check for
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 7a40ea2da1a0..182868501fe7 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7762,10 +7762,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
       ass = gimple_build_assign (chunk_no, expr);
       gsi_insert_before (&gsi, ass, GSI_SAME_STMT);

-      call = gimple_build_call_internal (IFN_GOACC_LOOP, 6,
+      call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
                                         build_int_cst (integer_type_node,
                                                        IFN_GOACC_LOOP_CHUNKS),
-                                        dir, range, s, chunk_size, gwv);
+                                        dir, range, s, chunk_size, gwv,
+                                        integer_one_node);
       gimple_call_set_lhs (call, chunk_max);
       gimple_set_location (call, loc);
       gsi_insert_before (&gsi, call, GSI_SAME_STMT);
@@ -7773,10 +7774,11 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
   else
     chunk_size = chunk_no;

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 6,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_STEP),
-                                    dir, range, s, chunk_size, gwv);
+                                    dir, range, s, chunk_size, gwv,
+                                    integer_one_node);
   gimple_call_set_lhs (call, step);
   gimple_set_location (call, loc);
   gsi_insert_before (&gsi, call, GSI_SAME_STMT);
@@ -7810,20 +7812,20 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
   /* Loop offset & bound go into head_bb.  */
   gsi = gsi_start_bb (head_bb);

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 8,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_OFFSET),
-                                    dir, range, s,
-                                    chunk_size, gwv, chunk_no);
+                                    dir, range, s, chunk_size, gwv, chunk_no,
+                                    integer_one_node);
   gimple_call_set_lhs (call, offset_init);
   gimple_set_location (call, loc);
   gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING);

-  call = gimple_build_call_internal (IFN_GOACC_LOOP, 7,
+  call = gimple_build_call_internal (IFN_GOACC_LOOP, 8,
                                     build_int_cst (integer_type_node,
                                                    IFN_GOACC_LOOP_BOUND),
-                                    dir, range, s,
-                                    chunk_size, gwv, offset_init);
+                                    dir, range, s, chunk_size, gwv,
+                                    offset_init, integer_one_node);
   gimple_call_set_lhs (call, bound);
   gimple_set_location (call, loc);
   gsi_insert_after (&gsi, call, GSI_CONTINUE_LINKING);
@@ -7873,22 +7875,25 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
          tree chunk = build_int_cst (diff_type, 0); /* Never chunked.  */

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_OFFSET);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
-                                            element_s, chunk, e_gwv, chunk);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range,
+                                            element_s, chunk, e_gwv, chunk,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_offset);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_BOUND);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
-                                            element_s, chunk, e_gwv, e_offset);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 8, t, dir, e_range,
+                                            element_s, chunk, e_gwv, e_offset,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_bound);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);

          t = build_int_cst (integer_type_node, IFN_GOACC_LOOP_STEP);
-         call = gimple_build_call_internal (IFN_GOACC_LOOP, 6, t, dir, e_range,
-                                            element_s, chunk, e_gwv);
+         call = gimple_build_call_internal (IFN_GOACC_LOOP, 7, t, dir, e_range,
+                                            element_s, chunk, e_gwv,
+                                            integer_one_node);
          gimple_call_set_lhs (call, e_step);
          gimple_set_location (call, loc);
          gsi_insert_before (&gsi, call, GSI_SAME_STMT);
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 68cc5a9d9e5d..94a975a88660 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -584,6 +584,7 @@ oacc_xform_loop (gcall *call)
   unsigned outer_mask = mask & (~mask + 1); // Outermost partitioning
   unsigned inner_mask = mask & ~outer_mask; // Inner partitioning (if any)
   tree vf_by_vectorizer = NULL_TREE;
+  tree noalias = NULL_TREE;

   /* Skip lowering if return value of IFN_GOACC_LOOP call is not used.  */
   if (!lhs)
@@ -648,202 +649,244 @@ oacc_xform_loop (gcall *call)

   switch (code)
     {
-    default: gcc_unreachable ();
+    default:
+      gcc_unreachable ();

     case IFN_GOACC_LOOP_CHUNKS:
+      noalias = gimple_call_arg (call, 6);
       if (!chunking)
-       r = build_int_cst (type, 1);
+        r = build_int_cst (type, 1);
       else
-       {
-         /* chunk_max
-            = (range - dir) / (chunks * step * num_threads) + dir  */
-         tree per = oacc_thread_numbers (false, mask, &seq);
-         per = fold_convert (type, per);
-         chunk_size = fold_convert (type, chunk_size);
-         per = fold_build2 (MULT_EXPR, type, per, chunk_size);
-         per = fold_build2 (MULT_EXPR, type, per, step);
-         r = fold_build2 (MINUS_EXPR, type, range, dir);
-         r = fold_build2 (PLUS_EXPR, type, r, per);
-         r = build2 (TRUNC_DIV_EXPR, type, r, per);
-       }
+        {
+          /* chunk_max
+             = (range - dir) / (chunks * step * num_threads) + dir,
+             with num_threads replaced by MAX (num_threads * noalias, 1).  */
+          tree per = oacc_thread_numbers (false, mask, &seq);
+          per = fold_convert (type, per);
+          noalias = fold_convert (type, noalias);
+          per = fold_build2 (MULT_EXPR, type, per, noalias);
+          per = fold_build2 (MAX_EXPR, type, per, build_int_cst (type, 1));
+          chunk_size = fold_convert (type, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, chunk_size);
+          per = fold_build2 (MULT_EXPR, type, per, step);
+          r = fold_build2 (MINUS_EXPR, type, range, dir);
+          r = fold_build2 (PLUS_EXPR, type, r, per);
+          r = build2 (TRUNC_DIV_EXPR, type, r, per);
+        }
       break;

     case IFN_GOACC_LOOP_STEP:
+      noalias = gimple_call_arg (call, 6);
       {
-       if (vf_by_vectorizer)
-         r = step;
-       else
-         {
-           /* If striding, step by the entire compute volume, otherwise
-              step by the inner volume.  */
-           unsigned volume = striding ? mask : inner_mask;
-
-           r = oacc_thread_numbers (false, volume, &seq);
-           r = build2 (MULT_EXPR, type, fold_convert (type, r), step);
-         }
+        if (vf_by_vectorizer)
+          r = step;
+        else
+          {
+            /* If striding, step by the entire compute volume, otherwise
+               step by the inner volume.  */
+            unsigned volume = striding ? mask : inner_mask;
+
+            noalias = fold_convert (type, noalias);
+            r = oacc_thread_numbers (false, volume, &seq);
+            r = fold_convert (type, r);
+            r = build2 (MULT_EXPR, type, r, noalias);
+            r = build2 (MAX_EXPR, type, r, build_int_cst (type, 1));
+            r = build2 (MULT_EXPR, type, r, step);
+          }
+        break;
       }
-      break;
-
-    case IFN_GOACC_LOOP_OFFSET:
-      if (vf_by_vectorizer)
-       {
-         /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
-            the loop.  */
-         if (flag_tree_loop_vectorize
-             || !global_options_set.x_flag_tree_loop_vectorize)
-           {
-             /* Enable vectorization on non-SIMT targets.  */
-             basic_block bb = gsi_bb (gsi);
-             class loop *chunk_loop = bb->loop_father;
-             class loop *inner_loop = chunk_loop->inner;
-
-             /* Chunking isn't supported for VF_BY_VECTORIZER loops yet,
-                so we know that the outer chunking loop will be executed just
-                once and the inner loop is the one which must be
-                vectorized (unless it has been optimized out for some
-                reason).  */
-             gcc_assert (!chunking);
-
-             if (inner_loop)
-               {
-                 inner_loop->force_vectorize = true;
-                 inner_loop->safelen = INT_MAX;
-
-                 cfun->has_force_vectorize_loops = true;
-               }
-           }

-         /* ...and expand the abstract loops such that the vectorizer can
-            work on them more effectively.
-
-            It might be nicer to merge this code with the "!striding" case
-            below, particularly if chunking support is added.  */
-         tree warppos
-           = oacc_thread_numbers (true, mask, vf_by_vectorizer, &seq);
-         warppos = fold_convert (diff_type, warppos);
-
-         tree volume
-           = oacc_thread_numbers (false, mask, vf_by_vectorizer, &seq);
-         volume = fold_convert (diff_type, volume);
-
-         tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
-         chunk_size = fold_build2 (PLUS_EXPR, diff_type, range, per);
-         chunk_size = fold_build2 (MINUS_EXPR, diff_type, chunk_size, dir);
-         chunk_size = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size,
-                                   per);
-
-         warppos = fold_build2 (MULT_EXPR, diff_type, warppos, chunk_size);
-
-         tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
-         chunk = fold_build2 (MULT_EXPR, diff_type, chunk, volume);
-         r = fold_build2 (PLUS_EXPR, diff_type, chunk, warppos);
-       }
-      else if (striding)
-       {
-         r = oacc_thread_numbers (true, mask, &seq);
-         r = fold_convert (diff_type, r);
-       }
-      else
-       {
-         tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
-         tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
-         tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
-                                    inner_size, outer_size);
-
-         volume = fold_convert (diff_type, volume);
-         if (chunking)
-           chunk_size = fold_convert (diff_type, chunk_size);
-         else
-           {
-             tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
-             /* chunk_size = (range + per - 1) / per.  */
-             chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
-             chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
-             chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
-           }
-
-         tree span = build2 (MULT_EXPR, diff_type, chunk_size,
-                             fold_convert (diff_type, inner_size));
-         r = oacc_thread_numbers (true, outer_mask, &seq);
-         r = fold_convert (diff_type, r);
-         r = build2 (MULT_EXPR, diff_type, r, span);
-
-         tree inner = oacc_thread_numbers (true, inner_mask, &seq);
-         inner = fold_convert (diff_type, inner);
-         r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
-
-         if (chunking)
-           {
-             tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
-             tree per
-               = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
-             per = build2 (MULT_EXPR, diff_type, per, chunk);
-
-             r = build2 (PLUS_EXPR, diff_type, r, per);
-           }
-       }
-      r = fold_build2 (MULT_EXPR, diff_type, r, step);
-      if (type != diff_type)
-       r = fold_convert (type, r);
-      break;
-
-    case IFN_GOACC_LOOP_BOUND:
-      if (vf_by_vectorizer)
-       {
-         tree volume
-           = oacc_thread_numbers (false, mask, vf_by_vectorizer, &seq);
-         volume = fold_convert (diff_type, volume);
-
-         tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
-         chunk_size = fold_build2 (PLUS_EXPR, diff_type, range, per);
-         chunk_size = fold_build2 (MINUS_EXPR, diff_type, chunk_size, dir);
-         chunk_size = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size,
-                                   per);
-
-         vf_by_vectorizer = fold_convert (diff_type, vf_by_vectorizer);
-         tree vecsize = fold_build2 (MULT_EXPR, diff_type, chunk_size,
-                                     vf_by_vectorizer);
-         vecsize = fold_build2 (MULT_EXPR, diff_type, vecsize, step);
-         tree vecend = fold_convert (diff_type, gimple_call_arg (call, 6));
-         vecend = fold_build2 (PLUS_EXPR, diff_type, vecend, vecsize);
-         r = fold_build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type,
-                          range, vecend);
-       }
-      else if (striding)
-       r = range;
-      else
-       {
-         tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
-         tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
-         tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
-                                    inner_size, outer_size);
-
-         volume = fold_convert (diff_type, volume);
-         if (chunking)
-           chunk_size = fold_convert (diff_type, chunk_size);
-         else
-           {
-             tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
-             /* chunk_size = (range + per - 1) / per.  */
-             chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
-             chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
-             chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
-           }
-
-         tree span = build2 (MULT_EXPR, diff_type, chunk_size,
-                             fold_convert (diff_type, inner_size));
-
-         r = fold_build2 (MULT_EXPR, diff_type, span, step);
+      case IFN_GOACC_LOOP_OFFSET:
+       noalias = gimple_call_arg (call, 7);
+        if (vf_by_vectorizer)
+          {
+            /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+               the loop.  */
+            if (flag_tree_loop_vectorize
+                || !global_options_set.x_flag_tree_loop_vectorize)
+              {
+                /* Enable vectorization on non-SIMT targets.  */
+                basic_block bb = gsi_bb (gsi);
+                class loop *chunk_loop = bb->loop_father;
+                class loop *inner_loop = chunk_loop->inner;
+
+                /* Chunking isn't supported for VF_BY_VECTORIZER loops yet,
+                   so we know that the outer chunking loop will be executed
+                   just once and the inner loop is the one which must be
+                   vectorized (unless it has been optimized out for some
+                   reason).  */
+                gcc_assert (!chunking);
+
+                if (inner_loop)
+                  {
+                    inner_loop->force_vectorize = true;
+                    inner_loop->safelen = INT_MAX;
+
+                    cfun->has_force_vectorize_loops = true;
+                  }
+              }
+
+            /* ...and expand the abstract loops such that the vectorizer can
+               work on them more effectively.
+
+               It might be nicer to merge this code with the "!striding" case
+               below, particularly if chunking support is added.  */
+            tree warppos
+                = oacc_thread_numbers (true, mask, vf_by_vectorizer, &seq);
+            warppos = fold_convert (diff_type, warppos);
+
+            tree volume
+                = oacc_thread_numbers (false, mask, vf_by_vectorizer, &seq);
+            volume = fold_convert (diff_type, volume);
+
+            tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+            chunk_size = fold_build2 (PLUS_EXPR, diff_type, range, per);
+            chunk_size = fold_build2 (MINUS_EXPR, diff_type, chunk_size, dir);
+            chunk_size
+                = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+
+            warppos = fold_build2 (MULT_EXPR, diff_type, warppos, chunk_size);
+
+            tree chunk = fold_convert (diff_type, gimple_call_arg (call, 6));
+            chunk = fold_build2 (MULT_EXPR, diff_type, chunk, volume);
+            r = fold_build2 (PLUS_EXPR, diff_type, chunk, warppos);
+          }
+        else if (striding)
+          {
+            r = oacc_thread_numbers (true, mask, &seq);
+            r = fold_convert (diff_type, r);
+            tree tmp1 = build2 (NE_EXPR, boolean_type_node, r,
+                                fold_convert (diff_type, integer_zero_node));
+            tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias,
+                                boolean_false_node);
+            tree tmp3 = build2 (BIT_AND_EXPR, diff_type,
+                                fold_convert (diff_type, tmp1),
+                                fold_convert (diff_type, tmp2));
+            tree tmp4 = build2 (MULT_EXPR, diff_type, tmp3, range);
+            r = build2 (PLUS_EXPR, diff_type, r, tmp4);
+          }
+        else
+          {
+            tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
+            tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
+            tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                       inner_size, outer_size);
+
+            volume = fold_convert (diff_type, volume);
+            if (chunking)
+              chunk_size = fold_convert (diff_type, chunk_size);
+            else
+              {
+                tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+                /* chunk_size = (range + per - 1) / per.  */
+                chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
+                chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
+                chunk_size = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+              }
+
+            /* When there may be aliasing, curtail the range in all but one
+               thread to prevent parallel execution.  */
+            tree n = oacc_thread_numbers (true, mask, &seq);
+            n = fold_convert (diff_type, n);
+            tree tmp1 = build2 (NE_EXPR, boolean_type_node, n,
+                                fold_convert (diff_type, integer_zero_node));
+            tree tmp2 = build2 (EQ_EXPR, boolean_type_node, noalias,
+                                boolean_false_node);
+            tree tmp3 = build2 (BIT_AND_EXPR, diff_type,
+                                fold_convert (diff_type, tmp1),
+                                fold_convert (diff_type, tmp2));
+            range = build2 (MULT_EXPR, diff_type, tmp3, range);
+
+            tree span = build2 (MULT_EXPR, diff_type, chunk_size,
+                                fold_convert (diff_type, inner_size));
+            r = oacc_thread_numbers (true, outer_mask, &seq);
+            r = fold_convert (diff_type, r);
+            r = build2 (PLUS_EXPR, diff_type, r, range);
+            r = build2 (MULT_EXPR, diff_type, r, span);
+
+            tree inner = oacc_thread_numbers (true, inner_mask, &seq);
+
+            inner = fold_convert (diff_type, inner);
+            r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
+
+            if (chunking)
+              {
+                tree chunk
+                    = fold_convert (diff_type, gimple_call_arg (call, 6));
+                tree per
+                    = fold_build2 (MULT_EXPR, diff_type, volume, chunk_size);
+                per = build2 (MULT_EXPR, diff_type, per, chunk);
+
+                r = build2 (PLUS_EXPR, diff_type, r, per);
+              }
+          }
+        r = fold_build2 (MULT_EXPR, diff_type, r, step);
+        if (type != diff_type)
+          r = fold_convert (type, r);
+        break;

-         tree offset = gimple_call_arg (call, 6);
-         r = build2 (PLUS_EXPR, diff_type, r,
-                     fold_convert (diff_type, offset));
-         r = build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR,
-                     diff_type, r, range);
-       }
-      if (diff_type != type)
-       r = fold_convert (type, r);
-      break;
+      case IFN_GOACC_LOOP_BOUND:
+        if (vf_by_vectorizer)
+          {
+            tree volume
+                = oacc_thread_numbers (false, mask, vf_by_vectorizer, &seq);
+            volume = fold_convert (diff_type, volume);
+
+            tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+            chunk_size = fold_build2 (PLUS_EXPR, diff_type, range, per);
+            chunk_size = fold_build2 (MINUS_EXPR, diff_type, chunk_size, dir);
+            chunk_size
+                = fold_build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+
+            vf_by_vectorizer = fold_convert (diff_type, vf_by_vectorizer);
+            tree vecsize = fold_build2 (MULT_EXPR, diff_type, chunk_size,
+                                        vf_by_vectorizer);
+            vecsize = fold_build2 (MULT_EXPR, diff_type, vecsize, step);
+            tree vecend = fold_convert (diff_type, gimple_call_arg (call, 6));
+            vecend = fold_build2 (PLUS_EXPR, diff_type, vecend, vecsize);
+            r = fold_build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR,
+                             diff_type, range, vecend);
+          }
+        else if (striding)
+          r = range;
+        else
+          {
+            noalias = fold_convert (diff_type, gimple_call_arg (call, 7));
+
+            tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
+            tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
+            tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
+                                       inner_size, outer_size);
+
+            volume = fold_convert (diff_type, volume);
+            volume = fold_build2 (MULT_EXPR, diff_type, volume, noalias);
+            volume = fold_build2 (MAX_EXPR, diff_type, volume,
+                                  fold_convert (diff_type, integer_one_node));
+            if (chunking)
+              chunk_size = fold_convert (diff_type, chunk_size);
+            else
+              {
+                tree per = fold_build2 (MULT_EXPR, diff_type, volume, step);
+                /* chunk_size = (range + per - 1) / per.  */
+                chunk_size = build2 (MINUS_EXPR, diff_type, range, dir);
+                chunk_size = build2 (PLUS_EXPR, diff_type, chunk_size, per);
+                chunk_size
+                    = build2 (TRUNC_DIV_EXPR, diff_type, chunk_size, per);
+              }
+
+            tree span = build2 (MULT_EXPR, diff_type, chunk_size,
+                                fold_convert (diff_type, inner_size));
+
+            r = fold_build2 (MULT_EXPR, diff_type, span, step);
+
+            tree offset = gimple_call_arg (call, 6);
+            r = build2 (PLUS_EXPR, diff_type, r,
+                        fold_convert (diff_type, offset));
+            r = build2 (integer_onep (dir) ? MIN_EXPR : MAX_EXPR, diff_type, r,
+                        range);
+          }
+        if (diff_type != type)
+          r = fold_convert (type, r);
+        break;
     }

   gimplify_assign (lhs, r, &seq);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
new file mode 100644
index 000000000000..2fb1c712beb3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
@@ -0,0 +1,79 @@
+/* Test that a simple array copy does the right thing when the input and
+   output data overlap.  The GPU kernel should automatically switch to
+   a sequential operation mode in order to give the expected results.  */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+void f(int *data, int n, int to, int from, int count)
+{
+  /* We cannot use copyin for two overlapping arrays because we get an error
+     that the memory is already present.  We also cannot do the pointer
+     arithmetic inside the kernels region because it just ends up using
+     host pointers (bug?).  Using enter data with a single array and
+     acc_deviceptr solves the problem.  */
+#pragma acc enter data copyin(data[0:n])
+
+  int *a = (int*)acc_deviceptr (data+to);
+  int *b = (int*)acc_deviceptr (data+from);
+
+#pragma acc kernels
+  for (int i = 0; i < count; i++)
+    a[i] = b[i];
+
+#pragma acc exit data copyout(data[0:n])
+}
+
+#define N 2000
+
+int data[N];
+
+int
+main ()
+{
+  for (int i=0; i < N; i++)
+    data[i] = i;
+
+  /* Baseline test; no aliasing. The high part of the data is copied to
+     the lower part.  */
+  int to = 0;
+  int from = N/2;
+  int count = N/2;
+  f (data, N, to, from, count);
+  for (int i=0; i < N; i++)
+    if (data[i] != (i%count)+count)
+      exit (1);
+
+  /* Check various amounts of data overlap.  */
+  int tests[] = {1, 10, N/4, N/2-10, N/2-1};
+  for (int t = 0; t < sizeof (tests)/sizeof(tests[0]); t++)
+    {
+      for (int i=0; i < N; i++)
+       data[i] = i;
+
+      /* Output overlaps the latter part of input; expect the non-aliased
+        initial part of the input to repeat throughout the aliased portion.  */
+      to = tests[t];
+      from = 0;
+      count = N-tests[t];
+      f (data, N, to, from, count);
+      for (int i=0; i < N; i++)
+       if (data[i] != i%tests[t])
+         exit (2);
+
+      for (int i=0; i < N; i++)
+       data[i] = i;
+
+      /* Input overlaps the latter part of the output; expect the copy to work
+        in the obvious manner.  */
+      to = 0;
+      from = tests[t];
+      count = N-tests[t];
+      f (data, N, to, from, count);
+      for (int i=0; i < count; i++)
+       if (data[i+to] != i+tests[t])
+         exit (3);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c
new file mode 100644
index 000000000000..96c03297d5b4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c
@@ -0,0 +1,90 @@
+/* Test that a simple array copy does the right thing when the input and
+   output data overlap.  The GPU kernel should automatically switch to
+   a sequential operation mode in order to give the expected results.
+
+   This test does not check the correctness of the output (there are other
+   tests for that), but checks that the code really does select the faster
+   path, when it can, by comparing the timing.  */
+
+/* No optimization means no issue with aliasing.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } }
+   { dg-skip-if "" { *-*-* } { "-foffload=disable" } { "" } } */
+
+#include <stdlib.h>
+#include <sys/time.h>
+#include <openacc.h>
+
+void f(int *data, int n, int to, int from, int count)
+{
+  int *a = (int*)acc_deviceptr (data+to);
+  int *b = (int*)acc_deviceptr (data+from);
+
+#pragma acc kernels
+  for (int i = 0; i < count; i++)
+    a[i] = b[i];
+}
+
+#define N 1000000
+int data[N];
+
+int
+main ()
+{
+  struct timeval start, stop, difference;
+  long basetime, aliastime;
+
+  for (int i=0; i < N; i++)
+    data[i] = i;
+
+  /* Ensure that the data copies are outside the timed zone.  */
+#pragma acc enter data copyin(data[0:N])
+
+  /* Baseline test; no aliasing. The high part of the data is copied to
+     the lower part.  */
+  int to = 0;
+  int from = N/2;
+  int count = N/2;
+  gettimeofday (&start, NULL);
+  f (data, N, to, from, count);
+  gettimeofday (&stop, NULL);
+  timersub (&stop, &start, &difference);
+  basetime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+  /* Check various amounts of data overlap.  */
+  int tests[] = {1, 10, N/4, N/2-10, N/2-1};
+  for (int i = 0; i < sizeof (tests)/sizeof(tests[0]); i++)
+    {
+      to = 0;
+      from = N/2 - tests[i];
+      gettimeofday (&start, NULL);
+      f (data, N, to, from, count);
+      gettimeofday (&stop, NULL);
+      timersub (&stop, &start, &difference);
+      aliastime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+      /* If the aliased runtime is less than 200% of the non-aliased runtime
+        then the runtime alias check probably selected the wrong path.
+        (Actually we expect the difference to be far greater than that.)  */
+      if (basetime*2 > aliastime)
+       exit (1);
+    }
+
+  /* Repeat the baseline check just to make sure it didn't also get slower
+     after the first run.  */
+  to = 0;
+  from = N/2;
+  gettimeofday (&start, NULL);
+  f (data, N, to, from, count);
+  gettimeofday (&stop, NULL);
+  timersub (&stop, &start, &difference);
+  long controltime = difference.tv_sec * 1000000 + difference.tv_usec;
+
+  /* The two times should be roughly the same, but we just check it wouldn't
+     pass the aliastime test above.  */
+  if (basetime*2 <= controltime)
+    exit (2);
+
+#pragma acc exit data copyout(data[0:N])
+
+  return 0;
+}
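
The effect of the noalias argument on the non-striding, non-chunking IFN_GOACC_LOOP_OFFSET and IFN_GOACC_LOOP_BOUND expansions above can be checked with a small host-side model. This is a sketch under simplifying assumptions, not GCC code: a single partitioning level (inner_size == 1, inner thread number 0), step == 1, dir == 1, and each thread walking its half-open range [offset, bound) with unit stride. The names ceil_div and thread_range are hypothetical.

```c
#include <assert.h>

/* Ceiling division, as in "chunk_size = (range + per - 1) / per".  */
static long
ceil_div (long a, long b)
{
  return (a + b - 1) / b;
}

/* Hypothetical model: compute the half-open iteration range [*lo, *hi)
   of thread T out of VOLUME threads.  NOALIAS is 1 if the runtime alias
   check found no overlap and 0 otherwise.  */
static void
thread_range (long range, long volume, long t, long noalias,
              long *lo, long *hi)
{
  assert (range >= 0 && volume > 0);

  /* OFFSET: the chunk size is based on the full thread count.  */
  long chunk = ceil_div (range, volume);
  /* If aliasing is possible, every thread except thread 0 has RANGE
     added to its start, which pushes it past the end of the loop.  */
  long curtail = ((t != 0) & (noalias == 0)) * range;
  long offset = (t + curtail) * chunk;

  /* BOUND: the volume is scaled by NOALIAS and clamped to at least 1,
     so a possibly-aliased loop gets one chunk covering all iterations.  */
  long vol_b = volume * noalias;
  if (vol_b < 1)
    vol_b = 1;
  long bound = offset + ceil_div (range, vol_b);

  *lo = offset;
  *hi = bound < range ? bound : range;
}
```

With range 10 on 4 threads and noalias == 1, the threads get [0, 3), [3, 6), [6, 9) and [9, 10). With noalias == 0, thread 0 gets [0, 10) and every other thread starts at or beyond its bound, so the loop effectively runs sequentially, which is the behavior the runtime-alias-check tests expect.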

From patchwork Wed Nov 17 16:03:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47829
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 459CD385C413
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:19:14 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id EA1403858410
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EA1403858410
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 8SxkoblOGgFoX4XDAuNXzeF85jO3ghAz7wlR8LH52P31SslnibkHlSSpTZ3iGxq+yV7kZu4fSA
 YBLJTw/ED1w/U7TOVZN6HwmdI9YHVaqPaUeOR5Obw+JV4g5LYTHf6vz5KCdN5HRo1BlWuo+DhB
 jFiYq9NTk6Tveu6pK7dItdoGblIIkPHSJLqG52Hh0FpQGw7BRVU6/L7cBKiVkVNPHZVbB4Reo1
 r4cXyXY2wOQdt0w9idsat9IBz7xRjyGzL1gwhy0GvSL8hR5f+hjUqbYzCJH00lk8I5ioImMbGH
 B2iNHKvarFswODBtC+kLKFwG
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445377"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:42 -0800
IronPort-SDR: 
 IKe8aBQwVpfSezwWIAEmlMEhrujwQYqrV69+rI6nDC1LoXP1eacr+Go8Zl/n+FJlJyfnepHmrE
 7qNA51IGXl9YkZQnQefvXeUZaCpeWiwsag1Wr1XgoxapObdpYJcYjNd5qcMykuWB/1ttgt0VwD
 pVgWyNxxzNakle5/0k0HcjcR4Mio+GK0f0LJ9PEbQ/2gb4TK9HgMH3Qz5x6ROFate9WaB8WvOm
 6vEK9oYD2RrEuZ4kir6UZkTSsM2RrWbVqrvm0FZaPLJirSoRvapvZsjAOLFMEo+Cfr5s/wnXWz
 dWM=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 16/22] openacc: Warn about "independent"
 "kernels" loops with data-dependences
Date: Wed, 17 Nov 2021 17:03:24 +0100
Message-ID: <20211117160330.20029-16-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This commit concerns loops in OpenACC "kernels" regions that the user has
explicitly marked with an "independent" clause, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC
mailing list suggested that warning the user about the dependences would
be a more acceptable solution than overriding the user's decision.  This
commit implements that behavior.
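
A minimal sketch of the kind of loop this warning targets, assuming the OG11 Graphite-based handling of "kernels" regions manages to analyze it: iteration i reads a[i - 1], which iteration i - 1 writes, so the user's "independent" assertion is false. The function name chain_fill is hypothetical, and the exact diagnostic GCC prints for this input has not been verified here.

```c
#include <assert.h>

/* Each iteration reads a[i - 1], written by the previous iteration:
   a loop-carried flow dependence despite the "independent" clause.  */
static void
chain_fill (int *a, int n)
{
  assert (n >= 1);
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; i++)
      a[i] = a[i - 1] + 1;
  }
}
```

Compiled without -fopenacc, the pragmas are ignored and the loop runs sequentially, so a[i] == i when a[0] starts at 0; if the loop is actually parallelized on a device, the result may differ, which is exactly why a warning is useful.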

gcc/ChangeLog:

        * common.opt: Add flag Wopenacc-false-independent.
        * omp-offload.c (oacc_loop_warn_if_false_independent): New function.
        (oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt    |  5 +++++
 gcc/omp-offload.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/common.opt b/gcc/common.opt
index aa695e56dc48..4c38ed5cf9ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -838,6 +838,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 94a975a88660..b806e36ef515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2043,6 +2043,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+    return;
+
+  if (loop->routine)
+    return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+    return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+    return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+    {
+      if (dump_enabled_p ())
+       {
+         const dump_user_location_t loc
+           = dump_user_location_t::from_location_t (loop->loc);
+         dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                          "'independent' loop in 'kernels' region has not been "
+                          "analyzed (cf. 'graphite' "
+                          "dumps for more information).\n");
+       }
+      return;
+    }
+
+  if (!can_be_parallel)
+    warning_at (loop->loc, OPT_Wopenacc_false_independent,
+                "loop has %<independent%> clause but data dependences "
+                "were found");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
    programmer-specified partitionings.  OUTER_MASK is the partitioning
    this loop is contained within.  Return mask of partitioning
@@ -2094,6 +2139,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
            }
        }

+      /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+      if (warn_openacc_false_independent)
+        oacc_loop_warn_if_false_independent (loop);
+
       if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
        {
          loop->flags |= OLF_AUTO;

From patchwork Wed Nov 17 16:03:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47830
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id DB382385840E
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:19:52 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 043AA3858410
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 043AA3858410
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 7cE7xKm8DO8rpkX8Xvf3QIDjhPrBIVjof7oPSmoTqyvkZaX1XGmnM1hgu7EcUDIC2GJz6YfG5U
 TZzsLqVYTupqFkSA7bIHDrtrNSc6VjmmW+Vpp4Taf2xyHpxrkFiR5PajJjD49LIbaCf6Pd0t1b
 WwqdEVnRJA1RDKfufZdV6leI+E43VQLzPodk8b/FLcSWZnkqQ8/pNuXGTKJYDmmyyhYWU5lFvH
 EQazfgG6yk52kEt8T8kbsEhrKFhZoVHU6GW1re5voRuGjESNYC9FbYyLF9Vck3A5hLukrzkhXg
 O2QZoMHWIGJ6C8gCJ9LYFUgh
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445390"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:51 -0800
IronPort-SDR: 
 1FaprDfkpS1mA6KpGea7QxS44N4GM9U5Un88/+pls3kIymSo50HPt6bVLXzPXuI+4ZwU4yAeFJ
 gJ9rHsGvyuWb5RO85exIbbVbUq+VQObnzzWqLmvX4SSd3cQYZhS0fPPWZuy911dl1xM+QzXgkZ
 QJg//Wgt6x3atg3JH1pyf5ttV3CEl2sAqjhiLom6QXmgLUSV4vVMXfAimDsuGzG8UyB0bu7zUQ
 ojyOoaDeG8IhfvIBhiEPW/2yVeK1Yj0F2HouuUUC92pTPqp+Bi5w56QaQ1qMCa3H4urwm+FqBD
 s+A=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 17/22] openacc: Handle internal function
 calls in pass_lim
Date: Wed, 17 Nov 2021 17:03:25 +0100
Message-ID: <20211117160330.20029-17-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls.
This was not necessary until the OpenACC device lowering pass was
recently moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls IFN_GOACC_LOOP and IFN_UNIQUE, whose expansion does not introduce
any memory references.  The hoisting enabled by this change can be
useful for the data-dependence analysis in Graphite; for instance, in
the outlined functions for OpenACC regions, all invariant accesses to
the ".omp_data_i" struct should be hoisted out of the OpenACC loop.
This is particularly important for variables that were scalars in the
original loop and which have been turned into accesses to the struct
by the outlining process.  Not hoisting those can prevent scalar
evolution analysis, which is crucial for Graphite.  Since any hoisting
that introduces intermediate names (and hence "fake" dependences)
inside the analyzed nest can be harmful to data-dependence analysis, a
flag to restrict the hoisting in OpenACC functions is added to the
pass.  The pass instance that executes before Graphite now runs with
this flag set to true, and the pass instance after Graphite runs
unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
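
To illustrate why hoisting the ".omp_data_i" accesses matters, here is a hand-written, C-level picture of the transformation. It is only a sketch: the real pass operates on GIMPLE, and the struct omp_data type and the scale_before/scale_after functions are hypothetical stand-ins. The hoisted version is equivalent only under the stated assumption that the stores through a cannot modify the record itself.

```c
#include <assert.h>

/* Hypothetical stand-in for the ".omp_data_i" record created by
   outlining: the bound N and the scalar SCALE of the original source
   have become fields read through a pointer.  */
struct omp_data
{
  int *a;
  int n;
  int scale;
};

/* Before LIM: the bound and the scalar are re-loaded from the record
   in every iteration, so the loop bound is not a simple value that
   scalar evolution can reason about.  */
static void
scale_before (struct omp_data *d)
{
  for (int i = 0; i < d->n; i++)
    d->a[i] = d->a[i] * d->scale;
}

/* After LIM (written by hand): the invariant loads are hoisted into
   locals in front of the loop.  Assumes the stores through A do not
   alias the fields of *D.  */
static void
scale_after (struct omp_data *d)
{
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    a[i] = a[i] * scale;
}
```

Both functions produce the same array contents; the second form exposes a loop bound that is defined outside the loop, which is what the data-dependence analysis in Graphite needs.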

gcc/ChangeLog:

        * passes.def: Set restrict_oacc_hoisting to true for the early
        pass_lim instance.
        * tree-ssa-loop-im.c (movement_possibility): Add
        restrict_oacc_hoisting flag to function; restrict movement if set.
        (compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
        (gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
        calls.
        (loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
        pass it on.
        (pass_lim::execute): Pass on new flags.
        * tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration.
        * gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
        loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def                 |  2 +-
 gcc/tree-ssa-loop-im.c         | 58 ++++++++++++++++++++++++++++------
 gcc/tree-ssa-loop-manip.h      |  2 +-
 4 files changed, 52 insertions(+), 12 deletions(-)

--
2.33.0


diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 7b799eca805c..d617438910fd 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2096,7 +2096,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
     {
       unsigned todo = TODO_update_ssa_only_virtuals;
-      todo |= loop_invariant_motion_in_fun (cfun, false);
+      todo |= loop_invariant_motion_in_fun (cfun, false, false);
       scev_reset ();
       return todo;
     }
diff --git a/gcc/passes.def b/gcc/passes.def
index 48c9821011f0..d1dedbc287e2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,7 +247,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
-      NEXT_PASS (pass_lim);
+      NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
       NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7de47edbcb30..b392ae609aaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -320,11 +322,23 @@ enum move_pos
    Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+      && gimple_code (stmt) == GIMPLE_ASSIGN)
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+       rhs = TREE_OPERAND (rhs, 0);
+
+      if (TREE_CODE (rhs) == ARRAY_REF)
+         return MOVE_IMPOSSIBLE;
+    }
+
   if (flag_unswitch_loops
       && gimple_code (stmt) == GIMPLE_COND)
     {
@@ -974,7 +988,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
    statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1002,7 +1016,7 @@ compute_invariantness (basic_block bb)
       {
        stmt = gsi_stmt (bsi);

-       pos = movement_possibility (stmt);
+       pos = movement_possibility (stmt, restrict_oacc_hoisting);
        if (pos == MOVE_IMPOSSIBLE)
          continue;

@@ -1033,7 +1047,7 @@ compute_invariantness (basic_block bb)
     {
       stmt = gsi_stmt (bsi);

-      pos = movement_possibility (stmt);
+      pos = movement_possibility (stmt, restrict_oacc_hoisting);
       if (pos == MOVE_IMPOSSIBLE)
        {
          if (nonpure_call_p (stmt))
@@ -1458,7 +1472,15 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
   if (!gimple_vuse (stmt))
     return;

+  /* The expansion of those OpenACC internal function calls which occurs in a
+   * later pass does not introduce any memory references. Hence it is safe to
+   * ignore them. */
+  if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP)
+      || gimple_call_internal_p (stmt, IFN_UNIQUE))
+    return;
+
   mem = simple_mem_ref_in_stmt (stmt, &is_stored);
+
   if (!mem)
     {
       /* We use the shared mem_ref for all unanalyzable refs.  */
@@ -1484,7 +1506,7 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
       ao_ref_alias_set (&aor);
       HOST_WIDE_INT offset, size, max_size;
       poly_int64 saved_maxsize = aor.max_size, mem_off;
-      tree mem_base;
+      tree mem_base = NULL;
       bool ref_decomposed;
       if (aor.max_size_known_p ()
          && aor.offset.is_constant (&offset)
@@ -3155,7 +3177,8 @@ tree_ssa_lim_finalize (void)
    Only perform store motion if STORE_MOTION is true.  */

 unsigned int
-loop_invariant_motion_in_fun (function *fun, bool store_motion)
+loop_invariant_motion_in_fun (function *fun, bool store_motion,
+                             bool restrict_oacc_hoisting)
 {
   unsigned int todo = 0;

@@ -3173,7 +3196,7 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
   /* For each statement determine the outermost loop in that it is
      invariant and cost for computing the invariant.  */
   for (int i = 0; i < n; ++i)
-    compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i]));
+    compute_invariantness (BASIC_BLOCK_FOR_FN (fun, rpo[i]), restrict_oacc_hoisting);

   /* Execute store motion.  Force the necessary invariants to be moved
      out of the loops as well.  */
@@ -3220,13 +3243,21 @@ class pass_lim : public gimple_opt_pass
 {
 public:
   pass_lim (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_lim, ctxt)
+    : gimple_opt_pass (pass_data_lim, ctxt), restrict_oacc_hoisting (false)
   {}

+  void set_pass_param (unsigned int n, bool param)
+    {
+      gcc_assert (n == 0);
+      restrict_oacc_hoisting = param;
+    }
+
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_lim (m_ctxt); }
   virtual bool gate (function *) { return flag_tree_loop_im != 0; }
   virtual unsigned int execute (function *);
+private:
+  bool restrict_oacc_hoisting;

 }; // class pass_lim

@@ -3239,7 +3270,16 @@ pass_lim::execute (function *fun)

   if (number_of_loops (fun) <= 1)
     return 0;
-  unsigned int todo = loop_invariant_motion_in_fun (fun, true);
+
+  bool store_motion = true;
+  /* TODO Enabling store motion in OpenACC kernel functions requires further
+     handling of the OpenACC internal function calls.  It can also be harmful
+     to data-dependence analysis. Keep it disabled for now. */
+  if (oacc_function_p (cfun) && graphite_analyze_oacc_target_region_type_p (cfun))
+    store_motion = false;
+
+  unsigned int todo = loop_invariant_motion_in_fun (fun, store_motion,
+                                                   restrict_oacc_hoisting);

   if (!in_loop_pipeline)
     loop_optimizer_finalize ();
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 86fc118b6bef..e0beb624aec9 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -55,7 +55,7 @@ extern void tree_transform_and_unroll_loop (class loop *, unsigned,
 extern void tree_unroll_loop (class loop *, unsigned,
                              edge, class tree_niter_desc *);
 extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
-extern unsigned int loop_invariant_motion_in_fun (function *, bool);
+extern unsigned int loop_invariant_motion_in_fun (function *, bool, bool);


 #endif /* GCC_TREE_SSA_LOOP_MANIP_H */

From patchwork Wed Nov 17 16:03:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47831
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 495FA3858432
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:20:22 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 1FEF43858410
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1FEF43858410
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 kVh7JX6IrfUSIddv47BwwGlwpkGuSlTLVfZrOANFaZJR/s/O4+0+y80l5uZbIa/o5H9xjRaxJD
 Z01HkG4o10sosl7CUJnd7dGuSrKhNFbtduWlfcEBXMgiRCB5MeI3bdGGX9rtGM6QIP52m60iau
 BKv7eWeF3BWAMDt+esquKQf1fpE50N5Gah1YhLj8lkH9dbLJ2LIT+mNdpa44aZfx7KUrilzuQe
 OpThFbcqH4MwWEbbOjQ3BIgWUjSwM3YwzFzMTa9O8YRq5yFyfBeyegTKpaxiieK0fHq0PYOWLQ
 1ibH7sAVIrlCjdvmg6MqQ2oY
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445391"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:52 -0800
IronPort-SDR: 
 wHfDCB6IuQHwK8Zta42QgC5nJVBGBx05eyn2nsna8fpTrNnTFtS3CJMV7vLSY/N8oglAv7+9X3
 SM8Rd9+4jLdzIz9oxuN5ibUbBcTqTqaB9n/Hc34hZGua5Lf/Mb7qpO9VwgqPq3gfjdz/7bjblg
 5mPSf3UR+PYSJ1QqnWNypUG78RQniXtZa813XicvZtuK3Ar4maIWLX8TsQGG1QTwJuFjl5C6LO
 +CiHuBcxHFPRGUu62Gui1rYs5aMsw+NvBd2NRqiR6T9ZmthzmCbwZSYLGi9uGRQn1PensTrO1h
 1so=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 18/22] openacc: Disable pass_pre on outlined
 functions analyzed by Graphite
Date: Wed, 17 Nov 2021 17:03:26 +0100
Message-ID: <20211117160330.20029-18-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions.  On the other hand,
the pass can also enable the analysis of OpenACC loops (see, e.g., the
loop-auto-transfer-4.f90 testcase) because its full redundancy
elimination removes definitions that would otherwise prevent the
creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
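
As a hand-written illustration (not GCC output; the function names
are hypothetical), the sketch below shows the shape a PRE-style
rewrite can take: a load that is partially redundant across the loop
back edge is inserted before the loop, and its value is carried to
the next iteration in a scalar.  Both functions compute the same
result, but the rewritten loop now has a scalar dependence between
consecutive iterations that a dependence analysis such as Graphite's
must model.  Whether pass_pre performs this exact transformation on a
given loop is not claimed here.

```c
/* Original loop: every iteration loads both b[i] and b[i + 1].  */
void
sum_adjacent_original (int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i] + b[i + 1];
}

/* Hypothetical PRE-style rewrite: b[i + 1] loaded in iteration i is
   reused as b[i] in iteration i + 1.  The first value is inserted
   before the loop and then carried in PREV, which creates a
   loop-carried scalar dependence between consecutive iterations.  */
void
sum_adjacent_rewritten (int *a, const int *b, int n)
{
  if (n <= 0)
    return;
  int prev = b[0];		/* Insertion before the loop.  */
  for (int i = 0; i < n; i++)
    {
      int next = b[i + 1];
      a[i] = prev + next;
      prev = next;		/* Value flows into the next iteration.  */
    }
}
```

Calling both functions on the same input array b of n + 1 elements
fills a with identical contents.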

gcc/ChangeLog:

        * tree-ssa-pre.c (insert): Skip any insertions in OpenACC
        functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

--
2.33.0


diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 2aedc31e1d73..b904354e4c78 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
    implement a bit more than just PRE here.  All of them piggy-back
@@ -3736,6 +3737,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+    /* The additional dependences introduced by the code insertions
+     can cause Graphite's dependence analysis to fail.  Without
+     special handling of those dependences in Graphite, it seems
+     better to skip this step if OpenACC loops that need to be handled
+     by Graphite are found.  Note that the full redundancy elimination
+     step of this pass is useful for the purpose of dependence
+     analysis, for instance, because it can remove definitions from
+     SCoPs that would otherwise prevent the creation of runtime alias
+     checks since those may only use definitions that are available
+     before the SCoP. */
+
+  if (oacc_function_p (cfun)
+      && ::graphite_analyze_oacc_function_p (cfun))
+    return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)

From patchwork Wed Nov 17 16:03:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47832
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 930FA385C41C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:20:51 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id 8EBFE3858410
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8EBFE3858410
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 6covh08JQfELSHc1NuMPaGAYppv8apj8SCQU6XQynHkwcZr1Wouqo5fgIYbKP4hIhKfArzfR4u
 F51LHruNB5igNOiFOUhkrrsKCYvudfjaOJAMG/CEoTokriS5aVbdH8Lh6t/xgtDaRLQkquklsL
 9ds1DyUtwFogyHowx0h1535F3+KJL4Rp9dcR6fnzhQ4ZEDdbrvXxB2ZG82XHyROlU3cUuYGUIq
 YkIAMsqAmD4paA//xWm+JKT3faEEewxeKlkRZORPMvPO4fXrxNGzgp4EIE5axRUjrpn4iemTz0
 dTXYbu7No76gjYtVvv9XOALg
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445393"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:55 -0800
IronPort-SDR: 
 QbhpEhKgCUfSsG4ofjBJ1JrTClrXaZMFFNhQc4XRouY1BNAWSlIniUOPL7+3nqrOVysbqxJMDj
 ZlPcjqTALlrlJpE5Xgvf6EWXaNQPnXuPQW664sO1ZWb8sI7WCYU+bUqyxbr4EJGfZqwukBoeul
 cBPXPCu9jSfYUKuPv5FIRQ00r26Hjhd5ihjGKW+qYqAzMQlLVrlKWcqIZNA8w22Q+vY9GF4jqW
 DpR63HkHbEKWc4O2h4KZlN0yDznaeel79WHG7B3y73rk9C49CXsgcM6hB7pDA3Ux3mCGYIkxjO
 p5s=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 19/22] graphite: Tune parameters for OpenACC
 use
Date: Wed, 17 Nov 2021 17:03:27 +0100
Message-ID: <20211117160330.20029-19-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output.  These warnings are phrased uniformly and intentionally refer
to the "data-dependence analysis" of "OpenACC loops" rather than to
"a failure in Graphite", which makes them easier for users to understand.

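The parameter handling added by this commit follows a simple pattern:
the raised OpenACC value is used only if the parameter still has its
default value.  A minimal C sketch of that selection, assuming the
values from this patch (max-isl-operations raised from 350000 to
2000000, graphite-max-arrays-per-scop raised from 100 to 200); the
enumerator and function names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical constants mirroring the values used in this patch.  */
enum
{
  MAX_ISL_OPERATIONS_DEFAULT = 350000,
  MAX_ISL_OPERATIONS_OACC = 2000000,
  MAX_ARRAYS_PER_SCOP_DEFAULT = 100,
  MAX_ARRAYS_PER_SCOP_OACC = 200
};

/* Return the limit to use.  On OpenACC functions, the raised value
   replaces the parameter only while it still equals its default, so
   any other explicit --param setting is respected.  */
int
effective_limit (int param_value, int default_value, int oacc_value,
		 bool is_oacc_function)
{
  if (is_oacc_function && param_value == default_value)
    return oacc_value;
  return param_value;
}
```

One consequence of comparing against the default is that an explicit
--param=max-isl-operations=350000 cannot be distinguished from the
default, so it is raised on OpenACC functions as well.
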
gcc/ChangeLog:

        * graphite-optimize-isl.c (optimize_isl): Adjust
        param_max_isl_operations value for OpenACC functions and add
        special warnings if value gets exceeded.

        * graphite-scop-detection.c (build_scops): Likewise for
        param_graphite_max_arrays_per_scop; add the OpenACC-specific
        message for param_graphite_max_nb_scop_params.

gcc/testsuite/ChangeLog:

        * gcc.dg/goacc/graphite-parameter-1.c: New test.
        * gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c                   | 35 ++++++++++++++++---
 gcc/graphite-scop-detection.c                 | 28 ++++++++++++++-
 .../gcc.dg/goacc/graphite-parameter-1.c       | 21 +++++++++++
 .../gcc.dg/goacc/graphite-parameter-2.c       | 23 ++++++++++++
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

--
2.33.0


diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+     by "kernels" loops in existing OpenACC codes.  Raise the values
+     significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 350000 /* default value */
+      && oacc_function_p (cfun))
+    max_operations = 2000000;
+
   if (max_operations)
     isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
          dump_user_location_t loc = find_loop_location
            (scop->scop_info->region.entry->dest->loop_father);
          if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-           dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-                            "loop nest not optimized, optimization timed out "
-                            "after %d operations [--param max-isl-operations]\n",
-                            max_operations);
-         else
+           {
+              if (oacc_function_p (cfun))
+               {
+                 /* Special casing for OpenACC to unify diagnostic messages
+                    here and in graphite-scop-detection.c. */
+                  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                                   "data-dependence analysis of OpenACC loop "
+                                   "nest "
+                                   "failed; try increasing the value of "
+                                   "--param="
+                                   "max-isl-operations=%d.\n",
+                                   max_operations);
+                }
+              else
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                                 "loop nest not optimized, optimization timed "
+                                 "out after %d operations [--param "
+                                 "max-isl-operations]\n",
+                                 max_operations);
+            }
+          else
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
                             "loop nest not optimized, ISL signalled an error\n");
        }
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 8b41044bce5e..afc955cc97eb 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2056,6 +2056,9 @@ determine_openacc_reductions (scop_p scop)
   }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
    them to SCOPS.  */

@@ -2109,6 +2112,11 @@ build_scops (vec<scop_p> *scops)
        }

       unsigned max_arrays = param_graphite_max_arrays_per_scop;
+
+      if (oacc_function_p (cfun)
+          && param_graphite_max_arrays_per_scop == 100 /* default value */)
+        max_arrays = 200;
+
       if (max_arrays > 0
          && scop->drs.length () >= max_arrays)
        {
@@ -2116,7 +2124,16 @@ build_scops (vec<scop_p> *scops)
                       << scop->drs.length ()
                       << " is larger than --param graphite-max-arrays-per-scop="
                       << max_arrays << ".\n");
-         free_scop (scop);
+
+          if (dump_enabled_p () && oacc_function_p (cfun))
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                             find_loop_location (s->entry->dest->loop_father),
+                             "data-dependence analysis of OpenACC loop nest "
+                             "failed; try increasing the value of --param="
+                             "graphite-max-arrays-per-scop=%d.\n",
+                             max_arrays);
+
+          free_scop (scop);
          continue;
        }

@@ -2129,6 +2146,15 @@ build_scops (vec<scop_p> *scops)
                          << scop_nb_params (scop)
                          << " larger than --param graphite-max-nb-scop-params="
                          << max_dim << ".\n");
+
+          if (dump_enabled_p () && oacc_function_p (cfun))
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                             find_loop_location (s->entry->dest->loop_father),
+                             "data-dependence analysis of OpenACC loop nest "
+                             "failed; try increasing the value of --param="
+                             "graphite-max-nb-scop-params=%d.\n",
+                             max_dim);
+
          free_scop (scop);
          continue;
        }
diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
new file mode 100644
index 000000000000..45adbb3f0e85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
@@ -0,0 +1,21 @@
+/* Verify that a warning about an exceeded Graphite parameter gets
+   output as optimization information and not only as a dump message
+   for OpenACC functions. */
+
+/* { dg-additional-options "-O2 -fopt-info-missed --param=graphite-max-arrays-per-scop=1" } */
+
+extern int a[1000];
+extern int b[1000];
+
+void test ()
+{
+#pragma acc parallel loop auto
+/* { dg-missed {data-dependence analysis of OpenACC loop nest failed\; try increasing the value of --param=graphite-max-arrays-per-scop=1.} "" { target *-*-* } .-1  } */
+/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */
+/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */
+  for (int i = 1; i < 995; i++)
+    a[i] = b[i + 5] + b[i - 1];
+}
+
+
+/* { dg-prune-output ".*not inlinable.*"} */
diff --git a/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
new file mode 100644
index 000000000000..f2830cd62db0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
@@ -0,0 +1,23 @@
+/* Verify that a warning about an exceeded Graphite parameter gets
+   output as optimization information and not only as a dump message
+   for OpenACC functions. */
+
+/* { dg-additional-options "-O2 -fopt-info-missed --param=max-isl-operations=1" } */
+
+void test (int* restrict a, int *restrict b)
+{
+  int i = 1;
+  int j = 1;
+  int m = 0;
+
+#pragma acc parallel loop auto copyin(b) copyout(a) reduction(max:m)
+/* { dg-missed {data-dependence analysis of OpenACC loop nest failed; try increasing the value of --param=max-isl-operations=1.} "" { target *-*-* } .-1  } */
+/* { dg-missed {'auto' loop has not been analyzed \(cf. 'graphite' dumps for more information\).} "" { target *-*-* } .-2 } */
+/* { dg-missed {.*not inlinable.*} "" { target *-*-* } .-3 } */
+  for (i = 1; i < 995; i++)
+    {
+      int x = b[i] * 2;
+      for (j = 1; j < 995; j++)
+        m = m + a[i] + x;
+    }
+}

From patchwork Wed Nov 17 16:03:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47833
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C42A63858410
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:21:20 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180])
 by sourceware.org (Postfix) with ESMTPS id B107E385842C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:04:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B107E385842C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 zhto306mxhIJyJwO/LUGFAkolAEiMggbqaQI0jbZHQ0ZREFbm9zpIqm+Va1iPlzi2r02MpyepE
 4qjyTgQ5mDE6f4m1OSIBlzYZihmc4bxV7HSVxhKoLN6BEBTWWwN/eEgwPe1Amk86Pw5l6gBQ61
 6dIzAnk9ahk7eyuT6RqE+mRe0xXrw6l5iLLHNQHUEUrukqxZAgBgMb6NMCsvIKeuoAhFNoMwpR
 s7eVNMZr/A1+Q5b3K7kk7Uq386YGF4pVI6v3VG2tqsxKXC2eCp6tUos2ptY6i4F2piUINceXrU
 oDDjG43i8L5SbpCVVm/P5QNO
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68445394"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa3.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:04:56 -0800
IronPort-SDR: 
 PWgkvd5hIIv68FzRrqdbPX8TPuuluOi3Q5VrDGjM1YemdA86mMHWpxVpLM5jBjNJE5ZWh+zzt+
 YN986Qebh2TC47UcN3P0v3TtVlBEIDs+k61IFHYj/XiCUqTP0BERI5xymklpQaOx/vMW6fZ0pE
 Ju3pqFf5k+c/SAWOvVdf17d84gNIDhgwXhIYe+QWJzO0ka3a3mV/mWjwTewNhiKrkg0n8FpOV/
 laPa4pZk2ZTecLwKIEwclx24umvMcD1H53rIxUUPQ/A9+xfJ3D+WG9VPXtbxBztD8hCVHHvGYx
 3r0=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 20/22] graphite: Adjust scop loop-nest choice
Date: Wed, 17 Nov 2021 17:03:28 +0100
Message-ID: <20211117160330.20029-20-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  It is applied to the loop of
the destination block of the edge that leads into the SESE region and
to the loop of the source block of the edge that exits the region.
The exit block is usually introduced by the canonicalization of the
loop structure that Graphite performs to support its code generation.
If this block is empty, it may belong to the outermost (fake) loop of
the function.  As a result, build_alias_set may end up analysing
data-references with respect to this fake loop although a proper
super-loop of the SCoP loops may exist.  This does not seem to be
correct in general, and it breaks runtime alias check creation, which
fails when executed on a loop without niter information.
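
The effect of the change can be sketched with a simplified,
hypothetical model of the loop tree (the types and functions below are
not GCC's; common_loop stands in for find_common_loop and context_loop
for the new scop_context_loop).  With an empty exit block that belongs
to the fake root loop, the naive common loop is the root, whereas
skipping empty single-predecessor blocks first yields the proper
enclosing loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loop-tree node; depth 0 is the function-level (fake)
   root loop.  */
struct loop
{
  int depth;
  struct loop *outer;
};

/* Hypothetical basic block with only the fields this sketch needs.  */
struct block
{
  struct loop *loop_father;
  bool empty;			/* Stands in for sese_trivially_empty_bb_p.  */
  struct block *single_pred;	/* NULL unless exactly one predecessor.  */
};

/* Innermost loop that contains both A and B.  */
struct loop *
common_loop (struct loop *a, struct loop *b)
{
  while (a->depth > b->depth)
    a = a->outer;
  while (b->depth > a->depth)
    b = b->outer;
  while (a != b)
    {
      a = a->outer;
      b = b->outer;
    }
  return a;
}

/* Walk back over empty exit blocks that have a single predecessor,
   then compute the common loop with the entry block's loop.  */
struct loop *
context_loop (struct block *entry_dest, struct block *exit_src)
{
  struct block *bb = exit_src;
  while (bb->empty && bb->single_pred != NULL)
    bb = bb->single_pred;
  return common_loop (entry_dest->loop_father, bb->loop_father);
}
```

For a SCoP loop at depth 2 inside a loop at depth 1, whose empty exit
block belongs to the root loop but has a single predecessor in the
depth-1 loop, common_loop returns the root while context_loop returns
the depth-1 loop.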

gcc/ChangeLog:

        * graphite-scop-detection.c (scop_context_loop): New function.
        (build_alias_set): Use scop_context_loop instead of find_common_loop.
        * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
        * graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c    | 21 ++++++++++++++++++---
 gcc/graphite.h                   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

--
2.33.0


diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bdabe588c3d8..ec055a358f39 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
         conditional if aliasing can be ruled out at runtime and the original
         version of the SCoP, otherwise. */

-      loop_p loop
-          = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                              scop->scop_info->region.exit->src->loop_father);
+      loop_p loop = scop_context_loop (scop);
       tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
       tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
       set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index afc955cc97eb..99e906a5d120 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+    exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1776,9 +1793,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-    = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                       scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec<sese_l> &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif

From patchwork Wed Nov 17 16:03:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47834
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 382B1385841C
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:21:56 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 9DA613858436
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:05:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9DA613858436
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 +M90g2q1yRhnIvD48LILbOqgl6sb9U6pRyvkX/Nxi/GjU/4QXoIjgDJD4fceclyzDMjpEled53
 1IEpAtu4v/PyT6EW1BcVskrI2tCWV1nVgmMnLhMSFvQe/BgquVZQ1PEb6Tngxa+FuUDURVhP1Z
 wA5oONLPO4YblPyiVyU1dVBhG4vZyGKIrSX06/QaL5FbunBQn/zLWZg7DLgurYDMED+Fpyqq0y
 Ts40oQakCx3YBmyXMpDjY/ZR9DwrdFE5fcz9gq9OIRZrFNpgNE6re9JrDDeNpxz132h8AUU+a9
 K5zZROKB6Iu0usfUnGnYHLYU
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="71081364"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa1.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:05:06 -0800
IronPort-SDR: 
 TCrf9jtTOs6M5s+gotr6DEzlck2LaD9ypTaEkG3+XTQ7VwlEbFOUmEsXwjH4xHWVpeuVuEqdDU
 Omq0s/HIAH6D4uBdiueSNdfVf6bHgLxdqq4G5v0juBgXk5UvJx3pIKNQ74m2rBRh9qhxS/NrCq
 HlerFd+yhv3SOWDdgyFIN9Fq6A/EKav9gOV1X4VpgJCK+24E4+kVJKfVoJPCnbkAIbADBBHaex
 Ot/kKCZtQj3vpbjd1VPdSL/0E0oj+QgdqJ3q1Zd6+FgGTaXPWkkrKqbKT1ocV1b6Idyo6mqB2s
 kdY=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 21/22] graphite: Accept loops without data
 references
Date: Wed, 17 Nov 2021 17:03:29 +0100
Message-ID: <20211117160330.20029-21-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile-time slowdowns for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.
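
For illustration, the removed check can be modeled as follows.  This is a
hypothetical, simplified sketch with invented names (toy_dr_loop,
toy_loop_nest_has_data_refs, toy_rejected_before_patch); GCC's real
loop_nest_has_data_refs scans the loop body for memory references instead of
reading a precomputed count.  Before this patch, an innermost loop (one
without an inner loop) that had no data references made the region harmful,
so it was not accepted as a SCoP; after the patch, such a loop is no longer
rejected on this ground.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loop-nest model: INNER is the first nested loop, NEXT the
   next sibling, NUM_DATA_REFS the memory references directly in this loop.  */
struct toy_dr_loop
{
  struct toy_dr_loop *inner;
  struct toy_dr_loop *next;
  int num_data_refs;
};

/* True if LOOP or any loop nested inside it has a data reference.  */
static bool
toy_loop_nest_has_data_refs (const struct toy_dr_loop *loop)
{
  if (loop->num_data_refs > 0)
    return true;
  for (const struct toy_dr_loop *l = loop->inner; l != NULL; l = l->next)
    if (toy_loop_nest_has_data_refs (l))
      return true;
  return false;
}

/* The removed condition: an innermost loop without data references used to
   make the surrounding region harmful.  */
static bool
toy_rejected_before_patch (const struct toy_dr_loop *loop)
{
  return loop->inner == NULL && !toy_loop_nest_has_data_refs (loop);
}
```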

gcc/ChangeLog:

        * graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
        Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -------------
 1 file changed, 13 deletions(-)

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 99e906a5d120..9311a0e42a57 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -851,19 +851,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
          return true;
        }

-      /* Check if all loop nests have at least one data reference.
-        ???  This check is expensive and loops premature at this point.
-        If important to retain we can pre-compute this for all innermost
-        loops and reject those when we build a SESE region for a loop
-        during SESE discovery.  */
-      if (! loop->inner
-         && ! loop_nest_has_data_refs (loop))
-       {
-         DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-                      << " does not have any data reference.\n");
-         return true;
-       }
-
       DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
     }


From patchwork Wed Nov 17 16:03:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Frederik Harwath <frederik@codesourcery.com>
X-Patchwork-Id: 47835
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id A55FB3858436
	for <patchwork@sourceware.org>; Wed, 17 Nov 2021 16:22:51 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 459E4385842C
 for <gcc-patches@gcc.gnu.org>; Wed, 17 Nov 2021 16:05:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 459E4385842C
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
IronPort-SDR: 
 2doMWYY6HOfv0zUsU+f0ytrm0TDYLA9OBDNu9oZ5zedXROuEB36V3QXyBczW4Lc1a7JWYHrTdE
 lifHkY4mVfHFxmOL0urT91EWHP9B0ljRalS11EIarns5ehQWWIwA5JFIYytVyd+ok7ONYqA4D+
 msXB2dMQZjPsp4os7lur6n5K/+cQAWjSRYifHFn/sWUNbFpLuG7E0Q0v71BUB7C2U6l9C0HHhJ
 zetPs+628sIN2Gm2BIV1r2RLMeexaYCkDlWc2csUnXqPOm41uGuqOdtk74VJKwh5mCjdDUWAPr
 BJKNC/N8vOzu11LNo2Pw9eGo
X-IronPort-AV: E=Sophos;i="5.87,241,1631606400"; d="scan'208";a="68604098"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa2.mentor.iphmx.com with ESMTP; 17 Nov 2021 08:05:11 -0800
IronPort-SDR: 
 Iqu28DYme8BOKHi/AG0GUmctGTWjGxTqUpm3x0CmQjpcY4QQXRTWkwIiDn+smfa4dVVzecgGQo
 hiju3TpbFx6HlEEMjAl3nChDJ3k3sw5eOtzYcrLurlSAnt/3GXzAfO8z0zXZppUONkBCwdqxS2
 JHzjZJNVAopej4RxclrxSNqyhn+h3HRpycbij34v8xHYYlPsfRZeJnwt2SSshjHNGDzFDvs7w+
 i1RwnEbOFPOFLrgLWqGCnTHD+cqk+cPcwZZoS4v5xATb8I/DhxrNbzwFvbRnzR9etdfvjrBulj
 wX8=
From: Frederik Harwath <frederik@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [OG11][committed][PATCH 22/22] openacc: Adjust test expectations to
 new "kernels" handling
Date: Wed, 17 Nov 2021 17:03:30 +0100
Message-ID: <20211117160330.20029-22-frederik@codesourcery.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211117160330.20029-1-frederik@codesourcery.com>
References: <20211117160330.20029-1-frederik@codesourcery.com>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-05.mgc.mentorg.com (139.181.222.5) To
 SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4)
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Adjust test expectations to the new Graphite-based
"kernels" handling.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
        * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: Removed.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/acc-icf.c: Adjust.
        * c-c++-common/goacc/cache-3-1.c: Adjust.
        * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: Adjust.
        * c-c++-common/goacc/classify-kernels.c: Adjust.
        * c-c++-common/goacc/classify-serial.c: Adjust.
        * c-c++-common/goacc/if-clause-2.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-1.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-2.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-ice-1.c: Adjust.
        * c-c++-common/goacc/kernels-decompose-ice-2.c: Adjust.
        * c-c++-common/goacc/kernels-loop-3-acc-loop.c: Adjust.
        * c-c++-common/goacc/kernels-loop-3.c: Adjust.
        * c-c++-common/goacc/loop-2-kernels.c: Adjust.
        * c-c++-common/goacc/nested-reductions-2-parallel.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Adjust.
        * c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c: Adjust.
        * c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
        * c-c++-common/goacc/routine-1.c: Adjust.
        * c-c++-common/goacc/routine-level-of-parallelism-2.c: Adjust.
        * c-c++-common/goacc/routine-nohost-1.c: Adjust.
        * c-c++-common/goacc/uninit-copy-clause.c: Adjust.
        * gcc.dg/goacc/loop-processing-1.c: Adjust.
        * gcc.dg/goacc/nested-function-1.c: Adjust.
        * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust.
        * gfortran.dg/goacc/classify-kernels.f95: Adjust.
        * gfortran.dg/goacc/classify-parallel.f95: Adjust.
        * gfortran.dg/goacc/classify-routine.f95: Adjust.
        * gfortran.dg/goacc/classify-serial.f95: Adjust.
        * gfortran.dg/goacc/common-block-3.f90: Adjust.
        * gfortran.dg/goacc/gang-static.f95: Adjust.
        * gfortran.dg/goacc/kernels-decompose-1.f95: Adjust.
        * gfortran.dg/goacc/kernels-decompose-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-data-2.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Adjust.
        * gfortran.dg/goacc/kernels-loop.f95: Adjust.
        * gfortran.dg/goacc/kernels-tree.f95: Adjust.
        * gfortran.dg/goacc/loop-2-kernels.f95: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-2.f90: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-3.f90: Adjust.
        * gfortran.dg/goacc/loop-auto-transfer-4.f90: Adjust.
        * gfortran.dg/goacc/nested-function-1.f90: Adjust.
        * gfortran.dg/goacc/nested-reductions-2-parallel.f90: Adjust.
        * gfortran.dg/goacc/pr72741.f90: Adjust.
        * gfortran.dg/goacc/private-explicit-kernels-1.f95: Adjust.
        * gfortran.dg/goacc/private-predetermined-kernels-1.f95: Adjust.
        * gfortran.dg/goacc/routine-module-mod-1.f90: Adjust.
        * gfortran.dg/goacc/uninit-copy-clause.f95: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Removed.
---
 gcc/testsuite/c-c++-common/goacc/acc-icf.c    |   4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |   2 +-
 ...classify-kernels-unparallelized-graphite.c |   4 +-
 .../c-c++-common/goacc/classify-kernels.c     |  23 +--
 .../c-c++-common/goacc/classify-serial.c      |  12 +-
 .../c-c++-common/goacc/if-clause-2.c          |   2 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |  31 ++-
 .../c-c++-common/goacc/kernels-decompose-2.c  |   2 +-
 .../goacc/kernels-decompose-ice-1.c           |   5 +-
 .../goacc/kernels-decompose-ice-2.c           |   3 +-
 .../goacc/kernels-loop-3-acc-loop.c           |   2 +-
 .../c-c++-common/goacc/kernels-loop-3.c       |   2 +-
 .../c-c++-common/goacc/loop-2-kernels.c       |  20 +-
 .../goacc/nested-reductions-2-parallel.c      | 138 +++++++++++++
 ...kernels-conditional-loop-independent_seq.c | 129 ------------
 .../note-parallelism-1-kernels-loop-auto.c    | 104 ++++++----
 ...rallelism-1-kernels-loop-independent_seq.c |  19 +-
 .../goacc/note-parallelism-1-kernels-loops.c  |   4 +-
 ...note-parallelism-1-kernels-straight-line.c |   2 +-
 ...e-parallelism-combined-kernels-loop-auto.c |  34 ++--
 ...sm-combined-kernels-loop-independent_seq.c |  16 --
 ...kernels-conditional-loop-independent_seq.c |  38 ++--
 .../note-parallelism-kernels-loop-auto.c      | 100 +++++-----
 ...parallelism-kernels-loop-independent_seq.c |  27 +--
 .../goacc/note-parallelism-kernels-loops.c    |  29 +--
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |   2 +-
 .../goacc/routine-level-of-parallelism-2.c    |   2 -
 .../c-c++-common/goacc/routine-nohost-1.c     |   2 +-
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 -
 .../gcc.dg/goacc/loop-processing-1.c          |   2 +-
 .../gcc.dg/goacc/nested-function-1.c          |   3 +-
 .../goacc/classify-kernels-unparallelized.f95 |  26 +--
 .../gfortran.dg/goacc/classify-kernels.f95    |  26 +--
 .../gfortran.dg/goacc/classify-parallel.f95   |   6 +-
 .../gfortran.dg/goacc/classify-routine.f95    |   8 +-
 .../gfortran.dg/goacc/classify-serial.f95     |  11 +-
 .../gfortran.dg/goacc/common-block-3.f90      |  14 +-
 .../gfortran.dg/goacc/gang-static.f95         |  14 +-
 .../gfortran.dg/goacc/kernels-decompose-1.f95 | 183 ++++++++++++------
 .../gfortran.dg/goacc/kernels-decompose-2.f95 | 112 +++++++----
 .../gfortran.dg/goacc/kernels-loop-2.f95      |  13 +-
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |  13 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   6 +-
 .../gfortran.dg/goacc/kernels-loop.f95        |  12 +-
 .../gfortran.dg/goacc/kernels-tree.f95        |   2 +-
 .../gfortran.dg/goacc/loop-2-kernels.f95      |  22 +--
 .../goacc/loop-auto-transfer-2.f90            |   2 -
 .../goacc/loop-auto-transfer-3.f90            |   8 -
 .../goacc/loop-auto-transfer-4.f90            |  30 ---
 .../gfortran.dg/goacc/nested-function-1.f90   |   2 +
 .../goacc/nested-reductions-2-parallel.f90    | 177 +++++++++++++++++
 gcc/testsuite/gfortran.dg/goacc/pr72741.f90   |   8 +-
 .../goacc/private-explicit-kernels-1.f95      |  13 +-
 .../goacc/private-predetermined-kernels-1.f95 |  16 +-
 .../goacc/routine-module-mod-1.f90            |   2 +-
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 -
 .../kernels-decompose-1.c                     |   5 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c |  34 ++--
 .../libgomp.oacc-c-c++-common/pr84955-1.c     |   1 -
 .../libgomp.oacc-c-c++-common/pr85381-2.c     |   8 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c     |   3 -
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |   4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |   2 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c     |   2 +-
 .../libgomp.oacc-c-c++-common/pr85486.c       |   2 +-
 .../vector-length-128-1.c                     |   5 +-
 .../vector-length-128-2.c                     |   5 +-
 .../vector-length-128-3.c                     |   5 +-
 .../vector-length-128-4.c                     |   5 +-
 .../vector-length-128-5.c                     |   5 +-
 .../vector-length-128-6.c                     |   5 +-
 .../vector-length-128-7.c                     |   5 +-
 .../gangprivate-attrib-1.f90                  |   3 +-
 .../gangprivate-attrib-2.f90                  |   3 +-
 .../kernels-acc-loop-reduction-2.f90          |  12 +-
 .../parallel-loop-auto-reduction-2.f90        |  98 ----------
 .../libgomp.oacc-fortran/pr94358-1.f90        |   2 -
 77 files changed, 908 insertions(+), 803 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90

--
2.33.0

diff --git a/gcc/testsuite/c-c++-common/goacc/acc-icf.c b/gcc/testsuite/c-c++-common/goacc/acc-icf.c
index bc2e0fd19b92..9cf119bf89c7 100644
--- a/gcc/testsuite/c-c++-common/goacc/acc-icf.c
+++ b/gcc/testsuite/c-c++-common/goacc/acc-icf.c
@@ -9,7 +9,7 @@
 /* { dg-bogus "warning: region is worker partitioned but does not contain worker partitioned code" "TODO default 'gang' 'vector'" { xfail *-*-* } .+3 }
    TODO It's the compiler's own decision to not use 'worker' parallelism here, so it doesn't make sense to bother the user about it.  */
 int
-routine1 (int n) /* { dg-bogus "region is worker partitioned but does not contain worker partitioned code" "" { xfail *-*-* } } */
+routine1 (int n)
 {
   int i;

@@ -24,7 +24,7 @@ routine1 (int n) /* { dg-bogus "region is worker partitioned but does not contai
 /* { dg-bogus "warning: region is worker partitioned but does not contain worker partitioned code" "TODO default 'gang' 'vector'" { xfail *-*-* } .+3 }
    TODO It's the compiler's own decision to not use 'worker' parallelism here, so it doesn't make sense to bother the user about it.  */
 int
-routine2 (int n) /* { dg-bogus "region is worker partitioned but does not contain worker partitioned code" "" { xfail *-*-* } } */
+routine2 (int n)
 {
   int i;

diff --git a/gcc/testsuite/c-c++-common/goacc/cache-3-1.c b/gcc/testsuite/c-c++-common/goacc/cache-3-1.c
index 5318a57d51e1..36235f6d49f4 100644
--- a/gcc/testsuite/c-c++-common/goacc/cache-3-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/cache-3-1.c
@@ -31,7 +31,7 @@ foo (int g[3][10], int h[4][8], int i[2][10], int j[][9],
     ;
   #pragma acc cache(t[2:5]) /* { dg-error "is threadprivate variable" } */
     ;
-  #pragma acc cache(k[0.5:]) /* { dg-error "low bound \[^\n\r]* of array section does not have integral type" } */
+ #pragma acc cache(k[0.5:]) /* { dg-error "low bound \[^\n\r]* of array section does not have integral type" } */
     ;
   #pragma acc cache(l[:7.5f]) /* { dg-error "length \[^\n\r]* of array section does not have integral type" } */
     ;
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
index 77f4524907a9..721b34acd9a8 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
@@ -2,7 +2,6 @@
    OpenACC 'kernels' with Graphite kernles handling (default).  */

 /* { dg-additional-options "-O2" }
-   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fopt-info-note-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
@@ -22,7 +21,8 @@ extern unsigned int f (unsigned int);
 void KERNELS ()
 {
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
-  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
+  for (unsigned int i = 0; i < N; i++)
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
     /* An "extern"al mapping of loop iterations/array indices makes the loop
        unparallelizable.  */
     c[i] = a[f (i)] + b[f (i)]; /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 7aaebeff2828..5abda60ed4a7 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -2,10 +2,9 @@
    'kernels' (parloops version).  */

 /* { dg-additional-options "-O2" }
-   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-   { dg-additional-options "-fopt-info-note-optimized-omp" }
+   { dg-additional-options "-fopt-info-optimized-omp" }
+   { dg-additional-options "-fopt-info-note-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-parloops1-all" }
    { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
@@ -19,22 +18,14 @@ extern unsigned int *__restrict c;

 void KERNELS ()
 {
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  for (unsigned int i = 0; i < N; i++)  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-1 } */
     c[i] = a[i] + b[i];
 }

-/* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
-
-/* Check that exactly one OpenACC kernels construct is analyzed, and that it
-   can be parallelized.
-   { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
-
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Function is parallel_kernels_graphite OpenACC kernels offload" 1 "oaccloops1" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-serial.c b/gcc/testsuite/c-c++-common/goacc/classify-serial.c
index 0c21919758bb..98df04563ba4 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-serial.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-serial.c
@@ -4,7 +4,7 @@
 /* { dg-additional-options "-O2 -w" }
    { dg-additional-options "-fopt-info-optimized-omp" }
    { dg-additional-options "-fdump-tree-ompexp" }
-   { dg-additional-options "-fdump-tree-oaccloops" } */
+   { dg-additional-options "-fdump-tree-oaccloops1" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -18,9 +18,7 @@ extern unsigned int *__restrict c;
 void SERIAL ()
 {
 #pragma acc serial loop copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-bogus "warning: region contains gang partitioned code but is not gang partitioned" "TODO 'serial'" { xfail *-*-* } .-1 }
-     { dg-bogus "warning: region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 }
-     { dg-bogus "warning: region contains vector partitioned code but is not vector partitioned" "TODO 'serial'" { xfail *-*-* } .-3 }
+  /* { dg-bogus "warning: region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 }
      TODO Should we really diagnose this if the user explicitly requested 'serial'?
      TODO Should we instead diagnose ('-Wextra' category?) that the user may enable use of parallelism if replacing 'serial' with 'parallel', if applicable?  */
   for (unsigned int i = 0; i < N; i++)
@@ -32,6 +30,6 @@ void SERIAL ()

 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops1" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index a48072509e1a..96e36cac6eed 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -11,7 +11,7 @@ f (short c)
 #pragma acc kernels if(c) copy(c)
   /* { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels map\(tofrom:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "gimple" } } */
   /* { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(tofrom:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } }
-     { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } } */
+     { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_graphite async\(-1\) map\(force_present:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } } */
   ++c;

 #pragma acc data if(c) copy(c)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index f549cbadfa7e..b9e14852fbef 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,9 +1,11 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-O2" } for "Graphite".  */

 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "-fdump-tree-gimple" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
-   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
+   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose-details" } */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
@@ -28,36 +30,34 @@ main (void)
   int i;
   unsigned int sum = 1;

-#pragma acc kernels copyin(a[0:N]) copy(sum)
-  /* { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
-     TODO Is this maybe the report that belongs to the XFAILed report further down?  */
+#pragma acc kernels copyin(a[0:N]) copy(sum) /* { dg-line l_kernels_pragma } */
+  /* { dg-missed {'map\(tofrom:sum \[len: [0-9]+\]\)' not optimized: 'sum' is unsuitable for privatization} "TODO Missing synthetic reduction clause" { target *-*-* } l_kernels_pragma } */
   {
     #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];

-    sum++; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    sum++;
     a[0]++;

     #pragma acc loop independent /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
-    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" {target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-warning {loop has "independent" clause but data dependences were found.} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];

-    if (sum > 10) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+    if (sum > 10)
       {
         #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
        /* { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i } */
-       /*TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i } */
+       /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
         for (i = 0; i < N; ++i)
           sum += a[i];
       }

     #pragma acc loop auto /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];
@@ -77,18 +77,17 @@ main (void)
    sequence of compute constructs.
    { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(tofrom:sum \[len: [0-9]+\]\) map\(to:a\[0\] \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
    As noted above, we get three "old-style" kernel regions, one gang-single region, and one parallelized loop region.
-   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels async\(-1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 3 "omp_oacc_kernels_decompose" } }
-   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_parallelized async\(-1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 1 "omp_oacc_kernels_decompose" } }
-   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 1 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_graphite async\(-1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 5 "omp_oacc_kernels_decompose" } }

    'data' plus five CCs.
    { dg-final { scan-tree-dump-times {(?n)#pragma omp target } 6 "omp_oacc_kernels_decompose" } }

-   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 1 "omp_oacc_kernels_decompose" } }
    { dg-final { scan-tree-dump-times {(?n)#pragma acc loop independent private\(i\)$} 1 "omp_oacc_kernels_decompose" } }
-   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 1 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 2 "omp_oacc_kernels_decompose" } }
    { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "omp_oacc_kernels_decompose" } }

    Each of the parallel regions is async, and there is a final call to
    __builtin_GOACC_wait.
    { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "omp_oacc_kernels_decompose" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index f5f6a7e3e8b7..4eb030d4ce08 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -2,7 +2,7 @@

 /* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "--param=openacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose-parloops" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
index 5e0031f76c12..6701219d28bc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -1,9 +1,8 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

-/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-fopt-info-omp-optimized" } */
 /* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
-/* { dg-ice "TODO" }
-   { dg-prune-output "during GIMPLE pass: omplower" } */
+/* { dg-prune-output "during GIMPLE pass: omplower" } */

 /* Reduced from 'kernels-decompose-2.c'.
    (Hopefully) similar instances:
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
index 8bf60a9a5099..8de4b452fbc5 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -1,8 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

 /* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
-/* { dg-ice "TODO" }
-   { dg-prune-output "during GIMPLE pass: omplower" } */
+/* { dg-prune-output "during GIMPLE pass: omplower" } */

 /* Reduced from 'kernels-decompose-ice-1.c'.  */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
index e715c488e947..a9098ac531f2 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "--param=openacc-kernels=parloops" } as this is
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } as this is
    specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index c1aae7ffc19b..ae812583cf97 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "--param=openacc-kernels=parloops" } as this is
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } as this is
    specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index c989222669c0..143ef0a7905a 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -37,7 +37,7 @@ void K(void)
        for (j = 0; j < 10; j++)
          { }
       }
-#pragma acc loop seq gang // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" }
     for (i = 0; i < 10; i++)
       { }

@@ -59,11 +59,11 @@ void K(void)
 #pragma acc loop worker // { dg-error "inner loop uses same" }
        for (j = 0; j < 10; j++)
          { }
-#pragma acc loop gang
+#pragma acc loop gang /* { dg-bogus "incorrectly nested OpenACC loop parallelism" "TODO-kernels" { xfail *-*-* } } */
        for (j = 0; j < 10; j++)
          { }
       }
-#pragma acc loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang worker
@@ -85,14 +85,14 @@ void K(void)
 #pragma acc loop vector // { dg-error "inner loop uses same" }
        for (j = 1; j < 10; j++)
          { }
-#pragma acc loop worker
+#pragma acc loop worker /* { dg-bogus "incorrectly nested OpenACC loop parallelism" "TODO-kernels" { xfail *-*-* } } */
        for (j = 1; j < 10; j++)
          { }
-#pragma acc loop gang
+#pragma acc loop gang /* { dg-bogus "incorrectly nested OpenACC loop parallelism" "TODO-kernels" { xfail *-*-* } } */
        for (j = 1; j < 10; j++)
          { }
       }
-#pragma acc loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang vector
@@ -105,7 +105,7 @@ void K(void)
 #pragma acc loop auto
     for (i = 0; i < 10; i++)
       { }
-#pragma acc loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang auto // { dg-error "'auto' conflicts" }
@@ -147,7 +147,7 @@ void K(void)
 #pragma acc kernels loop worker(num:5)
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang worker
@@ -163,7 +163,7 @@ void K(void)
 #pragma acc kernels loop vector(length:5)
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang vector
@@ -176,7 +176,7 @@ void K(void)
 #pragma acc kernels loop auto
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" }
diff --git a/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-parallel.c b/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-parallel.c
index 1f6b4e78293b..fb679f349abd 100644
--- a/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-parallel.c
@@ -387,3 +387,141 @@ void acc_parallel_loop_reduction (void)
       }
   }
 }
+
+/* The same tests as above, but inside a routine construct.  */
+#pragma acc routine gang
+void acc_routine (void)
+{
+  int i, j, k, l, sum, diff;
+
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+        // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum) // { dg-warning "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+        // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(*:sum) // { dg-warning "conflicting reduction operations for .sum." }
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum) // { dg-warning "conflicting reduction operations for .sum." }
+        // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(*:sum) // { dg-warning "conflicting reduction operations for .sum." }
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-warning "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+void acc_kernels (void)
+{
+  int i, j, k, sum, diff;
+
+  /* FIXME: No diagnostics are produced for the first three loop nests below
+     because reductions in kernels regions are not supported yet.  */
+  #pragma acc kernels
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:diff)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum) // { dg-warning "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
deleted file mode 100644
index b3f4e24173af..000000000000
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
+++ /dev/null
@@ -1,129 +0,0 @@
-/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
-   construct containing conditionally executed 'loop' constructs with
-   'independent' or 'seq' clauses.  */
-
-/* { dg-additional-options "-fopt-info-all-omp" } */
-
-//TODO update accordingly
-/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
-
-extern int c;
-
-int
-main ()
-{
-  int x, y, z;
-
-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
- /* Strangely indented to keep this similar to other test cases.  */
-  if (c) /* { dg-message "optimized: beginning .Graphite. region in OpenACC .kernels. construct" } */
- {
-#pragma acc loop seq
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent gang
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent worker
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent vector
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent gang vector
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent gang worker
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent worker vector
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent gang worker vector
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent gang
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent worker
-    for (y = 0; y < 10; y++)
-#pragma acc loop independent vector
-      for (z = 0; z < 10; z++)
-       ;
-
-#pragma acc loop independent
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-    ;
-
-#pragma acc loop independent
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent
-    for (y = 0; y < 10; y++)
-      ;
-
-#pragma acc loop independent
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent
-    for (y = 0; y < 10; y++)
-#pragma acc loop independent
-      for (z = 0; z < 10; z++)
-       ;
-
-#pragma acc loop seq
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent
-    for (y = 0; y < 10; y++)
-#pragma acc loop independent
-      for (z = 0; z < 10; z++)
-       ;
-
-#pragma acc loop independent
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop seq
-    for (y = 0; y < 10; y++)
-#pragma acc loop independent
-      for (z = 0; z < 10; z++)
-       ;
-
-#pragma acc loop independent
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent
-    for (y = 0; y < 10; y++)
-#pragma acc loop seq
-      for (z = 0; z < 10; z++)
-       ;
-
-#pragma acc loop seq
-  /* { dg-message "missed: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent
-    for (y = 0; y < 10; y++)
-#pragma acc loop seq
-      for (z = 0; z < 10; z++)
-       ;
- }
-
-  return 0;
-}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
index b0313796adee..8ad662524972 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
@@ -2,6 +2,7 @@
    construct containing 'loop' constructs with explicit or implicit 'auto'
    clause.  */

+/* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-fopt-info-note-omp" } */

 //TODO update accordingly
@@ -15,109 +16,136 @@ main ()
 #pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  {
-#pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang
+  /* { dg-message "optimized: assigned OpenACC gang loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto worker
+  /* { dg-message "optimized: assigned OpenACC worker loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto vector
+  /* { dg-message "optimized: assigned OpenACC vector loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto gang vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang vector
+  /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto gang worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang worker
+  /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto worker vector
+  /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto gang worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang worker vector
+  /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
     ;

-#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang
+  /* { dg-message "optimized: assigned OpenACC gang loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-3 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+#pragma acc loop auto worker
+  /* { dg-message "optimized: assigned OpenACC worker loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-2 } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+#pragma acc loop auto vector
+  /* { dg-message "optimized: assigned OpenACC vector loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-2 } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
       ;

 #pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "optimized: assigned OpenACC seq loop parallelism" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;
  }
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
index 9eb846325a63..8d20840ef281 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
@@ -3,10 +3,13 @@
    clauses.  */

 /* { dg-additional-options "-fopt-info-all-omp" } */
+/* { dg-additional-options "-O2" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */

+/* { dg-prune-output "^.*?loop in .kernels. region has not been analyzed.*?$" } */
+
 int
 main ()
 {
@@ -16,47 +19,38 @@ main ()
  /* Strangely indented to keep this similar to other test cases.  */
  {
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -65,19 +59,16 @@ main ()
        ;

 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
       ;

 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -86,7 +77,6 @@ main ()
        ;

 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -95,7 +85,6 @@ main ()
        ;

 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -104,7 +93,6 @@ main ()
        ;

 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -113,7 +101,6 @@ main ()
        ;

 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
     for (y = 0; y < 10; y++)
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
index 6cf51904e7ad..524112357659 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
@@ -2,7 +2,7 @@
    construct containing loops.  */

 /* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
-/* { dg-additional-options "-fopt-info-optimized-omp-note" } */
+/* { dg-additional-options "-fopt-info-optimized-note-omp" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
@@ -20,7 +20,7 @@ main ()

  /* Strangely indented to keep this similar to other test cases.  */
  {
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     ;

   for (x = 0; x < 10; x++)
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
index d4cb2364737c..a3fc0cb96578 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
@@ -37,7 +37,7 @@ main ()
   /* { dg-optimized {'map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:x \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } l_pragma_kernels } */
   /* { dg-optimized {'map\(to:x \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(x\)'} "" { target *-*-* } l_pragma_kernels } */
   {
-    x = 0; /* { dg-message "note: beginning .gang-single. part in OpenACC .kernels. region" } */
+    x = 0; /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     y = x < 10;
     z = x++;
     ;
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
index a4f721067ccf..a189ef498c22 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
@@ -1,4 +1,4 @@
-/* Test the output of "-fopt-info-optimized-omp" for combined OpenACC 'kernels
+/* Test the output of "-fopt-info-note-omp" for combined OpenACC 'kernels
    loop' constructs with explicit or implicit 'auto' clause.  */

 /* { dg-additional-options "-fopt-info-note-omp" } */
@@ -12,47 +12,47 @@ main ()
   int x, y, z;

 #pragma acc kernels loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto gang vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto gang worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto gang worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
     for (y = 0; y < 10; y++)
@@ -61,19 +61,19 @@ main ()
        ;

 #pragma acc kernels loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto
     for (y = 0; y < 10; y++)
       ;

 #pragma acc kernels loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto
     for (y = 0; y < 10; y++)
@@ -82,7 +82,7 @@ main ()
        ;

 #pragma acc kernels loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto
     for (y = 0; y < 10; y++)
@@ -91,7 +91,7 @@ main ()
        ;

 #pragma acc kernels loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop
     for (y = 0; y < 10; y++)
@@ -100,7 +100,7 @@ main ()
        ;

 #pragma acc kernels loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto
     for (y = 0; y < 10; y++)
@@ -109,7 +109,7 @@ main ()
        ;

 #pragma acc kernels loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop auto
     for (y = 0; y < 10; y++)
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
index 54960918d8c2..9ad36a9ab5f0 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
@@ -12,47 +12,38 @@ main ()
   int x, y, z;

 #pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -61,19 +52,16 @@ main ()
        ;

 #pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
       ;

 #pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -82,7 +70,6 @@ main ()
        ;

 #pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -91,7 +78,6 @@ main ()
        ;

 #pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -100,7 +86,6 @@ main ()
        ;

 #pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -109,7 +94,6 @@ main ()
        ;

 #pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
     for (y = 0; y < 10; y++)
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
index 0a3babe7a44c..c26a7b40951f 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
@@ -1,8 +1,10 @@
-/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+/* Test the output of "-fopt-info-note-omp" for OpenACC 'kernels'
    constructs containing conditionally executed 'loop' constructs with
    'independent' or 'seq' clauses.  */

-/* { dg-additional-options "-fopt-info-all-omp" } */
+/* { dg-additional-options "-fopt-info-note-omp" } */
+/* { dg-additional-options "-fopt-info-missed-omp" } */
+/* { dg-additional-options "--param openacc-kernels=decompose-parloops" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
@@ -14,7 +16,7 @@ main ()
 {
   int x, y, z;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -24,7 +26,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -34,7 +36,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -44,7 +46,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -54,7 +56,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -64,7 +66,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -74,7 +76,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -84,7 +86,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -94,7 +96,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -108,7 +110,7 @@ main ()
        ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -118,7 +120,7 @@ main ()
     ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -130,7 +132,7 @@ main ()
       ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -144,7 +146,7 @@ main ()
        ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -158,7 +160,7 @@ main ()
        ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -172,7 +174,7 @@ main ()
        ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
@@ -186,7 +188,7 @@ main ()
        ;
  }

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels
  /* Strangely indented to keep this similar to other test cases.  */
  if (c) /* { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" } */
  {
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
index 4f17204d1991..c9a39eb54ef9 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
@@ -13,124 +13,134 @@ main ()
   int x, y, z;

 #pragma acc kernels
-#pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang /* { dg-optimized "assigned OpenACC gang loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto worker /* { dg-optimized "assigned OpenACC worker loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto vector /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto gang vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang vector /* { dg-optimized "assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto gang worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang worker /* { dg-optimized "assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto worker vector /* { dg-optimized "assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto gang worker vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang worker vector /* { dg-optimized "assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto gang /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto gang /* { dg-optimized "assigned OpenACC gang loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto worker /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+#pragma acc loop auto worker /* { dg-optimized "assigned OpenACC worker loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto vector /* { dg-error ".auto. conflicts with other OpenACC loop specifiers" } */
+#pragma acc loop auto vector /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+  /* { dg-bogus ".auto. conflicts with other OpenACC loop specifiers" "" { xfail *-*-* } .-1 } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels
-#pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
-#pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
       ;

 #pragma acc kernels
-#pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels
-#pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels
-#pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels
-#pragma acc loop auto
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

 #pragma acc kernels
-#pragma acc loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
-#pragma acc loop
+#pragma acc loop /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
       for (z = 0; z < 10; z++)
        ;

diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
index bbdcf1636b10..f0ac62e5d55f 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
@@ -2,7 +2,7 @@
    constructs containing 'loop' constructs with 'independent' or 'seq'
    clauses.  */

-/* { dg-additional-options "-fopt-info-note-optimized-omp" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
@@ -14,55 +14,46 @@ main ()

 #pragma acc kernels
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -72,13 +63,11 @@ main ()

 #pragma acc kernels
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
     ;

 #pragma acc kernels
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -86,7 +75,6 @@ main ()

 #pragma acc kernels
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -96,7 +84,6 @@ main ()

 #pragma acc kernels
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -106,7 +93,6 @@ main ()

 #pragma acc kernels
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -116,7 +102,6 @@ main ()

 #pragma acc kernels
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
   for (x = 0; x < 10; x++)
 #pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
     for (y = 0; y < 10; y++)
@@ -124,15 +109,5 @@ main ()
       for (z = 0; z < 10; z++)
        ;

-#pragma acc kernels
-#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. region" "" { target *-*-* } .-1 } */
-  for (x = 0; x < 10; x++)
-#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
-    for (y = 0; y < 10; y++)
-#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-      for (z = 0; z < 10; z++)
-       ;
-
   return 0;
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
index 92accdf27fa2..30706a7e4a5e 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
@@ -1,8 +1,9 @@
 /* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
    construct containing loops.  */

-/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
-/* { dg-additional-options "-fopt-info-note-optimized-omp" } */
+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   Normally, loop variables would get (implicit) 'private' clauses on the (implicit) 'loop' directives, but given '-fno-openacc-kernels-annotate-loops' they're (implicit) 'copy' -- which we then see get optimized.  */
+/* { dg-additional-options "-fopt-info-optimized-note-omp" } */

 //TODO update accordingly
 /* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
@@ -12,31 +13,31 @@ main ()
 {
   int x, y, z;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     for (y = 0; y < 10; y++)
       ;

-#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. region in OpenACC .kernels. construct" } */
+#pragma acc kernels
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" } */
     for (y = 0; y < 10; y++)
       for (z = 0; z < 10; z++)
        ;
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-1.c b/gcc/testsuite/c-c++-common/goacc/routine-1.c
index a11e602db363..9b65c6c4a00b 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-1.c
@@ -34,7 +34,7 @@ void nohost (void)

 int main ()
 {
-#pragma acc kernels num_gangs (32) num_workers (32) vector_length (32) /* { dg-warning "region contains gang partitioned code but is not gang partitioned" } */
+#pragma acc kernels num_gangs (32) num_workers (32) vector_length (32)
   {
     gang ();
     worker ();
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-2.c b/gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-2.c
index 0e0e4a728f05..33678fe8d0a9 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-level-of-parallelism-2.c
@@ -11,8 +11,6 @@
    { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } .+1 } */
 void g_1 (void)
 {
-  /* { dg-bogus "region is worker partitioned but does not contain worker partitioned code" "" { xfail *-*-* } .-2 } */
-  /* { dg-bogus "region is vector partitioned but does not contain vector partitioned code" "" { xfail *-*-* } .-3 } */
 }
 #pragma acc routine (g_1) gang
 #pragma acc routine (g_1) gang
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
index 480c57feb05f..bf3c1f3c7cbf 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
@@ -1,7 +1,7 @@
 /* Test the nohost clause for OpenACC routine directive.  Exercising different
    variants for declaring routines.  */

-/* { dg-additional-options "-fdump-tree-oaccloops" } */
+/* { dg-additional-options "-fdump-tree-oaccloops3" } */

 #pragma acc routine nohost
 int THREE(void)
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
index 628b84940a1c..b3cc4459328f 100644
--- a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
@@ -7,12 +7,6 @@ foo (void)
   int i;

 #pragma acc kernels
-  /* { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 } */
-  /*TODO With the 'copy' -> 'firstprivate' optimization, the original implicit 'copy(i)' clause gets optimized into a 'firstprivate(i)' clause -- and the expected (?) warning diagnostic appears.
-    Have to read up the history behind these test cases.
-    Should this test remain here in this file even if now testing 'firstprivate'?
-    Or, should the optimization be disabled for such testing?
-    Or, the testing be duplicated for both variants?  */
   {
     i = 1;
   }
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index 6979cce71b05..d2245a8f70ce 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow*" } */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow1" } */

 extern int place ();

diff --git a/gcc/testsuite/gcc.dg/goacc/nested-function-1.c b/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
index e17c0e2227fc..6b94692ced7b 100644
--- a/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
@@ -2,6 +2,7 @@
 /* See gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90 for the Fortran
    version.  */

+/* { dg-excess-errors ".*insufficient partitioning.*" } */
 int main ()
 {
 #define N 100
@@ -35,7 +36,7 @@ int main ()
 #pragma acc loop seq tile(*)
        for (local_j = 0; local_j < N; ++local_j)
          ;
-#pragma acc loop auto independent tile(1)
+#pragma acc loop auto tile(1)
        for (local_j = 0; local_j < N; ++local_j)
          ;
       }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 7abf12f3d58d..b41b9e88a8be 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -2,11 +2,10 @@
 ! OpenACC kernels.

 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-note-optimized-omp" }
 ! { dg-additional-options "-fdump-tree-ompexp" }
-! { dg-additional-options "-fdump-tree-parloops1-all" }
-! { dg-additional-options "-fdump-tree-oaccloops" }
+! { dg-additional-options "-fdump-tree-graphite-all-details" }
+! { dg-additional-options "-fdump-tree-oaccloops1-details" }

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -17,30 +16,23 @@ program main
   integer, dimension (0:n-1) :: a, b, c
   integer :: i

-  ! An "external" mapping of loop iterations/array indices makes the loop
-  ! unparallelizable.
+  ! A function call in a data reference makes the loop unparallelizable.
   integer, external :: f

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
-  do i = 0, n - 1 ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" }
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "note: beginning .Graphite. part in OpenACC .kernels. region" }
      c(i) = a(f (i)) + b(f (i))
   end do
   !$acc end kernels
 end program main

 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
-
-! Check that exactly one OpenACC kernels construct is analyzed, and that it
-! can't be parallelized.
-! { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "ompexp" } }

 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-not "^assigned OpenACC.*?loop parallelism$" "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index fb19a98d8a59..467e3ffb4b64 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -2,11 +2,8 @@
 ! kernels.

 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-! { dg-additional-options "-fopt-info-note-optimized-omp" }
-! { dg-additional-options "-fdump-tree-ompexp" }
-! { dg-additional-options "-fdump-tree-parloops1-all" }
-! { dg-additional-options "-fdump-tree-oaccloops" }
+! { dg-additional-options "-fopt-info-optimized" }
+! { dg-additional-options "-fdump-tree-oaccloops1" }

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -19,24 +16,15 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))  ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
-  do i = 0, n - 1 ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" }
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-optimized "assigned OpenACC gang vector loop parallelism" }
      c(i) = a(i) + b(i)
   end do
   !$acc end kernels
 end program main

-! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
-
-! Check that exactly one OpenACC kernels construct is analyzed, and that it
-! can be parallelized.
-! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
-
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is parallel_kernels_graphite OpenACC kernels offload" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
index ce4c08ff219d..347f17dbf131 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
@@ -29,6 +29,6 @@ end program main

 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
index 02c929d31a00..dea566f07750 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-routine.f95
@@ -4,7 +4,7 @@
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 ! { dg-additional-options "-fdump-tree-ompexp" }
-! { dg-additional-options "-fdump-tree-oaccloops" }
+! { dg-additional-options "-fdump-tree-oaccloops1" }

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -28,6 +28,6 @@ end subroutine ROUTINE

 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\)\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC routine level 1" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0 1, 1 1, 1 1\\), omp declare target \\(worker\\)\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
index 946f4a80c012..4ce2d05e308a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
@@ -4,7 +4,7 @@
 ! { dg-additional-options "-O2 -w" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 ! { dg-additional-options "-fdump-tree-ompexp" }
-! { dg-additional-options "-fdump-tree-oaccloops" }
+! { dg-additional-options "-fdump-tree-oaccloops1" }

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -18,9 +18,6 @@ program main
   call setup(a, b)

   !$acc serial loop copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
-  ! { dg-bogus "\[Ww\]arning: region contains gang partitioned code but is not gang partitioned" "TODO 'serial'" { xfail *-*-* } .-1 }
-  ! { dg-bogus "\[Ww\]arning: region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 }
-  ! { dg-bogus "\[Ww\]arning: region contains vector partitioned code but is not vector partitioned" "TODO 'serial'" { xfail *-*-* } .-3 }
   do i = 0, n - 1
      c(i) = a(i) + b(i)
   end do
@@ -32,6 +29,6 @@ end program main

 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops1" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
index e9f169f95178..025ae0fd1e7b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
@@ -27,16 +27,16 @@ program main
   !$acc end kernels
 end program main

-! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:a \\\[len: 400\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:a \\\[len: 400\\\]\\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:b \\\[len: 400\\\]\\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:c \\\[len: 4\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:c \\\[len: 4\\\]\\\)" 1 "omplower" } }

-! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(tofrom:x \\\[len: 400\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(tofrom:x \\\[len: 400\\\]\\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(tofrom:y \\\[len: 400\\\]\\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:x \\\[len: 400\\\]\\\[implicit\\\]\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:y \\\[len: 400\\\]\\\[implicit\\\]\\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_tofrom:i \\\[len: 4\\\]\\\[implicit\\\]\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_tofrom:c \\\[len: 4\\\]\\\[implicit\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel_kernels_graphite .*map\\(force_present:x \\\[len: 400\\\]\\\[implicit\\\]\\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel_kernels_graphite .*map\\(force_present:y \\\[len: 400\\\]\\\[implicit\\\]\\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel_kernels_graphite .*private\\(i\\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_parallel_kernels_graphite .*map\\(force_tofrom:c \\\[len: 4\\\]\\\[implicit\\\]\\\)" 1 "omplower" } }

 ! Expecting no mapping of un-referenced common-blocks variables

diff --git a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95 b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
index cc83c7dd65b9..37c3222d4081 100644
--- a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
@@ -17,7 +17,7 @@ program main

   call test (a, b, 0, n)

-  !$acc parallel loop gang (static:1) num_gangs (10)
+  !$acc parallel loop gang (static:1) num_gangs (11)
   do i = 1, n
      a(i) = b(i) + 1
   end do
@@ -25,7 +25,7 @@ program main

   call test (a, b, 1, n)

-  !$acc parallel loop gang (static:2) num_gangs (10)
+  !$acc parallel loop gang (static:2) num_gangs (12)
   do i = 1, n
      a(i) = b(i) + 2
   end do
@@ -33,7 +33,7 @@ program main

   call test (a, b, 2, n)

-  !$acc parallel loop gang (static:5) num_gangs (10)
+  !$acc parallel loop gang (static:5) num_gangs (13)
   do i = 1, n
      a(i) = b(i) + 5
   end do
@@ -41,7 +41,7 @@ program main

   call test (a, b, 5, n)

-  !$acc parallel loop gang (static:20) num_gangs (10)
+  !$acc parallel loop gang (static:20) num_gangs (14)
   do i = 1, n
      a(i) = b(i) + 20
   end do
@@ -73,10 +73,8 @@ subroutine test (a, b, sarg, n)
   end do
 end subroutine test

-! { dg-final { scan-tree-dump-times "gang\\(static:\\\*\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "gang\\(static:1\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "gang\\(static:2\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "gang\\(static:5\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "gang\\(static:20\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "gang\\(static:20\\)" 2 "omplower" } }
+! { dg-final { scan-tree-dump-times "gang\\(static:\\\*\\)" 2 "omplower" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index ddaf7f8e43d4..a70f5efdd6ba 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -1,92 +1,161 @@
 ! Test OpenACC 'kernels' construct decomposition.

-! { dg-additional-options "-fopt-info-omp-all" }
-! { dg-additional-options "-fdump-tree-gimple" }
-! { dg-additional-options "--param=openacc-kernels=decompose" }
-! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
-
+! { dg-additional-options "-fopt-info-optimized-note-omp" }
+! { dg-additional-options "-O2" } for "Graphite".
+! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose-details -fdump-tree-gimple-details" }
 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.

-! See also '../../c-c++-common/goacc/kernels-decompose-1.c'.
-
-! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
-! passed to 'incr' may be unset, and in that case, it will be set to [...]",
-! so to maintain compatibility with earlier Tcl releases, we manually
-! initialize counter variables:
-! { dg-line l_dummy[variable c_loop_i 0] }
-! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
-! "WARNING: dg-line var l_dummy defined, but not used".
-
 program main
   implicit none
-  integer, parameter         :: N = 1024
-  integer, dimension (1:N)   :: a
-  integer                    :: i, sum

-  !$acc kernels copyin(a(1:N)) copy(sum)
-  ! { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
-  !TODO Is this maybe the report that belongs to the XFAILed report further down?  */
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+  integer :: sum
+
+  !$acc kernels
+  ! { dg-optimized {'map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:x \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-1 }
+  ! { dg-optimized {'map\(to:x \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(x\)'} "" { target *-*-* } .-2 }
+  ! { dg-optimized {'map\(force_tofrom:y_l \[len: [0-9]+\]\[implicit\]\)' optimized to 'map\(to:y_l \[len: [0-9]+\]\[implicit\]\)'} "" { target *-*-* } .-3 }
+  ! { dg-optimized {'map\(to:y_l \[len: [0-9]+\]\[implicit\]\)' further optimized to 'private\(y_l\)'} "" { target *-*-* } .-4 }
+  x = 0
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  ;
+  !$acc end kernels
+
+  !$acc kernels
+  do i = 1, N ! { dg-optimized "assigned OpenACC gang vector loop parallelism" }
+     a(i) = 0
+  end do
+  !$acc end kernels

-  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  !$acc kernels loop
+  ! { dg-bogus "assigned OpenACC seq loop parallelism" "TODO-kernels Graphite cannot represent access function" { xfail *-*-* } .-1 }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "TODO-kernels Graphite cannot represent access function " { xfail *-*-* } .-2 }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-3 }
   do i = 1, N
-    sum = sum + a(i)
+     b(i) = a(N - i + 1)
   end do

-  sum = sum + 1 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
-  a(1) = a(1) + 1
+  !$acc kernels ! { dg-optimized {.map\(force_tofrom:sum \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:sum \[len: [0-9]+\]\[implicit\]\).} }
+  !$acc loop
+  ! { dg-bogus "assigned OpenACC seq loop parallelism" "TODO-kernels Graphite cannot represent access function" { xfail *-*-* } .-1 }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "TODO-kernels Graphite cannot represent access function " { xfail *-*-* } .-2 }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-3 }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do

-  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  !$acc loop
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "TODO-kernels Graphite cannot represent access function " { target *-*-* } .-1 }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-2 }
   do i = 1, N
-    sum = sum + a(i)
+     c(i) = a(i) * b(i)
   end do

-  if (sum .gt. 10) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
-    !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-    ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i }
-    !TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i }
-    do i = 1, N
-      sum = sum + a(i)
-    end do
-  end if
+  a(z) = 0

-  !$acc loop auto ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  !$acc loop
+  ! { dg-bogus "assigned OpenACC seq loop parallelism" "TODO-kernels missing synth reductions" { xfail *-*-* } .-1 }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "TODO-kernels missing synth reductions" { xfail *-*-* } .-2 }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. region to .Graphite. for analysis" "" { target *-*-* } .-3 }
   do i = 1, N
     sum = sum + a(i)
   end do

+  !$acc loop seq ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
   !$acc end kernels
-end program main

-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels map\(to:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(tofrom:sum \[len: [0-9]+\]\)$} 1 "gimple" } }
+  !$acc kernels
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N
+     !$acc loop independent ! { dg-optimized "assigned OpenACC worker loop parallelism" }
+     do j = 1, N
+        !$acc loop independent ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        ! { dg-bogus "optimized: assigned OpenACC vector loop parallelism" "" { target *-*-* } .-2 }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        end do
+     end do
+  end do
+
+  if (y < 5) then
+     !$acc loop independent ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+     do j = 1, N
+        b(j) = f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels
+  y = f_g (a(5)) ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" }
+
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  ! { dg-bogus "optimized: assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-1 }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  ! { dg-optimized {.map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:z \[len: [0-9]+\]\[implicit\]\).} "" { target *-*-* } .-1 }
+  ! { dg-optimized {.map\(to:z \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(z\).} "" { target *-*-* } .-2 }
+  ! { dg-optimized {.map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:y \[len: [0-9]+\]\[implicit\]\).} "" { target *-*-* } .-3 }
+  ! { dg-optimized {.map\(to:y \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(y\).} "" { target *-*-* } .-4 }
+  y = 3
+
+  !$acc loop independent ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
+  ! { dg-bogus "optimized: assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-1 }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+  end do
+
+  z = 2
+  !$acc end kernels
+
+  !$acc kernels
+  !$acc end kernels
+end program main

-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "gimple" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 4 "gimple" } }
 ! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) independent$} 1 "gimple" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) auto$} 1 "gimple" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "gimple" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 1 "gimple" } }

 ! Check that the OpenACC 'kernels' got decomposed into 'data' and an enclosed
 ! sequence of compute constructs.
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(to:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(tofrom:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
-! As noted above, we get three "old-style" kernel regions, one gang-single region, and one parallelized loop region.
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels async\(-1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 3 "omp_oacc_kernels_decompose" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_parallelized async\(-1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels} 8 "omp_oacc_kernels_decompose" } }
 !
 ! 'data' plus five CCs.
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target } 6 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_graphite async\(-1\)} 6 "omp_oacc_kernels_decompose" } }

-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 5 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) seq$} 1 "omp_oacc_kernels_decompose" } }
 ! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) independent$} 1 "omp_oacc_kernels_decompose" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) auto} 1 "omp_oacc_kernels_decompose" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(j\) independent$} 4 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(k\) independent$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 12 "omp_oacc_kernels_decompose" } }

 ! Each of the parallel regions is async, and there is a final call to
 ! __builtin_GOACC_wait.
-! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 4 "omp_oacc_kernels_decompose" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index c3e00283428b..c86535a9cd83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,9 +1,9 @@
 ! Test OpenACC 'kernels' construct decomposition.

-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+! { dg-additional-options "-fopenacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
-! { dg-additional-options "-O2" } for 'parloops'.
+! { dg-additional-options "-O2" } for 'Graphite'.

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -36,71 +36,102 @@ program main
   integer, parameter :: N = 10
   integer :: a(N), b(N), c(N)

-  !$acc kernels
-  x = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+  ! { dg-missed {.map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\). not optimized: .y. used} "" { target *-*-* } l_kernels$region }
+  ! { dg-optimized {.map\(force_tofrom:y_l \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:y_l \[len: [0-9]+\]\[implicit\]\).} "" { target *-*-* } l_kernels$region }
+  ! { dg-optimized {.map\(to:y_l \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(y_l\).}  "" { target *-*-* } l_kernels$region }
+  ! { dg-optimized {.map\(force_tofrom:x \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:x \[len: [0-9]+\]\[implicit\]\).} "" { target *-*-* } l_kernels$region }
+  ! { dg-optimized {.map\(to:x \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(x\).}  "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\). not optimized: .z. used} "" { target *-*-* } l_kernels$region }
+  x = 0
   y = 0
   y_l = x < 10
   z = x
   x = x + 1
+
   ;
   !$acc end kernels

-  !$acc kernels ! { dg-optimized "assigned OpenACC gang loop parallelism" }
-  do i = 1, N ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+  ! { dg-missed {'map\(tofrom:a \[len: [0-9]+\]\[implicit\]\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  do i = 1, N  ! { dg-line l_loop_i[incr c_loop_i] }
+     ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'Graphite' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+     ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
      a(i) = 0
   end do
   !$acc end kernels

   !$acc kernels loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-missed {.map\(tofrom:b \[len: [0-9]+\]\[implicit\]\). not optimized: .b. is unsuitable for privatization} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-missed {.map\(tofrom:a \[len: [0-9]+\]\[implicit\]\). not optimized: .a. is unsuitable for privatization} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'Graphite' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "TODO Graphite cannot represent access function" { xfail *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "assigned OpenACC seq loop parallelism" "TODO Graphite cannot represent access function" { xfail *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(N - i + 1)
   end do

-  !$acc kernels
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+  ! { dg-missed {.map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\). not optimized: .z. used} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed    {\.\.\. here}  "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:c \[len: [0-9]+\]\[implicit\]\). not optimized: .c. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:b \[len: [0-9]+\]\[implicit\]\). not optimized: .b. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:a \[len: [0-9]+\]\[implicit\]\). not optimized: .a. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'Graphite' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { xfail *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "assigned OpenACC seq loop parallelism" "TODO Graphite cannot represent access function" { xfail *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(N - i + 1)
   end do

   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'Graphite' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = a(i) * b(i)
   end do

-  a(z) = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  a(z) = 0

   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
-  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'Graphite' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = c(i) + a(i)
   end do

   !$acc loop seq ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1 + 1, N
      c(i) = c(i) + c(i - 1)
   end do
   !$acc end kernels

-  !$acc kernels ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+  ! { dg-missed {.map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\). not optimized: .y. used} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed    {\.\.\. here}  "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:c \[len: [0-9]+\]\[implicit\]\). not optimized: .c. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:b \[len: [0-9]+\]\[implicit\]\). not optimized: .b. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {.map\(tofrom:a \[len: [0-9]+\]\[implicit\]\). not optimized: .a. is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_i$c_loop_i }
+  ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
      ! { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+     ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
+     ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
      do j = 1, N
         !$acc loop independent ! { dg-line l_loop_k[incr c_loop_k] }
         ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k }
         ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k }
+        ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_k$c_loop_k }
+        ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_k$c_loop_k }
         do k = 1, N
            a(1 + mod(i + j + k, N)) &
                 = b(j) &
@@ -109,45 +140,58 @@ program main
      end do
   end do

-  !TODO Should the following turn into "gang-single" instead of "parloops"?
-  !TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".
-  if (y < 5) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+  if (y < 5) then ! { dg-message "note: beginning 'Graphite' part in OpenACC 'kernels' region" }
      !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
-     ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j }
+     ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j } ! TODO-kernels Clarify: should this be unparallelized or should the warning go away?
+     ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
+     ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
+     ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
      do j = 1, N
-        b(j) = f_w (c(j))
+        b(j) = f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
      end do
   end if
   !$acc end kernels

-  !$acc kernels
-  ! { dg-bogus "\[Ww\]arning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } .-1 }
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+  ! { dg-missed {'map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\)' not optimized: 'y' used} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {'map\(tofrom:c \[len: [0-9]+\]\[implicit\]\)' not optimized: 'c' is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {'map\(tofrom:b \[len: [0-9]+\]\[implicit\]\)' not optimized: 'b' is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
+  ! { dg-missed {'map\(tofrom:a \[len: [0-9]+\]\[implicit\]\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } l_kernels$region }
   y = f_g (a(5)) ! { dg-line l_part[incr c_part] }
-  !TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang paralelism, instead of "gang-single"?
-  ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" "" { target *-*-* } l_part$c_part }
   ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part }

   !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
   ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
+  ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
   do j = 1, N
      b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
   end do
   !$acc end kernels

-  !$acc kernels
-  y = 3 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  !$acc kernels ! { dg-line l_kernels[incr region] }
+! { dg-optimized {.map\(force_tofrom:z \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:z \[len: [0-9]+\]\[implicit\]\).} "" { target *-*-* } l_kernels$region }
+! { dg-optimized {.map\(to:z \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(z\).}  "" { target *-*-* } l_kernels$region }
+! { dg-optimized {.map\(force_tofrom:y \[len: [0-9]+\]\[implicit\]\). optimized to .map\(to:y \[len: [0-9]+\]\[implicit\]\).}  "" { target *-*-* } l_kernels$region }
+! { dg-optimized {.map\(to:y \[len: [0-9]+\]\[implicit\]\). further optimized to .private\(y\).}  "" { target *-*-* } l_kernels$region }
+! { dg-missed    {\.\.\. here}  "" { target *-*-* } l_kernels$region }
+! { dg-missed    {.map\(tofrom:c \[len: [0-9]+\]\[implicit\]\). not optimized: .c. is unsuitable for privatization}  "" { target *-*-* } l_kernels$region }
+! { dg-missed    {.map\(tofrom:b \[len: [0-9]+\]\[implicit\]\). not optimized: .b. is unsuitable for privatization}  "" { target *-*-* } l_kernels$region }
+! { dg-missed    {\.\.\. here}  "" { target *-*-* } l_kernels[expr {$region - 1}] }
+
+  y = 3

   !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
   ! { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-bogus "missed: .auto. loop has not been analyzed .cf. .graphite. dumps for more information." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
+  ! { dg-bogus "missed: .independent. loop in .kernels. region has not been analyzed .cf. .graphite. dumps for more information.." "TODO Inexact representation of access function in Graphite" { xfail *-*-* } l_loop_j$c_loop_j }
   do j = 1, N
      b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
   end do

-  z = 2 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  z = 2
   !$acc end kernels

-  !$acc kernels ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  !$acc kernels
   !$acc end kernels
 end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index 3884a81e6fa3..f3ab71da5f80 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -1,8 +1,7 @@
-! { dg-additional-options "--param=openacc-kernels=parloops" } as this is
-! specifically testing "parloops" handling.
+! { dg-additional-options "--param openacc-kernels=decompose" } as this is
+! specifically testing "Graphite" handling.
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-graphite-details" }
 ! { dg-additional-options "-fdump-tree-optimized" }

 program main
@@ -36,9 +35,9 @@ program main
 end program main

 ! Check that only three loops are analyzed, and that all can be parallelized.
-! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+! { dg-final { scan-tree-dump-times "loop has no data-dependences" 6 "graphite" } } ! Two CFG loops per OpenACC loop
+! { dg-final { scan-tree-dump-not "loop has data-dependences" "graphite" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 3 "graphite" } }

 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 62cd4d38a36c..348a2e186d87 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -1,8 +1,7 @@
-! { dg-additional-options "--param=openacc-kernels=parloops" } as this is
-! specifically testing "parloops" handling.
+! { dg-additional-options "--param=openacc-kernels=decompose" } as this is
+! specifically testing "Graphite" handling.
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-graphite-details" }
 ! { dg-additional-options "-fdump-tree-optimized" }

 program main
@@ -42,9 +41,9 @@ program main
 end program main

 ! Check that only three loops are analyzed, and that all can be parallelized.
-! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 3 "graphite" } }
+! { dg-final { scan-tree-dump-times "loop has no data-dependences" 6 "graphite" } } ! Two CFG loops per OpenACC loop

 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index 23e64d29ab3e..0c26c9897c0e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -1,4 +1,4 @@
-! { dg-additional-options "--param=openacc-kernels=parloops" } as this is
+! { dg-additional-options "--param openacc-kernels=decompose-parloops" } as this is
 ! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
@@ -9,8 +9,8 @@ program main
    integer :: a(100,100), b(100,100)
    integer :: i, j, d

-   !$acc kernels ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
-   do i=1,100
+   !$acc kernels
+   do i=1,100 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
      do j=1,100
        a(i,j) = 1
        b(i,j) = 2
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index 2f8db4b9fd7b..7ac3831294f7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -1,8 +1,7 @@
-! { dg-additional-options "--param=openacc-kernels=parloops" } as this is
-! specifically testing "parloops" handling.
+! { dg-additional-options "--param openacc-kernels=decompose" } as this is
+! specifically testing "Graphite" handling.
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
-! { dg-additional-options "-fdump-tree-parloops1-all" }
+! { dg-additional-options "-fdump-tree-graphite-details" }
 ! { dg-additional-options "-fdump-tree-optimized" }

 program main
@@ -32,9 +31,8 @@ program main
 end program main

 ! Check that only one loop is analyzed, and that it can be parallelized.
-! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc parallel_kernels_graphite, omp target entrypoint\\)\\)" 1 "graphite" } }
+! { dg-final { scan-tree-dump-times "loop has no data-dependences" 2 "graphite" } } ! Two CFG loops per OpenACC loop

 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index 688ed0a7dc37..0a76b90d279c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -38,4 +38,4 @@ end program test
 ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 "original" } }

 ! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels if\((?:D\.|_)[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } }
-! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single num_gangs\(1\) if\((?:D\.|_)[0-9]+\) async\(-1\)$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_graphite if\((?:D\.|_)[0-9]+\) async\(-1\)$} 1 "omp_oacc_kernels_decompose" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
index a4cf11c806f7..853a4b22e212 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
@@ -35,7 +35,7 @@ program test
       DO j = 1,10
       ENDDO
     ENDDO
-    !$acc loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+    !$acc loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
     DO i = 1,10
     ENDDO

@@ -56,11 +56,11 @@ program test
       !$acc loop worker ! { dg-error "inner loop uses same OpenACC parallelism as containing loop" }
       DO j = 1,10
       ENDDO
-      !$acc loop gang ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop gang ! { dg-error "" }
       DO j = 1,10
       ENDDO
     ENDDO
-    !$acc loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+    !$acc loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
     DO i = 1,10
     ENDDO
     !$acc loop gang worker
@@ -81,14 +81,14 @@ program test
       !$acc loop vector ! { dg-error "inner loop uses same OpenACC parallelism as containing loop" }
       DO j = 1,10
       ENDDO
-      !$acc loop worker ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop worker ! { dg-error "" }
       DO j = 1,10
       ENDDO
-      !$acc loop gang ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop gang ! { dg-error "" }
       DO j = 1,10
       ENDDO
     ENDDO
-    !$acc loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+    !$acc loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
     DO i = 1,10
     ENDDO
     !$acc loop gang vector
@@ -101,7 +101,7 @@ program test
     !$acc loop auto
     DO i = 1,10
     ENDDO
-    !$acc loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+    !$acc loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
     DO i = 1,10
     ENDDO
     !$acc loop gang auto ! { dg-error "'auto' conflicts with other OpenACC loop specifiers" }
@@ -133,7 +133,7 @@ program test
   !$acc kernels loop gang(static:*)
   DO i = 1,10
   ENDDO
-  !$acc kernels loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+  !$acc kernels loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
   DO i = 1,10
   ENDDO

@@ -146,7 +146,7 @@ program test
   !$acc kernels loop worker(num:5)
   DO i = 1,10
   ENDDO
-  !$acc kernels loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+  !$acc kernels loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
   DO i = 1,10
   ENDDO
   !$acc kernels loop gang worker
@@ -162,7 +162,7 @@ program test
   !$acc kernels loop vector(length:5)
   DO i = 1,10
   ENDDO
-  !$acc kernels loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+  !$acc kernels loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
   DO i = 1,10
   ENDDO
   !$acc kernels loop gang vector
@@ -175,7 +175,7 @@ program test
   !$acc kernels loop auto
   DO i = 1,10
   ENDDO
-  !$acc kernels loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
+  !$acc kernels loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
   DO i = 1,10
   ENDDO
   !$acc kernels loop gang auto ! { dg-error "'auto' conflicts with other OpenACC loop specifiers" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
index bba67dcf7cbc..a9e75f2da7da 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
@@ -23,7 +23,6 @@ subroutine test_loop_nest_depth_1 ()

   !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC gang vector loop parallelism" }
   ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
-  ! { dg-message ".auto. loop can be parallel" "" {target *-*-*} .-2 }
   do i=1, n
      array2(i) = array1(i) ! { dg-message "loop has no data-dependences" }
   end do
@@ -31,7 +30,6 @@ subroutine test_loop_nest_depth_1 ()

   !$acc parallel loop auto copy(array1, array2) ! { dg-message "assigned OpenACC seq loop parallelism" }
   ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-1 }
-  ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-2 }
   do i=1, n-1
      array1(i+1) = array1(i) + 10 ! { dg-message "loop has data-dependences" }
      array2(i) = array1(i)
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
index d635cc5e4fe0..0338c8788669 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
@@ -24,12 +24,10 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, m
           array (1, i, j) = array(2, i, j) ! { dg-message "loop has no data-dependences" }
        end do
@@ -42,13 +40,11 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n) = array(1, i, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, m
           array (1, i, j) = array (2, i,j) ! { dg-message "loop has no data-dependences" }
        end do
@@ -61,13 +57,11 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
     do i=1, n-1
        array (1, i+1, 1) = array (2, i, 1) ! { dg-message "loop has data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, m
           array (1, i, j) = array (2, i, j) ! { dg-message "loop has no data-dependences" }
        end do
@@ -81,13 +75,11 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n) = array (1, i, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
        do j=1, m-1
           array (1, i, j+1) = array (1, i, j) ! { dg-message "loop has data-dependences" }
        end do
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
index 97acecd8807b..3d8fdd37d661 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
@@ -23,17 +23,14 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
           do k=1, n
              array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
           end do
@@ -47,17 +44,14 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
           end do
@@ -98,19 +92,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC worker loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n-1
           array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
           end do
@@ -126,19 +117,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n-1
           array (2, i, j, n) = array (1, i, j, n) ! { dg-message "loop has no data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
           end do
@@ -154,19 +142,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
        do j=1, n-1
           array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (2, i, j, k) = array(1, i, j, k) ! { dg-message "loop has no data-dependences" }
           end do
@@ -182,19 +167,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
     do i=1, n
        array (2, i, n, n) = array (1, i, n, n) ! { dg-message "loop has no data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
        do j=1, n-1
           array (1, i, j+1, n) = array (1, i, j, n) ! { dg-message "loop has data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
           end do
@@ -210,18 +192,15 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
     do i=1, n - 1
        array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC gang worker loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
           do k=1, n
              array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
           end do
@@ -237,18 +216,15 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
     do i=1, n - 1
        array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
        do j=1, n
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
           do k=1, n - 1
              array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
           end do
@@ -264,19 +240,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
     do i=1, n - 1
        array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
        do j=1, n - 1
           array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC gang vector loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop can be parallel" "" {target *-*-*} .-3 }
           do k=1, n
              array (1, i, j, k) = array(2, i, j, k) ! { dg-message "loop has no data-dependences" }
           end do
@@ -291,19 +264,16 @@ contains
     !$acc loop auto
     ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
     ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-    ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
     do i=1, n-1
        array (1, i+1, 1, 1) = array (1, i, 1, 1) ! { dg-message "loop has data-dependences" }
        !$acc loop auto
        ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
        ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-       ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
        do j=1, n-1
           array (1, i, j+1, 1) = array (1, i, j, 1) ! { dg-message "loop has data-dependences" }
           !$acc loop auto
           ! { dg-message "assigned OpenACC seq loop parallelism" "" {target *-*-*} .-1 }
           ! { dg-message "loop has no data-dependences" "OpenACC internal chunking CFG loop can be parallel" {target *-*-*} .-2 }
-          ! { dg-message "'auto' loop cannot be parallel" "" {target *-*-*} .-3 }
           do k=1, n-1
              array (1, i, j, k+1) = array(1, i, j, k) ! { dg-message "loop has data-dependences" }
           end do
diff --git a/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90 b/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
index 005193f30a7c..7d3dacaf9c9f 100644
--- a/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
@@ -99,3 +99,5 @@ contains
     !$acc exit data copyout(nonlocal_a) delete(nonlocal_i) finalize
   end subroutine nonlocal
 end program main
+
+! { dg-prune-output ".*insufficient partitioning.*" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-parallel.f90 b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-parallel.f90
index 8fa2cabd35fa..688c0b576741 100644
--- a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-parallel.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-parallel.f90
@@ -497,3 +497,180 @@ subroutine acc_parallel_loop_reduction ()
     end do
   end do
 end subroutine acc_parallel_loop_reduction
+
+! The same tests as above, but inside a routine construct.
+subroutine acc_routine ()
+  implicit none (type, external)
+  !$acc routine gang
+  integer :: i, j, k, l, sum, diff
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum)
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop collapse(2)  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      do j = 1, 10
+        do k = 1, 10
+          !$acc loop reduction(+:sum)
+          do l = 1, 10
+            sum = 1
+          end do
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      do j = 1, 10
+        !$acc loop  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        do k = 1, 10
+          !$acc loop reduction(+:sum)
+          do l = 1, 10
+            sum = 1
+          end do
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(-:sum)
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+      do j = 1, 10
+        !$acc loop  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        do k = 1, 10
+          !$acc loop reduction(*:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+          do l = 1, 10
+            sum = 1
+          end do
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        do k = 1, 10
+          !$acc loop reduction(*:sum)  ! { dg-warning "conflicting reduction operations for .sum." }
+          do l = 1, 10
+            sum = 1
+          end do
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum) reduction(-:diff)
+    do i = 1, 10
+      !$acc loop reduction(-:diff)  ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum)
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+
+      !$acc loop reduction(+:sum)  ! { dg-warning "nested loop in reduction needs reduction clause for .diff." }
+      do j = 1, 10
+        !$acc loop reduction(-:diff)
+        do k = 1, 10
+          diff = 1
+        end do
+      end do
+    end do
+end subroutine acc_routine
+
+subroutine acc_kernels ()
+  integer :: i, j, k, sum, diff
+
+  ! FIXME:  No diagnostics are produced for these loops because reductions
+  ! in kernels regions are not supported yet.
+  !$acc kernels
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      do j = 1, 10
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop
+      do j = 1, 10
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:diff)
+      do j = 1, 10
+        !$acc loop
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum)
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, 10
+      !$acc loop reduction(-:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+      do j = 1, 10
+        !$acc loop reduction(+:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+        do k = 1, 10
+          sum = 1
+        end do
+      end do
+    end do
+  !$acc end kernels
+end subroutine acc_kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr72741.f90 b/gcc/testsuite/gfortran.dg/goacc/pr72741.f90
index b295a4fcc59c..c176f37d3837 100644
--- a/gcc/testsuite/gfortran.dg/goacc/pr72741.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/pr72741.f90
@@ -6,8 +6,8 @@ SUBROUTINE sub_1
   IMPLICIT NONE
   EXTERNAL :: g_1
   !$ACC ROUTINE (g_1) GANG WORKER ! { dg-error "Multiple loop axes" }
-  !$ACC ROUTINE (ABORT) SEQ VECTOR ! { dg-error "Multiple loop axes" "" { xfail *-*-* } }
-! { dg-bogus "invalid function name abort" "" { xfail *-*-* } .-1 }
+  !$ACC ROUTINE (ABORT) SEQ VECTOR ! { dg-error "Multiple loop axes" }
+! { dg-bogus "invalid function name abort" "" { target *-*-* } .-1 }

   CALL v_1
   CALL g_1
@@ -18,8 +18,8 @@ MODULE m_w_1
   IMPLICIT NONE
   EXTERNAL :: w_1
   !$ACC ROUTINE (w_1) WORKER SEQ ! { dg-error "Multiple loop axes" }
-  !$ACC ROUTINE (ABORT) VECTOR GANG ! { dg-error "Multiple loop axes" "" { xfail *-*-* } }
-! { dg-bogus "invalid function name abort" "" { xfail *-*-* } .-1 }
+  !$ACC ROUTINE (ABORT) VECTOR GANG ! { dg-error "Multiple loop axes" }
+! { dg-bogus "invalid function name abort" "" { target *-*-* } .-1 }

 CONTAINS
   SUBROUTINE sub_2
diff --git a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
index fef512612bd9..ef67c6386ad4 100644
--- a/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/private-explicit-kernels-1.f95
@@ -28,19 +28,16 @@ program test
   integer :: i3_5_s, j3_5_s, k3_5_s

   !$acc kernels ! Explicit "private(i0_1)" clause cannot be specified here.
-  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:i0_1 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { xfail *-*-* } } } ! PR90067
+  ! The "private" clause is added by the "omp_data_optimize" pass and is
+  ! therefore not present in the "original" dump.
+  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 0 "original" } }
   do i0_1 = 1, 100
   end do
   !$acc end kernels

   !$acc kernels ! Explicit "private(i0_2, j0_2)" clause cannot be specified here.
-  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:j0_2 \\\[len: \[0-9\]+\\\]\\) map\\(force_tofrom:i0_2 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { xfail *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 0 "original" } }
+  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 0 "original" } }
   do i0_2 = 1, 100
      do j0_2 = 1, 100
      end do
diff --git a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95 b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
index 38459cfadf33..7446ea116ee5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/private-predetermined-kernels-1.f95
@@ -28,19 +28,19 @@ program test
   integer :: i3_5_s, j3_5_s, k3_5_s

   !$acc kernels
-  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:i0_1 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { xfail *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 0 "original" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(i0_1\\)" 1 "gimple" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:i0_1 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { target *-*-* } } } ! PR90067
   do i0_1 = 1, 100
   end do
   !$acc end kernels

   !$acc kernels
-  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 1 "original" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 1 "gimple" { xfail *-*-* } } } ! PR90067
-  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:j0_2 \\\[len: \[0-9\]+\\\]\\) map\\(force_tofrom:i0_2 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { xfail *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 0 "original" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 0 "original" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(i0_2\\)" 1 "gimple" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "private\\(j0_2\\)" 1 "gimple" { target *-*-* } } } ! PR90067
+  ! { dg-final { scan-tree-dump-times "#pragma omp target oacc_kernels map\\(force_tofrom:j0_2 \\\[len: \[0-9\]+\\\]\\) map\\(force_tofrom:i0_2 \\\[len: \[0-9\]+\\\]\\)" 0 "gimple" { target *-*-* } } } ! PR90067
   do i0_2 = 1, 100
      do j0_2 = 1, 100
      end do
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
index d773e8046b52..67b1f124a70c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -1,6 +1,7 @@
 ! OpenACC 'routine' directives inside a Fortran module.

 ! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-O0" }

 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
@@ -56,7 +57,6 @@ contains
   subroutine g_1 ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" }
     implicit none
     !$acc routine gang
-    ! { dg-bogus "\[Ww\]arning: region is worker partitioned but does not contain worker partitioned code" "TODO default 'gang' 'vector'" { xfail *-*-* } .-3 }

     integer :: i

diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
index 97fbe1268b73..b2aae1df5229 100644
--- a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
@@ -5,8 +5,6 @@ subroutine foo
   integer :: i

   !$acc kernels
-  ! { dg-warning "'i' is used uninitialized in this function" "" { target *-*-* } .-1 }
-  !TODO See discussion in '../../c-c++-common/goacc/uninit-copy-clause.c'.
   i = 1
   !$acc end kernels

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 25d545559b15..45130620e1aa 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -26,18 +26,17 @@ int main()
   /* { dg-missed {'map\(tofrom:b [^)]+\)' not optimized: 'b' is unsuitable for privatization} "" { target *-*-* } .-1 }
      { dg-missed {'map\(force_tofrom:a [^)]+\)' not optimized: 'a' is unsuitable for privatization} "" { target *-*-* } .-2 } */
   {
-    int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    int c = 234; /* { dg-message "note: beginning 'Graphite' part in OpenACC 'kernels' region" } */

     /*TODO Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.  */
     (volatile int *) &c;

 #pragma acc loop independent gang /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = c;

-    a = c; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    a = c; /* { dg-message "note: beginning 'Graphite' part in OpenACC 'kernels' region" } */
   }

   for (int i = 0; i < N; ++i)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 16ec7172c448..b0094f58e89d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -3,7 +3,6 @@

 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
-/* { dg-additional-options "-O2" } for Graphite/"kernels". */


 /* See also '../libgomp.oacc-fortran/parallel-dims.f90'.  */
@@ -108,9 +107,8 @@ int main ()
     int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
     gangs_min = workers_min = vectors_min = INT_MAX;
     gangs_max = workers_max = vectors_max = INT_MIN;
-#pragma acc parallel copy (gangs_actual) /* { dg-warning "region contains gang partitioned code but is not gang partitioned" } */ \
-  num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" "" { target c++ } } */
-    /* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } .-2 } */
+#pragma acc parallel copy (gangs_actual) num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" "" { target c++ } } */
+    /* { dg-warning "region contains gang partitioned code but is not gang partitioned" "" { target *-*-* } .-1 } */
     {
       /* We're actually executing with num_gangs (1).  */
       gangs_actual = 1;
@@ -138,8 +136,8 @@ int main ()
     int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
     gangs_min = workers_min = vectors_min = INT_MAX;
     gangs_max = workers_max = vectors_max = INT_MIN;
-#pragma acc parallel copy (workers_actual) /* { dg-warning "region contains worker partitioned code but is not worker partitioned" } */ \
-  num_workers (WORKERS) /* { dg-warning "'num_workers' value must be positive" "" { target c++ } } */
+#pragma acc parallel copy (workers_actual) \
+    num_workers (WORKERS) /* { dg-warning "'num_workers' value must be positive" "" { target c++ } } */
     /* { dg-warning "region contains worker partitioned code but is not worker partitioned" "" { target *-*-* } .-2 } */
     {
       /* We're actually executing with num_workers (1).  */
@@ -168,10 +166,10 @@ int main ()
     int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
     gangs_min = workers_min = vectors_min = INT_MAX;
     gangs_max = workers_max = vectors_max = INT_MIN;
-#pragma acc parallel copy (vectors_actual) /* { dg-warning "region contains vector partitioned code but is not vector partitioned" } */ \
-  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } 164 } */ \
+#pragma acc parallel copy (vectors_actual) \
   vector_length (VECTORS) /* { dg-warning "'vector_length' value must be positive" "" { target c++ } } */
-    /* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } .-2 } */
+    /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-2 } */
+    /* { dg-warning "region contains vector partitioned code but is not vector partitioned" "" { target *-*-* } .-3 } */
     {
       /* We're actually executing with vector_length (1), just the GCC nvptx
         back end enforces vector_length (32).  */
@@ -217,7 +215,6 @@ int main ()
 #pragma acc parallel copy (gangs_actual) /* { dg-warning "region is gang partitioned but does not contain gang partitioned code" } */ \
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max) \
   num_gangs (gangs)
-    /* { dg-bogus "warning: region is gang partitioned but does not contain gang partitioned code" "TODO 'reduction'" { xfail *-*-* } .-3 } */
     {
       if (acc_on_device (acc_device_host))
        {
@@ -577,12 +574,16 @@ int main ()
       asm volatile ("" : : : "memory");

 #pragma acc loop reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
+/* { dg-warning "region is gang partitioned but does not contain gang partitioned code" "" { target *-*-* } .-1 } */
+/* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-2 } */
+/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } .-3 } */
+/* { dg-warning "removed gang worker vector partitioning from 'kernels' region" "" { target *-*-* } .-4 } */
       for (int i = 100; i > -100; --i)
-       {
-         gangs_min = gangs_max = acc_gang ();
-         workers_min = workers_max = acc_worker ();
-         vectors_min = vectors_max = acc_vector ();
-       }
+        {
+          gangs_min = gangs_max = acc_gang ();
+          workers_min = workers_max = acc_worker ();
+          vectors_min = vectors_max = acc_vector ();
+        }
     }
     if (gangs_min != 0 || gangs_max != 1 - 1
        || workers_min != 0 || workers_max != 1 - 1
@@ -627,6 +628,7 @@ int main ()
     /* { dg-bogus "warning: region contains gang partitioned code but is not gang partitioned" "TODO 'serial'" { xfail *-*-* } .-2 }
        { dg-bogus "warning: region contains worker partitioned code but is not worker partitioned" "TODO 'serial'" { xfail *-*-* } .-3 }
        { dg-bogus "warning: region contains vector partitioned code but is not vector partitioned" "TODO 'serial'" { xfail *-*-* } .-4 } */
+    /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } .-5 } */
     {
       if (acc_on_device (acc_device_nvidia))
        {
@@ -658,7 +660,7 @@ int main ()
        __builtin_abort ();
     if (gangs_min != 0 || gangs_max != 1 - 1
        || workers_min != 0 || workers_max != 1 - 1
-       || vectors_min != 0 || vectors_max != vectors_actual - 1)
+       || vectors_min != 0 || vectors_max >= vectors_actual)
       __builtin_abort ();
   }

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
index 44767cd27c32..0fbfeb0b6102 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr84955-1.c
@@ -28,4 +28,3 @@ f2 (void)

   return i + j;
 }
-/* { dg-final { scan-tree-dump-not "if" "cddce2"} } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
index 84b9c01443e5..3898b1c55ab7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
@@ -15,8 +15,8 @@ main (void)
   return 0;
 }

-/* Todo: Boths bar.syncs can be removed.
-   Atm we generate this dead code inbetween forked and joining:
+/* All bar.syncs can be removed.
+   Previously, we generated dead code between forked and joining:

                      mov.u32 %r28, %ntid.y;
                      mov.u32 %r29, %tid.y;
@@ -31,6 +31,6 @@ main (void)
              @%r33   bra     $L3;
      $L2:

-   so the loop is not recognized as empty loop (which we detect by seeing if
+   so the loop was not recognized as an empty loop (which we detect by seeing if
    joining immediately follows forked).  */
-/* { dg-final { scan-offload-rtl-dump-times "nvptx_barsync" 2 "mach" } } */
+/* { dg-final { scan-offload-rtl-dump-times "nvptx_barsync" 0 "mach" } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
index 10a4116676b4..866349797f3c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
@@ -9,7 +9,6 @@ int a;
 #pragma acc declare create(a)

 #pragma acc routine vector
-/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } .+2 } */
 void __attribute__((noinline, noclone))
 foo_v (void)
 {
@@ -17,8 +16,6 @@ foo_v (void)
 }

 #pragma acc routine worker
-/* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .+3 }
-   { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } .+2 } */
 void __attribute__((noinline, noclone))
 foo_w (void)
 {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
index e1679444172c..0aeaf7799f68 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
@@ -11,11 +11,11 @@ main (void)
   {
     #pragma acc loop worker
     for (int i = 0; i < n; i++)
-      ;
+      asm volatile ("");

     #pragma acc loop worker
     for (int i = 0; i < n; i++)
-      ;
+      asm volatile ("");
   }

   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index bc55d158a81f..e5b5c3e84374 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -7,5 +7,5 @@

 #include "pr85486.c"

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow3" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
index 33480a4ae682..3d0336a61ad6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
@@ -7,5 +7,5 @@

 #include "pr85486.c"

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow3" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
index 0d98b82f9932..f7d2f3b0fdd4 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
@@ -54,5 +54,5 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow3" } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
index 22891a243e14..95946c581b19 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -34,5 +34,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
index 30418f378f93..cab21013daef 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=::128" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -35,5 +35,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
index 754964d60100..003057415fc6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* We default to warp size 32 for the vector length, so the GOMP_OPENACC_DIM has
    no effect.  */
 /* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
@@ -38,5 +38,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=1, vectors=32" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
index 44364cbc51a7..b004a9d5072e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -36,5 +36,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
index 5e387c6ced61..9e7f9a2d9ed9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=:2:128" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -37,5 +37,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 2, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
index d32f4e4417ab..01b186e91bd7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-set-target-env-var "GOMP_OPENACC_DIM" ":2:" } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -37,5 +37,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=2, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
index df5cb09df712..8319cb32412e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
 /* { dg-set-target-env-var "GOMP_DEBUG" "1" } */

 #include <stdlib.h>
@@ -36,5 +36,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)" "oaccdevlow1" } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)"  "oaccdevlow1" { target { any-opts "-O[123s]"} } } } */
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 0, 128\\)"  "oaccdevlow3" { target { any-opts "-O[0g]"} } } } */
 /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: launch gangs=1, workers=8, vectors=128" } */
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
index 0d5ea73813de..b5b3c53d59dd 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -2,7 +2,8 @@

 ! { dg-do run }
 ! { dg-additional-options "-fdump-tree-oaccdevlow-details -Wopenacc-parallelism" }
-! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow1" { target { any-opts "-O[1,2,3,s]"} } } }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow3" { target { no-opts "-O[1,2,3,s]"} } } }

 program main
   integer :: w, arr(0:31)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
index 90e06be24ff5..2a2bc4cc7cf7 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -2,7 +2,8 @@

 ! { dg-do run }
 ! { dg-additional-options "-fdump-tree-oaccdevlow-details" }
-! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow1" { target { any-opts "-O[1,2,3,s]"} } } }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow3" { target { no-opts "-O[1,2,3,s]"} } } }

 program main
   integer :: w, arr(0:31)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
index 0a612a57964e..a9131a06f77b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
@@ -1,5 +1,10 @@
 ! { dg-do run }
-!
+! { dg-xfail-run-if "PR102424" { ! openacc_host_selected } }
+! TODO PR102424 OpenACC 'reduction' with outer 'loop seq', inner 'loop gang'
+
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-fopt-info-omp" }
+
 program foo

   IMPLICIT NONE
@@ -17,9 +22,10 @@ subroutine bar(vol)
   INTEGER :: j,k

   !$ACC KERNELS
-  !$ACC LOOP REDUCTION(+:vol)
+  ! TODO The "reduction" dependence handling in Graphite should be adjusted to take the outer "reduction" into account correctly
+  !$ACC LOOP REDUCTION(+:vol) ! { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO Suboptimal parallelism assigned" { xfail *-*-* } }
   DO k=1,2
-     !$ACC LOOP REDUCTION(+:vol)
+     !$ACC LOOP REDUCTION(+:vol) ! { dg-bogus "optimized: assigned OpenACC gang vector loop parallelism" "TODO Suboptimal parallelism assigned" { xfail *-*-* } }
      DO j=1,2
        vol = vol + 1
      ENDDO
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90
deleted file mode 100644
index 0e9da426d998..000000000000
--- a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90
+++ /dev/null
@@ -1,98 +0,0 @@
-! Check that the Graphite-based "auto" loop and "kernels" handling
-! is able to assign the parallelism dimensions correctly for a simple
-! loop-nest with reductions. All loops should be parallelized.
-
-! { dg-additional-options "-O2 -g" }
-! { dg-additional-options "-foffload=-fdump-tree-oaccloops1-details" }
-! { dg-additional-options "-foffload=-fopt-info-optimized" }
-! { dg-additional-options "-fdump-tree-oaccloops1-details" }
-! { dg-additional-options "-fopt-info-optimized" }
-
-module test
-  implicit none
-
-  integer, parameter :: n = 10000
-  integer :: a(n,n)
-  integer :: sums(n,n)
-
-contains
-  function sum_loop_auto() result(sum)
-    integer :: i, j
-    integer :: sum, max_val
-
-    sum = 0
-    max_val = 0
-
-    !$acc parallel copyin (a) reduction(+:sum)
-    !$acc loop auto reduction(+:sum) reduction(max:max_val) ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
-    ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
-    do i = 1,size (a, 1)
-       !$acc loop auto reduction(max:max_val) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
-       ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
-       do j = 1,size(a, 2)
-          max_val = a(i,j)
-       end do
-       sum = sum + max_val
-    end do
-    !$acc end parallel
-  end function sum_loop_auto
-
-  function sum_kernels() result(sum)
-    integer :: i, j
-    integer :: sum, max_val
-
-    sum = 0
-    max_val = 0
-
-    !$acc kernels
-    ! { dg-optimized {'map\(force_tofrom:max_val [^)]+\)' optimized to 'map\(to:max_val [^)]+\)'} "" { target *-*-* } .-1 }
-    !$acc loop reduction(+:sum) reduction(max:max_val) ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
-    ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
-    ! { dg-optimized "forwarded loop nest in OpenACC .kernels. construct to .Graphite." "" { target *-*-* } .-2 }
-    do i = 1,size (a, 1)
-       !$acc loop reduction(max:max_val) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
-       ! { dg-optimized ".auto. loop can be parallel" "" { target *-*-* } .-1 }
-       do j = 1,size(a, 2)
-          max_val = a(i,j)
-       end do
-       sum = sum + max_val
-    end do
-    !$acc end kernels
-  end function sum_kernels
-end module test
-
-program main
-  use test
-
-  implicit none
-
-  integer :: result, i, j
-
-  ! We sum the maxima of n rows, each containing numbers
-  ! 1..n
-  integer, parameter :: expected_sum = n * n
-
-  do i = 1, size (a, 1) ! { dg-optimized "loop nest optimized" }
-     do j = 1, size (a, 2)
-        a(i, j) = j
-     end do
-  end do
-
-
-  result = sum_loop_auto()
-  if (result /= expected_sum) then
-     write (*, *) "Wrong result:", result
-     call abort()
-  endif
-
-  result = sum_kernels()
-  if (result /= expected_sum) then
-     write (*, *) "Wrong result:", result
-     call abort()
-  endif
-end program main
-
-! This ensures that the dg-optimized assertions above hold for both
-! compilers because the output goes to stderr and the dump file.
-! { dg-final { scan-offload-tree-dump-times "optimized: assigned OpenACC .*? parallelism" 4 "oaccloops1" } }
-! { dg-final { scan-tree-dump-times "optimized: assigned OpenACC .*? parallelism" 4 "oaccloops1" } }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 994a8a35110f..6ac66e1a088b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -22,13 +22,11 @@ subroutine kernel(lo, hi, a, b, c)
   ! { dg-missed {'map\(tofrom:\*b [^)]+\)' not optimized: '\*b' is unsuitable for privatization} "" { target *-*-* } .-3 }
   ! { dg-missed {'map\(tofrom:\*a [^)]+\)' not optimized: '\*a' is unsuitable for privatization} "" { target *-*-* } .-4 }
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      b(i) = a(i)
   end do
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      c(i) = b(i)