[RFC,WIP] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce)

Message ID b31d131a-1911-9a70-b52d-8c9f4b8d7898@codesourcery.com
State New
Headers
Series [RFC,WIP] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce) |

Commit Message

Tobias Burnus Dec. 6, 2021, 2 p.m. UTC
  This is a RFC/WIP patch about:

(A) OpenMP (C/C++/Fortran)
    omp target map(iterator(i=n:m),to : x(i))

(B) Fortran:
(1)   omp target map(to : dt_var, class_var)
(2)   omp parallel allocator(my_alloc) firstprivate(class_var)
(3)  call co_reduce(dt_coarray, my_func)

The problem with (A) is that there is not a compile-time countable
number of iterations such that it cannot be easily add to the array
used to call GOMP_target_ext.

The problem with (B) is that dt_var can have allocatable components
which complicates stuff and with recursive types, the number of
elements it not known at compile time - not with polymorphic types
as it depends on the recursion depth and dynamic type, respectively.


Comments/questions/remarks ... to the proposal below?

Regarding mapping, I currently have no idea how to handle
the virtual table. Thoughts?

  * * *

The idea for OpenMP mapping is a callback function - such that

integer function f() result(ires)
   implicit none
   integer :: a
   !$omp target  map(iterator(i=1:5), to: a)
   !$omp end target
   ires = 7
end

becomes

   #pragma omp target map(iterator(integer(kind=4) i=1:5:1):to:a)

and then during gimplify:

   #pragma omp target num_teams(1) thread_limit(0) map(map_function:f_._omp_mapfn.0 [len: 0])

with

unsigned long f_._omp_mapfn.0 (unsigned long (*<T626>) (void *) cb_fn,
                                void * token, void * base, unsigned short flags)
{
...

with the loop around the cb_fn call and flag = GOMP_MAP_TO.

(Not fully working yet. ME part needs still to generate the
loop similar to depend or affinity. For C/C++, the basic
parsing is done but some more code changes are needed
in the FE.)


  * * *

Fortran - with an OpenMP example:

module m
   implicit none (type, external)
   type t3
   end type t3
   type t
     class(t3), allocatable :: cx
     type(t3), pointer :: ptx
   end type t
end module m

use m
implicit none (type, external)
class(t), allocatable :: var

!$omp target map(to:var)
   if (allocated(var)) stop 1
!$omp end target
end


The idea is that this becomes:

   #pragma omp target map(to:var) map(map_function:var._vptr->_callback [len: 1]) map(to:var [len: 0])

That's:
* 'var' is first normally mapped
* Then the map function is added which gets 'var' as argument


(For an array, I plan to add an internal function which calls the
callback function in a scalarization loop.)


On the Fortran side - this requires in the vtable a new entry,
(*ABI breakage*) which points to:

integer(kind=8) __callback_m_T (
    integer(kind=8) (*<T655>) (void *, void *, integer(kind=8),
                               void (*<T6d>) (void), integer(kind=2)) cb,
    void * token, struct t & restrict scalar, integer(kind=4) f_flags)
{
   __result___callback_m_T = 0;
   if (scalar->cx._data != 0B)
     {
         void * D.4384;
         D.4384 = (void *) scalar->cx._data;
         __result___callback_m_T = cb (token, D.4384, scalar->cx._vptr->_size, 0B, 0)
                                   + __result___callback_m_T;
       __result___callback_m_T = cb (token, *scalar->cx._data, 0, *scalar->cx._vptr->_callback, 0)
                                 + __result___callback_m_T;
     }
   if (scalar->ptx != 0B)
     {
         void * D.4386;
         D.4386 = (void *) scalar->ptx;
         __result___callback_m_T = cb (token, D.4386, 0, 0B, 0) + __result___callback_m_T;
     }
   return __result___callback_m_T;
}


That is:

* For pointer, the CB is called with SIZE = 0, permitting the caller to
   remap pointer - or ignore the callback call.
* For allocatables, it passes the SIZE, permitting to map the allocatable
* If the allocatable is a CLASS or has allocatable components, cb is
   called with a callback function - which that those can be mapped as well.
   (and SIZE = 0)

(The GOMP_MAP_TO needs to be handled by libgomp, e.g. by putting it into
the void *token.)


The vtable's callback function can then also be used with
* OpenMP ALLOCATOR or for
* deep copying with CO_REDUCE.


Question: Does this way of passing make sense or not?
Comments?


Tobias


PS: The patch has a lot of pieces in places, but still lacks both
some glue code and some other bit. :-/
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Jakub Jelinek Dec. 6, 2021, 3:16 p.m. UTC | #1
On Mon, Dec 06, 2021 at 03:00:30PM +0100, Tobias Burnus wrote:
> This is a RFC/WIP patch about:
> 
> (A) OpenMP (C/C++/Fortran)
>    omp target map(iterator(i=n:m),to : x(i))
> 
> (B) Fortran:
> (1)   omp target map(to : dt_var, class_var)
> (2)   omp parallel allocator(my_alloc) firstprivate(class_var)
> (3)  call co_reduce(dt_coarray, my_func)
> 
> The problem with (A) is that there is not a compile-time countable
> number of iterations such that it cannot be easily add to the array
> used to call GOMP_target_ext.
> 
> The problem with (B) is that dt_var can have allocatable components
> which complicates stuff and with recursive types, the number of
> elements it not known at compile time - not with polymorphic types
> as it depends on the recursion depth and dynamic type, respectively.

I think there is no reason why the 3 arrays passed to GOMP_target_ext
(etc., for target data {, enter, exit} too and because this
affects to and from clauses as well, target update as well)
need to be constant size.
We can allocate them as VLA or from heap as well.
I guess only complication for using __builtin_allocate_with_align
would be target data, where the construct body could be using alloca
and we wouldn't want to silently free those allocas at the end of the
construct, though I bet we already have that problem whenever we
privatize some variable length variables on constructs that don't
result in outlined body into a new function, and outlining a body
into a new function will also break alloca across the boundaries.

We do a lot of sorting of the map clauses especially during gimplification,
one question is whether it is ok to sort the whole map clause with iterator
as one clause, or if we'd need to do the sorting at runtime.
With arbitrary lvalue expressions, the clauses with iterator
don't need to be just map(iterator(i=0:n),to : x[i]) but can be e.g.
map(iterator(i=0:n), tofrom : i == 0 ? a : i == 1 ? b : c[i - 2])
etc. (at least in C++, in C I think ?: doesn't give lvalues), or
*(i == 0 ? &a : i == 1 ? &b : &c[i - 2]) otherwise, though
I hope that is ok, it isn't much different from such lvalue expressions
when i isn't an iterator but say function parameter or some other variable,
I think we only map value in that case and don't really remap the vars
etc. (but sure, for map(iterator(i=0:n), to : foo(i).a[i].b[i]) we should
follow the rules for []s and .

So, I wouldn't be really afraid of going into dynamic allocation of the
arrays if the count isn't compile time constant.

Another thing is that it would be nice to optimize some most common cases
where some mappings could be described in more compact ways, and that
wouldn't be solely about iterator clause, but also when we start properly
implementing all the mapping nastiness of 5.0 and beyond, like mapping
of references, or the declare mapper stuff etc.
So if we come up with something like array descriptors Fortran has to
describe mapping of some possibly non-contiguous multidimensional array
with strides etc. in a single map element, it will be nice, but I'd
prefer not to outline complex expressions from map's clause as separate
function each, it can use many variables etc. from the parent function
and calling those as callbacks would be too ugly.

	Jakub
  
Tobias Burnus Dec. 6, 2021, 4:06 p.m. UTC | #2
On 06.12.21 16:16, Jakub Jelinek wrote:
> I think there is no reason why the 3 arrays passed to GOMP_target_ext
> (etc., for target data {, enter, exit} too and because this
> affects to and from clauses as well, target update as well)
> need to be constant size.
> We do a lot of sorting of the map clauses especially during gimplification,
> one question is whether it is ok to sort the whole map clause with iterator
> as one clause, or if we'd need to do the sorting at runtime.

Regarding sorting at runtime: It looks as if Julian's patches at
  [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
can do without run-time sorting.

Regarding the sorting and iterators: I think we already have this problem
intrinsically – for depend/affinity, we create for <clause>(iterator(...) : a, b)
a single loop - also to have a consistency with regards to the array bounds.

But if we want to put 'd' between 'a' and 'b' - we either need to split
the loop - or 'd' cannot be put between 'a' and 'b'. That's a fundamental
issue. I am not sure whether that's a real issue as all have the same map
type, but still.

> but I'd
> prefer not to outline complex expressions from map's clause as separate
> function each, it can use many variables etc. from the parent function
> and calling those as callbacks would be too ugly.

I concur that it would be useful to avoid using callbacks; it it seems
as if it can be avoided for iterators. I am not sure how well, but okay.

But I have no idea how to avoid callbacks for allocatable components in
Fortran. For

type t
   type(t), allocatable :: a
end t
type(t) :: var

(recursive type) - it is at least semi-known at compile time:
   e = var;
   while (e)
    { map(e); e = e->a; }
I am not sure how to pass this on to the middle end - but
code for it can be generated.

But as soon as polymorphism comes into play, I do not see how
callbacks can be avoided. Like for:

   class(t) :: var2

Here, it is known at compile time that var2%a exists (recursively).
But the dynamic type might additionally have var2%b(:) which in turn
might have var2%(:)%c.


I see two places for calling the callback: Either by passing the
Fortran callback function on to libgomp or by generating the
function call handling inside omp-low.c - to populate a nonconstant
array.

Which solution do you prefer?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek Dec. 6, 2021, 4:23 p.m. UTC | #3
On Mon, Dec 06, 2021 at 05:06:10PM +0100, Tobias Burnus wrote:
> Regarding the sorting and iterators: I think we already have this problem
> intrinsically – for depend/affinity, we create for <clause>(iterator(...) : a, b)
> a single loop - also to have a consistency with regards to the array bounds.

depend and affinity don't need to sort anything, we ignore affinity
altogether, depend is just an unordered list of (from what we care about) addresses
with the kinds next to them, it can contain duplicates etc. (and affinity
if we implemented it can too).
> 
> But if we want to put 'd' between 'a' and 'b' - we either need to split
> the loop - or 'd' cannot be put between 'a' and 'b'. That's a fundamental
> issue. I am not sure whether that's a real issue as all have the same map
> type, but still.
> 
> > but I'd
> > prefer not to outline complex expressions from map's clause as separate
> > function each, it can use many variables etc. from the parent function
> > and calling those as callbacks would be too ugly.
> 
> I concur that it would be useful to avoid using callbacks; it it seems
> as if it can be avoided for iterators. I am not sure how well, but okay.
> 
> But I have no idea how to avoid callbacks for allocatable components in
> Fortran. For
> 
> type t
>   type(t), allocatable :: a
> end t
> type(t) :: var
> 
> (recursive type) - it is at least semi-known at compile time:
>   e = var;
>   while (e)
>    { map(e); e = e->a; }
> I am not sure how to pass this on to the middle end - but
> code for it can be generated.

I bet we'd need to add a target hook for that, but other than that,
I don't see why we'd need a callback at runtime.
Let a target hook in first phase compute how many slots in the 3 arrays
will be needed, then let's allocate the 3 arrays, fill in the static
parts in there and when filling such maps follow the target hook to
emit inline code that fills in those extra mappings.
Note, I think it might be better to do declare mapper support before
doing the recursive allocatables or Fortran polymorphism, because
it will necessarily be affected by declare mapper at each level too.

But generally, I don't see why whatever you want to do with a callback
couldn't be done by just emitting a runtime loop that does something
when filling the arrays.  After all, we'll have such runtime loops even
for simple iterator unless we optimize those as an array descriptor,
map(iterator(i=0:n), to: *foo (i)) - in some way it is inlining what the
callback would do at the GOMP_target_ext etc. caller, but it is actually
the other way around, callbacks would mean outlining what can be done in
mere runtime loops inside of the function that has all the vars etc.
accessible there.

	Jakub
  

Patch

 gcc/c/c-parser.c              |  69 ++++++++-
 gcc/cp/parser.c               |  70 +++++++--
 gcc/fortran/class.c           | 351 ++++++++++++++++++++++++++++++++++++++++++
 gcc/fortran/dump-parse-tree.c |  14 +-
 gcc/fortran/gfortran.h        |   1 +
 gcc/fortran/intrinsic.c       |   2 +-
 gcc/fortran/module.c          |   9 +-
 gcc/fortran/openmp.c          |  41 ++++-
 gcc/fortran/resolve.c         |   2 +-
 gcc/fortran/trans-expr.c      |   5 +
 gcc/fortran/trans-intrinsic.c |   3 +-
 gcc/fortran/trans-openmp.c    |  59 ++++++-
 gcc/fortran/trans.h           |   1 +
 gcc/gimplify.c                | 132 ++++++++++++++++
 gcc/omp-low.c                 |  53 ++++++-
 gcc/tree-pretty-print.c       | 192 ++++++++++++-----------
 include/gomp-constants.h      |   4 +-
 libgomp/target.c              | 126 ++++++++++++++-
 18 files changed, 1004 insertions(+), 130 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index af2bb5bc8cc..24acc1ea24a 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -16094,34 +16094,61 @@  c_parser_omp_clause_depend (c_parser *parser, tree list)
    map ( [map-type-modifier[,] ...] map-kind: variable-list )
 
    map-type-modifier:
-     always | close */
+     always | close
+
+   OpenMP 5.1:
+   map-type-modifier:
+     always | close | iterator ( iterators-definition )  */
 
 static tree
 c_parser_omp_clause_map (c_parser *parser, tree list)
 {
   location_t clause_loc = c_parser_peek_token (parser)->location;
   enum gomp_map_kind kind = GOMP_MAP_TOFROM;
-  tree nl, c;
+  tree nl, c, iterators = NULL_TREE;
 
   matching_parens parens;
   if (!parens.require_open (parser))
     return list;
 
-  int pos = 1;
+  int pos = 1, pos2 = 0;
   int map_kind_pos = 0;
-  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
+  while (true)
     {
-      if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COLON)
+      c_token *tok = c_parser_peek_nth_token_raw (parser, pos + pos2);
+      if (tok->type != CPP_NAME)
+	break;
+      if (strcmp ("iterator", IDENTIFIER_POINTER (tok->value)) == 0)
+	{
+	  int n_parens = 0;
+	  pos2++;
+	  while (true)
+	    {
+	      tok = c_parser_peek_nth_token_raw (parser, pos + pos2);
+	      if (tok->type == CPP_EOF)
+		break;
+	      if (tok->type == CPP_OPEN_PAREN)
+		n_parens++;
+	      if (tok->type == CPP_CLOSE_PAREN)
+		n_parens--;
+	      if (n_parens == 0)
+		break;
+	      pos2++;
+	    }
+	}
+      if (c_parser_peek_nth_token_raw (parser, pos + pos2 + 1)->type
+	  == CPP_COLON)
 	{
 	  map_kind_pos = pos;
 	  break;
 	}
 
-      if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COMMA)
+      if (c_parser_peek_nth_token_raw (parser, pos + pos2 + 1)->type
+	  == CPP_COMMA)
 	pos++;
       pos++;
     }
-
+__builtin_printf("Debug: pos=%d, map_kind_pos=%d\n", pos, map_kind_pos);
   int always_modifier = 0;
   int close_modifier = 0;
   for (int pos = 1; pos < map_kind_pos; ++pos)
@@ -16141,16 +16168,25 @@  c_parser_omp_clause_map (c_parser *parser, tree list)
 	    {
 	      c_parser_error (parser, "too many %<always%> modifiers");
 	      parens.skip_until_found_close (parser);
+	      if (iterators)
+		pop_scope ();
 	      return list;
 	    }
 	  always_modifier++;
 	}
+      else if (strcmp ("iterator", p) == 0 && iterators == NULL_TREE)
+	{
+	  iterators = c_parser_omp_iterators (parser);
+	  continue;
+	}
       else if (strcmp ("close", p) == 0)
 	{
 	  if (close_modifier)
 	    {
 	      c_parser_error (parser, "too many %<close%> modifiers");
 	      parens.skip_until_found_close (parser);
+	      if (iterators)
+		pop_scope ();
 	      return list;
 	    }
 	  close_modifier++;
@@ -16161,6 +16197,8 @@  c_parser_omp_clause_map (c_parser *parser, tree list)
 				  "modifier other than %<always%> or %<close%>"
 				  "on %<map%> clause");
 	  parens.skip_until_found_close (parser);
+	  if (iterators)
+	    pop_scope ();
 	  return list;
 	}
 
@@ -16188,6 +16226,8 @@  c_parser_omp_clause_map (c_parser *parser, tree list)
 	  c_parser_error (parser, "invalid map kind");
 	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
 				     "expected %<)%>");
+	  if (iterators)
+	    pop_scope ();
 	  return list;
 	}
       c_parser_consume_token (parser);
@@ -16196,8 +16236,21 @@  c_parser_omp_clause_map (c_parser *parser, tree list)
 
   nl = c_parser_omp_variable_list (parser, clause_loc, OMP_CLAUSE_MAP, list);
 
+  if (iterators)
+    {
+      tree block = pop_scope ();
+      if (iterators == error_mark_node)
+	iterators = NULL_TREE;
+      else
+	TREE_VEC_ELT (iterators, 5) = block;
+      sorry_at (clause_loc, "%<iterator%> in %<map%> clause not yet supported");
+    }
   for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
-    OMP_CLAUSE_SET_MAP_KIND (c, kind);
+    {
+      OMP_CLAUSE_SET_MAP_KIND (c, kind);
+      if (false && iterators)  /* Not yet supported.  */
+	OMP_CLAUSE_DECL (c) = build_tree_list (iterators, OMP_CLAUSE_DECL (c));
+    }
 
   parens.skip_until_found_close (parser);
   return nl;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 55e6a1a8b3a..698ce1a1a0c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -39199,29 +39199,54 @@  cp_parser_omp_clause_depend (cp_parser *parser, tree list, location_t loc)
    map ( [map-type-modifier[,] ...] map-kind: variable-list )
 
    map-type-modifier:
-     always | close */
+     always | close
+
+   OpenMP 5.1:
+   map-type-modifier: always | close | iterator ( iterators-definition )  */
 
 static tree
 cp_parser_omp_clause_map (cp_parser *parser, tree list)
 {
-  tree nlist, c;
+  tree nlist, c, iterators = NULL_TREE;
   enum gomp_map_kind kind = GOMP_MAP_TOFROM;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
     return list;
 
-  int pos = 1;
+  int pos = 1, pos2 = 0;
   int map_kind_pos = 0;
-  while (cp_lexer_peek_nth_token (parser->lexer, pos)->type == CPP_NAME
-	 || cp_lexer_peek_nth_token (parser->lexer, pos)->keyword == RID_DELETE)
+  while (true)
     {
-      if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type == CPP_COLON)
+      cp_token *tok = cp_lexer_peek_nth_token (parser->lexer, pos + pos2);
+      if (tok->type != CPP_NAME && tok->keyword != RID_DELETE)
+	break;
+      if (strcmp ("iterator", IDENTIFIER_POINTER (tok->u.value)) == 0)
+	{
+	  int n_parens = 0;
+	  pos2++;
+	  while (true)
+	    {
+	      tok = cp_lexer_peek_nth_token (parser->lexer, pos + pos2);
+	      if (tok->type == CPP_EOF)
+		break;
+	      if (tok->type == CPP_OPEN_PAREN)
+		n_parens++;
+	      if (tok->type == CPP_CLOSE_PAREN)
+		n_parens--;
+	      if (n_parens == 0)
+		break;
+	      pos2++;
+	    }
+	}
+      if (cp_lexer_peek_nth_token (parser->lexer, pos + pos2 + 1)->type
+	  == CPP_COLON)
 	{
 	  map_kind_pos = pos;
 	  break;
 	}
 
-      if (cp_lexer_peek_nth_token (parser->lexer, pos + 1)->type == CPP_COMMA)
+      if (cp_lexer_peek_nth_token (parser->lexer, pos + pos2 + 1)->type
+	  == CPP_COMMA)
 	pos++;
       pos++;
     }
@@ -39247,10 +39272,18 @@  cp_parser_omp_clause_map (cp_parser *parser, tree list)
 						     /*recovering=*/true,
 						     /*or_comma=*/false,
 						     /*consume_paren=*/true);
+	      if (iterators)
+		poplevel (0, 1, 0);
 	      return list;
 	    }
 	  always_modifier = true;
 	}
+      else if (strcmp ("iterator", p) == 0 && iterators == NULL_TREE)
+	{
+	  begin_scope (sk_omp, NULL);
+	  iterators = cp_parser_omp_iterators (parser);
+	  continue;
+	}
       else if (strcmp ("close", p) == 0)
 	{
 	  if (close_modifier)
@@ -39260,6 +39293,8 @@  cp_parser_omp_clause_map (cp_parser *parser, tree list)
 						     /*recovering=*/true,
 						     /*or_comma=*/false,
 						     /*consume_paren=*/true);
+	      if (iterators)
+		poplevel (0, 1, 0);
 	      return list;
 	    }
 	  close_modifier = true;
@@ -39273,6 +39308,8 @@  cp_parser_omp_clause_map (cp_parser *parser, tree list)
 						 /*recovering=*/true,
 						 /*or_comma=*/false,
 						 /*consume_paren=*/true);
+	  if (iterators)
+	    poplevel (0, 1, 0);
 	  return list;
 	}
 
@@ -39301,6 +39338,8 @@  cp_parser_omp_clause_map (cp_parser *parser, tree list)
 	  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
 						 /*or_comma=*/false,
 						 /*consume_paren=*/true);
+	  if (iterators)
+	    poplevel (0, 1, 0);
 	  return list;
 	}
       cp_lexer_consume_token (parser->lexer);
@@ -39316,9 +39355,22 @@  cp_parser_omp_clause_map (cp_parser *parser, tree list)
 
   nlist = cp_parser_omp_var_list_no_open (parser, OMP_CLAUSE_MAP, list,
 					  NULL);
-
+  if (iterators)
+    {
+      tree block = poplevel (1, 1, 0);
+      if (iterators == error_mark_node)
+	iterators = NULL_TREE;
+      else
+	TREE_VEC_ELT (iterators, 5) = block;
+      sorry_at (DECL_SOURCE_LOCATION (TREE_VEC_ELT (iterators, 0)),
+		"%<iterator%> in %<map%> clause not yet supported");
+    }
   for (c = nlist; c != list; c = OMP_CLAUSE_CHAIN (c))
-    OMP_CLAUSE_SET_MAP_KIND (c, kind);
+    {
+      OMP_CLAUSE_SET_MAP_KIND (c, kind);
+      if (false && iterators)  /* Not yet supported.  */
+	OMP_CLAUSE_DECL (c) = build_tree_list (iterators, OMP_CLAUSE_DECL (c));
+    }
 
   return nlist;
 }
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 6b017667600..0a5ebf803c3 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -51,6 +51,8 @@  along with GCC; see the file COPYING3.  If not see
 		 allocatable components and calls FINAL subroutines.
     * _deallocate: A procedure pointer to a deallocation procedure; nonnull
 		 only for a recursive derived type.
+    * _callback: A procedure pointer, taking a callback proc pointer and
+		 calling that one for the DT and the allocatable components.
 
    After these follow procedure pointer components for the specific
    type-bound procedures.  */
@@ -2242,6 +2244,346 @@  generate_finalization_wrapper (gfc_symbol *derived, gfc_namespace *ns,
 }
 
 
+static void
+generate_callback_wrapper (gfc_symbol *derived, gfc_namespace *ns,
+			   const char *tname, gfc_component *vtab_cb)
+{
+  gfc_namespace *sub_ns;
+  gfc_code *last_code, *block;
+  gfc_symbol *callback, *cb, *token, *scalar, *f_flags;
+  gfc_symbol *c_ptr, *c_funptr, *c_short, *c_null_funptr;
+  int c_short_kind;
+  char *name;
+
+  /* Set up the namespace.  */
+  sub_ns = gfc_get_namespace (ns, 0);
+  sub_ns->sibling = ns->contained;
+  ns->contained = sub_ns;
+  sub_ns->resolved = 1;
+
+  gfc_namespace *saved_ns = gfc_current_ns;
+  gfc_current_ns = sub_ns;
+  gfc_import_iso_c_binding_module ();
+  gfc_current_ns = saved_ns;
+  gfc_find_symbol ("c_ptr", sub_ns, 0, &c_ptr);
+  gfc_find_symbol ("c_funptr", sub_ns, 0, &c_funptr);
+  gfc_find_symbol ("c_null_funptr", sub_ns, 0, &c_null_funptr);
+  gfc_find_symbol ("c_short", sub_ns, 0, &c_short);
+  c_short_kind = mpz_get_si (c_short->value->value.integer);
+
+  /* Set up the procedure symbol.  */
+  name = xasprintf ("__callback_%s", tname);
+  gfc_get_symbol (name, sub_ns, &callback);
+  free (name);
+  sub_ns->proc_name = callback;
+  callback->attr.flavor = FL_PROCEDURE;
+  callback->attr.function = 1;
+  callback->attr.pure = 0;
+  callback->attr.recursive = 1;
+  callback->result = callback;
+  callback->ts.type = BT_INTEGER;
+  callback->ts.kind = gfc_index_integer_kind;
+  callback->attr.artificial = 1;
+  callback->attr.always_explicit = 1;
+  callback->attr.if_source = IFSRC_DECL;
+  if (ns->proc_name->attr.flavor == FL_MODULE)
+    callback->module = ns->proc_name->name;
+  gfc_set_sym_referenced (callback);
+
+  /* Set up formal argument.  */
+  gfc_get_symbol ("cb", sub_ns, &cb);
+  cb->attr.flavor = FL_PROCEDURE;
+  cb->attr.artificial = 1;
+  cb->attr.dummy = 1;
+  cb->attr.elemental = 1;
+  cb->attr.function = 1;
+  cb->result = cb;
+  cb->ts.type = BT_INTEGER;
+  cb->ts.kind = gfc_index_integer_kind;
+  cb->attr.if_source = IFSRC_IFBODY;
+  gfc_set_sym_referenced (cb);
+  callback->formal = gfc_get_formal_arglist ();
+  callback->formal->sym = cb;
+  cb->formal_ns = gfc_get_namespace (sub_ns, 0);
+  cb->formal_ns->proc_name = cb;
+  /* cb_token. */
+  gfc_get_symbol ("cb_token", cb->formal_ns, &token);
+  token->ts.type = BT_DERIVED;
+  token->ts.u.derived = c_ptr;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  cb->formal = gfc_get_formal_arglist ();
+  cb->formal->sym = token;
+  /* cb_var */
+  gfc_get_symbol ("cb_var", cb->formal_ns, &token);
+  token->ts.type = BT_DERIVED;
+  token->ts.u.derived = c_ptr;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  cb->formal->next = gfc_get_formal_arglist ();
+  cb->formal->next->sym = token;
+  /* cb_len */
+  gfc_get_symbol ("cb_len", cb->formal_ns, &token);
+  token->ts.type = BT_INTEGER;
+  token->ts.kind = gfc_index_integer_kind;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  cb->formal->next->next = gfc_get_formal_arglist ();
+  cb->formal->next->next->sym = token;
+  /* cb_fn */
+  gfc_get_symbol ("cb_fn", cb->formal_ns, &token);
+  token->ts.type = BT_DERIVED;
+  token->ts.u.derived = c_funptr;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  cb->formal->next->next->next = gfc_get_formal_arglist ();
+  cb->formal->next->next->next->sym = token;
+  /* cb_flags */
+  gfc_get_symbol ("cb_flags", cb->formal_ns, &token);
+  token->ts.type = BT_INTEGER;
+  token->ts.kind = c_short_kind;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  cb->formal->next->next->next->next = gfc_get_formal_arglist ();
+  cb->formal->next->next->next->next->sym = token;
+
+  /* Con't __callback_%s  args.  */
+  gfc_get_symbol ("token", sub_ns, &token);
+  token->ts.type = BT_DERIVED;
+  token->ts.u.derived = c_ptr;
+  token->attr.flavor = FL_VARIABLE;
+  token->attr.dummy = 1;
+  token->attr.value = 1;
+  token->attr.artificial = 1;
+  token->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (token);
+  callback->formal->next = gfc_get_formal_arglist ();
+  callback->formal->next->sym = token;
+
+  gfc_get_symbol ("scalar", sub_ns, &scalar);
+  scalar->ts.type = BT_DERIVED;
+  scalar->ts.u.derived = derived;
+  scalar->attr.flavor = FL_VARIABLE;
+  scalar->attr.dummy = 1;
+  scalar->attr.contiguous = 1;
+  scalar->attr.artificial = 1;
+  scalar->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (scalar);
+  callback->formal->next->next = gfc_get_formal_arglist ();
+  callback->formal->next->next->sym = scalar;
+
+  gfc_get_symbol ("f_flags", sub_ns, &f_flags);
+  f_flags->ts.type = BT_INTEGER;
+  f_flags->ts.kind = 4;
+  f_flags->attr.flavor = FL_VARIABLE;
+  f_flags->attr.dummy = 1;
+  f_flags->attr.value = 1;
+  f_flags->attr.artificial = 1;
+  f_flags->attr.intent = INTENT_IN;
+  gfc_set_sym_referenced (f_flags);
+  callback->formal->next->next->next = gfc_get_formal_arglist ();
+  callback->formal->next->next->next->sym = f_flags;
+
+  /* Set return value to 0.  */
+  last_code = gfc_get_code (EXEC_ASSIGN);
+  last_code->expr1 = gfc_lval_expr_from_sym (callback);
+  last_code->expr2 = gfc_get_int_expr (gfc_index_integer_kind, NULL, 0);
+  sub_ns->code = last_code;
+
+  /* Call now for pointer:
+       cb (token, comp->var(.data), 0, NULL, 0);
+     for allocatable:
+       cb (token, comp->var(.data), size, NULL, 0);
+     and then for allocatable of either class type or with allocatable comps
+       for each array element
+         cb (token, comp->var(.data), 0, var's cb fn, 0);  */
+  for (gfc_component *comp = derived->components; comp; comp = comp->next)
+    {
+      bool pointer = (comp->ts.type == BT_CLASS
+		      ? CLASS_DATA (comp)->attr.pointer : comp->attr.pointer);
+      if (!pointer && comp->ts.type != BT_CLASS && !comp->attr.allocatable)
+	continue;
+
+      gfc_expr *expr = gfc_lval_expr_from_sym (scalar);
+      expr->ref = gfc_get_ref ();
+      expr->ref->type = REF_COMPONENT;
+      expr->ref->u.c.sym = derived;
+      expr->ref->u.c.component = comp;
+      expr->ts = comp->ts;
+
+      gfc_expr *size;
+      if (pointer)
+	size = gfc_get_int_expr (gfc_index_integer_kind, NULL, 0);
+      else
+	{
+	  size = gfc_get_expr ();
+	  size->expr_type = EXPR_FUNCTION;
+	  size->value.function.isym
+	    = gfc_intrinsic_function_by_id (GFC_ISYM_SIZEOF);
+	  size->value.function.name = size->value.function.isym->name;
+	  size->value.function.esym = NULL;
+	  size->value.function.actual = gfc_get_actual_arglist ();
+	  size->value.function.actual->expr = gfc_copy_expr (expr);
+	  size->where = gfc_current_locus;
+	}
+
+      if (comp->ts.type == BT_CLASS)
+	gfc_add_data_component (expr);
+      if (comp->attr.dimension)
+	{
+	  gfc_ref *ref = expr->ref->next ? expr->ref->next : expr->ref;
+	  ref->next = gfc_get_ref ();
+	  ref = ref->next;
+	  ref->type = REF_ARRAY;
+	  ref->u.ar.type = AR_FULL;
+	  ref->u.ar.as = comp->as;
+	  expr->rank = comp->as->rank;
+	}
+
+      /* if (allocated/associated(comp) */
+      last_code->next = gfc_get_code (EXEC_IF);
+      last_code = last_code->next;
+      last_code->block = gfc_get_code (EXEC_IF);
+      block = last_code->block;
+      block->expr1 = gfc_get_expr ();
+      block->expr1->expr_type = EXPR_FUNCTION;
+      block->expr1->ts.type = BT_LOGICAL;
+      block->expr1->ts.kind = 1;
+      block->expr1->value.function.isym
+	= gfc_intrinsic_function_by_id (pointer ? GFC_ISYM_ASSOCIATED
+						: GFC_ISYM_ALLOCATED);
+      block->expr1->value.function.name
+	= block->expr1->value.function.isym->name;
+      block->expr1->value.function.esym = NULL;
+      block->expr1->value.function.actual = gfc_get_actual_arglist ();
+      block->expr1->value.function.actual->expr = gfc_copy_expr (expr);
+      if (pointer)
+	block->expr1->value.function.actual->next = gfc_get_actual_arglist ();
+      block->expr1->where = gfc_current_locus;
+
+      gfc_expr *loc_expr = gfc_get_expr ();
+      loc_expr->expr_type = EXPR_FUNCTION;
+      gfc_get_sym_tree ("c_loc", sub_ns, &loc_expr->symtree, false);
+      loc_expr->symtree->n.sym->attr.flavor = FL_PROCEDURE;
+      loc_expr->symtree->n.sym->intmod_sym_id = ISOCBINDING_LOC;
+      loc_expr->symtree->n.sym->attr.intrinsic = 1;
+      loc_expr->symtree->n.sym->from_intmod = INTMOD_ISO_C_BINDING;
+      loc_expr->value.function.isym = gfc_intrinsic_function_by_id (GFC_ISYM_C_LOC);
+      loc_expr->value.function.actual = gfc_get_actual_arglist ();
+      loc_expr->value.function.actual->expr = expr;
+      loc_expr->symtree->n.sym->result = expr->symtree->n.sym;
+      loc_expr->ts.type = BT_INTEGER;
+      loc_expr->ts.kind = gfc_index_integer_kind;
+      loc_expr->where = gfc_current_locus;
+    
+      /* Call CB procedure for ptr assignment or allocatable copying.  */
+      block->next = gfc_get_code (EXEC_ASSIGN);
+      block = block->next;
+      block->expr1 = gfc_lval_expr_from_sym (callback);
+      block->expr2 = gfc_get_expr ();
+      block->expr2->ts = callback->ts;
+      block->expr2->where = gfc_current_locus;
+      block->expr2->expr_type = EXPR_OP;
+      block->expr2->value.op.op = INTRINSIC_PLUS;
+      block->expr2->value.op.op1 = gfc_lval_expr_from_sym (callback);
+      block->expr2->value.op.op2 = gfc_get_expr ();
+
+      gfc_expr *e = block->expr2->value.op.op2;
+      e->expr_type = EXPR_FUNCTION;
+      e->ts = cb->ts;
+      e->symtree = gfc_find_symtree (sub_ns->sym_root, cb->name);
+      e->value.function.esym = cb;
+      e->value.function.esym->name = cb->name;
+      e->value.function.actual = gfc_get_actual_arglist ();
+      e->value.function.actual->expr = gfc_lval_expr_from_sym (token);
+      e->value.function.actual->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->expr = loc_expr;
+      e->value.function.actual->next->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->expr = size;
+      e->value.function.actual->next->next->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->next->expr
+	= gfc_lval_expr_from_sym (c_null_funptr);
+      e->value.function.actual->next->next->next->next
+	= gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->next->next->expr
+	= gfc_get_int_expr (c_short_kind, NULL, 0);
+
+      /* Call for each element cb when comp can have allocatable comps. */
+      if (((comp->ts.type != BT_DERIVED || !comp->ts.u.derived->attr.alloc_comp)
+	    && comp->ts.type != BT_CLASS)
+	  || pointer)
+	continue;
+
+      gfc_expr *vtab_cb;
+      if (comp->ts.type == BT_DERIVED)
+	vtab_cb = gfc_lval_expr_from_sym (gfc_find_vtab (&comp->ts));
+      else
+	{
+	  vtab_cb = gfc_lval_expr_from_sym (scalar);
+	  vtab_cb->ref = gfc_get_ref ();
+	  vtab_cb->ref->type = REF_COMPONENT;
+	  vtab_cb->ref->u.c.sym = derived;
+	  vtab_cb->ref->u.c.component = comp;
+	  gfc_add_vptr_component (vtab_cb);
+	}
+      gfc_add_component_ref (vtab_cb, "_callback");
+
+      block->next = gfc_get_code (EXEC_ASSIGN);
+      block = block->next;
+      block->expr1 = gfc_lval_expr_from_sym (callback);
+      block->expr2 = gfc_get_expr ();
+      block->expr2->ts = callback->ts;
+      block->expr2->where = gfc_current_locus;
+      block->expr2->expr_type = EXPR_OP;
+      block->expr2->value.op.op = INTRINSIC_PLUS;
+      block->expr2->value.op.op1 = gfc_lval_expr_from_sym (callback);
+      block->expr2->value.op.op2 = gfc_get_expr ();
+
+      e = block->expr2->value.op.op2;
+      e->expr_type = EXPR_FUNCTION;
+      e->ts = cb->ts;
+      e->symtree = gfc_find_symtree (sub_ns->sym_root, cb->name);
+      e->value.function.esym = cb;
+      e->value.function.esym->name = cb->name;
+      e->value.function.actual = gfc_get_actual_arglist ();
+      e->value.function.actual->expr = gfc_lval_expr_from_sym (token);
+      e->value.function.actual->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->expr = gfc_copy_expr (expr);
+      e->value.function.actual->next->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->expr
+	= gfc_get_int_expr (gfc_index_integer_kind, NULL, 0);
+      e->value.function.actual->next->next->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->next->expr = vtab_cb;
+      e->value.function.actual->next->next->next->next = gfc_get_actual_arglist ();
+      e->value.function.actual->next->next->next->next->expr
+	= gfc_get_int_expr (c_short_kind, NULL, 0);
+    }
+
+  vtab_cb->initializer = gfc_lval_expr_from_sym (callback);
+  vtab_cb->ts.interface = callback;
+  gfc_commit_symbols ();
+}
+
 /* Add procedure pointers for all type-bound procedures to a vtab.  */
 
 static void
@@ -2598,6 +2940,15 @@  gfc_find_derived_vtab (gfc_symbol *derived)
 		  c->ts.interface = dealloc;
 		}
 
+	      /* Add component _callback.  */
+	      if (!gfc_add_component (vtype, "_callback", &c))
+		goto cleanup;
+	      c->attr.proc_pointer = 1;
+	      c->attr.access = ACCESS_PRIVATE;
+	      c->tb = XCNEW (gfc_typebound_proc);
+	      c->tb->ppc = 1;
+	      generate_callback_wrapper (derived, ns, tname, c);
+
 	      /* Add procedure pointers for type-bound procedures.  */
 	      if (!derived->attr.unlimited_polymorphic)
 		add_procs_to_declared_vtab (derived, vtype);
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 2aa44ff864c..b318ec5802d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1331,14 +1331,22 @@  show_omp_namelist (int list_type, gfc_omp_namelist *n)
   for (; n; n = n->next)
     {
       gfc_current_ns = ns_curr;
-      if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND)
+      if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND
+	  || list_type == OMP_LIST_MAP)
 	{
 	  gfc_current_ns = n->u2.ns ? n->u2.ns : ns_curr;
 	  if (n->u2.ns != ns_iter)
 	    {
+	      const char *clause_name;
+	      switch (list_type)
+		{
+		case OMP_LIST_AFFINITY: clause_name = ") AFFINITY ("; break;
+		case OMP_LIST_DEPEND: clause_name = ") DEPEND ("; break;
+		case OMP_LIST_MAP: clause_name = ") MAP ("; break;
+		default: gcc_unreachable ();
+		}
 	      if (n != n2)
-		fputs (list_type == OMP_LIST_AFFINITY
-		       ? ") AFFINITY(" : ") DEPEND(", dumpfile);
+		fputs (clause_name, dumpfile);
 	      if (n->u2.ns)
 		{
 		  fputs ("ITERATOR(", dumpfile);
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index e5d2dd7971e..207a8307c99 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3783,6 +3783,7 @@  void gfc_free_wait (gfc_wait *);
 bool gfc_resolve_wait (gfc_wait *);
 
 /* module.c */
+void gfc_import_iso_c_binding_module (void);
 void gfc_module_init_2 (void);
 void gfc_module_done_2 (void);
 void gfc_dump_module (const char *, int);
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 3682f9ae21f..147b1fa3532 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -2029,7 +2029,7 @@  add_functions (void)
 
   add_sym_1 ("get_team", GFC_ISYM_GET_TEAM, CLASS_TRANSFORMATIONAL,
 	     ACTUAL_NO, BT_INTEGER, di, GFC_STD_F2018,
-	     gfc_check_get_team, NULL, gfc_resolve_get_team,
+	     gfc_check_get_team, gfc_simplify_get_team, gfc_resolve_get_team,
 	     level, BT_INTEGER, di, OPTIONAL);
 
   add_sym_0 ("getuid", GFC_ISYM_GETUID, CLASS_IMPURE, ACTUAL_NO, BT_INTEGER,
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 7b98ba539d6..4b9aa3c95ba 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -84,7 +84,7 @@  along with GCC; see the file COPYING3.  If not see
 
 /* Don't put any single quote (') in MOD_VERSION, if you want it to be
    recognized.  */
-#define MOD_VERSION "15"
+#define MOD_VERSION "16"
 
 
 /* Structure that describes a position within a module file.  */
@@ -6896,6 +6896,13 @@  import_iso_c_binding_module (void)
      }
 }
 
+void
+gfc_import_iso_c_binding_module (void)
+{
+  gcc_assert (gfc_rename_list == NULL);
+  import_iso_c_binding_module ();
+}
+
 
 /* Add an integer named constant from a given module.  */
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 846fd7b5c5a..bdcdfb3c1fa 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -107,7 +107,8 @@  gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->vector_length_expr);
   for (i = 0; i < OMP_LIST_NUM; i++)
     gfc_free_omp_namelist (c->lists[i],
-			   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND);
+			   (i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND
+			    || i == OMP_LIST_MAP));
   gfc_free_expr_list (c->wait_list);
   gfc_free_expr_list (c->tile_list);
   free (CONST_CAST (char *, c->critical_name));
@@ -2304,6 +2305,8 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	      && gfc_match ("map ( ") == MATCH_YES)
 	    {
 	      locus old_loc2 = gfc_current_locus;
+	      gfc_namespace *ns_iter = NULL, *ns_curr = gfc_current_ns;
+	      match m, m_it = MATCH_NO;
 	      int always_modifier = 0;
 	      int close_modifier = 0;
 	      locus second_always_locus = old_loc2;
@@ -2312,6 +2315,8 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	      for (;;)
 		{
 		  locus current_locus = gfc_current_locus;
+		  gfc_namespace *ns_iter2 = NULL;
+		  match m_it2 = MATCH_NO;
 		  if (gfc_match ("always ") == MATCH_YES)
 		    {
 		      if (always_modifier++ == 1)
@@ -2322,6 +2327,20 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		      if (close_modifier++ == 1)
 			second_close_locus = current_locus;
 		    }
+		  else if ((m_it2 = gfc_match_iterator (&ns_iter2, false))
+			   != MATCH_NO)
+		    {
+		      if (m_it == MATCH_ERROR)
+			goto end;
+		      if (m_it == MATCH_YES)
+			{
+			  gfc_error ("too many %<iterator%> modifiers at %L",
+				     &current_locus);
+			  goto end;
+			}
+		      m_it = m_it2;
+		      ns_iter = ns_iter2;
+		    }
 		  else
 		    break;
 		  gfc_match (", ");
@@ -2360,14 +2379,22 @@  gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		  break;
 		}
 
+	      if (ns_iter)
+		gfc_current_ns = ns_iter;
 	      head = NULL;
-	      if (gfc_match_omp_variable_list ("", &c->lists[OMP_LIST_MAP],
-					       false, NULL, &head,
-					       true, true) == MATCH_YES)
+	      m = gfc_match_omp_variable_list ("", &c->lists[OMP_LIST_MAP],
+					       false, NULL, &head, true, true);
+	      gfc_current_ns = ns_curr;
+	      if (m == MATCH_YES)
 		{
 		  gfc_omp_namelist *n;
 		  for (n = *head; n; n = n->next)
-		    n->u.map_op = map_op;
+		    {
+		      n->u.map_op = map_op;
+		      n->u2.ns = ns_iter;
+		      if (ns_iter)
+			ns_iter->refs++;
+		    }
 		  continue;
 		}
 	      gfc_current_locus = old_loc;
@@ -6715,7 +6742,9 @@  resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	  case OMP_LIST_CACHE:
 	    for (; n != NULL; n = n->next)
 	      {
-		if ((list == OMP_LIST_DEPEND || list == OMP_LIST_AFFINITY)
+		if ((list == OMP_LIST_DEPEND
+		     || list == OMP_LIST_AFFINITY
+		     || list == OMP_LIST_MAP)
 		    && n->u2.ns && !n->u2.ns->resolved)
 		  {
 		    n->u2.ns->resolved = 1;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 0ed31970f8b..7bfe9f266e7 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -13365,7 +13365,7 @@  resolve_fl_procedure (gfc_symbol *sym, int mp_flag)
 		     name, &sym->declared_at);
 	  return false;
 	}
-      if (sym->attr.dummy)
+      if (sym->attr.dummy && !sym->attr.artificial)
 	{
 	  gfc_error ("Dummy procedure %qs at %L shall not be elemental",
 		     sym->name, &sym->declared_at);
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index e413b2d7a1f..65684612c3c 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -203,6 +203,7 @@  gfc_get_ultimate_alloc_ptr_comps_caf_token (gfc_se *outerse, gfc_expr *expr)
 #define VTABLE_COPY_FIELD 4
 #define VTABLE_FINAL_FIELD 5
 #define VTABLE_DEALLOCATE_FIELD 6
+#define VTABLE_CALLBACK_FIELD 7
 
 
 tree
@@ -382,6 +383,7 @@  VTAB_GET_FIELD_GEN (def_init, VTABLE_DEF_INIT_FIELD)
 VTAB_GET_FIELD_GEN (copy, VTABLE_COPY_FIELD)
 VTAB_GET_FIELD_GEN (final, VTABLE_FINAL_FIELD)
 VTAB_GET_FIELD_GEN (deallocate, VTABLE_DEALLOCATE_FIELD)
+VTAB_GET_FIELD_GEN (callback, VTABLE_CALLBACK_FIELD)
 #undef VTAB_GET_FIELD_GEN
 
 /* The size field is returned as an array index type.  Therefore treat
@@ -419,6 +421,9 @@  gfc_vptr_size_get (tree vptr)
 #undef VTABLE_DEF_INIT_FIELD
 #undef VTABLE_COPY_FIELD
 #undef VTABLE_FINAL_FIELD
+#undef VTABLE_DEALLOCATE_FIELD
+#undef VTABLE_CALLBACK_FIELD
+
 
 
 /* IF ts is null (default), search for the last _class ref in the chain
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 909821d3284..125c1f32e6a 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -8101,7 +8101,8 @@  gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *expr)
 	byte_size = gfc_class_vtab_size_get (TREE_OPERAND (argse.expr, 0));
       else if (arg->rank > 0
 	       || (arg->rank == 0
-		   && arg->ref && arg->ref->type == REF_COMPONENT))
+		   && arg->ref && arg->ref->type == REF_COMPONENT
+		   && strcmp (arg->ref->u.c.component->name, "_data") == 0))  // FIXME!
 	/* The scalarizer added an additional temp.  To get the class' vptr
 	   one has to look at the original backend_decl.  */
 	byte_size = gfc_class_vtab_size_get (
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 201550691bd..fc11689e756 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2912,11 +2912,38 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	    }
 	  break;
 	case OMP_LIST_MAP:
+	  iterator = NULL_TREE;
+	  prev = NULL;
+	  prev_clauses = omp_clauses;
 	  for (; n != NULL; n = n->next)
 	    {
 	      if (!n->sym->attr.referenced)
 		continue;
 
+	      if (iterator && prev->u2.ns != n->u2.ns)
+		{
+		  BLOCK_SUBBLOCKS (tree_block) = gfc_finish_block (&iter_block);
+		  TREE_VEC_ELT (iterator, 5) = tree_block;
+		  for (tree c = omp_clauses; c != prev_clauses;
+		       c = OMP_CLAUSE_CHAIN (c))
+		    OMP_CLAUSE_DECL (c) = build_tree_list (iterator,
+							   OMP_CLAUSE_DECL (c));
+		  prev_clauses = omp_clauses;
+		  iterator = NULL_TREE;
+		}
+	      if (n->u2.ns && (!prev || prev->u2.ns != n->u2.ns))
+		{
+		  gfc_init_block (&iter_block);
+		  tree_block = make_node (BLOCK);
+		  TREE_USED (tree_block) = 1;
+		  BLOCK_VARS (tree_block) = NULL_TREE;
+		  iterator = handle_iterator (n->u2.ns, block,
+					      tree_block);
+		}
+	      if (!iterator)
+		gfc_init_block (&iter_block);
+	      prev = n;
+
 	      bool always_modifier = false;
 	      tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP);
 	      tree node2 = NULL_TREE;
@@ -3023,8 +3050,8 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		}
 
 	      if (n->expr == NULL
-		  || (n->expr->ref->type == REF_ARRAY
-		      && n->expr->ref->u.ar.type == AR_FULL))
+		       || (n->expr->ref->type == REF_ARRAY
+			   && n->expr->ref->u.ar.type == AR_FULL))
 		{
 		  tree present = gfc_omp_check_optional_argument (decl, true);
 		  if (openacc && n->sym->ts.type == BT_CLASS)
@@ -3504,7 +3531,8 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		sorry ("unhandled expression");
 
 	      finalize_map_clause:
-
+	      if (!iterator)
+		gfc_add_block_to_block (block, &iter_block);
 	      omp_clauses = gfc_trans_add_clause (node, omp_clauses);
 	      if (node2)
 		omp_clauses = gfc_trans_add_clause (node2, omp_clauses);
@@ -3512,6 +3540,31 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		omp_clauses = gfc_trans_add_clause (node3, omp_clauses);
 	      if (node4)
 		omp_clauses = gfc_trans_add_clause (node4, omp_clauses);
+
+	      if (!openacc && n->sym->ts.type == BT_CLASS)
+		{
+		  node2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
+		  OMP_CLAUSE_SET_MAP_KIND (node2, GOMP_MAP_FUNCTION);
+		  OMP_CLAUSE_DECL (node2) = gfc_class_vtab_callback_get (decl);
+		  OMP_CLAUSE_SIZE (node2) = size_int (1);
+		  omp_clauses = gfc_trans_add_clause (node2, omp_clauses);
+
+		  node2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
+		  OMP_CLAUSE_DECL (node2) = decl;
+		  OMP_CLAUSE_SET_MAP_KIND (node2, OMP_CLAUSE_MAP_KIND (node));
+		  OMP_CLAUSE_SIZE (node2) = size_int (0);
+		  omp_clauses = gfc_trans_add_clause (node2, omp_clauses);
+		}
+
+	      if (iterator)
+		{
+		  BLOCK_SUBBLOCKS (tree_block) = gfc_finish_block (&iter_block);
+		  TREE_VEC_ELT (iterator, 5) = tree_block;
+		  for (tree c = omp_clauses; c != prev_clauses;
+		    c = OMP_CLAUSE_CHAIN (c))
+		  OMP_CLAUSE_DECL (c) = build_tree_list (iterator,
+							 OMP_CLAUSE_DECL (c));
+		}
 	    }
 	  break;
 	case OMP_LIST_TO:
diff --git a/gcc/fortran/trans.h b/gcc/fortran/trans.h
index 15012a336ff..f6906972c65 100644
--- a/gcc/fortran/trans.h
+++ b/gcc/fortran/trans.h
@@ -435,6 +435,7 @@  tree gfc_class_vtab_size_get (tree);
 tree gfc_class_vtab_def_init_get (tree);
 tree gfc_class_vtab_copy_get (tree);
 tree gfc_class_vtab_final_get (tree);
+tree gfc_class_vtab_callback_get (tree);
 /* Get an accessor to the vtab's * field, when a vptr handle is present.  */
 tree gfc_vptr_hash_get (tree);
 tree gfc_vptr_size_get (tree);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 326476f0238..55ea654a9e4 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8581,6 +8581,119 @@  gimplify_omp_depend (tree *list_p, gimple_seq *pre_p)
   return 1;
 }
 
+/* Gimplify the map clause with iterator.  This generates a (lambda) function
+   which is then invoked during the mapping:
+     size_t map_fn (size_t(*cb)(token), token, base_addr, flags)  */
+
+static void
+gimplify_omp_map_iterator (tree *list_p, gimple_seq * /*pre_p*/)
+{
+  //FIXME: UNKNOWN_LOCATION -> OMP_CLAUSE_LOCATION (c)
+  location_t loc = UNKNOWN_LOCATION;
+  tree name, type, decl, tmp, cb_fn, token, base;
+  /* Declare function.  */
+  name = clone_function_name_numbered (current_function_decl, "_omp_mapfn");
+  // FIXME: -- add flags
+  type = build_function_type_list (size_type_node, ptr_type_node, NULL_TREE);
+  type = build_pointer_type (type);
+  type = build_function_type_list (size_type_node, type, ptr_type_node,
+				   ptr_type_node, short_unsigned_type_node,
+				   NULL_TREE);
+  decl = build_decl (loc, FUNCTION_DECL, name, type);
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  TREE_PUBLIC (decl) = 0;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+
+  tmp = build_decl (loc, RESULT_DECL, NULL_TREE, size_type_node);
+  DECL_ARTIFICIAL (tmp) = 1;
+  DECL_IGNORED_P (tmp) = 1;
+  DECL_CONTEXT (tmp) = decl;
+  DECL_RESULT (decl) = tmp;
+
+  /* Declare its args.  */
+  tree arglist = NULL_TREE;
+  tree typelist = TYPE_ARG_TYPES (TREE_TYPE (decl));
+  tmp = TREE_VALUE (typelist);
+  cb_fn = build_decl (input_location, PARM_DECL, get_identifier ("cb_fn"), tmp);
+  DECL_CONTEXT (cb_fn) = decl;
+  DECL_ARG_TYPE (cb_fn) = TREE_VALUE (typelist);
+  TREE_READONLY (cb_fn) = 1;
+  arglist = chainon (arglist, cb_fn);
+
+  typelist = TREE_CHAIN (typelist);
+  tmp = TREE_VALUE (typelist);
+  token = build_decl (input_location, PARM_DECL, get_identifier ("token"), tmp);
+  DECL_CONTEXT (token) = decl;
+  DECL_ARG_TYPE (token) = TREE_VALUE (typelist);
+  TREE_READONLY (token) = 1;
+  arglist = chainon (arglist, token);
+
+  typelist = TREE_CHAIN (typelist);
+  tmp = TREE_VALUE (typelist);
+  base = build_decl (input_location, PARM_DECL, get_identifier ("base"), tmp);
+  DECL_CONTEXT (base) = decl;
+  DECL_ARG_TYPE (base) = TREE_VALUE (typelist);
+  TREE_READONLY (base) = 1;
+  arglist = chainon (arglist, base);
+
+  typelist = TREE_CHAIN (typelist);
+  tmp = TREE_VALUE (typelist);
+  base = build_decl (input_location, PARM_DECL, get_identifier ("flags"), tmp);
+  DECL_CONTEXT (base) = decl;
+  DECL_ARG_TYPE (base) = TREE_VALUE (typelist);
+  TREE_READONLY (base) = 1;
+  arglist = chainon (arglist, base);
+
+  DECL_ARGUMENTS (decl) = arglist;
+  push_struct_function (decl);
+  push_gimplify_context (true);
+
+  /* Body. */
+  gimple_seq seq = NULL;
+  tree size = build_decl (input_location, VAR_DECL,
+			  create_tmp_var_name ("size"), size_type_node);
+  tmp = fold_build2_loc (loc, MODIFY_EXPR, size_type_node,
+			 size, build_int_cst (size_type_node, 0));
+  gimplify_and_add (tmp, &seq);
+
+  tmp = build_call_expr_loc (loc, build_fold_indirect_ref_loc (loc, cb_fn), 1, token);
+  gimplify_and_add (tmp, &seq);
+
+  tmp = fold_build2_loc (input_location, MODIFY_EXPR, integer_type_node,
+			 DECL_RESULT (decl), size);
+  tmp = fold_build1_loc (loc, RETURN_EXPR, void_type_node, tmp);
+  gimplify_and_add (tmp, &seq);
+
+  pop_gimplify_context (NULL);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+  cfun->function_end_locus = loc;
+  cfun->curr_properties |= PROP_gimple_any;
+  init_tree_ssa (cfun);
+  pop_cfun ();
+
+  //cgraph_node *node = cgraph_node::get_create (decl);
+  cgraph_node::add_new_function (decl, true);
+
+  if (dump_file)
+    {
+      dump_function_header (dump_file, decl, dump_flags);
+      dump_function_to_file (decl, dump_file, dump_flags);
+    }
+
+  tree c = build_omp_clause (loc, OMP_CLAUSE_MAP);
+  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FUNCTION);
+  OMP_CLAUSE_DECL (c) = decl;  /* Will later contain the generated function. */
+  OMP_CLAUSE_SIZE (c) = size_int (0);
+  OMP_CLAUSE_CHAIN (c) = OMP_CLAUSE_CHAIN (*list_p);
+  *list_p = c;
+}
+
 /* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
    GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
    the struct node to insert the new mapping after (when the struct node is
@@ -9299,6 +9412,16 @@  gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    default:
 	      break;
 	    }
+	  if (TREE_CODE (decl) == TREE_LIST
+	      && TREE_PURPOSE (decl)
+	      && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC)
+	    {
+	      gimplify_omp_map_iterator (list_p, pre_p);
+	      omp_add_variable (ctx, TREE_VALUE (decl),
+				GOVD_MAP | GOVD_SEEN | GOVD_EXPLICIT);
+	      break;
+	    }
+
 	  /* For Fortran, not only the pointer to the data is mapped but also
 	     the address of the pointer, the array descriptor etc.; for
 	     'exit data' - and in particular for 'delete:' - having an 'alloc:'
@@ -11186,6 +11309,15 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	      remove = true;
 	      break;
 	    }
+	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FUNCTION)
+	    {
+	      /* Ensure argument is kept.
+		 TODO: do removals similar to struct element mapping.  */
+	      HOST_WIDE_INT cnt = tree_to_shwi (OMP_CLAUSE_SIZE (c));
+	      while (c && cnt--)
+		c = OMP_CLAUSE_CHAIN (c);
+	      break;
+	    }
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* Data clauses associated with reductions must be
 	     compatible with present_or_copy.  Warn and adjust the clause
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index de3a26e08fc..84ca8ae4e9a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1495,9 +1495,29 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_TO:
 	case OMP_CLAUSE_FROM:
 	case OMP_CLAUSE_MAP:
+	  decl = OMP_CLAUSE_DECL (c);
+	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FUNCTION)
+	    {
+	      /* This is only needed on the sender side which maps all variables.
+		 FIXME: For map(..., a[i]....), there must be 'a' mapped and
+		 handled both on the sender & receiver side such that the map
+		 function only fills in the gaps.  */
+	      tree field
+		= build_decl (OMP_CLAUSE_LOCATION (c), FIELD_DECL, NULL_TREE,
+			      build_pointer_type (TREE_TYPE (decl)));
+	      SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type_node));
+	      insert_field_into_struct (ctx->record_type, field);
+	      /* To not clash with a map of the pointer variable itself,
+		 attach/detach maps have their field looked up by the *clause*
+		 tree expression, not the decl.  */
+	      gcc_assert (!splay_tree_lookup (ctx->field_map,
+					      (splay_tree_key) c));
+	      splay_tree_insert (ctx->field_map, (splay_tree_key) decl,
+				 (splay_tree_value) field);
+	      break;
+	    }
 	  if (ctx->outer)
 	    scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
-	  decl = OMP_CLAUSE_DECL (c);
 	  /* Global variables with "omp declare target" attribute
 	     don't need to be copied, the receiver side will use them
 	     directly.  However, global variables with "omp declare target link"
@@ -1794,7 +1814,8 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  break;
 
 	case OMP_CLAUSE_MAP:
-	  if (!is_gimple_omp_offloaded (ctx->stmt))
+	  if (!is_gimple_omp_offloaded (ctx->stmt)
+	      || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FUNCTION)
 	    break;
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (DECL_P (decl)
@@ -10524,7 +10545,7 @@  oacc_privatization_begin_diagnose_var (const dump_flags_t l_dump_flags,
 # pragma GCC diagnostic ignored "-Wformat"
 #endif
   dump_printf_loc (l_dump_flags, d_u_loc,
-		   "variable %<%T%> ", decl);
+		   "variable %qT ", decl);
 #if __GNUC__ >= 10
 # pragma GCC diagnostic pop
 #endif
@@ -12635,6 +12656,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  case GOMP_MAP_ALWAYS_POINTER:
 	  case GOMP_MAP_ATTACH:
 	  case GOMP_MAP_DETACH:
+	  case GOMP_MAP_FUNCTION:
 	    break;
 	  case GOMP_MAP_IF_PRESENT:
 	  case GOMP_MAP_FORCE_ALLOC:
@@ -12699,6 +12721,12 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	    continue;
 	  }
 
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	    && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FUNCTION)
+	  {
+	    map_cnt++;
+	    continue;
+	  }
 	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	    && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
 		|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH)
@@ -12923,6 +12951,25 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  oacc_firstprivate_map:
 	    nc = c;
 	    ovar = OMP_CLAUSE_DECL (c);
+	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		&& OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FUNCTION)
+	      {
+		unsigned HOST_WIDE_INT tkind = OMP_CLAUSE_MAP_KIND (c);
+		splay_tree_node n = splay_tree_lookup (ctx->field_map,
+						       (splay_tree_key) ovar);
+		x = omp_build_component_ref (ctx->sender_decl, (tree) n->value);
+		gimplify_assign (x, build_fold_addr_expr (ovar), &ilist);
+		s = size_int (0);
+		purpose = size_int (map_idx++);
+		CONSTRUCTOR_APPEND_ELT (vsize, purpose, s);
+		gcc_checking_assert (tkind
+				     < (HOST_WIDE_INT_C (1U) << talign_shift));
+		gcc_checking_assert (
+		  tkind <= tree_to_uhwi (TYPE_MAX_VALUE (tkind_type)));
+		CONSTRUCTOR_APPEND_ELT (vkind, purpose,
+					build_int_cstu (tkind_type, tkind));
+		break;
+	      }
 	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 		&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
 		    || (OMP_CLAUSE_MAP_KIND (c)
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index fcc0796e3a1..54c618a8a5e 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -775,7 +775,7 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 	    && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
 	  {
 	    dump_omp_iterators (pp, TREE_PURPOSE (t), spc, flags);
-	    pp_colon (pp);
+	    pp_comma (pp);
 	    t = TREE_VALUE (t);
 	  }
 	dump_generic_node (pp, t, spc, flags, false);
@@ -854,94 +854,108 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 
     case OMP_CLAUSE_MAP:
       pp_string (pp, "map(");
-      switch (OMP_CLAUSE_MAP_KIND (clause))
-	{
-	case GOMP_MAP_ALLOC:
-	case GOMP_MAP_POINTER:
-	  pp_string (pp, "alloc");
-	  break;
-	case GOMP_MAP_IF_PRESENT:
-	  pp_string (pp, "no_alloc");
-	  break;
-	case GOMP_MAP_TO:
-	case GOMP_MAP_TO_PSET:
-	  pp_string (pp, "to");
-	  break;
-	case GOMP_MAP_FROM:
-	  pp_string (pp, "from");
-	  break;
-	case GOMP_MAP_TOFROM:
-	  pp_string (pp, "tofrom");
-	  break;
-	case GOMP_MAP_FORCE_ALLOC:
-	  pp_string (pp, "force_alloc");
-	  break;
-	case GOMP_MAP_FORCE_TO:
-	  pp_string (pp, "force_to");
-	  break;
-	case GOMP_MAP_FORCE_FROM:
-	  pp_string (pp, "force_from");
-	  break;
-	case GOMP_MAP_FORCE_TOFROM:
-	  pp_string (pp, "force_tofrom");
-	  break;
-	case GOMP_MAP_FORCE_PRESENT:
-	  pp_string (pp, "force_present");
-	  break;
-	case GOMP_MAP_DELETE:
-	  pp_string (pp, "delete");
-	  break;
-	case GOMP_MAP_FORCE_DEVICEPTR:
-	  pp_string (pp, "force_deviceptr");
-	  break;
-	case GOMP_MAP_ALWAYS_TO:
-	  pp_string (pp, "always,to");
-	  break;
-	case GOMP_MAP_ALWAYS_FROM:
-	  pp_string (pp, "always,from");
-	  break;
-	case GOMP_MAP_ALWAYS_TOFROM:
-	  pp_string (pp, "always,tofrom");
-	  break;
-	case GOMP_MAP_RELEASE:
-	  pp_string (pp, "release");
-	  break;
-	case GOMP_MAP_FIRSTPRIVATE_POINTER:
-	  pp_string (pp, "firstprivate");
-	  break;
-	case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
-	  pp_string (pp, "firstprivate ref");
-	  break;
-	case GOMP_MAP_STRUCT:
-	  pp_string (pp, "struct");
-	  break;
-	case GOMP_MAP_ALWAYS_POINTER:
-	  pp_string (pp, "always_pointer");
-	  break;
-	case GOMP_MAP_DEVICE_RESIDENT:
-	  pp_string (pp, "device_resident");
-	  break;
-	case GOMP_MAP_LINK:
-	  pp_string (pp, "link");
-	  break;
-	case GOMP_MAP_ATTACH:
-	  pp_string (pp, "attach");
-	  break;
-	case GOMP_MAP_DETACH:
-	  pp_string (pp, "detach");
-	  break;
-	case GOMP_MAP_FORCE_DETACH:
-	  pp_string (pp, "force_detach");
-	  break;
-	case GOMP_MAP_ATTACH_DETACH:
-	  pp_string (pp, "attach_detach");
-	  break;
-	default:
-	  gcc_unreachable ();
-	}
-      pp_colon (pp);
-      dump_generic_node (pp, OMP_CLAUSE_DECL (clause),
-			 spc, flags, false);
+      {
+	tree t = OMP_CLAUSE_DECL (clause);
+	if (t != NULL_TREE
+	    && TREE_CODE (t) == TREE_LIST
+	    && TREE_PURPOSE (t)
+	    && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+	  {
+	    dump_omp_iterators (pp, TREE_PURPOSE (t), spc, flags);
+	    pp_colon (pp);
+	    t = TREE_VALUE (t);
+	  }
+	switch (OMP_CLAUSE_MAP_KIND (clause))
+	  {
+	  case GOMP_MAP_ALLOC:
+	  case GOMP_MAP_POINTER:
+	    pp_string (pp, "alloc");
+	    break;
+	  case GOMP_MAP_IF_PRESENT:
+	    pp_string (pp, "no_alloc");
+	    break;
+	  case GOMP_MAP_TO:
+	  case GOMP_MAP_TO_PSET:
+	    pp_string (pp, "to");
+	    break;
+	  case GOMP_MAP_FROM:
+	    pp_string (pp, "from");
+	    break;
+	  case GOMP_MAP_TOFROM:
+	    pp_string (pp, "tofrom");
+	    break;
+	  case GOMP_MAP_FORCE_ALLOC:
+	    pp_string (pp, "force_alloc");
+	    break;
+	  case GOMP_MAP_FORCE_TO:
+	    pp_string (pp, "force_to");
+	    break;
+	  case GOMP_MAP_FORCE_FROM:
+	    pp_string (pp, "force_from");
+	    break;
+	  case GOMP_MAP_FORCE_TOFROM:
+	    pp_string (pp, "force_tofrom");
+	    break;
+	  case GOMP_MAP_FORCE_PRESENT:
+	    pp_string (pp, "force_present");
+	    break;
+	  case GOMP_MAP_DELETE:
+	    pp_string (pp, "delete");
+	    break;
+	  case GOMP_MAP_FORCE_DEVICEPTR:
+	    pp_string (pp, "force_deviceptr");
+	    break;
+	  case GOMP_MAP_ALWAYS_TO:
+	    pp_string (pp, "always,to");
+	    break;
+	  case GOMP_MAP_ALWAYS_FROM:
+	    pp_string (pp, "always,from");
+	    break;
+	  case GOMP_MAP_ALWAYS_TOFROM:
+	    pp_string (pp, "always,tofrom");
+	    break;
+	  case GOMP_MAP_RELEASE:
+	    pp_string (pp, "release");
+	    break;
+	  case GOMP_MAP_FIRSTPRIVATE_POINTER:
+	    pp_string (pp, "firstprivate");
+	    break;
+	  case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
+	    pp_string (pp, "firstprivate ref");
+	    break;
+	  case GOMP_MAP_STRUCT:
+	    pp_string (pp, "struct");
+	    break;
+	  case GOMP_MAP_ALWAYS_POINTER:
+	    pp_string (pp, "always_pointer");
+	    break;
+	  case GOMP_MAP_DEVICE_RESIDENT:
+	    pp_string (pp, "device_resident");
+	    break;
+	  case GOMP_MAP_LINK:
+	    pp_string (pp, "link");
+	    break;
+	  case GOMP_MAP_ATTACH:
+	    pp_string (pp, "attach");
+	    break;
+	  case GOMP_MAP_DETACH:
+	    pp_string (pp, "detach");
+	    break;
+	  case GOMP_MAP_FORCE_DETACH:
+	    pp_string (pp, "force_detach");
+	    break;
+	  case GOMP_MAP_ATTACH_DETACH:
+	    pp_string (pp, "attach_detach");
+	    break;
+	  case GOMP_MAP_FUNCTION:
+	    pp_string (pp, "map_function");
+	    break;
+	  default:
+	    gcc_unreachable ();
+	  }
+	pp_colon (pp);
+	dump_generic_node (pp, t, spc, flags, false);
+      }
      print_clause_size:
       if (OMP_CLAUSE_SIZE (clause))
 	{
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 3e42d7123ae..f5c12c9228e 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -162,7 +162,9 @@  enum gomp_map_kind
     /* In OpenACC, detach a pointer to a mapped struct field.  */
     GOMP_MAP_FORCE_DETACH =		(GOMP_MAP_DEEP_COPY
 					 | GOMP_MAP_FLAG_FORCE | 1),
-
+    /* Unrelated to GOMP_MAP_DEEP_COPY, but using still avaliable bits. */
+    /* Callback function to be used for mapping.  */
+    GOMP_MAP_FUNCTION = 		(GOMP_MAP_DEEP_COPY | 3),
     /* Internal to GCC, not used in libgomp.  */
     /* Do not map, but pointer assign a pointer instead.  */
     GOMP_MAP_FIRSTPRIVATE_POINTER =	(GOMP_MAP_LAST | 1),
diff --git a/libgomp/target.c b/libgomp/target.c
index 5d3103a40c2..77a7968870b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -876,20 +876,128 @@  gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
     }
 }
 
+struct mapfn_token {
+  size_t idx, max;
+  struct {
+    size_t num;
+    size_t size;
+    unsigned short kind;
+    void *hostaddr;
+  } *n;
+};
+
+/* Called by the GOMP_MAP_FUNCTION.  */
+/* Returns the number of mappings - 1 unless subfunctions are called.  */
+
+size_t
+GOMP_map_callback_fn (struct mapfn_token *token, void *hostaddr, size_t size,
+		      unsigned short kind)
+{
+  assert (token->idx < token->max);
+  token->n[token->idx].hostaddr = hostaddr;
+  token->n[token->idx].size = size;
+  token->n[token->idx].kind = kind;
+  token->idx++;
+  return 1;
+}
+
+/* Datatype of GOMP_MAP_FUNCTION.
+   Arguments:
+   - GOMP_map_callback_fn
+   - token  (passed on to GOMP_map_callback_fn)
+   - baseptr  (NULL unless GOMP_MAP_FUNCTION has size > 0)
+   - flags
+   Return value: Sum of values returned by GOMP_map_callback_fn.
+   i.e. number of requested mappings.  */
+typedef size_t (*map_callback_fn_t) (struct mapfn_token *, void *, size_t,
+				     unsigned short);
+typedef size_t (*map_fn_t) (map_callback_fn_t, void *, void *, unsigned short);
+
 static inline __attribute__((always_inline)) struct target_mem_desc *
 gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			struct goacc_asyncqueue *aq, size_t mapnum,
-			void **hostaddrs, void **devaddrs, size_t *sizes,
-			void *kinds, bool short_mapkind,
+			void **hostaddrs_arg, void **devaddrs,
+			size_t *sizes_arg, void *kinds_arg, bool short_mapkind,
 			htab_t *refcount_set,
 			enum gomp_map_vars_kind pragma_kind)
 {
-  size_t i, tgt_align, tgt_size, not_found_cnt = 0;
+  const int typemask = short_mapkind ? 0xff : 0x7;
+  size_t i, extranums = 0, n_mapfn = 0;
+  struct mapfn_token token = {};
+  void **hostaddrs = hostaddrs_arg;
+  void *kinds = kinds_arg;
+  size_t *sizes = sizes_arg;
+  size_t *orig_idx = NULL;
+  /* For mapping function, get number of mappings.  */
+  for (i = 0; i < mapnum; i++)
+    {
+      if (get_kind (short_mapkind, kinds, i) == GOMP_MAP_FUNCTION)
+	{
+	  n_mapfn++;
+	  extranums--;  /* Mapping function.  */
+	  if (sizes[i] == 0)  /* Normal mapping but via map function.  */
+	    extranums += ((map_fn_t) hostaddrs[i]) (GOMP_map_callback_fn, NULL,
+						    NULL, 0);
+	  else  /* Complex mapping à la Fortran deep mapping.  */
+	    assert (false);
+	}
+    }
+  if (extranums)
+    {
+      mapnum += extranums;
+      token.idx = 0;
+      token.max = extranums;
+      token.n = gomp_malloc (extranums * sizeof (*token.n));
+      hostaddrs = gomp_malloc (mapnum * sizeof (*hostaddrs));
+      kinds = gomp_malloc (mapnum * (short_mapkind ? sizeof (unsigned short)
+						   : sizeof (unsigned char)));
+      sizes = gomp_malloc (mapnum * sizeof (*sizes));
+      orig_idx = gomp_malloc (mapnum * sizeof (*orig_idx));
+      size_t idx = 0, idx2 = 0;
+      for (i = 0; i < mapnum ; )
+	if (get_kind (short_mapkind, kinds, i) == GOMP_MAP_FUNCTION)
+	  {
+	    if (sizes[idx] == 0)  /* Normal mapping but via map function.  */
+	      ((map_fn_t) hostaddrs[i]) (GOMP_map_callback_fn, token.n,
+					 NULL, 0);
+	    else  /* Complex mapping à la Fortran deep mapping.  */
+	      assert (false);
+	    for (size_t j = idx2; j < token.idx; idx2++, i++)
+	      {
+		orig_idx[i] = idx;
+		hostaddrs[i] = token.n[idx2].hostaddr;
+		sizes[i] = token.n[idx2].size;
+		int kind = token.n[idx2].kind;
+		if (short_mapkind)
+		  ((unsigned short *) kinds)[i] = (unsigned short) kind;
+		else
+		  ((unsigned char *) kinds)[i] = (unsigned char) kind;
+		assert ((kind & typemask) != GOMP_MAP_USE_DEVICE_PTR
+		        && ((kind & typemask)
+			    != GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT));
+	      }	
+	    idx++;
+	  }
+	else
+	  {
+	    hostaddrs[i] = hostaddrs_arg[idx];
+	    sizes[i] = sizes_arg[idx];
+	    if (short_mapkind)
+	      ((unsigned short *) kinds)[i]
+		= ((unsigned short *) kinds_arg)[idx];
+	    else
+	      ((unsigned char *) kinds)[i]
+		= ((unsigned char *) kinds_arg)[idx];
+	    orig_idx[i] = idx;
+	    idx++;
+	    i++;
+	  }
+    }
+  size_t tgt_align, tgt_size, not_found_cnt = 0;
   bool has_firstprivate = false;
   bool has_always_ptrset = false;
   bool openmp_p = (pragma_kind & GOMP_MAP_VARS_OPENACC) == 0;
   const int rshift = short_mapkind ? 8 : 3;
-  const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
@@ -975,6 +1083,8 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		  hostaddrs[i]
 		    = (void *) (n->tgt->tgt_start + n->tgt_offset
 				+ cur_node.host_start);
+		  if (orig_idx)
+		    hostaddrs_arg[orig_idx[i]] = hostaddrs[i];
 		}
 	      else if ((kind & typemask) == GOMP_MAP_USE_DEVICE_PTR)
 		{
@@ -1679,6 +1789,14 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
       free (tgt);
       tgt = NULL;
     }
+  if (extranums)
+    {
+      free (token.n);
+      free (hostaddrs);
+      free (kinds);
+      free (sizes);
+      free (orig_idx);
+    }
 
   gomp_mutex_unlock (&devicep->lock);
   return tgt;