[2/5,gfortran] Translate allocate directive (OpenMP 5.0).

Message ID 20220113145320.3153087-3-abidh@codesourcery.com
State New
Headers
Series Support for allocate directive (OpenMP 5.0) |

Commit Message

Abid Qadeer Jan. 13, 2022, 2:53 p.m. UTC
  gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
	(gfc_trans_omp_allocate): New function.
	(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
	(dump_generic_node): Handle OMP_ALLOCATE.
	* tree.def (OMP_ALLOCATE): New.
	* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
	(OMP_ALLOCATE_DECL): Likewise.
	(OMP_ALLOCATE_ALLOCATOR): Likewise.
	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-6.f90: New test.
---
 gcc/fortran/trans-openmp.c                    | 44 ++++++++++++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++++++++++++++++++
 gcc/tree-core.h                               |  3 +
 gcc/tree-pretty-print.c                       | 19 +++++
 gcc/tree.c                                    |  1 +
 gcc/tree.def                                  |  4 ++
 gcc/tree.h                                    | 11 +++
 7 files changed, 154 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
  

Comments

Jakub Jelinek Oct. 11, 2022, 12:24 p.m. UTC | #1
On Thu, Jan 13, 2022 at 02:53:17PM +0000, Hafiz Abid Qadeer wrote:
> gcc/fortran/ChangeLog:
> 
> 	* trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
> 	(gfc_trans_omp_allocate): New function.
> 	(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.
> 
> gcc/ChangeLog:
> 
> 	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
> 	(dump_generic_node): Handle OMP_ALLOCATE.
> 	* tree.def (OMP_ALLOCATE): New.
> 	* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
> 	(OMP_ALLOCATE_DECL): Likewise.
> 	(OMP_ALLOCATE_ALLOCATOR): Likewise.
> 	* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gfortran.dg/gomp/allocate-6.f90: New test.

There is another issue besides what I wrote in my last review,
and I'm afraid I don't know what to do about it, hoping Tobias
has some ideas.
The problem is that without the allocate-stmt associated allocate directive,
Fortran allocatables are easily always allocated with malloc and freed with
free.  The deallocation can be implicit through reallocation, or explicit
deallocate statement etc.
But when some allocatables are now allocated with a different
allocator (when allocate-stmt associated allocate directive is used),
some allocatables are allocated with malloc and others with GOMP_alloc
but we need to free them with the corresponding allocator based on how
they were allocated, what has been allocated with malloc should be
deallocated with free, what has been allocated with GOMP_alloc should be
deallocated with GOMP_free.
The deallocation can be done in a completely different TU from where it has
been allocated, in theory it could be also not compiled with -fopenmp, etc.
So, I'm afraid we need to store somewhere whether we used malloc or
GOMP_alloc for the allocation (say somewhere in the array descriptor and for
other stuff somewhere on the side?) and slow down all code that needs
deallocation to check that bit (or say we don't support
deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
on the deallocation TU and only slow down -fopenmp compiled code)?

Tobias, thoughts on this?

	Jakub
  
Tobias Burnus Oct. 11, 2022, 1:22 p.m. UTC | #2
Hi Jakub,

On 11.10.22 14:24, Jakub Jelinek wrote:

There is another issue besides what I wrote in my last review,
and I'm afraid I don't know what to do about it, hoping Tobias
has some ideas.
The problem is that without the allocate-stmt associated allocate directive,
Fortran allocatables are easily always allocated with malloc and freed with
free.  The deallocation can be implicit through reallocation, or explicit
deallocate statement etc.
...
But when some allocatables are now allocated with a different
allocator (when allocate-stmt associated allocate directive is used),
some allocatables are allocated with malloc and others with GOMP_alloc
but we need to free them with the corresponding allocator based on how
they were allocated, what has been allocated with malloc should be
deallocated with free, what has been allocated with GOMP_alloc should be
deallocated with GOMP_free.



I think the most common case is:

integer, allocatable :: var(:)
!$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
...
! optionally: deallocate(var)
end ! of scope: block/subroutine/... - automatic deallocation

Those can be easily handled. It gets more complicated with control flow:

if (...) then
  !$omp allocators allocator(...)
  allocate(...)
else
  allocate (...)
endif



However, the problem is really that there is is no mandatory
'!$omp deallocators' and also the wording like:

"If any operation of the base language causes a reallocation of
an array that is allocated with a memory allocator then that
memory allocator will be used to release the current memory
and to allocate the new memory." (OpenMP 5.0 wording)

There has been some attempt to relax the rules a bit, e.g. by
adding the wording:
"For allocated allocatable components of such variables, the allocator that
will be used for the deallocation and allocation is unspecified."

And some wording change (→issues 3189) to clarify related component issues.

But nonetheless, there is still the issue of:

(a) explicit DEALLOCATE in some other translation unit
(b) some intrinsic operation which reallocate the memory, either via libgomp
or in the source code:
  a = [1,2,3]  ! possibly reallocates
  str = trim(str) ! possibly reallocates
where the first one calls 'realloc' directly in the code and the second one
calls 'libgomp' for that.

 * * *

I don't see a good solution – and there is in principle the same issue with
unified-shared memory (USM) on hardware that does not support transparently
accessing all host memory on the device.

Compilers support this case by allocating memory in some special memory,
which is either accessible from both sides ('pinned') or migrates on the
first access from the device side - but remains there until the accessing
device kernel ends ('managed memory').

Newer hardware (+ associated Linux kernel support) permit accessing all
memory in a somewhat fast way, avoiding this issue (and special handling
is then left to the user.) For AMDGCN, my understanding is that all hardware
supported by GCC supports this - but glacial speed until the last hardware
architectures. For Nvidia, this is supported since Pascal (I think for Titan X,
P100, i.e. sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.

I mention this because the USM implementation at
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
suffers from this.
And https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
tries to solve the the 'trim' example issue above - i.e. the case where
libgomp reallocates pinned/managed (pseudo-)USM memory.

 * * *

The deallocation can be done in a completely different TU from where it has
been allocated, in theory it could be also not compiled with -fopenmp, etc.
So, I'm afraid we need to store somewhere whether we used malloc or
GOMP_alloc for the allocation (say somewhere in the array descriptor and for
other stuff somewhere on the side?) and slow down all code that needs
deallocation to check that bit (or say we don't support
deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
on the deallocation TU and only slow down -fopenmp compiled code)?

The problem with storing is that gfortran inserts the malloc/realloc/free calls directly, i.e. without library preloading, intercepting those libcalls, I do not see how it can work at all.

I also do not know how to handle the pinned-memory case above correctly, either.

One partial support would be requiring that the code using allocatables cannot do any reallocation/deallocation by only permitting calls to procedures which do not permit allocatables. (Such that no reallocation can happen.) – And print a 'sorry' for the rest.

Other implementations seem to have a Fortran library call for (re)allocations, which permits to swap the allocator from the generic one to the omp_default_mem_alloc.

* * *

In terms of the array descriptor, we have inside 'struct dtype_type'  the 'signed short attribute', which currently only holds CFI_attribute_pointer/CFI_attribute_allocatable/CFI_attribute_other (=0,1,2). And this is only used together with ISO C binding, permitting to use the other bits for other purpose (for the non-ISO-C case). Still, the question is *how* to use it in that case.

Thoughts on the generic issue on those thoughts?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek Oct. 11, 2022, 2:15 p.m. UTC | #3
On Tue, Oct 11, 2022 at 03:22:02PM +0200, Tobias Burnus wrote:
> Hi Jakub,
> 
> On 11.10.22 14:24, Jakub Jelinek wrote:
> 
> There is another issue besides what I wrote in my last review,
> and I'm afraid I don't know what to do about it, hoping Tobias
> has some ideas.
> The problem is that without the allocate-stmt associated allocate directive,
> Fortran allocatables are easily always allocated with malloc and freed with
> free.  The deallocation can be implicit through reallocation, or explicit
> deallocate statement etc.
> ...
> But when some allocatables are now allocated with a different
> allocator (when allocate-stmt associated allocate directive is used),
> some allocatables are allocated with malloc and others with GOMP_alloc
> but we need to free them with the corresponding allocator based on how
> they were allocated, what has been allocated with malloc should be
> deallocated with free, what has been allocated with GOMP_alloc should be
> deallocated with GOMP_free.
> 
> 
> 
> I think the most common case is:
> 
> integer, allocatable :: var(:)
> !$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
> ...
> ! optionally: deallocate(var)
> end ! of scope: block/subroutine/... - automatic deallocation

So you talk here about the declarative directive the patch does sorry on,
or about the executable one above allocate stmt?

Anyway, even this simple case has the problem that one can have
subroutine foo (var)
  integer, allocatable:: var(:)
  var = [1, 2, 3] ! reallocate
end subroutine
and call foo (var) above.

> Those can be easily handled. It gets more complicated with control flow:
> 
> if (...) then
>  !$omp allocators allocator(...)
>  allocate(...)
> else
>  allocate (...)
> endif
> 
> 
> 
> However, the problem is really that there is is no mandatory
> '!$omp deallocators' and also the wording like:
> 
> "If any operation of the base language causes a reallocation of
> an array that is allocated with a memory allocator then that
> memory allocator will be used to release the current memory
> and to allocate the new memory." (OpenMP 5.0 wording)
> 
> There has been some attempt to relax the rules a bit, e.g. by
> adding the wording:
> "For allocated allocatable components of such variables, the allocator that
> will be used for the deallocation and allocation is unspecified."
> 
> And some wording change (→issues 3189) to clarify related component issues.
> 
> But nonetheless, there is still the issue of:
> 
> (a) explicit DEALLOCATE in some other translation unit
> (b) some intrinsic operation which reallocate the memory, either via libgomp
> or in the source code:
>  a = [1,2,3]  ! possibly reallocates
>  str = trim(str) ! possibly reallocates
> where the first one calls 'realloc' directly in the code and the second one
> calls 'libgomp' for that.
> 
> * * *
> 
> I don't see a good solution – and there is in principle the same issue with
> unified-shared memory (USM) on hardware that does not support transparently
> accessing all host memory on the device.
> 
> Compilers support this case by allocating memory in some special memory,
> which is either accessible from both sides ('pinned') or migrates on the
> first access from the device side - but remains there until the accessing
> device kernel ends ('managed memory').
> 
> Newer hardware (+ associated Linux kernel support) permit accessing all
> memory in a somewhat fast way, avoiding this issue (and special handling
> is then left to the user.) For AMDGCN, my understanding is that all hardware
> supported by GCC supports this - but glacial speed until the last hardware
> architectures. For Nvidia, this is supported since Pascal (I think for Titan X,
> P100, i.e. sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.
> 
> I mention this because the USM implementation at
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
> suffers from this.
> And https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
> tries to solve the the 'trim' example issue above - i.e. the case where
> libgomp reallocates pinned/managed (pseudo-)USM memory.
> 
> * * *
> 
> The deallocation can be done in a completely different TU from where it has
> been allocated, in theory it could be also not compiled with -fopenmp, etc.
> So, I'm afraid we need to store somewhere whether we used malloc or
> GOMP_alloc for the allocation (say somewhere in the array descriptor and for
> other stuff somewhere on the side?) and slow down all code that needs
> deallocation to check that bit (or say we don't support
> deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
> on the deallocation TU and only slow down -fopenmp compiled code)?
> 
> The problem with storing is that gfortran inserts the malloc/realloc/free calls directly, i.e. without library preloading, intercepting those libcalls, I do not see how it can work at all.

Well, it can use a weak symbol, if not linked against libgomp, the bit
that it is OpenMP shouldn't be set and so realloc/free will be used
and do
  if (arrdescr.gomp_alloced_bit)
    GOMP_free (arrdescr.data, 0);
  else
    free (arrdescr.data);
and similar.  And I think we can just document that we do this only for
-fopenmp compiled code.
But do we have a place to store that bit?  I presume in array descriptors
there could be some bit for it, but what to do about scalar allocatables,
or allocatable components etc.?
In theory we could use ugly stuff like if all the allocations would be
guaranteed to have at least 2 byte alignment use LSB bit of the pointer
to mark GOMP_alloc allocated memory for the scalar allocatables etc. but
then would need in -fopenmp compiled code to strip it away.

As for pinned memory, if it is allocated through libgomp allocators, that
should just work if GOMP_free/GOMP_realloc is used, that is why we have
those extra data in front of the allocations where we store everything we
need.  But those also make the OpenMP allocations incompatible with
malloc/free allocations.

	Jakub
  
Jakub Jelinek Oct. 11, 2022, 2:27 p.m. UTC | #4
On Tue, Oct 11, 2022 at 04:15:25PM +0200, Jakub Jelinek wrote:
> Well, it can use a weak symbol, if not linked against libgomp, the bit
> that it is OpenMP shouldn't be set and so realloc/free will be used
> and do
>   if (arrdescr.gomp_alloced_bit)
>     GOMP_free (arrdescr.data, 0);
>   else
>     free (arrdescr.data);
> and similar.  And I think we can just document that we do this only for
> -fopenmp compiled code.
> But do we have a place to store that bit?  I presume in array descriptors
> there could be some bit for it, but what to do about scalar allocatables,
> or allocatable components etc.?
> In theory we could use ugly stuff like if all the allocations would be
> guaranteed to have at least 2 byte alignment use LSB bit of the pointer
> to mark GOMP_alloc allocated memory for the scalar allocatables etc. but
> then would need in -fopenmp compiled code to strip it away.
> 
> As for pinned memory, if it is allocated through libgomp allocators, that
> should just work if GOMP_free/GOMP_realloc is used, that is why we have
> those extra data in front of the allocations where we store everything we
> need.  But those also make the OpenMP allocations incompatible with
> malloc/free allocations.

Yet another option would be to change the way our OpenMP allocators work,
instead of having allocation internal data before the allocated memory
have them somewhere on the side and use some data structures mapping
ranges of virtual memory to the allocation data.
We'd either need to use mmap to have better control on where exactly
we allocate stuff so that the on the side data structures wouldn't need
to be for every allocation, or do those for every allocation perhaps with
merging of adjacent allocations or something similar.
Disadvantage is that it would be slower and might need more locking etc.,
advantage is that it could be then malloc/free compatible, any not tracked
address would be forwarded from GOMP_free to free etc.  And we'd not waste
e.g. precious pinned etc. memory especially when doing allocations with very
high alignment, where the data before allocation means we can waste up to
max (32, alignment - 1) of extra memory.  And gfortran
inline emitted reallocation/deallocation could just emit GOMP_realloc/free
always for -fopenmp.  The way GOMP_ allocators are currently written, it is
our internal choice if we do it the current way or the on the side way or
some other way, but if we'd guarantee free compatibility we'd make it part
of the ABI.

CCing DJ and Carlos if they have thoughts about this.
The OpenMP spec essentially requires that allocations through its allocator
remember somewhere with which allocator (and its exact properties) each
allocation has been done, so that it can be taken into account during
reallocation or freeing.

	Jakub
  
Tobias Burnus Oct. 11, 2022, 2:38 p.m. UTC | #5
On 11.10.22 16:15, Jakub Jelinek wrote:

I think the most common case is:

integer, allocatable :: var(:)
!$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
...
! optionally: deallocate(var)
end ! of scope: block/subroutine/... - automatic deallocation



So you talk here about the declarative directive the patch does sorry on,
or about the executable one above allocate stmt?

Here, I was only talking about the most common usage case, with the
assumption that the user code does not cause any reallocation.

I later talked about accepting only code which cannot cause
reallocation (compile-time check of the code contained in the
scope).

Thus, a 'call foo(a)' would be fine, but not for ...


Anyway, even this simple case has the problem that one can have
subroutine foo (var)
  integer, allocatable:: var(:)

a 'foo' that has an 'allocatable' attribute for the dummy argument.
I think in the common case, it has not – such that most code can run w/o running into this issue.

However, for code like
  type t
    real, allocatable :: x(:), y(:), z(:)
  end type t
  type(t) :: var
  !$omp allocators(my_alloc)
  allocate(var%x(N), var%y(N), var%z(N))

  call bar(var%x)
  call foo(var)

it is more difficult: 'bar' works (if its dummy argument is not 'allocatable')
but for 'foo', the (re|de)allocation cannot be ruled out.
Thus, we always have to 'sorry' for such a code – and I fear it could be somewhat
common.



Well, it can use a weak symbol, if not linked against libgomp, the bit
that it is OpenMP shouldn't be set and so realloc/free will be used
and do
  if (arrdescr.gomp_alloced_bit)
    GOMP_free (arrdescr.data, 0);
  else
    free (arrdescr.data);
and similar.  And I think we can just document that we do this only for
-fopenmp compiled code.
But do we have a place to store that bit?

I presume in array descriptors
there could be some bit for it, but what to do about scalar allocatables,
or allocatable components etc.?

As mentioned, we could use the 'dtype.attribute' field which is currently not really used – and if, only 2 of the 16 bits are used. But you are right that for scalar allocatables, we do not use array descriptors (except with BIND(C)). Hmm.

For allocatable components, the same applied: If arrays, then there is an array descriptor – for scalars, there isn't. (And storing the length of a scalar character string with deferred length uses an aux variable + has lots of bugs.)

In theory we could use ugly stuff like if all the allocations would be
guaranteed to have at least 2 byte alignment use LSB bit of the pointer
to mark GOMP_alloc allocated memory for the scalar allocatables etc. but
then would need in -fopenmp compiled code to strip it away.

I think we could do tricks with scalar allocatable variable – but it will be more complicated with scalar allocatable components. Hmm.

As for pinned memory, if it is allocated through libgomp allocators, that
should just work if GOMP_free/GOMP_realloc is used, that is why we have
those extra data in front of the allocations where we store everything we
need.  But those also make the OpenMP allocations incompatible with
malloc/free allocations.


The problem of making pseudo-USM work is that it has to be applied to all (stack,heap) memory – which implies that all code using malloc/free needs to be either call the GOMP version or the GLIBC version, but shall not mix one or the other. – Thus, calling some library or any other file that was not compiled with -f... will have issues with malloc/free. Another issue is that variables not allocated via GOMP_* will not be accessible on the device in that case.

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Patch

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9661c77f905..cb389f40370 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2649,6 +2649,28 @@  gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		  }
 	      }
 	  break;
+	case OMP_LIST_ALLOCATOR:
+	  for (; n != NULL; n = n->next)
+	    if (n->sym->attr.referenced)
+	      {
+		tree t = gfc_trans_omp_variable (n->sym, false);
+		if (t != error_mark_node)
+		  {
+		    tree node = build_omp_clause (input_location,
+						  OMP_CLAUSE_ALLOCATOR);
+		    OMP_ALLOCATE_DECL (node) = t;
+		    if (n->expr)
+		      {
+			tree allocator_;
+			gfc_init_se (&se, NULL);
+			gfc_conv_expr (&se, n->expr);
+			allocator_ = gfc_evaluate_now (se.expr, block);
+			OMP_ALLOCATE_ALLOCATOR (node) = allocator_;
+		      }
+		    omp_clauses = gfc_trans_add_clause (node, omp_clauses);
+		  }
+	      }
+	  break;
 	case OMP_LIST_LINEAR:
 	  {
 	    gfc_expr *last_step_expr = NULL;
@@ -4888,6 +4910,26 @@  gfc_trans_omp_atomic (gfc_code *code)
   return gfc_finish_block (&block);
 }
 
+static tree
+gfc_trans_omp_allocate (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+
+  gfc_start_block (&block);
+  stmt = make_node (OMP_ALLOCATE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
+						       code->loc, false,
+						       true);
+  gfc_add_expr_to_block (&block, stmt);
+  gfc_merge_block_scope (&block);
+  return gfc_finish_block (&block);
+}
+
 static tree
 gfc_trans_omp_barrier (void)
 {
@@ -7280,6 +7322,8 @@  gfc_trans_omp_directive (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OMP_ALLOCATE:
+      return gfc_trans_omp_allocate (code);
     case EXEC_OMP_ATOMIC:
       return gfc_trans_omp_atomic (code);
     case EXEC_OMP_BARRIER:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
new file mode 100644
index 00000000000..2de2b52ee44
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -0,0 +1,72 @@ 
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+     parameter :: omp_thread_mem_alloc = 8
+end module
+
+
+subroutine foo(x, y, al)
+  use omp_lib_kinds
+  implicit none
+  
+type :: my_type
+  integer :: i
+  integer :: j
+  real :: x
+end type
+
+  integer  :: x
+  integer  :: y
+  integer (kind=omp_allocator_handle_kind) :: al
+
+  integer, allocatable :: var1
+  integer, allocatable :: var2
+  real, allocatable :: var3(:,:)
+  type (my_type), allocatable :: var4
+  integer, pointer :: pii, parr(:)
+
+  character, allocatable :: str1a, str1aarr(:) 
+  character(len=5), allocatable :: str5a, str5aarr(:)
+  
+  !$omp allocate
+  allocate(str1a, str1aarr(10), str5a, str5aarr(10))
+
+  !$omp allocate (var1) allocator(omp_default_mem_alloc)
+  !$omp allocate (var2) allocator(omp_large_cap_mem_alloc)
+  allocate (var1, var2)
+
+  !$omp allocate (var4)  allocator(omp_low_lat_mem_alloc)
+  allocate (var4)
+  var4%i = 5
+
+  !$omp allocate (var3)  allocator(omp_low_lat_mem_alloc)
+  allocate (var3(x,y))
+
+  !$omp allocate
+  allocate(pii, parr(5))
+end subroutine
+
+! { dg-final { scan-tree-dump-times "#pragma omp allocate" 6 "original" } }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 61ae4bd931b..5bd5501e346 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -519,6 +519,9 @@  enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: allocator.  */
+  OMP_CLAUSE_ALLOCATOR,
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 352662567b4..c3891a359f2 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -740,6 +740,20 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE_ALLOCATOR:
+      pp_string (pp, "(");
+      dump_generic_node (pp, OMP_ALLOCATE_DECL (clause),
+			 spc, flags, false);
+      if (OMP_ALLOCATE_ALLOCATOR (clause))
+	{
+	  pp_string (pp, ":allocator(");
+	  dump_generic_node (pp, OMP_ALLOCATE_ALLOCATOR (clause),
+			     spc, flags, false);
+	  pp_right_paren (pp);
+	}
+      pp_right_paren (pp);
+      break;
+
     case OMP_CLAUSE_ALLOCATE:
       pp_string (pp, "allocate(");
       if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (clause))
@@ -3484,6 +3498,11 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
       dump_omp_clauses (pp, OACC_CACHE_CLAUSES (node), spc, flags);
       break;
 
+    case OMP_ALLOCATE:
+      pp_string (pp, "#pragma omp allocate ");
+      dump_omp_clauses (pp, OMP_ALLOCATE_CLAUSES (node), spc, flags);
+      break;
+
     case OMP_PARALLEL:
       pp_string (pp, "#pragma omp parallel");
       dump_omp_clauses (pp, OMP_PARALLEL_CLAUSES (node), spc, flags);
diff --git a/gcc/tree.c b/gcc/tree.c
index d98b77db50b..75141756d87 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -363,6 +363,7 @@  unsigned const char omp_clause_num_ops[] =
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
   0, /* OMP_CLAUSE_NOHOST */
+  2, /* OMP_CLAUSE_ALLOCATOR */
 };
 
 const char * const omp_clause_code_name[] =
diff --git a/gcc/tree.def b/gcc/tree.def
index 33eb3b7beff..9768bc29dec 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1301,6 +1301,10 @@  DEFTREECODE (OMP_ATOMIC_READ, "omp_atomic_read", tcc_statement, 1)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_OLD, "omp_atomic_capture_old", tcc_statement, 2)
 DEFTREECODE (OMP_ATOMIC_CAPTURE_NEW, "omp_atomic_capture_new", tcc_statement, 2)
 
+/* OpenMP - #pragma omp allocate
+   Operand 0: Clauses.  */
+DEFTREECODE (OMP_ALLOCATE, "omp allocate", tcc_statement, 1)
+
 /* OpenMP clauses.  */
 DEFTREECODE (OMP_CLAUSE, "omp_clause", tcc_exceptional, 0)
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 318019c4dc5..2ec0b8c9240 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1405,6 +1405,8 @@  class auto_suppress_location_wrappers
 #define OACC_UPDATE_CLAUSES(NODE) \
   TREE_OPERAND (OACC_UPDATE_CHECK (NODE), 0)
 
+#define OMP_ALLOCATE_CLAUSES(NODE) TREE_OPERAND (OMP_ALLOCATE_CHECK (NODE), 0)
+
 #define OMP_PARALLEL_BODY(NODE)    TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 0)
 #define OMP_PARALLEL_CLAUSES(NODE) TREE_OPERAND (OMP_PARALLEL_CHECK (NODE), 1)
 
@@ -1801,6 +1803,15 @@  class auto_suppress_location_wrappers
 #define OMP_CLAUSE_ALLOCATE_ALIGN(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATE), 2)
 
+/* May be we can use OMP_CLAUSE_DECL but the I am not sure where to place
+   OMP_CLAUSE_ALLOCATOR in omp_clause_code.  */
+
+#define OMP_ALLOCATE_DECL(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 0)
+
+#define OMP_ALLOCATE_ALLOCATOR(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_ALLOCATOR), 1)
+
 /* True if an ALLOCATE clause was present on a combined or composite
    construct and the code for splitting the clauses has already performed
    checking if the listed variable has explicit privatization on the