OpenMP: Move omp requires checks to libgomp

Message ID 07fec82a-41cf-fdc5-6307-c068dd95ef1a@mentor.com
State New
Headers
Series OpenMP: Move omp requires checks to libgomp |

Commit Message

Tobias Burnus June 8, 2022, 3:56 a.m. UTC
  This is based on Chung-Lin's patch at https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563393.html

This is about code like:
   #pragma omp requires unified_shared_memory
   !$omp requires reverse_offload
which before was rejected with a sorry during parsing and is now
handled in libgomp (by excluding the devices from the available
device list).

Note: The requires-directive consistency check is nonfatal, i.e.
the program continues after the
  libgomp: requires-directive clause inconsistency between compilation units detected: 'unified_shared_memory' vs. 'reverse_offload'
message.

Changes compared to Chung-Lin's patch:
- I take the omp_* device API calls into account
- Rediffed because of changes done in the past year
- Included Thomas' fix for !ENABLE_OFFLOADING + intelmic, i.e. OG11 commit
   https://gcc.gnu.org/g:6da4ffd4a790f5f0abf290147217ca46a36f981e

On the libgomp side: The devices which do not fulfill the requirements are
now filtered out. That's in line how I understood the spec – and to make it
clearer, I spelled this out explicitly when adding (for other reasons) two
glossary items (passed 2nd vote but not yet in a released OpenMP spec):
- "accessible devices
    The host device and all non-host devices accessible for execution."
- "supported devices
    The host device and all non-host devices supported by the implementation
    for execution of target code for which the device-related requirements
    of the 'requires' directive are fulfilled."

Note:
* The FE only generates the requirement clauses when there is at least
   one declare target variable or function and offloading is used
   (target region, API call etc.)
   In particular, this implies that for !ENABLE_OFFLOADING, none is
   generated.
* libgomp only checks whether those values are consistent when
   env var OMP_TARGET_OFFLOAD != disable.

=> Thus, I protected the check for this (libgomp.c-c++-common/requires-1.c)
    by { dg-skip-if "" { ! offloading_enabled } }
    (and assume that OMP_TARGET_OFFLOAD is not set).

If env var OMP_TARGET_OFFLOAD != disable, it then runs for all configured
plugins and checks first whether devices are actually available and then
whether the requirement mask is fulfilled. Currently, none of the clauses
is supported (neither unified_shared_memory nor unified_shared_address nor
reverse_offload) even though there are patches submitted (and being worked on),
which add support for those.

I then unconditionally print a note like:
   libgomp: note: nvptx devices present but 'omp requires unified_shared_memory' cannot be fulfilled

This note is printed if env var OMP_TARGET_OFFLOAD != disable,
libgomp supports the device type, a device was found but omp requires
could not fulfilled.
This means that this message is also printed when compiled with
   -foffload=disable
or 'omp target if(0)' was used throught or ...

I think that's acceptable, but it could also be optimized further; however,
the initialization (e.g. GOMP_offload_register_ver) happens much later such
that the knowledge that a device is not needed (as with -foffload=disable)
is not available.

I hope the note is not too confusing, but otherwise:
* it could be postponed and then printed in context
   (requires device type <-> name association)
* it could only be printed with GOMP_DEBUG set
   but for the common case (why did it not run?), outputting it
   unconditionally surely helps to understand what went "wrong".

Thoughts? Comments? OK?

Tobias

PS: I have not fully tested the intelmic version.
PPS: I have not tried to implement the compile-time check to impose
consistent 'omp requires' – as proposed in the last review. I think I will
open a PR to track that proposal.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Jakub Jelinek June 9, 2022, 11:40 a.m. UTC | #1
On Wed, Jun 08, 2022 at 05:56:02AM +0200, Tobias Burnus wrote:
> gcc/c/ChangeLog:
> 
> 	* c-parser.cc (c_parser_declaration_or_fndef): Set
> 	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
> 	"omp declare target" attribute.
> 	(c_parser_omp_target_data): Set	OMP_REQUIRES_TARGET_USED in
> 	omp_requires_mask.
> 	(c_parser_omp_target_enter_data): Likewise.
> 	(c_parser_omp_target_exit_data): Likewise.
> 	(c_parser_omp_requires): Remove sorry.
> 
> gcc/cp/ChangeLog:
> 
> 	* parser.cc (cp_parser_simple_declaration): Set
> 	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
> 	"omp declare target" attribute.
> 	(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
> 	omp_requires_mask.
> 	(cp_parser_omp_target_enter_data): Likewise.
> 	(cp_parser_omp_target_exit_data): Likewise.
> 	(cp_parser_omp_requires): Remove sorry.
> 
> gcc/fortran/ChangeLog:
> 
> 	* openmp.cc (gfc_match_omp_requires): Remove "not implemented yet".
> 	* parse.cc: Include "tree.h" and "omp-general.h".
> 	(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.
> 
> gcc/ChangeLog:
> 
> 	* omp-general.h (omp_runtime_api_call): New prototype.
> 	* omp-general.cc (omp_runtime_api_call): Added device_api_only arg
> 	and moved from ...
> 	* omp-low.cc (omp_runtime_api_call): ... here.
> 	(scan_omp_1_stmt): Update call.
> 	* gimplify.cc (gimplify_call_expr): Call omp_runtime_api_call.
> 	* omp-offload.cc (omp_finish_file): Add code to create OpenMP requires
> 	mask variable in .gnu.gomp_requires section, if needed.
> 
> include/ChangeLog:
> 
> 	* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS,
> 	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
> 	GOMP_REQUIRES_REVERSE_OFFLOAD): New.
> 
> libgcc/ChangeLog:
> 
> 	* offloadstuff.c (__requires_mask_table, __requires_mask_table_end):
> 	New symbols to mark start and end of the .gnu.gomp_requires section.
> 
> 
> libgomp/ChangeLog:
> 
> 	* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
> 	omp_requires_mask arg.
> 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
> 	return -1 when device available but omp_requires_mask != 0.
> 	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
> 	* oacc-host.c (host_get_num_devices, host_openacc_get_property):
> 	Update call.
> 	* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
> 	goacc_attach_host_thread_to_device, acc_get_num_devices,
> 	acc_set_device_num, get_property_any): Likewise.
> 	* target.c: (__requires_mask_table, __requires_mask_table_end):
> 	Declare weak extern symbols.
> 	(gomp_requires_to_name): New.
> 	(gomp_target_init): Add code to check .gnu._gomp_requires section
> 	mask values for inconsistencies; warn when requirements makes an
> 	existing device unsupported.
> 	* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-1.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
> 	* testsuite/libgomp.c-c++-common/requires-2.c: New test.
> 
> liboffloadmic/ChangeLog:
> 
> 	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
> 	Return -1 when device available but omp_requires_mask != 0.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/gomp/requires-4.c: Update dg-*.
> 	* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
> 	* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move post-FE
> 	checks to ...
> 	* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.

> +      if (flag_openmp
> +         && lookup_attribute ("omp declare target",
> +                              DECL_ATTRIBUTES (current_function_decl)))
> +       omp_requires_mask
> +         = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);

I must admit it is unclear what the
"must appear lexically before any device constructs or device routines."
restriction actually means for device routines.
Is that lexically before definition of such device routines, or even their
declarations?

It wouldn't surprise me if some library packages started eventually adding
declare target directives in some headers around external declarations,
should that be the point after which we don't allow requires directives?

On the other side, for the definitions, we don't need to know when parsing
the definition whether it is a device routine.

void
foo (void)
{
}
#pragma omp declare target to (foo)

And yet another question: is
void bar (void);
#pragma omp declare target device_type (host) to (bar)
void
bar (void)
{
}
a device routine or not?

The above patch snippet is I believe for function definitions that were
arked as declare target before the definition somehow (another decl for
it merged with the new one or in between the begin/end).  And is true
even for device_type (host), to rule that out you'd need to check for
"omp declare target host" attribute not being present.
I'm not against the above snippet perhaps adjusted for device_type(host),
but IMHO we want clarifications from omp-lang.

> @@ -20915,6 +20921,10 @@ c_parser_omp_teams (location_t loc, c_parser *parser,
>  static tree
>  c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
>  {
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = c_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
>  				"#pragma omp target data");

target update is also a device construct and the above snippet hasn't been
added for it, ditto for interop which we don't implement yet.
But, my preference would be instead of adding these snippets to
c_parser_omp_target_{data,enter_data,exit_data,update} etc. move it from
c_parser_omp_target to c_parser_omp_all_clauses:
  if (flag_openmp
      && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE)) != 0)
    omp_requires_mask
      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
(somewhere at the start of the function), because the definition of device
constructs is exactly like that:
"device construct	An OpenMP construct that accepts the device clause."

> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index da2f370cdca..6e26d123370 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -15389,6 +15389,11 @@ cp_parser_simple_declaration (cp_parser* parser,
>  	  /* Otherwise, we're done with the list of declarators.  */
>  	  else
>  	    {
> +	      if (flag_openmp && lookup_attribute ("omp declare target",
> +						   DECL_ATTRIBUTES (decl)))
> +		omp_requires_mask
> +		  = (enum omp_requires) (omp_requires_mask
> +					 | OMP_REQUIRES_TARGET_USED);
>  	      pop_deferring_access_checks ();
>  	      cp_finalize_omp_declare_simd (parser, &odsd);
>  	      return;

Ditto.

> @@ -44287,6 +44292,10 @@ cp_parser_omp_teams (cp_parser *parser, cp_token *pragma_tok,
>  static tree
>  cp_parser_omp_target_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
>  {
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
>  				 "#pragma omp target data", pragma_tok);
> @@ -44390,6 +44399,10 @@ cp_parser_omp_target_enter_data (cp_parser *parser, cp_token *pragma_tok,
>        return true;
>      }
>  
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_ENTER_DATA_CLAUSE_MASK,
>  				 "#pragma omp target enter data", pragma_tok);
> @@ -44481,6 +44494,10 @@ cp_parser_omp_target_exit_data (cp_parser *parser, cp_token *pragma_tok,
>        return true;
>      }
>  
> +  if (flag_openmp)
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>    tree clauses
>      = cp_parser_omp_all_clauses (parser, OMP_TARGET_EXIT_DATA_CLAUSE_MASK,
>  				 "#pragma omp target exit data", pragma_tok);

Ditto.

For Fortran, is the above mostly not needed because requires need to be in
the specification part and device constructs are executable and appear in
the part after it?  Do we allow requires in BLOCK's specification part?

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -3644,6 +3644,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
>  	  return GS_OK;
>  	}
>      }
> +  if (fndecl && flag_openmp && omp_runtime_api_call (fndecl, true))
> +    omp_requires_mask
> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
>  
>    /* Remember the original function pointer type.  */
>    fnptrtype = TREE_TYPE (CALL_EXPR_FN (*expr_p));

I'm sure device APIs were discussed, but I must be blind and I can't find it
in either 5.0, 5.1 or 5.2.  All I see is device constructs or device routines
in those places where I'd also look for device related OpenMP runtime
library APIs.  Though, if some routine calls omp_get_num_devices (),
certainly the library at that point needs to know
reverse_offload/unified_shared_memory/etc. requires because that determines
how many devices it has.  So, what have I missed (aka on which place in the
standard the above snippet is based on)?
Perhaps I had in mind by "device routines" the OpenMP runtime APIs related
to devices, but unfortunately we have a different glossary for that term:
"device routine	A function (for C/C+ and Fortran) or subroutine (for Fortran)
		that can be executed on a target device, as part of a target region."

> +      /* Now likewise but for device API. */

Two spaces after .

> +      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
> +	 DECL_NAME is always omp_* without tailing underscore. Non device.  */

Likewise.

> +      /* And device APIs. */
> +      "get_device_num",
> +      "get_initial_device",
> +      "is_initial_device",  /* Even if it does not require init'ed devices. */
> +      NULL,
> +      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
> +	 as DECL_NAME only omp_* and omp_*_8 appear. For non device.  */

Ditto 3x.

> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -397,6 +397,27 @@ omp_finish_file (void)
>    unsigned num_funcs = vec_safe_length (offload_funcs);
>    unsigned num_vars = vec_safe_length (offload_vars);
>  
> +  if (flag_openmp && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0)
> +    {
> +      if (targetm_common.have_named_sections)
> +	{
> +	  const char *requires_section = ".gnu.gomp_requires";
> +	  tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +				     get_identifier (".gomp_requires_mask"),
> +				     unsigned_type_node);
> +	  SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));

Don't we want also DECL_USER_ALIGN (maskvar) = 1; so that
we never try to increase its alignment?

Is it an allocated section, or should it be better non-allocated and then
dealt with by mkoffload?

Shouldn't the vars in that section be const, so that it is a read-only
section?

Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
be enough, currently we just need 3 bits).

Also, wonder if for HAVE_GAS_SHF_MERGE && flag_merge_constants
we shouldn't try to make that section mergeable.  If it goes away during
linking and is replaced by something, then it doesn't matter, but otherwise,
as we don't record which TU had what flags, all we care about is that
there were some TUs which used device construct/routines (and device APIs?)
and used bitmask 7, other TUs that used bitmask 3 and others that used
bitmask 4.

> +	  TREE_STATIC (maskvar) = 1;
> +	  DECL_INITIAL (maskvar)
> +	    = build_int_cst (unsigned_type_node,
> +			     ((unsigned int) omp_requires_mask
> +			      & (OMP_REQUIRES_UNIFIED_ADDRESS
> +				 | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +				 | OMP_REQUIRES_REVERSE_OFFLOAD)));
> +	  set_decl_section_name (maskvar, requires_section);
> +	  varpool_node::finalize_decl (maskvar);
> +	}
> +    }
> +
>    if (num_funcs == 0 && num_vars == 0)
>      return;
>  
> @@ -442,6 +463,14 @@ omp_finish_file (void)
>      }
>    else
>      {
> +#ifndef ACCEL_COMPILER
> +      if (flag_openmp
> +	  && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
> +	  && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
> +				   | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +				   | OMP_REQUIRES_REVERSE_OFFLOAD)))
> +	sorry ("OpenMP device offloading is not supported for this target");
> +#endif

I don't understand this snippet.  Without named sections on the host,
I bet we simply don't support offloading at all,
the record_offload_symbol target hook is only non-trivially defined
for nvptx and nvptx isn't typical host for OpenMP offloading,
because we don't remember it anywhere.

> @@ -32,61 +29,4 @@ integer :: a, b, c
> -
> -
> -end
> \ No newline at end of file

Please avoid this in all files (unless it was there
previously and you are fixing it).

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -330,6 +330,12 @@ enum gomp_map_kind
>  #define GOMP_DEPEND_MUTEXINOUTSET	4
>  #define GOMP_DEPEND_INOUTSET		5
>  
> +/* Flag values for requires-directive features, must match corresponding
> +   OMP_REQUIRES_* values in gcc/omp-general.h.  */
> +#define GOMP_REQUIRES_UNIFIED_ADDRESS       0x10
> +#define GOMP_REQUIRES_UNIFIED_SHARED_MEMORY 0x20
> +#define GOMP_REQUIRES_REVERSE_OFFLOAD       0x80

They don't have to much those, we can translate them
(and translating them to 1/2/4 might be a good idea).

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -125,7 +125,7 @@ extern void GOMP_PLUGIN_fatal (const char *, ...)
>  extern const char *GOMP_OFFLOAD_get_name (void);
>  extern unsigned int GOMP_OFFLOAD_get_caps (void);
>  extern int GOMP_OFFLOAD_get_type (void);
> -extern int GOMP_OFFLOAD_get_num_devices (void);
> +extern int GOMP_OFFLOAD_get_num_devices (unsigned int);

I wonder if we shouldn't rename it when we change the arguments,
so that if one mixes an older plugin with newer libgomp or vice versa
this is easily caught.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -36,6 +36,7 @@
>  # include <inttypes.h>  /* For PRIu64.  */
>  #endif
>  #include <string.h>
> +#include <stdio.h>  /* For snprintf. */
>  #include <assert.h>
>  #include <errno.h>
>  
> @@ -98,6 +99,13 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* Start/end of .gnu.gomp.requires section of program, defined in

Isn't it .gnu.gomp_requires ?

> +   crtoffloadbegin/end.o.  */
> +__attribute__((weak))
> +extern const unsigned int __requires_mask_table[];
> +__attribute__((weak))
> +extern const unsigned int __requires_mask_table_end[];

I must say it is unclear to me how this works.
It will only find one such array, say in the executable,
or if the executable doesn't have it, in one of the shared libraries.

I think we want some solution that will work with OpenMP code
at least in the executable and some shared libraries it is linked against
(obviously another case is when a library with certain #pragma omp requires
is dlopened after we've finalized the number of devices, bet the options
in that case are either warn or fatal error).

The choices could be e.g. make __requires_mask_table{,_end} .hidden
and in the crtoffloadbegin.o (or end) unconditionally call some new libgomp
routine to register the table, but the disadvantage of that is that we could
have many of those register calls even when there is nothing to register
(sure, the .ctor in crtoffloadbegin.o (or end) could compare the 2 addresses
and not call anything if the table is empty but there would be still code
executed during initialization of the library).

Yet another possibility is linker plugin case.  We already use it for the
case where we actually have some offloading LTO bytecode, transform it into
a data section and register with GOMP_offload_register*.
So, if we could e.g. at the same time also process those requires arrays,
diagnose at link time if multiple TUs with that set disagree on the mask
value and in the target data provide that mask to the library, that would
be much nicer.
And the masks either could be gathered from .gnu.gomp_requires or it can be
somehow encoded in the offloading LTO or its associated data.
What is important though is that it will work even if we actually don't have
any "omp declare target" functions or variables in the TU or the whole
executable or library, just the requires mask.  But that can be dealt with
e.g. by forcing the LTO sections even for that case or so.

	Jakub
  
Tobias Burnus June 9, 2022, 12:46 p.m. UTC | #2
On 09.06.22 13:40, Jakub Jelinek via Gcc-patches wrote:
> On Wed, Jun 08, 2022 at 05:56:02AM +0200, Tobias Burnus wrote:
>> +         && lookup_attribute ("omp declare target",
>> +                              DECL_ATTRIBUTES (current_function_decl)))
>> +       omp_requires_mask
>> +         = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> I must admit it is unclear what the
> "must appear lexically before any device constructs or device routines."
> restriction actually means for device routines.
> Is that lexically before definition of such device routines, or even their
> declarations?
I have similar issues – also for Fortran (and C++) module use. Hence, I
had filled https://github.com/OpenMP/spec/issues/3240 (not publicly
accessible); I added your issues to the list.
> The above patch snippet is I believe for function definitions that were
> arked as declare target before the definition somehow (another decl for
> it merged with the new one or in between the begin/end).  And is true
> even for device_type (host), to rule that out you'd need to check for
> "omp declare target host" attribute not being present.
> I'm not against the above snippet perhaps adjusted for device_type(host),
> but IMHO we want clarifications from omp-lang
How to proceed for now? And does 'omp_is_initial_device()' on the host a
device function or not? It can be hard-coded to 'true' ...
> [...]
> target update is also a device construct and the above snippet hasn't been
> added for it, ditto for interop which we don't implement yet.
> But, my preference would be instead of adding these snippets to
> c_parser_omp_target_{data,enter_data,exit_data,update} etc. move it from
> c_parser_omp_target to c_parser_omp_all_clauses:
>    if (flag_openmp
>        && (mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEVICE)) != 0)
>      omp_requires_mask
>        = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> (somewhere at the start of the function), because the definition of device
> constructs is exactly like that:
> "device construct     An OpenMP construct that accepts the device clause."

Makes sense.

[C++ cases]

> Ditto.
> For Fortran, is the above mostly not needed because requires need to be in
> the specification part and device constructs are executable and appear in
> the part after it?  Do we allow requires in BLOCK's specification part?
We don't allow it in BLOCK – but there are issues related to USE-ing
modules, cf. OpenMP issue.
>> --- a/gcc/gimplify.cc
>> +++ b/gcc/gimplify.cc
>> @@ -3644,6 +3644,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
>> +  if (fndecl && flag_openmp && omp_runtime_api_call (fndecl, true))
>> +    omp_requires_mask
>> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> I'm sure device APIs were discussed, but I must be blind and I can't find it
> in either 5.0, 5.1 or 5.2.  All I see is device constructs or device routines
> in those places where I'd also look for device related OpenMP runtime
> library APIs.  Though, if some routine calls omp_get_num_devices (),
> certainly the library at that point needs to know
> reverse_offload/unified_shared_memory/etc. requires because that determines
> how many devices it has.  So, what have I missed (aka on which place in the
> standard the above snippet is based on)?

It is based on your review comments from last year ("Something I miss in
the patch is that for the device API calls") plus what requires some
device initialization. But otherwise, I also did not see it.

In terms of parsing, it makes no difference – contrary to
'unified_shared_memory', where the parser could decide not to add
implicit mapping, the compiler part is not affected by API calls.

I cannot really make up my mind whether it should be required in this
case or not. Maybe, it is not needed.

> + const char *requires_section = ".gnu.gomp_requires";
>> +      tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>> +                                 get_identifier (".gomp_requires_mask"),
>> +                                 unsigned_type_node);
>> +      SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));
> Don't we want also DECL_USER_ALIGN (maskvar) = 1; so that
> we never try to increase its alignment?
Probably yes.
> Is it an allocated section, or should it be better non-allocated and then
> dealt with by mkoffload?
>
> Shouldn't the vars in that section be const, so that it is a read-only
> section?
>
> Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
> be enough, currently we just need 3 bits).

Probably -that would be 8 bits, leaving 5 spare. I have not checked what
Andrew et al. do with the pinned-memory support by -f<some-flag>, but
that will likely use only 1 to 3 bits, if any.

> Also, wonder if for HAVE_GAS_SHF_MERGE && flag_merge_constants
> we shouldn't try to make that section mergeable.  If it goes away during
> linking and is replaced by something, then it doesn't matter, but otherwise,
> as we don't record which TU had what flags, all we care about is that
> there were some TUs which used device construct/routines (and device APIs?)
> and used bitmask 7, other TUs that used bitmask 3 and others that used
> bitmask 4.
(maybe – I am not sure about this, either.)
> @@ -442,6 +463,14 @@ omp_finish_file (void)
>       }
>     else
>       {
> +#ifndef ACCEL_COMPILER
> +      if (flag_openmp
> +       && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
> +       && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
> +                                | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +                                | OMP_REQUIRES_REVERSE_OFFLOAD)))
> +     sorry ("OpenMP device offloading is not supported for this target");
> +#endif
> I don't understand this snippet.  Without named sections on the host,
> I bet we simply don't support offloading at all,
> the record_offload_symbol target hook is only non-trivially defined
> for nvptx and nvptx isn't typical host for OpenMP offloading,
> because we don't remember it anywhere.

I thought that would address your: "This probably needs to sorry if the
target doesn't support named sections. We probably don't support LTO in
that case either though."

> I wonder if we shouldn't rename it when we change the arguments,
> so that if one mixes an older plugin with newer libgomp or vice versa
> this is easily caught.
Ok.
>> +/* Start/end of .gnu.gomp.requires section of program, defined in
> Isn't it .gnu.gomp_requires ?
Yes.
>> +   crtoffloadbegin/end.o.  */
>> +__attribute__((weak))
>> +extern const unsigned int __requires_mask_table[];
>> +__attribute__((weak))
>> +extern const unsigned int __requires_mask_table_end[];
> I must say it is unclear to me how this works.
> It will only find one such array, say in the executable,
> or if the executable doesn't have it, in one of the shared libraries.
>
> I think we want some solution that will work with OpenMP code
> at least in the executable and some shared libraries it is linked against
> (obviously another case is when a library with certain #pragma omp requires
> is dlopened after we've finalized the number of devices, bet the options
> in that case are either warn or fatal error).

There is the general problem that GCC does not work well with having
device routines in a shared library;  it works fine if the device part
is only in the library – but calling from an .exe program's target
region a declare-target function in a library won't work.

Thus, we may need to find a more general solution for this.

> The choices could be e.g. make __requires_mask_table{,_end} .hidden
> and in the crtoffloadbegin.o (or end) unconditionally call some new libgomp
> routine to register the table, but the disadvantage of that is that we could
> have many of those register calls even when there is nothing to register
> (sure, the .ctor in crtoffloadbegin.o (or end) could compare the 2 addresses
> and not call anything if the table is empty but there would be still code
> executed during initialization of the library).
>
> Yet another possibility is linker plugin case.  We already use it for the
> case where we actually have some offloading LTO bytecode, transform it into
> a data section and register with GOMP_offload_register*.

> So, if we could e.g. at the same time also process those requires arrays,
> diagnose at link time if multiple TUs with that set disagree on the mask
> value and in the target data provide that mask to the library, that would
> be much nicer.
> And the masks either could be gathered from .gnu.gomp_requires or it can be
> somehow encoded in the offloading LTO or its associated data.
> What is important though is that it will work even if we actually don't have
> any "omp declare target" functions or variables in the TU or the whole
> executable or library, just the requires mask.  But that can be dealt with
> e.g. by forcing the LTO sections even for that case or so.
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek June 9, 2022, 2:19 p.m. UTC | #3
On Thu, Jun 09, 2022 at 02:46:34PM +0200, Tobias Burnus wrote:
> On 09.06.22 13:40, Jakub Jelinek via Gcc-patches wrote:
> > On Wed, Jun 08, 2022 at 05:56:02AM +0200, Tobias Burnus wrote:
> > > +         && lookup_attribute ("omp declare target",
> > > +                              DECL_ATTRIBUTES (current_function_decl)))
> > > +       omp_requires_mask
> > > +         = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> > I must admit it is unclear what the
> > "must appear lexically before any device constructs or device routines."
> > restriction actually means for device routines.
> > Is that lexically before definition of such device routines, or even their
> > declarations?
> I have similar issues – also for Fortran (and C++) module use. Hence, I
> had filled https://github.com/OpenMP/spec/issues/3240 (not publicly
> accessible); I added your issues to the list.
> > The above patch snippet is I believe for function definitions that were
> > arked as declare target before the definition somehow (another decl for
> > it merged with the new one or in between the begin/end).  And is true
> > even for device_type (host), to rule that out you'd need to check for
> > "omp declare target host" attribute not being present.
> > I'm not against the above snippet perhaps adjusted for device_type(host),
> > but IMHO we want clarifications from omp-lang
> How to proceed for now? And does 'omp_is_initial_device()' on the host a
> device function or not? It can be hard-coded to 'true' ...

If it is from me, bet it was because of that (mis)understanding that
device routines are device related runtime API calls.
I'd suggest to only mark in the patch what is clear (which is device
constructs) and defer the rest until it is clarified.

> > For Fortran, is the above mostly not needed because requires need to be in
> > the specification part and device constructs are executable and appear in
> > the part after it?  Do we allow requires in BLOCK's specification part?
> We don't allow it in BLOCK – but there are issues related to USE-ing
> modules, cf. OpenMP issue.

Ack.

> In terms of parsing, it makes no difference – contrary to
> 'unified_shared_memory', where the parser could decide not to add
> implicit mapping, the compiler part is not affected by API calls.

Yeah.  So perhaps on the standard side we should just keep the
lexically before device constructs (and metadirective/declare variant
device related resolution) in the restriction, but say that TUs
that have device constructs and device runtime APIs (or whatever is agreed)
imply that requires mask must be the same in all of them.

> > Shouldn't the vars in that section be const, so that it is a read-only
> > section?
> > 
> > Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
> > be enough, currently we just need 3 bits).
> 
> Probably -that would be 8 bits, leaving 5 spare. I have not checked what
> Andrew et al. do with the pinned-memory support by -f<some-flag>, but
> that will likely use only 1 to 3 bits, if any.

If it is SHF_MERGE, even 16-bit or 32-bit wouldn't be the end of the world,
or if it is in LTO streamed out stuff, we can use a bitpack for it...

> > Also, wonder if for HAVE_GAS_SHF_MERGE && flag_merge_constants
> > we shouldn't try to make that section mergeable.  If it goes away during
> > linking and is replaced by something, then it doesn't matter, but otherwise,
> > as we don't record which TU had what flags, all we care about is that
> > there were some TUs which used device construct/routines (and device APIs?)
> > and used bitmask 7, other TUs that used bitmask 3 and others that used
> > bitmask 4.
> (maybe – I am not sure about this, either.)
> > @@ -442,6 +463,14 @@ omp_finish_file (void)
> >       }
> >     else
> >       {
> > +#ifndef ACCEL_COMPILER
> > +      if (flag_openmp
> > +       && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
> > +       && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
> > +                                | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> > +                                | OMP_REQUIRES_REVERSE_OFFLOAD)))
> > +     sorry ("OpenMP device offloading is not supported for this target");
> > +#endif
> > I don't understand this snippet.  Without named sections on the host,
> > I bet we simply don't support offloading at all,
> > the record_offload_symbol target hook is only non-trivially defined
> > for nvptx and nvptx isn't typical host for OpenMP offloading,
> > because we don't remember it anywhere.
> 
> I thought that would address your: "This probably needs to sorry if the
> target doesn't support named sections. We probably don't support LTO in
> that case either though."

But sorry means we will fail to compile it.  Perhaps
inform would be better, but then we don't complain (warn/inform)
if no offloading targets are configured.  And, presence of requires
unified*/reverse_offload  as the reason for the diagnostics rather than
say presence of declare target functions would be weird.

I think best would be a fatal error if people try to configure
offloading targets for a compiler that doesn't support named sections,
or perhaps that and presence of anything that should be offloaded.

	Jakub
  
Tobias Burnus June 29, 2022, 2:33 p.m. UTC | #4
Hi Jakub, hi all,

new version attached. It now checks during lto1 whether the values are
consistent – and fails with a hard error.

The actually used value (by libgomp) is stored as a scalar weak symbol –
while for checking, each translation unit stores the integer value for
lto (alongside the offload table). This is both used for checking and in
lto1 (device + host lto1), to restore the value of 'omp_requires_mask'
for further use. – Currently, it is only used on the host to make the
value available to libgomp. – However, a device lto1 could also use it.
(Usage: cf. Andrew's USM gcn patch.)

Unchanged from previous version, libgomp outputs an warning/note if a
device could be found but the requires prevented libgomp from using it.
This message is also shown with -foffload=disable but it is not shown
for OMP_TARGET_OFFLOAD=disable.

Other change is that API calls no longer count as relevant for 'omp
requires' – such that compilation units which only contain those will
not output anything (independent whether there is an 'omp requires' or not.)

On 09.06.22 16:19, Jakub Jelinek wrote:
> On Thu, Jun 09, 2022 at 02:46:34PM +0200, Tobias Burnus wrote:
>> On 09.06.22 13:40, Jakub Jelinek via Gcc-patches wrote:
> If it is from me, bet it was because of that (mis)understanding that
> device routines are device related runtime API calls.
> I'd suggest to only mark in the patch what is clear (which is device
> constructs) and defer the rest until it is clarified.
Done so.
>>> Shouldn't the vars in that section be const, so that it is a read-only
>>> section?
>>>
>>> Is unsigned_type_node what we want (say wouldn't be just unsigned_char_node
>>> be enough, currently we just need 3 bits).
>> Probably -that would be 8 bits, leaving 5 spare. I have not checked what
>> Andrew et al. do with the pinned-memory support by -f<some-flag>, but
>> that will likely use only 1 to 3 bits, if any.
> If it is SHF_MERGE, even 16-bit or 32-bit wouldn't be the end of the world,
> or if it is in LTO streamed out stuff, we can use a bitpack for it...

As the final binary will only contain a single variable, the size should
not matter much. I currently use 'unsigned' but it could surely be
shorter.  For the .o files, it also outputs the unsigned value for each
TU, but that's also small. I was thinking about adding more data (like
location data, be it location_t or __FILENAME__). However, it uses a
stripped-down stream writer - and to do so, location/string writing
requires a different object (and reading it, data_in). I did not regard
this as worthwhile and, thus, I only output the used requires clauses
and not where they were used.

> I think best would be a fatal error if people try to configure
> offloading targets for a compiler that doesn't support named sections,
> or perhaps that and presence of anything that should be offloaded.

I do not use any named section – but I could if it makes sense. In any
case, the question is whether the current weak symbol makes sense or
not. And whether there are problems in using weak symbols (in libgomp's
target.c + for non-ACCEL_COMPILER, but only when the symbol needs to be
written). I am also not sure about the best naming. – Thoughts?

Otherwise, tested with no offloading configured + with offloading to
nvptx (fully tested) and gcn (partially) [all x86_64-gnu-linux)

Tobias

PS: At some point, we need to think about handling calling from a
program's target region a declare-target device function which is inside
a shared library. I am sure, we currently do not handle it. – For that,
we then also have to think about how to handle the requires clauses.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek June 29, 2022, 5:02 p.m. UTC | #5
On Wed, Jun 29, 2022 at 04:33:02PM +0200, Tobias Burnus wrote:
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -2488,6 +2488,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
>  	  break;
>  	}
>  
> +      if (flag_openmp
> +	  && lookup_attribute ("omp declare target",
> +			       DECL_ATTRIBUTES (current_function_decl)))
> +	omp_requires_mask
> +	  = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
> +
>        if (DECL_DECLARED_INLINE_P (current_function_decl))
>          tv = TV_PARSE_INLINE;
>        else

I thought the above would be left out, at least until clarified what exactly
we mean with device routines in the restrictions.

> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -15389,6 +15389,11 @@ cp_parser_simple_declaration (cp_parser* parser,
>  	  /* Otherwise, we're done with the list of declarators.  */
>  	  else
>  	    {
> +	      if (flag_openmp && lookup_attribute ("omp declare target",
> +						   DECL_ATTRIBUTES (decl)))
> +		omp_requires_mask
> +		  = (enum omp_requires) (omp_requires_mask
> +					 | OMP_REQUIRES_TARGET_USED);
>  	      pop_deferring_access_checks ();
>  	      cp_finalize_omp_declare_simd (parser, &odsd);
>  	      return;

And the above too.

On the Fortran side I don't see it being done.
> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -1068,12 +1069,28 @@ read_string (class lto_input_block *ib)
>  void
>  output_offload_tables (void)
>  {
> -  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars))
> +  bool output_requires = (flag_openmp
> +			  && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0);
> +  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars)
> +      && !output_requires)
>      return;
>  
>    struct lto_simple_output_block *ob
>      = lto_create_simple_output_block (LTO_section_offload_table);
>  
> +  if (output_requires)
> +    {
> +      HOST_WIDE_INT val = ((HOST_WIDE_INT) omp_requires_mask
> +			   & (OMP_REQUIRES_UNIFIED_ADDRESS
> +			      | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +			      | OMP_REQUIRES_REVERSE_OFFLOAD
> +			      | OMP_REQUIRES_TARGET_USED));

Why is the OMP_REQUIRES_TARGET_USED bit saved there?  It is always set
if output_requires...
If we want to make sure it is set in omp_requires_mask after streaming
in, we can just or it in back there.

> @@ -1811,6 +1844,24 @@ input_offload_tables (bool do_force_output)
>  	      if (do_force_output)
>  		varpool_node::get (var_decl)->force_output = 1;
>  	    }
> +	  else if (tag == LTO_symtab_edge)
> +	    {
> +	      static bool error_emitted = false;
> +	      HOST_WIDE_INT val = streamer_read_hwi (ib);
> +
> +	      if (omp_requires_mask == 0)
> +		omp_requires_mask = (omp_requires) val;

I mean here: (omp_requires) (val | OMP_REQUIRES_TARGET_USED);

> +	      else if (omp_requires_mask != val && !error_emitted)
> +		{
> +		  char buf[64], buf2[64];
> +		  omp_requires_to_name (buf, sizeof (buf), omp_requires_mask);
> +		  omp_requires_to_name (buf2, sizeof (buf2), val);
> +		  error ("OpenMP %<requires%> directive with non-identical "
> +			 "clauses in multiple compilation units: %qs vs. %qs",
> +			 buf, buf2);

I think the user should be told also where, so file_data->file_name and
saved another filename from the if (omp_requires_mask == 0) body.
I admit I haven't investigated whether it would be enough to just record
the const char * file_data->file_name or whether what it points to might be
freed before we report it.

> +		  error_emitted = true;

> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -398,6 +399,26 @@ omp_finish_file (void)
>    unsigned num_funcs = vec_safe_length (offload_funcs);
>    unsigned num_vars = vec_safe_length (offload_vars);
>  
> +#ifndef ACCEL_COMPILER
> +  if (flag_openmp && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0)
> +    {
> +      tree var = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +				 get_identifier ("__offload_requires_mask"),
> +				 unsigned_type_node);
> +      TREE_PUBLIC (var) = 1;
> +      TREE_STATIC (var) = 1;
> +      TREE_READONLY (var) = 1;
> +      DECL_INITIAL (var)
> +	= build_int_cst (unsigned_type_node,
> +			 ((unsigned int) omp_requires_mask
> +			  & (OMP_REQUIRES_UNIFIED_ADDRESS
> +			     | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +			     | OMP_REQUIRES_REVERSE_OFFLOAD)));
> +      declare_weak (var);
> +      varpool_node::finalize_decl (var);
> +    }
> +#endif

I think this __offload_requires_mask can't work reliably, not only because
it is a single var per process while one can have target regions in
multiple shared libraries (I know we've discussed that it doesn't always
work reliably, but we shouldn't hardcode something that will prevent it from
working properly altogether), but also because one can e.g. use symbol
versioning or simple linker script and __offload_requires_mask won't be
visible to libgomp.

Can't we arrange for GOMP_offload_register_ver to see the bitmasks somewhere
instead (and force GOMP_offload_register_ver even if there are no vars or
funcs and just the requires mask)?

GOMP_offload_register_ver passes a version number, perhaps we could bump
GOMP_VERSION from 1 to 2 and store the bitmask somewhere in the target data
and on the libgomp side handle both GOMP_VERSION_LIB (version) 1 and 2,
the former by pretending there is 0 bitmask?

There can be various ways how to get the bitmask into
config/*/*mkoffload.cc so that it can add it there.

Could be the weak __offload_requires_mask (but we probably e.g. can't assume
declare_weak will work), which we'd make also hidden and the *mkoffload.cc
generated source would refer to its address (but have it declared hidden so
that if it isn't present in current TU, we get NULL).  Disadvantage is the
relocation.

Or we could ask for the offloading lto1 when it merges those from different
offloadng TUs to emit some magic symbol in what it emits and have mkoffload
search for it.

Or mkoffload could pass some option to the offloading lto compilation, say
filename of a temporary file, let lto1 save into that file the bitmask and
let mkoffload read it.  Or env var.

> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -125,7 +125,7 @@ extern void GOMP_PLUGIN_fatal (const char *, ...)
>  extern const char *GOMP_OFFLOAD_get_name (void);
>  extern unsigned int GOMP_OFFLOAD_get_caps (void);
>  extern int GOMP_OFFLOAD_get_type (void);
> -extern int GOMP_OFFLOAD_get_num_devices (void);
> +extern int GOMP_OFFLOAD_get_num_devices (unsigned int);

This is an ABI change for plugins, don't we want to e.g. rename
the symbol as well, so that plugin mismatches are caught more easily?

	Jakub
  
Tobias Burnus June 29, 2022, 6:10 p.m. UTC | #6
Hi Jakub,

On 29.06.22 19:02, Jakub Jelinek wrote:
> On Wed, Jun 29, 2022 at 04:33:02PM +0200, Tobias Burnus wrote:
>> +      if (flag_openmp
>> +      && lookup_attribute ("omp declare target",
>> +                           DECL_ATTRIBUTES (current_function_decl)))
>> +    omp_requires_mask
>> +      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);

> I thought the above would be left out, at least until clarified what exactly
> we mean with device routines in the restrictions.
Missed that – I thought mostly of the API calls. However, I concur that
for 'declare target', it is not really needed as no data transfers or
reverse offloads can happen. - Additionally, I took this part from
Chung-Lin's patch and did not really think about this part ...

>> --- a/gcc/cp/parser.cc
>> +++ b/gcc/cp/parser.cc
> And the above too.
>
> On the Fortran side I don't see it being done.
(Likewise.)
>> --- a/gcc/lto-cgraph.cc
>> +++ b/gcc/lto-cgraph.cc
>> ...
>> +  if (output_requires)
>> +    {
>> +      HOST_WIDE_INT val = ((HOST_WIDE_INT) omp_requires_mask
>> +                       & (OMP_REQUIRES_UNIFIED_ADDRESS
>> +                          | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
>> +                          | OMP_REQUIRES_REVERSE_OFFLOAD
>> +                          | OMP_REQUIRES_TARGET_USED));
> Why is the OMP_REQUIRES_TARGET_USED bit saved there?  It is always set
> if output_requires...
> If we want to make sure it is set in omp_requires_mask after streaming
> in, we can just or it in back there.

It is there because it is later needed: With -flto, we otherwise
wouldn't generate the variable in omp-offload.cc. And as this value is
only used internally, I thought it could just stay there. However, it
could also be excluded here and ...

>> @@ -1811,6 +1844,24 @@ input_offload_tables (bool do_force_output)
>>            if (do_force_output)
>>              varpool_node::get (var_decl)->force_output = 1;
>>          }
>> +      else if (tag == LTO_symtab_edge)
>> +        {
>> +          static bool error_emitted = false;
>> +          HOST_WIDE_INT val = streamer_read_hwi (ib);
>> +
>> +          if (omp_requires_mask == 0)
>> +            omp_requires_mask = (omp_requires) val;
... added here as:

> I mean here: (omp_requires) (val | OMP_REQUIRES_TARGET_USED);
... but that also requires ...
>> +          else if (omp_requires_mask != val && !error_emitted)

... something like:

(omp_requires_mask & ~OMP_REQUIRES_TARGET_USED) != val

or something like that.

>> +            {
>> +              char buf[64], buf2[64];
>> +              omp_requires_to_name (buf, sizeof (buf), omp_requires_mask);
>> +              omp_requires_to_name (buf2, sizeof (buf2), val);
>> +              error ("OpenMP %<requires%> directive with non-identical "
>> +                     "clauses in multiple compilation units: %qs vs. %qs",
>> +                     buf, buf2);
> I think the user should be told also where, so file_data->file_name

With -save-temps, the filename is indeed helpful, e.g.:
"a-requires-1.o". However, without, I get names like: "/tmp/ccIgRkOW.o".

As mentioned, I was thinking of storing after the value some location
data, e.g. DECL_SOURCE_FILE() or even DECL_SOURCE_LOCATION() – by
keeping track of the first 'omp requires' in a translation unit.

Those can be streamed with  streamer_write_string and
lto_output_location. However, the both require as argument an "struct
output_block" and in lto-cgraph.cc, I only have a "struct
lto_simple_output_block".

And for reading, I additionally need a "class data_in" argument.

Thus, I think it is doable – however, I was not quite sure whether it
made sense to do all the effort or not. (And it was not immediately
clear to me how to create a "data_in" class and ...)

> Can't we arrange for GOMP_offload_register_ver to see the bitmasks somewhere
> instead (and force GOMP_offload_register_ver even if there are no vars or
> funcs and just the requires mask)?

This probably works – but means some restructuring.
GOMP_offload_register_ver assumes that num_devices is already available
– and we need to exclude those for which the 'omp requires' cannot be
fulfilled.

I think this could be either be done in GOMP_offload_register_ver by
decrementing num_offload_images + shifting the offload_images[i] entries
(or have some table to match user-visible numbers to the original number) .

Or it could just be done by setting a flag – and num_offload_images
updated later. We probably need something which is run later to exclude
those devices for which no image exists (existing but not supported
devices) – and for OpenMP 6's OMP_AVAILABLE_DEVICES env, which permits
to sort the devices and filter out some of them.

> Could be the weak __offload_requires_mask (but we probably e.g. can't assume
> declare_weak will work),
I assume that it does work in practice, given that it is only needed on
the host – and given which systems effectively support offloading. –
With -flto, we even would know that there is only one variable, but
unfortunately, we cannot count on a host lto1 run.
> Or we could ask for the offloading lto1 when it merges those from different
> offloadng TUs to emit some magic symbol in what it emits and have mkoffload
> search for it.
>
> Or mkoffload could pass some option to the offloading lto compilation, say
> filename of a temporary file, let lto1 save into that file the bitmask and
> let mkoffload read it.  Or env var.
(Can surely be done – having a constant in the GOMP_offload_register_ver
call would be surely useful.)
>> --- a/libgomp/libgomp-plugin.h
>> +++ b/libgomp/libgomp-plugin.h
>> ...
>> -extern int GOMP_OFFLOAD_get_num_devices (void);
>> +extern int GOMP_OFFLOAD_get_num_devices (unsigned int);
> This is an ABI change for plugins, don't we want to e.g. rename
> the symbol as well, so that plugin mismatches are caught more easily?

Yes, I recall that we talked about it, but I obviously missed to
actually change it. If we go for GOMP_offload_register_ver + updating
the list later, this function can probably stay as is – and we need
another func to query whether the requirements are fulfillable or not.

Tobias


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  
Jakub Jelinek June 29, 2022, 8:18 p.m. UTC | #7
On Wed, Jun 29, 2022 at 08:10:10PM +0200, Tobias Burnus wrote:
> > > +  if (output_requires)
> > > +    {
> > > +      HOST_WIDE_INT val = ((HOST_WIDE_INT) omp_requires_mask
> > > +                       & (OMP_REQUIRES_UNIFIED_ADDRESS
> > > +                          | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> > > +                          | OMP_REQUIRES_REVERSE_OFFLOAD
> > > +                          | OMP_REQUIRES_TARGET_USED));
> > Why is the OMP_REQUIRES_TARGET_USED bit saved there?  It is always set
> > if output_requires...
> > If we want to make sure it is set in omp_requires_mask after streaming
> > in, we can just or it in back there.
> 
> It is there because it is later needed: With -flto, we otherwise
> wouldn't generate the variable in omp-offload.cc. And as this value is
> only used internally, I thought it could just stay there. However, it
> could also be excluded here and ...

Ok, let's keep it then.  Wanted to save that 1 bit somewhere, but as it
isn't a pack, it is insignificant anyway.
More could be saved by reordering the bitmasks such that the atomic stuff is
in upper bits.

> > > +              char buf[64], buf2[64];
> > > +              omp_requires_to_name (buf, sizeof (buf), omp_requires_mask);
> > > +              omp_requires_to_name (buf2, sizeof (buf2), val);
> > > +              error ("OpenMP %<requires%> directive with non-identical "
> > > +                     "clauses in multiple compilation units: %qs vs. %qs",
> > > +                     buf, buf2);
> > I think the user should be told also where, so file_data->file_name
> 
> With -save-temps, the filename is indeed helpful, e.g.:
> "a-requires-1.o". However, without, I get names like: "/tmp/ccIgRkOW.o".

Even when using
gcc -fopenmp -c foo.c -o foo.o
gcc -fopenmp -c bar.c -o bar.o
gcc -fopenmp -o foo foo.o bar.o
?
Anyway, I think it would be good to ask Richi or Honza what is the best to
get at the TU's filename.
Perhaps location_t from TRANSLATION_UNIT_DECL or something similar?

> > Can't we arrange for GOMP_offload_register_ver to see the bitmasks somewhere
> > instead (and force GOMP_offload_register_ver even if there are no vars or
> > funcs and just the requires mask)?
> 
> This probably works – but means some restructuring.
> GOMP_offload_register_ver assumes that num_devices is already available
> – and we need to exclude those for which the 'omp requires' cannot be
> fulfilled.

GOMP_offload_register_ver is called from initializers of programs and shared
libraries.  For the program and non-dlopened shared libraries, the usual
case is that it is called before we initialize devices, so we just record it
in offload_images and continue.  When the devices are initialized, we push
it to those devices.

Another case is registration after number of devices is initialized.

The former should just enqueue also those bitmasks, and when we initialize
devices we should complain on mismatches and/or filter out unsuitable
devices.

For the latter case, in theory (at least when we also catch calls to the
device routines with the meaning of device related omp_* APIs) that means
some TU already had the mask and so the later loaded masks should be the
same, so we should just complain on mismatches.

Now, the target data I'm afraid is device specific, but for GOMP_VERSION 2
we could e.g. have the pointer passed to the function point to target data
with the mask at offset -4 bytes before it, or can just have always the
bitmask followed by real target data.

	Jakub
  

Patch

OpenMP: Move omp requires checks to libgomp

Handle reverse_offload, unified_address, and unified_shared_memory
requirements in libgomp by putting them into the .gnu.gomp_requires section.

For all in-principle supported devices, if a requirement cannot be fulfilled,
the device is excluded from the (supported) devices list. Currently, none of
those requirements are marked as supported for any of the non-host devices.

Additionally, libgomp checks for consistency across the entire
.gnu.gomp_requires section, matching the requirements set by the OpenMP spec.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_declaration_or_fndef): Set
	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
	"omp declare target" attribute.
	(c_parser_omp_target_data): Set	OMP_REQUIRES_TARGET_USED in
	omp_requires_mask.
	(c_parser_omp_target_enter_data): Likewise.
	(c_parser_omp_target_exit_data): Likewise.
	(c_parser_omp_requires): Remove sorry.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_simple_declaration): Set
	OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
	"omp declare target" attribute.
	(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
	omp_requires_mask.
	(cp_parser_omp_target_enter_data): Likewise.
	(cp_parser_omp_target_exit_data): Likewise.
	(cp_parser_omp_requires): Remove sorry.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_match_omp_requires): Remove "not implemented yet".
	* parse.cc: Include "tree.h" and "omp-general.h".
	(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.

gcc/ChangeLog:

	* omp-general.h (omp_runtime_api_call): New prototype.
	* omp-general.cc (omp_runtime_api_call): Added device_api_only arg
	and moved from ...
	* omp-low.cc (omp_runtime_api_call): ... here.
	(scan_omp_1_stmt): Update call.
	* gimplify.cc (gimplify_call_expr): Call omp_runtime_api_call.
	* omp-offload.cc (omp_finish_file): Add code to create OpenMP requires
	mask variable in .gnu.gomp_requires section, if needed.

include/ChangeLog:

	* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS,
	GOMP_REQUIRES_UNIFIED_SHARED_MEMORY,
	GOMP_REQUIRES_REVERSE_OFFLOAD): New.

libgcc/ChangeLog:

	* offloadstuff.c (__requires_mask_table, __requires_mask_table_end):
	New symbols to mark start and end of the .gnu.gomp_requires section.


libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add
	omp_requires_mask arg.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise;
	return -1 when device available but omp_requires_mask != 0.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise.
	* oacc-host.c (host_get_num_devices, host_openacc_get_property):
	Update call.
	* oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1,
	goacc_attach_host_thread_to_device, acc_get_num_devices,
	acc_set_device_num, get_property_any): Likewise.
	* target.c: (__requires_mask_table, __requires_mask_table_end):
	Declare weak extern symbols.
	(gomp_requires_to_name): New.
	(gomp_target_init): Add code to check .gnu._gomp_requires section
	mask values for inconsistencies; warn when requirements makes an
	existing device unsupported.
	* testsuite/libgomp.c-c++-common/requires-1-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-1.c: New test.
	* testsuite/libgomp.c-c++-common/requires-2-aux.c: New test.
	* testsuite/libgomp.c-c++-common/requires-2.c: New test.

liboffloadmic/ChangeLog:

	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices):
	Return -1 when device available but omp_requires_mask != 0.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/requires-4.c: Update dg-*.
	* c-c++-common/gomp/target-device-ancestor-2.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-3.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
	* c-c++-common/gomp/target-device-ancestor-5.c: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move post-FE
	checks to ...
	* gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file.

Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

 gcc/c/c-parser.cc                                  |  21 +++-
 gcc/cp/parser.cc                                   |  20 ++-
 gcc/fortran/openmp.cc                              |   4 -
 gcc/fortran/parse.cc                               |  21 ++++
 gcc/gimplify.cc                                    |   3 +
 gcc/omp-general.cc                                 | 137 +++++++++++++++++++++
 gcc/omp-general.h                                  |   1 +
 gcc/omp-low.cc                                     | 135 +-------------------
 gcc/omp-offload.cc                                 |  29 +++++
 gcc/testsuite/c-c++-common/gomp/requires-4.c       |   2 -
 .../c-c++-common/gomp/target-device-ancestor-2.c   |  10 +-
 .../c-c++-common/gomp/target-device-ancestor-3.c   |   2 +-
 .../c-c++-common/gomp/target-device-ancestor-4.c   |   4 +-
 .../c-c++-common/gomp/target-device-ancestor-5.c   |   2 +-
 .../gfortran.dg/gomp/target-device-ancestor-2.f90  |  70 +----------
 .../gfortran.dg/gomp/target-device-ancestor-2a.f90 |  80 ++++++++++++
 .../gfortran.dg/gomp/target-device-ancestor-3.f90  |   6 +-
 .../gfortran.dg/gomp/target-device-ancestor-4.f90  |   6 +-
 include/gomp-constants.h                           |   6 +
 libgcc/offloadstuff.c                              |   9 ++
 libgomp/libgomp-plugin.h                           |   2 +-
 libgomp/oacc-host.c                                |   4 +-
 libgomp/oacc-init.c                                |  16 +--
 libgomp/plugin/plugin-gcn.c                        |   6 +-
 libgomp/plugin/plugin-nvptx.c                      |   9 +-
 libgomp/target.c                                   |  66 +++++++++-
 .../libgomp.c-c++-common/requires-1-aux.c          |  11 ++
 .../testsuite/libgomp.c-c++-common/requires-1.c    |  21 ++++
 .../libgomp.c-c++-common/requires-2-aux.c          |  11 ++
 .../testsuite/libgomp.c-c++-common/requires-2.c    |  20 +++
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp   |   6 +-
 31 files changed, 499 insertions(+), 241 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 1704a52be12..4748ce04737 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2488,6 +2488,12 @@  c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	  break;
 	}
 
+      if (flag_openmp
+	  && lookup_attribute ("omp declare target",
+			       DECL_ATTRIBUTES (current_function_decl)))
+	omp_requires_mask
+	  = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
       if (DECL_DECLARED_INLINE_P (current_function_decl))
         tv = TV_PARSE_INLINE;
       else
@@ -20915,6 +20921,10 @@  c_parser_omp_teams (location_t loc, c_parser *parser,
 static tree
 c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
 {
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = c_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
 				"#pragma omp target data");
@@ -21057,6 +21067,10 @@  c_parser_omp_target_enter_data (location_t loc, c_parser *parser,
       return true;
     }
 
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = c_parser_omp_all_clauses (parser, OMP_TARGET_ENTER_DATA_CLAUSE_MASK,
 				"#pragma omp target enter data");
@@ -21143,6 +21157,10 @@  c_parser_omp_target_exit_data (location_t loc, c_parser *parser,
       return true;
     }
 
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = c_parser_omp_all_clauses (parser, OMP_TARGET_EXIT_DATA_CLAUSE_MASK,
 				"#pragma omp target exit data");
@@ -22763,9 +22781,6 @@  c_parser_omp_requires (c_parser *parser)
 	      c_parser_skip_to_pragma_eol (parser, false);
 	      return;
 	    }
-	  if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
-	    sorry_at (cloc, "%qs clause on %<requires%> directive not "
-			    "supported yet", p);
 	  if (p)
 	    c_parser_consume_token (parser);
 	  if (this_req)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index da2f370cdca..6e26d123370 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15389,6 +15389,11 @@  cp_parser_simple_declaration (cp_parser* parser,
 	  /* Otherwise, we're done with the list of declarators.  */
 	  else
 	    {
+	      if (flag_openmp && lookup_attribute ("omp declare target",
+						   DECL_ATTRIBUTES (decl)))
+		omp_requires_mask
+		  = (enum omp_requires) (omp_requires_mask
+					 | OMP_REQUIRES_TARGET_USED);
 	      pop_deferring_access_checks ();
 	      cp_finalize_omp_declare_simd (parser, &odsd);
 	      return;
@@ -44287,6 +44292,10 @@  cp_parser_omp_teams (cp_parser *parser, cp_token *pragma_tok,
 static tree
 cp_parser_omp_target_data (cp_parser *parser, cp_token *pragma_tok, bool *if_p)
 {
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = cp_parser_omp_all_clauses (parser, OMP_TARGET_DATA_CLAUSE_MASK,
 				 "#pragma omp target data", pragma_tok);
@@ -44390,6 +44399,10 @@  cp_parser_omp_target_enter_data (cp_parser *parser, cp_token *pragma_tok,
       return true;
     }
 
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = cp_parser_omp_all_clauses (parser, OMP_TARGET_ENTER_DATA_CLAUSE_MASK,
 				 "#pragma omp target enter data", pragma_tok);
@@ -44481,6 +44494,10 @@  cp_parser_omp_target_exit_data (cp_parser *parser, cp_token *pragma_tok,
       return true;
     }
 
+  if (flag_openmp)
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   tree clauses
     = cp_parser_omp_all_clauses (parser, OMP_TARGET_EXIT_DATA_CLAUSE_MASK,
 				 "#pragma omp target exit data", pragma_tok);
@@ -46861,9 +46878,6 @@  cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
 	      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 	      return false;
 	    }
-	  if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
-	    sorry_at (cloc, "%qs clause on %<requires%> directive not "
-			    "supported yet", p);
 	  if (p)
 	    cp_lexer_consume_token (parser->lexer);
 	  if (this_req)
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index d12cec43d64..7790ef34664 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5481,10 +5481,6 @@  gfc_match_omp_requires (void)
       else
 	goto error;
 
-      if (requires_clause & ~(OMP_REQ_ATOMIC_MEM_ORDER_MASK
-			      | OMP_REQ_DYNAMIC_ALLOCATORS))
-	gfc_error_now ("Sorry, %qs clause at %L on REQUIRES directive is not "
-		       "yet supported", clause, &old_loc);
       if (!gfc_omp_requires_add_clause (requires_clause, clause, &old_loc, NULL))
 	goto error;
       requires_clauses |= requires_clause;
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 7356d1b5a3a..b142e169a5c 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -6908,6 +6908,27 @@  done:
       break;
     }
 
+  if (omp_requires & OMP_REQ_TARGET_MASK)
+    {
+      omp_requires_mask = (enum omp_requires) (omp_requires_mask
+					       | OMP_REQUIRES_TARGET_USED);
+      if (omp_requires & OMP_REQ_REVERSE_OFFLOAD)
+	omp_requires_mask
+	  = (enum omp_requires) (omp_requires_mask
+				 | OMP_REQUIRES_REVERSE_OFFLOAD);
+      if (omp_requires & OMP_REQ_UNIFIED_ADDRESS)
+	omp_requires_mask
+	  = (enum omp_requires) (omp_requires_mask
+				 | OMP_REQUIRES_UNIFIED_ADDRESS);
+      if (omp_requires & OMP_REQ_UNIFIED_SHARED_MEMORY)
+	omp_requires_mask
+	  = (enum omp_requires) (omp_requires_mask
+				 | OMP_REQUIRES_UNIFIED_SHARED_MEMORY);
+    }
+
+  if (omp_requires & OMP_REQ_DYNAMIC_ALLOCATORS)
+    omp_requires_mask
+	= (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_DYNAMIC_ALLOCATORS);
   /* Do the parse tree dump.  */
   gfc_current_ns = flag_dump_fortran_original ? gfc_global_ns_list : NULL;
 
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index cd1796643d7..3fe4571d677 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3644,6 +3644,9 @@  gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
 	  return GS_OK;
 	}
     }
+  if (fndecl && flag_openmp && omp_runtime_api_call (fndecl, true))
+    omp_requires_mask
+      = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
 
   /* Remember the original function pointer type.  */
   fnptrtype = TREE_TYPE (CALL_EXPR_FN (*expr_p));
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index a406c578f33..120bcaa10b2 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -89,6 +89,143 @@  omp_privatize_by_reference (tree decl)
   return lang_hooks.decls.omp_privatize_by_reference (decl);
 }
 
+/* Return true if FNDECL is an omp_* runtime API call; with device_api_only set,
+   returns true only for device API calls.  */
+
+bool
+omp_runtime_api_call (const_tree fndecl, bool device_api_only)
+{
+  tree declname = DECL_NAME (fndecl);
+  if (!declname
+      || (DECL_CONTEXT (fndecl) != NULL_TREE
+	  && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
+      || !TREE_PUBLIC (fndecl))
+    return false;
+
+  const char *name = IDENTIFIER_POINTER (declname);
+  if (!startswith (name, "omp_"))
+    return false;
+
+  static const char *omp_runtime_apis[] =
+    {
+      /* This array has 6 sections.  First omp_* calls that don't
+	 have any suffixes and are non-device APIs.  */
+      "aligned_alloc",
+      "aligned_calloc",
+      "alloc",
+      "calloc",
+      "free",
+      "realloc",
+      NULL,
+      /* Now likewise but for device API. */
+      "get_mapped_ptr",
+      "target_alloc",
+      "target_associate_ptr",
+      "target_disassociate_ptr",
+      "target_free",
+      "target_is_accessible",
+      "target_is_present",
+      "target_memcpy",
+      "target_memcpy_async",
+      "target_memcpy_rect",
+      "target_memcpy_rect_async",
+      NULL,
+      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
+	 DECL_NAME is always omp_* without tailing underscore. Non device.  */
+      "capture_affinity",
+      "destroy_allocator",
+      "destroy_lock",
+      "destroy_nest_lock",
+      "display_affinity",
+      "fulfill_event",
+      "get_active_level",
+      "get_affinity_format",
+      "get_cancellation",
+      "get_default_allocator",
+      "get_default_device",
+      "get_dynamic",
+      "get_level",
+      "get_max_active_levels",
+      "get_max_task_priority",
+      "get_max_teams",
+      "get_max_threads",
+      "get_nested",
+      "get_num_devices",
+      "get_num_places",
+      "get_num_procs",
+      "get_num_teams",
+      "get_num_threads",
+      "get_partition_num_places",
+      "get_place_num",
+      "get_proc_bind",
+      "get_supported_active_levels",
+      "get_team_num",
+      "get_teams_thread_limit",
+      "get_thread_limit",
+      "get_thread_num",
+      "get_wtick",
+      "get_wtime",
+      "in_final",
+      "in_parallel",
+      "init_lock",
+      "init_nest_lock",
+      "pause_resource",
+      "pause_resource_all",
+      "set_affinity_format",
+      "set_default_allocator",
+      "set_lock",
+      "set_nest_lock",
+      "test_lock",
+      "test_nest_lock",
+      "unset_lock",
+      "unset_nest_lock",
+      NULL,
+      /* And device APIs. */
+      "get_device_num",
+      "get_initial_device",
+      "is_initial_device",  /* Even if it does not require init'ed devices. */
+      NULL,
+      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
+	 as DECL_NAME only omp_* and omp_*_8 appear. For non device.  */
+      "display_env",
+      "get_ancestor_thread_num",
+      "init_allocator",
+      "get_partition_place_nums",
+      "get_place_num_procs",
+      "get_place_proc_ids",
+      "get_schedule",
+      "get_team_size",
+      "set_default_device",
+      "set_dynamic",
+      "set_max_active_levels",
+      "set_nested",
+      "set_num_teams",
+      "set_num_threads",
+      "set_schedule",
+      "set_teams_thread_limit",
+      NULL,
+      /* And for device APIs. (Currently none.)  */
+    };
+
+  int mode = 0;
+  for (unsigned i = 0; i < ARRAY_SIZE (omp_runtime_apis); i++)
+    {
+      if (omp_runtime_apis[i] == NULL)
+	{
+	  mode++;
+	  continue;
+	}
+      if (device_api_only && mode % 2 != 0)
+	continue;
+      size_t len = strlen (omp_runtime_apis[i]);
+      if (strncmp (name + 4, omp_runtime_apis[i], len) == 0
+	  && (name[4 + len] == '\0'
+	      || (mode > 1 && strcmp (name + 4 + len, "_8") == 0)))
+	return true;
+    }
+  return false;
+}
+
 /* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or GT_EXPR,
    given that V is the loop index variable and STEP is loop step. */
 
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 7a94831e8f5..f1be9f23ef7 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -95,6 +95,7 @@  extern tree omp_find_clause (tree clauses, enum omp_clause_code kind);
 extern bool omp_is_allocatable_or_ptr (tree decl);
 extern tree omp_check_optional_argument (tree decl, bool for_present_check);
 extern bool omp_privatize_by_reference (tree decl);
+extern bool omp_runtime_api_call (const_tree fndecl, bool device_api_only);
 extern void omp_adjust_for_condition (location_t loc, enum tree_code *cond_code,
 				      tree *n2, tree v, tree step);
 extern tree omp_get_for_step_from_incr (location_t loc, tree incr);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index f976e3a1549..243fa27a62f 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -3989,134 +3989,6 @@  setjmp_or_longjmp_p (const_tree fndecl)
   return !strcmp (name, "setjmp") || !strcmp (name, "longjmp");
 }
 
-/* Return true if FNDECL is an omp_* runtime API call.  */
-
-static bool
-omp_runtime_api_call (const_tree fndecl)
-{
-  tree declname = DECL_NAME (fndecl);
-  if (!declname
-      || (DECL_CONTEXT (fndecl) != NULL_TREE
-          && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
-      || !TREE_PUBLIC (fndecl))
-    return false;
-
-  const char *name = IDENTIFIER_POINTER (declname);
-  if (!startswith (name, "omp_"))
-    return false;
-
-  static const char *omp_runtime_apis[] =
-    {
-      /* This array has 3 sections.  First omp_* calls that don't
-	 have any suffixes.  */
-      "aligned_alloc",
-      "aligned_calloc",
-      "alloc",
-      "calloc",
-      "free",
-      "get_mapped_ptr",
-      "realloc",
-      "target_alloc",
-      "target_associate_ptr",
-      "target_disassociate_ptr",
-      "target_free",
-      "target_is_accessible",
-      "target_is_present",
-      "target_memcpy",
-      "target_memcpy_async",
-      "target_memcpy_rect",
-      "target_memcpy_rect_async",
-      NULL,
-      /* Now omp_* calls that are available as omp_* and omp_*_; however, the
-	 DECL_NAME is always omp_* without tailing underscore.  */
-      "capture_affinity",
-      "destroy_allocator",
-      "destroy_lock",
-      "destroy_nest_lock",
-      "display_affinity",
-      "fulfill_event",
-      "get_active_level",
-      "get_affinity_format",
-      "get_cancellation",
-      "get_default_allocator",
-      "get_default_device",
-      "get_device_num",
-      "get_dynamic",
-      "get_initial_device",
-      "get_level",
-      "get_max_active_levels",
-      "get_max_task_priority",
-      "get_max_teams",
-      "get_max_threads",
-      "get_nested",
-      "get_num_devices",
-      "get_num_places",
-      "get_num_procs",
-      "get_num_teams",
-      "get_num_threads",
-      "get_partition_num_places",
-      "get_place_num",
-      "get_proc_bind",
-      "get_supported_active_levels",
-      "get_team_num",
-      "get_teams_thread_limit",
-      "get_thread_limit",
-      "get_thread_num",
-      "get_wtick",
-      "get_wtime",
-      "in_final",
-      "in_parallel",
-      "init_lock",
-      "init_nest_lock",
-      "is_initial_device",
-      "pause_resource",
-      "pause_resource_all",
-      "set_affinity_format",
-      "set_default_allocator",
-      "set_lock",
-      "set_nest_lock",
-      "test_lock",
-      "test_nest_lock",
-      "unset_lock",
-      "unset_nest_lock",
-      NULL,
-      /* And finally calls available as omp_*, omp_*_ and omp_*_8_; however,
-	 as DECL_NAME only omp_* and omp_*_8 appear.  */
-      "display_env",
-      "get_ancestor_thread_num",
-      "init_allocator",
-      "get_partition_place_nums",
-      "get_place_num_procs",
-      "get_place_proc_ids",
-      "get_schedule",
-      "get_team_size",
-      "set_default_device",
-      "set_dynamic",
-      "set_max_active_levels",
-      "set_nested",
-      "set_num_teams",
-      "set_num_threads",
-      "set_schedule",
-      "set_teams_thread_limit"
-    };
-
-  int mode = 0;
-  for (unsigned i = 0; i < ARRAY_SIZE (omp_runtime_apis); i++)
-    {
-      if (omp_runtime_apis[i] == NULL)
-	{
-	  mode++;
-	  continue;
-	}
-      size_t len = strlen (omp_runtime_apis[i]);
-      if (strncmp (name + 4, omp_runtime_apis[i], len) == 0
-	  && (name[4 + len] == '\0'
-	      || (mode > 1 && strcmp (name + 4 + len, "_8") == 0)))
-	return true;
-    }
-  return false;
-}
-
 /* Helper function for scan_omp.
 
    Callback for walk_gimple_stmt used to scan for OMP directives in
@@ -4171,7 +4043,8 @@  scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	      omp_context *octx = ctx;
 	      if (gimple_code (ctx->stmt) == GIMPLE_OMP_SCAN && ctx->outer)
 		octx = ctx->outer;
-	      if (octx->order_concurrent && omp_runtime_api_call (fndecl))
+	      if (octx->order_concurrent
+		  && omp_runtime_api_call (fndecl, false))
 		{
 		  remove = true;
 		  error_at (gimple_location (stmt),
@@ -4179,7 +4052,7 @@  scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 			    "%<order(concurrent)%> clause", fndecl);
 		}
 	      if (gimple_code (ctx->stmt) == GIMPLE_OMP_TEAMS
-		  && omp_runtime_api_call (fndecl)
+		  && omp_runtime_api_call (fndecl, false)
 		  && ((IDENTIFIER_LENGTH (DECL_NAME (fndecl))
 		       != strlen ("omp_get_num_teams"))
 		      || strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
@@ -4197,7 +4070,7 @@  scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	      if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
 		  && (gimple_omp_target_kind (ctx->stmt)
 		      == GF_OMP_TARGET_KIND_REGION)
-		  && omp_runtime_api_call (fndecl))
+		  && omp_runtime_api_call (fndecl, false))
 		{
 		  tree tgt_clauses = gimple_omp_target_clauses (ctx->stmt);
 		  tree c = omp_find_clause (tgt_clauses, OMP_CLAUSE_DEVICE);
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index ad4e772015e..998abab0f11 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -397,6 +397,27 @@  omp_finish_file (void)
   unsigned num_funcs = vec_safe_length (offload_funcs);
   unsigned num_vars = vec_safe_length (offload_vars);
 
+  if (flag_openmp && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0)
+    {
+      if (targetm_common.have_named_sections)
+	{
+	  const char *requires_section = ".gnu.gomp_requires";
+	  tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+				     get_identifier (".gomp_requires_mask"),
+				     unsigned_type_node);
+	  SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));
+	  TREE_STATIC (maskvar) = 1;
+	  DECL_INITIAL (maskvar)
+	    = build_int_cst (unsigned_type_node,
+			     ((unsigned int) omp_requires_mask
+			      & (OMP_REQUIRES_UNIFIED_ADDRESS
+				 | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+				 | OMP_REQUIRES_REVERSE_OFFLOAD)));
+	  set_decl_section_name (maskvar, requires_section);
+	  varpool_node::finalize_decl (maskvar);
+	}
+    }
+
   if (num_funcs == 0 && num_vars == 0)
     return;
 
@@ -442,6 +463,14 @@  omp_finish_file (void)
     }
   else
     {
+#ifndef ACCEL_COMPILER
+      if (flag_openmp
+	  && (omp_requires_mask & OMP_REQUIRES_TARGET_USED)
+	  && (omp_requires_mask & (OMP_REQUIRES_UNIFIED_ADDRESS
+				   | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
+				   | OMP_REQUIRES_REVERSE_OFFLOAD)))
+	sorry ("OpenMP device offloading is not supported for this target");
+#endif
       for (unsigned i = 0; i < num_funcs; i++)
 	{
 	  tree it = (*offload_funcs)[i];
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-4.c b/gcc/testsuite/c-c++-common/gomp/requires-4.c
index 88ba7746cf8..8f45d83ea6e 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-4.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-4.c
@@ -9,5 +9,3 @@  foo (void)
 #pragma omp requires unified_shared_memory	/* { dg-error "'unified_shared_memory' clause used lexically after first target construct or offloading API" } */
 #pragma omp requires unified_address	/* { dg-error "'unified_address' clause used lexically after first target construct or offloading API" } */
 #pragma omp requires reverse_offload	/* { dg-error "'reverse_offload' clause used lexically after first target construct or offloading API" } */
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-2.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-2.c
index cf05c505004..b16e701bd5a 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-2.c
@@ -1,13 +1,11 @@ 
 /* { dg-do compile } */
 
-#pragma omp requires reverse_offload /* { dg-message "sorry, unimplemented: 'reverse_offload' clause on 'requires' directive not supported yet" } */
+#pragma omp requires reverse_offload
 
 void
 foo (int n)
 {
-  /* The following test is marked with 'xfail' because a previous 'sorry' from
-     'reverse_offload' suppresses the 'sorry' for 'ancestor'.  */
-  #pragma omp target device (ancestor: 1) /* { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } } */
+  #pragma omp target device (ancestor: 1)
   ;
 
 
@@ -19,9 +17,9 @@  foo (int n)
   #pragma omp target device (ancestor : 42) /* { dg-error "the 'device' clause expression must evaluate to '1'" } */
   ;
 
-  #pragma omp target device (ancestor : n) /* { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } } */
+  #pragma omp target device (ancestor : n)
   ;
-  #pragma omp target device (ancestor : n + 1) /* { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } } */
+  #pragma omp target device (ancestor : n + 1)
   ;
 
 
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-3.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-3.c
index ea6e5a0cf6c..d16590107d2 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-3.c
@@ -11,7 +11,7 @@  int bar (void);
 
 /* { dg-do compile } */
 
-#pragma omp requires reverse_offload /* { dg-message "sorry, unimplemented: 'reverse_offload' clause on 'requires' directive not supported yet" } */
+#pragma omp requires reverse_offload
 
 void
 foo (void)
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
index b4b5620bbc0..241234f8daf 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
@@ -4,12 +4,12 @@ 
   /* Test to ensure that device-modifier 'ancestor' is parsed correctly in
      device clauses. */
 
-#pragma omp requires reverse_offload /* { dg-message "sorry, unimplemented: 'reverse_offload' clause on 'requires' directive not supported yet" } */
+#pragma omp requires reverse_offload
 
 void
 foo (void)
 {
-  #pragma omp target device (ancestor: 1) /* { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } } */
+  #pragma omp target device (ancestor: 1) /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */
   ;
 
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-5.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-5.c
index b6ff84bcdab..b1520ff0636 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-5.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-5.c
@@ -1,4 +1,4 @@ 
-#pragma omp requires reverse_offload  /* { dg-message "sorry, unimplemented: 'reverse_offload' clause on 'requires' directive not supported yet" } */
+#pragma omp requires reverse_offload
 
 void
 foo ()
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2.f90
index 117a1d000a5..230c690d84c 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2.f90
@@ -4,19 +4,16 @@  implicit none
 
 integer :: a, b, c
 
-!$omp requires reverse_offload  ! { dg-error "Sorry, 'reverse_offload' clause at \\(1\\) on REQUIRES directive is not yet supported" }
+!$omp requires reverse_offload
 
 
-! The following test case is marked with 'xfail' because a previous 'sorry' from
-! 'reverse_offload' suppresses the 'sorry' for 'ancestor'.
-
-!$omp target device (ancestor: 1)  ! { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } }
+!$omp target device (ancestor: 1)
 !$omp end target
 
-!$omp target device (ancestor : a)  ! { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } }
+!$omp target device (ancestor : a)
 !$omp end target
 
-!$omp target device (ancestor : a + 1)  ! { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } }
+!$omp target device (ancestor : a + 1)
 !$omp end target
 
 
@@ -32,61 +29,4 @@  integer :: a, b, c
 !$omp target device (42)
 !$omp end target
 
-
-! Ensure that no OpenMP constructs appear inside target regions with 'ancestor'.
-! The following test case is marked with 'xfail' because a previous 'sorry' from
-! 'reverse_offload' suppresses the 'sorry' for 'ancestor'.
-
-!$omp target device (ancestor: 1)
-  !$omp teams  ! { dg-error "" "OpenMP constructs are not allowed in target region with 'ancestor'" { xfail *-*-* } }
-  !$omp end teams
-!$omp end target
-
-!$omp target device (device_num: 1)
-  !$omp teams
-  !$omp end teams
-!$omp end target
-
-!$omp target device (1)
-  !$omp teams
-  !$omp end teams
-!$omp end target
-
-
-! Ensure that with 'ancestor' only the 'device', 'firstprivate', 'private',
-! 'defaultmap', and 'map' clauses appear on the construct.
-! The following test case is marked with 'xfail' because a previous 'sorry' from
-! 'reverse_offload' suppresses the 'sorry' for 'ancestor'.
-
-!$omp target nowait device (ancestor: 1)  ! { dg-error "" "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" { xfail *-*-* } }
-!$omp end target
-
-!$omp target device (ancestor: 1) nowait  ! { dg-error "" "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" { xfail *-*-* } }
-!$omp end target
-
-!$omp target nowait device (device_num: 1)
-!$omp end target
-
-!$omp target nowait device (1)
-!$omp end target
-
-!$omp target device (ancestor: 1) firstprivate (a) private (b) defaultmap (none) map (c)
-!$omp end target
-
-
-! Ensure that 'ancestor' is only used with 'target' constructs (not with
-! 'target data', 'target update' etc.).
-! The following test case is marked with 'xfail' because a previous 'sorry' from
-! 'reverse_offload' suppresses the 'sorry' for 'ancestor'.
-
-!$omp target data map (a) device (ancestor: 1)  ! { dg-error "" "'device' clause with 'ancestor' is only allowed on 'target' construct" { xfail *-*-* } }
-!$omp end target data
-
-!$omp target enter data map (to: a) device (ancestor: 1)  ! { dg-error "" "'device' clause with 'ancestor' is only allowed on 'target' construct" { xfail *-*-* } }
-!$omp target exit data map (from: a) device (ancestor: 1)  ! { dg-error "" "'device' clause with 'ancestor' is only allowed on 'target' construct" { xfail *-*-* } }
-
-!$omp target update to (a) device (ancestor: 1)  ! { dg-error "'device' clause with 'ancestor' is only allowed on 'target' construct" "" { xfail *-*-* } }
-! { dg-error "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" "" { xfail *-*-* } .-1 }
-
-
-end
\ No newline at end of file
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2a.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2a.f90
new file mode 100644
index 00000000000..feb76fe2144
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-2a.f90
@@ -0,0 +1,80 @@ 
+! { dg-do compile }
+
+implicit none
+
+integer :: a, b, c
+
+!$omp requires reverse_offload
+
+!$omp target device (ancestor: 1)
+!$omp end target
+
+!$omp target device (ancestor : a)
+!$omp end target
+
+!$omp target device (ancestor : a + 1)
+!$omp end target
+
+
+!$omp target device (device_num:42)
+!$omp end target
+
+!$omp target device (42)
+!$omp end target
+
+
+! Ensure that no OpenMP constructs appear inside target regions with 'ancestor'.
+
+!$omp target device (ancestor: 1)
+  !$omp teams  ! { dg-error "OpenMP constructs are not allowed in target region with 'ancestor'" }
+  !$omp end teams
+!$omp end target
+
+!$omp target device (device_num: 1)
+  !$omp teams
+  !$omp end teams
+!$omp end target
+
+!$omp target device (1)
+  !$omp teams
+  !$omp end teams
+!$omp end target
+
+
+! Ensure that with 'ancestor' only the 'device', 'firstprivate', 'private',
+! 'defaultmap', and 'map' clauses appear on the construct.
+
+!$omp target nowait device (ancestor: 1)  ! { dg-error "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" }
+!$omp end target
+
+!$omp target device (ancestor: 1) nowait  ! { dg-error "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" }
+!$omp end target
+
+!$omp target nowait device (device_num: 1)
+!$omp end target
+
+!$omp target nowait device (1)
+!$omp end target
+
+!$omp target device (ancestor: 1) firstprivate (a) private (b) defaultmap (none) map (c)
+!$omp end target
+
+
+! Ensure that 'ancestor' is only used with 'target' constructs (not with
+! 'target data', 'target update' etc.).
+! The following test case is marked with 'xfail' because a previous 'sorry' from
+! 'reverse_offload' suppresses the 'sorry' for 'ancestor'.
+
+!$omp target data map (a) device (ancestor: 1)  ! { dg-error "'device' clause with 'ancestor' is only allowed on 'target' construct" }
+!$omp end target data
+
+!$omp target enter data map (to: a) device (ancestor: 1)  ! { dg-error "'device' clause with 'ancestor' is only allowed on 'target' construct" }
+!$omp target exit data map (from: a) device (ancestor: 1)  ! { dg-error "'device' clause with 'ancestor' is only allowed on 'target' construct" }
+
+!$omp target update to (a) device (ancestor: 1)  ! { dg-error "'device' clause with 'ancestor' is only allowed on 'target' construct" }
+
+!$omp target device (ancestor: 1) if(.false.)
+! { dg-error "with 'ancestor', only the 'device', 'firstprivate', 'private', 'defaultmap', and 'map' clauses may appear on the construct" "" { target *-*-* } .-1 }
+!$omp end target
+
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-3.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-3.f90
index f1145bde2ec..e8975e6a08b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-3.f90
@@ -16,10 +16,10 @@  subroutine f1 ()
   implicit none
   integer :: n
 
-  !$omp requires reverse_offload  ! { dg-error "Sorry, 'reverse_offload' clause at \\(1\\) on REQUIRES directive is not yet supported" }
+  !$omp requires reverse_offload
 
   !$omp target device (ancestor : 1)
-    n = omp_get_thread_num ()  ! { dg-error "" "OpenMP runtime API call 'omp_get_thread_num' in a region with 'device\\(ancestor\\)' clause" { xfail *-*-* } }
+    n = omp_get_thread_num ()  ! { dg-error "OpenMP runtime API call 'omp_get_thread_num' in a region with 'device\\(ancestor\\)' clause" }
   !$omp end target
 
   !$omp target device (device_num : 1)
@@ -30,4 +30,4 @@  subroutine f1 ()
     n = omp_get_thread_num ()
   !$omp end target
 
-end
\ No newline at end of file
+end
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
index 63872fa51fb..ab56e2d1d52 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
@@ -4,11 +4,11 @@ 
 ! Test to ensure that device-modifier 'ancestor' is parsed correctly in
 ! device clauses.
 
-!$omp requires reverse_offload  ! { dg-error "Sorry, 'reverse_offload' clause at \\(1\\) on REQUIRES directive is not yet supported" }
+!$omp requires reverse_offload
 
-!$omp target device (ancestor : 1)  ! { dg-message "" "sorry, unimplemented: 'ancestor' not yet supported" { xfail *-*-* } }
+!$omp target device (ancestor : 1)  ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" }
 !$omp end target
 
 end
 
-! TODO: dg-final { scan-tree-dump-times "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" 1 "original" } }
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 701d33dae49..ebf6978b697 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -330,6 +330,12 @@  enum gomp_map_kind
 #define GOMP_DEPEND_MUTEXINOUTSET	4
 #define GOMP_DEPEND_INOUTSET		5
 
+/* Flag values for requires-directive features, must match corresponding
+   OMP_REQUIRES_* values in gcc/omp-general.h.  */
+#define GOMP_REQUIRES_UNIFIED_ADDRESS       0x10
+#define GOMP_REQUIRES_UNIFIED_SHARED_MEMORY 0x20
+#define GOMP_REQUIRES_REVERSE_OFFLOAD       0x80
+
 /* HSA specific data structures.  */
 
 /* Identifiers of device-specific target arguments.  */
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 10e1fe19c8e..b2282924fb4 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -54,6 +54,9 @@  const void *const __offload_var_table[0]
   __attribute__ ((__used__, visibility ("hidden"),
 		  section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
 
+const unsigned int const __requires_mask_table[0]
+  __attribute__ ((__used__, section (".gnu.gomp_requires"))) = { };
+
 #elif defined CRT_END
 
 const void *const __offload_funcs_end[0]
@@ -63,6 +66,9 @@  const void *const __offload_vars_end[0]
   __attribute__ ((__used__, visibility ("hidden"),
 		  section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
 
+const unsigned int const __requires_mask_table_end[0]
+  __attribute__ ((__used__, section (".gnu.gomp_requires"))) = { };
+
 #elif defined CRT_TABLE
 
 extern const void *const __offload_func_table[];
@@ -77,6 +83,9 @@  const void *const __OFFLOAD_TABLE__[]
   &__offload_var_table, &__offload_vars_end
 };
 
+extern const unsigned int const __requires_mask_table[];
+extern const unsigned int const __requires_mask_table_end[];
+
 #else /* ! CRT_BEGIN && ! CRT_END && ! CRT_TABLE  */
 #error "One of CRT_BEGIN, CRT_END or CRT_TABLE must be defined."
 #endif
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 07ab700b80c..ab3ed638475 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -125,7 +125,7 @@  extern void GOMP_PLUGIN_fatal (const char *, ...)
 extern const char *GOMP_OFFLOAD_get_name (void);
 extern unsigned int GOMP_OFFLOAD_get_caps (void);
 extern int GOMP_OFFLOAD_get_type (void);
-extern int GOMP_OFFLOAD_get_num_devices (void);
+extern int GOMP_OFFLOAD_get_num_devices (unsigned int);
 extern bool GOMP_OFFLOAD_init_device (int);
 extern bool GOMP_OFFLOAD_fini_device (int);
 extern unsigned GOMP_OFFLOAD_version (void);
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5bb889926d3..eb11b9cf16a 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -54,7 +54,7 @@  host_get_type (void)
 }
 
 static int
-host_get_num_devices (void)
+host_get_num_devices (unsigned int omp_requires_mask __attribute__((unused)))
 {
   return 1;
 }
@@ -229,7 +229,7 @@  host_openacc_get_property (int n, enum goacc_property prop)
 {
   union goacc_property_value nullval = { .val = 0 };
 
-  if (n >= host_get_num_devices ())
+  if (n >= host_get_num_devices (0))
     return nullval;
 
   switch (prop)
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 1565aa0f290..42c3e74e6ba 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -148,7 +148,7 @@  resolve_device (acc_device_t d, bool fail_is_error)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
 				  get_openacc_name (dispatchers[d]->name))
-		  && dispatchers[d]->get_num_devices_func () > 0)
+		  && dispatchers[d]->get_num_devices_func (0) > 0)
 		goto found;
 
 	    if (fail_is_error)
@@ -169,7 +169,7 @@  resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (known_device_type_p (++d))
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d] && dispatchers[d]->get_num_devices_func (0) > 0)
 	  goto found;
       if (d_arg == acc_device_default)
 	{
@@ -302,7 +302,7 @@  acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
 
   base_dev = resolve_device (d, true);
 
-  ndevs = base_dev->get_num_devices_func ();
+  ndevs = base_dev->get_num_devices_func (0);
 
   if (ndevs <= 0 || goacc_device_num >= ndevs)
     acc_dev_num_out_of_range (d, goacc_device_num, ndevs);
@@ -351,7 +351,7 @@  acc_shutdown_1 (acc_device_t d)
   /* Get the base device for this device type.  */
   base_dev = resolve_device (d, true);
 
-  ndevs = base_dev->get_num_devices_func ();
+  ndevs = base_dev->get_num_devices_func (0);
 
   /* Unload all the devices of this type that have been opened.  */
   for (i = 0; i < ndevs; i++)
@@ -520,7 +520,7 @@  goacc_attach_host_thread_to_device (int ord)
       base_dev = cached_base_dev;
     }
   
-  num_devices = base_dev->get_num_devices_func ();
+  num_devices = base_dev->get_num_devices_func (0);
   if (num_devices <= 0 || ord >= num_devices)
     acc_dev_num_out_of_range (acc_device_type (base_dev->type), ord,
 			      num_devices);
@@ -599,7 +599,7 @@  acc_get_num_devices (acc_device_t d)
   if (!acc_dev)
     return 0;
 
-  n = acc_dev->get_num_devices_func ();
+  n = acc_dev->get_num_devices_func (0);
   if (n < 0)
     n = 0;
 
@@ -779,7 +779,7 @@  acc_set_device_num (int ord, acc_device_t d)
 
       cached_base_dev = base_dev = resolve_device (d, true);
 
-      num_devices = base_dev->get_num_devices_func ();
+      num_devices = base_dev->get_num_devices_func (0);
 
       if (num_devices <= 0 || ord >= num_devices)
         acc_dev_num_out_of_range (d, ord, num_devices);
@@ -814,7 +814,7 @@  get_property_any (int ord, acc_device_t d, acc_device_property_t prop)
 
   struct gomp_device_descr *dev = resolve_device (d, true);
 
-  int num_devices = dev->get_num_devices_func ();
+  int num_devices = dev->get_num_devices_func (0);
 
   if (num_devices <= 0 || ord >= num_devices)
     acc_dev_num_out_of_range (d, ord, num_devices);
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 1c0436842da..ea327bf2ca0 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3221,10 +3221,14 @@  GOMP_OFFLOAD_version (void)
 /* Return the number of GCN devices on the system.  */
 
 int
-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
   if (!init_hsa_context ())
     return 0;
+  /* Return -1 if no omp_requires_mask cannot be fulfilled but
+     devices were present.  */
+  if (hsa_context.agent_count > 0 && omp_requires_mask != 0)
+    return -1;
   return hsa_context.agent_count;
 }
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 387bcbbc52a..bc63e274cdf 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1175,9 +1175,14 @@  GOMP_OFFLOAD_get_type (void)
 }
 
 int
-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
-  return nvptx_get_num_devices ();
+  int num_devices = nvptx_get_num_devices ();
+  /* Return -1 if no omp_requires_mask cannot be fulfilled but
+     devices were present.  */
+  if (num_devices > 0 && omp_requires_mask != 0)
+    return -1;
+  return num_devices;
 }
 
 bool
diff --git a/libgomp/target.c b/libgomp/target.c
index 4740f8a45d3..0fd3f7f47ad 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -36,6 +36,7 @@ 
 # include <inttypes.h>  /* For PRIu64.  */
 #endif
 #include <string.h>
+#include <stdio.h>  /* For snprintf. */
 #include <assert.h>
 #include <errno.h>
 
@@ -98,6 +99,13 @@  static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* Start/end of .gnu.gomp.requires section of program, defined in
+   crtoffloadbegin/end.o.  */
+__attribute__((weak))
+extern const unsigned int __requires_mask_table[];
+__attribute__((weak))
+extern const unsigned int __requires_mask_table_end[];
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -4085,6 +4093,20 @@  gomp_target_fini (void)
     }
 }
 
+static void
+gomp_requires_to_name (char *buf, size_t size, unsigned int requires_mask)
+{
+  char *end = buf + size, *p = buf;
+  if (requires_mask & GOMP_REQUIRES_UNIFIED_ADDRESS)
+    p += snprintf (p, end - p, "unified_address");
+  if (requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+    p += snprintf (p, end - p, "%sunified_shared_memory",
+		   (p == buf ? "" : ", "));
+  if (requires_mask & GOMP_REQUIRES_REVERSE_OFFLOAD)
+    p += snprintf (p, end - p, "%sreverse_offload",
+		   (p == buf ? "" : ", "));
+}
+
 /* This function initializes the runtime for offloading.
    It parses the list of offload plugins, and tries to load these.
    On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -4106,6 +4128,35 @@  gomp_target_init (void)
   if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_DISABLED)
     return;
 
+  /* Mask of requires directive clause values, summarized from
+     .gnu.gomp.requires section. Offload plugins are queried with this mask to see
+     if all required features are supported.  */
+  unsigned int requires_mask = 0;
+  const unsigned int *mask_ptr = __requires_mask_table;
+  bool error_emitted = false;
+  while (mask_ptr != __requires_mask_table_end)
+    {
+      if (requires_mask == 0)
+	requires_mask = *mask_ptr;
+      else if (requires_mask != *mask_ptr)
+	{
+	  if (!error_emitted)
+	    {
+	      char buf[64], buf2[64];
+	      gomp_requires_to_name (buf, sizeof (buf), requires_mask);
+	      gomp_requires_to_name (buf2, sizeof (buf2), *mask_ptr);
+	      gomp_error ("requires-directive clause inconsistency between "
+			  "compilation units detected: '%s' vs. '%s'",
+			  buf, buf2);
+	      error_emitted = true;
+	    }
+	  /* This is inconsistent, but still merge to query for all features
+	     later.  */
+	  requires_mask |= *mask_ptr;
+	}
+      mask_ptr++;
+    }
+
   cur = OFFLOAD_PLUGINS;
   if (*cur)
     do
@@ -4132,8 +4183,19 @@  gomp_target_init (void)
 
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
-	    new_num_devs = current_device.get_num_devices_func ();
-	    if (new_num_devs >= 1)
+	    new_num_devs = current_device.get_num_devices_func (requires_mask);
+	    if (new_num_devs < 0)
+	      {
+		char buf[64];
+		gomp_requires_to_name (buf, sizeof (buf), requires_mask);
+		char *name = (char *) malloc (cur_len + 1);
+		memcpy (name, cur, cur_len);
+		name[cur_len] = '\0';
+		GOMP_PLUGIN_error ("note: %s devices present but 'omp requires "
+				   "%s' cannot be fulfilled", name, buf);
+		free (name);
+	      }
+	    else if (new_num_devs >= 1)
 	      {
 		/* Augment DEVICES and NUM_DEVICES.  */
 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1-aux.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1-aux.c
new file mode 100644
index 00000000000..8b9341523c6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1-aux.c
@@ -0,0 +1,11 @@ 
+/* { dg-skip-if "" { *-*-* } } */
+
+#pragma omp requires reverse_offload
+
+int x;
+
+void foo (void)
+{
+  #pragma omp target
+  x = 1;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-1.c b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
new file mode 100644
index 00000000000..990b4e9817d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-1.c
@@ -0,0 +1,21 @@ 
+/* { dg-skip-if "" { ! offloading_enabled } } */
+/* { dg-additional-sources requires-1-aux.c } */
+
+#pragma omp requires unified_shared_memory
+
+int a[10];
+extern void foo (void);
+
+int
+main (void)
+{
+  #pragma omp target
+  for (int i = 0; i < 10; i++)
+    a[i] = 0;
+
+  foo ();
+  return 0;
+}
+
+/* { dg-output "libgomp: requires-directive clause inconsistency between compilation units detected" } */
+/* { dg-prune-output "nvptx device present but 'omp requires unified_shared_memory, reverse_offload, reverse_offload' cannot be fulfilled" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-2-aux.c b/libgomp/testsuite/libgomp.c-c++-common/requires-2-aux.c
new file mode 100644
index 00000000000..4077648347d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-2-aux.c
@@ -0,0 +1,11 @@ 
+/* { dg-skip-if "" { *-*-* } } */
+
+#pragma omp requires unified_shared_memory
+
+int x;
+
+void foo (void)
+{
+  #pragma omp target
+  x = 1;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-2.c b/libgomp/testsuite/libgomp.c-c++-common/requires-2.c
new file mode 100644
index 00000000000..bc55ab001e9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-2.c
@@ -0,0 +1,20 @@ 
+/* { dg-additional-sources requires-2-aux.c } */
+/* { dg-require-effective-target offload_device } */
+
+#pragma omp requires unified_shared_memory
+
+int a[10];
+extern void foo (void);
+
+int
+main (void)
+{
+  #pragma omp target
+  for (int i = 0; i < 10; i++)
+    a[i] = 0;
+
+  foo ();
+  return 0;
+}
+
+/* { dg-output "devices present but 'omp requires unified_shared_memory' cannot be fulfilled" } */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index d1678d0514e..33bae0650b4 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -168,8 +168,12 @@  GOMP_OFFLOAD_get_type (void)
 }
 
 extern "C" int
-GOMP_OFFLOAD_get_num_devices (void)
+GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
 {
+  /* Return -1 if no omp_requires_mask cannot be fulfilled but
+     devices were present.  */
+  if (num_devices > 0 && omp_requires_mask != 0)
+    return -1;
   TRACE ("(): return %d", num_devices);
   return num_devices;
 }