OpenMP: Prepare omp-* for ancestor:1 handling

Message ID c1eea8d7-ae50-ebdc-70be-4e7a9d6c5ca7@codesourcery.com
State New
Headers
Series OpenMP: Prepare omp-* for ancestor:1 handling |

Commit Message

Tobias Burnus June 29, 2022, 7:54 p.m. UTC
  Currently, this is a rather useless patch - even though it helps to reduce
the number of local patches I have. Due to the printed sorry, adding a
testcase with -fdump-tree-* is also not possible, yet.

For reverse offload, the plan is to call
   GOMP_target_ext
inside the on the device, passing
   'device(omp_initial_device)'
alias device(GOMP_DEVICE_HOST_FALLBACK) to the
target device's libgomp.

The pointer to the generated target-region function is then
passed as argument. However, that only works if that function
is not nullified ...

The reason that nullifying was added is:
   https://gcc.gnu.org/PR100573
   https://gcc.gnu.org/r12-1066-g95d67762171f83277a5700b270c0d1e2756f83f4
   https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571285.html


Note: Instead of just checking for GOMP_DEVICE_HOST_FALLBACK,
more effort could be done, e.g. by setting some attribute on the
generated function and then check for check for it. Example:
'omp target device_ancestor' + using lookup_attribute).

That's what's done in the second variant.

OK for mainline (which variant)? Or do you prefer to wait for
a more complete patch?

Tobias

PS: Reverse offload - still to do:
- 'requires' patch
- Generate two variants of the target-region function:
   an empty version on the device (just to have a pointer address
   in the offload_func table) and the full version (on the host only)
Those together are sufficient for a omp_get_num_device() == 0 version
(implied by 'required reverse_offload' not being fulfilled by any device).
For a more useful implementation, more work inside libgomp is required.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Jakub Jelinek June 30, 2022, 8:21 a.m. UTC | #1
On Wed, Jun 29, 2022 at 09:54:56PM +0200, Tobias Burnus wrote:
> Currently, this is a rather useless patch - even though it helps to reduce
> the number of local patches I have. Due to the printed sorry, adding a
> testcase with -fdump-tree-* is also not possible, yet.
> 
> For reverse offload, the plan is to call
>   GOMP_target_ext
> inside the on the device, passing
>   'device(omp_initial_device)'
> alias device(GOMP_DEVICE_HOST_FALLBACK) to the
> target device's libgomp.
> 
> The pointer to the generated target-region function is then
> passed as argument. However, that only works if that function
> is not nullified ...
> 
> The reason that nullifying was added is:
>   https://gcc.gnu.org/PR100573
>   https://gcc.gnu.org/r12-1066-g95d67762171f83277a5700b270c0d1e2756f83f4
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571285.html
> 
> 
> Note: Instead of just checking for GOMP_DEVICE_HOST_FALLBACK,
> more effort could be done, e.g. by setting some attribute on the
> generated function and then check for check for it. Example:
> 'omp target device_ancestor' + using lookup_attribute).
> 
> That's what's done in the second variant.
> 
> OK for mainline (which variant)? Or do you prefer to wait for
> a more complete patch?

So, what is the plan with reverse offload?
Which devices can do it?  I presume we won't bother with intelmic,
can gcn do it and how, can nvptx do it and how?

What we could do is implement it initially (with all the restriction
checking etc. needed) for host fallback only, say no devices support
reverse offload and take out the sorry.

But it would be good to at least have some idea how it will be implemented
for some offloading devices, whether map will there anything at all (or it
will require unified shared memory) and how we'll map the fn argument passed
to GOMP_target_ext back to the host function address.

	Jakub
  
Tobias Burnus June 30, 2022, 9:09 a.m. UTC | #2
Hi Jakub,

On 30.06.22 10:21, Jakub Jelinek wrote:
> So, what is the plan with reverse offload?

My idea was to just call omp_target_ext with
'device(omp_initial_device)'. This then automatically
works when called from a target region that runs on
omp_get_initial_device().

For the actual device part, this can be implemented
incrementally by supporting the reverse_offload for
a given device type.

For getting it to work when the code enclosing the ancestor:1
target region runs on an offloading device,
my idea is the following. Comments are welcome!


My idea was to do the same as done for I/O
(which supported for both nvptx and gcn). For GCN:

libgomp/plugin/plugin-gcn.c has:

struct kernargs {
   /* A pointer to struct output, below, for console output data.  */
   int64_t out_ptr;

   /* A pointer to struct heap, below.  */
   int64_t heap_ptr;

   /* A pointer to an ephemeral memory arena.
     Only needed for OpenMP.  */
   int64_t arena_ptr;

/* to be added: */
   /* A pointer to reverse-offload. */
   int64_t rev_ptr;

/* Now come the actual structs.*/
   /* Output data.  */
   struct output {
     int return_value;
     unsigned int next_output;
     struct printf_data {
...
};


This gets initialized on the host and then:

   while (hsa_fns.hsa_signal_wait_acquire_fn (s, HSA_SIGNAL_CONDITION_LT, 1,
                                              1000 * 1000,
                                              HSA_WAIT_STATE_BLOCKED) != 0)
     console_output (kernel, shadow->kernarg_address, false);

with:

   unsigned int from = __atomic_load_n (&kernargs->output_data.consumed,
                                        __ATOMIC_ACQUIRE);

The I/O itself is implemented in newlib,
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/sys/amdgcn/write.c

   register void **kernargs asm("s8");
   struct output *data = (struct output *)kernargs[2];

and then the data is filled.


For reverse offload, the idea is fill it on the device side via
/libgomp/config/gcn/target.c's GOMP_target_ext for
device == GOMP_DEVICE_HOST_FALLBACK && fn != NULL as:

Try to obtain a lock (busy wait)
Put addr/kinds/sizes into the struct
Put the device's fn pointer in the struct
busy wait for completion ('while (fn != NULL) { }')
unlock


And on the host side:
If fn == NULL (= data there) - return output/offload checking loop
Otherwise:
call a new function in target.c and pass args to it.
Once it completed, set fn = NULL to indicate it has been processed.

And in target.c's new reverse-offload-handling function:
- find generated-target function on the host,
   based on device stub function's pointer address
- Handle the mapping
- Call host function
- Handle the mapping
- return

Additionally:

If 'requires reverse_offload' is set, fill not only
the normal splay_tree for "host -> device" lookup but
also another one for the "device -> host" lookups.

Does this make sense?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Patch

OpenMP: Prepare omp-* for ancestor:1 handling

gcc/ChangeLog:

	* omp-expand.cc (expand_omp_target): Set device to
	GOMP_DEVICE_HOST_FALLBACK for ancestor.
	* omp-low.cc (scan_omp_target): Add 'omp target device_ancestor'
	attribute to generated target-region function for ancestor:1.
	* omp-offload.cc (pass_omp_target_link::execute): Don't nullify
	function pointer for ancestor:1.

 gcc/omp-expand.cc  | 6 +++++-
 gcc/omp-low.cc     | 6 ++++++
 gcc/omp-offload.cc | 9 +++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 1023c56fc3d..dc0a963e9e3 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -10005,7 +10005,11 @@  expand_omp_target (struct omp_region *region)
 	    need_device_adjustment = true;
 	  device_loc = OMP_CLAUSE_LOCATION (c);
 	  if (OMP_CLAUSE_DEVICE_ANCESTOR (c))
-	    sorry_at (device_loc, "%<ancestor%> not yet supported");
+	    {
+	      device = build_int_cst (integer_type_node,
+				      GOMP_DEVICE_HOST_FALLBACK);
+	      sorry_at (device_loc, "%<ancestor%> not yet supported");
+	    }
 	}
       else
 	{
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b9d5529f212..140ef229cc0 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -3094,6 +3094,12 @@  scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
   if (offloaded)
     {
       create_omp_child_function (ctx, false);
+      tree c = omp_find_clause (gimple_omp_target_clauses (ctx->stmt),
+				OMP_CLAUSE_DEVICE);
+      if (c && OMP_CLAUSE_DEVICE_ANCESTOR (c))
+	DECL_ATTRIBUTES (ctx->cb.dst_fn)
+	  = tree_cons (get_identifier ("omp target device_ancestor"),
+		       NULL_TREE, DECL_ATTRIBUTES (ctx->cb.dst_fn));
       gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
     }
 
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 3a89119371c..a6c108aef30 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2803,6 +2803,15 @@  pass_omp_target_link::execute (function *fun)
 	{
 	  if (gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_GOMP_TARGET))
 	    {
+	      tree dev = gimple_call_arg (gsi_stmt (gsi), 0);
+	      tree fn = gimple_call_arg (gsi_stmt (gsi), 1);
+	      if (POINTER_TYPE_P (TREE_TYPE (fn)))
+		fn = TREE_OPERAND (fn, 0);
+	      if (TREE_CODE (dev) == INTEGER_CST
+		  && wi::to_wide (dev) == GOMP_DEVICE_HOST_FALLBACK
+		  && lookup_attribute ("omp target device_ancestor",
+				       DECL_ATTRIBUTES (fn)) != NULL_TREE)
+		continue;  /* ancestor:1  */
 	      /* Nullify the second argument of __builtin_GOMP_target_ext.  */
 	      gimple_call_set_arg (gsi_stmt (gsi), 1, null_pointer_node);
 	      update_stmt (gsi_stmt (gsi));