OpenMP: Ensure that offloaded variables are public

Message ID 6843a549-0d4d-0ca7-ae34-929a15e0fa98@codesourcery.com
State Committed
Headers
Series OpenMP: Ensure that offloaded variables are public |

Commit Message

Andrew Stubbs Nov. 16, 2021, 11:49 a.m. UTC
  Hi,

This patch is needed for AMD GCN offloading when we use the assembler 
from LLVM 13+.

The GCN runtime (libgomp+ROCm) requires that the location of all 
variables in the offloaded variables table are discoverable at runtime 
(using the "hsa_executable_symbol_get_info" API), and this only works 
when the symbols are exported from the binary. Previously we solved this 
by having mkoffload insert ".global" directives into the assembler text, 
but newer LLVM assemblers emit an error if we do this when then variable 
was previously declared ".local" (which happens when a variable is 
zero-initialized and placed in the BSS).

Since we can no longer easily fix them up after the fact, this patch 
fixes them up during OMP lowering.

OK?

Andrew
OpenMP: Ensure that offloaded variables are public

The AMD GCN runtime loader requires that variables in the offload table are
exported (public) so that it can locate the load address and do the mapping.

gcc/ChangeLog:

	* config/gcn/mkoffload.c (process_asm): Don't add .global directives.
	* omp-offload.c (pass_omp_target_link::execute): Make offload_vars
	public.
  

Comments

Jakub Jelinek Nov. 30, 2021, 4:24 p.m. UTC | #1
On Tue, Nov 16, 2021 at 11:49:18AM +0000, Andrew Stubbs wrote:
> This patch is needed for AMD GCN offloading when we use the assembler from
> LLVM 13+.
> 
> The GCN runtime (libgomp+ROCm) requires that the location of all variables
> in the offloaded variables table are discoverable at runtime (using the
> "hsa_executable_symbol_get_info" API), and this only works when the symbols
> are exported from the binary. Previously we solved this by having mkoffload
> insert ".global" directives into the assembler text, but newer LLVM
> assemblers emit an error if we do this when then variable was previously
> declared ".local" (which happens when a variable is zero-initialized and
> placed in the BSS).
> 
> Since we can no longer easily fix them up after the fact, this patch fixes
> them up during OMP lowering.

I'm confused, how can that ever work reliably?
The !TREE_PUBLIC offload_vars can be static locals or static globals
or static anon namespace vars, but their names can very easily clash with
either static or non-static variables from other TUs.
Consider in one TU

static int a = 5;
static int baz (void) { static int b;
#pragma omp declare target to (b)
return ++b; }
int foo (void) { return ++a + baz (); }
#pragma omp declare target to (a, foo)

and

static int a = 5;
static int baz (void) { static int b;
#pragma omp declare target to (b)
return ++b; }
int bar (void) { return ++a + baz (); }
#pragma omp declare target to (a, bar)

int
main ()
{
  int v;
  #pragma omp target (from: v)
  v = foo () + bar ();
}

in another one.  This has
	.quad	a
	.quad	4
	.quad	b.0
	.quad	4
in .offload_var_table.  I'd guess this must fail to link or load
with GCN if it makes them forcibly TREE_PUBLIC.

Why does the GCN plugin or runtime need to know those vars?
It needs to know the single array that contains their addresses of course...

	Jakub
  
Jakub Jelinek Nov. 30, 2021, 4:54 p.m. UTC | #2
On Tue, Nov 30, 2021 at 05:24:49PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Consider in one TU
> 
> static int a = 5;
> static int baz (void) { static int b;
> #pragma omp declare target to (b)
> return ++b; }
> int foo (void) { return ++a + baz (); }
> #pragma omp declare target to (a, foo)
> 
> and
> 
> static int a = 5;
> static int baz (void) { static int b;
> #pragma omp declare target to (b)
> return ++b; }
> int bar (void) { return ++a + baz (); }
> #pragma omp declare target to (a, bar)
> 
> int
> main ()
> {
>   int v;
>   #pragma omp target (from: v)
>   v = foo () + bar ();
> }
> 
> in another one.  This has
> 	.quad	a
> 	.quad	4
> 	.quad	b.0
> 	.quad	4
> in .offload_var_table.  I'd guess this must fail to link or load
> with GCN if it makes them forcibly TREE_PUBLIC.
> 
> Why does the GCN plugin or runtime need to know those vars?
> It needs to know the single array that contains their addresses of course...

Actually, you've done it in ACCEL_COMPILER only, so
I assume linking the above two sources with -fopenmp into a single
binary or shared library will still work because LTO when reading
the byte-code in will remangle the names of those variables to something
where they are unique in that single *.s (or *.ptx) it emits.
But, if you put one of those TUs into a shared library and the other
into another shared library, I don't see how it can work anymore,
because both those ELF objects which will be in data sections of those
libraries might have clashing names.

If GCN can't support static variables (but isn't it ELF?) and there is no
other way than sacrifice offloading from multiple shared libraries or binary
in the same process, it at least shouldn't be done for targets which don't
need it (e.g. PTX) and shouldn't be done in the pass you've done it in
(because that means it will walk all the vars for each function it
processes, rather than just once).  So, better place would be e.g.
offload_handle_link_vars in lto/*.c or so.

	Jakub
  
Andrew Stubbs Dec. 2, 2021, 12:36 p.m. UTC | #3
On 30/11/2021 16:54, Jakub Jelinek wrote:
>> Why does the GCN plugin or runtime need to know those vars?
>> It needs to know the single array that contains their addresses of course...

With older LLVM there were issues with relocations that made it 
impossible to link the the offload_var_table. This is why mkoffload 
deletes it. I've not tried it again recently, so it's possible we could 
completely rework the way these are processed in the plugin, but that's 
the hard option.

What it currently does is a symbol lookup for each named variable listed 
in the C wrapper used to embed the kernel. The lookup provided by the 
AMD runtime ignores symbols that are not exported, even if they are 
present in the ELF.

> Actually, you've done it in ACCEL_COMPILER only, so
> I assume linking the above two sources with -fopenmp into a single
> binary or shared library will still work because LTO when reading
> the byte-code in will remangle the names of those variables to something
> where they are unique in that single *.s (or *.ptx) it emits.
> But, if you put one of those TUs into a shared library and the other
> into another shared library, I don't see how it can work anymore,
> because both those ELF objects which will be in data sections of those
> libraries might have clashing names.

The plugin loads each image file as an independent "executable". If 
there are multiple images then there *will be* duplicate symbols (e.g. 
"init_array") but this is not a problem because they're in a different 
context.

If there's a problem with duplicate symbols *within* a given image then 
we have a bigger problem because offload_var_table is referring to them 
by name. As you say, I presume the LTO stream-in is fixing up such 
conflicts.

> If GCN can't support static variables (but isn't it ELF?) and there is no
> other way than sacrifice offloading from multiple shared libraries or binary
> in the same process, it at least shouldn't be done for targets which don't
> need it (e.g. PTX) and shouldn't be done in the pass you've done it in
> (because that means it will walk all the vars for each function it
> processes, rather than just once).  So, better place would be e.g.
> offload_handle_link_vars in lto/*.c or so.

GCN is ELF and can support static variables just fine ... as long as you 
don't want to poke values into them from the outside. We do not support 
any sort of dynamic libraries; kernels are statically linked relocatable 
executables (the AMD runtime expects the binary to be a dynamic object 
itself, but there's nothing hunting for dependencies, or anything like 
that).

I've tried modifying offload_handle_link_vars but that spot doesn't 
catch the omp_data_sizes variables emitted by 
libgomp.c-c++-common/target_42.c, which was one of the motivating examples.

It is true that my current placement visits all the symbols for every 
function, meaning that they are adjusted in an earlier iteration of a 
pass than you might expect. I couldn't find a single place that fixed 
this problem only in the amdgcn compiler and wasn't too late.

Do you have a suggestion how to not do this for other GPU targets? We 
can add another hook or macro, of course ....

Thanks for the review!

Andrew
  
Jakub Jelinek Dec. 2, 2021, 12:58 p.m. UTC | #4
On Thu, Dec 02, 2021 at 12:36:30PM +0000, Andrew Stubbs wrote:
> On 30/11/2021 16:54, Jakub Jelinek wrote:
> > > Why does the GCN plugin or runtime need to know those vars?
> > > It needs to know the single array that contains their addresses of course...
> 
> With older LLVM there were issues with relocations that made it impossible
> to link the the offload_var_table. This is why mkoffload deletes it. I've
> not tried it again recently, so it's possible we could completely rework the
> way these are processed in the plugin, but that's the hard option.
> 
> What it currently does is a symbol lookup for each named variable listed in
> the C wrapper used to embed the kernel. The lookup provided by the AMD
> runtime ignores symbols that are not exported, even if they are present in
> the ELF.

Would be nice to know what the relocation issue is or was, offload_var_table
shouldn't be different from other arrays containing pointers to static vars,
no?
If you delete it and have to do the lookups in the plugin, I understand that
then they need to be public...

> The plugin loads each image file as an independent "executable". If there
> are multiple images then there *will be* duplicate symbols (e.g.
> "init_array") but this is not a problem because they're in a different
> context.
> 
> If there's a problem with duplicate symbols *within* a given image then we
> have a bigger problem because offload_var_table is referring to them by
> name. As you say, I presume the LTO stream-in is fixing up such conflicts.

Ah, ok.

> I've tried modifying offload_handle_link_vars but that spot doesn't catch
> the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c,
> which was one of the motivating examples.

Why doesn't catch it?  Is the variable created only post-IPA?
I'd think that it should have been created before IPA, streamed and
therefore I don't understand why you don't see it after streaming LTO in.

> It is true that my current placement visits all the symbols for every
> function, meaning that they are adjusted in an earlier iteration of a pass
> than you might expect. I couldn't find a single place that fixed this
> problem only in the amdgcn compiler and wasn't too late.
> 
> Do you have a suggestion how to not do this for other GPU targets? We can
> add another hook or macro, of course ....

Certainly a target hook.  But I'd really like to understand why you don't
see those earlier.

	Jakub
  
Andrew Stubbs Dec. 2, 2021, 4:05 p.m. UTC | #5
On 02/12/2021 12:58, Jakub Jelinek wrote:
>> I've tried modifying offload_handle_link_vars but that spot doesn't catch
>> the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c,
>> which was one of the motivating examples.
> 
> Why doesn't catch it?  Is the variable created only post-IPA?
> I'd think that it should have been created before IPA, streamed and
> therefore I don't understand why you don't see it after streaming LTO in.

On closer inspection it does, in fact, catch it as you'd expect, but 
then the variable is no longer marked public when it gets to 
pass_omp_target_link::execute, so something somewhere is resetting it. 
More investigation is needed....

Andrew
  
Andrew Stubbs Dec. 2, 2021, 4:31 p.m. UTC | #6
On 02/12/2021 16:05, Andrew Stubbs wrote:
> On 02/12/2021 12:58, Jakub Jelinek wrote:
>>> I've tried modifying offload_handle_link_vars but that spot doesn't 
>>> catch
>>> the omp_data_sizes variables emitted by 
>>> libgomp.c-c++-common/target_42.c,
>>> which was one of the motivating examples.
>>
>> Why doesn't catch it?  Is the variable created only post-IPA?
>> I'd think that it should have been created before IPA, streamed and
>> therefore I don't understand why you don't see it after streaming LTO in.
> 
> On closer inspection it does, in fact, catch it as you'd expect, but 
> then the variable is no longer marked public when it gets to 
> pass_omp_target_link::execute, so something somewhere is resetting it. 
> More investigation is needed....

The "whole-program" pass is removing the public flag. That's probably 
working as intended, and I assume it is run for offload code on purpose?

My original patch puts the flag back after this point, so it works fine.

Andrew
  
Jakub Jelinek Dec. 2, 2021, 4:43 p.m. UTC | #7
On Thu, Dec 02, 2021 at 04:31:36PM +0000, Andrew Stubbs wrote:
> On 02/12/2021 16:05, Andrew Stubbs wrote:
> > On 02/12/2021 12:58, Jakub Jelinek wrote:
> > > > I've tried modifying offload_handle_link_vars but that spot
> > > > doesn't catch
> > > > the omp_data_sizes variables emitted by
> > > > libgomp.c-c++-common/target_42.c,
> > > > which was one of the motivating examples.
> > > 
> > > Why doesn't catch it?  Is the variable created only post-IPA?
> > > I'd think that it should have been created before IPA, streamed and
> > > therefore I don't understand why you don't see it after streaming LTO in.
> > 
> > On closer inspection it does, in fact, catch it as you'd expect, but
> > then the variable is no longer marked public when it gets to
> > pass_omp_target_link::execute, so something somewhere is resetting it.
> > More investigation is needed....
> 
> The "whole-program" pass is removing the public flag. That's probably
> working as intended, and I assume it is run for offload code on purpose?

So you'd stick it somewhere into e.g. symbol_table::compile
after ipa_passes call, guarded with #ifdef ACCEL_COMPILER ?

	Jakub
  
Andrew Stubbs Dec. 9, 2021, 11:41 a.m. UTC | #8
On 02/12/2021 16:43, Jakub Jelinek wrote:
> On Thu, Dec 02, 2021 at 04:31:36PM +0000, Andrew Stubbs wrote:
>> On 02/12/2021 16:05, Andrew Stubbs wrote:
>>> On 02/12/2021 12:58, Jakub Jelinek wrote:
>>>>> I've tried modifying offload_handle_link_vars but that spot
>>>>> doesn't catch
>>>>> the omp_data_sizes variables emitted by
>>>>> libgomp.c-c++-common/target_42.c,
>>>>> which was one of the motivating examples.
>>>>
>>>> Why doesn't catch it?  Is the variable created only post-IPA?
>>>> I'd think that it should have been created before IPA, streamed and
>>>> therefore I don't understand why you don't see it after streaming LTO in.
>>>
>>> On closer inspection it does, in fact, catch it as you'd expect, but
>>> then the variable is no longer marked public when it gets to
>>> pass_omp_target_link::execute, so something somewhere is resetting it.
>>> More investigation is needed....
>>
>> The "whole-program" pass is removing the public flag. That's probably
>> working as intended, and I assume it is run for offload code on purpose?
> 
> So you'd stick it somewhere into e.g. symbol_table::compile
> after ipa_passes call, guarded with #ifdef ACCEL_COMPILER ?

I've given up on this approach, and switched to loading the symbol 
addresses from the table directly. The relocation issues that I had with 
older assemblers/linkers do not seem to be a problem any more.

This patch requires only a single symbol to be forced global, and since 
that's one that I create in mkoffload there is no issue with previous 
definitions.

I think I can approve this myself, but if you have any observations I'm 
happy to hear them.

Andrew
amdgcn: Change offload variable table discovery

Up to now the libgomp GCN plugin has been finding the offload variables
by using a symbol lookup, but the AMD runtime requires that the symbols are
global for that to work. This was ensured by mkoffload as a post-procssing
step, but the LLVM 13 assembler no longer accepts this in the case where the
variable was previously declared differently.

This patch switches to locating the symbols directly from the
offload_var_table, which means that only one symbol needs to be forced
global.

This changes breaks the libgomp image compatibility so GOMP_VERSION_GCN has
also been bumped.

gcc/ChangeLog:

	* config/gcn/mkoffload.c (process_asm): Process the variable table
	completely differently.
	(process_obj): Encode the varaible data differently.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct gcn_image_desc): Remove global_variables.
	(GOMP_OFFLOAD_load_image): Locate the offload variables via the
	table, not individual symbols.

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index b2e71ea5aa0..d609b7a6f9c 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -495,10 +495,8 @@ static void
 process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0;
-  struct obstack fns_os, vars_os, varsizes_os, dims_os, regcounts_os;
+  struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
-  obstack_init (&vars_os);
-  obstack_init (&varsizes_os);
   obstack_init (&dims_os);
   obstack_init (&regcounts_os);
 
@@ -567,16 +565,11 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	    unsigned varsize;
 	    if (sscanf (buf, " .8byte %ms\n", &varname))
 	      {
-		obstack_ptr_grow (&vars_os, varname);
+		fputs (buf, out);
 		fgets (buf, sizeof (buf), in);
 		if (!sscanf (buf, " .8byte %u\n", &varsize))
 		  abort ();
-		obstack_int_grow (&varsizes_os, varsize);
 		var_count++;
-
-		/* The HSA Runtime cannot locate the symbol if it is not
-		   exported from the kernel.  */
-		fprintf (out, "\t.global %s\n", varname);
 	      }
 	    break;
 	  }
@@ -595,7 +588,19 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 
       char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
-	state = IN_VARS;
+	{
+	  state = IN_VARS;
+
+	  /* Add a global symbol to allow plugin-gcn.c to locate the table
+	     at runtime.  It can't use the "offload_var_table.N" emitted by
+	     the compiler because a) they're not global, and b) there's one
+	     for each input file combined into the binary.  */
+	  fputs (buf, out);
+	  fputs ("\t.global .offload_var_table\n"
+		 "\t.type .offload_var_table, @object\n"
+		 ".offload_var_table:\n",
+		 out);
+	}
       else if (sscanf (buf, " .section .gnu.offload_funcs%c", &dummy) > 0)
 	state = IN_FUNCS;
       else if (sscanf (buf, " .amdgpu_metadata%c", &dummy) > 0)
@@ -622,7 +627,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	  regcount.sgpr_count = regcount.vgpr_count = -1;
 	}
 
-      if (state == IN_CODE || state == IN_METADATA)
+      if (state == IN_CODE || state == IN_METADATA || state == IN_VARS)
 	fputs (buf, out);
     }
 
@@ -633,24 +638,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fprintf (cfile, "#include <stdlib.h>\n");
   fprintf (cfile, "#include <stdbool.h>\n\n");
 
-  char **vars = XOBFINISH (&vars_os, char **);
-  unsigned *varsizes = XOBFINISH (&varsizes_os, unsigned *);
-  fprintf (cfile,
-	   "static const struct global_var_info {\n"
-	   "  const char *name;\n"
-	   "  void *address;\n"
-	   "} vars[] = {\n");
-  int i;
-  for (i = 0; i < var_count; ++i)
-    {
-      const char *sep = i < var_count - 1 ? "," : " ";
-      fprintf (cfile, "  { \"%s\", NULL }%s /* size: %u */\n", vars[i], sep,
-	       varsizes[i]);
-    }
-  fprintf (cfile, "};\n\n");
-
-  obstack_free (&vars_os, NULL);
-  obstack_free (&varsizes_os, NULL);
+  fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
 
   /* Dump out function idents.  */
   fprintf (cfile, "static const struct hsa_kernel_description {\n"
@@ -661,6 +649,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 	   "} gcn_kernels[] = {\n  ");
   dim.d[0] = dim.d[1] = dim.d[2] = 0;
   const char *comma;
+  int i;
   for (comma = "", i = 0; i < fn_count; comma = ",\n  ", i++)
     {
       /* Find if we recorded dimensions for this function.  */
@@ -732,13 +721,11 @@ process_obj (FILE *in, FILE *cfile)
 	   "  unsigned kernel_count;\n"
 	   "  const struct hsa_kernel_description *kernel_infos;\n"
 	   "  unsigned global_variable_count;\n"
-	   "  const struct global_var_info *global_variables;\n"
 	   "} target_data = {\n"
 	   "  &gcn_image,\n"
 	   "  sizeof (gcn_kernels) / sizeof (gcn_kernels[0]),\n"
 	   "  gcn_kernels,\n"
-	   "  sizeof (vars) / sizeof (vars[0]),\n"
-	   "  vars\n"
+	   "  gcn_num_vars\n"
 	   "};\n\n");
 
   fprintf (cfile,
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 3e42d7123ae..0f7210b6f7d 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -262,7 +262,7 @@ enum gomp_map_kind
 #define GOMP_VERSION	1
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
-#define GOMP_VERSION_GCN 1
+#define GOMP_VERSION_GCN 2
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 9e7377c91f9..694862b97f4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -392,7 +392,6 @@ struct gcn_image_desc
   const unsigned kernel_count;
   struct hsa_kernel_description *kernel_infos;
   const unsigned global_variable_count;
-  struct global_var_info *global_variables;
 };
 
 /* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
@@ -3365,37 +3364,41 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   if (!create_and_finalize_hsa_program (agent))
     return -1;
 
-  for (unsigned i = 0; i < var_count; i++)
+  if (var_count > 0)
     {
-      struct global_var_info *v = &image_desc->global_variables[i];
-      GCN_DEBUG ("Looking for variable %s\n", v->name);
-
       hsa_status_t status;
       hsa_executable_symbol_t var_symbol;
       status = hsa_fns.hsa_executable_get_symbol_fn (agent->executable, NULL,
-						     v->name, agent->id,
+						     ".offload_var_table",
+						     agent->id,
 						     0, &var_symbol);
 
       if (status != HSA_STATUS_SUCCESS)
 	hsa_fatal ("Could not find symbol for variable in the code object",
 		   status);
 
-      uint64_t var_addr;
-      uint32_t var_size;
+      uint64_t var_table_addr;
       status = hsa_fns.hsa_executable_symbol_get_info_fn
-	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS, &var_addr);
+	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+	 &var_table_addr);
       if (status != HSA_STATUS_SUCCESS)
 	hsa_fatal ("Could not extract a variable from its symbol", status);
-      status = hsa_fns.hsa_executable_symbol_get_info_fn
-	(var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE, &var_size);
-      if (status != HSA_STATUS_SUCCESS)
-	hsa_fatal ("Could not extract a variable size from its symbol", status);
 
-      pair->start = var_addr;
-      pair->end = var_addr + var_size;
-      GCN_DEBUG ("Found variable %s at %p with size %u\n", v->name,
-		 (void *)var_addr, var_size);
-      pair++;
+      struct {
+	uint64_t addr;
+	uint64_t size;
+      } var_table[var_count];
+      GOMP_OFFLOAD_dev2host (agent->device_id, var_table,
+			     (void*)var_table_addr, sizeof (var_table));
+
+      for (unsigned i = 0; i < var_count; i++)
+	{
+	  pair->start = var_table[i].addr;
+	  pair->end = var_table[i].addr + var_table[i].size;
+	  GCN_DEBUG ("Found variable at %p with size %lu\n",
+		     (void *)var_table[i].addr, var_table[i].size);
+	  pair++;
+	}
     }
 
   GCN_DEBUG ("Looking for variable %s\n", STRINGX (GOMP_DEVICE_NUM_VAR));
  
Jakub Jelinek Dec. 9, 2021, 12:34 p.m. UTC | #9
On Thu, Dec 09, 2021 at 11:41:46AM +0000, Andrew Stubbs wrote:
> gcc/ChangeLog:
> 
> 	* config/gcn/mkoffload.c (process_asm): Process the variable table
> 	completely differently.
> 	(process_obj): Encode the varaible data differently.
> 
> include/ChangeLog:
> 
> 	* gomp-constants.h (GOMP_VERSION_GCN): Bump.
> 
> libgomp/ChangeLog:
> 
> 	* plugin/plugin-gcn.c (struct gcn_image_desc): Remove global_variables.
> 	(GOMP_OFFLOAD_load_image): Locate the offload variables via the
> 	table, not individual symbols.

I'm very happy this worked out.  LGTM, but sure, you can approve it
yourself.

	Jakub
  
Andrew Stubbs Dec. 22, 2021, 1:01 p.m. UTC | #10
This is now backported to the devel/omp/gcc-11 branch (OG11).

Andrew

On 09/12/2021 11:41, Andrew Stubbs wrote:
> On 02/12/2021 16:43, Jakub Jelinek wrote:
>> On Thu, Dec 02, 2021 at 04:31:36PM +0000, Andrew Stubbs wrote:
>>> On 02/12/2021 16:05, Andrew Stubbs wrote:
>>>> On 02/12/2021 12:58, Jakub Jelinek wrote:
>>>>>> I've tried modifying offload_handle_link_vars but that spot
>>>>>> doesn't catch
>>>>>> the omp_data_sizes variables emitted by
>>>>>> libgomp.c-c++-common/target_42.c,
>>>>>> which was one of the motivating examples.
>>>>>
>>>>> Why doesn't catch it?  Is the variable created only post-IPA?
>>>>> I'd think that it should have been created before IPA, streamed and
>>>>> therefore I don't understand why you don't see it after streaming 
>>>>> LTO in.
>>>>
>>>> On closer inspection it does, in fact, catch it as you'd expect, but
>>>> then the variable is no longer marked public when it gets to
>>>> pass_omp_target_link::execute, so something somewhere is resetting it.
>>>> More investigation is needed....
>>>
>>> The "whole-program" pass is removing the public flag. That's probably
>>> working as intended, and I assume it is run for offload code on purpose?
>>
>> So you'd stick it somewhere into e.g. symbol_table::compile
>> after ipa_passes call, guarded with #ifdef ACCEL_COMPILER ?
> 
> I've given up on this approach, and switched to loading the symbol 
> addresses from the table directly. The relocation issues that I had with 
> older assemblers/linkers do not seem to be a problem any more.
> 
> This patch requires only a single symbol to be forced global, and since 
> that's one that I create in mkoffload there is no issue with previous 
> definitions.
> 
> I think I can approve this myself, but if you have any observations I'm 
> happy to hear them.
> 
> Andrew
  

Patch

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index b2e71ea5aa00..5b130cc6de71 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -573,10 +573,6 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
 		  abort ();
 		obstack_int_grow (&varsizes_os, varsize);
 		var_count++;
-
-		/* The HSA Runtime cannot locate the symbol if it is not
-		   exported from the kernel.  */
-		fprintf (out, "\t.global %s\n", varname);
 	      }
 	    break;
 	  }
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 833f7ddea58f..c6fb87a5dee2 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2799,6 +2799,18 @@  pass_omp_target_link::execute (function *fun)
 	}
     }
 
+  /* Variables in the offload table may need to be public for the runtime
+     loader to be able to locate them.  (This is true for at least amdgcn.)  */
+  if (offload_vars)
+    for (auto it = offload_vars->begin (); it != offload_vars->end (); it++)
+    if (!TREE_PUBLIC (*it))
+      {
+	TREE_PUBLIC (*it) = 1;
+
+	if (dump_enabled_p () && dump_flags & TDF_DETAILS)
+	  dump_printf (MSG_NOTE, "Make offload var public: %T\n", *it);
+      }
+
   return 0;
 }