[gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has (was: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has)

Message ID 1c60f33d-ec25-49f4-a95f-c46c7dca0327@baylibre.com
State New
Headers
Series [gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has (was: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has) |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply

Commit Message

Tobias Burnus Feb. 10, 2025, 3:24 p.m. UTC
  Hi all,

Andrew and I discussed about the following:

Andrew Stubbs wrote:
> This business of changing the -march flag from what the user specified 
> is also questionable.

Result: the new patch (attached) no longer automatically chooses the 
associated ISA architecture but prints a fatal error instead, pointing 
to the associated generic ISA.

[This error is only printed if the generic multilib actually exists (and 
the specific does not).]

OK for mainline?

* * *

I assume that in the future, distros will build with 
gfx-generic{9,10-3,11} and gfx90{8,a} (MI100/MI200, not compatible with 
gfx9) – and this will likely also become the default in GCC.

However, for now the default is unchanged as LLVM 19 is still quite new 
– and, even more importantly, ROCm 6.3.2 does not support multilibs.

This is expected to change soon, given that the ROCR-Runtime git repo 
already implements it and traces of generic support already are in ROCm 
6.3.2. (For instance, ROCm 6.3.2 can already decode the ELF flags of 
generic code objects.)

Tobias
  

Comments

Andrew Stubbs Feb. 10, 2025, 4:11 p.m. UTC | #1
On 10/02/2025 15:24, Tobias Burnus wrote:
> Hi all,
> 
> Andrew and I discussed about the following:
> 
> Andrew Stubbs wrote:
>> This business of changing the -march flag from what the user specified 
>> is also questionable.
> 
> Result: the new patch (attached) no longer automatically chooses the 
> associated ISA architecture but prints a fatal error instead, pointing 
> to the associated generic ISA.
> 
> [This error is only printed if the generic multilib actually exists (and 
> the specific does not).]
> 
> OK for mainline?
> 
> * * *
> 
> I assume that in the future, distros will build with gfx- 
> generic{9,10-3,11} and gfx90{8,a} (MI100/MI200, not compatible with 
> gfx9) – and this will likely also become the default in GCC.
> 
> However, for now the default is unchanged as LLVM 19 is still quite new 
> – and, even more importantly, ROCm 6.3.2 does not support multilibs.
> 
> This is expected to change soon, given that the ROCR-Runtime git repo 
> already implements it and traces of generic support already are in ROCm 
> 6.3.2. (For instance, ROCm 6.3.2 can already decode the ELF flags of 
> generic code objects.)

If, at some point in future, LLVM-LLD changes to permit linking generic 
libraries with specific-arch kernels, then this new error message should 
be replaced with a MULTILIB_MATCHES definition.

Until then, I think the proposed error message is better than the 
unexpected link error we have now.

> +  fatal_error (UNKNOWN_LOCATION,
> +	       "GCC was build without library support for %<-march=%s%>; "
> +	       "consider compiling for the associated generic architecture "
> +	       "%<-march=%s%> instead", isa_name, gen_name);

s/build/built/

OK with that change, assuming you tested it carefully.

Andrew
  

Patch

[gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has

Assume that a distro has configured, e.g., a gfx9-generic multilib but not
for gfx902. In that case, mkoffload would fail to link with "error:
incompatible mach".  With this commit, an error is printed suggesting to try
the associated generic architecture instead.  The behavior is unchanged if
there is a multilib available for the specific ISA or when there is also no
multilib for the generic ICA.

Note: The build of generic multilibs are currently not enabled by default;
they also require the linker/assembler of LLVM 19 or newer and, in particular,
for the execution a future ROCm release. (The next one? In any case, 6.3.2
does not support generic ISAs, yet.)

gcc/ChangeLog:

        * config/gcn/mkoffload.cc (enum elf_arch_code): Add
        EF_AMDGPU_MACH_AMDGCN_NONE.
        (elf_arch): Use enum elf_arch_code as type.
        (tool_cleanup): Silence warning by removing tailing '.' from error.
        (get_arch_name): Return enum elf_arch_code.
        (check_for_missing_lib): New; print fatal error if the multilib
	is not available but it is for the associate generic ISA.
        (main): Call it.

 gcc/config/gcn/mkoffload.cc | 101 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 94 insertions(+), 7 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 92e8fe70c12..64719a8e27c 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -53,6 +53,7 @@ 
 
 /* Extract the EF_AMDGPU_MACH_AMDGCN_GFXnnn from the def file.  */
 enum elf_arch_code {
+  EF_AMDGPU_MACH_AMDGCN_NONE = -1,  /* For generic handling.  */
 #define GCN_DEVICE(name, NAME, ELF_ARCH, ...) \
   EF_AMDGPU_MACH_AMDGCN_ ## NAME = ELF_ARCH,
 #include "gcn-devices.def"
@@ -135,9 +136,8 @@  static struct obstack files_to_cleanup;
 enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
 const char *offload_abi_host_opts = NULL;
 
-uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
+enum elf_arch_code elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
 uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
-
 static int gcn_stack_size = 0;  /* Zero means use default.  */
 
 /* Delete tempfiles.  */
@@ -782,7 +782,7 @@  compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_ptr_grow (&argv_obstack, ".c");
   if (!offload_abi_host_opts)
     fatal_error (input_location,
-		 "%<-foffload-abi-host-opts%> not specified.");
+		 "%<-foffload-abi-host-opts%> not specified");
   obstack_ptr_grow (&argv_obstack, offload_abi_host_opts);
   obstack_ptr_grow (&argv_obstack, infile);
   obstack_ptr_grow (&argv_obstack, "-c");
@@ -796,16 +796,15 @@  compile_native (const char *infile, const char *outfile, const char *compiler,
   obstack_free (&argv_obstack, NULL);
 }
 
-static int
+static enum elf_arch_code
 get_arch (const char *str, const char *with_arch_str)
 {
   /* Use the def file to map the name to the elf_arch_code.  */
   if (!str) ;
 #define GCN_DEVICE(name, NAME, ELF, ...) \
   else if (strcmp (str, #name) == 0) \
-    return ELF;
+    return (enum elf_arch_code) ELF;
 #include "gcn-devices.def"
-#undef GCN_DEVICE
 
   /* else */
   error ("unrecognized argument in option %<-march=%s%>", str);
@@ -839,7 +838,91 @@  get_arch (const char *str, const char *with_arch_str)
 
   exit (FATAL_EXIT_CODE);
 
-  return 0;
+  return EF_AMDGPU_MACH_AMDGCN_NONE;
+}
+
+static const char*
+get_arch_name (enum elf_arch_code arch_code)
+{
+  switch (arch_code)
+    {
+#define GCN_DEVICE(name, NAME, ELF, ...) \
+    case EF_AMDGPU_MACH_AMDGCN_ ## NAME: \
+      return #name;
+#include "../../gcc/config/gcn/gcn-devices.def"
+    default: return NULL;
+    }
+}
+
+/* If an generic arch exists and for the chosen arch no (multi)lib is
+   available, print a fatal error - and suggest to compile for the generic
+   version instead.  */
+
+static void
+check_for_missing_lib (enum elf_arch_code elf_arch,
+		       enum elf_arch_code default_arch)
+{
+  enum elf_arch_code generic_arch;
+  switch (elf_arch)
+    {
+#define GCN_DEVICE(name, NAME, ELF, ISA, XNACK, SRAM, WAVE64, CU, \
+		   MAX_ISA_VGPRS, GEN_VER, ARCH_FAM, GEN_MACH, ...) \
+    case EF_AMDGPU_MACH_AMDGCN_ ## NAME: \
+      generic_arch = EF_AMDGPU_MACH_AMDGCN_ ## GEN_MACH; break;
+#include "../../gcc/config/gcn/gcn-devices.def"
+    default: generic_arch = EF_AMDGPU_MACH_AMDGCN_NONE;
+    }
+
+  /* If not generic or the default arch, the library version exists.  */
+  if (generic_arch == EF_AMDGPU_MACH_AMDGCN_NONE || elf_arch == default_arch)
+    return;
+
+  /* Search gcn_arch in the multilib config, which might look like
+     "march=gfx900/march=gfx906".  */
+  const char *p = multilib_options;
+  const char *q = NULL;
+  const char *isa_name = get_arch_name (elf_arch);
+  while ((q = strstr (p, isa_name)) != NULL)
+    {
+      if (multilib_options + strlen ("march=") <= q
+	  && startswith (&q[-strlen ("march=")], "march="))
+	{
+	  const char r = q[strlen (isa_name)];
+	  if (r != '\0' && r != '/')
+	    continue;
+	  break;
+	}
+      p++;
+    }
+
+  /* Specified -march= exists in the multilib.  */
+  if (q != NULL)
+    return;
+
+  /* If no lib, try to find one for the generic arch.  */
+  const char *gen_name = get_arch_name (generic_arch);
+  if (generic_arch != default_arch)
+    {
+      p = multilib_options;
+      while ((q = strstr (p, gen_name)) != NULL)
+	{
+	  if (multilib_options + strlen ("march=") <= q
+	      && startswith (&q[-strlen ("march=")], "march="))
+	    {
+	      const char r = q[strlen (gen_name)];
+	      if (r != '\0' && r != '/')
+		continue;
+	      break;
+	    }
+	  p++;
+	}
+      if (q == NULL)
+	return;
+    }
+  fatal_error (UNKNOWN_LOCATION,
+	       "GCC was build without library support for %<-march=%s%>; "
+	       "consider compiling for the associated generic architecture "
+	       "%<-march=%s%> instead", isa_name, gen_name);
 }
 
 int
@@ -864,6 +947,7 @@  main (int argc, char **argv)
 	elf_arch = get_arch (configure_default_options[0].value, NULL);
 	break;
       }
+  enum elf_arch_code default_arch = elf_arch;
 
   obstack_init (&files_to_cleanup);
   if (atexit (mkoffload_cleanup) != 0)
@@ -998,6 +1082,8 @@  main (int argc, char **argv)
 	}
     }
 
+  check_for_missing_lib (elf_arch, default_arch);
+
   if (!(fopenacc ^ fopenmp))
     fatal_error (input_location,
 		 "either %<-fopenacc%> or %<-fopenmp%> must be set");
@@ -1056,6 +1142,7 @@  main (int argc, char **argv)
     case ELF: if (GEN_VER) SET_GENERIC_VERSION (elf_flags, GEN_VER); break;
 #include "gcn-devices.def"
 #undef GCN_DEVICE
+    case EF_AMDGPU_MACH_AMDGCN_NONE: gcc_unreachable ();
     }
 
   /* Build arguments for compiler pass.  */