OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Message ID 9f2ffb3d-4c58-4c0a-9c57-56fea0396c33@baylibre.com
State New
Series OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Tobias Burnus Sept. 19, 2024, 1:23 p.m. UTC
  Hi all,

in order to identify and potentially re-use a specific offload device (for
reproducibility, or because it is affinity-wise close to a CPU (socket), …), a mapping
between a (universally?) unique identifier and the OpenMP device number is useful.
Thus, TR13 added support for it.

This is a collateral patch caused by looking at the API routines for other reasons
and looking at that part of the spec during the OpenMP F2F.

Besides the added API routines, the UID will be used elsewhere:
* In context selectors: 'target_device' supports 'uid(<string>)'.
* In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.
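
To illustrate the round trip that such a mapping gives, here is a minimal
sketch that does not call libgomp at all. It assumes a hard-coded table of
invented UID strings; demo_uids, demo_uid_from_device, demo_device_from_uid
and DEMO_INVALID_DEVICE are hypothetical names used only for illustration,
not the routines or values added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; this is not the libgomp implementation.  */
#define DEMO_INVALID_DEVICE (-4)   /* assumed "no such device" sentinel */

static const char *const demo_uids[] = {
  "GPU-0123456789abcdef",   /* invented UID for device 0 */
  "GPU-fedcba9876543210"    /* invented UID for device 1 */
};
enum { DEMO_NUM_DEVICES = sizeof demo_uids / sizeof demo_uids[0] };

/* Device number -> UID; NULL for an out-of-range device number.  */
const char *
demo_uid_from_device (int device_num)
{
  if (device_num < 0 || device_num >= DEMO_NUM_DEVICES)
    return NULL;
  assert (demo_uids[device_num] != NULL);   /* table entries are never NULL */
  return demo_uids[device_num];
}

/* UID -> device number; the string comparison is case sensitive.  */
int
demo_device_from_uid (const char *uid)
{
  if (uid == NULL)
    return DEMO_INVALID_DEVICE;
  for (int dev = 0; dev < DEMO_NUM_DEVICES; dev++)
    if (strcmp (uid, demo_uids[dev]) == 0)
      return dev;
  return DEMO_INVALID_DEVICE;
}
```

A UID obtained in one run can be stored and resolved again later: in this
sketch, demo_device_from_uid (demo_uid_from_device (1)) yields 1, while an
unknown UID or one that differs only in letter case yields the sentinel.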

@Sandra: Besides the usual .texi part, for the 'target_device' trait set:
if you add a new GOMP routine for kind/arch/isa, can you also add a
UID argument so that we don't have to update the API when it is needed in the
not-so-distant future?

@Andrew + @Thomas: Any comment? Especially to the nvptx/gcn side (plugin +
.texi)?

@Jakub or anyone else — any comments, suggestions, remarks?

[The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
and seems to work fine.]

Tobias
  

Comments

Andre Vehreschild Sept. 19, 2024, 1:53 p.m. UTC | #1
Hi Tobias,

in the changelog of libgomp:

	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): Add.

"Add." what? Can you be more specific, i.e. is it just a dummy or prototype?

In the libgomp/libgomp.texi

+@node omp_get_uid_from_device
+@subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device
+@table @asis
+@item @emph{Description}:
+This function returns a pointer to _a_ string that represents a unique
identifier 
						^^^^^^^

+(UID) for the device specified by @var{device_num}.  It returns a ...

<snip>

@@ -6604,6 +6673,12 @@ The implementation remark:
       @code{omp_thread_mem_alloc}, all use low-latency memory as first
       preference, and fall back to main graphics memory when the low-latency
       pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routine, consists
+      of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
+      the CUDA runtime library.  This UUID is output in grouped lower-case
+      hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
+      4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
+      format used by @code{nvidia-smi}.
 @end itemize

Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then
why is the "normal" UUID display format described here? This confuses me. (Just
curiosity.)

Er, and when I read further on, I find the nvptx implementation and that
contradicts the description. There a "normal" UUID is added to the GPU- id. So
you might want to make that implementation remark clearer.

Sorry for the bickering. I just stumbled over that while waiting for a
regression test.

The remainder looks reasonable to me.

Regards,
	Andre

  
Tobias Burnus Sept. 19, 2024, 2:29 p.m. UTC | #2
Hi Andre,

thanks for reading the patch + commenting.

Andre Vehreschild wrote:
> in the changelog of libgomp:
>
> 	* fortran.c (omp_get_uid_from_device_,
> 	omp_get_uid_from_device_8_): Add.
>
> "Add." what? Can you be more specific, i.e. is it just a dummy or prototype?

Neither. It is a full implementation (that is a wrapper to the target.c 
function, directly called by C/C++).

The prototype used by fortran.c is 'omp.h.in' (i.e. the C/C++ header 
file, also used by user code) and for Fortran code of users, it is the 
module generated from 'omp_lib.f90.in' and the (deprecated) include file 
'omp_lib.h.in'.

The purpose of fortran.c in general – and also for the added code – is 
to be a wrapper between the Fortran API/ABI and the C ABI. In the 
current case, there are two reasons for the two functions:

(a) The result type is 'character(:), pointer' – but the C function just 
returns a '\0' terminated const char*. Hence, the wrapper function 
contains a '*result_len = strlen (*result);' besides the '*result = 
<call to C function>'

(b) The argument is an 'integer'. As we want to be compatible with 
-fdefault-integer-8, previously somewhat fashionable, we have an 
'int32_t' and an 'int64_t' version of the function – which needs a 
second wrapper function.

As for the other API routine, BIND(C) makes it call the C function 
directly, so no wrapper is needed.
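
The wrapper pattern from (a) and (b) can be sketched in isolation as
follows. This is a simplified illustration, not the fortran.c code itself:
demo_get_uid is a hypothetical stand-in for the C routine, and the two
demo_get_uid_* names are made up; only the shape (result pointer plus
length, one wrapper per integer kind) follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the C routine: returns a NUL-terminated
   string, or NULL for a device number it does not know.  */
const char *
demo_get_uid (int device_num)
{
  return device_num == 0 ? "GPU-0123456789abcdef" : NULL;
}

/* Wrapper for 32-bit integers: hand back both pointer and length,
   because a 'character(:), pointer' result needs both.  */
void
demo_get_uid_ (const char **res, size_t *res_len, int32_t device_num)
{
  assert (res != NULL && res_len != NULL);
  *res = demo_get_uid (device_num);
  *res_len = *res ? strlen (*res) : 0;
}

/* Second wrapper for 64-bit integers (-fdefault-integer-8): it narrows
   the argument and reuses the 32-bit wrapper.  */
void
demo_get_uid_8_ (const char **res, size_t *res_len, int64_t device_num)
{
  demo_get_uid_ (res, res_len, (int32_t) device_num);
}
```

With this shape, a NULL result from the C side maps to a zero length, which
the Fortran side can treat as a disassociated pointer.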

* * *

[Typo: missing 'a' – noted + will fix.]

* * *

> +@item The unique identifier (UID), used with OpenMP's API UID routine, consists
> +      of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
> +      the CUDA runtime library.  This UUID is output in grouped lower-case
> +      hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
> +      4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
> +      format used by @code{nvidia-smi}.
>   @end itemize
>
> Do I get this right, that for CUDA this is, e.g. GPU-0123456789abdcef ? Then
> why is the "normal" UUID display format described here? This confuses me. (Just
> curiosity.)

For AMD, it is the following type of string, which contains an 8-byte/16 
hex-digit UUID part: 'GPU-abcdef0123456789'.

For Nvidia, it is 'GPU-abcdef12-1234-1234-0123456789abcdef', 
consisting of a 16-byte/32 hex-digit UUID.

For AMD, we directly get the string, matching what "rocminfo" shows as UUID.

For Nvidia, we don't get a string but a 'char bytes[16]' array filled 
with the values, each of which we print as two hex digits using '%02x'. 
For the output, a "GPU-" prefix and a few hyphens are added as well. 
That's to mimic what 'nvidia-smi -a' outputs.
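
As a rough sketch of that formatting step (not the plugin code itself), the
following turns 16 raw bytes into the prefixed, hyphenated string using the
8-4-4-16 digit grouping described in the patch. demo_format_uid and
DEMO_UID_BUFSIZE are hypothetical names, and the input bytes in the usage
note are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "GPU-" + 32 hex digits + 3 hyphens + terminating NUL.  */
#define DEMO_UID_BUFSIZE (4 + 32 + 3 + 1)

/* Format 16 raw UUID bytes as "GPU-" followed by lower-case hex digits
   grouped 8-4-4-16, i.e. the grouping described in the patch.  */
void
demo_format_uid (char out[DEMO_UID_BUFSIZE], const unsigned char bytes[16])
{
  assert (out != NULL && bytes != NULL);
  char *p = out;
  p += sprintf (p, "GPU-");
  for (int i = 0; i < 16; i++)
    {
      /* Hyphens go before bytes 4, 6 and 8.  */
      if (i == 4 || i == 6 || i == 8)
        *p++ = '-';
      /* Each byte becomes exactly two lower-case hex digits.  */
      p += sprintf (p, "%02x", bytes[i]);
    }
}
```

For the bytes 0x00, 0x11, ..., 0xff this produces
"GPU-00112233-4455-6677-8899aabbccddeeff". Taking unsigned char input avoids
the sign extension that a plain char passed to '%02x' could cause.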

I admit it is slightly confusing – and when reading the .texi, it is 
also easy to miss that one part talks about AMD ("GCN") GPUs and the 
other about Nvidia GPUs.

→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html

(In terms of OpenMP, it is only a unique identifier; it does not need to 
be universally unique [and also isn't for the host]; AMD and Nvidia call 
it UUID and it looks rather unique for the GPU; rocminfo also outputs an 
"UUID" for the CPU but that's just "CPU-XX" (twice for a dual socket 
system, i.e. not even unique), but we don't use this output.)

> Er, and when I read further on, I find the nvptx implementation and that
> contradicts the description. There a "normal" UUID is added to the GPU- id.

Now I am confused. What description contradicts which one?

Tobias
  

Patch

OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique;
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): Add.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc                               |  4 +-
 include/cuda/cuda.h                              |  7 ++
 libgomp/config/gcn/target.c                      | 14 ++++
 libgomp/config/nvptx/target.c                    | 14 ++++
 libgomp/fortran.c                                | 15 +++++
 libgomp/libgomp-plugin.h                         |  1 +
 libgomp/libgomp.h                                |  2 +
 libgomp/libgomp.map                              |  8 +++
 libgomp/libgomp.texi                             | 81 +++++++++++++++++++++++-
 libgomp/omp.h.in                                 |  3 +
 libgomp/omp_lib.f90.in                           | 23 +++++++
 libgomp/omp_lib.h.in                             | 23 +++++++
 libgomp/plugin/cuda-lib.def                      |  2 +
 libgomp/plugin/plugin-gcn.c                      | 16 +++++
 libgomp/plugin/plugin-nvptx.c                    | 34 ++++++++++
 libgomp/target.c                                 | 56 ++++++++++++++++
 libgomp/testsuite/libgomp.c/device_uid.c         | 38 +++++++++++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 ++++++++++++
 18 files changed, 379 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@  omp_runtime_api_procname (const char *name)
       "alloc",
       "calloc",
       "free",
+      "get_device_from_uid",
       "get_interop_int",
       "get_interop_ptr",
       "get_mapped_ptr",
@@ -3338,12 +3339,13 @@  omp_runtime_api_procname (const char *name)
 	 as DECL_NAME only omp_* and omp_*_8 appear.  */
       "display_env",
       "get_ancestor_thread_num",
-      "init_allocator",
+      "get_uid_from_device",
       "get_partition_place_nums",
       "get_place_num_procs",
       "get_place_proc_ids",
       "get_schedule",
       "get_team_size",
+      "init_allocator",
       "set_default_device",
       "set_dynamic",
       "set_max_active_levels",
diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 804d08ca57e..0f90ade57c8 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -201,6 +201,10 @@  typedef struct {
   size_t WidthInBytes, Height, Depth;
 } CUDA_MEMCPY3D_PEER;
 
+typedef struct {
+  char bytes[16];
+} CUuuid;
+
 #define cuCtxCreate cuCtxCreate_v2
 CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
 #define cuCtxDestroy cuCtxDestroy_v2
@@ -214,6 +218,9 @@  CUresult cuCtxPushCurrent (CUcontext);
 CUresult cuCtxSynchronize (void);
 CUresult cuCtxSetLimit (CUlimit, size_t);
 CUresult cuDeviceGet (CUdevice *, int);
+/* _v2 was added in CUDA 11.4 and 'will supplant' the old one in 12.0. */
+CUresult cuDeviceGetUuid (CUuuid*, CUdevice);
+CUresult cuDeviceGetUuid_v2 (CUuuid*, CUdevice);
 #define cuDeviceTotalMem cuDeviceTotalMem_v2
 CUresult cuDeviceTotalMem (size_t *, CUdevice);
 CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index f7fa6aa6396..0a3008454b7 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -283,6 +283,18 @@  omp_get_interop_rc_desc (const omp_interop_t interop __attribute__ ((unused)),
   return rc_strings[omp_irc_no_value - ret_code];
 }
 
+const char *
+omp_get_uid_from_device (int device_num __attribute__ ((unused)))
+{
+  return NULL;
+}
+
+int
+omp_get_device_from_uid (const char *uid __attribute__ ((unused)))
+{
+  return omp_invalid_device;
+}
+
 ialias (omp_get_num_interop_properties)
 ialias (omp_get_interop_int)
 ialias (omp_get_interop_ptr)
@@ -290,3 +302,5 @@  ialias (omp_get_interop_str)
 ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index 69666578c29..811396122b4 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -295,6 +295,18 @@  omp_get_interop_rc_desc (const omp_interop_t interop __attribute__ ((unused)),
   return rc_strings[omp_irc_no_value - ret_code];
 }
 
+const char *
+omp_get_uid_from_device (int device_num __attribute__ ((unused)))
+{
+  return NULL;
+}
+
+int
+omp_get_device_from_uid (const char *uid __attribute__ ((unused)))
+{
+  return omp_invalid_device;
+}
+
 ialias (omp_get_num_interop_properties)
 ialias (omp_get_interop_int)
 ialias (omp_get_interop_ptr)
@@ -302,3 +314,5 @@  ialias (omp_get_interop_str)
 ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index a76c33cee52..732475e3ff4 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -834,6 +834,21 @@  omp_get_interop_rc_desc_ (const char **res, size_t *res_len,
   *res_len = *res ? strlen (*res) : 0;
 }
 
+void
+omp_get_uid_from_device_ (const char **res, size_t *res_len,
+		     	 int32_t device_num) 
+{
+  *res = omp_get_uid_from_device (device_num);
+  *res_len = *res ? strlen (*res) : 0;
+}
+
+void
+omp_get_uid_from_device_8_ (const char **res, size_t *res_len,
+			    int64_t device_num) 
+{
+  omp_get_uid_from_device_ (res, res_len, (int32_t) device_num);
+}
+
 #ifndef LIBGOMP_OFFLOADED_ONLY
 
 void
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0c9c28c65cf..ce8f7f3236f 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -127,6 +127,7 @@  extern void GOMP_PLUGIN_target_rev (uint64_t, uint64_t, uint64_t, uint64_t,
 
 /* Prototypes for functions implemented by libgomp plugins.  */
 extern const char *GOMP_OFFLOAD_get_name (void);
+extern const char *GOMP_OFFLOAD_get_uid (int);
 extern unsigned int GOMP_OFFLOAD_get_caps (void);
 extern int GOMP_OFFLOAD_get_type (void);
 extern int GOMP_OFFLOAD_get_num_devices (unsigned int);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 089393846d1..f3ecd95b377 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1387,6 +1387,7 @@  struct gomp_device_descr
 
   /* The name of the device.  */
   const char *name;
+  const char *uid;
 
   /* Capabilities of device (supports OpenACC, OpenMP).  */
   unsigned int capabilities;
@@ -1399,6 +1400,7 @@  struct gomp_device_descr
 
   /* Function handlers.  */
   __typeof (GOMP_OFFLOAD_get_name) *get_name_func;
+  __typeof (GOMP_OFFLOAD_get_uid) *get_uid_func;
   __typeof (GOMP_OFFLOAD_get_caps) *get_caps_func;
   __typeof (GOMP_OFFLOAD_get_type) *get_type_func;
   __typeof (GOMP_OFFLOAD_get_num_devices) *get_num_devices_func;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 7c2345eb29b..0023d3e1b6d 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -443,6 +443,14 @@  GOMP_5.1.3 {
 	omp_get_interop_rc_desc_;
 } GOMP_5.1.2;
 
+GOMP_6.0 {
+  global:
+	omp_get_device_from_uid;
+	omp_get_uid_from_device;
+	omp_get_uid_from_device_;
+	omp_get_uid_from_device_8_;
+} GOMP_5.1.3;
+
 OACC_2.0 {
   global:
 	acc_get_num_devices;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e8003df6f02..e072d88bba9 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -582,7 +582,7 @@  Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item @code{omp_is_free_agent} and @code{omp_ancestor_is_free_agent} routines
       @tab N @tab
 @item @code{omp_get_device_from_uid} and @code{omp_get_uid_from_device} routines
-      @tab N @tab
+      @tab Y @tab
 @item @code{omp_get_device_num_teams}, @code{omp_set_device_num_teams},
       @code{omp_get_device_teams_thread_limit}, and
       @code{omp_set_device_teams_thread_limit} routines @tab N @tab
@@ -1675,12 +1675,12 @@  They have C linkage and do not throw exceptions.
 @menu
 * omp_get_num_procs::           Number of processors online
 @c * omp_get_max_progress_width:: <fixme>/TR11
-@c * omp_get_device_from_uid::  <fixme>/TR13
-@c * omp_get_uid_from_device::  <fixme>/TR13
 * omp_set_default_device::      Set the default device for target regions
 * omp_get_default_device::      Get the default device for target regions
 * omp_get_num_devices::         Number of target devices
 * omp_get_device_num::          Get device that current thread is running on
+* omp_get_device_from_uid::     Obtain the device number to a unique id
+* omp_get_uid_from_device::     Obtain the unique id of a device
 * omp_is_initial_device::       Whether executing on the host device
 * omp_get_initial_device::      Device number of host device
 @c * omp_get_device_num_teams::  <fixme>/TR13
@@ -1830,6 +1830,71 @@  as required since OpenMP 5.0.
 
 
 
+@node omp_get_device_from_uid
+@subsection @code{omp_get_device_from_uid} -- Obtain the device number to a unique id
+@table @asis
+@item @emph{Description}:
+This function returns the device number associated with the passed
+unique-identifier (UID) string.  If no device with this UID is available, the value
+@code{omp_invalid_device} is returned.  The effect of running this routine in a
+@code{target} region is unspecified.
+
+GCC treats the UID string as case sensitive; for the initial device, GCC currently
+only accepts the value @code{OMP_INITIAL_DEVICE} and returns for it the value
+of @code{omp_initial_device}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_from_uid(const char *uid);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_from_uid(uid)}
+@item                   @tab @code{character(len=*), intent(in) :: uid}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_uid_from_device}, @ref{Offload-Target Specifics}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.7
+@end table
+
+
+
+@node omp_get_uid_from_device
+@subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device
+@table @asis
+@item @emph{Description}:
+This function returns a pointer to a string that represents a unique identifier
+(UID) for the device specified by @var{device_num}.  It returns a @code{NULL} (C/C++)
+or a disassociated pointer (Fortran) for @code{omp_invalid_device}.  The effect of
+running this routine in a @code{target} region is unspecified.
+
+For the initial device, GCC currently returns the value @code{OMP_INITIAL_DEVICE}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{const char *omp_get_uid_from_device(int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{character(:) function omp_get_uid_from_device(device_num)}
+@item                   @tab @code{pointer :: omp_get_uid_from_device}
+@item                   @tab @code{integer, intent(in) :: device_num}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_device_from_uid}, @ref{Offload-Target Specifics}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.8
+@end table
+
+
+
 @node omp_is_initial_device
 @subsection @code{omp_is_initial_device} -- Whether executing on the host device
 @table @asis
@@ -6517,6 +6582,10 @@  The implementation remark:
       @code{omp_thread_mem_alloc}, all use low-latency memory as first
       preference, and fall back to main graphics memory when the low-latency
       pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routine, is the
+      value returned by the HSA runtime library for @code{HSA_AMD_AGENT_INFO_UUID}.
+      For GPUs, it is currently @samp{GPU-} followed by 16 lower-case hex digits.
+      The output matches the one used by @code{rocminfo}.
 @end itemize
 
 
@@ -6604,6 +6673,12 @@  The implementation remark:
       @code{omp_thread_mem_alloc}, all use low-latency memory as first
       preference, and fall back to main graphics memory when the low-latency
       pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routine, consists
+      of the @samp{GPU-} prefix followed by the 16-bytes UUID as returned by
+      the CUDA runtime library.  This UUID is output in grouped lower-case
+      hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
+      4 digits, hyphen, 4 digits, hyphen, 16 digits.  The output matches the
+      format used by @code{nvidia-smi}.
 @end itemize
 
 
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 4ce790833ed..04aae8b51a3 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -425,6 +425,9 @@  extern const char *omp_get_interop_type_desc (const omp_interop_t,
 extern const char *omp_get_interop_rc_desc (const omp_interop_t,
 					    omp_interop_rc_t) __GOMP_NOTHROW;
 
+extern int omp_get_device_from_uid (const char *) __GOMP_NOTHROW;
+extern const char *omp_get_uid_from_device (int) __GOMP_NOTHROW;
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 1861c40266b..360352c5a07 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -1003,6 +1003,29 @@ 
           end function omp_get_interop_rc_desc
         end interface
 
+        interface
+          ! Note: In gfortran, strings are \0 terminated
+          integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+            use iso_c_binding
+            character(c_char), intent(in) :: uid(*)
+          end function omp_get_device_from_uid
+        end interface
+
+        interface omp_get_uid_from_device
+          ! Deviation from OpenMP 6.0: VALUE added.
+          character(:) function omp_get_uid_from_device (device_num)
+            use iso_c_binding
+            pointer :: omp_get_uid_from_device
+            integer(c_int32_t), intent(in), value :: device_num
+          end function omp_get_uid_from_device
+
+          character(:) function omp_get_uid_from_device_8 (device_num)
+            use iso_c_binding
+            pointer :: omp_get_uid_from_device_8
+            integer(c_int64_t), intent(in), value :: device_num
+          end function omp_get_uid_from_device_8
+        end interface omp_get_uid_from_device
+
 #if _OPENMP >= 201811
 !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
 !GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_kind, omp_lock_hint_none
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 6959f1e96c7..10038611d80 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -610,3 +610,26 @@ 
           integer (omp_interop_rc_kind), value :: ret_code
         end function omp_get_interop_rc_desc
       end interface
+
+      interface
+!       Note: In gfortran, strings are \0 terminated
+        integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+          use iso_c_binding
+          character(c_char), intent(in) :: uid(*)
+        end function omp_get_device_from_uid
+      end interface
+
+      interface omp_get_uid_from_device
+!       Deviation from OpenMP 6.0: VALUE added.
+        character(:) function omp_get_uid_from_device (device_num)
+          use iso_c_binding
+          pointer :: omp_get_uid_from_device
+          integer(c_int32_t), intent(in), value :: device_num
+        end function omp_get_uid_from_device
+
+        character(:) function omp_get_uid_from_device_8 (device_num)
+          use iso_c_binding
+          pointer :: omp_get_uid_from_device_8
+          integer(c_int64_t), intent(in), value :: device_num
+        end function omp_get_uid_from_device_8
+      end interface omp_get_uid_from_device
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 9255c1cff68..eb562ace95e 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -10,6 +10,8 @@  CUDA_ONE_CALL (cuDeviceGet)
 CUDA_ONE_CALL (cuDeviceGetAttribute)
 CUDA_ONE_CALL (cuDeviceGetCount)
 CUDA_ONE_CALL (cuDeviceGetName)
+CUDA_ONE_CALL_MAYBE_NULL (cuDeviceGetUuid)
+CUDA_ONE_CALL_MAYBE_NULL (cuDeviceGetUuid_v2)
 CUDA_ONE_CALL (cuDeviceTotalMem)
 CUDA_ONE_CALL (cuDriverGetVersion)
 CUDA_ONE_CALL (cuEventCreate)
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3d882b5ab63..bf6ad371ea2 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3316,6 +3316,22 @@  GOMP_OFFLOAD_get_name (void)
   return "gcn";
 }
 
+const char *
+GOMP_OFFLOAD_get_uid (int ord)
+{
+  char *str;
+  hsa_status_t status;
+  struct agent_info *agent = get_agent_info (ord);
+
+  /* HSA documentation states: maximally 21 characters including NUL.  */
+  str = GOMP_PLUGIN_malloc (21 * sizeof (char));
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AMD_AGENT_INFO_UUID,
+					  str);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not obtain device UUID", status);
+  return str;
+}
+
 /* Return the specific capabilities the HSA accelerator have.  */
 
 unsigned int
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 99cbcb699b3..261eb868611 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1242,6 +1242,40 @@  GOMP_OFFLOAD_get_name (void)
   return "nvptx";
 }
 
+const char *
+GOMP_OFFLOAD_get_uid (int ord)
+{
+  CUresult r;
+  CUuuid s;
+  struct ptx_device *dev = ptx_devices[ord];
+
+  if (CUDA_CALL_EXISTS (cuDeviceGetUuid_v2))
+    r = CUDA_CALL_NOCHECK (cuDeviceGetUuid_v2, &s, dev->dev);
+  else if (CUDA_CALL_EXISTS (cuDeviceGetUuid))
+    r = CUDA_CALL_NOCHECK (cuDeviceGetUuid, &s, dev->dev);
+  else
+    r = CUDA_ERROR_NOT_FOUND;
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetUuid error: %s", cuda_error (r));
+
+  size_t len = strlen ("GPU-12345678-9abc-defg-hijk-lmniopqrstuv");
+  char *str = (char *) GOMP_PLUGIN_malloc (len + 1);
+  sprintf (str,
+	   "GPU-%02x" "%02x" "%02x" "%02x"
+	   "-%02x" "%02x"
+	   "-%02x" "%02x"
+	   "-%02x" "%02x" "%02x" "%02x" "%02x" "%02x" "%02x" "%02x",
+	   (unsigned char) s.bytes[0], (unsigned char) s.bytes[1],
+	   (unsigned char) s.bytes[2], (unsigned char) s.bytes[3],
+	   (unsigned char) s.bytes[4], (unsigned char) s.bytes[5],
+	   (unsigned char) s.bytes[6], (unsigned char) s.bytes[7],
+	   (unsigned char) s.bytes[8], (unsigned char) s.bytes[9],
+	   (unsigned char) s.bytes[10], (unsigned char) s.bytes[11],
+	   (unsigned char) s.bytes[12], (unsigned char) s.bytes[13],
+	   (unsigned char) s.bytes[14], (unsigned char) s.bytes[15]);
+  return str;
+}
+
 unsigned int
 GOMP_OFFLOAD_get_caps (void)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index 47ec36928a6..fe7879b3741 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -51,6 +51,9 @@ 
 #define splay_tree_c
 #include "splay-tree.h"
 
+/* Used by omp_get_device_from_uid / omp_get_uid_from_device for the host.  */
+static char *str_omp_initial_device = "OMP_INITIAL_DEVICE";
+#define STR_OMP_DEV_PREFIX "OMP_DEV_"
 
 typedef uintptr_t *hash_entry_type;
 static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
@@ -5223,6 +5226,56 @@  ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
 
+static const char *
+gomp_get_uid_for_device (struct gomp_device_descr *devicep, int device_num)
+{
+  if (devicep->uid)
+    return devicep->uid;
+
+  if (devicep->get_uid_func)
+    devicep->uid = devicep->get_uid_func (devicep->target_id);
+  if (!devicep->uid)
+    {
+      size_t ln = strlen (STR_OMP_DEV_PREFIX) + 10 + 1;
+      char *uid;
+      uid = gomp_malloc (ln);
+      snprintf (uid, ln, "%s%d", STR_OMP_DEV_PREFIX, device_num);
+      devicep->uid = uid;
+    }
+  return devicep->uid;
+}
+
+const char *
+omp_get_uid_from_device (int device_num)
+{
+  if (device_num < omp_initial_device || device_num > gomp_get_num_devices ())
+    return NULL;
+
+  if (device_num == omp_initial_device || device_num == gomp_get_num_devices ())
+    return str_omp_initial_device;
+
+  struct gomp_device_descr *devicep = resolve_device (device_num, false);
+  if (devicep == NULL)
+    return NULL;
+  return gomp_get_uid_for_device (devicep, device_num);
+}
+
+int
+omp_get_device_from_uid (const char *uid)
+{
+  if (uid == NULL)
+    return omp_invalid_device;
+  if (strcmp (uid, str_omp_initial_device) == 0)
+    return omp_initial_device;
+  for (int dev = 0; dev < gomp_get_num_devices (); dev++)
+    if (strcmp (uid, gomp_get_uid_for_device (&devices[dev], dev)) == 0)
+      return dev;
+  return omp_invalid_device;
+}
+
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
+
 #ifdef PLUGIN_SUPPORT
 
 /* This function tries to load a plugin for DEVICE.  Name of plugin is passed
@@ -5264,6 +5317,7 @@  gomp_load_plugin_for_device (struct gomp_device_descr *device,
     }
 
   DLSYM (get_name);
+  DLSYM_OPT (get_uid, get_uid);
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
@@ -5449,6 +5503,8 @@  gomp_target_init (void)
 		  }
 
 		current_device.name = current_device.get_name_func ();
+		/* Defer UID setting until needed + after gomp_init_device.  */
+	        current_device.uid = NULL;
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
diff --git a/libgomp/testsuite/libgomp.c/device_uid.c b/libgomp/testsuite/libgomp.c/device_uid.c
new file mode 100644
index 00000000000..0412d06f615
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/device_uid.c
@@ -0,0 +1,38 @@ 
+#include <stdlib.h>
+#include <string.h>
+#include <omp.h>
+
+int main()
+{
+  const char **strs = (const char **) malloc (sizeof (char*) * (omp_get_num_devices () + 1));
+  for (int i = omp_invalid_device - 1; i <= omp_get_num_devices () + 1; i++)
+    {
+      const char *str = omp_get_uid_from_device (i);
+      int dev = omp_get_device_from_uid (str);
+// __builtin_printf("%i -> %s -> %d\n", i, str, dev);
+      if (i < omp_initial_device || i > omp_get_num_devices ())
+	{
+	  if (dev != omp_invalid_device || str != NULL)
+	    abort ();
+	  continue;
+	}
+      if (i == omp_initial_device || i == omp_get_num_devices ())
+	{
+	  if ((dev != omp_initial_device && dev != omp_get_num_devices ())
+	      || str == NULL
+	      || strcmp (str, "OMP_INITIAL_DEVICE") != 0) /* GCC impl. choice */
+	    abort ();
+	  dev = omp_get_num_devices ();
+	}
+      else if (dev != i || str == NULL || str[0] == '\0')
+	abort ();
+      strs[dev] = str;
+    }
+
+  for (int i = 0; i < omp_get_num_devices (); i++)
+    for (int j = i + 1; j <= omp_get_num_devices (); j++)
+      if (strcmp (strs[i], strs[j]) == 0)
+	abort ();
+  free (strs);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/device_uid.f90 b/libgomp/testsuite/libgomp.fortran/device_uid.f90
new file mode 100644
index 00000000000..5104984f55e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/device_uid.f90
@@ -0,0 +1,42 @@ 
+program main
+  use omp_lib
+  implicit none (type, external)
+  integer :: i, j, dev
+  character(:), pointer :: str
+  type t
+    character(:), pointer :: str
+  end type t
+  type(t), allocatable :: strs(:)
+
+  allocate(strs(0:omp_get_num_devices ()))
+
+  do i = omp_invalid_device - 1, omp_get_num_devices () + 1
+    str => omp_get_uid_from_device (i)
+    dev = omp_get_device_from_uid (str);
+!print *, i, str, dev
+    if (i < omp_initial_device .or. i > omp_get_num_devices ()) then
+      if (dev /= omp_invalid_device .or. associated(str)) &
+        stop 1
+      cycle
+    end if
+    if (.not. associated(str)) &
+      stop 2
+    if (i == omp_initial_device .or. i == omp_get_num_devices ()) then
+      if ((dev /= omp_initial_device .and. dev /= omp_get_num_devices ()) &
+          .or. str /= "OMP_INITIAL_DEVICE") & ! /* GCC impl. choice */
+       stop 3
+      dev = omp_get_num_devices ()
+    else if (dev /= i .or. len(str) == 0) then
+      stop 4
+    end if
+    strs(dev)%str => str
+  end do 
+
+  do i = 0, omp_get_num_devices () - 1
+    do j = i + 1, omp_get_num_devices ()
+      if (strs(i)%str == strs(j)%str) &
+        stop 5
+    end do
+  end do
+  deallocate (strs)
+end