OpenMP: Fix omp_get_device_from_uid, minor cleanup (was: Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines)
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
success
|
Test passed
|
Commit Message
Hi Thomas, hello all,
the attached follow-up patch does:
* It fixes an issue (thinko) related to Fortran and \0 terminated,
which fails for at least substring strings.
* Includes some minor fixes, e.g. ensuring the device is initialized
in omp_get_uid_from_device, the superfluous 'omp_', or adding some
inits to oacc-host.c.
* Now the plugins return NULL instead of failing when the UID cannot
be obtained; in that case, the fallback UID "OMP_DEV_%d" is used.
Comments or remarks before I commit it?
* * *
Regarding the topic of caching in the plugin instead of in
libgomp: If we want to change it, we either to remove the fallback
and require the existence and success of GOMP_OFFLOAD_get_uid.
Otherwise, with host fallback support, we have to cache it at both
locations, which is somehow not really sensible, either.
Thoughts on this topic?
* * *
Longer reply to Thomas' comments:
Thomas Schwinge wrote:
>> + "omp_get_uid_from_device",
> ..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
> sorted).
Ups! Should be of course without. (as 'omp_' prefix is checked before).
> Do we apparently not have test suite coverage for these things?
We do *not* test all API routines. The check is, e.g., used in
gfc_error ("%s cannot contain OpenMP API call in intervening code "
or
"OpenMP runtime API call %qD in a region with "
"%<order(concurrent)%> clause", fndecl);
And we have a few tests for each of them, but not a full set of all API routines.
* * *
>> + const char *uid;
> Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
> each call, is a minor performance optimization? (Similar to other items
> cached here, I guess.)
Yes, but it goes a bit beyond: As the pointer is returned to the user, it
has to be allocated at some point - and cached to avoid allocating more
memory when called repeatable called. As the fallback and host handling is
also done in target.c, the caching is done here.
(Besides the API routines, two env vars and one context selector for
'target_device' support the UID.)
* * *
> Please also update 'libgomp/oacc-host.c:host_dispatch'.
Done.
>> + ! Note: In gfortran, strings are \0 termined
>> + integer(c_int) function omp_get_device_from_uid(uid) bind(C)
> For my understanding: in general, Fortran strings are *not*
> NUL-terminated, right? So this is a specific properly of 'gfortran'
> and/or this GCC/OpenMP interface,
The Fortran standard leaves this to implementation, but by construction,
there is a length (however it is made handled internally, e.g. via the
declaration) and the actual data. - To aid debugging, gfortran NUL terminates
them.
However, when thinking a bit more about it, taking a substring of a
null-terminated string will not magically be \0 at the boundary of the
substring. - Thus, the simplified approach failed + a Fortran specific
function had to be added (→ fortran.c).
* * *
>> + interface omp_get_uid_from_device
>> + ! Deviation from OpenMP 6.0: VALUE added.
> (..., which I suppose you've reported to OpenMP...)
No - it is not really a bug in the standard. The OpenMP
specification tries to provide a consistent API - but it
is difficult to create an API without touching the ABI.
For the caller side, the usage is the same independent
whether there is an 'intent(in)' or VALUE attribute,
a Bind(C) with or without binding name. Or also a generic
interface with multiple specific ones - which we do to handle
-fdefault-integer-8.
Obviously, the compiler needs to know those details, but
unless users codes the interface themselves instead of
using omp.h / omp_lib.h / the omp_lib module.
Thus, that's one of the few deviation from the OpenMP
specification which does affect the ABI but not the API.
* * *
>> +GOMP_OFFLOAD_get_uid (int ord)
>> +{
> I guess I'd have just put this code into 'init_hsa_context', filling a
> new statically-sized 'uuid' field in 'hsa_context_info' (like
> 'driver_version_s'; and assuming that 'hsa_context_info' is the right
> abstraction for this), and then just return that 'uuid' from
> 'GOMP_OFFLOAD_get_uid'.
That would be one option. Still, we have to decide whether we either
want to have strictly everything handled in the device code - including
fallback handling (which could be an UID replacement or a fatal error).
Of we do part of the handling elsewhere, e.g. by permitting that the
plugin can fail or does not provide the functions, we can handle it
in target.c (as currently done) - but then we need to cache it there
as well (or at least the fallbacks).
* * *
> That way, you'd avoid the unclear semantics of
> who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon
> 'GOMP_OFFLOAD_fini_device' -- currently the memory is lost?
Well, depends what you mean by lost. The 'devices' data structure in
target.c is allocated early during device initialization and it is never
deallocated. Hence, also the current "uint" member is never deallocated
and remains until the end of the program accessible.
The variable it self has a rather high lifetime as the user might store
it and only much later use it. As a pointer to the string is is returned
to the user, the lifetime has to be rather long.
* * *
> But actually, should we just 'return NULL;' in that case, and let libgomp
> proper handle this gracefully, similar to as if 'GOMP_OFFLOAD_get_uid'
> isn't implemented at all by a plugin? (What are the expected OpenMP
> semantics?)
The spec is silent on it. Expected is that repeatable runs repeatable
such that (at least) for the non-host devices, the same always the same
UID is used for a specific device.
Most sensible is really the UUID, if available, but we can use a
fallback, if needed.
I now added a "return NULL" which clashes a bit with the topic of
caching in the plugin as now the OMP_DEV_%d string also needs to be
cached, but on the libgomp.so side.
* * *
>> + sprintf (str, "GPU-%02x" "%02x" "%02x" "%02x"
>> […]
>> + (unsigned char) s.bytes[0], (unsigned char) s.bytes[1],
BTW: without the (unsigned char), instead of, e.g, "ef" one gets
"ffffffef" which is a bit surprising at a glance.
Tobias
Comments
Now committed as r15-3799-gcdb9aa0f623ec7 /
https://gcc.gnu.org/r15-3799-gcdb9aa0f623ec7
Tobias
Am 21.09.24 um 01:33 schrieb Tobias Burnus:
> Hi Thomas, hello all,
>
> the attached follow-up patch does:
>
> * It fixes an issue (thinko) related to Fortran and \0 terminated,
> which fails for at least substring strings.
>
> * Includes some minor fixes, e.g. ensuring the device is initialized
> in omp_get_uid_from_device, the superfluous 'omp_', or adding some
> inits to oacc-host.c.
>
> * Now the plugins return NULL instead of failing when the UID cannot
> be obtained; in that case, the fallback UID "OMP_DEV_%d" is used.
>
> Comments or remarks before I commit it?
>
> * * *
>
> Regarding the topic of caching in the plugin instead of in
> libgomp: If we want to change it, we either to remove the fallback
> and require the existence and success of GOMP_OFFLOAD_get_uid.
> Otherwise, with host fallback support, we have to cache it at both
> locations, which is somehow not really sensible, either.
>
> Thoughts on this topic?
>
> * * *
>
> Longer reply to Thomas' comments:
>
> Thomas Schwinge wrote:
>
>>> + "omp_get_uid_from_device",
>> ..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
>> sorted).
>
> Ups! Should be of course without. (as 'omp_' prefix is checked before).
>
>> Do we apparently not have test suite coverage for these things?
>
> We do *not* test all API routines. The check is, e.g., used in
>
> gfc_error ("%s cannot contain OpenMP API call in intervening code "
>
> or
> "OpenMP runtime API call %qD in a region with "
> "%<order(concurrent)%> clause", fndecl);
>
> And we have a few tests for each of them, but not a full set of all
> API routines.
>
> * * *
>
>>> + const char *uid;
>> Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
>> each call, is a minor performance optimization? (Similar to other items
>> cached here, I guess.)
>
> Yes, but it goes a bit beyond: As the pointer is returned to the user, it
> has to be allocated at some point - and cached to avoid allocating more
> memory when called repeatable called. As the fallback and host
> handling is
> also done in target.c, the caching is done here.
>
> (Besides the API routines, two env vars and one context selector for
> 'target_device' support the UID.)
>
> * * *
>
>> Please also update 'libgomp/oacc-host.c:host_dispatch'.
> Done.
>>> + ! Note: In gfortran, strings are \0 termined
>>> + integer(c_int) function omp_get_device_from_uid(uid) bind(C)
>> For my understanding: in general, Fortran strings are *not*
>> NUL-terminated, right? So this is a specific properly of 'gfortran'
>> and/or this GCC/OpenMP interface,
>
> The Fortran standard leaves this to implementation, but by construction,
> there is a length (however it is made handled internally, e.g. via the
> declaration) and the actual data. - To aid debugging, gfortran NUL
> terminates
> them.
>
> However, when thinking a bit more about it, taking a substring of a
> null-terminated string will not magically be \0 at the boundary of the
> substring. - Thus, the simplified approach failed + a Fortran specific
> function had to be added (→ fortran.c).
>
> * * *
>
>>> + interface omp_get_uid_from_device
>>> + ! Deviation from OpenMP 6.0: VALUE added.
>> (..., which I suppose you've reported to OpenMP...)
>
> No - it is not really a bug in the standard. The OpenMP
> specification tries to provide a consistent API - but it
> is difficult to create an API without touching the ABI.
>
> For the caller side, the usage is the same independent
> whether there is an 'intent(in)' or VALUE attribute,
> a Bind(C) with or without binding name. Or also a generic
> interface with multiple specific ones - which we do to handle
> -fdefault-integer-8.
>
> Obviously, the compiler needs to know those details, but
> unless users codes the interface themselves instead of
> using omp.h / omp_lib.h / the omp_lib module.
>
> Thus, that's one of the few deviation from the OpenMP
> specification which does affect the ABI but not the API.
>
> * * *
>
>>> +GOMP_OFFLOAD_get_uid (int ord)
>>> +{
>> I guess I'd have just put this code into 'init_hsa_context', filling a
>> new statically-sized 'uuid' field in 'hsa_context_info' (like
>> 'driver_version_s'; and assuming that 'hsa_context_info' is the right
>> abstraction for this), and then just return that 'uuid' from
>> 'GOMP_OFFLOAD_get_uid'.
>
> That would be one option. Still, we have to decide whether we either
> want to have strictly everything handled in the device code - including
> fallback handling (which could be an UID replacement or a fatal error).
>
> Of we do part of the handling elsewhere, e.g. by permitting that the
> plugin can fail or does not provide the functions, we can handle it
> in target.c (as currently done) - but then we need to cache it there
> as well (or at least the fallbacks).
>
> * * *
>
>> That way, you'd avoid the unclear semantics of
>> who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon
>> 'GOMP_OFFLOAD_fini_device' -- currently the memory is lost?
>
> Well, depends what you mean by lost. The 'devices' data structure in
> target.c is allocated early during device initialization and it is
> never deallocated. Hence, also the current "uint" member is never
> deallocated and remains until the end of the program accessible.
>
> The variable it self has a rather high lifetime as the user might
> store it and only much later use it. As a pointer to the string is is
> returned to the user, the lifetime has to be rather long.
>
> * * *
>
>> But actually, should we just 'return NULL;' in that case, and let
>> libgomp
>> proper handle this gracefully, similar to as if 'GOMP_OFFLOAD_get_uid'
>> isn't implemented at all by a plugin? (What are the expected OpenMP
>> semantics?)
>
> The spec is silent on it. Expected is that repeatable runs repeatable
> such that (at least) for the non-host devices, the same always the
> same UID is used for a specific device.
>
> Most sensible is really the UUID, if available, but we can use a
> fallback, if needed.
>
> I now added a "return NULL" which clashes a bit with the topic of
> caching in the plugin as now the OMP_DEV_%d string also needs to be
> cached, but on the libgomp.so side.
>
> * * *
>>> + sprintf (str, "GPU-%02x" "%02x" "%02x" "%02x"
>>> […]
>>> + (unsigned char) s.bytes[0], (unsigned char) s.bytes[1],
>
> BTW: without the (unsigned char), instead of, e.g, "ef" one gets
> "ffffffef" which is a bit surprising at a glance.
>
> Tobias
OpenMP: Fix omp_get_device_from_uid, minor cleanup
In Fortran, omp_get_device_from_uid can also accept substrings, which are
then not NUL terminated. Fixed by introducing a fortran.c wrapper function.
Additionally, in case of a fail the plugin functions now return NULL instead
of failing fatally such that a fall-back UID is generated.
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
string; move get_device_from_uid as now a '_' suffix exists.
libgomp/ChangeLog:
* fortran.c (omp_get_device_from_uid_): New function.
* libgomp.map (GOMP_6.0): Add it.
* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
* omp_lib.f90.in: Make it used by removing bind(C).
* omp_lib.h.in: Likewise.
* target.c (omp_get_device_from_uid): Ensure the device is initialized.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
return NULL in case of an error.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
gcc/omp-general.cc | 4 ++--
libgomp/fortran.c | 18 ++++++++++++++++++
libgomp/libgomp.map | 1 +
libgomp/oacc-host.c | 2 ++
libgomp/omp_lib.f90.in | 5 ++---
libgomp/omp_lib.h.in | 5 ++---
libgomp/plugin/plugin-gcn.c | 8 +++++++-
libgomp/plugin/plugin-nvptx.c | 7 +++++--
libgomp/target.c | 7 +++++--
libgomp/testsuite/libgomp.fortran/device_uid.f90 | 18 ++++++++++++++++--
10 files changed, 60 insertions(+), 15 deletions(-)
@@ -3260,7 +3260,6 @@ omp_runtime_api_procname (const char *name)
"alloc",
"calloc",
"free",
- "get_device_from_uid",
"get_interop_int",
"get_interop_ptr",
"get_mapped_ptr",
@@ -3290,6 +3289,7 @@ omp_runtime_api_procname (const char *name)
"get_cancellation",
"get_default_allocator",
"get_default_device",
+ "get_device_from_uid",
"get_device_num",
"get_dynamic",
"get_initial_device",
@@ -3339,7 +3339,7 @@ omp_runtime_api_procname (const char *name)
as DECL_NAME only omp_* and omp_*_8 appear. */
"display_env",
"get_ancestor_thread_num",
- "omp_get_uid_from_device",
+ "get_uid_from_device",
"get_partition_place_nums",
"get_place_num_procs",
"get_place_proc_ids",
@@ -834,6 +834,24 @@ omp_get_interop_rc_desc_ (const char **res, size_t *res_len,
*res_len = *res ? strlen (*res) : 0;
}
+int
+omp_get_device_from_uid_ (const char *uid, size_t uid_len)
+{
+#ifndef LIBGOMP_OFFLOADED_ONLY
+ char *str = __builtin_alloca ((uid_len + 1) * sizeof (char));
+ memcpy (str, uid, uid_len * sizeof (char));
+ str[uid_len] = '\0';
+ return omp_get_device_from_uid (str);
+#else
+ /* Inside the target region, invoking this routine is undefined
+ behavior; thus, resolve it already here - instead of inside
+ libgomp/config/.../target.c.
+ Note that on nvptx __builtin_alloca is defined, but fails with a sorry
+ during compilation, as it is unsupported until isa 7.3 / sm_52. */
+ return omp_invalid_device;
+#endif
+}
+
void
omp_get_uid_from_device_ (const char **res, size_t *res_len,
int32_t device_num)
@@ -446,6 +446,7 @@ GOMP_5.1.3 {
GOMP_6.0 {
global:
omp_get_device_from_uid;
+ omp_get_device_from_uid_;
omp_get_uid_from_device;
omp_get_uid_from_device_;
omp_get_uid_from_device_8_;
@@ -263,6 +263,7 @@ host_openacc_destroy_thread_data (void *tls_data __attribute__ ((unused)))
static struct gomp_device_descr host_dispatch =
{
.name = "host",
+ .uid = NULL,
.capabilities = (GOMP_OFFLOAD_CAP_SHARED_MEM
| GOMP_OFFLOAD_CAP_NATIVE_EXEC
| GOMP_OFFLOAD_CAP_OPENACC_200),
@@ -270,6 +271,7 @@ static struct gomp_device_descr host_dispatch =
.type = OFFLOAD_TARGET_TYPE_HOST,
.get_name_func = host_get_name,
+ .get_uid_func = NULL,
.get_caps_func = host_get_caps,
.get_type_func = host_get_type,
.get_num_devices_func = host_get_num_devices,
@@ -1004,10 +1004,9 @@
end interface
interface
- ! Note: In gfortran, strings are \0 termined
- integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+ integer(c_int) function omp_get_device_from_uid (uid)
use iso_c_binding
- character(c_char), intent(in) :: uid(*)
+ character, intent(in) :: uid(*)
end function omp_get_device_from_uid
end interface
@@ -612,10 +612,9 @@
end interface
interface
-! Note: In gfortran, strings are \0 termined
- integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+ integer(c_int) function omp_get_device_from_uid (uid)
use iso_c_binding
- character(c_char), intent(in) :: uid(*)
+ character, intent(in) :: uid(*)
end function omp_get_device_from_uid
end interface
@@ -3316,6 +3316,9 @@ GOMP_OFFLOAD_get_name (void)
return "gcn";
}
+/* Return the UID; if not available return NULL.
+ Returns freshly allocated memoy. */
+
const char *
GOMP_OFFLOAD_get_uid (int ord)
{
@@ -3328,7 +3331,10 @@ GOMP_OFFLOAD_get_uid (int ord)
status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AMD_AGENT_INFO_UUID,
str);
if (status != HSA_STATUS_SUCCESS)
- hsa_fatal ("Could not obtain device UUID", status);
+ {
+ free (str);
+ return NULL;
+ }
return str;
}
@@ -1242,6 +1242,9 @@ GOMP_OFFLOAD_get_name (void)
return "nvptx";
}
+/* Return the UID; if not available return NULL.
+ Returns freshly allocated memoy. */
+
const char *
GOMP_OFFLOAD_get_uid (int ord)
{
@@ -1254,9 +1257,9 @@ GOMP_OFFLOAD_get_uid (int ord)
else if (CUDA_CALL_EXISTS (cuDeviceGetUuid))
r = CUDA_CALL_NOCHECK (cuDeviceGetUuid, &s, dev->dev);
else
- r = CUDA_ERROR_NOT_FOUND;
+ return NULL;
if (r != CUDA_SUCCESS)
- GOMP_PLUGIN_fatal ("cuDeviceGetUuid error: %s", cuda_error (r));
+ NULL;
size_t len = strlen ("GPU-12345678-9abc-defg-hijk-lmniopqrstuv");
char *str = (char *) GOMP_PLUGIN_malloc (len + 1);
@@ -5268,8 +5268,11 @@ omp_get_device_from_uid (const char *uid)
if (strcmp (uid, str_omp_initial_device) == 0)
return omp_initial_device;
for (int dev = 0; dev < gomp_get_num_devices (); dev++)
- if (strcmp (uid, gomp_get_uid_for_device (&devices[dev], dev)) == 0)
- return dev;
+ {
+ struct gomp_device_descr *devicep = resolve_device (dev, false);
+ if (strcmp (uid, gomp_get_uid_for_device (devicep, dev)) == 0)
+ return dev;
+ }
return omp_invalid_device;
}
@@ -12,7 +12,7 @@ program main
do i = omp_invalid_device - 1, omp_get_num_devices () + 1
str => omp_get_uid_from_device (i)
- dev = omp_get_device_from_uid (str);
+ dev = omp_get_device_from_uid (str)
! print *, i, str, dev
if (i < omp_initial_device .or. i > omp_get_num_devices ()) then
if (dev /= omp_invalid_device .or. associated(str)) &
@@ -30,12 +30,26 @@ program main
stop 4
end if
strs(dev)%str => str
+
+ block
+ ! Check substring handling
+ character(len=100) :: long_str
+ integer :: dev2
+ long_str = str // "ABCDEF"
+ dev2 = omp_get_device_from_uid (long_str(1:len(str)))
+ if (i == omp_initial_device .or. i == omp_get_num_devices ()) then
+ if (dev2 /= omp_initial_device .and. dev2 /= omp_get_num_devices ()) &
+ stop 5
+ else if (dev /= dev2) then
+ stop 6
+ end if
+ end block
end do
do i = 0, omp_get_num_devices () - 1
do j = i + 1, omp_get_num_devices ()
if (strs(i)%str == strs(j)%str) &
- stop 4
+ stop 7
end do
end do
deallocate (strs)