On Tue, Jun 07, 2022 at 12:21:25AM +0200, Ahmed Sayed Mousse via Gcc-patches wrote:
> This patch is the initial implementation of OpenMP-API specs book section
> 20.5.5 with title "Thread Handles"
>
> libgomp/ChangeLog
>
> 2022-05-06 Ahmed Sayed <ahmedsayedmousse@gmail.com>
Two spaces should separate the date and name and name and email.
>
> * Makefile.am (libgompd_la_SOURCES): Add ompd-threads.c.
>
> * Makefile.in: Regenerate.
>
No empty lines in between (and all ChangeLog lines start with a tab (I
assume your mailer ate that).
> * ompd-support.h ( gompd_thread_initial_tls_bias ): New Variable.
No spaces after ( or before )
> * ompd-support.c ( gompd_thread_initial_tls_bias ): New Variable.
>
> ( gompd_load ): ( gompd_thread_initial_tls_bias ): Initialized with
> &gomp_tls_data - pthread_self ().
It is just gompd_load you are changing, so it should be:
(gompd_load): Initialize gompd_thread_initial_tls_bias.
or so.
> --- a/libgomp/ompd-support.c
> +++ b/libgomp/ompd-support.c
> @@ -36,6 +36,10 @@
> const char **ompd_dll_locations = NULL;
> __UINT64_TYPE__ gompd_state;
>
> +#if (defined HAVE_TLS || defined USE_EMUTLS)
> +__UINT64_TYPE__ gompd_thread_initial_tls_bias;
In reality it isn't these conditions, but
#ifdef GOMP_NEEDS_THREAD_HANDLE that determines if there is
a TLS bias possible.
But the point of those gompd_sizeof* and gompd_access* vars
was to make libgompd slightly more independent from the exact
libgomp version, otherwise one could just use sizeof and offsetof
values directly in libgompd.
So, even using similar ifdefs on the libgompd side looks wrong,
the var should be there unconditionally and just use some special
value (e.g. -1 which isn't a possible TLS bias because the
struct has some alignment requirements) to say that the TLS bias
can't be used and one needs to use struct gomp_thread's handle
member instead.
Also, as I mentioned yesterday, using __UINT64_TYPE__ for everything
is very vasteful, use the right type for each information.
As for TLS bias, in reality it will be up to +- a few hundreds of bytes,
worst case kilobytes, but in theory it could be on 64-bit targets even
larger than 4GB, but on 32-bit arches it can't, so size_t would
be the right type. Except I think the interfaces don't cover size_t size,
but long would be a usable replacement (not the same thing size-wise on
Windows, but Windows will always GOMP_NEEDS_THREAD_HANDLE).
> +#endif
> +
> void
> gompd_load (void)
> {
> @@ -61,7 +65,11 @@ gompd_load (void)
> = (__UINT64_TYPE__) & (((struct gomp_thread *) NULL)->handle);
> __UINT64_TYPE__ gompd_sizeof_gomp_thread_handle
> = sizeof (((struct gomp_thread *) NULL)->handle);
There is a preexisting bug above:
#ifdef GOMP_NEEDS_THREAD_HANDLE
__UINT64_TYPE__ gompd_access_gomp_thread_handle
= (__UINT64_TYPE__) & (((struct gomp_thread *) NULL)->handle);
__UINT64_TYPE__ gompd_sizeof_gomp_thread_handle
= sizeof (((struct gomp_thread *) NULL)->handle);
just defines automatic variables in the function and sets them to
those values. They need to be global vars, ideally const
initialized at file scope. But, as the field is sometimes present
and sometimes it is not, I think best would be to initialize
it to offsetof/sizeof #ifdef GOMP_NEEDS_THREAD_HANDLE and
otherwise to 0 and 0.
Then we even don't need a magic value or when TLS bias can't be used
and instead always GET_VALUE of gompd_sizeof_gomp_thread_handle,
if it is 0, then use TLS bias, otherwise load the handle.
Again, comment more about the already committed patch now, besides
trying to shrink the values from __UINT64_TYPE__ to probably short int
and making them const and initialized at file scope initializers and
using offsetof, there is a big question when do we expect OMPD to work.
Seems the gompd_{sizeof,access}* symbols aren't exported from the
library, so they are present (say on ELF) just in .symtab/.strtab
sections and debug info. Those sections can be stripped or stripped to
file, so that would mean OMPD would work only if the libgomp.so.1 library
is not stripped or has separate debug info installed.
Also, if one builds the library with LTO, I think the linker with the
compiler will happily remove all those symbols, as nothing uses them.
To fix this latter thing, one can just add __attribute__((used)) to
all those vars.
But if we want to make those work somehow even without debug info
and .symtab/.strtab sections around, I think we want to force the
symbols into .dynsym/.dynstr too (i.e. export in libgomp.map).
Exporting dozens of such symbols would be quite costly though.
So if we go that route, I think it would be best if we had just
1-2 of such variables with data for libgompd (probably 2 where
one is const and can be in .rodata and the other for vars that might need
changing). As most if not all of the const data can be represented in
unsigned short, I think it should be an array of const unsigned short,
with macros that say what each element means and those macros we'd just keep
frozen as an unchangeable ABI between the libgomp and libgompd libraries.
Ideally, [0] element of the array would be some kind of version which
libgompd initialization can check and punt if the version is unexpected
(in theory in the future libgompd could support multiple versions etc.).
> + #elif (defined HAVE_TLS || defined USE_EMUTLS)
> + gompd_thread_initial_tls_bias = (__UINT64_TYPE__) \
> + (&gomp_tls_data - pthread_self ());
I think you should add casts, (char *) &gomp_tls_data - (char *) pthread_self ()
But, sure, this difference is not constant and so needs to be in the other,
.data variable if the above notes are incorporated into a later patch,
for now just that the gompd_thread_initial_tls_bias var can't be const.
> +ompd_rc_t
> +ompd_get_thread_in_parallel (ompd_parallel_handle_t *parallel_handle, int
> + thread_num, ompd_thread_handle_t **thread_handle)
> +{
...
> + ompd_word_t team_size_var;
> + ret = ompd_get_icv_from_scope ((void *) parallel_handle, ompd_scope_parallel,
> + gompd_icv_team_size_var, &team_size_var);
> + if (ret != ompd_rc_ok)
> + return ret;
> + if (thread_num < 0 || thread_num >= team_size_var)
> + return ompd_rc_bad_input;
Does ompd_get_icv_from_scope with gompd_icv_team_size_var DTRT?
I.e. read the nthreads var from struct gomp_team if the parallel handle
represents a parallel with corresponding struct gomp_team, otherwise (if it
represents an implicit parallel, return 1)? See how
omp_get_team_size (0) aka omp_get_num_threads () is implemented on the
library side.
Jakub
@@ -94,7 +94,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-threads.c
include $(top_srcdir)/plugin/Makefrag.am
@@ -233,7 +233,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
oacc-target.lo ompd-support.lo $(am__objects_1)
libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
-am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo
+am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo \
+ ompd-threads.lo
libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
AM_V_P = $(am__v_P_@AM_V@)
am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -583,7 +584,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c $(am__append_7)
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-threads.c
# Nvidia PTX OpenACC plugin.
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -801,6 +802,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-icv.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-init.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-support.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-threads.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
@@ -36,6 +36,10 @@
const char **ompd_dll_locations = NULL;
__UINT64_TYPE__ gompd_state;
+#if (defined HAVE_TLS || defined USE_EMUTLS)
+__UINT64_TYPE__ gompd_thread_initial_tls_bias;
+#endif
+
void
gompd_load (void)
{
@@ -61,7 +65,11 @@ gompd_load (void)
= (__UINT64_TYPE__) & (((struct gomp_thread *) NULL)->handle);
__UINT64_TYPE__ gompd_sizeof_gomp_thread_handle
= sizeof (((struct gomp_thread *) NULL)->handle);
+ #elif (defined HAVE_TLS || defined USE_EMUTLS)
+ gompd_thread_initial_tls_bias = (__UINT64_TYPE__) \
+ (&gomp_tls_data - pthread_self ());
#endif
+
gomp_debug (2, "OMP OMPD active\n");
static const char *ompd_dll_locations_array[2]
= {"libgompd" SONAME_SUFFIX (1) , NULL};
@@ -69,6 +69,10 @@
void gompd_load (void);
extern __UINT64_TYPE__ gompd_state;
+#if (defined HAVE_TLS || defined USE_EMUTLS)
+extern __UINT64_TYPE__ gompd_thread_initial_tls_bias;
+#endif
+
#define OMPD_ENABLED 0x1
#define GOMPD_FOREACH_ACCESS(gompd_access) \
new file mode 100644
Binary files /dev/null and b/libgomp/ompd-support.h.gch differ
new file mode 100644
@@ -0,0 +1,310 @@
+/* Copyright (C) The GNU Toolchain Authors.
+ Contributed by Ahmed Sayed <ahmedsayedmousse@gmail.com>.
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* This file contains the implementation of functions defined in
+ Section 5.5 ThreadHandles. */
+
+
+#include "ompd-helper.h"
+
+ompd_rc_t
+ompd_get_thread_in_parallel (ompd_parallel_handle_t *parallel_handle, int
+ thread_num, ompd_thread_handle_t **thread_handle)
+{
+
+ if (parallel_handle == NULL)
+ return ompd_rc_stale_handle;
+ if (parallel_handle->ah == NULL)
+ return ompd_rc_stale_handle;
+
+ ompd_address_space_context_t *context = parallel_handle->ah->context;
+ ompd_rc_t ret;
+
+ if (!context)
+ return ompd_rc_stale_handle;
+
+ if (!callbacks)
+ return ompd_rc_callback_error;
+
+ ompd_word_t team_size_var;
+ ret = ompd_get_icv_from_scope ((void *) parallel_handle, ompd_scope_parallel,
+ gompd_icv_team_size_var, &team_size_var);
+ if (ret != ompd_rc_ok)
+ return ret;
+ if (thread_num < 0 || thread_num >= team_size_var)
+ return ompd_rc_bad_input;
+
+ ompd_size_t temp_offset, field_offset;
+ ompd_address_t temp_symbol_addr, symbol_addr = {OMPD_SEGMENT_UNSPECIFIED, 0};
+
+ const char *symbol_name = "gompd_access_gomp_thread_pool_threads";
+ ret = callbacks->symbol_addr_lookup (context, NULL, symbol_name, &symbol_addr,
+ NULL);
+ ret = callbacks->read_memory (context, NULL, &symbol_addr,
+ target_sizes.sizeof_long_long, &temp_offset);
+
+ ret = callbacks->device_to_host (context, &temp_offset,
+ target_sizes.sizeof_long_long, 1,
+ &field_offset);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ symbol_addr.address += field_offset;
+
+ ret = callbacks->read_memory (context, NULL, &symbol_addr,
+ target_sizes.sizeof_pointer,
+ &temp_symbol_addr.address);
+ ret = callbacks->device_to_host (context, &temp_symbol_addr.address,
+ target_sizes.sizeof_pointer, 1,
+ &symbol_addr.address);
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ symbol_addr.address += thread_num * target_sizes.sizeof_pointer;
+
+ ret = callbacks->read_memory (context, NULL, &symbol_addr,
+ target_sizes.sizeof_pointer,
+ &temp_symbol_addr.address);
+ ret = callbacks->device_to_host (context, &temp_symbol_addr.address,
+ target_sizes.sizeof_pointer, 1,
+ &symbol_addr.address);
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ ret = callbacks->alloc_memory (sizeof (ompd_thread_handle_t),
+ (void **) thread_handle);
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ if (symbol_addr.address == 0)
+ return ompd_rc_unsupported;
+
+ (*thread_handle)->th = symbol_addr;
+ (*thread_handle)->ah = parallel_handle->ah;
+ return ret;
+}
+
+/* The ompd_get_thread_handle function that maps a native thread to an
+ OMPD thread handle. */
+
+ompd_rc_t
+ompd_get_thread_handle (ompd_address_space_handle_t *handle,
+ ompd_thread_id_t kind, ompd_size_t sizeof_thread_id,
+ const void *thread_id,
+ ompd_thread_handle_t **thread_handle)
+{
+ if (!handle)
+ return ompd_rc_stale_handle;
+
+ if (kind != OMPD_THREAD_ID_PTHREAD)
+ return ompd_rc_unsupported;
+
+ ompd_address_space_context_t *context = handle->context;
+ ompd_thread_context_t *tcontext;
+ ompd_rc_t ret;
+
+ if (!context)
+ return ompd_rc_stale_handle;
+
+ ret = callbacks->get_thread_context_for_thread_id (context, kind,
+ sizeof_thread_id,
+ thread_id, &tcontext);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ ompd_size_t temp_symbol_size, symbol_size;
+ ompd_address_t temp_symbol_addr, symbol_addr = {OMPD_SEGMENT_UNSPECIFIED, 0};
+
+ const char *symbol_name = "gompd_sizeof_gomp_thread";
+ ret = callbacks->symbol_addr_lookup(context, NULL, symbol_name, &symbol_addr,
+ NULL);
+ ret = callbacks->read_memory(context, NULL, &symbol_addr,
+ target_sizes.sizeof_long_long,
+ &temp_symbol_size);
+ ret = callbacks->device_to_host(context, &temp_symbol_size,
+ target_sizes.sizeof_long_long, 1,
+ &symbol_size);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ ret = callbacks->symbol_addr_lookup (context, tcontext,"gomp_tls_data",
+ &symbol_addr, NULL);
+
+ /* if libgomp is offloading or using pthread_key_t then lookup can't
+ find the symbol "gomp_tls_data". */
+ if (ret == ompd_rc_error)
+ return ompd_rc_unsupported;
+
+ ret = callbacks->read_memory (context, tcontext, &symbol_addr, symbol_size,
+ &temp_symbol_addr.address);
+ ret = callbacks->device_to_host (context, &temp_symbol_addr.address,
+ symbol_size, 1, &symbol_addr.address);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ ret = callbacks->alloc_memory (sizeof(ompd_thread_handle_t), (void **)
+ (thread_handle));
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ (*thread_handle)->ah = handle;
+ (*thread_handle)->th = symbol_addr;
+ (*thread_handle)->thread_context = tcontext;
+ return ret;
+}
+
+
+ompd_rc_t
+ompd_rel_thread_handle (ompd_thread_handle_t *thread_handle)
+{
+ if (thread_handle == NULL)
+ return ompd_rc_stale_handle;
+
+ ompd_rc_t ret;
+ ret = callbacks->free_memory ((void *) thread_handle);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ return ompd_rc_ok;
+}
+
+
+/* return -1, 0 or 1 for thread_handle_1 <, == or > thread_handle_2. */
+ompd_rc_t
+ompd_thread_handle_compare (ompd_thread_handle_t *thread_handle_1,
+ ompd_thread_handle_t *thread_handle_2,
+ int *cmp_value )
+{
+
+ if (thread_handle_1 == NULL)
+ return ompd_rc_stale_handle;
+ if (thread_handle_2 == NULL)
+ return ompd_rc_stale_handle;
+ if (cmp_value == NULL)
+ return ompd_rc_bad_input;
+ if (thread_handle_1->ah->kind != thread_handle_2->ah->kind)
+ return ompd_rc_bad_input;
+ *cmp_value = thread_handle_1->th.address - thread_handle_2->th.address;
+
+ return ompd_rc_ok;
+}
+
+
+ompd_rc_t
+ompd_get_thread_id (ompd_thread_handle_t *thread_handle, ompd_thread_id_t kind,
+ ompd_size_t sizeof_thread_id, void *thread_id)
+{
+ if (kind != OMPD_THREAD_ID_PTHREAD)
+ return ompd_rc_unsupported;
+ if (thread_id == NULL)
+ return ompd_rc_bad_input;
+ if (!thread_handle || !thread_handle->ah)
+ return ompd_rc_stale_handle;
+
+ ompd_address_space_context_t *context = thread_handle->ah->context;
+ if (context == NULL)
+ return ompd_rc_stale_handle;
+
+ ompd_rc_t ret;
+ ompd_address_t taddr = thread_handle->th;
+ ompd_address_t temp_symbol_addr, symbol_addr = {OMPD_SEGMENT_UNSPECIFIED, 0};
+ ompd_size_t temp_symbol_size, symbol_size;
+ ompd_size_t temp_offset, offset;
+
+
+ char *symbol_name = "gompd_sizeof_gomp_thread_handle";
+ ret = callbacks->symbol_addr_lookup (context, NULL, symbol_name, &symbol_addr,
+ NULL);
+ /* If symbol isn't found we check for the optimized version. */
+ if (ret == ompd_rc_error)
+ goto try_bias;
+ ret = callbacks->read_memory (context, NULL, &symbol_addr,
+ target_sizes.sizeof_long_long,
+ &temp_symbol_size);
+ ret = callbacks->device_to_host (context, &temp_symbol_size,
+ target_sizes.sizeof_long_long, 1,
+ &symbol_size);
+ if (ret != ompd_rc_ok)
+ return ret;
+ if (sizeof_thread_id != symbol_size)
+ return ompd_rc_bad_input;
+
+ symbol_name = "gompd_access_gomp_thread_handle";
+ ret = callbacks->symbol_addr_lookup (context, NULL, symbol_name, &symbol_addr,
+ NULL);
+ ret = callbacks->read_memory (context, NULL, &symbol_addr, symbol_size,
+ &temp_offset);
+ ret = callbacks->device_to_host (context, &temp_offset, symbol_size, 1,
+ &offset);
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ taddr.address += offset;
+ ret = callbacks->read_memory (context, NULL, &taddr,
+ target_sizes.sizeof_long_long, thread_id);
+ return ret;
+
+try_bias:
+ symbol_name = "gompd_thread_initial_tls_bias";
+ ret = callbacks->symbol_addr_lookup (context, NULL, symbol_name, &symbol_addr,
+ NULL);
+ if (ret == ompd_rc_error)
+ return ompd_rc_unsupported;
+
+ ret = callbacks->read_memory (context, NULL, &symbol_addr,
+ target_sizes.sizeof_long_long, &temp_offset);
+ ret = callbacks->device_to_host (context, &temp_offset,
+ target_sizes.sizeof_long_long, 1, &offset);
+
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ ret = callbacks->symbol_addr_lookup (context, NULL,"gomp_tls_data",
+ &symbol_addr, NULL);
+ ret = callbacks->device_to_host (context, &temp_symbol_addr.address,
+ target_sizes.sizeof_long_long, 1,
+ &symbol_addr.address);
+ if (ret != ompd_rc_ok)
+ return ret;
+
+ taddr.address = symbol_addr.address + offset;
+ ret = callbacks->read_memory (context, NULL, &taddr,
+ target_sizes.sizeof_long_long, thread_id);
+ return ret;
+}
+
+
+/* OMPD doesn't support GPUs for now. */
+ompd_rc_t ompd_get_device_from_thread (ompd_thread_handle_t *thread_handle,
+ ompd_address_space_handle_t **device)
+{
+ if (thread_handle == NULL)
+ return ompd_rc_stale_handle;
+ return ompd_rc_unsupported;
+}