plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Thomas Schwinge wrote:
> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
> its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does
> handle the error condition:
Well, it does not really handle it: It also just calls 'fatal',
admittedly after unlocking but if the program dies it does not
really make a difference.
However, there are known issues with libgomp-plugin calling FATAL,
which does not really trigger the FATAL part in libgomp, cf.
https://gcc.gnu.org/PR109664
* * *
The attached patch now defers the error handling to a bit later,
either still in libgomp-plugin or in one case it even propagates
the error through to a user-called routine.
Comments before I commit it?
Tobias
Comments
Hi Tobias!
On 2024-12-03T11:28:19+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> Thomas Schwinge wrote:
>> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
>> its users); the real user of ['GOMP_OFFLOAD_openacc_async_construct'] does
>> handle the error condition:
>
> Well, it does not really handle it: It also just calls 'fatal',
> admittedly after unlocking but if the program dies it does not
> really make a difference.
It makes a very big difference: without unlocking, the process deadlocks.
> However, there are known issues with libgomp-plugin calling FATAL,
> which does not really trigger the FATAL part in libgomp, cf.
> https://gcc.gnu.org/PR109664
Right.
> The attached patch now defers the error handling to a bit later,
> either still in libgomp-plugin or in one case it even propagates
> the error through to a user-called routine.
>
> Comments before I commit it?
Looks good to me, one incremental step forward, thanks!
(I see there are more missing-unlocking issues in the GCN
'GOMP_OFFLOAD_openacc_async_construct', and not only there; that's for
another day.)
Grüße
Thomas
> plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct
>
> Follow up to r15-5392-g884637b6362391. As the name implies,
> GOMP_OFFLOAD_openacc_async_construct is also externally called.
> Hence, partially revert previous commit to permit unlocking handling
> in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.
>
> Hence, also the other (indirect) callers had to be updated:
> GOMP_OFFLOAD_dev2dev fails now with 'false' and
> GOMP_OFFLOAD_async_run fatally.
>
> libgomp/ChangeLog:
>
> * plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
> Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
> (GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
> partially reverting r15-5392.
>
> libgomp/plugin/plugin-gcn.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> index d26b93657bf..239acf8cb75 100644
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
> {
> struct agent_info *agent = get_agent_info (device);
> maybe_init_omp_async (agent);
> + if (!agent->omp_async_queue)
> + return false;
> queue_push_copy (agent->omp_async_queue, dst, src, n);
> return true;
> }
> @@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
> }
>
> maybe_init_omp_async (agent);
> + if (!agent->omp_async_queue)
> + GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
> queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
> queue_push_callback (agent->omp_async_queue,
> GOMP_PLUGIN_target_task_completion, async_data);
> @@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
> gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
> }
>
> -/* Create a new asynchronous thread and queue for running future kernels;
> - issues a fatal error if the queue cannot be created as all callers expect
> - that the queue exists. */
> +/* Create a new asynchronous thread and queue for running future kernels. */
>
> struct goacc_asyncqueue *
> GOMP_OFFLOAD_openacc_async_construct (int device)
> @@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
>
> if (pthread_mutex_init (&aq->mutex, NULL))
> {
> - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
> + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
> return NULL;
> }
> if (pthread_cond_init (&aq->queue_cond_in, NULL))
> {
> - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> return NULL;
> }
> if (pthread_cond_init (&aq->queue_cond_out, NULL))
> {
> - GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> + GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> return NULL;
> }
>
plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct
Follow up to r15-5392-g884637b6362391. As the name implies,
GOMP_OFFLOAD_openacc_async_construct is also externally called.
Hence, partially revert previous commit to permit unlocking handling
in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.
Hence, also the other (indirect) callers had to be updated:
GOMP_OFFLOAD_dev2dev fails now with 'false' and
GOMP_OFFLOAD_async_run fatally.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
(GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
partially reverting r15-5392.
libgomp/plugin/plugin-gcn.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
@@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
{
struct agent_info *agent = get_agent_info (device);
maybe_init_omp_async (agent);
+ if (!agent->omp_async_queue)
+ return false;
queue_push_copy (agent->omp_async_queue, dst, src, n);
return true;
}
@@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
}
maybe_init_omp_async (agent);
+ if (!agent->omp_async_queue)
+ GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
queue_push_callback (agent->omp_async_queue,
GOMP_PLUGIN_target_task_completion, async_data);
@@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
}
-/* Create a new asynchronous thread and queue for running future kernels;
- issues a fatal error if the queue cannot be created as all callers expect
- that the queue exists. */
+/* Create a new asynchronous thread and queue for running future kernels. */
struct goacc_asyncqueue *
GOMP_OFFLOAD_openacc_async_construct (int device)
@@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
if (pthread_mutex_init (&aq->mutex, NULL))
{
- GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
+ GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
return NULL;
}
if (pthread_cond_init (&aq->queue_cond_in, NULL))
{
- GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+ GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
return NULL;
}
if (pthread_cond_init (&aq->queue_cond_out, NULL))
{
- GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+ GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
return NULL;
}