plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally)

Message ID f82b292f-dc54-49df-9d85-b321233572f8@baylibre.com
State Accepted
Headers
Series plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct (was: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally) |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Tobias Burnus Dec. 3, 2024, 10:28 a.m. UTC
  Thomas Schwinge wrote:
> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
> its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does
> handle the error condition:

Well, it does not really handle it: It also just calls 'fatal',
admittedly after unlocking but if the program dies it does not
really make a difference.

However, there are known issues with libgomp-plugin calling FATAL,
which does not really trigger the FATAL part in libgomp, cf.
https://gcc.gnu.org/PR109664

* * *

The attached patch now defers the error handling to a bit later,
either still in libgomp-plugin or in one case it even propagates
the error through to a user-called routine.

Comments before I commit it?

Tobias
  

Comments

Thomas Schwinge Dec. 4, 2024, 9:56 a.m. UTC | #1
Hi Tobias!

On 2024-12-03T11:28:19+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> Thomas Schwinge wrote:
>> That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
>> its users); the real user of ['GOMP_OFFLOAD_openacc_async_construct'] does
>> handle the error condition:
>
> Well, it does not really handle it: It also just calls 'fatal',
> admittedly after unlocking but if the program dies it does not
> really make a difference.

It makes a very big difference: without unlocking, the process deadlocks.

> However, there are known issues with libgomp-plugin calling FATAL,
> which does not really trigger the FATAL part in libgomp, cf.
> https://gcc.gnu.org/PR109664

Right.

> The attached patch now defers the error handling to a bit later,
> either still in libgomp-plugin or in one case it even propagates
> the error through to a user-called routine.
>
> Comments before I commit it?

Looks good to me, one incremental step forward, thanks!

(I see there are more missing-unlocking issues in the GCN
'GOMP_OFFLOAD_openacc_async_construct', and not only there; that's for
another day.)


Grüße
 Thomas


> plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct
>
> Follow up to r15-5392-g884637b6362391. As the name implies,
> GOMP_OFFLOAD_openacc_async_construct is also externally called.
> Hence, partially revert previous commit to permit unlocking handling
> in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.
>
> Hence, also the other (indirect) callers had to be updated:
> GOMP_OFFLOAD_dev2dev fails now with 'false' and
> GOMP_OFFLOAD_async_run fatally.
>
> libgomp/ChangeLog:
>
> 	* plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
> 	Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
> 	(GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
> 	partially reverting r15-5392.
>
>  libgomp/plugin/plugin-gcn.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> index d26b93657bf..239acf8cb75 100644
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -3939,6 +3939,8 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
>      {
>        struct agent_info *agent = get_agent_info (device);
>        maybe_init_omp_async (agent);
> +      if (!agent->omp_async_queue)
> +	return false;
>        queue_push_copy (agent->omp_async_queue, dst, src, n);
>        return true;
>      }
> @@ -4350,6 +4352,8 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
>      }
>  
>    maybe_init_omp_async (agent);
> +  if (!agent->omp_async_queue)
> +    GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
>    queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
>    queue_push_callback (agent->omp_async_queue,
>  		       GOMP_PLUGIN_target_task_completion, async_data);
> @@ -4388,9 +4392,7 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
>    gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
>  }
>  
> -/* Create a new asynchronous thread and queue for running future kernels;
> -   issues a fatal error if the queue cannot be created as all callers expect
> -   that the queue exists.  */
> +/* Create a new asynchronous thread and queue for running future kernels.  */
>  
>  struct goacc_asyncqueue *
>  GOMP_OFFLOAD_openacc_async_construct (int device)
> @@ -4418,17 +4420,17 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
>  
>    if (pthread_mutex_init (&aq->mutex, NULL))
>      {
> -      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
> +      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
>        return NULL;
>      }
>    if (pthread_cond_init (&aq->queue_cond_in, NULL))
>      {
> -      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
>        return NULL;
>      }
>    if (pthread_cond_init (&aq->queue_cond_out, NULL))
>      {
> -      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
>        return NULL;
>      }
>
  

Patch

plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct

Follow up to r15-5392-g884637b6362391. As the name implies,
GOMP_OFFLOAD_openacc_async_construct is also externally called.
Hence, partially revert previous commit to permit unlocking handling
in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.

Hence, also the other (indirect) callers had to be updated:
GOMP_OFFLOAD_dev2dev fails now with 'false' and
GOMP_OFFLOAD_async_run fatally.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
	Handle omp_async_queue == NULL after call tomaybe_init_omp_async.
	(GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
	partially reverting r15-5392.

 libgomp/plugin/plugin-gcn.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d26b93657bf..239acf8cb75 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3939,6 +3939,8 @@  GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n)
     {
       struct agent_info *agent = get_agent_info (device);
       maybe_init_omp_async (agent);
+      if (!agent->omp_async_queue)
+	return false;
       queue_push_copy (agent->omp_async_queue, dst, src, n);
       return true;
     }
@@ -4350,6 +4352,8 @@  GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
     }
 
   maybe_init_omp_async (agent);
+  if (!agent->omp_async_queue)
+    GOMP_PLUGIN_fatal ("Asynchronous queue initialization failed");
   queue_push_launch (agent->omp_async_queue, kernel, tgt_vars, kla);
   queue_push_callback (agent->omp_async_queue,
 		       GOMP_PLUGIN_target_task_completion, async_data);
@@ -4388,9 +4392,7 @@  GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
   gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
 }
 
-/* Create a new asynchronous thread and queue for running future kernels;
-   issues a fatal error if the queue cannot be created as all callers expect
-   that the queue exists.  */
+/* Create a new asynchronous thread and queue for running future kernels.  */
 
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (int device)
@@ -4418,17 +4420,17 @@  GOMP_OFFLOAD_openacc_async_construct (int device)
 
   if (pthread_mutex_init (&aq->mutex, NULL))
     {
-      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
+      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
       return NULL;
     }
   if (pthread_cond_init (&aq->queue_cond_in, NULL))
     {
-      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
       return NULL;
     }
   if (pthread_cond_init (&aq->queue_cond_out, NULL))
     {
-      GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
+      GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
       return NULL;
     }