From patchwork Thu Jan 12 13:51:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge <thomas@codesourcery.com>
X-Patchwork-Id: 63096
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 90F8E38493F6
	for <patchwork@sourceware.org>; Thu, 12 Jan 2023 13:51:49 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252])
 by sourceware.org (Postfix) with ESMTPS id D5B9D3858C66
 for <gcc-patches@gcc.gnu.org>; Thu, 12 Jan 2023 13:51:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D5B9D3858C66
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.97,211,1669104000"; d="scan'208,223";a="93223915"
Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165])
 by esa4.mentor.iphmx.com with ESMTP; 12 Jan 2023 05:51:31 -0800
IronPort-SDR: 
 B0Jets+p89f/STU7ZW1NbMqBkaG/I8fN8c/P91P3D/JkNnD0aAQGC8/jVuPhw+X3ETqgZs7RHq
 eevL1F3OMSUPqnZ+oID6tS37rqCYE+EmzthQeiFrHNI52ONdUwALvseQwKqyYJ7QtjNb8OZlpZ
 CcncjUBVOD8YF07bkcDiZydDpd1Yar53zYuJGz6N1M/MANdYW5yCvTTkxNh4CQx00zwdIlBw6Q
 giz88gIC1wy4MxJroFvIc8G4CT7t0YMncXOnW9qs2XFuO0KEApcmZUpcQXKjGqDFthiF5kAPjx
 cH0=
From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>, Chung-Lin Tang <cltang@codesourcery.com>, "Tom
 de Vries" <tdevries@suse.de>
Subject: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case
 (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)
In-Reply-To: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
References: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2
 (x86_64-pc-linux-gnu)
Date: Thu, 12 Jan 2023 14:51:19 +0100
Message-ID: <87zgan6eug.fsf@euler.schwinge.homeip.net>
MIME-Version: 1.0
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10)
X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi Chung-Lin, Tom!

It's been a while:

On 2018-09-25T21:11:58+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> [...] NVPTX/CUDA-specific implementation
> of the new-style goacc_asyncqueues.

In an OpenACC 'async' setting, where the device kernel (expectedly)
crashes because of "an illegal memory access was encountered", I'm
running into a deadlock here:

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> +static void
> +cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
> +{
> +  if (res != CUDA_SUCCESS)
> +    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
> +  struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
> +  cb->fn (cb->ptr);
> +  free (ptr);
> +}
> +
> +void
> +GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *aq,
> +                                        void (*callback_fn)(void *),
> +                                        void *userptr)
> +{
> +  struct nvptx_callback *b = GOMP_PLUGIN_malloc (sizeof (*b));
> +  b->fn = callback_fn;
> +  b->ptr = userptr;
> +  b->aq = aq;
> +  CUDA_CALL_ASSERT (cuStreamAddCallback, aq->cuda_stream,
> +                 cuda_callback_wrapper, (void *) b, 0);
> +}

In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which deadlocks); that's generally problematic: per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(latest when the device is shut down at program termination), and the
non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
"nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
attached.  OK to push?

(Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
error will be caught and reported at the next host/device synchronization
point?  But I've not verified that.)


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

From b7ddcc0807967750e3c884326ed4c53c05cde81f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 12 Jan 2023 14:39:46 +0100
Subject: [PATCH] nvptx: Avoid deadlock in 'cuStreamAddCallback' callback,
 error case

When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which may deadlock); that's generally problematic: per
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483>
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(latest when the device is shut down at program termination), and the
non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error'.

	libgomp/
	* plugin/plugin-nvptx.c (cuda_callback_wrapper): Invoke
	'GOMP_PLUGIN_error' instead of 'GOMP_PLUGIN_fatal'.
---
 libgomp/plugin/plugin-nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 395639537e83..cdb3d435bdc8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1927,7 +1927,7 @@ static void
 cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
 {
   if (res != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
+    GOMP_PLUGIN_error ("%s error: %s", __FUNCTION__, cuda_error (res));
   struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
   cb->fn (cb->ptr);
   free (ptr);
-- 
2.39.0