From patchwork Fri Mar 10 14:07:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 66211 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A14853858401 for ; Fri, 10 Mar 2023 14:08:26 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 2EE483858D32 for ; Fri, 10 Mar 2023 14:08:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2EE483858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.98,249,1673942400"; d="scan'208,223";a="103869624" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 10 Mar 2023 06:08:06 -0800 IronPort-SDR: zxMk+ivjv8Ma++XQlmv2Nqcf5SXyHHCeczmMZQsNNeZ7SlHJYjV7L1hRN9tGKyol/euCMQoS4x onyLENDD7DbGyYMM5twmC/fxemRo+JNwVYAEHr6kR70diGvOCugjaWbnz5cUVdm8SDLBrrD0Dn knyqXHczIwlJoA+vUNZ7DGmUNGHF8wgRydSzSm0NqlmeFAMEPaO968wdMx4BjJPSvYINB7sXpK Fbq2hKOCSaFj6zyXv/wNVG+KWv6FtLL+oS4tZ0taNSyHMWDWSVBJnimUbP+y5gANzFGXhizXDU QPQ= From: Thomas Schwinge To: Julian Brown , CC: Andrew Stubbs Subject: Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position (was: [PATCH] [og9] OpenACC profiling support for AMD GCN) In-Reply-To: <20190906160213.69722-3-julian@codesourcery.com> References: <20190906160213.69722-1-julian@codesourcery.com> <20190906160213.69722-3-julian@codesourcery.com> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu) Date: Fri, 10 Mar 2023 15:07:58 +0100 Message-ID: <87h6usbsxt.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To svr-ies-mbx-10.mgc.mentorg.com (139.181.222.10) X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi! On 2019-09-06T09:02:13-0700, Julian Brown wrote: > This patch adds profiling support to the AMD GCN libgomp plugin, modeled > after the equivalent support in the NVPTX plugin. This gives a positive > test delta in AMD GCN offload testing. Yay! \o/ > I will apply to the openacc-gcc-9-branch shortly. ..., and later these changes got into master branch, via integration into "[PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin". > --- a/libgomp/plugin/plugin-gcn.c > +++ b/libgomp/plugin/plugin-gcn.c | static void | gomp_offload_free (void *ptr) | { | GCN_DEBUG ("Async thread ?:?: Freeing %p\n", ptr); | GOMP_OFFLOAD_free (0, ptr); | } > @@ -3046,6 +3075,35 @@ GOMP_OFFLOAD_free (int device, void *ptr) > return false; > } > > + struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread (); > + bool profiling_dispatch_p > + = __builtin_expect (thr != NULL && thr->prof_info != NULL, false); > + if (profiling_dispatch_p) > + { > + [...] > + prof_info->event_type = acc_ev_free; > + > + [...] > + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, &data_event_info, > + api_info); > + } > + > return true; > } > > @@ -3276,6 +3334,35 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs, > {1, 64, 16} > }; > > + struct goacc_thread *thr = GOMP_PLUGIN_goacc_thread (); > + acc_prof_info *prof_info = thr->prof_info; > + acc_event_info enqueue_launch_event_info; > + acc_api_info *api_info = thr->api_info; > + bool profiling_dispatch_p = __builtin_expect (prof_info != NULL, false); > + if (profiling_dispatch_p) > + { > + prof_info->event_type = acc_ev_enqueue_launch_start; > + > + [...] > + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, > + &enqueue_launch_event_info, api_info); > + } > + > if (!async) > { > run_kernel (kernel, ind_da, &kla, NULL, false); | gomp_offload_free (ind_da); | } | else | { | queue_push_launch (aq, kernel, ind_da, &kla); | if (DEBUG_QUEUES) | GCN_DEBUG ("queue_push_callback %d:%d gomp_offload_free, %p\n", > aq->agent->device_id, aq->id, ind_da); > queue_push_callback (aq, gomp_offload_free, ind_da); > } > + > + if (profiling_dispatch_p) > + { > + prof_info->event_type = acc_ev_enqueue_launch_end; > + enqueue_launch_event_info.launch_event.event_type = prof_info->event_type; > + GOMP_PLUGIN_goacc_profiling_dispatch (prof_info, > + &enqueue_launch_event_info, > + api_info); > + } > } Per that, we've currently got: - [...] - acc_ev_enqueue_launch_start - launch kernel - free memory - acc_ev_free - acc_ev_enqueue_launch_end This confused another thing that I'm working on, so I adjusted that to: - [...] - acc_ev_enqueue_launch_start - launch kernel - acc_ev_enqueue_launch_end - free memory - acc_ev_free Pushed to master branch commit 649f1939baf11f45fd3579b8b9601c7840a097b3 "Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position", see attached. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 From 649f1939baf11f45fd3579b8b9601c7840a097b3 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 2 Mar 2023 10:39:09 +0100 Subject: [PATCH] Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' position For an OpenACC compute construct, we've currently got: - [...] - acc_ev_enqueue_launch_start - launch kernel - free memory - acc_ev_free - acc_ev_enqueue_launch_end This confused another thing that I'm working on, so I adjusted that to: - [...] - acc_ev_enqueue_launch_start - launch kernel - acc_ev_enqueue_launch_end - free memory - acc_ev_free Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'. libgomp/ * plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end' position. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Verify 'acc_ev_alloc', 'acc_ev_free'. --- libgomp/plugin/plugin-gcn.c | 23 +- .../acc_prof-parallel-1.c | 202 ++++++++++++++++-- 2 files changed, 195 insertions(+), 30 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 11ce6b0fa8d..96920a48d2e 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3192,18 +3192,9 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs, } if (!async) - { - run_kernel (kernel, ind_da, &kla, NULL, false); - gomp_offload_free (ind_da); - } + run_kernel (kernel, ind_da, &kla, NULL, false); else - { - queue_push_launch (aq, kernel, ind_da, &kla); - if (DEBUG_QUEUES) - GCN_DEBUG ("queue_push_callback %d:%d gomp_offload_free, %p\n", - aq->agent->device_id, aq->id, ind_da); - queue_push_callback (aq, gomp_offload_free, ind_da); - } + queue_push_launch (aq, kernel, ind_da, &kla); if (profiling_dispatch_p) { @@ -3213,6 +3204,16 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs, &enqueue_launch_event_info, api_info); } + + if (!async) + gomp_offload_free (ind_da); + else + { + if (DEBUG_QUEUES) + GCN_DEBUG ("queue_push_callback %d:%d gomp_offload_free, %p\n", + aq->agent->device_id, aq->id, ind_da); + queue_push_callback (aq, gomp_offload_free, ind_da); + } } /* }}} */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c index 1f503861cb6..cbf23d7d83b 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c @@ -195,6 +195,139 @@ static void cb_device_init_end (acc_prof_info *prof_info, acc_event_info *event_ #endif } +static void cb_alloc (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info) +{ + DEBUG_printf ("%s\n", __FUNCTION__); + +#if DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT +# error TODO +#else + assert (state == 4 + || state == 6 + || state == 104 + || state == 106); + STATE_OP (state, ++); + + if (state == 5 + || state == 105) + { + assert (tool_info != NULL); + assert (tool_info->event_info.other_event.event_type == acc_ev_compute_construct_start); + assert (tool_info->nested != NULL); + assert (tool_info->nested->event_info.other_event.event_type == acc_ev_enter_data_start); + assert (tool_info->nested->nested == NULL); + } + else if (state == 7 + || state == 107) + { + assert (tool_info != NULL); + assert (tool_info->event_info.other_event.event_type == acc_ev_compute_construct_start); + assert (tool_info->nested == NULL); + } + else + abort (); +#endif + + assert (prof_info->event_type == acc_ev_alloc); + assert (prof_info->valid_bytes == _ACC_PROF_INFO_VALID_BYTES); + assert (prof_info->version == _ACC_PROF_INFO_VERSION); + assert (prof_info->device_type == acc_device_type); + assert (prof_info->device_number == acc_device_num); + assert (prof_info->thread_id == -1); + assert (prof_info->async == acc_async); + assert (prof_info->async_queue == prof_info->async); + assert (prof_info->src_file == NULL); + assert (prof_info->func_name == NULL); + assert (prof_info->line_no == -1); + assert (prof_info->end_line_no == -1); + assert (prof_info->func_line_no == -1); + assert (prof_info->func_end_line_no == -1); + + assert (event_info->data_event.event_type == prof_info->event_type); + assert (event_info->data_event.valid_bytes == _ACC_DATA_EVENT_INFO_VALID_BYTES); + assert (event_info->data_event.parent_construct == acc_construct_parallel); + assert (event_info->data_event.implicit == 1); + assert (event_info->data_event.tool_info == NULL); + assert (event_info->data_event.var_name == NULL); + assert (event_info->data_event.bytes != 0); + assert (event_info->data_event.host_ptr == NULL); + assert (event_info->data_event.device_ptr != NULL); + + assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES); + assert (api_info->device_type == prof_info->device_type); + assert (api_info->vendor == -1); + assert (api_info->device_handle == NULL); + assert (api_info->context_handle == NULL); + assert (api_info->async_handle == NULL); +} + +static void cb_free (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info) +{ + DEBUG_printf ("%s\n", __FUNCTION__); + +#if DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT +# error TODO +#else + assert (state == 9 + || state == 11); + STATE_OP (state, ++); + + if (state == 10) + { + assert (tool_info != NULL); + assert (tool_info->event_info.other_event.event_type == acc_ev_compute_construct_start); + assert (tool_info->nested == NULL); + } + else if (state == 12) + { + assert (tool_info != NULL); + assert (tool_info->event_info.other_event.event_type == acc_ev_compute_construct_start); + assert (tool_info->nested != NULL); + assert (tool_info->nested->event_info.other_event.event_type == acc_ev_exit_data_start); + assert (tool_info->nested->nested == NULL); + } + else + abort (); +#endif + + assert (prof_info->event_type == acc_ev_free); + assert (prof_info->valid_bytes == _ACC_PROF_INFO_VALID_BYTES); + assert (prof_info->version == _ACC_PROF_INFO_VERSION); + assert (prof_info->device_type == acc_device_type); + assert (prof_info->device_number == acc_device_num); + assert (prof_info->thread_id == -1); + assert (prof_info->async == acc_async); + assert (prof_info->async_queue == prof_info->async); + assert (prof_info->src_file == NULL); + assert (prof_info->func_name == NULL); + assert (prof_info->line_no == -1); + assert (prof_info->end_line_no == -1); + assert (prof_info->func_line_no == -1); + assert (prof_info->func_end_line_no == -1); + + assert (event_info->data_event.event_type == prof_info->event_type); + assert (event_info->data_event.valid_bytes == _ACC_DATA_EVENT_INFO_VALID_BYTES); + assert (event_info->data_event.parent_construct == acc_construct_parallel); + assert (event_info->data_event.implicit == 1); + assert (event_info->data_event.tool_info == NULL); + assert (event_info->data_event.var_name == NULL); + if (acc_device_type == acc_device_nvidia) + assert (event_info->data_event.bytes == (size_t) -1); + else if (acc_device_type == acc_device_radeon) + assert (event_info->data_event.bytes == 0); + else + abort (); + assert (event_info->data_event.host_ptr == NULL); + assert (event_info->data_event.device_ptr != NULL); + + assert (api_info->valid_bytes == _ACC_API_INFO_VALID_BYTES); + assert (api_info->device_type == prof_info->device_type); + assert (api_info->vendor == -1); + assert (api_info->device_handle == NULL); + assert (api_info->context_handle == NULL); + assert (api_info->async_handle == NULL); +} + static void cb_enter_data_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info) { DEBUG_printf ("%s\n", __FUNCTION__); @@ -246,8 +379,8 @@ static void cb_enter_data_end (acc_prof_info *prof_info, acc_event_info *event_i { DEBUG_printf ("%s\n", __FUNCTION__); - assert (state == 4 - || state == 104); + assert (state == 5 + || state == 105); #if defined COPYIN /* Conceptually, 'acc_ev_enter_data_end' marks the end of data copying, before 'acc_ev_enqueue_launch_start' marks invoking the compute region. @@ -316,9 +449,19 @@ static void cb_exit_data_start (acc_prof_info *prof_info, acc_event_info *event_ { DEBUG_printf ("%s\n", __FUNCTION__); - assert (state == 7 #if ASYNC_EXIT_DATA - || state == 107 + if (acc_async != acc_async_sync) + { + /* Compensate for the deferred 'acc_ev_free'. */ + state += 1; + } +#else +# error TODO +#endif + + assert (state == 10 +#if ASYNC_EXIT_DATA + || state == 110 #endif ); STATE_OP (state, ++); @@ -366,15 +509,25 @@ static void cb_exit_data_start (acc_prof_info *prof_info, acc_event_info *event_ tool_info->nested->event_info.other_event.event_type = event_info->other_event.event_type; event_info->other_event.tool_info = tool_info->nested; + +#if ASYNC_EXIT_DATA + if (acc_async != acc_async_sync) + { + /* Compensate for the deferred 'acc_ev_free'. */ + state += 1; + } +#else +# error TODO +#endif } static void cb_exit_data_end (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info) { DEBUG_printf ("%s\n", __FUNCTION__); - assert (state == 8 + assert (state == 12 #if ASYNC_EXIT_DATA - || state == 108 + || state == 112 #endif ); STATE_OP (state, ++); @@ -488,6 +641,8 @@ static void cb_compute_construct_start (acc_prof_info *prof_info, acc_event_info { /* Compensate for the missing 'acc_ev_enter_data_start'. */ state += 1; + /* Compensate for the missing 'acc_ev_alloc'. */ + state += 1; } } @@ -499,12 +654,19 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info * { /* Compensate for the missing 'acc_ev_enter_data_end'. */ state += 1; + /* Compensate for the missing 'acc_ev_alloc'. */ + state += 1; /* Compensate for the missing 'acc_ev_enqueue_launch_start' and 'acc_ev_enqueue_launch_end'. */ state += 2; - /* Compensate for the missing 'acc_ev_exit_data_start' and - 'acc_ev_exit_data_end'. */ - state += 2; + /* Compensate for the missing 'acc_ev_free'. */ + state += 1; + /* Compensate for the missing 'acc_ev_exit_data_start'. */ + state += 1; + /* Compensate for the missing 'acc_ev_free'. */ + state += 1; + /* Compensate for the missing 'acc_ev_exit_data_end'. */ + state += 1; } #if !ASYNC_EXIT_DATA else if (acc_async != acc_async_sync) @@ -514,8 +676,8 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info * state += 2; } #endif - assert (state == 9 - || state == 109); + assert (state == 13 + || state == 113); STATE_OP (state, ++); assert (tool_info != NULL); @@ -569,8 +731,8 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e assert (acc_device_type != acc_device_host); - assert (state == 5 - || state == 105); + assert (state == 7 + || state == 107); STATE_OP (state, ++); assert (tool_info != NULL); @@ -638,8 +800,8 @@ static void cb_enqueue_launch_end (acc_prof_info *prof_info, acc_event_info *eve assert (acc_device_type != acc_device_host); - assert (state == 6 - || state == 106); + assert (state == 8 + || state == 108); STATE_OP (state, ++); assert (tool_info != NULL); @@ -703,6 +865,8 @@ int main() STATE_OP (state, = 0); reg (acc_ev_device_init_start, cb_device_init_start, acc_reg); reg (acc_ev_device_init_end, cb_device_init_end, acc_reg); + reg (acc_ev_alloc, cb_alloc, acc_reg); + reg (acc_ev_free, cb_free, acc_reg); reg (acc_ev_enter_data_start, cb_enter_data_start, acc_reg); reg (acc_ev_enter_data_end, cb_enter_data_end, acc_reg); reg (acc_ev_exit_data_start, cb_exit_data_start, acc_reg); @@ -725,9 +889,9 @@ int main() state_init = state; } - assert (state_init == 4); + assert (state_init == 5); } - assert (state == 10); + assert (state == 14); STATE_OP (state, = 100); @@ -742,9 +906,9 @@ int main() } acc_async = acc_async_sync; #pragma acc wait - assert (state_init == 104); + assert (state_init == 105); } - assert (state == 110); + assert (state == 114); return 0; } -- 2.25.1