From patchwork Mon Nov 15 12:05:40 2021
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 47666
Date: Mon, 15 Nov 2021 13:05:40 +0100
From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org
Cc: Tobias Burnus
Subject: [PATCH] openmp: Add support for thread_limit clause on target
Message-ID: <20211115120540.GV2710@tucnak>

Hi!

OpenMP 5.1 says that the thread_limit clause can also appear on target,
and, similarly to teams, should affect the thread-limit-var ICV.  On
combined target teams, the clause goes to both constructs.  We actually
already passed thread_limit internally on target before, but only used
it for gcn/ptx offloading as a hint for how many threads should be
created, and for ptx we didn't set thread_limit_var in that case; the
same applies to the host fallback.  Also, I found that we weren't
copying the args array (which contains the encoded thread_limit and
num_teams clauses for target etc.) for async target regions.
Will commit to trunk once testing finishes.

2021-11-15  Jakub Jelinek

gcc/
	* gimplify.c (optimize_target_teams): Only add
	OMP_CLAUSE_THREAD_LIMIT to OMP_TARGET_CLAUSES if it isn't there
	already.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
	Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
	* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
	* parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
	* task.c (gomp_create_target_task): Copy args array as well.
	* target.c (gomp_target_fallback): Add args argument.
	Set gomp_icv (true)->thread_limit_var if thread_limit is present.
	(GOMP_target): Adjust gomp_target_fallback caller.
	(GOMP_target_ext): Likewise.
	(gomp_target_task_fn): Likewise.
	* config/nvptx/team.c (gomp_nvptx_main): Set
	gomp_global_icv.thread_limit_var.
	* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.

	Jakub

--- gcc/gimplify.c.jj	2021-11-12 15:13:09.030919433 +0100
+++ gcc/gimplify.c	2021-11-15 11:06:20.021516251 +0100
@@ -13637,10 +13637,13 @@ optimize_target_teams (tree target, gimp
 	  if (!DECL_P (expr) && TREE_CODE (expr) != TARGET_EXPR)
 	    OMP_CLAUSE_OPERAND (c, 0) = *p;
 	}
-  c = build_omp_clause (thread_limit_loc, OMP_CLAUSE_THREAD_LIMIT);
-  OMP_CLAUSE_THREAD_LIMIT_EXPR (c) = thread_limit;
-  OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (target);
-  OMP_TARGET_CLAUSES (target) = c;
+  if (!omp_find_clause (OMP_TARGET_CLAUSES (target), OMP_CLAUSE_THREAD_LIMIT))
+    {
+      c = build_omp_clause (thread_limit_loc, OMP_CLAUSE_THREAD_LIMIT);
+      OMP_CLAUSE_THREAD_LIMIT_EXPR (c) = thread_limit;
+      OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (target);
+      OMP_TARGET_CLAUSES (target) = c;
+    }
   c = build_omp_clause (num_teams_loc, OMP_CLAUSE_NUM_TEAMS);
   OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR (c) = num_teams_upper;
   OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR (c) = num_teams_lower;
--- gcc/c-family/c-omp.c.jj	2021-10-27 09:21:50.756247410 +0200
+++ gcc/c-family/c-omp.c	2021-11-15 11:03:28.400918573 +0100
@@ -1867,7 +1867,6 @@ c_omp_split_clauses (location_t loc, enu
 	  s = C_OMP_CLAUSE_SPLIT_TARGET;
 	  break;
 	case OMP_CLAUSE_NUM_TEAMS:
-	case OMP_CLAUSE_THREAD_LIMIT:
 	  s = C_OMP_CLAUSE_SPLIT_TEAMS;
 	  break;
 	case OMP_CLAUSE_DIST_SCHEDULE:
@@ -2531,6 +2530,30 @@ c_omp_split_clauses (location_t loc, enu
 	  else
 	    s = C_OMP_CLAUSE_SPLIT_FOR;
 	  break;
+	/* thread_limit is allowed on target and teams.  Distribute it
+	   to all.  */
+	case OMP_CLAUSE_THREAD_LIMIT:
+	  if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_MAP))
+	      != 0)
+	    {
+	      if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NUM_TEAMS))
+		  != 0)
+		{
+		  c = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
+					OMP_CLAUSE_THREAD_LIMIT);
+		  OMP_CLAUSE_THREAD_LIMIT_EXPR (c)
+		    = OMP_CLAUSE_THREAD_LIMIT_EXPR (clauses);
+		  OMP_CLAUSE_CHAIN (c) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET];
+		  cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = c;
+		}
+	      else
+		{
+		  s = C_OMP_CLAUSE_SPLIT_TARGET;
+		  break;
+		}
+	    }
+	  s = C_OMP_CLAUSE_SPLIT_TEAMS;
+	  break;
 	/* Allocate clause is allowed on target, teams, distribute, parallel,
 	   for, sections and taskloop.  Distribute it to all.  */
 	case OMP_CLAUSE_ALLOCATE:
--- gcc/c/c-parser.c.jj	2021-11-11 14:35:37.465350510 +0100
+++ gcc/c/c-parser.c	2021-11-15 10:51:27.257024830 +0100
@@ -20963,6 +20963,7 @@ c_parser_omp_target_exit_data (location_
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)		\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEFAULTMAP)		\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_THREAD_LIMIT)	\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR))

 static bool
--- gcc/cp/parser.c.jj	2021-11-11 14:35:37.550349286 +0100
+++ gcc/cp/parser.c	2021-11-15 10:52:00.026564979 +0100
@@ -44015,6 +44015,7 @@ cp_parser_omp_target_update (cp_parser *
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_DEFAULTMAP)		\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_ALLOCATE)		\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION)	\
+	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_THREAD_LIMIT)	\
 	 | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR))

 static bool
--- libgomp/task.c.jj	2021-05-18 10:04:31.543435200 +0200
+++ libgomp/task.c	2021-11-15 12:02:38.336697281 +0100
@@ -745,6 +745,7 @@ gomp_create_target_task (struct gomp_dev
   size_t depend_size = 0;
   uintptr_t depend_cnt = 0;
   size_t tgt_align = 0, tgt_size = 0;
+  uintptr_t args_cnt = 0;

   if (depend != NULL)
     {
@@ -769,10 +770,22 @@ gomp_create_target_task (struct gomp_dev
     tgt_size += tgt_align - 1;
   else
     tgt_size = 0;
+  if (args)
+    {
+      void **cargs = args;
+      while (*cargs)
+	{
+	  intptr_t id = (intptr_t) *cargs++;
+	  if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	    cargs++;
+	}
+      args_cnt = cargs + 1 - args;
+    }

   task = gomp_malloc (sizeof (*task) + depend_size
		      + sizeof (*ttask)
+		      + args_cnt * sizeof (void *)
		      + mapnum * (sizeof (void *) + sizeof (size_t)
				  + sizeof (unsigned short))
		      + tgt_size);
@@ -785,9 +798,18 @@ gomp_create_target_task (struct gomp_dev
   ttask->devicep = devicep;
   ttask->fn = fn;
   ttask->mapnum = mapnum;
-  ttask->args = args;
   memcpy (ttask->hostaddrs, hostaddrs, mapnum * sizeof (void *));
-  ttask->sizes = (size_t *) &ttask->hostaddrs[mapnum];
+  if (args_cnt)
+    {
+      ttask->args = (void **) &ttask->hostaddrs[mapnum];
+      memcpy (ttask->args, args, args_cnt * sizeof (void *));
+      ttask->sizes = (size_t *) &ttask->args[args_cnt];
+    }
+  else
+    {
+      ttask->args = args;
+      ttask->sizes = (size_t *) &ttask->hostaddrs[mapnum];
+    }
   memcpy (ttask->sizes, sizes, mapnum * sizeof (size_t));
   ttask->kinds = (unsigned short *) &ttask->sizes[mapnum];
   memcpy (ttask->kinds, kinds, mapnum * sizeof (unsigned short));
--- libgomp/target.c.jj	2021-11-12 15:13:09.113918200 +0100
+++ libgomp/target.c	2021-11-15 11:43:42.358162423 +0100
@@ -2362,7 +2362,7 @@ gomp_unload_device (struct gomp_device_d

 static void
 gomp_target_fallback (void (*fn) (void *), void **hostaddrs,
-		      struct gomp_device_descr *devicep)
+		      struct gomp_device_descr *devicep, void **args)
 {
   struct gomp_thread old_thr, *thr = gomp_thread ();

@@ -2378,6 +2378,25 @@ gomp_target_fallback (void (*fn) (void *
       thr->place = old_thr.place;
       thr->ts.place_partition_len = gomp_places_list_len;
     }
+  if (args)
+    while (*args)
+      {
+	intptr_t id = (intptr_t) *args++, val;
+	if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
+	  val = (intptr_t) *args++;
+	else
+	  val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
+	if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
+	  continue;
+	id &= GOMP_TARGET_ARG_ID_MASK;
+	if (id != GOMP_TARGET_ARG_THREAD_LIMIT)
+	  continue;
+	val = val > INT_MAX ? INT_MAX : val;
+	if (val)
+	  gomp_icv (true)->thread_limit_var = val;
+	break;
+      }
   fn (hostaddrs);
   gomp_free_thread (thr);
   *thr = old_thr;
@@ -2478,7 +2497,7 @@ GOMP_target (int device, void (*fn) (voi
      /* All shared memory devices should use the GOMP_target_ext
	 function.  */
      || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM
      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
-    return gomp_target_fallback (fn, hostaddrs, devicep);
+    return gomp_target_fallback (fn, hostaddrs, devicep, NULL);

   htab_t refcount_set = htab_create (mapnum);
   struct target_mem_desc *tgt_vars
@@ -2617,7 +2636,7 @@ GOMP_target_ext (int device, void (*fn)
				       tgt_align, tgt_size);
	    }
	}
-      gomp_target_fallback (fn, hostaddrs, devicep);
+      gomp_target_fallback (fn, hostaddrs, devicep, args);
       return;
     }

@@ -3052,7 +3071,8 @@ gomp_target_task_fn (void *data)
	  || (devicep->can_run_func && !devicep->can_run_func (fn_addr)))
	{
	  ttask->state = GOMP_TARGET_TASK_FALLBACK;
-	  gomp_target_fallback (ttask->fn, ttask->hostaddrs, devicep);
+	  gomp_target_fallback (ttask->fn, ttask->hostaddrs, devicep,
+				ttask->args);
	  return false;
	}
--- libgomp/config/nvptx/team.c.jj	2021-11-15 09:20:47.966837531 +0100
+++ libgomp/config/nvptx/team.c	2021-11-15 11:14:36.477567443 +0100
@@ -55,6 +55,7 @@ gomp_nvptx_main (void (*fn) (void *), vo
   if (tid == 0)
     {
       gomp_global_icv.nthreads_var = ntids;
+      gomp_global_icv.thread_limit_var = ntids;
       /* Starting additional threads is not supported.  */
       gomp_global_icv.dyn_var = true;
--- libgomp/testsuite/libgomp.c-c++-common/thread-limit-1.c.jj	2021-11-15 12:24:59.643001103 +0100
+++ libgomp/testsuite/libgomp.c-c++-common/thread-limit-1.c	2021-11-15 12:24:52.865095292 +0100
@@ -0,0 +1,23 @@
+#include <stdlib.h>
+#include <omp.h>
+
+void
+foo ()
+{
+  {
+    #pragma omp target parallel nowait thread_limit (4) num_threads (1)
+    if (omp_get_thread_limit () > 4)
+      abort ();
+  }
+  #pragma omp taskwait
+}
+
+int
+main ()
+{
+  #pragma omp target thread_limit (6)
+  if (omp_get_thread_limit () > 6)
+    abort ();
+  foo ();
+  return 0;
+}