OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
Commit Message
Based on sollve_vv's tests/4.5/target_teams_distribute/test_target_teams_distribute_nowait.F90
As discussed, for simple pointers – like here with nondescriptor array,
instead of alloc:a + pointer assign, a firstprivate + pointer assign makes
more sense.
It also avoids the race exposed by the sollve_vv testcase in some constellations.
(The testcase, both as attached and currently in sollve_vv [→ Issue #532], is
invalid if run with 'nowait' as there is a race related to the array - as it
only (un)mapped as (disjunct) array sections via interleaved, concurrently
running target constructs with nowait clause.)
There might be more places where this should/could be done – and in principle,
firstprivate could also be useful for an array descriptor (but not its data);
this could also be explored. (Including whether it should then not be privatized
with shared memory.)
OK?
Tobias
PS: OpenACC is excluded as it does its own firstprivate handling.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Comments
On Fri, May 13, 2022 at 07:21:02PM +0200, Tobias Burnus wrote:
> gcc/fortran/ChangeLog:
>
> * trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
> array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
> GOMP_MAP_POINTER for the pointer attachment.
>
> libgomp/ChangeLog:
>
> * testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
Not 100% sure if we want to add such a testcase into the testsuite given
that it is not valid OpenMP, but perhaps it is ok as we are testing a QoI.
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/target-nowait-array-section.f90
> @@ -0,0 +1,56 @@
> +! Runs the the target region asynchrolously and checks for it
> +!
> +! Note that map(alloc: work(:, i)) + nowait should be save
s/save/safe/
Otherwise LGTM.
Jakub
Hi Tobias!
On 2022-05-13T19:44:51+0200, Jakub Jelinek via Fortran <fortran@gcc.gnu.org> wrote:
> On Fri, May 13, 2022 at 07:21:02PM +0200, Tobias Burnus wrote:
>> gcc/fortran/ChangeLog:
>>
>> * trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
>> array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
>> GOMP_MAP_POINTER for the pointer attachment.
>>
>> libgomp/ChangeLog:
>>
>> * testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
>
> Not 100% sure if we want to add such a testcase into the testsuite given
> that it is not valid OpenMP, but perhaps it is ok as we are testing a QoI.
For non-offloading x86_64-pc-linux-gnu '-m32', I'm occasionally (but very
rarely!) seeing this test case FAIL its execution test. Similar can also
be seen on occasional reports via <gcc-testresults@gcc.gnu.org>,
<gcc-regression@gcc.gnu.org>.
Grüße
Thomas
'libgomp.fortran/target-nowait-array-section.f90':
| ! Runs the the target region asynchrolously and checks for it
| !
| ! Note that map(alloc: work(:, i)) + nowait should be safe
| ! given that a nondescriptor array is used. However, it still
| ! violates a map clause restriction, added in OpenMP 5.1 [354:10-13].
|
| PROGRAM test_target_teams_distribute_nowait
| USE ISO_Fortran_env, only: INT64
| implicit none
| INTEGER, parameter :: N = 1024, N_TASKS = 16
| INTEGER :: i, j, k, my_ticket
| INTEGER :: order(n_tasks)
| INTEGER(INT64) :: work(n, n_tasks)
| INTEGER :: ticket
| logical :: async
|
| ticket = 0
|
| !$omp target enter data map(to: ticket, order)
|
| !$omp parallel do num_threads(n_tasks)
| DO i = 1, n_tasks
| !$omp target map(alloc: work(:, i), ticket) private(my_ticket) nowait
| !!$omp target teams distribute map(alloc: work(:, i), ticket) private(my_ticket) nowait
| DO j = 1, n
| ! Waste cyles
| ! work(j, i) = 0
| ! DO k = 1, n*(n_tasks - i)
| ! work(j, i) = work(j, i) + i*j*k
| ! END DO
| my_ticket = 0
| !$omp atomic capture
| ticket = ticket + 1
| my_ticket = ticket
| !$omp end atomic
| !$omp atomic write
| order(i) = my_ticket
| END DO
| !$omp end target !teams distribute
| END DO
| !$omp end parallel do
|
| !$omp target exit data map(from:ticket, order)
|
| IF (ticket .ne. n_tasks*n) stop 1
| if (maxval(order) /= n_tasks*n) stop 2
| ! order(i) == n*i if synchronous and between n and n*n_tasks if run concurrently
| do i = 1, n_tasks
| if (order(i) < n .or. order(i) > n*n_tasks) stop 3
| end do
| async = .false.
| do i = 1, n_tasks
| if (order(i) /= n*i) async = .true.
| end do
| if (.not. async) stop 4 ! Did not run asynchronously
| end
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
For a non-descriptor array, map(A(n:m)) was mapped as
map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...])
with this patch, it is changed to
map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...])
The latter avoids an alloc - and also avoids the race condition with
nowait in the enclosed testcase. (Note: predantically, the testcase is
invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
GOMP_MAP_POINTER for the pointer attachment.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
gcc/fortran/trans-openmp.cc | 12 +++--
.../target-nowait-array-section.f90 | 56 ++++++++++++++++++++++
2 files changed, 65 insertions(+), 3 deletions(-)
@@ -3312,9 +3312,15 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
/* An array element or array section which is not part of a
derived type, etc. */
bool element = n->expr->ref->u.ar.type == AR_ELEMENT;
- gfc_trans_omp_array_section (block, n, decl, element,
- GOMP_MAP_POINTER, node, node2,
- node3, node4);
+ tree type = TREE_TYPE (decl);
+ gomp_map_kind k = GOMP_MAP_POINTER;
+ if (!openacc
+ && !GFC_DESCRIPTOR_TYPE_P (type)
+ && !(POINTER_TYPE_P (type)
+ && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (type))))
+ k = GOMP_MAP_FIRSTPRIVATE_POINTER;
+ gfc_trans_omp_array_section (block, n, decl, element, k,
+ node, node2, node3, node4);
}
else if (n->expr
&& n->expr->expr_type == EXPR_VARIABLE
new file mode 100644
@@ -0,0 +1,56 @@
+! Runs the the target region asynchrolously and checks for it
+!
+! Note that map(alloc: work(:, i)) + nowait should be save
+! given that a nondescriptor array is used. However, it still
+! violates a map clause restriction, added in OpenMP 5.1 [354:10-13].
+
+PROGRAM test_target_teams_distribute_nowait
+ USE ISO_Fortran_env, only: INT64
+ implicit none
+ INTEGER, parameter :: N = 1024, N_TASKS = 16
+ INTEGER :: i, j, k, my_ticket
+ INTEGER :: order(n_tasks)
+ INTEGER(INT64) :: work(n, n_tasks)
+ INTEGER :: ticket
+ logical :: async
+
+ ticket = 0
+
+ !$omp target enter data map(to: ticket, order)
+
+ !$omp parallel do num_threads(n_tasks)
+ DO i = 1, n_tasks
+ !$omp target map(alloc: work(:, i), ticket) private(my_ticket) nowait
+ !!$omp target teams distribute map(alloc: work(:, i), ticket) private(my_ticket) nowait
+ DO j = 1, n
+ ! Waste cyles
+! work(j, i) = 0
+! DO k = 1, n*(n_tasks - i)
+! work(j, i) = work(j, i) + i*j*k
+! END DO
+ my_ticket = 0
+ !$omp atomic capture
+ ticket = ticket + 1
+ my_ticket = ticket
+ !$omp end atomic
+ !$omp atomic write
+ order(i) = my_ticket
+ END DO
+ !$omp end target !teams distribute
+ END DO
+ !$omp end parallel do
+
+ !$omp target exit data map(from:ticket, order)
+
+ IF (ticket .ne. n_tasks*n) stop 1
+ if (maxval(order) /= n_tasks*n) stop 2
+ ! order(i) == n*i if synchronous and between n and n*n_tasks if run concurrently
+ do i = 1, n_tasks
+ if (order(i) < n .or. order(i) > n*n_tasks) stop 3
+ end do
+ async = .false.
+ do i = 1, n_tasks
+ if (order(i) /= n*i) async = .true.
+ end do
+ if (.not. async) stop 4 ! Did not run asynchronously
+end