libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy* (was: [Patch] libgomp.texi: Document omp_pause_resource{,_all})

Message ID 301e4198-4dbd-453f-8746-95d5d1ec2bf2@net-b.de
State New

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed

Commit Message

Tobias Burnus Jan. 14, 2024, 11:15 p.m. UTC
  Hi Sandra, hi all,

Sandra Loosemore:
> On 1/14/24 07:26, Tobias Burnus wrote:
> I have some minor nits about typos and copy-editing.

Thanks. That's the downside of doing editing while being sleepy on a 
train. Updated and extended version (documenting also omp_target_memcpy) 
is attached. Warning: Still mostly done during a train ride.

> I assume the  formatting of the interface syntax
> is consistent with how it's done elsewhere in the manual.

It should be consistent, but I think it eventually needs some cleanup as
the indentation does not work for the continuation lines.

And I think some other improvements are needed, but that's a slow 
step-by-step process.

* * *

> Re the content, I see no documentation for omp_pause_resource_t or the 
> equivalent in Fortran, or any hint about what the kind argument is for.

There is actually some Fortran documentation at 
https://gcc.gnu.org/onlinedocs/gfortran/OpenMP-Modules-OMP_005fLIB-and-OMP_005fLIB_005fKINDS.html 
(gcc/fortran/intrinsic.texi).

But I concur that moving it to libgomp.texi and adding a C version makes 
sense; see also PR110364 under "BTW" (2nd paragraph).

> If it's to explain implementation-specific 
> features, then it should at least be documenting whether GCC supports 
> additional pause kinds as permitted by the spec.

It doesn't - and it lacks the OpenMP 6.0 addition (post-TR12) as well.

Tobias
  

Comments

Sandra Loosemore Jan. 15, 2024, 4:35 a.m. UTC | #1
On 1/14/24 16:15, Tobias Burnus wrote:

> +@node omp_target_memcpy
> +@subsection @code{omp_target_memcpy} -- Copy data between devices
> +@table @asis
> +@item @emph{Description}:
> +This routine tests copies @var{length} of bytes of data from the device
> +identified by device number @var{src_device_num} to device @var{dst_device_num}.

Hmmm, I'm sure it's the train's fault :-) but "tests copies" makes no sense, 
and that's cut-and-pasted multiple times.  I think you just mean "copies" in 
all cases.
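
For what it's worth, the intended byte-offset semantics can be modelled on
the host alone.  The following is only a minimal sketch, not what libgomp
does: model_target_memcpy and model_target_memcpy_demo are hypothetical
names, the device-number arguments are dropped, and both buffers are assumed
to be ordinary host memory.  It just shows how length, dst_offset and
src_offset interact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of the described copy: copy LENGTH bytes
   from SRC + SRC_OFFSET to DST + DST_OFFSET.  Returns zero on success and
   non-zero otherwise, mirroring the documented return convention.  The
   real routine additionally takes the two device numbers.  */
int
model_target_memcpy (void *dst, const void *src, size_t length,
                     size_t dst_offset, size_t src_offset)
{
  if (dst == NULL || src == NULL)
    return 1;
  memcpy ((char *) dst + dst_offset,
          (const char *) src + src_offset, length);
  return 0;
}

/* Copy the three bytes "cde" from SRC into the middle of DST.
   Returns zero if DST ends up as expected.  */
int
model_target_memcpy_demo (void)
{
  const char src[] = "abcdefgh";
  char dst[] = "........";
  int err = model_target_memcpy (dst, src, 3, 2, 2);
  assert (err == 0);
  return strcmp (dst, "..cde...") != 0;
}
```

Both offsets are counted in bytes from the respective base pointer, which is
what the "shifted by" wording in the description is trying to say.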

> +@node omp_target_memcpy_rect
> +@subsection @code{omp_target_memcpy_rect} -- Copy a subvolume of data between devices
> +@table @asis
> +@item @emph{Description}:
> +This routine tests copies a subvolume of data from the device identified by
> +device number @var{src_device_num} to device @var{dst_device_num}.  The
> +subvolume of a multi-dimensional array of array dimension @var{num_dims} and
> +each array element has a size of @var{element_size} bytes.  The @var{volume}

This is kind of garbled.  How about rephrasing that second sentence as

The array has @var{num_dims} and each array element has a size of 
@var{element_size} bytes.


> +array specifies how many elements per dimension will be copied.  The full

s/will be/are/

> +array in number of elements is given by the @var{dst_dimensions} and
> +@var{src_dimensions} arguments for the array on the destination and source
> +device, respectively.  The offset per dimension to the first element to

I think we can simplify that sentence, too, like

The full sizes of the destination and source arrays are given by the 
@var{dst_dimensions} and @var{src_dimensions} arguments, respectively.

> +be copied is given by the @var{dst_offset} and @var{src_offset} arguments.
> +The routine returns zero on success and non-zero otherwise.
> +
> +The OpenMP only requires that @var{num_dims} up to three is supported. In order

s/OpenMP/OpenMP specification/ ?

> +to find implementation-specific maximally supported number of dimensions, the
> +routine will return this value when invoked with a NULL pointer to both the

s/will return/returns/

either "null pointer" or "@code{NULL}" is preferable to "NULL pointer".

> +@var{dst} and @var{src} arguments.  As GCC supports arbitrary dimensions, it
> +will return INTMAX.

s/will return INTMAX/returns @code{INT_MAX}/

> +
> +The device-number arguments must be conforming device number, the @var{src} and

s/number,/numbers,/


> +@var{dst} must be either both NULL or any of the following must be fulfilled:

same issue with "NULL" here, either "@code{NULL}" or "null pointers".

"any" seems unlikely to be useful.  Do you mean "all" of the following conditions?

> +@var{element_size} and @var{num_dims} must be positive, the @var{volume}, offset
> +and dimension arrays must have at least @var{num_dims} dimensions.
> +Running this routine in a @code{target} region except on the initial device
> +is not supported.

The part of the patch for omp_target_memcpy_rect_async has very similar 
problems and needs the same fixes.
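
To make sure we agree on what the rewritten description should say, here is
a rough host-only model of the subvolume copy.  It is a sketch under stated
assumptions, not libgomp's implementation: model_memcpy_rect and
model_memcpy_rect_demo are hypothetical names, the device numbers are
omitted, both arrays are assumed to live in host memory, and C (row-major)
element order is assumed.  The INT_MAX return for two null pointers mirrors
the behavior the patch describes for GCC.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model: copy a VOLUME-sized block of elements from
   SRC (full extents SRC_DIMENSIONS) to DST (full extents DST_DIMENSIONS),
   starting at the per-dimension element offsets SRC_OFFSET / DST_OFFSET.  */
int
model_memcpy_rect (void *dst, const void *src, size_t element_size,
                   int num_dims, const size_t *volume,
                   const size_t *dst_offset, const size_t *src_offset,
                   const size_t *dst_dimensions, const size_t *src_dimensions)
{
  if (dst == NULL && src == NULL)
    return INT_MAX;		/* Query: no limit on NUM_DIMS in this model.  */
  if (dst == NULL || src == NULL || element_size == 0 || num_dims < 1)
    return 1;

  if (num_dims == 1)
    {
      /* Innermost dimension: one contiguous run of elements.  */
      memcpy ((char *) dst + dst_offset[0] * element_size,
              (const char *) src + src_offset[0] * element_size,
              volume[0] * element_size);
      return 0;
    }

  /* Bytes occupied by one slice along the outermost dimension.  */
  size_t dst_slice = element_size, src_slice = element_size;
  for (int i = 1; i < num_dims; i++)
    {
      dst_slice *= dst_dimensions[i];
      src_slice *= src_dimensions[i];
    }

  /* Recurse over the slices selected by VOLUME[0].  */
  for (size_t j = 0; j < volume[0]; j++)
    {
      int err = model_memcpy_rect
	((char *) dst + (dst_offset[0] + j) * dst_slice,
	 (const char *) src + (src_offset[0] + j) * src_slice,
	 element_size, num_dims - 1, volume + 1,
	 dst_offset + 1, src_offset + 1,
	 dst_dimensions + 1, src_dimensions + 1);
      if (err)
	return err;
    }
  return 0;
}

/* Copy a 2x2 block starting at src[1][2] of a 4x5 array to dst[1][0] of a
   3x3 array.  Returns zero if DST holds the expected values.  */
int
model_memcpy_rect_demo (void)
{
  int src[4][5], dst[3][3] = { { 0 } };
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 5; j++)
      src[i][j] = 10 * i + j;

  const size_t volume[] = { 2, 2 };
  const size_t dst_offset[] = { 1, 0 }, src_offset[] = { 1, 2 };
  const size_t dst_dims[] = { 3, 3 }, src_dims[] = { 4, 5 };

  int err = model_memcpy_rect (dst, src, sizeof (int), 2, volume,
			       dst_offset, src_offset, dst_dims, src_dims);
  assert (err == 0);
  return !(dst[1][0] == 12 && dst[1][1] == 13
	   && dst[2][0] == 22 && dst[2][1] == 23
	   && dst[0][0] == 0 && dst[2][2] == 0);
}
```

Note that the volume, offset and dimension arrays are each indexed once per
dimension, so each needs at least num_dims elements; that is the condition
the last paragraph of the description should state.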

-Sandra
  
Tobias Burnus Jan. 23, 2024, 11:37 a.m. UTC | #2
Hi Sandra,

thanks for the comments and proposals! An updated version is enclosed.

Unless you find more issues, I intend to commit it soon.

Tobias

PS: I think besides filling gaps, some editing wouldn't harm; if you 
feel bored ...

https://gcc.gnu.org/onlinedocs/libgomp/
  

Patch

libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy*

libgomp/ChangeLog:

	* libgomp.texi (Runtime Library Routines): Document
	omp_pause_resource, omp_pause_resource_all and
	omp_target_memcpy{,_rect}{,_async}.

 libgomp/libgomp.texi | 329 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 314 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 74d4ef34c43..d3adfd48545 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -561,7 +561,7 @@  specification in version 5.2.
 * Thread Affinity Routines::
 * Teams Region Routines::
 * Tasking Routines::
-@c * Resource Relinquishing Routines::
+* Resource Relinquishing Routines::
 * Device Information Routines::
 * Device Memory Routines::
 * Lock Routines::
@@ -1504,16 +1504,78 @@  and @code{false} represent their language-specific counterparts.
 
 
 
-@c @node Resource Relinquishing Routines
-@c @section Resource Relinquishing Routines
-@c
-@c Routines releasing resources used by the OpenMP runtime.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_pause_resource:: <fixme>
-@c * omp_pause_resource_all:: <fixme>
-@c @end menu
+@node Resource Relinquishing Routines
+@section Resource Relinquishing Routines
+
+Routines releasing resources used by the OpenMP runtime.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_pause_resource:: Release OpenMP resources on a device
+* omp_pause_resource_all:: Release OpenMP resources on all devices
+@end menu
+
+
+
+@node omp_pause_resource
+@subsection @code{omp_pause_resource} -- Release OpenMP resources on a device
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on and for the
+device specified by @var{device_num}; on success, zero is returned and non-zero
+otherwise.
+
+The value of @var{device_num} must be a conforming device number.  The routine
+may not be called from within any explicit region and all explicit threads that
+do not bind to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
+@item                   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@item                   @tab @code{integer device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
+@end table
+
+
+
+@node omp_pause_resource_all
+@subsection @code{omp_pause_resource_all} -- Release OpenMP resources on all devices
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on all devices,
+including the host.  On success, zero is returned and non-zero otherwise.
+
+The routine may not be called from within any explicit region and all explicit
+threads that do not bind to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource_all(omp_pause_resource_t kind);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource_all(kind)}
+@item                   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_pause_resource}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.44.
+@end table
+
+
 
 @node Device Information Routines
 @section Device Information Routines
@@ -1720,10 +1782,10 @@  pointers on devices. They have C linkage and do not throw exceptions.
 * omp_target_free:: Free device memory
 * omp_target_is_present:: Check whether storage is mapped
 * omp_target_is_accessible:: Check whether memory is device accessible
-@c * omp_target_memcpy:: <fixme>
-@c * omp_target_memcpy_rect:: <fixme>
-@c * omp_target_memcpy_async:: <fixme>
-@c * omp_target_memcpy_rect_async:: <fixme>
+* omp_target_memcpy:: Copy data between devices
+* omp_target_memcpy_rect:: Copy a subvolume of data between devices
+* omp_target_memcpy_async:: Copy data between devices asynchronously
+* omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously
 @c * omp_target_memset:: <fixme>/TR12
 @c * omp_target_memset_async:: <fixme>/TR12
 * omp_target_associate_ptr:: Associate a device pointer with a host pointer
@@ -1899,6 +1961,243 @@  is not supported.
 
 
 
+@node omp_target_memcpy
+@subsection @code{omp_target_memcpy} -- Copy data between devices
+@table @asis
+@item @emph{Description}:
+This routine copies @var{length} bytes of data from the device
+identified by device number @var{src_device_num} to device @var{dst_device_num}.
+The data is copied from the source device from the address provided by
+@var{src}, shifted by the offset of @var{src_offset} bytes, to the destination
+device's @var{dst} address shifted by @var{dst_offset}.  The routine returns
+zero on success and non-zero otherwise.
+
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_target_memcpy(void *dst,}
+@item                   @tab @code{                           const void *src,}
+@item                   @tab @code{                           size_t length,}
+@item                   @tab @code{                           size_t dst_offset,}
+@item                   @tab @code{                           size_t src_offset,}
+@item                   @tab @code{                           int dst_device_num,}
+@item                   @tab @code{                           int src_device_num)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy( &}
+@item                   @tab @code{    dst, src, length, dst_offset, src_offset, &}
+@item                   @tab @code{    dst_device_num, src_device_num) bind(C)}
+@item                   @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
+@item                   @tab @code{type(c_ptr), value :: dst, src}
+@item                   @tab @code{integer(c_size_t), value :: length, dst_offset, src_offset}
+@item                   @tab @code{integer(c_int), value :: dst_device_num, src_device_num}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_target_memcpy_async}, @ref{omp_target_memcpy_rect}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.5
+@end table
+
+
+
+@node omp_target_memcpy_async
+@subsection @code{omp_target_memcpy_async} -- Copy data between devices asynchronously
+@table @asis
+@item @emph{Description}:
+This routine asynchronously copies @var{length} bytes of data from the
+device identified by device number @var{src_device_num} to device
+@var{dst_device_num}.  The data is copied from the source device from the
+address provided by @var{src}, shifted by the offset of @var{src_offset} bytes,
+to the destination device's @var{dst} address shifted by @var{dst_offset}.
+Task dependence is expressed by passing an array of depend objects to
+@var{depobj_list}, where the number of array elements is passed as
+@var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
+ignored.  The routine returns zero if the copying process has successfully
+been started and non-zero otherwise.
+
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_target_memcpy_async(void *dst,}
+@item                   @tab @code{                           const void *src,}
+@item                   @tab @code{                           size_t length,}
+@item                   @tab @code{                           size_t dst_offset,}
+@item                   @tab @code{                           size_t src_offset,}
+@item                   @tab @code{                           int dst_device_num,}
+@item                   @tab @code{                           int src_device_num,}
+@item                   @tab @code{                           int depobj_count,}
+@item                   @tab @code{                           omp_depend_t *depobj_list)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_async( &}
+@item                   @tab @code{    dst, src, length, dst_offset, src_offset, &}
+@item                   @tab @code{    dst_device_num, src_device_num, &}
+@item                   @tab @code{    depobj_count, depobj_list) bind(C)}
+@item                   @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
+@item                   @tab @code{type(c_ptr), value :: dst, src}
+@item                   @tab @code{integer(c_size_t), value :: length, dst_offset, src_offset}
+@item                   @tab @code{integer(c_int), value :: dst_device_num, src_device_num, depobj_count}
+@item                   @tab @code{integer(omp_depend_kind), optional :: depobj_list(*)}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_target_memcpy}, @ref{omp_target_memcpy_rect_async}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.7
+@end table
+
+
+
+@node omp_target_memcpy_rect
+@subsection @code{omp_target_memcpy_rect} -- Copy a subvolume of data between devices
+@table @asis
+@item @emph{Description}:
+This routine copies a subvolume of data from the device identified by
+device number @var{src_device_num} to device @var{dst_device_num}.  The
+array has @var{num_dims} dimensions and each array element has a size of
+@var{element_size} bytes.  The @var{volume} array specifies how many
+elements per dimension are copied.  The full sizes of the destination and
+source arrays are given by the @var{dst_dimensions} and
+@var{src_dimensions} arguments, respectively.  The offset per dimension to
+the first element to be copied is given by the @var{dst_offset} and
+@var{src_offset} arguments.
+The routine returns zero on success and non-zero otherwise.
+
+The OpenMP specification only requires that @var{num_dims} up to three is
+supported.  In order to find the implementation-specific maximally supported
+number of dimensions, the routine returns this value when invoked with a null
+pointer to both the @var{dst} and @var{src} arguments.  As GCC supports
+arbitrary dimensions, it returns @code{INT_MAX}.
+
+The device-number arguments must be conforming device numbers, the @var{src} and
+@var{dst} must be either both null pointers or all of the following must be
+fulfilled: @var{element_size} and @var{num_dims} must be positive, and the
+@var{volume}, offset and dimension arrays must have at least @var{num_dims} elements.
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect(void *dst,}
+@item                   @tab @code{                           const void *src,}
+@item                   @tab @code{                           size_t element_size,}
+@item                   @tab @code{                           int num_dims,}
+@item                   @tab @code{                           const size_t *volume,}
+@item                   @tab @code{                           const size_t *dst_offset,}
+@item                   @tab @code{                           const size_t *src_offset,}
+@item                   @tab @code{                           const size_t *dst_dimensions,}
+@item                   @tab @code{                           const size_t *src_dimensions,}
+@item                   @tab @code{                           int dst_device_num,}
+@item                   @tab @code{                           int src_device_num)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_rect( &}
+@item                   @tab @code{    dst, src, element_size, num_dims, volume, &}
+@item                   @tab @code{    dst_offset, src_offset, dst_dimensions, &}
+@item                   @tab @code{    src_dimensions, dst_device_num, src_device_num) bind(C)}
+@item                   @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
+@item                   @tab @code{type(c_ptr), value :: dst, src}
+@item                   @tab @code{integer(c_size_t), value :: element_size, dst_offset, src_offset}
+@item                   @tab @code{integer(c_size_t), value :: volume, dst_dimensions, src_dimensions}
+@item                   @tab @code{integer(c_int), value :: num_dims, dst_device_num, src_device_num}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_target_memcpy_rect_async}, @ref{omp_target_memcpy}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.6
+@end table
+
+
+
+@node omp_target_memcpy_rect_async
+@subsection @code{omp_target_memcpy_rect_async} -- Copy a subvolume of data between devices asynchronously
+@table @asis
+@item @emph{Description}:
+This routine asynchronously copies a subvolume of data from the device
+identified by device number @var{src_device_num} to device @var{dst_device_num}.
+The array has @var{num_dims} dimensions and each array element has a size of
+@var{element_size} bytes.  The @var{volume} array specifies how many
+elements per dimension are copied.  The full sizes of the destination and
+source arrays are given by the @var{dst_dimensions} and
+@var{src_dimensions} arguments, respectively.  The offset per dimension to
+the first element to be copied is given by the @var{dst_offset} and
+@var{src_offset} arguments.
+The routine returns zero if the copying process has successfully
+been started and non-zero otherwise.
+
+The OpenMP specification only requires that @var{num_dims} up to three is
+supported.  In order to find the implementation-specific maximally supported
+number of dimensions, the routine returns this value when invoked with a null
+pointer to both the @var{dst} and @var{src} arguments.  As GCC supports
+arbitrary dimensions, it returns @code{INT_MAX}.
+
+The device-number arguments must be conforming device numbers, the @var{src} and
+@var{dst} must be either both null pointers or all of the following must be
+fulfilled: @var{element_size} and @var{num_dims} must be positive, and the
+@var{volume}, offset and dimension arrays must have at least @var{num_dims} elements.
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+Running this routine in a @code{target} region except on the initial device
+is not supported.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect_async(void *dst,}
+@item                   @tab @code{                           const void *src,}
+@item                   @tab @code{                           size_t element_size,}
+@item                   @tab @code{                           int num_dims,}
+@item                   @tab @code{                           const size_t *volume,}
+@item                   @tab @code{                           const size_t *dst_offset,}
+@item                   @tab @code{                           const size_t *src_offset,}
+@item                   @tab @code{                           const size_t *dst_dimensions,}
+@item                   @tab @code{                           const size_t *src_dimensions,}
+@item                   @tab @code{                           int dst_device_num,}
+@item                   @tab @code{                           int src_device_num,}
+@item                   @tab @code{                           int depobj_count,}
+@item                   @tab @code{                           omp_depend_t *depobj_list)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_rect_async( &}
+@item                   @tab @code{    dst, src, element_size, num_dims, volume, &}
+@item                   @tab @code{    dst_offset, src_offset, dst_dimensions, &}
+@item                   @tab @code{    src_dimensions, dst_device_num, src_device_num, &}
+@item                   @tab @code{    depobj_count, depobj_list) bind(C)}
+@item                   @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
+@item                   @tab @code{type(c_ptr), value :: dst, src}
+@item                   @tab @code{integer(c_size_t), value :: element_size, dst_offset, src_offset}
+@item                   @tab @code{integer(c_size_t), value :: volume, dst_dimensions, src_dimensions}
+@item                   @tab @code{integer(c_int), value :: num_dims, dst_device_num, src_device_num}
+@item                   @tab @code{integer(c_int), value :: depobj_count}
+@item                   @tab @code{integer(omp_depend_kind), optional :: depobj_list(*)}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_target_memcpy_rect}, @ref{omp_target_memcpy_async}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.8
+@end table
+
+
+
 @node omp_target_associate_ptr
 @subsection @code{omp_target_associate_ptr} -- Associate a device pointer with a host pointer
 @table @asis