libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)

Message ID a3383e0b-29d1-622b-3278-f10aa173fa62@codesourcery.com
State New

Commit Message

Tobias Burnus Dec. 7, 2022, 8:08 a.m. UTC
  On 06.12.22 08:45, Tobias Burnus wrote:
> * As follow-up,  libgomp.texi must be updated

That is what the attached patch does; obviously, it depends on the
main patch.

OK (once the main patch is in)?

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
  

Comments

Tobias Burnus Dec. 10, 2022, 8:18 a.m. UTC | #1
Now that the reverse-offload patch is (nearly) in:

On 07.12.22 09:08, Tobias Burnus wrote:

> On 06.12.22 08:45, Tobias Burnus wrote:
>> * As follow-up,  libgomp.texi must be updated

Slight update to that uncommitted patch: I extended the nvptx entry to
state that only one reverse-offload region runs at a given time.

OK?

Tobias
  
Jakub Jelinek Jan. 31, 2023, 12:21 p.m. UTC | #2
On Sat, Dec 10, 2022 at 09:18:26AM +0100, Tobias Burnus wrote:
> libgomp.texi: Reverse-offload updates
> 
> libgomp/
> 	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
> 	(GCN): Add item about 'omp requires'.
> 	(nvptx): Likewise; add item about reverse offload.
> 
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -192,8 +192,8 @@ The OpenMP 4.5 specification is fully supported.
>        env variable @tab Y @tab
>  @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
>  @item @code{requires} directive @tab P
> -      @tab complete but no non-host devices provides @code{unified_address},
> -      @code{unified_shared_memory} or @code{reverse_offload}
> +      @tab complete but no non-host devices provides @code{unified_address} or
> +      @code{unified_shared_memory}
>  @item @code{teams} construct outside an enclosing target region @tab Y @tab
>  @item Non-rectangular loop nests @tab Y @tab
>  @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
> @@ -228,7 +228,7 @@ The OpenMP 4.5 specification is fully supported.
>  @item @code{allocate} clause @tab P @tab Initial support
>  @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
>  @item @code{ancestor} modifier on @code{device} clause
> -      @tab Y @tab See comment for @code{requires}
> +      @tab Y @tab Host fallback with GCN devices
>  @item Implicit declare target directive @tab Y @tab
>  @item Discontiguous array section with @code{target update} construct
>        @tab N @tab
> @@ -288,7 +288,7 @@ The OpenMP 4.5 specification is fully supported.
>        @code{append_args} @tab N @tab
>  @item @code{dispatch} construct @tab N @tab
>  @item device-specific ICV settings with environment variables @tab Y @tab
> -@item @code{assume} directive @tab Y @tab
> +@item @code{assume} and @code{assumes} directives @tab Y @tab
>  @item @code{nothing} directive @tab Y @tab
>  @item @code{error} directive @tab Y @tab
>  @item @code{masked} construct @tab Y @tab
> @@ -4456,6 +4456,9 @@ The implementation remark:
>  @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
>        using the C library @code{printf} functions and the Fortran
>        @code{print}/@code{write} statements.
> +@item OpenMP code that has a requires directive with @code{unified_address},
> +      @code{unified_shared_memory} or @code{reverse_offload} will remove
> +      any GCN device from the list of available devices (``host fallback'').
>  @end itemize
>  
>  
> @@ -4507,6 +4510,15 @@ The implementation remark:
>  @item Compilation OpenMP code that contains @code{requires reverse_offload}
>        requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
>        is not supported.
> +@item For code containing reverse offload (i.e. @code{target} regions with
> +      @code{device(ancestor:1)}), there is a slight performance penalty
> +      for @emph{all} target regions, consisting mostly of shutdown delay
> +      Per device, reverse offload regions are processed serial such that

s/serial/serially/ ?

> +      the next reverse offload region is only executed after the previous
> +      one returns.
> +@item OpenMP code that has a requires directive with @code{unified_address}
> +      or @code{unified_shared_memory} will remove any nvptx device from the
> +      list of available devices (``host fallback'').
>  @end itemize

Otherwise LGTM

	Jakub
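
For readers following the thread, the "host fallback" items discussed above can be observed from user code. The following is only a minimal sketch, not part of the patch: the helper names count_offload_devices and target_ran_on_host are hypothetical, and the _OPENMP guards are an assumption added so that the file also builds without OpenMP enabled. With the documented behaviour, a GCN or nvptx device that cannot satisfy the requirement is expected to be dropped from the device list, so the target region runs on the host.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Request unified shared memory for this translation unit.  Per the
   documentation text in this thread, devices that cannot provide it are
   removed from the list of available devices.  */
#pragma omp requires unified_shared_memory

/* Hypothetical helper: number of non-host devices that remain available.  */
int count_offload_devices(void)
{
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;  /* No OpenMP runtime: only the host exists.  */
#endif
}

/* Hypothetical helper: returns nonzero if a target region ran on the host.  */
int target_ran_on_host(void)
{
  int on_host = 1;
#ifdef _OPENMP
  #pragma omp target map(from: on_host)
  on_host = omp_is_initial_device();
#endif
  /* With no device left, the region must have fallen back to the host.  */
  assert(count_offload_devices() > 0 || on_host);
  return on_host;
}
```

Calling target_ran_on_host() from a small main function and printing the result, after building with gcc -fopenmp, should show whether the region was offloaded or ran on the host on a given system.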
  

Patch

libgomp.texi: Reverse-offload updates

libgomp/
	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
	(GCN): Add item about 'omp requires'.
	(nvptx): Likewise; add item about reverse offload.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index efa7d956a33..e9ab079ecf5 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,8 +192,8 @@  The OpenMP 4.5 specification is fully supported.
       env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-      @tab complete but no non-host devices provides @code{unified_address},
-      @code{unified_shared_memory} or @code{reverse_offload}
+      @tab complete but no non-host devices provides @code{unified_address} or
+      @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab Y @tab
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -228,7 +228,7 @@  The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-      @tab Y @tab See comment for @code{requires}
+      @tab Y @tab Host fallback with GCN devices
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
       @tab N @tab
@@ -288,7 +288,7 @@  The OpenMP 4.5 specification is fully supported.
       @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
-@item @code{assume} directive @tab Y @tab
+@item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
@@ -4455,6 +4455,9 @@  The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
       using the C library @code{printf} functions and the Fortran
       @code{print}/@code{write} statements.
+@item OpenMP code that has a requires directive with @code{unified_address},
+      @code{unified_shared_memory} or @code{reverse_offload} will remove
+      any GCN device from the list of available devices (``host fallback'').
 @end itemize
 
 
@@ -4504,6 +4507,13 @@  The implementation remark:
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
       requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
       is not supported.
+@item For code containing reverse offload (i.e. @code{target} regions with
+      @code{device(ancestor:1)}), there is a slight performance penalty
+      for @emph{all} target regions, consisting mostly of shutdown delay
+      between zero and one microsecond and a tiny device querying overhead.
+@item OpenMP code that has a requires directive with @code{unified_address}
+      or @code{unified_shared_memory} will remove any nvptx device from the
+      list of available devices (``host fallback'').
 @end itemize
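
As an illustration of what the nvptx item calls reverse offload, here is a minimal sketch of a target region that contains a nested region with device(ancestor:1). It is not taken from the patch or the GCC testsuite: the function name reverse_offload_increment is hypothetical, and the sketch assumes a compiler that implements the ancestor modifier. Without OpenMP enabled, the pragmas are ignored and the code simply runs on the host.

```c
#include <assert.h>

/* Declare that this translation unit uses reverse offload; this has to
   appear before the first target construct.  */
#pragma omp requires reverse_offload

/* Hypothetical example function: increments VALUE inside a reverse-offload
   region and returns the result.  */
int reverse_offload_increment(int value)
{
  int result = value;

  /* Outer region: runs on the default device, or on the host as fallback.  */
  #pragma omp target map(tofrom: result)
  {
    /* Nested region: executed on the parent (host) device, even when the
       enclosing region runs on an offload device.  */
    #pragma omp target device(ancestor: 1) map(tofrom: result)
    {
      result += 1;
    }
  }

  /* The outcome should not depend on where the outer region ran.  */
  assert(result == value + 1);
  return result;
}
```

For example, reverse_offload_increment(41) is expected to return 42 regardless of whether the outer region was offloaded. According to the nvptx item in the diff above, building for that target additionally needs at least -march=sm_35.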