[1/5] openmp: Add -foffload-memory

Message ID 20220308113059.688551-2-abidh@codesourcery.com
State New
Series openmp: Handle pinned and unified shared memory.

Commit Message

Abid Qadeer March 8, 2022, 11:30 a.m. UTC
  From: Andrew Stubbs <ams@codesourcery.com>

Add a new option.  It will be used in follow-up patches.

gcc/ChangeLog:

	* common.opt: Add -foffload-memory and its enum values.
	* coretypes.h (enum offload_memory): New.
	* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt      | 16 ++++++++++++++++
 gcc/coretypes.h     |  7 +++++++
 gcc/doc/invoke.texi | 16 +++++++++++++++-
 3 files changed, 38 insertions(+), 1 deletion(-)
  

Comments

Thomas Schwinge Feb. 13, 2023, 2:38 p.m. UTC | #1
Hi!

On 2022-03-08T11:30:55+0000, Hafiz Abid Qadeer <abidh@codesourcery.com> wrote:
> From: Andrew Stubbs <ams@codesourcery.com>
>
> Add a new option.  It will be used in follow-up patches.

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

> +@option{-foffload-memory=pinned} forces all host memory to be pinned (this
> +mode may require the user to increase the ulimit setting for locked memory).

So, this is currently implemented via 'mlockall', which, as discussed,
(a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it's
meant to achieve (because it doesn't register the page-locked memory with
the GPU driver).
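For illustration, here's a minimal sketch of what "pin everything" via 'mlockall' amounts to, assuming a Linux/POSIX host. It is not the libgomp code, and 'try_pin_all_host_memory' is a hypothetical name. It shows both problems: the call is bounded by RLIMIT_MEMLOCK (the limit behind 'ulimit -l'), and nothing in it tells a GPU driver about the locked pages.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: lock every current and future page of the
   process in RAM.  This page-locks memory in the host kernel only;
   it does not register anything with a GPU driver.  */
static int
try_pin_all_host_memory (void)
{
  struct rlimit rl;

  /* 'ulimit -l' shows this limit in kilobytes; getrlimit reports bytes.  */
  if (getrlimit (RLIMIT_MEMLOCK, &rl) == 0)
    {
      if (rl.rlim_cur == RLIM_INFINITY)
        fprintf (stderr, "RLIMIT_MEMLOCK: unlimited\n");
      else
        fprintf (stderr, "RLIMIT_MEMLOCK: %llu bytes\n",
                 (unsigned long long) rl.rlim_cur);
    }

  /* MCL_FUTURE also covers later mappings (heap growth, new stacks).  */
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    {
      /* Typically ENOMEM (or EPERM) when the limit is too small.  */
      fprintf (stderr, "mlockall failed: %s\n", strerror (errno));
      return -1;
    }
  return 0;
}
```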

So one idea was to re-purpose the unified shared memory
'gcc/omp-low.cc:pass_usm_transform' (compiler pass that "changes calls to
malloc/free/calloc/realloc and operator new to memory allocation
functions in libgomp with allocator=ompx_unified_shared_mem_alloc"),
<https://inbox.sourceware.org/gcc-patches/20220308113059.688551-5-abidh@codesourcery.com>.
(I have not yet looked into that in detail.)

Here's now a different idea.  As '-foffload-memory=pinned', per the name
of the option, concerns itself with memory used in offloading but not
host execution generally, why are we actually attempting to "[force] all
host memory to be pinned" -- why not just the memory that's being used
with offloading?  That is, if '-foffload-memory=pinned' is set, register
as page-locked with the GPU driver all memory that appears in OMP
offloading data regions, such as OpenMP 'target' 'map' clauses etc.  That
way, this is directed at the offloading data transfers, as intended, but
at the same time we don't "waste" page-locked memory for generic host
memory allocations.  What do you think?  You've spent a lot more time
on this topic than I have, so it's quite possible that I'm missing some
"details".
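A minimal sketch of what the proposal would target, using ordinary OpenMP offloading code. Under the proposal (not what the patch implements), only the array sections named in the 'map' clauses, x[0:n] and y[0:n], would be registered as page-locked with the GPU driver; other heap, stack, or static data would stay pageable. 'saxpy' is only an illustrative function. Without -fopenmp the pragma is ignored, and without an offload device the region falls back to the host.

```c
#include <stddef.h>

/* Illustrative only.  Under the proposal, the sections named in the map
   clauses (x[0:n] and y[0:n]) would be the memory registered as
   page-locked; nothing else in the process would be.  */
static void
saxpy (size_t n, float a, const float *x, float *y)
{
#pragma omp target teams distribute parallel for \
  map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```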


Grüße
 Thomas
  
Andrew Stubbs Feb. 13, 2023, 3:20 p.m. UTC | #2
On 13/02/2023 14:38, Thomas Schwinge wrote:
> Here's now a different idea.  As '-foffload-memory=pinned', per the name
> of the option, concerns itself with memory used in offloading but not
> host execution generally, why are we actually attempting to "[force] all
> host memory to be pinned" -- why not just the memory that's being used
> with offloading?  That is, if '-foffload-memory=pinned' is set, register
> as page-locked with the GPU driver all memory that appears in OMP
> offloading data regions, such as OpenMP 'target' 'map' clauses etc.  That
> way, this is directed at the offloading data transfers, as intended, but
> at the same time we don't "waste" page-locked memory for generic host
> memory allocations.  What do you think?  You've spent a lot more time
> on this topic than I have, so it's quite possible that I'm missing some
> "details".

The main reason it is the way it is, is that in general it's not
possible to know, at the time memory is allocated, whether it is going
to be offloaded (and stack/static memory doesn't go through an
allocation call at all).

If there's a way to pin it after the fact, then maybe that's not a
terrible idea? The downside is that the memory might already have been
paged out at that point, and we'd have to track what we'd previously 
pinned, or else re-pin it every time we launch a kernel. We'd also have 
no way to unpin previously pinned memory (not that that's relevant to 
the "lock all" case).
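To make the bookkeeping concern concrete, here's a minimal sketch of "pin after the fact" with a table of already-locked ranges, assuming a POSIX host with 'mlock'. 'pin_if_needed' and the fixed-size table are hypothetical; a real implementation would also have to register each range with the GPU driver, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PINNED 64   /* Hypothetical fixed table size.  */

struct pinned_range
{
  uintptr_t start, end;   /* Page-aligned; end is exclusive.  */
};

static struct pinned_range pinned[MAX_PINNED];
static int n_pinned;

/* Lock the pages covering [PTR, PTR + LEN) unless an earlier call
   already locked a range containing them.  Returns 1 if newly locked,
   0 if already covered, -1 on failure.  */
static int
pin_if_needed (const void *ptr, size_t len)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t start = (uintptr_t) ptr & ~(page - 1);
  uintptr_t end = ((uintptr_t) ptr + len + page - 1) & ~(page - 1);

  for (int i = 0; i < n_pinned; i++)
    if (pinned[i].start <= start && end <= pinned[i].end)
      return 0;   /* Locked by an earlier kernel launch.  */

  if (n_pinned == MAX_PINNED)
    return -1;

  /* mlock makes paged-out pages resident again.  Partially overlapping
     ranges are simply locked again; Linux memory locks do not stack.  */
  if (mlock ((const void *) start, end - start) != 0)
    return -1;

  pinned[n_pinned].start = start;
  pinned[n_pinned].end = end;
  n_pinned++;
  return 1;
}
```

Note that nothing here ever unlocks a range, which matches the point above about having no good moment to unpin.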

My original plan was to use omp_alloc for both the standard OpenMP 
support and the -foffload-memory option (to get the benefit of pinning 
without modifying any source), but then I decided that the mlockall 
option was much less invasive. This is still the best way to implement
target-independent pinning when there's no driver registration option.
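For comparison, requesting pinned memory explicitly through the standard OpenMP 5.x allocator traits looks roughly like this sketch. It is an assumption-laden illustration, not the approach taken in the patch: 'pinned_alloc' and 'pinned_free' are hypothetical wrappers, whether a given libgomp honors 'omp_atk_pinned' depends on its version, and the code falls back to plain malloc/free when built without -fopenmp or when the allocator cannot be created.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#ifdef _OPENMP
/* Created on first use; stays omp_null_allocator if the runtime
   rejects the traits, in which case plain malloc/free are used.
   Not thread-safe: a real wrapper would guard the initialization.  */
static omp_allocator_handle_t pinned_allocator = omp_null_allocator;
static int pinned_allocator_ready;
#endif

/* Hypothetical wrapper: ask the OpenMP runtime for pinned host memory.  */
static void *
pinned_alloc (size_t size)
{
#ifdef _OPENMP
  if (!pinned_allocator_ready)
    {
      omp_alloctrait_t traits[2] = {
        { omp_atk_pinned, omp_atv_true },
        /* If pinning is not possible, hand out ordinary host memory.  */
        { omp_atk_fallback, omp_atv_default_mem_fb }
      };
      pinned_allocator
        = omp_init_allocator (omp_default_mem_space, 2, traits);
      pinned_allocator_ready = 1;
    }
  if (pinned_allocator != omp_null_allocator)
    return omp_alloc (size, pinned_allocator);
#endif
  return malloc (size);
}

/* Release memory obtained from pinned_alloc.  */
static void
pinned_free (void *ptr)
{
#ifdef _OPENMP
  if (pinned_allocator != omp_null_allocator)
    {
      omp_free (ptr, pinned_allocator);
      return;
    }
#endif
  free (ptr);
}
```

The trade-off is the one described above: this only pins what the source explicitly allocates this way, whereas 'mlockall' needs no source changes.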

Andrew
  

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..17426523e23 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2182,6 +2182,22 @@  Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 08b9ac9094c..dd52d5bb113 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -206,6 +206,13 @@  enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 248ed534aee..d16019fc8c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@  in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
--foffload=@var{arg}  -foffload-options=@var{arg} @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2694,6 +2694,20 @@  Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
 @end smallexample
 
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming