Update mmap() flags and errors lists

Message ID xnzfsxze5q.fsf@greed.delorie.com
State Changes Requested
Series Update mmap() flags and errors lists

Checks

Context                                            Check    Description
redhat-pt-bot/TryBot-apply_patch                   success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                         success  Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64   success  Testing passed

Commit Message

DJ Delorie May 10, 2024, 6:59 p.m. UTC
  [DJ - information taken from various sources, including man pages
(which I read, summarized in my notes, ignored for a while, then
rewrote from my notes and kernel sources - "how to take advantage of
bad memory" ;) and kernel sources (linux and hurd).  I contemplated
adding a table cross-referencing each flag with the kernels that
support them and versions introduced, but decided that was too much
work and detail for the results desired.]

[patch starts here]

Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)

Extend the list of errno codes.
  

Comments

Florian Weimer June 4, 2024, 10:16 p.m. UTC | #1
* DJ Delorie:

> [DJ - information taken from various sources, including man pages
> (which I read, summarized in my notes, ignored for a while, then
> rewrote from my notes and kernel sources - "how to take advantage of
> bad memory" ;) and kernel sources (linux and hurd).  I contemplated
> adding a table cross-referencing each flag with the kernels that
> support them and versions introduced, but decided that was too much
> work and detail for the results desired.]
>
> [patch starts here]
>
> Extend the list of MAP_* macros to include all macros available
> to the average program (gcc -E -dM | grep MAP_*)
>
> Extend the list of errno codes.
>
> diff --git a/manual/llio.texi b/manual/llio.texi
> index fae49d1433..2086e04afd 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1573,10 +1573,15 @@ permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
>  of address space for future use.  The @code{mprotect} function can be
>  used to change the protection flags.  @xref{Memory Protection}.
>  
> -@var{flags} contains flags that control the nature of the map.
> -One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
> +@var{flags} contains flags that control the nature of the map.  One of
> +@code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or @code{MAP_PRIVATE}
> +must be specified.  Additional flags may be bitwise OR'd to further
> +define the mapping.

While you are adding this, please avoid starting a sentence with @var,
so something like:

  [The] @var{flags} [parameter] contains …

> -They include:
> +Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
> +all flags are supported on all versions of all operating systems.
> +Consult the kernel-specific documenation for details.  The flags
> +include:

typo: documen[t]ation

> +@item MAP_SHARED_VALIDATE
> +Similar to @code{MAP_SHARED} except that additional flags will be
> +validated by the kernel, and the call will fail if an unrecognized
> +flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
> +that doesn't support it causes the flag to be ignored.
> +@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
> +flags is required.

This leads to the question what to do if you want this checking behavior
with MAP_PRIVATE instead of MAP_SHARED.

> +
>  @item MAP_FIXED
>  This forces the system to use the exact mapping address specified in
> -@var{address} and fail if it can't.
> +@var{address} and fail if it can't.  Note that if the new mapping
> +would overlap an existing mapping, the existing map is unmapped.

This is misleading, I believe.  The overlapping part is replaced with
the new mapping.  If the overlap is incomplete, part of the previous
mapping remains.
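
The partial-overlap behavior described here can be checked with a small experiment.  What follows is only a minimal sketch, assuming Linux and anonymous private mappings; the helper name first_page_survives_map_fixed is hypothetical and not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the non-overlapped page of an older mapping survives a
   MAP_FIXED mapping placed over its second page, 0 if it does not,
   and -1 if one of the mmap calls fails.  */
static int
first_page_survives_map_fixed (void)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  assert (pagesize > 0);
  size_t page = (size_t) pagesize;

  /* Old mapping: two anonymous pages, both filled with 'A'.  */
  char *old = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (old == MAP_FAILED)
    return -1;
  memset (old, 'A', 2 * page);

  /* New mapping: exactly one page, forced on top of the second page.  */
  char *repl = mmap (old + page, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (repl == MAP_FAILED)
    {
      munmap (old, 2 * page);
      return -1;
    }

  /* The first page still holds the old data; the replaced page is a
     fresh, zero-filled anonymous page.  */
  int ok = (repl == old + page && old[0] == 'A' && repl[0] == 0);
  munmap (old, 2 * page);
  return ok;
}
```

Only the overlapped page is replaced in this sketch; the first page of the older mapping stays mapped with its contents intact.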

> +@item MAP_HUGE_16KB
> +@dots{}
> +@item MAP_HUGE_16GB
> +Some architectures support more than one size of ``huge'' pages for
> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
> +them.  Note that while the ABI allows the caller to specify arbitrary
> +page sizes, not all sizes have corresponding defined macros, and not
> +all defined macros correspond to sizes supported by the kernel.  It is
> +up to the programmer to only ask for huge page sizes that are known to
> +be supported.

These we do not support?  (We probably should.)

> +@item MAP_32BIT
> +Require addresses that can be accessed with a 32 bit pointer, i.e.,
> +within the first 4 GiB.  Ignored if MAP_FIXED is specified.
> +
> +@item MAP_DENYWRITE
> +@item MAP_EXECUTABLE
> +@item MAP_FILE
> +
> +Provided for compatibility.  Ignored by the Linux kernel.

I thought that some corner cases still handle MAP_DENYWRITE?

> +@item MAP_FIXED_NOREPLACE
> +Similar to @code{MAP_FIXED} except the call will fail with
> +@code{EEXIST} if the new mapping would overwrite an existing mapping.

How does this interact with MAP_SHARED_VALIDATE above?  Can it be
combined with MAP_FIXED?

> +@item MAP_GROWSDOWN
> +This flag is used to make stacks, and is typically only needed inside
> +the program loader to set up the main stack and thread stacks for the
> +running process.  The mapping is created according to the other flags,
> +except an additional page just prior to the mapping is marked as a
> +``guard page''.  If a write is attempted inside this guard page, that
> +page is mapped, the mapping is extended, and a new guard page is
> +created.  Thus, the mapping continues to grow towards lower addresses
> +until it encounters some other mapping.

Maybe reference -fstack-clash-protection, and note that @theglibc{} does
not use this for thread stacks?

> +@item MAP_LOCKED
> +Requests that mapped pages are locked in memory (i.e. not paged out).
> +Note that this is a request and not a requirement; use @code{mlock} if
> +locking is mandatory.
> +
> +@item MAP_POPULATE
> +@item MAP_NONBLOCK
> +These two are opposites.  @code{MAP_POPULATE} requests that the kernel
> +read-ahead a file-backed mapping, causing more pages to be mapped
> +before they're needed.  @code{MAP_NONBLOCK} requests that the kernel
> +@emph{not} attempt such, only mapping pages when they're actually
> +needed.

MAP_POPULATE is just a hint, right?  And even with mlockall, or
MAP_LOCKED, it does not guarantee the absence of future page faults.

> +@item MAP_NORESERVE
> +Asks the kernel to not reserve physical backing for a mapping.  This
> +would be useful for, for example, a very large but sparsely used
> +mapping which need not be limited in span by available RAM or swap.
> +Note that writes to such a mapping may cause a @code{SIGSEGV} if the
> +amount of backing required eventualy exceeds system resources.
> +
> +On Linux, this flag's behavior may be overwridden by
> +@code{/proc/sys/vm/overcommit_memory} as documented in swap(5).

Should @xref the man-pages section added in the other patch.  However,
swap(5) does not appear to exist?

> +@item MAP_STACK
> +Ensures that the resulting mapping is suitable for use as a program
> +stack.  For example, the use of huge pages might be precluded.
> +
> +@item MAP_SYNC
> +This flag is used to map persistent memory devices into the running
> +program in such a way that writes to the mapping are immediately
> +written to the device as well.  Unlike most other flags, this one will
> +fail unless @code{MAP_SHARED_VALIDATE} is also given.

Is this about DAX?

> +@item MAP_UNINITIALIZED
> +This flag allows the kernel to map anonymous pages without zeroing
> +them out first.  This is, of course, a security risk, and will only
> +work if the kernel is built to allow it (typically, on single-process
> +embedded systems).
>  
>  @end vtable
>  
> @@ -1655,6 +1735,24 @@ Possible errors include:
>  
>  @table @code
>  
> +@item EACCES
> +
> +@var{filedes} was not open for the type of access specified in @var{protect}.
> +
> +@item EAGAIN
> +
> +Either the underlying file is locked, or the system has temporarily
> +run out of resources.

See below, I think the reference about locking is spurious.

> +@item EBADF
> +
> +The @var{fd} passes is invalid, and a valid file descriptor is required.

Is a file descriptor ever required?

> +@item EEXIST
> +
> +@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
> +found in the requested address range.

See my comment above for MAP_FIXED_NOREPLACE.

>  @item EINVAL
>  
>  Either @var{address} was unusable (because it is not a multiple of the
> @@ -1663,28 +1761,35 @@ applicable page size), or inconsistent @var{flags} were given.
>  If @code{MAP_HUGETLB} was specified, the file or system does not support
>  large page sizes.
>  
> -@item EACCES
> +@item ENFILE
>  
> -@var{filedes} was not open for the type of access specified in @var{protect}.
> +There are too many open files in the system.

Can this error actually happen?  It's a bit surprising.

> +@item ENODEV
> +
> +This file is of a type that doesn't support mapping.
>  
>  @item ENOMEM
>  
>  Either there is not enough memory for the operation, or the process is
>  out of address space.

This should probably reference vm.max_map_count.

> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
> -@c However mandatory locks are not discussed in this manual.

Mandatory locks are disabled in pretty much all kernels out there, no?

> +@item EOVERFLOW
> +
> +Either the offset into the file causes the page counts to exceed the
> +range of a 32 bit number, or the offset requested exceeds the length
> +of the file.

The reference to page size may be incorrect.  I think it's a fixed
offset regardless of page size on systems that can't pass a 64-bit file
offset.

> +@item ETXTBSY
> +
> +@code{MAP_DENYWRITE} was specified, but the file descriptor given was
> +open for writing.

This seems to contradict the earlier suggestion that MAP_DENYWRITE is
ignored.

Thanks,
Florian
  
DJ Delorie June 5, 2024, 4:10 a.m. UTC | #2
v2 will follow...

Florian Weimer <fweimer@redhat.com> writes:
> While you are adding this, please avoid starting a sentence with @var,
> so something like:
>
>   [The] @var{flags} [parameter] contains …

Fixed.

>> -They include:
>> +Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
>> +all flags are supported on all versions of all operating systems.
>> +Consult the kernel-specific documenation for details.  The flags
>> +include:
>
> typo: documen[t]ation

Fixed.

>> +@item MAP_SHARED_VALIDATE
>> +Similar to @code{MAP_SHARED} except that additional flags will be
>> +validated by the kernel, and the call will fail if an unrecognized
>> +flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
>> +that doesn't support it causes the flag to be ignored.
>> +@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
>> +flags is required.
>
> This leads to the question what to do if you want this checking behavior
> with MAP_PRIVATE instead of MAP_SHARED.

I didn't write the spec ;-)

>>  @item MAP_FIXED
>>  This forces the system to use the exact mapping address specified in
>> -@var{address} and fail if it can't.
>> +@var{address} and fail if it can't.  Note that if the new mapping
>> +would overlap an existing mapping, the existing map is unmapped.
>
> This is misleading, I believe.  The overlapping part is replaced with
> the new mapping.  If the overlap is incomplete, part of the previous
> mapping remains.

Reworded.

>> +@item MAP_HUGE_16KB
>> +@dots{}
>> +@item MAP_HUGE_16GB
>> +Some architectures support more than one size of ``huge'' pages for
>> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
>> +them.  Note that while the ABI allows the caller to specify arbitrary
>> +page sizes, not all sizes have corresponding defined macros, and not
>> +all defined macros correspond to sizes supported by the kernel.  It is
>> +up to the programmer to only ask for huge page sizes that are known to
>> +be supported.
>
> These we do not support?  (We probably should.)

The ABI is a 6-bit bitfield giving the biased bit width of the page
size.  Not all combinations have macros, and not all combinations are
honored by the kernel.  We do have macros for the combinations that the
kernel honors.  So if the kernel can do it, it works, but if the kernel
can't do it, either you get a runtime error or a compile time error ;-)

I suppose we could list all 64 possible macros in our headers, but at
the moment, we don't.
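
As a rough illustration of the bitfield described above, the sketch below builds a huge-page size selector by hand.  It assumes the Linux encoding, in which the 6-bit field holds the base-2 logarithm of the page size (so 2 MiB is 21); the fallback values for MAP_HUGE_SHIFT and MAP_HUGE_MASK are assumptions used only if the headers do not provide them, and huge_page_flag is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
# define MAP_HUGE_SHIFT 26      /* Assumed Linux value.  */
#endif
#ifndef MAP_HUGE_MASK
# define MAP_HUGE_MASK 0x3f     /* Assumed: 6-bit field.  */
#endif

/* Encode LOG2_SIZE (e.g. 21 for 2 MiB, 30 for 1 GiB) into the bits
   that would be OR'd with MAP_HUGETLB.  Whether the running kernel
   honors the resulting size is a separate question.  */
static unsigned int
huge_page_flag (unsigned int log2_size)
{
  assert (log2_size <= MAP_HUGE_MASK);
  return log2_size << MAP_HUGE_SHIFT;
}
```

A caller would pass something like MAP_HUGETLB | huge_page_flag (21) in the flags argument and must still be prepared for mmap to fail if that size is not configured.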

>> +@item MAP_32BIT
>> +Require addresses that can be accessed with a 32 bit pointer, i.e.,
>> +within the first 4 GiB.  Ignored if MAP_FIXED is specified.
>> +
>> +@item MAP_DENYWRITE
>> +@item MAP_EXECUTABLE
>> +@item MAP_FILE
>> +
>> +Provided for compatibility.  Ignored by the Linux kernel.
>
> I thought that some corner cases still handle MAP_DENYWRITE?

Nope, completely ignored by the kernel.

>> +@item MAP_FIXED_NOREPLACE
>> +Similar to @code{MAP_FIXED} except the call will fail with
>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>
> How does this interact with MAP_SHARED_VALIDATE above?  Can it be
> combined with MAP_FIXED?

Superset of MAP_FIXED, so it's internally *always* combined:
        /* force arch specific MAP_FIXED handling in get_unmapped_area */
        if (flags & MAP_FIXED_NOREPLACE)
                flags |= MAP_FIXED;

I would assume it interacts with MAP_SHARED_VALIDATE exactly as
documented.  Creates a shared fixed mapping, unless the kernel doesn't
support MAP_FIXED_NOREPLACE, then errors.

>> +@item MAP_GROWSDOWN
>> +This flag is used to make stacks, and is typically only needed inside
>> +the program loader to set up the main stack and thread stacks for the
>> +running process.  The mapping is created according to the other flags,
>> +except an additional page just prior to the mapping is marked as a
>> +``guard page''.  If a write is attempted inside this guard page, that
>> +page is mapped, the mapping is extended, and a new guard page is
>> +created.  Thus, the mapping continues to grow towards lower addresses
>> +until it encounters some other mapping.
>
> Maybe reference -fstack-clash-protection, and note that @theglibc{} does
> not use this for thread stacks?

I took out the thread stack text.

Added text about -fstack-clash-protection.

>> +@item MAP_LOCKED
>> +Requests that mapped pages are locked in memory (i.e. not paged out).
>> +Note that this is a request and not a requirement; use @code{mlock} if
>> +locking is mandatory.
>> +
>> +@item MAP_POPULATE
>> +@item MAP_NONBLOCK
>> +These two are opposites.  @code{MAP_POPULATE} requests that the kernel
>> +read-ahead a file-backed mapping, causing more pages to be mapped
>> +before they're needed.  @code{MAP_NONBLOCK} requests that the kernel
>> +@emph{not} attempt such, only mapping pages when they're actually
>> +needed.
>
> MAP_POPULATE is just a hint, right?  And even with mlockall, or
> MAP_LOCKED, it does not guarantee the absence of future page faults.

Correct, which is why I said "requests" but I'll add better text.

>> +@item MAP_NORESERVE
>> +Asks the kernel to not reserve physical backing for a mapping.  This
>> +would be useful for, for example, a very large but sparsely used
>> +mapping which need not be limited in span by available RAM or swap.
>> +Note that writes to such a mapping may cause a @code{SIGSEGV} if the
>> +amount of backing required eventualy exceeds system resources.
>> +
>> +On Linux, this flag's behavior may be overwridden by
>> +@code{/proc/sys/vm/overcommit_memory} as documented in swap(5).
>
> Should @xref the man-pages section added in the other patch.  However,
> swap(5) does not appear to exist?

Should be proc(5).  I tweaked the wording to not need a reference, I
think.  We do *not* want to accidentally include-by-reference
documentation on /proc or /sys, just the system calls.

>> +@item MAP_SYNC
>> +This flag is used to map persistent memory devices into the running
>> +program in such a way that writes to the mapping are immediately
>> +written to the device as well.  Unlike most other flags, this one will
>> +fail unless @code{MAP_SHARED_VALIDATE} is also given.
>
> Is this about DAX?

Yes.

>> +@item EAGAIN
>> +
>> +Either the underlying file is locked, or the system has temporarily
>> +run out of resources.
>
> See below, I think the reference about locking is spurious.

Based on kernel code:

        if (!mlock_future_ok(mm, vm_flags, len))
                return -EAGAIN;

>> +@item EBADF
>> +
>> +The @var{fd} passes is invalid, and a valid file descriptor is required.
>
> Is a file descriptor ever required?

If mapping a file, yes.  That's the default ;-)

>> +@item EEXIST
>> +
>> +@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
>> +found in the requested address range.
>
> See my comment above for MAP_FIXED_NOREPLACE.

Tweaked the wording.

>>  @item EINVAL
>>  
>>  Either @var{address} was unusable (because it is not a multiple of the
>> @@ -1663,28 +1761,35 @@ applicable page size), or inconsistent @var{flags} were given.
>>  If @code{MAP_HUGETLB} was specified, the file or system does not support
>>  large page sizes.
>>  
>> -@item EACCES
>> +@item ENFILE
>>  
>> -@var{filedes} was not open for the type of access specified in @var{protect}.
>> +There are too many open files in the system.
>
> Can this error actually happen?  It's a bit surprising.

No direct mention in the kernel sources, but the man page documents it.
Removed.

>> +@item ENODEV
>> +
>> +This file is of a type that doesn't support mapping.
>>  
>>  @item ENOMEM
>>  
>>  Either there is not enough memory for the operation, or the process is
>>  out of address space.
>
> This should probably reference vm.max_map_count.

Noted.

>> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
>> -@c However mandatory locks are not discussed in this manual.
>
> Mandatory locks are disabled in pretty much all kernels out there, no?

I wouldn't think we'd want to write documentation based on config
options; if you get this error, obviously the config option was set on ;-)

But I removed this comment because I mentioned locking (in general) in
the added EAGAIN entry.

>> +@item EOVERFLOW
>> +
>> +Either the offset into the file causes the page counts to exceed the
>> +range of a 32 bit number, or the offset requested exceeds the length
>> +of the file.
>
> The reference to page size may be incorrect.  I think it's a fixed
> offset regardless of page size on systems that can't pass a 64-bit file
> offset.

The code I was looking at was talking about having 2^32 *pages* mapped.

>> +@item ETXTBSY
>> +
>> +@code{MAP_DENYWRITE} was specified, but the file descriptor given was
>> +open for writing.
>
> This seems to contradict the earlier suggestion that MAP_DENYWRITE is
> ignored.

Kernel source says this is still returned if you try to map a swap file
for writing...  rewritten.
  
Florian Weimer June 5, 2024, 6:38 a.m. UTC | #3
* DJ Delorie:

>>> +@item MAP_HUGE_16KB
>>> +@dots{}
>>> +@item MAP_HUGE_16GB
>>> +Some architectures support more than one size of ``huge'' pages for
>>> +@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
>>> +them.  Note that while the ABI allows the caller to specify arbitrary
>>> +page sizes, not all sizes have corresponding defined macros, and not
>>> +all defined macros correspond to sizes supported by the kernel.  It is
>>> +up to the programmer to only ask for huge page sizes that are known to
>>> +be supported.
>>
>> These we do not support?  (We probably should.)
>
> The ABI is a 6-bit bitfield giving the biased bit width of the page
> size.  Not all combinations have macros, and not all combinations are
> honored by the kernel.  We do have macros for the combinations that the
> kernel honors.  So if the kernel can do it, it works, but if the kernel
> can't do it, either you get a runtime error or a compile time error ;-)
>
> I suppose we could list all 64 possible macros in our headers, but at
> the moment, we don't.

What I meant is that they aren't part of <sys/mman.h>.

>>> +@item MAP_FIXED_NOREPLACE
>>> +Similar to @code{MAP_FIXED} except the call will fail with
>>> +@code{EEXIST} if the new mapping would overwrite an existing mapping.
>>
>> How does this interact with MAP_SHARED_VALIDATE above?  Can it be
>> combined with MAP_FIXED?
>
> Superset of MAP_FIXED, so it's internally *always* combined:
>         /* force arch specific MAP_FIXED handling in get_unmapped_area */
>         if (flags & MAP_FIXED_NOREPLACE)
>                 flags |= MAP_FIXED;
>
> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
> documented.  Creates a shared fixed mapping, unless the kernel doesn't
> support MAP_FIXED_NOREPLACE, then errors.

The question is if MAP_FIXED_NOREPLACE can be silently treated as
MAP_FIXED.

>>> +@item EAGAIN
>>> +
>>> +Either the underlying file is locked, or the system has temporarily
>>> +run out of resources.
>>
>> See below, I think the reference about locking is spurious.
>
> Based on kernel code:
>
>         if (!mlock_future_ok(mm, vm_flags, len))
>                 return -EAGAIN;

Ahh, this refers to mlock/MAP_LOCKED?  Please say so.  We shouldn't
discuss mandatory locking support.

>>> +@item EBADF
>>> +
>>> +The @var{fd} passes is invalid, and a valid file descriptor is required.
>>
>> Is a file descriptor ever required?
>
> If mapping a file, yes.  That's the default ;-)

Ah, right.  Do we say anywhere that the fd argument is ignored for
MAP_ANONYMOUS?

>>> -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
>>> -@c However mandatory locks are not discussed in this manual.
>>
>> Mandatory locks are disabled in pretty much all kernels out there, no?
>
> I wouldn't think we'd want to write documentation based on config
> options; if you get this error, obviously the config option was set on ;-)
>
> But I removed this comment because I mentioned locking (in general) in
> the added EAGAIN entry.

Different kind of locking, see above.

Mandatory locking is generally considered a very bad idea.  I doubt any
distribution kernels enable it.  I'm worried that discussing it would be
misleading at best.

>>> +@item EOVERFLOW
>>> +
>>> +Either the offset into the file causes the page counts to exceed the
>>> +range of a 32 bit number, or the offset requested exceeds the length
>>> +of the file.
>>
>> The reference to page size may be incorrect.  I think it's a fixed
>> offset regardless of page size on systems that can't pass a 64-bit file
>> offset.
>
> The code I was looking at was talking about having 2^32 *pages* mapped.

See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
regardless of page size.

And it turns out that we do not check that the offset argument fits into
32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
matters).  The documentation implies we should return EOVERFLOW in this
case.

Thanks,
Florian
  
DJ Delorie June 5, 2024, 6:42 p.m. UTC | #4
Florian Weimer <fweimer@redhat.com> writes:
> What I meant is that they aren't part of <sys/mman.h>.

So they aren't.  We define the shift macros, though...

Removed.

>> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
>> documented.  Creates a shared fixed mapping, unless the kernel doesn't
>> support MAP_FIXED_NOREPLACE, then errors.
>
> The question is if MAP_FIXED_NOREPLACE can be silently treated as
> MAP_FIXED.

I wouldn't think so.  If the kernel supports it, it automatically
includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
treatment.

If you needed MAP_PRIVATE | MAP_FIXED_NOREPLACE I suppose you'd have to
do a test mapping with MAP_SHARED_VALIDATE | MAP_FIXED_NOREPLACE first,
to see if the running kernel supports it.  But this is the case for
*any* flag.

> Ahh, this refers to mlock/MAP_LOCKED?  Please say so.  We shouldn't
> discuss mandatory locking support.

Ok, I'll take out mentions of these locks.

I assume this is different than the mlock() and MAP_LOCKED locks, yes?

>>>> +@item EBADF
>>>> +
>>>> +The @var{fd} passes is invalid, and a valid file descriptor is required.
>>>
>>> Is a file descriptor ever required?
>>
>> If mapping a file, yes.  That's the default ;-)
>
> Ah, right.  Do we say anywhere that the fd argument is ignored for
> MAP_ANONYMOUS?

Yes.  I reworded it a bit.

@item MAP_ANONYMOUS
@itemx MAP_ANON
This flag tells the system to create an anonymous mapping, not connected
to a file.  @var{filedes} and @var{offset} are ignored, and the region is
initialized with zeros.
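
A minimal sketch of the behavior this entry describes, assuming Linux; the helper name anonymous_page_is_zeroed is hypothetical.  Although the file descriptor is ignored, the sketch passes -1, which is the conventional choice for portability.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a fresh anonymous page reads back as all zeros,
   0 if not, and -1 if the mapping could not be created.  */
static int
anonymous_page_is_zeroed (void)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  assert (pagesize > 0);

  /* No file is involved, so the descriptor is -1 and the offset 0.  */
  unsigned char *p = mmap (NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  int zeroed = 1;
  for (long i = 0; i < pagesize; i++)
    if (p[i] != 0)
      {
        zeroed = 0;
        break;
      }

  munmap (p, (size_t) pagesize);
  return zeroed;
}
```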

>>>> +@item EOVERFLOW
>>>> +
>>>> +Either the offset into the file causes the page counts to exceed the
>>>> +range of a 32 bit number, or the offset requested exceeds the length
>>>> +of the file.
>>>
>>> The reference to page size may be incorrect.  I think it's a fixed
>>> offset regardless of page size on systems that can't pass a 64-bit file
>>> offset.
>>
>> The code I was looking at was talking about having 2^32 *pages* mapped.
>
> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
> regardless of page size.
>
> And it turns out that we do not check that the offset argument fits into
> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
> matters).  The documentation implies we should return EOVERFLOW in this
> case.


	/* offset overflow? */
	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
		return -EOVERFLOW;

the man pages describe it thusly:

       EOVERFLOW
              On 32-bit architecture together with the large file
              extension (i.e., using 64-bit off_t): the number of pages
              used for length plus number of pages used for offset would
              overflow unsigned long (32 bits).

I reworded it to be a bit more correct and at the same time vague ;-)
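
To make the wraparound in the kernel snippet above concrete, here is a small userspace sketch of the same comparison.  It is not the kernel's code: uint32_t stands in for a 32-bit unsigned long, SKETCH_PAGE_SHIFT is a hypothetical constant assuming 4 KiB pages, and offset_overflows is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* Hypothetical: 4 KiB pages.  */

/* PGOFF is the file offset in pages, LEN the mapping length in bytes.
   Returns nonzero if the page offset of the end of the mapping wraps
   around a 32-bit unsigned value.  */
static int
offset_overflows (uint32_t pgoff, uint32_t len)
{
  /* Unsigned addition wraps modulo 2^32, so a sum smaller than PGOFF
     means the true result did not fit.  */
  uint32_t end = pgoff + (len >> SKETCH_PAGE_SHIFT);
  return end < pgoff;
}
```

With these assumptions, a page offset of 0xFFFFFFFF combined with a two-page length wraps and is reported as an overflow, while the same offset with a length below one page is not.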
  
Florian Weimer June 14, 2024, 8:14 a.m. UTC | #5
* DJ Delorie:

> Florian Weimer <fweimer@redhat.com> writes:
>> What I meant is that they aren't part of <sys/mman.h>.
>
> So they aren't.  We define the shift macros, though...
>
> Removed.
>
>>> I would assume it interacts with MAP_SHARED_VALIDATE exactly as
>>> documented.  Creates a shared fixed mapping, unless the kernel doesn't
>>> support MAP_FIXED_NOREPLACE, then errors.
>>
>> The question is if MAP_FIXED_NOREPLACE can be silently treated as
>> MAP_FIXED.
>
> I wouldn't think so.  If the kernel supports it, it automatically
> includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
> You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
> treatment.

Ahh, so the application has to check if the returned address changed and unmap?

>>> The code I was looking at was talking about having 2^32 *pages* mapped.
>>
>> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
>> regardless of page size.
>>
>> And it turns out that we do not check that the offset argument fits into
>> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
>> matters).  The documentation implies we should return EOVERFLOW in this
>> case.
>
>
> 	/* offset overflow? */
> 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
> 		return -EOVERFLOW;

This is kernel code.  We have a shift in userspace on some
architectures, too.

Thanks,
Florian
  
DJ Delorie June 14, 2024, 4:40 p.m. UTC | #6
Florian Weimer <fweimer@redhat.com> writes:
>>> The question is if MAP_FIXED_NOREPLACE can be silently treated as
>>> MAP_FIXED.
>>
>> I wouldn't think so.  If the kernel supports it, it automatically
>> includes MAP_FIXED.  If the kernel doesn't support it, it's ignored.
>> You'd have to pass MAP_FIXED | MAP_FIXED_NOREPLACE to get the "silent"
>> treatment.
>
> Ahh, so the application has to check if the returned address changed and unmap?

I suppose that's one way of auto-detecting it, yes.
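
That auto-detection could look roughly like the sketch below.  It assumes Linux and an anonymous private mapping; map_at_without_replacing is a hypothetical helper, and the fallback value for MAP_FIXED_NOREPLACE is an assumption (the generic Linux value) used only when the headers lack the macro.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
# define MAP_FIXED_NOREPLACE 0x100000   /* Assumed generic Linux value.  */
#endif

/* Try to map LEN bytes at exactly ADDR without replacing anything.
   Returns ADDR on success and MAP_FAILED otherwise.  */
static void *
map_at_without_replacing (void *addr, size_t len)
{
  assert (addr != NULL);

  void *p = mmap (addr, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
  if (p == MAP_FAILED)
    return MAP_FAILED;          /* A supporting kernel sets EEXIST here.  */

  if (p != addr)
    {
      /* The kernel ignored the flag and treated ADDR as a hint, so
         undo the mapping and report the same error a newer kernel
         would have returned.  */
      munmap (p, len);
      errno = EEXIST;
      return MAP_FAILED;
    }
  return p;
}
```

Because MAP_FIXED is deliberately not passed, a kernel that does not know the flag can at worst place the mapping elsewhere; it never replaces an existing mapping.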

>>>> The code I was looking at was talking about having 2^32 *pages* mapped.
>>>
>>> See MMAP2_PAGE_UNIT.  On most architectures, it's fixed at 4096,
>>> regardless of page size.
>>>
>>> And it turns out that we do not check that the offset argument fits into
>>> 32 bits after dividing by MMAP2_PAGE_UNIT (on targets where this
>>> matters).  The documentation implies we should return EOVERFLOW in this
>>> case.
>>
>>
>> 	/* offset overflow? */
>> 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
>> 		return -EOVERFLOW;
>
> This is kernel code.  We have a shift in userspace on some
> architectures, too.

That shouldn't affect the kernel's computation of that error code,
though?
  

Patch

diff --git a/manual/llio.texi b/manual/llio.texi
index fae49d1433..2086e04afd 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1573,10 +1573,15 @@  permitted.  They include @code{PROT_READ}, @code{PROT_WRITE}, and
 of address space for future use.  The @code{mprotect} function can be
 used to change the protection flags.  @xref{Memory Protection}.
 
-@var{flags} contains flags that control the nature of the map.
-One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
+@var{flags} contains flags that control the nature of the map.  One of
+@code{MAP_SHARED}, @code{MAP_SHARED_VALIDATE}, or @code{MAP_PRIVATE}
+must be specified.  Additional flags may be bitwise OR'd to further
+define the mapping.
 
-They include:
+Note that, aside from @code{MAP_PRIVATE} and @code{MAP_SHARED}, not
+all flags are supported on all versions of all operating systems.
+Consult the kernel-specific documenation for details.  The flags
+include:
 
 @vtable @code
 @item MAP_PRIVATE
@@ -1598,9 +1603,18 @@  Note that actual writing may take place at any time.  You need to use
 @code{msync}, described below, if it is important that other processes
 using conventional I/O get a consistent view of the file.
 
+@item MAP_SHARED_VALIDATE
+Similar to @code{MAP_SHARED} except that additional flags will be
+validated by the kernel, and the call will fail if an unrecognized
+flag is provided.  With @code{MAP_SHARED} using a flag on a kernel
+that doesn't support it causes the flag to be ignored.
+@code{MAP_SHARED_VALIDATE} should be used when the behavior of all
+flags is required.
+
 @item MAP_FIXED
 This forces the system to use the exact mapping address specified in
-@var{address} and fail if it can't.
+@var{address} and fail if it can't.  Note that if the new mapping
+would overlap an existing mapping, the existing map is unmapped.
 
 @c One of these is official - the other is obviously an obsolete synonym
 @c Which is which?
@@ -1638,13 +1652,79 @@  Not all file systems support mappings with an increased page size.
 
 The @code{MAP_HUGETLB} flag is specific to Linux.
 
-@c There is a mechanism to select different hugepage sizes; see
-@c include/uapi/asm-generic/hugetlb_encode.h in the kernel sources.
-
-@c Linux has some other MAP_ options, which I have not discussed here.
-@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
-@c user programs (and I don't understand the last two).  MAP_LOCKED does
-@c not appear to be implemented.
+@item MAP_HUGE_16KB
+@dots{}
+@item MAP_HUGE_16GB
+Some architectures support more than one size of ``huge'' pages for
+@code{MAP_HUGETLB}.  These flags allow the caller to choose amongst
+them.  Note that while the ABI allows the caller to specify arbitrary
+page sizes, not all sizes have corresponding defined macros, and not
+all defined macros correspond to sizes supported by the kernel.  It is
+up to the programmer to only ask for huge page sizes that are known to
+be supported.
+
+@item MAP_32BIT
+Require addresses that can be accessed with a 32 bit pointer, i.e.,
+within the first 4 GiB.  Ignored if MAP_FIXED is specified.
+
+@item MAP_DENYWRITE
+@item MAP_EXECUTABLE
+@item MAP_FILE
+
+Provided for compatibility.  Ignored by the Linux kernel.
+
+@item MAP_FIXED_NOREPLACE
+Similar to @code{MAP_FIXED} except the call will fail with
+@code{EEXIST} if the new mapping would overwrite an existing mapping.
+
+@item MAP_GROWSDOWN
+This flag is used to make stacks, and is typically only needed inside
+the program loader to set up the main stack and thread stacks for the
+running process.  The mapping is created according to the other flags,
+except an additional page just prior to the mapping is marked as a
+``guard page''.  If a write is attempted inside this guard page, that
+page is mapped, the mapping is extended, and a new guard page is
+created.  Thus, the mapping continues to grow towards lower addresses
+until it encounters some other mapping.
+
+@item MAP_LOCKED
+Requests that mapped pages be locked in memory (i.e., not paged out).
+Note that this is a request and not a requirement; use @code{mlock} if
+locking is mandatory.
+
+@item MAP_POPULATE
+@itemx MAP_NONBLOCK
+@code{MAP_POPULATE} requests that the kernel prefault the mapping; for
+a file-backed mapping this reads ahead, so pages are mapped before
+they're needed.  @code{MAP_NONBLOCK} is meaningful only together with
+@code{MAP_POPULATE} and requests that the kernel @emph{not} read ahead,
+populating only pages that are already in memory.
+
+@item MAP_NORESERVE
+Asks the kernel to not reserve physical backing for a mapping.  This
+is useful for, say, a very large but sparsely used mapping whose span
+need not be limited by available RAM or swap.  Note that writes to
+such a mapping may cause a @code{SIGSEGV} if the amount of backing
+required eventually exceeds system resources.
+
+On Linux, this flag's behavior may be overridden by
+@file{/proc/sys/vm/overcommit_memory} as documented in proc(5).
+
+@item MAP_STACK
+Ensures that the resulting mapping is suitable for use as a program
+stack.  For example, the use of huge pages might be precluded.
+
+@item MAP_SYNC
+This flag is used to map persistent memory devices into the running
+program in such a way that writes to the mapping are immediately
+written to the device as well.  It takes effect only together with
+@code{MAP_SHARED_VALIDATE}; with @code{MAP_SHARED} it is ignored.
+
+@item MAP_UNINITIALIZED
+This flag allows the kernel to map anonymous pages without zeroing
+them out first.  This is, of course, a security risk, and will only
+work if the kernel is built to allow it (typically, on single-process
+embedded systems).
 
 @end vtable
 
@@ -1655,6 +1735,24 @@  Possible errors include:
 
 @table @code
 
+@item EACCES
+
+@var{filedes} was not open for the type of access specified in @var{protect}.
+
+@item EAGAIN
+
+Either the underlying file is locked, or the system has temporarily
+run out of resources.
+
+@item EBADF
+
+The @var{filedes} passed is invalid, and a valid file descriptor is required.
+
+@item EEXIST
+
+@code{MAP_FIXED_NOREPLACE} was specified and an existing mapping was
+found in the requested address range.
+
 @item EINVAL
 
 Either @var{address} was unusable (because it is not a multiple of the
@@ -1663,28 +1761,35 @@  applicable page size), or inconsistent @var{flags} were given.
 If @code{MAP_HUGETLB} was specified, the file or system does not support
 large page sizes.
 
-@item EACCES
+@item ENFILE
 
-@var{filedes} was not open for the type of access specified in @var{protect}.
+There are too many open files in the system.
+
+@item ENODEV
+
+This file is of a type that doesn't support mapping.
 
 @item ENOMEM
 
 Either there is not enough memory for the operation, or the process is
 out of address space.
 
-@item ENODEV
-
-This file is of a type that doesn't support mapping.
-
 @item ENOEXEC
 
 The file is on a filesystem that doesn't support mapping.
 
-@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
-@c However mandatory locks are not discussed in this manual.
-@c
-@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented
-@c here) is used and the file is already open for writing.
+@item EPERM
+
+@item EOVERFLOW
+
+Either the offset into the file causes the page count to exceed the
+range of a 32-bit number, or the offset requested exceeds the length
+of the file.
+
+@item ETXTBSY
+
+@code{MAP_DENYWRITE} was specified, but the file descriptor given was
+open for writing.
 
 @end table