manual: Update documentation of strerror and related functions

Message ID 87ilbo7kju.fsf@oldenburg.str.redhat.com
State Superseded
Series manual: Update documentation of strerror and related functions

Checks

Context                                            Check    Description
redhat-pt-bot/TryBot-apply_patch                   success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                         success  Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm       success  Testing passed

Commit Message

Florian Weimer June 15, 2023, 4:32 p.m. UTC
The current implementation of strerror is thread-safe, but this
has implications for the lifetime of the return string.

Describe the strerror_l function.  Describe both variants of the
strerror_r function.  Mention the lifetime of the returned string
for strerrorname_np and strerrordesc_np.  Clarify that perror
output depends on the current locale.

---
 manual/errno.texi | 123 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 97 insertions(+), 26 deletions(-)


base-commit: 57cd52ecec8567dd1bd91779198710f61889cf25
  

Comments

Florian Weimer June 21, 2023, 9:54 a.m. UTC | #1
* Florian Weimer:

>  You should not modify the string returned by @code{strerror}.  Also, if
> +you make subsequent calls to @code{strerror} or @code{strerror_l}, or
> +the thread that obtained the string exits, the returned pointer will be
> +invalidated.

I should point out that invalidation of the pointer on a subsequent
strerror or strerror_l call is likely a defect because the standard does
not allow this behavior.  Should we fix it?  It will require maintaining
a list of allocations associated with the current thread.

Thanks,
Florian
  
Andreas Schwab June 21, 2023, 10:20 a.m. UTC | #2
On Jun 21 2023, Florian Weimer via Libc-alpha wrote:

> * Florian Weimer:
>
>>  You should not modify the string returned by @code{strerror}.  Also, if
>> +you make subsequent calls to @code{strerror} or @code{strerror_l}, or
>> +the thread that obtained the string exits, the returned pointer will be
>> +invalidated.
>
> I should point out that invalidation of the pointer on a subsequent
> strerror or strerror_l call is likely a defect because the standard does
> not allow this behavior.

Doesn't the standard contain almost exactly the same wording?
  
Florian Weimer June 21, 2023, 10:34 a.m. UTC | #3
* Andreas Schwab:

> On Jun 21 2023, Florian Weimer via Libc-alpha wrote:
>
>> * Florian Weimer:
>>
>>>  You should not modify the string returned by @code{strerror}.  Also, if
>>> +you make subsequent calls to @code{strerror} or @code{strerror_l}, or
>>> +the thread that obtained the string exits, the returned pointer will be
>>> +invalidated.
>>
>> I should point out that invalidation of the pointer on a subsequent
>> strerror or strerror_l call is likely a defect because the standard does
>> not allow this behavior.
>
> Doesn't the standard contain almost exactly the same wording?

Not in C11, it only speaks of overwriting the string during future
calls.  POSIX says that it can be invalidated (I was confused before by
the CX shading, which makes this difficult to read).

If POSIX gives us permission, this is probably good enough.  It still
does not match what's documented by the man-pages project.

Thanks,
Florian
  
Andreas Schwab June 21, 2023, 10:57 a.m. UTC | #4
On Jun 21 2023, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Jun 21 2023, Florian Weimer via Libc-alpha wrote:
>>
>>> * Florian Weimer:
>>>
>>>>  You should not modify the string returned by @code{strerror}.  Also, if
>>>> +you make subsequent calls to @code{strerror} or @code{strerror_l}, or
>>>> +the thread that obtained the string exits, the returned pointer will be
>>>> +invalidated.
>>>
>>> I should point out that invalidation of the pointer on a subsequent
>>> strerror or strerror_l call is likely a defect because the standard does
>>> not allow this behavior.
>>
>> Doesn't the standard contain almost exactly the same wording?
>
> Not in C11, it only speaks of overwriting the string during future
> calls.  POSIX says that it can be invalidated (I was confused before by
> the CX shading, which makes this difficult to read).

The new wording was added in the 2017 edition, which likely prompted the
corresponding change in C23.
  
Carlos O'Donell June 22, 2023, 7:45 p.m. UTC | #5
On 6/15/23 12:32, Florian Weimer via Libc-alpha wrote:
> The current implementation of strerror is thread-safe, but this
> has implications for the lifetime of the return string.

Agreed. Such implications persist in ISO C17, but have been corrected in the upcoming
version of the ISO C standard.

> Describe the strerror_l function.  Describe both variants of the
> strerror_r function.  Mention the lifetime of the returned string
> for strerrorname_np and strerrordesc_np.  Clarify that perror
> output depends on the current locale.

Looking forward to a v2 with fixes.
 
> ---
>  manual/errno.texi | 123 ++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 97 insertions(+), 26 deletions(-)
> 
> diff --git a/manual/errno.texi b/manual/errno.texi
> index 28dd871caa..f266bddb06 100644
> --- a/manual/errno.texi
> +++ b/manual/errno.texi
> @@ -1147,42 +1147,110 @@ name of the program that encountered the error.
>  
>  @deftypefun {char *} strerror (int @var{errnum})
>  @standards{ISO, string.h}
> -@safety{@prelim{}@mtunsafe{@mtasurace{:strerror}}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}
> -@c Calls strerror_r with a static buffer allocated with malloc on the
> -@c first use.
> +@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}

OK. Mark strerror() as MT-safe.

>  The @code{strerror} function maps the error code (@pxref{Checking for
>  Errors}) specified by the @var{errnum} argument to a descriptive error
> -message string.  The return value is a pointer to this string.
> +message string.  The string is translated according to the current
> +locale.  The return value is a pointer to this string.

OK. Add the caveat that the translation happens according to the current locale.
  
>  The value @var{errnum} normally comes from the variable @code{errno}.
>  
>  You should not modify the string returned by @code{strerror}.  Also, if
> -you make subsequent calls to @code{strerror}, the string might be
> -overwritten.  (But it's guaranteed that no library function ever calls
> -@code{strerror} behind your back.)
> +you make subsequent calls to @code{strerror} or @code{strerror_l}, or
> +the thread that obtained the string exits, the returned pointer will be
> +invalidated.

OK. Agreed, this is how it behaves because the memory may be freed in these cases. This
aligns with the currently proposed ISO C2x language.
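
To illustrate the lifetime rule discussed here, the following is a minimal
sketch of how a caller might cope with it: copy the message before calling
strerror again. The helper names save_error_message and describe_two_errors
are hypothetical and are not part of the patch or of glibc.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for ERRNUM into DST right away.
   The pointer returned by strerror is consumed before any further
   strerror or strerror_l call can invalidate it.  */
static void
save_error_message (int errnum, char *dst, size_t dstlen)
{
  snprintf (dst, dstlen, "%s", strerror (errnum));
}

/* Hypothetical helper: format two messages into OUT.  Each message is
   copied into its own buffer first, so the second strerror call cannot
   affect the first message.  */
static void
describe_two_errors (int first, int second, char *out, size_t outlen)
{
  char a[128];
  char b[128];
  save_error_message (first, a, sizeof a);
  save_error_message (second, b, sizeof b);
  snprintf (out, outlen, "%s; %s", a, b);
}
```

Holding both strerror results as raw pointers and printing them together
would rely on the first pointer staying valid across the second call, which
the new wording says it will not.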

> +
> +As there is no way to restore the previous state after calling
> +@code{strerror}, library code should not call this function because it
> +may intefere with application use of @code{strerror}, invalidating the
> +string pointer before the application is done using it.  Instead,
> +@code{strerror_r}, @code{snprintf} with the @samp{%m} or @samp{%#m}
> +specifiers, @code{strerrorname_np}, or @code{strerrordesc_np} can be
> +used instead.

OK. Agreed, good suggestions.
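
As a rough sketch of what such library code could look like, assuming a
glibc that provides strerrorname_np and strerrordesc_np (2.32 or later) and
that _GNU_SOURCE is defined before any header is included. The function
name format_error is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library helper: write "NAME (description)" for ERRNUM
   into DST without calling strerror, so that a pointer the application
   obtained from strerror is left alone.  Returns the snprintf result.  */
static int
format_error (int errnum, char *dst, size_t dstlen)
{
  /* Both functions return NULL for unknown error codes; the description
     is not translated.  */
  const char *name = strerrorname_np (errnum);
  const char *desc = strerrordesc_np (errnum);
  if (name == NULL || desc == NULL)
    return snprintf (dst, dstlen, "unknown error %d", errnum);
  return snprintf (dst, dstlen, "%s (%s)", name, desc);
}
```

The trade-off is that the description is always the untranslated text; code
that needs a localized message would use strerror_r or the %m conversion
instead.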

> +
> +The @code{strerror} function preserves the value of @code{errno} and
> +cannot fail.

OK. Agreed, after your change in 1d44530a5be2442e064baa48139adc9fdfb1fc6b we never
return NULL and fall back to an untranslated string.

>  
>  The function @code{strerror} is declared in @file{string.h}.
>  @end deftypefun
>  
> +@deftypefun {char *} strerror_l (int @var{errnum}, locale_t @var{locale})
> +@standards{POSIX, string.h}
> +@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}

OK.

> +This function is like @code{strerror}, except that the returned string
> +is translated according to @var{locale} (instead of the current locale
> +used by @code{strerror}).  Note that calling @code{strerror_l}
> +invalidates the pointer returned by @code{strerror} and vice versa.

OK.

> +
> +The function @code{strerror_l} is defined by POSIX and is declared in
> +@file{string.h}.


OK.

> +@end deftypefun
> +
>  @deftypefun {char *} strerror_r (int @var{errnum}, char *@var{buf}, size_t @var{n})
>  @standards{GNU, string.h}
>  @safety{@prelim{}@mtsafe{}@asunsafe{@ascuintl{}}@acunsafe{}}
> +The following description is for the GNU variant of the function,
> +used if @code{_GNU_SOURCE} is defined.  @xref{Feature Test Macros}.
> +
>  The @code{strerror_r} function works like @code{strerror} but instead of
> -returning the error message in a statically allocated buffer shared by
> -all threads in the process, it returns a private copy for the
> -thread.  This might be either some permanent global data or a message
> -string in the user supplied buffer starting at @var{buf} with the
> -length of @var{n} bytes.
> +returning a pointer to a string that is managed by @theglibc{}, it can
> +use the user supplied buffer starting at @var{buf} for storing the
> +string.

OK. Agreed, avoid discussing implementation details.

>  
> -At most @var{n} characters are written (including the NUL byte) so it is
> -up to the user to select a buffer large enough.
> +At most @var{n} characters are written (including the NUL byte) to
> +@var{buf}, so it is up to the user to select a buffer large enough.
> +Whether returned pointer points to the @var{buf} array or not depends on
> +the @var{errnum} argument.  If the result string is not stored in
> +@var{buf}, the string will does not change for the remaining execution
> +of the program.

s/will does/will/g
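
For readers unfamiliar with the GNU variant, a minimal sketch of the usage
pattern this paragraph implies: always use the returned pointer, because the
message may or may not have been stored in buf. This assumes _GNU_SOURCE is
defined before any header is included; the function name describe_error is
hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for ERRNUM into DST using the
   GNU variant of strerror_r.  Returns the snprintf result.  */
static int
describe_error (int errnum, char *dst, size_t dstlen)
{
  char buf[128];
  /* The result may point into BUF or to a string owned by the library,
     so use the returned pointer and never assume BUF was filled in.  */
  const char *msg = strerror_r (errnum, buf, sizeof buf);
  return snprintf (dst, dstlen, "%s", msg);
}
```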

>  
> -This function should always be used in multi-threaded programs since
> -there is no way to guarantee the string returned by @code{strerror}
> -really belongs to the last call of the current thread.
> +The function @code{strerror_r} as described above is a GNU extension and
> +it is declared in @file{string.h}.  There is a POSIX variant of this
> +function, described next.

OK.

> +@end deftypefun
>  
> -The function @code{strerror_r} is a GNU extension and it is declared in
> -@file{string.h}.
> +@deftypefun int strerror_r (int @var{errnum}, char *@var{buf}, size_t @var{n})

Need "@safety{@prelim{}@mtsafe{}@asunsafe{@ascuintl{}}@acunsafe{}}" to match.

> +@standards{GNU, string.h}
> +
> +This variant of the @code{strerror_r} function is used if a standard is
> +selected that includes @code{strerror_r}, but @code{_GNU_SOURCE} is not
> +defined.  This POSIX variant of thefunction always writes the error

s/thefunction/the function/g

> +message to the specified buffer @var{buf} of size @var{n} bytes.
> +
> +Upon success, @code{strerror_r} returns 0.  Two more return values are
> +used to indicate failure.
> +
> +@vtable @code
> +@item EINVAL
> +The @var{errnum} argument does not correspond to a known error constant.
> +
> +@item ERANGE
> +The buffer size @var{n} is not large enough to store the entire error message.
> +@end vtable
> +
> +Even if an error is reported occurs, @code{strerror_r} still writes as

s/occurs//g

> +much of the error message to the output buffer as possible.  After a
> +call to @code{strerror_r}, the value of @code{errno} is unspecified.

OK. The last sentence appears redundant because the standard already *defines* errno as
unspecified after a successful call, but that does not cover the failure case, where
errno is also unspecified, so you call that out.

> +
> +If you want to use the always-copying POSIX semantics of
> +@code{strerror_r} in a program that is potentially compiled with
> +@code{_GNU_SOURCE} defined, you can use @code{snprintf} with the
> +@samp{%m} conversion specifier, like this:
> +
> +@smallexample
> +int saved_errno = errno;
> +errno = errnum;
> +int ret = snprintf (buf, n, "%m");
> +errno = saved_errno;
> +if (strerrorname_np (errnum) == NULL)
> +  return EINVAL;
> +if (ret >= n)
> +  return ERANGE:
> +return 0;
> +@end smallexample
> +
> +This function is declared in @file{string.h} if it is declared at all.
> +It is a POSIX extension.

OK.
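
The quoted example is a fragment. A minimal sketch of a complete function
built from it follows, with the `return ERANGE:` typo corrected and the
signed/unsigned comparison made explicit. It assumes glibc with _GNU_SOURCE
defined before any header is included (for strerrorname_np and %m). The name
posix_like_strerror_r and the handling of a negative snprintf result are
assumptions, not part of the patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper with the always-copying POSIX semantics:
   returns 0, EINVAL or ERANGE and preserves errno.  */
static int
posix_like_strerror_r (int errnum, char *buf, size_t n)
{
  int saved_errno = errno;
  errno = errnum;
  /* %m formats the message for the current value of errno.  */
  int ret = snprintf (buf, n, "%m");
  errno = saved_errno;
  if (strerrorname_np (errnum) == NULL)
    return EINVAL;
  /* Assumption: treat an snprintf failure like a too-small buffer.  */
  if (ret < 0 || (size_t) ret >= n)
    return ERANGE;
  return 0;
}
```

As in the POSIX variant described above, the buffer still receives as much of
the message as fits when ERANGE is returned.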

>  @end deftypefun
>  
>  @deftypefun void perror (const char *@var{message})
> @@ -1212,7 +1280,8 @@ The function @code{perror} is declared in @file{stdio.h}.
>  @safety{@mtsafe{}@assafe{}@acsafe{}}
>  This function returns the name describing the error @var{errnum} or
>  @code{NULL} if there is no known constant with this value (e.g "EINVAL"
> -for @code{EINVAL}).
> +for @code{EINVAL}).  The returned string does not change for the
> +remaining execution of the program.

OK.

>  
>  @pindex string.h
>  This function is a GNU extension, declared in the header file @file{string.h}.
> @@ -1223,18 +1292,20 @@ This function is a GNU extension, declared in the header file @file{string.h}.
>  @safety{@mtsafe{}@assafe{}@acsafe{}}
>  This function returns the message describing the error @var{errnum} or
>  @code{NULL} if there is no known constant with this value (e.g "Invalid
> -argument" for @code{EINVAL}).  Different than @code{strerror} the returned
> -description is not translated.
> +argument" for @code{EINVAL}).  Different than @code{strerror} the
> +returned description is not translated, and the returned string does not
> +change for the remaining execution of the program.

OK.

>  
>  @pindex string.h
>  This function is a GNU extension, declared in the header file @file{string.h}.
>  @end deftypefun
>  
>  @code{strerror} and @code{perror} produce the exact same message for any
> -given error code; the precise text varies from system to system.  With
> -@theglibc{}, the messages are fairly short; there are no multi-line
> -messages or embedded newlines.  Each error message begins with a capital
> -letter and does not include any terminating punctuation.
> +given error code under the same locale; the precise text varies from
> +system to system.  With @theglibc{}, the messages are fairly short;
> +there are no multi-line messages or embedded newlines.  Each error
> +message begins with a capital letter and does not include any
> +terminating punctuation.

OK.

>  
>  @cindex program name
>  @cindex name of running program
> 
> base-commit: 57cd52ecec8567dd1bd91779198710f61889cf25
>
  

Patch

diff --git a/manual/errno.texi b/manual/errno.texi
index 28dd871caa..f266bddb06 100644
--- a/manual/errno.texi
+++ b/manual/errno.texi
@@ -1147,42 +1147,110 @@  name of the program that encountered the error.
 
 @deftypefun {char *} strerror (int @var{errnum})
 @standards{ISO, string.h}
-@safety{@prelim{}@mtunsafe{@mtasurace{:strerror}}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}
-@c Calls strerror_r with a static buffer allocated with malloc on the
-@c first use.
+@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}
 The @code{strerror} function maps the error code (@pxref{Checking for
 Errors}) specified by the @var{errnum} argument to a descriptive error
-message string.  The return value is a pointer to this string.
+message string.  The string is translated according to the current
+locale.  The return value is a pointer to this string.
 
 The value @var{errnum} normally comes from the variable @code{errno}.
 
 You should not modify the string returned by @code{strerror}.  Also, if
-you make subsequent calls to @code{strerror}, the string might be
-overwritten.  (But it's guaranteed that no library function ever calls
-@code{strerror} behind your back.)
+you make subsequent calls to @code{strerror} or @code{strerror_l}, or
+the thread that obtained the string exits, the returned pointer will be
+invalidated.
+
+As there is no way to restore the previous state after calling
+@code{strerror}, library code should not call this function because it
+may interfere with application use of @code{strerror}, invalidating the
+string pointer before the application is done using it.  Instead,
+@code{strerror_r}, @code{snprintf} with the @samp{%m} or @samp{%#m}
+specifiers, @code{strerrorname_np}, or @code{strerrordesc_np} can be
+used.
+
+The @code{strerror} function preserves the value of @code{errno} and
+cannot fail.
 
 The function @code{strerror} is declared in @file{string.h}.
 @end deftypefun
 
+@deftypefun {char *} strerror_l (int @var{errnum}, locale_t @var{locale})
+@standards{POSIX, string.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @ascuintl{}}@acunsafe{@acsmem{}}}
+This function is like @code{strerror}, except that the returned string
+is translated according to @var{locale} (instead of the current locale
+used by @code{strerror}).  Note that calling @code{strerror_l}
+invalidates the pointer returned by @code{strerror} and vice versa.
+
+The function @code{strerror_l} is defined by POSIX and is declared in
+@file{string.h}.
+@end deftypefun
+
 @deftypefun {char *} strerror_r (int @var{errnum}, char *@var{buf}, size_t @var{n})
 @standards{GNU, string.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@ascuintl{}}@acunsafe{}}
+The following description is for the GNU variant of the function,
+used if @code{_GNU_SOURCE} is defined.  @xref{Feature Test Macros}.
+
 The @code{strerror_r} function works like @code{strerror} but instead of
-returning the error message in a statically allocated buffer shared by
-all threads in the process, it returns a private copy for the
-thread.  This might be either some permanent global data or a message
-string in the user supplied buffer starting at @var{buf} with the
-length of @var{n} bytes.
+returning a pointer to a string that is managed by @theglibc{}, it can
+use the user supplied buffer starting at @var{buf} for storing the
+string.
 
-At most @var{n} characters are written (including the NUL byte) so it is
-up to the user to select a buffer large enough.
+At most @var{n} characters are written (including the NUL byte) to
+@var{buf}, so it is up to the user to select a buffer large enough.
+Whether the returned pointer points to the @var{buf} array or not depends on
+the @var{errnum} argument.  If the result string is not stored in
+@var{buf}, the string will not change for the remaining execution
+of the program.
 
-This function should always be used in multi-threaded programs since
-there is no way to guarantee the string returned by @code{strerror}
-really belongs to the last call of the current thread.
+The function @code{strerror_r} as described above is a GNU extension and
+it is declared in @file{string.h}.  There is a POSIX variant of this
+function, described next.
+@end deftypefun
 
-The function @code{strerror_r} is a GNU extension and it is declared in
-@file{string.h}.
+@deftypefun int strerror_r (int @var{errnum}, char *@var{buf}, size_t @var{n})
+@standards{GNU, string.h}
+
+This variant of the @code{strerror_r} function is used if a standard is
+selected that includes @code{strerror_r}, but @code{_GNU_SOURCE} is not
+defined.  This POSIX variant of the function always writes the error
+message to the specified buffer @var{buf} of size @var{n} bytes.
+
+Upon success, @code{strerror_r} returns 0.  Two more return values are
+used to indicate failure.
+
+@vtable @code
+@item EINVAL
+The @var{errnum} argument does not correspond to a known error constant.
+
+@item ERANGE
+The buffer size @var{n} is not large enough to store the entire error message.
+@end vtable
+
+Even if an error is reported, @code{strerror_r} still writes as
+much of the error message to the output buffer as possible.  After a
+call to @code{strerror_r}, the value of @code{errno} is unspecified.
+
+If you want to use the always-copying POSIX semantics of
+@code{strerror_r} in a program that is potentially compiled with
+@code{_GNU_SOURCE} defined, you can use @code{snprintf} with the
+@samp{%m} conversion specifier, like this:
+
+@smallexample
+int saved_errno = errno;
+errno = errnum;
+int ret = snprintf (buf, n, "%m");
+errno = saved_errno;
+if (strerrorname_np (errnum) == NULL)
+  return EINVAL;
+if (ret >= n)
+  return ERANGE;
+return 0;
+@end smallexample
+
+This function is declared in @file{string.h} if it is declared at all.
+It is a POSIX extension.
 @end deftypefun
 
 @deftypefun void perror (const char *@var{message})
@@ -1212,7 +1280,8 @@  The function @code{perror} is declared in @file{stdio.h}.
 @safety{@mtsafe{}@assafe{}@acsafe{}}
 This function returns the name describing the error @var{errnum} or
 @code{NULL} if there is no known constant with this value (e.g "EINVAL"
-for @code{EINVAL}).
+for @code{EINVAL}).  The returned string does not change for the
+remaining execution of the program.
 
 @pindex string.h
 This function is a GNU extension, declared in the header file @file{string.h}.
@@ -1223,18 +1292,20 @@  This function is a GNU extension, declared in the header file @file{string.h}.
 @safety{@mtsafe{}@assafe{}@acsafe{}}
 This function returns the message describing the error @var{errnum} or
 @code{NULL} if there is no known constant with this value (e.g "Invalid
-argument" for @code{EINVAL}).  Different than @code{strerror} the returned
-description is not translated.
+argument" for @code{EINVAL}).  Different than @code{strerror} the
+returned description is not translated, and the returned string does not
+change for the remaining execution of the program.
 
 @pindex string.h
 This function is a GNU extension, declared in the header file @file{string.h}.
 @end deftypefun
 
 @code{strerror} and @code{perror} produce the exact same message for any
-given error code; the precise text varies from system to system.  With
-@theglibc{}, the messages are fairly short; there are no multi-line
-messages or embedded newlines.  Each error message begins with a capital
-letter and does not include any terminating punctuation.
+given error code under the same locale; the precise text varies from
+system to system.  With @theglibc{}, the messages are fairly short;
+there are no multi-line messages or embedded newlines.  Each error
+message begins with a capital letter and does not include any
+terminating punctuation.
 
 @cindex program name
 @cindex name of running program