manual: Document that strnlen, wcsnlen do not need arrays of full size

Message ID 878qyqmtnt.fsf@oldenburg.str.redhat.com
State New
Headers
Series manual: Document that strnlen, wcsnlen do not need arrays of full size |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed

Commit Message

Florian Weimer June 27, 2024, 12:55 p.m. UTC
  The standards are at best ambiguous as to whether it is valid to
call these functions with array arguments that are shorter than
the given number of elements.  Supporting such usage is very useful,
see stdio-common/Xprintf_buffer_puts_1.c for an example.  There is
no other string function that provides this functionality.

Also update the description of strndup accordingly because it uses
strnlen internally.

Tested on aarch64-linux-gnu (Neoverse-V2), i686-linux-gnu (Zen 4),
powerpc64le-linux-gnu (POWER10), s390x-linux-gnu (z16),
x86_64-linux-gnu (Zen 4).

---
 manual/string.texi        |  31 ++++++-----
 string/Makefile           |   1 +
 string/test-Xnlen-gnu.c   | 133 ++++++++++++++++++++++++++++++++++++++++++++++
 string/test-strnlen-gnu.c |   4 ++
 wcsmbs/Makefile           |   1 +
 wcsmbs/tst-wcsnlen-gnu.c  |   5 ++
 6 files changed, 163 insertions(+), 12 deletions(-)


base-commit: 21738846a19eb4a36981efd37d9ee7cb6d687494
  

Comments

Paul Eggert June 28, 2024, 11 p.m. UTC | #1
On 6/27/24 13:55, Florian Weimer wrote:
> The standards are at best ambiguous as to whether it is valid to
> call these functions with array arguments that are shorter than
> the given number of elements.  Supporting such usage is very useful,

I don't see any ambiguity in POSIX.1-2024 or in earlier POSIX versions. 
(The C standard doesn't specify strnlen.)

If there is any ambiguity then we should file a strnlen interpretation 
request to the POSIX maintainers to clear things up. I expect they'll 
interpret it the same way glibc (and everybody else) does. In the 
meantime, the manual shouldn't say that glibc is implementing an 
extension to POSIX.

So please omit the strnlen sentence "Support for input arrays shorter 
than @var{maxlen} characters is a GNU extension." And similarly for wcsnlen.
  
Florian Weimer June 28, 2024, 11:43 p.m. UTC | #2
* Paul Eggert:

> On 6/27/24 13:55, Florian Weimer wrote:
>> The standards are at best ambiguous as to whether it is valid to
>> call these functions with array arguments that are shorter than
>> the given number of elements.  Supporting such usage is very useful,
>
> I don't see any ambiguity in POSIX.1-2024 or in earlier POSIX
> versions. (The C standard doesn't specify strnlen.)
>
> If there is any ambiguity then we should file a strnlen interpretation
> request to the POSIX maintainers to clear things up. I expect they'll
> interpret it the same way glibc (and everybody else) does. In the
> meantime, the manual shouldn't say that glibc is implementing an
> extension to POSIX.
>
> So please omit the strnlen sentence "Support for input arrays shorter
> than @var{maxlen} characters is a GNU extension." And similarly for
> wcsnlen.

I had no idea.

I assumed that the previous text in the manual was specifically written
to make such uses of these functions invalid.  The 2.39 manual text
very explicitly established the precondition that the input must be an
array of MAXLEN characters:

     If the array S of size MAXLEN contains a null byte, the ‘strnlen’
     function returns the length of the string S in bytes.  Otherwise it
     returns MAXLEN.  Therefore this function is equivalent to ‘(strlen
     (S) < MAXLEN ? strlen (S) : MAXLEN)’ but it is more efficient and
     works even if S is not null-terminated so long as MAXLEN does not
     exceed the size of S's array.

But that's not what applications require, and fortunately it's not what
we implement, either.  I'll drop the GNU extension part if that's what
it takes.  But as far as I can tell, stopping reading at the first null
character is usually not documented as strnlen behavior, only stopping
at MAXLEN characters.

Thanks,
Florian
  
Paul Eggert June 29, 2024, 7:31 a.m. UTC | #3
On 6/29/24 00:43, Florian Weimer wrote:
> The 2.39 manual text
> very explicitly established the precondition that the input must be an
> array of MAXLEN characters:

Yes, that was a longstanding bug in the manual that I recently fixed, 
given that (a) it conflicted with POSIX and (b) it didn't match what 
glibc actually does.

It is a little odd that most buffer+size functions require the buffer to 
be at least that size, but there are a few exceptions like strnlen and 
strncmp where POSIX's carefully but tersely written language says the 
buffer can be shorter if it's null-terminated.
  

Patch

diff --git a/manual/string.texi b/manual/string.texi
index 0b667bd3fb..0bf3c45d0e 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -311,14 +311,16 @@  This function was introduced in @w{Amendment 1} to @w{ISO C90}.
 @deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
 @standards{POSIX.1, string.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
-This returns the offset of the first null byte in the array @var{s},
-except that it returns @var{maxlen} if the first @var{maxlen} bytes
-are all non-null.
-Therefore this function is equivalent to
-@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
-but it
-is more efficient and works even if @var{s} is not null-terminated so
-long as @var{maxlen} does not exceed the size of @var{s}'s array.
+This function returns the smallest non-negative integer @var{i} less
+than @var{maxlen} such that @code{@var{s}[@var{i}]} is zero, or
+@var{maxlen} if no such integer exists.  The array object into which
+@var{s} points must contain a null character at @var{s} or subsequently,
+or there must be at least @var{maxlen} array elements starting at
+@var{s}.
+
+Therefore this function is equivalent to @code{(strlen (@var{s}) <
+@var{maxlen} ? strlen (@var{s}) : @var{maxlen})} but it is more
+efficient and works even if @var{s} is not null-terminated.
 
 @smallexample
 char string[32] = "hello, world";
@@ -330,7 +332,8 @@  strnlen (string, 5)
 
 This function is part of POSIX.1-2008 and later editions, but was
 available in @theglibc{} and other systems as an extension long before
-it was standardized.  It is declared in @file{string.h}.
+it was standardized.  It is declared in @file{string.h}.  Support for
+input arrays shorter than @var{maxlen} characters is a GNU extension.
 @end deftypefun
 
 @deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
@@ -339,8 +342,9 @@  it was standardized.  It is declared in @file{string.h}.
 @code{wcsnlen} is the wide character equivalent to @code{strnlen}.  The
 @var{maxlen} parameter specifies the maximum number of wide characters.
 
-This function is part of POSIX.1-2008 and later editions, and is
-declared in @file{wchar.h}.
+This function is part of POSIX and is declared in @file{wchar.h}.
+Support for input arrays shorter than @var{maxlen} wide characters is a
+GNU extension.
 @end deftypefun
 
 @node Copying Strings and Arrays
@@ -922,7 +926,10 @@  processing strings.
 @standards{GNU, string.h}
 @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
 This function is similar to @code{strdup} but always copies at most
-@var{size} bytes into the newly allocated string.
+@var{size} bytes into the newly allocated string.  The array object into
+which @var{s} points must contain a null character at @var{s} or
+subsequently, or there must be at least @var{size} array elements
+starting at @var{s}.
 
 If the length of @var{s} is more than @var{size}, then @code{strndup}
 copies just the first @var{size} bytes and adds a closing null byte.
diff --git a/string/Makefile b/string/Makefile
index 8f31fa49e6..6cc629968a 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -184,6 +184,7 @@  tests := \
   test-strncpy \
   test-strndup \
   test-strnlen \
+  test-strnlen-gnu \
   test-strpbrk \
   test-strrchr \
   test-strspn \
diff --git a/string/test-Xnlen-gnu.c b/string/test-Xnlen-gnu.c
new file mode 100644
index 0000000000..ab40a2e687
--- /dev/null
+++ b/string/test-Xnlen-gnu.c
@@ -0,0 +1,133 @@ 
+/* Test GNU extension for non-array inputs to string length functions.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This skeleton file is included from string/test-strnlen-gnu.c and
+   wcsmbs/tst-wcsnlen-gnu.c to test that reading of the array stops at
+   the first null character.
+
+   TEST_IDENTIFIER must be the test function identifier.  TEST_NAME is
+   the same as a string.
+
+   CHAR must be defined as the character type.  */
+
+#include <array_length.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/next_to_fault.h>
+#include <support/test-driver.h>
+#include <sys/param.h>
+#include <unistd.h>
+
+typedef __typeof (TEST_IDENTIFIER) *proto_t;
+
+#define TEST_MAIN
+#include "test-string.h"
+
+IMPL (TEST_IDENTIFIER, 1)
+
+static int
+test_main (void)
+{
+  enum { buffer_length = 256 };
+  TEST_VERIFY_EXIT (sysconf (_SC_PAGESIZE) >= buffer_length);
+
+  test_init ();
+
+  /* Buffer layout: There are a_count 'A' character followed by
+     zero_count null character, for a total of buffer_length
+     character:
+
+     AAAAA...AAAAA 00000 ... 00000 (unmapped page follows)
+     \           / \             /
+       (a_count)     (zero_count)
+       \___ (buffer_length) ___/
+          ^
+          |
+          start_offset
+
+    The buffer length does not change, but a_count (and thus _zero)
+    and start_offset vary.
+
+    If start_offset == buffer_length, only 0 is a valid length
+    argument.  The result is 0.
+
+    Otherwwise, if zero_count > 0 (if there a null characters in the
+    buffer), then any length argument is valid.  If start_offset <
+    a_count (i.e., there is a non-null character at start_offset), the
+    result is the minimum of a_count - start_offset and the length
+    argument.  Otherwise the result is 0.
+
+    Otherwise, there are no null characters before the unmapped page.
+    The length argument must not be greater than buffer_length -
+    start_offset, and the result is the length argument.  */
+
+  struct support_next_to_fault ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *buffer = (CHAR *) ntf.buffer;
+
+  FOR_EACH_IMPL (impl, 0)
+    {
+      printf ("info: testing %s\n", impl->name);
+      for (size_t i = 0; i < buffer_length; ++i)
+        buffer[i] = 'A';
+
+      for (int zero_count = 0; zero_count <= buffer_length; ++zero_count)
+        {
+          if (zero_count > 0)
+            buffer[buffer_length - zero_count] = 0;
+          int a_count = buffer_length - zero_count;
+          for (int start_offset = 0; start_offset <= buffer_length;
+               ++start_offset)
+            {
+              CHAR *start_pointer = buffer + start_offset;
+              if (start_offset == buffer_length)
+                TEST_COMPARE (CALL (impl, buffer + start_offset, 0), 0);
+              else if (zero_count > 0)
+                for (int length_argument = 0;
+                     length_argument <= 2 * buffer_length;
+                     ++length_argument)
+                  {
+                    if (test_verbose)
+                      printf ("zero_count=%d a_count=%d start_offset=%d"
+                              " length_argument=%d\n",
+                              zero_count, a_count, start_offset,
+                              length_argument);
+                    if (start_offset < a_count)
+                      TEST_COMPARE (CALL (impl, start_pointer, length_argument),
+                                    MIN (a_count - start_offset,
+                                         length_argument));
+                    else
+                      TEST_COMPARE (CALL (impl, start_pointer, length_argument),
+                                    0);
+                  }
+              else
+                for (int length_argument = 0;
+                     length_argument <= buffer_length - start_offset;
+                     ++length_argument)
+                  TEST_COMPARE (CALL (impl, start_pointer, length_argument),
+                                length_argument);
+            }
+        }
+    }
+
+  support_next_to_fault_free (&ntf);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/string/test-strnlen-gnu.c b/string/test-strnlen-gnu.c
new file mode 100644
index 0000000000..ffb3f534d0
--- /dev/null
+++ b/string/test-strnlen-gnu.c
@@ -0,0 +1,4 @@ 
+#define TEST_IDENTIFIER strnlen
+#define TEST_NAME "strnlen"
+typedef char CHAR;
+#include "test-Xnlen-gnu.c"
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 1cddd8cc6d..1c65cd759b 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -184,6 +184,7 @@  tests := \
   tst-wcslcpy \
   tst-wcslcpy2 \
   tst-wcsnlen \
+  tst-wcsnlen-gnu \
   tst-wcstod-nan-locale \
   tst-wcstod-nan-sign \
   tst-wcstod-round \
diff --git a/wcsmbs/tst-wcsnlen-gnu.c b/wcsmbs/tst-wcsnlen-gnu.c
new file mode 100644
index 0000000000..68a17dd1f8
--- /dev/null
+++ b/wcsmbs/tst-wcsnlen-gnu.c
@@ -0,0 +1,5 @@ 
+#include <wchar.h>
+#define TEST_IDENTIFIER wcsnlen
+#define TEST_NAME "wcsnlen"
+typedef wchar_t CHAR;
+#include "../string/test-Xnlen-gnu.c"