manual: Document a GNU extension for strncmp/wcsncmp

Message ID 874j9eml6y.fsf@oldenburg.str.redhat.com
State New
Series manual: Document a GNU extension for strncmp/wcsncmp

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Test passed

Commit Message

Florian Weimer June 27, 2024, 3:58 p.m. UTC
At least strncmp is widely used for string prefix checking,
so add some language to make this valid.  Add tests to show that
glibc implements this extension.
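
The prefix-checking idiom referred to above can be sketched as follows.
This is only an illustration: the helper name has_prefix is hypothetical
and does not appear in the patch, and the sketch assumes both arguments
are null-terminated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): return true if the
   null-terminated string PREFIX is a prefix of the null-terminated
   string S.  */
bool
has_prefix (const char *s, const char *prefix)
{
  assert (s != NULL && prefix != NULL);
  /* S may be shorter than strlen (prefix).  The idiom relies on the
     comparison stopping at the null terminator of S instead of
     reading strlen (prefix) bytes from it.  */
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

For example, has_prefix ("foobar", "foo") is true, while
has_prefix ("fo", "foo") is false even though the array at "fo" holds
fewer than strlen ("foo") + 1 bytes.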

This should probably go in after the strnlen/wcsnlen GNU extension.

Tested on aarch64-linux-gnu (Neoverse-V2), i686-linux-gnu (Zen 4),
powerpc64le-linux-gnu (POWER10), x86_64-linux-gnu (Zen 4).

On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
(There could be further issues because the test crashes rather early.)

---
 manual/string.texi        |  36 ++++++++-
 string/Makefile           |   1 +
 string/test-Xncmp-gnu.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++
 string/test-strncmp-gnu.c |   4 +
 wcsmbs/Makefile           |   1 +
 wcsmbs/test-wcsncmp-gnu.c |   5 ++
 6 files changed, 226 insertions(+), 4 deletions(-)


base-commit: 21738846a19eb4a36981efd37d9ee7cb6d687494
  

Comments

Andreas Schwab June 27, 2024, 4:33 p.m. UTC | #1
On Jun 27 2024, Florian Weimer wrote:

> @@ -1367,15 +1380,30 @@ This function is the similar to @code{strcmp}, except that no more than
>  @var{size} bytes are compared.  In other words, if the two
>  strings are the same in their first @var{size} bytes, the
>  return value is zero.
> +
> +As a GNU extension, the pointer arguments do not need to point to arrays
> +of at least @var{size} elements in some cases.

I don't think this is needed.  The standard already says that characters
beyond the first null character are not compared.  Similar wording
exists also for the other strn and wcsn functions.
  
Florian Weimer July 8, 2024, 7:40 a.m. UTC | #2
* Andreas Schwab:

> On Jun 27 2024, Florian Weimer wrote:
>
>> @@ -1367,15 +1380,30 @@ This function is the similar to @code{strcmp}, except that no more than
>>  @var{size} bytes are compared.  In other words, if the two
>>  strings are the same in their first @var{size} bytes, the
>>  return value is zero.
>> +
>> +As a GNU extension, the pointer arguments do not need to point to arrays
>> +of at least @var{size} elements in some cases.
>
> I don't think this is needed.  The standard already says that characters
> beyond the first null character are not compared.  Similar wording
> exists also for the other strn and wcsn functions.

I think that's still ambiguous whether they could be accessed.  Maybe it
shouldn't say this is an extension, see the strnlen discussion.

Thanks,
Florian
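
As an illustrative sketch of the case under discussion (not code from the
patch or the thread): the array at s1 below has only four elements, but
the size argument is six.  The function name compare_short_array is
hypothetical.  The patch documents that glibc stops reading at the null
byte; whether the standards already guarantee this is the point the
thread leaves open.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demonstration, not taken from the patch.  Return the
   sign (-1, 0 or 1) of a strncmp call whose size argument exceeds
   the size of the array that s1 points to.  */
int
compare_short_array (void)
{
  /* Exactly four bytes: 'a', 'b', 'c', '\0'.  */
  char s1[4] = "abc";
  const char *s2 = "abcdef";
  size_t size = 6;
  assert (size > sizeof (s1));

  /* The comparison is decided at index 3 ('\0' versus 'd'), so no
     byte past the end of s1 has to be read to compute the result.  */
  int r = strncmp (s1, s2, size);
  return (r > 0) - (r < 0);
}
```

The result is negative because s1 ends where s2 continues with 'd'.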
  
Paul Eggert July 9, 2024, 2:44 p.m. UTC | #3
On 7/8/24 09:40, Florian Weimer wrote:
> I think that's still ambiguous whether they could be accessed.  Maybe it
> shouldn't say this is an extension, see the strnlen discussion.

Perhaps the glibc manual could say that although this part of the
behavior of strncmp/strnlen (and presumably related functions) is as
stated for glibc and for all other known C library implementations, it
might be an extension to the C and/or POSIX standards, as opinions
differ as to whether the standards are clear about this.
  

Patch

diff --git a/manual/string.texi b/manual/string.texi
index 0b667bd3fb..ecd3c66d43 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -1234,6 +1234,12 @@  char} objects, then promoted to @code{int}).
 
 If the contents of the two blocks are equal, @code{memcmp} returns
 @code{0}.
+
+Note that @code{memcmp} requires objects of at least @var{size} bytes at
+@var{a1} and @var{a2}.  The implementation does not necessarily stop
+processing after the first byte difference.  Use @code{strcmp} to
+compare a string with a string literal, and use the GNU extension of
+@code{strncmp} to check if a string has a given prefix.
 @end deftypefun
 
 @deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
@@ -1247,6 +1253,13 @@  smaller or larger than the corresponding wide character in @var{a2}.
 
 If the contents of the two blocks are equal, @code{wmemcmp} returns
 @code{0}.
+
+Note that @code{wmemcmp} requires that @var{size} wide characters are
+available starting at @var{a1} and @var{a2}.  The implementation does
+not necessarily stop processing after the first difference encountered.
+Use @code{wcscmp} to compare a wide string with a wide string literal,
+and use the GNU extension of @code{wcsncmp} to check if a string has a
+given prefix.
 @end deftypefun
 
 On arbitrary arrays, the @code{memcmp} function is mostly useful for
@@ -1367,15 +1380,30 @@  This function is the similar to @code{strcmp}, except that no more than
 @var{size} bytes are compared.  In other words, if the two
 strings are the same in their first @var{size} bytes, the
 return value is zero.
+
+As a GNU extension, the pointer arguments do not need to point to arrays
+of at least @var{size} elements in some cases.  For example, for
+null-terminated strings @var{s1} and @var{s2}, the expression
+@code{strncmp (@var{s1}, @var{s2}, strlen (@var{s2})) == 0} is true if
+and only if the string @var{s2} is a prefix of the string @var{s1}.
+More generally, in the GNU version, @code{strncmp (@var{s1}, @var{s2},
+@var{size})} is valid if both @code{strnlen (@var{s1}, @var{size})} and
+@code{strnlen (@var{s2}, @var{size})} are valid.  In the prefix checking
+idiom, note that this still requires that @var{s1} is a null-terminated
+string if there are fewer than @var{size} array elements starting at
+@var{s1}.
 @end deftypefun
 
 @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
 @standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
-This function is similar to @code{wcscmp}, except that no more than
-@var{size} wide characters are compared.  In other words, if the two
-strings are the same in their first @var{size} wide characters, the
-return value is zero.
+This function is similar to @code{strncmp}, except that it operates
+on wide characters instead of bytes.  At most @var{size} wide characters
+are compared.
+
+As a GNU extension, @code{wcsncmp (@var{ws1}, @var{ws2}, @var{size})} is
+valid if both @code{wcsnlen (@var{ws1}, @var{size})} and @code{wcsnlen
+(@var{ws2}, @var{size})} are valid.
 @end deftypefun
 
 @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
diff --git a/string/Makefile b/string/Makefile
index 8f31fa49e6..ad98d06391 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -181,6 +181,7 @@  tests := \
   test-strncasecmp \
   test-strncat \
   test-strncmp \
+  test-strncmp-gnu \
   test-strncpy \
   test-strndup \
   test-strnlen \
diff --git a/string/test-Xncmp-gnu.c b/string/test-Xncmp-gnu.c
new file mode 100644
index 0000000000..9dc1ecca3c
--- /dev/null
+++ b/string/test-Xncmp-gnu.c
@@ -0,0 +1,183 @@ 
+/* Test GNU extension for non-array inputs to string comparison functions.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This skeleton file is included from string/test-strncmp-gnu.c and
+   wcsmbs/test-wcsncmp-gnu.c to test that reading of the arrays stops
+   at the first null character.
+
+   TEST_IDENTIFIER must be the test function identifier.  TEST_NAME is
+   the same as a string.
+
+   CHAR must be defined as the character type.  */
+
+#include <array_length.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/next_to_fault.h>
+#include <support/test-driver.h>
+#include <sys/param.h>
+#include <unistd.h>
+
+/* Much shorter than test-Xnlen-gnu.c because of deeply nested loops.  */
+enum { buffer_length = 80 };
+
+/* The test buffer layout follows what is described in test-Xnlen-gnu.c,
+   except that there are two buffers, left and right.  The variables
+   a_count, zero_count, start_offset are all duplicated.  */
+
+/* Return the maximum string length for a string that starts at
+   start_offset.  */
+static int
+string_length (int a_count, int start_offset)
+{
+  if (start_offset == buffer_length || start_offset >= a_count)
+    return 0;
+  else
+    return a_count - start_offset;
+}
+
+/* This is the valid maximum length argument computation for
+   strnlen/wcsnlen.  See test-Xnlen-gnu.c.  */
+static int
+maximum_length (int start_offset, int zero_count)
+{
+  if (start_offset == buffer_length)
+    return 0;
+  else if (zero_count > 0)
+    /* Effectively unbounded, but we need to stop fairly low,
+       otherwise testing takes too long.  */
+    return buffer_length + 32;
+  else
+    return buffer_length - start_offset;
+}
+
+typedef __typeof (TEST_IDENTIFIER) *proto_t;
+
+#define TEST_MAIN
+#include "test-string.h"
+
+IMPL (TEST_IDENTIFIER, 1)
+
+static int
+test_main (void)
+{
+  TEST_VERIFY_EXIT (sysconf (_SC_PAGESIZE) >= buffer_length);
+  test_init ();
+
+  struct support_next_to_fault left_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *left_buffer = (CHAR *) left_ntf.buffer;
+  struct support_next_to_fault right_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *right_buffer = (CHAR *) right_ntf.buffer;
+
+  FOR_EACH_IMPL (impl, 0)
+    {
+      printf ("info: testing %s\n", impl->name);
+      for (size_t i = 0; i < buffer_length; ++i)
+        left_buffer[i] = 'A';
+
+      for (int left_zero_count = 0; left_zero_count <= buffer_length;
+           ++left_zero_count)
+        {
+          if (left_zero_count > 0)
+            left_buffer[buffer_length - left_zero_count] = 0;
+          int left_a_count = buffer_length - left_zero_count;
+          for (size_t i = 0; i < buffer_length; ++i)
+            right_buffer[i] = 'A';
+          for (int right_zero_count = 0; right_zero_count <= buffer_length;
+               ++right_zero_count)
+            {
+              if (right_zero_count > 0)
+                right_buffer[buffer_length - right_zero_count] = 0;
+              int right_a_count = buffer_length - right_zero_count;
+              for (int left_start_offset = 0;
+                   left_start_offset <= buffer_length;
+                   ++left_start_offset)
+                {
+                  CHAR *left_start_pointer = left_buffer + left_start_offset;
+                  int left_maxlen
+                    = maximum_length (left_start_offset, left_zero_count);
+                  int left_length
+                    = string_length (left_a_count, left_start_offset);
+                  for (int right_start_offset = 0;
+                       right_start_offset <= buffer_length;
+                       ++right_start_offset)
+                    {
+                      CHAR *right_start_pointer
+                        = right_buffer + right_start_offset;
+                      int right_maxlen
+                        = maximum_length (right_start_offset, right_zero_count);
+                      int right_length
+                        = string_length (right_a_count, right_start_offset);
+
+                      /* Maximum length is modelled after strnlen/wcsnlen,
+                         and must be valid for both pointer arguments at
+                         the same time.  */
+                      int maxlen = MIN (left_maxlen, right_maxlen);
+
+                      for (int length_argument = 0; length_argument <= maxlen;
+                           ++length_argument)
+                        {
+                          if (test_verbose)
+                            {
+                              printf ("left: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      left_zero_count, left_a_count,
+                                      left_start_offset);
+                              printf ("right: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      right_zero_count, right_a_count,
+                                      right_start_offset);
+                              printf ("length argument: %d\n",
+                                      length_argument);
+                            }
+
+                          /* Effective lengths bounded by length argument.
+                             The effective length determines the
+                             outcome of the comparison.  */
+                          int left_effective
+                            = MIN (left_length, length_argument);
+                          int right_effective
+                            = MIN (right_length, length_argument);
+                          if (left_effective == right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument), 0);
+                          else if (left_effective < right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) < 0, 1);
+                          else
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) > 0, 1);
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/string/test-strncmp-gnu.c b/string/test-strncmp-gnu.c
new file mode 100644
index 0000000000..0652145caa
--- /dev/null
+++ b/string/test-strncmp-gnu.c
@@ -0,0 +1,4 @@ 
+#define TEST_IDENTIFIER strncmp
+#define TEST_NAME "strncmp"
+typedef char CHAR;
+#include "test-Xncmp-gnu.c"
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 1cddd8cc6d..884b9ce8b7 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -158,6 +158,7 @@  tests := \
   test-wcslen \
   test-wcsncat \
   test-wcsncmp \
+  test-wcsncmp-gnu \
   test-wcsncpy \
   test-wcsnlen \
   test-wcspbrk \
diff --git a/wcsmbs/test-wcsncmp-gnu.c b/wcsmbs/test-wcsncmp-gnu.c
new file mode 100644
index 0000000000..6d085d300b
--- /dev/null
+++ b/wcsmbs/test-wcsncmp-gnu.c
@@ -0,0 +1,5 @@ 
+#include <wchar.h>
+#define TEST_IDENTIFIER wcsncmp
+#define TEST_NAME "wcsncmp"
+typedef wchar_t CHAR;
+#include "../string/test-Xncmp-gnu.c"
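
The expectation model used by string/test-Xncmp-gnu.c above (both inputs
consist only of 'A' characters, so the outcome depends solely on the
effective lengths) can be sketched in isolation as follows.  This is a
simplified stand-alone illustration, not the test itself: the names
min_size, expected_sign and actual_sign are hypothetical, and it uses
ordinary stack buffers instead of the next-to-fault allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Expected sign of the comparison of two all-'A' strings with the
   given lengths, bounded by LENGTH_ARGUMENT.  Mirrors the
   left_effective/right_effective logic of the test.  */
int
expected_sign (size_t left_length, size_t right_length,
               size_t length_argument)
{
  size_t left_effective = min_size (left_length, length_argument);
  size_t right_effective = min_size (right_length, length_argument);
  if (left_effective == right_effective)
    return 0;
  return left_effective < right_effective ? -1 : 1;
}

/* Sign actually returned by strncmp for the same inputs.  Both
   strings are null-terminated, so LENGTH_ARGUMENT may exceed the
   buffer size under the behavior described in the manual change.  */
int
actual_sign (size_t left_length, size_t right_length,
             size_t length_argument)
{
  char left[16];
  char right[16];
  assert (left_length < sizeof (left) && right_length < sizeof (right));
  memset (left, 'A', left_length);
  left[left_length] = '\0';
  memset (right, 'A', right_length);
  right[right_length] = '\0';
  int r = strncmp (left, right, length_argument);
  return (r > 0) - (r < 0);
}
```

For instance, with lengths 3 and 5 and a length argument of 4, the
effective lengths are 3 and 4, so a negative result is expected.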