[1/1] string: Add stpecpy(3)

Message ID 20221222214217.1619716-2-alx@kernel.org
State Superseded
Headers
Series string: Add stpecpy(3) |

Checks

Context Check Description
dj/TryBot-32bit success Build for i686
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Alejandro Colomar Dec. 22, 2022, 9:42 p.m. UTC
  Glibc didn't provide any function that copies a string with truncation.

It only provided strncpy(3) and stpncpy(3), which copy from a string
into a null-padded character sequence at the destination fixed-width
buffer, with truncation.

Those old functions, which don't produce a string, have been misused for
a long time as a poor-man's replacement for strlcpy(3), but doing that
is a source of bugs, since it's hard to calculate the right size that
should be passed to the function, and it's also necessary to explicitly
terminate the buffer with a null byte.  Detecting truncation is yet
another problem.

stpecpy(3), described in the new string_copying(7) manual page, is
similar to OpenBSD's strlcpy(3)/strlcat(3), but:

-  It's simpler to implement.
-  It's faster.
-  It's simpler to detect truncation.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Of course this is still a very early patch.  I just compiled it,
but we'd need to write tests for it.  I didn't want to do all of that
work before the discussion.  Since the source code has been copied from
libstp, I can at least say that the function works, since I already used
that library, but this still needs a lot of work to adapt to glibc, I guess.

 string/Makefile  |  1 +
 string/stpecpy.c | 39 +++++++++++++++++++++++++++++++++++++++
 string/string.h  |  7 +++++++
 3 files changed, 47 insertions(+)
 create mode 100644 string/stpecpy.c
  

Comments

Sam James Dec. 23, 2022, 7:02 a.m. UTC | #1
> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> Glibc didn't provide any function that copies a string with truncation.
> 
> It only provided strncpy(3) and stpncpy(3), which copy from a string
> into a null-padded character sequence at the destination fixed-width
> buffer, with truncation.
> 
> Those old functions, which don't produce a string, have been misused for
> a long time as a poor-man's replacement for strlcpy(3), but doing that
> is a source of bugs, since it's hard to calculate the right size that
> should be passed to the function, and it's also necessary to explicitly
> terminate the buffer with a null byte.  Detecting truncation is yet
> another problem.
> 
> stpecpy(3), described in the new string_copying(7) manual page, is
> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
> 
> -  It's simpler to implement.
> -  It's faster.
> -  It's simpler to detect truncation.

Given strlcpy and strlcat are in POSIX next and therefore bar
some extraordinary event will be in glibc, I think we should
probably wait until those two land, then see if there's still
an appetite for stpecpy in glibc.
  
Alejandro Colomar Dec. 23, 2022, 12:26 p.m. UTC | #2
Hi Sam!

On 12/23/22 08:02, Sam James wrote:
> 
> 
>> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>
>> Glibc didn't provide any function that copies a string with truncation.
>>
>> It only provided strncpy(3) and stpncpy(3), which copy from a string
>> into a null-padded character sequence at the destination fixed-width
>> buffer, with truncation.
>>
>> Those old functions, which don't produce a string, have been misused for
>> a long time as a poor-man's replacement for strlcpy(3), but doing that
>> is a source of bugs, since it's hard to calculate the right size that
>> should be passed to the function, and it's also necessary to explicitly
>> terminate the buffer with a null byte.  Detecting truncation is yet
>> another problem.
>>
>> stpecpy(3), described in the new string_copying(7) manual page, is
>> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
>>
>> -  It's simpler to implement.
>> -  It's faster.
>> -  It's simpler to detect truncation.
> 
> Given strlcpy and strlcat are in POSIX next and therefore bar
> some extraordinary event will be in glibc, I think we should
> probably wait until those two land, then see if there's still
> an appetite for stpecpy in glibc.

I disagree for the following reasons.

strlcpy(3)/strlcat(3) are designed to be _slow_, in exchange for added 
simplicity and safety.  That's what Theo told me about them.  They didn't care 
about performance.  The two performance issues are:

-  Traverse the entire input string, to make sure it's a string.  stpecpy(3) 
instead only reads what's necessary for the copy; it stops reading after truncation.

-  strlcat(3) finds the terminating null byte; that's something you already know 
where it is, with functions that return a useful pointer (mempcpy(3), stpcpy(3), 
and stpecpy(3)).

While there are many programs that may be fine with the OpenBSD functions, glibc 
should _also_ provide a way to do the operation with an optimal API.  And it's 
in line with glibc providing stpcpy(3) and mempcpy(3) as extensions (stpcpy(3) 
is now in POSIX).


The reason that triggered me wanting to add this function is seeing strncpy(3) 
used for a patch to some glibc internals themselves.  Using strlcpy(3)/cat(3) in 
glibc internals would be bad for performance; I would hope that glibc uses the 
optimal internals, even if it also provides slow functions for users.

There are probably more cases within existing code in glibc.  Just check the 
output of:

     $ grep -rn st.ncpy -A1 | grep -B1 " = '\\\\0'"


Moreover, in the Austin discussion for strlcpy(3)/cat(3), it was mentioned that 
strlcpy(3) has an interface identical to that of snprintf(3), and that "If we 
truly think that this is bad design, should we come up with a new version of 
snprintf() that also doesn't do this? I don't think so.".

Well, I do believe snprintf is also misdesigned, for the same reasons that the 
strlcpy(3) manual page states that you should use strlcpy(3) for catenating 
strings, but rather strlcat(3):

        To detect truncation, perhaps while building a pathname, something like
        the following might be used:

              char *dir, *file, pname[MAXPATHLEN];

              ...

              if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
                      goto toolong;
              if (strlcat(pname, file, sizeof(pname)) >= sizeof(pname))
                      goto toolong;

        Since it is known how many  characters  were  copied  the  first  time,
        things can be sped up a bit by using a copy instead of an append:

              char *dir, *file, pname[MAXPATHLEN];
              size_t n;

              ...

              n = strlcpy(pname, dir, sizeof(pname));
              if (n >= sizeof(pname))
                      goto toolong;
              if (strlcpy(pname + n, file, sizeof(pname) ‐ n) >= sizeof(pname) ‐ n)
                      goto toolong;

        However,  one  may question the validity of such optimizations, as they
        defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
        fact, the first version of this manual page got it wrong.

Guess what?  There's no 'cat' version of snprintf, so users are doomed to write 
buggy code when trying to use it to concatenate after some other string.  I've 
recently been investigating a lot about it, and found invocations of Undefined 
Behavior, and some milder cases of benign off-by-one (on the safe side, by luck) 
errors, in calls to snprintf(3) in several important projects:

-  NGINX Unit:
    -  Undefined Behavior:
       <https://github.com/nginx/unit/issues/795#issuecomment-1345400420>

    -  Wrong truncation detection:
       <https://github.com/nginx/unit/pull/734#discussion_r1043963527>
       <https://github.com/nginx/unit/issues/804>

-  shadow:
    -  off-by-one:
       <https://github.com/shadow-maint/shadow/pull/607>

    -  clever code that looks like a bug but it's not:
       <https://github.com/shadow-maint/shadow/issues/608>

Rather than adding some catenating variant of snprintf(3), I suggest adding a 
single function that has an interface similar to stpcpy(3) and mempcpy(3), and 
identical to stpecpy(3):  stpeprintf(3):

<http://www.alejandro-colomar.es/src/alx/alx/libstp.git/tree/src/stp/stpe/stpeprintf.c>

I implemented it in terms of vsnprintf(3), so I need to handle a return of -1, 
but if implemented from scratch in glibc, in could be written to not be limited 
to INT_MAX (although I wonder why anyone would want to copy more than INT_MAX as 
a formatted string).


Below is its manual page in libstp.

Cheers,

Alex
---

stpeprintf(3)              Library Functions Manual              stpeprintf(3)

NAME
        stpeprintf, vstpeprintf - create a formatted string with truncation

LIBRARY
        Stp string library (libstp, pkgconf ‐‐cflags ‐‐libs libstp)

SYNOPSIS
        #include <stp/stpe/stpeprintf.h>

        char *_Nullable stpeprintf(char *_Nullable dst, char end[0],
                                   const char *restrict fmt, ...);
        char *_Nullable vstpeprintf(char *_Nullable dst, char end[0],
                                   const char *restrict fmt, va_list ap);

DESCRIPTION
        These functions are almost identical to snprintf(3) and vsnprintf(3).

        The  destination  buffer  is limited by a pointer to its end —one after
        its last element— instead of a size.

        These  functions  can  be  chained  with  calls  to  stpeprintf(3)  and
        vstpeprintf(3).

RETURN VALUE
        NULL
               •  If this function failed (see ERRORS).
               •  If dst was NULL.

        end
               •  If this call truncated.
               •  If  dst  was equal to end (a previous call to these functions
                  truncated).

        dst + strlen(dst)
               On success, these functions return a pointer to the  terminating
               null byte.

ERRORS
        These functions may fail for any of the same reasons as vsnprintf(3).

ATTRIBUTES
        For  an  explanation  of  the  terms  used in this section, see attrib‐
        utes(7).
        ┌────────────────────────────────────────────┬───────────────┬─────────┐
        │Interface                                   │ Attribute     │ Value   │
        ├────────────────────────────────────────────┼───────────────┼─────────┤
        │stpeprintf(3), vstpeprintf(3)               │ Thread safety │ MT‐Safe │
        └────────────────────────────────────────────┴───────────────┴─────────┘

STANDARDS
        None.

EXAMPLES
        See stpecpy(3).

SEE ALSO
        stpecpy(3), string_copying(7)

libstp (unreleased)                 (date)                       stpeprintf(3)








-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 23, 2022, 12:29 p.m. UTC | #3
On 12/23/22 13:26, Alejandro Colomar wrote:

> Well, I do believe snprintf is also misdesigned, for the same reasons that the 
> strlcpy(3) manual page states that you should use strlcpy(3) for catenating 

typo fix:  s/should/shouldn't/

> strings, but rather strlcat(3):
> 
>         To detect truncation, perhaps while building a pathname, something like
>         the following might be used:
> 
>               char *dir, *file, pname[MAXPATHLEN];
> 
>               ...
> 
>               if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
>                       goto toolong;
>               if (strlcat(pname, file, sizeof(pname)) >= sizeof(pname))
>                       goto toolong;
> 
>         Since it is known how many  characters  were  copied  the  first  time,
>         things can be sped up a bit by using a copy instead of an append:
> 
>               char *dir, *file, pname[MAXPATHLEN];
>               size_t n;
> 
>               ...
> 
>               n = strlcpy(pname, dir, sizeof(pname));
>               if (n >= sizeof(pname))
>                       goto toolong;
>               if (strlcpy(pname + n, file, sizeof(pname) ‐ n) >= sizeof(pname) ‐ n)
>                       goto toolong;
> 
>         However,  one  may question the validity of such optimizations, as they
>         defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
>         fact, the first version of this manual page got it wrong.

-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 23, 2022, 5:21 p.m. UTC | #4
Hi Wilco,

On 12/23/22 13:26, Alejandro Colomar wrote:
>         However,  one  may question the validity of such optimizations, as they
>         defeat the whole purpose of strlcpy() and strlcat().  As  a  matter  of
>         fact, the first version of this manual page got it wrong.
> 
> Guess what?  There's no 'cat' version of snprintf, so users are doomed to write 
> buggy code when trying to use it to concatenate after some other string.  I've 
> recently been investigating a lot about it, and found invocations of Undefined 
> Behavior, and some milder cases of benign off-by-one (on the safe side, by luck) 
> errors, in calls to snprintf(3) in several important projects:
> 
> -  NGINX Unit:
>     -  Undefined Behavior:
>        <https://github.com/nginx/unit/issues/795#issuecomment-1345400420>
> 
>     -  Wrong truncation detection:
>        <https://github.com/nginx/unit/pull/734#discussion_r1043963527>
>        <https://github.com/nginx/unit/issues/804>
> 
> -  shadow:
>     -  off-by-one:
>        <https://github.com/shadow-maint/shadow/pull/607>
> 
>     -  clever code that looks like a bug but it's not:
>        <https://github.com/shadow-maint/shadow/issues/608>

And adding strlcat(3) doesn't address the issue about snprintf(3), which, as 
EdSchouten said in the Austin discussion:

"
- strlcpy() fits within the existing set of functions like a glove. strlcpy(a, 
b, n) behaves identically to snprintf(a, n, "%s", b). The return value always 
corresponds to the number of non-null bytes that would have been written. If we 
truly think that this is bad design, should we come up with a new version of 
snprintf() that also doesn't do this? I don't think so.
"

I only disagree in the last part ("I don't think so").  As I linked in my 
previous message, there have been numerous misuses of snprintf(3), due to the 
fact that it's not designed to be concatenated.  But of course there's no 
alternative, so the only way is using it, and hoping that you didn't introduce a 
bug.

Steffen (was it Nurpmeso?) in that same discussion rebated the claims about 
strlcpy(3) with performance claims that snprintf(3) is slow, but that was the 
least evil.  The real evil with snprintf(3) is that it doesn't have a 'cat' 
complement.

<https://www.austingroupbugs.net/view.php?id=986>

So, I'll send a second revision of the patch set to add stpeprintf(3).

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>
  
Sam James Dec. 31, 2022, 3:13 p.m. UTC | #5
> On 23 Dec 2022, at 12:26, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> Hi Sam!
> 
> On 12/23/22 08:02, Sam James wrote:
>>> On 22 Dec 2022, at 21:42, Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>> 
>>> Glibc didn't provide any function that copies a string with truncation.
>>> 
>>> It only provided strncpy(3) and stpncpy(3), which copy from a string
>>> into a null-padded character sequence at the destination fixed-width
>>> buffer, with truncation.
>>> 
>>> Those old functions, which don't produce a string, have been misused for
>>> a long time as a poor-man's replacement for strlcpy(3), but doing that
>>> is a source of bugs, since it's hard to calculate the right size that
>>> should be passed to the function, and it's also necessary to explicitly
>>> terminate the buffer with a null byte.  Detecting truncation is yet
>>> another problem.
>>> 
>>> stpecpy(3), described in the new string_copying(7) manual page, is
>>> similar to OpenBSD's strlcpy(3)/strlcat(3), but:
>>> 
>>> -  It's simpler to implement.
>>> -  It's faster.
>>> -  It's simpler to detect truncation.
>> Given strlcpy and strlcat are in POSIX next and therefore bar
>> some extraordinary event will be in glibc, I think we should
>> probably wait until those two land, then see if there's still
>> an appetite for stpecpy in glibc.
> 
> I disagree for the following reasons.
[snip]

Hi Alex,

Thanks for your detailed and thoughtful reply. I'll reflect on your comments here
and in the rest of the thread(s) - but there's some intriguing pointers you've made.

I wasn't trying to be dismissive at all so I hope it didn't come across like that.

Thank you again!

Best,
sam
  
Alejandro Colomar Dec. 31, 2022, 3:15 p.m. UTC | #6
Hey Sam!

On 12/31/22 16:13, Sam James wrote:
[...]

>>
>> I disagree for the following reasons.
> [snip]
> 
> Hi Alex,
> 
> Thanks for your detailed and thoughtful reply. I'll reflect on your comments here
> and in the rest of the thread(s) - but there's some intriguing pointers you've made.
> 
> I wasn't trying to be dismissive at all so I hope it didn't come across like that.

Ahh, no, I didn't take it bad; really :)

> 
> Thank you again!

Cheers!

Alex

> 
> Best,
> sam
> 

-- 
<http://www.alejandro-colomar.es/>
  

Patch

diff --git a/string/Makefile b/string/Makefile
index 938f528b8d..95e9ebce6d 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -73,6 +73,7 @@  routines := \
   sigabbrev_np \
   sigdescr_np \
   stpcpy \
+  stpecpy \
   stpncpy \
   strcasecmp \
   strcasecmp_l \
diff --git a/string/stpecpy.c b/string/stpecpy.c
new file mode 100644
index 0000000000..e6194559ee
--- /dev/null
+++ b/string/stpecpy.c
@@ -0,0 +1,39 @@ 
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 3.0 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+char *
+stpecpy (char *dest, char *end, const char *restrict src)
+{
+	char  *p;
+
+	if (dest == end)
+		return end;
+	if (dest == NULL)  // Allow chaining with stpeprintf(3).
+		return NULL;
+	if (dest > end)
+		__builtin_unreachable();
+
+	p = memccpy(dest, src, '\0', end - dest);
+	if (p != NULL)
+		return p - 1;
+
+	/* Truncation detected. */
+	end[-1] = '\0';
+	return end;
+}
diff --git a/string/string.h b/string/string.h
index 54dd8344de..966a8cb744 100644
--- a/string/string.h
+++ b/string/string.h
@@ -502,6 +502,13 @@  extern char *stpncpy (char *__restrict __dest,
 #endif
 
 #ifdef	__USE_GNU
+/* Copy the string SRC into a null-terminated string at DEST,
+   truncating if it would run after END.  Return a pointer to
+   the terminating null byte, or END if the string was truncated,
+   or NULL if DEST was NULL. */
+extern char *stpecpy (char *__dest, char *__end, const char *__restrict __src)
+     __THROW __nonnull ((2, 3));
+
 /* Compare S1 and S2 as strings holding name & indices/version numbers.  */
 extern int strverscmp (const char *__s1, const char *__s2)
      __THROW __attribute_pure__ __nonnull ((1, 2));