add attribute nonstring

Message ID d0432148-24d3-baaa-faf6-b00c33e45103@gmail.com
State New, archived
Headers

Commit Message

Martin Sebor Nov. 10, 2017, 5:31 p.m. UTC
  The latest revision of GCC 8.0 adds a -Wstringop-truncation option
to detect common misuses of the strncpy and strncat functions that
may truncate the copy and leave the result without a terminating
nul character.

On its own, the warning cannot distinguish the intended and safe
uses of the functions (to fill a buffer with data) from the
incorrect and unsafe ones (to make a "bounded" copy of a string).
Because the misuses have become prevalent and the correct/intended
uses are in a minority, GCC has added a new attribute called
nonstring to annotate character arrays that aren't meant to be
treated as nul-terminated strings.  The attribute disables
the checker.  In the future, GCC will also use the attribute to
issue warnings when such arrays are passed to functions that
expect nul-terminated strings.

Attached is a patch to annotate such arrays in Glibc to prevent
warnings on x86_64 (I didn't test any other targets).  The patch
touches the timezone/tzfile.h header even though (IIUC) it comes
from another project.  I'm not familiar with the process for
keeping the timezone directory in sync with tzcode.  Please let
me know how you'd like to go about making this change.

Thanks
Martin
  

Comments

Joseph Myers Nov. 10, 2017, 5:53 p.m. UTC | #1
On Fri, 10 Nov 2017, Martin Sebor wrote:

> Attached is a patch to annotate such arrays in Glibc to prevent
> warnings on x86_64 (I didn't test any other targets).  The patch
> touches the timezone/tzfile.h header even though (IIUC) it comes
> from another project.  I'm not familiar with the process for
> keeping the timezone directory in sync with tzcode.  Please let
> me know how you'd like to go about making this change.

For tzcode the normal practice is a bulk update from the latest release of 
tzcode.  We might consider a fix from https://github.com/eggert/tz.git if 
something newer than the latest release is needed in a particular case, 
but release versions are preferred.
  
Martin Sebor Nov. 10, 2017, 11:11 p.m. UTC | #2
On 11/10/2017 10:53 AM, Joseph Myers wrote:
> On Fri, 10 Nov 2017, Martin Sebor wrote:
>
>> Attached is a patch to annotate such arrays in Glibc to prevent
>> warnings on x86_64 (I didn't test any other targets).  The patch
>> touches the timezone/tzfile.h header even though (IIUC) it comes
>> from another project.  I'm not familiar with the process for
>> keeping the timezone directory in sync with tzcode.  Please let
>> me know how you'd like to go about making this change.
>
> For tzcode the normal practice is a bulk update from the latest release of
> tzcode.  We might consider a fix from https://github.com/eggert/tz.git if
> something newer than the latest release is needed in a particular case,
> but release versions are preferred.

Okay, thanks.  I've read that you run periodic Glibc builds with
the latest GCC so I wanted to give you a heads up on this change
and the patch to avoid breaking them.  Unless that's not a concern
(do you use -Werror in those builds?) what is the best way to
coordinate this update?

I CC Paul for his input in case updating tzcode first is important.

Martin
  
Joseph Myers Nov. 10, 2017, 11:45 p.m. UTC | #3
On Fri, 10 Nov 2017, Martin Sebor wrote:

> Okay, thanks.  I've read that you run periodic Glibc builds with
> the latest GCC so I wanted to give you a heads up on this change
> and the patch to avoid breaking them.  Unless that's not a concern
> (do you use -Werror in those builds?) what is the best way to
> coordinate this update?

The builds use -Werror, and run daily (actually more like every 30 hours 
when the builds succeed, because each build starts 24 hours after the 
previous one finished).  The main issue with a GCC patch breaking the 
build going in some time before a glibc patch fixing the build is that it 
may make it harder to find what GCC patch introduced any other build 
breakage that appears while there are warnings (we currently have two such 
breakages with GCC mainline, for powerpc64le and sh).

The most obvious question for people to consider in reviewing this patch 
(apart from getting a tzfile.h fix from upstream) is coding style: do we 
want __NONSTRING on a separate line as in this patch, or on the same line 
as the member declaration in question (before the type, or after the [])?  
Also, sys/cdefs.h should be using __GNUC_PREREQ (8, 0), not a direct test 
of __GNUC__.
  
Joseph Myers Nov. 14, 2017, 5:58 p.m. UTC | #4
On further investigation of the coding style issue: the normal practice in 
glibc is definitely that attributes (except on function definitions) go 
immediately before the ';' at the end of a declaration (there are e.g. 
various __attribute__ ((__aligned__ (something))) attributes like that on 
struct members).  Also, the normal naming practice in sys/cdefs.h would be 
__attribute_nonstring__ unless there is some reason to deviate from that.
  
Steve Ellcey Nov. 15, 2017, 10:09 p.m. UTC | #5
On Fri, 2017-11-10 at 16:11 -0700, Martin Sebor wrote:

> Okay, thanks.  I've read that you run periodic Glibc builds with
> the latest GCC so I wanted to give you a heads up on this change
> and the patch to avoid breaking them.  Unless that's not a concern
> (do you use -Werror in those builds?) what is the best way to
> coordinate this update?
> 
> I CC Paul for his input in case updating tzcode first is important.
> 
> Martin

Martin, were you going to update this patch with Joseph's comments and
resubmit?  The tz changes are checked in and the ifreq struct is taken
care of so all that remains are the changes to cdefs.h to define
__NONSTRING and then to use in utmp.h.  I don't have any comments on
that change beyond what Joseph made but I did do a build and run on
aarch64 with the latest GCC to verify that those changes work for me.

Steve Ellcey
sellcey@cavium.com
  
Siddhesh Poyarekar Nov. 27, 2017, 11:05 a.m. UTC | #6
On Friday 10 November 2017 11:01 PM, Martin Sebor wrote:
> diff --git a/sysdeps/gnu/bits/utmp.h b/sysdeps/gnu/bits/utmp.h
> index 2ee11cb..d4885f3 100644
> --- a/sysdeps/gnu/bits/utmp.h
> +++ b/sysdeps/gnu/bits/utmp.h
> @@ -59,9 +59,13 @@ struct utmp
>  {
>    short int ut_type;		/* Type of login.  */
>    pid_t ut_pid;			/* Process ID of login process.  */
> +  __NONSTRING
>    char ut_line[UT_LINESIZE];	/* Devicename.  */
> +  __NONSTRING
>    char ut_id[4];		/* Inittab ID.  */
> +  __NONSTRING
>    char ut_user[UT_NAMESIZE];	/* Username.  */
> +  __NONSTRING
>    char ut_host[UT_HOSTSIZE];	/* Hostname for remote login.  */
>    struct exit_status ut_exit;	/* Exit status of a process marked
>  				   as DEAD_PROCESS.  */

This breaks builds with the latest gcc since your commit to warn on
non-string arguments to strlen:


commit 0c45740b611e0930073eeae00422a17c62c2d983
Author: msebor <msebor@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Nov 21 20:01:58 2017 +0000

    PR tree-optimization/82945 - add warning for passing non-strings to
functions that expect string arguments

    gcc/ChangeLog:

        PR tree-optimization/82945
        * builtins.c (expand_builtin_strlen): Call maybe_warn_nonstring_arg.
        * calls.h (maybe_warn_nonstring_arg): Declare new function.
        * calls.c (get_attr_nonstring_decl, maybe_warn_nonstring_arg): New
        functions.
        (initialize_argument_information): Call maybe_warn_nonstring_arg.
        * calls.h (get_attr_nonstring_decl): Declare new function.
        * doc/extend.texi (attribute nonstring): Update.
        * gimple-fold.c (gimple_fold_builtin_strncpy): Call
        get_attr_nonstring_decl and handle it.
        * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.  Improve
        detection of nul-termination.
        (strlen_to_stridx): Change to a pointer.
        (handle_builtin_strlen, handle_builtin_stxncpy): Adjust.
        (pass_strlen::execute): Same.

    gcc/testsuite/ChangeLog:

        PR tree-optimization/82945
        * c-c++-common/Wstringop-truncation-2.c: New test.
        * c-c++-common/Wstringop-truncation.c: Adjust.
        * c-c++-common/attr-nonstring-2.c: Adjust.
        * c-c++-common/attr-nonstring-3.c: New test.


Siddhesh
  
Martin Sebor Nov. 27, 2017, 4:04 p.m. UTC | #7
On 11/27/2017 04:05 AM, Siddhesh Poyarekar wrote:
> On Friday 10 November 2017 11:01 PM, Martin Sebor wrote:
>> diff --git a/sysdeps/gnu/bits/utmp.h b/sysdeps/gnu/bits/utmp.h
>> index 2ee11cb..d4885f3 100644
>> --- a/sysdeps/gnu/bits/utmp.h
>> +++ b/sysdeps/gnu/bits/utmp.h
>> @@ -59,9 +59,13 @@ struct utmp
>>  {
>>    short int ut_type;		/* Type of login.  */
>>    pid_t ut_pid;			/* Process ID of login process.  */
>> +  __NONSTRING
>>    char ut_line[UT_LINESIZE];	/* Devicename.  */
>> +  __NONSTRING
>>    char ut_id[4];		/* Inittab ID.  */
>> +  __NONSTRING
>>    char ut_user[UT_NAMESIZE];	/* Username.  */
>> +  __NONSTRING
>>    char ut_host[UT_HOSTSIZE];	/* Hostname for remote login.  */
>>    struct exit_status ut_exit;	/* Exit status of a process marked
>>  				   as DEAD_PROCESS.  */
>
> This breaks builds with the latest gcc since your commit to warn on
> non-string arguments to strlen:

I didn't see any warnings in my x86_64-linux build but there could
be others on targets I don't test.  The only problem I do know of
(bug 22447) was fixed last Wednesday.

It would be helpful if you could share some details about the code
that triggers the warning, what target, etc.

Martin
  
Siddhesh Poyarekar Nov. 27, 2017, 4:24 p.m. UTC | #8
On Monday 27 November 2017 09:34 PM, Martin Sebor wrote:
> I didn't see any warnings in my x86_64-linux build but there could
> be others on targets I don't test.  The only problem I do know of
> (bug 22447) was fixed last Wednesday.
> 
> It would be helpful if you could share some details about the code
> that triggers the warning, what target, etc.

I saw this during an aarch64 cross-build on an x86_64 host:

In file included from ../sysdeps/unix/sysv/linux/getlogin_r.c:25:
../sysdeps/unix/getlogin_r.c: In function ‘getlogin_r_fd0’:
../sysdeps/unix/getlogin_r.c:83:23: error: ‘strlen’ argument 1 declared
attribute ‘nonstring’ [-Werror=stringop-overflow=]
       size_t needed = strlen (ut->ut_user) + 1;
                       ^~~~~~~~~~~~~~~~~~~~
In file included from ../login/utmp.h:29,
                 from ../include/utmp.h:2,
                 from ../sysdeps/unix/getlogin_r.c:26,
                 from ../sysdeps/unix/sysv/linux/getlogin_r.c:25:
../sysdeps/gnu/bits/utmp.h:65:8: note: argument ‘ut_user’ declared here
   char ut_user[UT_NAMESIZE]
        ^~~~~~~
cc1: all warnings being treated as errors

Siddhesh
  
Siddhesh Poyarekar Nov. 27, 2017, 4:26 p.m. UTC | #9
On Monday 27 November 2017 09:54 PM, Siddhesh Poyarekar wrote:
> On Monday 27 November 2017 09:34 PM, Martin Sebor wrote:
>> I didn't see any warnings in my x86_64-linux build but there could
>> be others on targets I don't test.  The only problem I do know of
>> (bug 22447) was fixed last Wednesday.
>>
>> It would be helpful if you could share some details about the code
>> that triggers the warning, what target, etc.
> 
> I saw this during an aarch64 cross-build on an x86_64 host:
> 
> In file included from ../sysdeps/unix/sysv/linux/getlogin_r.c:25:
> ../sysdeps/unix/getlogin_r.c: In function ‘getlogin_r_fd0’:
> ../sysdeps/unix/getlogin_r.c:83:23: error: ‘strlen’ argument 1 declared
> attribute ‘nonstring’ [-Werror=stringop-overflow=]
>        size_t needed = strlen (ut->ut_user) + 1;
>                        ^~~~~~~~~~~~~~~~~~~~
> In file included from ../login/utmp.h:29,
>                  from ../include/utmp.h:2,
>                  from ../sysdeps/unix/getlogin_r.c:26,
>                  from ../sysdeps/unix/sysv/linux/getlogin_r.c:25:
> ../sysdeps/gnu/bits/utmp.h:65:8: note: argument ‘ut_user’ declared here
>    char ut_user[UT_NAMESIZE]
>         ^~~~~~~
> cc1: all warnings being treated as errors

Ugh, sorry, that is in fact the fix.  I just noticed that I had not
synced to the HEAD.

Thanks,
Siddhesh
  

Patch

The -Wstringop-truncation option new in GCC 8 detects common misuses
of the strncat and strncpy function that may result in truncating
the copied string before the terminating NUL.  To avoid false positive
warnings for correct code that intentionally creates sequences of
characters that aren't guaranteed to be NUL-terminated, arrays that
are intended to store such sequences should be decorated with a new
nonstring attribute.  This change add this attribute to Glibc and
uses it to suppress such false positives.

ChangeLog:
	* misc/sys/cdefs.h (__NONSTRING): New macro.
	* sysdeps/gnu/bits/utmp.h (struct utmp): Use it.
	* sysdeps/gnu/net/if.h (struct ifreq): Same.
	* timezone/tzfile.h (struct tzhead): Same.

diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
index cfd39d5..af6d1ac 100644
--- a/misc/sys/cdefs.h
+++ b/misc/sys/cdefs.h
@@ -407,6 +407,14 @@ 
 # endif
 #endif
 
+#if __GNUC__ >= 8
+/* Describes a char array that is not necessarily a NUL-terminated
+   string.  */
+# define __NONSTRING __attribute__ ((__nonstring__))
+#else
+# define __NONSTRING
+#endif
+
 #if (!defined _Static_assert && !defined __cplusplus \
      && (defined __STDC_VERSION__ ? __STDC_VERSION__ : 0) < 201112 \
      && (!__GNUC_PREREQ (4, 6) || defined __STRICT_ANSI__))
diff --git a/sysdeps/gnu/bits/utmp.h b/sysdeps/gnu/bits/utmp.h
index 2ee11cb..d4885f3 100644
--- a/sysdeps/gnu/bits/utmp.h
+++ b/sysdeps/gnu/bits/utmp.h
@@ -59,9 +59,13 @@  struct utmp
 {
   short int ut_type;		/* Type of login.  */
   pid_t ut_pid;			/* Process ID of login process.  */
+  __NONSTRING
   char ut_line[UT_LINESIZE];	/* Devicename.  */
+  __NONSTRING
   char ut_id[4];		/* Inittab ID.  */
+  __NONSTRING
   char ut_user[UT_NAMESIZE];	/* Username.  */
+  __NONSTRING
   char ut_host[UT_HOSTSIZE];	/* Hostname for remote login.  */
   struct exit_status ut_exit;	/* Exit status of a process marked
 				   as DEAD_PROCESS.  */
diff --git a/sysdeps/gnu/net/if.h b/sysdeps/gnu/net/if.h
index 0afce08..eb11813 100644
--- a/sysdeps/gnu/net/if.h
+++ b/sysdeps/gnu/net/if.h
@@ -129,6 +129,7 @@  struct ifreq
 # define IFNAMSIZ	IF_NAMESIZE
     union
       {
+	__NONSTRING
 	char ifrn_name[IFNAMSIZ];	/* Interface name, e.g. "en0".  */
       } ifr_ifrn;
 
diff --git a/timezone/tzfile.h b/timezone/tzfile.h
index 0e51dce..19fe20a 100644
--- a/timezone/tzfile.h
+++ b/timezone/tzfile.h
@@ -38,6 +38,7 @@ 
 #define	TZ_MAGIC	"TZif"
 
 struct tzhead {
+	__NONSTRING
 	char	tzh_magic[4];		/* TZ_MAGIC */
 	char	tzh_version[1];		/* '\0' or '2' or '3' as of 2013 */
 	char	tzh_reserved[15];	/* reserved; must be zero */