[v2] middle-end/104854: Limit strncmp overread warnings

Message ID 20220315053158.1486555-1-siddhesh@gotplt.org
State New
Headers
Series [v2] middle-end/104854: Limit strncmp overread warnings |

Commit Message

Siddhesh Poyarekar March 15, 2022, 5:31 a.m. UTC
  The size argument in strncmp only describe the maximum length to which
to compare two strings and is not an indication of sizes of the two
source strings.  Do not warn if it is larger than the two input strings
because it is entirely likely that the size argument is a conservative
maximum to accommodate inputs of different lengths and only a subset is
reachable through the current code path or that it is some other
application-specific property completely unrelated to the sizes of the
input strings.

gcc/ChangeLog:

	middle-end/104854
	* gimple-ssa-warn-access.cc
	(pass_waccess::warn_zero_sized_strncmp_inputs): New function.
	(pass_waccess::check_strncmp): Use it.

gcc/testsuite/ChangeLog:

	middle-end/104854
	* gcc.dg/Wstringop-overread.c (test_strncmp_array): Don't expect
	failures for non-zero sizes.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
---

Changes from v1:

A little better approach, ensuring that it tries to warn on zero length
inputs if the size of at least one of the two sources is known.

Also cc'ing Martin so that we can discuss approach on the list instead
of on the bug.  To summarize the discussion so far, Martin suggests that
the warning be split into levels but I'm contesting the utility of the
heuristics as a compiler warning given the looseness of the relationship
between the size argument and the inputs in the case of these functions.


 gcc/gimple-ssa-warn-access.cc             | 69 +++++++++--------------
 gcc/testsuite/gcc.dg/Wstringop-overread.c |  2 +-
 2 files changed, 28 insertions(+), 43 deletions(-)
  

Comments

Martin Sebor March 15, 2022, 3:39 p.m. UTC | #1
On 3/14/22 23:31, Siddhesh Poyarekar wrote:
> The size argument in strncmp only describe the maximum length to which
> to compare two strings and is not an indication of sizes of the two
> source strings.  Do not warn if it is larger than the two input strings
> because it is entirely likely that the size argument is a conservative
> maximum to accommodate inputs of different lengths and only a subset is
> reachable through the current code path or that it is some other
> application-specific property completely unrelated to the sizes of the
> input strings.

The strncmp function takes arrays as arguments (not necessarily
strings).  The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.  For example:

   char a[4], b[4];

   int f (void)
   {
     return strncmp (a, b, 8);   // -Wstringop-overread
   }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.  Otherwise, if both are nul-terminated there
is no point in calling strncmp with a bound greater than their sizes.

With no evidence that this warning is ever harmful I'd consider
suppressing it a regression.  Since the warning is a deliberate
feature in a released compiler and GCC is now in a regression
fixing stage, this patch is out of scope even if a case where
the warning wasn't helpful did turn up (none has been reported
so far).

> 
> gcc/ChangeLog:
> 
> 	middle-end/104854
> 	* gimple-ssa-warn-access.cc
> 	(pass_waccess::warn_zero_sized_strncmp_inputs): New function.
> 	(pass_waccess::check_strncmp): Use it.
> 
> gcc/testsuite/ChangeLog:
> 
> 	middle-end/104854
> 	* gcc.dg/Wstringop-overread.c (test_strncmp_array): Don't expect
> 	failures for non-zero sizes.
> 
> Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
> ---
> 
> Changes from v1:
> 
> A little better approach, ensuring that it tries to warn on zero length
> inputs if the size of at least one of the two sources is known.
> 
> Also cc'ing Martin so that we can discuss approach on the list instead
> of on the bug.  To summarize the discussion so far, Martin suggests that
> the warning be split into levels but I'm contesting the utility of the
> heuristics as a compiler warning given the looseness of the relationship
> between the size argument and the inputs in the case of these functions.

Thanks for CC'ing me.  The motivating example in pr104854 that we have
been discussing there involves strndup with a string literal.  That's
an entirely different case than the one your patch changes, and I don't
understand in what way you think they are related.

Martin

> 
> 
>   gcc/gimple-ssa-warn-access.cc             | 69 +++++++++--------------
>   gcc/testsuite/gcc.dg/Wstringop-overread.c |  2 +-
>   2 files changed, 28 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index 75297ed7c9e..15299770e29 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -2137,6 +2137,9 @@ private:
>     /* Return true if use follows an invalidating statement.  */
>     bool use_after_inval_p (gimple *, gimple *, bool = false);
>   
> +  /* Emit an overread warning for zero sized inputs to strncmp.  */
> +  void warn_zero_sized_strncmp_inputs (gimple *, tree *, access_data *);
> +
>     /* A pointer_query object to store information about pointers and
>        their targets in.  */
>     pointer_query m_ptr_qry;
> @@ -2619,8 +2622,20 @@ pass_waccess::check_stxncpy (gcall *stmt)
>   		data.mode, &data, m_ptr_qry.rvals);
>   }
>   
> -/* Check a call STMT to stpncpy() or strncpy() for overflow and warn
> -   if it does.  */
> +/* Warn for strncmp on a zero sized source or when an argument isn't
> +   nul-terminated.  */
> +void
> +pass_waccess::warn_zero_sized_strncmp_inputs (gimple *stmt, tree *bndrng,
> +					      access_data *pad)
> +{
> +  tree func = get_callee_fndecl (stmt);
> +  location_t loc = gimple_location (stmt);
> +  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func, bndrng,
> +			size_zero_node, pad);
> +}
> +
> +/* Check a call STMT to strncmp () for overflow and warn if it does.  This is
> +   limited to checking for NUL terminated arrays for now.  */
>   
>   void
>   pass_waccess::check_strncmp (gcall *stmt)
> @@ -2678,46 +2693,16 @@ pass_waccess::check_strncmp (gcall *stmt)
>     if (!bndrng[0] || integer_zerop (bndrng[0]))
>       return;
>   
> -  if (len1 && tree_int_cst_lt (len1, bndrng[0]))
> -    bndrng[0] = len1;
> -  if (len2 && tree_int_cst_lt (len2, bndrng[0]))
> -    bndrng[0] = len2;
> -
> -  /* compute_objsize almost never fails (and ultimately should never
> -     fail).  Don't bother to handle the rare case when it does.  */
> -  if (!compute_objsize (arg1, stmt, 1, &adata1.src, &m_ptr_qry)
> -      || !compute_objsize (arg2, stmt, 1, &adata2.src, &m_ptr_qry))
> -    return;
> -
> -  /* Compute the size of the remaining space in each array after
> -     subtracting any offset into it.  */
> -  offset_int rem1 = adata1.src.size_remaining ();
> -  offset_int rem2 = adata2.src.size_remaining ();
> -
> -  /* Cap REM1 and REM2 at the other if the other's argument is known
> -     to be an unterminated array, either because there's no space
> -     left in it after adding its offset or because it's constant and
> -     has no nul.  */
> -  if (rem1 == 0 || (rem1 < rem2 && lendata1.decl))
> -    rem2 = rem1;
> -  else if (rem2 == 0 || (rem2 < rem1 && lendata2.decl))
> -    rem1 = rem2;
> -
> -  /* Point PAD at the array to reference in the note if a warning
> -     is issued.  */
> -  access_data *pad = len1 ? &adata2 : &adata1;
> -  offset_int maxrem = wi::max (rem1, rem2, UNSIGNED);
> -  if (lendata1.decl || lendata2.decl
> -      || maxrem < wi::to_offset (bndrng[0]))
> -    {
> -      /* Warn when either argument isn't nul-terminated or the maximum
> -	 remaining space in the two arrays is less than the bound.  */
> -      tree func = get_callee_fndecl (stmt);
> -      location_t loc = gimple_location (stmt);
> -      maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func,
> -			    bndrng, wide_int_to_tree (sizetype, maxrem),
> -			    pad);
> -    }
> +  /* compute_objsize almost never fails (and ultimately should never fail).
> +     Don't bother to handle the rare case when it does.  Warn if either the
> +     source or destination has zero size, since the minimum bound is non-zero,
> +     hence guaranteeing an overread.  */
> +  if (compute_objsize (arg1, stmt, 1, &adata1.src, &m_ptr_qry)
> +      && adata1.src.size_remaining () == 0)
> +    warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata1);
> +  if (compute_objsize (arg2, stmt, 1, &adata2.src, &m_ptr_qry)
> +      && adata2.src.size_remaining () == 0)
> +    warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata2);
>   }
>   
>   /* Determine and check the sizes of the source and the destination
> diff --git a/gcc/testsuite/gcc.dg/Wstringop-overread.c b/gcc/testsuite/gcc.dg/Wstringop-overread.c
> index 7db74029819..fb8e626439d 100644
> --- a/gcc/testsuite/gcc.dg/Wstringop-overread.c
> +++ b/gcc/testsuite/gcc.dg/Wstringop-overread.c
> @@ -431,7 +431,7 @@ void test_strncmp_array (const char *s, int i)
>   
>     T (strncmp (a1, b1, 0));
>     T (strncmp (a1, b1, 1));
> -  T (strncmp (a1, b1, 2));      // { dg-warning "'strncmp' specified bound 2 exceeds source size 1" }
> +  T (strncmp (a1, b1, 2));
>   }
>   
>
  
Siddhesh Poyarekar March 15, 2022, 4:40 p.m. UTC | #2
On 15/03/2022 21:09, Martin Sebor wrote:
> The strncmp function takes arrays as arguments (not necessarily
> strings).  The main purpose of the -Wstringop-overread warning
> for calls to it is to detect calls where one of the arrays is
> not a nul-terminated string and the bound is larger than the size
> of the array.  For example:
> 
>    char a[4], b[4];
> 
>    int f (void)
>    {
>      return strncmp (a, b, 8);   // -Wstringop-overread
>    }
> 
> Such a call is suspect: if one of the arrays isn't nul-terminated
> the call is undefined.  Otherwise, if both are nul-terminated there

Isn't "suspect" too harsh a description though?  The bound does not 
specify the size of a or b, it specifies the maximum extent to which to 
compare a and b, the extent being any application-specific limit.  In 
fact the limit could be the size of some arbitrary third buffer that the 
contents of a or b must be copied to, truncating to the bound.

I agree the call is undefined if one of the arrays is not nul-terminated 
and that's the thing; nothing about the bound is undefined in this 
context, it's the NUL termination that is key.

> is no point in calling strncmp with a bound greater than their sizes.

There is, when the bound describes something else, e.g. the size of a 
third destination buffer into which one of the input buffers may get 
copied into.  Or when the bound describes the maximum length of a set of 
strings where only a subset of the strings are reachable in the current 
function and ranger sees it, allowing us to reduce our input string size 
estimate.  The bounds being the maximum of the lengths of two input 
strings is just one of many possibilities.

> With no evidence that this warning is ever harmful I'd consider

There is, the false positives were seen in Fedora/RHEL builds.

> suppressing it a regression.  Since the warning is a deliberate
> feature in a released compiler and GCC is now in a regression
> fixing stage, this patch is out of scope even if a case where
> the warning wasn't helpful did turn up (none has been reported
> so far).

Wait, I just reported an issue and it's across multiple packages in 
Fedora/RHEL :)

I think this is a regression since gcc 11 due to misunderstanding the 
specification and assuming too strong a relationship between the size 
argument of strncmp (and indeed strnlen and strndup) and the size of 
objects being passed to it.  Compliant code relies on the compiler to do 
the right thing here, i.e. optimize the strncmp call to strcmp and not 
panic about the size argument being larger than the input buffer size. 
If at all such a diagnostic needs to stay, it ought to go into the 
analyzer, where such looser heuristic suggestions are more acceptable 
and sometimes even appreciated.

FWIW, I'm open to splitting the warning levels as you suggested if 
that's the consensus since it at least provides a way to make these 
warnings saner. However I still haven't found the rationale presented so 
far compelling enough to justify these false positives; I just don't see 
a proportional enough reward.  Hopefully more people can chime in with 
their perspective on this.

Thanks,
Siddhesh
  
Martin Sebor March 15, 2022, 8:36 p.m. UTC | #3
On 3/15/22 10:40, Siddhesh Poyarekar wrote:
> On 15/03/2022 21:09, Martin Sebor wrote:
>> The strncmp function takes arrays as arguments (not necessarily
>> strings).  The main purpose of the -Wstringop-overread warning
>> for calls to it is to detect calls where one of the arrays is
>> not a nul-terminated string and the bound is larger than the size
>> of the array.  For example:
>>
>>    char a[4], b[4];
>>
>>    int f (void)
>>    {
>>      return strncmp (a, b, 8);   // -Wstringop-overread
>>    }
>>
>> Such a call is suspect: if one of the arrays isn't nul-terminated
>> the call is undefined.  Otherwise, if both are nul-terminated there
> 
> Isn't "suspect" too harsh a description though?  The bound does not 
> specify the size of a or b, it specifies the maximum extent to which to 
> compare a and b, the extent being any application-specific limit.  In 
> fact the limit could be the size of some arbitrary third buffer that the 
> contents of a or b must be copied to, truncating to the bound.

The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.

As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).

Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
<string.h>, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.

> I agree the call is undefined if one of the arrays is not nul-terminated 
> and that's the thing; nothing about the bound is undefined in this 
> context, it's the NUL termination that is key.
> 
>> is no point in calling strncmp with a bound greater than their sizes.
> 
> There is, when the bound describes something else, e.g. the size of a 
> third destination buffer into which one of the input buffers may get 
> copied into.  Or when the bound describes the maximum length of a set of 
> strings where only a subset of the strings are reachable in the current 
> function and ranger sees it, allowing us to reduce our input string size 
> estimate.  The bounds being the maximum of the lengths of two input 
> strings is just one of many possibilities.
> 
>> With no evidence that this warning is ever harmful I'd consider
> 
> There is, the false positives were seen in Fedora/RHEL builds.

I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether
or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).

(Granted, the manual makes it sound like -Wstringop-overread only
detects provable past-the-end reads.  That's a mistake in
the documentation that should be fixed.  The warning was never quite
so limited, nor was it intended to be.)

Martin

> 
>> suppressing it a regression.  Since the warning is a deliberate
>> feature in a released compiler and GCC is now in a regression
>> fixing stage, this patch is out of scope even if a case where
>> the warning wasn't helpful did turn up (none has been reported
>> so far).
> 
> Wait, I just reported an issue and it's across multiple packages in 
> Fedora/RHEL :)
> 
> I think this is a regression since gcc 11 due to misunderstanding the 
> specification and assuming too strong a relationship between the size 
> argument of strncmp (and indeed strnlen and strndup) and the size of 
> objects being passed to it.  Compliant code relies on the compiler to do 
> the right thing here, i.e. optimize the strncmp call to strcmp and not 
> panic about the size argument being larger than the input buffer size. 
> If at all such a diagnostic needs to stay, it ought to go into the 
> analyzer, where such looser heuristic suggestions are more acceptable 
> and sometimes even appreciated.
> 
> FWIW, I'm open to splitting the warning levels as you suggested if 
> that's the consensus since it at least provides a way to make these 
> warnings saner. However I still haven't found the rationale presented so 
> far compelling enough to justify these false positives; I just don't see 
> a proportional enough reward.  Hopefully more people can chime in with 
> their perspective on this.
> 
> Thanks,
> Siddhesh
>
  
Siddhesh Poyarekar March 16, 2022, 1:24 a.m. UTC | #4
On 16/03/2022 02:06, Martin Sebor wrote:
> The intended use of the strncmp bound is to limit the comparison to
> at most the size of the arrays or (in a subset of cases) the length
> of an initial substring. Providing an arbitrary bound that's not
> related to the sizes as you describe sounds very much like a misuse.

Nothing in the standard says that the bound is related to the sizes of 
input buffers.  I don't think deducing that intent makes sense either, 
nor concluding that any other use case is misuse.

> As a historical note, strncmp was first introduced in UNIX v7 where
> its purpose, alongside strncpy, was to manipulate (potentially)
> unterminated character arrays like file names stored in fixed size
> arrays (typically 14 bytes).  Strncpy would fill the buffers with
> ASCII data up to their size and pad the rest with nuls only if there
> was room.
> 
> Strncmp was then used to compare these potentially unterminated
> character arrays (e.g., archive headers in ld and ranlib).  The bound
> was the size of the fixed size array.  Its other use case was to compare
> leading portions of strings (e.g, when looking for an environment
> variable or when stripping "./" from path names).

Thanks for sharing the historical perspective.

> Since the early UNIX days, both strncpy and to a lesser extent strncmp
> have been widely misused and, along with many other functions in
> <string.h>, a frequent source of bugs due to common misunderstanding
> of their intended purpose.  The aim of these warnings is to detect
> the common (and sometimes less common) misuses and bugs.

They're all valid uses however since they do not violate the standard. 
If we find at compile time that the strings don't terminate at the 
bounds, emitting the warning is OK but the more pessimistic check seems 
like overkill.

> I haven't seen these so I can't very well comment on them.  But I can
> assure you that warning for the code above is intentional.  Whether
> or not the arrays are nul-terminated, the expected way to call
> the function is with a bound no greater than their size (some coding
> guidelines are explicit about this; see for example the CERT C Secure
> Coding standard rule ARR38-C).
> 
> (Granted, the manual makes it sound like -Wstringop-overread only
> detects provable past-the-end reads.  That's a mistake in
> the documentation that should be fixed.  The warning was never quite
> so limited, nor was it intended to be.)

The contention is not that it's not provable, it's more that it's 
doesn't even pass the "based on available information this is definitely 
buggy" assertion, making it more a strong suggestion than a warning that 
something is definitely amiss.  Which is why IMO it is more suitable as 
an analyzer check than a warning.

Thanks,
Siddhesh
  
Martin Sebor March 16, 2022, 11:41 p.m. UTC | #5
On 3/15/22 19:24, Siddhesh Poyarekar wrote:
> On 16/03/2022 02:06, Martin Sebor wrote:
>> The intended use of the strncmp bound is to limit the comparison to
>> at most the size of the arrays or (in a subset of cases) the length
>> of an initial substring. Providing an arbitrary bound that's not
>> related to the sizes as you describe sounds very much like a misuse.
> 
> Nothing in the standard says that the bound is related to the sizes of 
> input buffers.  I don't think deducing that intent makes sense either, 
> nor concluding that any other use case is misuse.
> 
>> As a historical note, strncmp was first introduced in UNIX v7 where
>> its purpose, alongside strncpy, was to manipulate (potentially)
>> unterminated character arrays like file names stored in fixed size
>> arrays (typically 14 bytes).  Strncpy would fill the buffers with
>> ASCII data up to their size and pad the rest with nuls only if there
>> was room.
>>
>> Strncmp was then used to compare these potentially unterminated
>> character arrays (e.g., archive headers in ld and ranlib).  The bound
>> was the size of the fixed size array.  Its other use case was to compare
>> leading portions of strings (e.g, when looking for an environment
>> variable or when stripping "./" from path names).
> 
> Thanks for sharing the historical perspective.
> 
>> Since the early UNIX days, both strncpy and to a lesser extent strncmp
>> have been widely misused and, along with many other functions in
>> <string.h>, a frequent source of bugs due to common misunderstanding
>> of their intended purpose.  The aim of these warnings is to detect
>> the common (and sometimes less common) misuses and bugs.
> 
> They're all valid uses however since they do not violate the standard. 
> If we find at compile time that the strings don't terminate at the 
> bounds, emitting the warning is OK but the more pessimistic check seems 
> like overkill.
> 
>> I haven't seen these so I can't very well comment on them.  But I can
>> assure you that warning for the code above is intentional.  Whether
>> or not the arrays are nul-terminated, the expected way to call
>> the function is with a bound no greater than their size (some coding
>> guidelines are explicit about this; see for example the CERT C Secure
>> Coding standard rule ARR38-C).
>>
>> (Granted, the manual makes it sound like -Wstringop-overread only
>> detects provable past-the-end reads.  That's a mistake in
>> the documentation that should be fixed.  The warning was never quite
>> so limited, nor was it intended to be.)
> 
> The contention is not that it's not provable, it's more that it's 
> doesn't even pass the "based on available information this is definitely 
> buggy" assertion, making it more a strong suggestion than a warning that 
> something is definitely amiss.  Which is why IMO it is more suitable as 
> an analyzer check than a warning.

As the GCC manual prominently states (and as I already pointed out)
warnings are:

   constructions that are not inherently erroneous but that are risky
   or suggest there may have been an error.

None of them implies a definite bug, and only a minority are about
violations of the standard.  Many point out benign code that's just
suspicious.

We can disagree (and many of us do) about the value of this or that
warning but that alone is not sufficient reason to change it.  Least
of all in stage 4, and with no details about the issues you say you
found.

Martin
  
Siddhesh Poyarekar March 17, 2022, 1:43 a.m. UTC | #6
On 17/03/2022 05:11, Martin Sebor wrote:
> As the GCC manual prominently states (and as I already pointed out)
> warnings are:

We are indeed going around in circles.  Hopefully someone else will 
pitch in and break the deadlock.

Siddhesh
  
Jonathan Wakely March 17, 2022, 10:35 a.m. UTC | #7
On 15/03/22 14:36 -0600, Martin Sebor wrote:
>On 3/15/22 10:40, Siddhesh Poyarekar wrote:
>> On 15/03/2022 21:09, Martin Sebor wrote:
>>> The strncmp function takes arrays as arguments (not necessarily
>>> strings).? The main purpose of the -Wstringop-overread warning
>>> for calls to it is to detect calls where one of the arrays is
>>> not a nul-terminated string and the bound is larger than the size
>>> of the array.? For example:
>>>
>>> ?? char a[4], b[4];
>>>
>>> ?? int f (void)
>>> ?? {
>>> ???? return strncmp (a, b, 8);?? // -Wstringop-overread
>>> ?? }
>>>
>>> Such a call is suspect: if one of the arrays isn't nul-terminated
>>> the call is undefined.? Otherwise, if both are nul-terminated there
>>
>> Isn't "suspect" too harsh a description though?? The bound does not
>> specify the size of a or b, it specifies the maximum extent to which to
>> compare a and b, the extent being any application-specific limit.? In
>> fact the limit could be the size of some arbitrary third buffer that the
>> contents of a or b must be copied to, truncating to the bound.
>
>The intended use of the strncmp bound is to limit the comparison to
>at most the size of the arrays or (in a subset of cases) the length
>of an initial substring. Providing an arbitrary bound that's not
>related to the sizes as you describe sounds very much like a misuse.
>
>As a historical note, strncmp was first introduced in UNIX v7 where
>its purpose, alongside strncpy, was to manipulate (potentially)
>unterminated character arrays like file names stored in fixed size
>arrays (typically 14 bytes).  Strncpy would fill the buffers with
>ASCII data up to their size and pad the rest with nuls only if there
>was room.
>
>Strncmp was then used to compare these potentially unterminated
>character arrays (e.g., archive headers in ld and ranlib).  The bound
>was the size of the fixed size array.  Its other use case was to compare
>leading portions of strings (e.g, when looking for an environment
>variable or when stripping "./" from path names).
>
>Since the early UNIX days, both strncpy and to a lesser extent strncmp
>have been widely misused and, along with many other functions in
><string.h>, a frequent source of bugs due to common misunderstanding
>of their intended purpose.  The aim of these warnings is to detect
>the common (and sometimes less common) misuses and bugs.
>
>> I agree the call is undefined if one of the arrays is not nul-terminated
>> and that's the thing; nothing about the bound is undefined in this
>> context, it's the NUL termination that is key.
>>
>>> is no point in calling strncmp with a bound greater than their sizes.
>>
>> There is, when the bound describes something else, e.g. the size of a
>> third destination buffer into which one of the input buffers may get
>> copied into.? Or when the bound describes the maximum length of a set of
>> strings where only a subset of the strings are reachable in the current
>> function and ranger sees it, allowing us to reduce our input string size
>> estimate.? The bounds being the maximum of the lengths of two input
>> strings is just one of many possibilities.
>>
>>> With no evidence that this warning is ever harmful I'd consider
>>
>> There is, the false positives were seen in Fedora/RHEL builds.
>
>I haven't seen these so I can't very well comment on them.  But I can
>assure you that warning for the code above is intentional.  Whether

I don't think anybody is saying it wasn't intentional, the point is
that we can change our minds and do it differently based on feedback
and usage experience.

If users no longer have faith in these warnings and just disable them
or ignore them, then the warnings do not find real bugs and are not
fit for purpose.

>or not the arrays are nul-terminated, the expected way to call
>the function is with a bound no greater than their size (some coding
>guidelines are explicit about this; see for example the CERT C Secure
>Coding standard rule ARR38-C).

That's fine. It shouldn't be in -Wall though. Please don't quote the
docs about suspicious constructs again, we're all smart enough to
consider things on balance and make trade offs instead of just taking
a dogmatic reading of the manual (especially when other parts of the
docs for these warnings are provably wrong).

I disagree this shouldn't be changed in stage 4, because the
ever-increasing false positive rate from the -Wstringop-xxx warnings
is a major regression in user experience with GCC.

You can keep insisting it's not a problem, but that doesn't match
other people's experience.
  
David Malcolm March 17, 2022, 1:02 p.m. UTC | #8
On Tue, 2022-03-15 at 22:10 +0530, Siddhesh Poyarekar wrote:
> On 15/03/2022 21:09, Martin Sebor wrote:
> > The strncmp function takes arrays as arguments (not necessarily
> > strings).  The main purpose of the -Wstringop-overread warning
> > for calls to it is to detect calls where one of the arrays is
> > not a nul-terminated string and the bound is larger than the size
> > of the array.  For example:
> > 
> >    char a[4], b[4];
> > 
> >    int f (void)
> >    {
> >      return strncmp (a, b, 8);   // -Wstringop-overread
> >    }
> > 
> > Such a call is suspect: if one of the arrays isn't nul-terminated
> > the call is undefined.  Otherwise, if both are nul-terminated there
> 
> Isn't "suspect" too harsh a description though?  The bound does not 
> specify the size of a or b, it specifies the maximum extent to which to
> compare a and b, the extent being any application-specific limit.  In
> fact the limit could be the size of some arbitrary third buffer that
> the 
> contents of a or b must be copied to, truncating to the bound.
> 
> I agree the call is undefined if one of the arrays is not nul-
> terminated 
> and that's the thing; nothing about the bound is undefined in this 
> context, it's the NUL termination that is key.
> 
> > is no point in calling strncmp with a bound greater than their sizes.
> 
> There is, when the bound describes something else, e.g. the size of a
> third destination buffer into which one of the input buffers may get 
> copied into.  Or when the bound describes the maximum length of a set
> of 
> strings where only a subset of the strings are reachable in the current
> function and ranger sees it, allowing us to reduce our input string
> size 
> estimate.  The bounds being the maximum of the lengths of two input 
> strings is just one of many possibilities.
> 
> > With no evidence that this warning is ever harmful I'd consider
> 
> There is, the false positives were seen in Fedora/RHEL builds.
> 
> > suppressing it a regression.  Since the warning is a deliberate
> > feature in a released compiler and GCC is now in a regression
> > fixing stage, this patch is out of scope even if a case where
> > the warning wasn't helpful did turn up (none has been reported
> > so far).
> 
> Wait, I just reported an issue and it's across multiple packages in 
> Fedora/RHEL :)

I think having more examples of where this is happening on real-world
code might help this discussion.

Do you have some URLs? (e.g. bug reports, build logs, etc)

Reading through this thread, all of the examples I've seen look like
they've been constructed; I'm interested in seeing what this looks like
"in the wild" as it were.

Quoting myself in the UX Guidelines:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html

"28.1.3 Try the diagnostic on real-world code

It’s worth testing a new warning on many instances of real-world code,
written by different people, and seeing what it complains about, and
what it doesn’t complain about.

This may suggest heuristics that silence common false positives.

It may also suggest ways to improve the precision of the message."

...and various other parts of those guidelines may apply here.

Hope this is constructive
Dave





> 
> I think this is a regression since gcc 11 due to misunderstanding the
> specification and assuming too strong a relationship between the size
> argument of strncmp (and indeed strnlen and strndup) and the size of 
> objects being passed to it.  Compliant code relies on the compiler to
> do 
> the right thing here, i.e. optimize the strncmp call to strcmp and not 
> panic about the size argument being larger than the input buffer size. 
> If at all such a diagnostic needs to stay, it ought to go into the 
> analyzer, where such looser heuristic suggestions are more acceptable
> and sometimes even appreciated.
> 
> FWIW, I'm open to splitting the warning levels as you suggested if 
> that's the consensus since it at least provides a way to make these 
> warnings saner. However I still haven't found the rationale presented
> so 
> far compelling enough to justify these false positives; I just don't
> see 
> a proportional enough reward.  Hopefully more people can chime in with 
> their perspective on this.
> 
> Thanks,
> Siddhesh
>
  
Aldy Hernandez March 17, 2022, 2:32 p.m. UTC | #9
I agree with Siddhesh (and Jonathan).

Is there a release manager that can either review his patch, or
provide feedback as to what the course of action should be here?

Thanks.
Aldy

On Thu, Mar 17, 2022 at 2:44 AM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> On 17/03/2022 05:11, Martin Sebor wrote:
> > As the GCC manual prominently states (and as I already pointed out)
> > warnings are:
>
> We are indeed going around in circles.  Hopefully someone else will
> pitch in and break the deadlock.
>
> Siddhesh
>
  
Marek Polacek March 17, 2022, 3:31 p.m. UTC | #10
On Wed, Mar 16, 2022 at 06:54:51AM +0530, Siddhesh Poyarekar wrote:
> On 16/03/2022 02:06, Martin Sebor wrote:
> > The intended use of the strncmp bound is to limit the comparison to
> > at most the size of the arrays or (in a subset of cases) the length
> > of an initial substring. Providing an arbitrary bound that's not
> > related to the sizes as you describe sounds very much like a misuse.
> 
> Nothing in the standard says that the bound is related to the sizes of input
> buffers.  I don't think deducing that intent makes sense either, nor
> concluding that any other use case is misuse.
> 
> > As a historical note, strncmp was first introduced in UNIX v7 where
> > its purpose, alongside strncpy, was to manipulate (potentially)
> > unterminated character arrays like file names stored in fixed size
> > arrays (typically 14 bytes).  Strncpy would fill the buffers with
> > ASCII data up to their size and pad the rest with nuls only if there
> > was room.
> > 
> > Strncmp was then used to compare these potentially unterminated
> > character arrays (e.g., archive headers in ld and ranlib).  The bound
> > was the size of the fixed size array.  Its other use case was to compare
> > leading portions of strings (e.g, when looking for an environment
> > variable or when stripping "./" from path names).
> 
> Thanks for sharing the historical perspective.
> 
> > Since the early UNIX days, both strncpy and to a lesser extent strncmp
> > have been widely misused and, along with many other functions in
> > <string.h>, a frequent source of bugs due to common misunderstanding
> > of their intended purpose.  The aim of these warnings is to detect
> > the common (and sometimes less common) misuses and bugs.
> 
> They're all valid uses however since they do not violate the standard. If we
> find at compile time that the strings don't terminate at the bounds,
> emitting the warning is OK but the more pessimistic check seems like
> overkill.

I think I agree, I've tried

#include <string.h>
char a[] = "abc";
char b[] = "abcd";

int f (void)
{
  return strncmp (a, b, 8);
}
 
where I get

t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5 [-Wstringop-overread]
    7 |   return strncmp (a, b, 8);   // -Wstringop-overread
      |          ^~~~~~~~~~~~~~~~~

even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
and there's no UB.

GCC 11 didn't emit that, so +1 for dialing this warning down.

> > I haven't seen these so I can't very well comment on them.  But I can
> > assure you that warning for the code above is intentional.  Whether
> > or not the arrays are nul-terminated, the expected way to call
> > the function is with a bound no greater than their size (some coding
> > guidelines are explicit about this; see for example the CERT C Secure
> > Coding standard rule ARR38-C).
> > 
> > (Granted, the manual makes it sound like -Wstringop-overread only
> > detects provable past-the-end reads.  That's a mistake in
> > the documentation that should be fixed.  The warning was never quite
> > so limited, nor was it intended to be.)
> 
> The contention is not that it's not provable, it's more that it's doesn't
> even pass the "based on available information this is definitely buggy"
> assertion, making it more a strong suggestion than a warning that something
> is definitely amiss.  Which is why IMO it is more suitable as an analyzer
> check than a warning.
> 
> Thanks,
> Siddhesh
> 

Marek
  
Jeff Law March 17, 2022, 4:46 p.m. UTC | #11
On Thu, Mar 17, 2022 at 9:32 AM Marek Polacek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Mar 16, 2022 at 06:54:51AM +0530, Siddhesh Poyarekar wrote:
> > On 16/03/2022 02:06, Martin Sebor wrote:
> > > The intended use of the strncmp bound is to limit the comparison to
> > > at most the size of the arrays or (in a subset of cases) the length
> > > of an initial substring. Providing an arbitrary bound that's not
> > > related to the sizes as you describe sounds very much like a misuse.
> >
> > Nothing in the standard says that the bound is related to the sizes of
> input
> > buffers.  I don't think deducing that intent makes sense either, nor
> > concluding that any other use case is misuse.
> >
> > > As a historical note, strncmp was first introduced in UNIX v7 where
> > > its purpose, alongside strncpy, was to manipulate (potentially)
> > > unterminated character arrays like file names stored in fixed size
> > > arrays (typically 14 bytes).  Strncpy would fill the buffers with
> > > ASCII data up to their size and pad the rest with nuls only if there
> > > was room.
> > >
> > > Strncmp was then used to compare these potentially unterminated
> > > character arrays (e.g., archive headers in ld and ranlib).  The bound
> > > was the size of the fixed size array.  Its other use case was to
> compare
> > > leading portions of strings (e.g, when looking for an environment
> > > variable or when stripping "./" from path names).
> >
> > Thanks for sharing the historical perspective.
> >
> > > Since the early UNIX days, both strncpy and to a lesser extent strncmp
> > > have been widely misused and, along with many other functions in
> > > <string.h>, a frequent source of bugs due to common misunderstanding
> > > of their intended purpose.  The aim of these warnings is to detect
> > > the common (and sometimes less common) misuses and bugs.
> >
> > They're all valid uses however since they do not violate the standard.
> If we
> > find at compile time that the strings don't terminate at the bounds,
> > emitting the warning is OK but the more pessimistic check seems like
> > overkill.
>
> I think I agree, I've tried
>
> #include <string.h>
> char a[] = "abc";
> char b[] = "abcd";
>
> int f (void)
> {
>   return strncmp (a, b, 8);
> }
>
> where I get
>
> t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
> [-Wstringop-overread]
>     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>       |          ^~~~~~~~~~~~~~~~~
>
> even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
> and there's no UB.
>
This one is a clear case where warning is bad.   Both arguments are
constant and we can determine they are NUL terminated and an overread will
never occur.  No deep analysis really needed here.

THe far more interesting case in my mind is when one or both arguments have
an unknown NUL termination state.  I could argue either side of that case.
I lean towards warning but I understand that opinions differ and my
priorities have moved away from distro-level issues, so identifying code
that needs a careful review for correctness, particularly old or security
sensitive code, has become a much lower priority for me.   Combine that
with the fact that we're really just dealing with over-reads here, I can
support whatever the broadest consensus is.

jeff
  
Andreas Schwab March 17, 2022, 5:01 p.m. UTC | #12
On Mär 17 2022, Jeff Law via Gcc-patches wrote:

> On Thu, Mar 17, 2022 at 9:32 AM Marek Polacek via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>> I think I agree, I've tried
>>
>> #include <string.h>
>> char a[] = "abc";
>> char b[] = "abcd";
>>
>> int f (void)
>> {
>>   return strncmp (a, b, 8);
>> }
>>
>> where I get
>>
>> t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
>> [-Wstringop-overread]
>>     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>>       |          ^~~~~~~~~~~~~~~~~
>>
>> even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
>> and there's no UB.
>>
> This one is a clear case where warning is bad.   Both arguments are
> constant and we can determine they are NUL terminated and an overread will
> never occur.  No deep analysis really needed here.

Both a and b are modifiable, thus the compiler cannot assume anything.
  
Siddhesh Poyarekar March 17, 2022, 5:22 p.m. UTC | #13
On 17/03/2022 22:16, Jeff Law wrote:
>     #include <string.h>
>     char a[] = "abc";
>     char b[] = "abcd";
> 
>     int f (void)
>     {
>        return strncmp (a, b, 8);
>     }
> 
>     where I get
> 
>     t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
>     [-Wstringop-overread]
>          7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>            |          ^~~~~~~~~~~~~~~~~
> 
>     even without -Wall.  strncmp sees that a[3] is '\0' so it stops
>     comparing
>     and there's no UB.
> 
> This one is a clear case where warning is bad.   Both arguments are 
> constant and we can determine they are NUL terminated and an overread 
> will never occur.  No deep analysis really needed here.
> 
> THe far more interesting case in my mind is when one or both arguments 
> have an unknown NUL termination state.  I could argue either side of 
> that case.  I lean towards warning but I understand that opinions differ 
> and my priorities have moved away from distro-level issues, so 
> identifying code that needs a careful review for correctness, 
> particularly old or security sensitive code, has become a much lower 
> priority for me.   Combine that with the fact that we're really just 
> dealing with over-reads here, I can support whatever the broadest 
> consensus is.

Actually in the above reproducer a and b are not const, so this is in 
fact the case where the NUL termination state of the strings is in 
theory unknown.  From the distro level (and in general for applications) 
the question is how common this is and I gathered from a Red Hat 
internal conversation that it's not uncommon.  However David pointed out 
that I need to share more specific examples to quantify this, so I need 
to work on that.  I'll share an update once I have it.

One case I am aware of is the pmix package in Fedora/RHEL, which has the 
following warning:

pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
'PMIx_Get' reading 512 bytes from a region of size 15
   179 |     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, 
NULL, 0, &val))) {
       | 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of type 
'const char *'
pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
   203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
const pmix_key_t key,
       |                           ^~~~~~~~
   177|       PMIX_PROC_CONSTRUCT(&proc);
   178|       PMIX_LOAD_PROCID(&proc, myproc.nspace, PMIX_RANK_WILDCARD);
   179|->     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, 
NULL, 0, &val))) {
   180|           fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
   181|           goto done;

which is due to PMIx_Get calling strncmp a few levels within with 
non-const strings and a max size of 512 (the maximum size that a key 
could be; AFAICT it's the size of the buffer into which the key gets 
written out), where the strings are always NULL terminated.

Thanks,
Siddhesh
  
Martin Sebor March 17, 2022, 5:51 p.m. UTC | #14
On 3/17/22 11:22, Siddhesh Poyarekar wrote:
> On 17/03/2022 22:16, Jeff Law wrote:
>>     #include <string.h>
>>     char a[] = "abc";
>>     char b[] = "abcd";
>>
>>     int f (void)
>>     {
>>        return strncmp (a, b, 8);
>>     }
>>
>>     where I get
>>
>>     t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
>>     [-Wstringop-overread]
>>          7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>>            |          ^~~~~~~~~~~~~~~~~
>>
>>     even without -Wall.  strncmp sees that a[3] is '\0' so it stops
>>     comparing
>>     and there's no UB.
>>
>> This one is a clear case where warning is bad.   Both arguments are 
>> constant and we can determine they are NUL terminated and an overread 
>> will never occur.  No deep analysis really needed here.
>>
>> THe far more interesting case in my mind is when one or both arguments 
>> have an unknown NUL termination state.  I could argue either side of 
>> that case.  I lean towards warning but I understand that opinions 
>> differ and my priorities have moved away from distro-level issues, so 
>> identifying code that needs a careful review for correctness, 
>> particularly old or security sensitive code, has become a much lower 
>> priority for me.   Combine that with the fact that we're really just 
>> dealing with over-reads here, I can support whatever the broadest 
>> consensus is.
> 
> Actually in the above reproducer a and b are not const, so this is in 
> fact the case where the NUL termination state of the strings is in 
> theory unknown.  From the distro level (and in general for applications) 
> the question is how common this is and I gathered from a Red Hat 
> internal conversation that it's not uncommon.  However David pointed out 
> that I need to share more specific examples to quantify this, so I need 
> to work on that.  I'll share an update once I have it.
> 
> One case I am aware of is the pmix package in Fedora/RHEL, which has the 
> following warning:
> 
> pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
> pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
> 'PMIx_Get' reading 512 bytes from a region of size 15
>    179 |     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, 
> NULL, 0, &val))) {
>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of type 
> 'const char *'
> pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
> pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
>    203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
> const pmix_key_t key,
>        |                           ^~~~~~~~
>    177|       PMIX_PROC_CONSTRUCT(&proc);
>    178|       PMIX_LOAD_PROCID(&proc, myproc.nspace, PMIX_RANK_WILDCARD);
>    179|->     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, 
> NULL, 0, &val))) {
>    180|           fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
> universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
>    181|           goto done;
> 
> which is due to PMIx_Get calling strncmp a few levels within with 
> non-const strings and a max size of 512 (the maximum size that a key 
> could be; AFAICT it's the size of the buffer into which the key gets 
> written out), where the strings are always NULL terminated.

This warning has nothing to do with strncmp.

It's issued for the call to PMIx_Get(), where the caller passes as
the second argument PMIX_UNIV_SIZE, a macro that expands to
the string "pmix.univ.size".

The function is declared like so:

   PMIX_EXPORT pmix_status_t
   PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
            const pmix_info_t info[], size_t ninfo,
            pmix_value_t **val);

The type of the second function argument, pmix_key_t, defined as

   typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];

an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
PMIX_UNIV_SIZE is much smaller (just 15 bytes).

The warning detects passing smaller arrays to parameters of larger
types declared using the array syntax.  It's controlled by
-Warray-parameter.

Martin
  
Siddhesh Poyarekar March 17, 2022, 6:02 p.m. UTC | #15
On 17/03/2022 23:21, Martin Sebor wrote:
> On 3/17/22 11:22, Siddhesh Poyarekar wrote:
>> On 17/03/2022 22:16, Jeff Law wrote:
>>>     #include <string.h>
>>>     char a[] = "abc";
>>>     char b[] = "abcd";
>>>
>>>     int f (void)
>>>     {
>>>        return strncmp (a, b, 8);
>>>     }
>>>
>>>     where I get
>>>
>>>     t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
>>>     [-Wstringop-overread]
>>>          7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>>>            |          ^~~~~~~~~~~~~~~~~
>>>
>>>     even without -Wall.  strncmp sees that a[3] is '\0' so it stops
>>>     comparing
>>>     and there's no UB.
>>>
>>> This one is a clear case where warning is bad.   Both arguments are 
>>> constant and we can determine they are NUL terminated and an overread 
>>> will never occur.  No deep analysis really needed here.
>>>
>>> THe far more interesting case in my mind is when one or both 
>>> arguments have an unknown NUL termination state.  I could argue 
>>> either side of that case.  I lean towards warning but I understand 
>>> that opinions differ and my priorities have moved away from 
>>> distro-level issues, so identifying code that needs a careful review 
>>> for correctness, particularly old or security sensitive code, has 
>>> become a much lower priority for me.   Combine that with the fact 
>>> that we're really just dealing with over-reads here, I can support 
>>> whatever the broadest consensus is.
>>
>> Actually in the above reproducer a and b are not const, so this is in 
>> fact the case where the NUL termination state of the strings is in 
>> theory unknown.  From the distro level (and in general for 
>> applications) the question is how common this is and I gathered from a 
>> Red Hat internal conversation that it's not uncommon.  However David 
>> pointed out that I need to share more specific examples to quantify 
>> this, so I need to work on that.  I'll share an update once I have it.
>>
>> One case I am aware of is the pmix package in Fedora/RHEL, which has 
>> the following warning:
>>
>> pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
>> pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
>> 'PMIx_Get' reading 512 bytes from a region of size 15
>>    179 |     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, PMIX_UNIV_SIZE, 
>> NULL, 0, &val))) {
>>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of 
>> type 'const char *'
>> pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
>> pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
>>    203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
>> const pmix_key_t key,
>>        |                           ^~~~~~~~
>>    177|       PMIX_PROC_CONSTRUCT(&proc);
>>    178|       PMIX_LOAD_PROCID(&proc, myproc.nspace, PMIX_RANK_WILDCARD);
>>    179|->     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, 
>> PMIX_UNIV_SIZE, NULL, 0, &val))) {
>>    180|           fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
>> universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
>>    181|           goto done;
>>
>> which is due to PMIx_Get calling strncmp a few levels within with 
>> non-const strings and a max size of 512 (the maximum size that a key 
>> could be; AFAICT it's the size of the buffer into which the key gets 
>> written out), where the strings are always NULL terminated.
> 
> This warning has nothing to do with strncmp.
> 
> It's issued for the call to PMIx_Get(), where the caller passes as
> the second argument PMIX_UNIV_SIZE, a macro that expands to
> the string "pmix.univ.size".
> 
> The function is declared like so:
> 
>    PMIX_EXPORT pmix_status_t
>    PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
>             const pmix_info_t info[], size_t ninfo,
>             pmix_value_t **val);
> 
> The type of the second function argument, pmix_key_t, defined as
> 
>    typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];
> 
> an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
> PMIX_UNIV_SIZE is much smaller (just 15 bytes).
> 
> The warning detects passing smaller arrays to parameters of larger
> types declared using the array syntax.  It's controlled by
> -Warray-parameter.

That's odd, shouldn't it show up as -Warray-parameter then and not 
-Wstringop-overread?

Siddhesh
  
Martin Sebor March 17, 2022, 6:13 p.m. UTC | #16
On 3/17/22 12:02, Siddhesh Poyarekar wrote:
> On 17/03/2022 23:21, Martin Sebor wrote:
>> On 3/17/22 11:22, Siddhesh Poyarekar wrote:
>>> On 17/03/2022 22:16, Jeff Law wrote:
>>>>     #include <string.h>
>>>>     char a[] = "abc";
>>>>     char b[] = "abcd";
>>>>
>>>>     int f (void)
>>>>     {
>>>>        return strncmp (a, b, 8);
>>>>     }
>>>>
>>>>     where I get
>>>>
>>>>     t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source 
>>>> size 5
>>>>     [-Wstringop-overread]
>>>>          7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>>>>            |          ^~~~~~~~~~~~~~~~~
>>>>
>>>>     even without -Wall.  strncmp sees that a[3] is '\0' so it stops
>>>>     comparing
>>>>     and there's no UB.
>>>>
>>>> This one is a clear case where warning is bad.   Both arguments are 
>>>> constant and we can determine they are NUL terminated and an 
>>>> overread will never occur.  No deep analysis really needed here.
>>>>
>>>> THe far more interesting case in my mind is when one or both 
>>>> arguments have an unknown NUL termination state.  I could argue 
>>>> either side of that case.  I lean towards warning but I understand 
>>>> that opinions differ and my priorities have moved away from 
>>>> distro-level issues, so identifying code that needs a careful review 
>>>> for correctness, particularly old or security sensitive code, has 
>>>> become a much lower priority for me.   Combine that with the fact 
>>>> that we're really just dealing with over-reads here, I can support 
>>>> whatever the broadest consensus is.
>>>
>>> Actually in the above reproducer a and b are not const, so this is in 
>>> fact the case where the NUL termination state of the strings is in 
>>> theory unknown.  From the distro level (and in general for 
>>> applications) the question is how common this is and I gathered from 
>>> a Red Hat internal conversation that it's not uncommon.  However 
>>> David pointed out that I need to share more specific examples to 
>>> quantify this, so I need to work on that.  I'll share an update once 
>>> I have it.
>>>
>>> One case I am aware of is the pmix package in Fedora/RHEL, which has 
>>> the following warning:
>>>
>>> pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
>>> pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
>>> 'PMIx_Get' reading 512 bytes from a region of size 15
>>>    179 |     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, 
>>> PMIX_UNIV_SIZE, NULL, 0, &val))) {
>>>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of 
>>> type 'const char *'
>>> pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
>>> pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
>>>    203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
>>> const pmix_key_t key,
>>>        |                           ^~~~~~~~
>>>    177|       PMIX_PROC_CONSTRUCT(&proc);
>>>    178|       PMIX_LOAD_PROCID(&proc, myproc.nspace, 
>>> PMIX_RANK_WILDCARD);
>>>    179|->     if (PMIX_SUCCESS != (rc = PMIx_Get(&proc, 
>>> PMIX_UNIV_SIZE, NULL, 0, &val))) {
>>>    180|           fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
>>> universe size failed: %d\n", myproc.nspace, myproc.rank, rc);
>>>    181|           goto done;
>>>
>>> which is due to PMIx_Get calling strncmp a few levels within with 
>>> non-const strings and a max size of 512 (the maximum size that a key 
>>> could be; AFAICT it's the size of the buffer into which the key gets 
>>> written out), where the strings are always NULL terminated.
>>
>> This warning has nothing to do with strncmp.
>>
>> It's issued for the call to PMIx_Get(), where the caller passes as
>> the second argument PMIX_UNIV_SIZE, a macro that expands to
>> the string "pmix.univ.size".
>>
>> The function is declared like so:
>>
>>    PMIX_EXPORT pmix_status_t
>>    PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
>>             const pmix_info_t info[], size_t ninfo,
>>             pmix_value_t **val);
>>
>> The type of the second function argument, pmix_key_t, defined as
>>
>>    typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];
>>
>> an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
>> PMIX_UNIV_SIZE is much smaller (just 15 bytes).
>>
>> The warning detects passing smaller arrays to parameters of larger
>> types declared using the array syntax.  It's controlled by
>> -Warray-parameter.
> 
> That's odd, shouldn't it show up as -Warray-parameter then and not 
> -Wstringop-overread?

I should have said: the array parameter feature is controlled by
-Warray-parameter.  (The warning above is obviously
-Wstringop-overread).

But since -Warray-parameter controls the array parameter feature,
it might be appropriate to also make the -Wstringop-overread and
-Wstringop-overflow instances for calls to such functions conditional
on the former option being enabled.  (Otherwise, -Warray-parameter
has a separate function of its own.)

I do see a minor issue with this warning in GCC 12: it's issued three
times for the same call, rather than once.  That's probably because
the new warning pass that issues it runs multiple times and doesn't
suppress the warning after issuing it the first time.

Martin

> 
> Siddhesh
>
  
Jason Merrill March 25, 2022, 1:26 p.m. UTC | #17
On 3/17/22 06:35, Jonathan Wakely via Gcc-patches wrote:
> On 15/03/22 14:36 -0600, Martin Sebor wrote:
>> On 3/15/22 10:40, Siddhesh Poyarekar wrote:
>>> On 15/03/2022 21:09, Martin Sebor wrote:
>>>> The strncmp function takes arrays as arguments (not necessarily
>>>> strings).? The main purpose of the -Wstringop-overread warning
>>>> for calls to it is to detect calls where one of the arrays is
>>>> not a nul-terminated string and the bound is larger than the size
>>>> of the array.? For example:
>>>>
>>>> ?? char a[4], b[4];
>>>>
>>>> ?? int f (void)
>>>> ?? {
>>>> ???? return strncmp (a, b, 8);?? // -Wstringop-overread
>>>> ?? }
>>>>
>>>> Such a call is suspect: if one of the arrays isn't nul-terminated
>>>> the call is undefined.? Otherwise, if both are nul-terminated there
>>>
>>> Isn't "suspect" too harsh a description though?? The bound does not
>>> specify the size of a or b, it specifies the maximum extent to which to
>>> compare a and b, the extent being any application-specific limit.? In
>>> fact the limit could be the size of some arbitrary third buffer that the
>>> contents of a or b must be copied to, truncating to the bound.
>>
>> The intended use of the strncmp bound is to limit the comparison to
>> at most the size of the arrays or (in a subset of cases) the length
>> of an initial substring. Providing an arbitrary bound that's not
>> related to the sizes as you describe sounds very much like a misuse.
>>
>> As a historical note, strncmp was first introduced in UNIX v7 where
>> its purpose, alongside strncpy, was to manipulate (potentially)
>> unterminated character arrays like file names stored in fixed size
>> arrays (typically 14 bytes).  Strncpy would fill the buffers with
>> ASCII data up to their size and pad the rest with nuls only if there
>> was room.
>>
>> Strncmp was then used to compare these potentially unterminated
>> character arrays (e.g., archive headers in ld and ranlib).  The bound
>> was the size of the fixed size array.  Its other use case was to compare
>> leading portions of strings (e.g, when looking for an environment
>> variable or when stripping "./" from path names).
>>
>> Since the early UNIX days, both strncpy and to a lesser extent strncmp
>> have been widely misused and, along with many other functions in
>> <string.h>, a frequent source of bugs due to common misunderstanding
>> of their intended purpose.  The aim of these warnings is to detect
>> the common (and sometimes less common) misuses and bugs.
>>
>>> I agree the call is undefined if one of the arrays is not nul-terminated
>>> and that's the thing; nothing about the bound is undefined in this
>>> context, it's the NUL termination that is key.
>>>
>>>> is no point in calling strncmp with a bound greater than their sizes.
>>>
>>> There is, when the bound describes something else, e.g. the size of a
>>> third destination buffer into which one of the input buffers may get
>>> copied into.? Or when the bound describes the maximum length of a set of
>>> strings where only a subset of the strings are reachable in the current
>>> function and ranger sees it, allowing us to reduce our input string size
>>> estimate.? The bounds being the maximum of the lengths of two input
>>> strings is just one of many possibilities.
>>>
>>>> With no evidence that this warning is ever harmful I'd consider
>>>
>>> There is, the false positives were seen in Fedora/RHEL builds.
>>
>> I haven't seen these so I can't very well comment on them.  But I can
>> assure you that warning for the code above is intentional.  Whether
> 
> I don't think anybody is saying it wasn't intentional, the point is
> that we can change our minds and do it differently based on feedback
> and usage experience.
> 
> If users no longer have faith in these warnings and just disable them
> or ignore them, then the warnings do not find real bugs and are not
> fit for purpose.
> 
>> or not the arrays are nul-terminated, the expected way to call
>> the function is with a bound no greater than their size (some coding
>> guidelines are explicit about this; see for example the CERT C Secure
>> Coding standard rule ARR38-C).
> 
> That's fine. It shouldn't be in -Wall though.

Perhaps a suitable compromise would be to add a separate warning flag 
specifically for the strn* warnings, so users deliberately using the 
bound to express a limit other than the length of the argument string 
(and confident that their strings are always NUL-terminated) can turn 
them off without turning off all the overread warnings.

Jason
  
Siddhesh Poyarekar March 25, 2022, 2:14 p.m. UTC | #18
On 25/03/2022 18:56, Jason Merrill via Gcc-patches wrote:
> Perhaps a suitable compromise would be to add a separate warning flag 
> specifically for the strn* warnings, so users deliberately using the 
> bound to express a limit other than the length of the argument string 
> (and confident that their strings are always NUL-terminated) can turn 
> them off without turning off all the overread warnings.

For strncmp (in cases where NUL termination cannot be proven) that is 
perhaps a reasonable compromise.  However I think I need to take a 
closer look to figure out if there are other ways to work around this, 
especially since discovering that I had misread the previous report.

I take back this patch and will revisit this a bit later, probably once 
stage 1 opens.

Thanks,
Siddhesh
  

Patch

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 75297ed7c9e..15299770e29 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2137,6 +2137,9 @@  private:
   /* Return true if use follows an invalidating statement.  */
   bool use_after_inval_p (gimple *, gimple *, bool = false);
 
+  /* Emit an overread warning for zero sized inputs to strncmp.  */
+  void warn_zero_sized_strncmp_inputs (gimple *, tree *, access_data *);
+
   /* A pointer_query object to store information about pointers and
      their targets in.  */
   pointer_query m_ptr_qry;
@@ -2619,8 +2622,20 @@  pass_waccess::check_stxncpy (gcall *stmt)
 		data.mode, &data, m_ptr_qry.rvals);
 }
 
-/* Check a call STMT to stpncpy() or strncpy() for overflow and warn
-   if it does.  */
+/* Warn for strncmp on a zero sized source or when an argument isn't
+   nul-terminated.  */
+void
+pass_waccess::warn_zero_sized_strncmp_inputs (gimple *stmt, tree *bndrng,
+					      access_data *pad)
+{
+  tree func = get_callee_fndecl (stmt);
+  location_t loc = gimple_location (stmt);
+  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func, bndrng,
+			size_zero_node, pad);
+}
+
+/* Check a call STMT to strncmp () for overflow and warn if it does.  This is
+   limited to checking for NUL terminated arrays for now.  */
 
 void
 pass_waccess::check_strncmp (gcall *stmt)
@@ -2678,46 +2693,16 @@  pass_waccess::check_strncmp (gcall *stmt)
   if (!bndrng[0] || integer_zerop (bndrng[0]))
     return;
 
-  if (len1 && tree_int_cst_lt (len1, bndrng[0]))
-    bndrng[0] = len1;
-  if (len2 && tree_int_cst_lt (len2, bndrng[0]))
-    bndrng[0] = len2;
-
-  /* compute_objsize almost never fails (and ultimately should never
-     fail).  Don't bother to handle the rare case when it does.  */
-  if (!compute_objsize (arg1, stmt, 1, &adata1.src, &m_ptr_qry)
-      || !compute_objsize (arg2, stmt, 1, &adata2.src, &m_ptr_qry))
-    return;
-
-  /* Compute the size of the remaining space in each array after
-     subtracting any offset into it.  */
-  offset_int rem1 = adata1.src.size_remaining ();
-  offset_int rem2 = adata2.src.size_remaining ();
-
-  /* Cap REM1 and REM2 at the other if the other's argument is known
-     to be an unterminated array, either because there's no space
-     left in it after adding its offset or because it's constant and
-     has no nul.  */
-  if (rem1 == 0 || (rem1 < rem2 && lendata1.decl))
-    rem2 = rem1;
-  else if (rem2 == 0 || (rem2 < rem1 && lendata2.decl))
-    rem1 = rem2;
-
-  /* Point PAD at the array to reference in the note if a warning
-     is issued.  */
-  access_data *pad = len1 ? &adata2 : &adata1;
-  offset_int maxrem = wi::max (rem1, rem2, UNSIGNED);
-  if (lendata1.decl || lendata2.decl
-      || maxrem < wi::to_offset (bndrng[0]))
-    {
-      /* Warn when either argument isn't nul-terminated or the maximum
-	 remaining space in the two arrays is less than the bound.  */
-      tree func = get_callee_fndecl (stmt);
-      location_t loc = gimple_location (stmt);
-      maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func,
-			    bndrng, wide_int_to_tree (sizetype, maxrem),
-			    pad);
-    }
+  /* compute_objsize almost never fails (and ultimately should never fail).
+     Don't bother to handle the rare case when it does.  Warn if either the
+     source or destination has zero size, since the minimum bound is non-zero,
+     hence guaranteeing an overread.  */
+  if (compute_objsize (arg1, stmt, 1, &adata1.src, &m_ptr_qry)
+      && adata1.src.size_remaining () == 0)
+    warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata1);
+  if (compute_objsize (arg2, stmt, 1, &adata2.src, &m_ptr_qry)
+      && adata2.src.size_remaining () == 0)
+    warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata2);
 }
 
 /* Determine and check the sizes of the source and the destination
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overread.c b/gcc/testsuite/gcc.dg/Wstringop-overread.c
index 7db74029819..fb8e626439d 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overread.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overread.c
@@ -431,7 +431,7 @@  void test_strncmp_array (const char *s, int i)
 
   T (strncmp (a1, b1, 0));
   T (strncmp (a1, b1, 1));
-  T (strncmp (a1, b1, 2));      // { dg-warning "'strncmp' specified bound 2 exceeds source size 1" }
+  T (strncmp (a1, b1, 2));
 }