[v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h

Message ID 20230710161300.1678172-1-xry111@xry111.site
State Superseded
Headers
Series [v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed

Commit Message

Xi Ruoyao July 10, 2023, 4:13 p.m. UTC
  During the review of a GCC analyzer test case, we found most stdio
functions accepting a FILE * argument expect it to be nonnull and just
segfault when the argument is NULL.  Add nonnull attribute for them.

fflush and fflush_unlocked are well defined when __stream is NULL so
they are not touched.

For fputs, fgets, fread, fwrite, fprintf, vfprintf, and their unlocked
version, if __stream is empty but there is nothing to read or write,
they did not segfault.  But the standard disallow __stream to be empty
here, so nonnull attribute is also added for them.  Note that this may
blow up some old code already subtly broken.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 libio/stdio.h | 148 ++++++++++++++++++++++++++++----------------------
 1 file changed, 82 insertions(+), 66 deletions(-)
  

Comments

Zack Weinberg July 10, 2023, 5:12 p.m. UTC | #1
On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
> During the review of a GCC analyzer test case, we found most stdio
> functions accepting a FILE * argument expect it to be nonnull and just
> segfault when the argument is NULL.  Add nonnull attribute for them.

I think this patchset has a high risk of breaking application code,
because "this function will promptly crash if passed a NULL pointer" is
a very different property from "any code path that would cause this
function to be passed a NULL pointer is necessarily unreachable."

If we take it at all -- and my current gut feeling is that we
*shouldn't* -- we should do so early in a release cycle to give us the
best chance of discovering broken applications before the release.

zw
  
Xi Ruoyao July 10, 2023, 5:27 p.m. UTC | #2
On Mon, 2023-07-10 at 13:12 -0400, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
> > During the review of a GCC analyzer test case, we found most stdio
> > functions accepting a FILE * argument expect it to be nonnull and just
> > segfault when the argument is NULL.  Add nonnull attribute for them.
> 
> I think this patchset has a high risk of breaking application code,
> because "this function will promptly crash if passed a NULL pointer" is
> a very different property from "any code path that would cause this
> function to be passed a NULL pointer is necessarily unreachable."
> 
> If we take it at all -- and my current gut feeling is that we
> *shouldn't* -- we should do so early in a release cycle to give us the
> best chance of discovering broken applications before the release.

If they want to rely on "it must crash if passed a NULL pointer", they
should really use -fisolate-erroneous-paths-attribute.
  
Siddhesh Poyarekar July 10, 2023, 5:51 p.m. UTC | #3
On 2023-07-10 13:12, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
>> During the review of a GCC analyzer test case, we found most stdio
>> functions accepting a FILE * argument expect it to be nonnull and just
>> segfault when the argument is NULL.  Add nonnull attribute for them.
> 
> I think this patchset has a high risk of breaking application code,
> because "this function will promptly crash if passed a NULL pointer" is
> a very different property from "any code path that would cause this
> function to be passed a NULL pointer is necessarily unreachable."
> 
> If we take it at all -- and my current gut feeling is that we
> *shouldn't* -- we should do so early in a release cycle to give us the
> best chance of discovering broken applications before the release.

Thanks for your comment; it made me take a closer look at this.  I 
suppose it makes sense to push it in right after we tag 2.38 then, so 
that there's the rest of the year to test and fix broken applications 
before 2.39.  It may make sense to backport to the release branch for 
distributions if we find it to be stable enough.

Would it be more acceptable to you if this gets wrapped into fortify, 
i.e. it gets enabled if _FORTIFY_SOURCE is defined?  In fact, the 
wrappers in stdio2.h and the _chk variants of those functions should 
likely also get the __nonnull annotation.

Thanks,
Sid
  
Xi Ruoyao July 10, 2023, 6:41 p.m. UTC | #4
On Mon, 2023-07-10 at 13:51 -0400, Siddhesh Poyarekar wrote:
> On 2023-07-10 13:12, Zack Weinberg wrote:
> > On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
> > > During the review of a GCC analyzer test case, we found most stdio
> > > functions accepting a FILE * argument expect it to be nonnull and just
> > > segfault when the argument is NULL.  Add nonnull attribute for them.
> > 
> > I think this patchset has a high risk of breaking application code,
> > because "this function will promptly crash if passed a NULL pointer" is
> > a very different property from "any code path that would cause this
> > function to be passed a NULL pointer is necessarily unreachable."
> > 
> > If we take it at all -- and my current gut feeling is that we
> > *shouldn't* -- we should do so early in a release cycle to give us the
> > best chance of discovering broken applications before the release.
> 
> Thanks for your comment; it made me take a closer look at this.  I 
> suppose it makes sense to push it in right after we tag 2.38 then, so 
> that there's the rest of the year to test and fix broken applications 
> before 2.39.  It may make sense to backport to the release branch for 
> distributions if we find it to be stable enough.
> 
> Would it be more acceptable to you if this gets wrapped into fortify, 
> i.e. it gets enabled if _FORTIFY_SOURCE is defined?  In fact, the 
> wrappers in stdio2.h and the _chk variants of those functions should 
> likely also get the __nonnull annotation.

But it then means w/o -D_FORTIFY_SOURCE we'll still not see the warning,
and GCC analyzer developers will still implement a lot of special cases
:(.

Maybe we should have a weaker version of nonnull which only performs the
diagnostic, not the optimization.  But it looks like they hate the idea:
https://gcc.gnu.org/PR110617.
  
Zack Weinberg July 10, 2023, 6:56 p.m. UTC | #5
On Mon, Jul 10, 2023, at 1:51 PM, Siddhesh Poyarekar wrote:
> On 2023-07-10 13:12, Zack Weinberg wrote:
>> On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
>>> During the review of a GCC analyzer test case, we found most stdio
>>> functions accepting a FILE * argument expect it to be nonnull and
>>> just segfault when the argument is NULL.  Add nonnull attribute
>>> for them.
>>
>> I think this patchset has a high risk of breaking application code,
>> because "this function will promptly crash if passed a NULL pointer"
>> is a very different property from "any code path that would cause
>> this function to be passed a NULL pointer is necessarily
>> unreachable."
>>
>> If we take it at all -- and my current gut feeling is that we
>> *shouldn't* -- we should do so early in a release cycle to give us
>> the best chance of discovering broken applications before the
>> release.
>
> Thanks for your comment; it made me take a closer look at this.  I
> suppose it makes sense to push it in right after we tag 2.38 then, so
> that there's the rest of the year to test and fix broken applications
> before 2.39.

That would be fine with me.

> Would it be more acceptable to you if this gets wrapped into fortify,
> i.e. it gets enabled if _FORTIFY_SOURCE is defined?

I tend to agree with Xi that having the presence of __nonnull depend on
_FORTIFY_SOURCE would cause more problems than it solves.  Also, since
several Linux distributions enable _FORTIFY_SOURCE by default, we'd
still be risking significant breakage if we shipped that in 2.38.

> In fact, the wrappers in stdio2.h and the _chk variants of those
> functions should likely also get the __nonnull annotation.

Yes, divergence between the _chk variants and the unfortified variants
should be avoided as much as possible.

zw
  
Zack Weinberg July 10, 2023, 7:06 p.m. UTC | #6
On Mon, Jul 10, 2023, at 1:27 PM, Xi Ruoyao wrote:
> On Mon, 2023-07-10 at 13:12 -0400, Zack Weinberg wrote:
>> On Mon, Jul 10, 2023, at 12:13 PM, Xi Ruoyao via Libc-alpha wrote:
>> > During the review of a GCC analyzer test case, we found most stdio
>> > functions accepting a FILE * argument expect it to be nonnull and
>> > just segfault when the argument is NULL.  Add nonnull attribute for
>> > them.
>>
>> I think this patchset has a high risk of breaking application code,
>> because "this function will promptly crash if passed a NULL pointer"
>> is a very different property from "any code path that would cause
>> this function to be passed a NULL pointer is necessarily
>> unreachable."
>>
>> If we take it at all -- and my current gut feeling is that we
>> *shouldn't* -- we should do so early in a release cycle to give us
>> the best chance of discovering broken applications before the
>> release.
>
> If they want to rely on "it must crash if passed a NULL pointer", they
> should really use -fisolate-erroneous-paths-attribute.

The documentation for this option is not very clear to me, so I can't
tell if this is a sufficient workaround for the kind of breakage I've
seen in the past from over-zealous use of __attribute__((nonnull)).

In general, I think C compilers should _never_ assume that a construct
with UB is unreachable.  Converting constructs that are certain to cause
runtime UB into trap instructions _can_ be OK, but in the case where a
library function is provably called with arguments that cause UB, all
of the side effects of evaluating the function's arguments still need to
occur before the trap is executed.

zw
  
Siddhesh Poyarekar July 10, 2023, 7:31 p.m. UTC | #7
On 2023-07-10 14:56, Zack Weinberg wrote:
>> Would it be more acceptable to you if this gets wrapped into fortify,
>> i.e. it gets enabled if _FORTIFY_SOURCE is defined?
> 
> I tend to agree with Xi that having the presence of __nonnull depend on
> _FORTIFY_SOURCE would cause more problems than it solves.  Also, since
> several Linux distributions enable _FORTIFY_SOURCE by default, we'd
> still be risking significant breakage if we shipped that in 2.38.

I'm less concerned about the distribution breakage because they'll more 
likely than not get fixed; in fact my suggestion to put it behind the 
_FORTIFY_SOURCE wall was precisely for that reason.  I'd like us to weed 
out such cases in the distribution and get them fixed rather than 
maintaining status quo.  I'm relatively more concerned about 
non-distribution applications that tend to, e.g. disable security 
features because they see them as either performance hindrances or want 
some legacy broken code to just work.

Of course, I'm not concerned enough about these applications (sorry) to 
insist that it be put behind _FORTIFY_SOURCE, but I think it's a 
reasonable compromise.  That doesn't directly solve the analyzer problem 
though.  Maybe if it's OK to have the analyzer affect codegen, we could 
have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional 
diagnostics too, like the __wur that also gets enabled only on 
fortification.  Is that something worth considering?

Thanks,
Sid
  
Xi Ruoyao July 10, 2023, 7:31 p.m. UTC | #8
On Mon, 2023-07-10 at 15:06 -0400, Zack Weinberg wrote:
> The documentation for this option is not very clear to me, so I can't
> tell if this is a sufficient workaround for the kind of breakage I've
> seen in the past from over-zealous use of __attribute__((nonnull)).
> 
> In general, I think C compilers should _never_ assume that a construct
> with UB is unreachable.  Converting constructs that are certain to cause
> runtime UB into trap instructions _can_ be OK, but in the case where a
> library function is provably called with arguments that cause UB, all
> of the side effects of evaluating the function's arguments still need to
> occur before the trap is executed.

AFAIK recent GCC releases behave like we expect.  I've seen some GCC
commits making GCC not to optimize away side effects before the UB. 
And, for a simple test:

extern int f(int *x) __attribute__((nonnull(1), pure));
extern int *p;

int g()
{
  if (!p)
    puts("oops");
  int x = f(p);
  return x;
}

GCC 13.1 does not optimize the "oops" line away, as it's before the UB.
But if we move the oops line after calling f:

int g()
{
  int x = f(p);
  if (!p)
    puts("oops");
  return x;
}

Then GCC 13.1 optimizes the oops line away because it's after the UB,
and I think it's perfectly valid (if "f" is a Glibc function where NULL
is unacceptable we'll make it crash, and the oops line won't be printed
anyway).

However earlier GCC releases indeed optimize away the side effects
before an UB.  I'm not sure about "how early", but this nasty issue has
been documented in some famous articles about UB, and I remember in my
lecture about UB in 2019 I explicitly warned the students about it.
  
Xi Ruoyao July 10, 2023, 7:35 p.m. UTC | #9
On Mon, 2023-07-10 at 15:31 -0400, Siddhesh Poyarekar wrote:
> On 2023-07-10 14:56, Zack Weinberg wrote:
> > > Would it be more acceptable to you if this gets wrapped into fortify,
> > > i.e. it gets enabled if _FORTIFY_SOURCE is defined?
> > 
> > I tend to agree with Xi that having the presence of __nonnull depend on
> > _FORTIFY_SOURCE would cause more problems than it solves.  Also, since
> > several Linux distributions enable _FORTIFY_SOURCE by default, we'd
> > still be risking significant breakage if we shipped that in 2.38.
> 
> I'm less concerned about the distribution breakage because they'll more 
> likely than not get fixed; in fact my suggestion to put it behind the 
> _FORTIFY_SOURCE wall was precisely for that reason.  I'd like us to weed 
> out such cases in the distribution and get them fixed rather than 
> maintaining status quo.  I'm relatively more concerned about 
> non-distribution applications that tend to, e.g. disable security 
> features because they see them as either performance hindrances or want 
> some legacy broken code to just work.
> 
> Of course, I'm not concerned enough about these applications (sorry) to 
> insist that it be put behind _FORTIFY_SOURCE, but I think it's a 
> reasonable compromise.  That doesn't directly solve the analyzer problem 
> though.  Maybe if it's OK to have the analyzer affect codegen, we could 
> have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional 
> diagnostics too, like the __wur that also gets enabled only on 
> fortification.  Is that something worth considering?

Or can we just guard the __nonnull usage against __GNUC_PREREQ (x, 0)
where x is 12 or 13?  In the recent GCC releases the optimizer won't
kill a side effect before an UB so it should be much safer (see my reply
to Zack), and it's unlikely they'll use the latest GCC for some legacy
broken code.
  
Siddhesh Poyarekar July 10, 2023, 7:46 p.m. UTC | #10
On 2023-07-10 15:35, Xi Ruoyao wrote:
>> Of course, I'm not concerned enough about these applications (sorry) to
>> insist that it be put behind _FORTIFY_SOURCE, but I think it's a
>> reasonable compromise.  That doesn't directly solve the analyzer problem
>> though.  Maybe if it's OK to have the analyzer affect codegen, we could
>> have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional
>> diagnostics too, like the __wur that also gets enabled only on
>> fortification.  Is that something worth considering?
> 
> Or can we just guard the __nonnull usage against __GNUC_PREREQ (x, 0)
> where x is 12 or 13?  In the recent GCC releases the optimizer won't
> kill a side effect before an UB so it should be much safer (see my reply
> to Zack), and it's unlikely they'll use the latest GCC for some legacy
> broken code.

It depends on whether that's a deliberate design decision in gcc (and we 
should probably look at what clang does too) or if it just happens to be 
so today because of some IR layout coincidence.  Then there are 
compilers that pretend[1] to be gcc or clang but don't behave anything 
like them.

Thanks,
Sid

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=30621
  
Alejandro Colomar July 10, 2023, 8:14 p.m. UTC | #11
[CC += Andrew]

Hi Xi, Andrew,

On 7/10/23 20:41, Xi Ruoyao wrote:
> Maybe we should have a weaker version of nonnull which only performs the
> diagnostic, not the optimization.  But it looks like they hate the idea:
> https://gcc.gnu.org/PR110617.
> 
This is the one thing that makes me use both Clang and GCC to compile,
because while any of them would be enough to build, I want as much
static analysis as I can get, and so I want -fanalyzer (so I need GCC),
but I also use _Nullable (so I need Clang).

If GCC had support for _Nullable, I would have in GCC the superset of
features that I need from both in a single vendor.  Moreover, Clang's
static analyzer is brain-damaged (sorry, but it doesn't have a simple
command line to run it, contrary to GCC's easy -fanalyzer), so having
GCC's analyzer get over those _Nullable qualifiers would be great.

Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
mode, as there are many cases where the compiler doesn't have enough
information, and the analyzer can get rid of false negatives and
positives.  See: <https://github.com/llvm/llvm-project/issues/57546>

I'll back the ask for the qualifiers in GCC, for compatibility with
Clang.

Thanks,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
  
Alejandro Colomar July 10, 2023, 8:16 p.m. UTC | #12
On 7/10/23 22:14, Alejandro Colomar wrote:
> [CC += Andrew]
> 
> Hi Xi, Andrew,
> 
> On 7/10/23 20:41, Xi Ruoyao wrote:
>> Maybe we should have a weaker version of nonnull which only performs the
>> diagnostic, not the optimization.  But it looks like they hate the idea:
>> https://gcc.gnu.org/PR110617.
>>
> This is the one thing that makes me use both Clang and GCC to compile,
> because while any of them would be enough to build, I want as much
> static analysis as I can get, and so I want -fanalyzer (so I need GCC),
> but I also use _Nullable (so I need Clang).
> 
> If GCC had support for _Nullable, I would have in GCC the superset of
> features that I need from both in a single vendor.  Moreover, Clang's
> static analyzer is brain-damaged (sorry, but it doesn't have a simple
> command line to run it, contrary to GCC's easy -fanalyzer), so having
> GCC's analyzer get over those _Nullable qualifiers would be great.
> 
> Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
> mode, as there are many cases where the compiler doesn't have enough
> information, and the analyzer can get rid of false negatives and
> positives.  See: <https://github.com/llvm/llvm-project/issues/57546>
> 
> I'll back the ask for the qualifiers in GCC, for compatibility with
> Clang.

BTW, Bionic libc is adding those qualifiers:

<https://android-review.googlesource.com/c/platform/bionic/+/2552567/7/libc/include/netinet/ether.h#45>
<https://android-review.googlesource.com/q/owner:zijunzhao@google.com+Nullability>

> 
> Thanks,
> Alex
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
  
Xi Ruoyao July 10, 2023, 8:23 p.m. UTC | #13
On Mon, 2023-07-10 at 15:46 -0400, Siddhesh Poyarekar wrote:
> On 2023-07-10 15:35, Xi Ruoyao wrote:
> > > Of course, I'm not concerned enough about these applications (sorry) to
> > > insist that it be put behind _FORTIFY_SOURCE, but I think it's a
> > > reasonable compromise.  That doesn't directly solve the analyzer problem
> > > though.  Maybe if it's OK to have the analyzer affect codegen, we could
> > > have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional
> > > diagnostics too, like the __wur that also gets enabled only on
> > > fortification.  Is that something worth considering?
> > 
> > Or can we just guard the __nonnull usage against __GNUC_PREREQ (x, 0)
> > where x is 12 or 13?  In the recent GCC releases the optimizer won't
> > kill a side effect before an UB so it should be much safer (see my reply
> > to Zack), and it's unlikely they'll use the latest GCC for some legacy
> > broken code.
> 
> It depends on whether that's a deliberate design decision in gcc (and we 
> should probably look at what clang does too) or if it just happens to be 
> so today because of some IR layout coincidence.

Allow me trying to get some clarification...  Jeff has said

"I thought during the introduction of erroneous path isolation that we
concluded stores, calls and such had observable side effects that must
be preserved, even when we hit a block that leads to
__builtin_unreachable."

during the reviewing of some GCC patch.  So is the "must be preserved"
statement applies generally, or only for -fisolate-erroneous-paths-*?

> Then there are 
> compilers that pretend[1] to be gcc or clang but don't behave anything
> like them.
> 
> Thanks,
> Sid
> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=30621
  
Jeff Law July 10, 2023, 8:33 p.m. UTC | #14
On 7/10/23 14:23, Xi Ruoyao wrote:
> On Mon, 2023-07-10 at 15:46 -0400, Siddhesh Poyarekar wrote:
>> On 2023-07-10 15:35, Xi Ruoyao wrote:
>>>> Of course, I'm not concerned enough about these applications (sorry) to
>>>> insist that it be put behind _FORTIFY_SOURCE, but I think it's a
>>>> reasonable compromise.  That doesn't directly solve the analyzer problem
>>>> though.  Maybe if it's OK to have the analyzer affect codegen, we could
>>>> have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional
>>>> diagnostics too, like the __wur that also gets enabled only on
>>>> fortification.  Is that something worth considering?
>>>
>>> Or can we just guard the __nonnull usage against __GNUC_PREREQ (x, 0)
>>> where x is 12 or 13?  In the recent GCC releases the optimizer won't
>>> kill a side effect before an UB so it should be much safer (see my reply
>>> to Zack), and it's unlikely they'll use the latest GCC for some legacy
>>> broken code.
>>
>> It depends on whether that's a deliberate design decision in gcc (and we
>> should probably look at what clang does too) or if it just happens to be
>> so today because of some IR layout coincidence.
> 
> Allow me trying to get some clarification...  Jeff has said
> 
> "I thought during the introduction of erroneous path isolation that we
> concluded stores, calls and such had observable side effects that must
> be preserved, even when we hit a block that leads to
> __builtin_unreachable."
> 
> during the reviewing of some GCC patch.  So is the "must be preserved"
> statement applies generally, or only for -fisolate-erroneous-paths-*?
It applies in general.

Essentially up to the point where the UB happens we have to preserve 
visible side effects.  After the point where UB happens anything goes 
and our goal has been mark the paths through the CFG as dying at that 
point and forcing an immediate halt of the program (via __buitin_trap()).

There this all gets fuzzy is something like the NULL pointer property 
where the fact that a pointer must be non-null can backward propagate. 
ie, if a parameter is marked as non-null, then we will mark the 
corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
some comparison of the SSA_NAME against NULL (perhaps well before the 
call), we'll optimize away that comparison.

Jeff
  
Xi Ruoyao July 10, 2023, 8:44 p.m. UTC | #15
On Mon, 2023-07-10 at 14:33 -0600, Jeff Law wrote:
> 
> 
> On 7/10/23 14:23, Xi Ruoyao wrote:
> > On Mon, 2023-07-10 at 15:46 -0400, Siddhesh Poyarekar wrote:
> > > On 2023-07-10 15:35, Xi Ruoyao wrote:
> > > > > Of course, I'm not concerned enough about these applications (sorry) to
> > > > > insist that it be put behind _FORTIFY_SOURCE, but I think it's a
> > > > > reasonable compromise.  That doesn't directly solve the analyzer problem
> > > > > though.  Maybe if it's OK to have the analyzer affect codegen, we could
> > > > > have the analyzer define _FORTIFY_SOURCE=3 and thus enable additional
> > > > > diagnostics too, like the __wur that also gets enabled only on
> > > > > fortification.  Is that something worth considering?
> > > > 
> > > > Or can we just guard the __nonnull usage against __GNUC_PREREQ (x, 0)
> > > > where x is 12 or 13?  In the recent GCC releases the optimizer won't
> > > > kill a side effect before an UB so it should be much safer (see my reply
> > > > to Zack), and it's unlikely they'll use the latest GCC for some legacy
> > > > broken code.
> > > 
> > > It depends on whether that's a deliberate design decision in gcc (and we
> > > should probably look at what clang does too) or if it just happens to be
> > > so today because of some IR layout coincidence.
> > 
> > Allow me trying to get some clarification...  Jeff has said
> > 
> > "I thought during the introduction of erroneous path isolation that we
> > concluded stores, calls and such had observable side effects that must
> > be preserved, even when we hit a block that leads to
> > __builtin_unreachable."
> > 
> > during the reviewing of some GCC patch.  So is the "must be preserved"
> > statement applies generally, or only for -fisolate-erroneous-paths-*?
> It applies in general.
> 
> Essentially up to the point where the UB happens we have to preserve 
> visible side effects.  After the point where UB happens anything goes 
> and our goal has been mark the paths through the CFG as dying at that 
> point and forcing an immediate halt of the program (via __buitin_trap()).
> 
> There this all gets fuzzy is something like the NULL pointer property 
> where the fact that a pointer must be non-null can backward propagate.
> ie, if a parameter is marked as non-null, then we will mark the 
> corresponding SSA_NAME in the compiler as non-null.  Thus if there was
> some comparison of the SSA_NAME against NULL (perhaps well before the 
> call), we'll optimize away that comparison.

So, for the SSA:

int g ()
{
  int x;
  int * local_p;
  int _6; 

  <bb 2> :
  local_p_3 = p;
  if (local_p_3 == 0B) 
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  puts ("oops");

  <bb 4> :
  x_5 = f (local_p_3);
  _6 = x_5;
  return _6; 

}

the bb 2 and 3 still may be optimized away (though it currently does not
happen)?
  
Zack Weinberg July 10, 2023, 8:55 p.m. UTC | #16
On Mon, Jul 10, 2023, at 4:33 PM, Jeff Law via Libc-alpha wrote:
> Essentially up to the point where the UB happens we have to preserve 
> visible side effects.  After the point where UB happens anything goes 
> and our goal has been mark the paths through the CFG as dying at that 
> point and forcing an immediate halt of the program (via __buitin_trap()).
>
> There this all gets fuzzy is something like the NULL pointer property 
> where the fact that a pointer must be non-null can backward propagate. 
> ie, if a parameter is marked as non-null, then we will mark the 
> corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
> some comparison of the SSA_NAME against NULL (perhaps well before the 
> call), we'll optimize away that comparison.

Yep, see, that in and of itself is dangerous.

The bright line I would draw is: optimizations based on the assumption that control cannot proceed past the point where UB occurs are OK; optimizations based on the assumption that control cannot *reach* the point where UB occurs are *not* OK.  Removing a comparison to NULL, based on the observation that *later in some execution trace* the program will definitely dereference that pointer, falls in the latter category, *even if* there are no externally visible side effects in between the two points.

zw
  
Xi Ruoyao July 10, 2023, 9:03 p.m. UTC | #17
On Mon, 2023-07-10 at 16:55 -0400, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 4:33 PM, Jeff Law via Libc-alpha wrote:
> > Essentially up to the point where the UB happens we have to preserve
> > visible side effects.  After the point where UB happens anything goes 
> > and our goal has been mark the paths through the CFG as dying at that 
> > point and forcing an immediate halt of the program (via __buitin_trap()).
> > 
> > There this all gets fuzzy is something like the NULL pointer property 
> > where the fact that a pointer must be non-null can backward propagate. 
> > ie, if a parameter is marked as non-null, then we will mark the 
> > corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
> > some comparison of the SSA_NAME against NULL (perhaps well before the 
> > call), we'll optimize away that comparison.
> 
> Yep, see, that in and of itself is dangerous.
> 
> The bright line I would draw is: optimizations based on the assumption
> that control cannot proceed past the point where UB occurs are OK;
> optimizations based on the assumption that control cannot *reach* the
> point where UB occurs are *not* OK.  Removing a comparison to NULL,
> based on the observation that *later in some execution trace* the
> program will definitely dereference that pointer, falls in the latter
> category, *even if* there are no externally visible side effects in
> between the two points.

Why it's not OK if no externally visible side effects is between the two
points?

So what should we do now?  Remove all __nonnull from the headers?  Or
"some __nonnull are OK but my is not"?!
  
Zack Weinberg July 10, 2023, 9:22 p.m. UTC | #18
On Mon, Jul 10, 2023, at 5:03 PM, Xi Ruoyao via Libc-alpha wrote:
> On Mon, 2023-07-10 at 16:55 -0400, Zack Weinberg wrote:
>> On Mon, Jul 10, 2023, at 4:33 PM, Jeff Law via Libc-alpha wrote:
>> > Essentially up to the point where the UB happens we have to preserve
>> > visible side effects.  After the point where UB happens anything goes 
>> > and our goal has been mark the paths through the CFG as dying at that 
>> > point and forcing an immediate halt of the program (via __buitin_trap()).
>> > 
>> > There this all gets fuzzy is something like the NULL pointer property 
>> > where the fact that a pointer must be non-null can backward propagate. 
>> > ie, if a parameter is marked as non-null, then we will mark the 
>> > corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
>> > some comparison of the SSA_NAME against NULL (perhaps well before the 
>> > call), we'll optimize away that comparison.
>> 
>> Yep, see, that in and of itself is dangerous.
>> 
>> The bright line I would draw is: optimizations based on the assumption
>> that control cannot proceed past the point where UB occurs are OK;
>> optimizations based on the assumption that control cannot *reach* the
>> point where UB occurs are *not* OK.  Removing a comparison to NULL,
>> based on the observation that *later in some execution trace* the
>> program will definitely dereference that pointer, falls in the latter
>> category, *even if* there are no externally visible side effects in
>> between the two points.
>
> Why it's not OK if no externally visible side effects is between the two
> points?

The short answer is that the C standard's definition of "externally visible side effect" is insufficient for the needs of certain security-critical programs. In particular, *how long it takes for the program to crash* may be an exploitable timing channel. (Look up "bomb oracle attack" in the cryptography literature if that seems impossible.)

> So what should we do now?  Remove all __nonnull from the headers?

That would be my choice. Or, leave them in as documentation, but make the macro expand to nothing except with specific compilers that are known not to optimize unsafely (this may be an empty set at present)

> Or "some __nonnull are OK but my is not"?!

I don't like that we have __nonnull at all. But adding it to stdio.h functions in particular is especially dangerous because of how widely used they are. I would say the same thing if you  were adding it to string.h or stdlib.h.

zw
  
Xi Ruoyao July 10, 2023, 9:33 p.m. UTC | #19
On Mon, 2023-07-10 at 17:22 -0400, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 5:03 PM, Xi Ruoyao via Libc-alpha wrote:
> > On Mon, 2023-07-10 at 16:55 -0400, Zack Weinberg wrote:
> > > On Mon, Jul 10, 2023, at 4:33 PM, Jeff Law via Libc-alpha wrote:
> > > > Essentially up to the point where the UB happens we have to
> > > > preserve
> > > > visible side effects.  After the point where UB happens anything
> > > > goes 
> > > > and our goal has been mark the paths through the CFG as dying at
> > > > that 
> > > > point and forcing an immediate halt of the program (via
> > > > __buitin_trap()).
> > > > 
> > > > There this all gets fuzzy is something like the NULL pointer
> > > > property 
> > > > where the fact that a pointer must be non-null can backward
> > > > propagate. 
> > > > ie, if a parameter is marked as non-null, then we will mark the 
> > > > corresponding SSA_NAME in the compiler as non-null.  Thus if
> > > > there was 
> > > > some comparison of the SSA_NAME against NULL (perhaps well
> > > > before the 
> > > > call), we'll optimize away that comparison.
> > > 
> > > Yep, see, that in and of itself is dangerous.
> > > 
> > > The bright line I would draw is: optimizations based on the
> > > assumption
> > > that control cannot proceed past the point where UB occurs are OK;
> > > optimizations based on the assumption that control cannot *reach*
> > > the
> > > point where UB occurs are *not* OK.  Removing a comparison to
> > > NULL,
> > > based on the observation that *later in some execution trace* the
> > > program will definitely dereference that pointer, falls in the
> > > latter
> > > category, *even if* there are no externally visible side effects
> > > in
> > > between the two points.
> > 
> > Why it's not OK if no externally visible side effects is between the
> > two
> > points?
> 
> The short answer is that the C standard's definition of "externally
> visible side effect" is insufficient for the needs of certain
> security-critical programs. In particular, *how long it takes for the
> program to crash* may be an exploitable timing channel. (Look up "bomb
> oracle attack" in the cryptography literature if that seems
> impossible.)
> 
> > So what should we do now?  Remove all __nonnull from the headers?
> 
> That would be my choice. Or, leave them in as documentation, but make
> the macro expand to nothing except with specific compilers that are
> known not to optimize unsafely (this may be an empty set at present)
> 
> > Or "some __nonnull are OK but my is not"?!
> 
> I don't like that we have __nonnull at all. But adding it to stdio.h
> functions in particular is especially dangerous because of how widely
> used they are. I would say the same thing if you  were adding it to
> string.h or stdlib.h.

They *are* already in string.h and stdlib.h.
  
Jeff Law July 10, 2023, 9:51 p.m. UTC | #20
On 7/10/23 14:55, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 4:33 PM, Jeff Law via Libc-alpha wrote:
>> Essentially up to the point where the UB happens we have to preserve
>> visible side effects.  After the point where UB happens anything goes
>> and our goal has been mark the paths through the CFG as dying at that
>> point and forcing an immediate halt of the program (via __buitin_trap()).
>>
>> There this all gets fuzzy is something like the NULL pointer property
>> where the fact that a pointer must be non-null can backward propagate.
>> ie, if a parameter is marked as non-null, then we will mark the
>> corresponding SSA_NAME in the compiler as non-null.  Thus if there was
>> some comparison of the SSA_NAME against NULL (perhaps well before the
>> call), we'll optimize away that comparison.
> 
> Yep, see, that in and of itself is dangerous.
> 
> The bright line I would draw is: optimizations based on the assumption that control cannot proceed past the point where UB occurs are OK; optimizations based on the assumption that control cannot *reach* the point where UB occurs are *not* OK.  Removing a comparison to NULL, based on the observation that *later in some execution trace* the program will definitely dereference that pointer, falls in the latter category, *even if* there are no externally visible side effects in between the two points.
I'd tend to agree these days and I think you've captured the issue 
pretty well.  And I suspect that probably contradicts statements I've 
made in the past in this space.   Time and experience have caused my 
position to evolve.

Jeff
  
Paul Eggert July 10, 2023, 10:09 p.m. UTC | #21
On 2023-07-10 14:22, Zack Weinberg via Libc-alpha wrote:

>> Why it's not OK if no externally visible side effects is between the two
>> points?
> 
> The short answer is that the C standard's definition of "externally visible side effect" is insufficient for the needs of certain security-critical programs. In particular, *how long it takes for the program to crash* may be an exploitable timing channel. (Look up "bomb oracle attack" in the cryptography literature if that seems impossible.)

I thought bomb oracles by definition explode when they don't give an 
answer. Exploding is different from undefined behavior. (Or perhaps you 
meant "padding oracle attack"? or something else? I'm a bit lost here.)

As for timing channels - doesn't every optimization affect timing 
channels? What's special about this one?
  
Siddhesh Poyarekar July 10, 2023, 10:34 p.m. UTC | #22
On 2023-07-10 16:33, Jeff Law wrote:
> Essentially up to the point where the UB happens we have to preserve 
> visible side effects.  After the point where UB happens anything goes 
> and our goal has been mark the paths through the CFG as dying at that 
> point and forcing an immediate halt of the program (via __buitin_trap()).
> 
> There this all gets fuzzy is something like the NULL pointer property 
> where the fact that a pointer must be non-null can backward propagate. 
> ie, if a parameter is marked as non-null, then we will mark the 
> corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
> some comparison of the SSA_NAME against NULL (perhaps well before the 
> call), we'll optimize away that comparison.

Is this still the case though?  In this, for example, the check doesn't 
seem to get optimized away at any optimization level:

extern void do_foo (void);
extern void bar (const char *in) __attribute__ ((nonnull(1)));

int foo (const char *buf)
{
	if (buf == (void *) 0)
		do_foo ();

	bar (buf);
	return 0;
}

In fact, I would have expected the == NULL branch gone in the above case 
(which is buggy code after all and I reckon this is the situation Zack 
is worried about) and retained here:

extern void do_foo (void) __attribute__ ((noreturn));
extern void bar (const char *in) __attribute__ ((nonnull(1)));

int foo (const char *buf)
{
	if (buf == (void *) 0)
		do_foo ();

	bar (buf);
	return 0;
}

but the call gets retained in both cases.  It only seems to be getting 
eliminated when the NULL check is after the call to bar(), which should 
always be a correct optimization.

Sid
  
Siddhesh Poyarekar July 10, 2023, 10:48 p.m. UTC | #23
On 2023-07-10 17:22, Zack Weinberg wrote:
>> Why it's not OK if no externally visible side effects is between the two
>> points?
> 
> The short answer is that the C standard's definition of "externally visible side effect" is insufficient for the needs of certain security-critical programs. In particular, *how long it takes for the program to crash* may be an exploitable timing channel. (Look up "bomb oracle attack" in the cryptography literature if that seems impossible.)
> 

Those kinds of security-critical applications (also the kinds that need 
time-invariant copy routines) shouldn't be the main target for the 
library, although it would be a good idea to have special switches to 
enable code for them.

>> So what should we do now?  Remove all __nonnull from the headers?
> 
> That would be my choice. Or, leave them in as documentation, but make the macro expand to nothing except with specific compilers that are known not to optimize unsafely (this may be an empty set at present)
> 
>> Or "some __nonnull are OK but my is not"?!
> 
> I don't like that we have __nonnull at all. But adding it to stdio.h functions in particular is especially dangerous because of how widely used they are. I would say the same thing if you  were adding it to string.h or stdlib.h.

It's already very widely deployed in glibc, I doubt if adding to stdio.h 
would make it significantly worse.

Sid
  
Jeff Law July 10, 2023, 10:59 p.m. UTC | #24
On 7/10/23 16:34, Siddhesh Poyarekar wrote:
> On 2023-07-10 16:33, Jeff Law wrote:
>> Essentially up to the point where the UB happens we have to preserve 
>> visible side effects.  After the point where UB happens anything goes 
>> and our goal has been mark the paths through the CFG as dying at that 
>> point and forcing an immediate halt of the program (via __buitin_trap()).
>>
>> There this all gets fuzzy is something like the NULL pointer property 
>> where the fact that a pointer must be non-null can backward propagate. 
>> ie, if a parameter is marked as non-null, then we will mark the 
>> corresponding SSA_NAME in the compiler as non-null.  Thus if there was 
>> some comparison of the SSA_NAME against NULL (perhaps well before the 
>> call), we'll optimize away that comparison.
> 
> Is this still the case though?  In this, for example, the check doesn't 
> seem to get optimized away at any optimization level:
> 
> extern void do_foo (void);
> extern void bar (const char *in) __attribute__ ((nonnull(1)));
> 
> int foo (const char *buf)
> {
>      if (buf == (void *) 0)
>          do_foo ();
> 
>      bar (buf);
>      return 0;
> }
> 
> In fact, I would have expected the == NULL branch gone in the above case 
> (which is buggy code after all and I reckon this is the situation Zack 
> is worried about) and retained here:
> 
> extern void do_foo (void) __attribute__ ((noreturn));
> extern void bar (const char *in) __attribute__ ((nonnull(1)));
> 
> int foo (const char *buf)
> {
>      if (buf == (void *) 0)
>          do_foo ();
> 
>      bar (buf);
>      return 0;
> }
> 
> but the call gets retained in both cases.  It only seems to be getting 
> eliminated when the NULL check is after the call to bar(), which should 
> always be a correct optimization.
It's possible this got adjusted over time, particularly with Aldy & 
Andrew's work on ranger.  Poking around there a bit I see the forward 
propagating effect from the attribute on PARM_DECLs for incoming 
arguments and on return values at call sites.  What I don't see anymore 
is setting the global nonnull property on arbitrary call arguments 
marked as non-null.

Jeff
  
Sam James July 11, 2023, 12:45 a.m. UTC | #25
Siddhesh Poyarekar <siddhesh@gotplt.org> writes:

> On 2023-07-10 17:22, Zack Weinberg wrote:
>>> Why it's not OK if no externally visible side effects is between the two
>>> points?
>> The short answer is that the C standard's definition of "externally
>> visible side effect" is insufficient for the needs of certain
>> security-critical programs. In particular, *how long it takes for
>> the program to crash* may be an exploitable timing channel. (Look up
>> "bomb oracle attack" in the cryptography literature if that seems
>> impossible.)
>> 
>
> Those kinds of security-critical applications (also the kinds that
> need time-invariant copy routines) shouldn't be the main target for
> the library, although it would be a good idea to have special switches
> to enable code for them.
>
>>> So what should we do now?  Remove all __nonnull from the headers?
>> That would be my choice. Or, leave them in as documentation, but
>> make the macro expand to nothing except with specific compilers that
>> are known not to optimize unsafely (this may be an empty set at
>> present)

I've CC'd all the participants in this thread on the GCC bug that
Xi Ruoyao filed, but some of these attributes have (IIRC) been around
for decades (2004!):
* https://github.com/madler/pigz/issues/107#issuecomment-1368304437
* https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=be27d08c05911a658949ba7b84f4321a65a2dbf4

I don't like making this argument if something is unsound (because if
it's unsound, then it always has been), but this is purely about working
around buggy applications anyway - and clearly nobody's been bothered by
this in the last 20 years, at least enough to report a bug?

>> 
>>> Or "some __nonnull are OK but my is not"?!
>> I don't like that we have __nonnull at all. But adding it to stdio.h
>> functions in particular is especially dangerous because of how
>> widely used they are. I would say the same thing if you  were adding
>> it to string.h or stdlib.h.
>
> It's already very widely deployed in glibc, I doubt if adding to
> stdio.h would make it significantly worse.

Right.

And the attribute is also going to be *significantly* nerfed if
we drop it globally from our headers in glibc, as nobody gets
warnings for stdlib functions anymore.
  
Sam James July 11, 2023, 12:51 a.m. UTC | #26
Siddhesh Poyarekar <siddhesh@gotplt.org> writes:

> On 2023-07-10 14:56, Zack Weinberg wrote:
>>> Would it be more acceptable to you if this gets wrapped into fortify,
>>> i.e. it gets enabled if _FORTIFY_SOURCE is defined?
>> I tend to agree with Xi that having the presence of __nonnull depend
>> on
>> _FORTIFY_SOURCE would cause more problems than it solves.  Also, since
>> several Linux distributions enable _FORTIFY_SOURCE by default, we'd
>> still be risking significant breakage if we shipped that in 2.38.
>
> I'm less concerned about the distribution breakage because they'll
> more likely than not get fixed; in fact my suggestion to put it behind
> the _FORTIFY_SOURCE wall was precisely for that reason.  I'd like us
> to weed out such cases in the distribution and get them fixed rather
> than maintaining status quo.  I'm relatively more concerned about
> non-distribution applications that tend to, e.g. disable security
> features because they see them as either performance hindrances or
> want some legacy broken code to just work.
>
> Of course, I'm not concerned enough about these applications (sorry)
> to insist that it be put behind _FORTIFY_SOURCE, but I think it's a
> reasonable compromise.  That doesn't directly solve the analyzer
> problem though.  Maybe if it's OK to have the analyzer affect codegen,
> we could have the analyzer define _FORTIFY_SOURCE=3 and thus enable
> additional diagnostics too, like the __wur that also gets enabled only
> on fortification.  Is that something worth considering?

I like this as a compromise, at least for a temporary gate for us to
feel more confident about a big-bang switch to the remaining missing
annotations in headers, but - like you - I'm not convinced it's necessary.
  
Cristian Rodríguez July 11, 2023, 1:03 p.m. UTC | #27
On Mon, Jul 10, 2023 at 5:51 PM Jeff Law via Libc-alpha <
libc-alpha@sourceware.org> wrote:

>
> >
> > The bright line I would draw is: optimizations based on the assumption
> that control cannot proceed past the point where UB occurs are OK;
> optimizations based on the assumption that control cannot *reach* the point
> where UB occurs are *not* OK.  Removing a comparison to NULL, based on the
> observation that *later in some execution trace* the program will
> definitely dereference that pointer, falls in the latter category, *even
> if* there are no externally visible side effects in between the two points.
> I'd tend to agree these days and I think you've captured the issue
> pretty well.  And I suspect that probably contradicts statements I've
> made in the past in this space.   Time and experience have caused my
> position to evolve.
>


A single compiler flag disabling all this optimizations is needed. maybe
something like -fno-ub-abuse that sets -fno-strict-overflow
-fno-delete-null-pointer-checks -fother-non-obvious.flag-im-missing
or an -Osafer optimization level that makes sure that rules apply.
  
Zack Weinberg July 11, 2023, 7:12 p.m. UTC | #28
On Mon, Jul 10, 2023, at 5:33 PM, Xi Ruoyao via Libc-alpha wrote:
> On Mon, 2023-07-10 at 17:22 -0400, Zack Weinberg wrote:
>> I don't like that we have __nonnull at all. But adding it to stdio.h
>> functions in particular is especially dangerous because of how widely
>> used they are. I would say the same thing if you  were adding it to
>> string.h or stdlib.h.
>
> They *are* already in string.h and stdlib.h.

I've realized that yesterday I was trying to say two different things at
the same time and I probably confused all of you.  My apologies.  Let me
attempt to clarify:

I raised an objection to Xi's patches (adding non-null annotations to
stdio.h in particular) specifically because of the timing.  Having
thought about it some more, I'm fine with having those annotations, but
I think it is unwise to *add* them to such a widely-used, standardized
header right before a release freeze.  Instead, I'd like to see Xi's
patches land on trunk almost immediately after the 2.38 release branch
is created, so that we have the maximum amount of time to find any
problems before 2.39.

Independent of that, I have some concerns about C compilers, *in
general*, drawing inferences about what the program may or may not do,
based on the presence of a construct with runtime undefined behavior on
particular control flow paths.  I brought these up to explain why I
think Xi's patches are too risky for this stage of the glibc development
cycle, but I wasn't intending to suggest that glibc needed to take
existing non-null annotations out of its headers.  (I did say "I don't
like that we have __nonnull at all" but, given that we *do* have it, we
ought to have heard about it by now if the existing uses were causing
problems.)  I also wasn't trying to start an argument about those
inferences, and perhaps that conversation should move to one of GCC's
mailing lists.

zw
  
Zack Weinberg July 11, 2023, 7:18 p.m. UTC | #29
[cc list pruned for this tangent]

On Mon, Jul 10, 2023, at 6:09 PM, Paul Eggert wrote:
> On 2023-07-10 14:22, Zack Weinberg via Libc-alpha wrote:
>
>>> Why it's not OK if no externally visible side effects is between the
>>> two points?
>>
>> The short answer is that the C standard's definition of "externally
>> visible side effect" is insufficient for the needs of certain security-
>> critical programs. In particular, *how long it takes for the program
>> to crash* may be an exploitable timing channel. (Look up "bomb oracle
>> attack" in the cryptography literature if that seems impossible.)
>
> I thought bomb oracles by definition explode when they don't give
> an answer. Exploding is different from undefined behavior. (Or
> perhaps you meant "padding oracle attack"? or something else? I'm a
> bit lost here.)

An adversary doesn't typically care *why* the program crashes, only that
they can extract information from whether and when it crashed.  The
scenario I'm imagining is something like

if (pointer) {
   use(pointer);
}
expensive_computation();
use(pointer);

Making the initial 'if' unconditional means that the program will crash
before the expensive computation.  In this case that would remove a timing
channel, but you can see how it could just as easily be the other way around.

> As for timing channels - doesn't every optimization affect timing
> channels? What's special about this one?

Because making pointer dereferences unconditional (or conditional) is
the kind of optimization that can turn code that was *supposed* to
execute in constant time into code that doesn't.

zw
  
Siddhesh Poyarekar July 11, 2023, 8:12 p.m. UTC | #30
On 2023-07-11 15:12, Zack Weinberg wrote:
> On Mon, Jul 10, 2023, at 5:33 PM, Xi Ruoyao via Libc-alpha wrote:
>> On Mon, 2023-07-10 at 17:22 -0400, Zack Weinberg wrote:
>>> I don't like that we have __nonnull at all. But adding it to stdio.h
>>> functions in particular is especially dangerous because of how widely
>>> used they are. I would say the same thing if you  were adding it to
>>> string.h or stdlib.h.
>>
>> They *are* already in string.h and stdlib.h.
> 
> I've realized that yesterday I was trying to say two different things at
> the same time and I probably confused all of you.  My apologies.  Let me
> attempt to clarify:

Thank you for writing this Zack, this is really helpful in moving the 
conversation forward.

> I raised an objection to Xi's patches (adding non-null annotations to
> stdio.h in particular) specifically because of the timing.  Having
> thought about it some more, I'm fine with having those annotations, but
> I think it is unwise to *add* them to such a widely-used, standardized
> header right before a release freeze.  Instead, I'd like to see Xi's
> patches land on trunk almost immediately after the 2.38 release branch
> is created, so that we have the maximum amount of time to find any
> problems before 2.39.

Xi, if you're still interested, can you please post a v6 with 
annotations to the _chk variants of the functions and in stdio2.h as well?

Thanks,
Sid
  
Jeff Law July 11, 2023, 8:45 p.m. UTC | #31
On 7/11/23 13:18, Zack Weinberg via Libc-alpha wrote:

> 
>> As for timing channels - doesn't every optimization affect timing
>> channels? What's special about this one?
> 
> Because making pointer dereferences unconditional (or conditional) is
> the kind of optimization that can turn code that was *supposed* to
> execute in constant time into code that doesn't.
Which would imply that optimizations like sinking into more control 
dependent paths is undesirable, at least for code which wants to harden 
itself against timing attacks.

We don't do a ton of that in GCC, but there is certainly a pass which 
will try to adjust the placement of instructions so that they're placed 
at the shallowest loop nest, but deepest control depth while still 
dominating all the uses.  tree-ssa-sink.cc.

For code which cares about timing attacks, you probably want to turn 
that off :-)



jeff
  
Paul Eggert July 11, 2023, 11:59 p.m. UTC | #32
On 7/11/23 13:45, Jeff Law wrote:
> On 7/11/23 13:18, Zack Weinberg via Libc-alpha wrote: >> making pointer dereferences unconditional (or conditional) is
>> the kind of optimization that can turn code that was *supposed* to
>> execute in constant time into code that doesn't.
...
> Which would imply that optimizations like sinking into more control 
> dependent paths is undesirable, at least for code which wants to harden 
> itself against timing attacks.
> 
> We don't do a ton of that in GCC...

It shouldn't matter how much GCC does, as timing and power-channel 
attacks are impractical to prevent at the source code level. One needs 
to look at the machine code, one needs to know what specific machine the 
code is running on, and it's so hard to do this correctly that it 
continues to be an active area of research. See, for example, the 
Chroniton work done at MIT and presented last month at PLARCH 2023.[1] 
You don't need to read the whole paper, just its summary of the problem 
and the technology they're using to address it.

This sort of thing is of course way above glibc's pay grade and 
generally speaking we needn't worry about it now. We can wait to see 
whether the research ever becomes practical. (Along with the other 
research presented in that same conference, such as the paper proposing 
to use ChatGPT to reimplement malloc more securely. Yup, we're being 
replaced. :-)

[1]: Athalye A, Kaashoek MF, Zeldovich N, Tassarotti J. Leakage models 
are a leaky abstraction: the case for cycle-level verification of 
constant-time cryptography. PLARCH 2023. 
https://vm-web.pdos.csail.mit.edu/papers/chroniton:plarch23.pdf
  
Jeff Law July 12, 2023, 2:40 a.m. UTC | #33
On 7/11/23 17:59, Paul Eggert wrote:
> On 7/11/23 13:45, Jeff Law wrote:
>> On 7/11/23 13:18, Zack Weinberg via Libc-alpha wrote: >> making 
>> pointer dereferences unconditional (or conditional) is
>>> the kind of optimization that can turn code that was *supposed* to
>>> execute in constant time into code that doesn't.
> ...
>> Which would imply that optimizations like sinking into more control 
>> dependent paths is undesirable, at least for code which wants to 
>> harden itself against timing attacks.
>>
>> We don't do a ton of that in GCC...
> 
> It shouldn't matter how much GCC does, as timing and power-channel 
> attacks are impractical to prevent at the source code level. One needs 
> to look at the machine code, one needs to know what specific machine the 
> code is running on, and it's so hard to do this correctly that it 
> continues to be an active area of research. See, for example, the 
> Chroniton work done at MIT and presented last month at PLARCH 2023.[1] 
> You don't need to read the whole paper, just its summary of the problem 
> and the technology they're using to address it.
Some might claim otherwise.  Certainly the practicality of an attack is 
something to be evaluated and many aren't for various reasons.  But I 
think it's also been the case that we're finding out that things we 
thought were impractical have become practical over time.

Thankfully I don't have to ponder this kind of problem nearly as much as 
I used to :-)

jeff
  
Xi Ruoyao July 12, 2023, 8:59 a.m. UTC | #34
On Tue, 2023-07-11 at 16:12 -0400, Siddhesh Poyarekar wrote:
> > I raised an objection to Xi's patches (adding non-null annotations
> > to
> > stdio.h in particular) specifically because of the timing.  Having
> > thought about it some more, I'm fine with having those annotations,
> > but
> > I think it is unwise to *add* them to such a widely-used,
> > standardized
> > header right before a release freeze.  Instead, I'd like to see Xi's
> > patches land on trunk almost immediately after the 2.38 release
> > branch
> > is created, so that we have the maximum amount of time to find any
> > problems before 2.39.
> 
> Xi, if you're still interested, can you please post a v6 with 
> annotations to the _chk variants of the functions and in stdio2.h as
> well?

https://patchwork.sourceware.org/project/glibc/patch/20230712085615.1373977-1-xry111@xry111.site/
  
Martin Uecker Aug. 8, 2023, 10:01 a.m. UTC | #35
Am Montag, dem 10.07.2023 um 22:16 +0200 schrieb Alejandro Colomar via Gcc:
> On 7/10/23 22:14, Alejandro Colomar wrote:
> > [CC += Andrew]
> > 
> > Hi Xi, Andrew,
> > 
> > On 7/10/23 20:41, Xi Ruoyao wrote:
> > > Maybe we should have a weaker version of nonnull which only performs the
> > > diagnostic, not the optimization.  But it looks like they hate the idea:
> > > https://gcc.gnu.org/PR110617.
> > > 
> > This is the one thing that makes me use both Clang and GCC to compile,
> > because while any of them would be enough to build, I want as much
> > static analysis as I can get, and so I want -fanalyzer (so I need GCC),
> > but I also use _Nullable (so I need Clang).
> > 
> > If GCC had support for _Nullable, I would have in GCC the superset of
> > features that I need from both in a single vendor.  Moreover, Clang's
> > static analyzer is brain-damaged (sorry, but it doesn't have a simple
> > command line to run it, contrary to GCC's easy -fanalyzer), so having
> > GCC's analyzer get over those _Nullable qualifiers would be great.
> > 
> > Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
> > mode, as there are many cases where the compiler doesn't have enough
> > information, and the analyzer can get rid of false negatives and
> > positives.  See: <https://github.com/llvm/llvm-project/issues/57546>
> > 
> > I'll back the ask for the qualifiers in GCC, for compatibility with
> > Clang.
> 
> BTW, Bionic libc is adding those qualifiers:
> 
> <https://android-review.googlesource.com/c/platform/bionic/+/2552567/7/libc/include/netinet/ether.h#45>
> <https://android-review.googlesource.com/q/owner:zijunzhao@google.com+Nullability>
> 
> 

I am sceptical about these qualifiers because they do
not fit well with this language mechanism.   Qualifiers take
effect at accesses to objects and are discarded at lvalue
conversion.  So a qualifier should ideally describe a property
that is relevant for accessing objects but is not relevant
for values.

Also there are built-in conversion rules a qualifier should
conform to.  A pointer pointing to a type without qualifier 
can implicitely convert to a pointer to a type with qualifier,
e.g.   int*  can be converted to const int*.

Together, this implies that a qualifier should add constraints
to a type that are relevant to how an object is accessed.

"const" and "volatile" or "_Atomic" are good examples.
("restricted" does not quite follow these rules and is 
considered weird and difficult to understand.)

In contrast, "nonnull" and "nullable" are properties that
affect the set of values of the pointer, but not how the
pointer itself is accessed. 


Martin
  
enh Aug. 9, 2023, 12:14 a.m. UTC | #36
(bionic maintainer here, mostly by accident...)

yeah, we tried the GCC attributes once before with _disastrous_
results (because GCC saw it as an excuse to delete explicit null
checks, it broke all kinds of things). the clang attributes are
"better" in that they don't confuse two entirely separate notions ...
but they're not "good" because the analysis is so limited. i think we
were hoping for something more like the "used but not set" kind of
diagnostic, but afaict it really only catches _direct_ use of a
literal nullptr. so:

  foo(nullptr); // caught

  void* p = nullptr;
  foo(p); // not caught

without getting on to anything fancy like:

  void* p;
  if (a) p = bar();
  foo(p);

we've done all the annotations anyway, but we're not expecting to
actually catch many bugs with them, and in fact did not catch any real
bugs in AOSP while adding the annotations. (though we did find several
"you're not _wrong_, but ..." bits of code :-) )

what i really want for christmas is the annotation that lets me say
"this size_t argument tells you the size of this array argument" and
actually does something usefully fortify-like with that. i've seen
proposals like https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/
but i haven't seen anything i can use yet, so you -- who do use GCC
which aiui has something along these lines already -- will find out
what's good/bad about it before i do...

On Tue, Aug 8, 2023 at 3:01 AM Martin Uecker <muecker@gwdg.de> wrote:
>
> Am Montag, dem 10.07.2023 um 22:16 +0200 schrieb Alejandro Colomar via Gcc:
> > On 7/10/23 22:14, Alejandro Colomar wrote:
> > > [CC += Andrew]
> > >
> > > Hi Xi, Andrew,
> > >
> > > On 7/10/23 20:41, Xi Ruoyao wrote:
> > > > Maybe we should have a weaker version of nonnull which only performs the
> > > > diagnostic, not the optimization.  But it looks like they hate the idea:
> > > > https://gcc.gnu.org/PR110617.
> > > >
> > > This is the one thing that makes me use both Clang and GCC to compile,
> > > because while any of them would be enough to build, I want as much
> > > static analysis as I can get, and so I want -fanalyzer (so I need GCC),
> > > but I also use _Nullable (so I need Clang).
> > >
> > > If GCC had support for _Nullable, I would have in GCC the superset of
> > > features that I need from both in a single vendor.  Moreover, Clang's
> > > static analyzer is brain-damaged (sorry, but it doesn't have a simple
> > > command line to run it, contrary to GCC's easy -fanalyzer), so having
> > > GCC's analyzer get over those _Nullable qualifiers would be great.
> > >
> > > Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
> > > mode, as there are many cases where the compiler doesn't have enough
> > > information, and the analyzer can get rid of false negatives and
> > > positives.  See: <https://github.com/llvm/llvm-project/issues/57546>
> > >
> > > I'll back the ask for the qualifiers in GCC, for compatibility with
> > > Clang.
> >
> > BTW, Bionic libc is adding those qualifiers:
> >
> > <https://android-review.googlesource.com/c/platform/bionic/+/2552567/7/libc/include/netinet/ether.h#45>
> > <https://android-review.googlesource.com/q/owner:zijunzhao@google.com+Nullability>
> >
> >
>
> I am sceptical about these qualifiers because they do
> not fit well with this language mechanism.   Qualifiers take
> effect at accesses to objects and are discarded at lvalue
> conversion.  So a qualifier should ideally describe a property
> that is relevant for accessing objects but is not relevant
> for values.
>
> Also there are built-in conversion rules a qualifier should
> conform to.  A pointer pointing to a type without qualifier
> can implicitely convert to a pointer to a type with qualifier,
> e.g.   int*  can be converted to const int*.
>
> Together, this implies that a qualifier should add constraints
> to a type that are relevant to how an object is accessed.
>
> "const" and "volatile" or "_Atomic" are good examples.
> ("restricted" does not quite follow these rules and is
> considered weird and difficult to understand.)
>
> In contrast, "nonnull" and "nullable" are properties that
> affect the set of values of the pointer, but not how the
> pointer itself is accessed.
>
>
> Martin
>
>
>
>
>
  
Siddhesh Poyarekar Aug. 9, 2023, 1:11 a.m. UTC | #37
On 2023-08-08 20:14, enh wrote:
> (bionic maintainer here, mostly by accident...)
> 
> yeah, we tried the GCC attributes once before with _disastrous_
> results (because GCC saw it as an excuse to delete explicit null
> checks, it broke all kinds of things). the clang attributes are

AFAICT based on earlier discussions around this patch, the NULL check 
deletion anomalies in gcc seem to have been fixed recently.

> "better" in that they don't confuse two entirely separate notions ...
> but they're not "good" because the analysis is so limited. i think we
> were hoping for something more like the "used but not set" kind of
> diagnostic, but afaict it really only catches _direct_ use of a
> literal nullptr. so:
> 
>    foo(nullptr); // caught
> 
>    void* p = nullptr;
>    foo(p); // not caught
> 
> without getting on to anything fancy like:
> 
>    void* p;
>    if (a) p = bar();
>    foo(p);
> 
> we've done all the annotations anyway, but we're not expecting to
> actually catch many bugs with them, and in fact did not catch any real
> bugs in AOSP while adding the annotations. (though we did find several
> "you're not _wrong_, but ..." bits of code :-) )

I believe it did catch a few issues in the glibc test cases when we 
enabled fortification internally in addition to flagging several "you're 
not _wrong_, but..." bits.  In any case, it's only enabled with 
fortification since we use that as a hint for wanting safer code.

> what i really want for christmas is the annotation that lets me say
> "this size_t argument tells you the size of this array argument" and
> actually does something usefully fortify-like with that. i've seen
> proposals like https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/
> but i haven't seen anything i can use yet, so you -- who do use GCC
> which aiui has something along these lines already -- will find out
> what's good/bad about it before i do...

I think a lot of this is covered by __access__, __alloc_size__ and the 
upcoming __counted_by__ attributes.  There are still gaps that something 
like -fbounds-safety would cover (I think -fsanitize=bounds and 
-fsanitize=object-size further cover some of those gaps but there's a 
general lack of confidence in using them in production due to 
performance concerns; fbounds-safety has those concerns too AFAICT) but 
that gap is getting narrower.

Thanks,
Sid
  
Martin Uecker Aug. 9, 2023, 7:26 a.m. UTC | #38
Am Dienstag, dem 08.08.2023 um 17:14 -0700 schrieb enh:
> (bionic maintainer here, mostly by accident...)
> 
> yeah, we tried the GCC attributes once before with _disastrous_
> results (because GCC saw it as an excuse to delete explicit null
> checks, it broke all kinds of things).

Thanks! I am currently exploring different options and what
could be done to improve the situation from the language
as well as compile side.  It looks we have some partial
solution but they do not work quite right or do not
fit together perfectly.


>  the clang attributes are
> "better" in that they don't confuse two entirely separate notions ...
> but they're not "good" because the analysis is so limited. i think we
> were hoping for something more like the "used but not set" kind of
> diagnostic, but afaict it really only catches _direct_ use of a
> literal nullptr. so:
> 
>   foo(nullptr); // caught
> 
>   void* p = nullptr;
>   foo(p); // not caught
> 
> without getting on to anything fancy like:
> 
>   void* p;
>   if (a) p = bar();
>   foo(p);
> 
> we've done all the annotations anyway, but we're not expecting to
> actually catch many bugs with them, and in fact did not catch any real
> bugs in AOSP while adding the annotations. (though we did find several
> "you're not _wrong_, but ..." bits of code :-) )
> 
> what i really want for christmas is the annotation that lets me say
> "this size_t argument tells you the size of this array argument" and
> actually does something usefully fortify-like with that.
> 

What is your opinion about the access attribute?

https://godbolt.org/z/PPfajefvP

it is a bit cumbersome to use, but one can use [static]
instead, which gives you the same static warnings.

[static] does not work with __builtin_dynamic_object_size,
but maybe this could be changed (there is a bug filed.)

I am not sure whether [static] should imply [[gnu::nonnull]]
which would then also trigger the optimization. I think
clang uses it for optimization.

Martin


>  i've seen
> proposals like https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/
> but i haven't seen anything i can use yet, so you -- who do use GCC
> which aiui has something along these lines already -- will find out
> what's good/bad about it before i do...



> 
> On Tue, Aug 8, 2023 at 3:01 AM Martin Uecker <muecker@gwdg.de> wrote:
> > 
> > Am Montag, dem 10.07.2023 um 22:16 +0200 schrieb Alejandro Colomar via Gcc:
> > > On 7/10/23 22:14, Alejandro Colomar wrote:
> > > > [CC += Andrew]
> > > > 
> > > > Hi Xi, Andrew,
> > > > 
> > > > On 7/10/23 20:41, Xi Ruoyao wrote:
> > > > > Maybe we should have a weaker version of nonnull which only performs the
> > > > > diagnostic, not the optimization.  But it looks like they hate the idea:
> > > > > https://gcc.gnu.org/PR110617.
> > > > > 
> > > > This is the one thing that makes me use both Clang and GCC to compile,
> > > > because while any of them would be enough to build, I want as much
> > > > static analysis as I can get, and so I want -fanalyzer (so I need GCC),
> > > > but I also use _Nullable (so I need Clang).
> > > > 
> > > > If GCC had support for _Nullable, I would have in GCC the superset of
> > > > features that I need from both in a single vendor.  Moreover, Clang's
> > > > static analyzer is brain-damaged (sorry, but it doesn't have a simple
> > > > command line to run it, contrary to GCC's easy -fanalyzer), so having
> > > > GCC's analyzer get over those _Nullable qualifiers would be great.
> > > > 
> > > > Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
> > > > mode, as there are many cases where the compiler doesn't have enough
> > > > information, and the analyzer can get rid of false negatives and
> > > > positives.  See: <https://github.com/llvm/llvm-project/issues/57546>
> > > > 
> > > > I'll back the ask for the qualifiers in GCC, for compatibility with
> > > > Clang.
> > > 
> > > BTW, Bionic libc is adding those qualifiers:
> > > 
> > > <https://android-review.googlesource.com/c/platform/bionic/+/2552567/7/libc/include/netinet/ether.h#45>
> > > <https://android-review.googlesource.com/q/owner:zijunzhao@google.com+Nullability>
> > > 
> > > 
> > 
> > I am sceptical about these qualifiers because they do
> > not fit well with this language mechanism.   Qualifiers take
> > effect at accesses to objects and are discarded at lvalue
> > conversion.  So a qualifier should ideally describe a property
> > that is relevant for accessing objects but is not relevant
> > for values.
> > 
> > Also there are built-in conversion rules a qualifier should
> > conform to.  A pointer pointing to a type without qualifier
> > can implicitely convert to a pointer to a type with qualifier,
> > e.g.   int*  can be converted to const int*.
> > 
> > Together, this implies that a qualifier should add constraints
> > to a type that are relevant to how an object is accessed.
> > 
> > "const" and "volatile" or "_Atomic" are good examples.
> > ("restricted" does not quite follow these rules and is
> > considered weird and difficult to understand.)
> > 
> > In contrast, "nonnull" and "nullable" are properties that
> > affect the set of values of the pointer, but not how the
> > pointer itself is accessed.
> > 
> > 
> > Martin
> > 
> > 
> > 
> > 
> >
  
Alejandro Colomar Aug. 9, 2023, 10:42 a.m. UTC | #39
Hi Martin,

On 2023-08-09 09:26, Martin Uecker wrote:
> it is a bit cumbersome to use, but one can use [static]
> instead, which gives you the same static warnings.
> 
> [static] does not work with __builtin_dynamic_object_size,
> but maybe this could be changed (there is a bug filed.)
> 
> I am not sure whether [static] should imply [[gnu::nonnull]]

I have a gripe with ISO C's [static].  As you mention, ISO
conflated two functionalities in [static]:

-  The size of the array passed as argument must not be less
   than the size specified in the parameter's [].

-  The pointer must not be NULL.

And there are valid cases where you may want the first but
not the second.  Or the second but not the first (that's the
case for _Nonnull, of course).

In fact, it's so badly damaged, that it prompted a proposal
to ISO C of using [static 1] as an equivalent of _Nonnull in
the prototypes that accepted a pointer that should not be
NULL.  However, that proposal didn't include the functions
that actually take arrays as input (because they are taken
in the opposite order, so array syntax is not legal).  Don't
you find it ironic that ISO C could have used array syntax
for pointers and pointer syntax for arrays?  I do.

As for when one would want to mean the first (size of array)
but not _Nonnull: for a function where you may pass either
an array (which should not be smaller than the size), or a
sentinel NULL value.

Nevertheless, I floated the idea that [static] is completely
unnecessary, and nobody has yet been against it.

GCC could perfectly add a warning for the following case:

    void foo(size_t n, int a[n]);

    int
    main(void)
    {
        int a[7];

        foo(42, a);
    }

Nobody in their right mind would specify a size of an array
in a parameter and expect that passing a smaller array than
that can produce a valid program.  So, why not make that a
Wall warning?

And so [static] would be irrelevant in GNU C, because well,
what does it add?  In fact, I like that [static] is so badly
designed, because then we can repurpose plain [size] to mean
the right thing, which would produce cleaner programs
([static] just adds noise to the source).

What do you think of giving [42] a meaning, instead of just
ignoring it?

Cheers,
Alex

> which would then also trigger the optimization. I think
> clang uses it for optimization.
> 
> Martin
  
Martin Uecker Aug. 9, 2023, 12:03 p.m. UTC | #40
Hi Alejandro!

Am Mittwoch, dem 09.08.2023 um 12:42 +0200 schrieb Alejandro Colomar:

...

> 
> As for when one would want to mean the first (size of array)
> but not _Nonnull: for a function where you may pass either
> an array (which should not be smaller than the size), or a
> sentinel NULL value.
> 
> Nevertheless, I floated the idea that [static] is completely
> unnecessary, and nobody has yet been against it.
> 
> GCC could perfectly add a warning for the following case:
> 
>     void foo(size_t n, int a[n]);
> 
>     int
>     main(void)
>     {
>         int a[7];
> 
>         foo(42, a);
>     }
> 
> Nobody in their right mind would specify a size of an array
> in a parameter and expect that passing a smaller array than
> that can produce a valid program.  So, why not make that a
> Wall warning?

But we have this warning! is even activated by 
default without -Wall and already since GCC 11:





https://godbolt.org/z/sMbTon458

But this is for minimum required elements. How do 
we differentiate between null and non-null?

We have:

int[] or int* // no bound, nullable
int[N]	      // at least N, nullable
int[static N] // at least N, nonnull

The 'static' implies nonnull, so we could 
use 'static' to diffentiate between nonnull 
and nullable. 

What is missing something which implies bounds
also inside the callee.  You can use the "access"
attribute or we extend the meaning of int[N]
and int[static N] also imply a maximum bound.


Martin
  
Alejandro Colomar Aug. 9, 2023, 12:37 p.m. UTC | #41
Hi Martin!

On 2023-08-09 14:03, Martin Uecker wrote:
[...]

>> GCC could perfectly add a warning for the following case:
>>
>>     void foo(size_t n, int a[n]);
>>
>>     int
>>     main(void)
>>     {
>>         int a[7];
>>
>>         foo(42, a);
>>     }
>>
>> Nobody in their right mind would specify a size of an array
>> in a parameter and expect that passing a smaller array than
>> that can produce a valid program.  So, why not make that a
>> Wall warning?
> 
> But we have this warning! is even activated by 
> default without -Wall and already since GCC 11:

Ahh, I hadn't realized that was added (relatievly) recently.
That's great news!  Thanks!

> https://godbolt.org/z/sMbTon458
> 
> But this is for minimum required elements. How do 
> we differentiate between null and non-null?
> 
> We have:
> 
> int[] or int* // no bound, nullable
> int[N]	      // at least N, nullable
> int[static N] // at least N, nonnull
> 
> The 'static' implies nonnull, so we could 
> use 'static' to diffentiate between nonnull 
> and nullable. 

Since static is only for arrays (okay, they're all pointers, but
it's only usable with array syntax), and nullability is a thing
of pointers, I think static is not the right choice.

I oppose using array syntax for pointers, as it can confuse
programmers.

In any case, [static] is not that different from [_Nonnull], and
_Nonnull is more flexible, since you can also use it as
*_Nonnull.

Regarding the issues you have with _Nonnull being a qualifier,
I've been thinking about it for a long time and don't yet have
a concrete answer.  The more I think about it, the more I'm
convinced it is like `restrict`.  You can't have null correctness
as we can do with `const`.  With const, we know it's always bad
to store a const object in a non-const pointer.  With NULL, it
depends on the context: if you've checked for NULL in the lines
above, then it's fine to do the conversion.

There is a proposal for Clang to add `_Optional`, a qualifier to
the pointee, as a more correct version of _Nullable.  The
following would be equivalent:

    int *_Nullable  i;
    _Optional int   *j;

However, I don't believe it's a good choice, because of the above.
Instead, I think _Nullable or _Nonnull should be like `restrict`
and used only in function prototypes.  Let the analyzer do the
job.  I know `restrict` is hard to grasp, but I couldn't think of
anything better.

> 
> What is missing something which implies bounds
> also inside the callee.  You can use the "access"
> attribute or we extend the meaning of int[N]
> and int[static N] also imply a maximum bound.

Why is the upper bound important?  It's usually irrelevant.

In the case where one wants to make sure that an array is of an
exact size (instead of just "at least"), pointers to arrays can be
used.  They're just not often used, because you don't need that
so often.  I don't think we need a new addition to the language,
do we?

    $ cat ap.c
    #include <stddef.h>

    void
    foo(size_t n, int (*a)[n])
    {
        for (size_t i = 0; i < n; i++)
            (*a)[i] = 42;
    }
    $ gcc-13 -S -Wall -Wextra ap.c 
    $ 

In the function above, n is not a lower bound, but an exact bound.

GCC should add a warning for the following code:

    $ cat ap2.c
    #include <stddef.h>

    void foo(size_t n, int (*a)[n]);

    void
    bar(void)
    {
        int x[7];

        foo(3, &x);
        foo(9, &x);
    }
    $ gcc-13 -S -Wall -Wextra ap2.c
    $ 

Neither GCC nor Clang warn about that.  Not even with -fanalyzer.

Cheers,
Alex
  
Xi Ruoyao Aug. 9, 2023, 1:46 p.m. UTC | #42
On Wed, 2023-08-09 at 12:42 +0200, Alejandro Colomar wrote:
> I have a gripe with ISO C's [static].  As you mention, ISO
> conflated two functionalities in [static]:
> 
> -  The size of the array passed as argument must not be less
>    than the size specified in the parameter's [].
> 
> -  The pointer must not be NULL.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625559.html

If this patch is merged, they'll just become __nonnull and have all
advantages and disadvantages as __nonnull.
  
Martin Uecker Aug. 9, 2023, 2:24 p.m. UTC | #43
Am Mittwoch, dem 09.08.2023 um 14:37 +0200 schrieb Alejandro Colomar:
> Hi Martin!
> 
> On 2023-08-09 14:03, Martin Uecker wrote:


> 
> Regarding the issues you have with _Nonnull being a qualifier,
> I've been thinking about it for a long time and don't yet have
> a concrete answer.  The more I think about it, the more I'm
> convinced it is like `restrict`.  You can't have null correctness
> as we can do with `const`.  With const, we know it's always bad
> to store a const object in a non-const pointer.  With NULL, it
> depends on the context: if you've checked for NULL in the lines
> above, then it's fine to do the conversion.
> 
> There is a proposal for Clang to add `_Optional`, a qualifier to
> the pointee, as a more correct version of _Nullable.  The
> following would be equivalent:
> 
>     int *_Nullable  i;
>     _Optional int   *j;
> 
> However, I don't believe it's a good choice, because of the above.
> Instead, I think _Nullable or _Nonnull should be like `restrict`
> and used only in function prototypes.  Let the analyzer do the
> job.  I know `restrict` is hard to grasp, but I couldn't think of
> anything better.

I think this would be bad. More confusion is the least we need.
I would definitely prefer an attribute over a confusing qualifier 
which is not really a qualifier. 

_Optional is a something else and maybe not directly
comparable.  I think it is an interesting idea. First, it 
would fit well with rules the language.  One gets 
conversion errors exactly as with other qualifiers. So
one could add it to selected pointer in an existing code
base and incrementally improve the code by adding checks.

It would not be useful for existing interfaces that
need stay compatible.

> > 
> > What is missing something which implies bounds
> > also inside the callee.  You can use the "access"
> > attribute or we extend the meaning of int[N]
> > and int[static N] also imply a maximum bound.
> 
> Why is the upper bound important?  It's usually irrelevant.
> 
> In the case where one wants to make sure that an array is of an
> exact size (instead of just "at least"), pointers to arrays can be
> used.  They're just not often used, because you don't need that
> so often.  I don't think we need a new addition to the language,
> do we?
> 
>     $ cat ap.c
>     #include <stddef.h>
> 
>     void
>     foo(size_t n, int (*a)[n])
>     {
>         for (size_t i = 0; i < n; i++)
>             (*a)[i] = 42;
>     }
>     $ gcc-13 -S -Wall -Wextra ap.c 
>     $ 
> 
> In the function above, n is not a lower bound, but an exact bound.
> 
> GCC should add a warning for the following code:
> 
>     $ cat ap2.c
>     #include <stddef.h>
> 
>     void foo(size_t n, int (*a)[n]);
> 
>     void
>     bar(void)
>     {
>         int x[7];
> 
>         foo(3, &x);
>         foo(9, &x);
>     }
>     $ gcc-13 -S -Wall -Wextra ap2.c
>     $ 
> 
> Neither GCC nor Clang warn about that.  Not even with -fanalyzer.

I know. One can already get bounds checking side the callee with
this using UBSan. I have UBSan patch which would protect the
call itself and make sure the bounds are correct. Compile-time
warnings would also be good.

In any case, this does not work for existing interfaces.

Martin
  
enh Aug. 11, 2023, 11:34 p.m. UTC | #44
On Wed, Aug 9, 2023 at 12:26 AM Martin Uecker <muecker@gwdg.de> wrote:
>
> Am Dienstag, dem 08.08.2023 um 17:14 -0700 schrieb enh:
> > (bionic maintainer here, mostly by accident...)
> >
> > yeah, we tried the GCC attributes once before with _disastrous_
> > results (because GCC saw it as an excuse to delete explicit null
> > checks, it broke all kinds of things).
>
> Thanks! I am currently exploring different options and what
> could be done to improve the situation from the language
> as well as compile side.  It looks we have some partial
> solution but they do not work quite right or do not
> fit together perfectly.
>
>
> >  the clang attributes are
> > "better" in that they don't confuse two entirely separate notions ...
> > but they're not "good" because the analysis is so limited. i think we
> > were hoping for something more like the "used but not set" kind of
> > diagnostic, but afaict it really only catches _direct_ use of a
> > literal nullptr. so:
> >
> >   foo(nullptr); // caught
> >
> >   void* p = nullptr;
> >   foo(p); // not caught
> >
> > without getting on to anything fancy like:
> >
> >   void* p;
> >   if (a) p = bar();
> >   foo(p);
> >
> > we've done all the annotations anyway, but we're not expecting to
> > actually catch many bugs with them, and in fact did not catch any real
> > bugs in AOSP while adding the annotations. (though we did find several
> > "you're not _wrong_, but ..." bits of code :-) )
> >
> > what i really want for christmas is the annotation that lets me say
> > "this size_t argument tells you the size of this array argument" and
> > actually does something usefully fortify-like with that.
> >
>
> What is your opinion about the access attribute?
>
> https://godbolt.org/z/PPfajefvP
>
> it is a bit cumbersome to use, but one can use [static]
> instead, which gives you the same static warnings.

yeah, "that's hard to read" was actually my first reaction. maybe it's
just because i'm a libc maintainer used to the printf_like attribute,
but i actually find the regular __attribute__() style more readable.
you probably need to ask people who _consume_ more headers than they
write :-)

> [static] does not work with __builtin_dynamic_object_size,
> but maybe this could be changed (there is a bug filed.)
>
> I am not sure whether [static] should imply [[gnu::nonnull]]
> which would then also trigger the optimization. I think
> clang uses it for optimization.

yeah, that was what made us revert the old gcc nonnull annotations ---
we don't have any use case for "please use this to omit checks"
because we're generally dealing with public API. having the compiler
dead-code eliminate our attempts to return sensible errors was counter
to our goals, and seems like it would be in any place where we'd use a
bounds annotation too --- it'll be those worried about security (who
want to fail fast, and _before_ adding a potentially caller-controlled
index to an array base address!) who i'd expect to see using this kind
of thing first.

> Martin
>
>
> >  i've seen
> > proposals like https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/
> > but i haven't seen anything i can use yet, so you -- who do use GCC
> > which aiui has something along these lines already -- will find out
> > what's good/bad about it before i do...
>
>
>
> >
> > On Tue, Aug 8, 2023 at 3:01 AM Martin Uecker <muecker@gwdg.de> wrote:
> > >
> > > Am Montag, dem 10.07.2023 um 22:16 +0200 schrieb Alejandro Colomar via Gcc:
> > > > On 7/10/23 22:14, Alejandro Colomar wrote:
> > > > > [CC += Andrew]
> > > > >
> > > > > Hi Xi, Andrew,
> > > > >
> > > > > On 7/10/23 20:41, Xi Ruoyao wrote:
> > > > > > Maybe we should have a weaker version of nonnull which only performs the
> > > > > > diagnostic, not the optimization.  But it looks like they hate the idea:
> > > > > > https://gcc.gnu.org/PR110617.
> > > > > >
> > > > > This is the one thing that makes me use both Clang and GCC to compile,
> > > > > because while any of them would be enough to build, I want as much
> > > > > static analysis as I can get, and so I want -fanalyzer (so I need GCC),
> > > > > but I also use _Nullable (so I need Clang).
> > > > >
> > > > > If GCC had support for _Nullable, I would have in GCC the superset of
> > > > > features that I need from both in a single vendor.  Moreover, Clang's
> > > > > static analyzer is brain-damaged (sorry, but it doesn't have a simple
> > > > > command line to run it, contrary to GCC's easy -fanalyzer), so having
> > > > > GCC's analyzer get over those _Nullable qualifiers would be great.
> > > > >
> > > > > Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
> > > > > mode, as there are many cases where the compiler doesn't have enough
> > > > > information, and the analyzer can get rid of false negatives and
> > > > > positives.  See: <https://github.com/llvm/llvm-project/issues/57546>
> > > > >
> > > > > I'll back the ask for the qualifiers in GCC, for compatibility with
> > > > > Clang.
> > > >
> > > > BTW, Bionic libc is adding those qualifiers:
> > > >
> > > > <https://android-review.googlesource.com/c/platform/bionic/+/2552567/7/libc/include/netinet/ether.h#45>
> > > > <https://android-review.googlesource.com/q/owner:zijunzhao@google.com+Nullability>
> > > >
> > > >
> > >
> > > I am sceptical about these qualifiers because they do
> > > not fit well with this language mechanism.   Qualifiers take
> > > effect at accesses to objects and are discarded at lvalue
> > > conversion.  So a qualifier should ideally describe a property
> > > that is relevant for accessing objects but is not relevant
> > > for values.
> > >
> > > Also there are built-in conversion rules a qualifier should
> > > conform to.  A pointer pointing to a type without qualifier
> > > can implicitely convert to a pointer to a type with qualifier,
> > > e.g.   int*  can be converted to const int*.
> > >
> > > Together, this implies that a qualifier should add constraints
> > > to a type that are relevant to how an object is accessed.
> > >
> > > "const" and "volatile" or "_Atomic" are good examples.
> > > ("restricted" does not quite follow these rules and is
> > > considered weird and difficult to understand.)
> > >
> > > In contrast, "nonnull" and "nullable" are properties that
> > > affect the set of values of the pointer, but not how the
> > > pointer itself is accessed.
> > >
> > >
> > > Martin
> > >
> > >
> > >
> > >
> > >
>
  

Patch

diff --git a/libio/stdio.h b/libio/stdio.h
index 4cf9f1c012..a640e9beb5 100644
--- a/libio/stdio.h
+++ b/libio/stdio.h
@@ -1,5 +1,6 @@ 
 /* Define ISO C stdio on top of C++ iostreams.
    Copyright (C) 1991-2023 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -278,7 +279,7 @@  extern FILE *__REDIRECT (fopen, (const char *__restrict __filename,
 extern FILE *__REDIRECT (freopen, (const char *__restrict __filename,
 				   const char *__restrict __modes,
 				   FILE *__restrict __stream), freopen64)
-  __wur;
+  __wur __nonnull ((3));
 # else
 #  define fopen fopen64
 #  define freopen freopen64
@@ -330,21 +331,22 @@  extern __FILE *open_wmemstream (wchar_t **__bufloc, size_t *__sizeloc) __THROW
 
 /* If BUF is NULL, make STREAM unbuffered.
    Else make it use buffer BUF, of size BUFSIZ.  */
-extern void setbuf (FILE *__restrict __stream, char *__restrict __buf) __THROW;
+extern void setbuf (FILE *__restrict __stream, char *__restrict __buf) __THROW
+  __nonnull ((1));
 /* Make STREAM use buffering mode MODE.
    If BUF is not NULL, use N bytes of it for buffering;
    else allocate an internal buffer N bytes long.  */
 extern int setvbuf (FILE *__restrict __stream, char *__restrict __buf,
-		    int __modes, size_t __n) __THROW;
+		    int __modes, size_t __n) __THROW __nonnull ((1));
 
 #ifdef	__USE_MISC
 /* If BUF is NULL, make STREAM unbuffered.
    Else make it use SIZE bytes of BUF for buffering.  */
 extern void setbuffer (FILE *__restrict __stream, char *__restrict __buf,
-		       size_t __size) __THROW;
+		       size_t __size) __THROW __nonnull ((1));
 
 /* Make STREAM line-buffered.  */
-extern void setlinebuf (FILE *__stream) __THROW;
+extern void setlinebuf (FILE *__stream) __THROW __nonnull ((1));
 #endif
 
 
@@ -353,7 +355,7 @@  extern void setlinebuf (FILE *__stream) __THROW;
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern int fprintf (FILE *__restrict __stream,
-		    const char *__restrict __format, ...);
+		    const char *__restrict __format, ...) __nonnull ((1));
 /* Write formatted output to stdout.
 
    This function is a possible cancellation point and therefore not
@@ -368,7 +370,7 @@  extern int sprintf (char *__restrict __s,
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern int vfprintf (FILE *__restrict __s, const char *__restrict __format,
-		     __gnuc_va_list __arg);
+		     __gnuc_va_list __arg) __nonnull ((1));
 /* Write formatted output to stdout from argument list ARG.
 
    This function is a possible cancellation point and therefore not
@@ -418,7 +420,7 @@  extern int dprintf (int __fd, const char *__restrict __fmt, ...)
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern int fscanf (FILE *__restrict __stream,
-		   const char *__restrict __format, ...) __wur;
+		   const char *__restrict __format, ...) __wur __nonnull ((1));
 /* Read formatted input from stdin.
 
    This function is a possible cancellation point and therefore not
@@ -439,7 +441,7 @@  extern int sscanf (const char *__restrict __s,
 #  ifdef __REDIRECT
 extern int __REDIRECT (fscanf, (FILE *__restrict __stream,
 				const char *__restrict __format, ...),
-		       __isoc23_fscanf) __wur;
+		       __isoc23_fscanf) __wur __nonnull ((1));
 extern int __REDIRECT (scanf, (const char *__restrict __format, ...),
 		       __isoc23_scanf) __wur;
 extern int __REDIRECT_NTH (sscanf, (const char *__restrict __s,
@@ -447,7 +449,8 @@  extern int __REDIRECT_NTH (sscanf, (const char *__restrict __s,
 			   __isoc23_sscanf);
 #  else
 extern int __isoc23_fscanf (FILE *__restrict __stream,
-			    const char *__restrict __format, ...) __wur;
+			    const char *__restrict __format, ...) __wur
+  __nonnull ((1));
 extern int __isoc23_scanf (const char *__restrict __format, ...) __wur;
 extern int __isoc23_sscanf (const char *__restrict __s,
 			    const char *__restrict __format, ...) __THROW;
@@ -459,7 +462,7 @@  extern int __isoc23_sscanf (const char *__restrict __s,
 #  ifdef __REDIRECT
 extern int __REDIRECT (fscanf, (FILE *__restrict __stream,
 				const char *__restrict __format, ...),
-		       __isoc99_fscanf) __wur;
+		       __isoc99_fscanf) __wur __nonnull ((1));
 extern int __REDIRECT (scanf, (const char *__restrict __format, ...),
 		       __isoc99_scanf) __wur;
 extern int __REDIRECT_NTH (sscanf, (const char *__restrict __s,
@@ -467,7 +470,8 @@  extern int __REDIRECT_NTH (sscanf, (const char *__restrict __s,
 			   __isoc99_sscanf);
 #  else
 extern int __isoc99_fscanf (FILE *__restrict __stream,
-			    const char *__restrict __format, ...) __wur;
+			    const char *__restrict __format, ...) __wur
+  __nonnull ((1));
 extern int __isoc99_scanf (const char *__restrict __format, ...) __wur;
 extern int __isoc99_sscanf (const char *__restrict __s,
 			    const char *__restrict __format, ...) __THROW;
@@ -485,7 +489,7 @@  extern int __isoc99_sscanf (const char *__restrict __s,
    marked with __THROW.  */
 extern int vfscanf (FILE *__restrict __s, const char *__restrict __format,
 		    __gnuc_va_list __arg)
-     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur;
+     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur __nonnull ((1));
 
 /* Read formatted input from stdin into argument list ARG.
 
@@ -508,7 +512,7 @@  extern int __REDIRECT (vfscanf,
 		       (FILE *__restrict __s,
 			const char *__restrict __format, __gnuc_va_list __arg),
 		       __isoc23_vfscanf)
-     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur;
+     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur __nonnull ((1));
 extern int __REDIRECT (vscanf, (const char *__restrict __format,
 				__gnuc_va_list __arg), __isoc23_vscanf)
      __attribute__ ((__format__ (__scanf__, 1, 0))) __wur;
@@ -520,7 +524,7 @@  extern int __REDIRECT_NTH (vsscanf,
 #   elif !defined __REDIRECT
 extern int __isoc23_vfscanf (FILE *__restrict __s,
 			     const char *__restrict __format,
-			     __gnuc_va_list __arg) __wur;
+			     __gnuc_va_list __arg) __wur __nonnull ((1));
 extern int __isoc23_vscanf (const char *__restrict __format,
 			    __gnuc_va_list __arg) __wur;
 extern int __isoc23_vsscanf (const char *__restrict __s,
@@ -537,7 +541,7 @@  extern int __REDIRECT (vfscanf,
 		       (FILE *__restrict __s,
 			const char *__restrict __format, __gnuc_va_list __arg),
 		       __isoc99_vfscanf)
-     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur;
+     __attribute__ ((__format__ (__scanf__, 2, 0))) __wur __nonnull ((1));
 extern int __REDIRECT (vscanf, (const char *__restrict __format,
 				__gnuc_va_list __arg), __isoc99_vscanf)
      __attribute__ ((__format__ (__scanf__, 1, 0))) __wur;
@@ -549,7 +553,7 @@  extern int __REDIRECT_NTH (vsscanf,
 #   elif !defined __REDIRECT
 extern int __isoc99_vfscanf (FILE *__restrict __s,
 			     const char *__restrict __format,
-			     __gnuc_va_list __arg) __wur;
+			     __gnuc_va_list __arg) __wur __nonnull ((1));
 extern int __isoc99_vscanf (const char *__restrict __format,
 			    __gnuc_va_list __arg) __wur;
 extern int __isoc99_vsscanf (const char *__restrict __s,
@@ -568,8 +572,8 @@  extern int __isoc99_vsscanf (const char *__restrict __s,
 
    These functions are possible cancellation points and therefore not
    marked with __THROW.  */
-extern int fgetc (FILE *__stream);
-extern int getc (FILE *__stream);
+extern int fgetc (FILE *__stream) __nonnull ((1));
+extern int getc (FILE *__stream) __nonnull ((1));
 
 /* Read a character from stdin.
 
@@ -582,7 +586,7 @@  extern int getchar (void);
 
    These functions are possible cancellation points and therefore not
    marked with __THROW.  */
-extern int getc_unlocked (FILE *__stream);
+extern int getc_unlocked (FILE *__stream) __nonnull ((1));
 extern int getchar_unlocked (void);
 #endif /* Use POSIX.  */
 
@@ -593,7 +597,7 @@  extern int getchar_unlocked (void);
    cancellation point.  But due to similarity with an POSIX interface
    or due to the implementation it is a cancellation point and
    therefore not marked with __THROW.  */
-extern int fgetc_unlocked (FILE *__stream);
+extern int fgetc_unlocked (FILE *__stream) __nonnull ((1));
 #endif /* Use MISC.  */
 
 
@@ -604,8 +608,8 @@  extern int fgetc_unlocked (FILE *__stream);
 
    These functions is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fputc (int __c, FILE *__stream);
-extern int putc (int __c, FILE *__stream);
+extern int fputc (int __c, FILE *__stream) __nonnull ((2));
+extern int putc (int __c, FILE *__stream) __nonnull ((2));
 
 /* Write a character to stdout.
 
@@ -620,7 +624,7 @@  extern int putchar (int __c);
    cancellation point.  But due to similarity with an POSIX interface
    or due to the implementation it is a cancellation point and
    therefore not marked with __THROW.  */
-extern int fputc_unlocked (int __c, FILE *__stream);
+extern int fputc_unlocked (int __c, FILE *__stream) __nonnull ((2));
 #endif /* Use MISC.  */
 
 #ifdef __USE_POSIX199506
@@ -628,7 +632,7 @@  extern int fputc_unlocked (int __c, FILE *__stream);
 
    These functions are possible cancellation points and therefore not
    marked with __THROW.  */
-extern int putc_unlocked (int __c, FILE *__stream);
+extern int putc_unlocked (int __c, FILE *__stream) __nonnull ((2));
 extern int putchar_unlocked (int __c);
 #endif /* Use POSIX.  */
 
@@ -636,10 +640,10 @@  extern int putchar_unlocked (int __c);
 #if defined __USE_MISC \
     || (defined __USE_XOPEN && !defined __USE_XOPEN2K)
 /* Get a word (int) from STREAM.  */
-extern int getw (FILE *__stream);
+extern int getw (FILE *__stream) __nonnull ((1));
 
 /* Write a word (int) to STREAM.  */
-extern int putw (int __w, FILE *__stream);
+extern int putw (int __w, FILE *__stream) __nonnull ((2));
 #endif
 
 
@@ -648,7 +652,7 @@  extern int putw (int __w, FILE *__stream);
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern char *fgets (char *__restrict __s, int __n, FILE *__restrict __stream)
-     __wur __fortified_attr_access (__write_only__, 1, 2);
+     __wur __fortified_attr_access (__write_only__, 1, 2) __nonnull ((3));
 
 #if __GLIBC_USE (DEPRECATED_GETS)
 /* Get a newline-terminated string from stdin, removing the newline.
@@ -672,7 +676,7 @@  extern char *gets (char *__s) __wur __attribute_deprecated__;
    therefore not marked with __THROW.  */
 extern char *fgets_unlocked (char *__restrict __s, int __n,
 			     FILE *__restrict __stream) __wur
-    __fortified_attr_access (__write_only__, 1, 2);
+    __fortified_attr_access (__write_only__, 1, 2) __nonnull ((3));
 #endif
 
 
@@ -689,10 +693,10 @@  extern char *fgets_unlocked (char *__restrict __s, int __n,
    therefore not marked with __THROW.  */
 extern __ssize_t __getdelim (char **__restrict __lineptr,
                              size_t *__restrict __n, int __delimiter,
-                             FILE *__restrict __stream) __wur;
+                             FILE *__restrict __stream) __wur __nonnull ((4));
 extern __ssize_t getdelim (char **__restrict __lineptr,
                            size_t *__restrict __n, int __delimiter,
-                           FILE *__restrict __stream) __wur;
+                           FILE *__restrict __stream) __wur __nonnull ((4));
 
 /* Like `getdelim', but reads up to a newline.
 
@@ -702,7 +706,7 @@  extern __ssize_t getdelim (char **__restrict __lineptr,
    therefore not marked with __THROW.  */
 extern __ssize_t getline (char **__restrict __lineptr,
                           size_t *__restrict __n,
-                          FILE *__restrict __stream) __wur;
+                          FILE *__restrict __stream) __wur __nonnull ((3));
 #endif
 
 
@@ -710,7 +714,8 @@  extern __ssize_t getline (char **__restrict __lineptr,
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fputs (const char *__restrict __s, FILE *__restrict __stream);
+extern int fputs (const char *__restrict __s, FILE *__restrict __stream)
+  __nonnull ((2));
 
 /* Write a string, followed by a newline, to stdout.
 
@@ -723,7 +728,7 @@  extern int puts (const char *__s);
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int ungetc (int __c, FILE *__stream);
+extern int ungetc (int __c, FILE *__stream) __nonnull ((2));
 
 
 /* Read chunks of generic data from STREAM.
@@ -731,13 +736,14 @@  extern int ungetc (int __c, FILE *__stream);
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern size_t fread (void *__restrict __ptr, size_t __size,
-		     size_t __n, FILE *__restrict __stream) __wur;
+		     size_t __n, FILE *__restrict __stream) __wur
+  __nonnull((4));
 /* Write chunks of generic data to STREAM.
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
 extern size_t fwrite (const void *__restrict __ptr, size_t __size,
-		      size_t __n, FILE *__restrict __s);
+		      size_t __n, FILE *__restrict __s) __nonnull((4));
 
 #ifdef __USE_GNU
 /* This function does the same as `fputs' but does not lock the stream.
@@ -747,7 +753,7 @@  extern size_t fwrite (const void *__restrict __ptr, size_t __size,
    or due to the implementation it is a cancellation point and
    therefore not marked with __THROW.  */
 extern int fputs_unlocked (const char *__restrict __s,
-			   FILE *__restrict __stream);
+			   FILE *__restrict __stream) __nonnull ((2));
 #endif
 
 #ifdef __USE_MISC
@@ -758,9 +764,11 @@  extern int fputs_unlocked (const char *__restrict __s,
    or due to the implementation they are cancellation points and
    therefore not marked with __THROW.  */
 extern size_t fread_unlocked (void *__restrict __ptr, size_t __size,
-			      size_t __n, FILE *__restrict __stream) __wur;
+			      size_t __n, FILE *__restrict __stream) __wur
+  __nonnull ((4));
 extern size_t fwrite_unlocked (const void *__restrict __ptr, size_t __size,
-			       size_t __n, FILE *__restrict __stream);
+			       size_t __n, FILE *__restrict __stream)
+  __nonnull ((4));
 #endif
 
 
@@ -768,17 +776,18 @@  extern size_t fwrite_unlocked (const void *__restrict __ptr, size_t __size,
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fseek (FILE *__stream, long int __off, int __whence);
+extern int fseek (FILE *__stream, long int __off, int __whence)
+  __nonnull ((1));
 /* Return the current position of STREAM.
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern long int ftell (FILE *__stream) __wur;
+extern long int ftell (FILE *__stream) __wur __nonnull ((1));
 /* Rewind to the beginning of STREAM.
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern void rewind (FILE *__stream);
+extern void rewind (FILE *__stream) __nonnull ((1));
 
 /* The Single Unix Specification, Version 2, specifies an alternative,
    more adequate interface for the two functions above which deal with
@@ -791,18 +800,20 @@  extern void rewind (FILE *__stream);
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fseeko (FILE *__stream, __off_t __off, int __whence);
+extern int fseeko (FILE *__stream, __off_t __off, int __whence)
+  __nonnull ((1));
 /* Return the current position of STREAM.
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern __off_t ftello (FILE *__stream) __wur;
+extern __off_t ftello (FILE *__stream) __wur __nonnull ((1));
 # else
 #  ifdef __REDIRECT
 extern int __REDIRECT (fseeko,
 		       (FILE *__stream, __off64_t __off, int __whence),
-		       fseeko64);
-extern __off64_t __REDIRECT (ftello, (FILE *__stream), ftello64);
+		       fseeko64) __nonnull ((1));
+extern __off64_t __REDIRECT (ftello, (FILE *__stream), ftello64)
+  __nonnull ((1));
 #  else
 #   define fseeko fseeko64
 #   define ftello ftello64
@@ -815,18 +826,21 @@  extern __off64_t __REDIRECT (ftello, (FILE *__stream), ftello64);
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fgetpos (FILE *__restrict __stream, fpos_t *__restrict __pos);
+extern int fgetpos (FILE *__restrict __stream, fpos_t *__restrict __pos)
+  __nonnull ((1));
 /* Set STREAM's position.
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int fsetpos (FILE *__stream, const fpos_t *__pos);
+extern int fsetpos (FILE *__stream, const fpos_t *__pos) __nonnull ((1));
 #else
 # ifdef __REDIRECT
 extern int __REDIRECT (fgetpos, (FILE *__restrict __stream,
-				 fpos_t *__restrict __pos), fgetpos64);
+				 fpos_t *__restrict __pos), fgetpos64)
+  __nonnull ((1));
 extern int __REDIRECT (fsetpos,
-		       (FILE *__stream, const fpos_t *__pos), fsetpos64);
+		       (FILE *__stream, const fpos_t *__pos), fsetpos64)
+  __nonnull ((1));
 # else
 #  define fgetpos fgetpos64
 #  define fsetpos fsetpos64
@@ -834,24 +848,26 @@  extern int __REDIRECT (fsetpos,
 #endif
 
 #ifdef __USE_LARGEFILE64
-extern int fseeko64 (FILE *__stream, __off64_t __off, int __whence);
-extern __off64_t ftello64 (FILE *__stream) __wur;
-extern int fgetpos64 (FILE *__restrict __stream, fpos64_t *__restrict __pos);
-extern int fsetpos64 (FILE *__stream, const fpos64_t *__pos);
+extern int fseeko64 (FILE *__stream, __off64_t __off, int __whence)
+  __nonnull ((1));
+extern __off64_t ftello64 (FILE *__stream) __wur __nonnull ((1));
+extern int fgetpos64 (FILE *__restrict __stream, fpos64_t *__restrict __pos)
+  __nonnull ((1));
+extern int fsetpos64 (FILE *__stream, const fpos64_t *__pos) __nonnull ((1));
 #endif
 
 /* Clear the error and EOF indicators for STREAM.  */
-extern void clearerr (FILE *__stream) __THROW;
+extern void clearerr (FILE *__stream) __THROW __nonnull ((1));
 /* Return the EOF indicator for STREAM.  */
-extern int feof (FILE *__stream) __THROW __wur;
+extern int feof (FILE *__stream) __THROW __wur __nonnull ((1));
 /* Return the error indicator for STREAM.  */
-extern int ferror (FILE *__stream) __THROW __wur;
+extern int ferror (FILE *__stream) __THROW __wur __nonnull ((1));
 
 #ifdef __USE_MISC
 /* Faster versions when locking is not required.  */
-extern void clearerr_unlocked (FILE *__stream) __THROW;
-extern int feof_unlocked (FILE *__stream) __THROW __wur;
-extern int ferror_unlocked (FILE *__stream) __THROW __wur;
+extern void clearerr_unlocked (FILE *__stream) __THROW __nonnull ((1));
+extern int feof_unlocked (FILE *__stream) __THROW __wur __nonnull ((1));
+extern int ferror_unlocked (FILE *__stream) __THROW __wur __nonnull ((1));
 #endif
 
 
@@ -864,12 +880,12 @@  extern void perror (const char *__s) __COLD;
 
 #ifdef	__USE_POSIX
 /* Return the system file descriptor for STREAM.  */
-extern int fileno (FILE *__stream) __THROW __wur;
+extern int fileno (FILE *__stream) __THROW __wur __nonnull ((1));
 #endif /* Use POSIX.  */
 
 #ifdef __USE_MISC
 /* Faster version when locking is not required.  */
-extern int fileno_unlocked (FILE *__stream) __THROW __wur;
+extern int fileno_unlocked (FILE *__stream) __THROW __wur __nonnull ((1));
 #endif
 
 
@@ -878,7 +894,7 @@  extern int fileno_unlocked (FILE *__stream) __THROW __wur;
 
    This function is a possible cancellation point and therefore not
    marked with __THROW.  */
-extern int pclose (FILE *__stream);
+extern int pclose (FILE *__stream) __nonnull ((1));
 
 /* Create a new stream connected to a pipe running the given command.
 
@@ -922,14 +938,14 @@  extern int obstack_vprintf (struct obstack *__restrict __obstack,
 /* These are defined in POSIX.1:1996.  */
 
 /* Acquire ownership of STREAM.  */
-extern void flockfile (FILE *__stream) __THROW;
+extern void flockfile (FILE *__stream) __THROW __nonnull ((1));
 
 /* Try to acquire ownership of STREAM but do not block if it is not
    possible.  */
-extern int ftrylockfile (FILE *__stream) __THROW __wur;
+extern int ftrylockfile (FILE *__stream) __THROW __wur __nonnull ((1));
 
 /* Relinquish the ownership granted for STREAM.  */
-extern void funlockfile (FILE *__stream) __THROW;
+extern void funlockfile (FILE *__stream) __THROW __nonnull ((1));
 #endif /* POSIX */
 
 #if defined __USE_XOPEN && !defined __USE_XOPEN2K && !defined __USE_GNU