Fix issue with successful malloc setting ENOMEM errno

Message ID 20210923215357.2962606-1-shorne@gmail.com
State Superseded
Headers
Series Fix issue with successful malloc setting ENOMEM errno |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Stafford Horne Sept. 23, 2021, 9:53 p.m. UTC
  When testing nptl/tst-pthread-attr-affinity-fail fails with:

    error: xsysconf.c:33: sysconf (83): Cannot allocate memory
    error: 1 test failures

This happens as xsysconf checks the errno after running sysconf.
Internally the sysconf request for _SC_NPROCESSORS_CONF on linux
allocates memory.  But there is a problem, even though malloc succeeds
errno is getting set to ENOMEM.  I have traced it as explained below.

On my platform the memory allocation is non-contiguous so we see malloc
do:

 1 brk (0)                - queries brk
 2 brk (addr + increment) - fails and sets ERRNO due to, current vm address
                            space not being large enough.
 3 mmap                   - gets a new address space
 4 malloc succeeds        - however ERRNO is still set to ENOMEM, due to
                            2

This patch clears errno after the fallback mmap call succeeds.
---
 malloc/malloc.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Florian Weimer Sept. 23, 2021, 10:11 p.m. UTC | #1
* Stafford Horne via Libc-alpha:

> On my platform the memory allocation is non-contiguous so we see malloc
> do:
>
>  1 brk (0)                - queries brk
>  2 brk (addr + increment) - fails and sets ERRNO due to, current vm address
>                             space not being large enough.
>  3 mmap                   - gets a new address space
>  4 malloc succeeds        - however ERRNO is still set to ENOMEM, due to
>                             2
>
> This patch clears errno after the fallback mmap call succeeds.

Thanks for tracking down the issue.

> ---
>  malloc/malloc.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 12dfaec81e..7d85238e0a 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -2705,6 +2705,9 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>                    brk = mbrk;
>                    snd_brk = brk + size;
>  
> +		  /* If MORECORE failed it may have set ENOMEM, clear that.  */
> +		  __set_errno (0);
> +
>                    /*
>                       Record that we no longer have a contiguous sbrk region.
>                       After the first time mmap is used as backup, we do not

We cannot set errno to zero, POSIX does not allow this.  What we have to
do instead in such cases is save the old errno value and restore it
after the call that clobbers errno.

The question is whether we should do this malloc.  POSIX allows
successful calls to clobber errno, and there are probably other paths in
malloc which are problematic in this regard.

But the error check in xsysconf is wrong.  I think it should be fixed
like this:

diff --git a/support/xsysconf.c b/support/xsysconf.c
index 2607d3a720..fce7795417 100644
--- a/support/xsysconf.c
+++ b/support/xsysconf.c
@@ -29,7 +29,7 @@ xsysconf (int name)
   int old_errno = errno;
   errno = 0;
   long result = sysconf (name);
-  if (errno != 0)
+  if (result == -1 && errno != 0)
     FAIL_EXIT1 ("sysconf (%d): %m", name);
   errno = old_errno;
   return result;

This is really a simple oversight.  It's a bit embarrassing that we get
this wrong inside glibc, too.  But I'm not quite ready yet for
advocating a policy that successful calls should never change errno. 8-)

Thanks,
Florian
  
Stafford Horne Sept. 23, 2021, 11:17 p.m. UTC | #2
On Fri, Sep 24, 2021 at 12:11:31AM +0200, Florian Weimer wrote:
> * Stafford Horne via Libc-alpha:
> 
> > On my platform the memory allocation is non-contiguous so we see malloc
> > do:
> >
> >  1 brk (0)                - queries brk
> >  2 brk (addr + increment) - fails and sets ERRNO due to, current vm address
> >                             space not being large enough.
> >  3 mmap                   - gets a new address space
> >  4 malloc succeeds        - however ERRNO is still set to ENOMEM, due to
> >                             2
> >
> > This patch clears errno after the fallback mmap call succeeds.
> 
> Thanks for tracking down the issue.

It was an interesting one.

> > ---
> >  malloc/malloc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/malloc/malloc.c b/malloc/malloc.c
> > index 12dfaec81e..7d85238e0a 100644
> > --- a/malloc/malloc.c
> > +++ b/malloc/malloc.c
> > @@ -2705,6 +2705,9 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
> >                    brk = mbrk;
> >                    snd_brk = brk + size;
> >  
> > +		  /* If MORECORE failed it may have set ENOMEM, clear that.  */
> > +		  __set_errno (0);
> > +
> >                    /*
> >                       Record that we no longer have a contiguous sbrk region.
> >                       After the first time mmap is used as backup, we do not
> 
> We cannot set errno to zero, POSIX does not allow this.  What we have to
> do instead in such cases is save the old errno value and restore it
> after the call that clobbers errno.

Right, I was thinking that thanks for clarifying.  I thought sending this simple
patch was good enough to start the discussion.

> The question is whether we should do this malloc.  POSIX allows
> successful calls to clobber errno, and there are probably other paths in
> malloc which are problematic in this regard.

OK, that makes sense, I didn't know it, but good to know.

> But the error check in xsysconf is wrong.  I think it should be fixed
> like this:
> 
> diff --git a/support/xsysconf.c b/support/xsysconf.c
> index 2607d3a720..fce7795417 100644
> --- a/support/xsysconf.c
> +++ b/support/xsysconf.c
> @@ -29,7 +29,7 @@ xsysconf (int name)
>    int old_errno = errno;
>    errno = 0;
>    long result = sysconf (name);
> -  if (errno != 0)
> +  if (result == -1 && errno != 0)
>      FAIL_EXIT1 ("sysconf (%d): %m", name);
>    errno = old_errno;
>    return result;
> 
> This is really a simple oversight.  It's a bit embarrassing that we get
> this wrong inside glibc, too.  But I'm not quite ready yet for
> advocating a policy that successful calls should never change errno. 8-)

OK, I was thinking of a similar patch to xsysconf at first but I got a bit side
tracked seeing malloc was setting errno to ENOMEM.

Let me test the xsysconf patch then send that with a commit message summarizing
some of the bits above.

-Stafford
  
Zack Weinberg Sept. 24, 2021, 1:18 p.m. UTC | #3
On Thu, Sep 23, 2021, at 6:11 PM, Florian Weimer via Libc-alpha wrote:
> We cannot set errno to zero, POSIX does not allow this.

I believe this is an ISO C requirement, not only POSIX.

> I'm not quite ready yet for
> advocating a policy that successful calls should never change errno.

There's a good case for making this be policy for sysconf specifically, since -1 is a possible *successful* return value. Same principle as strtol.

zw
  
DJ Delorie Sept. 24, 2021, 5:08 p.m. UTC | #4
Zack Weinberg via Libc-alpha <libc-alpha@sourceware.org> writes:
> On Thu, Sep 23, 2021, at 6:11 PM, Florian Weimer via Libc-alpha wrote:
>> We cannot set errno to zero, POSIX does not allow this.
>
> I believe this is an ISO C requirement, not only POSIX.

We do, however, allow for a save/restore around internal code where we
don't want those errors to affect a returned error status.
  

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 12dfaec81e..7d85238e0a 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2705,6 +2705,9 @@  sysmalloc (INTERNAL_SIZE_T nb, mstate av)
                   brk = mbrk;
                   snd_brk = brk + size;
 
+		  /* If MORECORE failed it may have set ENOMEM, clear that.  */
+		  __set_errno (0);
+
                   /*
                      Record that we no longer have a contiguous sbrk region.
                      After the first time mmap is used as backup, we do not