[PATCHv8] nss_db: protect against empty mappings

Message ID xna7e1tk70.fsf@greed.delorie.com
State Superseded
Headers

Commit Message

DJ Delorie June 28, 2019, 10:32 p.m. UTC
  Florian Weimer <fweimer@redhat.com> writes:
>> You could use xasprintf and free the pointer.
>
> Please also post the planned commit message.  Thanks.

latest, with xasprintf, and full commit blob posted...

From e465ac6a0cf322c72028d54cc6eb534bb035bd0d Mon Sep 17 00:00:00 2001
From: DJ Delorie <dj@redhat.com>
Date: Fri, 28 Jun 2019 18:30:00 -0500
Subject: nss_db: fix endent wrt NULL mappings [BZ #24695] [BZ #24696]

nss_db allows for getpwent et al to be called without a set*ent,
but it only works once.  After the last get*ent a set*ent is
required to restart, because the end*ent did not properly reset
the module.  Resetting it to NULL allows for a proper restart.

If the database doesn't exist, however, end*ent erroniously called
munmap which set errno.

The test case runs "makedb" inside the testroot, so needs selinux
DSOs installed.
  

Comments

Florian Weimer July 10, 2019, 9:51 a.m. UTC | #1
* DJ Delorie:

> diff --git a/ChangeLog b/ChangeLog
> index aece032385..37be9a998a 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,19 @@
> +2019-06-28  DJ Delorie  <dj@redhat.com>
> +	    Sergei Trofimovich <slyfox@inbox.ru>
> +
> +	[BZ #24696]
> +	[BZ #24695]
> +	* nss/nss_db/db-open.c (internal_endent): Protect against NULL
> +	mappings.
> +	* nss/tst-nss-db-endgrent.c: New.
> +	* nss/tst-nss-db-endgrent.root: New.
> +	* nss/tst-nss-db-endpwent.c: New.
> +	* nss/tst-nss-db-endpwent.root: New.
> +	* nss/Makefile: Add new tests.
> +	* support/links-dso-program-c.c: Add selinux dependency.
> +	* support/links-dso-program.cc: Add selinux dependency.
> +	* support/Makefile: Build those with -lselinux if enabled.
> +

This version looks okay to me.  Thanks.

Florian
  
Rafal Luzynski July 12, 2019, 12:12 a.m. UTC | #2
10.07.2019 20:52 DJ Delorie <dj@redhat.com> wrote:
> 
> Thanks!  Pushed.

Hello DJ,

This commit seems to cause failure.  After "make check"
nss/tst-nss-db-endgrent.out contains:

    error: tst-nss-db-endgrent.c:50: endgrent set errno to 22

    error: 1 test failures

Does it maybe still run the old version of the updated function?
The OS is Fedora 30, x86_64.

Regards,

Rafal
  
DJ Delorie July 12, 2019, 4:21 a.m. UTC | #3
If you applied the patch and just ran "make" to rebuild the parts that
changed, it won't update the testroot - you need to either remove the
testroot.pristine directory, or at least remove the
testroot.pristine/install.stamp file.

And yes, this was an intentional choice - running "make install" every
single time would have been prohibitively expensive; see the
"test-in-container: Install locales into the test container" thread for
more discussion on test times.
  
Rafal Luzynski July 12, 2019, 10:24 a.m. UTC | #4
12.07.2019 06:21 DJ Delorie <dj@redhat.com> wrote:
> 
> If you applied the patch and just ran "make" to rebuild the parts that
> changed, it won't update the testroot

Yes, that's exactly what I did.

> - you need to either remove the
> testroot.pristine directory, or at least remove the
> testroot.pristine/install.stamp file.
> [...]

Yes, after the full rebuild from scratch all tests pass correctly.
Thank you for explanation and sorry for the noise.

Regards,

Rafal
  
Carlos O'Donell July 12, 2019, 11:36 a.m. UTC | #5
On 7/12/19 6:24 AM, Rafal Luzynski wrote:
> 12.07.2019 06:21 DJ Delorie <dj@redhat.com> wrote:
>>
>> If you applied the patch and just ran "make" to rebuild the parts that
>> changed, it won't update the testroot
> 
> Yes, that's exactly what I did.
> 
>> - you need to either remove the
>> testroot.pristine directory, or at least remove the
>> testroot.pristine/install.stamp file.
>> [...]
> 
> Yes, after the full rebuild from scratch all tests pass correctly.
> Thank you for explanation and sorry for the noise.

Rafal,

If you think this is a bad developer experience, then please feel
free to voice your concerns.

We're not sure how to get the chroot to be rebuilt easily without
lots of additional cost.

DJ,

I wonder if we can't make the workflow different, an this is purely
a psychological thing:

(a) Make a "build done" stamp.
(b) Compare "build done" stamp with "chroot stamp" and if they are
     out of sync, all container tests *run* but have the final exit
     code overriden with fail with a new error code which is "OUT OF SYNC"
     You can still inspect the original failure code in the logs,
     but the container wrapper alters it on return.
(c) Developers see this and can ignore it, but at least the tests
     don't run with an erroneous chroot returning results.
(d) Developers must run "make update" to clear this problem, and
     the test summary script should indicate this when it see any
     "OUT OF SYNC" failures.

Thoughts?
  
Rafal Luzynski July 16, 2019, 9:54 a.m. UTC | #6
12.07.2019 13:36 Carlos O'Donell <carlos@redhat.com> wrote:
> [...]
> Rafal,
> 
> If you think this is a bad developer experience, then please feel
> free to voice your concerns.

This is little off-topic so I'm changing the subject of the
thread.

To some extent it is indeed a kind of bad experience.  In a perfect
world, every "make" run should rebuild all binaries whose source
code had changed and reuse those existing binaries from the previous
builds whose source code had not changed.  Usually this works fine
but sometimes does not.  In these rare cases "delete everything and
rebuild from scratch" is the correct answer.  I don't want to waste
my time and other developers' time to build a system where every
"make" works incrementally correctly because the aim is to ensure
that the rebuild from scratch works correctly because this is what
the distros are doing.  Incremental build is necessary only for our
small group of developers and, again, it works fine most of the time.

So maybe instead of focusing on incremental builds I should explain
why I perceive building as so important.  I always want to ensure
that my patches do not break glibc, including ensuring that they
don't break today (if yesterday everything was OK it does not mean
it is OK today because something might have changed upstream).
This is a simple and repetitive task which could be done automatically.

So the question is: can we have a CI/CD mechanism in glibc project
which would perform all builds and tests for every commit and raise
an alarm if anything goes wrong?  Can it be extended to verifying
patches when they are posted on this mailing list, before pushing
to the main repository?  Can it be part of sourceware.org, maybe
integrated with patchwork.sourceware.org?  Some big source code
management systems like GitHub or GitLab already include such
features, can we reuse them?  If not, can we have an unofficial
clone at GitHub (well, we already have one [1]) to do that task?

Such a mechanism would be useful to detect use cases like:
* something goes wrong but the problem is only in my machine
  because the online service confirms there is no problem;
* the code works fine in my machine and in all other developers'
  machines but fails in one exotic hardware architecture.

Regards,

Rafal

[1] https://github.com/bminor/glibc/
  
Florian Weimer July 16, 2019, noon UTC | #7
* Rafal Luzynski:

> Such a mechanism would be useful to detect use cases like:
> * something goes wrong but the problem is only in my machine
>   because the online service confirms there is no problem;
> * the code works fine in my machine and in all other developers'
>   machines but fails in one exotic hardware architecture.

We already have build and test bots, but I haven't been able to make any
sense of their output. 8-(

I would love to have a better contributor experience.  CI could be part
of that, but the kind of resources needed to give feedback in a
reasonable time frame (say, less than one hour) are difficult to come by
at this point.  (And we are not even talking about actually running
tests, just making sure that everything still builds.)

Some more notes about the build system:

Most of my development work involves short-circuiting the build system
for testing (make -j8 -O subdirs=… check).  I have to be really careful when I
do that because in some cases, it will corrupt your build tree, and then
I need to throw everything away and build from scratch again.  It's
considerably safer to do this instead:

  make -j8 -O && make -j8 -O subdirs=… check

But for me, it adds 12 seconds to every test tweak.  (For reference:
musl builds from scratch in less than 10 seconds on this hardware.)

What's worse, for me, the test-in-continer framework has made things
quite worse, but no one has been able to reproduce that.  Given things
are so brittle, I hesitate to encourage adoption of this style.  But not
being able to do something like that makes it really hard for new
contributors.

One experiment I'd like to do (and maybe someone wants to help with
that): Instrument CC and CXX invocations with something that captures
the current directory and the command lines during a regular build.
Afterwards, replay all the compiler invocations, in parallel.  (This may
need some manual tweaks for adding barriers, but perhaps not if we run
this test on an already-built tree.)  This should gives a number: How
much time does it take, at minimum, to build glibc, without make
overhead or artificial serialization.  It will tell us how inefficient
the current build system really is.  Is the make overhead for a
from-scratch build just those 12 seconds I mentioned above, or is it
much larger?

This should give us some guidance whether we need to focus on
from-scratch performance, or on making incremental builds accurate.

My feeling is that we need to tackle this first before we can offer a CI
workflow because even with remote CI, local builds will still be an
important part of the developer experience.  And if we can speed up
building significantly, we perhaps do not have to investigate ways to
run build-many-glibcs.py in a distributed fashion, on a cluster of
machines.

Thanks,
Florian
  
Carlos O'Donell July 16, 2019, 8:14 p.m. UTC | #8
On 7/16/19 8:00 AM, Florian Weimer wrote:
> My feeling is that we need to tackle this first before we can offer a CI
> workflow because even with remote CI, local builds will still be an
> important part of the developer experience.  And if we can speed up
> building significantly, we perhaps do not have to investigate ways to
> run build-many-glibcs.py in a distributed fashion, on a cluster of
> machines.

I agree.

However, I think we can discuss gitlab/CI style setups and they will
benefit from any speedup we have in the build cycle.

I have an item for discussion at GNU Cauldron 2019 regarding this
kind of workflow.
  
Zack Weinberg July 17, 2019, 5:58 p.m. UTC | #9
On Tue, Jul 16, 2019 at 8:00 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> One experiment I'd like to do (and maybe someone wants to help with
> that): Instrument CC and CXX invocations with something that captures
> the current directory and the command lines during a regular build.
> Afterwards, replay all the compiler invocations, in parallel.  (This may
> need some manual tweaks for adding barriers, but perhaps not if we run
> this test on an already-built tree.)  This should gives a number: How
> much time does it take, at minimum, to build glibc, without make
> overhead or artificial serialization.  It will tell us how inefficient
> the current build system really is.  Is the make overhead for a
> from-scratch build just those 12 seconds I mentioned above, or is it
> much larger?
>
> This should give us some guidance whether we need to focus on
> from-scratch performance, or on making incremental builds accurate.

This isn't the data point you were asking for, but it's a
complementary one: I wrote a simple program (attached; it's in Python,
so we've got interpreter overhead in here, but I expect the dominant
factor is OS overhead) that calls lstat() on every file in a set of
directory trees, except not descending into any subdirectory named
'.git', and records the result in a data structure, and then reports
how long it took to do that.  This should approximate the minimum
possible time for an ideal build tool to determine that an incremental
build has nothing to do.

On the computer I'm typing this on (Xeon E3-1245v3, Linux 4.19,
speculative execution mitigations active, SSD), I ran this test 50
times on a glibc source and build tree and got a median time of 0.181
seconds, with interquartile range of 0.00163 seconds.

zw
  
Florian Weimer July 17, 2019, 7:05 p.m. UTC | #10
* Zack Weinberg:

> This isn't the data point you were asking for, but it's a
> complementary one: I wrote a simple program (attached; it's in Python,
> so we've got interpreter overhead in here, but I expect the dominant
> factor is OS overhead) that calls lstat() on every file in a set of
> directory trees, except not descending into any subdirectory named
> '.git', and records the result in a data structure, and then reports
> how long it took to do that.  This should approximate the minimum
> possible time for an ideal build tool to determine that an incremental
> build has nothing to do.

An ideal build tool could do this multi-threaded.  That's not so
outrageous: Even make manages to distribute some of its overhead across
several processes running in parallel.

> On the computer I'm typing this on (Xeon E3-1245v3, Linux 4.19,
> speculative execution mitigations active, SSD), I ran this test 50
> times on a glibc source and build tree and got a median time of 0.181
> seconds, with interquartile range of 0.00163 seconds.

It might be instructive to compare this with “git status” (which is
multi-threaded these days), on a fake tree with both the sources and the
build tree.

Thanks,
Florian
  

Patch

diff --git a/ChangeLog b/ChangeLog
index aece032385..37be9a998a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@ 
+2019-06-28  DJ Delorie  <dj@redhat.com>
+	    Sergei Trofimovich <slyfox@inbox.ru>
+
+	[BZ #24696]
+	[BZ #24695]
+	* nss/nss_db/db-open.c (internal_endent): Protect against NULL
+	mappings.
+	* nss/tst-nss-db-endgrent.c: New.
+	* nss/tst-nss-db-endgrent.root: New.
+	* nss/tst-nss-db-endpwent.c: New.
+	* nss/tst-nss-db-endpwent.root: New.
+	* nss/Makefile: Add new tests.
+	* support/links-dso-program-c.c: Add selinux dependency.
+	* support/links-dso-program.cc: Add selinux dependency.
+	* support/Makefile: Build those with -lselinux if enabled.
+
 2019-06-28  Wilco Dijkstra  <wdijkstr@arm.com>
 
 	* benchtests/bench-math-inlines.c: Increase iterations.
diff --git a/nss/Makefile b/nss/Makefile
index 95081bddc5..a15c3b7d90 100644
--- a/nss/Makefile
+++ b/nss/Makefile
@@ -61,7 +61,9 @@  xtests			= bug-erange
 
 tests-container = \
 			  tst-nss-test3 \
-			  tst-nss-files-hosts-long
+			  tst-nss-files-hosts-long \
+			  tst-nss-db-endpwent \
+			  tst-nss-db-endgrent
 
 # Tests which need libdl
 ifeq (yes,$(build-shared))
diff --git a/nss/nss_db/db-open.c b/nss/nss_db/db-open.c
index 8a83d6b930..3fa11e9ab0 100644
--- a/nss/nss_db/db-open.c
+++ b/nss/nss_db/db-open.c
@@ -63,5 +63,9 @@  internal_setent (const char *file, struct nss_db_map *mapping)
 void
 internal_endent (struct nss_db_map *mapping)
 {
-  munmap (mapping->header, mapping->len);
+  if (mapping->header != NULL)
+    {
+      munmap (mapping->header, mapping->len);
+      mapping->header = NULL;
+    }
 }
diff --git a/nss/tst-nss-db-endgrent.c b/nss/tst-nss-db-endgrent.c
new file mode 100644
index 0000000000..367cc6c901
--- /dev/null
+++ b/nss/tst-nss-db-endgrent.c
@@ -0,0 +1,54 @@ 
+/* Test for endgrent changing errno for BZ #24696
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <sys/types.h>
+#include <grp.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <support/check.h>
+#include <support/support.h>
+
+/* The following test verifies that if the db NSS Service is initialized
+   with no database (getgrent), that a subsequent closure (endgrent) does
+   not set errno. In the case of the db service it is not an error to close
+   the service and so it should not set errno.  */
+
+static int
+do_test (void)
+{
+  /* Just make sure it's not there, although usually it won't be.  */
+  unlink ("/var/db/group.db");
+
+  /* This, in conjunction with the testroot's nsswitch.conf, causes
+     the nss_db module to be "connected" and initialized - but the
+     testroot has no group.db, so no mapping will be created.  */
+  getgrent ();
+
+  errno = 0;
+
+  /* Before the fix, this would call munmap (NULL) and set errno.  */
+  endgrent ();
+
+  if (errno != 0)
+    FAIL_EXIT1 ("endgrent set errno to %d\n", errno);
+
+  return 0;
+}
+#include <support/test-driver.c>
diff --git a/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf b/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf
new file mode 100644
index 0000000000..21471df94f
--- /dev/null
+++ b/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf
@@ -0,0 +1 @@ 
+group : db files
diff --git a/nss/tst-nss-db-endpwent.c b/nss/tst-nss-db-endpwent.c
new file mode 100644
index 0000000000..cb85410b7c
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.c
@@ -0,0 +1,66 @@ 
+/* Test for endpwent->getpwent crash for BZ #24695
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <pwd.h>
+
+#include <support/support.h>
+#include <support/check.h>
+
+/* It is entirely allowed to start with a getpwent call without
+   resetting the state of the service via a call to setpwent.
+   You can also call getpwent more times than you have entries in
+   the service, and it should not fail.  This test iteratates the
+   database once, gets to the end, and then attempts a second
+   iteration to look for crashes.  */
+
+static void
+try_it (void)
+{
+  struct passwd *pw;
+
+  /* setpwent is intentionally omitted here.  The first call to
+     getpwent detects that it's first and initializes.  The second
+     time try_it is called, this "first call" was not detected before
+     the fix, and getpwent would crash.  */
+
+  while ((pw = getpwent ()) != NULL)
+    ;
+
+  /* We only care if this segfaults or not.  */
+  endpwent ();
+}
+
+static int
+do_test (void)
+{
+  char *cmd;
+
+  cmd = xasprintf ("%s/makedb -o /var/db/passwd.db /var/db/passwd.in",
+		   support_bindir_prefix);
+  system (cmd);
+  free (cmd);
+
+  try_it ();
+  try_it ();
+
+  return 0;
+}
+#include <support/test-driver.c>
diff --git a/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf b/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf
new file mode 100644
index 0000000000..593ffc564a
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf
@@ -0,0 +1 @@ 
+passwd: db
diff --git a/nss/tst-nss-db-endpwent.root/var/db/passwd.in b/nss/tst-nss-db-endpwent.root/var/db/passwd.in
new file mode 100644
index 0000000000..98f39126ef
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.root/var/db/passwd.in
@@ -0,0 +1,4 @@ 
+.root root:x:0:0:root:/root:/bin/bash
+=0 root:x:0:0:root:/root:/bin/bash
+.bin bin:x:1:1:bin:/bin:/sbin/nologin
+=1 bin:x:1:1:bin:/bin:/sbin/nologin
diff --git a/support/Makefile b/support/Makefile
index 56c1ed43bb..ab66913a02 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -191,6 +191,11 @@  LINKS_DSO_PROGRAM = links-dso-program
 LDLIBS-links-dso-program = -lstdc++ -lgcc -lgcc_s $(libunwind)
 endif
 
+ifeq (yes,$(have-selinux))
+LDLIBS-$(LINKS_DSO_PROGRAM) += -lselinux
+endif
+
+
 LDLIBS-test-container = $(libsupport)
 
 others += test-container
diff --git a/support/links-dso-program-c.c b/support/links-dso-program-c.c
index d28a28a0d0..5fcbab2c17 100644
--- a/support/links-dso-program-c.c
+++ b/support/links-dso-program-c.c
@@ -1,9 +1,26 @@ 
 #include <stdio.h>
 
+/* makedb needs selinux dso's.  */
+#ifdef HAVE_SELINUX
+# include <selinux/selinux.h>
+#endif
+
+/* The purpose of this file is to indicate to the build system which
+   shared objects need to be copied into the testroot, such as gcc or
+   selinux support libraries.  This program is never executed, only
+   scanned for dependencies on shared objects, so the code below may
+   seem weird - it's written to survive gcc optimization and force
+   such dependencies.
+*/
+
 int
 main (int argc, char **argv)
 {
   /* Complexity to keep gcc from optimizing this away.  */
   printf ("This is a test %s.\n", argc > 1 ? argv[1] : "null");
+#ifdef HAVE_SELINUX
+  /* This exists to force libselinux.so to be required.  */
+  printf ("selinux %d\n", is_selinux_enabled ());
+#endif
   return 0;
 }
diff --git a/support/links-dso-program.cc b/support/links-dso-program.cc
index dba6976c06..4bc2411086 100644
--- a/support/links-dso-program.cc
+++ b/support/links-dso-program.cc
@@ -1,11 +1,28 @@ 
 #include <iostream>
 
+/* makedb needs selinux dso's.  */
+#ifdef HAVE_SELINUX
+# include <selinux/selinux.h>
+#endif
+
 using namespace std;
 
+/* The purpose of this file is to indicate to the build system which
+   shared objects need to be copied into the testroot, such as gcc or
+   selinux support libraries.  This program is never executed, only
+   scanned for dependencies on shared objects, so the code below may
+   seem weird - it's written to survive gcc optimization and force
+   such dependencies.
+*/
+
 int
 main (int argc, char **argv)
 {
   /* Complexity to keep gcc from optimizing this away.  */
   cout << (argc > 1 ? argv[1] : "null");
+#ifdef HAVE_SELINUX
+  /* This exists to force libselinux.so to be required.  */
+  cout << "selinux " << is_selinux_enabled ();
+#endif
   return 0;
 }