Fix for bz14333 -- race between atexit() and exit()

Message ID CALoOobPx5KHn7rGQKTy3UnEq_YnzXnGKrKCLmJHPd5bkipb79g@mail.gmail.com
State New, archived
Headers

Commit Message

Paul Pluzhnikov July 20, 2017, 12:37 a.m. UTC
  On Wed, Jul 19, 2017 at 5:26 AM, Torvald Riegel <triegel@redhat.com> wrote:

> It would be good if you could add comments describing the locking scheme
> you are changing

The locking scheme is kind of trivial: hold the lock while reading or
writing any of the relevant globals.

I've added some comments; please let me know if/where more is desired.

Thanks,


2017-07-19  Paul Pluzhnikov  <ppluzhnikov@google.com>
            Ricky Zhou <rickyz@google.com>
            Anoop V Chakkalakkal <anoop.vijayan@in.ibm.com>

        [BZ #14333]
        * stdlib/cxa_atexit.c (__internal_atexit): Use __exit_funcs_lock.
        (__new_exitfn): Fail registration when we finished at_exit processing.
        * stdlib/on_exit.c (__on_exit): Likewise.
        * stdlib/exit.c (__exit_funcs_done): New variable.
        (__run_exit_handlers): Use __exit_funcs_lock.
        * stdlib/exit.h (__exit_funcs_lock, __exit_funcs_done): New
        declarations.
        * stdlib/Makefile (test-atexit-race, test-quick_at_exit-race)
        (test-__cxa_atexit-race): New tests.
        * stdlib/test-atexit-race-common.c: New.
        * stdlib/test-atexit-race.c: New.
        * stdlib/test-at_quick_exit-race.c: New.
        * stdlib/test-__cxa_atexit-race.c: New.
  

Comments

Torvald Riegel July 24, 2017, 8:07 p.m. UTC | #1
On Wed, 2017-07-19 at 17:37 -0700, Paul Pluzhnikov wrote:
> On Wed, Jul 19, 2017 at 5:26 AM, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > It would be good if you could add comments describing the locking scheme
> > you are changing
> 
> The locking scheme is kind of trivial: hold the lock while reading or
> writing any of the relevant globals.

Even if the scheme itself is not complex in the sense that it's simply
one lock that protects a few pieces of data, figuring out what is
intended can be nontrivial.  It's certainly harder than just reading a
comment, so it makes sense to write this down.

> I've added some comments; please let me know if/where more is desired.

Kind of, I guess.  When you write that __exit_funcs_lock protects
__exit_funcs, do you mean that it also protects the full list that this
global points to?  If so, please say that.

Does that fully remove the need for what looks like an (incorrect)
attempt to build a concurrent list?  I see an atomic_write_barrier in
__on_exit that doesn't seem to have a matching acquire load elsewhere;
there's another atomic_write_barrier that may have been intended to
synchronize with the acquire CAS in __cxa_finalize (but other loads from
that field are plain loads).

So, getting back to the level of detail in comments, what I'm looking
for is text that is sufficient to express the intent behind how
concurreny is handled in this piece of code.  I think for a simple
locking scheme such as this one, having one block of comments around the
central lock is good enough; generally, I think it's nicer to add
comments to the respective data (eg, __exit_funcs) that at least say
something like "See __exit_funcs_lock for concurrency notes.", so that
developers get a heads-up when trying to access the data.  (In this
case, both lock and data are really next to each other, so that might
not add that much.)

Has anyone reviewed this patch in detail yet (including the concurrency
aspect)?  I'm not familiar enough with the exit handler code currently,
so I don't want to dive into this unless (1) nobody else has done it nor
plans to do so and (2) we could still get this into the current release.
  
Paul Pluzhnikov July 24, 2017, 8:14 p.m. UTC | #2
On Mon, Jul 24, 2017 at 1:07 PM, Torvald Riegel <triegel@redhat.com> wrote:
> On Wed, 2017-07-19 at 17:37 -0700, Paul Pluzhnikov wrote:
>> On Wed, Jul 19, 2017 at 5:26 AM, Torvald Riegel <triegel@redhat.com> wrote:
>>
>> > It would be good if you could add comments describing the locking scheme
>> > you are changing
>>
>> The locking scheme is kind of trivial: hold the lock while reading or
>> writing any of the relevant globals.
>
> Even if the scheme itself is not complex in the sense that it's simply
> one lock that protects a few pieces of data, figuring out what is
> intended can be nontrivial.  It's certainly harder than just reading a
> comment, so it makes sense to write this down.
>
>> I've added some comments; please let me know if/where more is desired.
>
> Kind of, I guess.  When you write that __exit_funcs_lock protects
> __exit_funcs, do you mean that it also protects the full list that this
> global points to?  If so, please say that.

Yes, Will do.

> Does that fully remove the need for what looks like an (incorrect)
> attempt to build a concurrent list?

Probably. Let me review these. I suspect they are no longer necessary.

> Has anyone reviewed this patch in detail yet?

I don't believe so.


Thanks,
  
Carlos O'Donell July 24, 2017, 8:20 p.m. UTC | #3
On 07/24/2017 04:07 PM, Torvald Riegel wrote:
> Has anyone reviewed this patch in detail yet (including the concurrency
> aspect)?  I'm not familiar enough with the exit handler code currently,
> so I don't want to dive into this unless (1) nobody else has done it nor
> plans to do so and (2) we could still get this into the current release.
 
I have not reviewed this in detail and I don't plan to review it before
2.26 goes out the door since there are other more important blockers to
the release (bug 21298 which I'm still reviewing).

I do not think bug 14333 (this bug) counts as a blocker. We've had this
bug for years, and we've known about it, and lived with it.

I think we can continue to work on this set of patches to meet our
expectations.

Paul, just for reference look at f8bf15febcaf137bbec5a61101e88cd5a9d56ca8
and the amount of comments for what started off as a "simple change" that
was less than a few lines of code originally posted by VMWare :-)

I don't want to scare you, and you need not be held responsible for
documenting the P&C semantics of the code you touch, that's a responsibility
that all of us collectively bear. So I'm happy to help write more of this up,
but I can't do that until August.
  
Paul Pluzhnikov July 24, 2017, 8:57 p.m. UTC | #4
On Mon, Jul 24, 2017 at 1:20 PM, Carlos O'Donell <carlos@redhat.com> wrote:

> I do not think bug 14333 (this bug) counts as a blocker. We've had this
> bug for years, and we've known about it, and lived with it.

For the record, there is no urgency to get this into 2.26 from my side.

We do hit this bug on and off, and figuring out that it's *that* bug
(again) takes time, and the fix appears to be within reach ;-) but we
can certainly wait a bit more.
  

Patch

diff --git a/stdlib/Makefile b/stdlib/Makefile
index 0314d5926b..7731b53cc8 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -80,7 +80,8 @@  tests		:= tst-strtol tst-strtod testmb testrand testsort testdiv   \
 		   tst-strtol-locale tst-strtod-nan-locale tst-strfmon_l    \
 		   tst-quick_exit tst-thread-quick_exit tst-width	    \
 		   tst-width-stdint tst-strfrom tst-strfrom-locale	    \
-		   tst-getrandom
+		   tst-getrandom test-atexit-race test-at_quick_exit-race   \
+		   test-__cxa_atexit-race
 tests-internal	:= tst-strtod1i tst-strtod3 tst-strtod4 tst-strtod5i \
 		   tst-tls-atexit tst-tls-atexit-nodelete
 tests-static	:= tst-secure-getenv
@@ -89,6 +90,10 @@  ifeq ($(build-hardcoded-path-in-tests),yes)
 tests += tst-empty-env
 endif
 
+LDLIBS-test-atexit-race = $(shared-thread-library)
+LDLIBS-test-at_quick_exit-race = $(shared-thread-library)
+LDLIBS-test-__cxa_atexit-race = $(shared-thread-library)
+
 ifeq ($(have-cxx-thread_local),yes)
 CFLAGS-tst-quick_exit.o = -std=c++11
 LDLIBS-tst-quick_exit = -lstdc++
diff --git a/stdlib/cxa_atexit.c b/stdlib/cxa_atexit.c
index ce5d9f22b4..a21d865844 100644
--- a/stdlib/cxa_atexit.c
+++ b/stdlib/cxa_atexit.c
@@ -26,16 +26,25 @@ 
 
 #undef __cxa_atexit
 
+/* We change global data, so we need locking.  */
+__libc_lock_define_initialized (, __exit_funcs_lock)
+
 
 int
 attribute_hidden
 __internal_atexit (void (*func) (void *), void *arg, void *d,
 		   struct exit_function_list **listp)
 {
-  struct exit_function *new = __new_exitfn (listp);
+  struct exit_function *new;
+
+  __libc_lock_lock (__exit_funcs_lock);
+  new = __new_exitfn (listp);
 
   if (new == NULL)
-    return -1;
+    {
+      __libc_lock_unlock (__exit_funcs_lock);
+      return -1;
+    }
 
 #ifdef PTR_MANGLE
   PTR_MANGLE (func);
@@ -45,6 +54,7 @@  __internal_atexit (void (*func) (void *), void *arg, void *d,
   new->func.cxa.dso_handle = d;
   atomic_write_barrier ();
   new->flavor = ef_cxa;
+  __libc_lock_unlock (__exit_funcs_lock);
   return 0;
 }
 
@@ -60,10 +70,6 @@  __cxa_atexit (void (*func) (void *), void *arg, void *d)
 libc_hidden_def (__cxa_atexit)
 
 
-/* We change global data, so we need locking.  */
-__libc_lock_define_initialized (static, lock)
-
-
 static struct exit_function_list initial;
 struct exit_function_list *__exit_funcs = &initial;
 uint64_t __new_exitfn_called;
@@ -76,7 +82,10 @@  __new_exitfn (struct exit_function_list **listp)
   struct exit_function *r = NULL;
   size_t i = 0;
 
-  __libc_lock_lock (lock);
+  if (__exit_funcs_done)
+    /* exit code finished processing all handlers
+       so fail this registration */
+    return NULL;
 
   for (l = *listp; l != NULL; p = l, l = l->next)
     {
@@ -127,7 +136,5 @@  __new_exitfn (struct exit_function_list **listp)
       ++__new_exitfn_called;
     }
 
-  __libc_lock_unlock (lock);
-
   return r;
 }
diff --git a/stdlib/exit.c b/stdlib/exit.c
index c0b6d666c7..a3205ae8c0 100644
--- a/stdlib/exit.c
+++ b/stdlib/exit.c
@@ -19,11 +19,14 @@ 
 #include <stdlib.h>
 #include <unistd.h>
 #include <sysdep.h>
+#include <libc-lock.h>
 #include "exit.h"
 
 #include "set-hooks.h"
 DEFINE_HOOK (__libc_atexit, (void))
 
+/* Initialise the processing complete flag to false.  */
+bool __exit_funcs_done = false;
 
 /* Call all functions registered with `atexit' and `on_exit',
    in the reverse of the order in which they were registered
@@ -44,14 +47,31 @@  __run_exit_handlers (int status, struct exit_function_list **listp,
      the functions registered with `atexit' and `on_exit'. We call
      everyone on the list and use the status value in the last
      exit (). */
-  while (*listp != NULL)
+  while (true)
     {
-      struct exit_function_list *cur = *listp;
+      struct exit_function_list *cur;
+
+    restart:
+      __libc_lock_lock (__exit_funcs_lock);
+      cur = *listp;
+
+      if (cur == NULL)
+	{
+	  /* Exit processing complete.  We will not allow any more
+	     atexit/on_exit registrations.  */
+	  __exit_funcs_done = true;
+	  __libc_lock_unlock (__exit_funcs_lock);
+	  break;
+	}
 
       while (cur->idx > 0)
 	{
 	  const struct exit_function *const f =
 	    &cur->fns[--cur->idx];
+	  const uint64_t new_exitfn_called = __new_exitfn_called;
+
+	  /* Unlock the list while we call into user-provided code.  */
+	  __libc_lock_unlock (__exit_funcs_lock);
 	  switch (f->flavor)
 	    {
 	      void (*atfct) (void);
@@ -83,6 +103,16 @@  __run_exit_handlers (int status, struct exit_function_list **listp,
 	      cxafct (f->func.cxa.arg, status);
 	      break;
 	    }
+	  /* Re-lock again before looking at global state.  */
+	  __libc_lock_lock (__exit_funcs_lock);
+
+	  if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
+	    {
+	      /* The last exit function, or another thread, has registered
+		 more exit functions.  Start the loop over.  */
+	      __libc_lock_unlock (__exit_funcs_lock);
+	      goto restart;
+	    }
 	}
 
       *listp = cur->next;
@@ -90,6 +120,8 @@  __run_exit_handlers (int status, struct exit_function_list **listp,
 	/* Don't free the last element in the chain, this is the statically
 	   allocate element.  */
 	free (cur);
+
+      __libc_lock_unlock (__exit_funcs_lock);
     }
 
   if (run_list_atexit)
diff --git a/stdlib/exit.h b/stdlib/exit.h
index 7f2e679246..21776f6dcb 100644
--- a/stdlib/exit.h
+++ b/stdlib/exit.h
@@ -20,6 +20,7 @@ 
 
 #include <stdbool.h>
 #include <stdint.h>
+#include <libc-lock.h>
 
 enum
 {
@@ -57,11 +58,24 @@  struct exit_function_list
     size_t idx;
     struct exit_function fns[32];
   };
+
 extern struct exit_function_list *__exit_funcs attribute_hidden;
 extern struct exit_function_list *__quick_exit_funcs attribute_hidden;
+/* flag to mark that all registered atexit/onexit handlers are called */
+extern bool __exit_funcs_done attribute_hidden;
+extern uint64_t __new_exitfn_called attribute_hidden;
+
+/* This lock protects above globals: __exit_funcs, __quick_exit_funcs,
+   __exit_funcs_done and __new_exitfn_called (must be held while reading
+   or modifying any of these globals) against simultaneous access from
+   atexit/on_exit/at_quick_exit in multiple threads, and also from
+   simultaneous access while another thread is in the middle of calling
+   exit handlers.  See BZ#14333.  */
+__libc_lock_define (extern, __exit_funcs_lock);
+
 
 extern struct exit_function *__new_exitfn (struct exit_function_list **listp);
-extern uint64_t __new_exitfn_called attribute_hidden;
+
 
 extern void __run_exit_handlers (int status,
 				 struct exit_function_list **listp,
diff --git a/stdlib/on_exit.c b/stdlib/on_exit.c
index 83845e76d8..2db3063db9 100644
--- a/stdlib/on_exit.c
+++ b/stdlib/on_exit.c
@@ -24,10 +24,16 @@ 
 int
 __on_exit (void (*func) (int status, void *arg), void *arg)
 {
-  struct exit_function *new = __new_exitfn (&__exit_funcs);
+  struct exit_function *new;
+
+   __libc_lock_lock (__exit_funcs_lock);
+  new = __new_exitfn (&__exit_funcs);
 
   if (new == NULL)
-    return -1;
+    {
+      __libc_lock_unlock (__exit_funcs_lock);
+      return -1;
+    }
 
 #ifdef PTR_MANGLE
   PTR_MANGLE (func);
@@ -36,6 +42,7 @@  __on_exit (void (*func) (int status, void *arg), void *arg)
   new->func.on.arg = arg;
   atomic_write_barrier ();
   new->flavor = ef_on;
+  __libc_lock_unlock (__exit_funcs_lock);
   return 0;
 }
 weak_alias (__on_exit, on_exit)
diff --git a/stdlib/test-__cxa_atexit-race.c b/stdlib/test-__cxa_atexit-race.c
new file mode 100644
index 0000000000..b86f6ce212
--- /dev/null
+++ b/stdlib/test-__cxa_atexit-race.c
@@ -0,0 +1,34 @@ 
+/* A test for __cxa_atexit/exit race from bz14333.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file must be run from within a directory called "stdlib".  */
+
+#include <stdio.h>
+
+#define CALL_ATEXIT __cxa_atexit (&no_op, NULL, NULL)
+#define CALL_EXIT exit (0)
+
+int __cxa_atexit (void (*func) (void *), void *arg, void *d);
+
+static void
+no_op (void *ignored)
+{
+}
+
+#include <stdlib/test-atexit-race-common.c>
diff --git a/stdlib/test-at_quick_exit-race.c b/stdlib/test-at_quick_exit-race.c
new file mode 100644
index 0000000000..2521a6b77c
--- /dev/null
+++ b/stdlib/test-at_quick_exit-race.c
@@ -0,0 +1,30 @@ 
+/* A test for at_quick_exit/quick_exit race from bz14333.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file must be run from within a directory called "stdlib".  */
+
+#define CALL_ATEXIT at_quick_exit (&no_op)
+#define CALL_EXIT quick_exit (0)
+
+static void
+no_op (void)
+{
+}
+
+#include <stdlib/test-atexit-race-common.c>
diff --git a/stdlib/test-atexit-race-common.c b/stdlib/test-atexit-race-common.c
new file mode 100644
index 0000000000..c4cbd9e592
--- /dev/null
+++ b/stdlib/test-atexit-race-common.c
@@ -0,0 +1,62 @@ 
+/* Support file for atexit/exit, at_quick_exit/quick_exit, etc. race tests
+   from bz14333.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file must be run from within a directory called "stdlib".  */
+
+#if !defined(CALL_EXIT) || !defined(CALL_ATEXIT)
+#error Must define CALL_EXIT and CALL_ATEXIT before using this file.
+#endif
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+const size_t kNumThreads = 1024;
+const size_t kNumHandlers = 1024;
+
+static void *
+threadfunc (void *unused)
+{
+  size_t i;
+  for (i = 0; i < kNumHandlers; ++i) {
+    CALL_ATEXIT;
+  }
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  size_t i;
+  pthread_t thr;
+  pthread_attr_t attr;
+
+  pthread_attr_init (&attr);
+  pthread_attr_setdetachstate (&attr, 1);
+
+  for (i = 0; i < kNumThreads; ++i) {
+    pthread_create (&thr, &attr, threadfunc, NULL);
+  }
+
+  CALL_EXIT;
+}
+
+#define TEST_FUNCTION do_test
+#include <support/test-driver.c>
diff --git a/stdlib/test-atexit-race.c b/stdlib/test-atexit-race.c
new file mode 100644
index 0000000000..b183ecfd7e
--- /dev/null
+++ b/stdlib/test-atexit-race.c
@@ -0,0 +1,30 @@ 
+/* A test for atexit/exit race from bz14333.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file must be run from within a directory called "stdlib".  */
+
+#define CALL_ATEXIT atexit (&no_op)
+#define CALL_EXIT exit (0)
+
+static void
+no_op (void)
+{
+}
+
+#include <stdlib/test-atexit-race-common.c>