malloc: Run fork handler as late as possible [BZ #19431]

Message ID 56BBAF3D.5030905@redhat.com
State Superseded
Headers

Commit Message

Florian Weimer Feb. 10, 2016, 9:44 p.m. UTC
  Previously, a thread M invoking fork would acquire locks in this order:

  (M1) malloc arena locks (in the registered fork handler)
  (M2) libio list lock

A thread F invoking flush (NULL) would acquire locks in this order:

  (F1) libio list lock
  (F2) individual _IO_FILE locks

A thread G running getdelim would use this order:

  (G1) _IO_FILE lock
  (G2) malloc arena lock

After executing (M1), (F1), (G1), none of the threads can make progress.

This commit changes the fork lock order to:

  (M'1) libio list lock
  (M'2) malloc arena locks

It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.

I couldn't test the Hurd bits, but the changes look straightforward enough.

Florian
  

Comments

Torvald Riegel Feb. 15, 2016, 6:10 p.m. UTC | #1
On Wed, 2016-02-10 at 22:44 +0100, Florian Weimer wrote:
> Previously, a thread M invoking fork would acquire locks in this order:
> 
>   (M1) malloc arena locks (in the registered fork handler)
>   (M2) libio list lock
> 
> A thread F invoking flush (NULL) would acquire locks in this order:
> 
>   (F1) libio list lock
>   (F2) individual _IO_FILE locks
> 
> A thread G running getdelim would use this order:
> 
>   (G1) _IO_FILE lock
>   (G2) malloc arena lock
> 
> After executing (M1), (F1), (G1), none of the threads can make progress.
> 
> This commit changes the fork lock order to:
> 
>   (M'1) libio list lock
>   (M'2) malloc arena locks
> 
> It explicitly encodes the lock order in the implementations of fork,
> and does not rely on the registration order, thus avoiding the deadlock.
> 
> I couldn't test the Hurd bits, but the changes look straightforward enough.

Are those changes, and thus the new scheme, documented anywhere?

I think it's really worthwhile to always update and improve the
documentation, especially if we're kind of (re)discovering knowledge
about the current implementation such as in this case.  I know this is
extra work, but if this isn't documented, it's less likely anybody but
you will be aware of how this all works.

> diff --git a/malloc/bug19431.c b/malloc/bug19431.c
> new file mode 100644
> index 0000000..3bd876b
> --- /dev/null
> +++ b/malloc/bug19431.c

I haven't seen this naming scheme before. Could you put a tst- prefix in
there, or something like that?

> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
> new file mode 100644
> index 0000000..2fef840
> --- /dev/null
> +++ b/malloc/malloc-private.h

Should this perhaps be malloc-internal.h to match libc-internal.h?
  
Florian Weimer Feb. 18, 2016, 4:15 p.m. UTC | #2
On 02/15/2016 07:10 PM, Torvald Riegel wrote:

>> It explicitly encodes the lock order in the implementations of fork,
>> and does not rely on the registration order, thus avoiding the deadlock.
>>
>> I couldn't test the Hurd bits, but the changes look straightforward enough.
> 
> Are those changes, and thus the new scheme, documented anywhere?

Fork handlers used by the implementation should be invisible to
applications.  The old one wasn't, and this was a bug.

> I think it's really worthwhile to always update and improve the
> documentation, especially if we're kind of (re)discovering knowledge
> about the current implementation such as in this case.  I know this is
> extra work, but if this isn't documented, it's less likely anybody but
> you will be aware of how this all works.

Do you mean something like this inside the fork implementation?

  /* Acquire malloc locks.  This needs to come last because fork
     handlers may use malloc, and the libio list lock has an indirect
     malloc dependency as well (via the getdelim function).  */
  __malloc_fork_lock_parent ();

>> diff --git a/malloc/bug19431.c b/malloc/bug19431.c
>> new file mode 100644
>> index 0000000..3bd876b
>> --- /dev/null
>> +++ b/malloc/bug19431.c
> 
> I haven't seen this naming scheme before. Could you put a tst- prefix in
> there, or something like that?

Okay, I'm going to do that.  GCC uses names of the form “pr33880.c”,
maybe that's where I got the idea.

>> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
>> new file mode 100644
>> index 0000000..2fef840
>> --- /dev/null
>> +++ b/malloc/malloc-private.h
> 
> Should this perhaps be malloc-internal.h to match libc-internal.h?

We also have libio/libioP.h, and include/malloc.h.  But I think a
separate header makes sense if its contents is truly internal.

What are the preferences here?

(I hope we will eventually stop build test cases against the internal
headers in include/.)

Florian
  
Florian Weimer March 10, 2016, 1:35 p.m. UTC | #3
On 02/10/2016 10:44 PM, Florian Weimer wrote:
> 
> 2016-02-10  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #19431]
> 	Run the malloc fork handler as late as possible to avoid deadlocks.
> 	* malloc/malloc-private.h: New file.
> 	* malloc/malloc.c: Include it.
> 	* malloc/arena.c (ATFORK_MEM): Remove.
> 	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
> 	Update comment.
> 	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
> 	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
> 	Remove outdated comment.
> 	(ptmalloc_init): Do not call thread_atfork.  Remove
> 	thread_atfork_static.
> 	* malloc/bug19431.c: New file.
> 	* Makefile (tests): Add it.
> 	(bug19431): Link against libpthread.
> 	* manual/memory.texi (Aligned Memory Blocks): Update safety
> 	annotation comments.
> 	* sysdeps/nptl/fork.c (__libc_fork): Call
> 	__malloc_fork_lock_parent, __malloc_fork_unlock_parent,
> 	__malloc_fork_unlock_child.
> 	* sysdeps/mach/hurd/fork.c (__fork): Likewise.

Ping?

Thanks,
Florian
  
Torvald Riegel April 12, 2016, 6:16 p.m. UTC | #4
On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
> 
> >> It explicitly encodes the lock order in the implementations of fork,
> >> and does not rely on the registration order, thus avoiding the deadlock.
> >>
> >> I couldn't test the Hurd bits, but the changes look straightforward enough.
> > 
> > Are those changes, and thus the new scheme, documented anywhere?
> 
> Fork handlers used by the implementation should be invisible to
> applications.  The old one wasn't, and this was a bug.

But have you documented this and your understanding of the
synchronization scheme anywhere?  Your explanation in the email with the
patch seems more detailed than the comments in the code.

> > I think it's really worthwhile to always update and improve the
> > documentation, especially if we're kind of (re)discovering knowledge
> > about the current implementation such as in this case.  I know this is
> > extra work, but if this isn't documented, it's less likely anybody but
> > you will be aware of how this all works.
> 
> Do you mean something like this inside the fork implementation?
> 
>   /* Acquire malloc locks.  This needs to come last because fork
>      handlers may use malloc, and the libio list lock has an indirect
>      malloc dependency as well (via the getdelim function).  */
>   __malloc_fork_lock_parent ();

Not quite.  IIRC this is one comment in one part of the code, but it's
not easy to find it from the other parts of code that are related.

One of the main motivations to document the synchronization scheme is to
avoid requiring future developers that want to modify the code to
closely review all the code (eg, to look for comments such as the one
above).  (The thinking is that, simplified, in sequential code it's
sufficient to look at the callers or the contract of your function,
whereas in concurrent code you don't easily know who is actually
affected by a particiular lock acquisition order.)
This is made easier if there's an overview of the synchronization
somewhere in the code, and it is referenced in affected parts of the
code.

> >> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
> >> new file mode 100644
> >> index 0000000..2fef840
> >> --- /dev/null
> >> +++ b/malloc/malloc-private.h
> > 
> > Should this perhaps be malloc-internal.h to match libc-internal.h?
> 
> We also have libio/libioP.h, and include/malloc.h.  But I think a
> separate header makes sense if its contents is truly internal.
> 
> What are the preferences here?

I think having an internal header is good, but I can't give a definitive
reply regarding what the project prefers.
  
Roland McGrath April 12, 2016, 6:46 p.m. UTC | #5
> > > Should this perhaps be malloc-internal.h to match libc-internal.h?
> > 
> > We also have libio/libioP.h, and include/malloc.h.  But I think a
> > separate header makes sense if its contents is truly internal.
> > 
> > What are the preferences here?
> 
> I think having an internal header is good, but I can't give a definitive
> reply regarding what the project prefers.

We have a clear preference for keeping internal things in internal headers.
Some of the existing wrapper headers conflate pure wrapper issues (macro
versions of things, hidden_proto, etc) with providing internal interfaces.
But we want to avoid doing that in the future.  Cleaning up existing
violations of this principle would certainly be welcome.
  
Florian Weimer April 12, 2016, 7:06 p.m. UTC | #6
On 04/12/2016 08:46 PM, Roland McGrath wrote:
>>>> Should this perhaps be malloc-internal.h to match libc-internal.h?
>>>
>>> We also have libio/libioP.h, and include/malloc.h.  But I think a
>>> separate header makes sense if its contents is truly internal.
>>>
>>> What are the preferences here?
>>
>> I think having an internal header is good, but I can't give a definitive
>> reply regarding what the project prefers.
>
> We have a clear preference for keeping internal things in internal headers.
> Some of the existing wrapper headers conflate pure wrapper issues (macro
> versions of things, hidden_proto, etc) with providing internal interfaces.
> But we want to avoid doing that in the future.  Cleaning up existing
> violations of this principle would certainly be welcome.

Great, so include/malloc.h is out, and malloc-private.h, 
malloc-internal.h and mallocP.h are still in the race. :)

Just to clarify, do you see a future where we have about five different 
headers per subsystem?

   A. the installed header, subsystem/subsystem.h
   B. its wrapper, include/subsystem.h
   C. architecture-specific overrides, subsystem/bits/subsystem.h,
      to be installed
   D. architecture-specific overrides, not installed
      (must not be in bits/ per previous discussion)
   E. a header for libc-internal access across subsystem boundaries,
      name undecided (often mixed into include/subsystem.h today,
      but libioP.h is an existing example for such a separate header)
   F. a header for subsystem-internal declarations shared by
      translation units in the subsystem, naming convention undecided
      (but there is precedent subsystem-private.h, e.g. pty-private.h)

The new header in this patch is in category F (used across subsystems, 
internal to libc).  Should we keep this separate from E and use a 
separate naming convention?

Florian
  
Florian Weimer April 12, 2016, 7:15 p.m. UTC | #7
On 04/12/2016 08:16 PM, Torvald Riegel wrote:
> On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
>> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
>>
>>>> It explicitly encodes the lock order in the implementations of fork,
>>>> and does not rely on the registration order, thus avoiding the deadlock.
>>>>
>>>> I couldn't test the Hurd bits, but the changes look straightforward enough.
>>>
>>> Are those changes, and thus the new scheme, documented anywhere?
>>
>> Fork handlers used by the implementation should be invisible to
>> applications.  The old one wasn't, and this was a bug.
>
> But have you documented this and your understanding of the
> synchronization scheme anywhere?  Your explanation in the email with the
> patch seems more detailed than the comments in the code.

That's because I'm talking mainly about removed code and explaining a 
bug.  I'm not sure how this information will be useful to future 
developers once the bug is gone.  We have a regression test, which 
should avoid reintroducing precisely the same bug.  For similar issues, 
we need to rely on code review and collective memory.

>>> I think it's really worthwhile to always update and improve the
>>> documentation, especially if we're kind of (re)discovering knowledge
>>> about the current implementation such as in this case.  I know this is
>>> extra work, but if this isn't documented, it's less likely anybody but
>>> you will be aware of how this all works.
>>
>> Do you mean something like this inside the fork implementation?
>>
>>    /* Acquire malloc locks.  This needs to come last because fork
>>       handlers may use malloc, and the libio list lock has an indirect
>>       malloc dependency as well (via the getdelim function).  */
>>    __malloc_fork_lock_parent ();
>
> Not quite.  IIRC this is one comment in one part of the code, but it's
> not easy to find it from the other parts of code that are related.
>
> One of the main motivations to document the synchronization scheme is to
> avoid requiring future developers that want to modify the code to
> closely review all the code (eg, to look for comments such as the one
> above).

The cycle isn't there anymore, so it would not particularly helpful to 
mention fork handlers in a comment on _IO_flush_all_lockp.

I would even go one step further: It would be misleading because it 
implies that we are confident about the libio locking behavior and lack 
of cycles.  But libio calls user-supplied callbacks (set by fopencookie) 
while implementation locks are acquired (again in _IO_flush_all_lockp). 
  Diagnostic functions in the malloc subsystem call into libio (via 
stdio streams) while malloc locks are acquired, and libio may in turn 
use malloc.  I'm sure I can find more examples with more poking.

> (The thinking is that, simplified, in sequential code it's
> sufficient to look at the callers or the contract of your function,
> whereas in concurrent code you don't easily know who is actually
> affected by a particiular lock acquisition order.)

With callbacks, you can't be sure even with sequential code.

> This is made easier if there's an overview of the synchronization
> somewhere in the code, and it is referenced in affected parts of the
> code.

I can add some comments to libio with what I have found, but this 
doesn't really related to the locking cycle I removed (because again, 
it's no longer there).

>>>> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
>>>> new file mode 100644
>>>> index 0000000..2fef840
>>>> --- /dev/null
>>>> +++ b/malloc/malloc-private.h
>>>
>>> Should this perhaps be malloc-internal.h to match libc-internal.h?
>>
>> We also have libio/libioP.h, and include/malloc.h.  But I think a
>> separate header makes sense if its contents is truly internal.
>>
>> What are the preferences here?
>
> I think having an internal header is good, but I can't give a definitive
> reply regarding what the project prefers.

See my reply to Roland.  I was under the impression you objected to the 
name.

Florian
  
Roland McGrath April 12, 2016, 8:37 p.m. UTC | #8
> Great, so include/malloc.h is out, and malloc-private.h, 
> malloc-internal.h and mallocP.h are still in the race. :)

Right.  Personally I would prefer to avoid the camelP.h naming style.
Between the other two I don't have a particular preference except that we
should move towards consistent conventions and knowing what they mean.
e.g. perhaps foo-private.h should only be for things that really should be
private to the foo subsystem, while foo-internal.h is for things in the foo
subsystem that are internal to libc put public across subsystems.  Or
perhaps vice versa.  Or perhaps we don't need both of those.  Whatever.
But choosing conventions, describing them clearly, and using them
consistently is the goal.

> Just to clarify, do you see a future where we have about five different 
> headers per subsystem?

I guess I do, since we've identified 6 categories and there's only two of
them that I'm not 100% sure need to be distinct.

> The new header in this patch is in category F (used across subsystems, 
> internal to libc).  Should we keep this separate from E and use a 
> separate naming convention?

Tentatively yes, but maybe don't bother?  That is, I don't really know off
hand how much we have in categories E and F today or how much we're likely
to get.  If there is little enough it might not be worth distinguishing
them.  Personally I don't mind having clearly-defined categories with very
few members and being pedantic about putting things in the right categories
even when that means some of the headers declare one or two things.  But I
wouldn't want to insist on that.


Thanks,
Roland
  
Torvald Riegel April 13, 2016, 12:55 p.m. UTC | #9
On Tue, 2016-04-12 at 21:15 +0200, Florian Weimer wrote:
> On 04/12/2016 08:16 PM, Torvald Riegel wrote:
> > On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
> >> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
> >>
> >>>> It explicitly encodes the lock order in the implementations of fork,
> >>>> and does not rely on the registration order, thus avoiding the deadlock.
> >>>>
> >>>> I couldn't test the Hurd bits, but the changes look straightforward enough.
> >>>
> >>> Are those changes, and thus the new scheme, documented anywhere?
> >>
> >> Fork handlers used by the implementation should be invisible to
> >> applications.  The old one wasn't, and this was a bug.
> >
> > But have you documented this and your understanding of the
> > synchronization scheme anywhere?  Your explanation in the email with the
> > patch seems more detailed than the comments in the code.
> 
> That's because I'm talking mainly about removed code and explaining a 
> bug.  I'm not sure how this information will be useful to future 
> developers once the bug is gone.  We have a regression test, which 
> should avoid reintroducing precisely the same bug.  For similar issues, 
> we need to rely on code review and collective memory.

But you do have a new scheme, right, or are doing things in such a way
that what was formerly a bug now doesn't matter?  You arrived through
some information on the conclusion that the new scheme works correctly;
isn't this worth documenting?  It doesn't seem to be obvious.  For
example, what about saying that fork handlers should be invisible to the
application or otherwise you'll get problem X?

Collective memory is not ideal, especially when the people working on a
project change.

> >>> I think it's really worthwhile to always update and improve the
> >>> documentation, especially if we're kind of (re)discovering knowledge
> >>> about the current implementation such as in this case.  I know this is
> >>> extra work, but if this isn't documented, it's less likely anybody but
> >>> you will be aware of how this all works.
> >>
> >> Do you mean something like this inside the fork implementation?
> >>
> >>    /* Acquire malloc locks.  This needs to come last because fork
> >>       handlers may use malloc, and the libio list lock has an indirect
> >>       malloc dependency as well (via the getdelim function).  */
> >>    __malloc_fork_lock_parent ();
> >
> > Not quite.  IIRC this is one comment in one part of the code, but it's
> > not easy to find it from the other parts of code that are related.
> >
> > One of the main motivations to document the synchronization scheme is to
> > avoid requiring future developers that want to modify the code to
> > closely review all the code (eg, to look for comments such as the one
> > above).
> 
> The cycle isn't there anymore, so it would not particularly helpful to 
> mention fork handlers in a comment on _IO_flush_all_lockp.
> 
> I would even go one step further: It would be misleading because it 
> implies that we are confident about the libio locking behavior and lack 
> of cycles.

Then why not put that into a note?  A simple TODO or XXX would suffice I
suppose, and would alert others that there's more that needs to be
reviewed.  It would be bad if we can only document perfect information.
As long as we're honest about how complete/accurate we think the
information is, providing an intermediate snapshot sounds fine to me.

> But libio calls user-supplied callbacks (set by fopencookie) 
> while implementation locks are acquired (again in _IO_flush_all_lockp). 
>   Diagnostic functions in the malloc subsystem call into libio (via 
> stdio streams) while malloc locks are acquired, and libio may in turn 
> use malloc.  I'm sure I can find more examples with more poking.

If you can think about those, why not put a comment into a TODO in the
code?  Or create a bug.  At least I tend to page out things, so if we
all do that our collective memory will page out things too :)

> > (The thinking is that, simplified, in sequential code it's
> > sufficient to look at the callers or the contract of your function,
> > whereas in concurrent code you don't easily know who is actually
> > affected by a particiular lock acquisition order.)
> 
> With callbacks, you can't be sure even with sequential code.

Sure, and there are other cases in sequential code when a logical
operation is spread across several functions.  But even in those cases
there are less possible interleavings than in a typical concurrent
execution, so the state space is smaller.

> > This is made easier if there's an overview of the synchronization
> > somewhere in the code, and it is referenced in affected parts of the
> > code.
> 
> I can add some comments to libio with what I have found, but this 
> doesn't really related to the locking cycle I removed (because again, 
> it's no longer there).

I agree that we don't want to document why doing something arbitrary was
a bad idea.  But if it may be something that might look sensible to do
at first sight, documenting why that doesn't work helps, I think.

I won't object to the patch if you don't add a comment, but my opinion
is that we should add more details and lessons learned as comments, so
that we don't have to rely on our memory as much.

> >>>> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
> >>>> new file mode 100644
> >>>> index 0000000..2fef840
> >>>> --- /dev/null
> >>>> +++ b/malloc/malloc-private.h
> >>>
> >>> Should this perhaps be malloc-internal.h to match libc-internal.h?
> >>
> >> We also have libio/libioP.h, and include/malloc.h.  But I think a
> >> separate header makes sense if its contents is truly internal.
> >>
> >> What are the preferences here?
> >
> > I think having an internal header is good, but I can't give a definitive
> > reply regarding what the project prefers.
> 
> See my reply to Roland.  I was under the impression you objected to the 
> name.

I did object to the name, but simply because so far I had only seen
cases of -internal.h.  I hadn't looked at math, crypt, and login, which
use _private.h or -private.h.
  

Patch

2016-02-10  Florian Weimer  <fweimer@redhat.com>

	[BZ #19431]
	Run the malloc fork handler as late as possible to avoid deadlocks.
	* malloc/malloc-private.h: New file.
	* malloc/malloc.c: Include it.
	* malloc/arena.c (ATFORK_MEM): Remove.
	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
	Update comment.
	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
	Remove outdated comment.
	(ptmalloc_init): Do not call thread_atfork.  Remove
	thread_atfork_static.
	* malloc/bug19431.c: New file.
	* Makefile (tests): Add it.
	(bug19431): Link against libpthread.
	* manual/memory.texi (Aligned Memory Blocks): Update safety
	annotation comments.
	* sysdeps/nptl/fork.c (__libc_fork): Call
	__malloc_fork_lock_parent, __malloc_fork_unlock_parent,
	__malloc_fork_unlock_child.
	* sysdeps/mach/hurd/fork.c (__fork): Likewise.

diff --git a/malloc/Makefile b/malloc/Makefile
index 360288b..7d1e426 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -29,7 +29,7 @@  tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 	 tst-malloc-usable tst-realloc tst-posix_memalign \
 	 tst-pvalloc tst-memalign tst-mallopt tst-scratch_buffer \
 	 tst-malloc-backtrace tst-malloc-thread-exit \
-	 tst-malloc-thread-fail
+	 tst-malloc-thread-fail bug19431
 test-srcs = tst-mtrace
 
 routines = malloc morecore mcheck mtrace obstack \
@@ -52,6 +52,8 @@  $(objpfx)tst-malloc-thread-exit: $(common-objpfx)nptl/libpthread.so \
 			       $(common-objpfx)nptl/libpthread_nonshared.a
 $(objpfx)tst-malloc-thread-fail: $(common-objpfx)nptl/libpthread.so \
 			       $(common-objpfx)nptl/libpthread_nonshared.a
+$(objpfx)bug19431: $(common-objpfx)nptl/libpthread.so \
+			       $(common-objpfx)nptl/libpthread_nonshared.a
 
 # These should be removed by `make clean'.
 extra-objs = mcheck-init.o libmcheck.a
diff --git a/malloc/arena.c b/malloc/arena.c
index cd26cdd..8c3c3a5 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -133,10 +133,6 @@  static void *(*save_malloc_hook)(size_t __size, const void *);
 static void (*save_free_hook) (void *__ptr, const void *);
 static void *save_arena;
 
-# ifdef ATFORK_MEM
-ATFORK_MEM;
-# endif
-
 /* Magic value for the thread-specific arena pointer when
    malloc_atfork() is in use.  */
 
@@ -202,14 +198,13 @@  free_atfork (void *mem, const void *caller)
 /* Counter for number of times the list is locked by the same thread.  */
 static unsigned int atfork_recursive_cntr;
 
-/* The following two functions are registered via thread_atfork() to
-   make sure that the mutexes remain in a consistent state in the
-   fork()ed version of a thread.  Also adapt the malloc and free hooks
-   temporarily, because the `atfork' handler mechanism may use
-   malloc/free internally (e.g. in LinuxThreads). */
+/* The following three functions are called around fork from a
+   multi-threaded process.  We do not use the general fork handler
+   mechanism to make sure that our handlers are the last ones being
+   called.  */
 
-static void
-ptmalloc_lock_all (void)
+void
+__malloc_fork_lock_parent (void)
 {
   mstate ar_ptr;
 
@@ -217,7 +212,7 @@  ptmalloc_lock_all (void)
     return;
 
   /* We do not acquire free_list_lock here because we completely
-     reconstruct free_list in ptmalloc_unlock_all2.  */
+     reconstruct free_list in __malloc_fork_unlock_child.  */
 
   if (mutex_trylock (&list_lock))
     {
@@ -242,7 +237,7 @@  ptmalloc_lock_all (void)
   __free_hook = free_atfork;
   /* Only the current thread may perform malloc/free calls now.
      save_arena will be reattached to the current thread, in
-     ptmalloc_lock_all, so save_arena->attached_threads is not
+     __malloc_fork_lock_parent, so save_arena->attached_threads is not
      updated.  */
   save_arena = thread_arena;
   thread_arena = ATFORK_ARENA_PTR;
@@ -250,8 +245,8 @@  out:
   ++atfork_recursive_cntr;
 }
 
-static void
-ptmalloc_unlock_all (void)
+void
+__malloc_fork_unlock_parent (void)
 {
   mstate ar_ptr;
 
@@ -262,8 +257,8 @@  ptmalloc_unlock_all (void)
     return;
 
   /* Replace ATFORK_ARENA_PTR with save_arena.
-     save_arena->attached_threads was not changed in ptmalloc_lock_all
-     and is still correct.  */
+     save_arena->attached_threads was not changed in
+     __malloc_fork_lock_parent and is still correct.  */
   thread_arena = save_arena;
   __malloc_hook = save_malloc_hook;
   __free_hook = save_free_hook;
@@ -277,15 +272,8 @@  ptmalloc_unlock_all (void)
   (void) mutex_unlock (&list_lock);
 }
 
-# ifdef __linux__
-
-/* In NPTL, unlocking a mutex in the child process after a
-   fork() is currently unsafe, whereas re-initializing it is safe and
-   does not leak resources.  Therefore, a special atfork handler is
-   installed for the child. */
-
-static void
-ptmalloc_unlock_all2 (void)
+void
+__malloc_fork_unlock_child (void)
 {
   mstate ar_ptr;
 
@@ -321,11 +309,6 @@  ptmalloc_unlock_all2 (void)
   atfork_recursive_cntr = 0;
 }
 
-# else
-
-#  define ptmalloc_unlock_all2 ptmalloc_unlock_all
-# endif
-
 /* Initialization routine. */
 #include <string.h>
 extern char **_environ;
@@ -394,7 +377,6 @@  ptmalloc_init (void)
 #endif
 
   thread_arena = &main_arena;
-  thread_atfork (ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
   const char *s = NULL;
   if (__glibc_likely (_environ != NULL))
     {
@@ -469,14 +451,6 @@  ptmalloc_init (void)
   __malloc_initialized = 1;
 }
 
-/* There are platforms (e.g. Hurd) with a link-time hook mechanism. */
-#ifdef thread_atfork_static
-thread_atfork_static (ptmalloc_lock_all, ptmalloc_unlock_all,		      \
-                      ptmalloc_unlock_all2)
-#endif
-
-
-
 /* Managing heaps and arenas (for concurrent threads) */
 
 #if MALLOC_DEBUG > 1
@@ -822,7 +796,8 @@  _int_new_arena (size_t size)
      limit is reached).  At this point, some arena has to be attached
      to two threads.  We could acquire the arena lock before list_lock
      to make it less likely that reused_arena picks this new arena,
-     but this could result in a deadlock with ptmalloc_lock_all.  */
+     but this could result in a deadlock with
+     __malloc_fork_lock_parent.  */
 
   (void) mutex_lock (&a->mutex);
 
diff --git a/malloc/bug19431.c b/malloc/bug19431.c
new file mode 100644
index 0000000..3bd876b
--- /dev/null
+++ b/malloc/bug19431.c
@@ -0,0 +1,220 @@ 
+/* Test concurrent fork, getline, and fflush (NULL).
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#include <sys/wait.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <time.h>
+#include <string.h>
+#include <signal.h>
+
+static int do_test (void);
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
+
+enum {
+  /* Number of threads which call fork.  */
+  fork_thread_count = 4,
+  /* Number of threads which call getline (and, indirectly,
+     malloc).  */
+  read_thread_count = 8,
+};
+
+static bool termination_requested;
+
+static void *
+fork_thread_function (void *closure)
+{
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      pid_t pid = fork ();
+      if (pid < 0)
+        {
+          printf ("error: fork: %m\n");
+          abort ();
+        }
+      else if (pid == 0)
+        _exit (17);
+
+      int status;
+      if (waitpid (pid, &status, 0) < 0)
+        {
+          printf ("error: waitpid: %m\n");
+          abort ();
+        }
+      if (!WIFEXITED (status) || WEXITSTATUS (status) != 17)
+        {
+          printf ("error: waitpid returned invalid status: %d\n", status);
+          abort ();
+        }
+    }
+  return NULL;
+}
+
+static char *file_to_read;
+
+static void *
+read_thread_function (void *closure)
+{
+  FILE *f = fopen (file_to_read, "r");
+  if (f == NULL)
+    {
+      printf ("error: fopen (%s): %m\n", file_to_read);
+      abort ();
+    }
+
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      rewind (f);
+      char *line = NULL;
+      size_t line_allocated = 0;
+      ssize_t ret = getline (&line, &line_allocated, f);
+      if (ret < 0)
+        {
+          printf ("error: getline: %m\n");
+          abort ();
+        }
+      free (line);
+    }
+  fclose (f);
+
+  return NULL;
+}
+
+static void *
+flushall_thread_function (void *closure)
+{
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    if (fflush (NULL) != 0)
+      {
+        printf ("error: fflush (NULL): %m\n");
+        abort ();
+      }
+  return NULL;
+}
+
+static void
+create_threads (pthread_t *threads, size_t count, void *(*func)(void *))
+{
+  for (size_t i = 0; i < count; ++i)
+    {
+      int ret = pthread_create (threads + i, NULL, func, NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_create: %m\n");
+          abort ();
+        }
+    }
+}
+
+static void
+join_threads (pthread_t *threads, size_t count)
+{
+  for (size_t i = 0; i < count; ++i)
+    {
+      int ret = pthread_join (threads[i], NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_join: %m\n");
+          abort ();
+        }
+    }
+}
+
+/* Create a file which consists of a single long line, and assigns
+   file_to_read.  The hope is that this triggers an allocation in
+   getline which needs a lock.  */
+static void
+create_file_with_large_line (void)
+{
+  int fd = create_temp_file ("bug19431-large-line", &file_to_read);
+  if (fd < 0)
+    {
+      printf ("error: create_temp_file: %m\n");
+      abort ();
+    }
+  FILE *f = fdopen (fd, "w+");
+  if (f == NULL)
+    {
+      printf ("error: fdopen: %m\n");
+      abort ();
+    }
+  for (int i = 0; i < 50000; ++i)
+    fputc ('x', f);
+  fputc ('\n', f);
+  if (ferror (f))
+    {
+      printf ("error: fputc: %m\n");
+      abort ();
+    }
+  if (fclose (f) != 0)
+    {
+      printf ("error: fclose: %m\n");
+      abort ();
+    }
+}
+
+static int
+do_test (void)
+{
+  /* Make sure that we do not exceed the arena limit with the number
+     of threads we configured.  */
+  if (mallopt (M_ARENA_MAX, 400) == 0)
+    {
+      printf ("error: mallopt (M_ARENA_MAX) failed\n");
+      return 1;
+    }
+
+  /* Leave some room for shutting down all threads gracefully.  */
+  int timeout = 3;
+  if (timeout > TIMEOUT)
+    timeout = TIMEOUT - 1;
+
+  create_file_with_large_line ();
+
+  pthread_t fork_threads[fork_thread_count];
+  create_threads (fork_threads, fork_thread_count, fork_thread_function);
+  pthread_t read_threads[read_thread_count];
+  create_threads (read_threads, read_thread_count, read_thread_function);
+  pthread_t flushall_threads[1];
+  create_threads (flushall_threads, 1, flushall_thread_function);
+
+  struct timespec ts = {timeout, 0};
+  if (nanosleep (&ts, NULL))
+    {
+      printf ("error: error: nanosleep: %m\n");
+      abort ();
+    }
+
+  __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED);
+
+  join_threads (flushall_threads, 1);
+  join_threads (read_threads, read_thread_count);
+  join_threads (fork_threads, fork_thread_count);
+
+  free (file_to_read);
+
+  return 0;
+}
diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
new file mode 100644
index 0000000..2fef840
--- /dev/null
+++ b/malloc/malloc-private.h
@@ -0,0 +1,32 @@ 
+/* Internal declarations for malloc.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_PRIVATE_H
+#define _MALLOC_PRIVATE_H
+
+/* Called in the parent process before a fork.  */
+void __malloc_fork_lock_parent (void) internal_function attribute_hidden;
+
+/* Called in the parent process after a fork.  */
+void __malloc_fork_unlock_parent (void) internal_function attribute_hidden;
+
+/* Called in the child process after a fork.  */
+void __malloc_fork_unlock_child (void) internal_function attribute_hidden;
+
+
+#endif /* _MALLOC_PRIVATE_H */
diff --git a/malloc/malloc.c b/malloc/malloc.c
index b8a43bf..527332f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -244,6 +244,7 @@ 
 /* For ALIGN_UP et. al.  */
 #include <libc-internal.h>
 
+#include <malloc/malloc-private.h>
 
 /*
   Debugging:
diff --git a/manual/memory.texi b/manual/memory.texi
index 3d99147..a3ecc0d 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -1051,14 +1051,6 @@  systems that do not support @w{ISO C11}.
 @c     _dl_addr_inside_object ok
 @c    determine_info ok
 @c    __rtld_lock_unlock_recursive (dl_load_lock) @aculock
-@c   thread_atfork @asulock @aculock @acsfd @acsmem
-@c    __register_atfork @asulock @aculock @acsfd @acsmem
-@c     lll_lock (__fork_lock) @asulock @aculock
-@c     fork_handler_alloc @asulock @aculock @acsfd @acsmem
-@c      calloc dup @asulock @aculock @acsfd @acsmem
-@c     __linkin_atfork ok
-@c      catomic_compare_and_exchange_bool_acq ok
-@c     lll_unlock (__fork_lock) @aculock
 @c   *_environ @mtsenv
 @c   next_env_entry ok
 @c   strcspn dup ok
diff --git a/sysdeps/mach/hurd/fork.c b/sysdeps/mach/hurd/fork.c
index ad09fd7..ba64f86 100644
--- a/sysdeps/mach/hurd/fork.c
+++ b/sysdeps/mach/hurd/fork.c
@@ -26,6 +26,7 @@ 
 #include <assert.h>
 #include "hurdmalloc.h"		/* XXX */
 #include <tls.h>
+#include <malloc/malloc-private.h>
 
 #undef __fork
 
@@ -107,6 +108,9 @@  __fork (void)
       /* Run things that prepare for forking before we create the task.  */
       RUN_HOOK (_hurd_fork_prepare_hook, ());
 
+      /* Acquire malloc locks.  */
+      __malloc_fork_lock_parent ();
+
       /* Lock things that want to be locked before we fork.  */
       {
 	void *const *p;
@@ -604,6 +608,9 @@  __fork (void)
 			   nthreads * sizeof (*threads));
 	}
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_parent ();
+
       /* Run things that want to run in the parent to restore it to
 	 normality.  Usually prepare hooks and parent hooks are
 	 symmetrical: the prepare hook arrests state in some way for the
@@ -655,6 +662,9 @@  __fork (void)
       /* Forking clears the trace flag.  */
       __sigemptyset (&_hurdsig_traced);
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_child ();
+
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index 27f8d52..94ee159 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -31,7 +31,7 @@ 
 #include <fork.h>
 #include <arch-fork.h>
 #include <futex-internal.h>
-
+#include <malloc/malloc-private.h>
 
 static void
 fresetlockfiles (void)
@@ -111,6 +111,9 @@  __libc_fork (void)
 
   _IO_list_lock ();
 
+  /* Acquire malloc locks.  */
+  __malloc_fork_lock_parent ();
+
 #ifndef NDEBUG
   pid_t ppid = THREAD_GETMEM (THREAD_SELF, tid);
 #endif
@@ -168,6 +171,9 @@  __libc_fork (void)
 # endif
 #endif
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_child ();
+
       /* Reset the file list.  These are recursive mutexes.  */
       fresetlockfiles ();
 
@@ -209,6 +215,9 @@  __libc_fork (void)
       /* Restore the PID value.  */
       THREAD_SETMEM (THREAD_SELF, pid, parentpid);
 
+      /* Release malloc locks, parent process variant.  */
+      __malloc_fork_unlock_parent ();
+
       /* We execute this even if the 'fork' call failed.  */
       _IO_list_unlock ();