malloc: Run fork handler as late as possible [BZ #19431]

Message ID 570E5362.7070300@redhat.com
State Committed
Headers

Commit Message

Florian Weimer April 13, 2016, 2:10 p.m. UTC
  On 04/13/2016 02:55 PM, Torvald Riegel wrote:
> On Tue, 2016-04-12 at 21:15 +0200, Florian Weimer wrote:
>> On 04/12/2016 08:16 PM, Torvald Riegel wrote:
>>> On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
>>>> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
>>>>
>>>>>> It explicitly encodes the lock order in the implementations of fork,
>>>>>> and does not rely on the registration order, thus avoiding the deadlock.
>>>>>>
>>>>>> I couldn't test the Hurd bits, but the changes look straightforward enough.
>>>>>
>>>>> Are those changes, and thus the new scheme, documented anywhere?
>>>>
>>>> Fork handlers used by the implementation should be invisible to
>>>> applications.  The old one wasn't, and this was a bug.
>>>
>>> But have you documented this and your understanding of the
>>> synchronization scheme anywhere?  Your explanation in the email with the
>>> patch seems more detailed than the comments in the code.
>>
>> That's because I'm talking mainly about removed code and explaining a
>> bug.  I'm not sure how this information will be useful to future
>> developers once the bug is gone.  We have a regression test, which
>> should avoid reintroducing precisely the same bug.  For similar issues,
>> we need to rely on code review and collective memory.
>
> But you do have a new scheme, right, or are doing things in such a way
> that what was formerly a bug now doesn't matter?  You arrived through
> some information on the conclusion that the new scheme works correctly;
> isn't this worth documenting?  It doesn't seem to be obvious.  For
> example, what about saying that fork handlers should be invisible to the
> application or otherwise you'll get problem X?

I have expanded the comments somewhat.  The key point is to make the 
malloc subsystem available to fork handlers, so it's not just about 
synchronization.

> If you can think about those, why not put a comment into a TODO in the
> code?  Or create a bug.  At least I tend to page out things, so if we
> all do that our collective memory will page out things too :)

I'll send a separate patch.

> I agree that we don't want to document why doing something arbitrary was
> a bad idea.  But if it may be something that might look sensible to do
> at first sight, documenting why that doesn't work helps, I think.

I think the history here is that ptmalloc was out-of-tree initially, so 
they had to use the fork handler mechanism to implement 
malloc-after-fork support.

>> See my reply to Roland.  I was under the impression you objected to the
>> name.
>
> I did object to the name, but simply because so far I had only seen
> cases of -internal.h.  I hadn't looked at math, crypt, and login, which
> use _private.h or -private.h.

I switched to malloc-internal.h because the existing [-_]private.h 
headers seem to be mostly subsystem-internal.

Based on previous feedback, I renamed the test to something more 
descriptive than just a bug number.

Florian
  

Comments

Torvald Riegel April 13, 2016, 2:29 p.m. UTC | #1
On Wed, 2016-04-13 at 16:10 +0200, Florian Weimer wrote:
> On 04/13/2016 02:55 PM, Torvald Riegel wrote:
> > On Tue, 2016-04-12 at 21:15 +0200, Florian Weimer wrote:
> >> On 04/12/2016 08:16 PM, Torvald Riegel wrote:
> >>> On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
> >>>> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
> >>>>
> >>>>>> It explicitly encodes the lock order in the implementations of fork,
> >>>>>> and does not rely on the registration order, thus avoiding the deadlock.
> >>>>>>
> >>>>>> I couldn't test the Hurd bits, but the changes look straightforward enough.
> >>>>>
> >>>>> Are those changes, and thus the new scheme, documented anywhere?
> >>>>
> >>>> Fork handlers used by the implementation should be invisible to
> >>>> applications.  The old one wasn't, and this was a bug.
> >>>
> >>> But have you documented this and your understanding of the
> >>> synchronization scheme anywhere?  Your explanation in the email with the
> >>> patch seems more detailed than the comments in the code.
> >>
> >> That's because I'm talking mainly about removed code and explaining a
> >> bug.  I'm not sure how this information will be useful to future
> >> developers once the bug is gone.  We have a regression test, which
> >> should avoid reintroducing precisely the same bug.  For similar issues,
> >> we need to rely on code review and collective memory.
> >
> > But you do have a new scheme, right, or are doing things in such a way
> > that what was formerly a bug now doesn't matter?  You arrived through
> > some information on the conclusion that the new scheme works correctly;
> > isn't this worth documenting?  It doesn't seem to be obvious.  For
> > example, what about saying that fork handlers should be invisible to the
> > application or otherwise you'll get problem X?
> 
> I have expanded the comments somewhat.  The key point is to make the 
> malloc subsystem available to fork handlers, so it's not just about 
> synchronization.

LGTM.
  
Andreas Schwab April 14, 2016, 10:26 a.m. UTC | #2
Florian Weimer <fweimer@redhat.com> writes:

> 2016-04-13  Florian Weimer  <fweimer@redhat.com>
>
> 	[BZ #19431]
> 	Run the malloc fork handler as late as possible to avoid deadlocks.
> 	* malloc/malloc-internal.h: New file.
> 	* malloc/malloc.c: Include it.
> 	* malloc/arena.c (ATFORK_MEM): Remove.
> 	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
> 	Update comment.
> 	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
> 	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
> 	Remove outdated comment.
> 	(ptmalloc_init): Do not call thread_atfork.  Remove
> 	thread_atfork_static.

In file included from malloc.c:1889:0:
arena.c:139:1: error: conflicting types for '__malloc_fork_lock_parent'
 __malloc_fork_lock_parent (void)
 ^
In file included from malloc.c:247:0:
../malloc/malloc-internal.h:23:6: note: previous declaration of '__malloc_fork_lock_parent' was here
 void __malloc_fork_lock_parent (void) internal_function attribute_hidden;
      ^

Andreas.
  
Tulio Magno Quites Machado Filho May 10, 2016, 1:41 p.m. UTC | #3
Hi Florian,

Florian Weimer <fweimer@redhat.com> writes:

> 2016-04-13  Florian Weimer  <fweimer@redhat.com>
>
> 	[BZ #19431]
> 	Run the malloc fork handler as late as possible to avoid deadlocks.
> 	* malloc/malloc-internal.h: New file.
> 	* malloc/malloc.c: Include it.
> 	* malloc/arena.c (ATFORK_MEM): Remove.
> 	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
> 	Update comment.
> 	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
> 	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
> 	Remove outdated comment.
> 	(ptmalloc_init): Do not call thread_atfork.  Remove
> 	thread_atfork_static.
> 	* malloc/tst-malloc-fork-deadlock.c: New file.
> 	* Makefile (tests): Add tst-malloc-fork-deadlock.
> 	(tst-malloc-fork-deadlock): Link against libpthread.
> 	* manual/memory.texi (Aligned Memory Blocks): Update safety
> 	annotation comments.
> 	* sysdeps/nptl/fork.c (__libc_fork): Call
> 	__malloc_fork_lock_parent, __malloc_fork_unlock_parent,
> 	__malloc_fork_unlock_child.
> 	* sysdeps/mach/hurd/fork.c (__fork): Likewise.

Did you notice any intermittent on malloc/tst-mallocfork after this patch?

The child is getting into a deadlock if the signal arrives before the parent
is able to complete __malloc_fork_unlock_parent().
  
Florian Weimer May 10, 2016, 1:55 p.m. UTC | #4
On 05/10/2016 03:41 PM, Tulio Magno Quites Machado Filho wrote:
> Hi Florian,
>
> Florian Weimer <fweimer@redhat.com> writes:
>
>> 2016-04-13  Florian Weimer  <fweimer@redhat.com>
>>
>> 	[BZ #19431]
>> 	Run the malloc fork handler as late as possible to avoid deadlocks.
>> 	* malloc/malloc-internal.h: New file.
>> 	* malloc/malloc.c: Include it.
>> 	* malloc/arena.c (ATFORK_MEM): Remove.
>> 	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
>> 	Update comment.
>> 	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
>> 	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
>> 	Remove outdated comment.
>> 	(ptmalloc_init): Do not call thread_atfork.  Remove
>> 	thread_atfork_static.
>> 	* malloc/tst-malloc-fork-deadlock.c: New file.
>> 	* Makefile (tests): Add tst-malloc-fork-deadlock.
>> 	(tst-malloc-fork-deadlock): Link against libpthread.
>> 	* manual/memory.texi (Aligned Memory Blocks): Update safety
>> 	annotation comments.
>> 	* sysdeps/nptl/fork.c (__libc_fork): Call
>> 	__malloc_fork_lock_parent, __malloc_fork_unlock_parent,
>> 	__malloc_fork_unlock_child.
>> 	* sysdeps/mach/hurd/fork.c (__fork): Likewise.
>
> Did you notice any intermittent on malloc/tst-mallocfork after this patch?

No, but we saw a failure once on i686 *before* this patch went in.

> The child is getting into a deadlock if the signal arrives before the parent
> is able to complete __malloc_fork_unlock_parent().

But this could have happened before as well, right?

Thanks,
Florian
  
Tulio Magno Quites Machado Filho May 10, 2016, 6:05 p.m. UTC | #5
Florian Weimer <fweimer@redhat.com> writes:

> On 05/10/2016 03:41 PM, Tulio Magno Quites Machado Filho wrote:
>
>> The child is getting into a deadlock if the signal arrives before the parent
>> is able to complete __malloc_fork_unlock_parent().
>
> But this could have happened before as well, right?

Maybe, but I'm not able to reproduce this error before the removal of malloc
hooks from the fork handler.

I have a patch that seems to solve this issue.
I just want to test it for more time and on more servers before submitting it.
  
Florian Weimer May 10, 2016, 7:51 p.m. UTC | #6
On 05/10/2016 08:05 PM, Tulio Magno Quites Machado Filho wrote:
> Florian Weimer <fweimer@redhat.com> writes:
>
>> On 05/10/2016 03:41 PM, Tulio Magno Quites Machado Filho wrote:
>>
>>> The child is getting into a deadlock if the signal arrives before the parent
>>> is able to complete __malloc_fork_unlock_parent().
>>
>> But this could have happened before as well, right?
>
> Maybe, but I'm not able to reproduce this error before the removal of malloc
> hooks from the fork handler.

Hmm, maybe we are talking about a different failure?  I get a rare 
deadlock with an older glibc:

[fweimer@oldenburg tmp]$ x=0; while ./tst-mallocfork  ; do x=$((x + 1)) 
; done; echo $? $x
Timed out: killed the child process
0 22156
[fweimer@oldenburg tmp]$ x=0; while ./tst-mallocfork  ; do x=$((x + 1)) 
; done; echo $? $x
Timed out: killed the child process
0 10940

We looked at the test case internally a while back, and concluded it was 
invalid for several reasons.

Thanks,
Florian
  

Patch

2016-04-13  Florian Weimer  <fweimer@redhat.com>

	[BZ #19431]
	Run the malloc fork handler as late as possible to avoid deadlocks.
	* malloc/malloc-internal.h: New file.
	* malloc/malloc.c: Include it.
	* malloc/arena.c (ATFORK_MEM): Remove.
	(__malloc_fork_lock_parent): Rename from ptmalloc_lock_all.
	Update comment.
	(__malloc_fork_unlock_parent): Rename from ptmalloc_unlock_all.
	(__malloc_fork_unlock_child): Rename from ptmalloc_unlock_all2.
	Remove outdated comment.
	(ptmalloc_init): Do not call thread_atfork.  Remove
	thread_atfork_static.
	* malloc/tst-malloc-fork-deadlock.c: New file.
	* Makefile (tests): Add tst-malloc-fork-deadlock.
	(tst-malloc-fork-deadlock): Link against libpthread.
	* manual/memory.texi (Aligned Memory Blocks): Update safety
	annotation comments.
	* sysdeps/nptl/fork.c (__libc_fork): Call
	__malloc_fork_lock_parent, __malloc_fork_unlock_parent,
	__malloc_fork_unlock_child.
	* sysdeps/mach/hurd/fork.c (__fork): Likewise.

diff --git a/malloc/Makefile b/malloc/Makefile
index 59d4264..3283f4f 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -29,7 +29,7 @@  tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 	 tst-malloc-usable tst-realloc tst-posix_memalign \
 	 tst-pvalloc tst-memalign tst-mallopt tst-scratch_buffer \
 	 tst-malloc-backtrace tst-malloc-thread-exit \
-	 tst-malloc-thread-fail
+	 tst-malloc-thread-fail tst-malloc-fork-deadlock
 test-srcs = tst-mtrace
 
 routines = malloc morecore mcheck mtrace obstack \
@@ -49,6 +49,7 @@  libmemusage-inhibit-o = $(filter-out .os,$(object-suffixes))
 $(objpfx)tst-malloc-backtrace: $(shared-thread-library)
 $(objpfx)tst-malloc-thread-exit: $(shared-thread-library)
 $(objpfx)tst-malloc-thread-fail: $(shared-thread-library)
+$(objpfx)tst-malloc-fork-deadlock: $(shared-thread-library)
 
 # These should be removed by `make clean'.
 extra-objs = mcheck-init.o libmcheck.a
diff --git a/malloc/arena.c b/malloc/arena.c
index cd26cdd..bba83e9 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -133,10 +133,6 @@  static void *(*save_malloc_hook)(size_t __size, const void *);
 static void (*save_free_hook) (void *__ptr, const void *);
 static void *save_arena;
 
-# ifdef ATFORK_MEM
-ATFORK_MEM;
-# endif
-
 /* Magic value for the thread-specific arena pointer when
    malloc_atfork() is in use.  */
 
@@ -202,14 +198,14 @@  free_atfork (void *mem, const void *caller)
 /* Counter for number of times the list is locked by the same thread.  */
 static unsigned int atfork_recursive_cntr;
 
-/* The following two functions are registered via thread_atfork() to
-   make sure that the mutexes remain in a consistent state in the
-   fork()ed version of a thread.  Also adapt the malloc and free hooks
-   temporarily, because the `atfork' handler mechanism may use
-   malloc/free internally (e.g. in LinuxThreads). */
+/* The following three functions are called around fork from a
+   multi-threaded process.  We do not use the general fork handler
+   mechanism to make sure that our handlers are the last ones being
+   called, so that other fork handlers can use the malloc
+   subsystem.  */
 
-static void
-ptmalloc_lock_all (void)
+void
+__malloc_fork_lock_parent (void)
 {
   mstate ar_ptr;
 
@@ -217,7 +213,7 @@  ptmalloc_lock_all (void)
     return;
 
   /* We do not acquire free_list_lock here because we completely
-     reconstruct free_list in ptmalloc_unlock_all2.  */
+     reconstruct free_list in __malloc_fork_unlock_child.  */
 
   if (mutex_trylock (&list_lock))
     {
@@ -242,7 +238,7 @@  ptmalloc_lock_all (void)
   __free_hook = free_atfork;
   /* Only the current thread may perform malloc/free calls now.
      save_arena will be reattached to the current thread, in
-     ptmalloc_lock_all, so save_arena->attached_threads is not
+     __malloc_fork_lock_parent, so save_arena->attached_threads is not
      updated.  */
   save_arena = thread_arena;
   thread_arena = ATFORK_ARENA_PTR;
@@ -250,8 +246,8 @@  out:
   ++atfork_recursive_cntr;
 }
 
-static void
-ptmalloc_unlock_all (void)
+void
+__malloc_fork_unlock_parent (void)
 {
   mstate ar_ptr;
 
@@ -262,8 +258,8 @@  ptmalloc_unlock_all (void)
     return;
 
   /* Replace ATFORK_ARENA_PTR with save_arena.
-     save_arena->attached_threads was not changed in ptmalloc_lock_all
-     and is still correct.  */
+     save_arena->attached_threads was not changed in
+     __malloc_fork_lock_parent and is still correct.  */
   thread_arena = save_arena;
   __malloc_hook = save_malloc_hook;
   __free_hook = save_free_hook;
@@ -277,15 +273,8 @@  ptmalloc_unlock_all (void)
   (void) mutex_unlock (&list_lock);
 }
 
-# ifdef __linux__
-
-/* In NPTL, unlocking a mutex in the child process after a
-   fork() is currently unsafe, whereas re-initializing it is safe and
-   does not leak resources.  Therefore, a special atfork handler is
-   installed for the child. */
-
-static void
-ptmalloc_unlock_all2 (void)
+void
+__malloc_fork_unlock_child (void)
 {
   mstate ar_ptr;
 
@@ -321,11 +310,6 @@  ptmalloc_unlock_all2 (void)
   atfork_recursive_cntr = 0;
 }
 
-# else
-
-#  define ptmalloc_unlock_all2 ptmalloc_unlock_all
-# endif
-
 /* Initialization routine. */
 #include <string.h>
 extern char **_environ;
@@ -394,7 +378,6 @@  ptmalloc_init (void)
 #endif
 
   thread_arena = &main_arena;
-  thread_atfork (ptmalloc_lock_all, ptmalloc_unlock_all, ptmalloc_unlock_all2);
   const char *s = NULL;
   if (__glibc_likely (_environ != NULL))
     {
@@ -469,14 +452,6 @@  ptmalloc_init (void)
   __malloc_initialized = 1;
 }
 
-/* There are platforms (e.g. Hurd) with a link-time hook mechanism. */
-#ifdef thread_atfork_static
-thread_atfork_static (ptmalloc_lock_all, ptmalloc_unlock_all,		      \
-                      ptmalloc_unlock_all2)
-#endif
-
-
-
 /* Managing heaps and arenas (for concurrent threads) */
 
 #if MALLOC_DEBUG > 1
@@ -822,7 +797,8 @@  _int_new_arena (size_t size)
      limit is reached).  At this point, some arena has to be attached
      to two threads.  We could acquire the arena lock before list_lock
      to make it less likely that reused_arena picks this new arena,
-     but this could result in a deadlock with ptmalloc_lock_all.  */
+     but this could result in a deadlock with
+     __malloc_fork_lock_parent.  */
 
   (void) mutex_lock (&a->mutex);
 
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
new file mode 100644
index 0000000..b830d3f
--- /dev/null
+++ b/malloc/malloc-internal.h
@@ -0,0 +1,32 @@ 
+/* Internal declarations for malloc, for use within libc.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_PRIVATE_H
+#define _MALLOC_PRIVATE_H
+
+/* Called in the parent process before a fork.  */
+void __malloc_fork_lock_parent (void) internal_function attribute_hidden;
+
+/* Called in the parent process after a fork.  */
+void __malloc_fork_unlock_parent (void) internal_function attribute_hidden;
+
+/* Called in the child process after a fork.  */
+void __malloc_fork_unlock_child (void) internal_function attribute_hidden;
+
+
+#endif /* _MALLOC_PRIVATE_H */
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1eed794..7aad75a 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -244,6 +244,7 @@ 
 /* For ALIGN_UP et. al.  */
 #include <libc-internal.h>
 
+#include <malloc/malloc-internal.h>
 
 /*
   Debugging:
diff --git a/malloc/tst-malloc-fork-deadlock.c b/malloc/tst-malloc-fork-deadlock.c
new file mode 100644
index 0000000..3bd876b
--- /dev/null
+++ b/malloc/tst-malloc-fork-deadlock.c
@@ -0,0 +1,220 @@ 
+/* Test concurrent fork, getline, and fflush (NULL).
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#include <sys/wait.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <time.h>
+#include <string.h>
+#include <signal.h>
+
+static int do_test (void);
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
+
+enum {
+  /* Number of threads which call fork.  */
+  fork_thread_count = 4,
+  /* Number of threads which call getline (and, indirectly,
+     malloc).  */
+  read_thread_count = 8,
+};
+
+static bool termination_requested;
+
+static void *
+fork_thread_function (void *closure)
+{
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      pid_t pid = fork ();
+      if (pid < 0)
+        {
+          printf ("error: fork: %m\n");
+          abort ();
+        }
+      else if (pid == 0)
+        _exit (17);
+
+      int status;
+      if (waitpid (pid, &status, 0) < 0)
+        {
+          printf ("error: waitpid: %m\n");
+          abort ();
+        }
+      if (!WIFEXITED (status) || WEXITSTATUS (status) != 17)
+        {
+          printf ("error: waitpid returned invalid status: %d\n", status);
+          abort ();
+        }
+    }
+  return NULL;
+}
+
+static char *file_to_read;
+
+static void *
+read_thread_function (void *closure)
+{
+  FILE *f = fopen (file_to_read, "r");
+  if (f == NULL)
+    {
+      printf ("error: fopen (%s): %m\n", file_to_read);
+      abort ();
+    }
+
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      rewind (f);
+      char *line = NULL;
+      size_t line_allocated = 0;
+      ssize_t ret = getline (&line, &line_allocated, f);
+      if (ret < 0)
+        {
+          printf ("error: getline: %m\n");
+          abort ();
+        }
+      free (line);
+    }
+  fclose (f);
+
+  return NULL;
+}
+
+static void *
+flushall_thread_function (void *closure)
+{
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    if (fflush (NULL) != 0)
+      {
+        printf ("error: fflush (NULL): %m\n");
+        abort ();
+      }
+  return NULL;
+}
+
+static void
+create_threads (pthread_t *threads, size_t count, void *(*func) (void *))
+{
+  for (size_t i = 0; i < count; ++i)
+    {
+      int ret = pthread_create (threads + i, NULL, func, NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_create: %m\n");
+          abort ();
+        }
+    }
+}
+
+static void
+join_threads (pthread_t *threads, size_t count)
+{
+  for (size_t i = 0; i < count; ++i)
+    {
+      int ret = pthread_join (threads[i], NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_join: %m\n");
+          abort ();
+        }
+    }
+}
+
+/* Create a file which consists of a single long line, and assigns
+   file_to_read.  The hope is that this triggers an allocation in
+   getline which needs a lock.  */
+static void
+create_file_with_large_line (void)
+{
+  int fd = create_temp_file ("bug19431-large-line", &file_to_read);
+  if (fd < 0)
+    {
+      printf ("error: create_temp_file: %m\n");
+      abort ();
+    }
+  FILE *f = fdopen (fd, "w+");
+  if (f == NULL)
+    {
+      printf ("error: fdopen: %m\n");
+      abort ();
+    }
+  for (int i = 0; i < 50000; ++i)
+    fputc ('x', f);
+  fputc ('\n', f);
+  if (ferror (f))
+    {
+      printf ("error: fputc: %m\n");
+      abort ();
+    }
+  if (fclose (f) != 0)
+    {
+      printf ("error: fclose: %m\n");
+      abort ();
+    }
+}
+
+static int
+do_test (void)
+{
+  /* Make sure that we do not exceed the arena limit with the number
+     of threads we configured.  */
+  if (mallopt (M_ARENA_MAX, 400) == 0)
+    {
+      printf ("error: mallopt (M_ARENA_MAX) failed\n");
+      return 1;
+    }
+
+  /* Leave some room for shutting down all threads gracefully.  */
+  int timeout = 3;
+  if (timeout > TIMEOUT)
+    timeout = TIMEOUT - 1;
+
+  create_file_with_large_line ();
+
+  pthread_t fork_threads[fork_thread_count];
+  create_threads (fork_threads, fork_thread_count, fork_thread_function);
+  pthread_t read_threads[read_thread_count];
+  create_threads (read_threads, read_thread_count, read_thread_function);
+  pthread_t flushall_threads[1];
+  create_threads (flushall_threads, 1, flushall_thread_function);
+
+  struct timespec ts = {timeout, 0};
+  if (nanosleep (&ts, NULL))
+    {
+      printf ("error: error: nanosleep: %m\n");
+      abort ();
+    }
+
+  __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED);
+
+  join_threads (flushall_threads, 1);
+  join_threads (read_threads, read_thread_count);
+  join_threads (fork_threads, fork_thread_count);
+
+  free (file_to_read);
+
+  return 0;
+}
diff --git a/manual/memory.texi b/manual/memory.texi
index 3d99147..a3ecc0d 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -1051,14 +1051,6 @@  systems that do not support @w{ISO C11}.
 @c     _dl_addr_inside_object ok
 @c    determine_info ok
 @c    __rtld_lock_unlock_recursive (dl_load_lock) @aculock
-@c   thread_atfork @asulock @aculock @acsfd @acsmem
-@c    __register_atfork @asulock @aculock @acsfd @acsmem
-@c     lll_lock (__fork_lock) @asulock @aculock
-@c     fork_handler_alloc @asulock @aculock @acsfd @acsmem
-@c      calloc dup @asulock @aculock @acsfd @acsmem
-@c     __linkin_atfork ok
-@c      catomic_compare_and_exchange_bool_acq ok
-@c     lll_unlock (__fork_lock) @aculock
 @c   *_environ @mtsenv
 @c   next_env_entry ok
 @c   strcspn dup ok
diff --git a/sysdeps/mach/hurd/fork.c b/sysdeps/mach/hurd/fork.c
index ad09fd7..2e8b59e 100644
--- a/sysdeps/mach/hurd/fork.c
+++ b/sysdeps/mach/hurd/fork.c
@@ -26,6 +26,7 @@ 
 #include <assert.h>
 #include "hurdmalloc.h"		/* XXX */
 #include <tls.h>
+#include <malloc/malloc-internal.h>
 
 #undef __fork
 
@@ -107,6 +108,12 @@  __fork (void)
       /* Run things that prepare for forking before we create the task.  */
       RUN_HOOK (_hurd_fork_prepare_hook, ());
 
+      /* Acquire malloc locks.  This needs to come last because fork
+	 handlers may use malloc, and the libio list lock has an
+	 indirect malloc dependency as well (via the getdelim
+	 function).  */
+      __malloc_fork_lock_parent ();
+
       /* Lock things that want to be locked before we fork.  */
       {
 	void *const *p;
@@ -604,6 +611,9 @@  __fork (void)
 			   nthreads * sizeof (*threads));
 	}
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_parent ();
+
       /* Run things that want to run in the parent to restore it to
 	 normality.  Usually prepare hooks and parent hooks are
 	 symmetrical: the prepare hook arrests state in some way for the
@@ -655,6 +665,9 @@  __fork (void)
       /* Forking clears the trace flag.  */
       __sigemptyset (&_hurdsig_traced);
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_child ();
+
       /* Run things that want to run in the child task to set up.  */
       RUN_HOOK (_hurd_fork_child_hook, ());
 
diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index 27f8d52..1a68cbd 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -31,7 +31,7 @@ 
 #include <fork.h>
 #include <arch-fork.h>
 #include <futex-internal.h>
-
+#include <malloc/malloc-internal.h>
 
 static void
 fresetlockfiles (void)
@@ -111,6 +111,11 @@  __libc_fork (void)
 
   _IO_list_lock ();
 
+  /* Acquire malloc locks.  This needs to come last because fork
+     handlers may use malloc, and the libio list lock has an indirect
+     malloc dependency as well (via the getdelim function).  */
+  __malloc_fork_lock_parent ();
+
 #ifndef NDEBUG
   pid_t ppid = THREAD_GETMEM (THREAD_SELF, tid);
 #endif
@@ -168,6 +173,9 @@  __libc_fork (void)
 # endif
 #endif
 
+      /* Release malloc locks.  */
+      __malloc_fork_unlock_child ();
+
       /* Reset the file list.  These are recursive mutexes.  */
       fresetlockfiles ();
 
@@ -209,6 +217,9 @@  __libc_fork (void)
       /* Restore the PID value.  */
       THREAD_SETMEM (THREAD_SELF, pid, parentpid);
 
+      /* Release malloc locks, parent process variant.  */
+      __malloc_fork_unlock_parent ();
+
       /* We execute this even if the 'fork' call failed.  */
       _IO_list_unlock ();