Patchwork nptl: Fix deadlock on atfork handler which calls dlclose (BZ#24595)

login
register
mail settings
Submitter Adhemerval Zanella Netto
Date May 23, 2019, 1:30 p.m.
Message ID <20190523133048.14922-1-adhemerval.zanella@linaro.org>
Download mbox | patch
Permalink /patch/32829/
State New
Headers show

Comments

Adhemerval Zanella Netto - May 23, 2019, 1:30 p.m.
Some real-world cases rely upon that atfork handlers can call functions
that might change the atfork handlers, such as dlclose.  Since 27761a10
(Refactor atfork handlers), all internal atfork handlers access is
protected with a simple lock, not allowing reentrancy.  This leads to
deadlocks for the aforementioned scenario. Although this specific usage
is far from portable (as comment #2 in the bug report), glibc did allow
it.

This patch fixes by using a recursive lock along with a double-linked
list to hold the atfork handlers.  It allows atfork handlers to remove
elements through dlclose without invaliding the forward or backward
walking when issuing the atfork handlers.

Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
powerpc-linux-gnu, and aarch64-linux-gnu.

	* nptl/Makefile [build-shared == yes] (tests): Add tst-atfork3.
	(modules-names): Add tst-atfork3mod.
	(tst-atfork3mod.so-no-z-defs, $(objpfx)tst-atfork3:,
	LDFLAGS-tst-atfork3, $(objpfx)tst-atfork3mod.so,
	$(objpfx)tst-atfork3.out): New rules.
	* nptl/register-atfork.c: (__register_atfork, __unregister_atfork,
	__run_fork_handlers, libc_freeres_fn): Remove usage of dynarray in
	 favor of a double linked list.
	(atfork_lock): Change to a recursive lock.
	(fork_handlers_push_back, fork_handlers_remove,
	fork_handlers_remove_if, fork_handlers_remove_all): New functions.
	(fork_handler_list_find): Remove function.
	* nptl/tst-atfork3.c, nptl/tst-atfork3mod.c: New files.
	* sysdeps/nptl/fork.h (fork_handler): Add next and prev members.
---
 nptl/Makefile          |  10 ++-
 nptl/register-atfork.c | 194 +++++++++++++++++++++++------------------
 nptl/tst-atfork3.c     | 118 +++++++++++++++++++++++++
 nptl/tst-atfork3mod.c  |  45 ++++++++++
 sysdeps/nptl/fork.h    |   2 +
 5 files changed, 279 insertions(+), 90 deletions(-)
 create mode 100644 nptl/tst-atfork3.c
 create mode 100644 nptl/tst-atfork3mod.c
Carlos O'Donell - May 23, 2019, 2:36 p.m.
On 5/23/19 8:30 AM, Adhemerval Zanella wrote:
> Some real-world cases rely upon that atfork handlers can call functions
> that might change the atfork handlers, such as dlclose.  Since 27761a10
> (Refactor atfork handlers), all internal atfork handlers access is
> protected with a simple lock, not allowing reentrancy.  This leads to
> deadlocks for the aforementioned scenario. Although this specific usage
> is far from portable (as comment #2 in the bug report), glibc did allow
> it.
> 
> This patch fixes by using a recursive lock along with a double-linked
> list to hold the atfork handlers.  It allows atfork handlers to remove
> elements through dlclose without invaliding the forward or backward
> walking when issuing the atfork handlers.
> 
> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
> powerpc-linux-gnu, and aarch64-linux-gnu.

Florian and I talked about this bug this morning, and I agreed with him
that we really need to get away from running foreign funcions with locks
held. Such solutions always lead to problems. If we are going to refactor
this isn't there a way we can do so that runs the foreign function without
holding the list lock?

Your solution looks correct to me, it's an incremental improvement that
fixes the underlying issue by allowing the lock to be recursive, so it
feels like a good stop gap. I'm just wondering out loud if we can do better
without too much more work.

For example look at stdlib/exit.c where we unlock the list lock before
running the handlers. We make a copy of the data we need and run the 
function. Granted in this case it's only valid for one thread to ever be
calling _exit(), so we have it easier. In the atfork case we can have many
concurrent fork() calls in flight. Can we assume that the user is going
to coordinate their own dlclose vs. library-atfork-handler call?

Thoughts?
Adhemerval Zanella Netto - May 23, 2019, 2:53 p.m.
On 23/05/2019 11:36, Carlos O'Donell wrote:
> On 5/23/19 8:30 AM, Adhemerval Zanella wrote:
>> Some real-world cases rely upon that atfork handlers can call functions
>> that might change the atfork handlers, such as dlclose.  Since 27761a10
>> (Refactor atfork handlers), all internal atfork handlers access is
>> protected with a simple lock, not allowing reentrancy.  This leads to
>> deadlocks for the aforementioned scenario. Although this specific usage
>> is far from portable (as comment #2 in the bug report), glibc did allow
>> it.
>>
>> This patch fixes by using a recursive lock along with a double-linked
>> list to hold the atfork handlers.  It allows atfork handlers to remove
>> elements through dlclose without invaliding the forward or backward
>> walking when issuing the atfork handlers.
>>
>> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
>> powerpc-linux-gnu, and aarch64-linux-gnu.
> 
> Florian and I talked about this bug this morning, and I agreed with him
> that we really need to get away from running foreign funcions with locks
> held. Such solutions always lead to problems. If we are going to refactor
> this isn't there a way we can do so that runs the foreign function without
> holding the list lock?
> 
> Your solution looks correct to me, it's an incremental improvement that
> fixes the underlying issue by allowing the lock to be recursive, so it
> feels like a good stop gap. I'm just wondering out loud if we can do better
> without too much more work.
> 
> For example look at stdlib/exit.c where we unlock the list lock before
> running the handlers. We make a copy of the data we need and run the 
> function. Granted in this case it's only valid for one thread to ever be
> calling _exit(), so we have it easier. In the atfork case we can have many
> concurrent fork() calls in flight. Can we assume that the user is going
> to coordinate their own dlclose vs. library-atfork-handler call?

We can go back to old behaviour of using a lock-free registration on the
atfork lists, make a lock-free copy on fork, and handle the synchronization
with __unregister_atfork using locks plus futexes.  It would require all the
old complexity of mixing lock-free algorithms with locks accesses plus using 
an unbounded alloca on fork (maybe we could just use malloc to allocate
the backup list, my understanding it is avoid to to add another issue to
make fork async-signal-safe).

I still think the old solution is over-complex and the issue lays more
in the fact atfork handlers are under-specified under POSIX. My understanding
is also this problem is also out of the scope of POSIX, so I think it should
be an implementation detail to specify what exactly atfork handlers are
allowed to call.

Now, back to your question I think once user start to use dlclose on
library-atfork-handler I support they know are they are doing (even though
this is far from portable is not a bad design imho). Maybe we can check
exactly what user in trying on the pcks11 module and work towards a more
sane solution.
Florian Weimer - May 23, 2019, 5:31 p.m.
* Adhemerval Zanella:

> We can go back to old behaviour of using a lock-free registration on the
> atfork lists, make a lock-free copy on fork, and handle the synchronization
> with __unregister_atfork using locks plus futexes.  It would require all the
> old complexity of mixing lock-free algorithms with locks accesses plus using 
> an unbounded alloca on fork (maybe we could just use malloc to allocate
> the backup list, my understanding it is avoid to to add another issue to
> make fork async-signal-safe).

I think we can avoid the async-signal-safety issue (for now) by not
using locks in single-threaded mode.  pthread_atfork is not required to
be async-signal-safe, so we need not worry about fork/pthread_at_fork
interactions in single-threaded processes, beyond the modification of
the handler list from the handler itself.

Maybe we can avoid making the copy for every fork call, using some sort
of copy-on-write list?  On the reader side, the protocol would look like
this:

  lock the fork handler mutex
    get pointer to the dynlist head from a global variable
    increment the reference counter next to the dynlist head
  unlock the fork handler mutex

  perform all the fork work, traversing the list as needed
  (without any locking)

  lock the fork handler mutex
    decrement the reference counter
    if the counter is zero, delocate the entire data structure
      (list and struct containing list head plus reference counter)
  unlock the fork handler mutex

On the registration/unregistration side (pthread_atfork,
__unregister_atfork), we would do this:

  lock the fork handler mutex
    get pointer to the dynlist head from the global variable
    if the reference counter is 1:
      modify the list in place
    else:
      make a modified copy of the list
      update the global variable to point to it
      decrement the reference counter in the original list
  unlock the fork handler mutex

(The reference counter in the quiet state would have to be 1, because
the list is referenced from the global variable.)

I think this is still simpler than the original scheme.  It also ensures
that we are not modifying the list during the traversal in fork.
Instead, we update a list that will be used for future forks.

What do you think?

Thanks,
Florian
Adhemerval Zanella Netto - May 23, 2019, 6:33 p.m.
On 23/05/2019 14:31, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> We can go back to old behaviour of using a lock-free registration on the
>> atfork lists, make a lock-free copy on fork, and handle the synchronization
>> with __unregister_atfork using locks plus futexes.  It would require all the
>> old complexity of mixing lock-free algorithms with locks accesses plus using 
>> an unbounded alloca on fork (maybe we could just use malloc to allocate
>> the backup list, my understanding it is avoid to to add another issue to
>> make fork async-signal-safe).
> 
> I think we can avoid the async-signal-safety issue (for now) by not
> using locks in single-threaded mode.  pthread_atfork is not required to
> be async-signal-safe, so we need not worry about fork/pthread_at_fork
> interactions in single-threaded processes, beyond the modification of
> the handler list from the handler itself.
> 
> Maybe we can avoid making the copy for every fork call, using some sort
> of copy-on-write list?  On the reader side, the protocol would look like
> this:
> 
>   lock the fork handler mutex
>     get pointer to the dynlist head from a global variable
>     increment the reference counter next to the dynlist head
>   unlock the fork handler mutex
> 
>   perform all the fork work, traversing the list as needed
>   (without any locking)
> 
>   lock the fork handler mutex
>     decrement the reference counter
>     if the counter is zero, delocate the entire data structure
>       (list and struct containing list head plus reference counter)
>   unlock the fork handler mutex
> 
> On the registration/unregistration side (pthread_atfork,
> __unregister_atfork), we would do this:
> 
>   lock the fork handler mutex
>     get pointer to the dynlist head from the global variable
>     if the reference counter is 1:
>       modify the list in place
>     else:
>       make a modified copy of the list
>       update the global variable to point to it
>       decrement the reference counter in the original list
>   unlock the fork handler mutex
> 
> (The reference counter in the quiet state would have to be 1, because
> the list is referenced from the global variable.)
> 
> I think this is still simpler than the original scheme.  It also ensures
> that we are not modifying the list during the traversal in fork.
> Instead, we update a list that will be used for future forks.
> 
> What do you think?

The solution sounds correct, but I don't have a strong opinion if this
is really an improvement over a recursive lock plus a linked list.  It 
potentially adds 'free' calls in fork for multithread mode if list needs 
to be deallocated. Also, since the locks is internal to register-atfork.c
we might have a better control to make the exported interfaces not 
deadlock.
Florian Weimer - May 23, 2019, 7:50 p.m.
* Adhemerval Zanella:

> The solution sounds correct, but I don't have a strong opinion if this
> is really an improvement over a recursive lock plus a linked list.  It 
> potentially adds 'free' calls in fork for multithread mode if list needs 
> to be deallocated. Also, since the locks is internal to register-atfork.c
> we might have a better control to make the exported interfaces not 
> deadlock.

Recursive locks do not prevent deadlocks if the callback launches
threads.  I'm not sure if this can be considered a stretch.  If
there's dlclose involved, their might also be dlopen, and threads
launched from an ELF constructor and so on.

Furthermore, I'm not sure if the implementation of __unregister_atfork
is correct.  I think you have a potential use-after-free issue if the
callback deregisters a fork handler.
Adhemerval Zanella Netto - May 23, 2019, 11:46 p.m.
On 23/05/2019 16:50, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> The solution sounds correct, but I don't have a strong opinion if this
>> is really an improvement over a recursive lock plus a linked list.  It 
>> potentially adds 'free' calls in fork for multithread mode if list needs 
>> to be deallocated. Also, since the locks is internal to register-atfork.c
>> we might have a better control to make the exported interfaces not 
>> deadlock.
> 
> Recursive locks do not prevent deadlocks if the callback launches
> threads.  I'm not sure if this can be considered a stretch.  If
> there's dlclose involved, their might also be dlopen, and threads
> launched from an ELF constructor and so on.

That's why I said the core issue is atfork handlers are really
underspecified, and maybe it would be better if we document more
thoughtfully what we really aim to support by atfork handlers.

> 
> Furthermore, I'm not sure if the implementation of __unregister_atfork
> is correct.  I think you have a potential use-after-free issue if the
> callback deregisters a fork handler.
> 

The possible way I can think of is if an atfork handler from a module
try to dlclose itself and I am not sure how valid it is.

Anyhow, I will implement your suggestion and check the resulting code.
I really would like to avoid introduce a free call to fork again and
make this rather complex atfork handler simpler.
Florian Weimer - May 24, 2019, 11:06 a.m.
* Adhemerval Zanella:

> On 23/05/2019 16:50, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> The solution sounds correct, but I don't have a strong opinion if this
>>> is really an improvement over a recursive lock plus a linked list.  It 
>>> potentially adds 'free' calls in fork for multithread mode if list needs 
>>> to be deallocated. Also, since the locks is internal to register-atfork.c
>>> we might have a better control to make the exported interfaces not 
>>> deadlock.
>> 
>> Recursive locks do not prevent deadlocks if the callback launches
>> threads.  I'm not sure if this can be considered a stretch.  If
>> there's dlclose involved, their might also be dlopen, and threads
>> launched from an ELF constructor and so on.
>
> That's why I said the core issue is atfork handlers are really
> underspecified, and maybe it would be better if we document more
> thoughtfully what we really aim to support by atfork handlers.

That's also true.

>> Furthermore, I'm not sure if the implementation of __unregister_atfork
>> is correct.  I think you have a potential use-after-free issue if the
>> callback deregisters a fork handler.

> The possible way I can think of is if an atfork handler from a module
> try to dlclose itself and I am not sure how valid it is.

dlclose'ing itself is not really valid, I think.  It can only work if
the dlclose call is a tail call.

__unregister_atfork still adds a use-after-free issue, even with the
copy-on-write list: If a dlclose unregisters a fork handler, we will
still attempt to call it subsequently if it was on the list before.  No
data race is involved if the dlclose call happens itself in a fork
handler (so we are back to the territory of the bug).

So the simple reference counting scheme I proposed is probably not the
right solution, and we need a different data structure.

Thanks,
Florian
Adhemerval Zanella Netto - May 24, 2019, 1:24 p.m.
On 24/05/2019 08:06, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 23/05/2019 16:50, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> The solution sounds correct, but I don't have a strong opinion if this
>>>> is really an improvement over a recursive lock plus a linked list.  It 
>>>> potentially adds 'free' calls in fork for multithread mode if list needs 
>>>> to be deallocated. Also, since the locks is internal to register-atfork.c
>>>> we might have a better control to make the exported interfaces not 
>>>> deadlock.
>>>
>>> Recursive locks do not prevent deadlocks if the callback launches
>>> threads.  I'm not sure if this can be considered a stretch.  If
>>> there's dlclose involved, their might also be dlopen, and threads
>>> launched from an ELF constructor and so on.
>>
>> That's why I said the core issue is atfork handlers are really
>> underspecified, and maybe it would be better if we document more
>> thoughtfully what we really aim to support by atfork handlers.
> 
> That's also true.
> 
>>> Furthermore, I'm not sure if the implementation of __unregister_atfork
>>> is correct.  I think you have a potential use-after-free issue if the
>>> callback deregisters a fork handler.
> 
>> The possible way I can think of is if an atfork handler from a module
>> try to dlclose itself and I am not sure how valid it is.
> 
> dlclose'ing itself is not really valid, I think.  It can only work if
> the dlclose call is a tail call.

I would work towards not make dlclose itself valid, do you see any
reason to allow to do so?

> 
> __unregister_atfork still adds a use-after-free issue, even with the
> copy-on-write list: If a dlclose unregisters a fork handler, we will
> still attempt to call it subsequently if it was on the list before.  No
> data race is involved if the dlclose call happens itself in a fork
> handler (so we are back to the territory of the bug).
> 
> So the simple reference counting scheme I proposed is probably not the
> right solution, and we need a different data structure

My understanding is if the idea of dlclose on atfork handler is to
not just unregister the library but also prevent further arfork handler
to been called (since __cxa_finalize will call __unregister_atfork), 
I still think that the recursive mutex plus double-linked list is
the best a suitable solution if we assume two premisses:

  - It is invalid to dlclose itself.
  - Atfork handlers should be local to modules.

Both should avoid imho the use-after-free issue since the callback
won't try to deregister an already used entry in atfork handler list
walking.
Florian Weimer - May 24, 2019, 2:42 p.m.
* Adhemerval Zanella:

> On 24/05/2019 08:06, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> On 23/05/2019 16:50, Florian Weimer wrote:
>>>> * Adhemerval Zanella:
>>>>
>>>>> The solution sounds correct, but I don't have a strong opinion if this
>>>>> is really an improvement over a recursive lock plus a linked list.  It 
>>>>> potentially adds 'free' calls in fork for multithread mode if list needs 
>>>>> to be deallocated. Also, since the locks is internal to register-atfork.c
>>>>> we might have a better control to make the exported interfaces not 
>>>>> deadlock.
>>>>
>>>> Recursive locks do not prevent deadlocks if the callback launches
>>>> threads.  I'm not sure if this can be considered a stretch.  If
>>>> there's dlclose involved, their might also be dlopen, and threads
>>>> launched from an ELF constructor and so on.
>>>
>>> That's why I said the core issue is atfork handlers are really
>>> underspecified, and maybe it would be better if we document more
>>> thoughtfully what we really aim to support by atfork handlers.
>> 
>> That's also true.
>> 
>>>> Furthermore, I'm not sure if the implementation of __unregister_atfork
>>>> is correct.  I think you have a potential use-after-free issue if the
>>>> callback deregisters a fork handler.
>> 
>>> The possible way I can think of is if an atfork handler from a module
>>> try to dlclose itself and I am not sure how valid it is.
>> 
>> dlclose'ing itself is not really valid, I think.  It can only work if
>> the dlclose call is a tail call.
>
> I would work towards not make dlclose itself valid, do you see any
> reason to allow to do so?

Sorry, what do you mean?  Inspect the caller address and reject the
dlclose call if it's within the DSO being closed?  This just avoids a
crash, instead resulting in a dlclose error return.  Not sure if this is
particularly relevant.

>> __unregister_atfork still adds a use-after-free issue, even with the
>> copy-on-write list: If a dlclose unregisters a fork handler, we will
>> still attempt to call it subsequently if it was on the list before.  No
>> data race is involved if the dlclose call happens itself in a fork
>> handler (so we are back to the territory of the bug).
>> 
>> So the simple reference counting scheme I proposed is probably not the
>> right solution, and we need a different data structure
>
> My understanding is if the idea of dlclose on atfork handler is to
> not just unregister the library but also prevent further arfork handler
> to been called (since __cxa_finalize will call __unregister_atfork), 
> I still think that the recursive mutex plus double-linked list is
> the best a suitable solution if we assume two premisses:
>
>   - It is invalid to dlclose itself.
>   - Atfork handlers should be local to modules.
>
> Both should avoid imho the use-after-free issue since the callback
> won't try to deregister an already used entry in atfork handler list
> walking.

I'm not sure about that.  The dlclose could remove a fork handler that
has yet to be called.  The handlers may not have been registered in load
order.

Thanks,
Florian
Adhemerval Zanella Netto - May 24, 2019, 2:49 p.m.
On 24/05/2019 11:42, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 24/05/2019 08:06, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> On 23/05/2019 16:50, Florian Weimer wrote:
>>>>> * Adhemerval Zanella:
>>>>>
>>>>>> The solution sounds correct, but I don't have a strong opinion if this
>>>>>> is really an improvement over a recursive lock plus a linked list.  It 
>>>>>> potentially adds 'free' calls in fork for multithread mode if list needs 
>>>>>> to be deallocated. Also, since the locks is internal to register-atfork.c
>>>>>> we might have a better control to make the exported interfaces not 
>>>>>> deadlock.
>>>>>
>>>>> Recursive locks do not prevent deadlocks if the callback launches
>>>>> threads.  I'm not sure if this can be considered a stretch.  If
>>>>> there's dlclose involved, their might also be dlopen, and threads
>>>>> launched from an ELF constructor and so on.
>>>>
>>>> That's why I said the core issue is atfork handlers are really
>>>> underspecified, and maybe it would be better if we document more
>>>> thoughtfully what we really aim to support by atfork handlers.
>>>
>>> That's also true.
>>>
>>>>> Furthermore, I'm not sure if the implementation of __unregister_atfork
>>>>> is correct.  I think you have a potential use-after-free issue if the
>>>>> callback deregisters a fork handler.
>>>
>>>> The possible way I can think of is if an atfork handler from a module
>>>> try to dlclose itself and I am not sure how valid it is.
>>>
>>> dlclose'ing itself is not really valid, I think.  It can only work if
>>> the dlclose call is a tail call.
>>
>> I would work towards not make dlclose itself valid, do you see any
>> reason to allow to do so?
> 
> Sorry, what do you mean?  Inspect the caller address and reject the
> dlclose call if it's within the DSO being closed?  This just avoids a
> crash, instead resulting in a dlclose error return.  Not sure if this is
> particularly relevant.

Either a runtime or document this is undefined behaviour.

> 
>>> __unregister_atfork still adds a use-after-free issue, even with the
>>> copy-on-write list: If a dlclose unregisters a fork handler, we will
>>> still attempt to call it subsequently if it was on the list before.  No
>>> data race is involved if the dlclose call happens itself in a fork
>>> handler (so we are back to the territory of the bug).
>>>
>>> So the simple reference counting scheme I proposed is probably not the
>>> right solution, and we need a different data structure
>>
>> My understanding is if the idea of dlclose on atfork handler is to
>> not just unregister the library but also prevent further arfork handler
>> to been called (since __cxa_finalize will call __unregister_atfork), 
>> I still think that the recursive mutex plus double-linked list is
>> the best a suitable solution if we assume two premisses:
>>
>>   - It is invalid to dlclose itself.
>>   - Atfork handlers should be local to modules.
>>
>> Both should avoid imho the use-after-free issue since the callback
>> won't try to deregister an already used entry in atfork handler list
>> walking.
> 
> I'm not sure about that.  The dlclose could remove a fork handler that
> has yet to be called.  The handlers may not have been registered in load
> order.
> 

That's should not be a problem. My understanding is respecting the premisses,
a module unregistering an atfork handler will keep the atfork handler list 
concise (i.e, no dangling pointer or entry with invalid callback handlers).

Patch

diff --git a/nptl/Makefile b/nptl/Makefile
index de312b3477..a47a112dc5 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -391,7 +391,7 @@  tests += tst-cancelx2 tst-cancelx3 tst-cancelx4 tst-cancelx5 \
 	 tst-oncex3 tst-oncex4
 ifeq ($(build-shared),yes)
 tests += tst-atfork2 tst-tls4 tst-_res1 tst-fini1 tst-compat-forwarder \
-	 tst-audit-threads
+	 tst-audit-threads tst-atfork3
 tests-internal += tst-tls3 tst-tls3-malloc tst-tls5 tst-stackguard1
 tests-nolibpthread += tst-fini1
 ifeq ($(have-z-execstack),yes)
@@ -404,13 +404,14 @@  modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
 		tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
 		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod \
 		tst-join7mod tst-compat-forwarder-mod tst-audit-threads-mod1 \
-		tst-audit-threads-mod2
+		tst-audit-threads-mod2 tst-atfork3mod
 extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) \
 		   tst-cleanup4aux.o tst-cleanupx4aux.o
 test-extras += tst-cleanup4aux tst-cleanupx4aux
 test-modules = $(addprefix $(objpfx),$(addsuffix .so,$(modules-names)))
 
 tst-atfork2mod.so-no-z-defs = yes
+tst-atfork3mod.so-no-z-defs = yes
 tst-tls3mod.so-no-z-defs = yes
 tst-tls5mod.so-no-z-defs = yes
 tst-tls5moda.so-no-z-defs = yes
@@ -547,9 +548,12 @@  tst-cancelx7-ARGS = $(tst-cancel7-ARGS)
 tst-umask1-ARGS = $(objpfx)tst-umask1.temp
 
 $(objpfx)tst-atfork2: $(libdl) $(shared-thread-library)
+$(objpfx)tst-atfork3: $(libdl) $(shared-thread-library)
 LDFLAGS-tst-atfork2 = -rdynamic
+LDFLAGS-tst-atfork3 = -rdynamic
 tst-atfork2-ENV = MALLOC_TRACE=$(objpfx)tst-atfork2.mtrace
 $(objpfx)tst-atfork2mod.so: $(shared-thread-library)
+$(objpfx)tst-atfork3mod.so: $(shared-thread-library)
 
 tst-stack3-ENV = MALLOC_TRACE=$(objpfx)tst-stack3.mtrace
 $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
@@ -649,7 +653,7 @@  $(addprefix $(objpfx), $(tests-reverse)): \
 $(objpfx)../libc.so: $(common-objpfx)libc.so ;
 $(addprefix $(objpfx),$(tests-static) $(xtests-static)): $(objpfx)libpthread.a
 
-$(objpfx)tst-atfork2.out: $(objpfx)tst-atfork2mod.so
+$(objpfx)tst-atfork3.out: $(objpfx)tst-atfork3mod.so
 else
 $(addprefix $(objpfx),$(tests) $(test-srcs)): $(objpfx)libpthread.a
 endif
diff --git a/nptl/register-atfork.c b/nptl/register-atfork.c
index 80a1becb5f..09a61fc210 100644
--- a/nptl/register-atfork.c
+++ b/nptl/register-atfork.c
@@ -20,131 +20,151 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <fork.h>
-#include <atomic.h>
+#include <libc-lock.h>
 
-#define DYNARRAY_ELEMENT           struct fork_handler
-#define DYNARRAY_STRUCT            fork_handler_list
-#define DYNARRAY_PREFIX            fork_handler_list_
-#define DYNARRAY_INITIAL_SIZE      48
-#include <malloc/dynarray-skeleton.c>
+/* The codes uses a recursive mutex plus a linked-list to allow an atfork
+   handler to modify the fork handler list.  A real usercase is when callback
+   calls dlclose to unload a specific library, where the library itself has an
+   atfork handler and thus is removal will trigger __unregister_atfork.
 
-static struct fork_handler_list fork_handlers;
-static bool fork_handler_init = false;
+   On __run_fork_handlers, if the callback triggers __unregister_atfork
+   the fork_handlers_remove_if removes the elements to make sure the
+   walking can progress (either fowards or backwards).  */
 
-static int atfork_lock = LLL_LOCK_INITIALIZER;
+struct fork_handler_list_t
+{
+  struct fork_handler *first;
+  struct fork_handler *last;
+};
+static struct fork_handler_list_t fork_handlers = { NULL, NULL };
+__libc_lock_define_initialized_recursive (static, atfork_lock);
+
+/* Add the elements ENTRY on the end of FORK_HANDLE_LIST.  */
+static void
+fork_handlers_push_back (struct fork_handler *entry)
+{
+  entry->next = NULL;
+  entry->prev = fork_handlers.last;
+  if (entry->prev != NULL)
+    entry->prev->next = entry;
+  if (fork_handlers.first == NULL)
+    fork_handlers.first = entry;
+  fork_handlers.last = entry;
+}
 
-int
-__register_atfork (void (*prepare) (void), void (*parent) (void),
-		   void (*child) (void), void *dso_handle)
+/* Remove the element ENTRY from the list FORK_HANDLE_LIST, updating
+   its internal state.  */
+static void
+fork_handlers_remove (struct fork_handler *entry)
 {
-  lll_lock (atfork_lock, LLL_PRIVATE);
+  if (entry->prev != NULL)
+    entry->prev->next = entry->next;
+  else
+    fork_handlers.first = entry->next;
 
-  if (!fork_handler_init)
-    {
-      fork_handler_list_init (&fork_handlers);
-      fork_handler_init = true;
-    }
+  if (entry->next != NULL)
+    entry->next->prev = entry->prev;
+  else
+    fork_handlers.last = entry->prev;
 
-  struct fork_handler *newp = fork_handler_list_emplace (&fork_handlers);
-  if (newp != NULL)
+  free (entry);
+}
+
+/* Remove all the elements that has its internal dso_handle equal to
+   DSO_HANDLE.  */
+static void
+fork_handlers_remove_if (void *dso_handle)
+{
+  struct fork_handler *it = fork_handlers.first;
+  while (it != NULL)
     {
-      newp->prepare_handler = prepare;
-      newp->parent_handler = parent;
-      newp->child_handler = child;
-      newp->dso_handle = dso_handle;
+      if (it->dso_handle == dso_handle)
+	{
+	  struct fork_handler *entry = it;
+	  it = it->next;
+	  fork_handlers_remove (entry);
+	}
+      else
+	it = it->next;
     }
-
-  /* Release the lock.  */
-  lll_unlock (atfork_lock, LLL_PRIVATE);
-
-  return newp == NULL ? ENOMEM : 0;
 }
-libc_hidden_def (__register_atfork)
 
-static struct fork_handler *
-fork_handler_list_find (struct fork_handler_list *fork_handlers,
-			void *dso_handle)
+/* Remove all elements from FORK_HANDLE_LIST.  */
+static void
+fork_handlers_remove_all (void)
 {
-  for (size_t i = 0; i < fork_handler_list_size (fork_handlers); i++)
+  struct fork_handler *it = fork_handlers.first;
+  while (it != NULL)
     {
-      struct fork_handler *elem = fork_handler_list_at (fork_handlers, i);
-      if (elem->dso_handle == dso_handle)
-	return elem;
+      struct fork_handler *entry = it;
+      it = it->next;
+      fork_handlers_remove (entry);
     }
-  return NULL;
 }
 
-void
-__unregister_atfork (void *dso_handle)
+int
+__register_atfork (void (*prepare) (void), void (*parent) (void),
+		   void (*child) (void), void *dso_handle)
 {
-  lll_lock (atfork_lock, LLL_PRIVATE);
-
-  struct fork_handler *first = fork_handler_list_find (&fork_handlers,
-						       dso_handle);
-  /* Removing is done by shifting the elements in the way the elements
-     that are not to be removed appear in the beginning in dynarray.
-     This avoid the quadradic run-time if a naive strategy to remove and
-     shift one element at time.  */
-  if (first != NULL)
-    {
-      struct fork_handler *new_end = first;
-      first++;
-      for (; first != fork_handler_list_end (&fork_handlers); ++first)
-	{
-	  if (first->dso_handle != dso_handle)
-	    {
-	      *new_end = *first;
-	      ++new_end;
-	    }
-	}
+  struct fork_handler *entry = malloc (sizeof (struct fork_handler));
+  if (entry == NULL)
+    return ENOMEM;
 
-      ptrdiff_t removed = first - new_end;
-      for (size_t i = 0; i < removed; i++)
-	fork_handler_list_remove_last (&fork_handlers);
-    }
+  entry->prepare_handler = prepare;
+  entry->parent_handler = parent;
+  entry->child_handler = child;
+  entry->dso_handle = dso_handle;
+
+  __libc_lock_lock_recursive (atfork_lock);
+  fork_handlers_push_back (entry);
+  __libc_lock_unlock_recursive (atfork_lock);
 
-  lll_unlock (atfork_lock, LLL_PRIVATE);
+  return 0;
 }
+libc_hidden_def (__register_atfork)
 
 void
-__run_fork_handlers (enum __run_fork_handler_type who, _Bool do_locking)
+__unregister_atfork (void *dso_handle)
 {
-  struct fork_handler *runp;
+  __libc_lock_lock_recursive (atfork_lock);
+  fork_handlers_remove_if (dso_handle);
+  __libc_lock_unlock_recursive (atfork_lock);
+}
 
+void
+__run_fork_handlers (enum __run_fork_handler_type who, _Bool do_locking)
+{
   if (who == atfork_run_prepare)
     {
       if (do_locking)
-	lll_lock (atfork_lock, LLL_PRIVATE);
-      size_t sl = fork_handler_list_size (&fork_handlers);
-      for (size_t i = sl; i > 0; i--)
+	__libc_lock_lock_recursive (atfork_lock);
+
+      for (struct fork_handler *it = fork_handlers.last; it != NULL;
+	   it = it->prev)
 	{
-	  runp = fork_handler_list_at (&fork_handlers, i - 1);
-	  if (runp->prepare_handler != NULL)
-	    runp->prepare_handler ();
+	  if (it->prepare_handler != NULL)
+	    it->prepare_handler ();
 	}
     }
   else
     {
-      size_t sl = fork_handler_list_size (&fork_handlers);
-      for (size_t i = 0; i < sl; i++)
+      for (struct fork_handler *it = fork_handlers.first; it != NULL;
+	   it = it->next)
 	{
-	  runp = fork_handler_list_at (&fork_handlers, i);
-	  if (who == atfork_run_child && runp->child_handler)
-	    runp->child_handler ();
-	  else if (who == atfork_run_parent && runp->parent_handler)
-	    runp->parent_handler ();
-	}
+	  if (who == atfork_run_child && it->child_handler != NULL)
+	    it->child_handler ();
+	  else if (who == atfork_run_parent && it->parent_handler != NULL)
+	    it->parent_handler ();
+        }
       if (do_locking)
-	lll_unlock (atfork_lock, LLL_PRIVATE);
+	__libc_lock_unlock_recursive (atfork_lock);
     }
 }
 
 
 libc_freeres_fn (free_mem)
 {
-  lll_lock (atfork_lock, LLL_PRIVATE);
-
-  fork_handler_list_free (&fork_handlers);
-
-  lll_unlock (atfork_lock, LLL_PRIVATE);
+  __libc_lock_lock_recursive (atfork_lock);
+  fork_handlers_remove_all ();
+  __libc_lock_unlock_recursive (atfork_lock);
 }
diff --git a/nptl/tst-atfork3.c b/nptl/tst-atfork3.c
new file mode 100644
index 0000000000..1435ecc86a
--- /dev/null
+++ b/nptl/tst-atfork3.c
@@ -0,0 +1,118 @@ 
+/* Check if pthread_atfork handler can call dlclose (BZ#24595)
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <dlfcn.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+#include <support/check.h>
+#include <support/xthread.h>
+#include <support/capture_subprocess.h>
+
+/* Check if pthread_atfork handlers do not deadlock when calling a function
+   that might alter the internal fork handle list, such as dlclose.
+
+   The test registers a callback set with pthread_atfork(), dlopen() a shared
+   library (nptl/tst-atfork3mod.c), calls an exported symbol from the library
+   (which in turn also registers atfork handlers), and calls fork to trigger
+   the callbacks.  */
+
+static void *handler;
+static bool run_dlclose_prepare;
+static bool run_dlclose_parent;
+static bool run_dlclose_child;
+
+static void
+prepare (void)
+{
+  if (run_dlclose_prepare)
+    TEST_COMPARE (dlclose (handler), 0);
+}
+
+static void
+parent (void)
+{
+  if (run_dlclose_parent)
+    TEST_COMPARE (dlclose (handler), 0);
+}
+
+static void
+child (void)
+{
+  if (run_dlclose_child)
+    TEST_COMPARE (dlclose (handler), 0);
+}
+
+static void
+proc_func (void *closure)
+{
+}
+
+static void
+do_test_generic (bool dlclose_prepare, bool dlclose_parent, bool dlclose_child)
+{
+  run_dlclose_prepare = dlclose_prepare;
+  run_dlclose_parent = dlclose_parent;
+  run_dlclose_child = dlclose_child;
+
+  handler = dlopen ("tst-atfork3mod.so", RTLD_NOW);
+  TEST_VERIFY (handler != NULL);
+
+  int (*atfork3mod_func)(void);
+  atfork3mod_func = dlsym (handler, "atfork3mod_func");
+  TEST_VERIFY (atfork3mod_func != NULL);
+
+  atfork3mod_func ();
+
+  struct support_capture_subprocess proc
+    = support_capture_subprocess (proc_func, NULL);
+  support_capture_subprocess_check (&proc, "tst-atfork3", 0, sc_allow_none);
+
+  handler = atfork3mod_func = NULL;
+}
+
+static void *
+thread_func (void *closure)
+{
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  {
+    /* Make the process acts as multithread.  */
+    pthread_attr_t attr;
+    xpthread_attr_init (&attr);
+    xpthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
+    xpthread_create (&attr, thread_func, NULL);
+  }
+
+  TEST_COMPARE (pthread_atfork (prepare, parent, child), 0);
+
+  do_test_generic (true  /* prepare */, false /* parent */, false /* child */);
+  do_test_generic (false /* prepare */, true  /* parent */, false /* child */);
+  do_test_generic (false /* prepare */, false /* parent */, true  /* child */);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/nptl/tst-atfork3mod.c b/nptl/tst-atfork3mod.c
new file mode 100644
index 0000000000..7449962e57
--- /dev/null
+++ b/nptl/tst-atfork3mod.c
@@ -0,0 +1,45 @@ 
+/* Copyright (C) 2003-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Ulrich Drepper <drepper@redhat.com>, 2003.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include <support/check.h>
+
+static void
+mod_prepare (void)
+{
+}
+
+static void
+mod_parent (void)
+{
+}
+
+static void
+mod_child (void)
+{
+}
+
+int atfork3mod_func (void)
+{
+  TEST_COMPARE (pthread_atfork (mod_prepare, mod_parent, mod_child), 0);
+
+  return 0;
+}
diff --git a/sysdeps/nptl/fork.h b/sysdeps/nptl/fork.h
index 99ed76034b..c742252f8c 100644
--- a/sysdeps/nptl/fork.h
+++ b/sysdeps/nptl/fork.h
@@ -31,6 +31,8 @@  struct fork_handler
   void (*parent_handler) (void);
   void (*child_handler) (void);
   void *dso_handle;
+  struct fork_handler *next;
+  struct fork_handler *prev;
 };
 
 /* Function to call to unregister fork handlers.  */