[BZ,#19329] Fix race during concurrent dlopen and pthread_create

Message ID 1451385533-9485-1-git-send-email-i.palachev@samsung.com
State New, archived
Headers

Commit Message

Ilya Palachev Dec. 29, 2015, 10:38 a.m. UTC
  This patch fixes the bug
https://sourceware.org/bugzilla/show_bug.cgi?id=19329
that happens when pthread_create and dlopen are run concurrently in two
parallel threads. In this case the race between two threads can happen as
follows:

   thread #1                                     thread #2
      ||                                            ||
      ..                                            ..
      ||                                            ||
      vv                                            vv
    dlopen                                     pthread_create
      ||                                            ||
      ..                                            ..
      ||                                            ||
      vv                                            vv
 dl_open_worker----------------------+  _dl_allocate_tls_init--------+
 |                                   |  |                            |
 |  ...                              |  |  ...                       |
 |  _dl_add_to_slotinfo----------+   |  |                            |
 |  |                            |   |  |                            |
 |  |  ...                       |   |  |                            |
 |  |                            |   |  |                            |
 |  |  /* Statement #1.  */      |   |  |                            |
 |  |  listp->slotinfo[idx].gen =|   |  |                            |
 |  |  GL(dl_tls_generation) + 1;|   |  |                            |
 |  |                            |   |  |                            |
 |  |                            |   |  |                            |
 |  +----------------------------+   |  |                            |
 |                                   |  |                            |
 |                                   |  | /* Statement #3.  */       |
 |  /* The race window is here.  */  |  | assert (                   |
 |                                   |  | listp->slotinfo[cnt].gen   |
 |                                   |  | <= GL(dl_tls_generation)); |
 |                                   |  |                            |
 |  /* Statement #2.  */             |  |                            |
 |  ++GL(dl_tls_generation);         |  |                            |
 |                                   |  |                            |
 |                                   |  |                            |
 +-----------------------------------+  +----------------------------+

The patch changes the code so that to unify statements #1 and #2 in one
atomic block so that to prevent the race window in which the
statement #3 can happen.

The changes have no side effects because the changed check
   if (any_tls && __builtin_expect (GL(dl_tls_generation) == 0, 0))
succeeds iff any_tls is true and hence iff _dl_add_to_slotinfo was
called previously. So it can't happen at startup when
   GL(dl_tls_generation) == 0.
The function _dl_add_to_slotinfo is called only from dl_open_worker
and dl_main in elf/rtld.c where the dependencies of the currently
executed program are loaded.

Built and regtested for {x86_64,aarh64}-linux-gnu.

Signed-off-by: Ilya Palachev <i.palachev@samsung.com>

	[BZ #19329]
	* elf/dl-open.c (dl_open_worker): Remove the incrementation of global
	generation counter from here, since it should happen immediately after
	the addition of new TLS module to the slot info list.
	* elf/dl-tls.c (_dl_add_to_slotinfo): Add the atomic incrementation of
	global generation counter just when the new TLS module is added. This
	prevents the race that existed before.
---
 ChangeLog     | 10 ++++++++++
 elf/dl-open.c | 14 +++++++++++---
 elf/dl-tls.c  |  7 ++++++-
 3 files changed, 27 insertions(+), 4 deletions(-)
  

Comments

Szabolcs Nagy Dec. 29, 2015, 6:06 p.m. UTC | #1
* Ilya Palachev <i.palachev@samsung.com> [2015-12-29 13:38:53 +0300]:
> This patch fixes the bug
> https://sourceware.org/bugzilla/show_bug.cgi?id=19329
> that happens when pthread_create and dlopen are run concurrently in two
> parallel threads. In this case the race between two threads can happen as
> follows:
> 

thanks

i have a different fix for this, but even that's not complete.

there are several globals (related to tls) accessed in dlopen and
pthread_create without synchronization, those should be fixed
even if they haven't caused observable problems yet.

>    thread #1                                     thread #2
>       ||                                            ||
>       ..                                            ..
>       ||                                            ||
>       vv                                            vv
>     dlopen                                     pthread_create
>       ||                                            ||
>       ..                                            ..
>       ||                                            ||
>       vv                                            vv
>  dl_open_worker----------------------+  _dl_allocate_tls_init--------+
>  |                                   |  |                            |
>  |  ...                              |  |  ...                       |
>  |  _dl_add_to_slotinfo----------+   |  |                            |
>  |  |                            |   |  |                            |
>  |  |  ...                       |   |  |                            |
>  |  |                            |   |  |                            |
>  |  |  /* Statement #1.  */      |   |  |                            |
>  |  |  listp->slotinfo[idx].gen =|   |  |                            |
>  |  |  GL(dl_tls_generation) + 1;|   |  |                            |
>  |  |                            |   |  |                            |
>  |  |                            |   |  |                            |
>  |  +----------------------------+   |  |                            |
>  |                                   |  |                            |
>  |                                   |  | /* Statement #3.  */       |
>  |  /* The race window is here.  */  |  | assert (                   |
>  |                                   |  | listp->slotinfo[cnt].gen   |
>  |                                   |  | <= GL(dl_tls_generation)); |
>  |                                   |  |                            |
>  |  /* Statement #2.  */             |  |                            |
>  |  ++GL(dl_tls_generation);         |  |                            |
>  |                                   |  |                            |
>  |                                   |  |                            |
>  +-----------------------------------+  +----------------------------+
> 
> The patch changes the code so that to unify statements #1 and #2 in one
> atomic block so that to prevent the race window in which the
> statement #3 can happen.
> 
> The changes have no side effects because the changed check
>    if (any_tls && __builtin_expect (GL(dl_tls_generation) == 0, 0))
> succeeds iff any_tls is true and hence iff _dl_add_to_slotinfo was
> called previously. So it can't happen at startup when
>    GL(dl_tls_generation) == 0.
> The function _dl_add_to_slotinfo is called only from dl_open_worker
> and dl_main in elf/rtld.c where the dependencies of the currently
> executed program are loaded.
> 

this approach changes behaviour:

previously when a new dso was loaded all of its dependencies got
the same generation number: GL(dl_tls_generation)+1

with the patch each loaded dso increments the generation counter.
(so at least the overflow detection needs to be different)

i don't know if this is a problem, but my approach is conservative
and tries to fix the issue in _dl_allocate_tls_init by only
considering tls for which the generation counter is already updated.

(slotinfo etc also has to use atomics to make the changes visible
to other threads)

i can only post my fix after my holiday (sometime on the first week
of jan), but there still seem to be some fundamental issues with the
design (i think these races can be fixed, but the async-signal safe
allocation of tls will require significant changes here and without
that tls access from signal handlers is broken)

> Built and regtested for {x86_64,aarh64}-linux-gnu.
> 
> Signed-off-by: Ilya Palachev <i.palachev@samsung.com>
> 
> 	[BZ #19329]
> 	* elf/dl-open.c (dl_open_worker): Remove the incrementation of global
> 	generation counter from here, since it should happen immediately after
> 	the addition of new TLS module to the slot info list.
> 	* elf/dl-tls.c (_dl_add_to_slotinfo): Add the atomic incrementation of
> 	global generation counter just when the new TLS module is added. This
> 	prevents the race that existed before.
> ---
>  ChangeLog     | 10 ++++++++++
>  elf/dl-open.c | 14 +++++++++++---
>  elf/dl-tls.c  |  7 ++++++-
>  3 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/ChangeLog b/ChangeLog
> index 16e605a..3d7b007 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,13 @@
> +2015-12-29  Ilya Palachev  <i.palachev@samsung.com>
> +
> +	[BZ #19329]
> +	* elf/dl-open.c (dl_open_worker): Remove the incrementation of global
> +	generation counter from here, since it should happen immediately after
> +	the addition of new TLS module to the slot info list.
> +	* elf/dl-tls.c (_dl_add_to_slotinfo): Add the atomic incrementation of
> +	global generation counter just when the new TLS module is added. This
> +	prevents the race that existed before.
> +
>  2015-12-21  Florian Weimer  <fweimer@redhat.com>
>  
>  	[BZ #19182]
> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index 5429d18..a0eff3b 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -514,7 +514,9 @@ dl_open_worker (void *a)
>  	      && first_static_tls == new->l_searchlist.r_nlist)
>  	    first_static_tls = i;
>  
> -	  /* We have to bump the generation counter.  */
> +	  /* At least one new module with TLS has been loaded. This is the
> +	     only place where the value of any_tls becomes true, and it
> +	     happen when and only when the _dl_add_to_slotinfo was called.  */
>  	  any_tls = true;
>  	}
>  
> @@ -523,8 +525,14 @@ dl_open_worker (void *a)
>  	_dl_show_scope (imap, from_scope);
>      }
>  
> -  /* Bump the generation number if necessary.  */
> -  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
> +  /* Check the TLS generation counter integer overflow if at least one new
> +     module with TLS has been loaded. The condition holds iff any_tls is
> +     true. It can happen only iff _dl_add_to_slotinfo was called (see above
> +     code). And _dl_add_to_slotinfo always increments the value of
> +     GL(dl_tls_generation) atomically (it's done to prevent the race).
> +     That's why the fatal error does not happen in case when
> +     GL(dl_tls_generation) == 0 before any TLS module load.  */
> +  if (any_tls && __builtin_expect (GL(dl_tls_generation) == 0, 0))
>      _dl_fatal_printf (N_("\
>  TLS generation counter wrapped!  Please report this."));
>  
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 20c7e33..3cde943 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -943,5 +943,10 @@ cannot create TLS data structures"));
>  
>    /* Add the information into the slotinfo data structure.  */
>    listp->slotinfo[idx].map = l;
> -  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
> +  /* Atomically increment the global TLS generation counter and set the
> +     generation counter of new TLS module to this updated value. This
> +     helps to prevent the race that happens if somebody tries to read
> +     the dl_tls_generation when listp->slotinfo[idx].gen is already
> +     incremented while GL(dl_tls_generation) was not.  */
> +  listp->slotinfo[idx].gen = atomic_increment_val(&GL(dl_tls_generation));
>  }
> -- 
> 2.1.3
  
Florian Weimer Dec. 29, 2015, 6:48 p.m. UTC | #2
On 12/29/2015 11:38 AM, Ilya Palachev wrote:
> +  listp->slotinfo[idx].gen = atomic_increment_val(&GL(dl_tls_generation));

Space after atomic_increment_val.  And I think it's a legacy macro, not
one of the new C11 macros.  It's surprising that atomic access is not
needed in any other places.  Why is this so?

Does the counter now need overflow protection on 32-bit architectures
because it is incremented more often?

Florian
  
Szabolcs Nagy Dec. 29, 2015, 8:53 p.m. UTC | #3
* Florian Weimer <fweimer@redhat.com> [2015-12-29 19:48:57 +0100]:
> On 12/29/2015 11:38 AM, Ilya Palachev wrote:
> > +  listp->slotinfo[idx].gen = atomic_increment_val(&GL(dl_tls_generation));
> 
> Space after atomic_increment_val.  And I think it's a legacy macro, not
> one of the new C11 macros.  It's surprising that atomic access is not
> needed in any other places.  Why is this so?
> 

using atomic read during pthread_create is missing from the patch
as well as the use of atomics for slotinfo[idx] members.

a correct fix is non-trivial and will add several atomic memory ops.

> Does the counter now need overflow protection on 32-bit architectures
> because it is incremented more often?

it already had overflow protection, the patch broke that though
  
Ilya Palachev Dec. 30, 2015, 9:08 a.m. UTC | #4
On 29.12.2015 21:06, Szabolcs Nagy wrote:
> thanks
>
> i have a different fix for this, but even that's not complete.
>
> there are several globals (related to tls) accessed in dlopen and
> pthread_create without synchronization, those should be fixed
> even if they haven't caused observable problems yet.
Very interesting, so we're waiting for your implementation.

>
> this approach changes behaviour:
>
> previously when a new dso was loaded all of its dependencies got
> the same generation number: GL(dl_tls_generation)+1
>
> with the patch each loaded dso increments the generation counter.
> (so at least the overflow detection needs to be different)
Sorry, now I see the mistake that has been done.

>
> i don't know if this is a problem, but my approach is conservative
> and tries to fix the issue in _dl_allocate_tls_init by only
> considering tls for which the generation counter is already updated.
>
> (slotinfo etc also has to use atomics to make the changes visible
> to other threads)
>
> i can only post my fix after my holiday (sometime on the first week
> of jan), but there still seem to be some fundamental issues with the
> design (i think these races can be fixed, but the async-signal safe
> allocation of tls will require significant changes here and without
> that tls access from signal handlers is broken)
>
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 16e605a..3d7b007 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@ 
+2015-12-29  Ilya Palachev  <i.palachev@samsung.com>
+
+	[BZ #19329]
+	* elf/dl-open.c (dl_open_worker): Remove the incrementation of global
+	generation counter from here, since it should happen immediately after
+	the addition of new TLS module to the slot info list.
+	* elf/dl-tls.c (_dl_add_to_slotinfo): Add the atomic incrementation of
+	global generation counter just when the new TLS module is added. This
+	prevents the race that existed before.
+
 2015-12-21  Florian Weimer  <fweimer@redhat.com>
 
 	[BZ #19182]
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 5429d18..a0eff3b 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -514,7 +514,9 @@  dl_open_worker (void *a)
 	      && first_static_tls == new->l_searchlist.r_nlist)
 	    first_static_tls = i;
 
-	  /* We have to bump the generation counter.  */
+	  /* At least one new module with TLS has been loaded. This is the
+	     only place where the value of any_tls becomes true, and it
+	     happen when and only when the _dl_add_to_slotinfo was called.  */
 	  any_tls = true;
 	}
 
@@ -523,8 +525,14 @@  dl_open_worker (void *a)
 	_dl_show_scope (imap, from_scope);
     }
 
-  /* Bump the generation number if necessary.  */
-  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
+  /* Check the TLS generation counter integer overflow if at least one new
+     module with TLS has been loaded. The condition holds iff any_tls is
+     true. It can happen only iff _dl_add_to_slotinfo was called (see above
+     code). And _dl_add_to_slotinfo always increments the value of
+     GL(dl_tls_generation) atomically (it's done to prevent the race).
+     That's why the fatal error does not happen in case when
+     GL(dl_tls_generation) == 0 before any TLS module load.  */
+  if (any_tls && __builtin_expect (GL(dl_tls_generation) == 0, 0))
     _dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
 
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 20c7e33..3cde943 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -943,5 +943,10 @@  cannot create TLS data structures"));
 
   /* Add the information into the slotinfo data structure.  */
   listp->slotinfo[idx].map = l;
-  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  /* Atomically increment the global TLS generation counter and set the
+     generation counter of new TLS module to this updated value. This
+     helps to prevent the race that happens if somebody tries to read
+     the dl_tls_generation when listp->slotinfo[idx].gen is already
+     incremented while GL(dl_tls_generation) was not.  */
+  listp->slotinfo[idx].gen = atomic_increment_val(&GL(dl_tls_generation));
 }