Message ID | 20171017212919.741-1-mathieu.desnoyers@efficios.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 91909 invoked by alias); 17 Oct 2017 21:31:01 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 91853 invoked by uid 89); 17 Oct 2017 21:31:00 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy= X-HELO: mail.efficios.com From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> To: "Carlos O'Donell" <carlos@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, "Ben Maurer" <bmaurer@fb.com>, libc-alpha@sourceware.org Subject: [RFC PATCH glibc] pthread_setspecific: Provide signal-safety across keys Date: Tue, 17 Oct 2017 17:29:19 -0400 Message-Id: <20171017212919.741-1-mathieu.desnoyers@efficios.com> |
Commit Message
Mathieu Desnoyers
Oct. 17, 2017, 9:29 p.m. UTC
The intent here is to provide signal-safety against callers to
pthread_{get/set}specific that work on different pthread keys,
without hurting performance of the normal use-cases that do not
care about signal-safety.
One thing to keep in mind is that callers of pthread_{get/set}specific
on a given key can disable signals if they require signal-safety
against operations on their key.
The real problem in my situation (lttng-ust tracer) is having
pthread_setspecific (invoked by the application) do memory allocation
and update pointers without disabling signals when it needs to allocate
"level2". This only happens the first time a key >=
PTHREAD_KEY_2NDLEVEL_SIZE is encountered by a thread. If lttng-ust
tracing inserted into a signal handler nests over this allocation, the
application pthread key specific may be lost, and memory leaked.
So I wonder how acceptable it would be to make just the memory
allocation part of pthread_setspecific signal-safe ? Technically, this
is not required very often, so it should not cause a significant
overhead.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Carlos O'Donell" <carlos@redhat.com>
CC: "Ben Maurer" <bmaurer@fb.com>
CC: libc-alpha@sourceware.org
---
nptl/pthread_setspecific.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
Comments
* Mathieu Desnoyers: > if (value == NULL) > /* We don't have to do anything. The value would in any case > be NULL. We can save the memory allocation. */ > return 0; > > + /* Ensure that pthread_setspecific and pthread_getspecific are > + signal-safe when used on different keys. */ > + sigfillset (&ss); > + pthread_sigmask (SIG_BLOCK, &ss, &old_ss); > + /* Read level2 again with signals disabled. */ > + level2 = THREAD_GETMEM_NC (self, specific, idx1st); > + if (level2 != NULL) > + goto skip_resize; > + > level2 > = (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE, > sizeof (*level2)); I don't see how this adds signal safety. If the function is called from a signal handler, interrupting the malloc code, then this will still result in undefined behavior (heap corruption or deadlocks). Does lttng-ust call these functions even if the application does not? Then you really need to switch to real thread-local variables with the intial-exec model, possibly with indirection through a thread-local pointer variable and on-demand allocation using mmap in case you do not want to have these allocations in every thread.
On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > The intent here is to provide signal-safety against callers to > pthread_{get/set}specific that work on different pthread keys, > without hurting performance of the normal use-cases that do not > care about signal-safety. If we wish to provide particular safety guarantees beyond whatever is required by POSIX, I think they should be documented in the glibc manual as a GNU extension.
On 10/17/2017 02:54 PM, Florian Weimer wrote: > * Mathieu Desnoyers: > >> if (value == NULL) >> /* We don't have to do anything. The value would in any case >> be NULL. We can save the memory allocation. */ >> return 0; >> >> + /* Ensure that pthread_setspecific and pthread_getspecific are >> + signal-safe when used on different keys. */ >> + sigfillset (&ss); >> + pthread_sigmask (SIG_BLOCK, &ss, &old_ss); >> + /* Read level2 again with signals disabled. */ >> + level2 = THREAD_GETMEM_NC (self, specific, idx1st); >> + if (level2 != NULL) >> + goto skip_resize; >> + >> level2 >> = (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE, >> sizeof (*level2)); > > I don't see how this adds signal safety. If the function is called > from a signal handler, interrupting the malloc code, then this will > still result in undefined behavior (heap corruption or deadlocks). > > Does lttng-ust call these functions even if the application does not? > Then you really need to switch to real thread-local variables with the > intial-exec model, possibly with indirection through a thread-local > pointer variable and on-demand allocation using mmap in case you do > not want to have these allocations in every thread. Agreed. The calloc proceeds in AS context and that leads to undefined behaviour. Please also read Florian's recommendation very carefully, you must us IE model TLS vars to ensure pre-allocation and avoid potential OOM during first variable touch, and to avoid first variable touch in a signal handler resulting in the same problem as above (though we are looking to fix this).
----- On Oct 17, 2017, at 5:54 PM, Florian Weimer fw@deneb.enyo.de wrote: > * Mathieu Desnoyers: > >> if (value == NULL) >> /* We don't have to do anything. The value would in any case >> be NULL. We can save the memory allocation. */ >> return 0; >> >> + /* Ensure that pthread_setspecific and pthread_getspecific are >> + signal-safe when used on different keys. */ >> + sigfillset (&ss); >> + pthread_sigmask (SIG_BLOCK, &ss, &old_ss); >> + /* Read level2 again with signals disabled. */ >> + level2 = THREAD_GETMEM_NC (self, specific, idx1st); >> + if (level2 != NULL) >> + goto skip_resize; >> + >> level2 >> = (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE, >> sizeof (*level2)); > > I don't see how this adds signal safety. If the function is called > from a signal handler, interrupting the malloc code, then this will > still result in undefined behavior (heap corruption or deadlocks). Good point, I should probably turn this calloc into a on-demand mmap or such then. > > Does lttng-ust call these functions even if the application does not? Yes. > Then you really need to switch to real thread-local variables with the > intial-exec model, possibly with indirection through a thread-local > pointer variable and on-demand allocation using mmap in case you do > not want to have these allocations in every thread. Actually, liburcu-bp, which is the library actually using pthread_setspecific, does use a TLS pointer to a mmap'd region. The __thread variable sits in a library shared object, so it's not possible to use the initial-exec model. I work around this by touching the object once in a library constructor before other threads are launched. AFAIU, this takes care of performing the TLS lazy fixup. Thanks, Mathieu
On 10/17/2017 03:09 PM, Mathieu Desnoyers wrote: > The __thread variable sits in a library shared object, so it's not > possible to use the initial-exec model. I work around this by touching > the object once in a library constructor before other threads are > launched. AFAIU, this takes care of performing the TLS lazy fixup. You *can* use IE model, but doing so *might* cause your library not to load if we run out of static TLS space (some amount is reserved by the static linker for loading such libraries that use IE model TLS). The static TLS space is therefore something like a distro-wide global resources :-) However, we don't recommend IE model if you can get away with the performance of GD model and can manage to touch the variable early before it might be used in a signal handler.
On 10/17/2017 03:10 PM, Mathieu Desnoyers wrote: >> Please also read Florian's recommendation very carefully, you must us >> IE model TLS vars to ensure pre-allocation and avoid potential OOM >> during first variable touch, and to avoid first variable touch in a >> signal handler resulting in the same problem as above (though we are >> looking to fix this). > > Hopefully touching the TLS variable from a library constructor sides-steps > this issue. This has worked well in practice so far. It does. Thank you for the clarification.
----- On Oct 17, 2017, at 6:01 PM, Carlos O'Donell carlos@redhat.com wrote: > On 10/17/2017 02:54 PM, Florian Weimer wrote: >> * Mathieu Desnoyers: >> >>> if (value == NULL) >>> /* We don't have to do anything. The value would in any case >>> be NULL. We can save the memory allocation. */ >>> return 0; >>> >>> + /* Ensure that pthread_setspecific and pthread_getspecific are >>> + signal-safe when used on different keys. */ >>> + sigfillset (&ss); >>> + pthread_sigmask (SIG_BLOCK, &ss, &old_ss); >>> + /* Read level2 again with signals disabled. */ >>> + level2 = THREAD_GETMEM_NC (self, specific, idx1st); >>> + if (level2 != NULL) >>> + goto skip_resize; >>> + >>> level2 >>> = (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE, >>> sizeof (*level2)); >> >> I don't see how this adds signal safety. If the function is called >> from a signal handler, interrupting the malloc code, then this will >> still result in undefined behavior (heap corruption or deadlocks). >> >> Does lttng-ust call these functions even if the application does not? >> Then you really need to switch to real thread-local variables with the >> intial-exec model, possibly with indirection through a thread-local >> pointer variable and on-demand allocation using mmap in case you do >> not want to have these allocations in every thread. > > Agreed. The calloc proceeds in AS context and that leads to undefined > behaviour. I indeed plan to spin an updated patch using mmap instead. > > Please also read Florian's recommendation very carefully, you must us > IE model TLS vars to ensure pre-allocation and avoid potential OOM > during first variable touch, and to avoid first variable touch in a > signal handler resulting in the same problem as above (though we are > looking to fix this). Hopefully touching the TLS variable from a library constructor sides-steps this issue. This has worked well in practice so far. Thanks, Mathieu > > -- > Cheers, > Carlos.
* Mathieu Desnoyers: >> Then you really need to switch to real thread-local variables with the >> intial-exec model, possibly with indirection through a thread-local >> pointer variable and on-demand allocation using mmap in case you do >> not want to have these allocations in every thread. > > Actually, liburcu-bp, which is the library actually using > pthread_setspecific, does use a TLS pointer to a mmap'd region. > > The __thread variable sits in a library shared object, so it's not > possible to use the initial-exec model. glibc supports this, and there shouldn't be any issues if the library is loaded into the initial process image (LD_PRELOAD or DT_NEEDED reference). dlopen can be problematic, though. With current upstream, additional initial-exec TLS variables do not impact later dlopen. (This was not always true.) > I work around this by touching > the object once in a library constructor before other threads are > launched. AFAIU, this takes care of performing the TLS lazy fixup. Global-dynamic currently has lazy allocation for each thread, so a library constructor is not good enough to ensure initialization. We probably want to fix this, but major language standards say that TLS access from signal handlers is undefined, so it's not even a real bug. But I have to admit that TLS access in signal handlers can be very useful and is often the only way to implement certain things without introducing a full managed runtime.
----- On Oct 17, 2017, at 5:55 PM, Joseph Myers joseph@codesourcery.com wrote: > On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > >> The intent here is to provide signal-safety against callers to >> pthread_{get/set}specific that work on different pthread keys, >> without hurting performance of the normal use-cases that do not >> care about signal-safety. > > If we wish to provide particular safety guarantees beyond whatever is > required by POSIX, I think they should be documented in the glibc manual > as a GNU extension. Good point! I'm narrowing it down to this text: manual/threads.texi @deftypefun int pthread_setspecific (pthread_key_t @var{key}, const void *@var{value}) @standards{POSIX, pthread.h} @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} @c pthread_setspecific @asucorrupt @ascuheap @acucorrupt @acsmem @c a level2 block may be allocated by a signal handler after @c another call already made a decision to allocate it, thus losing @c the allocated value. the seq number is updated before the @c value, which might cause an earlier-generation value to seem @c current if setspecific is cancelled or interrupted by a signal @c KEY_UNUSED ok @c calloc dup @ascuheap @acsmem Associate the thread-specific @var{value} with @var{key} in the calling thread. @end deftypefun We'd need to edit the part about level2 block and allocation. The part about seq number should stay there. It's the part that states non-async-signal-safety across concurrent uses of pthread_setspecific for a given key. Would this edited version work ? I'm not sure I grasp the full meaning of @acsmem there. @deftypefun int pthread_setspecific (pthread_key_t @var{key}, const void *@var{value}) @standards{POSIX, pthread.h} @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{} @acsmem{}}} @c pthread_setspecific @asucorrupt @acucorrupt @acsmem @c the seq number is updated before the value, which might cause @c an earlier-generation value to seem current if setspecific is @c cancelled or interrupted by a signal @c KEY_UNUSED ok @c dup @acsmem Associate the thread-specific @var{value} with @var{key} in the calling thread. @end deftypefun Thanks, Mathieu > > -- > Joseph S. Myers > joseph@codesourcery.com
On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > @c pthread_setspecific @asucorrupt @ascuheap @acucorrupt @acsmem > @c a level2 block may be allocated by a signal handler after > @c another call already made a decision to allocate it, thus losing > @c the allocated value. the seq number is updated before the > @c value, which might cause an earlier-generation value to seem > @c current if setspecific is cancelled or interrupted by a signal > @c KEY_UNUSED ok > @c calloc dup @ascuheap @acsmem > Associate the thread-specific @var{value} with @var{key} in the calling thread. > @end deftypefun > > We'd need to edit the part about level2 block and allocation. That part is *comments* explaining to people reading the manual source code why the particular user-visible @safety{} annotations were determined from the library sources. It's true we should keep those up to date when the library sources change. But my point was that there should be *user-visible* documentation - not just comments in the manual sources - of any safety guarantees we choose to provide beyond POSIX.
----- On Oct 17, 2017, at 6:32 PM, Joseph Myers joseph@codesourcery.com wrote: > On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > >> @c pthread_setspecific @asucorrupt @ascuheap @acucorrupt @acsmem >> @c a level2 block may be allocated by a signal handler after >> @c another call already made a decision to allocate it, thus losing >> @c the allocated value. the seq number is updated before the >> @c value, which might cause an earlier-generation value to seem >> @c current if setspecific is cancelled or interrupted by a signal >> @c KEY_UNUSED ok >> @c calloc dup @ascuheap @acsmem >> Associate the thread-specific @var{value} with @var{key} in the calling thread. >> @end deftypefun >> >> We'd need to edit the part about level2 block and allocation. > > That part is *comments* explaining to people reading the manual source > code why the particular user-visible @safety{} annotations were determined > from the library sources. > > It's true we should keep those up to date when the library sources change. > But my point was that there should be *user-visible* documentation - not > just comments in the manual sources - of any safety guarantees we choose > to provide beyond POSIX. Can you provide a general hint on which repo and which documentation file I should update ? Thanks, Mathieu > > -- > Joseph S. Myers > joseph@codesourcery.com
On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > Can you provide a general hint on which repo and which documentation file > I should update ? That same file in the manual. But rather than just editing comments referring to details of the implementation, you need to write non-comment text explaining at the user API level, without reference to implementation internals, what API guarantees are provided in this regard. The text should be sufficient for users to understand what uses they can make of the API without needing to refer to implementation sources.
----- On Oct 17, 2017, at 6:19 PM, Florian Weimer fw@deneb.enyo.de wrote: > * Mathieu Desnoyers: > >>> Then you really need to switch to real thread-local variables with the >>> intial-exec model, possibly with indirection through a thread-local >>> pointer variable and on-demand allocation using mmap in case you do >>> not want to have these allocations in every thread. >> >> Actually, liburcu-bp, which is the library actually using >> pthread_setspecific, does use a TLS pointer to a mmap'd region. >> >> The __thread variable sits in a library shared object, so it's not >> possible to use the initial-exec model. > > glibc supports this, and there shouldn't be any issues if the library > is loaded into the initial process image (LD_PRELOAD or DT_NEEDED > reference). dlopen can be problematic, though. With current > upstream, additional initial-exec TLS variables do not impact later > dlopen. (This was not always true.) Unfortunately, it is allowed to dlopen() the lttng-ust tracer library as well. > >> I work around this by touching >> the object once in a library constructor before other threads are >> launched. AFAIU, this takes care of performing the TLS lazy fixup. > > Global-dynamic currently has lazy allocation for each thread, so a > library constructor is not good enough to ensure initialization. We > probably want to fix this, but major language standards say that TLS > access from signal handlers is undefined, so it's not even a real bug. > > But I have to admit that TLS access in signal handlers can be very > useful and is often the only way to implement certain things without > introducing a full managed runtime. Hrm, if with GD there is indeed memory allocation performed within each thread even after the initial explicit access done from the library constructor, then I may indeed need to use IE instead, even though it's a somewhat global resource. But since lttng-ust can be dlopen'd, I understand that IE is not appropriate neither. It looks like there is no good solution there ? Thanks, Mathieu
----- On Oct 17, 2017, at 6:50 PM, Joseph Myers joseph@codesourcery.com wrote: > On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > >> Can you provide a general hint on which repo and which documentation file >> I should update ? > > That same file in the manual. But rather than just editing comments > referring to details of the implementation, you need to write non-comment > text explaining at the user API level, without reference to implementation > internals, what API guarantees are provided in this regard. The text > should be sufficient for users to understand what uses they can make of > the API without needing to refer to implementation sources. Gotcha. So I guess we need to update pthread_getspecific too to document its async-signal safetyness, to make all this somewhat useful. How about the following ? @deftypefun void *pthread_getspecific (pthread_key_t @var{key}) @standards{POSIX, pthread.h} @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} @c pthread_getspecific ok Return the thread-specific data associated with @var{key} in the calling thread. As a non-POSIX extension, pthread_getspecific is async-signal safe. @end deftypefun @deftypefun int pthread_setspecific (pthread_key_t @var{key}, const void *@var{value}) @standards{POSIX, pthread.h} @safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{} @acsmem{}}} @c pthread_setspecific @asucorrupt @acucorrupt @acsmem @c the seq number is updated before the value, which might cause @c an earlier-generation value to seem current if setspecific is @c cancelled or interrupted by a signal @c KEY_UNUSED ok @c dup @acsmem Associate the thread-specific @var{value} with @var{key} in the calling thread. As a non-POSIX extension, pthread_setspecific is async-signal safe for concurrent invocations against different @var{key}, but not async-signal safe for concurrent invocations against the same @var{key}. @end deftypefun Thanks, Mathieu > > -- > Joseph S. Myers > joseph@codesourcery.com
On Tue, 17 Oct 2017, Mathieu Desnoyers wrote:
> As a non-POSIX extension, pthread_getspecific is async-signal safe.
Well, you should use @code{} around the function name in the
documentation. But I think that the documentation covers the sorts of
things I'd expect. This is not a review of the merits of the proposed
change or of whether the patch actually achieves the semantics you
document.
----- On Oct 17, 2017, at 6:09 PM, Carlos O'Donell carlos@redhat.com wrote: > On 10/17/2017 03:09 PM, Mathieu Desnoyers wrote: >> The __thread variable sits in a library shared object, so it's not >> possible to use the initial-exec model. I work around this by touching >> the object once in a library constructor before other threads are >> launched. AFAIU, this takes care of performing the TLS lazy fixup. > > You *can* use IE model, but doing so *might* cause your library not to > load if we run out of static TLS space (some amount is reserved by the > static linker for loading such libraries that use IE model TLS). > The static TLS space is therefore something like a distro-wide global > resources :-) > > However, we don't recommend IE model if you can get away with the > performance of GD model and can manage to touch the variable early > before it might be used in a signal handler. Touching it from a constructor only does it from the "main" thread perspective. Is there a need to touch it outside of a signal handler in every thread ? Thanks, Mathieu > > -- > Cheers, > Carlos.
----- On Oct 17, 2017, at 7:16 PM, Joseph Myers joseph@codesourcery.com wrote: > On Tue, 17 Oct 2017, Mathieu Desnoyers wrote: > >> As a non-POSIX extension, pthread_getspecific is async-signal safe. > > Well, you should use @code{} around the function name in the > documentation. But I think that the documentation covers the sorts of > things I'd expect. This is not a review of the merits of the proposed > change or of whether the patch actually achieves the semantics you > document. Fixed. Thanks for the feedback! Mathieu > > -- > Joseph S. Myers > joseph@codesourcery.com
On 10/17/2017 04:19 PM, Mathieu Desnoyers wrote: > ----- On Oct 17, 2017, at 6:09 PM, Carlos O'Donell carlos@redhat.com wrote: > >> On 10/17/2017 03:09 PM, Mathieu Desnoyers wrote: >>> The __thread variable sits in a library shared object, so it's not >>> possible to use the initial-exec model. I work around this by touching >>> the object once in a library constructor before other threads are >>> launched. AFAIU, this takes care of performing the TLS lazy fixup. >> >> You *can* use IE model, but doing so *might* cause your library not to >> load if we run out of static TLS space (some amount is reserved by the >> static linker for loading such libraries that use IE model TLS). >> The static TLS space is therefore something like a distro-wide global >> resources :-) >> >> However, we don't recommend IE model if you can get away with the >> performance of GD model and can manage to touch the variable early >> before it might be used in a signal handler. > > Touching it from a constructor only does it from the "main" thread > perspective. Is there a need to touch it outside of a signal handler > in every thread ? Yes, you must touch the TLS variable in every new thread *outside* of the signal handler, since each access in each new thread is required to force the deferred allocation to be carried out. I apologize for my earlier statement, I thought perhaps this was just one thread of execution.
* Mathieu Desnoyers: > ----- On Oct 17, 2017, at 6:19 PM, Florian Weimer fw@deneb.enyo.de wrote: > >> * Mathieu Desnoyers: >> >>>> Then you really need to switch to real thread-local variables with the >>>> intial-exec model, possibly with indirection through a thread-local >>>> pointer variable and on-demand allocation using mmap in case you do >>>> not want to have these allocations in every thread. >>> >>> Actually, liburcu-bp, which is the library actually using >>> pthread_setspecific, does use a TLS pointer to a mmap'd region. >>> >>> The __thread variable sits in a library shared object, so it's not >>> possible to use the initial-exec model. >> >> glibc supports this, and there shouldn't be any issues if the library >> is loaded into the initial process image (LD_PRELOAD or DT_NEEDED >> reference). dlopen can be problematic, though. With current >> upstream, additional initial-exec TLS variables do not impact later >> dlopen. (This was not always true.) > > Unfortunately, it is allowed to dlopen() the lttng-ust tracer library > as well. The question is whether this is a common use case, or if it is acceptable to have somewhat limited support for this (in the sense that if you dlopen many different libraries which use initial-exec TLS, you might see a failure eventually).
----- On Oct 18, 2017, at 1:36 AM, Florian Weimer fw@deneb.enyo.de wrote: > * Mathieu Desnoyers: > >> ----- On Oct 17, 2017, at 6:19 PM, Florian Weimer fw@deneb.enyo.de wrote: >> >>> * Mathieu Desnoyers: >>> >>>>> Then you really need to switch to real thread-local variables with the >>>>> intial-exec model, possibly with indirection through a thread-local >>>>> pointer variable and on-demand allocation using mmap in case you do >>>>> not want to have these allocations in every thread. >>>> >>>> Actually, liburcu-bp, which is the library actually using >>>> pthread_setspecific, does use a TLS pointer to a mmap'd region. >>>> >>>> The __thread variable sits in a library shared object, so it's not >>>> possible to use the initial-exec model. >>> >>> glibc supports this, and there shouldn't be any issues if the library >>> is loaded into the initial process image (LD_PRELOAD or DT_NEEDED >>> reference). dlopen can be problematic, though. With current >>> upstream, additional initial-exec TLS variables do not impact later >>> dlopen. (This was not always true.) >> >> Unfortunately, it is allowed to dlopen() the lttng-ust tracer library >> as well. > > The question is whether this is a common use case, or if it is > acceptable to have somewhat limited support for this (in the sense > that if you dlopen many different libraries which use initial-exec > TLS, you might see a failure eventually). My use-case is a library called from an application or library instrumentation. The objective is to minimally change the application/library (basically only instrument it). It should be possible to only instrument and trace a linked .so without changing the application. I need the tracer to run fine in a signal handler, and it happens to use TLS variables. I would assume this use-case is not so common, as the requirement of being used in a signal handler is more specific to a tracer than a generic library. So let's assume dlopen failure is acceptable if there are too many initial-exec TLS. Would it work today to fix signal handler tracing by changing lttng-ust to use TLS_IE, or would it require changes to glibc ? Thanks, Mathieu
On 10/18/2017 09:21 AM, Mathieu Desnoyers wrote: > So let's assume dlopen failure is acceptable if there are too many > initial-exec TLS. Would it work today to fix signal handler tracing > by changing lttng-ust to use TLS_IE, or would it require changes to > glibc ? No changes to glibc are required. We already use TLS IE in glibc specifically to fix this issue. Also libGL also uses TLS IE in the distro for performance reasons, so we get testing there. How much TLS data do you have? With libGL it's just a few words, and there is lots of spare space in the static TLS space for that.
----- On Oct 18, 2017, at 1:40 PM, carlos carlos@redhat.com wrote: > On 10/18/2017 09:21 AM, Mathieu Desnoyers wrote: >> So let's assume dlopen failure is acceptable if there are too many >> initial-exec TLS. Would it work today to fix signal handler tracing >> by changing lttng-ust to use TLS_IE, or would it require changes to >> glibc ? > > No changes to glibc are required. We already use TLS IE in glibc > specifically to fix this issue. Also libGL also uses TLS IE in > the distro for performance reasons, so we get testing there. > > How much TLS data do you have? With libGL it's just a few words, > and there is lots of spare space in the static TLS space for > that. For liburcu-bp, used by lttng-ust, a single pointer: DEFINE_URCU_TLS(struct rcu_reader *, rcu_reader); I make sure to do place the bulk of data into mmap'd memory. So, very little TLS space is required. Thanks, Mathieu > > -- > Cheers, > Carlos.
On 10/18/2017 11:16 AM, Mathieu Desnoyers wrote: > ----- On Oct 18, 2017, at 1:40 PM, carlos carlos@redhat.com wrote: > >> On 10/18/2017 09:21 AM, Mathieu Desnoyers wrote: >>> So let's assume dlopen failure is acceptable if there are too many >>> initial-exec TLS. Would it work today to fix signal handler tracing >>> by changing lttng-ust to use TLS_IE, or would it require changes to >>> glibc ? >> >> No changes to glibc are required. We already use TLS IE in glibc >> specifically to fix this issue. Also libGL also uses TLS IE in >> the distro for performance reasons, so we get testing there. >> >> How much TLS data do you have? With libGL it's just a few words, >> and there is lots of spare space in the static TLS space for >> that. > > For liburcu-bp, used by lttng-ust, a single pointer: > > DEFINE_URCU_TLS(struct rcu_reader *, rcu_reader); > > I make sure to do place the bulk of data into mmap'd memory. > So, very little TLS space is required. Perfect. Just like libGL then, which just has two pointers. The probability of your library not loading is going to be vanishingly small, so I think you can switch to TLS IE without a problem.
----- On Oct 18, 2017, at 2:38 PM, carlos carlos@redhat.com wrote: > On 10/18/2017 11:16 AM, Mathieu Desnoyers wrote: >> ----- On Oct 18, 2017, at 1:40 PM, carlos carlos@redhat.com wrote: >> >>> On 10/18/2017 09:21 AM, Mathieu Desnoyers wrote: >>>> So let's assume dlopen failure is acceptable if there are too many >>>> initial-exec TLS. Would it work today to fix signal handler tracing >>>> by changing lttng-ust to use TLS_IE, or would it require changes to >>>> glibc ? >>> >>> No changes to glibc are required. We already use TLS IE in glibc >>> specifically to fix this issue. Also libGL also uses TLS IE in >>> the distro for performance reasons, so we get testing there. >>> >>> How much TLS data do you have? With libGL it's just a few words, >>> and there is lots of spare space in the static TLS space for >>> that. >> >> For liburcu-bp, used by lttng-ust, a single pointer: >> >> DEFINE_URCU_TLS(struct rcu_reader *, rcu_reader); >> >> I make sure to do place the bulk of data into mmap'd memory. >> So, very little TLS space is required. > > Perfect. Just like libGL then, which just has two pointers. > > The probability of your library not loading is going to be > vanishingly small, so I think you can switch to TLS IE without > a problem. I'm trying to figure out the right TLS model for each liburcu libraries. Given that it means a major soname bump, I prefer to get it right from the get go. For TLS meant to be touched by a signal handler at first use in a thread, the hard requirement seems to be TLS initial-exec. This takes care of urcu-bp, the call-rcu, and rcu-defer TLS. I have other liburcu flavors where the application is required to explicitly register the TLS data structure at thread start. This lifts the "hard" requirement on making the first use signal-handler safe, given that I can mandate the registration to be performed outside of signal handler. The largest data structure I have is 64 bytes. Is it still acceptable for the IE model ? I understand that there is a performance benefit in using the initial-exec model over the global-dynamic. Given liburcu tends to be used on fast-paths, should I simply move all liburcu TLS variables to the IE model ? Thanks, Mathieu > > -- > Cheers, > Carlos.
* Mathieu Desnoyers: > I understand that there is a performance benefit in using > the initial-exec model over the global-dynamic. Given liburcu > tends to be used on fast-paths, should I simply move all > liburcu TLS variables to the IE model ? That should be okay. The situation for dlopen will improve because we will eventually offer some tuning knobs, so that it's possible to increase the reserve in anticipation of future dlopen calls. Right now, if your libraries are compatible with LD_PRELOAD, that would also be a workable solution, I think.
On 17/10/17 23:19, Florian Weimer wrote: > Global-dynamic currently has lazy allocation for each thread, so a > library constructor is not good enough to ensure initialization. We > probably want to fix this, but major language standards say that TLS > access from signal handlers is undefined, so it's not even a real bug. the relevant standard for a libc implementation is iso c which has 7.14.1.1p5 "If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t, or the signal handler calls any function in the standard library other than the abort function, the _Exit function, the quick_exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler." i.e. lock-free atomic and volatile sig_atomic_t tls access should work.
On 10/19/2017 11:44 AM, Szabolcs Nagy wrote: > On 17/10/17 23:19, Florian Weimer wrote: >> Global-dynamic currently has lazy allocation for each thread, so a >> library constructor is not good enough to ensure initialization. We >> probably want to fix this, but major language standards say that TLS >> access from signal handlers is undefined, so it's not even a real bug. > > the relevant standard for a libc implementation is iso c which has > > 7.14.1.1p5 > "If the signal occurs other than as the result of calling the abort > or raise function, the behavior is undefined if the signal handler > refers to any object with static or thread storage duration that is > not a lock-free atomic object other than by assigning a value to an > object declared as volatile sig_atomic_t, or the signal handler calls > any function in the standard library other than the abort function, > the _Exit function, the quick_exit function, or the signal function > with the first argument equal to the signal number corresponding to > the signal that caused the invocation of the handler." > > i.e. lock-free atomic and volatile sig_atomic_t tls access should work. For us, it depends on the TLS model. Async-signal-safe TLS access is considered unimplementable in general, which is why C++ makes it undefined: “ \pnum \indextext{signal-safe!evaluation|see{evaluation, signal-safe}}% An evaluation is \defnx{signal-safe}{evaluation!signal-safe} unless it includes one of the following: \begin{itemize} […] \item an access to an object with thread storage duration; ” Thanks, Florian
On 19/10/17 11:37, Florian Weimer wrote: > On 10/19/2017 11:44 AM, Szabolcs Nagy wrote: >> On 17/10/17 23:19, Florian Weimer wrote: >>> Global-dynamic currently has lazy allocation for each thread, so a >>> library constructor is not good enough to ensure initialization. We >>> probably want to fix this, but major language standards say that TLS >>> access from signal handlers is undefined, so it's not even a real bug. >> >> the relevant standard for a libc implementation is iso c which has >> >> 7.14.1.1p5 >> "If the signal occurs other than as the result of calling the abort >> or raise function, the behavior is undefined if the signal handler >> refers to any object with static or thread storage duration that is >> not a lock-free atomic object other than by assigning a value to an >> object declared as volatile sig_atomic_t, or the signal handler calls >> any function in the standard library other than the abort function, >> the _Exit function, the quick_exit function, or the signal function >> with the first argument equal to the signal number corresponding to >> the signal that caused the invocation of the handler." >> >> i.e. lock-free atomic and volatile sig_atomic_t tls access should work. > > For us, it depends on the TLS model. Async-signal-safe TLS access is considered unimplementable in general, musl implements it: you just have to allocate tls for all dsos and all threads in dlopen and pthread_create. (if unbounded dlopen/dlclose calls are supported then it's a bit trickier since dtv slots need to be reused then, but should be possible in principle) and i think shared libraries are unimplementable for c++ already (ctors can execute arbitrary code during dlopen and can make it crash and there are odr issues with vague linkage vs RTLD_LOCAL), so general dynamic tls should not come up in a strictly conforming c++ program, so that's not a good reason to drop as-safety. > which is why C++ makes it undefined: > > “ > \pnum > \indextext{signal-safe!evaluation|see{evaluation, signal-safe}}% > An evaluation is \defnx{signal-safe}{evaluation!signal-safe} unless it includes one of the following: > > \begin{itemize} > […] > \item > an access to an object with thread storage duration; > ” > that's new text in c++17 (and yet another incompatibility with c).
----- On Oct 18, 2017, at 5:34 PM, Florian Weimer fw@deneb.enyo.de wrote: > * Mathieu Desnoyers: > >> I understand that there is a performance benefit in using >> the initial-exec model over the global-dynamic. Given liburcu >> tends to be used on fast-paths, should I simply move all >> liburcu TLS variables to the IE model ? > > That should be okay. > > The situation for dlopen will improve because we will eventually offer > some tuning knobs, so that it's possible to increase the reserve in > anticipation of future dlopen calls. > > Right now, if your libraries are compatible with LD_PRELOAD, that > would also be a workable solution, I think. I also figured I can introduce that TLS model change without bumping the library soname, even though those TLS symbols are exposed, because the link-editor figures out at runtime which reloc to apply based on the TLS model. This basically means the __attribute__((tls_model("initial-exec"))) should be applied to the definition, and the declaration of the TLS does not need any attribute. The link-editor will figure out the right reloc to apply at link-time, thus modifying the code of the TLS accesses. Let me know if I misunderstand that part, Thanks! Mathieu
On 10/19/2017 01:04 PM, Szabolcs Nagy wrote: > On 19/10/17 11:37, Florian Weimer wrote: >> On 10/19/2017 11:44 AM, Szabolcs Nagy wrote: >>> On 17/10/17 23:19, Florian Weimer wrote: >>>> Global-dynamic currently has lazy allocation for each thread, so a >>>> library constructor is not good enough to ensure initialization. We >>>> probably want to fix this, but major language standards say that TLS >>>> access from signal handlers is undefined, so it's not even a real bug. >>> >>> the relevant standard for a libc implementation is iso c which has >>> >>> 7.14.1.1p5 >>> "If the signal occurs other than as the result of calling the abort >>> or raise function, the behavior is undefined if the signal handler >>> refers to any object with static or thread storage duration that is >>> not a lock-free atomic object other than by assigning a value to an >>> object declared as volatile sig_atomic_t, or the signal handler calls >>> any function in the standard library other than the abort function, >>> the _Exit function, the quick_exit function, or the signal function >>> with the first argument equal to the signal number corresponding to >>> the signal that caused the invocation of the handler." >>> >>> i.e. lock-free atomic and volatile sig_atomic_t tls access should work. >> >> For us, it depends on the TLS model. Async-signal-safe TLS access is considered unimplementable in general, > > musl implements it: you just have to allocate tls for > all dsos and all threads in dlopen and pthread_create. > (if unbounded dlopen/dlclose calls are supported then > it's a bit trickier since dtv slots need to be reused > then, but should be possible in principle) We have received guidance that this is not desirable for all TLS usage scenarios. Some applications might not want to allocate backing storage for the full set of TLS variables for those threads which do not need all of them. But the entire world is not Linux. >> which is why C++ makes it undefined: >> >> “ >> \pnum >> \indextext{signal-safe!evaluation|see{evaluation, signal-safe}}% >> An evaluation is \defnx{signal-safe}{evaluation!signal-safe} unless it includes one of the following: >> >> \begin{itemize} >> […] >> \item >> an access to an object with thread storage duration; >> ” >> > > that's new text in c++17 (and yet another incompatibility > with c). In general, the C++ standard adheres much more to implementation experience than the C standard. This incompatibility is probably a defect in the C standard. Thanks, Florian
On Thu, 19 Oct 2017, Szabolcs Nagy wrote: > the relevant standard for a libc implementation is iso c which has > > 7.14.1.1p5 > "If the signal occurs other than as the result of calling the abort > or raise function, the behavior is undefined if the signal handler > refers to any object with static or thread storage duration that is > not a lock-free atomic object other than by assigning a value to an > object declared as volatile sig_atomic_t, or the signal handler calls > any function in the standard library other than the abort function, > the _Exit function, the quick_exit function, or the signal function > with the first argument equal to the signal number corresponding to > the signal that caused the invocation of the handler." Note that this changes following DR#462 to also allow calling <stdatomic.h> functions for lock-free types - but that doesn't change the principle that TLS access should work and so TLS should be allocated at dlopen / pthread_create time.
On Thu, 19 Oct 2017, Florian Weimer wrote: > > musl implements it: you just have to allocate tls for > > all dsos and all threads in dlopen and pthread_create. > > (if unbounded dlopen/dlclose calls are supported then > > it's a bit trickier since dtv slots need to be reused > > then, but should be possible in principle) > > We have received guidance that this is not desirable for all TLS usage > scenarios. Some applications might not want to allocate backing storage for > the full set of TLS variables for those threads which do not need all of them. My assertion is that the default, for new binaries, should be to do the allocation at dlopen / pthread_create time. This allows for the possibility of having a dlopen flag / pthread attribute / other API to allow lazy allocation in a particular case, with new symbol versions being used to keep existing binaries defaulting to lazy allocation.
diff --git a/nptl/pthread_setspecific.c b/nptl/pthread_setspecific.c index 214af3b661..aa5a1f17c8 100644 --- a/nptl/pthread_setspecific.c +++ b/nptl/pthread_setspecific.c @@ -61,18 +61,33 @@ __pthread_setspecific (pthread_key_t key, const void *value) level2 = THREAD_GETMEM_NC (self, specific, idx1st); if (level2 == NULL) { + sigset_t ss, old_ss; + if (value == NULL) /* We don't have to do anything. The value would in any case be NULL. We can save the memory allocation. */ return 0; + /* Ensure that pthread_setspecific and pthread_getspecific are + signal-safe when used on different keys. */ + sigfillset (&ss); + pthread_sigmask (SIG_BLOCK, &ss, &old_ss); + /* Read level2 again with signals disabled. */ + level2 = THREAD_GETMEM_NC (self, specific, idx1st); + if (level2 != NULL) + goto skip_resize; + level2 = (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE, sizeof (*level2)); - if (level2 == NULL) + if (level2 == NULL) { + pthread_sigmask (SIG_SETMASK, &old_ss, NULL); return ENOMEM; + } THREAD_SETMEM_NC (self, specific, idx1st, level2); +skip_resize: + pthread_sigmask (SIG_SETMASK, &old_ss, NULL); } /* Pointer to the right array element. */