Message ID | orvaxmklme.fsf@livre.home |
---|---|
State | New, archived |
Headers |
From: Alexandre Oliva <aoliva@redhat.com>
To: Andreas Schwab <schwab@suse.de>
Cc: libc-alpha@sourceware.org
Subject: Re: [PR19826] fix non-LE TLS in static programs
Date: Sat, 24 Sep 2016 00:44:09 -0300
In-Reply-To: <mvmshst5njj.fsf@hawking.suse.de> (Andreas Schwab's message of "Wed, 21 Sep 2016 16:34:56 +0200")
Message-ID: <orvaxmklme.fsf@livre.home> |
Commit Message
Alexandre Oliva
Sept. 24, 2016, 3:44 a.m. UTC
On Sep 21, 2016, Andreas Schwab <schwab@suse.de> wrote:

> On Sep 21 2016, Alexandre Oliva <aoliva@redhat.com> wrote:
>> [BZ #19826]
>> * elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
>> initialization of static TLS entries.
>> * elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
>> * nptl/allocatestack.c (init_one_static_tls): Likewise.

> Ok.

No, it's not!  :-) :-(

We'll need something along these lines, to fix the above, and to avoid
making the same mistake again.  I've yet to test it myself, but I
understand you'd already tested on ARM the combination of the previous
patch and this reversal, minus the comments added below, i.e., the
elf/dl-tls.c changes above.  Is that so?

Once I'm done with the testing, is this ok to install on top of the
previous patch?

[PR19826] revert needlessly reintroduced race, explain it

Along with the proper fix for the problem, a couple of unnecessary DTV
initializations were reintroduced, but they weren't just unnecessary:
one of them introduced a race condition that was alluded to in the
comments.

In this patch, I add a long comment explaining why we can't modify
another thread's DTV while initializing its static TLS block during
dlopen.  I also revert the unnecessary parts of the previous change,
removing the DTV-initializing code from the above, to fix the race,
and, for uniformity, also from the corresponding code that runs in
single-threaded programs, which was harmless as far as I can tell.

for ChangeLog

	[BZ #19826]
	* nptl/allocatestack.c (init_one_static_tls): Revert the
	previous change and explain the race condition it would
	introduce.
	* elf/dl-reloc.c (_dl_nothread_init_static_tls): Revert
	previous change for uniformity.
---
 elf/dl-reloc.c       |  6 ------
 nptl/allocatestack.c | 36 ++++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 12 deletions(-)
Comments
On Sat, 2016-09-24 at 00:44 -0300, Alexandre Oliva wrote:

> In this patch, I add a long comment explaining why we can't modify
> another thread's DTV while initializing its static TLS block during
> dlopen.

I appreciate that you added a comment.  Given that you seem to have
paged in all of this currently, it would be even better if you could
try to further improve the documentation of all of this.  (I suppose
missing the issue you fix here would have been less likely if proper
documentation had existed before.)

Having the patch reviewed by someone who's not deeply familiar with
the TLS implementation would be good, as it would show whether this
really makes the code more accessible (and thus reduces the bus
factor...).

Are you aware of any synchronization in the TLS stuff that is not
based on locks?  That is, where there are data races as defined by
C11?  If so, it would be great if you could transform them to use the
new-style C11 atomics.  I've reviewed nscd concurrency recently and
converted it to use C11 atomics, and this process revealed many
synchronization bugs such as missing barriers.

> +     (**) If we were to write to other threads' DTV entries here, we
> +     might race with their DTV resizing.  We might then write to
> +     stale, possibly already free()d and reallocated memory.  Or we
> +     could write past the end of the still-active DTV.  So, don't do
> +     that!

It's okay to point out why data races are bad, but you probably don't
need to.  I think we have consensus that every data race is bad and
should be fixed (with very few exceptions, which then need lengthy
explanations).  The general understanding should be that there's no
such thing as a benign data race.

> +     Another race, not to be mistaken for the above, has to do with
> +     the TLS block initialized below.  Although we perform this
> +     initialization while holding the rtld lock, once dlopen
> +     completes, other threads might run code that accesses the
> +     variables in these TLS blocks, without taking the rtld lock or
> +     any other explicit memory acquire.

What is a "memory acquire"?  (If you are assuming that acquire
operations synchronize-with every other release operation such as a
lock release, then this is wrong; remember that it is pairs of
release/acquire actions that are linked by a reads-from, so both
release and acquire must target the same memory location.)

> +     Our implementation appears to assume that, in order for the other
> +     threads to access those variables, there will be some form of
> +     synchronization between the end of dlopen() in one thread and the
> +     use of the just-loaded modules in others, otherwise the very use
> +     of the loaded code could be undefined.  */

I'd reverse that: I find it misleading that you start by saying that
there is a race condition, when in fact this seems not to be the case
because of the assumption.  Therefore, I'd mention the assumption
first, and then derive from that that there is no data race and that
uses of the TLS will find it initialized.
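[The release/acquire pairing rule discussed above can be illustrated with a short, hypothetical C11 sketch; this is not code from the thread, and the names `payload', `ready', and `run_demo' are made up for illustration.  The plain store to `payload' happens-before the reader's plain load of it only because both sides pair release and acquire on the same atomic object, `ready'.]

```c
#include <stdatomic.h>
#include <pthread.h>

static int payload;        /* plain, non-atomic data being published */
static atomic_int ready;   /* the synchronizing object */
static int observed;       /* what the reader thread saw */

static void *
writer (void *arg)
{
  payload = 42;            /* ordered before the release store below */
  atomic_store_explicit (&ready, 1, memory_order_release);
  return NULL;
}

static void *
reader (void *arg)
{
  /* Spin until the release store becomes visible.  The acquire load
     pairs with the release store on the SAME object, `ready'.  */
  while (atomic_load_explicit (&ready, memory_order_acquire) == 0)
    ;
  observed = payload;      /* data-race-free: ordered by the acquire */
  return NULL;
}

int
run_demo (void)
{
  pthread_t w, r;
  payload = 0;
  observed = 0;
  atomic_store_explicit (&ready, 0, memory_order_relaxed);
  pthread_create (&w, NULL, writer, NULL);
  pthread_create (&r, NULL, reader, NULL);
  pthread_join (w, NULL);
  pthread_join (r, NULL);
  return observed;
}
```

Calling `run_demo ()' yields 42 on every execution; weakening the release/acquire pair to relaxed would leave the plain accesses to `payload' unordered, and the reader could observe a stale value on weakly ordered hardware.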
On Sep 24, 2016, Torvald Riegel <triegel@redhat.com> wrote:

> On Sat, 2016-09-24 at 00:44 -0300, Alexandre Oliva wrote:
>> In this patch, I add a long comment explaining why we can't modify
>> another thread's DTV while initializing its static TLS block during
>> dlopen.

> I appreciate that you added a comment.  Given that you seem to have
> paged in all of this currently, it would be even better if you could
> try to further improve the documentation of all of this.

As your feedback to the patch shows, it's pointless for me to even
try.  My way of reasoning about this stuff doesn't seem to make sense
to you.  So why bother?

> (I suppose missing the issue you fix here would have been less
> likely if proper documentation existed before.)

*nod*.  That's why I added the new, longer comments: the comments I'd
added before were not enough to stop even myself from reintroducing
the error, because there was a shallow, more obvious (apparent?
harmless?) race, and a deeper, less obvious one that's very harmful
when it hits.

> Are you aware of any synchronization in the TLS stuff that is not
> based on locks?

Yes, plenty.
Off the top of my head, there's static TLS block initialization at
dlopen vs use by other threads.  There was DTV initialization vs use
and resize (fixed by my 1.5-year-old patch, broken by my patch from
last week, and fixed again once the patch that reverts the breakage
goes in, because we only mess with a thread's DTV at its start-up and
thereafter only the thread itself does); but IIRC there are
self-concurrency issues if DTV resizing is interrupted by a signal
that uses GD and starts further resizing.  And IIRC there is plenty of
suspicious stuff in slotinfo manipulation vs use: the data structure
is only modified while holding the rtld lock, but it's read while
updating a DTV without any explicit synchronization; I never looked
into it enough to tell what the assumptions were that made it safe at
some point, to tell whether there's any chance they still hold.  I
have a hunch that there are probably issues with TLS descriptors as
well, because although they're created while holding the rtld lock,
their use is not explicitly guarded either, nor are the GOT entries
relocated so as to point to them.

IMHO these are all internal implementations, so the C11 memory model
for applications doesn't quite apply to them: it's ok if they make
stronger or weaker assumptions than what the language is supposed to
expose to users of the language.  Just like the *implementation* of
mutexes, locks, condition variables and other such concepts won't
necessarily abide by the rules imposed on users of the language, as
long as they deliver the intended semantics, including synchronization
and memory model properties.

> If so, it would be great if you could transform them to using the
> new-style C11 atomics.

I guess it would, wouldn't it?  Too bad I'm probably not going to.  I
just can't stand even the thought of the never-ending conversations
that would ensue.
So, I'm out of here; the only reason you're even hearing from me is
that I'm responsible enough to at least try to fix the breakage I
introduced, once it was brought to my attention (at a time in which I
actually noticed it; too bad I missed it the first time).

> I've reviewed nscd concurrency recently and converted it to use C11
> atomics, and this process revealed many synchronization bugs such as
> missing barriers.

That shouldn't be surprising.  A lot of that code was written before
C11 was even in the foreseeable future.

>> +     Another race, not to be mistaken for the above, has to do with
>> +     the TLS block initialized below.  Although we perform this
>> +     initialization while holding the rtld lock, once dlopen
>> +     completes, other threads might run code that accesses the
>> +     variables in these TLS blocks, without taking the rtld lock or
>> +     any other explicit memory acquire.

> What is a "memory acquire"?

That's a term from long before C standardized atomics, from the 20
years or so in which people like myself have been doing concurrent
programming on multi-CPU computers without explicit language support,
and even on single-CPU multi-tasked systems.  In dusty books from that
age, you may be able to find tons of information about such things as
memory barriers or fences, some with acquire and release semantics, as
well as the implied acquire and release of all memory that occurs at
mutex acquire and release time, respectively.

There are still some remnants from those days in e.g. Linux docs:

  https://www.kernel.org/doc/Documentation/memory-barriers.txt

but a lot of what can be found on the Internet now seems to be more
recent than C and C++ atomics.  Still, the notion of memory acquire
and release from the ancient scrolls is still there, even if a bit in
disguise.  There's no reason for us to limit our language and throw
away all the old books just because the term was reused in a new,
narrower context, is there?
>> +     Our implementation appears to assume that, in order for the other
>> +     threads to access those variables, there will be some form of
>> +     synchronization between the end of dlopen() in one thread and the
>> +     use of the just-loaded modules in others, otherwise the very use
>> +     of the loaded code could be undefined.  */

> I'd reverse that: I find it misleading that you start by saying that
> there is a race condition, when in fact this seems not to be the case
> because of the assumption.  Therefore, I'd mention the assumption
> first, and then derive from that that there is no data race and that
> uses of the TLS will find it initialized.

Unless the assumption fails to meet the definition of data race in
some possibly irrelevant fashion.  The point is that the
implementation doesn't guard itself from certain races, if someone's
determined to create one, but I posit they don't matter.  Let me give
you some examples:

- there's the obvious case of the program with a race: the dlopen
  caller sets a global non-atomic variable when dlopen completes, and
  another thread busy-waits for the variable to change, and then uses
  its value to call a function that accesses the newly-loaded TLS
  variable.  The race on the global variable enables the race on the
  TLS variable, so undefined behavior has already been invoked by the
  time the TLS implementation incurs its own race.

- here's a variant that doesn't involve any other race: the dlopen
  caller writes to a pipe a pointer to a function that accesses the
  TLS variable, and another thread reads the pointer from the pipe and
  calls it -> race?  There's clearly a happens-before order implied by
  the write() and read(), but that doesn't imply memory
  synchronization in the memory model AFAICT.
- another variant in which there's a clear happens-before order, but
  no memory synchronization, so there's possibly a race: the dlopen
  caller installs a signal handler that accesses the TLS variable, and
  then gets the signal sent to the process itself; another thread is
  chosen to handle the signal and accesses the TLS variable -> race?

- generalizing, use any out-of-(memory-)band information-carrying side
  effect that one thread could use to communicate to another thread
  how to call a function that accesses the TLS variable, both function
  and variable brought in by dlopen, and you'll have, technically, a
  data race, but IMO a harmless one.
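[The pipe variant above can be sketched in C; this is a hypothetical illustration, not code from the thread, and `touch_tls', `run_scenario', and the `__thread' variable standing in for a dlopen'd TLS variable are all invented names.  read() returning the bytes that write() sent gives an obvious happens-before in practice, yet the C11 memory model says nothing about write()/read() creating a synchronizes-with edge.]

```c
#include <pthread.h>
#include <unistd.h>

static __thread int tls_var;   /* stand-in for a dlopen'd TLS variable */

static int
touch_tls (void)
{
  tls_var = 7;                 /* first access from the consumer thread */
  return tls_var;
}

static int pipefd[2];
static int result;

static void *
consumer (void *arg)
{
  int (*fn) (void);
  /* Read a function pointer from the pipe and call it; read() blocks
     until the producer's write() arrives.  */
  if (read (pipefd[0], &fn, sizeof fn) == (ssize_t) sizeof fn)
    result = fn ();
  return NULL;
}

int
run_scenario (void)
{
  pthread_t t;
  int (*fn) (void) = touch_tls;
  result = 0;
  if (pipe (pipefd) != 0)
    return -1;
  pthread_create (&t, NULL, consumer, NULL);
  /* Communicate the pointer out of (memory-)band, through the pipe.  */
  write (pipefd[1], &fn, sizeof fn);
  pthread_join (t, NULL);
  close (pipefd[0]);
  close (pipefd[1]);
  return result;
}
```

In practice `run_scenario ()' returns 7: the pointer travels through the kernel, not through shared memory.  The question in the thread is whether the model grants this any synchronizes-with edge at all.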
If you have to modify it anyway, would you mind adding the testcase
from the bug?

Andreas.
On Sep 26, 2016, Andreas Schwab <schwab@suse.de> wrote:

> If you have to modify it anyway, would you mind adding the testcase
> from the bug?

The reason I didn't add the testcase was that I understood it was
redundant with another existing test.  Reviewing the earlier report, I
got the idea that the regression introduced by last year's patch had
already been detected by existing tests on the affected platforms; it
just so happened that I missed the report.

Ideally, we'd have a test that detects such a regression on x86_64
too, but I don't see a way to do that that's not too fragile.  Like,
we could introduce a TLS variable in the main program (so that it's
hopefully assigned to the beginning of the TLS block), include a
custom implementation of tls_get_addr in the test, and call it (with
module id 1 and offset 0) to ensure it returns the same address that
LE computes for the variable.  There might be issues with different
calling conventions for tls_get_addr, but if we introduced the custom
function under a different name and a standardized calling convention
(even if different from the one used by the TLS implementation), maybe
we'd be able to catch such regressions.

But we'd be taking too different a path from the normal use of
__tls_get_addr to make the test reliable, not to mention the
undeserved assumption that, just because a TLS variable is defined in
the first object file to be linked in, its dynamic TLS offset is going
to be zero.  Do you all think it makes sense to introduce such a test,
in spite of all these caveats?
On Sun, 2016-09-25 at 19:18 -0300, Alexandre Oliva wrote:
> On Sep 24, 2016, Torvald Riegel <triegel@redhat.com> wrote:
>
> > On Sat, 2016-09-24 at 00:44 -0300, Alexandre Oliva wrote:
> >> In this patch, I add a long comment explaining why we can't modify
> >> another thread's DTV while initializing its static TLS block
> >> during dlopen.
>
> > I appreciate that you added a comment.  Given that you seem to have
> > paged in all of this currently, it would be even better if you
> > could try to further improve the documentation of all of this.
>
> As your feedback to the patch shows, it's pointless for me to even
> try.

I do not think it is, provided that you are willing to work within the
memory model that we have chosen for glibc (which is basically
equivalent to C11's).

> My way of reasoning about this stuff doesn't seem to make sense to
> you.  So why bother?

This isn't about my personal opinion, or taste.  glibc has chosen the
C11 memory model (see above), so I consider this our technical
context.  As with any other technical context we pick, we want to
build code (and documentation) that is sound within that context.
Hence, if I think that a patch can be improved in this regard, I'll
speak up.

Consider an exaggerated but similar example: I want to add a
non-trivial piece of code, and like to reason about it using notes on
paper.  Yet glibc has decided that it wants comments in the code.
What should I do?  Should I just give up, or try to find a way to do
what glibc wants and how it decided to maintain the code base (eg,
finding a back-and-forth conversion between glibc's model of taking
notes and mine)?

> > (I suppose missing the issue you fix here would have been less
> > likely if proper documentation existed before.)
>
> *nod*.  That's why I added the new, longer comments: the comments I'd
> added before were not enough to stop even myself from reintroducing
> the error, because there was a shallow, more obvious (apparent?
> harmless?) race, and a deeper, less obvious one that's very harmful
> when it hits.

Right.  But we also need to consider that more than one person will
eventually want to maintain any piece of glibc code.  Thus, we need a
common way of reasoning about it.  This needs to be very well
specified, and that's a job we can't do on our own.  So, picking
something that will be wide-spread, like C11's model, is the right
thing to do.  But then we also need to make use of it.

I've noted a couple of times that, of course, the project needs to
find its own way of how to talk about concurrency.  Discussing patches
such as yours is a way of doing that.  I've also always asked
reviewers of my patches touching concurrent code to comment on whether
they find the comments about concurrency in them good (eg, level of
detail), or whether they have suggestions for improvement.  However,
we need to be technically correct in how we speak about it, and that
includes sticking to the terminology that defines the memory model we
have picked.  Otherwise, we're deviating from our technical base of
reasoning, and this will just create misunderstandings.

> > Are you aware of any synchronization in the TLS stuff that is not
> > based on locks?
>
> Yes, plenty.  Off the top of my head, there's static TLS block
> initialization at dlopen vs use by other threads, there was DTV
> initialization vs use and resize (fixed by my 1.5-year-old patch,
> broken by my patch from last week, and fixed again once the patch
> that reverts the breakage goes in, because we only mess with a
> thread's DTV at its start-up and thereafter only the thread itself
> does)

Can you fix this to use C11 atomics (or even the old-style atomics if
you're not comfortable with using C11 atomics yet), please?
> but IIRC there are self-concurrency issues if DTV resizing is
> interrupted by a signal that uses GD and starts further resizing)
> and IIRC there is plenty of suspicious stuff in slotinfo
> manipulation vs use (the data structure is only modified while
> holding the rtld lock, but it's read while updating a DTV without
> any explicit synchronization; I never looked into it enough to tell
> what the assumptions were that make it safe at some point, to tell
> whether there's any chance they still are);

That sounds like an incorrect attempt at double-checked locking, which
I've found several cases of so far, so it would be "within the
pattern".

> I have a hunch that there are probably issues with TLS descriptors
> as well, because although they're created while holding the rtld
> lock, their use is not explicitly guarded either, nor are the GOT
> entries relocated so as to point to them.
>
> IMHO these are all internal implementations, so the C11 memory model
> for applications doesn't quite apply to them:

The C11 model is the memory model that glibc uses.  See
https://sourceware.org/glibc/wiki/Concurrency

> it's ok if they make stronger or weaker assumptions than what the
> language is supposed to expose to users of the language.  Just like
> the *implementation* of mutexes, locks, condition variables and
> other such concepts won't necessarily abide by the rules imposed on
> users of the language, as long as they deliver the intended
> semantics, including synchronization and memory model properties.

See above.  While I agree that we could use something else, we chose
C11 for glibc's code too.  There are very few cases where we'd need
more than C11 atomics to implement glibc's concurrent code.

> > If so, it would be great if you could transform them to using the
> > new-style C11 atomics.
>
> I guess it would, wouldn't it?  Too bad I'm probably not going to.
> I just can't stand even the thought of the never-ending
> conversations that would ensue.
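[For reference, here is a hypothetical sketch of double-checked locking done correctly under C11; it is not glibc code, and `instance', `get_instance', and `struct state' are invented names.  The lock-free fast path needs an acquire load, and publication needs a release store, or a reader may observe a non-NULL pointer yet stale contents; a plain load on the fast path, as in the suspect pattern described above, would be a data race.]

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdlib.h>

struct state { int value; };

static _Atomic (struct state *) instance;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct state *
get_instance (void)
{
  /* Fast path: acquire pairs with the release store below, so the
     contents of *s are visible whenever the pointer is.  */
  struct state *s
    = atomic_load_explicit (&instance, memory_order_acquire);
  if (s == NULL)
    {
      pthread_mutex_lock (&lock);
      /* Re-check under the lock; relaxed suffices here because the
         mutex itself orders competing initializers.  */
      s = atomic_load_explicit (&instance, memory_order_relaxed);
      if (s == NULL)
        {
          s = malloc (sizeof *s);       /* error handling elided */
          s->value = 42;                /* initialize before publishing */
          /* Publish: release pairs with the acquire fast path.  */
          atomic_store_explicit (&instance, s, memory_order_release);
        }
      pthread_mutex_unlock (&lock);
    }
  return s;
}
```

Every caller, locked or lock-free, then sees a fully initialized object; reading `instance' with a non-atomic load instead would reintroduce exactly the kind of race being discussed.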
If your patches would follow the C11 model and the comments were fine,
you'd just get an "OK" in my review.

> So, I'm out of here; the only reason you're even hearing from me is
> that I'm responsible enough to at least try to fix the breakage I
> introduced, once it was brought to my attention (at a time in which
> I actually noticed it; too bad I missed it the first time).
>
> > I've reviewed nscd concurrency recently and converted it to use
> > C11 atomics, and this process revealed many synchronization bugs
> > such as missing barriers.
>
> That shouldn't be surprising.  A lot of that code was written before
> C11 was even in the foreseeable future.

It was fine to assume TSO when glibc was started, and the role of
compilers in concurrency was less well understood back then.  However,
glibc has claimed to support ARM and PowerPC correctly for quite a
while now, and that is not the case in concurrent code in many cases
(eg, because of assuming something like TSO).  Also note that even if
assuming a strong memory model, some of the nscd synchronization would
be broken.

Furthermore, you reviewed nscd-related code and marked it as
thread-safe.  This wasn't that long ago, and happened at a time at
which weak memory models such as ARM's, and the role of the compiler,
were well understood.  My patch shows that this wasn't thread-safe.
Thus, I find it surprising that you think it wasn't surprising that
the synchronization in nscd is buggy.

> >> +     Another race, not to be mistaken for the above, has to do with
> >> +     the TLS block initialized below.  Although we perform this
> >> +     initialization while holding the rtld lock, once dlopen
> >> +     completes, other threads might run code that accesses the
> >> +     variables in these TLS blocks, without taking the rtld lock
> >> +     or any other explicit memory acquire.
>
> > What is a "memory acquire"?
>
> That's a term from long before C standardized atomics, from the 20
> years or so in which people like myself have been doing concurrent
> programming on multi-CPU computers without explicit language
> support, and even on single-CPU multi-tasked systems.  In dusty
> books from that age, you may be able to find tons of information
> about such things as memory barriers or fences, some with acquire
> and release semantics, as well as the implied acquire and release of
> all memory that occurs at mutex acquire and release time,
> respectively.
>
> There are still some remnants from those days in e.g. Linux docs:
> https://www.kernel.org/doc/Documentation/memory-barriers.txt
>
> but a lot of what can be found on the Internet now seems to be more
> recent than C and C++ atomics.  Still, the notion of memory acquire
> and release from the ancient scrolls is still there, even if a bit
> in disguise.  There's no reason for us to limit our language and
> throw away all the old books just because the term was reused in a
> new, narrower context, is there?

My point is that you're not being specific.  There's no such thing as
The Acquire Semantics -- you should be specific about which semantics
you are referring to.  Referring to a group of semantics isn't good
enough if the members of this group differ significantly.  One such
difference is, for example, whether you assume that all
acquire/release actions are global, or always "tied" to accesses to a
specific memory location.  You seem to think the former, but this
isn't true in the weak HW memory models we have today and that glibc
claims to support.

The language we use in glibc needs to be sufficiently precise for what
we want to do in glibc.  What you (or anyone else, for that matter)
thinks might have been popular terminology X years ago or on some
architectures of the past is frankly irrelevant; sure, it can help us
get to proper terminology, but it's not sufficient in glibc code /
comments.
> >> +     Our implementation appears to assume that, in order for the
> >> +     other threads to access those variables, there will be some
> >> +     form of synchronization between the end of dlopen() in one
> >> +     thread and the use of the just-loaded modules in others,
> >> +     otherwise the very use of the loaded code could be
> >> +     undefined.  */
>
> > I'd reverse that: I find it misleading that you start by saying
> > that there is a race condition, when in fact this seems not to be
> > the case because of the assumption.  Therefore, I'd mention the
> > assumption first, and then derive from that that there is no data
> > race and that uses of the TLS will find it initialized.
>
> Unless the assumption fails to meet the definition of data race in
> some possibly irrelevant fashion.

This assumption is orthogonal to how a data race is defined.

> The point is that the implementation doesn't guard itself from
> certain races, if someone's determined to create one,

That's why I suggested to clearly state the assumption first.  You
could also call it a precondition for this functionality in glibc.
(I'm not aware of any "official" requirement / precondition, but if
it exists, we should cite it.)

> but I posit they don't matter.  Let me give you some examples:
>
> - there's the obvious case of the program with a race: the dlopen
>   caller sets a global non-atomic variable when dlopen completes,
>   and another thread busy-waits for the variable to change, and then
>   uses its value to call a function that accesses the newly-loaded
>   TLS variable.  The race on the global variable enables the race on
>   the TLS variable, so undefined behavior has already been invoked
>   by the time the TLS implementation incurs its own race.
>
> - here's a variant that doesn't involve any other race: the dlopen
>   caller writes to a pipe a pointer to a function that accesses the
>   TLS variable, and another thread reads the pointer from the pipe
>   and calls it -> race?  There's clearly a happens-before order
>   implied by the write() and read(), but that doesn't imply memory
>   synchronization in the memory model AFAICT.

I would think that the preferable semantics would be that a read()
reading data from a write() call is considered to create a
synchronizes-with edge between the two.  If it doesn't, then it's the
same as using relaxed-MO atomics to communicate the pointer, which
doesn't create a synchronizes-with edge.  In this case, which is not a
data race, the precondition of dlopen would apply; if the user
violates it, then it's the user's fault.  The comment I suggest would
then remind glibc developers that there is this precondition, and thus
that glibc is correct wrt its specification (which includes the
precondition).

> - another variant in which there's a clear happens-before order, but
>   no memory synchronization, so there's possibly a race: the dlopen
>   caller installs a signal handler that accesses the TLS variable,
>   and then gets the signal sent to the process itself; another
>   thread is chosen to handle the signal and accesses the TLS
>   variable -> race?

Violates the precondition, so undefined behavior.  Whether that might
or might not lead to a data race in the execution of the
implementation is an implementation detail.

> - generalizing, use any out-of-(memory-)band information-carrying
>   side effect that one thread could use to communicate to another
>   thread how to call a function that accesses the TLS variable, both
>   function and variable brought in by dlopen, and you'll have,
>   technically, a data race, but IMO a harmless data race.

I'm missing some detail in the description of the case, so I can't say
whether it would or would not be a data race in how the variable is
communicated.  But irrespective of whether the
On Sep 29, 2016, Torvald Riegel <triegel@redhat.com> wrote:

> Consider an exaggerated but similar example: I want to add a
> non-trivial piece of code, and like to reason about it using notes
> on paper.  Yet glibc has decided that it wants comments in the code.
> What should I do?  Should I just give up, or try to find a way to do
> what glibc wants and how it decided to maintain the code base (eg,
> finding a back-and-forth conversion between glibc's model of taking
> notes and mine)?

IMHO the important question is not what *you* should do in this
scenario, but what the GNU Libc community should do when an occasional
contributor comes up with a fix for an existing bug.  Should you boss
around playing language police and demand further unrelated changes,
so as to push the contributor away, or should you welcome and thank
the contribution, so that the contributor feels welcome and inclined
to come back?

> Right.  But we also need to consider that more than one person will
> eventually want to maintain any piece of glibc code.

We also have to consider that, by the time some of these people do,
the standards will have changed again, and the language in the
comments will again no longer be in line with the newer consensus of
the community.  That's already happened.  As much as you want to
portray things this way, some key concepts have not changed, nor have
they been ruled out of the vocabulary.

> Can you fix this to use C11 atomics (or even the old-style atomics
> if you're not comfortable with using C11 atomics yet), please?

Of course not.  I have a lot of other work to do, and absolutely zero
interest in going through the pain of trying to get anything vaguely
concurrency-related into glibc, as much as I have studied and been
interested in this area.  That's unfortunate for glibc, which I'm
still somewhat responsible for, but I just can't stand this pain.
>> but IIRC there are self-concurrency issues if DTV resizing is
>> interrupted by a signal that uses GD and starts further resizing)
>> and IIRC there is plenty of suspicious stuff in slotinfo
>> manipulation vs use (the data structure is only modified while
>> holding the rtld lock, but it's read while updating a DTV without
>> any explicit synchronization; I never looked into it enough to tell
>> what the assumptions were that make it safe at some point, to tell
>> whether there's any chance they still are);

> That sounds like an incorrect attempt at double-checked locking,
> which I've found several cases of so far, so it would be "within the
> pattern".

I don't see how you could possibly have drawn that conclusion, since
the readers do NOT take locks, before or after any test.

> Also note that even if assuming a strong memory model, some of the
> nscd synchronization would be broken.
> Furthermore, you reviewed nscd-related code and marked it as
> thread-safe.

The review was limited to functions provided by glibc and documented
in the glibc manual.  nscd provides none.  So your claim that I
reviewed them may very well be the result of a misunderstanding
leading to an incorrect assumption.  Regardless, if you found any
comments about nscd code in my thread-safety annotations (and
presumably corrected them), it would have been polite (and
appreciated) for you to copy me, so that I could have been informed
and educated about the changes.  I have no recollection of being
copied on any such messages.

> I find it surprising that you think it wasn't surprising that the
> synchronization in nscd is buggy.

I vaguely recall finding a few synchronization problems in nss plugins
years before the thread-safety review and documentation project
started.  That's why I wasn't surprised by the existence of
synchronization problems in that general area, even though nss plugins
were not part of the review.
> I'm missing some detail in the description of the case, so I can't say
> whether it would or would not be a data race in how the variable is
> communicated. But irrespective of whether the

(-: If this message didn't end at this comma,
On Thu, 2016-09-29 at 18:51 -0300, Alexandre Oliva wrote:
> On Sep 29, 2016, Torvald Riegel <triegel@redhat.com> wrote:
>
> > Consider an exaggerated but similar example: I want to add a
> > non-trivial piece of code, and like to reason about it using notes on
> > paper. Yet glibc has decided that it wants comments in the code. What
> > should I do? Should I just give up, or try to find a way to do what
> > glibc wants and how it decided to maintain the code base (eg, finding a
> > back-and-forth conversion between glibc model of taking notes and mine)?
>
> IMHO the important question is not what *you* should do in this
> scenario, but what should the GNU Libc community do, when an occasional
> contributor comes up with a fix for an existing bug.

First of all, you are not a hobbyist contributor AFAIK.

> Should you boss
> around playing language police

This is part of patch review. Being consistent re how we document and
reason about concurrency, and ensuring that such comments are
technically correct and match the code, is important. I've offered my
help plenty of times, and the standard for new concurrent code we set is
not unreasonable (eg, glibc doesn't use a custom memory model or
atomics, but something that all modern C and C++ code will use). So
asking patch contributors to adhere to that is fine, I think. We're
also expecting code to be proper C in all other aspects, so why should
we make an exception for concurrency aspects?

> and demand further unrelated changes, so
> as to push the contributor away,

I asked whether you could look at related parts of the code, because you
mentioned that you know this larger part of the code base.

> > Right. But we also need to consider that more than one person will
> > eventually want to maintain any piece of glibc code.
>
> We also have to consider that by the time some of these people do, the
> standards will have changed again, and the language in the comments will
> again no longer be in line with the newer consensus of the community.

When do you think we will get a new ISO C/C++ memory model? You seem to
think that this will happen in just a few years...

> >> but IIRC there are
> >> self-concurrency issues if DTV resizing is interrupted by a signal that
> >> uses GD and starts further resizing) and IIRC there is plenty of
> >> suspicious stuff in slotinfo manipulation vs use (the data structure is
> >> only modified while holding the rtld lock, but it's read while updating
> >> a DTV without any explicit synchronization; I never looked into it
> >> enough to tell what the assumptions were that make it safe at some
> >> point, to tell whether there's any chance they still are);
>
> > That sounds like an incorrect attempt at double-checked locking, which
> > I've found several cases of so far, so it would be "within the pattern".
>
> I don't see how you could possibly have drawn that conclusion, since the
> readers do NOT take locks, before or after any test.

Your description sounded as if it could match, in the sense that readers
try to, for example, detect whether the initialization has been done
(which uses a critical section); in that case, readers take no lock.
Even if it's not initialization-only, the similar point is that you
can't publish a result consisting of more than one memory location
without memory barriers, and you can't read from it without barriers.

> > Also note that even if assuming a strong memory model, some of the nscd
> > synchronization would be broken.
>
> > Furthermore, you reviewed nscd-related code and marked it as
> > thread-safe.
>
> The review was limited to functions provided by glibc and documented in
> the glibc manual. nscd provides none.
> So your claim that I reviewed
> them may very well be the result of a misunderstanding leading to an
> incorrect assumption.

The manual contains annotations of several functions that start with
"nscd_". These are in the nscd client code, and are called from
functions provided by glibc. Was it somebody else that put these
annotations into the manual?

nscd client-server synchronization is broken. That there is something
wrong is easy to see: there clearly is synchronization, and comments
that hint at low-level synchronization -- but there are no atomic
operations (nor sufficient locking), and thus no memory barriers, for
example; also, volatile-qualified variables are used even though
there's no low-level IO, which is an indicator of synchronization
written before proper compiler support for it existed.

> Regardless, if you found any comments about nscd code in my
> thread-safety annotations (and presumably corrected them), it would have
> been polite (and appreciated) for you to copy me, so that I could have
> been informed and educated about the changes. I don't have recollection
> of having been copied on any such messages.

I have already sent the draft patch to a list you have access to. I
haven't checked all the annotations again yet, but I believe that my
patch now makes the annotations correct (considering the thread-safety
aspect here).
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index dcab666..42bddc1 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -137,12 +137,6 @@ _dl_nothread_init_static_tls (struct link_map *map)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 #endif
 
-  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
-  dtv_t *dtv = THREAD_DTV ();
-  assert (map->l_tls_modid <= dtv[-1].counter);
-  dtv[map->l_tls_modid].pointer.to_free = NULL;
-  dtv[map->l_tls_modid].pointer.val = dest;
-
   /* Initialize the memory.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 3016a2e..38b6530 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -1207,12 +1207,36 @@ init_one_static_tls (struct pthread *curp, struct link_map *map)
 # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
 # endif
 
-  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
-  dtv_t *dtv = GET_DTV (TLS_TPADJ (curp));
-  dtv[map->l_tls_modid].pointer.to_free = NULL;
-  dtv[map->l_tls_modid].pointer.val = dest;
-
-  /* Initialize the memory.  */
+  /* We cannot delay the initialization of the Static TLS area, since
+     it can be accessed with LE or IE, but since the DTV is only used
+     by GD and LD (*) we can delay its update to avoid a race (**).
+
+     (*) The main program's DTV of statically-linked programs may be
+     used by a custom __tls_get_addr used on platforms that don't
+     require TLS relaxations, but that one is never initialized by us,
+     since the main program is never dlopened; the main thread's DTV
+     entry for the main program gets initialized in __libc_setup_tls,
+     whereas other threads' corresponding DTV entries are initialized
+     by _dl_allocate_tls_init.
+
+     (**) If we were to write to other threads' DTV entries here, we
+     might race with their DTV resizing.  We might then write to
+     stale, possibly already free()d and reallocated memory.  Or we
+     could write past the end of the still-active DTV.  So, don't do
+     that!
+
+     Another race, not to be mistaken for the above, has to do with
+     the TLS block initialized below.  Although we perform this
+     initialization while holding the rtld lock, once dlopen
+     completes, other threads might run code that accesses the
+     variables in these TLS blocks, without taking the rtld lock or
+     any other explicit memory acquire.
+
+     Our implementation appears to assume that, in order for the other
+     threads to access those variables, there will be some form of
+     synchronization between the end of dlopen() in one thread and the
+     use of the just-loaded modules in others, otherwise the very use
+     of the loaded code could be undefined.  */
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
 	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
 }