Message ID | 568D5E11.3010301@arm.com
---|---
State | New, archived
Headers |
Message-ID: <568D5E11.3010301@arm.com>
Date: Wed, 6 Jan 2016 18:33:53 +0000
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
To: GNU C Library <libc-alpha@sourceware.org>
CC: <i.palachev@samsung.com>, "triegel@redhat.com" <triegel@redhat.com>, <nd@arm.com>
Subject: [PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen
Commit Message
Szabolcs Nagy
Jan. 6, 2016, 6:33 p.m. UTC
I've seen the following failure:

  Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

It seems dlopen modifies TLS-related dynamic linker data structures in dl_open_worker when a shared library with TLS is loaded, while holding GL(dl_load_lock). Meanwhile, at thread creation, the same globals are accessed without holding any locks when the dtv and TLS are set up for the new thread in _dl_allocate_tls_init.

At least the following objects may have conflicting accesses:

  GL(dl_tls_max_dtv_idx)
  GL(dl_tls_generation)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next

The race window that triggers the assertion failure above is short compared to dlopen and thread creation, so it rarely happens, but the probability can be increased by loading a long chain of dependent shared objects with TLS. Then all the loaded DSOs get the same generation number, GL(dl_tls_generation)+1, in the slotinfo list, and GL(dl_tls_generation) is incremented only after that, so the assertion can fail in a concurrently created thread that reads the new slotinfo entries.

My fix loads the current number of dtv slots, GL(dl_tls_max_dtv_idx), and the current generation counter, GL(dl_tls_generation), at the beginning of _dl_allocate_tls_init, and then ignores any slotinfo entry that was added concurrently. So instead of taking GL(dl_load_lock), the fix uses atomics to avoid the races. I think various other accesses to the objects in the list above should be atomic as well, but that rabbit hole is deeper than what a quick fix could patch up.

I can trigger the bug on x86_64 by loading a chain of 50 DSOs and concurrently creating >1000 threads, but I cannot easily turn that into a glibc test case. (On aarch64 the failure happens to be more common.)
A related issue is that glibc does not do worst-case TLS allocation for existing threads whenever a library is loaded, so further allocation may be needed at first TLS access, which makes that access non-AS-safe and a possible OOM crash. This lazy allocation is what allows me to ignore the slotinfo entries of concurrently loaded DSOs in _dl_allocate_tls_init; those will be lazily initialized.

Changelog:

2016-01-06  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	[BZ #19329]
	* elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation)
	atomically.
	* elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
	GL(dl_tls_max_dtv_idx) and slotinfo entries atomically.
	(_dl_add_to_slotinfo): Write the slotinfo entry atomically.
Comments
On 06/01/16 18:33, Szabolcs Nagy wrote:
> At least the following objects may have conflicting access:
>
> GL(dl_tls_max_dtv_idx)
> GL(dl_tls_generation)
> listp->slotinfo[i].map
> listp->slotinfo[i].gen
> listp->next
> ...
> I think various other accesses to the objects in the list above should
> be atomic as well, but that rabbit hole is deeper than what a quick
> fix could patch up.

I thought I could get away without making the listp->next access atomic, but now I see that can fail too:

>    /* Check if the current dtv is big enough.   */
> -  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
> +  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
> +  if (dtv[-1].counter < dtv_slots)
>      {
>        /* Resize the dtv.  */
>        dtv = _dl_resize_dtv (dtv);
> @@ -470,6 +473,7 @@ _dl_allocate_tls_init (void *result)
>       TLS.  For those which are dynamically loaded we add the values
>       indicating deferred allocation.  */
>    listp = GL(dl_tls_dtv_slotinfo_list);
> +  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
>    while (1)
>      {
>        size_t cnt;
> @@ -480,18 +484,22 @@ _dl_allocate_tls_init (void *result)
> 	    void *dest;
>
> 	    /* Check for the total number of used slots.  */
> -	    if (total + cnt > GL(dl_tls_max_dtv_idx))
> +	    if (total + cnt > dtv_slots)
> 	      break;
...
>        total += cnt;
> -      if (total >= GL(dl_tls_max_dtv_idx))
> +      if (total >= dtv_slots)
> 	break;
>
>        listp = listp->next;

here: a single listp node can hold 64 slotinfo entries, so if more libraries with TLS are loaded, GL(dl_tls_max_dtv_idx) > 64, but the updated listp->next pointer might not be visible here.

I will update the patch.
On 06.01.2016 21:33, Szabolcs Nagy wrote:
> 	    /* Keep track of the maximum generation number.  This might
> 	       not be the generation counter.  */
> -	    assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
> -	    maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
> +	    maxgen = MAX (maxgen, gen);

Thanks for the patch.

But it seems quite strange that the failed assertion is simply deleted from the code. Is it still failing for your patch? How can you prove that it is working if the assertion that was failing is now just deleted from the code? If I just remove the assertion and do nothing else, the error will go away.

Can you keep the assertion in place, or otherwise explain why you want to remove it?

--
Best regards,
Ilya Palachev
On 11/01/16 15:48, Ilya Palachev wrote:
> On 06.01.2016 21:33, Szabolcs Nagy wrote:
>> 	    /* Keep track of the maximum generation number.  This might
>> 	       not be the generation counter.  */
>> -	    assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
>> -	    maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
>> +	    maxgen = MAX (maxgen, gen);
>
> Thanks for the patch.
>
> But it seems quite strange that the failed assertion is simply deleted from the code.
> Is it still failing for your patch?

Yes.

> How can you prove that it is working if the assertion that was failing is now just deleted from the code?

I can only prove that the assertion is wrong by analysing the code: the condition it verifies cannot be enforced with the current design.

Removing it is harmless since the slotinfo entries are lazily initialized.

> If I just remove the assertion and do nothing else, the error will go away.

That's not enough: the dtv of the thread will be left in an inconsistent state, and TLS access in the thread may crash.

> Can you keep the assertion in place, or otherwise explain why you want to remove it?
>
> --
> Best regards,
> Ilya Palachev
On 11/01/16 16:42, Szabolcs Nagy wrote:
> On 11/01/16 15:48, Ilya Palachev wrote:
>> How can you prove that it is working if the assertion that was failing is now just deleted from the code?
>
> i can only prove that the assertion is wrong by
> analysing the code: the condition it verifies
> cannot be enforced with the current design.
>
> removing it is harmless since the slotinfo entries
> are lazy initialized.

Actually, this is not true if dlclose may be called concurrently with pthread_create, as I noted in my new patch description.

I don't know how to protect against that.
On 11.01.2016 19:46, Szabolcs Nagy wrote:
> On 11/01/16 16:42, Szabolcs Nagy wrote:
>> On 11/01/16 15:48, Ilya Palachev wrote:
>>> How can you prove that it is working if the assertion that was
>>> failing is now just deleted from the code?
>>
>> i can only prove that the assertion is wrong by
>> analysing the code: the condition it verifies
>> cannot be enforced with the current design.
>>
>> removing it is harmless since the slotinfo entries
>> are lazy initialized.

Thanks for the explanation. Do you mean that slotinfo entries are lazily initialized in _dl_update_slotinfo?

>
> actually this is not true if dlclose may be called
> concurrently with pthread_create as i noted in my
> new patch description.
>
> i don't know how to protect against that.

Does _dl_update_slotinfo have support for updating the DTVs of other threads when some thread has called dlclose?

As I can see, dlclose just sets the map pointer to NULL and increments the generation counter:

1. In the remove_slotinfo function:

    listp->slotinfo[idx - disp].gen = GL(dl_tls_generation) + 1;
    listp->slotinfo[idx - disp].map = NULL;

2. In the _dl_close_worker function:

    /* If we removed any object which uses TLS bump the generation counter.  */
    if (any_tls)
      {
	if (__glibc_unlikely (++GL(dl_tls_generation) == 0))
	  _dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");

	if (tls_free_end == GL(dl_tls_static_used))
	  GL(dl_tls_static_used) = tls_free_start;
      }

You have just fixed such memory accesses with atomic_load_acquire and atomic_store_release. Won't these atomics be enough to fix the races with dlclose? Or am I missing something? Or would you rather file a separate bug report for the dlclose race?

--
Best regards,
Ilya Palachev
On 12/01/16 12:20, Ilya Palachev wrote:
> On 11.01.2016 19:46, Szabolcs Nagy wrote:
>> On 11/01/16 16:42, Szabolcs Nagy wrote:
>>> On 11/01/16 15:48, Ilya Palachev wrote:
>>>> How can you prove that it is working if the assertion that was failing is now just deleted from the code?
>>>
>>> i can only prove that the assertion is wrong by
>>> analysing the code: the condition it verifies
>>> cannot be enforced with the current design.
>>>
>>> removing it is harmless since the slotinfo entries
>>> are lazy initialized.
>
> Thanks for the explanation.
> Do you mean that slotinfo entries are lazily initialized in _dl_update_slotinfo?
>
>> actually this is not true if dlclose may be called
>> concurrently with pthread_create as i noted in my
>> new patch description.
>>
>> i don't know how to protect against that.
>
> Does _dl_update_slotinfo have support for updating the DTVs of other threads when some thread has called dlclose?

_dl_update_slotinfo (and tls_get_addr_tail) also walk the slotinfo list without synchronization, like _dl_allocate_tls_init. That is a worse case because it happens at TLS access time, not at library call (pthread_create) time. They initialize the dtv of the current thread based on the global slotinfo list, might allocate memory for TLS, resize the dtv, and may also take the global dl lock, which are all problematic operations. (These functions should only do AS-safe operations and must not do anything that can fail since, unlike a library call, a TLS access cannot report failures.)

I did not try to fix them; I only attempted to fix the pthread_create vs. dlopen case.

> As I can see, dlclose just sets the map pointer to NULL and increments the generation counter:
>
> 1. In the remove_slotinfo function:
>
>     listp->slotinfo[idx - disp].gen = GL(dl_tls_generation) + 1;
>     listp->slotinfo[idx - disp].map = NULL;
>
> 2. In the _dl_close_worker function:
>
>     /* If we removed any object which uses TLS bump the generation counter.  */
>     if (any_tls)
>       {
>	  if (__glibc_unlikely (++GL(dl_tls_generation) == 0))
>	    _dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");
>
>	  if (tls_free_end == GL(dl_tls_static_used))
>	    GL(dl_tls_static_used) = tls_free_start;
>       }
>
> You have just fixed such memory accesses with atomic_load_acquire and atomic_store_release.
> Won't these atomics be enough to fix the races with dlclose? Or am I missing something?

I was looking at that and thought the memory accesses might be possible to synchronize (although it's a bit hairy); however, dlclose frees the link_map at the end, and that is not possible to synchronize: after _dl_allocate_tls_init loads .map and .gen from the slotinfo, it accesses the map assuming it won't go away.

The current design in glibc is very broken. E.g. musl can do TLS initialization and access in a lock-free way because its dlclose does not free the loaded library; if it did, then I think using the dl lock would be necessary. However, the dl lock cannot (easily) fix the TLS access case, and currently it does not fix the pthread_create case either, since the dl lock is held while ctors run, so user code can deadlock on it.

tl;dr: if your libc has a non-noop dlclose, it must use a global lock in dlopen/dlclose/thread creation/TLS access, and user code must not run while that lock is held (e.g. signals must be blocked).
On Mon, 2016-01-11 at 16:42 +0000, Szabolcs Nagy wrote:
> On 11/01/16 15:48, Ilya Palachev wrote:
>> How can you prove that it is working if the assertion that was failing is now just deleted from the code?
>
> i can only prove that the assertion is wrong by
> analysing the code: the condition it verifies
> cannot be enforced with the current design.

But that's not a sufficient reason to remove the assertion. The assertion might very well have been the intent of the programmer, and what the code actually does might not have been what was intended. Removing the assertion would be similar to removing documentation or comments just because the code doesn't match them.

IMO, the proper way to solve this is to investigate the conflict and find out what the intent should be (e.g., to make this work correctly); after that, one knows whether the code or the assertion is wrong. (If we are reasonably confident that the assertion is wrong, or that the damage caused by it failing is larger than an error triggered by the potential bug, we might just make it not fail. But in this case, the assertion should remain as a comment with a big FIXME or such next to it.)
On Tue, 2016-01-12 at 14:28 +0000, Szabolcs Nagy wrote:
> tl;dr: if your libc has non-noop dlclose it must use a
> global lock in dlopen/dlclose/thread creation/tls access
> and user code must not run while that lock is held
> (e.g. signals must be blocked)

Depending on what the non-noop functionality is, it might still be possible to implement this in a nonblocking way (so you avoid the blocking sync vs. signals and reentrancy issues).
On 22/01/16 14:49, Torvald Riegel wrote:
> On Tue, 2016-01-12 at 14:28 +0000, Szabolcs Nagy wrote:
>> tl;dr: if your libc has non-noop dlclose it must use a
>> global lock in dlopen/dlclose/thread creation/tls access
>> and user code must not run while that lock is held
>> (e.g. signals must be blocked)
>
> Depending on what the non-noop functionality is, it might be possible to
> still implement this in a nonblocking way (so you avoid the blocking
> sync vs. signals and reentrancy issue).

The non-noop functionality is that dlclose frees memory that pthread_create and TLS access may look at (the link_map of a DSO).

I guess lock-free garbage collection is a possibility (e.g. refcounting, except TLS access cannot call free, so it's not entirely trivial).
On Fri, 2016-01-22 at 18:32 +0000, Szabolcs Nagy wrote:
> On 22/01/16 14:49, Torvald Riegel wrote:
>> On Tue, 2016-01-12 at 14:28 +0000, Szabolcs Nagy wrote:
>>> tl;dr: if your libc has non-noop dlclose it must use a
>>> global lock in dlopen/dlclose/thread creation/tls access
>>> and user code must not run while that lock is held
>>> (e.g. signals must be blocked)
>>
>> Depending on what the non-noop functionality is, it might be possible to
>> still implement this in a nonblocking way (so you avoid the blocking
>> sync vs. signals and reentrancy issue).
>
> the non-noop functionality is that dlclose frees
> memory that pthread_create and tls access may look at
> (the link_map of a dso).
>
> i guess lock-free garbage collection is a possibility
> (e.g. refcounting, except tls access cannot call free
> so it's not entirely trivial.)

So we need to reach a quiescent state (e.g., like RCU has to). Not trivial, but it seems releasing the memory can be deferred to a later time, and it does not need to be performed by a particular thread. So one can just build a list of things to be freed, and the first thread that can do it just does it.
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 5429d18..4e2cd6a 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,15 @@ dl_open_worker (void *a)
     }
 
   /* Bump the generation number if necessary.  */
-  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
-    _dl_fatal_printf (N_("\
+  if (any_tls)
+    {
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__builtin_expect (newgen == 0, 0))
+	_dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+      /* Synchronize with the load acquire in _dl_allocate_tls_init.  */
+      atomic_store_release (&GL(dl_tls_generation), newgen);
+    }
 
   /* We need a second pass for static tls data, because _dl_update_slotinfo
      must not be run while calls to _dl_add_to_slotinfo are still pending.  */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 20c7e33..3c08a5f 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -455,9 +455,12 @@ _dl_allocate_tls_init (void *result)
   struct dtv_slotinfo_list *listp;
   size_t total = 0;
   size_t maxgen = 0;
+  size_t gen_count;
+  size_t dtv_slots;
 
   /* Check if the current dtv is big enough.   */
-  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+  if (dtv[-1].counter < dtv_slots)
     {
       /* Resize the dtv.  */
       dtv = _dl_resize_dtv (dtv);
@@ -470,6 +473,7 @@ _dl_allocate_tls_init (void *result)
      TLS.  For those which are dynamically loaded we add the values
      indicating deferred allocation.  */
   listp = GL(dl_tls_dtv_slotinfo_list);
+  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
   while (1)
     {
       size_t cnt;
@@ -480,18 +484,22 @@ _dl_allocate_tls_init (void *result)
	    void *dest;
 
	    /* Check for the total number of used slots.  */
-	    if (total + cnt > GL(dl_tls_max_dtv_idx))
+	    if (total + cnt > dtv_slots)
	      break;
 
-	    map = listp->slotinfo[cnt].map;
+	    map = atomic_load_acquire (&listp->slotinfo[cnt].map);
	    if (map == NULL)
	      /* Unused entry.  */
	      continue;
 
+	    size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
+	    if (gen > gen_count)
+	      /* New, concurrently loaded entry.  */
+	      continue;
+
	    /* Keep track of the maximum generation number.  This might
	       not be the generation counter.  */
-	    assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
-	    maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+	    maxgen = MAX (maxgen, gen);
 
	    dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
	    dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,7 +526,7 @@ _dl_allocate_tls_init (void *result)
	}
 
       total += cnt;
-      if (total >= GL(dl_tls_max_dtv_idx))
+      if (total >= dtv_slots)
	break;
 
       listp = listp->next;
@@ -942,6 +950,7 @@ cannot create TLS data structures"));
     }
 
   /* Add the information into the slotinfo data structure.  */
-  listp->slotinfo[idx].map = l;
-  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  atomic_store_relaxed (&listp->slotinfo[idx].gen,
+			GL(dl_tls_generation) + 1);
+  /* Synchronize with the acquire load in _dl_allocate_tls_init.  */
+  atomic_store_release (&listp->slotinfo[idx].map, l);
 }