From patchwork Wed Jan 6 18:33:53 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Szabolcs Nagy X-Patchwork-Id: 10255 Received: (qmail 98728 invoked by alias); 6 Jan 2016 18:34:11 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 98673 invoked by uid 89); 6 Jan 2016 18:34:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_05, SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Exchange-Antispam-Report-CFA-Test:102615240, increased, HX-Exchange-Antispam-Report-Test:180628864354917, 5187 X-HELO: eu-smtp-delivery-143.mimecast.com Message-ID: <568D5E11.3010301@arm.com> Date: Wed, 6 Jan 2016 18:33:53 +0000 From: Szabolcs Nagy User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0 MIME-Version: 1.0 To: GNU C Library CC: , "triegel@redhat.com" , Subject: [PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen X-ClientProxiedBy: DB3PR01CA0053.eurprd01.prod.exchangelabs.com (10.242.133.156) To DB5PR08MB0840.eurprd08.prod.outlook.com (25.164.43.18) X-Microsoft-Exchange-Diagnostics: 1; DB5PR08MB0840; 2:TghWHJ4anPB5Dr62S6PLGGR1Ee4u/udr9f23cHjATmqM12uVlDaybhq+xvVmzVziUB1LdRGyX/CoJaXdS9tewQHBRRH/FskAjDiamEK7UyXhWz6jsZMSfLb4sYCHa1CkdChWCXUUT/d8tJTKkfgdPQ==; 3:18HiduhI0j/v4b3UMtftZMG2+36cglQQKpsR/d9MPPHcxdsmCycC2UIZqMKsBx2glxm3t/EI+kVi3T8EdVfrAvFgzNq3aHpakH4hOyCsbNmZJ1W8nT2J/fnpXLWSNTac; 25:SaDYkx2qCV6B0oawVLCyDwz9hYTm9rSeqBLvd+GiinGL2PrsoviDOR48LqeUSHgtfGoABs0FqBMzPe+OrnWt0LnjqSKqLzrrzmDoLtAhFeJwwjIqycQkKtte5rkKu1bw9Dv9ETohFOP0CW8nlNJRpwwl8P0QMvrbngTbNNwBW2pHxKvL+g/B4GgCfv2CCSEsWKWV5DHpnFhd5GOurqQuT2V8qF4fCSEUdwY6580p4ReT6X3QqqUgr36juFD8cBya31ehF8IXb0CbvJeqPTivkw==; 20:d92+zZzURzxu4A9n2OKND20xPuo3RYjffmOhWB0NE787OsBX5eVr730le8FE9LE2GPzUWSeX0z1/Qw01p94+hP8wKYwFac06kQCzwBDXjH85mrWpGsPZzQI97K6YIoYwMA42WrHKDsxE732XxhwaAUgEeu+fTjnFBa/n0eSYl2o= X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB5PR08MB0840; NoDisclaimer: True X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(180628864354917); X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(102615240)(601004)(2401047)(5005006)(520078)(8121501046)(10201501046)(3002001); SRVR:DB5PR08MB0840; BCL:0; PCL:0; RULEID:; SRVR:DB5PR08MB0840; X-Microsoft-Exchange-Diagnostics: 1; DB5PR08MB0840; 4:QcNfEYfeqkFPmY+eokvZeWtSYgW+xfdSvFC6NHbKFowx/jdtJGfHKgjOevZru48muaBB2v3ACY51TOxFv5daQZNxjZw8D/2Dhmz8GaLUrHiKUxdjogESMm9soPOSybp58BUyCbrFhmseNRIuo8UMyE/c9fgZbrDnHUfAmjyLh5Cl2qQM3Xcd8fiolEGKge/s+Yy2InPyPZeqxwHY3s0bfJ97Y6c12u9UzH8LNbS3//0SJO6OAT883CTebGecL7oViSUcdySJskfG0JuNn0YQ0HB0YPVfGnwTSeQjYGt8IORq0YkSYRkPOMOxzkprIXvogYfrIb7fDFOse2gvN6TwaGz9ft7hd8Swcz8nM+yT9q7KzJC021bylSMJzYiv7LWQYd9oN5u87Nw2eHlUxa1vkyY1MB/0+vT83fNxDmpI7HzG+5D/5rIqBvxG2IagJUnDBKT/5iq73f036llJxmcPdA== X-Forefront-PRVS: 0813C68E65 X-Forefront-Antispam-Report: SFV:NSPM; SFS:(10009020)(6009001)(6049001)(377424004)(199003)(54534003)(189002)(229853001)(4326007)(3846002)(59896002)(42186005)(50986999)(189998001)(77096005)(92566002)(19580405001)(80316001)(40100003)(270700001)(87976001)(5004730100002)(65816999)(4610100001)(66066001)(33656002)(5890100001)(36756003)(5008740100001)(110136002)(54356999)(21490400002)(83506001)(122386002)(65806001)(586003)(84326002)(86362001)(5001960100002)(65956001)(81156007)(87266999)(1096002)(105586002)(64126003)(512874002)(101416001)(97736004)(4001350100001)(568964002)(106356001)(19580395003)(2476003)(6116002); DIR:OUT; SFP:1101; SCL:1; SRVR:DB5PR08MB0840; H:[10.2.206.73]; FPR:; SPF:None; PTR:InfoNoRecords; A:1; MX:1; LANG:en; X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; DB5PR08MB0840; 23:fpM4iCSX0poSVfKopaGDc5rKpVkeFJOoktYa9/5/x?= =?us-ascii?Q?FKf0uF7frU1Sk1nJo3r3O5+bGDjF2ZeEZD5DKM/KZEf9V3xRM+UpShcNGi1C?= =?us-ascii?Q?JkmvH4i/+TUf1r+/fCgJX5y2r3jfWoCkdypGTgX2T1hWFAVywsO8/6Tf78KG?= =?us-ascii?Q?vay06V3pGNB8v+Px4KSEF6zresz5zAn8ElPeBXZSGwk9BaGGbQb9FbVcEKO2?= =?us-ascii?Q?mpSzT2QswaQQ6UvqfiSXduzGkd6X1B3PyzacKUI+8rgCGyOHAl5Fa9BLcwV1?= =?us-ascii?Q?LdiZw2fJp6FX1xckDtdUnbDJ8enp9yebgN3KODiCbncb4sUR3FkQB3gxku+m?= =?us-ascii?Q?+KvMCtbDSxaz4U9SCd9/SFqLGlTM5e/Cy74UtRFNNaot5a16KvMBKNNnR8/d?= =?us-ascii?Q?EvYcz/SrwGZvpSh8ttu6yYK1ak1kDpniDMbGn4VyElsRVsNKOpjeA88kTV9H?= =?us-ascii?Q?91z3R/drgng5o/dxVIuWOkSgQjW4sPn/NdrDE1Qfw2IoCzU4/0wrwhFKuM/U?= =?us-ascii?Q?KqGpqYc/g/Q3BhKO+OlDWMqx5Zd4Hn4JjPCF6M/W4R7ZiV+ZbiP3mLJLC7IS?= =?us-ascii?Q?fd7z6HA65KlsYHOivxbSO4B2qc9u/H8cdIPynlu2hOKP58zMuLjgVRruwLCA?= =?us-ascii?Q?pC6k5V2ft+IsAjrvIHr+I9zgN/TN1Po683Bq/OXAXQLAERDjQNuklv5Wppwp?= =?us-ascii?Q?mxY9bl4b2w/2uo8vMWQDBdH5SQ1nz+kPxBDOSm21PcZkFyltgoBGbsqclqkg?= =?us-ascii?Q?xPbjwVeIf0JZKn9jSVWKlFRmubTXe9abv0F93bJt+akYrMSpmVnHrbsO/K0o?= =?us-ascii?Q?Cyp1m6N3TMlw1bya/xs0+ny1zK0epfVNMGXkMoIDwsrWpSPz72eNYCUDCcuW?= =?us-ascii?Q?t6pV8Q5lNF15+d3H3b+EoOOtvMAJLUVosXInhNzdR1WX6CSLMBVEh/N7CCBf?= =?us-ascii?Q?zpaJU4a2JT6DksfaLUWlo5YmRwyNHhk06w6/67BKE8PO7us+cedNbClbsuxp?= =?us-ascii?Q?+63gqSoUJd2iU5Lj3qz2m3+qON0xjLP6hDFdRJi355QFyy7Ogi1HnkMiiw+x?= =?us-ascii?Q?fwQJT++sDsqerr9763XlDLw/LagNsWZuuucGlizlqzq+xmr1SKLbPS2eC0Cy?= =?us-ascii?Q?ojBXJJLgaaVHACW2r6mYGDra4Xytib/wavIvY/CN8vq380ZotEHLTcaUnLE+?= =?us-ascii?Q?O6w6+9cdpMzZhklqMRaA6hA9pB27sqGku3naA+W+RMcgehriskcBun5qnUNw?= =?us-ascii?Q?N+duQxnBX0fqUoM2ZucpDm0bRRTv1VcRLO2qQkEydsuuS7ytneJrrbL4esWd?= =?us-ascii?Q?G9fj02u9AKLumiPq1gRl+/CLeo7+gM6wPxvajP/6bTfQmuF/T5urIGXpRhi3?= =?us-ascii?Q?L66rg=3D=3D?= X-Microsoft-Exchange-Diagnostics: 1; DB5PR08MB0840; 5:nUTdvdJ+LvztOFxwk++Wtzf3Gzd3P6BqTD62yeYoXnClgpJYoCFEG+fcT9z3fkZ+W/oFcnbxqMuYT0pXNAo/GrzMY2J43mGPKP3tEDJrfxjYl/21z6vhNjQYCZvPiNVwAguvhrAGv70k7g5GoZdYdw==; 24:8ib1X71UKfcP0hc8of6MVpUEygL6rg/r+CGWTnm4RsBfa3Aj80xh+SwbHDb4e3/0JV2TmyOzbUeMYASCrDsvMMkOWnFdbvtSQoyS1xc6WlI=; 20:ONp8phRZ9Cp2Pq8xa7dczXvggwT0GA8o65hpoANGEFM7XbKVYXKddBU+PKo3Tb4+CEIrATgHHAnG2W3orwrq9Yyg+2XZHOjx+FP1zXaxKbjqYqQFSp4jh3XIJ1FqZml9A+LAorTyG9wM+AczYekg6zAD8I7+kRZxMVppDNOqkL0= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Jan 2016 18:33:56.5628 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB5PR08MB0840 X-Microsoft-Exchange-Diagnostics: 1; DB5PR08MB0614; 2:/o/gVeDvSI0CazkxuIzxD7B7B5zYdkyFTWBXJsWeNmBFux1YE8Db3I4ezywL0jGwuJNbSkanVUn8H7V48jCHt+Hgp73SVOhu+UF3D/Ro9r1EJ315EOF5xFBOh7v3NH4R4hItrE1rOJ4UL8FqiLuFhA==; 23:4tZpsq23llim5UnTLdn7i9JAr62doTIcYMKvkihY1OquuFAqGe1SH1VszWYSj8En8vwkrwf1Nwnh1PW628CB6KmSXIVM1FLrkWo7KHQ5HDaqfKQorVr6Pf9SaGWKpEeP/u6C7foaU45jkuSPgeiey9uA4BvfE0mgBaAv7dvf+DolD4uSpd1Sn02Q6seJ8LuN X-OriginatorOrg: arm.com X-MC-Unique: OYzeA--0Tv2j_twWNxiQ8g-1 I've seen the following failure: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed! It seems dlopen modifies tls related dynamic linker data structures in dl_open_worker if a shared library is loaded with tls, while holding the GL(dl_load_lock). Meanwhile at thread creation the same globals are accessed when the dtv and tls is set up for the new thread in _dl_allocate_tls_init without holding any locks. At least the following objects may have conflicting access: GL(dl_tls_max_dtv_idx) GL(dl_tls_generation) listp->slotinfo[i].map listp->slotinfo[i].gen listp->next The race window that triggers the assert failure above is short compared to dlopen and thread creation, so it rarely happens, but the probability can be increased by loading a long chain of dependent shared objects with tls. Then all the loaded dsos get the same generation number, GL(dl_tls_generation)+1, in the slotinfo list and GL(dl_tls_generation) gets incremented only after that, so the assertion can fail in a concurrently created thread reading the new slotinfos. My fix loads the current number of dtv slots, GL(dl_tls_max_dtv_idx), and the current generation counter, GL(dl_tls_generation), at the beginning of _dl_allocate_tls_init and then ignores any slotinfo entry that got concurrently added. So instead of taking the GL(dl_load_lock), used atomics to avoid the races. I think various other accesses to the objects in the list above should be atomic as well, but that rabbit hole is deeper than what a quick fix could patch up. I can trigger the bug on x86_64 by loading a chain of 50 dsos and concurrently creating >1000 threads, but i cannot easily turn that into a glibc test case. (On aarch64 the failure happens to be more common.) A related issue is that glibc does not try to do worst-case allocation of tls for existing threads whenever a library is loaded so there might be further allocation needed at first tls access making it non-as-safe and possibly oom crash. This lazy allocation allows me to ignore the slotinfo of concurrently loaded dsos in _dl_allocate_tls_init, those will be lazy initialized. Changelog: 2016-01-06 Szabolcs Nagy [BZ #19329] * elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation) atomically. * elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation), GL(dl_tls_max_dtv_idx) and slotinfo entries atomically. (_dl_add_to_slotinfo): Write the slotinfo entry atomically. diff --git a/elf/dl-open.c b/elf/dl-open.c index 5429d18..4e2cd6a 100644 --- a/elf/dl-open.c +++ b/elf/dl-open.c @@ -524,9 +524,15 @@ dl_open_worker (void *a) } /* Bump the generation number if necessary. */ - if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0)) - _dl_fatal_printf (N_("\ + if (any_tls) + { + size_t newgen = GL(dl_tls_generation) + 1; + if (__builtin_expect (newgen == 0, 0)) + _dl_fatal_printf (N_("\ TLS generation counter wrapped! Please report this.")); + /* Synchronize with the load acquire in _dl_allocate_tls_init. */ + atomic_store_release (&GL(dl_tls_generation), newgen); + } /* We need a second pass for static tls data, because _dl_update_slotinfo must not be run while calls to _dl_add_to_slotinfo are still pending. */ diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 20c7e33..3c08a5f 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -455,9 +455,12 @@ _dl_allocate_tls_init (void *result) struct dtv_slotinfo_list *listp; size_t total = 0; size_t maxgen = 0; + size_t gen_count; + size_t dtv_slots; /* Check if the current dtv is big enough. */ - if (dtv[-1].counter < GL(dl_tls_max_dtv_idx)) + dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx)); + if (dtv[-1].counter < dtv_slots) { /* Resize the dtv. */ dtv = _dl_resize_dtv (dtv); @@ -470,6 +473,7 @@ _dl_allocate_tls_init (void *result) TLS. For those which are dynamically loaded we add the values indicating deferred allocation. */ listp = GL(dl_tls_dtv_slotinfo_list); + gen_count = atomic_load_acquire (&GL(dl_tls_generation)); while (1) { size_t cnt; @@ -480,18 +484,22 @@ _dl_allocate_tls_init (void *result) void *dest; /* Check for the total number of used slots. */ - if (total + cnt > GL(dl_tls_max_dtv_idx)) + if (total + cnt > dtv_slots) break; - map = listp->slotinfo[cnt].map; + map = atomic_load_acquire (&listp->slotinfo[cnt].map); if (map == NULL) /* Unused entry. */ continue; + size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen); + if (gen > gen_count) + /* New, concurrently loaded entry. */ + continue; + /* Keep track of the maximum generation number. This might not be the generation counter. */ - assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation)); - maxgen = MAX (maxgen, listp->slotinfo[cnt].gen); + maxgen = MAX (maxgen, gen); dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED; dtv[map->l_tls_modid].pointer.is_static = false; @@ -518,7 +526,7 @@ _dl_allocate_tls_init (void *result) } total += cnt; - if (total >= GL(dl_tls_max_dtv_idx)) + if (total >= dtv_slots) break; listp = listp->next; @@ -942,6 +950,7 @@ cannot create TLS data structures")); } /* Add the information into the slotinfo data structure. */ - listp->slotinfo[idx].map = l; - listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1; + atomic_store_relaxed (&listp->slotinfo[idx].gen, GL(dl_tls_generation) + 1); + /* Synchronize with the acquire load in _dl_allocate_tls_init. */ + atomic_store_release (&listp->slotinfo[idx].map, l); }