| Message ID | 5693D908.8090104@arm.com |
|---|---|
| State | New, archived |
Headers

From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Mon, 11 Jan 2016 16:32:08 +0000
To: GNU C Library <libc-alpha@sourceware.org>
CC: <i.palachev@samsung.com>, "triegel@redhat.com" <triegel@redhat.com>, <nd@arm.com>
Subject: Re: [PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen
Message-ID: <5693D908.8090104@arm.com>
In-Reply-To: <568D5E11.3010301@arm.com>
References: <568D5E11.3010301@arm.com>
List-Id: <libc-alpha.sourceware.org>
Commit Message
Szabolcs Nagy
Jan. 11, 2016, 4:32 p.m. UTC
Changes in patch v2:

* The listp->next race is fixed.
* GL(dl_tls_generation) is loaded before GL(dl_tls_max_dtv_idx).
  (This matters if several dlopens happen during _dl_allocate_tls_init
  between those two loads.)
* More detailed analysis below.

I can trigger the bug on x86_64 by loading a library with >50 dependencies with TLS while concurrently creating >1000 threads. This was not turned into a glibc test case, but it is attached to the bug report. (On aarch64 the failure happens to be easier to trigger.) The following failures can be triggered:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

and

Inconsistency detected by ld.so: dl-tls.c: 525: _dl_allocate_tls_init: Assertion `listp != NULL' failed!

dlopen modifies TLS-related dynamic linker data structures while holding GL(dl_load_lock) if the loaded library has TLS. Meanwhile, at thread creation, the same globals are accessed without holding any locks when the dtv and TLS are set up for the new thread in _dl_allocate_tls_init. At least the following global objects may have conflicting accesses:

  GL(dl_tls_max_dtv_idx)
  GL(dl_tls_generation)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next

where listp points to a node in GL(dl_tls_dtv_slotinfo_list).

The race window for the first assert failure above is short compared to dlopen and thread creation, so it rarely happens, but the probability can be increased if the loaded library has many dependencies with TLS; and if the dependencies don't fit into the same listp->slotinfo array, the second assert failure starts to happen.

The current sequence of events is:

dlopen:
The modids of the loaded library and its dependencies are set one by one to ++GL(dl_tls_max_dtv_idx) (assuming they have TLS). Then these modules are added to GL(dl_tls_dtv_slotinfo_list) one by one with the same generation number: GL(dl_tls_generation)+1. Then GL(dl_tls_generation) is incremented.
pthread_create:
The dtv for the thread is allocated assuming GL(dl_tls_max_dtv_idx) is the maximum modid that will be added to the dtv here. GL(dl_tls_dtv_slotinfo_list) is walked to initialize dtv[modid] for all modules in the list and to initialize the related TLS. At the end, dtv[0].counter is set to the maximum generation number seen; all modules with a less-than-or-equal generation number must have initialized dtv and TLS in this thread.

The patch uses atomics to add modules to the slotinfo list and to increase GL(dl_tls_generation) during dlopen, and similarly uses atomics to safely walk the slotinfo list in pthread_create. The dtv and TLS are initialized for all modules up to the observed GL(dl_tls_generation); modules with a larger generation number (concurrently loaded modules) are ignored.

Further issues:

dlclose also modifies the slotinfo list in unsafe ways and i don't immediately see a lock-free way to synchronize that with thread creation. (Using GL(dl_load_lock) at thread creation does not work because it can deadlock if pthread_create is called from a ctor while dlopen holds the lock.) I.e. this patch assumes dlclose is not called concurrently with pthread_create.

Two (incorrect) assertions were removed from the code, and i don't see an easy way to do similar runtime checks to guard against similar races or memory corruption.

I did not review all accesses to the problematic globals; there are most likely other problems (e.g. GL(dl_tls_max_dtv_idx) is modified without atomics, TLS access (__tls_get_addr) does not use atomics to access the same globals, etc.).

Glibc does not try to do worst-case allocation of TLS for existing threads whenever a library is loaded, so allocation might be needed at the first access to TLS objects, making it non-AS-safe and a possible OOM crash. The dtv and TLS of the modules that are ignored in this patch will be lazily initialized.
Changelog:

2016-01-11  Szabolcs Nagy  <szabolcs.nagy@arm.com>

	[BZ #19329]
	* elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation)
	atomically.
	* elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
	GL(dl_tls_max_dtv_idx), slotinfo entries and listp->next atomically.
	(_dl_add_to_slotinfo): Write the slotinfo entries and listp->next
	atomically.
Comments
On 11/01/16 16:32, Szabolcs Nagy wrote:
> Further issues:
>
> dlclose also modifies the slotinfo list in unsafe ways and i don't
> immediately see a lockfree way to synchronize that with thread
> creation. (using the GL(dl_load_lock) at thread creation does not work
> because it can deadlock if pthread_create is called from a ctor while
> dlopen holds the lock.)
> i.e. this patch assumes dlclose is not called concurrently with
> pthread_create.

dlopen holding the lock while ctors run is observably broken; i filed
https://sourceware.org/bugzilla/show_bug.cgi?id=19448

If that's fixed, then pthread_create can grab the lock too, and then dlopen/dlclose/pthread_create are trivially synchronized without any atomics.

The problem of non-AS-safety of TLS access still remains, but that's a separate issue.
On 01/11/2016 11:32 AM, Szabolcs Nagy wrote:
> 2016-01-11  Szabolcs Nagy  <szabolcs.nagy@arm.com>
>
> 	[BZ #19329]
> 	* elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation)
> 	atomically.
> 	* elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
> 	GL(dl_tls_max_dtv_idx), slotinfo entries and listp->next atomically.
> 	(_dl_add_to_slotinfo): Write the slotinfo entries and listp->next
> 	atomically.

You are headed in the right direction. I like where this patch is going, but I don't have enough time to review it in detail yet.

At first glance your patch lacks sufficient concurrency documentation to be acceptable. You need to document which acquires the releases synchronize with. Please grep for "CONCURRENCY NOTES" to see the level of detail we need to maintain these kinds of changes.

Cheers,
Carlos.
On 13/01/16 02:34, Carlos O'Donell wrote:
> On 01/11/2016 11:32 AM, Szabolcs Nagy wrote:
>> [BZ #19329] ChangeLog
>
> You are headed in the right direction. I like where this patch is going,
> but don't have enough time to review this in detail yet.

will there be time before 2.23?

> At first glance your patch lacks sufficient concurrency documentation to
> be acceptable. You need to document which acquires the releases
> synchronizes-with. Please grep for "CONCURRENCY NOTES" to see the level
> of detail we need to maintain these kinds of changes.

i wanted to avoid documenting all the mess in the dynamic linker,
but i can improve the comments.

i see the following bugs:

1) pthread_create is not synced with dlopen
2) pthread_create is not synced with dlclose
3) tls access is not synced with dlopen
4) tls access is not synced with dlclose
5) tls access can oom crash
6) tls access is not as-safe
7) dlopen holds a global lock during ctors

i can fix 1) by adding some atomics (this is the current patch).

for 2) that's not enough, because dlclose has to wait with the
free(link_map) while pthread_create is initializing the tls.

i guess 3) can be fixed similarly to 1) but i don't have a
test case for that.

the rest is harder to fix.

is it ok to only fix 1) for 2.23?
or should i try to fix 3) as well (races on the same globals)?
On 14-01-2016 13:09, Szabolcs Nagy wrote:
> On 13/01/16 02:34, Carlos O'Donell wrote:
>> You are headed in the right direction. I like where this patch is going,
>> but don't have enough time to review this in detail yet.
>
> will there be time before 2.23?

I intend to send a message about the hard freeze today, and the question is whether this fix can land without showing regressions in the following month and, if so, whether we can iron out the bugs. So are you confident about that?

I am asking you this because these synchronization issues are sometimes a nest of rats, and I also noted you found multiple issues while sending and reviewing the patch. Objections?

> i see the following bugs:
>
> 1) pthread_create is not synced with dlopen
> 2) pthread_create is not synced with dlclose
> 3) tls access is not synced with dlopen
> 4) tls access is not synced with dlclose
> 5) tls access can oom crash
> 6) tls access is not as-safe
> 7) dlopen holds a global lock during ctors
>
> i can fix 1) by adding some atomics (this is the current patch).
>
> for 2) that's not enough, because dlclose has to wait with the
> free(link_map) while pthread_create is initializing the tls.

So which would be a better approach regarding it?

> i guess 3) can be fixed similarly to 1) but i don't have a
> test case for that.
>
> the rest is harder to fix.
>
> is it ok to only fix 1) for 2.23?
> or should i try to fix 3) as well (races on the same globals)?

I would say we attack one problem at a time, even when they are related. How hard do you consider fixing 3) to be?
On 18/01/16 18:00, Adhemerval Zanella wrote:
> On 14-01-2016 13:09, Szabolcs Nagy wrote:
>> will there be time before 2.23?
>
> I intend to send a message about the hard freeze today and the question is
> if this fix is suitable to not show regression in the following month and
> if so if we could iron out the bugs. So are you confident about that?
>
> I am asking you this because these synchronization issues are a nest of rats
> sometimes and I also noted you found out multiple issues during the patch
> sending and reviewing. Objections?

i'm confident that i can fix the races i can trigger (new threads vs dlopen), without creating regressions (other than performance regression).

adding a handful of atomics is enough for that (it should not alter the behaviour when there is no conflicting memory access), however it might not be the best solution (too conservative) or i might miss a few races etc.

>> for 2) that's not enough, because dlclose has to wait with the
>> free(link_map) while pthread_create is initializing the tls.
>
> So which would be a better approach regarding it?

i don't know what's the best approach.

one possibility is to not free memory in dlclose (i guess glibc does not want this).

another is to sync e.g. by using some lock, but it cannot be GL(dl_load_lock) now because that is held while ctors are running in dlopen, and they may call pthread_create.

of course using both atomics and locks is a bit ugly..

>> is it ok to only fix 1) for 2.23?
>> or should i try to fix 3) as well (races on the same globals)?
>
> I would say we attack one problem at time, even when they are related.
> How hard do you consider to fix 3.?

i haven't looked at that in detail; it involves more code starting from __tls_get_addr, but it is probably less critical because the code tries to only look at slotinfo entries up to a point (the module's generation number where the accessed tls is) that is guaranteed to be synced by a dlopen that returned. (which means fewer problems with concurrently loading modules, however GL(*) should be accessed with atomics).

i can try to cook up something tomorrow with a detailed concurrency note and then others can decide if it's too risky a change or not.
On 18-01-2016 18:09, Szabolcs Nagy wrote:
> On 18/01/16 18:00, Adhemerval Zanella wrote:
>> I am asking you this because these synchronization issues are a nest of rats
>> sometimes and I also noted you found out multiple issues during the patch
>> sending and reviewing. Objections?
>
> i'm confident that i can fix the races i can
> trigger (new threads vs dlopen), without creating
> regressions (other than performance regression).
>
> adding a handful of atomics is enough for that
> (it should not alter the behaviour when there is
> no conflicting memory access), however it might
> not be the best solution (too conservative) or i
> might miss a few races etc.

Fair enough, I have added it to the blockers for the 2.23 release.

>> So which would be a better approach regarding it?
>
> i don't know what's the best approach
>
> one possibility is to not free memory in dlclose
> (i guess glibc does not want this).
>
> another is to sync e.g. by using some lock, but it
> cannot be GL(dl_load_lock) now because that is held
> while ctors are running in dlopen, which may call
> pthread_create.
>
> of course using both atomics and locks is a bit
> ugly..

Right, I think we will need to get back to this after the 2.23 release.

> i can try to cook up something tomorrow with a detailed
> concurrency note and then others can decide if it's
> too risky a change or not.

Ok, let's delay the fix for 3) to 2.24 then.
On 01/14/2016 10:09 AM, Szabolcs Nagy wrote:
> On 13/01/16 02:34, Carlos O'Donell wrote:
>> You are headed in the right direction. I like where this patch is going,
>> but don't have enough time to review this in detail yet.
>
> will there be time before 2.23?

I don't think there is time to review this sufficiently to get it into 2.23; 2.24 yes, but not 2.23. Even then I will try to do a round of review, and I've asked Torvald if he has any spare cycles to help review this.

>> At first glance your patch lacks sufficient concurrency documentation to
>> be acceptable. You need to document which acquires the releases
>> synchronizes-with. Please grep for "CONCURRENCY NOTES" to see the level
>> of detail we need to maintain these kinds of changes.
>
> i wanted to avoid documenting all the mess in the dynamic linker,
> but i can improve the comments.

You must document *some* of the mess, and you must minimally document everything you change in order to provide future documentation for subsequent reviews of the concurrency code.

> i see the following bugs:
>
> 1) pthread_create is not synced with dlopen
> 2) pthread_create is not synced with dlclose
> 3) tls access is not synced with dlopen
> 4) tls access is not synced with dlclose
> 5) tls access can oom crash
> 6) tls access is not as-safe
> 7) dlopen holds a global lock during ctors
>
> i can fix 1) by adding some atomics (this is the current patch).
>
> for 2) that's not enough, because dlclose has to wait with the
> free(link_map) while pthread_create is initializing the tls.
>
> i guess 3) can be fixed similarly to 1) but i don't have a
> test case for that.
>
> the rest is harder to fix.
>
> is it ok to only fix 1) for 2.23?
> or should i try to fix 3) as well (races on the same globals)?

No no. We want to minimize the changes going into 2.23. We should aim to fix only (1) right now if that is what you are interested in fixing.

Everything we do should be incremental. At each step though we might ask ourselves: How do we solve the more global problem of getting rid of the load lock? Do we all agree that the better future would be an atomic process that can add loaded libraries to the link map list without having to take a lock?

Cheers,
Carlos.
On Mon, 2016-01-18 at 20:09 +0000, Szabolcs Nagy wrote:
> of course using both atomics and locks is a bit
> ugly..

Not necessarily. If it works and makes sense otherwise, I would not advise against it. Some problems are easier to solve with locks, some not. Mixing them isn't usually a big problem if one follows some rules. For example, if you access something with atomics, access it with atomics everywhere, including in the critical sections.
On Mon, 2016-01-18 at 16:00 -0200, Adhemerval Zanella wrote:
> I would say we attack one problem at time, even when they are related.
I think we should at least have something of a high-level plan for how
to fix related problems. One will need that eventually unless the
different parts are indeed orthogonal, so it's not wasted work. But if
trying to fix one part of a group of related problems until the end
before looking at others, there's a risk of wasting time on details of a
solution (eg, reviews) that might not work if considering the whole
problem space.
So, I think the best approach here is to first understand what we
actually want to solve and provide in terms of behavior, then decide on
a high-level scheme for how to do this (ie, the major atomicity /
ordering pieces), and then implementing parts of this scheme one at a
time (or fixing/adapting current code for this).
On Wed, 2016-01-20 at 15:09 -0500, Carlos O'Donell wrote:
> Everything we do should be incremental. At each step though we
> might ask ourselves: How do we solve the more global problem
> of getting rid of the load lock?

IMO we should ask that. A better question would probably be to ask why we need the load lock; the lock is just a tool to implement some intended high-level synchronization scheme. We want to agree on this scheme early on, otherwise it will be hard to judge correctness.

Note that it can be a perfectly fine approach to simply want to implement the atomicity and ordering provided by some critical sections differently -- but in this specific case, it sounds as if we'd like to have a lock-less scheme anyway, and the current scheme doesn't work or is not implemented correctly. Both hint that we should go up in layers of abstraction and revisit what we actually want.
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 6f178b3..7315c3a 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,15 @@ dl_open_worker (void *a)
     }
 
   /* Bump the generation number if necessary.  */
-  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
-    _dl_fatal_printf (N_("\
+  if (any_tls)
+    {
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__builtin_expect (newgen == 0, 0))
+	_dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+      /* Synchronize with the load acquire in _dl_allocate_tls_init.  */
+      atomic_store_release (&GL(dl_tls_generation), newgen);
+    }
 
   /* We need a second pass for static tls data, because _dl_update_slotinfo
      must not be run while calls to _dl_add_to_slotinfo are still pending.  */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index ed13fd9..0afbcab 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -455,9 +455,15 @@ _dl_allocate_tls_init (void *result)
   struct dtv_slotinfo_list *listp;
   size_t total = 0;
   size_t maxgen = 0;
+  size_t gen_count;
+  size_t dtv_slots;
+
+  /* Synchronize with dl_open_worker, concurrently loaded modules with
+     larger generation number should be ignored here.  */
+  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
 
   /* Check if the current dtv is big enough.  */
-  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+  if (dtv[-1].counter < dtv_slots)
     {
       /* Resize the dtv.  */
       dtv = _dl_resize_dtv (dtv);
@@ -480,18 +486,23 @@ _dl_allocate_tls_init (void *result)
	  void *dest;
 
	  /* Check for the total number of used slots.  */
-	  if (total + cnt > GL(dl_tls_max_dtv_idx))
+	  if (total + cnt > dtv_slots)
	    break;
 
-	  map = listp->slotinfo[cnt].map;
+	  /* Synchronize with _dl_add_to_slotinfo.  */
+	  map = atomic_load_acquire (&listp->slotinfo[cnt].map);
	  if (map == NULL)
	    /* Unused entry.  */
	    continue;
 
+	  size_t gen = listp->slotinfo[cnt].gen;
+	  if (gen > gen_count)
+	    /* New, concurrently loaded entry.  */
+	    continue;
+
	  /* Keep track of the maximum generation number.  This might
	     not be the generation counter.  */
-	  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
-	  maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+	  maxgen = MAX (maxgen, gen);
 
	  dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
	  dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,11 +529,13 @@ _dl_allocate_tls_init (void *result)
	}
 
       total += cnt;
-      if (total >= GL(dl_tls_max_dtv_idx))
+      if (total >= dtv_slots)
	break;
 
-      listp = listp->next;
-      assert (listp != NULL);
+      /* Synchronize with _dl_add_to_slotinfo.  */
+      listp = atomic_load_acquire (&listp->next);
+      if (listp == NULL)
+	break;
     }
 
   /* The DTV version is up-to-date now.  */
@@ -916,7 +929,7 @@ _dl_add_to_slotinfo (struct link_map *l)
	 the first slot.  */
      assert (idx == 0);
 
-      listp = prevp->next = (struct dtv_slotinfo_list *)
+      listp = (struct dtv_slotinfo_list *)
	malloc (sizeof (struct dtv_slotinfo_list)
		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
      if (listp == NULL)
@@ -939,9 +952,13 @@ cannot create TLS data structures"));
      listp->next = NULL;
      memset (listp->slotinfo, '\0',
	      TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
+      /* _dl_allocate_tls_init concurrently walks this list at thread
+	 creation, it must only observe initialized nodes in the list.  */
+      atomic_store_release (&prevp->next, listp);
    }
 
  /* Add the information into the slotinfo data structure.  */
-  listp->slotinfo[idx].map = l;
  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  /* Synchronize with the acquire load in _dl_allocate_tls_init.  */
+  atomic_store_release (&listp->slotinfo[idx].map, l);
 }