Message ID | DB6PR0801MB20537B594B5629491605CD74834B0@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New, archived |
Headers | From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org> CC: nd <nd@arm.com> Subject: [PATCH 4/5] Fix deadlock in _int_free consistency check Date: Thu, 12 Oct 2017 09:35:41 +0000 Message-ID: <DB6PR0801MB20537B594B5629491605CD74834B0@DB6PR0801MB2053.eurprd08.prod.outlook.com> List-Id: <libc-alpha.sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> |
Commit Message
Wilco Dijkstra
Oct. 12, 2017, 9:35 a.m. UTC
This patch fixes a deadlock in the fastbin consistency check.
If we fail the fast check due to concurrent modifications to
the next chunk or system_mem, we should not lock if we already
have the arena lock.  Simplify the check to make it obviously
correct.

Passes GLIBC tests, OK for commit?

ChangeLog:
2017-10-11  Wilco Dijkstra  <wdijkstr@arm.com>

        * malloc/malloc.c (_int_free): Fix deadlock bug.

--
Comments
On Oct 12 2017, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:

> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>                               >= av->system_mem, 0))
>        {
>          /* We might not have a lock at this point and concurrent modifications
> -           of system_mem might have let to a false positive.  Redo the test
> -           after getting the lock.  */
> -        if (!have_lock
> -            || ({ __libc_lock_lock (av->mutex);
> -                  chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
> -                  || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
> -                }))
> +           of system_mem might result in a false positive.  Redo the test after
> +           getting the lock.  */
> +        if (!have_lock)
> +          __libc_lock_lock (av->mutex);
> +        if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
> +            || chunksize (chunk_at_offset (p, size)) >= av->system_mem)

There is no need to redo the tests if we had the lock.

Andreas.
On 10/12/2017 11:35 AM, Wilco Dijkstra wrote:
> This patch fixes a deadlock in the fastbin consistency check.
> If we fail the fast check due to concurrent modifications to
> the next chunk or system_mem, we should not lock if we already
> have the arena lock.  Simplify the check to make it obviously
> correct.

I don't think the subject line is correct.  What is the deadlock?  I
don't see it.

Thanks,
Florian
On Oct 12 2017, Florian Weimer <fweimer@redhat.com> wrote:

> On 10/12/2017 11:35 AM, Wilco Dijkstra wrote:
>> This patch fixes a deadlock in the fastbin consistency check.
>> If we fail the fast check due to concurrent modifications to
>> the next chunk or system_mem, we should not lock if we already
>> have the arena lock.  Simplify the check to make it obviously
>> correct.
>
> I don't think the subject line is correct.  What is the deadlock?  I don't
> see it.

The problem is that commit 24cffce736 inverted the condition on
have_lock, which is wrong.

Andreas.
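Editorial sketch, not part of the thread: the effect of the inverted condition can be modelled without any real locking. The code below is a simplified model under stated assumptions. `struct outcome`, `removed_condition`, `flipped_condition` and the `recheck_fails` parameter are hypothetical names introduced only for illustration; the comma operator stands in for the GNU statement expression used in malloc.c.

```c
#include <stdbool.h>

/* What one pass through the slow-path check would do.  */
struct outcome
{
  bool lock_attempted;   /* would call __libc_lock_lock (av->mutex) */
  bool error_reported;   /* would call malloc_printerr */
};

/* Models the condition removed by the patch:
     if (!have_lock || ({ lock; repeat test; }))
   recheck_fails is a hypothetical stand-in for the result of repeating
   the size test under the lock.  */
struct outcome
removed_condition (bool have_lock, bool recheck_fails)
{
  struct outcome o = { false, false };
  if (!have_lock || (o.lock_attempted = true, recheck_fails))
    o.error_reported = true;
  return o;
}

/* Same shape with the sense of have_lock flipped.  */
struct outcome
flipped_condition (bool have_lock, bool recheck_fails)
{
  struct outcome o = { false, false };
  if (have_lock || (o.lock_attempted = true, recheck_fails))
    o.error_reported = true;
  return o;
}
```

In this model, `removed_condition (true, ...)` attempts the lock although the caller already holds it, and `removed_condition (false, ...)` reports an error without ever locking or repeating the test. With the sense flipped, the lock is only attempted when the caller does not hold it.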
Florian Weimer wrote:

> I don't think the subject line is correct.  What is the deadlock?  I
> don't see it.

> -        if (!have_lock
> -            || ({ __libc_lock_lock (av->mutex);

It's right there. Have_lock means you've just done __libc_lock_lock (av->mutex),
so doing it again (same thread) implies deadlock.

Wilco
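Editorial sketch, not part of the thread: locking a non-recursive lock twice from the same thread can be observed safely with a POSIX error-checking mutex, which returns EDEADLK instead of blocking. This is only an analogy. glibc's internal `__libc_lock_lock` is not a `pthread_mutex_t`, and `relock_result` is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock an error-checking mutex twice from the same thread and return the
   result of the second attempt.  A default mutex could block forever
   here; the error-checking type reports the problem instead.  */
int
relock_result (void)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t mutex;
  int rc;

  rc = pthread_mutexattr_init (&attr);
  assert (rc == 0);
  rc = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert (rc == 0);
  rc = pthread_mutex_init (&mutex, &attr);
  assert (rc == 0);
  pthread_mutexattr_destroy (&attr);

  rc = pthread_mutex_lock (&mutex);          /* caller "has the lock" */
  assert (rc == 0);
  int second = pthread_mutex_lock (&mutex);  /* the re-lock in question */

  pthread_mutex_unlock (&mutex);
  pthread_mutex_destroy (&mutex);
  return second;
}
```

On older glibc versions this may need to be built with `-pthread`. The second lock attempt is expected to return `EDEADLK`, the error-checking counterpart of the hang described above.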
Andreas Schwab wrote:

> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>                               >= av->system_mem, 0))
>        {
>          /* We might not have a lock at this point and concurrent modifications
> -           of system_mem might have let to a false positive.  Redo the test
> -           after getting the lock.  */
> -        if (!have_lock
> -            || ({ __libc_lock_lock (av->mutex);
> -                  chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
> -                  || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
> -                }))
> +           of system_mem might result in a false positive.  Redo the test after
> +           getting the lock.  */
> +        if (!have_lock)
> +          __libc_lock_lock (av->mutex);
> +        if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
> +            || chunksize (chunk_at_offset (p, size)) >= av->system_mem)

> There is no need to redo the tests if we had the lock.

Well I guess an alternative is to do:

if (have_lock)
  print error
else
{
  lock
  repeat test and print error
  unlock
}

I also wonder whether we should actually unlock again before printing the
error - or do we assume/hope/know no memory allocation is ever required in
the error case?

Wilco
On Oct 12 2017, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:

> Andreas Schwab wrote:
>
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock)
>>                               >= av->system_mem, 0))
>>        {
>>          /* We might not have a lock at this point and concurrent modifications
>> -           of system_mem might have let to a false positive.  Redo the test
>> -           after getting the lock.  */
>> -        if (!have_lock
>> -            || ({ __libc_lock_lock (av->mutex);
>> -                  chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
>> -                  || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
>> -                }))
>> +           of system_mem might result in a false positive.  Redo the test after
>> +           getting the lock.  */
>> +        if (!have_lock)
>> +          __libc_lock_lock (av->mutex);
>> +        if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
>> +            || chunksize (chunk_at_offset (p, size)) >= av->system_mem)
>
>> There is no need to redo the tests if we had the lock.
>
> Well I guess an alternative is to do:
>
> if (have_lock)
>   print error
> else
> {
>   lock
>   repeat test and print error
>   unlock
> }

No, you can just test have_lock again, and skip the redo if set.  Still
much clearer than the original layout.

Andreas.
On 10/12/2017 12:16 PM, Andreas Schwab wrote:
> On Oct 12 2017, Florian Weimer <fweimer@redhat.com> wrote:
>
>> On 10/12/2017 11:35 AM, Wilco Dijkstra wrote:
>>> This patch fixes a deadlock in the fastbin consistency check.
>>> If we fail the fast check due to concurrent modifications to
>>> the next chunk or system_mem, we should not lock if we already
>>> have the arena lock.  Simplify the check to make it obviously
>>> correct.
>>
>> I don't think the subject line is correct.  What is the deadlock?  I don't
>> see it.
>
> The problem is that commit 24cffce736 inverted the condition on
> have_lock, which is wrong.

Oh, right, sorry about that.

Thanks,
Florian
On 10/12/2017 12:18 PM, Wilco Dijkstra wrote:
> Florian Weimer wrote:
>
>> I don't think the subject line is correct.  What is the deadlock?  I
>> don't see it.
>
>> -        if (!have_lock
>> -            || ({ __libc_lock_lock (av->mutex);
>
> It's right there. Have_lock means you've just done __libc_lock_lock (av->mutex),
> so doing it again (same thread) implies deadlock.

Hmm.  So if we enter this code path with have_lock, we don't have to
re-do the check, but malloc_printerr will be called in the end anyway,
so this is not the interesting case.

In practice, without heap corruption, the lock will be acquired here and
re-checking is needed, so I think your cleanup is okay after all.  The
logic is indeed much clearer.

Thanks,
Florian
DJ Delorie wrote:

> I think the bug can be fixed by only changing the sense of the have_lock
> condition:

> -        if (!have_lock
> +        if (have_lock

Yes it could but then it's still impossible to follow the logic... The only
reason I spotted the bug was because I refactored the code. Then the bug
suddenly became very obvious.

I think locking as a side effect in conditions is something we should
avoid.  So my variant or this one are reasonable ways to do it:

if (error detected)
  {
    if (have_lock)
      /* we had the lock during the test above, so the test is valid,
         and the error we detect is valid.  */
      print error message
    else
      /* we didn't have the lock, so acquire it and repeat the test.
         If the error is still present, fail.  */
      get lock, repeat test, maybe print error message
  }

I suppose we could also print the error once at the end:

if (error detected)
{
  bool fail = true;
  if (!have_lock)
  {
    get_lock
    fail = repeat test
    unlock
  }
  if (fail)
    print error
}

Wilco
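Editorial sketch, not part of the thread: the "print the error once at the end" outline can be written as compilable C against a toy arena. Everything here is an assumption made for illustration. `struct toy_arena`, `size_reader`, `size_is_invalid`, `next_size_corrupt` and `racy_reader` are hypothetical names, a pthread mutex replaces the arena lock, `2 * sizeof (size_t)` stands in for `2 * SIZE_SZ`, and a boolean return replaces the call to malloc_printerr. It is not the code that was committed to glibc.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an arena: a lock and the system_mem bound.  */
struct toy_arena
{
  pthread_mutex_t mutex;
  size_t system_mem;
};

/* Hypothetical reader for the next chunk's size field.  A callback lets a
   caller simulate a value that changes between two reads.  */
typedef size_t (*size_reader) (void *ctx);

static bool
size_is_invalid (const struct toy_arena *av, size_t sz)
{
  /* 2 * sizeof (size_t) stands in for glibc's 2 * SIZE_SZ.  */
  return sz <= 2 * sizeof (size_t) || sz >= av->system_mem;
}

/* Returns true where _int_free would call malloc_printerr.  */
bool
next_size_corrupt (struct toy_arena *av, size_reader read_size, void *ctx,
                   bool have_lock)
{
  if (!size_is_invalid (av, read_size (ctx)))
    return false;               /* Fast check passed.  */

  bool fail = true;             /* With the lock held, the test stands.  */
  if (!have_lock)
    {
      pthread_mutex_lock (&av->mutex);
      fail = size_is_invalid (av, read_size (ctx));  /* Repeat the test.  */
      pthread_mutex_unlock (&av->mutex);
    }
  return fail;
}

/* Example reader: returns 0 on the first call and a plausible size
   afterwards, imitating a racy first read.  ctx points to an int counter.  */
size_t
racy_reader (void *ctx)
{
  int *calls = ctx;
  return (*calls)++ == 0 ? 0 : 64;
}
```

With `have_lock` false, the first bad read from `racy_reader` is rechecked under the lock and no error is reported. With `have_lock` true, the caller must already hold `av->mutex`; the first result is trusted and the lock is never touched.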
On 10/12/2017 11:35 PM, Wilco Dijkstra wrote:
> I suppose we could also print the error once at the end:
>
> if (error detected)
> {
>   bool fail = true;
>   if (!have_lock)
>   {
>     get_lock
>     fail = repeat test
>     unlock
>   }
>   if (fail)
>     print error
> }

Yes, that's also a good option.

Anyway, should I make the one-character change removing the ! in the
meantime?  This is a real bug, and we should fix that even if the code
still isn't as pretty as it could be.

Thanks,
Florian
Florian Weimer wrote:
>
> Anyway, should I make the one-character change removing the ! in the
> meantime?  This is a real bug, and we should fix that even if the code
> still isn't as pretty as it could be.

I could easily commit the current patch as is if you're thinking of
backporting it. Once we agree on the best way of writing this sequence,
I can provide an updated patch.

Wilco
diff --git a/malloc/malloc.c b/malloc/malloc.c
index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock)
                              >= av->system_mem, 0))
       {
         /* We might not have a lock at this point and concurrent modifications
-           of system_mem might have let to a false positive.  Redo the test
-           after getting the lock.  */
-        if (!have_lock
-            || ({ __libc_lock_lock (av->mutex);
-                  chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
-                  || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
-                }))
+           of system_mem might result in a false positive.  Redo the test after
+           getting the lock.  */
+        if (!have_lock)
+          __libc_lock_lock (av->mutex);
+        if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
+            || chunksize (chunk_at_offset (p, size)) >= av->system_mem)
           malloc_printerr ("free(): invalid next size (fast)");
-        if (! have_lock)
+        if (!have_lock)
           __libc_lock_unlock (av->mutex);
       }