From patchwork Sat Jan 16 20:49:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Malte Skarupke X-Patchwork-Id: 41732 X-Patchwork-Delegate: carlos@redhat.com Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8D108388A03C; Sat, 16 Jan 2021 20:50:11 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8D108388A03C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1610830211; bh=d5yXxMk0nFEn5paTGMub8wV/3rbWyCtnpXue1tPECgY=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=DyDeYNfdVGcbrqAwABJaE6ulCsTMh/W/7339hNDBCRETo2gTBbcY0RG0I/7smKICZ jTLnJInNHvwhyHeGMewqqKUpYvmb8U3jCoNbKGsillgIv3LRDc5Xmjd82Cu1Kxrd3w fGYpyUlJahN7iwjpBO8dHyKP/OouFdi0j406u9M4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mout.web.de (mout.web.de [217.72.192.78]) by sourceware.org (Postfix) with ESMTPS id 6177A388A03C for ; Sat, 16 Jan 2021 20:50:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6177A388A03C X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from auth1-smtp.messagingengine.com ([66.111.4.227]) by smtp.web.de (mrweb103 [213.165.67.124]) with ESMTPSA (Nemesis) id 0M4I2R-1lqv6i1V5P-00rpQB; Sat, 16 Jan 2021 21:49:59 +0100 Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailauth.nyi.internal (Postfix) with ESMTP id 96D8E27C0054; Sat, 16 Jan 2021 15:49:57 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Sat, 16 Jan 2021 15:49:57 -0500 X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrtdeggddugeehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofestddtredtredttd enucfhrhhomhepofgrlhhtvgcuufhkrghruhhpkhgvuceomhgrlhhtvghskhgrrhhuphhk vgesfigvsgdruggvqeenucggtffrrghtthgvrhhnpeffkeffudeilefhuefgvefhtdfgud duueehjeekkeelgfduleeitdevveeufffgjeenucffohhmrghinhepphhrohgsrggslhih uggrnhgtvgdrtghomhenucfkphepieejrddvheegrdduieehrdelnecuvehluhhsthgvrh fuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepmhgrlhhtvghskhgrrhhuphhk vgdomhgvshhmthhprghuthhhphgvrhhsohhnrghlihhthidquddtudejtdefvdeliedqud duvdegvdekieelqdhmrghlthgvshhkrghruhhpkhgvpeepfigvsgdruggvsehfrghsthhm rghilhdrfhhm X-ME-Proxy: Received: from localhost.localdomain (unknown [67.254.165.9]) by mail.messagingengine.com (Postfix) with ESMTPA id EA670108005B; Sat, 16 Jan 2021 15:49:56 -0500 (EST) To: libc-alpha@sourceware.org Subject: [PATCH 1/5] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Date: Sat, 16 Jan 2021 15:49:46 -0500 Message-Id: <20210116204950.16434-1-malteskarupke@web.de> X-Mailer: git-send-email 2.17.1 X-Provags-ID: V03:K1:TwWHYoZBbKh/ZHFjkHKGHJXjLdWHaNZ0oANLGLRLD6Nyf7oRs9r +okFkg2tZnGIWO3ahFNiL/I+lxxp51gxyrZwJ3Lemrw9aHhVGzQiMGzSXuae1c6sEjcFA2H rgW5U0LiSdT5y8wol06ixRa/ORwjvRk3S3KqLGDBRZudBIGZ2OGt7MtWUyrTEF9TmiYJsLE 03U0nCzl7fqQGdFUB7B9A== X-UI-Out-Filterresults: notjunk:1;V03:K0:nTEvyZwzMpw=:fwJ5z/5Ytnrrwd+wVQwLPE jMwFR0vJLKXHP9pqqgcqXs6ieFqhiLp7VS69aEM2NFa7LG+Q7pPhLVYBfALSrOLJe4t2mP/6k 69VhQCLMkO5jeUrv35ThBSnxYt1fepiHiKGv0bLHbrmGKV4X71q+MaHpdTUqr6BfZcRs+7Kpb v7ONFgZLJnEAK6zQjpjsYhIIYzaYNx0sWZS9L/1VugWZzHK/UeCtq/DAMPpTbuOCZFKvI1lQI rI1SVHeg3Ne9b5cplyE1goWu5Me1XEVyh9ZMUf19tnuNmBdKW29Ih9Bm0Zx2C+9VkGulVP/7N rGCFcxerYKUjLisBNApzvKiSHbCIM1rcFM0Sq43eIW1a7jYO0OqIQE/+CV9kADPgLM73Tyl0R qP0SQ7TDkYZRHW3DTxLeJDlJORZW3QJv6NyrxiDVjd/coNKWAB9kzwonOoKOl2J1FJu8axsxd dNorR6vdCCsZA3L6k9IhhA3pRuIDZ8YYvd0aWP6lbXOxRCHuljFbB+SfVq7u24E2WIJmt0HFw 0rlIvIssGxx5tnX8g/Swq075oMgiIl1nmeql0sn+echQhanLRG/DRkvqM3EnXdkAZe0tmmF9A xpJ6Q2S77Usl+URDGv9D7vMUlOHSyQO73ygemUQX2c1mOFybkgJhtzAwzm7EEiNI/R35Sss91 xpK7qwLMI/gjaeBoChATXLOAyzhOOAvr7yolazqpzMTRudFbVc2Jl2zgHp36t6dELzCOteeMA GbfkZdRifd6zjd+ghPZK6w3TFuAPVgawjW3IZOvIE25uxmk9pJg+kR1Jrr5/CFBzlfmUYkFQa TlAfUoQdMA0pELQZb82uc0l1L9dv5hu65a3WWj1K9HOzNvODDHCazCkfsetZ5fiDatLo4cZS6 jpnm7d5IBtQVkxNStm4A== X-Spam-Status: No, score=-13.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Malte Skarupke via Libc-alpha From: Malte Skarupke Reply-To: Malte Skarupke Cc: Malte Skarupke , triegel@redhat.com, malteskarupke@fastmail.fm Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" There was a rare bug in pthread_cond_wait's handling of the case when a signal was stolen because a waiter took a long time to leave pthread_cond_wait. I wrote about the bug here: https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/ The bug was subtle and only happened in an edge-case of an edge-case so rather than fixing it, I decided to remove the outer edge-case: By broadening the scope of grefs, stealing of signals becomes impossible. A signaling thread will always wait for all waiters to leave pthread_cond_wait before closing a group, so now no waiter from the past can come back and steal a signal from a future group. This change is the minimal amount of changes necessary to fix the bug. This leads to slightly slower performance, but the next two patches in this series will undo most of that damage. --- nptl/pthread_cond_wait.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-) -- 2.17.1 diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c index 02d11c61db..0f50048c0b 100644 --- a/nptl/pthread_cond_wait.c +++ b/nptl/pthread_cond_wait.c @@ -405,6 +405,10 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, unsigned int g = wseq & 1; uint64_t seq = wseq >> 1; + /* Acquire a group reference and use acquire MO for that so that we + synchronize with the dummy read-modify-write in + __condvar_quiesce_and_switch_g1 if we read from that. */ + atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); /* Increase the waiter reference count. Relaxed MO is sufficient because we only need to synchronize when decrementing the reference count. */ unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8); @@ -422,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, { __condvar_cancel_waiting (cond, seq, g, private); __condvar_confirm_wakeup (cond, private); + __condvar_dec_grefs (cond, g, private); return err; } @@ -471,24 +476,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, break; /* No signals available after spinning, so prepare to block. - We first acquire a group reference and use acquire MO for that so - that we synchronize with the dummy read-modify-write in - __condvar_quiesce_and_switch_g1 if we read from that. In turn, - in this case this will make us see the closed flag on __g_signals - that designates a concurrent attempt to reuse the group's slot. - We use acquire MO for the __g_signals check to make the - __g1_start check work (see spinning above). - Note that the group reference acquisition will not mask the - release MO when decrementing the reference count because we use - an atomic read-modify-write operation and thus extend the release - sequence. */ - atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); + First check the closed flag on __g_signals that designates a + concurrent attempt to reuse the group's slot. We use acquire MO for + the __g_signals check to make the __g1_start check work (see + spinning above). */ if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0) || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))) { - /* Our group is closed. Wake up any signalers that might be - waiting. */ - __condvar_dec_grefs (cond, g, private); + /* Our group is closed. */ goto done; } @@ -508,7 +503,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW)) { - __condvar_dec_grefs (cond, g, private); /* If we timed out, we effectively cancel waiting. Note that we have decremented __g_refs before cancellation, so that a deadlock between waiting for quiescence of our group in @@ -518,8 +512,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, result = err; goto done; } - else - __condvar_dec_grefs (cond, g, private); /* Reload signals. See above for MO. */ signals = atomic_load_acquire (cond->__data.__g_signals + g); @@ -602,6 +594,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, to allow for execution of pthread_cond_destroy while having acquired the mutex. */ __condvar_confirm_wakeup (cond, private); + __condvar_dec_grefs (cond, g, private); /* Woken up; now re-acquire the mutex. If this doesn't fail, return RESULT, which is set to ETIMEDOUT if a timeout occured, or zero otherwise. */