From patchwork Wed Sep 8 02:52:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Malte Skarupke X-Patchwork-Id: 44900 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 646E83851C0B for ; Wed, 8 Sep 2021 02:56:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 646E83851C0B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1631069810; bh=R3032T+wa9ZcghJ639SatfoLidRloSyQu8LF79G33zo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=ViqpsNNALYcxxhRxz8dj7JxRtmN+heKEtCY0Q5PfYVzJjiyzUjEXzenJH0T08F5sm ynSG1CAyWJpUUV+wpRqDma3w/iMrHyE+dNPkWk2/Dn5OHxfSGCr7vTZ3RWYh8GK9tv Wbs2z/9kBNSqEhM4DTlzmRYR316SLMirL7ltjDNw= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mout.web.de (mout.web.de [212.227.17.11]) by sourceware.org (Postfix) with ESMTPS id 5B317385E82E for ; Wed, 8 Sep 2021 02:54:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5B317385E82E X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from auth2-smtp.messagingengine.com ([66.111.4.228]) by smtp.web.de (mrweb106 [213.165.67.124]) with ESMTPSA (Nemesis) id 1MJFdB-1me6SU3f35-00KO90; Wed, 08 Sep 2021 04:54:14 +0200 Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailauth.nyi.internal (Postfix) with ESMTP id 7FBDB27C005B; Tue, 7 Sep 2021 22:54:12 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Tue, 07 Sep 2021 22:54:12 -0400 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrudefiedgieduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepofgrlhhtvgcuufhkrghruhhpkhgvuceomhgrlhhtvghskhgr rhhuphhkvgesfigvsgdruggvqeenucggtffrrghtthgvrhhnpefggeelleeihfefleetie eludejkefgjefhueeitdeltdeikefgvdffjedtieffteenucffohhmrghinhepphhrohgs rggslhihuggrnhgtvgdrtghomhenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpehmrghlthgvshhkrghruhhpkhgvodhmvghsmhhtphgruhhthhhp vghrshhonhgrlhhithihqddutddujedtfedvleeiqdduuddvgedvkeeiledqmhgrlhhtvg hskhgrrhhuphhkvgeppeifvggsrdguvgesfhgrshhtmhgrihhlrdhfmh X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 7 Sep 2021 22:54:11 -0400 (EDT) To: libc-alpha@sourceware.org Subject: [PATCH v2 1/6] nptl: Fix pthread_cond_signal missing a sleeper (#BZ 25847) Date: Tue, 7 Sep 2021 22:52:34 -0400 Message-Id: <20210908025239.1326480-2-malteskarupke@web.de> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210908025239.1326480-1-malteskarupke@web.de> References: <20210908025239.1326480-1-malteskarupke@web.de> MIME-Version: 1.0 X-Provags-ID: V03:K1:sBGuHLRzlzZVfr0ZhO8TlXDMBy34pMaW0m2qt19YGradlVyCB5e GQhCv2/z3o1mZ3px/vUb0HjZ7h/E0AMid93F2L2OQjK5Q+CdUZxL+hiI5qCzYNp1eP4iwgy Gux3ddfd9qt7KmJDLIK4z1ktLg6eCOi/dHDvBTmoczwNhJMRd5rr0z2g3f+xl7sxuYJC+sn dlt3WPx/zxRAobuMT6I2Q== X-UI-Out-Filterresults: notjunk:1;V03:K0:7t6BBYxn17U=:AWLphKXYuEOpEb82kd5RJA WUziCZ9n27fKkOyy6Yinv87dRHB81U6gQw+wp2Gj+SUpYXKn7ZH/YkeO8zOx+sV/5nAE/TY3D SntT5g1BZq2/y5I+8Uvr3dxQdj9J1+Qz6Pbb1CIZCEKd31VZHcatqrGo8G0f9zZoHwlqcrlfd P2RSboTjRaRVjZFUA911BITMxPH85w9v+2zQfTCe9IvTTkazc0/w7JPe8hU9cB1jKSHzLtnPr ti5+RBE5Nfu2PYmraq+5rQ1hRKVNEoJnQzcsOdHoTWjfctxiyy8EqqJqXPewGzzOvTxph6s0/ ZpZezW9Pg++ubHGpXzdkVIyiWlu/7FYDDOwK1GBbxZ52Oe9qLG3IUz1kF296EiIjowrm1CnIR ONwayQlP8dCTJoEaf/ciiUOU1N551n2Kg7uX2QuaWLPnq5k0ZbA3EUrovG+3/Z/8OKjuRIdHZ 65RC23nJReLhVlORa3Y9t+OTFLBnnMUJjDBtKFFdl+zSFsoWiwuj0WO2EyK7IFj8gZXcxS3RZ UGiNN07jbOy9whqrj3dwRZyZeLh/QrbvQsUF6LFPTvmcrZLgV3Y1jOQ4vU3C6he5ijVcIxt0O A+owQU2I67zds12o7XIfdT8trj3jpNRVzJ7KMEcOJIoAnQDWl59U70PSyGHopE8ErUsdCD1k3 vhq4hu3t0pkhUDm2pR6jQtCrN+9Xt1MAt/yNQhdRPg0daMayBw0pNK5Gj/yrt7M/Nd2vaA6kR hFbc07U6oX5MAMhLCcpd/mnwO+yKKEdujD6Zi6rgX9KXtmnIg6WjcLR+mc8w6IcOZlqE6YYL3 X3UpoXo1A7u4mkFXpl4WEaHBQv5QHHzWAz8ovYZbce1jUIzwvCECBG20szXoX7vhCgV2GobgC KPp/8W2RYO9uQHZy4q4jDyeENg+RNMcst8Pm2CHvwyA7EAOalp2Z/O8HfZcaxytraI/lnMnpY 63Sj4M52uo3OJRQZMUb0qkGjmEjjLnhWTKXQtWApHrilzh8E+fR2aHSuD+KM+l9nUoRC9A7WO vvq+4PQVwm0ANC85Yx6/fQ5hOqzCFmdxDq/SC1/igX2/9Fyjl/xrM84gCTyWTUGPnjN52D5Bt zbUzQ7ko1IMU9w= X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Malte Skarupke via Libc-alpha From: Malte Skarupke Reply-To: Malte Skarupke Cc: Malte Skarupke , triegel@redhat.com, malteskarupke@fastmail.fm Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" There was a rare bug in pthread_cond_wait's handling of the case when a signal was stolen because a waiter took a long time to leave pthread_cond_wait. I wrote about the bug here: https://probablydance.com/2020/10/31/using-tla-in-the-real-world-to-understand-a-glibc-bug/ The bug was subtle and only happened in an edge-case of an edge-case so rather than fixing it, I decided to remove the outer edge-case: By broadening the scope of grefs, stealing of signals becomes impossible. A signaling thread will always wait for all waiters to leave pthread_cond_wait before closing a group, so now no waiter from the past can come back and steal a signal from a future group. This is patch 1/6, it contains the minimal amount of changes necessary to fix the bug. This leads to an unnecessary amount of atomic operations, but the other patches in this series will undo most of that damage. --- nptl/pthread_cond_wait.c | 39 ++++++++++++++++++--------------------- 1 file changed, 18 insertions(+), 21 deletions(-) -- 2.25.1 diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c index dc8c511f1a..cb674e2634 100644 --- a/nptl/pthread_cond_wait.c +++ b/nptl/pthread_cond_wait.c @@ -409,6 +409,12 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, we only need to synchronize when decrementing the reference count. */ unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8); int private = __condvar_get_private (flags); + /* Acquire a group reference and use acquire MO for that so that we + synchronize with the dummy read-modify-write in + __condvar_quiesce_and_switch_g1 if we read from the same group. This will + make us see the closed flag on __g_signals that designates a concurrent + attempt to reuse the group's slot. */ + atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); /* Now that we are registered as a waiter, we can release the mutex. Waiting on the condvar must be atomic with releasing the mutex, so if @@ -420,6 +426,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, err = __pthread_mutex_unlock_usercnt (mutex, 0); if (__glibc_unlikely (err != 0)) { + __condvar_dec_grefs (cond, g, private); __condvar_cancel_waiting (cond, seq, g, private); __condvar_confirm_wakeup (cond, private); return err; @@ -471,24 +478,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, break; /* No signals available after spinning, so prepare to block. - We first acquire a group reference and use acquire MO for that so - that we synchronize with the dummy read-modify-write in - __condvar_quiesce_and_switch_g1 if we read from that. In turn, - in this case this will make us see the closed flag on __g_signals - that designates a concurrent attempt to reuse the group's slot. - We use acquire MO for the __g_signals check to make the - __g1_start check work (see spinning above). - Note that the group reference acquisition will not mask the - release MO when decrementing the reference count because we use - an atomic read-modify-write operation and thus extend the release - sequence. */ - atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); + First check the closed flag on __g_signals that designates a + concurrent attempt to reuse the group's slot. We use acquire MO for + the __g_signals check to make the __g1_start check work (see + above). */ if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0) || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))) { - /* Our group is closed. Wake up any signalers that might be - waiting. */ - __condvar_dec_grefs (cond, g, private); + /* Our group is closed. */ goto done; } @@ -516,10 +513,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, the lock during cancellation is not possible. */ __condvar_cancel_waiting (cond, seq, g, private); result = err; - goto done; + goto acquire_lock; } - else - __condvar_dec_grefs (cond, g, private); /* Reload signals. See above for MO. */ signals = atomic_load_acquire (cond->__data.__g_signals + g); @@ -598,11 +593,13 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, done: - /* Confirm that we have been woken. We do that before acquiring the mutex - to allow for execution of pthread_cond_destroy while having acquired the - mutex. */ + /* Decrement group reference count and confirm that we have been woken. We do + that before acquiring the mutex to allow for execution of + pthread_cond_destroy while having acquired the mutex. */ + __condvar_dec_grefs (cond, g, private); __condvar_confirm_wakeup (cond, private); + acquire_lock: /* Woken up; now re-acquire the mutex. If this doesn't fail, return RESULT, which is set to ETIMEDOUT if a timeout occured, or zero otherwise. */ err = __pthread_mutex_cond_lock (mutex);