From patchwork Wed Nov 10 00:16:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 47345
To: libc-alpha@sourceware.org
Subject: [PATCH v4 1/3] Reduce CAS in low level locks [BZ #28537]
Date: Tue, 9 Nov 2021 16:16:12 -0800
Message-Id: <20211110001614.2087610-2-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.33.1
In-Reply-To: <20211110001614.2087610-1-hjl.tools@gmail.com>
References: <20211110001614.2087610-1-hjl.tools@gmail.com>
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: "H.J. Lu via Libc-alpha"
From: "H.J. Lu"
Reply-To: "H.J. Lu"
Cc: Florian Weimer, Arjan van de Ven, Andreas Schwab
Sender: "Libc-alpha"

CAS instruction is expensive.
From the x86 CPU's point of view, getting a cache line for writing is
more expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap will acquire the cache line in exclusive state
and cause excessive cache line bouncing.

1. Change low level locks to do an atomic load first and skip the CAS
when the compare would fail, reducing cache line bouncing on contended
locks.
2. In __lll_lock, replace atomic_compare_and_exchange_bool_acq with
atomic_compare_and_exchange_val_acq and pass the result down to
__lll_lock_wait and __lll_lock_wait_private to avoid the redundant load
there.
3. Drop __glibc_unlikely in __lll_trylock and lll_cond_trylock since we
don't know if failure is actually rare; in the contended case it is
clearly not rare.
---
 nptl/lowlevellock.c         | 12 ++++++------
 sysdeps/nptl/lowlevellock.h | 29 ++++++++++++++++++++---------
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/nptl/lowlevellock.c b/nptl/lowlevellock.c
index 8c740529c4..d1965c01ca 100644
--- a/nptl/lowlevellock.c
+++ b/nptl/lowlevellock.c
@@ -22,30 +22,30 @@
 #include
 
 void
-__lll_lock_wait_private (int *futex)
+__lll_lock_wait_private (int *futex, int futex_value)
 {
-  if (atomic_load_relaxed (futex) == 2)
+  if (futex_value == 2)
     goto futex;
 
   while (atomic_exchange_acquire (futex, 2) != 0)
     {
     futex:
-      LIBC_PROBE (lll_lock_wait_private, 1, futex);
+      LIBC_PROBE (lll_lock_wait_private, 2, futex, futex_value);
       futex_wait ((unsigned int *) futex, 2, LLL_PRIVATE); /* Wait if *futex == 2.  */
     }
 }
 libc_hidden_def (__lll_lock_wait_private)
 
 void
-__lll_lock_wait (int *futex, int private)
+__lll_lock_wait (int *futex, int futex_value, int private)
 {
-  if (atomic_load_relaxed (futex) == 2)
+  if (futex_value == 2)
     goto futex;
 
   while (atomic_exchange_acquire (futex, 2) != 0)
     {
     futex:
-      LIBC_PROBE (lll_lock_wait, 1, futex);
+      LIBC_PROBE (lll_lock_wait, 2, futex, futex_value);
       futex_wait ((unsigned int *) futex, 2, private); /* Wait if *futex == 2.  */
     }
 }
diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 4d95114ed3..4235d13de9 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -66,7 +66,11 @@
    0.  Otherwise leave lock unchanged and return non-zero to indicate that the
    lock was not acquired.  */
 #define __lll_trylock(lock)	\
-  __glibc_unlikely (atomic_compare_and_exchange_bool_acq ((lock), 1, 0))
+  (__extension__ ({                                                    \
+    __typeof (*(lock)) __lock_value = atomic_load_relaxed (lock);      \
+    (__lock_value != 0                                                 \
+     || atomic_compare_and_exchange_bool_acq ((lock), 1, 0));          \
+  }))
 #define lll_trylock(lock)	\
    __lll_trylock (&(lock))
 
@@ -74,11 +78,15 @@
    return 0.  Otherwise leave lock unchanged and return non-zero to indicate
    that the lock was not acquired.  */
 #define lll_cond_trylock(lock)	\
-  __glibc_unlikely (atomic_compare_and_exchange_bool_acq (&(lock), 2, 0))
+  (__extension__ ({                                                    \
+    __typeof (lock) __lock_value = atomic_load_relaxed (&(lock));      \
+    (__lock_value != 0                                                 \
+     || atomic_compare_and_exchange_bool_acq (&(lock), 2, 0));         \
+  }))
 
-extern void __lll_lock_wait_private (int *futex);
+extern void __lll_lock_wait_private (int *futex, int futex_value);
 libc_hidden_proto (__lll_lock_wait_private)
-extern void __lll_lock_wait (int *futex, int private);
+extern void __lll_lock_wait (int *futex, int futex_value, int private);
 libc_hidden_proto (__lll_lock_wait)
 
 /* This is an expression rather than a statement even though its value is
@@ -95,13 +103,15 @@ libc_hidden_proto (__lll_lock_wait)
   ((void)                                                               \
    ({                                                                   \
      int *__futex = (futex);                                            \
-     if (__glibc_unlikely                                               \
-         (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))        \
+     int __futex_value = atomic_load_relaxed (__futex);                 \
+     if (__futex_value != 0                                             \
+         || ((__futex_value = atomic_compare_and_exchange_val_acq       \
+              (__futex, 1, 0)) != 0))                                   \
        {                                                                \
          if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
-           __lll_lock_wait_private (__futex);                           \
+           __lll_lock_wait_private (__futex, __futex_value);            \
          else                                                           \
-           __lll_lock_wait (__futex, private);                          \
+           __lll_lock_wait (__futex, __futex_value, private);           \
        }                                                                \
    }))
 #define lll_lock(futex, private)	\
@@ -120,7 +130,8 @@ libc_hidden_proto (__lll_lock_wait)
    ({                                                                   \
      int *__futex = (futex);                                            \
      if (__glibc_unlikely (atomic_exchange_acq (__futex, 2) != 0))      \
-       __lll_lock_wait (__futex, private);                              \
+       __lll_lock_wait (__futex, atomic_load_relaxed (__futex),         \
+                        private);                                       \
    }))
 #define lll_cond_lock(futex, private) __lll_cond_lock (&(futex), private)
 
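
For readers without the glibc tree at hand, the load-before-CAS pattern
from items 1 and 2 of the commit message can be sketched in portable C11
roughly as follows.  This is only an illustration under stated
assumptions: sketch_trylock and sketch_lock_fast_path are hypothetical
names, C11 <stdatomic.h> stands in for glibc's internal atomic macros,
and the futex wait path is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for __lll_trylock (not glibc code).
   Returns 0 if the lock was acquired, non-zero otherwise.  */
static int
sketch_trylock (atomic_int *lock)
{
  /* Read-only check first: a plain load does not need the cache line
     in exclusive state, so a held lock is rejected without a CAS.  */
  if (atomic_load_explicit (lock, memory_order_relaxed) != 0)
    return 1;

  int expected = 0;
  /* Only attempt the CAS when the lock looked free.  */
  return !atomic_compare_exchange_strong_explicit (lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Hypothetical stand-in for the fast path of __lll_lock (not glibc code).
   Returns 0 if the lock was acquired; otherwise returns the lock value
   that was observed, which a caller could hand to its wait routine
   instead of loading the lock word again.  */
static int
sketch_lock_fast_path (atomic_int *lock)
{
  int observed = atomic_load_explicit (lock, memory_order_relaxed);
  if (observed != 0)
    return observed;            /* Contended: skip the CAS.  */

  int expected = 0;
  if (atomic_compare_exchange_strong_explicit (lock, &expected, 1,
                                               memory_order_acquire,
                                               memory_order_relaxed))
    return 0;                   /* Acquired.  */

  /* On failure the CAS stored the current lock value in EXPECTED.  */
  return expected;
}
```

A caller would treat a non-zero result from sketch_lock_fast_path the
way __lll_lock treats __futex_value: a value of 2 means waiters are
already recorded, so the wait routine can go straight to the futex wait.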