From patchwork Wed Nov 10 18:41:51 2021
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 47421
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Cc: Florian Weimer, Andreas Schwab, "Paul A. Clarke", Arjan van de Ven
Subject: [PATCH v5 1/3] Reduce CAS in low level locks [BZ #28537]
Date: Wed, 10 Nov 2021 10:41:51 -0800
Message-Id: <20211110184153.2269857-2-hjl.tools@gmail.com>
In-Reply-To: <20211110184153.2269857-1-hjl.tools@gmail.com>
References: <20211110184153.2269857-1-hjl.tools@gmail.com>

The CAS instruction is expensive.
From the x86 CPU's point of view, getting a cache line for writing is
more expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

A full compare-and-swap grabs the cache line exclusively and causes
excessive cache line bouncing.

1. Change the low level locks to do an atomic load first and skip the
   CAS if the compare may fail, to reduce cache line bouncing on
   contended locks.

2. In __lll_lock, replace atomic_compare_and_exchange_bool_acq with
   atomic_compare_and_exchange_val_acq and pass the result down to
   __lll_lock_wait and __lll_lock_wait_private to avoid a redundant
   load there.
---
 nptl/lowlevellock.c         | 12 ++++++------
 sysdeps/nptl/lowlevellock.h | 33 +++++++++++++++++++++++----------
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/nptl/lowlevellock.c b/nptl/lowlevellock.c
index 8c740529c4..d1965c01ca 100644
--- a/nptl/lowlevellock.c
+++ b/nptl/lowlevellock.c
@@ -22,30 +22,30 @@
 #include

 void
-__lll_lock_wait_private (int *futex)
+__lll_lock_wait_private (int *futex, int futex_value)
 {
-  if (atomic_load_relaxed (futex) == 2)
+  if (futex_value == 2)
     goto futex;

   while (atomic_exchange_acquire (futex, 2) != 0)
     {
     futex:
-      LIBC_PROBE (lll_lock_wait_private, 1, futex);
+      LIBC_PROBE (lll_lock_wait_private, 2, futex, futex_value);
       futex_wait ((unsigned int *) futex, 2, LLL_PRIVATE); /* Wait if *futex == 2.  */
     }
 }
 libc_hidden_def (__lll_lock_wait_private)

 void
-__lll_lock_wait (int *futex, int private)
+__lll_lock_wait (int *futex, int futex_value, int private)
 {
-  if (atomic_load_relaxed (futex) == 2)
+  if (futex_value == 2)
     goto futex;

   while (atomic_exchange_acquire (futex, 2) != 0)
     {
     futex:
-      LIBC_PROBE (lll_lock_wait, 1, futex);
+      LIBC_PROBE (lll_lock_wait, 2, futex, futex_value);
       futex_wait ((unsigned int *) futex, 2, private); /* Wait if *futex == 2.  */
     }
 }
diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 4d95114ed3..05260eb706 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -66,7 +66,12 @@
    0.  Otherwise leave lock unchanged and return non-zero to indicate that
    the lock was not acquired.  */
 #define __lll_trylock(lock)                                             \
-  __glibc_unlikely (atomic_compare_and_exchange_bool_acq ((lock), 1, 0))
+  (__extension__ ({                                                     \
+    __typeof (*(lock)) __lock_value = atomic_load_relaxed (lock);       \
+    (__lock_value != 0                                                  \
+     || __glibc_unlikely (atomic_compare_and_exchange_bool_acq ((lock), \
+                                                                1, 0)));\
+  }))
 #define lll_trylock(lock)                                               \
   __lll_trylock (&(lock))

@@ -74,11 +79,16 @@
    return 0.  Otherwise leave lock unchanged and return non-zero to indicate
    that the lock was not acquired.  */
 #define lll_cond_trylock(lock)                                          \
-  __glibc_unlikely (atomic_compare_and_exchange_bool_acq (&(lock), 2, 0))
-
-extern void __lll_lock_wait_private (int *futex);
+  (__extension__ ({                                                     \
+    __typeof (lock) __lock_value = atomic_load_relaxed (&(lock));       \
+    (__lock_value != 0                                                  \
+     || __glibc_unlikely (atomic_compare_and_exchange_bool_acq (&(lock),\
+                                                                2, 0)));\
+  }))
+
+extern void __lll_lock_wait_private (int *futex, int futex_value);
 libc_hidden_proto (__lll_lock_wait_private)
-extern void __lll_lock_wait (int *futex, int private);
+extern void __lll_lock_wait (int *futex, int futex_value, int private);
 libc_hidden_proto (__lll_lock_wait)

 /* This is an expression rather than a statement even though its value is
@@ -95,13 +105,15 @@ libc_hidden_proto (__lll_lock_wait)
   ((void)                                                               \
    ({                                                                   \
      int *__futex = (futex);                                            \
-     if (__glibc_unlikely                                               \
-         (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))        \
+     int __futex_value = atomic_load_relaxed (futex);                   \
+     if (__futex_value != 0                                             \
+         || ((__futex_value = atomic_compare_and_exchange_val_acq       \
+                              (__futex, 1, 0) != 0)))                   \
       {                                                                 \
         if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
-          __lll_lock_wait_private (__futex);                            \
+          __lll_lock_wait_private (futex, __futex_value);               \
         else                                                            \
-          __lll_lock_wait (__futex, private);                           \
+          __lll_lock_wait (futex, __futex_value, private);              \
       }                                                                 \
    }))
 #define lll_lock(futex, private)                                        \
@@ -120,7 +132,8 @@ libc_hidden_proto (__lll_lock_wait)
    ({                                                                   \
      int *__futex = (futex);                                            \
      if (__glibc_unlikely (atomic_exchange_acq (__futex, 2) != 0))      \
-       __lll_lock_wait (__futex, private);                              \
+       __lll_lock_wait (__futex, atomic_load_relaxed (__futex),         \
+                        private);                                       \
    }))
 #define lll_cond_lock(futex, private) __lll_cond_lock (&(futex), private)