From patchwork Thu Nov  4 16:14:43 2021
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 47061
To: libc-alpha@sourceware.org
Subject: [PATCH v3] x86: Optimize atomic_compare_and_exchange_[val|bool]_acq [BZ #28537]
Date: Thu, 4 Nov 2021 09:14:43 -0700
Message-Id: <20211104161443.734681-1-hjl.tools@gmail.com>
From: "H.J. Lu"
Cc: Florian Weimer, Hongyu Wang, Andreas Schwab, liuhongt, Arjan van de Ven

From the CPU's point of view, getting a cache line for writing is more
expensive than getting it for reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap grabs the cache line exclusively and causes
excessive cache line bouncing.  Load the current memory value through a
volatile pointer first; this load is atomic and won't be optimized out by
the compiler.  Check the loaded value and return immediately if the
compare and swap would fail anyway, so the cache line is not requested
for writing and cache line bouncing on contended locks is reduced.

This fixes BZ #28537.

A GCC bug has been opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065

A fixed compiler should define __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to
indicate that it generates the check with the volatile load itself.  glibc
can then test __HAVE_SYNC_COMPARE_AND_SWAP_LOAD_CHECK to avoid the extra
volatile load.
---
 sysdeps/x86/atomic-machine.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/atomic-machine.h b/sysdeps/x86/atomic-machine.h
index 2692d94a92..597dc1cf92 100644
--- a/sysdeps/x86/atomic-machine.h
+++ b/sysdeps/x86/atomic-machine.h
@@ -73,9 +73,20 @@ typedef uintmax_t uatomic_max_t;
 #define ATOMIC_EXCHANGE_USES_CAS	0
 
 #define atomic_compare_and_exchange_val_acq(mem, newval, oldval) \
-  __sync_val_compare_and_swap (mem, oldval, newval)
+  ({ volatile __typeof (*(mem)) *memp = (mem);			\
+     __typeof (*(mem)) oldmem = *memp, ret;			\
+     ret = (oldmem == (oldval)					\
+	    ? __sync_val_compare_and_swap (mem, oldval, newval)	\
+	    : oldmem);						\
+     ret; })
 #define atomic_compare_and_exchange_bool_acq(mem, newval, oldval) \
-  (! __sync_bool_compare_and_swap (mem, oldval, newval))
+  ({ volatile __typeof (*(mem)) *memp = (mem);			\
+     __typeof (*(mem)) oldmem = *memp;				\
+     int ret;							\
+     ret = (oldmem == (oldval)					\
+	    ? !__sync_bool_compare_and_swap (mem, oldval, newval) \
+	    : 1);						\
+     ret; })
 
 #define __arch_c_compare_and_exchange_val_8_acq(mem, newval, oldval) \
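
(Not part of the patch.)  As a standalone illustration of the
load-check-before-CAS pattern the new macros implement, here is a minimal
C sketch; the function and variable names are made up for this example,
and it assumes a compiler that provides the GCC __sync builtins:

/* Illustrative only: return the value observed at *mem; perform the
   locked compare-and-swap only when it can still succeed.  */
static inline int
cas_val_acq_load_check (volatile int *mem, int newval, int oldval)
{
  /* Aligned int load is atomic on x86; the volatile qualifier keeps
     the compiler from caching or eliminating the read.  */
  int cur = *mem;

  if (cur == oldval)
    /* The value still matches: request the cache line for writing
       and attempt the real compare-and-swap.  */
    return __sync_val_compare_and_swap (mem, oldval, newval);

  /* The value already changed: the CAS would fail, so skip the write
     request and avoid bouncing the cache line.  */
  return cur;
}

In a contended spinlock loop this behaves like test-and-test-and-set:
waiters spin on plain reads, and only a thread that observes the expected
value issues the locked instruction.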