From patchwork Wed Nov 10 00:16:11 2021
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 47343
To: libc-alpha@sourceware.org
Subject: [PATCH v4 0/3] Optimize CAS [BZ #28537]
Date: Tue, 9 Nov 2021 16:16:11 -0800
Message-Id: <20211110001614.2087610-1-hjl.tools@gmail.com>
From: "H.J. Lu"
Cc: Florian Weimer, Arjan van de Ven, Andreas Schwab

A CAS instruction is expensive.  From the x86 CPU's point of view,
getting a cache line for writing is more expensive than reading.
See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

A full compare-and-swap grabs the cache line in exclusive state and
causes excessive cache line bouncing.

Optimize CAS in low level locks and pthread_mutex_lock.c:

1. Do an atomic load first and skip the CAS if the compare would fail,
   to reduce cache line bouncing on contended locks.
2. Replace atomic_compare_and_exchange_bool_acq with
   atomic_compare_and_exchange_val_acq to avoid an extra load.
3. Drop __glibc_unlikely in __lll_trylock and lll_cond_trylock, since
   we don't know whether failure is actually rare; in the contended
   case it clearly is not.

This is the first patch set to optimize CAS.  I will investigate the
remaining CAS usages in glibc after this patch set has been accepted.

H.J. Lu (3):
  Reduce CAS in low level locks [BZ #28537]
  Reduce CAS in __pthread_mutex_lock_full [BZ #28537]
  Optimize CAS in __pthread_mutex_lock_full [BZ #28537]

 nptl/lowlevellock.c         | 12 ++++-----
 nptl/pthread_mutex_lock.c   | 53 ++++++++++++++++++++++++++++--------
 sysdeps/nptl/lowlevellock.h | 29 +++++++++++++-------
 3 files changed, 67 insertions(+), 27 deletions(-)