Message ID | 20221115033559.66827-1-hongyu.wang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7DF1A3851889 for <patchwork@sourceware.org>; Tue, 15 Nov 2022 03:36:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7DF1A3851889 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1668483393; bh=T05Xuwt232FoQ6i+rOWO5TNHBCHi0Rjh2KhTIUqBpso=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=ZwAHdso0KVaq5dBG0GMkFv2cNV0t0gYz4MescamkpGsTtulZJdg283RQraKIIEUTN 4t0JpxfF6gg4eeKJPpAvCJ4394TbIWdi1Waszyc6QPq9wYalM7haednEEY1t56W4+r ZkeR9KcZ/rJiERjTW2yMw29jePYPvWunmqcqHSYg= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by sourceware.org (Postfix) with ESMTPS id BA9BD38582A1 for <gcc-patches@gcc.gnu.org>; Tue, 15 Nov 2022 03:36:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA9BD38582A1 X-IronPort-AV: E=McAfee;i="6500,9779,10531"; a="292545519" X-IronPort-AV: E=Sophos;i="5.96,164,1665471600"; d="scan'208";a="292545519" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Nov 2022 19:36:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10531"; a="781183963" X-IronPort-AV: E=Sophos;i="5.96,164,1665471600"; d="scan'208";a="781183963" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga001.fm.intel.com with ESMTP; 14 Nov 2022 19:36:00 -0800 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 9EDF410056B2; Tue, 15 Nov 2022 11:35:59 +0800 (CST) To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com Subject: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676] Date: Tue, 15 Nov 2022 11:35:59 +0800 Message-Id: <20221115033559.66827-1-hongyu.wang@intel.com> X-Mailer: git-send-email 2.18.1 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, SPF_HELO_NONE, SPF_SOFTFAIL, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Hongyu Wang via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Hongyu Wang <hongyu.wang@intel.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]
|
|
Commit Message
Hongyu Wang
Nov. 15, 2022, 3:35 a.m. UTC
Hi, According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical. Adjust the wording according to the comments. Bootstrapped on x86_64-pc-linux-gnu, ok for trunk? gcc/ChangeLog: PR target/107676 * doc/invoke.texi: Reword the description of -mrelax-cmpxchg-loop. --- gcc/doc/invoke.texi | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
Comments
On 15/11/22 11:35 +0800, Hongyu Wang wrote: >Hi, > >According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical. >Adjust the wording according to the comments. > >Bootstrapped on x86_64-pc-linux-gnu, ok for trunk? > >gcc/ChangeLog: > > PR target/107676 > * doc/invoke.texi: Reword the description of > -mrelax-cmpxchg-loop. >--- > gcc/doc/invoke.texi | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > >diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >index 40f667a630a..bdd7c319aef 100644 >--- a/gcc/doc/invoke.texi >+++ b/gcc/doc/invoke.texi >@@ -33805,10 +33805,12 @@ registers. > > @item -mrelax-cmpxchg-loop > @opindex mrelax-cmpxchg-loop >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg, >-execute pause if load value is not expected. This reduces excessive >-cachline bouncing when and works for all atomic logic fetch builtins >-that generates compare and swap loop. >+For compare and swap loops that emitted by some __atomic_* builtins s/that emitted/that are emitted/ >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch >+counterparts), emit an atomic load before cmpxchg instruction. If the s/before cmpxchg/before the cmpxchg/ >+loaded value is not equal to expected, execute a pause instead of s/not equal to expected/not equal to the expected/ >+directly run the cmpxchg instruction. This might reduce excessive s/directly run/directly running/ >+cacheline bouncing. > > @item -mindirect-branch=@var{choice} > @opindex mindirect-branch
On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote: > > @item -mrelax-cmpxchg-loop > > @opindex mrelax-cmpxchg-loop > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg, > >-execute pause if load value is not expected. This reduces excessive > >-cachline bouncing when and works for all atomic logic fetch builtins > >-that generates compare and swap loop. > >+For compare and swap loops that emitted by some __atomic_* builtins > > s/that emitted/that are emitted/ > > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch > >+counterparts), emit an atomic load before cmpxchg instruction. If the > > s/before cmpxchg/before the cmpxchg/ > > >+loaded value is not equal to expected, execute a pause instead of > > s/not equal to expected/not equal to the expected/ > > >+directly run the cmpxchg instruction. This might reduce excessive > > s/directly run/directly running/ This results in "... execute a pause instead of directly running the cmpxchg instruction", which needs further TLC because: * 'a pause' should be 'the PAUSE instruction'; * 'directly running [an instruction]' does not seem correct in context. The option also applies to __sync builtins, not just __atomic. How about the following: When emitting a compare-and-swap loop for @ref{__sync Builtins} and @ref{__atomic Builtins} lacking a native instruction, optimize for the highly contended case by issuing an atomic load before the @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction when restarting the loop. Alexander
On Tue, 15 Nov 2022 at 13:34, Alexander Monakov <amonakov@ispras.ru> wrote: > > On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote: > > > > @item -mrelax-cmpxchg-loop > > > @opindex mrelax-cmpxchg-loop > > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg, > > >-execute pause if load value is not expected. This reduces excessive > > >-cachline bouncing when and works for all atomic logic fetch builtins > > >-that generates compare and swap loop. > > >+For compare and swap loops that emitted by some __atomic_* builtins > > > > s/that emitted/that are emitted/ > > > > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch > > >+counterparts), emit an atomic load before cmpxchg instruction. If the > > > > s/before cmpxchg/before the cmpxchg/ > > > > >+loaded value is not equal to expected, execute a pause instead of > > > > s/not equal to expected/not equal to the expected/ > > > > >+directly run the cmpxchg instruction. This might reduce excessive > > > > s/directly run/directly running/ > > This results in "... execute a pause instead of directly running the > cmpxchg instruction", which needs further TLC because: > > * 'a pause' should be 'the PAUSE instruction'; > * 'directly running [an instruction]' does not seem correct in context. > > The option also applies to __sync builtins, not just __atomic. > > > How about the following: > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > and @ref{__atomic Builtins} lacking a native instruction, optimize > for the highly contended case by issuing an atomic load before the > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction > when restarting the loop. That's much better, thanks. My only remaining quibble would be that "invoking" an instruction seems only marginally better than running one. Emitting? Issuing? Using? Adding?
On Tue, 15 Nov 2022, Jonathan Wakely wrote: > > How about the following: > > > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > for the highly contended case by issuing an atomic load before the > > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction > > when restarting the loop. > > That's much better, thanks. My only remaining quibble would be that > "invoking" an instruction seems only marginally better than running > one. Emitting? Issuing? Using? Adding? Right, it should be 'using'; let me also add 'to save CPU power': When emitting a compare-and-swap loop for @ref{__sync Builtins} and @ref{__atomic Builtins} lacking a native instruction, optimize for the highly contended case by issuing an atomic load before the @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction to save CPU power when restarting the loop. Alexander
> When emitting a compare-and-swap loop for @ref{__sync Builtins} > and @ref{__atomic Builtins} lacking a native instruction, optimize > for the highly contended case by issuing an atomic load before the > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > to save CPU power when restarting the loop. Thanks for the correction, it looks quite clear now! Here is the updated patch, ok for trunk? Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org> 于2022年11月15日周二 21:59写道: > > > On Tue, 15 Nov 2022, Jonathan Wakely wrote: > > > > How about the following: > > > > > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > > for the highly contended case by issuing an atomic load before the > > > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction > > > when restarting the loop. > > > > That's much better, thanks. My only remaining quibble would be that > > "invoking" an instruction seems only marginally better than running > > one. Emitting? Issuing? Using? Adding? > > Right, it should be 'using'; let me also add 'to save CPU power': > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > and @ref{__atomic Builtins} lacking a native instruction, optimize > for the highly contended case by issuing an atomic load before the > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > to save CPU power when restarting the loop. > > Alexander
On Wed, 16 Nov 2022, Hongyu Wang wrote: > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > for the highly contended case by issuing an atomic load before the > > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > > to save CPU power when restarting the loop. > > Thanks for the correction, it looks quite clear now! Here is the > updated patch, ok for trunk? Please use 'git commit --author' to indicate authorship of the patch (or simply let me push it once approved). Jonathan, please let us know if the above wording looks fine to you? Mainly I'm asking if '... and using' or '... and use' is easier to read. Alexander
> Please use 'git commit --author' to indicate authorship of the patch > (or simply let me push it once approved). Yes, just change the author and push it. Thanks for your help!
On Wed, 16 Nov 2022 at 07:51, Alexander Monakov <amonakov@ispras.ru> wrote: > > > On Wed, 16 Nov 2022, Hongyu Wang wrote: > > > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > > for the highly contended case by issuing an atomic load before the > > > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction > > > to save CPU power when restarting the loop. > > > > Thanks for the correction, it looks quite clear now! Here is the > > updated patch, ok for trunk? > > Please use 'git commit --author' to indicate authorship of the patch > (or simply let me push it once approved). > > Jonathan, please let us know if the above wording looks fine to you? > Mainly I'm asking if '... and using' or '... and use' is easier to read. Your wording above ("and using...") looks good, it reads naturally and clearly. It's quite a long sentence, so I considered suggesting: "... by issuing an atomic load before the CMPXCHG instruction. Also use the PAUSE instruction to save CPU power when restarting the loop." But I think your original is better. The sentence is long, but it flows better as a single sentence. As two sentences, the second one just seems tacked onto the end and it's less clear how it relates to the first.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 40f667a630a..bdd7c319aef 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -33805,10 +33805,12 @@ registers. @item -mrelax-cmpxchg-loop @opindex mrelax-cmpxchg-loop -Relax cmpxchg loop by emitting an early load and compare before cmpxchg, -execute pause if load value is not expected. This reduces excessive -cachline bouncing when and works for all atomic logic fetch builtins -that generates compare and swap loop. +For compare and swap loops that emitted by some __atomic_* builtins +(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch +counterparts), emit an atomic load before cmpxchg instruction. If the +loaded value is not equal to expected, execute a pause instead of +directly run the cmpxchg instruction. This might reduce excessive +cacheline bouncing. @item -mindirect-branch=@var{choice} @opindex mindirect-branch