Message ID | YZe5llec+qA6YdtE@toto.the-meissners.org |
---|---|
Series | Add zero cycle move support | |
Message
Michael Meissner
Nov. 19, 2021, 2:49 p.m. UTC
The next set of 3 patches adds zero cycle move support to the Power10. A zero
cycle move is where the move to the LR/CTR/TAR register that is adjacent to the
jump to the LR/CTR/TAR register can be fused together with that jump.

At the moment, this set of three patches adds support for zero cycle moves for
indirect jumps and switch tables using the CTR register. Potential zero cycle
moves for doing returns are not currently handled.

In looking at the code, I discovered that just using zero cycle moves isn't as
helpful unless we can also eliminate the add instruction before doing the jump.
I also noticed that the various power10 fusion options are only done if
-mcpu=power10. I added a patch to do the fusion for -mtune=power10 as well.

I have done bootstraps and make check with these patches installed on both
little endian power9 and little endian power10 systems. Can I install these
patches?

The following patches will be posted:

1) Patch to add zero cycle move for indirect jumps and switches.

2) Patch to enable p10 fusion for -mtune=power10 in addition to -mcpu=power10.

3) Patch to use absolute addresses for switch tables instead of relative
   addresses if zero cycle fusion is enabled.
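For readers who want something concrete to look at, here is a minimal sketch of the kind of source these patches are aimed at: a dense switch statement that a compiler may lower to a switch (jump) table followed by an indirect jump. The function name `classify` and the case bodies are hypothetical, and whether GCC actually emits a jump table here depends on the target and the options used; the patches themselves were not consulted for this example.

```c
/* Hypothetical example: a dense switch.  With enough contiguous cases a
   compiler may implement this as a table lookup followed by an indirect
   jump (on PowerPC, typically a move to CTR followed by a branch to CTR),
   which is the move/jump pair the cover letter describes fusing.  */
int classify(int op, int x)
{
  switch (op)
    {
    case 0: return x + 1;
    case 1: return x - 1;
    case 2: return x * 3;
    case 3: return x / 2;
    case 4: return x ^ 0x55;
    case 5: return x + 100;
    default: return 0;
    }
}
```

Compiling such a file to assembly (for example with -S) on a PowerPC toolchain is one way to check whether the move to CTR ends up adjacent to the jump.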
Comments
Hi!

On 11/19/21 8:49 AM, Michael Meissner wrote:
> 3) Patch to use absolute addresses for switch tables instead of relative
>    addresses if zero cycle fusion is enabled.

For this last point, I had thought that the plan was to always switch over to
absolute addresses for switch tables, following the work that Hao Chen did in
this area. Am I misremembering? Hao Chen, can you please remind me where we
ended up here?

Thanks!
Bill
On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
> For this last point, I had thought that the plan was to always switch over to
> absolute addresses for switch tables, following the work that Hao Chen did in
> this area. Am I misremembering? Hao Chen, can you please remind me where we
> ended up here?

And do the absolute addressing for switch tables changes work on AIX?
I thought that Hao Chen only had done the work for PPC64 Linux ELF
syntax with promises of future changes to accommodate AIX as well.

Thanks, David
On Mon, Nov 22, 2021 at 11:09:22AM -0500, David Edelsohn wrote:
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.

In theory it should work on AIX, since the assembler has to support syntax to
load the contents of a 64-bit address in memory.

In the past, when I measured this (probably in the power8 days), the issue was
that having 64-bit loads for the switch tables, instead of 32-bit loads and an
add instruction, occasionally meant a slowdown for 1-2 benchmarks that were
extremely sensitive to cache sizes.
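The trade-off mentioned here (one 64-bit load per entry versus a 32-bit load plus an add) can be modelled in a few lines of C. This is only a sketch under stated assumptions: the addresses, the table base, and all names (`TABLE_BASE`, `abs_table`, `rel_table`, `target_absolute`, `target_relative`) are made up for illustration, and addresses are modelled as plain integers rather than real code labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_CASES = 4 };

/* Hypothetical address that the relative offsets are measured from.  */
#define TABLE_BASE UINT64_C(0x10000400)

/* Absolute table: each entry is a full 64-bit target address.  */
static const uint64_t abs_table[NUM_CASES] = {
  0x10000420, 0x10000448, 0x1000047c, 0x100003f0
};

/* Relative table: each entry is a signed 32-bit offset from TABLE_BASE.  */
static const int32_t rel_table[NUM_CASES] = {
  0x20, 0x48, 0x7c, -0x10
};

/* One 64-bit load yields the jump target directly.  */
uint64_t target_absolute(size_t i)
{
  assert(i < NUM_CASES);
  return abs_table[i];
}

/* One 32-bit load plus an add of the base yields the same target.  */
uint64_t target_relative(size_t i)
{
  assert(i < NUM_CASES);
  return TABLE_BASE + (uint64_t)(int64_t)rel_table[i];
}
```

In this model the absolute form needs no add before the jump, but its table occupies twice as many bytes as the relative one, which is consistent with the cache sensitivity described above.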
Bill and David,

Currently, the absolute jump table is not enabled by default. It can be
enabled with the undocumented option "-mno-relative-jumptables". If the target
supports named sections (have_named_sections), the feature can be enabled. We
plan to enable the feature by default in GCC12 and there is a ticket for it.
The latest status is that I am waiting for comments on my patch
(https://github.ibm.com/wschmidt/power-gcc/issues/998#issuecomment-34643825).

Thanks.

On 23/11/2021 12:09 AM, David Edelsohn wrote:
> On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>> For this last point, I had thought that the plan was to always switch over to
>> absolute addresses for switch tables, following the work that Hao Chen did in
>> this area. Am I misremembering? Hao Chen, can you please remind me where we
>> ended up here?
> And do the absolute addressing for switch tables changes work on AIX?
> I thought that Hao Chen only had done the work for PPC64 Linux ELF
> syntax with promises of future changes to accommodate AIX as well.
>
> Thanks, David