[1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.
Message ID | 20230420004615.2434390-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F1D30385840C for <patchwork@sourceware.org>; Thu, 20 Apr 2023 00:46:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F1D30385840C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681951611; bh=Z6G4zTV3i891bc3t5X+uNwmENTXXycasjaXNxunSbaA=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=YWAE/CVJaGa7rWSdWn40hTckAus/tKpAg/wf4v5itsSyrG8tnt/J5JKxE3YVI2wYr Iw09xtPEUlQ8MIEYOOnN/TPkWX8iXHh7Wry7lfnqb+mm+vVfN7+C5aI2zj8KYYxDZ9 AD4lck4R/h7QGdkfY+2vE4CrzY1n1ru22GhIC1IY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by sourceware.org (Postfix) with ESMTPS id AB49D3858D33 for <gcc-patches@gcc.gnu.org>; Thu, 20 Apr 2023 00:46:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AB49D3858D33 X-IronPort-AV: E=McAfee;i="6600,9927,10685"; a="347455441" X-IronPort-AV: E=Sophos;i="5.99,210,1677571200"; d="scan'208";a="347455441" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2023 17:46:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10685"; a="691708840" X-IronPort-AV: E=Sophos;i="5.99,210,1677571200"; d="scan'208";a="691708840" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga002.jf.intel.com with ESMTP; 19 Apr 2023 17:46:16 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id CEBC6100AC77; Thu, 20 Apr 2023 08:46:15 +0800 (CST) To: gcc-patches@gcc.gnu.org Cc: crazylht@gmail.com, hjl.tools@gmail.com Subject: [PATCH 1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet. Date: Thu, 20 Apr 2023 08:46:14 +0800 Message-Id: <20230420004615.2434390-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.39.1.388.g2fc9e9ca3c MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: liuhongt <hongtao.liu@intel.com> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
[1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.
|
|
Commit Message
liuhongt
April 20, 2023, 12:46 a.m. UTC
1547 /* If this insn loads a parameter from its stack slot, then it 1548 represents a savings, rather than a cost, if the parameter is 1549 stored in memory. Record this fact. 1550 1551 Similarly if we're loading other constants from memory (constant 1552 pool, TOC references, small data areas, etc) and this is the only 1553 assignment to the destination pseudo. At that time, preferred regclass is unknown, and GENERAL_REGS is used to record memory move cost, but it's not accurate especially for large vector modes, i.e. 512-bit vector in x86 which would most probably allocate with SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate the cost of this load and make RA propagate the memeory operand into many consume instructions which causes worse performance. Fortunately, NO_REGS is used to record the best scenario, so the patch uses NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and aarch64-linux-gnu. Ok for trunk? gcc/ChangeLog: PR rtl-optimization/108707 * ira-costs.cc (scan_one_insn): Use NO_REGS instead of GENERAL_REGS when preferred reg_class is not known. gcc/testsuite/ChangeLog: * gcc.target/i386/pr108707.c: New test. --- gcc/ira-costs.cc | 5 ++++- gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++ 2 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c
Comments
On Thu, Apr 20, 2023 at 8:46 AM liuhongt <hongtao.liu@intel.com> wrote: > > 1547 /* If this insn loads a parameter from its stack slot, then it > 1548 represents a savings, rather than a cost, if the parameter is > 1549 stored in memory. Record this fact. > 1550 > 1551 Similarly if we're loading other constants from memory (constant > 1552 pool, TOC references, small data areas, etc) and this is the only > 1553 assignment to the destination pseudo. > > At that time, preferred regclass is unknown, and GENERAL_REGS is used to > record memory move cost, but it's not accurate especially for large vector > modes, i.e. 512-bit vector in x86 which would most probably allocate with > SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate > the cost of this load and make RA propagate the memeory operand into many > consume instructions which causes worse performance. > > Fortunately, NO_REGS is used to record the best scenario, so the patch uses > NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > and aarch64-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > PR rtl-optimization/108707 > * ira-costs.cc (scan_one_insn): Use NO_REGS instead of > GENERAL_REGS when preferred reg_class is not known. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr108707.c: New test. > --- > gcc/ira-costs.cc | 5 ++++- > gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++ > 2 files changed, 20 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c > > diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc > index c0fdef807dd..d2a801ab9b0 100644 > --- a/gcc/ira-costs.cc > +++ b/gcc/ira-costs.cc > @@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn) > && (! ira_use_lra_p || ! pic_offset_table_rtx > || ! contains_symbol_ref_p (XEXP (note, 0)))) > { > - enum reg_class cl = GENERAL_REGS; > + /* Costs for NO_REGS are used in cost calculation on the > + 1st pass when the preferred register classes are not > + known yet. In this case we take the best scenario. */ > + enum reg_class cl = NO_REGS; > rtx reg = SET_DEST (set); > int num = COST_INDEX (REGNO (reg)); > > diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c > new file mode 100644 > index 00000000000..bc1a476f551 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr108707.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O2" } */ > +/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */ > +/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */ > + > +#include<immintrin.h> > + > +void > +foo (__m512 pv, __m512 a, __m512 b, __m512 c, > + __m512* pdest, __m512* p1) > +{ > + __m512 t = *p1; > + pdest[0] = _mm512_fmadd_ps (t, pv, a); > + pdest[1] = _mm512_fmadd_ps (t, pv, b); > + pdest[2] = _mm512_fmadd_ps (t, pv, c); > +} > -- > 2.39.1.388.g2fc9e9ca3c >
On 4/19/23 20:46, liuhongt via Gcc-patches wrote: > 1547 /* If this insn loads a parameter from its stack slot, then it > 1548 represents a savings, rather than a cost, if the parameter is > 1549 stored in memory. Record this fact. > 1550 > 1551 Similarly if we're loading other constants from memory (constant > 1552 pool, TOC references, small data areas, etc) and this is the only > 1553 assignment to the destination pseudo. > > At that time, preferred regclass is unknown, and GENERAL_REGS is used to > record memory move cost, but it's not accurate especially for large vector > modes, i.e. 512-bit vector in x86 which would most probably allocate with > SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here will overestimate > the cost of this load and make RA propagate the memeory operand into many > consume instructions which causes worse performance. For this case GENERAL_REGS was used in GCC practically all the time. You can check this in the old regclass.c file (existing until IRA introduction). But I guess it is ok to use NO_REGS for this to promote more usage of registers instead of equiv memory and as a lot of code was changed since then (the old versions of GCC even did not support vector regs). Although it would be nice to do some benchmarking (SPEC is preferable) for such kind of changes. On the other hand, I expect that any performance regression (if any) will be reported anyway. The patch is ok for me. You can commit it into the trunk. Thank you for addressing this issue. > Fortunately, NO_REGS is used to record the best scenario, so the patch uses > NO_REGS instead of GENERAL_REGS here, it could help RA in PR108707. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > and aarch64-linux-gnu. > Ok for trunk? > > gcc/ChangeLog: > > PR rtl-optimization/108707 > * ira-costs.cc (scan_one_insn): Use NO_REGS instead of > GENERAL_REGS when preferred reg_class is not known. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr108707.c: New test.
> -----Original Message----- > From: Vladimir Makarov <vmakarov@redhat.com> > Sent: Saturday, April 22, 2023 3:26 AM > To: Liu, Hongtao <hongtao.liu@intel.com>; gcc-patches@gcc.gnu.org > Cc: crazylht@gmail.com; hjl.tools@gmail.com > Subject: Re: [PATCH 1/2] Use NO_REGS in cost calculation when the > preferred register class are not known yet. > > > On 4/19/23 20:46, liuhongt via Gcc-patches wrote: > > 1547 /* If this insn loads a parameter from its stack slot, then it > > 1548 represents a savings, rather than a cost, if the parameter is > > 1549 stored in memory. Record this fact. > > 1550 > > 1551 Similarly if we're loading other constants from memory (constant > > 1552 pool, TOC references, small data areas, etc) and this is the only > > 1553 assignment to the destination pseudo. > > > > At that time, preferred regclass is unknown, and GENERAL_REGS is used > > to record memory move cost, but it's not accurate especially for large > > vector modes, i.e. 512-bit vector in x86 which would most probably > > allocate with SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS > > here will overestimate the cost of this load and make RA propagate the > > memeory operand into many consume instructions which causes worse > performance. > > For this case GENERAL_REGS was used in GCC practically all the time. You can > check this in the old regclass.c file (existing until IRA introduction). > > But I guess it is ok to use NO_REGS for this to promote more usage of > registers instead of equiv memory and as a lot of code was changed since > then (the old versions of GCC even did not support vector regs). > > Although it would be nice to do some benchmarking (SPEC is preferable) for > such kind of changes. Thanks, I've run SPEC2017 on x86 ICX, no big performance change, a little bit code size improvement as expected(codesize of 1 load + multi ops should be smaller than multi ciscy ops). > > On the other hand, I expect that any performance regression (if any) will be > reported anyway. > > The patch is ok for me. You can commit it into the trunk. > > Thank you for addressing this issue. > > > Fortunately, NO_REGS is used to record the best scenario, so the patch > > uses NO_REGS instead of GENERAL_REGS here, it could help RA in > PR108707. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and > > aarch64-linux-gnu. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > PR rtl-optimization/108707 > > * ira-costs.cc (scan_one_insn): Use NO_REGS instead of > > GENERAL_REGS when preferred reg_class is not known. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr108707.c: New test.
diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc index c0fdef807dd..d2a801ab9b0 100644 --- a/gcc/ira-costs.cc +++ b/gcc/ira-costs.cc @@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn) && (! ira_use_lra_p || ! pic_offset_table_rtx || ! contains_symbol_ref_p (XEXP (note, 0)))) { - enum reg_class cl = GENERAL_REGS; + /* Costs for NO_REGS are used in cost calculation on the + 1st pass when the preferred register classes are not + known yet. In this case we take the best scenario. */ + enum reg_class cl = NO_REGS; rtx reg = SET_DEST (set); int num = COST_INDEX (REGNO (reg)); diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c new file mode 100644 index 00000000000..bc1a476f551 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr108707.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */ +/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */ + +#include<immintrin.h> + +void +foo (__m512 pv, __m512 a, __m512 b, __m512 c, + __m512* pdest, __m512* p1) +{ + __m512 t = *p1; + pdest[0] = _mm512_fmadd_ps (t, pv, a); + pdest[1] = _mm512_fmadd_ps (t, pv, b); + pdest[2] = _mm512_fmadd_ps (t, pv, c); +}