From patchwork Tue Jun 7 07:54:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 54881 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 58853382CCAC for ; Tue, 7 Jun 2022 07:55:06 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 58853382CCAC DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1654588506; bh=RvO/Y6f6ZsEFRi+c2wwFdTmEAyP+Z/802KjwUkbKhFo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=Do4XArMMEdqgyhGW6NWMlbJ2cbr0BWyN+Y8epSL+n2oJHAVNA2XTDyjpcgkzM62Nl HdrOTDQVA2wE+7Wjrlxp2aGvnyTvN/jQkK7CrTjP/xDNqb5L+ltytmtmGPWUyxa2Sd 01n+X++WOZDE7LJKdFW5gB9PMokfmQ0cZd8J7x8I= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by sourceware.org (Postfix) with ESMTPS id C91103857409 for ; Tue, 7 Jun 2022 07:54:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C91103857409 X-IronPort-AV: E=McAfee;i="6400,9594,10370"; a="340353591" X-IronPort-AV: E=Sophos;i="5.91,283,1647327600"; d="scan'208";a="340353591" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Jun 2022 00:54:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.91,283,1647327600"; d="scan'208";a="669865890" Received: from scymds01.sc.intel.com ([10.148.94.138]) by FMSMGA003.fm.intel.com with ESMTP; 07 Jun 2022 00:54:33 -0700 Received: from shliclel045.sh.intel.com (shliclel045.sh.intel.com [10.239.236.45]) by scymds01.sc.intel.com with ESMTP id 2577sVaH007206; Tue, 7 Jun 2022 00:54:32 -0700 To: gcc-patches@gcc.gnu.org Subject: [PATCH] Simplify (B * v + C) * D -> BD* v + CD when B, C, D are all INTEGER_CST. Date: Tue, 7 Jun 2022 15:54:31 +0800 Message-Id: <20220607075431.24764-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: References: X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" >> + (mult:c (plus:c@4 (mult:c@5 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) >since canonicalization puts INTEGER_CSTs last the :c should not be necessary. Changed. >> + (if (single_use (@4) >> + && single_use (@5)) >since the resulting expression is not simple using :s instead of >single_use (..) should >work as well. Changed. > when we go from (a + CST1) * CST2 to a * CST2 + CST1*CST2 we have > to worry about CST1 == -a which would make (a+CST1) * INT_MAX > not overflow but a * INT_MAX + CST1 * INT_MAX might. Is the > overflow check for CST1 * INT_MAX sufficient to rule out > that a * CST2 does not overflow when (a + CST1) * CST2 does not > overflow? Consider a == 2, CST1 == -1, CST2 == INT_MAX, > here 1 * INT_MAX does not overflow, nor does -1 * INT_MAX, but > 2 * INT_MAX overflows and thus the resulting expression invokes > undefined behavior. > > The same issue probably arises for the first pattern outer half > which looks like (a' + CST2) * CST3 with a' = a * CST1? > > The appropriate solution might be to perform the arithmetic > in an unsigned type with the implication that has on value-range > analysis. Yes, the patch patched based on value-range analysis. Update the patch. Similar for (v + B) * C + D -> C * v + BCD. Don't simplify it when there's overflow and overflow is UB for type v. gcc/ChangeLog: PR tree-optimization/53533 * match.pd: Simplify (B * v + C) * D -> BD * v + CD and (v + B) * C + D -> C * v + BCD when B,C,D are all INTEGER_CST, and there's no overflow or !TYPE_OVERFLOW_UNDEFINED. gcc/testsuite/ChangeLog: * gcc.target/i386/pr53533-1.c: New test. * gcc.target/i386/pr53533-2.c: New test. * gcc.target/i386/pr53533-3.c: New test. * gcc.target/i386/pr53533-4.c: New test. * gcc.target/i386/pr53533-5.c: New test. * gcc.dg/vect/slp-11a.c: Adjust testcase. --- gcc/match.pd | 82 +++++++++++++++++++++++ gcc/testsuite/gcc.dg/vect/slp-11a.c | 10 +-- gcc/testsuite/gcc.target/i386/pr53533-1.c | 23 +++++++ gcc/testsuite/gcc.target/i386/pr53533-2.c | 46 +++++++++++++ gcc/testsuite/gcc.target/i386/pr53533-3.c | 24 +++++++ gcc/testsuite/gcc.target/i386/pr53533-4.c | 46 +++++++++++++ gcc/testsuite/gcc.target/i386/pr53533-5.c | 22 ++++++ 7 files changed, 248 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr53533-5.c diff --git a/gcc/match.pd b/gcc/match.pd index 44a385b912d..54f53a1f988 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -489,6 +489,88 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (!overflow || TYPE_OVERFLOW_WRAPS (type)) (mult @0 { wide_int_to_tree (type, mul); })))) +/* Similar to above, but there could be an extra add/sub between + successive multuiplications. */ +(simplify + (mult (plus:s (mult:s@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) + (with { + bool overflowed = true; + wi::overflow_type ovf1, ovf2; + wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@3), + TYPE_SIGN (type), &ovf1); + wide_int add = wi::mul (wi::to_wide (@2), wi::to_wide (@3), + TYPE_SIGN (type), &ovf2); + if (TYPE_OVERFLOW_UNDEFINED (type)) + { +#if GIMPLE + value_range vr0; + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE + && get_global_range_query ()->range_of_expr (vr0, @4) + && vr0.kind () == VR_RANGE) + { + wide_int wmin0 = vr0.lower_bound (); + wide_int wmax0 = vr0.upper_bound (); + wmin0 = wi::mul (wmin0, wi::to_wide (@3), TYPE_SIGN (type), &ovf1); + wmax0 = wi::mul (wmax0, wi::to_wide (@3), TYPE_SIGN (type), &ovf2); + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE) + { + wi::add (wmin0, add, TYPE_SIGN (type), &ovf1); + wi::add (wmax0, add, TYPE_SIGN (type), &ovf2); + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE) + overflowed = false; + } + } +#endif + } + else + overflowed = false; + } + /* Skip folding on overflow. */ + (if (!overflowed) + (plus (mult @0 { wide_int_to_tree (type, mul); }) + { wide_int_to_tree (type, add); })))) + +/* Similar to above, but a multiplication between successive additions. */ +(simplify + (plus (mult:s (plus:s @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) + (with { + bool overflowed = true; + wi::overflow_type ovf1; + wi::overflow_type ovf2; + wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2), + TYPE_SIGN (type), &ovf1); + wide_int add = wi::add (mul, wi::to_wide (@3), + TYPE_SIGN (type), &ovf2); + if (TYPE_OVERFLOW_UNDEFINED (type)) + { +#if GIMPLE + value_range vr0; + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE + && get_global_range_query ()->range_of_expr (vr0, @0) + && vr0.kind () == VR_RANGE) + { + wide_int wmin0 = vr0.lower_bound (); + wide_int wmax0 = vr0.upper_bound (); + wmin0 = wi::mul (wmin0, wi::to_wide (@2), TYPE_SIGN (type), &ovf1); + wmax0 = wi::mul (wmax0, wi::to_wide (@2), TYPE_SIGN (type), &ovf2); + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE) + { + wi::add (wmin0, mul, TYPE_SIGN (type), &ovf1); + wi::add (wmax0, mul, TYPE_SIGN (type), &ovf2); + if (ovf1 == wi::OVF_NONE && ovf2 == wi::OVF_NONE) + overflowed = false; + } + } +#endif + } + else + overflowed = false; + + } + /* Skip folding on overflow. */ + (if (!overflowed) + (plus (mult @0 @2) { wide_int_to_tree (type, add); })))) + /* Optimize A / A to 1.0 if we don't care about NaNs or Infinities. */ (simplify diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c index bcd3c861ca4..e6632fa77be 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-11a.c +++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c @@ -9,14 +9,14 @@ int main1 () { int i; - unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7; - unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63}; + int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7; + int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63}; /* Different operations - not SLPable. */ for (i = 0; i < N; i++) { a0 = in[i*8] + 5; - a1 = in[i*8 + 1] * 6; + a1 = in[i*8 + 1] * 51072; a2 = in[i*8 + 2] + 7; a3 = in[i*8 + 3] + 8; a4 = in[i*8 + 4] + 9; @@ -25,7 +25,7 @@ main1 () a7 = in[i*8 + 7] + 12; b0 = a0 * 3; - b1 = a1 * 2; + b1 = a1 * 51072; b2 = a2 * 12; b3 = a3 * 5; b4 = a4 * 8; @@ -47,7 +47,7 @@ main1 () for (i = 0; i < N; i++) { if (out[i*8] != (in[i*8] + 5) * 3 - 2 - || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3 + || out[i*8 + 1] != (in[i*8 + 1] * 51072) * 51072 - 3 || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2 || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1 || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8 diff --git a/gcc/testsuite/gcc.target/i386/pr53533-1.c b/gcc/testsuite/gcc.target/i386/pr53533-1.c new file mode 100644 index 00000000000..095de665366 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53533-1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O1" } */ +/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */ +/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */ + +void +__attribute__((noipa)) +foo (unsigned a[256], unsigned b[256]) +{ + int i; + for (i = 0; i < 256; ++i) + { + unsigned tmp = a[i] + 12345U; + tmp *= 914237U; + tmp += 12332U; + tmp *= 914237U; + tmp += 12332U; + tmp *= 914237U; + tmp -= 13U; + tmp *= 8000U; + b[i] = tmp; + } +} diff --git a/gcc/testsuite/gcc.target/i386/pr53533-2.c b/gcc/testsuite/gcc.target/i386/pr53533-2.c new file mode 100644 index 00000000000..c31b6ff4dec --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53533-2.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include "pr53533-1.c" + +void +__attribute__((optimize("-O0"))) +foo1 (unsigned a[256], unsigned b[256]) +{ + int i; + for (i = 0; i < 256; ++i) + { + unsigned tmp = a[i] + 12345U; + tmp *= 914237U; + tmp += 12332U; + tmp *= 914237U; + tmp += 12332U; + tmp *= 914237U; + tmp -= 13U; + tmp *= 8000U; + b[i] = tmp; + } +} + +int main() +{ + unsigned int a[256]; + unsigned int b[256]; + unsigned int c[256]; + for (unsigned int i = 0; i != 256; i++) + { + b[i] = 0; + c[i] = 1; + a[i] = i * i - 10 * i + 33; + } + foo (a, b); + foo1 (a, c); + + for (unsigned int i = 0; i != 256; i++) + { + if (b[i] != c[i]) + __builtin_abort (); + } + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr53533-3.c b/gcc/testsuite/gcc.target/i386/pr53533-3.c new file mode 100644 index 00000000000..3b260d134e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53533-3.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fwrapv" } */ +/* { dg-final { scan-assembler-times "imull\[ \t\]" "1" } } */ +/* { dg-final { scan-assembler-times "(?:addl|subl)\[ \t\]" "1" { target { ! ia32 } } } } */ + +void +__attribute__((noipa)) +foo (int a[256], int b[256]) +{ + int i; + for (i = 0; i < 256; ++i) + { + int tmp = a[i] + 12345; + tmp *= 914237; + tmp += 12332; + tmp *= 914237; + tmp += 12332; + tmp *= 914237; + tmp -= 13; + tmp *= 8000; + b[i] = tmp; + } +} + diff --git a/gcc/testsuite/gcc.target/i386/pr53533-4.c b/gcc/testsuite/gcc.target/i386/pr53533-4.c new file mode 100644 index 00000000000..c29f90a44dc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53533-4.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fwrapv" } */ + +#include "pr53533-3.c" + +void +__attribute__((optimize("-O0"))) +foo1 (int a[256], int b[256]) +{ + int i; + for (i = 0; i < 256; ++i) + { + int tmp = a[i] + 12345; + tmp *= 914237; + tmp += 12332; + tmp *= 914237; + tmp += 12332; + tmp *= 914237; + tmp -= 13; + tmp *= 8000; + b[i] = tmp; + } +} + +int main() +{ + int a[256]; + int b[256]; + int c[256]; + for (int i = 0; i != 256; i++) + { + b[i] = 0; + c[i] = 1; + a[i] = i * i - 10 * i + 33; + } + foo (a, b); + foo1 (a, c); + + for (unsigned int i = 0; i != 256; i++) + { + if (b[i] != c[i]) + __builtin_abort (); + } + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr53533-5.c b/gcc/testsuite/gcc.target/i386/pr53533-5.c new file mode 100644 index 00000000000..56fe4f44064 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53533-5.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times {(?n)\* 2147483647} 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times {(?n)\* 1073741823} 1 "optimized" } } */ + +#define INT_MAX 2147483647 +int +foo (int a) +{ + /* When a == -2, there's no overflow for (a + 1) * INT_MAX - 1. + but overflow for a * INT_MAX + (INT_MAX - 1). + Don't simpify it. */ + return (a + 1) * INT_MAX - 1; +} + +int +foo1 (int a) +{ + /* Be conservative here, don't simplify this as long as + a * 2147483646 may overflow. */ + return 1073741823 * (a * 2 + 1); +}