From patchwork Thu Jan 6 22:39:29 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 49669
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [PATCH take #3] Recognize MULT_HIGHPART_EXPR in tree-ssa-math-opts pass.
Date: Thu, 6 Jan 2022 22:39:29 -0000
Message-ID: <00bd01d8034e$45a05ee0$d0e11ca0$@nextmovesoftware.com>

This is the third iteration of a patch to recognize MULT_HIGHPART_EXPR
in the middle-end.  As they say, "the third time's a charm".  The first
version implemented this in match.pd, which was considered too early:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html
The second version attempted to do this during RTL expansion, and was
considered to be too late in the middle-end:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html

This latest version incorporates Richard Biener's feedback/suggestion
to recognize MULT_HIGHPART_EXPR in one of the "instruction selection
passes", specifically tree-ssa-math-opts, so that the recognition of
highpart multiplications takes place in the same pass as widening
multiplications.

With each rewrite, the patch has also become more aggressive in the set
of widening multiplications that it recognizes as highpart multiplies.
Currently, any widening multiplication followed by a right shift (either
signed or unsigned) by a bit count sufficient to eliminate the lowpart
is recognized; the result of this shift doesn't need to be truncated.
As previously, this patch confirms that the target provides a suitable
optab before introducing the MULT_HIGHPART_EXPR.  This is the reason the
testcase is restricted to x86_64: this pass doesn't do anything on some
platforms, but x86_64 should be sufficient to confirm that the pass is
working/continues to work.

This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check (both with and without --target_board='unix{-m32}')
with no new regressions.  Ok for mainline?

2022-01-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-math-opts.c (struct widen_mul_stats): Add a
	highpart_mults_inserted field.
	(convert_mult_to_highpart): New function to convert right shift
	of a widening multiply into a MULT_HIGHPART_EXPR.
	(math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]: Call
	new convert_mult_to_highpart function.
	(pass_optimize_widening_mul::execute): Add a statistics counter
	for tracking "highpart multiplications inserted" events.

gcc/testsuite/ChangeLog
	* gcc.target/i386/mult-highpart.c: New test case.
Thanks in advance,
Roger

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 6131824..2014139 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -207,6 +207,9 @@ static struct
 
   /* Number of divmod calls inserted.  */
   int divmod_calls_inserted;
+
+  /* Number of highpart multiplication ops inserted.  */
+  int highpart_mults_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -4548,9 +4551,129 @@ convert_to_divmod (gassign *stmt)
   return true;
 }
 
+/* Process a single gimple statement STMT, which has a RSHIFT_EXPR as
+   its rhs, and try to convert it into a MULT_HIGHPART_EXPR.  The return
+   value is true iff we converted the statement.  */
+
+static bool
+convert_mult_to_highpart (gimple *stmt, gimple_stmt_iterator *gsi)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree stype = TREE_TYPE (lhs);
+  tree sarg0 = gimple_assign_rhs1 (stmt);
+  tree sarg1 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (stype) != INTEGER_TYPE
+      || TREE_CODE (sarg1) != INTEGER_CST
+      || TREE_CODE (sarg0) != SSA_NAME
+      || !tree_fits_uhwi_p (sarg1)
+      || !has_single_use (sarg0))
+    return false;
+
+  gimple *def = SSA_NAME_DEF_STMT (sarg0);
+  if (!is_gimple_assign (def))
+    return false;
+
+  enum tree_code mcode = gimple_assign_rhs_code (def);
+  if (mcode == NOP_EXPR)
+    {
+      tree tmp = gimple_assign_rhs1 (def);
+      if (TREE_CODE (tmp) != SSA_NAME || !has_single_use (tmp))
+	return false;
+      def = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (def))
+	return false;
+      mcode = gimple_assign_rhs_code (def);
+    }
+
+  if (mcode != WIDEN_MULT_EXPR
+      || gimple_bb (def) != gimple_bb (stmt))
+    return false;
+  tree mtype = TREE_TYPE (gimple_assign_lhs (def));
+  if (TREE_CODE (mtype) != INTEGER_TYPE
+      || TYPE_PRECISION (mtype) != TYPE_PRECISION (stype))
+    return false;
+
+  tree mop1 = gimple_assign_rhs1 (def);
+  tree mop2 = gimple_assign_rhs2 (def);
+  tree optype = TREE_TYPE (mop1);
+  bool unsignedp = TYPE_UNSIGNED (optype);
+  unsigned int prec = TYPE_PRECISION (optype);
+
+  if (optype != TREE_TYPE (mop2)
+      || unsignedp != TYPE_UNSIGNED (mtype)
+      || TYPE_PRECISION (mtype) != 2 * prec)
+    return false;
+
+  unsigned HOST_WIDE_INT bits = tree_to_uhwi (sarg1);
+  if (bits < prec || bits >= 2 * prec)
+    return false;
+
+  machine_mode mode = TYPE_MODE (optype);
+  optab tab = unsignedp ? umul_highpart_optab : smul_highpart_optab;
+  if (optab_handler (tab, mode) == CODE_FOR_nothing)
+    return false;
+
+  location_t loc = gimple_location (stmt);
+  tree highpart1 = build_and_insert_binop (gsi, loc, "highparttmp",
+					   MULT_HIGHPART_EXPR, mop1, mop2);
+  tree highpart2 = highpart1;
+  tree ntype = optype;
+
+  if (TYPE_UNSIGNED (stype) != TYPE_UNSIGNED (optype))
+    {
+      ntype = TYPE_UNSIGNED (stype) ? unsigned_type_for (optype)
+				    : signed_type_for (optype);
+      highpart2 = build_and_insert_cast (gsi, loc, ntype, highpart1);
+    }
+  if (bits > prec)
+    highpart2 = build_and_insert_binop (gsi, loc, "highparttmp",
+					RSHIFT_EXPR, highpart2,
+					build_int_cst (ntype, bits - prec));
+
+  gassign *new_stmt = gimple_build_assign (lhs, NOP_EXPR, highpart2);
+  gsi_replace (gsi, new_stmt, true);
+
+  /* The above transformation often creates useless type conversions, i.e.
+     when followed by a truncation, that aren't cleaned up elsewhere.  */
+  gimple *use_stmt;
+  imm_use_iterator use_iter;
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
+    if (gimple_assign_cast_p (use_stmt))
+      {
+	tree utype = TREE_TYPE (gimple_assign_lhs (use_stmt));
+	if (useless_type_conversion_p (ntype, utype))
+	  {
+	    gimple_stmt_iterator use_gsi = gsi_for_stmt (use_stmt);
+	    new_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
+					    highpart2);
+	    gsi_replace (&use_gsi, new_stmt, true);
+	  }
+	else if (bits == prec && useless_type_conversion_p (optype, utype))
+	  {
+	    gimple_stmt_iterator use_gsi = gsi_for_stmt (use_stmt);
+	    new_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
+					    highpart1);
+	    gsi_replace (&use_gsi, new_stmt, true);
+	  }
+	else if (TREE_CODE (utype) == INTEGER_TYPE
+		 && TYPE_PRECISION (utype) <= prec)
+	  {
+	    gimple_stmt_iterator use_gsi = gsi_for_stmt (use_stmt);
+	    new_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt),
+					    NOP_EXPR, highpart2);
+	    gsi_replace (&use_gsi, new_stmt, true);
+	  }
+      }
+
+  widen_mul_stats.highpart_mults_inserted++;
+  return true;
+}
+
+
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
-   where appropriate.  */
+   or MULT_HIGHPART_EXPR where appropriate.  */
 
 namespace {
 
@@ -4656,6 +4779,10 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
 	      convert_to_divmod (as_a <gassign *> (stmt));
 	      break;
 
+	    case RSHIFT_EXPR:
+	      convert_mult_to_highpart (stmt, &gsi);
+	      break;
+
 	    default:;
 	    }
 	}
@@ -4738,6 +4865,8 @@ pass_optimize_widening_mul::execute (function *fun)
 			    widen_mul_stats.fmas_inserted);
   statistics_counter_event (fun, "divmod calls inserted",
 			    widen_mul_stats.divmod_calls_inserted);
+  statistics_counter_event (fun, "highpart multiplications inserted",
+			    widen_mul_stats.highpart_mults_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/mult-highpart.c b/gcc/testsuite/gcc.target/i386/mult-highpart.c
new file mode 100644
index 0000000..efde311
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mult-highpart.c
@@ -0,0 +1,167 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -Wno-long-long -fdump-tree-optimized" } */
+
+typedef unsigned int __attribute ((mode(TI))) uti_t;
+typedef int __attribute ((mode(TI))) ti_t;
+
+long long stest1(long long x, long long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 64;
+}
+
+long long stest2(long long x)
+{
+  return ((ti_t)x * 19065) >> 64;
+}
+
+long long stest3(long long x, long long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >> 64;
+}
+
+long long stest4(long long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 64;
+}
+
+ti_t stest5(long long x, long long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 64;
+}
+
+ti_t stest6(long long x)
+{
+  return ((ti_t)x * 19065) >> 64;
+}
+
+uti_t stest7(long long x, long long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >>64;
+}
+
+uti_t stest8(long long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 64;
+}
+
+long long stest9(long long x, long long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 72;
+}
+
+long long stest10(long long x)
+{
+  return ((ti_t)x * 19065) >> 72;
+}
+
+long long stest11(long long x, long long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >> 72;
+}
+
+long long stest12(long long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 72;
+}
+
+ti_t stest13(long long x, long long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 72;
+}
+
+ti_t stest14(long long x)
+{
+  return ((ti_t)x * 19065) >> 72;
+}
+
+uti_t stest15(long long x, long long y)
+{
+  return (uti_t)((ti_t)x * (ti_t)y) >> 72;
+}
+
+uti_t stest16(long long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 72;
+}
+
+unsigned long long utest1(unsigned long long x, unsigned long long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 64;
+}
+
+unsigned long long utest2(unsigned long long x)
+{
+  return ((uti_t)x * 19065) >> 64;
+}
+
+unsigned long long utest3(unsigned long long x, unsigned long long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >> 64;
+}
+
+unsigned long long utest4(unsigned long long x)
+{
+  return (ti_t)((uti_t)x * 19065) >> 64;
+}
+
+uti_t utest5(unsigned long long x, unsigned long long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 64;
+}
+
+uti_t utest6(unsigned long long x)
+{
+  return ((uti_t)x * 19065) >> 64;
+}
+
+ti_t utest7(unsigned long long x, unsigned long long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >>64;
+}
+
+ti_t utest8(long long x)
+{
+  return (uti_t)((ti_t)x * 19065) >> 64;
+}
+
+unsigned long long utest9(unsigned long long x, unsigned long long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 72;
+}
+
+unsigned long long utest10(unsigned long long x)
+{
+  return ((uti_t)x * 19065) >> 72;
+}
+
+unsigned long long utest11(unsigned long long x, unsigned long long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >> 72;
+}
+
+unsigned long long utest12(unsigned long long x)
+{
+  return (ti_t)((uti_t)x * 19065) >> 72;
+}
+
+uti_t utest13(unsigned long long x, unsigned long long y)
+{
+  return ((uti_t)x * (uti_t)y) >> 72;
+}
+
+uti_t utest14(unsigned long long x)
+{
+  return ((uti_t)x * 19065) >> 72;
+}
+
+ti_t utest15(unsigned long long x, unsigned long long y)
+{
+  return (ti_t)((uti_t)x * (uti_t)y) >> 72;
+}
+
+ti_t utest16(unsigned long long x)
+{
+  return (ti_t)((uti_t)x * 19065) >> 72;
+}
+
+/* { dg-final { scan-tree-dump-times " h\\* " 32 "optimized" } } */