From patchwork Sat Sep 25 07:43:18 2021
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 45433
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] Introduce sh_mul and uh_mul RTX codes for high-part multiplications
Date: Sat, 25 Sep 2021 08:43:18 +0100
Message-ID: <01ce01d7b1e1$02f00000$08d00000$@nextmovesoftware.com>

This patch introduces new RTX codes to allow the RTL passes and
backends to consistently represent high-part multiplications.
Currently, the RTL used by different backends for expanding
smul<mode>3_highpart and umul<mode>3_highpart varies greatly, with
many (but not all) choosing to express it as something like:

(define_insn "smuldi3_highpart"
  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
        (truncate:DI
         (lshiftrt:TI
          (mult:TI (sign_extend:TI
                    (match_operand:DI 1 "nvptx_register_operand" "R"))
                   (sign_extend:TI
                    (match_operand:DI 2 "nvptx_register_operand" "R")))
          (const_int 64))))]
  ""
  "%.\\tmul.hi.s64\\t%0, %1, %2;")

One complication with this "widening multiplication" representation is
that it requires an intermediate result in a wider mode, making it
difficult or impossible to encode a high-part multiplication of the
widest supported integer mode.  A second is that it can interfere with
optimization; for example, simplify-rtx.c contains the comment:

    case TRUNCATE:
      /* Don't optimize (lshiftrt (mult ...)) as it would interfere
         with the umulXi3_highpart patterns.  */

Hopefully these problems are solved (or at least reduced) by
introducing a new canonical form for high-part multiplications in the
RTL passes.  This also simplifies insn patterns when one operand is
constant.
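
For illustration only (an editorial sketch, not part of this patch and
untested), a backend pattern such as the nvptx one quoted above might
then be written directly in terms of the new code, roughly:

(define_insn "smuldi3_highpart"
  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
        (sh_mult:DI (match_operand:DI 1 "nvptx_register_operand" "R")
                    (match_operand:DI 2 "nvptx_register_operand" "R")))]
  ""
  "%.\\tmul.hi.s64\\t%0, %1, %2;")

Because the operands and the result all share mode DI, no TImode
intermediate is required, which is what makes the widest-mode case
representable.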

Whilst implementing some constant folding simplifications and
compile-time evaluation of these new RTX codes, I noticed that this
functionality could also be added for the existing saturating
arithmetic RTX codes.  Likewise, when documenting the new RTX codes, I
took the opportunity to silence the @xref warnings in invoke.texi.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Ok for mainline?


2021-09-25  Roger Sayle

gcc/ChangeLog
	* rtl.def (SH_MULT, UH_MULT): New RTX codes for representing
	signed and unsigned high-part multiplication respectively.
	* simplify-rtx.c (simplify_binary_operation_1) [SH_MULT, UH_MULT]:
	Simplify high-part multiplications by zero.
	[SS_PLUS, US_PLUS, SS_MINUS, US_MINUS, SS_MULT, US_MULT, SS_DIV,
	US_DIV]: Similar simplifications for saturating arithmetic.
	(simplify_const_binary_operation) [SS_PLUS, US_PLUS, SS_MINUS,
	US_MINUS, SS_MULT, US_MULT, SH_MULT, UH_MULT]: Implement
	compile-time evaluation for constant operands.
	* dwarf2out.c (mem_loc_descriptor): Skip SH_MULT and UH_MULT.
	* doc/rtl.texi (sh_mult, uh_mult): Document new RTX codes.
	* doc/md.texi (smul@var{m}3_highpart, umul@var{m}3_highpart):
	Mention the new sh_mult and uh_mult RTX codes.
	* doc/invoke.texi: Silence @xref "compilation" warnings.

Roger

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4acb941..2de7d99 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3125,7 +3125,7 @@ errors if these functions are not inlined everywhere they are called.
 @itemx -fno-modules-ts
 @opindex fmodules-ts
 @opindex fno-modules-ts
-Enable support for C++20 modules (@xref{C++ Modules}).  The
+Enable support for C++20 modules, see @xref{C++ Modules}.  The
 @option{-fno-modules-ts} is usually not needed, as that is the
 default.  Even though this is a C++20 feature, it is not currently
 implicitly enabled by selecting that standard version.
@@ -33553,7 +33553,7 @@ version selected, although in pre-C++20 versions, it is of course an
 extension.
 
 No new source file suffixes are required or supported.  If you wish to
-use a non-standard suffix (@xref{Overall Options}), you also need
+use a non-standard suffix, see @xref{Overall Options}, you also need
 to provide a @option{-x c++} option too.@footnote{Some users like to
 distinguish module interface files with a new suffix, such as naming
 the source @code{module.cppm}, which involves
@@ -33615,8 +33615,8 @@ to be resolved at the end of compilation.  Without this, imported
 macros are only resolved when expanded or (re)defined.  This option
 detects conflicting import definitions for all macros.
 
-@xref{C++ Module Mapper} for details of the @option{-fmodule-mapper}
-family of options.
+For details of the @option{-fmodule-mapper} family of options,
+see @xref{C++ Module Mapper}.
 
 @menu
 * C++ Module Mapper:: Module Mapper
@@ -33833,8 +33833,8 @@ dialect used and imports of the module.@footnote{The precise contents
 of this output may change.}  The timestamp is the same value as that
 provided by the @code{__DATE__} & @code{__TIME__} macros, and may be
 explicitly specified with the environment variable
-@code{SOURCE_DATE_EPOCH}.  @xref{Environment Variables} for further
-details.
+@code{SOURCE_DATE_EPOCH}.  For further details see
+@xref{Environment Variables}.
 
 A set of related CMIs may be copied, provided the relative pathnames
 are preserved.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7..58b9bab0 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5776,11 +5776,13 @@ multiplication.
 @item @samp{smul@var{m}3_highpart}
 Perform a signed multiplication of operands 1 and 2, which have mode
 @var{m}, and store the most significant half of the product in operand 0.
-The least significant half of the product is discarded.
+The least significant half of the product is discarded.  This may be
+represented in RTL using a @code{sh_mult} RTX expression.
 
 @cindex @code{umul@var{m}3_highpart} instruction pattern
 @item @samp{umul@var{m}3_highpart}
-Similar, but the multiplication is unsigned.
+Similar, but the multiplication is unsigned.  This may be represented
+in RTL using an @code{uh_mult} RTX expression.
 
 @cindex @code{madd@var{m}@var{n}4} instruction pattern
 @item @samp{madd@var{m}@var{n}4}
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index e1e76a9..94400a8 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -2524,7 +2524,19 @@ not be the same.
 For unsigned widening multiplication, use the same idiom, but with
 @code{zero_extend} instead of @code{sign_extend}.
 
+@findex sh_mult
+@findex uh_mult
+@cindex high-part multiplication
+@cindex multiplication high part
+@item (sh_mult:@var{m} @var{x} @var{y})
+@itemx (uh_mult:@var{m} @var{x} @var{y})
+Represents the high-part multiplication of @var{x} and @var{y} carried
+out in machine mode @var{m}.  @code{sh_mult} returns the high part of
+a signed multiplication; @code{uh_mult} returns the high part of an
+unsigned multiplication.
+
 @findex fma
+@cindex fused multiply-add
 @item (fma:@var{m} @var{x} @var{y} @var{z})
 Represents the @code{fma}, @code{fmaf}, and @code{fmal} builtin
 functions, which compute @samp{@var{x} * @var{y} + @var{z}}
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 9876750..c89134e 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -16770,6 +16770,8 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
          natively.  */
     case SS_MULT:
     case US_MULT:
+    case SH_MULT:
+    case UH_MULT:
     case SS_DIV:
     case US_DIV:
     case SS_PLUS:
diff --git a/gcc/rtl.def b/gcc/rtl.def
index c80144b..8993e87 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -467,6 +467,11 @@ DEF_RTL_EXPR(SS_MULT, "ss_mult", "ee", RTX_COMM_ARITH)
 /* Multiplication with unsigned saturation */
 DEF_RTL_EXPR(US_MULT, "us_mult", "ee", RTX_COMM_ARITH)
 
+/* Signed high-part multiplication.  */
+DEF_RTL_EXPR(SH_MULT, "sh_mult", "ee", RTX_COMM_ARITH)
+/* Unsigned high-part multiplication.  */
+DEF_RTL_EXPR(UH_MULT, "uh_mult", "ee", RTX_COMM_ARITH)
+
 /* Operand 0 divided by operand 1.  */
 DEF_RTL_EXPR(DIV, "div", "ee", RTX_BIN_ARITH)
 /* Division with signed saturation */
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb..b4b04b9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4142,11 +4142,40 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
     case US_PLUS:
     case SS_MINUS:
     case US_MINUS:
+      /* Simplify x + 0 to x, if possible.  */
+      if (trueop1 == CONST0_RTX (mode) && !HONOR_SIGNED_ZEROS (mode))
+        return op0;
+      return 0;
+
     case SS_MULT:
     case US_MULT:
+      /* Simplify x * 0 to 0, if possible.  */
+      if (trueop1 == CONST0_RTX (mode)
+          && !HONOR_NANS (mode)
+          && !HONOR_SIGNED_ZEROS (mode)
+          && !side_effects_p (op0))
+        return op1;
+
+      /* Simplify x * 1 to x, if possible.  */
+      if (trueop1 == CONST1_RTX (mode) && !HONOR_SNANS (mode))
+        return op0;
+      return 0;
+
+    case SH_MULT:
+    case UH_MULT:
+      /* Simplify x * 0 to 0, if possible.  */
+      if (trueop1 == CONST0_RTX (mode)
+          && !HONOR_NANS (mode)
+          && !HONOR_SIGNED_ZEROS (mode)
+          && !side_effects_p (op0))
+        return op1;
+      return 0;
+
     case SS_DIV:
     case US_DIV:
-      /* ??? There are simplifications that can be done.  */
+      /* Simplify x / 1 to x, if possible.  */
+      if (trueop1 == CONST1_RTX (mode) && !HONOR_SNANS (mode))
+        return op0;
       return 0;
 
     case VEC_SERIES:
@@ -5011,6 +5040,63 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
             }
             break;
           }
+
+        case SS_PLUS:
+          result = wi::add (pop0, pop1, SIGNED, &overflow);
+          if (overflow == wi::OVF_OVERFLOW)
+            result = wi::max_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow == wi::OVF_UNDERFLOW)
+            result = wi::min_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow != wi::OVF_NONE)
+            return NULL_RTX;
+          break;
+
+        case US_PLUS:
+          result = wi::add (pop0, pop1, UNSIGNED, &overflow);
+          if (overflow != wi::OVF_NONE)
+            result = wi::max_value (GET_MODE_PRECISION (int_mode), UNSIGNED);
+          break;
+
+        case SS_MINUS:
+          result = wi::sub (pop0, pop1, SIGNED, &overflow);
+          if (overflow == wi::OVF_OVERFLOW)
+            result = wi::max_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow == wi::OVF_UNDERFLOW)
+            result = wi::min_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow != wi::OVF_NONE)
+            return NULL_RTX;
+          break;
+
+        case US_MINUS:
+          result = wi::sub (pop0, pop1, UNSIGNED, &overflow);
+          if (overflow != wi::OVF_NONE)
+            result = wi::min_value (GET_MODE_PRECISION (int_mode), UNSIGNED);
+          break;
+
+        case SS_MULT:
+          result = wi::mul (pop0, pop1, SIGNED, &overflow);
+          if (overflow == wi::OVF_OVERFLOW)
+            result = wi::max_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow == wi::OVF_UNDERFLOW)
+            result = wi::min_value (GET_MODE_PRECISION (int_mode), SIGNED);
+          else if (overflow != wi::OVF_NONE)
+            return NULL_RTX;
+          break;
+
+        case US_MULT:
+          result = wi::mul (pop0, pop1, UNSIGNED, &overflow);
+          if (overflow != wi::OVF_NONE)
+            result = wi::max_value (GET_MODE_PRECISION (int_mode), UNSIGNED);
+          break;
+
+        case SH_MULT:
+          result = wi::mul_high (pop0, pop1, SIGNED);
+          break;
+
+        case UH_MULT:
+          result = wi::mul_high (pop0, pop1, UNSIGNED);
+          break;
+
         default:
           return NULL_RTX;
         }
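
Editorial note (not part of the patch): for readers unfamiliar with
wi::mul_high, the value it computes above, and hence the semantics of
the new codes, can be sketched in plain C for 64-bit operands.  The
helper names below are invented purely for illustration.

#include <stdint.h>

/* High part of the signed 64x64->128-bit product, i.e. (sh_mult:DI x y).
   Relies on GCC's __int128 extension; the arithmetic right shift of a
   negative value is GCC-defined behaviour.  */
static inline int64_t
smul64_highpart (int64_t x, int64_t y)
{
  return (int64_t) (((__int128) x * (__int128) y) >> 64);
}

/* High part of the unsigned 64x64->128-bit product, i.e. (uh_mult:DI x y).  */
static inline uint64_t
umul64_highpart (uint64_t x, uint64_t y)
{
  return (uint64_t) (((unsigned __int128) x * (unsigned __int128) y) >> 64);
}

For example, umul64_highpart (0xffffffffffffffffULL, 3) is 2, which is
the value the constant folding above should produce for the
corresponding uh_mult of two DImode CONST_INTs.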