From patchwork Sat Jul 23 08:51:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 56282
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [x86 PATCH take #3] PR target/91681: zero_extendditi2 pattern for more optimizations.
Date: Sat, 23 Jul 2022 09:51:11 +0100
Message-ID: <06bb01d89e71$5d6283a0$18278ae0$@nextmovesoftware.com>

Hi Uros,

This is the next iteration of the zero_extendditi2 patch last reviewed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596204.html

[1] The sse.md changes were split out, reviewed, approved and committed.
[2] The *concat splitters have been moved post-reload matching what we
    now do for many/most of the double word functionality.
[3] As you recommend, these *concat splitters now use split_double_mode
    to "subreg" operand[0] into parts, via a new helper function that
    can also handle overlapping registers, and even use xchg for the
    rare case that a double word is constructed from its high and low
    parts, but the wrong way around.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without -target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-07-23  Roger Sayle
            Uroš Bizjak

gcc/ChangeLog
        PR target/91681
        * config/i386/i386-expand.cc (split_double_concat): A new
        helper function for setting a double word value from two
        word values.
        * config/i386/i386-protos.h (split_double_concat): Prototype here.
        * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
        (*add3_doubleword_zext): New define_insn_and_split.
        (*sub3_doubleword_zext): New define_insn_and_split.
        (*concat3_1): New define_insn_and_split replacing
        previous define_split for implementing DST = (HI<<32)|LO
        as pair of move instructions, setting lopart and hipart.
        (*concat3_2): Likewise.
        (*concat3_3): Likewise, where HI is zero_extended.
        (*concat3_4): Likewise, where HI is zero_extended.

gcc/testsuite/ChangeLog
        PR target/91681
        * g++.target/i386/pr91681.C: New test case (from the PR).
        * gcc.target/i386/pr91681-1.c: New int128 test case.
        * gcc.target/i386/pr91681-2.c: Likewise.
        * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 40f821e..66d8f28 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -165,6 +165,46 @@ split_double_mode (machine_mode mode, rtx operands[],
     }
 }
 
+/* Emit the double word assignment DST = { LO, HI }.  */
+
+void
+split_double_concat (machine_mode mode, rtx dst, rtx lo, rtx hi)
+{
+  rtx dlo, dhi;
+  int deleted_move_count = 0;
+  split_double_mode (mode, &dst, 1, &dlo, &dhi);
+  if (!rtx_equal_p (dlo, hi))
+    {
+      if (!rtx_equal_p (dlo, lo))
+        emit_move_insn (dlo, lo);
+      else
+        deleted_move_count++;
+      if (!rtx_equal_p (dhi, hi))
+        emit_move_insn (dhi, hi);
+      else
+        deleted_move_count++;
+    }
+  else if (!rtx_equal_p (lo, dhi))
+    {
+      if (!rtx_equal_p (dhi, hi))
+        emit_move_insn (dhi, hi);
+      else
+        deleted_move_count++;
+      if (!rtx_equal_p (dlo, lo))
+        emit_move_insn (dlo, lo);
+      else
+        deleted_move_count++;
+    }
+  else if (mode == TImode)
+    emit_insn (gen_swapdi (dlo, dhi));
+  else
+    emit_insn (gen_swapsi (dlo, dhi));
+
+  if (deleted_move_count == 2)
+    emit_note (NOTE_INSN_DELETED);
+}
+
+
 
 /* Generate either "mov $0, reg" or "xor reg, reg", as appropriate
    for the target.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index cf84775..e27c14f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -85,6 +85,7 @@ extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
 extern void split_double_mode (machine_mode, rtx[], int, rtx[], rtx[]);
+extern void split_double_concat (machine_mode, rtx, rtx lo, rtx);
 
 extern const char *output_set_got (rtx, rtx);
 extern const char *output_387_binary_op (rtx_insn *, rtx*);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9aaeb69..4560681 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4379,6 +4379,16 @@
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
 
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=r,o")
+        (zero_extend:TI (match_operand:DI 1 "nonimmediate_operand" "rm,r")))]
+  "TARGET_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (match_dup 1))
+   (set (match_dup 4) (const_int 0))]
+  "split_double_mode (TImode, &operands[0], 1, &operands[3], &operands[4]);")
+
 ;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
 (define_peephole2
   [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
@@ -6512,6 +6522,31 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+(define_insn_and_split "*add3_doubleword_zext"
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+        (plus:
+          (zero_extend:
+            (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))
+          (match_operand: 1 "nonimmediate_operand" "0,0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+                   (compare:CCC
+                     (plus:DWIH (match_dup 1) (match_dup 2))
+                     (match_dup 1)))
+              (set (match_dup 0)
+                   (plus:DWIH (match_dup 1) (match_dup 2)))])
+   (parallel [(set (match_dup 3)
+                   (plus:DWIH
+                     (plus:DWIH
+                       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+                       (match_dup 4))
+                     (const_int 0)))
+              (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+
 ;; Like DWI, but use POImode instead of OImode.
 (define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")])
 
@@ -6962,6 +6997,29 @@
     }
 })
 
+(define_insn_and_split "*sub3_doubleword_zext"
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+        (minus:
+          (match_operand: 1 "nonimmediate_operand" "0,0")
+          (zero_extend:
+            (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CC FLAGS_REG)
+                   (compare:CC (match_dup 1) (match_dup 2)))
+              (set (match_dup 0)
+                   (minus:DWIH (match_dup 1) (match_dup 2)))])
+   (parallel [(set (match_dup 3)
+                   (minus:DWIH
+                     (minus:DWIH
+                       (match_dup 4)
+                       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
+                     (const_int 0)))
+              (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
         (minus:SWI
@@ -11111,34 +11169,68 @@
 
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
 (define_code_iterator any_or_plus [plus ior xor])
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-        (any_or_plus:DI
-          (ashift:DI (match_operand:DI 1 "register_operand")
-                     (const_int 32))
-          (zero_extend:DI (match_operand:SI 2 "register_operand"))))]
-  "!TARGET_64BIT"
-  [(set (match_dup 3) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))]
+(define_insn_and_split "*concat3_1"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (ashift: (match_operand: 1 "register_operand" "r")
+                   (match_operand: 2 "const_int_operand"))
+          (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
 {
-  operands[3] = gen_highpart (SImode, operands[0]);
-  operands[4] = gen_lowpart (SImode, operands[1]);
-  operands[5] = gen_lowpart (SImode, operands[0]);
+  split_double_concat (mode, operands[0], operands[3],
+                       gen_lowpart (mode, operands[1]));
+  DONE;
 })
 
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-        (any_or_plus:DI
-          (zero_extend:DI (match_operand:SI 1 "register_operand"))
-          (ashift:DI (match_operand:DI 2 "register_operand")
-                     (const_int 32))))]
-  "!TARGET_64BIT"
-  [(set (match_dup 3) (match_dup 4))
-   (set (match_dup 5) (match_dup 1))]
+(define_insn_and_split "*concat3_2"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+          (ashift: (match_operand: 2 "register_operand" "r")
+                   (match_operand: 3 "const_int_operand"))))]
+  "INTVAL (operands[3]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  split_double_concat (mode, operands[0], operands[1],
+                       gen_lowpart (mode, operands[2]));
+  DONE;
+})
+
+(define_insn_and_split "*concat3_3"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (ashift:
+            (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+            (match_operand: 2 "const_int_operand"))
+          (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
 {
-  operands[3] = gen_highpart (SImode, operands[0]);
-  operands[4] = gen_lowpart (SImode, operands[2]);
-  operands[5] = gen_lowpart (SImode, operands[0]);
+  split_double_concat (mode, operands[0], operands[3], operands[1]);
+  DONE;
+})
+
+(define_insn_and_split "*concat3_4"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
+        (any_or_plus:
+          (zero_extend: (match_operand:DWIH 1 "register_operand" "r"))
+          (ashift:
+            (zero_extend: (match_operand:DWIH 2 "register_operand" "r"))
+            (match_operand: 3 "const_int_operand"))))]
+  "INTVAL (operands[3]) == * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  split_double_concat (mode, operands[0], operands[1], operands[2]);
+  DONE;
 })
 
 ;; Negation instructions
diff --git a/gcc/testsuite/g++.target/i386/pr91681.C b/gcc/testsuite/g++.target/i386/pr91681.C
new file mode 100644
index 0000000..0271e43
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr91681.C
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+void multiply128x64x2_3 (
+    const unsigned long a,
+    const unsigned long b,
+    const unsigned long c,
+    const unsigned long d,
+    __uint128_t o[2])
+{
+    __uint128_t B0 = (__uint128_t) b * c;
+    __uint128_t B2 = (__uint128_t) a * c;
+    __uint128_t B1 = (__uint128_t) b * d;
+    __uint128_t B3 = (__uint128_t) a * d;
+
+    o[0] = B2 + (B0 >> 64);
+    o[1] = B3 + (B1 >> 64);
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-1.c b/gcc/testsuite/gcc.target/i386/pr91681-1.c
new file mode 100644
index 0000000..ab83cc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+unsigned __int128 m;
+
+unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
+{
+  return x + y;
+}
+
+void bar(unsigned __int128 x, unsigned long long y)
+{
+  m = x + y;
+}
+
+void baz(unsigned long long y)
+{
+  m += y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-2.c b/gcc/testsuite/gcc.target/i386/pr91681-2.c
new file mode 100644
index 0000000..ea52c72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+unsigned __int128 m;
+
+unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
+{
+  return x - y;
+}
+
+void bar(unsigned __int128 x, unsigned long long y)
+{
+  m = x - y;
+}
+
+void baz(unsigned long long y)
+{
+  m -= y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91681-3.c b/gcc/testsuite/gcc.target/i386/pr91681-3.c
new file mode 100644
index 0000000..22a03c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91681-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+unsigned long long m;
+
+unsigned long long foo(unsigned long long x, unsigned int y)
+{
+  return x - y;
+}
+
+void bar(unsigned long long x, unsigned int y)
+{
+  m = x - y;
+}
+
+/* { dg-final { scan-assembler-not "xor" } } */
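
For readers following the move ordering in split_double_concat, here is a
small standalone model of its case analysis. It is only a sketch and not
part of the patch: registers are modelled as indices into a plain array,
and the name model_split_double_concat, the NREGS constant and the
returned instruction count are all hypothetical. It assumes the two
destination halves are distinct registers. A return value of 0 corresponds
to the case where the patch emits NOTE_INSN_DELETED.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file size for this model only.  */
enum { NREGS = 4 };

/* Model of DST = { LO, HI }.  DLO and DHI are the register numbers of
   the destination's low and high halves (assumed distinct); LO and HI
   are the source register numbers.  Returns the number of modelled
   instructions (moves or one swap) that were needed.  */
static int
model_split_double_concat (uint64_t regs[NREGS], int dlo, int dhi,
                           int lo, int hi)
{
  int insns = 0;

  if (dlo != hi)
    {
      /* Writing DLO first cannot clobber HI, so low-then-high is safe.  */
      if (dlo != lo)
        {
          regs[dlo] = regs[lo];
          insns++;
        }
      if (dhi != hi)
        {
          regs[dhi] = regs[hi];
          insns++;
        }
    }
  else if (lo != dhi)
    {
      /* DLO currently holds HI, so save it into DHI before overwriting.  */
      if (dhi != hi)
        {
          regs[dhi] = regs[hi];
          insns++;
        }
      if (dlo != lo)
        {
          regs[dlo] = regs[lo];
          insns++;
        }
    }
  else
    {
      /* DLO holds HI and DHI holds LO: exchange them (xchg in the patch).  */
      uint64_t tmp = regs[dlo];
      regs[dlo] = regs[dhi];
      regs[dhi] = tmp;
      insns++;
    }

  return insns;
}
```

In this model, calling it with dlo=0, dhi=1, lo=1, hi=0 takes the swap
path, while lo=0, hi=1 needs no instructions at all.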