From patchwork Mon Oct 11 08:54:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 46062 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C29D5385803D for ; Mon, 11 Oct 2021 08:55:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 6877E3858D28 for ; Mon, 11 Oct 2021 08:54:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6877E3858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=wutlJfRmh5iJoPrQ8k9lt26OPu0FTo4CVHYasGku2yE=; b=Uqu1OlGKmhY/+lhkP0oHzVu1L/ K0fWWK5NE8XY+a68G7bFcqXRi0TSwq1ssfhXQjrWo8I/48LKqXpoGDPmUcoh9qGzvSwg1hQYp4UDy ENbCiyw8zMOmHHIYqmk8p/+k4iJWIiUDN5d8t9luA3UzU5RSBpbnpUQ8ckURsahVZm6z7y6QFApN8 Zg+reeqPMhAfkFo6yK3lKsBlpsEyfEqGL1ttq4rtVgFHgrzRW05I0mdZBoXd8QaATVE0nsQ2Vflay C41Sw2Cu8ANEenH3V/C5zGCM7QaVGPM2Lh5bIOzdhv15tzb695q/i9okbDCWlweb4t28BT2ndDGsv W3AMfAUg==; Received: from host86-186-213-65.range86-186.btcentralplus.com ([86.186.213.65]:55362 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1mZr59-00038i-O6; Mon, 11 Oct 2021 04:54:51 -0400 From: "Roger Sayle" To: "'GCC Patches'" Subject: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend. Date: Mon, 11 Oct 2021 09:54:51 +0100 Message-ID: <01a001d7be7d$a7eaeea0$f7c0cbe0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 thread-index: Ade+e6Og2Es2N77JQIKHidzmCVSbZA== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This patch contains two SUBREG-related optimization enabling tweaks to the x86 backend. The first change, to ix86_expand_vector_extract, cures the strange -march=cascadelake related non-determinism that affected my new test cases last week. Extracting a QImode or HImode element from an SSE vector performs a zero-extension to SImode, which is currently represented as: (set (subreg:SI (reg:QI target)) (zero_extend:SI (...)) Unfortunately, the semantics of this RTL doesn't quite match what was intended. A set of a paradoxical subreg allows the high-bits to take an arbitrary value (hence the non-determinism). A more correct representation should be: (set (reg:SI temp) (zero_extend:SI (...)) (set (reg:QI target) (subreg:QI (reg:SI temp)) Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to indicate that value is already zero-extended in the SUBREG_REG. The second change is very similar, which is why I've included it in this patch, where currently the early RTL optimizers can produce: (set (reg:V?? hardreg) (subreg ...)) where this instruction may require a spill/reload from memory when the modes aren't tieable. Alas the presence of the hard register prevents combine/gcse etc. optimizing this away, or reusing the result which would increase the lifetime of the hard register before reload. The solution is to treat vector hard registers the same way as the x86 backend handles scalar hard registers, and only allow sets from pseudos before register allocation, which is achieved by checking ix86_hardreg_mov_ok. Hence the above instruction is expanded and maintained as: (set (reg:V?? pseudo) (subreg ...)) (set (reg:V?? hardreg) (reg:V?? pseudo)) which allows the RTL optimizers freedom to optimize the SUBREG. This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" and "make -k check" with no new failures. In theory, my recent "obvious" regexp fix to accommodate -march=cascadelake is no longer required, but there's no harm leaving the testsuite as it is. Ok for mainline? 2021-10-11 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_vector_move): Use a pseudo intermediate when moving a SUBREG into a hard register, by checking ix86_hardreg_mov_ok. (ix86_expand_vector_extract): Store zero-extended SImode intermediate in a pseudo, then set target using a SUBREG_PROMOTED annotated subreg. * config/i386/sse.md (mov_internal): Prevent CSE creating complex (SUBREG) sets of (vector) hard registers before reload, by checking ix86_hardreg_mov_ok. Thanks in advance, Roger diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 4780b99..44404bd 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -617,8 +617,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) + || (!register_operand (op0, mode) + && !register_operand (op1, mode)))) { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); emit_move_insn (tmp, op1); @@ -16005,11 +16006,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) /* Let the rtl optimizers know about the zero extension performed. */ if (inner_mode == QImode || inner_mode == HImode) { + rtx reg = gen_reg_rtx (SImode); tmp = gen_rtx_ZERO_EXTEND (SImode, tmp); - target = gen_lowpart (SImode, target); + emit_move_insn (reg, tmp); + tmp = gen_lowpart (inner_mode, reg); + SUBREG_PROMOTED_VAR_P (tmp) = 1; + SUBREG_PROMOTED_SET (tmp, 1); } - emit_insn (gen_rtx_SET (target, tmp)); + emit_move_insn (target, tmp); } else { diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4559b0c..e43f597 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1270,7 +1270,8 @@ " C,,vm,v"))] "TARGET_SSE && (register_operand (operands[0], mode) - || register_operand (operands[1], mode))" + || register_operand (operands[1], mode)) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) {