From patchwork Thu Mar 17 19:41:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle <roger@nextmovesoftware.com>
X-Patchwork-Id: 52088
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id F08963870886
	for <patchwork@sourceware.org>; Thu, 17 Mar 2022 19:53:19 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id 605963842405
 for <gcc-patches@gcc.gnu.org>; Thu, 17 Mar 2022 19:41:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 605963842405
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID:
 Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=IzwfXTJvRS0LhL5yLJ778LiHX3bC8n6EzocI2cjZn+A=; b=N+mIobxA4euyH4M7HdgPaqvLew
 Q8uSVhqiK8FRnGideZZFW+1OL18lNUU2136ZCD+kVhfrtQPZ45uD7vU2a/txTm1Sq3huQSKpQl6KC
 LDN/wU/oG3YuEbX0WYz3YZHrkxdmC+cjZLUD1l2jzBMvK9RMgMIR72EaL9xAV8ViXdekpopcsHHVL
 Qketb9yscg0kYSPFrEB1ejJA/vK7bB9XeInHU5+DMi53mg2ut0dHwFFM+rHTtZll+K/eTT4gatVrH
 4dvr8wMtt+92Ozr6WiufEanPbOk9AIdCZva21Vbp/QUSHXX85chOo8/yS4e9aAYymajaZtxO08JOs
 FhWZ7EaQ==;
Received: from host86-186-213-42.range86-186.btcentralplus.com
 ([86.186.213.42]:57219 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1nUw0U-0007Wq-L7; Thu, 17 Mar 2022 15:41:58 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: [x86_64 PATCH] PR 90356: Use xor to load const_double 0.0 on SSE
 (always)
Date: Thu, 17 Mar 2022 19:41:56 -0000
Message-ID: <008e01d83a37$10aac930$32005b90$@nextmovesoftware.com>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: Adg6NfW/HgkrFv9ZQk+C+ujIOFOBQg==
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com:
 roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Implementations of the x87 floating point instruction set have always
had some pretty strange characteristics.  For example on the original
Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
when a regular FLD (load from memory) took just 1 cycle!?  Given that
back then memory latencies were much lower (relatively) than they are
today, these instructions were all but useless except when optimizing
for size (impressively FLDZ/FLDPI require only two bytes).

Such was the world back in 2006 when Uros Bizjak first added support for
fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
and then shortly after sensibly disabled them for !optimize_size with
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
[which was very expertly reviewed and approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204487.html ]

"And some things that should not have been forgotten were lost.
History became legend.  Legend became myth." -- Lord of the Rings

Alas this vestigial logic still persists in the compiler today,
so for example on x86_64 GCC compiles the following function:

double foo(double x) { return x + 0.0; }

with -O2 to:

foo:    addsd   .LC0(%rip), %xmm0
        ret
.LC0:   .long   0
        .long   0

preferring to read the constant 0.0 from memory [the constant pool],
except when optimizing for size.  With -Os we get:

foo:    xorps   %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        ret

Which is not only smaller (the two instructions require seven bytes vs.
eight for the original addsd from mem, even without considering the
constant pool) but is also faster on modern hardware.  The latter code
sequence is generated by both clang and msvc with -O2.  Indeed Agner
Fog documents the set of floating point/SSE constants that it's
cheaper to materialize than to load from memory.
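
The byte counts quoted above can be checked against the instruction
encodings.  The following is a minimal sketch in C, assuming the legacy
(non-VEX) SSE encodings; the array and function names are illustrative
only and are not part of the patch, and the RIP-relative displacement is
shown as a zero placeholder since it is only known at link time.

```c
#include <stddef.h>

/* -Os sequence: materialize 0.0 in a register, then add.  */
static const unsigned char xorps_xmm1_xmm1[] = { 0x0f, 0x57, 0xc9 };       /* xorps %xmm1, %xmm1 */
static const unsigned char addsd_xmm1_xmm0[] = { 0xf2, 0x0f, 0x58, 0xc1 }; /* addsd %xmm1, %xmm0 */

/* -O2 sequence before the patch: add directly from the constant pool.
   The last four bytes are the disp32 placeholder for .LC0(%rip).  */
static const unsigned char addsd_mem_xmm0[] =
  { 0xf2, 0x0f, 0x58, 0x05, 0x00, 0x00, 0x00, 0x00 };

/* Size in bytes of one double in the constant pool (two .long 0).  */
#define POOL_ENTRY_BYTES 8

/* Code bytes for the xor-based sequence (ret not counted).  */
static size_t xor_sequence_size (void)
{
  return sizeof xorps_xmm1_xmm1 + sizeof addsd_xmm1_xmm0;
}

/* Code bytes for the load-based sequence, optionally including the
   constant pool entry it depends on.  */
static size_t mem_sequence_size (int include_pool)
{
  return sizeof addsd_mem_xmm0 + (include_pool ? POOL_ENTRY_BYTES : 0);
}
```

Under these assumptions xor_sequence_size () is 7 and
mem_sequence_size (0) is 8; counting the constant pool entry as well
makes the load-based form 16 bytes.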

This patch shuffles the conditions on the i386 backend's *movtf_internal,
*movdf_internal and *movsf_internal define_insns to untangle the newer
TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
conditions.  Amongst the benefits of this are that it improves the code
generated for PR tree-optimization/90356 and resolves PR target/86722.
Many thanks to Hongtao whose approval of my PR 94680 "movq" patch
unblocked this one.
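
The effect of the untangling can be sketched as two plain boolean
functions.  This is only a simplified model of the *movsf_internal
hunk below: the struct and function names are hypothetical, each field
stands in for the i386.md condition named in its comment, and the other
disjuncts of the insn condition (and the !memory_operand checks in the
DFmode/TFmode variants) are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the conditions tested in i386.md.  */
struct mov_ctx
{
  bool size_or_large; /* optimize_function_for_size_p (cfun), or
			 ix86_cmodel is CM_LARGE/CM_LARGE_PIC */
  bool stack_mode;    /* IS_STACK_MODE (SFmode) */
  bool x87_const;     /* standard_80387_constant_p (op) > 0 */
  bool sse_math;      /* TARGET_SSE && TARGET_SSE_MATH */
  bool sse_const;     /* standard_sse_constant_p (op, SFmode) == 1 */
};

/* Before: both kinds of constant sit behind the size/large-model guard.  */
static bool const_ok_before (const struct mov_ctx *c)
{
  return c->size_or_large
	 && ((c->stack_mode && c->x87_const)
	     || (c->sse_math && c->sse_const));
}

/* After: only the x87 clause keeps the guard; the SSE clause stands
   on its own.  */
static bool const_ok_after (const struct mov_ctx *c)
{
  return (c->size_or_large && c->stack_mode && c->x87_const)
	 || (c->sse_math && c->sse_const);
}
```

In this model an SSE zero at -O2 (size_or_large false, sse_math and
sse_const true) is rejected by const_ok_before but accepted by
const_ok_after, while an x87-only constant is treated identically by
both.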

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-03-17  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/86722
	PR tree-optimization/90356
	* config/i386/i386.md (*movtf_internal): Don't guard
	standard_sse_constant_p clause by optimize_function_for_size_p.
	(*movdf_internal): Likewise.
	(*movsf_internal): Likewise.

gcc/testsuite/ChangeLog
	PR target/86722
	PR tree-optimization/90356
	* gcc.target/i386/pr86722.c: New test case.
	* gcc.target/i386/pr90356.c: New test case.

Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 46a2663..5b44c65 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3455,9 +3455,7 @@
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))
    && (lra_in_progress || reload_completed
        || !CONST_DOUBLE_P (operands[1])
-       || ((optimize_function_for_size_p (cfun)
-	    || (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-	   && standard_sse_constant_p (operands[1], TFmode) == 1
+       || (standard_sse_constant_p (operands[1], TFmode) == 1
 	   && !memory_operand (operands[0], TFmode))
        || (!TARGET_MEMORY_MISMATCH_STALL
 	   && memory_operand (operands[0], TFmode)))"
@@ -3590,10 +3588,11 @@
        || !CONST_DOUBLE_P (operands[1])
        || ((optimize_function_for_size_p (cfun)
 	    || (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-	   && ((IS_STACK_MODE (DFmode)
-		&& standard_80387_constant_p (operands[1]) > 0)
-	       || (TARGET_SSE2 && TARGET_SSE_MATH
-		   && standard_sse_constant_p (operands[1], DFmode) == 1))
+	   && IS_STACK_MODE (DFmode)
+	   && standard_80387_constant_p (operands[1]) > 0
+	   && !memory_operand (operands[0], DFmode))
+       || (TARGET_SSE2 && TARGET_SSE_MATH
+	   && standard_sse_constant_p (operands[1], DFmode) == 1
 	   && !memory_operand (operands[0], DFmode))
        || ((TARGET_64BIT || !TARGET_MEMORY_MISMATCH_STALL)
 	   && memory_operand (operands[0], DFmode))
@@ -3762,10 +3761,10 @@
        || !CONST_DOUBLE_P (operands[1])
        || ((optimize_function_for_size_p (cfun)
 	    || (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-	   && ((IS_STACK_MODE (SFmode)
-		&& standard_80387_constant_p (operands[1]) > 0)
-	       || (TARGET_SSE && TARGET_SSE_MATH
-		   && standard_sse_constant_p (operands[1], SFmode) == 1)))
+	   && IS_STACK_MODE (SFmode)
+	   && standard_80387_constant_p (operands[1]) > 0)
+       || (TARGET_SSE && TARGET_SSE_MATH
+	   && standard_sse_constant_p (operands[1], SFmode) == 1)
        || memory_operand (operands[0], SFmode)
        || !TARGET_HARD_SF_REGS)"
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr86722.c b/gcc/testsuite/gcc.target/i386/pr86722.c
new file mode 100644
index 0000000..1092c4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr86722.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse" } */
+
+void f(double*d,double*e){
+  for(;d<e;++d)
+    *d=(*d<.5)?.7:0;
+}
+
+/* { dg-final { scan-assembler-not "andnpd" } } */
+/* { dg-final { scan-assembler-not "orpd" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr90356.c b/gcc/testsuite/gcc.target/i386/pr90356.c
new file mode 100644
index 0000000..ae533da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90356.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse" } */
+float doit(float k){
+    float c[2]={0.0};
+    c[1]+=k;
+    return c[0]+c[1];
+}
+
+/* { dg-final { scan-assembler "pxor" } } */