From patchwork Fri Jun  7 17:59:46 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu" <hjl.tools@gmail.com>
X-Patchwork-Id: 33048
Received: (qmail 45175 invoked by alias); 7 Jun 2019 18:00:34 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 36338 invoked by uid 89); 7 Jun 2019 18:00:27 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-13.9 required=5.0 tests=AWL, BAYES_00,
	FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2,
	GIT_PATCH_3, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE,
	SPF_PASS autolearn=ham version=3.3.1 spammy=
X-HELO: mail-ot1-f66.google.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=mime-version:from:date:message-id:subject:to;
	bh=Ft40I7aIj2GiHj/sYep+/LgtSN8WIYku6jcOrDzpJXQ=;
	b=ZN+2qdkOShe3YVdXo146ANVlIBpPiJC8LUHCkHMFA7UMsGLq6dxwzd62X9vwlrKEcl
	gPhhH9zrSb/U+u9P046nm8HkP5an/YP8UV8X+v1BneDJdPDxgvxnVdJdaFrm/V3p9izF
	doDi6j/tyMunNUS9lh2uiheNubmmuTd6XA6JMPLGsoCd1Sv5x50rV04ZqUIwE8lFru4D
	s0fZdW8ga6fMnfMpZ4pQAzQO7AcF9BvJdeoyeLJGXT8mN/N0Gd4MIQ5yRC9nmQ9ZFyr9
	gJNZvKBzTzpekeMUGxTvhLlurJw6EJbRk9W/2hH3pyabotfvG3bsRXxAP2xhjfYI/g6P
	CzKg==
MIME-Version: 1.0
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 7 Jun 2019 10:59:46 -0700
Message-ID: 
 <CAMe9rOoP_293SM=sYpuqx5Yg9K1a-F9GtN6g0DDFjHTZM5OkcA@mail.gmail.com>
Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128
To: GNU C Library <libc-alpha@sourceware.org>

-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding
stalls:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in the compiler.  This patch limits the vector
width for branred.c to 128 bits to work around the issue.  It improves
the performance of sin and cos by more than 40% on Skylake when
compiled with -O3 -march=skylake.
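
For anyone unfamiliar with this kind of stall, here is a minimal sketch
of the access pattern involved.  It is a hypothetical illustration, not
code from branred.c or from the bug report; the function name
scale_after_scalar_stores and the array size are made up.  Several
narrow stores are followed by a read of the same memory, and if the
compiler turns that read into one wide vector load, the load typically
cannot be forwarded from the store buffer.  Whether a given compiler
actually emits that code depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical illustration only; not taken from branred.c.  */
void
scale_after_scalar_stores (double x, double scale, double out[4])
{
  double tmp[4];

  /* Four separate 8-byte stores into tmp.  */
  tmp[0] = x;
  tmp[1] = x * 2.0;
  tmp[2] = x * 3.0;
  tmp[3] = x * 4.0;

  /* With -O3 and AVX this loop may be vectorized so that tmp is read
     back with a single 32-byte load.  A load that wide spans all four
     earlier stores, which is the pattern that can stall.  With a
     128-bit preference each load would cover two stores instead.  */
  for (size_t i = 0; i < 4; i++)
    out[i] = tmp[i] * scale;
}
```

The result is the same at either vector width; only the width of the
loads and stores the compiler chooses would differ.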

OK for master branch?

* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
to -mprefer-vector-width=128.

From 53f43ccf241896d37b759ac416df0ef0ccd2da0e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 17 May 2019 14:23:03 -0700
Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128

-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding
stalls:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in the compiler.  This patch limits the vector
width for branred.c to 128 bits to work around the issue.  It improves
the performance of sin and cos by more than 40% on Skylake when
compiled with -O3 -march=skylake.

	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128.
---
 sysdeps/x86_64/fpu/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 2b7d69bb50..b5f9589021 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -237,3 +237,7 @@ CFLAGS-test-float-libmvec-sincosf-avx512.c = -DREQUIRE_AVX512F
 CFLAGS-test-float-libmvec-sincosf-avx512-main.c = $(libmvec-sincos-cflags) $(float-vlen16-arch-ext-cflags)
 endif
 endif
+
+ifeq ($(subdir),math)
+CFLAGS-branred.c = -mprefer-vector-width=128
+endif
-- 
2.20.1