From patchwork Fri Jun 7 17:59:46 2019
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 33048
From: "H.J. Lu"
Date: Fri, 7 Jun 2019 10:59:46 -0700
Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128
To: GNU C Library

-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding stalls:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in the compiler.  This patch limits the vector width
to 128 bits to work around the issue.  On Skylake, it improves the
performance of sin and cos by more than 40% when glibc is compiled with
-O3 -march=skylake.
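
For readers unfamiliar with the stall described above, here is a minimal
sketch of the kind of access pattern involved.  It is an illustration
only: the function name scale_and_sum and its signature are hypothetical
and are not taken from branred.c, and whether a given compiler actually
emits the problematic store/load sequence depends on its version and
flags.

```c
#include <stddef.h>

#define N 6

/* Hypothetical helper, not taken from branred.c: writes N products
   into r[] and immediately sums them back in reverse order.  */
double
scale_and_sum (const double *a, double x, double r[N])
{
  double t = 0.0;

  /* First loop: stores into r[].  A vectorizer may split these N
     stores into pieces of different widths, for example one 256-bit
     store followed by one 128-bit store.  */
  for (size_t i = 0; i < N; i++)
    r[i] = x * a[i];

  /* Second loop: reads r[] back right away.  If it is vectorized with
     a 256-bit load that overlaps two of the earlier, narrower stores,
     the load cannot be served by store forwarding and has to wait for
     the stores to reach the cache.  */
  for (size_t i = 0; i < N; i++)
    t += r[N - 1 - i];

  return t;
}
```

Capping the preferred vector width at 128 bits keeps the stores and the
dependent loads at the same, narrower granularity, which is the idea
behind the workaround in this patch.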

OK for master branch?

	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128.

From 53f43ccf241896d37b759ac416df0ef0ccd2da0e Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 17 May 2019 14:23:03 -0700
Subject: [PATCH] x86-64: Compile branred.c with -mprefer-vector-width=128

-O3 with AVX vectorizes some loops in sysdeps/ieee754/dbl-64/branred.c
with 256-bit vector instructions, which leads to store-forwarding stalls:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in the compiler.  This patch limits the vector width
to 128 bits to work around the issue.  On Skylake, it improves the
performance of sin and cos by more than 40% when glibc is compiled with
-O3 -march=skylake.

	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128.
---
 sysdeps/x86_64/fpu/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 2b7d69bb50..b5f9589021 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -237,3 +237,7 @@ CFLAGS-test-float-libmvec-sincosf-avx512.c = -DREQUIRE_AVX512F
 CFLAGS-test-float-libmvec-sincosf-avx512-main.c = $(libmvec-sincos-cflags) $(float-vlen16-arch-ext-cflags)
 endif
 endif
+
+ifeq ($(subdir),math)
+CFLAGS-branred.c = -mprefer-vector-width=128
+endif
-- 
2.20.1