From patchwork Fri Apr 25 10:39:43 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 677
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
From: "Wilco"
Subject: RE: [PATCH] Add generic HAVE_RM_CTX implementation
Date: Fri, 25 Apr 2014 11:39:43 +0100
Message-ID: <000601cf6072$b14178f0$13c46ad0$@com>

Ping

-----Original Message-----
From: Wilco [mailto:wdijkstr@arm.com]
Sent: 15 April 2014 14:35
To: 'libc-alpha@sourceware.org'
Subject: [PATCH] Add generic HAVE_RM_CTX implementation

Hi,

This patch adds a generic implementation of HAVE_RM_CTX using standard fenv
calls. As a result math functions using SET_RESTORE_ROUND* macros do not
suffer from a large slowdown on targets which do not implement optimized
libc_fe*_ctx inline functions. Most of the libc_fe* inline functions are now
unused and could be removed in the future (there are a few math functions
left which use a mixture of standard fenv calls and libc_fe* inline
functions - they could be updated to use SET_RESTORE_ROUND or improved to
avoid expensive fenv manipulations across just a few FP instructions).

libc_feholdsetround*_noex_ctx is added to enable better optimization of
SET_RESTORE_ROUND_NOEX* implementations.

Performance measurements on ARM and x86 of sin() show significant gains over
the current default, fairly close to a highly optimized fenv_private:

                        ARM    x86
no fenv_private      :  100%   100%
generic HAVE_RM_CTX  :  250%   350%
fenv_private (CTX)   :  250%   450%

Wilco

ChangeLog:
2014-04-15  Wilco

	* sysdeps/generic/math_private.h: Add generic HAVE_RM_CTX
	implementation.  New function (libc_feholdsetround_noex_ctx).
---
 sysdeps/generic/math_private.h | 116 ++++++++++++++++++++++++++++++++--------
 1 file changed, 93 insertions(+), 23 deletions(-)

diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 9b881a3..fade483 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 /* The original fdlibm code used statements like:
 	n0 = ((*(int*)&one)>>29)^1;		* index of high word *
@@ -557,6 +558,16 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
    block is different from the current state.  This saves a lot of time when
    the floating point unit is much slower than the fixed point units.  */
 
+# ifndef libc_feholdsetround_noex_ctx
+#  define libc_feholdsetround_noex_ctx   libc_feholdsetround_ctx
+# endif
+# ifndef libc_feholdsetround_noexf_ctx
+#  define libc_feholdsetround_noexf_ctx  libc_feholdsetroundf_ctx
+# endif
+# ifndef libc_feholdsetround_noexl_ctx
+#  define libc_feholdsetround_noexl_ctx  libc_feholdsetroundl_ctx
+# endif
+
 # ifndef libc_feresetround_noex_ctx
 #  define libc_feresetround_noex_ctx  libc_fesetenv_ctx
 # endif
@@ -567,24 +578,80 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
 #  define libc_feresetround_noexl_ctx libc_fesetenvl_ctx
 # endif
 
-# ifndef libc_feholdsetround_53bit_ctx
-#  define libc_feholdsetround_53bit_ctx libc_feholdsetround_ctx
-# endif
+#else
 
-# ifndef libc_feresetround_53bit_ctx
-#  define libc_feresetround_53bit_ctx libc_feresetround_ctx
-# endif
+/* Default implementation using standard fenv functions.
+   Avoid unnecessary rounding mode changes by first checking the
+   current rounding mode.  Note the use of __glibc_unlikely is
+   important for performance.  */
 
-# define SET_RESTORE_ROUND_GENERIC(RM,ROUNDFUNC,CLEANUPFUNC) \
-  struct rm_ctx ctx __attribute__((cleanup(CLEANUPFUNC ## _ctx))); \
-  ROUNDFUNC ## _ctx (&ctx, (RM))
-#else
-# define SET_RESTORE_ROUND_GENERIC(RM, ROUNDFUNC, CLEANUPFUNC) \
-  fenv_t __libc_save_rm __attribute__((cleanup(CLEANUPFUNC))); \
-  ROUNDFUNC (&__libc_save_rm, (RM))
+static __always_inline void
+libc_feholdsetround_ctx (struct rm_ctx *ctx, int round)
+{
+  ctx->updated_status = false;
+
+  /* Update rounding mode only if different.  */
+  if (__glibc_unlikely (round != get_rounding_mode ()))
+    {
+      ctx->updated_status = true;
+      fegetenv (&ctx->env);
+      fesetround (round);
+    }
+}
+
+static __always_inline void
+libc_feresetround_ctx (struct rm_ctx *ctx)
+{
+  /* Restore the rounding mode if updated.  */
+  if (__glibc_unlikely (ctx->updated_status))
+    feupdateenv (&ctx->env);
+}
+
+static __always_inline void
+libc_feholdsetround_noex_ctx (struct rm_ctx *ctx, int round)
+{
+  /* Save exception flags and rounding mode.  */
+  fegetenv (&ctx->env);
+
+  /* Update rounding mode only if different.  */
+  if (__glibc_unlikely (round != get_rounding_mode ()))
+    fesetround (round);
+}
+
+static __always_inline void
+libc_feresetround_noex_ctx (struct rm_ctx *ctx)
+{
+  /* Restore exception flags and rounding mode.  */
+  fesetenv (&ctx->env);
+}
+
+# define libc_feholdsetroundf_ctx libc_feholdsetround_ctx
+# define libc_feholdsetroundl_ctx libc_feholdsetround_ctx
+# define libc_feresetroundf_ctx   libc_feresetround_ctx
+# define libc_feresetroundl_ctx   libc_feresetround_ctx
+
+# define libc_feholdsetround_noexf_ctx libc_feholdsetround_noex_ctx
+# define libc_feholdsetround_noexl_ctx libc_feholdsetround_noex_ctx
+# define libc_feresetround_noexf_ctx   libc_feresetround_noex_ctx
+# define libc_feresetround_noexl_ctx   libc_feresetround_noex_ctx
+
+#endif
+
+#ifndef libc_feholdsetround_53bit_ctx
+# define libc_feholdsetround_53bit_ctx libc_feholdsetround_ctx
 #endif
 
+#ifndef libc_feresetround_53bit_ctx
+# define libc_feresetround_53bit_ctx libc_feresetround_ctx
+#endif
+
+#define SET_RESTORE_ROUND_GENERIC(RM,ROUNDFUNC,CLEANUPFUNC) \
+  struct rm_ctx ctx __attribute__((cleanup (CLEANUPFUNC ## _ctx))); \
+  ROUNDFUNC ## _ctx (&ctx, (RM))
-/* Save and restore the rounding mode within a lexical block.  */
+/* Set the rounding mode within a lexical block.  Restore the rounding mode to
+   the value at the start of the block.  The exception mode must be preserved.
+   Exceptions raised within the block must be set in the exception flags.
+   Non-stop mode may be enabled inside the block.  */
 
 #define SET_RESTORE_ROUND(RM) \
   SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetround, libc_feresetround)
@@ -593,15 +660,18 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
 #define SET_RESTORE_ROUNDL(RM) \
   SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundl, libc_feresetroundl)
 
-/* Save and restore the rounding mode within a lexical block, and also
-   the set of exceptions raised within the block may be discarded.  */
-
-#define SET_RESTORE_ROUND_NOEX(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetround, libc_feresetround_noex)
-#define SET_RESTORE_ROUND_NOEXF(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundf, libc_feresetround_noexf)
-#define SET_RESTORE_ROUND_NOEXL(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundl, libc_feresetround_noexl)
+/* Set the rounding mode within a lexical block.  Restore the rounding mode to
+   the value at the start of the block.  The exception mode must be preserved.
+   Exceptions raised within the block must be discarded, and exception flags
+   are restored to the value at the start of the block.
+   Non-stop mode may be enabled inside the block.  */
+
+#define SET_RESTORE_ROUND_NOEX(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noex, libc_feresetround_noex)
+#define SET_RESTORE_ROUND_NOEXF(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noexf, libc_feresetround_noexf)
+#define SET_RESTORE_ROUND_NOEXL(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noexl, libc_feresetround_noexl)
 
 /* Like SET_RESTORE_ROUND, but also set rounding precision to 53 bits.  */
 #define SET_RESTORE_ROUND_53BIT(RM) \