From patchwork Tue Apr 15 13:35:06 2014
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 559
Delivered-To: mailing list libc-alpha@sourceware.org
From: "Wilco"
Subject: [PATCH] Add generic HAVE_RM_CTX implementation
Date: Tue, 15 Apr 2014 14:35:06 +0100
Message-ID: <000101cf58af$83900a90$8ab01fb0$@com>

Hi,

This patch adds a generic implementation of HAVE_RM_CTX using standard fenv
calls. As a result math functions using SET_RESTORE_ROUND* macros do not
suffer from a large slowdown on targets which do not implement optimized
libc_fe*_ctx inline functions. Most of the libc_fe* inline functions are now
unused and could be removed in the future (there are a few math functions left
which use a mixture of standard fenv calls and libc_fe* inline functions - they
could be updated to use SET_RESTORE_ROUND or improved to avoid expensive fenv
manipulations across just a few FP instructions).

libc_feholdsetround*_noex_ctx is added to enable better optimization of
SET_RESTORE_ROUND_NOEX* implementations.

Performance measurements on ARM and x86 of sin() show significant gains over
the current default, fairly close to a highly optimized fenv_private:

                        ARM    x86
no fenv_private      :  100%   100%
generic HAVE_RM_CTX  :  250%   350%
fenv_private (CTX)   :  250%   450%

Wilco

ChangeLog:

2014-04-15  Wilco

        * sysdeps/generic/math_private.h: Add generic HAVE_RM_CTX
        implementation.  New function (libc_feholdsetround_noex_ctx).
---
 sysdeps/generic/math_private.h | 116 ++++++++++++++++++++++++++++++++--------
 1 file changed, 93 insertions(+), 23 deletions(-)

diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index 9b881a3..fade483 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 /* The original fdlibm code used statements like:
        n0 = ((*(int*)&one)>>29)^1;             * index of high word *
@@ -557,6 +558,16 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
    block is different from the current state. This saves a lot of time when
    the floating point unit is much slower than the fixed point units. */
 
+# ifndef libc_feholdsetround_noex_ctx
+#  define libc_feholdsetround_noex_ctx libc_feholdsetround_ctx
+# endif
+# ifndef libc_feholdsetround_noexf_ctx
+#  define libc_feholdsetround_noexf_ctx libc_feholdsetroundf_ctx
+# endif
+# ifndef libc_feholdsetround_noexl_ctx
+#  define libc_feholdsetround_noexl_ctx libc_feholdsetroundl_ctx
+# endif
+
 # ifndef libc_feresetround_noex_ctx
 #  define libc_feresetround_noex_ctx libc_fesetenv_ctx
 # endif
@@ -567,24 +578,80 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
 #  define libc_feresetround_noexl_ctx libc_fesetenvl_ctx
 # endif
 
-# ifndef libc_feholdsetround_53bit_ctx
-#  define libc_feholdsetround_53bit_ctx libc_feholdsetround_ctx
-# endif
+#else
 
-# ifndef libc_feresetround_53bit_ctx
-#  define libc_feresetround_53bit_ctx libc_feresetround_ctx
-# endif
+/* Default implementation using standard fenv functions.
+   Avoid unnecessary rounding mode changes by first checking the
+   current rounding mode. Note the use of __glibc_unlikely is
+   important for performance. */
 
-# define SET_RESTORE_ROUND_GENERIC(RM,ROUNDFUNC,CLEANUPFUNC) \
-  struct rm_ctx ctx __attribute__((cleanup(CLEANUPFUNC ## _ctx))); \
-  ROUNDFUNC ## _ctx (&ctx, (RM))
-#else
-# define SET_RESTORE_ROUND_GENERIC(RM, ROUNDFUNC, CLEANUPFUNC) \
-  fenv_t __libc_save_rm __attribute__((cleanup(CLEANUPFUNC))); \
-  ROUNDFUNC (&__libc_save_rm, (RM))
+static __always_inline void
+libc_feholdsetround_ctx (struct rm_ctx *ctx, int round)
+{
+  ctx->updated_status = false;
+
+  /* Update rounding mode only if different. */
+  if (__glibc_unlikely (round != get_rounding_mode ()))
+    {
+      ctx->updated_status = true;
+      fegetenv (&ctx->env);
+      fesetround (round);
+    }
+}
+
+static __always_inline void
+libc_feresetround_ctx (struct rm_ctx *ctx)
+{
+  /* Restore the rounding mode if updated. */
+  if (__glibc_unlikely (ctx->updated_status))
+    feupdateenv (&ctx->env);
+}
+
+static __always_inline void
+libc_feholdsetround_noex_ctx (struct rm_ctx *ctx, int round)
+{
+  /* Save exception flags and rounding mode. */
+  fegetenv (&ctx->env);
+
+  /* Update rounding mode only if different. */
+  if (__glibc_unlikely (round != get_rounding_mode ()))
+    fesetround (round);
+}
+
+static __always_inline void
+libc_feresetround_noex_ctx (struct rm_ctx *ctx)
+{
+  /* Restore exception flags and rounding mode. */
+  fesetenv (&ctx->env);
+}
+
+# define libc_feholdsetroundf_ctx libc_feholdsetround_ctx
+# define libc_feholdsetroundl_ctx libc_feholdsetround_ctx
+# define libc_feresetroundf_ctx libc_feresetround_ctx
+# define libc_feresetroundl_ctx libc_feresetround_ctx
+
+# define libc_feholdsetround_noexf_ctx libc_feholdsetround_noex_ctx
+# define libc_feholdsetround_noexl_ctx libc_feholdsetround_noex_ctx
+# define libc_feresetround_noexf_ctx libc_feresetround_noex_ctx
+# define libc_feresetround_noexl_ctx libc_feresetround_noex_ctx
+
+#endif
+
+#ifndef libc_feholdsetround_53bit_ctx
+# define libc_feholdsetround_53bit_ctx libc_feholdsetround_ctx
 #endif
 
+#ifndef libc_feresetround_53bit_ctx
+# define libc_feresetround_53bit_ctx libc_feresetround_ctx
+#endif
+
+#define SET_RESTORE_ROUND_GENERIC(RM,ROUNDFUNC,CLEANUPFUNC) \
+  struct rm_ctx ctx __attribute__((cleanup (CLEANUPFUNC ## _ctx))); \
+  ROUNDFUNC ## _ctx (&ctx, (RM))
-/* Save and restore the rounding mode within a lexical block. */
+/* Set the rounding mode within a lexical block. Restore the rounding mode to
+   the value at the start of the block. The exception mode must be preserved.
+   Exceptions raised within the block must be set in the exception flags.
+   Non-stop mode may be enabled inside the block. */
 
 #define SET_RESTORE_ROUND(RM) \
   SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetround, libc_feresetround)
@@ -593,15 +660,18 @@ default_libc_feupdateenv_test (fenv_t *e, int ex)
 #define SET_RESTORE_ROUNDL(RM) \
   SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundl, libc_feresetroundl)
 
-/* Save and restore the rounding mode within a lexical block, and also
-   the set of exceptions raised within the block may be discarded. */
-
-#define SET_RESTORE_ROUND_NOEX(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetround, libc_feresetround_noex)
-#define SET_RESTORE_ROUND_NOEXF(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundf, libc_feresetround_noexf)
-#define SET_RESTORE_ROUND_NOEXL(RM) \
-  SET_RESTORE_ROUND_GENERIC (RM, libc_feholdsetroundl, libc_feresetround_noexl)
+/* Set the rounding mode within a lexical block. Restore the rounding mode to
+   the value at the start of the block. The exception mode must be preserved.
+   Exceptions raised within the block must be discarded, and exception flags
+   are restored to the value at the start of the block.
+   Non-stop mode may be enabled inside the block. */
+
+#define SET_RESTORE_ROUND_NOEX(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noex, libc_feresetround_noex)
+#define SET_RESTORE_ROUND_NOEXF(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noexf, libc_feresetround_noexf)
+#define SET_RESTORE_ROUND_NOEXL(RM) SET_RESTORE_ROUND_GENERIC (RM, \
+  libc_feholdsetround_noexl, libc_feresetround_noexl)
 
 /* Like SET_RESTORE_ROUND, but also set rounding precision to 53 bits. */
 #define SET_RESTORE_ROUND_53BIT(RM) \