From patchwork Tue Aug 20 21:19:42 2019
X-Patchwork-Submitter: "Paul A. Clarke"
X-Patchwork-Id: 34198
From: "Paul A. Clarke"
To: libc-alpha@sourceware.org
Cc: tuliom@ascii.art.br, murphyp@linux.ibm.com
Subject: [PATCH 1/4] [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses
Date: Tue, 20 Aug 2019 16:19:42 -0500
Message-Id: <1566335985-14601-2-git-send-email-pc@us.ibm.com>
In-Reply-To: <1566335985-14601-1-git-send-email-pc@us.ibm.com>
References: <1566335985-14601-1-git-send-email-pc@us.ibm.com>

Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating-Point
Status and Control Register (FPSCR), the lighter-weight 'mffsl' instruction
can be used to read the FPSCR (enables and rounding mode), and
'mtfsf 0b00000011' can be used to write just those bits back to the FPSCR.
The net result is better performance.

In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it,
or they determine that it doesn't need to be written because it is not
changing.  In either case, the local variable holds the current values of
the enable bits in the FPSCR.  This local variable can be used instead of
again reading the FPSCR.

Also, the value of the FPSCR which is read the second time is validated
against the requested enables.  Since the write can't fail, this validation
step is unnecessary, and can be removed.  Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set.  For example, FE_INVALID_SQRT can be
requested, but cannot actually be set.  This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.

Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.

2019-08-20  Paul A. Clarke

	* sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): New.
	(FPSCR_FPRF_MASK): New.
	(FPSCR_STATUS_MASK): New.
	* sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use lighter-
	weight access to FPSCR; remove unnecessary second FPSCR read and
	validate.
	* sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
	* sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Use lighter-weight
	access to FPSCR; Use macros in fenv_libc.h in favor of local.
---
v2:
- Address issue raised by Paul Murphy.  If the specified set of
  exceptions cannot be enabled (or disabled), then the function will
  return failure.
- The current version of the code will enable (or disable) what it can
  _and_ return failure.  This version will just return failure.

 sysdeps/powerpc/fpu/fedisblxcpt.c | 14 ++++++++------
 sysdeps/powerpc/fpu/feenablxcpt.c | 15 ++++++++-------
 sysdeps/powerpc/fpu/fenv_libc.h   | 10 +++++++++-
 sysdeps/powerpc/fpu/fesetmode.c   | 15 +++++----------
 4 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index 5cc8799..a2b7add 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -26,23 +26,25 @@ fedisableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l &= ~ fenv_exceptions_to_reg (excepts);
+  fe.l &= ~new;
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new == 0 && result != 0)
     (void)__fe_mask_env ();
 
-  if ((new & excepts) != 0)
-    result = -1;
   return result;
 }
 
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index 3b64398..c06a7fd 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -26,24 +26,25 @@ feenableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l |= fenv_exceptions_to_reg (excepts);
+  fe.l |= new;
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new != 0 && result == 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((new & excepts) != excepts)
-    result = -1;
-
   return result;
 }
 
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 853239f..8ba4832 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -70,6 +70,11 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __builtin_mtfsf (0xff, d); \
   } while(0)
 
+/* Set the last 2 nibbles of the FPSCR, which contain the
+   exception enables and the rounding mode.
+   'fegetenv_status' retrieves these bits by reading the FPSCR.  */
+#define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
+
 /* This very handy macro:
    - Sets the rounding mode to 'round to nearest';
    - Sets the processor into IEEE mode; and
@@ -206,8 +211,11 @@ enum {
   (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
 #define FPSCR_BASIC_EXCEPTIONS_MASK \
   (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
-
+#define FPSCR_FPRF_MASK \
+  (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
+   FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
 #define FPSCR_CONTROL_MASK (FPSCR_ENABLES_MASK|FPSCR_NI_MASK|FPSCR_RN_MASK)
+#define FPSCR_STATUS_MASK (FPSCR_FR_MASK|FPSCR_FI_MASK|FPSCR_FPRF_MASK)
 
 /* The bits in the FENV(1) ABI for exceptions correspond one-to-one with bits
    in the FPSCR, albeit shifted to different but corresponding locations.
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 4f4f71a..e92559b 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -19,11 +19,6 @@
 #include
 #include
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM \
-		       | _FPU_MASK_XM | _FPU_MASK_IM)
-
-#define FPU_STATUS 0xbffff700ULL
-
 int
 fesetmode (const femode_t *modep)
 {
@@ -32,18 +27,18 @@ fesetmode (const femode_t *modep)
   /* Logic regarding enabled exceptions as in fesetenv.  */
 
   new.fenv = *modep;
-  old.fenv = fegetenv_register ();
-  new.l = (new.l & ~FPU_STATUS) | (old.l & FPU_STATUS);
+  old.fenv = fegetenv_status ();
+  new.l = (new.l & ~FPSCR_STATUS_MASK) | (old.l & FPSCR_STATUS_MASK);
 
   if (old.l == new.l)
     return 0;
 
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
+  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
+  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
     (void) __fe_mask_env ();
 
-  fesetenv_register (new.fenv);
+  fesetenv_mode (new.fenv);
   return 0;
 }
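
Illustrative note (not part of the patch): the round-trip validation described in the commit message can be sketched in portable C as below. This is only a simulation under stated assumptions. Every DEMO_* constant and demo_* function is hypothetical, the bit values are not the ones glibc uses for FE_* or the FPSCR on powerpc, and a plain variable stands in for the register, so no 'mffsl' or 'mtfsf' is issued.

```c
/* Hypothetical exception flags, for illustration only.  These are NOT
   the values glibc assigns to FE_* on powerpc.  */
enum
{
  DEMO_FE_INEXACT      = 0x01,
  DEMO_FE_DIVBYZERO    = 0x02,
  DEMO_FE_UNDERFLOW    = 0x04,
  DEMO_FE_OVERFLOW     = 0x08,
  DEMO_FE_INVALID      = 0x10,
  DEMO_FE_INVALID_SQRT = 0x20	/* Requestable, but has no enable bit.  */
};

/* Flags that have a matching enable bit in the simulated register.  */
#define DEMO_MAPPABLE 0x1f
/* Assumed position of the enable bits in the simulated register.  */
#define DEMO_ENABLE_SHIFT 8
#define DEMO_ENABLES_MASK ((unsigned long long) DEMO_MAPPABLE << DEMO_ENABLE_SHIFT)

/* Stand-in for the mode bits of the FPSCR.  */
static unsigned long long demo_reg;

/* Map exception flags to register enable bits; unmappable flags are
   silently dropped, which is what the round trip later detects.  */
static unsigned long long
demo_exceptions_to_reg (int excepts)
{
  return (unsigned long long) (excepts & DEMO_MAPPABLE) << DEMO_ENABLE_SHIFT;
}

/* Map register enable bits back to exception flags.  */
static int
demo_reg_to_exceptions (unsigned long long reg)
{
  return (int) ((reg & DEMO_ENABLES_MASK) >> DEMO_ENABLE_SHIFT);
}

/* Enable EXCEPTS.  Returns the previously enabled set, or -1 (leaving
   the register untouched) if any requested flag cannot be enabled.  */
static int
demo_enableexcept (int excepts)
{
  int result = demo_reg_to_exceptions (demo_reg);
  unsigned long long new_bits = demo_exceptions_to_reg (excepts);

  /* Round trip: a flag lost in the mapping cannot be set.  */
  if (demo_reg_to_exceptions (new_bits) != excepts)
    return -1;

  demo_reg |= new_bits;
  return result;
}

/* Disable EXCEPTS, with the same validation and return convention.  */
static int
demo_disableexcept (int excepts)
{
  int result = demo_reg_to_exceptions (demo_reg);
  unsigned long long new_bits = demo_exceptions_to_reg (excepts);

  if (demo_reg_to_exceptions (new_bits) != excepts)
    return -1;

  demo_reg &= ~new_bits;
  return result;
}
```

In this sketch, a request containing DEMO_FE_INVALID_SQRT fails before the simulated register is modified, which mirrors the v2 behaviour of returning failure without enabling (or disabling) anything. A request containing only mappable flags succeeds and returns the previously enabled set.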