From patchwork Mon Dec 27 19:05:22 2021
X-Patchwork-Submitter: Daniel Engel <gnu@danielengel.com>
X-Patchwork-Id: 49309
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel <gnu@danielengel.com>, Christophe Lyon
Subject: [PATCH v6 26/34] Import float addition and subtraction from the CM0 library
Date: Mon, 27 Dec 2021 11:05:22 -0800
Message-Id: <20211227190530.3136549-27-gnu@danielengel.com>
In-Reply-To: <20211227190530.3136549-1-gnu@danielengel.com>
References: <20211227190530.3136549-1-gnu@danielengel.com>
X-Mailer: git-send-email 2.25.1

Since this is the first import of single-precision functions, some common
parsing and formatting routines are also included.  These common routines
will be referenced by other functions in subsequent commits.  However, even
if their size penalty is accounted entirely to __addsf3(), the total
compiled size is still less than half the size of the soft-float
implementation.

gcc/libgcc/ChangeLog:
2021-01-13  Daniel Engel  <gnu@danielengel.com>

	* config/arm/eabi/fadd.S (__addsf3, __subsf3): Added new functions.
	* config/arm/eabi/fneg.S (__negsf2): Added new file.
	* config/arm/eabi/futil.S (__fp_normalize2, __fp_lalign2,
	__fp_assemble, __fp_overflow, __fp_zero, __fp_check_nan):
	Added new file with shared helper functions.
	* config/arm/lib1funcs.S: #include eabi/fneg.S and eabi/futil.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_addsf3, _arm_frsubsf3,
	_fp_exceptionf, _fp_checknanf, _fp_assemblef, and _fp_normalizef.
---
 libgcc/config/arm/eabi/fadd.S  | 306 +++++++++++++++++++++++-
 libgcc/config/arm/eabi/fneg.S  |  76 ++++++
 libgcc/config/arm/eabi/fplib.h |   3 -
 libgcc/config/arm/eabi/futil.S | 418 +++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S  |   2 +
 libgcc/config/arm/t-elf        |   6 +
 6 files changed, 798 insertions(+), 13 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/futil.S

diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S
index fffbd91d1bc..77b81d62b3b 100644
--- a/libgcc/config/arm/eabi/fadd.S
+++ b/libgcc/config/arm/eabi/fadd.S
@@ -1,5 +1,7 @@
-/* Copyright (C) 2006-2021 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
+/* fadd.S: Thumb-1 optimized 32-bit float addition and subtraction
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
 
    This file is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -21,18 +23,302 @@
    .  */
 
 
+#ifdef L_arm_frsubsf3
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_frsub .text.sorted.libgcc.fpcore.b.frsub
+        CFI_START_FUNCTION
+
+  #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r0 is NAN before modifying.
+ lsls r2, r0, #1 + movs r3, #255 + lsls r3, #24 + + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and run through fadd(). + movs r2, #1 + lsls r2, #31 + adds r0, r2 + b SYM(__aeabi_fadd) + + CFI_END_FUNCTION +FUNC_END aeabi_frsub + +#endif /* L_arm_frsubsf3 */ + + #ifdef L_arm_addsubsf3 -FUNC_START aeabi_frsub +// float __aeabi_fsub(float, float) +// Returns the floating point difference of $r0 - $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fsub .text.sorted.libgcc.fpcore.c.faddsub +FUNC_ALIAS subsf3 aeabi_fsub + CFI_START_FUNCTION - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} + #if defined(STRICT_NANS) && STRICT_NANS + // Check if $r1 is NAN before modifying. + lsls r2, r1, #1 + movs r3, #255 + lsls r3, #24 - FUNC_END aeabi_frsub + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and fall into fadd(). + movs r2, #1 + lsls r2, #31 + adds r1, r2 #endif /* L_arm_addsubsf3 */ + +// The execution of __subsf3() flows directly into __addsf3(), such that +// instructions must appear consecutively in the same memory section. +// However, this construction inhibits the ability to discard __subsf3() +// when only using __addsf3(). +// Therefore, this block configures __addsf3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __subsf3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS. +#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) + +#ifdef L_arm_addsf3 +// float __aeabi_fadd(float, float) +// Returns the floating point sum of $r0 + $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +WEAK_START_SECTION aeabi_fadd .text.sorted.libgcc.fpcore.c.fadd +WEAK_ALIAS addsf3 aeabi_fadd + CFI_START_FUNCTION + +#else /* L_arm_addsubsf3 */ +FUNC_ENTRY aeabi_fadd +FUNC_ALIAS addsf3 aeabi_fadd + +#endif + + // Standard registers, compatible with exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Drop the sign bit to compare absolute value. + lsls r2, r0, #1 + lsls r3, r1, #1 + + // Save the logical difference of original values. + // This actually makes the following swap slightly faster. + eors r1, r0 + + // Compare exponents+mantissa. + // MAYBE: Speedup for equal values? This would have to separately + // check for NAN/INF and then either: + // * Increase the exponent by '1' (for multiply by 2), or + // * Return +0 + cmp r2, r3 + bhs LLSYM(__fadd_ordered) + + // Reorder operands so the larger absolute value is in r2, + // the corresponding original operand is in $r0, + // and the smaller absolute value is in $r3. + movs r3, r2 + eors r0, r1 + lsls r2, r0, #1 + + LLSYM(__fadd_ordered): + // Extract the exponent of the larger operand. + // If INF/NAN, then it becomes an automatic result. + lsrs r2, #24 + cmp r2, #255 + beq LLSYM(__fadd_special) + + // Save the sign of the result. 
+ lsrs rT, r0, #31 + lsls rT, #31 + mov ip, rT + + // If the original value of $r1 was to +/-0, + // $r0 becomes the automatic result. + // Because $r0 is known to be a finite value, return directly. + // It's actually important that +/-0 not go through the normal + // process, to keep "-0 +/- 0" from being turned into +0. + cmp r3, #0 + beq LLSYM(__fadd_zero) + + // Extract the second exponent. + lsrs r3, #24 + + // Calculate the difference of exponents (always positive). + subs r3, r2, r3 + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // If the smaller operand is more than 25 bits less significant + // than the larger, the larger operand is an automatic result. + // The smaller operand can't affect the result, even after rounding. + cmp r3, #25 + bhi LLSYM(__fadd_return) + #endif + + // Isolate both mantissas, recovering the smaller. + lsls rT, r0, #9 + lsls r0, r1, #9 + eors r0, rT + + // If the larger operand is normal, restore the implicit '1'. + // If subnormal, the second operand will also be subnormal. + cmp r2, #0 + beq LLSYM(__fadd_normal) + adds rT, #1 + rors rT, rT + + // If the smaller operand is also normal, restore the implicit '1'. + // If subnormal, the smaller operand effectively remains multiplied + // by 2 w.r.t the first. This compensates for subnormal exponents, + // which are technically still -126, not -127. + cmp r2, r3 + beq LLSYM(__fadd_normal) + adds r0, #1 + rors r0, r0 + + LLSYM(__fadd_normal): + // Provide a spare bit for overflow. + // Normal values will be aligned in bits [30:7] + // Subnormal values will be aligned in bits [30:8] + lsrs rT, #1 + lsrs r0, #1 + + // If signs weren't matched, negate the smaller operand (branchless). + asrs r1, #31 + eors r0, r1 + subs r0, r1 + + // Keep a copy of the small mantissa for the remainder. + movs r1, r0 + + // Align the small mantissa for addition. + asrs r1, r3 + + // Isolate the remainder. + // NOTE: Given the various cases above, the remainder will only + // be used as a boolean for rounding ties to even. It is not + // necessary to negate the remainder for subtraction operations. + rsbs r3, #0 + adds r3, #32 + lsls r0, r3 + + // Because operands are ordered, the result will never be negative. + // If the result of subtraction is 0, the overall result must be +0. + // If the overall result in $r1 is 0, then the remainder in $r0 + // must also be 0, so no register copy is necessary on return. + adds r1, rT + beq LLSYM(__fadd_return) + + // The large operand was aligned in bits [29:7]... + // If the larger operand was normal, the implicit '1' went in bit [30]. + // + // After addition, the MSB of the result may be in bit: + // 31, if the result overflowed. + // 30, the usual case. + // 29, if there was a subtraction of operands with exponents + // differing by more than 1. + // < 28, if there was a subtraction of operands with exponents +/-1, + // < 28, if both operands were subnormal. + + // In the last case (both subnormal), the alignment shift will be 8, + // the exponent will be 0, and no rounding is necessary. + cmp r2, #0 + bne SYM(__fp_assemble) + + // Subnormal overflow automatically forms the correct exponent. + lsrs r0, r1, #8 + add r0, ip + + LLSYM(__fadd_return): + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fadd_special): + #if defined(TRAP_NANS) && TRAP_NANS + // If $r1 is (also) NAN, force it in place of $r0. + // As the smaller NAN, it is more likely to be signaling. 
+ movs rT, #255 + lsls rT, #24 + cmp r3, rT + bls LLSYM(__fadd_ordered2) + + eors r0, r1 + #endif + + LLSYM(__fadd_ordered2): + // There are several possible cases to consider here: + // 1. Any NAN/NAN combination + // 2. Any NAN/INF combination + // 3. Any NAN/value combination + // 4. INF/INF with matching signs + // 5. INF/INF with mismatched signs. + // 6. Any INF/value combination. + // In all cases but the case 5, it is safe to return $r0. + // In the special case, a new NAN must be constructed. + // First, check the mantissa to see if $r0 is NAN. + lsls r2, r0, #9 + + #if defined(TRAP_NANS) && TRAP_NANS + bne SYM(__fp_check_nan) + #else + bne LLSYM(__fadd_return) + #endif + + LLSYM(__fadd_zero): + // Next, check for an INF/value combination. + lsls r2, r1, #1 + bne LLSYM(__fadd_return) + + // Finally, check for matching sign on INF/INF. + // Also accepts matching signs when +/-0 are added. + bcc LLSYM(__fadd_return) + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(SUBTRACTED_INFINITY) + #endif + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + // Restore original operands. + eors r1, r0 + #endif + + // Identify mismatched 0. + lsls r2, r0, #1 + bne SYM(__fp_exception) + + // Force mismatched 0 to +0. + eors r0, r0 + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END addsf3 +FUNC_END aeabi_fadd + +#ifdef L_arm_addsubsf3 +FUNC_END subsf3 +FUNC_END aeabi_fsub +#endif + +#endif /* L_arm_addsf3 */ + diff --git a/libgcc/config/arm/eabi/fneg.S b/libgcc/config/arm/eabi/fneg.S new file mode 100644 index 00000000000..b92e247c3d3 --- /dev/null +++ b/libgcc/config/arm/eabi/fneg.S @@ -0,0 +1,76 @@ +/* fneg.S: Thumb-1 optimized 32-bit float negation + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_negsf2 + +// float __aeabi_fneg(float) [obsolete] +// The argument and result are in $r0. +// Uses $r1 and $r2 as scratch registers. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fneg .text.sorted.libgcc.fpcore.a.fneg +FUNC_ALIAS negsf2 aeabi_fneg + CFI_START_FUNCTION + + #if (defined(STRICT_NANS) && STRICT_NANS) || \ + (defined(TRAP_NANS) && TRAP_NANS) + // Check for NAN. + lsls r1, r0, #1 + movs r2, #255 + lsls r2, #24 + cmp r1, r2 + + #if defined(TRAP_NANS) && TRAP_NANS + blo SYM(__fneg_nan) + #else + blo LLSYM(__fneg_return) + #endif + #endif + + // Flip the sign. + movs r1, #1 + lsls r1, #31 + eors r0, r1 + + LLSYM(__fneg_return): + RET + + #if defined(TRAP_NANS) && TRAP_NANS + LLSYM(__fneg_nan): + // Set up registers for exception handling. 
+ push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + b SYM(fp_check_nan) + #endif + + CFI_END_FUNCTION +FUNC_END negsf2 +FUNC_END aeabi_fneg + +#endif /* L_arm_negsf2 */ + diff --git a/libgcc/config/arm/eabi/fplib.h b/libgcc/config/arm/eabi/fplib.h index ca22d3db8e3..c1924a37ab3 100644 --- a/libgcc/config/arm/eabi/fplib.h +++ b/libgcc/config/arm/eabi/fplib.h @@ -45,9 +45,6 @@ /* Push extra registers when required for 64-bit stack alignment */ #define DOUBLE_ALIGN_STACK (1) -/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */ -#define PEDANTIC_DIV0 (1) - /* Define various exception codes. These don't map to anything in particular */ #define SUBTRACTED_INFINITY (20) #define INFINITY_TIMES_ZERO (21) diff --git a/libgcc/config/arm/eabi/futil.S b/libgcc/config/arm/eabi/futil.S new file mode 100644 index 00000000000..923c1c28b41 --- /dev/null +++ b/libgcc/config/arm/eabi/futil.S @@ -0,0 +1,418 @@ +/* futil.S: Thumb-1 optimized 32-bit float helper functions + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +// These helper functions are exported in distinct object files to keep +// the linker from importing unused code. +// These helper functions do NOT follow AAPCS register conventions. + + +#ifdef L_fp_normalizef + +// Internal function, decomposes the unsigned float in $r2. +// The exponent will be returned in $r2, the mantissa in $r3. +// If subnormal, the mantissa will be normalized, so that +// the MSB of the mantissa (if any) will be aligned at bit[31]. +// Preserves $r0 and $r1, uses $rT as scratch space. +FUNC_START_SECTION fp_normalize2 .text.sorted.libgcc.fpcore.y.alignf + CFI_START_FUNCTION + + // Extract the mantissa. + lsls r3, r2, #8 + + // Extract the exponent. + lsrs r2, #24 + beq SYM(__fp_lalign2) + + // Restore the mantissa's implicit '1'. + adds r3, #1 + rors r3, r3 + + RET + + CFI_END_FUNCTION +FUNC_END fp_normalize2 + + +// Internal function, aligns $r3 so the MSB is aligned in bit[31]. +// Simultaneously, subtracts the shift from the exponent in $r2 +FUNC_ENTRY fp_lalign2 + CFI_START_FUNCTION + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Unroll the loop, similar to __clzsi2(). + lsrs rT, r3, #16 + bne LLSYM(__align8) + subs r2, #16 + lsls r3, #16 + + LLSYM(__align8): + lsrs rT, r3, #24 + bne LLSYM(__align4) + subs r2, #8 + lsls r3, #8 + + LLSYM(__align4): + lsrs rT, r3, #28 + bne LLSYM(__align2) + subs r2, #4 + lsls r3, #4 + #endif + + LLSYM(__align2): + // Refresh the state of the N flag before entering the loop. 
+ tst r3, r3 + + LLSYM(__align_loop): + // Test before subtracting to compensate for the natural exponent. + // The largest subnormal should have an exponent of 0, not -1. + bmi LLSYM(__align_return) + subs r2, #1 + lsls r3, #1 + bne LLSYM(__align_loop) + + // Not just a subnormal... 0! By design, this should never happen. + // All callers of this internal function filter 0 as a special case. + // Was there an uncontrolled jump from somewhere else? Cosmic ray? + eors r2, r2 + + #ifdef DEBUG + bkpt #0 + #endif + + LLSYM(__align_return): + RET + + CFI_END_FUNCTION +FUNC_END fp_lalign2 + +#endif /* L_fp_normalizef */ + + +#ifdef L_fp_assemblef + +// Internal function to combine mantissa, exponent, and sign. No return. +// Expects the unsigned result in $r1. To avoid underflow (slower), +// the MSB should be in bits [31:29]. +// Expects any remainder bits of the unrounded result in $r0. +// Expects the exponent in $r2. The exponent must be relative to bit[30]. +// Expects the sign of the result (and only the sign) in $ip. +// Returns a correctly rounded floating value in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_assemble .text.sorted.libgcc.fpcore.g.assemblef + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Examine the upper three bits [31:29] for underflow. + lsrs r3, r1, #29 + beq LLSYM(__fp_underflow) + + // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }. + // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'. + lsrs r3, r1, #1 + mvns r3, r3 + ands r3, r1 + lsrs r3, #30 + subs r3, #2 + rors r1, r3 + + // Update the exponent, assuming the final result will be normal. + // The new exponent is 1 less than actual, to compensate for the + // eventual addition of the implicit '1' in the result. + // If the final exponent becomes negative, proceed directly to gradual + // underflow, without bothering to search for the MSB. + adds r2, r3 + + FUNC_ENTRY fp_assemble2 + bmi LLSYM(__fp_subnormal) + + LLSYM(__fp_normal): + // Check for overflow (remember the implicit '1' to be added later). + cmp r2, #254 + bge SYM(__fp_overflow) + + // Save LSBs for the remainder. Position doesn't matter any more, + // these are just tiebreakers for round-to-even. + lsls rT, r1, #25 + + // Align the final result. + lsrs r1, #8 + + LLSYM(__fp_round): + // If carry bit is '0', always round down. + bcc LLSYM(__fp_return) + + // The carry bit is '1'. Round to nearest, ties to even. + // If either the saved remainder bits [6:0], the additional remainder + // bits in $r1, or the final LSB is '1', round up. + lsls r3, r1, #31 + orrs r3, rT + orrs r3, r0 + beq LLSYM(__fp_return) + + // If rounding up overflows, then the mantissa result becomes 2.0, + // which yields the correct return value up to and including INF. + adds r1, #1 + + LLSYM(__fp_return): + // Combine the mantissa and the exponent. + lsls r2, #23 + adds r0, r1, r2 + + // Combine with the saved sign. + // End of library call, return to user. + add r0, ip + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: Underflow/inexact reporting IFF remainder + #endif + + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fp_underflow): + // Set up to align the mantissa. + movs r3, r1 + bne LLSYM(__fp_underflow2) + + // MSB wasn't in the upper 32 bits, check the remainder. + // If the remainder is also zero, the result is +/-0. 
+ movs r3, r0 + beq SYM(__fp_zero) + + eors r0, r0 + subs r2, #32 + + LLSYM(__fp_underflow2): + // Save the pre-alignment exponent to align the remainder later. + movs r1, r2 + + // Align the mantissa with the MSB in bit[31]. + bl SYM(__fp_lalign2) + + // Calculate the actual remainder shift. + subs rT, r1, r2 + + // Align the lower bits of the remainder. + movs r1, r0 + lsls r0, rT + + // Combine the upper bits of the remainder with the aligned value. + rsbs rT, #0 + adds rT, #32 + lsrs r1, rT + adds r1, r3 + + // The MSB is now aligned at bit[31] of $r1. + // If the net exponent is still positive, the result will be normal. + // Because this function is used by fmul(), there is a possibility + // that the value is still wider than 24 bits; always round. + tst r2, r2 + bpl LLSYM(__fp_normal) + + LLSYM(__fp_subnormal): + // The MSB is aligned at bit[31], with a net negative exponent. + // The mantissa will need to be shifted right by the absolute value of + // the exponent, plus the normal shift of 8. + + // If the negative shift is smaller than -25, there is no result, + // no rounding, no anything. Return signed zero. + // (Otherwise, the shift for result and remainder may wrap.) + adds r2, #25 + bmi SYM(__fp_inexact_zero) + + // Save the extra bits for the remainder. + movs rT, r1 + lsls rT, r2 + + // Shift the mantissa to create a subnormal. + // Just like normal, round to nearest, ties to even. + movs r3, #33 + subs r3, r2 + eors r2, r2 + + // This shift must be last, leaving the shifted LSB in the C flag. + lsrs r1, r3 + b LLSYM(__fp_round) + + CFI_END_FUNCTION +FUNC_END fp_assemble + + +// Recreate INF with the appropriate sign. No return. +// Expects the sign of the result in $ip. +FUNC_ENTRY fp_overflow + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/overflow exception + #endif + + FUNC_ENTRY fp_infinity + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + movs r0, #255 + lsls r0, #23 + add r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_overflow + + +// Recreate 0 with the appropriate sign. No return. +// Expects the sign of the result in $ip. +FUNC_ENTRY fp_inexact_zero + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/underflow exception + #endif + +FUNC_ENTRY fp_zero + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Return 0 with the correct sign. + mov r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_zero +FUNC_END fp_inexact_zero + +#endif /* L_fp_assemblef */ + + +#ifdef L_fp_checknanf + +// Internal function to detect signaling NANs. No return. +// Uses $r2 as scratch space. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_check_nan2 .text.sorted.libgcc.fpcore.j.checkf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + + FUNC_ENTRY fp_check_nan + + // Check for quiet NAN. + lsrs r2, r0, #23 + bcs LLSYM(__quiet_nan) + + // Raise exception. Preserves both $r0 and $r1. + svc #(SVC_TRAP_NAN) + + // Quiet the resulting NAN. + movs r2, #1 + lsls r2, #22 + orrs r0, r2 + + LLSYM(__quiet_nan): + // End of library call, return to user. 
+ pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_check_nan +FUNC_END fp_check_nan2 + +#endif /* L_fp_checknanf */ + + +#ifdef L_fp_exceptionf + +// Internal function to report floating point exceptions. No return. +// Expects the original argument(s) in $r0 (possibly also $r1). +// Expects a code that describes the exception in $r3. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_exception .text.sorted.libgcc.fpcore.k.exceptf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Create a quiet NAN. + movs r2, #255 + lsls r2, #1 + adds r2, #1 + lsls r2, #22 + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + // Annotate the exception type in the NAN field. + // Make sure that the exception is in the valid region + lsls rT, r3, #13 + orrs r2, rT + #endif + + // Exception handler that expects the result already in $r2, + // typically when the result is not going to be NAN. + FUNC_ENTRY fp_exception2 + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + svc #(SVC_FP_EXCEPTION) + #endif + + // TODO: Save exception flags in a static variable. + + // Set up the result, now that the argument isn't required any more. + movs r0, r2 + + // HACK: for sincosf(), with 2 parameters to return. + movs r1, r2 + + // End of library call, return to user. + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_exception2 +FUNC_END fp_exception + +#endif /* L_arm_exception */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 31132633f32..6c3f29b71e2 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2012,7 +2012,9 @@ LSYM(Lchange_\register): #include "bpabi-v6m.S" #include "eabi/fplib.h" #include "eabi/fcmp.S" +#include "eabi/fneg.S" #include "eabi/fadd.S" +#include "eabi/futil.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index e69579e16dd..c57d9ef50ac 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -32,6 +32,7 @@ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) LIB1ASMFUNCS += \ _internal_cmpsf2 \ _muldi3 \ + _arm_addsf3 \ endif @@ -95,6 +96,11 @@ LIB1ASMFUNCS += \ _arm_fcmpne \ _arm_eqsf2 \ _arm_gesf2 \ + _arm_frsubsf3 \ + _fp_exceptionf \ + _fp_checknanf \ + _fp_assemblef \ + _fp_normalizef \ endif
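The sketches that follow are editorial additions, not part of the patch: they restate a few of the techniques above in plain C for readers who do not want to trace the Thumb-1 sequences.  Every helper name in them (float_bits, sketch_fsub, sketch_frsub, demo_addsf3, lalign, round_nearest_even, nan_with_code) is invented for illustration.

First, __aeabi_fsub and __aeabi_frsub above do no arithmetic of their own: they toggle the sign bit of one operand, unless STRICT_NANS is set and that operand is a NAN, and then hand the pair to __aeabi_fadd.  A minimal C equivalent, assuming IEEE-754 binary32 floats:

#include <stdint.h>
#include <string.h>

static uint32_t float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);    /* type-pun without aliasing problems */
  return u;
}

static float bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* The STRICT_NANS guard: drop the sign, then anything above 0xFF000000
   (all-ones exponent, non-zero mantissa) is a NAN.  */
static int is_nan_bits (uint32_t u)
{
  return (uint32_t) (u << 1) > 0xFF000000u;
}

/* a - b: flip the sign of the subtrahend and delegate to addition.
   A NAN operand is left untouched so the addition path can find it.  */
float sketch_fsub (float a, float b)
{
  uint32_t ub = float_bits (b);
  if (!is_nan_bits (ub))
    ub ^= 0x80000000u;          /* toggle the sign bit */
  return a + bits_float (ub);
}

/* b - a, the __aeabi_frsub argument order.  */
float sketch_frsub (float a, float b)
{
  uint32_t ua = float_bits (a);
  if (!is_nan_bits (ua))
    ua ^= 0x80000000u;
  return bits_float (ua) + b;
}

On a soft-float target the '+' in these functions lowers to a call to __aeabi_fadd, which is exactly the tail call the assembly performs.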
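Second, the comment block ahead of __aeabi_fadd explains why that function is assembled twice: __subsf3 falls through into __addsf3, so the combined object must keep them contiguous, while a separate weak standalone copy lets a program that never subtracts link __addsf3 without dragging in __subsf3.  The same linking behaviour can be sketched with two ordinary C translation units (shown together below; the file split and symbol names are hypothetical):

/* weak_add.c -- analogue of the standalone L_arm_addsf3 object.
   An ELF weak definition survives only if nothing stronger is linked.  */
float __attribute__ ((weak)) demo_addsf3 (float a, float b)
{
  return a + b;                 /* minimal standalone version */
}

/* strong_addsub.c -- analogue of the combined L_arm_addsubsf3 object:
   a strong demo_addsf3 that supersedes the weak one, plus the
   subtraction entry point that relies on it.  */
float demo_addsf3 (float a, float b)
{
  return a + b;                 /* full version shared with subtraction */
}

float demo_subsf3 (float a, float b)
{
  return demo_addsf3 (a, -b);   /* subtraction delegates to addition */
}

Linking a caller against only the first object uses the weak copy; linking both objects lets the strong definition take precedence without a multiple-definition error.  The note about keeping '_arm_addsf3' ahead of '_arm_addsubsf3' in LIB1ASMFUNCS addresses the same concern once these objects are collected into libgcc.a.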
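Third, __fp_lalign2 in futil.S normalizes a mantissa by shifting its most significant set bit up to bit 31 and charging every shift against the exponent; the assembly unrolls 16-, 8- and 4-bit steps in the style of __clzsi2 before finishing with a loop.  Functionally it reduces to the following, assuming (as the comments above require) that the caller has already filtered out a zero mantissa:

#include <stdint.h>

/* Shift a non-zero mantissa left until its MSB sits in bit 31 and
   subtract the shift count from the exponent.  With a zero mantissa
   this loop would not terminate, which is why the assembly treats
   that case as an internal error.  */
static int lalign (uint32_t *mantissa, int exponent)
{
  while (!(*mantissa & 0x80000000u))
    {
      *mantissa <<= 1;
      exponent -= 1;
    }
  return exponent;
}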
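Fourth, the rounding step shared through __fp_assemble: after the mantissa is shifted into place, the first discarded bit acts as the guard bit, and the result is rounded to nearest with ties to even by consulting the remaining discarded bits, any "sticky" remainder collected earlier, and the low bit of the kept value.  A C restatement, using the same 8-bit alignment as the assembly:

#include <stdint.h>

/* 'mant' holds the result mantissa with 8 surplus low bits still
   attached; 'sticky' is non-zero if any earlier discarded bits were
   non-zero.  Returns the rounded value.  */
static uint32_t round_nearest_even (uint32_t mant, uint32_t sticky)
{
  uint32_t result = mant >> 8;               /* the bits to keep       */
  uint32_t guard  = (mant >> 7) & 1;         /* first discarded bit    */
  uint32_t rest   = (mant & 0x7F) | (sticky != 0);

  /* Round up when the guard bit is set, unless this is an exact tie
     and the kept value is already even.  */
  if (guard && (rest || (result & 1)))
    result += 1;    /* in the real routine a carry out of the mantissa
                       propagates into the exponent field, which still
                       encodes the correct value up to infinity */
  return result;
}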
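Finally, when EXCEPTION_CODES is enabled, __fp_exception reports an error by building a quiet NAN whose payload carries a small diagnostic code (for example SUBTRACTED_INFINITY, 20, from fplib.h), shifted to bit 13 so it stays inside the low 22 mantissa bits.  In C terms:

#include <stdint.h>

/* Quiet NAN with a diagnostic code in its payload, as __fp_exception
   assembles it: ((255 << 1) + 1) << 22 is 0x7FC00000, i.e. an all-ones
   exponent plus the quiet bit.  'code' must stay below 512 so that
   code << 13 remains inside the mantissa field.  */
static uint32_t nan_with_code (uint32_t code)
{
  return (((255u << 1) + 1) << 22) | (code << 13);
}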