From patchwork Mon Dec 27 19:05:22 2021
X-Patchwork-Submitter: Daniel Engel <gnu@danielengel.com>
X-Patchwork-Id: 49309
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel <gnu@danielengel.com>, Christophe Lyon
Subject: [PATCH v6 26/34] Import float addition and subtraction from the CM0 library
Date: Mon, 27 Dec 2021 11:05:22 -0800
Message-Id: <20211227190530.3136549-27-gnu@danielengel.com>
In-Reply-To: <20211227190530.3136549-1-gnu@danielengel.com>
References: <20211227190530.3136549-1-gnu@danielengel.com>
X-Mailer: git-send-email 2.25.1

Since this is the first import of single-precision functions, some common
parsing and formatting routines are also included.  These common routines
will be referenced by other functions in subsequent commits.  However, even
if their size penalty is accounted entirely to __addsf3(), the total
compiled size is still less than half the size of the soft-float
implementation.

gcc/libgcc/ChangeLog:
2021-01-13  Daniel Engel  <gnu@danielengel.com>

	* config/arm/eabi/fadd.S (__addsf3, __subsf3): Added new functions.
	* config/arm/eabi/fneg.S (__negsf2): Added new file.
	* config/arm/eabi/futil.S (__fp_normalize2, __fp_lalign2,
	__fp_assemble, __fp_overflow, __fp_zero, __fp_check_nan):
	Added new file with shared helper functions.
	* config/arm/lib1funcs.S: #include eabi/fneg.S and eabi/futil.S (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_addsf3, _arm_frsubsf3,
	_fp_exceptionf, _fp_checknanf, _fp_assemblef, and _fp_normalizef.
---
 libgcc/config/arm/eabi/fadd.S  | 306 +++++++++++++++++++++++-
 libgcc/config/arm/eabi/fneg.S  |  76 ++++++
 libgcc/config/arm/eabi/fplib.h |   3 -
 libgcc/config/arm/eabi/futil.S | 418 +++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S  |   2 +
 libgcc/config/arm/t-elf        |   6 +
 6 files changed, 798 insertions(+), 13 deletions(-)
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/futil.S

diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S
index fffbd91d1bc..77b81d62b3b 100644
--- a/libgcc/config/arm/eabi/fadd.S
+++ b/libgcc/config/arm/eabi/fadd.S
@@ -1,5 +1,7 @@
-/* Copyright (C) 2006-2021 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
+/* fadd.S: Thumb-1 optimized 32-bit float addition and subtraction
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
 
    This file is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
@@ -21,18 +23,302 @@
    .  */
 
 
+#ifdef L_arm_frsubsf3
+
+// float __aeabi_frsub(float, float)
+// Returns the floating point difference of $r1 - $r0 in $r0.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_frsub .text.sorted.libgcc.fpcore.b.frsub
+        CFI_START_FUNCTION
+
+  #if defined(STRICT_NANS) && STRICT_NANS
+        // Check if $r0 is NAN before modifying.
+ lsls r2, r0, #1 + movs r3, #255 + lsls r3, #24 + + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and run through fadd(). + movs r2, #1 + lsls r2, #31 + adds r0, r2 + b SYM(__aeabi_fadd) + + CFI_END_FUNCTION +FUNC_END aeabi_frsub + +#endif /* L_arm_frsubsf3 */ + + #ifdef L_arm_addsubsf3 -FUNC_START aeabi_frsub +// float __aeabi_fsub(float, float) +// Returns the floating point difference of $r0 - $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fsub .text.sorted.libgcc.fpcore.c.faddsub +FUNC_ALIAS subsf3 aeabi_fsub + CFI_START_FUNCTION - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} + #if defined(STRICT_NANS) && STRICT_NANS + // Check if $r1 is NAN before modifying. + lsls r2, r1, #1 + movs r3, #255 + lsls r3, #24 - FUNC_END aeabi_frsub + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and fall into fadd(). + movs r2, #1 + lsls r2, #31 + adds r1, r2 #endif /* L_arm_addsubsf3 */ + +// The execution of __subsf3() flows directly into __addsf3(), such that +// instructions must appear consecutively in the same memory section. +// However, this construction inhibits the ability to discard __subsf3() +// when only using __addsf3(). +// Therefore, this block configures __addsf3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __subsf3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS. +#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) + +#ifdef L_arm_addsf3 +// float __aeabi_fadd(float, float) +// Returns the floating point sum of $r0 + $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +WEAK_START_SECTION aeabi_fadd .text.sorted.libgcc.fpcore.c.fadd +WEAK_ALIAS addsf3 aeabi_fadd + CFI_START_FUNCTION + +#else /* L_arm_addsubsf3 */ +FUNC_ENTRY aeabi_fadd +FUNC_ALIAS addsf3 aeabi_fadd + +#endif + + // Standard registers, compatible with exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Drop the sign bit to compare absolute value. + lsls r2, r0, #1 + lsls r3, r1, #1 + + // Save the logical difference of original values. + // This actually makes the following swap slightly faster. + eors r1, r0 + + // Compare exponents+mantissa. + // MAYBE: Speedup for equal values? This would have to separately + // check for NAN/INF and then either: + // * Increase the exponent by '1' (for multiply by 2), or + // * Return +0 + cmp r2, r3 + bhs LLSYM(__fadd_ordered) + + // Reorder operands so the larger absolute value is in r2, + // the corresponding original operand is in $r0, + // and the smaller absolute value is in $r3. + movs r3, r2 + eors r0, r1 + lsls r2, r0, #1 + + LLSYM(__fadd_ordered): + // Extract the exponent of the larger operand. + // If INF/NAN, then it becomes an automatic result. + lsrs r2, #24 + cmp r2, #255 + beq LLSYM(__fadd_special) + + // Save the sign of the result. 
+ lsrs rT, r0, #31 + lsls rT, #31 + mov ip, rT + + // If the original value of $r1 was to +/-0, + // $r0 becomes the automatic result. + // Because $r0 is known to be a finite value, return directly. + // It's actually important that +/-0 not go through the normal + // process, to keep "-0 +/- 0" from being turned into +0. + cmp r3, #0 + beq LLSYM(__fadd_zero) + + // Extract the second exponent. + lsrs r3, #24 + + // Calculate the difference of exponents (always positive). + subs r3, r2, r3 + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // If the smaller operand is more than 25 bits less significant + // than the larger, the larger operand is an automatic result. + // The smaller operand can't affect the result, even after rounding. + cmp r3, #25 + bhi LLSYM(__fadd_return) + #endif + + // Isolate both mantissas, recovering the smaller. + lsls rT, r0, #9 + lsls r0, r1, #9 + eors r0, rT + + // If the larger operand is normal, restore the implicit '1'. + // If subnormal, the second operand will also be subnormal. + cmp r2, #0 + beq LLSYM(__fadd_normal) + adds rT, #1 + rors rT, rT + + // If the smaller operand is also normal, restore the implicit '1'. + // If subnormal, the smaller operand effectively remains multiplied + // by 2 w.r.t the first. This compensates for subnormal exponents, + // which are technically still -126, not -127. + cmp r2, r3 + beq LLSYM(__fadd_normal) + adds r0, #1 + rors r0, r0 + + LLSYM(__fadd_normal): + // Provide a spare bit for overflow. + // Normal values will be aligned in bits [30:7] + // Subnormal values will be aligned in bits [30:8] + lsrs rT, #1 + lsrs r0, #1 + + // If signs weren't matched, negate the smaller operand (branchless). + asrs r1, #31 + eors r0, r1 + subs r0, r1 + + // Keep a copy of the small mantissa for the remainder. + movs r1, r0 + + // Align the small mantissa for addition. + asrs r1, r3 + + // Isolate the remainder. + // NOTE: Given the various cases above, the remainder will only + // be used as a boolean for rounding ties to even. It is not + // necessary to negate the remainder for subtraction operations. + rsbs r3, #0 + adds r3, #32 + lsls r0, r3 + + // Because operands are ordered, the result will never be negative. + // If the result of subtraction is 0, the overall result must be +0. + // If the overall result in $r1 is 0, then the remainder in $r0 + // must also be 0, so no register copy is necessary on return. + adds r1, rT + beq LLSYM(__fadd_return) + + // The large operand was aligned in bits [29:7]... + // If the larger operand was normal, the implicit '1' went in bit [30]. + // + // After addition, the MSB of the result may be in bit: + // 31, if the result overflowed. + // 30, the usual case. + // 29, if there was a subtraction of operands with exponents + // differing by more than 1. + // < 28, if there was a subtraction of operands with exponents +/-1, + // < 28, if both operands were subnormal. + + // In the last case (both subnormal), the alignment shift will be 8, + // the exponent will be 0, and no rounding is necessary. + cmp r2, #0 + bne SYM(__fp_assemble) + + // Subnormal overflow automatically forms the correct exponent. + lsrs r0, r1, #8 + add r0, ip + + LLSYM(__fadd_return): + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fadd_special): + #if defined(TRAP_NANS) && TRAP_NANS + // If $r1 is (also) NAN, force it in place of $r0. + // As the smaller NAN, it is more likely to be signaling. 
+ movs rT, #255 + lsls rT, #24 + cmp r3, rT + bls LLSYM(__fadd_ordered2) + + eors r0, r1 + #endif + + LLSYM(__fadd_ordered2): + // There are several possible cases to consider here: + // 1. Any NAN/NAN combination + // 2. Any NAN/INF combination + // 3. Any NAN/value combination + // 4. INF/INF with matching signs + // 5. INF/INF with mismatched signs. + // 6. Any INF/value combination. + // In all cases but the case 5, it is safe to return $r0. + // In the special case, a new NAN must be constructed. + // First, check the mantissa to see if $r0 is NAN. + lsls r2, r0, #9 + + #if defined(TRAP_NANS) && TRAP_NANS + bne SYM(__fp_check_nan) + #else + bne LLSYM(__fadd_return) + #endif + + LLSYM(__fadd_zero): + // Next, check for an INF/value combination. + lsls r2, r1, #1 + bne LLSYM(__fadd_return) + + // Finally, check for matching sign on INF/INF. + // Also accepts matching signs when +/-0 are added. + bcc LLSYM(__fadd_return) + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(SUBTRACTED_INFINITY) + #endif + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + // Restore original operands. + eors r1, r0 + #endif + + // Identify mismatched 0. + lsls r2, r0, #1 + bne SYM(__fp_exception) + + // Force mismatched 0 to +0. + eors r0, r0 + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END addsf3 +FUNC_END aeabi_fadd + +#ifdef L_arm_addsubsf3 +FUNC_END subsf3 +FUNC_END aeabi_fsub +#endif + +#endif /* L_arm_addsf3 */ + diff --git a/libgcc/config/arm/eabi/fneg.S b/libgcc/config/arm/eabi/fneg.S new file mode 100644 index 00000000000..b92e247c3d3 --- /dev/null +++ b/libgcc/config/arm/eabi/fneg.S @@ -0,0 +1,76 @@ +/* fneg.S: Thumb-1 optimized 32-bit float negation + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_negsf2 + +// float __aeabi_fneg(float) [obsolete] +// The argument and result are in $r0. +// Uses $r1 and $r2 as scratch registers. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fneg .text.sorted.libgcc.fpcore.a.fneg +FUNC_ALIAS negsf2 aeabi_fneg + CFI_START_FUNCTION + + #if (defined(STRICT_NANS) && STRICT_NANS) || \ + (defined(TRAP_NANS) && TRAP_NANS) + // Check for NAN. + lsls r1, r0, #1 + movs r2, #255 + lsls r2, #24 + cmp r1, r2 + + #if defined(TRAP_NANS) && TRAP_NANS + blo SYM(__fneg_nan) + #else + blo LLSYM(__fneg_return) + #endif + #endif + + // Flip the sign. + movs r1, #1 + lsls r1, #31 + eors r0, r1 + + LLSYM(__fneg_return): + RET + + #if defined(TRAP_NANS) && TRAP_NANS + LLSYM(__fneg_nan): + // Set up registers for exception handling. 
+ push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + b SYM(fp_check_nan) + #endif + + CFI_END_FUNCTION +FUNC_END negsf2 +FUNC_END aeabi_fneg + +#endif /* L_arm_negsf2 */ + diff --git a/libgcc/config/arm/eabi/fplib.h b/libgcc/config/arm/eabi/fplib.h index ca22d3db8e3..c1924a37ab3 100644 --- a/libgcc/config/arm/eabi/fplib.h +++ b/libgcc/config/arm/eabi/fplib.h @@ -45,9 +45,6 @@ /* Push extra registers when required for 64-bit stack alignment */ #define DOUBLE_ALIGN_STACK (1) -/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */ -#define PEDANTIC_DIV0 (1) - /* Define various exception codes. These don't map to anything in particular */ #define SUBTRACTED_INFINITY (20) #define INFINITY_TIMES_ZERO (21) diff --git a/libgcc/config/arm/eabi/futil.S b/libgcc/config/arm/eabi/futil.S new file mode 100644 index 00000000000..923c1c28b41 --- /dev/null +++ b/libgcc/config/arm/eabi/futil.S @@ -0,0 +1,418 @@ +/* futil.S: Thumb-1 optimized 32-bit float helper functions + + Copyright (C) 2018-2021 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +// These helper functions are exported in distinct object files to keep +// the linker from importing unused code. +// These helper functions do NOT follow AAPCS register conventions. + + +#ifdef L_fp_normalizef + +// Internal function, decomposes the unsigned float in $r2. +// The exponent will be returned in $r2, the mantissa in $r3. +// If subnormal, the mantissa will be normalized, so that +// the MSB of the mantissa (if any) will be aligned at bit[31]. +// Preserves $r0 and $r1, uses $rT as scratch space. +FUNC_START_SECTION fp_normalize2 .text.sorted.libgcc.fpcore.y.alignf + CFI_START_FUNCTION + + // Extract the mantissa. + lsls r3, r2, #8 + + // Extract the exponent. + lsrs r2, #24 + beq SYM(__fp_lalign2) + + // Restore the mantissa's implicit '1'. + adds r3, #1 + rors r3, r3 + + RET + + CFI_END_FUNCTION +FUNC_END fp_normalize2 + + +// Internal function, aligns $r3 so the MSB is aligned in bit[31]. +// Simultaneously, subtracts the shift from the exponent in $r2 +FUNC_ENTRY fp_lalign2 + CFI_START_FUNCTION + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Unroll the loop, similar to __clzsi2(). + lsrs rT, r3, #16 + bne LLSYM(__align8) + subs r2, #16 + lsls r3, #16 + + LLSYM(__align8): + lsrs rT, r3, #24 + bne LLSYM(__align4) + subs r2, #8 + lsls r3, #8 + + LLSYM(__align4): + lsrs rT, r3, #28 + bne LLSYM(__align2) + subs r2, #4 + lsls r3, #4 + #endif + + LLSYM(__align2): + // Refresh the state of the N flag before entering the loop. 
+ tst r3, r3 + + LLSYM(__align_loop): + // Test before subtracting to compensate for the natural exponent. + // The largest subnormal should have an exponent of 0, not -1. + bmi LLSYM(__align_return) + subs r2, #1 + lsls r3, #1 + bne LLSYM(__align_loop) + + // Not just a subnormal... 0! By design, this should never happen. + // All callers of this internal function filter 0 as a special case. + // Was there an uncontrolled jump from somewhere else? Cosmic ray? + eors r2, r2 + + #ifdef DEBUG + bkpt #0 + #endif + + LLSYM(__align_return): + RET + + CFI_END_FUNCTION +FUNC_END fp_lalign2 + +#endif /* L_fp_normalizef */ + + +#ifdef L_fp_assemblef + +// Internal function to combine mantissa, exponent, and sign. No return. +// Expects the unsigned result in $r1. To avoid underflow (slower), +// the MSB should be in bits [31:29]. +// Expects any remainder bits of the unrounded result in $r0. +// Expects the exponent in $r2. The exponent must be relative to bit[30]. +// Expects the sign of the result (and only the sign) in $ip. +// Returns a correctly rounded floating value in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_assemble .text.sorted.libgcc.fpcore.g.assemblef + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Examine the upper three bits [31:29] for underflow. + lsrs r3, r1, #29 + beq LLSYM(__fp_underflow) + + // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }. + // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'. + lsrs r3, r1, #1 + mvns r3, r3 + ands r3, r1 + lsrs r3, #30 + subs r3, #2 + rors r1, r3 + + // Update the exponent, assuming the final result will be normal. + // The new exponent is 1 less than actual, to compensate for the + // eventual addition of the implicit '1' in the result. + // If the final exponent becomes negative, proceed directly to gradual + // underflow, without bothering to search for the MSB. + adds r2, r3 + + FUNC_ENTRY fp_assemble2 + bmi LLSYM(__fp_subnormal) + + LLSYM(__fp_normal): + // Check for overflow (remember the implicit '1' to be added later). + cmp r2, #254 + bge SYM(__fp_overflow) + + // Save LSBs for the remainder. Position doesn't matter any more, + // these are just tiebreakers for round-to-even. + lsls rT, r1, #25 + + // Align the final result. + lsrs r1, #8 + + LLSYM(__fp_round): + // If carry bit is '0', always round down. + bcc LLSYM(__fp_return) + + // The carry bit is '1'. Round to nearest, ties to even. + // If either the saved remainder bits [6:0], the additional remainder + // bits in $r1, or the final LSB is '1', round up. + lsls r3, r1, #31 + orrs r3, rT + orrs r3, r0 + beq LLSYM(__fp_return) + + // If rounding up overflows, then the mantissa result becomes 2.0, + // which yields the correct return value up to and including INF. + adds r1, #1 + + LLSYM(__fp_return): + // Combine the mantissa and the exponent. + lsls r2, #23 + adds r0, r1, r2 + + // Combine with the saved sign. + // End of library call, return to user. + add r0, ip + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: Underflow/inexact reporting IFF remainder + #endif + + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fp_underflow): + // Set up to align the mantissa. + movs r3, r1 + bne LLSYM(__fp_underflow2) + + // MSB wasn't in the upper 32 bits, check the remainder. + // If the remainder is also zero, the result is +/-0. 
+ movs r3, r0 + beq SYM(__fp_zero) + + eors r0, r0 + subs r2, #32 + + LLSYM(__fp_underflow2): + // Save the pre-alignment exponent to align the remainder later. + movs r1, r2 + + // Align the mantissa with the MSB in bit[31]. + bl SYM(__fp_lalign2) + + // Calculate the actual remainder shift. + subs rT, r1, r2 + + // Align the lower bits of the remainder. + movs r1, r0 + lsls r0, rT + + // Combine the upper bits of the remainder with the aligned value. + rsbs rT, #0 + adds rT, #32 + lsrs r1, rT + adds r1, r3 + + // The MSB is now aligned at bit[31] of $r1. + // If the net exponent is still positive, the result will be normal. + // Because this function is used by fmul(), there is a possibility + // that the value is still wider than 24 bits; always round. + tst r2, r2 + bpl LLSYM(__fp_normal) + + LLSYM(__fp_subnormal): + // The MSB is aligned at bit[31], with a net negative exponent. + // The mantissa will need to be shifted right by the absolute value of + // the exponent, plus the normal shift of 8. + + // If the negative shift is smaller than -25, there is no result, + // no rounding, no anything. Return signed zero. + // (Otherwise, the shift for result and remainder may wrap.) + adds r2, #25 + bmi SYM(__fp_inexact_zero) + + // Save the extra bits for the remainder. + movs rT, r1 + lsls rT, r2 + + // Shift the mantissa to create a subnormal. + // Just like normal, round to nearest, ties to even. + movs r3, #33 + subs r3, r2 + eors r2, r2 + + // This shift must be last, leaving the shifted LSB in the C flag. + lsrs r1, r3 + b LLSYM(__fp_round) + + CFI_END_FUNCTION +FUNC_END fp_assemble + + +// Recreate INF with the appropriate sign. No return. +// Expects the sign of the result in $ip. +FUNC_ENTRY fp_overflow + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/overflow exception + #endif + + FUNC_ENTRY fp_infinity + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + movs r0, #255 + lsls r0, #23 + add r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_overflow + + +// Recreate 0 with the appropriate sign. No return. +// Expects the sign of the result in $ip. +FUNC_ENTRY fp_inexact_zero + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/underflow exception + #endif + +FUNC_ENTRY fp_zero + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Return 0 with the correct sign. + mov r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_zero +FUNC_END fp_inexact_zero + +#endif /* L_fp_assemblef */ + + +#ifdef L_fp_checknanf + +// Internal function to detect signaling NANs. No return. +// Uses $r2 as scratch space. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_check_nan2 .text.sorted.libgcc.fpcore.j.checkf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + + FUNC_ENTRY fp_check_nan + + // Check for quiet NAN. + lsrs r2, r0, #23 + bcs LLSYM(__quiet_nan) + + // Raise exception. Preserves both $r0 and $r1. + svc #(SVC_TRAP_NAN) + + // Quiet the resulting NAN. + movs r2, #1 + lsls r2, #22 + orrs r0, r2 + + LLSYM(__quiet_nan): + // End of library call, return to user. 
+ pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_check_nan +FUNC_END fp_check_nan2 + +#endif /* L_fp_checknanf */ + + +#ifdef L_fp_exceptionf + +// Internal function to report floating point exceptions. No return. +// Expects the original argument(s) in $r0 (possibly also $r1). +// Expects a code that describes the exception in $r3. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_exception .text.sorted.libgcc.fpcore.k.exceptf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Create a quiet NAN. + movs r2, #255 + lsls r2, #1 + adds r2, #1 + lsls r2, #22 + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + // Annotate the exception type in the NAN field. + // Make sure that the exception is in the valid region + lsls rT, r3, #13 + orrs r2, rT + #endif + + // Exception handler that expects the result already in $r2, + // typically when the result is not going to be NAN. + FUNC_ENTRY fp_exception2 + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + svc #(SVC_FP_EXCEPTION) + #endif + + // TODO: Save exception flags in a static variable. + + // Set up the result, now that the argument isn't required any more. + movs r0, r2 + + // HACK: for sincosf(), with 2 parameters to return. + movs r1, r2 + + // End of library call, return to user. + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_exception2 +FUNC_END fp_exception + +#endif /* L_arm_exception */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 31132633f32..6c3f29b71e2 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2012,7 +2012,9 @@ LSYM(Lchange_\register): #include "bpabi-v6m.S" #include "eabi/fplib.h" #include "eabi/fcmp.S" +#include "eabi/fneg.S" #include "eabi/fadd.S" +#include "eabi/futil.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index e69579e16dd..c57d9ef50ac 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -32,6 +32,7 @@ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) LIB1ASMFUNCS += \ _internal_cmpsf2 \ _muldi3 \ + _arm_addsf3 \ endif @@ -95,6 +96,11 @@ LIB1ASMFUNCS += \ _arm_fcmpne \ _arm_eqsf2 \ _arm_gesf2 \ + _arm_frsubsf3 \ + _fp_exceptionf \ + _fp_checknanf \ + _fp_assemblef \ + _fp_normalizef \ endif
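The sketches that follow are editorial additions, not part of the patch: they restate a few of the techniques above in plain C for readers who do not want to trace the Thumb-1 sequences.  Every helper name in them (float_bits, sketch_fsub, sketch_frsub, demo_addsf3, lalign, round_nearest_even, nan_with_code) is invented for illustration.

First, __aeabi_fsub and __aeabi_frsub above do no arithmetic of their own: they toggle the sign bit of one operand, unless STRICT_NANS is set and that operand is a NAN, and then hand the pair to __aeabi_fadd.  A minimal C equivalent, assuming IEEE-754 binary32 floats:

#include <stdint.h>
#include <string.h>

static uint32_t float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);    /* type-pun without aliasing problems */
  return u;
}

static float bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* The STRICT_NANS guard: drop the sign, then anything above 0xFF000000
   (all-ones exponent, non-zero mantissa) is a NAN.  */
static int is_nan_bits (uint32_t u)
{
  return (uint32_t) (u << 1) > 0xFF000000u;
}

/* a - b: flip the sign of the subtrahend and delegate to addition.
   A NAN operand is left untouched so the addition path can find it.  */
float sketch_fsub (float a, float b)
{
  uint32_t ub = float_bits (b);
  if (!is_nan_bits (ub))
    ub ^= 0x80000000u;          /* toggle the sign bit */
  return a + bits_float (ub);
}

/* b - a, the __aeabi_frsub argument order.  */
float sketch_frsub (float a, float b)
{
  uint32_t ua = float_bits (a);
  if (!is_nan_bits (ua))
    ua ^= 0x80000000u;
  return bits_float (ua) + b;
}

On a soft-float target the '+' in these functions lowers to a call to __aeabi_fadd, which is exactly the tail call the assembly performs.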
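Second, the comment block ahead of __aeabi_fadd explains why that function is assembled twice: __subsf3 falls through into __addsf3, so the combined object must keep them contiguous, while a separate weak standalone copy lets a program that never subtracts link __addsf3 without dragging in __subsf3.  The same linking behaviour can be sketched with two ordinary C translation units (shown together below; the file split and symbol names are hypothetical):

/* weak_add.c -- analogue of the standalone L_arm_addsf3 object.
   An ELF weak definition survives only if nothing stronger is linked.  */
float __attribute__ ((weak)) demo_addsf3 (float a, float b)
{
  return a + b;                 /* minimal standalone version */
}

/* strong_addsub.c -- analogue of the combined L_arm_addsubsf3 object:
   a strong demo_addsf3 that supersedes the weak one, plus the
   subtraction entry point that relies on it.  */
float demo_addsf3 (float a, float b)
{
  return a + b;                 /* full version shared with subtraction */
}

float demo_subsf3 (float a, float b)
{
  return demo_addsf3 (a, -b);   /* subtraction delegates to addition */
}

Linking a caller against only the first object uses the weak copy; linking both objects lets the strong definition take precedence without a multiple-definition error.  The note about keeping '_arm_addsf3' ahead of '_arm_addsubsf3' in LIB1ASMFUNCS addresses the same concern once these objects are collected into libgcc.a.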
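Third, __fp_lalign2 in futil.S normalizes a mantissa by shifting its most significant set bit up to bit 31 and charging every shift against the exponent; the assembly unrolls 16-, 8- and 4-bit steps in the style of __clzsi2 before finishing with a loop.  Functionally it reduces to the following, assuming (as the comments above require) that the caller has already filtered out a zero mantissa:

#include <stdint.h>

/* Shift a non-zero mantissa left until its MSB sits in bit 31 and
   subtract the shift count from the exponent.  With a zero mantissa
   this loop would not terminate, which is why the assembly treats
   that case as an internal error.  */
static int lalign (uint32_t *mantissa, int exponent)
{
  while (!(*mantissa & 0x80000000u))
    {
      *mantissa <<= 1;
      exponent -= 1;
    }
  return exponent;
}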
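Fourth, the rounding step shared through __fp_assemble: after the mantissa is shifted into place, the first discarded bit acts as the guard bit, and the result is rounded to nearest with ties to even by consulting the remaining discarded bits, any "sticky" remainder collected earlier, and the low bit of the kept value.  A C restatement, using the same 8-bit alignment as the assembly:

#include <stdint.h>

/* 'mant' holds the result mantissa with 8 surplus low bits still
   attached; 'sticky' is non-zero if any earlier discarded bits were
   non-zero.  Returns the rounded value.  */
static uint32_t round_nearest_even (uint32_t mant, uint32_t sticky)
{
  uint32_t result = mant >> 8;               /* the bits to keep       */
  uint32_t guard  = (mant >> 7) & 1;         /* first discarded bit    */
  uint32_t rest   = (mant & 0x7F) | (sticky != 0);

  /* Round up when the guard bit is set, unless this is an exact tie
     and the kept value is already even.  */
  if (guard && (rest || (result & 1)))
    result += 1;    /* in the real routine a carry out of the mantissa
                       propagates into the exponent field, which still
                       encodes the correct value up to infinity */
  return result;
}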
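Finally, when EXCEPTION_CODES is enabled, __fp_exception reports an error by building a quiet NAN whose payload carries a small diagnostic code (for example SUBTRACTED_INFINITY, 20, from fplib.h), shifted to bit 13 so it stays inside the low 22 mantissa bits.  In C terms:

#include <stdint.h>

/* Quiet NAN with a diagnostic code in its payload, as __fp_exception
   assembles it: ((255 << 1) + 1) << 22 is 0x7FC00000, i.e. an all-ones
   exponent plus the quiet bit.  'code' must stay below 512 so that
   code << 13 remains inside the mantissa field.  */
static uint32_t nan_with_code (uint32_t code)
{
  return (((255u << 1) + 1) << 22) | (code << 13);
}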