From patchwork Mon Dec 27 19:05:23 2021
From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Subject: [PATCH v6 27/34] Import float multiplication from the CM0 library
Date: Mon, 27 Dec 2021 11:05:23 -0800
Message-Id: <20211227190530.3136549-28-gnu@danielengel.com>
In-Reply-To: <20211227190530.3136549-1-gnu@danielengel.com>
Cc: Daniel Engel, Christophe Lyon

gcc/libgcc/ChangeLog:
2021-01-13  Daniel Engel

        * config/arm/eabi/fmul.S (__mulsf3): New file.
        * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only).
        * config/arm/t-elf (LIB1ASMFUNCS): Moved _arm_mulsf3 to global scope
        (this object was previously blocked on v6m builds).
---
 libgcc/config/arm/eabi/fmul.S | 215 ++++++++++++++++++++++++++++++++++
 libgcc/config/arm/lib1funcs.S |   1 +
 libgcc/config/arm/t-elf       |   3 +-
 3 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/arm/eabi/fmul.S

diff --git a/libgcc/config/arm/eabi/fmul.S b/libgcc/config/arm/eabi/fmul.S
new file mode 100644
index 00000000000..767de988f0b
--- /dev/null
+++ b/libgcc/config/arm/eabi/fmul.S
@@ -0,0 +1,215 @@
+/* fmul.S: Thumb-1 optimized 32-bit float multiplication
+
+   Copyright (C) 2018-2021 Free Software Foundation, Inc.
+   Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com)
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+
+#ifdef L_arm_mulsf3
+
+// float __aeabi_fmul(float, float)
+// Returns $r0 after multiplication by $r1.
+// Subsection ordering within fpcore keeps conditional branches within range.
+FUNC_START_SECTION aeabi_fmul .text.sorted.libgcc.fpcore.m.fmul
+FUNC_ALIAS mulsf3 aeabi_fmul
+    CFI_START_FUNCTION
+
+        // Standard registers, compatible with exception handling.
+        push    { rT, lr }
+        .cfi_remember_state
+        .cfi_remember_state
+        .cfi_adjust_cfa_offset 8
+        .cfi_rel_offset rT, 0
+        .cfi_rel_offset lr, 4
+
+        // Save the sign of the result.
+        movs    rT, r1
+        eors    rT, r0
+        lsrs    rT, #31
+        lsls    rT, #31
+        mov     ip, rT
+
+        // Set up INF for comparison.
+        movs    rT, #255
+        lsls    rT, #24
+
+        // Check for multiplication by zero.
+        lsls    r2, r0, #1
+        beq     LLSYM(__fmul_zero1)
+
+        lsls    r3, r1, #1
+        beq     LLSYM(__fmul_zero2)
+
+        // Check for INF/NAN.
+        cmp     r3, rT
+        bhs     LLSYM(__fmul_special2)
+
+        cmp     r2, rT
+        bhs     LLSYM(__fmul_special1)
+
+        // Because neither operand is INF/NAN, the result will be finite.
+        // It is now safe to modify the original operand registers.
+        lsls    r0, #9
+
+        // Isolate the first exponent.  When normal, add back the implicit '1'.
+        // The result is always aligned with the MSB in bit [31].
+        // Subnormal mantissas remain effectively multiplied by 2 relative to
+        // normals; this compensates for their exponent field of 0, since
+        // subnormals actually carry the weight 2^-126 of exponent field 1.
+        lsrs    r2, #24
+        beq     LLSYM(__fmul_normalize2)
+        adds    r0, #1
+        rors    r0, r0
+
+    LLSYM(__fmul_normalize2):
+        // IMPORTANT: exp10i() jumps in here!
+        // Repeat for the mantissa of the second operand.
+        // Short-circuit when the mantissa is 1.0, as the
+        // first mantissa is already prepared in $r0.
+        lsls    r1, #9
+
+        // When normal, add back the implicit '1'.
+        lsrs    r3, #24
+        beq     LLSYM(__fmul_go)
+        adds    r1, #1
+        rors    r1, r1
+
+    LLSYM(__fmul_go):
+        // Calculate the final exponent, relative to bit [30].
+        adds    rT, r2, r3
+        subs    rT, #127
+
+    #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+        // Short-circuit on multiplication by powers of 2.
+        lsls    r3, r0, #1
+        beq     LLSYM(__fmul_simple1)
+
+        lsls    r3, r1, #1
+        beq     LLSYM(__fmul_simple2)
+    #endif
+
+        // Save $ip across the call.
+        // (Alternatively, could push/pop a separate register,
+        //  but the four instructions here are equally fast
+        //  without imposing on the stack.)
+        add     rT, ip
+
+        // 32x32 unsigned multiplication, 64 bit result.
+        bl      SYM(__umulsidi3) __PLT__
+
+        // Separate the saved exponent and sign.
+        sxth    r2, rT
+        subs    rT, r2
+        mov     ip, rT
+
+        b       SYM(__fp_assemble)
+
+    #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__
+    LLSYM(__fmul_simple2):
+        // Move the high bits of the result to $r1.
+        movs    r1, r0
+
+    LLSYM(__fmul_simple1):
+        // Clear the remainder.
+        eors    r0, r0
+
+        // Adjust mantissa to match the exponent, relative to bit [30].
+        subs    r2, rT, #1
+        b       SYM(__fp_assemble)
+    #endif
+
+    LLSYM(__fmul_zero1):
+        // $r0 was equal to 0, set up to check $r1 for INF/NAN.
+        lsls    r2, r1, #1
+
+    LLSYM(__fmul_zero2):
+    #if defined(EXCEPTION_CODES) && EXCEPTION_CODES
+        movs    r3, #(INFINITY_TIMES_ZERO)
+    #endif
+
+        // Check the non-zero operand for INF/NAN.
+        // If NAN, it should be returned.
+        // If INF, the result should be NAN.
+        // Otherwise, the result will be +/-0.
+        cmp     r2, rT
+        beq     SYM(__fp_exception)
+
+        // If the non-zero operand is finite, the result is +/-0.
+        blo     SYM(__fp_zero)
+
+    #if defined(STRICT_NANS) && STRICT_NANS
+        // Restore values that got mixed in zero testing, then go back
+        // to sort out which one is the NAN.
+        lsls    r3, r1, #1
+        lsls    r2, r0, #1
+    #elif defined(TRAP_NANS) && TRAP_NANS
+        // Return NAN with the sign bit cleared.
+        lsrs    r0, r2, #1
+        b       SYM(__fp_check_nan)
+    #else
+        // Return NAN with the sign bit cleared.
+        lsrs    r0, r2, #1
+        pop     { rT, pc }
+        .cfi_restore_state
+    #endif
+
+    LLSYM(__fmul_special2):
+        // $r1 is INF/NAN.  In case of INF, check $r0 for NAN.
+        cmp     r2, rT
+
+    #if defined(TRAP_NANS) && TRAP_NANS
+        // Force swap if $r0 is not NAN.
+        bls     LLSYM(__fmul_swap)
+
+        // $r0 is NAN, keep if $r1 is INF.
+        cmp     r3, rT
+        beq     LLSYM(__fmul_special1)
+
+        // Both are NAN, keep the smaller value (more likely to signal).
+        cmp     r2, r3
+    #endif
+
+        // Prefer the NAN already in $r0.
+        // (If TRAP_NANS, this is the smaller NAN.)
+        bhi     LLSYM(__fmul_special1)
+
+    LLSYM(__fmul_swap):
+        movs    r0, r1
+
+    LLSYM(__fmul_special1):
+        // $r0 is either INF or NAN.  $r1 has already been examined.
+        // Flags are already set correctly.
+        lsls    r2, r0, #1
+        cmp     r2, rT
+        beq     SYM(__fp_infinity)
+
+    #if defined(TRAP_NANS) && TRAP_NANS
+        b       SYM(__fp_check_nan)
+    #else
+        pop     { rT, pc }
+        .cfi_restore_state
+    #endif
+
+    CFI_END_FUNCTION
+FUNC_END mulsf3
+FUNC_END aeabi_fmul
+
+#endif /* L_arm_mulsf3 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 6c3f29b71e2..ffc343c37d3 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -2015,6 +2015,7 @@ LSYM(Lchange_\register):
 #include "eabi/fneg.S"
 #include "eabi/fadd.S"
 #include "eabi/futil.S"
+#include "eabi/fmul.S"
 #endif /* NOT_ISA_TARGET_32BIT */
 #include "eabi/lcmp.S"
 #endif /* !__symbian__ */
diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf
index c57d9ef50ac..682f273a1d2 100644
--- a/libgcc/config/arm/t-elf
+++ b/libgcc/config/arm/t-elf
@@ -10,7 +10,7 @@ THUMB1_ISA:=$(findstring __ARM_ARCH_ISA_THUMB 1,$(shell $(gcc_compile_bare) -dM
 # inclusion create when only multiplication is used, thus avoiding pulling in
 # useless division code.
 ifneq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
-LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3
+LIB1ASMFUNCS += _arm_muldf3
 endif
 endif # !__symbian__
 
@@ -26,6 +26,7 @@ LIB1ASMFUNCS += \
         _ctzsi2 \
         _paritysi2 \
         _popcountsi2 \
+        _arm_mulsf3 \
 
 ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA))
 # Group 0B: WEAK overridable function objects built for v6m only.
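For reviewers who want to check the control flow of __aeabi_fmul in fmul.S above against something executable, here is a minimal C reference model of binary32 multiplication that follows the same steps: XOR the signs, detect INF/NaN by shifting out the sign and comparing against 0xFF000000, return a signed zero early, restore the implicit '1', add the exponents and subtract the bias of 127, then form an exact wide product. It is a sketch, not the libgcc code. The names fmul_model, f2u, u2f and is_nan_bits are hypothetical. The rounding block at the end is an assumption standing in for __fp_assemble (IEEE 754 round to nearest, ties to even, with gradual underflow). Subnormal operands are normalized here with a loop, whereas fmul.S carries them scaled by 2. NaN payload and sign selection are not modeled: only "the result is some NaN" is meant to match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers; memcpy avoids strict-aliasing problems. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* NaN test in the style of fmul.S: drop the sign, compare with INF << 1. */
static int is_nan_bits(uint32_t u) { return (u << 1) > 0xFF000000u; }

/* Hypothetical reference model of binary32 multiplication (round to
   nearest, ties to even). Not the libgcc code; the rounding block at the
   end stands in for __fp_assemble. */
static uint32_t fmul_model(uint32_t a, uint32_t b)
{
    const uint32_t inf2 = 0xFF000000u;      /* INF with the sign shifted out */
    uint32_t sign = (a ^ b) & 0x80000000u;  /* eors, then keep only bit 31 */
    uint32_t a2 = a << 1, b2 = b << 1;

    /* NaN propagates (quieted), INF * 0 is NaN, INF * x is signed INF. */
    if (a2 > inf2) return a | 0x00400000u;
    if (b2 > inf2) return b | 0x00400000u;
    if (a2 == inf2 || b2 == inf2)
        return (a2 == 0 || b2 == 0) ? 0x7FC00000u : (sign | 0x7F800000u);
    if (a2 == 0 || b2 == 0) return sign;    /* signed zero */

    /* Unpack. Normals get the implicit '1'; a subnormal has exponent
       field 0 but the weight of field 1, so use 1 and then normalize. */
    int32_t ea = (int32_t)(a2 >> 24), eb = (int32_t)(b2 >> 24);
    uint32_t ma = a & 0x007FFFFFu, mb = b & 0x007FFFFFu;
    if (ea) ma |= 0x00800000u; else ea = 1;
    if (eb) mb |= 0x00800000u; else eb = 1;
    while (!(ma & 0x00800000u)) { ma <<= 1; ea--; }
    while (!(mb & 0x00800000u)) { mb <<= 1; eb--; }

    /* Exact 24x24 -> 48 bit product, in [2^46, 2^48). */
    uint64_t p = (uint64_t)ma * mb;
    int32_t e = ea + eb - 127;              /* biased result exponent */
    if (p >> 47) e++; else p <<= 1;
    assert((p >> 47) == 1);                 /* leading '1' now in bit 47 */

    /* Keep 24 significant bits, or fewer for a subnormal result. */
    int shift = 24;
    if (e < 1) { shift += 1 - e; e = 0; }
    if (shift > 48) return sign;            /* below half the smallest subnormal */
    uint32_t m = (uint32_t)(p >> shift);
    uint64_t rem = p & ((1ull << shift) - 1);
    uint64_t half = 1ull << (shift - 1);
    if (rem > half || (rem == half && (m & 1u))) m++;  /* ties to even */

    /* Adding m (implicit bit included) to (e - 1) << 23 lets a rounding
       carry bump the exponent; a subnormal that rounds up to 2^23 becomes
       the smallest normal. */
    uint32_t bits = e ? ((uint32_t)(e - 1) << 23) + m : m;
    if (bits >= 0x7F800000u) return sign | 0x7F800000u;  /* overflow */
    return sign | bits;
}
```

On a host whose float is IEEE 754 binary32 and which does not flush subnormals to zero, comparing fmul_model(a, b) with the bits of u2f(a) * u2f(b) over random operands is a quick way to exercise it. When the hardware result is a NaN, compare only with is_nan_bits, since NaN encodings differ between implementations.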
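Two register tricks in fmul.S above are easy to misread, so here they are restated in C, assuming 32-bit modular arithmetic; the helper names are hypothetical. First, "lsls #9; adds #1; rors" inserts the implicit '1': Thumb-1 RORS rotates by the bottom byte of the count register, and after the shift and add that byte is exactly 1, so the rotate moves the new bit into bit 31 and the fraction into bits [30:8]. Second, "add rT, ip" folds the sign (bit 31 of ip) and the small signed exponent into one register across the __umulsidi3 call, and "sxth r2, rT; subs rT, r2" splits them again. This relies on the exponent fitting in a signed halfword; at that point it is r2 + r3 - 127 with both fields in [0, 254], i.e. between -127 and 381.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "lsls r0, #9; adds r0, #1; rors r0, r0"
   for a normal operand given as raw float bits. */
static uint32_t insert_implicit_one(uint32_t bits)
{
    uint32_t r = (bits << 9) + 1u;        /* fraction in [31:9], bit 0 set */
    uint32_t n = r & 0xFFu;               /* RORS uses the bottom byte */
    assert(n == 1u);                      /* low 9 bits were 0 before the add */
    return (r >> n) | (r << (32u - n));   /* rotate right by n */
}

/* Hypothetical helpers mirroring "add rT, ip" before the call and
   "sxth r2, rT; subs rT, r2" after it. The sign occupies bit 31 only, so
   its low halfword is zero and the exponent's low halfword survives. */
static uint32_t pack_sign_exp(uint32_t sign, int32_t exp)
{
    return sign + (uint32_t)exp;          /* wraps modulo 2^32 */
}

static int32_t unpack_exp(uint32_t packed)     /* like sxth */
{
    int32_t low = (int32_t)(packed & 0xFFFFu);
    return low >= 0x8000 ? low - 0x10000 : low;
}

static uint32_t unpack_sign(uint32_t packed)   /* like subs */
{
    return packed - (uint32_t)unpack_exp(packed);
}
```

For example, a negative sign with exponent -100 packs to 0x7FFFFF9C; the low halfword 0xFF9C sign-extends back to -100, and subtracting it restores 0x80000000.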