From patchwork Mon Oct 31 15:44:56 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59660
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 01/34] Add and restructure function declaration macros
Date: Mon, 31 Oct 2022 08:44:56 -0700
Message-Id: <20221031154529.3627576-2-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

Most of these changes support subsequent patches in this series.
In particular, the FUNC_START macro becomes part of a new macro chain:

  * FUNC_ENTRY          Common global symbol directives
  * FUNC_START_SECTION  FUNC_ENTRY to start a new <section>
  * FUNC_START          FUNC_START_SECTION <".text">

The effective definition of FUNC_START is unchanged from the previous
version of lib1funcs.  See code comments for detailed usage.

The new names FUNC_ENTRY and FUNC_START_SECTION were chosen specifically
to complement the existing FUNC_START name.  Alternate name patterns are
possible (such as {FUNC_SYMBOL, FUNC_START_SECTION, FUNC_START_TEXT}),
but any change to FUNC_START would require refactoring much of libgcc.

Additionally, a parallel chain of new macros supports weak functions:

  * WEAK_ENTRY
  * WEAK_START_SECTION
  * WEAK_START
  * WEAK_ALIAS

Moving the CFI_* macros earlier in the file increases their scope for
use in additional functions.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

	* config/arm/lib1funcs.S:
	(LLSYM): New macro prefix ".L" for strippable local symbols.
	(CFI_START_FUNCTION, CFI_END_FUNCTION): Moved earlier in the file.
	(FUNC_ENTRY): New macro for symbols with no ".section" directive.
	(WEAK_ENTRY): New macro FUNC_ENTRY + ".weak".
	(FUNC_START_SECTION): New macro FUNC_ENTRY with <section>
argument. (WEAK_START_SECTION): New macro FUNC_START_SECTION + ".weak". (FUNC_START): Redefined in terms of FUNC_START_SECTION <".text">. (WEAK_START): New macro FUNC_START + ".weak". (WEAK_ALIAS): New macro FUNC_ALIAS + ".weak". (FUNC_END): Moved after FUNC_START macro group. (THUMB_FUNC_START): Moved near the other *FUNC* macros. (THUMB_SYNTAX, ARM_SYM_START, SYM_END): Deleted unused macros. --- libgcc/config/arm/lib1funcs.S | 109 +++++++++++++++++++++------------- 1 file changed, 69 insertions(+), 40 deletions(-) diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 8c39c9f20a2..a4fa62b3832 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -69,11 +69,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define TYPE(x) .type SYM(x),function #define SIZE(x) .size SYM(x), . - SYM(x) #define LSYM(x) .x +#define LLSYM(x) .L##x #else #define __PLT__ #define TYPE(x) #define SIZE(x) #define LSYM(x) x +#define LLSYM(x) x #endif /* Function end macros. Variants for interworking. */ @@ -182,6 +184,16 @@ LSYM(Lend_fde): #endif .endm +.macro CFI_START_FUNCTION + .cfi_startproc + .cfi_remember_state +.endm + +.macro CFI_END_FUNCTION + .cfi_restore_state + .cfi_endproc +.endm + /* Don't pass dirn, it's there just to get token pasting right. */ .macro RETLDM regs=, cond=, unwind=, dirn=ia @@ -324,10 +336,6 @@ LSYM(Lend_fde): .endm #endif -.macro FUNC_END name - SIZE (__\name) -.endm - .macro DIV_FUNC_END name signed cfi_start __\name, LSYM(Lend_div0) LSYM(Ldiv0): @@ -340,48 +348,76 @@ LSYM(Ldiv0): FUNC_END \name .endm -.macro THUMB_FUNC_START name - .globl SYM (\name) - TYPE (\name) - .thumb_func -SYM (\name): -.endm - /* Function start macros. Variants for ARM and Thumb. */ #ifdef __thumb__ #define THUMB_FUNC .thumb_func #define THUMB_CODE .force_thumb -# if defined(__thumb2__) -#define THUMB_SYNTAX -# else -#define THUMB_SYNTAX -# endif #else #define THUMB_FUNC #define THUMB_CODE -#define THUMB_SYNTAX #endif +.macro THUMB_FUNC_START name + .globl SYM (\name) + TYPE (\name) + .thumb_func +SYM (\name): +.endm + +/* Strong global symbol, ".text" section. + The default macro for function declarations. */ .macro FUNC_START name - .text + FUNC_START_SECTION \name .text +.endm + +/* Weak global symbol, ".text" section. + Use WEAK_* macros to declare a function/object that may be discarded in by + the linker when another library or object exports the same name. + Typically, functions declared with WEAK_* macros implement a subset of + functionality provided by the overriding definition, and are discarded + when the full functionality is required. */ +.macro WEAK_START name + .weak SYM(__\name) + FUNC_START_SECTION \name .text +.endm + +/* Strong global symbol, alternate section. + Use the *_START_SECTION macros for declarations that the linker should + place in a non-defailt section (e.g. ".rodata", ".text.subsection"). */ +.macro FUNC_START_SECTION name section + .section \section,"x" + .align 0 + FUNC_ENTRY \name +.endm + +/* Weak global symbol, alternate section. */ +.macro WEAK_START_SECTION name section + .weak SYM(__\name) + FUNC_START_SECTION \name \section +.endm + +/* Strong global symbol. + Use *_ENTRY macros internal to a function/object body to declare a second + or subsequent entry point without changing the assembler state. + Because there is no alignment specification, these macros should never + replace the *_START_* macros as the first declaration in any object. 
*/ +.macro FUNC_ENTRY name .globl SYM (__\name) TYPE (__\name) - .align 0 THUMB_CODE THUMB_FUNC - THUMB_SYNTAX SYM (__\name): .endm -.macro ARM_SYM_START name - TYPE (\name) - .align 0 -SYM (\name): +/* Weak global symbol. */ +.macro WEAK_ENTRY name + .weak SYM(__\name) + FUNC_ENTRY \name .endm -.macro SYM_END name - SIZE (\name) +.macro FUNC_END name + SIZE (__\name) .endm /* Special function that will always be coded in ARM assembly, even if @@ -447,6 +483,11 @@ SYM (__\name): #endif .endm +.macro WEAK_ALIAS new old + .weak SYM(__\new) + FUNC_ALIAS \new \old +.endm + #ifndef NOT_ISA_TARGET_32BIT .macro ARM_FUNC_ALIAS new old .globl SYM (__\new) @@ -1459,10 +1500,8 @@ LSYM(Lover12): #ifdef L_dvmd_tls #ifdef __ARM_EABI__ - WEAK aeabi_idiv0 - WEAK aeabi_ldiv0 - FUNC_START aeabi_idiv0 - FUNC_START aeabi_ldiv0 + WEAK_START aeabi_idiv0 + WEAK_START aeabi_ldiv0 RET FUNC_END aeabi_ldiv0 FUNC_END aeabi_idiv0 @@ -2170,16 +2209,6 @@ LSYM(Lchange_\register): #endif /* Arch supports thumb. */ -.macro CFI_START_FUNCTION - .cfi_startproc - .cfi_remember_state -.endm - -.macro CFI_END_FUNCTION - .cfi_restore_state - .cfi_endproc -.endm - #ifndef __symbian__ /* The condition here must match the one in gcc/config/arm/elf.h and libgcc/config/arm/t-elf. */ From patchwork Mon Oct 31 15:44:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59664 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 33A0E3851ABA for ; Mon, 31 Oct 2022 15:47:45 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 5A07A3853553 for ; Mon, 31 Oct 2022 15:46:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5A07A3853553 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 47E6132008FE; Mon, 31 Oct 2022 11:46:13 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:46:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231172; x= 1667317572; bh=AMkHSIVSxpnQ9JhADAVloJ8iyEtfy+UZm74iOHMR7pE=; b=J YuHEbqD2Epwaenkr0vwGD3MZaQTJdCUcpj68aTIpDqvVMTvpolDLaoY45ygucPzU Jkmxa4iagbREYe+ZyJxy70VhdIqluFeQy34izKPNrt37fV7LCRrkKDsHBPnQs0ra 7tbNyW3lhpnczoZW/df7qloS81QzSRUoM3WQZIKugWwiJ8QuqtpeCSSaGlN4J+JU OsHNqu0w8t/+rpL00ymOPReqKnxxiTmgzyxlfqDvLzk8FVQ7HIYaHATPo/63sEA3 GBFGukhACZgU3vgTZ+bfwxRn72v81HQ7XtXDl9haqkFChOfk9tmE5KvO+xWj5n3J 7dpEWu43Gle6+954Epkkg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231172; x=1667317572; 
From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 02/34] Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
Date: Mon, 31 Oct 2022 08:44:57 -0700
Message-Id: <20221031154529.3627576-3-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

Since THUMB_FUNC_START does not insert the ".text" directive, it aligns
more closely with the new FUNC_ENTRY macro and is renamed accordingly.

THUMB_FUNC_START usage has been universally synonymous with the
".force_thumb" directive, so this is now folded into the definition.
Usage of ".force_thumb" and ".thumb_func" is now tightly coupled
throughout the "arm" subdirectory.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

	* config/arm/lib1funcs.S:
	(THUMB_FUNC_START): Renamed to ...
	(THUMB_FUNC_ENTRY): for consistency; also added ".force_thumb".
	(_call_via_r0): Removed redundant preceding ".force_thumb".
	(__gnu_thumb1_case_sqi, __gnu_thumb1_case_uqi, __gnu_thumb1_case_shi,
	__gnu_thumb1_case_si): Removed redundant ".force_thumb" and ".syntax".
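As a minimal call-site sketch (illustration only, not part of the patch;
"my_thumb1_helper" is an invented name), a Thumb-1 entry point no longer
writes ".force_thumb" separately, because THUMB_FUNC_ENTRY now emits it:

	.text
	.align	0
	THUMB_FUNC_ENTRY my_thumb1_helper  @ emits .globl, .type, .thumb_func, .force_thumb
	bx	lr                         @ trivial body for the sketch
	SIZE	(my_thumb1_helper)
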
--- libgcc/config/arm/lib1funcs.S | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index a4fa62b3832..726984a9d1d 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -358,10 +358,11 @@ LSYM(Ldiv0): #define THUMB_CODE #endif -.macro THUMB_FUNC_START name +.macro THUMB_FUNC_ENTRY name .globl SYM (\name) TYPE (\name) .thumb_func + .force_thumb SYM (\name): .endm @@ -1944,10 +1945,9 @@ ARM_FUNC_START ctzsi2 .text .align 0 - .force_thumb .macro call_via register - THUMB_FUNC_START _call_via_\register + THUMB_FUNC_ENTRY _call_via_\register bx \register nop @@ -2030,7 +2030,7 @@ _arm_return_r11: .macro interwork_with_frame frame, register, name, return .code 16 - THUMB_FUNC_START \name + THUMB_FUNC_ENTRY \name bx pc nop @@ -2047,7 +2047,7 @@ _arm_return_r11: .macro interwork register .code 16 - THUMB_FUNC_START _interwork_call_via_\register + THUMB_FUNC_ENTRY _interwork_call_via_\register bx pc nop @@ -2084,7 +2084,7 @@ LSYM(Lchange_\register): /* The LR case has to be handled a little differently... */ .code 16 - THUMB_FUNC_START _interwork_call_via_lr + THUMB_FUNC_ENTRY _interwork_call_via_lr bx pc nop @@ -2112,9 +2112,7 @@ LSYM(Lchange_\register): .text .align 0 - .force_thumb - .syntax unified - THUMB_FUNC_START __gnu_thumb1_case_sqi + THUMB_FUNC_ENTRY __gnu_thumb1_case_sqi push {r1} mov r1, lr lsrs r1, r1, #1 @@ -2131,9 +2129,7 @@ LSYM(Lchange_\register): .text .align 0 - .force_thumb - .syntax unified - THUMB_FUNC_START __gnu_thumb1_case_uqi + THUMB_FUNC_ENTRY __gnu_thumb1_case_uqi push {r1} mov r1, lr lsrs r1, r1, #1 @@ -2150,9 +2146,7 @@ LSYM(Lchange_\register): .text .align 0 - .force_thumb - .syntax unified - THUMB_FUNC_START __gnu_thumb1_case_shi + THUMB_FUNC_ENTRY __gnu_thumb1_case_shi push {r0, r1} mov r1, lr lsrs r1, r1, #1 @@ -2170,9 +2164,7 @@ LSYM(Lchange_\register): .text .align 0 - .force_thumb - .syntax unified - THUMB_FUNC_START __gnu_thumb1_case_uhi + THUMB_FUNC_ENTRY __gnu_thumb1_case_uhi push {r0, r1} mov r1, lr lsrs r1, r1, #1 @@ -2190,9 +2182,7 @@ LSYM(Lchange_\register): .text .align 0 - .force_thumb - .syntax unified - THUMB_FUNC_START __gnu_thumb1_case_si + THUMB_FUNC_ENTRY __gnu_thumb1_case_si push {r0, r1} mov r1, lr adds.n r1, r1, #2 /* Align to word. 
*/

From patchwork Mon Oct 31 15:44:58 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59668
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
[10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFk9nD087238; Mon, 31 Oct 2022 08:46:09 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 03/34] Fix syntax warnings on conditional instructions Date: Mon, 31 Oct 2022 08:44:58 -0700 Message-Id: <20221031154529.3627576-4-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (RETLDM, ARM_DIV_BODY, ARM_MOD_BODY, _interwork_call_via_lr): Moved condition code after the flags update specifier "s". (ARM_FUNC_START, THUMB_LDIV0): Removed redundant ".syntax". --- libgcc/config/arm/lib1funcs.S | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 726984a9d1d..f2f82f9d509 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -204,7 +204,7 @@ LSYM(Lend_fde): # if defined(__thumb2__) pop\cond {\regs, lr} # else - ldm\cond\dirn sp!, {\regs, lr} + ldm\dirn\cond sp!, {\regs, lr} # endif .endif .ifnc "\unwind", "" @@ -220,7 +220,7 @@ LSYM(Lend_fde): # if defined(__thumb2__) pop\cond {\regs, pc} # else - ldm\cond\dirn sp!, {\regs, pc} + ldm\dirn\cond sp!, {\regs, pc} # endif .endif #endif @@ -292,7 +292,6 @@ LSYM(Lend_fde): pop {r1, pc} #elif defined(__thumb2__) - .syntax unified .ifc \signed, unsigned cbz r0, 1f mov r0, #0xffffffff @@ -429,7 +428,6 @@ SYM (__\name): /* For Thumb-2 we build everything in thumb mode. */ .macro ARM_FUNC_START name FUNC_START \name - .syntax unified .endm #define EQUIV .thumb_set .macro ARM_CALL name @@ -643,7 +641,7 @@ pc .req r15 orrhs \result, \result, \curbit, lsr #3 cmp \dividend, #0 @ Early termination? do_it ne, t - movnes \curbit, \curbit, lsr #4 @ No, any more bits to do? + movsne \curbit, \curbit, lsr #4 @ No, any more bits to do? 
movne \divisor, \divisor, lsr #4 bne 1b @@ -745,7 +743,7 @@ pc .req r15 subhs \dividend, \dividend, \divisor, lsr #3 cmp \dividend, #1 mov \divisor, \divisor, lsr #4 - subges \order, \order, #4 + subsge \order, \order, #4 bge 1b tst \order, #3 @@ -2093,7 +2091,7 @@ LSYM(Lchange_\register): .globl .Lchange_lr .Lchange_lr: tst lr, #1 - stmeqdb r13!, {lr, pc} + stmdbeq r13!, {lr, pc} mov ip, lr adreq lr, _arm_return bx ip From patchwork Mon Oct 31 15:44:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59661 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 692023852750 for ; Mon, 31 Oct 2022 15:47:19 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id EFD493853561 for ; Mon, 31 Oct 2022 15:46:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EFD493853561 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 8FE593200917; Mon, 31 Oct 2022 11:46:23 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Mon, 31 Oct 2022 11:46:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231183; x= 1667317583; bh=FI0GRIy3zXmYGot0W0fHA5+2eB/6jtr+Aavg4VvcTaM=; b=b nfkSPaK+LEgHCrnHIwIIMOvM5hLRcuwyYPCgWQzpQIsfaKm3aCv5PZxtEsY3r9RY gYJyALICvlm33S/tc7/gg8HsOjZ2idPBQ3GhG032sRbj1LbdHgCz+CsS13pEQoog flwY62jLSTSGCbNU6jInTTFpH+W5xo2Q0xzTFpgfK5VVxgBkdnEZ2P4v+8fkSByJ jlg7Nur42loA7HbxjM15CIJ9pNdD3yIqy7HK0rTksEyg0ym/qaXhdcu2Bs0jLvvW hUyvheeoQ5xcX20fYvP0ggs/NutpW2euwLsZTmzVGWZ+viUGS733sRk5POiLcq/Z EHuk4BPN8m5RqiZ+5n5IA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231183; x=1667317583; bh=FI0GRIy3zXmYG ot0W0fHA5+2eB/6jtr+Aavg4VvcTaM=; b=VtLtHoFLVxG6Y0nbAwablPiHaSTZd +bN0BEJwx7DTyv/Py7D1OiKmEbsvKPPRCirCTEYBpW5D4fs53uzMB1VF29OyMDtL 0Pv8w/dBwg4XRvRqWtTpGZ9WBMUPdTvuw5cI9ToQPaMrJXKJcNInqrIlxCw8MI8T A/7DwsPktjrT1l72+Kt3CsfVIWiJoyubVARTYGDswT+bvy+w47foAsrbKwLJgge3 z9X3Xtxjj9jZCGps42pR7IKOUon3pUu0frRin1H7tHmy78uoflVK8AINC/aO8Dos NIIbbTKJuQlnpz2fo4OuvAxn5zmlPfpFmUZzf90GGQCJjeumpzvzPqU/g== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg 
htthgvrhhnpefggeekleduieelueegvdehvdegjedvgefgtddvfeeuvedvveetffehhfdu jefggeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe hgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:46:22 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkEZh087241; Mon, 31 Oct 2022 08:46:14 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 04/34] Reorganize LIB1ASMFUNCS object wrapper macros Date: Mon, 31 Oct 2022 08:44:59 -0700 Message-Id: <20221031154529.3627576-5-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/t-elf (LIB1ASMFUNCS): Split macros into logical groups. --- libgcc/config/arm/t-elf | 66 +++++++++++++++++++++++++++++++++-------- 1 file changed, 53 insertions(+), 13 deletions(-) diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 9da6cd37054..93ea1cd8f76 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -14,19 +14,59 @@ LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3 endif endif # !__symbian__ -# For most CPUs we have an assembly soft-float implementations. -# However this is not true for ARMv6M. Here we want to use the soft-fp C -# implementation. The soft-fp code is only build for ARMv6M. This pulls -# in the asm implementation for other CPUs. -LIB1ASMFUNCS += _udivsi3 _divsi3 _umodsi3 _modsi3 _dvmd_tls _bb_init_func \ - _call_via_rX _interwork_call_via_rX \ - _lshrdi3 _ashrdi3 _ashldi3 \ - _arm_negdf2 _arm_addsubdf3 _arm_muldivdf3 _arm_cmpdf2 _arm_unorddf2 \ - _arm_fixdfsi _arm_fixunsdfsi \ - _arm_truncdfsf2 _arm_negsf2 _arm_addsubsf3 _arm_muldivsf3 \ - _arm_cmpsf2 _arm_unordsf2 _arm_fixsfsi _arm_fixunssfsi \ - _arm_floatdidf _arm_floatdisf _arm_floatundidf _arm_floatundisf \ - _clzsi2 _clzdi2 _ctzsi2 +# This pulls in the available assembly function implementations. +# The soft-fp code is only built for ARMv6M, since there is no +# assembly implementation here for double-precision values. + + +# Group 1: Integer function objects. +LIB1ASMFUNCS += \ + _ashldi3 \ + _ashrdi3 \ + _lshrdi3 \ + _clzdi2 \ + _clzsi2 \ + _ctzsi2 \ + _dvmd_tls \ + _divsi3 \ + _modsi3 \ + _udivsi3 \ + _umodsi3 \ + + +# Group 2: Single precision floating point function objects. 
+LIB1ASMFUNCS += \ + _arm_addsubsf3 \ + _arm_cmpsf2 \ + _arm_fixsfsi \ + _arm_fixunssfsi \ + _arm_floatdisf \ + _arm_floatundisf \ + _arm_muldivsf3 \ + _arm_negsf2 \ + _arm_unordsf2 \ + + +# Group 3: Double precision floating point function objects. +LIB1ASMFUNCS += \ + _arm_addsubdf3 \ + _arm_cmpdf2 \ + _arm_fixdfsi \ + _arm_fixunsdfsi \ + _arm_floatdidf \ + _arm_floatundidf \ + _arm_muldivdf3 \ + _arm_negdf2 \ + _arm_truncdfsf2 \ + _arm_unorddf2 \ + + +# Group 4: Miscellaneous function objects. +LIB1ASMFUNCS += \ + _bb_init_func \ + _call_via_rX \ + _interwork_call_via_rX \ + # Currently there is a bug somewhere in GCC's alias analysis # or scheduling code that is breaking _fpmul_parts in fp-bit.c. From patchwork Mon Oct 31 15:45:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59665 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EDE9A38460AB for ; Mon, 31 Oct 2022 15:47:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id BDCA53853570 for ; Mon, 31 Oct 2022 15:46:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BDCA53853570 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id A513032008FE; Mon, 31 Oct 2022 11:46:28 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:46:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231188; x= 1667317588; bh=bizKXgz+Av2KRD555WC7aw8ta7XIGtFIpQMxI1WEAC4=; b=I NHrbvZs2NVc373Ua1xkKcREuMt1YMTuRhyALP0lMABBBUfCyQse3i3drppwtC53I X58rY/uAOKBUiLtOiEmbF9RxR4dz+pugKjalP445PTYBOcwNOBiGbihDlZIQ5wy6 EFATQShvHib58p+bO8cu3pnV4P03w2VQhL+DjqB+xItualtG6bknQRqzB2wx1dLu 8fxeCTAFPogEHKWQi9Ubl5eY1lNcF3XOeEkeLyL+hbHNnM/OFwj2J2SrIrYx58uf 1Se1+Z25MBeJRNrhFSz87mc6Aj6xHlbKZBvU2tfc0jIV9g2J/JnwdCserWLsIhQP JdwpkyMd9XmicxvSo3pCA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231188; x=1667317588; bh=bizKXgz+Av2KR D555WC7aw8ta7XIGtFIpQMxI1WEAC4=; b=KDg2/Y2UwRZyiJ8GGdQstvAujd+HR o64rvF8tjuFlWmU/scuvRO4zF7HF9aN9LDydtAyCYjt9sQ0vqRx+lVL28JGvZelV WczzsVcIFU1ilQgGYGuIDuR+CG5fEO/Eg4pWqgMmoQDlAkqD0RKv71vTtPEtGOqH 0TJabAVkHNWQyolnvPiHBo6QSqKXkcpzgvlgRfwSxkaSVCx/fO0d9Bjn5Ia8p9ce aRfM9Hvmdfq2jxmgnuQK/fMcQPvykA9AmmJapgx7owQyDfsuoutjQGA8FivCjqAe IKhlhXvtZbaudSvmJ1YQsxf1zoHOv50Vn6avoQOu9GiKRBGhH98uhGncQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdejlecutefuodetggdotefrodftvf 
From: Daniel Engel
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 05/34] Add the __HAVE_FEATURE_IT and IT() macros
Date: Mon, 31 Oct 2022 08:45:00 -0700
Message-Id: <20221031154529.3627576-6-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

These macros complement and extend the existing do_it() macro.
Together, they streamline the process of optimizing short branchless
conditional sequences to support ARM, Thumb-2, and Thumb-1.

The inherent architecture limitations of Thumb-1 mean that writing
assembly code is somewhat more tedious.  And, while such code will run
unmodified in an ARM or Thumb-2 environment, it will lack one of the
key performance optimizations available there.

The first idea might be to split an instruction sequence with
#ifdef(s): one path for Thumb-1 and the other for ARM/Thumb-2.  This
could suffice if conditional execution optimizations were rare.
However, #ifdef(s) break the flow of an algorithm and shift focus to
the architectural differences instead of the similarities.  On
functions with a high percentage of conditional execution, it starts
to become attractive to split everything into distinct
architecture-specific function objects -- even when the underlying
algorithm is identical.  Additionally, duplicated code and comments
(whether an individual operand, a line, or a larger block) become a
future maintenance liability if the two versions aren't kept in sync.

See code comments for limitations and expected usage.

gcc/libgcc/ChangeLog:
2022-10-09  Daniel Engel

	* config/arm/lib1funcs.S (__HAVE_FEATURE_IT, IT): New macros.
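Before the diff, a brief usage sketch (not taken from the patch; the
clamp operation and the numeric label are invented for illustration)
shows how one sequence can serve ARM, Thumb-2, and Thumb-1, following
the if/then pattern documented in the new code comments:

	cmp	r0, #0              @ is the result negative?
#ifdef __HAVE_FEATURE_IT
	do_it	lt                  @ ARM/Thumb-2: predicate the next instruction
#else
	bge	1f                  @ Thumb-1: branch around the 'true' block
#endif
	IT(mov,lt) r0, #0           @ assembles as 'movlt' or plain 'movs'
1:                                  @ unconditional execution resumes here
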
--- libgcc/config/arm/lib1funcs.S | 68 +++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index f2f82f9d509..7a941ee9fc8 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -230,6 +230,7 @@ LSYM(Lend_fde): ARM and Thumb-2. However this is only supported by recent gas, so define a set of macros to allow ARM code on older assemblers. */ #if defined(__thumb2__) +#define __HAVE_FEATURE_IT .macro do_it cond, suffix="" it\suffix \cond .endm @@ -245,6 +246,9 @@ LSYM(Lend_fde): \name \dest, \src1, \tmp .endm #else +#if !defined(__thumb__) +#define __HAVE_FEATURE_IT +#endif .macro do_it cond, suffix="" .endm .macro shift1 op, arg0, arg1, arg2 @@ -259,6 +263,70 @@ LSYM(Lend_fde): #define COND(op1, op2, cond) op1 ## op2 ## cond + +/* The IT() macro streamlines the construction of short branchless contitional + sequences that support ARM, Thumb-2, and Thumb-1. It is intended as an + extension to the .do_it macro defined above. Code not written with the + intent to support Thumb-1 need not use IT(). + + IT()'s main advantage is the minimization of syntax differences. Unified + functions can support Thumb-1 without imposiing an undue performance + penalty on ARM and Thumb-2. Writing code without duplicate instructions + and operands keeps the high level function flow clearer and should reduce + the incidence of maintenance bugs. + + Where conditional execution is supported by ARM and Thumb-2, the specified + instruction compiles with the conditional suffix 'c'. + + Where Thumb-1 and v6m do not support IT, the given instruction compiles + with the standard unified syntax suffix "s", and a preceding branch + instruction is required to implement conditional behavior. + + (Aside: The Thumb-1 "s"-suffix pattern is somewhat simplistic, since it + does not support 'cmp' or 'tst' with a non-"s" suffix. It also appends + "s" to 'mov' and 'add' with high register operands which are otherwise + legal on v6m. Use of IT() will result in a compiler error for all of + these exceptional cases, and a full #ifdef code split will be required. + However, it is unlikely that code written with Thumb-1 compatibility + in mind will use such patterns, so IT() still promises a good value.) + + Typical if/then/else usage is: + + #ifdef __HAVE_FEATURE_IT + // ARM and Thumb-2 'true' condition. + do_it c, tee + #else + // Thumb-1 'false' condition. This must be opposite the + // sense of the ARM and Thumb-2 condition, since the + // branch is taken to skip the 'true' instruction block. + b!c else_label + #endif + + // Conditional 'true' execution for all compile modes. + IT(ins1,c) op1, op2 + IT(ins2,c) op1, op2 + + #ifndef __HAVE_FEATURE_IT + // Thumb-1 branch to skip the 'else' instruction block. + // Omitted for if/then usage. + b end_label + #endif + + else_label: + // Conditional 'false' execution for all compile modes. + // Omitted for if/then usage. + IT(ins3,!c) op1, op2 + IT(ins4,!c) op1, op2 + + end_label: + // Unconditional execution resumes here. 
+ */ +#ifdef __HAVE_FEATURE_IT + #define IT(ins,c) ins##c +#else + #define IT(ins,c) ins##s +#endif + #ifdef __ARM_EABI__ .macro ARM_LDIV0 name signed cmp r0, #0 From patchwork Mon Oct 31 15:45:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59663 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D5A80381E5F8 for ; Mon, 31 Oct 2022 15:47:34 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 31688385357A for ; Mon, 31 Oct 2022 15:46:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 31688385357A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 047CC3200973; Mon, 31 Oct 2022 11:46:33 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:46:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231193; x= 1667317593; bh=WbP+kV6JJJy1YwCW0ItrkG9IWt6EFpHze21uW634wsg=; b=y GT1cUuwYR6DqOorFZ7uMwNz8LOFbuEY8/5Y5rYC+Gj6EWcDsVKz/ROa2py4PdM9K BAwDVLo7GXJCZJn4TP8p95+UEIENIM//mm6G0/HDtMnxbfM3YEK9khFZhvVhmHvm KfEfpMVml35Zx48va00HL90mAci/o9WY0S4BMTRy7uzGbVFz27t/t0dmdRGZVZUv X7w9Pi394Lh5kzBjm/+a412cFTgAkOycpDx+UJpTUnLYCJ5QZvejePnZPbhodICy vJSO54cprMICPNouzi5Q8RF9G+nQrOdI/skFW0PQgG4JCwxrGtooV4IyVI3k1WYD 7NhSHZVOMg0BzRHNUdq/w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231193; x=1667317593; bh=WbP+kV6JJJy1Y wCW0ItrkG9IWt6EFpHze21uW634wsg=; b=UUHLDKNAVo28wUcUr3d0eD2g5YrfH b89W9ZK+RwT2BebOlIezpMRfimVKXOtF7EWCA+syCE3fMgLTQQSl1XHxmjo4OqP0 2x2syPaz88B+3IUVOxWvligB4fcphWxeNERmpbkrPz9fUN2EwXQUkihQ0wjyiEMf VgvL6y8nvxIDAshGMoVCPSRbKQKOe4mb9+5yaW0Fjfr31ZS90VgcsO03hDdDG9LH 8ZhxHHca9r5pNcJtJ8dXfMTExxqQQJQnkJJgOEg8KLCbfL8GM1Y584auh3M9PeWZ YqPSGwB1zLVGVwsmW+f99FwJKQ/5ZyKP+xLZdRYH+z281ezYSdo7QP44g== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeduhefhvdekvefgkedtjeefvedtuefhiefhjeevueelffdvtdejieejkedv gfetueenucffohhmrghinheptghliidvrdhssgdpghhnuhdrohhrghdplhhisgdufhhunh gtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhho mhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: 
i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:46:32 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkOYn087247; Mon, 31 Oct 2022 08:46:24 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 06/34] Refactor 'clz' functions into a new file Date: Mon, 31 Oct 2022 08:45:01 -0700 Message-Id: <20221031154529.3627576-7-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__clzsi2i, __clzdi2): Moved to ... * config/arm/clz2.S: New file. --- libgcc/config/arm/clz2.S | 145 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 123 +--------------------------- 2 files changed, 146 insertions(+), 122 deletions(-) create mode 100644 libgcc/config/arm/clz2.S diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S new file mode 100644 index 00000000000..439341752ba --- /dev/null +++ b/libgcc/config/arm/clz2.S @@ -0,0 +1,145 @@ +/* Copyright (C) 1995-2022 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. 
*/ + + +#ifdef L_clzsi2 +#ifdef NOT_ISA_TARGET_32BIT +FUNC_START clzsi2 + movs r1, #28 + movs r3, #1 + lsls r3, r3, #16 + cmp r0, r3 /* 0x10000 */ + bcc 2f + lsrs r0, r0, #16 + subs r1, r1, #16 +2: lsrs r3, r3, #8 + cmp r0, r3 /* #0x100 */ + bcc 2f + lsrs r0, r0, #8 + subs r1, r1, #8 +2: lsrs r3, r3, #4 + cmp r0, r3 /* #0x10 */ + bcc 2f + lsrs r0, r0, #4 + subs r1, r1, #4 +2: adr r2, 1f + ldrb r0, [r2, r0] + adds r0, r0, r1 + bx lr +.align 2 +1: +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 + FUNC_END clzsi2 +#else +ARM_FUNC_START clzsi2 +# if defined (__ARM_FEATURE_CLZ) + clz r0, r0 + RET +# else + mov r1, #28 + cmp r0, #0x10000 + do_it cs, t + movcs r0, r0, lsr #16 + subcs r1, r1, #16 + cmp r0, #0x100 + do_it cs, t + movcs r0, r0, lsr #8 + subcs r1, r1, #8 + cmp r0, #0x10 + do_it cs, t + movcs r0, r0, lsr #4 + subcs r1, r1, #4 + adr r2, 1f + ldrb r0, [r2, r0] + add r0, r0, r1 + RET +.align 2 +1: +.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 +# endif /* !defined (__ARM_FEATURE_CLZ) */ + FUNC_END clzsi2 +#endif +#endif /* L_clzsi2 */ + +#ifdef L_clzdi2 +#if !defined (__ARM_FEATURE_CLZ) + +# ifdef NOT_ISA_TARGET_32BIT +FUNC_START clzdi2 + push {r4, lr} + cmp xxh, #0 + bne 1f +# ifdef __ARMEB__ + movs r0, xxl + bl __clzsi2 + adds r0, r0, #32 + b 2f +1: + bl __clzsi2 +# else + bl __clzsi2 + adds r0, r0, #32 + b 2f +1: + movs r0, xxh + bl __clzsi2 +# endif +2: + pop {r4, pc} +# else /* NOT_ISA_TARGET_32BIT */ +ARM_FUNC_START clzdi2 + do_push {r4, lr} + cmp xxh, #0 + bne 1f +# ifdef __ARMEB__ + mov r0, xxl + bl __clzsi2 + add r0, r0, #32 + b 2f +1: + bl __clzsi2 +# else + bl __clzsi2 + add r0, r0, #32 + b 2f +1: + mov r0, xxh + bl __clzsi2 +# endif +2: + RETLDM r4 + FUNC_END clzdi2 +# endif /* NOT_ISA_TARGET_32BIT */ + +#else /* defined (__ARM_FEATURE_CLZ) */ + +ARM_FUNC_START clzdi2 + cmp xxh, #0 + do_it eq, et + clzeq r0, xxl + clzne r0, xxh + addeq r0, r0, #32 + RET + FUNC_END clzdi2 + +#endif +#endif /* L_clzdi2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 7a941ee9fc8..469fea9ab5c 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1803,128 +1803,7 @@ LSYM(Lover12): #endif /* __symbian__ */ -#ifdef L_clzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzsi2 - movs r1, #28 - movs r3, #1 - lsls r3, r3, #16 - cmp r0, r3 /* 0x10000 */ - bcc 2f - lsrs r0, r0, #16 - subs r1, r1, #16 -2: lsrs r3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrs r0, r0, #8 - subs r1, r1, #8 -2: lsrs r3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrs r0, r0, #4 - subs r1, r1, #4 -2: adr r2, 1f - ldrb r0, [r2, r0] - adds r0, r0, r1 - bx lr -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 - FUNC_END clzsi2 -#else -ARM_FUNC_START clzsi2 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - RET -# else - mov r1, #28 - cmp r0, #0x10000 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrb r0, [r2, r0] - add r0, r0, r1 - RET -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 -# endif /* !defined (__ARM_FEATURE_CLZ) */ - FUNC_END clzsi2 -#endif -#endif /* L_clzsi2 */ - -#ifdef L_clzdi2 -#if !defined (__ARM_FEATURE_CLZ) - -# ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzdi2 - push {r4, lr} - cmp xxh, #0 - bne 1f -# ifdef __ARMEB__ - movs r0, xxl - bl __clzsi2 - adds r0, r0, #32 - b 2f -1: - bl __clzsi2 -# else - bl __clzsi2 - adds r0, r0, #32 - b 
2f -1: - movs r0, xxh - bl __clzsi2 -# endif -2: - pop {r4, pc} -# else /* NOT_ISA_TARGET_32BIT */ -ARM_FUNC_START clzdi2 - do_push {r4, lr} - cmp xxh, #0 - bne 1f -# ifdef __ARMEB__ - mov r0, xxl - bl __clzsi2 - add r0, r0, #32 - b 2f -1: - bl __clzsi2 -# else - bl __clzsi2 - add r0, r0, #32 - b 2f -1: - mov r0, xxh - bl __clzsi2 -# endif -2: - RETLDM r4 - FUNC_END clzdi2 -# endif /* NOT_ISA_TARGET_32BIT */ - -#else /* defined (__ARM_FEATURE_CLZ) */ - -ARM_FUNC_START clzdi2 - cmp xxh, #0 - do_it eq, et - clzeq r0, xxl - clzne r0, xxh - addeq r0, r0, #32 - RET - FUNC_END clzdi2 - -#endif -#endif /* L_clzdi2 */ +#include "clz2.S" #ifdef L_ctzsi2 #ifdef NOT_ISA_TARGET_32BIT From patchwork Mon Oct 31 15:45:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59662 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 059B8388B69B for ; Mon, 31 Oct 2022 15:47:28 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id DDBE03853563 for ; Mon, 31 Oct 2022 15:46:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDBE03853563 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id C6F5E320097C; Mon, 31 Oct 2022 11:46:38 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Mon, 31 Oct 2022 11:46:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231198; x= 1667317598; bh=Yx8qmfJTmwQ7MBrKdMFqiUuFGaqdy08mMFpQiJmUfvs=; b=D Xc1blFVkT63UQzgUzkoXLEfZlfNz1S2j5IVpkEikhvLFhqhZPkbK9vTrRUyJizdg HH93rY6cprI02CJoteEyxJjk55bA2o3+w+IHgQpzbOmXq3tLIrE3eziFkcD+Hn1s 4nSRCW4scmlThcgkIN0M+4HqRKorCSC8RgdNBaVQiWa6rnHHdMmIZANBm3qRbCTy zG/w1NFextYx6oK2uSn+t8ZnzNd+qtVIAqPofMM5a/sQQCvfuElsz1E6N08OUavO 2ZHFehoOK7TuvWiqrvk0+Co/fvIzH9K0y39qNXXhev8FFMgXdRa0y2dICEXAdWPX XXi6bVWMquH7KQbiCH29g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231198; x=1667317598; bh=Yx8qmfJTmwQ7M BrKdMFqiUuFGaqdy08mMFpQiJmUfvs=; b=LxHxo4u+u+VrbsIiH9og6RKPAV+Qk JiHXqZnJ4nX4T5HIZ4RovuN+hABgmkbYlLUpn93S/a8NueBAqhseo2m0GeXZ+QZd l+KJLPPqtELgTXvY4l8EYwCDUXZPf0sbk6DzA1UsXgqEHHIXzgWtWbXdrtH4NrCo FxpYmFOaR300ApYaQ4nwWYBa5jZTlHS0R0gdizAdniFFYm0388jRcY6cj5Ki8BS1 90oiUFHelKWQVBnJQfVfhTVNcASxcnexNECiLSLt3LwGyi56T1UwFyOun3ygS2gr PtAX1d52F/LOqVR0iXxa3Ar5AvwBcKxdDbWJdZMXniIk+9KbPQevGwfYg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf 
curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeefhfefgffgkeehteeftdevteeftdekieeugfeiffeggedviedufefgheet teevueenucffohhmrghinheptghtiidvrdhssgdpghhnuhdrohhrghdplhhisgdufhhunh gtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhho mhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:46:37 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkT8d087250; Mon, 31 Oct 2022 08:46:29 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 07/34] Refactor 'ctz' functions into a new file Date: Mon, 31 Oct 2022 08:45:02 -0700 Message-Id: <20221031154529.3627576-8-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ctzsi2): Moved to ... * config/arm/ctz2.S: New file. --- libgcc/config/arm/ctz2.S | 86 +++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 65 +------------------------- 2 files changed, 87 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/ctz2.S diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S new file mode 100644 index 00000000000..1d885dcc71a --- /dev/null +++ b/libgcc/config/arm/ctz2.S @@ -0,0 +1,86 @@ +/* Copyright (C) 1995-2022 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see +. */ + + +#ifdef L_ctzsi2 +#ifdef NOT_ISA_TARGET_32BIT +FUNC_START ctzsi2 + negs r1, r0 + ands r0, r0, r1 + movs r1, #28 + movs r3, #1 + lsls r3, r3, #16 + cmp r0, r3 /* 0x10000 */ + bcc 2f + lsrs r0, r0, #16 + subs r1, r1, #16 +2: lsrs r3, r3, #8 + cmp r0, r3 /* #0x100 */ + bcc 2f + lsrs r0, r0, #8 + subs r1, r1, #8 +2: lsrs r3, r3, #4 + cmp r0, r3 /* #0x10 */ + bcc 2f + lsrs r0, r0, #4 + subs r1, r1, #4 +2: adr r2, 1f + ldrb r0, [r2, r0] + subs r0, r0, r1 + bx lr +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 + FUNC_END ctzsi2 +#else +ARM_FUNC_START ctzsi2 + rsb r1, r0, #0 + and r0, r0, r1 +# if defined (__ARM_FEATURE_CLZ) + clz r0, r0 + rsb r0, r0, #31 + RET +# else + mov r1, #28 + cmp r0, #0x10000 + do_it cs, t + movcs r0, r0, lsr #16 + subcs r1, r1, #16 + cmp r0, #0x100 + do_it cs, t + movcs r0, r0, lsr #8 + subcs r1, r1, #8 + cmp r0, #0x10 + do_it cs, t + movcs r0, r0, lsr #4 + subcs r1, r1, #4 + adr r2, 1f + ldrb r0, [r2, r0] + sub r0, r0, r1 + RET +.align 2 +1: +.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 +# endif /* !defined (__ARM_FEATURE_CLZ) */ + FUNC_END ctzsi2 +#endif +#endif /* L_clzsi2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 469fea9ab5c..6cf7561835d 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1804,70 +1804,7 @@ LSYM(Lover12): #endif /* __symbian__ */ #include "clz2.S" - -#ifdef L_ctzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START ctzsi2 - negs r1, r0 - ands r0, r0, r1 - movs r1, #28 - movs r3, #1 - lsls r3, r3, #16 - cmp r0, r3 /* 0x10000 */ - bcc 2f - lsrs r0, r0, #16 - subs r1, r1, #16 -2: lsrs r3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrs r0, r0, #8 - subs r1, r1, #8 -2: lsrs r3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrs r0, r0, #4 - subs r1, r1, #4 -2: adr r2, 1f - ldrb r0, [r2, r0] - subs r0, r0, r1 - bx lr -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 - FUNC_END ctzsi2 -#else -ARM_FUNC_START ctzsi2 - rsb r1, r0, #0 - and r0, r0, r1 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - rsb r0, r0, #31 - RET -# else - mov r1, #28 - cmp r0, #0x10000 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrb r0, [r2, r0] - sub r0, r0, r1 - RET -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 -# endif /* !defined (__ARM_FEATURE_CLZ) */ - FUNC_END ctzsi2 -#endif -#endif /* L_clzsi2 */ +#include "ctz2.S" /* ------------------------------------------------------------------------ */ /* These next two sections are here despite the fact that they contain Thumb From patchwork Mon Oct 31 15:45:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59667 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 76CCA38AA240 for ; Mon, 31 Oct 2022 15:48:22 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id AC8D73852763 for ; Mon, 31 Oct 2022 
15:46:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AC8D73852763 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 7EAF43200917; Mon, 31 Oct 2022 11:46:50 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Mon, 31 Oct 2022 11:46:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231210; x= 1667317610; bh=lt0pELubRSOPFzEnRpwkHqGaImwq+QxIgNDhPngg6TI=; b=d 3ZxtrWL4uP6FJwl/r1xrmUtqkoqju2phZWAMjr/jvnQpktXBWmoyT+xdQNhOYEb8 9EOgOsdPAK+PYcXp5/i5FwdPua5cLIHNeWM7/+pFcLiech50egAWbu6tfXaOeWK+ QSZJwd7N1KPUzZR3Jm6xq/bInRxo+KT3ROsQbcVK3/IuwJczjTArnjhICBZo8gWz bmVh7GNUrsOSUrm+9PIJ1QYN1txaPRX7vRr08Fsm4G2BP2FLgOOjbZNNh71AOECR TNuK4feERZ439dS4repZacOhmnv465r9egS0VRQOKuel1g32CQkduTG7GLC0y4Tu 0HiQ9y3wkG0C0y2EuzQEw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231210; x=1667317610; bh=lt0pELubRSOPF zEnRpwkHqGaImwq+QxIgNDhPngg6TI=; b=kIDcqaVG+yd/h4nAWLNQ3w2aLmrJD B7HzbGRDFVw4WcfOF1+c28Qole4avIKaS/W2eiiLQtYghvcOmf7tDnIPvIDDqPnT UVIz16VDrWtLIe2IFZpWXofrwZK6kpjD23a4JCIUT+sbtlAfxl3Rizb6bY2SrfWe zQvtzNFSQ6bu4txUXizp8hVZuqO5XT4DrRqUyieyzhqdUFZEYsXuGzi+6ziJ6r4M w973DVOd3QXIE/ZPtS10Rge9gSgfosmkCQpXdnFNDP0yAHeFVJs6lFORlonyiu24 89Gv0AhpihuK4uzuIcGR5UNjk3N0sq3qksiSiyRlnVuyEqHk7OKyBKAJA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpedvleevueetiefglefgteekkeeftdekhfeileelffekfefhieettddtfefg gfevkeenucffohhmrghinheplhhshhhifhhtrdhssgdpghhnuhdrohhrghdplhhisgdufh hunhgtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhf rhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:46:49 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkewt087253; Mon, 31 Oct 2022 08:46:40 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 08/34] Refactor 64-bit shift functions into a new file Date: Mon, 31 Oct 2022 08:45:03 -0700 Message-Id: <20221031154529.3627576-9-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, 
JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_20_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/lib1funcs.S (__ashldi3, __ashrdi3, __lshldi3): Moved to ... * config/arm/eabi/lshift.S: New file. --- libgcc/config/arm/eabi/lshift.S | 123 ++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 103 +------------------------- 2 files changed, 124 insertions(+), 102 deletions(-) create mode 100644 libgcc/config/arm/eabi/lshift.S diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S new file mode 100644 index 00000000000..6e79d96c118 --- /dev/null +++ b/libgcc/config/arm/eabi/lshift.S @@ -0,0 +1,123 @@ +/* Copyright (C) 1995-2022 Free Software Foundation, Inc. + +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by the +Free Software Foundation; either version 3, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + + +#ifdef L_lshrdi3 + + FUNC_START lshrdi3 + FUNC_ALIAS aeabi_llsr lshrdi3 + +#ifdef __thumb__ + lsrs al, r2 + movs r3, ah + lsrs ah, r2 + mov ip, r3 + subs r2, #32 + lsrs r3, r2 + orrs al, r3 + negs r2, r2 + mov r3, ip + lsls r3, r2 + orrs al, r3 + RET +#else + subs r3, r2, #32 + rsb ip, r2, #32 + movmi al, al, lsr r2 + movpl al, ah, lsr r3 + orrmi al, al, ah, lsl ip + mov ah, ah, lsr r2 + RET +#endif + FUNC_END aeabi_llsr + FUNC_END lshrdi3 + +#endif + +#ifdef L_ashrdi3 + + FUNC_START ashrdi3 + FUNC_ALIAS aeabi_lasr ashrdi3 + +#ifdef __thumb__ + lsrs al, r2 + movs r3, ah + asrs ah, r2 + subs r2, #32 + @ If r2 is negative at this point the following step would OR + @ the sign bit into all of AL. That's not what we want... 
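+	@ (That is the case whenever the shift count is less than 32; the
+	@ 'bmi' below skips the cross-word step, and the remainder is
+	@ formed after label 1 instead.)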
+ bmi 1f + mov ip, r3 + asrs r3, r2 + orrs al, r3 + mov r3, ip +1: + negs r2, r2 + lsls r3, r2 + orrs al, r3 + RET +#else + subs r3, r2, #32 + rsb ip, r2, #32 + movmi al, al, lsr r2 + movpl al, ah, asr r3 + orrmi al, al, ah, lsl ip + mov ah, ah, asr r2 + RET +#endif + + FUNC_END aeabi_lasr + FUNC_END ashrdi3 + +#endif + +#ifdef L_ashldi3 + + FUNC_START ashldi3 + FUNC_ALIAS aeabi_llsl ashldi3 + +#ifdef __thumb__ + lsls ah, r2 + movs r3, al + lsls al, r2 + mov ip, r3 + subs r2, #32 + lsls r3, r2 + orrs ah, r3 + negs r2, r2 + mov r3, ip + lsrs r3, r2 + orrs ah, r3 + RET +#else + subs r3, r2, #32 + rsb ip, r2, #32 + movmi ah, ah, lsl r2 + movpl ah, al, lsl r3 + orrmi ah, ah, al, lsr ip + mov al, al, lsl r2 + RET +#endif + FUNC_END aeabi_llsl + FUNC_END ashldi3 + +#endif + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 6cf7561835d..aa5957b8399 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1699,108 +1699,7 @@ LSYM(Lover12): /* Prevent __aeabi double-word shifts from being produced on SymbianOS. */ #ifndef __symbian__ - -#ifdef L_lshrdi3 - - FUNC_START lshrdi3 - FUNC_ALIAS aeabi_llsr lshrdi3 - -#ifdef __thumb__ - lsrs al, r2 - movs r3, ah - lsrs ah, r2 - mov ip, r3 - subs r2, #32 - lsrs r3, r2 - orrs al, r3 - negs r2, r2 - mov r3, ip - lsls r3, r2 - orrs al, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, lsr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, lsr r2 - RET -#endif - FUNC_END aeabi_llsr - FUNC_END lshrdi3 - -#endif - -#ifdef L_ashrdi3 - - FUNC_START ashrdi3 - FUNC_ALIAS aeabi_lasr ashrdi3 - -#ifdef __thumb__ - lsrs al, r2 - movs r3, ah - asrs ah, r2 - subs r2, #32 - @ If r2 is negative at this point the following step would OR - @ the sign bit into all of AL. That's not what we want... 
- bmi 1f - mov ip, r3 - asrs r3, r2 - orrs al, r3 - mov r3, ip -1: - negs r2, r2 - lsls r3, r2 - orrs al, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, asr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, asr r2 - RET -#endif - - FUNC_END aeabi_lasr - FUNC_END ashrdi3 - -#endif - -#ifdef L_ashldi3 - - FUNC_START ashldi3 - FUNC_ALIAS aeabi_llsl ashldi3 - -#ifdef __thumb__ - lsls ah, r2 - movs r3, al - lsls al, r2 - mov ip, r3 - subs r2, #32 - lsls r3, r2 - orrs ah, r3 - negs r2, r2 - mov r3, ip - lsrs r3, r2 - orrs ah, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi ah, ah, lsl r2 - movpl ah, al, lsl r3 - orrmi ah, ah, al, lsr ip - mov al, al, lsl r2 - RET -#endif - FUNC_END aeabi_llsl - FUNC_END ashldi3 - -#endif - +#include "eabi/lshift.S" #endif /* __symbian__ */ #include "clz2.S" From patchwork Mon Oct 31 15:45:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59669 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8D9B73953801 for ; Mon, 31 Oct 2022 15:48:53 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id BE8133860777 for ; Mon, 31 Oct 2022 15:46:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE8133860777 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 8E0E13200094; Mon, 31 Oct 2022 11:46:55 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Mon, 31 Oct 2022 11:46:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231215; x= 1667317615; bh=NQHA6C/JVA87/+133/0AHHXVcYpD7PJcCLZehkbDAV0=; b=p jgZUOmMEfRXOLpN/SUKk7hQp1TdAAYvTdWouNIqk1YnXOzO1VN6Z2lmmyH3mwnhX EzPrKzqVovcStwXGgsXAhRYvTrp6upw26UftOiornQvyF42rP0wGmUcwqEwh2KmF Y6uygykVkt891icOz/8aXhgPrCFSX81H0+USSFzRCp6tq8155Zem4yLk8180dpK1 FDe7HyIgMYygcckPrGjX2J5kv+Svm/Kg0s7vo5NTMtCaPH+1lqlCfbdoGVmUdIAr 4N+03a22ExwEkOFtOgTr6Pk6RKbWImpTqLMESuEi9SFkptXnrZViuq6Myufssvwp 09HlKxy3DqY+3Wcnw3wAg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231215; x=1667317615; bh=NQHA6C/JVA87/ +133/0AHHXVcYpD7PJcCLZehkbDAV0=; b=MxbToeHCrNtURPyLSqWHjQKz/2Bot J7woENAKiZ3fFxV3PFLOfbFsaknnPLBWoNMZypz/JODxYjKZ600avGJQp/UzLPag sCl5hmzOWvNytTlrN5vKeQ8iXhgFBVRRYXCy6gs+agJCqG4AFzmaWqZwe3yo5Dtl afwvM6l1AWzO79vYDBjuhkM20fbT9frT2Vk3sCDtGndBYjSxKvDLEROCVdi42ZJU hhP8XBJlSGzV9zYzFVIrRdOu+q1KBfyM4/xoaMVaTMRyEH1TMcIaXLAAzttH3u8D Y5PxbALXznOae2H8OnRFFpyz8BxRpch8r5sjHmQsftkwm1nrNRRQJFCHQ== 
X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeetfeefvdelfffhteeiteehfeekuedtteekheevjedtjefhudehheejjedv udffveenucffohhmrghinheptghliidvrdhssgdpghhnuhdrohhrghenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgv nhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:46:54 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkkPu087256; Mon, 31 Oct 2022 08:46:46 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 09/34] Import 'clz' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:04 -0700 Message-Id: <20221031154529.3627576-10-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" On architectures without __ARM_FEATURE_CLZ, this version combines __clzdi2() with __clzsi2() into a single object with an efficient tail call. Also, this version merges the formerly separate Thumb and ARM code implementations into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option (binary search loop). There is no change to the code for architectures with __ARM_FEATURE_CLZ. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/clz2.S (__clzsi2, __clzdi2): Reduced code size on architectures without __ARM_FEATURE_CLZ. * config/arm/t-elf (LIB1ASMFUNCS): Moved _clzsi2 to new weak roup. --- libgcc/config/arm/clz2.S | 363 +++++++++++++++++++++++++-------------- libgcc/config/arm/t-elf | 7 +- 2 files changed, 237 insertions(+), 133 deletions(-) diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S index 439341752ba..ed04698fef4 100644 --- a/libgcc/config/arm/clz2.S +++ b/libgcc/config/arm/clz2.S @@ -1,145 +1,244 @@ -/* Copyright (C) 1995-2022 Free Software Foundation, Inc. +/* clz2.S: Cortex M0 optimized 'clz' functions -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. 
+ Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel (gnu@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + +#ifdef L_clzdi2 + +// int __clzdi2(long long) +// Counts leading zero bits in $r1:$r0. +// Returns the result in $r0. +FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2 + CFI_START_FUNCTION + + // Moved here from lib1funcs.S + cmp xxh, #0 + do_it eq, et + clzeq r0, xxl + clzne r0, xxh + addeq r0, #32 + RET + + CFI_END_FUNCTION +FUNC_END clzdi2 + +#endif /* L_clzdi2 */ #ifdef L_clzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzsi2 - movs r1, #28 - movs r3, #1 - lsls r3, r3, #16 - cmp r0, r3 /* 0x10000 */ - bcc 2f - lsrs r0, r0, #16 - subs r1, r1, #16 -2: lsrs r3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrs r0, r0, #8 - subs r1, r1, #8 -2: lsrs r3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrs r0, r0, #4 - subs r1, r1, #4 -2: adr r2, 1f - ldrb r0, [r2, r0] - adds r0, r0, r1 - bx lr -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 - FUNC_END clzsi2 -#else -ARM_FUNC_START clzsi2 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - RET -# else - mov r1, #28 - cmp r0, #0x10000 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrb r0, [r2, r0] - add r0, r0, r1 - RET -.align 2 -1: -.byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 -# endif /* !defined (__ARM_FEATURE_CLZ) */ - FUNC_END clzsi2 -#endif + +// int __clzsi2(int) +// Counts leading zero bits in $r0. +// Returns the result in $r0. 
+FUNC_START_SECTION clzsi2 .text.sorted.libgcc.clz2.clzsi2 + CFI_START_FUNCTION + + // Moved here from lib1funcs.S + clz r0, r0 + RET + + CFI_END_FUNCTION +FUNC_END clzsi2 + #endif /* L_clzsi2 */ +#else /* !__ARM_FEATURE_CLZ */ + #ifdef L_clzdi2 -#if !defined (__ARM_FEATURE_CLZ) - -# ifdef NOT_ISA_TARGET_32BIT -FUNC_START clzdi2 - push {r4, lr} - cmp xxh, #0 - bne 1f -# ifdef __ARMEB__ - movs r0, xxl - bl __clzsi2 - adds r0, r0, #32 - b 2f -1: - bl __clzsi2 -# else - bl __clzsi2 - adds r0, r0, #32 - b 2f -1: - movs r0, xxh - bl __clzsi2 -# endif -2: - pop {r4, pc} -# else /* NOT_ISA_TARGET_32BIT */ -ARM_FUNC_START clzdi2 - do_push {r4, lr} - cmp xxh, #0 - bne 1f -# ifdef __ARMEB__ - mov r0, xxl - bl __clzsi2 - add r0, r0, #32 - b 2f -1: - bl __clzsi2 -# else - bl __clzsi2 - add r0, r0, #32 - b 2f -1: - mov r0, xxh - bl __clzsi2 -# endif -2: - RETLDM r4 - FUNC_END clzdi2 -# endif /* NOT_ISA_TARGET_32BIT */ - -#else /* defined (__ARM_FEATURE_CLZ) */ - -ARM_FUNC_START clzdi2 - cmp xxh, #0 - do_it eq, et - clzeq r0, xxl - clzne r0, xxh - addeq r0, r0, #32 - RET - FUNC_END clzdi2 -#endif +// int __clzdi2(long long) +// Counts leading zero bits in $r1:$r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +FUNC_START_SECTION clzdi2 .text.sorted.libgcc.clz2.clzdi2 + CFI_START_FUNCTION + + #if defined(__ARMEB__) && __ARMEB__ + // Check if the upper word is zero. + cmp r0, #0 + + // The upper word is non-zero, so calculate __clzsi2(upper). + bne SYM(__clzsi2) + + // The upper word is zero, so calculate 32 + __clzsi2(lower). + movs r2, #64 + movs r0, r1 + b SYM(__internal_clzsi2) + + #else /* !__ARMEB__ */ + // Assume all the bits in the argument are zero. + movs r2, #64 + + // Check if the upper word is zero. + cmp r1, #0 + + // The upper word is zero, so calculate 32 + __clzsi2(lower). + beq SYM(__internal_clzsi2) + + // The upper word is non-zero, so set up __clzsi2(upper). + // Then fall through. + movs r0, r1 + + #endif /* !__ARMEB__ */ + #endif /* L_clzdi2 */ + +// The bitwise implementation of __clzdi2() tightly couples with __clzsi2(), +// such that instructions must appear consecutively in the same memory +// section for proper flow control. However, this construction inhibits +// the ability to discard __clzdi2() when only using __clzsi2(). +// Therefore, this block configures __clzsi2() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __clzdi2(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_clzsi2' should appear before '_clzdi2' in LIB1ASMFUNCS. +#if defined(L_clzsi2) || defined(L_clzdi2) + +#ifdef L_clzsi2 +// int __clzsi2(int) +// Counts leading zero bits in $r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +WEAK_START_SECTION clzsi2 .text.sorted.libgcc.clz2.clzsi2 + CFI_START_FUNCTION + +#else /* L_clzdi2 */ +FUNC_ENTRY clzsi2 + +#endif + + // Assume all the bits in the argument are zero + movs r2, #32 + +#ifdef L_clzsi2 + WEAK_ENTRY internal_clzsi2 +#else /* L_clzdi2 */ + FUNC_ENTRY internal_clzsi2 +#endif + + // Size optimized: 22 bytes, 51 cycles + // Speed optimized: 50 bytes, 20 cycles + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + + // Binary search starts at half the word width. + movs r3, #16 + + LLSYM(__clz_loop): + // Test the upper 'n' bits of the operand for ZERO. 
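+	// (Example: on the first pass r3 == 16, so the test below keeps only
+	// bits 31..16.  If any of them is set, the leading '1' cannot be in
+	// the low half; the low half is discarded and 16 is subtracted from
+	// the running count in r2.)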
+ movs r1, r0 + lsrs r1, r3 + + // When the test fails, discard the lower bits of the register, + // and deduct the count of discarded bits from the result. + #ifdef __HAVE_FEATURE_IT + do_it ne, t + #else + beq LLSYM(__clz_skip) + #endif + + IT(mov,ne) r0, r1 + IT(sub,ne) r2, r3 + + LLSYM(__clz_skip): + // Decrease the shift distance for the next test. + lsrs r3, #1 + bne LLSYM(__clz_loop) + + #else /* __OPTIMIZE_SIZE__ */ + + // Unrolled binary search. + lsrs r1, r0, #16 + + #ifdef __HAVE_FEATURE_IT + do_it ne,t + #else + beq LLSYM(__clz8) + #endif + + // Out of 32 bits, the first '1' is somewhere in the highest 16, + // so the lower 16 bits are no longer interesting. + IT(mov,ne) r0, r1 + IT(sub,ne) r2, #16 + + LLSYM(__clz8): + lsrs r1, r0, #8 + + #ifdef __HAVE_FEATURE_IT + do_it ne,t + #else + beq LLSYM(__clz4) + #endif + + // Out of 16 bits, the first '1' is somewhere in the highest 8, + // so the lower 8 bits are no longer interesting. + IT(mov,ne) r0, r1 + IT(sub,ne) r2, #8 + + LLSYM(__clz4): + lsrs r1, r0, #4 + + #ifdef __HAVE_FEATURE_IT + do_it ne,t + #else + beq LLSYM(__clz2) + #endif + + // Out of 8 bits, the first '1' is somewhere in the highest 4, + // so the lower 4 bits are no longer interesting. + IT(mov,ne) r0, r1 + IT(sub,ne) r2, #4 + + LLSYM(__clz2): + // Load the remainder by index + adr r1, LLSYM(__clz_remainder) + ldrb r0, [r1, r0] + + #endif /* !__OPTIMIZE_SIZE__ */ + + // Account for the remainder. + subs r0, r2, r0 + RET + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + .align 2 + LLSYM(__clz_remainder): + .byte 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4 + #endif + + CFI_END_FUNCTION +FUNC_END internal_clzsi2 +FUNC_END clzsi2 + +#ifdef L_clzdi2 +FUNC_END clzdi2 +#endif + +#endif /* L_clzsi2 || L_clzdi2 */ + +#endif /* !__ARM_FEATURE_CLZ */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 93ea1cd8f76..af779afa0a9 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -19,13 +19,18 @@ endif # !__symbian__ # assembly implementation here for double-precision values. +# Group 0: WEAK overridable function objects. +# See respective sources for rationale. +LIB1ASMFUNCS += \ + _clzsi2 \ + + # Group 1: Integer function objects. 
LIB1ASMFUNCS += \ _ashldi3 \ _ashrdi3 \ _lshrdi3 \ _clzdi2 \ - _clzsi2 \ _ctzsi2 \ _dvmd_tls \ _divsi3 \ From patchwork Mon Oct 31 15:45:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59671 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 00B5A381E70E for ; Mon, 31 Oct 2022 15:49:10 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id DF9E03851174 for ; Mon, 31 Oct 2022 15:47:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DF9E03851174 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 829EB32008FE; Mon, 31 Oct 2022 11:47:00 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:47:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231220; x= 1667317620; bh=sNkTKfHwm4B9J/9d+5QwTEHlyVfxGmahv0kDIepKchY=; b=p tnusl8Q0cv/VnezeL09XEIKPeewTTg/65SQUFXsiXYr1iJTAut3ZdJCLZY/hM/hN fd2/G/tfaMGDWnjcO53Txw/BzzWLXqtgZkb/rZH4pCM1RxZeJzAN6S7Up1AVHsey eRijI/RR2PXOHMWkoEKu3jmpWBCTEKq9vuY4hhpVJHdL9oZicy/0o3k3D6EdjdwP zXbZpRAnPVbxw4IrM0Z6nhmT7+mGgNk2sCrO/ReMWXqCgPvMHauFt0e2x6CYmy/7 ldu/OSniFHT/fNzb77RjdfQkP8aVVRkwcS292iwC92xZlD5TSlaJbYS1Tu9efZOq P0nxTsfVA8DrYeFm9pLPw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231220; x=1667317620; bh=sNkTKfHwm4B9J /9d+5QwTEHlyVfxGmahv0kDIepKchY=; b=QdIfRyWTlX9XezmPkQMb3Ia9+bua3 29Sg7Nwff78iae35e8JhiR7rvslnx2quiW3GmqrXhfutaWy6nOV/aeA+u2g7sldD Z18np6/rJuJcl9Rn7GRODkcK5nVJZ3gVKqCKTOgBMMaTQGQU0LYalj4bL7QXe9lO GgEiaJnny97swVYfXjd5n/lonqNSm0Zbl59stpZ12fKNvYddHS/hTdlNf26eF5Dv RgGqA/jiwENvy+w6rHZt/1Tgke2RY5RVno3lEifETFcG+ETMEoPq6/4KlYF6/PtZ WN1OvUBgP2vA7nYKFWO4iL4F8OjN2WLjhU+9m7spJqAhcIiS7rl/UMI7Q== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpedtkedtgeegveejfeefvdetjeelgfegkeeitdeiueefkeefveevffelueei gfevueenucffohhmrghinheptghtiidvrdhssgdpghhnuhdrohhrghenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgv nhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 
2022 11:46:59 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkp3p087259; Mon, 31 Oct 2022 08:46:51 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 10/34] Import 'ctz' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:05 -0700 Message-Id: <20221031154529.3627576-11-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This version combines __ctzdi2() with __ctzsi2() into a single object with an efficient tail call. The former implementation of __ctzdi2() was in C. On architectures without __ARM_FEATURE_CLZ, this version merges the formerly separate Thumb and ARM code sequences into a unified instruction sequence. This change significantly improves Thumb performance without affecting ARM performance. Finally, this version adds a new __OPTIMIZE_SIZE__ build option. On architectures with __ARM_FEATURE_CLZ, __ctzsi2(0) now returns 32. Formerly, __ctzsi2(0) would return -1. Architectures without __ARM_FEATURE_CLZ have always returned 32, so this change makes the return value consistent. This change costs 2 extra instructions (branchless). Likewise on architectures with __ARM_FEATURE_CLZ, __ctzdi2(0) now returns 64 instead of 31. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/ctz2.S (__ctzdi2): Added a new function. (__clzsi2): Reduced size on architectures without __ARM_FEATURE_CLZ; changed so __clzsi2(0)=32 on architectures wtih __ARM_FEATURE_CLZ. * config/arm/t-elf (LIB1ASMFUNCS): Added _ctzdi2; moved _ctzsi2 to the weak function objects group. --- libgcc/config/arm/ctz2.S | 308 +++++++++++++++++++++++++++++---------- libgcc/config/arm/t-elf | 3 +- 2 files changed, 233 insertions(+), 78 deletions(-) diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S index 1d885dcc71a..82c81c6ae11 100644 --- a/libgcc/config/arm/ctz2.S +++ b/libgcc/config/arm/ctz2.S @@ -1,86 +1,240 @@ -/* Copyright (C) 1995-2022 Free Software Foundation, Inc. +/* ctz2.S: ARM optimized 'ctz' functions -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. + Copyright (C) 2020-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel (gnu@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. 
+ This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ -#ifdef L_ctzsi2 -#ifdef NOT_ISA_TARGET_32BIT -FUNC_START ctzsi2 - negs r1, r0 - ands r0, r0, r1 - movs r1, #28 - movs r3, #1 - lsls r3, r3, #16 - cmp r0, r3 /* 0x10000 */ - bcc 2f - lsrs r0, r0, #16 - subs r1, r1, #16 -2: lsrs r3, r3, #8 - cmp r0, r3 /* #0x100 */ - bcc 2f - lsrs r0, r0, #8 - subs r1, r1, #8 -2: lsrs r3, r3, #4 - cmp r0, r3 /* #0x10 */ - bcc 2f - lsrs r0, r0, #4 - subs r1, r1, #4 -2: adr r2, 1f - ldrb r0, [r2, r0] - subs r0, r0, r1 - bx lr -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 - FUNC_END ctzsi2 + +// When the hardware 'ctz' function is available, an efficient version +// of __ctzsi2(x) can be created by calculating '31 - __ctzsi2(lsb(x))', +// where lsb(x) is 'x' with only the least-significant '1' bit set. +// The following offset applies to all of the functions in this file. +#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + #define CTZ_RESULT_OFFSET 1 #else -ARM_FUNC_START ctzsi2 - rsb r1, r0, #0 - and r0, r0, r1 -# if defined (__ARM_FEATURE_CLZ) - clz r0, r0 - rsb r0, r0, #31 - RET -# else - mov r1, #28 - cmp r0, #0x10000 - do_it cs, t - movcs r0, r0, lsr #16 - subcs r1, r1, #16 - cmp r0, #0x100 - do_it cs, t - movcs r0, r0, lsr #8 - subcs r1, r1, #8 - cmp r0, #0x10 - do_it cs, t - movcs r0, r0, lsr #4 - subcs r1, r1, #4 - adr r2, 1f - ldrb r0, [r2, r0] - sub r0, r0, r1 - RET -.align 2 -1: -.byte 27, 28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31 -# endif /* !defined (__ARM_FEATURE_CLZ) */ - FUNC_END ctzsi2 + #define CTZ_RESULT_OFFSET 0 +#endif + + +#ifdef L_ctzdi2 + +// int __ctzdi2(long long) +// Counts trailing zeros in a 64 bit double word. +// Expects the argument in $r1:$r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +FUNC_START_SECTION ctzdi2 .text.sorted.libgcc.ctz2.ctzdi2 + CFI_START_FUNCTION + + #if defined(__ARMEB__) && __ARMEB__ + // Assume all the bits in the argument are zero. + movs r2, #(64 - CTZ_RESULT_OFFSET) + + // Check if the lower word is zero. + cmp r1, #0 + + // The lower word is zero, so calculate 32 + __ctzsi2(upper). + beq SYM(__internal_ctzsi2) + + // The lower word is non-zero, so set up __ctzsi2(lower). + // Then fall through. 
+ movs r0, r1 + + #else /* !__ARMEB__ */ + // Check if the lower word is zero. + cmp r0, #0 + + // If the lower word is non-zero, result is just __ctzsi2(lower). + bne SYM(__ctzsi2) + + // The lower word is zero, so calculate 32 + __ctzsi2(upper). + movs r2, #(64 - CTZ_RESULT_OFFSET) + movs r0, r1 + b SYM(__internal_ctzsi2) + + #endif /* !__ARMEB__ */ + +#endif /* L_ctzdi2 */ + + +// The bitwise implementation of __ctzdi2() tightly couples with __ctzsi2(), +// such that instructions must appear consecutively in the same memory +// section for proper flow control. However, this construction inhibits +// the ability to discard __ctzdi2() when only using __ctzsi2(). +// Therefore, this block configures __ctzsi2() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __ctzdi2(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_ctzsi2' should appear before '_ctzdi2' in LIB1ASMFUNCS. +#if defined(L_ctzsi2) || defined(L_ctzdi2) + +#ifdef L_ctzsi2 +// int __ctzsi2(int) +// Counts trailing zeros in a 32 bit word. +// Expects the argument in $r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +WEAK_START_SECTION ctzsi2 .text.sorted.libgcc.ctz2.ctzdi2 + CFI_START_FUNCTION + +#else /* L_ctzdi2 */ +FUNC_ENTRY ctzsi2 + #endif -#endif /* L_clzsi2 */ + + // Assume all the bits in the argument are zero + movs r2, #(32 - CTZ_RESULT_OFFSET) + +#ifdef L_ctzsi2 + WEAK_ENTRY internal_ctzsi2 +#else /* L_ctzdi2 */ + FUNC_ENTRY internal_ctzsi2 +#endif + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + + // Find the least-significant '1' bit of the argument. + rsbs r1, r0, #0 + ands r1, r0 + + // Maintain result compatibility with the software implementation. + // Technically, __ctzsi2(0) is undefined, but 32 seems better than -1. + // (or possibly 31 if this is an intermediate result for __ctzdi2(0)). + // The carry flag from 'rsbs' gives '-1' iff the argument was 'zero'. + // (NOTE: 'ands' with 0 shift bits does not change the carry flag.) + // After the jump, the final result will be '31 - (-1)'. + sbcs r0, r0 + + #ifdef __HAVE_FEATURE_IT + do_it ne + #else + beq LLSYM(__ctz_zero) + #endif + + // Gives the number of '0' bits left of the least-significant '1'. + IT(clz,ne) r0, r1 + + #elif defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + // Size optimized: 24 bytes, 52 cycles + // Speed optimized: 52 bytes, 21 cycles + + // Binary search starts at half the word width. + movs r3, #16 + + LLSYM(__ctz_loop): + // Test the upper 'n' bits of the operand for ZERO. + movs r1, r0 + lsls r1, r3 + + // When the test fails, discard the lower bits of the register, + // and deduct the count of discarded bits from the result. + #ifdef __HAVE_FEATURE_IT + do_it ne, t + #else + beq LLSYM(__ctz_skip) + #endif + + IT(mov,ne) r0, r1 + IT(sub,ne) r2, r3 + + LLSYM(__ctz_skip): + // Decrease the shift distance for the next test. + lsrs r3, #1 + bne LLSYM(__ctz_loop) + + // Prepare the remainder. + lsrs r0, #31 + + #else /* !__OPTIMIZE_SIZE__ */ + + // Unrolled binary search. + lsls r1, r0, #16 + + #ifdef __HAVE_FEATURE_IT + do_it ne, t + #else + beq LLSYM(__ctz8) + #endif + + // Out of 32 bits, the first '1' is somewhere in the lowest 16, + // so the higher 16 bits are no longer interesting. 
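+	// (The 'lsls' test above keeps only bits 15..0: a non-zero result
+	// means the least-significant '1' lies in the low half, which is
+	// then moved to the top of r0 while 16 is deducted from the running
+	// zero count in r2.)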
+ IT(mov,ne) r0, r1 + IT(sub,ne) r2, #16 + + LLSYM(__ctz8): + lsls r1, r0, #8 + + #ifdef __HAVE_FEATURE_IT + do_it ne, t + #else + beq LLSYM(__ctz4) + #endif + + // Out of 16 bits, the first '1' is somewhere in the lowest 8, + // so the higher 8 bits are no longer interesting. + IT(mov,ne) r0, r1 + IT(sub,ne) r2, #8 + + LLSYM(__ctz4): + lsls r1, r0, #4 + + #ifdef __HAVE_FEATURE_IT + do_it ne, t + #else + beq LLSYM(__ctz2) + #endif + + // Out of 8 bits, the first '1' is somewhere in the lowest 4, + // so the higher 4 bits are no longer interesting. + IT(mov,ne) r0, r1 + IT(sub,ne) r2, #4 + + LLSYM(__ctz2): + // Look up the remainder by index. + lsrs r0, #28 + adr r3, LLSYM(__ctz_remainder) + ldrb r0, [r3, r0] + + #endif /* !__OPTIMIZE_SIZE__ */ + + LLSYM(__ctz_zero): + // Apply the remainder. + subs r0, r2, r0 + RET + + #if (!defined(__ARM_FEATURE_CLZ) || !__ARM_FEATURE_CLZ) && \ + (!defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__) + .align 2 + LLSYM(__ctz_remainder): + .byte 0,4,3,4,2,4,3,4,1,4,3,4,2,4,3,4 + #endif + + CFI_END_FUNCTION +FUNC_END internal_ctzsi2 +FUNC_END ctzsi2 + +#ifdef L_ctzdi2 +FUNC_END ctzdi2 +#endif + +#endif /* L_ctzsi2 || L_ctzdi2 */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index af779afa0a9..33b83ac4adf 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -23,6 +23,7 @@ endif # !__symbian__ # See respective sources for rationale. LIB1ASMFUNCS += \ _clzsi2 \ + _ctzsi2 \ # Group 1: Integer function objects. @@ -31,7 +32,7 @@ LIB1ASMFUNCS += \ _ashrdi3 \ _lshrdi3 \ _clzdi2 \ - _ctzsi2 \ + _ctzdi2 \ _dvmd_tls \ _divsi3 \ _modsi3 \ From patchwork Mon Oct 31 15:45:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59672 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 56CCD3959E73 for ; Mon, 31 Oct 2022 15:49:29 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id C6658384BC02 for ; Mon, 31 Oct 2022 15:47:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C6658384BC02 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id B40C53200974; Mon, 31 Oct 2022 11:47:05 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:47:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231225; x= 1667317625; bh=+mqXK0GkDMOt2mCL0pDwD+3ZSBXYCopi06zR1g2vQUs=; b=t F6bmtkKdX6mk0kfPp0k68buO6WPD6Ccl/EU9KxbgacjLhb6+KTnutO/Rh80H3SFE 6JjPbbfDsjlS77aI595UZAlm9+FWzuKgOImfxoC5oj8/pgKIawX9ipAh+VQSXde9 0g7gkcb6mKMTvR2y9DvXdxz/+Ds2bpMcWtoVLS8p9o31S1cHEvoDP31841GWgX7v NGUUqomu39C+nx3JkfjT2su0+Tdqo6tdvEWtuIIlLPB/ZjOXJ7UPnxftqlyuSPrF DtxzYKJbhinkCzhd9kRClakkVP1MV/auRDxiEom3fHOj4FaUUtXdVfp64Wv+xrYG 
IeUSTB332PPuL4mAppuBg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231225; x=1667317625; bh=+mqXK0GkDMOt2 mCL0pDwD+3ZSBXYCopi06zR1g2vQUs=; b=F+mb0jERWvciOdo8RD4ThYP0xQFpD +PYtFWgR8BFdIE6tYU8hF87rwieV0teIsyPNCyg3U63aecdsCJ7137qLX+S9rQk4 d98cMg3oPbJQhDC0h5uevX2GSg1wGhBcQTkYmiT9FYi++93DJJT7lk11mZiDGQQj eePwoy50CidH5r9ithFWSE0OEBe81ZkK3zubWmSewqhIv0sFDC2dPX0TCIk0bP+S 8GUJ0jy0uX5WBLX18FtMLbT2BI/qkkaObOm+2SOSWRnvgpChPw9Yr8yhsMD5yYfW 8hRyJCtkPQtYoDpUZKeK1oa3eJWm0dFJ+1WEl9OplGrGJ67Tz8FNS++jw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdejlecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpefgvedtvdffvdejgfejffehfeejgfelfeelhfdtgfehheeftdetieffhfdt heduheenucffohhmrghinheplhhshhhifhhtrdhssgdpghhnuhdrohhrghenucevlhhush htvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgv lhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:04 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFkuQv087262; Mon, 31 Oct 2022 08:46:56 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 11/34] Import 64-bit shift functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:06 -0700 Message-Id: <20221031154529.3627576-12-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The Thumb versions of these functions are each 1-2 instructions smaller and faster, and branchless when the IT instruction is available. The ARM versions were converted to the "xxl/xxh" big-endian register naming convention, but are otherwise unchanged. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/shift.S (__ashldi3, __ashrdi3, __lshldi3): Reduced code size on Thumb architectures; updated big-endian register naming convention to "xxl/xxh". 
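For reference, the Thumb sequences below all follow the same pattern: a
simple per-word shift, plus a cross-word "remainder" of (32 - count) bits
taken from the other word.  A rough C model of that decomposition for the
logical right shift case (the name 'llsr64' and the code are illustrative
only, not part of the patch):

    /* Illustrative model of a 64-bit logical right shift by 0..63,
       built from 32-bit word operations.  */
    unsigned long long
    llsr64 (unsigned long long x, unsigned int count)
    {
      unsigned int lo = (unsigned int) x;
      unsigned int hi = (unsigned int) (x >> 32);

      if (count < 32)
        {
          /* Bits shifted out of the high word form the remainder that
             is merged into the low word.  (count == 0 is guarded here
             because a C shift by 32 is undefined; the asm gets this
             for free from the Thumb shift semantics.)  */
          unsigned int rem = count ? hi << (32 - count) : 0;
          lo = (lo >> count) | rem;
          hi >>= count;
        }
      else
        {
          /* The low word now comes entirely from the high word.  */
          lo = hi >> (count - 32);
          hi = 0;
        }

      return ((unsigned long long) hi << 32) | lo;
    }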
--- libgcc/config/arm/eabi/lshift.S | 338 +++++++++++++++++++++----------- 1 file changed, 228 insertions(+), 110 deletions(-) diff --git a/libgcc/config/arm/eabi/lshift.S b/libgcc/config/arm/eabi/lshift.S index 6e79d96c118..365350dfb2d 100644 --- a/libgcc/config/arm/eabi/lshift.S +++ b/libgcc/config/arm/eabi/lshift.S @@ -1,123 +1,241 @@ -/* Copyright (C) 1995-2022 Free Software Foundation, Inc. +/* lshift.S: ARM optimized 64-bit integer shift -This file is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) -This file is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. -Under Section 7 of GPL version 3, you are granted additional -permissions described in the GCC Runtime Library Exception, version -3.1, as published by the Free Software Foundation. + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. -You should have received a copy of the GNU General Public License and -a copy of the GCC Runtime Library Exception along with this program; -see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -. */ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ #ifdef L_lshrdi3 - FUNC_START lshrdi3 - FUNC_ALIAS aeabi_llsr lshrdi3 - -#ifdef __thumb__ - lsrs al, r2 - movs r3, ah - lsrs ah, r2 - mov ip, r3 - subs r2, #32 - lsrs r3, r2 - orrs al, r3 - negs r2, r2 - mov r3, ip - lsls r3, r2 - orrs al, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, lsr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, lsr r2 - RET -#endif - FUNC_END aeabi_llsr - FUNC_END lshrdi3 - -#endif - +// long long __aeabi_llsr(long long, int) +// Logical shift right the 64 bit value in $r1:$r0 by the count in $r2. +// The result is only guaranteed for shifts in the range of '0' to '63'. +// Uses $r3 as scratch space. +FUNC_START_SECTION aeabi_llsr .text.sorted.libgcc.lshrdi3 +FUNC_ALIAS lshrdi3 aeabi_llsr + CFI_START_FUNCTION + + #if defined(__thumb__) && __thumb__ + + // Save a copy for the remainder. + movs r3, xxh + + // Assume a simple shift. + lsrs xxl, r2 + lsrs xxh, r2 + + // Test if the shift distance is larger than 1 word. + subs r2, #32 + + #ifdef __HAVE_FEATURE_IT + do_it lo,te + + // The remainder is opposite the main shift, (32 - x) bits. + rsblo r2, #0 + lsllo r3, r2 + + // The remainder shift extends into the hi word. 
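+	// (For counts of 32..63, the two 'lsrs' above have already cleared
+	// both result words, so the low word becomes just the saved hi word
+	// shifted right by (count - 32).)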
+ lsrhs r3, r2 + + #else /* !__HAVE_FEATURE_IT */ + bhs LLSYM(__llsr_large) + + // The remainder is opposite the main shift, (32 - x) bits. + rsbs r2, #0 + lsls r3, r2 + + // Cancel any remaining shift. + eors r2, r2 + + LLSYM(__llsr_large): + // Apply any remaining shift to the hi word. + lsrs r3, r2 + + #endif /* !__HAVE_FEATURE_IT */ + + // Merge remainder and result. + adds xxl, r3 + RET + + #else /* !__thumb__ */ + + subs r3, r2, #32 + rsb ip, r2, #32 + movmi xxl, xxl, lsr r2 + movpl xxl, xxh, lsr r3 + orrmi xxl, xxl, xxh, lsl ip + mov xxh, xxh, lsr r2 + RET + + #endif /* !__thumb__ */ + + + CFI_END_FUNCTION +FUNC_END lshrdi3 +FUNC_END aeabi_llsr + +#endif /* L_lshrdi3 */ + + #ifdef L_ashrdi3 - - FUNC_START ashrdi3 - FUNC_ALIAS aeabi_lasr ashrdi3 - -#ifdef __thumb__ - lsrs al, r2 - movs r3, ah - asrs ah, r2 - subs r2, #32 - @ If r2 is negative at this point the following step would OR - @ the sign bit into all of AL. That's not what we want... - bmi 1f - mov ip, r3 - asrs r3, r2 - orrs al, r3 - mov r3, ip -1: - negs r2, r2 - lsls r3, r2 - orrs al, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi al, al, lsr r2 - movpl al, ah, asr r3 - orrmi al, al, ah, lsl ip - mov ah, ah, asr r2 - RET -#endif - - FUNC_END aeabi_lasr - FUNC_END ashrdi3 - -#endif + +// long long __aeabi_lasr(long long, int) +// Arithmetic shift right the 64 bit value in $r1:$r0 by the count in $r2. +// The result is only guaranteed for shifts in the range of '0' to '63'. +// Uses $r3 as scratch space. +FUNC_START_SECTION aeabi_lasr .text.sorted.libgcc.ashrdi3 +FUNC_ALIAS ashrdi3 aeabi_lasr + CFI_START_FUNCTION + + #if defined(__thumb__) && __thumb__ + + // Save a copy for the remainder. + movs r3, xxh + + // Assume a simple shift. + lsrs xxl, r2 + asrs xxh, r2 + + // Test if the shift distance is larger than 1 word. + subs r2, #32 + + #ifdef __HAVE_FEATURE_IT + do_it lo,te + + // The remainder is opposite the main shift, (32 - x) bits. + rsblo r2, #0 + lsllo r3, r2 + + // The remainder shift extends into the hi word. + asrhs r3, r2 + + #else /* !__HAVE_FEATURE_IT */ + bhs LLSYM(__lasr_large) + + // The remainder is opposite the main shift, (32 - x) bits. + rsbs r2, #0 + lsls r3, r2 + + // Cancel any remaining shift. + eors r2, r2 + + LLSYM(__lasr_large): + // Apply any remaining shift to the hi word. + asrs r3, r2 + + #endif /* !__HAVE_FEATURE_IT */ + + // Merge remainder and result. + adds xxl, r3 + RET + + #else /* !__thumb__ */ + + subs r3, r2, #32 + rsb ip, r2, #32 + movmi xxl, xxl, lsr r2 + movpl xxl, xxh, asr r3 + orrmi xxl, xxl, xxh, lsl ip + mov xxh, xxh, asr r2 + RET + + #endif /* !__thumb__ */ + + CFI_END_FUNCTION +FUNC_END ashrdi3 +FUNC_END aeabi_lasr + +#endif /* L_ashrdi3 */ + #ifdef L_ashldi3 - FUNC_START ashldi3 - FUNC_ALIAS aeabi_llsl ashldi3 - -#ifdef __thumb__ - lsls ah, r2 - movs r3, al - lsls al, r2 - mov ip, r3 - subs r2, #32 - lsls r3, r2 - orrs ah, r3 - negs r2, r2 - mov r3, ip - lsrs r3, r2 - orrs ah, r3 - RET -#else - subs r3, r2, #32 - rsb ip, r2, #32 - movmi ah, ah, lsl r2 - movpl ah, al, lsl r3 - orrmi ah, ah, al, lsr ip - mov al, al, lsl r2 - RET -#endif - FUNC_END aeabi_llsl - FUNC_END ashldi3 - -#endif +// long long __aeabi_llsl(long long, int) +// Logical shift left the 64 bit value in $r1:$r0 by the count in $r2. +// The result is only guaranteed for shifts in the range of '0' to '63'. +// Uses $r3 as scratch space. 
+.section .text.sorted.libgcc.ashldi3,"x" +FUNC_START_SECTION aeabi_llsl .text.sorted.libgcc.ashldi3 +FUNC_ALIAS ashldi3 aeabi_llsl + CFI_START_FUNCTION + + #if defined(__thumb__) && __thumb__ + + // Save a copy for the remainder. + movs r3, xxl + + // Assume a simple shift. + lsls xxl, r2 + lsls xxh, r2 + + // Test if the shift distance is larger than 1 word. + subs r2, #32 + + #ifdef __HAVE_FEATURE_IT + do_it lo,te + + // The remainder is opposite the main shift, (32 - x) bits. + rsblo r2, #0 + lsrlo r3, r2 + + // The remainder shift extends into the hi word. + lslhs r3, r2 + + #else /* !__HAVE_FEATURE_IT */ + bhs LLSYM(__llsl_large) + + // The remainder is opposite the main shift, (32 - x) bits. + rsbs r2, #0 + lsrs r3, r2 + + // Cancel any remaining shift. + eors r2, r2 + + LLSYM(__llsl_large): + // Apply any remaining shift to the hi word. + lsls r3, r2 + + #endif /* !__HAVE_FEATURE_IT */ + + // Merge remainder and result. + adds xxh, r3 + RET + + #else /* !__thumb__ */ + + subs r3, r2, #32 + rsb ip, r2, #32 + movmi xxh, xxh, lsl r2 + movpl xxh, xxl, lsl r3 + orrmi xxh, xxh, xxl, lsr ip + mov xxl, xxl, lsl r2 + RET + + #endif /* !__thumb__ */ + + CFI_END_FUNCTION +FUNC_END ashldi3 +FUNC_END aeabi_llsl + +#endif /* L_ashldi3 */ + + From patchwork Mon Oct 31 15:45:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59675 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A2925395C037 for ; Mon, 31 Oct 2022 15:50:00 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id D8192383FBBB for ; Mon, 31 Oct 2022 15:47:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D8192383FBBB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id BF8133200974; Mon, 31 Oct 2022 11:47:10 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:47:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231230; x= 1667317630; bh=IheOd4UT7YUTSAMbCVlAbTMEZnE7dlXUisX6zATjPk4=; b=b CVvrfvGY8ZdpiOoX26N8NrciqsaFGZdsL6mFVOzwPTvvt4UhbuIVcL0WZvUgDTPv Fyh79RTDZQ3aX9L1TOpzMKsj+XrZV2owupV+o2CChIRtIBrjnGJKp27atTe+I7kF nLOHnH4karudpitDZwoTMiVFSFnl/wws7EGOIV8yrdWSn9lXcKFiZ8g+DSRKXcxc P/wg3sSQu+uociSeLDaHgO9+Dla5tY1AfmOBbUscLs/pvUxHrb1zdY+XpEM2MG97 ndjyjQ1rowh3V/y1WYe28dirOcKUcRQczPa6ekDVaIKPxD7En8hSuKhzH5yLAPEQ Hxrj6DJhukRO0EV0m3hng== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231230; x=1667317630; 
bh=IheOd4UT7YUTS AMbCVlAbTMEZnE7dlXUisX6zATjPk4=; b=EhuModMvudoTfm/6RVlsO1lvdcPiP r2kvPSgOmCMavOcf2kHE6sElVD7qJiUgT/J9iZqbfR+7yiW+8w2HerrOZ9RSV8K2 ozE0BtZbqkNfCldHdLOQ4YrMmABZFfa4H5PP8I/H4ZoI9jjXeBXGpIUTG1WRVSqR RKiE7Is2p+1t2eYn+0IL/zvEAWYD9tdOB8szFdSFXP6j82028d3r2OL/Feaizc1Q x11zvKxqm6TvNj1wWZ5MuMtu4qusoRc8cq95OJnL1BGWjFYt+KV2hvu1p08EQUGl A0LtwxXXbXH9KbMjdimg8UpINF4pU27PnXvdRb3cIf4KRYvVKBivazO8A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdejlecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeetfeefvdelfffhteeiteehfeekuedtteekheevjedtjefhudehheejjedv udffveenucffohhmrghinheptghliidvrdhssgdpghhnuhdrohhrghenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgv nhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:09 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFl1L3087265; Mon, 31 Oct 2022 08:47:01 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 12/34] Import 'clrsb' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:07 -0700 Message-Id: <20221031154529.3627576-13-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This implementation provides an efficient tail call to __clzsi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/clz2.S (__clrsbsi2, __clrsbdi2): Added new functions. * config/arm/t-elf (LIB1ASMFUNCS): Added new function objects _clrsbsi2 and _clrsbdi2). --- libgcc/config/arm/clz2.S | 108 ++++++++++++++++++++++++++++++++++++++- libgcc/config/arm/t-elf | 2 + 2 files changed, 108 insertions(+), 2 deletions(-) diff --git a/libgcc/config/arm/clz2.S b/libgcc/config/arm/clz2.S index ed04698fef4..3d40811278b 100644 --- a/libgcc/config/arm/clz2.S +++ b/libgcc/config/arm/clz2.S @@ -1,4 +1,4 @@ -/* clz2.S: Cortex M0 optimized 'clz' functions +/* clz2.S: ARM optimized 'clz' and related functions Copyright (C) 2018-2022 Free Software Foundation, Inc. Contributed by Daniel Engel (gnu@danielengel.com) @@ -23,7 +23,7 @@ . 
*/ -#if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ +#ifdef __ARM_FEATURE_CLZ #ifdef L_clzdi2 @@ -242,3 +242,107 @@ FUNC_END clzdi2 #endif /* !__ARM_FEATURE_CLZ */ + +#ifdef L_clrsbdi2 + +// int __clrsbdi2(int) +// Counts the number of "redundant sign bits" in $r1:$r0. +// Returns the result in $r0. +// Uses $r2 and $r3 as scratch space. +FUNC_START_SECTION clrsbdi2 .text.sorted.libgcc.clz2.clrsbdi2 + CFI_START_FUNCTION + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + // Invert negative signs to keep counting zeros. + asrs r3, xxh, #31 + eors xxl, r3 + eors xxh, r3 + + // Same as __clzdi2(), except that the 'C' flag is pre-calculated. + // Also, the trailing 'subs', since the last bit is not redundant. + do_it eq, et + clzeq r0, xxl + clzne r0, xxh + addeq r0, #32 + subs r0, #1 + RET + + #else /* !__ARM_FEATURE_CLZ */ + // Result if all the bits in the argument are zero. + // Set it here to keep the flags clean after 'eors' below. + movs r2, #31 + + // Invert negative signs to keep counting zeros. + asrs r3, xxh, #31 + eors xxh, r3 + + #if defined(__ARMEB__) && __ARMEB__ + // If the upper word is non-zero, return '__clzsi2(upper) - 1'. + bne SYM(__internal_clzsi2) + + // The upper word is zero, prepare the lower word. + movs r0, r1 + eors r0, r3 + + #else /* !__ARMEB__ */ + // Save the lower word temporarily. + // This somewhat awkward construction adds one cycle when the + // branch is not taken, but prevents a double-branch. + eors r3, r0 + + // If the upper word is non-zero, return '__clzsi2(upper) - 1'. + movs r0, r1 + bne SYM(__internal_clzsi2) + + // Restore the lower word. + movs r0, r3 + + #endif /* !__ARMEB__ */ + + // The upper word is zero, return '31 + __clzsi2(lower)'. + adds r2, #32 + b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + + CFI_END_FUNCTION +FUNC_END clrsbdi2 + +#endif /* L_clrsbdi2 */ + + +#ifdef L_clrsbsi2 + +// int __clrsbsi2(int) +// Counts the number of "redundant sign bits" in $r0. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +FUNC_START_SECTION clrsbsi2 .text.sorted.libgcc.clz2.clrsbsi2 + CFI_START_FUNCTION + + // Invert negative signs to keep counting zeros. + asrs r2, r0, #31 + eors r0, r2 + + #if defined(__ARM_FEATURE_CLZ) && __ARM_FEATURE_CLZ + // Count. + clz r0, r0 + + // The result for a positive value will always be >= 1. + // By definition, the last bit is not redundant. + subs r0, #1 + RET + + #else /* !__ARM_FEATURE_CLZ */ + // Result if all the bits in the argument are zero. + // By definition, the last bit is not redundant. 
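+    // (Both '0' and '-1' arguments yield this maximum result of 31.)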
+ movs r2, #31 + b SYM(__internal_clzsi2) + + #endif /* !__ARM_FEATURE_CLZ */ + + CFI_END_FUNCTION +FUNC_END clrsbsi2 + +#endif /* L_clrsbsi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 33b83ac4adf..89071cebe45 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -31,6 +31,8 @@ LIB1ASMFUNCS += \ _ashldi3 \ _ashrdi3 \ _lshrdi3 \ + _clrsbsi2 \ + _clrsbdi2 \ _clzdi2 \ _ctzdi2 \ _dvmd_tls \ From patchwork Mon Oct 31 15:45:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59673 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 62CC4395B43F for ; Mon, 31 Oct 2022 15:49:55 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 061D13851C19 for ; Mon, 31 Oct 2022 15:47:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 061D13851C19 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id E5576320096B; Mon, 31 Oct 2022 11:47:15 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:47:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231235; x= 1667317635; bh=T2qqtR3LNsENBNqGkHR8vJZwg9ZUCO0KM4lpcB2Zntw=; b=K SK44waR8SIAxUsg2Qw2o6Ge4dcXH2/T6kyPWttdDP+eJzmQHHfHQuUvtIydOPwGK f6y8UixXiBrHp9RppDGLTYYHvbZz2P3D3fp3dOpoZGTK8EAIXJOWBNJYbJ9SubDp SvnF05TI/oNkcaZNP0JCi1BIEUGPkeSWAFfcQucpueWjmh6AGrfgfyMfiRAQVPWH 9THCyCTemD2ONK9H/wgdWRG8NHSdkAj9ZuJ8gFV+zsLK1tOzVEOYDqCXzzkX1bbd wYF8SbFFYEz7leelVglrQlszZxwE3VtEkaMEvoAGt0GN3rFlelcb4iwJ6JCYl74z w/XpAkclYSaR1iWdSspTA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231235; x=1667317635; bh=T2qqtR3LNsENB NqGkHR8vJZwg9ZUCO0KM4lpcB2Zntw=; b=LseMSBwfUJC733zv5Ve8dqXH9xfcM vWqLBoVqFWW2i6mgLm/idnMV9Lj6tKl6eNjJBH665pnvH42lLhstKofPrd0X/VRF Ph8c5JCGyPheIr1pSFoGvdRlGFjaXzaE8WU3fNa1KD/mQOFST+SnDlieE+01isam auX1G/qQAVZ8dShk92wkHiBvNK7qzFZraL4xYQ4W/LyCTUuFK3XkcQw33zI0zh9n 5+S2kAre8dXdt0OR/UhSQgi3FWj3gI0S2UZI+kA5QuJrvmmC6UwMSp1PU1VyW0vw TZKSaTTN4+IeOYDA/SFPKQyYh5/wR+oSIdSrMU+tGnEC0kL576bYoWTBw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdejlecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg 
htthgvrhhnpedtteehtddtgeevieejfeekheffhfejkefhiedufeelteffvdfffeehtdeh geekffenucffohhmrghinheptghtiidvrdhssgenucevlhhushhtvghrufhiiigvpedtne curfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:14 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFl7ft087268; Mon, 31 Oct 2022 08:47:07 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 13/34] Import 'ffs' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:08 -0700 Message-Id: <20221031154529.3627576-14-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This implementation provides an efficient tail call to __clzdi2(), making the functions rather smaller and faster than the C versions. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bits/ctz2.S (__ffssi2, __ffsdi2): New functions. * config/arm/t-elf (LIB1ASMFUNCS): Added _ffssi2 and _ffsdi2. --- libgcc/config/arm/ctz2.S | 77 +++++++++++++++++++++++++++++++++++++++- libgcc/config/arm/t-elf | 2 ++ 2 files changed, 78 insertions(+), 1 deletion(-) diff --git a/libgcc/config/arm/ctz2.S b/libgcc/config/arm/ctz2.S index 82c81c6ae11..d57acabae01 100644 --- a/libgcc/config/arm/ctz2.S +++ b/libgcc/config/arm/ctz2.S @@ -1,4 +1,4 @@ -/* ctz2.S: ARM optimized 'ctz' functions +/* ctz2.S: ARM optimized 'ctz' and related functions Copyright (C) 2020-2022 Free Software Foundation, Inc. Contributed by Daniel Engel (gnu@danielengel.com) @@ -238,3 +238,78 @@ FUNC_END ctzdi2 #endif /* L_ctzsi2 || L_ctzdi2 */ + +#ifdef L_ffsdi2 + +// int __ffsdi2(int) +// Return the index of the least significant 1-bit in $r1:r0, +// or zero if $r1:r0 is zero. The least significant bit is index 1. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +// Same section as __ctzsi2() for sake of the tail call branches. +FUNC_START_SECTION ffsdi2 .text.sorted.libgcc.ctz2.ffsdi2 + CFI_START_FUNCTION + + // Simplify branching by assuming a non-zero lower word. + // For all such, ffssi2(x) == ctzsi2(x) + 1. + movs r2, #(33 - CTZ_RESULT_OFFSET) + + #if defined(__ARMEB__) && __ARMEB__ + // HACK: Save the upper word in a scratch register. + movs r3, r0 + + // Test the lower word. + movs r0, r1 + bne SYM(__internal_ctzsi2) + + // Test the upper word. + movs r2, #(65 - CTZ_RESULT_OFFSET) + movs r0, r3 + bne SYM(__internal_ctzsi2) + + #else /* !__ARMEB__ */ + // Test the lower word. + cmp r0, #0 + bne SYM(__internal_ctzsi2) + + // Test the upper word. 
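+    // A set bit found here lies a further 32 places up, so the bias
+    // grows from (33 - CTZ_RESULT_OFFSET) to (65 - CTZ_RESULT_OFFSET).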
+ movs r2, #(65 - CTZ_RESULT_OFFSET) + movs r0, r1 + bne SYM(__internal_ctzsi2) + + #endif /* !__ARMEB__ */ + + // Upper and lower words are both zero. + RET + + CFI_END_FUNCTION +FUNC_END ffsdi2 + +#endif /* L_ffsdi2 */ + + +#ifdef L_ffssi2 + +// int __ffssi2(int) +// Return the index of the least significant 1-bit in $r0, +// or zero if $r0 is zero. The least significant bit is index 1. +// Returns the result in $r0. +// Uses $r2 and possibly $r3 as scratch space. +// Same section as __ctzsi2() for sake of the tail call branches. +FUNC_START_SECTION ffssi2 .text.sorted.libgcc.ctz2.ffssi2 + CFI_START_FUNCTION + + // Simplify branching by assuming a non-zero argument. + // For all such, ffssi2(x) == ctzsi2(x) + 1. + movs r2, #(33 - CTZ_RESULT_OFFSET) + + // Test for zero, return unmodified. + cmp r0, #0 + bne SYM(__internal_ctzsi2) + RET + + CFI_END_FUNCTION +FUNC_END ffssi2 + +#endif /* L_ffssi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 89071cebe45..346fc766f17 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -35,6 +35,8 @@ LIB1ASMFUNCS += \ _clrsbdi2 \ _clzdi2 \ _ctzdi2 \ + _ffssi2 \ + _ffsdi2 \ _dvmd_tls \ _divsi3 \ _modsi3 \ From patchwork Mon Oct 31 15:45:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59678 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6219D3AA9421 for ; Mon, 31 Oct 2022 15:50:36 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 20ECE3852774 for ; Mon, 31 Oct 2022 15:47:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 20ECE3852774 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 0C4AF3200945; Mon, 31 Oct 2022 11:47:28 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:47:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231248; x= 1667317648; bh=P8+jBTtuJo1MEu0gFEtVOWtzYyNZ73jHQYh3RLnTTGA=; b=b yc/PqIxGzi9CJ+XEp2lmbI+HUG7znz12QvHQ6dgojSVSXj2rHCKXzWs7VC9cl3Nd Hz2BZ2Sdh0gD+5pRJ8xCN0aIhAeLzowayAIwK3il0L0Cw03lpAppYBk715/R/gLr /3omgt/uuj4RUjaN3MgAFXHtujD15li1fjWsm4egNkNny4p26Lr2Ue2PEM62E5ey anJv5ugJtNH5xXrV8wJXLR5i6lEO8UO01tfU3noSujOm6KAfvBxIBmX0jHlPRZ1h p86nS31e7FXdTmDiTaG+RRP7wVxgSWR/+m1s+D4m5FmWSDTcyyZkz6J5KSknEPbF 8YZ7Kq0Pea9xLOmfA+HcA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231248; x=1667317648; bh=P8+jBTtuJo1ME u0gFEtVOWtzYyNZ73jHQYh3RLnTTGA=; 
b=r99lMQ05ieI2F78SMk4/p20Vygidh 0aief92ADXqymYebSVhgW8BP5QVGBYoLcOXnCBY7imbw2n3xDBCJAY2rRG/Nsity S8Zofnv4mai/W5Dn7RLsunHzXUMH7AbDPhYBJX5ae+iGF7YZg6GO8tj+hFzmC2r6 uQcwFpVL5sluKvo/5MukSqwXphfb4VDIylBvyZqivL1WSWLOx/WARb0rsM5LA04X MvHBMk6xAZQkiGrlzBEMMhH7Xd8ZLnWaS5JVfEDLT0C3nIip1QviN85PZJpVzcpu CcclxrnafmcW686nu4j8YJPld+OGhejCRNGYArBq3OeILj+mKuXZz0Zeg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeefffevgffhuedvvddvleetveelgeetffffieeihfffjeejgfetgfeifefg jeetteenucffohhmrghinheplhhisgdufhhunhgtshdrshgspdhprghrihhthidrshgspd hgnhhurdhorhhgnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhf rhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:27 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFlIjc087271; Mon, 31 Oct 2022 08:47:19 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 14/34] Import 'parity' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:09 -0700 Message-Id: <20221031154529.3627576-15-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The functional overlap between the single- and double-word functions makes functions makes this implementation about half the size of the C functions if both functions are linked in the same application. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/parity.S: New file for __paritysi2/di2(). * config/arm/lib1funcs.S: #include bit/parity.S * config/arm/t-elf (LIB1ASMFUNCS): Added _paritysi2/di2. 
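For reference, the unrolled Thumb-1 sequence in the new file amounts to
XOR-folding the word into its top bit.  Roughly, in C (an illustrative
sketch only, not the libgcc source; the helper names are placeholders):

    /* Parity of a 32-bit word: fold with XOR until bit 31 holds the
       parity of all 32 bits, then shift it down.  */
    unsigned int parity_model (unsigned int x)
    {
      x ^= x << 16;
      x ^= x << 8;
      x ^= x << 4;
      x ^= x << 2;
      x ^= x << 1;
      return x >> 31;
    }

    /* The 64-bit variant first combines the two halves, as the assembly
       does with a single 'eors'.  */
    unsigned int parity64_model (unsigned long long x)
    {
      return parity_model ((unsigned int) x ^ (unsigned int) (x >> 32));
    }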
--- libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/parity.S | 120 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/t-elf | 2 + 3 files changed, 123 insertions(+) create mode 100644 libgcc/config/arm/parity.S diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index aa5957b8399..3f7b9e739f0 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1704,6 +1704,7 @@ LSYM(Lover12): #include "clz2.S" #include "ctz2.S" +#include "parity.S" /* ------------------------------------------------------------------------ */ /* These next two sections are here despite the fact that they contain Thumb diff --git a/libgcc/config/arm/parity.S b/libgcc/config/arm/parity.S new file mode 100644 index 00000000000..1405bea93a3 --- /dev/null +++ b/libgcc/config/arm/parity.S @@ -0,0 +1,120 @@ +/* parity.S: ARM optimized parity functions + + Copyright (C) 2020-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_paritydi2 + +// int __paritydi2(int) +// Returns '0' if the number of bits set in $r1:r0 is even, and '1' otherwise. +// Returns the result in $r0. +FUNC_START_SECTION paritydi2 .text.sorted.libgcc.paritydi2 + CFI_START_FUNCTION + + // Combine the upper and lower words, then fall through. + // Byte-endianness does not matter for this function. + eors r0, r1 + +#endif /* L_paritydi2 */ + + +// The implementation of __paritydi2() tightly couples with __paritysi2(), +// such that instructions must appear consecutively in the same memory +// section for proper flow control. However, this construction inhibits +// the ability to discard __paritydi2() when only using __paritysi2(). +// Therefore, this block configures __paritysi2() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __paritydi2(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_paritysi2' should appear before '_paritydi2' in LIB1ASMFUNCS. +#if defined(L_paritysi2) || defined(L_paritydi2) + +#ifdef L_paritysi2 +// int __paritysi2(int) +// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise. +// Returns the result in $r0. +// Uses $r2 as scratch space. 
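+// The parity accumulates in the most significant bit before the final shift.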
+WEAK_START_SECTION paritysi2 .text.sorted.libgcc.paritysi2 + CFI_START_FUNCTION + +#else /* L_paritydi2 */ +FUNC_ENTRY paritysi2 + +#endif + + #if defined(__thumb__) && __thumb__ + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + + // Size optimized: 16 bytes, 40 cycles + // Speed optimized: 24 bytes, 14 cycles + movs r2, #16 + + LLSYM(__parity_loop): + // Calculate the parity of successively smaller half-words into the MSB. + movs r1, r0 + lsls r1, r2 + eors r0, r1 + lsrs r2, #1 + bne LLSYM(__parity_loop) + + #else /* !__OPTIMIZE_SIZE__ */ + + // Unroll the loop. The 'libgcc' reference C implementation replaces + // the x2 and the x1 shifts with a constant. However, since it takes + // 4 cycles to load, index, and mask the constant result, it doesn't + // cost anything to keep shifting (and saves a few bytes). + lsls r1, r0, #16 + eors r0, r1 + lsls r1, r0, #8 + eors r0, r1 + lsls r1, r0, #4 + eors r0, r1 + lsls r1, r0, #2 + eors r0, r1 + lsls r1, r0, #1 + eors r0, r1 + + #endif /* !__OPTIMIZE_SIZE__ */ + #else /* !__thumb__ */ + + eors r0, r0, r0, lsl #16 + eors r0, r0, r0, lsl #8 + eors r0, r0, r0, lsl #4 + eors r0, r0, r0, lsl #2 + eors r0, r0, r0, lsl #1 + + #endif /* !__thumb__ */ + + lsrs r0, #31 + RET + + CFI_END_FUNCTION +FUNC_END paritysi2 + +#ifdef L_paritydi2 +FUNC_END paritydi2 +#endif + +#endif /* L_paritysi2 || L_paritydi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 346fc766f17..0e9b9ce21af 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -24,6 +24,7 @@ endif # !__symbian__ LIB1ASMFUNCS += \ _clzsi2 \ _ctzsi2 \ + _paritysi2 \ # Group 1: Integer function objects. @@ -37,6 +38,7 @@ LIB1ASMFUNCS += \ _ctzdi2 \ _ffssi2 \ _ffsdi2 \ + _paritydi2 \ _dvmd_tls \ _divsi3 \ _modsi3 \ From patchwork Mon Oct 31 15:45:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59666 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E7C2A3851C32 for ; Mon, 31 Oct 2022 15:48:16 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 1BE13381D442 for ; Mon, 31 Oct 2022 15:47:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1BE13381D442 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 0B2DF3200917; Mon, 31 Oct 2022 11:47:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:47:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231253; x= 1667317653; bh=1Bqdt590zutcpZxuRkOx84EFdSVrfMrIpV48jQaiXYs=; b=v 698qcz/czkiYfrN9iO95lNVylkS8uChox2i4DvjR6uX1zaq+6GMk2dewV1ZJn+BC GLh0iV/mStipe0PCVwuEsB2TLw7PB7NTfemq1yaC4VnsAmAZyLc5Xg0+AMERoCLT 
8+XFLBB1+0J/FXc1H+D4YM/PdCbmDBPbOOR/EM52h2slaexYw5Kf/LmYRB3UyX8P ISyUQ6bLqr0v45qtGBXWVTiIr+UAIk3zbCVURXK3Cao00N8YYLEEbsJAOYx6ecrY LXkDZ0qNT6TeFGz0GKTPTn/+jFMoV89Ag3pT2O6n39ixIXGDAcd29WR6TT19UpLs QghNz9Kei/HL5N/KFwbtg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231253; x=1667317653; bh=1Bqdt590zutcp ZxuRkOx84EFdSVrfMrIpV48jQaiXYs=; b=OY3/oB5yTvBh16bSv9gzTKhxrGRcK ZgNAfkP3RS1rW7ydlOjjZAYFX6CahI6NNY5InpCZCVdelNvTcID4dcJl8vCfjpkG wqNc/lxvFlSEBtgK8j/oFkYWXJgHCRHg96xPcNx5Hx+C58kH/5VItn+VYeFjWxuT IXecGJcgJVLpzg3JRELgyTC6sAVMWSPuzjAawXgn0vk7Ho709Ixa2gQHnh8XiWXr bTktanL1+XVdOqs+kh6dwEFO/17I6/Jb42birCyyIkOB+n//lLASvCDC3vhgxeDy pB5OBfYCpJN+/VYJaLz1OZ2Ph1bbWKp/IBNXNNekAYK+Y8vR7p9v9iXxA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpefgtdfggfelleffkeektdetteeftdfgkeehuddvjefgleeuteefheeujeeh gfdtgfenucffohhmrghinheplhhisgdufhhunhgtshdrshgspdhpohhptghnthdrshgspd hgnhhurdhorhhgnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhf rhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:32 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFlPN3087274; Mon, 31 Oct 2022 08:47:25 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 15/34] Import 'popcnt' functions from the CM0 library Date: Mon, 31 Oct 2022 08:45:10 -0700 Message-Id: <20221031154529.3627576-16-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" The functional overlap between the single- and double-word functions makes this implementation about 30% smaller than the C functions if both functions are linked together in the same appliation. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/popcnt.S (__popcountsi, __popcountdi2): New file. * config/arm/lib1funcs.S: #include bit/popcnt.S * config/arm/t-elf (LIB1ASMFUNCS): Add _popcountsi2/di2. 
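For reference, the speed-optimized path in the new file follows the classic
mask-and-add reduction.  Roughly, in C (an illustrative sketch only, not the
libgcc source; the helper name is a placeholder):

    /* Popcount of a 32-bit word by successive field reductions.  */
    unsigned int popcount_model (unsigned int x)
    {
      x -= (x >> 1) & 0x55555555;                      /* 2-bit sums.  */
      x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* 4-bit sums.  */
      x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);  /* Byte sums.  */
      x += x << 8;                                     /* Accumulate bytes    */
      x += x << 16;                                    /*  into the top byte. */
      return x >> 24;
    }

The size-optimized path instead loops on Kernighan's 'x &= x - 1', costing
one iteration per set bit.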
--- libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/popcnt.S | 189 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/t-elf | 2 + 3 files changed, 192 insertions(+) create mode 100644 libgcc/config/arm/popcnt.S diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 3f7b9e739f0..0eb6d1d52a7 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1705,6 +1705,7 @@ LSYM(Lover12): #include "clz2.S" #include "ctz2.S" #include "parity.S" +#include "popcnt.S" /* ------------------------------------------------------------------------ */ /* These next two sections are here despite the fact that they contain Thumb diff --git a/libgcc/config/arm/popcnt.S b/libgcc/config/arm/popcnt.S new file mode 100644 index 00000000000..4613ea475b0 --- /dev/null +++ b/libgcc/config/arm/popcnt.S @@ -0,0 +1,189 @@ +/* popcnt.S: ARM optimized popcount functions + + Copyright (C) 2020-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_popcountdi2 + +// int __popcountdi2(int) +// Returns the number of bits set in $r1:$r0. +// Returns the result in $r0. +FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2 + CFI_START_FUNCTION + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + // Initialize the result. + // Compensate for the two extra loop (one for each word) + // required to detect zero arguments. + movs r2, #2 + + LLSYM(__popcountd_loop): + // Same as __popcounts_loop below, except for $r1. + subs r2, #1 + subs r3, r1, #1 + ands r1, r3 + bcs LLSYM(__popcountd_loop) + + // Repeat the operation for the second word. + b LLSYM(__popcounts_loop) + + #else /* !__OPTIMIZE_SIZE__ */ + // Load the one-bit alternating mask. + ldr r3, =0x55555555 + + // Reduce the second word. + lsrs r2, r1, #1 + ands r2, r3 + subs r1, r2 + + // Reduce the first word. + lsrs r2, r0, #1 + ands r2, r3 + subs r0, r2 + + // Load the two-bit alternating mask. + ldr r3, =0x33333333 + + // Reduce the second word. + lsrs r2, r1, #2 + ands r2, r3 + ands r1, r3 + adds r1, r2 + + // Reduce the first word. + lsrs r2, r0, #2 + ands r2, r3 + ands r0, r3 + adds r0, r2 + + // There will be a maximum of 8 bits in each 4-bit field. + // Jump into the single word flow to combine and complete. + b LLSYM(__popcounts_merge) + + #endif /* !__OPTIMIZE_SIZE__ */ +#endif /* L_popcountdi2 */ + + +// The implementation of __popcountdi2() tightly couples with __popcountsi2(), +// such that instructions must appear consecutively in the same memory +// section for proper flow control. 
However, this construction inhibits +// the ability to discard __popcountdi2() when only using __popcountsi2(). +// Therefore, this block configures __popcountsi2() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __popcountdi2(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_popcountsi2' should appear before '_popcountdi2' in LIB1ASMFUNCS. +#if defined(L_popcountsi2) || defined(L_popcountdi2) + +#ifdef L_popcountsi2 +// int __popcountsi2(int) +// Returns '0' if the number of bits set in $r0 is even, and '1' otherwise. +// Returns the result in $r0. +// Uses $r2 as scratch space. +WEAK_START_SECTION popcountsi2 .text.sorted.libgcc.popcountsi2 + CFI_START_FUNCTION + +#else /* L_popcountdi2 */ +FUNC_ENTRY popcountsi2 + +#endif + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + // Initialize the result. + // Compensate for the extra loop required to detect zero. + movs r2, #1 + + // Kernighan's algorithm for __popcount(x): + // for (c = 0; x; c++) + // x &= x - 1; + + LLSYM(__popcounts_loop): + // Every loop counts for a '1' set in the argument. + // Count down since it's easier to initialize positive compensation, + // and the negation before function return is free. + subs r2, #1 + + // Clear one bit per loop. + subs r3, r0, #1 + ands r0, r3 + + // If this is a test for zero, it will be impossible to distinguish + // between zero and one bits set: both terminate after one loop. + // Instead, subtraction underflow flags when zero entered the loop. + bcs LLSYM(__popcounts_loop) + + // Invert the result, since we have been counting negative. + rsbs r0, r2, #0 + RET + + #else /* !__OPTIMIZE_SIZE__ */ + + // Load the one-bit alternating mask. + ldr r3, =0x55555555 + + // Reduce the word. + lsrs r1, r0, #1 + ands r1, r3 + subs r0, r1 + + // Load the two-bit alternating mask. + ldr r3, =0x33333333 + + // Reduce the word. + lsrs r1, r0, #2 + ands r0, r3 + ands r1, r3 + LLSYM(__popcounts_merge): + adds r0, r1 + + // Load the four-bit alternating mask. + ldr r3, =0x0F0F0F0F + + // Reduce the word. + lsrs r1, r0, #4 + ands r0, r3 + ands r1, r3 + adds r0, r1 + + // Accumulate individual byte sums into the MSB. + lsls r1, r0, #8 + adds r0, r1 + lsls r1, r0, #16 + adds r0, r1 + + // Isolate the cumulative sum. + lsrs r0, #24 + RET + + #endif /* !__OPTIMIZE_SIZE__ */ + + CFI_END_FUNCTION +FUNC_END popcountsi2 + +#ifdef L_popcountdi2 +FUNC_END popcountdi2 +#endif + +#endif /* L_popcountsi2 || L_popcountdi2 */ + diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 0e9b9ce21af..2e3f04aa2f0 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -25,6 +25,7 @@ LIB1ASMFUNCS += \ _clzsi2 \ _ctzsi2 \ _paritysi2 \ + _popcountsi2 \ # Group 1: Integer function objects. 
@@ -39,6 +40,7 @@ LIB1ASMFUNCS += \ _ffssi2 \ _ffsdi2 \ _paritydi2 \ + _popcountdi2 \ _dvmd_tls \ _divsi3 \ _modsi3 \ From patchwork Mon Oct 31 15:45:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59676 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3A77B39730EB for ; Mon, 31 Oct 2022 15:50:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 1931E384D14E for ; Mon, 31 Oct 2022 15:47:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1931E384D14E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 069513200917; Mon, 31 Oct 2022 11:47:38 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute4.internal (MEProxy); Mon, 31 Oct 2022 11:47:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231258; x= 1667317658; bh=rWEO6S5kTbPxTPeU1lAXKHu7CRCjjemHEZAIf0dLQDw=; b=m 60/ToT8KWx/ir9G29cIbjRoFaMcwVDvJ3gjwESCgCOc9z3TIxCb3ElN7zblouy7q DcGwFsJGWNsJ1QYipC1mBANzmYIjN4p2HAYpyDBG7uWUtZWK3sK4tOvjYla9WYmn FVOU0uaqrjjfMLyjLwpPZuzsQXTi9rQ+llqeMYXB7j1+8z2eRzeA9BdzTgThEu59 Jvu1Sgr/xhbU9g9NnqsohUE/UX81L/wFCpeBRnDZv2GCBGomhCUILTrFwpZN7cBS o6QBMep/pGWo8a1lYedrbN8cd5ZuwvzS2cQixCvSHJd3I2zUjshX0e7QydtmoHJ7 bKHBRCHM+2KLyUOi1asGw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231258; x=1667317658; bh=rWEO6S5kTbPxT PeU1lAXKHu7CRCjjemHEZAIf0dLQDw=; b=BrA2wdcm9Z4zPGZ4jdHnoEUPegRWn FgC677KrPnpsTo5wFC0jGc7uDgw4IeOcnJp51C19mkpOP47Ai1/KuM2WEHypk0AE eer+QGahelRY/xg0eiJwFyoE0s03qHCF+yXPsLLn8c0nPgY/Hh1IUBHkGy9f+op0 u+8z6rAsphDWK2VtkOXAB2qt/WDWS56lApPQ9AOgzHZXv2qqwK9Yid/2NGAA5pBk +XeS1TGSdvc10aDa1NvTMN2Uk86H+9BeI+IuDeh8HHFPa9SAya4v42wuKBm6phVH gW8BMKMFmXUKThJ547Jcb/auWbhapEKl8up8rmQvQaSB6mNTRGuvIgxTg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeevffefjeevvdekjefgteekkeffjeeiueelveeufefhhfdvieelvdekfffg hfefvdenucffohhmrghinhepsghprggsihdqvheimhdrshgspdhltghmphdrshgspdhgnh hurdhorhhgpdhlihgsudhfuhhntghsrdhssgenucevlhhushhtvghrufhiiigvpedtnecu rfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: 
by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:37 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFlULK087277; Mon, 31 Oct 2022 08:47:30 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 16/34] Refactor Thumb-1 64-bit comparison into a new file Date: Mon, 31 Oct 2022 08:45:11 -0700 Message-Id: <20221031154529.3627576-17-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This will make it easier to isolate changes in subsequent patches. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_lcmp, __aeabi_ulcmp): Moved to ... * config/arm/eabi/lcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/lcmp.S. --- libgcc/config/arm/bpabi-v6m.S | 46 ---------------------- libgcc/config/arm/eabi/lcmp.S | 73 +++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 74 insertions(+), 46 deletions(-) create mode 100644 libgcc/config/arm/eabi/lcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index ea01d3f4d5f..3757e99508e 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,52 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -FUNC_START aeabi_lcmp - cmp xxh, yyh - beq 1f - bgt 2f - movs r0, #1 - negs r0, r0 - RET -2: - movs r0, #1 - RET -1: - subs r0, xxl, yyl - beq 1f - bhi 2f - movs r0, #1 - negs r0, r0 - RET -2: - movs r0, #1 -1: - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -FUNC_START aeabi_ulcmp - cmp xxh, yyh - bne 1f - subs r0, xxl, yyl - beq 2f -1: - bcs 1f - movs r0, #1 - negs r0, r0 - RET -1: - movs r0, #1 -2: - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ .macro test_div_by_zero signed cmp yyh, #0 diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S new file mode 100644 index 00000000000..336db1d398c --- /dev/null +++ b/libgcc/config/arm/eabi/lcmp.S @@ -0,0 +1,73 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. 
+ + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_aeabi_lcmp + +FUNC_START aeabi_lcmp + cmp xxh, yyh + beq 1f + bgt 2f + movs r0, #1 + negs r0, r0 + RET +2: + movs r0, #1 + RET +1: + subs r0, xxl, yyl + beq 1f + bhi 2f + movs r0, #1 + negs r0, r0 + RET +2: + movs r0, #1 +1: + RET + FUNC_END aeabi_lcmp + +#endif /* L_aeabi_lcmp */ + +#ifdef L_aeabi_ulcmp + +FUNC_START aeabi_ulcmp + cmp xxh, yyh + bne 1f + subs r0, xxl, yyl + beq 2f +1: + bcs 1f + movs r0, #1 + negs r0, r0 + RET +1: + movs r0, #1 +2: + RET + FUNC_END aeabi_ulcmp + +#endif /* L_aeabi_ulcmp */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 0eb6d1d52a7..d85a20252d9 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1991,5 +1991,6 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/lcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #endif /* !__symbian__ */ From patchwork Mon Oct 31 15:45:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59670 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A8AA5395447D for ; Mon, 31 Oct 2022 15:49:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id AEB223895FC9 for ; Mon, 31 Oct 2022 15:47:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AEB223895FC9 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 2F0EE3200974; Mon, 31 Oct 2022 11:47:44 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Mon, 31 Oct 2022 11:47:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231263; x= 1667317663; bh=tQrBo31Z5cT8vV3BZaQU7syldrdJnkhzZl0mnkod1oU=; b=r WwpKS3NhlhK3SAbkSm5Mdn3zTHNHNARf8oMHJ3y4el2DVo0kbPrabBoholOT0aGD LaG+oSpw7Nd6hfCOKo7CVRL+y3FDC8tAv0Bsm4MT18pPpF8V8qQUKmkPiiILGcvV pqB3F1F5xVk0M01zExCw1tbrO2NVAv0HCEsb6YPPLkCJqcdihJQsrJEX9LDMcZu0 SoaYRnsWK3yfTGGO7icGIiUMoHTp3xAlUqTGEOmVqWqB468Iwms1du8HA2/U8UOR HFA8rCGlfxiqes4uOl4RRQ153RTIFMEPoORArD1H/M29YEWVNl0hovNdf8PM/AP6 DO1QQWEq71K0vmUm6L2qw== DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231263; x=1667317663; bh=tQrBo31Z5cT8v V3BZaQU7syldrdJnkhzZl0mnkod1oU=; b=VbVnJ/b7D+SkcDI+8ty9rw3NUujvk f90l9ANAr96vBFyPj9gGiIz8YscC73mlh6M6sAQuTyT00biLhw0LfS8UqHTQ7cqH bwoXZo2gSpD++bpQ67Ofa0u/Wdnm0qzovz1K7VnKofIMFWaqpWnxOr+gQzUkpERT 7Xcj7cmIRWIfjowhW0xv13GT1RlMk22YcLhgBQI81c0ynh1rdimqvTBIiUvIcCzI XJHiWuSJUCqgsnrJ2NUPn+Ljd8KQiw5SUqGUkttjWJ47MVu8Ddrqdhvcya1sFbWu 4Wc8QvFNv5rqw16uXhxanR6uOQKPQCgRQ98SPP+AceIMDRT5CyvfgIisQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeeihedvieegkeduteejheeuteekieeiffduudekgfdtueeuieekvdeukeff gfelleenucffohhmrghinheplhgtmhhprdhssgdpghhnuhdrohhrghenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgv nhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:43 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFlZYS087280; Mon, 31 Oct 2022 08:47:35 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 17/34] Import 64-bit comparison from CM0 library Date: Mon, 31 Oct 2022 08:45:12 -0700 Message-Id: <20221031154529.3627576-18-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" These are 2-5 instructions smaller and just as fast. Branches are minimized, which will allow easier adaptation to Thumb-2/ARM mode. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Replaced; add macro configuration to build __cmpdi2() and __ucmpdi2(). * config/arm/t-elf (LIB1ASMFUNCS): Added _cmpdi2 and _ucmpdi2. --- libgcc/config/arm/eabi/lcmp.S | 151 +++++++++++++++++++++++++--------- libgcc/config/arm/t-elf | 2 + 2 files changed, 112 insertions(+), 41 deletions(-) diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S index 336db1d398c..99c7970ecba 100644 --- a/libgcc/config/arm/eabi/lcmp.S +++ b/libgcc/config/arm/eabi/lcmp.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. 
Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* lcmp.S: Thumb-1 optimized 64-bit integer comparison - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,50 +23,120 @@ . */ +#if defined(L_aeabi_lcmp) || defined(L_cmpdi2) + #ifdef L_aeabi_lcmp + #define LCMP_NAME aeabi_lcmp + #define LCMP_SECTION .text.sorted.libgcc.lcmp +#else + #define LCMP_NAME cmpdi2 + #define LCMP_SECTION .text.sorted.libgcc.cmpdi2 +#endif + +// int __aeabi_lcmp(long long, long long) +// int __cmpdi2(long long, long long) +// Compares the 64 bit signed values in $r1:$r0 and $r3:$r2. +// lcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively. +// cmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively. +// Object file duplication assumes typical programs follow one runtime ABI. +FUNC_START_SECTION LCMP_NAME LCMP_SECTION + CFI_START_FUNCTION + + // Calculate the difference $r1:$r0 - $r3:$r2. + subs xxl, yyl + sbcs xxh, yyh + + // With $r2 free, create a known offset value without affecting + // the N or Z flags. + // BUG? The originally unified instruction for v6m was 'mov r2, r3'. + // However, this resulted in a compile error with -mthumb: + // "MOV Rd, Rs with two low registers not permitted". + // Since unified syntax deprecates the "cpy" instruction, shouldn't + // there be a backwards-compatible tranlation available? + cpy r2, r3 + + // Evaluate the comparison result. + blt LLSYM(__lcmp_lt) + + // The reference offset ($r2 - $r3) will be +2 iff the first + // argument is larger, otherwise the offset value remains 0. + adds r2, #2 + + // Check for zero (equality in 64 bits). + // It doesn't matter which register was originally "hi". + orrs r0, r1 + + // The result is already 0 on equality. + beq LLSYM(__lcmp_return) + + LLSYM(__lcmp_lt): + // Create +1 or -1 from the offset value defined earlier. + adds r3, #1 + subs r0, r2, r3 + + LLSYM(__lcmp_return): + #ifdef L_cmpdi2 + // Offset to the correct output specification. + adds r0, #1 + #endif -FUNC_START aeabi_lcmp - cmp xxh, yyh - beq 1f - bgt 2f - movs r0, #1 - negs r0, r0 - RET -2: - movs r0, #1 - RET -1: - subs r0, xxl, yyl - beq 1f - bhi 2f - movs r0, #1 - negs r0, r0 - RET -2: - movs r0, #1 -1: RET - FUNC_END aeabi_lcmp -#endif /* L_aeabi_lcmp */ + CFI_END_FUNCTION +FUNC_END LCMP_NAME + +#endif /* L_aeabi_lcmp || L_cmpdi2 */ + + +#if defined(L_aeabi_ulcmp) || defined(L_ucmpdi2) #ifdef L_aeabi_ulcmp + #define ULCMP_NAME aeabi_ulcmp + #define ULCMP_SECTION .text.sorted.libgcc.ulcmp +#else + #define ULCMP_NAME ucmpdi2 + #define ULCMP_SECTION .text.sorted.libgcc.ucmpdi2 +#endif + +// int __aeabi_ulcmp(unsigned long long, unsigned long long) +// int __ucmpdi2(unsigned long long, unsigned long long) +// Compares the 64 bit unsigned values in $r1:$r0 and $r3:$r2. +// ulcmp() returns $r0 = { -1, 0, +1 } for orderings { <, ==, > } respectively. +// ucmpdi2() returns $r0 = { 0, 1, 2 } for orderings { <, ==, > } respectively. +// Object file duplication assumes typical programs follow one runtime ABI. +FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION + CFI_START_FUNCTION + + // Calculate the 'C' flag. + subs xxl, yyl + sbcs xxh, yyh + + // Capture the carry flg. 
+ // $r2 will contain -1 if the first value is smaller, + // 0 if the first value is larger or equal. + sbcs r2, r2 + + // Check for zero (equality in 64 bits). + // It doesn't matter which register was originally "hi". + orrs r0, r1 + + // The result is already 0 on equality. + beq LLSYM(__ulcmp_return) + + // Assume +1. If -1 is correct, $r2 will override. + movs r0, #1 + orrs r0, r2 + + LLSYM(__ulcmp_return): + #ifdef L_ucmpdi2 + // Offset to the correct output specification. + adds r0, #1 + #endif -FUNC_START aeabi_ulcmp - cmp xxh, yyh - bne 1f - subs r0, xxl, yyl - beq 2f -1: - bcs 1f - movs r0, #1 - negs r0, r0 - RET -1: - movs r0, #1 -2: RET - FUNC_END aeabi_ulcmp -#endif /* L_aeabi_ulcmp */ + CFI_END_FUNCTION +FUNC_END ULCMP_NAME + +#endif /* L_aeabi_ulcmp || L_ucmpdi2 */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 2e3f04aa2f0..83325410097 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -41,6 +41,8 @@ LIB1ASMFUNCS += \ _ffsdi2 \ _paritydi2 \ _popcountdi2 \ + _cmpdi2 \ + _ucmpdi2 \ _dvmd_tls \ _divsi3 \ _modsi3 \ From patchwork Mon Oct 31 15:45:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59674 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 95737385140C for ; Mon, 31 Oct 2022 15:49:56 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id AEE5A389838B for ; Mon, 31 Oct 2022 15:47:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AEE5A389838B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 42EBE320093B; Mon, 31 Oct 2022 11:47:49 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Mon, 31 Oct 2022 11:47:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231268; x= 1667317668; bh=RmvQX19ZLua4MFD7xxv/ty/xl14A61A1/BK9hTZL4J8=; b=i uA59xSoKGm1Sige9LOS6LWTsewcZ5z6PxPLWYuveWVWNv3dpy8iF1anYWh7cwVE4 sZU+QUq+xYBKnrAF6phutjnj2f1Jt5n9ubzF0hAh1/omcJhOx8kT97dMegly5Ydt W0fhHenw1yjne5XYuL4LQH6r2ngXh84wTABc07ScJ4+6geHqZtQj3Ny3Fjwwm3rG Geff2m0ggV0Ii3uHB0qT3RKkYlkvdTM1netcFP9fb1H3oop+s1phvgXFzfMCvsqA ZXYuryCnO4Ktr08l5vkQR8Pf9zydFQ1ePEvEcIYtL9pMhnB+ofldarzXgXp5w+cF uHi6ntb5EPrAJDuP9zAKw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231268; x=1667317668; bh=RmvQX19ZLua4M FD7xxv/ty/xl14A61A1/BK9hTZL4J8=; b=VuxuXiVq0H1bK8584wtlWl3j42GBC 0y4E1Kxp3VaS7Gvz3n56HCbyagVyoPlzL6ulezj2EfQAhwwCFjLFUtHyHCikNHps 
m8dgt7nwRL22yuaPvJd6Nvg2/LuWl94nOjRwdFNqrC9/RumiciGXcsKKVMkQpDEf A23c8FJdlPWizadCQYGWQbwv72THX5Gx9bU+K6OZr9E9fbJ2/ER/JJSJrwI682s1 3OSengWU1kcaGYlAM6z3LH0lcyCsRL/0LxcAZB5Tw7EdJXiGV69Rv+vlBbgyd2MQ 4TE4tWKP7ZOB7EGQSheBzu61g5a3y12819Yed4XGss53y8jhoWD6Rn7rA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeffgffgfeefgffghefhjeetueelveejleffueefvdehkefggedtheduvedu vedvffenucffohhmrghinhepsghprggsihdrshgspdhltghmphdrshgspdhlihgsudhfuh hntghsrdhssgenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhr ohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:48 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFleMo087283; Mon, 31 Oct 2022 08:47:40 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 18/34] Merge Thumb-2 optimizations for 64-bit comparison Date: Mon, 31 Oct 2022 08:45:13 -0700 Message-Id: <20221031154529.3627576-19-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This effectively merges support for all architecture variants into a common function path with appropriate build conditions. ARM performance is 1-2 instructions faster; Thumb-2 is about 50% faster. gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.S (__aeabi_lcmp, __aeabi_ulcmp): Removed. * config/arm/eabi/lcmp.S (__aeabi_lcmp, __aeabi_ulcmp): Added conditional execution on supported architectures (__ARM_FEATURE_IT). * config/arm/lib1funcs.S: Moved #include scope of eabi/lcmp.S. 
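For orientation, the merged routines compute results equivalent to the
following C sketch. This is illustrative only; the reference_* helpers are
not part of the patch or of libgcc, they just restate the return
conventions documented in eabi/lcmp.S:

  /* __aeabi_lcmp(): returns { -1, 0, +1 } for orderings { <, ==, > }.  */
  int reference_lcmp (long long a, long long b)
  {
    return (a < b) ? -1 : ((a > b) ? 1 : 0);
  }

  /* __cmpdi2(): returns { 0, 1, 2 } for orderings { <, ==, > }.  */
  int reference_cmpdi2 (long long a, long long b)
  {
    return (a < b) ? 0 : ((a == b) ? 1 : 2);
  }

On targets with IT-block support (__HAVE_FEATURE_IT in the new code), the
'<' result is produced with a conditional move and an early conditional
return, which is where most of the Thumb-2 speedup comes from.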
--- libgcc/config/arm/bpabi.S | 42 ------------------------------- libgcc/config/arm/eabi/lcmp.S | 47 ++++++++++++++++++++++++++++++++++- libgcc/config/arm/lib1funcs.S | 2 +- 3 files changed, 47 insertions(+), 44 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 17fe707ddf3..531a64fa98d 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -34,48 +34,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ -#ifdef L_aeabi_lcmp - -ARM_FUNC_START aeabi_lcmp - cmp xxh, yyh - do_it lt - movlt r0, #-1 - do_it gt - movgt r0, #1 - do_it ne - RETc(ne) - subs r0, xxl, yyl - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - RET - FUNC_END aeabi_lcmp - -#endif /* L_aeabi_lcmp */ - -#ifdef L_aeabi_ulcmp - -ARM_FUNC_START aeabi_ulcmp - cmp xxh, yyh - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - do_it ne - RETc(ne) - cmp xxl, yyl - do_it lo - movlo r0, #-1 - do_it hi - movhi r0, #1 - do_it eq - moveq r0, #0 - RET - FUNC_END aeabi_ulcmp - -#endif /* L_aeabi_ulcmp */ - .macro test_div_by_zero signed /* Tail-call to divide-by-zero handlers which may be overridden by the user, so unwinding works properly. */ diff --git a/libgcc/config/arm/eabi/lcmp.S b/libgcc/config/arm/eabi/lcmp.S index 99c7970ecba..d397325cbef 100644 --- a/libgcc/config/arm/eabi/lcmp.S +++ b/libgcc/config/arm/eabi/lcmp.S @@ -46,6 +46,19 @@ FUNC_START_SECTION LCMP_NAME LCMP_SECTION subs xxl, yyl sbcs xxh, yyh + #ifdef __HAVE_FEATURE_IT + do_it lt,t + + #ifdef L_aeabi_lcmp + movlt r0, #-1 + #else + movlt r0, #0 + #endif + + // Early return on '<'. + RETc(lt) + + #else /* !__HAVE_FEATURE_IT */ // With $r2 free, create a known offset value without affecting // the N or Z flags. // BUG? The originally unified instruction for v6m was 'mov r2, r3'. @@ -62,17 +75,27 @@ FUNC_START_SECTION LCMP_NAME LCMP_SECTION // argument is larger, otherwise the offset value remains 0. adds r2, #2 + #endif + // Check for zero (equality in 64 bits). // It doesn't matter which register was originally "hi". orrs r0, r1 + #ifdef __HAVE_FEATURE_IT + // The result is already 0 on equality. + // -1 already returned, so just force +1. + do_it ne + movne r0, #1 + + #else /* !__HAVE_FEATURE_IT */ // The result is already 0 on equality. beq LLSYM(__lcmp_return) - LLSYM(__lcmp_lt): + LLSYM(__lcmp_lt): // Create +1 or -1 from the offset value defined earlier. adds r3, #1 subs r0, r2, r3 + #endif LLSYM(__lcmp_return): #ifdef L_cmpdi2 @@ -111,21 +134,43 @@ FUNC_START_SECTION ULCMP_NAME ULCMP_SECTION subs xxl, yyl sbcs xxh, yyh + #ifdef __HAVE_FEATURE_IT + do_it lo,t + + #ifdef L_aeabi_ulcmp + movlo r0, -1 + #else + movlo r0, #0 + #endif + + // Early return on '<'. + RETc(lo) + + #else // Capture the carry flg. // $r2 will contain -1 if the first value is smaller, // 0 if the first value is larger or equal. sbcs r2, r2 + #endif // Check for zero (equality in 64 bits). // It doesn't matter which register was originally "hi". orrs r0, r1 + #ifdef __HAVE_FEATURE_IT + // The result is already 0 on equality. + // -1 already returned, so just force +1. + do_it ne + movne r0, #1 + + #else /* !__HAVE_FEATURE_IT */ // The result is already 0 on equality. beq LLSYM(__ulcmp_return) // Assume +1. If -1 is correct, $r2 will override. 
movs r0, #1 orrs r0, r2 + #endif LLSYM(__ulcmp_return): #ifdef L_ucmpdi2 diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index d85a20252d9..796f6f30ed9 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1991,6 +1991,6 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" -#include "eabi/lcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ +#include "eabi/lcmp.S" #endif /* !__symbian__ */ From patchwork Mon Oct 31 15:45:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59679 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2D7C8382EA0D for ; Mon, 31 Oct 2022 15:50:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id A7B9E381E5FC for ; Mon, 31 Oct 2022 15:47:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A7B9E381E5FC Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 8E1E7320096B; Mon, 31 Oct 2022 11:47:54 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:47:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231274; x= 1667317674; bh=1JS7w8JapxuJXBFJQQt9VwavHbIZflw7mCc/Jajy2O8=; b=G AsiXzpARX0v7N+1g04LEbjsUTBXqjqG/9Gkuk7mf1/gB2gxP3CXHw+3JCgnrNqXi o5b8qYf/MN5E4/lgwgpzuIK6HqwDOzDYd2LN3drheY73ddP6ygkyXi38PvbeiogJ jvylins6Z1Ka45dgQmMIcCyK93e7Lj4plnxB6b++2uxaehKaGuFn2KnXCbL+vOWw 0bqfUoIQJyXKmOQZDLKt9sdf4Lpmc7JUchBIOhrRXxfA7LgpM1dwnLEKBEKmX8RW gzwAwir7G00Ih10aFMIhzy5OzH8BOiS9afqwGDoQbOoMNW4vfKIT1naMDkzxJUqw HHaU/FpTCtPPemnhDFzRA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231274; x=1667317674; bh=1JS7w8JapxuJX BFJQQt9VwavHbIZflw7mCc/Jajy2O8=; b=kg/xR5MuzT6RAG40cSt2jQkJaQs8G ZJaeDCX5xq0p0Z/W0m8WL7DK7HW9CPFqrwjds/h09Kn7WfjmTzz37V8Gaqqw/zW5 CEgjfRwFl4cKVdkYU0hS0X9po7X2fpdKM3PaEmZpV7183u/1Cy3SPz1VIaqBGhMm KNx8heke+2Hjl1pQWwKcrKLpB8JVyoYhzxHylfTftKkctlWJlTM7k7N6+Ju8+g9U Fk4Gi2hZcTWlCv3thQswMuQZ/b1qlu6GYLYlULyVQfo/X/V91ko6CD8gP7SfNt0K ZhYDHQ5trY5M/AfsHH+0nSlJdZK5opawhoc9toEjGE+ceUwt2TdDlSv8Q== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv 
lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeevuefgueeiueetkeekiefhhfetkeeiieelhfegudehheejjeekieetudek uddvudenucffohhmrghinhepihguihhvrdhssgdpghhnuhdrohhrghdpghgttgdrthgrrh hgvghtpdhlihgsudhfuhhntghsrdhssgenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:53 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFlj1B087286; Mon, 31 Oct 2022 08:47:45 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 19/34] Import 32-bit division from the CM0 library Date: Mon, 31 Oct 2022 08:45:14 -0700 Message-Id: <20221031154529.3627576-20-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/idiv.S: New file for __udivsi3() and __divsi3(). * config/arm/lib1funcs.S: #include eabi/idiv.S (v6m only). --- libgcc/config/arm/eabi/idiv.S | 299 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 19 ++- 2 files changed, 317 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi/idiv.S diff --git a/libgcc/config/arm/eabi/idiv.S b/libgcc/config/arm/eabi/idiv.S new file mode 100644 index 00000000000..6e54863611a --- /dev/null +++ b/libgcc/config/arm/eabi/idiv.S @@ -0,0 +1,299 @@ +/* div.S: Thumb-1 size-optimized 32-bit integer division + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifndef __GNUC__ + +// int __aeabi_idiv0(int) +// Helper function for division by 0. 
+WEAK_START_SECTION aeabi_idiv0 .text.sorted.libgcc.idiv.idiv0 +FUNC_ALIAS cm0_idiv0 aeabi_idiv0 + CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + svc #(SVC_DIVISION_BY_ZERO) + #endif + + RET + + CFI_END_FUNCTION +FUNC_END cm0_idiv0 +FUNC_END aeabi_idiv0 + +#endif /* !__GNUC__ */ + + +#ifdef L_divsi3 + +// int __aeabi_idiv(int, int) +// idiv_return __aeabi_idivmod(int, int) +// Returns signed $r0 after division by $r1. +// Also returns the signed remainder in $r1. +// Same parent section as __divsi3() to keep branches within range. +FUNC_START_SECTION divsi3 .text.sorted.libgcc.idiv.divsi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_idiv divsi3 + FUNC_ALIAS aeabi_idivmod divsi3 +#endif + + CFI_START_FUNCTION + + // Extend signs. + asrs r2, r0, #31 + asrs r3, r1, #31 + + // Absolute value of the denominator, abort on division by zero. + eors r1, r3 + subs r1, r3 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + beq LLSYM(__idivmod_zero) + #else + beq SYM(__uidivmod_zero) + #endif + + // Absolute value of the numerator. + eors r0, r2 + subs r0, r2 + + // Keep the sign of the numerator in bit[31] (for the remainder). + // Save the XOR of the signs in bits[15:0] (for the quotient). + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + lsrs rT, r3, #16 + eors rT, r2 + + // Handle division as unsigned. + bl SYM(__uidivmod_nonzero) __PLT__ + + // Set the sign of the remainder. + asrs r2, rT, #31 + eors r1, r2 + subs r1, r2 + + // Set the sign of the quotient. + sxth r3, rT + eors r0, r3 + subs r0, r3 + + LLSYM(__idivmod_return): + pop { rT, pc } + .cfi_restore_state + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + LLSYM(__idivmod_zero): + // Set up the *div0() parameter specified in the ARM runtime ABI: + // * 0 if the numerator is 0, + // * Or, the largest value of the type manipulated by the calling + // division function if the numerator is positive, + // * Or, the least value of the type manipulated by the calling + // division function if the numerator is negative. + subs r1, r0 + orrs r0, r1 + asrs r0, #31 + lsrs r0, #1 + eors r0, r2 + + // At least the __aeabi_idiv0() call is common. + b SYM(__uidivmod_zero2) + #endif /* PEDANTIC_DIV0 */ + + CFI_END_FUNCTION +FUNC_END divsi3 + +#ifndef __symbian__ + FUNC_END aeabi_idiv + FUNC_END aeabi_idivmod +#endif + +#endif /* L_divsi3 */ + + +#ifdef L_udivsi3 + +// int __aeabi_uidiv(unsigned int, unsigned int) +// idiv_return __aeabi_uidivmod(unsigned int, unsigned int) +// Returns unsigned $r0 after division by $r1. +// Also returns the remainder in $r1. +FUNC_START_SECTION udivsi3 .text.sorted.libgcc.idiv.udivsi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_uidiv udivsi3 + FUNC_ALIAS aeabi_uidivmod udivsi3 +#endif + + CFI_START_FUNCTION + + // Abort on division by zero. + tst r1, r1 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + beq LLSYM(__uidivmod_zero) + #else + beq SYM(__uidivmod_zero) + #endif + + #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED + // MAYBE: Optimize division by a power of 2 + #endif + + // Public symbol for the sake of divsi3(). + FUNC_ENTRY uidivmod_nonzero + // Pre division: Shift the denominator as far as possible left + // without making it larger than the numerator. + // The loop is destructive, save a copy of the numerator. + mov ip, r0 + + // Set up binary search. 
+ movs r3, #16 + movs r2, #1 + + LLSYM(__uidivmod_align): + // Prefer dividing the numerator to multipying the denominator + // (multiplying the denominator may result in overflow). + lsrs r0, r3 + cmp r0, r1 + blo LLSYM(__uidivmod_skip) + + // Multiply the denominator and the result together. + lsls r1, r3 + lsls r2, r3 + + LLSYM(__uidivmod_skip): + // Restore the numerator, and iterate until search goes to 0. + mov r0, ip + lsrs r3, #1 + bne LLSYM(__uidivmod_align) + + // In The result $r3 has been conveniently initialized to 0. + b LLSYM(__uidivmod_entry) + + LLSYM(__uidivmod_loop): + // Scale the denominator and the quotient together. + lsrs r1, #1 + lsrs r2, #1 + beq LLSYM(__uidivmod_return) + + LLSYM(__uidivmod_entry): + // Test if the denominator is smaller than the numerator. + cmp r0, r1 + blo LLSYM(__uidivmod_loop) + + // If the denominator is smaller, the next bit of the result is '1'. + // If the new remainder goes to 0, exit early. + adds r3, r2 + subs r0, r1 + bne LLSYM(__uidivmod_loop) + + LLSYM(__uidivmod_return): + mov r1, r0 + mov r0, r3 + RET + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + LLSYM(__uidivmod_zero): + // Set up the *div0() parameter specified in the ARM runtime ABI: + // * 0 if the numerator is 0, + // * Or, the largest value of the type manipulated by the calling + // division function if the numerator is positive. + subs r1, r0 + orrs r0, r1 + asrs r0, #31 + + FUNC_ENTRY uidivmod_zero2 + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + #else + push { lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 4 + .cfi_rel_offset lr, 0 + #endif + + // Since GCC implements __aeabi_idiv0() as a weak overridable function, + // this call must be prepared for a jump beyond +/- 2 KB. + // NOTE: __aeabi_idiv0() can't be implemented as a tail call, since any + // non-trivial override will (likely) corrupt a remainder in $r1. + bl SYM(__aeabi_idiv0) __PLT__ + + // Since the input to __aeabi_idiv0() was INF, there really isn't any + // choice in which of the recommended *divmod() patterns to follow. + // Clear the remainder to complete {INF, 0}. + eors r1, r1 + + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + pop { rT, pc } + .cfi_restore_state + #else + pop { pc } + .cfi_restore_state + #endif + + #else /* !PEDANTIC_DIV0 */ + FUNC_ENTRY uidivmod_zero + // NOTE: The following code sets up a return pair of {0, numerator}, + // the second preference given by the ARM runtime ABI specification. + // The pedantic version is 18 bytes larger between __aeabi_idiv() and + // __aeabi_uidiv(). However, this version does not conform to the + // out-of-line parameter requirements given for __aeabi_idiv0(), and + // also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'. + + // Since the numerator may be overwritten by __aeabi_idiv0(), save now. + // Afterwards, it can be restored directly as the remainder. + push { r0, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset r0, 0 + .cfi_rel_offset lr, 4 + + // Set up the quotient (not ABI compliant). + eors r0, r0 + + // Since GCC implements div0() as a weak overridable function, + // this call must be prepared for a jump beyond +/- 2 KB. + bl SYM(__aeabi_idiv0) __PLT__ + + // Restore the remainder and return. 
+ pop { r1, pc } + .cfi_restore_state + + #endif /* !PEDANTIC_DIV0 */ + + CFI_END_FUNCTION +FUNC_END udivsi3 + +#ifndef __symbian__ + FUNC_END aeabi_uidiv + FUNC_END aeabi_uidivmod +#endif + +#endif /* L_udivsi3 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 796f6f30ed9..cb01abec34c 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1158,6 +1158,10 @@ LSYM(Ldivbyzero_negative): /* ------------------------------------------------------------------------ */ /* Start of the Real Functions */ /* ------------------------------------------------------------------------ */ + +/* Disable these on v6m in favor of 'eabi/idiv.S', below. */ +#ifndef NOT_ISA_TARGET_32BIT + #ifdef L_udivsi3 #if defined(__prefer_thumb__) @@ -1563,6 +1567,18 @@ LSYM(Lover12): DIV_FUNC_END modsi3 signed #endif /* L_modsi3 */ + +#else /* NOT_ISA_TARGET_32BIT */ +/* Temp registers. */ +#define rP r4 +#define rQ r5 +#define rS r6 +#define rT r7 + +#define PEDANTIC_DIV0 (1) +#include "eabi/idiv.S" +#endif /* NOT_ISA_TARGET_32BIT */ + /* ------------------------------------------------------------------------ */ #ifdef L_dvmd_tls @@ -1578,7 +1594,8 @@ LSYM(Lover12): FUNC_END div0 #endif -#endif /* L_divmodsi_tools */ +#endif /* L_div_tls */ + /* ------------------------------------------------------------------------ */ #ifdef L_dvmd_lnx @ GNU/Linux division-by zero handler. Used in place of L_dvmd_tls From patchwork Mon Oct 31 15:45:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59683 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8F37A38555AE for ; Mon, 31 Oct 2022 15:51:32 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id A3C843810B5C for ; Mon, 31 Oct 2022 15:48:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A3C843810B5C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 91E79320093C; Mon, 31 Oct 2022 11:47:59 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Mon, 31 Oct 2022 11:47:59 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231279; x= 1667317679; bh=Qh1t657BX6FsQbexPwGH634ofc89uLbF0cHQr9+aUZI=; b=v fAhCKBxgRKLaCDT9J1faDHxuSp7uGsVP0jJ5EhXgo5qgwxdGoLyGmYU0DTi6vMlD 5pROODh1IJiGuTTTkK6yUR3ENB/x6BlYoSFNJKtOrf+xSoqOg6ivmLI37k0WZBGl y1DufA7lAh+EuGXkKLUt6rSAFQ86SgLvZxa1aCODB30XZxQkDVc96A9e2fS4jp+t OduqjNBW43PL+JUXmumRsTyRWOkPUzbGveHy0f/LJWpvhAkoinbd1e3DsJk6wuCw IG3dvlOQyGRdY4QJM3VVLDbXXN59aFgeMIHxnLvbishxLYJsdkKXo2n8TRcGG1+/ CxsAYJV00gFVcgwng/AOw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date 
:feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231279; x=1667317679; bh=Qh1t657BX6FsQ bexPwGH634ofc89uLbF0cHQr9+aUZI=; b=GPS+vzUWJ/4Tt6Hx7+lhcMDLFeoFH MbBywZb9zrfLjgvMiNpPEWJAfHIXYlGwNUAx7GPSXVXydEMfSVnAKWmcihTB826P 0KPo9Ep9qMYTIfe8G8/LriuHzx61TOHoJQTD8llVnxHUceQB6w0ly4q/yoWJ2H9T xK1bPWvNjyCLcUeb+D/4vkNtED4sR8fDx3hlK2t9foI2ieNMilPn3S+Kbbl15ymK nvYiXmf7zeC3kcKfogZ3N0W+ZF39a559mLMrVH9RshE4S+T7Vdst6+2qx2scwVuT WO1BdVsPOCbJSxUD5j686iZAJEiyDnoDIKs35NMiCseYm51EEx+AExYTA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpedvkeejgfejudegtddufeeviedvhfduheeiueehteegueekfeekieehueev feeludenucffohhmrghinhepsghprggsihdqvheimhdrshgspdhlughivhdrshgspdhgnh hurdhorhhgpdhlihgsudhfuhhntghsrdhssgenucevlhhushhtvghrufhiiigvpedtnecu rfgrrhgrmhepmhgrihhlfhhrohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:47:58 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFloQ5087289; Mon, 31 Oct 2022 08:47:50 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 20/34] Refactor Thumb-1 64-bit division into a new file Date: Mon, 31 Oct 2022 08:45:15 -0700 Message-Id: <20221031154529.3627576-21-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_ldivmod/ldivmod): Moved to ... * config/arm/eabi/ldiv.S: New file. * config/arm/lib1funcs.S: #include eabi/ldiv.S (v6m only). 
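The test_div_by_zero macro moved by this patch encodes the runtime-ABI rule
for the quotient handed to __aeabi_ldiv0() when the denominator is zero.
A rough C equivalent of that rule is shown below for reference only; these
helper names do not exist in libgcc, and the sketch assumes the same
saturation behaviour as the macro:

  long long saturated_signed_quotient (long long num)
  {
    if (num == 0)
      return 0;
    /* LLONG_MAX for a positive numerator, LLONG_MIN for a negative one.  */
    return (num > 0) ? 0x7fffffffffffffffLL : (-0x7fffffffffffffffLL - 1);
  }

  unsigned long long saturated_unsigned_quotient (unsigned long long num)
  {
    return (num == 0) ? 0 : 0xffffffffffffffffULL;
  }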
--- libgcc/config/arm/bpabi-v6m.S | 81 ------------------------- libgcc/config/arm/eabi/ldiv.S | 107 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 108 insertions(+), 81 deletions(-) create mode 100644 libgcc/config/arm/eabi/ldiv.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 3757e99508e..d38a9208c60 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -34,87 +34,6 @@ #endif /* __ARM_EABI__ */ -.macro test_div_by_zero signed - cmp yyh, #0 - bne 7f - cmp yyl, #0 - bne 7f - cmp xxh, #0 - .ifc \signed, unsigned - bne 2f - cmp xxl, #0 -2: - beq 3f - movs xxh, #0 - mvns xxh, xxh @ 0xffffffff - movs xxl, xxh -3: - .else - blt 6f - bgt 4f - cmp xxl, #0 - beq 5f -4: movs xxl, #0 - mvns xxl, xxl @ 0xffffffff - lsrs xxh, xxl, #1 @ 0x7fffffff - b 5f -6: movs xxh, #0x80 - lsls xxh, xxh, #24 @ 0x80000000 - movs xxl, #0 -5: - .endif - @ tailcalls are tricky on v6-m. - push {r0, r1, r2} - ldr r0, 1f - adr r1, 1f - adds r0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -1: - .word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod - test_div_by_zero signed - - push {r0, r1} - mov r0, sp - push {r0, lr} - ldr r0, [sp, #8] - bl SYM(__gnu_ldivmod_helper) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_ldivmod - -#endif /* L_aeabi_ldivmod */ - -#ifdef L_aeabi_uldivmod - -FUNC_START aeabi_uldivmod - test_div_by_zero unsigned - - push {r0, r1} - mov r0, sp - push {r0, lr} - ldr r0, [sp, #8] - bl SYM(__udivmoddi4) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_uldivmod - -#endif /* L_aeabi_uldivmod */ - #ifdef L_arm_addsubsf3 FUNC_START aeabi_frsub diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S new file mode 100644 index 00000000000..3c8280ef580 --- /dev/null +++ b/libgcc/config/arm/eabi/ldiv.S @@ -0,0 +1,107 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . 
*/ + + +.macro test_div_by_zero signed + cmp yyh, #0 + bne 7f + cmp yyl, #0 + bne 7f + cmp xxh, #0 + .ifc \signed, unsigned + bne 2f + cmp xxl, #0 +2: + beq 3f + movs xxh, #0 + mvns xxh, xxh @ 0xffffffff + movs xxl, xxh +3: + .else + blt 6f + bgt 4f + cmp xxl, #0 + beq 5f +4: movs xxl, #0 + mvns xxl, xxl @ 0xffffffff + lsrs xxh, xxl, #1 @ 0x7fffffff + b 5f +6: movs xxh, #0x80 + lsls xxh, xxh, #24 @ 0x80000000 + movs xxl, #0 +5: + .endif + @ tailcalls are tricky on v6-m. + push {r0, r1, r2} + ldr r0, 1f + adr r1, 1f + adds r0, r1 + str r0, [sp, #8] + @ We know we are not on armv4t, so pop pc is safe. + pop {r0, r1, pc} + .align 2 +1: + .word __aeabi_ldiv0 - 1b +7: +.endm + +#ifdef L_aeabi_ldivmod + +FUNC_START aeabi_ldivmod + test_div_by_zero signed + + push {r0, r1} + mov r0, sp + push {r0, lr} + ldr r0, [sp, #8] + bl SYM(__gnu_ldivmod_helper) + ldr r3, [sp, #4] + mov lr, r3 + add sp, sp, #8 + pop {r2, r3} + RET + FUNC_END aeabi_ldivmod + +#endif /* L_aeabi_ldivmod */ + +#ifdef L_aeabi_uldivmod + +FUNC_START aeabi_uldivmod + test_div_by_zero unsigned + + push {r0, r1} + mov r0, sp + push {r0, lr} + ldr r0, [sp, #8] + bl SYM(__udivmoddi4) + ldr r3, [sp, #4] + mov lr, r3 + add sp, sp, #8 + pop {r2, r3} + RET + FUNC_END aeabi_uldivmod + +#endif /* L_aeabi_uldivmod */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index cb01abec34c..51fb32e38aa 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1577,6 +1577,7 @@ LSYM(Lover12): #define PEDANTIC_DIV0 (1) #include "eabi/idiv.S" +#include "eabi/ldiv.S" #endif /* NOT_ISA_TARGET_32BIT */ /* ------------------------------------------------------------------------ */ From patchwork Mon Oct 31 15:45:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59677 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 581FB398241B for ; Mon, 31 Oct 2022 15:50:33 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 4371F381D47C for ; Mon, 31 Oct 2022 15:48:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4371F381D47C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id E0F1A3200917; Mon, 31 Oct 2022 11:48:04 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:48:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231284; x= 1667317684; bh=6AfMMp89wlVXhofaXfsNc5nN7aZAVJYDyA4ljcPwHeU=; b=t hRDwwixU1eZeQUHEfORmXpC4MpMGNWY5sx12a13FLCnqv5eeRg7Sl8u4OgBKQI3K /o9T6Iid7ipn/fRXhrC7LFgkK40RDSVatY2cPZTAGBoee7cOJWWnJ8V9LgSZ8J2c CPhd+5X5x+yLMQ1qcww2jIhf5iNfSi5l/hopmgUm/JQso3m/9/J65BeJEP0Wx0/N OI88k8hELux5rWBU2v0ZI9IaF19JDHGi+mb4gaRuyDkzBC0P1k6dHdskl9pHeVAx 
mKBA/ZCK6dW/iy49d7GRSuEzhPnQ7xCS4he8s4s30wpCucSbgqrP1/KAaWPzzVAi OZFla0jOCACX5bwIcBeVA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231284; x=1667317684; bh=6AfMMp89wlVXh ofaXfsNc5nN7aZAVJYDyA4ljcPwHeU=; b=oFJJs4GvX4JNeIpYNENiekvQCWM8d 0zu9hLQtT/SY6vUO/t1F08bTWvF842ZpQf3e/ZL+c5PZatHCjmORvEra38IZirR9 2mB7O2CqbFuQRM/MZLzQdPK7QPUXSwxbLIH6H/fyqerJ/atjFvR4SvY36ualvHG2 gSLeORksR/TSnyaQXnpva6YOcb/yNn2IlE96g/PdDW+OwDgXPLdwZJayHPT5vxdt ryNEQc1BighyTIDXDU8TwkvTgjfJS4v6hN6cOAOvjJwvWhhY7dSHPJKXZdCij81O YIEO1xZzL82aL2pSH3/Jl25Wkarb0Q4oTqoWJtXQTCghg52QwzLb04jEg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeegudfhkedtfeejjeetheevlefhgfeluefgleejgeelgffhteekiedtleev geduheenucffohhmrghinhepghhnuhdrohhrghdplhguihhvrdhssgdpghgttgdrthgrrh hgvghtnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhep ghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:03 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFltHY087292; Mon, 31 Oct 2022 08:47:55 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 21/34] Import 64-bit division from the CM0 library Date: Mon, 31 Oct 2022 08:45:16 -0700 Message-Id: <20221031154529.3627576-22-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi.c: Deleted unused file. * config/arm/eabi/ldiv.S (__aeabi_ldivmod, __aeabi_uldivmod): Replaced wrapper functions with a complete implementation. * config/arm/t-bpabi (LIB2ADD_ST): Removed bpabi.c. * config/arm/t-elf (LIB1ASMFUNCS): Added _divdi3 and _udivdi3. 
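At its core, the imported implementation is a binary shift-and-subtract
division. The following much-simplified C model is for orientation only:
it assumes a non-zero denominator (the zero case is diverted to
__aeabi_ldiv0, as in the assembly) and omits the binary-search alignment
step and the single-word fast path that the assembly adds:

  unsigned long long
  sketch_udivmoddi (unsigned long long num, unsigned long long den,
                    unsigned long long *rem)
  {
    unsigned long long quot = 0;
    unsigned long long bit = 1;

    /* Shift the denominator as far left as possible without
       exceeding the numerator (or overflowing).  */
    while (!(den & 0x8000000000000000ULL) && den < num)
      {
        den <<= 1;
        bit <<= 1;
      }

    /* Subtract and shift back down, producing one quotient bit
       per iteration.  */
    while (bit != 0)
      {
        if (num >= den)
          {
            num -= den;
            quot |= bit;
          }
        den >>= 1;
        bit >>= 1;
      }

    *rem = num;
    return quot;
  }

Instead of the simple left-shift loop above, the assembly aligns the
denominator with a binary search over the shift amount (16, 8, 4, 2, 1),
capping the alignment step at five iterations.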
--- libgcc/config/arm/bpabi.c | 42 --- libgcc/config/arm/eabi/ldiv.S | 542 +++++++++++++++++++++++++++++----- libgcc/config/arm/t-bpabi | 3 +- libgcc/config/arm/t-elf | 9 + 4 files changed, 474 insertions(+), 122 deletions(-) delete mode 100644 libgcc/config/arm/bpabi.c diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c deleted file mode 100644 index d8ba940d1ff..00000000000 --- a/libgcc/config/arm/bpabi.c +++ /dev/null @@ -1,42 +0,0 @@ -/* Miscellaneous BPABI functions. - - Copyright (C) 2003-2022 Free Software Foundation, Inc. - Contributed by CodeSourcery, LLC. - - This file is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 3, or (at your option) any - later version. - - This file is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - General Public License for more details. - - Under Section 7 of GPL version 3, you are granted additional - permissions described in the GCC Runtime Library Exception, version - 3.1, as published by the Free Software Foundation. - - You should have received a copy of the GNU General Public License and - a copy of the GCC Runtime Library Exception along with this program; - see the files COPYING3 and COPYING.RUNTIME respectively. If not, see - . */ - -extern long long __divdi3 (long long, long long); -extern unsigned long long __udivdi3 (unsigned long long, - unsigned long long); -extern long long __gnu_ldivmod_helper (long long, long long, long long *); - - -long long -__gnu_ldivmod_helper (long long a, - long long b, - long long *remainder) -{ - long long quotient; - - quotient = __divdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -} - diff --git a/libgcc/config/arm/eabi/ldiv.S b/libgcc/config/arm/eabi/ldiv.S index 3c8280ef580..e3ba6497761 100644 --- a/libgcc/config/arm/eabi/ldiv.S +++ b/libgcc/config/arm/eabi/ldiv.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* ldiv.S: Thumb-1 optimized 64-bit integer division - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,84 +23,471 @@ . */ -.macro test_div_by_zero signed - cmp yyh, #0 - bne 7f - cmp yyl, #0 - bne 7f - cmp xxh, #0 - .ifc \signed, unsigned - bne 2f - cmp xxl, #0 -2: - beq 3f - movs xxh, #0 - mvns xxh, xxh @ 0xffffffff - movs xxl, xxh -3: - .else - blt 6f - bgt 4f - cmp xxl, #0 - beq 5f -4: movs xxl, #0 - mvns xxl, xxl @ 0xffffffff - lsrs xxh, xxl, #1 @ 0x7fffffff - b 5f -6: movs xxh, #0x80 - lsls xxh, xxh, #24 @ 0x80000000 - movs xxl, #0 -5: - .endif - @ tailcalls are tricky on v6-m. - push {r0, r1, r2} - ldr r0, 1f - adr r1, 1f - adds r0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. 
- pop {r0, r1, pc} - .align 2 -1: - .word __aeabi_ldiv0 - 1b -7: -.endm - -#ifdef L_aeabi_ldivmod - -FUNC_START aeabi_ldivmod - test_div_by_zero signed - - push {r0, r1} - mov r0, sp - push {r0, lr} - ldr r0, [sp, #8] - bl SYM(__gnu_ldivmod_helper) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} +#ifndef __GNUC__ + +// long long __aeabi_ldiv0(long long) +// Helper function for division by 0. +WEAK_START_SECTION aeabi_ldiv0 .text.sorted.libgcc.ldiv.ldiv0 + CFI_START_FUNCTION + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + svc #(SVC_DIVISION_BY_ZERO) + #endif + RET - FUNC_END aeabi_ldivmod -#endif /* L_aeabi_ldivmod */ + CFI_END_FUNCTION +FUNC_END aeabi_ldiv0 -#ifdef L_aeabi_uldivmod +#endif /* !__GNUC__ */ -FUNC_START aeabi_uldivmod - test_div_by_zero unsigned - push {r0, r1} - mov r0, sp - push {r0, lr} - ldr r0, [sp, #8] - bl SYM(__udivmoddi4) - ldr r3, [sp, #4] - mov lr, r3 - add sp, sp, #8 - pop {r2, r3} - RET - FUNC_END aeabi_uldivmod +#ifdef L_divdi3 + +// long long __aeabi_ldiv(long long, long long) +// lldiv_return __aeabi_ldivmod(long long, long long) +// Returns signed $r1:$r0 after division by $r3:$r2. +// Also returns the remainder in $r3:$r2. +// Same parent section as __divsi3() to keep branches within range. +FUNC_START_SECTION divdi3 .text.sorted.libgcc.ldiv.divdi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_ldiv divdi3 + FUNC_ALIAS aeabi_ldivmod divdi3 +#endif + + CFI_START_FUNCTION + + // Test the denominator for zero before pushing registers. + cmp yyl, #0 + bne LLSYM(__ldivmod_valid) + + cmp yyh, #0 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + beq LLSYM(__ldivmod_zero) + #else + beq SYM(__uldivmod_zero) + #endif + + LLSYM(__ldivmod_valid): + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + push { rP, rQ, rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 16 + .cfi_rel_offset rP, 0 + .cfi_rel_offset rQ, 4 + .cfi_rel_offset rT, 8 + .cfi_rel_offset lr, 12 + #else + push { rP, rQ, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 12 + .cfi_rel_offset rP, 0 + .cfi_rel_offset rQ, 4 + .cfi_rel_offset lr, 8 + #endif + + // Absolute value of the numerator. + asrs rP, xxh, #31 + eors xxl, rP + eors xxh, rP + subs xxl, rP + sbcs xxh, rP + + // Absolute value of the denominator. + asrs rQ, yyh, #31 + eors yyl, rQ + eors yyh, rQ + subs yyl, rQ + sbcs yyh, rQ + + // Keep the XOR of signs for the quotient. + eors rQ, rP + + // Handle division as unsigned. + bl SYM(__uldivmod_nonzero) __PLT__ + + // Set the sign of the quotient. + eors xxl, rQ + eors xxh, rQ + subs xxl, rQ + sbcs xxh, rQ + + // Set the sign of the remainder. + eors yyl, rP + eors yyh, rP + subs yyl, rP + sbcs yyh, rP + + LLSYM(__ldivmod_return): + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + pop { rP, rQ, rT, pc } + .cfi_restore_state + #else + pop { rP, rQ, pc } + .cfi_restore_state + #endif + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + LLSYM(__ldivmod_zero): + // Save the sign of the numerator. + asrs yyl, xxh, #31 + + // Set up the *div0() parameter specified in the ARM runtime ABI: + // * 0 if the numerator is 0, + // * Or, the largest value of the type manipulated by the calling + // division function if the numerator is positive, + // * Or, the least value of the type manipulated by the calling + // division function if the numerator is negative. + rsbs xxl, #0 + sbcs yyh, xxh + orrs xxh, yyh + asrs xxl, xxh, #31 + lsrs xxh, xxl, #1 + eors xxh, yyl + eors xxl, yyl + + // At least the __aeabi_ldiv0() call is common. 
+ b SYM(__uldivmod_zero2) + #endif /* PEDANTIC_DIV0 */ + + CFI_END_FUNCTION +FUNC_END divdi3 + +#ifndef __symbian__ + FUNC_END aeabi_ldiv + FUNC_END aeabi_ldivmod +#endif + +#endif /* L_divdi3 */ + + +#ifdef L_udivdi3 + +// unsigned long long __aeabi_uldiv(unsigned long long, unsigned long long) +// ulldiv_return __aeabi_uldivmod(unsigned long long, unsigned long long) +// Returns unsigned $r1:$r0 after division by $r3:$r2. +// Also returns the remainder in $r3:$r2. +FUNC_START_SECTION udivdi3 .text.sorted.libgcc.ldiv.udivdi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_uldiv udivdi3 + FUNC_ALIAS aeabi_uldivmod udivdi3 +#endif + + CFI_START_FUNCTION + + // Test the denominator for zero before changing the stack. + cmp yyh, #0 + bne SYM(__uldivmod_nonzero) + + cmp yyl, #0 + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + beq LLSYM(__uldivmod_zero) + #else + beq SYM(__uldivmod_zero) + #endif + + #if defined(OPTIMIZE_SPEED) && OPTIMIZE_SPEED + // MAYBE: Optimize division by a power of 2 + #endif + + FUNC_ENTRY uldivmod_nonzero + push { rP, rQ, rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 16 + .cfi_rel_offset rP, 0 + .cfi_rel_offset rQ, 4 + .cfi_rel_offset rT, 8 + .cfi_rel_offset lr, 12 + + // Set up denominator shift, assuming a single width result. + movs rP, #32 + + // If the upper word of the denominator is 0 ... + tst yyh, yyh + bne LLSYM(__uldivmod_setup) + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // ... and the upper word of the numerator is also 0, + // single width division will be at least twice as fast. + tst xxh, xxh + beq LLSYM(__uldivmod_small) + #endif + + // ... and the lower word of the denominator is less than or equal + // to the upper word of the numerator ... + cmp xxh, yyl + blo LLSYM(__uldivmod_setup) + + // ... then the result will be double width, at least 33 bits. + // Set up a flag in $rP to seed the shift for the second word. + movs yyh, yyl + eors yyl, yyl + adds rP, #64 + + LLSYM(__uldivmod_setup): + // Pre division: Shift the denominator as far as possible left + // without making it larger than the numerator. + // Since search is destructive, first save a copy of the numerator. + mov ip, xxl + mov lr, xxh + + // Set up binary search. + movs rQ, #16 + eors rT, rT + + LLSYM(__uldivmod_align): + // Maintain a secondary shift $rT = 32 - $rQ, making the overlapping + // shifts between low and high words easier to construct. + adds rT, rQ + + // Prefer dividing the numerator to multipying the denominator + // (multiplying the denominator may result in overflow). + lsrs xxh, rQ + + // Measure the high bits of denominator against the numerator. + cmp xxh, yyh + blo LLSYM(__uldivmod_skip) + bhi LLSYM(__uldivmod_shift) + + // If the high bits are equal, construct the low bits for checking. + mov xxh, lr + lsls xxh, rT + + lsrs xxl, rQ + orrs xxh, xxl + + cmp xxh, yyl + blo LLSYM(__uldivmod_skip) + + LLSYM(__uldivmod_shift): + // Scale the denominator and the result together. + subs rP, rQ + + // If the reduced numerator is still larger than or equal to the + // denominator, it is safe to shift the denominator left. + movs xxh, yyl + lsrs xxh, rT + lsls yyh, rQ + + lsls yyl, rQ + orrs yyh, xxh + + LLSYM(__uldivmod_skip): + // Restore the numerator. + mov xxl, ip + mov xxh, lr + + // Iterate until the shift goes to 0. + lsrs rQ, #1 + bne LLSYM(__uldivmod_align) + + // Initialize the result (zero). + mov ip, rQ + + // HACK: Compensate for the first word test. + lsls rP, #6 + + LLSYM(__uldivmod_word2): + // Is there another word? 
+ lsrs rP, #6 + beq LLSYM(__uldivmod_return) + + // Shift the calculated result by 1 word. + mov lr, ip + mov ip, rQ + + // Set up the MSB of the next word of the quotient + movs rQ, #1 + rors rQ, rP + b LLSYM(__uldivmod_entry) + + LLSYM(__uldivmod_loop): + // Divide the denominator by 2. + // It could be slightly faster to multiply the numerator, + // but that would require shifting the remainder at the end. + lsls rT, yyh, #31 + lsrs yyh, #1 + lsrs yyl, #1 + adds yyl, rT + + // Step to the next bit of the result. + lsrs rQ, #1 + beq LLSYM(__uldivmod_word2) + + LLSYM(__uldivmod_entry): + // Test if the denominator is smaller, high byte first. + cmp xxh, yyh + blo LLSYM(__uldivmod_loop) + bhi LLSYM(__uldivmod_quotient) + + cmp xxl, yyl + blo LLSYM(__uldivmod_loop) + + LLSYM(__uldivmod_quotient): + // Smaller denominator: the next bit of the quotient will be set. + add ip, rQ + + // Subtract the denominator from the remainder. + // If the new remainder goes to 0, exit early. + subs xxl, yyl + sbcs xxh, yyh + bne LLSYM(__uldivmod_loop) + + tst xxl, xxl + bne LLSYM(__uldivmod_loop) + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Check whether there's still a second word to calculate. + lsrs rP, #6 + beq LLSYM(__uldivmod_return) + + // If so, shift the result left by a full word. + mov lr, ip + mov ip, xxh // zero + #else + eors rQ, rQ + b LLSYM(__uldivmod_word2) + #endif + + LLSYM(__uldivmod_return): + // Move the remainder to the second half of the result. + movs yyl, xxl + movs yyh, xxh + + // Move the quotient to the first half of the result. + mov xxl, ip + mov xxh, lr + + pop { rP, rQ, rT, pc } + .cfi_restore_state + + #if defined(PEDANTIC_DIV0) && PEDANTIC_DIV0 + LLSYM(__uldivmod_zero): + // Set up the *div0() parameter specified in the ARM runtime ABI: + // * 0 if the numerator is 0, + // * Or, the largest value of the type manipulated by the calling + // division function if the numerator is positive. + subs yyl, xxl + sbcs yyh, xxh + orrs xxh, yyh + asrs xxh, #31 + movs xxl, xxh + + FUNC_ENTRY uldivmod_zero2 + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + #else + push { lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 4 + .cfi_rel_offset lr, 0 + #endif + + // Since GCC implements __aeabi_ldiv0() as a weak overridable function, + // this call must be prepared for a jump beyond +/- 2 KB. + // NOTE: __aeabi_ldiv0() can't be implemented as a tail call, since any + // non-trivial override will (likely) corrupt a remainder in $r3:$r2. + bl SYM(__aeabi_ldiv0) __PLT__ + + // Since the input to __aeabi_ldiv0() was INF, there really isn't any + // choice in which of the recommended *divmod() patterns to follow. + // Clear the remainder to complete {INF, 0}. + eors yyl, yyl + eors yyh, yyh + + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + pop { rT, pc } + .cfi_restore_state + #else + pop { pc } + .cfi_restore_state + #endif + + #else /* !PEDANTIC_DIV0 */ + FUNC_ENTRY uldivmod_zero + // NOTE: The following code sets up a return pair of {0, numerator}, + // the second preference given by the ARM runtime ABI specification. + // The pedantic version is 30 bytes larger between __aeabi_ldiv() and + // __aeabi_uldiv(). However, this version does not conform to the + // out-of-line parameter requirements given for __aeabi_ldiv0(), and + // also does not pass 'gcc/testsuite/gcc.target/arm/divzero.c'. 
+ + // Since the numerator may be overwritten by __aeabi_ldiv0(), save now. + // Afterwards, they can be restored directly as the remainder. + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + push { r0, r1, rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 16 + .cfi_rel_offset xxl,0 + .cfi_rel_offset xxh,4 + .cfi_rel_offset rT, 8 + .cfi_rel_offset lr, 12 + #else + push { r0, r1, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 12 + .cfi_rel_offset xxl,0 + .cfi_rel_offset xxh,4 + .cfi_rel_offset lr, 8 + #endif + + // Set up the quotient. + eors xxl, xxl + eors xxh, xxh + + // Since GCC implements div0() as a weak overridable function, + // this call must be prepared for a jump beyond +/- 2 KB. + bl SYM(__aeabi_ldiv0) __PLT__ + + // Restore the remainder and return. + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + pop { r2, r3, rT, pc } + .cfi_restore_state + #else + pop { r2, r3, pc } + .cfi_restore_state + #endif + #endif /* !PEDANTIC_DIV0 */ + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + LLSYM(__uldivmod_small): + // Arrange operands for (much faster) 32-bit division. + #if defined(__ARMEB__) && __ARMEB__ + movs r0, r1 + movs r1, r3 + #else + movs r1, r2 + #endif + + bl SYM(__uidivmod_nonzero) __PLT__ + + // Arrange results back into 64-bit format. + #if defined(__ARMEB__) && __ARMEB__ + movs r3, r1 + movs r1, r0 + #else + movs r2, r1 + #endif + + // Extend quotient and remainder to 64 bits, unsigned. + eors xxh, xxh + eors yyh, yyh + pop { rP, rQ, rT, pc } + #endif + + CFI_END_FUNCTION +FUNC_END udivdi3 + +#ifndef __symbian__ + FUNC_END aeabi_uldiv + FUNC_END aeabi_uldivmod +#endif -#endif /* L_aeabi_uldivmod */ +#endif /* udivdi3 */ diff --git a/libgcc/config/arm/t-bpabi b/libgcc/config/arm/t-bpabi index dddddc7c444..86234d5676f 100644 --- a/libgcc/config/arm/t-bpabi +++ b/libgcc/config/arm/t-bpabi @@ -2,8 +2,7 @@ LIB1ASMFUNCS += _aeabi_lcmp _aeabi_ulcmp _aeabi_ldivmod _aeabi_uldivmod # Add the BPABI C functions. -LIB2ADD += $(srcdir)/config/arm/bpabi.c \ - $(srcdir)/config/arm/unaligned-funcs.c +LIB2ADD += $(srcdir)/config/arm/unaligned-funcs.c LIB2ADD_ST += $(srcdir)/config/arm/fp16.c diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 83325410097..4d430325fa1 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -50,6 +50,15 @@ LIB1ASMFUNCS += \ _umodsi3 \ +ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) +# Group 1B: Integer functions built for v6m only. +LIB1ASMFUNCS += \ + _divdi3 \ + _udivdi3 \ + +endif + + # Group 2: Single precision floating point function objects. 
LIB1ASMFUNCS += \ _arm_addsubsf3 \ From patchwork Mon Oct 31 15:45:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59680 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 13D7B3831939 for ; Mon, 31 Oct 2022 15:51:10 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 35E7E382DE2B for ; Mon, 31 Oct 2022 15:48:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 35E7E382DE2B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 1B72F3200984; Mon, 31 Oct 2022 11:48:10 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Mon, 31 Oct 2022 11:48:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231289; x= 1667317689; bh=8XthhmY1hPGFDFTSpu48BxWCwXC/oJiasGLsF0OeDnU=; b=Y fI0gnIPWOuayugNCld//QayTnCV2ObOGiFnJNinYcVHw1JsjMtLhK+zuVtSr8gso nSzUqomxuJt84XTBIV/CS/GgB+oUV4P9t6WnotXlUsjT2NijeE/QRWYFKR14pBzo pAd3XrMQhNQbbsuNxeLHxjXuVbfbFUBY7JiU+GztoxngoB9jhUTKDnJDgqKBzp0J 9iz0fbGvyI1UGlWPaS0TyQV21EsgEj6/4jpw3402o0OuRO0qih+JL/fpPH0acLb4 ktOqZsdbW7DS0ATt6n7GCHWNjznwoTlZhsLZBeaVUHMG3B70N19Hi/Y3+X/en3a9 toiULrBQtfSEp2riDfASA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231289; x=1667317689; bh=8XthhmY1hPGFD FTSpu48BxWCwXC/oJiasGLsF0OeDnU=; b=fISZq/DzF5+Ps1uU1Q3F2DlKdIaws dhCz6ORaYuJGxhlyRhTYT/L9DNESarSmp8EbgnYbvF81qm83BUf6kjPIEiGp5Osx zV9QKB+Ox6zI+5T/1MgkkEDxwilBmK0CEiOGQE2U7oIe+Q+IgUEU/rmIY3oLxgYY zQ/pvzn44XZseTqhxudDsuxJZZnkHvyyRxu7PWc+qemiJQ2RHwRv8e6F3UmobKY9 6rM1vOpvxSvbG6m56VVnwE7nXJLK7A/jiiAF48iuL7E7gA/9un2Ng3XZM0hU7C33 djQHc2FLqiQJk3lDeWF+hJe9vQgGtwTayjIzhSvKmK8p1lr47fQzYiYgw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeefiedugefhveevfedtgfehkefhleduffegveehgeeltedtkedvgefgveef uddtveenucffohhmrghinheplhhmuhhlrdhssgdpghhnuhdrohhrghdplhhisgdufhhunh gtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhho mhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:08 -0400 (EDT) Received: from 
ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFm16Z087295; Mon, 31 Oct 2022 08:48:01 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 22/34] Import integer multiplication from the CM0 library Date: Mon, 31 Oct 2022 08:45:17 -0700 Message-Id: <20221031154529.3627576-23-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/lmul.S: New file for __muldi3(), __mulsidi3(), and __umulsidi3(). * config/arm/lib1funcs.S: #eabi/lmul.S (v6m only). * config/arm/t-elf: Add the new objects to LIB1ASMFUNCS. --- libgcc/config/arm/eabi/lmul.S | 218 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 13 +- 3 files changed, 230 insertions(+), 2 deletions(-) create mode 100644 libgcc/config/arm/eabi/lmul.S diff --git a/libgcc/config/arm/eabi/lmul.S b/libgcc/config/arm/eabi/lmul.S new file mode 100644 index 00000000000..377e571bf09 --- /dev/null +++ b/libgcc/config/arm/eabi/lmul.S @@ -0,0 +1,218 @@ +/* lmul.S: Thumb-1 optimized 64-bit integer multiplication + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_muldi3 + +// long long __aeabi_lmul(long long, long long) +// Returns the least significant 64 bits of a 64 bit multiplication. +// Expects the two multiplicands in $r1:$r0 and $r3:$r2. +// Returns the product in $r1:$r0 (does not distinguish signed types). +// Uses $r4 and $r5 as scratch space. +// Same parent section as __umulsidi3() to keep tail call branch within range. 
+FUNC_START_SECTION muldi3 .text.sorted.libgcc.lmul.muldi3 + +#ifndef __symbian__ + FUNC_ALIAS aeabi_lmul muldi3 +#endif + + CFI_START_FUNCTION + + // $r1:$r0 = 0xDDDDCCCCBBBBAAAA + // $r3:$r2 = 0xZZZZYYYYXXXXWWWW + + // The following operations that only affect the upper 64 bits + // can be safely discarded: + // DDDD * ZZZZ + // DDDD * YYYY + // DDDD * XXXX + // CCCC * ZZZZ + // CCCC * YYYY + // BBBB * ZZZZ + + // MAYBE: Test for multiply by ZERO on implementations with a 32-cycle + // 'muls' instruction, and skip over the operation in that case. + + // (0xDDDDCCCC * 0xXXXXWWWW), free $r1 + muls xxh, yyl + + // (0xZZZZYYYY * 0xBBBBAAAA), free $r3 + muls yyh, xxl + adds yyh, xxh + + // Put the parameters in the correct form for umulsidi3(). + movs xxh, yyl + b LLSYM(__mul_overflow) + + CFI_END_FUNCTION +FUNC_END muldi3 + +#ifndef __symbian__ + FUNC_END aeabi_lmul +#endif + +#endif /* L_muldi3 */ + + +// The following implementation of __umulsidi3() integrates with __muldi3() +// above to allow the fast tail call while still preserving the extra +// hi-shifted bits of the result. However, these extra bits add a few +// instructions not otherwise required when using only __umulsidi3(). +// Therefore, this block configures __umulsidi3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version adds the hi bits of __muldi3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols in programs that multiply long doubles. +// This means '_umulsidi3' should appear before '_muldi3' in LIB1ASMFUNCS. +#if defined(L_muldi3) || defined(L_umulsidi3) + +#ifdef L_umulsidi3 +// unsigned long long __umulsidi3(unsigned int, unsigned int) +// Returns all 64 bits of a 32 bit multiplication. +// Expects the two multiplicands in $r0 and $r1. +// Returns the product in $r1:$r0. +// Uses $r3, $r4 and $ip as scratch space. +WEAK_START_SECTION umulsidi3 .text.sorted.libgcc.lmul.umulsidi3 + CFI_START_FUNCTION + +#else /* L_muldi3 */ +FUNC_ENTRY umulsidi3 + CFI_START_FUNCTION + + // 32x32 multiply with 64 bit result. + // Expand the multiply into 4 parts, since muls only returns 32 bits. + // (a16h * b16h / 2^32) + // + (a16h * b16l / 2^48) + (a16l * b16h / 2^48) + // + (a16l * b16l / 2^64) + + // MAYBE: Test for multiply by 0 on implementations with a 32-cycle + // 'muls' instruction, and skip over the operation in that case. + + eors yyh, yyh + + LLSYM(__mul_overflow): + mov ip, yyh + +#endif /* !L_muldi3 */ + + // a16h * b16h + lsrs r2, xxl, #16 + lsrs r3, xxh, #16 + muls r2, r3 + + #ifdef L_muldi3 + add ip, r2 + #else + mov ip, r2 + #endif + + // a16l * b16h; save a16h first! + lsrs r2, xxl, #16 + #if (__ARM_ARCH >= 6) + uxth xxl, xxl + #else /* __ARM_ARCH < 6 */ + lsls xxl, #16 + lsrs xxl, #16 + #endif + muls r3, xxl + + // a16l * b16l + #if (__ARM_ARCH >= 6) + uxth xxh, xxh + #else /* __ARM_ARCH < 6 */ + lsls xxh, #16 + lsrs xxh, #16 + #endif + muls xxl, xxh + + // a16h * b16l + muls xxh, r2 + + // Distribute intermediate results. + eors r2, r2 + adds xxh, r3 + adcs r2, r2 + lsls r3, xxh, #16 + lsrs xxh, #16 + lsls r2, #16 + adds xxl, r3 + adcs xxh, r2 + + // Add in the high bits. + add xxh, ip + + RET + + CFI_END_FUNCTION +FUNC_END umulsidi3 + +#endif /* L_muldi3 || L_umulsidi3 */ + + +#ifdef L_mulsidi3 + +// long long mulsidi3(int, int) +// Returns all 64 bits of a 32 bit signed multiplication. +// Expects the two multiplicands in $r0 and $r1. +// Returns the product in $r1:$r0. 
+// Uses $r3, $r4 and $rT as scratch space. +FUNC_START_SECTION mulsidi3 .text.sorted.libgcc.lmul.mulsidi3 + CFI_START_FUNCTION + + // Push registers for function call. + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Save signs of the arguments. + asrs r3, r0, #31 + asrs rT, r1, #31 + + // Absolute value of the arguments. + eors r0, r3 + eors r1, rT + subs r0, r3 + subs r1, rT + + // Save sign of the result. + eors rT, r3 + + bl SYM(__umulsidi3) __PLT__ + + // Apply sign of the result. + eors xxl, rT + eors xxh, rT + subs xxl, rT + sbcs xxh, rT + + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END mulsidi3 + +#endif /* L_mulsidi3 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 51fb32e38aa..e828d53d732 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -1578,6 +1578,7 @@ LSYM(Lover12): #define PEDANTIC_DIV0 (1) #include "eabi/idiv.S" #include "eabi/ldiv.S" +#include "eabi/lmul.S" #endif /* NOT_ISA_TARGET_32BIT */ /* ------------------------------------------------------------------------ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 4d430325fa1..eb1acd8d5a2 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -27,6 +27,13 @@ LIB1ASMFUNCS += \ _paritysi2 \ _popcountsi2 \ +ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) +# Group 0B: WEAK overridable function objects built for v6m only. +LIB1ASMFUNCS += \ + _muldi3 \ + +endif + # Group 1: Integer function objects. LIB1ASMFUNCS += \ @@ -51,11 +58,13 @@ LIB1ASMFUNCS += \ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) -# Group 1B: Integer functions built for v6m only. +# Group 1B: Integer function objects built for v6m only. 
 LIB1ASMFUNCS += \
 	_divdi3 \
 	_udivdi3 \
-
+	_mulsidi3 \
+	_umulsidi3 \
+
 endif

From patchwork Mon Oct 31 15:45:18 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59684
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
Received: by mail.messagingengine.com (Postfix)
with ESMTPA; Mon, 31 Oct 2022 11:48:14 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFm6jQ087298; Mon, 31 Oct 2022 08:48:06 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 23/34] Refactor Thumb-1 float comparison into a new file Date: Mon, 31 Oct 2022 08:45:18 -0700 Message-Id: <20221031154529.3627576-24-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-v6m.S (__aeabi_cfcmpeq, __aeabi_cfcmple, __aeabi_cfrcmple, __aeabi_fcmpeq, __aeabi_fcmple, aeabi_fcmple, __aeabi_fcmpgt, aeabi_fcmpge): Moved to ... * config/arm/eabi/fcmp.S: New file. * config/arm/lib1funcs.S: #include eabi/fcmp.S (v6m only). --- libgcc/config/arm/bpabi-v6m.S | 63 ------------------------- libgcc/config/arm/eabi/fcmp.S | 89 +++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 90 insertions(+), 63 deletions(-) create mode 100644 libgcc/config/arm/eabi/fcmp.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index d38a9208c60..8e0a45f4716 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -49,69 +49,6 @@ FUNC_START aeabi_frsub #endif /* L_arm_addsubsf3 */ -#ifdef L_arm_cmpsf2 - -FUNC_START aeabi_cfrcmple - - mov ip, r0 - movs r0, r1 - mov r1, ip - b 6f - -FUNC_START aeabi_cfcmpeq -FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq - - @ The status-returning routines are required to preserve all - @ registers except ip, lr, and cpsr. -6: push {r0, r1, r2, r3, r4, lr} - bl __lesf2 - @ Set the Z flag correctly, and the C flag unconditionally. - cmp r0, #0 - @ Clear the C flag if the return value was -1, indicating - @ that the first operand was smaller than the second. 
- bmi 1f - movs r1, #0 - cmn r0, r1 -1: - pop {r0, r1, r2, r3, r4, pc} - - FUNC_END aeabi_cfcmple - FUNC_END aeabi_cfcmpeq - FUNC_END aeabi_cfrcmple - -FUNC_START aeabi_fcmpeq - - push {r4, lr} - bl __eqsf2 - negs r0, r0 - adds r0, r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmpeq - -.macro COMPARISON cond, helper, mode=sf2 -FUNC_START aeabi_fcmp\cond - - push {r4, lr} - bl __\helper\mode - cmp r0, #0 - b\cond 1f - movs r0, #0 - pop {r4, pc} -1: - movs r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmp\cond -.endm - -COMPARISON lt, le -COMPARISON le, le -COMPARISON gt, ge -COMPARISON ge, ge - -#endif /* L_arm_cmpsf2 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S new file mode 100644 index 00000000000..96d627f1fea --- /dev/null +++ b/libgcc/config/arm/eabi/fcmp.S @@ -0,0 +1,89 @@ +/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, + ARMv6-M and ARMv8-M Baseline like ISA variants. + + Copyright (C) 2006-2020 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_cmpsf2 + +FUNC_START aeabi_cfrcmple + + mov ip, r0 + movs r0, r1 + mov r1, ip + b 6f + +FUNC_START aeabi_cfcmpeq +FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq + + @ The status-returning routines are required to preserve all + @ registers except ip, lr, and cpsr. +6: push {r0, r1, r2, r3, r4, lr} + bl __lesf2 + @ Set the Z flag correctly, and the C flag unconditionally. + cmp r0, #0 + @ Clear the C flag if the return value was -1, indicating + @ that the first operand was smaller than the second. 
+ bmi 1f + movs r1, #0 + cmn r0, r1 +1: + pop {r0, r1, r2, r3, r4, pc} + + FUNC_END aeabi_cfcmple + FUNC_END aeabi_cfcmpeq + FUNC_END aeabi_cfrcmple + +FUNC_START aeabi_fcmpeq + + push {r4, lr} + bl __eqsf2 + negs r0, r0 + adds r0, r0, #1 + pop {r4, pc} + + FUNC_END aeabi_fcmpeq + +.macro COMPARISON cond, helper, mode=sf2 +FUNC_START aeabi_fcmp\cond + + push {r4, lr} + bl __\helper\mode + cmp r0, #0 + b\cond 1f + movs r0, #0 + pop {r4, pc} +1: + movs r0, #1 + pop {r4, pc} + + FUNC_END aeabi_fcmp\cond +.endm + +COMPARISON lt, le +COMPARISON le, le +COMPARISON gt, ge +COMPARISON ge, ge + +#endif /* L_arm_cmpsf2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index e828d53d732..4d460a77332 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2010,6 +2010,7 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/fcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ From patchwork Mon Oct 31 15:45:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59682 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 265F638376A7 for ; Mon, 31 Oct 2022 15:51:27 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id C46DC38AA261 for ; Mon, 31 Oct 2022 15:48:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C46DC38AA261 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 85F8D3200094; Mon, 31 Oct 2022 11:48:20 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Mon, 31 Oct 2022 11:48:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231300; x= 1667317700; bh=dvZjxcxQQALCRgBJXE5t2mBK964D5f1N1Xh5U+/BJp4=; b=y RA2Dd01YRIpFyrevGafLO1Q5gUctAyTpnbRPaQs3I7oc01rA8Fu7k0rIbIm+hk/5 zlMHUKYuOcaBvwJ5y/pPHAvoDJ1iRbeUvVq156iOYO+5Ba0JZz9M4CsmiDzhXH13 BYNLGZfZ+yvEgfxTZK8JJzPL5eLgUOtCBM8YcES3i4OYteZhxHbPt1NrVbpY9Rwf 00yI7msoRfRqBcitXrsh3jlsxdNC0zJlTuwD8Tv4XUQWZQhkHvNrE8+4A1KtNvKG 8VvemFFCETe5RQdU6NjnlHwzs83DTR1b4SLXTbD4b0dsGl1qGl4Wx3Tl165gYMhP fKJbaRTsThnwOeyK9qkDQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231300; x=1667317700; bh=dvZjxcxQQALCR gBJXE5t2mBK964D5f1N1Xh5U+/BJp4=; b=LghfjEzT6b/phDfaJuI97aeSLDKze 80PcXH9DcJGUVfrH/5KM0AO3PC8WUWX3oHrLmGrw+K5OhN/PoUQPmxNmCrxCwNGe 9pJtIv2VSrcB9eNxvklowosHKxgZvG04SN5qZoT3wrAAV6v9x2+2ir4s2mMPVGMO 
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 24/34] Import float comparison from the CM0 library
Date: Mon, 31 Oct 2022 08:45:19 -0700
Message-Id: <20221031154529.3627576-25-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

These functions are significantly smaller and faster than the wrapper
functions and soft-float implementation they replace.  Using the first
comparison operator (e.g. '<=') in any program costs about 70 bytes
initially, but every additional operator incrementally adds just 4 bytes.

NOTE: It seems that the __aeabi_cfcmp*() routines formerly in bpabi-v6m.S
were not well tested, as they returned wrong results for the 'C' flag.
The replacement functions are fully tested.

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fcmp.S (__cmpsf2, __eqsf2, __gesf2,
	__aeabi_fcmpne, __aeabi_fcmpun): Added new functions.
	(__aeabi_fcmpeq, __aeabi_fcmpne, __aeabi_fcmplt, __aeabi_fcmple,
	__aeabi_fcmpge, __aeabi_fcmpgt, __aeabi_cfcmple, __aeabi_cfcmpeq,
	__aeabi_cfrcmple): Replaced with branches to __internal_cmpsf2().
	* config/arm/eabi/fplib.h: New file with fcmp-specific constants
	and general build configuration macros.
	* config/arm/lib1funcs.S: #include eabi/fplib.h (v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _internal_cmpsf2,
	_arm_cfcmpeq, _arm_cfcmple, _arm_cfrcmple, _arm_fcmpeq,
	_arm_fcmpge, _arm_fcmpgt, _arm_fcmple, _arm_fcmplt, _arm_fcmpne,
	_arm_eqsf2, and _arm_gesf2.
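As a rough illustration of the shared dispatch scheme (not the assembly
itself): each entry point loads a small control word built from the FCMP_*
bits defined in fplib.h and branches to the common comparison core, which
maps the ordered three-way result through that mask.  The C model below is
a simplified sketch; the function name cmpsf2_model is invented for
illustration, and the unordered (NAN) handling is reduced to one case,
whereas the real code also honors the FCMP_UN_* bits.

    /* Simplified C model of the control-word dispatch (illustration only;
       the real routine is the Thumb-1 assembly in fcmp.S).  */
    #include <stdint.h>

    #define FCMP_3WAY  (1)   /* return -1/0/+1 instead of a boolean */
    #define FCMP_LT    (2)
    #define FCMP_EQ    (4)
    #define FCMP_GT    (8)

    static int cmpsf2_model (float a, float b, int flags)
    {
      if (a != a || b != b)                 /* at least one operand is NAN */
        return (flags & FCMP_3WAY) ? 1 : 0; /* unordered grouped with '>' here */

      int three_way = (a > b) - (a < b);    /* -1, 0, +1 */
      if (flags & FCMP_3WAY)
        return three_way;

      /* Boolean mode: test the bit selected by the ordered result.  */
      int bit = (three_way < 0) ? FCMP_LT : (three_way > 0) ? FCMP_GT : FCMP_EQ;
      return (flags & bit) != 0;
    }

This is why adding another operator costs only a few bytes: each extra
entry point is just a different constant in the control word.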
--- libgcc/config/arm/eabi/fcmp.S | 643 +++++++++++++++++++++++++++++---- libgcc/config/arm/eabi/fplib.h | 83 +++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 18 + 4 files changed, 681 insertions(+), 64 deletions(-) create mode 100644 libgcc/config/arm/eabi/fplib.h diff --git a/libgcc/config/arm/eabi/fcmp.S b/libgcc/config/arm/eabi/fcmp.S index 96d627f1fea..0c813fae8c5 100644 --- a/libgcc/config/arm/eabi/fcmp.S +++ b/libgcc/config/arm/eabi/fcmp.S @@ -1,8 +1,7 @@ -/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T, - ARMv6-M and ARMv8-M Baseline like ISA variants. +/* fcmp.S: Thumb-1 optimized 32-bit float comparison - Copyright (C) 2006-2020 Free Software Foundation, Inc. - Contributed by CodeSourcery. + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -24,66 +23,582 @@ . */ +// The various compare functions in this file all expect to tail call __cmpsf2() +// with flags set for a particular comparison mode. The __internal_cmpsf2() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __cmpsf2(). Importing an archive +// file that also exports __cmpsf2() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2lz() for compilation twice. +// The first version configures __internal_cmpsf2() as a WEAK standalone symbol, +// and the second exports __cmpsf2() and __internal_cmpsf2() normally. +// A small bonus: programs not using __cmpsf2() itself will be slightly smaller. +// 'L_internal_cmpsf2' should appear before 'L_arm_cmpsf2' in LIB1ASMFUNCS. +#if defined(L_arm_cmpsf2) || defined(L_internal_cmpsf2) + +#define CMPSF2_SECTION .text.sorted.libgcc.fcmp.cmpsf2 + +// int __cmpsf2(float, float) +// +// Returns the three-way comparison result of $r0 with $r1: +// * +1 if ($r0 > $r1), or either argument is NAN +// * 0 if ($r0 == $r1) +// * -1 if ($r0 < $r1) +// Uses $r2, $r3, and $ip as scratch space. +#ifdef L_arm_cmpsf2 +FUNC_START_SECTION cmpsf2 CMPSF2_SECTION +FUNC_ALIAS lesf2 cmpsf2 +FUNC_ALIAS ltsf2 cmpsf2 + CFI_START_FUNCTION + + // Assumption: The 'libgcc' functions should raise exceptions. + movs r2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY) + + // int,int __internal_cmpsf2(float, float, int) + // Internal function expects a set of control flags in $r2. + // If ordered, returns a comparison type { 0, 1, 2 } in $r3 + FUNC_ENTRY internal_cmpsf2 + +#else /* L_internal_cmpsf2 */ + WEAK_START_SECTION internal_cmpsf2 CMPSF2_SECTION + CFI_START_FUNCTION + +#endif + + // When operand signs are considered, the comparison result falls + // within one of the following quadrants: + // + // $r0 $r1 $r0-$r1* flags result + // + + > C=0 GT + // + + = Z=1 EQ + // + + < C=1 LT + // + - > C=1 GT + // + - = C=1 GT + // + - < C=1 GT + // - + > C=0 LT + // - + = C=0 LT + // - + < C=0 LT + // - - > C=0 LT + // - - = Z=1 EQ + // - - < C=1 GT + // + // *When interpeted as a subtraction of unsigned integers + // + // From the table, it is clear that in the presence of any negative + // operand, the natural result simply needs to be reversed. + // Save the 'N' flag for later use. + movs r3, r0 + orrs r3, r1 + mov ip, r3 + + // Keep the absolute value of the second argument for NAN testing. 
+ lsls r3, r1, #1 + + // With the absolute value of the second argument safely stored, + // recycle $r1 to calculate the difference of the arguments. + subs r1, r0, r1 + + // Save the 'C' flag for use later. + // Effectively shifts all the flags 1 bit left. + adcs r2, r2 + + // Absolute value of the first argument. + lsls r0, #1 + + // Identify the largest absolute value between the two arguments. + cmp r0, r3 + bhs LLSYM(__fcmp_sorted) + + // Keep the larger absolute value for NAN testing. + // NOTE: When the arguments are respectively a signaling NAN and a + // quiet NAN, the quiet NAN has precedence. This has consequences + // if TRAP_NANS is enabled, but the flags indicate that exceptions + // for quiet NANs should be suppressed. After the signaling NAN is + // discarded, no exception is raised, although it should have been. + // This could be avoided by using a fifth register to save both + // arguments until the signaling bit can be tested, but that seems + // like an excessive amount of ugly code for an ambiguous case. + movs r0, r3 + + LLSYM(__fcmp_sorted): + // If $r3 is NAN, the result is unordered. + movs r3, #255 + lsls r3, #24 + cmp r0, r3 + bhi LLSYM(__fcmp_unordered) + + // Positive and negative zero must be considered equal. + // If the larger absolute value is +/-0, both must have been +/-0. + subs r3, r0, #0 + beq LLSYM(__fcmp_zero) + + // Test for regular equality. + subs r3, r1, #0 + beq LLSYM(__fcmp_zero) + + // Isolate the saved 'C', and invert if either argument was negative. + // Remembering that the original subtraction was $r1 - $r0, + // the result will be 1 if 'C' was set (gt), or 0 for not 'C' (lt). + lsls r3, r2, #31 + add r3, ip + lsrs r3, #31 + + // HACK: Force the 'C' bit clear, + // since bit[30] of $r3 may vary with the operands. + adds r3, #0 + + LLSYM(__fcmp_zero): + // After everything is combined, the temp result will be + // 2 (gt), 1 (eq), or 0 (lt). + adcs r3, r3 + + // Short-circuit return if the 3-way comparison flag is set. + // Otherwise, shifts the condition mask into bits[2:0]. + lsrs r2, #2 + bcs LLSYM(__fcmp_return) + + // If the bit corresponding to the comparison result is set in the + // accepance mask, a '1' will fall out into the result. + movs r0, #1 + lsrs r2, r3 + ands r0, r2 + RET + + LLSYM(__fcmp_unordered): + // Set up the requested UNORDERED result. + // Remember the shift in the flags (above). + lsrs r2, #6 + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + // TODO: ... The + + + #endif + + #if defined(TRAP_NANS) && TRAP_NANS + // Always raise an exception if FCMP_RAISE_EXCEPTIONS was specified. + bcs LLSYM(__fcmp_trap) + + // If FCMP_NO_EXCEPTIONS was specified, no exceptions on quiet NANs. + // The comparison flags are moot, so $r1 can serve as scratch space. + lsrs r1, r0, #24 + bcs LLSYM(__fcmp_return2) + + LLSYM(__fcmp_trap): + // Restore the NAN (sans sign) for an argument to the exception. + // As an IRQ, the handler restores all registers, including $r3. + // NOTE: The service handler may not return. + lsrs r0, #1 + movs r3, #(UNORDERED_COMPARISON) + svc #(SVC_TRAP_NAN) + #endif + + LLSYM(__fcmp_return2): + // HACK: Work around result register mapping. + // This could probably be eliminated by remapping the flags register. + movs r3, r2 + + LLSYM(__fcmp_return): + // Finish setting up the result. + // Constant subtraction allows a negative result while keeping the + // $r2 flag control word within 8 bits, particularly for FCMP_UN*. 
+ // This operation also happens to set the 'Z' and 'C' flags correctly + // per the requirements of __aeabi_cfcmple() et al. + subs r0, r3, #1 + RET + + CFI_END_FUNCTION +FUNC_END internal_cmpsf2 + #ifdef L_arm_cmpsf2 +FUNC_END ltsf2 +FUNC_END lesf2 +FUNC_END cmpsf2 +#endif + +#endif /* L_arm_cmpsf2 || L_internal_cmpsf2 */ + + +#ifdef L_arm_eqsf2 + +// int __eqsf2(float, float) +// +// Returns the three-way comparison result of $r0 with $r1: +// * -1 if ($r0 < $r1) +// * 0 if ($r0 == $r1) +// * +1 if ($r0 > $r1), or either argument is NAN +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION eqsf2 .text.sorted.libgcc.fcmp.eqsf2 +FUNC_ALIAS nesf2 eqsf2 + CFI_START_FUNCTION + + // Assumption: The 'libgcc' functions should raise exceptions. + movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END nesf2 +FUNC_END eqsf2 + +#endif /* L_arm_eqsf2 */ + + +#ifdef L_arm_gesf2 + +// int __gesf2(float, float) +// +// Returns the three-way comparison result of $r0 with $r1: +// * -1 if ($r0 < $r1), or either argument is NAN +// * 0 if ($r0 == $r1) +// * +1 if ($r0 > $r1) +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION gesf2 .text.sorted.libgcc.fcmp.gesf2 +FUNC_ALIAS gtsf2 gesf2 + CFI_START_FUNCTION + + // Assumption: The 'libgcc' functions should raise exceptions. + movs r2, #(FCMP_UN_NEGATIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END gtsf2 +FUNC_END gesf2 + +#endif /* L_arm_gesf2 */ + + +#ifdef L_arm_fcmpeq + +// int __aeabi_fcmpeq(float, float) +// Returns '1' in $r1 if ($r0 == $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmpeq .text.sorted.libgcc.fcmp.fcmpeq + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmpeq + +#endif /* L_arm_fcmpeq */ + + +#ifdef L_arm_fcmpne + +// int __aeabi_fcmpne(float, float) [non-standard] +// Returns '1' in $r1 if ($r0 != $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmpne .text.sorted.libgcc.fcmp.fcmpne + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_NE) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmpne + +#endif /* L_arm_fcmpne */ + + +#ifdef L_arm_fcmplt + +// int __aeabi_fcmplt(float, float) +// Returns '1' in $r1 if ($r0 < $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmplt .text.sorted.libgcc.fcmp.fcmplt + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LT) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmplt + +#endif /* L_arm_fcmplt */ + + +#ifdef L_arm_fcmple + +// int __aeabi_fcmple(float, float) +// Returns '1' in $r1 if ($r0 <= $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. 
+FUNC_START_SECTION aeabi_fcmple .text.sorted.libgcc.fcmp.fcmple + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_LE) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmple + +#endif /* L_arm_fcmple */ + + +#ifdef L_arm_fcmpge + +// int __aeabi_fcmpge(float, float) +// Returns '1' in $r1 if ($r0 >= $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmpge .text.sorted.libgcc.fcmp.fcmpge + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GE) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmpge + +#endif /* L_arm_fcmpge */ + + +#ifdef L_arm_fcmpgt + +// int __aeabi_fcmpgt(float, float) +// Returns '1' in $r1 if ($r0 > $r1) (ordered). +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmpgt .text.sorted.libgcc.fcmp.fcmpgt + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_RAISE_EXCEPTIONS + FCMP_GT) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END aeabi_fcmpgt + +#endif /* L_arm_cmpgt */ + + +#ifdef L_arm_unordsf2 + +// int __aeabi_fcmpun(float, float) +// Returns '1' in $r1 if $r0 and $r1 are unordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION aeabi_fcmpun .text.sorted.libgcc.fcmp.fcmpun +FUNC_ALIAS unordsf2 aeabi_fcmpun + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +FUNC_END unordsf2 +FUNC_END aeabi_fcmpun + +#endif /* L_arm_unordsf2 */ + + +#if defined(L_arm_cfcmple) || defined(L_arm_cfrcmple) || \ + (defined(L_arm_cfcmpeq) && defined(TRAP_NANS) && TRAP_NANS) + +#if defined(L_arm_cfcmple) + #define CFCMPLE_NAME aeabi_cfcmple + #define CFCMPLE_SECTION .text.sorted.libgcc.fcmp.cfcmple +#elif defined(L_arm_cfrcmple) + #define CFCMPLE_NAME aeabi_cfrcmple + #define CFCMPLE_SECTION .text.sorted.libgcc.fcmp.cfrcmple +#else + #define CFCMPLE_NAME aeabi_cfcmpeq + #define CFCMPLE_SECTION .text.sorted.libgcc.fcmp.cfcmpeq +#endif + +// void __aeabi_cfcmple(float, float) +// void __aeabi_cfrcmple(float, float) +// void __aeabi_cfcmpeq(float, float) +// __aeabi_cfrcmple() first reverses the ordr of the input arguments. +// __aeabi_cfcmpeq() is an alias of __aeabi_cfcmple() if the library +// does not support signaling NAN exceptions. +// Three-way compare of $r0 ? $r1, with result in the status flags: +// * 'Z' is set only when the operands are ordered and equal. +// * 'C' is clear only when the operands are ordered and $r0 < $r1. +// Preserves all core registers except $ip, $lr, and the CPSR. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION CFCMPLE_NAME CFCMPLE_SECTION + + // __aeabi_cfcmpeq() is defined separately when TRAP_NANS is enabled. 
+ #if defined(L_arm_cfcmple) && !(defined(TRAP_NANS) && TRAP_NANS) + FUNC_ALIAS aeabi_cfcmpeq aeabi_cfcmple + #endif + + CFI_START_FUNCTION + + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + push { r0 - r3, rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 24 + .cfi_rel_offset r0, 0 + .cfi_rel_offset r1, 4 + .cfi_rel_offset r2, 8 + .cfi_rel_offset r3, 12 + .cfi_rel_offset rT, 16 + .cfi_rel_offset lr, 20 + #else + push { r0 - r3, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 20 + .cfi_rel_offset r0, 0 + .cfi_rel_offset r1, 4 + .cfi_rel_offset r2, 8 + .cfi_rel_offset r3, 12 + .cfi_rel_offset lr, 16 + #endif + + #ifdef L_arm_cfcmple + // Even though the result in $r0 will be discarded, the 3-way + // subtraction of '-1' that generates this result happens to + // set 'C' and 'Z' perfectly. Unordered results group with '>'. + // This happens to be the same control word as __cmpsf2(), meaning + // that __cmpsf2() is a potential direct branch target. However, + // the choice to set a redundant control word and branch to + // __internal_cmpsf2() makes this compiled object more robust + // against linking with 'foreign' __cmpsf2() implementations. + movs r2, #(FCMP_UN_POSITIVE + FCMP_RAISE_EXCEPTIONS + FCMP_3WAY) + #elif defined(L_arm_cfrcmple) + // Instead of reversing the order of the operands, it's slightly + // faster to inverted the result. But, for that to fully work, + // the sense of NAN must be pre-inverted. + movs r2, #(FCMP_UN_NEGATIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY) + #else /* L_arm_cfcmpeq */ + // Same as __aeabi_cfcmple(), except no exceptions on quiet NAN. + movs r2, #(FCMP_UN_POSITIVE + FCMP_NO_EXCEPTIONS + FCMP_3WAY) + #endif + + bl SYM(__internal_cmpsf2) + + #ifdef L_arm_cfrcmple + // Instead of reversing the order of the operands, it's slightly + // faster to inverted the result. Since __internal_cmpsf2() sets + // its flags by subtracing '1' from $r3, the reverse flags may be + // simply obtained subtracting $r3 from 1. + movs r1, #1 + subs r1, r3 + #endif /* L_arm_cfrcmple */ + + // Clean up all working registers. + #if defined(DOUBLE_ALIGN_STACK) && DOUBLE_ALIGN_STACK + pop { r0 - r3, rT, pc } + .cfi_restore_state + #else + pop { r0 - r3, pc } + .cfi_restore_state + #endif + + CFI_END_FUNCTION + + #if defined(L_arm_cfcmple) && !(defined(TRAP_NANS) && TRAP_NANS) + FUNC_END aeabi_cfcmpeq + #endif + +FUNC_END CFCMPLE_NAME + +#endif /* L_arm_cfcmple || L_arm_cfrcmple || L_arm_cfcmpeq */ + + +// C99 libm functions +#ifndef __GNUC__ + +// int isgreaterf(float, float) +// Returns '1' in $r0 if ($r0 > $r1) and both $r0 and $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION isgreaterf .text.sorted.libgcc.fcmp.isgtf +MATH_ALIAS isgreaterf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END isgreaterf +FUNC_END isgreaterf + + +// int isgreaterequalf(float, float) +// Returns '1' in $r0 if ($r0 >= $r1) and both $r0 and $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. 
+FUNC_START_SECTION isgreaterequalf .text.sorted.libgcc.fcmp.isgef +MATH_ALIAS isgreaterequalf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END isgreaterequalf +FUNC_END isgreaterequalf + + +// int islessf(float, float) +// Returns '1' in $r0 if ($r0 < $r1) and both $r0 and $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION islessf .text.sorted.libgcc.fcmp.isltf +MATH_ALIAS islessf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END islessf +FUNC_END islessf + + +// int islessequalf(float, float) +// Returns '1' in $r0 if ($r0 <= $r1) and both $r0 and $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION islessequalf .text.sorted.libgcc.fcmp.islef +MATH_ALIAS islessequalf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END islessequalf +FUNC_END islessequalf + + +// int islessgreaterf(float, float) +// Returns '1' in $r0 if ($r0 != $r1) and both $r0 and $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION islessgreaterf .text.sorted.libgcc.fcmp.isnef +MATH_ALIAS islessgreaterf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END islessgreaterf +FUNC_END islessgreaterf + + +// int isunorderedf(float, float) +// Returns '1' in $r0 if either $r0 or $r1 are ordered. +// Uses $r2, $r3, and $ip as scratch space. +// Same parent section as __cmpsf2() to keep tail call branch within range. +FUNC_START_SECTION isunorderedf .text.sorted.libgcc.fcmp.isunf +MATH_ALIAS isunorderedf + CFI_START_FUNCTION + + movs r2, #(FCMP_UN_ZERO + FCMP_NO_EXCEPTIONS + FCMP_GT + FCMP_EQ) + b SYM(__internal_cmpsf2) + + CFI_END_FUNCTION +MATH_END isunorderedf +FUNC_END isunorderedf -FUNC_START aeabi_cfrcmple - - mov ip, r0 - movs r0, r1 - mov r1, ip - b 6f - -FUNC_START aeabi_cfcmpeq -FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq - - @ The status-returning routines are required to preserve all - @ registers except ip, lr, and cpsr. -6: push {r0, r1, r2, r3, r4, lr} - bl __lesf2 - @ Set the Z flag correctly, and the C flag unconditionally. - cmp r0, #0 - @ Clear the C flag if the return value was -1, indicating - @ that the first operand was smaller than the second. 
- bmi 1f - movs r1, #0 - cmn r0, r1 -1: - pop {r0, r1, r2, r3, r4, pc} - - FUNC_END aeabi_cfcmple - FUNC_END aeabi_cfcmpeq - FUNC_END aeabi_cfrcmple - -FUNC_START aeabi_fcmpeq - - push {r4, lr} - bl __eqsf2 - negs r0, r0 - adds r0, r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmpeq - -.macro COMPARISON cond, helper, mode=sf2 -FUNC_START aeabi_fcmp\cond - - push {r4, lr} - bl __\helper\mode - cmp r0, #0 - b\cond 1f - movs r0, #0 - pop {r4, pc} -1: - movs r0, #1 - pop {r4, pc} - - FUNC_END aeabi_fcmp\cond -.endm - -COMPARISON lt, le -COMPARISON le, le -COMPARISON gt, ge -COMPARISON ge, ge - -#endif /* L_arm_cmpsf2 */ +#endif /* !__GNUC__ */ diff --git a/libgcc/config/arm/eabi/fplib.h b/libgcc/config/arm/eabi/fplib.h new file mode 100644 index 00000000000..91cc1dde0d7 --- /dev/null +++ b/libgcc/config/arm/eabi/fplib.h @@ -0,0 +1,83 @@ +/* fplib.h: Thumb-1 optimized floating point library configuration + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifndef __FPLIB_H +#define __FPLIB_H + +/* Enable exception interrupt handler. + Exception implementation is opportunistic, and not fully tested. */ +#define TRAP_EXCEPTIONS (0) +#define EXCEPTION_CODES (0) + +/* Perform extra checks to avoid modifying the sign bit of NANs */ +#define STRICT_NANS (0) + +/* Trap signaling NANs regardless of context. */ +#define TRAP_NANS (0) + +/* TODO: Define service numbers according to the handler requirements */ +#define SVC_TRAP_NAN (0) +#define SVC_FP_EXCEPTION (0) +#define SVC_DIVISION_BY_ZERO (0) + +/* Push extra registers when required for 64-bit stack alignment */ +#define DOUBLE_ALIGN_STACK (1) + +/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */ +#define PEDANTIC_DIV0 (1) + +/* Define various exception codes. These don't map to anything in particular */ +#define SUBTRACTED_INFINITY (20) +#define INFINITY_TIMES_ZERO (21) +#define DIVISION_0_BY_0 (22) +#define DIVISION_INF_BY_INF (23) +#define UNORDERED_COMPARISON (24) +#define CAST_OVERFLOW (25) +#define CAST_INEXACT (26) +#define CAST_UNDEFINED (27) + +/* Exception control for quiet NANs. + If TRAP_NAN support is enabled, signaling NANs always raise exceptions. */ +#define FCMP_RAISE_EXCEPTIONS 16 +#define FCMP_NO_EXCEPTIONS 0 + +/* The bit indexes in these assignments are significant. See implementation. + They are shared publicly for eventual use by newlib. 
*/ +#define FCMP_3WAY (1) +#define FCMP_LT (2) +#define FCMP_EQ (4) +#define FCMP_GT (8) + +#define FCMP_GE (FCMP_EQ | FCMP_GT) +#define FCMP_LE (FCMP_LT | FCMP_EQ) +#define FCMP_NE (FCMP_LT | FCMP_GT) + +/* These flags affect the result of unordered comparisons. See implementation. */ +#define FCMP_UN_THREE (128) +#define FCMP_UN_POSITIVE (64) +#define FCMP_UN_ZERO (32) +#define FCMP_UN_NEGATIVE (0) + +#endif /* __FPLIB_H */ diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 4d460a77332..188d9d7ff47 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2010,6 +2010,7 @@ LSYM(Lchange_\register): #include "bpabi.S" #else /* NOT_ISA_TARGET_32BIT */ #include "bpabi-v6m.S" +#include "eabi/fplib.h" #include "eabi/fcmp.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index eb1acd8d5a2..e69579e16dd 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -30,6 +30,7 @@ LIB1ASMFUNCS += \ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) # Group 0B: WEAK overridable function objects built for v6m only. LIB1ASMFUNCS += \ + _internal_cmpsf2 \ _muldi3 \ endif @@ -80,6 +81,23 @@ LIB1ASMFUNCS += \ _arm_negsf2 \ _arm_unordsf2 \ +ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) +# Group 2B: Single precision function objects built for v6m only. +LIB1ASMFUNCS += \ + _arm_cfcmpeq \ + _arm_cfcmple \ + _arm_cfrcmple \ + _arm_fcmpeq \ + _arm_fcmpge \ + _arm_fcmpgt \ + _arm_fcmple \ + _arm_fcmplt \ + _arm_fcmpne \ + _arm_eqsf2 \ + _arm_gesf2 \ + +endif + # Group 3: Double precision floating point function objects. LIB1ASMFUNCS += \ From patchwork Mon Oct 31 15:45:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59681 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2741D3851428 for ; Mon, 31 Oct 2022 15:51:21 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id A0FF93860748 for ; Mon, 31 Oct 2022 15:48:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A0FF93860748 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 81D8D3200949; Mon, 31 Oct 2022 11:48:25 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Mon, 31 Oct 2022 11:48:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231305; x= 1667317705; bh=c3l4+UTJfXoeQ5mE1ykfsGdN/s32XAT7zPGVkLMGQ/8=; b=f lAgEiuuDVFjUkxvHTa4SHX23DUUWbbCTQgXeUcx+KEhhZdszj713ml7cK/oGaWT3 13W2vDublofzhLptE0WGW/L4zIXSGCiVCuR1NrBsly3dyMqJZn2LX4BoRpAw2jUI gNg0aPA6L+JxW4wxWzJh2z0T7DaVQqCBHDpjUTAGntwVYW4nkYarjcXd28by6NZ3 SasDltlcnUOz71n4mZPvXehEYdUm1LtW2t82Zx0enszrT6XPcElE4I3fblDN4KnC 
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 25/34] Refactor Thumb-1 float subtraction into a new file
Date: Mon, 31 Oct 2022 08:45:20 -0700
Message-Id: <20221031154529.3627576-26-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

This will make it easier to isolate changes in subsequent patches.

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/bpabi-v6m.S (__aeabi_frsub): Moved to ...
	* config/arm/eabi/fadd.S: New file.
	* config/arm/lib1funcs.S: #include eabi/fadd.S (v6m only).
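For reference, the wrapper being moved is just a sign flip ahead of the
common addition routine.  A hedged C equivalent is sketched below; the
name frsub_model is invented, the XOR stands in for the 'eors' on the
sign bit, and the '+' stands in for the call to __aeabi_fadd.

    /* Illustrative C equivalent of __aeabi_frsub(x, y) == y - x (sketch only). */
    #include <stdint.h>
    #include <string.h>

    float frsub_model (float x, float y)
    {
      uint32_t bits;
      memcpy (&bits, &x, sizeof (bits));  /* type-pun without undefined behavior */
      bits ^= UINT32_C (0x80000000);      /* toggle the sign bit, as the asm does */
      memcpy (&x, &bits, sizeof (bits));
      return x + y;                       /* stands in for __aeabi_fadd */
    }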
--- libgcc/config/arm/bpabi-v6m.S | 16 --------------- libgcc/config/arm/eabi/fadd.S | 38 +++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + 3 files changed, 39 insertions(+), 16 deletions(-) create mode 100644 libgcc/config/arm/eabi/fadd.S diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 8e0a45f4716..afba648ec57 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -33,22 +33,6 @@ .eabi_attribute 25, 1 #endif /* __ARM_EABI__ */ - -#ifdef L_arm_addsubsf3 - -FUNC_START aeabi_frsub - - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} - - FUNC_END aeabi_frsub - -#endif /* L_arm_addsubsf3 */ - #ifdef L_arm_addsubdf3 FUNC_START aeabi_drsub diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S new file mode 100644 index 00000000000..fffbd91d1bc --- /dev/null +++ b/libgcc/config/arm/eabi/fadd.S @@ -0,0 +1,38 @@ +/* Copyright (C) 2006-2021 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . 
*/
+
+
+#ifdef L_arm_addsubsf3
+
+FUNC_START aeabi_frsub
+
+	push {r4, lr}
+	movs r4, #1
+	lsls r4, #31
+	eors r0, r0, r4
+	bl __aeabi_fadd
+	pop {r4, pc}
+
+	FUNC_END aeabi_frsub
+
+#endif /* L_arm_addsubsf3 */
+
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 188d9d7ff47..d1a2d2f7908 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -2012,6 +2012,7 @@ LSYM(Lchange_\register):
 #include "bpabi-v6m.S"
 #include "eabi/fplib.h"
 #include "eabi/fcmp.S"
+#include "eabi/fadd.S"
 #endif /* NOT_ISA_TARGET_32BIT */
 #include "eabi/lcmp.S"
 #endif /* !__symbian__ */

From patchwork Mon Oct 31 15:45:21 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59685
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 26/34] Import float addition and subtraction from the CM0 library
Date: Mon, 31 Oct 2022 08:45:21 -0700
Message-Id: <20221031154529.3627576-27-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

Since this is the first import of single-precision functions, some common
parsing and formatting routines are also included.  These common routines
will be referenced by other functions in subsequent commits.  However, even
if the size penalty is accounted entirely to __addsf3(), the total compiled
size is still less than half the size of soft-float.

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel

	* config/arm/eabi/fadd.S (__addsf3, __subsf3): Added new functions.
	* config/arm/eabi/fneg.S (__negsf2): Added new file.
	* config/arm/eabi/futil.S (__fp_normalize2, __fp_lalign2,
	__fp_assemble, __fp_overflow, __fp_zero, __fp_check_nan):
	Added new file with shared helper functions.
	* config/arm/lib1funcs.S: #include eabi/fneg.S and eabi/futil.S
	(v6m only).
	* config/arm/t-elf (LIB1ASMFUNCS): Added _arm_addsf3, _arm_frsubsf3,
	_fp_exceptionf, _fp_checknanf, _fp_assemblef, and _fp_normalizef.
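The core of the addition path (visible in the diff below) orders the
operands by magnitude, then aligns the smaller mantissa before adding.
The C fragment below is a rough outline of that alignment step only; the
unpacked_f type, its field names, and align_for_add are invented for
illustration and do not appear in the patch.

    /* Rough outline of the operand-alignment step in __addsf3 (sketch only). */
    #include <stdint.h>

    typedef struct { uint32_t mant; int exp; } unpacked_f;

    /* Returns 0 when the smaller operand cannot affect the rounded result;
       the patch uses the same 25-bit cutoff in its size-optimized-off path.  */
    static int align_for_add (const unpacked_f *large, unpacked_f *small)
    {
      int diff = large->exp - small->exp;
      if (diff > 25)
        return 0;
      /* The real code also keeps the bits shifted out here as a remainder
         so the final result can be rounded correctly.  */
      small->mant >>= diff;
      return 1;
    }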
--- libgcc/config/arm/eabi/fadd.S | 306 +++++++++++++++++++++++- libgcc/config/arm/eabi/fneg.S | 76 ++++++ libgcc/config/arm/eabi/fplib.h | 3 - libgcc/config/arm/eabi/futil.S | 418 +++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 2 + libgcc/config/arm/t-elf | 6 + 6 files changed, 798 insertions(+), 13 deletions(-) create mode 100644 libgcc/config/arm/eabi/fneg.S create mode 100644 libgcc/config/arm/eabi/futil.S diff --git a/libgcc/config/arm/eabi/fadd.S b/libgcc/config/arm/eabi/fadd.S index fffbd91d1bc..176e330a1b6 100644 --- a/libgcc/config/arm/eabi/fadd.S +++ b/libgcc/config/arm/eabi/fadd.S @@ -1,5 +1,7 @@ -/* Copyright (C) 2006-2021 Free Software Foundation, Inc. - Contributed by CodeSourcery. +/* fadd.S: Thumb-1 optimized 32-bit float addition and subtraction + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -21,18 +23,302 @@ . */ +#ifdef L_arm_frsubsf3 + +// float __aeabi_frsub(float, float) +// Returns the floating point difference of $r1 - $r0 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_frsub .text.sorted.libgcc.fpcore.b.frsub + CFI_START_FUNCTION + + #if defined(STRICT_NANS) && STRICT_NANS + // Check if $r0 is NAN before modifying. + lsls r2, r0, #1 + movs r3, #255 + lsls r3, #24 + + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and run through fadd(). + movs r2, #1 + lsls r2, #31 + adds r0, r2 + b SYM(__aeabi_fadd) + + CFI_END_FUNCTION +FUNC_END aeabi_frsub + +#endif /* L_arm_frsubsf3 */ + + #ifdef L_arm_addsubsf3 -FUNC_START aeabi_frsub +// float __aeabi_fsub(float, float) +// Returns the floating point difference of $r0 - $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fsub .text.sorted.libgcc.fpcore.c.faddsub +FUNC_ALIAS subsf3 aeabi_fsub + CFI_START_FUNCTION - push {r4, lr} - movs r4, #1 - lsls r4, #31 - eors r0, r0, r4 - bl __aeabi_fadd - pop {r4, pc} + #if defined(STRICT_NANS) && STRICT_NANS + // Check if $r1 is NAN before modifying. + lsls r2, r1, #1 + movs r3, #255 + lsls r3, #24 - FUNC_END aeabi_frsub + // Let fadd() find the NAN in the normal course of operation, + // moving it to $r0 and checking the quiet/signaling bit. + cmp r2, r3 + bhi SYM(__aeabi_fadd) + #endif + + // Flip sign and fall into fadd(). + movs r2, #1 + lsls r2, #31 + adds r1, r2 #endif /* L_arm_addsubsf3 */ + +// The execution of __subsf3() flows directly into __addsf3(), such that +// instructions must appear consecutively in the same memory section. +// However, this construction inhibits the ability to discard __subsf3() +// when only using __addsf3(). +// Therefore, this block configures __addsf3() for compilation twice. +// The first version is a minimal standalone implementation, and the second +// version is the continuation of __subsf3(). The standalone version must +// be declared WEAK, so that the combined version can supersede it and +// provide both symbols when required. +// '_arm_addsf3' should appear before '_arm_addsubsf3' in LIB1ASMFUNCS. 
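The WEAK arrangement described above can be pictured with two ordinary C objects (illustrative file and function names; this only relies on the usual GNU rule that a regular symbol definition overrides a weak one when both objects end up in the same link):

/* add_only.c: analogue of the standalone, WEAK __addsf3 object. */
__attribute__((weak)) int add_fixed(int a, int b)
{
    return a + b;
}

/* add_sub.c: analogue of the combined __subsf3/__addsf3 object.  Its
   regular definition of add_fixed() supersedes the weak one, so a link
   that needs both entry points gets them from this object without a
   duplicate-symbol error. */
int add_fixed(int a, int b)
{
    return a + b;
}

int sub_fixed(int a, int b)
{
    return add_fixed(a, -b);
}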
+#if defined(L_arm_addsf3) || defined(L_arm_addsubsf3) + +#ifdef L_arm_addsf3 +// float __aeabi_fadd(float, float) +// Returns the floating point sum of $r0 + $r1 in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +WEAK_START_SECTION aeabi_fadd .text.sorted.libgcc.fpcore.c.fadd +WEAK_ALIAS addsf3 aeabi_fadd + CFI_START_FUNCTION + +#else /* L_arm_addsubsf3 */ +FUNC_ENTRY aeabi_fadd +FUNC_ALIAS addsf3 aeabi_fadd + +#endif + + // Standard registers, compatible with exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Drop the sign bit to compare absolute value. + lsls r2, r0, #1 + lsls r3, r1, #1 + + // Save the logical difference of original values. + // This actually makes the following swap slightly faster. + eors r1, r0 + + // Compare exponents+mantissa. + // MAYBE: Speedup for equal values? This would have to separately + // check for NAN/INF and then either: + // * Increase the exponent by '1' (for multiply by 2), or + // * Return +0 + cmp r2, r3 + bhs LLSYM(__fadd_ordered) + + // Reorder operands so the larger absolute value is in r2, + // the corresponding original operand is in $r0, + // and the smaller absolute value is in $r3. + movs r3, r2 + eors r0, r1 + lsls r2, r0, #1 + + LLSYM(__fadd_ordered): + // Extract the exponent of the larger operand. + // If INF/NAN, then it becomes an automatic result. + lsrs r2, #24 + cmp r2, #255 + beq LLSYM(__fadd_special) + + // Save the sign of the result. + lsrs rT, r0, #31 + lsls rT, #31 + mov ip, rT + + // If the original value of $r1 was to +/-0, + // $r0 becomes the automatic result. + // Because $r0 is known to be a finite value, return directly. + // It's actually important that +/-0 not go through the normal + // process, to keep "-0 +/- 0" from being turned into +0. + cmp r3, #0 + beq LLSYM(__fadd_zero) + + // Extract the second exponent. + lsrs r3, #24 + + // Calculate the difference of exponents (always positive). + subs r3, r2, r3 + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // If the smaller operand is more than 25 bits less significant + // than the larger, the larger operand is an automatic result. + // The smaller operand can't affect the result, even after rounding. + cmp r3, #25 + bhi LLSYM(__fadd_return) + #endif + + // Isolate both mantissas, recovering the smaller. + lsls rT, r0, #9 + lsls r0, r1, #9 + eors r0, rT + + // If the larger operand is normal, restore the implicit '1'. + // If subnormal, the second operand will also be subnormal. + cmp r2, #0 + beq LLSYM(__fadd_normal) + adds rT, #1 + rors rT, rT + + // If the smaller operand is also normal, restore the implicit '1'. + // If subnormal, the smaller operand effectively remains multiplied + // by 2 w.r.t the first. This compensates for subnormal exponents, + // which are technically still -126, not -127. + cmp r2, r3 + beq LLSYM(__fadd_normal) + adds r0, #1 + rors r0, r0 + + LLSYM(__fadd_normal): + // Provide a spare bit for overflow. + // Normal values will be aligned in bits [30:7] + // Subnormal values will be aligned in bits [30:8] + lsrs rT, #1 + lsrs r0, #1 + + // If signs weren't matched, negate the smaller operand (branchless). + asrs r1, #31 + eors r0, r1 + subs r0, r1 + + // Keep a copy of the small mantissa for the remainder. + movs r1, r0 + + // Align the small mantissa for addition. + asrs r1, r3 + + // Isolate the remainder. 
+ // NOTE: Given the various cases above, the remainder will only + // be used as a boolean for rounding ties to even. It is not + // necessary to negate the remainder for subtraction operations. + rsbs r3, #0 + adds r3, #32 + lsls r0, r3 + + // Because operands are ordered, the result will never be negative. + // If the result of subtraction is 0, the overall result must be +0. + // If the overall result in $r1 is 0, then the remainder in $r0 + // must also be 0, so no register copy is necessary on return. + adds r1, rT + beq LLSYM(__fadd_return) + + // The large operand was aligned in bits [29:7]... + // If the larger operand was normal, the implicit '1' went in bit [30]. + // + // After addition, the MSB of the result may be in bit: + // 31, if the result overflowed. + // 30, the usual case. + // 29, if there was a subtraction of operands with exponents + // differing by more than 1. + // < 28, if there was a subtraction of operands with exponents +/-1, + // < 28, if both operands were subnormal. + + // In the last case (both subnormal), the alignment shift will be 8, + // the exponent will be 0, and no rounding is necessary. + cmp r2, #0 + bne SYM(__fp_assemble) + + // Subnormal overflow automatically forms the correct exponent. + lsrs r0, r1, #8 + add r0, ip + + LLSYM(__fadd_return): + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fadd_special): + #if defined(TRAP_NANS) && TRAP_NANS + // If $r1 is (also) NAN, force it in place of $r0. + // As the smaller NAN, it is more likely to be signaling. + movs rT, #255 + lsls rT, #24 + cmp r3, rT + bls LLSYM(__fadd_ordered2) + + eors r0, r1 + #endif + + LLSYM(__fadd_ordered2): + // There are several possible cases to consider here: + // 1. Any NAN/NAN combination + // 2. Any NAN/INF combination + // 3. Any NAN/value combination + // 4. INF/INF with matching signs + // 5. INF/INF with mismatched signs. + // 6. Any INF/value combination. + // In all cases but the case 5, it is safe to return $r0. + // In the special case, a new NAN must be constructed. + // First, check the mantissa to see if $r0 is NAN. + lsls r2, r0, #9 + + #if defined(TRAP_NANS) && TRAP_NANS + bne SYM(__fp_check_nan) + #else + bne LLSYM(__fadd_return) + #endif + + LLSYM(__fadd_zero): + // Next, check for an INF/value combination. + lsls r2, r1, #1 + bne LLSYM(__fadd_return) + + // Finally, check for matching sign on INF/INF. + // Also accepts matching signs when +/-0 are added. + bcc LLSYM(__fadd_return) + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(SUBTRACTED_INFINITY) + #endif + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + // Restore original operands. + eors r1, r0 + #endif + + // Identify mismatched 0. + lsls r2, r0, #1 + bne SYM(__fp_exception) + + // Force mismatched 0 to +0. + eors r0, r0 + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END addsf3 +FUNC_END aeabi_fadd + +#ifdef L_arm_addsubsf3 +FUNC_END subsf3 +FUNC_END aeabi_fsub +#endif + +#endif /* L_arm_addsf3 */ + diff --git a/libgcc/config/arm/eabi/fneg.S b/libgcc/config/arm/eabi/fneg.S new file mode 100644 index 00000000000..a736c66beac --- /dev/null +++ b/libgcc/config/arm/eabi/fneg.S @@ -0,0 +1,76 @@ +/* fneg.S: Thumb-1 optimized 32-bit float negation + + Copyright (C) 2018-2022 Free Software Foundation, Inc. 
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_negsf2 + +// float __aeabi_fneg(float) [obsolete] +// The argument and result are in $r0. +// Uses $r1 and $r2 as scratch registers. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fneg .text.sorted.libgcc.fpcore.a.fneg +FUNC_ALIAS negsf2 aeabi_fneg + CFI_START_FUNCTION + + #if (defined(STRICT_NANS) && STRICT_NANS) || \ + (defined(TRAP_NANS) && TRAP_NANS) + // Check for NAN. + lsls r1, r0, #1 + movs r2, #255 + lsls r2, #24 + cmp r1, r2 + + #if defined(TRAP_NANS) && TRAP_NANS + blo SYM(__fneg_nan) + #else + blo LLSYM(__fneg_return) + #endif + #endif + + // Flip the sign. + movs r1, #1 + lsls r1, #31 + eors r0, r1 + + LLSYM(__fneg_return): + RET + + #if defined(TRAP_NANS) && TRAP_NANS + LLSYM(__fneg_nan): + // Set up registers for exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + b SYM(fp_check_nan) + #endif + + CFI_END_FUNCTION +FUNC_END negsf2 +FUNC_END aeabi_fneg + +#endif /* L_arm_negsf2 */ + diff --git a/libgcc/config/arm/eabi/fplib.h b/libgcc/config/arm/eabi/fplib.h index 91cc1dde0d7..8f8998da811 100644 --- a/libgcc/config/arm/eabi/fplib.h +++ b/libgcc/config/arm/eabi/fplib.h @@ -45,9 +45,6 @@ /* Push extra registers when required for 64-bit stack alignment */ #define DOUBLE_ALIGN_STACK (1) -/* Manipulate *div0() parameters to meet the ARM runtime ABI specification. */ -#define PEDANTIC_DIV0 (1) - /* Define various exception codes. These don't map to anything in particular */ #define SUBTRACTED_INFINITY (20) #define INFINITY_TIMES_ZERO (21) diff --git a/libgcc/config/arm/eabi/futil.S b/libgcc/config/arm/eabi/futil.S new file mode 100644 index 00000000000..e85a37d817a --- /dev/null +++ b/libgcc/config/arm/eabi/futil.S @@ -0,0 +1,418 @@ +/* futil.S: Thumb-1 optimized 32-bit float helper functions + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. 
+ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +// These helper functions are exported in distinct object files to keep +// the linker from importing unused code. +// These helper functions do NOT follow AAPCS register conventions. + + +#ifdef L_fp_normalizef + +// Internal function, decomposes the unsigned float in $r2. +// The exponent will be returned in $r2, the mantissa in $r3. +// If subnormal, the mantissa will be normalized, so that +// the MSB of the mantissa (if any) will be aligned at bit[31]. +// Preserves $r0 and $r1, uses $rT as scratch space. +FUNC_START_SECTION fp_normalize2 .text.sorted.libgcc.fpcore.y.alignf + CFI_START_FUNCTION + + // Extract the mantissa. + lsls r3, r2, #8 + + // Extract the exponent. + lsrs r2, #24 + beq SYM(__fp_lalign2) + + // Restore the mantissa's implicit '1'. + adds r3, #1 + rors r3, r3 + + RET + + CFI_END_FUNCTION +FUNC_END fp_normalize2 + + +// Internal function, aligns $r3 so the MSB is aligned in bit[31]. +// Simultaneously, subtracts the shift from the exponent in $r2 +FUNC_ENTRY fp_lalign2 + CFI_START_FUNCTION + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Unroll the loop, similar to __clzsi2(). + lsrs rT, r3, #16 + bne LLSYM(__align8) + subs r2, #16 + lsls r3, #16 + + LLSYM(__align8): + lsrs rT, r3, #24 + bne LLSYM(__align4) + subs r2, #8 + lsls r3, #8 + + LLSYM(__align4): + lsrs rT, r3, #28 + bne LLSYM(__align2) + subs r2, #4 + lsls r3, #4 + #endif + + LLSYM(__align2): + // Refresh the state of the N flag before entering the loop. + tst r3, r3 + + LLSYM(__align_loop): + // Test before subtracting to compensate for the natural exponent. + // The largest subnormal should have an exponent of 0, not -1. + bmi LLSYM(__align_return) + subs r2, #1 + lsls r3, #1 + bne LLSYM(__align_loop) + + // Not just a subnormal... 0! By design, this should never happen. + // All callers of this internal function filter 0 as a special case. + // Was there an uncontrolled jump from somewhere else? Cosmic ray? + eors r2, r2 + + #ifdef DEBUG + bkpt #0 + #endif + + LLSYM(__align_return): + RET + + CFI_END_FUNCTION +FUNC_END fp_lalign2 + +#endif /* L_fp_normalizef */ + + +#ifdef L_fp_assemblef + +// Internal function to combine mantissa, exponent, and sign. No return. +// Expects the unsigned result in $r1. To avoid underflow (slower), +// the MSB should be in bits [31:29]. +// Expects any remainder bits of the unrounded result in $r0. +// Expects the exponent in $r2. The exponent must be relative to bit[30]. +// Expects the sign of the result (and only the sign) in $ip. +// Returns a correctly rounded floating value in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_assemble .text.sorted.libgcc.fpcore.g.assemblef + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Examine the upper three bits [31:29] for underflow. + lsrs r3, r1, #29 + beq LLSYM(__fp_underflow) + + // Convert bits [31:29] into an offset in the range of { 0, -1, -2 }. + // Right rotation aligns the MSB in bit [31], filling any LSBs with '0'. 
+ lsrs r3, r1, #1 + mvns r3, r3 + ands r3, r1 + lsrs r3, #30 + subs r3, #2 + rors r1, r3 + + // Update the exponent, assuming the final result will be normal. + // The new exponent is 1 less than actual, to compensate for the + // eventual addition of the implicit '1' in the result. + // If the final exponent becomes negative, proceed directly to gradual + // underflow, without bothering to search for the MSB. + adds r2, r3 + + FUNC_ENTRY fp_assemble2 + bmi LLSYM(__fp_subnormal) + + LLSYM(__fp_normal): + // Check for overflow (remember the implicit '1' to be added later). + cmp r2, #254 + bge SYM(__fp_overflow) + + // Save LSBs for the remainder. Position doesn't matter any more, + // these are just tiebreakers for round-to-even. + lsls rT, r1, #25 + + // Align the final result. + lsrs r1, #8 + + LLSYM(__fp_round): + // If carry bit is '0', always round down. + bcc LLSYM(__fp_return) + + // The carry bit is '1'. Round to nearest, ties to even. + // If either the saved remainder bits [6:0], the additional remainder + // bits in $r1, or the final LSB is '1', round up. + lsls r3, r1, #31 + orrs r3, rT + orrs r3, r0 + beq LLSYM(__fp_return) + + // If rounding up overflows, then the mantissa result becomes 2.0, + // which yields the correct return value up to and including INF. + adds r1, #1 + + LLSYM(__fp_return): + // Combine the mantissa and the exponent. + lsls r2, #23 + adds r0, r1, r2 + + // Combine with the saved sign. + // End of library call, return to user. + add r0, ip + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: Underflow/inexact reporting IFF remainder + #endif + + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fp_underflow): + // Set up to align the mantissa. + movs r3, r1 + bne LLSYM(__fp_underflow2) + + // MSB wasn't in the upper 32 bits, check the remainder. + // If the remainder is also zero, the result is +/-0. + movs r3, r0 + beq SYM(__fp_zero) + + eors r0, r0 + subs r2, #32 + + LLSYM(__fp_underflow2): + // Save the pre-alignment exponent to align the remainder later. + movs r1, r2 + + // Align the mantissa with the MSB in bit[31]. + bl SYM(__fp_lalign2) + + // Calculate the actual remainder shift. + subs rT, r1, r2 + + // Align the lower bits of the remainder. + movs r1, r0 + lsls r0, rT + + // Combine the upper bits of the remainder with the aligned value. + rsbs rT, #0 + adds rT, #32 + lsrs r1, rT + adds r1, r3 + + // The MSB is now aligned at bit[31] of $r1. + // If the net exponent is still positive, the result will be normal. + // Because this function is used by fmul(), there is a possibility + // that the value is still wider than 24 bits; always round. + tst r2, r2 + bpl LLSYM(__fp_normal) + + LLSYM(__fp_subnormal): + // The MSB is aligned at bit[31], with a net negative exponent. + // The mantissa will need to be shifted right by the absolute value of + // the exponent, plus the normal shift of 8. + + // If the negative shift is smaller than -25, there is no result, + // no rounding, no anything. Return signed zero. + // (Otherwise, the shift for result and remainder may wrap.) + adds r2, #25 + bmi SYM(__fp_inexact_zero) + + // Save the extra bits for the remainder. + movs rT, r1 + lsls rT, r2 + + // Shift the mantissa to create a subnormal. + // Just like normal, round to nearest, ties to even. + movs r3, #33 + subs r3, r2 + eors r2, r2 + + // This shift must be last, leaving the shifted LSB in the C flag. + lsrs r1, r3 + b LLSYM(__fp_round) + + CFI_END_FUNCTION +FUNC_END fp_assemble + + +// Recreate INF with the appropriate sign. No return. 
+// Expects the sign of the result in $ip. +FUNC_ENTRY fp_overflow + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/overflow exception + #endif + + FUNC_ENTRY fp_infinity + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + movs r0, #255 + lsls r0, #23 + add r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_overflow + + +// Recreate 0 with the appropriate sign. No return. +// Expects the sign of the result in $ip. +FUNC_ENTRY fp_inexact_zero + CFI_START_FUNCTION + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // TODO: inexact/underflow exception + #endif + +FUNC_ENTRY fp_zero + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Return 0 with the correct sign. + mov r0, ip + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_zero +FUNC_END fp_inexact_zero + +#endif /* L_fp_assemblef */ + + +#ifdef L_fp_checknanf + +// Internal function to detect signaling NANs. No return. +// Uses $r2 as scratch space. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_check_nan2 .text.sorted.libgcc.fpcore.j.checkf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + + FUNC_ENTRY fp_check_nan + + // Check for quiet NAN. + lsrs r2, r0, #23 + bcs LLSYM(__quiet_nan) + + // Raise exception. Preserves both $r0 and $r1. + svc #(SVC_TRAP_NAN) + + // Quiet the resulting NAN. + movs r2, #1 + lsls r2, #22 + orrs r0, r2 + + LLSYM(__quiet_nan): + // End of library call, return to user. + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_check_nan +FUNC_END fp_check_nan2 + +#endif /* L_fp_checknanf */ + + +#ifdef L_fp_exceptionf + +// Internal function to report floating point exceptions. No return. +// Expects the original argument(s) in $r0 (possibly also $r1). +// Expects a code that describes the exception in $r3. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION fp_exception .text.sorted.libgcc.fpcore.k.exceptf + CFI_START_FUNCTION + + // Work around CFI branching limitations. + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Create a quiet NAN. + movs r2, #255 + lsls r2, #1 + adds r2, #1 + lsls r2, #22 + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + // Annotate the exception type in the NAN field. + // Make sure that the exception is in the valid region + lsls rT, r3, #13 + orrs r2, rT + #endif + + // Exception handler that expects the result already in $r2, + // typically when the result is not going to be NAN. + FUNC_ENTRY fp_exception2 + + #if defined(TRAP_EXCEPTIONS) && TRAP_EXCEPTIONS + svc #(SVC_FP_EXCEPTION) + #endif + + // TODO: Save exception flags in a static variable. + + // Set up the result, now that the argument isn't required any more. + movs r0, r2 + + // HACK: for sincosf(), with 2 parameters to return. + movs r1, r2 + + // End of library call, return to user. 
+ pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END fp_exception2 +FUNC_END fp_exception + +#endif /* L_arm_exception */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index d1a2d2f7908..bfe3397d892 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2012,7 +2012,9 @@ LSYM(Lchange_\register): #include "bpabi-v6m.S" #include "eabi/fplib.h" #include "eabi/fcmp.S" +#include "eabi/fneg.S" #include "eabi/fadd.S" +#include "eabi/futil.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index e69579e16dd..c57d9ef50ac 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -32,6 +32,7 @@ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) LIB1ASMFUNCS += \ _internal_cmpsf2 \ _muldi3 \ + _arm_addsf3 \ endif @@ -95,6 +96,11 @@ LIB1ASMFUNCS += \ _arm_fcmpne \ _arm_eqsf2 \ _arm_gesf2 \ + _arm_frsubsf3 \ + _fp_exceptionf \ + _fp_checknanf \ + _fp_assemblef \ + _fp_normalizef \ endif From patchwork Mon Oct 31 15:45:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59687 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 543E538576B6 for ; Mon, 31 Oct 2022 15:52:23 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id D04AE3885527 for ; Mon, 31 Oct 2022 15:48:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D04AE3885527 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id BC2CC3200917; Mon, 31 Oct 2022 11:48:35 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Mon, 31 Oct 2022 11:48:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231315; x= 1667317715; bh=r49nQ7pHeszYpi4qQBES36vacTd1m1sYISpdGnkpFw8=; b=D zw87uy32InqP6xs+T8cTokZyDbWGeawXGpAueCS0MgVIpAOr6Jsfk6mu3nk4glzp mFhG34UZWcLbDLacemTM/Gberny7POeHw9ICffPsqi4C+6LMEVegWQK3NGB+xeRa P7KqG1lo8AV+UrY9Hf5sftDh/gC2boz4IUi2/TTDyC9QY2fhgoyGXPB32q2V5DwU unPGWYAZ+P27WFLjWYkxyzs+XDBiYZx+f29QsofQOVs1emnnzhLgaQNf1avEb72r CUqdE2SXoaQW3sk0fO6XK9+RHL3MKD/Y7padiMDNpcGUk4UtK48Gek5vhYj/7D2n xEKwIGItWRMz6p9MfhGAQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231315; x=1667317715; bh=r49nQ7pHeszYp i4qQBES36vacTd1m1sYISpdGnkpFw8=; b=GvhokGZPXgc3H0TosHFIJuwLvujTB P/6OcFjGbaHaHf/+Vah7TQb+0AzUmuOHPS9P0rCcCIGfEBIs8CtNaL2o5/uvalNu 
ClLhaLtw9XuakLhoPVPfHHOvrcd22eodnTcDZpcAkctdhrmnwc7eCou8JAeO2InU 2nESjTjUxs1RfL0Bn3B/948HKijRumMxVDCxwdiPr0LvRNYqGn1/XF9kFfruca1J lgXScqrsFQYxAmJspQbKx6GX0iV816zvn9cBhBwieVzrf801QLi8iCh4vL9qVA6b P5csiymFXKJiQlST5urE+80A8MQG5YRKkBJdCmnnbMeIXIqWu7jmWvxTg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeffleeitddtheehgfelfeekieetvdduieevleetvddviefhgeeuueehieet fedvteenucffohhmrghinhepfhhmuhhlrdhssgdpghhnuhdrohhrghdplhhisgdufhhunh gtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhho mhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:34 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmQas087310; Mon, 31 Oct 2022 08:48:26 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 27/34] Import float multiplication from the CM0 library Date: Mon, 31 Oct 2022 08:45:22 -0700 Message-Id: <20221031154529.3627576-28-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fmul.S (__mulsf3): New file. * config/arm/lib1funcs.S: #include eabi/fmul.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Moved _mulsf3 to global scope (this object was previously blocked on v6m builds). --- libgcc/config/arm/eabi/fmul.S | 215 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 3 +- 3 files changed, 218 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi/fmul.S diff --git a/libgcc/config/arm/eabi/fmul.S b/libgcc/config/arm/eabi/fmul.S new file mode 100644 index 00000000000..4ebd5a66f47 --- /dev/null +++ b/libgcc/config/arm/eabi/fmul.S @@ -0,0 +1,215 @@ +/* fmul.S: Thumb-1 optimized 32-bit float multiplication + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. 
+ + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_mulsf3 + +// float __aeabi_fmul(float, float) +// Returns $r0 after multiplication by $r1. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fmul .text.sorted.libgcc.fpcore.m.fmul +FUNC_ALIAS mulsf3 aeabi_fmul + CFI_START_FUNCTION + + // Standard registers, compatible with exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Save the sign of the result. + movs rT, r1 + eors rT, r0 + lsrs rT, #31 + lsls rT, #31 + mov ip, rT + + // Set up INF for comparison. + movs rT, #255 + lsls rT, #24 + + // Check for multiplication by zero. + lsls r2, r0, #1 + beq LLSYM(__fmul_zero1) + + lsls r3, r1, #1 + beq LLSYM(__fmul_zero2) + + // Check for INF/NAN. + cmp r3, rT + bhs LLSYM(__fmul_special2) + + cmp r2, rT + bhs LLSYM(__fmul_special1) + + // Because neither operand is INF/NAN, the result will be finite. + // It is now safe to modify the original operand registers. + lsls r0, #9 + + // Isolate the first exponent. When normal, add back the implicit '1'. + // The result is always aligned with the MSB in bit [31]. + // Subnormal mantissas remain effectively multiplied by 2x relative to + // normals, but this works because the weight of a subnormal is -126. + lsrs r2, #24 + beq LLSYM(__fmul_normalize2) + adds r0, #1 + rors r0, r0 + + LLSYM(__fmul_normalize2): + // IMPORTANT: exp10i() jumps in here! + // Repeat for the mantissa of the second operand. + // Short-circuit when the mantissa is 1.0, as the + // first mantissa is already prepared in $r0 + lsls r1, #9 + + // When normal, add back the implicit '1'. + lsrs r3, #24 + beq LLSYM(__fmul_go) + adds r1, #1 + rors r1, r1 + + LLSYM(__fmul_go): + // Calculate the final exponent, relative to bit [30]. + adds rT, r2, r3 + subs rT, #127 + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Short-circuit on multiplication by powers of 2. + lsls r3, r0, #1 + beq LLSYM(__fmul_simple1) + + lsls r3, r1, #1 + beq LLSYM(__fmul_simple2) + #endif + + // Save $ip across the call. + // (Alternatively, could push/pop a separate register, + // but the four instructions here are equivally fast) + // without imposing on the stack. + add rT, ip + + // 32x32 unsigned multiplication, 64 bit result. + bl SYM(__umulsidi3) __PLT__ + + // Separate the saved exponent and sign. + sxth r2, rT + subs rT, r2 + mov ip, rT + + b SYM(__fp_assemble) + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + LLSYM(__fmul_simple2): + // Move the high bits of the result to $r1. + movs r1, r0 + + LLSYM(__fmul_simple1): + // Clear the remainder. + eors r0, r0 + + // Adjust mantissa to match the exponent, relative to bit[30]. + subs r2, rT, #1 + b SYM(__fp_assemble) + #endif + + LLSYM(__fmul_zero1): + // $r0 was equal to 0, set up to check $r1 for INF/NAN. 
+ lsls r2, r1, #1 + + LLSYM(__fmul_zero2): + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(INFINITY_TIMES_ZERO) + #endif + + // Check the non-zero operand for INF/NAN. + // If NAN, it should be returned. + // If INF, the result should be NAN. + // Otherwise, the result will be +/-0. + cmp r2, rT + beq SYM(__fp_exception) + + // If the second operand is finite, the result is 0. + blo SYM(__fp_zero) + + #if defined(STRICT_NANS) && STRICT_NANS + // Restore values that got mixed in zero testing, then go back + // to sort out which one is the NAN. + lsls r3, r1, #1 + lsls r2, r0, #1 + #elif defined(TRAP_NANS) && TRAP_NANS + // Return NAN with the sign bit cleared. + lsrs r0, r2, #1 + b SYM(__fp_check_nan) + #else + lsrs r0, r2, #1 + // Return NAN with the sign bit cleared. + pop { rT, pc } + .cfi_restore_state + #endif + + LLSYM(__fmul_special2): + // $r1 is INF/NAN. In case of INF, check $r0 for NAN. + cmp r2, rT + + #if defined(TRAP_NANS) && TRAP_NANS + // Force swap if $r0 is not NAN. + bls LLSYM(__fmul_swap) + + // $r0 is NAN, keep if $r1 is INF + cmp r3, rT + beq LLSYM(__fmul_special1) + + // Both are NAN, keep the smaller value (more likely to signal). + cmp r2, r3 + #endif + + // Prefer the NAN already in $r0. + // (If TRAP_NANS, this is the smaller NAN). + bhi LLSYM(__fmul_special1) + + LLSYM(__fmul_swap): + movs r0, r1 + + LLSYM(__fmul_special1): + // $r0 is either INF or NAN. $r1 has already been examined. + // Flags are already set correctly. + lsls r2, r0, #1 + cmp r2, rT + beq SYM(__fp_infinity) + + #if defined(TRAP_NANS) && TRAP_NANS + b SYM(__fp_check_nan) + #else + pop { rT, pc } + .cfi_restore_state + #endif + + CFI_END_FUNCTION +FUNC_END mulsf3 +FUNC_END aeabi_fmul + +#endif /* L_arm_mulsf3 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index bfe3397d892..92245353442 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2015,6 +2015,7 @@ LSYM(Lchange_\register): #include "eabi/fneg.S" #include "eabi/fadd.S" #include "eabi/futil.S" +#include "eabi/fmul.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index c57d9ef50ac..682f273a1d2 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -10,7 +10,7 @@ THUMB1_ISA:=$(findstring __ARM_ARCH_ISA_THUMB 1,$(shell $(gcc_compile_bare) -dM # inclusion create when only multiplication is used, thus avoiding pulling in # useless division code. ifneq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) -LIB1ASMFUNCS += _arm_muldf3 _arm_mulsf3 +LIB1ASMFUNCS += _arm_muldf3 endif endif # !__symbian__ @@ -26,6 +26,7 @@ LIB1ASMFUNCS += \ _ctzsi2 \ _paritysi2 \ _popcountsi2 \ + _arm_mulsf3 \ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) # Group 0B: WEAK overridable function objects built for v6m only. 
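In portable terms, the multiplication imported above works the way soft-float multiplies usually do: the sign of the product is the XOR of the operand signs, the biased exponents add (minus the bias), and the 24-bit mantissas go through a 32x32->64 multiplication before normalization and rounding to nearest, ties to even. A rough C model of that fast path is below; it assumes normal, finite operands and a normal result (zeros, subnormals, INF, NAN, overflow and underflow are all handled by separate paths in the assembly), and the names are illustrative only:

#include <stdint.h>
#include <string.h>

static float model_fmul_normals(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    uint32_t sign = (ia ^ ib) & UINT32_C(0x80000000);
    int exp = (int)((ia >> 23) & 0xFF) + (int)((ib >> 23) & 0xFF) - 127;

    /* Mantissas with the implicit '1' restored (normals only). */
    uint64_t ma = (ia & 0x7FFFFF) | 0x800000;
    uint64_t mb = (ib & 0x7FFFFF) | 0x800000;

    /* 32x32 -> 64 bit multiply; the product has 47 or 48 significant bits. */
    uint64_t prod = ma * mb;

    if (prod & (UINT64_C(1) << 47))
        exp += 1;                        /* significand product in [2.0, 4.0) */
    else
        prod <<= 1;                      /* re-align the MSB at bit 47 */

    uint32_t mant = (uint32_t)(prod >> 24);      /* top 24 result bits */
    uint32_t rem  = (uint32_t)(prod & 0xFFFFFF); /* rounding tiebreaker */

    /* Round to nearest, ties to even. */
    if (rem > 0x800000 || (rem == 0x800000 && (mant & 1)))
        mant += 1;
    if (mant & 0x1000000) {              /* rounding carried out to 2.0 */
        mant >>= 1;
        exp += 1;
    }

    uint32_t out = sign | ((uint32_t)exp << 23) | (mant & 0x7FFFFF);
    float r;
    memcpy(&r, &out, sizeof r);
    return r;
}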
From patchwork Mon Oct 31 15:45:23 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59688
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
(ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmVI8087313; Mon, 31 Oct 2022 08:48:31 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 28/34] Import float division from the CM0 library Date: Mon, 31 Oct 2022 08:45:23 -0700 Message-Id: <20221031154529.3627576-29-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fdiv.S (__divsf3, __fp_divloopf): New file. * config/arm/lib1funcs.S: #include eabi/fdiv.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _divsf3 and _fp_divloopf. --- libgcc/config/arm/eabi/fdiv.S | 261 ++++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 2 + 3 files changed, 264 insertions(+) create mode 100644 libgcc/config/arm/eabi/fdiv.S diff --git a/libgcc/config/arm/eabi/fdiv.S b/libgcc/config/arm/eabi/fdiv.S new file mode 100644 index 00000000000..a6d73892b6d --- /dev/null +++ b/libgcc/config/arm/eabi/fdiv.S @@ -0,0 +1,261 @@ +/* fdiv.S: Thumb-1 optimized 32-bit float division + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_divsf3 + +// float __aeabi_fdiv(float, float) +// Returns $r0 after division by $r1. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_fdiv .text.sorted.libgcc.fpcore.n.fdiv +FUNC_ALIAS divsf3 aeabi_fdiv + CFI_START_FUNCTION + + // Standard registers, compatible with exception handling. + push { rT, lr } + .cfi_remember_state + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Save for the sign of the result. 
+ movs r3, r1 + eors r3, r0 + lsrs rT, r3, #31 + lsls rT, #31 + mov ip, rT + + // Set up INF for comparison. + movs rT, #255 + lsls rT, #24 + + // Check for divide by 0. Automatically catches 0/0. + lsls r2, r1, #1 + beq LLSYM(__fdiv_by_zero) + + // Check for INF/INF, or a number divided by itself. + lsls r3, #1 + beq LLSYM(__fdiv_equal) + + // Check the numerator for INF/NAN. + eors r3, r2 + cmp r3, rT + bhs LLSYM(__fdiv_special1) + + // Check the denominator for INF/NAN. + cmp r2, rT + bhs LLSYM(__fdiv_special2) + + // Check the numerator for zero. + cmp r3, #0 + beq SYM(__fp_zero) + + // No action if the numerator is subnormal. + // The mantissa will normalize naturally in the division loop. + lsls r0, #9 + lsrs r1, r3, #24 + beq LLSYM(__fdiv_denominator) + + // Restore the numerator's implicit '1'. + adds r0, #1 + rors r0, r0 + + LLSYM(__fdiv_denominator): + // The denominator must be normalized and left aligned. + bl SYM(__fp_normalize2) + + // 25 bits of precision will be sufficient. + movs rT, #64 + + // Run division. + bl SYM(__fp_divloopf) + b SYM(__fp_assemble) + + LLSYM(__fdiv_equal): + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(DIVISION_INF_BY_INF) + #endif + + // The absolute value of both operands are equal, but not 0. + // If both operands are INF, create a new NAN. + cmp r2, rT + beq SYM(__fp_exception) + + #if defined(TRAP_NANS) && TRAP_NANS + // If both operands are NAN, return the NAN in $r0. + bhi SYM(__fp_check_nan) + #else + bhi LLSYM(__fdiv_return) + #endif + + // Return 1.0f, with appropriate sign. + movs r0, #127 + lsls r0, #23 + add r0, ip + + LLSYM(__fdiv_return): + pop { rT, pc } + .cfi_restore_state + + LLSYM(__fdiv_special2): + // The denominator is either INF or NAN, numerator is neither. + // Also, the denominator is not equal to 0. + // If the denominator is INF, the result goes to 0. + beq SYM(__fp_zero) + + // The only other option is NAN, fall through to branch. + mov r0, r1 + + LLSYM(__fdiv_special1): + #if defined(TRAP_NANS) && TRAP_NANS + // The numerator is INF or NAN. If NAN, return it directly. + bne SYM(__fp_check_nan) + #else + bne LLSYM(__fdiv_return) + #endif + + // If INF, the result will be INF if the denominator is finite. + // The denominator won't be either INF or 0, + // so fall through the exception trap to check for NAN. + movs r0, r1 + + LLSYM(__fdiv_by_zero): + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(DIVISION_0_BY_0) + #endif + + // The denominator is 0. + // If the numerator is also 0, the result will be a new NAN. + // Otherwise the result will be INF, with the correct sign. + lsls r2, r0, #1 + beq SYM(__fp_exception) + + // The result should be NAN if the numerator is NAN. Otherwise, + // the result is INF regardless of the numerator value. + cmp r2, rT + + #if defined(TRAP_NANS) && TRAP_NANS + bhi SYM(__fp_check_nan) + #else + bhi LLSYM(__fdiv_return) + #endif + + // Recreate INF with the correct sign. + b SYM(__fp_infinity) + + CFI_END_FUNCTION +FUNC_END divsf3 +FUNC_END aeabi_fdiv + +#endif /* L_arm_divsf3 */ + + +#ifdef L_fp_divloopf + +// Division helper, possibly to be shared with atan2. +// Expects the numerator mantissa in $r0, exponent in $r1, +// plus the denominator mantissa in $r3, exponent in $r2, and +// a bit pattern in $rT that controls the result precision. +// Returns quotient in $r1, exponent in $r2, pseudo remainder in $r0. +// Subsection ordering within fpcore keeps conditional branches within range. 
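The loop that follows is a classic shift-and-subtract (restoring) division: each iteration compares the running numerator against the denominator, emits one quotient bit, and shifts. Reduced to plain integers (illustrative names, fixed bit count, and none of the short-circuits or the $rT bit-pattern trick used by the assembly), the idea looks like this:

#include <stdint.h>

/* Divide two left-aligned 24-bit mantissas, producing 'bits' quotient bits.
   The final numerator is returned through *rem; it only matters as an
   inexact/tiebreaker indication for rounding. */
static uint32_t model_divloop(uint32_t num, uint32_t den, int bits,
                              uint32_t *rem)
{
    uint32_t quotient = 0;

    for (int i = 0; i < bits; ++i) {
        quotient <<= 1;                  /* make room for the next bit */
        if (num >= den) {                /* denominator fits: emit a 1 */
            quotient |= 1;
            num -= den;
        }
        num <<= 1;                       /* bring down the next bit */
    }

    *rem = num;
    return quotient;
}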
+FUNC_START_SECTION fp_divloopf .text.sorted.libgcc.fpcore.o.fdiv2 + CFI_START_FUNCTION + + // Initialize the exponent, relative to bit[30]. + subs r2, r1, r2 + + SYM(__fp_divloopf2): + // The exponent should be (expN - 127) - (expD - 127) + 127. + // An additional offset of 25 is required to account for the + // minimum number of bits in the result (before rounding). + // However, drop '1' because the offset is relative to bit[30], + // while the result is calculated relative to bit[31]. + adds r2, #(127 + 25 - 1) + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + // Dividing by a power of 2? + lsls r1, r3, #1 + beq LLSYM(__fdiv_simple) + #endif + + // Initialize the result. + eors r1, r1 + + // Clear the MSB, so that when the numerator is smaller than + // the denominator, there is one bit free for a left shift. + // After a single shift, the numerator is guaranteed to be larger. + // The denominator ends up in r3, and the numerator ends up in r0, + // so that the numerator serves as a psuedo-remainder in rounding. + // Shift the numerator one additional bit to compensate for the + // pre-incrementing loop. + lsrs r0, #2 + lsrs r3, #1 + + LLSYM(__fdiv_loop): + // Once the MSB of the output reaches the MSB of the register, + // the result has been calculated to the required precision. + lsls r1, #1 + bmi LLSYM(__fdiv_break) + + // Shift the numerator/remainder left to set up the next bit. + subs r2, #1 + lsls r0, #1 + + // Test if the numerator/remainder is smaller than the denominator, + // do nothing if it is. + cmp r0, r3 + blo LLSYM(__fdiv_loop) + + // If the numerator/remainder is greater or equal, set the next bit, + // and subtract the denominator. + adds r1, rT + subs r0, r3 + + // Short-circuit if the remainder goes to 0. + // Even with the overhead of "subnormal" alignment, + // this is usually much faster than continuing. + bne LLSYM(__fdiv_loop) + + // Compensate the alignment of the result. + // The remainder does not need compensation, it's already 0. + lsls r1, #1 + + LLSYM(__fdiv_break): + RET + + #if !defined(__OPTIMIZE_SIZE__) || !__OPTIMIZE_SIZE__ + LLSYM(__fdiv_simple): + // The numerator becomes the result, with a remainder of 0. 
+ movs r1, r0 + eors r0, r0 + subs r2, #25 + RET + #endif + + CFI_END_FUNCTION +FUNC_END fp_divloopf + +#endif /* L_fp_divloopf */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 92245353442..bd146aedc14 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2016,6 +2016,7 @@ LSYM(Lchange_\register): #include "eabi/fadd.S" #include "eabi/futil.S" #include "eabi/fmul.S" +#include "eabi/fdiv.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 682f273a1d2..1812a1e1a99 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -98,10 +98,12 @@ LIB1ASMFUNCS += \ _arm_eqsf2 \ _arm_gesf2 \ _arm_frsubsf3 \ + _arm_divsf3 \ _fp_exceptionf \ _fp_checknanf \ _fp_assemblef \ _fp_normalizef \ + _fp_divloopf \ endif From patchwork Mon Oct 31 15:45:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59690 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DEE3E388A4E6 for ; Mon, 31 Oct 2022 15:55:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 6EFA8382A2EB for ; Mon, 31 Oct 2022 15:48:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6EFA8382A2EB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 39F3D3200302; Mon, 31 Oct 2022 11:48:46 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Mon, 31 Oct 2022 11:48:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231325; x= 1667317725; bh=eyklWoW0yzFKWXTNFsqk9DQRMa9Zx86FIbFjBCPV1ls=; b=K KIhHiMEkeRTNSDChCytq5GGT5zKgkCUuYQ3nz2Z5YvOFzeYsjKL44Ed0gTSgVrLn jEyOVZDE6qmt2Lem0MMtt6RBPYoGoaYv2IovouInFls72eGESSULhekpxyKM8rVs zm7hnPC3C4JAXKioffOkoCHJO6x2RCw8Sz8Ca0pQZYibIi0yVNbEfL6s1TrLoYsD pUz7TSDZuLtzO0JEAFztp9m/fF5h4s99vXoPlA3WLc4f3EWzutJu3THIfPJs7n67 0zOCWo12Xl9Rj3UW6iv4dMleclbgh0VZEojY/4y0BjB6CQqtU9R4atoGMNneB8Hl VQT8faqqP3adNzXTNUvbA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231325; x=1667317725; bh=eyklWoW0yzFKW XTNFsqk9DQRMa9Zx86FIbFjBCPV1ls=; b=cnAXqYWnLKRoFoqN/9B6l5UTTWW6P YZWjYUNVMJRRNOOlgr/V9I0pgjCCFs1BXL+Ze5mkwEcqoz4oKZBOC40ymTSKfqqn xoZE+5V20ttWLhSoqMvX2hbWqP7TleM0yOtV8TZ1AHDW3CYYcqetIK/Q9Lg7/bGx u9TeeghjJG94BYspm7EbirAguoId8pvmRq4HPjr5CnyWZ8WqpzNQX61OgvYx7fUk M/Sp7sDhqNqjv/bnYg2rIk0fYkZNEZz0LxqCjd83YBZP/OzaPuXhPR181qzADFgB 
GDlii+94HbgMxDkOzP7oAw+4SxoU3/ZIdMh33ZO7grRH4lQjbmOTPD1iQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeevvdffleefgfefuedugefgffettdeiveeggeehiefgvdduvdekfeeijedt tdeggeenucffohhmrghinhepfhhflhhorghtrdhssgdpghhnuhdrohhrghdplhhisgdufh hunhgtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhf rhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:44 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmbrF087316; Mon, 31 Oct 2022 08:48:37 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 29/34] Import integer-to-float conversion from the CM0 library Date: Mon, 31 Oct 2022 08:45:24 -0700 Message-Id: <20221031154529.3627576-30-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (__floatdisf, __floatundisf): Remove obsolete RENAME_LIBRARY directives. * config/arm/eabi/ffloat.S (__aeabi_i2f, __aeabi_l2f, __aeabi_ui2f, __aeabi_ul2f): New file. * config/arm/lib1funcs.S: #include eabi/ffloat.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_floatunsisf, _arm_floatsisf, and _internal_floatundisf. Moved _arm_floatundisf to the weak function group --- libgcc/config/arm/bpabi-lib.h | 6 - libgcc/config/arm/eabi/ffloat.S | 247 ++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 5 +- 4 files changed, 252 insertions(+), 7 deletions(-) create mode 100644 libgcc/config/arm/eabi/ffloat.S diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h index 26ad5ffbe8b..7dd78d5668f 100644 --- a/libgcc/config/arm/bpabi-lib.h +++ b/libgcc/config/arm/bpabi-lib.h @@ -56,9 +56,6 @@ #ifdef L_floatdidf #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdidf, l2d) #endif -#ifdef L_floatdisf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatdisf, l2f) -#endif /* These renames are needed on ARMv6M. Other targets get them from assembly routines. 
*/ @@ -71,9 +68,6 @@ #ifdef L_floatundidf #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d) #endif -#ifdef L_floatundisf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundisf, ul2f) -#endif /* For ARM bpabi, we only want to use a "__gnu_" prefix for the fixed-point helper functions - not everything in libgcc - in the interests of diff --git a/libgcc/config/arm/eabi/ffloat.S b/libgcc/config/arm/eabi/ffloat.S new file mode 100644 index 00000000000..c8bc55a24b6 --- /dev/null +++ b/libgcc/config/arm/eabi/ffloat.S @@ -0,0 +1,247 @@ +/* ffixed.S: Thumb-1 optimized integer-to-float conversion + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_floatsisf + +// float __aeabi_i2f(int) +// Converts a signed integer in $r0 to float. + +// On little-endian cores (including all Cortex-M), __floatsisf() can be +// implemented as below in 5 instructions. However, it can also be +// implemented by prefixing a single instruction to __floatdisf(). +// A memory savings of 4 instructions at a cost of only 2 execution cycles +// seems reasonable enough. Plus, the trade-off only happens in programs +// that require both __floatsisf() and __floatdisf(). Programs only using +// __floatsisf() always get the smallest version. +// When the combined version will be provided, this standalone version +// must be declared WEAK, so that the combined version can supersede it. +// '_arm_floatsisf' should appear before '_arm_floatdisf' in LIB1ASMFUNCS. +// Same parent section as __ul2f() to keep tail call branch within range. +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ +WEAK_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf +WEAK_ALIAS floatsisf aeabi_i2f + CFI_START_FUNCTION + +#else /* !__OPTIMIZE_SIZE__ */ +FUNC_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatsisf +FUNC_ALIAS floatsisf aeabi_i2f + CFI_START_FUNCTION + +#endif /* !__OPTIMIZE_SIZE__ */ + + // Save the sign. + asrs r3, r0, #31 + + // Absolute value of the input. + eors r0, r3 + subs r0, r3 + + // Sign extension to long long unsigned. + eors r1, r1 + b SYM(__internal_floatundisf_noswap) + + CFI_END_FUNCTION +FUNC_END floatsisf +FUNC_END aeabi_i2f + +#endif /* L_arm_floatsisf */ + + +#ifdef L_arm_floatdisf + +// float __aeabi_l2f(long long) +// Converts a signed 64-bit integer in $r1:$r0 to a float in $r0. +// See build comments for __floatsisf() above. +// Same parent section as __ul2f() to keep tail call branch within range. 
+#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ +FUNC_START_SECTION aeabi_i2f .text.sorted.libgcc.fpcore.p.floatdisf +FUNC_ALIAS floatsisf aeabi_i2f + CFI_START_FUNCTION + + #if defined(__ARMEB__) && __ARMEB__ + // __floatdisf() expects a big-endian lower word in $r1. + movs xxl, r0 + #endif + + // Sign extension to long long signed. + asrs xxh, xxl, #31 + + FUNC_ENTRY aeabi_l2f + FUNC_ALIAS floatdisf aeabi_l2f + +#else /* !__OPTIMIZE_SIZE__ */ +FUNC_START_SECTION aeabi_l2f .text.sorted.libgcc.fpcore.p.floatdisf +FUNC_ALIAS floatdisf aeabi_l2f + CFI_START_FUNCTION + +#endif + + // Save the sign. + asrs r3, xxh, #31 + + // Absolute value of the input. + // Could this be arranged in big-endian mode so that this block also + // swapped the input words? Maybe. But, since neither 'eors' nor + // 'sbcs' allow a third destination register, it seems unlikely to + // save more than one cycle. Also, the size of __floatdisf() and + // __floatundisf() together would increase by two instructions. + eors xxl, r3 + eors xxh, r3 + subs xxl, r3 + sbcs xxh, r3 + + b SYM(__internal_floatundisf) + + CFI_END_FUNCTION +FUNC_END floatdisf +FUNC_END aeabi_l2f + +#if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ +FUNC_END floatsisf +FUNC_END aeabi_i2f +#endif + +#endif /* L_arm_floatsisf || L_arm_floatdisf */ + + +#ifdef L_arm_floatunsisf + +// float __aeabi_ui2f(unsigned) +// Converts an unsigned integer in $r0 to float. +FUNC_START_SECTION aeabi_ui2f .text.sorted.libgcc.fpcore.q.floatunsisf +FUNC_ALIAS floatunsisf aeabi_ui2f + CFI_START_FUNCTION + + #if defined(__ARMEB__) && __ARMEB__ + // In big-endian mode, function flow breaks down. __floatundisf() + // wants to swap word order, but __floatunsisf() does not. The + // The choice is between leaving these arguments un-swapped and + // branching, or canceling out the word swap in advance. + // The branching version would require one extra instruction to + // clear the sign ($r3) because of __floatdisf() dependencies. + // While the branching version is technically one cycle faster + // on the Cortex-M0 pipeline, branchless just feels better. + + // Thus, __floatundisf() expects a big-endian lower word in $r1. + movs xxl, r0 + #endif + + // Extend to unsigned long long and fall through. + eors xxh, xxh + +#endif /* L_arm_floatunsisf */ + + +// The execution of __floatunsisf() flows directly into __floatundisf(), such +// that instructions must appear consecutively in the same memory section +// for proper flow control. However, this construction inhibits the ability +// to discard __floatunsisf() when only using __floatundisf(). +// Additionally, both __floatsisf() and __floatdisf() expect to tail call +// __internal_floatundisf() with a sign argument. The __internal_floatundisf() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __floatsisf() or __floatdisf(). +// As a workaround, this block configures __internal_floatundisf() three times. +// The first version provides __internal_floatundisf() as a WEAK standalone +// symbol. The second provides __floatundisf() and __internal_floatundisf(), +// still as weak symbols. The third provides __floatunsisf() normally, but +// __floatundisf() remains weak in case the linker prefers another version. +// '_internal_floatundisf', '_arm_floatundisf', and '_arm_floatunsisf' should +// appear in the given order in LIB1ASMFUNCS. 
+#if defined(L_arm_floatunsisf) || defined(L_arm_floatundisf) || \ + defined(L_internal_floatundisf) + +#define UL2F_SECTION .text.sorted.libgcc.fpcore.q.floatundisf + +#if defined(L_arm_floatundisf) +// float __aeabi_ul2f(unsigned long long) +// Converts an unsigned 64-bit integer in $r1:$r0 to a float in $r0. +WEAK_START_SECTION aeabi_ul2f UL2F_SECTION +WEAK_ALIAS floatundisf aeabi_ul2f + CFI_START_FUNCTION +#elif defined(L_arm_floatunsisf) +FUNC_ENTRY aeabi_ul2f +FUNC_ALIAS floatundisf aeabi_ul2f +#endif + +#if defined(L_arm_floatundisf) || defined(L_arm_floatunsisf) + // Sign is always positive. + eors r3, r3 +#endif + +#if defined(L_arm_floatunsisf) + // float internal_floatundisf(unsigned long long, int) + // Internal function expects the sign of the result in $r3[0]. + FUNC_ENTRY internal_floatundisf + +#elif defined(L_arm_floatundisf) + WEAK_ENTRY internal_floatundisf + +#else /* L_internal_floatundisf */ + WEAK_START_SECTION internal_floatundisf UL2F_SECTION + CFI_START_FUNCTION + +#endif + + #if defined(__ARMEB__) && __ARMEB__ + // Swap word order for register compatibility with __fp_assemble(). + // Could this be optimized by re-defining __fp_assemble()? Maybe. + // But the ramifications of dynamic register assignment on all + // the other callers of __fp_assemble() would be enormous. + eors r0, r1 + eors r1, r0 + eors r0, r1 + #endif + +#ifdef L_arm_floatunsisf + FUNC_ENTRY internal_floatundisf_noswap +#else /* L_arm_floatundisf || L_internal_floatundisf */ + WEAK_ENTRY internal_floatundisf_noswap +#endif + // Default exponent, relative to bit[30] of $r1. + movs r2, #(127 - 1 + 63) + + // Format the sign. + lsls r3, #31 + mov ip, r3 + + push { rT, lr } + b SYM(__fp_assemble) + + CFI_END_FUNCTION +FUNC_END internal_floatundisf_noswap +FUNC_END internal_floatundisf + +#if defined(L_arm_floatundisf) || defined(L_arm_floatunsisf) +FUNC_END floatundisf +FUNC_END aeabi_ul2f +#endif + +#if defined(L_arm_floatunsisf) +FUNC_END floatunsisf +FUNC_END aeabi_ui2f +#endif + +#endif /* L_arm_floatunsisf || L_arm_floatundisf */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index bd146aedc14..67bff9777fd 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2017,6 +2017,7 @@ LSYM(Lchange_\register): #include "eabi/futil.S" #include "eabi/fmul.S" #include "eabi/fdiv.S" +#include "eabi/ffloat.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 1812a1e1a99..645d20f5f1c 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -26,14 +26,17 @@ LIB1ASMFUNCS += \ _ctzsi2 \ _paritysi2 \ _popcountsi2 \ + _arm_floatundisf \ _arm_mulsf3 \ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) # Group 0B: WEAK overridable function objects built for v6m only. 
LIB1ASMFUNCS += \ _internal_cmpsf2 \ + _internal_floatundisf \ _muldi3 \ _arm_addsf3 \ + _arm_floatsisf \ endif @@ -78,7 +81,6 @@ LIB1ASMFUNCS += \ _arm_fixsfsi \ _arm_fixunssfsi \ _arm_floatdisf \ - _arm_floatundisf \ _arm_muldivsf3 \ _arm_negsf2 \ _arm_unordsf2 \ @@ -99,6 +101,7 @@ LIB1ASMFUNCS += \ _arm_gesf2 \ _arm_frsubsf3 \ _arm_divsf3 \ + _arm_floatunsisf \ _fp_exceptionf \ _fp_checknanf \ _fp_assemblef \ From patchwork Mon Oct 31 15:45:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59691 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 022D538A9087 for ; Mon, 31 Oct 2022 15:55:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 530F83952509 for ; Mon, 31 Oct 2022 15:48:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 530F83952509 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 403703200094; Mon, 31 Oct 2022 11:48:51 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Mon, 31 Oct 2022 11:48:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231330; x= 1667317730; bh=uN2X20fDQsJW3B41g5ilUTV3zS4pwprZtMREEpTuOqI=; b=c +AwwUZ8sCILfVwA+E6Rj6iFYh7iGCkHJc2YqzFXn4QpkyAMxj9xZ2MDwWuvUSNc4 C0SU+y8GKFUzTCUm4VzJl9c6C7021+cjv8FAwmvZY2rCepS5CNlo42uHE/r3r55T TJJ5CI8XVYTUao/XsIRgmWm0ml3e7uxZCCx0Pg/0sZHfmLkjkH5EJF7OLbE9Kw1a ZqY8XNX0+FCUdx1pQJpJ9eiZT0RiVWWp851tP4SNQZ+UvsfN491decBYu0ViFgKE hvqS13e0GcBMmWOTUQSmOjh4jYNYCG2NUmSSOshOxTTuhtE4Pm6EaEWdk7sfRbuW JCPSCSKrIZLKQ/v/otHQw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231330; x=1667317730; bh=uN2X20fDQsJW3 B41g5ilUTV3zS4pwprZtMREEpTuOqI=; b=NIepHCCuC+MUs361FePitk0ONfIgz qzRoTc35F7NxOUkWs2R7XEEsMs//cxWI2xavjqDdqWvPjfKufV5anpib3clFe17L W0DbeEHJNea1iKgShkZDi0EFgHNpmmATjKGciBcX03YR+WBFbs6JOdG1zMX3fMEk ji9pYzE8L4D1vcseL2YfLAzJTOxeIcvaq7G2OPL1dAcS1llgjxr9Mgs1Ny3Hm6Yx 6BYGxKcsUktuBhTh7w7M8S9BLtkOVsm/Yv3IeFJ1GPUYRH04mtLvE5rNdrSUv98e keq6hXRZa87+Q7EFVwaKTQlx9GfbbmUIGDqw8oI/8S34VjVv/rE55JJ4Q== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg 
htthgvrhhnpeffgfdtieeufeelvdeugeejtdfgieeuuedtffduleetteevueeiudeigfff ueetheenucffohhmrghinhepfhhfihigvggurdhssgdpghhnuhdrohhrghdplhhisgdufh hunhgtshdrshgsnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhf rhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtghomh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:50 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmgmG087319; Mon, 31 Oct 2022 08:48:42 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 30/34] Import float-to-integer conversion from the CM0 library Date: Mon, 31 Oct 2022 08:45:25 -0700 Message-Id: <20221031154529.3627576-31-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/bpabi-lib.h (muldi3): Removed duplicate. (fixunssfsi) Removed obsolete RENAME_LIBRARY directive. * config/arm/eabi/ffixed.S (__aeabi_f2iz, __aeabi_f2uiz, __aeabi_f2lz, __aeabi_f2ulz): New file. * config/arm/lib1funcs.S: #include eabi/ffixed.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _internal_fixsfdi, _internal_fixsfsi, _arm_fixsfdi, and _arm_fixunssfdi. 
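As background for review (this note and the sketch below are not part of the patch): the truncating conversion implemented by __aeabi_f2iz()/__internal_f2iz() in ffixed.S isolates the sign, exponent and mantissa, restores the implicit '1', shifts right by (158 - exponent), and saturates anything out of range, as described in the comments in the assembly. A rough C equivalent of those steps, with a made-up helper name:

#include <stdint.h>

/* Truncate an IEEE-754 single (passed as raw bits) toward zero.
   Out-of-range values saturate to INT32_MIN/INT32_MAX; NAN becomes 0.
   Illustrative only -- the name and interface are made up.  */
static int32_t
f2iz_sketch (uint32_t bits)
{
  int32_t  sign = (bits & 0x80000000u) ? -1 : 0;
  uint32_t exp  = (bits >> 23) & 0xFF;
  uint32_t mant = (bits << 8) | 0x80000000u;   /* restore implicit '1' */

  if (exp == 0xFF && (bits << 9) != 0)         /* NAN */
    return 0;

  /* Alignment shift: exponent 127 -> 31, exponent 157 -> 1.  Smaller
     exponents flush to zero; larger ones overflow (INT32_MIN included,
     since the saturated result is also the correct conversion).  */
  int shift = 158 - (int) exp;
  if (shift > 31)
    return 0;
  if (shift < 1)
    return sign ? INT32_MIN : INT32_MAX;

  int32_t magnitude = (int32_t) (mant >> shift);
  return (magnitude ^ sign) - sign;            /* apply the sign */
}

The assembly reuses one code path for the signed, unsigned, 32-bit and 64-bit variants by passing a flag in $r1/$r3, as the comments below explain.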
--- libgcc/config/arm/bpabi-lib.h | 6 - libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 4 + 4 files changed, 419 insertions(+), 6 deletions(-) create mode 100644 libgcc/config/arm/eabi/ffixed.S diff --git a/libgcc/config/arm/bpabi-lib.h b/libgcc/config/arm/bpabi-lib.h index 7dd78d5668f..6425c1bad2a 100644 --- a/libgcc/config/arm/bpabi-lib.h +++ b/libgcc/config/arm/bpabi-lib.h @@ -32,9 +32,6 @@ #ifdef L_muldi3 #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul) #endif -#ifdef L_muldi3 -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (muldi3, lmul) -#endif #ifdef L_fixdfdi #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixdfdi, d2lz) \ extern DWtype __fixdfdi (DFtype) __attribute__((pcs("aapcs"))); \ @@ -62,9 +59,6 @@ #ifdef L_fixunsdfsi #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunsdfsi, d2uiz) #endif -#ifdef L_fixunssfsi -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (fixunssfsi, f2uiz) -#endif #ifdef L_floatundidf #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatundidf, ul2d) #endif diff --git a/libgcc/config/arm/eabi/ffixed.S b/libgcc/config/arm/eabi/ffixed.S new file mode 100644 index 00000000000..61c8a0fe1fd --- /dev/null +++ b/libgcc/config/arm/eabi/ffixed.S @@ -0,0 +1,414 @@ +/* ffixed.S: Thumb-1 optimized float-to-integer conversion + + Copyright (C) 2018-2022 Free Software Foundation, Inc. + Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +// The implementation of __aeabi_f2uiz() expects to tail call __internal_f2iz() +// with the flags register set for unsigned conversion. The __internal_f2iz() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __aeabi_f2iz(). Importing an +// archive file that exports __aeabi_f2iz() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2iz() for compilation twice. +// The first version configures __internal_f2iz() as a WEAK standalone symbol, +// and the second exports __aeabi_f2iz() and __internal_f2iz() normally. +// A small bonus: programs only using __aeabi_f2uiz() will be slightly smaller. +// '_internal_fixsfsi' should appear before '_arm_fixsfsi' in LIB1ASMFUNCS. +#if defined(L_arm_fixsfsi) || \ + (defined(L_internal_fixsfsi) && \ + !(defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__)) + +// Subsection ordering within fpcore keeps conditional branches within range. +#define F2IZ_SECTION .text.sorted.libgcc.fpcore.r.fixsfsi + +// int __aeabi_f2iz(float) +// Converts a float in $r0 to signed integer, rounding toward 0. 
+// Values out of range are forced to either INT_MAX or INT_MIN. +// NAN becomes zero. +#ifdef L_arm_fixsfsi +FUNC_START_SECTION aeabi_f2iz F2IZ_SECTION +FUNC_ALIAS fixsfsi aeabi_f2iz + CFI_START_FUNCTION +#endif + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + // Flag for unsigned conversion. + movs r1, #33 + b SYM(__internal_fixsfdi) + + #else /* !__OPTIMIZE_SIZE__ */ + +#ifdef L_arm_fixsfsi + // Flag for signed conversion. + movs r3, #1 + + // [unsigned] int internal_f2iz(float, int) + // Internal function expects a boolean flag in $r1. + // If the boolean flag is 0, the result is unsigned. + // If the boolean flag is 1, the result is signed. + FUNC_ENTRY internal_f2iz + +#else /* L_internal_fixsfsi */ + WEAK_START_SECTION internal_f2iz F2IZ_SECTION + CFI_START_FUNCTION + +#endif + + // Isolate the sign of the result. + asrs r1, r0, #31 + lsls r0, #1 + + #if defined(FP_EXCEPTION) && FP_EXCEPTION + // Check for zero to avoid spurious underflow exception on -0. + beq LLSYM(__f2iz_return) + #endif + + // Isolate the exponent. + lsrs r2, r0, #24 + + #if defined(TRAP_NANS) && TRAP_NANS + // Test for NAN. + // Otherwise, NAN will be converted like +/-INF. + cmp r2, #255 + beq LLSYM(__f2iz_nan) + #endif + + // Extract the mantissa and restore the implicit '1'. Technically, + // this is wrong for subnormals, but they flush to zero regardless. + lsls r0, #8 + adds r0, #1 + rors r0, r0 + + // Calculate mantissa alignment. Given the implicit '1' in bit[31]: + // * An exponent less than 127 will automatically flush to 0. + // * An exponent of 127 will result in a shift of 31. + // * An exponent of 128 will result in a shift of 30. + // * ... + // * An exponent of 157 will result in a shift of 1. + // * An exponent of 158 will result in no shift at all. + // * An exponent larger than 158 will result in overflow. + rsbs r2, #0 + adds r2, #158 + + // When the shift is less than minimum, the result will overflow. + // The only signed value to fail this test is INT_MIN (0x80000000), + // but it will be returned correctly from the overflow branch. + cmp r2, r3 + blt LLSYM(__f2iz_overflow) + + // If unsigned conversion of a negative value, also overflow. + // Would also catch -0.0f if not handled earlier. + cmn r3, r1 + blt LLSYM(__f2iz_overflow) + + #if defined(FP_EXCEPTION) && FP_EXCEPTION + // Save a copy for remainder testing + movs r3, r0 + #endif + + // Truncate the fraction. + lsrs r0, r2 + + // Two's complement negation, if applicable. + // Bonus: the sign in $r1 provides a suitable long long result. + eors r0, r1 + subs r0, r1 + + #if defined(FP_EXCEPTION) && FP_EXCEPTION + // If any bits set in the remainder, raise FE_INEXACT + rsbs r2, #0 + adds r2, #32 + lsls r3, r2 + bne LLSYM(__f2iz_inexact) + #endif + + LLSYM(__f2iz_return): + RET + + LLSYM(__f2iz_overflow): + // Positive unsigned integers (r1 == 0, r3 == 0), return 0xFFFFFFFF. + // Negative unsigned integers (r1 == -1, r3 == 0), return 0x00000000. + // Positive signed integers (r1 == 0, r3 == 1), return 0x7FFFFFFF. + // Negative signed integers (r1 == -1, r3 == 1), return 0x80000000. + // TODO: FE_INVALID exception, (but not for -2^31). + mvns r0, r1 + lsls r3, #31 + eors r0, r3 + RET + + #if defined(FP_EXCEPTION) && FP_EXCEPTION + LLSYM(__f2iz_inexact): + // TODO: Another class of exceptions that doesn't overwrite $r0. 
+ bkpt #0 + + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(CAST_INEXACT) + #endif + + b SYM(__fp_exception) + #endif + + LLSYM(__f2iz_nan): + // Check for INF + lsls r2, r0, #9 + beq LLSYM(__f2iz_overflow) + + #if defined(FP_EXCEPTION) && FP_EXCEPTION + #if defined(EXCEPTION_CODES) && EXCEPTION_CODES + movs r3, #(CAST_UNDEFINED) + #endif + + b SYM(__fp_exception) + #endif + + #if defined(TRAP_NANS) && TRAP_NANS + + // TODO: Extend to long long + + // TODO: bl fp_check_nan + #endif + + // Return long long 0 on NAN. + eors r0, r0 + eors r1, r1 + RET + +FUNC_END internal_f2iz + + #endif /* !__OPTIMIZE_SIZE__ */ + + CFI_END_FUNCTION + +#ifdef L_arm_fixsfsi +FUNC_END fixsfsi +FUNC_END aeabi_f2iz +#endif + +#endif /* L_arm_fixsfsi || L_internal_fixsfsi */ + + +#ifdef L_arm_fixunssfsi + +// unsigned int __aeabi_f2uiz(float) +// Converts a float in $r0 to unsigned integer, rounding toward 0. +// Values out of range are forced to UINT_MAX. +// Negative values and NAN all become zero. +// Subsection ordering within fpcore keeps conditional branches within range. +FUNC_START_SECTION aeabi_f2uiz .text.sorted.libgcc.fpcore.s.fixunssfsi +FUNC_ALIAS fixunssfsi aeabi_f2uiz + CFI_START_FUNCTION + + #if defined(__OPTIMIZE_SIZE__) && __OPTIMIZE_SIZE__ + // Flag for unsigned conversion. + movs r1, #32 + b SYM(__internal_fixsfdi) + + #else /* !__OPTIMIZE_SIZE__ */ + // Flag for unsigned conversion. + movs r3, #0 + b SYM(__internal_f2iz) + + #endif /* !__OPTIMIZE_SIZE__ */ + + CFI_END_FUNCTION +FUNC_END fixunssfsi +FUNC_END aeabi_f2uiz + +#endif /* L_arm_fixunssfsi */ + + +// The implementation of __aeabi_f2ulz() expects to tail call __internal_fixsfdi() +// with the flags register set for unsigned conversion. The __internal_fixsfdi() +// symbol itself is unambiguous, but there is a remote risk that the linker +// will prefer some other symbol in place of __aeabi_f2lz(). Importing an +// archive file that exports __aeabi_f2lz() will throw an error in this case. +// As a workaround, this block configures __aeabi_f2lz() for compilation twice. +// The first version configures __internal_fixsfdi() as a WEAK standalone symbol, +// and the second exports __aeabi_f2lz() and __internal_fixsfdi() normally. +// A small bonus: programs only using __aeabi_f2ulz() will be slightly smaller. +// '_internal_fixsfdi' should appear before '_arm_fixsfdi' in LIB1ASMFUNCS. +#if defined(L_arm_fixsfdi) || defined(L_internal_fixsfdi) + +// Subsection ordering within fpcore keeps conditional branches within range. +#define F2LZ_SECTION .text.sorted.libgcc.fpcore.t.fixsfdi + +// long long aeabi_f2lz(float) +// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0. +// Values out of range are forced to either INT64_MAX or INT64_MIN. +// NAN becomes zero. +#ifdef L_arm_fixsfdi +FUNC_START_SECTION aeabi_f2lz F2LZ_SECTION +FUNC_ALIAS fixsfdi aeabi_f2lz + CFI_START_FUNCTION + + movs r1, #1 + + // [unsigned] long long int internal_fixsfdi(float, int) + // Internal function expects a shift flag in $r1. + // If the shift is flag 0, the result is unsigned. + // If the shift is flag is 1, the result is signed. + // If the shift is flag is 33, the result is signed int. + FUNC_ENTRY internal_fixsfdi + +#else /* L_internal_fixsfdi */ + WEAK_START_SECTION internal_fixsfdi F2LZ_SECTION + CFI_START_FUNCTION + +#endif + + // Split the sign of the result from the mantissa/exponent field. + // Handle +/-0 specially to avoid spurious exceptions. 
+ asrs r3, r0, #31 + lsls r0, #1 + beq LLSYM(__f2lz_zero) + + // If unsigned conversion of a negative value, also overflow. + // Specifically, is the LSB of $r1 clear when $r3 is equal to '-1'? + // + // $r3 (sign) >= $r2 (flag) + // 0xFFFFFFFF false 0x00000000 + // 0x00000000 true 0x00000000 + // 0xFFFFFFFF true 0x80000000 + // 0x00000000 true 0x80000000 + // + // (NOTE: This test will also trap -0.0f, unless handled earlier.) + lsls r2, r1, #31 + cmp r3, r2 + blt LLSYM(__f2lz_overflow) + + // Isolate the exponent. + lsrs r2, r0, #24 + +// #if defined(TRAP_NANS) && TRAP_NANS +// // Test for NAN. +// // Otherwise, NAN will be converted like +/-INF. +// cmp r2, #255 +// beq LLSYM(__f2lz_nan) +// #endif + + // Calculate mantissa alignment. Given the implicit '1' in bit[31]: + // * An exponent less than 127 will automatically flush to 0. + // * An exponent of 127 will result in a shift of 63. + // * An exponent of 128 will result in a shift of 62. + // * ... + // * An exponent of 189 will result in a shift of 1. + // * An exponent of 190 will result in no shift at all. + // * An exponent larger than 190 will result in overflow + // (189 in the case of signed integers). + rsbs r2, #0 + adds r2, #190 + // When the shift is less than minimum, the result will overflow. + // The only signed value to fail this test is INT_MIN (0x80000000), + // but it will be returned correctly from the overflow branch. + cmp r2, r1 + blt LLSYM(__f2lz_overflow) + + // Extract the mantissa and restore the implicit '1'. Technically, + // this is wrong for subnormals, but they flush to zero regardless. + lsls r0, #8 + adds r0, #1 + rors r0, r0 + + // Calculate the upper word. + // If the shift is greater than 32, gives an automatic '0'. + movs r1, r0 + lsrs r1, r2 + + // Reduce the shift for the lower word. + // If the original shift was less than 32, the result may be split + // between the upper and lower words. + subs r2, #32 + blt LLSYM(__f2lz_split) + + // Shift is still positive, keep moving right. + lsrs r0, r2 + + // TODO: Remainder test. + // $r1 is technically free, as long as it's zero by the time + // this is over. + + LLSYM(__f2lz_return): + // Two's complement negation, if the original was negative. + eors r0, r3 + eors r1, r3 + subs r0, r3 + sbcs r1, r3 + RET + + LLSYM(__f2lz_split): + // Shift was negative, calculate the remainder + rsbs r2, #0 + lsls r0, r2 + b LLSYM(__f2lz_return) + + LLSYM(__f2lz_zero): + eors r1, r1 + RET + + LLSYM(__f2lz_overflow): + // Positive unsigned integers (r3 == 0, r1 == 0), return 0xFFFFFFFF. + // Negative unsigned integers (r3 == -1, r1 == 0), return 0x00000000. + // Positive signed integers (r3 == 0, r1 == 1), return 0x7FFFFFFF. + // Negative signed integers (r3 == -1, r1 == 1), return 0x80000000. + // TODO: FE_INVALID exception, (but not for -2^63). + mvns r0, r3 + + // For 32-bit results + lsls r2, r1, #26 + lsls r1, #31 + ands r2, r1 + eors r0, r2 + + eors r1, r0 + RET + + CFI_END_FUNCTION +FUNC_END internal_fixsfdi + +#ifdef L_arm_fixsfdi +FUNC_END fixsfdi +FUNC_END aeabi_f2lz +#endif + +#endif /* L_arm_fixsfdi || L_internal_fixsfdi */ + + +#ifdef L_arm_fixunssfdi + +// unsigned long long __aeabi_f2ulz(float) +// Converts a float in $r0 to a 64 bit integer in $r1:$r0, rounding toward 0. +// Values out of range are forced to UINT64_MAX. +// Negative values and NAN all become zero. +// Subsection ordering within fpcore keeps conditional branches within range. 
+FUNC_START_SECTION aeabi_f2ulz .text.sorted.libgcc.fpcore.u.fixunssfdi +FUNC_ALIAS fixunssfdi aeabi_f2ulz + CFI_START_FUNCTION + + eors r1, r1 + b SYM(__internal_fixsfdi) + + CFI_END_FUNCTION +FUNC_END fixunssfdi +FUNC_END aeabi_f2ulz + +#endif /* L_arm_fixunssfdi */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 67bff9777fd..22619516eaf 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2017,6 +2017,7 @@ LSYM(Lchange_\register): #include "eabi/futil.S" #include "eabi/fmul.S" #include "eabi/fdiv.S" +#include "eabi/ffixed.S" #include "eabi/ffloat.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 645d20f5f1c..6b0bb642ef5 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -34,6 +34,8 @@ ifeq (__ARM_ARCH_ISA_THUMB 1,$(ARM_ISA)$(THUMB1_ISA)) LIB1ASMFUNCS += \ _internal_cmpsf2 \ _internal_floatundisf \ + _internal_fixsfdi \ + _internal_fixsfsi \ _muldi3 \ _arm_addsf3 \ _arm_floatsisf \ @@ -102,6 +104,8 @@ LIB1ASMFUNCS += \ _arm_frsubsf3 \ _arm_divsf3 \ _arm_floatunsisf \ + _arm_fixsfdi \ + _arm_fixunssfdi \ _fp_exceptionf \ _fp_checknanf \ _fp_assemblef \ From patchwork Mon Oct 31 15:45:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59689 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CE7B738555BA for ; Mon, 31 Oct 2022 15:53:16 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 8D3AB3829BCF for ; Mon, 31 Oct 2022 15:48:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8D3AB3829BCF Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 7C6523200302; Mon, 31 Oct 2022 11:48:56 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Mon, 31 Oct 2022 11:48:56 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231336; x= 1667317736; bh=qQ7mGDy4DYPcvGJDAtP2u8S0+mX7buFubUHNgzObVcg=; b=1 5IuD3LZWVNmzdpgzYEsagI6Rd/1o/+kU5gOfbAWt0HVk6ngJe4opMINX3q5Z32Yw I5jfLW/MkxsSvJVDNMFxNTQ/3b8gsNn9wXxfFnhR2Rv4Mr1ElvRDzRf1o9Ry8cuS kO4N0JnEJbpQoiRvJEUa3lfZGWwMNKLSdMxLBl4uVXRzVxvedbPJRMpVfZR0A3TU h866zilzsxmndr0IMXHzhb6NviwNM6BIEBPZughQ2mggA3DA31t6Xanub1atc0CD 8kc/AhaIJJSc6c68tT9c3AAHH5x1fyzOMHTmXfid2CYf+3QZ0ttS1LDQppI0IgFf 4/GRnUygwH50ATHWBVmRA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231336; x=1667317736; bh=qQ7mGDy4DYPcv 
GJDAtP2u8S0+mX7buFubUHNgzObVcg=; b=gAHVIPElStqpZTIGCb3+JI4v0E9OF CnstkTFx2ocxZvCyJRCttVcPkS3/nCWLiSiJ3xXSi/ZhBjYPJ19htpKylqpvBaIx E7FjCtCpb4ICxEw9YJcsfyNlk8I7h2AFFcbRKPCNqi3PSf1e3SaIAt6yaBVbixSO 40w7zwAtEKeTtCrkECfQbgzrX6eVakvM9Lcl6am+yQSNiE8EUmjUAlmYbYd/Xyxo XrYo6iSZkZcFj8r5OcJoSK87DLvXoR0vVgqQzKChlrmFqoTWRdPlVXUcbIeNjn7i N5pHe3+D4Onls4zGf1+/5f2QttaRux+fnO5sA05DBOT4gQynRCWle5NsA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdekudcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpefhtdetueeiieffffdtheeuueeigfeliedujeetteetgedtfeeuffdvfefh hfelveenucffohhmrghinhepfhgtrghsthdrshgspdhgnhhurdhorhhgpdhlihgsudhfuh hntghsrdhssgenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhr ohhmpehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhm X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:48:55 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmlZh087322; Mon, 31 Oct 2022 08:48:47 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 31/34] Import float<->double conversion from the CM0 library Date: Mon, 31 Oct 2022 08:45:26 -0700 Message-Id: <20221031154529.3627576-32-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_d2f, __aeabi_f2d): New file. * config/arm/lib1funcs.S: #include eabi/fcast.S (v6m only). * config/arm/t-elf (LIB1ASMFUNCS): Added _arm_d2f and _arm_f2d. --- libgcc/config/arm/eabi/fcast.S | 256 +++++++++++++++++++++++++++++++++ libgcc/config/arm/lib1funcs.S | 1 + libgcc/config/arm/t-elf | 2 + 3 files changed, 259 insertions(+) create mode 100644 libgcc/config/arm/eabi/fcast.S diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S new file mode 100644 index 00000000000..f0d1373d31a --- /dev/null +++ b/libgcc/config/arm/eabi/fcast.S @@ -0,0 +1,256 @@ +/* fcast.S: Thumb-1 optimized 32- and 64-bit float conversions + + Copyright (C) 2018-2022 Free Software Foundation, Inc. 
+ Contributed by Daniel Engel, Senva Inc (gnu@danielengel.com) + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + + +#ifdef L_arm_f2d + +// double __aeabi_f2d(float) +// Converts a single-precision float in $r0 to double-precision in $r1:$r0. +// Rounding, overflow, and underflow are impossible. +// INF and ZERO are returned unmodified. +FUNC_START_SECTION aeabi_f2d .text.sorted.libgcc.fpcore.v.f2d +FUNC_ALIAS extendsfdf2 aeabi_f2d + CFI_START_FUNCTION + + // Save the sign. + lsrs r1, r0, #31 + lsls r1, #31 + + // Set up registers for __fp_normalize2(). + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Test for zero. + lsls r0, #1 + beq LLSYM(__f2d_return) + + // Split the exponent and mantissa into separate registers. + // This is the most efficient way to convert subnormals in the + // half-precision form into normals in single-precision. + // This does add a leading implicit '1' to INF and NAN, + // but that will be absorbed when the value is re-assembled. + movs r2, r0 + bl SYM(__fp_normalize2) __PLT__ + + // Set up the exponent bias. For INF/NAN values, the bias + // is 1791 (2047 - 255 - 1), where the last '1' accounts + // for the implicit '1' in the mantissa. + movs r0, #3 + lsls r0, #9 + adds r0, #255 + + // Test for INF/NAN, promote exponent if necessary + cmp r2, #255 + beq LLSYM(__f2d_indefinite) + + // For normal values, the exponent bias is 895 (1023 - 127 - 1), + // which is half of the prepared INF/NAN bias. + lsrs r0, #1 + + LLSYM(__f2d_indefinite): + // Assemble exponent with bias correction. + adds r2, r0 + lsls r2, #20 + adds r1, r2 + + // Assemble the high word of the mantissa. + lsrs r0, r3, #11 + add r1, r0 + + // Remainder of the mantissa in the low word of the result. + lsls r0, r3, #21 + + LLSYM(__f2d_return): + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END extendsfdf2 +FUNC_END aeabi_f2d + +#endif /* L_arm_f2d */ + + +#if defined(L_arm_d2f) || defined(L_arm_truncdfsf2) + +// HACK: Build two separate implementations: +// * __aeabi_d2f() rounds to nearest per traditional IEEE-753 rules. +// * __truncdfsf2() rounds towards zero per GCC specification. +// Presumably, a program will consistently use one ABI or the other, +// which means that code size will not be duplicated in practice. +// Merging two versions with dynamic rounding would be rather hard. 
+#ifdef L_arm_truncdfsf2 + #define D2F_NAME truncdfsf2 + #define D2F_SECTION .text.sorted.libgcc.fpcore.x.truncdfsf2 +#else + #define D2F_NAME aeabi_d2f + #define D2F_SECTION .text.sorted.libgcc.fpcore.w.d2f +#endif + +// float __aeabi_d2f(double) +// Converts a double-precision float in $r1:$r0 to single-precision in $r0. +// Values out of range become ZERO or INF; returns the upper 23 bits of NAN. +FUNC_START_SECTION D2F_NAME D2F_SECTION + CFI_START_FUNCTION + + // Save the sign. + lsrs r2, r1, #31 + lsls r2, #31 + mov ip, r2 + + // Isolate the exponent (11 bits). + lsls r2, r1, #1 + lsrs r2, #21 + + // Isolate the mantissa. It's safe to always add the implicit '1' -- + // even for subnormals -- since they will underflow in every case. + lsls r1, #12 + adds r1, #1 + rors r1, r1 + lsrs r3, r0, #21 + adds r1, r3 + + #ifndef L_arm_truncdfsf2 + // Fix the remainder. Even though the mantissa already has 32 bits + // of significance, this value still influences rounding ties. + lsls r0, #11 + #endif + + // Test for INF/NAN (r3 = 2047) + mvns r3, r2 + lsrs r3, #21 + cmp r3, r2 + beq LLSYM(__d2f_indefinite) + + // Adjust exponent bias. Offset is 127 - 1023, less 1 more since + // __fp_assemble() expects the exponent relative to bit[30]. + lsrs r3, #1 + subs r2, r3 + adds r2, #126 + + #ifndef L_arm_truncdfsf2 + LLSYM(__d2f_overflow): + // Use the standard formatting for overflow and underflow. + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + b SYM(__fp_assemble) + .cfi_restore_state + + #else /* L_arm_truncdfsf2 */ + // In theory, __truncdfsf2() could also push registers and branch to + // __fp_assemble() after calculating the truncation shift and clearing + // bits. __fp_assemble() always rounds down if there is no remainder. + // However, after doing all of that work, the incremental cost to + // finish assembling the return value is only 6 or 7 instructions + // (depending on how __d2f_overflow() returns). + // This seems worthwhile to avoid linking in all of __fp_assemble(). + + // Test for INF. + cmp r2, #254 + bge LLSYM(__d2f_overflow) + + #if defined(FP_EXCEPTIONS) && FP_EXCEPTIONS + // Preserve inexact zero. + orrs r0, r1 + #endif + + // HACK: Pre-empt the default round-to-nearest mode, + // since GCC specifies rounding towards zero. + // Start by identifying subnormals by negative exponents. + asrs r3, r2, #31 + ands r3, r2 + + // Clear the exponent field if the result is subnormal. + eors r2, r3 + + // Add the subnormal shift to the nominal 8 bits of standard remainder. + // Also, saturate the low byte if the shift is larger than 32 bits. + // Anything larger would flush to zero anyway, and the shift + // innstructions only examine the low byte of the second operand. + // Basically: + // x = (-x + 8 > 32) ? 255 : (-x + 8) + // x = (x + 24 < 0) ? 255 : (-x + 8) + // x = (x + 24 < 0) ? 255 : (-(x + 24) + 32) + adds r3, #24 + asrs r0, r3, #31 + subs r3, #32 + rsbs r3, #0 + orrs r3, r0 + + // Clear the insignificant bits. + lsrs r1, r3 + + // Combine the mantissa and the exponent. + lsls r2, #23 + adds r0, r1, r2 + + // Combine with the saved sign. + add r0, ip + RET + + LLSYM(__d2f_overflow): + // Construct signed INF in $r0. + movs r0, #255 + lsls r0, #23 + add r0, ip + RET + + #endif /* L_arm_truncdfsf2 */ + + LLSYM(__d2f_indefinite): + // Test for INF. If the mantissa, exclusive of the implicit '1', + // is equal to '0', the result will be INF. 
+ lsls r3, r1, #1 + orrs r3, r0 + beq LLSYM(__d2f_overflow) + + // TODO: Support for TRAP_NANS here. + // This will be double precision, not compatible with the current handler. + + // Construct NAN with the upper 22 bits of the mantissa, setting bit[21] + // to ensure a valid NAN without changing bit[22] (quiet) + subs r2, #0xD + lsls r0, r2, #20 + lsrs r1, #8 + orrs r0, r1 + + #if defined(STRICT_NANS) && STRICT_NANS + // Yes, the NAN was probably altered, but at least keep the sign... + add r0, ip + #endif + + RET + + CFI_END_FUNCTION +FUNC_END D2F_NAME + +#endif /* L_arm_d2f || L_arm_truncdfsf2 */ + diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index 22619516eaf..28a5f4d5c86 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -2019,6 +2019,7 @@ LSYM(Lchange_\register): #include "eabi/fdiv.S" #include "eabi/ffixed.S" #include "eabi/ffloat.S" +#include "eabi/fcast.S" #endif /* NOT_ISA_TARGET_32BIT */ #include "eabi/lcmp.S" #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/t-elf b/libgcc/config/arm/t-elf index 6b0bb642ef5..434a7a85598 100644 --- a/libgcc/config/arm/t-elf +++ b/libgcc/config/arm/t-elf @@ -106,6 +106,8 @@ LIB1ASMFUNCS += \ _arm_floatunsisf \ _arm_fixsfdi \ _arm_fixunssfdi \ + _arm_d2f \ + _arm_f2d \ _fp_exceptionf \ _fp_checknanf \ _fp_assemblef \ From patchwork Mon Oct 31 15:45:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Engel X-Patchwork-Id: 59692 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 46A5E38FA3E7 for ; Mon, 31 Oct 2022 15:56:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id 460CD39540B1 for ; Mon, 31 Oct 2022 15:49:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 460CD39540B1 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 315DA320093B; Mon, 31 Oct 2022 11:49:02 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:49:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231341; x= 1667317741; bh=Dz3AQRa/6CmfsofMYU6FbMAnKJN+F35UMuQThQLndsI=; b=Q PG/q3msJriB5t8ug34QAwqfYlW4q7vPoE9el+Nplctjq6nj36fR4SqhYXKb0HqvU xaHi1cTVcB5LpXA/woixf0jBZjITq1lXxjw0xpwFAoYzCrY2ZPBiCUmoFcYb/i7N YyE7UrFq74h2moAJ+Ju+SaBwyUrRXySdWhSoQWkcVlNQsPZGVhB4LUTJEkd1mk2+ 5u/1UNXebXd6A5aS7h/+Hpnqyi8jh3EX5lkRrac9smp/oIRXYvA1lWNIKlubmM0L dHJPSQdck9C3/aNnas4q0p46e4nJYpKbtlKIuHggf/yTOnAlvP1KFj3wTcNUq44A PcY5I6ZD7JPcE/bvIWyNQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject 
:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231341; x=1667317741; bh=Dz3AQRa/6Cmfs ofMYU6FbMAnKJN+F35UMuQThQLndsI=; b=ZuCe1+d/O1jaer7XLDz2V/ISJpefA COtq0QEt7/gJ8C8ntM3IESDKWQ3lF+3D4jOczp1cM5CbeZ6fY0pr0Ztx/GoX5uYo vPCd7Zprjv3awRMl5FLLp4GNSvdPf0MQFZKGXwd3liFkxFqEpCo23/rma2/pQc80 KZl7aGprZayyY/ytgaLaIswSjNGcSe8uWY2axIOrD7dgJXHjTj2gnClgk2MRU85C qn8cWmbIRMIp5BdSJ2WVf5GaqBcPxTd0c6bCEmVc/Cz2Z3IxucFU1CBcwORct57B 92nomOCm5w28qAtB0m3sqsUqE1unwia2GwsM7q11WBKrt21GPDopD6R+A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrudefgdektdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeffrghnihgv lhcugfhnghgvlhcuoehgnhhusegurghnihgvlhgvnhhgvghlrdgtohhmqeenucggtffrrg htthgvrhhnpeehveeguedvvefhvdelveeuvefhueelveeitddufffgudeufefghefhgeeh udffkeenucffohhmrghinhepfhgtrghsthdrshgsnecuvehluhhsthgvrhfuihiivgeptd enucfrrghrrghmpehmrghilhhfrhhomhepghhnuhesuggrnhhivghlvghnghgvlhdrtgho mh X-ME-Proxy: Feedback-ID: i791144d6:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 31 Oct 2022 11:49:00 -0400 (EDT) Received: from ubuntu.lorien.danielengel.com (ubuntu.lorien.danielengel.com [10.0.0.96]) by sendmail.lorien.danielengel.com (8.15.2/8.15.2) with ESMTP id 29VFmqXs087325; Mon, 31 Oct 2022 08:48:52 -0700 (PDT) (envelope-from gnu@danielengel.com) From: Daniel Engel To: Richard Earnshaw , gcc-patches@gcc.gnu.org Subject: [PATCH v7 32/34] Import float<->__fp16 conversion from the CM0 library Date: Mon, 31 Oct 2022 08:45:27 -0700 Message-Id: <20221031154529.3627576-33-gnu@danielengel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com> References: <20221031154529.3627576-1-gnu@danielengel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, JMQ_SPF_NEUTRAL, RCVD_IN_DNSWL_LOW, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Engel , Christophe Lyon Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" gcc/libgcc/ChangeLog: 2022-10-09 Daniel Engel * config/arm/eabi/fcast.S (__aeabi_h2f, __aeabi_f2h): Added functions. * config/arm/fp16 (__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative, __gnu_h2f_alternative): Disable build for v6m multilibs. * config/arm/t-bpabi (LIB1ASMFUNCS): Added _aeabi_f2h_ieee, _aeabi_h2f_ieee, _aeabi_f2h_alt, and _aeabi_h2f_alt (v6m only). 
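For review context (also not part of the patch): in IEEE mode, __aeabi_h2f() re-biases the 5-bit half-precision exponent into the 8-bit single-precision field and moves the 10-bit mantissa up by 13 bits, the same net arithmetic as the C fallback __gnu_h2f_internal() in fp16.c. A hedged sketch of the normal, INF and NAN cases (subnormal halves take the extra normalization step that the assembly delegates to __fp_normalize2()):

#include <stdint.h>

/* Convert an IEEE half (1 sign / 5 exponent, bias 15 / 10 mantissa bits)
   to single precision (1 / 8, bias 127 / 23).  Illustrative only -- the
   name is made up, and subnormal inputs are omitted.  */
static uint32_t
h2f_sketch (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000) << 16;
  uint32_t exp  = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x3FF;

  if (exp == 0x1F)                      /* INF or NAN */
    return sign | 0x7F800000u | (mant << 13);
  if (exp == 0)                         /* zero (or an omitted subnormal) */
    return sign;

  /* Normal value: re-bias the exponent by 127 - 15 = 112 (0x70).  */
  return sign | ((exp + 112) << 23) | (mant << 13);
}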
--- libgcc/config/arm/eabi/fcast.S | 277 +++++++++++++++++++++++++++++++++ libgcc/config/arm/fp16.c | 4 + libgcc/config/arm/t-bpabi | 7 + 3 files changed, 288 insertions(+) diff --git a/libgcc/config/arm/eabi/fcast.S b/libgcc/config/arm/eabi/fcast.S index f0d1373d31a..09876a95767 100644 --- a/libgcc/config/arm/eabi/fcast.S +++ b/libgcc/config/arm/eabi/fcast.S @@ -254,3 +254,280 @@ FUNC_END D2F_NAME #endif /* L_arm_d2f || L_arm_truncdfsf2 */ + +#if defined(L_aeabi_h2f_ieee) || defined(L_aeabi_h2f_alt) + +#ifdef L_aeabi_h2f_ieee + #define H2F_NAME aeabi_h2f + #define H2F_ALIAS gnu_h2f_ieee +#else + #define H2F_NAME aeabi_h2f_alt + #define H2F_ALIAS gnu_h2f_alternative +#endif + +// float __aeabi_h2f(short hf) +// float __aeabi_h2f_alt(short hf) +// Converts a half-precision float in $r0 to single-precision. +// Rounding, overflow, and underflow conditions are impossible. +// In IEEE mode, INF, ZERO, and NAN are returned unmodified. +FUNC_START_SECTION H2F_NAME .text.sorted.libgcc.h2f +FUNC_ALIAS H2F_ALIAS H2F_NAME + CFI_START_FUNCTION + + // Set up registers for __fp_normalize2(). + push { rT, lr } + .cfi_remember_state + .cfi_adjust_cfa_offset 8 + .cfi_rel_offset rT, 0 + .cfi_rel_offset lr, 4 + + // Save the mantissa and exponent. + lsls r2, r0, #17 + + // Isolate the sign. + lsrs r0, #15 + lsls r0, #31 + + // Align the exponent at bit[24] for normalization. + // If zero, return the original sign. + lsrs r2, #3 + + #ifdef __HAVE_FEATURE_IT + do_it eq + RETc(eq) + #else + beq LLSYM(__h2f_return) + #endif + + // Split the exponent and mantissa into separate registers. + // This is the most efficient way to convert subnormals in the + // half-precision form into normals in single-precision. + // This does add a leading implicit '1' to INF and NAN, + // but that will be absorbed when the value is re-assembled. + bl SYM(__fp_normalize2) __PLT__ + + #ifdef L_aeabi_h2f_ieee + // Set up the exponent bias. For INF/NAN values, the bias is 223, + // where the last '1' accounts for the implicit '1' in the mantissa. + adds r2, #(255 - 31 - 1) + + // Test for INF/NAN. + cmp r2, #254 + + #ifdef __HAVE_FEATURE_IT + do_it ne + #else + beq LLSYM(__h2f_assemble) + #endif + + // For normal values, the bias should have been 111. + // However, this offset must be adjusted per the INF check above. + IT(sub,ne) r2, #((255 - 31 - 1) - (127 - 15 - 1)) + + #else /* L_aeabi_h2f_alt */ + // Set up the exponent bias. All values are normal. + adds r2, #(127 - 15 - 1) + #endif + + LLSYM(__h2f_assemble): + // Combine exponent and sign. + lsls r2, #23 + adds r0, r2 + + // Combine mantissa. + lsrs r3, #8 + add r0, r3 + + LLSYM(__h2f_return): + pop { rT, pc } + .cfi_restore_state + + CFI_END_FUNCTION +FUNC_END H2F_NAME +FUNC_END H2F_ALIAS + +#endif /* L_aeabi_h2f_ieee || L_aeabi_h2f_alt */ + + +#if defined(L_aeabi_f2h_ieee) || defined(L_aeabi_f2h_alt) + +#ifdef L_aeabi_f2h_ieee + #define F2H_NAME aeabi_f2h + #define F2H_ALIAS gnu_f2h_ieee +#else + #define F2H_NAME aeabi_f2h_alt + #define F2H_ALIAS gnu_f2h_alternative +#endif + +// short __aeabi_f2h(float f) +// short __aeabi_f2h_alt(float f) +// Converts a single-precision float in $r0 to half-precision, +// rounding to nearest, ties to even. +// Values out of range are forced to either ZERO or INF. +// In IEEE mode, the upper 12 bits of a NAN will be preserved. +FUNC_START_SECTION F2H_NAME .text.sorted.libgcc.f2h +FUNC_ALIAS F2H_ALIAS F2H_NAME + CFI_START_FUNCTION + + // Set up the sign. + lsrs r2, r0, #31 + lsls r2, #15 + + // Save the exponent and mantissa. 
+ // If ZERO, return the original sign. + lsls r0, #1 + + #ifdef __HAVE_FEATURE_IT + do_it ne,t + addne r0, r2 + RETc(ne) + #else + beq LLSYM(__f2h_return) + #endif + + // Isolate the exponent. + lsrs r1, r0, #24 + + #ifdef L_aeabi_f2h_ieee + // Check for NAN. + cmp r1, #255 + beq LLSYM(__f2h_indefinite) + + // Check for overflow. + cmp r1, #(127 + 15) + bhi LLSYM(__f2h_overflow) + + #else /* L_aeabi_f2h_alt */ + // Detect overflow. + subs r1, #(127 + 16) + rsbs r3, r1, $0 + asrs r3, #31 + + // Saturate the mantissa on overflow. + bics r0, r3 + lsrs r3, #17 + orrs r0, r3 + bcs LLSYM(__f2h_return) + + #endif /* L_aeabi_f2h_alt */ + + // Isolate the mantissa, adding back the implicit '1'. + lsls r0, #8 + adds r0, #1 + rors r0, r0 + + // Adjust exponent bias for half-precision, including '1' to + // account for the mantissa's implicit '1'. + #ifdef L_aeabi_f2h_ieee + subs r1, #(127 - 15 + 1) + #else + adds r1, #((127 + 16) - (127 - 15 + 1)) + #endif + + bmi LLSYM(__f2h_underflow) + + // This next part is delicate. The rouncing check requires a scratch + // register, but the sign can't be merged in until after the final + // overflow check below. Prepare the exponent. + // The mantissa and exponent can be combined, but the exponent + // must be prepared now while the flags don't matter. + lsls r1, #10 + + // Split the mantissa (11 bits) and remainder (13 bits). + lsls r3, r0, #12 + lsrs r0, #21 + + // Combine mantissa and exponent without affecting flags. + add r0, r1 + + LLSYM(__f2h_round): + // If the carry bit is '0', always round down. + #ifdef __HAVE_FEATURE_IT + do_it cs,t + addcs r0, r2 + RETc(cs) + #else + bcc LLSYM(__f2h_return) + #endif + + // Carry was set. If a tie (no remainder) and the + // LSB of the result is '0', round down (to even). + lsls r1, r0, #31 + orrs r1, r3 + + #ifdef __HAVE_FEATURE_IT + do_it ne + #else + beq LLSYM(__f2h_return) + #endif + + // Round up, ties to even. + IT(add,ne) r0, #1 + + #ifndef L_aeabi_f2h_ieee + // HACK: The result may overflow to -0 not INF in alt mode. + // Subtract overflow to reverse. + lsrs r3, r0, #15 + subs r0, r3 + #endif + + LLSYM(__f2h_return): + // Combine mantissa and exponent with the sign. + adds r0, r2 + RET + + LLSYM(__f2h_underflow): + // Align the remainder. The remainder consists of the last 12 bits + // of the mantissa plus the magnitude of underflow. + movs r3, r0 + adds r1, #12 + lsls r3, r1 + + // Align the mantissa. The MSB of the remainder must be + // shifted out last into the 'C' flag for rounding. + subs r1, #33 + rsbs r1, #0 + lsrs r0, r1 + b LLSYM(__f2h_round) + + #ifdef L_aeabi_f2h_ieee + LLSYM(__f2h_overflow): + // Create single-precision INF from which to construct half-precision. + movs r0, #255 + lsls r0, #24 + + LLSYM(__f2h_indefinite): + // Check for INF. + lsls r3, r0, #8 + + #ifdef __HAVE_FEATURE_IT + do_it ne,t + #else + beq LLSYM(__f2h_infinite) + #endif + + // HACK: The ARM specification states "the least significant 13 bits + // of a NAN are lost in the conversion." But what happens when the + // NAN-ness of the value resides in these 13 bits? + // Set bit[8] to ensure NAN without changing bit[9] (quiet). + IT(add,ne) r2, #128 + IT(add,ne) r2, #128 + + LLSYM(__f2h_infinite): + // Construct the result from the upper 11 bits of the mantissa + // and the lower 5 bits of the exponent. + lsls r0, #3 + lsrs r0, #17 + + // Combine with the sign (and possibly NAN flag). 
+ orrs r0, r2 + RET + + #endif /* L_aeabi_f2h_ieee */ + + CFI_END_FUNCTION +FUNC_END F2H_NAME +FUNC_END F2H_ALIAS + +#endif /* L_aeabi_f2h_ieee || L_aeabi_f2h_alt */ + diff --git a/libgcc/config/arm/fp16.c b/libgcc/config/arm/fp16.c index 39004a4fe1f..afde9a97d57 100644 --- a/libgcc/config/arm/fp16.c +++ b/libgcc/config/arm/fp16.c @@ -198,6 +198,8 @@ __gnu_h2f_internal(unsigned short a, int ieee) return sign | (((aexp + 0x70) << 23) + (mantissa << 13)); } +#if (__ARM_ARCH_ISA_ARM) || (__ARM_ARCH_ISA_THUMB > 1) + unsigned short __gnu_f2h_ieee(unsigned int a) { @@ -222,6 +224,8 @@ __gnu_h2f_alternative(unsigned short a) return __gnu_h2f_internal(a, 0); } +#endif /* NOT_ISA_TARGET_32BIT */ + unsigned short __gnu_d2h_ieee (unsigned long long a) { diff --git a/libgcc/config/arm/t-bpabi b/libgcc/config/arm/t-bpabi index 86234d5676f..1b1ecfc638e 100644 --- a/libgcc/config/arm/t-bpabi +++ b/libgcc/config/arm/t-bpabi @@ -1,6 +1,13 @@ # Add the bpabi.S functions. LIB1ASMFUNCS += _aeabi_lcmp _aeabi_ulcmp _aeabi_ldivmod _aeabi_uldivmod +# Only enabled for v6m. +ARM_ISA:=$(findstring __ARM_ARCH_ISA_ARM,$(shell $(gcc_compile_bare) -dM -E - X-Patchwork-Id: 59686 X-Patchwork-Delegate: rearnsha@gcc.gnu.org Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9A54938E8052 for ; Mon, 31 Oct 2022 15:52:21 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from wout3-smtp.messagingengine.com (wout3-smtp.messagingengine.com [64.147.123.19]) by sourceware.org (Postfix) with ESMTPS id C2400382A2E7 for ; Mon, 31 Oct 2022 15:49:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C2400382A2E7 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=danielengel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=danielengel.com Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id B0DC53200917; Mon, 31 Oct 2022 11:49:06 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Mon, 31 Oct 2022 11:49:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=danielengel.com; h=cc:cc:content-transfer-encoding:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1667231346; x= 1667317746; bh=bKElXalJZQMT+CjSWucYChGinYakKFZjJVQNXBweoh0=; b=a s8vXXB3FhY09IC/cB4W3oRajuZOckR3J4x/Zd3w8/U0st6VAF2mjbWGGdZbW75p0 +bx+k3cJHqjuXlqNM9rZmW0CgL39MP3HcG9TP+xsPop2hOR944mNV4Qm2kioZvC4 AEykQ4GZn90YpEgL9odmg6MpBJ6mWXp5u4XgzRNFccepsSUj5POZ/zbzxCLuQIhq EeTdf7I1B8MH09HSizBph+GpVtwBPxom2KUYqMrmVPVr2afSS09TpuQPr/aGTYRj eoV4QhgrclpEcv1uEnkaKwhV/x1Qc/kk1EXETlJO7x/BYV3fGsSqTVheaWfe1sR/ 1jt6/eXgQPXrDOet3lXlg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; t=1667231346; x=1667317746; bh=bKElXalJZQMT+ CjSWucYChGinYakKFZjJVQNXBweoh0=; b=QKVUiprgRqBgm/A3QTqrFEWa+1sY3 IUzyI2ndg9NbTjrq+s0pfE/9+jiIUumZ+S3wn8f8reMZUU9XGjldH9WaT0IZYpGe g+JvPXb9VQyVFeMcQT/Zr0lkYP/olhlAuke6QDsfcrFQl39iWs7EznfNRc5xjucs cHxHfd9AmJybn2rNZ3IFvWeplXRAfD61uD6Hlf2zNSWekzSytxydlXbLOTv/2/Ki 
From patchwork Mon Oct 31 15:45:28 2022
X-Patchwork-Id: 59686
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 33/34] Drop single-precision Thumb-1 soft-float functions
Date: Mon, 31 Oct 2022 08:45:28 -0700
Message-Id: <20221031154529.3627576-34-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

With the complete CM0 library integrated, regression testing showed new
failures with the message "compilation failed to produce executable":

    gcc.dg/fixed-point/convert-float-1.c
    gcc.dg/fixed-point/convert-float-3.c
    gcc.dg/fixed-point/convert-sat.c

Investigating, this appears to be caused by the linker.  I can't find a
comprehensive linker specification to claim this is actually a bug, but it
certainly doesn't match my expectations.  Digging further, I found issues
with the link order of these symbols:

    * __aeabi_fmul()
    * __aeabi_f2d()
    * __aeabi_f2iz()

Specifically, I expect the linker to import the _first_ definition of any
symbol.  This is the basic behavior that allows the soft-float library to
supply missing symbols on architectures without optimized routines.
Comparing the v6-m multilib with the default, I see symbol exports for all
of the affected symbols:

    gcc-obj/gcc/libgcc.a:

    // assembly routines
    _arm_mulsf3.o:
    00000000 W __aeabi_fmul
    00000000 W __mulsf3

    _arm_addsubdf3.o:
    00000368 T __aeabi_f2d
    00000368 T __extendsfdf2

    _arm_fixsfsi.o:
    00000000 T __aeabi_f2iz
    00000000 T __fixsfsi

    mulsf3.o:
    fixsfsi.o:
    extendsfdf2.o:

    gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a:

    // assembly routines
    _arm_mulsf3.o:
    00000000 T __aeabi_fmul
             U __fp_assemble
             U __fp_exception
             U __fp_infinity
             U __fp_zero
    00000000 T __mulsf3
             U __umulsidi3

    _arm_fixsfsi.o:
    00000000 T __aeabi_f2iz
    00000000 T __fixsfsi
    00000002 T __internal_f2iz

    _arm_f2d.o:
    00000000 T __aeabi_f2d
    00000000 T __extendsfdf2
             U __fp_normalize2

    // soft-float library
    mulsf3.o:
    00000000 T __aeabi_fmul

    fixsfsi.o:
    00000000 T __aeabi_f2iz

    extendsfdf2.o:
    00000000 T __aeabi_f2d

Given the order of the archive file, I expect the linker to import the
affected functions from the _arm_* archive elements.  For "convert-sat.c",
all is well with -march=armv7-m:

    ...
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_muldf3.o
    OK> (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_mulsf3.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_cmpsf2.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_fixsfsi.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_fixunssfsi.o
    OK> (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_addsubdf3.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_cmpdf2.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_fixdfsi.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_arm_fixunsdfsi.o
    OK> (/home/mirdan/gcc-obj/gcc/libgcc.a)_fixsfdi.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_fixdfdi.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_fixunssfdi.o
        (/home/mirdan/gcc-obj/gcc/libgcc.a)_fixunsdfdi.o
    ...

However, with -march=armv6s-m, the linker imports these symbols from the
soft-float library.  (NOTE: The CM0 library only implements single-precision
float operations, so imports from muldf3.o, fixdfsi.o, etc. are expected.)

    ...
    ??> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)mulsf3.o
    ??> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)fixsfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)muldf3.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)fixdfsi.o
    ??> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)extendsfdf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_clzsi2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmpge.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmple.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixsfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunssfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunssfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpdf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixdfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)eqdf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)gedf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)ledf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)subdf3.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)floatunsidf.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpsf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixsfsi.o
    ...

It seems that the order in which the linker resolves symbols matters.  In
the affected test cases, the linker begins searching for fixed-point
function symbols first: _subQQ.o, _cmpQQ.o, etc.
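For reference, the three affected helpers correspond to ordinary
single-precision operations.  On a soft-float target such as
-march=armv6s-m, GCC lowers code like the following illustrative snippet to
calls to them (the same idea as the volatile references added to
convert-sat.c below):

    /* Illustration only: each function compiles to a call to the
       corresponding AEABI helper on a single-precision soft-float target.  */
    float  mul_f (float a, float b) { return a * b; }       /* __aeabi_fmul */
    double ext_f (float a)          { return (double) a; }  /* __aeabi_f2d  */
    int    fix_f (float a)          { return (int) a; }     /* __aeabi_f2iz */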
The fixed-point archive elements appear after the _arm_* archive elements,
so the initial definitions of the floating point functions are discarded.
However, the fixed-point functions contain unresolved symbol references
which the linker registers progressively.  Given that the default libgcc.a
does not build the soft-float library [1], the linker cannot import any
floating point objects until the second pass.  However, when
v6-m/nofp/libgcc.a _does_ include the soft-float library, the linker
proceeds to import some floating point objects during the first pass.

To test this theory, add explicit symbol references to convert-sat.c:

--- a/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
@@ -11,6 +11,12 @@ extern void abort (void);
 int main ()
 {
+  volatile float a = 1.0;
+  volatile float b = 2.0;
+  volatile float c = a * b;
+  volatile double d = a;
+  volatile int e = a;
+
   SAT_CONV1 (short _Accum, hk);
   SAT_CONV1 (_Accum, k);
   SAT_CONV1 (long _Accum, lk);

Afterwards, the linker imports the expected symbols:

    ...
    ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_mulsf3.o
    ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_muldi3.o
    ==> (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_f2d.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_exceptionf.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_assemblef.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fp_normalizef.o
    ...
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)muldf3.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)fixdfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_clzsi2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmpge.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fcmple.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixsfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_fixunssfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpdf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfsi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixdfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_fixunsdfdi.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)eqdf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)gedf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)ledf2.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)subdf3.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)floatunsidf.o
        (/home/mirdan/gcc-obj/gcc/thumb/v6-m/nofp/libgcc.a)_arm_cmpsf2.o
    ...

At a minimum, this behavior results in the use of non-preferred code in an
affected application.  However, as long as each object exports a single
entry point, this does not automatically result in a build failure.
Indeed, in the case of __aeabi_fmul() and __aeabi_f2d(), all references
seem to resolve uniformly in favor of the soft-float library.  The first
pass that imports the soft-float version of __aeabi_f2iz() also succeeds.

However, the first pass fails to find __aeabi_f2uiz(), since the soft-float
library does not implement this variant, so that symbol remains undefined
until the second pass.  The assembly version of __aeabi_f2uiz() that the
linker eventually finds happens to be implemented as a branch to
__internal_f2iz() [2].  But the linker, importing __internal_f2iz(), also
finds the main entry point __aeabi_f2iz().
And, since __aeabi_f2iz() was already found in the soft-float library, the
linker throws an error.

The solution is twofold.  First, the assembly routines have separately been
made robust against this potential error condition (by weakening and
splitting symbols).  Second, this commit blocks the single-precision
functions from the soft-float library, making it impossible for the linker
to select a non-preferred version.  Two duplicate symbols remain
(extendsfdf2 and truncdfsf2), but the situation is much improved.

[1] softfp_wrap_start = "#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1"

[2] (These operations share a substantial portion of their code path, so
    this choice leads to a size reduction in programs that use both
    functions.)

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel <gnu@danielengel.com>

	* config/arm/t-softfp (softfp_float_modes): Added as "df".
---
 libgcc/config/arm/t-softfp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgcc/config/arm/t-softfp b/libgcc/config/arm/t-softfp
index 554ec9bc47b..bd6a4642e5f 100644
--- a/libgcc/config/arm/t-softfp
+++ b/libgcc/config/arm/t-softfp
@@ -1,2 +1,4 @@
 softfp_wrap_start := '\#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1'
 softfp_wrap_end := '\#endif'
+softfp_float_modes := df
+
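As a side note, footnote [2] refers to the shared code path behind
__aeabi_f2iz() and __aeabi_f2uiz().  A rough C model of that structure is
sketched below; the names only mirror the symbols discussed above, and the
real assembly's handling of out-of-range and negative inputs is not
reproduced here.

    #include <stdint.h>

    /* Shared truncation of |x| to an integer magnitude (sketch only).  */
    static uint32_t
    internal_f2iz_model (uint32_t bits)
    {
      int exp = (int) ((bits >> 23) & 0xFFu) - 127;
      uint32_t frac = (bits & 0x007FFFFFu) | 0x00800000u;

      if (exp < 0)
        return 0;                      /* |x| < 1.0 truncates to 0 */
      if (exp > 31)
        return 0xFFFFFFFFu;            /* out of range; real code differs */
      return (exp <= 23) ? (frac >> (23 - exp)) : (frac << (exp - 23));
    }

    int32_t                            /* like __aeabi_f2iz */
    f2iz_model (uint32_t bits)
    {
      uint32_t mag = internal_f2iz_model (bits);
      return (bits & 0x80000000u) ? (int32_t) (0u - mag) : (int32_t) mag;
    }

    uint32_t                           /* like __aeabi_f2uiz */
    f2uiz_model (uint32_t bits)
    {
      /* Negative inputs simply clamp to 0 in this sketch.  */
      return (bits & 0x80000000u) ? 0u : internal_f2iz_model (bits);
    }

Because both entry points funnel into the same core, providing them from one
archive member is a size win, which is why that member ends up exporting
both symbols.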
From patchwork Mon Oct 31 15:45:29 2022
X-Patchwork-Submitter: Daniel Engel
X-Patchwork-Id: 59693
X-Patchwork-Delegate: rearnsha@gcc.gnu.org
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw, gcc-patches@gcc.gnu.org
Cc: Daniel Engel, Christophe Lyon
Subject: [PATCH v7 34/34] Add -mpure-code support to the CM0 functions.
Date: Mon, 31 Oct 2022 08:45:29 -0700
Message-Id: <20221031154529.3627576-35-gnu@danielengel.com>
In-Reply-To: <20221031154529.3627576-1-gnu@danielengel.com>
References: <20221031154529.3627576-1-gnu@danielengel.com>

gcc/libgcc/ChangeLog:
2022-10-09 Daniel Engel <gnu@danielengel.com>

	* Makefile.in (MPURE_CODE): New macro defines __PURE_CODE__.
	(gcc_compile): Appended MPURE_CODE.
	* lib1funcs.S (FUNC_START_SECTION): Set flags for __PURE_CODE__.
	* clz2.S (__clzsi2): Added -mpure-code compatible instructions.
	* ctz2.S (__ctzsi2): Same.
	* popcnt.S (__popcountsi2, __popcountdi2): Same.
---
 libgcc/Makefile.in            |  5 ++++-
 libgcc/config/arm/clz2.S      | 25 ++++++++++++++++++++++-
 libgcc/config/arm/ctz2.S      | 38 +++++++++++++++++++++++++++++++++--
 libgcc/config/arm/lib1funcs.S |  7 ++++++-
 libgcc/config/arm/popcnt.S    | 33 +++++++++++++++++++++++++-----
 5 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 1fe708a93f7..da2da7046cc 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -307,6 +307,9 @@ CRTSTUFF_CFLAGS = -O2 $(GCC_CFLAGS) $(INCLUDES) $(MULTILIB_CFLAGS) -g0 \
 # Extra flags to use when compiling crt{begin,end}.o.
 CRTSTUFF_T_CFLAGS =
 
+# Pass the -mpure-code flag into assembly for conditional compilation.
+MPURE_CODE = $(if $(findstring -mpure-code,$(CFLAGS)), -D__PURE_CODE__)
+
 MULTIDIR := $(shell $(CC) $(CFLAGS) -print-multi-directory)
 MULTIOSDIR := $(shell $(CC) $(CFLAGS) -print-multi-os-directory)
 
@@ -316,7 +319,7 @@ inst_slibdir = $(slibdir)$(MULTIOSSUBDIR)
 
 gcc_compile_bare = $(CC) $(INTERNAL_CFLAGS) $(CFLAGS-$(
. */
+#if defined(L_popcountdi2) || defined(L_popcountsi2)
+
+.macro ldmask reg, temp, value
+  #if defined(__PURE_CODE__) && (__PURE_CODE__)
+    #ifdef NOT_ISA_TARGET_32BIT
+        movs    \reg,   \value
+        lsls    \temp,  \reg,   #8
+        orrs    \reg,   \temp
+        lsls    \temp,  \reg,   #16
+        orrs    \reg,   \temp
+    #else
+        // Assumption: __PURE_CODE__ only supports M-profile.
+        movw    \reg,   ((\value) * 0x101)
+        movt    \reg,   ((\value) * 0x101)
+    #endif
+  #else
+        ldr     \reg,   =((\value) * 0x1010101)
+  #endif
+.endm
+
+#endif
+
+
 #ifdef L_popcountdi2
 
 // int __popcountdi2(int)
@@ -49,7 +72,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
   #else /* !__OPTIMIZE_SIZE__ */
 
     // Load the one-bit alternating mask.
-    ldr     r3,     =0x55555555
+    ldmask  r3,     r2,     0x55
 
    // Reduce the second word.
    lsrs    r2,     r1,     #1
@@ -62,7 +85,7 @@ FUNC_START_SECTION popcountdi2 .text.sorted.libgcc.popcountdi2
    subs    r0,     r2
 
    // Load the two-bit alternating mask.
-    ldr     r3,     =0x33333333
+    ldmask  r3,     r2,     0x33
 
    // Reduce the second word.
    lsrs    r2,     r1,     #2
@@ -140,7 +163,7 @@ FUNC_ENTRY popcountsi2
   #else /* !__OPTIMIZE_SIZE__ */
 
    // Load the one-bit alternating mask.
-    ldr     r3,     =0x55555555
+    ldmask  r3,     r2,     0x55
 
    // Reduce the word.
    lsrs    r1,     r0,     #1
@@ -148,7 +171,7 @@ FUNC_ENTRY popcountsi2
    subs    r0,     r1
 
    // Load the two-bit alternating mask.
-    ldr     r3,     =0x33333333
+    ldmask  r3,     r2,     0x33
 
    // Reduce the word.
    lsrs    r1,     r0,     #2
@@ -158,7 +181,7 @@ FUNC_ENTRY popcountsi2
    adds    r0,     r1
 
    // Load the four-bit alternating mask.
-    ldr     r3,     =0x0F0F0F0F
+    ldmask  r3,     r2,     0x0F
 
    // Reduce the word.
    lsrs    r1,     r0,     #4
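For reference, the mask-based reduction that these hunks convert to use
ldmask is the usual branch-free population count.  A C model of the
algorithm is sketched below (illustration only; the final byte-summing step
is not visible in the hunks above and may be done differently in the
assembly):

    #include <stdint.h>

    int
    popcount_model (uint32_t x)
    {
      x = x - ((x >> 1) & 0x55555555u);                   /* 2-bit sums */
      x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);   /* 4-bit sums */
      x = (x + (x >> 4)) & 0x0F0F0F0Fu;                   /* 8-bit sums */
      return (int) ((x * 0x01010101u) >> 24);             /* add the bytes */
    }

Under -mpure-code, the constants cannot be loaded from a literal pool, which
is why ldmask rebuilds each replicated mask (0x55 -> 0x55555555, and so on)
from immediates instead of using ldr with an =constant.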