From patchwork Mon Dec 27 19:04:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Engel <gnu@danielengel.com>
X-Patchwork-Id: 49283
From: Daniel Engel <gnu@danielengel.com>
To: Richard Earnshaw <Richard.Earnshaw@foss.arm.com>, gcc-patches@gcc.gnu.org
Subject: [PATCH v6 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex
 M0
Date: Mon, 27 Dec 2021 11:04:56 -0800
Message-Id: <20211227190530.3136549-1-gnu@danielengel.com>
Cc: Daniel Engel <gnu@danielengel.com>,
 Christophe Lyon <christophe.lyon@linaro.org>

Hi Richard, 

I am re-submitting my libgcc patch from last year: 

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html 

I clearly missed the stage1 window again.  However, since the patch rebased
cleanly onto gcc-12 with no regressions, and it's not quite stage4 yet, I
figured it was worth a try.

Regards,
Daniel
---

Changes since v5:

    * Rebased and tested with gcc-12

Regression test results for -march={armv4t,armv6s-m,armv7-m,armv7-a}, clean master:

    # of expected passes            513596
    # of unexpected failures        38829
    # of unexpected successes       16
    # of expected failures          3450
    # of unresolved testcases       1108
    # of unsupported tests          28224

Patched master:

    # of expected passes            513596
    # of unexpected failures        38829
    # of unexpected successes       16
    # of expected failures          3450
    # of unresolved testcases       1108
    # of unsupported tests          28224

---

This patch series adds an assembly-language implementation of IEEE-754-compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  There
are improvements to most of the EABI integer functions as well.  This is the
libgcc component of a larger library project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs. 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That's roughly a 90% size reduction.

I have extensive test vectors [2], and this patch passes all tests on an STM32F051.
These vectors were derived from UCB [3], Testfloat [4], and IEEECC754 [5], plus
many of my own generation.
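Vectors like these are most useful when results are compared bit for bit, because a floating-point == treats -0 and +0 as equal and never matches a NaN. A minimal harness sketch for single-precision addition follows. The vector layout (operands and expected result as bit patterns), the sample vectors, and the names are hypothetical and do not reflect the format of [2]. When built for a soft-float Cortex M0 target, the addition compiles to a call to __aeabi_fadd; on a hosted machine it simply exercises the host's floating-point hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical vector layout: operands and expected result as IEEE-754
   single-precision bit patterns. */
struct fadd_vector { uint32_t a, b, expected; };

static const struct fadd_vector fadd_vectors[] = {
    { 0x3F800000u, 0x40000000u, 0x40400000u }, /* 1.0 + 2.0 = 3.0 */
    { 0x3DCCCCCDu, 0x3E4CCCCDu, 0x3E99999Au }, /* 0.1f + 0.2f rounds to 0.3f */
    { 0x80000000u, 0x00000000u, 0x00000000u }, /* -0 + +0 = +0 (round to nearest) */
    { 0x80000000u, 0x80000000u, 0x80000000u }, /* -0 + -0 = -0 */
    { 0x7F7FFFFFu, 0x7F7FFFFFu, 0x7F800000u }, /* FLT_MAX + FLT_MAX overflows to +inf */
    { 0x7F800000u, 0xFF800000u, 0x7FC00000u }, /* +inf + -inf = NaN */
};

static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Exponent all ones and a nonzero fraction: any NaN, regardless of sign or payload. */
static int is_nan_bits(uint32_t u)
{
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0;
}

/* Returns the number of mismatching vectors (0 means all passed). */
int run_fadd_vectors(void)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof fadd_vectors / sizeof fadd_vectors[0]; i++) {
        /* volatile prevents compile-time folding, so the runtime routine is exercised */
        volatile float a = bits_to_float(fadd_vectors[i].a);
        volatile float b = bits_to_float(fadd_vectors[i].b);
        uint32_t got = float_to_bits(a + b);
        uint32_t want = fadd_vectors[i].expected;

        /* NaNs match by class only; everything else must match exactly */
        if (is_nan_bits(want) ? !is_nan_bits(got) : got != want)
            failures++;
    }
    return failures;
}
```

A driver would call run_fadd_vectors() and report a nonzero count. NaN results are matched by class only, because the sign and payload of a generated NaN can differ between implementations.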

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib only.  It
    is likely that some other architectures would benefit from these routines.
    However, I have NOT profiled the existing implementations (ieee754-sf.S) to
    estimate where improvements may be found.

    * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
    There may be useful bits in [1] that can be integrated.
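Such a test need not call the EABI helpers by name: on ARM EABI targets, GCC normally lowers 64-bit / and % to calls to __aeabi_uldivmod (unsigned) and __aeabi_ldivmod (signed). A minimal sketch in plain C follows. The function name check_divmod64 and the vectors are hypothetical, not taken from the patch or the GCC testsuite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectors: dividend, divisor, expected quotient and remainder. */
struct u64_case { uint64_t n, d, q, r; };
struct s64_case { int64_t n, d, q, r; };

static const struct u64_case u64_cases[] = {
    { 10u, 3u, 3u, 1u },
    { 5u, UINT64_MAX, 0u, 5u },
    { UINT64_MAX, 1u, UINT64_MAX, 0u },
    { UINT64_MAX, UINT64_MAX, 1u, 0u },
    { UINT64_MAX, 0x100000000u, 0xFFFFFFFFu, 0xFFFFFFFFu },
    { 0x8000000000000000u, 3u, 0x2AAAAAAAAAAAAAAAu, 2u },
};

/* C99 division truncates toward zero; the remainder takes the dividend's sign. */
static const struct s64_case s64_cases[] = {
    { -7, 2, -3, -1 },
    { 7, -2, -3, 1 },
    { -7, -2, 3, -1 },
    { INT64_MIN, 1, INT64_MIN, 0 },
    { INT64_MIN, INT64_MAX, -1, -1 },
    { INT64_MAX, -1, -INT64_MAX, 0 },
};

/* Returns the number of failing cases (0 means all passed). */
int check_divmod64(void)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof u64_cases / sizeof u64_cases[0]; i++) {
        /* volatile keeps the compiler from folding the division at build time */
        volatile uint64_t n = u64_cases[i].n, d = u64_cases[i].d;
        uint64_t q = n / d, r = n % d;
        /* also check the identity n == q * d + r */
        if (q != u64_cases[i].q || r != u64_cases[i].r || q * d + r != n)
            failures++;
    }

    for (i = 0; i < sizeof s64_cases / sizeof s64_cases[0]; i++) {
        volatile int64_t n = s64_cases[i].n, d = s64_cases[i].d;
        int64_t q = n / d, r = n % d;
        if (q != s64_cases[i].q || r != s64_cases[i].r)
            failures++;
    }
    return failures;
}
```

INT64_MIN / -1 is deliberately absent: the quotient overflows, which is undefined behavior in C.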

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
__fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
__truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
__aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                  8       exact
__aeabi_f2h                     84                  23..34              0       <= 0.5 ulp
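The OPTIMIZE_SIZE rows trade cycles for bytes in familiar ways. For instance, the input-dependent cycle range of the size-optimized __popcountsi2 (9..201) suggests a loop, while the faster variant runs in constant time. The C reference models below sketch the usual technique on each side of such trade-offs and may help when cross-checking results; they are illustrative only, not the patch's assembly, and the ref_ names are hypothetical. Assumptions: ref_clzsi2(0) returns 32 and ref_ctzsi2(0) returns -1, whereas libgcc leaves __clzsi2 and __ctzsi2 undefined for a zero input; ref_umulsidi3 assumes only a 32x32->32 multiply is available, as with MULS on ARMv6-M.

```c
#include <stdint.h>

/* Leading zeros by binary search over halves.
   Returns 32 for x == 0 (libgcc leaves __clzsi2(0) undefined). */
int ref_clzsi2(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8;  x <<= 8; }
    if ((x >> 28) == 0) { n += 4;  x <<= 4; }
    if ((x >> 30) == 0) { n += 2;  x <<= 2; }
    if ((x >> 31) == 0) { n += 1; }
    return n;
}

/* Trailing zeros: isolate the lowest set bit, then count from the top.
   Returns -1 for x == 0 (libgcc leaves __ctzsi2(0) undefined). */
int ref_ctzsi2(uint32_t x)
{
    return 31 - ref_clzsi2(x & (0u - x));
}

/* ffs: 1-based index of the lowest set bit, or 0 when x == 0. */
int ref_ffssi2(uint32_t x)
{
    return x ? ref_ctzsi2(x) + 1 : 0;
}

/* Redundant sign bits: leading bits after the sign bit that equal it. */
int ref_clrsbsi2(int32_t x)
{
    uint32_t u = (uint32_t)x;
    if (u & 0x80000000u) u = ~u;   /* fold negatives so the sign bit is 0 */
    return ref_clzsi2(u) - 1;      /* clz(0) == 32 yields 31 for 0 and -1 */
}

/* Size-oriented popcount: one iteration per set bit, so time depends on x. */
int ref_popcount_loop(uint32_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Speed-oriented popcount: constant-time bit-parallel (SWAR) reduction. */
int ref_popcount_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((x * 0x01010101u) >> 24);   /* sum the four byte counts */
}

/* Parity: XOR-fold the word down to one bit. */
int ref_paritysi2(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* 32x32->64 multiply from four 16x16->32 partial products. */
uint64_t ref_umulsidi3(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
    /* Middle column is at most 3 * 0xFFFF, so it cannot overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return ((uint64_t)hi << 32) | lo;
}
```

Comparing ref_popcount_loop against ref_popcount_swar over many inputs is a cheap self-check; the loop form is smaller, but its running time grows with the number of set bits.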

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel


[1] // Test program for size comparison

    extern int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64-bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html