[v7,00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

Message ID 20221031154529.3627576-1-gnu@danielengel.com
Series: libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

Daniel Engel Oct. 31, 2022, 3:44 p.m. UTC
Hi Richard,

I am re-submitting my libgcc patch from 2021:

    https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
    https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html

I believe I have finally made the stage1 window. 

Regards,
Daniel

---

Changes since v6:

    * Rebased and tested with gcc-13

There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
Clean master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

Patched master:

    # of expected passes            529397
    # of unexpected failures        41160
    # of unexpected successes       12
    # of expected failures          3442
    # of unresolved testcases       978
    # of unsupported tests          28993

---

This patch series adds an assembly-language implementation of IEEE-754-compliant
single-precision functions designed for the Cortex M0 (v6m) architecture.  It also
improves most of the EABI integer functions.  This is the libgcc component of a
larger library project originally proposed in 2018:

    https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html

As one point of comparison, a test program [1] links 916 bytes from libgcc with
the patched toolchain vs 10276 bytes with the gcc-arm-none-eabi-9-2020-q2
toolchain.  That is a size reduction of about 91%.
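The reduction figure follows from the two link sizes quoted above. A minimal sketch of the arithmetic in C; `size_reduction_pct` and `print_size_reduction` are hypothetical helpers written only for this illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which `after` is smaller than `before`.
 * Hypothetical helper, for illustration only. */
double size_reduction_pct(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Sizes quoted in the cover letter:
 * 10276 bytes (gcc-arm-none-eabi-9-2020-q2) vs 916 bytes (patched). */
void print_size_reduction(void)
{
    printf("%.1f%%\n", size_reduction_pct(10276.0, 916.0));
}
```

Calling `print_size_reduction()` prints `91.1%`, which agrees with the roughly 90% figure.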

I have extensive test vectors [2], and this patch passes all tests on an STM32F051.
These vectors were derived from UCB [3], TestFloat [4], and IEEECC754 [5], plus
many that I generated myself.
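When checking results against vectors like these, comparing raw bit patterns is the usual approach, and it also gives a direct way to quantify the `<= 0.5 ulp` entries in the accuracy column of the table. The sketch below is not taken from the CM0 test harness; `float_to_ordinal` and `ulp_distance` are hypothetical names. It maps each float's bit pattern onto a monotonic integer line so that adjacent representable values differ by exactly 1:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float's IEEE-754 bit pattern onto an integer line that is
 * monotonic in the float's value, so adjacent floats differ by 1.
 * -0.0 and +0.0 both map to 0. */
static int64_t float_to_ordinal(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* type-pun without UB */
    int64_t mag = (int64_t)(bits & 0x7FFFFFFFu);
    return (bits & 0x80000000u) ? -mag : mag;
}

/* Distance in units in the last place between two non-NaN floats
 * (infinities are allowed).  NaNs must be handled by a separate rule. */
int64_t ulp_distance(float computed, float expected)
{
    assert(!isnan(computed) && !isnan(expected));
    int64_t d = float_to_ordinal(computed) - float_to_ordinal(expected);
    return d < 0 ? -d : d;
}
```

Against a reference vector that was itself rounded to nearest, ties to even, a correctly rounded implementation should report a distance of 0. NaN results need a separate comparison rule, since IEEE-754 permits different NaN payloads.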

There may be some follow-on projects worth discussing:

    * The library is currently integrated into the ARM v6s-m multilib only.  It
    is likely that some other architectures would benefit from these routines.
    However, I have NOT profiled the existing implementations (ieee754-sf.S) to
    estimate where improvements may be found.

    * GCC currently lacks tests for some functions, such as __aeabi_[u]ldivmod().
    There may be useful bits in [1] that can be integrated.
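As an illustration of what such tests might check (a sketch only, not an existing GCC testcase): `__aeabi_uldivmod` and `__aeabi_ldivmod` return the quotient and remainder together in registers, so from C they are usually exercised through `/` and `%` on 64-bit operands, which the compiler typically lowers to those calls on EABI targets. Any correct implementation must satisfy the invariants checked here; `check_udivmod64` and `check_sdivmod64` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit division invariants (d != 0):
 *   n == q * d + r  and  r < d. */
void check_udivmod64(uint64_t n, uint64_t d)
{
    assert(d != 0);
    uint64_t q = n / d;
    uint64_t r = n % d;
    assert(q * d + r == n);
    assert(r < d);
}

/* Signed variant.  C99 truncates toward zero, so the remainder takes the
 * sign of the dividend and |r| < |d|.  INT64_MIN / -1 overflows (undefined
 * behavior) and is excluded. */
void check_sdivmod64(int64_t n, int64_t d)
{
    assert(d != 0 && !(n == INT64_MIN && d == -1));
    int64_t q = n / d;
    int64_t r = n % d;
    assert(q * d + r == n);               /* |q * d| <= |n|, no overflow */
    assert(r == 0 || (r < 0) == (n < 0));
    /* Compare magnitudes in unsigned arithmetic to avoid overflow. */
    uint64_t ar = r < 0 ? -(uint64_t)r : (uint64_t)r;
    uint64_t ad = d < 0 ? -(uint64_t)d : (uint64_t)d;
    assert(ar < ad);
}
```

Edge-case operands such as all-ones dividends, divisors near powers of two, and `INT64_MIN` are natural candidates for such vectors.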

On Cortex M0, the library has (approximately) the following properties:

Function(s)                     Size (bytes)        Cycles              Stack   Accuracy
__clzsi2                        50                  20                  0       exact
__clzsi2 (OPTIMIZE_SIZE)        22                  51                  0       exact
__clzdi2                        8+__clzsi2          4+__clzsi2          0       exact

__clrsbsi2                      8+__clzsi2          6+__clzsi2          0       exact
__clrsbdi2                      18+__clzsi2         (8..10)+__clzsi2    0       exact

__ctzsi2                        52                  21                  0       exact
__ctzsi2 (OPTIMIZE_SIZE)        24                  52                  0       exact
__ctzdi2                        8+__ctzsi2          5+__ctzsi2          0       exact

__ffssi2                        8                   6..(5+__ctzsi2)     0       exact
__ffsdi2                        14+__ctzsi2         9..(8+__ctzsi2)     0       exact

__popcountsi2                   52                  25                  0       exact
__popcountsi2 (OPTIMIZE_SIZE)   14                  9..201              0       exact
__popcountdi2                   34+__popcountsi2    46                  0       exact
__popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2    17..401             0       exact

__paritysi2                     24                  14                  0       exact
__paritysi2 (OPTIMIZE_SIZE)     16                  38                  0       exact
__paritydi2                     2+__paritysi2       1+__paritysi2       0       exact

__umulsidi3                     44                  24                  0       exact
__mulsidi3                      30+__umulsidi3      24+__umulsidi3      8       exact
__muldi3 (__aeabi_lmul)         10+__umulsidi3      6+__umulsidi3       0       exact
__ashldi3 (__aeabi_llsl)        22                  13                  0       exact
__lshrdi3 (__aeabi_llsr)        22                  13                  0       exact
__ashrdi3 (__aeabi_lasr)        22                  13                  0       exact

__aeabi_lcmp                    20                  13                  0       exact
__aeabi_ulcmp                   16                  10                  0       exact

__udivsi3 (__aeabi_uidiv)       56                  72..385             0       < 1 lsb
__divsi3 (__aeabi_idiv)         38+__udivsi3        26+__udivsi3        8       < 1 lsb
__udivdi3 (__aeabi_uldiv)       164                 103..1394           16      < 1 lsb
__udivdi3 (OPTIMIZE_SIZE)       142                 120..1392           16      < 1 lsb
__divdi3 (__aeabi_ldiv)         54+__udivdi3        36+__udivdi3        32      < 1 lsb

__shared_float                  178
__shared_float (OPTIMIZE_SIZE)  154

__addsf3 (__aeabi_fadd)         116+__shared_float  31..76              8       <= 0.5 ulp
__addsf3 (OPTIMIZE_SIZE)        112+__shared_float  74                  8       <= 0.5 ulp
__subsf3 (__aeabi_fsub)         6+__addsf3          3+__addsf3          8       <= 0.5 ulp
__aeabi_frsub                   8+__addsf3          6+__addsf3          8       <= 0.5 ulp
__mulsf3 (__aeabi_fmul)         112+__shared_float  73..97              8       <= 0.5 ulp
__mulsf3 (OPTIMIZE_SIZE)        96+__shared_float   93                  8       <= 0.5 ulp
__divsf3 (__aeabi_fdiv)         132+__shared_float  83..361             8       <= 0.5 ulp
__divsf3 (OPTIMIZE_SIZE)        120+__shared_float  263..359            8       <= 0.5 ulp

__cmpsf2/__lesf2/__ltsf2        72                  33                  0       exact
__eqsf2/__nesf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__gesf2/__gtsf2                 4+__cmpsf2          3+__cmpsf2          0       exact
__unordsf2 (__aeabi_fcmpun)     4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpeq                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpne                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmplt                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmple                  4+__cmpsf2          3+__cmpsf2          0       exact
__aeabi_fcmpge                  4+__cmpsf2          3+__cmpsf2          0       exact

__floatundisf (__aeabi_ul2f)    14+__shared_float   40..81              8       <= 0.5 ulp
__floatundisf (OPTIMIZE_SIZE)   14+__shared_float   40..237             8       <= 0.5 ulp
__floatunsisf (__aeabi_ui2f)    0+__floatundisf     1+__floatundisf     8       <= 0.5 ulp
__floatdisf (__aeabi_l2f)       14+__floatundisf    7+__floatundisf     8       <= 0.5 ulp
__floatsisf (__aeabi_i2f)       0+__floatdisf       1+__floatdisf       8       <= 0.5 ulp

__fixsfdi (__aeabi_f2lz)        74                  27..33              0       exact
__fixunssfdi (__aeabi_f2ulz)    4+__fixsfdi         3+__fixsfdi         0       exact
__fixsfsi (__aeabi_f2iz)        52                  19                  0       exact
__fixsfsi (OPTIMIZE_SIZE)       4+__fixsfdi         3+__fixsfdi         0       exact
__fixunssfsi (__aeabi_f2uiz)    4+__fixsfsi         3+__fixsfsi         0       exact

__extendsfdf2 (__aeabi_f2d)     42+__shared_float   38                  8       exact
__truncsfdf2 (__aeabi_f2d)      88                  34                  8       exact
__aeabi_d2f                     56+__shared_float   54..58              8       <= 0.5 ulp
__aeabi_h2f                     34+__shared_float   34                  8       exact
__aeabi_f2h                     84                  23..34              0       <= 0.5 ulp
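For cross-checking the integer bit-manipulation rows of the table, the following C functions restate the semantics documented for the corresponding libgcc routines in deliberately naive, portable form. They are reference models for testing only, not the optimized Thumb-1 code in this series, and the `ref_` names are hypothetical. Per the libgcc documentation, `__clzsi2` and `__ctzsi2` are undefined for a zero argument, while `__ffssi2` returns 0 for zero:

```c
#include <assert.h>
#include <stdint.h>

/* Leading zero bits (model of __clzsi2); undefined for a == 0. */
int ref_clz32(uint32_t a)
{
    assert(a != 0);
    int n = 0;
    while (!(a & 0x80000000u)) { a <<= 1; n++; }
    return n;
}

/* Trailing zero bits (model of __ctzsi2); undefined for a == 0. */
int ref_ctz32(uint32_t a)
{
    assert(a != 0);
    int n = 0;
    while (!(a & 1u)) { a >>= 1; n++; }
    return n;
}

/* 1-based index of the least significant set bit, or 0 if a == 0
 * (model of __ffssi2). */
int ref_ffs32(uint32_t a)
{
    return a ? ref_ctz32(a) + 1 : 0;
}

/* Number of bits after the sign bit that equal the sign bit
 * (model of __clrsbsi2); no special cases. */
int ref_clrsb32(int32_t a)
{
    uint32_t u = (uint32_t)a;
    uint32_t sign = u >> 31;
    int n = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; bit--)
        n++;
    return n;
}

/* Number of set bits (model of __popcountsi2). */
int ref_popcount32(uint32_t a)
{
    int n = 0;
    for (; a; a &= a - 1)   /* clear the lowest set bit each pass */
        n++;
    return n;
}

/* 1 if an odd number of bits are set, else 0 (model of __paritysi2). */
int ref_parity32(uint32_t a)
{
    return ref_popcount32(a) & 1;
}
```

On a GCC host, the models can also be compared against `__builtin_clz`, `__builtin_ctz`, `__builtin_ffs`, `__builtin_clrsb`, `__builtin_popcount` and `__builtin_parity`. On a target without a matching instruction, such as Cortex M0, GCC typically emits calls to the corresponding libgcc routines for these builtins.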

Copyright assignment is on file with the FSF.

Thanks,
Daniel Engel


[1] // Test program for size comparison

    int main (void)
    {
        volatile int x = 1;
        volatile unsigned long long int y = 10;
        volatile long long int z = x / y; // 64-bit division

        volatile float a = x; // 32-bit casting
        volatile float b = y; // 64-bit casting
        volatile float c = z / b; // float division
        volatile float d = a + c; // float addition
        volatile float e = c * b; // float multiplication
        volatile float f = d - e - c; // float subtraction

        if (f != c) // float comparison
            y -= (long long int)d; // float casting

        return 0;
    }

[2] http://danielengel.com/cm0_test_vectors.tgz
[3] http://www.netlib.org/fp/ucbtest.tgz
[4] http://www.jhauser.us/arithmetic/TestFloat.html
[5] http://win-www.uia.ac.be/u/cant/ieeecc754.html

---

Daniel Engel (34):
  Add and restructure function declaration macros
  Rename THUMB_FUNC_START to THUMB_FUNC_ENTRY
  Fix syntax warnings on conditional instructions
  Reorganize LIB1ASMFUNCS object wrapper macros
  Add the __HAVE_FEATURE_IT and IT() macros
  Refactor 'clz' functions into a new file
  Refactor 'ctz' functions into a new file
  Refactor 64-bit shift functions into a new file
  Import 'clz' functions from the CM0 library
  Import 'ctz' functions from the CM0 library
  Import 64-bit shift functions from the CM0 library
  Import 'clrsb' functions from the CM0 library
  Import 'ffs' functions from the CM0 library
  Import 'parity' functions from the CM0 library
  Import 'popcnt' functions from the CM0 library
  Refactor Thumb-1 64-bit comparison into a new file
  Import 64-bit comparison from CM0 library
  Merge Thumb-2 optimizations for 64-bit comparison
  Import 32-bit division from the CM0 library
  Refactor Thumb-1 64-bit division into a new file
  Import 64-bit division from the CM0 library
  Import integer multiplication from the CM0 library
  Refactor Thumb-1 float comparison into a new file
  Import float comparison from the CM0 library
  Refactor Thumb-1 float subtraction into a new file
  Import float addition and subtraction from the CM0 library
  Import float multiplication from the CM0 library
  Import float division from the CM0 library
  Import integer-to-float conversion from the CM0 library
  Import float-to-integer conversion from the CM0 library
  Import float<->double conversion from the CM0 library
  Import float<->__fp16 conversion from the CM0 library
  Drop single-precision Thumb-1 soft-float functions
  Add -mpure-code support to the CM0 functions.

 libgcc/Makefile.in              |   5 +-
 libgcc/config/arm/bpabi-lib.h   |  12 -
 libgcc/config/arm/bpabi-v6m.S   | 206 -----------
 libgcc/config/arm/bpabi.S       |  42 ---
 libgcc/config/arm/bpabi.c       |  42 ---
 libgcc/config/arm/clz2.S        | 371 ++++++++++++++++++++
 libgcc/config/arm/ctz2.S        | 349 ++++++++++++++++++
 libgcc/config/arm/eabi/fadd.S   | 324 +++++++++++++++++
 libgcc/config/arm/eabi/fcast.S  | 533 ++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fcmp.S   | 604 ++++++++++++++++++++++++++++++++
 libgcc/config/arm/eabi/fdiv.S   | 261 ++++++++++++++
 libgcc/config/arm/eabi/ffixed.S | 414 ++++++++++++++++++++++
 libgcc/config/arm/eabi/ffloat.S | 247 +++++++++++++
 libgcc/config/arm/eabi/fmul.S   | 215 ++++++++++++
 libgcc/config/arm/eabi/fneg.S   |  76 ++++
 libgcc/config/arm/eabi/fplib.h  |  80 +++++
 libgcc/config/arm/eabi/futil.S  | 418 ++++++++++++++++++++++
 libgcc/config/arm/eabi/idiv.S   | 299 ++++++++++++++++
 libgcc/config/arm/eabi/lcmp.S   | 187 ++++++++++
 libgcc/config/arm/eabi/ldiv.S   | 493 ++++++++++++++++++++++++++
 libgcc/config/arm/eabi/lmul.S   | 218 ++++++++++++
 libgcc/config/arm/eabi/lshift.S | 241 +++++++++++++
 libgcc/config/arm/fp16.c        |   4 +
 libgcc/config/arm/lib1funcs.S   | 549 ++++++++++-------------------
 libgcc/config/arm/parity.S      | 120 +++++++
 libgcc/config/arm/popcnt.S      | 212 +++++++++++
 libgcc/config/arm/t-bpabi       |  10 +-
 libgcc/config/arm/t-elf         | 138 +++++++-
 libgcc/config/arm/t-softfp      |   2 +
 29 files changed, 5997 insertions(+), 675 deletions(-)
 delete mode 100644 libgcc/config/arm/bpabi.c
 create mode 100644 libgcc/config/arm/clz2.S
 create mode 100644 libgcc/config/arm/ctz2.S
 create mode 100644 libgcc/config/arm/eabi/fadd.S
 create mode 100644 libgcc/config/arm/eabi/fcast.S
 create mode 100644 libgcc/config/arm/eabi/fcmp.S
 create mode 100644 libgcc/config/arm/eabi/fdiv.S
 create mode 100644 libgcc/config/arm/eabi/ffixed.S
 create mode 100644 libgcc/config/arm/eabi/ffloat.S
 create mode 100644 libgcc/config/arm/eabi/fmul.S
 create mode 100644 libgcc/config/arm/eabi/fneg.S
 create mode 100644 libgcc/config/arm/eabi/fplib.h
 create mode 100644 libgcc/config/arm/eabi/futil.S
 create mode 100644 libgcc/config/arm/eabi/idiv.S
 create mode 100644 libgcc/config/arm/eabi/lcmp.S
 create mode 100644 libgcc/config/arm/eabi/ldiv.S
 create mode 100644 libgcc/config/arm/eabi/lmul.S
 create mode 100644 libgcc/config/arm/eabi/lshift.S
 create mode 100644 libgcc/config/arm/parity.S
 create mode 100644 libgcc/config/arm/popcnt.S
  

Comments

Daniel Engel Nov. 15, 2022, 3:27 p.m. UTC
Hello, 

Is there still any interest in merging this patch? 

Thanks,
Daniel

