From patchwork Thu Aug 19 04:13:26 2021
X-Patchwork-Submitter: Paul Hua
X-Patchwork-Id: 44709
Date: Thu, 19 Aug 2021 12:13:26 +0800
Subject: [PATCH 10/14] [LoongArch] Build Infrastructure
To: libc-alpha@sourceware.org
From: Paul Hua
Reply-To: Paul Hua
Cc: Xu Chenghua, huangpei@loongson.cn, caiyinyu@loongson.cn

From e2faaa05fd55715952410a7a3f4505584e5763e1 Mon Sep 17 00:00:00 2001
From: caiyinyu
Date: Tue, 27 Jul 2021 16:02:40 +0800
Subject: [PATCH 10/14] LoongArch: Build Infrastructure

        * scripts/config.guess: Modified.
        * scripts/config.sub: Likewise.
        * sysdeps/loongarch/Implies: New file.
        * sysdeps/loongarch/Makefile: Likewise.
        * sysdeps/loongarch/configure: Likewise.
        * sysdeps/loongarch/configure.ac: Likewise.
        * sysdeps/loongarch/lp64/Implies-after: Likewise.
        * sysdeps/loongarch/nptl/Makefile: Likewise.
        * sysdeps/loongarch/preconfigure: Likewise.
        * sysdeps/loongarch/sys/regdef.h: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/Implies: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/Makefile: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/Versions: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/configure: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/configure.ac: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/lp64/Implies: Likewise.
        * sysdeps/unix/sysv/linux/loongarch/shlib-versions: Likewise.
---
 scripts/config.guess                          |   3 +
 scripts/config.sub                            |   1 +
 sysdeps/loongarch/Implies                     |   5 +
 sysdeps/loongarch/Makefile                    |  16 ++
 sysdeps/loongarch/configure                   |   4 +
 sysdeps/loongarch/configure.ac                |   6 +
 sysdeps/loongarch/lp64/Implies-after          |   1 +
 sysdeps/loongarch/nptl/Makefile               |  26 +++
 sysdeps/loongarch/preconfigure                |   9 +
 sysdeps/loongarch/sys/regdef.h                | 100 +++++++++
 sysdeps/unix/sysv/linux/loongarch/Implies     |   1 +
 sysdeps/unix/sysv/linux/loongarch/Makefile    |  16 ++
 sysdeps/unix/sysv/linux/loongarch/Versions    |  41 ++++
 sysdeps/unix/sysv/linux/loongarch/configure   | 199 ++++++++++++++++++
 .../unix/sysv/linux/loongarch/configure.ac    |  27 +++
 .../unix/sysv/linux/loongarch/ldd-rewrite.sed |   1 +
 .../unix/sysv/linux/loongarch/lp64/Implies    |   3 +
 .../unix/sysv/linux/loongarch/shlib-versions  |   1 +
 18 files changed, 460 insertions(+)
 create mode 100644 sysdeps/loongarch/Implies
 create mode 100644 sysdeps/loongarch/Makefile
 create mode 100644 sysdeps/loongarch/configure
 create mode 100644 sysdeps/loongarch/configure.ac
 create mode 100644 sysdeps/loongarch/lp64/Implies-after
 create mode 100644 sysdeps/loongarch/nptl/Makefile
 create mode 100644 sysdeps/loongarch/preconfigure
 create mode 100644 sysdeps/loongarch/sys/regdef.h
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/Implies
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/Makefile
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/Versions
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/configure
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/configure.ac
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/lp64/Implies
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/shlib-versions

diff --git a/scripts/config.guess b/scripts/config.guess
index 0f9b29c884..1f73e0b06e 100755
--- a/scripts/config.guess
+++ b/scripts/config.guess
@@ -1040,6 +1040,9 @@ EOF
     riscv32:Linux:*:* | riscv64:Linux:*:*)
         echo "$UNAME_MACHINE"-unknown-linux-"$LIBC"
         exit ;;
+    loongarch32:Linux:*:* | loongarch64:Linux:*:*)
+        echo "$UNAME_MACHINE"-unknown-linux-"$LIBC"
+        exit ;;
     s390:Linux:*:* | s390x:Linux:*:*)
         echo "$UNAME_MACHINE"-ibm-linux-"$LIBC"
         exit ;;
diff --git a/scripts/config.sub b/scripts/config.sub
index a8f3f7e7cd..87d34db0f3 100755
--- a/scripts/config.sub
+++ b/scripts/config.sub
@@ -1208,6 +1208,7 @@ case $cpu-$vendor in
                         | mipsisa64sr71k | mipsisa64sr71kel \
                         | mipsr5900 | mipsr5900el \
                         | mipstx39 | mipstx39el \
+                        | loongarch32 | loongarch64 \
                         | mmix \
                         | mn10200 | mn10300 \
                         | moxie \
diff --git a/sysdeps/loongarch/Implies b/sysdeps/loongarch/Implies
new file mode 100644
index 0000000000..c88325b8be
--- /dev/null
+++ b/sysdeps/loongarch/Implies
@@ -0,0 +1,5 @@
+init_array
+
+ieee754/ldbl-128
+ieee754/dbl-64
+ieee754/flt-32
diff --git a/sysdeps/loongarch/Makefile b/sysdeps/loongarch/Makefile
new file mode 100644
index 0000000000..c08dedbaaf
--- /dev/null
+++ b/sysdeps/loongarch/Makefile
@@ -0,0 +1,16 @@
+ifeq ($(subdir),misc)
+sysdep_headers += sys/asm.h
+endif
+
+# LoongArch's assembler also needs to know about PIC as it changes the
+# definition of some assembler macros.
+ASFLAGS-.os += $(pic-ccflag)
+
+abi-variants := lp32 lp64
+
+ifeq (,$(filter $(default-abi),$(abi-variants)))
+$(error Unknown ABI $(default-abi), must be one of $(abi-variants))
+endif
+
+abi-lp64-condition := defined _ABILP64
+abi-lp32-condition := defined _ABILP32
diff --git a/sysdeps/loongarch/configure b/sysdeps/loongarch/configure
new file mode 100644
index 0000000000..1e5abf81a7
--- /dev/null
+++ b/sysdeps/loongarch/configure
@@ -0,0 +1,4 @@
+# This file is generated from configure.ac by Autoconf. DO NOT EDIT!
+ # Local configure fragment for sysdeps/loongarch/elf.
+
+#AC_DEFINE(PI_STATIC_AND_HIDDEN)
diff --git a/sysdeps/loongarch/configure.ac b/sysdeps/loongarch/configure.ac
new file mode 100644
index 0000000000..67b46ce048
--- /dev/null
+++ b/sysdeps/loongarch/configure.ac
@@ -0,0 +1,6 @@
+GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
+# Local configure fragment for sysdeps/loongarch/elf.
+
+dnl It is always possible to access static and hidden symbols in a
+dnl position-independent way.
+#AC_DEFINE(PI_STATIC_AND_HIDDEN)
diff --git a/sysdeps/loongarch/lp64/Implies-after b/sysdeps/loongarch/lp64/Implies-after
new file mode 100644
index 0000000000..a8cae95f9d
--- /dev/null
+++ b/sysdeps/loongarch/lp64/Implies-after
@@ -0,0 +1 @@
+wordsize-64
diff --git a/sysdeps/loongarch/nptl/Makefile b/sysdeps/loongarch/nptl/Makefile
new file mode 100644
index 0000000000..c6c773b179
--- /dev/null
+++ b/sysdeps/loongarch/nptl/Makefile
@@ -0,0 +1,26 @@
+# Makefile for sysdeps/loongarch/nptl.
+# Copyright (C) 2021 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# .
+
+ifeq ($(subdir),csu)
+gen-as-const-headers += tcb-offsets.sym
+endif
+
+ifeq ($(subdir),nptl)
+libpthread-sysdep_routines += nptl-sysdep
+libpthread-shared-only-routines += nptl-sysdep
+endif
diff --git a/sysdeps/loongarch/preconfigure b/sysdeps/loongarch/preconfigure
new file mode 100644
index 0000000000..26ffe88416
--- /dev/null
+++ b/sysdeps/loongarch/preconfigure
@@ -0,0 +1,9 @@
+case "$machine" in
+loongarch*)
+    base_machine=loongarch
+    machine=loongarch/lp64
+    ;;
+esac
+
+#TODO: this file is useless now.
+#Maybe we can make use of it to get arch info from GCC to set env
diff --git a/sysdeps/loongarch/sys/regdef.h b/sysdeps/loongarch/sys/regdef.h
new file mode 100644
index 0000000000..fb959f3901
--- /dev/null
+++ b/sysdeps/loongarch/sys/regdef.h
@@ -0,0 +1,100 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   Contributed by Loongson Technology Corporation Limited.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   . */
+
+#ifndef _SYS_REGDEF_H
+#define _SYS_REGDEF_H
+
+#if _LOONGARCH_SIM == _ABILP64
+#define zero $r0
+#define ra $r1
+#define tp $r2
+#define sp $r3
+#define a0 $r4
+#define a1 $r5
+#define a2 $r6
+#define a3 $r7
+#define a4 $r8
+#define a5 $r9
+#define a6 $r10
+#define a7 $r11
+#define v0 $r4
+#define v1 $r5
+#define t0 $r12
+#define t1 $r13
+#define t2 $r14
+#define t3 $r15
+#define t4 $r16
+#define t5 $r17
+#define t6 $r18
+#define t7 $r19
+#define t8 $r20
+#define x $r21
+#define fp $r22
+#define s0 $r23
+#define s1 $r24
+#define s2 $r25
+#define s3 $r26
+#define s4 $r27
+#define s5 $r28
+#define s6 $r29
+#define s7 $r30
+#define s8 $r31
+
+#define fa0 $f0
+#define fa1 $f1
+#define fa2 $f2
+#define fa3 $f3
+#define fa4 $f4
+#define fa5 $f5
+#define fa6 $f6
+#define fa7 $f7
+#define fv0 $f0
+#define fv1 $f1
+#define ft0 $f8
+#define ft1 $f9
+#define ft2 $f10
+#define ft3 $f11
+#define ft4 $f12
+#define ft5 $f13
+#define ft6 $f14
+#define ft7 $f15
+#define ft8 $f16
+#define ft9 $f17
+#define ft10 $f18
+#define ft11 $f19
+#define ft12 $f20
+#define ft13 $f21
+#define ft14 $f22
+#define ft15 $f23
+#define fs0 $f24
+#define fs1 $f25
+#define fs2 $f26
+#define fs3 $f27
+#define fs4 $f28
+#define fs5 $f29
+#define fs6 $f30
+#define fs7 $f31
+
+#elif _LOONGARCH_SIM == _ABILP32
+#error ABILP32
+#else
+#error noABI
+#endif
+
+#endif /* _SYS_REGDEF_H */
diff --git a/sysdeps/unix/sysv/linux/loongarch/Implies b/sysdeps/unix/sysv/linux/loongarch/Implies
new file mode 100644
index 0000000000..e52b1ac310
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/Implies
@@ -0,0 +1 @@
+loongarch/nptl
diff --git a/sysdeps/unix/sysv/linux/loongarch/Makefile b/sysdeps/unix/sysv/linux/loongarch/Makefile
new file mode 100644
index 0000000000..30ac6fd497
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/Makefile
@@ -0,0 +1,16 @@
+ifeq ($(subdir),elf)
+ifeq ($(build-shared),yes)
+# This is needed for DSO loading from static binaries.
+sysdep-dl-routines += dl-static
+endif
+endif
+
+#ifeq ($(subdir),misc)
+#sysdep_headers += sys/cachectl.h
+#sysdep_routines += flush-icache
+#endif
+
+ifeq ($(subdir),stdlib)
+gen-as-const-headers += ucontext_i.sym
+endif
+
diff --git a/sysdeps/unix/sysv/linux/loongarch/Versions b/sysdeps/unix/sysv/linux/loongarch/Versions
new file mode 100644
index 0000000000..006cef5ad0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/Versions
@@ -0,0 +1,41 @@
+ld {
+  GLIBC_PRIVATE {
+    # used for loading by static libraries
+    _dl_var_init;
+  }
+}
+libc {
+  # The comment lines with "#errlist-compat" are magic; see errlist-compat.awk.
+  # When you get an error from errlist-compat.awk, you need to add a new
+  # version here. Don't do this blindly, since this means changing the ABI
+  # for all GNU/Linux configurations.
+
+  GLIBC_2.0 {
+    #errlist-compat 123
+    _sys_errlist; sys_errlist; _sys_nerr; sys_nerr;
+
+    # Exception handling support functions from libgcc
+    __register_frame; __register_frame_table; __deregister_frame;
+    __frame_state_for; __register_frame_info_table;
+
+    # Needed by gcc:
+    _flush_cache;
+
+    # c*
+    cachectl; cacheflush;
+  }
+  GLIBC_2.2 {
+    #errlist-compat 1134
+    _sys_errlist; sys_errlist; _sys_nerr; sys_nerr;
+
+    # _*
+    _test_and_set;
+  }
+  GLIBC_2.11 {
+    fallocate64;
+  }
+  GLIBC_PRIVATE {
+    # nptl/pthread_cond_timedwait.c uses INTERNAL_VSYSCALL(clock_gettime).
+    __vdso_clock_gettime;
+  }
+}
diff --git a/sysdeps/unix/sysv/linux/loongarch/configure b/sysdeps/unix/sysv/linux/loongarch/configure
new file mode 100644
index 0000000000..a9761a176c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/configure
@@ -0,0 +1,199 @@
+# This file is generated from configure.ac by Autoconf. DO NOT EDIT!
+ # Local configure fragment for sysdeps/unix/sysv/linux/loongarch.
+
+arch_minimum_kernel=4.15.0
+
+libc_cv_loongarch_int_abi=no
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5
+$as_echo_n "checking for grep that handles long lines and -e... " >&6; }
+if ${ac_cv_path_GREP+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test -z "$GREP"; then
+  ac_path_GREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in grep ggrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext"
+      as_fn_executable_p "$ac_path_GREP" || continue
+# Check for GNU ac_path_GREP and select it if it is found.
+  # Check for GNU $ac_path_GREP
+case `"$ac_path_GREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'GREP' >> "conftest.nl"
+    "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_GREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_GREP="$ac_path_GREP"
+      ac_path_GREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_GREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_GREP"; then
+    as_fn_error $? "no acceptable grep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_GREP=$GREP
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5
+$as_echo "$ac_cv_path_GREP" >&6; }
+ GREP="$ac_cv_path_GREP"
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5
+$as_echo_n "checking for egrep... " >&6; }
+if ${ac_cv_path_EGREP+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if echo a | $GREP -E '(a|b)' >/dev/null 2>&1
+   then ac_cv_path_EGREP="$GREP -E"
+   else
+     if test -z "$EGREP"; then
+  ac_path_EGREP_found=false
+  # Loop through the user's path and test for each of PROGNAME-LIST
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_prog in egrep; do
+    for ac_exec_ext in '' $ac_executable_extensions; do
+      ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext"
+      as_fn_executable_p "$ac_path_EGREP" || continue
+# Check for GNU ac_path_EGREP and select it if it is found.
+  # Check for GNU $ac_path_EGREP
+case `"$ac_path_EGREP" --version 2>&1` in
+*GNU*)
+  ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;;
+*)
+  ac_count=0
+  $as_echo_n 0123456789 >"conftest.in"
+  while :
+  do
+    cat "conftest.in" "conftest.in" >"conftest.tmp"
+    mv "conftest.tmp" "conftest.in"
+    cp "conftest.in" "conftest.nl"
+    $as_echo 'EGREP' >> "conftest.nl"
+    "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break
+    diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
+    as_fn_arith $ac_count + 1 && ac_count=$as_val
+    if test $ac_count -gt ${ac_path_EGREP_max-0}; then
+      # Best one so far, save it but keep looking for a better one
+      ac_cv_path_EGREP="$ac_path_EGREP"
+      ac_path_EGREP_max=$ac_count
+    fi
+    # 10*(2^10) chars as input seems more than enough
+    test $ac_count -gt 10 && break
+  done
+  rm -f conftest.in conftest.tmp conftest.nl conftest.out;;
+esac
+
+      $ac_path_EGREP_found && break 3
+    done
+  done
+  done
+IFS=$as_save_IFS
+  if test -z "$ac_cv_path_EGREP"; then
+    as_fn_error $? "no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5
+  fi
+else
+  ac_cv_path_EGREP=$EGREP
+fi
+
+   fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5
+$as_echo "$ac_cv_path_EGREP" >&6; }
+ EGREP="$ac_cv_path_EGREP"
+
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+__SIZEOF_INT__ __SIZEOF_LONG__ __SIZEOF_POINTER__
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "4 4 4" >/dev/null 2>&1; then :
+  libc_cv_loongarch_int_abi=lp32
+fi
+rm -f conftest*
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+__SIZEOF_INT__ __SIZEOF_LONG__ __SIZEOF_POINTER__
+
+_ACEOF
+if (eval "$ac_cpp conftest.$ac_ext") 2>&5 |
+  $EGREP "4 8 8" >/dev/null 2>&1; then :
+  libc_cv_loongarch_int_abi=lp64
+fi
+rm -f conftest*
+
+if test $libc_cv_loongarch_int_abi = no; then
+  as_fn_error $? "Unable to determine integer ABI" "$LINENO" 5
+fi
+
+config_vars="$config_vars
+default-abi = $libc_cv_loongarch_int_abi"
+
+case $libc_cv_loongarch_int_abi in
+lp32)
+  test -n "$libc_cv_slibdir" ||
+case "$prefix" in
+/usr | /usr/)
+  libc_cv_slibdir='/lib32'
+  libc_cv_rtlddir='/lib32'
+  if test "$libdir" = '${exec_prefix}/lib'; then
+    libdir='${exec_prefix}/lib32';
+    # Locale data can be shared between 32-bit and 64-bit libraries.
+    libc_cv_complocaledir='${exec_prefix}/lib/locale'
+  fi
+  ;;
+esac
+  ;;
+lp64)
+  test -n "$libc_cv_slibdir" ||
+case "$prefix" in
+/usr | /usr/)
+  libc_cv_slibdir='/lib64'
+  libc_cv_rtlddir='/lib'
+  if test "$libdir" = '${exec_prefix}/lib'; then
+    libdir='${exec_prefix}/lib64';
+    # Locale data can be shared between 32-bit and 64-bit libraries.
+    libc_cv_complocaledir='${exec_prefix}/lib/locale'
+  fi
+  ;;
+esac
+  ;;
+esac
+
+ldd_rewrite_script=sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed
diff --git a/sysdeps/unix/sysv/linux/loongarch/configure.ac b/sysdeps/unix/sysv/linux/loongarch/configure.ac
new file mode 100644
index 0000000000..fef4f4d2ae
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/configure.ac
@@ -0,0 +1,27 @@
+sinclude(./aclocal.m4)dnl Autoconf lossage
+GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
+# Local configure fragment for sysdeps/unix/sysv/linux/loongarch.
+
+arch_minimum_kernel=4.15.0
+
+libc_cv_loongarch_int_abi=no
+AC_EGREP_CPP(4 4 4, [__SIZEOF_INT__ __SIZEOF_LONG__ __SIZEOF_POINTER__
+  ], libc_cv_loongarch_int_abi=lp32)
+AC_EGREP_CPP(4 8 8, [__SIZEOF_INT__ __SIZEOF_LONG__ __SIZEOF_POINTER__
+  ], libc_cv_loongarch_int_abi=lp64)
+if test $libc_cv_loongarch_int_abi = no; then
+  AC_MSG_ERROR([Unable to determine integer ABI])
+fi
+
+LIBC_CONFIG_VAR([default-abi], [$libc_cv_loongarch_int_abi])
+
+case $libc_cv_loongarch_int_abi in
+lp32)
+  LIBC_SLIBDIR_RTLDDIR([lib32], [lib32])
+  ;;
+lp64)
+  LIBC_SLIBDIR_RTLDDIR([lib64], [lib])
+  ;;
+esac
+
+ldd_rewrite_script=sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed
diff --git a/sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed b/sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed
new file mode 100644
index 0000000000..131c5f147f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/ldd-rewrite.sed
@@ -0,0 +1 @@
+s_^\(RTLDLIST=\)\(.*lib/\)\(ld-linux\)-\(loongarch64\)-\(lp64\)\(d*\)\(\.so\.[0-9.]*\)_\1"\2\3-\4-\5\7 \2\3-\4-\5d\7"_
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/Implies b/sysdeps/unix/sysv/linux/loongarch/lp64/Implies
new file mode 100644
index 0000000000..117c2b8efe
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/Implies
@@ -0,0 +1,3 @@
+unix/sysv/linux/loongarch
+unix/sysv/linux/generic
+unix/sysv/linux/wordsize-64
diff --git a/sysdeps/unix/sysv/linux/loongarch/shlib-versions b/sysdeps/unix/sysv/linux/loongarch/shlib-versions
new file mode 100644
index 0000000000..0e6345b836
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/shlib-versions
@@ -0,0 +1 @@
+DEFAULT GLIBC_2.27

From patchwork Thu Aug 19 04:16:15 2021
X-Patchwork-Submitter: Paul Hua
X-Patchwork-Id: 44710
Date: Thu, 19 Aug 2021 12:16:15 +0800
Subject: [PATCH 11/14] [LoongArch] Hard Float Support
To: libc-alpha@sourceware.org
From: Paul Hua
Reply-To: Paul Hua
Cc: Xu Chenghua, huangpei@loongson.cn, caiyinyu@loongson.cn

From 9e2429c4524800c843fdaa3988524a827340eb8c Mon Sep 17 00:00:00 2001
From: caiyinyu
Date: Tue, 27 Jul 2021 16:15:06 +0800
Subject: [PATCH 11/14] LoongArch: Hard Float Support

This patch contains hardware floating-point support for the LoongArch ISA.

        * sysdeps/loongarch/fpu/e_sqrt.c: New file.
        * sysdeps/loongarch/fpu/e_sqrtf.c: Likewise.
        * sysdeps/loongarch/fpu/fclrexcpt.c: Likewise.
        * sysdeps/loongarch/fpu/fedisblxcpt.c: Likewise.
        * sysdeps/loongarch/fpu/feenablxcpt.c: Likewise.
        * sysdeps/loongarch/fpu/fegetenv.c: Likewise.
        * sysdeps/loongarch/fpu/fegetexcept.c: Likewise.
        * sysdeps/loongarch/fpu/fegetmode.c: Likewise.
        * sysdeps/loongarch/fpu/fegetround.c: Likewise.
        * sysdeps/loongarch/fpu/feholdexcpt.c: Likewise.
        * sysdeps/loongarch/fpu/fenv_libc.h: Likewise.
        * sysdeps/loongarch/fpu/fesetenv.c: Likewise.
        * sysdeps/loongarch/fpu/fesetexcept.c: Likewise.
        * sysdeps/loongarch/fpu/fesetmode.c: Likewise.
        * sysdeps/loongarch/fpu/fesetround.c: Likewise.
        * sysdeps/loongarch/fpu/feupdateenv.c: Likewise.
        * sysdeps/loongarch/fpu/fgetexcptflg.c: Likewise.
        * sysdeps/loongarch/fpu/fraiseexcpt.c: Likewise.
        * sysdeps/loongarch/fpu/fsetexcptflg.c: Likewise.
        * sysdeps/loongarch/fpu/ftestexcept.c: Likewise.
        * sysdeps/loongarch/lp64/libm-test-ulps: Likewise.
        * sysdeps/loongarch/lp64/libm-test-ulps-name: Likewise.
        * sysdeps/loongarch/math_private.h: Likewise.
---
 sysdeps/loongarch/fpu/e_sqrt.c             |   25 +
 sysdeps/loongarch/fpu/e_sqrtf.c            |   25 +
 sysdeps/loongarch/fpu/fclrexcpt.c          |   46 +
 sysdeps/loongarch/fpu/fedisblxcpt.c        |   39 +
 sysdeps/loongarch/fpu/feenablxcpt.c        |   39 +
 sysdeps/loongarch/fpu/fegetenv.c           |   31 +
 sysdeps/loongarch/fpu/fegetexcept.c        |   32 +
 sysdeps/loongarch/fpu/fegetmode.c          |   27 +
 sysdeps/loongarch/fpu/fegetround.c         |   33 +
 sysdeps/loongarch/fpu/feholdexcpt.c        |   40 +
 sysdeps/loongarch/fpu/fenv_libc.h          |   29 +
 sysdeps/loongarch/fpu/fesetenv.c           |   42 +
 sysdeps/loongarch/fpu/fesetexcept.c        |   32 +
 sysdeps/loongarch/fpu/fesetmode.c          |   38 +
 sysdeps/loongarch/fpu/fesetround.c         |   44 +
 sysdeps/loongarch/fpu/feupdateenv.c        |   43 +
 sysdeps/loongarch/fpu/fgetexcptflg.c       |   38 +
 sysdeps/loongarch/fpu/fraiseexcpt.c        |   75 ++
 sysdeps/loongarch/fpu/fsetexcptflg.c       |   41 +
 sysdeps/loongarch/fpu/ftestexcept.c        |   32 +
 sysdeps/loongarch/lp64/libm-test-ulps      | 1411 ++++++++++++++++++++
 sysdeps/loongarch/lp64/libm-test-ulps-name |    1 +
 sysdeps/loongarch/math_private.h           |  248 ++++
 23 files changed, 2411 insertions(+)
 create mode 100644 sysdeps/loongarch/fpu/e_sqrt.c
 create mode 100644 sysdeps/loongarch/fpu/e_sqrtf.c
 create mode 100644 sysdeps/loongarch/fpu/fclrexcpt.c
 create mode 100644 sysdeps/loongarch/fpu/fedisblxcpt.c
 create mode 100644 sysdeps/loongarch/fpu/feenablxcpt.c
 create mode 100644 sysdeps/loongarch/fpu/fegetenv.c
 create mode 100644 sysdeps/loongarch/fpu/fegetexcept.c
 create mode 100644 sysdeps/loongarch/fpu/fegetmode.c
 create mode 100644 sysdeps/loongarch/fpu/fegetround.c
 create mode 100644 sysdeps/loongarch/fpu/feholdexcpt.c
 create mode 100644 sysdeps/loongarch/fpu/fenv_libc.h
 create mode 100644 sysdeps/loongarch/fpu/fesetenv.c
 create mode 100644 sysdeps/loongarch/fpu/fesetexcept.c
 create mode 100644 sysdeps/loongarch/fpu/fesetmode.c
 create mode 100644 sysdeps/loongarch/fpu/fesetround.c
 create mode 100644 sysdeps/loongarch/fpu/feupdateenv.c
 create mode 100644 sysdeps/loongarch/fpu/fgetexcptflg.c
 create mode 100644 sysdeps/loongarch/fpu/fraiseexcpt.c
 create mode 100644 sysdeps/loongarch/fpu/fsetexcptflg.c
 create mode 100644 sysdeps/loongarch/fpu/ftestexcept.c
 create mode 100644 sysdeps/loongarch/lp64/libm-test-ulps
 create mode 100644 sysdeps/loongarch/lp64/libm-test-ulps-name
 create mode 100644 sysdeps/loongarch/math_private.h

diff --git a/sysdeps/loongarch/fpu/e_sqrt.c b/sysdeps/loongarch/fpu/e_sqrt.c
new file mode 100644
index 0000000000..992e719e43
--- /dev/null
+++ b/sysdeps/loongarch/fpu/e_sqrt.c
@@ -0,0 +1,25 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   . */
+
+double
+__ieee754_sqrt (double x)
+{
+  double z;
+  __asm__("fsqrt.d %0,%1" : "=f"(z) : "f"(x));
+  return z;
+}
+strong_alias (__ieee754_sqrt, __sqrt_finite)
diff --git a/sysdeps/loongarch/fpu/e_sqrtf.c b/sysdeps/loongarch/fpu/e_sqrtf.c
new file mode 100644
index 0000000000..52954db072
--- /dev/null
+++ b/sysdeps/loongarch/fpu/e_sqrtf.c
@@ -0,0 +1,25 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +float +__ieee754_sqrtf (float x) +{ + float z; + __asm__("fsqrt.s %0,%1" : "=f"(z) : "f"(x)); + return z; +} +strong_alias (__ieee754_sqrtf, __sqrtf_finite) diff --git a/sysdeps/loongarch/fpu/fclrexcpt.c b/sysdeps/loongarch/fpu/fclrexcpt.c new file mode 100644 index 0000000000..6f77e9a391 --- /dev/null +++ b/sysdeps/loongarch/fpu/fclrexcpt.c @@ -0,0 +1,46 @@ +/* Clear given exceptions in current floating-point environment. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +int +feclearexcept (int excepts) +{ + int cw; + + /* Mask out unsupported bits/exceptions. 
*/ + excepts &= FE_ALL_EXCEPT; + + /* Read the complete control word. */ + _FPU_GETCW (cw); + + /* Clear exception flag bits and cause bits. If the cause bit is not + cleared, the next control-register write (just below) will re-generate the + exception. */ + + cw &= ~(excepts | (excepts << CAUSE_SHIFT)); + + /* Put the new data in effect. */ + _FPU_SETCW (cw); + + /* Success. */ + return 0; +} +libm_hidden_def (feclearexcept) diff --git a/sysdeps/loongarch/fpu/fedisblxcpt.c b/sysdeps/loongarch/fpu/fedisblxcpt.c new file mode 100644 index 0000000000..532274bc19 --- /dev/null +++ b/sysdeps/loongarch/fpu/fedisblxcpt.c @@ -0,0 +1,39 @@ +/* Disable floating-point exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +int +fedisableexcept (int excepts) +{ + unsigned int new_exc, old_exc; + + /* Get the current control word.
*/ + _FPU_GETCW (new_exc); + + old_exc = (new_exc & ENABLE_MASK) << ENABLE_SHIFT; + + excepts &= FE_ALL_EXCEPT; + + new_exc &= ~(excepts >> ENABLE_SHIFT); + _FPU_SETCW (new_exc); + + return old_exc; +} diff --git a/sysdeps/loongarch/fpu/feenablxcpt.c b/sysdeps/loongarch/fpu/feenablxcpt.c new file mode 100644 index 0000000000..565ebd4d29 --- /dev/null +++ b/sysdeps/loongarch/fpu/feenablxcpt.c @@ -0,0 +1,39 @@ +/* Enable floating-point exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +int +feenableexcept (int excepts) +{ + unsigned int new_exc, old_exc; + + /* Get the current control word. */ + _FPU_GETCW (new_exc); + + old_exc = (new_exc & ENABLE_MASK) << ENABLE_SHIFT; + + excepts &= FE_ALL_EXCEPT; + + new_exc |= excepts >> ENABLE_SHIFT; + _FPU_SETCW (new_exc); + + return old_exc; +} diff --git a/sysdeps/loongarch/fpu/fegetenv.c b/sysdeps/loongarch/fpu/fegetenv.c new file mode 100644 index 0000000000..5e8c095fe5 --- /dev/null +++ b/sysdeps/loongarch/fpu/fegetenv.c @@ -0,0 +1,31 @@ +/* Store current floating-point environment. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__fegetenv (fenv_t *envp) +{ + _FPU_GETCW (*envp); + + /* Success. */ + return 0; +} +libm_hidden_def (__fegetenv) weak_alias (__fegetenv, fegetenv) +libm_hidden_weak (fegetenv) diff --git a/sysdeps/loongarch/fpu/fegetexcept.c b/sysdeps/loongarch/fpu/fegetexcept.c new file mode 100644 index 0000000000..782e9d806d --- /dev/null +++ b/sysdeps/loongarch/fpu/fegetexcept.c @@ -0,0 +1,32 @@ +/* Get enabled floating-point exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +int +fegetexcept (void) +{ + unsigned int exc; + + /* Get the current control word. 
*/ + _FPU_GETCW (exc); + + return (exc & ENABLE_MASK) << ENABLE_SHIFT; +} diff --git a/sysdeps/loongarch/fpu/fegetmode.c b/sysdeps/loongarch/fpu/fegetmode.c new file mode 100644 index 0000000000..9e4751f0e8 --- /dev/null +++ b/sysdeps/loongarch/fpu/fegetmode.c @@ -0,0 +1,27 @@ +/* Store current floating-point control modes. LoongArch version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +int +fegetmode (femode_t *modep) +{ + _FPU_GETCW (*modep); + return 0; +} diff --git a/sysdeps/loongarch/fpu/fegetround.c b/sysdeps/loongarch/fpu/fegetround.c new file mode 100644 index 0000000000..61a793a8c7 --- /dev/null +++ b/sysdeps/loongarch/fpu/fegetround.c @@ -0,0 +1,33 @@ +/* Return current rounding direction. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__fegetround (void) +{ + int cw; + + /* Get control word. */ + _FPU_GETCW (cw); + + return cw & _FPU_RC_MASK; +} +libm_hidden_def (__fegetround) weak_alias (__fegetround, fegetround) +libm_hidden_weak (fegetround) diff --git a/sysdeps/loongarch/fpu/feholdexcpt.c b/sysdeps/loongarch/fpu/feholdexcpt.c new file mode 100644 index 0000000000..59791ba965 --- /dev/null +++ b/sysdeps/loongarch/fpu/feholdexcpt.c @@ -0,0 +1,40 @@ +/* Store current floating-point environment and clear exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__feholdexcept (fenv_t *envp) +{ + fpu_control_t cw; + + /* Save the current state. */ + _FPU_GETCW (cw); + envp->__fp_control_register = cw; + + /* Clear all exception enable bits and flags. 
*/ + cw &= ~(_FPU_MASK_V | _FPU_MASK_Z | _FPU_MASK_O | _FPU_MASK_U | _FPU_MASK_I + | FE_ALL_EXCEPT); + _FPU_SETCW (cw); + + return 0; +} + +libm_hidden_def (__feholdexcept) weak_alias (__feholdexcept, feholdexcept) +libm_hidden_weak (feholdexcept) diff --git a/sysdeps/loongarch/fpu/fenv_libc.h b/sysdeps/loongarch/fpu/fenv_libc.h new file mode 100644 index 0000000000..b229469632 --- /dev/null +++ b/sysdeps/loongarch/fpu/fenv_libc.h @@ -0,0 +1,29 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#ifndef _FENV_LIBC_H +#define _FENV_LIBC_H 1 + +/* Mask for enabling exceptions and for the CAUSE bits. */ +#define ENABLE_MASK 0x0000001FU +#define CAUSE_MASK 0x1F000000U + +/* Shift for FE_* flags to get up to the ENABLE bits and the CAUSE bits. */ +#define ENABLE_SHIFT 16 +#define CAUSE_SHIFT 8 + +#endif /* _FENV_LIBC_H */ diff --git a/sysdeps/loongarch/fpu/fesetenv.c b/sysdeps/loongarch/fpu/fesetenv.c new file mode 100644 index 0000000000..2a73a17db8 --- /dev/null +++ b/sysdeps/loongarch/fpu/fesetenv.c @@ -0,0 +1,42 @@ +/* Install given floating-point environment. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__fesetenv (const fenv_t *envp) +{ + fpu_control_t cw; + + /* Read first current state to flush fpu pipeline. */ + _FPU_GETCW (cw); + + if (envp == FE_DFL_ENV) + _FPU_SETCW (_FPU_DEFAULT); + else if (envp == FE_NOMASK_ENV) + _FPU_SETCW (_FPU_IEEE); + else + _FPU_SETCW (envp->__fp_control_register); + + /* Success. */ + return 0; +} + +libm_hidden_def (__fesetenv) weak_alias (__fesetenv, fesetenv) +libm_hidden_weak (fesetenv) diff --git a/sysdeps/loongarch/fpu/fesetexcept.c b/sysdeps/loongarch/fpu/fesetexcept.c new file mode 100644 index 0000000000..e1e5527372 --- /dev/null +++ b/sysdeps/loongarch/fpu/fesetexcept.c @@ -0,0 +1,32 @@ +/* Set given exception flags. LoongArch version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +int +fesetexcept (int excepts) +{ + fpu_control_t temp; + + _FPU_GETCW (temp); + temp |= excepts & FE_ALL_EXCEPT; + _FPU_SETCW (temp); + + return 0; +} diff --git a/sysdeps/loongarch/fpu/fesetmode.c b/sysdeps/loongarch/fpu/fesetmode.c new file mode 100644 index 0000000000..f7ce867301 --- /dev/null +++ b/sysdeps/loongarch/fpu/fesetmode.c @@ -0,0 +1,38 @@ +/* Install given floating-point control modes. LoongArch version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +#define FCSR_STATUS 0x1f1f0000 + +int +fesetmode (const femode_t *modep) +{ + fpu_control_t cw; + + _FPU_GETCW (cw); + cw &= FCSR_STATUS; + if (modep == FE_DFL_MODE) + cw |= _FPU_DEFAULT; + else + cw |= *modep & ~FCSR_STATUS; + _FPU_SETCW (cw); + + return 0; +} diff --git a/sysdeps/loongarch/fpu/fesetround.c b/sysdeps/loongarch/fpu/fesetround.c new file mode 100644 index 0000000000..dddac1ccf7 --- /dev/null +++ b/sysdeps/loongarch/fpu/fesetround.c @@ -0,0 +1,44 @@ +/* Set current rounding direction. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__fesetround (int round) +{ + fpu_control_t cw; + + if ((round & ~_FPU_RC_MASK) != 0) + /* ROUND is not a valid rounding mode. */ + return 1; + + /* Get current state. */ + _FPU_GETCW (cw); + + /* Set rounding bits. */ + cw &= ~_FPU_RC_MASK; + cw |= round; + /* Set new state. */ + _FPU_SETCW (cw); + + return 0; +} + +libm_hidden_def (__fesetround) weak_alias (__fesetround, fesetround) +libm_hidden_weak (fesetround) diff --git a/sysdeps/loongarch/fpu/feupdateenv.c b/sysdeps/loongarch/fpu/feupdateenv.c new file mode 100644 index 0000000000..ad147cbd46 --- /dev/null +++ b/sysdeps/loongarch/fpu/feupdateenv.c @@ -0,0 +1,43 @@ +/* Install given floating-point environment and raise exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details.
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +__feupdateenv (const fenv_t *envp) +{ + int temp; + + /* Save current exceptions. */ + _FPU_GETCW (temp); + temp &= FE_ALL_EXCEPT; + + /* Install new environment. */ + __fesetenv (envp); + + /* Raise the saved exception. Incidentally for us the implementation + defined format of the values in objects of type fexcept_t is the + same as the ones specified using the FE_* constants. */ + __feraiseexcept (temp); + + /* Success. */ + return 0; +} +libm_hidden_def (__feupdateenv) weak_alias (__feupdateenv, feupdateenv) +libm_hidden_weak (feupdateenv) diff --git a/sysdeps/loongarch/fpu/fgetexcptflg.c b/sysdeps/loongarch/fpu/fgetexcptflg.c new file mode 100644 index 0000000000..85733765ea --- /dev/null +++ b/sysdeps/loongarch/fpu/fgetexcptflg.c @@ -0,0 +1,38 @@ +/* Store current representation for exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +fegetexceptflag (fexcept_t *flagp, int excepts) +{ + fpu_control_t temp; + + /* Get the current exceptions. */ + _FPU_GETCW (temp); + + /* We only save the relevant bits here.
In particular, care has to be + taken with the CAUSE bits, as an inadvertent restore later on could + generate unexpected exceptions. */ + + *flagp = temp & excepts & FE_ALL_EXCEPT; + + /* Success. */ + return 0; +} diff --git a/sysdeps/loongarch/fpu/fraiseexcpt.c b/sysdeps/loongarch/fpu/fraiseexcpt.c new file mode 100644 index 0000000000..ac01dc7077 --- /dev/null +++ b/sysdeps/loongarch/fpu/fraiseexcpt.c @@ -0,0 +1,75 @@ +/* Raise given exceptions. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +int +__feraiseexcept (int excepts) +{ + const float fp_zero = 0.0, fp_one = 1.0, fp_max = FLT_MAX, fp_min = FLT_MIN, + fp_1e32 = 1.0e32f, fp_two = 2.0, fp_three = 3.0; + + /* Raise exceptions represented by EXCEPTS. But we must raise only + one signal at a time. It is important that if the overflow/underflow + exception and the inexact exception are given at the same time, + the overflow/underflow exception follows the inexact exception. */ + + /* First: invalid exception. */ + if (FE_INVALID & excepts) + __asm__ __volatile__("fdiv.s $f0,%0,%0\n\t" + : + : "f"(fp_zero) + : "$f0"); + + /* Next: division by zero. */ + if (FE_DIVBYZERO & excepts) + __asm__ __volatile__("fdiv.s $f0,%0,%1\n\t" + : + : "f"(fp_one), "f"(fp_zero) + : "$f0"); + + /* Next: overflow.
*/ + if (FE_OVERFLOW & excepts) + /* There's no way to raise overflow without also raising inexact. */ + __asm__ __volatile__("fadd.s $f0,%0,%1\n\t" + : + : "f"(fp_max), "f"(fp_1e32) + : "$f0"); + + /* Next: underflow. */ + if (FE_UNDERFLOW & excepts) + __asm__ __volatile__("fdiv.s $f0,%0,%1\n\t" + : + : "f"(fp_min), "f"(fp_three) + : "$f0"); + + /* Last: inexact. */ + if (FE_INEXACT & excepts) + __asm__ __volatile__("fdiv.s $f0, %0, %1\n\t" + : + : "f"(fp_two), "f"(fp_three) + : "$f0"); + + /* Success. */ + return 0; +} + +libm_hidden_def (__feraiseexcept) weak_alias (__feraiseexcept, feraiseexcept) +libm_hidden_weak (feraiseexcept) diff --git a/sysdeps/loongarch/fpu/fsetexcptflg.c b/sysdeps/loongarch/fpu/fsetexcptflg.c new file mode 100644 index 0000000000..eef2faa6f4 --- /dev/null +++ b/sysdeps/loongarch/fpu/fsetexcptflg.c @@ -0,0 +1,41 @@ +/* Set floating-point environment exception handling. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +fesetexceptflag (const fexcept_t *flagp, int excepts) +{ + fpu_control_t temp; + + /* Get the current exceptions. */ + _FPU_GETCW (temp); + + /* Make sure the flags we want restored are legal. */ + excepts &= FE_ALL_EXCEPT; + + /* Now clear the bits called for, and copy them in from flagp. 
Note that + we ignore all non-flag bits from *flagp, so they don't matter. */ + temp = (temp & ~excepts) | (*flagp & excepts); + + _FPU_SETCW (temp); + + /* Success. */ + return 0; +} diff --git a/sysdeps/loongarch/fpu/ftestexcept.c b/sysdeps/loongarch/fpu/ftestexcept.c new file mode 100644 index 0000000000..3abd75ee42 --- /dev/null +++ b/sysdeps/loongarch/fpu/ftestexcept.c @@ -0,0 +1,32 @@ +/* Test exception in current environment. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +int +fetestexcept (int excepts) +{ + int cw; + + /* Get current control word. 
*/ + _FPU_GETCW (cw); + + return cw & excepts & FE_ALL_EXCEPT; +} +libm_hidden_def (fetestexcept) diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps new file mode 100644 index 0000000000..31c02d7159 --- /dev/null +++ b/sysdeps/loongarch/lp64/libm-test-ulps @@ -0,0 +1,1411 @@ +# Begin of automatic generation + +# Maximal error of functions: +Function: "acos": +double: 1 +float: 1 +ldouble: 1 + +Function: "acos_downward": +double: 1 +float: 1 +ldouble: 1 + +Function: "acos_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "acos_upward": +double: 1 +float: 1 +ldouble: 1 + +Function: "acosh": +double: 2 +float: 2 +ldouble: 4 + +Function: "acosh_downward": +double: 2 +float: 2 +ldouble: 3 + +Function: "acosh_towardzero": +double: 2 +float: 2 +ldouble: 2 + +Function: "acosh_upward": +double: 2 +float: 2 +ldouble: 3 + +Function: "asin": +double: 1 +float: 1 +ldouble: 1 + +Function: "asin_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: "asin_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "asin_upward": +double: 2 +float: 1 +ldouble: 2 + +Function: "asinh": +double: 2 +float: 2 +ldouble: 4 + +Function: "asinh_downward": +double: 3 +float: 3 +ldouble: 4 + +Function: "asinh_towardzero": +double: 2 +float: 2 +ldouble: 2 + +Function: "asinh_upward": +double: 3 +float: 3 +ldouble: 4 + +Function: "atan": +double: 1 +float: 1 +ldouble: 1 + +Function: "atan2": +float: 1 +ldouble: 2 + +Function: "atan2_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: "atan2_towardzero": +double: 1 +float: 2 +ldouble: 3 + +Function: "atan2_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "atan_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: "atan_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "atan_upward": +double: 1 +float: 2 +ldouble: 2 + +Function: "atanh": +double: 2 +float: 2 +ldouble: 4 + +Function: "atanh_downward": +double: 3 +float: 3 +ldouble: 4 + +Function: "atanh_towardzero": +double: 
2 +float: 2 +ldouble: 2 + +Function: "atanh_upward": +double: 3 +float: 3 +ldouble: 4 + +Function: "cabs": +double: 1 +ldouble: 1 + +Function: "cabs_downward": +double: 1 +ldouble: 1 + +Function: "cabs_towardzero": +double: 1 +ldouble: 1 + +Function: "cabs_upward": +double: 1 +ldouble: 1 + +Function: Real part of "cacos": +double: 1 +float: 2 +ldouble: 2 + +Function: Imaginary part of "cacos": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "cacos_downward": +double: 3 +float: 2 +ldouble: 3 + +Function: Imaginary part of "cacos_downward": +double: 5 +float: 3 +ldouble: 6 + +Function: Real part of "cacos_towardzero": +double: 3 +float: 2 +ldouble: 3 + +Function: Imaginary part of "cacos_towardzero": +double: 4 +float: 2 +ldouble: 5 + +Function: Real part of "cacos_upward": +double: 2 +float: 2 +ldouble: 3 + +Function: Imaginary part of "cacos_upward": +double: 5 +float: 5 +ldouble: 7 + +Function: Real part of "cacosh": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "cacosh": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "cacosh_downward": +double: 4 +float: 2 +ldouble: 5 + +Function: Imaginary part of "cacosh_downward": +double: 3 +float: 3 +ldouble: 4 + +Function: Real part of "cacosh_towardzero": +double: 4 +float: 2 +ldouble: 5 + +Function: Imaginary part of "cacosh_towardzero": +double: 3 +float: 2 +ldouble: 3 + +Function: Real part of "cacosh_upward": +double: 4 +float: 3 +ldouble: 6 + +Function: Imaginary part of "cacosh_upward": +double: 3 +float: 2 +ldouble: 4 + +Function: "carg": +float: 1 +ldouble: 2 + +Function: "carg_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: "carg_towardzero": +double: 1 +float: 2 +ldouble: 3 + +Function: "carg_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: Real part of "casin": +double: 1 +float: 1 +ldouble: 2 + +Function: Imaginary part of "casin": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "casin_downward": +double: 3 +float: 2 +ldouble: 3 + +Function: 
Imaginary part of "casin_downward": +double: 5 +float: 3 +ldouble: 6 + +Function: Real part of "casin_towardzero": +double: 3 +float: 1 +ldouble: 3 + +Function: Imaginary part of "casin_towardzero": +double: 4 +float: 2 +ldouble: 5 + +Function: Real part of "casin_upward": +double: 3 +float: 2 +ldouble: 3 + +Function: Imaginary part of "casin_upward": +double: 5 +float: 5 +ldouble: 7 + +Function: Real part of "casinh": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "casinh": +double: 1 +float: 1 +ldouble: 2 + +Function: Real part of "casinh_downward": +double: 5 +float: 3 +ldouble: 6 + +Function: Imaginary part of "casinh_downward": +double: 3 +float: 2 +ldouble: 3 + +Function: Real part of "casinh_towardzero": +double: 4 +float: 2 +ldouble: 5 + +Function: Imaginary part of "casinh_towardzero": +double: 3 +float: 1 +ldouble: 3 + +Function: Real part of "casinh_upward": +double: 5 +float: 5 +ldouble: 7 + +Function: Imaginary part of "casinh_upward": +double: 3 +float: 2 +ldouble: 3 + +Function: Real part of "catan": +double: 1 +float: 1 +ldouble: 1 + +Function: Imaginary part of "catan": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "catan_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: Imaginary part of "catan_downward": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "catan_towardzero": +double: 1 +float: 2 +ldouble: 2 + +Function: Imaginary part of "catan_towardzero": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "catan_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: Imaginary part of "catan_upward": +double: 2 +float: 2 +ldouble: 3 + +Function: Real part of "catanh": +double: 1 +float: 1 +ldouble: 1 + +Function: Imaginary part of "catanh": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "catanh_downward": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "catanh_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "catanh_towardzero": +double: 2 
+float: 2 +ldouble: 2 + +Function: Imaginary part of "catanh_towardzero": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "catanh_upward": +double: 4 +float: 4 +ldouble: 4 + +Function: Imaginary part of "catanh_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "cbrt": +double: 4 +float: 1 +ldouble: 1 + +Function: "cbrt_downward": +double: 4 +float: 1 +ldouble: 1 + +Function: "cbrt_towardzero": +double: 3 +float: 1 +ldouble: 1 + +Function: "cbrt_upward": +double: 5 +float: 1 +ldouble: 1 + +Function: Real part of "ccos": +double: 1 +float: 1 +ldouble: 1 + +Function: Imaginary part of "ccos": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "ccos_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: Imaginary part of "ccos_downward": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "ccos_towardzero": +double: 1 +float: 2 +ldouble: 2 + +Function: Imaginary part of "ccos_towardzero": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "ccos_upward": +double: 1 +float: 2 +ldouble: 3 + +Function: Imaginary part of "ccos_upward": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "ccosh": +double: 1 +float: 1 +ldouble: 1 + +Function: Imaginary part of "ccosh": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "ccosh_downward": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "ccosh_downward": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "ccosh_towardzero": +double: 2 +float: 3 +ldouble: 2 + +Function: Imaginary part of "ccosh_towardzero": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "ccosh_upward": +double: 1 +float: 2 +ldouble: 3 + +Function: Imaginary part of "ccosh_upward": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "cexp": +double: 2 +float: 1 +ldouble: 1 + +Function: Imaginary part of "cexp": +double: 1 +float: 2 +ldouble: 1 + +Function: Real part of "cexp_downward": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "cexp_downward": 
+double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "cexp_towardzero": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "cexp_towardzero": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "cexp_upward": +double: 1 +float: 2 +ldouble: 3 + +Function: Imaginary part of "cexp_upward": +double: 3 +float: 2 +ldouble: 3 + +Function: Real part of "clog": +double: 3 +float: 3 +ldouble: 2 + +Function: Imaginary part of "clog": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "clog10": +double: 3 +float: 4 +ldouble: 2 + +Function: Imaginary part of "clog10": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "clog10_downward": +double: 5 +float: 5 +ldouble: 3 + +Function: Imaginary part of "clog10_downward": +double: 2 +float: 4 +ldouble: 3 + +Function: Real part of "clog10_towardzero": +double: 5 +float: 5 +ldouble: 4 + +Function: Imaginary part of "clog10_towardzero": +double: 2 +float: 4 +ldouble: 3 + +Function: Real part of "clog10_upward": +double: 6 +float: 5 +ldouble: 4 + +Function: Imaginary part of "clog10_upward": +double: 2 +float: 4 +ldouble: 3 + +Function: Real part of "clog_downward": +double: 4 +float: 3 +ldouble: 3 + +Function: Imaginary part of "clog_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "clog_towardzero": +double: 4 +float: 4 +ldouble: 3 + +Function: Imaginary part of "clog_towardzero": +double: 1 +float: 3 +ldouble: 2 + +Function: Real part of "clog_upward": +double: 4 +float: 3 +ldouble: 4 + +Function: Imaginary part of "clog_upward": +double: 1 +float: 2 +ldouble: 2 + +Function: "cos": +double: 1 +float: 1 +ldouble: 2 + +Function: "cos_downward": +double: 1 +float: 1 +ldouble: 3 + +Function: "cos_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "cos_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "cosh": +double: 2 +float: 2 +ldouble: 2 + +Function: "cosh_downward": +double: 3 +float: 1 +ldouble: 3 + +Function: "cosh_towardzero": +double: 3 +float: 1 
+ldouble: 3 + +Function: "cosh_upward": +double: 2 +float: 2 +ldouble: 3 + +Function: Real part of "cpow": +double: 2 +float: 5 +ldouble: 4 + +Function: Imaginary part of "cpow": +float: 2 +ldouble: 1 + +Function: Real part of "cpow_downward": +double: 5 +float: 8 +ldouble: 6 + +Function: Imaginary part of "cpow_downward": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "cpow_towardzero": +double: 5 +float: 8 +ldouble: 6 + +Function: Imaginary part of "cpow_towardzero": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "cpow_upward": +double: 4 +float: 1 +ldouble: 3 + +Function: Imaginary part of "cpow_upward": +double: 1 +float: 2 +ldouble: 2 + +Function: Real part of "csin": +double: 1 +float: 1 +ldouble: 1 + +Function: Imaginary part of "csin": +ldouble: 1 + +Function: Real part of "csin_downward": +double: 3 +float: 3 +ldouble: 2 + +Function: Imaginary part of "csin_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: Real part of "csin_towardzero": +double: 3 +float: 3 +ldouble: 2 + +Function: Imaginary part of "csin_towardzero": +double: 1 +float: 1 +ldouble: 2 + +Function: Real part of "csin_upward": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "csin_upward": +double: 1 +float: 2 +ldouble: 3 + +Function: Real part of "csinh": +float: 1 +ldouble: 1 + +Function: Imaginary part of "csinh": +double: 1 +float: 1 +ldouble: 1 + +Function: Real part of "csinh_downward": +double: 2 +float: 1 +ldouble: 2 + +Function: Imaginary part of "csinh_downward": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "csinh_towardzero": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "csinh_towardzero": +double: 3 +float: 3 +ldouble: 2 + +Function: Real part of "csinh_upward": +double: 1 +float: 2 +ldouble: 3 + +Function: Imaginary part of "csinh_upward": +double: 2 +float: 2 +ldouble: 2 + +Function: Real part of "csqrt": +double: 2 +float: 2 +ldouble: 2 + +Function: Imaginary part of "csqrt": +double: 2 +float: 2 
+ldouble: 2 + +Function: Real part of "csqrt_downward": +double: 5 +float: 4 +ldouble: 4 + +Function: Imaginary part of "csqrt_downward": +double: 4 +float: 3 +ldouble: 3 + +Function: Real part of "csqrt_towardzero": +double: 4 +float: 3 +ldouble: 3 + +Function: Imaginary part of "csqrt_towardzero": +double: 4 +float: 3 +ldouble: 3 + +Function: Real part of "csqrt_upward": +double: 5 +float: 4 +ldouble: 4 + +Function: Imaginary part of "csqrt_upward": +double: 3 +float: 3 +ldouble: 3 + +Function: Real part of "ctan": +double: 1 +float: 1 +ldouble: 3 + +Function: Imaginary part of "ctan": +double: 2 +float: 2 +ldouble: 3 + +Function: Real part of "ctan_downward": +double: 6 +float: 5 +ldouble: 4 + +Function: Imaginary part of "ctan_downward": +double: 2 +float: 2 +ldouble: 5 + +Function: Real part of "ctan_towardzero": +double: 5 +float: 2 +ldouble: 4 + +Function: Imaginary part of "ctan_towardzero": +double: 2 +float: 2 +ldouble: 5 + +Function: Real part of "ctan_upward": +double: 2 +float: 4 +ldouble: 5 + +Function: Imaginary part of "ctan_upward": +double: 2 +float: 2 +ldouble: 5 + +Function: Real part of "ctanh": +double: 2 +float: 2 +ldouble: 3 + +Function: Imaginary part of "ctanh": +double: 2 +float: 1 +ldouble: 3 + +Function: Real part of "ctanh_downward": +double: 4 +float: 2 +ldouble: 5 + +Function: Imaginary part of "ctanh_downward": +double: 6 +float: 5 +ldouble: 4 + +Function: Real part of "ctanh_towardzero": +double: 2 +float: 2 +ldouble: 5 + +Function: Imaginary part of "ctanh_towardzero": +double: 5 +float: 2 +ldouble: 3 + +Function: Real part of "ctanh_upward": +double: 2 +float: 2 +ldouble: 5 + +Function: Imaginary part of "ctanh_upward": +double: 2 +float: 3 +ldouble: 5 + +Function: "erf": +double: 1 +float: 1 +ldouble: 1 + +Function: "erf_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: "erf_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "erf_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "erfc": +double: 2 +float: 2 
+ldouble: 4 + +Function: "erfc_downward": +double: 4 +float: 4 +ldouble: 5 + +Function: "erfc_towardzero": +double: 3 +float: 3 +ldouble: 4 + +Function: "erfc_upward": +double: 4 +float: 4 +ldouble: 5 + +Function: "exp": +double: 1 +float: 1 +ldouble: 1 + +Function: "exp10": +double: 2 +ldouble: 2 + +Function: "exp10_downward": +double: 3 +float: 1 +ldouble: 3 + +Function: "exp10_towardzero": +double: 3 +float: 1 +ldouble: 3 + +Function: "exp10_upward": +double: 2 +float: 1 +ldouble: 3 + +Function: "exp2": +double: 1 +ldouble: 1 + +Function: "exp2_downward": +double: 1 +ldouble: 1 + +Function: "exp2_towardzero": +double: 1 +ldouble: 1 + +Function: "exp2_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "exp_downward": +double: 1 +float: 1 + +Function: "exp_towardzero": +double: 1 +float: 1 + +Function: "exp_upward": +double: 1 +float: 1 + +Function: "expm1": +double: 1 +float: 1 +ldouble: 2 + +Function: "expm1_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: "expm1_towardzero": +double: 1 +float: 2 +ldouble: 4 + +Function: "expm1_upward": +double: 1 +float: 1 +ldouble: 3 + +Function: "gamma": +double: 3 +float: 3 +ldouble: 5 + +Function: "gamma_downward": +double: 4 +float: 4 +ldouble: 8 + +Function: "gamma_towardzero": +double: 4 +float: 3 +ldouble: 5 + +Function: "gamma_upward": +double: 4 +float: 5 +ldouble: 8 + +Function: "hypot": +double: 1 +ldouble: 1 + +Function: "hypot_downward": +double: 1 +ldouble: 1 + +Function: "hypot_towardzero": +double: 1 +ldouble: 1 + +Function: "hypot_upward": +double: 1 +ldouble: 1 + +Function: "j0": +double: 3 +float: 9 +ldouble: 2 + +Function: "j0_downward": +double: 6 +float: 9 +ldouble: 9 + +Function: "j0_towardzero": +double: 7 +float: 9 +ldouble: 9 + +Function: "j0_upward": +double: 9 +float: 8 +ldouble: 7 + +Function: "j1": +double: 4 +float: 9 +ldouble: 4 + +Function: "j1_downward": +double: 3 +float: 8 +ldouble: 4 + +Function: "j1_towardzero": +double: 4 +float: 8 +ldouble: 4 + +Function: "j1_upward": 
+double: 9 +float: 9 +ldouble: 3 + +Function: "jn": +double: 4 +float: 4 +ldouble: 7 + +Function: "jn_downward": +double: 4 +float: 5 +ldouble: 8 + +Function: "jn_towardzero": +double: 4 +float: 5 +ldouble: 8 + +Function: "jn_upward": +double: 5 +float: 4 +ldouble: 7 + +Function: "lgamma": +double: 3 +float: 3 +ldouble: 5 + +Function: "lgamma_downward": +double: 4 +float: 4 +ldouble: 8 + +Function: "lgamma_towardzero": +double: 4 +float: 3 +ldouble: 5 + +Function: "lgamma_upward": +double: 4 +float: 5 +ldouble: 8 + +Function: "log": +double: 1 +ldouble: 1 + +Function: "log10": +double: 2 +float: 2 +ldouble: 2 + +Function: "log10_downward": +double: 2 +float: 3 +ldouble: 1 + +Function: "log10_towardzero": +double: 2 +float: 1 +ldouble: 1 + +Function: "log10_upward": +double: 2 +float: 2 +ldouble: 1 + +Function: "log1p": +double: 1 +float: 1 +ldouble: 3 + +Function: "log1p_downward": +double: 1 +float: 2 +ldouble: 3 + +Function: "log1p_towardzero": +double: 2 +float: 2 +ldouble: 3 + +Function: "log1p_upward": +double: 2 +float: 2 +ldouble: 2 + +Function: "log2": +double: 1 +float: 1 +ldouble: 3 + +Function: "log2_downward": +double: 3 +ldouble: 3 + +Function: "log2_towardzero": +double: 2 +ldouble: 1 + +Function: "log2_upward": +double: 3 +ldouble: 1 + +Function: "log_downward": +ldouble: 1 + +Function: "log_towardzero": +ldouble: 2 + +Function: "log_upward": +double: 1 +ldouble: 2 + +Function: "pow": +double: 1 +ldouble: 2 + +Function: "pow_downward": +double: 1 +float: 1 +ldouble: 2 + +Function: "pow_towardzero": +double: 1 +float: 1 +ldouble: 2 + +Function: "pow_upward": +double: 1 +float: 1 +ldouble: 2 + +Function: "sin": +double: 1 +float: 1 +ldouble: 2 + +Function: "sin_downward": +double: 1 +float: 1 +ldouble: 3 + +Function: "sin_towardzero": +double: 1 +float: 1 +ldouble: 2 + +Function: "sin_upward": +double: 1 +float: 1 +ldouble: 3 + +Function: "sincos": +double: 1 +ldouble: 1 + +Function: "sincos_downward": +double: 1 +float: 1 +ldouble: 3 + +Function: 
"sincos_towardzero": +double: 1 +float: 1 +ldouble: 2 + +Function: "sincos_upward": +double: 1 +float: 1 +ldouble: 3 + +Function: "sinh": +double: 2 +float: 2 +ldouble: 2 + +Function: "sinh_downward": +double: 3 +float: 3 +ldouble: 3 + +Function: "sinh_towardzero": +double: 3 +float: 2 +ldouble: 3 + +Function: "sinh_upward": +double: 3 +float: 3 +ldouble: 4 + +Function: "tan": +float: 1 +ldouble: 1 + +Function: "tan_downward": +double: 1 +float: 2 +ldouble: 1 + +Function: "tan_towardzero": +double: 1 +float: 1 +ldouble: 1 + +Function: "tan_upward": +double: 1 +float: 1 +ldouble: 1 + +Function: "tanh": +double: 2 +float: 2 +ldouble: 2 + +Function: "tanh_downward": +double: 3 +float: 3 +ldouble: 4 + +Function: "tanh_towardzero": +double: 2 +float: 2 +ldouble: 3 + +Function: "tanh_upward": +double: 3 +float: 3 +ldouble: 3 + +Function: "tgamma": +double: 9 +float: 8 +ldouble: 4 + +Function: "tgamma_downward": +double: 9 +float: 7 +ldouble: 5 + +Function: "tgamma_towardzero": +double: 9 +float: 7 +ldouble: 5 + +Function: "tgamma_upward": +double: 9 +float: 8 +ldouble: 4 + +Function: "y0": +double: 2 +float: 8 +ldouble: 3 + +Function: "y0_downward": +double: 3 +float: 8 +ldouble: 7 + +Function: "y0_towardzero": +double: 3 +float: 8 +ldouble: 3 + +Function: "y0_upward": +double: 2 +float: 8 +ldouble: 4 + +Function: "y1": +double: 3 +float: 9 +ldouble: 5 + +Function: "y1_downward": +double: 6 +float: 8 +ldouble: 5 + +Function: "y1_towardzero": +double: 3 +float: 9 +ldouble: 2 + +Function: "y1_upward": +double: 6 +float: 9 +ldouble: 5 + +Function: "yn": +double: 3 +float: 3 +ldouble: 5 + +Function: "yn_downward": +double: 3 +float: 4 +ldouble: 5 + +Function: "yn_towardzero": +double: 3 +float: 3 +ldouble: 5 + +Function: "yn_upward": +double: 4 +float: 5 +ldouble: 5 + +# end of automatic generation diff --git a/sysdeps/loongarch/lp64/libm-test-ulps-name b/sysdeps/loongarch/lp64/libm-test-ulps-name new file mode 100644 index 0000000000..ce02281eab --- /dev/null +++ 
b/sysdeps/loongarch/lp64/libm-test-ulps-name @@ -0,0 +1 @@ +LoongArch 64-bit diff --git a/sysdeps/loongarch/math_private.h b/sysdeps/loongarch/math_private.h new file mode 100644 index 0000000000..9463f653d0 --- /dev/null +++ b/sysdeps/loongarch/math_private.h @@ -0,0 +1,248 @@ +/* Internal math stuff. LoongArch version. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef LOONGARCH_MATH_PRIVATE_H +#define LOONGARCH_MATH_PRIVATE_H 1 + +/* Inline functions to speed up the math library implementation. The + default versions of these routines are in generic/math_private.h + and call fesetround, feholdexcept, etc. These routines use inlined + code instead. */ + +#ifdef __loongarch_hard_float + +#include +#include +#include + +#define _FPU_MASK_ALL \ + (_FPU_MASK_V | _FPU_MASK_Z | _FPU_MASK_O | _FPU_MASK_U | _FPU_MASK_I \ + | FE_ALL_EXCEPT) + +static __always_inline void +libc_feholdexcept_loongarch (fenv_t *envp) +{ + fpu_control_t cw; + + /* Save the current state. */ + _FPU_GETCW (cw); + envp->__fp_control_register = cw; + + /* Clear all exception enable bits and flags. 
*/ + cw &= ~(_FPU_MASK_ALL); + _FPU_SETCW (cw); +} +#define libc_feholdexcept libc_feholdexcept_loongarch +#define libc_feholdexceptf libc_feholdexcept_loongarch +#define libc_feholdexceptl libc_feholdexcept_loongarch + +static __always_inline void +libc_fesetround_loongarch (int round) +{ + fpu_control_t cw; + + /* Get current state. */ + _FPU_GETCW (cw); + + /* Set rounding bits. */ + cw &= ~_FPU_RC_MASK; + cw |= round; + + /* Set new state. */ + _FPU_SETCW (cw); +} +#define libc_fesetround libc_fesetround_loongarch +#define libc_fesetroundf libc_fesetround_loongarch +#define libc_fesetroundl libc_fesetround_loongarch + +static __always_inline void +libc_feholdexcept_setround_loongarch (fenv_t *envp, int round) +{ + fpu_control_t cw; + + /* Save the current state. */ + _FPU_GETCW (cw); + envp->__fp_control_register = cw; + + /* Clear all exception enable bits and flags. */ + cw &= ~(_FPU_MASK_ALL); + + /* Set rounding bits. */ + cw &= ~_FPU_RC_MASK; + cw |= round; + + /* Set new state. */ + _FPU_SETCW (cw); +} +#define libc_feholdexcept_setround libc_feholdexcept_setround_loongarch +#define libc_feholdexcept_setroundf libc_feholdexcept_setround_loongarch +#define libc_feholdexcept_setroundl libc_feholdexcept_setround_loongarch + +#define libc_feholdsetround libc_feholdexcept_setround_loongarch +#define libc_feholdsetroundf libc_feholdexcept_setround_loongarch +#define libc_feholdsetroundl libc_feholdexcept_setround_loongarch + +static __always_inline void +libc_fesetenv_loongarch (fenv_t *envp) +{ + fpu_control_t cw __attribute__ ((unused)); + + /* Read current state to flush fpu pipeline. 
*/ + _FPU_GETCW (cw); + + _FPU_SETCW (envp->__fp_control_register); +} +#define libc_fesetenv libc_fesetenv_loongarch +#define libc_fesetenvf libc_fesetenv_loongarch +#define libc_fesetenvl libc_fesetenv_loongarch + +static __always_inline int +libc_feupdateenv_test_loongarch (fenv_t *envp, int excepts) +{ + /* int ret = fetestexcept (excepts); feupdateenv (envp); return ret; */ + int cw, temp; + + /* Get current control word. */ + _FPU_GETCW (cw); + + /* Set flag bits (which are accumulative), and *also* set the + cause bits. The setting of the cause bits is what actually causes + the hardware to generate the exception, if the corresponding enable + bit is set as well. */ + temp = cw & FE_ALL_EXCEPT; + temp |= envp->__fp_control_register | (temp << CAUSE_SHIFT); + + /* Set new state. */ + _FPU_SETCW (temp); + + return cw & excepts & FE_ALL_EXCEPT; +} +#define libc_feupdateenv_test libc_feupdateenv_test_loongarch +#define libc_feupdateenv_testf libc_feupdateenv_test_loongarch +#define libc_feupdateenv_testl libc_feupdateenv_test_loongarch + +static __always_inline void +libc_feupdateenv_loongarch (fenv_t *envp) +{ + libc_feupdateenv_test_loongarch (envp, 0); +} +#define libc_feupdateenv libc_feupdateenv_loongarch +#define libc_feupdateenvf libc_feupdateenv_loongarch +#define libc_feupdateenvl libc_feupdateenv_loongarch + +#define libc_feresetround libc_feupdateenv_loongarch +#define libc_feresetroundf libc_feupdateenv_loongarch +#define libc_feresetroundl libc_feupdateenv_loongarch + +static __always_inline int +libc_fetestexcept_loongarch (int excepts) +{ + int cw; + + /* Get current control word. */ + _FPU_GETCW (cw); + + return cw & excepts & FE_ALL_EXCEPT; +} +#define libc_fetestexcept libc_fetestexcept_loongarch +#define libc_fetestexceptf libc_fetestexcept_loongarch +#define libc_fetestexceptl libc_fetestexcept_loongarch + +/* Enable support for rounding mode context. 
*/ +#define HAVE_RM_CTX 1 + +static __always_inline void +libc_feholdexcept_setround_loongarch_ctx (struct rm_ctx *ctx, int round) +{ + fpu_control_t old, new; + + /* Save the current state. */ + _FPU_GETCW (old); + ctx->env.__fp_control_register = old; + + /* Clear all exception enable bits and flags. */ + new = old & ~(_FPU_MASK_ALL); + + /* Set rounding bits. */ + new = (new & ~_FPU_RC_MASK) | round; + + if (__glibc_unlikely (new != old)) + { + _FPU_SETCW (new); + ctx->updated_status = true; + } + else + ctx->updated_status = false; +} +#define libc_feholdexcept_setround_ctx libc_feholdexcept_setround_loongarch_ctx +#define libc_feholdexcept_setroundf_ctx \ + libc_feholdexcept_setround_loongarch_ctx +#define libc_feholdexcept_setroundl_ctx \ + libc_feholdexcept_setround_loongarch_ctx + +static __always_inline void +libc_fesetenv_loongarch_ctx (struct rm_ctx *ctx) +{ + libc_fesetenv_loongarch (&ctx->env); +} +#define libc_fesetenv_ctx libc_fesetenv_loongarch_ctx +#define libc_fesetenvf_ctx libc_fesetenv_loongarch_ctx +#define libc_fesetenvl_ctx libc_fesetenv_loongarch_ctx + +static __always_inline void +libc_feupdateenv_loongarch_ctx (struct rm_ctx *ctx) +{ + if (__glibc_unlikely (ctx->updated_status)) + libc_feupdateenv_test_loongarch (&ctx->env, 0); +} +#define libc_feupdateenv_ctx libc_feupdateenv_loongarch_ctx +#define libc_feupdateenvf_ctx libc_feupdateenv_loongarch_ctx +#define libc_feupdateenvl_ctx libc_feupdateenv_loongarch_ctx +#define libc_feresetround_ctx libc_feupdateenv_loongarch_ctx +#define libc_feresetroundf_ctx libc_feupdateenv_loongarch_ctx +#define libc_feresetroundl_ctx libc_feupdateenv_loongarch_ctx + +static __always_inline void +libc_feholdsetround_loongarch_ctx (struct rm_ctx *ctx, int round) +{ + fpu_control_t old, new; + + /* Save the current state. */ + _FPU_GETCW (old); + ctx->env.__fp_control_register = old; + + /* Set rounding bits. 
*/ + new = (old & ~_FPU_RC_MASK) | round; + + if (__glibc_unlikely (new != old)) + { + _FPU_SETCW (new); + ctx->updated_status = true; + } + else + ctx->updated_status = false; +} +#define libc_feholdsetround_ctx libc_feholdsetround_loongarch_ctx +#define libc_feholdsetroundf_ctx libc_feholdsetround_loongarch_ctx +#define libc_feholdsetroundl_ctx libc_feholdsetround_loongarch_ctx + +#endif + +#include_next + +#endif From patchwork Thu Aug 19 04:18:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul Hua X-Patchwork-Id: 44711 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 343F63992019 for ; Thu, 19 Aug 2021 04:19:43 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 343F63992019 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629346783; bh=jQq5PzGTSjlsLeibMvPGH20qt5vIPJjhBd/bFILtdbQ=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=KD/vx24h2o3D9XFMEwf1u2qLT/k5YUoGoZbALTUWZnOtlxNWimr1T0UXCqKjHEbcR Vlpa6ZPQL9BPz3tfuV0XKpJ2C0TDX4k4SzE4FZFcgOCRQ9P1M0Z8FD7AHsSdpq7WEl yDfhgYgG8NjENssyjfiSCY96f44wtG5eNt8esRh0= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-yb1-xb34.google.com (mail-yb1-xb34.google.com [IPv6:2607:f8b0:4864:20::b34]) by sourceware.org (Postfix) with ESMTPS id C9BD43857401 for ; Thu, 19 Aug 2021 04:19:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C9BD43857401 Received: by mail-yb1-xb34.google.com with SMTP id z18so9736879ybg.8 for ; Wed, 18 Aug 2021 21:19:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; 
bh=jQq5PzGTSjlsLeibMvPGH20qt5vIPJjhBd/bFILtdbQ=; b=Oq9uEL8g3X+bmLQFOiQmLUl3tAhFgv1VFNtPTUrs8gQVnCtxDEB4T0xfvEPBQQj2JR bDP7D+9pOjN10/EdJv+iN1jONc1mCLK1tkuItOnAIelG2o6rogeenxVrh1kGic0RGdFK lbTuRrR+mq/fRbhBnBAUudTMmZDV4YbcjUxVUfV3l1BA5sOmU6d3wjQHTXe3/QsEkkDq K4urLfigj4t/g/QYbS90KGYi5axdvOJphat2RtCzSiMFkfPxCpJLrdy0RYRVWBITNIOf iKQX4TsbMDMV45XrDPh+qGUsNXl3PVBDe+lfu6lBV/5xT6jvhfx/GK6wBRYIGdTuFNKT lUEA== X-Gm-Message-State: AOAM532VaN0Mg4SE1jh7T3xfmWrMilYE5t8QNkh3wBdaqoat22iO+D9u 1D0+9mdnpGNwaGF1mpBVss70gee6f1Pk1IV/Z5vBK2iDA1760Xcq X-Google-Smtp-Source: ABdhPJxzOZb/Pxr0H5TbigEptDMJBu+giG6+g+ntn3LEiQKb+siyLaTashAmjH64YMZ2Tl0wBLVobWKdOHrVNOoYci0= X-Received: by 2002:a25:af4a:: with SMTP id c10mr16297927ybj.482.1629346750230; Wed, 18 Aug 2021 21:19:10 -0700 (PDT) MIME-Version: 1.0 Date: Thu, 19 Aug 2021 12:18:58 +0800 Message-ID: Subject: [PATCH 12/14] [LoongArch] Add optimized string functions To: libc-alpha@sourceware.org X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Paul Hua via Libc-alpha From: Paul Hua Reply-To: Paul Hua Cc: Xu Chenghua , huangpei@loongson.cn, caiyinyu@loongson.cn Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" From c51b254809ebf7d2471497ff8e1053e4a30d1baf Mon Sep 17 00:00:00 2001 From: caiyinyu Date: Tue, 27 Jul 2021 16:18:43 +0800 Subject: [PATCH 12/14] LoongArch: Add optimized string functions * sysdeps/loongarch/lp64/strchr.S: New file. 
* sysdeps/loongarch/lp64/strchrnul.S: Likewise. * sysdeps/loongarch/lp64/strcmp.S: Likewise. * sysdeps/loongarch/lp64/strcpy.S: Likewise. * sysdeps/loongarch/lp64/strlen.S: Likewise. * sysdeps/loongarch/lp64/strncmp.S: Likewise. * sysdeps/loongarch/lp64/strnlen.S: Likewise. --- sysdeps/loongarch/lp64/strchr.S | 148 +++++++++++++++ sysdeps/loongarch/lp64/strchrnul.S | 163 +++++++++++++++++ sysdeps/loongarch/lp64/strcmp.S | 211 +++++++++++++++++++++ sysdeps/loongarch/lp64/strcpy.S | 224 +++++++++++++++++++++++ sysdeps/loongarch/lp64/strlen.S | 151 +++++++++++++++ sysdeps/loongarch/lp64/strncmp.S | 282 +++++++++++++++++++++++++++++ sysdeps/loongarch/lp64/strnlen.S | 168 +++++++++++++++++ 7 files changed, 1347 insertions(+) create mode 100644 sysdeps/loongarch/lp64/strchr.S create mode 100644 sysdeps/loongarch/lp64/strchrnul.S create mode 100644 sysdeps/loongarch/lp64/strcmp.S create mode 100644 sysdeps/loongarch/lp64/strcpy.S create mode 100644 sysdeps/loongarch/lp64/strlen.S create mode 100644 sysdeps/loongarch/lp64/strncmp.S create mode 100644 sysdeps/loongarch/lp64/strnlen.S diff --git a/sysdeps/loongarch/lp64/strchr.S b/sysdeps/loongarch/lp64/strchr.S new file mode 100644 index 0000000000..de02859500 --- /dev/null +++ b/sysdeps/loongarch/lp64/strchr.S @@ -0,0 +1,148 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ + +/* basic algorithm : + + +. use ld.d and a mask to handle the first 8 bytes or fewer; + + +. replicate the byte c into all 8 bytes of a1 with bstrins.d; + + +. xor a1 with the loaded doubleword to check whether c is found; + + +. if ((v0 - 0x0101010101010101) & ~(v0 | 0x7f7f7f7f7f7f7f7f)) != 0, v0 + contains a \0 byte; otherwise it contains none + */ + +#include +#include + +#define L_ADDIU addi.d +#define L_ADDU add.d +#define L_SUBU sub.d + +#define STRCHR strchr +#define MOVN(rd,rs,rt) \ + maskeqz t6, rs, rt;\ + masknez rd, rd, rt;\ + or rd, rd, t6 + +#define MOVN2(rd,rt) \ + masknez rd, rd, rt;\ + or rd, rd, rt + + +/* char * strchr (const char *s1, int c); */ + +LEAF(STRCHR) + .align 6 + + li.w t4, 0x7 + lu12i.w a2, 0x01010 + bstrins.d a1, a1, 15, 8 + andi t0, a0, 0x7 + + ori a2, a2, 0x101 + andn t4, a0, t4 + slli.w t1, t0, 3 + + ld.d t4, t4, 0 + + + nor t8, zero, zero + bstrins.d a1, a1, 31, 16 + srl.d t4, t4, t1 + + bstrins.d a1, a1, 63, 32 + bstrins.d a2, a2, 63, 32 + srl.d a7, t8, t1 + + li.w t1, 8 + nor t8, a7, zero + slli.d a3, a2, 7 + or t5, t8, t4 + and t3, a7, a1 + + sub.w t1, t1, t0 + nor a3, a3, zero + xor t2, t5, t3 + sub.d a7, t5, a2 + nor a6, t5, a3 + + sub.d a5, t2, a2 + nor a4, t2, a3 + + and a6, a7, a6 + and a5, a5, a4 + or a7, a6, a5 + bnez a7, L(_mc8_a) + + L_ADDU a0, a0, t1 +L(_aloop): + ld.d t4, a0, 0 + + xor t2, t4, a1 + sub.d a7, t4, a2 + nor a6, t4, a3 + sub.d a5, t2, a2 + + nor a4, t2, a3 + and a6, a7, a6 + and a5, a5, a4 + or a7, a6, a5 + bnez a7, L(_mc8_a) + + ld.d t4, a0, 8 + L_ADDIU a0, a0, 16 + xor t2, t4, a1 + sub.d a7, t4, a2 + nor a6, t4, a3 + sub.d a5, t2, a2 + + nor a4, t2, a3 + and a6, a7, a6 + and a5, a5, a4 + or a7, a6, a5 + beqz a7, L(_aloop) + + L_ADDIU a0, a0, -8 +L(_mc8_a): + + ctz.d t0, a5 + ctz.d t2, a6 + + srli.w t0, t0, 3 + srli.w t2, t2, 3 + sltu t1, t2, t0 + L_ADDU v0, a0, t0 + masknez v0, v0, t1 + jr ra
+END(STRCHR) + +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (strchr) +weak_alias (strchr, index) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/strchrnul.S b/sysdeps/loongarch/lp64/strchrnul.S new file mode 100644 index 0000000000..23f91c93dd --- /dev/null +++ b/sysdeps/loongarch/lp64/strchrnul.S @@ -0,0 +1,163 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ + +/* basic algorithm : + + +. use ld.d and mask for the first 8 bytes or less; + + +. build a1 with 8c with dins; + + +. use xor from a1 and v0 to check if is found; + + +. 
if ((v0 - 0x0101010101010101) & ~(v0 | 0x7f7f7f7f7f7f7f7f)) != 0, v0 + contains a \0 byte; otherwise it contains none + + */ + +#include +#include + + +#define L_ADDIU addi.d +#define L_ADDU add.d +#define L_SUBU sub.d + +#define STRCHRNUL __strchrnul + +#define MOVN(rd,rs,rt) \ + maskeqz t6, rs, rt;\ + masknez rd, rd, rt;\ + or rd, rd, t6 + +#define MOVZ(rd,rs,rt) \ + masknez t6, rs, rt;\ + maskeqz rd, rd, rt;\ + or rd, rd, t6 + + +#define MOVN2(rd,rt) \ + masknez rd, rd, rt;\ + or rd, rd, rt + +/* char * strchrnul (const char *s1, int c); */ + +LEAF(STRCHRNUL) + .align 6 + + li.w t4, 0x7 + lu12i.w a2, 0x01010 + bstrins.d a1, a1, 15, 8 + andi t0, a0, 0x7 + + ori a2, a2, 0x101 + andn t4, a0, t4 + slli.w t1, t0, 3 + + /* ldr t4, 0(a0) */ + ld.d t4, t4, 0 + + + nor t8, zero, zero + bstrins.d a1, a1, 31, 16 + srl.d t4, t4, t1 + + preld 0, a0, 32 + bstrins.d a1, a1, 63, 32 + bstrins.d a2, a2, 63, 32 + srl.d a7, t8, t1 + + nor t8, a7, zero + slli.d a3, a2, 7 + or t5, t8, t4 + and t3, a7, a1 + + nor a3, a3, zero + xor t2, t5, t3 + sub.d a7, t5, a2 + nor a6, t5, a3 + + li.w t1, 8 + sub.d a5, t2, a2 + nor a4, t2, a3 + + and a6, a7, a6 + and a5, a5, a4 + or a7, a6, a5 + bnez a7, L(_mc8_a) + + + sub.w t1, t1, t0 + L_ADDU a0, a0, t1 +L(_aloop): + ld.d t4, a0, 0 + + xor t2, t4, a1 + sub.d a7, t4, a2 + nor a6, t4, a3 + sub.d a5, t2, a2 + + nor a4, t2, a3 + and a6, a7, a6 + and a5, a5, a4 + + or a7, a6, a5 + bnez a7, L(_mc8_a) + + ld.d t4, a0, 8 + L_ADDIU a0, a0, 16 + + xor t2, t4, a1 + sub.d a7, t4, a2 + nor a6, t4, a3 + sub.d a5, t2, a2 + + nor a4, t2, a3 + and a6, a7, a6 + and a5, a5, a4 + + or a7, a6, a5 + beqz a7, L(_aloop) + + L_ADDIU a0, a0, -8 +L(_mc8_a): + + ctz.d t0, a5 + ctz.d t2, a6 + + srli.w t0, t0, 3 + srli.w t2, t2, 3 + slt t1, t0, t2 + + MOVZ(t0,t2,t1) + + L_ADDU v0, a0, t0 + jr ra +END(STRCHRNUL) + +#ifndef ANDROID_CHANGES +#ifdef _LIBC +weak_alias(__strchrnul, strchrnul) +libc_hidden_builtin_def (__strchrnul) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/strcmp.S
b/sysdeps/loongarch/lp64/strcmp.S new file mode 100644 index 0000000000..b59f36f578 --- /dev/null +++ b/sysdeps/loongarch/lp64/strcmp.S @@ -0,0 +1,211 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ + +/* basic algorithm : + + +. let t0, t1 point to a0, a1, if a0 has smaller low 3 bit of a0 and a1, + set a4 to 1 and let t0 point to the larger of lower 3bit of a0 and a1 + + +. if the low 3 bits of a0 equal the low 3 bits of a1, use a ldr one time and + ld the other times; + + +. if not, load partial t2 and t3, check if t2 has \0; + + +. then use ld for t0, ldr for t1, + + +. if partial 8 byte from t1 has \0, compare partial 8 byte from t1 with + 8 byte from t0 with a mask in a7 + + +. if not, ldl other part of t1, compare 8 byte from t1 with 8 byte from + t0 + + +. if ((v0 - 0x0101010101010101) & ~v0 & 0x8080808080808080) != 0, v0 + contains a \0 byte; otherwise it contains none + + +. 
for partial 8 byte from ldr t3, 0(a0), preload t3 with + 0xffffffffffffffff + */ + +#include +#include + + +#define STRCMP strcmp + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f +#define REP8_80 0x8080808080808080 + +/* Parameters and Results */ +#define src1 a0 +#define src2 a1 +#define result v0 +/* Note: v0 = a0 in lp64 ABI */ + + +/* Internal variable */ +#define data1 t0 +#define data2 t1 +#define has_nul t2 +#define diff t3 +#define syndrome t4 +#define zeroones t5 +#define sevenf t6 +#define pos t7 +#define exchange t8 +#define tmp1 a4 +#define tmp2 a5 +#define tmp3 a6 +#define src1_off a2 +#define src2_off a3 +#define tmp4 a7 + +/* rd <- if rc then ra else rb will destroy tmp3 */ + +#define CONDITIONSEL(rd,rc,ra,rb)\ + masknez tmp3, rb, rc;\ + maskeqz rd, ra, rc;\ + or rd, rd, tmp3 + + + +/* int strcmp (const char *s1, const char *s2); */ + +LEAF(STRCMP) + .align 4 + + xor tmp1, src1, src2 + lu12i.w zeroones, 0x01010 + lu12i.w sevenf, 0x7f7f7 + andi src1_off, src1, 0x7 + ori zeroones, zeroones, 0x101 + ori sevenf, sevenf, 0xf7f + andi tmp1, tmp1, 0x7 + bstrins.d zeroones, zeroones, 63, 32 + bstrins.d sevenf, sevenf, 63, 32 + bnez tmp1, strcmp_misaligned8 + bnez src1_off, strcmp_mutual_align +strcmp_loop_aligned: + ld.d data1, src1, 0 + addi.d src1, src1, 8 + ld.d data2, src2, 0 + addi.d src2, src2, 8 +strcmp_start_realigned: + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + xor diff, data1, data2 + andn has_nul, tmp1, tmp2 + or syndrome, diff, has_nul + beqz syndrome, strcmp_loop_aligned + +strcmp_end: + ctz.d pos, syndrome + bstrins.d pos, zero, 2, 0 + srl.d data1, data1, pos + srl.d data2, data2, pos + andi data1, data1, 0xff + andi data2, data2, 0xff + sub.d result, data1, data2 + jr ra +strcmp_mutual_align: + bstrins.d src1, zero, 2, 0 + bstrins.d src2, zero, 2, 0 + slli.d tmp1, src1_off, 0x3 + ld.d data1, src1, 0 + sub.d tmp1, zero, tmp1 + ld.d data2, src2, 0 + addi.d src1, src1, 8 + addi.d src2, src2, 8 + nor tmp2, 
zero, zero + srl.d tmp2, tmp2, tmp1 + or data1, data1, tmp2 + or data2, data2, tmp2 + b strcmp_start_realigned + +strcmp_misaligned8: + +/* check if ((src1 != 0) && ((src2 == 0) || (src1 < src2))) + then exchange(src1,src2) +*/ + andi src2_off, src2, 0x7 + slt tmp2, src1_off, src2_off + CONDITIONSEL(tmp2,src2_off,tmp2,tmp1) + maskeqz exchange, tmp2, src1_off + xor tmp3, src1, src2 + maskeqz tmp3, tmp3, exchange + xor src1, src1, tmp3 + xor src2, src2, tmp3 + + andi src1_off, src1, 0x7 + beqz src1_off, strcmp_loop_misaligned +strcmp_do_misaligned: + ld.bu data1, src1, 0 + ld.bu data2, src2, 0 + xor tmp3, data1, data2 + addi.d src1, src1, 1 + masknez tmp3, data1, tmp3 + addi.d src2, src2, 1 + beqz tmp3, strcmp_done + andi src1_off, src1, 0x7 + bnez src1_off, strcmp_do_misaligned + +strcmp_loop_misaligned: + andi tmp1, src2, 0xff8 + xori tmp1, tmp1, 0xff8 + beqz tmp1, strcmp_do_misaligned + ld.d data1, src1, 0 + ld.d data2, src2, 0 + addi.d src1, src1, 8 + addi.d src2, src2, 8 + + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + xor diff, data1, data2 + andn has_nul, tmp1, tmp2 + or syndrome, diff, has_nul + beqz syndrome, strcmp_loop_misaligned + +strcmp_misalign_end: + ctz.d pos, syndrome + bstrins.d pos, zero, 2, 0 + srl.d data1, data1, pos + srl.d data2, data2, pos + andi data1, data1, 0xff + andi data2, data2, 0xff + sub.d tmp1, data1, data2 + sub.d tmp2, data2, data1 + CONDITIONSEL(result,exchange,tmp2,tmp1) + jr ra + +strcmp_done: + sub.d tmp1, data1, data2 + sub.d tmp2, data2, data1 + CONDITIONSEL(result,exchange,tmp2,tmp1) + jr ra +END(STRCMP) +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (strcmp) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/strcpy.S b/sysdeps/loongarch/lp64/strcpy.S new file mode 100644 index 0000000000..e75eeb06cf --- /dev/null +++ b/sysdeps/loongarch/lp64/strcpy.S @@ -0,0 +1,224 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ + +/* basic algorithm : + + +. if src is aligned, just do the copy loop. if not, do the cross page check + and copy one double word. + + Then advance src to an aligned address. + + +. if (v0 - 0x0101010101010101) & (~v0) & 0x8080808080808080 != 0, v0 has + a byte that is \0, else it has no \0 + + */ + +#include +#include + + +#define STRCPY strcpy + + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f +#define REP8_80 0x8080808080808080 + +/* Parameters and Results */ +#define dest a0 +#define src a1 +#define result v0 +/* Note: v0 = a0 in lp64 ABI */ + + +/* Internal variable */ +#define data t0 +#define data1 t1 +#define has_nul t2 +#define diff t3 +#define syndrome t4 +#define zeroones t5 +#define sevenf t6 +#define pos t7 +#define dest_backup t8 +#define tmp1 a4 +#define tmp2 a5 +#define tmp3 a6 +#define dest_off a2 +#define src_off a3 +#define tmp4 a7 + +/* rd <- if rc then ra else rb; will destroy tmp3 */ +#define CONDITIONSEL(rd,rc,ra,rb)\ + masknez tmp3, rb, rc;\ + maskeqz rd, ra, rc;\ + or rd, rd, tmp3 + + + +/* char *strcpy (char *dest, const char *src); */ +LEAF(STRCPY) + .align 4 + + move dest_backup, dest + lu12i.w zeroones, 0x01010 + lu12i.w sevenf, 0x7f7f7 + ori zeroones, zeroones, 0x101 + ori sevenf, sevenf, 0xf7f 
+ bstrins.d zeroones, zeroones, 63, 32 + bstrins.d sevenf, sevenf, 63, 32 + andi src_off, src, 0x7 + beqz src_off, strcpy_loop_aligned_1 + b strcpy_mutual_align +strcpy_loop_aligned: + st.d data, dest, 0 + addi.d dest, dest, 8 +strcpy_loop_aligned_1: + ld.d data, src, 0 + addi.d src, src, 8 +strcpy_start_realigned: + sub.d tmp1, data, zeroones + or tmp2, data, sevenf + andn has_nul, tmp1, tmp2 + beqz has_nul, strcpy_loop_aligned + +strcpy_end: + +/* 8 4 2 1 */ + ctz.d pos, has_nul + srli.d pos, pos, 3 + addi.d pos, pos, 1 + +/* + Do 8/4/2/1 strcpy based on pos value. + pos value is the number of bytes to be copied. + The bytes include the final \0, so the max length is 8 and the min length + is 1. + */ +strcpy_end_8: + andi tmp1, pos, 0x8 + beqz tmp1, strcpy_end_4 + st.d data, dest, 0 + move dest, dest_backup + jr ra +strcpy_end_4: + andi tmp1, pos, 0x4 + beqz tmp1, strcpy_end_2 + st.w data, dest, 0 + srli.d data, data, 32 + addi.d dest, dest, 4 +strcpy_end_2: + andi tmp1, pos, 0x2 + beqz tmp1, strcpy_end_1 + st.h data, dest, 0 + srli.d data, data, 16 + addi.d dest, dest, 2 +strcpy_end_1: + andi tmp1, pos, 0x1 + beqz tmp1, strcpy_end_ret + st.b data, dest, 0 +strcpy_end_ret: + move result, dest_backup + jr ra + +strcpy_mutual_align: + +/* + Check if src is near a page boundary. + If not, go to page cross ok. + If it is, do a further check. + Use tmp2 to accelerate. + */ + li.w tmp2, 0xff8 + andi tmp1, src, 0xff8 + beq tmp1, tmp2, strcpy_page_cross + +strcpy_page_cross_ok: + +/* + Load a misaligned double word and check if it has \0. + If not, do a misaligned double word store. + If yes, calculate the number of available bytes, + then jump to 4/2/1 end. + */ + ld.d data, src, 0 + sub.d tmp1, data, zeroones + or tmp2, data, sevenf + andn has_nul, tmp1, tmp2 + bnez has_nul, strcpy_end +strcpy_mutual_align_finish: + +/* + Before jumping back to the aligned loop, make dest/src aligned. 
+ This will cause a duplicated store of several bytes between the first + double word and the second double word, + but this should not cause a problem. + */ + li.w tmp1, 8 + st.d data, dest, 0 + sub.d tmp1, tmp1, src_off + add.d src, src, tmp1 + add.d dest, dest, tmp1 + + b strcpy_loop_aligned_1 + +strcpy_page_cross: + +/* + ld.d from aligned address (src & ~0x7). + check if high bytes have \0. + if not, go back to page cross ok, + since the string is supposed to cross the page bound in such a situation. + if so, + do a srl on data to make it look like a direct double word from src, + then go to 4/2/1 strcpy end. + + tmp4 is 0xffff...ffff mask + tmp2 indicates the bytes to be masked + tmp2 = src_off << 3 + data = data >> (src_off * 8) | -1 << (64 - src_off * 8) + and + -1 << (64 - src_off * 8) -> ~(-1 >> (src_off * 8)) + + */ + li.w tmp1, 0x7 + andn tmp3, src, tmp1 + ld.d data, tmp3, 0 + li.w tmp4, -1 + slli.d tmp2, src_off, 3 + srl.d tmp4, tmp4, tmp2 + srl.d data, data, tmp2 + nor tmp4, tmp4, zero + or data, data, tmp4 + sub.d tmp1, data, zeroones + or tmp2, data, sevenf + andn has_nul, tmp1, tmp2 + beqz has_nul, strcpy_page_cross_ok + b strcpy_end +END(STRCPY) +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (strcpy) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/strlen.S b/sysdeps/loongarch/lp64/strlen.S new file mode 100644 index 0000000000..ec4bad21a6 --- /dev/null +++ b/sysdeps/loongarch/lp64/strlen.S @@ -0,0 +1,151 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ +/* +algorithm: + + #. use ld/ldr to access word/partial word in the string + + #. use ((x - 0x0101010101010101) & ~(x | 0x7f7f7f7f7f7f7f7f)) != 0 to + judge if x has a zero byte + + #. use dctz((x - 0x0101010101010101) & ~(x | 0x7f7f7f7f7f7f7f7f)) >> 3 + to get the index of the first (rightmost) zero byte in dword x; + + #. use dctz(x) = 64 - dclz(~x & (x-1)); + + #. use pointer to the last non zero byte minus pointer to the start + of the string to get the length of string + +*/ + +#include +#include + + +#define L_ADDIU addi.d +#define L_ADDU add.d +#define L_SUBU sub.d + +#define STRLEN strlen +#define L(x) x + + +/* size_t strlen (const char *s1); */ + + .text; + .globl strlen; + .align 5; + cfi_startproc; + .type strlen, @function; +strlen: + + /* LEAF(strlen) */ + #preld 0, a0, 0 + + nor t4, zero, zero + lu12i.w a2, 0x01010 + andi t5, a0, 0x7 + + li.w t7, 0x7 + slli.d t6, t5, 0x3 + andn t7, a0, t7 + ld.d a1, t7, 0 + sub.d t7, zero, t6 + sll.d t4, t4, t7 + maskeqz t4, t4, t6 + srl.d a1, a1, t6 + or a1, a1, t4 + + + ori a2, a2, 0x101 + nor t1, a1, zero + li.w a4, 8 + + #preld 0, a0, 32 + bstrins.d a2, a2, 63, 32 + sub.d a5, a4, t5 + move t5, a0 + + sub.d t0, a1, a2 + slli.d t4, a2, 7 + nor a3, zero, t4 + nor t1, a1, a3 + + and t0, t0, t1 + #preld 0, a0, 64 + +/* instead of using bnel with daddu a0, a0, a5 in branch slot */ + + bnez t0, strlen_count1 + L_ADDU a0, a0, a5 +strlen_loop: + ld.d a1, a0, 0 + sub.d t0, a1, a2 + and t1, t0, t4 + bnez t1, strlen_count_pre + ld.d a1, a0, 8 + sub.d t0, a1, a2 + and t1, t0, t4 + L_ADDIU a0, a0, 16 + beqz t1, 
strlen_loop +strlen_count: + addi.d a0, a0, -8 +strlen_count_pre: + nor t1, a1, a3 + and t0, t0, t1 + beqz t0, strlen_noascii_start +strlen_count1: + ctz.d t1, t0 + L_SUBU v0, a0, t5 + srli.w t1, t1, 3 + L_ADDU v0, v0, t1 + jr ra +strlen_noascii_start: + addi.d a0, a0, 8 +strlen_loop_noascii: + ld.d a1, a0, 0 + sub.d t0, a1, a2 + nor t1, a1, a3 + and t0, t0, t1 + bnez t0, strlen_count1 + ld.d a1, a0, 8 + sub.d t0, a1, a2 + nor t1, a1, a3 + and t0, t0, t1 + L_ADDIU a0, a0, 16 + beqz t0, strlen_loop_noascii + addi.d a0, a0, -8 + ctz.d t1, t0 + L_SUBU v0, a0, t5 + srli.w t1, t1, 3 + L_ADDU v0, v0, t1 + jr ra +END(STRLEN) + +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (strlen) +#endif +#endif + diff --git a/sysdeps/loongarch/lp64/strncmp.S b/sysdeps/loongarch/lp64/strncmp.S new file mode 100644 index 0000000000..e95f5f7a36 --- /dev/null +++ b/sysdeps/loongarch/lp64/strncmp.S @@ -0,0 +1,282 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch64 + * ABI: lp64 + */ + +/* basic algorithm : + + +. let t0, t1 point to a0, a1, if a0 has smaller low 3 bit of a0 and a1, + set a4 to 1 and let t0 point to the larger of lower 3bit of a0 and a1 + + +. 
if low 3 bit of a0 equal low 3 bit of a1, use a ldr one time and more + ld other times; + + +. if not, load partial t2 and t3, check if t2 has \0; + + +. then use ld for t0, ldr for t1, + + +. if partial 8 byte from t1 has \0, compare partial 8 byte from t1 with + 8 byte from t0 with a mask in a7 + + +. if not, ldl other part of t1, compare 8 byte from t1 with 8 byte from + t0 + + +. if (v0 - 0x0101010101010101) & (~v0) & 0x8080808080808080 != 0, v0 has + a byte that is \0, else it has no \0 + + +. for partial 8 byte from ldr t3, 0(a0), preload t3 with + 0xffffffffffffffff + + */ + +#include +#include + + +#define STRNCMP strncmp + +#define REP8_01 0x0101010101010101 +#define REP8_7f 0x7f7f7f7f7f7f7f7f +#define REP8_80 0x8080808080808080 + +/* Parameters and Results */ +#define src1 a0 +#define src2 a1 +#define limit a2 +/* Note: v0 = a0 in lp64 ABI */ +#define result v0 + + +/* Internal variable */ +#define data1 t0 +#define data2 t1 +#define has_nul t2 +#define diff t3 +#define syndrome t4 +#define zeroones t5 +#define sevenf t6 +#define pos t7 +#define exchange t8 +#define tmp1 a5 +#define tmp2 a6 +#define tmp3 a7 +#define src1_off a3 +#define limit_wd a4 + + +/* int strncmp (const char *s1, const char *s2, size_t n); */ + +LEAF(STRNCMP) + .align 4 + beqz limit, strncmp_ret0 + + xor tmp1, src1, src2 + lu12i.w zeroones, 0x01010 + lu12i.w sevenf, 0x7f7f7 + andi src1_off, src1, 0x7 + ori zeroones, zeroones, 0x101 + andi tmp1, tmp1, 0x7 + ori sevenf, sevenf, 0xf7f + bstrins.d zeroones, zeroones, 63, 32 + bstrins.d sevenf, sevenf, 63, 32 + bnez tmp1, strncmp_misaligned8 + bnez src1_off, strncmp_mutual_align + + addi.d limit_wd, limit, -1 + srli.d limit_wd, limit_wd, 3 + +strncmp_loop_aligned: + ld.d data1, src1, 0 + addi.d src1, src1, 8 + ld.d data2, src2, 0 + addi.d src2, src2, 8 + +strncmp_start_realigned: + addi.d limit_wd, limit_wd, -1 + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + xor diff, data1, data2 + andn has_nul, tmp1, tmp2 + srli.d tmp1, limit_wd, 63 + or 
syndrome, diff, has_nul + or tmp2, syndrome, tmp1 + beqz tmp2, strncmp_loop_aligned + + /* if not reach limit */ + bge limit_wd, zero, strncmp_not_limit + /* if reach limit */ + andi limit, limit, 0x7 + li.w tmp1, 0x8 + sub.d limit, tmp1, limit + slli.d limit, limit, 0x3 + li.d tmp1, -1 + srl.d tmp1, tmp1, limit + and data1, data1, tmp1 + and data2, data2, tmp1 + orn syndrome, syndrome, tmp1 + + +strncmp_not_limit: + ctz.d pos, syndrome + bstrins.d pos, zero, 2, 0 + srl.d data1, data1, pos + srl.d data2, data2, pos + andi data1, data1, 0xff + andi data2, data2, 0xff + sub.d result, data1, data2 + jr ra + + + +strncmp_mutual_align: + bstrins.d src1, zero, 2, 0 + bstrins.d src2, zero, 2, 0 + slli.d tmp1, src1_off, 0x3 + ld.d data1, src1, 0 + ld.d data2, src2, 0 + addi.d src2, src2, 8 + addi.d src1, src1, 8 + + addi.d limit_wd, limit, -1 + andi tmp3, limit_wd, 0x7 + srli.d limit_wd, limit_wd, 3 + add.d limit, limit, src1_off + add.d tmp3, tmp3, src1_off + srli.d tmp3, tmp3, 3 + add.d limit_wd, limit_wd, tmp3 + + sub.d tmp1, zero, tmp1 + nor tmp2, zero, zero + srl.d tmp2, tmp2, tmp1 + or data1, data1, tmp2 + or data2, data2, tmp2 + b strncmp_start_realigned + +strncmp_misaligned8: + li.w tmp1, 0x10 + bge limit, tmp1, strncmp_try_words + +strncmp_byte_loop: + ld.bu data1, src1, 0 + ld.bu data2, src2, 0 + addi.d limit, limit, -1 + xor tmp1, data1, data2 + masknez tmp1, data1, tmp1 + maskeqz tmp1, limit, tmp1 + beqz tmp1, strncmp_done + + ld.bu data1, src1, 1 + ld.bu data2, src2, 1 + addi.d src1, src1, 2 + addi.d src2, src2, 2 + addi.d limit, limit, -1 + xor tmp1, data1, data2 + masknez tmp1, data1, tmp1 + maskeqz tmp1, limit, tmp1 + bnez tmp1, strncmp_byte_loop + + +strncmp_done: + sub.d result, data1, data2 + jr ra + +strncmp_try_words: + srli.d limit_wd, limit, 3 + beqz src1_off, strncmp_do_misaligned + + sub.d src1_off, zero, src1_off + andi src1_off, src1_off, 0x7 + sub.d limit, limit, src1_off + srli.d limit_wd, limit, 0x3 + + +strncmp_page_end_loop: + ld.bu data1, 
src1, 0 + ld.bu data2, src2, 0 + addi.d src1, src1, 1 + addi.d src2, src2, 1 + xor tmp1, data1, data2 + masknez tmp1, data1, tmp1 + beqz tmp1, strncmp_done + andi tmp1, src1, 0x7 + bnez tmp1, strncmp_page_end_loop +strncmp_do_misaligned: + li.w src1_off, 0x8 + addi.d limit_wd, limit_wd, -1 + blt limit_wd, zero, strncmp_done_loop + +strncmp_loop_misaligned: + andi tmp2, src2, 0xff8 + xori tmp2, tmp2, 0xff8 + beqz tmp2, strncmp_page_end_loop + + ld.d data1, src1, 0 + ld.d data2, src2, 0 + addi.d src1, src1, 8 + addi.d src2, src2, 8 + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + xor diff, data1, data2 + andn has_nul, tmp1, tmp2 + or syndrome, diff, has_nul + bnez syndrome, strncmp_not_limit + addi.d limit_wd, limit_wd, -1 + bge limit_wd, zero, strncmp_loop_misaligned + +strncmp_done_loop: + andi limit, limit, 0x7 + beqz limit, strncmp_not_limit + /* Read the last double word */ + /* check if the final part is about to exceed the page */ + andi tmp1, src2, 0x7 + andi tmp2, src2, 0xff8 + add.d tmp1, tmp1, limit + xori tmp2, tmp2, 0xff8 + andi tmp1, tmp1, 0x8 + masknez tmp1, tmp1, tmp2 + bnez tmp1, strncmp_byte_loop + addi.d src1, src1, -8 + addi.d src2, src2, -8 + ldx.d data1, src1, limit + ldx.d data2, src2, limit + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + xor diff, data1, data2 + andn has_nul, tmp1, tmp2 + or syndrome, diff, has_nul + bnez syndrome, strncmp_not_limit + +strncmp_ret0: + move result, zero + jr ra + +/* check if ((src1 != 0) && ((src2 == 0) || (src1 < src2))) + then exchange(src1,src2) + */ + + +END(STRNCMP) +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (strncmp) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/strnlen.S b/sysdeps/loongarch/lp64/strnlen.S new file mode 100644 index 0000000000..b475938daf --- /dev/null +++ b/sysdeps/loongarch/lp64/strnlen.S @@ -0,0 +1,168 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +/* + * ISA: LoongArch + * ABI: lp64 + */ +/* +algorithm: + + #. use ld/ldr to access word/partial word in the string + + #. use ((x - 0x0101010101010101) & ~(x | 0x7f7f7f7f7f7f7f7f)) != 0 to + judge if x has a zero byte + + #. use dctz((x - 0x0101010101010101) & ~(x | 0x7f7f7f7f7f7f7f7f)) >> 3 + to get the index of the first (rightmost) zero byte in dword x; + + #. use dctz(x) = 64 - dclz(~x & (x-1)); + + #. 
use pointer to the last non zero byte minus pointer to the start + of the string to get the length of string + + */ + +#include +#include + + + +#define L_ADDIU addi.d +#define L_ADDU add.d +#define L_SUBU sub.d + +#define STRNLEN __strnlen +#define L(x) x +/* rd <- if rc then ra else rb; will destroy a5 (tmp4) */ + +#define CONDITIONSEL(rd,ra,rb,rc)\ + masknez a5, rb, rc;\ + maskeqz rd, ra, rc;\ + or rd, rd, a5 + + +/* Parameters and Results */ +#define srcin a0 +#define limit a1 +#define len v0 + + +/* Internal variable */ +#define data1 t0 +#define data2 t1 +#define has_nul1 t2 +#define has_nul2 t3 +#define src t4 +#define zeroones t5 +#define sevenf t6 +#define data2a t7 +#define tmp6 t7 +#define pos t8 +#define tmp1 a2 +#define tmp2 a3 +#define tmp3 a4 +#define tmp4 a5 +#define tmp5 a6 +#define limit_wd a7 + + + +/* size_t strnlen (const char *s1, size_t maxlen); */ + +LEAF(STRNLEN) + + .align 4 + beqz limit, L(_hit_limit) + lu12i.w zeroones, 0x01010 + lu12i.w sevenf, 0x7f7f7 + ori zeroones, zeroones, 0x101 + ori sevenf, sevenf, 0xf7f + bstrins.d zeroones, zeroones, 63, 32 + bstrins.d sevenf, sevenf, 63, 32 + andi tmp1, srcin, 15 + sub.d src, srcin, tmp1 + bnez tmp1, L(misaligned) + addi.d limit_wd, limit, -1 + srli.d limit_wd, limit_wd, 4 +L(_loop): + ld.d data1, src, 0 + ld.d data2, src, 8 + addi.d src, src, 16 +L(_realigned): + sub.d tmp1, data1, zeroones + or tmp2, data1, sevenf + sub.d tmp3, data2, zeroones + or tmp4, data2, sevenf + andn has_nul1, tmp1, tmp2 + andn has_nul2, tmp3, tmp4 + addi.d limit_wd, limit_wd, -1 + srli.d tmp1, limit_wd, 63 + or tmp2, has_nul1, has_nul2 + or tmp3, tmp1, tmp2 + beqz tmp3, L(_loop) + beqz tmp2, L(_hit_limit) + sub.d len, src, srcin + beqz has_nul1, L(_nul_in_data2) + move has_nul2, has_nul1 + addi.d len, len, -8 +L(_nul_in_data2): + ctz.d pos, has_nul2 + srli.d pos, pos, 3 + addi.d len, len, -8 + add.d len, len, pos + sltu tmp1, len, limit + CONDITIONSEL(len,len,limit,tmp1) + jr ra + + +L(misaligned): + addi.d limit_wd, limit, 
-1 + sub.d tmp4, zero, tmp1 + andi tmp3, limit_wd, 15 + srli.d limit_wd, limit_wd, 4 + li.d tmp5, -1 + ld.d data1, src, 0 + ld.d data2, src, 8 + addi.d src, src, 16 + slli.d tmp4, tmp4, 3 + add.d tmp3, tmp3, tmp1 + srl.d tmp2, tmp5, tmp4 + srli.d tmp3, tmp3, 4 + add.d limit_wd, limit_wd, tmp3 + or data1, data1, tmp2 + or data2a, data2, tmp2 + li.w tmp3, 9 + sltu tmp1, tmp1, tmp3 + CONDITIONSEL(data1,data1,tmp5,tmp1) + CONDITIONSEL(data2,data2,data2a,tmp1) + b L(_realigned) + + +L(_hit_limit): + move len, limit + jr ra +END(STRNLEN) +#ifndef ANDROID_CHANGES +#ifdef _LIBC +weak_alias (__strnlen, strnlen) +libc_hidden_def (strnlen) +libc_hidden_def (__strnlen) +#endif +#endif From patchwork Thu Aug 19 04:20:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul Hua X-Patchwork-Id: 44712 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BF1F1383D005 for ; Thu, 19 Aug 2021 04:20:44 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BF1F1383D005 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629346844; bh=BazVToX87jt3cYdmbO/vQsPAI1q+RkNhIq/dUtMgWVQ=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=G2GnD/vQDPLf4fHjlezRVU22ypLi1h/dYbX6mmcEUpCffCmOFXXFVxlLSn44OwBxp f4i28ueHpzT/m3bsjm78+177bHJ6i7WoH2EkX196yeLH2cPF1k9U3pjR/TwZes05ql CTU1dUZmWcS7gsvayBSX3aX7bcuCGYlBtYC4nxYM= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-yb1-xb31.google.com (mail-yb1-xb31.google.com [IPv6:2607:f8b0:4864:20::b31]) by sourceware.org (Postfix) with ESMTPS id 7705A3843893 for ; Thu, 19 Aug 2021 04:20:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 7705A3843893 Received: by 
mail-yb1-xb31.google.com with SMTP id m193so9747726ybf.9 for ; Wed, 18 Aug 2021 21:20:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=BazVToX87jt3cYdmbO/vQsPAI1q+RkNhIq/dUtMgWVQ=; b=p5gOYwwVFRU8KzCluv1ROBZAhPWrC0VCzJcrZOQ4sSLAxnlhcfzznchWqgbUYVMmmz r4xLjdBk0y0f6UtkC+Zp9h7syyFnD7/CJ7hnET//Rs/kXTGLh5N9ekLbgJN3dpzpR076 OQUHIOZlXVNGeXpCziEy1GSgA1NhbDrtdWs36EWCWog9fT9FQacHWhSDEL5IZDyaeSSy IlQ4E4QbQ8J+S+a6NDTglwuT+vaMWV2lBluBygaD2Mour3KsXp0TnAU8FqyDGnGZHd0l ifRuHeI4QQBxYye3s89vJnONLDDVaks+av/hRGsTvbYP8vTSEQdvllvBmRZhaH/+Gcy5 6RNw== X-Gm-Message-State: AOAM532QeKoZksENy5LrKItHy+bqyylZ9W0Em9bLdRBwLBQdFdBE6PsB VvZI+4ViR0u7iXd3uSJsMHFDG7XknBFPgFc02zou3HhKAL0ETwqG X-Google-Smtp-Source: ABdhPJwHBY9gh1OIv8MRkbItxWdr3xc7wSTS4qqrJn3HzYKISWus0x9GqXxicULePpUaCeF/apxDgSHfWiT2BFNt1Iw= X-Received: by 2002:a25:824b:: with SMTP id d11mr16398012ybn.361.1629346812872; Wed, 18 Aug 2021 21:20:12 -0700 (PDT) MIME-Version: 1.0 Date: Thu, 19 Aug 2021 12:20:01 +0800 Message-ID: Subject: [PATCH 13/14] [LoongArch] Add assembly optimized sinf cosf functions To: libc-alpha@sourceware.org X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Paul Hua via Libc-alpha From: Paul Hua Reply-To: Paul Hua Cc: Xu Chenghua , huangpei@loongson.cn, caiyinyu@loongson.cn Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" From 
13e66b2d0be38c1be530c6c4df1a57f321f6e3d2 Mon Sep 17 00:00:00 2001 From: caiyinyu Date: Tue, 27 Jul 2021 16:21:01 +0800 Subject: [PATCH 13/14] LoongArch: Add assembly optimized sinf cosf functions. * sysdeps/loongarch/lp64/s_cosf.S: New file. * sysdeps/loongarch/lp64/s_sinf.S: Likewise. --- sysdeps/loongarch/lp64/s_cosf.S | 472 ++++++++++++++++++++++++++++++++ sysdeps/loongarch/lp64/s_sinf.S | 451 ++++++++++++++++++++++++++++++ 2 files changed, 923 insertions(+) create mode 100644 sysdeps/loongarch/lp64/s_cosf.S create mode 100644 sysdeps/loongarch/lp64/s_sinf.S +libm_alias_float (__sin, sin) diff --git a/sysdeps/loongarch/lp64/s_cosf.S b/sysdeps/loongarch/lp64/s_cosf.S new file mode 100644 index 0000000000..4207fd2563 --- /dev/null +++ b/sysdeps/loongarch/lp64/s_cosf.S @@ -0,0 +1,472 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +/* Short algorithm description: + * + * 1) if |x|==0: sin(x)=x, + * cos(x)=1. + * 2) if |x|<2^-27: sin(x)=x-x*DP_SMALL, raising underflow only when needed, + * cos(x)=1-|x|. 
+ * 3) if |x|<2^-5 : sin(x)=x+x*x^2*DP_SIN2_0+x^5*DP_SIN2_1, + * cos(x)=1+1*x^2*DP_COS2_0+x^5*DP_COS2_1 + * 4) if |x|< Pi/4: sin(x)=x+x*x^2*(S0+x^2*(S1+x^2*(S2+x^2*(S3+x^2*S4)))), + * cos(x)=1+1*x^2*(C0+x^2*(C1+x^2*(C2+x^2*(C3+x^2*C4)))). + * 5) if |x| < 9*Pi/4: + * 5.1) Range reduction: + * k=trunc(|x|/(Pi/4)), j=(k+1)&0x0e, n=k+1, t=|x|-j*Pi/4. + * 5.2) Reconstruction: + * sign_sin = sign(x) * (-1.0)^((n >>2)&1) + * sign_cos = (-1.0)^(((n+2)>>2)&1) + * poly_sin = ((((S4*t^2 + S3)*t^2 + S2)*t^2 + S1)*t^2 + S0)*t^2*t+t + * poly_cos = ((((C4*t^2 + C3)*t^2 + C2)*t^2 + C1)*t^2 + C0)*t^2*s+s + * if(n&2 != 0) { + * using cos(t) and sin(t) polynomials for |t|= 2^23, very large args: + * 7.1) Range reduction: + * k=trunc(|x|/(Pi/4)), j=(k+1)&0xfffffffe, n=k+1, t=|x|-j*Pi/4. + * 7.2) Reconstruction same as (5.2). + * 8) if x is Inf, return x-x, and set errno=EDOM. + * 9) if x is NaN, return x-x. + * + * Special cases: + * sin/cos(+-0) = +-0/1 not raising inexact/underflow, + * sin/cos(subnormal) raises inexact/underflow, + * sin/cos(min_normalized) raises inexact/underflow, + * sin/cos(normalized) raises inexact, + * sin/cos(Inf) = NaN, raises invalid, sets errno to EDOM, + * sin/cos(NaN) = NaN. 
+ */ + +#define COSF __cosf + +#define LOADFD(rd, rs, label) \ + la.local rs, label;\ + fld.d rd, rs, 0 + +#define LOADFS(rd, rs, label) \ + la.local rs, label;\ + fld.s rd, rs, 0 + +#define FTOL(rd, rs, tmp) \ + ftintrz.l.d tmp, rs;\ + movfr2gr.d rd, tmp + +#define FTOW(rd, rs, tmp) \ + ftintrz.w.d tmp, rs;\ + movfr2gr.s rd, tmp + +#define WTOF(rd, rs, tmp) \ + movgr2fr.w tmp, rs;\ + ffint.d.w rd, tmp + +#define LTOF(rd, rs, tmp) \ + movgr2fr.d tmp, rs;\ + ffint.d.l rd, tmp + +LEAF(COSF) + .align 2 + .align 3 + + /* fa0 is SP x; fa1 is DP x */ + movfr2gr.s t0, fa0 /* Bits of x */ + fcvt.d.s fa1, fa0 /* DP x */ + li.w t1, 0x7fffffff + and t0, t0, t1 /* |x| */ + li.w t1, 0x3f490fdb /* const Pi/4 */ + bltu t0, t1, L(arg_less_pio4)/* |x| < Pi/4 branch */ + li.w t1, 0x40e231d6 /* 9*Pi/4 */ + la.local t4, L(DP_) /*DP_ base addr*/ + + /* |x| >= 9*Pi/4 branch */ + bgeu t0, t1, L(greater_or_equal_9pio4) + +/* L(median_args): */ + +/* Here if Pi/4<=|x|<9*Pi/4 */ + fabs.d fa0, fa1 /* DP |x| */ + fld.d fa1, t4, 56 /* 4/Pi */ + fmul.d fa1, fa1, fa0 /* DP |x|/(Pi/4) */ + FTOW(t0, fa1, fa1) /* k=trunc(|x|/(Pi/4)) */ + la.local t1, L(PIO2J) /* base addr of PIO2J table */ + addi.w t0, t0, 1 /* k+1 */ + bstrpick.d t2, t0, 3, 1 /* j=n/2 */ + alsl.d t1, t2, t1, 3 + fld.d fa1, t1, 0 /* j*Pi/2 */ + addi.w t0, t0, 2 /* n = k+3 */ + fsub.d fa0, fa0, fa1 /* t = |x| - j * Pi/2 */ + + /* Input: t0=n fa0=t */ +L(reduced): + +/* Here if cos(x) calculated using cos(t) polynomial for |t|>2)&1) + * result = s * (1.0+t^2*(C0+t^2*(C1+t^2*(C2+t^2*(C3+t^2*C4))))) + + * Here if cos(x) calculated using sin(t) polynomial for |t|>2)&1) + * result = s * t * (1.0+t^2*(S0+t^2*(S1+t^2*(S2+t^2*(S3+t^2*S4))))) + */ + +/* TODO: what is the best order ??? 
*/ + +/* load-to-use latency, hardware module usage, integer pipeline & float + * pipeline */ + + /* cancel branch */ + slli.w t0, t0, 1 /* (n << 1) */ + andi t1, t0, 4 /* (n << 1) & 4 */ + alsl.d t2, t1, t4, 4 /* adjust to DP_C or DP_S */ + fld.d fa3, t2, 32 /* C4 */ + andi t0, t0, 8 /* =====> (n << 1) & 8 */ + fmul.d fa1, fa0, fa0 /* y=x^2 */ + fld.d fa4, t2, 16 /* C2 */ + fmul.d fa2, fa1, fa1 /* z=x^4 */ + fld.d fa5, t2, 24 /* C3 */ + la.local t3, L(DP_ONES) /* =====> DP_ONES */ + fld.d fa6, t2, 8 /* C1 */ + fmadd.d fa4, fa2, fa3, fa4 /* cx = C2+z*C4 */ + fld.d fa3, t2, 0 /* C0 */ + fmadd.d fa5, fa2, fa5, fa6 /* cy = C1+z*C3 */ + fld.d fa6, t3, 0 /* one */ + fmadd.d fa4, fa2, fa4, fa3 /* cx = C0+z*cx */ + add.d t0, t0, t3 /* =====> addr */ + fmadd.d fa4, fa1, fa5, fa4 /* cx = cx+y*cy */ + fld.d fa2, t0, 0 /* sign */ + fmadd.d fa4, fa4, fa1, fa6 /* 1.0+y*cx */ + fmul.d fa1, fa2, fa4 /* sign * cx */ + bnez t1, L_return + + /* t*s, where s = sign(x) * (-1.0)^((n>>2)&1) */ + fmul.d fa1, fa1, fa0 +L_return: + fcvt.s.d fa0, fa1 /* SP result */ + jr ra + +L(greater_or_equal_9pio4): + +/* Here if |x|>=9*Pi/4 */ + li.w t1, 0x7f800000 /* x is Inf or NaN? 
*/ + bgeu t0, t1, L(inf_or_nan) /* |x| >= Inf branch */ + +/* Here if finite |x|>=9*Pi/4 */ + li.w t1, 0x4b000000 /* 2^23 */ + + /* |x| >= 2^23 branch */ + bgeu t0, t1, L(greater_or_equal_2p23) + +/* Here if 9*Pi/4<=|x|<2^23 */ + fabs.d fa0, fa1 /* DP |x| */ + fld.d fa1, t4, 56 + fmul.d fa1, fa1, fa0 /* |x|/(Pi/4) */ + FTOW(t0, fa1, fa1) /* k=trunc(|x|/(Pi/4)) */ + addi.w t0, t0, 1 /* k+1 */ + srli.w t1, t0, 1 /* x=n/2 */ + WTOF(fa1, t1, fa1) /* DP x */ + fld.d fa2, t4, 104 /* -PIO2HI = high part of -Pi/2 */ + fld.d fa3, t4, 112 /* -PIO2LO = low part of -Pi/2 */ + fmadd.d fa0, fa2, fa1, fa0 /* |x| - x*PIO2HI */ + addi.w t0, t0, 2 /* n = k+3 */ + fmadd.d fa0, fa3, fa1, fa0 /* |x| - x*PIO2HI - x*PIO2LO */ + b L(reduced) + +L(greater_or_equal_2p23): + +/* Here if finite |x|>=2^23 */ + fabs.s fa5, fa0 /* SP |x| */ + + /* bitpos = (ix>>23) - BIAS_32; */ + /* TODO???srai.w eb = biased exponent of x */ + srli.w t0, t0, 23 + + /* bitpos = eb - 0x7f + 59, where 0x7f is exponent bias */ + addi.w t0, t0, -124 /* t0 = bitpos */ + + /* t3= j = bitpos/28 */ + /* x/28 = (x * ((0x100000000 / 28) + 1)) >> 32 */ + li.w t1, 0x924924a + mulh.wu t0, t1, t0 + fcvt.d.s fa5, fa5 /* Convert to double */ + + /* TODO: what is the best order ??? 
*/ + la.local t1, L(invpio4_table) /* t2 */ + alsl.d t1, t0, t1, 3 + fld.d fa0, t1, 0 /* invpio4_table[j] */ + fld.d fa1, t1, 8 /* invpio4_table[j+1] */ + fmul.d fa0, fa0, fa5 /* a = invpio4_table[j]*|x| */ + fld.d fa2, t1, 16 /* invpio4_table[j+2] */ + fmul.d fa1, fa1, fa5 /* b = invpio4_table[j+1]*|x| */ + fld.d fa3, t1, 24 /* invpio4_table[j+3] */ + fmul.d fa2, fa2, fa5 /* c = invpio4_table[j+2]*|x| */ + fmul.d fa3, fa3, fa5 /* d = invpio4_table[j+3]*|x| */ + + /* TODO: overflow check */ + /* uint64_t l = a; TODO: change the order */ + + FTOL(t0, fa0, fa4) + li.w t1, -8 /* 0xfffffffffffffff8 */ + and t0, t0, t1 /* l &= ~0x7; */ + LTOF(fa4, t0, fa4) /* DP l */ + fsub.d fa0, fa0, fa4 /* a -= l; */ + fadd.d fa4, fa0, fa1 /* fa4 double e = a + b; */ + + /* TODO: overflow check */ + FTOL(t0, fa4, fa4) /* uint64_t l = e; */ + andi t2, t0, 1 /* l & 1 TODO: change the order */ + LOADFD(fa5, t1, L(DP_ONES)) /* fa5 = 1.0 */ + LTOF(fa4, t0, fa4) /* fa4 DP l */ + + /* critical!!!! the order */ + fsub.d fa0, fa0, fa4 + fld.d fa4, t4, 120 /* PI_4 */ + beqz t2, L_even_integer + +/* L_odd_integer: */ + fsub.d fa0, fa0, fa5 + fadd.d fa0, fa0, fa1 + fadd.d fa2, fa2, fa3 + fadd.d fa0, fa0, fa2 + addi.d t0, t0, 3 + fmul.d fa0, fa0, fa4 + b L(reduced) +L_even_integer: + fadd.d fa0, fa0, fa1 + fadd.d fa2, fa2, fa3 + fadd.d fa0, fa0, fa2 + fcmp.sle.d $fcc0, fa0, fa5 + addi.d t0, t0, 3 + bcnez $fcc0, L_leq_one + +/* L_gt_one: */ + fld.d fa2, t1, 16 /* 2.0 */ + addi.d t0, t0, 1 + fsub.d fa0, fa0, fa2 +L_leq_one: + fmul.d fa0, fa0, fa4 + b L(reduced) + +L(arg_less_pio4): + +/* Here if |x|. */ + +#include +#include +#include + +/* Short algorithm description: + * + * 1) if |x|==0: sin(x)=x, + * cos(x)=1. + * 2) if |x|<2^-27: sin(x)=x-x*DP_SMALL, raising underflow only when needed, + * cos(x)=1-|x|. 
+ * 3) if |x|<2^-5 : sin(x)=x+x*x^2*DP_SIN2_0+x^5*DP_SIN2_1, + * cos(x)=1+1*x^2*DP_COS2_0+x^5*DP_COS2_1 + * 4) if |x|< Pi/4: sin(x)=x+x*x^2*(S0+x^2*(S1+x^2*(S2+x^2*(S3+x^2*S4)))), + * cos(x)=1+1*x^2*(C0+x^2*(C1+x^2*(C2+x^2*(C3+x^2*C4)))). + * 5) if |x| < 9*Pi/4: + * 5.1) Range reduction: + * k=trunc(|x|/(Pi/4)), j=(k+1)&0x0e, n=k+1, t=|x|-j*Pi/4. + * 5.2) Reconstruction: + * sign_sin = sign(x) * (-1.0)^((n>>2)&1) + * sign_cos = (-1.0)^(((n+2)>>2)&1) + * poly_sin = ((((S4*t^2 + S3)*t^2 + S2)*t^2 + S1)*t^2 + S0)*t^2*t+t + * poly_cos = ((((C4*t^2 + C3)*t^2 + C2)*t^2 + C1)*t^2 + C0)*t^2*s+s + * if(n&2 != 0) { + * using cos(t) and sin(t) polynomials for |t|= 2^23, very large args: + * 7.1) Range reduction: + * k=trunc(|x|/(Pi/4)), j=(k+1)&0xfffffffe, n=k+1, t=|x|-j*Pi/4. + * 7.2) Reconstruction same as (5.2). + * 8) if x is Inf, return x-x, and set errno=EDOM. + * 9) if x is NaN, return x-x. + * + * Special cases: + * sin/cos(+-0) = +-0/1 not raising inexact/underflow, + * sin/cos(subnormal) raises inexact/underflow, + * sin/cos(min_normalized) raises inexact/underflow, + * sin/cos(normalized) raises inexact, + * sin/cos(Inf) = NaN, raises invalid, sets errno to EDOM, + * sin/cos(NaN) = NaN. 
+ */ + +#define SINF __sinf + +#define LOADFD(rd, rs, label) \ + la.local rs, label;\ + fld.d rd, rs, 0 + +#define LOADFS(rd, rs, label) \ + la.local rs, label;\ + fld.s rd, rs, 0 + +#define FTOL(rd, rs, tmp) \ + ftintrz.l.d tmp, rs;\ + movfr2gr.d rd, tmp + +#define FTOW(rd, rs, tmp) \ + ftintrz.w.d tmp, rs;\ + movfr2gr.s rd, tmp + +#define WTOF(rd, rs, tmp) \ + movgr2fr.w tmp, rs;\ + ffint.d.w rd, tmp + +#define LTOF(rd, rs, tmp) \ + movgr2fr.d tmp, rs;\ + ffint.d.l rd, tmp + +LEAF(SINF) + .align 2 + .align 3 + + /* fa0 is SP x; fa1 is DP x */ + movfr2gr.s t2, fa0 /* Bits of x */ + fcvt.d.s fa1, fa0 /* DP x */ + li.w t1, 0x7fffffff + and t0, t2, t1 /* |x| */ + li.w t1, 0x3f490fdb /* const Pi/4 */ + bltu t0, t1, L(arg_less_pio4)/* |x| < Pi/4 branch */ + li.w t1, 0x40e231d6 /* 9*Pi/4 */ + la.local t4, L(DP_) /* DP_ base addr */ + bstrpick.d t5, t2, 31, 31 /* sign of x */ + slli.w t5, t5, 3 + + /* |x| >= 9*Pi/4 branch */ + bgeu t0, t1, L(greater_or_equal_9pio4) + +/* L(median_args): */ +/* Here if Pi/4<=|x|<9*Pi/4 */ + fabs.d fa0, fa1 /* DP |x| */ + fld.d fa1, t4, 56 /* 4/Pi */ + fmul.d fa1, fa1, fa0 /* DP |x|/(Pi/4) */ + FTOW(t0, fa1, fa1) /* k=trunc(|x|/(Pi/4)) */ + la.local t1, L(PIO2J) /* base addr of PIO2J table */ + addi.w t0, t0, 1 /* k+1 */ + bstrpick.d t2, t0, 3, 1 /* j=n/2 */ + alsl.d t1, t2, t1, 3 + fld.d fa1, t1, 0 /* j*Pi/2 */ + fsub.d fa0, fa0, fa1 /* t = |x| - j * Pi/2 */ + + /* Input: t0=n fa0=t*/ + /* Input: t0=n fa0=t, t5=sign(x) */ +L(reduced): + +/* Here if cos(x) calculated using cos(t) polynomial for |t|>2)&1) + * result = s * (1.0+t^2*(C0+t^2*(C1+t^2*(C2+t^2*(C3+t^2*C4))))) + * Here if cos(x) calculated using sin(t) polynomial for |t|>2)&1) + * result = s * t * (1.0+t^2*(S0+t^2*(S1+t^2*(S2+t^2*(S3+t^2*S4))))) + */ + +/* TODO: what is the best order ??? 
*/ + +/* load-to-use latency, hardware module usage, integer pipeline & float +pipeline */ + + /* cancel branch */ + slli.w t0, t0, 1 /* (n << 1) */ + andi t1, t0, 4 /* (n << 1) & 4 */ + alsl.d t2, t1, t4, 4 /* adjust to DP_C or DP_S */ + fld.d fa3, t2, 32 /* C4 */ + andi t0, t0, 8 /* =====> (n << 1) & 8 */ + fmul.d fa1, fa0, fa0 /* y=x^2 */ + xor t0, t0, t5 /* (-1.0)^((n>>2)&1) XOR sign(x) */ + fld.d fa4, t2, 16 /* C2 */ + fmul.d fa2, fa1, fa1 /* z=x^4 */ + fld.d fa5, t2, 24 /* C3 */ + la.local t3, L(DP_ONES) /* =====> DP_ONES */ + fld.d fa6, t2, 8 /* C1 */ + fmadd.d fa4, fa2, fa3, fa4 /* cx = C2+z*C4 */ + fld.d fa3, t2, 0 /* C0 */ + fmadd.d fa5, fa2, fa5, fa6 /* cy = C1+z*C3 */ + fld.d fa6, t3, 0 /* 1.0 */ + fmadd.d fa4, fa2, fa4, fa3 /* cx = C0+z*cx */ + add.d t0, t0, t3 /* =====> addr */ + fmadd.d fa4, fa1, fa5, fa4 /* cx = cx+y*cy */ + fld.d fa2, t0, 0 /* sign */ + fmadd.d fa4, fa4, fa1, fa6 /* 1.0+y*cx */ + fmul.d fa1, fa2, fa4 /* sign * cx */ + bnez t1, L_return + + /* t*s, where s = sign(x) * (-1.0)^((n>>2)&1) */ + fmul.d fa1, fa1, fa0 + +L_return: + fcvt.s.d fa0, fa1 /* SP result */ + jr ra + +L(greater_or_equal_9pio4): + +/* Here if |x|>=9*Pi/4 */ + li.w t1, 0x7f800000 /* x is Inf or NaN? 
*/ + bgeu t0, t1, L(inf_or_nan) /* |x| >= Inf branch */ + +/* Here if finite |x|>=9*Pi/4 */ + li.w t1, 0x4b000000 /* 2^23 */ + + /* |x| >= 2^23 branch */ + bgeu t0, t1, L(greater_or_equal_2p23) + +/* Here if 9*Pi/4<=|x|<2^23 */ + fabs.d fa0, fa1 /* DP |x| */ + fld.d fa1, t4, 56 + fmul.d fa1, fa1, fa0 /* |x|/(Pi/4) */ + FTOW(t0, fa1, fa1) /* k=trunc(|x|/(Pi/4)) */ + addi.w t0, t0, 1 /* k+1 */ + srli.w t1, t0, 1 /* x=n/2 */ + WTOF(fa1, t1, fa1) /* DP x */ + fld.d fa2, t4, 104 /* -PIO2HI = high part of -Pi/2 */ + fld.d fa3, t4, 112 /* -PIO2LO = low part of -Pi/2 */ + fmadd.d fa0, fa2, fa1, fa0 /* |x| - x*PIO2HI */ + fmadd.d fa0, fa3, fa1, fa0 /* |x| - x*PIO2HI - x*PIO2LO */ + b L(reduced) + +L(greater_or_equal_2p23): + +/* Here if finite |x|>=2^23 */ + fabs.s fa5, fa0 /* SP |x| */ + + /* bitpos = (ix>>23) - BIAS_32; */ + /* TODO???srai.w eb = biased exponent of x */ + srli.w t0, t0, 23 + + /* bitpos = eb - 0x7f + 59, where 0x7f is exponent bias */ + addi.w t0, t0, -124 /* t0 = bitpos */ + + /* t3= j = bitpos/28 */ + /* x/28 = (x * ((0x100000000 / 28) + 1)) >> 32 */ + li.w t1, 0x924924a + mulh.wu t0, t1, t0 + fcvt.d.s fa5, fa5 /* Convert to double */ + + /* TODO: what is the best order ??? 
*/ + la.local t1, L(invpio4_table)/* t2 */ + alsl.d t1, t0, t1, 3 + fld.d fa0, t1, 0 /* invpio4_table[j] */ + fld.d fa1, t1, 8 /* invpio4_table[j+1] */ + fmul.d fa0, fa0, fa5 /* a = invpio4_table[j]*|x| */ + fld.d fa2, t1, 16 /* invpio4_table[j+2] */ + fmul.d fa1, fa1, fa5 /* b = invpio4_table[j+1]*|x| */ + fld.d fa3, t1, 24 /* invpio4_table[j+3] */ + fmul.d fa2, fa2, fa5 /* c = invpio4_table[j+2]*|x| */ + fmul.d fa3, fa3, fa5 /* d = invpio4_table[j+3]*|x| */ + + /* TODO: overflow check */ + /* uint64_t l = a; TODO: change the order */ + FTOL(t0, fa0, fa4) + li.w t1, -8 /* 0xfffffffffffffff8 */ + and t0, t0, t1 /* l &= ~0x7; */ + LTOF(fa4, t0, fa4) /* DP l*/ + fsub.d fa0, fa0, fa4 /* a -= l; */ + fadd.d fa4, fa0, fa1 /* fa4 double e = a + b; */ + + /* TODO: overflow check */ + FTOL(t0, fa4, fa4) /* uint64_t l = e */ + andi t2, t0, 1 /* l & 1 TODO: change the order */ + LOADFD(fa5, t1, L(DP_ONES)) /* fa5 = 1.0 */ + LTOF(fa4, t0, fa4) /* fa4 DP l */ + + /* critical!!!! the order */ + fsub.d fa0, fa0, fa4 + fld.d fa4, t4, 120 /* PI_4 */ + beqz t2, L_even_integer + +/* L_odd_integer: */ + fsub.d fa0, fa0, fa5 + fadd.d fa0, fa0, fa1 + fadd.d fa2, fa2, fa3 + fadd.d fa0, fa0, fa2 + addi.d t0, t0, 1 + fmul.d fa0, fa0, fa4 + b L(reduced) + +L_even_integer: + fadd.d fa0, fa0, fa1 + fadd.d fa2, fa2, fa3 + fadd.d fa0, fa0, fa2 + fcmp.sle.d $fcc0, fa0, fa5 + addi.d t0, t0, 1 + bcnez $fcc0, L_leq_one + +/* L_gt_one: */ + fld.d fa2, t1, 16 /* 2.0 */ + addi.d t0, t0, 1 + fsub.d fa0, fa0, fa2 +L_leq_one: + fmul.d fa0, fa0, fa4 + b L(reduced) + +L(arg_less_pio4): + +/* Here if |x| X-Patchwork-Id: 44713 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B4C73399201F for ; Thu, 19 Aug 2021 04:22:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B4C73399201F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=sourceware.org; s=default; t=1629346959; bh=6Scf6lEG5XzJfQbHKCUyG1Yi0uPwVVKI+Y/rWz5SuiM=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=c+tHk1oPZGe69Nfw1eTGCBVBxG9cLsXMfo8rvF2LTZsXU40fs6zRsxeGQm2NtRfdS WSEzx0V4TchH18aZ2IfL7eEcD4fb1wN5C9zyURg3NOWqujzoWJAW/g68BM8SYkuT7x 7ZekROnp7zbbSpYTJ0rpPD7rhJW6k6SRF49o6+yw= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-yb1-xb2e.google.com (mail-yb1-xb2e.google.com [IPv6:2607:f8b0:4864:20::b2e]) by sourceware.org (Postfix) with ESMTPS id 218EC385C412 for ; Thu, 19 Aug 2021 04:22:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 218EC385C412 Received: by mail-yb1-xb2e.google.com with SMTP id a9so6310650ybr.5 for ; Wed, 18 Aug 2021 21:22:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=6Scf6lEG5XzJfQbHKCUyG1Yi0uPwVVKI+Y/rWz5SuiM=; b=eoczE7W/sln3BhF6clv1sAXmBWB9Vd5X854G1swmFfiYKZ5C4oSWM6KMaAOEvpBpDJ 6qBtrGrE9JaQxhaF8adB3dOIXREmEHnaaL3C7zdTOowvrK2JfzInqAOjT4Rju9VCHYq8 /N2QUEN9zTE9SXUPXiogQj+BUDCdS9elQmPtgoiI2d3IxHLZ14g4G1WCoFAmNiRfoLHE ISjCuNv/nQR/BTh/ybuoxgyDm8GKxgwdoOfdEKq6mB+IPtBH14emQ/kev9DCKrrBRFAq +IEKnUNccWqDIv7lENPvxvx1nYxDylwIRBCa20NOXRmXl1eQ9Nn33io3KWCYpnKTxOsw uBOQ== X-Gm-Message-State: AOAM532XTwC++xQBPiAvQZMJ3ugLho7wSKPYUi2kbDchjSMqOg3sjC3W x6Ac4qtkAbQWQtvTa7NYVInuzLskdAfniYp+u6gi6lW7eyzk9cTX X-Google-Smtp-Source: ABdhPJxL8NbrBXNVucOXpg/SrJPIo91MIuPcKaAChAyIN5AsK0YRojNRF/FeIceOYTvU05L0Tfa1ImKhvHrZU2yukA4= X-Received: by 2002:a25:da89:: with SMTP id n131mr15894316ybf.255.1629346928579; Wed, 18 Aug 2021 21:22:08 -0700 (PDT) MIME-Version: 1.0 Date: Thu, 19 Aug 2021 12:21:55 +0800 Message-ID: Subject: [PATCH 14/14] [LoongArch] Add optimized memcpy set move To: libc-alpha@sourceware.org X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, 
DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Paul Hua via Libc-alpha From: Paul Hua Reply-To: Paul Hua Cc: Xu Chenghua , huangpei@loongson.cn, caiyinyu@loongson.cn Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" From 9a38e8f5dafe14982ab9cecc693561aee3542da2 Mon Sep 17 00:00:00 2001 From: caiyinyu Date: Tue, 27 Jul 2021 16:24:00 +0800 Subject: [PATCH 14/14] LoongArch: Add optimized memcpy/set/move * sysdeps/loongarch/lp64/memcpy.S: New file. * sysdeps/loongarch/lp64/memmove.S: Likewise. * sysdeps/loongarch/lp64/memset.S: Likewise. --- sysdeps/loongarch/lp64/memcpy.S | 420 ++++++++++++++++++++++++++ sysdeps/loongarch/lp64/memmove.S | 492 +++++++++++++++++++++++++++++++ sysdeps/loongarch/lp64/memset.S | 186 ++++++++++++ 3 files changed, 1098 insertions(+) create mode 100644 sysdeps/loongarch/lp64/memcpy.S create mode 100644 sysdeps/loongarch/lp64/memmove.S create mode 100644 sysdeps/loongarch/lp64/memset.S diff --git a/sysdeps/loongarch/lp64/memcpy.S b/sysdeps/loongarch/lp64/memcpy.S new file mode 100644 index 0000000000..cb4a406e11 --- /dev/null +++ b/sysdeps/loongarch/lp64/memcpy.S @@ -0,0 +1,420 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#ifdef _LIBC +#include +#include +#include +#else +#include +#include +#endif + +/* Allow the routine to be named something else if desired. */ +#ifndef MEMCPY_NAME +#define MEMCPY_NAME memcpy +#endif + +#define LD_64(reg, n) \ + ld.d t0, reg, n; \ + ld.d t1, reg, n+8; \ + ld.d t2, reg, n+16; \ + ld.d t3, reg, n+24; \ + ld.d t4, reg, n+32; \ + ld.d t5, reg, n+40; \ + ld.d t6, reg, n+48; \ + ld.d t7, reg, n+56; + + +#define ST_64(reg, n) \ + st.d t0, reg, n; \ + st.d t1, reg, n+8; \ + st.d t2, reg, n+16; \ + st.d t3, reg, n+24; \ + st.d t4, reg, n+32; \ + st.d t5, reg, n+40; \ + st.d t6, reg, n+48; \ + st.d t7, reg, n+56; + +#define LDST_1024 \ + LD_64(a1, 0); \ + ST_64(a0, 0); \ + LD_64(a1, 64); \ + ST_64(a0, 64); \ + LD_64(a1, 128); \ + ST_64(a0, 128); \ + LD_64(a1, 192); \ + ST_64(a0, 192); \ + LD_64(a1, 256); \ + ST_64(a0, 256); \ + LD_64(a1, 320); \ + ST_64(a0, 320); \ + LD_64(a1, 384); \ + ST_64(a0, 384); \ + LD_64(a1, 448); \ + ST_64(a0, 448); \ + LD_64(a1, 512); \ + ST_64(a0, 512); \ + LD_64(a1, 576); \ + ST_64(a0, 576); \ + LD_64(a1, 640); \ + ST_64(a0, 640); \ + LD_64(a1, 704); \ + ST_64(a0, 704); \ + LD_64(a1, 768); \ + ST_64(a0, 768); \ + LD_64(a1, 832); \ + ST_64(a0, 832); \ + LD_64(a1, 896); \ + ST_64(a0, 896); \ + LD_64(a1, 960); \ + ST_64(a0, 960); + +#ifdef ANDROID_CHANGES +LEAF(MEMCPY_NAME, 0) +#else 
+LEAF(MEMCPY_NAME) +#endif + +/* 1st var: dest ptr: void *str1 $r4 a0 */ +/* 2nd var: src ptr: void *str2 $r5 a1 */ +/* 3rd var: size_t num $r6 a2 */ +/* t0~t9 registers as temp */ + + add.d a4, a1, a2 + add.d a3, a0, a2 + move t8, a0 + move a5, a1 + srai.d a6, a2, 4 #num/16 + beqz a6, less_16bytes #num<16 + slti a6, a2, 137 + beqz a6, more_137bytes #num>=137 + srai.d a6, a2, 6 + beqz a6, less_64bytes #num<64 + + srli.d a0, a0, 3 + slli.d a0, a0, 3 + addi.d a0, a0, 0x8 + sub.d a7, t8, a0 + ld.d t0, a1, 0 + sub.d a1, a1, a7 + st.d t0, t8, 0 + + add.d a7, a7, a2 + addi.d a7, a7, -0x20 +loop_32: + ld.d t0, a1, 0 + ld.d t1, a1, 8 + ld.d t2, a1, 16 + ld.d t3, a1, 24 + st.d t0, a0, 0 + st.d t1, a0, 8 + st.d t2, a0, 16 + st.d t3, a0, 24 + + addi.d a0, a0, 0x20 + addi.d a1, a1, 0x20 + addi.d a7, a7, -0x20 + blt zero, a7, loop_32 + + ld.d t4, a4, -32 + ld.d t5, a4, -24 + ld.d t6, a4, -16 + ld.d t7, a4, -8 + st.d t4, a3, -32 + st.d t5, a3, -24 + st.d t6, a3, -16 + st.d t7, a3, -8 + + move v0, t8 + jr ra + +less_64bytes: + srai.d a6, a2, 5 + beqz a6, less_32bytes + + ld.d t0, a1, 0 + ld.d t1, a1, 8 + ld.d t2, a1, 16 + ld.d t3, a1, 24 + ld.d t4, a4, -32 + ld.d t5, a4, -24 + ld.d t6, a4, -16 + ld.d t7, a4, -8 + st.d t0, a0, 0 + st.d t1, a0, 8 + st.d t2, a0, 16 + st.d t3, a0, 24 + st.d t4, a3, -32 + st.d t5, a3, -24 + st.d t6, a3, -16 + st.d t7, a3, -8 + + jr ra + +less_32bytes: + ld.d t0, a1, 0 + ld.d t1, a1, 8 + ld.d t2, a4, -16 + ld.d t3, a4, -8 + st.d t0, a0, 0 + st.d t1, a0, 8 + st.d t2, a3, -16 + st.d t3, a3, -8 + + jr ra + +less_16bytes: + srai.d a6, a2, 3 #num/8 + beqz a6, less_8bytes + + ld.d t0, a1, 0 + ld.d t1, a4, -8 + st.d t0, a0, 0 + st.d t1, a3, -8 + + jr ra + +less_8bytes: + srai.d a6, a2, 2 + beqz a6, less_4bytes + + ld.w t0, a1, 0 + ld.w t1, a4, -4 + st.w t0, a0, 0 + st.w t1, a3, -4 + + jr ra + +less_4bytes: + srai.d a6, a2, 1 + beqz a6, less_2bytes + + ld.h t0, a1, 0 + ld.h t1, a4, -2 + st.h t0, a0, 0 + st.h t1, a3, -2 + + jr ra + +less_2bytes: + beqz a2, less_1bytes + +
ld.b t0, a1, 0 + st.b t0, a0, 0 + + jr ra + +less_1bytes: + jr ra + +more_137bytes: + li.w a6, 64 + andi t1, a0, 7 + srli.d a0, a0, 3 + andi t2, a2, 7 + slli.d a0, a0, 3 + add.d t1, t1, t2 + beqz t1, all_align + beq a0, t8, start_over + addi.d a0, a0, 0x8 + sub.d a7, t8, a0 + sub.d a1, a1, a7 + add.d a2, a7, a2 + +start_unalign_proc: + ld.d t0, a5, 0 + slli.d t0, t0, 8 + pcaddi t1, 18 + slli.d t2, a7, 3 + add.d t1, t1, t2 + jirl zero, t1, 0 + +start_7_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -7 +start_6_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -6 +start_5_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -5 +start_4_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -4 +start_3_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -3 +start_2_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -2 +start_1_unalign: + srli.d t0, t0, 8 + st.b t0, a0, -1 +start_over: + + addi.d a2, a2, -0x80 + blt a2, zero, end_unalign_proc + +loop_less: + LD_64(a1, 0) + ST_64(a0, 0) + LD_64(a1, 64) + ST_64(a0, 64) + + addi.d a0, a0, 0x80 + addi.d a1, a1, 0x80 + addi.d a2, a2, -0x80 + bge a2, zero, loop_less + +end_unalign_proc: + addi.d a2, a2, 0x80 + + pcaddi t1, 34 + andi t2, a2, 0x78 + sub.d t1, t1, t2 + jirl zero, t1, 0 + +end_120_128_unalign: + ld.d t0, a1, 112 + st.d t0, a0, 112 +end_112_120_unalign: + ld.d t0, a1, 104 + st.d t0, a0, 104 +end_104_112_unalign: + ld.d t0, a1, 96 + st.d t0, a0, 96 +end_96_104_unalign: + ld.d t0, a1, 88 + st.d t0, a0, 88 +end_88_96_unalign: + ld.d t0, a1, 80 + st.d t0, a0, 80 +end_80_88_unalign: + ld.d t0, a1, 72 + st.d t0, a0, 72 +end_72_80_unalign: + ld.d t0, a1, 64 + st.d t0, a0, 64 +end_64_72_unalign: + ld.d t0, a1, 56 + st.d t0, a0, 56 +end_56_64_unalign: + ld.d t0, a1, 48 + st.d t0, a0, 48 +end_48_56_unalign: + ld.d t0, a1, 40 + st.d t0, a0, 40 +end_40_48_unalign: + ld.d t0, a1, 32 + st.d t0, a0, 32 +end_32_40_unalign: + ld.d t0, a1, 24 + st.d t0, a0, 24 +end_24_32_unalign: + ld.d t0, a1, 16 + st.d t0, a0, 16 +end_16_24_unalign: + ld.d t0, a1, 8 + st.d t0, a0, 8 
+end_8_16_unalign: + ld.d t0, a1, 0 + st.d t0, a0, 0 +end_0_8_unalign: + + mod.d t0, a3, a6 + srli.d t1, t0, 3 + slti t0, t0, 1 + add.d t0, t0, t1 + blt zero, t0, end_8_without_cross_cache_line + + andi a2, a2, 0x7 + pcaddi t1, 18 + slli.d a2, a2, 3 + sub.d t1, t1, a2 + jirl zero, t1, 0 + +end_7_unalign: + ld.b t0, a4, -7 + st.b t0, a3, -7 +end_6_unalign: + ld.b t0, a4, -6 + st.b t0, a3, -6 +end_5_unalign: + ld.b t0, a4, -5 + st.b t0, a3, -5 +end_4_unalign: + ld.b t0, a4, -4 + st.b t0, a3, -4 +end_3_unalign: + ld.b t0, a4, -3 + st.b t0, a3, -3 +end_2_unalign: + ld.b t0, a4, -2 + st.b t0, a3, -2 +end_1_unalign: + ld.b t0, a4, -1 + st.b t0, a3, -1 +end: + move v0, t8 + jr ra + +all_align: + addi.d a2, a2, -0x20 + +align_loop_less: + ld.d t0, a1, 0 + ld.d t1, a1, 8 + ld.d t2, a1, 16 + ld.d t3, a1, 24 + st.d t0, a0, 0 + st.d t1, a0, 8 + st.d t2, a0, 16 + st.d t3, a0, 24 + + addi.d a0, a0, 0x20 + addi.d a1, a1, 0x20 + addi.d a2, a2, -0x20 + blt zero, a2, align_loop_less + + ld.d t4, a4, -32 + ld.d t5, a4, -24 + ld.d t6, a4, -16 + ld.d t7, a4, -8 + st.d t4, a3, -32 + st.d t5, a3, -24 + st.d t6, a3, -16 + st.d t7, a3, -8 + + move v0, t8 + jr ra + +end_8_without_cross_cache_line: + ld.d t0, a4, -8 + st.d t0, a3, -8 + + move v0, t8 + jr ra + +END(MEMCPY_NAME) +#ifndef ANDROID_CHANGES +#ifdef _LIBC +libc_hidden_builtin_def (MEMCPY_NAME) +#endif +#endif diff --git a/sysdeps/loongarch/lp64/memmove.S b/sysdeps/loongarch/lp64/memmove.S new file mode 100644 index 0000000000..0d35062f1b --- /dev/null +++ b/sysdeps/loongarch/lp64/memmove.S @@ -0,0 +1,492 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + Contributed by Loongson Technology Corporation Limited. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#ifdef _LIBC +#include +#include +#include +#else +#include +#include +#endif + +/* Allow the routine to be named something else if desired. */ +#ifndef MEMMOVE_NAME +#define MEMMOVE_NAME memmove +#endif + +#define LD_64(reg, n) \ + ld.d t0, reg, n; \ + ld.d t1, reg, n+8; \ + ld.d t2, reg, n+16; \ + ld.d t3, reg, n+24; \ + ld.d t4, reg, n+32; \ + ld.d t5, reg, n+40; \ + ld.d t6, reg, n+48; \ + ld.d t7, reg, n+56; + + +#define ST_64(reg, n) \ + st.d t0, reg, n; \ + st.d t1, reg, n+8; \ + st.d t2, reg, n+16; \ + st.d t3, reg, n+24; \ + st.d t4, reg, n+32; \ + st.d t5, reg, n+40; \ + st.d t6, reg, n+48; \ + st.d t7, reg, n+56; + +#define LDST_1024 \ + LD_64(a1, 0); \ + ST_64(a0, 0); \ + LD_64(a1, 64); \ + ST_64(a0, 64); \ + LD_64(a1, 128); \ + ST_64(a0, 128); \ + LD_64(a1, 192); \ + ST_64(a0, 192); \ + LD_64(a1, 256); \ + ST_64(a0, 256); \ + LD_64(a1, 320); \ + ST_64(a0, 320); \ + LD_64(a1, 384); \ + ST_64(a0, 384); \ + LD_64(a1, 448); \ + ST_64(a0, 448); \ + LD_64(a1, 512); \ + ST_64(a0, 512); \ + LD_64(a1, 576); \ + ST_64(a0, 576); \ + LD_64(a1, 640); \ + ST_64(a0, 640); \ + LD_64(a1, 704); \ + ST_64(a0, 704); \ + LD_64(a1, 768); \ + ST_64(a0, 768); \ + LD_64(a1, 832); \ + ST_64(a0, 832); \ + LD_64(a1, 896); \ + ST_64(a0, 896); \ + LD_64(a1, 960); \ + ST_64(a0, 960); + +#define LDST_1024_BACK \ + LD_64(a4, -64); \ + ST_64(a3, -64); \ + LD_64(a4, -128); \ + ST_64(a3, -128); \ + LD_64(a4, -192); \ + ST_64(a3, -192); \ + LD_64(a4, -256); \ + ST_64(a3, -256); \ + LD_64(a4, -320); \ + ST_64(a3, -320); \ + LD_64(a4, -384); \ + ST_64(a3, -384); \ + LD_64(a4, -448); \ + ST_64(a3, -448); \ + 
LD_64(a4, -512); \ + ST_64(a3, -512); \ + LD_64(a4, -576); \ + ST_64(a3, -576); \ + LD_64(a4, -640); \ + ST_64(a3, -640); \ + LD_64(a4, -704); \ + ST_64(a3, -704); \ + LD_64(a4, -768); \ + ST_64(a3, -768); \ + LD_64(a4, -832); \ + ST_64(a3, -832); \ + LD_64(a4, -896); \ + ST_64(a3, -896); \ + LD_64(a4, -960); \ + ST_64(a3, -960); \ + LD_64(a4, -1024); \ + ST_64(a3, -1024); + +#ifdef ANDROID_CHANGES +LEAF(MEMMOVE_NAME, 0) +#else +LEAF(MEMMOVE_NAME) +#endif + +/* 1st var: dest ptr: void *str1 $r4 a0 */ +/* 2nd var: src ptr: void *str2 $r5 a1 */ +/* 3rd var: size_t num $r6 a2 */ +/* t0~t9 registers as temp */ + + add.d a4, a1, a2 + add.d a3, a0, a2 + beq a1, a0, less_1bytes + move t8, a0 + srai.d a6, a2, 4 #num/16 + beqz a6, less_16bytes #num<16 + srai.d a6, a2, 6 #num/64 + bnez a6, more_64bytes #num>=64 + srai.d a6, a2, 5 + beqz a6, less_32bytes #num<32 + + ld.d t0, a1, 0 #32. */ + +#ifdef _LIBC +#include +#include +#include +#else +#include +#include +#endif + +#ifdef LOONGARCH_TEST +#define MEMSET _memset +#else +#define MEMSET memset +#endif + +#define ST_128(n) \ + st.d a1, a0, n; \ + st.d a1, a0, n+8 ; \ + st.d a1, a0, n+16 ; \ + st.d a1, a0, n+24 ; \ + st.d a1, a0, n+32 ; \ + st.d a1, a0, n+40 ; \ + st.d a1, a0, n+48 ; \ + st.d a1, a0, n+56 ; \ + st.d a1, a0, n+64 ; \ + st.d a1, a0, n+72 ; \ + st.d a1, a0, n+80 ; \ + st.d a1, a0, n+88 ; \ + st.d a1, a0, n+96 ; \ + st.d a1, a0, n+104; \ + st.d a1, a0, n+112; \ + st.d a1, a0, n+120; + +/* 1st var: void *str $4 a0 */ +/* 2nd var: int val $5 a1 */ +/* 3rd var: size_t num $6 a2 */ + +LEAF(MEMSET) + +memset: + .align 6 + + bstrins.d a1, a1, 15, 8 + add.d t7, a0, a2 + bstrins.d a1, a1, 31, 16 + move t0, a0 + bstrins.d a1, a1, 63, 32 + srai.d t8, a2, 4 #num/16 + beqz t8, less_16bytes #num<16 + srai.d t8, a2, 6 #num/64 + bnez t8, more_64bytes #num>=64 + srai.d t8, a2, 5 #num/32 + beqz t8, less_32bytes #num<32 + st.d a1, a0, 0 #32