From patchwork Thu Oct 31 05:49:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aleksandar Rikalo
X-Patchwork-Id: 99869
From: Aleksandar Rikalo
To: newlib@sourceware.org
Cc:
 Aleksandar Rikalo, Djordje Todorovic
Subject: [PATCH 17/21] mips: libc: Improve performance of strcmp implementation
Date: Thu, 31 Oct 2024 06:49:33 +0100
Message-Id: <20241031054937.68189-18-arikalo@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241031054937.68189-1-arikalo@gmail.com>
References: <20241031054937.68189-1-arikalo@gmail.com>
List-Id: Newlib mailing list

From: Faraz Shahbazker

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rikalo
---
 newlib/libc/machine/mips/strcmp.S | 222 ++++++++++++++++++------------
 1 file changed, 133 insertions(+), 89 deletions(-)

diff --git a/newlib/libc/machine/mips/strcmp.S b/newlib/libc/machine/mips/strcmp.S
index d17fef4d3..d5257bb49 100644
--- a/newlib/libc/machine/mips/strcmp.S
+++ b/newlib/libc/machine/mips/strcmp.S
@@ -2,16 +2,16 @@
  * Copyright (C) 2014-2024 MIPS Tech, LLC
  *
  * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- *
- * 1. Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright notice,
- *    this list of conditions and the following disclaimer in the documentation
- *    and/or other materials provided with the distribution.
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the copyright holder nor the names of its
- *    contributors may be used to endorse or promote products derived from this
- *    software without specific prior written permission.
+ *    contributors may be used to endorse or promote products derived from this
+ *    software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -24,7 +24,7 @@
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
-*/
+ */
 
 #ifdef ANDROID_CHANGES
 # include "machine/asm.h"
@@ -41,6 +41,10 @@
    performance loss, so we are not turning it on by default.  */
 #if defined(ENABLE_CLZ) && (__mips_isa_rev > 1)
 # define USE_CLZ
+#elif (__mips_isa_rev >= 2)
+# define USE_EXT 1
+#else
+# define USE_EXT 0
 #endif
 
 /* Some asm.h files do not have the L macro definition.  */
@@ -61,6 +65,10 @@
 # endif
 #endif
 
+/* Haven't yet found a configuration where DSP code outperforms
+   normal assembly.  */
+#define __mips_using_dsp 0
+
 /* Allow the routine to be named something else if desired.  */
 #ifndef STRCMP_NAME
 # define STRCMP_NAME strcmp
@@ -72,25 +80,36 @@ LEAF(STRCMP_NAME, 0)
 LEAF(STRCMP_NAME)
 #endif
 	.set	nomips16
-
 	or	t0, a0, a1
-	andi	t0,0x3
+	andi	t0, t0, 0x3
 	bne	t0, zero, L(byteloop)
 
 /* Both strings are 4 byte aligned at this point.  */
 
 	li	t8, 0x01010101
+#if !__mips_using_dsp
 	li	t9, 0x7f7f7f7f
+#endif
 
-#define STRCMP32(OFFSET) \
-	lw	vt0, OFFSET(a0); \
-	lw	vt1, OFFSET(a1); \
-	subu	t0, vt0, t8; \
-	nor	t1, vt0, t9; \
-	bne	vt0, vt1, L(worddiff); \
-	and	t0, t0, t1; \
+#if __mips_using_dsp
+# define STRCMP32(OFFSET) \
+	lw	a2, OFFSET(a0); \
+	lw	a3, OFFSET(a1); \
+	subu_s.qb t0, t8, a2; \
+	bne	a2, a3, L(worddiff); \
 	bne	t0, zero, L(returnzero)
+#else /* !__mips_using_dsp */
+# define STRCMP32(OFFSET) \
+	lw	a2, OFFSET(a0); \
+	lw	a3, OFFSET(a1); \
+	subu	t0, a2, t8; \
+	nor	t1, a2, t9; \
+	bne	a2, a3, L(worddiff); \
+	and	t1, t0, t1; \
+	bne	t1, zero, L(returnzero)
+#endif /* __mips_using_dsp */
 
+	.align 2
 L(wordloop):
 	STRCMP32(0)
 	STRCMP32(4)
@@ -99,118 +118,143 @@ L(wordloop):
 	STRCMP32(16)
 	STRCMP32(20)
 	STRCMP32(24)
-	lw	vt0, 28(a0)
-	lw	vt1, 28(a1)
-	subu	t0, vt0, t8
-	nor	t1, vt0, t9
-	bne	vt0, vt1, L(worddiff)
-	and	t0, t0, t1
+	lw	a2, 28(a0)
+	lw	a3, 28(a1)
+#if __mips_using_dsp
+	subu_s.qb t0, t8, a2
+#else
+	subu	t0, a2, t8
+	nor	t1, a2, t9
+	and	t1, t0, t1
+#endif
+
 	PTR_ADDIU a0, a0, 32
-	bne	t0, zero, L(returnzero)
+	bne	a2, a3, L(worddiff)
 	PTR_ADDIU a1, a1, 32
-	b	L(wordloop)
+	beq	t1, zero, L(wordloop)
 
 L(returnzero):
 	move	va0, zero
 	jr	ra
 
+	.align 2
 L(worddiff):
 #ifdef USE_CLZ
-	subu	t0, vt0, t8
-	nor	t1, vt0, t9
-	and	t1, t0, t1
-	xor	t0, vt0, vt1
+	xor	t0, a2, a3
 	or	t0, t0, t1
 # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 	wsbh	t0, t0
 	rotr	t0, t0, 16
-# endif
+# endif /* LITTLE_ENDIAN */
 	clz	t1, t0
-	and	t1, 0xf8
-# if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
-	neg	t1
-	addu	t1, 24
+	or	t0, t1, 24	/* Only care about multiples of 8.  */
+	xor	t1, t1, t0	/* {0,8,16,24} => {24,16,8,0}  */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	sllv	a2,a2,t1
+	sllv	a3,a3,t1
+# else
+	srlv	a2,a2,t1
+	srlv	a3,a3,t1
 # endif
-	rotrv	vt0, vt0, t1
-	rotrv	vt1, vt1, t1
-	and	vt0, vt0, 0xff
-	and	vt1, vt1, 0xff
-	subu	va0, vt0, vt1
+	subu	va0, a2, a3
 	jr	ra
 #else /* USE_CLZ */
 # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-	andi	t0, vt0, 0xff
-	andi	t1, vt1, 0xff
-	beq	t0, zero, L(wexit01)
-	srl	t8, vt0, 8
-	bne	t0, t1, L(wexit01)
-
-	srl	t9, vt1, 8
+	andi	a0, a2, 0xff	/* abcd => d */
+	andi	a1, a3, 0xff
+	beq	a0, zero, L(wexit01)
+# if USE_EXT
+	ext	t8, a2, 8, 8
+	bne	a0, a1, L(wexit01)
+	ext	t9, a3, 8, 8
+	beq	t8, zero, L(wexit89)
+	ext	a0, a2, 16, 8
+	bne	t8, t9, L(wexit89)
+	ext	a1, a3, 16, 8
+# else /* !USE_EXT */
+	srl	t8, a2, 8
+	bne	a0, a1, L(wexit01)
+	srl	t9, a3, 8
 	andi	t8, t8, 0xff
 	andi	t9, t9, 0xff
 	beq	t8, zero, L(wexit89)
-	srl	t0, vt0, 16
+	srl	a0, a2, 16
 	bne	t8, t9, L(wexit89)
+	srl	a1, a3, 16
+	andi	a0, a0, 0xff
+	andi	a1, a1, 0xff
+# endif /* !USE_EXT */
 
-	srl	t1, vt1, 16
-	andi	t0, t0, 0xff
-	andi	t1, t1, 0xff
-	beq	t0, zero, L(wexit01)
-	srl	t8, vt0, 24
-	bne	t0, t1, L(wexit01)
-
-	srl	t9, vt1, 24
 # else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
-	srl	t0, vt0, 24
-	srl	t1, vt1, 24
-	beq	t0, zero, L(wexit01)
-	srl	t8, vt0, 16
-	bne	t0, t1, L(wexit01)
+	srl	a0, a2, 24	/* abcd => a */
+	srl	a1, a3, 24
+	beq	a0, zero, L(wexit01)
 
-	srl	t9, vt1, 16
+# if USE_EXT
+	ext	t8, a2, 16, 8
+	bne	a0, a1, L(wexit01)
+	ext	t9, a3, 16, 8
+	beq	t8, zero, L(wexit89)
+	ext	a0, a2, 8, 8
+	bne	t8, t9, L(wexit89)
+	ext	a1, a3, 8, 8
+# else /* ! USE_EXT */
+	srl	t8, a2, 16
+	bne	a0, a1, L(wexit01)
+	srl	t9, a3, 16
 	andi	t8, t8, 0xff
 	andi	t9, t9, 0xff
 	beq	t8, zero, L(wexit89)
-	srl	t0, vt0, 8
+	srl	a0, a2, 8
 	bne	t8, t9, L(wexit89)
+	srl	a1, a3, 8
+	andi	a0, a0, 0xff
+	andi	a1, a1, 0xff
+# endif /* USE_EXT */
 
-	srl	t1, vt1, 8
-	andi	t0, t0, 0xff
-	andi	t1, t1, 0xff
-	beq	t0, zero, L(wexit01)
-	andi	t8, vt0, 0xff
-	bne	t0, t1, L(wexit01)
-
-	andi	t9, vt1, 0xff
 # endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
 
+	beq	a0, zero, L(wexit01)
+	bne	a0, a1, L(wexit01)
+
+	/* The other bytes are identical, so just subtract the 2 words
+	   and return the difference.  */
+	move	a0, a2
+	move	a1, a3
+
+L(wexit01):
+	subu	va0, a0, a1
+	jr	ra
+
 L(wexit89):
 	subu	va0, t8, t9
 	jr	ra
-L(wexit01):
-	subu	va0, t0, t1
-	jr	ra
+
 #endif /* USE_CLZ */
 
+#define DELAY_NOP nop
+
 /* It might seem better to do the 'beq' instruction between the two
    'lbu' instructions so that the nop is not needed but testing showed
    that this code is actually faster (based on glibc strcmp test).  */
+
 #define BYTECMP01(OFFSET) \
-	lbu	vt1, OFFSET(a1); \
-	nop; \
-	beq	vt0, zero, L(bexit01); \
+	lbu	a3, OFFSET(a1); \
+	DELAY_NOP; \
+	beq	a2, zero, L(bexit01); \
 	lbu	t8, OFFSET+1(a0); \
-	bne	vt0, vt1, L(bexit01)
+	bne	a2, a3, L(bexit01)
 
 #define BYTECMP89(OFFSET) \
 	lbu	t9, OFFSET(a1); \
-	nop; \
+	DELAY_NOP; \
 	beq	t8, zero, L(bexit89); \
-	lbu	vt0, OFFSET+1(a0); \
+	lbu	a2, OFFSET+1(a0); \
 	bne	t8, t9, L(bexit89)
 
+	.align 2
 L(byteloop):
-	lbu	vt0, 0(a0)
+	lbu	a2, 0(a0)
 	BYTECMP01(0)
 	BYTECMP89(1)
 	BYTECMP01(2)
@@ -219,20 +263,20 @@ L(byteloop):
 	BYTECMP89(5)
 	BYTECMP01(6)
 	lbu	t9, 7(a1)
-	nop
-	beq	t8, zero, L(bexit89)
+
 	PTR_ADDIU a0, a0, 8
-	bne	t8, t9, L(bexit89)
+	beq	t8, zero, L(bexit89)
 	PTR_ADDIU a1, a1, 8
-	b	L(byteloop)
+	beq	t8, t9, L(byteloop)
 
-L(bexit01):
-	subu	va0, vt0, vt1
-	jr	ra
 
 L(bexit89):
 	subu	va0, t8, t9
 	jr	ra
 
+L(bexit01):
+	subu	va0, a2, a3
+	jr	ra
+
 	.set	at
 	END(STRCMP_NAME)
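
Reader's note on the word loop: the non-DSP STRCMP32 macro in this patch decides whether the word loaded from a0 contains a NUL byte with a subu/nor/and sequence against the constants held in t8 (0x01010101) and t9 (0x7f7f7f7f). The following is a minimal C sketch of that test, written only to illustrate the arithmetic; it is not code from the patch, and the names has_zero_byte and check_has_zero_byte are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: a C model of the
   zero-byte test performed by the non-DSP STRCMP32 macro.
   Returns nonzero when at least one byte of w is 0x00.  */
static uint32_t has_zero_byte(uint32_t w)
{
    uint32_t t0 = w - 0x01010101u;      /* subu t0, a2, t8 */
    uint32_t t1 = ~(w | 0x7f7f7f7fu);   /* nor  t1, a2, t9 */
    return t0 & t1;                     /* and  t1, t0, t1 */
}

/* Small self-check over a few hand-picked words (illustrative only).  */
static void check_has_zero_byte(void)
{
    assert(has_zero_byte(0x61626364u) == 0);  /* "abcd": no NUL byte    */
    assert(has_zero_byte(0x61620064u) != 0);  /* one NUL byte inside    */
    assert(has_zero_byte(0x00000000u) != 0);  /* all bytes NUL          */
    assert(has_zero_byte(0x80808080u) == 0);  /* high bits set, no NUL  */
}
```

A nonzero result corresponds to the case where the macro branches to L(returnzero): the two words have already compared equal, and the word contains the string terminator.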