From patchwork Fri Mar 7 10:23:03 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: R Vidya
X-Patchwork-Id: 10
X-Patchwork-Delegate: azanella@linux.vnet.ibm.com
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Message-ID: <53199E07.6040009@linux.vnet.ibm.com>
Date: Fri, 07 Mar 2014 15:53:03 +0530
From: R Vidya
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: Adhemerval Zanella, libc-alpha@sourceware.org
Subject: Re: [PATCH] Multiarch optimization for strspn on POWERPC
References: <5315CD58.5080002@linux.vnet.ibm.com> <53163BD1.5000106@linux.vnet.ibm.com>
In-Reply-To: <53163BD1.5000106@linux.vnet.ibm.com>

This patch has to be applied on top of strncat for multiarch support.

On Wednesday 05 March 2014 02:17 AM, Adhemerval Zanella wrote:
> Hi Vidya,
>
> The patch looks good, just some comments below.
>
>
> On 04-03-2014 09:55, R Vidya wrote:
>> diff --git a/string/strspn.c b/string/strspn.c
>> index 37e8161..557eec5 100644
>> --- a/string/strspn.c
>> +++ b/string/strspn.c
>> @@ -18,11 +18,14 @@
>>  #include
>>
>>  #undef strspn
>> +#ifndef STRSPN
>> +#define STRSPN strspn
>> +#endif
>>
>>  /* Return the length of the maximum initial segment
>>     of S which contains only characters in ACCEPT. */
>>  size_t
>> -strspn (s, accept)
>> +STRSPN (s, accept)
>>       const char *s;
>>       const char *accept;
>>  {
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
>> index d09f2e3..9821101 100644
>> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
>> @@ -14,7 +14,7 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
>>                     wcsrchr-ppc64 wcscpy-power7 wcscpy-power6 wcscpy-ppc64 \
>>                     wordcopy-power7 wordcopy-power6 wordcopy-ppc64 \
>>                     strcpy-power7 strcpy-ppc64 stpcpy-power7 stpcpy-ppc64 \
>> -                   strrchr-power7 strrchr-ppc64
>> +                   strrchr-power7 strrchr-ppc64 strspn-power7 strspn-ppc64
>>
>>  CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
>>  CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
>> index 8789483..7cf60f6 100644
>> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
>> @@ -246,5 +246,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>>                IFUNC_IMPL_ADD (array, i, strrchr, 1,
>>                                __strrchr_ppc))
>>
>> +  /* Support sysdeps/powerpc/powerpc64/multiarch/strspn.c. */
>> +  IFUNC_IMPL (i, name, strspn,
>> +              IFUNC_IMPL_ADD (array, i, strspn,
>> +                              hwcap & PPC_FEATURE_HAS_VSX,
>> +                              __strspn_power7)
>> +              IFUNC_IMPL_ADD (array, i, strspn, 1,
>> +                              __strspn_ppc))
>> +
>>    return i;
>>  }
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S
>> new file mode 100644
>> index 0000000..889dfee
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S
>> @@ -0,0 +1,40 @@
>> +/* Optimized strspn implementation for POWER7.
>> +   Copyright (C) 2014 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   . */
>> +
>> +#include
>> +
>> +#undef EALIGN
>> +#define EALIGN(name, alignt, words) \
>> +  .section ".text"; \
>> +  ENTRY_2(__strspn_power7) \
>> +  .align ALIGNARG(alignt); \
>> +  EALIGN_W_##words; \
>> +  BODY_LABEL(__strspn_power7): \
>> +  cfi_startproc; \
>> +  LOCALENTRY(__strspn_power7)
>> +
>> +#undef END
>> +#define END(name) \
>> +  cfi_endproc; \
>> +  TRACEBACK(__strspn_power7) \
>> +  END_2(__strspn_power7)
>> +
>> +#undef libc_hidden_builtin_def
>> +#define libc_hidden_builtin_def(name)
>> +
>> +#include
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c
>> new file mode 100644
>> index 0000000..d543772
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c
>> @@ -0,0 +1,33 @@
>> +/* Copyright (C) 2014 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   . */
>> +
>> +#include
>> +
>> +#define STRSPN __strspn_ppc
>> +#undef weak_alias
>> +#define weak_alias(name, aliasname) \
>> +  extern __typeof (__strspn_ppc) aliasname \
>> +    __attribute__ ((weak, alias ("__strspn_ppc")));
>> +#if !defined(NOT_IN_libc) && defined(SHARED)
>> +# undef libc_hidden_builtin_def
>> +# define libc_hidden_builtin_def(name) \
>> +  __hidden_ver1(__strspn_ppc, __GI_strspn, __strspn_ppc);
>> +#endif
>> +
>> +extern __typeof (strspn) __strspn_ppc attribute_hidden;
>> +
>> +#include
>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn.c b/sysdeps/powerpc/powerpc64/multiarch/strspn.c
>> new file mode 100644
>> index 0000000..44945f3
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn.c
>> @@ -0,0 +1,31 @@
>> +/* Multiple versions of strspn. PowerPC64 version.
>> +   Copyright (C) 2014 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   . */
>> +
>> +#if defined SHARED && !defined NOT_IN_libc
>> +# include
>> +# include
>> +# include "init-arch.h"
>> +
>> +extern __typeof (strspn) __strspn_ppc attribute_hidden;
>> +extern __typeof (strspn) __strspn_power7 attribute_hidden;
>> +
>> +libc_ifunc (strspn,
>> +            (hwcap & PPC_FEATURE_HAS_VSX)
>> +            ? __strspn_power7
>> +            : __strspn_ppc);
>> +#endif
>> diff --git a/sysdeps/powerpc/powerpc64/power7/strspn.S b/sysdeps/powerpc/powerpc64/power7/strspn.S
>> new file mode 100644
>> index 0000000..a891768
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/power7/strspn.S
>> @@ -0,0 +1,149 @@
>> +/* Optimized strspn implementation for PowerPC64/POWER7.
>> +
>> +   Copyright (C) 2014 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   . */
>> +
>> +/* size_t [r3] strspn (const char *string [r3],
>> +                       const char *needleAccept [r4] */
>> +
>> +/* Performance gains are grabbed through following techniques:
>> +
>> +   > hashing of needle.
>> +   > hashing avoids scanning of duplicate entries in needle across the string.
>> +   > initializing the hash table with Vector instructions by quadword access.
>> +   > unrolling when scanning for character in string across hash table. */
>> +
>> +/* Algorithm is as below:
>> +   1. A empty hash table/dictionary is created comprising of 256 ascii character set
>> +   2. When hash entry is found in needle , the hash index is initialized to 1
>> +   3. The string is scanned until end and for every character, its corresponding
>> +      hash index is compared.
>> +   4. initial length of string (count) until first hit of accept needle to be found
>> +      is set to 0
>> +   4. If hash index is set to 1 for the index of string, count is returned.
>> +   5. Otherwise count is incremented and scanning continues until end of string.
>> +*/
>> +
>> +#include
>> +
>> +#undef strspn
>> +
>> +        .machine power7
>> +EALIGN(strspn, 4, 0)
>> +        CALL_MCOUNT 2
>> +
>> +        lbz     r10, 0(r4)      /* load r10 with needle (r4) */
> Comments with double-space ending, fix for the remaining ones.
>
>> +        addi    r9, r1, -272    /* r9 is a hash of 272 bytes */
> This is not really correct, check https://sourceware.org/ml/libc-alpha/2014-03/msg00071.html
>
>> +
>> +        li      r5, 16          /* set r5 = 16 as offset */
>> +        li      r6, 32          /* set r6 = 32 as offset */
>> +        li      r8, 48          /* set r8 = 48 as offset */
>> +
>> +/*Iniatliaze hash table with Zeroes in double indexed quadword accesses */
>> +        xxlxor  v0, v0, v0      /* prepare for initializing hash */
>> +
>> +        stxvd2x v0, r0, r9      /* initialize 1st quadword */
>> +        stxvd2x v0, r9, r5
>> +        stxvd2x v0, r9, r6
>> +        stxvd2x v0, r9, r8      /* initialize 4th quadword */
>> +
>> +        addi    r11, r9, 64     /* r11 is index to hash */
>> +
>> +        stxvd2x v0, r0, r11     /* initialize 5th quadword */
>> +        stxvd2x v0, r11, r5
>> +        stxvd2x v0, r11, r6
>> +        stxvd2x v0, r11, r8     /* initialize 8th quadword */
>> +
>> +        addi    r11, r9, 128    /* r11 is index to hash */
>> +
>> +        stxvd2x v0, r0, r11     /* initialize 9th quadword */
>> +        stxvd2x v0, r11, r5
>> +        stxvd2x v0, r11, r6
>> +        stxvd2x v0, r11, r8     /* initialize 12th quadword */
>> +
>> +        addi    r11, r9, 192    /* r11 is index to hash */
>> +
>> +        stxvd2x v0, r0, r11     /* initialize 13th quadword */
>> +        stxvd2x v0, r11, r5
>> +        stxvd2x v0, r11, r6
>> +        stxvd2x v0, r11, r8     /* initialize 16th quadword */
>> +
>> +        li      r8, 1           /* r8=1, marker into hash if found in needle */
>> +
>> +        cmpdi   cr7, r10, 0     /* accept needle is NULL */
>> +        beq     cr7, L(skipHashing)     /* if needle is NULL, skip hashing */
>> +
>> +        .p2align 4              /* align section to 16 byte boundary */
>> +L(hashing):
>> +        stbx    r8, r9, r10     /* update hash with marker for the pivot of the needle */
>> +        lbzu    r10, 1(r4)      /* load needle into r10 and update to next */
>> +        cmpdi   cr7, r10, 0     /* if needle is has reached NULL, continue */
>> +        bne     cr7, L(hashing) /* loop to hash the needle */
>> +
>> +L(skipHashing):
>> +        li      r10, 0          /* load counter = 0 */
>> +        b       L(beginScan)
>> +
>> +        .p2align 4              /* align section to 16 byte boundary */
>> +L(scanUnroll):
>> +        lbzx    r8, r9, r8      /* load r8 with hash value at index */
>> +        cmpwi   cr7, r8, 0      /* if we hit marker in hash, we have found accept needle */
>> +        beq     cr7, L(ret1stIndex)     /* we have hit accept needle, return the count */
>> +
>> +        lbz     r8, 1(r3)       /* load string[1] into r8 */
>> +        addi    r10, r10, 4     /* increment counter */
>> +        lbzx    r8, r9, r8      /* load r8 with hash value at index */
>> +        cmpwi   cr7, r8, 0      /* if we hit marker in hash, we have found accept needle */
>> +        beq     cr7, L(ret2ndIndex)     /* we have hit accept needle, return the count */
>> +
>> +        lbz     r8, 2(r3)       /* load string[2] into r8 */
>> +        lbzx    r8, r9, r8      /* load r8 with hash value at index */
>> +        cmpwi   cr7, r8, 0      /* if we hit marker in hash, we have found accept needle */
>> +        beq     cr7, L(ret3rdIndex)     /* we have hit accept needle, return the count */
>> +
>> +        lbz     r8, 3(r3)       /* load string[3] into r8 */
>> +        lbzx    r8, r9, r8      /* load r8 with hash value at index */
>> +        addi    r3, r3, 4       /* unroll factor , increment string by 4 */
>> +        cmpwi   cr7, r8, 0      /* if we hit marker in hash, we have found accept needle */
>> +        beq     cr7,L(ret4thIndex)      /* we have hit accept needle, return the count */
>> +
>> +L(beginScan):
>> +        lbz     r8, 0(r3)       /* load string[0] into r8 */
>> +        addi    r6, r10, 1      /* place holder for counter + 1 */
>> +        addi    r5, r10, 2      /* place holder for counter + 2 */
>> +        addi    r4, r10, 3      /* place holder for counter + 3 */
>> +        cmpdi   cr7, r8, 0      /* if we hit marker in hash, we have found accept needle */
>> +        bne     cr7, L(scanUnroll)      /* continue scanning */
>> +
>> +L(ret1stIndex):
>> +        mr      r3, r10         /* update r3 for return */
>> +        blr                     /* return */
>> +
>> +L(ret2ndIndex):
>> +        mr      r3, r6          /* update r3 for return */
>> +        blr                     /* return */
>> +
>> +L(ret3rdIndex):
>> +        mr      r3, r5          /* update r3 for return */
>> +        blr                     /* return */
>> +
>> +L(ret4thIndex):
>> +        mr      r3, r4          /* update r3 for return */
>> +        blr                     /* done */
>> +
>> +END(strspn)
>> +libc_hidden_builtin_def (strspn)

>From 48e7a71efeac09d77ad2bdefca886f61a59cbf39 Mon Sep 17 00:00:00 2001
From: Vidya Ranganathan
Date: Fri, 7 Mar 2014 03:15:06 -0500
Subject: [PATCH] Multiarch optimization for strspn() on PowerPC.

I have attached the benchtest output to show the performance improvement.

The optimization is achieved by the following techniques:
  > hashing of needle.
  > hashing avoids rescanning the needle, including its duplicate entries,
    for every character of the string.
  > initializing the hash table with Vector instructions (VSX) by quadword access.
  > unrolling when scanning for character in string across hash table.

ChangeLog:

2014-03-04  Vidya Ranganathan

        * sysdeps/powerpc/powerpc64/power7/strspn.S: New file: Optimization.
        * sysdeps/powerpc/powerpc64/multiarch/strspn.c: New file:
        multiarch strspn for PPC64.
        * sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: New file.
        * sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S: New file.
        * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c:
        (__libc_ifunc_impl_list): Add strspn multiarch implementations.
        * sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strspn
        multiarch optimizations.
        * string/strspn.c (strspn): Use the STRSPN macro to redefine the
        symbol name.

Signed-off-by: Vidya Ranganathan
---
 string/strspn.c                                    |   5 +-
 sysdeps/powerpc/powerpc64/multiarch/Makefile       |   3 +-
 .../powerpc/powerpc64/multiarch/ifunc-impl-list.c  |   8 ++
 .../powerpc/powerpc64/multiarch/strspn-power7.S    |  40 ++++++
 sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c |  33 +++++
 sysdeps/powerpc/powerpc64/multiarch/strspn.c       |  31 +++++
 sysdeps/powerpc/powerpc64/power7/strspn.S          | 148 +++++++++++++++++++++
 7 files changed, 266 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn.c
 create mode 100644 sysdeps/powerpc/powerpc64/power7/strspn.S

diff --git a/string/strspn.c b/string/strspn.c
index 37e8161..557eec5 100644
--- a/string/strspn.c
+++ b/string/strspn.c
@@ -18,11 +18,14 @@
 #include

 #undef strspn
+#ifndef STRSPN
+#define STRSPN strspn
+#endif

 /* Return the length of the maximum initial segment
    of S which contains only characters in ACCEPT. */
 size_t
-strspn (s, accept)
+STRSPN (s, accept)
      const char *s;
      const char *accept;
 {
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index a03569e..3e8010c 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -14,7 +14,8 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
                    wcsrchr-ppc64 wcscpy-power7 wcscpy-power6 wcscpy-ppc64 \
                    wordcopy-power7 wordcopy-power6 wordcopy-ppc64 \
                    strcpy-power7 strcpy-ppc64 stpcpy-power7 stpcpy-ppc64 \
-                   strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64
+                   strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64 \
+                   strspn-power7 strspn-ppc64

 CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
 CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 11a3bed..20d7918 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -254,5 +254,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               IFUNC_IMPL_ADD (array, i, strncat, 1,
                               __strncat_ppc))

+  /* Support sysdeps/powerpc/powerpc64/multiarch/strspn.c. */
+  IFUNC_IMPL (i, name, strspn,
+              IFUNC_IMPL_ADD (array, i, strspn,
+                              hwcap & PPC_FEATURE_HAS_VSX,
+                              __strspn_power7)
+              IFUNC_IMPL_ADD (array, i, strspn, 1,
+                              __strspn_ppc))
+
   return i;
 }
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S
new file mode 100644
index 0000000..889dfee
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S
@@ -0,0 +1,40 @@
+/* Optimized strspn implementation for POWER7.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+
+#undef EALIGN
+#define EALIGN(name, alignt, words) \
+  .section ".text"; \
+  ENTRY_2(__strspn_power7) \
+  .align ALIGNARG(alignt); \
+  EALIGN_W_##words; \
+  BODY_LABEL(__strspn_power7): \
+  cfi_startproc; \
+  LOCALENTRY(__strspn_power7)
+
+#undef END
+#define END(name) \
+  cfi_endproc; \
+  TRACEBACK(__strspn_power7) \
+  END_2(__strspn_power7)
+
+#undef libc_hidden_builtin_def
+#define libc_hidden_builtin_def(name)
+
+#include
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c
new file mode 100644
index 0000000..d543772
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c
@@ -0,0 +1,33 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+
+#define STRSPN __strspn_ppc
+#undef weak_alias
+#define weak_alias(name, aliasname) \
+  extern __typeof (__strspn_ppc) aliasname \
+    __attribute__ ((weak, alias ("__strspn_ppc")));
+#if !defined(NOT_IN_libc) && defined(SHARED)
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(name) \
+  __hidden_ver1(__strspn_ppc, __GI_strspn, __strspn_ppc);
+#endif
+
+extern __typeof (strspn) __strspn_ppc attribute_hidden;
+
+#include
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn.c b/sysdeps/powerpc/powerpc64/multiarch/strspn.c
new file mode 100644
index 0000000..44945f3
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strspn.c
@@ -0,0 +1,31 @@
+/* Multiple versions of strspn. PowerPC64 version.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#if defined SHARED && !defined NOT_IN_libc
+# include
+# include
+# include "init-arch.h"
+
+extern __typeof (strspn) __strspn_ppc attribute_hidden;
+extern __typeof (strspn) __strspn_power7 attribute_hidden;
+
+libc_ifunc (strspn,
+            (hwcap & PPC_FEATURE_HAS_VSX)
+            ? __strspn_power7
+            : __strspn_ppc);
+#endif
diff --git a/sysdeps/powerpc/powerpc64/power7/strspn.S b/sysdeps/powerpc/powerpc64/power7/strspn.S
new file mode 100644
index 0000000..876e9cc
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power7/strspn.S
@@ -0,0 +1,148 @@
+/* Optimized strspn implementation for PowerPC64/POWER7.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+/* size_t [r3] strspn (const char *string [r3],
+                       const char *needleAccept [r4]) */
+
+/* Performance gains are obtained through the following techniques:
+
+   > hashing of needle.
+   > hashing avoids rescanning the needle, including its duplicate entries, for every character of the string.
+   > initializing the hash table with Vector instructions by quadword access.
+   > unrolling when scanning for character in string across hash table. */
+
+/* Algorithm is as below:
+   1. An empty hash table/dictionary is created with one entry for each of the 256 possible byte values.
+   2. For every character found in needle, its hash entry is set to 1.
+   3. The string is scanned until end and for every character, its corresponding
+      hash entry is checked.
+   4. The initial length of string (count) made up only of accept characters
+      is set to 0.
+   5. If the hash entry for the current string character is 0 (not in accept), count is returned.
+   6. Otherwise count is incremented and scanning continues until end of string. */
+
+#include
+
+#undef strspn
+
+        .machine power7
+EALIGN(strspn, 4, 0)
+        CALL_MCOUNT 2
+
+        lbz     r10, 0(r4)      /* load first character of needle (r4) into r10 */
+        addi    r9, r1, -256    /* r9 points to a 256 byte hash table below the stack pointer */
+
+        li      r5, 16          /* set r5 = 16 as offset */
+        li      r6, 32          /* set r6 = 32 as offset */
+        li      r8, 48          /* set r8 = 48 as offset */
+
+/* Initialize hash table with zeroes using indexed quadword stores */
+        xxlxor  v0, v0, v0      /* prepare for initializing hash */
+
+        stxvd2x v0, r0, r9      /* initialize 1st quadword */
+        stxvd2x v0, r9, r5
+        stxvd2x v0, r9, r6
+        stxvd2x v0, r9, r8      /* initialize 4th quadword */
+
+        addi    r11, r9, 64     /* r11 is index to hash */
+
+        stxvd2x v0, r0, r11     /* initialize 5th quadword */
+        stxvd2x v0, r11, r5
+        stxvd2x v0, r11, r6
+        stxvd2x v0, r11, r8     /* initialize 8th quadword */
+
+        addi    r11, r9, 128    /* r11 is index to hash */
+
+        stxvd2x v0, r0, r11     /* initialize 9th quadword */
+        stxvd2x v0, r11, r5
+        stxvd2x v0, r11, r6
+        stxvd2x v0, r11, r8     /* initialize 12th quadword */
+
+        addi    r11, r9, 192    /* r11 is index to hash */
+
+        stxvd2x v0, r0, r11     /* initialize 13th quadword */
+        stxvd2x v0, r11, r5
+        stxvd2x v0, r11, r6
+        stxvd2x v0, r11, r8     /* initialize 16th quadword */
+
+        li      r8, 1           /* r8=1, marker into hash if found in needle */
+
+        cmpdi   cr7, r10, 0     /* is the accept needle empty? */
+        beq     cr7, L(skipHashing)     /* if needle is empty, skip hashing */
+
+        .p2align 4              /* align loop to 16 byte boundary */
+L(hashing):
+        stbx    r8, r9, r10     /* set marker in hash for the current needle character */
+        lbzu    r10, 1(r4)      /* advance r4 and load next needle character into r10 */
+        cmpdi   cr7, r10, 0     /* has the needle reached its terminating NUL? */
+        bne     cr7, L(hashing) /* loop to hash the needle */
+
+L(skipHashing):
+        li      r10, 0          /* load counter = 0 */
+        b       L(beginScan)
+
+        .p2align 4              /* align loop to 16 byte boundary */
+L(scanUnroll):
+        lbzx    r8, r9, r8      /* load r8 with hash value at index */
+        cmpwi   cr7, r8, 0      /* no marker in hash means the character is not in accept */
+        beq     cr7, L(ret1stIndex)     /* character not in accept, return the count */
+
+        lbz     r8, 1(r3)       /* load string[1] into r8 */
+        addi    r10, r10, 4     /* increment counter by the unroll factor */
+        lbzx    r8, r9, r8      /* load r8 with hash value at index */
+        cmpwi   cr7, r8, 0      /* no marker in hash means the character is not in accept */
+        beq     cr7, L(ret2ndIndex)     /* character not in accept, return the count */
+
+        lbz     r8, 2(r3)       /* load string[2] into r8 */
+        lbzx    r8, r9, r8      /* load r8 with hash value at index */
+        cmpwi   cr7, r8, 0      /* no marker in hash means the character is not in accept */
+        beq     cr7, L(ret3rdIndex)     /* character not in accept, return the count */
+
+        lbz     r8, 3(r3)       /* load string[3] into r8 */
+        lbzx    r8, r9, r8      /* load r8 with hash value at index */
+        addi    r3, r3, 4       /* unroll factor, increment string by 4 */
+        cmpwi   cr7, r8, 0      /* no marker in hash means the character is not in accept */
+        beq     cr7,L(ret4thIndex)      /* character not in accept, return the count */
+
+L(beginScan):
+        lbz     r8, 0(r3)       /* load string[0] into r8 */
+        addi    r6, r10, 1      /* place holder for counter + 1 */
+        addi    r5, r10, 2      /* place holder for counter + 2 */
+        addi    r4, r10, 3      /* place holder for counter + 3 */
+        cmpdi   cr7, r8, 0      /* have we reached the end of the string? */
+        bne     cr7, L(scanUnroll)      /* continue scanning */
+
+L(ret1stIndex):
+        mr      r3, r10         /* update r3 for return */
+        blr                     /* return */
+
+L(ret2ndIndex):
+        mr      r3, r6          /* update r3 for return */
+        blr                     /* return */
+
+L(ret3rdIndex):
+        mr      r3, r5          /* update r3 for return */
+        blr                     /* return */
+
+L(ret4thIndex):
+        mr      r3, r4          /* update r3 for return */
+        blr                     /* done */
+
+END(strspn)
+libc_hidden_builtin_def (strspn)
--
1.8.3.1
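
For anyone following the algorithm comment in power7/strspn.S, here is a
rough C sketch of the same table-lookup idea. It is an illustration only
and not part of the patch: the name strspn_table_sketch is hypothetical,
memset stands in for the VSX quadword stores, and the 4x unrolled scan is
written as a plain loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not part of the patch.  */
static size_t
strspn_table_sketch (const char *s, const char *accept)
{
  unsigned char table[256];     /* one entry per possible byte value */
  const unsigned char *a = (const unsigned char *) accept;
  const unsigned char *p = (const unsigned char *) s;
  size_t count = 0;

  /* Clear the table (the assembly does this with 16 quadword stores).  */
  memset (table, 0, sizeof table);

  /* Mark every byte that occurs in ACCEPT; duplicates cost nothing extra.  */
  for (; *a != '\0'; a++)
    table[*a] = 1;

  /* table[0] is never marked, so the terminating NUL of S also stops
     the scan.  */
  while (table[p[count]] != 0)
    count++;

  return count;
}
```

Because the entry for NUL is never set, a single table lookup per
character covers both "character not in accept" and "end of string",
which is what lets the assembly drop the explicit NUL test for
string[1] through string[3] inside the unrolled loop.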