From patchwork Mon Mar 10 17:36:36 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: R Vidya X-Patchwork-Id: 24 X-Patchwork-Delegate: azanella@linux.vnet.ibm.com Return-Path: X-Original-To: siddhesh@wilcox.dreamhost.com Delivered-To: siddhesh@wilcox.dreamhost.com Received: from homiemail-mx20.g.dreamhost.com (caibbdcaaahc.dreamhost.com [208.113.200.72]) by wilcox.dreamhost.com (Postfix) with ESMTP id 356D6360097 for ; Mon, 10 Mar 2014 10:36:59 -0700 (PDT) Received: by homiemail-mx20.g.dreamhost.com (Postfix, from userid 14307373) id E0257407B2D27; Mon, 10 Mar 2014 10:36:58 -0700 (PDT) X-Original-To: glibc@patchwork.siddhesh.in Delivered-To: x14307373@homiemail-mx20.g.dreamhost.com Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by homiemail-mx20.g.dreamhost.com (Postfix) with ESMTPS id B917D407A4C5E for ; Mon, 10 Mar 2014 10:36:58 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:message-id:date:from:mime-version:to:subject :references:in-reply-to:content-type; q=dns; s=default; b=PVEw0I Tb97TGjfbwhc3qktieoUHet8M9zEl1xR/h23a0E30MphvkOl3Tx+0uQ7Wl74O55y tRqNtiNrYh10VZIdKxvxjwKXNjbI10RNx7l+hkDF6XlUrc9LPBPG22VnBjVh/aJs CAd6yp6aloR/rZJ76+e35rUJh2DyF6rZXKdkA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:message-id:date:from:mime-version:to:subject :references:in-reply-to:content-type; s=default; bh=HyU1ljk2UWml sMj4Z+MwCpwV9gI=; b=bpeLrxQsITrCOv0klRViU7PB2aCqLR32hoJ1YB4YH7Dj NYOiR/ITLgYbV48VkxG0etnQ2qopBSUtSJjZ/afRtEHE0/cZ3eU2FWrAPUtEQPiG 4Dycoc3WpLcotssOVRnh3hpnqtLYUJs0bkEVodUhfNuGImUeiyJhrmiAa+p94/Y= Received: (qmail 5421 invoked by alias); 10 Mar 2014 17:36:55 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 5410 invoked by uid 89); 10 Mar 2014 17:36:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_00, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: e24smtp01.br.ibm.com Message-ID: <531DF824.3040302@linux.vnet.ibm.com> Date: Mon, 10 Mar 2014 23:06:36 +0530 From: R Vidya User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Adhemerval Zanella , libc-alpha@sourceware.org Subject: Re: [PATCH] Multiarch optimization for strspn on POWERPC References: <5315CD58.5080002@linux.vnet.ibm.com> <53163BD1.5000106@linux.vnet.ibm.com> <53199E07.6040009@linux.vnet.ibm.com> <531DCEA2.4060305@linux.vnet.ibm.com> In-Reply-To: <531DCEA2.4060305@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14031017-1524-0000-0000-0000092902ED X-DH-Original-To: glibc@patchwork.siddhesh.in Hi Adhemerval , fixed it. Please push if OKay. On Monday 10 March 2014 08:09 PM, Adhemerval Zanella wrote: > Hi Vidya, > > Sorry if I didn't catch it early, but just noted some line are not respecting the 80 column > requirement from code guidelines, please fix these (example below). Also, I just also noted > since you are touching 'string/strspn.c' it is worth of cleanup the old C-style (comment > below). With this change I think it is good to push upstream. > > > On 07-03-2014 07:23, R Vidya wrote: >> diff --git a/string/strspn.c b/string/strspn.c >> index 37e8161..557eec5 100644 >> --- a/string/strspn.c >> +++ b/string/strspn.c >> @@ -18,11 +18,14 @@ >> #include >> >> #undef strspn >> +#ifndef STRSPN >> +#define STRSPN strspn >> +#endif >> >> /* Return the length of the maximum initial segment >> of S which contains only characters in ACCEPT. */ >> size_t >> -strspn (s, accept) >> +STRSPN (s, accept) >> const char *s; >> const char *accept; >> { > There is no need on using old K&R style, just convert it to STRSPN (const char *s, const char *accept) > >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile >> index a03569e..3e8010c 100644 >> --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile >> +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile >> @@ -14,7 +14,8 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \ >> wcsrchr-ppc64 wcscpy-power7 wcscpy-power6 wcscpy-ppc64 \ >> wordcopy-power7 wordcopy-power6 wordcopy-ppc64 \ >> strcpy-power7 strcpy-ppc64 stpcpy-power7 stpcpy-ppc64 \ >> - strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64 >> + strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64 \ >> + strspn-power7 strspn-ppc64 >> >> CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops >> CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c >> index 11a3bed..20d7918 100644 >> --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c >> +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c >> @@ -254,5 +254,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, >> IFUNC_IMPL_ADD (array, i, strncat, 1, >> __strncat_ppc)) >> >> + /* Support sysdeps/powerpc/powerpc64/multiarch/strspn.c. */ >> + IFUNC_IMPL (i, name, strspn, >> + IFUNC_IMPL_ADD (array, i, strspn, >> + hwcap & PPC_FEATURE_HAS_VSX, >> + __strspn_power7) >> + IFUNC_IMPL_ADD (array, i, strspn, 1, >> + __strspn_ppc)) >> + >> return i; >> } >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S >> new file mode 100644 >> index 0000000..889dfee >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S >> @@ -0,0 +1,40 @@ >> +/* Optimized strspn implementation for POWER7. >> + Copyright (C) 2014 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#include >> + >> +#undef EALIGN >> +#define EALIGN(name, alignt, words) \ >> + .section ".text"; \ >> + ENTRY_2(__strspn_power7) \ >> + .align ALIGNARG(alignt); \ >> + EALIGN_W_##words; \ >> + BODY_LABEL(__strspn_power7): \ >> + cfi_startproc; \ >> + LOCALENTRY(__strspn_power7) >> + >> +#undef END >> +#define END(name) \ >> + cfi_endproc; \ >> + TRACEBACK(__strspn_power7) \ >> + END_2(__strspn_power7) >> + >> +#undef libc_hidden_builtin_def >> +#define libc_hidden_builtin_def(name) >> + >> +#include >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c >> new file mode 100644 >> index 0000000..d543772 >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c >> @@ -0,0 +1,33 @@ >> +/* Copyright (C) 2014 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#include >> + >> +#define STRSPN __strspn_ppc >> +#undef weak_alias >> +#define weak_alias(name, aliasname) \ >> + extern __typeof (__strspn_ppc) aliasname \ >> + __attribute__ ((weak, alias ("__strspn_ppc"))); >> +#if !defined(NOT_IN_libc) && defined(SHARED) >> +# undef libc_hidden_builtin_def >> +# define libc_hidden_builtin_def(name) \ >> + __hidden_ver1(__strspn_ppc, __GI_strspn, __strspn_ppc); >> +#endif >> + >> +extern __typeof (strspn) __strspn_ppc attribute_hidden; >> + >> +#include >> diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn.c b/sysdeps/powerpc/powerpc64/multiarch/strspn.c >> new file mode 100644 >> index 0000000..44945f3 >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn.c >> @@ -0,0 +1,31 @@ >> +/* Multiple versions of strspn. PowerPC64 version. >> + Copyright (C) 2014 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#if defined SHARED && !defined NOT_IN_libc >> +# include >> +# include >> +# include "init-arch.h" >> + >> +extern __typeof (strspn) __strspn_ppc attribute_hidden; >> +extern __typeof (strspn) __strspn_power7 attribute_hidden; >> + >> +libc_ifunc (strspn, >> + (hwcap & PPC_FEATURE_HAS_VSX) >> + ? __strspn_power7 >> + : __strspn_ppc); >> +#endif >> diff --git a/sysdeps/powerpc/powerpc64/power7/strspn.S b/sysdeps/powerpc/powerpc64/power7/strspn.S >> new file mode 100644 >> index 0000000..876e9cc >> --- /dev/null >> +++ b/sysdeps/powerpc/powerpc64/power7/strspn.S >> @@ -0,0 +1,148 @@ >> +/* Optimized strspn implementation for PowerPC64/POWER7. >> + >> + Copyright (C) 2014 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +/* size_t [r3] strspn (const char *string [r3], >> + const char *needleAccept [r4] */ >> + >> +/* Performance gains are grabbed through following techniques: >> + >> + > hashing of needle. >> + > hashing avoids scanning of duplicate entries in needle across the string. >> + > initializing the hash table with Vector instructions by quadword access. >> + > unrolling when scanning for character in string across hash table. */ >> + >> +/* Algorithm is as below: >> + 1. A empty hash table/dictionary is created comprising of 256 ascii character set > Here it is an example of the column limit limit, please break it to fix 80 column. > Also check for all file, since there are other issues relating to this. > >> + 2. When hash entry is found in needle , the hash index is initialized to 1 >> + 3. The string is scanned until end and for every character, its corresponding >> + hash index is compared. >> + 4. initial length of string (count) until first hit of accept needle to be found >> + is set to 0 >> + 4. If hash index is set to 1 for the index of string, count is returned. >> + 5. Otherwise count is incremented and scanning continues until end of string. */ >> + >> +#include >> + >> +#undef strspn >> + >> + .machine power7 >> +EALIGN(strspn, 4, 0) >> + CALL_MCOUNT 2 >> + >> + lbz r10, 0(r4) /* load r10 with needle (r4) */ >> + addi r9, r1, -256 /* r9 is a hash of 256 bytes */ >> + >> + li r5, 16 /* set r5 = 16 as offset */ >> + li r6, 32 /* set r6 = 32 as offset */ >> + li r8, 48 /* set r8 = 48 as offset */ >> + >> +/*Iniatliaze hash table with Zeroes in double indexed quadword accesses */ >> + xxlxor v0, v0, v0 /* prepare for initializing hash */ >> + >> + stxvd2x v0, r0, r9 /* initialize 1st quadword */ >> + stxvd2x v0, r9, r5 >> + stxvd2x v0, r9, r6 >> + stxvd2x v0, r9, r8 /* initialize 4th quadword */ >> + >> + addi r11, r9, 64 /* r11 is index to hash */ >> + >> + stxvd2x v0, r0, r11 /* initialize 5th quadword */ >> + stxvd2x v0, r11, r5 >> + stxvd2x v0, r11, r6 >> + stxvd2x v0, r11, r8 /* initialize 8th quadword */ >> + >> + addi r11, r9, 128 /* r11 is index to hash */ >> + >> + stxvd2x v0, r0, r11 /* initialize 9th quadword */ >> + stxvd2x v0, r11, r5 >> + stxvd2x v0, r11, r6 >> + stxvd2x v0, r11, r8 /* initialize 12th quadword */ >> + >> + addi r11, r9, 192 /* r11 is index to hash */ >> + >> + stxvd2x v0, r0, r11 /* initialize 13th quadword */ >> + stxvd2x v0, r11, r5 >> + stxvd2x v0, r11, r6 >> + stxvd2x v0, r11, r8 /* initialize 16th quadword */ >> + >> + li r8, 1 /* r8=1, marker into hash if found in needle */ >> + >> + cmpdi cr7, r10, 0 /* accept needle is NULL */ >> + beq cr7, L(skipHashing) /* if needle is NULL, skip hashing */ >> + >> + .p2align 4 /* align section to 16 byte boundary */ >> +L(hashing): >> + stbx r8, r9, r10 /* update hash with marker for the pivot of the needle */ >> + lbzu r10, 1(r4) /* load needle into r10 and update to next */ >> + cmpdi cr7, r10, 0 /* if needle is has reached NULL, continue */ >> + bne cr7, L(hashing) /* loop to hash the needle */ >> + >> +L(skipHashing): >> + li r10, 0 /* load counter = 0 */ >> + b L(beginScan) >> + >> + .p2align 4 /* align section to 16 byte boundary */ >> +L(scanUnroll): >> + lbzx r8, r9, r8 /* load r8 with hash value at index */ >> + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ >> + beq cr7, L(ret1stIndex) /* we have hit accept needle, return the count */ >> + >> + lbz r8, 1(r3) /* load string[1] into r8 */ >> + addi r10, r10, 4 /* increment counter */ >> + lbzx r8, r9, r8 /* load r8 with hash value at index */ >> + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ >> + beq cr7, L(ret2ndIndex) /* we have hit accept needle, return the count */ >> + >> + lbz r8, 2(r3) /* load string[2] into r8 */ >> + lbzx r8, r9, r8 /* load r8 with hash value at index */ >> + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ >> + beq cr7, L(ret3rdIndex) /* we have hit accept needle, return the count */ >> + >> + lbz r8, 3(r3) /* load string[3] into r8 */ >> + lbzx r8, r9, r8 /* load r8 with hash value at index */ >> + addi r3, r3, 4 /* unroll factor , increment string by 4 */ >> + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ >> + beq cr7,L(ret4thIndex) /* we have hit accept needle, return the count */ >> + >> +L(beginScan): >> + lbz r8, 0(r3) /* load string[0] into r8 */ >> + addi r6, r10, 1 /* place holder for counter + 1 */ >> + addi r5, r10, 2 /* place holder for counter + 2 */ >> + addi r4, r10, 3 /* place holder for counter + 3 */ >> + cmpdi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ >> + bne cr7, L(scanUnroll) /* continue scanning */ >> + >> +L(ret1stIndex): >> + mr r3, r10 /* update r3 for return */ >> + blr /* return */ >> + >> +L(ret2ndIndex): >> + mr r3, r6 /* update r3 for return */ >> + blr /* return */ >> + >> +L(ret3rdIndex): >> + mr r3, r5 /* update r3 for return */ >> + blr /* return */ >> + >> +L(ret4thIndex): >> + mr r3, r4 /* update r3 for return */ >> + blr /* done */ >> + >> +END(strspn) >> +libc_hidden_builtin_def (strspn) >From 49b84f6296408669abd6d73beb5fe7e2c37fcdba Mon Sep 17 00:00:00 2001 From: Vidya Ranganathan Date: Mon, 10 Mar 2014 12:20:36 -0400 Subject: [PATCH] Multiarch optimization for strspn() on PowerPC. I have attached the benchtest output to show the performance improvement. The optimization is achieved by following techniques: > hashing of needle. > hashing avoids scanning of duplicate entries in needle across the string. > initializing the hash table with Vector instructions (VSX) by quadword access. > unrolling when scanning for character in string across hash table. ChangeLog: 2014-03-04 Vidya Ranganathan * sysdeps/powerpc/powerpc64/power7/strspn.S: New file: Optimization. * sysdeps/powerpc/powerpc64/multiarch/strspn.c: New file: multiarch strspn for PPC64. * sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: New file * sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S: New file * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: (__libc_ifunc_impl_list): Likewise. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strspn multiarch optimizations * string/strspn.c (strspn): Using macro to redefine symbol name. Signed-off-by: Vidya Ranganathan --- string/strspn.c | 7 +- sysdeps/powerpc/powerpc64/multiarch/Makefile | 3 +- .../powerpc/powerpc64/multiarch/ifunc-impl-list.c | 8 ++ .../powerpc/powerpc64/multiarch/strspn-power7.S | 40 ++++++ sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c | 33 +++++ sysdeps/powerpc/powerpc64/multiarch/strspn.c | 31 +++++ sysdeps/powerpc/powerpc64/power7/strspn.S | 155 +++++++++++++++++++++ 7 files changed, 273 insertions(+), 4 deletions(-) create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strspn.c create mode 100644 sysdeps/powerpc/powerpc64/power7/strspn.S diff --git a/string/strspn.c b/string/strspn.c index 37e8161..c2d6364 100644 --- a/string/strspn.c +++ b/string/strspn.c @@ -18,13 +18,14 @@ #include #undef strspn +#ifndef STRSPN +#define STRSPN strspn +#endif /* Return the length of the maximum initial segment of S which contains only characters in ACCEPT. */ size_t -strspn (s, accept) - const char *s; - const char *accept; +STRSPN (const char *s, const char *accept) { const char *p; const char *a; diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile index a03569e..3e8010c 100644 --- a/sysdeps/powerpc/powerpc64/multiarch/Makefile +++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile @@ -14,7 +14,8 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \ wcsrchr-ppc64 wcscpy-power7 wcscpy-power6 wcscpy-ppc64 \ wordcopy-power7 wordcopy-power6 wordcopy-ppc64 \ strcpy-power7 strcpy-ppc64 stpcpy-power7 stpcpy-ppc64 \ - strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64 + strrchr-power7 strrchr-ppc64 strncat-power7 strncat-ppc64 \ + strspn-power7 strspn-ppc64 CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c index 11a3bed..20d7918 100644 --- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c +++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c @@ -254,5 +254,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, strncat, 1, __strncat_ppc)) + /* Support sysdeps/powerpc/powerpc64/multiarch/strspn.c. */ + IFUNC_IMPL (i, name, strspn, + IFUNC_IMPL_ADD (array, i, strspn, + hwcap & PPC_FEATURE_HAS_VSX, + __strspn_power7) + IFUNC_IMPL_ADD (array, i, strspn, 1, + __strspn_ppc)) + return i; } diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S new file mode 100644 index 0000000..889dfee --- /dev/null +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-power7.S @@ -0,0 +1,40 @@ +/* Optimized strspn implementation for POWER7. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include + +#undef EALIGN +#define EALIGN(name, alignt, words) \ + .section ".text"; \ + ENTRY_2(__strspn_power7) \ + .align ALIGNARG(alignt); \ + EALIGN_W_##words; \ + BODY_LABEL(__strspn_power7): \ + cfi_startproc; \ + LOCALENTRY(__strspn_power7) + +#undef END +#define END(name) \ + cfi_endproc; \ + TRACEBACK(__strspn_power7) \ + END_2(__strspn_power7) + +#undef libc_hidden_builtin_def +#define libc_hidden_builtin_def(name) + +#include diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c new file mode 100644 index 0000000..d543772 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c @@ -0,0 +1,33 @@ +/* Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include + +#define STRSPN __strspn_ppc +#undef weak_alias +#define weak_alias(name, aliasname) \ + extern __typeof (__strspn_ppc) aliasname \ + __attribute__ ((weak, alias ("__strspn_ppc"))); +#if !defined(NOT_IN_libc) && defined(SHARED) +# undef libc_hidden_builtin_def +# define libc_hidden_builtin_def(name) \ + __hidden_ver1(__strspn_ppc, __GI_strspn, __strspn_ppc); +#endif + +extern __typeof (strspn) __strspn_ppc attribute_hidden; + +#include diff --git a/sysdeps/powerpc/powerpc64/multiarch/strspn.c b/sysdeps/powerpc/powerpc64/multiarch/strspn.c new file mode 100644 index 0000000..44945f3 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/multiarch/strspn.c @@ -0,0 +1,31 @@ +/* Multiple versions of strspn. PowerPC64 version. + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#if defined SHARED && !defined NOT_IN_libc +# include +# include +# include "init-arch.h" + +extern __typeof (strspn) __strspn_ppc attribute_hidden; +extern __typeof (strspn) __strspn_power7 attribute_hidden; + +libc_ifunc (strspn, + (hwcap & PPC_FEATURE_HAS_VSX) + ? __strspn_power7 + : __strspn_ppc); +#endif diff --git a/sysdeps/powerpc/powerpc64/power7/strspn.S b/sysdeps/powerpc/powerpc64/power7/strspn.S new file mode 100644 index 0000000..6033b02 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/power7/strspn.S @@ -0,0 +1,155 @@ +/* Optimized strspn implementation for PowerPC64/POWER7. + + Copyright (C) 2014 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* size_t [r3] strspn (const char *string [r3], + const char *needleAccept [r4] */ + +/* Performance gains are grabbed through following techniques: + + > hashing of needle. + > hashing avoids scanning of duplicate entries in needle + across the string. + > initializing the hash table with Vector instructions + by quadword access. + > unrolling when scanning for character in string + across hash table. */ + +/* Algorithm is as below: + 1. A empty hash table/dictionary is created comprising of + 256 ascii character set + 2. When hash entry is found in needle , the hash index + is initialized to 1 + 3. The string is scanned until end and for every character, + its corresponding hash index is compared. + 4. initial length of string (count) until first hit of + accept needle to be found is set to 0 + 4. If hash index is set to 1 for the index of string, + count is returned. + 5. Otherwise count is incremented and scanning continues + until end of string. */ + +#include + +#undef strspn + + .machine power7 +EALIGN(strspn, 4, 0) + CALL_MCOUNT 2 + + lbz r10, 0(r4) /* load r10 with needle (r4) */ + addi r9, r1, -256 /* r9 is a hash of 256 bytes */ + + li r5, 16 /* set r5 = 16 as offset */ + li r6, 32 /* set r6 = 32 as offset */ + li r8, 48 /* set r8 = 48 as offset */ + +/*Iniatliaze hash table with Zeroes in double indexed quadword accesses */ + xxlxor v0, v0, v0 /* prepare for initializing hash */ + + stxvd2x v0, r0, r9 /* initialize 1st quadword */ + stxvd2x v0, r9, r5 + stxvd2x v0, r9, r6 + stxvd2x v0, r9, r8 /* initialize 4th quadword */ + + addi r11, r9, 64 /* r11 is index to hash */ + + stxvd2x v0, r0, r11 /* initialize 5th quadword */ + stxvd2x v0, r11, r5 + stxvd2x v0, r11, r6 + stxvd2x v0, r11, r8 /* initialize 8th quadword */ + + addi r11, r9, 128 /* r11 is index to hash */ + + stxvd2x v0, r0, r11 /* initialize 9th quadword */ + stxvd2x v0, r11, r5 + stxvd2x v0, r11, r6 + stxvd2x v0, r11, r8 /* initialize 12th quadword */ + + addi r11, r9, 192 /* r11 is index to hash */ + + stxvd2x v0, r0, r11 /* initialize 13th quadword */ + stxvd2x v0, r11, r5 + stxvd2x v0, r11, r6 + stxvd2x v0, r11, r8 /* initialize 16th quadword */ + + li r8, 1 /* r8=1, marker into hash if found in needle */ + + cmpdi cr7, r10, 0 /* accept needle is NULL */ + beq cr7, L(skipHashing) /* if needle is NULL, skip hashing */ + + .p2align 4 /* align section to 16 byte boundary */ +L(hashing): + stbx r8, r9, r10 /* update hash with marker for the pivot of the needle */ + lbzu r10, 1(r4) /* load needle into r10 and update to next */ + cmpdi cr7, r10, 0 /* if needle is has reached NULL, continue */ + bne cr7, L(hashing) /* loop to hash the needle */ + +L(skipHashing): + li r10, 0 /* load counter = 0 */ + b L(beginScan) + + .p2align 4 /* align section to 16 byte boundary */ +L(scanUnroll): + lbzx r8, r9, r8 /* load r8 with hash value at index */ + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ + beq cr7, L(ret1stIndex) /* we have hit accept needle, return the count */ + + lbz r8, 1(r3) /* load string[1] into r8 */ + addi r10, r10, 4 /* increment counter */ + lbzx r8, r9, r8 /* load r8 with hash value at index */ + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ + beq cr7, L(ret2ndIndex) /* we have hit accept needle, return the count */ + + lbz r8, 2(r3) /* load string[2] into r8 */ + lbzx r8, r9, r8 /* load r8 with hash value at index */ + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ + beq cr7, L(ret3rdIndex) /* we have hit accept needle, return the count */ + + lbz r8, 3(r3) /* load string[3] into r8 */ + lbzx r8, r9, r8 /* load r8 with hash value at index */ + addi r3, r3, 4 /* unroll factor , increment string by 4 */ + cmpwi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ + beq cr7,L(ret4thIndex) /* we have hit accept needle, return the count */ + +L(beginScan): + lbz r8, 0(r3) /* load string[0] into r8 */ + addi r6, r10, 1 /* place holder for counter + 1 */ + addi r5, r10, 2 /* place holder for counter + 2 */ + addi r4, r10, 3 /* place holder for counter + 3 */ + cmpdi cr7, r8, 0 /* if we hit marker in hash, we have found accept needle */ + bne cr7, L(scanUnroll) /* continue scanning */ + +L(ret1stIndex): + mr r3, r10 /* update r3 for return */ + blr /* return */ + +L(ret2ndIndex): + mr r3, r6 /* update r3 for return */ + blr /* return */ + +L(ret3rdIndex): + mr r3, r5 /* update r3 for return */ + blr /* return */ + +L(ret4thIndex): + mr r3, r4 /* update r3 for return */ + blr /* done */ + +END(strspn) +libc_hidden_builtin_def (strspn) -- 1.8.3.1