From patchwork Tue Jun 2 08:59:09 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Rajalakshmi S X-Patchwork-Id: 7004 X-Patchwork-Delegate: tuliom@linux.vnet.ibm.com Received: (qmail 31968 invoked by alias); 2 Jun 2015 09:01:11 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 31941 invoked by uid 89); 2 Jun 2015 09:01:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=2.2 required=5.0 tests=AWL, BAYES_60, KAM_LAZY_DOMAIN_SECURITY, T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: e28smtp07.in.ibm.com Message-ID: <556D705D.20201@linux.vnet.ibm.com> Date: Tue, 02 Jun 2015 14:29:09 +0530 From: Rajalakshmi Srinivasaraghavan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: =?UTF-8?B?T25kxZllaiBCw61sa2E=?= , Steven Munroe CC: GNU C Library , Steve Munroe Subject: Re: [PATCH] powerpc: strcasestr optimization References: <55687597.1060101@linux.vnet.ibm.com> <556C36D8.2070208@linux.vnet.ibm.com> <20150601122830.GA14649@domone> <1433169407.10235.5.camel@sjmunroe-ThinkPad-W500> <20150601162215.GA8955@domone> In-Reply-To: <20150601162215.GA8955@domone> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15060208-0025-0000-0000-0000052241AF On 06/01/2015 09:52 PM, Ondřej Bílka wrote: > On Mon, Jun 01, 2015 at 09:36:47AM -0500, Steven Munroe wrote: >> On Mon, 2015-06-01 at 14:28 +0200, Ondřej Bílka wrote: >>> On Mon, Jun 01, 2015 at 04:11:28PM +0530, Rajalakshmi Srinivasaraghavan wrote: >>>> >>>> This patch optimizes strcasestr function for power >= 7 systems. >>>> This patch uses optimized strlen and strnlen for calculating >>>> string length and the average improvement of this optimization is ~40%. >>>> This patch is tested on powerpc64 and powerpc64le. >>>> Attached the benchresults with this new patch. >>>> >>> Thats not enough. As strcasestr that I submited is around three times >>> slower your implementation would likely be regression over generic one. >>> >>> A problem here is that you use moronic algorithm. Fix algorithm first >>> before trying to optimize it. >>> >> >> This is not very helpful. You are demanding changes without clear >> explanation and justification. >> >> What is wrong with Raja's algorithm? What is insufficient in the >> benchmark data she has provided? And why do you think your specific >> design applies to PowerISA and POWER7/POWER8 micro-architecture. >> >> What data do you have that justified this objection? > > I replied on strstr patch thread on why what she submitted is > performance regression. So I will repeat arguments from other thread > which still apply. > > First was problem with quadratic behaviour. She tried to fix it but it > isn't a fix at all. Just benchmark > > strcasestr ("aaa...(4000 times)...aaa", "aaa...(2000 times)...aab") > > That call would take around hundred times than before which is > unacceptable. This is already handled in the patch.If the needle len is more than 2048, it calls default string/strcasestr.c > > If we ignore that red flag second problem was that benchmark she used is > bogus. It test with periodic haystacks, needle is copy of first bytes of > haystack with last byte set to something else. Which benchmark are you referring as bogus? The benchtest result attached in the previous thread was created using benchtests/bench-strcasestr.c . Since your proposed benchtest changes were not yet committed, I have used default ones. . . . > > Just use same patch like I send with ((unsigned) rand())%16 + 1 and you > will see completely different numbers in benchmark. > Benchtest results attached with these changes. . . > > As I don't have powerpc access now apply my patches > > [PATCH v5] Generic string skeleton > [PATCH v5 4*] Generic string search functions (strstr, strcasestr, memmem) > I have attached the benchtest result with the above patches applied along with benchtests/bench-strcasestr.c changes.(similiar changed as proposed by you for benchtests/bench-strstr.c). The result attached clearly shows improvement. > and run (preferably fixed) benchmark with these. As gains that I see on > x64 are bigger than ones gained by this assembly you will likely see > that generic implementation is indeed better and it would be pointless > to try review that only to remove it shortly after adding to improve > performance. > > diff --git a/benchtests/bench-strcasestr.c b/benchtests/bench-strcasestr.c index 33531a4..f579491 100644 --- a/benchtests/bench-strcasestr.c +++ b/benchtests/bench-strcasestr.c @@ -82,33 +82,14 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2, { char *s1 = (char *) (buf1 + align1); char *s2 = (char *) (buf2 + align2); - - static const char d[] = "1234567890abcxyz"; -#define dl (sizeof (d) - 1) - char *ss2 = s2; - for (size_t l = len2; l > 0; l = l > dl ? l - dl : 0) + for (size_t l = 0; l < len2; l++) { - size_t t = l > dl ? dl : l; - ss2 = mempcpy (ss2, d, t); + s2[l] = (rand () % 128) + 1; } s2[len2] = '\0'; - - if (fail) - { - char *ss1 = s1; - for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0) - { - size_t t = l > dl ? dl : l; - memcpy (ss1, d, t); - ++ss1[len2 > 7 ? 7 : len2 - 1]; - ss1 += t; - } - } - else + for (size_t l = 0; l < len1; l++) { - memset (s1, '0', len1); - for (size_t i = 0; i < len2; ++i) - s1[len1 - len2 + i] = toupper (s2[i]); + s1[l] = (rand () % 128) + 1; } s1[len1] = '\0';