From patchwork Tue Mar 18 10:01:38 2014
X-Patchwork-Submitter: Ondrej Bilka
X-Patchwork-Id: 140
Date: Tue, 18 Mar 2014 11:01:38 +0100
From: Ondřej Bílka
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: [PATCH 3/2] Use strspn/strcspn/strpbrk ifunc in internal calls.
Message-ID: <20140318100138.GC8415@domone.podge>
In-Reply-To: <53192422.2050101@redhat.com>

To make strtok faster and to improve performance in general, we need one
additional change. The comment

/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
   The speedup we get from using SSE4.2 instruction is likely eaten away
   by the indirect call in the PLT.  */

does not make sense at all, because nobody bothered to check it. The gap
between these implementations is quite big: when the haystack is empty, the
sse2 version is around 40 cycles slower because it needs to populate a
lookup table, and the difference only increases with size. That is much
bigger than the PLT slowdown, which is a few cycles.

Even the benchtest shows a gap, which may also be reversed by branch
misprediction, but my internal benchmark showed it too.
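
To illustrate where a fixed per-call cost in a table-driven strspn comes
from, here is a minimal sketch in portable C. It is not the glibc assembly;
the name table_strspn is hypothetical, and the assumption is only that the
sse2 variant follows the same general shape (fill a 256-entry table from the
accept set, then scan the string with one lookup per byte).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not glibc code: table-driven strspn.  */
static size_t
table_strspn (const char *s, const char *accept)
{
  unsigned char table[256];
  const unsigned char *p;

  /* Setup cost: paid on every call, even when s is empty.  */
  memset (table, 0, sizeof table);
  for (p = (const unsigned char *) accept; *p != '\0'; ++p)
    table[*p] = 1;

  /* Scan: one lookup per byte.  table[0] stays 0, so the loop
     stops at the terminating NUL.  */
  for (p = (const unsigned char *) s; table[*p]; ++p)
    ;

  return (size_t) (p - (const unsigned char *) s);
}
```

With an empty s the scan loop exits immediately, so the whole run time is
the table setup. That is the case where a constant overhead dominates.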

                                    simple_strspn  stupid_strspn  __strspn_sse42  __strspn_sse2
Length 0, alignment 0, acc len 6:         18.6562        35.2344         17.0469        61.6719
Length 6, alignment 0, acc len 6:         59.5469        72.5781         16.4219        73.625

This patch also handles strpbrk, which is implemented by including the
x86_64/multiarch/strcspn.S file.

	* sysdeps/x86_64/multiarch/strspn.S: Remove plt indirection.
	* sysdeps/x86_64/multiarch/strcspn.S: Likewise.

diff --git a/sysdeps/x86_64/multiarch/strcspn.S b/sysdeps/x86_64/multiarch/strcspn.S
index 24f55e9..1b3e1aa 100644
--- a/sysdeps/x86_64/multiarch/strcspn.S
+++ b/sysdeps/x86_64/multiarch/strcspn.S
@@ -65,14 +65,7 @@ END(STRCSPN)
 # undef END
 # define END(name) \
 	cfi_endproc; .size STRCSPN_SSE2, .-STRCSPN_SSE2
-# undef libc_hidden_builtin_def
-/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
-   The speedup we get from using SSE4.2 instruction is likely eaten away
-   by the indirect call in the PLT.  */
-# define libc_hidden_builtin_def(name) \
-	.globl __GI_STRCSPN; __GI_STRCSPN = STRCSPN_SSE2
 #endif
-
 #endif /* HAVE_SSE4_SUPPORT */
 
 #ifdef USE_AS_STRPBRK
diff --git a/sysdeps/x86_64/multiarch/strspn.S b/sysdeps/x86_64/multiarch/strspn.S
index bf7308e..fde1e1e 100644
--- a/sysdeps/x86_64/multiarch/strspn.S
+++ b/sysdeps/x86_64/multiarch/strspn.S
@@ -50,12 +50,6 @@ END(strspn)
 # undef END
 # define END(name) \
 	cfi_endproc; .size __strspn_sse2, .-__strspn_sse2
-# undef libc_hidden_builtin_def
-/* It doesn't make sense to send libc-internal strspn calls through a PLT.
-   The speedup we get from using SSE4.2 instruction is likely eaten away
-   by the indirect call in the PLT.  */
-# define libc_hidden_builtin_def(name) \
-	.globl __GI_strspn; __GI_strspn = __strspn_sse2
 #endif
 
 #endif /* HAVE_SSE4_SUPPORT */