From patchwork Wed Dec 10 09:47:35 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Earnshaw
X-Patchwork-Id: 4134
Received: (qmail 28805 invoked by alias); 10 Dec 2014 09:47:40 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 28796 invoked by uid 89); 10 Dec 2014 09:47:40 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Message-ID: <548816B7.1060901@arm.com>
Date: Wed, 10 Dec 2014 09:47:35 +0000
From: Richard Earnshaw
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
To: Andrew Pinski, Kyle McMartin
CC: Marcus Shawcroft, "libc-alpha@sourceware.org"
Subject: Re: [PATCH, AArch64] Add optimized strchrnul
References: <539AD11E.50507@arm.com> <20141203174546.GG4170@redacted.bos.redhat.com> <5488148C.2020805@arm.com>
In-Reply-To: <5488148C.2020805@arm.com>
X-MC-Unique: 114121009473608701

On 10/12/14 09:38, Richard Earnshaw wrote:
> On 09/12/14 22:38, Andrew Pinski wrote:
>> On Wed, Dec 3, 2014 at 9:45 AM, Kyle McMartin wrote:
>>> On Wed, Nov 05, 2014 at 02:02:22PM +0000, Marcus Shawcroft wrote:
>>>> On 13 June 2014 11:23, Richard Earnshaw wrote:
>>>>> Here is an optimized implementation of __strchrnul. The simplification
>>>>> that we don't have to track precisely why the loop terminates (match or
>>>>> end-of-string) means we have to do less work in both setup and the core
>>>>> inner loop. That means this should never be slower than strchr.
>>>>>
>>>>> As with strchr, the use of LD1 means we do not need different versions
>>>>> for big-/little-endian.
>>>>>
>>>>> Richard Earnshaw
>>>>>
>>>>> * sysdeps/aarch64/strchrnul.S: New file.
>>>>
>>>> Committed. + NEWS entry. /Marcus
>>>
>>> Hrm, I've bisected this commit as the cause of localedef segfaulting as
>>> part of install-locales... it ends up crashing weirdly inside of
>>> vsprintf's internals. Have either of you seen anything similar?
>>
>> I just ran into this with an ILP32 compiled glibc (with thunderX
>> tuning and some extra patches) where printf was crashing. The problem
>> is v15 is being used but it is a callee saved register. Here is the
>> documentation from the ABI:
>> Registers v8-v15 must be preserved by a callee across subroutine
>> calls; the remaining registers (v0-v7, v16-v31)
>> do not need to be preserved (or should be preserved by the caller).
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> regards, Kyle
>>
>
> Well spotted! I'll push a fix round the various implementations of this
> ASAP.
>
> R.
>

And here's the patch. I'd commit this as obvious, but I don't have a
commit bit for glibc :-(

* sysdeps/aarch64/strchrnul.S (vrepmask): Use a call-clobbered register.

R.

diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
index b98c2e9..57835d8 100644
--- a/sysdeps/aarch64/strchrnul.S
+++ b/sysdeps/aarch64/strchrnul.S
@@ -46,7 +46,7 @@
 #define vhas_nul2 v4
 #define vhas_chr1 v5
 #define vhas_chr2 v6
-#define vrepmask v15
+#define vrepmask v7
 #define vend1 v16
 
 /* Core algorithm.
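
For anyone auditing the other string routines for the same mistake, the
ABI rule quoted above can be written down as a small C check. This is
only an illustrative sketch: the helper names vreg_is_callee_saved and
vreg_ok_for_leaf_scratch are made up for this note and are not part of
glibc. It encodes nothing beyond the quoted rule that v8-v15 are
callee-saved while v0-v7 and v16-v31 are not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of glibc.  Returns true if SIMD
   register v<n> must be preserved by a callee, following the ABI text
   quoted in this thread: v8-v15 are callee-saved, while v0-v7 and
   v16-v31 are not.  */
static bool
vreg_is_callee_saved (int n)
{
  assert (n >= 0 && n <= 31);
  return n >= 8 && n <= 15;
}

/* Hypothetical helper.  A routine that neither saves nor restores any
   SIMD state may only use registers for which this returns true as
   scratch.  */
static bool
vreg_ok_for_leaf_scratch (int n)
{
  return !vreg_is_callee_saved (n);
}
```

With this check, the old vrepmask choice (v15) is rejected as scratch
and the new one (v7) is accepted, as are the neighbouring v4-v6 and v16
definitions visible in the hunk.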