[AArch64] Add optimized strchrnul

Message ID: 548816B7.1060901@arm.com
State: Committed

Commit Message

Richard Earnshaw Dec. 10, 2014, 9:47 a.m. UTC
  On 10/12/14 09:38, Richard Earnshaw wrote:
> On 09/12/14 22:38, Andrew Pinski wrote:
>> On Wed, Dec 3, 2014 at 9:45 AM, Kyle McMartin <kmcmarti@redhat.com> wrote:
>>> On Wed, Nov 05, 2014 at 02:02:22PM +0000, Marcus Shawcroft wrote:
>>>> On 13 June 2014 11:23, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>>> Here is an optimized implementation of __strchrnul.  The simplification
>>>>> that we don't have to track precisely why the loop terminates (match or
>>>>> end-of-string) means we have to do less work in both setup and the core
>>>>> inner loop.  That means this should never be slower than strchr.
>>>>>
>>>>> As with strchr, the use of LD1 means we do not need different versions
>>>>> for big-/little-endian.
>>>>>
>>>>> <date>  Richard Earnshaw  <rearnsha@arm.com>
>>>>>
>>>>>         * sysdeps/aarch64/strchrnul.S: New file.
>>>>
>>>> Committed. + NEWS entry. /Marcus
>>>
>>> Hrm, I've bisected this commit as the cause of localedef segfaulting as
>>> part of install-locales... it ends up crashing weirdly inside of
>>> vsprintf's internals. Have either of you seen anything similar?
>>
>> I just ran into this with an ILP32-compiled glibc (with ThunderX
>> tuning and some extra patches) where printf was crashing.  The problem
>> is that v15 is being used, but it is a callee-saved register.  Here is
>> the documentation from the ABI:
>> Registers v8-v15 must be preserved by a callee across subroutine
>> calls; the remaining registers (v0-v7, v16-v31)
>> do not need to be preserved (or should be preserved by the caller).
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> regards, Kyle
>>
> 
> Well spotted!  I'll push a fix round the various implementations of this
> ASAP.
> 
> R.
> 

And here's the patch.

I'd commit this as obvious, but I don't have a commit bit for glibc :-(

	* sysdeps/aarch64/strchrnul.S (vrepmask): Use a call-clobbered
	register.

R.
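
For reference, the behaviour the assembly has to preserve can be sketched
in plain C.  This is only an illustration of the semantics described in
the original submission (the loop does not need to record whether it
stopped on a match or on end-of-string), not the committed AArch64 code;
the names sketch_strchrnul and sketch_usage are hypothetical and chosen
to avoid clashing with the libc symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for strchrnul semantics: return a pointer
   to the first occurrence of C in S, or to the terminating NUL if C does
   not occur.  This is not the vectorised implementation from the patch.  */
static char *sketch_strchrnul(const char *s, int c)
{
    const unsigned char ch = (unsigned char)c;
    const unsigned char *p = (const unsigned char *)s;

    /* One combined exit test: the caller gets the same kind of result
       (a valid pointer into the string) whichever condition fired.  */
    while (*p != ch && *p != '\0')
        p++;

    return (char *)p;
}

/* Small usage check for the sketch above.  */
static void sketch_usage(void)
{
    const char *s = "abc";

    assert(sketch_strchrnul(s, 'b') == s + 1);   /* match found */
    assert(sketch_strchrnul(s, 'z') == s + 3);   /* no match: points at NUL */
    assert(sketch_strchrnul(s, '\0') == s + 3);  /* searching for NUL itself */
}
```

Because the result is always a valid pointer rather than NULL, a caller
that only needs "position of C or end of string" can skip the extra
comparison that strchr would require.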
  

Comments

Marcus Shawcroft Dec. 10, 2014, 9:56 a.m. UTC | #1
On 10 December 2014 at 09:47, Richard Earnshaw <rearnsha@arm.com> wrote:

> And here's the patch.
>
> I'd commit this as obvious, but I don't have a commit bit for glibc :-(
>
>         * sysdeps/aarch64/strchrnul.S (vrepmask): Use a call-clobbered
>         register.
>
> R.

Committed.

Richard, why don't you ask the overseers list to give you write access?

Cheers
/Marcus
  

Patch

diff --git a/sysdeps/aarch64/strchrnul.S b/sysdeps/aarch64/strchrnul.S
index b98c2e9..57835d8 100644
--- a/sysdeps/aarch64/strchrnul.S
+++ b/sysdeps/aarch64/strchrnul.S
@@ -46,7 +46,7 @@ 
 #define vhas_nul2	v4
 #define vhas_chr1	v5
 #define vhas_chr2	v6
-#define vrepmask	v15
+#define vrepmask	v7
 #define vend1		v16
 
 /* Core algorithm.