Patchwork aarch64: Improve strncmp for mutually misaligned inputs

Submitter Wilco Dijkstra
Date March 14, 2018, 2:04 p.m.
Message ID <DB6PR0801MB2053D467ED5AC3E8860BD0EB83D10@DB6PR0801MB2053.eurprd08.prod.outlook.com>
Permalink /patch/26308/
State New

Comments

Wilco Dijkstra - March 14, 2018, 2:04 p.m.
Hi,

Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!


        /* We found a difference or a NULL before the limit was reached.  */
        and     limit, limit, #7
        cbz     limit, L(not_limit)

Wilco
Siddhesh Poyarekar - March 14, 2018, 2:20 p.m.
On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!

Because I was half asleep and just followed what Szabolcs said ;)

I'll fix that up later (I can barely sit today, my back is killing me)
or please feel free to fix up if you'd like to.

Thanks,
Siddhesh
Siddhesh Poyarekar - March 15, 2018, 2:37 a.m.
On Wednesday 14 March 2018 07:50 PM, Siddhesh Poyarekar wrote:
> On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
>> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!
> 
> Because I was half asleep and just followed what Szabolcs said ;)
> 
> I'll fix that up later (I can barely sit today, my back is killing me)
> or please feel free to fix up if you'd like to.

I have fixed this now:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks,
Siddhesh
Wilco Dijkstra - March 15, 2018, 1:44 p.m.
Siddhesh Poyarekar wrote:

> I have fixed this now:
>
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks, that's fine for now. We should look into tuning this further in the future,
I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Wilco
Siddhesh Poyarekar - March 15, 2018, 1:57 p.m.
On Thursday 15 March 2018 07:14 PM, Wilco Dijkstra wrote:
> Thanks, that's fine for now. We should look into tuning this further in the future,
> I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Agreed, I haven't taken it off my plate.  This was a pretty big gain to
keep holding on to though, which is why I pushed it out early.

Siddhesh

Patch

--- a/sysdeps/aarch64/strncmp.S
+++ b/sysdeps/aarch64/strncmp.S
@@ -208,13 +208,15 @@  L(done):
        /* Align the SRC1 to a dword by doing a bytewise compare and then do
           the dword loop.  */
 L(try_misaligned_words):
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3
        cbz     count, L(do_misaligned)
 
        neg     count, count
        and     count, count, #7
        sub     limit, limit, count
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3

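Also it seems to me it would be far easier to subtract 8 from limit in the main loop.
This means we don't ever need limit_wd, and avoids having to do this later:

        /* We found a difference or a NULL before the limit was reached.  */
        and     limit, limit, #7
        cbz     limit, L(not_limit)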
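
A minimal C sketch of the two bookkeeping schemes discussed in this thread, for illustration only. It is a hypothetical model, not glibc code: the struct and function names are invented here, and it only counts how a byte limit splits into full 8-byte words plus leftover bytes. The first function mirrors a separate limit_wd word counter followed by the "and limit, limit, #7" remainder check; the second subtracts 8 from the byte limit on each iteration, so whatever is left when the loop exits is already the remainder.

```c
#include <stddef.h>

/* Hypothetical model, not glibc code: how a byte limit splits into
   full 8-byte words plus leftover tail bytes.  */
struct split {
    size_t words;
    size_t tail;
};

/* Scheme with a separate word counter, as in the patch under review.  */
static struct split split_with_word_counter(size_t limit)
{
    size_t limit_wd = limit >> 3;   /* lsr limit_wd, limit, #3 */
    struct split s = { 0, 0 };

    while (limit_wd != 0) {         /* one iteration per dword compared */
        s.words++;
        limit_wd--;
    }
    s.tail = limit & 7;             /* and limit, limit, #7 */
    return s;
}

/* Scheme suggested in the review: subtract 8 from limit in the loop.  */
static struct split split_by_subtracting(size_t limit)
{
    struct split s = { 0, 0 };

    while (limit >= 8) {
        s.words++;
        limit -= 8;
    }
    s.tail = limit;                 /* already the remainder, no mask needed */
    return s;
}
```

Both functions return the same split for any limit; the second needs one register fewer, which appears to be the point of the suggestion. Whether that is a win in the real routine depends on how the loop exit conditions are scheduled, which this model does not capture.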