aarch64: Improve strncmp for mutually misaligned inputs

Message ID DB6PR0801MB2053D467ED5AC3E8860BD0EB83D10@DB6PR0801MB2053.eurprd08.prod.outlook.com
State New, archived

Commit Message

Wilco Dijkstra March 14, 2018, 2:04 p.m. UTC
  Hi,

Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!


        /* We found a difference or a NULL before the limit was reached.  */
        and     limit, limit, #7
        cbz     limit, L(not_limit)

Wilco
  

Comments

Siddhesh Poyarekar March 14, 2018, 2:20 p.m. UTC | #1
On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!

Because I was half asleep and just followed what Szabolcs said ;)

I'll fix that up later (I can barely sit today, my back is killing me)
or please feel free to fix up if you'd like to.

Thanks,
Siddhesh
  
Siddhesh Poyarekar March 15, 2018, 2:37 a.m. UTC | #2
On Wednesday 14 March 2018 07:50 PM, Siddhesh Poyarekar wrote:
> On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
>> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!
> 
> Because I was half asleep and just followed what Szabolcs said ;)
> 
> I'll fix that up later (I can barely sit today, my back is killing me)
> or please feel free to fix up if you'd like to.

I have fixed this now:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks,
Siddhesh
  
Wilco Dijkstra March 15, 2018, 1:44 p.m. UTC | #3
Siddhesh Poyarekar wrote:

> I have fixed this now:
>
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks, that's fine for now. We should look into tuning this further in the future,
I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Wilco
  
Siddhesh Poyarekar March 15, 2018, 1:57 p.m. UTC | #4
On Thursday 15 March 2018 07:14 PM, Wilco Dijkstra wrote:
> Thanks, that's fine for now. We should look into tuning this further in the future,
> I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Agreed, I haven't taken it off my plate.  This was a pretty big gain to
keep holding on to though, which is why I pushed it out early.

Siddhesh
  

Patch

--- a/sysdeps/aarch64/strncmp.S
+++ b/sysdeps/aarch64/strncmp.S
@@ -208,13 +208,15 @@  L(done):
        /* Align the SRC1 to a dword by doing a bytewise compare and then do
           the dword loop.  */
 L(try_misaligned_words):
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3
        cbz     count, L(do_misaligned)
 
        neg     count, count
        and     count, count, #7
        sub     limit, limit, count
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3

Also it seems to me it would be far easier to subtract 8 from limit in the main loop.
This means we don't ever need limit_wd, and avoids having to do this later:
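
The sequence referred to is presumably the "and limit, limit, #7" / "cbz limit, L(not_limit)" pair quoted in the commit message above. As a rough illustration of the suggestion, here is a small C model, not the actual strncmp.S logic. It assumes limit is a plain byte count and that the main loop consumes 8 bytes per iteration; the struct and function names are made up for this sketch. It compares keeping a separate dword counter (limit_wd) with simply subtracting 8 from limit on each pass.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: how many full 8-byte steps
   the main loop ran, and how many bytes remain for a bytewise tail.  */
struct split
{
  uint64_t dwords;
  uint64_t tail;
};

/* Variant A: keep a separate dword counter, as the patch does.  */
static struct split
split_with_count (uint64_t limit)
{
  struct split s = { 0, 0 };
  uint64_t limit_wd = limit >> 3;   /* lsr limit_wd, limit, #3 */

  while (limit_wd != 0)
    {
      s.dwords++;                   /* one 8-byte compare would go here */
      limit_wd--;
    }
  s.tail = limit & 7;               /* and limit, limit, #7 */
  return s;
}

/* Variant B: subtract 8 from limit in the loop.  No limit_wd, and no
   masking afterwards: whatever is left in limit is already the tail.  */
static struct split
split_by_subtracting (uint64_t limit)
{
  struct split s = { 0, 0 };

  while (limit >= 8)
    {
      s.dwords++;                   /* one 8-byte compare would go here */
      limit -= 8;
    }
  s.tail = limit;
  return s;
}

/* Returns 1 when both variants split LIMIT the same way.  */
static int
variants_agree (uint64_t limit)
{
  struct split a = split_with_count (limit);
  struct split b = split_by_subtracting (limit);
  return a.dwords == b.dwords && a.tail == b.tail;
}
```

In this model both variants run the same number of 8-byte iterations and leave the same tail, but variant B needs one register fewer and no extra masking step after the loop. Whether that carries over to the real routine depends on how the loop exit paths use limit, which this sketch does not model.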