aarch64: Improve strncmp for mutually misaligned inputs


Message ID

DB6PR0801MB2053D467ED5AC3E8860BD0EB83D10@DB6PR0801MB2053.eurprd08.prod.outlook.com

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: "siddhesh@sourceware.org" <siddhesh@sourceware.org>
CC: Szabolcs Nagy <Szabolcs.Nagy@arm.com>, "libc-alpha@sourceware.org"
	<libc-alpha@sourceware.org>, nd <nd@arm.com>
Subject: Re: [PATCH] aarch64: Improve strncmp for mutually misaligned inputs
Date: Wed, 14 Mar 2018 14:04:00 +0000
Message-ID: <DB6PR0801MB2053D467ED5AC3E8860BD0EB83D10@DB6PR0801MB2053.eurprd08.prod.outlook.com>
nodisclaimer: True
received-spf: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-Network-Message-Id: ddc88a00-9568-482c-f071-08d589b46fe4
X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Mar 2018 14:04:00.6405
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB6PR0801MB1272

Commit Message

Wilco Dijkstra March 14, 2018, 2:04 p.m. UTC

  Hi,

Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!


        /* We found a difference or a NULL before the limit was reached.  */
        and     limit, limit, #7
        cbz     limit, L(not_limit)

Wilco
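The two quantities in this exchange are simple to state. Below is a minimal C sketch. It assumes limit is the number of bytes still allowed to be compared, which simplifies how strncmp.S actually tracks it. The dword count that lsr limit_wd, limit, #3 produces in a single instruction is limit >> 3. The leftover byte count that the quoted and limit, limit, #7 isolates is limit & 7. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how a byte limit splits into dwords plus a tail. */
struct limit_split {
    size_t words; /* whole 8-byte dwords: what lsr limit_wd, limit, #3 yields */
    size_t tail;  /* bytes left over: what and limit, limit, #7 yields */
};

static struct limit_split split_limit(size_t limit)
{
    struct limit_split s;
    s.words = limit >> 3; /* one A64 instruction: lsr limit_wd, limit, #3 */
    s.tail = limit & 7;   /* and limit, limit, #7 */
    /* The split is lossless: words * 8 + tail reconstructs the limit. */
    assert(s.words * 8 + s.tail == limit);
    return s;
}
```

When tail is 0, the cbz limit, L(not_limit) branch in the quoted snippet is taken.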

Comments

Siddhesh Poyarekar March 14, 2018, 2:20 p.m. UTC | #1

On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!

Because I was half asleep and just followed what Szabolcs said ;)

I'll fix that up later (I can barely sit today, my back is killing me)
or please feel free to fix up if you'd like to.

Thanks,
Siddhesh

Siddhesh Poyarekar March 15, 2018, 2:37 a.m. UTC | #2

On Wednesday 14 March 2018 07:50 PM, Siddhesh Poyarekar wrote:
> On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
>> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!
> 
> Because I was half asleep and just followed what Szabolcs said ;)
> 
> I'll fix that up later (I can barely sit today, my back is killing me)
> or please feel free to fix up if you'd like to.

I have fixed this now:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks,
Siddhesh

Wilco Dijkstra March 15, 2018, 1:44 p.m. UTC | #3

Siddhesh Poyarekar wrote:

> I have fixed this now:
>
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks, that's fine for now. We should look into tuning this further in the future,
I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Wilco

Siddhesh Poyarekar March 15, 2018, 1:57 p.m. UTC | #4

On Thursday 15 March 2018 07:14 PM, Wilco Dijkstra wrote:
> Thanks, that's fine for now. We should look into tuning this further in the future,
> I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Agreed, I haven't taken it off my plate.  This was a pretty big gain to
keep holding on to though, which is why I pushed it out early.

Siddhesh


Patch

--- a/sysdeps/aarch64/strncmp.S
+++ b/sysdeps/aarch64/strncmp.S
@@ -208,13 +208,15 @@  L(done):
        /* Align the SRC1 to a dword by doing a bytewise compare and then do
           the dword loop.  */
 L(try_misaligned_words):
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3
        cbz     count, L(do_misaligned)
 
        neg     count, count
        and     count, count, #7
        sub     limit, limit, count
-       mov     limit_wd, limit, lsr #3
+       mov     limit_wd, limit
+       lsr     limit_wd, limit_wd, #3
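The hunk above works out how many bytes must be compared one at a time before SRC1 reaches an 8-byte boundary. It then recomputes the dword count from the reduced limit. The C sketch below models only that arithmetic, under three assumptions. First, count enters as src1 & 7, which is SRC1's offset within a dword. Second, limit is at least as large as the bytewise head. Third, the page-boundary handling of the real code is ignored. The names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the L(try_misaligned_words) arithmetic.
   Assumption: on entry count == src1 & 7. */
struct misaligned_plan {
    size_t head_bytes; /* bytes compared one at a time to align SRC1 */
    size_t limit;      /* limit remaining after the bytewise head */
    size_t limit_wd;   /* dwords covered by the remaining limit */
};

static struct misaligned_plan plan_misaligned(uintptr_t src1, size_t limit)
{
    struct misaligned_plan p;
    size_t count = src1 & 7;

    if (count == 0) {            /* cbz count, L(do_misaligned) */
        p.head_bytes = 0;
        p.limit = limit;
        p.limit_wd = limit >> 3; /* lsr limit_wd, limit, #3 */
        return p;
    }
    count = (size_t)0 - count;   /* neg count, count */
    count &= 7;                  /* and count, count, #7: gives 8 - (src1 & 7) */
    assert(count <= limit);      /* sketch-only precondition, see lead-in */
    p.head_bytes = count;
    p.limit = limit - count;     /* sub limit, limit, count */
    p.limit_wd = p.limit >> 3;   /* lsr limit_wd, limit, #3 */
    return p;
}
```

Computing the head as (-count) & 7 rather than 8 - count keeps the result in the range 0..7 without a separate branch.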

Also, it seems to me it would be far easier to subtract 8 from limit in the main loop.
This means we don't ever need limit_wd, and it avoids having to do the and limit, limit, #7 / cbz limit, L(not_limit) sequence shown above after the loop.
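Wilco's suggestion keeps a single byte counter and subtracts 8 from it on every dword iteration. When the loop exits, the counter already holds the number of bytes still allowed. The C sketch below models that loop structure, not the actual assembly, and makes two simplifying assumptions. Both buffers are taken to be readable for at least limit bytes; real strncmp must not over-read, and the assembly handles page boundaries separately. The exact differing byte is found with a plain byte loop instead of the syndrome logic in strncmp.S. NUL detection uses the well-known (x - 0x01..01) & ~x & 0x80..80 test. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_80 0x8080808080808080ULL

/* Nonzero iff some byte of x is zero. */
static uint64_t has_zero_byte(uint64_t x)
{
    return (x - REP8_01) & ~x & REP8_80;
}

/* Sketch only: assumes s1 and s2 are both readable for limit bytes. */
static int strncmp_sub8(const char *s1, const char *s2, size_t limit)
{
    while (limit >= 8) {
        uint64_t w1, w2;
        memcpy(&w1, s1, 8);
        memcpy(&w2, s2, 8);
        if (w1 != w2 || has_zero_byte(w1))
            break;    /* the byte loop below resolves it within 8 bytes */
        s1 += 8;
        s2 += 8;
        limit -= 8;   /* one counter: no separate limit_wd */
    }
    /* limit is now exactly the bytes still allowed; no "limit & 7" step. */
    for (; limit > 0; limit--, s1++, s2++) {
        unsigned char c1 = (unsigned char)*s1;
        unsigned char c2 = (unsigned char)*s2;
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

Because a dword is only loaded while limit >= 8, the sketch never touches bytes beyond the first limit bytes of either buffer. The tail loop also needs no separate count derived from limit_wd.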