Message ID | DB6PR0801MB2053D467ED5AC3E8860BD0EB83D10@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New, archived |
Headers | From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>; To: siddhesh@sourceware.org; CC: Szabolcs Nagy <Szabolcs.Nagy@arm.com>, libc-alpha@sourceware.org, nd <nd@arm.com>; Subject: Re: [PATCH] aarch64: Improve strncmp for mutually misaligned inputs; Date: Wed, 14 Mar 2018 14:04:00 +0000 |
Commit Message
Wilco Dijkstra
March 14, 2018, 2:04 p.m. UTC
Hi,

Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!

        /* We found a difference or a NULL before the limit was reached.  */
        and     limit, limit, #7
        cbz     limit, L(not_limit)

Wilco
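As a rough C model of the arithmetic involved (an illustration only, not glibc code; the helper split_limit and the struct split_t are hypothetical names): the three-operand form lsr limit_wd, limit, 3 reads limit and writes limit_wd in a single instruction, computing the number of whole 8-byte dwords without a preceding mov, while and limit, limit, #7 keeps only the leftover byte count.

```c
#include <stddef.h>

/* Hypothetical helper (not from glibc) modelling two AArch64 instructions. */
typedef struct {
    size_t dwords; /* limit_wd: number of whole 8-byte dwords in limit */
    size_t tail;   /* bytes left over after those dwords */
} split_t;

static split_t split_limit(size_t limit)
{
    split_t s;
    /* lsr limit_wd, limit, #3: source and destination registers differ,
       so no separate "mov limit_wd, limit" is needed first. */
    s.dwords = limit >> 3;
    /* and limit, limit, #7: keep only the low three bits. */
    s.tail = limit & 7;
    return s;
}
```

For example, split_limit(21) yields dwords == 2 and tail == 5, since 21 = 2 * 8 + 5.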
Comments
On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!
Because I was half asleep and just followed what Szabolcs said ;)
I'll fix that up later (I can barely sit today, my back is killing me)
or please feel free to fix up if you'd like to.
Thanks,
Siddhesh
On Wednesday 14 March 2018 07:50 PM, Siddhesh Poyarekar wrote:
> On Wednesday 14 March 2018 07:34 PM, Wilco Dijkstra wrote:
>> Why not use lsr limit_wd, limit, 3? We have 3-operand shifts on AArch64!
>
> Because I was half asleep and just followed what Szabolcs said ;)
>
> I'll fix that up later (I can barely sit today, my back is killing me)
> or please feel free to fix up if you'd like to.

I have fixed this now:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks,
Siddhesh
Siddhesh Poyarekar wrote:
> I have fixed this now:
>
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1

Thanks, that's fine for now. We should look into tuning this further in the future; I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Wilco
On Thursday 15 March 2018 07:14 PM, Wilco Dijkstra wrote:
> Thanks, that's fine for now. We should look into tuning this further in the future,
> I think both strcmp and strncmp should be able to be almost as fast as memcmp.

Agreed, I haven't taken it off my plate. This was a pretty big gain to keep holding on to though, which is why I pushed it out early.

Siddhesh
    --- a/sysdeps/aarch64/strncmp.S
    +++ b/sysdeps/aarch64/strncmp.S
    @@ -208,13 +208,15 @@ L(done):
             /* Align the SRC1 to a dword by doing a bytewise compare and then do the dword loop.  */
     L(try_misaligned_words):
    -        mov     limit_wd, limit, lsr #3
    +        mov     limit_wd, limit
    +        lsr     limit_wd, limit_wd, #3
             cbz     count, L(do_misaligned)
             neg     count, count
             and     count, count, #7
             sub     limit, limit, count
    -        mov     limit_wd, limit, lsr #3
    +        mov     limit_wd, limit
    +        lsr     limit_wd, limit_wd, #3

Also, it seems to me it would be far easier to subtract 8 from limit in the main loop. This means we don't ever need limit_wd, and it avoids having to mask limit later (the and limit, limit, #7 / cbz limit, L(not_limit) sequence quoted above).
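A C sketch of the bookkeeping suggested above (hypothetical: strncmp_sketch and has_zero_byte are illustrative names, and this is neither the glibc implementation nor the assembly Wilco had in mind). The dword loop counts limit itself down by 8, so no separate limit_wd counter exists, and when the loop stops, limit already holds the remaining byte budget, so nothing needs masking with #7 afterwards. To keep the C well-defined, the sketch assumes both buffers are readable for at least limit bytes; the real assembly has to avoid over-reading in other ways, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of x is zero (classic bit trick). */
static int has_zero_byte(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical strncmp model. Simplifying precondition: s1 and s2 are
   both readable for at least `limit` bytes. */
static int strncmp_sketch(const char *s1, const char *s2, size_t limit)
{
    /* Dword loop: no limit_wd; limit itself is counted down by 8. */
    while (limit >= 8) {
        uint64_t a, b;
        memcpy(&a, s1, 8); /* 8-byte loads with no alignment requirement */
        memcpy(&b, s2, 8);
        if (a != b || has_zero_byte(a))
            break; /* a difference or a NUL lies within these 8 bytes */
        s1 += 8;
        s2 += 8;
        limit -= 8;
    }
    /* limit already holds the remaining byte budget: no "limit & 7" step. */
    for (; limit > 0; s1++, s2++, limit--) {
        unsigned char c1 = (unsigned char)*s1;
        unsigned char c2 = (unsigned char)*s2;
        if (c1 != c2)
            return c1 - c2;
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

For example, with char buffers "hello, world" and "hello, there" padded to 16 bytes, strncmp_sketch(a, b, 16) is positive ('w' > 't'), while strncmp_sketch(a, b, 7) is 0 because the first seven bytes match.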