From: Wilco Dijkstra
To: "siddhesh@sourceware.org"
CC: Szabolcs Nagy, "libc-alpha@sourceware.org", nd
Subject: Re: [PATCH] aarch64: Improve strncmp for mutually misaligned inputs
Date: Wed, 14 Mar 2018 14:04:00 +0000

Hi,

--- a/sysdeps/aarch64/strncmp.S
+++ b/sysdeps/aarch64/strncmp.S
@@ -208,13 +208,15 @@ L(done):
 	/* Align the SRC1 to a dword by doing a bytewise compare and then do
 	   the dword loop.  */
 L(try_misaligned_words):
-	mov	limit_wd, limit, lsr #3
+	mov	limit_wd, limit
+	lsr	limit_wd, limit_wd, #3
 	cbz	count, L(do_misaligned)
 	neg	count, count
 	and	count, count, #7
 	sub	limit, limit, count
-	mov	limit_wd, limit, lsr #3
+	mov	limit_wd, limit
+	lsr	limit_wd, limit_wd, #3

Why not use "lsr limit_wd, limit, 3"? We have 3-operand shifts on AArch64!

Also, it seems to me it would be far easier to subtract 8 from limit in the main loop. Then we never need limit_wd, and it avoids having to do this later:

	/* We found a difference or a NULL before the limit was reached.  */
	and	limit, limit, #7
	cbz	limit, L(not_limit)

Wilco
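The suggestion to subtract 8 from limit in the main loop can be modelled roughly in C. This is a hypothetical sketch, not the glibc code: the function name strncmp_sub8 is made up, and each dword is compared bytewise so the C stays free of out-of-bounds reads, whereas the assembly loads whole dwords and masks off the tail. The point it illustrates is the loop control: only the byte count limit is tracked, it drops by 8 per dword, and the number of significant bytes in the final dword falls straight out of it, so no separate limit_wd and no "limit % 8 == 0" special case are needed.

```c
#include <stddef.h>

/* Hypothetical model of the "subtract 8 from limit" loop control.
   Only `limit` is tracked; there is no separate dword counter. */
int strncmp_sub8(const char *s1, const char *s2, size_t limit)
{
    while (limit != 0) {
        /* All 8 bytes count, except in the last dword, where exactly
           `limit` (1..8) bytes are significant. */
        size_t valid = limit < 8 ? limit : 8;

        /* Bytewise stand-in for the dword compare plus NUL check. */
        for (size_t i = 0; i < valid; i++) {
            unsigned char c1 = (unsigned char) s1[i];
            unsigned char c2 = (unsigned char) s2[i];
            if (c1 != c2)
                return (int) c1 - (int) c2;
            if (c1 == '\0')
                return 0;
        }

        /* Subtracts 8 on every iteration but the last, which brings
           limit to 0; unsigned arithmetic also copes with SIZE_MAX. */
        limit -= valid;
        s1 += 8;
        s2 += 8;
    }
    return 0;
}
```

With a limit of 16, the second dword sees limit == 8, so all 8 bytes count without any "and limit, limit, #7" test; with a limit of 13, the second dword sees limit == 5 and only those 5 bytes are compared.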