From patchwork Tue Dec 13 12:49:08 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Wilco Dijkstra X-Patchwork-Id: 18428 Received: (qmail 65485 invoked by alias); 13 Dec 2016 12:49:24 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 65450 invoked by uid 89); 13 Dec 2016 12:49:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS autolearn=ham version=3.3.2 spammy=dick, Dick, LONG_MAX, long_max X-HELO: EUR02-AM5-obe.outbound.protection.outlook.com From: Wilco Dijkstra To: "libc-alpha@sourceware.org" CC: nd Subject: Re: [PATCH] Improve generic rawmemchr Date: Tue, 13 Dec 2016 12:49:08 +0000 Message-ID: References: In-Reply-To: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Wilco.Dijkstra@arm.com; x-ms-office365-filtering-correlation-id: da8d56ec-5ddf-4c94-2e72-08d423566dff x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001); SRVR:AM5PR0802MB2609; x-microsoft-exchange-diagnostics: 1; AM5PR0802MB2609; 7:MMqOO4u9YH2CoJwNK9UoiMpu+mpnwFiAt9ExepPO8TXaG6vzXr30uj5yYIXT74OOjYfM9hxsJuqwApN7ZJmsq6FeLelmzSkcPVvAO5/yYQyDcA50TLmTg22ReEBf9RFC3gamfwIWzK0S8pR/M8SaYIpd+y12/CrI8joxIi/TTyK3SmTm1NNu5UGAIAHghnsFw5eAS6m3PTwthhlS96DNxr8eaOLrzu5J7/jZLvQxnwJbLvoSfKZFPxHP5zYfOXhx593K3oG0V8NiM3th0UtyxJcatWT678JhVIckWGJ4rsxVoFpl/bOZq0xhQBYwUgvYt4XijZC6Vlth0EkEJ30NgNI61JHNIGIkvS+ECSM7lPAeJJEPN+sGb5hFxKMDyG9Mt4tI75xG0JSTL2+INY8c8WWiRUUMIXQTIZf+DYnqxqfdkVDDcgVz2tkH/yTo60NvVpXYkjpTLT9RVp/eFAKvXw== nodisclaimer: True x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(250305191791016)(180628864354917)(22074186197030); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(6040375)(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6055026)(6041248)(20161123564025)(20161123562025)(20161123555025)(20161123560025)(6072148); SRVR:AM5PR0802MB2609; BCL:0; PCL:0; RULEID:; SRVR:AM5PR0802MB2609; x-forefront-prvs: 01559F388D x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(7916002)(39860400002)(39850400002)(39840400002)(39450400003)(39410400002)(189002)(377424004)(199003)(54534003)(50986999)(9686002)(6916009)(77096006)(6436002)(92566002)(76176999)(5660300001)(229853002)(4001150100001)(4326007)(54356999)(101416001)(97736004)(86362001)(66066001)(2900100001)(2501003)(8936002)(8676002)(76576001)(81166006)(81156014)(3846002)(3660700001)(2351001)(33656002)(189998001)(102836003)(6506006)(106356001)(2906002)(3900700001)(110136003)(106116001)(122556002)(7696004)(38730400001)(6116002)(450100001)(305945005)(5640700002)(74316002)(7736002)(3280700002)(68736007)(105586002)(2950100002)(41533002)(2004002); DIR:OUT; SFP:1101; SCL:1; SRVR:AM5PR0802MB2609; H:AM5PR0802MB2610.eurprd08.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 13 Dec 2016 12:49:08.3200 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM5PR0802MB2609 ping From: Wilco Dijkstra Sent: 16 November 2016 18:53 To: libc-alpha@sourceware.org Cc: nd Subject: [PATCH] Improve generic rawmemchr   Improve generic rawmemchr for targets that don't have an assembler version by tailcalling memchr with the maximum size. If a target has an optimized memchr this is significantly faster (~3x on AArch64), if not, then this makes little difference. Also optimize the special case of zero to use strlen as this is typically faster than memchr. ChangeLog: 2015-11-16  Wilco Dijkstra          * string/rawmemchr.c (RAWMEMCHR): Use faster memchr/strlen. diff --git a/string/rawmemchr.c b/string/rawmemchr.c index fa3176d6ac7e25490be415af0459807509d1e02b..1a146af980619ac9a37a3c9d8df3917e7ce5db12 100644 --- a/string/rawmemchr.c +++ b/string/rawmemchr.c @@ -1,10 +1,5 @@  /* Copyright (C) 1991-2016 Free Software Foundation, Inc.     This file is part of the GNU C Library. -   Based on strlen implementation by Torbjorn Granlund (tege@sics.se), -   with help from Dan Sahlin (dan@sics.se) and -   commentary by Jim Blandy (jimb@ai.mit.edu); -   adaptation to memchr suggested by Dick Karpinski (dick@cca.ucsf.edu), -   and implemented by Roland McGrath (roland@ai.mit.edu).       The GNU C Library is free software; you can redistribute it and/or     modify it under the terms of the GNU Lesser General Public @@ -20,157 +15,19 @@     License along with the GNU C Library; if not, see     .  */   -#ifdef HAVE_CONFIG_H -#include -#endif - -#undef __ptr_t -#define __ptr_t void * - -#if defined (_LIBC) -# include -# include -# include -#endif - -#if defined (HAVE_LIMITS_H) || defined (_LIBC) -# include -#endif - -#define LONG_MAX_32_BITS 2147483647 - -#ifndef LONG_MAX -#define LONG_MAX LONG_MAX_32_BITS -#endif - -#include - -#undef memchr +#include    #ifndef RAWMEMCHR  # define RAWMEMCHR __rawmemchr  #endif    /* Find the first occurrence of C in S.  */ -__ptr_t -RAWMEMCHR (const __ptr_t s, int c_in) +void * +RAWMEMCHR (const void *s, int c)  { -  const unsigned char *char_ptr; -  const unsigned long int *longword_ptr; -  unsigned long int longword, magic_bits, charmask; -  unsigned char c; - -  c = (unsigned char) c_in; - -  /* Handle the first few characters by reading one character at a time. -     Do this until CHAR_PTR is aligned on a longword boundary.  */ -  for (char_ptr = (const unsigned char *) s; -       ((unsigned long int) char_ptr & (sizeof (longword) - 1)) != 0; -       ++char_ptr) -    if (*char_ptr == c) -      return (__ptr_t) char_ptr; - -  /* All these elucidatory comments refer to 4-byte longwords, -     but the theory applies equally well to 8-byte longwords.  */ - -  longword_ptr = (unsigned long int *) char_ptr; - -  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits -     the "holes."  Note that there is a hole just to the left of -     each byte, with an extra at the end: - -     bits:  01111110 11111110 11111110 11111111 -     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD - -     The 1-bits make sure that carries propagate to the next 0-bit. -     The 0-bits provide holes for carries to fall into.  */ -  magic_bits = -1; -  magic_bits = magic_bits / 0xff * 0xfe << 1 >> 1 | 1; - -  /* Set up a longword, each of whose bytes is C.  */ -  charmask = c | (c << 8); -  charmask |= charmask << 16; -#if LONG_MAX > LONG_MAX_32_BITS -  charmask |= charmask << 32; -#endif - -  /* Instead of the traditional loop which tests each character, -     we will test a longword at a time.  The tricky part is testing -     if *any of the four* bytes in the longword in question are zero.  */ -  while (1) -    { -      /* We tentatively exit the loop if adding MAGIC_BITS to -        LONGWORD fails to change any of the hole bits of LONGWORD. - -        1) Is this safe?  Will it catch all the zero bytes? -        Suppose there is a byte with all zeros.  Any carry bits -        propagating from its left will fall into the hole at its -        least significant bit and stop.  Since there will be no -        carry from its most significant bit, the LSB of the -        byte to the left will be unchanged, and the zero will be -        detected. - -        2) Is this worthwhile?  Will it ignore everything except -        zero bytes?  Suppose every byte of LONGWORD has a bit set -        somewhere.  There will be a carry into bit 8.  If bit 8 -        is set, this will carry into bit 16.  If bit 8 is clear, -        one of bits 9-15 must be set, so there will be a carry -        into bit 16.  Similarly, there will be a carry into bit -        24.  If one of bits 24-30 is set, there will be a carry -        into bit 31, so all of the hole bits will be changed. - -        The one misfire occurs when bits 24-30 are clear and bit -        31 is set; in this case, the hole at bit 31 is not -        changed.  If we had access to the processor carry flag, -        we could close this loophole by putting the fourth hole -        at bit 32! - -        So it ignores everything except 128's, when they're aligned -        properly. - -        3) But wait!  Aren't we looking for C, not zero? -        Good point.  So what we do is XOR LONGWORD with a longword, -        each of whose bytes is C.  This turns each byte that is C -        into a zero.  */ - -      longword = *longword_ptr++ ^ charmask; - -      /* Add MAGIC_BITS to LONGWORD.  */ -      if ((((longword + magic_bits) - -           /* Set those bits that were unchanged by the addition.  */ -           ^ ~longword) - -          /* Look at only the hole bits.  If any of the hole bits -             are unchanged, most likely one of the bytes was a -             zero.  */ -          & ~magic_bits) != 0) -       { -         /* Which of the bytes was C?  If none of them were, it was -            a misfire; continue the search.  */ - -         const unsigned char *cp = (const unsigned char *) (longword_ptr - 1); - -         if (cp[0] == c) -           return (__ptr_t) cp; -         if (cp[1] == c) -           return (__ptr_t) &cp[1]; -         if (cp[2] == c) -           return (__ptr_t) &cp[2]; -         if (cp[3] == c) -           return (__ptr_t) &cp[3]; -#if LONG_MAX > 2147483647 -         if (cp[4] == c) -           return (__ptr_t) &cp[4]; -         if (cp[5] == c) -           return (__ptr_t) &cp[5]; -         if (cp[6] == c) -           return (__ptr_t) &cp[6]; -         if (cp[7] == c) -           return (__ptr_t) &cp[7]; -#endif -       } -    } +  if (c != '\0') +    return memchr (s, c, (size_t)-1); +  return (char *)s + strlen (s);  }  libc_hidden_def (__rawmemchr)  weak_alias (__rawmemchr, rawmemchr)