Message ID | AM5PR0802MB2610223EAFFF5D9F3CC5378E83D20@AM5PR0802MB2610.eurprd08.prod.outlook.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 39021 invoked by alias); 29 Jun 2017 16:21:01 -0000 List-Id: <libc-alpha.sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org> CC: nd <nd@arm.com> Subject: [PATCH][AArch64] Inline mempcpy again Date: Thu, 29 Jun 2017 16:20:54 +0000 Message-ID: <AM5PR0802MB2610223EAFFF5D9F3CC5378E83D20@AM5PR0802MB2610.eurprd08.prod.outlook.com> |
Commit Message
Wilco Dijkstra
June 29, 2017, 4:20 p.m. UTC
Recent changes removed the generic mempcpy inline. Given GCC still doesn't optimize mempcpy (PR70140), I am adding it again. Since string/string.h no longer includes an architecture-specific header, do this inside include/string.h, and for now only on AArch64.

OK for commit?

ChangeLog:
2017-06-29  Wilco Dijkstra  <wdijkstr@arm.com>

	* include/string.h (mempcpy): Redirect to __mempcpy_inline.
	(__mempcpy): Likewise.
	(__mempcpy_inline): New inline function.
	* sysdeps/aarch64/string_private.h: Define _INLINE_mempcpy.

--
Comments
On 29/06/2017 13:20, Wilco Dijkstra wrote:
> Recent changes removed the generic mempcpy inline. Given GCC still
> doesn't optimize mempcpy (PR70140), I am adding it again. Since
> string/string.h no longer includes an architecture specific header, do this
> inside include/string.h and for now only on AArch64.

Should we reopen PR70140 then? Its current RESOLVED/FIXED state indicates that recent GCC does not show the issue. I also noted some discussion on PR81657, which is likewise set as RESOLVED/FIXED.

> OK for commit?
>
> ChangeLog:
> 2017-06-29  Wilco Dijkstra  <wdijkstr@arm.com>
>
> 	* include/string.h: (mempcpy): Redirect to __mempcpy_inline.
> 	(__mempcpy): Likewise.
> 	(__mempcpy_inline): New inline function.
> 	* sysdeps/aarch64/string_private.h: Define _INLINE_mempcpy.

We removed it because the consensus is that we do not want this kind of optimization to be provided by libc anymore; adding this exception could take us back to the previous state, with multiple architectures providing their own string.h/string_private.h hacks. I see this patch as a step back, and I hardly think it yields a large enough performance improvement to justify an exception. If optimizing mempcpy is really required, I think a better option would be to provide an optimized version based on the current memcpy/memmove.

I have created an implementation [1] which provides the expected optimized mempcpy at the cost of only an extra 'mov' instruction on both memcpy and memmove (to use the same memcpy/memmove code).

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/aarch64-mempcpy
On 07/05/2018 06:33 PM, Adhemerval Zanella wrote:
> If optimizing mempcpy is really required I think a better option would
> to provide the optimized based on current memcpy/memmove. I have created
> an implementation [1] which provides the expected optimized mempcpy with
> the cost of only extra 'mov' instruction on both memcpy and memmove (to
> use the same memcpy/memmove code)
>
> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/aarch64-mempcpy

I had proposed the exact same thing for __memcpy_chk [1] for AArch64, which was rejected on the grounds that this should be handled entirely by GCC. If that consensus has changed, then I'd like to propose that patch again as well.

However, I do understand that this is much better off being fixed in GCC, so we should probably first try to understand the limitations of doing that. Wilco, does anything prevent GCC from doing this optimization for mempcpy or __memcpy_chk?

Siddhesh

[1] http://sourceware-org.1504.n7.nabble.com/PATCH-0-2-Multiarch-hooks-for-memcpy-variants-td463236.html
On 06/07/2018 02:34, Siddhesh Poyarekar wrote:
> I had proposed the exact same thing for __memcpy_chk [1] for aarch64,
> which was rejected under the pretext that this should be handled
> completely by gcc. If that consensus has changed then I'd like to
> propose that patch again as well.
>
> However, I do understand that this is much better off being fixed in
> gcc so we should probably try and understand the limitations of doing
> that first. Wilco, does anything prevent gcc from doing this
> optimization for mempcpy or __memcpy_chk?
>
> Siddhesh

I tend to agree it should indeed be handled by the compiler, but checking the bugzilla reports about the changes in GCC, it seems the idea is not to make the optimization generic for all platforms, but rather dependent on the backend and targeted at libc (that's why PR70140 seems to have been reverted). In any case, if the idea is indeed to optimize mempcpy and GCC won't get the required support anytime soon, I still prefer *not* to go back to adding this code in the string*.h headers. This patch was just one idea showing that we can get similar performance directly in the assembly routines (with the advantage that even if the compiler does not transform mempcpy into memcpy, it will still get some improvement).
diff --git a/include/string.h b/include/string.h
index 069efd0b87010e5fdb64c87ced7af1dc4f54f232..46b90b8f346149f075fad026e562dfb27b658969 100644
--- a/include/string.h
+++ b/include/string.h
@@ -197,4 +197,23 @@ extern char *__strncat_chk (char *__restrict __dest,
 			    size_t __len, size_t __destlen) __THROW;
 #endif
 
+#if defined __USE_GNU && defined __OPTIMIZE__ \
+    && defined __extern_always_inline && __GNUC_PREREQ (3,2) \
+    && defined _INLINE_mempcpy
+
+#undef mempcpy
+#undef __mempcpy
+
+#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+
+__extern_always_inline void *
+__mempcpy_inline (void *__restrict __dest,
+		  const void *__restrict __src, size_t __n)
+{
+  return (char *) memcpy (__dest, __src, __n) + __n;
+}
+
+#endif
+
 #endif
diff --git a/sysdeps/aarch64/string_private.h b/sysdeps/aarch64/string_private.h
index 09dedbf3db40cf06077a44af992b399a6b37b48d..8b8fdddcc17a3f69455e72efe9c3616d2d33abe2 100644
--- a/sysdeps/aarch64/string_private.h
+++ b/sysdeps/aarch64/string_private.h
@@ -18,3 +18,6 @@
 
 /* AArch64 implementations support efficient unaligned access.  */
 #define _STRING_ARCH_unaligned 1
+
+/* Inline mempcpy since GCC doesn't optimize it (PR70140).  */
+#define _INLINE_mempcpy 1