Message ID | CAMe9rOrYywzUwdCmsG06ppzdKC9Gntwhia_kMp25ETTx+weDpQ@mail.gmail.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 92491 invoked by alias); 7 Mar 2016 21:47:56 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 92475 invoked by uid 89); 7 Mar 2016 21:47:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=__n, hongjiu.lu@intel.com, hongjiuluintelcom X-HELO: mail-qg0-f42.google.com X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to; bh=DDvjbxCR3qB9wxazBjoQJkBaRkxIj6PvwBWsX6xr2y4=; b=CAS3crNa28F2R+LiKD4m3oEB503H2DOyiDXHGHfDG7zQFX3Uk0jPvRO308tMP7/Kjq irBYQicr9qPp9NcaGdr9GLT7uAgg2R1X6eFto34i6K8J5Sw2rtGQOpBmhwKj2N/DBlEL Y1yQ53U75PEKrhW5s9ZVaHhzHSVPuuddSd+1z3OHoc33YrVFvubWhYxO+eH4pvdHeja1 iVQr0hFPv8f8gqVr3hOlPt3m5kEQvGNnh4gjglohXf32bQsPpfkvlqIJ/FjE+w3qQiyq hAxIjcg/gUV2LZ0bHa8jnIoEdVALZIfWVuaUs/YgGr5FV/3GTdvWKTRvx2vQMXkgY1BF gPcw== X-Gm-Message-State: AD7BkJKVq8nuHGngN1fyleDHPd1dEtSoOWIeJ5cnQ7GkzNn5evhnPi+GCzvBBvHsfahyH9YbPlKJTmVv5fwX0w== MIME-Version: 1.0 X-Received: by 10.140.16.225 with SMTP id 88mr31684643qgb.96.1457387263688; Mon, 07 Mar 2016 13:47:43 -0800 (PST) Date: Mon, 7 Mar 2016 13:47:43 -0800 Message-ID: <CAMe9rOrYywzUwdCmsG06ppzdKC9Gntwhia_kMp25ETTx+weDpQ@mail.gmail.com> Subject: [PATCH] Revert commit 05a910f7 From: "H.J. Lu" <hjl.tools@gmail.com> To: GNU C Library <libc-alpha@sourceware.org>, Wilco Dijkstra <wdijkstr@arm.com> Content-Type: multipart/mixed; boundary=001a11c0b3b42730f4052d7c6a63 |
Commit Message
H.J. Lu
March 7, 2016, 9:47 p.m. UTC
On Mon, Mar 7, 2016 at 1:20 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 7, 2016 at 12:57 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86, >> define _HAVE_STRING_ARCH_mempcpy to 1 for x86. >> >> If duplicated code between optimized memcpy and mempcpy is a concern, >> we can add an entry point in memcpy and use it to implement mempcpy, >> similar to the set of patches for __mempcpy_sse2_unaligned: >> >> https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html >> >> OK for master? >> >> >> H.J. >> --- >> [BZ #19759] >> * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New. >> --- >> sysdeps/x86/bits/string.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h >> index e4e019f..f5885b4 100644 >> --- a/sysdeps/x86/bits/string.h >> +++ b/sysdeps/x86/bits/string.h >> @@ -62,6 +62,9 @@ >> | ((const unsigned char *) (src))[idx]) >> >> >> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >> +# define _HAVE_STRING_ARCH_mempcpy 1 >> + >> /* Copy N bytes of SRC to DEST. */ >> # define _HAVE_STRING_ARCH_memcpy 1 >> # define memcpy(dest, src, n) \ >> -- >> 2.5.0 >> > > It doesn't work since <bits/string.h> is included only if > > #if defined __GNUC__ && __GNUC__ >= 2 > # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ > && !defined __NO_INLINE__ && !defined __cplusplus > > is true. > I believe commit 05a910f7 was wrong. At minimum, mempcpy shouldn't be inlined in 2 different header files.
Comments
H.J. Lu <hjl.tools@gmail.com> wrote: >> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >> +# define _HAVE_STRING_ARCH_mempcpy 1 >> + >> /* Copy N bytes of SRC to DEST. */ >> # define _HAVE_STRING_ARCH_memcpy 1 >> # define memcpy(dest, src, n) \ >> -- >> 2.5.0 >> > > It doesn't work since <bits/string.h> is included only if > > #if defined __GNUC__ && __GNUC__ >= 2 > # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ > && !defined __NO_INLINE__ && !defined __cplusplus > > is true. > > I believe commit 05a910f7 was wrong. At minimum, > mempcpy shouldn't be inlined in 2 different header files. There is nothing wrong with that commit. I already posted patches that remove most of the redundant stuff from bits/string.h, including the duplicate mempcpy defines. I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you either, unless you use non-standard options or a very ancient compiler. The proper solution is to get rid of the bits/string.h mess altogether rather than conditionally including it. With my outstanding patches we're there most of the way. Wilco
On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > H.J. Lu <hjl.tools@gmail.com> wrote: >>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >>> +# define _HAVE_STRING_ARCH_mempcpy 1 >>> + >>> /* Copy N bytes of SRC to DEST. */ >>> # define _HAVE_STRING_ARCH_memcpy 1 >>> # define memcpy(dest, src, n) \ >>> -- >>> 2.5.0 >>> >> >> It doesn't work since <bits/string.h> is included only if >> >> #if defined __GNUC__ && __GNUC__ >= 2 >> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ >> && !defined __NO_INLINE__ && !defined __cplusplus >> >> is true. >> > >> I believe commit 05a910f7 was wrong. At minimum, >> mempcpy shouldn't be inlined in 2 different header files. > > There is nothing wrong with that commit. I already posted patches that remove most > of the redundant stuff from bits/string.h, including the duplicate mempcpy defines. > I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you > either, unless you use non-standard options or a very ancient compiler. I can make it to work. If we don't want to wait before the other mempcpy inline is removed first, we can put the new mempcpy inline in a new header file and x86 won't include it until the other mempcpy inline is removed. It is very odd to have mempcpy inlined in 2 different places. > The proper solution is to get rid of the bits/string.h mess altogether rather than > conditionally including it. With my outstanding patches we're there most of the way. > > Wilco > The remove patch should have gone in first before adding another one.
On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >> H.J. Lu <hjl.tools@gmail.com> wrote: >>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >>>> +# define _HAVE_STRING_ARCH_mempcpy 1 >>>> + >>>> /* Copy N bytes of SRC to DEST. */ >>>> # define _HAVE_STRING_ARCH_memcpy 1 >>>> # define memcpy(dest, src, n) \ >>>> -- >>>> 2.5.0 >>>> >>> >>> It doesn't work since <bits/string.h> is included only if >>> >>> #if defined __GNUC__ && __GNUC__ >= 2 >>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ >>> && !defined __NO_INLINE__ && !defined __cplusplus >>> >>> is true. >>> >> >>> I believe commit 05a910f7 was wrong. At minimum, >>> mempcpy shouldn't be inlined in 2 different header files. >> >> There is nothing wrong with that commit. I already posted patches that remove most >> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines. >> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you >> either, unless you use non-standard options or a very ancient compiler. > > I can make it to work. > > If we don't want to wait before the other mempcpy inline is removed first, > we can put the new mempcpy inline in a new header file and x86 won't > include it until the other mempcpy inline is removed. It is very odd > to have mempcpy inlined in 2 different places. > >> The proper solution is to get rid of the bits/string.h mess altogether rather than >> conditionally including it. With my outstanding patches we're there most of the way. >> >> Wilco >> > > The remove patch should have gone in first before adding another one. Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy checking should be in <string.h>. It belongs to a different header file.
H.J. Lu <hjl.tools@gmail.com> wrote: On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >> H.J. Lu <hjl.tools@gmail.com> wrote: >>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >>>> +# define _HAVE_STRING_ARCH_mempcpy 1 >>>> + >>>> /* Copy N bytes of SRC to DEST. */ >>>> # define _HAVE_STRING_ARCH_memcpy 1 >>>> # define memcpy(dest, src, n) \ >>>> -- >>>> 2.5.0 >>>> >>> >>> It doesn't work since <bits/string.h> is included only if >>> >>> #if defined __GNUC__ && __GNUC__ >= 2 >>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ >>> && !defined __NO_INLINE__ && !defined __cplusplus >>> >>> is true. >>> >> >>> I believe commit 05a910f7 was wrong. At minimum, >>> mempcpy shouldn't be inlined in 2 different header files. >> >> There is nothing wrong with that commit. I already posted patches that remove most >> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines. >> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you >> either, unless you use non-standard options or a very ancient compiler. > > I can make it to work. > > If we don't want to wait before the other mempcpy inline is removed first, > we can put the new mempcpy inline in a new header file and x86 won't > include it until the other mempcpy inline is removed. It is very odd > to have mempcpy inlined in 2 different places. Actually it is inlined in 3 different places as x86/bits/string.h also defines it. If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due to the weird condition for including bits/string.h. That's on my list to get rid of, but if you want something asap then you could move this from <string.h> #if defined __GNUC__ && __GNUC__ >= 2 # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ && !defined __NO_INLINE__ && !defined __cplusplus into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy define before it. >> The proper solution is to get rid of the bits/string.h mess altogether rather than >> conditionally including it. With my outstanding patches we're there most of the way. >> >> Wilco >> > > The remove patch should have gone in first before adding another one. That wasn't feasible at the time due to the complex mess in string2.h. After 5 cleanup patches it has now become possible to remove it altogether. A few more and we can get rid of string2.h altogether! It may also be worth checking whether any of the inlines in i386/bits/string.h are still relevant today as many use rep movsb variants that stopped being useful 2 decades ago... > Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy > checking should be in <string.h>. It belongs to a different header file. Why? String.h contains lots of inlines. Wilco
On Tue, Mar 8, 2016 at 4:24 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >>> H.J. Lu <hjl.tools@gmail.com> wrote: >>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */ >>>>> +# define _HAVE_STRING_ARCH_mempcpy 1 >>>>> + >>>>> /* Copy N bytes of SRC to DEST. */ >>>>> # define _HAVE_STRING_ARCH_memcpy 1 >>>>> # define memcpy(dest, src, n) \ >>>>> -- >>>>> 2.5.0 >>>>> >>>> >>>> It doesn't work since <bits/string.h> is included only if >>>> >>>> #if defined __GNUC__ && __GNUC__ >= 2 >>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ >>>> && !defined __NO_INLINE__ && !defined __cplusplus >>>> >>>> is true. >>>> >>> >>>> I believe commit 05a910f7 was wrong. At minimum, >>>> mempcpy shouldn't be inlined in 2 different header files. >>> >>> There is nothing wrong with that commit. I already posted patches that remove most >>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines. >>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you >>> either, unless you use non-standard options or a very ancient compiler. >> >> I can make it to work. >> >> If we don't want to wait before the other mempcpy inline is removed first, >> we can put the new mempcpy inline in a new header file and x86 won't >> include it until the other mempcpy inline is removed. It is very odd >> to have mempcpy inlined in 2 different places. > > Actually it is inlined in 3 different places as x86/bits/string.h also defines it. > > If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due > to the weird condition for including bits/string.h. That's on my list to get rid of, but > if you want something asap then you could move this from <string.h> > > #if defined __GNUC__ && __GNUC__ >= 2 > # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ > && !defined __NO_INLINE__ && !defined __cplusplus > > into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy > define before it. > >>> The proper solution is to get rid of the bits/string.h mess altogether rather than >>> conditionally including it. With my outstanding patches we're there most of the way. >>> >>> Wilco >>> >> >> The remove patch should have gone in first before adding another one. > > That wasn't feasible at the time due to the complex mess in string2.h. After 5 > cleanup patches it has now become possible to remove it altogether. A few > more and we can get rid of string2.h altogether! It may also be worth checking > whether any of the inlines in i386/bits/string.h are still relevant today as many > use rep movsb variants that stopped being useful 2 decades ago... Exactly. Has anyone reported any issues against it? >> Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy >> checking should be in <string.h>. It belongs to a different header file. > > Why? String.h contains lots of inlines. > > Wilco > <string.h> has #if defined __GNUC__ && __GNUC__ >= 2 # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \ && !defined __NO_INLINE__ && !defined __cplusplus /* When using GNU CC we provide some optimized versions of selected functions from this header. There are two kinds of optimizations: - machine-dependent optimizations, most probably using inline assembler code; these might be quite expensive since the code size can increase significantly. These optimizations are not used unless the symbol __USE_STRING_INLINES is defined before including this header. - machine-independent optimizations which do not increase the code size significantly and which optimize mainly situations where one or more arguments are compile-time constants. These optimizations are used always when the compiler is taught to optimize. One can inhibit all optimizations by defining __NO_STRING_INLINES. */ /* Get the machine-dependent optimizations (if any). */ # include <bits/string.h> /* These are generic optimizations which do not add too much inline code. */ # include <bits/string2.h> # endif # if __USE_FORTIFY_LEVEL > 0 && defined __fortify_function /* Functions with security checks. */ # include <bits/string3.h> # endif #endif Why do we want to remove <bits/string2.h>? Shouldn't inline mempcpy be there?
H.J. Lu <hjl.tools@gmail.com> wrote: > Why do we want to remove <bits/string2.h>? Shouldn't inline mempcpy > be there? Currently bits/string2.h is a huge mess without being beneficial (quite a few of the inlines are not useful or even detrimental to performance). Once bits/string2.h is cleaned up and included unconditionally, what's the difference between placing any remaining useful inlines in string2.h or string.h? Wilco
On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > H.J. Lu <hjl.tools@gmail.com> wrote: >> Why do we want to remove <bits/string2.h>? Shouldn't inline mempcpy >> be there? > > Currently bits/string2.h is a huge mess without being beneficial (quite a few of > the inlines are not useful or even detrimental to performance). > Once bits/string2.h is cleaned up and included unconditionally, what's the > difference between placing any remaining useful inlines in string2.h or string.h? Smaller header files? Why did we have string2.h to begin with?
H.J. Lu <hjl.tools@gmail.com> wrote: > On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >> >> H.J. Lu <hjl.tools@gmail.com> wrote: >>> Why do we want to remove <bits/string2.h>? Shouldn't inline mempcpy >>> be there? >> >> Currently bits/string2.h is a huge mess without being beneficial (quite a few of >> the inlines are not useful or even detrimental to performance). >> Once bits/string2.h is cleaned up and included unconditionally, what's the >> difference between placing any remaining useful inlines in string2.h or string.h? > > Smaller header files? Why did we have string2.h to begin with? Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup. Wilco
On Tue, Mar 8, 2016 at 6:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > H.J. Lu <hjl.tools@gmail.com> wrote: >> On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: >>> >>> H.J. Lu <hjl.tools@gmail.com> wrote: >>>> Why do we want to remove <bits/string2.h>? Shouldn't inline mempcpy >>>> be there? >>> >>> Currently bits/string2.h is a huge mess without being beneficial (quite a few of >>> the inlines are not useful or even detrimental to performance). >>> Once bits/string2.h is cleaned up and included unconditionally, what's the >>> difference between placing any remaining useful inlines in string2.h or string.h? >> >> Smaller header files? Why did we have string2.h to begin with? > > Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup. > > Wouldn't it be easier to review your change if you break it down into one inline function cleanup in one patch?
From 786bb62f66da6ffe6b2614191ca19a0f0d161af9 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Mon, 7 Mar 2016 13:40:54 -0800 Subject: [PATCH] Revert commit 05a910f7 mempcpy is inlined in string/bits/string2.h. Any improvement for mempcpy should be made in string/bits/string2.h, not in string/string.h. [BZ #19759] * string/string.h (mempcpy): Removed. (__mempcpy): Likewise. (__mempcpy_inline): Likewise. * sysdeps/sparc/bits/string.h (_HAVE_STRING_ARCH_mempcpy): Likewise. --- string/string.h | 19 ------------------- sysdeps/sparc/bits/string.h | 3 --- 2 files changed, 22 deletions(-) diff --git a/string/string.h b/string/string.h index 1f3e348..f46e3eb 100644 --- a/string/string.h +++ b/string/string.h @@ -636,25 +636,6 @@ extern char *basename (const char *__filename) __THROW __nonnull ((1)); # endif #endif -#if defined __USE_GNU && defined __OPTIMIZE__ \ - && defined __extern_always_inline && __GNUC_PREREQ (3,2) -# if !defined _FORCE_INLINES && !defined _HAVE_STRING_ARCH_mempcpy - -#undef mempcpy -#undef __mempcpy -#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n) -#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n) - -__extern_always_inline void * -__mempcpy_inline (void *__restrict __dest, - const void *__restrict __src, size_t __n) -{ - return (char *) memcpy (__dest, __src, __n) + __n; -} - -# endif -#endif - __END_DECLS #endif /* string.h */ diff --git a/sysdeps/sparc/bits/string.h b/sysdeps/sparc/bits/string.h index 10beca6..1ef4243 100644 --- a/sysdeps/sparc/bits/string.h +++ b/sysdeps/sparc/bits/string.h @@ -26,6 +26,3 @@ /* sparc32 and sparc64 strchr(x, '\0') perform better than __rawmemchr(x, '\0'). */ #define _HAVE_STRING_ARCH_strchr 1 - -/* Don't inline mempcpy into memcpy as sparc has an optimized mempcpy. */ -#define _HAVE_STRING_ARCH_mempcpy 1 -- 2.5.0