Patchwork Revert commit 05a910f7

login
register
mail settings
Submitter H.J. Lu
Date March 7, 2016, 9:47 p.m.
Message ID <CAMe9rOrYywzUwdCmsG06ppzdKC9Gntwhia_kMp25ETTx+weDpQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/11247/
State New
Headers show

Comments

H.J. Lu - March 7, 2016, 9:47 p.m.
On Mon, Mar 7, 2016 at 1:20 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 12:57 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
>> define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
>>
>> If duplicated code between optimized memcpy and mempcpy is a concern,
>> we can add an entry point in memcpy and use it to implement mempcpy,
>> similar to the set of patches for __mempcpy_sse2_unaligned:
>>
>> https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
>>
>> OK for master?
>>
>>
>> H.J.
>> ---
>>         [BZ #19759]
>>         * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
>> ---
>>  sysdeps/x86/bits/string.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h
>> index e4e019f..f5885b4 100644
>> --- a/sysdeps/x86/bits/string.h
>> +++ b/sysdeps/x86/bits/string.h
>> @@ -62,6 +62,9 @@
>>     | ((const unsigned char *) (src))[idx])
>>
>>
>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>> +# define _HAVE_STRING_ARCH_mempcpy 1
>> +
>>  /* Copy N bytes of SRC to DEST.  */
>>  # define _HAVE_STRING_ARCH_memcpy 1
>>  # define memcpy(dest, src, n) \
>> --
>> 2.5.0
>>
>
> It doesn't work since  <bits/string.h> is included only if
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>      && !defined __NO_INLINE__ && !defined __cplusplus
>
> is true.
>

I believe commit 05a910f7 was wrong.  At minimum,
mempcpy shouldn't be inlined in 2 different header files.
Wilco Dijkstra - March 7, 2016, 11:20 p.m.
H.J. Lu <hjl.tools@gmail.com> wrote:
>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>> +# define _HAVE_STRING_ARCH_mempcpy 1
>> +
>>  /* Copy N bytes of SRC to DEST.  */
>>  # define _HAVE_STRING_ARCH_memcpy 1
>>  # define memcpy(dest, src, n) \
>> --
>> 2.5.0
>>
>
> It doesn't work since  <bits/string.h> is included only if
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>      && !defined __NO_INLINE__ && !defined __cplusplus
>
> is true.
>

> I believe commit 05a910f7 was wrong.  At minimum,
> mempcpy shouldn't be inlined in 2 different header files.

There is nothing wrong with that commit. I already posted patches that remove most 
of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
either, unless you use non-standard options or a very ancient compiler.

The proper solution is to get rid of the bits/string.h mess altogether rather than 
conditionally including it. With my outstanding patches we're there most of the way.

Wilco
H.J. Lu - March 8, 2016, 12:13 a.m.
On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>> +
>>>  /* Copy N bytes of SRC to DEST.  */
>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>  # define memcpy(dest, src, n) \
>>> --
>>> 2.5.0
>>>
>>
>> It doesn't work since  <bits/string.h> is included only if
>>
>> #if defined __GNUC__ && __GNUC__ >= 2
>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>
>> is true.
>>
>
>> I believe commit 05a910f7 was wrong.  At minimum,
>> mempcpy shouldn't be inlined in 2 different header files.
>
> There is nothing wrong with that commit. I already posted patches that remove most
> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
> either, unless you use non-standard options or a very ancient compiler.

I can make it to work.

If we don't want to wait before the other mempcpy inline is removed first,
we can put the new mempcpy inline in a new header file and x86 won't
include it until the other mempcpy inline is removed.  It is very odd
to have mempcpy inlined in 2 different places.

> The proper solution is to get rid of the bits/string.h mess altogether rather than
> conditionally including it. With my outstanding patches we're there most of the way.
>
> Wilco
>

The remove patch should have gone in first before adding another one.
H.J. Lu - March 8, 2016, 1:29 a.m.
On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>> +
>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>  # define memcpy(dest, src, n) \
>>>> --
>>>> 2.5.0
>>>>
>>>
>>> It doesn't work since  <bits/string.h> is included only if
>>>
>>> #if defined __GNUC__ && __GNUC__ >= 2
>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>
>>> is true.
>>>
>>
>>> I believe commit 05a910f7 was wrong.  At minimum,
>>> mempcpy shouldn't be inlined in 2 different header files.
>>
>> There is nothing wrong with that commit. I already posted patches that remove most
>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>> either, unless you use non-standard options or a very ancient compiler.
>
> I can make it to work.
>
> If we don't want to wait before the other mempcpy inline is removed first,
> we can put the new mempcpy inline in a new header file and x86 won't
> include it until the other mempcpy inline is removed.  It is very odd
> to have mempcpy inlined in 2 different places.
>
>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>> conditionally including it. With my outstanding patches we're there most of the way.
>>
>> Wilco
>>
>
> The remove patch should have gone in first before adding another one.

Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
checking should be in <string.h>.  It belongs to a different header file.
Wilco Dijkstra - March 8, 2016, 12:24 p.m.
H.J. Lu <hjl.tools@gmail.com> wrote:
On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>> +
>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>  # define memcpy(dest, src, n) \
>>>> --
>>>> 2.5.0
>>>>
>>>
>>> It doesn't work since  <bits/string.h> is included only if
>>>
>>> #if defined __GNUC__ && __GNUC__ >= 2
>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>
>>> is true.
>>>
>>
>>> I believe commit 05a910f7 was wrong.  At minimum,
>>> mempcpy shouldn't be inlined in 2 different header files.
>>
>> There is nothing wrong with that commit. I already posted patches that remove most
>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>> either, unless you use non-standard options or a very ancient compiler.
>
> I can make it to work.
>
> If we don't want to wait before the other mempcpy inline is removed first,
> we can put the new mempcpy inline in a new header file and x86 won't
> include it until the other mempcpy inline is removed.  It is very odd
> to have mempcpy inlined in 2 different places.

Actually it is inlined in 3 different places as x86/bits/string.h also defines it. 

If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due
to the weird condition for including bits/string.h. That's on my list to get rid of, but
if you want something asap then you could move this from <string.h>

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
      && !defined __NO_INLINE__ && !defined __cplusplus

into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy
define before it.

>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>> conditionally including it. With my outstanding patches we're there most of the way.
>>
>> Wilco
>>
>
> The remove patch should have gone in first before adding another one.

That wasn't feasible at the time due to the complex mess in string2.h. After 5 
cleanup patches it has now become possible to remove it altogether. A few
more and we can get rid of string2.h altogether! It may also be worth checking 
whether any of the inlines in i386/bits/string.h are still relevant today as many
use rep movsb variants that stopped being useful 2 decades ago...

> Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
> checking should be in <string.h>.  It belongs to a different header file.

Why? String.h contains lots of inlines.

Wilco
H.J. Lu - March 8, 2016, 1:02 p.m.
On Tue, Mar 8, 2016 at 4:24 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>>> +
>>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>>  # define memcpy(dest, src, n) \
>>>>> --
>>>>> 2.5.0
>>>>>
>>>>
>>>> It doesn't work since  <bits/string.h> is included only if
>>>>
>>>> #if defined __GNUC__ && __GNUC__ >= 2
>>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>>
>>>> is true.
>>>>
>>>
>>>> I believe commit 05a910f7 was wrong.  At minimum,
>>>> mempcpy shouldn't be inlined in 2 different header files.
>>>
>>> There is nothing wrong with that commit. I already posted patches that remove most
>>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>>> either, unless you use non-standard options or a very ancient compiler.
>>
>> I can make it to work.
>>
>> If we don't want to wait before the other mempcpy inline is removed first,
>> we can put the new mempcpy inline in a new header file and x86 won't
>> include it until the other mempcpy inline is removed.  It is very odd
>> to have mempcpy inlined in 2 different places.
>
> Actually it is inlined in 3 different places as x86/bits/string.h also defines it.
>
> If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due
> to the weird condition for including bits/string.h. That's on my list to get rid of, but
> if you want something asap then you could move this from <string.h>
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>       && !defined __NO_INLINE__ && !defined __cplusplus
>
> into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy
> define before it.
>
>>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>>> conditionally including it. With my outstanding patches we're there most of the way.
>>>
>>> Wilco
>>>
>>
>> The remove patch should have gone in first before adding another one.
>
> That wasn't feasible at the time due to the complex mess in string2.h. After 5
> cleanup patches it has now become possible to remove it altogether. A few
> more and we can get rid of string2.h altogether! It may also be worth checking
> whether any of the inlines in i386/bits/string.h are still relevant today as many
> use rep movsb variants that stopped being useful 2 decades ago...

Exactly.  Has anyone reported any issues against it?

>> Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
>> checking should be in <string.h>.  It belongs to a different header file.
>
> Why? String.h contains lots of inlines.
>
> Wilco
>

<string.h> has

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
     && !defined __NO_INLINE__ && !defined __cplusplus
/* When using GNU CC we provide some optimized versions of selected
   functions from this header.  There are two kinds of optimizations:

   - machine-dependent optimizations, most probably using inline
     assembler code; these might be quite expensive since the code
     size can increase significantly.
     These optimizations are not used unless the symbol
__USE_STRING_INLINES
     is defined before including this header.

   - machine-independent optimizations which do not increase the
     code size significantly and which optimize mainly situations
     where one or more arguments are compile-time constants.
     These optimizations are used always when the compiler is
     taught to optimize.

   One can inhibit all optimizations by defining __NO_STRING_INLINES.  */

/* Get the machine-dependent optimizations (if any).  */
#  include <bits/string.h>

/* These are generic optimizations which do not add too much inline code.  */
#  include <bits/string2.h>
# endif

# if __USE_FORTIFY_LEVEL > 0 && defined __fortify_function
/* Functions with security checks.  */
#  include <bits/string3.h>
# endif
#endif

Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
be there?
Wilco Dijkstra - March 8, 2016, 1:32 p.m.
H.J. Lu <hjl.tools@gmail.com> wrote:
> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
> be there?

Currently bits/string2.h is a huge mess without being beneficial (quite a few of
the inlines are not useful or even detrimental to performance).
Once bits/string2.h is cleaned up and included unconditionally, what's the
difference between placing any remaining useful inlines in string2.h or string.h?

Wilco
H.J. Lu - March 8, 2016, 1:48 p.m.
On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> H.J. Lu <hjl.tools@gmail.com> wrote:
>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>> be there?
>
> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
> the inlines are not useful or even detrimental to performance).
> Once bits/string2.h is cleaned up and included unconditionally, what's the
> difference between placing any remaining useful inlines in string2.h or string.h?

Smaller header files? Why did we have string2.h to begin with?
Wilco Dijkstra - March 8, 2016, 2:08 p.m.
H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>>> be there?
>>
>> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
>> the inlines are not useful or even detrimental to performance).
>> Once bits/string2.h is cleaned up and included unconditionally, what's the
>> difference between placing any remaining useful inlines in string2.h or string.h?
>
> Smaller header files? Why did we have string2.h to begin with?

Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup.

Wilco
H.J. Lu - March 8, 2016, 2:32 p.m.
On Tue, Mar 8, 2016 at 6:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>>
>>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>>>> be there?
>>>
>>> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
>>> the inlines are not useful or even detrimental to performance).
>>> Once bits/string2.h is cleaned up and included unconditionally, what's the
>>> difference between placing any remaining useful inlines in string2.h or string.h?
>>
>> Smaller header files? Why did we have string2.h to begin with?
>
> Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup.
>
>

Wouldn't it be easier to review your change if you break it down into
one inline function cleanup in one patch?

Patch

From 786bb62f66da6ffe6b2614191ca19a0f0d161af9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 7 Mar 2016 13:40:54 -0800
Subject: [PATCH] Revert commit 05a910f7

mempcpy is inlined in string/bits/string2.h.  Any improvement for mempcpy
should be made in string/bits/string2.h, not in string/string.h.

	[BZ #19759]
	* string/string.h (mempcpy): Removed.
	(__mempcpy): Likewise.
	(__mempcpy_inline): Likewise.
	* sysdeps/sparc/bits/string.h (_HAVE_STRING_ARCH_mempcpy):
	Likewise.
---
 string/string.h             | 19 -------------------
 sysdeps/sparc/bits/string.h |  3 ---
 2 files changed, 22 deletions(-)

diff --git a/string/string.h b/string/string.h
index 1f3e348..f46e3eb 100644
--- a/string/string.h
+++ b/string/string.h
@@ -636,25 +636,6 @@  extern char *basename (const char *__filename) __THROW __nonnull ((1));
 # endif
 #endif
 
-#if defined __USE_GNU && defined __OPTIMIZE__ \
-    && defined __extern_always_inline && __GNUC_PREREQ (3,2)
-# if !defined _FORCE_INLINES && !defined _HAVE_STRING_ARCH_mempcpy
-
-#undef mempcpy
-#undef __mempcpy
-#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
-#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
-
-__extern_always_inline void *
-__mempcpy_inline (void *__restrict __dest,
-		  const void *__restrict __src, size_t __n)
-{
-  return (char *) memcpy (__dest, __src, __n) + __n;
-}
-
-# endif
-#endif
-
 __END_DECLS
 
 #endif /* string.h  */
diff --git a/sysdeps/sparc/bits/string.h b/sysdeps/sparc/bits/string.h
index 10beca6..1ef4243 100644
--- a/sysdeps/sparc/bits/string.h
+++ b/sysdeps/sparc/bits/string.h
@@ -26,6 +26,3 @@ 
 /* sparc32 and sparc64 strchr(x, '\0') perform better than
    __rawmemchr(x, '\0').  */
 #define _HAVE_STRING_ARCH_strchr 1
-
-/* Don't inline mempcpy into memcpy as sparc has an optimized mempcpy.  */
-#define _HAVE_STRING_ARCH_mempcpy 1
-- 
2.5.0