Revert commit 05a910f7

Message ID CAMe9rOrYywzUwdCmsG06ppzdKC9Gntwhia_kMp25ETTx+weDpQ@mail.gmail.com
State New, archived
Headers

Commit Message

H.J. Lu March 7, 2016, 9:47 p.m. UTC
  On Mon, Mar 7, 2016 at 1:20 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 12:57 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>> Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
>> define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
>>
>> If duplicated code between optimized memcpy and mempcpy is a concern,
>> we can add an entry point in memcpy and use it to implement mempcpy,
>> similar to the set of patches for __mempcpy_sse2_unaligned:
>>
>> https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
>>
>> OK for master?
>>
>>
>> H.J.
>> ---
>>         [BZ #19759]
>>         * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
>> ---
>>  sysdeps/x86/bits/string.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h
>> index e4e019f..f5885b4 100644
>> --- a/sysdeps/x86/bits/string.h
>> +++ b/sysdeps/x86/bits/string.h
>> @@ -62,6 +62,9 @@
>>     | ((const unsigned char *) (src))[idx])
>>
>>
>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>> +# define _HAVE_STRING_ARCH_mempcpy 1
>> +
>>  /* Copy N bytes of SRC to DEST.  */
>>  # define _HAVE_STRING_ARCH_memcpy 1
>>  # define memcpy(dest, src, n) \
>> --
>> 2.5.0
>>
>
> It doesn't work since  <bits/string.h> is included only if
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>      && !defined __NO_INLINE__ && !defined __cplusplus
>
> is true.
>

I believe commit 05a910f7 was wrong.  At minimum,
mempcpy shouldn't be inlined in 2 different header files.
  

Comments

Wilco Dijkstra March 7, 2016, 11:20 p.m. UTC | #1
H.J. Lu <hjl.tools@gmail.com> wrote:
>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>> +# define _HAVE_STRING_ARCH_mempcpy 1
>> +
>>  /* Copy N bytes of SRC to DEST.  */
>>  # define _HAVE_STRING_ARCH_memcpy 1
>>  # define memcpy(dest, src, n) \
>> --
>> 2.5.0
>>
>
> It doesn't work since  <bits/string.h> is included only if
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>      && !defined __NO_INLINE__ && !defined __cplusplus
>
> is true.
>

> I believe commit 05a910f7 was wrong.  At minimum,
> mempcpy shouldn't be inlined in 2 different header files.

There is nothing wrong with that commit. I already posted patches that remove most 
of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
either, unless you use non-standard options or a very ancient compiler.

The proper solution is to get rid of the bits/string.h mess altogether rather than 
conditionally including it. With my outstanding patches we're there most of the way.

Wilco
  
H.J. Lu March 8, 2016, 12:13 a.m. UTC | #2
On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>> +
>>>  /* Copy N bytes of SRC to DEST.  */
>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>  # define memcpy(dest, src, n) \
>>> --
>>> 2.5.0
>>>
>>
>> It doesn't work since  <bits/string.h> is included only if
>>
>> #if defined __GNUC__ && __GNUC__ >= 2
>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>
>> is true.
>>
>
>> I believe commit 05a910f7 was wrong.  At minimum,
>> mempcpy shouldn't be inlined in 2 different header files.
>
> There is nothing wrong with that commit. I already posted patches that remove most
> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
> either, unless you use non-standard options or a very ancient compiler.

I can make it to work.

If we don't want to wait before the other mempcpy inline is removed first,
we can put the new mempcpy inline in a new header file and x86 won't
include it until the other mempcpy inline is removed.  It is very odd
to have mempcpy inlined in 2 different places.

> The proper solution is to get rid of the bits/string.h mess altogether rather than
> conditionally including it. With my outstanding patches we're there most of the way.
>
> Wilco
>

The remove patch should have gone in first before adding another one.
  
H.J. Lu March 8, 2016, 1:29 a.m. UTC | #3
On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>> +
>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>  # define memcpy(dest, src, n) \
>>>> --
>>>> 2.5.0
>>>>
>>>
>>> It doesn't work since  <bits/string.h> is included only if
>>>
>>> #if defined __GNUC__ && __GNUC__ >= 2
>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>
>>> is true.
>>>
>>
>>> I believe commit 05a910f7 was wrong.  At minimum,
>>> mempcpy shouldn't be inlined in 2 different header files.
>>
>> There is nothing wrong with that commit. I already posted patches that remove most
>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>> either, unless you use non-standard options or a very ancient compiler.
>
> I can make it to work.
>
> If we don't want to wait before the other mempcpy inline is removed first,
> we can put the new mempcpy inline in a new header file and x86 won't
> include it until the other mempcpy inline is removed.  It is very odd
> to have mempcpy inlined in 2 different places.
>
>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>> conditionally including it. With my outstanding patches we're there most of the way.
>>
>> Wilco
>>
>
> The remove patch should have gone in first before adding another one.

Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
checking should be in <string.h>.  It belongs to a different header file.
  
Wilco Dijkstra March 8, 2016, 12:24 p.m. UTC | #4
H.J. Lu <hjl.tools@gmail.com> wrote:
On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>> +
>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>  # define memcpy(dest, src, n) \
>>>> --
>>>> 2.5.0
>>>>
>>>
>>> It doesn't work since  <bits/string.h> is included only if
>>>
>>> #if defined __GNUC__ && __GNUC__ >= 2
>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>
>>> is true.
>>>
>>
>>> I believe commit 05a910f7 was wrong.  At minimum,
>>> mempcpy shouldn't be inlined in 2 different header files.
>>
>> There is nothing wrong with that commit. I already posted patches that remove most
>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>> either, unless you use non-standard options or a very ancient compiler.
>
> I can make it to work.
>
> If we don't want to wait before the other mempcpy inline is removed first,
> we can put the new mempcpy inline in a new header file and x86 won't
> include it until the other mempcpy inline is removed.  It is very odd
> to have mempcpy inlined in 2 different places.

Actually it is inlined in 3 different places as x86/bits/string.h also defines it. 

If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due
to the weird condition for including bits/string.h. That's on my list to get rid of, but
if you want something asap then you could move this from <string.h>

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
      && !defined __NO_INLINE__ && !defined __cplusplus

into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy
define before it.

>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>> conditionally including it. With my outstanding patches we're there most of the way.
>>
>> Wilco
>>
>
> The remove patch should have gone in first before adding another one.

That wasn't feasible at the time due to the complex mess in string2.h. After 5 
cleanup patches it has now become possible to remove it altogether. A few
more and we can get rid of string2.h altogether! It may also be worth checking 
whether any of the inlines in i386/bits/string.h are still relevant today as many
use rep movsb variants that stopped being useful 2 decades ago...

> Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
> checking should be in <string.h>.  It belongs to a different header file.

Why? String.h contains lots of inlines.

Wilco
  
H.J. Lu March 8, 2016, 1:02 p.m. UTC | #5
On Tue, Mar 8, 2016 at 4:24 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 4:13 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Mar 7, 2016 at 3:20 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
>>>>> +# define _HAVE_STRING_ARCH_mempcpy 1
>>>>> +
>>>>>  /* Copy N bytes of SRC to DEST.  */
>>>>>  # define _HAVE_STRING_ARCH_memcpy 1
>>>>>  # define memcpy(dest, src, n) \
>>>>> --
>>>>> 2.5.0
>>>>>
>>>>
>>>> It doesn't work since  <bits/string.h> is included only if
>>>>
>>>> #if defined __GNUC__ && __GNUC__ >= 2
>>>> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>>>>      && !defined __NO_INLINE__ && !defined __cplusplus
>>>>
>>>> is true.
>>>>
>>>
>>>> I believe commit 05a910f7 was wrong.  At minimum,
>>>> mempcpy shouldn't be inlined in 2 different header files.
>>>
>>> There is nothing wrong with that commit. I already posted patches that remove most
>>> of the redundant stuff from bits/string.h, including the duplicate mempcpy defines.
>>> I don't understand how defining _HAVE_STRING_ARCH_mempcpy doesn't work for you
>>> either, unless you use non-standard options or a very ancient compiler.
>>
>> I can make it to work.
>>
>> If we don't want to wait before the other mempcpy inline is removed first,
>> we can put the new mempcpy inline in a new header file and x86 won't
>> include it until the other mempcpy inline is removed.  It is very odd
>> to have mempcpy inlined in 2 different places.
>
> Actually it is inlined in 3 different places as x86/bits/string.h also defines it.
>
> If _HAVE_STRING_ARCH_mempcpy doesn't work in all circumstances, it is due
> to the weird condition for including bits/string.h. That's on my list to get rid of, but
> if you want something asap then you could move this from <string.h>
>
> #if defined __GNUC__ && __GNUC__ >= 2
> # if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
>       && !defined __NO_INLINE__ && !defined __cplusplus
>
> into x86/bits/string.h and sparc/bits/string.h, and add your _HAVE_STRING_ARCH_mempcpy
> define before it.
>
>>> The proper solution is to get rid of the bits/string.h mess altogether rather than
>>> conditionally including it. With my outstanding patches we're there most of the way.
>>>
>>> Wilco
>>>
>>
>> The remove patch should have gone in first before adding another one.
>
> That wasn't feasible at the time due to the complex mess in string2.h. After 5
> cleanup patches it has now become possible to remove it altogether. A few
> more and we can get rid of string2.h altogether! It may also be worth checking
> whether any of the inlines in i386/bits/string.h are still relevant today as many
> use rep movsb variants that stopped being useful 2 decades ago...

Exactly.  Has anyone reported any issues against it?

>> Another thing, I don't think inline with _HAVE_STRING_ARCH_mempcpy
>> checking should be in <string.h>.  It belongs to a different header file.
>
> Why? String.h contains lots of inlines.
>
> Wilco
>

<string.h> has

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
     && !defined __NO_INLINE__ && !defined __cplusplus
/* When using GNU CC we provide some optimized versions of selected
   functions from this header.  There are two kinds of optimizations:

   - machine-dependent optimizations, most probably using inline
     assembler code; these might be quite expensive since the code
     size can increase significantly.
     These optimizations are not used unless the symbol
__USE_STRING_INLINES
     is defined before including this header.

   - machine-independent optimizations which do not increase the
     code size significantly and which optimize mainly situations
     where one or more arguments are compile-time constants.
     These optimizations are used always when the compiler is
     taught to optimize.

   One can inhibit all optimizations by defining __NO_STRING_INLINES.  */

/* Get the machine-dependent optimizations (if any).  */
#  include <bits/string.h>

/* These are generic optimizations which do not add too much inline code.  */
#  include <bits/string2.h>
# endif

# if __USE_FORTIFY_LEVEL > 0 && defined __fortify_function
/* Functions with security checks.  */
#  include <bits/string3.h>
# endif
#endif

Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
be there?
  
Wilco Dijkstra March 8, 2016, 1:32 p.m. UTC | #6
H.J. Lu <hjl.tools@gmail.com> wrote:
> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
> be there?

Currently bits/string2.h is a huge mess without being beneficial (quite a few of
the inlines are not useful or even detrimental to performance).
Once bits/string2.h is cleaned up and included unconditionally, what's the
difference between placing any remaining useful inlines in string2.h or string.h?

Wilco
  
H.J. Lu March 8, 2016, 1:48 p.m. UTC | #7
On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> H.J. Lu <hjl.tools@gmail.com> wrote:
>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>> be there?
>
> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
> the inlines are not useful or even detrimental to performance).
> Once bits/string2.h is cleaned up and included unconditionally, what's the
> difference between placing any remaining useful inlines in string2.h or string.h?

Smaller header files? Why did we have string2.h to begin with?
  
Wilco Dijkstra March 8, 2016, 2:08 p.m. UTC | #8
H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>
>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>>> be there?
>>
>> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
>> the inlines are not useful or even detrimental to performance).
>> Once bits/string2.h is cleaned up and included unconditionally, what's the
>> difference between placing any remaining useful inlines in string2.h or string.h?
>
> Smaller header files? Why did we have string2.h to begin with?

Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup.

Wilco
  
H.J. Lu March 8, 2016, 2:32 p.m. UTC | #9
On Tue, Mar 8, 2016 at 6:08 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Mar 8, 2016 at 5:32 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>>
>>> H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> Why do we want to remove  <bits/string2.h>?  Shouldn't inline mempcpy
>>>> be there?
>>>
>>> Currently bits/string2.h is a huge mess without being beneficial (quite a few of
>>> the inlines are not useful or even detrimental to performance).
>>> Once bits/string2.h is cleaned up and included unconditionally, what's the
>>> difference between placing any remaining useful inlines in string2.h or string.h?
>>
>> Smaller header files? Why did we have string2.h to begin with?
>
> Yes, string2.h is quite huge today, but it won't be when I'm finished with the cleanup.
>
>

Wouldn't it be easier to review your change if you break it down into
one inline function cleanup in one patch?
  

Patch

From 786bb62f66da6ffe6b2614191ca19a0f0d161af9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 7 Mar 2016 13:40:54 -0800
Subject: [PATCH] Revert commit 05a910f7

mempcpy is inlined in string/bits/string2.h.  Any improvement for mempcpy
should be made in string/bits/string2.h, not in string/string.h.

	[BZ #19759]
	* string/string.h (mempcpy): Removed.
	(__mempcpy): Likewise.
	(__mempcpy_inline): Likewise.
	* sysdeps/sparc/bits/string.h (_HAVE_STRING_ARCH_mempcpy):
	Likewise.
---
 string/string.h             | 19 -------------------
 sysdeps/sparc/bits/string.h |  3 ---
 2 files changed, 22 deletions(-)

diff --git a/string/string.h b/string/string.h
index 1f3e348..f46e3eb 100644
--- a/string/string.h
+++ b/string/string.h
@@ -636,25 +636,6 @@  extern char *basename (const char *__filename) __THROW __nonnull ((1));
 # endif
 #endif
 
-#if defined __USE_GNU && defined __OPTIMIZE__ \
-    && defined __extern_always_inline && __GNUC_PREREQ (3,2)
-# if !defined _FORCE_INLINES && !defined _HAVE_STRING_ARCH_mempcpy
-
-#undef mempcpy
-#undef __mempcpy
-#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
-#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
-
-__extern_always_inline void *
-__mempcpy_inline (void *__restrict __dest,
-		  const void *__restrict __src, size_t __n)
-{
-  return (char *) memcpy (__dest, __src, __n) + __n;
-}
-
-# endif
-#endif
-
 __END_DECLS
 
 #endif /* string.h  */
diff --git a/sysdeps/sparc/bits/string.h b/sysdeps/sparc/bits/string.h
index 10beca6..1ef4243 100644
--- a/sysdeps/sparc/bits/string.h
+++ b/sysdeps/sparc/bits/string.h
@@ -26,6 +26,3 @@ 
 /* sparc32 and sparc64 strchr(x, '\0') perform better than
    __rawmemchr(x, '\0').  */
 #define _HAVE_STRING_ARCH_strchr 1
-
-/* Don't inline mempcpy into memcpy as sparc has an optimized mempcpy.  */
-#define _HAVE_STRING_ARCH_mempcpy 1
-- 
2.5.0