Define _HAVE_STRING_ARCH_mempcpy to 1 for x86

Message ID 20160307205735.GA3204@intel.com
State New, archived

Commit Message

Lu, Hongjiu March 7, 2016, 8:57 p.m. UTC
Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
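For reference, the macro matters because the generic string headers open-code
mempcpy only when the architecture does not claim its own version.  A
simplified sketch of that fallback (approximate, not the exact glibc source):

#include <string.h>   /* memcpy, size_t */

/* Without an arch-specific claim, mempcpy is rewritten as memcpy plus
   pointer arithmetic, so the library's optimized mempcpy is never
   reached.  */
#ifndef _HAVE_STRING_ARCH_mempcpy
static __inline__ void *
__mempcpy_inline (void *__dest, const void *__src, size_t __n)
{
  return (char *) memcpy (__dest, __src, __n) + __n;
}
# define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
#endif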

If duplicated code between optimized memcpy and mempcpy is a concern,
we can add an entry point in memcpy and use it to implement mempcpy,
similar to the set of patches for __mempcpy_sse2_unaligned:

https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
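As a rough illustration of that approach (hypothetical C with made-up names;
the real __mempcpy_sse2_unaligned work is done in assembly, where mempcpy is
an extra entry point into the memcpy body), the two functions can share one
copy routine and differ only in their return value:

#include <string.h>

/* Stand-in for the shared, optimized copy body.  */
static void *
copy_body (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n);
}

void *
my_memcpy (void *dest, const void *src, size_t n)
{
  copy_body (dest, src, n);
  return dest;                    /* memcpy returns DEST.  */
}

void *
my_mempcpy (void *dest, const void *src, size_t n)
{
  copy_body (dest, src, n);
  return (char *) dest + n;       /* mempcpy returns DEST + N.  */
}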

OK for master?


H.J.
---
	[BZ #19759]
	* sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
---
 sysdeps/x86/bits/string.h | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

H.J. Lu March 7, 2016, 9:20 p.m. UTC | #1
On Mon, Mar 7, 2016 at 12:57 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
> define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
>
> If duplicated code between optimized memcpy and mempcpy is a concern,
> we can add an entry point in memcpy and use it to implement mempcpy,
> similar to the set of patches for __mempcpy_sse2_unaligned:
>
> https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
>
> OK for master?
>
>
> H.J.
> ---
>         [BZ #19759]
>         * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
> ---
>  sysdeps/x86/bits/string.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h
> index e4e019f..f5885b4 100644
> --- a/sysdeps/x86/bits/string.h
> +++ b/sysdeps/x86/bits/string.h
> @@ -62,6 +62,9 @@
>     | ((const unsigned char *) (src))[idx])
>
>
> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
> +# define _HAVE_STRING_ARCH_mempcpy 1
> +
>  /* Copy N bytes of SRC to DEST.  */
>  # define _HAVE_STRING_ARCH_memcpy 1
>  # define memcpy(dest, src, n) \
> --
> 2.5.0
>

It doesn't work since <bits/string.h> is included only if

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
     && !defined __NO_INLINE__ && !defined __cplusplus

is true.
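
Concretely, the new define only takes effect when <string.h> actually pulls
in <bits/string.h>, which it does under roughly this guard (simplified
sketch, not the exact glibc <string.h>):

/* The optimized string macros are only exposed to GCC builds that
   optimize, allow inlining, and are not C++; a definition placed
   inside <bits/string.h> is invisible to every other compilation,
   e.g. -O0, -Os, or -fno-inline builds.  */
#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
     && !defined __NO_INLINE__ && !defined __cplusplus
#  include <bits/string.h>
# endif
#endif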
  

Patch

diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h
index e4e019f..f5885b4 100644
--- a/sysdeps/x86/bits/string.h
+++ b/sysdeps/x86/bits/string.h
@@ -62,6 +62,9 @@ 
    | ((const unsigned char *) (src))[idx])
 
 
+/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy.  */
+# define _HAVE_STRING_ARCH_mempcpy 1
+
 /* Copy N bytes of SRC to DEST.  */
 # define _HAVE_STRING_ARCH_memcpy 1
 # define memcpy(dest, src, n) \