Define _HAVE_STRING_ARCH_mempcpy to 1 for x86
Commit Message
Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
If duplicated code between optimized memcpy and mempcpy is a concern,
we can add an entry point in memcpy and use it to implement mempcpy,
similar to the set of patches for __mempcpy_sse2_unaligned:
https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
OK for master?
H.J.
---
[BZ #19759]
* sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
---
sysdeps/x86/bits/string.h | 3 +++
1 file changed, 3 insertions(+)
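For context, mempcpy is the GNU extension that copies like memcpy but returns a pointer to the byte just past the last one written. The comment in the patch refers to mempcpy being turned into a memcpy call; a minimal sketch of that equivalence follows. The name `mempcpy_via_memcpy` is hypothetical and used only for illustration; it is not the inline that glibc's headers actually define.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a glibc function): mempcpy expressed through
   memcpy.  memcpy returns DEST, so adding N gives the address just past
   the copied bytes, which is what mempcpy returns.  */
static void *mempcpy_via_memcpy(void *dest, const void *src, size_t n)
{
    return (char *) memcpy(dest, src, n) + n;
}
```

The returned pointer makes it convenient to chain copies into one buffer, for example `p = mempcpy_via_memcpy(p, part, len);` repeated for each part.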
Comments
On Mon, Mar 7, 2016 at 12:57 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
> define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
>
> If duplicated code between optimized memcpy and mempcpy is a concern,
> we can add an entry point in memcpy and use it to implement mempcpy,
> similar to the set of patches for __mempcpy_sse2_unaligned:
>
> https://sourceware.org/ml/libc-alpha/2016-03/msg00166.html
>
> OK for master?
>
>
> H.J.
> ---
> [BZ #19759]
> * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
> ---
> sysdeps/x86/bits/string.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/sysdeps/x86/bits/string.h b/sysdeps/x86/bits/string.h
> index e4e019f..f5885b4 100644
> --- a/sysdeps/x86/bits/string.h
> +++ b/sysdeps/x86/bits/string.h
> @@ -62,6 +62,9 @@
> | ((const unsigned char *) (src))[idx])
>
>
> +/* Don't inline mempcpy into memcpy as x86 has an optimized mempcpy. */
> +# define _HAVE_STRING_ARCH_mempcpy 1
> +
> /* Copy N bytes of SRC to DEST. */
> # define _HAVE_STRING_ARCH_memcpy 1
> # define memcpy(dest, src, n) \
> --
> 2.5.0
>
This doesn't work, because <bits/string.h> is included only when

#if defined __GNUC__ && __GNUC__ >= 2
# if defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
     && !defined __NO_INLINE__ && !defined __cplusplus

is true. Otherwise _HAVE_STRING_ARCH_mempcpy is never defined.
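The guard quoted above can be checked for a given set of compiler flags with a small sketch like the one below. It folds the two nested conditions into a single test. `WOULD_INCLUDE_BITS_STRING_H` and `would_include_bits_string_h` are hypothetical names introduced here, not glibc macros or functions.

```c
/* Fold the two nested #if conditions from the reply into one test.
   WOULD_INCLUDE_BITS_STRING_H is a hypothetical name, not a glibc macro.  */
#if defined __GNUC__ && __GNUC__ >= 2 \
    && defined __OPTIMIZE__ && !defined __OPTIMIZE_SIZE__ \
    && !defined __NO_INLINE__ && !defined __cplusplus
# define WOULD_INCLUDE_BITS_STRING_H 1
#else
# define WOULD_INCLUDE_BITS_STRING_H 0
#endif

/* Returns 1 if this translation unit satisfies the guard, 0 otherwise.  */
static int would_include_bits_string_h(void)
{
    return WOULD_INCLUDE_BITS_STRING_H;
}
```

With GCC, compiling as C at -O2 should make the function return 1, while -O0 (no `__OPTIMIZE__`), -Os (`__OPTIMIZE_SIZE__` defined), -fno-inline, or compiling as C++ should make it return 0. Those are the builds in which a define placed in <bits/string.h> is not seen.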