[AArch64] Inline mempcpy

Message ID 000101d080db$904af8d0$b0e0ea70$@com
State Superseded
Headers

Commit Message

Wilco Dijkstra April 27, 2015, 11:16 a.m. UTC
  Add inlining of mempcpy on AArch64 to expand into memcpy.

OK for commit?

2015-04-27  Wilco Dijkstra  <wdijkstr@arm.com>

        * sysdeps/aarch64/bits/string.h: New file.
        (_STRING_ARCH_unaligned): Define.
        (mempcpy): Redirect to __mempcpy_inline.
        (__mempcpy): Likewise.
        (__mempcpy_inline): New function.

---
 sysdeps/aarch64/bits/string.h | 50 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 sysdeps/aarch64/bits/string.h
  

Comments

Andrew Pinski April 27, 2015, 3:25 p.m. UTC | #1
> On Apr 27, 2015, at 4:16 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
> 
> Add inlining of mempcpy on AArch64 to expand into memcpy.
> 
> OK for commit?

This seems like this should be part of generic code rather than aarch64 specific versions. 

Thanks,
Andrew

> 
> 2015-04-27  Wilco Dijkstra  <wdijkstr@arm.com>
> 
>        * sysdeps/aarch64/bits/string.h: New file.
>        (_STRING_ARCH_unaligned): Define.
>        (mempcpy): Redirect to __mempcpy_inline.
>        (__mempcpy): Likewise.
>        (__mempcpy_inline): New function.
> 
> ---
> sysdeps/aarch64/bits/string.h | 50 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
> create mode 100644 sysdeps/aarch64/bits/string.h
> 
> diff --git a/sysdeps/aarch64/bits/string.h b/sysdeps/aarch64/bits/string.h
> new file mode 100644
> index 0000000..b0dacd6
> --- /dev/null
> +++ b/sysdeps/aarch64/bits/string.h
> @@ -0,0 +1,50 @@
> +/* Optimized, inlined string functions.  AArch64 version.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _STRING_H
> +# error "Never use <bits/string.h> directly; include <string.h> instead."
> +#endif
> +
> +/* AArch64 implementations support efficient unaligned access.  */
> +#define _STRING_ARCH_unaligned 1
> +
> +#if !defined __NO_STRING_INLINES
> +
> +# ifndef __STRING_INLINE
> +#  ifndef __extern_inline
> +#   define __STRING_INLINE inline
> +#  else
> +#   define __STRING_INLINE __extern_inline
> +#  endif
> +# endif
> +
> +# ifdef __USE_GNU
> +#  define _HAVE_STRING_ARCH_mempcpy 1
> +#  define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
> +#  define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
> +
> +__STRING_INLINE void *
> +__mempcpy_inline (void *__restrict __dest,
> +          const void *__restrict __src, size_t __n)
> +{
> +  return memcpy (__dest, __src, __n) + __n;
> +}
> +
> +# endif
> +
> +#endif
> -- 
> 1.9.1
> 
> 
>
  
Wilco Dijkstra April 27, 2015, 3:40 p.m. UTC | #2
> pinskia@gmail.com wrote:
> > On Apr 27, 2015, at 4:16 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
> >
> > Add inlining of mempcpy on AArch64 to expand into memcpy.
> >
> > OK for commit?
> 
> This seems like this should be part of generic code rather than aarch64 specific versions.

Absolutely, but the question is what is the best way of doing that? Some targets may want
to keep using their assembler mempcpy implementation.

Wilco
  
Adhemerval Zanella Netto April 29, 2015, 1:56 p.m. UTC | #3
On 27-04-2015 12:40, Wilco Dijkstra wrote:
>> pinskia@gmail.com wrote:
>>> On Apr 27, 2015, at 4:16 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>>>
>>> Add inlining of mempcpy on AArch64 to expand into memcpy.
>>>
>>> OK for commit?
>>
>> This seems like this should be part of generic code rather than aarch64 specific versions.
> 
> Absolutely, but the question is what is the best way of doing that? Some targets may want
> to keep using their assembler mempcpy implementation.
> 
> Wilco
> 

My first though is to add it on string/bits/string2.h and then adjust the
bits/strings.h for each arch.  However seems that only s390 is really
defining _HAVE_STRING_ARCH_xxxx.  So you can start with aarch64 and
then we can check for the remaining ports.
  
Ondrej Bilka May 1, 2015, 12:30 p.m. UTC | #4
On Wed, Apr 29, 2015 at 10:56:32AM -0300, Adhemerval Zanella wrote:
> 
> 
> On 27-04-2015 12:40, Wilco Dijkstra wrote:
> >> pinskia@gmail.com wrote:
> >>> On Apr 27, 2015, at 4:16 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
> >>>
> >>> Add inlining of mempcpy on AArch64 to expand into memcpy.
> >>>
> >>> OK for commit?
> >>
> >> This seems like this should be part of generic code rather than aarch64 specific versions.
> > 
> > Absolutely, but the question is what is the best way of doing that? Some targets may want
> > to keep using their assembler mempcpy implementation.
> > 
> > Wilco
> > 
> 
> My first though is to add it on string/bits/string2.h and then adjust the
> bits/strings.h for each arch.  However seems that only s390 is really
> defining _HAVE_STRING_ARCH_xxxx.  So you can start with aarch64 and
> then we can check for the remaining ports.

Wilco, could you post patch as Adhemerval suggested? A git grep mempcpy
shows that only powerpc, sparc and i386/x64 define these.

I am ok with x64 part, how about sparc and powerpc?

As previously discussed it will improve performance due to smaller
icache pressure.

Its change that would never hurt as you could implement one of memcpy and mempcpy with other so they are few cycles apart.

Only way this could hurt is when one of mempcpy arguments isn't used
after call and that causes unnecessary spill.
  

Patch

diff --git a/sysdeps/aarch64/bits/string.h b/sysdeps/aarch64/bits/string.h
new file mode 100644
index 0000000..b0dacd6
--- /dev/null
+++ b/sysdeps/aarch64/bits/string.h
@@ -0,0 +1,50 @@ 
+/* Optimized, inlined string functions.  AArch64 version.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _STRING_H
+# error "Never use <bits/string.h> directly; include <string.h> instead."
+#endif
+
+/* AArch64 implementations support efficient unaligned access.  */
+#define _STRING_ARCH_unaligned 1
+
+#if !defined __NO_STRING_INLINES
+
+# ifndef __STRING_INLINE
+#  ifndef __extern_inline
+#   define __STRING_INLINE inline
+#  else
+#   define __STRING_INLINE __extern_inline
+#  endif
+# endif
+
+# ifdef __USE_GNU
+#  define _HAVE_STRING_ARCH_mempcpy 1
+#  define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+#  define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
+
+__STRING_INLINE void *
+__mempcpy_inline (void *__restrict __dest,
+		  const void *__restrict __src, size_t __n)
+{
+  return memcpy (__dest, __src, __n) + __n;
+}
+
+# endif
+
+#endif