[2/2] Remove ancient GCC string inlines

Message ID 000401d102aa$8db859e0$a9290da0$@com
State Superseded

Commit Message

Wilco Dijkstra Oct. 9, 2015, 3:52 p.m. UTC
  Move mempcpy, strcpy and stpcpy inlines to string/string-inlines.c as they are no longer used in
string2.h. Passes AArch64, x86 and x64 regression.

OK for commit?

ChangeLog:
2015-10-09  Wilco Dijkstra  <wdijkstr@arm.com>

	* string/string-inlines.c (__STRING2_COPY_TYPE): Moved from string2.h.
	(__mempcpy_small): Likewise.
	(__strcpy_small): Likewise.
	(__stpcpy_small): Likewise.
	* string/bits/string2.h (__STRING2_COPY_TYPE): Remove.
	(__mempcpy_small): Remove.
	(__strcpy_small): Remove.
	(__stpcpy_small): Remove.
---
 string/bits/string2.h   | 372 ------------------------------------------------
 string/string-inlines.c | 346 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 346 insertions(+), 372 deletions(-)
  

Comments

Joseph Myers Oct. 9, 2015, 4:01 p.m. UTC | #1
On Fri, 9 Oct 2015, Wilco Dijkstra wrote:

> Move mempcpy, strcpy and stpcpy inlines to string/string-inlines.c as 
> they are no longer used in string2.h. Passes AArch64, x86 and x64 
> regression.

I take it that's full testsuite runs, not just the string/ directory 
(so, in particular, you verified that the ABI tests for libc.so pass and 
that these functions are still in libc.so for x86 - I don't see quite how 
that would work with this patch since you didn't touch 
sysdeps/i386/string-inlines.c).

As a followup, such functions (no longer used in inlines in any installed 
header) could be made into compat symbols (not in libc.a, not available 
for new links, not included in future glibc ports).
  
Wilco Dijkstra Oct. 9, 2015, 4:40 p.m. UTC | #2
> Joseph Myers wrote:
> On Fri, 9 Oct 2015, Wilco Dijkstra wrote:
> 
> > Move mempcpy, strcpy and stpcpy inlines to string/string-inlines.c as
> > they are no longer used in string2.h. Passes AArch64, x86 and x64
> > regression.
> 
> I take it that's full testsuite runs, not just the string/ directory
> (so, in particular, you verified that the ABI tests for libc.so pass and
> that these functions are still in libc.so for x86 - I don't see quite how
> that would work with this patch since you didn't touch
> sysdeps/i386/string-inlines.c).

You wrote in http://sourceware.org/ml/libc-alpha/2015-05/msg00770.html:
"(sysdeps/i386/string-inlines.c can be ignored or removed since plain i386 
is no longer supported; only string/string-inlines.c and 
sysdeps/i386/i486/string-inlines.c are relevant.)"

Looking at the x86 build, string-inlines.o is indeed string/string-inlines.c, so
that confirms sysdeps/i386/string-inlines.c is not built at all. No idea what magic
ensures this, but that's what I get...

I did full testsuite runs of course, and the only failures are the usual suspects
(besides nptl/*):

tests.sum:FAIL: debug/tst-backtrace2
tests.sum:FAIL: debug/tst-backtrace3
tests.sum:FAIL: debug/tst-backtrace4
tests.sum:FAIL: debug/tst-backtrace5
tests.sum:FAIL: debug/tst-backtrace6
tests.sum:FAIL: debug/tst-chk4
tests.sum:FAIL: debug/tst-chk5
tests.sum:FAIL: debug/tst-chk6
tests.sum:FAIL: debug/tst-lfschk4
tests.sum:FAIL: debug/tst-lfschk5
tests.sum:FAIL: debug/tst-lfschk6
tests.sum:FAIL: dlfcn/bug-atexit3
tests.sum:FAIL: elf/check-abi-libc
tests.sum:FAIL: grp/testgrp
tests.sum:FAIL: posix/globtest
tests.sum:FAIL: rt/tst-cpuclock2
tests.sum:FAIL: rt/tst-mqueue8
tests.sum:FAIL: rt/tst-mqueue8x

> As a followup, such functions (no longer used in inlines in any installed
> header) could be made into compat symbols (not in libc.a, not available
> for new links, not included in future glibc ports).

Yes that is possible. I'd first like to fix the rest of the string2.h header,
for example there are still quite a few inlines that are either unnecessary,
could be done in GCC or could be done as generic optimizations in the actual 
library function that benefit all inputs, not just rare cases.

Wilco
  
Joseph Myers Oct. 9, 2015, 5:01 p.m. UTC | #3
On Fri, 9 Oct 2015, Wilco Dijkstra wrote:

> You wrote in http://sourceware.org/ml/libc-alpha/2015-05/msg00770.html:
> "(sysdeps/i386/string-inlines.c can be ignored or removed since plain i386 
> is no longer supported; only string/string-inlines.c and 
> sysdeps/i386/i486/string-inlines.c are relevant.)"

That was before HJ moved the i386/i486 contents up a directory level.  
sysdeps/i386/string-inlines.c is what used to be 
sysdeps/i386/i486/string-inlines.c at the time I wrote that.

> Looking at the x86 build, string-inlines.o is indeed 
> string/string-inlines.c, so that confirms sysdeps/i386/string-inlines.c 
> is not built at all. No idea what magic ensures this, but that's what I 
> get...

Well, sysdeps/i386/string-inlines.c should be built.  My 32-bit x86 builds 
clearly show "gcc -m32 ../sysdeps/i386/string-inlines.c -c ..." in the 
build logs.

> I did full testsuite runs of course, and the only failures are the usual 
> suspects

> tests.sum:FAIL: elf/check-abi-libc

ABI test failures are not usual suspects!  (Remember to build with 
--prefix=/usr to avoid bug 14664.)  This is the key test to show up 
problems with this patch; you should never ignore failures of this test.
  
Joseph Myers Oct. 9, 2015, 5:07 p.m. UTC | #4
On Fri, 9 Oct 2015, Wilco Dijkstra wrote:

> (besides nptl/*):

Oh, and if building with a compiler whose libgcc_s.so.1 / libstdc++.so.6 
aren't in the default dynamic linker search paths, copy them into the 
glibc build directory before running tests.  Then you should not have a 
significant number of nptl/ test failures (and probably avoid some of the 
others on your list as well).
  
Wilco Dijkstra Oct. 12, 2015, 11:20 a.m. UTC | #5
> Joseph Myers wrote: 
> On Fri, 9 Oct 2015, Wilco Dijkstra wrote:
> 
> > You wrote in http://sourceware.org/ml/libc-alpha/2015-05/msg00770.html:
> > "(sysdeps/i386/string-inlines.c can be ignored or removed since plain i386
> > is no longer supported; only string/string-inlines.c and
> > sysdeps/i386/i486/string-inlines.c are relevant.)"
> 
> That was before HJ moved the i386/i486 contents up a directory level.
> sysdeps/i386/string-inlines.c is what used to be
> sysdeps/i386/i486/string-inlines.c at the time I wrote that.
> 
> > Looking at the x86 build, string-inlines.o is indeed
> > string/string-inlines.c, so that confirms sysdeps/i386/string-inlines.c
> > is not built at all. No idea what magic ensures this, but that's what I
> > get...
> 
> Well, sysdeps/i386/string-inlines.c should be built.  My 32-bit x86 builds
> clearly show "gcc -m32 ../sysdeps/i386/string-inlines.c -c ..." in the
> build logs.

So what is the magic to do a cross build for 32-bit? I can't find anything in
the GLIBC documentation that gives the correct recipe to do this, and when I
try to force it to use 32-bit using -m32 -arch=i686 I only get configure and
build failures...

Wilco
  
Joseph Myers Oct. 12, 2015, 11:33 a.m. UTC | #6
On Mon, 12 Oct 2015, Wilco Dijkstra wrote:

> So what is the magic to do a cross build for 32-bit? I can't find anything in
> the GLIBC documentation that gives the correct recipe to do this, and when I
> try to force it to use 32-bit using -m32 -arch=i686 I only get configure and
> build failures...

Assuming you have a compiler with working -m32 support (startup files etc. 
installed), you should set CC and CXX to include -m32, and use 
--host=i686-pc-linux-gnu when configuring.  If you want to treat it as a 
native build for testing purposes, --build=i686-pc-linux-gnu is a good 
idea as well.  This is in addition to usual configure options such as 
--prefix=/usr --enable-add-ons.
  
Wilco Dijkstra Oct. 12, 2015, 12:53 p.m. UTC | #7
> Joseph Myers wrote:
> On Mon, 12 Oct 2015, Wilco Dijkstra wrote:
> 
> > So what is the magic to do a cross build for 32-bit? I can't find anything in
> > the GLIBC documentation that gives the correct recipe to do this, and when I
> > try to force it to use 32-bit using -m32 -arch=i686 I only get configure and
> > build failures...
> 
> Assuming you have a compiler with working -m32 support (startup files etc.
> installed), you should set CC and CXX to include -m32, and use
> --host=i686-pc-linux-gnu when configuring.  If you want to treat it as a
> native build for testing purposes, --build=i686-pc-linux-gnu is a good
> idea as well.  This is in addition to usual configure options such as
> --prefix=/usr --enable-add-ons.

Thanks, that seems to pass the configure correctly. However it fails later
compiling tls code with either GCC 4.8.2 or trunk:

CC="/work/install/gcc_x64/bin/gcc -m32"
CXX="/work/install/gcc_x64/bin/g++ -m32"
CFLAGS="-O2"
CXXFLAGS="-O2"

export CFLAGS
export CXXFLAGS
export CC
export CXX

cd ${build}
${src}/configure --prefix=/usr --enable-add-ons --target=i686-pc-linux-gnu --host=i686-pc-linux-gnu \
  --build=i686-pc-linux-gnu
make -j20

It reports on latest GLIBC:

In file included from ../sysdeps/i386/nptl/tls.h:28:0,
                 from ../sysdeps/i386/i686/nptl/tls.h:33,
                 from ../include/errno.h:27,
                 from libc-tls.c:19:
libc-tls.c: In function '__libc_setup_tls':
../sysdeps/unix/sysv/linux/i386/sysdep.h:409:12: error: '__NR_set_thread_area' undeclared (first use in this function)
     : "i" (__NR_##name) ASMFMT_##nr(args) : "memory", "cc")
            ^
../sysdeps/unix/sysv/linux/i386/sysdep.h:346:5: note: in expansion of macro 'INTERNAL_SYSCALL_MAIN_INLINE'
     INTERNAL_SYSCALL_MAIN_INLINE(name, err, 1, args)
     ^
../sysdeps/unix/sysv/linux/i386/sysdep.h:374:5: note: in expansion of macro 'INTERNAL_SYSCALL_MAIN_1'
     INTERNAL_SYSCALL_MAIN_##nr (name, err, args);         \
     ^
../sysdeps/i386/nptl/tls.h:213:16: note: in expansion of macro 'INTERNAL_SYSCALL'
      _result = INTERNAL_SYSCALL (set_thread_area, err, 1, &_segdescr.desc);   \
                ^
libc-tls.c:188:25: note: in expansion of macro 'TLS_INIT_TP'
   const char *lossage = TLS_INIT_TP ((char *) tlsblock + tcb_offset);


This is a common error (lots of hits on Google) but nobody seems to offer an actual fix...

Wilco
  
Joseph Myers Oct. 12, 2015, 1 p.m. UTC | #8
On Mon, 12 Oct 2015, Wilco Dijkstra wrote:

> ../sysdeps/unix/sysv/linux/i386/sysdep.h:409:12: error: '__NR_set_thread_area' undeclared (first use in this function)

If you have a compiler installation with working -m32 support, it should 
be searching a copy of kernel headers (installed with the Linux kernel's 
headers_install target) containing asm/unistd.h that looks like

#ifndef _ASM_X86_UNISTD_H
#define _ASM_X86_UNISTD_H

/* x32 syscall flag bit */
#define __X32_SYSCALL_BIT       0x40000000

# ifdef __i386__
#  include <asm/unistd_32.h>
# elif defined(__ILP32__)
#  include <asm/unistd_x32.h>
# else
#  include <asm/unistd_64.h>
# endif

#endif /* _ASM_X86_UNISTD_H */

and where unistd_32.h contains the definition of __NR_set_thread_area, and 
where an include of sys/syscall.h from within glibc ends up including that 
<asm/unistd.h>.

So, you need to find out why your compiler installation isn't finding an 
appropriate asm/unistd.h, and fix that problem.
  

Patch

diff --git a/string/bits/string2.h b/string/bits/string2.h
index bd8c404..f8ea1f9 100644
--- a/string/bits/string2.h
+++ b/string/bits/string2.h
@@ -51,20 +51,6 @@ 
 # include <endian.h>
 # include <bits/types.h>
 
-#else
-/* These are a few types we need for the optimizations if we cannot
-   use unaligned memory accesses.  */
-# define __STRING2_COPY_TYPE(N) \
-  typedef struct { unsigned char __arr[N]; }				      \
-    __attribute__ ((__packed__)) __STRING2_COPY_ARR##N
-__STRING2_COPY_TYPE (2);
-__STRING2_COPY_TYPE (3);
-__STRING2_COPY_TYPE (4);
-__STRING2_COPY_TYPE (5);
-__STRING2_COPY_TYPE (6);
-__STRING2_COPY_TYPE (7);
-__STRING2_COPY_TYPE (8);
-# undef __STRING2_COPY_TYPE
 #endif
 
 /* Dereferencing a pointer arg to run sizeof on it fails for the void
@@ -97,135 +83,10 @@  __STRING2_COPY_TYPE (8);
    we have to use the name `__mempcpy'.  */
 #   define mempcpy(dest, src, n) __mempcpy (dest, src, n)
 #  endif
-
-#  if defined _FORCE_INLINES
-#   if _STRING_ARCH_unaligned
-__STRING_INLINE void *__mempcpy_small (void *, char, char, char, char,
-				       __uint16_t, __uint16_t, __uint32_t,
-				       __uint32_t, size_t);
-__STRING_INLINE void *
-__mempcpy_small (void *__dest1,
-		 char __src0_1, char __src2_1, char __src4_1, char __src6_1,
-		 __uint16_t __src0_2, __uint16_t __src4_2,
-		 __uint32_t __src0_4, __uint32_t __src4_4,
-		 size_t __srclen)
-{
-  union {
-    __uint32_t __ui;
-    __uint16_t __usi;
-    unsigned char __uc;
-    unsigned char __c;
-  } *__u = __dest1;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__c = __src0_1;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 2:
-      __u->__usi = __src0_2;
-      __u = __extension__ ((void *) __u + 2);
-      break;
-    case 3:
-      __u->__usi = __src0_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__c = __src2_1;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 4:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      break;
-    case 5:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__c = __src4_1;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 6:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      __u = __extension__ ((void *) __u + 2);
-      break;
-    case 7:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__c = __src6_1;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 8:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__ui = __src4_4;
-      __u = __extension__ ((void *) __u + 4);
-      break;
-    }
-  return (void *) __u;
-}
-#   else
-__STRING_INLINE void *__mempcpy_small (void *, char, __STRING2_COPY_ARR2,
-				       __STRING2_COPY_ARR3,
-				       __STRING2_COPY_ARR4,
-				       __STRING2_COPY_ARR5,
-				       __STRING2_COPY_ARR6,
-				       __STRING2_COPY_ARR7,
-				       __STRING2_COPY_ARR8, size_t);
-__STRING_INLINE void *
-__mempcpy_small (void *__dest, char __src1,
-		 __STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
-		 __STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
-		 __STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
-		 __STRING2_COPY_ARR8 __src8, size_t __srclen)
-{
-  union {
-    char __c;
-    __STRING2_COPY_ARR2 __sca2;
-    __STRING2_COPY_ARR3 __sca3;
-    __STRING2_COPY_ARR4 __sca4;
-    __STRING2_COPY_ARR5 __sca5;
-    __STRING2_COPY_ARR6 __sca6;
-    __STRING2_COPY_ARR7 __sca7;
-    __STRING2_COPY_ARR8 __sca8;
-  } *__u = __dest;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__c = __src1;
-      break;
-    case 2:
-      __extension__ __u->__sca2 = __src2;
-      break;
-    case 3:
-      __extension__ __u->__sca3 = __src3;
-      break;
-    case 4:
-      __extension__ __u->__sca4 = __src4;
-      break;
-    case 5:
-      __extension__ __u->__sca5 = __src5;
-      break;
-    case 6:
-      __extension__ __u->__sca6 = __src6;
-      break;
-    case 7:
-      __extension__ __u->__sca7 = __src7;
-      break;
-    case 8:
-      __extension__ __u->__sca8 = __src8;
-      break;
-    }
-  return __extension__ ((void *) __u + __srclen);
-}
-#   endif
-#  endif
 # endif
 #endif
 
 
-/* Return pointer to C in S.  */
 #ifndef _HAVE_STRING_ARCH_strchr
 extern void *__rawmemchr (const void *__s, int __c);
 # if __GNUC_PREREQ (3, 2)
@@ -243,121 +104,6 @@  extern void *__rawmemchr (const void *__s, int __c);
 #endif
 
 
-/* Copy SRC to DEST.  */
-#if defined _FORCE_INLINES
-# if _STRING_ARCH_unaligned
-__STRING_INLINE char *__strcpy_small (char *, __uint16_t, __uint16_t,
-				      __uint32_t, __uint32_t, size_t);
-__STRING_INLINE char *
-__strcpy_small (char *__dest,
-		__uint16_t __src0_2, __uint16_t __src4_2,
-		__uint32_t __src0_4, __uint32_t __src4_4,
-		size_t __srclen)
-{
-  union {
-    __uint32_t __ui;
-    __uint16_t __usi;
-    unsigned char __uc;
-  } *__u = (void *) __dest;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__uc = '\0';
-      break;
-    case 2:
-      __u->__usi = __src0_2;
-      break;
-    case 3:
-      __u->__usi = __src0_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__uc = '\0';
-      break;
-    case 4:
-      __u->__ui = __src0_4;
-      break;
-    case 5:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__uc = '\0';
-      break;
-    case 6:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      break;
-    case 7:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__uc = '\0';
-      break;
-    case 8:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__ui = __src4_4;
-      break;
-    }
-  return __dest;
-}
-# else
-__STRING_INLINE char *__strcpy_small (char *, __STRING2_COPY_ARR2,
-				      __STRING2_COPY_ARR3,
-				      __STRING2_COPY_ARR4,
-				      __STRING2_COPY_ARR5,
-				      __STRING2_COPY_ARR6,
-				      __STRING2_COPY_ARR7,
-				      __STRING2_COPY_ARR8, size_t);
-__STRING_INLINE char *
-__strcpy_small (char *__dest,
-		__STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
-		__STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
-		__STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
-		__STRING2_COPY_ARR8 __src8, size_t __srclen)
-{
-  union {
-    char __c;
-    __STRING2_COPY_ARR2 __sca2;
-    __STRING2_COPY_ARR3 __sca3;
-    __STRING2_COPY_ARR4 __sca4;
-    __STRING2_COPY_ARR5 __sca5;
-    __STRING2_COPY_ARR6 __sca6;
-    __STRING2_COPY_ARR7 __sca7;
-    __STRING2_COPY_ARR8 __sca8;
-  } *__u = (void *) __dest;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__c = '\0';
-      break;
-    case 2:
-      __extension__ __u->__sca2 = __src2;
-      break;
-    case 3:
-      __extension__ __u->__sca3 = __src3;
-      break;
-    case 4:
-      __extension__ __u->__sca4 = __src4;
-      break;
-    case 5:
-      __extension__ __u->__sca5 = __src5;
-      break;
-    case 6:
-      __extension__ __u->__sca6 = __src6;
-      break;
-    case 7:
-      __extension__ __u->__sca7 = __src7;
-      break;
-    case 8:
-      __extension__ __u->__sca8 = __src8;
-      break;
-  }
-  return __dest;
-}
-# endif
-#endif
-
-
 /* Copy SRC to DEST, returning pointer to final NUL byte.  */
 #ifdef __USE_GNU
 # if !defined _HAVE_STRING_ARCH_stpcpy || defined _FORCE_INLINES
@@ -369,124 +115,6 @@  __strcpy_small (char *__dest,
    we have to use the name `__stpcpy'.  */
 #   define stpcpy(dest, src) __stpcpy (dest, src)
 #  endif
-
-#  if defined _FORCE_INLINES
-#   if _STRING_ARCH_unaligned
-__STRING_INLINE char *__stpcpy_small (char *, __uint16_t, __uint16_t,
-				      __uint32_t, __uint32_t, size_t);
-__STRING_INLINE char *
-__stpcpy_small (char *__dest,
-		__uint16_t __src0_2, __uint16_t __src4_2,
-		__uint32_t __src0_4, __uint32_t __src4_4,
-		size_t __srclen)
-{
-  union {
-    unsigned int __ui;
-    unsigned short int __usi;
-    unsigned char __uc;
-    char __c;
-  } *__u = (void *) __dest;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__uc = '\0';
-      break;
-    case 2:
-      __u->__usi = __src0_2;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 3:
-      __u->__usi = __src0_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__uc = '\0';
-      break;
-    case 4:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 3);
-      break;
-    case 5:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__uc = '\0';
-      break;
-    case 6:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      __u = __extension__ ((void *) __u + 1);
-      break;
-    case 7:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__usi = __src4_2;
-      __u = __extension__ ((void *) __u + 2);
-      __u->__uc = '\0';
-      break;
-    case 8:
-      __u->__ui = __src0_4;
-      __u = __extension__ ((void *) __u + 4);
-      __u->__ui = __src4_4;
-      __u = __extension__ ((void *) __u + 3);
-      break;
-    }
-  return &__u->__c;
-}
-#   else
-__STRING_INLINE char *__stpcpy_small (char *, __STRING2_COPY_ARR2,
-				      __STRING2_COPY_ARR3,
-				      __STRING2_COPY_ARR4,
-				      __STRING2_COPY_ARR5,
-				      __STRING2_COPY_ARR6,
-				      __STRING2_COPY_ARR7,
-				      __STRING2_COPY_ARR8, size_t);
-__STRING_INLINE char *
-__stpcpy_small (char *__dest,
-		__STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
-		__STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
-		__STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
-		__STRING2_COPY_ARR8 __src8, size_t __srclen)
-{
-  union {
-    char __c;
-    __STRING2_COPY_ARR2 __sca2;
-    __STRING2_COPY_ARR3 __sca3;
-    __STRING2_COPY_ARR4 __sca4;
-    __STRING2_COPY_ARR5 __sca5;
-    __STRING2_COPY_ARR6 __sca6;
-    __STRING2_COPY_ARR7 __sca7;
-    __STRING2_COPY_ARR8 __sca8;
-  } *__u = (void *) __dest;
-  switch ((unsigned int) __srclen)
-    {
-    case 1:
-      __u->__c = '\0';
-      break;
-    case 2:
-      __extension__ __u->__sca2 = __src2;
-      break;
-    case 3:
-      __extension__ __u->__sca3 = __src3;
-      break;
-    case 4:
-      __extension__ __u->__sca4 = __src4;
-      break;
-    case 5:
-      __extension__ __u->__sca5 = __src5;
-      break;
-    case 6:
-      __extension__ __u->__sca6 = __src6;
-      break;
-    case 7:
-      __extension__ __u->__sca7 = __src7;
-      break;
-    case 8:
-      __extension__ __u->__sca8 = __src8;
-      break;
-  }
-  return __dest + __srclen - 1;
-}
-#   endif
-#  endif
 # endif
 #endif
 
diff --git a/string/string-inlines.c b/string/string-inlines.c
index 0445be7..432d58d 100644
--- a/string/string-inlines.c
+++ b/string/string-inlines.c
@@ -32,3 +32,349 @@ 
 #undef __NO_INLINE__
 #include <bits/string.h>
 #include <bits/string2.h>
+
+/* These are a few types we need for the optimizations if we cannot
+   use unaligned memory accesses.  */
+# define __STRING2_COPY_TYPE(N) \
+  typedef struct { unsigned char __arr[N]; }				      \
+    __attribute__ ((__packed__)) __STRING2_COPY_ARR##N
+__STRING2_COPY_TYPE (2);
+__STRING2_COPY_TYPE (3);
+__STRING2_COPY_TYPE (4);
+__STRING2_COPY_TYPE (5);
+__STRING2_COPY_TYPE (6);
+__STRING2_COPY_TYPE (7);
+__STRING2_COPY_TYPE (8);
+# undef __STRING2_COPY_TYPE
+
+
+#if _STRING_ARCH_unaligned
+__STRING_INLINE void *__mempcpy_small (void *, char, char, char, char,
+				       __uint16_t, __uint16_t, __uint32_t,
+				       __uint32_t, size_t);
+__STRING_INLINE void *
+__mempcpy_small (void *__dest1,
+		 char __src0_1, char __src2_1, char __src4_1, char __src6_1,
+		 __uint16_t __src0_2, __uint16_t __src4_2,
+		 __uint32_t __src0_4, __uint32_t __src4_4,
+		 size_t __srclen)
+{
+  union {
+    __uint32_t __ui;
+    __uint16_t __usi;
+    unsigned char __uc;
+    unsigned char __c;
+  } *__u = __dest1;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__c = __src0_1;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 2:
+      __u->__usi = __src0_2;
+      __u = __extension__ ((void *) __u + 2);
+      break;
+    case 3:
+      __u->__usi = __src0_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__c = __src2_1;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 4:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      break;
+    case 5:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__c = __src4_1;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 6:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      __u = __extension__ ((void *) __u + 2);
+      break;
+    case 7:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__c = __src6_1;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 8:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__ui = __src4_4;
+      __u = __extension__ ((void *) __u + 4);
+      break;
+    }
+  return (void *) __u;
+}
+
+#else
+
+__STRING_INLINE void *
+__mempcpy_small (void *__dest, char __src1,
+		 __STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
+		 __STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
+		 __STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
+		 __STRING2_COPY_ARR8 __src8, size_t __srclen)
+{
+  union {
+    char __c;
+    __STRING2_COPY_ARR2 __sca2;
+    __STRING2_COPY_ARR3 __sca3;
+    __STRING2_COPY_ARR4 __sca4;
+    __STRING2_COPY_ARR5 __sca5;
+    __STRING2_COPY_ARR6 __sca6;
+    __STRING2_COPY_ARR7 __sca7;
+    __STRING2_COPY_ARR8 __sca8;
+  } *__u = __dest;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__c = __src1;
+      break;
+    case 2:
+      __extension__ __u->__sca2 = __src2;
+      break;
+    case 3:
+      __extension__ __u->__sca3 = __src3;
+      break;
+    case 4:
+      __extension__ __u->__sca4 = __src4;
+      break;
+    case 5:
+      __extension__ __u->__sca5 = __src5;
+      break;
+    case 6:
+      __extension__ __u->__sca6 = __src6;
+      break;
+    case 7:
+      __extension__ __u->__sca7 = __src7;
+      break;
+    case 8:
+      __extension__ __u->__sca8 = __src8;
+      break;
+    }
+  return __extension__ ((void *) __u + __srclen);
+}
+#endif
+
+# if _STRING_ARCH_unaligned
+__STRING_INLINE char *
+__strcpy_small (char *__dest,
+		__uint16_t __src0_2, __uint16_t __src4_2,
+		__uint32_t __src0_4, __uint32_t __src4_4,
+		size_t __srclen)
+{
+  union {
+    __uint32_t __ui;
+    __uint16_t __usi;
+    unsigned char __uc;
+  } *__u = (void *) __dest;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__uc = '\0';
+      break;
+    case 2:
+      __u->__usi = __src0_2;
+      break;
+    case 3:
+      __u->__usi = __src0_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__uc = '\0';
+      break;
+    case 4:
+      __u->__ui = __src0_4;
+      break;
+    case 5:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__uc = '\0';
+      break;
+    case 6:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      break;
+    case 7:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__uc = '\0';
+      break;
+    case 8:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__ui = __src4_4;
+      break;
+    }
+  return __dest;
+}
+
+#else
+
+__STRING_INLINE char *
+__strcpy_small (char *__dest,
+		__STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
+		__STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
+		__STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
+		__STRING2_COPY_ARR8 __src8, size_t __srclen)
+{
+  union {
+    char __c;
+    __STRING2_COPY_ARR2 __sca2;
+    __STRING2_COPY_ARR3 __sca3;
+    __STRING2_COPY_ARR4 __sca4;
+    __STRING2_COPY_ARR5 __sca5;
+    __STRING2_COPY_ARR6 __sca6;
+    __STRING2_COPY_ARR7 __sca7;
+    __STRING2_COPY_ARR8 __sca8;
+  } *__u = (void *) __dest;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__c = '\0';
+      break;
+    case 2:
+      __extension__ __u->__sca2 = __src2;
+      break;
+    case 3:
+      __extension__ __u->__sca3 = __src3;
+      break;
+    case 4:
+      __extension__ __u->__sca4 = __src4;
+      break;
+    case 5:
+      __extension__ __u->__sca5 = __src5;
+      break;
+    case 6:
+      __extension__ __u->__sca6 = __src6;
+      break;
+    case 7:
+      __extension__ __u->__sca7 = __src7;
+      break;
+    case 8:
+      __extension__ __u->__sca8 = __src8;
+      break;
+  }
+  return __dest;
+}
+#endif
+
+
+#if _STRING_ARCH_unaligned
+__STRING_INLINE char *
+__stpcpy_small (char *__dest,
+		__uint16_t __src0_2, __uint16_t __src4_2,
+		__uint32_t __src0_4, __uint32_t __src4_4,
+		size_t __srclen)
+{
+  union {
+    unsigned int __ui;
+    unsigned short int __usi;
+    unsigned char __uc;
+    char __c;
+  } *__u = (void *) __dest;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__uc = '\0';
+      break;
+    case 2:
+      __u->__usi = __src0_2;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 3:
+      __u->__usi = __src0_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__uc = '\0';
+      break;
+    case 4:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 3);
+      break;
+    case 5:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__uc = '\0';
+      break;
+    case 6:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      __u = __extension__ ((void *) __u + 1);
+      break;
+    case 7:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__usi = __src4_2;
+      __u = __extension__ ((void *) __u + 2);
+      __u->__uc = '\0';
+      break;
+    case 8:
+      __u->__ui = __src0_4;
+      __u = __extension__ ((void *) __u + 4);
+      __u->__ui = __src4_4;
+      __u = __extension__ ((void *) __u + 3);
+      break;
+    }
+  return &__u->__c;
+}
+
+#else
+
+__STRING_INLINE char *
+__stpcpy_small (char *__dest,
+		__STRING2_COPY_ARR2 __src2, __STRING2_COPY_ARR3 __src3,
+		__STRING2_COPY_ARR4 __src4, __STRING2_COPY_ARR5 __src5,
+		__STRING2_COPY_ARR6 __src6, __STRING2_COPY_ARR7 __src7,
+		__STRING2_COPY_ARR8 __src8, size_t __srclen)
+{
+  union {
+    char __c;
+    __STRING2_COPY_ARR2 __sca2;
+    __STRING2_COPY_ARR3 __sca3;
+    __STRING2_COPY_ARR4 __sca4;
+    __STRING2_COPY_ARR5 __sca5;
+    __STRING2_COPY_ARR6 __sca6;
+    __STRING2_COPY_ARR7 __sca7;
+    __STRING2_COPY_ARR8 __sca8;
+  } *__u = (void *) __dest;
+  switch ((unsigned int) __srclen)
+    {
+    case 1:
+      __u->__c = '\0';
+      break;
+    case 2:
+      __extension__ __u->__sca2 = __src2;
+      break;
+    case 3:
+      __extension__ __u->__sca3 = __src3;
+      break;
+    case 4:
+      __extension__ __u->__sca4 = __src4;
+      break;
+    case 5:
+      __extension__ __u->__sca5 = __src5;
+      break;
+    case 6:
+      __extension__ __u->__sca6 = __src6;
+      break;
+    case 7:
+      __extension__ __u->__sca7 = __src7;
+      break;
+    case 8:
+      __extension__ __u->__sca8 = __src8;
+      break;
+  }
+  return __dest + __srclen - 1;
+}
+#endif