x86-64: Update strlen.S to support wcslen/wcsnlen

Message ID CAMe9rOo-BY55VNoL7D64B66ACgyV=u2GKLfzxuoJ_8aWAivRXw@mail.gmail.com
State New, archived
Headers

Commit Message

H.J. Lu June 6, 2017, 12:37 p.m. UTC
  On Mon, Jun 5, 2017 at 10:37 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2017.06.05 at 07:24 -0700, H.J. Lu wrote:
>> On Tue, May 30, 2017 at 3:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Sat, May 20, 2017 at 6:56 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >> The difference between strlen and wcslen is byte vs int.  We can
>> >> replace pminub and pcmpeqb with pminud and pcmpeqd to turn strlen
>> >> into wcslen.  Tested on Ivy Bridge with benchtests/bench-wcslen.c,
>> >> the new strlen based wcslen is as fast as the old wcslen.
>> >>
>> >> OK for master?
>> >>
>> >> H.J.
>> >> ---
>> >>         * sysdeps/x86_64/strlen.S (PMINU): New.
>> >>         (PCMPEQ): Likewise.
>> >>         (SHIFT_RETURN): Likewise.
>> >>         (FIND_ZERO): Replace pcmpeqb with PCMPEQ.
>> >>         (strlen): Add SHIFT_RETURN before ret.  Replace pcmpeqb and
>> >>         pminub with PCMPEQ and PMINU.
>> >>         * sysdeps/x86_64/wcslen.S: Define AS_WCSLEN and strlen.
>> >>         Include "strlen.S".
>> >>         * sysdeps/x86_64/wcsnlen.S: New file.
>> >
>> > Here is the updated patch only with SSE2 wcsnlen.S.  Any
>> > comments?
>> >
>>
>> I will check it in today.
>
> It doesn't work on old machines without SSE4.1:
>
> FAIL: stdio-common/tstdiomisc
> FAIL: wcsmbs/test-wcpncpy
> FAIL: wcsmbs/test-wcsncmp
> FAIL: wcsmbs/test-wcsncpy
> FAIL: wcsmbs/test-wcsnlen
> FAIL: wcsmbs/wcsatcliff
>
> markus@x4 tmp % gdb --args /var/tmp/glibc-build/elf/ld-linux-x86-64.so.2 --library-path /var/tmp/glibc-build:/var/tmp/glibc-build/math:/var/tmp/glibc-build/elf:/var/tmp/gl[31/150]d/dlfcn:/var/tmp/glibc-build/nss:/var/tmp/glibc-build/nis:/var/tmp/glibc-build/rt:/var/tmp/glibc-build/resolv:/var/tmp/glibc-build/crypt:/var/tmp/glibc-build/mathvec:/var/tmp/glib
> c-build/support:/var/tmp/glibc-build/nptl /var/tmp/glibc-build/stdio-common/tstdiomisc
> Reading symbols from /var/tmp/glibc-build/elf/ld-linux-x86-64.so.2...done.
> (gdb) run
> Starting program: /home/markus/tmp/glibc-build/elf/ld-linux-x86-64.so.2 --library-path /var/tmp/glibc-build:/var/tmp/glibc-build/math:/var/tmp/glibc-build/elf:/var/tmp/glibc-build/dlfcn:/var/tmp/glibc-build/nss:/var/tmp/glibc-build/nis:/var/tmp/glibc-build/rt:/var/tmp/glibc-build/resolv:/var/tmp/glibc-build/crypt:/var/tmp/glibc-build/mathvec:/var/tmp/glibc-build/support:/var/tmp/glibc-build/nptl /var/tmp/glibc-build/stdio-common/tstdiomisc
> t1: count=5
> sscanf ("12345", "%ld", &x) => 1, x = 12345
> sscanf ("12345", "%llllld", &x) => 0, x = -1
> sscanf ("12345", "%LLLLLd", &x) => 0, x = -1
> sscanf ("test ", "%*s%n", &x) => 0, x = 4
> sscanf ("test ", "%2*s%n", &x) => 0, x = -1
> sscanf ("12 ", "%l2d", &x) => 0, x = -1
> sscanf ("12 ", "%2ld", &x) => 1, x = 12
> sscanf ("1 1", "%d %Z", &n, &N) => 1, n = 1, N = -1
> expected "nan NAN nan NAN nan NAN nan NAN", got "nan NAN nan NAN nan NAN nan NAN"
> expected "-nan -NAN -nan -NAN -nan -NAN -nan -NAN", got "-nan -NAN -nan -NAN -nan -NAN -nan -NAN"
> expected "nan NAN nan NAN nan NAN nan NAN", got "nan NAN nan NAN nan NAN nan NAN"
> expected "-nan -NAN -nan -NAN -nan -NAN -nan -NAN", got "-nan -NAN -nan -NAN -nan -NAN -nan -NAN"                                                                                  expected "inf INF inf INF inf INF inf INF", got "inf INF inf INF inf INF inf INF"
> expected "-inf -INF -inf -INF -inf -INF -inf -INF", got "-inf -INF -inf -INF -inf -INF -inf -INF"
> Program received signal SIGILL, Illegal instruction.
> wcsnlen () at ../sysdeps/x86_64/strlen.S:180
> 180             PMINU   16(%rax), %xmm0
>

Please try this.  Sorry for the breakage.
  

Comments

Markus Trippelsdorf June 6, 2017, 12:59 p.m. UTC | #1
On 2017.06.06 at 05:37 -0700, H.J. Lu wrote:
> On Mon, Jun 5, 2017 at 10:37 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> >
> > It doesn't work on old machines without SSE4.1:
> >
> > FAIL: stdio-common/tstdiomisc
> > FAIL: wcsmbs/test-wcpncpy
> > FAIL: wcsmbs/test-wcsncmp
> > FAIL: wcsmbs/test-wcsncpy
> > FAIL: wcsmbs/test-wcsnlen
> > FAIL: wcsmbs/wcsatcliff
> 
> Please try this.  Sorry for the breakage.

It works fine now. Thanks for the quick patch.
  
H.J. Lu June 6, 2017, 1:13 p.m. UTC | #2
On Tue, Jun 6, 2017 at 5:59 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2017.06.06 at 05:37 -0700, H.J. Lu wrote:
>> On Mon, Jun 5, 2017 at 10:37 PM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>> >
>> > It doesn't work on old machines without SSE4.1:
>> >
>> > FAIL: stdio-common/tstdiomisc
>> > FAIL: wcsmbs/test-wcpncpy
>> > FAIL: wcsmbs/test-wcsncmp
>> > FAIL: wcsmbs/test-wcsncpy
>> > FAIL: wcsmbs/test-wcsnlen
>> > FAIL: wcsmbs/wcsatcliff
>>
>> Please try this.  Sorry for the breakage.
>
> It works fine now. Thanks for the quick patch.
>

I checked it in.
  

Patch

From b5b8fce19221891afba4907e9dd7e05b9e797f53 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 6 Jun 2017 05:31:48 -0700
Subject: [PATCH] x86-64: Move wcsnlen.S to multiarch/wcsnlen-sse4_1.S

Since wcsnlen.S uses pminud which is the part of SSE4.1, move wcsnlen.S
to multiarch/wcsnlen-sse4_1.S.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	wcsnlen-sse4_1 and wcsnlen-c.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __wcsnlen_sse4_1 and
	__wcsnlen_sse2.
	* sysdeps/x86_64/multiarch/ifunc-sse4_1.h: New file.
	* sysdeps/x86_64/multiarch/wcsnlen-c.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen.c: Likewise.
	* sysdeps/x86_64/wcsnlen.S: Removed.
---
 sysdeps/x86_64/multiarch/Makefile          |  3 ++-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |  7 ++++++
 sysdeps/x86_64/multiarch/ifunc-sse4_1.h    | 34 ++++++++++++++++++++++++++++++
 sysdeps/x86_64/multiarch/wcsnlen-c.c       |  9 ++++++++
 sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S  |  5 +++++
 sysdeps/x86_64/multiarch/wcsnlen.c         | 31 +++++++++++++++++++++++++++
 sysdeps/x86_64/wcsnlen.S                   |  7 ------
 7 files changed, 88 insertions(+), 8 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/ifunc-sse4_1.h
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-c.c
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen.c
 delete mode 100644 sysdeps/x86_64/wcsnlen.S

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index b040288..310a3a4 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -33,7 +33,8 @@  endif
 ifeq ($(subdir),wcsmbs)
 sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
 		   wmemcmp-avx2-movbe \
-		   wcscpy-ssse3 wcscpy-c
+		   wcscpy-ssse3 wcscpy-c \
+		   wcsnlen-sse4_1 wcsnlen-c
 endif
 
 ifeq ($(subdir),debug)
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index b61bc9f..ee4243a 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -296,6 +296,13 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __wcscpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wcscpy, 1, __wcscpy_sse2))
 
+  /* Support sysdeps/x86_64/multiarch/wcsnlen.c.  */
+  IFUNC_IMPL (i, name, wcsnlen,
+	      IFUNC_IMPL_ADD (array, i, wcsnlen,
+			      HAS_CPU_FEATURE (SSE4_1),
+			      __wcsnlen_sse4_1)
+	      IFUNC_IMPL_ADD (array, i, wcsnlen, 1, __wcsnlen_sse2))
+
   /* Support sysdeps/x86_64/multiarch/wmemcmp.S.  */
   IFUNC_IMPL (i, name, wmemcmp,
 	      IFUNC_IMPL_ADD (array, i, wmemcmp,
diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_1.h b/sysdeps/x86_64/multiarch/ifunc-sse4_1.h
new file mode 100644
index 0000000..2b89231
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/ifunc-sse4_1.h
@@ -0,0 +1,34 @@ 
+/* Common definition for ifunc selections optimized with SSE2 and SSE4.1.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <init-arch.h>
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURES_CPU_P (cpu_features, SSE4_1))
+    return OPTIMIZE (sse4_1);
+
+  return OPTIMIZE (sse2);
+}
diff --git a/sysdeps/x86_64/multiarch/wcsnlen-c.c b/sysdeps/x86_64/multiarch/wcsnlen-c.c
new file mode 100644
index 0000000..e1ec7cf
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen-c.c
@@ -0,0 +1,9 @@ 
+#if IS_IN (libc)
+# include <wchar.h>
+
+# define WCSNLEN __wcsnlen_sse2
+
+extern __typeof (wcsnlen) __wcsnlen_sse2;
+#endif
+
+#include "wcsmbs/wcsnlen.c"
diff --git a/sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S b/sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S
new file mode 100644
index 0000000..a8cab0c
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S
@@ -0,0 +1,5 @@ 
+#define AS_WCSLEN
+#define AS_STRNLEN
+#define strlen	__wcsnlen_sse4_1
+
+#include "../strlen.S"
diff --git a/sysdeps/x86_64/multiarch/wcsnlen.c b/sysdeps/x86_64/multiarch/wcsnlen.c
new file mode 100644
index 0000000..5f74d2c
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen.c
@@ -0,0 +1,31 @@ 
+/* Multiple versions of wcsnlen.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc. */
+#if IS_IN (libc)
+# define __wcsnlen __redirect_wcsnlen
+# include <wchar.h>
+# undef __wcsnlen
+
+# define SYMBOL_NAME wcsnlen
+# include "ifunc-sse4_1.h"
+
+libc_ifunc_redirected (__redirect_wcsnlen, __wcsnlen, IFUNC_SELECTOR ());
+weak_alias (__wcsnlen, wcsnlen);
+#endif
diff --git a/sysdeps/x86_64/wcsnlen.S b/sysdeps/x86_64/wcsnlen.S
deleted file mode 100644
index 968bb69..0000000
--- a/sysdeps/x86_64/wcsnlen.S
+++ /dev/null
@@ -1,7 +0,0 @@ 
-#define AS_WCSLEN
-#define AS_STRNLEN
-#define strlen	__wcsnlen
-
-#include "strlen.S"
-
-weak_alias(__wcsnlen, wcsnlen)
-- 
2.9.4