Optimize generic strtok(_r) function.

Message ID 9d4a1ba5-544c-f5d9-261a-4ead8da3f6eb@brown.edu
State New
Series Optimize generic strtok(_r) function.

Commit Message

Joseph Yoo March 12, 2020, 3:30 p.m. UTC
The current generic strtok_r implementation calls strspn to find the
beginning of the token and strcspn to find the end. Each of these calls
builds its own 256-byte look-up table to compare the chars of S against
DELIM, so essentially the same table is built twice per call. This
patched version, for the most part, manually inlines them and adds other
(commented) optimizations. Still, strtok(_r) benefits from the existing
str(c)spn vector implementations, so any sub-arch with a
__str(c)spn_(vx/sse42/etc) should also have a __strtok_r_(vx/sse42/etc)
that calls them explicitly. I've done so for the affected ones: x86_64,
powerpc64/power8, s390(x), and i386/i686. I've tested on my own x86_64
machine, but I'd like help with the others (there seem to be different
conventions in how versioning is done, and I'm not familiar with any of
them).
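
For illustration only, here is a minimal portable sketch of the core
idea, not the patch's actual code (which additionally unrolls the scans
and walks DELIM a word at a time): build one 256-entry table from DELIM,
use it for the strspn-like skip, then mark the NUL entry and reuse the
same table for the strcspn-like scan. The name shared_table_strtok_r is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a plain-C sketch of strtok_r that builds the
   delimiter look-up table once instead of once per strspn/strcspn call.  */
char *
shared_table_strtok_r (char *s, const char *delim, char **save_ptr)
{
  unsigned char table[256];
  const unsigned char *d;
  unsigned char *p;
  char *token;

  /* Routine 0: mark every byte of DELIM.  table['\0'] stays 0.  */
  memset (table, 0, sizeof table);
  for (d = (const unsigned char *) delim; *d != '\0'; ++d)
    table[*d] = 1;

  p = (unsigned char *) (s != NULL ? s : *save_ptr);

  /* Routine 1 (strspn): skip leading delimiters; the loop stops at NUL
     because table['\0'] == 0.  */
  while (table[*p])
    ++p;
  if (*p == '\0')
    {
      *save_ptr = (char *) p;
      return NULL;
    }
  token = (char *) p;

  /* Routine 2 (strcspn): mark NUL as well, so a single test stops at
     either a delimiter or the end of the string.  */
  table['\0'] = 1;
  while (!table[*p])
    ++p;

  /* Terminate the token and make *SAVE_PTR point past it, unless the
     token already ends the string.  */
  if (*p != '\0')
    *p++ = '\0';
  *save_ptr = (char *) p;
  return token;
}
```

Setting table['\0'] only after Routine 1 mirrors what the patch does with
dset['\0'] = 1 before its Routine 2.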

I CC-ed Adhemerval Zanella because I was working on their
generic-strings branch, and I just noticed that this patch goes
against their goal of promoting portability; that is, it seems it
would be preferable to keep str(c)spn in this function. So, another
solution could be to 'externally' inline the two calls
(strspn+strcspn), allowing an optimizing compiler to avoid allocating
the table twice. With -Os, -O2, or -O3 (tested with x86_64 GCC 9.2),
the compiler does so. However, the instructions to zero the table,
search through DELIM, and fill the table are still repeated. So,
manual inlining seems to be 'optimal,' but I understand that it would
mean more work in terms of writing more architecture-specific
implementations.
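
A minimal sketch of that alternative, assuming hypothetical static
inline copies of strspn/strcspn (inline_strspn, inline_strcspn, and
inlined_strtok_r are illustrative names, not glibc symbols). Even once
both helpers are inlined into the caller, each one still zeroes and fills
its own table, which is the repeated work described above.

```c
#include <stddef.h>

/* Hypothetical inlinable strspn: length of the prefix of S made only of
   bytes in ACCEPT.  */
static inline size_t
inline_strspn (const char *s, const char *accept)
{
  unsigned char table[256] = { 0 };   /* zeroed on every call */
  const unsigned char *a = (const unsigned char *) accept;
  const unsigned char *p = (const unsigned char *) s;

  for (; *a != '\0'; ++a)             /* DELIM searched on every call */
    table[*a] = 1;
  while (table[*p])                   /* table['\0'] == 0 stops at NUL */
    ++p;
  return (size_t) (p - (const unsigned char *) s);
}

/* Hypothetical inlinable strcspn: length of the prefix of S containing
   no byte from REJECT.  */
static inline size_t
inline_strcspn (const char *s, const char *reject)
{
  unsigned char table[256] = { 0 };   /* zeroed again */
  const unsigned char *r = (const unsigned char *) reject;
  const unsigned char *p = (const unsigned char *) s;

  table['\0'] = 1;                    /* NUL always ends the scan */
  for (; *r != '\0'; ++r)             /* DELIM searched again */
    table[*r] = 1;
  while (!table[*p])
    ++p;
  return (size_t) (p - (const unsigned char *) s);
}

/* Same control flow as the existing generic strtok_r, but calling the
   inlinable helpers instead of the out-of-line strspn/strcspn.  */
char *
inlined_strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;

  if (s == NULL)
    s = *save_ptr;

  s += inline_strspn (s, delim);
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  end = s + inline_strcspn (s, delim);
  if (*end == '\0')
    {
      *save_ptr = end;
      return s;
    }

  *end = '\0';
  *save_ptr = end + 1;
  return s;
}
```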

I hope this patch/email/Changelog format is somewhat correct!

ChangeLog:

2020-03-12  Joseph Yoo  <joseph_yoo.brown.edu>

        * benchtests/bench-string.h (STRTOK_R): Define to strtok_r.
        * benchtests/bench-strtok.c: Change output to JSON and change IMPL
	calls to strtok_r instead of strtok so that multiple versions of
	strtok_r can be compared.  Add cases where the size of DELIM is
	around 16 (the size of a 128-bit register, and also the limit at
	which str(c)spn hands the call off to __str(c)spn_sse2, which is
	just the generic version).

        * string/strtok_r.c: Optimize function.

        * sysdeps/i386/i686/multiarch/Makefile: Add strtok_r-c.c and
	strtok_r-ia32.c to sysdep_routines.        
	* sysdeps/i386/i686/multiarch/ifunc-impl-list.c 
	(__libc_ifunc_impl_list): Add ia32 (generic) and SSE4.2 versions.
        * sysdeps/i386/i686/multiarch/strtok_r-c.c: SSE4.2 version that just
	includes the x86_64 SSE4.2 impl.
        * sysdeps/i386/i686/multiarch/strtok_r-ia32.c: Default version that 
	includes the generic impl.
        * sysdeps/i386/i686/multiarch/strtok_r.c: Defines the versions.

        * sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strtok_r-power8 and
	strtok_r-ppc64 to sysdep_routines.
        * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c 
	(__libc_ifunc_impl_list):  Add ppc64 (generic) and power8 versions.
        * sysdeps/powerpc/powerpc64/multiarch/strtok_r-power8.c: Calls the
	power8 vector implementations of strspn and strcspn (as the current
	generic version does).
        * sysdeps/powerpc/powerpc64/multiarch/strtok_r-ppc64.c: Default
	version that includes generic impl.
        * sysdeps/powerpc/powerpc64/multiarch/strtok_r.c: Defines the versions.

        * sysdeps/s390/Makefile: Add strtok_r, strtok_r-vx, and strtok_r-c
	to sysdep_routines.
        * sysdeps/s390/ifunc-strtok_r.h: New ifunc header for strtok_r (to
	define macros).
        * sysdeps/s390/multiarch/ifunc-impl-list.c: Add STRTOK_R_C (generic)
	and STRTOK_R_Z13 versions based on macros in header.
        * sysdeps/s390/strtok_r-c.c: Default/generic version.
        * sysdeps/s390/strtok_r-vx.c: Calls str(c)spn_vx implementations as
	the current generic version does.
        * sysdeps/s390/strtok_r.c: Define versions.

        * sysdeps/x86_64/multiarch/Makefile: Add strtok_r-sse2 and strtok_r-c
	to sysdep_routines.
        * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Add sse2 (generic)
	and sse4.2 versions.
        * sysdeps/x86_64/multiarch/strtok_r-c.c: Calls str(c)spn_sse42 versions
	as the current generic version does.
        * sysdeps/x86_64/multiarch/strtok_r-sse2.c: Default/generic version.
        * sysdeps/x86_64/multiarch/strtok_r.c: Define versions.
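
Among the commented optimizations in the new string/strtok_r.c, Routine 0
walks DELIM four bytes at a time using the classic has-zero-byte test,
(word - 0x01010101) & ~word & 0x80808080. The sketch below isolates that
test; the helper names are hypothetical, and the word is assembled so
that byte i of the string is byte i of the word, which is what a plain
32-bit load gives on a little-endian machine (the patch itself loads the
word directly and handles byte order with #if __BYTE_ORDER__).

```c
#include <stdint.h>

/* Hypothetical helper: assemble four bytes so that P[0] is the least
   significant byte, i.e. what a 32-bit load of P yields on a
   little-endian machine.  */
static uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}

/* Classic has-zero-byte test: nonzero iff some byte of WORD is 0x00.
   0x80 is set in the lowest zero byte; bytes above it may also be
   flagged (e.g. a 0x01 byte, because of the borrow).  */
static uint32_t
zero_byte_mask (uint32_t word)
{
  return (word - 0x01010101u) & ~word & 0x80808080u;
}

/* Hypothetical helper: index (0-3) of the first NUL among the four bytes
   at P, or -1 if there is none.  Only the lowest flag is trusted.  */
static int
first_nul_in_word (const unsigned char *p)
{
  uint32_t mask = zero_byte_mask (load_le32 (p));

  for (int i = 0; i < 4; i++)
    if (mask & (UINT32_C (0x80) << (8 * i)))
      return i;
  return -1;
}
```

Flags below the lowest zero byte are always exact, but a 0x01 byte above
a zero byte can be flagged as well, so only the lowest flag reliably
locates the terminator.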
---
 benchtests/bench-string.h                     |   1 +
 benchtests/bench-strtok.c                     | 213 +++++++++--------
 string/strtok_r.c                             | 225 +++++++++++++++---
 sysdeps/i386/i686/multiarch/Makefile          |   4 +-
 sysdeps/i386/i686/multiarch/ifunc-impl-list.c |   6 +
 sysdeps/i386/i686/multiarch/strtok_r-c.c      |   1 +
 sysdeps/i386/i686/multiarch/strtok_r-ia32.c   |  23 ++
 sysdeps/i386/i686/multiarch/strtok_r.c        |  33 +++
 sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |   7 +
 .../powerpc64/multiarch/strtok_r-power8.c     |  58 +++++
 .../powerpc64/multiarch/strtok_r-ppc64.c      |  28 +++
 .../powerpc/powerpc64/multiarch/strtok_r.c    |  35 +++
 sysdeps/s390/Makefile                         |   1 +
 sysdeps/s390/ifunc-strtok_r.h                 |  52 ++++
 sysdeps/s390/multiarch/ifunc-impl-list.c      |  13 +
 sysdeps/s390/strtok_r-c.c                     |  30 +++
 sysdeps/s390/strtok_r-vx.c                    |  69 ++++++
 sysdeps/s390/strtok_r.c                       |  40 ++++
 sysdeps/x86_64/multiarch/Makefile             |   3 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c    |   6 +
 sysdeps/x86_64/multiarch/strtok_r-c.c         |  63 +++++
 sysdeps/x86_64/multiarch/strtok_r-sse2.c      |  23 ++
 sysdeps/x86_64/multiarch/strtok_r.c           |  42 ++++
 24 files changed, 845 insertions(+), 133 deletions(-)
 create mode 100644 sysdeps/i386/i686/multiarch/strtok_r-c.c
 create mode 100644 sysdeps/i386/i686/multiarch/strtok_r-ia32.c
 create mode 100644 sysdeps/i386/i686/multiarch/strtok_r.c
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strtok_r-power8.c
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strtok_r-ppc64.c
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strtok_r.c
 create mode 100644 sysdeps/s390/ifunc-strtok_r.h
 create mode 100644 sysdeps/s390/strtok_r-c.c
 create mode 100644 sysdeps/s390/strtok_r-vx.c
 create mode 100644 sysdeps/s390/strtok_r.c
 create mode 100644 sysdeps/x86_64/multiarch/strtok_r-c.c
 create mode 100644 sysdeps/x86_64/multiarch/strtok_r-sse2.c
 create mode 100644 sysdeps/x86_64/multiarch/strtok_r.c

Comments

Carlos O'Donell March 12, 2020, 4:06 p.m. UTC | #1
On 3/12/20 11:30 AM, Joseph Yoo wrote:
> The current generic strtok_r implementation calls strspn to find the
> beginning of the token and strcspn to find the end. Within each of these
> call stacks is a redundant 256-byte look-up table used to compare chars
> of S with DELIM. This patched version, for the most part, manually
> inlines them + adds other (commented) optimizations. Still, strtok(_r)
> benefits from existing str(c)spn vector implementations. So, any
> sub-arch with a __str(c)spn_(vx/sse42/etc) should also have a
> __strtok_r_(vx/sse42/etc) to call them explicitly. I've done so for those
> affected: x86_64, powerpc64/power8, s390(x), and i386/686.
> I've tested on my own x86_64 machine, but I'd like help with the
> others (there seem to be different conventions in how versioning
> is done, and I'm acquainted with at least none of them). 

Thanks for posting this!

Before we go much further I just wanted to check if you have copyright
assignment with the FSF and to make sure we start that process if you don't.
Assignment will allow us to review patches quickly and with minimal friction.

The contribution checklist is here:
https://sourceware.org/glibc/wiki/Contribution%20checklist

> I CC-ed Adhemerval Zanella because I was working under their
> branch, generic-strings, and also just noticed that this goes
> against their goal of promoting portability. Meaning, it seems it would
> be preferred to keep str(c)spn in this function. So, another solution
> could be to 'externally' inline the two calls (strspn+strcspn),
> allowing an optimizing compiler to prevent re-allocation. With 
> optimization flags set at O[s,2-3] (tested in x86_64 gcc 9.2) the compiler 
> does so. However, the instructions to zero the table, 
> search through delim, and set the table still repeat. So, manual inlining 
> seems to be 'optimal,' but I understand that it would mean more work in
> terms of having to write more architecture-specific implementations.

Correct, and at an architectural review level the direction you are
proposing here goes against what we've been doing over the past couple
of years to centralize the implementations. I'll let Adhemerval comment
more.

Without looking too deeply at your changes, what kinds of performance
improvements do you see? 

> I hope this patch/email/Changelog format is somewhat correct!

You don't need ChangeLogs anymore :-)

Patch

diff --git a/benchtests/bench-string.h b/benchtests/bench-string.h
index 841a66a9d8..20c74bbec3 100644
--- a/benchtests/bench-string.h
+++ b/benchtests/bench-string.h
@@ -89,6 +89,7 @@  extern impl_t __start_impls[], __stop_impls[];
 #  define STRSPN strspn
 #  define STPCPY stpcpy
 #  define STPNCPY stpncpy
+#  define STRTOK_R strtok_r
 # else
 #  include <wchar.h>
 #  define CHAR wchar_t
diff --git a/benchtests/bench-strtok.c b/benchtests/bench-strtok.c
index 7012fb9265..2fcfa5c45d 100644
--- a/benchtests/bench-strtok.c
+++ b/benchtests/bench-strtok.c
@@ -19,69 +19,43 @@ 
 #define TEST_MAIN
 #define TEST_NAME "strtok"
 #include "bench-string.h"
+#include "json-lib.h"
 
-char *
-oldstrtok (char *s, const char *delim)
-{
-  static char *olds;
-  char *token;
-
-  if (s == NULL)
-    s = olds;
-
-  /* Scan leading delimiters.  */
-  s += strspn (s, delim);
-  if (*s == '\0')
-    {
-      olds = s;
-      return NULL;
-    }
+typedef char *(*proto_t) (char *, const char *, char **);
 
-  /* Find the end of the token.  */
-  token = s;
-  s = strpbrk (token, delim);
-  if (s == NULL)
-    /* This token finishes the string.  */
-    olds = rawmemchr (token, '\0');
-  else
-    {
-      /* Terminate the token and make OLDS point past it.  */
-      *s = '\0';
-      olds = s + 1;
-    }
-  return token;
-}
+char *generic_strtok_r (char *, const char *, char **);
 
-typedef char *(*proto_t) (const char *, const char *);
-
-IMPL (oldstrtok, 0)
-IMPL (strtok, 1)
+IMPL (generic_strtok_r, 0)
+IMPL (STRTOK_R, 1)
 
 static void
-do_one_test (impl_t * impl, const char *s1, const char *s2)
+do_one_test (impl_t *impl, char *str, const char *delim, json_ctx_t *json_ctx)
 {
   size_t i, iters = INNER_LOOP_ITERS_SMALL;
   timing_t start, stop, cur;
+
+  char *savep;
   TIMING_NOW (start);
   for (i = 0; i < iters; ++i)
     {
-      CALL (impl, s1, s2);
-      CALL (impl, NULL, s2);
-      CALL (impl, NULL, s2);
+
+      char *_ = CALL (impl, str, delim, &savep);
+
+      while ((_ = CALL (impl, NULL, delim, &savep)))
+        ;
     }
   TIMING_NOW (stop);
 
   TIMING_DIFF (cur, start, stop);
 
-  TIMING_PRINT_MEAN ((double) cur, (double) iters);
-
+  json_element_double (json_ctx, (double)cur / (double)iters);
 }
 
-
 static void
-do_test (size_t align1, size_t align2, size_t len1, size_t len2, int fail)
+do_test (size_t align1, size_t align2, size_t len1, size_t len2, int fail,
+         json_ctx_t *json_ctx)
 {
-  char *s2 = (char *) (buf2 + align2);
+  char *s2 = (char *)(buf2 + align2);
   static const char d[] = "1234567890abcdef";
 #define dl (sizeof (d) - 1)
   char *ss2 = s2;
@@ -92,89 +66,128 @@  do_test (size_t align1, size_t align2, size_t len1, size_t len2, int fail)
     }
   s2[len2] = '\0';
 
-  printf ("Length %4zd/%zd, alignment %2zd/%2zd, %s:",
-	  len1, len2, align1, align2, fail ? "fail" : "found");
+  json_element_object_begin (json_ctx);
+  json_array_begin (json_ctx, "length");
+  json_element_uint (json_ctx, len1);
+  json_element_uint (json_ctx, len2);
+  json_array_end (json_ctx);
+  json_array_begin (json_ctx, "alignment");
+  json_element_uint (json_ctx, align1);
+  json_element_uint (json_ctx, align2);
+  json_array_end (json_ctx);
+  json_array_begin (json_ctx, fail ? "fail" : "found");
 
   FOR_EACH_IMPL (impl, 0)
   {
-    char *s1 = (char *) (buf1 + align1);
+    char *s1 = (char *)(buf1 + align1);
     if (fail)
       {
-	char *ss1 = s1;
-	for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0)
-	  {
-	    size_t t = l > dl ? dl : l;
-	    memcpy (ss1, d, t);
-	    ++ss1[len2 > 7 ? 7 : len2 - 1];
-	    ss1 += t;
-	  }
+        char *ss1 = s1;
+        for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0)
+          {
+            size_t t = l > dl ? dl : l;
+            memcpy (ss1, d, t);
+            ++ss1[len2 > 7 ? 7 : len2 - 1];
+            ss1 += t;
+          }
       }
     else
       {
-	memset (s1, '0', len1);
-	memcpy (s1 + (len1 - len2) - 2, s2, len2);
-	if ((len1 / len2) > 4)
-	  memcpy (s1 + (len1 - len2) - (3 * len2), s2, len2);
+        memset (s1, '0', len1);
+        memcpy (s1 + (len1 - len2) - 2, s2, len2);
+        if ((len1 / len2) > 4)
+          memcpy (s1 + (len1 - len2) - (3 * len2), s2, len2);
       }
     s1[len1] = '\0';
-    do_one_test (impl, s1, s2);
+    do_one_test (impl, s1, s2, json_ctx);
   }
-  putchar ('\n');
+  json_array_end (json_ctx);
+  json_element_object_end (json_ctx);
 }
 
 static int
 test_main (void)
 {
+  json_ctx_t json_ctx;
   test_init ();
 
-  printf ("%23s", "");
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  json_attr_object_begin (&json_ctx, "functions");
+  json_attr_object_begin (&json_ctx, TEST_NAME);
+  json_attr_string (&json_ctx, "bench-variant", "");
+
+  json_array_begin (&json_ctx, "ifuncs");
   FOR_EACH_IMPL (impl, 0)
-    printf ("\t%s", impl->name);
-  putchar ('\n');
+  json_element_string (&json_ctx, impl->name);
+  json_array_end (&json_ctx);
+
+  json_array_begin (&json_ctx, "results");
 
   for (size_t klen = 2; klen < 32; ++klen)
     for (size_t hlen = 2 * klen; hlen < 16 * klen; hlen += klen)
       {
-	do_test (0, 0, hlen, klen, 0);
-	do_test (0, 0, hlen, klen, 1);
-	do_test (0, 3, hlen, klen, 0);
-	do_test (0, 3, hlen, klen, 1);
-	do_test (0, 9, hlen, klen, 0);
-	do_test (0, 9, hlen, klen, 1);
-	do_test (0, 15, hlen, klen, 0);
-	do_test (0, 15, hlen, klen, 1);
-
-	do_test (3, 0, hlen, klen, 0);
-	do_test (3, 0, hlen, klen, 1);
-	do_test (3, 3, hlen, klen, 0);
-	do_test (3, 3, hlen, klen, 1);
-	do_test (3, 9, hlen, klen, 0);
-	do_test (3, 9, hlen, klen, 1);
-	do_test (3, 15, hlen, klen, 0);
-	do_test (3, 15, hlen, klen, 1);
-
-	do_test (9, 0, hlen, klen, 0);
-	do_test (9, 0, hlen, klen, 1);
-	do_test (9, 3, hlen, klen, 0);
-	do_test (9, 3, hlen, klen, 1);
-	do_test (9, 9, hlen, klen, 0);
-	do_test (9, 9, hlen, klen, 1);
-	do_test (9, 15, hlen, klen, 0);
-	do_test (9, 15, hlen, klen, 1);
-
-	do_test (15, 0, hlen, klen, 0);
-	do_test (15, 0, hlen, klen, 1);
-	do_test (15, 3, hlen, klen, 0);
-	do_test (15, 3, hlen, klen, 1);
-	do_test (15, 9, hlen, klen, 0);
-	do_test (15, 9, hlen, klen, 1);
-	do_test (15, 15, hlen, klen, 0);
-	do_test (15, 15, hlen, klen, 1);
+        do_test (0, 0, hlen, klen, 0, &json_ctx);
+        do_test (0, 0, hlen, klen, 1, &json_ctx);
+        do_test (0, 3, hlen, klen, 0, &json_ctx);
+        do_test (0, 3, hlen, klen, 1, &json_ctx);
+        do_test (0, 9, hlen, klen, 0, &json_ctx);
+        do_test (0, 9, hlen, klen, 1, &json_ctx);
+        do_test (0, 15, hlen, klen, 0, &json_ctx);
+        do_test (0, 15, hlen, klen, 1, &json_ctx);
+
+        do_test (3, 0, hlen, klen, 0, &json_ctx);
+        do_test (3, 0, hlen, klen, 1, &json_ctx);
+        do_test (3, 3, hlen, klen, 0, &json_ctx);
+        do_test (3, 3, hlen, klen, 1, &json_ctx);
+        do_test (3, 9, hlen, klen, 0, &json_ctx);
+        do_test (3, 9, hlen, klen, 1, &json_ctx);
+        do_test (3, 15, hlen, klen, 0, &json_ctx);
+        do_test (3, 15, hlen, klen, 1, &json_ctx);
+
+        do_test (9, 0, hlen, klen, 0, &json_ctx);
+        do_test (9, 0, hlen, klen, 1, &json_ctx);
+        do_test (9, 3, hlen, klen, 0, &json_ctx);
+        do_test (9, 3, hlen, klen, 1, &json_ctx);
+        do_test (9, 9, hlen, klen, 0, &json_ctx);
+        do_test (9, 9, hlen, klen, 1, &json_ctx);
+        do_test (9, 15, hlen, klen, 0, &json_ctx);
+        do_test (9, 15, hlen, klen, 1, &json_ctx);
+
+        do_test (15, 0, hlen, klen, 0, &json_ctx);
+        do_test (15, 0, hlen, klen, 1, &json_ctx);
+        do_test (15, 3, hlen, klen, 0, &json_ctx);
+        do_test (15, 3, hlen, klen, 1, &json_ctx);
+        do_test (15, 9, hlen, klen, 0, &json_ctx);
+        do_test (15, 9, hlen, klen, 1, &json_ctx);
+        do_test (15, 15, hlen, klen, 0, &json_ctx);
+        do_test (15, 15, hlen, klen, 1, &json_ctx);
       }
-  do_test (0, 0, page_size - 1, 16, 0);
-  do_test (0, 0, page_size - 1, 16, 1);
+
+  do_test (0, 0, page_size - 1, 3, 0, &json_ctx);
+  do_test (0, 0, page_size - 1, 3, 1, &json_ctx);
+  do_test (0, 0, page_size - 1, 7, 0, &json_ctx);
+  do_test (0, 0, page_size - 1, 7, 1, &json_ctx);
+  do_test (0, 0, page_size - 1, 15, 0, &json_ctx);
+  do_test (0, 0, page_size - 1, 15, 1, &json_ctx);
+  do_test (9, 0, page_size / 2, 17, 0, &json_ctx);
+  do_test (9, 0, page_size / 2, 17, 1, &json_ctx);
+  do_test (0, 0, page_size - 1, 100, 0, &json_ctx);
+  do_test (0, 0, page_size - 1, 100, 1, &json_ctx);
+
+  json_array_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_attr_object_end (&json_ctx);
+  json_document_end (&json_ctx);
 
   return ret;
 }
 
 #include <support/test-driver.c>
+
+#undef STRTOK_R
+#define STRTOK_R generic_strtok_r
+#include <string/strtok_r.c>
\ No newline at end of file
diff --git a/string/strtok_r.c b/string/strtok_r.c
index b8359c8653..784ecba8f7 100644
--- a/string/strtok_r.c
+++ b/string/strtok_r.c
@@ -16,16 +16,19 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#ifdef HAVE_CONFIG_H
-# include <config.h>
-#endif
-
+#include <libc-pointer-arith.h>
 #include <string.h>
 
-#ifndef _LIBC
-/* Get specification.  */
-# include "strtok_r.h"
-# define __strtok_r strtok_r
+#undef __strtok_r
+#undef strtok_r
+
+#ifndef STRTOK_R
+# ifdef weak_alias
+#  define STRTOK_R __strtok_r
+   weak_alias (__strtok_r, strtok_r)
+# else
+#  define STRTOK_R strtok_r
+# endif
 #endif
 
 /* Parse S into tokens separated by characters in DELIM.
@@ -37,43 +40,207 @@ 
 	x = strtok_r(NULL, "-=", &sp);	// x = "def", sp = NULL
 	x = strtok_r(NULL, "=", &sp);	// x = NULL
 		// s = "abc\0-def\0"
+
+  This generic implementation can be thought of as 3 (sub-)routines:
+    [0] Process delims - 
+    Set up a look-up table with the delimiting characters for
+    the input string to compare against
+    [1] Find start - Iterate through input string until non-delimiting 
+    character is reached-- basically strspn.
+    [2] Find end - Iterate through input string until delimiting 
+    character is reached-- basically strcspn.
 */
 char *
-__strtok_r (char *s, const char *delim, char **save_ptr)
+STRTOK_R (char *start, const char *delim, char **save_ptr)
 {
-  char *end;
+  /* General pointer used to cast START, DELIM, and *SAVE_PTR 
+  as unsigned char pointers */
+  unsigned char *u;
 
-  if (s == NULL)
-    s = *save_ptr;
+  /** BEGIN ROUTINE 0 **/
+  /* Zero-initialize a character-indexed look-up table. The offsets
+  corresponding to char values in DELIM store 1; otherwise, remain 0.
+  See str(c)spn implementations for original reference. */
+  unsigned char dset[256];
+  memset (dset, 0, 64);
+  memset (dset + 64, 0, 64);
+  memset (dset + 128, 0, 64);
+  memset (dset + 192, 0, 64);
 
-  if (*s == '\0')
+  /* To fill the table, search for the NUL byte in DELIM by checking 4 bytes,
+  and then aligning down (if at all) to the closest lower
+  4-byte boundary. Proceed to check 4 bytes at a time by loading into a
+  4-byte integer 'word' */
+  u = (unsigned char *)delim;
+
+  if (__glibc_unlikely (u[0] == '\0')) ;
+  else if (u[1] == '\0') dset[u[0]] = 1;
+  else if (u[2] == '\0') dset[u[0]] = 1, dset[u[1]] = 1;
+  else if (u[3] == '\0') dset[u[0]] = 1, dset[u[1]] = 1, dset[u[2]] = 1;
+  else
     {
-      *save_ptr = s;
-      return NULL;
+      dset[u[0]] = 1, dset[u[1]] = 1, dset[u[2]] = 1, dset[u[3]] = 1;
+      
+      /* Align down to 4-byte boundary (+ the 4 bytes already checked) */
+      u = PTR_ALIGN_DOWN (u, 4) + 4;
+
+#if __INT_LEAST32_WIDTH__ == 32 && __CHAR_BIT__ == 8
+
+      uint_fast32_t zmask, word = *(uint_least32_t *)u;
+      /* The classic bit-twiddling check for 0-byte in a word, in which
+      the resulting 'mask', ZMASK, sets 0x80 where WORD contains the zero
+      byte and 0x00 for nonzero bytes. Thus, break the loop if ZMASK
+      isn't all zeros. */
+      while ((zmask = ~word & (word - 0x01010101UL) & 0x80808080UL) == 0)
+        {
+          dset[u[0]] = 1;
+          dset[u[1]] = 1;
+          dset[u[2]] = 1;
+          dset[u[3]] = 1;
+          word = *(uint_least32_t *)(u += 4);
+        }
+
+/* macro to handle the remaining bytes using zmask */
+# define handle_zmask(shft0, fst2_nonzero, shft2) \
+  { \
+    /* Move MSB to LSB and XOR to get (in bits):
+      0..1 from 0..0
+      0..0 from 1..0,
+    effectively doing (!!) w/o flag dependency */ \
+    dset[u[0]] = (unsigned char) (zmask >> shft0) ^ 1; \
+    /* fst2_nonzero is true if the first 2 bytes of u are nonzero
+    (i.e. zmask[0/1] = 0/0) */ \
+    if (fst2_nonzero) \
+      dset[u[1]] = 1, \
+      dset[u[2]] = (unsigned char) (zmask >> shft2) ^ 1; \
+  } 
+
+/* handle the remaining bytes */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  handle_zmask (7, (uint_least16_t)zmask == 0x0000, 23);
+# elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  handle_zmask (31, zmask <= 0x8080, 15);
+# elif __BYTE_ORDER__ == __ORDER_PDP_ENDIAN__
+  handle_zmask (23, zmask <= 0x8080, 7);
+# endif /* END __BYTE_ORDER__ == X */
+
+#else  /* Portably do it in bytes */
+    int c0 = (_Bool)u[0], 
+        c1 = (_Bool)u[1], 
+        c2 = (_Bool)u[2],
+        c3 = (_Bool)u[3];
+
+    while (c0 & c1 & c2 & c3)
+      {
+        dset[u[0]] = 1;
+        dset[u[1]] = 1;
+        dset[u[2]] = 1;
+        dset[u[3]] = 1;
+
+        u += 4;
+
+        c0 = (_Bool)u[0];
+        c1 = (_Bool)u[1]; 
+        c2 = (_Bool)u[2];
+        c3 = (_Bool)u[3];
+      }
+    
+    dset[u[0]] = c0;
+    if (c0 & c1) 
+      dset[u[1]] = 1, 
+      dset[u[2]] = c2;
+#endif /* END __INT_LEAST32_WIDTH__ == 32 && __CHAR_BIT__ == 8 */
     }
+  /** END ROUTINE 0 **/
 
-  /* Scan leading delimiters.  */
-  s += strspn (s, delim);
-  if (*s == '\0')
+  /* From this point on, U refers to the input string, conditionally
+  START or *SAVE_PTR. */
+  u = (unsigned char *)*save_ptr;
+  if (__glibc_unlikely (start != NULL))
+    u = (unsigned char *)start;
+
+  /** BEGIN ROUTINE 1 **/
+  /* Find first character in U that is not in DSET */
+  if (!dset[u[0]])
+    ;
+  else if (!dset[u[1]])
+    u += 1;
+  else if (!dset[u[2]])
+    u += 2;
+  else if (!dset[u[3]])
+    u += 3;
+  else
+    {
+      /* If there were a 'fast type' for an implicit int (i.e. one without
+      a specified minimum width), it should be that of the int with the
+      minimum possible width, 16. */
+      int_fast16_t s0, s2, det;
+
+      u = PTR_ALIGN_DOWN (u, 4);
+      do
+        {
+          u += 4;
+
+          s0 = dset[u[0]];
+          det = dset[u[1]] & s0;
+          s2 = dset[u[2]];
+        }
+      while (det & s2 & dset[u[3]]);
+
+      u += !det ? s0 : s2 + 2;
+    }
+  /** END ROUTINE 1 **/
+
+  /* End of string is reached */
+  if (__glibc_unlikely (*u == '\0'))
     {
-      *save_ptr = s;
+      *save_ptr = (char *)u;
       return NULL;
     }
 
-  /* Find the end of the token.  */
-  end = s + strcspn (s, delim);
-  if (*end == '\0')
+  /* End of string is not yet reached, so set START of return token */
+  start = (char *)u;
+  /* For NUL to continue causing a break in ROUTINE 2, set DSET[NUL] to 1 */
+  dset['\0'] = 1;
+
+  /** BEGIN ROUTINE 2 **/
+  /* Find first character in start that is in DSET */
+  if (dset[u[0]])
+    ;
+  else if (dset[u[1]])
+    u += 1;
+  else if (dset[u[2]])
+    u += 2;
+  else if (dset[u[3]])
+    u += 3;
+  else
+    {
+      int_fast16_t s0, s2, det;
+
+      u = PTR_ALIGN_DOWN (u, 4);
+      do
+        {
+          u += 4;
+
+          s0  = dset[u[0]];
+          det = dset[u[1]] | s0;
+          s2  = dset[u[2]];
+        }
+      while ((det | s2 | dset[u[3]]) == 0);
+
+      u += det ? 1 - s0 : 3 - s2;
+    }
+  /** END ROUTINE 2 **/
+
+  *save_ptr = (char *)u;
+  if (__glibc_likely (*u != 0))
     {
-      *save_ptr = end;
-      return s;
+      *u = 0;
+      (*save_ptr)++;
     }
 
-  /* Terminate the token and make *SAVE_PTR point past it.  */
-  *end = '\0';
-  *save_ptr = end + 1;
-  return s;
+  return start;
 }
 #ifdef weak_alias
 libc_hidden_def (__strtok_r)
-weak_alias (__strtok_r, strtok_r)
 #endif
diff --git a/sysdeps/i386/i686/multiarch/Makefile b/sysdeps/i386/i686/multiarch/Makefile
index bf75a9947f..83540b2d8b 100644
--- a/sysdeps/i386/i686/multiarch/Makefile
+++ b/sysdeps/i386/i686/multiarch/Makefile
@@ -24,13 +24,13 @@  sysdep_routines += bzero-sse2 memset-sse2 memcpy-ssse3 mempcpy-ssse3 \
 		   strcasecmp_l-sse4 strncase_l-sse4 \
 		   bcopy-sse2-unaligned memcpy-sse2-unaligned \
 		   mempcpy-sse2-unaligned memmove-sse2-unaligned \
-		   strcspn-c strpbrk-c strspn-c \
+		   strcspn-c strpbrk-c strspn-c strtok_r-c \
 		   bcopy-ia32 bzero-ia32 rawmemchr-ia32 \
 		   memchr-ia32 memcmp-ia32 memcpy-ia32 memmove-ia32 \
 		   mempcpy-ia32 memset-ia32 strcat-ia32 strchr-ia32 \
 		   strrchr-ia32 strcpy-ia32 strcmp-ia32 strcspn-ia32 \
 		   strpbrk-ia32 strspn-ia32 strlen-ia32 stpcpy-ia32 \
-		   stpncpy-ia32
+		   stpncpy-ia32 strtok_r-ia32
 CFLAGS-varshift.c += -msse4
 CFLAGS-strcspn-c.c += -msse4
 CFLAGS-strpbrk-c.c += -msse4
diff --git a/sysdeps/i386/i686/multiarch/ifunc-impl-list.c b/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
index 23774fbe8a..022725e40c 100644
--- a/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
+++ b/sysdeps/i386/i686/multiarch/ifunc-impl-list.c
@@ -272,6 +272,12 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __strspn_sse42)
 	      IFUNC_IMPL_ADD (array, i, strspn, 1, __strspn_ia32))
 
+  /* Support sysdeps/i386/i686/multiarch/strtok_r.c.  */
+  IFUNC_IMPL (i, name, strtok_r,
+	      IFUNC_IMPL_ADD (array, i, strtok_r, HAS_CPU_FEATURE (SSE4_2),
+			      __strtok_r_sse42)
+	      IFUNC_IMPL_ADD (array, i, strtok_r, 1, __strtok_r_ia32))
+
   /* Support sysdeps/i386/i686/multiarch/wcschr.S.  */
   IFUNC_IMPL (i, name, wcschr,
 	      IFUNC_IMPL_ADD (array, i, wcschr, HAS_CPU_FEATURE (SSE2),
diff --git a/sysdeps/i386/i686/multiarch/strtok_r-c.c b/sysdeps/i386/i686/multiarch/strtok_r-c.c
new file mode 100644
index 0000000000..0e9b303c0d
--- /dev/null
+++ b/sysdeps/i386/i686/multiarch/strtok_r-c.c
@@ -0,0 +1 @@ 
+#include <sysdeps/x86_64/multiarch/strtok_r-c.c>
\ No newline at end of file
diff --git a/sysdeps/i386/i686/multiarch/strtok_r-ia32.c b/sysdeps/i386/i686/multiarch/strtok_r-ia32.c
new file mode 100644
index 0000000000..2fa48abbb2
--- /dev/null
+++ b/sysdeps/i386/i686/multiarch/strtok_r-ia32.c
@@ -0,0 +1,23 @@ 
+/* strtok_r optimized for i686 (generic).
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define STRTOK_R __strtok_r_ia32
+
+#undef weak_alias
+#define weak_alias(ignored1, ignored2)
+#undef libc_hidden_def
+#define libc_hidden_def(strtok_r)
+
+#include <string/strtok_r.c>
\ No newline at end of file
diff --git a/sysdeps/i386/i686/multiarch/strtok_r.c b/sysdeps/i386/i686/multiarch/strtok_r.c
new file mode 100644
index 0000000000..2662ad78db
--- /dev/null
+++ b/sysdeps/i386/i686/multiarch/strtok_r.c
@@ -0,0 +1,33 @@ 
+/* Multiple versions of strtok_r.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strtok_r __redirect_strtok_r
+# define __strtok_r __redirect___strtok_r
+# include <string.h>
+# undef strtok_r
+# undef __strtok_r
+
+# define SYMBOL_NAME strtok_r
+# include "ifunc-sse4_2.h"
+
+libc_ifunc_redirected (__redirect_strtok_r, __strtok_r, IFUNC_SELECTOR ());
+weak_alias (__strtok_r, strtok_r)
+#endif
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index ea936bf9ed..f22fd825bb 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -29,7 +29,7 @@  sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
 		   strspn-power8 strspn-ppc64 strcspn-power8 strcspn-ppc64 \
 		   strlen-power8 strcasestr-power8 strcasestr-ppc64 \
 		   strcasecmp-ppc64 strcasecmp-power8 strncase-ppc64 \
-		   strncase-power8
+		   strncase-power8 strtok_r-power8 strtok_r-ppc64
 
 ifneq (,$(filter %le,$(config-machine)))
 sysdep_routines += strcmp-power9 strncmp-power9
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index b9fef3f43c..cd131ec70b 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -355,6 +355,13 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
              IFUNC_IMPL_ADD (array, i, strstr, 1,
                              __strstr_ppc))
 
+  /* Support sysdeps/powerpc/powerpc64/multiarch/strtok_r.c.  */
+  IFUNC_IMPL (i, name, strtok_r,
+             IFUNC_IMPL_ADD (array, i, strtok_r,
+                             hwcap2 & PPC_FEATURE2_ARCH_2_07,
+                             __strtok_r_power8)
+             IFUNC_IMPL_ADD (array, i, strtok_r, 1,
+                             __strtok_r_ppc))
 
   /* Support sysdeps/powerpc/powerpc64/multiarch/strcasestr.c.  */
   IFUNC_IMPL (i, name, strcasestr,
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strtok_r-power8.c b/sysdeps/powerpc/powerpc64/multiarch/strtok_r-power8.c
new file mode 100644
index 0000000000..d2ce7b1429
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strtok_r-power8.c
@@ -0,0 +1,58 @@ 
+/* Optimized strtok_r implementation for POWER8.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+extern size_t __strspn_power8 (const char *, const char *) attribute_hidden;
+extern size_t __strcspn_power8 (const char *, const char *) attribute_hidden;
+
+/* __asm__ (".machine power8"); */
+char *
+__strtok_r_power8 (char *s, const char *delim, char **save_p)
+{
+  char *end;
+
+  if (s == NULL) s = *save_p;
+
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Scan leading delimiters.  */
+  s += __strspn_power8 (s, delim);
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Find the end of the token.  */
+  end = s + __strcspn_power8 (s, delim);
+  if (*end == '\0')
+    {
+      *save_p = end;
+      return s;
+    }
+
+  /* Terminate the token and make *save_p point past it.  */
+  *end = '\0';
+  *save_p = end + 1;
+  return s;
+}
\ No newline at end of file
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strtok_r-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strtok_r-ppc64.c
new file mode 100644
index 0000000000..27b73fdb37
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strtok_r-ppc64.c
@@ -0,0 +1,28 @@ 
+/* PowerPC64 default implementation of strtok_r.
+   Copyright (C) 2013-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#define STRTOK_R  __strtok_r_ppc
+
+#undef weak_alias
+#define weak_alias(a, b)
+
+extern __typeof (strtok_r) __strtok_r_ppc attribute_hidden;
+
+#include <string/strtok_r.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strtok_r.c b/sysdeps/powerpc/powerpc64/multiarch/strtok_r.c
new file mode 100644
index 0000000000..fc54b994d7
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strtok_r.c
@@ -0,0 +1,35 @@ 
+/* Multiple versions of strtok_r. PowerPC64 version.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+# include <string.h>
+# include <shlib-compat.h>
+# include "init-arch.h"
+
+extern __typeof (__strtok_r) __strtok_r_ppc attribute_hidden;
+extern __typeof (__strtok_r) __strtok_r_power8 attribute_hidden;
+
+libc_ifunc (__strtok_r,
+	    (hwcap2 & PPC_FEATURE2_ARCH_2_07)
+	    ? __strtok_r_power8
+	    : __strtok_r_ppc);
+
+weak_alias (__strtok_r, strtok_r)
+#else
+# include <string/strtok_r.c>
+#endif
\ No newline at end of file
diff --git a/sysdeps/s390/Makefile b/sysdeps/s390/Makefile
index a8c49c928f..eb95105889 100644
--- a/sysdeps/s390/Makefile
+++ b/sysdeps/s390/Makefile
@@ -59,6 +59,7 @@  sysdep_routines += bzero memset memset-z900 \
 		   mempcpy memcpy memcpy-z900 \
 		   memmove memmove-c \
 		   strstr strstr-arch13 strstr-vx strstr-c \
+		   strtok_r strtok_r-vx strtok_r-c \
 		   memmem memmem-arch13 memmem-vx memmem-c \
 		   strlen strlen-vx strlen-c \
 		   strnlen strnlen-vx strnlen-c \
diff --git a/sysdeps/s390/ifunc-strtok_r.h b/sysdeps/s390/ifunc-strtok_r.h
new file mode 100644
index 0000000000..836e2f36b2
--- /dev/null
+++ b/sysdeps/s390/ifunc-strtok_r.h
@@ -0,0 +1,52 @@ 
+/* strtok_r variant information on S/390 version.
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if defined USE_MULTIARCH && IS_IN (libc)		\
+  && ! defined HAVE_S390_MIN_Z13_ZARCH_ASM_SUPPORT
+# define HAVE_STRTOK_R_IFUNC	1
+#else
+# define HAVE_STRTOK_R_IFUNC	0
+#endif
+
+#ifdef HAVE_S390_VX_ASM_SUPPORT
+# define HAVE_STRTOK_R_IFUNC_AND_VX_SUPPORT HAVE_STRTOK_R_IFUNC
+#else
+# define HAVE_STRTOK_R_IFUNC_AND_VX_SUPPORT 0
+#endif
+
+#if defined HAVE_S390_MIN_Z13_ZARCH_ASM_SUPPORT
+# define STRTOK_R_DEFAULT		STRTOK_R_Z13
+# define HAVE_STRTOK_R_C		0
+# define HAVE_STRTOK_R_Z13	1
+#else
+# define STRTOK_R_DEFAULT		STRTOK_R_C
+# define HAVE_STRTOK_R_C		1
+# define HAVE_STRTOK_R_Z13	HAVE_STRTOK_R_IFUNC_AND_VX_SUPPORT
+#endif
+
+#if HAVE_STRTOK_R_C
+# define STRTOK_R_C		__strtok_r_c
+#else
+# define STRTOK_R_C		NULL
+#endif
+
+#if HAVE_STRTOK_R_Z13
+# define STRTOK_R_Z13		__strtok_r_vx
+#else
+# define STRTOK_R_Z13		NULL
+#endif
diff --git a/sysdeps/s390/multiarch/ifunc-impl-list.c b/sysdeps/s390/multiarch/ifunc-impl-list.c
index e6195c6e26..4c86eac432 100644
--- a/sysdeps/s390/multiarch/ifunc-impl-list.c
+++ b/sysdeps/s390/multiarch/ifunc-impl-list.c
@@ -25,6 +25,7 @@ 
 #include <ifunc-memcmp.h>
 #include <ifunc-memcpy.h>
 #include <ifunc-strstr.h>
+#include <ifunc-strtok_r.h>
 #include <ifunc-memmem.h>
 #include <ifunc-strlen.h>
 #include <ifunc-strnlen.h>
@@ -200,6 +201,18 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 		)
 #endif /* HAVE_STRSTR_IFUNC  */
 
+#if HAVE_STRTOK_R_IFUNC
+    IFUNC_IMPL (i, name, strtok_r,
+# if HAVE_STRTOK_R_Z13
+		IFUNC_IMPL_ADD (array, i, strtok_r,
+				dl_hwcap & HWCAP_S390_VX, STRTOK_R_Z13)
+# endif
+# if HAVE_STRTOK_R_C
+		IFUNC_IMPL_ADD (array, i, strtok_r, 1, STRTOK_R_C)
+# endif
+		)
+#endif /* HAVE_STRTOK_R_IFUNC  */
+
 #if HAVE_MEMMEM_IFUNC
     IFUNC_IMPL (i, name, memmem,
 # if HAVE_MEMMEM_ARCH13
diff --git a/sysdeps/s390/strtok_r-c.c b/sysdeps/s390/strtok_r-c.c
new file mode 100644
index 0000000000..aa5547e29e
--- /dev/null
+++ b/sysdeps/s390/strtok_r-c.c
@@ -0,0 +1,30 @@ 
+/* Default strtok_r implementation for S/390.
+   Copyright (C) 2015-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ifunc-strtok_r.h>
+
+#if HAVE_STRTOK_R_C
+# if HAVE_STRTOK_R_IFUNC
+#  define STRTOK_R STRTOK_R_C
+#  define __strtok_r STRTOK_R
+#  undef weak_alias
+#  define weak_alias(name, alias)
+# endif
+
+# include <string/strtok_r.c>
+#endif
diff --git a/sysdeps/s390/strtok_r-vx.c b/sysdeps/s390/strtok_r-vx.c
new file mode 100644
index 0000000000..49b09e1fc3
--- /dev/null
+++ b/sysdeps/s390/strtok_r-vx.c
@@ -0,0 +1,69 @@ 
+/* Default strtok_r implementation with vector string functions for S/390.
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ifunc-strtok_r.h>
+
+#if HAVE_STRTOK_R_Z13
+
+# include <string.h>
+
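+/* __strspn_vx and __strcspn_vx are built under the same conditions as
+   this file, so declare them even when USE_MULTIARCH is not defined.  */
+extern __typeof (strspn) __strspn_vx attribute_hidden;
+extern __typeof (strcspn) __strcspn_vx attribute_hidden;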
+
+/* __asm__ (".machine \"z13\"\n\t.machinemode \"zarch_nohighprs\""); */
+
+char *
+STRTOK_R_Z13 (char *s, const char *delim, char **save_p)
+{
+  char *end;
+
+  if (s == NULL) s = *save_p;
+
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Scan leading delimiters.  */
+  s += __strspn_vx (s, delim);
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Find the end of the token.  */
+  end = s + __strcspn_vx (s, delim);
+  if (*end == '\0')
+    {
+      *save_p = end;
+      return s;
+    }
+
+  /* Terminate the token and make *save_p point past it.  */
+  *end = '\0';
+  *save_p = end + 1;
+  return s;
+}
+
+strong_alias (STRTOK_R_Z13, __strtok_r)
+weak_alias (__strtok_r, strtok_r)
+#endif
diff --git a/sysdeps/s390/strtok_r.c b/sysdeps/s390/strtok_r.c
new file mode 100644
index 0000000000..026c136490
--- /dev/null
+++ b/sysdeps/s390/strtok_r.c
@@ -0,0 +1,40 @@ 
+/* Multiple versions of strtok_r.
+   Copyright (C) 2015-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ifunc-strtok_r.h>
+
+#if HAVE_STRTOK_R_IFUNC
+# include <string.h>
+# include <ifunc-resolve.h>
+
+# if HAVE_STRTOK_R_C
+extern __typeof (__strtok_r) STRTOK_R_C attribute_hidden;
+# endif
+
+# if HAVE_STRTOK_R_Z13
+extern __typeof (__strtok_r) STRTOK_R_Z13 attribute_hidden;
+# endif
+
+s390_libc_ifunc_expr (__strtok_r, __strtok_r,
+		      (HAVE_STRTOK_R_Z13 && (hwcap & HWCAP_S390_VX))
+		      ? STRTOK_R_Z13
+		      : STRTOK_R_DEFAULT
+		      )
+weak_alias (__strtok_r, strtok_r)
+#endif
+
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 395e432c09..969fe405d7 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -43,7 +43,8 @@  sysdep_routines += strncat-c stpncpy-c strncpy-c \
 		   memmove-avx512-unaligned-erms \
 		   memset-sse2-unaligned-erms \
 		   memset-avx2-unaligned-erms \
-		   memset-avx512-unaligned-erms
+		   memset-avx512-unaligned-erms \
+		   strtok_r-sse2 strtok_r-c
 CFLAGS-varshift.c += -msse4
 CFLAGS-strcspn-c.c += -msse4
 CFLAGS-strpbrk-c.c += -msse4
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index ce7eb1eecf..53da1ab784 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -365,6 +365,12 @@  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strstr, 1, __strstr_sse2_unaligned)
 	      IFUNC_IMPL_ADD (array, i, strstr, 1, __strstr_sse2))
 
+  /* Support sysdeps/x86_64/multiarch/strtok_r.c.  */
+  IFUNC_IMPL (i, name, strtok_r,
+	      IFUNC_IMPL_ADD (array, i, strtok_r, HAS_CPU_FEATURE (SSE4_2),
+			      __strtok_r_sse42)
+	      IFUNC_IMPL_ADD (array, i, strtok_r, 1, __strtok_r_sse2))
+
   /* Support sysdeps/x86_64/multiarch/wcschr.c.  */
   IFUNC_IMPL (i, name, wcschr,
 	      IFUNC_IMPL_ADD (array, i, wcschr,
diff --git a/sysdeps/x86_64/multiarch/strtok_r-c.c b/sysdeps/x86_64/multiarch/strtok_r-c.c
new file mode 100644
index 0000000000..a7a95fc73c
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strtok_r-c.c
@@ -0,0 +1,63 @@ 
+/* strtok_r using the SSE4.2 implementations of strspn and strcspn.
+   Copyright (C) 2009-2020 Free Software Foundation, Inc.
+   Contributed by Intel Corporation.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+extern size_t __strspn_sse42 (const char *, const char *) attribute_hidden;
+extern size_t __strcspn_sse42 (const char *, const char *) attribute_hidden;
+
+/* Call the SSE4.2 implementations of strspn and strcspn directly;
+   otherwise this is the same algorithm as the former generic
+   strtok_r.  */
+char *
+__attribute__ ((section (".text.sse4.2")))
+__strtok_r_sse42 (char *s, const char *delim, char **save_p)
+{
+  char *end;
+
+  if (s == NULL)
+    s = *save_p;
+
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Scan leading delimiters.  */
+  s += __strspn_sse42 (s, delim);
+  if (*s == '\0')
+    {
+      *save_p = s;
+      return NULL;
+    }
+
+  /* Find the end of the token.  */
+  end = s + __strcspn_sse42 (s, delim);
+  if (*end == '\0')
+    {
+      *save_p = end;
+      return s;
+    }
+
+  /* Terminate the token and make *save_p point past it.  */
+  *end = '\0';
+  *save_p = end + 1;
+  return s;
+}
diff --git a/sysdeps/x86_64/multiarch/strtok_r-sse2.c b/sysdeps/x86_64/multiarch/strtok_r-sse2.c
new file mode 100644
index 0000000000..19059d5a9a
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strtok_r-sse2.c
@@ -0,0 +1,23 @@ 
+/* Generic C strtok_r, built as the x86_64 SSE2 baseline variant.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define STRTOK_R __strtok_r_sse2
+
+#undef weak_alias
+#define weak_alias(ignored1, ignored2)
+#undef libc_hidden_def
+#define libc_hidden_def(name)
+
+#include <string/strtok_r.c>
\ No newline at end of file
diff --git a/sysdeps/x86_64/multiarch/strtok_r.c b/sysdeps/x86_64/multiarch/strtok_r.c
new file mode 100644
index 0000000000..a3b35b9a02
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strtok_r.c
@@ -0,0 +1,42 @@ 
+/* Multiple versions of strtok_r.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strtok_r __redirect_strtok_r
+# define __strtok_r __redirect___strtok_r
+
+# include <string.h>
+
+# undef __strtok_r
+# undef strtok_r
+
+# define SYMBOL_NAME strtok_r
+
+# include "ifunc-sse4_2.h"
+
+libc_ifunc_redirected (__redirect_strtok_r, __strtok_r, IFUNC_SELECTOR ());
+
+weak_alias (__strtok_r, strtok_r)
+
+# ifdef SHARED
+__hidden_ver1 (__strtok_r, __GI___strtok_r, __redirect___strtok_r)
+  __attribute__ ((visibility ("hidden")));
+# endif
+#endif
\ No newline at end of file
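The per-architecture variants in this patch (__strtok_r_sse42, __strtok_r_power8 and the s390 STRTOK_R_Z13) all follow one algorithm: skip leading delimiters with strspn, find the end of the token with strcspn, then NUL-terminate the token and record where the next call resumes. The following is a minimal portable sketch of that algorithm, assuming only the ISO C strspn and strcspn in place of the architecture-specific __strspn_* and __strcspn_* routines. The names sketch_strtok_r and split_tokens are hypothetical and used only for illustration; they are not glibc symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference tokenizer, not a glibc symbol.  Same steps as
   the per-architecture variants, but using the standard strspn and
   strcspn.  The variants also return early when *S is already '\0';
   strspn of an empty string is 0, so this sketch reaches the same
   result without that separate check.  */
static char *
sketch_strtok_r (char *s, const char *delim, char **save_p)
{
  char *end;

  assert (save_p != NULL);

  /* A NULL S means "continue where the previous call stopped".  */
  if (s == NULL)
    s = *save_p;

  /* Skip leading delimiters; if nothing is left, there is no token.  */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      *save_p = s;
      return NULL;
    }

  /* The token runs up to the next delimiter or the terminating NUL.  */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      *save_p = end;
      return s;
    }

  /* Overwrite the delimiter with NUL and resume just after it.  */
  *end = '\0';
  *save_p = end + 1;
  return s;
}

/* Hypothetical helper: split BUF (modified in place) on DELIM, store
   up to MAX token pointers in OUT, and return how many were stored.  */
static size_t
split_tokens (char *buf, const char *delim, char **out, size_t max)
{
  char *save = NULL;
  char *tok;
  size_t n = 0;

  for (tok = sketch_strtok_r (buf, delim, &save);
       tok != NULL && n < max;
       tok = sketch_strtok_r (NULL, delim, &save))
    out[n++] = tok;
  return n;
}
```

For example, splitting ",,alpha,,beta,gamma," on "," yields the three tokens "alpha", "beta" and "gamma": runs of delimiters never produce empty tokens, and a trailing delimiter leaves the saved pointer at the terminating NUL, so the next call returns NULL. Because all state lives in the caller's save pointer rather than in a static variable, strtok_r is reentrant where strtok is not.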