From patchwork Fri Jun 2 19:48:48 2017
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 20751
In-Reply-To: <20170601181242.GA28627@lucon.org>
References: <20170601181242.GA28627@lucon.org>
From: "H.J. Lu"
Date: Fri, 2 Jun 2017 12:48:48 -0700
Subject: Re: [PATCH] x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2
To: GNU C Library

On Thu, Jun 1, 2017 at 11:12 AM, H.J. Lu wrote:
> Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 to check 32 bytes with
> a single vector compare instruction.  It is as fast as SSE2 versions for
> size <= 16 bytes and up to 1X faster for size > 16 bytes on Haswell.
> Select AVX2 version on AVX2 machines where vzeroupper is preferred and
> AVX unaligned load is fast.
>
> NB: It uses TZCNT instead of BSF since TZCNT produces the same result
> as BSF for non-zero input.  TZCNT is faster than BSF and is executed
> as BSF if machine doesn't support TZCNT.
>
> Any comments?
>
> H.J.
> ---
>         * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
>         strlen-avx2, strnlen-avx2 and wcslen-avx2 and wcsnlen-avx2.
>         * sysdeps/x86_64/multiarch/ifunc-impl-list.c
>         (__libc_ifunc_impl_list): Add tests for __strlen_avx2,
>         __strlen_sse2, __strnlen_avx2, __strnlen_sse2, __wcslen_avx2,
>         __wcslen_sse2, __wcsnlen_avx2 and __wcsnlen_sse2.
>         * sysdeps/x86_64/multiarch/strlen-avx2.S: New file.
>         * sysdeps/x86_64/multiarch/strlen.S: Likewise.
>         * sysdeps/x86_64/multiarch/strnlen-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/strnlen.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcslen-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcslen.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcsnlen-avx2.S: Likewise.
>         * sysdeps/x86_64/multiarch/wcsnlen.S: Likewise.

Updated patch with IFUNC selector in C.

From eca10ba617eea905abfaa7576f6185c1f3f47fa5 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Fri, 19 May 2017 12:19:42 -0700
Subject: [PATCH 5/8] x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2

Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 to check 32 bytes with
a single vector compare instruction.  It is as fast as SSE2 versions for
size <= 16 bytes and up to 1X faster for size > 16 bytes on Haswell.
Select AVX2 version on AVX2 machines where vzeroupper is preferred and
AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strlen-sse2, strnlen-sse2, strlen-avx2, strnlen-avx2,
	wcslen-sse2, wcsnlen-sse2, wcslen-avx2 and wcsnlen-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strlen_avx2,
	__strlen_sse2, __strnlen_avx2, __strnlen_sse2, __wcslen_avx2,
	__wcslen_sse2, __wcsnlen_avx2 and __wcsnlen_sse2.
	* sysdeps/x86_64/multiarch/strlen-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strlen-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strlen.c: Likewise.
	* sysdeps/x86_64/multiarch/strnlen-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/strnlen-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strnlen.c: Likewise.
	* sysdeps/x86_64/multiarch/wcslen-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcslen-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcslen.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen.c: Likewise.
---
 sysdeps/x86_64/multiarch/Makefile          |   4 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c |  28 ++
 sysdeps/x86_64/multiarch/strlen-avx2.S     | 394 +++++++++++++++++++++++++++++
 sysdeps/x86_64/multiarch/strlen-sse2.S     |  32 +++
 sysdeps/x86_64/multiarch/strlen.c          |  30 +++
 sysdeps/x86_64/multiarch/strnlen-avx2.S    |   4 +
 sysdeps/x86_64/multiarch/strnlen-sse2.S    |  36 +++
 sysdeps/x86_64/multiarch/strnlen.c         |  33 +++
 sysdeps/x86_64/multiarch/wcslen-avx2.S     |   4 +
 sysdeps/x86_64/multiarch/wcslen-sse2.S     |  26 ++
 sysdeps/x86_64/multiarch/wcslen.c          |  31 +++
 sysdeps/x86_64/multiarch/wcsnlen-avx2.S    |   5 +
 sysdeps/x86_64/multiarch/wcsnlen-sse2.S    |  26 ++
 sysdeps/x86_64/multiarch/wcsnlen.c         |  31 +++
 14 files changed, 683 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/multiarch/strlen-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen.c
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen.c
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen.c
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-sse2.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen.c

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index eb42b19..915b44f 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -13,6 +13,7 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c strcmp-ssse3 \
 		   memcpy-ssse3-back \
 		   memmove-ssse3-back \
 		   memmove-avx512-no-vzeroupper strcasecmp_l-ssse3 \
+		   strlen-sse2 strnlen-sse2 strlen-avx2 strnlen-avx2 \
 		   strncase_l-ssse3 strcat-ssse3 strncat-ssse3\
 		   strcpy-ssse3 strncpy-ssse3 stpcpy-ssse3 stpncpy-ssse3 \
 		   strcpy-sse2-unaligned strncpy-sse2-unaligned \
@@ -35,7 +36,8 @@ ifeq ($(subdir),wcsmbs)
 sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
 		   wmemchr-sse2 wmemchr-avx2 \
 		   wmemcmp-avx2-movbe \
-		   wcscpy-ssse3 wcscpy-c
+		   wcscpy-ssse3 wcscpy-c \
+		   wcslen-sse2 wcsnlen-sse2 wcslen-avx2 wcsnlen-avx2
 endif
 
 ifeq ($(subdir),debug)
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index f60535b..f139efc 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -166,6 +166,20 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __rawmemchr_avx2)
 	      IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_sse2))
 
+  /* Support sysdeps/x86_64/multiarch/strlen.c.  */
+  IFUNC_IMPL (i, name, strlen,
+	      IFUNC_IMPL_ADD (array, i, strlen,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
+			      __strlen_avx2)
+	      IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_sse2))
+
+  /* Support sysdeps/x86_64/multiarch/strnlen.c.  */
+  IFUNC_IMPL (i, name, strnlen,
+	      IFUNC_IMPL_ADD (array, i, strnlen,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
+			      __strnlen_avx2)
+	      IFUNC_IMPL_ADD (array, i, strnlen, 1, __strnlen_sse2))
+
   /* Support sysdeps/x86_64/multiarch/stpncpy.S.  */
   IFUNC_IMPL (i, name, stpncpy,
 	      IFUNC_IMPL_ADD (array, i, stpncpy, HAS_CPU_FEATURE (SSSE3),
@@ -310,6 +324,20 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __wcscpy_ssse3)
 	      IFUNC_IMPL_ADD (array, i, wcscpy, 1, __wcscpy_sse2))
 
+  /* Support sysdeps/x86_64/multiarch/wcslen.c.  */
+  IFUNC_IMPL (i, name, wcslen,
+	      IFUNC_IMPL_ADD (array, i, wcslen,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
+			      __wcslen_avx2)
+	      IFUNC_IMPL_ADD (array, i, wcslen, 1, __wcslen_sse2))
+
+  /* Support sysdeps/x86_64/multiarch/wcsnlen.c.  */
+  IFUNC_IMPL (i, name, wcsnlen,
+	      IFUNC_IMPL_ADD (array, i, wcsnlen,
+			      HAS_ARCH_FEATURE (AVX2_Usable),
+			      __wcsnlen_avx2)
+	      IFUNC_IMPL_ADD (array, i, wcsnlen, 1, __wcsnlen_sse2))
+
   /* Support sysdeps/x86_64/multiarch/wmemchr.S.  */
   IFUNC_IMPL (i, name, wmemchr,
 	      IFUNC_IMPL_ADD (array, i, wmemchr,
diff --git a/sysdeps/x86_64/multiarch/strlen-avx2.S b/sysdeps/x86_64/multiarch/strlen-avx2.S
new file mode 100644
index 0000000..1dc823a
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strlen-avx2.S
@@ -0,0 +1,394 @@
+/* strlen/strnlen/wcslen/wcsnlen optimized with AVX2.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if IS_IN (libc)
+
+# include
+
+# ifndef STRLEN
+#  define STRLEN	__strlen_avx2
+# endif
+
+# ifdef USE_AS_WCSLEN
+#  define VPCMPEQ	vpcmpeqd
+#  define VPMINU	vpminud
+# else
+#  define VPCMPEQ	vpcmpeqb
+#  define VPMINU	vpminub
+# endif
+
+# ifndef VZEROUPPER
+#  define VZEROUPPER	vzeroupper
+# endif
+
+# define VEC_SIZE 32
+
+	.section .text.avx,"ax",@progbits
+ENTRY (STRLEN)
+# ifdef USE_AS_STRNLEN
+	/* Check for zero length.  */
+	testq	%rsi, %rsi
+	jz	L(zero)
+#  ifdef USE_AS_WCSLEN
+	shl	$2, %rsi
+#  endif
+	movq	%rsi, %r8
+# endif
+	movl	%edi, %ecx
+	movq	%rdi, %rdx
+	vpxor	%xmm0, %xmm0, %xmm0
+
+	/* Check if we may cross page boundary with one vector load.  */
+	andl	$(2 * VEC_SIZE - 1), %ecx
+	cmpl	$VEC_SIZE, %ecx
+	ja	L(cros_page_boundary)
+
+	/* Check the first VEC_SIZE bytes.  */
+	VPCMPEQ (%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+
+# ifdef USE_AS_STRNLEN
+	jnz	L(first_vec_x0_check)
+	/* Adjust length and check the end of data.  */
+	subq	$VEC_SIZE, %rsi
+	jbe	L(max)
+# else
+	jnz	L(first_vec_x0)
+# endif
+
+	/* Align data for aligned loads in the loop.  */
+	addq	$VEC_SIZE, %rdi
+	andl	$(VEC_SIZE - 1), %ecx
+	andq	$-VEC_SIZE, %rdi
+
+# ifdef USE_AS_STRNLEN
+	/* Adjust length.  */
+	addq	%rcx, %rsi
+
+	subq	$(VEC_SIZE * 4), %rsi
+	jbe	L(last_4x_vec_or_less)
+# endif
+	jmp	L(more_4x_vec)
+
+	.p2align 4
+L(cros_page_boundary):
+	andl	$(VEC_SIZE - 1), %ecx
+	andq	$-VEC_SIZE, %rdi
+	VPCMPEQ (%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	/* Remove the leading bytes.  */
+	sarl	%cl, %eax
+	testl	%eax, %eax
+	jz	L(aligned_more)
+	tzcntl	%eax, %eax
+# ifdef USE_AS_STRNLEN
+	/* Check the end of data.  */
+	cmpq	%rax, %rsi
+	jbe	L(max)
+# endif
+	addq	%rdi, %rax
+	addq	%rcx, %rax
+	subq	%rdx, %rax
+# ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+# endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(aligned_more):
+# ifdef USE_AS_STRNLEN
+	/* "rcx" is less than VEC_SIZE.  Calculate "rsi + rcx - VEC_SIZE"
+	   with "rsi - (VEC_SIZE - rcx)" instead of "(rsi + rcx) - VEC_SIZE"
+	   to avoid possible addition overflow.  */
+	negq	%rcx
+	addq	$VEC_SIZE, %rcx
+
+	/* Check the end of data.  */
+	subq	%rcx, %rsi
+	jbe	L(max)
+# endif
+
+	addq	$VEC_SIZE, %rdi
+
+# ifdef USE_AS_STRNLEN
+	subq	$(VEC_SIZE * 4), %rsi
+	jbe	L(last_4x_vec_or_less)
+# endif
+
+L(more_4x_vec):
+	/* Check the first 4 * VEC_SIZE.  Only one VEC_SIZE at a time
+	   since data is only aligned to VEC_SIZE.  */
+	VPCMPEQ (%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x0)
+
+	VPCMPEQ VEC_SIZE(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x1)
+
+	VPCMPEQ (VEC_SIZE * 2)(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x2)
+
+	VPCMPEQ (VEC_SIZE * 3)(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x3)
+
+	addq	$(VEC_SIZE * 4), %rdi
+
+# ifdef USE_AS_STRNLEN
+	subq	$(VEC_SIZE * 4), %rsi
+	jbe	L(last_4x_vec_or_less)
+# endif
+
+	/* Align data to 4 * VEC_SIZE.  */
+	movq	%rdi, %rcx
+	andl	$(4 * VEC_SIZE - 1), %ecx
+	andq	$-(4 * VEC_SIZE), %rdi
+
+# ifdef USE_AS_STRNLEN
+	/* Adjust length.  */
+	addq	%rcx, %rsi
+# endif
+
+	.p2align 4
+L(loop_4x_vec):
+	/* Compare 4 * VEC at a time forward.  */
+	vmovdqa (%rdi), %ymm1
+	vmovdqa	VEC_SIZE(%rdi), %ymm2
+	vmovdqa	(VEC_SIZE * 2)(%rdi), %ymm3
+	vmovdqa	(VEC_SIZE * 3)(%rdi), %ymm4
+	VPMINU	%ymm1, %ymm2, %ymm5
+	VPMINU	%ymm3, %ymm4, %ymm6
+	VPMINU	%ymm5, %ymm6, %ymm5
+
+	VPCMPEQ	%ymm5, %ymm0, %ymm5
+	vpmovmskb %ymm5, %eax
+	testl	%eax, %eax
+	jnz	L(4x_vec_end)
+
+	addq	$(VEC_SIZE * 4), %rdi
+
+# ifndef USE_AS_STRNLEN
+	jmp	L(loop_4x_vec)
+# else
+	subq	$(VEC_SIZE * 4), %rsi
+	ja	L(loop_4x_vec)
+
+L(last_4x_vec_or_less):
+	/* Less than 4 * VEC and aligned to VEC_SIZE.  */
+	addl	$(VEC_SIZE * 2), %esi
+	jle	L(last_2x_vec)
+
+	VPCMPEQ (%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x0)
+
+	VPCMPEQ VEC_SIZE(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x1)
+
+	VPCMPEQ (VEC_SIZE * 2)(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+
+	jnz	L(first_vec_x2_check)
+	subl	$VEC_SIZE, %esi
+	jle	L(max)
+
+	VPCMPEQ (VEC_SIZE * 3)(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+
+	jnz	L(first_vec_x3_check)
+	movq	%r8, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(last_2x_vec):
+	addl	$(VEC_SIZE * 2), %esi
+	VPCMPEQ (%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+
+	jnz	L(first_vec_x0_check)
+	subl	$VEC_SIZE, %esi
+	jle	L(max)
+
+	VPCMPEQ VEC_SIZE(%rdi), %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x1_check)
+	movq	%r8, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x0_check):
+	tzcntl	%eax, %eax
+	/* Check the end of data.  */
+	cmpq	%rax, %rsi
+	jbe	L(max)
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x1_check):
+	tzcntl	%eax, %eax
+	/* Check the end of data.  */
+	cmpq	%rax, %rsi
+	jbe	L(max)
+	addq	$VEC_SIZE, %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x2_check):
+	tzcntl	%eax, %eax
+	/* Check the end of data.  */
+	cmpq	%rax, %rsi
+	jbe	L(max)
+	addq	$(VEC_SIZE * 2), %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x3_check):
+	tzcntl	%eax, %eax
+	/* Check the end of data.  */
+	cmpq	%rax, %rsi
+	jbe	L(max)
+	addq	$(VEC_SIZE * 3), %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(max):
+	movq	%r8, %rax
+#  ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+#  endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(zero):
+	xorl	%eax, %eax
+	ret
+# endif
+
+	.p2align 4
+L(first_vec_x0):
+	tzcntl	%eax, %eax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+# ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+# endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x1):
+	tzcntl	%eax, %eax
+	addq	$VEC_SIZE, %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+# ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+# endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(first_vec_x2):
+	tzcntl	%eax, %eax
+	addq	$(VEC_SIZE * 2), %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+# ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+# endif
+	VZEROUPPER
+	ret
+
+	.p2align 4
+L(4x_vec_end):
+	VPCMPEQ	%ymm1, %ymm0, %ymm1
+	vpmovmskb %ymm1, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x0)
+	VPCMPEQ %ymm2, %ymm0, %ymm2
+	vpmovmskb %ymm2, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x1)
+	VPCMPEQ %ymm3, %ymm0, %ymm3
+	vpmovmskb %ymm3, %eax
+	testl	%eax, %eax
+	jnz	L(first_vec_x2)
+	VPCMPEQ %ymm4, %ymm0, %ymm4
+	vpmovmskb %ymm4, %eax
+	testl	%eax, %eax
+L(first_vec_x3):
+	tzcntl	%eax, %eax
+	addq	$(VEC_SIZE * 3), %rax
+	addq	%rdi, %rax
+	subq	%rdx, %rax
+# ifdef USE_AS_WCSLEN
+	shrq	$2, %rax
+# endif
+	VZEROUPPER
+	ret
+
+END (STRLEN)
+#endif
diff --git a/sysdeps/x86_64/multiarch/strlen-sse2.S b/sysdeps/x86_64/multiarch/strlen-sse2.S
new file mode 100644
index 0000000..d0c2991
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strlen-sse2.S
@@ -0,0 +1,32 @@
+/* strlen optimized with SSE2.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if IS_IN (libc)
+# define strlen __strlen_sse2
+
+# ifdef SHARED
+#  undef libc_hidden_builtin_def
+/* It doesn't make sense to send libc-internal strlen calls through a PLT.
+   The speedup we get from using AVX2 instructions is likely eaten away
+   by the indirect call in the PLT.  */
+#  define libc_hidden_builtin_def(name) \
+	.globl __GI_strlen; __GI_strlen = __strlen_sse2
+# endif
+#endif
+
+#include "../strlen.S"
diff --git a/sysdeps/x86_64/multiarch/strlen.c b/sysdeps/x86_64/multiarch/strlen.c
new file mode 100644
index 0000000..8384035
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strlen.c
@@ -0,0 +1,30 @@
+/* Multiple versions of strlen.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strlen __redirect_strlen
+# include
+# undef strlen
+
+# define SYMBOL_NAME strlen
+# include "ifunc-sse2-avx2.h"
+
+libc_ifunc_redirected (__redirect_strlen, strlen, IFUNC_SELECTOR ());
+#endif
diff --git a/sysdeps/x86_64/multiarch/strnlen-avx2.S b/sysdeps/x86_64/multiarch/strnlen-avx2.S
new file mode 100644
index 0000000..c4062b2
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strnlen-avx2.S
@@ -0,0 +1,4 @@
+#define STRLEN __strnlen_avx2
+#define USE_AS_STRNLEN 1
+
+#include "strlen-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/strnlen-sse2.S b/sysdeps/x86_64/multiarch/strnlen-sse2.S
new file mode 100644
index 0000000..7db8821
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strnlen-sse2.S
@@ -0,0 +1,36 @@
+/* strnlen optimized with SSE2.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if IS_IN (libc)
+# define __strnlen __strnlen_sse2
+
+# ifdef SHARED
+/* It doesn't make sense to send libc-internal strnlen calls through a PLT.
+   The speedup we get from using AVX2 instructions is likely eaten away
+   by the indirect call in the PLT.  */
+#  undef libc_hidden_def
+#  define libc_hidden_def(name) \
+	.globl __GI_strnlen; __GI_strnlen = __strnlen_sse2; \
+	.globl __GI___strnlen; __GI___strnlen = __strnlen_sse2
+# endif
+
+# undef weak_alias
+# define weak_alias(__strnlen, strnlen)
+#endif
+
+#include "../strnlen.S"
diff --git a/sysdeps/x86_64/multiarch/strnlen.c b/sysdeps/x86_64/multiarch/strnlen.c
new file mode 100644
index 0000000..bb09b83
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strnlen.c
@@ -0,0 +1,33 @@
+/* Multiple versions of strnlen.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define strnlen __redirect_strnlen
+# define __strnlen __redirect___strnlen
+# include
+# undef __strnlen
+# undef strnlen
+
+# define SYMBOL_NAME strnlen
+# include "ifunc-sse2-avx2.h"
+
+libc_ifunc_redirected (__redirect_strnlen, __strnlen, IFUNC_SELECTOR ());
+weak_alias (__strnlen, strnlen);
+#endif
diff --git a/sysdeps/x86_64/multiarch/wcslen-avx2.S b/sysdeps/x86_64/multiarch/wcslen-avx2.S
new file mode 100644
index 0000000..c9224f1
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcslen-avx2.S
@@ -0,0 +1,4 @@
+#define STRLEN __wcslen_avx2
+#define USE_AS_WCSLEN 1
+
+#include "strlen-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcslen-sse2.S b/sysdeps/x86_64/multiarch/wcslen-sse2.S
new file mode 100644
index 0000000..e7b24ea
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcslen-sse2.S
@@ -0,0 +1,26 @@
+/* wcslen optimized with SSE2.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if IS_IN (libc)
+# define __wcslen __wcslen_sse2
+
+# undef weak_alias
+# define weak_alias(__wcslen, wcslen)
+#endif
+
+#include "../wcslen.S"
diff --git a/sysdeps/x86_64/multiarch/wcslen.c b/sysdeps/x86_64/multiarch/wcslen.c
new file mode 100644
index 0000000..3f8ad87
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcslen.c
@@ -0,0 +1,31 @@
+/* Multiple versions of wcslen.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define __wcslen __redirect_wcslen
+# include
+# undef __wcslen
+
+# define SYMBOL_NAME wcslen
+# include "ifunc-sse2-avx2.h"
+
+libc_ifunc_redirected (__redirect_wcslen, __wcslen, IFUNC_SELECTOR ());
+weak_alias (__wcslen, wcslen);
+#endif
diff --git a/sysdeps/x86_64/multiarch/wcsnlen-avx2.S b/sysdeps/x86_64/multiarch/wcsnlen-avx2.S
new file mode 100644
index 0000000..fac8354
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen-avx2.S
@@ -0,0 +1,5 @@
+#define STRLEN __wcsnlen_avx2
+#define USE_AS_WCSLEN 1
+#define USE_AS_STRNLEN 1
+
+#include "strlen-avx2.S"
diff --git a/sysdeps/x86_64/multiarch/wcsnlen-sse2.S b/sysdeps/x86_64/multiarch/wcsnlen-sse2.S
new file mode 100644
index 0000000..846466b
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen-sse2.S
@@ -0,0 +1,26 @@
+/* wcsnlen optimized with SSE2.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if IS_IN (libc)
+# define __wcsnlen __wcsnlen_sse2
+
+# undef weak_alias
+# define weak_alias(__wcsnlen, wcsnlen)
+#endif
+
+#include "../wcsnlen.S"
diff --git a/sysdeps/x86_64/multiarch/wcsnlen.c b/sysdeps/x86_64/multiarch/wcsnlen.c
new file mode 100644
index 0000000..35541b1
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/wcsnlen.c
@@ -0,0 +1,31 @@
+/* Multiple versions of wcsnlen.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define __wcsnlen __redirect_wcsnlen
+# include
+# undef __wcsnlen
+
+# define SYMBOL_NAME wcsnlen
+# include "ifunc-sse2-avx2.h"
+
+libc_ifunc_redirected (__redirect_wcsnlen, __wcsnlen, IFUNC_SELECTOR ());
+weak_alias (__wcsnlen, wcsnlen);
+#endif
-- 
2.9.4