From patchwork Wed Mar 23 21:57:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 52278 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6E853388883F for ; Wed, 23 Mar 2022 22:05:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6E853388883F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1648073157; bh=bhRa5PTJKYlVl+BlABN4B0N6Iz0+y1h1ywexVNJekE4=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=u9kAd7oCBtT6NhOMdeHSGU7ppeart+BzH+fWo3OMklwKzP78pjmMyiGRgH/Po493n S81WO+ulE5ICbHG19pzrdBEls/LMd0uWV5mQSNCm+mBINefG3jW9HR4eOPrrUzNiIK FxZx+ZD6Z7vgRX6FmY1epQMmyFSRU9uQ3MtKJTlo= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd32.google.com (mail-io1-xd32.google.com [IPv6:2607:f8b0:4864:20::d32]) by sourceware.org (Postfix) with ESMTPS id B14EB3876053 for ; Wed, 23 Mar 2022 22:01:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B14EB3876053 Received: by mail-io1-xd32.google.com with SMTP id c23so3407483ioi.4 for ; Wed, 23 Mar 2022 15:01:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bhRa5PTJKYlVl+BlABN4B0N6Iz0+y1h1ywexVNJekE4=; b=2RdvOkRU5CaGhbp29jgrCbUza8CAd5Jh7WdV/JVXa6n8S7w++coAY+r5B47n/nF6Ya YGiqQKdZjbOJybclPAU+RAPN6WrpwqgaR6D+zdsBTWvaYby+mke4yfJXIHa2hzB3MFgU mF4lFqKpuI9eL+iQI/d/doPK6+ul7KA4e5BT6wpIJwRaialV92j9r3UhKNdUcd94bwZe e3UoHqOR5SLmh+N/n7QwLwT5qLrPFvnislD3l+F68BDUo3tRRm9lmLsSDQdfiJuXQ6c5 Su55Qv/5MEiRS5gSQed3LFgljKr4Xxyexb0X24pi5EFAUDICYP8wpbMNdyRKuhJ3CZJk NnUg== X-Gm-Message-State: AOAM531+sO1lpt3UmA0AWK262Ak8OVFrtPsTN7oTQyM4mESKjSh8TFMh M1lroedE+Bab/d+jwq3GtH8Qi1O1uuU= X-Google-Smtp-Source: ABdhPJyBPQpPWteVdkkKQhDu/6EiL+kDDDH6ZIXEszhxgvd+QmINoxpOGHL8ys0bdX58HRBw4lWjjQ== X-Received: by 2002:a05:6602:73c:b0:649:5e22:f3ee with SMTP id g28-20020a056602073c00b006495e22f3eemr1082642iox.156.1648072863495; Wed, 23 Mar 2022 15:01:03 -0700 (PDT) Received: from localhost.localdomain (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.googlemail.com with ESMTPSA id c22-20020a5ea816000000b00649d360663asm529227ioa.40.2022.03.23.15.01.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Mar 2022 15:01:03 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v1 11/23] x86: Remove strspn-sse2.S and use the generic implementation Date: Wed, 23 Mar 2022 16:57:29 -0500 Message-Id: <20220323215734.3927131-11-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220323215734.3927131-1-goldstein.w.n@gmail.com> References: <20220323215734.3927131-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The generic implementation is faster. geometric_mean(N=20) of all benchmarks New / Original: .710 All string/memory tests pass. Reviewed-by: H.J. Lu --- Geomtric Mean N=20 runs; All functions page aligned len, align1, align2, pos, New Time / Old Time 1, 0, 0, 512, 0.824 1, 1, 0, 512, 1.018 1, 0, 1, 512, 0.986 1, 1, 1, 512, 1.092 2, 0, 0, 512, 0.86 2, 2, 0, 512, 0.868 2, 0, 2, 512, 0.858 2, 2, 2, 512, 0.857 3, 0, 0, 512, 0.836 3, 3, 0, 512, 0.849 3, 0, 3, 512, 0.84 3, 3, 3, 512, 0.85 4, 0, 0, 512, 0.843 4, 4, 0, 512, 0.837 4, 0, 4, 512, 0.835 4, 4, 4, 512, 0.846 5, 0, 0, 512, 0.852 5, 5, 0, 512, 0.848 5, 0, 5, 512, 0.85 5, 5, 5, 512, 0.85 6, 0, 0, 512, 0.853 6, 6, 0, 512, 0.855 6, 0, 6, 512, 0.853 6, 6, 6, 512, 0.853 7, 0, 0, 512, 0.857 7, 7, 0, 512, 0.861 7, 0, 7, 512, 0.94 7, 7, 7, 512, 0.856 8, 0, 0, 512, 0.927 8, 0, 8, 512, 0.965 9, 0, 0, 512, 0.967 9, 1, 0, 512, 0.976 9, 0, 9, 512, 0.887 9, 1, 9, 512, 0.881 10, 0, 0, 512, 0.853 10, 2, 0, 512, 0.846 10, 0, 10, 512, 0.855 10, 2, 10, 512, 0.849 11, 0, 0, 512, 0.854 11, 3, 0, 512, 0.855 11, 0, 11, 512, 0.85 11, 3, 11, 512, 0.854 12, 0, 0, 512, 0.864 12, 4, 0, 512, 0.864 12, 0, 12, 512, 0.867 12, 4, 12, 512, 0.87 13, 0, 0, 512, 0.853 13, 5, 0, 512, 0.841 13, 0, 13, 512, 0.837 13, 5, 13, 512, 0.85 14, 0, 0, 512, 0.838 14, 6, 0, 512, 0.842 14, 0, 14, 512, 0.818 14, 6, 14, 512, 0.845 15, 0, 0, 512, 0.799 15, 7, 0, 512, 0.847 15, 0, 15, 512, 0.787 15, 7, 15, 512, 0.84 16, 0, 0, 512, 0.824 16, 0, 16, 512, 0.827 17, 0, 0, 512, 0.817 17, 1, 0, 512, 0.823 17, 0, 17, 512, 0.82 17, 1, 17, 512, 0.814 18, 0, 0, 512, 0.81 18, 2, 0, 512, 0.833 18, 0, 18, 512, 0.811 18, 2, 18, 512, 0.842 19, 0, 0, 512, 0.823 19, 3, 0, 512, 0.818 19, 0, 19, 512, 0.821 19, 3, 19, 512, 0.824 20, 0, 0, 512, 0.814 20, 4, 0, 512, 0.818 20, 0, 20, 512, 0.806 20, 4, 20, 512, 0.802 21, 0, 0, 512, 0.835 21, 5, 0, 512, 0.839 21, 0, 21, 512, 0.842 21, 5, 21, 512, 0.82 22, 0, 0, 512, 0.824 22, 6, 0, 512, 0.831 22, 0, 22, 512, 0.819 22, 6, 22, 512, 0.824 23, 0, 0, 512, 0.816 23, 7, 0, 512, 0.856 23, 0, 23, 512, 0.808 23, 7, 23, 512, 0.848 24, 0, 0, 512, 0.88 24, 0, 24, 512, 0.846 25, 0, 0, 512, 0.929 25, 1, 0, 512, 0.917 25, 0, 25, 512, 0.884 25, 1, 25, 512, 0.859 26, 0, 0, 512, 0.919 26, 2, 0, 512, 0.867 26, 0, 26, 512, 0.914 26, 2, 26, 512, 0.845 27, 0, 0, 512, 0.919 27, 3, 0, 512, 0.864 27, 0, 27, 512, 0.917 27, 3, 27, 512, 0.847 28, 0, 0, 512, 0.905 28, 4, 0, 512, 0.896 28, 0, 28, 512, 0.898 28, 4, 28, 512, 0.871 29, 0, 0, 512, 0.911 29, 5, 0, 512, 0.91 29, 0, 29, 512, 0.905 29, 5, 29, 512, 0.884 30, 0, 0, 512, 0.907 30, 6, 0, 512, 0.802 30, 0, 30, 512, 0.906 30, 6, 30, 512, 0.818 31, 0, 0, 512, 0.907 31, 7, 0, 512, 0.821 31, 0, 31, 512, 0.89 31, 7, 31, 512, 0.787 4, 0, 0, 32, 0.623 4, 1, 0, 32, 0.606 4, 0, 1, 32, 0.6 4, 1, 1, 32, 0.603 4, 0, 0, 64, 0.731 4, 2, 0, 64, 0.733 4, 0, 2, 64, 0.734 4, 2, 2, 64, 0.755 4, 0, 0, 128, 0.822 4, 3, 0, 128, 0.873 4, 0, 3, 128, 0.89 4, 3, 3, 128, 0.907 4, 0, 0, 256, 0.827 4, 4, 0, 256, 0.811 4, 0, 4, 256, 0.794 4, 4, 4, 256, 0.814 4, 5, 0, 512, 0.841 4, 0, 5, 512, 0.831 4, 5, 5, 512, 0.845 4, 0, 0, 1024, 0.861 4, 6, 0, 1024, 0.857 4, 0, 6, 1024, 0.9 4, 6, 6, 1024, 0.861 4, 0, 0, 2048, 0.879 4, 7, 0, 2048, 0.875 4, 0, 7, 2048, 0.883 4, 7, 7, 2048, 0.88 10, 1, 0, 64, 0.747 10, 1, 1, 64, 0.743 10, 2, 0, 64, 0.732 10, 2, 2, 64, 0.729 10, 3, 0, 64, 0.747 10, 3, 3, 64, 0.733 10, 4, 0, 64, 0.74 10, 4, 4, 64, 0.751 10, 5, 0, 64, 0.735 10, 5, 5, 64, 0.746 10, 6, 0, 64, 0.735 10, 6, 6, 64, 0.733 10, 7, 0, 64, 0.734 10, 7, 7, 64, 0.74 6, 0, 0, 0, 0.377 6, 0, 0, 1, 0.369 6, 0, 1, 1, 0.383 6, 0, 0, 2, 0.391 6, 0, 2, 2, 0.394 6, 0, 0, 3, 0.416 6, 0, 3, 3, 0.411 6, 0, 0, 4, 0.475 6, 0, 4, 4, 0.483 6, 0, 0, 5, 0.473 6, 0, 5, 5, 0.476 6, 0, 0, 6, 0.459 6, 0, 6, 6, 0.445 6, 0, 0, 7, 0.433 6, 0, 7, 7, 0.432 6, 0, 0, 8, 0.492 6, 0, 8, 8, 0.494 6, 0, 0, 9, 0.476 6, 0, 9, 9, 0.483 6, 0, 0, 10, 0.46 6, 0, 10, 10, 0.476 6, 0, 0, 11, 0.463 6, 0, 11, 11, 0.463 6, 0, 0, 12, 0.511 6, 0, 12, 12, 0.515 6, 0, 0, 13, 0.506 6, 0, 13, 13, 0.536 6, 0, 0, 14, 0.496 6, 0, 14, 14, 0.484 6, 0, 0, 15, 0.473 6, 0, 15, 15, 0.475 6, 0, 0, 16, 0.534 6, 0, 16, 16, 0.534 6, 0, 0, 17, 0.525 6, 0, 17, 17, 0.523 6, 0, 0, 18, 0.522 6, 0, 18, 18, 0.524 6, 0, 0, 19, 0.512 6, 0, 19, 19, 0.514 6, 0, 0, 20, 0.535 6, 0, 20, 20, 0.54 6, 0, 0, 21, 0.543 6, 0, 21, 21, 0.536 6, 0, 0, 22, 0.542 6, 0, 22, 22, 0.542 6, 0, 0, 23, 0.529 6, 0, 23, 23, 0.53 6, 0, 0, 24, 0.596 6, 0, 24, 24, 0.589 6, 0, 0, 25, 0.583 6, 0, 25, 25, 0.58 6, 0, 0, 26, 0.574 6, 0, 26, 26, 0.58 6, 0, 0, 27, 0.575 6, 0, 27, 27, 0.558 6, 0, 0, 28, 0.606 6, 0, 28, 28, 0.606 6, 0, 0, 29, 0.589 6, 0, 29, 29, 0.595 6, 0, 0, 30, 0.592 6, 0, 30, 30, 0.585 6, 0, 0, 31, 0.585 6, 0, 31, 31, 0.579 6, 0, 0, 32, 0.625 6, 0, 32, 32, 0.615 6, 0, 0, 33, 0.615 6, 0, 33, 33, 0.61 6, 0, 0, 34, 0.604 6, 0, 34, 34, 0.6 6, 0, 0, 35, 0.602 6, 0, 35, 35, 0.608 6, 0, 0, 36, 0.644 6, 0, 36, 36, 0.644 6, 0, 0, 37, 0.658 6, 0, 37, 37, 0.651 6, 0, 0, 38, 0.644 6, 0, 38, 38, 0.649 6, 0, 0, 39, 0.626 6, 0, 39, 39, 0.632 6, 0, 0, 40, 0.662 6, 0, 40, 40, 0.661 6, 0, 0, 41, 0.656 6, 0, 41, 41, 0.655 6, 0, 0, 42, 0.643 6, 0, 42, 42, 0.637 6, 0, 0, 43, 0.622 6, 0, 43, 43, 0.628 6, 0, 0, 44, 0.673 6, 0, 44, 44, 0.687 6, 0, 0, 45, 0.661 6, 0, 45, 45, 0.659 6, 0, 0, 46, 0.657 6, 0, 46, 46, 0.653 6, 0, 0, 47, 0.658 6, 0, 47, 47, 0.65 6, 0, 0, 48, 0.678 6, 0, 48, 48, 0.683 6, 0, 0, 49, 0.676 6, 0, 49, 49, 0.661 6, 0, 0, 50, 0.672 6, 0, 50, 50, 0.662 6, 0, 0, 51, 0.656 6, 0, 51, 51, 0.659 6, 0, 0, 52, 0.682 6, 0, 52, 52, 0.686 6, 0, 0, 53, 0.67 6, 0, 53, 53, 0.674 6, 0, 0, 54, 0.663 6, 0, 54, 54, 0.675 6, 0, 0, 55, 0.662 6, 0, 55, 55, 0.665 6, 0, 0, 56, 0.681 6, 0, 56, 56, 0.697 6, 0, 0, 57, 0.686 6, 0, 57, 57, 0.687 6, 0, 0, 58, 0.701 6, 0, 58, 58, 0.693 6, 0, 0, 59, 0.709 6, 0, 59, 59, 0.698 6, 0, 0, 60, 0.708 6, 0, 60, 60, 0.708 6, 0, 0, 61, 0.709 6, 0, 61, 61, 0.716 6, 0, 0, 62, 0.709 6, 0, 62, 62, 0.707 6, 0, 0, 63, 0.703 6, 0, 63, 63, 0.716 .../{strspn-sse2.S => strspn-sse2.c} | 8 +- sysdeps/x86_64/strspn.S | 112 ------------------ 2 files changed, 4 insertions(+), 116 deletions(-) rename sysdeps/x86_64/multiarch/{strspn-sse2.S => strspn-sse2.c} (86%) delete mode 100644 sysdeps/x86_64/strspn.S diff --git a/sysdeps/x86_64/multiarch/strspn-sse2.S b/sysdeps/x86_64/multiarch/strspn-sse2.c similarity index 86% rename from sysdeps/x86_64/multiarch/strspn-sse2.S rename to sysdeps/x86_64/multiarch/strspn-sse2.c index e0a095f25a..61cc6cb0a5 100644 --- a/sysdeps/x86_64/multiarch/strspn-sse2.S +++ b/sysdeps/x86_64/multiarch/strspn-sse2.c @@ -1,4 +1,4 @@ -/* strspn optimized with SSE2. +/* strspn. Copyright (C) 2017-2022 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -19,10 +19,10 @@ #if IS_IN (libc) # include -# define strspn __strspn_sse2 +# define STRSPN __strspn_sse2 # undef libc_hidden_builtin_def -# define libc_hidden_builtin_def(strspn) +# define libc_hidden_builtin_def(STRSPN) #endif -#include +#include diff --git a/sysdeps/x86_64/strspn.S b/sysdeps/x86_64/strspn.S deleted file mode 100644 index 61b76ee0a1..0000000000 --- a/sysdeps/x86_64/strspn.S +++ /dev/null @@ -1,112 +0,0 @@ -/* strspn (str, ss) -- Return the length of the initial segment of STR - which contains only characters from SS. - For AMD x86-64. - Copyright (C) 1994-2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#include - - .text -ENTRY (strspn) - - movq %rdi, %rdx /* Save SRC. */ - - /* First we create a table with flags for all possible characters. - For the ASCII (7bit/8bit) or ISO-8859-X character sets which are - supported by the C string functions we have 256 characters. - Before inserting marks for the stop characters we clear the whole - table. */ - movq %rdi, %r8 /* Save value. */ - subq $256, %rsp /* Make space for 256 bytes. */ - cfi_adjust_cfa_offset(256) - movl $32, %ecx /* 32*8 bytes = 256 bytes. */ - movq %rsp, %rdi - xorl %eax, %eax /* We store 0s. */ - cld - rep - stosq - - movq %rsi, %rax /* Setup stopset. */ - -/* For understanding the following code remember that %rcx == 0 now. - Although all the following instruction only modify %cl we always - have a correct zero-extended 64-bit value in %rcx. */ - - .p2align 4 -L(2): movb (%rax), %cl /* get byte from stopset */ - testb %cl, %cl /* is NUL char? */ - jz L(1) /* yes => start compare loop */ - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ - - movb 1(%rax), %cl /* get byte from stopset */ - testb $0xff, %cl /* is NUL char? */ - jz L(1) /* yes => start compare loop */ - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ - - movb 2(%rax), %cl /* get byte from stopset */ - testb $0xff, %cl /* is NUL char? */ - jz L(1) /* yes => start compare loop */ - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ - - movb 3(%rax), %cl /* get byte from stopset */ - addq $4, %rax /* increment stopset pointer */ - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ - testb $0xff, %cl /* is NUL char? */ - jnz L(2) /* no => process next dword from stopset */ - -L(1): leaq -4(%rdx), %rax /* prepare loop */ - - /* We use a neat trick for the following loop. Normally we would - have to test for two termination conditions - 1. a character in the stopset was found - and - 2. the end of the string was found - But as a sign that the character is in the stopset we store its - value in the table. But the value of NUL is NUL so the loop - terminates for NUL in every case. */ - - .p2align 4 -L(3): addq $4, %rax /* adjust pointer for full loop round */ - - movb (%rax), %cl /* get byte from string */ - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ - jz L(4) /* no => return */ - - movb 1(%rax), %cl /* get byte from string */ - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ - jz L(5) /* no => return */ - - movb 2(%rax), %cl /* get byte from string */ - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ - jz L(6) /* no => return */ - - movb 3(%rax), %cl /* get byte from string */ - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ - jnz L(3) /* yes => start loop again */ - - incq %rax /* adjust pointer */ -L(6): incq %rax -L(5): incq %rax - -L(4): addq $256, %rsp /* remove stopset */ - cfi_adjust_cfa_offset(-256) - subq %rdx, %rax /* we have to return the number of valid - characters, so compute distance to first - non-valid character */ - ret -END (strspn) -libc_hidden_builtin_def (strspn)