From patchwork Thu Feb 2 18:11:18 2023
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 55453
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella
To: libc-alpha@sourceware.org, Richard Henderson, Jeff Law, Xi Ruoyao,
 Noah Goldstein
Subject: [PATCH v12 00/31] Improve generic string routines
Date: Thu, 2 Feb 2023 15:11:18 -0300
Message-Id: <20230202181149.2181553-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1

The generic string routines are improved by:

  1. Parameterizing the internal routines (for instance, finding a zero
     byte in a word) so that each architecture can override them without
     reimplementing the whole routine.

  2. Vectorizing more string implementations (for instance strcpy and
     strcmp).

  3. Changing some implementations to use routines that may already be
     optimized (strnlen and strchr).  This lets new ports focus on
     providing optimized implementations of only a handful of symbols
     (for instance memchr), whose improvements then benefit a larger
     set of routines.

I checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and
powerpc64-linux-gnu by removing the arch-specific assembly
implementations and disabling multiarch (this covers both LE and BE for
64 and 32 bits).  I also checked the string routines on alpha, hppa,
and sh.

Changes since v11:
* Use index_first_zero_ne in strcmp/strncmp, and fix it on LE.
* Added strrchr optimization based on strlen/memrchr.
* Reorder the patches so composed optimizations (such as strrchr) are
  ordered later.

Changes since v10:
* Added strcpy and stpcpy optimization.
* Added RISCV __riscv_zbb support to lower ctz/clz/orc.b.
* Fixed test-strnlen name.

Changes since v9:
* Added strncmp optimization.
* Fixed wcsmbs regressions.

Changes since v8:
* Change memrchr to use a vectorized load on the final string, instead
  of byte-per-byte reads.
* Remove string-maskoff.h header.
* Add string-repeat_bytes.h and string-shift.h.
* Hook up the generic implementation on string tests.

Changes since v7:
* Split string-fzc.h out of string-fzi.h, with all of the routines
  that are combinations of fza and fzi routines.
* Fix missing find_t and shift_find() from alpha, arm, powerpc.
* Use compiler builtins for arm and powerpc.
* Define sh4 has_zero() via has_eq(), rather than the reverse.

Changes since v6:
* Add find_t to handle the alpha way of comparing bytes (which returns
  a bit-mask instead of a byte-mask).
* Fixed alpha string-fzi.h and added string-fza.h.
* Renamed check_mask to shift_find.

Changes since v5:
* Replace 'inline' with '__always_inline' macros.
* Replace strchr implementation with a simpler one that calls
  strchrnul.
* Add strchrnul suggested changes.
* Add memchr suggested changes.
* Added check_mask on string-maskoff.h.
* Rebase and update Copyright years.

Changes since v4:
* Removed __clz and __ctz in favor of count_leading_zero and
  count_trailing_zeros from longlong.h.
* Use repeat_bytes more often.
* Added a comment on strcmp final_cmp on why index_first_zero_ne
  cannot be used.

Changes since v3:
* Rebased against master.
* Dropped strcpy optimization.
* Refactor strcmp implementation.
* Some minor changes in comments.

Changes since v2:
* Move string-fz{a,b,i} to its own patch.
* Add an inline implementation for __builtin_c{l,t}z to avoid using
  compiler-provided symbols.
* Add a new header, string-maskoff.h, to handle unaligned accesses on
  some implementations.
* Fixed strcmp on LE machines.
* Added an unaligned strcpy variant for architectures that define
  _STRING_ARCH_unaligned.
* Add SH string-fzb.h (which uses the cmp/str instruction to find a
  zero in a word).

Changes since v1:
* Marked ChangeLog entries with [BZ #5806], as appropriate.
* Reorganized the headers, so that armv6t2 and power6 need to override
  as little as possible to use their (integer) zero detection insns.
* Hopefully fixed all of the coding style issues.
* Adjusted the memrchr algorithm as discussed.
* Replaced the #ifdef STRRCHR etc that are used by the multiarch
  files.
* Tested on i386, i686, x86_64 (verified this is unused), ppc64,
  ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
  aarch64, alpha (qemu) and hppa (qemu).
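
To illustrate the two ideas behind the series (a small, overridable
word-level primitive, and routines composed from other routines), here
is a minimal sketch.  It is not the code from the patches: every
sketch_* name is hypothetical, op_t is assumed to be unsigned long, and
the word scan uses memcpy loads on a buffer of known size rather than
the aligned loads a real libc would use.

```c
#include <stddef.h>
#include <string.h>

/* Word type used for word-at-a-time scanning (assumed here to be
   unsigned long; the series makes this configurable per port).  */
typedef unsigned long op_t;

/* Replicate byte C into every byte of a word: 0x01 -> 0x0101...01.  */
static inline op_t
sketch_repeat_bytes (unsigned char c)
{
  return ((op_t) -1 / 0xff) * c;
}

/* Return 1 iff some byte of X is zero, 0 otherwise.  This is the kind
   of primitive a port could replace with a dedicated instruction.  */
static inline int
sketch_has_zero (op_t x)
{
  op_t lsb = sketch_repeat_bytes (0x01);
  op_t msb = sketch_repeat_bytes (0x80);
  return ((x - lsb) & ~x & msb) != 0;
}

/* Return the index of the first zero byte in BUF[0..N), or N if there
   is none.  All N bytes must be readable.  */
static size_t
sketch_find_zero (const unsigned char *buf, size_t n)
{
  size_t i = 0;
  /* Skip whole words that contain no zero byte.  */
  for (; i + sizeof (op_t) <= n; i += sizeof (op_t))
    {
      op_t w;
      memcpy (&w, buf + i, sizeof w);
      if (sketch_has_zero (w))
        break;
    }
  /* Locate the exact byte, or finish the tail, one byte at a time.  */
  for (; i < n; i++)
    if (buf[i] == 0)
      break;
  return i;
}

/* strnlen composed from memchr: the length is the offset of the first
   NUL within MAXLEN bytes, or MAXLEN if there is none.  */
static size_t
sketch_strnlen (const char *s, size_t maxlen)
{
  const char *p = memchr (s, '\0', maxlen);
  return p != NULL ? (size_t) (p - s) : maxlen;
}

/* Byte-wise stand-in for strchrnul: points at the first C in S, or at
   the terminating NUL if C does not occur.  */
static const char *
sketch_strchrnul (const char *s, int c)
{
  while (*s != '\0' && *s != (char) c)
    s++;
  return s;
}

/* strchr composed from strchrnul: only the final check differs.  */
static const char *
sketch_strchr (const char *s, int c)
{
  const char *r = sketch_strchrnul (s, c);
  return *r == (char) c ? r : NULL;
}
```

With this layering, a port that only speeds up sketch_has_zero or
memchr would also speed up every routine built on top of them, which is
the effect the series aims for.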

Adhemerval Zanella (25):
  Parameterize op_t from memcopy.h
  Add string vectorized find and detection functions
  string: Improve generic strlen
  string: Improve generic strchrnul
  string: Improve generic strchr
  string: Improve generic strcmp
  string: Improve generic strncmp
  string: Improve generic stpcpy
  string: Improve generic strcpy
  string: Improve generic memchr
  string: Improve generic strnlen with memchr
  string: Improve generic memrchr
  string: Improve generic strrchr with memrchr and strlen
  sh: Add string-fzb.h
  riscv: Add string-fza.h and string-fzi.h
  string: Hook up the default implementation on test-strlen
  string: Hook up the default implementation on test-strnlen
  string: Hook up the default implementation on test-strchr
  string: Hook up the default implementation on test-strcmp
  string: Hook up the default implementation on test-strncmp
  string: Hook up the default implementation on test-stpcpy
  string: Hook up the default implementation on test-strcpy
  string: Hook up the default implementation on test-memchr
  string: Hook up the default implementation on test-memrchr
  string: Hook up the default implementation on test-strrchr

Richard Henderson (6):
  Parameterize OP_T_THRES from memcopy.h
  hppa: Add memcopy.h
  hppa: Add string-fza.h, string-fzc.h, and string-fzi.h
  alpha: Add string-fza, string-fzb.h, string-fzi.h, and string-shift.h
  arm: Add string-fza.h
  powerpc: Add string-fza.h

 string/memchr.c                               | 176 +++++-----------
 string/memcmp.c                               |   4 -
 string/memrchr.c                              | 196 ++++--------------
 string/stpcpy.c                               |  92 +++++++-
 string/strchr.c                               | 164 +--------------
 string/strchrnul.c                            | 155 ++------------
 string/strcmp.c                               | 110 ++++++++--
 string/strcpy.c                               |   6 +-
 string/strlen.c                               |  92 ++------
 string/strncmp.c                              | 138 ++++++++----
 string/strnlen.c                              | 137 +-----------
 string/strrchr.c                              |  18 +-
 string/test-memchr.c                          |  31 ++-
 string/test-memrchr.c                         |   7 +
 string/test-stpcpy.c                          |  32 ++-
 string/test-strchr.c                          |  53 +++--
 string/test-strcmp.c                          |  22 ++
 string/test-strcpy.c                          |  34 ++-
 string/test-strlen.c                          |  31 ++-
 string/test-strncmp.c                         |  16 ++
 string/test-strnlen.c                         |  35 +++-
 string/test-strrchr.c                         |  38 ++--
 sysdeps/alpha/string-fza.h                    |  61 ++++++
 sysdeps/alpha/string-fzb.h                    |  52 +++++
 sysdeps/alpha/string-fzi.h                    |  62 ++++++
 sysdeps/alpha/string-shift.h                  |  44 ++++
 sysdeps/arm/armv6t2/string-fza.h              |  68 ++++++
 sysdeps/generic/memcopy.h                     |  10 +-
 sysdeps/generic/string-fza.h                  | 104 ++++++++++
 sysdeps/generic/string-fzb.h                  |  49 +++++
 sysdeps/generic/string-fzc.h                  |  83 ++++++++
 sysdeps/generic/string-fzi.h                  |  71 +++++++
 sysdeps/generic/string-misc.h                 |  45 ++++
 sysdeps/generic/string-opthr.h                |  25 +++
 sysdeps/generic/string-optype.h               |  24 +++
 sysdeps/generic/string-shift.h                |  52 +++++
 sysdeps/hppa/memcopy.h                        |  42 ++++
 sysdeps/hppa/string-fzb.h                     |  63 ++++++
 sysdeps/hppa/string-fzc.h                     | 124 +++++++++++
 sysdeps/hppa/string-fzi.h                     |  63 ++++++
 sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
 sysdeps/i386/memcopy.h                        |   3 -
 sysdeps/i386/string-opthr.h                   |  25 +++
 sysdeps/m68k/memcopy.h                        |   3 -
 sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
 .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
 .../power4/multiarch/strchrnul-ppc32.c        |   4 -
 .../power4/multiarch/strnlen-ppc32.c          |  14 +-
 .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
 sysdeps/powerpc/string-fza.h                  |  72 +++++++
 sysdeps/riscv/string-fza.h                    |  70 +++++++
 sysdeps/riscv/string-fzi.h                    |  77 +++++++
 sysdeps/s390/strchr-c.c                       |  11 +-
 sysdeps/s390/strchrnul-c.c                    |   2 -
 sysdeps/s390/strlen-c.c                       |  10 +-
 sysdeps/s390/strnlen-c.c                      |  14 +-
 sysdeps/sh/string-fzb.h                       |  55 +++++
 sysdeps/x86_64/x32/string-optype.h            |  24 +++
 58 files changed, 2041 insertions(+), 1014 deletions(-)
 create mode 100644 sysdeps/alpha/string-fza.h
 create mode 100644 sysdeps/alpha/string-fzb.h
 create mode 100644 sysdeps/alpha/string-fzi.h
 create mode 100644 sysdeps/alpha/string-shift.h
 create mode 100644 sysdeps/arm/armv6t2/string-fza.h
 create mode 100644 sysdeps/generic/string-fza.h
 create mode 100644 sysdeps/generic/string-fzb.h
 create mode 100644 sysdeps/generic/string-fzc.h
 create mode 100644 sysdeps/generic/string-fzi.h
 create mode 100644 sysdeps/generic/string-misc.h
 create mode 100644 sysdeps/generic/string-opthr.h
 create mode 100644 sysdeps/generic/string-optype.h
 create mode 100644 sysdeps/generic/string-shift.h
 create mode 100644 sysdeps/hppa/memcopy.h
 create mode 100644 sysdeps/hppa/string-fzb.h
 create mode 100644 sysdeps/hppa/string-fzc.h
 create mode 100644 sysdeps/hppa/string-fzi.h
 create mode 100644 sysdeps/i386/string-opthr.h
 create mode 100644 sysdeps/powerpc/string-fza.h
 create mode 100644 sysdeps/riscv/string-fza.h
 create mode 100644 sysdeps/riscv/string-fzi.h
 create mode 100644 sysdeps/sh/string-fzb.h
 create mode 100644 sysdeps/x86_64/x32/string-optype.h