Message ID | 20210315142520.1661407-1-hjl.tools@gmail.com |
---|---|
Headers | To: libc-alpha@sourceware.org; Subject: [PATCH v2 00/10] x86-64: Avoid RTM abort inside a RTM region; Date: Mon, 15 Mar 2021 07:25:10 -0700; Message-Id: <20210315142520.1661407-1-hjl.tools@gmail.com>; From: "H.J. Lu via Libc-alpha" <libc-alpha@sourceware.org>; Reply-To: "H.J. Lu" <hjl.tools@gmail.com> |
Series | x86-64: Avoid RTM abort inside a RTM region |
Message
H.J. Lu
March 15, 2021, 2:25 p.m. UTC
Changes in v2:

1. Don't use YMM2 in EVEX strcpy/strcat.
2. Correct EVEX mempcpy listing.
3. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.

---
Since VZEROUPPER triggers RTM abort inside a transactionally executing
RTM region, avoid VZEROUPPER inside a RTM region in string/memory
functions:

1. Turn on Prefer_No_VZEROUPPER for processors with RTM.
2. Select functions optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers, which don't need VZEROUPPER at function exit.
3. Select AVX optimized string/memory functions with

        xtest
        jz      1f
        vzeroall
        ret
1:
        vzeroupper
        ret

at function exit on processors with RTM, but without 256-bit EVEX
instructions.
4. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
5. Add tests to verify that string/memory functions won't cause RTM abort
in RTM region.
6. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.

H.J. Lu (10):
  x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
  x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
  x86-64: Add strcpy family functions with 256-bit EVEX
  x86-64: Add memmove family functions with 256-bit EVEX
  x86-64: Add memset family functions with 256-bit EVEX
  x86-64: Add memcmp family functions with 256-bit EVEX
  x86-64: Add AVX optimized string/memory functions for RTM
  x86: Add string/memory function tests in RTM region
  x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
  x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions

 sysdeps/x86/Makefile | 23 +
 sysdeps/x86/cpu-features.c | 20 +-
 sysdeps/x86/cpu-tunables.c | 2 +
 ...cpu-features-preferred_feature_index_1.def | 1 +
 sysdeps/x86/tst-memchr-rtm.c | 54 +
 sysdeps/x86/tst-memcmp-rtm.c | 52 +
 sysdeps/x86/tst-memmove-rtm.c | 53 +
 sysdeps/x86/tst-memrchr-rtm.c | 54 +
 sysdeps/x86/tst-memset-rtm.c | 45 +
 sysdeps/x86/tst-strchr-rtm.c | 54 +
 sysdeps/x86/tst-strcpy-rtm.c | 53 +
 sysdeps/x86/tst-string-rtm.h | 72 ++
 sysdeps/x86/tst-strlen-rtm.c | 53 +
 sysdeps/x86/tst-strncmp-rtm.c | 52 +
 sysdeps/x86/tst-strrchr-rtm.c | 53 +
 sysdeps/x86_64/multiarch/Makefile | 58 +-
 sysdeps/x86_64/multiarch/ifunc-avx2.h | 18 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 381 +++++-
 sysdeps/x86_64/multiarch/ifunc-memcmp.h | 17 +-
 sysdeps/x86_64/multiarch/ifunc-memmove.h | 45 +-
 sysdeps/x86_64/multiarch/ifunc-memset.h | 49 +-
 sysdeps/x86_64/multiarch/ifunc-strcpy.h | 17 +-
 sysdeps/x86_64/multiarch/ifunc-wmemset.h | 22 +-
 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/memchr-avx2.S | 45 +-
 sysdeps/x86_64/multiarch/memchr-evex.S | 381 ++++++
 .../x86_64/multiarch/memcmp-avx2-movbe-rtm.S | 12 +
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 28 +-
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 440 +++++++
 .../memmove-avx-unaligned-erms-rtm.S | 17 +
 .../multiarch/memmove-avx512-unaligned-erms.S | 25 +-
 .../multiarch/memmove-evex-unaligned-erms.S | 33 +
 .../multiarch/memmove-vec-unaligned-erms.S | 57 +-
 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/memrchr-avx2.S | 53 +-
 sysdeps/x86_64/multiarch/memrchr-evex.S | 337 ++++++
 .../memset-avx2-unaligned-erms-rtm.S | 10 +
 .../multiarch/memset-avx2-unaligned-erms.S | 12 +-
 .../multiarch/memset-avx512-unaligned-erms.S | 16 +-
 .../multiarch/memset-evex-unaligned-erms.S | 24 +
 .../multiarch/memset-vec-unaligned-erms.S | 61 +-
 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/rawmemchr-evex.S | 4 +
 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/stpcpy-evex.S | 3 +
 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/stpncpy-evex.S | 4 +
 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strcat-avx2.S | 6 +-
 sysdeps/x86_64/multiarch/strcat-evex.S | 283 +++++
 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strchr-avx2.S | 28 +-
 sysdeps/x86_64/multiarch/strchr-evex.S | 335 ++++++
 sysdeps/x86_64/multiarch/strchr.c | 17 +-
 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/strchrnul-evex.S | 3 +
 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strcmp-avx2.S | 55 +-
 sysdeps/x86_64/multiarch/strcmp-evex.S | 1043 +++++++++++++++++
 sysdeps/x86_64/multiarch/strcmp.c | 19 +-
 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strcpy-avx2.S | 85 +-
 sysdeps/x86_64/multiarch/strcpy-evex.S | 1003 ++++++++++++++++
 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strlen-avx2.S | 43 +-
 sysdeps/x86_64/multiarch/strlen-evex.S | 436 +++++++
 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/strncat-evex.S | 3 +
 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/strncmp-evex.S | 3 +
 sysdeps/x86_64/multiarch/strncmp.c | 19 +-
 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/strncpy-evex.S | 3 +
 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/strnlen-evex.S | 4 +
 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S | 12 +
 sysdeps/x86_64/multiarch/strrchr-avx2.S | 19 +-
 sysdeps/x86_64/multiarch/strrchr-evex.S | 265 +++++
 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/wcschr-evex.S | 3 +
 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/wcscmp-evex.S | 4 +
 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/wcslen-evex.S | 4 +
 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S | 5 +
 sysdeps/x86_64/multiarch/wcsncmp-evex.S | 5 +
 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S | 5 +
 sysdeps/x86_64/multiarch/wcsnlen-evex.S | 5 +
 sysdeps/x86_64/multiarch/wcsnlen.c | 18 +-
 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S | 3 +
 sysdeps/x86_64/multiarch/wcsrchr-evex.S | 3 +
 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S | 4 +
 sysdeps/x86_64/multiarch/wmemchr-evex.S | 4 +
 .../x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S | 4 +
 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S | 4 +
 sysdeps/x86_64/sysdep.h | 22 +
 96 files changed, 6372 insertions(+), 337 deletions(-)
 create mode 100644 sysdeps/x86/tst-memchr-rtm.c
 create mode 100644 sysdeps/x86/tst-memcmp-rtm.c
 create mode 100644 sysdeps/x86/tst-memmove-rtm.c
 create mode 100644 sysdeps/x86/tst-memrchr-rtm.c
 create mode 100644 sysdeps/x86/tst-memset-rtm.c
 create mode 100644 sysdeps/x86/tst-strchr-rtm.c
 create mode 100644 sysdeps/x86/tst-strcpy-rtm.c
 create mode 100644 sysdeps/x86/tst-string-rtm.h
 create mode 100644 sysdeps/x86/tst-strlen-rtm.c
 create mode 100644 sysdeps/x86/tst-strncmp-rtm.c
 create mode 100644 sysdeps/x86/tst-strrchr-rtm.c
 create mode 100644 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/memcmp-avx2-movbe-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/stpcpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/stpncpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcat-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strchrnul-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncat-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcschr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S
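The selection order described in items 1 to 4 of the cover letter can be pictured with a small C sketch. This is an illustration only, not the selector code from the patches: `struct cpu_caps`, `select_variant`, `variant_suffix` and the enum are hypothetical names, the single `evex256` flag stands in for "256-bit EVEX instructions are usable", and falling back to an SSE2 baseline is an assumption. The returned suffixes mirror the file-name suffixes in the diffstat (`-evex`, `-avx2-rtm`, `-avx2`).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature summary; this is not glibc's struct cpu_features.  */
struct cpu_caps
{
  bool avx2;                 /* AVX2 usable */
  bool evex256;              /* 256-bit EVEX instructions usable */
  bool rtm;                  /* RTM usable */
  bool prefer_no_vzeroupper; /* models Prefer_No_VZEROUPPER */
  bool prefer_avx2_strcmp;   /* models Prefer_AVX2_STRCMP */
};

enum variant { VARIANT_SSE2, VARIANT_AVX2, VARIANT_AVX2_RTM, VARIANT_EVEX };

/* Pick a variant for one string/memory function.  STRCMP_FAMILY is true
   for strcmp-like functions, which item 4 treats specially.  */
enum variant
select_variant (struct cpu_caps caps, bool strcmp_family)
{
  /* Item 1: processors with RTM get Prefer_No_VZEROUPPER turned on.  */
  if (caps.rtm)
    caps.prefer_no_vzeroupper = true;

  if (!caps.avx2)
    return VARIANT_SSE2;        /* assumed baseline fallback */

  /* Item 2: EVEX code on YMM16-YMM31 needs no VZEROUPPER at exit.
     Item 4: the strcmp family may prefer AVX2 instead.  */
  if (caps.evex256 && !(strcmp_family && caps.prefer_avx2_strcmp))
    return VARIANT_EVEX;

  /* Item 3: AVX2 code whose exit path tests xtest before choosing
     between vzeroall and vzeroupper.  */
  if (caps.rtm)
    return VARIANT_AVX2_RTM;

  /* Plain AVX2 code ends in VZEROUPPER, so skip it when asked to.  */
  if (!caps.prefer_no_vzeroupper)
    return VARIANT_AVX2;

  return VARIANT_SSE2;          /* assumed baseline fallback */
}

/* Map a variant to a file-name style suffix.  */
const char *
variant_suffix (enum variant v)
{
  switch (v)
    {
    case VARIANT_SSE2:     return "sse2";
    case VARIANT_AVX2:     return "avx2";
    case VARIANT_AVX2_RTM: return "avx2-rtm";
    case VARIANT_EVEX:     return "evex";
    }
  assert (!"unknown variant");
  return "";
}
```

Under these assumptions, a processor with AVX2, RTM and 256-bit EVEX gets the EVEX variant for most functions, while a processor with AVX2 and RTM but no 256-bit EVEX gets the `avx2-rtm` variant.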
Comments
On Mon, Mar 15, 2021 at 7:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> [...]

These patches have been tested internally and externally for more than 2 weeks. I have been running the patched system glibc on AVX and AVX512 machines. If there are no objections nor comments, I will check them in next Tuesday.
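For readers who want to try the idea behind item 5 of the cover letter (calling a string/memory function inside an RTM region and checking whether the transaction aborts), a minimal sketch in C follows. It is not the code in tst-string-rtm.h; `cpu_has_rtm`, `count_aborts` and `rtm_memset_aborts` are hypothetical names. It assumes GCC or Clang on x86, uses the `_xbegin`/`_xend` intrinsics from `<immintrin.h>`, and checks CPUID leaf 7 (EBX bit 11) before executing any RTM instruction. Transactions can also abort for reasons unrelated to the function under test, such as interrupts, so a real test would presumably need some tolerance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
# include <immintrin.h>

/* Return 1 if CPUID leaf 7, subleaf 0 reports RTM (EBX bit 11).  */
static int
cpu_has_rtm (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx >> 11) & 1;
}

/* Call memset LOOPS times inside an RTM region and count how many
   transactions aborted.  The target attribute lets this compile without
   -mrtm; it must only be called after the CPUID check.  */
static int __attribute__ ((target ("rtm")))
count_aborts (unsigned int loops)
{
  static char buf[1024];
  /* Volatile pointer so the compiler emits a real library call.  */
  void *(*volatile fn) (void *, int, size_t) = memset;
  int naborts = 0;

  for (unsigned int i = 0; i < loops; i++)
    {
      fn (buf, 1, sizeof buf);          /* warm up outside the region */
      if (_xbegin () == _XBEGIN_STARTED)
        {
          fn (buf, 2, sizeof buf);      /* runs transactionally */
          _xend ();
        }
      else
        naborts++;                      /* the transaction aborted */
    }
  return naborts;
}
#endif

/* Hypothetical entry point: returns -1 when RTM is unavailable,
   otherwise the number of aborted transactions out of LOOPS.  */
int
rtm_memset_aborts (unsigned int loops)
{
  assert (loops > 0);
#if defined(__x86_64__) || defined(__i386__)
  if (cpu_has_rtm ())
    return count_aborts (loops);
#endif
  return -1;
}
```

A caller would compare the returned abort count against the loop count; a count close to the loop count on RTM hardware would suggest that something in the called function aborts the transaction.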
On Wed, Mar 24, 2021 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 15, 2021 at 7:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > [...]
>
> These patches have been tested internally and externally for more than 2
> weeks. I have been running the patched system glibc on AVX and AVX512
> machines.
If there are no objections nor comments, I will check them in > next Tuesday. > I checked all 10 patches into master branch. Here are backports for 2.33 to 2.28 branches: https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33 https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32 https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31 https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30 https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29 https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
On Mon, Mar 29, 2021 at 4:06 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Mar 24, 2021 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Mar 15, 2021 at 7:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > Changes in v2:
> > >
> > > 1. Don't use YMM2 in EVEX strcpy/strcat.
> > > 2. Correct EVEX mempcpy listing.
> > > 3. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.
> > >
> > > ---
> > > Since VZEROUPPER triggers RTM abort inside a transactionally executing
> > > RTM region, avoid VZEROUPPER inside a RTM region in string/memory
> > > functions:
> > >
> > > 1. Turn on Prefer_No_VZEROUPPER for processors with RTM.
> > > 2. Select functions optimized with 256-bit EVEX instructions using
> > > YMM16-YMM31 registers, which don't need VZEROUPPER at function exit.
> > > 3. Select AVX optimized string/memory functions with
> > >
> > >         xtest
> > >         jz      1f
> > >         vzeroall
> > >         ret
> > > 1:
> > >         vzeroupper
> > >         ret
> > >
> > > at function exit on processors with RTM, but without 256-bit EVEX
> > > instructions.
> > > 4. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
> > > loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
> > > 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
> > > Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
> > > 5. Add tests to verify that string/memory functions won't cause RTM abort
> > > in RTM region.
> > > 6. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.
> > >
> > > H.J. Lu (10):
> > >   x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
> > >   x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
> > >   x86-64: Add strcpy family functions with 256-bit EVEX
> > >   x86-64: Add memmove family functions with 256-bit EVEX
> > >   x86-64: Add memset family functions with 256-bit EVEX
> > >   x86-64: Add memcmp family functions with 256-bit EVEX
> > >   x86-64: Add AVX optimized string/memory functions for RTM
> > >   x86: Add string/memory function tests in RTM region
> > >   x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
> > >   x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
> > >
> > > sysdeps/x86/Makefile | 23 +
> > > sysdeps/x86/cpu-features.c | 20 +-
> > > sysdeps/x86/cpu-tunables.c | 2 +
> > > ...cpu-features-preferred_feature_index_1.def | 1 +
> > > sysdeps/x86/tst-memchr-rtm.c | 54 +
> > > sysdeps/x86/tst-memcmp-rtm.c | 52 +
> > > sysdeps/x86/tst-memmove-rtm.c | 53 +
> > > sysdeps/x86/tst-memrchr-rtm.c | 54 +
> > > sysdeps/x86/tst-memset-rtm.c | 45 +
> > > sysdeps/x86/tst-strchr-rtm.c | 54 +
> > > sysdeps/x86/tst-strcpy-rtm.c | 53 +
> > > sysdeps/x86/tst-string-rtm.h | 72 ++
> > > sysdeps/x86/tst-strlen-rtm.c | 53 +
> > > sysdeps/x86/tst-strncmp-rtm.c | 52 +
> > > sysdeps/x86/tst-strrchr-rtm.c | 53 +
> > > sysdeps/x86_64/multiarch/Makefile | 58 +-
> > > sysdeps/x86_64/multiarch/ifunc-avx2.h | 18 +-
> > > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 381 +++++-
> > > sysdeps/x86_64/multiarch/ifunc-memcmp.h | 17 +-
> > > sysdeps/x86_64/multiarch/ifunc-memmove.h | 45 +-
> > > sysdeps/x86_64/multiarch/ifunc-memset.h | 49 +-
> > > sysdeps/x86_64/multiarch/ifunc-strcpy.h | 17 +-
> > > sysdeps/x86_64/multiarch/ifunc-wmemset.h | 22 +-
> > > sysdeps/x86_64/multiarch/memchr-avx2-rtm.S | 12 +
> > > sysdeps/x86_64/multiarch/memchr-avx2.S | 45 +-
> > > sysdeps/x86_64/multiarch/memchr-evex.S | 381 ++++++
> > > .../x86_64/multiarch/memcmp-avx2-movbe-rtm.S | 12 +
> > > sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 28 +-
> > > sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 440 +++++++
> > > [...]
> >
> > These patches have been tested internally and externally for more than 2
> > weeks. I have been running the patched system glibc on AVX and AVX512
> > machines. If there are no objections nor comments, I will check them in
> > next Tuesday.
> >
>
> I checked all 10 patches into master branch. Here are backports for
> 2.33 to 2.28 branches:
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
>

I am backporting these to release branches.
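
For readers following the selection rules in the quoted cover letter (items 1 to 4), the decision can be modelled roughly as the C sketch below. This is a simplified illustration under stated assumptions, not glibc's ifunc code: `struct cpu_caps`, `enum impl`, `select_impl` and `select_strcmp_impl` are hypothetical names, the single `evex256` flag stands in for whatever feature checks glibc actually performs, and the ordering of the checks is inferred from the cover letter only.

```c
#include <stdbool.h>

/* Hypothetical summary of the CPU properties named in the cover letter.
   glibc's real cpu_features structure is different.  */
struct cpu_caps
{
  bool avx2;                 /* AVX2 string/memory functions are usable.  */
  bool evex256;              /* 256-bit EVEX (YMM16-YMM31) is usable.  */
  bool rtm;                  /* Processor supports RTM.  */
  bool prefer_no_vzeroupper; /* Models Prefer_No_VZEROUPPER (item 1).  */
  bool prefer_avx2_strcmp;   /* Models Prefer_AVX2_STRCMP (item 4).  */
};

enum impl { IMPL_SSE2, IMPL_AVX2, IMPL_AVX2_RTM, IMPL_EVEX };

/* Shared decision.  allow_evex lets a caller rule out the EVEX variant.  */
static enum impl
select_common (const struct cpu_caps *c, bool allow_evex)
{
  if (c->avx2)
    {
      /* Item 2: EVEX code using YMM16-YMM31 needs no VZEROUPPER.  */
      if (allow_evex && c->evex256)
        return IMPL_EVEX;
      /* Item 3: AVX code whose exit path tests XTEST first.  */
      if (c->rtm)
        return IMPL_AVX2_RTM;
      /* Plain AVX2 code ends with VZEROUPPER, so skip it when the
         processor prefers to avoid that instruction.  */
      if (!c->prefer_no_vzeroupper)
        return IMPL_AVX2;
    }
  return IMPL_SSE2;
}

/* Generic string/memory function selection.  */
enum impl
select_impl (const struct cpu_caps *c)
{
  return select_common (c, true);
}

/* strcmp family: assumed to skip EVEX when AVX2 strcmp is preferred.  */
enum impl
select_strcmp_impl (const struct cpu_caps *c)
{
  return select_common (c, !c->prefer_avx2_strcmp);
}
```

Under this model, a processor with RTM and 256-bit EVEX gets the EVEX variant for most functions, while the strcmp family falls back to the RTM-safe AVX2 variant if the preference flag is set. A processor with RTM but no 256-bit EVEX gets the RTM-safe AVX2 variant throughout.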