From patchwork Fri Oct 14 22:39:08 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58876
To: libc-alpha@sourceware.org
Subject: [PATCH v6 1/7] x86: Update and move evex256/512 vec macros
Date: Fri, 14 Oct 2022 17:39:08 -0500
Message-Id: <20221014223914.700492-1-goldstein.w.n@gmail.com>
In-Reply-To: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

1) Copy so that backport will be easier.
2) Make section only define if there is not a previous definition.
3) Add `VEC_lo` definition for proper reg-width but in the ymm/zmm0-15 range.

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/x86-avx-rtm-vecs.h   | 35 ++++++++
 sysdeps/x86_64/multiarch/x86-avx-vecs.h       | 47 ++++++++++
 .../x86_64/multiarch/x86-evex-vecs-common.h   | 39 ++++++++
 sysdeps/x86_64/multiarch/x86-evex256-vecs.h   | 38 ++++++++
 sysdeps/x86_64/multiarch/x86-evex512-vecs.h   | 38 ++++++++
 sysdeps/x86_64/multiarch/x86-sse2-vecs.h      | 47 ++++++++++
 sysdeps/x86_64/multiarch/x86-vec-macros.h     | 90 +++++++++++++++++++
 7 files changed, 334 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/x86-avx-rtm-vecs.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-avx-vecs.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-evex-vecs-common.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-evex256-vecs.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-evex512-vecs.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-sse2-vecs.h
 create mode 100644 sysdeps/x86_64/multiarch/x86-vec-macros.h

diff --git a/sysdeps/x86_64/multiarch/x86-avx-rtm-vecs.h b/sysdeps/x86_64/multiarch/x86-avx-rtm-vecs.h
new file mode 100644
index 0000000000..0b326c8a70
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-avx-rtm-vecs.h
@@ -0,0 +1,35 @@
+/* Common config for AVX-RTM VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _X86_AVX_RTM_VECS_H
+#define _X86_AVX_RTM_VECS_H 1
+
+#define COND_VZEROUPPER COND_VZEROUPPER_XTEST
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
+	ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
+
+#define VZEROUPPER_RETURN jmp L(return_vzeroupper)
+
+#define USE_WITH_RTM 1
+#include "x86-avx-vecs.h"
+
+#undef SECTION
+#define SECTION(p) p##.avx.rtm
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-avx-vecs.h b/sysdeps/x86_64/multiarch/x86-avx-vecs.h
new file mode 100644
index 0000000000..dca1089060
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-avx-vecs.h
@@ -0,0 +1,47 @@
+/* Common config for AVX VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _X86_AVX_VECS_H
+#define _X86_AVX_VECS_H 1
+
+#ifdef VEC_SIZE
+# error "Multiple VEC configs included!"
+#endif
+
+#define VEC_SIZE 32
+#include "x86-vec-macros.h"
+
+#define USE_WITH_AVX 1
+#define SECTION(p) p##.avx
+
+/* 4-byte mov instructions with AVX2.  */
+#define MOV_SIZE 4
+/* 1 (ret) + 3 (vzeroupper).  */
+#define RET_SIZE 4
+#define VZEROUPPER vzeroupper
+
+#define VMOVU vmovdqu
+#define VMOVA vmovdqa
+#define VMOVNT vmovntdq
+
+/* Often need to access xmm portion.  */
+#define VMM_128 VMM_any_xmm
+#define VMM VMM_any_ymm
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-evex-vecs-common.h b/sysdeps/x86_64/multiarch/x86-evex-vecs-common.h
new file mode 100644
index 0000000000..f331e9d8ec
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-evex-vecs-common.h
@@ -0,0 +1,39 @@
+/* Common config for EVEX256 and EVEX512 VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _X86_EVEX_VECS_COMMON_H
+#define _X86_EVEX_VECS_COMMON_H 1
+
+#include "x86-vec-macros.h"
+
+/* 6-byte mov instructions with EVEX.  */
+#define MOV_SIZE 6
+/* No vzeroupper needed.  */
+#define RET_SIZE 1
+#define VZEROUPPER
+
+#define VMOVU vmovdqu64
+#define VMOVA vmovdqa64
+#define VMOVNT vmovntdq
+
+#define VMM_128 VMM_hi_xmm
+#define VMM_256 VMM_hi_ymm
+#define VMM_512 VMM_hi_zmm
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-evex256-vecs.h b/sysdeps/x86_64/multiarch/x86-evex256-vecs.h
new file mode 100644
index 0000000000..8337b95504
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-evex256-vecs.h
@@ -0,0 +1,38 @@
+/* Common config for EVEX256 VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _EVEX256_VECS_H
+#define _EVEX256_VECS_H 1
+
+#ifdef VEC_SIZE
+# error "Multiple VEC configs included!"
+#endif
+
+#define VEC_SIZE 32
+#include "x86-evex-vecs-common.h"
+
+#define USE_WITH_EVEX256 1
+
+#ifndef SECTION
+# define SECTION(p) p##.evex
+#endif
+
+#define VMM VMM_256
+#define VMM_lo VMM_any_ymm
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-evex512-vecs.h b/sysdeps/x86_64/multiarch/x86-evex512-vecs.h
new file mode 100644
index 0000000000..7dc5c23ad0
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-evex512-vecs.h
@@ -0,0 +1,38 @@
+/* Common config for EVEX512 VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _EVEX512_VECS_H
+#define _EVEX512_VECS_H 1
+
+#ifdef VEC_SIZE
+# error "Multiple VEC configs included!"
+#endif
+
+#define VEC_SIZE 64
+#include "x86-evex-vecs-common.h"
+
+#define USE_WITH_EVEX512 1
+
+#ifndef SECTION
+# define SECTION(p) p##.evex512
+#endif
+
+#define VMM VMM_512
+#define VMM_lo VMM_any_zmm
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-sse2-vecs.h b/sysdeps/x86_64/multiarch/x86-sse2-vecs.h
new file mode 100644
index 0000000000..b8bbd5dc29
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-sse2-vecs.h
@@ -0,0 +1,47 @@
+/* Common config for SSE2 VECs
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _X86_SSE2_VECS_H
+#define _X86_SSE2_VECS_H 1
+
+#ifdef VEC_SIZE
+# error "Multiple VEC configs included!"
+#endif
+
+#define VEC_SIZE 16
+#include "x86-vec-macros.h"
+
+#define USE_WITH_SSE2 1
+#define SECTION(p) p
+
+/* 3-byte mov instructions with SSE2.  */
+#define MOV_SIZE 3
+/* No vzeroupper needed.  */
+#define RET_SIZE 1
+#define VZEROUPPER
+
+#define VMOVU movups
+#define VMOVA movaps
+#define VMOVNT movntdq
+
+#define VMM_128 VMM_any_xmm
+#define VMM VMM_any_xmm
+
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/x86-vec-macros.h b/sysdeps/x86_64/multiarch/x86-vec-macros.h
new file mode 100644
index 0000000000..7d6bb31d55
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/x86-vec-macros.h
@@ -0,0 +1,90 @@
+/* Macro helpers for VEC_{type}({vec_num})
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _X86_VEC_MACROS_H
+#define _X86_VEC_MACROS_H 1
+
+#ifndef VEC_SIZE
+# error "Never include this file directly. Always include a vector config."
+#endif
+
+/* Defines so we can use SSE2 / AVX2 / EVEX / EVEX512 encoding with same
+   VMM(N) values.  */
+#define VMM_hi_xmm0 xmm16
+#define VMM_hi_xmm1 xmm17
+#define VMM_hi_xmm2 xmm18
+#define VMM_hi_xmm3 xmm19
+#define VMM_hi_xmm4 xmm20
+#define VMM_hi_xmm5 xmm21
+#define VMM_hi_xmm6 xmm22
+#define VMM_hi_xmm7 xmm23
+#define VMM_hi_xmm8 xmm24
+#define VMM_hi_xmm9 xmm25
+#define VMM_hi_xmm10 xmm26
+#define VMM_hi_xmm11 xmm27
+#define VMM_hi_xmm12 xmm28
+#define VMM_hi_xmm13 xmm29
+#define VMM_hi_xmm14 xmm30
+#define VMM_hi_xmm15 xmm31
+
+#define VMM_hi_ymm0 ymm16
+#define VMM_hi_ymm1 ymm17
+#define VMM_hi_ymm2 ymm18
+#define VMM_hi_ymm3 ymm19
+#define VMM_hi_ymm4 ymm20
+#define VMM_hi_ymm5 ymm21
+#define VMM_hi_ymm6 ymm22
+#define VMM_hi_ymm7 ymm23
+#define VMM_hi_ymm8 ymm24
+#define VMM_hi_ymm9 ymm25
+#define VMM_hi_ymm10 ymm26
+#define VMM_hi_ymm11 ymm27
+#define VMM_hi_ymm12 ymm28
+#define VMM_hi_ymm13 ymm29
+#define VMM_hi_ymm14 ymm30
+#define VMM_hi_ymm15 ymm31
+
+#define VMM_hi_zmm0 zmm16
+#define VMM_hi_zmm1 zmm17
+#define VMM_hi_zmm2 zmm18
+#define VMM_hi_zmm3 zmm19
+#define VMM_hi_zmm4 zmm20
+#define VMM_hi_zmm5 zmm21
+#define VMM_hi_zmm6 zmm22
+#define VMM_hi_zmm7 zmm23
+#define VMM_hi_zmm8 zmm24
+#define VMM_hi_zmm9 zmm25
+#define VMM_hi_zmm10 zmm26
+#define VMM_hi_zmm11 zmm27
+#define VMM_hi_zmm12 zmm28
+#define VMM_hi_zmm13 zmm29
+#define VMM_hi_zmm14 zmm30
+#define VMM_hi_zmm15 zmm31
+
+#define PRIMITIVE_VMM(vec, num) vec##num
+
+#define VMM_any_xmm(i) PRIMITIVE_VMM(xmm, i)
+#define VMM_any_ymm(i) PRIMITIVE_VMM(ymm, i)
+#define VMM_any_zmm(i) PRIMITIVE_VMM(zmm, i)
+
+#define VMM_hi_xmm(i) PRIMITIVE_VMM(VMM_hi_xmm, i)
+#define VMM_hi_ymm(i) PRIMITIVE_VMM(VMM_hi_ymm, i)
+#define VMM_hi_zmm(i) PRIMITIVE_VMM(VMM_hi_zmm, i)
+
+#endif

From patchwork Fri Oct 14 22:39:09 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58878
Received: from server2.sourceware.org
To: libc-alpha@sourceware.org
Subject: [PATCH v6 2/7] x86: Add macros for GPRs / mask insn based on VEC_SIZE
Date: Fri, 14 Oct 2022 17:39:09 -0500
Message-Id: <20221014223914.700492-2-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

This is to make it easier to do things like:

```
vpcmpb %VEC(0), %VEC(1), %k0
kmov{d|q} %k0, %{eax|rax}
test %{eax|rax}
```

It adds macros such that any GPR can get the proper width with:
`V{upper_case_GPR_name}`
and any mask insn can get the proper width with:
`{mask_insn_without_postfix}V`

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/reg-macros.h         | 166 ++++++++++++++++++
 .../multiarch/scripts/gen-reg-macros.py       | 123 +++++++++++++
 2 files changed, 289 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/reg-macros.h
 create mode 100644 sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py

diff --git a/sysdeps/x86_64/multiarch/reg-macros.h b/sysdeps/x86_64/multiarch/reg-macros.h
new file mode 100644
index 0000000000..16168b6fda
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/reg-macros.h
@@ -0,0 +1,166 @@
+/* This file was generated by: gen-reg-macros.py.
+
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _REG_MACROS_H
+#define _REG_MACROS_H	1
+
+#define rax_8	al
+#define rax_16	ax
+#define rax_32	eax
+#define rax_64	rax
+#define rbx_8	bl
+#define rbx_16	bx
+#define rbx_32	ebx
+#define rbx_64	rbx
+#define rcx_8	cl
+#define rcx_16	cx
+#define rcx_32	ecx
+#define rcx_64	rcx
+#define rdx_8	dl
+#define rdx_16	dx
+#define rdx_32	edx
+#define rdx_64	rdx
+#define rbp_8	bpl
+#define rbp_16	bp
+#define rbp_32	ebp
+#define rbp_64	rbp
+#define rsp_8	spl
+#define rsp_16	sp
+#define rsp_32	esp
+#define rsp_64	rsp
+#define rsi_8	sil
+#define rsi_16	si
+#define rsi_32	esi
+#define rsi_64	rsi
+#define rdi_8	dil
+#define rdi_16	di
+#define rdi_32	edi
+#define rdi_64	rdi
+#define r8_8	r8b
+#define r8_16	r8w
+#define r8_32	r8d
+#define r8_64	r8
+#define r9_8	r9b
+#define r9_16	r9w
+#define r9_32	r9d
+#define r9_64	r9
+#define r10_8	r10b
+#define r10_16	r10w
+#define r10_32	r10d
+#define r10_64	r10
+#define r11_8	r11b
+#define r11_16	r11w
+#define r11_32	r11d
+#define r11_64	r11
+#define r12_8	r12b
+#define r12_16	r12w
+#define r12_32	r12d
+#define r12_64	r12
+#define r13_8	r13b
+#define r13_16	r13w
+#define r13_32	r13d
+#define r13_64	r13
+#define r14_8	r14b
+#define r14_16	r14w
+#define r14_32	r14d
+#define r14_64	r14
+#define r15_8	r15b
+#define r15_16	r15w
+#define r15_32	r15d
+#define r15_64	r15
+
+#define kmov_8	kmovb
+#define kmov_16	kmovw
+#define kmov_32	kmovd
+#define kmov_64	kmovq
+#define kortest_8	kortestb
+#define kortest_16	kortestw
+#define kortest_32	kortestd
+#define kortest_64	kortestq
+#define kor_8	korb
+#define kor_16	korw
+#define kor_32	kord
+#define kor_64	korq
+#define ktest_8	ktestb
+#define ktest_16	ktestw
+#define ktest_32	ktestd
+#define ktest_64	ktestq
+#define kand_8	kandb
+#define kand_16	kandw
+#define kand_32	kandd
+#define kand_64	kandq
+#define kxor_8	kxorb
+#define kxor_16	kxorw
+#define kxor_32	kxord
+#define kxor_64	kxorq
+#define knot_8	knotb
+#define knot_16	knotw
+#define knot_32	knotd
+#define knot_64	knotq
+#define kxnor_8	kxnorb
+#define kxnor_16	kxnorw
+#define kxnor_32	kxnord
+#define kxnor_64	kxnorq
+#define kunpack_8	kunpackbw
+#define kunpack_16	kunpackwd
+#define kunpack_32	kunpackdq
+
+/* Common API for accessing proper width GPR is V{upcase_GPR_name}. */
+#define VRAX	VGPR(rax)
+#define VRBX	VGPR(rbx)
+#define VRCX	VGPR(rcx)
+#define VRDX	VGPR(rdx)
+#define VRBP	VGPR(rbp)
+#define VRSP	VGPR(rsp)
+#define VRSI	VGPR(rsi)
+#define VRDI	VGPR(rdi)
+#define VR8	VGPR(r8)
+#define VR9	VGPR(r9)
+#define VR10	VGPR(r10)
+#define VR11	VGPR(r11)
+#define VR12	VGPR(r12)
+#define VR13	VGPR(r13)
+#define VR14	VGPR(r14)
+#define VR15	VGPR(r15)
+
+/* Common API for accessing proper width mask insn is {upcase_mask_insn}. */
+#define KMOV	VKINSN(kmov)
+#define KORTEST	VKINSN(kortest)
+#define KOR	VKINSN(kor)
+#define KTEST	VKINSN(ktest)
+#define KAND	VKINSN(kand)
+#define KXOR	VKINSN(kxor)
+#define KNOT	VKINSN(knot)
+#define KXNOR	VKINSN(kxnor)
+#define KUNPACK	VKINSN(kunpack)
+
+#ifndef REG_WIDTH
+# define REG_WIDTH VEC_SIZE
+#endif
+
+#define VPASTER(x, y)	x##_##y
+#define VEVALUATOR(x, y)	VPASTER(x, y)
+
+#define VGPR_SZ(reg_name, reg_size)	VEVALUATOR(reg_name, reg_size)
+#define VKINSN_SZ(insn, reg_size)	VEVALUATOR(insn, reg_size)
+
+#define VGPR(reg_name)	VGPR_SZ(reg_name, REG_WIDTH)
+#define VKINSN(mask_insn)	VKINSN_SZ(mask_insn, REG_WIDTH)
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py
new file mode 100644
index 0000000000..c7296a8104
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py
@@ -0,0 +1,123 @@
+#!/usr/bin/python3
+# Copyright (C) 2022 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+"""Generate macros for getting GPR name of a certain size
+
+Inputs: None
+Output: Prints header fill to stdout
+
+API:
+    VGPR(reg_name)
+        - Get register name VEC_SIZE component of `reg_name`
+    VGPR_SZ(reg_name, reg_size)
+        - Get register name `reg_size` component of `reg_name`
+"""
+
+import sys
+import os
+from datetime import datetime
+
+registers = [["rax", "eax", "ax", "al"], ["rbx", "ebx", "bx", "bl"],
+             ["rcx", "ecx", "cx", "cl"], ["rdx", "edx", "dx", "dl"],
+             ["rbp", "ebp", "bp", "bpl"], ["rsp", "esp", "sp", "spl"],
+             ["rsi", "esi", "si", "sil"], ["rdi", "edi", "di", "dil"],
+             ["r8", "r8d", "r8w", "r8b"], ["r9", "r9d", "r9w", "r9b"],
+             ["r10", "r10d", "r10w", "r10b"], ["r11", "r11d", "r11w", "r11b"],
+             ["r12", "r12d", "r12w", "r12b"], ["r13", "r13d", "r13w", "r13b"],
+             ["r14", "r14d", "r14w", "r14b"], ["r15", "r15d", "r15w", "r15b"]]
+
+mask_insns = [
+    "kmov",
+    "kortest",
+    "kor",
+    "ktest",
+    "kand",
+    "kxor",
+    "knot",
+    "kxnor",
+]
+mask_insns_ext = ["b", "w", "d", "q"]
+
+cr = """
+   Copyright (C) {} Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+"""
+
+print("/* This file was generated by: {}.".format(os.path.basename(
+    sys.argv[0])))
+print(cr.format(datetime.today().year))
+
+print("#ifndef _REG_MACROS_H")
+print("#define _REG_MACROS_H\t1")
+print("")
+for reg in registers:
+    for i in range(0, 4):
+        print("#define {}_{}\t{}".format(reg[0], 8 << i, reg[3 - i]))
+
+print("")
+for mask_insn in mask_insns:
+    for i in range(0, 4):
+        print("#define {}_{}\t{}{}".format(mask_insn, 8 << i, mask_insn,
+                                           mask_insns_ext[i]))
+for i in range(0, 3):
+    print("#define kunpack_{}\tkunpack{}{}".format(8 << i, mask_insns_ext[i],
+                                                   mask_insns_ext[i + 1]))
+mask_insns.append("kunpack")
+
+print("")
+print(
+    "/* Common API for accessing proper width GPR is V{upcase_GPR_name}. */")
+for reg in registers:
+    print("#define V{}\tVGPR({})".format(reg[0].upper(), reg[0]))
+
+print("")
+
+print(
+    "/* Common API for accessing proper width mask insn is {upcase_mask_insn}. */"
+)
+for mask_insn in mask_insns:
+    print("#define {} \tVKINSN({})".format(mask_insn.upper(), mask_insn))
+print("")
+
+print("#ifndef REG_WIDTH")
+print("# define REG_WIDTH VEC_SIZE")
+print("#endif")
+print("")
+print("#define VPASTER(x, y)\tx##_##y")
+print("#define VEVALUATOR(x, y)\tVPASTER(x, y)")
+print("")
+print("#define VGPR_SZ(reg_name, reg_size)\tVEVALUATOR(reg_name, reg_size)")
+print("#define VKINSN_SZ(insn, reg_size)\tVEVALUATOR(insn, reg_size)")
+print("")
+print("#define VGPR(reg_name)\tVGPR_SZ(reg_name, REG_WIDTH)")
+print("#define VKINSN(mask_insn)\tVKINSN_SZ(mask_insn, REG_WIDTH)")
+
+print("\n#endif")

From patchwork Fri Oct 14 22:39:10 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58872
Received: by mail-pg1-x532.google.com with SMTP id
q1so5488227pgl.11 for ; Fri, 14 Oct 2022 15:39:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SFEijbmXdXBpQ5hr6+0B4AUdc1+PmRc0lDFf2XCXVFM=; b=kko+2PMvBH1Ftz2YlhADf6+ojJIoXHd/rSI/N4RtIw+31clrLvbRq4c2YCYQj65Iju bn0hjWiLUhg8kz985eGwKhZSIRfsPCW84653Qty5XjsIP2crr68VGL7MdmW1dKoWS/Q1 ZYM9qCrb2xcM+DJNqaFISVLhicfyQKGgCxo8dE8TaCXOHcd7JgPv81lMj9/Z8TKhCQJv /maRqqwlPfLQ7srxz8XBUmTeuCqIZIfg5Kf3Rf0PTxk0NdWHkQsCoF/Rc9NoiZgYPOqL lcF0QyCmWR3Zp6lU3DliJtSosyT2r7OdvUM3/e6Pa8pmgKmPe/rGRXDTm9QH6vmtEpM9 G0NQ== X-Gm-Message-State: ACrzQf1jEZr+YHolPw7R6RbYom1wE0T0TOIaXjXbztzKB5Y/2JfBJKqd YoFHyrNUWQlfVVMsy6DQT1w8sFHCF4UFEw== X-Google-Smtp-Source: AMsMyM6kTgMH9mXfcs633C40iEf+CdnoyMeKIZgHJEJ5sRCZmGeyvQO71sTTdlgMIbEaeTrYuziYBQ== X-Received: by 2002:a63:5b5c:0:b0:440:8531:d3f6 with SMTP id l28-20020a635b5c000000b004408531d3f6mr125456pgm.114.1665787161969; Fri, 14 Oct 2022 15:39:21 -0700 (PDT) Received: from noahgold-desk.. 
To: libc-alpha@sourceware.org
Subject: [PATCH v6 3/7] x86: Update memrchr to use new VEC macros
Date: Fri, 14 Oct 2022 17:39:10 -0500
Message-Id: <20221014223914.700492-3-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

Replace %VEC(n) -> %VMM(n)

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/memrchr-evex.S | 42 ++++++++++++-------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memrchr-evex.S b/sysdeps/x86_64/multiarch/memrchr-evex.S
index ea3a0a0a60..550b328c5a 100644
--- a/sysdeps/x86_64/multiarch/memrchr-evex.S
+++ b/sysdeps/x86_64/multiarch/memrchr-evex.S
@@ -21,7 +21,7 @@ #if ISA_SHOULD_BUILD (4) # include
-# include "evex256-vecs.h"
+# include "x86-evex256-vecs.h"
 # if VEC_SIZE != 32 # error "VEC_SIZE != 32 unimplemented" # endif
@@ -31,7 +31,7 @@ #
endif # define PAGE_SIZE 4096 -# define VECMATCH VEC(0) +# define VMMMATCH VMM(0) .section SECTION(.text), "ax", @progbits ENTRY_P2ALIGN(MEMRCHR, 6) @@ -47,7 +47,7 @@ ENTRY_P2ALIGN(MEMRCHR, 6) correct page cross check and 2) it correctly sets up end ptr to be subtract by lzcnt aligned. */ leaq -1(%rdi, %rdx), %rax - vpbroadcastb %esi, %VECMATCH + vpbroadcastb %esi, %VMMMATCH /* Check if we can load 1x VEC without cross a page. */ testl $(PAGE_SIZE - VEC_SIZE), %eax @@ -55,7 +55,7 @@ ENTRY_P2ALIGN(MEMRCHR, 6) /* Don't use rax for pointer here because EVEX has better encoding with offset % VEC_SIZE == 0. */ - vpcmpb $0, -(VEC_SIZE)(%rdi, %rdx), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE)(%rdi, %rdx), %VMMMATCH, %k0 kmovd %k0, %ecx /* Fall through for rdx (len) <= VEC_SIZE (expect small sizes). */ @@ -96,7 +96,7 @@ L(more_1x_vec): movq %rax, %rdx /* Need no matter what. */ - vpcmpb $0, -(VEC_SIZE)(%rax), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx subq %rdi, %rdx @@ -115,7 +115,7 @@ L(last_2x_vec): /* Don't use rax for pointer here because EVEX has better encoding with offset % VEC_SIZE == 0. */ - vpcmpb $0, -(VEC_SIZE * 2)(%rdi, %rdx), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE * 2)(%rdi, %rdx), %VMMMATCH, %k0 kmovd %k0, %ecx /* NB: 64-bit lzcnt. This will naturally add 32 to position. */ lzcntq %rcx, %rcx @@ -131,7 +131,7 @@ L(last_2x_vec): L(page_cross): movq %rax, %rsi andq $-VEC_SIZE, %rsi - vpcmpb $0, (%rsi), %VECMATCH, %k0 + vpcmpb $0, (%rsi), %VMMMATCH, %k0 kmovd %k0, %r8d /* Shift out negative alignment (because we are starting from endptr and working backwards). */ @@ -165,13 +165,13 @@ L(more_2x_vec): testl %ecx, %ecx jnz L(ret_vec_x0_dec) - vpcmpb $0, -(VEC_SIZE * 2)(%rax), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE * 2)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx testl %ecx, %ecx jnz L(ret_vec_x1) /* Need no matter what. 
*/ - vpcmpb $0, -(VEC_SIZE * 3)(%rax), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE * 3)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx subq $(VEC_SIZE * 4), %rdx @@ -185,7 +185,7 @@ L(last_vec): /* Need no matter what. */ - vpcmpb $0, -(VEC_SIZE * 4)(%rax), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE * 4)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx lzcntl %ecx, %ecx subq $(VEC_SIZE * 3 + 1), %rax @@ -220,7 +220,7 @@ L(more_4x_vec): testl %ecx, %ecx jnz L(ret_vec_x2) - vpcmpb $0, -(VEC_SIZE * 4)(%rax), %VECMATCH, %k0 + vpcmpb $0, -(VEC_SIZE * 4)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx testl %ecx, %ecx @@ -243,17 +243,17 @@ L(more_4x_vec): L(loop_4x_vec): /* Store 1 were not-equals and 0 where equals in k1 (used to mask later on). */ - vpcmpb $4, (VEC_SIZE * 3)(%rax), %VECMATCH, %k1 + vpcmpb $4, (VEC_SIZE * 3)(%rax), %VMMMATCH, %k1 /* VEC(2/3) will have zero-byte where we found a CHAR. */ - vpxorq (VEC_SIZE * 2)(%rax), %VECMATCH, %VEC(2) - vpxorq (VEC_SIZE * 1)(%rax), %VECMATCH, %VEC(3) - vpcmpb $0, (VEC_SIZE * 0)(%rax), %VECMATCH, %k4 + vpxorq (VEC_SIZE * 2)(%rax), %VMMMATCH, %VMM(2) + vpxorq (VEC_SIZE * 1)(%rax), %VMMMATCH, %VMM(3) + vpcmpb $0, (VEC_SIZE * 0)(%rax), %VMMMATCH, %k4 /* Combine VEC(2/3) with min and maskz with k1 (k1 has zero bit where CHAR is found and VEC(2/3) have zero-byte where CHAR is found. */ - vpminub %VEC(2), %VEC(3), %VEC(3){%k1}{z} - vptestnmb %VEC(3), %VEC(3), %k2 + vpminub %VMM(2), %VMM(3), %VMM(3){%k1}{z} + vptestnmb %VMM(3), %VMM(3), %k2 /* Any 1s and we found CHAR. */ kortestd %k2, %k4 @@ -270,7 +270,7 @@ L(loop_4x_vec): L(last_4x_vec): /* Used no matter what. */ - vpcmpb $0, (VEC_SIZE * -1)(%rax), %VECMATCH, %k0 + vpcmpb $0, (VEC_SIZE * -1)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx cmpl $(VEC_SIZE * 2), %edx @@ -280,14 +280,14 @@ L(last_4x_vec): jnz L(ret_vec_x0_dec) - vpcmpb $0, (VEC_SIZE * -2)(%rax), %VECMATCH, %k0 + vpcmpb $0, (VEC_SIZE * -2)(%rax), %VMMMATCH, %k0 kmovd %k0, %ecx testl %ecx, %ecx jnz L(ret_vec_x1) /* Used no matter what. 
*/
-	vpcmpb	$0, (VEC_SIZE * -3)(%rax), %VECMATCH, %k0
+	vpcmpb	$0, (VEC_SIZE * -3)(%rax), %VMMMATCH, %k0
 	kmovd	%k0, %ecx
 	cmpl	$(VEC_SIZE * 3), %edx
@@ -309,7 +309,7 @@ L(loop_end):
 	testl	%ecx, %ecx
 	jnz	L(ret_vec_x0_end)
-	vptestnmb %VEC(2), %VEC(2), %k0
+	vptestnmb %VMM(2), %VMM(2), %k0
 	kmovd	%k0, %ecx
 	testl	%ecx, %ecx
 	jnz	L(ret_vec_x1_end)

From patchwork Fri Oct 14 22:39:11 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58875
To: libc-alpha@sourceware.org
Subject: [PATCH v6 4/7] x86: Remove now unused vec header macros.
Date: Fri, 14 Oct 2022 17:39:11 -0500
Message-Id: <20221014223914.700492-4-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/avx-rtm-vecs.h     | 35 --------
 sysdeps/x86_64/multiarch/avx-vecs.h         | 47 -----------
 sysdeps/x86_64/multiarch/evex-vecs-common.h | 39 ---------
 sysdeps/x86_64/multiarch/evex256-vecs.h     | 35 --------
 sysdeps/x86_64/multiarch/evex512-vecs.h     | 35 --------
 sysdeps/x86_64/multiarch/sse2-vecs.h        | 47 -----------
 sysdeps/x86_64/multiarch/vec-macros.h       | 90 ---------------------
 7 files changed, 328 deletions(-)
 delete mode 100644 sysdeps/x86_64/multiarch/avx-rtm-vecs.h
 delete mode 100644 sysdeps/x86_64/multiarch/avx-vecs.h
 delete mode 100644 sysdeps/x86_64/multiarch/evex-vecs-common.h
 delete mode 100644 sysdeps/x86_64/multiarch/evex256-vecs.h
 delete mode 100644 sysdeps/x86_64/multiarch/evex512-vecs.h
 delete mode 100644 sysdeps/x86_64/multiarch/sse2-vecs.h
 delete mode 100644 sysdeps/x86_64/multiarch/vec-macros.h

diff --git
a/sysdeps/x86_64/multiarch/avx-rtm-vecs.h b/sysdeps/x86_64/multiarch/avx-rtm-vecs.h deleted file mode 100644 index 6ca9f5e6ba..0000000000 --- a/sysdeps/x86_64/multiarch/avx-rtm-vecs.h +++ /dev/null @@ -1,35 +0,0 @@ -/* Common config for AVX-RTM VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _AVX_RTM_VECS_H -#define _AVX_RTM_VECS_H 1 - -#define COND_VZEROUPPER COND_VZEROUPPER_XTEST -#define ZERO_UPPER_VEC_REGISTERS_RETURN \ - ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST - -#define VZEROUPPER_RETURN jmp L(return_vzeroupper) - -#define USE_WITH_RTM 1 -#include "avx-vecs.h" - -#undef SECTION -#define SECTION(p) p##.avx.rtm - -#endif diff --git a/sysdeps/x86_64/multiarch/avx-vecs.h b/sysdeps/x86_64/multiarch/avx-vecs.h deleted file mode 100644 index 89680f5db8..0000000000 --- a/sysdeps/x86_64/multiarch/avx-vecs.h +++ /dev/null @@ -1,47 +0,0 @@ -/* Common config for AVX VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _AVX_VECS_H -#define _AVX_VECS_H 1 - -#ifdef VEC_SIZE -# error "Multiple VEC configs included!" -#endif - -#define VEC_SIZE 32 -#include "vec-macros.h" - -#define USE_WITH_AVX 1 -#define SECTION(p) p##.avx - -/* 4-byte mov instructions with AVX2. */ -#define MOV_SIZE 4 -/* 1 (ret) + 3 (vzeroupper). */ -#define RET_SIZE 4 -#define VZEROUPPER vzeroupper - -#define VMOVU vmovdqu -#define VMOVA vmovdqa -#define VMOVNT vmovntdq - -/* Often need to access xmm portion. */ -#define VEC_xmm VEC_any_xmm -#define VEC VEC_any_ymm - -#endif diff --git a/sysdeps/x86_64/multiarch/evex-vecs-common.h b/sysdeps/x86_64/multiarch/evex-vecs-common.h deleted file mode 100644 index 99806ebcd7..0000000000 --- a/sysdeps/x86_64/multiarch/evex-vecs-common.h +++ /dev/null @@ -1,39 +0,0 @@ -/* Common config for EVEX256 and EVEX512 VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. 
- - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _EVEX_VECS_COMMON_H -#define _EVEX_VECS_COMMON_H 1 - -#include "vec-macros.h" - -/* 6-byte mov instructions with EVEX. */ -#define MOV_SIZE 6 -/* No vzeroupper needed. */ -#define RET_SIZE 1 -#define VZEROUPPER - -#define VMOVU vmovdqu64 -#define VMOVA vmovdqa64 -#define VMOVNT vmovntdq - -#define VEC_xmm VEC_hi_xmm -#define VEC_ymm VEC_hi_ymm -#define VEC_zmm VEC_hi_zmm - -#endif diff --git a/sysdeps/x86_64/multiarch/evex256-vecs.h b/sysdeps/x86_64/multiarch/evex256-vecs.h deleted file mode 100644 index 222ba46dc7..0000000000 --- a/sysdeps/x86_64/multiarch/evex256-vecs.h +++ /dev/null @@ -1,35 +0,0 @@ -/* Common config for EVEX256 VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _EVEX256_VECS_H -#define _EVEX256_VECS_H 1 - -#ifdef VEC_SIZE -# error "Multiple VEC configs included!" 
-#endif - -#define VEC_SIZE 32 -#include "evex-vecs-common.h" - -#define USE_WITH_EVEX256 1 -#define SECTION(p) p##.evex - -#define VEC VEC_ymm - -#endif diff --git a/sysdeps/x86_64/multiarch/evex512-vecs.h b/sysdeps/x86_64/multiarch/evex512-vecs.h deleted file mode 100644 index d1784d5368..0000000000 --- a/sysdeps/x86_64/multiarch/evex512-vecs.h +++ /dev/null @@ -1,35 +0,0 @@ -/* Common config for EVEX512 VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _EVEX512_VECS_H -#define _EVEX512_VECS_H 1 - -#ifdef VEC_SIZE -# error "Multiple VEC configs included!" -#endif - -#define VEC_SIZE 64 -#include "evex-vecs-common.h" - -#define USE_WITH_EVEX512 1 -#define SECTION(p) p##.evex512 - -#define VEC VEC_zmm - -#endif diff --git a/sysdeps/x86_64/multiarch/sse2-vecs.h b/sysdeps/x86_64/multiarch/sse2-vecs.h deleted file mode 100644 index 2b77a59d56..0000000000 --- a/sysdeps/x86_64/multiarch/sse2-vecs.h +++ /dev/null @@ -1,47 +0,0 @@ -/* Common config for SSE2 VECs - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. 
- - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _SSE2_VECS_H -#define _SSE2_VECS_H 1 - -#ifdef VEC_SIZE -# error "Multiple VEC configs included!" -#endif - -#define VEC_SIZE 16 -#include "vec-macros.h" - -#define USE_WITH_SSE2 1 -#define SECTION(p) p - -/* 3-byte mov instructions with SSE2. */ -#define MOV_SIZE 3 -/* No vzeroupper needed. */ -#define RET_SIZE 1 -#define VZEROUPPER - -#define VMOVU movups -#define VMOVA movaps -#define VMOVNT movntdq - -#define VEC_xmm VEC_any_xmm -#define VEC VEC_any_xmm - - -#endif diff --git a/sysdeps/x86_64/multiarch/vec-macros.h b/sysdeps/x86_64/multiarch/vec-macros.h deleted file mode 100644 index 9f3ffecede..0000000000 --- a/sysdeps/x86_64/multiarch/vec-macros.h +++ /dev/null @@ -1,90 +0,0 @@ -/* Macro helpers for VEC_{type}({vec_num}) - All versions must be listed in ifunc-impl-list.c. - Copyright (C) 2022 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library; if not, see - . */ - -#ifndef _VEC_MACROS_H -#define _VEC_MACROS_H 1 - -#ifndef VEC_SIZE -# error "Never include this file directly. Always include a vector config." -#endif - -/* Defines so we can use SSE2 / AVX2 / EVEX / EVEX512 encoding with same - VEC(N) values. */ -#define VEC_hi_xmm0 xmm16 -#define VEC_hi_xmm1 xmm17 -#define VEC_hi_xmm2 xmm18 -#define VEC_hi_xmm3 xmm19 -#define VEC_hi_xmm4 xmm20 -#define VEC_hi_xmm5 xmm21 -#define VEC_hi_xmm6 xmm22 -#define VEC_hi_xmm7 xmm23 -#define VEC_hi_xmm8 xmm24 -#define VEC_hi_xmm9 xmm25 -#define VEC_hi_xmm10 xmm26 -#define VEC_hi_xmm11 xmm27 -#define VEC_hi_xmm12 xmm28 -#define VEC_hi_xmm13 xmm29 -#define VEC_hi_xmm14 xmm30 -#define VEC_hi_xmm15 xmm31 - -#define VEC_hi_ymm0 ymm16 -#define VEC_hi_ymm1 ymm17 -#define VEC_hi_ymm2 ymm18 -#define VEC_hi_ymm3 ymm19 -#define VEC_hi_ymm4 ymm20 -#define VEC_hi_ymm5 ymm21 -#define VEC_hi_ymm6 ymm22 -#define VEC_hi_ymm7 ymm23 -#define VEC_hi_ymm8 ymm24 -#define VEC_hi_ymm9 ymm25 -#define VEC_hi_ymm10 ymm26 -#define VEC_hi_ymm11 ymm27 -#define VEC_hi_ymm12 ymm28 -#define VEC_hi_ymm13 ymm29 -#define VEC_hi_ymm14 ymm30 -#define VEC_hi_ymm15 ymm31 - -#define VEC_hi_zmm0 zmm16 -#define VEC_hi_zmm1 zmm17 -#define VEC_hi_zmm2 zmm18 -#define VEC_hi_zmm3 zmm19 -#define VEC_hi_zmm4 zmm20 -#define VEC_hi_zmm5 zmm21 -#define VEC_hi_zmm6 zmm22 -#define VEC_hi_zmm7 zmm23 -#define VEC_hi_zmm8 zmm24 -#define VEC_hi_zmm9 zmm25 -#define VEC_hi_zmm10 zmm26 -#define VEC_hi_zmm11 zmm27 -#define VEC_hi_zmm12 zmm28 -#define VEC_hi_zmm13 zmm29 -#define VEC_hi_zmm14 zmm30 -#define VEC_hi_zmm15 zmm31 - -#define PRIMITIVE_VEC(vec, num) vec##num - -#define VEC_any_xmm(i) PRIMITIVE_VEC(xmm, i) -#define VEC_any_ymm(i) PRIMITIVE_VEC(ymm, i) -#define VEC_any_zmm(i) PRIMITIVE_VEC(zmm, i) - -#define VEC_hi_xmm(i) PRIMITIVE_VEC(VEC_hi_xmm, i) 
-#define VEC_hi_ymm(i) PRIMITIVE_VEC(VEC_hi_ymm, i)
-#define VEC_hi_zmm(i) PRIMITIVE_VEC(VEC_hi_zmm, i)
-
-#endif

From patchwork Fri Oct 14 22:39:12 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58877
To: libc-alpha@sourceware.org
Subject: [PATCH v6 5/7] x86: Update memmove to use new VEC macros
Date: Fri, 14 Oct 2022 17:39:12 -0500
Message-Id: <20221014223914.700492-5-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

Replace
%VEC(n) -> %VMM(n) This commit does not change libc.so Tested build on x86-64 --- .../memmove-avx-unaligned-erms-rtm.S | 15 +- .../multiarch/memmove-avx-unaligned-erms.S | 9 +- .../multiarch/memmove-avx512-unaligned-erms.S | 30 +- .../multiarch/memmove-evex-unaligned-erms.S | 30 +- .../multiarch/memmove-sse2-unaligned-erms.S | 11 +- .../multiarch/memmove-vec-unaligned-erms.S | 262 +++++++++--------- 6 files changed, 135 insertions(+), 222 deletions(-) diff --git a/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S b/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S index 67a55f0c85..c2a95dc247 100644 --- a/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S +++ b/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S @@ -1,16 +1,9 @@ -#if IS_IN (libc) -# define VEC_SIZE 32 -# define VEC(i) ymm##i -# define VMOVNT vmovntdq -# define VMOVU vmovdqu -# define VMOVA vmovdqa -# define MOV_SIZE 4 -# define ZERO_UPPER_VEC_REGISTERS_RETURN \ - ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST +#include -# define VZEROUPPER_RETURN jmp L(return) +#if ISA_SHOULD_BUILD (3) + +# include "x86-avx-rtm-vecs.h" -# define SECTION(p) p##.avx.rtm # define MEMMOVE_SYMBOL(p,s) p##_avx_##s##_rtm # include "memmove-vec-unaligned-erms.S" diff --git a/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S index a14b155667..4e4b4635f9 100644 --- a/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S @@ -2,14 +2,7 @@ #if ISA_SHOULD_BUILD (3) -# define VEC_SIZE 32 -# define VEC(i) ymm##i -# define VMOVNT vmovntdq -# define VMOVU vmovdqu -# define VMOVA vmovdqa -# define MOV_SIZE 4 - -# define SECTION(p) p##.avx +# include "x86-avx-vecs.h" # ifndef MEMMOVE_SYMBOL # define MEMMOVE_SYMBOL(p,s) p##_avx_##s diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S index 8d1568a7ba..cca97e38f8 100644 --- 
a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S @@ -2,35 +2,7 @@ #if ISA_SHOULD_BUILD (4) -# define VEC_SIZE 64 -# define XMM0 xmm16 -# define XMM1 xmm17 -# define YMM0 ymm16 -# define YMM1 ymm17 -# define VEC0 zmm16 -# define VEC1 zmm17 -# define VEC2 zmm18 -# define VEC3 zmm19 -# define VEC4 zmm20 -# define VEC5 zmm21 -# define VEC6 zmm22 -# define VEC7 zmm23 -# define VEC8 zmm24 -# define VEC9 zmm25 -# define VEC10 zmm26 -# define VEC11 zmm27 -# define VEC12 zmm28 -# define VEC13 zmm29 -# define VEC14 zmm30 -# define VEC15 zmm31 -# define VEC(i) VEC##i -# define VMOVNT vmovntdq -# define VMOVU vmovdqu64 -# define VMOVA vmovdqa64 -# define VZEROUPPER -# define MOV_SIZE 6 - -# define SECTION(p) p##.evex512 +# include "x86-evex512-vecs.h" # ifndef MEMMOVE_SYMBOL # define MEMMOVE_SYMBOL(p,s) p##_avx512_##s diff --git a/sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S index 2373017358..1f7b5715f7 100644 --- a/sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S @@ -2,35 +2,7 @@ #if ISA_SHOULD_BUILD (4) -# define VEC_SIZE 32 -# define XMM0 xmm16 -# define XMM1 xmm17 -# define YMM0 ymm16 -# define YMM1 ymm17 -# define VEC0 ymm16 -# define VEC1 ymm17 -# define VEC2 ymm18 -# define VEC3 ymm19 -# define VEC4 ymm20 -# define VEC5 ymm21 -# define VEC6 ymm22 -# define VEC7 ymm23 -# define VEC8 ymm24 -# define VEC9 ymm25 -# define VEC10 ymm26 -# define VEC11 ymm27 -# define VEC12 ymm28 -# define VEC13 ymm29 -# define VEC14 ymm30 -# define VEC15 ymm31 -# define VEC(i) VEC##i -# define VMOVNT vmovntdq -# define VMOVU vmovdqu64 -# define VMOVA vmovdqa64 -# define VZEROUPPER -# define MOV_SIZE 6 - -# define SECTION(p) p##.evex +# include "x86-evex256-vecs.h" # ifndef MEMMOVE_SYMBOL # define MEMMOVE_SYMBOL(p,s) p##_evex_##s diff --git 
a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S index 422a079902..8431bcd000 100644 --- a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S @@ -22,18 +22,9 @@ so we need this to build for ISA V2 builds. */ #if ISA_SHOULD_BUILD (2) -# include +# include "x86-sse2-vecs.h" -# define VEC_SIZE 16 -# define VEC(i) xmm##i # define PREFETCHNT prefetchnta -# define VMOVNT movntdq -/* Use movups and movaps for smaller code sizes. */ -# define VMOVU movups -# define VMOVA movaps -# define MOV_SIZE 3 - -# define SECTION(p) p # ifndef MEMMOVE_SYMBOL # define MEMMOVE_SYMBOL(p,s) p##_sse2_##s diff --git a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S index 04747133b7..5b758cae5e 100644 --- a/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S @@ -60,14 +60,6 @@ # define MEMMOVE_CHK_SYMBOL(p,s) MEMMOVE_SYMBOL(p, s) #endif -#ifndef XMM0 -# define XMM0 xmm0 -#endif - -#ifndef YMM0 -# define YMM0 ymm0 -#endif - #ifndef VZEROUPPER # if VEC_SIZE > 16 # define VZEROUPPER vzeroupper @@ -225,13 +217,13 @@ L(start): cmp $VEC_SIZE, %RDX_LP jb L(less_vec) /* Load regardless. */ - VMOVU (%rsi), %VEC(0) + VMOVU (%rsi), %VMM(0) cmp $(VEC_SIZE * 2), %RDX_LP ja L(more_2x_vec) /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */ - VMOVU -VEC_SIZE(%rsi,%rdx), %VEC(1) - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(1), -VEC_SIZE(%rdi,%rdx) + VMOVU -VEC_SIZE(%rsi,%rdx), %VMM(1) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(1), -VEC_SIZE(%rdi,%rdx) #if !(defined USE_MULTIARCH && IS_IN (libc)) ZERO_UPPER_VEC_REGISTERS_RETURN #else @@ -270,15 +262,15 @@ L(start_erms): cmp $VEC_SIZE, %RDX_LP jb L(less_vec) /* Load regardless. */ - VMOVU (%rsi), %VEC(0) + VMOVU (%rsi), %VMM(0) cmp $(VEC_SIZE * 2), %RDX_LP ja L(movsb_more_2x_vec) /* From VEC and to 2 * VEC. 
No branch when size == VEC_SIZE. */ - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(1) - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(1), -VEC_SIZE(%rdi, %rdx) -L(return): + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(1) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(1), -VEC_SIZE(%rdi, %rdx) +L(return_vzeroupper): # if VEC_SIZE > 16 ZERO_UPPER_VEC_REGISTERS_RETURN # else @@ -359,10 +351,10 @@ L(between_16_31): .p2align 4,, 10 L(between_32_63): /* From 32 to 63. No branch when size == 32. */ - VMOVU (%rsi), %YMM0 - VMOVU -32(%rsi, %rdx), %YMM1 - VMOVU %YMM0, (%rdi) - VMOVU %YMM1, -32(%rdi, %rdx) + VMOVU (%rsi), %VMM_256(0) + VMOVU -32(%rsi, %rdx), %VMM_256(1) + VMOVU %VMM_256(0), (%rdi) + VMOVU %VMM_256(1), -32(%rdi, %rdx) VZEROUPPER_RETURN #endif @@ -380,12 +372,12 @@ L(last_4x_vec): /* Copy from 2 * VEC + 1 to 4 * VEC, inclusively. */ /* VEC(0) and VEC(1) have already been loaded. */ - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(2) - VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VEC(3) - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(1), VEC_SIZE(%rdi) - VMOVU %VEC(2), -VEC_SIZE(%rdi, %rdx) - VMOVU %VEC(3), -(VEC_SIZE * 2)(%rdi, %rdx) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(2) + VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VMM(3) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(1), VEC_SIZE(%rdi) + VMOVU %VMM(2), -VEC_SIZE(%rdi, %rdx) + VMOVU %VMM(3), -(VEC_SIZE * 2)(%rdi, %rdx) VZEROUPPER_RETURN .p2align 4 @@ -400,24 +392,24 @@ L(more_2x_vec): cmpq $(VEC_SIZE * 8), %rdx ja L(more_8x_vec) /* Load VEC(1) regardless. VEC(0) has already been loaded. */ - VMOVU VEC_SIZE(%rsi), %VEC(1) + VMOVU VEC_SIZE(%rsi), %VMM(1) cmpq $(VEC_SIZE * 4), %rdx jbe L(last_4x_vec) /* Copy from 4 * VEC + 1 to 8 * VEC, inclusively. 
*/ - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(3) - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(4) - VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VEC(5) - VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VEC(6) - VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VEC(7) - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(1), VEC_SIZE(%rdi) - VMOVU %VEC(2), (VEC_SIZE * 2)(%rdi) - VMOVU %VEC(3), (VEC_SIZE * 3)(%rdi) - VMOVU %VEC(4), -VEC_SIZE(%rdi, %rdx) - VMOVU %VEC(5), -(VEC_SIZE * 2)(%rdi, %rdx) - VMOVU %VEC(6), -(VEC_SIZE * 3)(%rdi, %rdx) - VMOVU %VEC(7), -(VEC_SIZE * 4)(%rdi, %rdx) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(3) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(4) + VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VMM(5) + VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VMM(6) + VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VMM(7) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(1), VEC_SIZE(%rdi) + VMOVU %VMM(2), (VEC_SIZE * 2)(%rdi) + VMOVU %VMM(3), (VEC_SIZE * 3)(%rdi) + VMOVU %VMM(4), -VEC_SIZE(%rdi, %rdx) + VMOVU %VMM(5), -(VEC_SIZE * 2)(%rdi, %rdx) + VMOVU %VMM(6), -(VEC_SIZE * 3)(%rdi, %rdx) + VMOVU %VMM(7), -(VEC_SIZE * 4)(%rdi, %rdx) VZEROUPPER_RETURN .p2align 4,, 4 @@ -466,14 +458,14 @@ L(more_8x_vec_forward): */ /* First vec was already loaded into VEC(0). */ - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(5) - VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VEC(6) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(5) + VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VMM(6) /* Save begining of dst. */ movq %rdi, %rcx /* Align dst to VEC_SIZE - 1. */ orq $(VEC_SIZE - 1), %rdi - VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VEC(7) - VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VEC(8) + VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VMM(7) + VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VMM(8) /* Subtract dst from src. Add back after dst aligned. */ subq %rcx, %rsi @@ -488,25 +480,25 @@ L(more_8x_vec_forward): .p2align 4,, 11 L(loop_4x_vec_forward): /* Copy 4 * VEC a time forward. 
*/ - VMOVU (%rsi), %VEC(1) - VMOVU VEC_SIZE(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(3) - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(4) + VMOVU (%rsi), %VMM(1) + VMOVU VEC_SIZE(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(3) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(4) subq $-(VEC_SIZE * 4), %rsi - VMOVA %VEC(1), (%rdi) - VMOVA %VEC(2), VEC_SIZE(%rdi) - VMOVA %VEC(3), (VEC_SIZE * 2)(%rdi) - VMOVA %VEC(4), (VEC_SIZE * 3)(%rdi) + VMOVA %VMM(1), (%rdi) + VMOVA %VMM(2), VEC_SIZE(%rdi) + VMOVA %VMM(3), (VEC_SIZE * 2)(%rdi) + VMOVA %VMM(4), (VEC_SIZE * 3)(%rdi) subq $-(VEC_SIZE * 4), %rdi cmpq %rdi, %rdx ja L(loop_4x_vec_forward) /* Store the last 4 * VEC. */ - VMOVU %VEC(5), (VEC_SIZE * 3)(%rdx) - VMOVU %VEC(6), (VEC_SIZE * 2)(%rdx) - VMOVU %VEC(7), VEC_SIZE(%rdx) - VMOVU %VEC(8), (%rdx) + VMOVU %VMM(5), (VEC_SIZE * 3)(%rdx) + VMOVU %VMM(6), (VEC_SIZE * 2)(%rdx) + VMOVU %VMM(7), VEC_SIZE(%rdx) + VMOVU %VMM(8), (%rdx) /* Store the first VEC. */ - VMOVU %VEC(0), (%rcx) + VMOVU %VMM(0), (%rcx) /* Keep L(nop_backward) target close to jmp for 2-byte encoding. */ L(nop_backward): @@ -523,12 +515,12 @@ L(more_8x_vec_backward): addresses. */ /* First vec was also loaded into VEC(0). */ - VMOVU VEC_SIZE(%rsi), %VEC(5) - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(6) + VMOVU VEC_SIZE(%rsi), %VMM(5) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(6) /* Begining of region for 4x backward copy stored in rcx. */ leaq (VEC_SIZE * -4 + -1)(%rdi, %rdx), %rcx - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(7) - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(8) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(7) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(8) /* Subtract dst from src. Add back after dst aligned. */ subq %rdi, %rsi /* Align dst. */ @@ -540,25 +532,25 @@ L(more_8x_vec_backward): .p2align 4,, 11 L(loop_4x_vec_backward): /* Copy 4 * VEC a time backward. 
*/ - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(1) - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 1)(%rsi), %VEC(3) - VMOVU (VEC_SIZE * 0)(%rsi), %VEC(4) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(1) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 1)(%rsi), %VMM(3) + VMOVU (VEC_SIZE * 0)(%rsi), %VMM(4) addq $(VEC_SIZE * -4), %rsi - VMOVA %VEC(1), (VEC_SIZE * 3)(%rcx) - VMOVA %VEC(2), (VEC_SIZE * 2)(%rcx) - VMOVA %VEC(3), (VEC_SIZE * 1)(%rcx) - VMOVA %VEC(4), (VEC_SIZE * 0)(%rcx) + VMOVA %VMM(1), (VEC_SIZE * 3)(%rcx) + VMOVA %VMM(2), (VEC_SIZE * 2)(%rcx) + VMOVA %VMM(3), (VEC_SIZE * 1)(%rcx) + VMOVA %VMM(4), (VEC_SIZE * 0)(%rcx) addq $(VEC_SIZE * -4), %rcx cmpq %rcx, %rdi jb L(loop_4x_vec_backward) /* Store the first 4 * VEC. */ - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(5), VEC_SIZE(%rdi) - VMOVU %VEC(6), (VEC_SIZE * 2)(%rdi) - VMOVU %VEC(7), (VEC_SIZE * 3)(%rdi) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(5), VEC_SIZE(%rdi) + VMOVU %VMM(6), (VEC_SIZE * 2)(%rdi) + VMOVU %VMM(7), (VEC_SIZE * 3)(%rdi) /* Store the last VEC. */ - VMOVU %VEC(8), -VEC_SIZE(%rdx, %rdi) + VMOVU %VMM(8), -VEC_SIZE(%rdx, %rdi) VZEROUPPER_RETURN #if defined USE_MULTIARCH && IS_IN (libc) @@ -568,7 +560,7 @@ L(loop_4x_vec_backward): # if ALIGN_MOVSB L(skip_short_movsb_check): # if MOVSB_ALIGN_TO > VEC_SIZE - VMOVU VEC_SIZE(%rsi), %VEC(1) + VMOVU VEC_SIZE(%rsi), %VMM(1) # endif # if MOVSB_ALIGN_TO > (VEC_SIZE * 2) # error Unsupported MOVSB_ALIGN_TO @@ -597,9 +589,9 @@ L(skip_short_movsb_check): rep movsb - VMOVU %VEC(0), (%r8) + VMOVU %VMM(0), (%r8) # if MOVSB_ALIGN_TO > VEC_SIZE - VMOVU %VEC(1), VEC_SIZE(%r8) + VMOVU %VMM(1), VEC_SIZE(%r8) # endif VZEROUPPER_RETURN # endif @@ -640,7 +632,7 @@ L(movsb): # endif # if ALIGN_MOVSB # if MOVSB_ALIGN_TO > VEC_SIZE - VMOVU VEC_SIZE(%rsi), %VEC(1) + VMOVU VEC_SIZE(%rsi), %VMM(1) # endif # if MOVSB_ALIGN_TO > (VEC_SIZE * 2) # error Unsupported MOVSB_ALIGN_TO @@ -664,9 +656,9 @@ L(movsb_align_dst): rep movsb /* Store VECs loaded for aligning. 
*/ - VMOVU %VEC(0), (%r8) + VMOVU %VMM(0), (%r8) # if MOVSB_ALIGN_TO > VEC_SIZE - VMOVU %VEC(1), VEC_SIZE(%r8) + VMOVU %VMM(1), VEC_SIZE(%r8) # endif VZEROUPPER_RETURN # else /* !ALIGN_MOVSB. */ @@ -701,18 +693,18 @@ L(large_memcpy_2x): /* First vec was also loaded into VEC(0). */ # if VEC_SIZE < 64 - VMOVU VEC_SIZE(%rsi), %VEC(1) + VMOVU VEC_SIZE(%rsi), %VMM(1) # if VEC_SIZE < 32 - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(3) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(3) # endif # endif - VMOVU %VEC(0), (%rdi) + VMOVU %VMM(0), (%rdi) # if VEC_SIZE < 64 - VMOVU %VEC(1), VEC_SIZE(%rdi) + VMOVU %VMM(1), VEC_SIZE(%rdi) # if VEC_SIZE < 32 - VMOVU %VEC(2), (VEC_SIZE * 2)(%rdi) - VMOVU %VEC(3), (VEC_SIZE * 3)(%rdi) + VMOVU %VMM(2), (VEC_SIZE * 2)(%rdi) + VMOVU %VMM(3), (VEC_SIZE * 3)(%rdi) # endif # endif @@ -761,12 +753,12 @@ L(loop_large_memcpy_2x_inner): PREFETCH_ONE_SET(1, (%rsi), PAGE_SIZE + PREFETCHED_LOAD_SIZE) PREFETCH_ONE_SET(1, (%rsi), PAGE_SIZE + PREFETCHED_LOAD_SIZE * 2) /* Load vectors from rsi. */ - LOAD_ONE_SET((%rsi), 0, %VEC(0), %VEC(1), %VEC(2), %VEC(3)) - LOAD_ONE_SET((%rsi), PAGE_SIZE, %VEC(4), %VEC(5), %VEC(6), %VEC(7)) + LOAD_ONE_SET((%rsi), 0, %VMM(0), %VMM(1), %VMM(2), %VMM(3)) + LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7)) subq $-LARGE_LOAD_SIZE, %rsi /* Non-temporal store vectors to rdi. */ - STORE_ONE_SET((%rdi), 0, %VEC(0), %VEC(1), %VEC(2), %VEC(3)) - STORE_ONE_SET((%rdi), PAGE_SIZE, %VEC(4), %VEC(5), %VEC(6), %VEC(7)) + STORE_ONE_SET((%rdi), 0, %VMM(0), %VMM(1), %VMM(2), %VMM(3)) + STORE_ONE_SET((%rdi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7)) subq $-LARGE_LOAD_SIZE, %rdi decl %ecx jnz L(loop_large_memcpy_2x_inner) @@ -785,31 +777,31 @@ L(loop_large_memcpy_2x_tail): /* Copy 4 * VEC a time forward with non-temporal stores. 
*/ PREFETCH_ONE_SET (1, (%rsi), PREFETCHED_LOAD_SIZE) PREFETCH_ONE_SET (1, (%rdi), PREFETCHED_LOAD_SIZE) - VMOVU (%rsi), %VEC(0) - VMOVU VEC_SIZE(%rsi), %VEC(1) - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(3) + VMOVU (%rsi), %VMM(0) + VMOVU VEC_SIZE(%rsi), %VMM(1) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(3) subq $-(VEC_SIZE * 4), %rsi addl $-(VEC_SIZE * 4), %edx - VMOVA %VEC(0), (%rdi) - VMOVA %VEC(1), VEC_SIZE(%rdi) - VMOVA %VEC(2), (VEC_SIZE * 2)(%rdi) - VMOVA %VEC(3), (VEC_SIZE * 3)(%rdi) + VMOVA %VMM(0), (%rdi) + VMOVA %VMM(1), VEC_SIZE(%rdi) + VMOVA %VMM(2), (VEC_SIZE * 2)(%rdi) + VMOVA %VMM(3), (VEC_SIZE * 3)(%rdi) subq $-(VEC_SIZE * 4), %rdi cmpl $(VEC_SIZE * 4), %edx ja L(loop_large_memcpy_2x_tail) L(large_memcpy_2x_end): /* Store the last 4 * VEC. */ - VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VEC(0) - VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VEC(1) - VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VEC(2) - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(3) - - VMOVU %VEC(0), -(VEC_SIZE * 4)(%rdi, %rdx) - VMOVU %VEC(1), -(VEC_SIZE * 3)(%rdi, %rdx) - VMOVU %VEC(2), -(VEC_SIZE * 2)(%rdi, %rdx) - VMOVU %VEC(3), -VEC_SIZE(%rdi, %rdx) + VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VMM(0) + VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VMM(1) + VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VMM(2) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(3) + + VMOVU %VMM(0), -(VEC_SIZE * 4)(%rdi, %rdx) + VMOVU %VMM(1), -(VEC_SIZE * 3)(%rdi, %rdx) + VMOVU %VMM(2), -(VEC_SIZE * 2)(%rdi, %rdx) + VMOVU %VMM(3), -VEC_SIZE(%rdi, %rdx) VZEROUPPER_RETURN .p2align 4 @@ -831,16 +823,16 @@ L(loop_large_memcpy_4x_inner): PREFETCH_ONE_SET(1, (%rsi), PAGE_SIZE * 2 + PREFETCHED_LOAD_SIZE) PREFETCH_ONE_SET(1, (%rsi), PAGE_SIZE * 3 + PREFETCHED_LOAD_SIZE) /* Load vectors from rsi. 
*/ - LOAD_ONE_SET((%rsi), 0, %VEC(0), %VEC(1), %VEC(2), %VEC(3)) - LOAD_ONE_SET((%rsi), PAGE_SIZE, %VEC(4), %VEC(5), %VEC(6), %VEC(7)) - LOAD_ONE_SET((%rsi), PAGE_SIZE * 2, %VEC(8), %VEC(9), %VEC(10), %VEC(11)) - LOAD_ONE_SET((%rsi), PAGE_SIZE * 3, %VEC(12), %VEC(13), %VEC(14), %VEC(15)) + LOAD_ONE_SET((%rsi), 0, %VMM(0), %VMM(1), %VMM(2), %VMM(3)) + LOAD_ONE_SET((%rsi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7)) + LOAD_ONE_SET((%rsi), PAGE_SIZE * 2, %VMM(8), %VMM(9), %VMM(10), %VMM(11)) + LOAD_ONE_SET((%rsi), PAGE_SIZE * 3, %VMM(12), %VMM(13), %VMM(14), %VMM(15)) subq $-LARGE_LOAD_SIZE, %rsi /* Non-temporal store vectors to rdi. */ - STORE_ONE_SET((%rdi), 0, %VEC(0), %VEC(1), %VEC(2), %VEC(3)) - STORE_ONE_SET((%rdi), PAGE_SIZE, %VEC(4), %VEC(5), %VEC(6), %VEC(7)) - STORE_ONE_SET((%rdi), PAGE_SIZE * 2, %VEC(8), %VEC(9), %VEC(10), %VEC(11)) - STORE_ONE_SET((%rdi), PAGE_SIZE * 3, %VEC(12), %VEC(13), %VEC(14), %VEC(15)) + STORE_ONE_SET((%rdi), 0, %VMM(0), %VMM(1), %VMM(2), %VMM(3)) + STORE_ONE_SET((%rdi), PAGE_SIZE, %VMM(4), %VMM(5), %VMM(6), %VMM(7)) + STORE_ONE_SET((%rdi), PAGE_SIZE * 2, %VMM(8), %VMM(9), %VMM(10), %VMM(11)) + STORE_ONE_SET((%rdi), PAGE_SIZE * 3, %VMM(12), %VMM(13), %VMM(14), %VMM(15)) subq $-LARGE_LOAD_SIZE, %rdi decl %ecx jnz L(loop_large_memcpy_4x_inner) @@ -858,31 +850,31 @@ L(loop_large_memcpy_4x_tail): /* Copy 4 * VEC a time forward with non-temporal stores. 
*/ PREFETCH_ONE_SET (1, (%rsi), PREFETCHED_LOAD_SIZE) PREFETCH_ONE_SET (1, (%rdi), PREFETCHED_LOAD_SIZE) - VMOVU (%rsi), %VEC(0) - VMOVU VEC_SIZE(%rsi), %VEC(1) - VMOVU (VEC_SIZE * 2)(%rsi), %VEC(2) - VMOVU (VEC_SIZE * 3)(%rsi), %VEC(3) + VMOVU (%rsi), %VMM(0) + VMOVU VEC_SIZE(%rsi), %VMM(1) + VMOVU (VEC_SIZE * 2)(%rsi), %VMM(2) + VMOVU (VEC_SIZE * 3)(%rsi), %VMM(3) subq $-(VEC_SIZE * 4), %rsi addl $-(VEC_SIZE * 4), %edx - VMOVA %VEC(0), (%rdi) - VMOVA %VEC(1), VEC_SIZE(%rdi) - VMOVA %VEC(2), (VEC_SIZE * 2)(%rdi) - VMOVA %VEC(3), (VEC_SIZE * 3)(%rdi) + VMOVA %VMM(0), (%rdi) + VMOVA %VMM(1), VEC_SIZE(%rdi) + VMOVA %VMM(2), (VEC_SIZE * 2)(%rdi) + VMOVA %VMM(3), (VEC_SIZE * 3)(%rdi) subq $-(VEC_SIZE * 4), %rdi cmpl $(VEC_SIZE * 4), %edx ja L(loop_large_memcpy_4x_tail) L(large_memcpy_4x_end): /* Store the last 4 * VEC. */ - VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VEC(0) - VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VEC(1) - VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VEC(2) - VMOVU -VEC_SIZE(%rsi, %rdx), %VEC(3) - - VMOVU %VEC(0), -(VEC_SIZE * 4)(%rdi, %rdx) - VMOVU %VEC(1), -(VEC_SIZE * 3)(%rdi, %rdx) - VMOVU %VEC(2), -(VEC_SIZE * 2)(%rdi, %rdx) - VMOVU %VEC(3), -VEC_SIZE(%rdi, %rdx) + VMOVU -(VEC_SIZE * 4)(%rsi, %rdx), %VMM(0) + VMOVU -(VEC_SIZE * 3)(%rsi, %rdx), %VMM(1) + VMOVU -(VEC_SIZE * 2)(%rsi, %rdx), %VMM(2) + VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(3) + + VMOVU %VMM(0), -(VEC_SIZE * 4)(%rdi, %rdx) + VMOVU %VMM(1), -(VEC_SIZE * 3)(%rdi, %rdx) + VMOVU %VMM(2), -(VEC_SIZE * 2)(%rdi, %rdx) + VMOVU %VMM(3), -VEC_SIZE(%rdi, %rdx) VZEROUPPER_RETURN #endif END (MEMMOVE_SYMBOL (__memmove, unaligned_erms))
From patchwork Fri Oct 14 22:39:13 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58873
To: libc-alpha@sourceware.org
Subject: [PATCH v6 6/7] x86: Update memset to use new VEC macros
Date: Fri, 14 Oct 2022 17:39:13 -0500
Message-Id: <20221014223914.700492-6-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

Replace %VEC(n) -> %VMM(n)

This commit does not change libc.so

Tested build on x86-64
---
 .../memset-avx2-unaligned-erms-rtm.S          |  8 +--
 .../multiarch/memset-avx2-unaligned-erms.S    | 14 +---
 .../multiarch/memset-avx512-unaligned-erms.S  | 20 +-----
 .../multiarch/memset-evex-unaligned-erms.S    | 20 +-----
 .../multiarch/memset-sse2-unaligned-erms.S    | 10 +--
 .../multiarch/memset-vec-unaligned-erms.S     | 70 ++++++++-----------
 6 files changed, 43 insertions(+), 99 deletions(-)

diff --git
a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S index 8ac3e479bb..bc8605faf3 100644 --- a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S +++ b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S @@ -1,10 +1,6 @@ -#define ZERO_UPPER_VEC_REGISTERS_RETURN \ - ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST +#include "x86-avx-rtm-vecs.h" -#define VZEROUPPER_RETURN jmp L(return) - -#define SECTION(p) p##.avx.rtm #define MEMSET_SYMBOL(p,s) p##_avx2_##s##_rtm #define WMEMSET_SYMBOL(p,s) p##_avx2_##s##_rtm -#include "memset-avx2-unaligned-erms.S" +# include "memset-avx2-unaligned-erms.S" diff --git a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S index a9054a9122..47cf5072a4 100644 --- a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S @@ -4,14 +4,9 @@ # define USE_WITH_AVX2 1 -# define VEC_SIZE 32 -# define MOV_SIZE 4 -# define RET_SIZE 4 - -# define VEC(i) ymm##i - -# define VMOVU vmovdqu -# define VMOVA vmovdqa +# ifndef VEC_SIZE +# include "x86-avx-vecs.h" +# endif # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ vmovd d, %xmm0; \ @@ -26,9 +21,6 @@ # define WMEMSET_VDUP_TO_VEC0_HIGH() vpbroadcastd %xmm0, %ymm0 # define WMEMSET_VDUP_TO_VEC0_LOW() vpbroadcastd %xmm0, %xmm0 -# ifndef SECTION -# define SECTION(p) p##.avx -# endif # ifndef MEMSET_SYMBOL # define MEMSET_SYMBOL(p,s) p##_avx2_##s # endif diff --git a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S index 47623b8ee8..84145b6c27 100644 --- a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S @@ -4,26 +4,14 @@ # define USE_WITH_AVX512 1 -# define VEC_SIZE 64 -# define MOV_SIZE 6 -# define RET_SIZE 1 - -# define XMM0 xmm16 -# define YMM0 ymm16 -# define VEC0 zmm16 -# define VEC(i) 
VEC##i - -# define VMOVU vmovdqu64 -# define VMOVA vmovdqa64 - -# define VZEROUPPER +# include "x86-evex512-vecs.h" # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ - vpbroadcastb d, %VEC0; \ + vpbroadcastb d, %VMM(0); \ movq r, %rax # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ - vpbroadcastd d, %VEC0; \ + vpbroadcastd d, %VMM(0); \ movq r, %rax # define MEMSET_VDUP_TO_VEC0_HIGH() @@ -32,8 +20,6 @@ # define WMEMSET_VDUP_TO_VEC0_HIGH() # define WMEMSET_VDUP_TO_VEC0_LOW() -# define SECTION(p) p##.evex512 - #ifndef MEMSET_SYMBOL # define MEMSET_SYMBOL(p,s) p##_avx512_##s #endif diff --git a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S index ac4b2d2d50..1f03b26bf8 100644 --- a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S @@ -4,26 +4,14 @@ # define USE_WITH_EVEX 1 -# define VEC_SIZE 32 -# define MOV_SIZE 6 -# define RET_SIZE 1 - -# define XMM0 xmm16 -# define YMM0 ymm16 -# define VEC0 ymm16 -# define VEC(i) VEC##i - -# define VMOVU vmovdqu64 -# define VMOVA vmovdqa64 - -# define VZEROUPPER +# include "x86-evex256-vecs.h" # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ - vpbroadcastb d, %VEC0; \ + vpbroadcastb d, %VMM(0); \ movq r, %rax # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ - vpbroadcastd d, %VEC0; \ + vpbroadcastd d, %VMM(0); \ movq r, %rax # define MEMSET_VDUP_TO_VEC0_HIGH() @@ -32,8 +20,6 @@ # define WMEMSET_VDUP_TO_VEC0_HIGH() # define WMEMSET_VDUP_TO_VEC0_LOW() -# define SECTION(p) p##.evex - #ifndef MEMSET_SYMBOL # define MEMSET_SYMBOL(p,s) p##_evex_##s #endif diff --git a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S index 44f9b8888b..34b245d8ca 100644 --- a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S @@ -26,13 +26,7 @@ # include # define USE_WITH_SSE2 1 -# define VEC_SIZE 16 -# define 
MOV_SIZE 3 -# define RET_SIZE 1 - -# define VEC(i) xmm##i -# define VMOVU movups -# define VMOVA movaps +# include "x86-sse2-vecs.h" # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \ movd d, %xmm0; \ @@ -52,8 +46,6 @@ # define WMEMSET_VDUP_TO_VEC0_HIGH() # define WMEMSET_VDUP_TO_VEC0_LOW() -# define SECTION(p) p - # ifndef MEMSET_SYMBOL # define MEMSET_SYMBOL(p,s) p##_sse2_##s # endif diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S index 905d0fa464..03de0ab907 100644 --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S @@ -34,14 +34,6 @@ # define WMEMSET_CHK_SYMBOL(p,s) WMEMSET_SYMBOL(p, s) #endif -#ifndef XMM0 -# define XMM0 xmm0 -#endif - -#ifndef YMM0 -# define YMM0 ymm0 -#endif - #ifndef VZEROUPPER # if VEC_SIZE > 16 # define VZEROUPPER vzeroupper @@ -150,8 +142,8 @@ L(entry_from_wmemset): cmpq $(VEC_SIZE * 2), %rdx ja L(more_2x_vec) /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */ - VMOVU %VEC(0), -VEC_SIZE(%rdi,%rdx) - VMOVU %VEC(0), (%rdi) + VMOVU %VMM(0), -VEC_SIZE(%rdi,%rdx) + VMOVU %VMM(0), (%rdi) VZEROUPPER_RETURN #if defined USE_MULTIARCH && IS_IN (libc) END (MEMSET_SYMBOL (__memset, unaligned)) @@ -175,19 +167,19 @@ ENTRY_P2ALIGN (MEMSET_SYMBOL (__memset, unaligned_erms), 6) cmp $(VEC_SIZE * 2), %RDX_LP ja L(stosb_more_2x_vec) /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. 
*/ - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(0), (VEC_SIZE * -1)(%rdi, %rdx) VZEROUPPER_RETURN #endif .p2align 4,, 4 L(last_2x_vec): #ifdef USE_LESS_VEC_MASK_STORE - VMOVU %VEC(0), (VEC_SIZE * -2)(%rdi, %rdx) - VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx) + VMOVU %VMM(0), (VEC_SIZE * -2)(%rdi, %rdx) + VMOVU %VMM(0), (VEC_SIZE * -1)(%rdi, %rdx) #else - VMOVU %VEC(0), (VEC_SIZE * -2)(%rdi) - VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi) + VMOVU %VMM(0), (VEC_SIZE * -2)(%rdi) + VMOVU %VMM(0), (VEC_SIZE * -1)(%rdi) #endif VZEROUPPER_RETURN @@ -221,7 +213,7 @@ L(less_vec_from_wmemset): bzhil %edx, %ecx, %ecx kmovd %ecx, %k1 # endif - vmovdqu8 %VEC(0), (%rax){%k1} + vmovdqu8 %VMM(0), (%rax){%k1} VZEROUPPER_RETURN # if defined USE_MULTIARCH && IS_IN (libc) @@ -249,8 +241,8 @@ L(stosb_more_2x_vec): and (4x, 8x] jump to target. */ L(more_2x_vec): /* Store next 2x vec regardless. */ - VMOVU %VEC(0), (%rdi) - VMOVU %VEC(0), (VEC_SIZE * 1)(%rdi) + VMOVU %VMM(0), (%rdi) + VMOVU %VMM(0), (VEC_SIZE * 1)(%rdi) /* Two different methods of setting up pointers / compare. The two @@ -278,8 +270,8 @@ L(more_2x_vec): #endif /* Store next 2x vec regardless. 
*/ - VMOVU %VEC(0), (VEC_SIZE * 2)(%rax) - VMOVU %VEC(0), (VEC_SIZE * 3)(%rax) + VMOVU %VMM(0), (VEC_SIZE * 2)(%rax) + VMOVU %VMM(0), (VEC_SIZE * 3)(%rax) #if defined USE_WITH_EVEX || defined USE_WITH_AVX512 @@ -304,20 +296,20 @@ L(more_2x_vec): andq $(VEC_SIZE * -2), %LOOP_REG .p2align 4 L(loop): - VMOVA %VEC(0), LOOP_4X_OFFSET(%LOOP_REG) - VMOVA %VEC(0), (VEC_SIZE + LOOP_4X_OFFSET)(%LOOP_REG) - VMOVA %VEC(0), (VEC_SIZE * 2 + LOOP_4X_OFFSET)(%LOOP_REG) - VMOVA %VEC(0), (VEC_SIZE * 3 + LOOP_4X_OFFSET)(%LOOP_REG) + VMOVA %VMM(0), LOOP_4X_OFFSET(%LOOP_REG) + VMOVA %VMM(0), (VEC_SIZE + LOOP_4X_OFFSET)(%LOOP_REG) + VMOVA %VMM(0), (VEC_SIZE * 2 + LOOP_4X_OFFSET)(%LOOP_REG) + VMOVA %VMM(0), (VEC_SIZE * 3 + LOOP_4X_OFFSET)(%LOOP_REG) subq $-(VEC_SIZE * 4), %LOOP_REG cmpq %END_REG, %LOOP_REG jb L(loop) .p2align 4,, MOV_SIZE L(last_4x_vec): - VMOVU %VEC(0), LOOP_4X_OFFSET(%END_REG) - VMOVU %VEC(0), (VEC_SIZE + LOOP_4X_OFFSET)(%END_REG) - VMOVU %VEC(0), (VEC_SIZE * 2 + LOOP_4X_OFFSET)(%END_REG) - VMOVU %VEC(0), (VEC_SIZE * 3 + LOOP_4X_OFFSET)(%END_REG) -L(return): + VMOVU %VMM(0), LOOP_4X_OFFSET(%END_REG) + VMOVU %VMM(0), (VEC_SIZE + LOOP_4X_OFFSET)(%END_REG) + VMOVU %VMM(0), (VEC_SIZE * 2 + LOOP_4X_OFFSET)(%END_REG) + VMOVU %VMM(0), (VEC_SIZE * 3 + LOOP_4X_OFFSET)(%END_REG) +L(return_vzeroupper): #if VEC_SIZE > 16 ZERO_UPPER_VEC_REGISTERS_RETURN #else @@ -355,7 +347,7 @@ L(cross_page): jge L(between_16_31) #endif #ifndef USE_XMM_LESS_VEC - MOVQ %XMM0, %SET_REG64 + MOVQ %VMM_128(0), %SET_REG64 #endif cmpl $8, %edx jge L(between_8_15) @@ -374,8 +366,8 @@ L(between_0_0): .p2align 4,, SMALL_MEMSET_ALIGN(MOV_SIZE, RET_SIZE) /* From 32 to 63. No branch when size == 32. 
*/ L(between_32_63): - VMOVU %YMM0, (%LESS_VEC_REG) - VMOVU %YMM0, -32(%LESS_VEC_REG, %rdx) + VMOVU %VMM_256(0), (%LESS_VEC_REG) + VMOVU %VMM_256(0), -32(%LESS_VEC_REG, %rdx) VZEROUPPER_RETURN #endif @@ -383,8 +375,8 @@ L(between_32_63): .p2align 4,, SMALL_MEMSET_ALIGN(MOV_SIZE, 1) L(between_16_31): /* From 16 to 31. No branch when size == 16. */ - VMOVU %XMM0, (%LESS_VEC_REG) - VMOVU %XMM0, -16(%LESS_VEC_REG, %rdx) + VMOVU %VMM_128(0), (%LESS_VEC_REG) + VMOVU %VMM_128(0), -16(%LESS_VEC_REG, %rdx) ret #endif @@ -394,8 +386,8 @@ L(between_16_31): L(between_8_15): /* From 8 to 15. No branch when size == 8. */ #ifdef USE_XMM_LESS_VEC - MOVQ %XMM0, (%rdi) - MOVQ %XMM0, -8(%rdi, %rdx) + MOVQ %VMM_128(0), (%rdi) + MOVQ %VMM_128(0), -8(%rdi, %rdx) #else movq %SET_REG64, (%LESS_VEC_REG) movq %SET_REG64, -8(%LESS_VEC_REG, %rdx) @@ -408,8 +400,8 @@ L(between_8_15): L(between_4_7): /* From 4 to 7. No branch when size == 4. */ #ifdef USE_XMM_LESS_VEC - MOVD %XMM0, (%rdi) - MOVD %XMM0, -4(%rdi, %rdx) + MOVD %VMM_128(0), (%rdi) + MOVD %VMM_128(0), -4(%rdi, %rdx) #else movl %SET_REG32, (%LESS_VEC_REG) movl %SET_REG32, -4(%LESS_VEC_REG, %rdx)
From patchwork Fri Oct 14 22:39:14 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 58874
To: libc-alpha@sourceware.org
Subject: [PATCH v6 7/7] x86: Update strlen-evex-base to use new reg/vec macros.
Date: Fri, 14 Oct 2022 17:39:14 -0500
Message-Id: <20221014223914.700492-7-goldstein.w.n@gmail.com>
In-Reply-To: <20221014223914.700492-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014223914.700492-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

To avoid duplicating the VMM / GPR / mask insn macros in all incoming evex512 files, use the macros defined in 'reg-macros.h' and '{vec}-macros.h'.

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/strlen-evex-base.S | 116 +++++++-------------
 sysdeps/x86_64/multiarch/strlen-evex512.S   |   4 +-
 2 files changed, 44 insertions(+), 76 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/strlen-evex-base.S b/sysdeps/x86_64/multiarch/strlen-evex-base.S index 418e9f8411..c832b15a48 100644 --- a/sysdeps/x86_64/multiarch/strlen-evex-base.S +++ b/sysdeps/x86_64/multiarch/strlen-evex-base.S @@
-36,42 +36,10 @@ # define CHAR_SIZE 1 # endif -# define XMM0 xmm16 # define PAGE_SIZE 4096 # define CHAR_PER_VEC (VEC_SIZE / CHAR_SIZE) -# if VEC_SIZE == 64 -# define KMOV kmovq -# define KORTEST kortestq -# define RAX rax -# define RCX rcx -# define RDX rdx -# define SHR shrq -# define TEXTSUFFIX evex512 -# define VMM0 zmm16 -# define VMM1 zmm17 -# define VMM2 zmm18 -# define VMM3 zmm19 -# define VMM4 zmm20 -# define VMOVA vmovdqa64 -# elif VEC_SIZE == 32 -/* Currently Unused. */ -# define KMOV kmovd -# define KORTEST kortestd -# define RAX eax -# define RCX ecx -# define RDX edx -# define SHR shrl -# define TEXTSUFFIX evex256 -# define VMM0 ymm16 -# define VMM1 ymm17 -# define VMM2 ymm18 -# define VMM3 ymm19 -# define VMM4 ymm20 -# define VMOVA vmovdqa32 -# endif - - .section .text.TEXTSUFFIX, "ax", @progbits + .section SECTION(.text),"ax",@progbits /* Aligning entry point to 64 byte, provides better performance for one vector length string. */ ENTRY_P2ALIGN (STRLEN, 6) @@ -86,18 +54,18 @@ ENTRY_P2ALIGN (STRLEN, 6) # endif movl %edi, %eax - vpxorq %XMM0, %XMM0, %XMM0 + vpxorq %VMM_128(0), %VMM_128(0), %VMM_128(0) andl $(PAGE_SIZE - 1), %eax cmpl $(PAGE_SIZE - VEC_SIZE), %eax ja L(page_cross) /* Compare [w]char for null, mask bit will be set for match. */ - VPCMP $0, (%rdi), %VMM0, %k0 - KMOV %k0, %RAX - test %RAX, %RAX + VPCMP $0, (%rdi), %VMM(0), %k0 + KMOV %k0, %VRAX + test %VRAX, %VRAX jz L(align_more) - bsf %RAX, %RAX + bsf %VRAX, %VRAX # ifdef USE_AS_STRNLEN cmpq %rsi, %rax cmovnb %rsi, %rax @@ -120,7 +88,7 @@ L(align_more): movq %rax, %rdx subq %rdi, %rdx # ifdef USE_AS_WCSLEN - SHR $2, %RDX + shr $2, %VRDX # endif /* At this point rdx contains [w]chars already compared. */ subq %rsi, %rdx @@ -131,9 +99,9 @@ L(align_more): # endif /* Loop unroll 4 times for 4 vector loop. 
*/ - VPCMP $0, (%rax), %VMM0, %k0 - KMOV %k0, %RCX - test %RCX, %RCX + VPCMP $0, (%rax), %VMM(0), %k0 + KMOV %k0, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x1) # ifdef USE_AS_STRNLEN @@ -141,9 +109,9 @@ L(align_more): jbe L(ret_max) # endif - VPCMP $0, VEC_SIZE(%rax), %VMM0, %k0 - KMOV %k0, %RCX - test %RCX, %RCX + VPCMP $0, VEC_SIZE(%rax), %VMM(0), %k0 + KMOV %k0, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x2) # ifdef USE_AS_STRNLEN @@ -151,9 +119,9 @@ L(align_more): jbe L(ret_max) # endif - VPCMP $0, (VEC_SIZE * 2)(%rax), %VMM0, %k0 - KMOV %k0, %RCX - test %RCX, %RCX + VPCMP $0, (VEC_SIZE * 2)(%rax), %VMM(0), %k0 + KMOV %k0, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x3) # ifdef USE_AS_STRNLEN @@ -161,9 +129,9 @@ L(align_more): jbe L(ret_max) # endif - VPCMP $0, (VEC_SIZE * 3)(%rax), %VMM0, %k0 - KMOV %k0, %RCX - test %RCX, %RCX + VPCMP $0, (VEC_SIZE * 3)(%rax), %VMM(0), %k0 + KMOV %k0, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x4) # ifdef USE_AS_STRNLEN @@ -179,7 +147,7 @@ L(align_more): # ifdef USE_AS_STRNLEN subq %rax, %rcx # ifdef USE_AS_WCSLEN - SHR $2, %RCX + shr $2, %VRCX # endif /* rcx contains number of [w]char will be recompared due to alignment fixes. rdx must be incremented by rcx to offset @@ -199,42 +167,42 @@ L(loop_entry): # endif /* VPMINU and VPCMP combination provide better performance as compared to alternative combinations. 
*/ - VMOVA (VEC_SIZE * 4)(%rax), %VMM1 - VPMINU (VEC_SIZE * 5)(%rax), %VMM1, %VMM2 - VMOVA (VEC_SIZE * 6)(%rax), %VMM3 - VPMINU (VEC_SIZE * 7)(%rax), %VMM3, %VMM4 + VMOVA (VEC_SIZE * 4)(%rax), %VMM(1) + VPMINU (VEC_SIZE * 5)(%rax), %VMM(1), %VMM(2) + VMOVA (VEC_SIZE * 6)(%rax), %VMM(3) + VPMINU (VEC_SIZE * 7)(%rax), %VMM(3), %VMM(4) - VPTESTN %VMM2, %VMM2, %k0 - VPTESTN %VMM4, %VMM4, %k1 + VPTESTN %VMM(2), %VMM(2), %k0 + VPTESTN %VMM(4), %VMM(4), %k1 subq $-(VEC_SIZE * 4), %rax KORTEST %k0, %k1 jz L(loop) - VPTESTN %VMM1, %VMM1, %k2 - KMOV %k2, %RCX - test %RCX, %RCX + VPTESTN %VMM(1), %VMM(1), %k2 + KMOV %k2, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x1) - KMOV %k0, %RCX + KMOV %k0, %VRCX /* At this point, if k0 is non zero, null char must be in the second vector. */ - test %RCX, %RCX + test %VRCX, %VRCX jnz L(ret_vec_x2) - VPTESTN %VMM3, %VMM3, %k3 - KMOV %k3, %RCX - test %RCX, %RCX + VPTESTN %VMM(3), %VMM(3), %k3 + KMOV %k3, %VRCX + test %VRCX, %VRCX jnz L(ret_vec_x3) /* At this point null [w]char must be in the fourth vector so no need to check. */ - KMOV %k1, %RCX + KMOV %k1, %VRCX /* Fourth, third, second vector terminating are pretty much same, implemented this way to avoid branching and reuse code from pre loop exit condition. */ L(ret_vec_x4): - bsf %RCX, %RCX + bsf %VRCX, %VRCX subq %rdi, %rax # ifdef USE_AS_WCSLEN subq $-(VEC_SIZE * 3), %rax @@ -250,7 +218,7 @@ L(ret_vec_x4): ret L(ret_vec_x3): - bsf %RCX, %RCX + bsf %VRCX, %VRCX subq %rdi, %rax # ifdef USE_AS_WCSLEN subq $-(VEC_SIZE * 2), %rax @@ -268,7 +236,7 @@ L(ret_vec_x3): L(ret_vec_x2): subq $-VEC_SIZE, %rax L(ret_vec_x1): - bsf %RCX, %RCX + bsf %VRCX, %VRCX subq %rdi, %rax # ifdef USE_AS_WCSLEN shrq $2, %rax @@ -289,13 +257,13 @@ L(page_cross): /* ecx contains number of w[char] to be skipped as a result of address alignment. 
*/ xorq %rdi, %rax - VPCMP $0, (PAGE_SIZE - VEC_SIZE)(%rax), %VMM0, %k0 - KMOV %k0, %RAX + VPCMP $0, (PAGE_SIZE - VEC_SIZE)(%rax), %VMM(0), %k0 + KMOV %k0, %VRAX /* Ignore number of character for alignment adjustment. */ - SHR %cl, %RAX + shr %cl, %VRAX jz L(align_more) - bsf %RAX, %RAX + bsf %VRAX, %VRAX # ifdef USE_AS_STRNLEN cmpq %rsi, %rax cmovnb %rsi, %rax diff --git a/sysdeps/x86_64/multiarch/strlen-evex512.S b/sysdeps/x86_64/multiarch/strlen-evex512.S index 116f8981c8..10c3415c8a 100644 --- a/sysdeps/x86_64/multiarch/strlen-evex512.S +++ b/sysdeps/x86_64/multiarch/strlen-evex512.S @@ -2,6 +2,6 @@ # define STRLEN __strlen_evex512 #endif -#define VEC_SIZE 64 - +#include "x86-evex512-vecs.h" +#include "reg-macros.h" #include "strlen-evex-base.S"