From patchwork Mon Jun 27 19:06:59 2022
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 55456
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH] x86-64: Only define used SSE/AVX/AVX512 run-time resolvers
Date: Mon, 27 Jun 2022 12:06:59 -0700
Message-Id: <20220627190659.831144-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.36.1

When glibc is built with x86-64 ISA level v3, SSE run-time resolvers
aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers
are unused.

1. Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to
isa-level.h.
2. Check the minimum x86-64 ISA level to exclude the unused run-time
resolvers.
---
 sysdeps/x86/isa-ifunc-macros.h | 27 ----------------
 sysdeps/x86/isa-level.h        | 26 +++++++++++++++
 sysdeps/x86_64/dl-machine.h    | 12 ++++---
 sysdeps/x86_64/dl-trampoline.S | 59 ++++++++++++++++++----------------
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/sysdeps/x86/isa-ifunc-macros.h b/sysdeps/x86/isa-ifunc-macros.h
index d69905689b..f967a1bec6 100644
--- a/sysdeps/x86/isa-ifunc-macros.h
+++ b/sysdeps/x86/isa-ifunc-macros.h
@@ -56,31 +56,4 @@
 # define X86_IFUNC_IMPL_ADD_V1(...)
 #endif
 
-/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P
-   macros are wrappers for the the respective
-   CPU_FEATURE{S}_{USABLE|ARCH}_P runtime checks. They differ in two
-   ways.
-
-    1. The USABLE_P version is evaluated to true when the feature
-       is enabled.
-
-    2. The ARCH_P version has a third argument `not`. The `not`
-       argument can either be '!' or empty. If the feature is
-       enabled above an ISA level, the third argument should be empty
-       and the expression is evaluated to true when the feature is
-       enabled. If the feature is disabled above an ISA level, the
-       third argument should be `!` and the expression is evaluated
-       to true when the feature is disabled.
- */
-
-#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name)                        \
-  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL)                   \
-   || CPU_FEATURE_USABLE_P (ptr, name))
-
-
-#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not)                    \
-  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL)                   \
-   || not CPU_FEATURES_ARCH_P (ptr, name))
-
-
 #endif
diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index 075e7c6ee1..f293aea906 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -68,10 +68,12 @@
    compile-time constant.. */
 
 /* ISA level >= 4 guaranteed includes. */
+#define AVX512F_X86_ISA_LEVEL 4
 #define AVX512VL_X86_ISA_LEVEL 4
 #define AVX512BW_X86_ISA_LEVEL 4
 
 /* ISA level >= 3 guaranteed includes. */
+#define AVX_X86_ISA_LEVEL 3
 #define AVX2_X86_ISA_LEVEL 3
 #define BMI2_X86_ISA_LEVEL 3
 
@@ -87,6 +89,30 @@
    when ISA level < 3. */
 #define Prefer_No_VZEROUPPER_X86_ISA_LEVEL 3
 
+/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P
+   macros are wrappers for the respective CPU_FEATURE{S}_{USABLE|ARCH}_P
+   runtime checks. They differ in two ways.
+
+    1. The USABLE_P version is evaluated to true when the feature
+       is enabled.
+
+    2. The ARCH_P version has a third argument `not`. The `not`
+       argument can either be `!` or empty. If the feature is
+       enabled above an ISA level, the third argument should be empty
+       and the expression is evaluated to true when the feature is
+       enabled. If the feature is disabled above an ISA level, the
+       third argument should be `!` and the expression is evaluated
+       to true when the feature is disabled.
+ */
+
+#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name)                        \
+  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL)                   \
+   || CPU_FEATURE_USABLE_P (ptr, name))
+
+#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not)                    \
+  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL)                   \
+   || not CPU_FEATURES_ARCH_P (ptr, name))
+
 #define ISA_SHOULD_BUILD(isa_build_level)                              \
   (MINIMUM_X86_ISA_LEVEL <= (isa_build_level) && IS_IN (libc))         \
    || defined ISA_DEFAULT_IMPL
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 34766325ae..005d089501 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 /* Return nonzero iff ELF header is compatible with the running host. */
 static inline int __attribute__ ((unused))
@@ -86,6 +87,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       /* Identify this shared object. */
       *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
 
+      const struct cpu_features* cpu_features = __get_cpu_features ();
+
       /* The got[2] entry contains the address of a function which gets
          called to get the address of a so far unresolved function and
          jump to it. The profiling extension of the dynamic linker allows
@@ -94,9 +97,9 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
          end in this function. */
       if (__glibc_unlikely (profile))
         {
-          if (CPU_FEATURE_USABLE (AVX512F))
+          if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
-          else if (CPU_FEATURE_USABLE (AVX))
+          else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx;
           else
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_sse;
@@ -112,9 +115,10 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
           /* This function will get called to fix up the GOT entry
              indicated by the offset on the stack, and then jump to
              the resolved address. */
-          if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
+          if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
+              || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
             *(ElfW(Addr) *) (got + 2)
-              = (CPU_FEATURE_USABLE (XSAVEC)
+              = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
                  ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
                  : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
           else
diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index 831a654713..f669805ac5 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef DL_STACK_ALIGNMENT
 /* Due to GCC bug:
@@ -62,35 +63,39 @@
 #undef VMOVA
 #undef VEC_SIZE
 
-#define VEC_SIZE	32
-#define VMOVA		vmovdqa
-#define VEC(i)		ymm##i
-#define _dl_runtime_profile	_dl_runtime_profile_avx
-#include "dl-trampoline.h"
-#undef _dl_runtime_profile
-#undef VEC
-#undef VMOVA
-#undef VEC_SIZE
+#if MINIMUM_X86_ISA_LEVEL <= AVX_X86_ISA_LEVEL
+# define VEC_SIZE	32
+# define VMOVA		vmovdqa
+# define VEC(i)		ymm##i
+# define _dl_runtime_profile	_dl_runtime_profile_avx
+# include "dl-trampoline.h"
+# undef _dl_runtime_profile
+# undef VEC
+# undef VMOVA
+# undef VEC_SIZE
+#endif
 
+#if MINIMUM_X86_ISA_LEVEL < AVX_X86_ISA_LEVEL
 /* movaps/movups is 1-byte shorter. */
-#define VEC_SIZE	16
-#define VMOVA		movaps
-#define VEC(i)		xmm##i
-#define _dl_runtime_profile	_dl_runtime_profile_sse
-#undef RESTORE_AVX
-#include "dl-trampoline.h"
-#undef _dl_runtime_profile
-#undef VEC
-#undef VMOVA
-#undef VEC_SIZE
-
-#define USE_FXSAVE
-#define STATE_SAVE_ALIGNMENT	16
-#define _dl_runtime_resolve	_dl_runtime_resolve_fxsave
-#include "dl-trampoline.h"
-#undef _dl_runtime_resolve
-#undef USE_FXSAVE
-#undef STATE_SAVE_ALIGNMENT
+# define VEC_SIZE	16
+# define VMOVA		movaps
+# define VEC(i)		xmm##i
+# define _dl_runtime_profile	_dl_runtime_profile_sse
+# undef RESTORE_AVX
+# include "dl-trampoline.h"
+# undef _dl_runtime_profile
+# undef VEC
+# undef VMOVA
+# undef VEC_SIZE
+
+# define USE_FXSAVE
+# define STATE_SAVE_ALIGNMENT	16
+# define _dl_runtime_resolve	_dl_runtime_resolve_fxsave
+# include "dl-trampoline.h"
+# undef _dl_runtime_resolve
+# undef USE_FXSAVE
+# undef STATE_SAVE_ALIGNMENT
+#endif
 
 #define USE_XSAVE
 #define STATE_SAVE_ALIGNMENT	64