From: Erich Elsen
Date: Tue, 23 May 2017 13:39:06 -0700
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: "H.J. Lu"
Cc: "Carlos O'Donell", GNU C Library

I was also thinking that it might be nice to have a TUNABLE that sets
the implementation of memcpy directly. That would be easier to do if
memcpy.S were converted to memcpy.c. Attached is a patch that does the
conversion but doesn't add the tunables. How would you feel about this?
It has no runtime impact, probably increases the code size slightly,
and makes the code easier to read and modify.

On Mon, May 22, 2017 at 8:19 PM, Erich Elsen wrote:
> Here is the patch that slightly refactors how init_cacheinfo is called.
>
> On Mon, May 22, 2017 at 7:24 PM, H.J. Lu wrote:
>> On Mon, May 22, 2017 at 6:23 PM, Erich Elsen wrote:
>>> I definitely think increasing the size in the case of processors with
>>> a large number of cores makes sense. Hopefully with some testing we
>>> can confirm it is a net win and/or find a more empirical number.
>>>
>>> Thanks for that patch with the tunable support. I've just put a
>>> similar patch in review for sharing right now. It adds support for
>>> the case where HAVE_TUNABLES isn't defined, like the similar code in
>>> arena.c, and also makes a minor change that turns init_cacheinfo into
>>> an init_cacheinfo_impl (a hidden callable). init_cacheinfo is now a
>>> constructor that just calls the impl and passes the cpu_features
>>> struct. This is useful in that it makes the code a bit more modular
>>> (something that we'll need to be able to test this internally).
>>
>> This sounds like a good idea. I'd also like to add tunable support in
>> init_cpu_features to turn on/off CPU features. non_temporal_threshold
>> will be one of them.
>>
>> --
>> H.J.
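A minimal sketch of the TUNABLE idea, assuming the tunable's value arrives as a plain string naming one variant. The glibc tunables plumbing is omitted, and `select_memcpy_by_name`, `memcpy_impls`, `copy_erms` and `copy_sse2_unaligned` are hypothetical names standing in for the real `__memcpy_*` variants. An unset or unknown value falls back to the entry the CPU-feature checks would have chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-ins for variants such as __memcpy_erms and
   __memcpy_sse2_unaligned; the real ones are hand-written assembly.  */
static void *
copy_erms (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n);
}

static void *
copy_sse2_unaligned (void *dest, const void *src, size_t n)
{
  unsigned char *d = dest;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dest;
}

/* One entry per selectable implementation; NAME is the string the
   hypothetical tunable would accept.  */
struct memcpy_impl
{
  const char *name;
  memcpy_fn fn;
};

static const struct memcpy_impl memcpy_impls[] =
{
  { "erms", copy_erms },
  { "sse2_unaligned", copy_sse2_unaligned },
};

/* Return the entry whose name equals REQUESTED (the tunable's value, or
   NULL when the tunable is unset).  Unknown names fall back to FALLBACK,
   which stands in for the CPU-feature-based choice.  */
static const struct memcpy_impl *
select_memcpy_by_name (const char *requested,
                       const struct memcpy_impl *fallback)
{
  assert (fallback != NULL);
  if (requested != NULL)
    for (size_t i = 0; i < sizeof memcpy_impls / sizeof memcpy_impls[0]; i++)
      if (strcmp (memcpy_impls[i].name, requested) == 0)
        return &memcpy_impls[i];
  return fallback;
}
```

A real override would also have to reject variants the running CPU cannot execute (for example an AVX-512 variant on a CPU without AVX-512); otherwise forcing it would crash with an illegal instruction.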
From a2957f5a0b21f9588e8756228b11b86f886b0f4c Mon Sep 17 00:00:00 2001
From: Erich Elsen
Date: Tue, 23 May 2017 12:29:24 -0700
Subject: [PATCH] add memcpy.c

---
 sysdeps/x86_64/multiarch/memcpy.c | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/memcpy.c

diff --git a/sysdeps/x86_64/multiarch/memcpy.c b/sysdeps/x86_64/multiarch/memcpy.c
new file mode 100644
index 0000000000..b0ff8c71fd
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memcpy.c
@@ -0,0 +1,70 @@
+#include "cpu-features.h"
+#include "init-arch.h"
+#include "shlib-compat.h"
+#include 
+
+typedef void * (*memcpy_fn)(void *, const void *, size_t);
+
+extern void * __memcpy_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_sse2_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3(void *dest, const void *src, size_t n);
+extern void * __memcpy_ssse3_back(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx_unaligned_erms(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned(void *dest, const void *src, size_t n);
+extern void * __memcpy_avx512_unaligned_erms(void *dest, const void *src, size_t n);
+
+/* Defined in cacheinfo.c */
+extern long int __x86_shared_cache_size attribute_hidden;
+extern long int __x86_shared_cache_size_half attribute_hidden;
+extern long int __x86_data_cache_size attribute_hidden;
+extern long int __x86_data_cache_size_half attribute_hidden;
+extern long int __x86_shared_non_temporal_threshold attribute_hidden;
+
+static void * select_memcpy_impl(void) {
+  const struct cpu_features* cpu_features_struct_p = __get_cpu_features ();
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_ERMS)) {
+    return __memcpy_erms;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX512F_Usable)) {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Prefer_No_VZEROUPPER))
+      return __memcpy_avx512_unaligned_erms;
+    return __memcpy_avx512_unaligned;
+  }
+
+  if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, AVX_Fast_Unaligned_Load)) {
+    if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+      return __memcpy_avx_unaligned_erms;
+
+    }
+    return __memcpy_avx_unaligned;
+  }
+  else {
+    if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Unaligned_Copy)) {
+      if (CPU_FEATURES_CPU_P(cpu_features_struct_p, ERMS)) {
+        return __memcpy_sse2_unaligned_erms;
+
+      }
+      return __memcpy_sse2_unaligned;
+    }
+    else {
+      if (!CPU_FEATURES_CPU_P(cpu_features_struct_p, SSSE3)) {
+        return __memcpy_sse2_unaligned;
+
+      }
+      if (CPU_FEATURES_ARCH_P(cpu_features_struct_p, Fast_Copy_Backward)) {
+        return __memcpy_ssse3_back;
+
+      }
+      return __memcpy_ssse3;
+    }
+  }
+}
+
+void *__new_memcpy(void *dest, const void *src, size_t n)
+  __attribute__ ((ifunc ("select_memcpy_impl")));
+
+versioned_symbol(libc, __new_memcpy, memcpy, GLIBC_2_14);
-- 
2.13.0.219.gdb65acc882-goog
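The patch replaces the hand-written assembly dispatcher with GCC's `ifunc` attribute: `__new_memcpy` has no body, and the loader binds it to whatever `select_memcpy_impl` returns. The sketch below shows the same mechanism outside glibc, assuming GCC on x86-64 Linux (ifunc needs ELF and glibc support). It uses GCC's `__builtin_cpu_supports` in place of glibc's internal `__get_cpu_features` and `CPU_FEATURES_ARCH_P`, and `my_copy`, `copy_avx`, `copy_generic` and `resolve_my_copy` are hypothetical names. The resolver returns a typedef'd function-pointer type rather than `void *`, which is one way to use a typedef like the patch's `memcpy_fn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Hypothetical "fast" variant; here it simply forwards to memcpy.  */
static void *
copy_avx (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n);
}

/* Hypothetical portable fallback: a plain byte loop.  */
static void *
copy_generic (void *dest, const void *src, size_t n)
{
  /* Like memcpy, this requires non-overlapping buffers.  */
  assert ((uintptr_t) dest + n <= (uintptr_t) src
          || (uintptr_t) src + n <= (uintptr_t) dest);
  unsigned char *d = dest;
  const unsigned char *s = src;
  while (n-- > 0)
    *d++ = *s++;
  return dest;
}

/* Resolver: returns the function my_copy gets bound to.  It can run
   before constructors, so CPU detection is initialized explicitly.  */
static copy_fn
resolve_my_copy (void)
{
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx"))
    return copy_avx;
  return copy_generic;
}

/* No body: calls to my_copy go to whatever resolve_my_copy returned.  */
void *my_copy (void *dest, const void *src, size_t n)
  __attribute__ ((ifunc ("resolve_my_copy")));
```

Because the resolver typically runs once per process, when the relocation for `my_copy` is processed, writing the dispatcher in C rather than assembly only changes that one-time selection, not the cost of each call.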