From patchwork Tue Aug 18 21:18:26 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ondrej Bilka
X-Patchwork-Id: 8279
Received: (qmail 88303 invoked by alias); 18 Aug 2015 21:18:38 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 88291 invoked by uid 89); 18 Aug 2015 21:18:38 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, SPF_NEUTRAL autolearn=no version=3.3.2
X-HELO: popelka.ms.mff.cuni.cz
Date: Tue, 18 Aug 2015 23:18:26 +0200
From: Ondřej Bílka
To: libc-alpha@sourceware.org
Subject: [RFC] Fixing strcmp performance on power7 for unaligned loads.
Message-ID: <20150818211826.GA8700@domone>
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi,

As I said before, benchmarks are useless unless somebody reads them, so
I looked at the powerpc ones.

I noticed that the power7 strcmp and strncmp are about five times slower
than memcmp in the unaligned case. That is too much, and I could easily
improve performance in that case by 50% by implementing strcmp as a
strnlen+memcmp loop, despite the overhead of strnlen.

Because of that overhead the loop is a lot slower than the existing code
on aligned data, so the real fix belongs in assembly: change the
unaligned case to follow the pattern in the C code of the patch below.

strncmp should be the same case once I handle the corner cases
correctly; the benchmark results that I have now are the same up to the
point where it segfaults.

The same optimization would probably also work for older machines, but I
don't have one to test it on.
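For anyone who wants to try the idea outside glibc, here is a minimal
portable sketch of the strnlen+memcmp loop described above. It is not
the patch itself: it assumes the standard strnlen and memcmp instead of
the __strnlen_power7 and __memcmp_power7 internals, it leaves out the
alignment step, and the names chunked_strcmp, same_sign,
chunked_strcmp_selfcheck and CHUNK are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* strnlen is POSIX, not ISO C */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size; 64 mirrors the constant used in the patch. */
enum { CHUNK = 64 };

/* Sketch: compare two NUL-terminated strings CHUNK bytes at a time. */
int chunked_strcmp(const char *a, const char *b)
{
    for (;;) {
        /* len = min(strlen(a), strlen(b), CHUNK), computed without
           reading past either terminator. */
        size_t len = strnlen(a, CHUNK);
        len = strnlen(b, len);

        /* A terminator lies within this chunk: compare up to and
           including it, then stop. */
        if (len != CHUNK)
            return memcmp(a, b, len + 1);

        /* Neither string ends in this chunk, so CHUNK bytes of each
           are readable. */
        int ret = memcmp(a, b, CHUNK);
        if (ret != 0)
            return ret;

        a += CHUNK;
        b += CHUNK;
    }
}

/* Helper for checking: do x and y have the same sign? */
static int same_sign(int x, int y)
{
    return (x > 0) == (y > 0) && (x < 0) == (y < 0);
}

/* Usage example: the sketch should agree in sign with the libc strcmp. */
void chunked_strcmp_selfcheck(void)
{
    assert(same_sign(chunked_strcmp("abc", "abd"), strcmp("abc", "abd")));
    assert(same_sign(chunked_strcmp("abc", "ab"), strcmp("abc", "ab")));
    assert(chunked_strcmp("", "") == 0);
}
```

Unlike the patch, this sketch calls strnlen on both strings for every
chunk. That costs more per iteration, but it never reads past a
terminator, so it stays within what portable C allows.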

Part of the benchtest output for large inputs is here:

                               simple_strcmp  stupid_strcmp  __strcmp_power7  __strcmp_power7b  __strcmp_ppc
Length   32, alignment  0/ 0:        22.6719        31.8438          3.40625            14.875       5.39062
Length   32, alignment  0/ 4:          22.75        31.7969          18.9062           19.1094       19.2344
Length   32, alignment  4/ 5:          22.75          31.75          18.1875           20.1719       22.6562
Length   64, alignment  0/ 0:        40.3906        51.2031          5.03125           15.0156             8
Length   64, alignment  0/ 5:        40.5312        51.6094          24.6562           18.0781       32.5156
Length   64, alignment  5/ 6:        40.7969        51.0781          23.9531           19.3281       32.7188
Length  128, alignment  0/ 0:           76.5        91.5312                8           32.3281       17.4219
Length  128, alignment  0/ 6:           76.5        90.7969            45.25             41.25          60.5
Length  128, alignment  6/ 7:          76.25        91.1562             43.5           40.3906       61.7031
Length  256, alignment  0/ 0:        148.156        168.656          18.3281           57.7188       27.7656
Length  256, alignment  0/ 7:        148.422        168.969          83.0469           65.6406       115.828
Length  256, alignment  7/ 8:         146.25        169.391          83.5938           67.9219        115.75
Length  512, alignment  0/ 0:        291.953        333.031            30.25           90.9219          48.5
Length  512, alignment  0/ 8:        291.516        339.516          30.2656           93.0469       48.7188
Length  512, alignment  8/ 9:        291.578        333.984           161.75           109.109       226.281
Length 1024, alignment  0/ 0:        587.406        656.406          55.1562           159.688       89.7812
Length 1024, alignment  0/ 0:        578.688        649.219          55.2812           160.188       90.2188
Length 1024, alignment  0/ 9:        588.781        653.062          318.406           203.547       447.328
Length 1024, alignment  9/10:        589.406          650.5          320.688           196.375       447.484

diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 364385b..bbf6ee6 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -308,6 +318,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strcmp,
 			      hwcap & PPC_FEATURE_HAS_VSX,
 			      __strcmp_power7)
+	      IFUNC_IMPL_ADD (array, i, strcmp,
+			      hwcap & PPC_FEATURE_HAS_VSX,
+			      __strcmp_power7b)
+
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1,
 			      __strcmp_ppc))
 
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
index b45ba1f..fd7a1b9 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
@@ -20,10 +20,47 @@
 # include
 # include
 # include "init-arch.h"
-
 extern __typeof (strcmp) __strcmp_ppc attribute_hidden;
 extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
 extern __typeof (strcmp) __strcmp_power8 attribute_hidden;
+extern __typeof (strnlen) __strnlen_power7 attribute_hidden;
+extern __typeof (memcmp) __memcmp_power7 attribute_hidden;
+extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
+
+# include "libc-internal.h"
+int __strcmp_power7b(const char *a, const char *b)
+{
+  size_t len;
+  int ret;
+  len = __strnlen_power7 (a, 64);
+  len = __strnlen_power7 (b, len);
+  if (len != 64)
+    {
+      return __memcmp_power7 (a, b, len + 1);
+    }
+  ret = __memcmp_power7 (a, b, 64);
+  if (ret)
+    return ret;
+
+  const char *a_old = a;
+  a = PTR_ALIGN_DOWN (a + 64, 64);
+  b += a - a_old;
+
+  while (1)
+    {
+      len = __strnlen_power7 (b, 64);
+      if (len != 64)
+        {
+          return __memcmp_power7 (a, b, len + 1);
+        }
+
+      ret = __memcmp_power7 (a, b, 64);
+      if (ret)
+        return ret;
+      a+=64;
+      b+=64;
+    }
+}
 
 libc_ifunc (strcmp,
             (hwcap2 & PPC_FEATURE2_ARCH_2_07)