From patchwork Wed Sep 23 15:13:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick McGehearty X-Patchwork-Id: 40482 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 85F8E394D8B1; Wed, 23 Sep 2020 15:13:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 85F8E394D8B1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1600874037; bh=jyzroaCC1VS/UloMPrJaQoN4p+9yraPmO17435+CmYk=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=nTbe2q1Zkb9A6pPqmT4eVEGjXzsYjRUqmV4/Bhu2eBlBx3B2YSki571LndX1WoMqE 7w2h5njvzGJIkZmCXetRC4chcvGHCXk37F59bjyG1y9/9+EFFwt+YXMntas56B71aZ uKs7rhwiCPN3ECb+5YZPeyWmp/fQmRDV6UlumDCI= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from aserp2130.oracle.com (aserp2130.oracle.com [141.146.126.79]) by sourceware.org (Postfix) with ESMTPS id A4B0A386F433 for ; Wed, 23 Sep 2020 15:13:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org A4B0A386F433 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 08NF9k4M045612 for ; Wed, 23 Sep 2020 15:13:54 GMT Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2130.oracle.com with ESMTP id 33qcpu00d7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL) for ; Wed, 23 Sep 2020 15:13:54 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 08NF66HJ022149 for ; Wed, 23 Sep 2020 15:13:53 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3020.oracle.com with ESMTP id 33r28vnk8s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 23 Sep 2020 15:13:53 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 08NFDqQD012791 for ; Wed, 23 Sep 2020 15:13:52 GMT Received: from arm34vm.us.oracle.com (/10.211.14.34) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 23 Sep 2020 08:13:52 -0700 To: libc-alpha@sourceware.org Subject: [PATCH] Reversing calculation of __x86_shared_non_temporal_threshold Date: Wed, 23 Sep 2020 15:13:47 +0000 Message-Id: <1600874027-7718-1-git-send-email-patrick.mcgehearty@oracle.com> X-Mailer: git-send-email 1.8.3.1 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9753 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 mlxlogscore=999 suspectscore=0 adultscore=0 bulkscore=0 malwarescore=0 spamscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009230123 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9753 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 mlxlogscore=999 adultscore=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 priorityscore=1501 phishscore=0 spamscore=0 malwarescore=0 clxscore=1011 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009230123 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Patrick McGehearty via Libc-alpha From: Patrick McGehearty Reply-To: Patrick McGehearty Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" The __x86_shared_non_temporal_threshold determines when memcpy on x86 uses non_temporal stores to avoid pushing other data out of the last level cache. This patch proposes to revert the calculation change made by H.J. Lu's patch of June 2, 2017. H.J. Lu's patch selected a threshold suitable for a single thread getting maximum performance. It was tuned using the single threaded large memcpy micro benchmark on an 8 core processor. The last change changes the threshold from using 3/4 of one thread's share of the cache to using 3/4 of the entire cache of a multi-threaded system before switching to non-temporal stores. Multi-threaded systems with more than a few threads are server-class and typically have many active threads. If one thread consumes 3/4 of the available cache for all threads, it will cause other active threads to have data removed from the cache. Two examples show the range of the effect. John McCalpin's widely parallel Stream benchmark, which runs in parallel and fetches data sequentially, saw a 20% slowdown with this patch on an internal system test of 128 threads. This regression was discovered when comparing OL8 performance to OL7. An example that compares normal stores to non-temporal stores may be found at https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test shows performance loss of 400 to 500% due to a failure to use nontemporal stores. These performance losses are most likely to occur when the system load is heaviest and good performance is critical. The tunable x86_non_temporal_threshold can be used to override the default for the knowledgable user who really wants maximum cache allocation to a single thread in a multi-threaded system. The manual entry for the tunable has been expanded to provide more information about its purpose. modified: sysdeps/x86/cacheinfo.c modified: manual/tunables.texi --- manual/tunables.texi | 6 +++++- sysdeps/x86/cacheinfo.c | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/manual/tunables.texi b/manual/tunables.texi index b6bb54d..94d4fbd 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -364,7 +364,11 @@ set shared cache size in bytes for use in memory and string routines. @deftp Tunable glibc.tune.x86_non_temporal_threshold The @code{glibc.tune.x86_non_temporal_threshold} tunable allows the user -to set threshold in bytes for non temporal store. +to set threshold in bytes for non temporal store. Non temporal stores +give a hint to the hardware to move data directly to memory without +displacing other data from the cache. This tunable is used by some +platforms to determine when to use non temporal stores in operations +like memmove and memcpy. This tunable is specific to i386 and x86-64. @end deftp diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c index b9444dd..5c5192a 100644 --- a/sysdeps/x86/cacheinfo.c +++ b/sysdeps/x86/cacheinfo.c @@ -785,7 +785,7 @@ intel_bug_no_cache_info: __x86_shared_non_temporal_threshold = (cpu_features->non_temporal_threshold != 0 ? cpu_features->non_temporal_threshold - : __x86_shared_cache_size * threads * 3 / 4); + : __x86_shared_cache_size * 3 / 4); } #endif