Message ID | e6de570b-48bf-88cf-2cec-5f5a5e7821bf@huawei.com |
---|---|
State | Dropped |
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3C2C8386F83F; Sat, 23 May 2020 04:10:17 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from huawei.com (szxga05-in.huawei.com [45.249.212.191]) by sourceware.org (Postfix) with ESMTPS id 45CAD386F465 for <libc-alpha@sourceware.org>; Sat, 23 May 2020 04:10:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 45CAD386F465 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=liqingqing3@huawei.com Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id A83DBE9D999278B3FA26 for <libc-alpha@sourceware.org>; Sat, 23 May 2020 12:10:11 +0800 (CST) Received: from [127.0.0.1] (10.166.213.55) by DGGEMS411-HUB.china.huawei.com (10.3.19.211) with Microsoft SMTP Server id 14.3.487.0; Sat, 23 May 2020 12:10:02 +0800 Subject: [PATCH]x86: update REP_STOSB_THRESHOLD's default value from 2k to 1M References: <15ec783d-46f5-0166-aee9-f1d16a58ca83@huawei.com> To: "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, <hjl.tools@gmail.com>, Hushiyuan <hushiyuan@huawei.com> From: liqingqing <liqingqing3@huawei.com> X-Forwarded-Message-Id: <15ec783d-46f5-0166-aee9-f1d16a58ca83@huawei.com> Message-ID: <e6de570b-48bf-88cf-2cec-5f5a5e7821bf@huawei.com> Date: Sat, 23 May 2020 12:10:01 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0 MIME-Version: 1.0 In-Reply-To: <15ec783d-46f5-0166-aee9-f1d16a58ca83@huawei.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.166.213.55] X-CFilter-Loop: Reflected X-Spam-Status: No, 
score=-14.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <http://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <http://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
x86: update REP_STOSB_THRESHOLD's default value from 2k to 1M
|
|
Commit Message
Qingqing Li
May 23, 2020, 4:10 a.m. UTC
Commit 830566307f038387ca0af3fd327706a8d1a2f595 optimized the implementation of memset
and set the default value of the macro REP_STOSB_THRESHOLD to 2KB. When the input length is less than 2KB the data flow is unchanged, and when the input length is larger than 2KB,
memset uses STOSB instead of MOVQ.
But when I tested this API on an x86_64 platform,
I found that this default value is not appropriate for some input lengths. Here are the environment and the results:
test suite: libMicro-0.4.0
./memset -E -C 200 -L -S -W -N "memset_4k" -s 4k -I 250
./memset -E -C 200 -L -S -W -N "memset_4k_uc" -s 4k -u -I 400
./memset -E -C 200 -L -S -W -N "memset_1m" -s 1m -I 200000
./memset -E -C 200 -L -S -W -N "memset_10m" -s 10m -I 2000000
hardware platform:
Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
L1d cache: 32KB
L1i cache: 32KB
L2 cache: 1MB
L3 cache: 60MB
The result is that when the input length is between the processor's L1 data cache size and its L2 cache size, REP_STOSB_THRESHOLD=2KB reduces performance.
             before this commit   after this commit
             cycle                cycle
memset_4k    249                  96
memset_10k   657                  185
memset_36k   2773                 3767
memset_100k  7594                 10002
memset_500k  37678                52149
memset_1m    86780                108044
memset_10m   1307238              1148994
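The cycle counts above are easier to compare as relative changes. The following is a small Python sketch that only restates the posted numbers; the names `cycles` and `relative_change` are illustrative and are not part of the patch or of libMicro.

```python
# Cycle counts copied from the table above: (before, after) the commit
# that introduced REP_STOSB_THRESHOLD=2KB.
cycles = {
    "memset_4k": (249, 96),
    "memset_10k": (657, 185),
    "memset_36k": (2773, 3767),
    "memset_100k": (7594, 10002),
    "memset_500k": (37678, 52149),
    "memset_1m": (86780, 108044),
    "memset_10m": (1307238, 1148994),
}


def relative_change(before, after):
    """Return the change in percent; positive means more cycles (slower)."""
    return (after - before) / before * 100.0


for name, (before, after) in cycles.items():
    print(f"{name:<12} {relative_change(before, after):+7.1f}%")
```

With these inputs, the sizes from 36k to 1m come out roughly 25% to 38% slower after the commit, while 4k, 10k and 10m come out faster.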
             before this commit      after this commit
             MLC cache miss(10sec)   MLC cache miss(10sec)
memset_4k    1,09,33,823             1,01,79,270
memset_10k   1,23,78,958             1,05,41,087
memset_36k   3,61,64,244             4,07,22,429
memset_100k  8,25,33,052             9,31,81,253
memset_500k  37,32,55,449            43,56,70,395
memset_1m    75,16,28,239            88,29,90,237
memset_10m   9,36,61,67,397          8,96,69,49,522
Although REP_STOSB_THRESHOLD can be changed at build time with -DREP_STOSB_THRESHOLD=xxx,
I think the default value is not the best choice, because the L2 cache of most processors is larger than 2KB, so I am submitting the patch below:
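The build-time default discussed above decides which store path each benchmark size takes. The Python sketch below models that choice under two assumptions that are not stated in this message: that memset switches to REP STOSB only when the length is strictly greater than the threshold, and that everything else can be lumped together as "non-STOSB". The function name `memset_path` and the path labels are hypothetical; the real selection in memset-vec-unaligned-erms.S also depends on CPU features and has separate code for very small lengths.

```python
KB = 1024
MB = 1024 * KB

# Benchmark sizes used in the result tables.
SIZES = {
    "4k": 4 * KB,
    "10k": 10 * KB,
    "36k": 36 * KB,
    "100k": 100 * KB,
    "500k": 500 * KB,
    "1m": 1 * MB,
    "10m": 10 * MB,
}


def memset_path(length, threshold):
    """Model of the threshold check: REP STOSB only above the threshold."""
    return "rep stosb" if length > threshold else "non-STOSB"


print(f"{'size':<6} {'threshold=2KB':<14} {'threshold=1MB':<14}")
for label, length in SIZES.items():
    print(f"{label:<6} {memset_path(length, 2 * KB):<14} "
          f"{memset_path(length, 1 * MB):<14}")
```

Under this model a 1M threshold keeps every size up to and including 1m on the non-STOSB path and still uses REP STOSB for 10m, the one large size that the table shows as faster after the commit.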
From 44314a556239a7524b5a6451025737c1bdbb1cd0 Mon Sep 17 00:00:00 2001
From: liqingqing <liqingqing3@huawei.com>
Date: Thu, 21 May 2020 11:23:06 +0800
Subject: [PATCH] update REP_STOSB_THRESHOLD's default value from 2k to 1M
The default value of the macro REP_STOSB_THRESHOLD reduces memset performance when the input length is between the processor's L1 data cache size and its L2 cache size.
So update the default value to eliminate the regression.
---
sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On 5/23/20 9:40 AM, liqingqing wrote:
> this commitid 830566307f038387ca0af3fd327706a8d1a2f595 optimize implementation of function memset,
> and set macro REP_STOSB_THRESHOLD's default value to 2KB, when the input value is less than 2KB, the data flow is the same, and when the input value is large than 2KB,
> this api will use STOB to instead of MOVQ
>
> but when I test this API on x86_64 platform
> and found that this default value is not appropriate for some input length. here it's the enviornment and result

This patch is not needed anymore since the threshold has been made a tunable: glibc.cpu.x86_rep_movsb_threshold.

Siddhesh
OK, thanks.

On 2020/12/21 12:38, Siddhesh Poyarekar wrote:
> On 5/23/20 9:40 AM, liqingqing wrote:
>> this commitid 830566307f038387ca0af3fd327706a8d1a2f595 optimize implementation of function memset,
>> and set macro REP_STOSB_THRESHOLD's default value to 2KB, when the input value is less than 2KB, the data flow is the same, and when the input value is large than 2KB,
>> this api will use STOB to instead of MOVQ
>>
>> but when I test this API on x86_64 platform
>> and found that this default value is not appropriate for some input length. here it's the enviornment and result
>
> This patch is not needed anymore since the threshold has been made a tunable: glibc.cpu.x86_rep_movsb_threshold.
>
> Siddhesh
> .
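For readers who want to try a different threshold without rebuilding glibc, the tunable route mentioned in the reply can be sketched as follows. This assumes a glibc that provides these tunables; the glibc manual lists glibc.cpu.x86_rep_stosb_threshold as the memset counterpart of the x86_rep_movsb_threshold tunable named in the reply. The helper `tunables_env` is hypothetical, and the benchmark command in the comment is the libMicro invocation from the commit message.

```python
import os


def tunables_env(stosb_threshold, base_env=None):
    """Return a copy of the environment with the REP STOSB threshold set.

    GLIBC_TUNABLES is read when a process starts, so it has to be in the
    environment of the program being launched.
    """
    env = dict(os.environ if base_env is None else base_env)
    setting = f"glibc.cpu.x86_rep_stosb_threshold={stosb_threshold}"
    existing = env.get("GLIBC_TUNABLES")
    # Multiple tunables are separated by colons.
    env["GLIBC_TUNABLES"] = f"{existing}:{setting}" if existing else setting
    return env


env = tunables_env(1048576, base_env={})
print(env["GLIBC_TUNABLES"])

# To run the 1m benchmark from the commit message with this setting:
# import subprocess
# subprocess.run(["./memset", "-E", "-C", "200", "-L", "-S", "-W",
#                 "-N", "memset_1m", "-s", "1m", "-I", "200000"],
#                env=tunables_env(1048576), check=True)
```

The helper only builds the environment; whether the setting takes effect depends on the glibc version and build options of the system running the benchmark.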
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index dcd63c92..92c08eed 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -65,7 +65,7 @@
    Enhanced REP STOSB. Since the stored value is fixed, larger register
    size has minimal impact on threshold. */
 #ifndef REP_STOSB_THRESHOLD
-# define REP_STOSB_THRESHOLD 2048
+# define REP_STOSB_THRESHOLD 1048576
 #endif
 
 #ifndef SECTION