From patchwork Tue Jan 3 19:37:15 2023
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 62674
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Cc: Noah Goldstein
Subject: [PATCH] x86: Check the minimum non_temporal_threshold [BZ #29953]
Date: Tue, 3 Jan 2023 11:37:15 -0800
Message-Id: <20230103193715.2549830-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.39.0

The minimum non_temporal_threshold is 0x4040.  non_temporal_threshold may
be set to less than the minimum value when the shared cache size isn't
available (e.g., in an emulator) or by the tunable.
Add a check for the minimum non_temporal_threshold.  This fixes BZ #29953.
---
 sysdeps/x86/dl-cacheinfo.h | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index e9f3382108..92e8e40752 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -861,8 +861,18 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      share of the cache, it has a substantial risk of negatively impacting
      the performance of other threads running on the chip. */
   unsigned long int non_temporal_threshold = shared * 3 / 4;
+  /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
+     'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
+     if that operation cannot overflow. Minimum of 0x4040 (16448) because the
+     L(large_memset_4x) loops need 64-byte to cache align and enough space for
+     at least 1 iteration of 4x PAGE_SIZE unrolled loop. Both values are
+     reflected in the manual. */
+  unsigned long int minimum_non_temporal_threshold = 0x4040;
+  if (non_temporal_threshold < minimum_non_temporal_threshold)
+    non_temporal_threshold = minimum_non_temporal_threshold;
 
 #if HAVE_TUNABLES
+  unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4;
   /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8. */
   unsigned int minimum_rep_movsb_threshold;
 #endif
@@ -915,8 +925,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
     shared = tunable_size;
 
   tunable_size = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
-  /* NB: Ignore the default value 0. */
-  if (tunable_size != 0)
+  if (tunable_size > minimum_non_temporal_threshold
+      && tunable_size <= maximum_non_temporal_threshold)
     non_temporal_threshold = tunable_size;
 
   tunable_size = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
@@ -931,14 +941,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
-  /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
-     'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
-     if that operation cannot overflow. Minimum of 0x4040 (16448) because the
-     L(large_memset_4x) loops need 64-byte to cache align and enough space for
-     at least 1 iteration of 4x PAGE_SIZE unrolled loop. Both values are
-     reflected in the manual. */
   TUNABLE_SET_WITH_BOUNDS (x86_non_temporal_threshold, non_temporal_threshold,
-                           0x4040, SIZE_MAX >> 4);
+                           minimum_non_temporal_threshold,
+                           maximum_non_temporal_threshold);
   TUNABLE_SET_WITH_BOUNDS (x86_rep_movsb_threshold, rep_movsb_threshold,
                            minimum_rep_movsb_threshold, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_rep_stosb_threshold, rep_stosb_threshold, 1,