From patchwork Mon Nov 1 05:49:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 46883 X-Patchwork-Delegate: hjl.tools@gmail.com Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7C13D385780A for ; Mon, 1 Nov 2021 05:52:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7C13D385780A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1635745954; bh=jU2ZASX5s0wLVcw7fmm/m1e19xWDFAmel9q/ahghF48=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=d7PMuABFP4H4JfaauDc3KLTGAiSSMEJSS2FN6xPnr/okJLOe4QBUbk0xaZjDS3oZ8 b0Ugm0cioalThP/k0RKUPQbsGv3e7l+/jxZDSPG3i6VausUIshqvmZDf76DSO4mvpY R3UQJ/NKe9GNy8vorCd5TnEipPH+vk8vedi2EQdo= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd35.google.com (mail-io1-xd35.google.com [IPv6:2607:f8b0:4864:20::d35]) by sourceware.org (Postfix) with ESMTPS id 963D63858015 for ; Mon, 1 Nov 2021 05:50:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 963D63858015 Received: by mail-io1-xd35.google.com with SMTP id n11so6989571iod.9 for ; Sun, 31 Oct 2021 22:50:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jU2ZASX5s0wLVcw7fmm/m1e19xWDFAmel9q/ahghF48=; b=p/Ujxf05R6ZUy9SZDZXUG5vWDOuSzf8L77yiCMTngyEv0EuAWEKCyqylX0QCYdD7mM NXhz6aMv4u54s2azoUAsJGUlX2Zpd/GWmTOi+m26LOyeCdVybhpUUo/MzqKK1KyzR8jr m1FH9qfZ4Cs8Lx6vnZUqedB9kaMfYOLFyxIl51Q/2ZIgoh0048k3K6C5jC9hT7RPdiLT PC6c2P2Xq0zki4zQGjJJWSAvEzXC0phf4/J3d0mk+MLotwXmL5Bh8i2sTE2fBrMv8+s2 v1X3WFrTo8uz8ciKBm1nTugYqNojZaxPnbznZmIA65zlDWzsbOQHUApKxAWeaTOqG15N JzhA== X-Gm-Message-State: AOAM530TGP/LpiKbcqAafYTBnmAJB1FhiHcTBZXTtwvVQgi5CAt4qtqJ zR/VOiAt9OFScYlooePIOUyrnnpaOAM= X-Google-Smtp-Source: ABdhPJw2ag9Ejn7444+rFhseMBAZmldm6UqTRMqYUSih867beeNxJsy6nBO9MzCwVYY//IaLkt9V5A== X-Received: by 2002:a02:5b82:: with SMTP id g124mr20002380jab.89.1635745803223; Sun, 31 Oct 2021 22:50:03 -0700 (PDT) Received: from localhost.localdomain (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.googlemail.com with ESMTPSA id y1sm7483563ilp.43.2021.10.31.22.50.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 Oct 2021 22:50:02 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v1 5/5] x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h Date: Mon, 1 Nov 2021 00:49:52 -0500 Message-Id: <20211101054952.2349590-5-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211101054952.2349590-1-goldstein.w.n@gmail.com> References: <20211101054952.2349590-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 --- sysdeps/x86/dl-cacheinfo.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h index e6c94dfd02..712b7c7fd0 100644 --- a/sysdeps/x86/dl-cacheinfo.h +++ b/sysdeps/x86/dl-cacheinfo.h @@ -871,7 +871,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F) && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512)) { - rep_movsb_threshold = 2048 * (64 / 16); + rep_movsb_threshold = 4096 * (64 / 16); #if HAVE_TUNABLES minimum_rep_movsb_threshold = 64 * 8; #endif @@ -879,14 +879,14 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) else if (CPU_FEATURE_PREFERRED_P (cpu_features, AVX_Fast_Unaligned_Load)) { - rep_movsb_threshold = 2048 * (32 / 16); + rep_movsb_threshold = 4096 * (32 / 16); #if HAVE_TUNABLES minimum_rep_movsb_threshold = 32 * 8; #endif } else { - rep_movsb_threshold = 2048 * (16 / 16); + rep_movsb_threshold = 4096 * (16 / 16); #if HAVE_TUNABLES minimum_rep_movsb_threshold = 16 * 8; #endif @@ -896,6 +896,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) if (CPU_FEATURE_USABLE_P (cpu_features, FSRM)) rep_movsb_threshold = 2112; + + + unsigned long int rep_movsb_stop_threshold; /* ERMS feature is implemented from AMD Zen3 architecture and it is performing poorly for data above L2 cache size. Henceforth, adding