From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v3 5/5] x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
Date: Sat, 6 Nov 2021 12:37:06 -0500
Message-Id: <20211106173706.3125357-5-goldstein.w.n@gmail.com>
In-Reply-To: <20211106173706.3125357-1-goldstein.w.n@gmail.com>

No bug.

This patch doubles the rep_movsb_threshold when using ERMS.
Based on benchmarks, the vector copy loop, especially now that it
handles 4k aliasing, is better for these medium-range sizes.

On Skylake with ERMS:

Size,   Align1, Align2, dst>src, (rep movsb) / (vec copy)
4096,   0,      0,      0,       0.975
4096,   0,      0,      1,       0.953
4096,   12,     0,      0,       0.969
4096,   12,     0,      1,       0.872
4096,   44,     0,      0,       0.979
4096,   44,     0,      1,       0.83
4096,   0,      12,     0,       1.006
4096,   0,      12,     1,       0.989
4096,   0,      44,     0,       0.739
4096,   0,      44,     1,       0.942
4096,   12,     12,     0,       1.009
4096,   12,     12,     1,       0.973
4096,   44,     44,     0,       0.791
4096,   44,     44,     1,       0.961
4096,   2048,   0,      0,       0.978
4096,   2048,   0,      1,       0.951
4096,   2060,   0,      0,       0.986
4096,   2060,   0,      1,       0.963
4096,   2048,   12,     0,       0.971
4096,   2048,   12,     1,       0.941
4096,   2060,   12,     0,       0.977
4096,   2060,   12,     1,       0.949
8192,   0,      0,      0,       0.85
8192,   0,      0,      1,       0.845
8192,   13,     0,      0,       0.937
8192,   13,     0,      1,       0.939
8192,   45,     0,      0,       0.932
8192,   45,     0,      1,       0.927
8192,   0,      13,     0,       0.621
8192,   0,      13,     1,       0.62
8192,   0,      45,     0,       0.53
8192,   0,      45,     1,       0.516
8192,   13,     13,     0,       0.664
8192,   13,     13,     1,       0.659
8192,   45,     45,     0,       0.593
8192,   45,     45,     1,       0.575
8192,   2048,   0,      0,       0.854
8192,   2048,   0,      1,       0.834
8192,   2061,   0,      0,       0.863
8192,   2061,   0,      1,       0.857
8192,   2048,   13,     0,       0.63
8192,   2048,   13,     1,       0.629
8192,   2061,   13,     0,       0.627
8192,   2061,   13,     1,       0.62
---
 sysdeps/x86/dl-cacheinfo.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index e6c94dfd02..ceb3b53828 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -866,12 +866,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
   unsigned int minimum_rep_movsb_threshold;
 #endif
-  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
+  /* NB: The default REP MOVSB threshold is 4096 * (VEC_SIZE / 16).  */
   unsigned int rep_movsb_threshold;
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
       && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
     {
-      rep_movsb_threshold = 2048 * (64 / 16);
+      rep_movsb_threshold = 4096 * (64 / 16);
 #if HAVE_TUNABLES
       minimum_rep_movsb_threshold = 64 * 8;
 #endif
@@ -879,14 +879,14 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   else if (CPU_FEATURE_PREFERRED_P (cpu_features,
                                     AVX_Fast_Unaligned_Load))
     {
-      rep_movsb_threshold = 2048 * (32 / 16);
+      rep_movsb_threshold = 4096 * (32 / 16);
 #if HAVE_TUNABLES
       minimum_rep_movsb_threshold = 32 * 8;
 #endif
     }
   else
     {
-      rep_movsb_threshold = 2048 * (16 / 16);
+      rep_movsb_threshold = 4096 * (16 / 16);
 #if HAVE_TUNABLES
       minimum_rep_movsb_threshold = 16 * 8;
 #endif