From: Sajan Karumanchi
To: libc-alpha@sourceware.org, carlos@redhat.com
Cc: Sajan Karumanchi, premachandra.mallappa@amd.com
Subject: [PATCH 0/1] x86: Tuning NT Threshold parameter for AMD machines
Date: Wed, 19 Aug 2020 16:15:38 +0530
Message-Id: <20200819104539.9854-1-sajan.karumanchi@amd.com>
X-Patchwork-Id: 40284

Tuning the NT threshold parameter '__x86_shared_non_temporal_threshold'
to 2/3 of the shared cache size on AMD Zen[1|2] machines brings
performance gains for memcpy/memmove, as per the Large and Walk bench
variant results. As there are run-to-run variations in the bench
results, I took the average of 100 runs for both vanilla and patched
glibc.

AMD Zen[1/2] architectures don't have the ERMS CPU feature.
So, on the Zen architecture, memcpy takes the 'memcpy_avx_unaligned'
entry point.

Below is a comparison of the Large bench results for the entry points
avx_unaligned and avx_unaligned_erms.

-------------------------------------------------------------------------
size      load_align  store_align  avx_unaligned(%)  avx_unaligned_erms(%)
-------------------------------------------------------------------------
1048583   0           0              1.89            68.28
1048591   0           3              1.19            94.56
1048607   3           0             -0.25            68.25
1048639   3           5            -90.7             89.69
2097159   0           0            -75.11            43.18
2097167   0           3            -74.08            90.16
2097183   3           0            -78.12            43.81
2097215   3           5            -73.75            90.58
4194311   0           0            -88.5             39.26
4194319   0           3            -72.13            90.21
4194335   3           0            -78.31            43.97
4194367   3           5            -72               90.64
8388615   0           0            -12.22            43.24
8388623   0           3            -15.76            90.3
8388639   3           0            -22.31            39.92
8388671   3           5            -15.34            90.74
16777223  0           0             49.8             46.89
16777231  0           3             52.5             90.14
16777247  3           0             51.82            46.68
16777279  3           5             52.35            90.55
33554439  0           0             41.76            52.72
33554447  0           3             44.17            88.29
33554463  3           0             43.74            53.62
33554495  3           5             44.09            88.78
-------------------------------------------------------------------------

Below is a comparison of the Walk bench results for the entry points
avx_unaligned and avx_unaligned_erms.

---------------------------------------------------
size      avx_unaligned(%)  avx_unaligned_erms(%)
---------------------------------------------------
1048576   -0.2              15.03
1048577    0.92             15.52
2097152   40.52             50.92
2097153   40.76             50.84
4194304   40.6              51.22
4194305   40.57             51.25
8388608   40.61             51.23
8388609   40.82             51.32
16777216  40.56             51.11
16777217  40.35             51.29
33554432  40.15             37.41
33554433  20.75             41.22
---------------------------------------------------

Question:
Why do we see discrepancies in the Large bench results, even though the
code path taken for NT stores in memcpy is the same for both entry
points, "memcpy_avx_unaligned" and "memcpy_avx_unaligned_erms"?

Sajan Karumanchi (1):
  x86: Tuning NT Threshold parameter for AMD machines.

 sysdeps/x86/cacheinfo.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
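
For readers who want the arithmetic behind the tuning described above, here is a minimal sketch. It is not the code from the patch. The helper names nt_threshold_two_thirds and uses_nt_stores are hypothetical, and the "size >= threshold" comparison is an assumption about how the threshold is consulted. The real memcpy/memmove implementations check further conditions.

```c
#include <stddef.h>

/* Hypothetical helper, not a glibc function: compute a non-temporal
   store threshold as 2/3 of the shared cache size, as described in
   the cover letter.  Dividing before multiplying avoids overflow for
   very large inputs, at the cost of rounding down slightly.  */
static size_t
nt_threshold_two_thirds (size_t shared_cache_bytes)
{
  return shared_cache_bytes / 3 * 2;
}

/* Hypothetical helper: report whether a copy of COPY_BYTES would be
   large enough to take the non-temporal path.  The ">=" comparison
   is an assumption made for illustration.  */
static int
uses_nt_stores (size_t copy_bytes, size_t threshold)
{
  return copy_bytes >= threshold;
}
```

As a usage example, with an assumed shared cache size of 16 MiB (16777216 bytes, chosen only for illustration), nt_threshold_two_thirds returns 11184810. Under that assumption a copy of 8388615 bytes stays below the threshold, while a copy of 16777223 bytes crosses it.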