From: sajan.karumanchi@amd.com
To: libc-alpha@sourceware.org, carlos@redhat.com, fweimer@redhat.com
Cc: Sajan Karumanchi, premachandra.mallappa@amd.com
Date: Thu, 22 Oct 2020 10:20:04 +0530
Subject: [PATCH 0/1] Optimizing memcpy for AMD Zen architecture.
Message-Id: <20201022045005.17371-1-sajan.karumanchi@amd.com>

This patch optimizes memcpy for the AMD Zen architecture by changing how the shareable cache size '__x86_shared_cache_size' is computed. This value is a factor in computing the non-temporal threshold parameter '__x86_shared_non_temporal_threshold'.

In the existing implementation, the shareable cache is computed as 'L3 per thread, L2 per core'. Recomputing it as 'L3 per CCX' (Core Complex) brings performance gains of ~44% for memory sizes greater than 16MB.

Both the patch I posted earlier, 'Tuning NT Threshold parameter for AMD machines' (https://sourceware.org/pipermail/libc-alpha/2020-August/117080.html), and the patch recently committed by Patrick McGehearty, 'Reversing calculation of __x86_shared_non_temporal_threshold', show regressions on AMD Zen machines for memory sizes from 1MB to 8MB in the large bench variant results. This patch addresses that regression on AMD Zen machines.

The chart at the link below compares the performance of the 'Master' branch and the 'AMD' patch against the 2.32 stable release:
https://i.imgur.com/0ZJAwes.png

Summary: The master branch regresses for memory sizes below 8MB, with performance drops of up to 99%, whereas the AMD patch shows gains for 16MB and above with no regressions.

Note: The benchmarks were run by isolating all the CPU cores in one CCX, configuring them in fixed-frequency mode, and routing the IRQs to other CPU cores. The large bench tests were then run pinned to one of the isolated cores for 1000 iterations, and the reported performance is the average over these iterations.

Sajan Karumanchi (1):
  x86: Optimizing memcpy for AMD Zen architecture.

 sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)
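The two sizing policies this cover letter contrasts ('L3 per thread, L2 per core' versus 'L3 per CCX') can be compared with a small C sketch. Everything in it is an assumption for illustration: the struct and function names are hypothetical, the topology is loosely modeled on a Zen 2 CCX (16 MiB of L3 shared by 8 threads, 512 KiB of L2 per core), and the threshold is taken as 3/4 of the shareable size, a simplified stand-in rather than the exact formula in sysdeps/x86/cacheinfo.h.

```c
#include <assert.h>

/* Hypothetical cache topology description; not a glibc type. */
struct cache_topology {
    long l3_bytes;           /* size of the L3 owned by one CCX         */
    long l2_bytes;           /* private L2 per core                     */
    int  threads_sharing_l3; /* logical threads sharing that one L3     */
};

/* 'L3 per thread, L2 per core': the L3 is split evenly across the
   threads that share it, then one core's private L2 is added. */
long shared_per_thread(const struct cache_topology *t)
{
    assert(t->threads_sharing_l3 > 0);
    return t->l3_bytes / t->threads_sharing_l3 + t->l2_bytes;
}

/* 'L3 per CCX': the whole L3 of one core complex is treated as
   shareable. */
long shared_per_ccx(const struct cache_topology *t)
{
    return t->l3_bytes;
}

/* Assumed threshold rule for this sketch only: 3/4 of the
   shareable size.  Copies above it would use non-temporal stores. */
long nt_threshold(long shared)
{
    return shared * 3 / 4;
}
```

With the assumed numbers, shared_per_thread() gives 2.5 MiB (threshold 1.875 MiB) while shared_per_ccx() gives 16 MiB (threshold 12 MiB), which shows how far apart the two policies can place the size at which memcpy switches to non-temporal stores.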
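Computing the shareable cache as 'L3 per CCX' requires knowing how many logical threads share one L3. On AMD processors this is reported by CPUID leaf 0x8000001D; on Zen parts subleaf 3 describes the L3, with EAX bits 25:14 holding the sharing thread count minus one, and EBX/ECX encoding ways, partitions, line size, and sets (each minus one). The sketch below only decodes raw register values. The helper names are hypothetical and this is not the code from the patch; on real hardware the registers could be obtained with GCC's __get_cpuid_count() from <cpuid.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Cache level from EAX[7:5] of a CPUID 0x8000001D subleaf. */
unsigned int cache_level(uint32_t eax)
{
    return (eax >> 5) & 0x7u;
}

/* Logical processors sharing this cache: EAX[25:14] + 1. */
unsigned int cache_sharing_threads(uint32_t eax)
{
    return ((eax >> 14) & 0xfffu) + 1;
}

/* Cache size in bytes from EBX/ECX of the same subleaf:
   ways * partitions * line size * sets, each field stored minus one. */
uint64_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
    uint64_t line       = (uint64_t) (ebx & 0xfffu) + 1;          /* EBX[11:0]  */
    uint64_t partitions = (uint64_t) ((ebx >> 12) & 0x3ffu) + 1;  /* EBX[21:12] */
    uint64_t ways       = (uint64_t) ((ebx >> 22) & 0x3ffu) + 1;  /* EBX[31:22] */
    uint64_t sets       = (uint64_t) ecx + 1;                     /* ECX[31:0]  */
    return ways * partitions * line * sets;
}
```

For example, made-up register values describing a level-3, 16-way cache with 64-byte lines and 16384 sets shared by 8 threads decode to 16 MiB and 8 threads, the per-CCX figures used in the previous sketch.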
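The measurement procedure in the benchmarking note (pin to one isolated core, repeat 1000 times, average) can be sketched as a small Linux C harness. This is a hypothetical illustration, not glibc's own large memcpy benchmark that produced the reported results; core isolation, fixed-frequency mode, and IRQ routing are system configuration done outside the program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Pin the calling thread to one CPU (e.g. an isolated core).
   Returns 0 on success, -1 on failure with errno set. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec * 1e9 + (double) ts.tv_nsec;
}

/* Mean wall-clock time in nanoseconds of one memcpy of `size` bytes,
   averaged over `iters` runs.  Returns -1.0 if allocation fails. */
double mean_copy_ns(size_t size, int iters)
{
    assert(size > 0 && iters > 0);
    char *src = malloc(size);
    char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);   /* touch pages before timing */
    memset(dst, 0, size);

    volatile char sink = 0;    /* read back dst so copies stay observable */
    double total = 0.0;
    for (int i = 0; i < iters; i++) {
        double t0 = now_ns();
        memcpy(dst, src, size);
        double t1 = now_ns();
        sink ^= dst[(size_t) i % size];
        total += t1 - t0;
    }
    (void) sink;
    free(src);
    free(dst);
    return total / iters;
}
```

For instance, after pin_to_cpu(3) succeeds, mean_copy_ns((size_t) 16 << 20, 1000) would return the mean time of a 16 MiB copy on CPU 3 over 1000 iterations.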