From patchwork Wed Oct 28 07:35:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Karumanchi, Sajan"
X-Patchwork-Id: 40904
From: sajan.karumanchi@amd.com
To: libc-alpha@sourceware.org, carlos@redhat.com, fweimer@redhat.com
Cc: Sajan Karumanchi, premachandra.mallappa@amd.com
Subject: [PATCH] x86: Optimizing memcpy for AMD Zen architecture.
Date: Wed, 28 Oct 2020 13:05:33 +0530
X-Mailer: git-send-email 2.17.1
In-Reply-To: <87lffs6jjn.fsf@oldenburg2.str.redhat.com>
References: <87lffs6jjn.fsf@oldenburg2.str.redhat.com>

From: Sajan Karumanchi

Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as
'L3 per thread, L2 per core'. Recomputing this shareable cache as
'L3 per CCX (Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.

Reviewed-by: Premachandra Mallappa
---
 sysdeps/x86/cacheinfo.h | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
index 7f342fdc23..1296c93b2b 100644
--- a/sysdeps/x86/cacheinfo.h
+++ b/sysdeps/x86/cacheinfo.h
@@ -320,7 +320,7 @@ init_cacheinfo (void)
               threads = 1 << ((ecx >> 12) & 0x0f);
             }
 
-          if (threads == 0)
+          if (threads == 0 || cpu_features->basic.family >= 0x17)
             {
               /* If APIC ID width is not available, use logical
                  processor count.  */
@@ -335,13 +335,30 @@ init_cacheinfo (void)
           if (threads > 0)
             shared /= threads;
 
-          /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
-        }
+          /* Get shared cache per ccx for Zen architectures.  */
+          if (cpu_features->basic.family >= 0x17)
+            {
+              unsigned int eax;
+
+              /* Get number of threads share the L3 cache in CCX.  */
+              __cpuid_count (0x8000001D, 0x3, eax, ebx, ecx, edx);
+
+              unsigned int threads_per_ccx = ((eax >> 14) & 0xfff) + 1;
+              shared *= threads_per_ccx;
+            }
+          else
+            {
+              /* Account for exclusive L2 and L3 caches.  */
+              shared += core;
+            }
+        }
     }
 
   if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
+    {
+      if (data == 0 || cpu_features->basic.kind != arch_kind_amd)
+        data = cpu_features->data_cache_size;
+    }
 
   if (data > 0)
     {
@@ -354,7 +371,10 @@ init_cacheinfo (void)
     }
 
   if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
+    {
+      if (shared == 0 || cpu_features->basic.kind != arch_kind_amd)
+        shared = cpu_features->shared_cache_size;
+    }
 
   if (shared > 0)
     {
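
For readers following the arithmetic in the second hunk, here is a minimal standalone sketch of the 'L3 per CCX' computation. It is not the glibc code: the function names threads_sharing_cache and shared_per_ccx are hypothetical, and the EAX value is passed in as a plain argument rather than read with __cpuid_count, so the example builds on any machine. It assumes, as the patch does, that EAX[25:14] of CPUID leaf 0x8000001D (subleaf 3) holds the number of logical processors sharing the L3, minus one.

```c
/* Hypothetical helper, not part of glibc: decode EAX[25:14] of CPUID
   leaf 0x8000001D, which holds the number of logical processors
   sharing the cache, minus one.  */
unsigned int
threads_sharing_cache (unsigned int eax)
{
  return ((eax >> 14) & 0xfff) + 1;
}

/* Hypothetical helper, not part of glibc: mirror the patch's
   arithmetic.  The L3 size is first divided by the logical processor
   count (L3 per thread), then scaled by the number of threads that
   share the L3 within one CCX (L3 per CCX).  */
long int
shared_per_ccx (long int l3_size, unsigned int threads, unsigned int eax)
{
  long int shared = l3_size;

  /* Same guard as the original code: skip the division when the
     thread count is unknown.  */
  if (threads > 0)
    shared /= threads;

  return shared * threads_sharing_cache (eax);
}
```

As an illustration with made-up inputs (not measured on real hardware): a 16 MiB L3, 16 logical processors, and EAX = 7 << 14 (eight threads sharing the L3) give 16 MiB / 16 * 8 = 8 MiB from shared_per_ccx, where the previous 'L3 per thread, L2 per core' formula would have given 1 MiB plus the L2 size.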