[0/1] Optimizing memcpy for AMD Zen architecture.

Message ID 20201022045005.17371-1-sajan.karumanchi@amd.com


Karumanchi, Sajan Oct. 22, 2020, 4:50 a.m. UTC
  From: Sajan Karumanchi <sajan.karumanchi@amd.com>

Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'.
Recomputing this shareable cache as 'L3 per CCX' (Core-Complex) has
brought performance gains of ~44% for memory sizes greater than 16MB.

The patch I posted earlier, 'Tuning NT Threshold parameter for AMD',
and the recent patch committed by Patrick McGehearty, 'Reversing
calculation of __x86_shared_non_temporal_threshold', both show
regressions on AMD Zen machines for memory ranges of 1MB to 8MB
as per the large bench variant results.
This patch addresses that regression on AMD Zen machines.
The link below shows a performance-comparison chart of the 'Master'
branch and the 'AMD' patch against the 2.32 stable release.
Summary: on the master branch we see a regression for memory sizes
below 8MB with a performance drop of up to 99%, whereas the AMD patch
shows performance gains for 16MB and above with no regressions.

Note: The benchmarking is done by isolating all the CPU cores in a CCX,
configuring them to fixed-frequency mode and routing the IRQs to other
CPU cores.
The large bench tests were then run pinned to one of the isolated
cores for 1000 iterations, and the performance is computed by taking
the average over these iterations.

Sajan Karumanchi (1):
  x86: Optimizing memcpy for AMD Zen architecture.

 sysdeps/x86/cacheinfo.h | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)