From patchwork Thu Feb 1 19:36:48 2024
X-Patchwork-Submitter: Michael Jeanson
X-Patchwork-Id: 85172
From: Michael Jeanson
To: libc-alpha@sourceware.org
Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, Mathieu Desnoyers
Subject: [PATCH v7 8/8] Linux: Use rseq to accelerate getcpu
Date: Thu, 1 Feb 2024 14:36:48 -0500
Message-Id: <20240201193648.584917-9-mjeanson@efficios.com>
In-Reply-To: <20240201193648.584917-1-mjeanson@efficios.com>
References: <20240201193648.584917-1-mjeanson@efficios.com>

On architectures that implement rseq_load32_load32_relaxed() (and thus
define RSEQ_HAS_LOAD32_LOAD32_RELAXED), when the node_id feature is
available, use rseq to fetch the cpu_id and node_id atomically with
respect to preemption and signal delivery to speed up getcpu() compared
to a vsyscall or system call implementation.

Both cpu_id and node_id must be loaded atomically with respect to
preemption: if the thread migrated between the two loads, the returned
cpu_id and node_id could come from different CPUs and no longer reflect
a consistent topology mapping.

On an aarch64 system (Snapdragon 8cx Gen 3), which lacks a vDSO for
getcpu(), we measured an improvement from 130 ns to 1 ns, while on
x86_64 (i7-8550U), which has a vDSO, we measured a more modest
improvement from 10 ns to 2 ns.

Co-authored-by: Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Michael Jeanson
---
 sysdeps/unix/sysv/linux/getcpu.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/getcpu.c b/sysdeps/unix/sysv/linux/getcpu.c
index 0e7c3238c9..7e34d6d1eb 100644
--- a/sysdeps/unix/sysv/linux/getcpu.c
+++ b/sysdeps/unix/sysv/linux/getcpu.c
@@ -19,9 +19,10 @@
 #include
 #include
 #include
+#include
 
-int
-__getcpu (unsigned int *cpu, unsigned int *node)
+static int
+vsyscall_getcpu (unsigned int *cpu, unsigned int *node)
 {
 #ifdef HAVE_GETCPU_VSYSCALL
   return INLINE_VSYSCALL (getcpu, 3, cpu, node, NULL);
@@ -29,5 +30,32 @@ __getcpu (unsigned int *cpu, unsigned int *node)
   return INLINE_SYSCALL_CALL (getcpu, cpu, node, NULL);
 #endif
 }
+
+#if defined (RSEQ_SIG) && defined (RSEQ_HAS_LOAD32_LOAD32_RELAXED)
+int
+__getcpu (unsigned int *cpu, unsigned int *node)
+{
+  /* Check if rseq is registered and the node_id feature is available.  */
+  if (__glibc_likely (rseq_node_id_available()))
+    {
+      struct rseq_area *rseq_area = rseq_get_area();
+
+      if (rseq_load32_load32_relaxed(cpu, &rseq_area->cpu_id,
+                                     node, &rseq_area->node_id) == 0)
+        {
+          /* The critical section was not aborted, return 0.  */
+          return 0;
+        }
+    }
+
+  return vsyscall_getcpu (cpu, node);
+}
+#else
+int
+__getcpu (unsigned int *cpu, unsigned int *node)
+{
+  return vsyscall_getcpu (cpu, node);
+}
+#endif
 weak_alias (__getcpu, getcpu)
 libc_hidden_def (__getcpu)
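
The control flow added to __getcpu() can be modelled in plain C as a rough sketch. Everything below is a hypothetical stand-in, not glibc code: sketch_state, sketch_rseq_area, sketch_load32_load32(), sketch_slow_getcpu() and sketch_getcpu() are names invented for illustration, the two-load helper uses ordinary loads instead of a real restartable sequence, and the -1 abort value is an assumption (the patch only shows that 0 means the critical section was not aborted). The sketch also assumes non-NULL output pointers, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread rseq area; only the two
   fields read by the patch are modelled.  */
struct sketch_rseq_area
{
  uint32_t cpu_id;
  uint32_t node_id;
};

/* Hypothetical state that lets a caller drive each branch.  */
struct sketch_state
{
  bool node_id_available;       /* Models rseq_node_id_available().  */
  bool abort_fast_path;         /* Models an aborted critical section.  */
  struct sketch_rseq_area area; /* Values the fast path would read.  */
  unsigned int slow_cpu;        /* Values the slow path would report.  */
  unsigned int slow_node;
  unsigned int slow_calls;      /* Number of slow-path invocations.  */
};

/* Models the two-load critical section.  Returns 0 when both values
   were read, and -1 (an assumed value) when the section "aborted".
   A real restartable sequence is written in assembly; these are
   ordinary loads and provide no atomicity.  */
static int
sketch_load32_load32 (const struct sketch_state *st,
                      unsigned int *dst1, unsigned int *dst2)
{
  if (st->abort_fast_path)
    return -1;
  *dst1 = st->area.cpu_id;
  *dst2 = st->area.node_id;
  return 0;
}

/* Models the vsyscall or system call fallback.  */
static int
sketch_slow_getcpu (struct sketch_state *st,
                    unsigned int *cpu, unsigned int *node)
{
  st->slow_calls++;
  *cpu = st->slow_cpu;
  *node = st->slow_node;
  return 0;
}

/* Same shape as the patched __getcpu(): try the fast path when the
   feature is available, otherwise (or on abort) use the slow path.  */
int
sketch_getcpu (struct sketch_state *st,
               unsigned int *cpu, unsigned int *node)
{
  /* Simplification of this sketch: both outputs must be non-NULL.  */
  assert (cpu != NULL && node != NULL);

  if (st->node_id_available
      && sketch_load32_load32 (st, cpu, node) == 0)
    return 0;

  return sketch_slow_getcpu (st, cpu, node);
}
```

In this model the slow path is reached both when the feature is unavailable and when the two-load helper reports an abort, which mirrors why the patch keeps vsyscall_getcpu() as a fallback rather than retrying the fast path.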