Improve performance of libc locks

Message ID AM5PR0801MB16680E6787BC0553E1847A0B83649@AM5PR0801MB1668.eurprd08.prod.outlook.com
State New
Series Improve performance of libc locks

Checks

Context                Check    Description
dj/TryBot-apply_patch  success  Patch applied to master at the time it was sent
dj/TryBot-32bit        success  Build for i686

Commit Message

Wilco Dijkstra Aug. 11, 2022, 4:22 p.m. UTC
  Improve performance of libc locks by adding a fast path for the
single-threaded case.

On Neoverse V1, a loop using rand() improved 3.6 times. Multithreaded
performance is unchanged.

Passes regress on AArch64, OK for commit?

---
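For readers unfamiliar with the pattern, the fast path described above can be sketched outside glibc roughly as follows. This is only an illustration under stated assumptions: `single_threaded`, `demo_lock_t`, `demo_lock` and `demo_unlock` are hypothetical names standing in for SINGLE_THREAD_P and the lll_lock/lll_unlock machinery, and the contended path simply spins instead of sleeping on a futex as the real low-level lock does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for SINGLE_THREAD_P: assumed to be true
   until the process creates a second thread.  */
static bool single_threaded = true;

/* Lock word: 0 = unlocked, 1 = locked.  */
typedef struct { atomic_int state; } demo_lock_t;

static inline void
demo_lock (demo_lock_t *l)
{
  if (single_threaded)
    /* Fast path: no other thread can observe the lock, so a plain
       (relaxed) store replaces the atomic read-modify-write.  */
    atomic_store_explicit (&l->state, 1, memory_order_relaxed);
  else
    {
      /* Slow path: acquire with compare-and-swap.  This sketch spins;
         it does not block the way a futex-based lock would.  */
      int expected = 0;
      while (!atomic_compare_exchange_weak_explicit (&l->state, &expected, 1,
                                                     memory_order_acquire,
                                                     memory_order_relaxed))
        expected = 0;
    }
}

static inline void
demo_unlock (demo_lock_t *l)
{
  if (single_threaded)
    atomic_store_explicit (&l->state, 0, memory_order_relaxed);
  else
    atomic_store_explicit (&l->state, 0, memory_order_release);
}
```

In this sketch the fast path still writes the locked value into the lock word, so a lock taken while single-threaded can be released on the slow path if a thread is created in between.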
  

Comments

Carlos O'Donell Aug. 15, 2022, 2:07 p.m. UTC | #1
On 8/11/22 12:22, Wilco Dijkstra via Libc-alpha wrote:
> Improve performance of libc locks by adding a fast path for the
> single-threaded case.
> 
> On Neoverse V1, a loop using rand() improved 3.6 times. Multithreaded
> performance is unchanged.
> 
> Passes regress on AArch64, OK for commit?

This impacts all architectures.

Are you able to run the microbenchmarks to show that this improves one of them?

If you can, then we can at least ask the other machine maintainers to test that
the patch shows gains there too. Conceptually I don't see why it wouldn't improve
the performance of all architectures, but having a baseline at the point of the
patch is good for recording the performance for future discussions.

If we don't have a benchmark that shows this specific case of ST vs MT and
internal __libc_lock_lock-locks then we should add one. Improving the internal
locking for our algorithms is always going to be a point of interest for IHVs.

Thanks.
 
> ---
> 
> diff --git a/sysdeps/nptl/libc-lockP.h b/sysdeps/nptl/libc-lockP.h
> index d3a6837fd212f3f5dfd80f46d0e9ce365042ae0c..ccdb11fee6f14069d0b936be93d0f0fa6d8bc30b 100644
> --- a/sysdeps/nptl/libc-lockP.h
> +++ b/sysdeps/nptl/libc-lockP.h
> @@ -108,7 +108,14 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
>  #define __libc_rwlock_fini(NAME) ((void) 0)
>  
>  /* Lock the named lock variable.  */
> -#define __libc_lock_lock(NAME) ({ lll_lock (NAME, LLL_PRIVATE); 0; })
> +#define __libc_lock_lock(NAME)						\
> + ({									\
> +    if (SINGLE_THREAD_P)						\
> +      (NAME) = LLL_LOCK_INITIALIZER_LOCKED;				\
> +    else								\
> +      lll_lock (NAME, LLL_PRIVATE);					\
> +    0;									\
> +  })
>  #define __libc_rwlock_rdlock(NAME) __pthread_rwlock_rdlock (&(NAME))
>  #define __libc_rwlock_wrlock(NAME) __pthread_rwlock_wrlock (&(NAME))
>  
> @@ -116,7 +123,14 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
>  #define __libc_lock_trylock(NAME) lll_trylock (NAME)
>  
>  /* Unlock the named lock variable.  */
> -#define __libc_lock_unlock(NAME) lll_unlock (NAME, LLL_PRIVATE)
> +#define __libc_lock_unlock(NAME)					\
> + ({									\
> +    if (SINGLE_THREAD_P)						\
> +      (NAME) = LLL_LOCK_INITIALIZER;					\
> +    else								\
> +      lll_unlock (NAME, LLL_PRIVATE);					\
> +    0;									\
> + })
>  #define __libc_rwlock_unlock(NAME) __pthread_rwlock_unlock (&(NAME))
>  
>  #if IS_IN (rtld)
> 
> 
>
  
Wilco Dijkstra Aug. 15, 2022, 5:35 p.m. UTC | #2
Hi Carlos,

> This impacts all architectures.

That was the goal indeed - we should add single-threaded optimizations in a
generic way.

> If we don't have a benchmark that shows this specific case of ST vs MT and
> internal __libc_lock_lock-locks then we should add one. Improving the internal
> locking for our algorithms is always going to be a point of interest for IHVs.

I can easily wrap my rand() microbench in json and add it to the benchtests.
I think it would be harder to do more tests on internal locks/headers since they
are not easily usable from benchtest infrastructure (just including libc-lock.h
results in lots of errors...).

Cheers,
Wilco
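A rand() loop of the kind mentioned above could look roughly like the sketch below. It is not the actual microbenchmark from this thread and does not use the benchtests json harness; `time_rand_loop` and its parameters are hypothetical names chosen for illustration, and timing uses the standard C clock() function.

```c
#include <stdlib.h>
#include <time.h>

/* Time ITERS calls to rand () and return the elapsed CPU time in
   seconds.  The results are accumulated and stored through SINK so
   the compiler cannot discard the calls.  */
static double
time_rand_loop (long iters, unsigned *sink)
{
  unsigned acc = 0;
  clock_t start = clock ();

  for (long i = 0; i < iters; i++)
    acc += (unsigned) rand ();

  clock_t end = clock ();
  *sink = acc;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by the iteration count gives a per-call cost that can be compared before and after the patch.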
  
Noah Goldstein Aug. 16, 2022, 7:26 a.m. UTC | #3
On Tue, Aug 16, 2022 at 1:35 AM Wilco Dijkstra via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>  Hi Carlos,
>
> > This impacts all architectures.
>
> That was the goal indeed - we should add single-threaded optimizations in a
> generic way.
>
> > If we don't have a benchmark that shows this specific case of ST vs MT and
> > internal __libc_lock_lock-locks then we should add one. Improving the internal
> > locking for our algorithms is always going to be a point of interest for IHVs.
>
> I can easily wrap my rand() microbench in json and add it to the benchtests.

I think that would be good, so we can easily measure on other architectures.

> I think it would be harder to do more tests on internal locks/headers since they
> are not easily usable from benchtest infrastructure (just including libc-lock.h
> results in lots of errors...).
>
> Cheers,
> Wilco
  
Cristian Rodríguez Nov. 15, 2022, 8:17 p.m. UTC | #4
On Thu, Aug 11, 2022 at 12:23 PM Wilco Dijkstra via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Improve performance of libc locks by adding a fast path for the
> single-threaded case.
>
> On Neoverse V1, a loop using rand() improved 3.6 times. Multithreaded
> performance is unchanged.
>
> Passes regress on AArch64, OK for commit?
>
> ---
>

Ping? I saw the stdio one was committed, but what happened with this one?
  
Wilco Dijkstra Dec. 9, 2022, 2:10 p.m. UTC | #5
Hi Cristian,

>> Improve performance of libc locks by adding a fast path for the
>> single-threaded case.
>>
>> On Neoverse V1, a loop using rand() improved 3.6 times. Multithreaded
>> performance is unchanged.
>>
>> Passes regress on AArch64, OK for commit?
>
> Ping? I saw the stdio one was committed, but what happened with this one?

It is waiting on the locking benchmark being accepted. I've pinged that
(https://sourceware.org/pipermail/libc-alpha/2022-December/143944.html)
since it would be great to get all this into the next glibc release.

Cheers,
Wilco
  

Patch

diff --git a/sysdeps/nptl/libc-lockP.h b/sysdeps/nptl/libc-lockP.h
index d3a6837fd212f3f5dfd80f46d0e9ce365042ae0c..ccdb11fee6f14069d0b936be93d0f0fa6d8bc30b 100644
--- a/sysdeps/nptl/libc-lockP.h
+++ b/sysdeps/nptl/libc-lockP.h
@@ -108,7 +108,14 @@  _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
 #define __libc_rwlock_fini(NAME) ((void) 0)
 
 /* Lock the named lock variable.  */
-#define __libc_lock_lock(NAME) ({ lll_lock (NAME, LLL_PRIVATE); 0; })
+#define __libc_lock_lock(NAME)						\
+ ({									\
+    if (SINGLE_THREAD_P)						\
+      (NAME) = LLL_LOCK_INITIALIZER_LOCKED;				\
+    else								\
+      lll_lock (NAME, LLL_PRIVATE);					\
+    0;									\
+  })
 #define __libc_rwlock_rdlock(NAME) __pthread_rwlock_rdlock (&(NAME))
 #define __libc_rwlock_wrlock(NAME) __pthread_rwlock_wrlock (&(NAME))
 
@@ -116,7 +123,14 @@  _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
 #define __libc_lock_trylock(NAME) lll_trylock (NAME)
 
 /* Unlock the named lock variable.  */
-#define __libc_lock_unlock(NAME) lll_unlock (NAME, LLL_PRIVATE)
+#define __libc_lock_unlock(NAME)					\
+ ({									\
+    if (SINGLE_THREAD_P)						\
+      (NAME) = LLL_LOCK_INITIALIZER;					\
+    else								\
+      lll_unlock (NAME, LLL_PRIVATE);					\
+    0;									\
+ })
 #define __libc_rwlock_unlock(NAME) __pthread_rwlock_unlock (&(NAME))
 
 #if IS_IN (rtld)