Created tunable to force small pages on stack allocation.

Message ID 20230310143558.361825-2-cupertino.miranda@oracle.com
State Superseded
Headers
Series Created tunable to force small pages on stack allocation. |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Cupertino Miranda March 10, 2023, 2:35 p.m. UTC
  Created tunable glibc.pthread.stack_hugetlb to control when hugepages
can be used for stack allocation.
In case THP are enabled and glibc.pthread.stack_hugetlb is set to
0, glibc will madvise the kernel not to use allow hugepages for stack
allocations.
---
 nptl/allocatestack.c          |  6 ++++++
 nptl/nptl-stack.c             |  1 +
 nptl/nptl-stack.h             |  3 +++
 nptl/pthread_mutex_conf.c     | 23 +++++++++++++++++++++++
 sysdeps/nptl/dl-tunables.list |  6 ++++++
 5 files changed, 39 insertions(+)
  

Comments

Florian Weimer March 13, 2023, 7:54 a.m. UTC | #1
* Cupertino Miranda via Libc-alpha:

> +static void
> +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
> +{
> +  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
> +  /*
> +     Only allow to change the default of this tunable if the system
> +     does support and has either 'madvise' or 'always' mode. Otherwise
> +     the madvise() call is wasteful.
> +  */
> +  switch(thp_mode)
> +    {
> +    case malloc_thp_mode_always:
> +    case malloc_thp_mode_madvise:
> +      __nptl_stack_hugetlb = (int32_t) valp->numval;
> +      break;
> +    default:
> +      break;
> +    }
> +}

I suspect that quite a few failing madvise calls are cheaper than the
those three system calls in __malloc_thp_mode.  In addition,
__malloc_thp_mode may fail due to future kernel changes, disabling the
tunable by accident.

Thanks,
Florian
  
Cupertino Miranda March 13, 2023, 10:07 a.m. UTC | #2
Hi Florian

> * Cupertino Miranda via Libc-alpha:
>
>> +static void
>> +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
>> +{
>> +  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>> +  /*
>> +     Only allow to change the default of this tunable if the system
>> +     does support and has either 'madvise' or 'always' mode. Otherwise
>> +     the madvise() call is wasteful.
>> +  */
>> +  switch(thp_mode)
>> +    {
>> +    case malloc_thp_mode_always:
>> +    case malloc_thp_mode_madvise:
>> +      __nptl_stack_hugetlb = (int32_t) valp->numval;
>> +      break;
>> +    default:
>> +      break;
>> +    }
>> +}
>
> I suspect that quite a few failing madvise calls are cheaper than the
> those three system calls in __malloc_thp_mode.  In addition,
> __malloc_thp_mode may fail due to future kernel changes, disabling the
> tunable by accident.
Thanks for your review. Ok, I did not check the inner workings of
malloc_thp_mode. I used it by suggestion from the RFC thread.
I will prepare a version removing it.

>
> Thanks,
> Florian
  
Cupertino Miranda March 13, 2023, 5:10 p.m. UTC | #3
Cupertino Miranda via Libc-alpha writes:

> Hi Florian
>
>> * Cupertino Miranda via Libc-alpha:
>>
>>> +static void
>>> +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
>>> +{
>>> +  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>>> +  /*
>>> +     Only allow to change the default of this tunable if the system
>>> +     does support and has either 'madvise' or 'always' mode. Otherwise
>>> +     the madvise() call is wasteful.
>>> +  */
>>> +  switch(thp_mode)
>>> +    {
>>> +    case malloc_thp_mode_always:
>>> +    case malloc_thp_mode_madvise:
>>> +      __nptl_stack_hugetlb = (int32_t) valp->numval;
>>> +      break;
>>> +    default:
>>> +      break;
>>> +    }
>>> +}
>>
>> I suspect that quite a few failing madvise calls are cheaper than the
>> those three system calls in __malloc_thp_mode.  In addition,
>> __malloc_thp_mode may fail due to future kernel changes, disabling the
>> tunable by accident.
> Thanks for your review. Ok, I did not check the inner workings of
> malloc_thp_mode. I used it by suggestion from the RFC thread.
> I will prepare a version removing it.
>
>>
>> Thanks,
>> Florian



kkjjj   b
  
Adhemerval Zanella March 13, 2023, 5:15 p.m. UTC | #4
On 13/03/23 07:07, Cupertino Miranda wrote:
> 
> Hi Florian
> 
>> * Cupertino Miranda via Libc-alpha:
>>
>>> +static void
>>> +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
>>> +{
>>> +  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>>> +  /*
>>> +     Only allow to change the default of this tunable if the system
>>> +     does support and has either 'madvise' or 'always' mode. Otherwise
>>> +     the madvise() call is wasteful.
>>> +  */
>>> +  switch(thp_mode)
>>> +    {
>>> +    case malloc_thp_mode_always:
>>> +    case malloc_thp_mode_madvise:
>>> +      __nptl_stack_hugetlb = (int32_t) valp->numval;
>>> +      break;
>>> +    default:
>>> +      break;
>>> +    }
>>> +}
>>
>> I suspect that quite a few failing madvise calls are cheaper than the
>> those three system calls in __malloc_thp_mode.  In addition,
>> __malloc_thp_mode may fail due to future kernel changes, disabling the
>> tunable by accident.
> Thanks for your review. Ok, I did not check the inner workings of
> malloc_thp_mode. I used it by suggestion from the RFC thread.
> I will prepare a version removing it.

It only make sense to madvise iff __malloc_thp_mode is set as 'always',
and tunable is a opt-in feature.  So I think it would make sense to
check the sysfs is tunable is used; the startup code will be amortized
on high threads workloads.
  
Cupertino Miranda March 13, 2023, 5:17 p.m. UTC | #5
> I suspect that quite a few failing madvise calls are cheaper than the
> those three system calls in __malloc_thp_mode.  In addition,
> __malloc_thp_mode may fail due to future kernel changes, disabling the
> tunable by accident.

Please find the new patch which removes the __malloc_thp_mode
verification below.
Looking forward to your comments.

Cupertino
  
Cupertino Miranda March 13, 2023, 5:18 p.m. UTC | #6
> kkjjj   b
Apologies, sent this garbage by mistake when email client was not
responsive. :(
  
Cupertino Miranda March 13, 2023, 5:32 p.m. UTC | #7
Adhemerval Zanella Netto writes:

> On 13/03/23 07:07, Cupertino Miranda wrote:
>>
>> Hi Florian
>>
>>> * Cupertino Miranda via Libc-alpha:
>>>
>>>> +static void
>>>> +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
>>>> +{
>>>> +  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>>>> +  /*
>>>> +     Only allow to change the default of this tunable if the system
>>>> +     does support and has either 'madvise' or 'always' mode. Otherwise
>>>> +     the madvise() call is wasteful.
>>>> +  */
>>>> +  switch(thp_mode)
>>>> +    {
>>>> +    case malloc_thp_mode_always:
>>>> +    case malloc_thp_mode_madvise:
>>>> +      __nptl_stack_hugetlb = (int32_t) valp->numval;
>>>> +      break;
>>>> +    default:
>>>> +      break;
>>>> +    }
>>>> +}
>>>
>>> I suspect that quite a few failing madvise calls are cheaper than the
>>> those three system calls in __malloc_thp_mode.  In addition,
>>> __malloc_thp_mode may fail due to future kernel changes, disabling the
>>> tunable by accident.
>> Thanks for your review. Ok, I did not check the inner workings of
>> malloc_thp_mode. I used it by suggestion from the RFC thread.
>> I will prepare a version removing it.
>
> It only make sense to madvise iff __malloc_thp_mode is set as 'always',
> and tunable is a opt-in feature.  So I think it would make sense to
> check the sysfs is tunable is used; the startup code will be amortized
> on high threads workloads.

From my side either solution is Ok. Using the __malloc_thp_mode would
protect the code from calls to madvise in systems that do not have
hugetlb support. This would avoid returning an error on madvise if the
advice argument is not identified as valid.
  

Patch

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index c7adbccd6f..c792c6ed1f 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -369,6 +369,12 @@  allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	  if (__glibc_unlikely (mem == MAP_FAILED))
 	    return errno;
 
+	  /* Do madvise in case the tunable glibc.pthread.stack_hugetlb is
+	     set to 0, disabling hugetlb.  */
+	  if (__glibc_unlikely (__nptl_stack_hugetlb == 0)
+	      && __madvise(mem, size, MADV_NOHUGEPAGE) != 0)
+	    return errno;
+
 	  /* SIZE is guaranteed to be greater than zero.
 	     So we can never get a null pointer back from mmap.  */
 	  assert (mem != NULL);
diff --git a/nptl/nptl-stack.c b/nptl/nptl-stack.c
index 5eb7773575..e829711cb5 100644
--- a/nptl/nptl-stack.c
+++ b/nptl/nptl-stack.c
@@ -21,6 +21,7 @@ 
 #include <pthreadP.h>
 
 size_t __nptl_stack_cache_maxsize = 40 * 1024 * 1024;
+int32_t __nptl_stack_hugetlb = 1;
 
 void
 __nptl_stack_list_del (list_t *elem)
diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h
index 34f8bbb15e..d5b2612b4a 100644
--- a/nptl/nptl-stack.h
+++ b/nptl/nptl-stack.h
@@ -27,6 +27,9 @@ 
 /* Maximum size of the cache, in bytes.  40 MiB by default.  */
 extern size_t __nptl_stack_cache_maxsize attribute_hidden;
 
+/* Should allow stacks to use hugetlb. 1 is default */
+extern int32_t __nptl_stack_hugetlb;
+
 /* Check whether the stack is still used or not.  */
 static inline bool
 __nptl_stack_in_use (struct pthread *pd)
diff --git a/nptl/pthread_mutex_conf.c b/nptl/pthread_mutex_conf.c
index 329c4cbb8f..6934b048b8 100644
--- a/nptl/pthread_mutex_conf.c
+++ b/nptl/pthread_mutex_conf.c
@@ -24,6 +24,7 @@ 
 #include <unistd.h>  /* Get STDOUT_FILENO for _dl_printf.  */
 #include <elf/dl-tunables.h>
 #include <nptl-stack.h>
+#include <malloc-hugepages.h>
 
 struct mutex_config __mutex_aconf =
 {
@@ -45,6 +46,26 @@  TUNABLE_CALLBACK (set_stack_cache_size) (tunable_val_t *valp)
   __nptl_stack_cache_maxsize = valp->numval;
 }
 
+static void
+TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp)
+{
+  enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
+  /*
+     Only allow to change the default of this tunable if the system
+     does support and has either 'madvise' or 'always' mode. Otherwise
+     the madvise() call is wasteful.
+  */
+  switch(thp_mode)
+    {
+    case malloc_thp_mode_always:
+    case malloc_thp_mode_madvise:
+      __nptl_stack_hugetlb = (int32_t) valp->numval;
+      break;
+    default:
+      break;
+    }
+}
+
 void
 __pthread_tunables_init (void)
 {
@@ -52,5 +73,7 @@  __pthread_tunables_init (void)
                TUNABLE_CALLBACK (set_mutex_spin_count));
   TUNABLE_GET (stack_cache_size, size_t,
                TUNABLE_CALLBACK (set_stack_cache_size));
+  TUNABLE_GET (stack_hugetlb, int32_t,
+	       TUNABLE_CALLBACK (set_stack_hugetlb));
 }
 #endif
diff --git a/sysdeps/nptl/dl-tunables.list b/sysdeps/nptl/dl-tunables.list
index bd1ddb121d..22fa9e0d12 100644
--- a/sysdeps/nptl/dl-tunables.list
+++ b/sysdeps/nptl/dl-tunables.list
@@ -33,5 +33,11 @@  glibc {
       maxval: 1
       default: 1
     }
+    stack_hugetlb {
+      type: INT_32
+      minval: 0
+      maxval: 1
+      default: 0
+    }
   }
 }