malloc: Use __get_nprocs on arena_get2 (BZ 30945)
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
redhat-pt-bot/TryBot-32bit | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
Commit Message
This restores the 2.33 semantics for arena_get2. It was changed by
11a02b035b46 to avoid arena_get2 calling malloc (back when __get_nprocs
was refactored to use a scratch_buffer - 903bc7dcc2acafc). The
__get_nprocs was refactored again since then and now also avoids
calling malloc.
Commit 11a02b035b46 did not take into consideration any performance
implications, which should have been discussed properly. The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs are not accessible.
Checked on x86_64-linux-gnu.
---
include/sys/sysinfo.h | 4 ----
malloc/arena.c | 2 +-
misc/getsysstats.c | 6 ------
sysdeps/mach/getsysstats.c | 6 ------
sysdeps/unix/sysv/linux/getsysstats.c | 2 +-
5 files changed, 2 insertions(+), 18 deletions(-)
Comments
Part of the reason for using get_nprocs_sched() is that cpu affinity can
limit the application to fewer processors, and the logic needs to take
that into account.
If you have a 192-core system but use setaffinity to limit a process to
4 cores, there will be *way* too many arenas created.
On 11/10/23 15:54, DJ Delorie wrote:
>
> Part of the reason for using get_nprocs_sched() is that cpu affinity can
> limit the application to fewer processors, and the logic needs to take
> that into account.
>
> If you have a 192-core system but use setaffinity to limit a process to
> 4 cores, there will be *way* too many arenas created.
>
Yes, that was your rationale on the previous attempt [1]. Maybe adding a
tunable, so if you are using affinity you can change this behavior?
In any case, I think we should either close the bug or fix this and work
on a better heuristic to handle processes that set affinity.
[1] https://sourceware.org/pipermail/libc-alpha/2022-June/140161.html
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> Yes, that was your rationale on the previous attempt [1]. Maybe adding a
> tunable, so if you are using affinity you can change this behavior?
We have a tunable to set the max arenas already.
Measuring the number of runnable CPUs seems more of a "correctness"
thing to me. This code is only called when we need to create a new
arena, so it's not in a performance-critical path. It's more important
to get this right so that the rest of the code maximizes performance.
> In any case, I think we should either close the bug or fix this and work
> on a better heuristic to handle processes that set affinity.
It sounds like we need something between get_nprocs and get_nprocs_sched
that counts the number of cores the program is assigned, not the number
of cores the thread is assigned? I thought that's what
sched_getaffinity() did, but the man page says "process" most of the
time, but sometimes "thread".
On 11/10/23 17:34, DJ Delorie wrote:
> Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
>> Yes, that was your rationale on the previous attempt [1]. Maybe adding a
>> tunable, so if you are using affinity you can change this behavior?
>
> We have a tunable to set the max arenas already.
>
> Measuring the number of runnable CPUs seems more of a "correctness"
> thing to me. This code is only called when we need to create a new
> arena, so it's not in a performance-critical path. It's more important
> to get this right so that the rest of the code maximizes performance.
I agree; my main problem with this change is that it was originally
unintentional. My suggestion would be to revert it back, then make the
change properly with a meaningful commit message stating that it is
intentional, and that processes that use setaffinity and see contention
should increase the max arenas.
>
>> In any case, I think we should either close the bug or fix this and work
>> on a better heuristic to handle processes that set affinity.
>
> It sounds like we need something between get_nprocs and get_nprocs_sched
> that counts the number of cores the program is assigned, not the number
> of cores the thread is assigned? I thought that's what
> sched_getaffinity() did, but the man page says "process" most of the
> time, but sometimes "thread".
>
The sched_getaffinity on Linux is essentially pthread_getaffinity_np,
since the CPU mask is a per-thread attribute. And I couldn't find any
procfs way to access the AND of all the masks within the thread group.
I see no easy way to accomplish it; iterating over /proc/self/task
is way too slow and does not scale. The best option I can think of is
to add a global mask and keep track of set bits on each successful
sched_setaffinity call; but it would require at least some global
locking to avoid race conditions.
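The global-mask idea floated above might look like the following sketch; all names here are hypothetical, not glibc internals:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical process-wide tracking of affinity masks: accumulate
   the CPUs any thread was ever allowed on, under a lock.  */
static cpu_set_t process_mask;
static pthread_mutex_t mask_lock = PTHREAD_MUTEX_INITIALIZER;

int tracked_setaffinity (pid_t tid, const cpu_set_t *mask)
{
  int ret = sched_setaffinity (tid, sizeof (cpu_set_t), mask);
  if (ret == 0)
    {
      pthread_mutex_lock (&mask_lock);
      /* OR the new mask into the global one on success.  */
      CPU_OR (&process_mask, &process_mask, mask);
      pthread_mutex_unlock (&mask_lock);
    }
  return ret;
}

int tracked_cpu_count (void)
{
  pthread_mutex_lock (&mask_lock);
  int n = CPU_COUNT (&process_mask);
  pthread_mutex_unlock (&mask_lock);
  return n;
}
```

Note this ORs masks (an upper bound on schedulable CPUs) rather than ANDing them, which sidesteps the problem of a shrinking mask undercounting for other threads, at the cost of the locking overhead mentioned above.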
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes:
> On 11/10/23 15:54, DJ Delorie wrote:
>>
>> Part of the reason for using get_nprocs_sched() is that cpu affinity can
>> limit the application to fewer processors, and the logic needs to take
>> that into account.
>>
>> If you have a 192-core system but use setaffinity to limit a process to
>> 4 cores, there will be *way* too many arenas created.
>>
>
> Yes, that was your rationale on the previous attempt [1]. Maybe adding a
> tunable, so if you are using affinity you can change this behavior?
>
> In any case, I think we should either close the bug or fix this and work
> on a better heuristic to handle processes that set affinity.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2022-June/140161.html
Rather than let this issue rot, I'm withdrawing my dissent. I still
think we need to resolve this issue with something between "nprocs" and
"thread procs" - perhaps store the sched mask at startup and use that?
But that could be deferred to some new development.
I think the max-arenas tunable is more useful if you err on the side of
too many arenas, too.
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
> @@ -29,7 +29,7 @@
> #include <sys/sysinfo.h>
> #include <sysdep.h>
>
> -int
> +static int
> __get_nprocs_sched (void)
> {
> enum
So we're making this function internal again... ok.
> diff --git a/include/sys/sysinfo.h b/include/sys/sysinfo.h
>
> -/* Return the number of available processors which the process can
> - be scheduled. */
> -extern int __get_nprocs_sched (void) attribute_hidden;
> -
which means we no longer need this declaration. Ok.
> diff --git a/malloc/arena.c b/malloc/arena.c
> @@ -820,7 +820,7 @@ arena_get2 (size_t size, mstate avoid_arena)
> narenas_limit = mp_.arena_max;
> else if (narenas > mp_.arena_test)
> {
> - int n = __get_nprocs_sched ();
> + int n = __get_nprocs ();
And can't use it here. Ok.
> diff --git a/misc/getsysstats.c b/misc/getsysstats.c
> @@ -44,12 +44,6 @@ weak_alias (__get_nprocs, get_nprocs)
> link_warning (get_nprocs, "warning: get_nprocs will always return 1")
>
>
> -int
> -__get_nprocs_sched (void)
> -{
> - return 1;
> -}
> -
Nor need it here. Ok.
> diff --git a/sysdeps/mach/getsysstats.c b/sysdeps/mach/getsysstats.c
> @@ -62,12 +62,6 @@ __get_nprocs (void)
> libc_hidden_def (__get_nprocs)
> weak_alias (__get_nprocs, get_nprocs)
>
> -int
> -__get_nprocs_sched (void)
> -{
> - return __get_nprocs ();
> -}
> -
Or here. Ok.
LGTM.
Reviewed-by: DJ Delorie <dj@redhat.com>
--- a/include/sys/sysinfo.h
+++ b/include/sys/sysinfo.h
@@ -14,10 +14,6 @@ libc_hidden_proto (__get_nprocs_conf)
extern int __get_nprocs (void);
libc_hidden_proto (__get_nprocs)
-/* Return the number of available processors which the process can
- be scheduled. */
-extern int __get_nprocs_sched (void) attribute_hidden;
-
/* Return number of physical pages of memory in the system. */
extern long int __get_phys_pages (void);
libc_hidden_proto (__get_phys_pages)
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -820,7 +820,7 @@ arena_get2 (size_t size, mstate avoid_arena)
narenas_limit = mp_.arena_max;
else if (narenas > mp_.arena_test)
{
- int n = __get_nprocs_sched ();
+ int n = __get_nprocs ();
if (n >= 1)
narenas_limit = NARENAS_FROM_NCORES (n);
--- a/misc/getsysstats.c
+++ b/misc/getsysstats.c
@@ -44,12 +44,6 @@ weak_alias (__get_nprocs, get_nprocs)
link_warning (get_nprocs, "warning: get_nprocs will always return 1")
-int
-__get_nprocs_sched (void)
-{
- return 1;
-}
-
long int
__get_phys_pages (void)
{
--- a/sysdeps/mach/getsysstats.c
+++ b/sysdeps/mach/getsysstats.c
@@ -62,12 +62,6 @@ __get_nprocs (void)
libc_hidden_def (__get_nprocs)
weak_alias (__get_nprocs, get_nprocs)
-int
-__get_nprocs_sched (void)
-{
- return __get_nprocs ();
-}
-
/* Return the number of physical pages on the system. */
long int
__get_phys_pages (void)
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -29,7 +29,7 @@
#include <sys/sysinfo.h>
#include <sysdep.h>
-int
+static int
__get_nprocs_sched (void)
{
enum