[v3,4/5] malloc: Add Huge Page support for mmap()
Checks
Context: dj/TryBot-apply_patch
Check: success
Description: Patch applied to master at the time it was sent
Commit Message
With the morecore hook gone, there is no easy way to force the system
allocator to use huge pages directly without resorting to transparent
huge pages. Some users and programs do prefer to use huge pages
directly instead of THP for multiple reasons: no splitting or
re-merging by the VM, no TLB shootdowns for running processes, fast
allocation from the reserve pool, no competition with the rest of the
processes unlike THP, no swapping at all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a larger value means a specific page size that is matched
against the ones supported by the system.
Currently only memory allocated with sysmalloc() is handled; the
arenas still use the default system page size.
For testing, a new rule, tests-malloc-hugetlb2, is added; it runs the
added tests with the required GLIBC_TUNABLES setting. On systems
without a reserved huge page pool, this just stresses the
mmap(MAP_HUGETLB) allocation failure path. To improve test coverage
it is required to create a pool with some allocated pages.
Checked on x86_64-linux-gnu.
---
NEWS | 12 +-
Rules | 17 +++
elf/dl-tunables.list | 3 +-
elf/tst-rtld-list-tunables.exp | 2 +-
malloc/Makefile | 12 +-
malloc/malloc.c | 30 ++++-
manual/tunables.texi | 7 ++
sysdeps/generic/malloc-hugepages.c | 7 ++
sysdeps/generic/malloc-hugepages.h | 7 ++
sysdeps/unix/sysv/linux/malloc-hugepages.c | 126 +++++++++++++++++++++
10 files changed, 209 insertions(+), 14 deletions(-)
Comments
Hi Adhemerval,
I gave this new version a try and I'm still hitting the
malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
tried to debug it but I failed to find the root cause so far.
I could not reproduce it on x86_64 either, only ppc64le. Please let me
know if there's any test I can run to help.
[1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
[2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
--
Matheus Castanho
On 26/08/2021 15:35, Matheus Castanho wrote:
>
> Hi Adhemerval,
>
> I gave this new version a try and I'm still hitting the
> malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
> tried to debug it but I failed to find the root cause so far.
>
> I could not reproduce it on x86_64 either, only ppc64le. Please let me
> know if there's any test I can run to help.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
I need to send an updated version of the 5/5. But is it the same failure
as before? Unfortunately the powerpc machines I have access to don't have
any large pages configured, and I couldn't reproduce this issue on
aarch64 either.
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 26/08/2021 15:35, Matheus Castanho wrote:
>>
>> Hi Adhemerval,
>>
>> I gave this new version a try and I'm still hitting the
>> malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
>> tried to debug it but I failed to find the root cause so far.
>>
>> I could not reproduce it on x86_64 either, only ppc64le. Please let me
>> know if there's any test I can run to help.
>>
>> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
>> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
>
> I need to send an updated version of the 5/5. But is it the same failure
> as before?
For malloc/tst-free-errno*, yes. The others were fixed in the new
version. And these happen even without 5/5.
$ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno
[...]
double free or corruption (out)
make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
FAIL: malloc/tst-free-errno
original exit status 1
Didn't expect signal from child: got `Aborted'
$ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-mcheck
[...]
memory clobbered past end of allocated block
make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
FAIL: malloc/tst-free-errno-mcheck
original exit status 1
Didn't expect signal from child: got `Aborted'
$ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-malloc-check
[...]
make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
FAIL: malloc/tst-free-errno-malloc-check
original exit status 1
error: xmmap.c:29: mmap of 16908288 bytes, prot=0x3, flags=0x32: Device or resource busy
error: 1 test failures
> Unfortunately the powerpc machines I have access to don't have
> any large pages configured, and I couldn't reproduce this issue on
> aarch64 either.
--
Matheus Castanho
Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
>> On 26/08/2021 15:35, Matheus Castanho wrote:
>>>
>>> Hi Adhemerval,
>>>
>>> I gave this new version a try and I'm still hitting the
>>> malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
>>> tried to debug it but I failed to find the root cause so far.
>>>
>>> I could not reproduce it on x86_64 either, only ppc64le. Please let me
>>> know if there's any test I can run to help.
>>>
>>> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
>>> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
>>
>> I need to send an updated version of the 5/5. But is it the same failure
>> as before?
>
> For malloc/tst-free-errno*, yes. The others were fixed in the new
> version. And these happen even without 5/5.
>
> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno
> [...]
> double free or corruption (out)
> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
> FAIL: malloc/tst-free-errno
> original exit status 1
> Didn't expect signal from child: got `Aborted'
>
> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-mcheck
> [...]
> memory clobbered past end of allocated block
> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
> FAIL: malloc/tst-free-errno-mcheck
> original exit status 1
> Didn't expect signal from child: got `Aborted'
>
> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-malloc-check
> [...]
> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
> FAIL: malloc/tst-free-errno-malloc-check
> original exit status 1
> error: xmmap.c:29: mmap of 16908288 bytes, prot=0x3, flags=0x32: Device or resource busy
> error: 1 test failures
>
More info:
I noticed that big_size used in tst-free-errno.c (0x1000000) happens to
match exactly the hugepage size of the system I was testing the patches
on (16 MB).
The tests pass when running on another POWER9 that uses 2 MB hugepages.
>> Unfortunately the powerpc machines I have access to don't have
>> any large pages configured, and I couldn't reproduce this issue on
>> aarch64 either.
--
Matheus Castanho
On 26/08/2021 18:26, Matheus Castanho wrote:
>
> Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:
>
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>
>>> On 26/08/2021 15:35, Matheus Castanho wrote:
>>>>
>>>> Hi Adhemerval,
>>>>
>>>> I gave this new version a try and I'm still hitting the
>>>> malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
>>>> tried to debug it but I failed to find the root cause so far.
>>>>
>>>> I could not reproduce it on x86_64 either, only ppc64le. Please let me
>>>> know if there's any test I can run to help.
>>>>
>>>> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
>>>> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
>>>
>>> I need to send an updated version of the 5/5. But is it the same failure
>>> as before?
>>
>> For malloc/tst-free-errno*, yes. The others were fixed in the new
>> version. And these happen even without 5/5.
>>
>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno
>> [...]
>> double free or corruption (out)
>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>> FAIL: malloc/tst-free-errno
>> original exit status 1
>> Didn't expect signal from child: got `Aborted'
>>
>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-mcheck
>> [...]
>> memory clobbered past end of allocated block
>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>> FAIL: malloc/tst-free-errno-mcheck
>> original exit status 1
>> Didn't expect signal from child: got `Aborted'
>>
>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-malloc-check
>> [...]
>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>> FAIL: malloc/tst-free-errno-malloc-check
>> original exit status 1
>> error: xmmap.c:29: mmap of 16908288 bytes, prot=0x3, flags=0x32: Device or resource busy
>> error: 1 test failures
>>
>
> More info:
>
> I noticed that big_size used in tst-free-errno.c (0x1000000) happens to
> match exactly the hugepage size of the system I was testing the patches
> on (16 MB).
>
> The tests pass when running on another POWER9 that uses 2 MB hugepages.
I think I know what might be happening here: the test mmaps a region
overlapping the memory allocated by mmap at line 92. To avoid trashing
any malloc metadata it also backs up and copies back the first page,
which contains the malloc metadata. The problem is that, depending on
the page size used for the malloc() call, the backup made by the test
might not suffice: the metadata might lie beyond
[ptr_aligned, ptr_aligned + pagesize].
And since we might not know which page size is used by malloc (it
might depend on the pages available in the system global pool at the
moment of the mmap reservation), there is no easy way to find which
page size to use for the backup.
I think it would be better to just filter out this test for
malloc-tests-hugetlb2.
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 26/08/2021 18:26, Matheus Castanho wrote:
>>
>> Matheus Castanho via Libc-alpha <libc-alpha@sourceware.org> writes:
>>
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>>
>>>> On 26/08/2021 15:35, Matheus Castanho wrote:
>>>>>
>>>>> Hi Adhemerval,
>>>>>
>>>>> I gave this new version a try and I'm still hitting the
>>>>> malloc/tst-free-errno* test failures on ppc64le as before [1][2]. I
>>>>> tried to debug it but I failed to find the root cause so far.
>>>>>
>>>>> I could not reproduce it on x86_64 either, only ppc64le. Please let me
>>>>> know if there's any test I can run to help.
>>>>>
>>>>> [1] https://sourceware.org/pipermail/libc-alpha/2021-August/130321.html
>>>>> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130344.html
>>>>
>>>> I need to send an updated version of the 5/5. But is it the same failure
>>>> as before?
>>>
>>> For malloc/tst-free-errno*, yes. The others were fixed in the new
>>> version. And these happen even without 5/5.
>>>
>>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno
>>> [...]
>>> double free or corruption (out)
>>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>>> FAIL: malloc/tst-free-errno
>>> original exit status 1
>>> Didn't expect signal from child: got `Aborted'
>>>
>>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-mcheck
>>> [...]
>>> memory clobbered past end of allocated block
>>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>>> FAIL: malloc/tst-free-errno-mcheck
>>> original exit status 1
>>> Didn't expect signal from child: got `Aborted'
>>>
>>> $ GLIBC_TUNABLES="glibc.malloc.hugetlb=2" make test t=malloc/tst-free-errno-malloc-check
>>> [...]
>>> make[2]: Leaving directory '/home/mscastanho/src/glibc/malloc'
>>> FAIL: malloc/tst-free-errno-malloc-check
>>> original exit status 1
>>> error: xmmap.c:29: mmap of 16908288 bytes, prot=0x3, flags=0x32: Device or resource busy
>>> error: 1 test failures
>>>
>>
>> More info:
>>
>> I noticed that big_size used in tst-free-errno.c (0x1000000) happens to
>> match exactly the hugepage size of the system I was testing the patches
>> on (16 MB).
>>
>> The tests pass when running on another POWER9 that uses 2 MB hugepages.
>
> I think I know what might be happening here: the test mmaps a region
> overlapping the memory allocated by mmap at line 92. To avoid trashing
> any malloc metadata it also backs up and copies back the first page,
> which contains the malloc metadata. The problem is that, depending on
> the page size used for the malloc() call, the backup made by the test
> might not suffice: the metadata might lie beyond
> [ptr_aligned, ptr_aligned + pagesize].
>
That seems to match what I observed while debugging. getpagesize()
returns 64K, but since malloc is using huge pages, the actual page size
used in the allocation is 16 MB in this case, so the backup does not
actually preserve the entire page.
> And since we might not know which page size is used by malloc (it
> might depend on the pages available in the system global pool at the
> moment of the mmap reservation), there is no easy way to find which
> page size to use for the backup.
>
> I think it would be better to just filter out this test for
> malloc-tests-hugetlb2.
--
Matheus Castanho
@@ -10,9 +10,15 @@ Version 2.35
Major new features:
* On Linux, a new tunable, glibc.malloc.hugetlb, can be used to
- make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
- It might improve performance with Transparent Huge Pages madvise mode
- depending of the workload.
+ either make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk
+ calls, or make it use huge pages directly with mmap calls with the
+ MAP_HUGETLB flag. The former can improve performance when Transparent
+ Huge Pages is set to 'madvise' mode, while the latter uses the system
+ reserved huge pages.
+
+* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
+ instruct malloc to try to use Huge Pages when allocating memory with
+ mmap() calls (through the use of MAP_HUGETLB).
Deprecated and removed features, and other changes affecting compatibility:
@@ -158,6 +158,7 @@ tests: $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
$(tests-mcheck:%=$(objpfx)%-mcheck.out) \
$(tests-malloc-check:%=$(objpfx)%-malloc-check.out) \
$(tests-malloc-hugetlb1:%=$(objpfx)%-malloc-hugetlb1.out) \
+ $(tests-malloc-hugetlb2:%=$(objpfx)%-malloc-hugetlb2.out) \
$(tests-special) $(tests-printers-out)
xtests: tests $(xtests:%=$(objpfx)%.out) $(xtests-special)
endif
@@ -170,6 +171,7 @@ else
tests-expected = $(tests) $(tests-internal) $(tests-printers) \
$(tests-container) $(tests-malloc-check:%=%-malloc-check) \
$(tests-malloc-hugetlb1:%=%-malloc-hugetlb1) \
+ $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2) \
$(tests-mcheck:%=%-mcheck)
endif
tests:
@@ -199,6 +201,7 @@ endif
binaries-mcheck-tests = $(tests-mcheck:%=%-mcheck)
binaries-malloc-check-tests = $(tests-malloc-check:%=%-malloc-check)
binaries-malloc-hugetlb1-tests = $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1)
+binaries-malloc-hugetlb2-tests = $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2)
else
binaries-all-notests =
binaries-all-tests = $(tests) $(tests-internal) $(xtests) $(test-srcs)
@@ -211,6 +214,7 @@ binaries-pie-notests =
binaries-mcheck-tests =
binaries-malloc-check-tests =
binaries-malloc-hugetlb1-tests =
+binaries-malloc-hugetlb2-tests =
endif
binaries-pie = $(binaries-pie-tests) $(binaries-pie-notests)
@@ -259,6 +263,14 @@ $(addprefix $(objpfx),$(binaries-malloc-hugetlb1-tests)): %-malloc-hugetlb1: %.o
$(+link-tests)
endif
+ifneq "$(strip $(binaries-malloc-hugetlb2-tests))" ""
+$(addprefix $(objpfx),$(binaries-malloc-hugetlb2-tests)): %-malloc-hugetlb2: %.o \
+ $(link-extra-libs-tests) \
+ $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
+ $(addprefix $(csu-objpfx),start.o) $(+preinit) $(+postinit)
+ $(+link-tests)
+endif
+
ifneq "$(strip $(binaries-pie-tests))" ""
$(addprefix $(objpfx),$(binaries-pie-tests)): %: %.o \
$(link-extra-libs-tests) \
@@ -302,6 +314,11 @@ $(1)-malloc-hugetlb1-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=1
endef
$(foreach t,$(tests-malloc-hugetlb1),$(eval $(call malloc-hugetlb1-ENVS,$(t))))
+# All malloc-hugetlb2 tests will be run with GLIBC_TUNABLES=glibc.malloc.hugetlb=2
+define malloc-hugetlb2-ENVS
+$(1)-malloc-hugetlb2-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=2
+endef
+$(foreach t,$(tests-malloc-hugetlb2),$(eval $(call malloc-hugetlb2-ENVS,$(t))))
# mcheck tests need the debug DSO to support -lmcheck.
define mcheck-ENVS
@@ -93,9 +93,8 @@ glibc {
security_level: SXID_IGNORE
}
hugetlb {
- type: INT_32
+ type: SIZE_T
minval: 0
- maxval: 1
}
}
cpu {
@@ -1,7 +1,7 @@
glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
glibc.malloc.check: 0 (min: 0, max: 3)
-glibc.malloc.hugetlb: 0 (min: 0, max: 1)
+glibc.malloc.hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
@@ -78,10 +78,10 @@ tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
$(tests-static),$(tests))
-# Run all testes with GLIBC_TUNABLE=glibc.malloc.hugetlb=1 that check the
-# Transparent Huge Pages support. We need exclude some tests that define
-# the ENV vars.
-tests-exclude-hugetlb1 = \
+# Run all tests with GLIBC_TUNABLES=glibc.malloc.hugetlb={1,2} which check
+# the Transparent Huge Pages support (1) or Huge Page support (2). We need
+# to exclude some tests that define the ENV vars.
+tests-exclude-hugetlb = \
tst-compathooks-off \
tst-compathooks-on \
tst-interpose-nothread \
@@ -92,7 +92,9 @@ tests-exclude-hugetlb1 = \
tst-malloc-usable-tunables \
tst-mallocstate
tests-malloc-hugetlb1 = \
- $(filter-out $(tests-exclude-hugetlb1), $(tests))
+ $(filter-out $(tests-exclude-hugetlb), $(tests))
+tests-malloc-hugetlb2 = \
+ $(filter-out $(tests-exclude-hugetlb), $(tests))
# -lmcheck needs __malloc_initialize_hook, which was deprecated in 2.24.
ifeq ($(have-GLIBC_2.23)$(build-shared),yesyes)
@@ -1884,6 +1884,10 @@ struct malloc_par
#if HAVE_TUNABLES
/* Transparent Large Page support. */
INTERNAL_SIZE_T thp_pagesize;
+ /* A value different than 0 means to align mmap allocations to
+ hp_pagesize and to add hp_flags to the mmap flags. */
+ INTERNAL_SIZE_T hp_pagesize;
+ int hp_flags;
#endif
/* Memory map support */
@@ -2442,7 +2446,10 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
if (mm == MAP_FAILED)
return mm;
- madvise_thp (mm, size);
+#ifdef MAP_HUGETLB
+ if (!(extra_flags & MAP_HUGETLB))
+ madvise_thp (mm, size);
+#endif
/*
The offset to the start of the mmapped region is stored in the prev_size
@@ -2530,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
|| ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
&& (mp_.n_mmaps < mp_.n_mmaps_max)))
{
- char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+ char *mm;
+#if HAVE_TUNABLES
+ if (mp_.hp_pagesize > 0)
+ {
+ /* There is no need to issue the THP madvise call if Huge Pages are
+ used directly. */
+ mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av);
+ if (mm != MAP_FAILED)
+ return mm;
+ }
+#endif
+ mm = sysmalloc_mmap (nb, pagesize, 0, av);
if (mm != MAP_FAILED)
return mm;
tried_mmap = true;
@@ -2611,7 +2629,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
}
else if (!tried_mmap)
{
- /* We can at least try to use to mmap memory. */
+ /* We can at least try to use mmap memory. If new_heap() fails
+ it is unlikely that trying to allocate a huge page succeeds. */
char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
if (mm != MAP_FAILED)
return mm;
@@ -5405,6 +5424,11 @@ do_set_hugetlb (int32_t value)
if (thp_mode == malloc_thp_mode_madvise)
mp_.thp_pagesize = __malloc_default_thp_pagesize ();
}
+ else if (value >= 2)
+ {
+ __malloc_hugepage_config (value == 2 ? 0 : value, &mp_.hp_pagesize,
+ &mp_.hp_flags);
+ }
return 0;
}
#endif
@@ -277,6 +277,13 @@ value is @code{0}, which disables any additional support on @code{malloc}.
Setting its value to @code{1} enables the use of @code{madvise} with
@code{MADV_HUGEPAGE} after memory allocation with @code{mmap}. It is enabled
only if the system supports Transparent Huge Page (currently only on Linux).
+
+Setting its value to @code{2} enables the use of Huge Pages directly with
+@code{mmap} through the @code{MAP_HUGETLB} flag. The huge page size
+to use will be the default one provided by the system. A value larger than
+@code{2} specifies a specific huge page size, which will be matched against
+the ones supported by the system. If the default huge page size cannot be
+obtained or the provided value is invalid, huge page usage is disabled.
@end deftp
@node Dynamic Linking Tunables
@@ -29,3 +29,10 @@ __malloc_thp_mode (void)
{
return malloc_thp_mode_not_supported;
}
+
+/* Stub: huge pages are not supported, so report no page size or flags. */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+ *pagesize = 0;
+ *flags = 0;
+}
@@ -34,4 +34,11 @@ enum malloc_thp_mode_t
enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
+/* Return the supported huge page size for the REQUESTED size in PAGESIZE
+ along with the required extra mmap flags in FLAGS. Requesting the value
+ of 0 returns the default huge page size; otherwise the value is
+ matched against the sizes supported by the system. */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+ attribute_hidden;
+
#endif /* _MALLOC_HUGEPAGES_H */
@@ -17,8 +17,10 @@
not, see <https://www.gnu.org/licenses/>. */
#include <intprops.h>
+#include <dirent.h>
#include <malloc-hugepages.h>
#include <not-cancel.h>
+#include <sys/mman.h>
size_t
__malloc_default_thp_pagesize (void)
@@ -74,3 +76,127 @@ __malloc_thp_mode (void)
}
return malloc_thp_mode_not_supported;
}
+
+static size_t
+malloc_default_hugepage_size (void)
+{
+ int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
+ if (fd == -1)
+ return 0;
+
+ size_t hpsize = 0;
+
+ char buf[512];
+ off64_t off = 0;
+ while (1)
+ {
+ ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
+ if (r <= 0)
+ break;
+ buf[r] = '\0';
+
+ const char *s = strstr (buf, "Hugepagesize:");
+ if (s == NULL)
+ {
+ char *nl = strrchr (buf, '\n');
+ if (nl == NULL)
+ break;
+ off += (nl + 1) - buf;
+ continue;
+ }
+
+ /* The default huge page size is in the form:
+ Hugepagesize: NUMBER kB */
+ s += sizeof ("Hugepagesize: ") - 1;
+ for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
+ {
+ if (s[i] == ' ')
+ continue;
+ hpsize *= 10;
+ hpsize += s[i] - '0';
+ }
+ hpsize *= 1024;
+ break;
+ }
+
+ __close_nocancel (fd);
+
+ return hpsize;
+}
+
+static inline int
+hugepage_flags (size_t pagesize)
+{
+ return MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
+}
+
+void
+__malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+ *pagesize = 0;
+ *flags = 0;
+
+ if (requested == 0)
+ {
+ *pagesize = malloc_default_hugepage_size ();
+ if (*pagesize != 0)
+ *flags = hugepage_flags (*pagesize);
+ return;
+ }
+
+ int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
+ O_RDONLY | O_DIRECTORY, 0);
+ if (dirfd == -1)
+ return;
+
+ char buffer[1024];
+ while (true)
+ {
+#if !IS_IN(libc)
+# define __getdents64 getdents64
+#endif
+ ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
+ if (ret == -1)
+ break;
+ else if (ret == 0)
+ break;
+
+ bool found = false;
+ char *begin = buffer, *end = buffer + ret;
+ while (begin != end)
+ {
+ unsigned short int d_reclen;
+ memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
+ sizeof (d_reclen));
+ const char *dname = begin + offsetof (struct dirent64, d_name);
+ begin += d_reclen;
+
+ if (dname[0] == '.'
+ || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
+ continue;
+
+ /* Each entry represents a supported huge page in the form of:
+ hugepages-<size>kB. */
+ size_t hpsize = 0;
+ const char *sizestr = dname + sizeof ("hugepages-") - 1;
+ for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
+ {
+ hpsize *= 10;
+ hpsize += sizestr[i] - '0';
+ }
+ hpsize *= 1024;
+
+ if (hpsize == requested)
+ {
+ *pagesize = hpsize;
+ *flags = hugepage_flags (*pagesize);
+ found = true;
+ break;
+ }
+ }
+ if (found)
+ break;
+ }
+
+ __close_nocancel (dirfd);
+}