From patchwork Mon Aug 23 21:57:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 44772
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar
Subject: [PATCH v3 4/5] malloc: Add Huge Page support for mmap()
Date: Mon, 23 Aug 2021 18:57:12 -0300
Message-Id: <20210823215713.3304523-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>
References: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>

With the morecore hook gone, there is no easy way to force the system
allocation to use huge pages directly without resorting to transparent
huge pages.
Some users and programs prefer to use huge pages directly instead of
THP for multiple reasons: no splitting or re-merging by the VM, no TLB
shootdowns for running processes, fast allocation from the reserve pool,
no competition with the rest of the processes unlike THP, no swapping
at all, etc.

This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2'
means to use huge pages directly with the system default size, while a
value larger than '2' means a specific page size that is matched against
the ones supported by the system.

Currently only memory allocated by sysmalloc() is handled; the arenas
still use the default system page size.

To test it, a new rule, tests-malloc-hugetlb2, is added, which runs the
added tests with the required GLIBC_TUNABLES setting.  On systems
without a reserved huge pages pool, it just stresses the
mmap(MAP_HUGETLB) allocation failure.  To improve test coverage it is
required to create a pool with some allocated pages.

Checked on x86_64-linux-gnu.
---
 NEWS                                       |  12 +-
 Rules                                      |  17 +++
 elf/dl-tunables.list                       |   3 +-
 elf/tst-rtld-list-tunables.exp             |   2 +-
 malloc/Makefile                            |  12 +-
 malloc/malloc.c                            |  30 ++++-
 manual/tunables.texi                       |   7 ++
 sysdeps/generic/malloc-hugepages.c         |   7 ++
 sysdeps/generic/malloc-hugepages.h         |   7 ++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 126 +++++++++++++++++++++
 10 files changed, 209 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index 5c9486b468..ac8f31950f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,9 +10,15 @@ Version 2.35
 Major new features:
 
 * On Linux, a new tunable, glibc.malloc.hugetlb, can be used to
-  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
-  It might improve performance with Transparent Huge Pages madvise mode
-  depending of the workload.
+  either make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk
+  or to use huge pages directly with mmap calls with the MAP_HUGETLB
+  flag.  The former can improve performance when Transparent Huge Pages
+  is set to 'madvise' mode while the latter uses the system reserved
+  pages.
+
+* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
+  instruct malloc to try to use Huge Pages when allocating memory with
+  mmap() calls (through the use of MAP_HUGETLB).
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/Rules b/Rules
index 471458ad4a..542a37eef0 100644
--- a/Rules
+++ b/Rules
@@ -158,6 +158,7 @@ tests: $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
        $(tests-mcheck:%=$(objpfx)%-mcheck.out) \
        $(tests-malloc-check:%=$(objpfx)%-malloc-check.out) \
        $(tests-malloc-hugetlb1:%=$(objpfx)%-malloc-hugetlb1.out) \
+       $(tests-malloc-hugetlb2:%=$(objpfx)%-malloc-hugetlb2.out) \
        $(tests-special) $(tests-printers-out)
 xtests: tests $(xtests:%=$(objpfx)%.out) $(xtests-special)
 endif
@@ -170,6 +171,7 @@ else
 tests-expected = $(tests) $(tests-internal) $(tests-printers) \
        $(tests-container) $(tests-malloc-check:%=%-malloc-check) \
        $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1) \
+       $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2) \
        $(tests-mcheck:%=%-mcheck)
 endif
 tests:
@@ -199,6 +201,7 @@ endif
 binaries-mcheck-tests = $(tests-mcheck:%=%-mcheck)
 binaries-malloc-check-tests = $(tests-malloc-check:%=%-malloc-check)
 binaries-malloc-hugetlb1-tests = $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1)
+binaries-malloc-hugetlb2-tests = $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2)
 else
 binaries-all-notests =
 binaries-all-tests = $(tests) $(tests-internal) $(xtests) $(test-srcs)
@@ -211,6 +214,7 @@ binaries-pie-notests =
 binaries-mcheck-tests =
 binaries-malloc-check-tests =
 binaries-malloc-hugetlb1-tests =
+binaries-malloc-hugetlb2-tests =
 endif
 
 binaries-pie = $(binaries-pie-tests) $(binaries-pie-notests)
@@ -259,6 +263,14 @@ $(addprefix $(objpfx),$(binaries-malloc-hugetlb1-tests)): %-malloc-hugetlb1: %.o
 	$(+link-tests)
 endif
 
+ifneq "$(strip $(binaries-malloc-hugetlb2-tests))" ""
+$(addprefix $(objpfx),$(binaries-malloc-hugetlb2-tests)): %-malloc-hugetlb2: %.o \
+  $(link-extra-libs-tests) \
+  $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
+  $(addprefix $(csu-objpfx),start.o) $(+preinit) $(+postinit)
+	$(+link-tests)
+endif
+
 ifneq "$(strip $(binaries-pie-tests))" ""
 $(addprefix $(objpfx),$(binaries-pie-tests)): %: %.o \
   $(link-extra-libs-tests) \
@@ -302,6 +314,11 @@ $(1)-malloc-hugetlb1-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=1
 endef
 $(foreach t,$(tests-malloc-hugetlb1),$(eval $(call malloc-hugetlb1-ENVS,$(t))))
 
+# All malloc-hugetlb2 tests will be run with GLIBC_TUNABLES=glibc.malloc.hugetlb=2
+define malloc-hugetlb2-ENVS
+$(1)-malloc-hugetlb2-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=2
+endef
+$(foreach t,$(tests-malloc-hugetlb2),$(eval $(call malloc-hugetlb2-ENVS,$(t))))
 
 # mcheck tests need the debug DSO to support -lmcheck.
 define mcheck-ENVS
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 1b347487f7..379701b84f 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -93,9 +93,8 @@ glibc {
       security_level: SXID_IGNORE
     }
     hugetlb {
-      type: INT_32
+      type: SIZE_T
       minval: 0
-      maxval: 1
     }
   }
   cpu {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index 89aa5c0d40..245b074432 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -1,7 +1,7 @@
 glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.check: 0 (min: 0, max: 3)
-glibc.malloc.hugetlb: 0 (min: 0, max: 1)
+glibc.malloc.hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
 glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
diff --git a/malloc/Makefile b/malloc/Makefile
index e47fd660f6..a03739d3e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -78,10 +78,10 @@ tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
 tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
                                   $(tests-static),$(tests))
 
-# Run all testes with GLIBC_TUNABLE=glibc.malloc.hugetlb=1 that check the
-# Transparent Huge Pages support.  We need exclude some tests that define
-# the ENV vars.
-tests-exclude-hugetlb1 = \
+# Run all tests with GLIBC_TUNABLES=glibc.malloc.hugetlb={1,2} which check
+# the Transparent Huge Pages support (1) or Huge Page support (2).  We need
+# to exclude some tests that define the ENV vars.
+tests-exclude-hugetlb = \
   tst-compathooks-off \
   tst-compathooks-on \
   tst-interpose-nothread \
@@ -92,7 +92,9 @@ tests-exclude-hugetlb1 = \
   tst-malloc-usable-tunables \
   tst-mallocstate
 tests-malloc-hugetlb1 = \
-  $(filter-out $(tests-exclude-hugetlb1), $(tests))
+  $(filter-out $(tests-exclude-hugetlb), $(tests))
+tests-malloc-hugetlb2 = \
+  $(filter-out $(tests-exclude-hugetlb), $(tests))
 
 # -lmcheck needs __malloc_initialize_hook, which was deprecated in 2.24.
 ifeq ($(have-GLIBC_2.23)$(build-shared),yesyes)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index dc5ecb84c5..370d9ffac0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1884,6 +1884,10 @@ struct malloc_par
 #if HAVE_TUNABLES
   /* Transparent Large Page support.  */
   INTERNAL_SIZE_T thp_pagesize;
+  /* A nonzero value means to align mmap allocations to hp_pagesize and to
+     add hp_flags to the mmap flags.  */
+  INTERNAL_SIZE_T hp_pagesize;
+  int hp_flags;
 #endif
 
   /* Memory map support */
@@ -2442,7 +2446,10 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
   if (mm == MAP_FAILED)
     return mm;
 
-  madvise_thp (mm, size);
+#ifdef MAP_HUGETLB
+  if (!(extra_flags & MAP_HUGETLB))
+    madvise_thp (mm, size);
+#endif
 
   /*
     The offset to the start of the mmapped region is stored in the prev_size
@@ -2530,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
           && (mp_.n_mmaps < mp_.n_mmaps_max)))
     {
-      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      char *mm;
+#if HAVE_TUNABLES
+      if (mp_.hp_pagesize > 0)
+        {
+          /* There is no need to issue the THP madvise call if Huge Pages are
+             used directly.  */
+          mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av);
+          if (mm != MAP_FAILED)
+            return mm;
+        }
+#endif
+      mm = sysmalloc_mmap (nb, pagesize, 0, av);
       if (mm != MAP_FAILED)
         return mm;
       tried_mmap = true;
@@ -2611,7 +2629,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
         }
       else if (!tried_mmap)
         {
-          /* We can at least try to use to mmap memory.  */
+          /* We can at least try to use to mmap memory.  If new_heap() fails
+             it is unlikely that trying to allocate huge pages succeeds.  */
           char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
           if (mm != MAP_FAILED)
             return mm;
@@ -5405,6 +5424,11 @@ do_set_hugetlb (int32_t value)
       if (thp_mode == malloc_thp_mode_madvise)
         mp_.thp_pagesize = __malloc_default_thp_pagesize ();
     }
+  else if (value >= 2)
+    {
+      __malloc_hugepage_config (value == 2 ? 0 : value, &mp_.hp_pagesize,
+                                &mp_.hp_flags);
+    }
   return 0;
 }
 #endif
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 799fa76258..1961adcbcb 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -277,6 +277,13 @@ value is @code{0}, which disables any additional support on @code{malloc}.
 Setting its value to @code{1} enables the use of @code{madvise} with
 @code{MADV_HUGEPAGE} after memory allocation with @code{mmap}.  It is enabled
 only if the system supports Transparent Huge Page (currently only on Linux).
+
+Setting its value to @code{2} enables the use of Huge Pages directly with
+@code{mmap} through the @code{MAP_HUGETLB} flag.  The huge page size used
+is the default one provided by the system.  A value larger than @code{2}
+specifies a huge page size in bytes, which is matched against the sizes
+supported by the system.  If the default huge page size cannot be determined
+or the provided value is not supported, huge page usage is disabled.
 @end deftp
 
 @node Dynamic Linking Tunables
diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
index 262bcdbeb8..258f7db7e6 100644
--- a/sysdeps/generic/malloc-hugepages.c
+++ b/sysdeps/generic/malloc-hugepages.c
@@ -29,3 +29,10 @@ __malloc_thp_mode (void)
 {
   return malloc_thp_mode_not_supported;
 }
+
+/* Huge pages are not supported: report no page size and no extra flags.  */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+  *pagesize = 0;
+  *flags = 0;
+}
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
index 664cda9b67..a957611c06 100644
--- a/sysdeps/generic/malloc-hugepages.h
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -34,4 +34,11 @@ enum malloc_thp_mode_t
 
 enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
 
+/* Return the supported huge page size for the REQUESTED size in PAGESIZE
+   along with the required extra mmap flags in FLAGS.  Requesting the value
+   of 0 returns the default huge page size, otherwise the value will be
+   matched against the ones supported by the system.  */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+     attribute_hidden;
+
 #endif /* _MALLOC_HUGEPAGES_H */
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
index 66589127cd..d5ec93a093 100644
--- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -17,8 +17,10 @@
    not, see .  */
 
 #include
+#include
 #include
 #include
+#include
 
 size_t
 __malloc_default_thp_pagesize (void)
@@ -74,3 +76,127 @@ __malloc_thp_mode (void)
     }
   return malloc_thp_mode_not_supported;
 }
+
+static size_t
+malloc_default_hugepage_size (void)
+{
+  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+  size_t hpsize = 0;
+
+  char buf[512];
+  off64_t off = 0;
+  while (1)
+    {
+      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
+      if (r < 0)
+        break;
+      buf[r] = '\0';
+
+      const char *s = strstr (buf, "Hugepagesize:");
+      if (s == NULL)
+        {
+          char *nl = strrchr (buf, '\n');
+          if (nl == NULL)
+            break;
+          off += (nl + 1) - buf;
+          continue;
+        }
+
+      /* The default huge page size is in the form:
+         Hugepagesize: NUMBER kB  */
+      s += sizeof ("Hugepagesize: ") - 1;
+      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
+        {
+          if (s[i] == ' ')
+            continue;
+          hpsize *= 10;
+          hpsize += s[i] - '0';
+        }
+      hpsize *= 1024;
+      break;
+    }
+
+  __close_nocancel (fd);
+
+  return hpsize;
+}
+
+static inline int
+hugepage_flags (size_t pagesize)
+{
+  return MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
+}
+
+void
+__malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+  *pagesize = 0;
+  *flags = 0;
+
+  if (requested == 0)
+    {
+      *pagesize = malloc_default_hugepage_size ();
+      if (*pagesize != 0)
+        *flags = hugepage_flags (*pagesize);
+      return;
+    }
+
+  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
+                                 O_RDONLY | O_DIRECTORY, 0);
+  if (dirfd == -1)
+    return;
+
+  char buffer[1024];
+  while (true)
+    {
+#if !IS_IN(libc)
+# define __getdents64 getdents64
+#endif
+      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
+      if (ret == -1)
+        break;
+      else if (ret == 0)
+        break;
+
+      bool found = false;
+      char *begin = buffer, *end = buffer + ret;
+      while (begin != end)
+        {
+          unsigned short int d_reclen;
+          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
+                  sizeof (d_reclen));
+          const char *dname = begin + offsetof (struct dirent64, d_name);
+          begin += d_reclen;
+
+          if (dname[0] == '.'
+              || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
+            continue;
+
+          /* Each entry represents a supported huge page in the form of:
+             hugepages-kB.  */
+          size_t hpsize = 0;
+          const char *sizestr = dname + sizeof ("hugepages-") - 1;
+          for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
+            {
+              hpsize *= 10;
+              hpsize += sizestr[i] - '0';
+            }
+          hpsize *= 1024;
+
+          if (hpsize == requested)
+            {
+              *pagesize = hpsize;
+              *flags = hugepage_flags (*pagesize);
+              found = true;
+              break;
+            }
+        }
+      if (found)
+        break;
+    }
+
+  __close_nocancel (dirfd);
+}
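
Reviewer note, not part of the patch: the size matching done by __malloc_hugepage_config can be tried outside of glibc. The stand-alone sketch below assumes the sysfs entries are named "hugepages-" followed by a decimal size and "kB", and that the page size is a power of two. Both helper names (parse_hugepage_dirname, hugepage_size_log2) are hypothetical and exist only for illustration, and __builtin_ctzll is a GCC/Clang builtin, as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: parse a sysfs entry name of
   the assumed form "hugepages-<N>kB" and return the page size in bytes,
   or 0 if NAME does not have that form.  */
static size_t
parse_hugepage_dirname (const char *name)
{
  static const char prefix[] = "hugepages-";
  const size_t prefix_len = sizeof (prefix) - 1;

  assert (name != NULL);
  if (strncmp (name, prefix, prefix_len) != 0)
    return 0;

  const char *p = name + prefix_len;
  if (*p < '0' || *p > '9')
    return 0;

  size_t kb = 0;
  for (; *p >= '0' && *p <= '9'; p++)
    kb = kb * 10 + (size_t) (*p - '0');

  /* Stricter than the loop in the patch: also require the "kB" suffix.  */
  if (strcmp (p, "kB") != 0)
    return 0;

  return kb * 1024;
}

/* Hypothetical helper: return log2 of PAGESIZE if it is a nonzero power of
   two, or -1 otherwise.  The patch shifts this value left by MAP_HUGE_SHIFT
   and ORs it with MAP_HUGETLB.  */
static int
hugepage_size_log2 (size_t pagesize)
{
  if (pagesize == 0 || (pagesize & (pagesize - 1)) != 0)
    return -1;
  return __builtin_ctzll (pagesize);
}
```

For a 2 MiB page, parse_hugepage_dirname ("hugepages-2048kB") yields 2097152 and hugepage_size_log2 yields 21, so hugepage_flags in the patch would compute MAP_HUGETLB | (21 << MAP_HUGE_SHIFT). Rejecting sizes that are not a power of two is a choice made in this sketch only; the patch passes the size to __builtin_ctzll without such a check.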