Message ID | 20210813210429.1147112-1-adhemerval.zanella@linaro.org |
---|---|
Headers |
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: Norbert Manthey <nmanthey@conp-solutions.com>, Siddhesh Poyarekar <siddhesh@sourceware.org>, Guillaume Morin <guillaume@morinfr.org>
Subject: [PATCH 0/3] malloc: improve THP effectiveness
Date: Fri, 13 Aug 2021 18:04:26 -0300
Message-Id: <20210813210429.1147112-1-adhemerval.zanella@linaro.org> |
Series | malloc: improve THP effectiveness |
Message
Adhemerval Zanella Netto
Aug. 13, 2021, 9:04 p.m. UTC
Linux Transparent Huge Pages (THP) currently supports three different states [1]: 'never', 'madvise', and 'always'. 'never' is self-explanatory, and 'always' enables THP for all anonymous memory. However, 'madvise' is still the default on some systems, and in that case THP is only used if the memory range is explicitly advertised by the program through the madvise(MADV_HUGEPAGE) call.

This patchset adds a new tunable, 'glibc.malloc.thp_pagesize', which allows the user to explicitly use THP on anonymous pages even if the 'madvise' state is set. The usage is transparent for the mmap() call, and madvise(MADV_HUGEPAGE) is only issued for sizes larger than the THP huge page size. The sbrk() change alters the program memory allocation, since the increment is now aligned to the huge page size instead of the default page size. It is likewise enabled with the new tunable.

This patchset adds THP support for aarch64, mips, powerpc, riscv, s390, sparc, and x86. These are the architectures that have HAVE_ARCH_TRANSPARENT_HUGEPAGE as default and define the internal flags for THP support (I might have missed some architectures).

Although it does improve THP effectiveness, it does not provide the same features as the libhugetlbfs morecore implementation [2], since that uses MAP_HUGETLB explicitly on mmap. And I think this is not what we want for glibc: it requires additional setup from the admin to mount the hugetlbfs and reserve the pages with it, outside of glibc's scope.
The performance improvements are really dependent on the workload and the platform; however, a simple testcase shows the possible improvements:

$ cat hugepages.cc
#include <unordered_map>

int main (int argc, char *argv[])
{
  std::size_t iters = 10000000;
  std::unordered_map <std::size_t, std::size_t> ht;
  ht.reserve (iters);

  for (std::size_t i = 0; i < iters; ++i)
    ht.try_emplace (i, i);

  return 0;
}
$ g++ -std=c++17 -O2 hugepages.cc -o hugepages

On x86_64 (Ryzen 9 5900X):

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

             98,874      faults
            717,059      dTLB-loads
            411,701      dTLB-load-misses          #  57.42% of all dTLB cache accesses
          3,754,927      cache-misses              #   8.479 % of all cache refs
         44,287,580      cache-references

        0.315278378 seconds time elapsed
        0.238635000 seconds user
        0.076714000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

              1,871      faults
            120,035      dTLB-loads
             19,882      dTLB-load-misses          #  16.56% of all dTLB cache accesses
          4,182,942      cache-misses              #   7.452 % of all cache refs
         56,128,995      cache-references

        0.262620733 seconds time elapsed
        0.222233000 seconds user
        0.040333000 seconds sys

On AArch64 (Cortex-A72):

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

              98835      faults
         2007234756      dTLB-loads
            4613669      dTLB-load-misses          #   0.23% of all dTLB cache accesses
            8831801      cache-misses              #   0.504 % of all cache refs
         1751391405      cache-references

        0.616782575 seconds time elapsed
        0.460946000 seconds user
        0.154309000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

                955      faults
         1787401880      dTLB-loads
             224034      dTLB-load-misses          #   0.01% of all dTLB cache accesses
            5480917      cache-misses              #   0.337 % of all cache refs
         1625937858      cache-references

        0.487773443 seconds time elapsed
        0.440894000 seconds user
        0.046465000 seconds sys

And on powerpc64 (POWER8):

 Performance counter stats for
'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

               5453      faults
               9940      dTLB-load-misses
            1338152      cache-misses              #   0.101 % of all cache refs
         1326037487      cache-references

        1.056355887 seconds time elapsed
        1.014633000 seconds user
        0.041805000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

               1016      faults
               1746      dTLB-load-misses
             399052      cache-misses              #   0.030 % of all cache refs
         1316059877      cache-references

        1.057810501 seconds time elapsed
        1.012175000 seconds user
        0.045624000 seconds sys

It is worth noting that the powerpc64 machine has 'always' set in '/sys/kernel/mm/transparent_hugepage/enabled'. Norbert Manthey's paper [3] provides a more thorough performance analysis.

This is based on the previous RFC to enable Transparent Huge Pages with madvise [4], with fixes and improvements:

  * Remove the usage of MAP_HUGE_SHIFT, since the kernel only uses it
    when MAP_HUGETLB is also used.
  * Move the option to a tunable and add documentation.
  * Remove the address alignment before the madvise call, since the
    kernel already does it.
  * Add an arch-specific hook to return the THP huge page size.
  * Avoid calling sbrk() twice to align to THP and remove a lot of
    unnecessary internal state.
  * Add a NEWS entry.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/mm/transhuge.rst
[2] https://sourceware.org/pipermail/libc-alpha/2021-July/129041.html
[3] https://arxiv.org/pdf/2004.14378.pdf
[4] https://sourceware.org/pipermail/libc-alpha/2020-May/113539.html

Adhemerval Zanella (3):
  malloc: Add madvise support for Transparent Huge Pages
  malloc: Add THP/madvise support for sbrk
  malloc: Add arch-specific malloc_verify_thp_pagesize for Linux

 NEWS                                         |  5 +-
 elf/dl-tunables.list                         |  5 ++
 elf/tst-rtld-list-tunables.exp               |  1 +
 malloc/arena.c                               |  5 ++
 malloc/malloc-internal.h                     |  1 +
 malloc/malloc.c                              | 85 ++++++++++++++++++--
 manual/tunables.texi                         | 11 +++
 sysdeps/generic/malloc-thp.h                 | 32 ++++++++
 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h | 40 +++++++++
 sysdeps/unix/sysv/linux/mips/malloc-thp.h    | 39 +++++++++
 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h | 56 +++++++++++++
 sysdeps/unix/sysv/linux/riscv/malloc-thp.h   | 32 ++++++++
 sysdeps/unix/sysv/linux/s390/malloc-thp.h    | 33 ++++++++
 sysdeps/unix/sysv/linux/sparc/malloc-thp.h   | 36 +++++++++
 sysdeps/unix/sysv/linux/x86/malloc-thp.h     | 32 ++++++++
 15 files changed, 407 insertions(+), 6 deletions(-)
 create mode 100644 sysdeps/generic/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/sparc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/malloc-thp.h
Comments
Hello Adhemerval,

On 13 Aug 18:04, Adhemerval Zanella wrote:
> Although it does improve THP effectiveness, it does not provide the same
> features from libhugetlsfs morecore implementation [2], since it does
> use MAP_HUGETLB explicit on mmap. And I think this is not what we want
> for glibc, it requires additional setup from the admin to mount the
> hugetlsfs and reserve the pages with it outside from glibc scope.

I certainly do appreciate the effort. But unfortunately this is not a usable replacement for most libhugetlbfs users (who actually want to use hugetlbfs).

First, I'll argue that having THP supported directly in the allocator is probably a nice-to-have feature for THP users, but probably not that critical, considering you can just madvise() the memory *after* it's been allocated. Alternatively, any malloc interposition scheme can do this trivially: afaik there was never an actual *need* for a morecore implementation in this case. There is no such possibility with hugetlbfs. It's either mmap() with MAP_HUGETLB or not.

Second, THP is not a drop-in replacement for hugetlbfs. hugetlbfs has desirable properties that simply do not exist for THP. Just a few examples:

1) A hugetlbfs allocation gives you a huge page or not at allocation time, but this is forever. There is no splitting or re-merging by the VM: no TLB shootdowns for running processes, etc.
2) Fast allocation: there is a dedicated pool of these pages. There is no competition with the rest of the processes, unlike THP.
3) No swapping of hugetlbfs pages.

I would really like to discuss and/or maybe implement some scheme that allows optionally using MAP_HUGETLB for all allocations (which would be a definitive improvement over what libhugetlbfs was doing), if that's workable for you.

Guillaume.
On 13/08/2021 18:37, Guillaume Morin wrote:
> Hello Adhemerval,
>
> On 13 Aug 18:04, Adhemerval Zanella wrote:
>> Although it does improve THP effectiveness, it does not provide the same
>> features from libhugetlsfs morecore implementation [2], since it does
>> use MAP_HUGETLB explicit on mmap. And I think this is not what we want
>> for glibc, it requires additional setup from the admin to mount the
>> hugetlsfs and reserve the pages with it outside from glibc scope.
>
> I certainly do appreciate the effort. But unfortunately this is not a
> usable replacement for most libhugetlblfs users (who actually want to use
> hugetlbfs).

Yes, that's why I explicitly stated this is not a replacement. But I had the misconception that MAP_HUGETLB could only be used with files mapped from a hugetlbfs filesystem, and that's why I wrote that I think it is not meant for glibc. However, after reading the kernel documentation properly and after some experiments, I think we can add another tunable to use MAP_HUGETLB as the first allocation option.

> First, I'll argue to have THP supported directly in the allocator is
> probably a nice-to-have feature for THP users but probably not that
> critical considering you can just madvise() the memory *after*
> it's been allocated. Alternatively any malloc interposition scheme can
> do this trivially: afaik there were never an actual *need* for a
> morecore implementation in this case.
> There is no such possibility with hugetlbfs. It's either mmap() with
> MAP_HUGETLB or not.

Yeah, I am aware. The idea is mainly to abstract the requirement to query the kernel or handle the multiple page sizes from different architectures, and also possibly handle the sbrk() calls for the main arena. We can also add more tuning in the future if we find some scenarios where THP needs tuning.

> Second, THP is not a drop-in replacement for hugetblfs. hugetlbfs has
> desirable properties that simply do not exist for THP. Just a few
> examples: 1) A hugetlbfs allocation gives you a huge page or not at
> allocation time but this is forever. There is no splitting, re-merging
> by the VM: no TLB shootdowns for running processes etc. 2) Fast
> allocation: there is a dedicated pool of these pages. There is no
> competition with the rest of the processes unlike THP 3) No swapping all
> hugetlbfs pages.
>
> I would really like to discuss and/or maybe implement some schemable
> that allows to optionally use MAP_HUGETLB for all allocations (which
> would be a definitive improvement over what libhugetlbfs was doing) if
> that's workable for you.

I am reworking this patchset and I intend to add an option to use MAP_HUGETLB as well.
On 8/17/21 2:25 AM, Adhemerval Zanella via Libc-alpha wrote:
> I am reworking this patchset and I intend to add an option to use
> MAP_HUGETLB as well.

A low-hanging fruit for adding MAP_HUGETLB may be at the mmap threshold: whenever it crosses the hugepage size, always use MAP_HUGETLB for mmapped blocks if the user requests it via the tunable.

Siddhesh