From patchwork Thu Apr 20 17:24:36 2023
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 68087
To: libc-alpha@sourceware.org, Cupertino Miranda , Wilco Dijkstra
Subject: [PATCH] nptl: Disable THP on thread stack if it incurs in large RSS usage
Date: Thu, 20 Apr 2023 14:24:36 -0300
Message-Id: <20230420172436.2013698-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

If Transparent Huge Page (THP) is set to 'always', the thread stack might
be backed by huge pages depending on the address assigned by the kernel and
the resulting guard position.  If the guard page is within the same large
page that might be used by the stack itself, changing the stack permission
will make the allocated range no longer be served with THP.  The kernel will
then revert back to using the default page size.

In this case, besides the additional work, the kernel will need to
potentially keep all the pages, since it cannot distinguish which ones were
really touched by the process.  This results in larger RSS usage than just
madvising the range to not use huge pages.

The glibc.pthread.stack_hugetlb tunable is still useful for the case where
the caller either sets up no guard page or a guard page that is a multiple
of the default THP size.  In that case the kernel might potentially back the
stack with THP, but it is up to the thread stack usage profile whether THP
is beneficial or not.

Checked on x86_64-linux-gnu.
---
 nptl/allocatestack.c                       | 34 ++++++++++++++++
 sysdeps/generic/malloc-hugepages.h         |  1 +
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 46 ++++++++++++++++++----
 3 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index f9d8cdfd08..1eb34f816c 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include

 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
@@ -206,6 +207,31 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
 #endif
 }

+/* If Transparent Huge Page (THP) is set to 'always', the thread stack might
+   be backed by huge pages depending on the address assigned by the kernel
+   and the resulting guard position.  If the guard page is within the same
+   large page that might be used by the stack itself, changing the stack
+   permission will make the allocated range no longer be served with THP.
+   The kernel will then revert back to using the default page size.
+
+   In this case, besides the additional work, the kernel will need to
+   potentially keep all the pages, since it cannot distinguish which ones
+   were really touched by the process.  This results in larger RSS usage
+   than just madvising the range to not use huge pages.  */
+static __always_inline int
+advise_thp (void *mem, size_t size, char *guard)
+{
+  enum malloc_thp_mode_t thpmode = __malloc_thp_mode ();
+  if (thpmode != malloc_thp_mode_always)
+    return 0;
+
+  unsigned long int thpsize = __malloc_default_thp_pagesize ();
+  if (PTR_ALIGN_DOWN (mem, thpsize) != PTR_ALIGN_DOWN (guard, thpsize))
+    return 0;
+
+  return __madvise (mem, size, MADV_NOHUGEPAGE);
+}
+
 /* Returns a usable stack for a new thread either by allocating a new
    stack or reusing a cached stack of sufficient size.
    ATTR must be non-NULL and point to a valid pthread_attr.
@@ -396,6 +422,14 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
     {
       char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1);
+
+      if (__glibc_unlikely (__nptl_stack_hugetlb == 1))
+        {
+          int r = advise_thp (mem, size, guard);
+          if (r != 0)
+            return r;
+        }
+
       if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0)
         {
           __munmap (mem, size);
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
index d68b85630c..21d4844bc4 100644
--- a/sysdeps/generic/malloc-hugepages.h
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -26,6 +26,7 @@ unsigned long int __malloc_default_thp_pagesize (void) attribute_hidden;

 enum malloc_thp_mode_t
 {
+  malloc_thp_mode_unknown,
   malloc_thp_mode_always,
   malloc_thp_mode_madvise,
   malloc_thp_mode_never,
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
index 2f316474c1..e7877f098e 100644
--- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -22,19 +22,33 @@
 #include
 #include

+/* The __malloc_default_thp_pagesize is called only in single-thread mode,
+   either in malloc initialization or pthread creation.  */
+static unsigned long int thp_pagesize = -1;
+
 unsigned long int
 __malloc_default_thp_pagesize (void)
 {
+  unsigned long int size = atomic_load_relaxed (&thp_pagesize);
+  if (size != -1)
+    return size;
+
   int fd = __open64_nocancel (
     "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
   if (fd == -1)
-    return 0;
+    {
+      atomic_store_relaxed (&thp_pagesize, 0);
+      return 0;
+    }

   char str[INT_BUFSIZE_BOUND (unsigned long int)];
   ssize_t s = __read_nocancel (fd, str, sizeof (str));
   __close_nocancel (fd);
   if (s < 0)
-    return 0;
+    {
+      atomic_store_relaxed (&thp_pagesize, 0);
+      return 0;
+    }

   unsigned long int r = 0;
   for (ssize_t i = 0; i < s; i++)
@@ -44,16 +58,28 @@ __malloc_default_thp_pagesize (void)
       r *= 10;
       r += str[i] - '0';
     }
+  atomic_store_relaxed (&thp_pagesize, r);
   return r;
 }

+/* The __malloc_thp_mode is called only in single-thread mode, either in
+   malloc initialization or pthread creation.  */
+static enum malloc_thp_mode_t thp_mode = malloc_thp_mode_unknown;
+
 enum malloc_thp_mode_t
 __malloc_thp_mode (void)
 {
+  enum malloc_thp_mode_t mode = atomic_load_relaxed (&thp_mode);
+  if (mode != malloc_thp_mode_unknown)
+    return mode;
+
   int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
                               O_RDONLY);
   if (fd == -1)
-    return malloc_thp_mode_not_supported;
+    {
+      atomic_store_relaxed (&thp_mode, malloc_thp_mode_not_supported);
+      return malloc_thp_mode_not_supported;
+    }

   static const char mode_always[] = "[always] madvise never\n";
   static const char mode_madvise[] = "always [madvise] never\n";
@@ -69,13 +95,19 @@ __malloc_thp_mode (void)
   if (s == sizeof (mode_always) - 1)
     {
       if (strcmp (str, mode_always) == 0)
-        return malloc_thp_mode_always;
+        mode = malloc_thp_mode_always;
       else if (strcmp (str, mode_madvise) == 0)
-        return malloc_thp_mode_madvise;
+        mode = malloc_thp_mode_madvise;
       else if (strcmp (str, mode_never) == 0)
-        return malloc_thp_mode_never;
+        mode = malloc_thp_mode_never;
+      else
+        mode = malloc_thp_mode_not_supported;
     }
-  return malloc_thp_mode_not_supported;
+  else
+    mode = malloc_thp_mode_not_supported;
+
+  atomic_store_relaxed (&thp_mode, mode);
+  return mode;
 }

 static size_t