[v8,0/6] elf: THP-aware load segment alignment

Message ID 20260405035323.558335-1-wangrui@loongson.cn (mailing list archive)

Message

WANG Rui April 5, 2026, 3:53 a.m. UTC
  OK for commit?

Changes since [v7]:
  * Rename tunable glibc.elf.hugetlb to glibc.elf.thp.
  * Rebase on current master.

Changes since [v6]:
  * Move MAX_THP_PAGESIZE to hugepages.
  * Skip THP mode probing when DL_MAP_DEFAULT_THP_PAGESIZE is non-zero.

Changes since [v5]:
  * No functional changes.
  * Add benchmark results to the commit message.

Changes since [v4]:
  * Merge malloc-hugepages into hugepages.
  * Limit the glibc.elf.hugetlb tunable maximum value to 1.

Changes since [v3]:
  * Rebased on current master.
  * Resolved conflicts with recently merged changes.
  * No functional changes intended.

Changes since [v2]:
  * Refactor THP detection into a new generic hugepages abstraction,
    moving helpers out of malloc-hugepages.
  * Add a new tunable, `glibc.elf.hugetlb`, to control THP-aware ELF
    segment alignment.
  * Move the Linux implementation of `_dl_map_segment_align` to
    sysdeps/unix/sysv/linux/, making it generic for Linux (including
    32-bit) and avoiding wordsize-64-specific overrides.
  * Use `static inline` instead of `static __always_inline`.

Changes since [v1]:
  * Fix CI build failure (-Wunused-function).

This patch series introduces a small extension point in the ELF loader to allow
architecture-specific adjustment of load segment alignment, and uses it to
improve Transparent Huge Page (THP) usage on Linux.

Patch 1 moves Transparent Huge Page helpers into a new generic hugepages
abstraction, so THP mode detection and default huge page size probing can be
shared between malloc and the dynamic loader. There is no functional change.
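
As a rough illustration of the THP mode detection mentioned for Patch 1 (a sketch only, not the code in this series): on Linux the active mode is the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled. The enum and function names below are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; not the identifiers used in glibc.  */
enum thp_mode { THP_UNKNOWN, THP_ALWAYS, THP_MADVISE, THP_NEVER };

/* Parse the contents of /sys/kernel/mm/transparent_hugepage/enabled,
   e.g. "always [madvise] never".  The active mode is the bracketed word.  */
enum thp_mode
parse_thp_mode (const char *text)
{
  assert (text != NULL);
  if (strstr (text, "[always]") != NULL)
    return THP_ALWAYS;
  if (strstr (text, "[madvise]") != NULL)
    return THP_MADVISE;
  if (strstr (text, "[never]") != NULL)
    return THP_NEVER;
  return THP_UNKNOWN;
}

/* Read and parse the sysfs file; THP_UNKNOWN if it cannot be read.  */
enum thp_mode
read_thp_mode (void)
{
  char buf[64] = "";
  FILE *f = fopen ("/sys/kernel/mm/transparent_hugepage/enabled", "r");
  if (f == NULL)
    return THP_UNKNOWN;
  if (fgets (buf, sizeof buf, f) == NULL)
    buf[0] = '\0';
  fclose (f);
  return parse_thp_mode (buf);
}
```

Keeping the parsing separate from the file access is only a convenience for this sketch: it lets the string handling be exercised without a Linux sysfs mount.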

Patch 2 removes a redundant declaration of `_dl_map_segments` from
dl-load.h, avoiding `-Wunused-function` build failures and keeping the
prototype colocated with its definition.

Patch 3 adds a new helper, `_dl_map_segment_align`, which is called when
determining the maximum alignment for ELF load segments. The generic
implementation is a no-op and preserves existing behavior.

Patch 4 introduces a new tunable, `glibc.elf.thp`, which controls
THP-aware alignment of ELF loadable segments. By default, the value is 0
and existing behavior is preserved.

Patch 5 provides a Linux implementation of `_dl_map_segment_align` that
opportunistically aligns large, suitably aligned, non-writable `PT_LOAD`
segments to the system’s default THP page size when THP is configured to
be used unconditionally and the tunable is enabled.
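
The conditions listed for Patch 5 can be sketched roughly as follows. This is not the code from the series: the struct, the function name, and the exact reading of "large" (at least one THP page) and "suitably aligned" (virtual address and file offset congruent modulo the THP size) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one PT_LOAD segment; field names mirror the ELF
   program header, but the struct itself is illustrative only.  */
struct load_segment
{
  uint64_t p_vaddr;   /* virtual address of the segment */
  uint64_t p_offset;  /* file offset of the segment */
  uint64_t p_memsz;   /* size of the segment in memory */
  uint64_t p_align;   /* alignment requested by the segment */
  bool writable;      /* true if PF_W is set in p_flags */
};

/* Return the alignment to use for SEG.  The requested alignment is kept
   unless every condition described for Patch 5 holds.  */
uint64_t
thp_aware_align (const struct load_segment *seg, uint64_t thp_size,
                 bool thp_always, bool tunable_enabled)
{
  assert (seg != NULL);
  if (!tunable_enabled || !thp_always || thp_size == 0)
    return seg->p_align;
  /* Only large, non-writable segments are candidates (assumed threshold:
     at least one THP page).  */
  if (seg->writable || seg->p_memsz < thp_size)
    return seg->p_align;
  /* Assumed meaning of "suitably aligned": address and file offset agree
     modulo the THP size, so a huge page can back the mapping.  */
  if (seg->p_vaddr % thp_size != seg->p_offset % thp_size)
    return seg->p_align;
  /* Never lower an alignment that is already at least the THP size.  */
  return seg->p_align < thp_size ? thp_size : seg->p_align;
}
```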

Patch 6 enables this behavior by default on LoongArch64 Linux and defines
the default THP page size (32MB), matching the architecture’s PMD huge page
geometry.
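
For reference, the 32MB figure follows from simple arithmetic, assuming a 16KB base page (the usual LoongArch64 Linux configuration, not stated above) and 8-byte page-table entries: one page-table page holds 16384 / 8 = 2048 entries, and 2048 * 16KB = 32MB. The helper below is a hypothetical sketch of that calculation, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Size covered by one PMD entry, assuming each page-table page is one
   base page in size and each entry is 8 bytes (an assumption that holds
   for common 64-bit Linux configurations).  */
uint64_t
pmd_huge_page_size (uint64_t base_page_size)
{
  /* The base page size must be a non-zero power of two.  */
  assert (base_page_size != 0
          && (base_page_size & (base_page_size - 1)) == 0);
  uint64_t entries_per_table = base_page_size / 8;
  return base_page_size * entries_per_table;
}
```

Under the same assumptions, a 4KB base page gives the familiar 2MB PMD huge page.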

[v7]: https://sourceware.org/pipermail/libc-alpha/2026-March/175776.html
[v6]: https://sourceware.org/pipermail/libc-alpha/2026-March/175737.html
[v5]: https://sourceware.org/pipermail/libc-alpha/2026-March/175694.html
[v4]: https://sourceware.org/pipermail/libc-alpha/2026-March/175644.html
[v3]: https://sourceware.org/pipermail/libc-alpha/2026-February/175464.html
[v2]: https://sourceware.org/pipermail/libc-alpha/2026-February/175394.html
[v1]: https://sourceware.org/pipermail/libc-alpha/2026-February/175359.html

WANG Rui (6):
  hugepages: Move THP helpers to generic hugepages abstraction
  elf: Remove redundant _dl_map_segments declaration from dl-load.h
  elf: Introduce _dl_map_segment_align hook for segment alignment tuning
  tunables: Add glibc.elf.thp tunable for THP-aware segment alignment
  elf: Align large load segments to PMD huge page size for THP
  loongarch: Enable THP-aligned load segments by default on 64-bit

 elf/dl-load.c                                 |  4 ++
 elf/dl-load.h                                 |  5 +-
 elf/dl-tunables.c                             |  7 +--
 elf/dl-tunables.list                          |  8 +++
 malloc/malloc-internal.h                      |  2 +-
 malloc/malloc.c                               | 27 +++++-----
 manual/tunables.texi                          | 24 +++++++++
 sysdeps/generic/Makefile                      |  4 +-
 sysdeps/generic/dl-map-segment-align.h        | 26 +++++++++
 .../{malloc-hugepages.c => hugepages.c}       | 13 +++--
 .../{malloc-hugepages.h => hugepages.h}       | 32 ++++++-----
 sysdeps/unix/sysv/linux/Makefile              |  1 +
 .../{malloc-hugepages.h => hugepages.h}       |  4 +-
 .../unix/sysv/linux/dl-map-segment-align.c    | 53 +++++++++++++++++++
 .../unix/sysv/linux/dl-map-segment-align.h    | 27 ++++++++++
 .../linux/{malloc-hugepages.c => hugepages.c} | 33 ++++++------
 .../unix/sysv/linux/loongarch/cpu-features.c  |  6 +++
 .../loongarch/lp64/dl-map-segment-align.h     | 22 ++++++++
 18 files changed, 237 insertions(+), 61 deletions(-)
 create mode 100644 sysdeps/generic/dl-map-segment-align.h
 rename sysdeps/generic/{malloc-hugepages.c => hugepages.c} (76%)
 rename sysdeps/generic/{malloc-hugepages.h => hugepages.h} (68%)
 rename sysdeps/unix/sysv/linux/aarch64/{malloc-hugepages.h => hugepages.h} (91%)
 create mode 100644 sysdeps/unix/sysv/linux/dl-map-segment-align.c
 create mode 100644 sysdeps/unix/sysv/linux/dl-map-segment-align.h
 rename sysdeps/unix/sysv/linux/{malloc-hugepages.c => hugepages.c} (89%)
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
  

Comments

H.J. Lu April 6, 2026, 11:32 p.m. UTC | #1
On Sun, Apr 5, 2026 at 11:54 AM WANG Rui <wangrui@loongson.cn> wrote:
>
> OK for commit?

Please add some tests for this feature.
  
H.J. Lu April 8, 2026, 2:07 a.m. UTC | #2
On Tue, Apr 7, 2026 at 7:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Please add some tests for this feature.
>

I opened:

https://sourceware.org/bugzilla/show_bug.cgi?id=34056

for GNU_PROPERTY_1_NEEDED_TRANSPARENT_HUGEPAGE.
  
Wilco Dijkstra April 9, 2026, 10:28 a.m. UTC | #3
Hi HJ,

> I opened:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=34056
> 
> for GNU_PROPERTY_1_NEEDED_TRANSPARENT_HUGEPAGE.

So what is the reasoning for this? Where will it be set and under which conditions?
It looks like adding a lot of startup overhead for a flag that would either never be
set if it is not the default or always set if it is the default. The whole point of THP
is that it is transparent, i.e. automatic, always enabled.

Cheers,
Wilco
  
H.J. Lu April 9, 2026, 11:50 a.m. UTC | #4
On Thu, Apr 9, 2026 at 6:29 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi HJ,
>
> > I opened:
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=34056
> >
> > for GNU_PROPERTY_1_NEEDED_TRANSPARENT_HUGEPAGE.
>
> So what is the reasoning for this? Where will it be set and under which conditions?

A programmer can ask for it.

> It looks like adding a lot of startup overhead for a flag that would either never be
> set if it is not the default or always set if it is the default. The whole point of THP
> is that it is transparent, i.e. automatic, always enabled.
>

Is it always beneficial to load every program, big or small,
performance sensitive or non-sensitive, with THP?  If it is
the case, why bother with a tunable?
  
Wilco Dijkstra April 9, 2026, 1:29 p.m. UTC | #5
Hi HJ,

> Is it always beneficial to load every program, big or small,
> performance sensitive or non-sensitive, with THP?  If it is
> the case, why bother with a tunable?

Linux reads much larger blocks from the filesystem, so yes,
it's beneficial if the binary is large enough and you have THP.
Small binaries can't really benefit.

Tunables are almost for free, it's just checking a global
variable, not a system call.

Cheers,
Wilco
  
H.J. Lu April 9, 2026, 9:18 p.m. UTC | #6
On Thu, Apr 9, 2026 at 9:30 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi HJ,
>
> > Is it always beneficial to load every program, big or small,
> > performance sensitive or non-sensitive, with THP?  If it is
> > the case, why bother with a tunable?
>
> Linux reads much larger blocks from the filesystem, so yes,
> it's beneficial if the binary is large enough and you have THP.
> Small binaries can't really benefit.
>
> Tunables are almost for free, it's just checking a global
> variable, not a system call.

The end-user still needs to set the tunable.  It is beyond
the programmer's control.  With a property bit, the developer
can set it on the binary.   If needed, it can be set in one of
crt files.
  
Wilco Dijkstra April 9, 2026, 9:39 p.m. UTC | #7
Hi HJ,

>> Tunables are almost for free, it's just checking a global
>> variable, not a system call.
> 
> The end-user still needs to set the tunable.  It is beyond
> the programmer's control.  With a property bit, the developer
> can set it on the binary.   If needed, it can be set in one of
> crt files.

The goal is to enable this by default on modern 64-bit targets given the extra
alignment is not harmful (only a minor reduction of ASLR bits when PMD size is 2MB).

All this configurability is insane - nobody is ever going to figure out which settings are
available, how to modify each setting, let alone which combination gives the best
performance on every application on a particular machine... Users just want stuff to
work well out of the box. Call them lazy if you like.

Cheers,
Wilco
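
To put a number on the ASLR remark above, here is a back-of-the-envelope sketch. It assumes mappings are otherwise randomized at base-page granularity and that the base page is 4KB, neither of which is stated in the thread; the function name is hypothetical. Raising the alignment from 4KB to 2MB fixes log2(2MB / 4KB) = 9 low-order bits that could otherwise vary.

```c
#include <assert.h>
#include <stdint.h>

/* Number of address bits that stop being random when a mapping must be
   aligned to ALIGNMENT instead of BASE_PAGE_SIZE.  Both arguments must be
   powers of two, with ALIGNMENT >= BASE_PAGE_SIZE.  */
unsigned int
aslr_bits_lost (uint64_t base_page_size, uint64_t alignment)
{
  assert (base_page_size != 0
          && (base_page_size & (base_page_size - 1)) == 0);
  assert (alignment >= base_page_size
          && (alignment & (alignment - 1)) == 0);
  unsigned int bits = 0;
  /* Count the doublings needed to go from the base page to ALIGNMENT.  */
  while (base_page_size < alignment)
    {
      base_page_size <<= 1;
      bits++;
    }
  return bits;
}
```

With a 16KB base page and a 32MB PMD size, the same calculation gives 11 bits.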
  
H.J. Lu April 9, 2026, 10:24 p.m. UTC | #8
On Fri, Apr 10, 2026 at 5:40 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi HJ,
>
> >> Tunables are almost for free, it's just checking a global
> >> variable, not a system call.
> >
> > The end-user still needs to set the tunable.  It is beyond
> > the programmer's control.  With a property bit, the developer
> > can set it on the binary.   If needed, it can be set in one of
> > crt files.
>
> The goal is to enable this by default on modern 64-bit targets given the extra
> alignment is not harmful (only a minor reduction of ASLR bits when PMD size is 2MB).
>
> All this configurability is insane - nobody is ever going to figure out which settings are
> available, how to modify each setting, let alone which combination gives the best
> performance on every application on a particular machine... Users just want stuff to
> work well out of the box. Call them lazy if you like.
>
>

Why a tunable at all?  Users may not set it.
  
Wilco Dijkstra April 9, 2026, 11:32 p.m. UTC | #9
Hi HJ,

> Why a tunable at all?  Users may not set it.

If it is already the default, users never need to set it. It makes benchmarking
simpler at a cost that is ~10000x lower than checking the huge page size.

Cheers,
Wilco
  
H.J. Lu April 10, 2026, 1 a.m. UTC | #10
On Fri, Apr 10, 2026 at 7:33 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi HJ,
>
> > Why a tunable at all?  Users may not set it.
>
> If it is already the default, users never need to set it. It makes benchmarking
> simpler at a cost that is ~10000x lower than checking the huge page size.
>

Even if THP isn't enabled by default, as in 32-bit mode, we may still
want to enable THP on specific 32-bit applications.  A property bit provides
such flexibility without requiring the user to set a tunable at run-time.
  
WANG Rui April 10, 2026, 2:50 a.m. UTC | #11
On Fri, Apr 10, 2026 at 9:26 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Even if THP isn't enabled by default, as in 32-bit mode, we may still
> want to enable THP on specific 32-bit applications.  A property bit provides
> such flexibility without requiring the user to set a tunable at run-time.

Do we really need a flag in the executable file to hint the loader to
cooperate with THP? If we already have the ability to modify the
executable, why not just make the LOAD segments hugepage-aligned in
the first place?

As you can see, this optimization is currently opt-in on most arches
rather than enabled by default. I see the tunable as more of a
transitional mechanism: it simply gives users a way to turn the
optimization on, and on arches where it is enabled by default, it can
also be used to turn it off.

Thanks,
Rui
  
H.J. Lu April 10, 2026, 3:26 a.m. UTC | #12
On Fri, Apr 10, 2026 at 10:57 AM WANG Rui <wangrui@loongson.cn> wrote:
>
> Do we really need a flag in the executable file to hint the loader to
> cooperate with THP? If we already have the ability to modify the
> executable, why not just make the LOAD segments hugepage-aligned in
> the first place?
>
> As you can see, this optimization is currently opt-in on most arches
> rather than enabled by default. I see the tunable as more of a
> transitional mechanism: it simply gives users a way to turn the
> optimization on, and on arches where it is enabled by default, it can
> also be used to turn it off.
>

This all-or-nothing approach is my main concern.  A programmer
may want to enable THP on an application even if THP isn't enabled
by default.  The overhead of checking a bit in the l_1_needed field
is minimal.
  
WANG Rui April 10, 2026, 3:51 a.m. UTC | #13
On Fri, Apr 10, 2026 at 11:35 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This all-or-nothing approach is my main concern.  A programmer
> may want to enable THP on an application even if THP isn't enabled
> by default.  The overhead of checking a bit in the l_1_needed field
> is minimal.

In earlier discussions, there was mention of memory pressure caused by
THP, which is really a runtime system concern. When THP is not enabled
by default on the system, I'm a bit unsure whether it's a good idea
for a programmer's decision to override the system default.

Ideally, the executable itself would take care of hugepage-friendly
alignment, the loader would make sure the mappings land on
hugepage-aligned virtual addresses, and the final decision on whether
to use huge pages would be left solely to THP. It has visibility into
the system's dynamic state, and users would only need to deal with
this single control point, making it relatively straightforward to
turn it on or off.

Thanks,
Rui
  
H.J. Lu April 10, 2026, 4:19 a.m. UTC | #14
On Fri, Apr 10, 2026 at 11:58 AM WANG Rui <wangrui@loongson.cn> wrote:
>
> In earlier discussions, there was mention of memory pressure caused by
> THP, which is really a runtime system concern. When THP is not enabled
> by default on the system, I'm a bit unsure whether it's a good idea
> for a programmer's decision to override the system default.
>
> Ideally, the executable itself would take care of hugepage-friendly
> alignment, the loader would make sure the mappings land on
> hugepage-aligned virtual addresses, and the final decision on whether
> to use huge pages would be left solely to THP. It has visibility into
> the system's dynamic state, and users would only need to deal with
> this single control point, making it relatively straightforward to
> turn it on or off.
>

What a property bit provides is flexibility.