[v8,6/6] loongarch: Enable THP-aligned load segments by default on 64-bit

Message ID: 20260405035555.558396-2-wangrui@loongson.cn
State: New
Series: elf: THP-aware load segment alignment

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Build passed
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Test passed

Commit Message

WANG Rui April 5, 2026, 3:55 a.m. UTC
On LoongArch64 Linux, aligning ELF load segments to Transparent Huge Page
(THP) boundaries provides consistent performance benefits for large
binaries by reducing TLB pressure and improving instruction fetch
efficiency.

Enable THP-based load segment alignment by default on LoongArch64 by
setting `glibc.elf.thp=1` during startup. Define the default THP
page size for load segment alignment on LoongArch64 as 32MB.

This allows the dynamic loader to apply THP-friendly alignment without
requiring the `glibc.elf.thp` tunable to be explicitly set.
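
As a rough illustration of what THP-friendly alignment means, the sketch below rounds an address up to a 32 MiB boundary. The names `THP_SIZE`, `thp_align_up` and `thp_is_aligned` are hypothetical and chosen for this example only; this is not the code behind `_dl_map_segment_align`, just the arithmetic such alignment relies on, assuming the alignment is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed THP size for illustration: 32 MiB, matching the default
   proposed in this patch.  */
#define THP_SIZE ((uintptr_t) 32 * 1024 * 1024)

/* Round ADDR up to the next multiple of ALIGN.  ALIGN must be a
   nonzero power of two.  */
static inline uintptr_t
thp_align_up (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Return true if ADDR is already a multiple of ALIGN.  */
static inline bool
thp_is_aligned (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}
```

For example, `thp_align_up (0x2000001, THP_SIZE)` yields `0x4000000`, the next 32 MiB boundary, while an address that is already on a boundary is returned unchanged.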

Benchmarks

Machine: Loongson 3A6000 (LoongArch64)
Kernel: 6.18.13
  CONFIG_READ_ONLY_THP_FOR_FS=y
  CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y

Workload 1: building Cargo 1.93.0
Rustc: nightly-2026-02-26

                Without patch        With patch
instructions    3,690,358,948,176    3,690,301,774,568
cpu-cycles      4,233,025,766,760    4,035,866,635,741
itlb-misses     9,708,829,532        2,700,014,717
time elapsed    302.40 s             289.68 s

Instructions remain essentially unchanged. iTLB misses drop by about
72%, reducing CPU cycles by about 4.7% and wall time by about 4.2%.

Workload 2: building Linux kernel v7.0-rc1
LLVM: 21.1.8

                Without patch        With patch
instructions    14,163,739,876,387   14,169,418,598,675
cpu-cycles      19,231,890,317,741   16,851,494,928,181
itlb-misses     91,142,010,440       90,779,245
time elapsed    1022.09 s            893.22 s

Instructions remain roughly the same. iTLB misses drop from about 91B
to about 90M (roughly 99.9% reduction), reducing CPU cycles by about
12% and wall time by about 12.6%.
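
The percentages quoted for both workloads can be rechecked from the raw counters with a small helper. This is only a sketch for verifying the arithmetic; the function name `percent_reduction` is made up for this example and is not part of the patch.

```c
#include <assert.h>

/* Percentage by which AFTER is lower than BEFORE.  BEFORE must be
   positive.  A negative result means AFTER is higher.  */
static inline double
percent_reduction (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

With the figures from the tables above, `percent_reduction (302.40, 289.68)` comes to about 4.2 and `percent_reduction (1022.09, 893.22)` to about 12.6, in line with the stated wall-time improvements.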

Reviewed-by: caiyinyu <caiyinyu@loongson.cn>
Signed-off-by: WANG Rui <wangrui@loongson.cn>
---
 .../unix/sysv/linux/loongarch/cpu-features.c  |  6 +++++
 .../loongarch/lp64/dl-map-segment-align.h     | 22 +++++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
  

Comments

H.J. Lu April 7, 2026, 1:06 a.m. UTC | #1
On Sun, Apr 5, 2026 at 11:56 AM WANG Rui <wangrui@loongson.cn> wrote:
> diff --git a/sysdeps/unix/sysv/linux/loongarch/cpu-features.c b/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
> index 0c125e64e2b..7ed54ff53e7 100644
> --- a/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
> +++ b/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
> @@ -27,4 +27,10 @@ init_cpu_features (struct cpu_features *cpu_features)
>    GLRO(dl_larch_cpu_features).hwcap = GLRO(dl_hwcap);
>    TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *,
>                TUNABLE_CALLBACK (set_hwcaps));
> +
> +#ifdef __LP64__
> +  /* Enable THP-based load segment alignment by default on LoongArch64. */
> +  if (!TUNABLE_IS_INITIALIZED (glibc, elf, thp))

TUNABLE_IS_INITIALIZED takes one argument.  Did you mean
TUNABLE_IS_INITIALIZED_FULL?

> +    TUNABLE_SET (glibc, elf, thp, 1);

TUNABLE_SET takes 2 arguments.  Did you mean
TUNABLE_SET_FULL?

> +#endif
>  }
> diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h b/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
> new file mode 100644
> index 00000000000..c51ee4ac47e
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
> @@ -0,0 +1,22 @@
> +/* _dl_map_segment_align.  LoongArch64 Linux version.
> +   Copyright (C) 2026 Free Software Foundation, Inc.
> +   Copyright The GNU Toolchain Authors.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define DL_MAP_DEFAULT_THP_PAGESIZE (32 * 1024 * 1024)
> +
> +#include_next <dl-map-segment-align.h>
> --
> 2.53.0
>
  
Wilco Dijkstra April 7, 2026, 5:59 p.m. UTC | #2
Hi HJ,

>> +  /* Enable THP-based load segment alignment by default on LoongArch64. */
>> +  if (!TUNABLE_IS_INITIALIZED (glibc, elf, thp))
>
> TUNABLE_IS_INITIALIZED takes one argument.  Did you mean
> TUNABLE_IS_INITIALIZED_FULL?
>
>> +    TUNABLE_SET (glibc, elf, thp, 1);
>
> TUNABLE_SET takes 2 arguments.  Did you mean
> TUNABLE_SET_FULL?

This works fine since TUNABLE_NAMESPACE is not set in sysdeps directories, so
the short macros take 2 extra arguments (top namespace and namespace) in that
case (we could clean this up to reduce confusion).

Cheers,
Wilco
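
The arity difference discussed in this exchange can be modelled with a small self-contained mock. Everything prefixed `MOCK_` or `mock_` is hypothetical and written for this example; it only imitates the pattern described here (short macros whose argument count depends on whether a namespace macro is defined) and is not glibc's tunables header.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock storage for a single tunable.  */
struct mock_tunable { int value; bool initialized; };
static struct mock_tunable mock_glibc_elf_thp;

/* Two-step paste so that macro arguments are expanded first.  */
#define MOCK_VAR1(top, ns, id) mock_##top##_##ns##_##id
#define MOCK_VAR(top, ns, id) MOCK_VAR1 (top, ns, id)

/* The _FULL forms always take top namespace, namespace and id.  */
#define MOCK_IS_INITIALIZED_FULL(top, ns, id) \
  (MOCK_VAR (top, ns, id).initialized)
#define MOCK_SET_FULL(top, ns, id, val) \
  (MOCK_VAR (top, ns, id).value = (val), \
   MOCK_VAR (top, ns, id).initialized = true)

#ifdef MOCK_NAMESPACE
/* Namespace fixed by the including file: short forms take only the id.  */
# define MOCK_IS_INITIALIZED(id) \
  MOCK_IS_INITIALIZED_FULL (glibc, MOCK_NAMESPACE, id)
# define MOCK_SET(id, val) \
  MOCK_SET_FULL (glibc, MOCK_NAMESPACE, id, val)
#else
/* No namespace defined: short forms take the full triple.  */
# define MOCK_IS_INITIALIZED(top, ns, id) \
  MOCK_IS_INITIALIZED_FULL (top, ns, id)
# define MOCK_SET(top, ns, id, val) \
  MOCK_SET_FULL (top, ns, id, val)
#endif

/* Same shape as the patch hunk: apply a default of 1 only if the
   tunable has not been set already.  Compiles only when
   MOCK_NAMESPACE is undefined, since it uses the long forms.  */
static inline int
mock_apply_default (void)
{
  if (!MOCK_IS_INITIALIZED (glibc, elf, thp))
    MOCK_SET (glibc, elf, thp, 1);
  assert (MOCK_IS_INITIALIZED (glibc, elf, thp));
  return MOCK_VAR (glibc, elf, thp).value;
}
```

In this mock, a value set beforehand survives the call, and an untouched tunable ends up as 1. The five-argument `TUNABLE_GET (glibc, cpu, hwcaps, ...)` call already present in the same function of the patched file is consistent with the long forms being in effect there.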
  
H.J. Lu April 10, 2026, 1:47 a.m. UTC | #3
On Sun, Apr 5, 2026 at 11:56 AM WANG Rui <wangrui@loongson.cn> wrote:
> +#define DL_MAP_DEFAULT_THP_PAGESIZE (32 * 1024 * 1024)
>

What happens if DL_MAP_DEFAULT_THP_PAGESIZE is different
from /sys/kernel/mm/transparent_hugepage/hpage_pmd_size?

> +#include_next <dl-map-segment-align.h>
> --
> 2.53.0
>
  
Wilco Dijkstra April 10, 2026, 10:34 a.m. UTC | #4
Hi HJ,

>> +#define DL_MAP_DEFAULT_THP_PAGESIZE (32 * 1024 * 1024)
>
> What happens if DL_MAP_DEFAULT_THP_PAGESIZE is different
> from /sys/kernel/mm/transparent_hugepage/hpage_pmd_size?

THP always works whatever value you choose - it just works better
if you align the start. And if you don't know your PMD size, why set it?

Cheers,
Wilco
  
H.J. Lu April 11, 2026, 12:31 a.m. UTC | #5
On Fri, Apr 10, 2026 at 6:35 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi HJ,
>
> >> +#define DL_MAP_DEFAULT_THP_PAGESIZE (32 * 1024 * 1024)
> >
> > What happens if DL_MAP_DEFAULT_THP_PAGESIZE is different
> > from /sys/kernel/mm/transparent_hugepage/hpage_pmd_size?
>
> THP always works whatever value you choose - it just works better
> if you align the start. And if you don't know your PMD size, why set it?
>

glibc shouldn't make assumptions about THP status, which is set by the kernel.
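
To make the question about a mismatch concrete, the following user-space sketch reads the size the kernel reports and compares it with an assumed compile-time default. It is not what the dynamic loader does and is not proposed anywhere in this thread; `ASSUMED_THP_PAGESIZE` and all function names are hypothetical, and the only detail taken from the discussion is the sysfs path `/sys/kernel/mm/transparent_hugepage/hpage_pmd_size`, which is assumed here to hold a decimal byte count.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed compile-time default, mirroring the 32 MiB in the patch.  */
#define ASSUMED_THP_PAGESIZE (32UL * 1024 * 1024)

/* Parse a decimal byte count such as "33554432\n".  Returns 0 if no
   number could be parsed.  */
static unsigned long
parse_pmd_size (const char *text)
{
  char *end;
  assert (text != NULL);
  errno = 0;
  unsigned long value = strtoul (text, &end, 10);
  if (errno != 0 || end == text)
    return 0;
  return value;
}

/* Read the size in bytes from PATH, or return 0 if the file is
   missing or unreadable.  */
static unsigned long
read_hpage_pmd_size (const char *path)
{
  char buf[32];
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return 0;
  char *line = fgets (buf, sizeof buf, f);
  fclose (f);
  return line != NULL ? parse_pmd_size (buf) : 0;
}

/* Return 1 if KERNEL_SIZE equals the assumed default, 0 if it
   differs, and -1 if the kernel value is unknown (0).  */
static int
thp_default_matches_kernel (unsigned long kernel_size)
{
  if (kernel_size == 0)
    return -1;
  return kernel_size == ASSUMED_THP_PAGESIZE ? 1 : 0;
}
```

A caller would pass the result of `read_hpage_pmd_size ("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size")` to `thp_default_matches_kernel`. A return of -1 covers kernels without THP support, where the file does not exist.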
  

Patch

diff --git a/sysdeps/unix/sysv/linux/loongarch/cpu-features.c b/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
index 0c125e64e2b..7ed54ff53e7 100644
--- a/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
+++ b/sysdeps/unix/sysv/linux/loongarch/cpu-features.c
@@ -27,4 +27,10 @@  init_cpu_features (struct cpu_features *cpu_features)
   GLRO(dl_larch_cpu_features).hwcap = GLRO(dl_hwcap);
   TUNABLE_GET (glibc, cpu, hwcaps, tunable_val_t *,
 	       TUNABLE_CALLBACK (set_hwcaps));
+
+#ifdef __LP64__
+  /* Enable THP-based load segment alignment by default on LoongArch64. */
+  if (!TUNABLE_IS_INITIALIZED (glibc, elf, thp))
+    TUNABLE_SET (glibc, elf, thp, 1);
+#endif
 }
diff --git a/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h b/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
new file mode 100644
index 00000000000..c51ee4ac47e
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/lp64/dl-map-segment-align.h
@@ -0,0 +1,22 @@ 
+/* _dl_map_segment_align.  LoongArch64 Linux version.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define DL_MAP_DEFAULT_THP_PAGESIZE (32 * 1024 * 1024)
+
+#include_next <dl-map-segment-align.h>