ld: Disable separate code by default for Linux/x86
Checks
| Context | Check | Description |
|---|---|---|
| linaro-tcwg-bot/tcwg_binutils_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-arm | success | Test passed |
| linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 | success | Test passed |
Commit Message
Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
-z separate-code, mainly to avoid mixing code and data in the same segment
for better cache performance, and enabled it by default for Linux/x86.
Today, Linux already has working support for mapping executable code with THP,
which can noticeably reduce iTLB pressure and bring measurable performance
benefits.
In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
typically place the code PT_LOAD segment first, so both the virtual address and
file offset are 0. That turns out to be an important prerequisite for THP-aware
PT_LOAD alignment optimizations [1][2][3], and the data shows clear performance
wins on both AArch64 and LoongArch64. [4][1]
Linux/x86 is a bit of an outlier here: the default -z separate-code setting
happens to break this layout. So this patch changes the Linux/x86 default to
-z noseparate-code.
Benchmark
Machine: AMD Ryzen 9 7950X (x86_64)
Binutils: 2.46
GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
Linux kernel: 6.19.6 with patch [1]
Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
                 Without patch        With patch
instructions     8,246,133,611,932    8,246,025,137,750
cpu-cycles       8,001,028,142,928    7,565,925,107,502
itlb-misses      3,672,158,331        26,821,242
time elapsed     64.66 s              61.97 s
Instructions are basically unchanged. iTLB misses drop from ~3.67B to
~26M (~99.27% reduction), which results in a ~5.44% reduction in
cycles and ~4.18% shorter wall time for this workload.
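The derived percentages can be rechecked from the raw counters in the table. The following C sketch just redoes that arithmetic; `pct_reduction` and `print_benchmark_summary` are illustrative helpers, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage reduction from `before` to `after`: 100 * (before - after) / before. */
double pct_reduction(double before, double after)
{
    assert(before > 0.0); /* avoid dividing by zero */
    return 100.0 * (before - after) / before;
}

/* Recompute the derived figures from the raw numbers in the benchmark table. */
void print_benchmark_summary(void)
{
    printf("instructions: %.4f%% fewer\n",
           pct_reduction(8246133611932.0, 8246025137750.0));
    printf("itlb-misses:  %.2f%% fewer\n",
           pct_reduction(3672158331.0, 26821242.0));
    printf("cpu-cycles:   %.2f%% fewer\n",
           pct_reduction(8001028142928.0, 7565925107502.0));
    printf("time elapsed: %.2f%% shorter\n", pct_reduction(64.66, 61.97));
}
```

This gives about 99.27% fewer iTLB misses and 5.44% fewer cycles, matching the text. From the rounded times shown (64.66 s vs. 61.97 s) the wall-time reduction comes out at about 4.16%; the small gap to the quoted ~4.18% presumably comes from the unrounded measurements.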
[1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
[2] https://sourceware.org/pipermail/libc-alpha/2026-March/175776.html
[3] https://lore.kernel.org/linux-fsdevel/20260310145406.3073394-1-usama.arif@linux.dev
[4] https://lore.kernel.org/linux-fsdevel/20260314095022.217231-1-r@hev.cc
bfd/
* configure.ac: Remove Linux/x86 default enabling of
-z separate-code.
* configure: Regenerated.
ld/
* configure.tgt: Do not enable -z separate-code by default
for Linux/x86 targets.
Signed-off-by: WANG Rui <r@hev.cc>
---
bfd/configure | 8 --------
bfd/configure.ac | 10 ----------
ld/configure.tgt | 5 +----
3 files changed, 1 insertion(+), 22 deletions(-)
Comments
On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
>
> Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> -z separate-code, mainly to avoid mixing code and data in the same segment
> for better cache performance, and enabled it by default for Linux/x86.
-z separate-code reduces cache pollution and improves security by avoiding
putting read-only data in the same page with code.
> Today, Linux already has working support for mapping executable code with THP,
> which can noticeably reduce iTLB pressure and bring measurable performance
> benefits.
>
> In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> typically place the code PT_LOAD segment first, so both the virtual address and
> file offset are 0. That turns out to be an important prerequisite for THP-aware
Should ld.so map the read-only data segment and the code segment into the same huge
page? That should have a similar effect to -z noseparate-code.
> PT_LOAD alignment optimizations [1][2][3], and the data shows clear performance
> wins on both AArch64 and LoongArch64. [4][1]
>
> Linux/x86 is a bit of an outlier here: the default -z separate-code setting
> happens to break this layout. So this patch changes the Linux/x86 default to
> -z noseparate-code.
>
> Benchmark
>
> Machine: AMD Ryzen 9 7950X (x86_64)
> Binutils: 2.46
> GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
> Linux kernel: 6.19.6 with patch [1]
>
> Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
>
>                  Without patch        With patch
> instructions     8,246,133,611,932    8,246,025,137,750
> cpu-cycles       8,001,028,142,928    7,565,925,107,502
> itlb-misses      3,672,158,331        26,821,242
> time elapsed     64.66 s              61.97 s
>
> Instructions are basically unchanged. iTLB misses drop from ~3.67B to
> ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
> cycles and ~4.18% shorter wall time for this workload.
Hongtao, haochecn, can you work with Rui to measure -z noseparate-code
performance impact on SPEC CPU?
> [1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
> [2] https://sourceware.org/pipermail/libc-alpha/2026-March/175776.html
> [3] https://lore.kernel.org/linux-fsdevel/20260310145406.3073394-1-usama.arif@linux.dev
> [4] https://lore.kernel.org/linux-fsdevel/20260314095022.217231-1-r@hev.cc
>
> bfd/
>
> * configure.ac: Remove Linux/x86 default enabling of
> -z separate-code.
> * configure: Regenerated.
>
> ld/
>
> * configure.tgt: Do not enable -z separate-code by default
> for Linux/x86 targets.
>
> Signed-off-by: WANG Rui <r@hev.cc>
> ---
> bfd/configure | 8 --------
> bfd/configure.ac | 10 ----------
> ld/configure.tgt | 5 +----
> 3 files changed, 1 insertion(+), 22 deletions(-)
>
> diff --git a/bfd/configure b/bfd/configure
> index f474c6af9cd..830c06a6c34 100755
> --- a/bfd/configure
> +++ b/bfd/configure
> @@ -12356,14 +12356,6 @@ if test "${enable_separate_code+set}" = set; then :
> esac
> fi
>
> -# Enable -z separate-code by default for Linux/x86.
> -case "${target}" in
> -i[3-7]86-*-linux-* | x86_64-*-linux-*)
> - if test ${ac_default_ld_z_separate_code} = unset; then
> - ac_default_ld_z_separate_code=1
> - fi
> - ;;
> -esac
> if test "${ac_default_ld_z_separate_code}" = unset; then
> ac_default_ld_z_separate_code=0
> fi
> diff --git a/bfd/configure.ac b/bfd/configure.ac
> index c30a75ba8b7..e152cf0cf94 100644
> --- a/bfd/configure.ac
> +++ b/bfd/configure.ac
> @@ -131,16 +131,6 @@ AC_ARG_ENABLE(separate-code,
> yes) ac_default_ld_z_separate_code=1 ;;
> no) ac_default_ld_z_separate_code=0 ;;
> esac])
> -# Enable -z separate-code by default for Linux/x86.
> -changequote(,)dnl
> -case "${target}" in
> -i[3-7]86-*-linux-* | x86_64-*-linux-*)
> -changequote([,])dnl
> - if test ${ac_default_ld_z_separate_code} = unset; then
> - ac_default_ld_z_separate_code=1
> - fi
> - ;;
> -esac
> if test "${ac_default_ld_z_separate_code}" = unset; then
> ac_default_ld_z_separate_code=0
> fi
> diff --git a/ld/configure.tgt b/ld/configure.tgt
> index 3e158913b89..c96f55ab7ce 100644
> --- a/ld/configure.tgt
> +++ b/ld/configure.tgt
> @@ -1119,12 +1119,9 @@ frv-*-* | hppa*-*-* | ia64-*-* | mips*-*-*)
> ;;
> esac
>
> -# Enable -z separate-code and --warn-textrel by default for Linux/x86.
> +# Enable --warn-textrel by default for Linux/x86.
> case "${target}" in
> i[3-7]86-*-linux-* | x86_64-*-linux-*)
> - if test ${ac_default_ld_z_separate_code} = unset; then
> - ac_default_ld_z_separate_code=1
> - fi
> if test ${ac_default_ld_textrel_check} = unset; then
> ac_default_ld_textrel_check=yes
> fi
> --
> 2.53.0
>
On Mon, Mar 16, 2026 at 11:58 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
> >
> > Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> > -z separate-code, mainly to avoid mixing code and data in the same segment
> > for better cache performance, and enabled it by default for Linux/x86.
>
> -z separate-code reduces cache pollution and improves security by avoiding
> putting read-only data in the same page with code.
I see. IIUC, the security aspect you mentioned mainly concerns
avoiding an executable ELF header. In that case we probably need to
make a trade-off here.
> > Today, Linux already has working support for mapping executable code with THP,
> > which can noticeably reduce iTLB pressure and bring measurable performance
> > benefits.
> >
> > In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> > typically place the code PT_LOAD segment first, so both the virtual address and
> > file offset are 0. That turns out to be an important prerequisite for THP-aware
>
> Should ld.so map read-only data segment plus code segment into the same huge
> page? This should have the similar effect as -z noseparate-code.
A huge page has a single set of permissions, so it doesn't seem
possible for part of the page to be read-only and non-executable while
another part is executable.
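To make the constraint concrete: two PT_LOAD segments whose address ranges touch the same 2 MiB-aligned region would have to share one PMD entry, and therefore one set of permissions. A minimal C sketch, assuming 2 MiB PMD-sized huge pages as on x86-64 with 4 KiB base pages; `share_huge_page` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HUGE_PAGE_SIZE ((uint64_t)2 << 20) /* assumed 2 MiB PMD size */

/* Index of the HUGE_PAGE_SIZE-aligned region containing addr. */
static uint64_t huge_page_index(uint64_t addr)
{
    return addr / HUGE_PAGE_SIZE;
}

/* True if the half-open ranges [a_start, a_end) and [b_start, b_end)
   touch at least one common HUGE_PAGE_SIZE-aligned region.  If they do
   and their permissions differ, that region cannot be mapped with a
   single huge page. */
bool share_huge_page(uint64_t a_start, uint64_t a_end,
                     uint64_t b_start, uint64_t b_end)
{
    assert(a_start < a_end && b_start < b_end); /* non-empty ranges */
    uint64_t a_first = huge_page_index(a_start);
    uint64_t a_last = huge_page_index(a_end - 1);
    uint64_t b_first = huge_page_index(b_start);
    uint64_t b_last = huge_page_index(b_end - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

For small test binaries like the ones shown later in the thread, an R segment at [0x0, 0x5c8) and an RE segment at [0x1000, 0x1139) both fall in the first 2 MiB region, so the function returns true: they cannot be covered by one huge page while keeping different permissions.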
> > PT_LOAD alignment optimizations [1][2][3], and the data shows clear performance
> > wins on both AArch64 and LoongArch64. [4][1]
> >
> > Linux/x86 is a bit of an outlier here: the default -z separate-code setting
> > happens to break this layout. So this patch changes the Linux/x86 default to
> > -z noseparate-code.
> >
> > Benchmark
> >
> > Machine: AMD Ryzen 9 7950X (x86_64)
> > Binutils: 2.46
> > GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
> > Linux kernel: 6.19.6 with patch [1]
> >
> > Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
> >
> >                  Without patch        With patch
> > instructions     8,246,133,611,932    8,246,025,137,750
> > cpu-cycles       8,001,028,142,928    7,565,925,107,502
> > itlb-misses      3,672,158,331        26,821,242
> > time elapsed     64.66 s              61.97 s
> >
> > Instructions are basically unchanged. iTLB misses drop from ~3.67B to
> > ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
> > cycles and ~4.18% shorter wall time for this workload.
>
> Hongtao, haochecn, can you work with Rui to measure -z noseparate-code
> performance impact on SPEC CPU?
Also, just a reminder that this optimization depends on Linux kernel
support for file-backed THP, specifically:
* `CONFIG_READ_ONLY_THP_FOR_FS` (available since Linux kernel 5.4), and
* `CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS`.
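As a rough runtime check related to the second requirement, the C sketch below reads `/sys/kernel/mm/transparent_hugepage/enabled`, whose boot-time default is what `CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS` selects; the kernel marks the active value with brackets, e.g. `always [madvise] never`. It cannot detect `CONFIG_READ_ONLY_THP_FOR_FS`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the bracketed (active) mode from the contents of
   /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never".
   Writes the mode into `out` (capacity `outlen`) and returns 0, or returns -1
   if no bracketed value is found or it does not fit. */
int thp_active_mode(const char *contents, char *out, size_t outlen)
{
    assert(contents != NULL && out != NULL);
    const char *open = strchr(contents, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close || outlen == 0)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len >= outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the sysfs knob on the running system (Linux only). */
int read_thp_mode(char *out, size_t outlen)
{
    char buf[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? thp_active_mode(buf, out, outlen) : -1;
}
```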
Thanks,
Rui
On Mon, Mar 16, 2026 at 9:21 AM hev <r@hev.cc> wrote:
>
> On Mon, Mar 16, 2026 at 11:58 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
> > >
> > > Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> > > -z separate-code, mainly to avoid mixing code and data in the same segment
> > > for better cache performance, and enabled it by default for Linux/x86.
> >
> > -z separate-code reduces cache pollution and improves security by avoiding
> > putting read-only data in the same page with code.
>
> I see. IIUC, the security aspect you mentioned mainly concerns
> avoiding an executable ELF header. In that case we probably need to
> make a trade-off here.
>
> > > Today, Linux already has working support for mapping executable code with THP,
> > > which can noticeably reduce iTLB pressure and bring measurable performance
> > > benefits.
> > >
> > > In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> > > typically place the code PT_LOAD segment first, so both the virtual address and
> > > file offset are 0. That turns out to be an important prerequisite for THP-aware
> >
> > Should ld.so map read-only data segment plus code segment into the same huge
> > page? This should have the similar effect as -z noseparate-code.
>
> A huge page has a single set of permissions, so it doesn't seem
> possible for part of the page to be read-only and non-executable while
> another part is executable.
With -z separate-code, I got
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040
0x0002d8 0x0002d8 R 0x8
INTERP 0x001000 0x0000000000001000 0x0000000000001000
0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000
0x0004a1 0x0004a1 R E 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000
0x0003f0 0x0003f0 R 0x1000
LOAD 0x001d98 0x0000000000002d98 0x0000000000002d98
0x000274 0x000278 RW 0x1000
With -z noseparate-code, I got
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040
0x0002a0 0x0002a0 R 0x8
INTERP 0x000304 0x0000000000000304 0x0000000000000304
0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000
0x000858 0x000858 R E 0x1000
LOAD 0x000d98 0x0000000000001d98 0x0000000000001d98
0x000274 0x000278 RW 0x1000
Please remind me of the benefits of -z noseparate-code. If ld.so maps
LOAD 0x000000 0x0000000000000000 0x0000000000000000
0x0004a1 0x0004a1 R E 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000
0x0003f0 0x0003f0 R 0x1000
in the same huge page with RW permission, will we get the same benefits?
On Tue, Mar 17, 2026 at 12:35 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 16, 2026 at 9:21 AM hev <r@hev.cc> wrote:
> >
> > On Mon, Mar 16, 2026 at 11:58 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
> > > >
> > > > Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> > > > -z separate-code, mainly to avoid mixing code and data in the same segment
> > > > for better cache performance, and enabled it by default for Linux/x86.
> > >
> > > -z separate-code reduces cache pollution and improves security by avoiding
> > > putting read-only data in the same page with code.
> >
> > I see. IIUC, the security aspect you mentioned mainly concerns
> > avoiding an executable ELF header. In that case we probably need to
> > make a trade-off here.
> >
> > > > Today, Linux already has working support for mapping executable code with THP,
> > > > which can noticeably reduce iTLB pressure and bring measurable performance
> > > > benefits.
> > > >
> > > > In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> > > > typically place the code PT_LOAD segment first, so both the virtual address and
> > > > file offset are 0. That turns out to be an important prerequisite for THP-aware
> > >
> > > Should ld.so map read-only data segment plus code segment into the same huge
> > > page? This should have the similar effect as -z noseparate-code.
> >
> > A huge page has a single set of permissions, so it doesn't seem
> > possible for part of the page to be read-only and non-executable while
> > another part is executable.
>
> With -z separate-code, I got
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flg Align
> PHDR 0x000040 0x0000000000000040 0x0000000000000040
> 0x0002d8 0x0002d8 R 0x8
> INTERP 0x001000 0x0000000000001000 0x0000000000001000
> 0x00001c 0x00001c R 0x1
> [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
> LOAD 0x000000 0x0000000000000000 0x0000000000000000
> 0x0004a1 0x0004a1 R E 0x1000
> LOAD 0x001000 0x0000000000001000 0x0000000000001000
> 0x0003f0 0x0003f0 R 0x1000
> LOAD 0x001d98 0x0000000000002d98 0x0000000000002d98
> 0x000274 0x000278 RW 0x1000
Huh? Why is the layout you get different from mine? With -z
separate-code, the read-only segment should normally come first, so
that the ELF header and program headers fall into that segment and
remain read-only and non-executable for security reasons.
With -z separate-code, I got
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000310 0x0000000000000310 R 0x8
INTERP 0x0000000000000374 0x0000000000000374 0x0000000000000374
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000005c8 0x00000000000005c8 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000139 0x0000000000000139 R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x00000000000000c8 0x00000000000000c8 R 0x1000
LOAD 0x0000000000002e10 0x0000000000003e10 0x0000000000003e10
0x0000000000000200 0x0000000000000208 RW 0x1000
With -z noseparate-code, I got
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002a0 0x00000000000002a0 R 0x8
INTERP 0x0000000000000304 0x0000000000000304 0x0000000000000304
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000768 0x0000000000000768 R E 0x1000
LOAD 0x0000000000000e10 0x0000000000001e10 0x0000000000001e10
0x0000000000000200 0x0000000000000208 RW 0x1000
>
> With -z noseparate-code, I got
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flg Align
> PHDR 0x000040 0x0000000000000040 0x0000000000000040
> 0x0002a0 0x0002a0 R 0x8
> INTERP 0x000304 0x0000000000000304 0x0000000000000304
> 0x00001c 0x00001c R 0x1
> [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
> LOAD 0x000000 0x0000000000000000 0x0000000000000000
> 0x000858 0x000858 R E 0x1000
> LOAD 0x000d98 0x0000000000001d98 0x0000000000001d98
> 0x000274 0x000278 RW 0x1000
>
> Please remind me of the benefits of -z noseparate-code. If ld.so maps
>
> LOAD 0x000000 0x0000000000000000 0x0000000000000000
> 0x0004a1 0x0004a1 R E 0x1000
> LOAD 0x001000 0x0000000000001000 0x0000000000001000
> 0x0003f0 0x0003f0 R 0x1000
>
> in the same huge page with RW permission, will we get the same benefits?
One benefit of -z noseparate-code is that the executable PT_LOAD ends
up at virtual address 0 with file offset 0. For huge page mappings,
both the virtual address and the file offset need to be aligned to the
huge page size, e.g.
((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
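The condition above can be written as a small C predicate over the glibc `<elf.h>` program header type. This is a sketch assuming 2 MiB huge pages; `phdr_huge_aligned` is a hypothetical name, not an existing ld.so or kernel function.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD-sized huge pages (x86-64 with 4 KiB base pages). */
#define HUGE_PAGE_SIZE ((uint64_t)2 << 20)

/* Returns true when ph is a PT_LOAD whose p_vaddr and p_offset are both
   multiples of HUGE_PAGE_SIZE, i.e. the alignment check quoted above. */
bool phdr_huge_aligned(const Elf64_Phdr *ph)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((HUGE_PAGE_SIZE & (HUGE_PAGE_SIZE - 1)) == 0);
    return ph->p_type == PT_LOAD &&
           ((ph->p_vaddr | ph->p_offset) & (HUGE_PAGE_SIZE - 1)) == 0;
}
```

With -z noseparate-code the executable PT_LOAD (offset 0, vaddr 0) passes; in the default -z separate-code layout the RE segment starts at offset and vaddr 0x1000 and fails.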
Are you suggesting that, with the -z separate-code layout, ld.so would
ignore the non-executable flag of the first PT_LOAD and merge it with
the second PT_LOAD, mapping them together as read-only executable?
That might help with existing binaries, but wouldn't it blur the
permission boundaries between segments?
Thanks,
Rui
On Mon, Mar 16, 2026 at 10:03 AM hev <r@hev.cc> wrote:
>
> Huh? Why is the layout you get different from mine? With -z
> separate-code, the read-only segment should normally come first, so
> that the ELF header and program headers fall into that segment and
> remain read-only and non-executable for security reasons.
It is because my binutils was configured with --enable-rosegment.
> With -z separate-code, I got
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
> 0x0000000000000310 0x0000000000000310 R 0x8
> INTERP 0x0000000000000374 0x0000000000000374 0x0000000000000374
> 0x000000000000001c 0x000000000000001c R 0x1
> [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x00000000000005c8 0x00000000000005c8 R 0x1000
> LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
> 0x0000000000000139 0x0000000000000139 R E 0x1000
> LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
> 0x00000000000000c8 0x00000000000000c8 R 0x1000
> LOAD 0x0000000000002e10 0x0000000000003e10 0x0000000000003e10
> 0x0000000000000200 0x0000000000000208 RW 0x1000
>
> With -z noseparate-code, I got
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
> 0x00000000000002a0 0x00000000000002a0 R 0x8
> INTERP 0x0000000000000304 0x0000000000000304 0x0000000000000304
> 0x000000000000001c 0x000000000000001c R 0x1
> [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000000768 0x0000000000000768 R E 0x1000
> LOAD 0x0000000000000e10 0x0000000000001e10 0x0000000000001e10
> 0x0000000000000200 0x0000000000000208 RW 0x1000
>
> >
> > With -z noseparate-code, I got
> >
> > Program Headers:
> > Type Offset VirtAddr PhysAddr
> > FileSiz MemSiz Flg Align
> > PHDR 0x000040 0x0000000000000040 0x0000000000000040
> > 0x0002a0 0x0002a0 R 0x8
> > INTERP 0x000304 0x0000000000000304 0x0000000000000304
> > 0x00001c 0x00001c R 0x1
> > [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
> > LOAD 0x000000 0x0000000000000000 0x0000000000000000
> > 0x000858 0x000858 R E 0x1000
> > LOAD 0x000d98 0x0000000000001d98 0x0000000000001d98
> > 0x000274 0x000278 RW 0x1000
> >
> > Please remind me of the benefits of -z noseparate-code. If ld.so maps
> >
> > LOAD 0x000000 0x0000000000000000 0x0000000000000000
> > 0x0004a1 0x0004a1 R E 0x1000
> > LOAD 0x001000 0x0000000000001000 0x0000000000001000
> > 0x0003f0 0x0003f0 R 0x1000
> >
> > in the same huge page with RW permission, will we get the same benefits?
>
> One benefit of -z noseparate-code is that the executable PT_LOAD ends
> up at virtual address 0 with file offset 0. For huge page mappings,
> both the virtual address and the file offset need to be aligned to the
> huge page size, e.g.
>
> ((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
>
> Are you suggesting that, with the -z separate-code layout, ld.so would
> ignore the non-executable flag of the first PT_LOAD and merge it with
> the second PT_LOAD, mapping them together as read-only executable?
> That might help with existing binaries, but wouldn't it blur the
> permission boundaries between segments?
>
That is correct. It should have a similar effect to -z noseparate-code.
The difference is link time vs. run time. Of course, ld.so should do it only
when huge pages are in use.
On Tue, Mar 17, 2026 at 5:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> > One benefit of -z noseparate-code is that the executable PT_LOAD ends
> > up at virtual address 0 with file offset 0. For huge page mappings,
> > both the virtual address and the file offset need to be aligned to the
> > huge page size, e.g.
> >
> > ((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
> >
> > Are you suggesting that, with the -z separate-code layout, ld.so would
> > ignore the non-executable flag of the first PT_LOAD and merge it with
> > the second PT_LOAD, mapping them together as read-only executable?
> > That might help with existing binaries, but wouldn't it blur the
> > permission boundaries between segments?
> >
>
> That is correct. It should have the similar effect as -z noseparate-code.
> The difference is link-time vs run-time. Of course, ld.so should do it only
> when huge page is in use.
Having ld.so transparently merge the mappings doesn't always seem
workable. For example, in Chromium the RE segment's virtual address
and file offset differ (0x14d4000 vs. 0x14d3000), while the preceding
R segment has equal ones, so a single merged mapping covering both
cannot be constructed.
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000014d2b2c 0x00000000014d2b2c R 0x1000
LOAD 0x00000000014d3000 0x00000000014d4000 0x00000000014d4000
0x000000000fb7a420 0x000000000fb7a420 R E 0x1000
LOAD 0x000000001104d420 0x000000001104f420 0x000000001104f420
0x0000000000ac0f58 0x0000000000ac1be0 RW 0x1000
LOAD 0x0000000011b0f000 0x0000000011b12000 0x0000000011b12000
0x0000000000146001 0x0000000000428d34 RW 0x1000
Thanks,
Rui
On Mon, Mar 16, 2026 at 5:27 PM hev <r@hev.cc> wrote:
>
> Having ld.so transparently merge the mappings doesn't always seem
> workable. For example, in chromium the RE segment has different
> virtual addresses and file offsets, so we can't construct a correct
> mapping when attempting to merge them.
>
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x00000000014d2b2c 0x00000000014d2b2c R 0x1000
> LOAD 0x00000000014d3000 0x00000000014d4000 0x00000000014d4000
> 0x000000000fb7a420 0x000000000fb7a420 R E 0x1000
> LOAD 0x000000001104d420 0x000000001104f420 0x000000001104f420
> 0x0000000000ac0f58 0x0000000000ac1be0 RW 0x1000
> LOAD 0x0000000011b0f000 0x0000000011b12000 0x0000000011b12000
> 0x0000000000146001 0x0000000000428d34 RW 0x1000
>
This layout won't work. It can be an option for compatible layouts. For
ld on Linux/x86, we can default to --rosegment.
On Tue, Mar 17, 2026 at 9:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 16, 2026 at 5:27 PM hev <r@hev.cc> wrote:
> >
> > On Tue, Mar 17, 2026 at 5:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > > One benefit of -z noseparate-code is that the executable PT_LOAD ends
> > > > up at virtual address 0 with file offset 0. For huge page mappings,
> > > > both the virtual address and the file offset need to be aligned to the
> > > > huge page size, e.g.
> > > >
> > > > ((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
> > > >
> > > > Are you suggesting that, with the -z separate-code layout, ld.so would
> > > > ignore the non-executable flag of the first PT_LOAD and merge it with
> > > > the second PT_LOAD, mapping them together as read-only executable?
> > > > That might help with existing binaries, but wouldn't it blur the
> > > > permission boundaries between segments?
> > > >
> > >
> > > That is correct. It should have the similar effect as -z noseparate-code.
> > > The difference is link-time vs run-time. Of course, ld.so should do it only
> > > when huge page is in use.
> >
> > Having ld.so transparently merge the mappings doesn't always seem
> > workable. For example, in chromium the RE segment has different
> > virtual addresses and file offsets, so we can't construct a correct
> > mapping when attempting to merge them.
> >
> > LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> > 0x00000000014d2b2c 0x00000000014d2b2c R 0x1000
> > LOAD 0x00000000014d3000 0x00000000014d4000 0x00000000014d4000
> > 0x000000000fb7a420 0x000000000fb7a420 R E 0x1000
> > LOAD 0x000000001104d420 0x000000001104f420 0x000000001104f420
> > 0x0000000000ac0f58 0x0000000000ac1be0 RW 0x1000
> > LOAD 0x0000000011b0f000 0x0000000011b12000 0x0000000011b12000
> > 0x0000000000146001 0x0000000000428d34 RW 0x1000
> >
>
> This layout won't work. It can be an option for compatible layouts. For
> ld on Linux/x86, we can default to --rosegment.
With -z separate-code + --rosegment, the layout appears as RE, R, and
RW, which also looks reasonable.
With -z noseparate-code + --no-rosegment, the RE and R segments are
merged, resulting in RE and RW.
One advantage of the latter is that read-only data can also benefit
from huge page mappings. In addition, the merged segment has a better
chance of being mapped with huge pages. Since a PMD-sized huge page is
2 MB, a segment shorter than that cannot use it. Merging segments
increases the total size, making PMD usage more likely.
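The size argument can be quantified by counting how many complete 2 MiB-aligned regions a segment's address range contains, since only those could be backed by PMD-sized pages. A sketch under the same 2 MiB assumption; `full_huge_pages` is a hypothetical helper, and it only looks at virtual addresses, ignoring the matching file-offset requirement.

```c
#include <assert.h>
#include <stdint.h>

#define HUGE_PAGE_SIZE ((uint64_t)2 << 20) /* assumed 2 MiB PMD size */

/* Number of complete HUGE_PAGE_SIZE-aligned regions inside the virtual
   range [vaddr, vaddr + memsz).  Only those regions could be backed by
   PMD-sized pages; the unaligned head and tail need base pages. */
uint64_t full_huge_pages(uint64_t vaddr, uint64_t memsz)
{
    assert(memsz <= UINT64_MAX - vaddr); /* range must not wrap around */
    uint64_t start = (vaddr + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1); /* round up */
    uint64_t end = (vaddr + memsz) & ~(HUGE_PAGE_SIZE - 1);                /* round down */
    return end > start ? (end - start) / HUGE_PAGE_SIZE : 0;
}
```

For a 0x858-byte RE segment like the one in the small test binaries it returns 0, while the Chromium RE segment quoted earlier (p_vaddr 0x14d4000, p_memsz 0xfb7a420) contains 125 such regions.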
Thanks,
Rui
On Mon, Mar 16, 2026 at 7:11 PM hev <r@hev.cc> wrote:
>
> With -z separate-code + --rosegment, the layout appears as RE, R, and
> RW, which also looks reasonable.
I think ld.so can map RE + R to RE when huge page is enabled if it
is shown to improve performance.
> With -z noseparate-code + --no-rosegment, the RE and R segments are
> merged, resulting in RE and RW.
>
> One advantage of the latter is that read-only data can also benefit
> from huge page mappings. In addition, the merged segment has a better
> chance of being mapped with huge pages. Since a PMD-sized huge page is
> 2 MB, a segment shorter than that cannot use it. Merging segments
> increases the total size, making PMD usage more likely.
>
> Thanks,
> Rui
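The alignment condition quoted in this message can be applied directly to program headers. The sketch below uses the `Elf64_Phdr` type and the `PT_LOAD`/`PF_X` constants from glibc's `<elf.h>`. The helper names are hypothetical. For the Chromium RE segment quoted above (offset 0x14d3000, vaddr 0x14d4000), the check passes for 4 KiB pages but fails for 2 MiB pages. An RE segment at offset 0 and vaddr 0 passes for any power-of-two huge page size.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* The condition from the thread: both p_vaddr and p_offset must be
 * multiples of the huge page size for the segment start to qualify. */
static bool phdr_huge_aligned(const Elf64_Phdr *ph, Elf64_Xword hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0); /* power of two */
    return ((ph->p_vaddr | ph->p_offset) & (hpage - 1)) == 0;
}

/* Hypothetical helper: first executable PT_LOAD, or NULL if none. */
static const Elf64_Phdr *first_exec_load(const Elf64_Phdr *phdrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (phdrs[i].p_type == PT_LOAD && (phdrs[i].p_flags & PF_X))
            return &phdrs[i];
    return NULL;
}
```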
On Tue, Mar 17, 2026 at 10:15 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 16, 2026 at 7:11 PM hev <r@hev.cc> wrote:
> >
> > On Tue, Mar 17, 2026 at 9:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, Mar 16, 2026 at 5:27 PM hev <r@hev.cc> wrote:
> > > >
> > > > On Tue, Mar 17, 2026 at 5:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > > One benefit of -z noseparate-code is that the executable PT_LOAD ends
> > > > > > up at virtual address 0 with file offset 0. For huge page mappings,
> > > > > > both the virtual address and the file offset need to be aligned to the
> > > > > > huge page size, e.g.
> > > > > >
> > > > > > ((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
> > > > > >
> > > > > > Are you suggesting that, with the -z separate-code layout, ld.so would
> > > > > > ignore the non-executable flag of the first PT_LOAD and merge it with
> > > > > > the second PT_LOAD, mapping them together as read-only executable?
> > > > > > That might help with existing binaries, but wouldn't it blur the
> > > > > > permission boundaries between segments?
> > > > > >
> > > > >
> > > > > That is correct. It should have the similar effect as -z noseparate-code.
> > > > > The difference is link-time vs run-time. Of course, ld.so should do it only
> > > > > when huge page is in use.
> > > >
> > > > Having ld.so transparently merge the mappings doesn't always seem
> > > > workable. For example, in chromium the RE segment has different
> > > > virtual addresses and file offsets, so we can't construct a correct
> > > > mapping when attempting to merge them.
> > > >
> > > > LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> > > > 0x00000000014d2b2c 0x00000000014d2b2c R 0x1000
> > > > LOAD 0x00000000014d3000 0x00000000014d4000 0x00000000014d4000
> > > > 0x000000000fb7a420 0x000000000fb7a420 R E 0x1000
> > > > LOAD 0x000000001104d420 0x000000001104f420 0x000000001104f420
> > > > 0x0000000000ac0f58 0x0000000000ac1be0 RW 0x1000
> > > > LOAD 0x0000000011b0f000 0x0000000011b12000 0x0000000011b12000
> > > > 0x0000000000146001 0x0000000000428d34 RW 0x1000
> > > >
> > >
> > > This layout won't work. It can be an option for compatible layouts. For
> > > ld on Linux/x86, we can default to --rosegment.
> >
> > With -z separate-code + --rosegment, the layout appears as RE, R, and
> > RW, which also looks reasonable.
>
> I think ld.so can map RE + R to RE when huge page is enabled if it
> is shown to improve performance.
Yes, this works if the virtual addresses and file offsets line up.
When the segment p_align matches the base page size, merging them in
ld.so doesn't waste physical memory.
However, on systems that support multiple page sizes, such as AArch64
and LoongArch64, p_align is typically 64 KB. That means there is
padding between the RE and R segments. With normal page mappings those
pages aren't populated, but if they are merged into a huge-page
mapping, the entire huge page would end up being backed by physical
memory.
If ld.so goes down this path, it probably needs to work as a general
mechanism rather than something tailored for x86, which introduces
another trade-off.
Thanks,
Rui
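Here is a minimal sketch of the "line up" condition discussed in this message. The helper names are hypothetical. Two consecutive PT_LOADs can be covered by a single file mapping only if they share the same p_vaddr - p_offset delta. The virtual gap between them is the padding that, per the message above, a merged huge-page mapping would also end up backing. In the Chromium layout quoted earlier, the R and RE deltas are 0 and 0x1000, so the check fails.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/* Hypothetical check: can consecutive PT_LOADs a and b (a below b) share
 * one file mapping? They need the same vaddr-to-offset delta and must
 * not overlap in memory. */
static bool loads_mergeable(const Elf64_Phdr *a, const Elf64_Phdr *b)
{
    return a->p_vaddr - a->p_offset == b->p_vaddr - b->p_offset
        && b->p_vaddr >= a->p_vaddr + a->p_memsz;
}

/* Bytes of virtual address space between the end of a and the start of
 * b: the padding that stays unpopulated with separate base-page mappings
 * but would be backed if a and b were folded into one huge page. */
static Elf64_Addr gap_between(const Elf64_Phdr *a, const Elf64_Phdr *b)
{
    Elf64_Addr end = a->p_vaddr + a->p_memsz;
    return b->p_vaddr > end ? b->p_vaddr - end : 0;
}
```

Consider a 0x341-byte RE segment at 0, followed by an R segment at 0x1000 with the same delta. The sketch reports this pair as mergeable, with a 0xcbf-byte gap.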
* H. J. Lu:
> On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
>>
>> Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
>> -z separate-code, mainly to avoid mixing code and data in the same segment
>> for better cache performance, and enabled it by default for Linux/x86.
>
> -z separate-code reduces cache pollution and
64 byte section alignment would achieve the same thing, I believe.
But maybe there's impact on TLB slots? But then separate-code seems to
make page coalescing more difficult.
> improves security by avoiding putting read-only data in the same page
> with code.
I think we've moved on from gadget avoidance as a security hardening
approach, so I'm not sure that's relevant anymore.
> Should ld.so map read-only data segment plus code segment into the same huge
> page? This should have the similar effect as -z noseparate-code.
Would it? Lots of objects have segments smaller than 2 MiB.
Thanks,
Florian
On Tue, Mar 17, 2026 at 5:01 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
> >>
> >> Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> >> -z separate-code, mainly to avoid mixing code and data in the same segment
> >> for better cache performance, and enabled it by default for Linux/x86.
> >
> > -z separate-code reduces cache pollution and
>
> 64 byte section alignment would achieve the same thing, I believe.
This avoids cacheline pollution, not cache pollution. icache can still
contain read-only data and dcache can contain code.
> But maybe there's impact on TLB slots? But then separate-code seems to
> make page coalescing more difficult.
> > improves security by avoiding putting read-only data in the same page
> > with code.
>
> I think we've moved on from gadget avoidance as a security hardening
> approach, so I'm not sure that's relevant anymore.
So gadgets aren't a security issue anymore?
> > Should ld.so map read-only data segment plus code segment into the same huge
> > page? This should have the similar effect as -z noseparate-code.
>
> Would it? Lots of objects have segments smaller than 2 MiB.
Does THP help their performance? Won't -z noseparate-code be good for them?
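The distinction between cache-line sharing and page-level sharing can be shown with a tiny C helper. The name and the addresses in the example are hypothetical. Suppose the last code byte is at 0x1160 and read-only data starts at the next 64-byte boundary, 0x1180. The two never share a cache line, but they still share a 4 KiB page, and therefore its TLB entry and its page permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Do addresses a and b fall in the same naturally aligned block of the
 * given power-of-two size (e.g. a 64-byte cache line or a 4 KiB page)? */
static bool same_block(uint64_t a, uint64_t b, uint64_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (a & ~(block - 1)) == (b & ~(block - 1));
}
```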
On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
>
> Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> -z separate-code, mainly to avoid mixing code and data in the same segment
> for better cache performance, and enabled it by default for Linux/x86.
>
> Today, Linux already has working support for mapping executable code with THP,
> which can noticeably reduce iTLB pressure and bring measurable performance
> benefits.
>
> In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> typically place the code PT_LOAD segment first, so both the virtual address and
> file offset are 0. That turns out to be an important prerequisite for THP-aware
> PT_LOAD alignment optimizations [1][2][3], and the data shows clear performance
> wins on both AArch64 and LoongArch64. [4][1]
>
> Linux/x86 is a bit of an outlier here: the default -z separate-code setting
> happens to break this layout. So this patch changes the Linux/x86 default to
> -z noseparate-code.
>
> Benchmark
>
> Machine: AMD Ryzen 9 7950X (x86_64)
> Binutils: 2.46
> GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
> Linux kernel: 6.19.6 with patch [1]
>
> Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
>
> Without patch With patch
> instructions 8,246,133,611,932 8,246,025,137,750
> cpu-cycles 8,001,028,142,928 7,565,925,107,502
> itlb-misses 3,672,158,331 26,821,242
> time elapsed 64.66 s 61.97 s
>
> Instructions are basically unchanged. iTLB misses drop from ~3.67B to
> ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
> cycles and ~4.18% shorter wall time for this workload.
>
> [1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
> [2] https://sourceware.org/pipermail/libc-alpha/2026-March/175776.html
> [3] https://lore.kernel.org/linux-fsdevel/20260310145406.3073394-1-usama.arif@linux.dev
> [4] https://lore.kernel.org/linux-fsdevel/20260314095022.217231-1-r@hev.cc
>
Please try this patch.
Linux is working on mapping executable code with THP (transparent huge
pages), which can noticeably reduce iTLB pressure and bring measurable
performance benefits. Aligning both the virtual address and the file offset
of code pages to the huge page size helps THP-aware PT_LOAD alignment
optimizations. When -z separate-code is used with PIE, we get
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x00404 0x00404 R 0x1000
LOAD 0x001000 0x00001000 0x00001000 0x00161 0x00161 R E 0x1000
LOAD 0x002000 0x00002000 0x00002000 0x00114 0x00114 R 0x1000
LOAD 0x002eac 0x00003eac 0x00003eac 0x00160 0x00164 RW 0x1000
where the virtual address and file offset of the code pages aren't aligned
to the huge page size, which is greater than 4KB. The ld manual documents
--rosegment as follows:
‘--rosegment’
‘--no-rosegment’
Attempt to ensure that only a single read-only, non-code segment is
created. Only useful when used in conjunction with the ‘-z
separate-code’ option. The resulting binaries should be smaller
than if ‘-z separate-code’ is used on its own. Without this
option, or if ‘--no-rosegment’ is specified, the ‘-z separate-code’
option will create two read-only segments, one before the code
segment and one after it.
When --rosegment is also used, we get
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x00341 0x00341 R E 0x1000
LOAD 0x001000 0x00001000 0x00001000 0x00320 0x00320 R 0x1000
LOAD 0x001eac 0x00002eac 0x00002eac 0x00160 0x00164 RW 0x1000
where the virtual address and file offset of the code pages are aligned
to the huge page size.
* configure.tgt: Enable --rosegment for Linux/x86 if
-z separate-code is enabled.
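A program can inspect its own program headers at run time to see which layout a given toolchain actually produced. This sketch uses glibc's `dl_iterate_phdr` from `<link.h>`. It is a GNU extension, hence `_GNU_SOURCE`. The function reports the main program first. The struct and function names are hypothetical. The sketch applies the `(p_vaddr | p_offset)` test to the first executable PT_LOAD. It only checks link-time values; the runtime address also depends on the load bias.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdbool.h>
#include <stddef.h>

struct exec_check {
    ElfW(Xword) hpage; /* huge page size to test against, e.g. 2 MiB */
    bool found;        /* an executable PT_LOAD was seen */
    bool aligned;      /* its p_vaddr and p_offset are both hpage-aligned */
};

/* Callback for dl_iterate_phdr: glibc reports the main program first,
 * so inspect only that object and then stop the iteration. */
static int check_main_exec(struct dl_phdr_info *info, size_t size, void *data)
{
    struct exec_check *c = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X)) {
            c->found = true;
            c->aligned = ((ph->p_vaddr | ph->p_offset) & (c->hpage - 1)) == 0;
            break;
        }
    }
    return 1; /* a nonzero return ends dl_iterate_phdr early */
}

/* Check the running executable's first RE segment against hpage. */
static struct exec_check check_self(ElfW(Xword) hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    struct exec_check c = { .hpage = hpage, .found = false, .aligned = false };
    dl_iterate_phdr(check_main_exec, &c);
    return c;
}
```

Printing `check_self(0x200000).aligned` from the same program, linked once with and once without -z separate-code, should distinguish an RE segment at offset 0x1000 from one at offset 0.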
On Mon, Mar 16, 2026 at 7:35 PM hev <r@hev.cc> wrote:
>
> On Tue, Mar 17, 2026 at 10:15 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Mar 16, 2026 at 7:11 PM hev <r@hev.cc> wrote:
> > >
> > > On Tue, Mar 17, 2026 at 9:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 16, 2026 at 5:27 PM hev <r@hev.cc> wrote:
> > > > >
> > > > > On Tue, Mar 17, 2026 at 5:47 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > > >
> > > > > > > One benefit of -z noseparate-code is that the executable PT_LOAD ends
> > > > > > > up at virtual address 0 with file offset 0. For huge page mappings,
> > > > > > > both the virtual address and the file offset need to be aligned to the
> > > > > > > huge page size, e.g.
> > > > > > >
> > > > > > > ((p_vaddr | p_offset) & (HUGE_PAGE_SIZE - 1)) == 0
> > > > > > >
> > > > > > > Are you suggesting that, with the -z separate-code layout, ld.so would
> > > > > > > ignore the non-executable flag of the first PT_LOAD and merge it with
> > > > > > > the second PT_LOAD, mapping them together as read-only executable?
> > > > > > > That might help with existing binaries, but wouldn't it blur the
> > > > > > > permission boundaries between segments?
> > > > > > >
> > > > > >
> > > > > > That is correct. It should have the similar effect as -z noseparate-code.
> > > > > > The difference is link-time vs run-time. Of course, ld.so should do it only
> > > > > > when huge page is in use.
> > > > >
> > > > > Having ld.so transparently merge the mappings doesn't always seem
> > > > > workable. For example, in chromium the RE segment has different
> > > > > virtual addresses and file offsets, so we can't construct a correct
> > > > > mapping when attempting to merge them.
> > > > >
> > > > > LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> > > > > 0x00000000014d2b2c 0x00000000014d2b2c R 0x1000
> > > > > LOAD 0x00000000014d3000 0x00000000014d4000 0x00000000014d4000
> > > > > 0x000000000fb7a420 0x000000000fb7a420 R E 0x1000
> > > > > LOAD 0x000000001104d420 0x000000001104f420 0x000000001104f420
> > > > > 0x0000000000ac0f58 0x0000000000ac1be0 RW 0x1000
> > > > > LOAD 0x0000000011b0f000 0x0000000011b12000 0x0000000011b12000
> > > > > 0x0000000000146001 0x0000000000428d34 RW 0x1000
> > > > >
> > > >
> > > > This layout won't work. It can be an option for compatible layouts. For
> > > > ld on Linux/x86, we can default to --rosegment.
> > >
> > > With -z separate-code + --rosegment, the layout appears as RE, R, and
> > > RW, which also looks reasonable.
> >
> > I think ld.so can map RE + R to RE when huge page is enabled if it
> > is shown to improve performance.
>
> Yes, this works if the virtual addresses and file offsets line up.
> When the segment p_align matches the base page size, merging them in
> ld.so doesn't waste physical memory.
>
> However, on systems that support multiple page sizes, such as AArch64
> and LoongArch64, p_align is typically 64 KB. That means there is
> padding between the RE and R segments. With normal page mappings those
> pages aren't populated, but if they are merged into a huge-page
> mapping, the entire huge page would end up being backed by physical
> memory.
ld.so should check if merging of RE and R segments is possible and
beneficial. Also merging should be tunable.
>
> If ld.so goes down this path, it probably needs to work as a general
> mechanism rather than something tailored for x86, which introduces
> another trade-off.
>
True.
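The sketch below shows one way to express "possible and beneficial" plus a tunable. It is a hypothetical policy, not anything ld.so implements today. An RE segment is merged with the R segment that follows it only when three conditions hold. First, one file mapping must cover both with huge-page-aligned offsets. Second, the padding between them must be within a tunable limit. Third, the merged span must contain more whole huge pages than the two segments do separately.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Whole, hpage-aligned huge pages inside the range [start, end). */
static uint64_t aligned_pages(uint64_t start, uint64_t end, uint64_t hpage)
{
    uint64_t first = (start + hpage - 1) & ~(hpage - 1);
    uint64_t last = end & ~(hpage - 1);
    return last > first ? (last - first) / hpage : 0;
}

/* Hypothetical policy for merging an RE segment (a) with the following R
 * segment (b); max_gap is the tunable limit on padding that would get
 * backed by memory once both live in one huge-page mapping. */
static bool should_merge(const Elf64_Phdr *a, const Elf64_Phdr *b,
                         uint64_t hpage, uint64_t max_gap)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    uint64_t delta = a->p_vaddr - a->p_offset;
    /* Possible: one mapping covers both, with huge-page-aligned offsets. */
    if (delta != b->p_vaddr - b->p_offset || (delta & (hpage - 1)) != 0)
        return false;
    uint64_t a_end = a->p_vaddr + a->p_memsz;
    uint64_t b_end = b->p_vaddr + b->p_memsz;
    if (b->p_vaddr < a_end || b->p_vaddr - a_end > max_gap)
        return false;
    /* Beneficial: merging must gain at least one whole huge page. */
    uint64_t separate = aligned_pages(a->p_vaddr, a_end, hpage)
                      + aligned_pages(b->p_vaddr, b_end, hpage);
    return aligned_pages(a->p_vaddr, b_end, hpage) > separate;
}
```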
* H. J. Lu:
> On Tue, Mar 17, 2026 at 5:01 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
>> >>
>> >> Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
>> >> -z separate-code, mainly to avoid mixing code and data in the same segment
>> >> for better cache performance, and enabled it by default for Linux/x86.
>> >
>> > -z separate-code reduces cache pollution and
>>
>> 64 byte section alignment would achieve the same thing, I believe.
>
> This avoids cacheline pollution, not cache pollution. icache can still
> contain read-only data and dcache can contain code.
Sorry, I still don't get it.
>> But maybe there's impact on TLB slots? But then separate-code seems to
>> make page coalescing more difficult.
>
>> > improves security by avoiding putting read-only data in the same page
>> > with code.
>>
>> I think we've moved on from gadget avoidance as a security hardening
>> approach, so I'm not sure that's relevant anymore.
>
> So gadgets aren't a security issue anymore?
No, I don't think so.
>> > Should ld.so map read-only data segment plus code segment into the same huge
>> > page? This should have the similar effect as -z noseparate-code.
>>
>> Would it? Lots of objects have segments smaller than 2 MiB.
>
> Does THP help their performance? Won't -z noseparate-code be good for them?
I think you need fairly recent kernels for THP usage with non-anonymous
memory. THP should help. -z noseparate-code may lead to more page
coalescing opportunities because there are fewer and larger load
segments.
Thanks,
Florian
On Wed, Mar 18, 2026 at 12:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 16, 2026 at 7:20 AM WANG Rui <r@hev.cc> wrote:
> >
> > Years ago, commit f6aec96dce1d ("ld: Add --enable-separate-code") introduced
> > -z separate-code, mainly to avoid mixing code and data in the same segment
> > for better cache performance, and enabled it by default for Linux/x86.
> >
> > Today, Linux already has working support for mapping executable code with THP,
> > which can noticeably reduce iTLB pressure and bring measurable performance
> > benefits.
> >
> > In practice, I observed that on AArch64 and LoongArch64 distros, ELF executables
> > typically place the code PT_LOAD segment first, so both the virtual address and
> > file offset are 0. That turns out to be an important prerequisite for THP-aware
> > PT_LOAD alignment optimizations [1][2][3], and the data shows clear performance
> > wins on both AArch64 and LoongArch64. [4][1]
> >
> > Linux/x86 is a bit of an outlier here: the default -z separate-code setting
> > happens to break this layout. So this patch changes the Linux/x86 default to
> > -z noseparate-code.
> >
> > Benchmark
> >
> > Machine: AMD Ryzen 9 7950X (x86_64)
> > Binutils: 2.46
> > GCC: 15.2.1 (built with -z,noseparate-code + --enable-host-pie)
> > Linux kernel: 6.19.6 with patch [1]
> >
> > Workload: building Linux v7.0-rc1 vmlinux with x86_64_defconfig.
> >
> > Without patch With patch
> > instructions 8,246,133,611,932 8,246,025,137,750
> > cpu-cycles 8,001,028,142,928 7,565,925,107,502
> > itlb-misses 3,672,158,331 26,821,242
> > time elapsed 64.66 s 61.97 s
> >
> > Instructions are basically unchanged. iTLB misses drop from ~3.67B to
> > ~26M (~99.27% reduction), which results in about a ~5.44% reduction in
> > cycles and ~4.18% shorter wall time for this workload.
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc
> > [2] https://sourceware.org/pipermail/libc-alpha/2026-March/175776.html
> > [3] https://lore.kernel.org/linux-fsdevel/20260310145406.3073394-1-usama.arif@linux.dev
> > [4] https://lore.kernel.org/linux-fsdevel/20260314095022.217231-1-r@hev.cc
> >
>
> Please try this patch.
>
> Linux is working on mapping executable code with THP, transparent huge
> page, which can noticeably reduce iTLB pressure and bring measurable
> performance benefits. Aligning both virtual address and file offset of
> code pages at the huge page size can help THP-aware PT_LOAD alignment
> optimizations. When -z separate-code is used with PIE, we get
>
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> LOAD 0x000000 0x00000000 0x00000000 0x00404 0x00404 R 0x1000
> LOAD 0x001000 0x00001000 0x00001000 0x00161 0x00161 R E 0x1000
> LOAD 0x002000 0x00002000 0x00002000 0x00114 0x00114 R 0x1000
> LOAD 0x002eac 0x00003eac 0x00003eac 0x00160 0x00164 RW 0x1000
>
> where virtual address and file offset of code pages aren't aligned at the
> huge page size which is greater than 4KB. When --rosegment
>
> ‘--rosegment’
> ‘--no-rosegment’
> Attempt to ensure that only a single read-only, non-code segment is
> created. Only useful when used in conjunction with the ‘-z
> separate-code’ option. The resulting binaries should be smaller
> than if ‘-z separate-code’ is used on its own. Without this
> option, or if ‘--no-rosegment’ is specified, the ‘-z separate-code’
> option will create two read-only segments, one before the code
> segment and one after it.
>
> is also used, we got
>
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> LOAD 0x000000 0x00000000 0x00000000 0x00341 0x00341 R E 0x1000
> LOAD 0x001000 0x00001000 0x00001000 0x00320 0x00320 R 0x1000
> LOAD 0x001eac 0x00002eac 0x00002eac 0x00160 0x00164 RW 0x1000
>
> where virtual address and file offset of code pages are aligned at the
> huge page size.
This patch looks acceptable to me, and it’s good to see progress here.
That said, in some cases, -z noseparate-code can be more helpful for
THP. I explained the reasoning in another thread. [1][2]
Also, I’m a bit curious: GNU ld with --rosegment doesn’t seem to
arrange segments as R, RE, RW. From a security perspective, it would
make more sense to put R first so the ELF header is not executable.
LLVM lld with --rosegment does this by default (R, RE, RW).
I wonder whether relying on the current GNU ld --rosegment layout is
entirely stable: if it ever shifts towards a more security-driven
layout, it might not behave the way we expect here.
[1] https://sourceware.org/pipermail/binutils/2026-March/148605.html
[2] https://sourceware.org/pipermail/binutils/2026-March/148607.html
Thanks,
Rui
Hello,
On Wed, 18 Mar 2026, hev wrote:
> This patch looks acceptable to me, and it’s good to see progress here.
> That said, in some cases, -z noseparate-code can be more helpful for
> THP. I explained the reasoning in another thread. [1][2]
>
> Also, I’m a bit curious: GNU ld with --rosegment doesn’t seem to
> arrange segments as R, RE, RW. From a security perspective, it would
> make more sense to put R first so the ELF header is not executable.
If we'd continue to care for gadget avoidance, which Florian tells us is
not a thing anymore (yippie! :) ).
> LLVM lld with --rosegment does this by default (R, RE, RW).
But your point in these mails is to align the E part to huge-page borders.
Putting a read-only blob in front of it defeats that purpose.
> I wonder whether relying on the current GNU ld --rosegment layout is
> entirely stable: if it ever shifts towards a more security-driven
> layout, it might not behave the way we expect here.
Hmm? Your proposal even was to completely merge the read-only and exec
segment into one. You cannot at the same time care and not care about the
security implications of merging readonly data and code :)
Ciao,
Michael.
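Michael's point can be quantified with a short C sketch. The helper name is hypothetical. Suppose a read-only segment of r_size bytes starts at file offset 0 and the RE segment follows it. The RE segment can then only meet the huge-page alignment condition if the file is padded up to the next huge page boundary. With RE first, no padding is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: file padding needed after a leading read-only
 * segment of r_size bytes (starting at offset 0) so that the following
 * RE segment starts on an hpage boundary. */
static uint64_t pad_before_code(uint64_t r_size, uint64_t hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0); /* power of two */
    uint64_t aligned = (r_size + hpage - 1) & ~(hpage - 1);
    return aligned - r_size;
}
```

For a 0x404-byte R segment, reaching a 2 MiB boundary costs 0x1ffbfc bytes of padding. Reaching a 4 KiB boundary costs 0xbfc bytes.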
Hello,
On Wed, Mar 18, 2026 at 11:01 PM Michael Matz <matz@suse.de> wrote:
>
> Hello,
>
> On Wed, 18 Mar 2026, hev wrote:
>
> > This patch looks acceptable to me, and it’s good to see progress here.
> > That said, in some cases, -z noseparate-code can be more helpful for
> > THP. I explained the reasoning in another thread. [1][2]
> >
> > Also, I’m a bit curious: GNU ld with --rosegment doesn’t seem to
> > arrange segments as R, RE, RW. From a security perspective, it would
> > make more sense to put R first so the ELF header is not executable.
>
> If we'd continue to care for gadget avoidance, which Florian tells us is
> not a thing anymore (yippie! :) ).
In practice, GNU ld only enables -z separate-code by default for
Linux/x86. That means most Linux distros on other arches are
effectively not using it. I’m not really sure how much security
benefit this actually brings in practice.
>
> > LLVM lld with --rosegment does this by default (R, RE, RW).
>
> But your point in these mails is to align the E part to huge-page borders.
> Putting a read-only blob in front of it defeats that purpose.
Yeah, that’s exactly why I’m concerned that --rosegment in GNU ld
might eventually move toward a more security-oriented layout like lld
(R, RE, RW).
>
> > I wonder whether relying on the current GNU ld --rosegment layout is
> > entirely stable: if it ever shifts towards a more security-driven
> > layout, it might not behave the way we expect here.
>
> Hmm? Your proposal even was to completely merge the read-only and exec
> segment into one. You cannot at the same time care and not care about the
> security implications of merging readonly data and code :)
From the motivation of this patch, I’m personally leaning toward
performance. If we can’t have both security and performance at the
same time, then maybe the linker should offer both options so users
can choose based on their needs:
* -z separate-code + --rosegment: security-oriented. This would also
allow --rosegment to evolve toward a safer layout (RE, R, RW -> R, RE,
RW), similar to what lld does today.
* -z noseparate-code: performance-oriented, providing a stable (RE, RW) layout.
Thanks,
Rui
On Wed, Mar 18, 2026 at 9:21 AM hev <r@hev.cc> wrote:
>
> Hello,
>
> On Wed, Mar 18, 2026 at 11:01 PM Michael Matz <matz@suse.de> wrote:
> >
> > Hello,
> >
> > On Wed, 18 Mar 2026, hev wrote:
> >
> > > This patch looks acceptable to me, and it’s good to see progress here.
> > > That said, in some cases, -z noseparate-code can be more helpful for
> > > THP. I explained the reasoning in another thread. [1][2]
> > >
> > > Also, I’m a bit curious: GNU ld with --rosegment doesn’t seem to
> > > arrange segments as R, RE, RW. From a security perspective, it would
> > > make more sense to put R first so the ELF header is not executable.
> >
> > If we'd continue to care for gadget avoidance, which Florian tells us is
> > not a thing anymore (yippie! :) ).
>
> In practice, GNU ld only enables -z separate-code by default for
> Linux/x86. That means most Linux distros on other arches are
> effectively not using it. I’m not really sure how much security
> benefit this actually brings in practice.
>
> >
> > > LLVM lld with --rosegment does this by default (R, RE, RW).
> >
> > But your point in these mails is to align the E part to huge-page borders.
> > Putting a read-only blob in front of it defeats that purpose.
>
> Yeah, that’s exactly why I’m concerned that --rosegment in GNU ld
> might eventually move toward a more security-oriented layout like lld
> (R, RE, RW).
>
> >
> > > I wonder whether relying on the current GNU ld --rosegment layout is
> > > entirely stable: if it ever shifts towards a more security-driven
> > > layout, it might not behave the way we expect here.
> >
> > Hmm? Your proposal even was to completely merge the read-only and exec
> > segment into one. You cannot at the same time care and not care about the
> > security implications of merging readonly data and code :)
>
> From the motivation of this patch, I’m personally leaning toward
> performance. If we can’t have both security and performance at the
> same time, then maybe the linker should offer both options so users
> can choose based on their needs:
>
> * -z separate-code + --rosegment: security-oriented. This would also
> allow --rosegment to evolve toward a safer layout (RE, R, RW -> R, RE,
> RW), similar to what lld does today.
> * -z noseparate-code: performance-oriented, providing a stable (RE, RW) layout.
>
> Thanks,
> Rui
For x86, --no-rosegment -z noseparate-code -z max-page-size=2097152 is
the most compact layout that works with transparent hugepages.
Google and ChromeOS have been using such a layout for a long time.
If you use --rosegment to separate R and RX segments, there is a trade-off:
* With -z separate-code, THP works well for the code segment. The
cost is massive file size inflation due to 2MiB padding between the R
and R+X segments.
* With -z noseparate-code, the RX segment starts at a non-aligned
address. Last time I checked, Linux kernel's THP requires both the
file offset and the VMA start to be huge-page-aligned, so the code
segment cannot use THP.
"In -z noseparate-code layouts, the file content starts somewhere at
the first page, potentially wasting half a huge page on unrelated
content." However, the file size saving advantage outweighs this THP
concern.
(https://maskray.me/blog/2023-12-17-exploring-the-section-layout-in-linker-output)
Gadget avoidance has always been more security theater than substance.
It's irrelevant when talking about --[no-]rosegment and -z
[no]separate-code preference.
GNU ld's issues:
* -z separate-code suppresses --rosegment/--no-rosegment. In LLD the
two options are orthogonal.
* The default linker script places read-only sections after RX
sections, which is not ideal
(https://sourceware.org/bugzilla/show_bug.cgi?id=23704).
On Thu, Mar 19, 2026 at 10:06 AM Fangrui Song <i@maskray.me> wrote:
> For x86, --no-rosegment -z noseparate-code -z max-page-size=2097152 is
> the most compact layout that works with transparent hugepages.
> Google and ChromeOS have been using such a layout for a long time.
--no-rosegment -z noseparate-code -z max-page-size=2097152 is somewhat
more compact than --no-rosegment -z separate-code -z
max-page-size=2097152. However, due to alignment, there is still a
sizable gap between the RE and RW segments, which significantly
increases the file size (e.g. hello world: 6K -> 2.1M). This is almost
impractical with 32M huge pages.
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000738 0x0000000000000738 R E 0x200000
LOAD 0x00000000001ffe10 0x00000000003ffe10 0x00000000003ffe10
0x0000000000000200 0x0000000000000208 RW 0x200000
> If you use --rosegment to separate R and RX segments, there is a trade-off:
>
> * With -z separate-code, THP works well for the code segment. The
> cost is massive file size inflation due to 2MiB padding between the R
> and R+X segments.
> * With -z noseparate-code, the RX segment starts at a non-aligned
> address. Last time I checked, Linux kernel's THP requires both the
> file offset and the VMA start to be huge-page-aligned, so the code
> segment cannot use THP.
It doesn’t seem to be the case anymore. With -z noseparate-code, the
RX segment now starts at offset 0, which works for alignment with any
huge page size. That’s also why I suggested putting the RX segment
first.
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000738 0x0000000000000738 R E 0x1000
LOAD 0x0000000000000e10 0x0000000000001e10 0x0000000000001e10
0x0000000000000200 0x0000000000000208 RW 0x1000
> "In -z noseparate-code layouts, the file content starts somewhere at
> the first page, potentially wasting half a huge page on unrelated
> content." However, the file size saving advantage outweighs this THP
> concern.
> (https://maskray.me/blog/2023-12-17-exploring-the-section-layout-in-linker-output)
>
>
> Gadget avoidance has always been more security theater than substance.
Agreed.
Thanks,
Rui
On Wed, Mar 18, 2026 at 7:56 PM hev <r@hev.cc> wrote:
>
> On Thu, Mar 19, 2026 at 10:06 AM Fangrui Song <i@maskray.me> wrote:
> > For x86, --no-rosegment -z noseparate-code -z max-page-size=2097152 is
> > the most compact layout that works with transparent hugepages.
> > Google and ChromeOS have been using such a layout for a long time.
>
> --no-rosegment -z noseparate-code -z max-page-size=2097152 is somewhat
> more compact than --no-rosegment -z separate-code -z
> max-page-size=2097152. However, due to alignment, there is still a
> sizable gap between the RE and RW segments, which significantly
> increases the file size (e.g. hello world: 6K -> 2.1M). This is almost
> impractical with 32M huge pages.
>
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000000738 0x0000000000000738 R E 0x200000
> LOAD 0x00000000001ffe10 0x00000000003ffe10 0x00000000003ffe10
> 0x0000000000000200 0x0000000000000208 RW 0x200000
This is due to the suboptimal PT_GNU_RELRO layout in GNU ld. See
https://sourceware.org/bugzilla/show_bug.cgi?id=30612
mold and ld.lld avoid this issue by introducing the .relro_padding
section. lld's implementation also emulates
DATA_SEGMENT_ALIGN/DATA_RELRO_END.
https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#z-relro
"""
GNU ld uses one RW PT_LOAD program header with padding at the start.
The first half of the PT_LOAD overlaps with PT_GNU_RELRO. The padding
is added so that the end of PT_GNU_RELRO is aligned by max-page-size.
(See ld.bfd --verbose output.) Prior to GNU ld 2.39, the end was
aligned by common-page-size. GNU ld's one RW PT_LOAD layout makes the
alignment increase the file size. max-page-size can be large, such as
65536 for many systems, causing wasted space.
lld utilizes two RW PT_LOAD program headers: one for RELRO sections
and the other for non-RELRO sections. Although this might appear
unusual initially, it eliminates the need for alignment padding as
seen in GNU ld's layout. Key changes:
https://reviews.llvm.org/D58892 switched from
PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .data .bss) to
PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro)) PT_LOAD(.data .bss).
The end of the PT_GNU_RELRO segment and the associated RW PT_LOAD
segment is padded to a common-page-size boundary. The padding section
.relro_padding is similar to mold's. Before LLD 18, layouts where
runtime_page_size < common-page-size did not work.
The layout used by mold is similar to that of lld. In mold's case, the
end of PT_GNU_RELRO is padded to max-page-size by appending a
SHT_NOBITS .relro_padding section. This approach ensures that the last
page of PT_GNU_RELRO is protected, regardless of the system page size.
However, when the system page size is less than max-page-size, the map
from the first RW PT_LOAD is larger than needed.
"""
> > If you use --rosegment to separate R and RX segments, there is a trade-off
> >
> > * With -z separate-code , THP works well for the code segment. The
> > cost is massive file size inflation due to 2MiB padding between the R
> > and R+X segments.
> > * With -z noseparate-code, the RX segment starts at a non-aligned
> > address. Last time I checked, Linux kernel's THP requires both the
> > file offset and the VMA start to be huge-page-aligned, so the code
> > segment cannot use THP.
>
> It doesn’t seem to be the case anymore. With -z noseparate-code, the
> RX segment now starts at offset 0, which works for alignment with any
> huge page size. That’s also why I suggested putting the RX segment
> first.
>
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000000738 0x0000000000000738 R E 0x1000
> LOAD 0x0000000000000e10 0x0000000000001e10 0x0000000000001e10
> 0x0000000000000200 0x0000000000000208 RW 0x1000
I was describing lld's -z noseparate-code --rosegment behavior.
With GNU ld's -z noseparate-code, --rosegment and --no-rosegment don't
make a difference.
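The alignment condition discussed above (file offset and virtual address must line up on huge-page boundaries) can be illustrated with a small model. This is a hedged sketch, not the kernel's logic. It assumes that a PMD-sized huge page can back part of a PT_LOAD only when the vaddr and the file offset are congruent modulo the huge page size and the whole huge page lies inside the segment, and that the load bias is itself a multiple of the huge page size. The function name `thp_pages` and the addresses in the second call are hypothetical.

```shell
#!/bin/sh
# thp_pages VADDR OFFSET SIZE HUGE
# Count how many HUGE-sized, HUGE-aligned pages fit inside a PT_LOAD
# segment, under the assumed rule that VADDR and OFFSET must be congruent
# modulo HUGE for any huge page to be usable.
thp_pages() {
  v=$(( $1 )) o=$(( $2 )) sz=$(( $3 )) h=$(( $4 ))
  # Address and file offset disagree modulo HUGE: no huge page can be used.
  if [ $(( ($v - $o) % $h )) -ne 0 ]; then
    echo 0
    return
  fi
  first=$(( ($v + $h - 1) / $h * $h ))  # first huge boundary in the segment
  last=$(( ($v + $sz) / $h * $h ))      # last huge boundary in the segment
  if [ "$last" -gt "$first" ]; then
    echo $(( ($last - $first) / $h ))
  else
    echo 0
  fi
}

# RX segment at vaddr 0 / offset 0 with 6 MiB of code, 2 MiB huge pages.
thp_pages 0 0 0x600000 0x200000           # prints 3
# Hypothetical RX segment whose vaddr is one 4 KiB page ahead of its offset.
thp_pages 0x15a0 0x5a0 0x600000 0x200000  # prints 0
```

With vaddr = offset = 0 the congruence holds for every huge page size, which is the point hev makes about placing the RX segment first. Whether a segment actually benefits still depends on it spanning at least one full huge page.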
Actually, GNU ld's --rosegment behavior seems broken...
┌───────────────────────────────────┬──────────┬────────────────┐
│ Options                           │ Segments │ Layout         │
├───────────────────────────────────┼──────────┼────────────────┤
│ -z noseparate-code --no-rosegment │ 2        │ R+X, RW        │
├───────────────────────────────────┼──────────┼────────────────┤
│ -z noseparate-code --rosegment    │ 2        │ R+X, RW (same) │
├───────────────────────────────────┼──────────┼────────────────┤
│ -z separate-code --no-rosegment   │ 4        │ R, R+X, R, RW  │
├───────────────────────────────────┼──────────┼────────────────┤
│ -z separate-code --rosegment      │ 3        │ R+X, R, RW     │
└───────────────────────────────────┴──────────┴────────────────┘
I didn't expect -z separate-code --no-rosegment to have more segments
than -z separate-code --rosegment.
> > "In -z noseparate-code layouts, the file content starts somewhere at
> > the first page, potentially wasting half a huge page on unrelated
> > content." However, the file size saving advantage outweighs this THP
> > concern.
> > (https://maskray.me/blog/2023-12-17-exploring-the-section-layout-in-linker-output)
> >
> >
> > Gadget avoidance has always been more security theater than substance.
>
> Agreed.
>
> Thanks,
> Rui
On Thu, Mar 19, 2026 at 11:15 AM Fangrui Song <i@maskray.me> wrote:
>
> This is due to the suboptimal PT_GNU_RELRO layout in GNU ld. See
> https://sourceware.org/bugzilla/show_bug.cgi?id=30612
> mold and ld.lld avoid this issue by introducing the .relro_padding
> section. lld's implementation also emulates
> DATA_SEGMENT_ALIGN/DATA_RELRO_END.
>
> https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#z-relro
>
> """
> GNU ld uses one RW PT_LOAD program header with padding at the start.
> The first half of the PT_LOAD overlaps with PT_GNU_RELRO. The padding
> is added so that the end of PT_GNU_RELRO is aligned by max-page-size.
> (See ld.bfd --verbose output.) Prior to GNU ld 2.39, the end was
> aligned by common-page-size. GNU ld's one RW PT_LOAD layout makes the
> alignment increase the file size. max-page-size can be large, such as
> 65536 for many systems, causing wasted space.
>
> lld utilizes two RW PT_LOAD program headers: one for RELRO sections
> and the other for non-RELRO sections. Although this might appear
> unusual initially, it eliminates the need for alignment padding as
> seen in GNU ld's layout. Key changes:
>
> https://reviews.llvm.org/D58892 switched from
> PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .data .bss) to
> PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro)) PT_LOAD(.data .bss).
> The end of the PT_GNU_RELRO segment and the associated RW PT_LOAD
> segment is padded to a common-page-size boundary. The padding section
> .relro_padding is similar to mold's. Before LLD 18, layouts where
> runtime_page_size < common-page-size did not work.
>
> The layout used by mold is similar to that of lld. In mold's case, the
> end of PT_GNU_RELRO is padded to max-page-size by appending a
> SHT_NOBITS .relro_padding section. This approach ensures that the last
> page of PT_GNU_RELRO is protected, regardless of the system page size.
> However, when the system page size is less than max-page-size, the map
> from the first RW PT_LOAD is larger than needed.
> """
Thanks for the explanation.
With -z separate-code --rosegment, the segments are page-aligned, so
there’s no overlap when mapped.
With the default -z noseparate-code --rosegment, read-only data ends
up in a non-executable segment, but since the layout is packed, the RO
and RX segments aren’t fully separated. From a THP-friendly
perspective, I’m curious what you think about LLD defaulting to
--no-rosegment?
Thanks,
Rui
On Wed, Mar 18, 2026 at 9:11 PM hev <r@hev.cc> wrote:
>
> Thanks for the explanation.
>
> With -z separate-code --rosegment, the segments are page-aligned, so
> there’s no overlap when mapped.
>
> With the default -z noseparate-code --rosegment, read-only data ends
> up in a non-executable segment, but since the layout is packed, the RO
> and RX segments aren’t fully separated. From a THP-friendly
> perspective, I’m curious what you think about LLD defaulting to
> --no-rosegment?
>
> Thanks,
> Rui
No, ld.lld will not switch to --no-rosegment.
* THP preferences are better managed through compiler driver options
rather than changing the fundamental defaults of the linker itself.
* Maintaining separate segments for different permissions has
aesthetic benefits.
* Changing the default will break 475 tests, causing unneeded churn.
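The hunks below drop the Linux/x86 special case from ld's configure logic. A condensed shell model of how the -z separate-code default is resolved before and after the change may help when reviewing them. It assumes `ac_default_ld_z_separate_code` starts out as `unset` (that initialisation is not visible in the diff) and folds the configure.ac and configure.tgt pieces into one function. The function name `resolve_separate_code_default` is hypothetical.

```shell
#!/bin/sh
# Condensed model of ld's configure logic for the -z separate-code default.
# resolve_separate_code_default STATE ENABLEVAL TARGET
#   STATE:     "before" or "after" this patch
#   ENABLEVAL: argument of --enable-separate-code ("yes", "no", or "" if absent)
#   TARGET:    target triplet, e.g. x86_64-pc-linux-gnu
resolve_separate_code_default() {
  state=$1 enableval=$2 target=$3
  # Assumed initial value; its initialisation is outside the quoted diff.
  ac_default_ld_z_separate_code=unset
  case "$enableval" in
    yes) ac_default_ld_z_separate_code=1 ;;
    no) ac_default_ld_z_separate_code=0 ;;
  esac
  # The Linux/x86 block that this patch deletes.
  if [ "$state" = before ]; then
    case "$target" in
      i[3-7]86-*-linux-* | x86_64-*-linux-*)
        if [ "$ac_default_ld_z_separate_code" = unset ]; then
          ac_default_ld_z_separate_code=1
        fi
        ;;
    esac
  fi
  # Anything still "unset" falls back to -z noseparate-code.
  if [ "$ac_default_ld_z_separate_code" = unset ]; then
    ac_default_ld_z_separate_code=0
  fi
  echo "$ac_default_ld_z_separate_code"
}

resolve_separate_code_default before "" x86_64-pc-linux-gnu  # prints 1
resolve_separate_code_default after "" x86_64-pc-linux-gnu   # prints 0
resolve_separate_code_default after yes x86_64-pc-linux-gnu  # prints 1
```

In this model an explicit --enable-separate-code still yields 1 on Linux/x86, so builders who want the old default can request it at configure time. An explicit -z separate-code on the link line is not affected by the configure default.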
@@ -12356,14 +12356,6 @@ if test "${enable_separate_code+set}" = set; then :
esac
fi
-# Enable -z separate-code by default for Linux/x86.
-case "${target}" in
-i[3-7]86-*-linux-* | x86_64-*-linux-*)
- if test ${ac_default_ld_z_separate_code} = unset; then
- ac_default_ld_z_separate_code=1
- fi
- ;;
-esac
if test "${ac_default_ld_z_separate_code}" = unset; then
ac_default_ld_z_separate_code=0
fi
@@ -131,16 +131,6 @@ AC_ARG_ENABLE(separate-code,
yes) ac_default_ld_z_separate_code=1 ;;
no) ac_default_ld_z_separate_code=0 ;;
esac])
-# Enable -z separate-code by default for Linux/x86.
-changequote(,)dnl
-case "${target}" in
-i[3-7]86-*-linux-* | x86_64-*-linux-*)
-changequote([,])dnl
- if test ${ac_default_ld_z_separate_code} = unset; then
- ac_default_ld_z_separate_code=1
- fi
- ;;
-esac
if test "${ac_default_ld_z_separate_code}" = unset; then
ac_default_ld_z_separate_code=0
fi
@@ -1119,12 +1119,9 @@ frv-*-* | hppa*-*-* | ia64-*-* | mips*-*-*)
;;
esac
-# Enable -z separate-code and --warn-textrel by default for Linux/x86.
+# Enable --warn-textrel by default for Linux/x86.
case "${target}" in
i[3-7]86-*-linux-* | x86_64-*-linux-*)
- if test ${ac_default_ld_z_separate_code} = unset; then
- ac_default_ld_z_separate_code=1
- fi
if test ${ac_default_ld_textrel_check} = unset; then
ac_default_ld_textrel_check=yes
fi