[v6,0/9] Add support for memory sealing

Message ID 20250311171305.89091-1-adhemerval.zanella@linaro.org (mailing list archive)

Message

Adhemerval Zanella Netto March 11, 2025, 5:09 p.m. UTC
  Linux 6.10 (8be7258aad44) added the mseal syscall, which allows
blocking some memory mapping operations on a VMA range:

 * Unmapping, moving to another location, extending or shrinking the
   size, via munmap and mremap.
 * Moving or expanding a different VMA into the current range, via
   mremap.
 * Modifying the memory range with mmap along with flag MAP_FIXED.
 * Expanding the size with mremap.
 * Changing the protection flags with mprotect or pkey_mprotect.
 * Destructive behaviors on anonymous memory, such as madvise with
   MADV_DONTNEED.
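
As a rough illustration of the list above, the following sketch seals an anonymous page and checks that later mprotect and munmap calls are rejected. The helper name seal_blocks_mprotect and the fallback syscall number 462 are assumptions for illustration (the number is what I believe the unified syscall table uses; the header value is preferred when present), and this is not code from the series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 462 is the mseal number on architectures that use the
   unified syscall table; the value from the headers wins if defined.  */
#ifndef SYS_mseal
# define SYS_mseal 462
#endif

/* Hypothetical helper.  Returns 1 if sealing blocked later changes,
   0 if the range could not be sealed (for instance an older kernel),
   and -1 on an unexpected result.  */
int
seal_blocks_mprotect (void)
{
  size_t len = (size_t) sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  /* The flags argument is currently required to be zero.  */
  if (syscall (SYS_mseal, p, len, 0UL) != 0)
    {
      munmap (p, len);
      return 0;
    }

  /* On a sealed range both calls are expected to fail with EPERM.
     The page intentionally stays mapped until the process exits.  */
  int blocked = mprotect (p, len, PROT_READ) == -1 && errno == EPERM;
  blocked = blocked && munmap (p, len) == -1 && errno == EPERM;
  return blocked ? 1 : -1;
}
```

A caller would treat a return value of 0 as "mseal not available here" rather than as a failure.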

Memory sealing is a hardening mechanism [1] to prevent memory mapping
changes to the ELF segment mappings done by the dynamic loader (for
instance, the RELRO hardening). OpenBSD supports a similar hardening
with the mimmutable syscall [2].

Memory sealing is an opt-in security feature that requires the new GNU
property GNU_PROPERTY_MEMORY_SEAL, defined in Linux ABI [3] and
supported on binutils 2.44 [4]. Memory sealing is applied only if the
attribute is present in the main binary; otherwise it is ignored, even
in dependencies that carry it. This lets the program decide whether
sealing is applied regardless of its dependencies (for instance CRIU,
which cannot work correctly with sealing applied).

The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
supports the mseal syscall and how glibc is configured.  On the default
configuration that aims to support older kernel releases, the memory
sealing attribute is taken as a hint. If glibc is configured with a
minimum kernel of 6.10, where mseal is implied to be supported, sealing
is enforced.
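
One simplified reading of the paragraph above, written as a sketch: the names seal_action and seal_policy are hypothetical and are not glibc interfaces, and how the series treats individual error codes is not described here, so the hint mode below simply ignores any failure.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only.  */
enum seal_action { SEAL_DONE, SEAL_IGNORED, SEAL_FATAL };

/* ret is the result of an mseal attempt (0 on success).  enforce is
   true when glibc is configured with a minimum kernel that implies
   mseal support (6.10 or newer).  */
enum seal_action
seal_policy (int ret, bool enforce)
{
  if (ret == 0)
    return SEAL_DONE;
  if (!enforce)
    return SEAL_IGNORED;        /* The attribute is only a hint.  */
  return SEAL_FATAL;            /* Sealing was required but failed.  */
}
```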

The memory sealing for ELF segments is done in multiple places where the
mapping is supposed to be immutable over program execution:

 * All shared library dependencies from the binary, including the
   read-only segments after PT_GNU_RELRO setup.
 * The binary itself, whether dynamically or statically linked. In both
   cases, it is up to either the binary or the loader to set up the sealing.
 * Any preload libraries.
 * Any library loaded with dlopen with RTLD_NODELETE flag (including
   libgcc.so loaded to enable process unwind and thread cancellation).
 * Audit modules.

There is an option to enable memory sealing for glibc itself,
--enable-memory-sealing, including installed programs and tests. It is
disabled by default.

Since memory sealing is applied for all the PT_LOAD segments, a new
program header is also supported so users can define a region where
sealing should not be applied (so users can later initialize and change
memory protection). The new program header, PT_GNU_MUTABLE (reference
implementation at [5]), marks a memory region that should not be sealed
if the GNU_PROPERTY_MEMORY_SEAL attribute is present.  The section name
starts with ".gnu.mutable" and has an alignment and size of the defined
maximum page size (-z max-page-size linker option). For instance, the code:

  #define GNU_MUTABLE_SECTION_NAME       ".gnu.mutable"

  unsigned char mutable_array1[64]
    __attribute__ ((section (GNU_MUTABLE_SECTION_NAME)))
       = { 0 };
  unsigned char mutable_array2[32]
    __attribute__ ((section (GNU_MUTABLE_SECTION_NAME)))
       = { 0 };

places both 'mutable_array1' and 'mutable_array2' on a page-aligned
memory region with the size of a page.

The linker sets the alignment and size to simplify support for ABIs with
multiple page sizes; otherwise users would need to know the maximum page
size to correctly define the alignment and size of the variable.
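
A minimal sketch of how a program might later change the protection of such a variable, assuming the region is not sealed. The helper protect_containing_pages and the variable mutable_buf are hypothetical names; with a linker that lacks PT_GNU_MUTABLE support the section is ordinary data and may share its page with other variables, so the helper should be used with care there.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define GNU_MUTABLE_SECTION_NAME ".gnu.mutable"

/* Hypothetical variable placed in the mutable section.  */
unsigned char mutable_buf[64]
  __attribute__ ((section (GNU_MUTABLE_SECTION_NAME))) = { 0 };

/* Hypothetical helper: apply PROT to every page that overlaps
   [addr, addr + len).  Returns the mprotect result.  */
int
protect_containing_pages (void *addr, size_t len, int prot)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t start = (uintptr_t) addr & ~(page - 1);
  uintptr_t end = ((uintptr_t) addr + len + page - 1) & ~(page - 1);
  return mprotect ((void *) start, end - start, prot);
}
```

For example, protect_containing_pages (mutable_buf, sizeof mutable_buf, PROT_READ) could be called once initialization is done, and would be expected to fail with EPERM if the same range had been sealed instead.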

Finally, a new tunable can be used to disable memory sealing on the program
and all its dependencies.  This can be useful for debugging, along with
patchable function entries (-fpatchable-function-entry), or profiling
(-mfentry, -mnop-count, -minstrument-return) where the program might be
run with or without changes to its text segment.

I tested on both x86_64-linux-gnu and aarch64-linux-gnu with Linux
6.14, along with some testing on powerpc64le-linux-gnu and s390x-linux-gnu VMs.

[1] https://blog.trailofbits.com/2024/10/25/a-deep-dive-into-linuxs-new-mseal-syscall/
[2] https://man.openbsd.org/mimmutable.2
[3] https://gitlab.com/x86-psABIs/Linux-ABI/-/commit/25a851b99665e7b22db5fabe818efaaa52466893
[4] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4d890484df4b2cf004f6f1f6d8c39a69fa39c875
[5] https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/azanella/pt_gnu_mutable

Changes v5->v6:
* Fixed aarch64 gnu attribute handling to work with GCS.
* Add support for PT_GNU_MUTABLE.
* Fixed mseal manual wording.
* Add the tunable to allow some use cases (profiling).
* Changed semantics to avoid sealing on dependencies if the attribute is not present on the binary.
* Change how to apply sealing on glibc itself (disabled by default now).

Changes v4->v5:
* Removed the tunable.
* Rebased against GCS change to enable GNU attribute parsing on all architectures.

Changes v3->v4:
* Rebase against master (remove nios2 ABI update and handle f2326c2ec0a0a8db
  changes).
* Handle vvar_vclock mapping on tests.
 
Changes v2->v3:
* Make the option opt-in instead of opt-out.

Adhemerval Zanella (9):
  linux: Add mseal syscall support
  elf: Parse gnu properties for statically linked binaries
  elf: Parse gnu properties for the loader
  elf: Use RTLD_NODELETE for dependencies
  elf: Add support to memory sealing
  Add --enable-memory-sealing configure option
  linux: Add memory sealing tests
  linux: Add support for PT_GNU_MUTABLE
  elf: Add glibc.rtld.seal tunable

 INSTALL                                       |   6 +
 Makeconfig                                    |  19 +-
 Makerules                                     |   2 +
 NEWS                                          |  23 +-
 configure                                     |  90 +++++-
 configure.ac                                  |  38 +++
 csu/libc-start.c                              |   4 +
 elf/Makefile                                  |  19 +-
 elf/dl-load.c                                 |   9 +
 elf/dl-map-segments.h                         |  17 +-
 elf/dl-open.c                                 |   3 +-
 elf/dl-reloc.c                                |  77 ++++-
 elf/dl-support.c                              |  25 ++
 elf/dl-tunables.list                          |   6 +
 elf/elf.h                                     |   4 +
 elf/rtld.c                                    |  24 +-
 elf/setup-vdso.h                              |   2 +
 elf/tst-rtld-list-tunables.exp                |   1 +
 include/link.h                                |  12 +
 manual/install.texi                           |   5 +
 manual/memory.texi                            |  69 +++++
 manual/tunables.texi                          |  36 +++
 sysdeps/aarch64/dl-prop.h                     |   5 +
 sysdeps/generic/dl-mseal.h                    |  22 ++
 sysdeps/generic/dl-prop-mseal.h               |  34 +++
 sysdeps/generic/dl-prop.h                     |   5 +
 sysdeps/generic/ldsodefs.h                    |  10 +
 sysdeps/generic/libc-prop.h                   |  44 +++
 sysdeps/unix/sysv/linux/Makefile              | 106 +++++++
 sysdeps/unix/sysv/linux/Versions              |   3 +
 sysdeps/unix/sysv/linux/aarch64/libc-start.h  |  11 -
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/bits/mman-shared.h    |   8 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/dl-mseal.c            |  60 ++++
 sysdeps/unix/sysv/linux/dl-mseal.h            |  31 ++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/kernel-features.h     |   8 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/syscalls.list         |   1 +
 .../unix/sysv/linux/tst-dl_mseal-auditmod.c   |  23 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-common.h |  29 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-1-1.c |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-1.c   |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-2-1.c |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-2.c   |  19 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-1.c  |  19 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-2.c  |  19 ++
 .../sysv/linux/tst-dl_mseal-mutable-dlopen.c  |   1 +
 .../sysv/linux/tst-dl_mseal-mutable-mod.c     |  47 ++++
 .../sysv/linux/tst-dl_mseal-mutable-mod.h     |  33 +++
 .../sysv/linux/tst-dl_mseal-mutable-static.c  |   2 +
 .../unix/sysv/linux/tst-dl_mseal-mutable.c    | 242 ++++++++++++++++
 sysdeps/unix/sysv/linux/tst-dl_mseal-noseal.c |  94 +++++++
 .../unix/sysv/linux/tst-dl_mseal-preload.c    |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-skeleton.c   | 266 ++++++++++++++++++
 .../sysv/linux/tst-dl_mseal-static-noseal.c   |  59 ++++
 sysdeps/unix/sysv/linux/tst-dl_mseal-static.c |  56 ++++
 .../unix/sysv/linux/tst-dl_mseal-tunable.c    |  24 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal.c        |  92 ++++++
 sysdeps/unix/sysv/linux/tst-mseal-pkey.c      |  84 ++++++
 sysdeps/unix/sysv/linux/tst-mseal.c           |  67 +++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 sysdeps/x86/dl-prop.h                         |   8 +-
 90 files changed, 2104 insertions(+), 26 deletions(-)
 create mode 100644 sysdeps/generic/dl-mseal.h
 create mode 100644 sysdeps/generic/dl-prop-mseal.h
 create mode 100644 sysdeps/generic/libc-prop.h
 create mode 100644 sysdeps/unix/sysv/linux/dl-mseal.c
 create mode 100644 sysdeps/unix/sysv/linux/dl-mseal.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-auditmod.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-common.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-1-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-2.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mutable-dlopen.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mutable-mod.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mutable-mod.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mutable-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mutable.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-preload.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-skeleton.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-static-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-tunable.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-mseal-pkey.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-mseal.c
  

Comments

Yury Khrustalev March 18, 2025, 5:16 p.m. UTC | #1
Hi Adhemerval,

On Tue, Mar 11, 2025 at 02:09:47PM -0300, Adhemerval Zanella wrote:
> The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
> blocking some memory mapping operations on the VMA range:
> 

At least 4 tests fail due to BTI detecting a mismatch when the binaries for these tests
are built with `-Wl,-z,memory-seal`:

FAIL: elf/filter
FAIL: elf/tst-big-note
FAIL: elf/tst-relr3
FAIL: elf/tst-relr4

To reproduce this, you need to enable BTI support by using `-mbranch-protection=bti`
and you also need a system that supports BTI, and of course configure Glibc with
`--enable-memory-sealing`.

I used the FVP model following this guide [1].

I will need more time to get to the bottom of where and why this happens. Disabling
either BTI (e.g. on kernel level) or memory sealing solves this issue, so it seems
that these two are not compatible yet.

[1] https://inbox.sourceware.org/libc-help/Z9grjmNzxsuPu7iD@arm.com/

Kind regards,
Yury
  
Yury Khrustalev March 21, 2025, 9:53 a.m. UTC | #2
On Tue, Mar 18, 2025 at 05:16:48PM +0000, Yury Khrustalev wrote:
> Hi Adhemerval,
> 
> On Tue, Mar 11, 2025 at 02:09:47PM -0300, Adhemerval Zanella wrote:
> > The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
> > blocking some memory mapping operations on the VMA range:
> > 
> 
> At least 4 tests fail due to BTI detecting mismatch when the binaries for these tests
> are built with `-Wl,-z,memory-seal`:
> 
> FAIL: elf/filter


I'm using GCC toolchain built with `--enable-standard-branch-protection`
with binutils 2.44.

The tests fail with SIGILL due to not having a proper BTI landing pad in
the PLT sequence of the test library `elf/filtmod1.so`. When this library is
linked with `-Wl,-z,memory-seal`, for some reason the generated PLT sequence
looks like this:

```
00000000000004e0 <.plt>:
 4e0:   a9bf7bf0        stp     x16, x30, [sp, #-16]!
 4e4:   f00000f0        adrp    x16, 1f000 <__FRAME_END__+0x1e8bc>
 4e8:   f947fe11        ldr     x17, [x16, #4088]
 4ec:   913fe210        add     x16, x16, #0xff8
 4f0:   d61f0220        br      x17
```

And it should start with

```
00000000000004f0 <.plt>:
 4f0:   d503245f        bti     c
 ...
```

The latter does work when linking with `-Wl,-z,nomemory-seal`.

Another test library, `elf/filtmod2.so`, has the correct PLT sequence, but only
because BTI is requested explicitly on the command line for it with the
`-Wl,-z,force-bti` option.

I suspect something is wrong with the static linker when it processes
objects with BTI and is supposed to generate the correct PLT sequence with
a BTI landing pad by default.

Kind regards,
Yury
  
Yury Khrustalev March 24, 2025, 5:09 p.m. UTC | #3
On Fri, Mar 21, 2025 at 09:53:51AM +0000, Yury Khrustalev wrote:
> On Tue, Mar 18, 2025 at 05:16:48PM +0000, Yury Khrustalev wrote:
> 
> I suspect something is wrong with the static linker when it processes
> objects with BTI and is supposed to generate correct PLT sequence with
> BTI landing pad by default.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32818

Kind regards,
Yury
  
Florian Weimer April 3, 2025, 5:40 a.m. UTC | #4
* Adhemerval Zanella:

> The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
> blocking some memory mapping operations on the VMA range:
>
>  * Unmapping, moving to another location, extending or shrinking the
>    size, munmap, and mremap.
>  * Moving or expanding a different VMA into the current range, via
>    mremap.
>  * Modifying the memory range with mmap along with flag MAP_FIXED.
>  * Expanding the size with mremap.
>  * Change the protection flags with mprotect or pkey_mprotect.
>  * Destructive behaviors on anonymous memory, such as madvise with
>    MADV_DONTNEED.
>
> Memory sealing is a hardening mechanism [1] to prevent memory mapping
> changes to the ELF segment mappings done by the dynamic loader (for
> instance, the RELRO hardening). The OpenBSD supports a similar hardening
> with the mimmutable syscall [2].

The Nvidia drivers appear to patch the glibc dynamic linker (in
_dl_debug_update):

  <https://bugzilla.redhat.com/show_bug.cgi?id=2357062#c5>

I don't know if they use an mprotect-patch-mprotect sequence for that,
or some other mechanism that is compatible with mseal.  But it seems we
can't apply mseal to glibc by default for backwards compatibility
reasons.

Thanks,
Florian
  
Yury Khrustalev April 17, 2025, 1:48 p.m. UTC | #5
Hi Adhemerval,

On Mon, Mar 24, 2025 at 05:09:18PM +0000, Yury Khrustalev wrote:
> On Fri, Mar 21, 2025 at 09:53:51AM +0000, Yury Khrustalev wrote:
> > On Tue, Mar 18, 2025 at 05:16:48PM +0000, Yury Khrustalev wrote:
> > 
> > I suspect something is wrong with the static linker when it processes
> > objects with BTI and is supposed to generate correct PLT sequence with
> > BTI landing pad by default.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32818

This was fixed in bb71794c479 [1]. The glibc tests that used to fail now pass.

[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=bb71794c479b15aa4ca6b4a74b87af75079a2a36

Kind regards,
Yury
  
Jeff Xu April 18, 2025, 6:52 p.m. UTC | #6
On Tue, Mar 11, 2025 at 10:13 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
> Changes v5->v6:
> * Fixed aarch64 gnu attribute handling to work with GCS.
> * Add support for PT_GNU_MUTABLE.
> * Fixed mseal manual wording.
> * Add the tunable to allow some user cases (profiling).

> * Changed semantics to avoid sealing on dependencies if the attribute is not present on the binary.
I tested the above change.
With this change, checkpoint restoration should work fine when glibc
is built with sealing enabled, since CRIU doesn't set the seal
flag in its main binary.

Thanks
-Jeff
  
Jeff Xu Oct. 10, 2025, 5:33 p.m. UTC | #7
On Wed, Apr 2, 2025 at 10:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Adhemerval Zanella:
>
> > The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
> > blocking some memory mapping operations on the VMA range:
> >
> >  * Unmapping, moving to another location, extending or shrinking the
> >    size, munmap, and mremap.
> >  * Moving or expanding a different VMA into the current range, via
> >    mremap.
> >  * Modifying the memory range with mmap along with flag MAP_FIXED.
> >  * Expanding the size with mremap.
> >  * Change the protection flags with mprotect or pkey_mprotect.
> >  * Destructive behaviors on anonymous memory, such as madvise with
> >    MADV_DONTNEED.
> >
> > Memory sealing is a hardening mechanism [1] to prevent memory mapping
> > changes to the ELF segment mappings done by the dynamic loader (for
> > instance, the RELRO hardening). The OpenBSD supports a similar hardening
> > with the mimmutable syscall [2].
>
> The Nvidia drivers appear to patch the glibc dynamic linker (in
> _dl_debug_update):
>
>   <https://bugzilla.redhat.com/show_bug.cgi?id=2357062#c5>
>
> I don't know if they use an mprotect-patch-mprotect sequence for that,
> or some other mechanism that is compatible with mseal.  But it seems we
> can't apply mseal to glibc by default for backwards compatibility
> reasons.
>
Based on the discussion in [1], [2], and [3], it seems that the Nvidia
drivers issue is related to loading a 32-bit app and has either a
workaround or a fix. If this is indeed a bug, glibc memory sealing can
help prevent or detect such a bug earlier.

There are two solutions for such a case in the future:
1. Do not turn on sealing when building the main binary.
2. Use the .gnu.mutable solution to mark specific areas/variables to
be not sealable.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2357062#c11 [1]
Link: https://bugzilla.rpmfusion.org/show_bug.cgi?id=7180#c25 [2]
Link: https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252#note_2918141 [3]

Thanks
-Jeff

> Thanks,
> Florian
>
  
Jeff Xu Oct. 10, 2025, 5:39 p.m. UTC | #8
Hi All.

It seems that all of the open comments for V6 are addressed.
From a functionality point of view, V6 is suitable for use with Chrome
and ChromeOS.
Adding Stephen Röttger, in case there are any additional comments from
the Chrome point of view.

Could we merge this patch to main glibc? ChromeOS is planning to use
this after upgrading glibc.

Much appreciated.
Thanks
-Jeff


On Fri, Apr 18, 2025 at 11:52 AM Jeff Xu <jeffxu@google.com> wrote:
>
> On Tue, Mar 11, 2025 at 10:13 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> >
> > Changes v5->v6:
> > * Fixed aarch64 gnu attribute handling to work with GCS.
> > * Add support for PT_GNU_MUTABLE.
> > * Fixed mseal manual wording.
> > * Add the tunable to allow some user cases (profiling).
>
> > * Changed semantics to avoid sealing on dependencies if the attribute is not present on the binary.
> I tested the above change.
> With this change, Checkpoint restoration should work fine when glibc
> is built with sealing enabled, since the CRIU doesn't set the seal
> flag in its main binary.
>
> Thanks
> -Jeff
  
Florian Weimer Oct. 13, 2025, 8:17 a.m. UTC | #9
* Jeff Xu:

> Based on the discussion in [1], [2], and [3], it seems that the Nvidia
> drivers issue is related to loading a 32-bit app and has either a
> workaround or a fix. If this is indeed a bug, glibc memory sealing can
> help prevent or detect such a bug earlier.

> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2357062#c11 [1]
> Link: https://bugzilla.rpmfusion.org/show_bug.cgi?id=7180#c25 [2]
> Link: https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252#note_2918141 [3]

I interpret this comment

| NVIDIA driver releases >= r575 will not do this (broken) patching by default.

<https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252#note_2918155>

in such a way that the patching code is still there, just disabled by
default.  I don't know what this means from a compatibility perspective.

We have since learned that UCX <https://github.com/openucx/ucx> performs
similar patching.  It commonly shows up on systems as a dependency of
OpenMPI <https://www.open-mpi.org/>, which is one of the more widely
used MPI libraries.

We noticed a while back that the UCX patching does not work on POWER10

  ucx incorrectly patches glibc functions on Power10+ ppc64le systems
  <https://github.com/openucx/ucx/issues/10780>

(it blindly overwrites instructions that it does not recognize).  So I'm
not sure how actively this part of the code is being maintained, and if
there's a chance to move it to a more supportable mechanism.

> There are two solutions for such a case in the future:
> 1. Do not turn on sealing when building the main binary.
> 2. Use the .gnu.mutable solution to mark specific areas/variables to
> be not sealable.

(1) does not work if we are dealing with code that is usually loaded
through dlopen (such as userspace graphics drivers and Python modules).

(2) does not really apply here because this would have to be done during
the glibc build.

Thanks,
Florian
  
Adhemerval Zanella Netto Oct. 13, 2025, 2:48 p.m. UTC | #10
On 13/10/25 05:17, Florian Weimer wrote:
> * Jeff Xu:
> 
>> Based on the discussion in [1], [2], and [3], it seems that the Nvidia
>> drivers issue is related to loading a 32-bit app and has either a
>> workaround or a fix. If this is indeed a bug, glibc memory sealing can
>> help prevent or detect such a bug earlier.
> 
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2357062#c11 [1]
>> Link: https://bugzilla.rpmfusion.org/show_bug.cgi?id=7180#c25 [2]
>> Link: https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252#note_2918141 [3]
> 
> I interpret this comment
> 
> | NVIDIA driver releases >= r575 will not do this (broken) patching by default.
> 
> <https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252#note_2918155>
> 
> in such a way that the patching code is still there, just disabled by
> default.  I don't know what this means from a compatibility perspective.
> 
> We have since learned that UCX <https://github.com/openucx/ucx> performs
> similar patching.  It commonly shows up on systems as a dependency of
> OpenMPI <https://www.open-mpi.org/>, which is one of the more widely
> used MPI libraries.
> 
> We noticed a while back that the UCX patching does not work on POWER10
> 
>   ucx incorrectly patches glibc functions on Power10+ ppc64le systems
>   <https://github.com/openucx/ucx/issues/10780>
> 
> (it blindly overwrites instructions that it does not recognize).  So I'm
> not sure how actively this part of the code is being maintained, and if
> there's a chance to move it to a more supportable mechanism.
> 
>> There are two solutions for such a case in the future:
>> 1. Do not turn on sealing when building the main binary.
>> 2. Use the .gnu.mutable solution to mark specific areas/variables to
>> be not sealable.
> 
> (1) does not work if we are dealing with code that is usually loaded
> through dlopen (such as userspace graphics drivers and Python modules).
> 
> (2) does not really apply here because this would have to be done during
> the glibc build.

This (2) can be done at glibc build to make _dl_debug_update mutable, and
that's the idea of adding the mutable.
  
Florian Weimer Oct. 13, 2025, 3 p.m. UTC | #11
* Adhemerval Zanella Netto:

>>> There are two solutions for such a case in the future:
>>> 1. Do not turn on sealing when building the main binary.
>>> 2. Use the .gnu.mutable solution to mark specific areas/variables to
>>> be not sealable.
>> 
>> (1) does not work if we are dealing with code that is usually loaded
>> through dlopen (such as userspace graphics drivers and Python modules).
>> 
>> (2) does not really apply here because this would have to be done during
>> the glibc build.
>
> This (2) can be done at glibc build to make _dl_debug_update mutable, and
> that's the idea of adding the mutable.

We only learned about _dl_debug_update because of the patching failure.
There are probably other examples.  And I'm not sure if using GCC's
section attribute is sufficient to achieve suitable placement of all the
functions we know about—so we might have to write more functions in
assembler for predictable outcomes.

Thanks,
Florian
  
Adhemerval Zanella Netto Oct. 13, 2025, 3:08 p.m. UTC | #12
On 13/10/25 12:00, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>>>> There are two solutions for such a case in the future:
>>>> 1. Do not turn on sealing when building the main binary.
>>>> 2. Use the .gnu.mutable solution to mark specific areas/variables to
>>>> be not sealable.
>>>
>>> (1) does not work if we are dealing with code that is usually loaded
>>> through dlopen (such as userspace graphics drivers and Python modules).
>>>
>>> (2) does not really apply here because this would have to be done during
>>> the glibc build.
>>
>> This (2) can be done at glibc build to make _dl_debug_update mutable, and
>> that's the idea of adding the mutable.
> 
> We only learned about _dl_debug_update because of the patching failure.
> There are probably other examples.  And I'm not sure if using GCC's
> section attribute is sufficient to achieve suitable placement of all the
> functions we know about—so we might have to write more functions in
> assembler for predictable outcomes.

The underlying question is whether this patching is really required and
expected. But I think we can make libc sealing opt-out via a tunable if the
binary requires such patching; this would allow making it the default.
  
Jeff Xu Oct. 13, 2025, 10:07 p.m. UTC | #13
Hi Florian,

On Mon, Oct 13, 2025 at 1:17 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jeff Xu:
>
> > There are two solutions for such a case in the future:
> > 1. Do not turn on sealing when building the main binary.
> > 2. Use the .gnu.mutable solution to mark specific areas/variables to
> > be not sealable.
>
> (1) does not work if we are dealing with code that is usually loaded
> through dlopen (such as userspace graphics drivers and Python modules).
>
In order for dlopen() to seal the mapping, three conditions must be met:
1. The dll has the GNU_PROPERTY_MEMORY_SEAL flag set.
2. dlopen() uses the RTLD_NODELETE flag.
3. The main binary has sealing enabled (e.g. GNU_PROPERTY_MEMORY_SEAL is set).

Either the dll or the main binary can disable the sealing at build time.
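
The three conditions above can be written down as a small predicate. This is only a sketch of the rule as described in this mail: dlopen_would_seal is a hypothetical name, and the two booleans stand for whatever the loader records after parsing the GNU properties.

```c
#include <dlfcn.h>
#include <stdbool.h>

/* Hypothetical predicate.  main_has_seal_prop and lib_has_seal_prop
   say whether GNU_PROPERTY_MEMORY_SEAL is set on the main binary and
   on the library being opened; dlopen_flags is the dlopen mode.  */
bool
dlopen_would_seal (bool main_has_seal_prop, bool lib_has_seal_prop,
                   int dlopen_flags)
{
  return main_has_seal_prop
         && lib_has_seal_prop
         && (dlopen_flags & RTLD_NODELETE) != 0;
}
```

Under this reading, a plain dlopen (path, RTLD_NOW) is never sealed, since the library could still be unloaded with dlclose.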

There will be a long tail of such "runtime patching" cases to be
discovered when apps start enabling sealing. V6 already considered
those cases without causing regression.

Thanks
-Jeff