[v3] elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT

Message ID 20240102151332.205849-1-hjl.tools@gmail.com
State Superseded
Series [v3] elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed

Commit Message

H.J. Lu Jan. 2, 2024, 3:13 p.m. UTC
  Changes in v3:

1. Define and use macros for instruction opcodes and sizes.
2. Use INT32_MIN and UINT32_MAX instead of 0x80000000ULL and
0xffffffffULL.
3. Replace 3 1-byte writes with a 4-byte write to write JMPABS.
4. Verify that DT_X86_64_PLTSZ is a multiple of DT_X86_64_PLTENT.
5. Add some comments for mprotect test.
6. Update plt_rewrite logic.

Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
relocation.

For x86-64, add

 #define DT_X86_64_PLT     (DT_LOPROC + 0)
 #define DT_X86_64_PLTSZ   (DT_LOPROC + 1)
 #define DT_X86_64_PLTENT  (DT_LOPROC + 3)

1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.

The r_addend field of the R_X86_64_JUMP_SLOT relocation is set to the
memory offset of the indirect branch instruction.

Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branches after relocation when lazy binding is disabled.

PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases.  Add

$ GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1

to enable PLT rewrite at run-time.
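
For illustration (an editor sketch, not part of the patch): when the
target is within rel32 range, the rewrite turns the indirect branch of
a PLT entry into a direct JMP and pads the rest of the entry with INT3,
using the opcodes from the patch (0xe9 for JMP rel32, 0xcc for INT3).
Here branch_start corresponds to l_addr + r_addend and value to the
resolved target loaded from the GOT entry; the function name is
hypothetical:

  #include <stdint.h>
  #include <string.h>

  static void
  rewrite_plt_entry_sketch (uint8_t *branch_start, uint8_t *plt_end,
                            uint64_t value)
  {
    /* rel32 displacement from the end of the 5-byte direct branch.  */
    uint32_t disp = (uint32_t) (value - (uintptr_t) branch_start - 5);
    branch_start[0] = 0xe9;    /* JMP32_INSN_OPCODE */
    memcpy (branch_start + 1, &disp, sizeof disp);
    /* Fill the unused tail of the PLT entry with INT3.  */
    memset (branch_start + 5, 0xcc, plt_end - (branch_start + 5));
  }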
---
 elf/dynamic-link.h                   |   5 +
 elf/elf.h                            |   5 +
 elf/tst-glibcelf.py                  |   1 +
 manual/tunables.texi                 |   9 +
 scripts/glibcelf.py                  |   4 +
 sysdeps/x86/cet-control.h            |  14 ++
 sysdeps/x86/cpu-features.c           |  10 ++
 sysdeps/x86/dl-procruntime.c         |   1 +
 sysdeps/x86/dl-tunables.list         |   5 +
 sysdeps/x86_64/Makefile              |  19 ++
 sysdeps/x86_64/configure             |  35 ++++
 sysdeps/x86_64/configure.ac          |   4 +
 sysdeps/x86_64/dl-dtprocnum.h        |  21 +++
 sysdeps/x86_64/dl-machine.h          | 250 ++++++++++++++++++++++++++-
 sysdeps/x86_64/link_map.h            |  22 +++
 sysdeps/x86_64/tst-plt-rewrite1.c    |  31 ++++
 sysdeps/x86_64/tst-plt-rewritemod1.c |  32 ++++
 17 files changed, 467 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/dl-dtprocnum.h
 create mode 100644 sysdeps/x86_64/link_map.h
 create mode 100644 sysdeps/x86_64/tst-plt-rewrite1.c
 create mode 100644 sysdeps/x86_64/tst-plt-rewritemod1.c
  

Comments

Adhemerval Zanella Netto Jan. 2, 2024, 3:50 p.m. UTC | #1
On 02/01/24 12:13, H.J. Lu wrote:
> Changes in v3:
> 
> 1. Define and use macros for instruction opcodes and sizes.
> 2. Use INT32_MIN and UINT32_MAX instead of 0x80000000ULL and
> 0xffffffffULL.
> 3. Replace 3 1-byte writes with a 4-byte write to write JMPABS.
> 4. Verify that DT_X86_64_PLTSZ is a multiple of DT_X86_64_PLTENT.
> 5. Add some comments for mprotect test.
> 6. Update plt_rewrite logic.
> 
> Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
> relocation.
> 
> For x86-64, add
> 
>  #define DT_X86_64_PLT     (DT_LOPROC + 0)
>  #define DT_X86_64_PLTSZ   (DT_LOPROC + 1)
>  #define DT_X86_64_PLTENT  (DT_LOPROC + 3)
> 
> 1. DT_X86_64_PLT: The address of the procedure linkage table.
> 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
> table.
> 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
> entry.
> 
> The r_addend field of the R_X86_64_JUMP_SLOT relocation is set to the
> memory offset of the indirect branch instruction.
> 
> Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
> with direct branches after relocation when lazy binding is disabled.
> 
> PLT rewrite is disabled by default since SELinux may disallow modifying
> code pages and ld.so can't detect it in all cases.  Add
> 
> $ GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
> 
> to enable PLT rewrite at run-time.
> ---
>  elf/dynamic-link.h                   |   5 +
>  elf/elf.h                            |   5 +
>  elf/tst-glibcelf.py                  |   1 +
>  manual/tunables.texi                 |   9 +
>  scripts/glibcelf.py                  |   4 +
>  sysdeps/x86/cet-control.h            |  14 ++
>  sysdeps/x86/cpu-features.c           |  10 ++
>  sysdeps/x86/dl-procruntime.c         |   1 +
>  sysdeps/x86/dl-tunables.list         |   5 +
>  sysdeps/x86_64/Makefile              |  19 ++
>  sysdeps/x86_64/configure             |  35 ++++
>  sysdeps/x86_64/configure.ac          |   4 +
>  sysdeps/x86_64/dl-dtprocnum.h        |  21 +++
>  sysdeps/x86_64/dl-machine.h          | 250 ++++++++++++++++++++++++++-
>  sysdeps/x86_64/link_map.h            |  22 +++
>  sysdeps/x86_64/tst-plt-rewrite1.c    |  31 ++++
>  sysdeps/x86_64/tst-plt-rewritemod1.c |  32 ++++
>  17 files changed, 467 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/dl-dtprocnum.h
>  create mode 100644 sysdeps/x86_64/link_map.h
>  create mode 100644 sysdeps/x86_64/tst-plt-rewrite1.c
>  create mode 100644 sysdeps/x86_64/tst-plt-rewritemod1.c
> 
> diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
> index 8cdf7bde09..83d834ecaf 100644
> --- a/elf/dynamic-link.h
> +++ b/elf/dynamic-link.h
> @@ -177,6 +177,10 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>        }									      \
>    } while (0);
>  
> +# ifndef ELF_DYNAMIC_AFTER_RELOC
> +#  define ELF_DYNAMIC_AFTER_RELOC(map, lazy)
> +# endif
> +
>  /* This can't just be an inline function because GCC is too dumb
>     to inline functions containing inlines themselves.  */
>  # ifdef RTLD_BOOTSTRAP
> @@ -192,6 +196,7 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>        ELF_DYNAMIC_DO_RELR (map);					      \
>      ELF_DYNAMIC_DO_REL ((map), (scope), edr_lazy, skip_ifunc);		      \
>      ELF_DYNAMIC_DO_RELA ((map), (scope), edr_lazy, skip_ifunc);		      \
> +    ELF_DYNAMIC_AFTER_RELOC ((map), (edr_lazy));			      \
>    } while (0)
>  
>  #endif
> diff --git a/elf/elf.h b/elf/elf.h
> index ca6a7d9d67..455731663c 100644
> --- a/elf/elf.h
> +++ b/elf/elf.h
> @@ -3639,6 +3639,11 @@ enum
>  /* x86-64 sh_type values.  */
>  #define SHT_X86_64_UNWIND	0x70000001 /* Unwind information.  */
>  
> +/* x86-64 d_tag values.  */
> +#define DT_X86_64_PLT		(DT_LOPROC + 0)
> +#define DT_X86_64_PLTSZ		(DT_LOPROC + 1)
> +#define DT_X86_64_PLTENT	(DT_LOPROC + 3)
> +#define DT_X86_64_NUM		4
>  
>  /* AM33 relocations.  */
>  #define R_MN10300_NONE		0	/* No reloc.  */
> diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
> index 00cd2bba85..c191636a99 100644
> --- a/elf/tst-glibcelf.py
> +++ b/elf/tst-glibcelf.py
> @@ -187,6 +187,7 @@ DT_VALNUM
>  DT_VALRNGHI
>  DT_VALRNGLO
>  DT_VERSIONTAGNUM
> +DT_X86_64_NUM
>  ELFCLASSNUM
>  ELFDATANUM
>  EM_NUM
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index b31f16da84..f9bd83622e 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -57,6 +57,7 @@ glibc.pthread.stack_cache_size: 0x2800000 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
>  glibc.elision.skip_trylock_internal_abort: 3 (min: 0, max: 2147483647)
> +glibc.cpu.plt_rewrite: 0 (min: 0, max: 1)
>  glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.cpu.x86_ibt:
>  glibc.cpu.hwcaps:
> @@ -614,6 +615,14 @@ this tunable.
>  This tunable is specific to 64-bit x86-64.
>  @end deftp
>  
> +@deftp Tunable glibc.cpu.plt_rewrite
> +When this tunable is set to @code{1}, the dynamic linker will rewrite
> +the PLT section with direct branches after relocation, if possible,
> +when lazy binding is disabled.
> +
> +This tunable is specific to x86-64.
> +@end deftp
> +
>  @node Memory Related Tunables
>  @section Memory Related Tunables
>  @cindex memory related tunables
> diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> index c5e5dda48e..5f3813f326 100644
> --- a/scripts/glibcelf.py
> +++ b/scripts/glibcelf.py
> @@ -439,6 +439,8 @@ class DtRISCV(Dt):
>      """Supplemental DT_* constants for EM_RISCV."""
>  class DtSPARC(Dt):
>      """Supplemental DT_* constants for EM_SPARC."""
> +class DtX86_64(Dt):
> +    """Supplemental DT_* constants for EM_X86_64."""
>  _dt_skip = '''
>  DT_ENCODING DT_PROCNUM
>  DT_ADDRRNGLO DT_ADDRRNGHI DT_ADDRNUM
> @@ -451,6 +453,7 @@ DT_MIPS_NUM
>  DT_PPC_NUM
>  DT_PPC64_NUM
>  DT_SPARC_NUM
> +DT_X86_64_NUM
>  '''.strip().split()
>  _register_elf_h(DtAARCH64, prefix='DT_AARCH64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtALPHA, prefix='DT_ALPHA_', skip=_dt_skip, parent=Dt)
> @@ -461,6 +464,7 @@ _register_elf_h(DtPPC, prefix='DT_PPC_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtPPC64, prefix='DT_PPC64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtRISCV, prefix='DT_RISCV_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtSPARC, prefix='DT_SPARC_', skip=_dt_skip, parent=Dt)
> +_register_elf_h(DtX86_64, prefix='DT_X86_64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(Dt, skip=_dt_skip, ranges=True)
>  del _dt_skip
>  
> diff --git a/sysdeps/x86/cet-control.h b/sysdeps/x86/cet-control.h
> index a45d59bf8c..d4341f98ba 100644
> --- a/sysdeps/x86/cet-control.h
> +++ b/sysdeps/x86/cet-control.h
> @@ -32,10 +32,24 @@ enum dl_x86_cet_control
>    cet_permissive
>  };
>  
> +/* PLT rewrite control.  */
> +enum dl_plt_rewrite_control
> +{
> +  /* No PLT rewrite.  */
> +  plt_rewrite_none,
> +  /* PLT rewrite is enabled at run-time.  */
> +  plt_rewrite_enabled,
> +  /* Rewrite PLT with JMP at run-time.  */
> +  plt_rewrite_jmp,
> +  /* Rewrite PLT with JMPABS at run-time.  */
> +  plt_rewrite_jmpabs
> +};
> +
>  struct dl_x86_feature_control
>  {
>    enum dl_x86_cet_control ibt : 2;
>    enum dl_x86_cet_control shstk : 2;
> +  enum dl_plt_rewrite_control plt_rewrite : 2;
>  };
>  
>  #endif /* cet-control.h */
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index f193ea7a2d..f96e6489ed 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -27,6 +27,13 @@
>  extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
>    attribute_hidden;
>  
> +static void
> +TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> +{
> +  if (valp->numval)

No implicit int checks.

> +    GL(dl_x86_feature_control).plt_rewrite = plt_rewrite_enabled;
> +}
> +
>  #ifdef __LP64__
>  static void
>  TUNABLE_CALLBACK (set_prefer_map_32bit_exec) (tunable_val_t *valp)
> @@ -996,6 +1003,9 @@ no_cpuid:
>  
>    TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
>  
> +  TUNABLE_GET (plt_rewrite, tunable_val_t *,
> +	       TUNABLE_CALLBACK (set_plt_rewrite));
> +
>  #ifdef __LP64__
>    TUNABLE_GET (prefer_map_32bit_exec, tunable_val_t *,
>  	       TUNABLE_CALLBACK (set_prefer_map_32bit_exec));
> diff --git a/sysdeps/x86/dl-procruntime.c b/sysdeps/x86/dl-procruntime.c
> index 4d25d9f327..15b3d0d878 100644
> --- a/sysdeps/x86/dl-procruntime.c
> +++ b/sysdeps/x86/dl-procruntime.c
> @@ -67,6 +67,7 @@ PROCINFO_CLASS struct dl_x86_feature_control _dl_x86_feature_control
>  = {
>      .ibt = DEFAULT_DL_X86_CET_CONTROL,
>      .shstk = DEFAULT_DL_X86_CET_CONTROL,
> +    .plt_rewrite = plt_rewrite_none,
>    }
>  # endif
>  # if !defined SHARED || defined PROCINFO_DECL
> diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list
> index 147a7270ec..e2e441e1b7 100644
> --- a/sysdeps/x86/dl-tunables.list
> +++ b/sysdeps/x86/dl-tunables.list
> @@ -66,5 +66,10 @@ glibc {
>      x86_shared_cache_size {
>        type: SIZE_T
>      }
> +    plt_rewrite {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>    }
>  }
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 00120ca9ca..106f1d0f55 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -175,6 +175,25 @@ ifeq (no,$(build-hardcoded-path-in-tests))
>  tests-container += tst-glibc-hwcaps-cache
>  endif
>  
> +ifeq (yes,$(have-z-mark-plt))
> +tests += \
> +  tst-plt-rewrite1 \
> +# tests
> +modules-names += \
> +  tst-plt-rewritemod1 \
> +# modules-names
> +
> +tst-plt-rewrite1-no-pie = yes
> +LDFLAGS-tst-plt-rewrite1 = -Wl,-z,mark-plt,-z,now
> +LDFLAGS-tst-plt-rewritemod1.so = -Wl,-z,mark-plt,-z,now
> +tst-plt-rewrite1-ENV = GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 LD_DEBUG=files:bindings
> +$(objpfx)tst-plt-rewrite1: $(objpfx)tst-plt-rewritemod1.so
> +$(objpfx)tst-plt-rewrite1.out: /dev/null $(objpfx)tst-plt-rewrite1
> +	$(tst-plt-rewrite1-ENV) $(make-test-out) > $@ 2>&1; \
> +	grep -q -E "changing 'bar' PLT entry in .*/elf/tst-plt-rewritemod1.so' to direct branch" $@; \
> +	$(evaluate-test)
> +endif
> +
>  endif # $(subdir) == elf
>  
>  ifeq ($(subdir),csu)
> diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
> index e307467afa..a15eaa349d 100755
> --- a/sysdeps/x86_64/configure
> +++ b/sysdeps/x86_64/configure
> @@ -25,6 +25,41 @@ printf "%s\n" "$libc_cv_cc_mprefer_vector_width" >&6; }
>  config_vars="$config_vars
>  config-cflags-mprefer-vector-width = $libc_cv_cc_mprefer_vector_width"
>  
> +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z mark-plt" >&5
> +printf %s "checking for linker that supports -z mark-plt... " >&6; }
> +libc_linker_feature=no
> +cat > conftest.c <<EOF
> +int _start (void) { return 42; }
> +EOF
> +if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
> +		  -Wl,-z,mark-plt -nostdlib -nostartfiles
> +		  -fPIC -shared -o conftest.so conftest.c
> +		  1>&5'
> +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; }
> +then
> +  if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,-z,mark-plt -nostdlib \
> +      -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
> +      | grep "warning: -z mark-plt ignored" > /dev/null 2>&1; then
> +    true
> +  else
> +    libc_linker_feature=yes
> +  fi
> +fi
> +rm -f conftest*
> +if test $libc_linker_feature = yes; then
> +  libc_cv_z_mark_plt=yes
> +else
> +  libc_cv_z_mark_plt=no
> +fi
> +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
> +printf "%s\n" "$libc_linker_feature" >&6; }
> +config_vars="$config_vars
> +have-z-mark-plt = $libc_cv_z_mark_plt"
> +
>  if test x"$build_mathvec" = xnotset; then
>    build_mathvec=yes
>  fi
> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index 1215dcb1e4..40f642db31 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -10,6 +10,10 @@ LIBC_TRY_CC_OPTION([-mprefer-vector-width=128],
>  LIBC_CONFIG_VAR([config-cflags-mprefer-vector-width],
>  		[$libc_cv_cc_mprefer_vector_width])
>  
> +LIBC_LINKER_FEATURE([-z mark-plt], [-Wl,-z,mark-plt],
> +                    [libc_cv_z_mark_plt=yes], [libc_cv_z_mark_plt=no])
> +LIBC_CONFIG_VAR([have-z-mark-plt], [$libc_cv_z_mark_plt])
> +
>  if test x"$build_mathvec" = xnotset; then
>    build_mathvec=yes
>  fi
> diff --git a/sysdeps/x86_64/dl-dtprocnum.h b/sysdeps/x86_64/dl-dtprocnum.h
> new file mode 100644
> index 0000000000..dd41d95b88
> --- /dev/null
> +++ b/sysdeps/x86_64/dl-dtprocnum.h
> @@ -0,0 +1,21 @@
> +/* Configuration of lookup functions.  x86-64 version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Number of extra dynamic section entries for this architecture.  By
> +   default there are none.  */
> +#define DT_THISPROCNUM	DT_X86_64_NUM
> diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> index e0a9b14469..51935a2fb2 100644
> --- a/sysdeps/x86_64/dl-machine.h
> +++ b/sysdeps/x86_64/dl-machine.h
> @@ -22,6 +22,7 @@
>  #define ELF_MACHINE_NAME "x86_64"
>  
>  #include <assert.h>
> +#include <stdint.h>
>  #include <sys/param.h>
>  #include <sysdep.h>
>  #include <tls.h>
> @@ -35,6 +36,9 @@
>  # define RTLD_START_ENABLE_X86_FEATURES
>  #endif
>  
> +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> +#define DT_X86_64(x) (DT_X86_64_##x - DT_LOPROC + DT_NUM)
> +
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
>  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
> @@ -312,8 +316,9 @@ and creates an unsatisfiable circular dependency.\n",
>  
>        switch (r_type)
>  	{
> -	case R_X86_64_GLOB_DAT:
>  	case R_X86_64_JUMP_SLOT:
> +	  map->l_has_jump_slot_reloc = true;

Although it is not really defined in the code guidelines, I think it would
be good to start annotating switch fallthrough (with a comment
/* fallthrough */ at least).
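
Something like this (editor sketch of the annotated form):

	case R_X86_64_JUMP_SLOT:
	  map->l_has_jump_slot_reloc = true;
	  /* fallthrough */
	case R_X86_64_GLOB_DAT:
	  *reloc_addr = value;
	  break;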

> +	case R_X86_64_GLOB_DAT:
>  	  *reloc_addr = value;
>  	  break;
>  
> @@ -549,3 +554,246 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>  }
>  
>  #endif /* RESOLVE_MAP */
> +
> +#if !defined ELF_DYNAMIC_AFTER_RELOC && !defined RTLD_BOOTSTRAP \
> +    && defined SHARED
> +# define ELF_DYNAMIC_AFTER_RELOC(map, lazy) \
> +  x86_64_dynamic_after_reloc (map, (lazy))
> +
> +# define JMP32_INSN_OPCODE	0xe9
> +# define JMP32_INSN_SIZE	5
> +# define JMPABS_INSN_OPCODE	0xa100d5
> +# define JMPABS_INSN_SIZE	11
> +# define INT3_INSN_OPCODE	0xcc
> +
> +static const char *
> +x86_64_reloc_symbol_name (struct link_map *map, const ElfW(Rela) *reloc)
> +{
> +  const ElfW(Sym) *const symtab
> +    = (const void *) map->l_info[DT_SYMTAB]->d_un.d_ptr;
> +  const ElfW(Sym) *const refsym = &symtab[ELFW (R_SYM) (reloc->r_info)];
> +  const char *strtab = (const char *) map->l_info[DT_STRTAB]->d_un.d_ptr;
> +  return strtab + refsym->st_name;
> +}
> +
> +static void
> +x86_64_rewrite_plt (struct link_map *map, ElfW(Addr) plt_rewrite,
> +		    ElfW(Addr) plt_aligned)
> +{
> +  ElfW(Addr) plt_rewrite_bias = plt_rewrite - plt_aligned;
> +  ElfW(Addr) l_addr = map->l_addr;
> +  ElfW(Addr) pltent = map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val;
> +  ElfW(Addr) start = map->l_info[DT_JMPREL]->d_un.d_ptr;
> +  ElfW(Addr) size = map->l_info[DT_PLTRELSZ]->d_un.d_val;
> +  const ElfW(Rela) *reloc = (const void *) start;
> +  const ElfW(Rela) *reloc_end = (const void *) (start + size);
> +
> +  unsigned int feature_1 = THREAD_GETMEM (THREAD_SELF,
> +					  header.feature_1);
> +  bool ibt_enabled_p
> +    = (feature_1 & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +    _dl_debug_printf ("\nchanging PLT in '%s' to direct branch\n",
> +		      DSO_FILENAME (map->l_name));
> +
> +  for (; reloc < reloc_end; reloc++)
> +    if (ELFW(R_TYPE) (reloc->r_info) == R_X86_64_JUMP_SLOT)
> +      {
> +	/* Get the value from the GOT entry.  */
> +	ElfW(Addr) value = *(ElfW(Addr) *) (l_addr + reloc->r_offset);
> +
> +	/* Get the corresponding PLT entry from r_addend.  */
> +	ElfW(Addr) branch_start = l_addr + reloc->r_addend;
> +	/* Skip ENDBR64 if IBT isn't enabled.  */
> +	if (!ibt_enabled_p)
> +	  branch_start = ALIGN_DOWN (branch_start, pltent);
> +	/* Get the displacement from the branch target.  */
> +	ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
> +	ElfW(Addr) plt_end;
> +	ElfW(Addr) pad;
> +
> +	branch_start += plt_rewrite_bias;

Would be possible to overflow in this case?

> +	plt_end = (branch_start & -pltent) + pltent;
> +
> +	/* Update the PLT entry.  */
> +	if (((uint64_t) disp + (uint64_t) ((uint32_t) INT32_MIN))

Maybe it would be clearer to use the value directly:

  if ((uint64_t) disp + UINT64_C (0x80000000) <= UINT64_C (0xffffffff))

Or (if I am not doing anything wrong):

  if ((uint64_t) disp <= 0x7FFFFFFF)

> +	    <= (uint64_t) UINT32_MAX)
> +	  {
> +	    /* If the target branch can be reached with a direct branch,
> +	       rewrite the PLT entry with a direct branch.  */
> +	    if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> +	      {
> +		const char *sym_name = x86_64_reloc_symbol_name (map,
> +								 reloc);
> +		_dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> +				  "direct branch\n", sym_name,
> +				  DSO_FILENAME (map->l_name));
> +	      }
> +
> +	    pad = branch_start + JMP32_INSN_SIZE;
> +
> +	    if (__glibc_unlikely (pad > plt_end))
> +	      {
> +		if (__glibc_unlikely (GLRO(dl_debug_mask)
> +				      & DL_DEBUG_BINDINGS))
> +		  {
> +		    const char *sym_name
> +		      = x86_64_reloc_symbol_name (map, reloc);
> +		    _dl_debug_printf ("\ninvalid r_addend of "
> +				      "R_X86_64_JUMP_SLOT against '%s' "
> +				      "in '%s'\n", sym_name,
> +				      DSO_FILENAME (map->l_name));
> +		  }
> +
> +		continue;
> +	      }
> +
> +	    /* Write out direct branch.  */
> +	    *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
> +	    *((uint32_t *) (branch_start + 1)) = disp;

These are technically UB, so maybe use memcpy here:

  memcpy ((void *) branch_start, &(uint8_t) { JMP32_INSN_OPCODE }, sizeof (uint8_t));
  memcpy ((void *) (branch_start + 1), &disp, sizeof (uint32_t));

> +	  }
> +	else
> +	  {
> +	    if (GL(dl_x86_feature_control).plt_rewrite
> +		!= plt_rewrite_jmpabs)
> +	      continue;
> +
> +	    pad = branch_start + JMPABS_INSN_SIZE;
> +
> +	    if (pad > plt_end)
> +	      continue;
> +
> +	    /* Rewrite the PLT entry with JMPABS.  */
> +	    if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> +	      {
> +		const char *sym_name = x86_64_reloc_symbol_name (map,
> +								 reloc);
> +		_dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> +				  "JMPABS\n", sym_name,
> +				  DSO_FILENAME (map->l_name));
> +	      }
> +
> +	    /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
> +	       a 3-byte opcode + 64bit address.  There is a 1-byte overlap
> +	       between 4-byte write and 8-byte write.  */
> +	    *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
> +	    *(uint64_t *) (branch_start + 3) = value;
> +	  }
> +
> +	/* Fill the unused part of the PLT entry with INT3.  */
> +	for (; pad < plt_end; pad++)
> +	  *(uint8_t *) pad = INT3_INSN_OPCODE;
> +      }
> +}
> +
> +static inline void
> +x86_64_rewrite_plt_in_place (struct link_map *map)
> +{
> +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
> +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
> +		    + map->l_addr);
> +  size_t pagesize = GLRO(dl_pagesize);
> +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
> +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +		  + plt - plt_aligned);
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
> +		      DSO_FILENAME (map->l_name));
> +
> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> +				    PROT_WRITE | PROT_READ) < 0))
> +    {
> +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +	_dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
> +			  DSO_FILENAME (map->l_name));
> +      return;
> +    }
> +
> +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
> +          DSO_FILENAME (map->l_name));
> +
> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> +				    PROT_EXEC | PROT_READ) < 0))
> +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
> +		      "failed to change PLT back to read-only");
> +}
> +
> +/* Rewrite PLT entries to direct branch if possible.  */
> +
> +static inline void
> +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
> +{
> +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
> +  if (lazy)
> +    return;
> +
> +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
> +  if (!map->l_has_jump_slot_reloc)
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
> +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT if
> +     1. DT_JMPREL isn't available or its value is 0.
> +     2. DT_PLTRELSZ is 0.
> +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
> +	16 bytes.
> +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
> +	DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
> +	value.  */
> +  if (map->l_info[DT_JMPREL] == NULL
> +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
> +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
> +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
> +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
> +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +	  < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +	  % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
> +    return;
> +
> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> +    {
> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> +
> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> +			  MAP_PRIVATE | MAP_ANONYMOUS,
> +			  -1, 0);
> +      if (__glibc_unlikely (plt != MAP_FAILED))
> +	{
> +	  /* Touch the PROT_READ | PROT_WRITE page.  */
> +	  *(int32_t *) plt = -1;
> +
> +	  /* If the updated PROT_READ | PROT_WRITE page can be changed
> +	     to PROT_EXEC | PROT_READ, rewrite PLT.  */
> +	  if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
> +	    /* Use JMPABS on APX processors.  */
> +	    plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
> +						  APX_F)
> +			   ? plt_rewrite_jmpabs : plt_rewrite_jmp);
> +
> +	  __munmap (plt, 4096);
> +	}

Do we have an x86_64 CPU/kernel that does not support setting PROT_EXEC | PROT_READ
(afaik only i386 has some restrictions)?  Because this runtime test is somewhat
brittle: mmap might fail or be restricted (for instance due to filtering or
resource limitation/exhaustion).

> +
> +      GL(dl_x86_feature_control).plt_rewrite = plt_rewrite;
> +    }
> +
> +  /* Ignore DT_X86_64_PLT if PLT rewrite isn't enabled.  */
> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_none)
> +    return;
> +
> +  x86_64_rewrite_plt_in_place (map);
> +}
> +#endif
> diff --git a/sysdeps/x86_64/link_map.h b/sysdeps/x86_64/link_map.h
> new file mode 100644
> index 0000000000..ddb8e78077
> --- /dev/null
> +++ b/sysdeps/x86_64/link_map.h
> @@ -0,0 +1,22 @@
> +/* Additional fields in struct link_map.  x86-64 version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Has R_X86_64_JUMP_SLOT relocation.  */
> +bool l_has_jump_slot_reloc;
> +
> +#include <sysdeps/x86/link_map.h>
> diff --git a/sysdeps/x86_64/tst-plt-rewrite1.c b/sysdeps/x86_64/tst-plt-rewrite1.c
> new file mode 100644
> index 0000000000..0dd4724a5a
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-plt-rewrite1.c
> @@ -0,0 +1,31 @@
> +/* Test PLT rewrite.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <string.h>
> +#include <support/check.h>
> +
> +extern const char *foo (void);
> +
> +static int
> +do_test (void)
> +{
> +  TEST_COMPARE (strcmp (foo (), "PLT rewrite works"), 0);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/x86_64/tst-plt-rewritemod1.c b/sysdeps/x86_64/tst-plt-rewritemod1.c
> new file mode 100644
> index 0000000000..361b907790
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-plt-rewritemod1.c
> @@ -0,0 +1,32 @@
> +/* Check PLT rewrite works correctly.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* foo calls bar with indirect branch via PLT.  PLT rewrite should
> +   change it to direct branch.  */
> +
> +const char *
> +bar (void)
> +{
> +  return "PLT rewrite works";
> +}
> +
> +const char *
> +foo (void)
> +{
> +  return bar ();
> +}
  
Florian Weimer Jan. 2, 2024, 4:33 p.m. UTC | #2
* Adhemerval Zanella Netto:

>> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
>> +    {
>> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
>> +
>> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
>> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
>> +			  MAP_PRIVATE | MAP_ANONYMOUS,
>> +			  -1, 0);
>> +      if (__glibc_unlikely (plt != MAP_FAILED))
>> +	{
>> +	  /* Touch the PROT_READ | PROT_WRITE page.  */
>> +	  *(int32_t *) plt = -1;
>> +
>> +	  /* If the updated PROT_READ | PROT_WRITE page can be changed
>> +	     to PROT_EXEC | PROT_READ, rewrite PLT.  */
>> +	  if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
>> +	    /* Use JMPABS on APX processors.  */
>> +	    plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
>> +						  APX_F)
>> +			   ? plt_rewrite_jmpabs : plt_rewrite_jmp);
>> +
>> +	  __munmap (plt, 4096);
>> +	}
>
> Do we have an x86_64 CPU/kernel that does not support setting PROT_EXEC |
> PROT_READ (afaik only i386 has some restrictions)?  Because this
> runtime test is somewhat brittle: mmap might fail or be restricted
> (for instance due to filtering or resource limitation/exhaustion).

I assume you are referring to read-implies-exec here?  Some early x86-64
CPUs have this property as well.  But even for those, the protection
flags are tracked by the kernel separately, show up in /proc, etc.
There's just no CPU enforcement.

But I think H.J.'s checks are about probing for restrictions imposed by
Linux security modules.  Some do not allow mapping memory as read/write
first and then changing it to executable.

Thanks,
Florian
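
(Editor illustration: a minimal standalone version of the probe the
patch performs.  Under a security policy that forbids turning a dirtied
anonymous read/write mapping into an executable one, the mprotect call
fails and PLT rewrite stays disabled.)

  #include <stdio.h>
  #include <sys/mman.h>

  int
  main (void)
  {
    void *p = mmap (NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
      return 1;
    /* Touch the page so it is no longer a clean anonymous mapping.  */
    *(volatile int *) p = -1;
    int ok = mprotect (p, 4096, PROT_READ | PROT_EXEC) == 0;
    printf ("RW->RX mprotect %s\n", ok ? "allowed" : "denied");
    munmap (p, 4096);
    return 0;
  }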
  
H.J. Lu Jan. 2, 2024, 5:02 p.m. UTC | #3
On Tue, Jan 2, 2024 at 7:51 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 02/01/24 12:13, H.J. Lu wrote:
> > Changes in v3:
> >
> > 1. Define and use macros for instruction opcodes and sizes.
> > 2. Use INT32_MIN and UINT32_MAX instead of 0x80000000ULL and
> > 0xffffffffULL.
> > 3. Replace 3 1-byte writes with a 4-byte write to write JMPABS.
> > 4. Verify that DT_X86_64_PLTSZ is a multiple of DT_X86_64_PLTENT.
> > 5. Add some comments for mprotect test.
> > 6. Update plt_rewrite logic.
> >
> > Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
> > relocation.
> >
> > For x86-64, add
> >
> >  #define DT_X86_64_PLT     (DT_LOPROC + 0)
> >  #define DT_X86_64_PLTSZ   (DT_LOPROC + 1)
> >  #define DT_X86_64_PLTENT  (DT_LOPROC + 3)
> >
> > 1. DT_X86_64_PLT: The address of the procedure linkage table.
> > 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
> > table.
> > 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
> > entry.
> >
> > The r_addend field of the R_X86_64_JUMP_SLOT relocation is set to the
> > memory offset of the indirect branch instruction.
> >
> > Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
> > with direct branches after relocation when lazy binding is disabled.
> >
> > PLT rewrite is disabled by default since SELinux may disallow modifying
> > code pages and ld.so can't detect it in all cases.  Add
> >
> > $ GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
> >
> > to enable PLT rewrite at run-time.
> > ---
> >  elf/dynamic-link.h                   |   5 +
> >  elf/elf.h                            |   5 +
> >  elf/tst-glibcelf.py                  |   1 +
> >  manual/tunables.texi                 |   9 +
> >  scripts/glibcelf.py                  |   4 +
> >  sysdeps/x86/cet-control.h            |  14 ++
> >  sysdeps/x86/cpu-features.c           |  10 ++
> >  sysdeps/x86/dl-procruntime.c         |   1 +
> >  sysdeps/x86/dl-tunables.list         |   5 +
> >  sysdeps/x86_64/Makefile              |  19 ++
> >  sysdeps/x86_64/configure             |  35 ++++
> >  sysdeps/x86_64/configure.ac          |   4 +
> >  sysdeps/x86_64/dl-dtprocnum.h        |  21 +++
> >  sysdeps/x86_64/dl-machine.h          | 250 ++++++++++++++++++++++++++-
> >  sysdeps/x86_64/link_map.h            |  22 +++
> >  sysdeps/x86_64/tst-plt-rewrite1.c    |  31 ++++
> >  sysdeps/x86_64/tst-plt-rewritemod1.c |  32 ++++
> >  17 files changed, 467 insertions(+), 1 deletion(-)
> >  create mode 100644 sysdeps/x86_64/dl-dtprocnum.h
> >  create mode 100644 sysdeps/x86_64/link_map.h
> >  create mode 100644 sysdeps/x86_64/tst-plt-rewrite1.c
> >  create mode 100644 sysdeps/x86_64/tst-plt-rewritemod1.c
> >
> > diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
> > index 8cdf7bde09..83d834ecaf 100644
> > --- a/elf/dynamic-link.h
> > +++ b/elf/dynamic-link.h
> > @@ -177,6 +177,10 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
> >        }                                                                            \
> >    } while (0);
> >
> > +# ifndef ELF_DYNAMIC_AFTER_RELOC
> > +#  define ELF_DYNAMIC_AFTER_RELOC(map, lazy)
> > +# endif
> > +
> >  /* This can't just be an inline function because GCC is too dumb
> >     to inline functions containing inlines themselves.  */
> >  # ifdef RTLD_BOOTSTRAP
> > @@ -192,6 +196,7 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
> >        ELF_DYNAMIC_DO_RELR (map);                                           \
> >      ELF_DYNAMIC_DO_REL ((map), (scope), edr_lazy, skip_ifunc);                     \
> >      ELF_DYNAMIC_DO_RELA ((map), (scope), edr_lazy, skip_ifunc);                    \
> > +    ELF_DYNAMIC_AFTER_RELOC ((map), (edr_lazy));                           \
> >    } while (0)
> >
> >  #endif
> > diff --git a/elf/elf.h b/elf/elf.h
> > index ca6a7d9d67..455731663c 100644
> > --- a/elf/elf.h
> > +++ b/elf/elf.h
> > @@ -3639,6 +3639,11 @@ enum
> >  /* x86-64 sh_type values.  */
> >  #define SHT_X86_64_UNWIND    0x70000001 /* Unwind information.  */
> >
> > +/* x86-64 d_tag values.  */
> > +#define DT_X86_64_PLT                (DT_LOPROC + 0)
> > +#define DT_X86_64_PLTSZ              (DT_LOPROC + 1)
> > +#define DT_X86_64_PLTENT     (DT_LOPROC + 3)
> > +#define DT_X86_64_NUM                4
> >
> >  /* AM33 relocations.  */
> >  #define R_MN10300_NONE               0       /* No reloc.  */
> > diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
> > index 00cd2bba85..c191636a99 100644
> > --- a/elf/tst-glibcelf.py
> > +++ b/elf/tst-glibcelf.py
> > @@ -187,6 +187,7 @@ DT_VALNUM
> >  DT_VALRNGHI
> >  DT_VALRNGLO
> >  DT_VERSIONTAGNUM
> > +DT_X86_64_NUM
> >  ELFCLASSNUM
> >  ELFDATANUM
> >  EM_NUM
> > diff --git a/manual/tunables.texi b/manual/tunables.texi
> > index b31f16da84..f9bd83622e 100644
> > --- a/manual/tunables.texi
> > +++ b/manual/tunables.texi
> > @@ -57,6 +57,7 @@ glibc.pthread.stack_cache_size: 0x2800000 (min: 0x0, max: 0xffffffffffffffff)
> >  glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
> >  glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
> >  glibc.elision.skip_trylock_internal_abort: 3 (min: 0, max: 2147483647)
> > +glibc.cpu.plt_rewrite: 0 (min: 0, max: 1)
> >  glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> >  glibc.cpu.x86_ibt:
> >  glibc.cpu.hwcaps:
> > @@ -614,6 +615,14 @@ this tunable.
> >  This tunable is specific to 64-bit x86-64.
> >  @end deftp
> >
> > +@deftp Tunable glibc.cpu.plt_rewrite
> > +When this tunable is set to @code{1}, the dynamic linker will rewrite
> > +the PLT section with direct branches after relocation, if possible,
> > +when lazy binding is disabled.
> > +
> > +This tunable is specific to x86-64.
> > +@end deftp
> > +
> >  @node Memory Related Tunables
> >  @section Memory Related Tunables
> >  @cindex memory related tunables
> > diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> > index c5e5dda48e..5f3813f326 100644
> > --- a/scripts/glibcelf.py
> > +++ b/scripts/glibcelf.py
> > @@ -439,6 +439,8 @@ class DtRISCV(Dt):
> >      """Supplemental DT_* constants for EM_RISCV."""
> >  class DtSPARC(Dt):
> >      """Supplemental DT_* constants for EM_SPARC."""
> > +class DtX86_64(Dt):
> > +    """Supplemental DT_* constants for EM_X86_64."""
> >  _dt_skip = '''
> >  DT_ENCODING DT_PROCNUM
> >  DT_ADDRRNGLO DT_ADDRRNGHI DT_ADDRNUM
> > @@ -451,6 +453,7 @@ DT_MIPS_NUM
> >  DT_PPC_NUM
> >  DT_PPC64_NUM
> >  DT_SPARC_NUM
> > +DT_X86_64_NUM
> >  '''.strip().split()
> >  _register_elf_h(DtAARCH64, prefix='DT_AARCH64_', skip=_dt_skip, parent=Dt)
> >  _register_elf_h(DtALPHA, prefix='DT_ALPHA_', skip=_dt_skip, parent=Dt)
> > @@ -461,6 +464,7 @@ _register_elf_h(DtPPC, prefix='DT_PPC_', skip=_dt_skip, parent=Dt)
> >  _register_elf_h(DtPPC64, prefix='DT_PPC64_', skip=_dt_skip, parent=Dt)
> >  _register_elf_h(DtRISCV, prefix='DT_RISCV_', skip=_dt_skip, parent=Dt)
> >  _register_elf_h(DtSPARC, prefix='DT_SPARC_', skip=_dt_skip, parent=Dt)
> > +_register_elf_h(DtX86_64, prefix='DT_X86_64_', skip=_dt_skip, parent=Dt)
> >  _register_elf_h(Dt, skip=_dt_skip, ranges=True)
> >  del _dt_skip
> >
> > diff --git a/sysdeps/x86/cet-control.h b/sysdeps/x86/cet-control.h
> > index a45d59bf8c..d4341f98ba 100644
> > --- a/sysdeps/x86/cet-control.h
> > +++ b/sysdeps/x86/cet-control.h
> > @@ -32,10 +32,24 @@ enum dl_x86_cet_control
> >    cet_permissive
> >  };
> >
> > +/* PLT rewrite control.  */
> > +enum dl_plt_rewrite_control
> > +{
> > +  /* No PLT rewrite.  */
> > +  plt_rewrite_none,
> > +  /* PLT rewrite is enabled at run-time.  */
> > +  plt_rewrite_enabled,
> > +  /* Rewrite PLT with JMP at run-time.  */
> > +  plt_rewrite_jmp,
> > +  /* Rewrite PLT with JMPABS at run-time.  */
> > +  plt_rewrite_jmpabs
> > +};
> > +
> >  struct dl_x86_feature_control
> >  {
> >    enum dl_x86_cet_control ibt : 2;
> >    enum dl_x86_cet_control shstk : 2;
> > +  enum dl_plt_rewrite_control plt_rewrite : 2;
> >  };
> >
> >  #endif /* cet-control.h */
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index f193ea7a2d..f96e6489ed 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -27,6 +27,13 @@
> >  extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> >    attribute_hidden;
> >
> > +static void
> > +TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > +{
> > +  if (valp->numval)
>
> No implicit int checks.

Will fix.

> > +    GL(dl_x86_feature_control).plt_rewrite = plt_rewrite_enabled;
> > +}
> > +
> >  #ifdef __LP64__
> >  static void
> >  TUNABLE_CALLBACK (set_prefer_map_32bit_exec) (tunable_val_t *valp)
> > @@ -996,6 +1003,9 @@ no_cpuid:
> >
> >    TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
> >
> > +  TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > +            TUNABLE_CALLBACK (set_plt_rewrite));
> > +
> >  #ifdef __LP64__
> >    TUNABLE_GET (prefer_map_32bit_exec, tunable_val_t *,
> >              TUNABLE_CALLBACK (set_prefer_map_32bit_exec));
> > diff --git a/sysdeps/x86/dl-procruntime.c b/sysdeps/x86/dl-procruntime.c
> > index 4d25d9f327..15b3d0d878 100644
> > --- a/sysdeps/x86/dl-procruntime.c
> > +++ b/sysdeps/x86/dl-procruntime.c
> > @@ -67,6 +67,7 @@ PROCINFO_CLASS struct dl_x86_feature_control _dl_x86_feature_control
> >  = {
> >      .ibt = DEFAULT_DL_X86_CET_CONTROL,
> >      .shstk = DEFAULT_DL_X86_CET_CONTROL,
> > +    .plt_rewrite = plt_rewrite_none,
> >    }
> >  # endif
> >  # if !defined SHARED || defined PROCINFO_DECL
> > diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list
> > index 147a7270ec..e2e441e1b7 100644
> > --- a/sysdeps/x86/dl-tunables.list
> > +++ b/sysdeps/x86/dl-tunables.list
> > @@ -66,5 +66,10 @@ glibc {
> >      x86_shared_cache_size {
> >        type: SIZE_T
> >      }
> > +    plt_rewrite {
> > +      type: INT_32
> > +      minval: 0
> > +      maxval: 1
> > +    }
> >    }
> >  }
> > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > index 00120ca9ca..106f1d0f55 100644
> > --- a/sysdeps/x86_64/Makefile
> > +++ b/sysdeps/x86_64/Makefile
> > @@ -175,6 +175,25 @@ ifeq (no,$(build-hardcoded-path-in-tests))
> >  tests-container += tst-glibc-hwcaps-cache
> >  endif
> >
> > +ifeq (yes,$(have-z-mark-plt))
> > +tests += \
> > +  tst-plt-rewrite1 \
> > +# tests
> > +modules-names += \
> > +  tst-plt-rewritemod1 \
> > +# modules-names
> > +
> > +tst-plt-rewrite1-no-pie = yes
> > +LDFLAGS-tst-plt-rewrite1 = -Wl,-z,mark-plt,-z,now
> > +LDFLAGS-tst-plt-rewritemod1.so = -Wl,-z,mark-plt,-z,now
> > +tst-plt-rewrite1-ENV = GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 LD_DEBUG=files:bindings
> > +$(objpfx)tst-plt-rewrite1: $(objpfx)tst-plt-rewritemod1.so
> > +$(objpfx)tst-plt-rewrite1.out: /dev/null $(objpfx)tst-plt-rewrite1
> > +     $(tst-plt-rewrite1-ENV) $(make-test-out) > $@ 2>&1; \
> > +     grep -q -E "changing 'bar' PLT entry in .*/elf/tst-plt-rewritemod1.so' to direct branch" $@; \
> > +     $(evaluate-test)
> > +endif
> > +
> >  endif # $(subdir) == elf
> >
> >  ifeq ($(subdir),csu)
> > diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
> > index e307467afa..a15eaa349d 100755
> > --- a/sysdeps/x86_64/configure
> > +++ b/sysdeps/x86_64/configure
> > @@ -25,6 +25,41 @@ printf "%s\n" "$libc_cv_cc_mprefer_vector_width" >&6; }
> >  config_vars="$config_vars
> >  config-cflags-mprefer-vector-width = $libc_cv_cc_mprefer_vector_width"
> >
> > +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z mark-plt" >&5
> > +printf %s "checking for linker that supports -z mark-plt... " >&6; }
> > +libc_linker_feature=no
> > +cat > conftest.c <<EOF
> > +int _start (void) { return 42; }
> > +EOF
> > +if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
> > +               -Wl,-z,mark-plt -nostdlib -nostartfiles
> > +               -fPIC -shared -o conftest.so conftest.c
> > +               1>&5'
> > +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> > +  (eval $ac_try) 2>&5
> > +  ac_status=$?
> > +  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> > +  test $ac_status = 0; }; }
> > +then
> > +  if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,-z,mark-plt -nostdlib \
> > +      -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
> > +      | grep "warning: -z mark-plt ignored" > /dev/null 2>&1; then
> > +    true
> > +  else
> > +    libc_linker_feature=yes
> > +  fi
> > +fi
> > +rm -f conftest*
> > +if test $libc_linker_feature = yes; then
> > +  libc_cv_z_mark_plt=yes
> > +else
> > +  libc_cv_z_mark_plt=no
> > +fi
> > +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
> > +printf "%s\n" "$libc_linker_feature" >&6; }
> > +config_vars="$config_vars
> > +have-z-mark-plt = $libc_cv_z_mark_plt"
> > +
> >  if test x"$build_mathvec" = xnotset; then
> >    build_mathvec=yes
> >  fi
> > diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> > index 1215dcb1e4..40f642db31 100644
> > --- a/sysdeps/x86_64/configure.ac
> > +++ b/sysdeps/x86_64/configure.ac
> > @@ -10,6 +10,10 @@ LIBC_TRY_CC_OPTION([-mprefer-vector-width=128],
> >  LIBC_CONFIG_VAR([config-cflags-mprefer-vector-width],
> >               [$libc_cv_cc_mprefer_vector_width])
> >
> > +LIBC_LINKER_FEATURE([-z mark-plt], [-Wl,-z,mark-plt],
> > +                    [libc_cv_z_mark_plt=yes], [libc_cv_z_mark_plt=no])
> > +LIBC_CONFIG_VAR([have-z-mark-plt], [$libc_cv_z_mark_plt])
> > +
> >  if test x"$build_mathvec" = xnotset; then
> >    build_mathvec=yes
> >  fi
> > diff --git a/sysdeps/x86_64/dl-dtprocnum.h b/sysdeps/x86_64/dl-dtprocnum.h
> > new file mode 100644
> > index 0000000000..dd41d95b88
> > --- /dev/null
> > +++ b/sysdeps/x86_64/dl-dtprocnum.h
> > @@ -0,0 +1,21 @@
> > +/* Configuration of lookup functions.  x86-64 version.
> > +   Copyright (C) 2023 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/* Number of extra dynamic section entries for this architecture.  By
> > +   default there are none.  */
> > +#define DT_THISPROCNUM       DT_X86_64_NUM
> > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > index e0a9b14469..51935a2fb2 100644
> > --- a/sysdeps/x86_64/dl-machine.h
> > +++ b/sysdeps/x86_64/dl-machine.h
> > @@ -22,6 +22,7 @@
> >  #define ELF_MACHINE_NAME "x86_64"
> >
> >  #include <assert.h>
> > +#include <stdint.h>
> >  #include <sys/param.h>
> >  #include <sysdep.h>
> >  #include <tls.h>
> > @@ -35,6 +36,9 @@
> >  # define RTLD_START_ENABLE_X86_FEATURES
> >  #endif
> >
> > +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> > +#define DT_X86_64(x) (DT_X86_64_##x - DT_LOPROC + DT_NUM)
> > +
> >  /* Return nonzero iff ELF header is compatible with the running host.  */
> >  static inline int __attribute__ ((unused))
> >  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
> > @@ -312,8 +316,9 @@ and creates an unsatisfiable circular dependency.\n",
> >
> >        switch (r_type)
> >       {
> > -     case R_X86_64_GLOB_DAT:
> >       case R_X86_64_JUMP_SLOT:
> > +       map->l_has_jump_slot_reloc = true;
>
> Although it is not really defined in the code guidelines, I think it would
> be good to start annotating switch fallthrough (with a comment
> /* fallthrough */ at least).

Will fix.

> > +     case R_X86_64_GLOB_DAT:
> >         *reloc_addr = value;
> >         break;
> >
> > @@ -549,3 +554,246 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
> >  }
> >
> >  #endif /* RESOLVE_MAP */
> > +
> > +#if !defined ELF_DYNAMIC_AFTER_RELOC && !defined RTLD_BOOTSTRAP \
> > +    && defined SHARED
> > +# define ELF_DYNAMIC_AFTER_RELOC(map, lazy) \
> > +  x86_64_dynamic_after_reloc (map, (lazy))
> > +
> > +# define JMP32_INSN_OPCODE   0xe9
> > +# define JMP32_INSN_SIZE     5
> > +# define JMPABS_INSN_OPCODE  0xa100d5
> > +# define JMPABS_INSN_SIZE    11
> > +# define INT3_INSN_OPCODE    0xcc
> > +
> > +static const char *
> > +x86_64_reloc_symbol_name (struct link_map *map, const ElfW(Rela) *reloc)
> > +{
> > +  const ElfW(Sym) *const symtab
> > +    = (const void *) map->l_info[DT_SYMTAB]->d_un.d_ptr;
> > +  const ElfW(Sym) *const refsym = &symtab[ELFW (R_SYM) (reloc->r_info)];
> > +  const char *strtab = (const char *) map->l_info[DT_STRTAB]->d_un.d_ptr;
> > +  return strtab + refsym->st_name;
> > +}
> > +
> > +static void
> > +x86_64_rewrite_plt (struct link_map *map, ElfW(Addr) plt_rewrite,
> > +                 ElfW(Addr) plt_aligned)
> > +{
> > +  ElfW(Addr) plt_rewrite_bias = plt_rewrite - plt_aligned;
> > +  ElfW(Addr) l_addr = map->l_addr;
> > +  ElfW(Addr) pltent = map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val;
> > +  ElfW(Addr) start = map->l_info[DT_JMPREL]->d_un.d_ptr;
> > +  ElfW(Addr) size = map->l_info[DT_PLTRELSZ]->d_un.d_val;
> > +  const ElfW(Rela) *reloc = (const void *) start;
> > +  const ElfW(Rela) *reloc_end = (const void *) (start + size);
> > +
> > +  unsigned int feature_1 = THREAD_GETMEM (THREAD_SELF,
> > +                                       header.feature_1);
> > +  bool ibt_enabled_p
> > +    = (feature_1 & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +    _dl_debug_printf ("\nchanging PLT in '%s' to direct branch\n",
> > +                   DSO_FILENAME (map->l_name));
> > +
> > +  for (; reloc < reloc_end; reloc++)
> > +    if (ELFW(R_TYPE) (reloc->r_info) == R_X86_64_JUMP_SLOT)
> > +      {
> > +     /* Get the value from the GOT entry.  */
> > +     ElfW(Addr) value = *(ElfW(Addr) *) (l_addr + reloc->r_offset);
> > +
> > +     /* Get the corresponding PLT entry from r_addend.  */
> > +     ElfW(Addr) branch_start = l_addr + reloc->r_addend;
> > +     /* Skip ENDBR64 if IBT isn't enabled.  */
> > +     if (!ibt_enabled_p)
> > +       branch_start = ALIGN_DOWN (branch_start, pltent);
> > +     /* Get the displacement from the branch target.  */
> > +     ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
> > +     ElfW(Addr) plt_end;
> > +     ElfW(Addr) pad;
> > +
> > +     branch_start += plt_rewrite_bias;
>
> Would be possible to overflow in this case?

I will drop plt_rewrite_bias since x86_64_rewrite_plt is called with

  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
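
(With both arguments equal, plt_rewrite_bias = plt_rewrite - plt_aligned
is always zero, so the bias addition is dead code.)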

> > +     plt_end = (branch_start & -pltent) + pltent;
> > +
> > +     /* Update the PLT entry.  */
> > +     if (((uint64_t) disp + (uint64_t) ((uint32_t) INT32_MIN))
>
> Maybe it would be clearer to use the value directly:
>
>   if ((uint64_t) disp + UINT64_C (0x80000000) <= UINT64_C (0xffffffff))

I was using

if ((disp + 0x80000000ULL) <= 0xffffffffULL)

before.  Should I change it back, or use UINT64_C?

> Or (if I am not doing anything wrong):
>
>   if ((uint64_t) disp <= 0x7FFFFFFF)

(uint64_t)(INT32_MIN) <= 0x7FFFFFFF

is false.
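
(Editor sketch checking the equivalence: adding 0x80000000 maps the
signed rel32 range [INT32_MIN, INT32_MAX] onto [0, UINT32_MAX], so the
patch's test accepts exactly the displacements encodable as a rel32,
while the plain unsigned compare rejects the negative half.)

  #include <assert.h>
  #include <stdint.h>

  static int
  fits_rel32 (uint64_t disp)
  {
    return (disp + (uint64_t) (uint32_t) INT32_MIN
	    <= (uint64_t) UINT32_MAX);
  }

  int
  main (void)
  {
    assert (fits_rel32 ((uint64_t) INT32_MAX));           /* forward limit */
    assert (fits_rel32 ((uint64_t) (int64_t) INT32_MIN)); /* backward limit */
    assert (!fits_rel32 ((uint64_t) INT32_MAX + 1));      /* out of range */
    /* A negative displacement fails the simpler unsigned compare.  */
    assert ((uint64_t) (int64_t) INT32_MIN > 0x7FFFFFFF);
    return 0;
  }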

>
> > +         <= (uint64_t) UINT32_MAX)
> > +       {
> > +         /* If the target branch can be reached with a direct branch,
> > +            rewrite the PLT entry with a direct branch.  */
> > +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> > +           {
> > +             const char *sym_name = x86_64_reloc_symbol_name (map,
> > +                                                              reloc);
> > +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> > +                               "direct branch\n", sym_name,
> > +                               DSO_FILENAME (map->l_name));
> > +           }
> > +
> > +         pad = branch_start + JMP32_INSN_SIZE;
> > +
> > +         if (__glibc_unlikely (pad > plt_end))
> > +           {
> > +             if (__glibc_unlikely (GLRO(dl_debug_mask)
> > +                                   & DL_DEBUG_BINDINGS))
> > +               {
> > +                 const char *sym_name
> > +                   = x86_64_reloc_symbol_name (map, reloc);
> > +                 _dl_debug_printf ("\ninvalid r_addend of "
> > +                                   "R_X86_64_JUMP_SLOT against '%s' "
> > +                                   "in '%s'\n", sym_name,
> > +                                   DSO_FILENAME (map->l_name));
> > +               }
> > +
> > +             continue;
> > +           }
> > +
> > +         /* Write out direct branch.  */
> > +         *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
> > +         *((uint32_t *) (branch_start + 1)) = disp;
>
> These are technically UB, so maybe use memcpy here:
>
>   memcpy ((void *) branch_start, &(uint8_t) { JMP32_INSN_OPCODE }, sizeof (uint8_t));
>   memcpy ((void *) (branch_start + 1), &disp, sizeof (uint32_t));

It should work since there is no difference from x86-64 memcpy:

L(between_8_15):
        /* From 8 to 15.  No branch when size == 8.  */
        movq    -8(%rsi, %rdx), %rcx
        movq    (%rsi), %rsi
        movq    %rsi, (%rdi)
        movq    %rcx, -8(%rdi, %rdx)
        ret
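
(Editor sketch of the alias-safe variant, assuming branch_start keeps
its ElfW(Addr) type from the patch; GCC folds fixed-size memcpy calls
like these back into the same 1-byte and 4-byte stores.)

  uint8_t opcode = JMP32_INSN_OPCODE;
  uint32_t disp32 = disp;
  memcpy ((void *) branch_start, &opcode, sizeof opcode);
  memcpy ((void *) (branch_start + 1), &disp32, sizeof disp32);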

> > +       }
> > +     else
> > +       {
> > +         if (GL(dl_x86_feature_control).plt_rewrite
> > +             != plt_rewrite_jmpabs)
> > +           continue;
> > +
> > +         pad = branch_start + JMPABS_INSN_SIZE;
> > +
> > +         if (pad > plt_end)
> > +           continue;
> > +
> > +         /* Rewrite the PLT entry with JMPABS.  */
> > +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> > +           {
> > +             const char *sym_name = x86_64_reloc_symbol_name (map,
> > +                                                              reloc);
> > +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> > +                               "JMPABS\n", sym_name,
> > +                               DSO_FILENAME (map->l_name));
> > +           }
> > +
> > +         /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
> > +            a 3-byte opcode + 64bit address.  There is a 1-byte overlap
> > +            between 4-byte write and 8-byte write.  */
> > +         *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
> > +         *(uint64_t *) (branch_start + 3) = value;
> > +       }
> > +
> > +     /* Fill the unused part of the PLT entry with INT3.  */
> > +     for (; pad < plt_end; pad++)
> > +       *(uint8_t *) pad = INT3_INSN_OPCODE;
> > +      }
> > +}
> > +
> > +static inline void
> > +x86_64_rewrite_plt_in_place (struct link_map *map)
> > +{
> > +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
> > +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
> > +                 + map->l_addr);
> > +  size_t pagesize = GLRO(dl_pagesize);
> > +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
> > +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +               + plt - plt_aligned);
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
> > +                   DSO_FILENAME (map->l_name));
> > +
> > +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> > +                                 PROT_WRITE | PROT_READ) < 0))
> > +    {
> > +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +     _dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
> > +                       DSO_FILENAME (map->l_name));
> > +      return;
> > +    }
> > +
> > +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
> > +          DSO_FILENAME (map->l_name));
> > +
> > +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> > +                                 PROT_EXEC | PROT_READ) < 0))
> > +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
> > +                   "failed to change PLT back to read-only");
> > +}
> > +
> > +/* Rewrite PLT entries to direct branch if possible.  */
> > +
> > +static inline void
> > +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
> > +{
> > +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
> > +  if (lazy)
> > +    return;
> > +
> > +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
> > +  if (!map->l_has_jump_slot_reloc)
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
> > +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT if
> > +     1. DT_JMPREL isn't available or its value is 0.
> > +     2. DT_PLTRELSZ is 0.
> > +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
> > +     16 bytes.
> > +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
> > +     DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
> > +     value.  */
> > +  if (map->l_info[DT_JMPREL] == NULL
> > +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
> > +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
> > +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
> > +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
> > +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
> > +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +       < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
> > +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +       % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
> > +    return;
> > +
> > +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> > +    {
> > +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> > +
> > +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> > +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> > +                       MAP_PRIVATE | MAP_ANONYMOUS,
> > +                       -1, 0);
> > +      if (__glibc_unlikely (plt != MAP_FAILED))
> > +     {
> > +       /* Touch the PROT_READ | PROT_WRITE page.  */
> > +       *(int32_t *) plt = -1;
> > +
> > +       /* If the updated PROT_READ | PROT_WRITE page can be changed
> > +          to PROT_EXEC | PROT_READ, rewrite PLT.  */
> > +       if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
> > +         /* Use JMPABS on APX processors.  */
> > +         plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
> > +                                               APX_F)
> > +                        ? plt_rewrite_jmpabs : plt_rewrite_jmp);
> > +
> > +       __munmap (plt, 4096);
> > +     }
>
> Do we have an x86_64 cpu/kernel that does not support setting PROT_EXEC | PROT_READ
> (afaik only i386 has some restrictions)?  Because this runtime test is somewhat
> brittle: mmap might fail or be restricted (for instance due to filtering or
> resource limitation/exhaustion).

As Florian mentioned, it checks for SELinux restrictions.  It isn't 100%
foolproof since SELinux may allow it only for non-program mapped memory
and disallow it for program mapped memory.
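
A standalone reproduction of the probe, for anyone who wants to check
what a given policy allows -- minimal sketch, the program is mine and
not part of the patch:

  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int
  main (void)
  {
    size_t len = 4096;
    void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
      {
        perror ("mmap");
        return 1;
      }
    /* Dirty the page first, as ld.so does, so the kernel sees a
       written-to anonymous mapping before the permission change.  */
    memset (p, 0xcc, len);
    if (mprotect (p, len, PROT_READ | PROT_EXEC) != 0)
      /* Typically fails with EACCES under a restrictive policy.  */
      perror ("mprotect (PROT_READ | PROT_EXEC)");
    else
      puts ("W->X transition allowed; the PLT rewrite probe would pass");
    munmap (p, len);
    return 0;
  }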

Thanks.
  
Adhemerval Zanella Netto Jan. 2, 2024, 5:38 p.m. UTC | #4
On 02/01/24 13:33, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>>> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
>>> +    {
>>> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
>>> +
>>> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
>>> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
>>> +			  MAP_PRIVATE | MAP_ANONYMOUS,
>>> +			  -1, 0);
>>> +      if (__glibc_unlikely (plt != MAP_FAILED))
>>> +	{
>>> +	  /* Touch the PROT_READ | PROT_WRITE page.  */
>>> +	  *(int32_t *) plt = -1;
>>> +
>>> +	  /* If the updated PROT_READ | PROT_WRITE page can be changed
>>> +	     to PROT_EXEC | PROT_READ, rewrite PLT.  */
>>> +	  if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
>>> +	    /* Use JMPABS on APX processors.  */
>>> +	    plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
>>> +						  APX_F)
>>> +			   ? plt_rewrite_jmpabs : plt_rewrite_jmp);
>>> +
>>> +	  __munmap (plt, 4096);
>>> +	}
>>
>> Do we have an x86_64 cpu/kernel that does not support setting PROT_EXEC |
>> PROT_READ (afaik only i386 has some restrictions)?  Because this
>> runtime test is somewhat brittle: mmap might fail or be restricted
>> (for instance due to filtering or resource limitation/exhaustion).
> 
> I assume you are referring to read-implies-exec here?  Some early x86-64
> CPUs have this property as well.  But even for those, the protection
> flags are tracked by the kernel separately, show up in /proc, etc.
> There's just no CPU enforcement.
> 
> But I think H.J.'s checks are about probing for restrictions imposed by
> Linux security modules.  Some do not allow mapping memory as read/write
> first, and then map it executable.

But does it make sense to add such a runtime check when this option is gated
through a tunable?  Especially because it might not fully discover whether
the kernel allows it.
  
Adhemerval Zanella Netto Jan. 2, 2024, 5:42 p.m. UTC | #5
On 02/01/24 14:02, H.J. Lu wrote:
> On Tue, Jan 2, 2024 at 7:51 AM Adhemerval Zanella Netto

>> Maybe it would be clearer to use the value directly:
>>
>>   if ((uint64_t) disp + UINT64_C (0x80000000) <= UINT64_C (0xffffffff))
> 
> I was using
> 
> if ((disp + 0x80000000ULL) <= 0xffffffffULL)
> 
> before.  Should I change it back, or use UINT64_C?

I just saw that you stated it in the cover letter; I don't have a strong opinion,
in fact.  So either should be fine.

> 
>> Or (if I am not doing anything wrong):
>>
>>   if ((uint64_t) disp <= 0x7FFFFFFF)
> 
> (uint64_t)(INT32_MIN) <= 0x7FFFFFFF
> 
> is false.

Yeah, I just realized that the value being compared is a signed one.
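
(To spell out why the shifted comparison works: the addition wraps
modulo 2^64 and maps [INT32_MIN, INT32_MAX] exactly onto
[0, 0xffffffff].  Two worked values, illustrative only:

  disp = (uint64_t) -1   ->  0xffffffffffffffff + 0x80000000
                             wraps to 0x7fffffff, in range
  disp = 0x80000000      ->  0x80000000 + 0x80000000
                             = 0x100000000, out of range

so the test accepts precisely the displacements representable as
int32.)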

> 
>>
>>> +         <= (uint64_t) UINT32_MAX)
>>> +       {
>>> +         /* If the target branch can be reached with a direct branch,
>>> +            rewrite the PLT entry with a direct branch.  */
>>> +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
>>> +           {
>>> +             const char *sym_name = x86_64_reloc_symbol_name (map,
>>> +                                                              reloc);
>>> +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
>>> +                               "direct branch\n", sym_name,
>>> +                               DSO_FILENAME (map->l_name));
>>> +           }
>>> +
>>> +         pad = branch_start + JMP32_INSN_SIZE;
>>> +
>>> +         if (__glibc_unlikely (pad > plt_end))
>>> +           {
>>> +             if (__glibc_unlikely (GLRO(dl_debug_mask)
>>> +                                   & DL_DEBUG_BINDINGS))
>>> +               {
>>> +                 const char *sym_name
>>> +                   = x86_64_reloc_symbol_name (map, reloc);
>>> +                 _dl_debug_printf ("\ninvalid r_addend of "
>>> +                                   "R_X86_64_JUMP_SLOT against '%s' "
>>> +                                   "in '%s'\n", sym_name,
>>> +                                   DSO_FILENAME (map->l_name));
>>> +               }
>>> +
>>> +             continue;
>>> +           }
>>> +
>>> +         /* Write out direct branch.  */
>>> +         *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
>>> +         *((uint32_t *) (branch_start + 1)) = disp;
>>
>> These are technically UB, so maybe use memcpy here:
>>
>>   memcpy (branch_start, &(uint8_t) { JMP32_INSN_OPCODE}, sizeof (uint8_t));
>>   memcpy (branch_start + 1, &disp, sizeof (uint32_t));
> 
> It should work since it is no different from what the x86-64 memcpy does:
> 
> L(between_8_15):
>         /* From 8 to 15.  No branch when size == 8.  */
>         movq    -8(%rsi, %rdx), %rcx
>         movq    (%rsi), %rsi
>         movq    %rsi, (%rdi)
>         movq    %rcx, -8(%rdi, %rdx)
>         ret

And I would expect the compiler to inline these short memcpy calls anyway.
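
E.g., a fixed-size copy such as

  memcpy ((void *) (branch_start + 1), &disp32, sizeof (uint32_t));

should be expanded at -O2 into a single 32-bit store (assuming disp32
is the displacement already narrowed to uint32_t), so there is no
call overhead to worry about.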

> 
>>> +       }
>>> +     else
>>> +       {
>>> +         if (GL(dl_x86_feature_control).plt_rewrite
>>> +             != plt_rewrite_jmpabs)
>>> +           continue;
>>> +
>>> +         pad = branch_start + JMPABS_INSN_SIZE;
>>> +
>>> +         if (pad > plt_end)
>>> +           continue;
>>> +
>>> +         /* Rewrite the PLT entry with JMPABS.  */
>>> +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
>>> +           {
>>> +             const char *sym_name = x86_64_reloc_symbol_name (map,
>>> +                                                              reloc);
>>> +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
>>> +                               "JMPABS\n", sym_name,
>>> +                               DSO_FILENAME (map->l_name));
>>> +           }
>>> +
>>> +         /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
>>> +            a 3-byte opcode + 64bit address.  There is a 1-byte overlap
>>> +            between 4-byte write and 8-byte write.  */
>>> +         *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
>>> +         *(uint64_t *) (branch_start + 3) = value;
>>> +       }
>>> +
>>> +     /* Fill the unused part of the PLT entry with INT3.  */
>>> +     for (; pad < plt_end; pad++)
>>> +       *(uint8_t *) pad = INT3_INSN_OPCODE;
>>> +      }
>>> +}
>>> +
>>> +static inline void
>>> +x86_64_rewrite_plt_in_place (struct link_map *map)
>>> +{
>>> +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
>>> +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
>>> +                 + map->l_addr);
>>> +  size_t pagesize = GLRO(dl_pagesize);
>>> +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
>>> +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
>>> +               + plt - plt_aligned);
>>> +
>>> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
>>> +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
>>> +                   DSO_FILENAME (map->l_name));
>>> +
>>> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
>>> +                                 PROT_WRITE | PROT_READ) < 0))
>>> +    {
>>> +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
>>> +     _dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
>>> +                       DSO_FILENAME (map->l_name));
>>> +      return;
>>> +    }
>>> +
>>> +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
>>> +
>>> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
>>> +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
>>> +          DSO_FILENAME (map->l_name));
>>> +
>>> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
>>> +                                 PROT_EXEC | PROT_READ) < 0))
>>> +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
>>> +                   "failed to change PLT back to read-only");
>>> +}
>>> +
>>> +/* Rewrite PLT entries to direct branch if possible.  */
>>> +
>>> +static inline void
>>> +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
>>> +{
>>> +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
>>> +  if (lazy)
>>> +    return;
>>> +
>>> +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
>>> +    return;
>>> +
>>> +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
>>> +  if (!map->l_has_jump_slot_reloc)
>>> +    return;
>>> +
>>> +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
>>> +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
>>> +    return;
>>> +
>>> +  /* Ignore DT_X86_64_PLT if
>>> +     1. DT_JMPREL isn't available or its value is 0.
>>> +     2. DT_PLTRELSZ is 0.
>>> +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
>>> +     16 bytes.
>>> +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
>>> +     DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
>>> +     value.  */
>>> +  if (map->l_info[DT_JMPREL] == NULL
>>> +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
>>> +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
>>> +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
>>> +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
>>> +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
>>> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
>>> +       < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
>>> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
>>> +       % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
>>> +    return;
>>> +
>>> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
>>> +    {
>>> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
>>> +
>>> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
>>> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
>>> +                       MAP_PRIVATE | MAP_ANONYMOUS,
>>> +                       -1, 0);
>>> +      if (__glibc_unlikely (plt != MAP_FAILED))
>>> +     {
>>> +       /* Touch the PROT_READ | PROT_WRITE page.  */
>>> +       *(int32_t *) plt = -1;
>>> +
>>> +       /* If the updated PROT_READ | PROT_WRITE page can be changed
>>> +          to PROT_EXEC | PROT_READ, rewrite PLT.  */
>>> +       if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
>>> +         /* Use JMPABS on APX processors.  */
>>> +         plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
>>> +                                               APX_F)
>>> +                        ? plt_rewrite_jmpabs : plt_rewrite_jmp);
>>> +
>>> +       __munmap (plt, 4096);
>>> +     }
>>
> >> Do we have an x86_64 cpu/kernel that does not support setting PROT_EXEC | PROT_READ
> >> (afaik only i386 has some restrictions)?  Because this runtime test is somewhat
> >> brittle: mmap might fail or be restricted (for instance due to filtering or
> >> resource limitation/exhaustion).
> 
> As Florian mentioned, it checks for SELinux restrictions.  It isn't 100%
> foolproof since SELinux may allow it only for non-program mapped memory
> and disallow it for program mapped memory.
> 

Yeah, I missed the previous conversation about the SELinux issue.  I don't have
a strong preference, but I do tend to see such runtime checks as adding
maintainability issues, especially in this case where they are not really
bullet-proof.
  
H.J. Lu Jan. 2, 2024, 5:53 p.m. UTC | #6
On Tue, Jan 2, 2024 at 9:42 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 02/01/24 14:02, H.J. Lu wrote:
> > On Tue, Jan 2, 2024 at 7:51 AM Adhemerval Zanella Netto
>
> >> Maybe it would be clearer to use the value directly:
> >>
> >>   if ((uint64_t) disp + UINT64_C (0x80000000) <= UINT64_C (0xffffffff))
> >
> > I was using
> >
> > if ((disp + 0x80000000ULL) <= 0xffffffffULL)
> >
> > before.  Should I change it back, or use UINT64_C?
>
> I just saw that you stated it in the cover letter; I don't have a strong opinion,
> in fact.  So either should be fine.

I will keep it as is.

> >
> >> Or (if I am not doing anything wrong):
> >>
> >>   if ((uint64_t) disp <= 0x7FFFFFFF)
> >
> > (uint64_t)(INT32_MIN) <= 0x7FFFFFFF
> >
> > is false.
>
> Yeah, I just realized that the value being compared is a signed one.
>
> >
> >>
> >>> +         <= (uint64_t) UINT32_MAX)
> >>> +       {
> >>> +         /* If the target branch can be reached with a direct branch,
> >>> +            rewrite the PLT entry with a direct branch.  */
> >>> +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> >>> +           {
> >>> +             const char *sym_name = x86_64_reloc_symbol_name (map,
> >>> +                                                              reloc);
> >>> +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> >>> +                               "direct branch\n", sym_name,
> >>> +                               DSO_FILENAME (map->l_name));
> >>> +           }
> >>> +
> >>> +         pad = branch_start + JMP32_INSN_SIZE;
> >>> +
> >>> +         if (__glibc_unlikely (pad > plt_end))
> >>> +           {
> >>> +             if (__glibc_unlikely (GLRO(dl_debug_mask)
> >>> +                                   & DL_DEBUG_BINDINGS))
> >>> +               {
> >>> +                 const char *sym_name
> >>> +                   = x86_64_reloc_symbol_name (map, reloc);
> >>> +                 _dl_debug_printf ("\ninvalid r_addend of "
> >>> +                                   "R_X86_64_JUMP_SLOT against '%s' "
> >>> +                                   "in '%s'\n", sym_name,
> >>> +                                   DSO_FILENAME (map->l_name));
> >>> +               }
> >>> +
> >>> +             continue;
> >>> +           }
> >>> +
> >>> +         /* Write out direct branch.  */
> >>> +         *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
> >>> +         *((uint32_t *) (branch_start + 1)) = disp;
> >>
> >> These are technically UB, so maybe use memcpy here:
> >>
> >>   memcpy (branch_start, &(uint8_t) { JMP32_INSN_OPCODE}, sizeof (uint8_t));
> >>   memcpy (branch_start + 1, &disp, sizeof (uint32_t));
> >
> > It should work since it is no different from what the x86-64 memcpy does:
> >
> > L(between_8_15):
> >         /* From 8 to 15.  No branch when size == 8.  */
> >         movq    -8(%rsi, %rdx), %rcx
> >         movq    (%rsi), %rsi
> >         movq    %rsi, (%rdi)
> >         movq    %rcx, -8(%rdi, %rdx)
> >         ret
>
> And I would expect the compiler to inline these short memcpy calls anyway.

If
       case R_X86_64_PC32:
          value += reloc->r_addend - (ElfW(Addr)) reloc_addr;
          *(unsigned int *) reloc_addr = value;

in dl-machine.h works,

            *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
            *((uint32_t *) (branch_start + 1)) = disp;

must work.

> >
> >>> +       }
> >>> +     else
> >>> +       {
> >>> +         if (GL(dl_x86_feature_control).plt_rewrite
> >>> +             != plt_rewrite_jmpabs)
> >>> +           continue;
> >>> +
> >>> +         pad = branch_start + JMPABS_INSN_SIZE;
> >>> +
> >>> +         if (pad > plt_end)
> >>> +           continue;
> >>> +
> >>> +         /* Rewrite the PLT entry with JMPABS.  */
> >>> +         if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> >>> +           {
> >>> +             const char *sym_name = x86_64_reloc_symbol_name (map,
> >>> +                                                              reloc);
> >>> +             _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> >>> +                               "JMPABS\n", sym_name,
> >>> +                               DSO_FILENAME (map->l_name));
> >>> +           }
> >>> +
> >>> +         /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
> >>> +            a 3-byte opcode + 64bit address.  There is a 1-byte overlap
> >>> +            between 4-byte write and 8-byte write.  */
> >>> +         *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
> >>> +         *(uint64_t *) (branch_start + 3) = value;
> >>> +       }
> >>> +
> >>> +     /* Fill the unused part of the PLT entry with INT3.  */
> >>> +     for (; pad < plt_end; pad++)
> >>> +       *(uint8_t *) pad = INT3_INSN_OPCODE;
> >>> +      }
> >>> +}
> >>> +
> >>> +static inline void
> >>> +x86_64_rewrite_plt_in_place (struct link_map *map)
> >>> +{
> >>> +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
> >>> +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
> >>> +                 + map->l_addr);
> >>> +  size_t pagesize = GLRO(dl_pagesize);
> >>> +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
> >>> +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> >>> +               + plt - plt_aligned);
> >>> +
> >>> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> >>> +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
> >>> +                   DSO_FILENAME (map->l_name));
> >>> +
> >>> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> >>> +                                 PROT_WRITE | PROT_READ) < 0))
> >>> +    {
> >>> +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> >>> +     _dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
> >>> +                       DSO_FILENAME (map->l_name));
> >>> +      return;
> >>> +    }
> >>> +
> >>> +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
> >>> +
> >>> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> >>> +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
> >>> +          DSO_FILENAME (map->l_name));
> >>> +
> >>> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> >>> +                                 PROT_EXEC | PROT_READ) < 0))
> >>> +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
> >>> +                   "failed to change PLT back to read-only");
> >>> +}
> >>> +
> >>> +/* Rewrite PLT entries to direct branch if possible.  */
> >>> +
> >>> +static inline void
> >>> +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
> >>> +{
> >>> +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
> >>> +  if (lazy)
> >>> +    return;
> >>> +
> >>> +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
> >>> +    return;
> >>> +
> >>> +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
> >>> +  if (!map->l_has_jump_slot_reloc)
> >>> +    return;
> >>> +
> >>> +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
> >>> +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
> >>> +    return;
> >>> +
> >>> +  /* Ignore DT_X86_64_PLT if
> >>> +     1. DT_JMPREL isn't available or its value is 0.
> >>> +     2. DT_PLTRELSZ is 0.
> >>> +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
> >>> +     16 bytes.
> >>> +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
> >>> +     DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
> >>> +     value.  */
> >>> +  if (map->l_info[DT_JMPREL] == NULL
> >>> +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
> >>> +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
> >>> +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
> >>> +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
> >>> +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
> >>> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> >>> +       < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
> >>> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> >>> +       % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
> >>> +    return;
> >>> +
> >>> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> >>> +    {
> >>> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> >>> +
> >>> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> >>> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> >>> +                       MAP_PRIVATE | MAP_ANONYMOUS,
> >>> +                       -1, 0);
> >>> +      if (__glibc_unlikely (plt != MAP_FAILED))
> >>> +     {
> >>> +       /* Touch the PROT_READ | PROT_WRITE page.  */
> >>> +       *(int32_t *) plt = -1;
> >>> +
> >>> +       /* If the updated PROT_READ | PROT_WRITE page can be changed
> >>> +          to PROT_EXEC | PROT_READ, rewrite PLT.  */
> >>> +       if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
> >>> +         /* Use JMPABS on APX processors.  */
> >>> +         plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
> >>> +                                               APX_F)
> >>> +                        ? plt_rewrite_jmpabs : plt_rewrite_jmp);
> >>> +
> >>> +       __munmap (plt, 4096);
> >>> +     }
> >>
> >> Do we have an x86_64 cpu/kernel that does not support setting PROT_EXEC | PROT_READ
> >> (afaik only i386 has some restrictions)?  Because this runtime test is somewhat
> >> brittle: mmap might fail or be restricted (for instance due to filtering or
> >> resource limitation/exhaustion).
> >
> > As Florian mentioned, it checks for SELinux restrictions.  It isn't 100%
> > foolproof since SELinux may allow it only for non-program mapped memory
> > and disallow it for program mapped memory.
> >
>
> Yeah, I missed the previous conversation about the SELinux issue.  I don't have
> a strong preference, but I do tend to see such runtime checks as adding
> maintainability issues, especially in this case where they are not really
> bullet-proof.

I can drop it.
  
Noah Goldstein Jan. 2, 2024, 5:56 p.m. UTC | #7
On Tue, Jan 2, 2024 at 7:13 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Changes in v3:
>
> 1. Define and use macros for instruction opcodes and sizes.
> 2. Use INT32_MIN and UINT32_MAX instead of 0x80000000ULL and
> 0xffffffffULL.
> 3. Replace 3 1-byte writes with a 4-byte write to write JMPABS.
> 4. Verify that DT_X86_64_PLTSZ is a multiple of DT_X86_64_PLTENT.
> 5. Add some comments for mprotect test.
> 6. Update plt_rewrite logic.
>
> Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
> relocation.
>
> For x86-64, add
>
>  #define DT_X86_64_PLT     (DT_LOPROC + 0)
>  #define DT_X86_64_PLTSZ   (DT_LOPROC + 1)
>  #define DT_X86_64_PLTENT  (DT_LOPROC + 3)
>
> 1. DT_X86_64_PLT: The address of the procedure linkage table.
> 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
> table.
> 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
> entry.
>
> With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the
> memory offset of the indirect branch instruction.
>
> Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
> with direct branch after relocation when the lazy binding is disabled.
>
> PLT rewrite is disabled by default since SELinux may disallow modifying
> code pages and ld.so can't detect it in all cases.  Add
>
> $ GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
>
> to enable PLT rewrite at run-time.
> ---
>  elf/dynamic-link.h                   |   5 +
>  elf/elf.h                            |   5 +
>  elf/tst-glibcelf.py                  |   1 +
>  manual/tunables.texi                 |   9 +
>  scripts/glibcelf.py                  |   4 +
>  sysdeps/x86/cet-control.h            |  14 ++
>  sysdeps/x86/cpu-features.c           |  10 ++
>  sysdeps/x86/dl-procruntime.c         |   1 +
>  sysdeps/x86/dl-tunables.list         |   5 +
>  sysdeps/x86_64/Makefile              |  19 ++
>  sysdeps/x86_64/configure             |  35 ++++
>  sysdeps/x86_64/configure.ac          |   4 +
>  sysdeps/x86_64/dl-dtprocnum.h        |  21 +++
>  sysdeps/x86_64/dl-machine.h          | 250 ++++++++++++++++++++++++++-
>  sysdeps/x86_64/link_map.h            |  22 +++
>  sysdeps/x86_64/tst-plt-rewrite1.c    |  31 ++++
>  sysdeps/x86_64/tst-plt-rewritemod1.c |  32 ++++
>  17 files changed, 467 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/dl-dtprocnum.h
>  create mode 100644 sysdeps/x86_64/link_map.h
>  create mode 100644 sysdeps/x86_64/tst-plt-rewrite1.c
>  create mode 100644 sysdeps/x86_64/tst-plt-rewritemod1.c
>
> diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
> index 8cdf7bde09..83d834ecaf 100644
> --- a/elf/dynamic-link.h
> +++ b/elf/dynamic-link.h
> @@ -177,6 +177,10 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>        }                                                                              \
>    } while (0);
>
> +# ifndef ELF_DYNAMIC_AFTER_RELOC
> +#  define ELF_DYNAMIC_AFTER_RELOC(map, lazy)
> +# endif
> +
>  /* This can't just be an inline function because GCC is too dumb
>     to inline functions containing inlines themselves.  */
>  # ifdef RTLD_BOOTSTRAP
> @@ -192,6 +196,7 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>        ELF_DYNAMIC_DO_RELR (map);                                             \
>      ELF_DYNAMIC_DO_REL ((map), (scope), edr_lazy, skip_ifunc);               \
>      ELF_DYNAMIC_DO_RELA ((map), (scope), edr_lazy, skip_ifunc);                      \
> +    ELF_DYNAMIC_AFTER_RELOC ((map), (edr_lazy));                             \
>    } while (0)
>
>  #endif
> diff --git a/elf/elf.h b/elf/elf.h
> index ca6a7d9d67..455731663c 100644
> --- a/elf/elf.h
> +++ b/elf/elf.h
> @@ -3639,6 +3639,11 @@ enum
>  /* x86-64 sh_type values.  */
>  #define SHT_X86_64_UNWIND      0x70000001 /* Unwind information.  */
>
> +/* x86-64 d_tag values.  */
> +#define DT_X86_64_PLT          (DT_LOPROC + 0)
> +#define DT_X86_64_PLTSZ                (DT_LOPROC + 1)
> +#define DT_X86_64_PLTENT       (DT_LOPROC + 3)
> +#define DT_X86_64_NUM          4
>
>  /* AM33 relocations.  */
>  #define R_MN10300_NONE         0       /* No reloc.  */
> diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
> index 00cd2bba85..c191636a99 100644
> --- a/elf/tst-glibcelf.py
> +++ b/elf/tst-glibcelf.py
> @@ -187,6 +187,7 @@ DT_VALNUM
>  DT_VALRNGHI
>  DT_VALRNGLO
>  DT_VERSIONTAGNUM
> +DT_X86_64_NUM
>  ELFCLASSNUM
>  ELFDATANUM
>  EM_NUM
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index b31f16da84..f9bd83622e 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -57,6 +57,7 @@ glibc.pthread.stack_cache_size: 0x2800000 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
>  glibc.elision.skip_trylock_internal_abort: 3 (min: 0, max: 2147483647)
> +glibc.cpu.plt_rewrite: 0 (min: 0, max: 1)
>  glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
>  glibc.cpu.x86_ibt:
>  glibc.cpu.hwcaps:
> @@ -614,6 +615,14 @@ this tunable.
>  This tunable is specific to 64-bit x86-64.
>  @end deftp
>
> +@deftp Tunable glibc.cpu.plt_rewrite
> +When this tunable is set to @code{1}, the dynamic linker will rewrite
> +the PLT section with direct branch after relocation if possible when
> +the lazy binding is disabled.
> +
> +This tunable is specific to x86-64.
> +@end deftp
> +
>  @node Memory Related Tunables
>  @section Memory Related Tunables
>  @cindex memory related tunables
> diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
> index c5e5dda48e..5f3813f326 100644
> --- a/scripts/glibcelf.py
> +++ b/scripts/glibcelf.py
> @@ -439,6 +439,8 @@ class DtRISCV(Dt):
>      """Supplemental DT_* constants for EM_RISCV."""
>  class DtSPARC(Dt):
>      """Supplemental DT_* constants for EM_SPARC."""
> +class DtX86_64(Dt):
> +    """Supplemental DT_* constants for EM_X86_64."""
>  _dt_skip = '''
>  DT_ENCODING DT_PROCNUM
>  DT_ADDRRNGLO DT_ADDRRNGHI DT_ADDRNUM
> @@ -451,6 +453,7 @@ DT_MIPS_NUM
>  DT_PPC_NUM
>  DT_PPC64_NUM
>  DT_SPARC_NUM
> +DT_X86_64_NUM
>  '''.strip().split()
>  _register_elf_h(DtAARCH64, prefix='DT_AARCH64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtALPHA, prefix='DT_ALPHA_', skip=_dt_skip, parent=Dt)
> @@ -461,6 +464,7 @@ _register_elf_h(DtPPC, prefix='DT_PPC_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtPPC64, prefix='DT_PPC64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtRISCV, prefix='DT_RISCV_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(DtSPARC, prefix='DT_SPARC_', skip=_dt_skip, parent=Dt)
> +_register_elf_h(DtX86_64, prefix='DT_X86_64_', skip=_dt_skip, parent=Dt)
>  _register_elf_h(Dt, skip=_dt_skip, ranges=True)
>  del _dt_skip
>
> diff --git a/sysdeps/x86/cet-control.h b/sysdeps/x86/cet-control.h
> index a45d59bf8c..d4341f98ba 100644
> --- a/sysdeps/x86/cet-control.h
> +++ b/sysdeps/x86/cet-control.h
> @@ -32,10 +32,24 @@ enum dl_x86_cet_control
>    cet_permissive
>  };
>
> +/* PLT rewrite control.  */
> +enum dl_plt_rewrite_control
> +{
> +  /* No PLT rewrite.  */
> +  plt_rewrite_none,
> +  /* PLT rewrite is enabled at run-time.  */
> +  plt_rewrite_enabled,
> +  /* Rewrite PLT with JMP at run-time.  */
> +  plt_rewrite_jmp,
> +  /* Rewrite PLT with JMPABS at run-time.  */
> +  plt_rewrite_jmpabs
> +};
> +
>  struct dl_x86_feature_control
>  {
>    enum dl_x86_cet_control ibt : 2;
>    enum dl_x86_cet_control shstk : 2;
> +  enum dl_plt_rewrite_control plt_rewrite : 2;
>  };
>
>  #endif /* cet-control.h */
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index f193ea7a2d..f96e6489ed 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -27,6 +27,13 @@
>  extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
>    attribute_hidden;
>
> +static void
> +TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> +{
> +  if (valp->numval)
> +    GL(dl_x86_feature_control).plt_rewrite = plt_rewrite_enabled;
> +}
> +
>  #ifdef __LP64__
>  static void
>  TUNABLE_CALLBACK (set_prefer_map_32bit_exec) (tunable_val_t *valp)
> @@ -996,6 +1003,9 @@ no_cpuid:
>
>    TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
>
> +  TUNABLE_GET (plt_rewrite, tunable_val_t *,
> +              TUNABLE_CALLBACK (set_plt_rewrite));
> +
>  #ifdef __LP64__
>    TUNABLE_GET (prefer_map_32bit_exec, tunable_val_t *,
>                TUNABLE_CALLBACK (set_prefer_map_32bit_exec));
> diff --git a/sysdeps/x86/dl-procruntime.c b/sysdeps/x86/dl-procruntime.c
> index 4d25d9f327..15b3d0d878 100644
> --- a/sysdeps/x86/dl-procruntime.c
> +++ b/sysdeps/x86/dl-procruntime.c
> @@ -67,6 +67,7 @@ PROCINFO_CLASS struct dl_x86_feature_control _dl_x86_feature_control
>  = {
>      .ibt = DEFAULT_DL_X86_CET_CONTROL,
>      .shstk = DEFAULT_DL_X86_CET_CONTROL,
> +    .plt_rewrite = plt_rewrite_none,
>    }
>  # endif
>  # if !defined SHARED || defined PROCINFO_DECL
> diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list
> index 147a7270ec..e2e441e1b7 100644
> --- a/sysdeps/x86/dl-tunables.list
> +++ b/sysdeps/x86/dl-tunables.list
> @@ -66,5 +66,10 @@ glibc {
>      x86_shared_cache_size {
>        type: SIZE_T
>      }
> +    plt_rewrite {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>    }
>  }
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 00120ca9ca..106f1d0f55 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -175,6 +175,25 @@ ifeq (no,$(build-hardcoded-path-in-tests))
>  tests-container += tst-glibc-hwcaps-cache
>  endif
>
> +ifeq (yes,$(have-z-mark-plt))
> +tests += \
> +  tst-plt-rewrite1 \
> +# tests
> +modules-names += \
> +  tst-plt-rewritemod1 \
> +# modules-names
> +
> +tst-plt-rewrite1-no-pie = yes
> +LDFLAGS-tst-plt-rewrite1 = -Wl,-z,mark-plt,-z,now
> +LDFLAGS-tst-plt-rewritemod1.so = -Wl,-z,mark-plt,-z,now
> +tst-plt-rewrite1-ENV = GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 LD_DEBUG=files:bindings
> +$(objpfx)tst-plt-rewrite1: $(objpfx)tst-plt-rewritemod1.so
> +$(objpfx)tst-plt-rewrite1.out: /dev/null $(objpfx)tst-plt-rewrite1
> +       $(tst-plt-rewrite1-ENV) $(make-test-out) > $@ 2>&1; \
> +       grep -q -E "changing 'bar' PLT entry in .*/elf/tst-plt-rewritemod1.so' to direct branch" $@; \
> +       $(evaluate-test)
> +endif
> +
>  endif # $(subdir) == elf
>
>  ifeq ($(subdir),csu)
> diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
> index e307467afa..a15eaa349d 100755
> --- a/sysdeps/x86_64/configure
> +++ b/sysdeps/x86_64/configure
> @@ -25,6 +25,41 @@ printf "%s\n" "$libc_cv_cc_mprefer_vector_width" >&6; }
>  config_vars="$config_vars
>  config-cflags-mprefer-vector-width = $libc_cv_cc_mprefer_vector_width"
>
> +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z mark-plt" >&5
> +printf %s "checking for linker that supports -z mark-plt... " >&6; }
> +libc_linker_feature=no
> +cat > conftest.c <<EOF
> +int _start (void) { return 42; }
> +EOF
> +if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
> +                 -Wl,-z,mark-plt -nostdlib -nostartfiles
> +                 -fPIC -shared -o conftest.so conftest.c
> +                 1>&5'
> +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0; }; }
> +then
> +  if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,-z,mark-plt -nostdlib \
> +      -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
> +      | grep "warning: -z mark-plt ignored" > /dev/null 2>&1; then
> +    true
> +  else
> +    libc_linker_feature=yes
> +  fi
> +fi
> +rm -f conftest*
> +if test $libc_linker_feature = yes; then
> +  libc_cv_z_mark_plt=yes
> +else
> +  libc_cv_z_mark_plt=no
> +fi
> +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
> +printf "%s\n" "$libc_linker_feature" >&6; }
> +config_vars="$config_vars
> +have-z-mark-plt = $libc_cv_z_mark_plt"
> +
>  if test x"$build_mathvec" = xnotset; then
>    build_mathvec=yes
>  fi
> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index 1215dcb1e4..40f642db31 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -10,6 +10,10 @@ LIBC_TRY_CC_OPTION([-mprefer-vector-width=128],
>  LIBC_CONFIG_VAR([config-cflags-mprefer-vector-width],
>                 [$libc_cv_cc_mprefer_vector_width])
>
> +LIBC_LINKER_FEATURE([-z mark-plt], [-Wl,-z,mark-plt],
> +                    [libc_cv_z_mark_plt=yes], [libc_cv_z_mark_plt=no])
> +LIBC_CONFIG_VAR([have-z-mark-plt], [$libc_cv_z_mark_plt])
> +
>  if test x"$build_mathvec" = xnotset; then
>    build_mathvec=yes
>  fi
> diff --git a/sysdeps/x86_64/dl-dtprocnum.h b/sysdeps/x86_64/dl-dtprocnum.h
> new file mode 100644
> index 0000000000..dd41d95b88
> --- /dev/null
> +++ b/sysdeps/x86_64/dl-dtprocnum.h
> @@ -0,0 +1,21 @@
> > +/* Configuration of lookup functions.  x86-64 version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Number of extra dynamic section entries for this architecture.  By
> +   default there are none.  */
> +#define DT_THISPROCNUM DT_X86_64_NUM
> diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> index e0a9b14469..51935a2fb2 100644
> --- a/sysdeps/x86_64/dl-machine.h
> +++ b/sysdeps/x86_64/dl-machine.h
> @@ -22,6 +22,7 @@
>  #define ELF_MACHINE_NAME "x86_64"
>
>  #include <assert.h>
> +#include <stdint.h>
>  #include <sys/param.h>
>  #include <sysdep.h>
>  #include <tls.h>
> @@ -35,6 +36,9 @@
>  # define RTLD_START_ENABLE_X86_FEATURES
>  #endif
>
> +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> +#define DT_X86_64(x) (DT_X86_64_##x - DT_LOPROC + DT_NUM)
> +
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
>  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
> @@ -312,8 +316,9 @@ and creates an unsatisfiable circular dependency.\n",
>
>        switch (r_type)
>         {
> -       case R_X86_64_GLOB_DAT:
>         case R_X86_64_JUMP_SLOT:
> +         map->l_has_jump_slot_reloc = true;
> +       case R_X86_64_GLOB_DAT:
>           *reloc_addr = value;
>           break;
>
> @@ -549,3 +554,246 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
>  }
>
>  #endif /* RESOLVE_MAP */
> +
> +#if !defined ELF_DYNAMIC_AFTER_RELOC && !defined RTLD_BOOTSTRAP \
> +    && defined SHARED
> +# define ELF_DYNAMIC_AFTER_RELOC(map, lazy) \
> +  x86_64_dynamic_after_reloc (map, (lazy))
> +
> +# define JMP32_INSN_OPCODE     0xe9
> +# define JMP32_INSN_SIZE       5
> +# define JMPABS_INSN_OPCODE    0xa100d5
> +# define JMPABS_INSN_SIZE      11
> +# define INT3_INSN_OPCODE      0xcc
> +
> +static const char *
> +x86_64_reloc_symbol_name (struct link_map *map, const ElfW(Rela) *reloc)
> +{
> +  const ElfW(Sym) *const symtab
> +    = (const void *) map->l_info[DT_SYMTAB]->d_un.d_ptr;
> +  const ElfW(Sym) *const refsym = &symtab[ELFW (R_SYM) (reloc->r_info)];
> +  const char *strtab = (const char *) map->l_info[DT_STRTAB]->d_un.d_ptr;
> +  return strtab + refsym->st_name;
> +}
> +
> +static void
> +x86_64_rewrite_plt (struct link_map *map, ElfW(Addr) plt_rewrite,
> +                   ElfW(Addr) plt_aligned)
> +{
> +  ElfW(Addr) plt_rewrite_bias = plt_rewrite - plt_aligned;
> +  ElfW(Addr) l_addr = map->l_addr;
> +  ElfW(Addr) pltent = map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val;
> +  ElfW(Addr) start = map->l_info[DT_JMPREL]->d_un.d_ptr;
> +  ElfW(Addr) size = map->l_info[DT_PLTRELSZ]->d_un.d_val;
> +  const ElfW(Rela) *reloc = (const void *) start;
> +  const ElfW(Rela) *reloc_end = (const void *) (start + size);
> +
> +  unsigned int feature_1 = THREAD_GETMEM (THREAD_SELF,
> +                                         header.feature_1);
> +  bool ibt_enabled_p
> +    = (feature_1 & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +    _dl_debug_printf ("\nchanging PLT in '%s' to direct branch\n",
> +                     DSO_FILENAME (map->l_name));
> +
> +  for (; reloc < reloc_end; reloc++)
> +    if (ELFW(R_TYPE) (reloc->r_info) == R_X86_64_JUMP_SLOT)
> +      {
> +       /* Get the value from the GOT entry.  */
> +       ElfW(Addr) value = *(ElfW(Addr) *) (l_addr + reloc->r_offset);
> +
> +       /* Get the corresponding PLT entry from r_addend.  */
> +       ElfW(Addr) branch_start = l_addr + reloc->r_addend;
> +       /* Skip ENDBR64 if IBT isn't enabled.  */
> +       if (!ibt_enabled_p)
> +         branch_start = ALIGN_DOWN (branch_start, pltent);
> +       /* Get the displacement from the branch target.  */
> +       ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
> +       ElfW(Addr) plt_end;
> +       ElfW(Addr) pad;
> +
> +       branch_start += plt_rewrite_bias;
> +       plt_end = (branch_start & -pltent) + pltent;
> +
> +       /* Update the PLT entry.  */
> +       if (((uint64_t) disp + (uint64_t) ((uint32_t) INT32_MIN))
> +           <= (uint64_t) UINT32_MAX)
> +         {
> +           /* If the target branch can be reached with a direct branch,
> +              rewrite the PLT entry with a direct branch.  */
> +           if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> +             {
> +               const char *sym_name = x86_64_reloc_symbol_name (map,
> +                                                                reloc);
> +               _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> +                                 "direct branch\n", sym_name,
> +                                 DSO_FILENAME (map->l_name));
> +             }
> +
> +           pad = branch_start + JMP32_INSN_SIZE;
> +
> +           if (__glibc_unlikely (pad > plt_end))
> +             {
> +               if (__glibc_unlikely (GLRO(dl_debug_mask)
> +                                     & DL_DEBUG_BINDINGS))
> +                 {
> +                   const char *sym_name
> +                     = x86_64_reloc_symbol_name (map, reloc);
> +                   _dl_debug_printf ("\ninvalid r_addend of "
> +                                     "R_X86_64_JUMP_SLOT against '%s' "
> +                                     "in '%s'\n", sym_name,
> +                                     DSO_FILENAME (map->l_name));
> +                 }
> +
> +               continue;
> +             }
> +
> +           /* Write out direct branch.  */
> +           *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
> +           *((uint32_t *) (branch_start + 1)) = disp;
> +         }
> +       else
> +         {
> +           if (GL(dl_x86_feature_control).plt_rewrite
> +               != plt_rewrite_jmpabs)
> +             continue;
> +
> +           pad = branch_start + JMPABS_INSN_SIZE;
> +
> +           if (pad > plt_end)
> +             continue;
> +
> +           /* Rewrite the PLT entry with JMPABS.  */
> +           if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> +             {
> +               const char *sym_name = x86_64_reloc_symbol_name (map,
> +                                                                reloc);
> +               _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> +                                 "JMPABS\n", sym_name,
> +                                 DSO_FILENAME (map->l_name));
> +             }
> +
> +           /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
> +              a 3-byte opcode + 64bit address.  There is a 1-byte overlap
> +              between 4-byte write and 8-byte write.  */
> +           *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
> +           *(uint64_t *) (branch_start + 3) = value;
> +         }
> +
> +       /* Fill the unused part of the PLT entry with INT3.  */
> +       for (; pad < plt_end; pad++)
> +         *(uint8_t *) pad = INT3_INSN_OPCODE;
Should this be moved outside of the `else` scope so that we also pad for the
jmp_d32 case?
> +      }
> +}
> +
> +static inline void
> +x86_64_rewrite_plt_in_place (struct link_map *map)
> +{
> +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
> +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
> +                   + map->l_addr);
> +  size_t pagesize = GLRO(dl_pagesize);
> +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
> +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +                 + plt - plt_aligned);
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
> +                     DSO_FILENAME (map->l_name));
> +
> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> +                                   PROT_WRITE | PROT_READ) < 0))
> +    {
> +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +       _dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
> +                         DSO_FILENAME (map->l_name));
> +      return;
> +    }
> +
> +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
> +
> +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
> +          DSO_FILENAME (map->l_name));
> +
> +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> +                                   PROT_EXEC | PROT_READ) < 0))
> +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
> +                     "failed to change PLT back to read-only");
> +}
> +
> +/* Rewrite PLT entries to direct branch if possible.  */
> +
> +static inline void
> +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
> +{
> +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
> +  if (lazy)
> +    return;
> +
> +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
> +  if (!map->l_has_jump_slot_reloc)
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
> +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
> +    return;
> +
> +  /* Ignore DT_X86_64_PLT if
> +     1. DT_JMPREL isn't available or its value is 0.
> +     2. DT_PLTRELSZ is 0.
> +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
> +       16 bytes.
> +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
> +       DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
> +       value.  */
> +  if (map->l_info[DT_JMPREL] == NULL
> +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
> +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
> +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
> +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
> +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +         < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
> +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> +         % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
> +    return;
> +
> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> +    {
> +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> +
> +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> +                         MAP_PRIVATE | MAP_ANONYMOUS,
> +                         -1, 0);
> +      if (__glibc_unlikely (plt != MAP_FAILED))
> +       {
> +         /* Touch the PROT_READ | PROT_WRITE page.  */
> +         *(int32_t *) plt = -1;
Should this be +=1 so it does r/w instead of just w?
I doubt it will matter, but it does seem more complete.
> +
> +         /* If the updated PROT_READ | PROT_WRITE page can be changed
> +            to PROT_EXEC | PROT_READ, rewrite PLT.  */
> +         if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
> +           /* Use JMPABS on APX processors.  */
> +           plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
> +                                                 APX_F)
> +                          ? plt_rewrite_jmpabs : plt_rewrite_jmp);
> +
> +         __munmap (plt, 4096);
> +       }
> +
> +      GL(dl_x86_feature_control).plt_rewrite = plt_rewrite;
> +    }
> +
> +  /* Ignore DT_X86_64_PLT if PLT rewrite isn't enabled.  */
> +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_none)
> +    return;
> +
> +  x86_64_rewrite_plt_in_place (map);
> +}
> +#endif
> diff --git a/sysdeps/x86_64/link_map.h b/sysdeps/x86_64/link_map.h
> new file mode 100644
> index 0000000000..ddb8e78077
> --- /dev/null
> +++ b/sysdeps/x86_64/link_map.h
> @@ -0,0 +1,22 @@
> +/* Additional fields in struct link_map.  x86-64 version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Has R_X86_64_JUMP_SLOT relocation.  */
> +bool l_has_jump_slot_reloc;
> +
> +#include <sysdeps/x86/link_map.h>
> diff --git a/sysdeps/x86_64/tst-plt-rewrite1.c b/sysdeps/x86_64/tst-plt-rewrite1.c
> new file mode 100644
> index 0000000000..0dd4724a5a
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-plt-rewrite1.c
> @@ -0,0 +1,31 @@
> +/* Test PLT rewrite.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <string.h>
> +#include <support/check.h>
> +
> +extern const char *foo (void);
> +
> +static int
> +do_test (void)
> +{
> +  TEST_COMPARE (strcmp (foo (), "PLT rewrite works"), 0);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/x86_64/tst-plt-rewritemod1.c b/sysdeps/x86_64/tst-plt-rewritemod1.c
> new file mode 100644
> index 0000000000..361b907790
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-plt-rewritemod1.c
> @@ -0,0 +1,32 @@
> +/* Check PLT rewrite works correctly.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* foo calls bar with indirect branch via PLT.  PLT rewrite should
> +   change it to direct branch.  */
> +
> +const char *
> +bar (void)
> +{
> +  return "PLT rewrite works";
> +}
> +
> +const char *
> +foo (void)
> +{
> +  return bar ();
> +}
> --
> 2.43.0
>
  
H.J. Lu Jan. 2, 2024, 6:03 p.m. UTC | #8
On Tue, Jan 2, 2024 at 9:56 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 7:13 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
...
> > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > index e0a9b14469..51935a2fb2 100644
> > --- a/sysdeps/x86_64/dl-machine.h
> > +++ b/sysdeps/x86_64/dl-machine.h
> > @@ -22,6 +22,7 @@
> >  #define ELF_MACHINE_NAME "x86_64"
> >
> >  #include <assert.h>
> > +#include <stdint.h>
> >  #include <sys/param.h>
> >  #include <sysdep.h>
> >  #include <tls.h>
> > @@ -35,6 +36,9 @@
> >  # define RTLD_START_ENABLE_X86_FEATURES
> >  #endif
> >
> > +/* Translate a processor specific dynamic tag to the index in l_info array.  */
> > +#define DT_X86_64(x) (DT_X86_64_##x - DT_LOPROC + DT_NUM)
> > +
> >  /* Return nonzero iff ELF header is compatible with the running host.  */
> >  static inline int __attribute__ ((unused))
> >  elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
> > @@ -312,8 +316,9 @@ and creates an unsatisfiable circular dependency.\n",
> >
> >        switch (r_type)
> >         {
> > -       case R_X86_64_GLOB_DAT:
> >         case R_X86_64_JUMP_SLOT:
> > +         map->l_has_jump_slot_reloc = true;
> > +       case R_X86_64_GLOB_DAT:
> >           *reloc_addr = value;
> >           break;
> >
> > @@ -549,3 +554,246 @@ elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
> >  }
> >
> >  #endif /* RESOLVE_MAP */
> > +
> > +#if !defined ELF_DYNAMIC_AFTER_RELOC && !defined RTLD_BOOTSTRAP \
> > +    && defined SHARED
> > +# define ELF_DYNAMIC_AFTER_RELOC(map, lazy) \
> > +  x86_64_dynamic_after_reloc (map, (lazy))
> > +
> > +# define JMP32_INSN_OPCODE     0xe9
> > +# define JMP32_INSN_SIZE       5
> > +# define JMPABS_INSN_OPCODE    0xa100d5
> > +# define JMPABS_INSN_SIZE      11
> > +# define INT3_INSN_OPCODE      0xcc
> > +
> > +static const char *
> > +x86_64_reloc_symbol_name (struct link_map *map, const ElfW(Rela) *reloc)
> > +{
> > +  const ElfW(Sym) *const symtab
> > +    = (const void *) map->l_info[DT_SYMTAB]->d_un.d_ptr;
> > +  const ElfW(Sym) *const refsym = &symtab[ELFW (R_SYM) (reloc->r_info)];
> > +  const char *strtab = (const char *) map->l_info[DT_STRTAB]->d_un.d_ptr;
> > +  return strtab + refsym->st_name;
> > +}
> > +
> > +static void
> > +x86_64_rewrite_plt (struct link_map *map, ElfW(Addr) plt_rewrite,
> > +                   ElfW(Addr) plt_aligned)
> > +{
> > +  ElfW(Addr) plt_rewrite_bias = plt_rewrite - plt_aligned;
> > +  ElfW(Addr) l_addr = map->l_addr;
> > +  ElfW(Addr) pltent = map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val;
> > +  ElfW(Addr) start = map->l_info[DT_JMPREL]->d_un.d_ptr;
> > +  ElfW(Addr) size = map->l_info[DT_PLTRELSZ]->d_un.d_val;
> > +  const ElfW(Rela) *reloc = (const void *) start;
> > +  const ElfW(Rela) *reloc_end = (const void *) (start + size);
> > +
> > +  unsigned int feature_1 = THREAD_GETMEM (THREAD_SELF,
> > +                                         header.feature_1);
> > +  bool ibt_enabled_p
> > +    = (feature_1 & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +    _dl_debug_printf ("\nchanging PLT in '%s' to direct branch\n",
> > +                     DSO_FILENAME (map->l_name));
> > +
> > +  for (; reloc < reloc_end; reloc++)
> > +    if (ELFW(R_TYPE) (reloc->r_info) == R_X86_64_JUMP_SLOT)
> > +      {
> > +       /* Get the value from the GOT entry.  */
> > +       ElfW(Addr) value = *(ElfW(Addr) *) (l_addr + reloc->r_offset);
> > +
> > +       /* Get the corresponding PLT entry from r_addend.  */
> > +       ElfW(Addr) branch_start = l_addr + reloc->r_addend;
> > +       /* Skip ENDBR64 if IBT isn't enabled.  */
> > +       if (!ibt_enabled_p)
> > +         branch_start = ALIGN_DOWN (branch_start, pltent);
> > +       /* Get the displacement from the branch target.  */
> > +       ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
> > +       ElfW(Addr) plt_end;
> > +       ElfW(Addr) pad;
> > +
> > +       branch_start += plt_rewrite_bias;
> > +       plt_end = (branch_start & -pltent) + pltent;
> > +
> > +       /* Update the PLT entry.  */
> > +       if (((uint64_t) disp + (uint64_t) ((uint32_t) INT32_MIN))
> > +           <= (uint64_t) UINT32_MAX)
> > +         {
> > +           /* If the target branch can be reached with a direct branch,
> > +              rewrite the PLT entry with a direct branch.  */
> > +           if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> > +             {
> > +               const char *sym_name = x86_64_reloc_symbol_name (map,
> > +                                                                reloc);
> > +               _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> > +                                 "direct branch\n", sym_name,
> > +                                 DSO_FILENAME (map->l_name));
> > +             }
> > +
> > +           pad = branch_start + JMP32_INSN_SIZE;
> > +
> > +           if (__glibc_unlikely (pad > plt_end))
> > +             {
> > +               if (__glibc_unlikely (GLRO(dl_debug_mask)
> > +                                     & DL_DEBUG_BINDINGS))
> > +                 {
> > +                   const char *sym_name
> > +                     = x86_64_reloc_symbol_name (map, reloc);
> > +                   _dl_debug_printf ("\ninvalid r_addend of "
> > +                                     "R_X86_64_JUMP_SLOT against '%s' "
> > +                                     "in '%s'\n", sym_name,
> > +                                     DSO_FILENAME (map->l_name));
> > +                 }
> > +
> > +               continue;
> > +             }
> > +
> > +           /* Write out direct branch.  */
> > +           *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
> > +           *((uint32_t *) (branch_start + 1)) = disp;
> > +         }
> > +       else
> > +         {
> > +           if (GL(dl_x86_feature_control).plt_rewrite
> > +               != plt_rewrite_jmpabs)
> > +             continue;
> > +
> > +           pad = branch_start + JMPABS_INSN_SIZE;
> > +
> > +           if (pad > plt_end)
> > +             continue;
> > +
> > +           /* Rewrite the PLT entry with JMPABS.  */
> > +           if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
> > +             {
> > +               const char *sym_name = x86_64_reloc_symbol_name (map,
> > +                                                                reloc);
> > +               _dl_debug_printf ("changing '%s' PLT entry in '%s' to "
> > +                                 "JMPABS\n", sym_name,
> > +                                 DSO_FILENAME (map->l_name));
> > +             }
> > +
> > +           /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
> > +              a 3-byte opcode + 64bit address.  There is a 1-byte overlap
> > +              between 4-byte write and 8-byte write.  */
> > +           *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
> > +           *(uint64_t *) (branch_start + 3) = value;
> > +         }
> > +
> > +       /* Fill the unused part of the PLT entry with INT3.  */
> > +       for (; pad < plt_end; pad++)
> > +         *(uint8_t *) pad = INT3_INSN_OPCODE;
> Should this be moved outside of the `else` scope so that we also pad
> for the jmp_d32 case?

It is outside of the else.  Both cases are padded.
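
To make that explicit, the loop body boils down to this (a simplified
sketch, not the patch code; the "continue" paths that skip an entry
and the debug output are omitted):

  #include <stdint.h>

  #define JMP32_INSN_OPCODE   0xe9
  #define JMP32_INSN_SIZE     5
  #define JMPABS_INSN_OPCODE  0xa100d5
  #define JMPABS_INSN_SIZE    11
  #define INT3_INSN_OPCODE    0xcc

  /* Rewrite one PLT entry and pad it.  REACHABLE is the result of
     the rel32 range check from the patch.  */
  static void
  rewrite_one_entry (uintptr_t branch_start, uintptr_t plt_end,
                     uint64_t value, uint32_t disp, int reachable)
  {
    uintptr_t pad;
    if (reachable)
      {
        /* 5-byte jmp rel32.  */
        *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
        *(uint32_t *) (branch_start + 1) = disp;
        pad = branch_start + JMP32_INSN_SIZE;
      }
    else
      {
        /* 11-byte jmpabs imm64; the 4-byte and 8-byte writes
           overlap by 1 byte.  */
        *(uint32_t *) branch_start = JMPABS_INSN_OPCODE;
        *(uint64_t *) (branch_start + 3) = value;
        pad = branch_start + JMPABS_INSN_SIZE;
      }
    /* This loop sits outside the if/else, so it runs for both the
       jmp and the jmpabs case.  */
    for (; pad < plt_end; pad++)
      *(uint8_t *) pad = INT3_INSN_OPCODE;
  }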

> > +      }
> > +}
> > +
> > +static inline void
> > +x86_64_rewrite_plt_in_place (struct link_map *map)
> > +{
> > +  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
> > +  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
> > +                   + map->l_addr);
> > +  size_t pagesize = GLRO(dl_pagesize);
> > +  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
> > +  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +                 + plt - plt_aligned);
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
> > +                     DSO_FILENAME (map->l_name));
> > +
> > +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> > +                                   PROT_WRITE | PROT_READ) < 0))
> > +    {
> > +      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +       _dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
> > +                         DSO_FILENAME (map->l_name));
> > +      return;
> > +    }
> > +
> > +  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
> > +
> > +  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
> > +   _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
> > +          DSO_FILENAME (map->l_name));
> > +
> > +  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
> > +                                   PROT_EXEC | PROT_READ) < 0))
> > +    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
> > +                     "failed to change PLT back to read-only");
> > +}
> > +
> > +/* Rewrite PLT entries to direct branch if possible.  */
> > +
> > +static inline void
> > +x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
> > +{
> > +  /* Ignore DT_X86_64_PLT if the lazy binding is enabled.  */
> > +  if (lazy)
> > +    return;
> > +
> > +  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
> > +  if (!map->l_has_jump_slot_reloc)
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
> > +  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
> > +    return;
> > +
> > +  /* Ignore DT_X86_64_PLT if
> > +     1. DT_JMPREL isn't available or its value is 0.
> > +     2. DT_PLTRELSZ is 0.
> > +     3. DT_X86_64_PLTENT isn't available or its value is smaller than
> > +       16 bytes.
> > +     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
> > +       DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
> > +       value.  */
> > +  if (map->l_info[DT_JMPREL] == NULL
> > +      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
> > +      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
> > +      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
> > +      || map->l_info[DT_X86_64 (PLTENT)] == NULL
> > +      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
> > +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +         < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
> > +      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
> > +         % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
> > +    return;
> > +
> > +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> > +    {
> > +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> > +
> > +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> > +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> > +                         MAP_PRIVATE | MAP_ANONYMOUS,
> > +                         -1, 0);
> > +      if (__glibc_unlikely (plt != MAP_FAILED))
> > +       {
> > +         /* Touch the PROT_READ | PROT_WRITE page.  */
> > +         *(int32_t *) plt = -1;
> Should this be +=1 so it does r/w instead of just w?
> I doubt it will matter, but it does seem more complete.

I will drop the whole mprotect check.

Thanks.
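
P.S.: For anyone puzzling over the INT32_MIN/UINT32_MAX expression
above: it is the usual unsigned-wraparound test for "fits in rel32",
restated here in isolation (a sketch, not the patch code):

  #include <stdbool.h>
  #include <stdint.h>

  /* True iff DISP, read as a signed 64-bit value, fits in a signed
     32-bit displacement.  Adding 0x80000000, i.e.
     (uint32_t) INT32_MIN, shifts the window [INT32_MIN, INT32_MAX]
     onto [0, UINT32_MAX], so one unsigned comparison covers both
     bounds.  */
  static bool
  fits_rel32 (uint64_t disp)
  {
    return (disp + (uint64_t) ((uint32_t) INT32_MIN)
            <= (uint64_t) UINT32_MAX);
  }

E.g. fits_rel32 (0x7fffffff) is true and fits_rel32 (0x80000000) is
false, matching the rel32 window.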
  
Noah Goldstein Jan. 2, 2024, 6:39 p.m. UTC | #9
On Tue, Jan 2, 2024 at 10:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 9:56 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 7:13 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
...
> > > +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> > > +    {
> > > +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> > > +
> > > +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> > > +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> > > +                         MAP_PRIVATE | MAP_ANONYMOUS,
> > > +                         -1, 0);
> > > +      if (__glibc_unlikely (plt != MAP_FAILED))
> > > +       {
> > > +         /* Touch the PROT_READ | PROT_WRITE page.  */
> > > +         *(int32_t *) plt = -1;
> > Should this be +=1 so it does r/w instead of just w?
> > I doubt it will matter, but it does seem more complete.
>
> I will drop the whole mprotect check.

Will the check really never be useful?  Without the check, if
mprotect isn't permitted, we will fail to start the process, which
seems like an outcome to avoid if possible.
>
> Thanks.
>
> --
> H.J.
  
H.J. Lu Jan. 2, 2024, 6:48 p.m. UTC | #10
On Tue, Jan 2, 2024 at 10:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 10:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 9:56 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Tue, Jan 2, 2024 at 7:13 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
...
> > > > +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> > > > +    {
> > > > +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> > > > +
> > > > +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> > > > +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> > > > +                         MAP_PRIVATE | MAP_ANONYMOUS,
> > > > +                         -1, 0);
> > > > +      if (__glibc_unlikely (plt != MAP_FAILED))
> > > > +       {
> > > > +         /* Touch the PROT_READ | PROT_WRITE page.  */
> > > > +         *(int32_t *) plt = -1;
> > > Should this be +=1 so it does r/w instead of just w?
> > > I doubt it will matter, but it does seem more complete.
> >
> > I will drop the whole mprotect check.
>
> Will the check really never be useful?  Without the check, if
> mprotect isn't permitted, we will fail to start the process, which
> seems like an outcome to avoid if possible.

It was suggested that the check should be removed:

https://sourceware.org/pipermail/libc-alpha/2024-January/153645.html

I can put it back.  We need a consensus.
  
Noah Goldstein Jan. 2, 2024, 7:02 p.m. UTC | #11
On Tue, Jan 2, 2024 at 10:48 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 10:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 10:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Tue, Jan 2, 2024 at 9:56 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 2, 2024 at 7:13 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
...
> > > > > +  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
> > > > > +    {
> > > > > +      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
> > > > > +
> > > > > +      /* PLT rewrite is enabled.  Check if mprotect works.  */
> > > > > +      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
> > > > > +                         MAP_PRIVATE | MAP_ANONYMOUS,
> > > > > +                         -1, 0);
> > > > > +      if (__glibc_unlikely (plt != MAP_FAILED))
> > > > > +       {
> > > > > +         /* Touch the PROT_READ | PROT_WRITE page.  */
> > > > > +         *(int32_t *) plt = -1;
> > > > Should this be +=1 so it does r/w instead of just w?
> > > > I doubt it will matter, but it does seem more complete.
> > >
> > > I will drop the whole mprotect check.
> >
> > Will the check really never supply anything?  Without the check, if
> > rewriting isn't possible we will fail to start the process, which
> > seems like an outcome to avoid if possible.
>
> It was suggested that the check should be removed:
>
> https://sourceware.org/pipermail/libc-alpha/2024-January/153645.html
>
> I can put it back.  We need a consensus.

Yeah, I saw.  I tend to agree with the point that a non-robust check
isn't ideal, but for my money it's better than failing entirely.

It seems extremely unlikely to me that the resources for a single-page
mmap would cause a spurious failure.

But if Florian/Adhemerval feel strongly, it's not necessary to have
the check.
>
> --
> H.J.
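
For reference, the check under discussion boils down to the following
standalone probe (a sketch using the public mmap/mprotect interfaces
instead of the internal __mmap/__mprotect calls; "can_rewrite_plt" is
our name, not glibc's):

#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* Return true if a dirtied PROT_READ | PROT_WRITE page can later be
   changed to PROT_READ | PROT_EXEC, i.e. if a security policy such as
   SELinux would permit rewriting the PLT in place.  */
static bool
can_rewrite_plt (void)
{
  void *page = mmap (NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED)
    return false;

  /* Touch the page so the W->X transition is tested on a dirty page.  */
  *(int32_t *) page = -1;

  bool ok = mprotect (page, 4096, PROT_READ | PROT_EXEC) == 0;
  munmap (page, 4096);
  return ok;
}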
  

Patch

diff --git a/elf/dynamic-link.h b/elf/dynamic-link.h
index 8cdf7bde09..83d834ecaf 100644
--- a/elf/dynamic-link.h
+++ b/elf/dynamic-link.h
@@ -177,6 +177,10 @@  elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
       }									      \
   } while (0);
 
+# ifndef ELF_DYNAMIC_AFTER_RELOC
+#  define ELF_DYNAMIC_AFTER_RELOC(map, lazy)
+# endif
+
 /* This can't just be an inline function because GCC is too dumb
    to inline functions containing inlines themselves.  */
 # ifdef RTLD_BOOTSTRAP
@@ -192,6 +196,7 @@  elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
       ELF_DYNAMIC_DO_RELR (map);					      \
     ELF_DYNAMIC_DO_REL ((map), (scope), edr_lazy, skip_ifunc);		      \
     ELF_DYNAMIC_DO_RELA ((map), (scope), edr_lazy, skip_ifunc);		      \
+    ELF_DYNAMIC_AFTER_RELOC ((map), (edr_lazy));			      \
   } while (0)
 
 #endif
diff --git a/elf/elf.h b/elf/elf.h
index ca6a7d9d67..455731663c 100644
--- a/elf/elf.h
+++ b/elf/elf.h
@@ -3639,6 +3639,11 @@  enum
 /* x86-64 sh_type values.  */
 #define SHT_X86_64_UNWIND	0x70000001 /* Unwind information.  */
 
+/* x86-64 d_tag values.  */
+#define DT_X86_64_PLT		(DT_LOPROC + 0)
+#define DT_X86_64_PLTSZ		(DT_LOPROC + 1)
+#define DT_X86_64_PLTENT	(DT_LOPROC + 3)
+#define DT_X86_64_NUM		4
 
 /* AM33 relocations.  */
 #define R_MN10300_NONE		0	/* No reloc.  */
diff --git a/elf/tst-glibcelf.py b/elf/tst-glibcelf.py
index 00cd2bba85..c191636a99 100644
--- a/elf/tst-glibcelf.py
+++ b/elf/tst-glibcelf.py
@@ -187,6 +187,7 @@  DT_VALNUM
 DT_VALRNGHI
 DT_VALRNGLO
 DT_VERSIONTAGNUM
+DT_X86_64_NUM
 ELFCLASSNUM
 ELFDATANUM
 EM_NUM
diff --git a/manual/tunables.texi b/manual/tunables.texi
index b31f16da84..f9bd83622e 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -57,6 +57,7 @@  glibc.pthread.stack_cache_size: 0x2800000 (min: 0x0, max: 0xffffffffffffffff)
 glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
 glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
 glibc.elision.skip_trylock_internal_abort: 3 (min: 0, max: 2147483647)
+glibc.cpu.plt_rewrite: 0 (min: 0, max: 1)
 glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
 glibc.cpu.x86_ibt:
 glibc.cpu.hwcaps:
@@ -614,6 +615,14 @@  this tunable.
 This tunable is specific to 64-bit x86-64.
 @end deftp
 
+@deftp Tunable glibc.cpu.plt_rewrite
+When this tunable is set to @code{1}, the dynamic linker will rewrite
+the PLT section with direct branches after relocation, if possible,
+when lazy binding is disabled.
+
+This tunable is specific to x86-64.
+@end deftp
+
 @node Memory Related Tunables
 @section Memory Related Tunables
 @cindex memory related tunables
diff --git a/scripts/glibcelf.py b/scripts/glibcelf.py
index c5e5dda48e..5f3813f326 100644
--- a/scripts/glibcelf.py
+++ b/scripts/glibcelf.py
@@ -439,6 +439,8 @@  class DtRISCV(Dt):
     """Supplemental DT_* constants for EM_RISCV."""
 class DtSPARC(Dt):
     """Supplemental DT_* constants for EM_SPARC."""
+class DtX86_64(Dt):
+    """Supplemental DT_* constants for EM_X86_64."""
 _dt_skip = '''
 DT_ENCODING DT_PROCNUM
 DT_ADDRRNGLO DT_ADDRRNGHI DT_ADDRNUM
@@ -451,6 +453,7 @@  DT_MIPS_NUM
 DT_PPC_NUM
 DT_PPC64_NUM
 DT_SPARC_NUM
+DT_X86_64_NUM
 '''.strip().split()
 _register_elf_h(DtAARCH64, prefix='DT_AARCH64_', skip=_dt_skip, parent=Dt)
 _register_elf_h(DtALPHA, prefix='DT_ALPHA_', skip=_dt_skip, parent=Dt)
@@ -461,6 +464,7 @@  _register_elf_h(DtPPC, prefix='DT_PPC_', skip=_dt_skip, parent=Dt)
 _register_elf_h(DtPPC64, prefix='DT_PPC64_', skip=_dt_skip, parent=Dt)
 _register_elf_h(DtRISCV, prefix='DT_RISCV_', skip=_dt_skip, parent=Dt)
 _register_elf_h(DtSPARC, prefix='DT_SPARC_', skip=_dt_skip, parent=Dt)
+_register_elf_h(DtX86_64, prefix='DT_X86_64_', skip=_dt_skip, parent=Dt)
 _register_elf_h(Dt, skip=_dt_skip, ranges=True)
 del _dt_skip
 
diff --git a/sysdeps/x86/cet-control.h b/sysdeps/x86/cet-control.h
index a45d59bf8c..d4341f98ba 100644
--- a/sysdeps/x86/cet-control.h
+++ b/sysdeps/x86/cet-control.h
@@ -32,10 +32,24 @@  enum dl_x86_cet_control
   cet_permissive
 };
 
+/* PLT rewrite control.  */
+enum dl_plt_rewrite_control
+{
+  /* No PLT rewrite.  */
+  plt_rewrite_none,
+  /* PLT rewrite is enabled at run-time.  */
+  plt_rewrite_enabled,
+  /* Rewrite PLT with JMP at run-time.  */
+  plt_rewrite_jmp,
+  /* Rewrite PLT with JMPABS at run-time.  */
+  plt_rewrite_jmpabs
+};
+
 struct dl_x86_feature_control
 {
   enum dl_x86_cet_control ibt : 2;
   enum dl_x86_cet_control shstk : 2;
+  enum dl_plt_rewrite_control plt_rewrite : 2;
 };
 
 #endif /* cet-control.h */
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index f193ea7a2d..f96e6489ed 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -27,6 +27,13 @@ 
 extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
   attribute_hidden;
 
+static void
+TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
+{
+  if (valp->numval)
+    GL(dl_x86_feature_control).plt_rewrite = plt_rewrite_enabled;
+}
+
 #ifdef __LP64__
 static void
 TUNABLE_CALLBACK (set_prefer_map_32bit_exec) (tunable_val_t *valp)
@@ -996,6 +1003,9 @@  no_cpuid:
 
   TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
 
+  TUNABLE_GET (plt_rewrite, tunable_val_t *,
+	       TUNABLE_CALLBACK (set_plt_rewrite));
+
 #ifdef __LP64__
   TUNABLE_GET (prefer_map_32bit_exec, tunable_val_t *,
 	       TUNABLE_CALLBACK (set_prefer_map_32bit_exec));
diff --git a/sysdeps/x86/dl-procruntime.c b/sysdeps/x86/dl-procruntime.c
index 4d25d9f327..15b3d0d878 100644
--- a/sysdeps/x86/dl-procruntime.c
+++ b/sysdeps/x86/dl-procruntime.c
@@ -67,6 +67,7 @@  PROCINFO_CLASS struct dl_x86_feature_control _dl_x86_feature_control
 = {
     .ibt = DEFAULT_DL_X86_CET_CONTROL,
     .shstk = DEFAULT_DL_X86_CET_CONTROL,
+    .plt_rewrite = plt_rewrite_none,
   }
 # endif
 # if !defined SHARED || defined PROCINFO_DECL
diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list
index 147a7270ec..e2e441e1b7 100644
--- a/sysdeps/x86/dl-tunables.list
+++ b/sysdeps/x86/dl-tunables.list
@@ -66,5 +66,10 @@  glibc {
     x86_shared_cache_size {
       type: SIZE_T
     }
+    plt_rewrite {
+      type: INT_32
+      minval: 0
+      maxval: 1
+    }
   }
 }
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 00120ca9ca..106f1d0f55 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -175,6 +175,25 @@  ifeq (no,$(build-hardcoded-path-in-tests))
 tests-container += tst-glibc-hwcaps-cache
 endif
 
+ifeq (yes,$(have-z-mark-plt))
+tests += \
+  tst-plt-rewrite1 \
+# tests
+modules-names += \
+  tst-plt-rewritemod1 \
+# modules-names
+
+tst-plt-rewrite1-no-pie = yes
+LDFLAGS-tst-plt-rewrite1 = -Wl,-z,mark-plt,-z,now
+LDFLAGS-tst-plt-rewritemod1.so = -Wl,-z,mark-plt,-z,now
+tst-plt-rewrite1-ENV = GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 LD_DEBUG=files:bindings
+$(objpfx)tst-plt-rewrite1: $(objpfx)tst-plt-rewritemod1.so
+$(objpfx)tst-plt-rewrite1.out: /dev/null $(objpfx)tst-plt-rewrite1
+	$(tst-plt-rewrite1-ENV) $(make-test-out) > $@ 2>&1; \
+	grep -q -E "changing 'bar' PLT entry in .*/elf/tst-plt-rewritemod1.so' to direct branch" $@; \
+	$(evaluate-test)
+endif
+
 endif # $(subdir) == elf
 
 ifeq ($(subdir),csu)
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index e307467afa..a15eaa349d 100755
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -25,6 +25,41 @@  printf "%s\n" "$libc_cv_cc_mprefer_vector_width" >&6; }
 config_vars="$config_vars
 config-cflags-mprefer-vector-width = $libc_cv_cc_mprefer_vector_width"
 
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for linker that supports -z mark-plt" >&5
+printf %s "checking for linker that supports -z mark-plt... " >&6; }
+libc_linker_feature=no
+cat > conftest.c <<EOF
+int _start (void) { return 42; }
+EOF
+if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp
+		  -Wl,-z,mark-plt -nostdlib -nostartfiles
+		  -fPIC -shared -o conftest.so conftest.c
+		  1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+  if ${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS $no_ssp -Wl,-z,mark-plt -nostdlib \
+      -nostartfiles -fPIC -shared -o conftest.so conftest.c 2>&1 \
+      | grep "warning: -z mark-plt ignored" > /dev/null 2>&1; then
+    true
+  else
+    libc_linker_feature=yes
+  fi
+fi
+rm -f conftest*
+if test $libc_linker_feature = yes; then
+  libc_cv_z_mark_plt=yes
+else
+  libc_cv_z_mark_plt=no
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_linker_feature" >&5
+printf "%s\n" "$libc_linker_feature" >&6; }
+config_vars="$config_vars
+have-z-mark-plt = $libc_cv_z_mark_plt"
+
 if test x"$build_mathvec" = xnotset; then
   build_mathvec=yes
 fi
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index 1215dcb1e4..40f642db31 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -10,6 +10,10 @@  LIBC_TRY_CC_OPTION([-mprefer-vector-width=128],
 LIBC_CONFIG_VAR([config-cflags-mprefer-vector-width],
 		[$libc_cv_cc_mprefer_vector_width])
 
+LIBC_LINKER_FEATURE([-z mark-plt], [-Wl,-z,mark-plt],
+                    [libc_cv_z_mark_plt=yes], [libc_cv_z_mark_plt=no])
+LIBC_CONFIG_VAR([have-z-mark-plt], [$libc_cv_z_mark_plt])
+
 if test x"$build_mathvec" = xnotset; then
   build_mathvec=yes
 fi
diff --git a/sysdeps/x86_64/dl-dtprocnum.h b/sysdeps/x86_64/dl-dtprocnum.h
new file mode 100644
index 0000000000..dd41d95b88
--- /dev/null
+++ b/sysdeps/x86_64/dl-dtprocnum.h
@@ -0,0 +1,21 @@ 
+/* Configuration of lookup functions.  x86-64 version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Number of extra dynamic section entries for this architecture.  By
+   default there are none.  */
+#define DT_THISPROCNUM	DT_X86_64_NUM
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index e0a9b14469..51935a2fb2 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -22,6 +22,7 @@ 
 #define ELF_MACHINE_NAME "x86_64"
 
 #include <assert.h>
+#include <stdint.h>
 #include <sys/param.h>
 #include <sysdep.h>
 #include <tls.h>
@@ -35,6 +36,9 @@ 
 # define RTLD_START_ENABLE_X86_FEATURES
 #endif
 
+/* Translate a processor specific dynamic tag to the index in l_info array.  */
+#define DT_X86_64(x) (DT_X86_64_##x - DT_LOPROC + DT_NUM)
+
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
 elf_machine_matches_host (const ElfW(Ehdr) *ehdr)
@@ -312,8 +316,9 @@  and creates an unsatisfiable circular dependency.\n",
 
       switch (r_type)
 	{
-	case R_X86_64_GLOB_DAT:
 	case R_X86_64_JUMP_SLOT:
+	  map->l_has_jump_slot_reloc = true;
+	case R_X86_64_GLOB_DAT:
 	  *reloc_addr = value;
 	  break;
 
@@ -549,3 +554,246 @@  elf_machine_lazy_rel (struct link_map *map, struct r_scope_elem *scope[],
 }
 
 #endif /* RESOLVE_MAP */
+
+#if !defined ELF_DYNAMIC_AFTER_RELOC && !defined RTLD_BOOTSTRAP \
+    && defined SHARED
+# define ELF_DYNAMIC_AFTER_RELOC(map, lazy) \
+  x86_64_dynamic_after_reloc (map, (lazy))
+
+# define JMP32_INSN_OPCODE	0xe9
+# define JMP32_INSN_SIZE	5
+# define JMPABS_INSN_OPCODE	0xa100d5
+# define JMPABS_INSN_SIZE	11
+# define INT3_INSN_OPCODE	0xcc
+
+static const char *
+x86_64_reloc_symbol_name (struct link_map *map, const ElfW(Rela) *reloc)
+{
+  const ElfW(Sym) *const symtab
+    = (const void *) map->l_info[DT_SYMTAB]->d_un.d_ptr;
+  const ElfW(Sym) *const refsym = &symtab[ELFW (R_SYM) (reloc->r_info)];
+  const char *strtab = (const char *) map->l_info[DT_STRTAB]->d_un.d_ptr;
+  return strtab + refsym->st_name;
+}
+
+static void
+x86_64_rewrite_plt (struct link_map *map, ElfW(Addr) plt_rewrite,
+		    ElfW(Addr) plt_aligned)
+{
+  ElfW(Addr) plt_rewrite_bias = plt_rewrite - plt_aligned;
+  ElfW(Addr) l_addr = map->l_addr;
+  ElfW(Addr) pltent = map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val;
+  ElfW(Addr) start = map->l_info[DT_JMPREL]->d_un.d_ptr;
+  ElfW(Addr) size = map->l_info[DT_PLTRELSZ]->d_un.d_val;
+  const ElfW(Rela) *reloc = (const void *) start;
+  const ElfW(Rela) *reloc_end = (const void *) (start + size);
+
+  unsigned int feature_1 = THREAD_GETMEM (THREAD_SELF,
+					  header.feature_1);
+  bool ibt_enabled_p
+    = (feature_1 & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
+
+  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
+    _dl_debug_printf ("\nchanging PLT in '%s' to direct branch\n",
+		      DSO_FILENAME (map->l_name));
+
+  for (; reloc < reloc_end; reloc++)
+    if (ELFW(R_TYPE) (reloc->r_info) == R_X86_64_JUMP_SLOT)
+      {
+	/* Get the value from the GOT entry.  */
+	ElfW(Addr) value = *(ElfW(Addr) *) (l_addr + reloc->r_offset);
+
+	/* Get the corresponding PLT entry from r_addend.  */
+	ElfW(Addr) branch_start = l_addr + reloc->r_addend;
+	/* Skip ENDBR64 if IBT isn't enabled.  */
+	if (!ibt_enabled_p)
+	  branch_start = ALIGN_DOWN (branch_start, pltent);
+	/* Get the displacement from the branch target.  */
+	ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
+	ElfW(Addr) plt_end;
+	ElfW(Addr) pad;
+
+	branch_start += plt_rewrite_bias;
+	plt_end = (branch_start & -pltent) + pltent;
+
+	/* Update the PLT entry.  */
+	if (((uint64_t) disp + (uint64_t) ((uint32_t) INT32_MIN))
+	    <= (uint64_t) UINT32_MAX)
+	  {
+	    /* If the branch target can be reached with a direct branch,
+	       rewrite the PLT entry with a direct branch.  */
+	    if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
+	      {
+		const char *sym_name = x86_64_reloc_symbol_name (map,
+								 reloc);
+		_dl_debug_printf ("changing '%s' PLT entry in '%s' to "
+				  "direct branch\n", sym_name,
+				  DSO_FILENAME (map->l_name));
+	      }
+
+	    pad = branch_start + JMP32_INSN_SIZE;
+
+	    if (__glibc_unlikely (pad > plt_end))
+	      {
+		if (__glibc_unlikely (GLRO(dl_debug_mask)
+				      & DL_DEBUG_BINDINGS))
+		  {
+		    const char *sym_name
+		      = x86_64_reloc_symbol_name (map, reloc);
+		    _dl_debug_printf ("\ninvalid r_addend of "
+				      "R_X86_64_JUMP_SLOT against '%s' "
+				      "in '%s'\n", sym_name,
+				      DSO_FILENAME (map->l_name));
+		  }
+
+		continue;
+	      }
+
+	    /* Write out direct branch.  */
+	    *(uint8_t *) branch_start = JMP32_INSN_OPCODE;
+	    *((uint32_t *) (branch_start + 1)) = disp;
+	  }
+	else
+	  {
+	    if (GL(dl_x86_feature_control).plt_rewrite
+		!= plt_rewrite_jmpabs)
+	      continue;
+
+	    pad = branch_start + JMPABS_INSN_SIZE;
+
+	    if (pad > plt_end)
+	      continue;
+
+	    /* Rewrite the PLT entry with JMPABS.  */
+	    if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_BINDINGS))
+	      {
+		const char *sym_name = x86_64_reloc_symbol_name (map,
+								 reloc);
+		_dl_debug_printf ("changing '%s' PLT entry in '%s' to "
+				  "JMPABS\n", sym_name,
+				  DSO_FILENAME (map->l_name));
+	      }
+
+	    /* "jmpabs $target" for 64-bit displacement.  NB: JMPABS has
+	       a 3-byte opcode + 64-bit address.  There is a 1-byte overlap
+	       between 4-byte write and 8-byte write.  */
+	    *(uint32_t *) (branch_start) = JMPABS_INSN_OPCODE;
+	    *(uint64_t *) (branch_start + 3) = value;
+	  }
+
+	/* Fill the unused part of the PLT entry with INT3.  */
+	for (; pad < plt_end; pad++)
+	  *(uint8_t *) pad = INT3_INSN_OPCODE;
+      }
+}
+
+static inline void
+x86_64_rewrite_plt_in_place (struct link_map *map)
+{
+  /* Adjust DT_X86_64_PLT address and DT_X86_64_PLTSZ values.  */
+  ElfW(Addr) plt = (map->l_info[DT_X86_64 (PLT)]->d_un.d_ptr
+		    + map->l_addr);
+  size_t pagesize = GLRO(dl_pagesize);
+  ElfW(Addr) plt_aligned = ALIGN_DOWN (plt, pagesize);
+  size_t pltsz = (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
+		  + plt - plt_aligned);
+
+  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
+    _dl_debug_printf ("\nchanging PLT in '%s' to writable\n",
+		      DSO_FILENAME (map->l_name));
+
+  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
+				    PROT_WRITE | PROT_READ) < 0))
+    {
+      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
+	_dl_debug_printf ("\nfailed to change PLT in '%s' to writable\n",
+			  DSO_FILENAME (map->l_name));
+      return;
+    }
+
+  x86_64_rewrite_plt (map, plt_aligned, plt_aligned);
+
+  if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
+    _dl_debug_printf ("\nchanging PLT in '%s' back to read-only\n",
+		      DSO_FILENAME (map->l_name));
+
+  if (__glibc_unlikely (__mprotect ((void *) plt_aligned, pltsz,
+				    PROT_EXEC | PROT_READ) < 0))
+    _dl_signal_error (0, DSO_FILENAME (map->l_name), NULL,
+		      "failed to change PLT back to read-only");
+}
+
+/* Rewrite PLT entries to direct branch if possible.  */
+
+static inline void
+x86_64_dynamic_after_reloc (struct link_map *map, int lazy)
+{
+  /* Ignore DT_X86_64_PLT if lazy binding is enabled.  */
+  if (lazy)
+    return;
+
+  if (__glibc_likely (map->l_info[DT_X86_64 (PLT)] == NULL))
+    return;
+
+  /* Ignore DT_X86_64_PLT if there is no R_X86_64_JUMP_SLOT.  */
+  if (!map->l_has_jump_slot_reloc)
+    return;
+
+  /* Ignore DT_X86_64_PLT on ld.so to avoid changing its own PLT.  */
+  if (map == &GL(dl_rtld_map) || map->l_real == &GL(dl_rtld_map))
+    return;
+
+  /* Ignore DT_X86_64_PLT if
+     1. DT_JMPREL isn't available or its value is 0.
+     2. DT_PLTRELSZ is 0.
+     3. DT_X86_64_PLTENT isn't available or its value is smaller than
+	16 bytes.
+     4. DT_X86_64_PLTSZ isn't available or its value is smaller than
+	DT_X86_64_PLTENT's value or isn't a multiple of DT_X86_64_PLTENT's
+	value.  */
+  if (map->l_info[DT_JMPREL] == NULL
+      || map->l_info[DT_JMPREL]->d_un.d_ptr == 0
+      || map->l_info[DT_PLTRELSZ]->d_un.d_val == 0
+      || map->l_info[DT_X86_64 (PLTSZ)] == NULL
+      || map->l_info[DT_X86_64 (PLTENT)] == NULL
+      || map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val < 16
+      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
+	  < map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val)
+      || (map->l_info[DT_X86_64 (PLTSZ)]->d_un.d_val
+	  % map->l_info[DT_X86_64 (PLTENT)]->d_un.d_val) != 0)
+    return;
+
+  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_enabled)
+    {
+      enum dl_plt_rewrite_control plt_rewrite = plt_rewrite_none;
+
+      /* PLT rewrite is enabled.  Check if mprotect works.  */
+      void *plt = __mmap (NULL, 4096, PROT_READ | PROT_WRITE,
+			  MAP_PRIVATE | MAP_ANONYMOUS,
+			  -1, 0);
+      if (__glibc_likely (plt != MAP_FAILED))
+	{
+	  /* Touch the PROT_READ | PROT_WRITE page.  */
+	  *(int32_t *) plt = -1;
+
+	  /* If the updated PROT_READ | PROT_WRITE page can be changed
+	     to PROT_EXEC | PROT_READ, rewrite PLT.  */
+	  if (__mprotect (plt, 4096, PROT_EXEC | PROT_READ) == 0)
+	    /* Use JMPABS on APX processors.  */
+	    plt_rewrite = (CPU_FEATURE_PRESENT_P (__get_cpu_features (),
+						  APX_F)
+			   ? plt_rewrite_jmpabs : plt_rewrite_jmp);
+
+	  __munmap (plt, 4096);
+	}
+
+      GL(dl_x86_feature_control).plt_rewrite = plt_rewrite;
+    }
+
+  /* Ignore DT_X86_64_PLT if PLT rewrite isn't enabled.  */
+  if (GL(dl_x86_feature_control).plt_rewrite == plt_rewrite_none)
+    return;
+
+  x86_64_rewrite_plt_in_place (map);
+}
+#endif
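
The displacement test in x86_64_rewrite_plt above is a branch-free
signed-range check.  A minimal standalone sketch of the same idiom
("fits_in_rel32" is our name, not part of the patch):

#include <stdbool.h>
#include <stdint.h>

/* True if DISP fits in a signed 32-bit rel32 field, i.e.
   INT32_MIN <= (int64_t) DISP <= INT32_MAX.  Adding 0x80000000 maps
   that signed window onto [0, UINT32_MAX], so a single unsigned
   comparison tests both bounds at once.  */
static bool
fits_in_rel32 (uint64_t disp)
{
  return disp + (uint64_t) (uint32_t) INT32_MIN <= (uint64_t) UINT32_MAX;
}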
diff --git a/sysdeps/x86_64/link_map.h b/sysdeps/x86_64/link_map.h
new file mode 100644
index 0000000000..ddb8e78077
--- /dev/null
+++ b/sysdeps/x86_64/link_map.h
@@ -0,0 +1,22 @@ 
+/* Additional fields in struct link_map.  x86-64 version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Has R_X86_64_JUMP_SLOT relocation.  */
+bool l_has_jump_slot_reloc;
+
+#include <sysdeps/x86/link_map.h>
diff --git a/sysdeps/x86_64/tst-plt-rewrite1.c b/sysdeps/x86_64/tst-plt-rewrite1.c
new file mode 100644
index 0000000000..0dd4724a5a
--- /dev/null
+++ b/sysdeps/x86_64/tst-plt-rewrite1.c
@@ -0,0 +1,31 @@ 
+/* Test PLT rewrite.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <support/check.h>
+
+extern const char *foo (void);
+
+static int
+do_test (void)
+{
+  TEST_COMPARE (strcmp (foo (), "PLT rewrite works"), 0);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/x86_64/tst-plt-rewritemod1.c b/sysdeps/x86_64/tst-plt-rewritemod1.c
new file mode 100644
index 0000000000..361b907790
--- /dev/null
+++ b/sysdeps/x86_64/tst-plt-rewritemod1.c
@@ -0,0 +1,32 @@ 
+/* Check PLT rewrite works correctly.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* foo calls bar with an indirect branch via the PLT.  PLT rewrite
+   should change it to a direct branch.  */
+
+const char *
+bar (void)
+{
+  return "PLT rewrite works";
+}
+
+const char *
+foo (void)
+{
+  return bar ();
+}