manual: Recommendations for dynamic linker hardening

Message ID 87cyptkbae.fsf@oldenburg.str.redhat.com
State Changes Requested
Series manual: Recommendations for dynamic linker hardening

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Testing passed
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Testing passed

Commit Message

Florian Weimer May 10, 2024, 2:10 p.m. UTC
This new section in the manual provides recommendations for
use of glibc in environments with higher integrity requirements.
It reflects both current implementation shortcomings and
challenges we inherit from ELF and psABI requirements.

---
 manual/dynlink.texi | 447 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 447 insertions(+)


base-commit: 143ef68b2aded7c794956beddad495af8c7d3251
  

Comments

Joe Simmons-Talbott May 10, 2024, 3:49 p.m. UTC | #1
Some minor typos and such are pointed out below.

Thanks,
Joe

On Fri, May 10, 2024 at 10:11 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> This new section in the manual provides recommendations for
> use of glibc in environments with higher integrity requirements.
> It's reflecting both current implementation shortcomings, and
> challenges we inherit from ELF and psABI requirements.
>
> ---
>  manual/dynlink.texi | 447 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 447 insertions(+)
>
> diff --git a/manual/dynlink.texi b/manual/dynlink.texi
> index d71f7a30d6..747e2c0030 100644
> --- a/manual/dynlink.texi
> +++ b/manual/dynlink.texi
> @@ -15,6 +15,7 @@ Dynamic linkers are sometimes called @dfn{dynamic loaders}.
>  @menu
>  * Dynamic Linker Invocation::   Explicit invocation of the dynamic linker.
>  * Dynamic Linker Introspection::    Interfaces for querying mapping information.
> +* Dynamic Linker Hardening::    Avoiding unexpected issues with dynamic linking.
>  @end menu
>
>  @node Dynamic Linker Invocation
> @@ -535,6 +536,452 @@ information is processed.
>  This function is a GNU extension.
>  @end deftypefun
>
> +@node Dynamic Linker Hardening
> +@section Avoiding Unexpected Issues With Dynamic Linking
> +
> +This section details recommendations for increasing application
> +robustness, by avoiding potential issues related to dynamic linking.
> +The recommendations have two main aims: reduce the involvement of the
> +dynamic linker in application execution after process startup, and
> +restrict the application to a dynamic linker feature set whose behavior
> +is more easily understood.
> +
> +Key aspects of limiting dynamic linker usage after startup are: no use
> +of the @code{dlopen} function, disabling lazy binding, and using the
> +static TLS model.  More easily understood dynamic linker behavior
> +requires avoiding name conflicts (symbols and sonames) and highly
> +customizable features like the audit subsystem.
> +
> +Note that while these steps can be considered a form of application
> +hardening, they do not guard against potential harm from accidental or
> +deliberate loading of untrusted or malicious code.
> +
> +@subsection Restricted Dynamic Linker Features
> +
> +Avoiding certain dynamic linker features can increase predictability of
> +applications and reduce the risk of running into dynamic linker defects.
> +
> +@itemize @bullet
> +@item
> +Do not use the functions @code{dlopen}, @code{dlmopen}, or
> +@code{dlclose}.  Dynamic loading and unloadign of shared objects

unloading

> +introduces substantial complications related to symbol and thread-local
> +storage (TLS) management.
> +
> +@item
> +Without the @code{dlopen} function, @code{dlsym} and @code{dlvsym}
> +cannot be used with shared object handles.  Minimizing use of both
> +functions is recommended.  If they have to be used, only the
> +@code{RTLD_DEFAULT} pseudo-handle should be used.
> +
> +@item
> +Use the local-exec or initial-exec TLS models.  If @code{dlopen} is not
> +used, there are no compatibility concerns for initial-exec TLS.  This
> +TLS model avoids most of the complexity around TLS access.  In
> +particular, there are no run-time memory allocations after process or
> +thread start.
> +
> +If shared objects are expected to be used more generally, outside the
> +restricted context, lack of compatibility between @code{dlopen} and
> +initial-exec TLS could be a concern.  In that case, the second-best
> +alternative is to use global-dynamic TLS with GNU2 TLS descriptors, for
> +targets that fully implement them, including the fast path for access to
> +TLS variables defined in the initially loaded set of objects.  Like
> +initial-exec TLS, this avoids memory allocations after thread creation.
> +
> +@item
> +Do not use lazy binding.  Lazy binding may require run-time memory
> +allocation, is not async-signal-safe, and introduces considerable
> +complexity.
> +
> +@item
> +Limit the use of indirect function (IFUNC) resolvers.  These resolvers
> +run during relocation processing, when @theglibc{} is not in a fully
> +consistent state.  If you use IFUNC resolvers, do not depend on external
> +data or function references.
> +
> +@item
> +Do not use the audit functionality (@code{LD_AUDIT}, @code{DT_AUDIT},
> +@code{DT_DEPAUDIT}).  Its callback and hooking capabilities introduce a
> +lot of complexity and subtly alter dynamic linker behavior in corner

subtlety

> +cases even if the audit module is inactive.
> +
> +@item
> +Do not use on symbol interposition, except in cases where copy

Do not use symbol interposition.

> +relocations are involved.  Without symbol interposition, the exact order
> +in which shared objects are searched are less relevant.
> +
> +@item
> +One potential source of symbol interposition is a combination of static
> +and dynamic linking, namely linking a static archive into multiple
> +dynamic shared objects.  For such scenarios, the static library should
> +be converted into its own dynamic shared object.
> +
> +A different approach to this situation uses hidden visibility for
> +symbols in the static library, but this can cause problems if the
> +library does not expect that multiple copies of its code coexist within
> +the same process, with no or partial sharing of state.
> +
> +@item
> +If you use shared objects that are linked with @code{-Wl,-Bsymbolic} (or
> +equivalent) or use protected visibility, them code for the main program

s/them/then/

> +must be built as @code{-fpic} to avoid creating copy relocations (and
> +the main program must not use copy relocations for other reasons).
> +
> +@item
> +Be careful about explicit section annotations.  Make sure that the
> +target section matches the properties of the declared entity (e.g., no
> +writable objects in @code{.text}).
> +
> +@item
> +Ensure that all assembler or object input files have recommended

have the recommended

> +security markup, particularly for non-executable stack.
> +
> +@item
> +Some features of @theglibc{} indirectly depend on run-time code loading
> +and @code{dlopen}.  Use @code{iconv_open} with built-in converters only
> +(such as @code{UTF-8}).  Do not use NSS functionality such as
> +@code{getaddrinfo} or @code{getpwuid_r} unless the system is configured
> +for built-in NSS service moduleso only (see below).

modules

> +@end itemize
> +
> +Several considerations apply to ELF constructors and destructors.
> +
> +@itemize @bullet
> +@item
> +The dynamic linker does not take constructor and destructor priorities
> +into account when determining their execution order.  Priorities are
> +only used by the link editor for ordering execution within a completely
> +linked object.
> +
> +@item
> +The recommendations to avoid cyclic dependencies and symbol
> +interposition (and the next two recommendations) make it less likely
> +that ELF objects are accessed before their ELF constructors have run (or
> +after their ELF destructors have run).  However, using @code{dlsym} and
> +@code{dlvsym}, it still possible to access uninitialized facilities even

it is still

> +with these restrictions in place.  Such access is also possible within a
> +single shared object (or the main executable).  Consider using dynamic,
> +on-demand initialization instead.  To deal with access after
> +de-initialization, it may be necessary to implement special cases for
> +that scenario, potentially with degraded functionality.
> +
> +@item
> +Do not use the @code{DT_PREINIT_ARRAY} dynamic tag.
> +
> +@item
> +Do not flag objects as @code{DF_1_INITFIRST}.
> +
> +@item
> +If @code{dlopen} and @code{dlmopen} are not used, @code{DT_NEEDED}
> +dependency information is complete, and lazy binding is disabled, the
> +execution order of ELF destructors is expected to be the reverse of the
> +ELF constructor order.  However, two separate dependency sort operations
> +still occur.  Even though the listed preconditions should ensure that
> +both sorts produce the same ordering, it is recommended not to depend on
> +the destructor order being the reverse of the constructor order.
> +@end itemize
> +
> +The following items provide C++-specific guidance for preparing
> +applications.  If another programming language is used and it uses these
> +toolchain features targeted at C++ to implement some language
> +constructs, these restrictions and recommendations still apply in
> +analogous ways.
> +
> +@itemize @bullet
> +@item
> +C++ inline functions, templates, and other constructs may need to be
> +duplicated into multiple shared objects, resulting in symbol
> +interposition.  (This scenario is similar to combining static and
> +dynamic linking, as discussed above.)
> +
> +It may be possible to use explicit template instantiations to avoid
> +symbol interposition arising from multiple copies of the same template.
> +Hidden visibility could be used as well, but this may cause problems if
> +the duplicated functionality cannot deal with multiple copies within the
> +same process, with no or only partial state sharing.
> +
> +In general, due to the way certain C++ features are implemented in the
> +toolchain, avoiding symbol interposition entirely may not be possible as
> +long as these C++ features are used.
> +
> +@item
> +The toolchain and dynamic linker have multiple mechanisms that bypass
> +the usual symbol binding procedures.  This means that the C++ one
> +definition rule (ODR) still holds even if certain symbol-based isolation
> +mechanisms are used, and object addresses are not shared across
> +translation units with incompatible type definitions.  This does not
> +matter if the previous advice regarding symbol interposition is
> +followed.  However, as the advice may be difficult to implement, it is
> +necessary to avoid ODR violations across the entire process image.
> +
> +@item
> +Be a ware that as a special of interposed symbols, symbols with the

aware

> +@code{STB_GNU_UNIQUE} type do not follow the usual symbol name space

namespace

> +isolation rules: such symbols bind across @code{RTLD_LOCAL} and
> +@code{dlmopen} boundaries.  Furthermore, symbol versioning is ignored
> +for such symbols; they are bound by symbol name only.  All their
> +definitions and uses must therefore be compatible.  Hidden visibility
> +still prevents the creation of @code{STB_GNU_UNIQUE} symbols and can
> +achieve isolation of incompatible definitions.
> +
> +@item
> +C++ exception handling and run-time type information (RTTI), as
> +implemented in the GNU toolchain, is not address-significant, and
> +therefore is not affected by the symbol binding behaviour of the dynamic
> +linker.  This means that types of the same fully-qualified name (in
> +non-anonymous name spaces) are always considered the same from an

namespaces

> +exception-handling or RTTI perspective.  This is true if the type
> +information object or vtable has hidden symbol visibility, or the
> +corresponding symbols are versioned under different symbol versions, or
> +the symbols not bound to the same objects due to the use of
> +@code{RTLD_LOCAL} or @code{dlmopen}.
> +
> +This can cause issues in applications that contain multiple incompatible
> +definitions of the same type.
> +
> +@item
> +C++ exception handling across multiple @code{dlmopen} name spaces may
> +not work, particular with the unwinder in GCC versions before 12.
> +Current toolchain versions are able to process unwinding tables across
> +@code{dlmopen} boundaries.  However, note that type comparison is
> +name-based, not address-based (see the previous item), so exception
> +types may still be matched in unexpected ways.  An important special
> +case of exception handling, invoking destructors for variables of block
> +scope, is not impacted by this RTTI type-sharing.  Likewise, regular
> +virtual member function dispatch for objects is unaffected (but still
> +requires that the type definitions match in all directly involved
> +translation units).
> +
> +@item
> +The Itanium C++ ABI requires that in some cases, C++ destructors for
> +global objects that are constructed on demand are not destructed in the
> +opposite order of their construction.  (The C++ standard requires
> +opposite constructor order.)  As a result, do not depend on the precise
> +destructor invocation order.
> +
> +@item
> +Registering destructors for later invocation allocates memory and may
> +result in process termination if insufficient memory is available.  In
> +particular, this applies to dynamic initialization of a block variable
> +with static storage duration of a type that has a non-trivial destructor.
> +If unexpected program termination is a concern, ensure that such
> +objects merely have trivial destructors, avoiding the need for
> +registration.
> +
> +This includes @code{thread_local} objects with destructors.  Callbacks
> +for destruction of per-thread objects can be registered using
> +@code{pthread_key_create} in a way that permits handling memory
> +allocation errors.
> +@end itemize
> +
> +
> +@subsection Producing Matching Binaries
> +
> +This subsection recommends tools and build flags for producing
> +applications that meet the recommendations of the previous subsection.
> +
> +@itemize @bullet
> +@item
> +Use BFD ld (@command{bfd.ld}) from GNU binutils to produce binaries,
> +invoked through a compiler driver such as @command{gcc}.  The version
> +should be not too far ahead of what was current when the version of
> +@theglibc{} was first released.
> +
> +@item
> +Do not use a binutils release that is older than the one used to build
> +@theglibc{} itself.
> +
> +@item
> +Compile with @code{-ftls-model=initial-exec} to force the initial-exec
> +TLS model.
> +
> +@item
> +Link with @option{-Wl,-z,now} to disable lazy binding.
> +
> +@item
> +Link with @option{-Wl,-z,relro} to enable RELRO (which is the default on
> +most targets).
> +
> +@item
> +Specify all direct shared objects dependencies using @code{-l} options
> +to avoid underlinking.  Rely on @code{.so} files (which can be linker
> +scripts) and searching with the @option{-l} option.  Do not specify the
> +file names of shared objects on the linker command line.
> +
> +@item
> +Consider using @option{-Wl,-z,defs} to treat underlinking as an error
> +condition.
> +
> +@item
> +For a shared object (linked with @option{-shared}, use
> +@option{-Wl,-soname,lib@dots{}} to set a soname that matches the final
> +installed name of the file.
> +
> +@item
> +Do not use the @option{-rpath} linker option.
> +
> +@item
> +Use @option{-Wl,--error-rwx-segments} and @option{-Wl,--error-execstack} to
> +instruct the link editor to fail the link if the resulting final object
> +would have read-write-execute segments or an executable stack.  Such
> +issues usually indicate that the input files are not marked up
> +correctly.
> +
> +@item
> +Ensure that for each @code{LOAD} segment in the ELF program header, file
> +offsets, memory sizes, and load addresses are multiples of the largest
> +page size supported at run time.  Similarly, the start address and size
> +of the @code{GNU_RELRO} range should be multiples of the page size.
> +
> +Avoid creating gaps between @code{LOAD} segments.  The difference
> +between the load addresses of two subsequent @code{LOAD} segments should
> +be the size of the first @code{LOAD} segment.  (This may require linking
> +with @option{-Wl,-z,noseparatecode}.)
> +
> +This may not be possible to achieve with currently available link

with the currently

> +editors.
> +
> +@item
> +If the multiple-of-page-size criterion for the @code{GNU_RELRO} region
> +cannot be achieved, ensure that the process memory image right before
> +the start of the region does not contain executable of writable memory.

executable or writable

> +@c https://sourceware.org/pipermail/libc-alpha/2022-May/138638.html
> +@end itemize
> +
> +@subsection Checking Binaries
> +
> +In some cases, if the previous recommendations are not followed, this
> +can be determined from the produced binaries.  This section contains
> +suggestions for verifying aspects of these binaries.
> +
> +@itemize @bullet
> +@item
> +To detect underlinking, examine the dynamic symbol table, for example
> +using @samp{readelf -sDW}.  If the symbol is defined in a shared object
> +that uses symbol versioning, it must carry a symbol version, as in
> +@samp{pthread_kill@@GLIBC_2.34}.
> +
> +@item
> +Examine the dynamic segment with @samp{readelf -dW} to check that all
> +the required @code{NEEDED} entries are present.  (It is not necessary to
> +list indirect dependencies.)
> +
> +@item
> +The @code{NEEDED} entries should not contain full path names including
> +slashes, only @code{sonames}.
> +
> +@item
> +For a further consistency check, collect all shared objects referenced
> +via @code{NEEDED} entries in dynamic segments, transitively, starting at
> +the main program.  Then determine their dynamic symbol tables (using
> +@samp{readelf -sDW}, for example).
> +
> +Ideally, every symbol should be defined at most once.
> +
> +If there are interposed data symbols, check if the single interposing
> +definition is in the main program.  In this case, there must be a copy
> +relocation for it.  (This only applies to targets with copy relocations.)
> +
> +Function symbols should never be interposed.
> +
> +@item
> +Using the previously collected @code{NEEDED} entries, check that the
> +dependency graph does not contain any cycles.
> +
> +@item
> +The dynamic segment should also mention @code{BIND_NOW} on the
> +@code{FLAGS} line or @code{NOW} on the @code{FLAGS_1} line (one is
> +enough).
> +
> +@item
> +For shared objects (not main programs), if the program header has a
> +@code{PT_TLS} segment, the dynamic segment (as shown by @samp{readelf
> +-dW}) should contain the @code{STATIC_TLS} flag on the @code{FLAGS}
> +line.
> +
> +If @code{STATIC_TLS} is missing in shared objects, ensure that the
> +appropriate relocations for GNU2 TLS descriptors are used (for example,
> +@code{R_AARCH64_TLSDESC} or @code{R_X86_64_TLSDESC}).
> +
> +@item
> +There should not be a reference to the @code{__tLs_get_addr} symbol in

__tls_get_addr?

> +the dynamic symbol table (in the @samp{readelf -sDW} output).
> +Thread-local storage must be accessed using the initial-exec (static)
> +model, or using GNU2 TLS descriptors.
> +
> +@item
> +Likewise, the functions @code{dlopen}, @code{dlmopen}, @code{dlclose}
> +should not be referenced from the dynamic symbol table.
> +
> +@item
> +For shared objects, there should be a @code{SONAME} entry that matches
> +the file name (the base name, i.e., the part after the slash).  The
> +@code{SONAME} string must not contain a slash @samp{/}.
> +
> +@item
> +For all objects, the dynamic segment (as shown by @samp{readelf -dW})
> +should not contain @code{RPATH} or @code{RUNPATH} entries.
> +
> +@item
> +Likewise, the dynamic segment should not show any @code{AUDIT},
> +@code{DEPAUDIT}, @code{AUXILIARY}, @code{FILTER}, or
> +@code{PREINIT_ARRAY} tags.
> +
> +@item
> +The @code{DF_1_INITFIRST} flag should not be used.
> +
> +@item
> +The program header must not have @code{LOAD} segments that are writable
> +and executable at the same time.
> +
> +@item
> +All produces objects should have @code{GNU_STACK} program header that is

produced

> +not marked as executable.  (However, on some newer targets, a
> +non-executable stack is the default, so the @code{GNU_STACK} program
> +header is not required.)
> +@end itemize
> +
> +@subsection Run-time Considerations
> +
> +@itemize @bullet
> +@item
> +Install shared objects using their sonames in a default search path
> +directory (usually @file{/usr/lib64}).  Do not use symbolic links.
> +@c This is currently not standard practice.
> +
> +@item
> +The default search path must not contain objects with duplicate file
> +names or sonames.
> +
> +@item
> +Do not use environment variables (@code{LD_@dots{} variables such as
> +@code{LD_PRELOAD} or @code{LD_LIBRARY_PATH}}, or @code{GLIBC_TUNABLES})
> +to change default dynamic linker behavior.
> +
> +@item
> +Do not install shared objects in non-default locations.  (Such locations
> +are listed explicitly in the configuration file for @command{ldconfig},
> +usually @file{/etc/ld.so.conf}, or in files included from there.)
> +
> +@item
> +Do not configure dynamically-loaded NSS service modules, to avoid
> +accidental internal use of the @code{dlopen} facility.  The @code{files}
> +and @code{dns} modules are built in and do not rely on @code{dlopen}.
> +
> +@item
> +Do not truncate and overwrite files containing programs and shared
> +objects in place, while they are used.  Instead, write the new version
> +to a different path and use @code{rename} to replace the
> +already-installed version.
> +
> +@item
> +Be aware that during an update procedure that involves multiple object
> +files (shared objects and main programs), concurrently starting
> +processes may observe an inconsistent combination of object files (some
> +already updated, some still at the previous version).
> +@end itemize
> +
>
>  @c FIXME these are undocumented:
>  @c dladdr
>
> base-commit: 143ef68b2aded7c794956beddad495af8c7d3251
>
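
The dynamic-segment checks in the "Checking Binaries" subsection quoted
above could be automated along the following lines.  This is only a
minimal sketch: the function name check_dynamic_segment is made up for
illustration, and it assumes the usual "0x... (TAG)  value" line layout
that readelf -dW prints, which may differ between binutils versions.

```python
import re

# Tags that the proposed section says should not appear in the dynamic segment.
FORBIDDEN_TAGS = {"RPATH", "RUNPATH", "AUDIT", "DEPAUDIT",
                  "AUXILIARY", "FILTER", "PREINIT_ARRAY"}

# Assumed line layout, for example:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
ENTRY_RE = re.compile(r"^\s*0x[0-9a-fA-F]+\s+\((\w+)\)\s*(.*)$")


def check_dynamic_segment(readelf_output):
    """Return a list of findings for the `readelf -dW` output of one object."""
    findings = []
    bind_now = False
    for line in readelf_output.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue  # Header lines and anything else we do not recognize.
        tag, value = match.group(1), match.group(2)
        if tag in FORBIDDEN_TAGS:
            findings.append(f"unexpected {tag} entry: {value}")
        elif tag in ("NEEDED", "SONAME") and "/" in value:
            findings.append(f"{tag} entry contains a slash: {value}")
        elif tag == "FLAGS" and "BIND_NOW" in value.split():
            bind_now = True
        elif tag == "FLAGS_1":
            flags = value.replace("Flags:", "").split()
            if "NOW" in flags:
                bind_now = True
            if "INITFIRST" in flags:
                findings.append("object is flagged DF_1_INITFIRST")
    if not bind_now:
        findings.append("neither BIND_NOW (FLAGS) nor NOW (FLAGS_1) is set")
    return findings
```

The input text could be obtained with something like
subprocess.run(["readelf", "-dW", path], capture_output=True, text=True).stdout,
run once per object.  An empty result only means that none of these
particular checks fired, not that the object follows every recommendation.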
  
Simon Josefsson May 10, 2024, 5:14 p.m. UTC | #2
Florian Weimer <fweimer@redhat.com> writes:

> +Do not use the functions @code{dlopen}, @code{dlmopen}, or
> +@code{dlclose}.
...
> +Do not use lazy binding.  Lazy binding may require run-time memory
...
> +Limit the use of indirect function (IFUNC) resolvers.  These resolvers
...
> +Do not use the audit functionality (@code{LD_AUDIT}, @code{DT_AUDIT},

While all of these mechanisms cause complexity and add security
concerns, I feel this is a bit too harshly worded.  You can solve all
security problems by saying "then don't do that", which as a general
solution isn't all that helpful.  Couldn't this be reworded à la
"carefully consider if using X is the best design" instead of "don't do
X"?  It makes the recommendation a bit more actionable for already
deployed code.  I'm sure there are situations where using dlopen()
actually is the least bad solution, just like I'm sure there are
situations where use of dlopen() is frivolous and could be redesigned.

/Simon
  
Cristian Rodríguez May 11, 2024, 1:16 a.m. UTC | #3
On Fri, May 10, 2024 at 10:11 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> This new section in the manual provides recommendations for
> use of glibc in environments with higher integrity requirements.
> It's reflecting both current implementation shortcomings, and
> challenges we inherit from ELF and psABI requirements.
>
>

First of all, thank you for writing this, it will be very useful.

"+If shared objects are expected to be used more generally, outside the
+restricted context, lack of compatibility between @code{dlopen} and
+initial-exec TLS could be a concern.  In that case, the second-best
+alternative is to use global-dynamic TLS with GNU2 TLS descriptors, for
+targets that fully implement them, including the fast path for access to
+TLS variables defined in the initially loaded set of objects.  Like
+initial-exec TLS, this avoids memory allocations after thread creation."

Ok, in the wild we can safely assume dlopen will be used so why is
GNU2 TLS descriptors + global dynamic the default in targets where all
the conditions are met?
Not having memory allocations after thread creation is almost
certainly the most desirable/predictable behaviour to have.


"+There should not be a reference to the @code{__tLs_get_addr} symbol in
+the dynamic symbol table (in the @samp{readelf -sDW} output).
+Thread-local storage must be accessed using the initial-exec (static)
+model, or using GNU2 TLS descriptors."

Currently pretty much everything I have installed here contains a
reference to __tls_get_addr; most will go away if one switches the GCC
defaults.  Unfortunately, things that ought to know better, like
the Rust compiler, also need to be fixed.
  
Florian Weimer May 11, 2024, 9:17 a.m. UTC | #4
* Cristian Rodríguez:

> On Fri, May 10, 2024 at 10:11 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> This new section in the manual provides recommendations for
>> use of glibc in environments with higher integrity requirements.
>> It's reflecting both current implementation shortcomings, and
>> challenges we inherit from ELF and psABI requirements.
>>
>>
>
> First of all, thank you for writing this, it will be very useful.
>
> "+If shared objects are expected to be used more generally, outside the
> +restricted context, lack of compatibility between @code{dlopen} and
> +initial-exec TLS could be a concern.  In that case, the second-best
> +alternative is to use global-dynamic TLS with GNU2 TLS descriptors, for
> +targets that fully implement them, including the fast path for access to
> +TLS variables defined in the initially loaded set of objects.  Like
> +initial-exec TLS, this avoids memory allocations after thread creation."
>
> Ok, in the wild we can safely assume dlopen will be used so why is
> GNU2 TLS descriptors + global dynamic the default in targets where all
> the conditions are met?

I assume you meant to ask “why isn't it the default”.

It's the default on aarch64.  I think ppc64le has its own equivalent
optimization (but uses different symbols, and not exactly GNU2 TLS
descriptors).

GNU2 TLS descriptors supposedly have been implemented on x86-64 for a
long, long time, but have been broken from the start because GCC and
glibc did not agree on the ABI for the descriptor call.  We believe this
has been fixed, and may flip the default for GCC 15 (if a glibc build
with the fix is detected at GCC configure time, presumably).  I think
it's just another example of an incomplete transition.

An undesirable side effect of the GNU2 TLS optimization (which is needed
to avoid the late malloc calls) is that it uses up the static TLS area,
which means that dlopen calls of shared objects that use initial-exec
TLS may fail.  The previous TLS descriptors (those using __tls_get_addr)
do not have this optimization (although in principle they could), so
dlopen works in more cases.  However, we have changed the allocation
heuristics in glibc somewhat and reserve more space, for the benefit of
aarch64 and ppc64le, and there have not been further complaints about
non-working configurations since then.  That's why I think we are ready
to make the switch on x86-64 as well.

> Not having memory allocations after thread creation is almost
> certainly the most desirable/predictable behaviour to have.

I realize now that my description was inaccurate.  With dlopen, you
might still get memory allocation after thread creation because the
simple descriptors are not always used.  You only get that
initial-exec-like behavior if everything is loaded from the start.

This could of course be implemented differently inside glibc, fully
preserving ABI, but it's not the implementation we have today.

> "+There should not be a reference to the @code{__tLs_get_addr} symbol in
> +the dynamic symbol table (in the @samp{readelf -sDW} output).
> +Thread-local storage must be accessed using the initial-exec (static)
> +model, or using GNU2 TLS descriptors."
>
> Currently pretty much everything I have installed here contains a
> reference to __tls_get_addr; most will go away if one switches the GCC
> defaults.  Unfortunately, things that ought to know better, like
> the Rust compiler, also need to be fixed.

Yes, that's expected with x86-64 today.  We've switched Fedora 41 to GNU2
TLS descriptors, and merged this recently into c10s:

   Enable GNU2 TLS descriptors on x86-64 (GCC only) (RHEL-25031)
  <https://gitlab.com/redhat/centos-stream/rpms/redhat-rpm-config/-/commit/d6d795286a90340b1717411807762f1083345f0f>

The “GCC only” part refers to the fact that we still need to bring up
support across the LLVM ecosystem.  Basic support has been merged
upstream:

  [X86] Add Support for X86 TLSDESC Relocations
  <https://github.com/llvm/llvm-project/pull/83136>

But this has to be surfaced in Clang, Rust etc. as well.

Thanks,
Florian
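
To see which objects still reference __tls_get_addr (or the dlopen
family), the dynamic symbol table check from the patch could be scripted
roughly as follows.  This is a sketch under assumptions: the helper name
forbidden_references is invented here, and the parser assumes that each
symbol line of readelf -sDW contains the columns "Vis Ndx Name" in that
order, which holds for the layouts I am aware of but is not guaranteed
(some targets print extra annotations after the visibility).

```python
# Symbols that the proposed section says should not be referenced.
FORBIDDEN_REFERENCES = {"__tls_get_addr", "dlopen", "dlmopen", "dlclose"}

# Values readelf prints in the "Vis" column.
VISIBILITIES = {"DEFAULT", "HIDDEN", "PROTECTED", "INTERNAL"}


def forbidden_references(readelf_output):
    """Return the sorted forbidden symbol names that appear as undefined."""
    found = set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Locate the visibility column; lines without one are not symbols.
        for index, field in enumerate(fields):
            if field in VISIBILITIES:
                break
        else:
            continue
        # Expect "<Vis> <Ndx> <Name>"; the null symbol has no name.
        if len(fields) < index + 3:
            continue
        ndx, name = fields[index + 1], fields[index + 2]
        base = name.split("@", 1)[0]  # Drop the symbol version, if any.
        if ndx == "UND" and base in FORBIDDEN_REFERENCES:
            found.add(base)
    return sorted(found)
```

Only undefined (UND) entries are reported, so an object that defines one
of these symbols itself is not flagged by this sketch.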
  
Florian Weimer May 14, 2024, 10:06 a.m. UTC | #5
* Joe Simmons-Talbott:

>> +@item
>> +Do not use the audit functionality (@code{LD_AUDIT}, @code{DT_AUDIT},
>> +@code{DT_DEPAUDIT}).  Its callback and hooking capabilities introduce a
>> +lot of complexity and subtly alter dynamic linker behavior in corner
>
> subtlety

Sorry, I don't think this change is correct?  I think I'm using “subtly”
as an adverb here.

Rest of the suggestions applied, thanks.

Florian
  
Florian Weimer May 14, 2024, 10:20 a.m. UTC | #6
* Simon Josefsson:

> Florian Weimer <fweimer@redhat.com> writes:
>
>> +Do not use the functions @code{dlopen}, @code{dlmopen}, or
>> +@code{dlclose}.
> ...
>> +Do not use lazy binding.  Lazy binding may require run-time memory
> ...
>> +Limit the use of indirect function (IFUNC) resolvers.  These resolvers
> ...
>> +Do not use the audit functionality (@code{LD_AUDIT}, @code{DT_AUDIT},
>
> While all of these mechanisms cause complexity and add security
> concerns, I feel this is a bit too harshly worded.  You can solve all
> security problems by saying "then don't do that", which as a general
> solution isn't all that helpful.  Couldn't this be reworded à la
> "carefully consider if using X is the best design" instead of "don't do
> X"?

This entire section is not really about security hardening.  I tried to
explain that in the initial paragraphs.

Lazy binding in particular is an absolute nonstarter from a complexity
perspective.

Thanks,
Florian
  
Xi Ruoyao May 15, 2024, 7:36 a.m. UTC | #7
On Fri, 2024-05-10 at 16:10 +0200, Florian Weimer wrote:
> +Avoid creating gaps between @code{LOAD} segments.  The difference
> +between the load addresses of two subsequent @code{LOAD} segments should
> +be the size of the first @code{LOAD} segment.  (This may require linking
> +with @option{-Wl,-z,noseparatecode}.)

-Wl,-z,noseparate-code (a minus sign between "separate" and "code").
  
Florian Weimer May 15, 2024, 7:41 a.m. UTC | #8
* Xi Ruoyao:

> On Fri, 2024-05-10 at 16:10 +0200, Florian Weimer wrote:
>> +Avoid creating gaps between @code{LOAD} segments.  The difference
>> +between the load addresses of two subsequent @code{LOAD} segments should
>> +be the size of the first @code{LOAD} segment.  (This may require linking
>> +with @option{-Wl,-z,noseparatecode}.)
>
> -Wl,-z,noseparate-code (a minus sign between "separate" and "code").

Thanks, fixed locally.

Florian
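
For the check in the patch that the NEEDED dependency graph contains no
cycles, a depth-first search is probably sufficient.  The sketch below
assumes the graph has already been collected into a dictionary that maps
each object name to the sonames from its NEEDED entries; the function
name find_cycle and that data layout are illustrative choices, not
anything glibc or binutils provides.

```python
def find_cycle(needed):
    """Return one dependency cycle as a list of names, or None.

    `needed` maps an object name to the sonames in its NEEDED entries.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = in_progress
        path.append(node)
        for dep in needed.get(node, ()):
            dep_state = state.get(dep, unvisited)
            if dep_state == in_progress:
                # dep is on the current path: report the loop it closes.
                return path[path.index(dep):] + [dep]
            if dep_state == unvisited:
                cycle = visit(dep, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in needed:
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

The returned list starts and ends with the same name, so it can be
printed directly as a chain of NEEDED edges.  The search is recursive,
which should be fine for realistic dependency depths.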
  

Patch

diff --git a/manual/dynlink.texi b/manual/dynlink.texi
index d71f7a30d6..747e2c0030 100644
--- a/manual/dynlink.texi
+++ b/manual/dynlink.texi
@@ -15,6 +15,7 @@  Dynamic linkers are sometimes called @dfn{dynamic loaders}.
 @menu
 * Dynamic Linker Invocation::   Explicit invocation of the dynamic linker.
 * Dynamic Linker Introspection::    Interfaces for querying mapping information.
+* Dynamic Linker Hardening::    Avoiding unexpected issues with dynamic linking.
 @end menu
 
 @node Dynamic Linker Invocation
@@ -535,6 +536,452 @@  information is processed.
 This function is a GNU extension.
 @end deftypefun
 
+@node Dynamic Linker Hardening
+@section Avoiding Unexpected Issues With Dynamic Linking
+
+This section details recommendations for increasing application
+robustness, by avoiding potential issues related to dynamic linking.
+The recommendations have two main aims: reduce the involvement of the
+dynamic linker in application execution after process startup, and
+restrict the application to a dynamic linker feature set whose behavior
+is more easily understood.
+
+Key aspects of limiting dynamic linker usage after startup are: no use
+of the @code{dlopen} function, disabling lazy binding, and using the
+static TLS model.  More easily understood dynamic linker behavior
+requires avoiding name conflicts (symbols and sonames) and highly
+customizable features like the audit subsystem.
+
+Note that while these steps can be considered a form of application
+hardening, they do not guard against potential harm from accidental or
+deliberate loading of untrusted or malicious code.
+
+@subsection Restricted Dynamic Linker Features
+
+Avoiding certain dynamic linker features can increase predictability of
+applications and reduce the risk of running into dynamic linker defects.
+
+@itemize @bullet
+@item
+Do not use the functions @code{dlopen}, @code{dlmopen}, or
+@code{dlclose}.  Dynamic loading and unloading of shared objects
+introduce substantial complications related to symbol and thread-local
+storage (TLS) management.
+
+@item
+Without the @code{dlopen} function, @code{dlsym} and @code{dlvsym}
+cannot be used with shared object handles.  Minimizing use of both
+functions is recommended.  If they have to be used, only the
+@code{RTLD_DEFAULT} pseudo-handle should be used.
+
+@item
+Use the local-exec or initial-exec TLS models.  If @code{dlopen} is not
+used, there are no compatibility concerns for initial-exec TLS.  This
+TLS model avoids most of the complexity around TLS access.  In
+particular, there are no run-time memory allocations after process or
+thread start.
+
+If shared objects are expected to be used more generally, outside the
+restricted context, lack of compatibility between @code{dlopen} and
+initial-exec TLS could be a concern.  In that case, the second-best
+alternative is to use global-dynamic TLS with GNU2 TLS descriptors, for
+targets that fully implement them, including the fast path for access to
+TLS variables defined in the initially loaded set of objects.  Like
+initial-exec TLS, this avoids memory allocations after thread creation.
+
+@item
+Do not use lazy binding.  Lazy binding may require run-time memory
+allocation, is not async-signal-safe, and introduces considerable
+complexity.
+
+@item
+Limit the use of indirect function (IFUNC) resolvers.  These resolvers
+run during relocation processing, when @theglibc{} is not in a fully
+consistent state.  If you use IFUNC resolvers, do not depend on external
+data or function references.
+
+@item
+Do not use the audit functionality (@code{LD_AUDIT}, @code{DT_AUDIT},
+@code{DT_DEPAUDIT}).  Its callback and hooking capabilities introduce a
+lot of complexity and subtly alter dynamic linker behavior in corner
+cases even if the audit module is inactive.
+
+@item
+Do not rely on symbol interposition, except in cases where copy
+relocations are involved.  Without symbol interposition, the exact order
+in which shared objects are searched is less relevant.
+
+@item
+One potential source of symbol interposition is a combination of static
+and dynamic linking, namely linking a static archive into multiple
+dynamic shared objects.  For such scenarios, the static library should
+be converted into its own dynamic shared object.
+
+A different approach to this situation uses hidden visibility for
+symbols in the static library, but this can cause problems if the
+library does not expect that multiple copies of its code coexist within
+the same process, with no or partial sharing of state.
+
+@item
+If you use shared objects that are linked with @code{-Wl,-Bsymbolic} (or
+equivalent) or use protected visibility, then code for the main program
+must be built as @code{-fpic} to avoid creating copy relocations (and
+the main program must not use copy relocations for other reasons).
+
+@item
+Be careful about explicit section annotations.  Make sure that the
+target section matches the properties of the declared entity (e.g., no
+writable objects in @code{.text}).
+
+@item
+Ensure that all assembler or object input files have recommended
+security markup, particularly for non-executable stack.
+
+@item
+Some features of @theglibc{} indirectly depend on run-time code loading
+and @code{dlopen}.  Use @code{iconv_open} with built-in converters only
+(such as @code{UTF-8}).  Do not use NSS functionality such as
+@code{getaddrinfo} or @code{getpwuid_r} unless the system is configured
+for built-in NSS service modules only (see below).
+@end itemize
+
+Several considerations apply to ELF constructors and destructors.
+
+@itemize @bullet
+@item
+The dynamic linker does not take constructor and destructor priorities
+into account when determining their execution order.  Priorities are
+only used by the link editor for ordering execution within a completely
+linked object.
+
+@item
+The recommendations to avoid cyclic dependencies and symbol
+interposition (and the next two recommendations) make it less likely
+that ELF objects are accessed before their ELF constructors have run (or
+after their ELF destructors have run).  However, using @code{dlsym} and
+@code{dlvsym}, it is still possible to access uninitialized facilities even
+with these restrictions in place.  Such access is also possible within a
+single shared object (or the main executable).  Consider using dynamic,
+on-demand initialization instead.  To deal with access after
+de-initialization, it may be necessary to implement special cases for
+that scenario, potentially with degraded functionality.
+
+@item
+Do not use the @code{DT_PREINIT_ARRAY} dynamic tag.
+
+@item
+Do not flag objects as @code{DF_1_INITFIRST}.
+
+@item
+If @code{dlopen} and @code{dlmopen} are not used, @code{DT_NEEDED}
+dependency information is complete, and lazy binding is disabled, the
+execution order of ELF destructors is expected to be the reverse of the
+ELF constructor order.  However, two separate dependency sort operations
+still occur.  Even though the listed preconditions should ensure that
+both sorts produce the same ordering, it is recommended not to depend on
+the destructor order being the reverse of the constructor order.
+@end itemize
+
+The following items provide C++-specific guidance for preparing
+applications.  If another programming language is used and it uses these
+toolchain features targeted at C++ to implement some language
+constructs, these restrictions and recommendations still apply in
+analogous ways.
+
+@itemize @bullet
+@item
+C++ inline functions, templates, and other constructs may need to be
+duplicated into multiple shared objects, resulting in symbol
+interposition.  (This scenario is similar to combining static and
+dynamic linking, as discussed above.)
+
+It may be possible to use explicit template instantiations to avoid
+symbol interposition arising from multiple copies of the same template.
+Hidden visibility could be used as well, but this may cause problems if
+the duplicated functionality cannot deal with multiple copies within the
+same process, with no or only partial state sharing.
+
+In general, due to the way certain C++ features are implemented in the
+toolchain, avoiding symbol interposition entirely may not be possible as
+long as these C++ features are used.
+
+@item
+The toolchain and dynamic linker have multiple mechanisms that bypass
+the usual symbol binding procedures.  This means that the C++ one
+definition rule (ODR) still holds even if certain symbol-based isolation
+mechanisms are used, and object addresses are not shared across
+translation units with incompatible type definitions.  This does not
+matter if the previous advice regarding symbol interposition is
+followed.  However, as the advice may be difficult to implement, it is
+necessary to avoid ODR violations across the entire process image.
+
+@item
+Be aware that as a special case of interposed symbols, symbols with the
+@code{STB_GNU_UNIQUE} type do not follow the usual symbol name space
+isolation rules: such symbols bind across @code{RTLD_LOCAL} and
+@code{dlmopen} boundaries.  Furthermore, symbol versioning is ignored
+for such symbols; they are bound by symbol name only.  All their
+definitions and uses must therefore be compatible.  Hidden visibility
+still prevents the creation of @code{STB_GNU_UNIQUE} symbols and can
+achieve isolation of incompatible definitions.
+
+@item
+C++ exception handling and run-time type information (RTTI), as
+implemented in the GNU toolchain, are not address-significant, and
+therefore are not affected by the symbol binding behaviour of the dynamic
+linker.  This means that types of the same fully-qualified name (in
+non-anonymous name spaces) are always considered the same from an
+exception-handling or RTTI perspective.  This is true if the type
+information object or vtable has hidden symbol visibility, or the
+corresponding symbols are versioned under different symbol versions, or
+the symbols are not bound to the same objects due to the use of
+@code{RTLD_LOCAL} or @code{dlmopen}.
+
+This can cause issues in applications that contain multiple incompatible
+definitions of the same type.
+
+@item
+C++ exception handling across multiple @code{dlmopen} name spaces may
+not work, particularly with the unwinder in GCC versions before 12.
+Current toolchain versions are able to process unwinding tables across
+@code{dlmopen} boundaries.  However, note that type comparison is
+name-based, not address-based (see the previous item), so exception
+types may still be matched in unexpected ways.  An important special
+case of exception handling, invoking destructors for variables of block
+scope, is not impacted by this RTTI type-sharing.  Likewise, regular
+virtual member function dispatch for objects is unaffected (but still
+requires that the type definitions match in all directly involved
+translation units).
+
+@item
+The Itanium C++ ABI requires that in some cases, global objects that
+are constructed on demand are not destroyed in the opposite order of
+their construction.  (The C++ standard requires destruction in the
+opposite order of construction.)  As a result, do not depend on the precise
+destructor invocation order.
+
+@item
+Registering destructors for later invocation allocates memory and may
+result in process termination if insufficient memory is available.  In
+particular, this applies to dynamic initialization of a block variable
+with static storage duration of a type that has a non-trivial destructor.
+If unexpected program termination is a concern, ensure that such
+objects merely have trivial destructors, avoiding the need for
+registration.
+
+This includes @code{thread_local} objects with destructors.  Callbacks
+for destruction of per-thread objects can be registered using
+@code{pthread_key_create} in a way that permits handling memory
+allocation errors.
+@end itemize
+
+
+@subsection Producing Matching Binaries
+
+This subsection recommends tools and build flags for producing
+applications that meet the recommendations of the previous subsection.
+
+@itemize @bullet
+@item
+Use BFD ld (@command{ld.bfd}) from GNU binutils to produce binaries,
+invoked through a compiler driver such as @command{gcc}.  The version
+should not be too far ahead of what was current when the version of
+@theglibc{} was first released.
+
+@item
+Do not use a binutils release that is older than the one used to build
+@theglibc{} itself.
+
+@item
+Compile with @code{-ftls-model=initial-exec} to force the initial-exec
+TLS model.
+
+@item
+Link with @option{-Wl,-z,now} to disable lazy binding.
+
+@item
+Link with @option{-Wl,-z,relro} to enable RELRO (which is the default on
+most targets).
+
+@item
+Specify all direct shared object dependencies using @code{-l} options
+to avoid underlinking.  Rely on @code{.so} files (which can be linker
+scripts) and searching with the @option{-l} option.  Do not specify the
+file names of shared objects on the linker command line.
+
+@item
+Consider using @option{-Wl,-z,defs} to treat underlinking as an error
+condition.
+
+@item
+For a shared object (linked with @option{-shared}), use
+@option{-Wl,-soname,lib@dots{}} to set a soname that matches the final
+installed name of the file.
+
+@item
+Do not use the @option{-rpath} linker option.
+
+@item
+Use @option{-Wl,--error-rwx-segments} and @option{-Wl,--error-execstack} to
+instruct the link editor to fail the link if the resulting final object
+would have read-write-execute segments or an executable stack.  Such
+issues usually indicate that the input files are not marked up
+correctly.
+
+@item
+Ensure that for each @code{LOAD} segment in the ELF program header, file
+offsets, memory sizes, and load addresses are multiples of the largest
+page size supported at run time.  Similarly, the start address and size
+of the @code{GNU_RELRO} range should be multiples of the page size.
+
+Avoid creating gaps between @code{LOAD} segments.  The difference
+between the load addresses of two subsequent @code{LOAD} segments should
+be the size of the first @code{LOAD} segment.  (This may require linking
+with @option{-Wl,-z,noseparate-code}.)
+
+This may not be possible to achieve with currently available link
+editors.
+
+@item
+If the multiple-of-page-size criterion for the @code{GNU_RELRO} region
+cannot be achieved, ensure that the process memory image right before
+the start of the region does not contain executable or writable memory.
+@c https://sourceware.org/pipermail/libc-alpha/2022-May/138638.html
+@end itemize
+
+@subsection Checking Binaries
+
+In some cases, if the previous recommendations are not followed, this
+can be determined from the produced binaries.  This section contains
+suggestions for verifying aspects of these binaries.
+
+@itemize @bullet
+@item
+To detect underlinking, examine the dynamic symbol table, for example
+using @samp{readelf -sDW}.  If the symbol is defined in a shared object
+that uses symbol versioning, it must carry a symbol version, as in
+@samp{pthread_kill@@GLIBC_2.34}.
+
+@item
+Examine the dynamic segment with @samp{readelf -dW} to check that all
+the required @code{NEEDED} entries are present.  (It is not necessary to
+list indirect dependencies.)
+
+@item
+The @code{NEEDED} entries should not contain full path names including
+slashes, only @code{sonames}.
+
+@item
+For a further consistency check, collect all shared objects referenced
+via @code{NEEDED} entries in dynamic segments, transitively, starting at
+the main program.  Then determine their dynamic symbol tables (using
+@samp{readelf -sDW}, for example).
+
+Ideally, every symbol should be defined at most once.
+
+If there are interposed data symbols, check if the single interposing
+definition is in the main program.  In this case, there must be a copy
+relocation for it.  (This only applies to targets with copy relocations.)
+
+Function symbols should never be interposed.
+
+@item
+Using the previously collected @code{NEEDED} entries, check that the
+dependency graph does not contain any cycles.
+
+@item
+The dynamic segment should also mention @code{BIND_NOW} on the
+@code{FLAGS} line or @code{NOW} on the @code{FLAGS_1} line (one is
+enough).
+
+@item
+For shared objects (not main programs), if the program header has a
+@code{PT_TLS} segment, the dynamic segment (as shown by @samp{readelf
+-dW}) should contain the @code{STATIC_TLS} flag on the @code{FLAGS}
+line.
+
+If @code{STATIC_TLS} is missing in shared objects, ensure that the
+appropriate relocations for GNU2 TLS descriptors are used (for example,
+@code{R_AARCH64_TLSDESC} or @code{R_X86_64_TLSDESC}).
+
+@item
+There should not be a reference to the @code{__tls_get_addr} symbol in
+the dynamic symbol table (in the @samp{readelf -sDW} output).
+Thread-local storage must be accessed using the initial-exec (static)
+model, or using GNU2 TLS descriptors.
+
+@item
+Likewise, the functions @code{dlopen}, @code{dlmopen}, @code{dlclose}
+should not be referenced from the dynamic symbol table.
+
+@item
+For shared objects, there should be a @code{SONAME} entry that matches
+the file name (the base name, i.e., the part after the last slash).  The
+@code{SONAME} string must not contain a slash @samp{/}.
+
+@item
+For all objects, the dynamic segment (as shown by @samp{readelf -dW})
+should not contain @code{RPATH} or @code{RUNPATH} entries.
+
+@item
+Likewise, the dynamic segment should not show any @code{AUDIT},
+@code{DEPAUDIT}, @code{AUXILIARY}, @code{FILTER}, or
+@code{PREINIT_ARRAY} tags.
+
+@item
+The @code{DF_1_INITFIRST} flag should not be used.
+
+@item
+The program header must not have @code{LOAD} segments that are writable
+and executable at the same time.
+
+@item
+All produced objects should have a @code{GNU_STACK} program header that is
+not marked as executable.  (However, on some newer targets, a
+non-executable stack is the default, so the @code{GNU_STACK} program
+header is not required.)
+@end itemize
+
+@subsection Run-time Considerations
+
+@itemize @bullet
+@item
+Install shared objects using their sonames in a default search path
+directory (usually @file{/usr/lib64}).  Do not use symbolic links.
+@c This is currently not standard practice.
+
+@item
+The default search path must not contain objects with duplicate file
+names or sonames.
+
+@item
+Do not use environment variables (@code{LD_@dots{}} variables such as
+@code{LD_PRELOAD} or @code{LD_LIBRARY_PATH}, or @code{GLIBC_TUNABLES})
+to change default dynamic linker behavior.
+
+@item
+Do not install shared objects in non-default locations.  (Such locations
+are listed explicitly in the configuration file for @command{ldconfig},
+usually @file{/etc/ld.so.conf}, or in files included from there.)
+
+@item
+Do not configure dynamically-loaded NSS service modules, to avoid
+accidental internal use of the @code{dlopen} facility.  The @code{files}
+and @code{dns} modules are built in and do not rely on @code{dlopen}.
+
+@item
+Do not truncate and overwrite files containing programs and shared
+objects in place, while they are used.  Instead, write the new version
+to a different path and use @code{rename} to replace the
+already-installed version.
+
+@item
+Be aware that during an update procedure that involves multiple object
+files (shared objects and main programs), concurrently starting
+processes may observe an inconsistent combination of object files (some
+already updated, some still at the previous version).
+@end itemize
+
 
 @c FIXME these are undocumented:
 @c dladdr