[v4] manual: Describe struct link_map, support link maps with dlinfo
Checks
Context | Check | Description
redhat-pt-bot/TryBot-32bit | success | Build for i686
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Test passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Test passed
Commit Message
This does not describe how to use RTLD_DI_ORIGIN and l_name
to reconstruct a full path for an object. The reason
is that I think we should not recommend further use of
RTLD_DI_ORIGIN due to its buffer overflow potential. This
should be covered by another dlinfo extension. It would
also obsolete the need for the dladdr approach to obtain
the file name for the main executable.
Obtaining the lowest address from load segments in program
headers is quite clumsy and should be provided directly
via dlinfo.
---
v3: (unfortunately labeled v2) was not tested properly and had
a Texinfo glitch. No other changes.
manual/dynlink.texi | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 105 insertions(+), 2 deletions(-)
base-commit: a0ecbb45969e93ec5eb6ba0d1f0a5578bdb2e54c
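For readers unfamiliar with the program-header approach that the commit message calls clumsy, here is a minimal sketch of it for the main program on Linux. It assumes the `AT_PHDR`/`AT_PHNUM` route described in the patch text. The helper name `lowest_load_address` is hypothetical and not part of glibc, and the zero-bias fallback when no `PT_PHDR` entry exists is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <link.h>      /* ElfW, PT_LOAD, PT_PHDR */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval, AT_PHDR, AT_PHNUM */

/* Hypothetical helper (not a glibc function): returns the lowest run-time
   address covered by a PT_LOAD segment of the program described by the
   auxiliary vector, or 0 if it cannot be determined.  */
uintptr_t
lowest_load_address (void)
{
  const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval (AT_PHDR);
  size_t phnum = getauxval (AT_PHNUM);
  if (phdr == NULL || phnum == 0)
    return 0;

  /* The load bias is the difference between where the program header
     table actually is and where PT_PHDR says it is.  Assumption: without
     a PT_PHDR entry the program is position-dependent (bias zero).  */
  uintptr_t bias = 0;
  for (size_t i = 0; i < phnum; ++i)
    if (phdr[i].p_type == PT_PHDR)
      {
        bias = (uintptr_t) phdr - phdr[i].p_vaddr;
        break;
      }

  /* Find the smallest relocated start address among the load segments.  */
  uintptr_t lowest = UINTPTR_MAX;
  int found = 0;
  for (size_t i = 0; i < phnum; ++i)
    if (phdr[i].p_type == PT_LOAD)
      {
        uintptr_t start = bias + phdr[i].p_vaddr;
        if (start < lowest)
          lowest = start;
        found = 1;
      }
  return found ? lowest : 0;
}
```

The two passes over the header array illustrate why a direct dlinfo request for this value would be more convenient.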
Comments
On Thu, 8 Aug 2024, Florian Weimer wrote:
> +exposed through the @code{stuct link_map}. Applications must not modify
s/stuct/struct/
* Joseph Myers:
> On Thu, 8 Aug 2024, Florian Weimer wrote:
>
>> +exposed through the @code{stuct link_map}. Applications must not modify
>
> s/stuct/struct/
Thanks, fixed locally.
Florian
On 8/8/24 9:23 AM, Florian Weimer wrote:
> This does not describe how to use RTLD_DI_ORIGIN and l_name
> to reconstruct a full path for the an object. The reason
> is that I think we should not recommend further use of
> RTLD_DI_ORIGIN due to its buffer overflow potential. This
> should be covered by another dlinfo extension.
So should I explicitly request this? With a feature request like:
RTLD_DI_ORIGIN2 (struct origin_string *)
Copy a \0-terminated pathname of the origin of the shared object
corresponding to handle to the location pointed to by info->buf, up to
the size of the buffer specified by info->buf_size. If the length of the
pathname is larger than info->buf_size, then buf_size will be set to -1;
otherwise buf_size is left as is.
struct origin_string {
  size_t buf_size; /* the size of the buffer pointed to by buf */
  char *buf;       /* a buffer to store the origin into */
};
-ben
> It would
> also obsolete the need for the dladdr approach to obtain
> the file name for the main executable.
>
> Obtaining the lowest address from load segments in program
> headers is quite clumsy and should be provided directly
> via dlinfo.
>
> ---
> v3: (unfortunately labeled v2) was not tested properly and had
> a Texinfo glitch. No other changes.
> manual/dynlink.texi | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 105 insertions(+), 2 deletions(-)
>
> diff --git a/manual/dynlink.texi b/manual/dynlink.texi
> index 1500a53de6..82ef5a1706 100644
> --- a/manual/dynlink.texi
> +++ b/manual/dynlink.texi
> @@ -352,16 +352,119 @@ support the XGETBV instruction.
> @node Dynamic Linker Introspection
> @section Dynamic Linker Introspection
>
> -@Theglibc{} provides various functions for querying information from the
> +@Theglibc{} provides various facilities for querying information from the
> dynamic linker.
>
> +@deftp {Data Type} {struct link_map}
> +
> +@cindex link map
> +A @dfn{link map} is associated with the main executable and each shared
> +object. Some fields of the link map are accesible to applications and
> +exposed through the @code{stuct link_map}. Applications must not modify
> +the link map directly.
> +
> +Pointers to link maps can be obtained from the @code{_r_debug} variable,
> +from the @code{RTLD_DI_LINKMAP} request for @code{dlinfo}, and from the
> +@code{_dl_find_object} function. See below for details.
> +
> +@table @code
> +@item l_addr
> +@cindex load address
> +This field contains the @dfn{load address} of the object. This is the
> +offset that needs to be applied to unrelocated addresses in the object
> +image (as to stored on disk) to form an address that can be used at run
> +time for accessing data or running code. For position-dependent
> +executables, the load address is typically zero, and no adjustment is
> +required. For position-independent objects, the @code{l_addr} field
> +usually contains the address of the object's ELF header in the process
> +image. However, this correspondence is not guaranteed because the ELF
> +header might not be mapped at all, and the ELF file as stored on disk
> +might use zero as the lowest virtual address. Due to the second
> +variable, values of the @code{l_addr} field do not necessarily uniquely
> +identify a shared object.
> +
> +On Linux, to obtain the lowest loaded address of the main program, use
> +@code{getauxval} to obtain the @code{AT_PHDR} and @code{AT_PHNUM} values
> +for the current process. Alternatively, call
> +@samp{dlinfo (_r_debug.r_map, &@var{phdr})}
> +to obtain the number of program headers, and the address of the program
> +header array will be stored in @var{phdr}
> +(of type @code{const ElfW(Phdr) *}, as explained below).
> +These values allow processing the array of program headers and the
> +address information in the @code{PT_LOAD} entries among them.
> +This works even when the program was started with an explicit loader
> +invocation.
> +
> +@item l_name
> +For a shared object, this field contains the file name that the
> +@theglibc{} dynamic loader used when opening the object. This can be a
> +relative path (relative the current directory at process start, or when
> +the object was loaded later, via @code{dlopen} or @code{dlmopen}).
> +Symbolic links are not necessarily resolved.
> +
> +For the main executable, @code{l_name} is @samp{""} (the empty string).
> +(The main executable is not loaded by @theglibc{}, so its file name is
> +not available.) On Linux, the main executable is available as
> +@file{/proc/self/exe} (unless an explicit loader invocation was used to
> +start the program). The file name @file{/proc/self/exe} continues to
> +resolve to the same file even if it is moved within or deleted from the
> +file system. Its current location can be read using @code{readlink}.
> +@xref{Symbolic Links}. (Although @file{/proc/self/exe} is not actually
> +a symbol link, it is only presented as one.) Note that @file{/proc} may
> +not be mounted, in which case @file{/proc/self/exe} is not available.
> +
> +If an explicit loader invocation is used (such as @samp{ld.so
> +/usr/bin/emacs}), the @file{/proc/self/exe} approach does not work
> +because the file name refers to the dynamic linker @code{ld.so}, and not
> +the @code{/usr/bin/emacs} program. An approximation to the executable
> +path is still available in the @code{@var{info}.dli_fname} member after
> +calling @samp{dladdr (_r_debug.r_map->l_ld, &@var{info})}. Note that
> +this could be a relative path, and it is supplied by the process that
> +created the current process, not the kernel, so it could be inaccurate.
> +
> +@item l_ld
> +This is a pointer to the ELF dynamic segment, an array of tag/value
> +pairs that provide various pieces of information that the dynamic
> +linking process uses. On most architectures, addreses in the dynamic
> +segment are relocated at run time, but on some architectures and in some
> +run-time configurations, it is necessary to add the @code{l_addr} field
> +value to obtain a proper address.
> +
> +@item l_prev
> +@itemx l_next
> +These fields are used to main a double-linked linked list of all link
> +maps within on @code{dlmopen} namespace. Note that there is currently
> +no thread-safe way to iteratoe over this list. The callback-based
> +@code{dl_iterate_phdr} interface can be used instead.
> +@end table
> +@end deftp
> +
> +@strong{Portability note:} It is not possible to create a @code{struct
> +link_map} object and pass a pointer to a function that expects a
> +@code{struct link_map *} argument. Only link map pointers initially
> +supplied by @theglibc{} are permitted as arguments. In current versions
> +of @theglibc{}, handles returned by @code{dlopen} and @code{dlmopen} are
> +pointers to link maps. However, this is not a portable assumption, and
> +may even change in future versions of @theglibc{}. To obtain the link
> +map associated with a handle, see @code{dlinfo} and
> +@code{RTLD_DI_LINKMAP} below. If a function accepts both
> +@code{dlopen}/@code{dlmopen} handles and @code{struct link_map} pointers
> +in its @code{void *} argument, that is documented explicitly.
> +
> +@subsection Querying information for loaded objects
> +
> +The @code{dlinfo} function provides access to internal information
> +associated with @code{dlopen}/@code{dlmopen} handles and link maps.
> +
> @deftypefun {int} dlinfo (void *@var{handle}, int @var{request}, void *@var{arg})
> @safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{}}}
> @standards{GNU, dlfcn.h}
> This function returns information about @var{handle} in the memory
> location @var{arg}, based on @var{request}. The @var{handle} argument
> must be a pointer returned by @code{dlopen} or @code{dlmopen}; it must
> -not have been closed by @code{dlclose}.
> +not have been closed by @code{dlclose}. Alternatively, @var{handle}
> +can be a @code{struct link_map *} value for a link map of an object
> +that has not been closed.
>
> On success, @code{dlinfo} returns 0 for most request types; exceptions
> are noted below. If there is an error, the function returns @math{-1},
>
> base-commit: a0ecbb45969e93ec5eb6ba0d1f0a5578bdb2e54c
>
On 08/08/24 15:22, Ben Woodard wrote:
>
> On 8/8/24 9:23 AM, Florian Weimer wrote:
>> This does not describe how to use RTLD_DI_ORIGIN and l_name
>> to reconstruct a full path for the an object. The reason
>> is that I think we should not recommend further use of
>> RTLD_DI_ORIGIN due to its buffer overflow potential. This
>> should be covered by another dlinfo extension.
>
> So should I explicitly request this? With a feature request like:
>
> RTLD_DI_ORIGIN2 (struct origin_string *)
>
> Copy a \0 terminated pathname of the origin of the shared object corresponding to handle to the location pointed to by info->buf up to the size of the buffer specified by info->buf_size. If the length of the pathname is larger than info->bufsize then buf_size will be set to -1 otherwise buf_size is left as is.
>
> struct origin_string {
> size_t buf_size; /* the size of the buffer pointed to by buf */
> char *buf; /* a buffer to store the origin into */
>
> };
I think it would be better to add support for querying the required size,
since in theory we do support paths larger than PATH_MAX:
  struct origin_string os = { .buf = NULL, .buf_size = 0 };
  assert (dlinfo (handle, RTLD_DI_ORIGIN2, &os) == 0);
  os.buf = malloc (os.buf_size);
  assert (os.buf != NULL);
  assert (dlinfo (handle, RTLD_DI_ORIGIN2, &os) == 0);
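The two-call pattern suggested here (query the size, allocate, then fetch) can be illustrated without the dynamic linker. `RTLD_DI_ORIGIN2` does not exist in glibc; it is only a proposal in this thread. The sketch below therefore uses a hypothetical stand-in function, `mock_origin2`, that takes the origin string as a parameter. Treating `buf == NULL` as a size query is an assumption drawn from the snippet above, not a specified interface.

```c
#include <stddef.h>
#include <string.h>

/* Layout proposed in the thread for the hypothetical RTLD_DI_ORIGIN2.  */
struct origin_string
{
  size_t buf_size;  /* size of the buffer pointed to by buf */
  char *buf;        /* buffer that receives the origin */
};

/* Hypothetical stand-in for dlinfo (handle, RTLD_DI_ORIGIN2, &os).
   ORIGIN plays the role of the string held by the dynamic linker.
   Returns 0 on success, -1 if the supplied buffer is too small.  */
int
mock_origin2 (const char *origin, struct origin_string *os)
{
  size_t needed = strlen (origin) + 1;  /* include the terminating NUL */
  if (os->buf == NULL)
    {
      os->buf_size = needed;            /* size query: report and succeed */
      return 0;
    }
  if (os->buf_size < needed)
    return -1;                          /* never write past the buffer */
  memcpy (os->buf, origin, needed);
  return 0;
}
```

Unlike the existing RTLD_DI_ORIGIN request, the callee here always knows the buffer size, so it can refuse to write rather than overflow.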
* Ben Woodard:
> On 8/8/24 9:23 AM, Florian Weimer wrote:
>> This does not describe how to use RTLD_DI_ORIGIN and l_name
>> to reconstruct a full path for the an object. The reason
>> is that I think we should not recommend further use of
>> RTLD_DI_ORIGIN due to its buffer overflow potential. This
>> should be covered by another dlinfo extension.
>
> So should I explicitly request this? With a feature request like:
>
> RTLD_DI_ORIGIN2 (struct origin_string *)
>
> Copy a \0 terminated pathname of the origin of the shared object
> corresponding to handle to the location pointed to by info->buf up to
> the size of the buffer specified by info->buf_size. If the length of
> the pathname is larger than info->bufsize then buf_size will be set to
> -1 otherwise buf_size is left as is.
>
> struct origin_string {
> size_t buf_size; /* the size of the buffer pointed to by buf */
> char *buf; /* a buffer to store the origin into */
>
> };
In the current implementation, we do not even need to make a copy. We
could return the pointer directly.
We need another dlinfo request for the non-relative pathname of the
object. We already compute that (except for the main program), but then
truncate it to get the $ORIGIN value. What if we just return this
pathname, untruncated, and also a byte count that gives the length of
the $ORIGIN prefix (up to the last '/')?
This assumes that $ORIGIN and the full pathname are closely related, but
I don't think we can change this behavior today, although it is
inconsistent: $ORIGIN for the main executable comes from /proc/self/exe
and resolves symbolic links. $ORIGIN for shared objects is the dlopen
argument, the pathname from /etc/ld.so.cache, or some search path
element concatenated with the dlopen argument—without symbolic link
resolution. Adding symbolic link resolution to $ORIGIN is a
backwards-incompatible change for shared object, and removing it from
the main executable is as well.
The latter would break /usr/bin/java on several Linux distributions. We
actually have a bug that “ld.so /usr/bin/java” does not work.
Thanks,
Florian
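To make the "untruncated pathname plus byte count" idea concrete, here is a small sketch of how such a count could be derived from a pathname. The function name `origin_prefix_length` is hypothetical. The handling of paths without a directory part, and of objects directly under the root directory, is an assumption of this sketch, not something the thread specifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of leading bytes of PATHNAME
   that form the $ORIGIN-style directory prefix (up to, but excluding,
   the last '/').  Assumptions of this sketch: a pathname without any
   '/' has an empty prefix, and an object directly under the root
   directory has the one-byte prefix "/".  */
size_t
origin_prefix_length (const char *pathname)
{
  const char *slash = strrchr (pathname, '/');
  if (slash == NULL)
    return 0;                        /* no directory part */
  if (slash == pathname)
    return 1;                        /* object directly under "/" */
  return (size_t) (slash - pathname);
}
```

With a result like this, a caller could obtain both the full pathname and the origin from a single returned pointer, without any copy into a caller-supplied buffer.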
On 8/8/24 9:23 AM, Florian Weimer wrote:
> This does not describe how to use RTLD_DI_ORIGIN and l_name
> to reconstruct a full path for the an object. The reason
> is that I think we should not recommend further use of
> RTLD_DI_ORIGIN due to its buffer overflow potential. This
> should be covered by another dlinfo extension. It would
> also obsolete the need for the dladdr approach to obtain
> the file name for the main executable.
>
> Obtaining the lowest address from load segments in program
> headers is quite clumsy and should be provided directly
> via dlinfo.
>
> ---
> v3: (unfortunately labeled v2) was not tested properly and had
> a Texinfo glitch. No other changes.
> manual/dynlink.texi | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 105 insertions(+), 2 deletions(-)
>
> diff --git a/manual/dynlink.texi b/manual/dynlink.texi
> index 1500a53de6..82ef5a1706 100644
> --- a/manual/dynlink.texi
> +++ b/manual/dynlink.texi
> @@ -352,16 +352,119 @@ support the XGETBV instruction.
> @node Dynamic Linker Introspection
> @section Dynamic Linker Introspection
>
> -@Theglibc{} provides various functions for querying information from the
> +@Theglibc{} provides various facilities for querying information from the
> dynamic linker.
>
> +@deftp {Data Type} {struct link_map}
> +
> +@cindex link map
> +A @dfn{link map} is associated with the main executable and each shared
> +object. Some fields of the link map are accesible to applications and
> +exposed through the @code{stuct link_map}. Applications must not modify
> +the link map directly.
> +
> +Pointers to link maps can be obtained from the @code{_r_debug} variable,
> +from the @code{RTLD_DI_LINKMAP} request for @code{dlinfo}, and from the
> +@code{_dl_find_object} function. See below for details.
> +
> +@table @code
> +@item l_addr
> +@cindex load address
> +This field contains the @dfn{load address} of the object. This is the
> +offset that needs to be applied to unrelocated addresses in the object
> +image (as to stored on disk) to form an address that can be used at run
s/as to stored on disk/as stored on disk/
> +time for accessing data or running code. For position-dependent
> +executables, the load address is typically zero, and no adjustment is
> +required. For position-independent objects, the @code{l_addr} field
> +usually contains the address of the object's ELF header in the process
> +image. However, this correspondence is not guaranteed because the ELF
> +header might not be mapped at all, and the ELF file as stored on disk
> +might use zero as the lowest virtual address. Due to the second
> +variable, values of the @code{l_addr} field do not necessarily uniquely
> +identify a shared object.
> +
> +On Linux, to obtain the lowest loaded address of the main program, use
> +@code{getauxval} to obtain the @code{AT_PHDR} and @code{AT_PHNUM} values
> +for the current process. Alternatively, call
> +@samp{dlinfo (_r_debug.r_map, &@var{phdr})}
> +to obtain the number of program headers, and the address of the program
> +header array will be stored in @var{phdr}
> +(of type @code{const ElfW(Phdr) *}, as explained below).
> +These values allow processing the array of program headers and the
> +address information in the @code{PT_LOAD} entries among them.
> +This works even when the program was started with an explicit loader
> +invocation.
> +
> +@item l_name
> +For a shared object, this field contains the file name that the
> +@theglibc{} dynamic loader used when opening the object. This can be a
> +relative path (relative the current directory at process start, or when
> +the object was loaded later, via @code{dlopen} or @code{dlmopen}).
> +Symbolic links are not necessarily resolved.
> +
> +For the main executable, @code{l_name} is @samp{""} (the empty string).
> +(The main executable is not loaded by @theglibc{}, so its file name is
> +not available.) On Linux, the main executable is available as
> +@file{/proc/self/exe} (unless an explicit loader invocation was used to
> +start the program). The file name @file{/proc/self/exe} continues to
> +resolve to the same file even if it is moved within or deleted from the
> +file system. Its current location can be read using @code{readlink}.
> +@xref{Symbolic Links}. (Although @file{/proc/self/exe} is not actually
> +a symbol link, it is only presented as one.) Note that @file{/proc} may
> +not be mounted, in which case @file{/proc/self/exe} is not available.
> +
It seems like you are spending a very long time explaining how to
recover this in various situations. Wouldn't it be much easier to encode
that logic into glibc itself and use that to populate l_name for the
executable?
Furthermore, if it becomes too cumbersome to handle all the corner cases,
like non-Linux kernels or when /proc is not mounted, then glibc is much
better placed than tool authors to request that the kernel provide
the information through some other mechanism.
I seriously doubt that any tools rely on link_map->l_name being ""
in the case of the original binary.
> +If an explicit loader invocation is used (such as @samp{ld.so
> +/usr/bin/emacs}),
Isn't the explicit loader invocation just a case where the loader is the
program, and the first thing that it does is begin the process of loading
argv[1] as the first object as it gets ready to run? Other than the fact
that ld.so doesn't have an INTERP program header, in what way is it
different from any other wrapped command, like "sudo ls" or "strace ls"
or "watch ls"?
> the @file{/proc/self/exe} approach does not work
> +because the file name refers to the dynamic linker @code{ld.so}, and not
> +the @code{/usr/bin/emacs} program. An approximation to the executable
> +path is still available in the @code{@var{info}.dli_fname} member after
> +calling @samp{dladdr (_r_debug.r_map->l_ld, &@var{info})}. Note that
> +this could be a relative path, and it is supplied by the process that
> +created the current process, not the kernel, so it could be inaccurate.
> +
> +@item l_ld
> +This is a pointer to the ELF dynamic segment, an array of tag/value
> +pairs that provide various pieces of information that the dynamic
> +linking process uses. On most architectures, addreses in the dynamic
> +segment are relocated at run time, but on some architectures and in some
> +run-time configurations, it is necessary to add the @code{l_addr} field
> +value to obtain a proper address.
> +
> +@item l_prev
> +@itemx l_next
> +These fields are used to main a double-linked linked list of all link
> +maps within on @code{dlmopen} namespace. Note that there is currently
> +no thread-safe way to iteratoe over this list. The callback-based
> +@code{dl_iterate_phdr} interface can be used instead.
I'm slightly confused here. My understanding of dl_iterate_phdr() is
that it will stay within the primary namespace and will not traverse any
of the private namespaces. Therefore I do not understand what you mean
when you say "instead". It sounds like we need some additional functions
to fully handle other namespaces:
  int dl_iterate_phdr_lmid (Lmid_t lmid,
                            int (*callback) (struct dl_phdr_info *info,
                                             size_t size, void *data),
                            void *data);
which is the same as dl_iterate_phdr() except that it iterates through
a specified lmid, or
  int dl_iterate_phdr_all (int (*callback) (struct dl_phdr_info *info,
                                            size_t size, void *data),
                           void *data);
which traverses all of the namespaces, not just the default one. What is
your preferred solution to remove this limitation now that more and more
applications are using dlmopen() and tools are migrating to the LD_AUDIT
tool interface?
Presumably the thing that makes dl_iterate_phdr() thread safe is that it
takes a lock internal to ld.so so that it has a consistent view while
iterating. That suggests to me that this fact should be documented
saying something to the effect:
1) Any other threads trying to do dynamic loader operations will be
forced to wait until the completion of this call.
2) No dynamic loader operations are permitted within the callback function.
> +@end table
> +@end deftp
> +
> +@strong{Portability note:} It is not possible to create a @code{struct
> +link_map} object and pass a pointer to a function that expects a
> +@code{struct link_map *} argument. Only link map pointers initially
> +supplied by @theglibc{} are permitted as arguments. In current versions
> +of @theglibc{}, handles returned by @code{dlopen} and @code{dlmopen} are
> +pointers to link maps. However, this is not a portable assumption, and
> +may even change in future versions of @theglibc{}. To obtain the link
> +map associated with a handle, see @code{dlinfo} and
> +@code{RTLD_DI_LINKMAP} below. If a function accepts both
> +@code{dlopen}/@code{dlmopen} handles and @code{struct link_map} pointers
> +in its @code{void *} argument, that is documented explicitly.
> +
> +@subsection Querying information for loaded objects
> +
> +The @code{dlinfo} function provides access to internal information
> +associated with @code{dlopen}/@code{dlmopen} handles and link maps.
> +
That is a clever solution rather than making a function that converts a
link_map* into a void* RTLD handle.
> @deftypefun {int} dlinfo (void *@var{handle}, int @var{request}, void *@var{arg})
> @safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{}}}
> @standards{GNU, dlfcn.h}
> This function returns information about @var{handle} in the memory
> location @var{arg}, based on @var{request}. The @var{handle} argument
> must be a pointer returned by @code{dlopen} or @code{dlmopen}; it must
> -not have been closed by @code{dlclose}.
> +not have been closed by @code{dlclose}. Alternatively, @var{handle}
> +can be a @code{struct link_map *} value for a link map of an object
> +that has not been closed.
>
> On success, @code{dlinfo} returns 0 for most request types; exceptions
> are noted below. If there is an error, the function returns @math{-1},
>
> base-commit: a0ecbb45969e93ec5eb6ba0d1f0a5578bdb2e54c
>
* Ben Woodard:
>> +@table @code
>> +@item l_addr
>> +@cindex load address
>> +This field contains the @dfn{load address} of the object. This is the
>> +offset that needs to be applied to unrelocated addresses in the object
>> +image (as to stored on disk) to form an address that can be used at run
>
> s/as to stored on disk/as stored on disk/
Thanks, fixed locally.
>> +@item l_name
>> +For a shared object, this field contains the file name that the
>> +@theglibc{} dynamic loader used when opening the object. This can be a
>> +relative path (relative the current directory at process start, or when
>> +the object was loaded later, via @code{dlopen} or @code{dlmopen}).
>> +Symbolic links are not necessarily resolved.
>> +
>> +For the main executable, @code{l_name} is @samp{""} (the empty string).
>> +(The main executable is not loaded by @theglibc{}, so its file name is
>> +not available.) On Linux, the main executable is available as
>> +@file{/proc/self/exe} (unless an explicit loader invocation was used to
>> +start the program). The file name @file{/proc/self/exe} continues to
>> +resolve to the same file even if it is moved within or deleted from the
>> +file system. Its current location can be read using @code{readlink}.
>> +@xref{Symbolic Links}. (Although @file{/proc/self/exe} is not actually
>> +a symbol link, it is only presented as one.) Note that @file{/proc} may
>> +not be mounted, in which case @file{/proc/self/exe} is not available.
>> +
>
> It seems like you are spending a very long time explaining how to
> recover this in various situations. Wouldn't it be much easier to
> encode that logic into glibc itself and use that to populate l_name
> for the executable.
Yes, this seems like a relatively easy change on the glibc side that
could benefit some advanced applications. I can drop all these details
from the patch.
> I seriously doubt that any tools rely on the link_map->l_name to be ""
> in the case of the original binary.
There's code in GDB that might go wrong as a result. See
svr4_read_so_list in gdb/solib-svr4.c. I don't know what the exact
implications are.
On the other hand, we used to have l_name != "" for explicit ld.so
invocations, and that only broke certain glibc tests, and no
applications (as far as we were aware).
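For context on what tools see today, here is a minimal sketch that inspects the link map list through `_r_debug.r_map`, one of the sources of link map pointers named in the patch. The helper names are hypothetical. The traversal through `l_next` is not thread-safe, as the patch text itself warns, so this is only suitable while no other thread can load or unload objects.

```c
#include <link.h>    /* struct link_map, _r_debug */
#include <stddef.h>

/* Hypothetical helper: counts the link maps reachable from
   _r_debug.r_map by following l_next.  Not thread-safe: a concurrent
   dlopen or dlclose may modify the list during the walk.  */
size_t
count_link_maps (void)
{
  size_t n = 0;
  for (const struct link_map *l = _r_debug.r_map; l != NULL; l = l->l_next)
    ++n;
  return n;
}

/* Hypothetical helper: returns l_name of the first link map, which is
   the main executable in the default namespace, or NULL if the list
   is empty.  */
const char *
main_map_name (void)
{
  const struct link_map *l = _r_debug.r_map;
  return l != NULL ? l->l_name : NULL;
}
```

A tool that checks `main_map_name ()[0] == '\0'` to recognize the main executable is the kind of consumer that populating l_name might affect.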
>> +If an explicit loader invocation is used (such as @samp{ld.so
>> +/usr/bin/emacs}),
> Isn't the explicit loader invocation just a case where the loader is
> the program and the first thing that it does is begin the process of
> loading argv[1] as the first object as it gets ready run. Other than
> the fact that ld.so doesn't have a INTERP program header, in what way
> is it different than any other wrapped command. like "sudo ls" or
> "strace ls" or "watch ls".
It's different in that we cannot rewrite /proc/self/exe to refer to the
actual main executable. It's more like “python script.py”, where
/proc/self/exe refers to the Python interpreter. Python scripts are
obviously fine with that (they do not use /proc/self/exe but __file__),
but some programs that use /proc/self/exe to locate data files get very
confused.
We already define program_invocation_name in glibc, but it's argv[0] and
not /proc/self/exe. It's not entirely clear if it's strictly read-only.
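The /proc/self/exe lookup under discussion can be sketched as follows. The helper name `read_self_exe` is hypothetical, and treating a completely filled buffer as a failure is a design choice of this sketch. As noted above, under an explicit loader invocation the result names ld.so rather than the program.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>   /* readlink */

/* Hypothetical helper: stores the target of /proc/self/exe in BUF as a
   NUL-terminated string.  Returns the string length, or -1 on failure
   (for example if /proc is not mounted) or if the result may have been
   truncated.  */
ssize_t
read_self_exe (char *buf, size_t size)
{
  if (size == 0)
    return -1;
  ssize_t len = readlink ("/proc/self/exe", buf, size - 1);
  if (len < 0)
    return -1;
  if ((size_t) len == size - 1)
    return -1;        /* possibly truncated; retry with a larger buffer */
  buf[len] = '\0';    /* readlink does not add a terminator */
  return len;
}
```

A caller that wants to support arbitrarily long paths would call this in a loop with a growing heap buffer.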
>> +@item l_prev
>> +@itemx l_next
>> +These fields are used to main a double-linked linked list of all link
>> +maps within on @code{dlmopen} namespace. Note that there is currently
>> +no thread-safe way to iteratoe over this list. The callback-based
>> +@code{dl_iterate_phdr} interface can be used instead.
>
> I'm slightly confused here. My understanding of dl_iterate_phdr() is
> that it will stay within the primary namespace and will not traverse
> any of the private namespaces. Therefore I do not understand what you
> mean when you say "instead".
I can drop the reference to dl_iterate_phdr if that helps to address
your concern. We do not have documentation for struct r_debug_extended
yet, so discussing traversal of multiple or foreign namespaces is not
really in scope. Although you can use dl_iterate_phdr to obtain the
required lock today, so all the required pieces are there, I think, just
not very convenient to use.
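For reference, a minimal sketch of the callback-based interface being discussed. The `walk_state` structure and the helper names are hypothetical. Per the understanding expressed earlier in this thread, the walk may not cover objects in other dlmopen namespaces.

```c
#define _GNU_SOURCE
#include <link.h>    /* dl_iterate_phdr, struct dl_phdr_info */
#include <stddef.h>

/* Hypothetical accumulator for the walk below.  */
struct walk_state
{
  size_t objects;        /* number of objects visited */
  size_t load_segments;  /* total PT_LOAD entries across all objects */
};

/* Called once per loaded object.  Returning non-zero would stop the
   iteration early.  */
static int
count_callback (struct dl_phdr_info *info, size_t size, void *data)
{
  (void) size;
  struct walk_state *st = data;
  ++st->objects;
  for (size_t i = 0; i < info->dlpi_phnum; ++i)
    if (info->dlpi_phdr[i].p_type == PT_LOAD)
      ++st->load_segments;
  return 0;
}

/* Hypothetical helper: visits the objects reported by dl_iterate_phdr
   and returns the accumulated counts.  */
struct walk_state
walk_loaded_objects (void)
{
  struct walk_state st = { 0, 0 };
  dl_iterate_phdr (count_callback, &st);
  return st;
}
```

The callback receives `dlpi_addr` and `dlpi_name`, which correspond to the `l_addr` and `l_name` fields described in the patch, so many uses of the `l_next` list can be expressed this way.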
> Which traverses all of the namespaces not just the default one. What
> is your preferred solution to remove this limitation now that more and
> more applications are using dlmopen() and tools are migrating to the
> LD_AUDIT tool interface?
I don't know yet. Callback-based functions are problematic in general.
I suspect dl_iterate_phdr or iteration in general is used as a fallback
for some other missing functionality. It can be horribly inefficient.
> Presumably the thing that makes dl_iterate_phdr() thread safe is that
> it takes a lock internal to ld.so so that it has a consistent view
> while iterating. That suggests to me that this fact should be
> documented saying something to the effect:
>
> 1) Any other threads trying to do dynamic loader operations will be
> forced to wait until the completion of this call.
> 2) No dynamic loader operations are permitted within the callback function.
It's a recursive lock, so dlopen/dlclose do not self-deadlock. It's
also only acquiring the list lock, not the broader dynamic linker lock.
The implications are difficult to document. That's one reason why
callback-based functions are difficult.
Thanks,
Florian
@@ -352,16 +352,119 @@ support the XGETBV instruction.
@node Dynamic Linker Introspection
@section Dynamic Linker Introspection
-@Theglibc{} provides various functions for querying information from the
+@Theglibc{} provides various facilities for querying information from the
dynamic linker.
+@deftp {Data Type} {struct link_map}
+
+@cindex link map
+A @dfn{link map} is associated with the main executable and each shared
+object. Some fields of the link map are accessible to applications and
+exposed through the @code{struct link_map}. Applications must not modify
+the link map directly.
+
+Pointers to link maps can be obtained from the @code{_r_debug} variable,
+from the @code{RTLD_DI_LINKMAP} request for @code{dlinfo}, and from the
+@code{_dl_find_object} function. See below for details.
+
+@table @code
+@item l_addr
+@cindex load address
+This field contains the @dfn{load address} of the object. This is the
+offset that needs to be applied to unrelocated addresses in the object
+image (as stored on disk) to form an address that can be used at run
+time for accessing data or running code. For position-dependent
+executables, the load address is typically zero, and no adjustment is
+required. For position-independent objects, the @code{l_addr} field
+usually contains the address of the object's ELF header in the process
+image. However, this correspondence is not guaranteed because the ELF
+header might not be mapped at all, and the ELF file as stored on disk
+might use zero as the lowest virtual address. Due to the second
+variable, values of the @code{l_addr} field do not necessarily uniquely
+identify a shared object.
+
+On Linux, to obtain the lowest loaded address of the main program, use
+@code{getauxval} to obtain the @code{AT_PHDR} and @code{AT_PHNUM} values
+for the current process. Alternatively, call
+@samp{dlinfo (_r_debug.r_map, RTLD_DI_PHDR, &@var{phdr})}
+to obtain the number of program headers, and the address of the program
+header array will be stored in @var{phdr}
+(of type @code{const ElfW(Phdr) *}, as explained below).
+These values allow processing the array of program headers and the
+address information in the @code{PT_LOAD} entries among them.
+This works even when the program was started with an explicit loader
+invocation.
+
+@item l_name
+For a shared object, this field contains the file name that the
+@theglibc{} dynamic loader used when opening the object. This can be a
+relative path (relative to the current directory at process start, or when
+the object was loaded later, via @code{dlopen} or @code{dlmopen}).
+Symbolic links are not necessarily resolved.
+
+For the main executable, @code{l_name} is @samp{""} (the empty string).
+(The main executable is not loaded by @theglibc{}, so its file name is
+not available.) On Linux, the main executable is available as
+@file{/proc/self/exe} (unless an explicit loader invocation was used to
+start the program). The file name @file{/proc/self/exe} continues to
+resolve to the same file even if it is moved within or deleted from the
+file system. Its current location can be read using @code{readlink}.
+@xref{Symbolic Links}. (Although @file{/proc/self/exe} is not actually
+a symbolic link, it is only presented as one.) Note that @file{/proc} may
+not be mounted, in which case @file{/proc/self/exe} is not available.
+
+If an explicit loader invocation is used (such as @samp{ld.so
+/usr/bin/emacs}), the @file{/proc/self/exe} approach does not work
+because the file name refers to the dynamic linker @code{ld.so}, and not
+the @code{/usr/bin/emacs} program. An approximation to the executable
+path is still available in the @code{@var{info}.dli_fname} member after
+calling @samp{dladdr (_r_debug.r_map->l_ld, &@var{info})}. Note that
+this could be a relative path, and it is supplied by the process that
+created the current process, not the kernel, so it could be inaccurate.
+
+@item l_ld
+This is a pointer to the ELF dynamic segment, an array of tag/value
+pairs that provide various pieces of information that the dynamic
+linking process uses. On most architectures, addresses in the dynamic
+segment are relocated at run time, but on some architectures and in some
+run-time configurations, it is necessary to add the @code{l_addr} field
+value to obtain a proper address.
+
+@item l_prev
+@itemx l_next
+These fields are used to maintain a doubly-linked list of all link
+maps within one @code{dlmopen} namespace. Note that there is currently
+no thread-safe way to iterate over this list. The callback-based
+@code{dl_iterate_phdr} interface can be used instead.
+@end table
+@end deftp
+
+@strong{Portability note:} It is not possible to create a @code{struct
+link_map} object and pass a pointer to a function that expects a
+@code{struct link_map *} argument. Only link map pointers initially
+supplied by @theglibc{} are permitted as arguments. In current versions
+of @theglibc{}, handles returned by @code{dlopen} and @code{dlmopen} are
+pointers to link maps. However, this is not a portable assumption, and
+may even change in future versions of @theglibc{}. To obtain the link
+map associated with a handle, see @code{dlinfo} and
+@code{RTLD_DI_LINKMAP} below. If a function accepts both
+@code{dlopen}/@code{dlmopen} handles and @code{struct link_map} pointers
+in its @code{void *} argument, that is documented explicitly.
+
+@subsection Querying information for loaded objects
+
+The @code{dlinfo} function provides access to internal information
+associated with @code{dlopen}/@code{dlmopen} handles and link maps.
+
@deftypefun {int} dlinfo (void *@var{handle}, int @var{request}, void *@var{arg})
@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{}}}
@standards{GNU, dlfcn.h}
This function returns information about @var{handle} in the memory
location @var{arg}, based on @var{request}. The @var{handle} argument
must be a pointer returned by @code{dlopen} or @code{dlmopen}; it must
-not have been closed by @code{dlclose}.
+not have been closed by @code{dlclose}. Alternatively, @var{handle}
+can be a @code{struct link_map *} value for a link map of an object
+that has not been closed.
On success, @code{dlinfo} returns 0 for most request types; exceptions
are noted below. If there is an error, the function returns @math{-1},