[v2] elf: Remove one-default-version check when searching an unversioned symbol

Message ID 20220528005658.1088272-1-maskray@google.com
State New
Headers
Series [v2] elf: Remove one-default-version check when searching an unversioned symbol |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Fangrui Song May 28, 2022, 12:56 a.m. UTC
  When searching an unversioned symbol in an object with DT_VERDEF, we
have two cases:

* there is a VER_NDX_GLOBAL definition
* there are a few version index>=2 definitions, with at most one being
  the default version

`versioned_sym` may be set in the second case.  Since the linker ensures
at most one default version (otherwise ``multiple definition of
`foo'``), it is of very little value for the loader to check this corner
case.  Delete `num_versions` to simplify code.

While here, add some comments.

--
Changes from v1
* Add comments
* Clarify commit message
---
 elf/dl-lookup.c | 52 +++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 32 deletions(-)
  

Comments

Florian Weimer May 30, 2022, 1:08 p.m. UTC | #1
* Fangrui Song via Libc-alpha:

> -      sym = num_versions == 1 ? versioned_sym : NULL;

> +      sym = versioned_sym;

This change is rather suspicious.  If we discard the return value this
way, why have a distinct out parameter at all?

More generally, the default version at index 2 is necessarily a global
default, e.g. for x86-64, it is always GLIBC_2.2.5 for any symbol.  But
we have symbols whose non-default version is *later* than GLIBC_2.2.5.
I want to understand better how that information is encoded.

If possible we should avoid scanning the entire hash chain for
unversioned symbols and pick the version with lowest or highest index
(for binding or dlsym), but I don't know if we can.

Thanks,
Florian
  
Fangrui Song May 31, 2022, 6:24 a.m. UTC | #2
On 2022-05-30, Florian Weimer wrote:
>* Fangrui Song via Libc-alpha:
>
>> -      sym = num_versions == 1 ? versioned_sym : NULL;
>
>> +      sym = versioned_sym;
>
>This change is rather suspicious.  If we discard the return value this
>way, why have a distinct out parameter at all?

versioned_sym is a fallback when sym is still NULL on this line.

When there is a version index 2 and DL_LOOKUP_RETURN_NEWEST is not set,
version index 2 takes precedence over the default version. The code
needs to remember the default version in case a definiton of index 2
is absent.

Perhaps renaming `versioned_sym` to `default_version_sym` may be clearer?

>More generally, the default version at index 2 is necessarily a global
>default, e.g. for x86-64, it is always GLIBC_2.2.5 for any symbol.

The term "default" has ambiguity. It may refer to the visibility
STV_DEFAULT or the property of a version: default vs non-default
(hidden).

For a symbol name, there can be at most one default version and any
number of non-default (aka hidden) versions.  The default version does
not need to be at index 2.

In glibc, I think generally the last version index is the default and
the others are non-default. However this is not a requirement in the
linker. That said, linkers guarantee that version nodes in a version
script get increasing version indices.

>But
>we have symbols whose non-default version is *later* than GLIBC_2.2.5.
>I want to understand better how that information is encoded.

Yes. Some symbol names have more than one non-default versions, e.g.

% nm -jpD /lib/x86_64-linux-gnu/libc.so.6 | grep \^sys_errlist
sys_errlist@GLIBC_2.12
sys_errlist@GLIBC_2.2.5
sys_errlist@GLIBC_2.3
sys_errlist@GLIBC_2.4

The version node with index 2 is GLIBC_2.2.5.

When DL_LOOKUP_RETURN_NEWEST is unset, an unversioned symbol lookup prefers
GLIBC_2.2.5 over the default version if exists.

The behavior is to retains compatibility when a shared object becomes versioned: the symbols with the smallest version (index 2) indicate the previously unversioned symbols. If a new version of a shared object needs to deprecate an unversioned bar, you can remove bar and define bar@GLIBC_2.2.5 instead. Libraries using bar are unaffected but new linking against bar all gets versioned references.

>If possible we should avoid scanning the entire hash chain for
>unversioned symbols and pick the version with lowest or highest index
>(for binding or dlsym), but I don't know if we can.

This could be achieved if @xxx were part of the hashed string, but the
.hash/.gnu.hash designs had chosen to hash only the unversioned part, so this
situation could not be improved now...
  
Florian Weimer May 31, 2022, 7:39 p.m. UTC | #3
* Fangrui Song:

> On 2022-05-30, Florian Weimer wrote:
>>* Fangrui Song via Libc-alpha:
>>
>>> -      sym = num_versions == 1 ? versioned_sym : NULL;
>>
>>> +      sym = versioned_sym;
>>
>>This change is rather suspicious.  If we discard the return value this
>>way, why have a distinct out parameter at all?
>
> versioned_sym is a fallback when sym is still NULL on this line.
>
> When there is a version index 2 and DL_LOOKUP_RETURN_NEWEST is not set,
> version index 2 takes precedence over the default version. The code
> needs to remember the default version in case a definiton of index 2
> is absent.
>
> Perhaps renaming `versioned_sym` to `default_version_sym` may be
> clearer?

I think I now understand how versioned symbol lookup works.

We have the symbol table (DT_SYMTAB).  For each symbol table entry,
there is a corresponding element in the DT_VERSYM array.  The elements
are 16-bit numbers.  The MSB is set if the symbol is hidden (indicating
a symbol definition that has a non-default version), the lower 15 bits
are version index.  Version indices are defined via DT_VERDEF, which is
not structured as a table for some reason.  Instead, the variable-sized
entries (type is ElfW(Verdef)) have explicit version indices associated
with them via the vd_ndx field.

It is reasonable to assume that the vd_ndx numbers in DT_VERDEF are
assigned in such a way that a version that comes after another version
in the version tree order has a higher index.  We could check that at
run time, but it seems unnecessary.

DT_GNU_HASH contains a header, a Bloom filter, an array of hash buckets,
and another array that parallels a subrange of the DT_SYMTAB entries
(basically everything after the local and undefined symbols).  The
latter array contains 31-bit hashes, and the LSB indicates whether
scanning should stop at that entry.  Lookup uses the buckets to find a
start symbol index, and scanning proceed from there until a full match
is found, or the LSB is set, indicating that the chain is complete.
(This is a variant of open addressing.)  As a result, we cannot make any
assumptions about the order in which we encounter symbols while
scanning.

Based on first principles, I think we should do the following:

* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
  and the target is versioned as well, a full match means equality of
  symbol name and version.  The hidden/default flag and the version
  index do not matter.

* If we have a version to lookup up, but no version information in the
  target, we match only the name.  This is necessary so that
  interposition of e.g. malloc without a version works.  dlvsym should
  match symbol binding here and pick up the unversioned definition as
  well.  Without interposition, the non-dlvsym cannot happen because we
  have a version coverage check in elf/dl-version.c that should have
  detected that the soname lacks the required version information.
  (There is an assert failure in elf/dl-lookup.c that can trigger in
  some cases, though, so the coverage check is currently incomplete.)

* If we have no version information at all, then a full match means that
  only the name matches.

* For symbol binding during relocation, a version-less lookup may need
  to find a versioned definition.  This was initially implemented so
  that glibc 2.1 could introduce symbol versioning without rebuilding
  the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
  not have symbol versioning.)  It has subsequently been used by other
  libraries.  In this case, we should pick the symbol with the lowest
  version index and a matching name.  The hidden/default version index
  flag does not matter.  We need to pick the lowest version even if it
  is no longer the default.  This ensures that old binaries do not pick
  up a newer definition, changing behavior.

* For symbol lookup with dlsym (without a version), we need to find a
  versioned definition.  The use case for that is mainly FFI bindings
  (e.g., Python ctypes), where programmers generally do not specify
  version information.  We decided that we would want the latest version
  in this case.  However, in this case, I would want us to consider
  default versions only if they exist.  Usually, the latest version is
  the default version (if a default version exists at all), but we have
  gone backwards in version in a few cases in Fedora rawhide, to
  implement an ABI revert that happened during upstream development.
  The toolchain supports this, and I think dlsym should match the link
  editor behavior in this case.  (But on second thought, maybe the
  complexity is not worth it, and we should ignore the hidden/default
  version flag after all.)

The current implementation has two different, potentially conflicting
flags for indicating dlvsym lookups: the “hidden” member in the struct
r_found_version object that is passed in, and the
DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
versioned symbol binding during relocation processing, the hidden flag
does not matter according to the overview above.

There is a corner case for unversioned symbol definitions in versioned
object, and how versioned lookups can bind against them.  I'm changing
this code in my patch for bug 29190.  Such symbol definitions have an
empty version name and version hash zero.  Apparently, the intent is
that they can bind to any version, particularly if they are the default.
(Or at index 0 or 1?  But the existing code does not check that.)  I
have never seen them used in practice, so I don't know how to test them.
The current code has checks for the default/hidden version index flag,
but I don't think we need that, it is inconsistent with that.

Going back to your suggestion regarding default_version_sym, I think we
need to turn this into a struct and keep track of the version index (the
vd_ndx value) as well.  num_versions still should go because I don't
think the number of versions matter in any case.

We can still keep the early scan stop based on a non-NULL return value
from check_match:

* For versioned lookup, we can stop on a full match.  The hidden/default
  flag is ignored.

* For unversioned lookup during binding, we can stop if we encounter the
  base version (index 2) with a matching symbol name.  (The
  hidden/default flag does not matter.)

* But for unversioned lookup with dlsym (currently called
  DL_LOOKUP_RETURN_NEWEST), we can stop at the first default version
  with a matching symbol name because there is supposed to be only one
  default version.  (BFD ld seems to enforce this.)

So going back to your patch, I think the removal of num_versions is
correct.  We need to keep the fallback symbol, but we also need to keep
track of the version index, and steer the fallback symbol to the one
with the smallest or greatest version index.

>>More generally, the default version at index 2 is necessarily a global
>>default, e.g. for x86-64, it is always GLIBC_2.2.5 for any symbol.
>
> The term "default" has ambiguity. It may refer to the visibility
> STV_DEFAULT or the property of a version: default vs non-default
> (hidden).
>
> For a symbol name, there can be at most one default version and any
> number of non-default (aka hidden) versions.  The default version does
> not need to be at index 2.
>
> In glibc, I think generally the last version index is the default and
> the others are non-default. However this is not a requirement in the
> linker. That said, linkers guarantee that version nodes in a version
> script get increasing version indices.

Yes, I used “default version” in the quote above in the sense of index
2, so perhaps the base version, or the lowest possible/initial symbol
version.

>>But
>>we have symbols whose non-default version is *later* than GLIBC_2.2.5.
>>I want to understand better how that information is encoded.
>
> Yes. Some symbol names have more than one non-default versions, e.g.
>
> % nm -jpD /lib/x86_64-linux-gnu/libc.so.6 | grep \^sys_errlist
> sys_errlist@GLIBC_2.12
> sys_errlist@GLIBC_2.2.5
> sys_errlist@GLIBC_2.3
> sys_errlist@GLIBC_2.4
>
> The version node with index 2 is GLIBC_2.2.5.
>
> When DL_LOOKUP_RETURN_NEWEST is unset, an unversioned symbol lookup prefers
> GLIBC_2.2.5 over the default version if exists.
>
> The behavior is to retains compatibility when a shared object becomes
> versioned: the symbols with the smallest version (index 2) indicate
> the previously unversioned symbols. If a new version of a shared
> object needs to deprecate an unversioned bar, you can remove bar and
> define bar@GLIBC_2.2.5 instead. Libraries using bar are unaffected but
> new linking against bar all gets versioned references.

I agree.

>>If possible we should avoid scanning the entire hash chain for
>>unversioned symbols and pick the version with lowest or highest index
>>(for binding or dlsym), but I don't know if we can.
>
> This could be achieved if @xxx were part of the hashed string, but the
> .hash/.gnu.hash designs had chosen to hash only the unversioned part,
> so this situation could not be improved now...

The advantage is that we can use the same hash table for versioned and
unversioned lookups.

Thanks,
Florian
  
Fangrui Song June 1, 2022, 4:07 a.m. UTC | #4
On 2022-05-31, Florian Weimer wrote:
>* Fangrui Song:
>
>> On 2022-05-30, Florian Weimer wrote:
>>>* Fangrui Song via Libc-alpha:
>>>
>>>> -      sym = num_versions == 1 ? versioned_sym : NULL;
>>>
>>>> +      sym = versioned_sym;
>>>
>>>This change is rather suspicious.  If we discard the return value this
>>>way, why have a distinct out parameter at all?
>>
>> versioned_sym is a fallback when sym is still NULL on this line.
>>
>> When there is a version index 2 and DL_LOOKUP_RETURN_NEWEST is not set,
>> version index 2 takes precedence over the default version. The code
>> needs to remember the default version in case a definiton of index 2
>> is absent.
>>
>> Perhaps renaming `versioned_sym` to `default_version_sym` may be
>> clearer?
>
>I think I now understand how versioned symbol lookup works.
>
>We have the symbol table (DT_SYMTAB).  For each symbol table entry,
>there is a corresponding element in the DT_VERSYM array.  The elements
>are 16-bit numbers.  The MSB is set if the symbol is hidden (indicating
>a symbol definition that has a non-default version), the lower 15 bits
>are version index.  Version indices are defined via DT_VERDEF, which is
>not structured as a table for some reason.  Instead, the variable-sized
>entries (type is ElfW(Verdef)) have explicit version indices associated
>with them via the vd_ndx field.

Yes.

>It is reasonable to assume that the vd_ndx numbers in DT_VERDEF are
>assigned in such a way that a version that comes after another version
>in the version tree order has a higher index.  We could check that at
>run time, but it seems unnecessary.

Yes.

>DT_GNU_HASH contains a header, a Bloom filter, an array of hash buckets,
>and another array that parallels a subrange of the DT_SYMTAB entries
>(basically everything after the local and undefined symbols).  The
>latter array contains 31-bit hashes, and the LSB indicates whether
>scanning should stop at that entry.  Lookup uses the buckets to find a
>start symbol index, and scanning proceed from there until a full match
>is found, or the LSB is set, indicating that the chain is complete.
>(This is a variant of open addressing.)  As a result, we cannot make any
>assumptions about the order in which we encounter symbols while
>scanning.

Yes.

>Based on first principles, I think we should do the following:
>
>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>  and the target is versioned as well, a full match means equality of
>  symbol name and version.  The hidden/default flag and the version
>  index do not matter.

Yes, the intuitive case.

>* If we have a version to lookup up, but no version information in the
>  target, we match only the name.  This is necessary so that
>  interposition of e.g. malloc without a version works.  

Yes. Sanitizers work this way: they are unversioned but can intercept
*@GLIBC_* functions.

>  dlvsym should
>  match symbol binding here and pick up the unversioned definition as
>  well.

Yes if the object does not have DT_VERSYM:
dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
does not have DT_VERSYM.

>  Without interposition, the non-dlvsym cannot happen because we
>  have a version coverage check in elf/dl-version.c that should have
>  detected that the soname lacks the required version information.

?

>  (There is an assert failure in elf/dl-lookup.c that can trigger in
>  some cases, though, so the coverage check is currently incomplete.)

Correct.

For example, for an object without DT_VERSYM,
`assert (version->filename == NULL || ! _dl_name_match_p (version->filename, map))`
fails if dlvsym(RTLD_DEFAULT, "foo", "") is called.

>* If we have no version information at all, then a full match means that
>  only the name matches.
>
>* For symbol binding during relocation, a version-less lookup may need
>  to find a versioned definition.  This was initially implemented so
>  that glibc 2.1 could introduce symbol versioning without rebuilding
>  the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>  not have symbol versioning.)  It has subsequently been used by other
>  libraries.  In this case, we should pick the symbol with the lowest
>  version index and a matching name.  The hidden/default version index
>  flag does not matter.  We need to pick the lowest version even if it
>  is no longer the default.  This ensures that old binaries do not pick
>  up a newer definition, changing behavior.

Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
is the only version attached to the symbol name) or 2.

>* For symbol lookup with dlsym (without a version), we need to find a
>  versioned definition.  The use case for that is mainly FFI bindings
>  (e.g., Python ctypes), where programmers generally do not specify
>  version information.  We decided that we would want the latest version
>  in this case.

Correction: the default version, instead of the latest version.

If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
picks up foo@@v2.

>  However, in this case, I would want us to consider
>  default versions only if they exist.

The current behavior already implements this.

When searching a definition for foo,

* for an object without DT_VERSYM
   + it can be bound to foo
* for an object with DT_VERSYM
   + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
   + it can be bound to foo of any default version
   + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)

>  Usually, the latest version is
>  the default version (if a default version exists at all), but we have
>  gone backwards in version in a few cases in Fedora rawhide, to
>  implement an ABI revert that happened during upstream development.
>  The toolchain supports this, and I think dlsym should match the link
>  editor behavior in this case.  (But on second thought, maybe the
>  complexity is not worth it, and we should ignore the hidden/default
>  version flag after all.)
>
>The current implementation has two different, potentially conflicting
>flags for indicating dlvsym lookups: the “hidden” member in the struct
>r_found_version object that is passed in, and the
>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>versioned symbol binding during relocation processing, the hidden flag
>does not matter according to the overview above.

Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
fine, but we'd still need a flag.

For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.

For the unversioned lookup code (dlvsym has been ruled out),
DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
dlsym.

>There is a corner case for unversioned symbol definitions in versioned
>object, and how versioned lookups can bind against them.  I'm changing
>this code in my patch for bug 29190.  Such symbol definitions have an
>empty version name and version hash zero.  Apparently, the intent is
>that they can bind to any version, particularly if they are the default.
>(Or at index 0 or 1?  But the existing code does not check that.)  I
>have never seen them used in practice, so I don't know how to test them.
>The current code has checks for the default/hidden version index flag,
>but I don't think we need that, it is inconsistent with that.

Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
Such symbols, if present in .dynsym, should be STT_SECTION.
rtld does not need to check the invariant.
STT_SECTION symbols have been ignored by `check_match`.

(GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a STB_LOCAL section symbol.)

>Going back to your suggestion regarding default_version_sym, I think we
>need to turn this into a struct and keep track of the version index (the
>vd_ndx value) as well.  num_versions still should go because I don't
>think the number of versions matter in any case.

Hmm. I don't think we need a vd_ndx value.

>We can still keep the early scan stop based on a non-NULL return value
>from check_match:
>
>* For versioned lookup, we can stop on a full match.  The hidden/default
>  flag is ignored.

This is the current behavior.

>* For unversioned lookup during binding, we can stop if we encounter the
>  base version (index 2) with a matching symbol name.  (The
>  hidden/default flag does not matter.)

This is the current behavior.

>* But for unversioned lookup with dlsym (currently called
>  DL_LOOKUP_RETURN_NEWEST), we can stop at the first default version
>  with a matching symbol name because there is supposed to be only one
>  default version.  (BFD ld seems to enforce this.)

Right.

For dlsym, we can change `*versioned_sym = sym;` the code path to return sym,
but the change will add lines which seem unnecessary.

>So going back to your patch, I think the removal of num_versions is
>correct.  We need to keep the fallback symbol, but we also need to keep
>track of the version index, and steer the fallback symbol to the one
>with the smallest or greatest version index.

See above. I don't think we need to track the version index.

>>>More generally, the default version at index 2 is necessarily a global
>>>default, e.g. for x86-64, it is always GLIBC_2.2.5 for any symbol.
>>
>> The term "default" has ambiguity. It may refer to the visibility
>> STV_DEFAULT or the property of a version: default vs non-default
>> (hidden).
>>
>> For a symbol name, there can be at most one default version and any
>> number of non-default (aka hidden) versions.  The default version does
>> not need to be at index 2.
>>
>> In glibc, I think generally the last version index is the default and
>> the others are non-default. However this is not a requirement in the
>> linker. That said, linkers guarantee that version nodes in a version
>> script get increasing version indices.
>
>Yes, I used “default version” in the quote above in the sense of index
>2, so perhaps the base version, or the lowest possible/initial symbol
>version.
>
>>>But
>>>we have symbols whose non-default version is *later* than GLIBC_2.2.5.
>>>I want to understand better how that information is encoded.
>>
>> Yes. Some symbol names have more than one non-default versions, e.g.
>>
>> % nm -jpD /lib/x86_64-linux-gnu/libc.so.6 | grep \^sys_errlist
>> sys_errlist@GLIBC_2.12
>> sys_errlist@GLIBC_2.2.5
>> sys_errlist@GLIBC_2.3
>> sys_errlist@GLIBC_2.4
>>
>> The version node with index 2 is GLIBC_2.2.5.
>>
>> When DL_LOOKUP_RETURN_NEWEST is unset, an unversioned symbol lookup prefers
>> GLIBC_2.2.5 over the default version if exists.
>>
>> The behavior is to retains compatibility when a shared object becomes
>> versioned: the symbols with the smallest version (index 2) indicate
>> the previously unversioned symbols. If a new version of a shared
>> object needs to deprecate an unversioned bar, you can remove bar and
>> define bar@GLIBC_2.2.5 instead. Libraries using bar are unaffected but
>> new linking against bar all gets versioned references.
>
>I agree.
>
>>>If possible we should avoid scanning the entire hash chain for
>>>unversioned symbols and pick the version with lowest or highest index
>>>(for binding or dlsym), but I don't know if we can.
>>
>> This could be achieved if @xxx were part of the hashed string, but the
>> .hash/.gnu.hash designs had chosen to hash only the unversioned part,
>> so this situation could not be improved now...
>
>The advantage is that we can use the same hash table for versioned and
>unversioned lookups.
>
>Thanks,
>Florian
>
  
Florian Weimer June 1, 2022, 6:49 a.m. UTC | #5
* Fangrui Song:

>>Based on first principles, I think we should do the following:
>>
>>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>>  and the target is versioned as well, a full match means equality of
>>  symbol name and version.  The hidden/default flag and the version
>>  index do not matter.
>
> Yes, the intuitive case.
>
>>* If we have a version to lookup up, but no version information in the
>>  target, we match only the name.  This is necessary so that
>>  interposition of e.g. malloc without a version works.  
>
> Yes. Sanitizers work this way: they are unversioned but can intercept
> *@GLIBC_* functions.

They break symbol versioning due to this, so I'm not sure if they are a
good example.

>>  dlvsym should
>>  match symbol binding here and pick up the unversioned definition as
>>  well.
>
> Yes if the object does not have DT_VERSYM:
> dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
> does not have DT_VERSYM.
>
>>  Without interposition, the non-dlvsym cannot happen because we
>>  have a version coverage check in elf/dl-version.c that should have
>>  detected that the soname lacks the required version information.
>
> ?

Needed versions contain a soname (vn_file), and we check that the shared
object with that soname provides all the needed versions.  This is how
we detect applications that run on too-old glibc even if lazy binding is
active.

>>* For symbol binding during relocation, a version-less lookup may need
>>  to find a versioned definition.  This was initially implemented so
>>  that glibc 2.1 could introduce symbol versioning without rebuilding
>>  the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>>  not have symbol versioning.)  It has subsequently been used by other
>>  libraries.  In this case, we should pick the symbol with the lowest
>>  version index and a matching name.  The hidden/default version index
>>  flag does not matter.  We need to pick the lowest version even if it
>>  is no longer the default.  This ensures that old binaries do not pick
>>  up a newer definition, changing behavior.
>
> Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
> is the only version attached to the symbol name) or 2.

I believe this not to be true.  For example,
sched_setaffinity@GLIBC_2.3.3 is the lowest version (with index 6, but
that might vary from build to build), not sched_setaffinity@GLIBC_2.2.5.
I think we should still bind to that if an object contains an
unversioned undefined symbol reference to sched_setaffinity.

>>* For symbol lookup with dlsym (without a version), we need to find a
>>  versioned definition.  The use case for that is mainly FFI bindings
>>  (e.g., Python ctypes), where programmers generally do not specify
>>  version information.  We decided that we would want the latest version
>>  in this case.
>
> Correction: the default version, instead of the latest version.
>
> If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
> picks up foo@@v2.

If there is no default version, I think we need to pick the latest
version, instead of failing the lookup.

>>  However, in this case, I would want us to consider
>>  default versions only if they exist.
>
> The current behavior already implements this.
>
> When searching a definition for foo,
>
> * for an object without DT_VERSYM
>   + it can be bound to foo
> * for an object with DT_VERSYM
>   + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
>   + it can be bound to foo of any default version
>   + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)

Is VER_NDX_GLOBAL actually used?  It seems to be a marker for “we no
longer use symbol versioning for this symbol”.

I think these rules are not quite correct because they do not find the
earliest version (for relocation bindings) or latest eligible version
(for dlsym).

>>The current implementation has two different, potentially conflicting
>>flags for indicating dlvsym lookups: the “hidden” member in the struct
>>r_found_version object that is passed in, and the
>>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>>versioned symbol binding during relocation processing, the hidden flag
>>does not matter according to the overview above.
>
> Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
> fine, but we'd still need a flag.
>
> For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.
>
> For the unversioned lookup code (dlvsym has been ruled out),
> DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
> dlsym.

I think we don't need both DL_LOOKUP_RETURN_NEWEST and version->hidden.

We need to differentiate between unversioned relocation binding and
dlsym.

Versioned relocation binding and dlvsym should behave identical, I htink.

>>There is a corner case for unversioned symbol definitions in versioned
>>object, and how versioned lookups can bind against them.  I'm changing
>>this code in my patch for bug 29190.  Such symbol definitions have an
>>empty version name and version hash zero.  Apparently, the intent is
>>that they can bind to any version, particularly if they are the default.
>>(Or at index 0 or 1?  But the existing code does not check that.)  I
>>have never seen them used in practice, so I don't know how to test them.
>>The current code has checks for the default/hidden version index flag,
>>but I don't think we need that, it is inconsistent with that.
>
> Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
> Such symbols, if present in .dynsym, should be STT_SECTION.
> rtld does not need to check the invariant.
> STT_SECTION symbols have been ignored by `check_match`.
>
> (GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a
> STB_LOCAL section symbol.)

dl-lookup.c does not encounter STB_LOCAL symbols because the DT_GNU_HASH
array does not cover them (the symbias value in elf/dl-setup_hash.c).
STB_LOCAL symbols come first in the symbol table (an ELF rule for symbol
tables, “In each symbol table, all symbols with STB_LOCAL binding
precede the weak and global symbols.”), and I think symbias is supposed
to chosen in such a way that they do not show up in the table.

However, in some objects, I see VER_NDX_LOCAL being used for undefined
symbols.  Other objects use VER_NDX_GLOBAL.  As far as I can tell, they
are all produced by BFD ld.

>>Going back to your suggestion regarding default_version_sym, I think we
>>need to turn this into a struct and keep track of the version index (the
>>vd_ndx value) as well.  num_versions still should go because I don't
>>think the number of versions matter in any case.
>
> Hmm. I don't think we need a vd_ndx value.

I think we need it for unversioned relocation binding if version index 2
does not exist for the symbol name, and for dlsym if the default version
has been removed.

Thanks,
Florian
  
Fangrui Song June 1, 2022, 8:14 a.m. UTC | #6
On 2022-06-01, Florian Weimer wrote:
>* Fangrui Song:
>
>>>Based on first principles, I think we should do the following:
>>>
>>>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>>>  and the target is versioned as well, a full match means equality of
>>>  symbol name and version.  The hidden/default flag and the version
>>>  index do not matter.
>>
>> Yes, the intuitive case.
>>
>>>* If we have a version to lookup up, but no version information in the
>>>  target, we match only the name.  This is necessary so that
>>>  interposition of e.g. malloc without a version works.
>>
>> Yes. Sanitizers work this way: they are unversioned but can intercept
>> *@GLIBC_* functions.
>
>They break symbol versioning due to this, so I'm not sure if they are a
>good example.

The behavior is natural to me: the behavior when DT_VERSYM is absent
matches the intention of the index 1/2 special rule when DT_VERSYM is present.

On the other hand, without the rule, an interceptor needs a glibc style version
script to intercept functions like malloc.  This would make the interceptor
cumbersome.  Also, if programs get newer malloc versions (because glibc updated malloc behavior),
the interceptor would have to keep adding new versioned definitions.

>>>  dlvsym should
>>>  match symbol binding here and pick up the unversioned definition as
>>>  well.
>>
>> Yes if the object does not have DT_VERSYM:
>> dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
>> does not have DT_VERSYM.
>>
>>>  Without interposition, the non-dlvsym cannot happen because we
>>>  have a version coverage check in elf/dl-version.c that should have
>>>  detected that the soname lacks the required version information.
>>
>> ?
>
>Needed versions contain a soname (vn_file), and we check that the shared
>object with that soname provides all the needed versions.  This is how
>we detect applications that run on too-old glibc even if lazy binding is
>active.

Yeah, this matches my understanding.

>>>* For symbol binding during relocation, a version-less lookup may need
>>>  to find a versioned definition.  This was initially implemented so
>>>  that glibc 2.1 could introduce symbol versioning without rebuilding
>>>  the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>>>  not have symbol versioning.)  It has subsequently been used by other
>>>  libraries.  In this case, we should pick the symbol with the lowest
>>>  version index and a matching name.  The hidden/default version index
>>>  flag does not matter.  We need to pick the lowest version even if it
>>>  is no longer the default.  This ensures that old binaries do not pick
>>>  up a newer definition, changing behavior.
>>
>> Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
>> is the only version attached to the symbol name) or 2.
>
>I believe this not to be true.  For example,
>sched_setaffinity@GLIBC_2.3.3 is the lowest version (with index 6, but
>that might vary from build to build), not sched_setaffinity@GLIBC_2.2.5.
>I think we should still bind to that if an object contains an
>unversioned undefined symbol reference to sched_setaffinity.

The current behavior is that an unversioned sched_setaffinity from
relocation does not bind to sched_setaffinity@GLIBC_2.3.3 (index > 2).

I think the idea is that when a DSO becomes versioned, all definitions
should be assigned VER_NDX_GLOBAL (1) or index 2.  Definitions from newer
versions do not count.

cat > ./a.c <<'eof'
#include <stdio.h>
int foo(int a);
int main(void) { printf("%d\n", foo(0)); }
eof
echo 'int foo(int a) { return -1; }' > ./b0.c
echo 'int foo_v1(int a) { return 1; } asm(".symver foo_v1, foo@v1, remove");' > ./b.c
echo 'va {}; v1 {}; v2 {};' > ./b.ver
sed 's/^        /\t/' > ./Makefile <<'eof'
.MAKE.MODE := meta curdirOk=1
CFLAGS = -fpic -g

a: a.o b0.so b.so
         $(CC) -Wl,--no-as-needed a.o b0.so -ldl -Wl,-rpath=$$PWD -o $@

b0.so: b0.o
         $(CC) $> -shared -Wl,--soname=b.so -o $@

b.so: b.o
         $(CC) $> -shared -Wl,--soname=b.so,--version-script=b.ver -o $@

clean:
         rm -f a *.so *.o *.meta
eof

Run `bmake; ./a` => `./a: symbol lookup error: ./a: undefined symbol: foo`

If va and v1 are swapped in b.ver, foo from a will be able to bind to foo@v1 in b.so.

Loosen the rule will unlikely break any users, but seems unnecessary to me
and does not help with the (necessary) inconsistency among reloc/dlsym/dlvsym.

>>>* For symbol lookup with dlsym (without a version), we need to find a
>>>  versioned definition.  The use case for that is mainly FFI bindings
>>>  (e.g., Python ctypes), where programmers generally do not specify
>>>  version information.  We decided that we would want the latest version
>>>  in this case.
>>
>> Correction: the default version, instead of the latest version.
>>
>> If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
>> picks up foo@@v2.
>
>If there is no default version, I think we need to pick the latest
>version, instead of failing the lookup.

I prefer the current rtld behavior: it matches ld behavior.

In ld, an unversioned reference foo can be resolved by either foo or foo@@v2.
It cannot bind to a non-default version definition foo@v3.

foo can be understood as VER_NDX_GLOBAL in a versioned shared object.

I have some notes about ld behavior:
https://maskray.me/blog/2020-11-26-all-about-symbol-versioning#linker-behavior

Basically rtld behavior mostly matches ld, but is loose in some aspects for DSO-upgrade compatibility.

>>>  However, in this case, I would want us to consider
>>>  default versions only if they exist.
>>
>> The current behavior already implements this.
>>
>> When searching a definition for foo,
>>
>> * for an object without DT_VERSYM
>>   + it can be bound to foo
>> * for an object with DT_VERSYM
>>   + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
>>   + it can be bound to foo of any default version
>>   + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)
>
>Is VER_NDX_GLOBAL actually used?  It seems to be a marker for “we no
>longer use symbol versioning for this symbol”.

VER_NDX_GLOBAL (1) is used.  If a definition is not attached to a version node,
it gets VER_NDX_GLOBAL.

>I think these rules are not quite correct because they do not find the
>earliest version (for relocation bindings) or latest eligible version
>(for dlsym).

I have tried many examples to testify the rules.
Do you have an example that the rules fail?

>>>The current implementation has two different, potentially conflicting
>>>flags for indicating dlvsym lookups: the “hidden” member in the struct
>>>r_found_version object that is passed in, and the
>>>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>>>versioned symbol binding during relocation processing, the hidden flag
>>>does not matter according to the overview above.
>>
>> Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
>> fine, but we'd still need a flag.
>>
>> For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.
>>
>> For the unversioned lookup code (dlvsym has been ruled out),
>> DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
>> dlsym.
>
>I think we don't need both DL_LOOKUP_RETURN_NEWEST and version->hidden.
>
>We need to differentiate between unversioned relocation binding and
>dlsym.
>
>Versioned relocation binding and dlvsym should behave identical, I htink.

The current behavior is: dlvsym is stricter than versioned relocation binding.
Relocation binding accepts VER_NDX_GLOBAL while dlvsym does not.

dlvsym is to find the exact match, while relocation binding (for compatibility
reasons) has a fallback.

The current rtld behavior on dlvsym exactly matches how ld resolves a foo@v1
reference, so I quite like its semantics.

>>>There is a corner case for unversioned symbol definitions in versioned
>>>object, and how versioned lookups can bind against them.  I'm changing
>>>this code in my patch for bug 29190.  Such symbol definitions have an
>>>empty version name and version hash zero.  Apparently, the intent is
>>>that they can bind to any version, particularly if they are the default.
>>>(Or at index 0 or 1?  But the existing code does not check that.)  I
>>>have never seen them used in practice, so I don't know how to test them.
>>>The current code has checks for the default/hidden version index flag,
>>>but I don't think we need that, it is inconsistent with that.
>>
>> Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
>> Such symbols, if present in .dynsym, should be STT_SECTION.
>> rtld does not need to check the invariant.
>> STT_SECTION symbols have been ignored by `check_match`.
>>
>> (GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a
>> STB_LOCAL section symbol.)
>
>dl-lookup.c does not encounter STB_LOCAL symbols because the DT_GNU_HASH
>array does not cover them (the symbias value in elf/dl-setup_hash.c).
>STB_LOCAL symbols come first in the symbol table (an ELF rule for symbol
>tables, “In each symbol table, all symbols with STB_LOCAL binding
>precede the weak and global symbols.”), and I think symbias is supposed
>to chosen in such a way that they do not show up in the table.

Agree

>However, in some objects, I see VER_NDX_LOCAL being used for undefined
>symbols.  Other objects use VER_NDX_GLOBAL.  As far as I can tell, they
>are all produced by BFD ld.

GNU ld before 2.37 may use VER_NDX_LOCAL for undefined symbols.
I reported the bug at https://sourceware.org/bugzilla/show_bug.cgi?id=26002

>>>Going back to your suggestion regarding default_version_sym, I think we
>>>need to turn this into a struct and keep track of the version index (the
>>>vd_ndx value) as well.  num_versions still should go because I don't
>>>think the number of versions matter in any case.
>>
>> Hmm. I don't think we need a vd_ndx value.
>
>I think we need it for unversioned relocation binding if version index 2
>does not exist for the symbol name, and for dlsym if the default version
>has been removed.

The debate is about whether we want to further loosen dlsym/reloc resolving behavior.
My opinion is no, so the current information is sufficient and we do not need a new value...

>Thanks,
>Florian
>
  
Fangrui Song June 10, 2022, 9:44 p.m. UTC | #7
On 2022-06-01, Fangrui Song wrote:
>On 2022-06-01, Florian Weimer wrote:
>>* Fangrui Song:
>>
>>>>Based on first principles, I think we should do the following:
>>>>
>>>>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>>>> and the target is versioned as well, a full match means equality of
>>>> symbol name and version.  The hidden/default flag and the version
>>>> index do not matter.
>>>
>>>Yes, the intuitive case.
>>>
>>>>* If we have a version to lookup up, but no version information in the
>>>> target, we match only the name.  This is necessary so that
>>>> interposition of e.g. malloc without a version works.
>>>
>>>Yes. Sanitizers work this way: they are unversioned but can intercept
>>>*@GLIBC_* functions.
>>
>>They break symbol versioning due to this, so I'm not sure if they are a
>>good example.
>
>The behavior is natural to me: the behavior when DT_VERSYM is absent
>matches the intention of the index 1/2 special rule when DT_VERSYM is present.
>
>On the other hand, without the rule, an interceptor needs a glibc style version
>script to intercept functions like malloc.  This would make the interceptor
>cumbersome.  Also, if programs get newer malloc versions (because glibc updated malloc behavior),
>the interceptor would have to keep adding new versioned definitions.
>
>>>> dlvsym should
>>>> match symbol binding here and pick up the unversioned definition as
>>>> well.
>>>
>>>Yes if the object does not have DT_VERSYM:
>>>dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
>>>does not have DT_VERSYM.
>>>
>>>> Without interposition, the non-dlvsym cannot happen because we
>>>> have a version coverage check in elf/dl-version.c that should have
>>>> detected that the soname lacks the required version information.
>>>
>>>?
>>
>>Needed versions contain a soname (vn_file), and we check that the shared
>>object with that soname provides all the needed versions.  This is how
>>we detect applications that run on too-old glibc even if lazy binding is
>>active.
>
>Yeah, this matches my understanding.
>
>>>>* For symbol binding during relocation, a version-less lookup may need
>>>> to find a versioned definition.  This was initially implemented so
>>>> that glibc 2.1 could introduce symbol versioning without rebuilding
>>>> the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>>>> not have symbol versioning.)  It has subsequently been used by other
>>>> libraries.  In this case, we should pick the symbol with the lowest
>>>> version index and a matching name.  The hidden/default version index
>>>> flag does not matter.  We need to pick the lowest version even if it
>>>> is no longer the default.  This ensures that old binaries do not pick
>>>> up a newer definition, changing behavior.
>>>
>>>Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
>>>is the only version attached to the symbol name) or 2.
>>
>>I believe this not to be true.  For example,
>>sched_setaffinity@GLIBC_2.3.3 is the lowest version (with index 6, but
>>that might vary from build to build), not sched_setaffinity@GLIBC_2.2.5.
>>I think we should still bind to that if an object contains an
>>unversioned undefined symbol reference to sched_setaffinity.
>
>The current behavior is that an unversioned sched_setaffinity from
>relocation does not bind to sched_setaffinity@GLIBC_2.3.3 (index > 2).
>
>I think the idea is that when a DSO becomes versioned, all definitions
>should be assigned VER_NDX_GLOBAL (1) or index 2.  Definitions from newer
>versions do not count.
>
>cat > ./a.c <<'eof'
>#include <stdio.h>
>int foo(int a);
>int main(void) { printf("%d\n", foo(0)); }
>eof
>echo 'int foo(int a) { return -1; }' > ./b0.c
>echo 'int foo_v1(int a) { return 1; } asm(".symver foo_v1, foo@v1, remove");' > ./b.c
>echo 'va {}; v1 {}; v2 {};' > ./b.ver
>sed 's/^        /\t/' > ./Makefile <<'eof'
>.MAKE.MODE := meta curdirOk=1
>CFLAGS = -fpic -g
>
>a: a.o b0.so b.so
>        $(CC) -Wl,--no-as-needed a.o b0.so -ldl -Wl,-rpath=$$PWD -o $@
>
>b0.so: b0.o
>        $(CC) $> -shared -Wl,--soname=b.so -o $@
>
>b.so: b.o
>        $(CC) $> -shared -Wl,--soname=b.so,--version-script=b.ver -o $@
>
>clean:
>        rm -f a *.so *.o *.meta
>eof
>
>Run `bmake; ./a` => `./a: symbol lookup error: ./a: undefined symbol: foo`
>
>If va and v1 are swapped in b.ver, foo from a will be able to bind to foo@v1 in b.so.
>
>Loosen the rule will unlikely break any users, but seems unnecessary to me
>and does not help with the (necessary) inconsistency among reloc/dlsym/dlvsym.
>
>>>>* For symbol lookup with dlsym (without a version), we need to find a
>>>> versioned definition.  The use case for that is mainly FFI bindings
>>>> (e.g., Python ctypes), where programmers generally do not specify
>>>> version information.  We decided that we would want the latest version
>>>> in this case.
>>>
>>>Correction: the default version, instead of the latest version.
>>>
>>>If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
>>>picks up foo@@v2.
>>
>>If there is no default version, I think we need to pick the latest
>>version, instead of failing the lookup.
>
>I prefer the current rtld behavior: it matches ld behavior.
>
>In ld, an unversioned reference foo can be resolved by either foo or foo@@v2.
>It cannot bind to a non-default version definition foo@v3.
>
>foo can be understood as VER_NDX_GLOBAL in a versioned shared object.
>
>I have some notes about ld behavior:
>https://maskray.me/blog/2020-11-26-all-about-symbol-versioning#linker-behavior
>
>Basically rtld behavior mostly matches ld, but is loose in some aspects for DSO-upgrade compatibility.
>
>>>> However, in this case, I would want us to consider
>>>> default versions only if they exist.
>>>
>>>The current behavior already implements this.
>>>
>>>When searching a definition for foo,
>>>
>>>* for an object without DT_VERSYM
>>>  + it can be bound to foo
>>>* for an object with DT_VERSYM
>>>  + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
>>>  + it can be bound to foo of any default version
>>>  + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)
>>
>>Is VER_NDX_GLOBAL actually used?  It seems to be a marker for “we no
>>longer use symbol versioning for this symbol”.
>
>VER_NDX_GLOBAL (1) is used.  If a definition is not attached to a version node,
>it gets VER_NDX_GLOBAL.
>
>>I think these rules are not quite correct because they do not find the
>>earliest version (for relocation bindings) or latest eligible version
>>(for dlsym).
>
>I have tried many examples to testify the rules.
>Do you have an example that the rules fail?
>
>>>>The current implementation has two different, potentially conflicting
>>>>flags for indicating dlvsym lookups: the “hidden” member in the struct
>>>>r_found_version object that is passed in, and the
>>>>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>>>>versioned symbol binding during relocation processing, the hidden flag
>>>>does not matter according to the overview above.
>>>
>>>Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
>>>fine, but we'd still need a flag.
>>>
>>>For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.
>>>
>>>For the unversioned lookup code (dlvsym has been ruled out),
>>>DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
>>>dlsym.
>>
>>I think we don't need both DL_LOOKUP_RETURN_NEWEST and version->hidden.
>>
>>We need to differentiate between unversioned relocation binding and
>>dlsym.
>>
>>Versioned relocation binding and dlvsym should behave identical, I htink.
>
>The current behavior is: dlvsym is stricter than versioned relocation binding.
>Relocation binding accepts VER_NDX_GLOBAL while dlvsym does not.
>
>dlvsym is to find the exact match, while relocation binding (for compatibility
>reasons) has a fallback.
>
>The current rtld behavior on dlvsym exactly matches how ld resolves a foo@v1
>reference, so I quite like its semantics.
>
>>>>There is a corner case for unversioned symbol definitions in versioned
>>>>object, and how versioned lookups can bind against them.  I'm changing
>>>>this code in my patch for bug 29190.  Such symbol definitions have an
>>>>empty version name and version hash zero.  Apparently, the intent is
>>>>that they can bind to any version, particularly if they are the default.
>>>>(Or at index 0 or 1?  But the existing code does not check that.)  I
>>>>have never seen them used in practice, so I don't know how to test them.
>>>>The current code has checks for the default/hidden version index flag,
>>>>but I don't think we need that, it is inconsistent with that.
>>>
>>>Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
>>>Such symbols, if present in .dynsym, should be STT_SECTION.
>>>rtld does not need to check the invariant.
>>>STT_SECTION symbols have been ignored by `check_match`.
>>>
>>>(GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a
>>>STB_LOCAL section symbol.)
>>
>>dl-lookup.c does not encounter STB_LOCAL symbols because the DT_GNU_HASH
>>array does not cover them (the symbias value in elf/dl-setup_hash.c).
>>STB_LOCAL symbols come first in the symbol table (an ELF rule for symbol
>>tables, “In each symbol table, all symbols with STB_LOCAL binding
>>precede the weak and global symbols.”), and I think symbias is supposed
>>to chosen in such a way that they do not show up in the table.
>
>Agree
>
>>However, in some objects, I see VER_NDX_LOCAL being used for undefined
>>symbols.  Other objects use VER_NDX_GLOBAL.  As far as I can tell, they
>>are all produced by BFD ld.
>
>GNU ld before 2.37 may use VER_NDX_LOCAL for undefined symbols.
>I reported the bug at https://sourceware.org/bugzilla/show_bug.cgi?id=26002
>
>>>>Going back to your suggestion regarding default_version_sym, I think we
>>>>need to turn this into a struct and keep track of the version index (the
>>>>vd_ndx value) as well.  num_versions still should go because I don't
>>>>think the number of versions matter in any case.
>>>
>>>Hmm. I don't think we need a vd_ndx value.
>>
>>I think we need it for unversioned relocation binding if version index 2
>>does not exist for the symbol name, and for dlsym if the default version
>>has been removed.
>
>The debate is about whether we want to further loosen dlsym/reloc resolving behavior.
>My opinion is no, so the current information is sufficient and we do not need a new value...
>
>>Thanks,
>>Florian
>>

Ping on this cleanup.
It does not change the behavior for ld output.
  
Fangrui Song June 28, 2022, 8:22 p.m. UTC | #8
On 2022-06-10, Fangrui Song wrote:
>On 2022-06-01, Fangrui Song wrote:
>>On 2022-06-01, Florian Weimer wrote:
>>>* Fangrui Song:
>>>
>>>>>Based on first principles, I think we should do the following:
>>>>>
>>>>>* If we have a version to look up (via a DT_VERNEED binding or dlvsym),
>>>>>and the target is versioned as well, a full match means equality of
>>>>>symbol name and version.  The hidden/default flag and the version
>>>>>index do not matter.
>>>>
>>>>Yes, the intuitive case.
>>>>
>>>>>* If we have a version to lookup up, but no version information in the
>>>>>target, we match only the name.  This is necessary so that
>>>>>interposition of e.g. malloc without a version works.
>>>>
>>>>Yes. Sanitizers work this way: they are unversioned but can intercept
>>>>*@GLIBC_* functions.
>>>
>>>They break symbol versioning due to this, so I'm not sure if they are a
>>>good example.
>>
>>The behavior is natural to me: the behavior when DT_VERSYM is absent
>>matches the intention of the index 1/2 special rule when DT_VERSYM is present.
>>
>>On the other hand, without the rule, an interceptor needs a glibc style version
>>script to intercept functions like malloc.  This would make the interceptor
>>cumbersome.  Also, if programs get newer malloc versions (because glibc updated malloc behavior),
>>the interceptor would have to keep adding new versioned definitions.
>>
>>>>>dlvsym should
>>>>>match symbol binding here and pick up the unversioned definition as
>>>>>well.
>>>>
>>>>Yes if the object does not have DT_VERSYM:
>>>>dlvsym(RTLD_DEFAULT, "foo", "v1") pick up unversioned foo if the object
>>>>does not have DT_VERSYM.
>>>>
>>>>>Without interposition, the non-dlvsym cannot happen because we
>>>>>have a version coverage check in elf/dl-version.c that should have
>>>>>detected that the soname lacks the required version information.
>>>>
>>>>?
>>>
>>>Needed versions contain a soname (vn_file), and we check that the shared
>>>object with that soname provides all the needed versions.  This is how
>>>we detect applications that run on too-old glibc even if lazy binding is
>>>active.
>>
>>Yeah, this matches my understanding.
>>
>>>>>* For symbol binding during relocation, a version-less lookup may need
>>>>>to find a versioned definition.  This was initially implemented so
>>>>>that glibc 2.1 could introduce symbol versioning without rebuilding
>>>>>the world.  (We have plenty of GLIBC_2.0 symbols, but glibc 2.0 did
>>>>>not have symbol versioning.)  It has subsequently been used by other
>>>>>libraries.  In this case, we should pick the symbol with the lowest
>>>>>version index and a matching name.  The hidden/default version index
>>>>>flag does not matter.  We need to pick the lowest version even if it
>>>>>is no longer the default.  This ensures that old binaries do not pick
>>>>>up a newer definition, changing behavior.
>>>>
>>>>Right. The index is either 1 (VER_NDX_GLOBAL; ld ensures VER_NDX_GLOBAL
>>>>is the only version attached to the symbol name) or 2.
>>>
>>>I believe this not to be true.  For example,
>>>sched_setaffinity@GLIBC_2.3.3 is the lowest version (with index 6, but
>>>that might vary from build to build), not sched_setaffinity@GLIBC_2.2.5.
>>>I think we should still bind to that if an object contains an
>>>unversioned undefined symbol reference to sched_setaffinity.
>>
>>The current behavior is that an unversioned sched_setaffinity from
>>relocation does not bind to sched_setaffinity@GLIBC_2.3.3 (index > 2).
>>
>>I think the idea is that when a DSO becomes versioned, all definitions
>>should be assigned VER_NDX_GLOBAL (1) or index 2.  Definitions from newer
>>versions do not count.
>>
>>cat > ./a.c <<'eof'
>>#include <stdio.h>
>>int foo(int a);
>>int main(void) { printf("%d\n", foo(0)); }
>>eof
>>echo 'int foo(int a) { return -1; }' > ./b0.c
>>echo 'int foo_v1(int a) { return 1; } asm(".symver foo_v1, foo@v1, remove");' > ./b.c
>>echo 'va {}; v1 {}; v2 {};' > ./b.ver
>>sed 's/^        /\t/' > ./Makefile <<'eof'
>>.MAKE.MODE := meta curdirOk=1
>>CFLAGS = -fpic -g
>>
>>a: a.o b0.so b.so
>>       $(CC) -Wl,--no-as-needed a.o b0.so -ldl -Wl,-rpath=$$PWD -o $@
>>
>>b0.so: b0.o
>>       $(CC) $> -shared -Wl,--soname=b.so -o $@
>>
>>b.so: b.o
>>       $(CC) $> -shared -Wl,--soname=b.so,--version-script=b.ver -o $@
>>
>>clean:
>>       rm -f a *.so *.o *.meta
>>eof
>>
>>Run `bmake; ./a` => `./a: symbol lookup error: ./a: undefined symbol: foo`
>>
>>If va and v1 are swapped in b.ver, foo from a will be able to bind to foo@v1 in b.so.
>>
>>Loosen the rule will unlikely break any users, but seems unnecessary to me
>>and does not help with the (necessary) inconsistency among reloc/dlsym/dlvsym.
>>
>>>>>* For symbol lookup with dlsym (without a version), we need to find a
>>>>>versioned definition.  The use case for that is mainly FFI bindings
>>>>>(e.g., Python ctypes), where programmers generally do not specify
>>>>>version information.  We decided that we would want the latest version
>>>>>in this case.
>>>>
>>>>Correction: the default version, instead of the latest version.
>>>>
>>>>If somehow there are foo@@v2 and foo@v3, dlsym(RTLD_DEFAULT, "foo")
>>>>picks up foo@@v2.
>>>
>>>If there is no default version, I think we need to pick the latest
>>>version, instead of failing the lookup.
>>
>>I prefer the current rtld behavior: it matches ld behavior.
>>
>>In ld, an unversioned reference foo can be resolved by either foo or foo@@v2.
>>It cannot bind to a non-default version definition foo@v3.
>>
>>foo can be understood as VER_NDX_GLOBAL in a versioned shared object.
>>
>>I have some notes about ld behavior:
>>https://maskray.me/blog/2020-11-26-all-about-symbol-versioning#linker-behavior
>>
>>Basically rtld behavior mostly matches ld, but is loose in some aspects for DSO-upgrade compatibility.
>>
>>>>>However, in this case, I would want us to consider
>>>>>default versions only if they exist.
>>>>
>>>>The current behavior already implements this.
>>>>
>>>>When searching a definition for foo,
>>>>
>>>>* for an object without DT_VERSYM
>>>> + it can be bound to foo
>>>>* for an object with DT_VERSYM
>>>> + it can be bound to foo of version VER_NDX_GLOBAL. This takes precendence over the next two rules
>>>> + it can be bound to foo of any default version
>>>> + it can be bound to foo of non-default version index 2 in relocation resolving phase (not dlsym/dlvsym)
>>>
>>>Is VER_NDX_GLOBAL actually used?  It seems to be a marker for “we no
>>>longer use symbol versioning for this symbol”.
>>
>>VER_NDX_GLOBAL (1) is used.  If a definition is not attached to a version node,
>>it gets VER_NDX_GLOBAL.
>>
>>>I think these rules are not quite correct because they do not find the
>>>earliest version (for relocation bindings) or latest eligible version
>>>(for dlsym).
>>
>>I have tried many examples to testify the rules.
>>Do you have an example that the rules fail?
>>
>>>>>The current implementation has two different, potentially conflicting
>>>>>flags for indicating dlvsym lookups: the “hidden” member in the struct
>>>>>r_found_version object that is passed in, and the
>>>>>DL_LOOKUP_RETURN_NEWEST.  I think we should consolidate this.  For
>>>>>versioned symbol binding during relocation processing, the hidden flag
>>>>>does not matter according to the overview above.
>>>>
>>>>Based on my current understanding, removing DL_LOOKUP_RETURN_NEWEST is
>>>>fine, but we'd still need a flag.
>>>>
>>>>For the versioned lookup code, DL_LOOKUP_RETURN_NEWEST is unused.
>>>>
>>>>For the unversioned lookup code (dlvsym has been ruled out),
>>>>DL_LOOKUP_RETURN_NEWEST is to differentiate relocation resolving and
>>>>dlsym.
>>>
>>>I think we don't need both DL_LOOKUP_RETURN_NEWEST and version->hidden.
>>>
>>>We need to differentiate between unversioned relocation binding and
>>>dlsym.
>>>
>>>Versioned relocation binding and dlvsym should behave identical, I htink.
>>
>>The current behavior is: dlvsym is stricter than versioned relocation binding.
>>Relocation binding accepts VER_NDX_GLOBAL while dlvsym does not.
>>
>>dlvsym is to find the exact match, while relocation binding (for compatibility
>>reasons) has a fallback.
>>
>>The current rtld behavior on dlvsym exactly matches how ld resolves a foo@v1
>>reference, so I quite like its semantics.
>>
>>>>>There is a corner case for unversioned symbol definitions in versioned
>>>>>object, and how versioned lookups can bind against them.  I'm changing
>>>>>this code in my patch for bug 29190.  Such symbol definitions have an
>>>>>empty version name and version hash zero.  Apparently, the intent is
>>>>>that they can bind to any version, particularly if they are the default.
>>>>>(Or at index 0 or 1?  But the existing code does not check that.)  I
>>>>>have never seen them used in practice, so I don't know how to test them.
>>>>>The current code has checks for the default/hidden version index flag,
>>>>>but I don't think we need that, it is inconsistent with that.
>>>>
>>>>Index 0 is VER_NDX_LOCAL. Such symbols should have STB_LOCAL binding.
>>>>Such symbols, if present in .dynsym, should be STT_SECTION.
>>>>rtld does not need to check the invariant.
>>>>STT_SECTION symbols have been ignored by `check_match`.
>>>>
>>>>(GNU ld's powerpc64 port may produce R_PPC64_UADDR64 referencing a
>>>>STB_LOCAL section symbol.)
>>>
>>>dl-lookup.c does not encounter STB_LOCAL symbols because the DT_GNU_HASH
>>>array does not cover them (the symbias value in elf/dl-setup_hash.c).
>>>STB_LOCAL symbols come first in the symbol table (an ELF rule for symbol
>>>tables, “In each symbol table, all symbols with STB_LOCAL binding
>>>precede the weak and global symbols.”), and I think symbias is supposed
>>>to chosen in such a way that they do not show up in the table.
>>
>>Agree
>>
>>>However, in some objects, I see VER_NDX_LOCAL being used for undefined
>>>symbols.  Other objects use VER_NDX_GLOBAL.  As far as I can tell, they
>>>are all produced by BFD ld.
>>
>>GNU ld before 2.37 may use VER_NDX_LOCAL for undefined symbols.
>>I reported the bug at https://sourceware.org/bugzilla/show_bug.cgi?id=26002
>>
>>>>>Going back to your suggestion regarding default_version_sym, I think we
>>>>>need to turn this into a struct and keep track of the version index (the
>>>>>vd_ndx value) as well.  num_versions still should go because I don't
>>>>>think the number of versions matter in any case.
>>>>
>>>>Hmm. I don't think we need a vd_ndx value.
>>>
>>>I think we need it for unversioned relocation binding if version index 2
>>>does not exist for the symbol name, and for dlsym if the default version
>>>has been removed.
>>
>>The debate is about whether we want to further loosen dlsym/reloc resolving behavior.
>>My opinion is no, so the current information is sufficient and we do not need a new value...
>>
>>>Thanks,
>>>Florian
>>>
>
>Ping on this cleanup.
>It does not change the behavior for ld output.

:)
  

Patch

diff --git a/elf/dl-lookup.c b/elf/dl-lookup.c
index a42f6d5390..0245f76799 100644
--- a/elf/dl-lookup.c
+++ b/elf/dl-lookup.c
@@ -66,8 +66,7 @@  check_match (const char *const undef_name,
 	     const Elf_Symndx symidx,
 	     const char *const strtab,
 	     const struct link_map *const map,
-	     const ElfW(Sym) **const versioned_sym,
-	     int *const num_versions)
+	     const ElfW(Sym) **const versioned_sym)
 {
   unsigned int stt = ELFW(ST_TYPE) (sym->st_info);
   assert (ELF_RTYPE_CLASS_PLT == 1);
@@ -92,6 +91,9 @@  check_match (const char *const undef_name,
     return NULL;
 
   const ElfW(Half) *verstab = map->l_versyms;
+  /* If the object is versioned, the linker ensures that either there is a
+     VER_NDX_GLOBAL definition or there are a few version index>=2 definitions,
+     with at most one being the default version. */
   if (version != NULL)
     {
       if (__glibc_unlikely (verstab == NULL))
@@ -124,33 +126,22 @@  check_match (const char *const undef_name,
     }
   else
     {
-      /* No specific version is selected.  There are two ways we
-	 can got here:
+      /* Looking for an unversioned symbol.  If the object is unversioned or
+	 there is a VER_NDX_GLOBAL definition, return sym.  Otherwise:
 
-	 - a binary which does not include versioning information
-	 is loaded
+	 If DL_LOOKUP_RETURN_NEWEST is set, return the default version if
+	 exists.
 
-	 - dlsym() instead of dlvsym() is used to get a symbol which
-	 might exist in more than one form
-
-	 If the library does not provide symbol version information
-	 there is no problem at all: we simply use the symbol if it
-	 is defined.
-
-	 These two lookups need to be handled differently if the
-	 library defines versions.  In the case of the old
-	 unversioned application the oldest (default) version
-	 should be used.  In case of a dlsym() call the latest and
-	 public interface should be returned.  */
+	 If DL_LOOKUP_RETURN_NEWEST is unset, return the index 2 definition if
+	 exists, otherwise return the default version if exists.  When the
+	 object became versioned, the original unversioned definition may take
+	 index 2.  */
       if (verstab != NULL)
 	{
 	  if ((verstab[symidx] & 0x7fff)
 	      >= ((flags & DL_LOOKUP_RETURN_NEWEST) ? 2 : 3))
 	    {
-	      /* Don't accept hidden symbols.  */
-	      if ((verstab[symidx] & 0x8000) == 0
-		  && (*num_versions)++ == 0)
-		/* No version so far.  */
+	      if ((verstab[symidx] & 0x8000) == 0)
 		*versioned_sym = sym;
 
 	      return NULL;
@@ -381,7 +372,6 @@  do_lookup_x (const char *undef_name, unsigned int new_hash,
 	continue;
 
       Elf_Symndx symidx;
-      int num_versions = 0;
       const ElfW(Sym) *versioned_sym = NULL;
 
       /* The tables for this map.  */
@@ -415,8 +405,7 @@  do_lookup_x (const char *undef_name, unsigned int new_hash,
 			symidx = ELF_MACHINE_HASH_SYMIDX (map, hasharr);
 			sym = check_match (undef_name, ref, version, flags,
 					   type_class, &symtab[symidx], symidx,
-					   strtab, map, &versioned_sym,
-					   &num_versions);
+					   strtab, map, &versioned_sym);
 			if (sym != NULL)
 			  goto found_it;
 		      }
@@ -440,18 +429,17 @@  do_lookup_x (const char *undef_name, unsigned int new_hash,
 	    {
 	      sym = check_match (undef_name, ref, version, flags,
 				 type_class, &symtab[symidx], symidx,
-				 strtab, map, &versioned_sym,
-				 &num_versions);
+				 strtab, map, &versioned_sym);
 	      if (sym != NULL)
 		goto found_it;
 	    }
 	}
 
-      /* If we have seen exactly one versioned symbol while we are
-	 looking for an unversioned symbol and the version is not the
-	 default version we still accept this symbol since there are
-	 no possible ambiguities.  */
-      sym = num_versions == 1 ? versioned_sym : NULL;
+      /* When looking for an unversioned symbol, a default version is a fallback
+	 when VER_NDX_GLOBAL is absent (and when DL_LOOKUP_RETURN_NEWEST is set,
+	 index 2 is also absent).  We don't repeat the check ensured by the
+	 linker: a symbol name has at most one default version.  */
+      sym = versioned_sym;
 
       if (sym != NULL)
 	{