RFC: Treat RTLD_GLOBAL as unique to namespace when used with dlmopen

Message ID 55A73673.3060104@redhat.com
State Dropped

Commit Message

Carlos O'Donell July 16, 2015, 4:43 a.m. UTC
  Michael Kerrisk and I are working on a man page for dlmopen.

I have a question, and a proposal for the community.

We do not allow dlmopen to use RTLD_GLOBAL. Was this really
intended or simply a QoI issue?

Without RTLD_GLOBAL support in dlmopen it means that
the newly loaded DSO in the given namespace is always RTLD_LOCAL.
This seems wrong, since it means no DSO loaded via dlmopen can
be used to provide symbols to subsequently dlmopen'd DSOs in the
same namespace.

Therefore dlmopen at present serves only as a limited way to
load one library in an isolated namespace along with all of
the dependent (DT_NEEDED) libraries. It would seem to me that
RTLD_LOCAL already provides this functionality with the exception
that such a DSO may get promoted to RTLD_GLOBAL if future dlopen
calls load a DSO RTLD_GLOBAL that has an implicit dependency
on the RTLD_LOCAL DSO (DT_NEEDED). In this case the DSO loaded
RTLD_LOCAL is promoted to RTLD_GLOBAL to resolve the dependencies.
This breaks the RTLD_LOCAL isolation. Avoiding it is one of the benefits
of loading a DSO with dlmopen, since at least *that* copy will
never be promoted to RTLD_GLOBAL.

The clever developer says "No problem, I will dlmopen a stub
that dlopen's my library with RTLD_GLOBAL" under the impression
that the global search list is unique per namespace. One expects
this to allow the dlmopen'd stub to load several conjoined plugin
DSOs into the new namespace, having them resolve their symbols
against each other in an isolated way. This fails immediately
with a SIGSEGV (see Bug 18684[1]).

This trick fails for the same reason that calling dlmopen
with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
(dlmopen_doit). When you go to add the DSO to the global
search list you find there is no search list set up. In the case of
the application, rtld sets up the global search list.

Which raises the question: what should the global search list
be for a new namespace? I propose that the global search
list for a new namespace should be a copy of the symbol search
list (scope) of the first DSO loaded into the namespace with
RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
add to that list.
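
To make the proposal above concrete, here is a toy model of it in C. This
is only a sketch, not the attached patch and not glibc code: ns_model,
model_load and model_visible are invented names, the limits are arbitrary,
and a DSO's scope is simplified to just the DSO itself (no DT_NEEDED
dependencies).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NS      4   /* arbitrary limits for this sketch */
#define MODEL_MAX_GLOBAL  8

/* One global search list per namespace.  A namespace's list stays
   empty until the first RTLD_GLOBAL-style load into that namespace. */
struct ns_model {
    const char *global[MODEL_MAX_NS][MODEL_MAX_GLOBAL];
    size_t count[MODEL_MAX_NS];
};

/* Record a load of `object` into namespace `ns`.  Only loads flagged
   global touch the list: the first one seeds it, later ones append. */
bool model_load(struct ns_model *m, size_t ns, const char *object, bool global)
{
    if (ns >= MODEL_MAX_NS)
        return false;
    if (!global)
        return true;              /* RTLD_LOCAL: loaded but not published */
    if (m->count[ns] == MODEL_MAX_GLOBAL)
        return false;
    m->global[ns][m->count[ns]++] = object;
    return true;
}

/* Would a global-scope lookup made from namespace `ns` see `object`? */
bool model_visible(const struct ns_model *m, size_t ns, const char *object)
{
    if (ns >= MODEL_MAX_NS)
        return false;
    for (size_t i = 0; i < m->count[ns]; i++)
        if (strcmp(m->global[ns][i], object) == 0)
            return true;
    return false;
}
```

In this model an object loaded globally into namespace 1 is visible to
later lookups from namespace 1 but never from namespace 0, which is the
isolation property the proposal is after.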

The Solaris documentation is silent on exactly what should happen
in this case. An alternate interpretation could be: all objects loaded
with RTLD_GLOBAL, regardless of namespace (link map list), are
available for symbol resolution for any object. In that case
dlmopen with RTLD_GLOBAL makes no sense, other than perhaps symmetry
with dlopen, because the namespace isolation is lost. This still doesn't
solve the most compelling use case of an isolated set of dlmopen/dlopen
plugins with their own global search list.

The proposed interpretation of RTLD_GLOBAL for dlmopen would allow:

* Use dlmopen with RTLD_GLOBAL, making the symbols of the first
  object loaded into the namespace immediately available to
  subsequent DSOs that constructors or other dlopen calls load
  implicitly into the namespace.

* Use dlopen RTLD_GLOBAL to make symbols available for resolution
  only within the namespace the caller was in.

* Allows complete isolation of a group of dependent DSOs, either
  via DT_NEEDED dependencies or via dlopen or subsequent dlmopen.
  This isolation allows plugin virtualization via dlmopen.

Attached is a patch that fixes this for master. I still need to write
something like a dozen tests to show that this works as expected in
all the cases, but so far every test I've written works and doesn't
regress anything.

This is obviously not for 2.22, but it is 2.23 material. Along with
Michael's new dlmopen/dlinfo man pages, we should be ready to help
developers use such a feature more extensively. At present I find almost no
code using dlmopen in userspace because it has languished as an
unsupported, undocumented feature (Bug 15971, Bug 15271, and Bug 15134
all need fixing).

Thoughts?

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18684
  

Comments

Michael Kerrisk (man-pages) July 20, 2015, 6:56 p.m. UTC | #1
Hi all,

I'll add some further details to Carlos's points, plus some 
observations from testing on Solaris.

On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
> Michael Kerrisk and I are working on a man page for dlmopen.
> 
> I have a question, and a proposal for the community.
> 
> We do not allow dlmopen to use RTLD_GLOBAL. Was this really
> intended or simply a QoI issue?

Well, the API comes from Solaris, but does not follow 
Solaris behavior.

> Without RTLD_GLOBAL support in dlmopen it means that
> the newly loaded DSO in the given namespace is always RTLD_LOCAL.
> This seems wrong since it means no DSO loaded via dlmopen can
> be used to provide symbols to subsequently dlmopen'd DSOs in the
> same namespace?

Exactly. 

> Therefore dlmopen at present serves only as a limited way to
> load one library in an isolated namespace along with all of
> the dependent (DT_NEEDED) libraries. It would seem to me that
> RTLD_LOCAL already provides this functionality with the exception
> that such a DSO may get promoted to RTLD_GLOBAL if future dlopen
> calls load a DSO RTLD_GLOBAL that has an implicit dependency
> on the RTLD_LOCAL DSO (DT_NEEDED). In this case the DSO loaded
> RTLD_LOCAL is promoted to RTLD_GLOBAL to resolve the dependencies.
> This breaks the RTLD_LOCAL isolation, and is one of the benefits
> of loading a DSO with dlmopen since at least *that* copy will
> never be promoted to RTLD_GLOBAL.

Correct. And this is not the way that things are on Solaris.

> The clever developer says "No problem, I will dlmopen a stub
> that dlopen's my library with RTLD_GLOBAL" under the impression
> that the global search list is unique per namespace. One expects
> this to allow the dlmopen'd stub to load several conjoined plugin
> DSOs into the new namespace, having them resolve their symbols
> against each other in an isolated way. This fails immediately
> with a SIGSEGV (see Bug 18684[1]).

This is precisely the use case the Solaris dlmopen() does support:
isolation of load namespaces, while allowing DSOs inside a namespace
to share symbols via RTLD_GLOBAL.
> 
> This trick fails for the same reason that calling dlmopen
> with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
> (dlmopen_doit). When you go to add the DSO to the global
> search list you find there is no search list set up. In the case of
> the application, rtld sets up the global search list.
> 
> Which begs the question? What should the global search list
> be for a new namespace? I propose that the global search
> list for a new namespace should be a copy of the symbol search
> list (scope) of the first DSO loaded into the namespace with
> RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
> add to that list.

The above is what Solaris appears to provide.

> The Solaris documentation is silent on exactly what should happen
> in this case. 

Yes, but notably the Solaris documentation does not explicitly
prohibit the use of RTLD_GLOBAL with dlmopen(). The Solaris
documentation says:

     The dlmopen() function is identical to dlopen(), except that
     an identifying link-map ID (lmid) is provided. This link-map
     ID informs the dynamic linking facilities upon  which  link-
     map  list  to  load  the  object.

> Since an alternate interpretation could be: All objects,
> regardless of namespace (link map list) loaded with RTLD_GLOBAL are
> available for symbol resolution for any objects. In which case
> dlmopen with RTLD_GLOBAL makes no sense, other than perhaps symmetry
> with dlopen, because the namespace isolation is lost. This still doesn't
> solve the most compelling use case of an isolated set of dlmopen/dlopen
> plugins with their own global search list.

And, in my testing, the above is *not* what Solaris does.

> The proposed interpretation of RTLD_GLOBAL for dlmopen would allow:
> 
> * Use dlmopen with RTLD_GLOBAL, making the symbols of the first
>   object loaded into the namespace immediately available to
>   subsequent DSOs loaded in constructors or other dlopen implicitly
>   into the namespace.
> 
> * Use dlopen RTLD_GLOBAL to make symbols available for resolution
>   only within the namespace the caller was in.
> 
> * Allows complete isolation of a group of dependent DSOs, either
>   via DT_NEEDED dependencies or via dlopen or subsequent dlmopen.
>   This isolation allows plugin virtualization via dlmopen.

The above is what Solaris seems to provide.

> Attached is a patch that fixes this for master. I still need to write
> something like a dozen tests to show that this works as expected in
> all the cases, but so far every test I've written works and doesn't
> regress anything.

I've not yet had a chance to test this patch. Carlos, you may wish
to try my code examples, and check how things look compared to Solaris.

One other deviation that I note from Solaris. The dlopen() man page
currently says:

       If filename is NULL, then the returned handle is  for
       the  main  program.

And this is what glibc currently does *regardless* of the namespace
from which the dlopen(NULL, flags) call is made. But, in the context
of dlmopen(LM_ID_NEWLM) namespaces, I'd expect this call to return
something like "the root of this namespace". And that is what
Solaris appears to do.
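
For anyone who wants to check this on their own system, a minimal sketch
of how to ask which namespace a handle belongs to, using the dlinfo(3)
RTLD_DI_LMID request. The helper name handle_lmid is made up for this
example; dlinfo(), RTLD_DI_LMID and Lmid_t are the real glibc interfaces
and need _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <dlfcn.h>

/* Store the link-map list (namespace) ID of `handle` in *lmid.
   Returns 0 on success and -1 on failure, as dlinfo() does. */
int handle_lmid(void *handle, Lmid_t *lmid)
{
    return dlinfo(handle, RTLD_DI_LMID, lmid);
}
```

Calling handle_lmid() on the result of dlopen(NULL, RTLD_LAZY) from the
main program should report LM_ID_BASE. The interesting case is making the
same pair of calls from code running inside a dlmopen(LM_ID_NEWLM)
namespace, which is where the glibc and Solaris results described above
differ.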

> Obviously not for 2.22, but 2.23 material, along with Michael's
> new dlmopen/dlinfo man pages we should be ready to help developers
> use such a feature more extensively. At present I find almost no
> code using dlmopen in userspace because it has languished as an
> unsupported undocumented feature (Bug 15971, Bug 15271, and Bug 15134
> all need fixing).

I would say "... because it currently serves no useful purpose".
The dlmopen() function seems to have been added to Solaris to support
precisely the use cases that Carlos describes, and the glibc
implementation doesn't support those cases at all.

The attached tarball contains a short build script that creates a few
shared libraries from (mostly) simple (and commented) source files.

The overall structure is as follows:

    main():

        1. Loads libabc.so with either dlmopen() or dlopen() and 
           with either RTLD_GLOBAL or RTLD_LOCAL, depending on the 
           command-line arguments. If no arguments are provided, the 
           default is dlmopen(..., RTLD_GLOBAL);

        2. Invokes abc_start() in libabc.so

    abc_start():
        1. Loads some other shared libraries using different
           combinations of dlmopen() and RTLD_GLOBAL vs RTLD_LOCAL.

        2. Invokes a function qrs_start() in the libqrs.so
           library.

    qrs_start():
        Looks up (dlsym()) various symbols in the other shared
        libraries and reports on success or failure of the lookups.

    main():
        Control eventually returns to main(), and it then looks up
        some of the same symbols as qrs_start() and reports on
        success or failure of the lookups.    
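
The lookup-and-report step in qrs_start() and main() can be sketched
roughly as below. This is not the code from the tarball: report_lookup is
a hypothetical helper, and which handle the real test program passes to
dlsym() is an assumption here (the caller supplies it).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to resolve `name` through `handle` and log the outcome in the
   style of the test program's messages.  Returns 1 on success, 0 on
   failure.  dlerror() is used to detect failure because a symbol's
   value may legitimately be NULL. */
int report_lookup(const char *who, void *handle, const char *name)
{
    dlerror();                        /* clear any stale error state */
    (void) dlsym(handle, name);
    int ok = (dlerror() == NULL);     /* non-NULL means lookup failed */
    printf("%s: lookup of \"%s\" %s\n", who, name,
           ok ? "succeeded" : "failed");
    return ok;
}
```

A caller would pass, for example, the handle returned by
dlopen(NULL, RTLD_LAZY) and the name of a symbol defined in one of the
other libraries.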

The program produces log messages that should make the results 
reasonably easy to interpret. Annotated output from a sample
run follows.

---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
$ uname -a
SunOS login 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
$ sh build.sh && ./main
main(): lmid from dlopen(NULL) is 0 (handle = 0xff3634d8)
main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
main(): lmid from dlopen("libabc.so") is -13222656 (handle = 0xff371560)
main(): invoking abc_start()
    Called abc_start()
# Note in next line that dlopen(NULL) gave us back a handle for something
# other than initial NS. Linux differs on this point.
    abc_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
    abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
    abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
    abc_start(): invoking qrs_start()
        Called qrs_start()
        qrs_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
        qrs_start(): lookup of "abc" succeeded   # In this NS, with 
        qrs_start(): lookup of "def" failed      # Was loaded into initial NS
        qrs_start(): lookup of "jkl" succeeded
        qrs_start(): lookup of "mno" failed      # Was loaded with RTLD_LOCAL
        qrs_start(): lookup of "main" failed     # Is in initial NS
# Now do some lookups from initial NS
main(): lookup of "abc" failed                   # In another NS
main(): lookup of "def" succeeded                # Was loaded into initial NS
main(): lookup of "jkl" failed                   # In another NS
main(): lookup of "mno" failed                   # In another NS (+ RTLD_LOCAL)
---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

Cheers,

Michael
  
Michael Kerrisk (man-pages) July 24, 2015, 9:29 a.m. UTC | #2
On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
> Michael Kerrisk and I are working on a man page for dlmopen.

See below for the current draft of the text describing dlmopen().
Note that this text describes how Carlos and I think the API
is supposed to behave (at least, I think the text fits with
discussions Carlos and I had), and it's also consistent with
actual Solaris behavior. The BUGS section describes the
points that make dlmopen() unfit for purpose (IIUC).

Cheers,

Michael



SYNOPSIS
      ....
       #define _GNU_SOURCE
       #include <dlfcn.h>

       void *dlmopen (Lmid_t lmid, const char *filename, int flags);

       Link with -ldl.

DESCRIPTION
   dlopen()
       ...
   dlmopen()
       This function performs the same task as dlopen()—the filename and
       flags  arguments,  as  well  as  the  return value, are the same,
       except for the differences noted below.

       The dlmopen() function differs from dlopen() primarily in that it
       accepts an additional argument, lmid, that specifies the link-map
       list (also referred to as a namespace) in which the shared object
       should  be loaded.  (By comparison, dlopen() adds the dynamically
       loaded shared object to the same namespace as the  shared  object
       from  which  the  dlopen()  call is made.)  The Lmid_t type is an
       opaque handle that refers to a namespace.

       The lmid argument is either  the  ID  of  an  existing  namespace
       (which  can be obtained using the dlinfo(3) RTLD_DI_LMID request)
       or one of the following special values:

       LM_ID_BASE
              Load the shared object in the initial namespace (i.e., the
              application's namespace).

       LM_ID_NEWLM
              Create  a new namespace and load the shared object in that
              namespace.  The object must have been correctly linked  to
              reference   all  of  the  other  shared  objects  that  it
              requires, since the new namespace is initially empty.

       If handle is NULL, then the only  permitted  value  for  lmid  is
       LM_ID_BASE.

       ....

NOTES
   dlmopen() and namespaces
       A  link-map list defines an isolated namespace for the resolution
       of symbols by the dynamic linker.  Within a namespace,  dependent
       shared  objects  are  implicitly  loaded  according  to the usual
       rules, and symbol references are likewise resolved  according  to
       the usual rules, but such resolution is confined to the definitions
       provided by the objects that have been (explicitly and
       implicitly) loaded into the namespace.

       The  dlmopen() function permits object-load isolation—the ability
       to load a shared object in a new namespace without  exposing  the
       rest  of the application to the symbols made available by the new
       object.  Note that the use of the RTLD_LOCAL flag is not sufficient
       for this purpose, since it prevents a shared object's symbols
       from being available to any other shared object.  In some
       cases,  we may want to make the symbols provided by a dynamically
       loaded shared object available to  (a  subset  of)  other  shared
       objects without exposing those symbols to the entire application.
       This can be achieved  by  using  a  separate  namespace  and  the
       RTLD_GLOBAL flag.

       Possible  uses  of  dlmopen() are plugins where the author of the
       plugin-loading framework can't trust the plugin authors and  does
       not  wish  any  undefined symbols from the plugin framework to be
       resolved to plugin symbols.  Another use  is  to  load  the  same
       object  more than once.  Without the use of dlmopen(), this would
       require the creation of distinct  copies  of  the  shared  object
       file.   Using dlmopen(), this can be achieved by loading the same
       shared object file into different namespaces.

       The glibc implementation supports a maximum of 16 namespaces.

       ...
BUGS
       As at glibc 2.21, specifying the RTLD_GLOBAL flag when calling
       dlmopen() generates an error.  Furthermore, specifying
       RTLD_GLOBAL when calling dlopen() results in a program crash
       (SIGSEGV) if the call is made from any object loaded in a
       namespace other than the initial namespace.
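
As an illustration of the "load the same object more than once" use
described in the NOTES, here is a minimal sketch. load_isolated is a
hypothetical helper name; it passes RTLD_LOCAL because, per the BUGS text,
RTLD_GLOBAL is currently rejected by dlmopen(). Whether a particular
library loads cleanly into a second namespace is not guaranteed.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Load `path` into a freshly created namespace.  On success, returns
   the handle and stores the new namespace's ID in *lmid; on failure,
   returns NULL (dlerror() describes a dlmopen() failure). */
void *load_isolated(const char *path, Lmid_t *lmid)
{
    void *handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;
    if (dlinfo(handle, RTLD_DI_LMID, lmid) != 0) {
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

Calling load_isolated() twice with the same path should yield two handles
whose namespace IDs differ from each other and from LM_ID_BASE. On older
glibc versions the program must be linked with -ldl.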
  
Carlos O'Donell July 24, 2015, 6:37 p.m. UTC | #3
On 07/20/2015 02:56 PM, Michael Kerrisk (man-pages) wrote:
> Hi all,
> 
> I'll add some further details to Carlos's points, plus some 
> observations from testing on Solaris.

Thank you.

> On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
>> Michael Kerrisk and I are working on a man page for dlmopen.
>>
>> I have a question, and a proposal for the community.
>>
>> We do not allow dlmopen to use RTLD_GLOBAL. Was this really
>> intended or simply a QoI issue?
> 
> Well, the API comes from Solaris, but does not follow 
> Solaris behavior.

It should to the extent that it makes sense for our users.

>> Therefore dlmopen at present serves only as a limited way to
>> load one library in an isolated namespace along with all of
>> the dependent (DT_NEEDED) libraries. It would seem to me that
>> RTLD_LOCAL already provides this functionality with the exception
>> that such a DSO may get promoted to RTLD_GLOBAL if future dlopen
>> calls load a DSO RTLD_GLOBAL that has an implicit dependency
>> on the RTLD_LOCAL DSO (DT_NEEDED). In this case the DSO loaded
>> RTLD_LOCAL is promoted to RTLD_GLOBAL to resolve the dependencies.
>> This breaks the RTLD_LOCAL isolation, and is one of the benefits
>> of loading a DSO with dlmopen since at least *that* copy will
>> never be promoted to RTLD_GLOBAL.
> 
> Correct. And this is not the way that things are on Solaris.

Thanks for checking.

>> The clever developer says "No problem, I will dlmopen a stub
>> that dlopen's my library with RTLD_GLOBAL" under the impression
>> that the global search list is unique per namespace. One expects
>> this to allow the dlmopen'd stub to load several conjoined plugin
>> DSOs into the new namespace, having them resolve their symbols
>> against each other in an isolated way. This fails immediately
>> with a SIGSEGV (see Bug 18684[1]).
> 
> This is precisely the use case the Solaris dlmopen() does support:
> isolation of load namespaces, while allowing DSOs inside a namespace
> to share symbols via RTLD_GLOBAL.

I have seen some academic projects that used dlmopen in Solaris to
implement a form of virtualization via the isolation of library
loading. The use of dlmopen solves quite a number of interesting
security and isolation problems within an application.

>> This trick fails for the same reason that calling dlmopen
>> with RTLD_GLOBAL would fail if you removed the check in dlfcn/dlmopen.c
>> (dlmopen_doit). When you go to add the DSO to the global
>> search list you find there is no search list set up. In the case of
>> the application, rtld sets up the global search list.
>>
>> Which begs the question? What should the global search list
>> be for a new namespace? I propose that the global search
>> list for a new namespace should be a copy of the symbol search
>> list (scope) of the first DSO loaded into the namespace with
>> RTLD_GLOBAL, and subsequent RTLD_GLOBAL loads into the namespace
>> add to that list.
> 
> The above is what Solaris appears to provide.

OK.

>> The Solaris documentation is silent on exactly what should happen
>> in this case. 
> 
> Yes, but notably the Solaris documentation does not explicitly
> prohibit the use of RTLD_GLOBAL with dlmopen(). The Solaris
> documentation says:
> 
>      The dlmopen() function is identical to dlopen(), except that
>      an identifying link-map ID (lmid) is provided. This link-map
>      ID informs the dynamic linking facilities upon  which  link-
>      map  list  to  load  the  object.

Agreed.

>> Since an alternate interpretation could be: All objects,
>> regardless of namespace (link map list) loaded with RTLD_GLOBAL are
>> available for symbol resolution for any objects. In which case
>> dlmopen with RTLD_GLOBAL makes no sense, other than perhaps symmetry
>> with dlopen, because the namespace isolation is lost. This still doesn't
>> solve the most compelling use case of an isolated set of dlmopen/dlopen
>> plugins with their own global search list.
> 
> And, in my testing, the above is *not* what Solaris does.

Good. I proposed the alternatives only in as much as they exist, but
I do not believe they are the correct technical solution.

>> The proposed interpretation of RTLD_GLOBAL for dlmopen would allow:
>>
>> * Use dlmopen with RTLD_GLOBAL, making the symbols of the first
>>   object loaded into the namespace immediately available to
>>   subsequent DSOs loaded in constructors or other dlopen implicitly
>>   into the namespace.
>>
>> * Use dlopen RTLD_GLOBAL to make symbols available for resolution
>>   only within the namespace the caller was in.
>>
>> * Allows complete isolation of a group of dependent DSOs, either
>>   via DT_NEEDED dependencies or via dlopen or subsequent dlmopen.
>>   This isolation allows plugin virtualization via dlmopen.
> 
> The above is what Solaris seems to provide.

OK.

>> Attached is a patch that fixes this for master. I still need to write
>> something like a dozen tests to show that this works as expected in
>> all the cases, but so far every test I've written works and doesn't
>> regress anything.
> 
> I've not yet had a chance to test this patch. Carlos, you may wish
> to try my code examples, and check how things look compared to Solaris.
> 
> One other deviation that I note from Solaris. The dlopen() man page
> currently says:
> 
>        If filename is NULL, then the returned handle is  for
>        the  main  program.
> 
> And this is what glibc currently does *regardless* of the namespace
> from which the dlopen(NULL, flags) call is made. But, in the context
> of dlmopen(LM_ID_NEWLM) namespaces, I'd expect this call to return 
> something like "the root of this namespace". And that is what
> Solaris appears to do.

Agreed. We can fix that.

>> Obviously not for 2.22, but 2.23 material, along with Michael's
>> new dlmopen/dlinfo man pages we should be ready to help developers
>> use such a feature more extensively. At present I find almost no
>> code using dlmopen in userspace because it has languished as an
>> unsupported undocumented feature (Bug 15971, Bug 15271, and Bug 15134
>> all need fixing).
> 
> I would say "... because it currently serves no useful purpose".
> The dlmopen() function seems to have been added to Solaris to support
> precisely the use cases that Carlos describes, and the glibc
> implementation doesn't support those cases at all.
> 
> The attached tarball contains a short build script that creates a few
> shared libraries from (mostly) simple (and commented) source files.
> 
> The overall structure is as follows:
> 
>     main():
> 
>         1. Loads libabc.so with either dlmopen() or dlopen() and 
>            with either RTLD_GLOBAL or RTLD_LOCAL, depending on the 
>            command-line arguments. If no arguments are provided, the 
>            default is dlmopen(..., RTLD_GLOBAL);
> 
>         2. Invokes abc_start() in libabc.so
> 
>     abc_start():
>         1. Loads some other shared libraries using different
>            combinations of dlmopen() and RTLD_GLOBAL vs RTLD_LOCAL.
> 
>         2. Invokes a function qrs_start() in the libqrs.so
>            library.
> 
>     qrs_start():
>         Looks up (dlsym()) various symbols in the other shared
>         libraries and reports on success or failure of the lookups.
> 
>     main():
>         Control eventually returns to main(), and it then looks up
>         some of the same symbols as qrs_start() and reports on
>         success or failure of the lookups.    
> 
> The program produces log messages that should make the results 
> reasonably easy to interpret. Annotated output from a sample
> run follows.
> 
> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
> $ uname -a
> SunOS login 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
> $ sh build.sh && ./main
> main(): lmid from dlopen(NULL) is 0 (handle = 0xff3634d8)
> main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
> main(): lmid from dlopen("libabc.so") is -13222656 (handle = 0xff371560)
> main(): invoking abc_start()
>     Called abc_start()
> # Note in next line that dlopen(NULL) gave us back a handle for something
> # other than initial NS. Linux differs on this point.
>     abc_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>     abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
>     abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
>     abc_start(): invoking qrs_start()
>         Called qrs_start()
>         qrs_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>         qrs_start(): lookup of "abc" succeeded   # In this NS, with 
>         qrs_start(): lookup of "def" failed      # Was loaded into initial NS
>         qrs_start(): lookup of "jkl" succeeded
>         qrs_start(): lookup of "mno" failed      # Was loaded with RTLD_LOCAL
>         qrs_start(): lookup of "main" failed     # Is in initial NS
> # Now do some lookups from initial NS
> main(): lookup of "abc" failed                   # In another NS
> main(): lookup of "def" succeeded                # Was loaded into initial NS
> main(): lookup of "jkl" failed                   # In another NS
> main(): lookup of "mno" failed                   # In another NS (+ RTLD_LOCAL)
> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

With a few more patches I get *almost* all the way there:

[carlos@athas dlmopen_expt]$ ./main
main(): lmid from dlopen(NULL) is 0 (handle = 0x0x7fa3e58ec188)
main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
main(): lmid from dlopen("libabc.so") is 1 (handle = 0x0x2267030)
main(): invoking abc_start()
    Called abc_start()
    abc_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
    abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
    abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
    abc_start(): invoking qrs_start()
        Called qrs_start()
        qrs_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
        qrs_start(): lookup of "abc" succeeded
        qrs_start(): lookup of "def" failed
Segmentation fault (core dumped)

There is more work to be done. This failure is from calling free() in
the non-LM_ID_BASE namespace for the first time.

My opinion is that this should all just work, but may require some special
cases in libc.so.6 and ld.so to make sure everything is initialized in the
new namespace and has its own distinct TLS blocks (doesn't use the base
namespace TLS blocks).

The bummer is that gdb stops working to debug anything after the dlmopen.
We're going to need their help to continue debugging this after we get
the basic patches in place for 2.23.

Cheers,
Carlos.
  
Carlos O'Donell July 24, 2015, 6:43 p.m. UTC | #4
On 07/24/2015 05:29 AM, Michael Kerrisk (man-pages) wrote:
> On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
>> Michael Kerrisk and I are working on a man page for dlmopen.
> 
> See below for the current draft of the text describing dlmopen().
> Note that this text describes how Carlos and I think the API
> is supposed to behave (at least, I think the text fits with
> discussions Carlos and I had), and it's also consistent with
> actual Solaris behavior. The BUGS section describes the
> points that make dlmopen() unfit for purpose (IIUC).
> 
> Cheers,
> 
> Michael
> 
> 
> 
> SYNOPSIS
>       ....
>        #define _GNU_SOURCE
>        #include <dlfcn.h>
> 
>        void *dlmopen (Lmid_t lmid, const char *filename, int flags);
> 
>        Link with -ldl.
> 
> DESCRIPTION
>    dlopen()
>        ...
>    dlmopen()
>        This function performs the same task as dlopen()—the filename and
>        flags  arguments,  as  well  as  the  return value, are the same,
>        except for the differences noted below.
> 
>        The dlmopen() function differs from dlopen() primarily in that it
>        accepts an additional argument, lmid, that specifies the link-map
>        list (also referred to as a namespace) in which the shared object
>        should  be loaded.  (By comparison, dlopen() adds the dynamically
>        loaded shared object to the same namespace as the  shared  object
>        from  which  the  dlopen()  call is made.)  The Lmid_t type is an
>        opaque handle that refers to a namespace.
> 
>        The lmid argument is either  the  ID  of  an  existing  namespace
>        (which  can be obtained using the dlinfo(3) RTLD_DI_LMID request)
>        or one of the following special values:
> 
>        LM_ID_BASE
>               Load the shared object in the initial namespace (i.e., the
>               application's namespace).
> 
>        LM_ID_NEWLM
>               Create  a new namespace and load the shared object in that
>               namespace.  The object must have been correctly linked  to
>               reference   all  of  the  other  shared  objects  that  it
>               requires, since the new namespace is initially empty.
> 
>        If handle is NULL, then the only  permitted  value  for  lmid  is
>        LM_ID_BASE.

What handle is NULL? If "filename" is NULL? This would be a glibc limitation
which we plan to fix in 2.23. We should certainly return the handle to the
base object loaded in the namespace as you suggest, just like the application
is returned upon dlopen (NULL).

> 
>        ....
> 
> NOTES
>    dlmopen() and namespaces
>        A  link-map list defines an isolated namespace for the resolution
>        of symbols by the dynamic linker.  Within a namespace,  dependent
>        shared  objects  are  implicitly  loaded  according  to the usual
>        rules, and symbol references are likewise resolved  according  to
>        the  usual  rules, but such resolution is confined to the defini‐
>        tions provided by the objects  that  have  been  (explicitly  and
>        implicitly) loaded into the namespace.
> 
>        The  dlmopen() function permits object-load isolation—the ability
>        to load a shared object in a new namespace without  exposing  the
>        rest  of the application to the symbols made available by the new
>        object.  Note that the use of the RTLD_LOCAL flag is  not  suffi‐
>        cient  for this purpose, since it prevents a shared object's sym‐
>        bols from being available to any other shared  object.   In  some
>        cases,  we may want to make the symbols provided by a dynamically
>        loaded shared object available to  (a  subset  of)  other  shared
>        objects without exposing those symbols to the entire application.
>        This can be achieved  by  using  a  separate  namespace  and  the
>        RTLD_GLOBAL flag.

Lastly, RTLD_LOCAL loaded objects may be promoted to RTLD_GLOBAL if they
are dependencies of another RTLD_GLOBAL loaded object. Therefore RTLD_LOCAL
is still not sufficient to isolate the loaded shared object without explicit
control over all object dependencies.

> 
>        Possible  uses  of  dlmopen() are plugins where the author of the
>        plugin-loading framework can't trust the plugin authors and  does
>        not  wish  any  undefined symbols from the plugin framework to be
>        resolved to plugin symbols.  Another use  is  to  load  the  same
>        object  more than once.  Without the use of dlmopen(), this would
>        require the creation of distinct  copies  of  the  shared  object
>        file.   Using dlmopen(), this can be achieved by loading the same
>        shared object file into different namespaces.
> 
>        The glibc implementation supports a maximum of 16 namespaces.
> 
>        ...
> BUGS
>        As  at  glibc  2.21, specifying the RTLD_GLOBAL flag when calling
>        dlmopen()   generates   an   error.    Furthermore,    specifying
>        RTLD_GLOBAL  when  calling  dlopen()  results  in a program crash
>        (SIGSEGV) if the call is made from any object loaded in a  names‐
>        pace other than the initial namespace.
> 

Looks good to me. We need dlinfo documented also, which you already know,
but I wanted to make that explicit in this public discussion.
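
A caller that wants RTLD_GLOBAL inside a new namespace, but has to cope with a C library that rejects it as the BUGS text describes, might probe along these lines. This is a sketch under assumptions: the name dlmopen_prefer_global() is invented for illustration, and whether the first attempt succeeds depends entirely on the glibc in use.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load `path` into a new namespace with
   RTLD_GLOBAL.  If that attempt fails, retry with RTLD_LOCAL.
   `*got_global` is set to 1 only when the RTLD_GLOBAL attempt
   succeeded.  Returns the handle, or NULL if both attempts fail.  */
void *
dlmopen_prefer_global (const char *path, int *got_global)
{
  void *handle = dlmopen (LM_ID_NEWLM, path, RTLD_NOW | RTLD_GLOBAL);
  if (handle != NULL)
    {
      *got_global = 1;
      return handle;
    }

  /* The failure may be the "invalid mode" rejection, or something
     unrelated such as a missing file; report whatever dlerror() says.  */
  fprintf (stderr, "RTLD_GLOBAL attempt failed: %s\n", dlerror ());
  *got_global = 0;

  handle = dlmopen (LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
  if (handle == NULL)
    fprintf (stderr, "dlmopen: %s\n", dlerror ());
  return handle;
}
```

Note that the fallback only avoids the dlmopen() error. It does nothing about the second problem in the BUGS text, the SIGSEGV from dlopen() with RTLD_GLOBAL called inside the new namespace.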

Cheers,
Carlos.
  
Michael Kerrisk (man-pages) July 24, 2015, 7:33 p.m. UTC | #5
Hi Carlos,

On 07/24/2015 08:37 PM, Carlos O'Donell wrote:
> On 07/20/2015 02:56 PM, Michael Kerrisk (man-pages) wrote:

[...]

>>> The clever developer says "No problem, I will dlmopen a stub
>>> that dlopen's my library with RTLD_GLOBAL" under the impression
>>> that the global search list is unique per namespace. One expects
>>> this to allow the dlmopen'd stub to load several conjoined plugin
>>> DSOs into the new namespace, having them resolve their symbols
>>> against each other in an isolated way. This fails immediately
>>> with a SIGSEGV (see Bug 18684[1]).
>>
>> This is precisely the use case the Solaris dlmopen() does support:
>> isolation of load namespaces, while allowing DSOs inside a namespace
>> to share symbols via RTLD_GLOBAL.
> 
> I have seen some academic projects that used dlmopen in Solaris to
> implement a form  of virtualization via the isolation of library
> loading. The use of dlmopen solves quite a number of interesting
> security and isolation problems within an application.

Yes, I also came across one paper about this.

[...]

(Yes, I supposed you were just looking for completeness in the discussion.)

>>> Obviously not for 2.22, but 2.23 material, along with Michael's
>>> new dlmopen/dlinfo man pages we should be ready to help developers
>>> use such a feature more extensively. At present I find almost no
>>> code using dlmopen in userspace because it has languished as an
>>> unsupported undocumented feature (Bug 15971, Bug 15271, and Bug 15134
>>> all need fixing).
>>
>> I would say "... because it currently serves no useful purpose".
>> The dlmopen() function seems to have been added to Solaris to support
>> precisely the use cases that Carlos describes, and the glibc
>> implementation doesn't support those cases at all.
>>
>> The attached tarball contains a short build script that creates a few
>> shared libraries from (mostly) simple (and commented) source files.
>>
>> The overall structure is as follows:
>>
>>     main():
>>
>>         1. Loads libabc.so with either dlmopen() or dlopen() and 
>>            with either RTLD_GLOBAL or RTLD_LOCAL, depending on the 
>>            command-line arguments. If no arguments are provided, the 
>>            default is dlmopen(..., RTLD_GLOBAL);
>>
>>         2. Invokes abc_start() in libabc.so
>>
>>     abc_start():
>>         1. Loads some other shared libraries using different
>>            combinations of dlmopen() and RTLD_GLOBAL vs RTLD_LOCAL.
>>
>>         2. Invokes a function qrs_start() in the libqrs.so
>>            library.
>>
>>     qrs_start():
>>         Looks up (dlsym()) various symbols in the other shared
>>         libraries and reports on success or failure of the lookups.
>>
>>     main():
>>         Control eventually returns to main(), and it then looks up
>>         some of the same symbols as qrs_start() and reports on
>>         success or failure of the lookups.    
>>
>> The program produces log messages that should make the results 
>> reasonably easy to interpret. Annotated output from a sample
>> run follows.
>>
>> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
>> $ uname -a
>> SunOS login 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
>> $ sh build.sh && ./main
>> main(): lmid from dlopen(NULL) is 0 (handle = 0xff3634d8)
>> main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
>> main(): lmid from dlopen("libabc.so") is -13222656 (handle = 0xff371560)
>> main(): invoking abc_start()
>>     Called abc_start()
>> # Note in next line that dlopen(NULL) gave us back a handle for something
>> # other than initial NS. Linux differs on this point.
>>     abc_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>>     abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
>>     abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
>>     abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
>>     abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
>>     abc_start(): invoking qrs_start()
>>         Called qrs_start()
>>         qrs_start(): lmid from dlopen(NULL) is -13222656 (handle = 0xff173690)
>>         qrs_start(): lookup of "abc" succeeded   # In this NS, with 
>>         qrs_start(): lookup of "def" failed      # Was loaded into initial NS
>>         qrs_start(): lookup of "jkl" succeeded
>>         qrs_start(): lookup of "mno" failed      # Was loaded with RTLD_LOCAL
>>         qrs_start(): lookup of "main" failed     # Is in initial NS
>> # Now do some lookups from initial NS
>> main(): lookup of "abc" failed                   # In another NS
>> main(): lookup of "def" succeeded                # Was loaded into initial NS
>> main(): lookup of "jkl" failed                   # In another NS
>> main(): lookup of "mno" failed                   # In another NS (+ RTLD_LOCAL)
>> ---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
> 
> With a few more patches I get *almost* all the way there:
> 
> [carlos@athas dlmopen_expt]$ ./main
> main(): lmid from dlopen(NULL) is 0 (handle = 0x0x7fa3e58ec188)
> main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
> main(): lmid from dlopen("libabc.so") is 1 (handle = 0x0x2267030)
> main(): invoking abc_start()
>     Called abc_start()
>     abc_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
>     abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
>     abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
>     abc_start(): invoking qrs_start()
>         Called qrs_start()
>         qrs_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x2267030)
>         qrs_start(): lookup of "abc" succeeded
>         qrs_start(): lookup of "def" failed
> Segmentation fault (core dumped)
> 
> There is more work to be done. This failure is from calling free() in
> the non-LM_ID_BASE namespace for the first time.

Thanks for working on this, and running my test.

> My opinion is that this should all just work, but may require some special
> cases in libc.so.6 and ld.so to make sure everything is initialized in the
> new namespace and has its own distinct TLS blocks (doesn't use the base
> namespace TLS blocks).
> 
> The bummer is that gdb stops working to debug anything after the dlmopen.
> We're going to need their help to continue debugging this after we get
> the basic patches in place for 2.23.

Okay.

Cheers,

Michael
  
Michael Kerrisk (man-pages) July 24, 2015, 7:43 p.m. UTC | #6
Hi Carlos,

On 07/24/2015 08:43 PM, Carlos O'Donell wrote:
> On 07/24/2015 05:29 AM, Michael Kerrisk (man-pages) wrote:
>> On 07/16/2015 06:43 AM, Carlos O'Donell wrote:
>>> Michael Kerrisk and I are working on a man page for dlmopen.
>>
>> See below for the current draft of the text describing dlmopen().
>> Note that this text describes how Carlos and I think the API
>> is supposed to behave (at least, I think the text fits with
>> discussions Carlos and I had), and it's also consistent with
>> actual Solaris behavior. The BUGS section describes the
>> points that make dlmopen() unfit for purpose (IIUC).
>>
>> Cheers,
>>
>> Michael
>>
>>
>>
>> SYNOPSIS
>>       ....
>>        #define _GNU_SOURCE
>>        #include <dlfcn.h>
>>
>>        void *dlmopen (Lmid_t lmid, const char *filename, int flags);
>>
>>        Link with -ldl.
>>
>> DESCRIPTION
>>    dlopen()
>>        ...
>>    dlmopen()
>>        This function performs the same task as dlopen()—the filename and
>>        flags  arguments,  as  well  as  the  return value, are the same,
>>        except for the differences noted below.
>>
>>        The dlmopen() function differs from dlopen() primarily in that it
>>        accepts an additional argument, lmid, that specifies the link-map
>>        list (also referred to as a namespace) in which the shared object
>>        should  be loaded.  (By comparison, dlopen() adds the dynamically
>>        loaded shared object to the same namespace as the  shared  object
>>        from  which  the  dlopen()  call is made.)  The Lmid_t type is an
>>        opaque handle that refers to a namespace.
>>
>>        The lmid argument is either  the  ID  of  an  existing  namespace
>>        (which  can be obtained using the dlinfo(3) RTLD_DI_LMID request)
>>        or one of the following special values:
>>
>>        LM_ID_BASE
>>               Load the shared object in the initial namespace (i.e., the
>>               application's namespace).
>>
>>        LM_ID_NEWLM
>>               Create  a new namespace and load the shared object in that
>>               namespace.  The object must have been correctly linked  to
>>               reference   all  of  the  other  shared  objects  that  it
>>               requires, since the new namespace is initially empty.
>>
>>        If handle is NULL, then the only  permitted  value  for  lmid  is
>>        LM_ID_BASE.
> 
> What handle is NULL? If "filename" is NULL? 

Yes, sorry s/handle/filename/

> This would be a glibc limitation
> which we plan to fix in 2.23. We should certainly return the handle to the
> base object loaded in the namespace as you suggest, just like the application
> is returned upon dlopen (NULL).

Okay.

>>        ....
>>
>> NOTES
>>    dlmopen() and namespaces
>>        A  link-map list defines an isolated namespace for the resolution
>>        of symbols by the dynamic linker.  Within a namespace,  dependent
>>        shared  objects  are  implicitly  loaded  according  to the usual
>>        rules, and symbol references are likewise resolved  according  to
>>        the  usual  rules, but such resolution is confined to the defini‐
>>        tions provided by the objects  that  have  been  (explicitly  and
>>        implicitly) loaded into the namespace.
>>
>>        The  dlmopen() function permits object-load isolation—the ability
>>        to load a shared object in a new namespace without  exposing  the
>>        rest  of the application to the symbols made available by the new
>>        object.  Note that the use of the RTLD_LOCAL flag is  not  suffi‐
>>        cient  for this purpose, since it prevents a shared object's sym‐
>>        bols from being available to any other shared  object.   In  some
>>        cases,  we may want to make the symbols provided by a dynamically
>>        loaded shared object available to  (a  subset  of)  other  shared
>>        objects without exposing those symbols to the entire application.
>>        This can be achieved  by  using  a  separate  namespace  and  the
>>        RTLD_GLOBAL flag.
> 
> Lastly, RTLD_LOCAL loaded objects may be promoted to RTLD_GLOBAL if they
> are dependencies of another RTLD_GLOBAL loaded object. Therefore RTLD_LOCAL
> is still not sufficient to isolate the loaded shared object without explicit
> control over all object dependencies.

I wrote this up as

       The  dlmopen() function also can be used to provide better isola‐
       tion than the RTLD_LOCAL flag.   In  particular,  shared  objects
       loaded with RTLD_LOCAL may be promoted to RTLD_GLOBAL if they are
       dependencies of another shared object  loaded  with  RTLD_GLOBAL.
       Thus,  RTLD_LOCAL  is  insufficient  to  isolate  a loaded shared
       object except in the (uncommon) case where one has explicit  con‐
       trol over all shared object dependencies.

Look okay?

>>        Possible  uses  of  dlmopen() are plugins where the author of the
>>        plugin-loading framework can't trust the plugin authors and  does
>>        not  wish  any  undefined symbols from the plugin framework to be
>>        resolved to plugin symbols.  Another use  is  to  load  the  same
>>        object  more than once.  Without the use of dlmopen(), this would
>>        require the creation of distinct  copies  of  the  shared  object
>>        file.   Using dlmopen(), this can be achieved by loading the same
>>        shared object file into different namespaces.
>>
>>        The glibc implementation supports a maximum of 16 namespaces.
>>
>>        ...
>> BUGS
>>        As  at  glibc  2.21, specifying the RTLD_GLOBAL flag when calling
>>        dlmopen()   generates   an   error.    Furthermore,    specifying
>>        RTLD_GLOBAL  when  calling  dlopen()  results  in a program crash
>>        (SIGSEGV) if the call is made from any object loaded in a  names‐
>>        pace other than the initial namespace.
>>
> 
> Looks good to me. We need dlinfo documented also, which you already know,
> but I wanted to make that explicit in this public discussion.

I think you didn't catch up on all your mail yet ;-).

Cheers,

Michael
  
Carlos O'Donell July 24, 2015, 7:56 p.m. UTC | #7
On 07/24/2015 03:43 PM, Michael Kerrisk (man-pages) wrote:
>>>        If handle is NULL, then the only  permitted  value  for  lmid  is
>>>        LM_ID_BASE.
>>
>> What handle is NULL? If "filename" is NULL? 
> 
> Yes, sorry s/handle/filename/
> 
>> This would be a glibc limitation
>> which we plan to fix in 2.23. We should certainly return the handle to the
>> base object loaded in the namespace as you suggest, just like the application
>> is returned upon dlopen (NULL).
> 
> Okay.

To be more clear, that's exactly what my present patches implement.

>>>        ....
>>>
>>> NOTES
>>>    dlmopen() and namespaces
>>>        A  link-map list defines an isolated namespace for the resolution
>>>        of symbols by the dynamic linker.  Within a namespace,  dependent
>>>        shared  objects  are  implicitly  loaded  according  to the usual
>>>        rules, and symbol references are likewise resolved  according  to
>>>        the  usual  rules, but such resolution is confined to the defini‐
>>>        tions provided by the objects  that  have  been  (explicitly  and
>>>        implicitly) loaded into the namespace.
>>>
>>>        The  dlmopen() function permits object-load isolation—the ability
>>>        to load a shared object in a new namespace without  exposing  the
>>>        rest  of the application to the symbols made available by the new
>>>        object.  Note that the use of the RTLD_LOCAL flag is  not  suffi‐
>>>        cient  for this purpose, since it prevents a shared object's sym‐
>>>        bols from being available to any other shared  object.   In  some
>>>        cases,  we may want to make the symbols provided by a dynamically
>>>        loaded shared object available to  (a  subset  of)  other  shared
>>>        objects without exposing those symbols to the entire application.
>>>        This can be achieved  by  using  a  separate  namespace  and  the
>>>        RTLD_GLOBAL flag.
>>
>> Lastly, RTLD_LOCAL loaded objects may be promoted to RTLD_GLOBAL if they
>> are dependencies of another RTLD_GLOBAL loaded object. Therefore RTLD_LOCAL
>> is still not sufficient to isolate the loaded shared object without explicit
>> control over all object dependencies.
> 
> I wrote this up as
> 
>        The  dlmopen() function also can be used to provide better isola‐
>        tion than the RTLD_LOCAL flag.   In  particular,  shared  objects
>        loaded with RTLD_LOCAL may be promoted to RTLD_GLOBAL if they are
>        dependencies of another shared object  loaded  with  RTLD_GLOBAL.
>        Thus,  RTLD_LOCAL  is  insufficient  to  isolate  a loaded shared
>        object except in the (uncommon) case where one has explicit  con‐
>        trol over all shared object dependencies.
> 
> Look okay?

Perfect.

>>>        Possible  uses  of  dlmopen() are plugins where the author of the
>>>        plugin-loading framework can't trust the plugin authors and  does
>>>        not  wish  any  undefined symbols from the plugin framework to be
>>>        resolved to plugin symbols.  Another use  is  to  load  the  same
>>>        object  more than once.  Without the use of dlmopen(), this would
>>>        require the creation of distinct  copies  of  the  shared  object
>>>        file.   Using dlmopen(), this can be achieved by loading the same
>>>        shared object file into different namespaces.
>>>
>>>        The glibc implementation supports a maximum of 16 namespaces.
>>>
>>>        ...
>>> BUGS
>>>        As  at  glibc  2.21, specifying the RTLD_GLOBAL flag when calling
>>>        dlmopen()   generates   an   error.    Furthermore,    specifying
>>>        RTLD_GLOBAL  when  calling  dlopen()  results  in a program crash
>>>        (SIGSEGV) if the call is made from any object loaded in a  names‐
>>>        pace other than the initial namespace.
>>>
>>
>> Looks good to me. We need dlinfo documented also, which you already know,
>> but I wanted to make that explicit in this public discussion.
> 
> I think you didn't catch up on all your mail yet ;-).

When am I ever? :-)

c.
  
Carlos O'Donell Sept. 15, 2015, 6:39 p.m. UTC | #8
On 07/24/2015 02:37 PM, Carlos O'Donell wrote:
> With a few more patches I get *almost* all the way there:

[carlos@athas dlmopen_expt]$ ./main
main(): lmid from dlopen(NULL) is 0 (handle = 0x0x7feda4b49168)
main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
main(): lmid from dlopen("libabc.so") is 1 (handle = 0x0x1cc7440)
main(): invoking abc_start()
    Called abc_start()
    abc_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x1cc7440)
    abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
    abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
    abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
    abc_start(): invoking qrs_start()
        Called qrs_start()
        qrs_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x1cc7440)
        qrs_start(): lookup of "abc" succeeded
        qrs_start(): lookup of "def" failed
        qrs_start(): lookup of "jkl" succeeded
        qrs_start(): lookup of "mno" failed
        qrs_start(): lookup of "main" failed
main(): lookup of "abc" failed
main(): lookup of "def" succeeded
main(): lookup of "jkl" failed
main(): lookup of "mno" failed
[carlos@athas dlmopen_expt]$ echo $?
0

I'm now passing your test.

I wonder if there is anything more complicated we can throw at it?

The design is solidly based on first principles about dynamic
loading, so I don't see anything we can't solve by reworking
the implementation slightly.

I guess the next step is to get all of this into master and start
telling people to use it to solve isolation problems like plugins.
Perhaps write a small blog post about plugin isolation.
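
As a possible starting point for such a write-up, the "load the same object more than once" use from the draft man page can be sketched as follows. This is an illustration only: the helper name loaded_twice() is invented, and the object and symbol are whatever the caller chooses to pass in.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `path` into two separate new namespaces and
   look up `symbol` in each copy.  Returns 1 if both lookups succeed and
   yield different addresses (two independent copies), 0 if the addresses
   match, and -1 on any load or lookup error.  */
int
loaded_twice (const char *path, const char *symbol)
{
  int result = -1;
  void *first = dlmopen (LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
  void *second = dlmopen (LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);

  if (first != NULL && second != NULL)
    {
      void *sym1 = dlsym (first, symbol);
      void *sym2 = dlsym (second, symbol);
      if (sym1 != NULL && sym2 != NULL)
        result = (sym1 != sym2);
      else
        fprintf (stderr, "dlsym: symbol %s not found\n", symbol);
    }
  else
    {
      const char *err = dlerror ();
      fprintf (stderr, "dlmopen: %s\n", err != NULL ? err : "unknown error");
    }

  /* Only the addresses were compared, so both copies can be unloaded.  */
  if (second != NULL)
    dlclose (second);
  if (first != NULL)
    dlclose (first);
  return result;
}
```

Each call consumes two namespaces while it runs, which matters given the limit of 16 namespaces mentioned in the draft.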

Cheers,
Carlos.
  
Carlos O'Donell Sept. 18, 2015, 4:22 a.m. UTC | #9
On 09/15/2015 02:39 PM, Carlos O'Donell wrote:
> On 07/24/2015 02:37 PM, Carlos O'Donell wrote:
>> With a few more patches I get *almost* all the way there:
> 
> [carlos@athas dlmopen_expt]$ ./main
> main(): lmid from dlopen(NULL) is 0 (handle = 0x0x7feda4b49168)
> main(): dlmopen LM_ID_NEWLM ./libabc.so   RTLD_GLOBAL
> main(): lmid from dlopen("libabc.so") is 1 (handle = 0x0x1cc7440)
> main(): invoking abc_start()
>     Called abc_start()
>     abc_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x1cc7440)
>     abc_start(): dlmopen LM_ID_BASE  ./libdef.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libjkl.so   RTLD_GLOBAL
>     abc_start(): dlopen              ./libmno.so   RTLD_LOCAL
>     abc_start(): dlopen              ./libqrs.so   RTLD_LOCAL
>     abc_start(): invoking qrs_start()
>         Called qrs_start()
>         qrs_start(): lmid from dlopen(NULL) is 1 (handle = 0x0x1cc7440)
>         qrs_start(): lookup of "abc" succeeded
>         qrs_start(): lookup of "def" failed
>         qrs_start(): lookup of "jkl" succeeded
>         qrs_start(): lookup of "mno" failed
>         qrs_start(): lookup of "main" failed
> main(): lookup of "abc" failed
> main(): lookup of "def" succeeded
> main(): lookup of "jkl" failed
> main(): lookup of "mno" failed
> [carlos@athas dlmopen_expt]$ echo $?
> 0
> 
> I'm now passing your test.

All patches on carlos/dlmopen branch, tracking master.

Cheers,
Carlos.
  

Patch

diff --git a/dlfcn/dlmopen.c b/dlfcn/dlmopen.c
index 38dca7a..ba468d2 100644
--- a/dlfcn/dlmopen.c
+++ b/dlfcn/dlmopen.c
@@ -61,11 +61,6 @@  dlmopen_doit (void *a)
       if (args->file == NULL)
 # endif
 	GLRO(dl_signal_error) (EINVAL, NULL, NULL, N_("invalid namespace"));
-
-      /* It makes no sense to use RTLD_GLOBAL when loading a DSO into
-	 a namespace other than the base namespace.  */
-      if (__glibc_unlikely (args->mode & RTLD_GLOBAL))
-	GLRO(dl_signal_error) (EINVAL, NULL, NULL, N_("invalid mode"));
     }
 
   args->new = GLRO(dl_open) (args->file ?: "", args->mode | __RTLD_DLOPEN,
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 027c1e0..175ef16 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -72,6 +72,31 @@  add_to_global (struct link_map *new)
     if (new->l_searchlist.r_list[cnt]->l_global == 0)
       ++to_add;
 
+  struct link_namespaces *ns = &GL(dl_ns)[new->l_ns];
+
+  if (__glibc_unlikely (new->l_ns != LM_ID_BASE
+			&& ns->_ns_main_searchlist == NULL))
+    {
+      /* An initial object was loaded with dlmopen into a distinct namespace
+	 that has no global searchlist (RTLD_GLOBAL) and RTLD_GLOBAL was used.
+	 Or that object then dlopened another object into the global
+	 searchlist.  We find ourselves with no global searchlist initialized.
+	 We have two choices, either we forbid this scenario and return an
+	 error or treat the first RTLD_GLOBAL DSOs searchlist as the global
+	 searchlist of the namespace.  We do the latter since it's the most
+	 sensible course of action since you may dlmopen other libraries which
+	 have no idea they have been isolated.  Thus RTLD_GLOBAL dlopen calls
+	 within the new namespace are restricted to the new namespace and may
+	 reference the symbols of the initial RTLD_GLOBAL dlmopen'd
+	 libraries.  */
+      ns->_ns_main_searchlist = &new->l_searchlist;
+      /* Treat this list like it is read-only.  A value of zero forces a copy
+	 later if we need to extend this list.  The list itself is already
+	 being used as the primary scope for the first loaded RTLD_GLOBAL
+	 object into the new namespace, thus we don't want to free it.  */
+      ns->_ns_global_scope_alloc = 0;
+    }
+
   /* The symbols of the new objects and its dependencies are to be
      introduced into the global scope that will be used to resolve
      references from other dynamically-loaded objects.
@@ -86,7 +111,6 @@  add_to_global (struct link_map *new)
      in an realloc() call.  Therefore we allocate a completely new
      array the first time we have to add something to the locale scope.  */
 
-  struct link_namespaces *ns = &GL(dl_ns)[new->l_ns];
   if (ns->_ns_global_scope_alloc == 0)
     {
       /* This is the first dynamic object given global scope.  */