[v5,00/22] Some rtld-audit fixes

Message ID	0D3F0C5F-2586-42F9-916D-2F327432AF13@rice.edu
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D8D973858402 sender: johnmc@rice.edu) by mx2.mail.rice.edu (Postfix) with ESMTPSA id 51C6D209C24; Wed, 17 Nov 2021 12:08:18 -0600 (CST) Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Fwd: [PATCH v5 00/22] Some rtld-audit fixes Date: Wed, 17 Nov 2021 12:08:17 -0600 References: <EA69A62D-7C01-4536-B551-2609226053F2@rice.edu> To: John Mellor-Crummey via Libc-alpha <libc-alpha@sourceware.org> Message-Id: <0D3F0C5F-2586-42F9-916D-2F327432AF13@rice.edu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Precedence: list From: John Mellor-Crummey via Libc-alpha <libc-alpha@sourceware.org> Reply-To: John Mellor-Crummey <johnmc@rice.edu> Cc: Xiaozhu Meng <xm13@rice.edu>, John Mellor-Crummey <johnmc@rice.edu>, "Mark W. Krentel" <krentel@rice.edu> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Message ID

0D3F0C5F-2586-42F9-916D-2F327432AF13@rice.edu

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D8D973858402
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\))
Subject: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
Date: Wed, 17 Nov 2021 12:08:17 -0600
References: <EA69A62D-7C01-4536-B551-2609226053F2@rice.edu>
To: John Mellor-Crummey via Libc-alpha <libc-alpha@sourceware.org>
Message-Id: <0D3F0C5F-2586-42F9-916D-2F327432AF13@rice.edu>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Precedence: list
From: John Mellor-Crummey via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: John Mellor-Crummey <johnmc@rice.edu>
Cc: Xiaozhu Meng <xm13@rice.edu>, John Mellor-Crummey <johnmc@rice.edu>,
 "Mark W. Krentel" <krentel@rice.edu>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Message

John Mellor-Crummey Nov. 17, 2021, 6:08 p.m. UTC

  We are grateful to the members of the libc-alpha team who have been working on authoring (Adhemerval Zanella) and reviewing (Florian Weimer) fixes for the auditor problems that we previously reported to this list.

Jonathon Anderson, the auditor expert on my team, has reviewed the patches and offers some feedback about them.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

> Begin forwarded message:
> 
> From: Jonathon Anderson <janderson@rice.edu>
> Subject: Re: [PATCH v5 00/22] Some rtld-audit fixes
> Date: November 16, 2021 at 5:38:28 PM CST
> 
> 
> === Reply to [00/20]: ===
> 
> Many thanks to Adhemerval Zanella for authoring and maintaining these fixes and to Florian Weimer for reviewing them! We are very happy to see progress towards getting our bugs resolved.
> 
> I have reviewed the tests included with these patches, presuming all these tests pass all of our reported 'Tier 1' issues are resolved by this patchset. I have also tested this patch series in an x86-64 Fedora rawhide container, all of our 'Tier 1' issue reproducers pass (except for the ARM-specific reproducer which I skipped). Not including the ones Adhemerval raised questions about, our 'Tier 2' issues that are not yet resolved in upstream + these patches are as follows (see [1] for full details):
> 
>  - "Various Glibc functions cannot be called from an auditor": Several Glibc functions segfault when called from an auditor, many dependent on the initialization of the Glibc in the application namespace (LM_ID_BASE). Patch [13/20] fixes and tests the isspace case. gethostid does not have a test but our reproducer passes. dlopen/dladdr are also known to have this issue, fixes for these are not included in this patchset.
> 
>  - "La_activity calls are missing or mis-ordered with respect to la_obj* calls": The calls to la_activity seen in practice often differ from our presumed interpretation of the rtld-audit man page. Patch [17/20] fixes case 2 (la_activity is not called during application exit, even though la_objclose is). A fix for case 4 (la_activity(LA_ACT_DELETE) is only called after la_objclose, and immediately followed by la_activity(LA_ACT_CONSISTENT)) is not included in this patchset.
> 
> Overall, we are very pleased with these patches, and would love to see them included in official releases of Glibc and RHEL in the near future.
> 
> In addition, since the time of the initial development of these patches we have uncovered a pair of new 'Tier 2' issues (see [1] for full details):
> 
>  - "Related recursive dl*open calls can crash or cause inconsistent state": Dlopen and dlmopen when called recursively (under very specific conditions) will crash uncatchably or return prior to the init constructors of the requested binaries (the difference is presumed to be based on whether loader assertions were enabled during the build).
> 
>  - "Auditor namespaces (undocumentedly) differ from normal namespaces": The separate namespace used to load an auditor behaves very differently from a "normal" namespace created by the application, however there is no indication in the auditor callbacks that a namespace falls into this "special" case. Since LD_AUDIT can load auditors arbitrarily *all* auditors must be robust against the idiosyncrasies of auditor namespaces, complicating already delicate auditor code.
> 
> Of course, we would also like our 'Tier 2' issues to also be resolved soon. Below are our responses to the questions Adhemerval raised concerning a few of our issues:
> 
> > There is also some point brough by John Melloc-Crummey documents that
> > I don't have a straighforward answer so I haven't added on this
> > patchset:
> >
> >   1 la_activity(LA_ACT_ADD) is never called for auditor namespaces,
> >      even though la_objopen and la_activity(LA_ACT_CONSISTENT) are.
> >
> >   There is no easy solution for this: we need at least to load the
> >   *first* auditor to actually issue the la_activity(LA_ACT_ADD).  It
> >   means that it would *only* work for subsequent audit modules, and
> >   adding this specific semantic is confusing and does not really
> >   improve things (it only helps when multiple audit modules are used).
> 
> Our reproducer for this issue passes even though there is no explicit test, so this question may be resolved in part already. To provide some of our motivation for this issue:
> 
> The intended use case is in fact multiple audit modules. As detailed in the introduction to our issue document [1], the available interfaces aside from LD_AUDIT for intercepting functions are insufficient for libraries intended to accelerate applications and for performance tools. We predict a near-future scenario where large-scale applications are accelerated by auditors, in order to provide suitable performance measurements in this environment we need first-class support for multiple auditors. We see this already taking place with Spindle [2], an auditor which accelerates dynamic library loading using cross-node caching and communication. If Spindle is helping or hurting the performance of an application, we want to be able to see that in our performance measurements. Degraded callbacks from Glibc for auditors increases the burden and complexity on our own code to work around/defend against such idiosyncrasies. If there are multiple auditors other than HPCToolkit, e.g. spindle and any other auditor, each of the auditors may need to address these issues.
> 
> >   2. la_objopen is called for the main binary and for ld.so before the
> >      first la_activity(LA_ACT_ADD) call.  This contradicts the pattern
> >      found in a successful dlopen (where la_activity(LA_ACT_ADD)
> >      precedes la_objopen).
> >
> >   The constrain here is we need to handle DT_AUDIT and DT_DEPAUDIT
> >   dynamic tags, which means we need to first load the executable in
> >   memory to parse the required audit modules.  So we need to first parse
> >   the dynamic audit tags, load the audit modules, and then load the
> >   object itself.
> 
> We are wholly sympathetic to the difficulty of adjusting complex and delicate code!
> 
> From an outside perspective this constraint does not seem to be an issue, consider that the executable's la_objopen must occur *after* the DT_*AUDIT auditors have been loaded so that it can be reported to the newly-loaded auditors, this also matches what happens in miniature experiments. I would think then that there is (hopefully one) code path where la_activity(ADD) could be called just before the la_objopen for the executable or just after DT_*AUDIT auditors have been loaded.
> 
> >   3. For non-PIE executables the base address listed in link_map->l_addr
> >      for the main application binary is 0, even though dladdr is able to
> >      recover the correct offset. La_objopen is affected by this.
> >
> >   This would require to change an internal semantic for link_map->l_addr.
> >   This is not straighfoward and I am not sure about the direct gains.
> 
> Again, we are wholly sympathetic to the difficulty of refactoring complex code!
> 
> The motivation for providing a consistent link_map->l_addr value is to unify the handling for the main executable with any other binary and to allow access to the ELF header of the main executable (which provides fields not available anywhere else: type, ABI, entry point...). An alternative would be to re-open the file from its path (link_map->l_name), however this is a serious performance concern for large-scale executions (metadata servers are known to be a bottleneck of parallel filesystems). dladdr is not always an option for auditors, as noted by another of our 'Tier 2' issues. Right now, we only require the program headers which we can obtain from getauxval(AT_PHDR), however this technique has questionable portability and robustness (getauxval returns an unsigned long, not a pointer).
> 
> From an outside perspective the current l_addr semantic is fairly undocumented, the dladdr and dlinfo man pages define it vaguely as the "difference between the address in the ELF file and the address in memory." That sounds (to me at least) like l_addr should point to byte 0 in the file (the ELF header), and that seems to be correct in all but the non-PIE case. It makes little sense to expose a value without a clear (documented!) interpretation, and knowing this deviation makes it unclear to me as a user how I should be using this value (and makes me wonder how many other users have made the same mistake).
> 
> dladdr gets its value from link_map->l_map_start instead of l_addr, so the semantic we want is already present in a private field. It seems to me these two fields could be swapped with little issue, if altering the public semantic is not acceptable we could also be sated if l_map_start was made public.
> 
> [1] https://docs.google.com/document/d/1dVaDBdzySecxQqD6hLLzDrEF18M1UtjDna9gL5BWWI0/edit# <https://docs.google.com/document/d/1dVaDBdzySecxQqD6hLLzDrEF18M1UtjDna9gL5BWWI0/edit#>
> [2] https://github.com/hpc/Spindle <https://github.com/hpc/Spindle>
> 
> === Reply to [17/20]: ===
> 
> > la_activity() is not called during application exit, even though
> > la_objclose is.
> >
> > Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
> 
> Many thanks for the patch!
> 
> As an outsider's review, the included test does not appear to test the new functionality noted in the commit message and fixed in the patch. It merely tests that la_objopen/close pairs are properly generated for all binaries, not that an la_activity(DELETE) call is made at program termination.
> 
> Since dlclose is not called in the test body, I believe all that should be needed is an additional check that la_activity(DELETE) is called exactly once.
> 
> -Jonathon