elf: Do not signal LA_ACT_CONSISTENT for an empty namespace [BZ #26076]

Message ID 8736641l8b.fsf@oldenburg2.str.redhat.com
State Committed
Headers
Series elf: Do not signal LA_ACT_CONSISTENT for an empty namespace [BZ #26076] |

Commit Message

Florian Weimer July 6, 2020, 7:45 p.m. UTC
  The auditing interface identifies namespaces by their first loaded
module.  Once the namespace is empty, it is no longer possible to signal
LA_ACT_CONSISTENT for it because the first loaded module is already gone
at that point.

Tested on i686-linux-gnu and x86_64-linux-gnu.

---
 elf/dl-close.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
  

Comments

H.J. Lu July 6, 2020, 8:40 p.m. UTC | #1
On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> The auditing interface identifies namespaces by their first loaded
> module.  Once the namespace is empty, it is no longer possible to signal
> LA_ACT_CONSISTENT for it because the first loaded module is already gone
> at that point.
>
> Tested on i686-linux-gnu and x86_64-linux-gnu.
>
> ---
>  elf/dl-close.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/elf/dl-close.c b/elf/dl-close.c
> index 73b2817bbf..8e146ecee1 100644
> --- a/elf/dl-close.c
> +++ b/elf/dl-close.c
> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>    if (__glibc_unlikely (do_audit))
>      {
>        struct link_map *head = ns->_ns_loaded;
> -      /* Do not call the functions for any auditing object.  */
> -      if (head->l_auditing == 0)

I assume that "head" can be NULL.  Do you have a testcase?

> +      /* If head is NULL, the namespace has become empty, and the
> +        audit interface does not give us a way to signal
> +        LA_ACT_CONSISTENT for it because the first loaded module is
> +        used to identify the namespace.
> +
> +        Furthermore, do not notify auditors of the cleanup of a
> +        failed audit module loading attempt.  */
> +      if (head != NULL && head->l_auditing == 0)
>         {
>           struct audit_ifaces *afct = GLRO(dl_audit);
>           for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>
  
Carlos O'Donell July 6, 2020, 8:41 p.m. UTC | #2
On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
> On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> The auditing interface identifies namespaces by their first loaded
>> module.  Once the namespace is empty, it is no longer possible to signal
>> LA_ACT_CONSISTENT for it because the first loaded module is already gone
>> at that point.
>>
>> Tested on i686-linux-gnu and x86_64-linux-gnu.
>>
>> ---
>>  elf/dl-close.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>> index 73b2817bbf..8e146ecee1 100644
>> --- a/elf/dl-close.c
>> +++ b/elf/dl-close.c
>> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>>    if (__glibc_unlikely (do_audit))
>>      {
>>        struct link_map *head = ns->_ns_loaded;
>> -      /* Do not call the functions for any auditing object.  */
>> -      if (head->l_auditing == 0)
> 
> I assume that "head" can be NULL.  Do you have a testcase?

Yes, it's tst-auditmany in trunk right now. It fails because of this issue.

> 
>> +      /* If head is NULL, the namespace has become empty, and the
>> +        audit interface does not give us a way to signal
>> +        LA_ACT_CONSISTENT for it because the first loaded module is
>> +        used to identify the namespace.
>> +
>> +        Furthermore, do not notify auditors of the cleanup of a
>> +        failed audit module loading attempt.  */
>> +      if (head != NULL && head->l_auditing == 0)
>>         {
>>           struct audit_ifaces *afct = GLRO(dl_audit);
>>           for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>>
> 
>
  
H.J. Lu July 6, 2020, 8:47 p.m. UTC | #3
On Mon, Jul 6, 2020 at 1:41 PM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
> > On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> The auditing interface identifies namespaces by their first loaded
> >> module.  Once the namespace is empty, it is no longer possible to signal
> >> LA_ACT_CONSISTENT for it because the first loaded module is already gone
> >> at that point.
> >>
> >> Tested on i686-linux-gnu and x86_64-linux-gnu.
> >>
> >> ---
> >>  elf/dl-close.c | 10 ++++++++--
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/elf/dl-close.c b/elf/dl-close.c
> >> index 73b2817bbf..8e146ecee1 100644
> >> --- a/elf/dl-close.c
> >> +++ b/elf/dl-close.c
> >> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
> >>    if (__glibc_unlikely (do_audit))
> >>      {
> >>        struct link_map *head = ns->_ns_loaded;
> >> -      /* Do not call the functions for any auditing object.  */
> >> -      if (head->l_auditing == 0)
> >
> > I assume that "head" can be NULL.  Do you have a testcase?
>
> Yes, it's tst-auditmany in trunk right now. It fails because of this issue.

I have seen tst-auditmany failure,  but not always.  The test was added
more than 6 months ago. Why does it start failing now?

> >
> >> +      /* If head is NULL, the namespace has become empty, and the
> >> +        audit interface does not give us a way to signal
> >> +        LA_ACT_CONSISTENT for it because the first loaded module is
> >> +        used to identify the namespace.
> >> +
> >> +        Furthermore, do not notify auditors of the cleanup of a
> >> +        failed audit module loading attempt.  */
> >> +      if (head != NULL && head->l_auditing == 0)
> >>         {
> >>           struct audit_ifaces *afct = GLRO(dl_audit);
> >>           for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
> >>
> >
> >
>
>
> --
> Cheers,
> Carlos.
>
  
Carlos O'Donell July 6, 2020, 8:53 p.m. UTC | #4
On 7/6/20 4:47 PM, H.J. Lu wrote:
> On Mon, Jul 6, 2020 at 1:41 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
>>> On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>>
>>>> The auditing interface identifies namespaces by their first loaded
>>>> module.  Once the namespace is empty, it is no longer possible to signal
>>>> LA_ACT_CONSISTENT for it because the first loaded module is already gone
>>>> at that point.
>>>>
>>>> Tested on i686-linux-gnu and x86_64-linux-gnu.
>>>>
>>>> ---
>>>>  elf/dl-close.c | 10 ++++++++--
>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>>>> index 73b2817bbf..8e146ecee1 100644
>>>> --- a/elf/dl-close.c
>>>> +++ b/elf/dl-close.c
>>>> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>>>>    if (__glibc_unlikely (do_audit))
>>>>      {
>>>>        struct link_map *head = ns->_ns_loaded;
>>>> -      /* Do not call the functions for any auditing object.  */
>>>> -      if (head->l_auditing == 0)
>>>
>>> I assume that "head" can be NULL.  Do you have a testcase?
>>
>> Yes, it's tst-auditmany in trunk right now. It fails because of this issue.
> 
> I have seen tst-auditmany failure,  but not always.  The test was added
> more than 6 months ago. Why does it start failing now?

A combination of, IMO, Szabolcs surplus tls fixes, rseq increasing static
TLS usage, and Florian's recent (months ago) fixes to make unloading reliable.

The 9th audit module can fail to load if it runs out of static tls surplus
to use or if the audit module load fails (not enough namespaces).

I see this failing with Szabolcs recent patches to adjust tls surplus.

I have debugged the failure and it's like this:

(a) We try to load the auditor.
(b) The auditor load fails and we unwind all the loads, leaving an
    empty link namespace.
(c) We try to indicate consistent namespace for an empty namespace
    and crash.

With Florian's recent work to make unloading reliable we now end up with
empty link namespaces for failed to load audit modules. As they should be.

However this above code expects there to be *something* left in the link
namespace and there isn't.

We *could* arrange to call LA_ACT_CONSISTENT with a NULL cookie in this
case for all observing auditors. I'm not opposed to that.
 
>>>
>>>> +      /* If head is NULL, the namespace has become empty, and the
>>>> +        audit interface does not give us a way to signal
>>>> +        LA_ACT_CONSISTENT for it because the first loaded module is
>>>> +        used to identify the namespace.
>>>> +
>>>> +        Furthermore, do not notify auditors of the cleanup of a
>>>> +        failed audit module loading attempt.  */
>>>> +      if (head != NULL && head->l_auditing == 0)
>>>>         {
>>>>           struct audit_ifaces *afct = GLRO(dl_audit);
>>>>           for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>>>>
>>>
>>>
>>
>>
>> --
>> Cheers,
>> Carlos.
>>
> 
>
  
H.J. Lu July 6, 2020, 9:08 p.m. UTC | #5
On Mon, Jul 6, 2020 at 1:53 PM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 7/6/20 4:47 PM, H.J. Lu wrote:
> > On Mon, Jul 6, 2020 at 1:41 PM Carlos O'Donell <carlos@redhat.com> wrote:
> >>
> >> On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
> >>> On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
> >>> <libc-alpha@sourceware.org> wrote:
> >>>>
> >>>> The auditing interface identifies namespaces by their first loaded
> >>>> module.  Once the namespace is empty, it is no longer possible to signal
> >>>> LA_ACT_CONSISTENT for it because the first loaded module is already gone
> >>>> at that point.
> >>>>
> >>>> Tested on i686-linux-gnu and x86_64-linux-gnu.
> >>>>
> >>>> ---
> >>>>  elf/dl-close.c | 10 ++++++++--
> >>>>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
> >>>> index 73b2817bbf..8e146ecee1 100644
> >>>> --- a/elf/dl-close.c
> >>>> +++ b/elf/dl-close.c
> >>>> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
> >>>>    if (__glibc_unlikely (do_audit))
> >>>>      {
> >>>>        struct link_map *head = ns->_ns_loaded;
> >>>> -      /* Do not call the functions for any auditing object.  */
> >>>> -      if (head->l_auditing == 0)
> >>>
> >>> I assume that "head" can be NULL.  Do you have a testcase?
> >>
> >> Yes, it's tst-auditmany in trunk right now. It fails because of this issue.
> >
> > I have seen tst-auditmany failure,  but not always.  The test was added
> > more than 6 months ago. Why does it start failing now?
>
> A combination of, IMO, Szabolcs surplus tls fixes, rseq increasing static
> TLS usage, and Florian's recent (months ago) fixes to make unloading reliable.
>
> The 9th audit module can fail to load if it runs out of static tls surplus
> to use or if the audit module load fails (not enough namespaces).
>
> I see this failing with Szabolcs recent patches to adjust tls surplus.
>
> I have debugged the failure and it's like this:
>
> (a) We try to load the auditor.
> (b) The auditor load fails and we unwind all the loads, leaving an
>     empty link namespace.
> (c) We try to indicate consistent namespace for an empty namespace
>     and crash.
>
> With Florian's recent work to make unloading reliable we now end up with
> empty link namespaces for failed to load audit modules. As they should be.
>
> However this above code expects there to be *something* left in the link
> namespace and there isn't.
>
> We *could* arrange to call LA_ACT_CONSISTENT with a NULL cookie in this
> case for all observing auditors. I'm not opposed to that.

There are

  bool do_audit = GLRO(dl_naudit) > 0 && !ns->_ns_loaded->l_auditing;
...

  if (__glibc_unlikely (do_audit))
    {
      struct link_map *head = ns->_ns_loaded;
      struct audit_ifaces *afct = GLRO(dl_audit);
      /* Do not call the functions for any auditing object.  */
      if (head->l_auditing == 0)

It sounds like ns->_ns_loaded is changed between setting and using of
do_audit.  Shouldn't do_audit be set to false when ns->_ns_loaded becomes
NULL?

> >>>
> >>>> +      /* If head is NULL, the namespace has become empty, and the
> >>>> +        audit interface does not give us a way to signal
> >>>> +        LA_ACT_CONSISTENT for it because the first loaded module is
> >>>> +        used to identify the namespace.
> >>>> +
> >>>> +        Furthermore, do not notify auditors of the cleanup of a
> >>>> +        failed audit module loading attempt.  */
> >>>> +      if (head != NULL && head->l_auditing == 0)
> >>>>         {
> >>>>           struct audit_ifaces *afct = GLRO(dl_audit);
> >>>>           for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Cheers,
> >> Carlos.
> >>
> >
> >
>
>
> --
> Cheers,
> Carlos.
>
  
Carlos O'Donell July 7, 2020, 12:19 a.m. UTC | #6
On 7/6/20 5:08 PM, H.J. Lu wrote:
> On Mon, Jul 6, 2020 at 1:53 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> On 7/6/20 4:47 PM, H.J. Lu wrote:
>>> On Mon, Jul 6, 2020 at 1:41 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>>>
>>>> On 7/6/20 4:40 PM, H.J. Lu via Libc-alpha wrote:
>>>>> On Mon, Jul 6, 2020 at 12:45 PM Florian Weimer via Libc-alpha
>>>>> <libc-alpha@sourceware.org> wrote:
>>>>>>
>>>>>> The auditing interface identifies namespaces by their first loaded
>>>>>> module.  Once the namespace is empty, it is no longer possible to signal
>>>>>> LA_ACT_CONSISTENT for it because the first loaded module is already gone
>>>>>> at that point.
>>>>>>
>>>>>> Tested on i686-linux-gnu and x86_64-linux-gnu.
>>>>>>
>>>>>> ---
>>>>>>  elf/dl-close.c | 10 ++++++++--
>>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/elf/dl-close.c b/elf/dl-close.c
>>>>>> index 73b2817bbf..8e146ecee1 100644
>>>>>> --- a/elf/dl-close.c
>>>>>> +++ b/elf/dl-close.c
>>>>>> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>>>>>>    if (__glibc_unlikely (do_audit))
>>>>>>      {
>>>>>>        struct link_map *head = ns->_ns_loaded;
>>>>>> -      /* Do not call the functions for any auditing object.  */
>>>>>> -      if (head->l_auditing == 0)
>>>>>
>>>>> I assume that "head" can be NULL.  Do you have a testcase?
>>>>
>>>> Yes, it's tst-auditmany in trunk right now. It fails because of this issue.
>>>
>>> I have seen tst-auditmany failure,  but not always.  The test was added
>>> more than 6 months ago. Why does it start failing now?
>>
>> A combination of, IMO, Szabolcs surplus tls fixes, rseq increasing static
>> TLS usage, and Florian's recent (months ago) fixes to make unloading reliable.
>>
>> The 9th audit module can fail to load if it runs out of static tls surplus
>> to use or if the audit module load fails (not enough namespaces).
>>
>> I see this failing with Szabolcs recent patches to adjust tls surplus.
>>
>> I have debugged the failure and it's like this:
>>
>> (a) We try to load the auditor.
>> (b) The auditor load fails and we unwind all the loads, leaving an
>>     empty link namespace.
>> (c) We try to indicate consistent namespace for an empty namespace
>>     and crash.
>>
>> With Florian's recent work to make unloading reliable we now end up with
>> empty link namespaces for failed to load audit modules. As they should be.
>>
>> However this above code expects there to be *something* left in the link
>> namespace and there isn't.
>>
>> We *could* arrange to call LA_ACT_CONSISTENT with a NULL cookie in this
>> case for all observing auditors. I'm not opposed to that.
> 
> There are
> 
>   bool do_audit = GLRO(dl_naudit) > 0 && !ns->_ns_loaded->l_auditing;
> ...
> 
>   if (__glibc_unlikely (do_audit))
>     {
>       struct link_map *head = ns->_ns_loaded;
>       struct audit_ifaces *afct = GLRO(dl_audit);
>       /* Do not call the functions for any auditing object.  */
>       if (head->l_auditing == 0)
> 
> It sounds like ns->_ns_loaded is changed between setting and using of
> do_audit.  Shouldn't do_audit be set to false when ns->_ns_loaded becomes
> NULL?

I don't think so. In fact we should still be auditing IMO. I think that
do_audit should remain true, but *what* we use as the cookie should have
been a more stable choice, *or* the location of the LA_ACT_CONSISTENT
is too late.

When the first loaded object fails to be loaded, as is the case here,
we end up with an empty link namespace and thus no link map to be able
to use as the cookie. It's not clear why we are using ns->_ns_loaded
rather than something else as the cookie.

A more stable cookie would be to rewrite the code to use the address
of a cookie that is attached to the link namespace and removed from there
*after* we issue the final LA_ACT_CONSISTENT that uses the cookie to track
the objects removal.

Such a change is more complex than we need today. For now I think it is
sufficient to say that la_activity LA_ACT_CONSISTENT will not be called if
the last object in the link namespace is removed.

Alternatively we can avoid deleting the link map until *after* the la_activity
audit point. That requires refactoring.

Note that in Solaris it too up to LAV_VERSION5 before they got all the
semantics to this sorted out :-)
  
Carlos O'Donell July 7, 2020, 12:21 a.m. UTC | #7
On 7/6/20 3:45 PM, Florian Weimer wrote:
> The auditing interface identifies namespaces by their first loaded
> module.  Once the namespace is empty, it is no longer possible to signal
> LA_ACT_CONSISTENT for it because the first loaded module is already gone
> at that point.

I debugged this thoroughly given that I have access to a test case now.

The following items work together to trigger this:
(a) Szabolc's patch does not limit the number of link namespaces, only
    the way the memory is used and reserved.
(b) Mathieu's rseq patch uses more static tls.
(c) Florian's good work to consistently remove failed loaded objects.

My original complaint had been along the lines that do_audit *should*
have correctly reflected that we don't want to audit the removal of
this object, but I withdraw that complaint (see my notes to HJ in this
thread).

In practice I see the following:
(1) GLRO(dl_naudit) > 0, because we have *other* auditors loaded.
    - In this case it's 8.
(2) ns->_ns_loaded->l_auditing is false because the loading of the
    auditor itself never got far enough to enable the auditor.
    - In this case l_auditing == 0.

Therefore the other auditors are present to observe the unloading
of the failed auditor.

The problem is that we have no link maps left in the namespace
after the unloading and so we can't meet the contract of the
la_activity() call except to call it with a NULL cookie and
use LA_ACT_CONSISTENT, and we can discuss that later after the
release. The problem is that using the link map as a cookie is
probably a poor choice, and it should have been the link namespace
itself created by the dlmopen call since that is stable even if
the namespace is empty. Again this is a cookie so you're not
supposed to look at it, but some of our tests do (tests-special)
and that's OK.

OK for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> Tested on i686-linux-gnu and x86_64-linux-gnu.
> 
> ---
>  elf/dl-close.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/elf/dl-close.c b/elf/dl-close.c
> index 73b2817bbf..8e146ecee1 100644
> --- a/elf/dl-close.c
> +++ b/elf/dl-close.c
> @@ -781,8 +781,14 @@ _dl_close_worker (struct link_map *map, bool force)
>    if (__glibc_unlikely (do_audit))
>      {
>        struct link_map *head = ns->_ns_loaded;
> -      /* Do not call the functions for any auditing object.  */
> -      if (head->l_auditing == 0)
> +      /* If head is NULL, the namespace has become empty, and the
> +	 audit interface does not give us a way to signal
> +	 LA_ACT_CONSISTENT for it because the first loaded module is
> +	 used to identify the namespace.
> +
> +	 Furthermore, do not notify auditors of the cleanup of a
> +	 failed audit module loading attempt.  */
> +      if (head != NULL && head->l_auditing == 0)

OK.

With this patch I get:

ERROR: ld.so: object 'tst-auditmanymod9.so' cannot be loaded as audit interface: cannot allocate memory in static TLS block; ignored.
...
elf/tst-auditmany: error while loading shared libraries: /home/carlos/build/glibc-review/libc.so.6: cannot allocate memory in static TLS block
[Inferior 1 (process 2312863) exited with code 0177]

Which looks correct to me given the static tls changes reservation changes.

We get all the way to tst-auditmanymod9.so, start to load it, and run out of static TLS space, and fail.

>  	{
>  	  struct audit_ifaces *afct = GLRO(dl_audit);
>  	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)
>
  

Patch

diff --git a/elf/dl-close.c b/elf/dl-close.c
index 73b2817bbf..8e146ecee1 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -781,8 +781,14 @@  _dl_close_worker (struct link_map *map, bool force)
   if (__glibc_unlikely (do_audit))
     {
       struct link_map *head = ns->_ns_loaded;
-      /* Do not call the functions for any auditing object.  */
-      if (head->l_auditing == 0)
+      /* If head is NULL, the namespace has become empty, and the
+	 audit interface does not give us a way to signal
+	 LA_ACT_CONSISTENT for it because the first loaded module is
+	 used to identify the namespace.
+
+	 Furthermore, do not notify auditors of the cleanup of a
+	 failed audit module loading attempt.  */
+      if (head != NULL && head->l_auditing == 0)
 	{
 	  struct audit_ifaces *afct = GLRO(dl_audit);
 	  for (unsigned int cnt = 0; cnt < GLRO(dl_naudit); ++cnt)