[2/2] nss: handle stat failure in check_reload_and_get (BZ #28752)
Checks
Commit Message
Skip the chroot test if the database isn't loaded
correctly (because the chroot test uses some
existing DB state).
The __stat64_time64 -> fstatat call can fail if
running under an (aggressive) seccomp filter,
like Firefox seems to use.
This manifested in a crash when using glib built
with FAM support with such a Firefox build.
Suggested-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
---
nss/nss_database.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
Comments
> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote:
>
> Skip the chroot test if the database isn't loaded
> correctly (because the chroot test uses some
> existing DB state).
It looks like there's a test failure with the trybot. I'll look into it.
On 25/05/2022 21:20, Sam James via Libc-alpha wrote:
> Skip the chroot test if the database isn't loaded
> correctly (because the chroot test uses some
> existing DB state).
>
> The __stat64_time64 -> fstatat call can fail if
> running under an (aggressive) seccomp filter,
> like Firefox seems to use.
>
> This manifested in a crash when using glib built
> with FAM support with such a Firefox build.
>
> Suggested-by: DJ Delorie <dj@redhat.com>
> Signed-off-by: Sam James <sam@gentoo.org>
> ---
> nss/nss_database.c | 33 ++++++++++++++++++---------------
> 1 file changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/nss/nss_database.c b/nss/nss_database.c
> index d56c5b798d..6c9b440a98 100644
> --- a/nss/nss_database.c
> +++ b/nss/nss_database.c
> @@ -420,21 +420,24 @@ nss_database_check_reload_and_get (struct nss_database_state *local,
> return true;
> }
>
> - /* Before we reload, verify that "/" hasn't changed. We assume that
> - errors here are very unlikely, but the chance that we're entering
> - a container is also very unlikely, so we err on the side of both
> - very unlikely things not happening at the same time. */
> - if (__stat64_time64 ("/", &str) != 0
> - || (local->root_ino != 0
> - && (str.st_ino != local->root_ino
> - || str.st_dev != local->root_dev)))
> - {
> - /* Change detected; disable reloading and return current state. */
> - atomic_store_release (&local->data.reload_disabled, 1);
> - *result = local->data.services[database_index];
> - __libc_lock_unlock (local->lock);
> - return true;
> - }
> + if (local->data.services[database_index] != NULL) {
> + /* Before we reload, verify that "/" hasn't changed. We assume that
> + errors here are very unlikely, but the chance that we're entering
> + a container is also very unlikely, so we err on the side of both
> + very unlikely things not happening at the same time. */
> + if (__stat64_time64 ("/", &str) != 0
> + || (local->root_ino != 0
> + && (str.st_ino != local->root_ino
> + || str.st_dev != local->root_dev)))
> + {
> + /* Change detected; disable reloading and return current state. */
> + atomic_store_release (&local->data.reload_disabled, 1);
> + *result = local->data.services[database_index];
> + __libc_lock_unlock (local->lock);
> + return true;
> + }
> + }
> +
> local->root_ino = str.st_ino;
> local->root_dev = str.st_dev;
> __libc_lock_unlock (local->lock);
Besides the buildbot issue [1] you already noted, please use the expected GNU
indentation. We have now a clang-format script (.clang-format) to help the
format.
[1] https://www.delorie.com/trybots/32bit/9696/nss-tst-reload1.out
Sam James <sam@gentoo.org> writes:
>> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote:
>>
>> Skip the chroot test if the database isn't loaded
>> correctly (because the chroot test uses some
>> existing DB state).
>
> It looks like there's a test failure with the trybot. I'll look into it.
I think we need to move the local->* settings inside the new block as
otherwise the str.* we're referencing are undefined.
diff --git a/nss/nss_database.c b/nss/nss_database.c
index d56c5b798d..72883d9b42 100644
--- a/nss/nss_database.c
+++ b/nss/nss_database.c
@@ -420,23 +420,26 @@ nss_database_check_reload_and_get (struct nss_database_state *local,
return true;
}
- /* Before we reload, verify that "/" hasn't changed. We assume that
- errors here are very unlikely, but the chance that we're entering
- a container is also very unlikely, so we err on the side of both
- very unlikely things not happening at the same time. */
- if (__stat64_time64 ("/", &str) != 0
- || (local->root_ino != 0
- && (str.st_ino != local->root_ino
- || str.st_dev != local->root_dev)))
- {
- /* Change detected; disable reloading and return current state. */
- atomic_store_release (&local->data.reload_disabled, 1);
- *result = local->data.services[database_index];
- __libc_lock_unlock (local->lock);
- return true;
- }
- local->root_ino = str.st_ino;
- local->root_dev = str.st_dev;
+ if (local->data.services[database_index] != NULL) {
+ /* Before we reload, verify that "/" hasn't changed. We assume that
+ errors here are very unlikely, but the chance that we're entering
+ a container is also very unlikely, so we err on the side of both
+ very unlikely things not happening at the same time. */
+ if (__stat64_time64 ("/", &str) != 0
+ || (local->root_ino != 0
+ && (str.st_ino != local->root_ino
+ || str.st_dev != local->root_dev)))
+ {
+ /* Change detected; disable reloading and return current state. */
+ atomic_store_release (&local->data.reload_disabled, 1);
+ *result = local->data.services[database_index];
+ __libc_lock_unlock (local->lock);
+ return true;
+ }
+ local->root_ino = str.st_ino;
+ local->root_dev = str.st_dev;
+ }
+
__libc_lock_unlock (local->lock);
/* Avoid overwriting the global configuration until we have loaded
> On 4 Jun 2022, at 04:26, DJ Delorie via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> Sam James <sam@gentoo.org> writes:
>>> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote:
>>>
>>> Skip the chroot test if the database isn't loaded
>>> correctly (because the chroot test uses some
>>> existing DB state).
>>
>> It looks like there's a test failure with the trybot. I'll look into it.
>
> I think we need to move the local->* settings inside the new block as
> otherwise the str.* we're referencing are undefined.
Duh. Thanks.
> On 4 Jun 2022, at 04:26, DJ Delorie <dj@redhat.com> wrote:
>
> Sam James <sam@gentoo.org> writes:
>>> On 26 May 2022, at 01:20, Sam James <sam@gentoo.org> wrote:
>>>
>>> Skip the chroot test if the database isn't loaded
>>> correctly (because the chroot test uses some
>>> existing DB state).
>>
>> It looks like there's a test failure with the trybot. I'll look into it.
>
> I think we need to move the local->* settings inside the new block as
> otherwise the str.* we're referencing are undefined.
>
Huh, it still fails, but makes a bit more sense. It's upset because
now we apparently we return a result despite not being started
up:
```
FAIL: nss/tst-reload2
original exit status 1
error: tst-reload2.c:132: not true: pw->pw_uid != 2468
tst-reload2.c:138: numeric comparison failure
left: 5 (0x5); from: pw->pw_uid
right: 1234 (0x4d2); from: 1234
error: tst-reload2.c:140: not true: gr != NULL
error: tst-reload2.c:148: not true: he != NULL
error: 4 test failures
```
If local->data.services[database_index] is null, we go ahead
and reload anyway even if we're in a chroot, because we assume
that it's unlikely to have both cases. But it's null .. even if we
are just entering a chroot, but then we skip the test?
Are we back to always wanting to do the chroot test, because
otherwise we're going to accidentally end up loading from the
chroot again?
i.e. if it's null, we never change local->root_{into,dev}, so
they continue to point into the chroot.
But I can't always do a statx call because then we're back
at square 1.
And now I've talked myself into a corner and I'm confused :)
Sam James <sam@gentoo.org> writes:
> Huh, it still fails, but makes a bit more sense. It's upset because
> now we apparently we return a result despite not being started
> up:
So back to basics...
This routine is called when we have a real getXbyY() call and need
a config.
Our code happens after we've checked for nsswitch.conf changing.
action: FALSE = error returned to user
action: TRUE = old configuration re-used
action: CONT = continue to load new configuration
services[db] inode.changed action
NULL NO CONT - we need some config
NULL YES CONT - we need some config
non-null NO CONT - we need to reload
non-null YES TRUE - use old config
So I think the patch is correct. I'm seeing tst-reload2 fail also
(sigh, only tested tst-reload1)
It looks like we need to initialize local->root_* even if [services] is
NULL. I.e. we need to run that stat always, despite deferring its
check.
This is a little messier but I think the logic is all in the right
order. We might need to move the local-> if{} to before the NULL check,
although we must load the config at least once so it should be OK.
diff --git a/nss/nss_database.c b/nss/nss_database.c
index d56c5b798d..f2ed2f2c25 100644
--- a/nss/nss_database.c
+++ b/nss/nss_database.c
@@ -420,23 +420,32 @@ nss_database_check_reload_and_get (struct nss_database_state *local,
return true;
}
- /* Before we reload, verify that "/" hasn't changed. We assume that
- errors here are very unlikely, but the chance that we're entering
- a container is also very unlikely, so we err on the side of both
- very unlikely things not happening at the same time. */
- if (__stat64_time64 ("/", &str) != 0
- || (local->root_ino != 0
- && (str.st_ino != local->root_ino
- || str.st_dev != local->root_dev)))
+ int stat_rv = __stat64_time64 ("/", &str);
+
+ if (local->data.services[database_index] != NULL)
{
- /* Change detected; disable reloading and return current state. */
- atomic_store_release (&local->data.reload_disabled, 1);
- *result = local->data.services[database_index];
- __libc_lock_unlock (local->lock);
- return true;
+ /* Before we reload, verify that "/" hasn't changed. We assume that
+ errors here are very unlikely, but the chance that we're entering
+ a container is also very unlikely, so we err on the side of both
+ very unlikely things not happening at the same time. */
+ if (stat_rv != 0
+ || (local->root_ino != 0
+ && (str.st_ino != local->root_ino
+ || str.st_dev != local->root_dev)))
+ {
+ /* Change detected; disable reloading and return current state. */
+ atomic_store_release (&local->data.reload_disabled, 1);
+ *result = local->data.services[database_index];
+ __libc_lock_unlock (local->lock);
+ return true;
+ }
+ }
+ if (stat_rv == 0)
+ {
+ local->root_ino = str.st_ino;
+ local->root_dev = str.st_dev;
}
- local->root_ino = str.st_ino;
- local->root_dev = str.st_dev;
+
__libc_lock_unlock (local->lock);
/* Avoid overwriting the global configuration until we have loaded
> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote:
>
> Sam James <sam@gentoo.org> writes:
>> Huh, it still fails, but makes a bit more sense. It's upset because
>> now we apparently we return a result despite not being started
>> up:
>
> So back to basics...
>
> This routine is called when we have a real getXbyY() call and need
> a config.
>
> Our code happens after we've checked for nsswitch.conf changing.
>
> action: FALSE = error returned to user
> action: TRUE = old configuration re-used
> action: CONT = continue to load new configuration
>
> services[db] inode.changed action
>
> NULL NO CONT - we need some config
> NULL YES CONT - we need some config
> non-null NO CONT - we need to reload
> non-null YES TRUE - use old config
>
> So I think the patch is correct. I'm seeing tst-reload2 fail also
> (sigh, only tested tst-reload1)
>
> It looks like we need to initialize local->root_* even if [services] is
> NULL. I.e. we need to run that stat always, despite deferring its
> check.
>
> This is a little messier but I think the logic is all in the right
> order. We might need to move the local-> if{} to before the NULL check,
> although we must load the config at least once so it should be OK.
>
I'm glad I nearly got there -- I was worried about making the stat
call at all, but the issue is referencing the result, of course, if
it's not safe to do so.
Let me give this a whack!
> diff --git a/nss/nss_database.c b/nss/nss_database.c
> index d56c5b798d..f2ed2f2c25 100644
> --- a/nss/nss_database.c
> +++ b/nss/nss_database.c
> @@ -420,23 +420,32 @@ nss_database_check_reload_and_get (struct nss_database_state *local,
> return true;
> }
>
> - /* Before we reload, verify that "/" hasn't changed. We assume that
> - errors here are very unlikely, but the chance that we're entering
> - a container is also very unlikely, so we err on the side of both
> - very unlikely things not happening at the same time. */
> - if (__stat64_time64 ("/", &str) != 0
> - || (local->root_ino != 0
> - && (str.st_ino != local->root_ino
> - || str.st_dev != local->root_dev)))
> + int stat_rv = __stat64_time64 ("/", &str);
> +
> + if (local->data.services[database_index] != NULL)
> {
> - /* Change detected; disable reloading and return current state. */
> - atomic_store_release (&local->data.reload_disabled, 1);
> - *result = local->data.services[database_index];
> - __libc_lock_unlock (local->lock);
> - return true;
> + /* Before we reload, verify that "/" hasn't changed. We assume that
> + errors here are very unlikely, but the chance that we're entering
> + a container is also very unlikely, so we err on the side of both
> + very unlikely things not happening at the same time. */
> + if (stat_rv != 0
> + || (local->root_ino != 0
> + && (str.st_ino != local->root_ino
> + || str.st_dev != local->root_dev)))
> + {
> + /* Change detected; disable reloading and return current state. */
> + atomic_store_release (&local->data.reload_disabled, 1);
> + *result = local->data.services[database_index];
> + __libc_lock_unlock (local->lock);
> + return true;
> + }
> + }
> + if (stat_rv == 0)
> + {
> + local->root_ino = str.st_ino;
> + local->root_dev = str.st_dev;
> }
> - local->root_ino = str.st_ino;
> - local->root_dev = str.st_dev;
> +
> __libc_lock_unlock (local->lock);
>
> /* Avoid overwriting the global configuration until we have loaded
>
>
Best,
sam
> On 4 Jun 2022, at 22:29, Sam James via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
>
>
>> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote:
>>
>> Sam James <sam@gentoo.org> writes:
>>> Huh, it still fails, but makes a bit more sense. It's upset because
>>> now we apparently we return a result despite not being started
>>> up:
>>
>> So back to basics...
>>
>> This routine is called when we have a real getXbyY() call and need
>> a config.
>>
>> Our code happens after we've checked for nsswitch.conf changing.
>>
>> action: FALSE = error returned to user
>> action: TRUE = old configuration re-used
>> action: CONT = continue to load new configuration
>>
>> services[db] inode.changed action
>>
>> NULL NO CONT - we need some config
>> NULL YES CONT - we need some config
>> non-null NO CONT - we need to reload
>> non-null YES TRUE - use old config
>>
>> So I think the patch is correct. I'm seeing tst-reload2 fail also
>> (sigh, only tested tst-reload1)
>>
>> It looks like we need to initialize local->root_* even if [services] is
>> NULL. I.e. we need to run that stat always, despite deferring its
>> check.
>>
>> This is a little messier but I think the logic is all in the right
>> order. We might need to move the local-> if{} to before the NULL check,
>> although we must load the config at least once so it should be OK.
>>
>
> I'm glad I nearly got there -- I was worried about making the stat
> call at all, but the issue is referencing the result, of course, if
> it's not safe to do so.
>
> Let me give this a whack!
>>
Okay, it still fails (I put the check before and after, no difference, although I think after makes more sense), but bear with me:
```
FAIL: nss/tst-reload2
original exit status 1
error: tst-reload2.c:132: not true: pw->pw_uid != 2468
tst-reload2.c:138: numeric comparison failure
left: 5 (0x5); from: pw->pw_uid
right: 1234 (0x4d2); from: 1234
error: tst-reload2.c:140: not true: gr != NULL
error: tst-reload2.c:148: not true: he != NULL
error: 4 test failures
make[1]: Leaving directory '/home/sam/git/glibc'
```
Are we sure tst-reload2 is correct?
At line 140, shouldn't it be null (I mean, it is when the test fails, but isn't that okay?), because we don't want to load the inner config?
Especially given on line 142, we want the getgrnam() call to return null. And Group ID 5 is only defined in group_table_data2.
Now, where I'm less sure is the failure on line 148 (gethostbyname). test2 is in subdir/etc/nsswitch.conf and nothing about test2 is available in the outer configuration. *BUT* the comment above it says "... can still load the files DSO.", so I'm less confident I'm understanding that one.
>
> Best,
> sam
> On 4 Jun 2022, at 23:11, Sam James <sam@gentoo.org> wrote:
>
>
>
>> On 4 Jun 2022, at 22:29, Sam James via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>
>>
>>
>>> On 4 Jun 2022, at 21:51, DJ Delorie <dj@redhat.com> wrote:
>>>
>>> Sam James <sam@gentoo.org> writes:
>>>> Huh, it still fails, but makes a bit more sense. It's upset because
>>>> now we apparently we return a result despite not being started
>>>> up:
>>>
>>> So back to basics...
>>>
>>> This routine is called when we have a real getXbyY() call and need
>>> a config.
>>>
>>> Our code happens after we've checked for nsswitch.conf changing.
>>>
>>> action: FALSE = error returned to user
>>> action: TRUE = old configuration re-used
>>> action: CONT = continue to load new configuration
>>>
>>> services[db] inode.changed action
>>>
>>> NULL NO CONT - we need some config
>>> NULL YES CONT - we need some config
>>> non-null NO CONT - we need to reload
>>> non-null YES TRUE - use old config
>>>
>>> So I think the patch is correct. I'm seeing tst-reload2 fail also
>>> (sigh, only tested tst-reload1)
>>>
>>> It looks like we need to initialize local->root_* even if [services] is
>>> NULL. I.e. we need to run that stat always, despite deferring its
>>> check.
>>>
>>> This is a little messier but I think the logic is all in the right
>>> order. We might need to move the local-> if{} to before the NULL check,
>>> although we must load the config at least once so it should be OK.
>>>
>>
>> I'm glad I nearly got there -- I was worried about making the stat
>> call at all, but the issue is referencing the result, of course, if
>> it's not safe to do so.
>>
>> Let me give this a whack!
>>>
>
> Okay, it still fails (I put the check before and after, no difference, although I think after makes more sense), but bear with me:
> ```
> FAIL: nss/tst-reload2
> original exit status 1
> error: tst-reload2.c:132: not true: pw->pw_uid != 2468
> tst-reload2.c:138: numeric comparison failure
> left: 5 (0x5); from: pw->pw_uid
> right: 1234 (0x4d2); from: 1234
> error: tst-reload2.c:140: not true: gr != NULL
> error: tst-reload2.c:148: not true: he != NULL
> error: 4 test failures
> make[1]: Leaving directory '/home/sam/git/glibc'
> ```
>
> Are we sure tst-reload2 is correct?
>
> At line 140, shouldn't it be null (I mean, it is when the test fails, but isn't that okay?), because we don't want to load the inner config?
>
> Especially given on line 142, we want the getgrnam() call to return null. And Group ID 5 is only defined in group_table_data2.
>
> Now, where I'm less sure is the failure on line 148 (gethostbyname). test2 is in subdir/etc/nsswitch.conf and nothing about test2 is available in the outer configuration. *BUT* the comment above it says "... can still load the files DSO.", so I'm less confident I'm understanding that one.
>
(This doesn't address all of the failures, so it might be I'm just misunderstanding, but line 140 vs line 142 definitely seems odd?)
>>
>> Best,
>> sam
Weird, it works for me, both reload1 and reload2 inside and outside of
the trybot's build container.
Note: the testroot is NOT automatically updated each time you type
"make", because that's a lot of overhead that's rarely needed. If you
modify the sources and want to see the results in containerized tests:
$ rm -rf $BUILD/testroot.*
$ make -j $BUILD/testroot.pristine/install.stamp
> On 5 Jun 2022, at 01:55, DJ Delorie <dj@redhat.com> wrote:
>
>
> Weird, it works for me, both reload1 and reload2 inside and outside of
> the trybot's build container.
>
> Note: the testroot is NOT automatically updated each time you type
> "make", because that's a lot of overhead that's rarely needed. If you
> modify the sources and want to see the results in containerized tests:
>
> $ rm -rf $BUILD/testroot.*
> $ make -j $BUILD/testroot.pristine/install.stamp
>
Ah, thanks, that did it! I hadn't realised about the testroot part.
(Then got bit by https://sourceware.org/bugzilla/show_bug.cgi?id=25836).
Passing! \o/
On Jun 04 2022, DJ Delorie via Libc-alpha wrote:
> $ rm -rf $BUILD/testroot.*
> $ make -j $BUILD/testroot.pristine/install.stamp
There should be a makefile target to do that.
Andreas Schwab <schwab@linux-m68k.org> writes:
>> $ rm -rf $BUILD/testroot.*
>> $ make -j $BUILD/testroot.pristine/install.stamp
>
> There should be a makefile target to do that.
I'm not opposed, but:
1. Bikeshed a clever name, and
2. How would anyone learn of this new option when they need it?
@@ -420,21 +420,24 @@ nss_database_check_reload_and_get (struct nss_database_state *local,
return true;
}
- /* Before we reload, verify that "/" hasn't changed. We assume that
- errors here are very unlikely, but the chance that we're entering
- a container is also very unlikely, so we err on the side of both
- very unlikely things not happening at the same time. */
- if (__stat64_time64 ("/", &str) != 0
- || (local->root_ino != 0
- && (str.st_ino != local->root_ino
- || str.st_dev != local->root_dev)))
- {
- /* Change detected; disable reloading and return current state. */
- atomic_store_release (&local->data.reload_disabled, 1);
- *result = local->data.services[database_index];
- __libc_lock_unlock (local->lock);
- return true;
- }
+ if (local->data.services[database_index] != NULL) {
+ /* Before we reload, verify that "/" hasn't changed. We assume that
+ errors here are very unlikely, but the chance that we're entering
+ a container is also very unlikely, so we err on the side of both
+ very unlikely things not happening at the same time. */
+ if (__stat64_time64 ("/", &str) != 0
+ || (local->root_ino != 0
+ && (str.st_ino != local->root_ino
+ || str.st_dev != local->root_dev)))
+ {
+ /* Change detected; disable reloading and return current state. */
+ atomic_store_release (&local->data.reload_disabled, 1);
+ *result = local->data.services[database_index];
+ __libc_lock_unlock (local->lock);
+ return true;
+ }
+ }
+
local->root_ino = str.st_ino;
local->root_dev = str.st_dev;
__libc_lock_unlock (local->lock);