[0/2] Fix nss/tst-nss-files-hosts-long on single-stack hosts (bug 24816)

Message ID cover.1663079342.git.fweimer@redhat.com
Series Fix nss/tst-nss-files-hosts-long on single-stack hosts (bug 24816)

Message

Florian Weimer Sept. 13, 2022, 2:35 p.m. UTC
Our Fedora builders started running the container tests (after the
switch to systemd-nspawn), and we encountered this test failure as well.
Fix this by disabling address configuration in the getent tool.
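For readers unfamiliar with the flag, here is a minimal sketch of what "disabling address configuration" means at the getaddrinfo level. The helper `lookup_ahosts` and its `use_addrconfig` parameter are hypothetical stand-ins for the new --no-addrconfig option; this is not the code in nss/getent.c.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve NAME with getaddrinfo and AF_UNSPEC
   and print one line per result.  USE_ADDRCONFIG models the current
   default; false models the proposed --no-addrconfig option.
   Returns the number of results printed, or -1 on lookup failure.  */
static int
lookup_ahosts (const char *name, bool use_addrconfig)
{
  struct addrinfo hints;
  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;          /* Ask for IPv4 and IPv6.  */
  if (use_addrconfig)
    /* Let getaddrinfo filter results by the address families
       configured on this host (loopback addresses do not count).  */
    hints.ai_flags |= AI_ADDRCONFIG;

  struct addrinfo *res;
  int err = getaddrinfo (name, NULL, &hints, &res);
  if (err != 0)
    {
      fprintf (stderr, "%s: %s\n", name, gai_strerror (err));
      return -1;
    }

  int count = 0;
  for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
    {
      const void *addr;
      if (ai->ai_family == AF_INET)
        addr = &((const struct sockaddr_in *) ai->ai_addr)->sin_addr;
      else if (ai->ai_family == AF_INET6)
        addr = &((const struct sockaddr_in6 *) ai->ai_addr)->sin6_addr;
      else
        continue;
      char buf[INET6_ADDRSTRLEN];
      if (inet_ntop (ai->ai_family, addr, buf, sizeof (buf)) == NULL)
        continue;
      const char *type = ai->ai_socktype == SOCK_STREAM ? "STREAM"
                         : ai->ai_socktype == SOCK_DGRAM ? "DGRAM"
                         : ai->ai_socktype == SOCK_RAW ? "RAW" : "?";
      printf ("%-39s %s\n", buf, type);
      ++count;
    }
  freeaddrinfo (res);
  return count;
}
```

On a single-stack host, calling this with `use_addrconfig` set may drop every result of the unconfigured family, which is the kind of environment-dependent output a test cannot rely on.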

Tested on x86_64-linux-gnu.

Florian Weimer (2):
  nss: Implement --no-addrconfig option for getent
  nss: Fix tst-nss-files-hosts-long on single-stack hosts (bug 24816)

 NEWS                           |  5 ++++-
 nss/getent.c                   | 11 ++++++++++-
 nss/tst-nss-files-hosts-long.c |  9 +++++----
 3 files changed, 19 insertions(+), 6 deletions(-)


base-commit: f278835f594740f5913001430641cf1da4878670
  

Comments

Carlos O'Donell Sept. 14, 2022, 9:42 a.m. UTC | #1
On Tue, Sep 13, 2022 at 04:35:39PM +0200, Florian Weimer via Libc-alpha wrote:
> Our Fedora builders started running the container tests (after the
> switch to systemd-nspawn), and we encountered this test failure as well.
> Fix this by disabling address configuration in the getent tool.

Two things I'd like to discuss.

(1) Change the getent default and drop AI_ADDRCONFIG.

I'm hesitant to add a new option to getent as a solution to a testing
problem. The documented description for getent ahosts talks only
about enumerating the host entries or calling getaddrinfo with
AF_UNSPEC. Could we just change the default and ignore the host
configuration? This is less conservative but logically it seems to me
that we could just drop AI_ADDRCONFIG, and add a --addrconfig option to
get back the old behaviour. What could we possibly break?

(2) Fix the test.

Alternatively the test should be checking to see if it is in a dual
stack environment or single stack environment and only call getent for
the specific case when such interfaces are enabled.

Can we resolve this entirely in tst-nss-files-hosts-long?

> Tested on x86_64-linux-gnu.
> 
> Florian Weimer (2):
>   nss: Implement --no-addrconfig option for getent
>   nss: Fix tst-nss-files-hosts-long on single-stack hosts (bug 24816)
> 
>  NEWS                           |  5 ++++-
>  nss/getent.c                   | 11 ++++++++++-
>  nss/tst-nss-files-hosts-long.c |  9 +++++----
>  3 files changed, 19 insertions(+), 6 deletions(-)
> 
> 
> base-commit: f278835f594740f5913001430641cf1da4878670
  
Florian Weimer Sept. 14, 2022, 9:54 a.m. UTC | #2
* Carlos O'Donell:

> On Tue, Sep 13, 2022 at 04:35:39PM +0200, Florian Weimer via Libc-alpha wrote:
>> Our Fedora builders started running the container tests (after the
>> switch to systemd-nspawn), and we encountered this test failure as well.
>> Fix this by disabling address configuration in the getent tool.
>
> Two things I'd like to discuss.
>
> (1) Change the getent default and drop AI_ADDRCONFIG.
>
> I'm hesitant to add a new option to getent as a solution to a testing
> problem. The documented description for getent ahosts talks only
> about enumerating the host entries or calling getaddrinfo with
> AF_UNSPEC. Could we just change the default and ignore the host
> configuration? This is less conservative but logically it seems to me
> that we could just drop AI_ADDRCONFIG, and add a --addrconfig option to
> get back the old behaviour. What could we possibly break?

I'm not sure why we would make such a backwards-incompatible change just
to fix a test.  It sounds even more preposterous than adding the new
option.

There have been support cases where the --no-addrconfig option would
have been useful.  Today, getent isn't a great tool for diagnosing DNS
issues, and I think this option improves the situation slightly.

> (2) Fix the test.
>
> Alternatively the test should be checking to see if it is in a dual
> stack environment or single stack environment and only call getent for
> the specific case when such interfaces are enabled.
>
> Can we resolve this entirely in tst-nss-files-hosts-long?

I think it's futile to try to replicate the AI_ADDRCONFIG behavior in
the test.

Thanks,
Florian
  
Carlos O'Donell Sept. 14, 2022, 10:26 p.m. UTC | #3
On Wed, Sep 14, 2022 at 11:54:47AM +0200, Florian Weimer wrote:
> * Carlos O'Donell:
> 
> > On Tue, Sep 13, 2022 at 04:35:39PM +0200, Florian Weimer via Libc-alpha wrote:
> >> Our Fedora builders started running the container tests (after the
> >> switch to systemd-nspawn), and we encountered this test failure as well.
> >> Fix this by disabling address configuration in the getent tool.
> >
> > Two things I'd like to discuss.
> >
> > (1) Change the getent default and drop AI_ADDRCONFIG.
> >
> > I'm hesitant to add a new option to getent as a solution to a testing
> > problem. The documented description for getent ahosts talks only
> > about enumerating the host entries or calling getaddrinfo with
> > AF_UNSPEC. Could we just change the default and ignore the host
> > configuration? This is less conservative but logically it seems to me
> > that we could just drop AI_ADDRCONFIG, and add a --addrconfig option to
> > get back the old behaviour. What could we possibly break?
> 
> I'm not sure why we would make such a backwards-incompatible change just
> to fix a test.  It sounds even more preposterous than adding the new
> option.

I fully agree that the most backwards compatible change is to add
an option that allows getent to operate without AI_ADDRCONFIG.

What I want to explore here is: Why use AI_ADDRCONFIG at all with
getent?

If our collective answer is: Because that's just the way we've always
done it and changing it would be a backwards incompatible change, then
I'm fine with that. I just wanted to explore that a bit.

I can see arguments both ways. I was looking for your opinion here.
My read of your opinion is that we should make the minimum backwards
compatible change.

> There have been support cases where the --no-addrconfig option would
> have been useful.  Today, getent isn't a great tool for diagnosing DNS
> issues, and I think this option improves the situation slightly.

That's a good point in favour of the new option.

> > (2) Fix the test.
> >
> > Alternatively the test should be checking to see if it is in a dual
> > stack environment or single stack environment and only call getent for
> > the specific case when such interfaces are enabled.
> >
> > Can we resolve this entirely in tst-nss-files-hosts-long?
> 
> I think it's futile to try to replicate the AI_ADDRCONFIG behavior in
> the test.

I did an audit, and it looks like getent and this specific test are
the only places where we'd need code like this, so there isn't
a win-win here with other tests.

Cheers,
Carlos.
  
Florian Weimer Sept. 15, 2022, 12:37 p.m. UTC | #4
* Carlos O'Donell:

> On Wed, Sep 14, 2022 at 11:54:47AM +0200, Florian Weimer wrote:
>> * Carlos O'Donell:
>> 
>> > On Tue, Sep 13, 2022 at 04:35:39PM +0200, Florian Weimer via Libc-alpha wrote:
>> >> Our Fedora builders started running the container tests (after the
>> >> switch to systemd-nspawn), and we encountered this test failure as well.
>> >> Fix this by disabling address configuration in the getent tool.
>> >
>> > Two things I'd like to discuss.
>> >
>> > (1) Change the getent default and drop AI_ADDRCONFIG.
>> >
>> > I'm hesitant to add a new option to getent as a solution to a testing
>> > problem. The documented description for getent ahosts talks only
>> > about enumerating the host entries or calling getaddrinfo with
>> > AF_UNSPEC. Could we just change the default and ignore the host
>> > configuration? This is less conservative but logically it seems to me
>> > that we could just drop AI_ADDRCONFIG, and add a --addrconfig option to
>> > get back the old behaviour. What could we possibly break?
>> 
>> I'm not sure why we would make such a backwards-incompatible change just
>> to fix a test.  It sounds even more preposterous than adding the new
>> option.
>
> I fully agree that the most backwards compatible change is to add
> an option that allows getent to operate without AI_ADDRCONFIG.
>
> What I want to explore here is: Why use AI_ADDRCONFIG at all with
> getent?

I think it dates back to when AI_ADDRCONFIG was assumed to be useful.
Back then, it seemed that if your system had IPv6 addresses
configured, it could reach the entire IPv6 Internet.

In practice, what applications should do is to get all addresses,
always, and make connections in parallel over both address families.
Filtering out addresses that are clearly unusable is just an
optimization (but glibc isn't very good at that: the way we determine
the IPv6 support status of a host can be very expensive, or give
inaccurate results over time).
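The approach described above (request all addresses, then race connections over both families) can be sketched roughly as follows. `connect_any` is a hypothetical helper. Unlike Happy Eyeballs (RFC 8305), it starts all attempts at once instead of staggering them, and it assumes the Linux SOCK_NONBLOCK/SOCK_CLOEXEC socket-type flags.

```c
#include <errno.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16

/* Hypothetical helper: connect to HOST:PORT, trying every address
   getaddrinfo returns at the same time.  TIMEOUT_MS applies to each
   wait in poll.  Returns a connected non-blocking socket, or -1.  */
static int
connect_any (const char *host, const char *port, int timeout_ms)
{
  struct addrinfo hints;
  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;        /* No AI_ADDRCONFIG: get everything.  */
  hints.ai_socktype = SOCK_STREAM;
  struct addrinfo *res;
  if (getaddrinfo (host, port, &hints, &res) != 0)
    return -1;

  /* Start a non-blocking connect for each address.  */
  struct pollfd fds[MAX_ATTEMPTS];
  int n = 0;
  for (struct addrinfo *ai = res; ai != NULL && n < MAX_ATTEMPTS;
       ai = ai->ai_next)
    {
      int fd = socket (ai->ai_family,
                       ai->ai_socktype | SOCK_NONBLOCK | SOCK_CLOEXEC,
                       ai->ai_protocol);
      if (fd < 0)
        continue;                     /* E.g. family unsupported here.  */
      if (connect (fd, ai->ai_addr, ai->ai_addrlen) != 0
          && errno != EINPROGRESS)
        {
          close (fd);
          continue;
        }
      fds[n].fd = fd;
      fds[n].events = POLLOUT;        /* Writable once connect finishes.  */
      ++n;
    }
  freeaddrinfo (res);

  /* The first socket that connects without error wins.  */
  int winner = -1;
  int pending = n;
  while (winner < 0 && pending > 0)
    {
      if (poll (fds, n, timeout_ms) <= 0)
        break;                        /* Timeout or poll error.  */
      for (int i = 0; i < n; ++i)
        {
          if (fds[i].fd < 0 || fds[i].revents == 0)
            continue;
          int err = 0;
          socklen_t len = sizeof (err);
          if (getsockopt (fds[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0
              && err == 0)
            {
              winner = fds[i].fd;
              fds[i].fd = -1;
              break;
            }
          close (fds[i].fd);          /* This attempt failed.  */
          fds[i].fd = -1;             /* poll ignores negative fds.  */
          --pending;
        }
    }

  /* Abandon the attempts that lost the race.  */
  for (int i = 0; i < n; ++i)
    if (fds[i].fd >= 0)
      close (fds[i].fd);
  return winner;
}
```

Because the address list is not filtered up front, an unreachable family only costs a failed (or slower) attempt rather than hiding a usable address.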

> If our collective answer is: Because that's just the way we've always
> done it and changing it would be a backwards incompatible change, then
> I'm fine with that. I just wanted to explore that a bit.

Yeah, that too.  getent ahosts is not really useful because of the
address duplication.
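The duplication presumably refers to getaddrinfo returning one entry per socket type when `ai_socktype` is left at 0, so each address shows up once for STREAM, DGRAM and RAW. A minimal sketch to observe this, using a hypothetical helper `count_results`:

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: count the getaddrinfo results for NAME when
   restricted to SOCKTYPE (0 means any socket type).  Returns -1 if
   the lookup fails.  */
static int
count_results (const char *name, int socktype)
{
  struct addrinfo hints;
  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = socktype;
  struct addrinfo *res;
  if (getaddrinfo (name, NULL, &hints, &res) != 0)
    return -1;
  int count = 0;
  for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
    ++count;
  freeaddrinfo (res);
  return count;
}
```

With glibc, `count_results ("127.0.0.1", 0)` should report one entry per socket type, while passing `SOCK_STREAM` yields a single entry for the same address.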

Thanks,
Florian