Message ID | 20190208175245.2314-1-halves@canonical.com |
---|---|
State | Rejected |
Headers |
Received: (qmail 10863 invoked by alias); 8 Feb 2019 17:53:20 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 10848 invoked by uid 89); 8 Feb 2019 17:53:19 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY autolearn=ham version=3.3.2 spammy=Force, getaddrinfo, wrt, family X-HELO: youngberry.canonical.com From: "Heitor R. Alves de Siqueira" <halves@canonical.com> To: libc-alpha@sourceware.org Cc: Florian Weimer <fweimer@redhat.com>, Dan Streetman <dan.streetman@canonical.com>, "Heitor R. Alves de Siqueira" <halves@canonical.com> Subject: [RFC PATCH] getaddrinfo: Force name resolution for AI_CANONNAME [BZ# 24182] Date: Fri, 8 Feb 2019 15:52:45 -0200 Message-Id: <20190208175245.2314-1-halves@canonical.com> |
Commit Message
Heitor R. Alves de Siqueira
Feb. 8, 2019, 5:52 p.m. UTC
When getaddrinfo() is called with a numeric nodename argument (e.g.
67882190), we should try name resolution if AI_CANONNAME is set. RFC 1123
allows digits-only hostnames, but inet_aton_exact() can interpret these as
valid IPv4 addresses in a 32-bit number form. This behaviour causes the
internal gaih_inet() call to think a numeric hostname is a valid IPv4
address and skip name resolution.

One can reproduce this by following these steps:

1) Append numeric hostname records to /etc/hosts:

    $ head -n2 /etc/hosts
    127.0.0.1 localhost
    127.0.0.1 1234.example.com 1234

2) Change local hostname to the numeric record:

    $ sudo hostname 1234

3) Call `hostname -f` (output should be '1234.example.com'):

    $ hostname -f
    1234

This patch forces name resolution if the AI_CANONNAME flag is set. Even
if inet_aton_exact() identifies the input name as being a valid IPv4
address, we will try name resolution in case it's a valid hostname. If
no hostname is found after resolution, the input name is still copied
to the ai_canonname field.

The patch was tested on amd64, and the glibc test suite showed no
regressions. Further use case tests showed that current behaviour is not
modified w.r.t. IPv4 addresses.

---

This is a tentative patch suggestion for BZ# 24182. The general idea is
already being discussed there, but I would like some more thoughts on the
patch specifically, if possible. Likely there are many ways to tackle this
and I'm unsure which would be best suited.

Thanks!
Heitor

 sysdeps/posix/getaddrinfo.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
Comments
* Heitor R. Alves de Siqueira:

> This patch forces name resolution if the AI_CANONNAME flag is set. Even
> if inet_aton_exact() identifies the input name as being a valid IPv4
> address, we will try name resolution in case it's a valid hostname. If
> no hostname is found after resolution, the input name is still copied
> to the ai_canonname field.

This is not correct because it sends queries for names such as 192.0.2.1
to the DNS root servers.

I'm sorry, but I still don't see how the general idea is useful. Which
applications benefit if getaddrinfo returns a name in ai_canonname which
will most likely resolve to a completely different set of addresses?

Do you have a bug report that requests a behavior change in this area?
Which problem is this trying to address?

Thanks,
Florian
* Florian Weimer:

> This is not correct because it sends queries for names such as 192.0.2.1
> to the DNS root servers.

Right, that can be avoided by setting the AI_NUMERICHOST flag, but it is a
problem if we just set the AI_CANONNAME flag (and then getaddrinfo() wouldn't
know if we meant the nodename to be an IPv4 address).

An alternative solution is to check if the nodename contains any '.' characters
(e.g. using strchr) after it was identified by inet_aton_exact(). In that
case, we could set ai_family to AF_UNSPEC and try name resolution since the
nodename could be either an IPv4 address in 32-bit format or a numeric hostname.
Do you think that would be a better approach, Florian?

> I'm sorry, but I still don't see how the general idea is useful. Which
> applications benefit if getaddrinfo returns a name in ai_canonname which
> will most likely resolve to a completely different set of addresses?

Most applications that go through glibc for network connections would benefit
from this. As an example, with the current behaviour we can't ssh to a machine
called '12345' in our LAN even if we explicitly add its IP address to
/etc/hosts:

    $ head -n2 /etc/hosts
    127.0.0.1 localhost
    10.188.133.187 12345.lan 12345

    $ ssh 12345
    ssh: connect to host 12345 port 22: Invalid argument

If we change getaddrinfo() to handle digits-only hostnames, then we can
correctly reach the host:

    $ ssh 12345
    user@12345: Permission denied (publickey).

We could connect to these hosts using the FQDN according to the hosts file or
DNS, but I think it's reasonable to expect the numeric host to resolve if it's
set in the /etc/hosts file or equivalent. Changing getaddrinfo() so that it
resolves numeric hostnames helps with this scenario not only for ssh, but also
for other glibc-dependent programs.

Thanks!
Heitor
* Heitor Alves de Siqueira:

> * Florian Weimer:
>
>> This is not correct because it sends queries for names such as 192.0.2.1
>> to the DNS root servers.
>
> Right, that can be avoided by setting the AI_NUMERICHOST flag, but it is a
> problem if we just set the AI_CANONNAME flag (and then getaddrinfo() wouldn't
> know if we meant the nodename to be an IPv4 address).

Just to be clear here, we need to avoid sending those queries in all cases,
whether AI_NUMERICHOST is set or not.

> An alternative solution is to check if the nodename contains any '.' characters
> (e.g. using strchr) after it was identified by inet_aton_exact(). In that
> case, we could set ai_family to AF_UNSPEC and try name resolution since the
> nodename could be either an IPv4 address in 32-bit format or a numeric hostname.
> Do you think that would be a better approach, Florian?

It is at least theoretically possible to attempt a host name lookup for
a name that is a non-negative integer, and use the integer as an IPv4
address only as a fallback if name resolution through NSS does not
deliver any results. This would still benefit from changes to the stub
resolver that essentially make sure that these queries do not reach the
root servers (related to bug 19634).

The question is if it's worth this complexity, and the resulting lack of
consistency with what other systems do (and older versions of glibc
which have not backported this change).

>> I'm sorry, but I still don't see how the general idea is useful. Which
>> applications benefit if getaddrinfo returns a name in ai_canonname which
>> will most likely resolve to a completely different set of addresses?
>
> Most applications that go through glibc for network connections would benefit
> from this. As an example, with the current behaviour we can't ssh to a machine
> called '12345' in our LAN even if we explicitly add its IP address to
> /etc/hosts:
>
>     $ head -n2 /etc/hosts
>     127.0.0.1 localhost
>     10.188.133.187 12345.lan 12345
>
>     $ ssh 12345
>     ssh: connect to host 12345 port 22: Invalid argument
>
> If we change getaddrinfo() to handle digits-only hostnames, then we can
> correctly reach the host:
>
>     $ ssh 12345
>     user@12345: Permission denied (publickey).

Is there *any* system that currently behaves this way?

I checked Windows 10 (native, not WSL, obviously), and it very closely
matches the glibc behavior: octal parsing, and the hosts file does not
override parsing as numeric domain names. This is not surprising, given
the shared ancestry in the BIND stub resolver code.

Thanks,
Florian
diff --git a/sysdeps/posix/getaddrinfo.c b/sysdeps/posix/getaddrinfo.c
index aa054b620f2a..fa9e2d6ad3b1 100644
--- a/sysdeps/posix/getaddrinfo.c
+++ b/sysdeps/posix/getaddrinfo.c
@@ -505,9 +505,6 @@ gaih_inet (const char *name, const struct gaih_service *service,
               result = -EAI_ADDRFAMILY;
               goto free_and_return;
             }
-
-          if (req->ai_flags & AI_CANONNAME)
-            canon = name;
         }
       else if (at->family == AF_UNSPEC)
         {
@@ -548,7 +545,8 @@ gaih_inet (const char *name, const struct gaih_service *service,
             }
         }
 
-      if (at->family == AF_UNSPEC && (req->ai_flags & AI_NUMERICHOST) == 0)
+      if ((at->family == AF_UNSPEC || (req->ai_flags & AI_CANONNAME))
+          && (req->ai_flags & AI_NUMERICHOST) == 0)
         {
           struct gaih_addrtuple **pat = &at;
           int no_data = 0;