New version of the libidn2 patch

Message ID 68093b68-a7ab-145e-2357-ea84aedee3b5@redhat.com
State Superseded
Headers

Commit Message

Florian Weimer April 4, 2018, 7:22 p.m. UTC
  On 04/01/2018 09:42 PM, Florian Weimer wrote:
> * Zack Weinberg:
> 
>> On Sun, Apr 1, 2018 at 2:54 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 03/15/2018 11:03 PM, Florian Weimer wrote:
>>>>
>>>> For some reason, libidn2 does not fail the conversion, but uses the
>>>> replacement character '?' for unencodable characters.  This will have to be
>>>> fixed in libidn2 eventually.>
>>>
>>> This has now been fixed in upstream libidn2.
>>
>> I don't think I can comment usefully on any of the code changes but I
>> have some concerns about the behavior in some of the failure cases.
>> Most importantly, I don't understand exactly what we do if the
>> application calls getaddrinfo, passing AI_IDN, with a name that would
>> be changed by punycoding it, but libidn2 is not available.  The NEWS
>> makes it sound like we might transmit a raw non-ASCII name to the DNS
>> server in that case, and I think we shouldn't do that.
> 
> Only very few applications currently use AI_IDN.  This means that if
> the user enters such a name with a non-AI_IDN application, exactly the
> same thing will happen.  That's why I thought the fallback behavior
> made sense.
> 
> It's probably more of a performance optimization to change this: We
> could check the name for non-ASCII characters (and backslashes,
> because interesting things happen with them) and load and call into
> libidn2 only if there are such non-ASCII characters.

Here's a patch implementing this.  I'd like to commit this separately 
because it is slightly less well-tested than the rest.

>> I also wonder if we shouldn't somehow detect and refuse to use
>> versions of libidn2 without all the necessary bugfixes.
> 
> The ??? transcoding bug is probably something approaching a security
> bug, but I'm not sure if it can actually be triggered in a UTF-8
> locale.  (The point is that for getnameinfo and AI_CANONIDN, the ?
> replacement character is introduced *after* the meta-character checks
> in nss_dns, and pathname expansion in the shell could result in all
> kinds of unwanted characters.)  The other bugs are really benign, and
> there's no evidence that they have been encountered in the wild.

It also papers over part of the transcoding bug because it will 
recognize invalid multi-byte sequences and not pass them to libidn2.

Thanks,
Florian
  

Comments

Florian Weimer April 11, 2018, 9:39 a.m. UTC | #1
I hope to commit this cluster of patches soon:

https://sourceware.org/ml/libc-alpha/2018-01/msg00440.html
https://sourceware.org/ml/libc-alpha/2018-04/msg00001.html
https://sourceware.org/ml/libc-alpha/2018-04/msg00123.html

I would particularly like to see some comments on the middle patch and 
the removal of the old libidn in it.

Thanks,
Florian
  

Patch

Subject: [PATCH] getaddrinfo: Check for IDNA name before calling into libidn2
To: libc-alpha@sourceware.org

This avoids unnecessary loading of libidn2, and sending
unencoded non-ASCII names over the network.

2018-04-04  Florian Weimer  <fweimer@redhat.com>

	* inet/Makefile (tests-static, tests-internal): Add
	tst-idna_name_classify.
	(LOCALES): Generate locales for tests.
	(tst-idna_name_classify.out): Depend on generated locales.
	* inet/idna_name_classify.c: New file.
	* inet/idna.c (__idna_to_dns_encoding): Call __idna_name_classify
	before encoding.  Do not send unencoded non-ASCII names over the
	network.
	* inet/net-internal.h (enum idna_name_classification): Define.
	(__idna_name_classify): Declare.
	* inet/tst-idna_name_classify.c: New file.

diff --git a/NEWS b/NEWS
index 0de39d00a2..f785947ec9 100644
--- a/NEWS
+++ b/NEWS
@@ -28,11 +28,13 @@  Major new features:
 * IDN domain names in getaddrinfo and getnameinfo now use the system libidn2
   library if installed.  If libidn2 is not available, internationalized
   domain names are not encoded or decoded even if the AI_IDN or NI_IDN flags
-  are passed to getaddrinfo or getnameinfo.  Flags which used to change the
-  IDN encoding and decoding behavior (AI_IDN_ALLOW_UNASSIGNED,
-  AI_IDN_USE_STD3_ASCII_RULES, NI_IDN_ALLOW_UNASSIGNED,
-  NI_IDN_USE_STD3_ASCII_RULES) have been deprecated.  They no longer have
-  any effect.  (libidn2 allows yet-unassigned characters by default.)
+  are passed to getaddrinfo or getnameinfo.  (getaddrinfo calls with
+  non-ASCII names and AI_IDN will fail with an encoding error.)  Flags which
+  used to change the IDN encoding and decoding behavior
+  (AI_IDN_ALLOW_UNASSIGNED, AI_IDN_USE_STD3_ASCII_RULES,
+  NI_IDN_ALLOW_UNASSIGNED, NI_IDN_USE_STD3_ASCII_RULES) have been
+  deprecated.  They no longer have any effect.  (libidn2 allows
+  yet-unassigned characters by default.)
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/inet/Makefile b/inet/Makefile
index 4079be856a..09f5ba78fc 100644
--- a/inet/Makefile
+++ b/inet/Makefile
@@ -45,7 +45,7 @@  routines := htonl htons		\
 	    in6_addr getnameinfo if_index ifaddrs inet6_option \
 	    getipv4sourcefilter setipv4sourcefilter \
 	    getsourcefilter setsourcefilter inet6_opt inet6_rth \
-	    inet6_scopeid_pton deadline idna
+	    inet6_scopeid_pton deadline idna idna_name_classify
 
 aux := check_pf check_native ifreq
 
@@ -59,12 +59,20 @@  tests := htontest test_ifindex tst-ntoa tst-ether_aton tst-network \
 tests-static += tst-deadline
 tests-internal += tst-deadline
 
+# tst-idna_name_classify must be linked statically because it tests
+# internal functionality.
+tests-static += tst-idna_name_classify
+tests-internal += tst-idna_name_classify
+
 # tst-inet6_scopeid_pton also needs internal functions but does not
 # need to be linked statically.
 tests-internal += tst-inet6_scopeid_pton
 
 include ../Rules
 
+LOCALES := en_US.UTF-8 en_US.ISO-8859-1
+include ../gen-locales.mk
+
 ifeq ($(have-thread-library),yes)
 
 CFLAGS-gethstbyad_r.c += -fexceptions
@@ -103,3 +111,5 @@  endif
 ifeq ($(build-static-nss),yes)
 CFLAGS += -DSTATIC_NSS
 endif
+
+$(objpfx)tst-idna_name_classify.out: $(gen-locales)
diff --git a/inet/idna.c b/inet/idna.c
index e4c83182d6..c561bf2e9e 100644
--- a/inet/idna.c
+++ b/inet/idna.c
@@ -112,11 +112,29 @@  gai_strdup (const char *name, char **result)
 int
 __idna_to_dns_encoding (const char *name, char **result)
 {
+  switch (__idna_name_classify (name))
+    {
+    case idna_name_ascii:
+      /* Nothing to convert.  */
+      return gai_strdup (name, result);
+    case idna_name_nonascii:
+      /* Encoding needed.  Handled below.  */
+      break;
+    case idna_name_nonascii_backslash:
+    case idna_name_encoding_error:
+      return EAI_IDN_ENCODE;
+    case idna_name_memory_error:
+      return EAI_MEMORY;
+    case idna_name_error:
+      return EAI_SYSTEM;
+    }
+
   struct functions *functions = get_functions ();
   if (functions == NULL)
-    /* Forward the domain name to the NSS module unencoded.  It may be
-       able to apply its own encoding to find an answer.  */
-    return gai_strdup (name, result);
+    /* We report this as an encoding error (assuming that libidn2 is
+       not installed), although the root cause may be a temporary
+       error condition due to resource shortage.  */
+    return EAI_IDN_ENCODE;
   char *ptr = NULL;
   __typeof__ (functions->lookup_ul) fptr = functions->lookup_ul;
 #ifdef PTR_DEMANGLE
diff --git a/inet/idna_name_classify.c b/inet/idna_name_classify.c
new file mode 100644
index 0000000000..3683e1133f
--- /dev/null
+++ b/inet/idna_name_classify.c
@@ -0,0 +1,75 @@ 
+/* Classify a domain name for IDNA purposes.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <inet/net-internal.h>
+#include <stdbool.h>
+#include <string.h>
+#include <wchar.h>
+
+enum idna_name_classification
+__idna_name_classify (const char *name)
+{
+  mbstate_t mbs;
+  memset (&mbs, 0, sizeof (mbs));
+  const char *p = name;
+  const char *end = p + strlen (p) + 1;
+  bool nonascii = false;
+  bool backslash = false;
+  while (true)
+    {
+      wchar_t wc;
+      size_t result = mbrtowc (&wc, p, end - p, &mbs);
+      if (result == 0)
+        /* NUL terminator was reached.  */
+        break;
+      else if (result == (size_t) -2)
+        /* Incomplete trailing multi-byte character.  This is an
+           encoding error becaue we received the full name.  */
+        return idna_name_encoding_error;
+      else if (result == (size_t) -1)
+        {
+          /* Other error, including EILSEQ.  */
+          if (errno == EILSEQ)
+            return idna_name_encoding_error;
+          else if (errno == ENOMEM)
+            return idna_name_memory_error;
+          else
+            return idna_name_error;
+        }
+      else
+        {
+          /* A wide character was decoded.  */
+          p += result;
+          if (wc == L'\\')
+            backslash = true;
+          else if (wc > 127)
+            nonascii = true;
+        }
+    }
+
+  if (nonascii)
+    {
+      if (backslash)
+        return idna_name_nonascii_backslash;
+      else
+        return idna_name_nonascii;
+    }
+  else
+    return idna_name_ascii;
+}
diff --git a/inet/net-internal.h b/inet/net-internal.h
index a5cd4e0304..0ba6736aef 100644
--- a/inet/net-internal.h
+++ b/inet/net-internal.h
@@ -40,6 +40,22 @@  int __idna_from_dns_encoding (const char *name, char **result);
 libc_hidden_proto (__idna_from_dns_encoding)
 
 
+/* Return value of __idna_name_classify below.  */
+enum idna_name_classification
+{
+  idna_name_ascii,          /* No non-ASCII characters.  */
+  idna_name_nonascii,       /* Non-ASCII characters, no backslash.  */
+  idna_name_nonascii_backslash, /* Non-ASCII characters with backslash.  */
+  idna_name_encoding_error, /* Decoding error.  */
+  idna_name_memory_error,   /* Memory allocation failure.  */
+  idna_name_error,          /* Other error during decoding.  Check errno.  */
+};
+
+/* Check the specified name for non-ASCII characters and backslashes
+   or encoding errors.  */
+enum idna_name_classification __idna_name_classify (const char *name)
+  attribute_hidden;
+
 /* Deadline handling for enforcing timeouts.
 
    Code should call __deadline_current_time to obtain the current time
diff --git a/inet/tst-idna_name_classify.c b/inet/tst-idna_name_classify.c
new file mode 100644
index 0000000000..c4a2c91329
--- /dev/null
+++ b/inet/tst-idna_name_classify.c
@@ -0,0 +1,73 @@ 
+/* Test IDNA name classification.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <inet/net-internal.h>
+#include <locale.h>
+#include <stdio.h>
+#include <support/check.h>
+
+static void
+locale_insensitive_tests (void)
+{
+  TEST_COMPARE (__idna_name_classify (""), idna_name_ascii);
+  TEST_COMPARE (__idna_name_classify ("abc"), idna_name_ascii);
+  TEST_COMPARE (__idna_name_classify (".."), idna_name_ascii);
+  TEST_COMPARE (__idna_name_classify ("\001abc\177"), idna_name_ascii);
+  TEST_COMPARE (__idna_name_classify ("\\065bc"), idna_name_ascii);
+}
+
+static int
+do_test (void)
+{
+  puts ("info: C locale tests");
+  locale_insensitive_tests ();
+  TEST_COMPARE (__idna_name_classify ("abc\200def"),
+                idna_name_encoding_error);
+  TEST_COMPARE (__idna_name_classify ("abc\200\\def"),
+                idna_name_encoding_error);
+  TEST_COMPARE (__idna_name_classify ("abc\377def"),
+                idna_name_encoding_error);
+
+  puts ("info: en_US.ISO-8859-1 locale tests");
+  if (setlocale (LC_CTYPE, "en_US.ISO-8859-1") == 0)
+    FAIL_EXIT1 ("setlocale for en_US.ISO-8859-1: %m\n");
+  locale_insensitive_tests ();
+  TEST_COMPARE (__idna_name_classify ("abc\200def"), idna_name_nonascii);
+  TEST_COMPARE (__idna_name_classify ("abc\377def"), idna_name_nonascii);
+  TEST_COMPARE (__idna_name_classify ("abc\\\200def"),
+                idna_name_nonascii_backslash);
+  TEST_COMPARE (__idna_name_classify ("abc\200\\def"),
+                idna_name_nonascii_backslash);
+
+  puts ("info: en_US.UTF-8 locale tests");
+  if (setlocale (LC_CTYPE, "en_US.UTF-8") == 0)
+    FAIL_EXIT1 ("setlocale for en_US.UTF-8: %m\n");
+  locale_insensitive_tests ();
+  TEST_COMPARE (__idna_name_classify ("abc\xc3\x9f""def"), idna_name_nonascii);
+  TEST_COMPARE (__idna_name_classify ("abc\\\xc3\x9f""def"),
+                idna_name_nonascii_backslash);
+  TEST_COMPARE (__idna_name_classify ("abc\xc3\x9f\\def"),
+                idna_name_nonascii_backslash);
+  TEST_COMPARE (__idna_name_classify ("abc\200def"), idna_name_encoding_error);
+  TEST_COMPARE (__idna_name_classify ("abc\xc3""def"), idna_name_encoding_error);
+  TEST_COMPARE (__idna_name_classify ("abc\xc3"), idna_name_encoding_error);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/resolv/tst-resolv-ai_idn-nolibidn2.c b/resolv/tst-resolv-ai_idn-nolibidn2.c
index 8f30da8741..e97a00a619 100644
--- a/resolv/tst-resolv-ai_idn-nolibidn2.c
+++ b/resolv/tst-resolv-ai_idn-nolibidn2.c
@@ -37,12 +37,15 @@  gai_tests (void)
             "canonname: non-idn.example\n"
             "address: STREAM/TCP 192.0.2.110 80\n");
 
+  /* This gets passed over the network to the server, so it will
+     result in an NXDOMAIN error.  */
   check_ai (NAEMCHEN ".example", 0,
             "error: Name or service not known\n");
+  /* Due to missing libidn2, this fails inside getaddrinfo.  */
   check_ai (NAEMCHEN ".example", AI_IDN,
-            "error: Name or service not known\n");
+            "error: Parameter string not correctly encoded\n");
   check_ai (NAEMCHEN ".example", AI_IDN | AI_CANONNAME | AI_CANONIDN,
-            "error: Name or service not known\n");
+            "error: Parameter string not correctly encoded\n");
 
   /* Non-IDN CNAME.  */
   check_ai ("with.cname.example", 0,