intl: Proof against invalid offset/length

Message ID m3oao06pj3.fsf-ueno@gnu.org
State New, archived
Headers

Commit Message

Daiki Ueno March 11, 2015, 6:01 a.m. UTC
  Hello,

Last year, Jakub Wilk of Debian reported several crashes of msgunfmt
caused by artificial MO files:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769901
and we fixed them in gettext-tools (msgunfmt uses a different MO file
parser than intl for some reason).

However, I was too lazy to check if that is also the case for
gettext-runtime (libintl) and thus GLIBC, and it is:

  $ mkdir -p en/LC_MESSAGES
  $ cp gettext/gettext-tools/tests/overflow*.mo en/LC_MESSAGES
  $ gdb `which gettext`
  (gdb) set environment TEXTDOMAINDIR .
  (gdb) set environment LC_ALL en_US.utf8
  (gdb) run --domain=overflow-2 foo
  Starting program: /usr/bin/gettext --domain=overflow-2 foo

  Program received signal SIGSEGV, Segmentation fault.
  0x000000365fa94190 in __memcpy_sse2 () from /lib64/libc.so.6
  (gdb) bt
  #0  0x000000365fa94190 in __memcpy_sse2 () from /lib64/libc.so.6
  #1  0x000000365fa30a72 in _nl_load_domain () from /lib64/libc.so.6
  #2  0x000000365fa2fb9e in _nl_find_domain () from /lib64/libc.so.6
  #3  0x000000365fa2f24c in __dcigettext () from /lib64/libc.so.6
  #4  0x000000000040197c in main ()
  (gdb) 

It is surprising that there are no checks of lengths/offsets read from
MO files.  Currently, I'm thinking of the attached patch (to gettext),
which is a bit complicated.  If anyone could suggest a cleaner approach,
I'd appreciate it.

Regards,
--
Daiki Ueno
  

Comments

Carlos O'Donell March 11, 2015, 6:39 a.m. UTC | #1
On 03/11/2015 02:01 AM, Daiki Ueno wrote:
> It is surprising that there are no checks of lengths/offsets read from
> MO files.  Currently, I'm thinking of the attached patch (to gettext),
> which is a bit complicated.  If anyone could suggest a cleaner approach,
> I'd appreciate it.

Why does it surprise you?

The MO files are writable only by root, so it's not a security issue
because if you could write to them you'd be root, and you'd have
full access to the system anyway.

The other alternative is that the filesystem is corrupted and loading
the MO file crashes your application. This is expected since the
filesystem is corrupted. You are suggesting we add some rather complex
checking for the possibly low probability case of a corrupted
filesystem. If the filesystem is corrupted other things will be failing
and you need to fix the corruption.

What strong technical reasons do you have for propsing these additional
checks?

Cheers,
Carlos.
  
Mike Frysinger March 11, 2015, 7:10 a.m. UTC | #2
On 11 Mar 2015 02:39, Carlos O'Donell wrote:
> On 03/11/2015 02:01 AM, Daiki Ueno wrote:
> > It is surprising that there are no checks of lengths/offsets read from
> > MO files.  Currently, I'm thinking of the attached patch (to gettext),
> > which is a bit complicated.  If anyone could suggest a cleaner approach,
> > I'd appreciate it.
> 
> Why does it surprise you?
> 
> The MO files are writable only by root, so it's not a security issue
> because if you could write to them you'd be root, and you'd have
> full access to the system anyway.
> 
> The other alternative is that the filesystem is corrupted and loading
> the MO file crashes your application. This is expected since the
> filesystem is corrupted. You are suggesting we add some rather complex
> checking for the possibly low probability case of a corrupted
> filesystem. If the filesystem is corrupted other things will be failing
> and you need to fix the corruption.
> 
> What strong technical reasons do you have for propsing these additional
> checks?

i thought you could control things via $TEXTDOMAIN/$TEXTDOMAINDIR, but it looks 
like just `bash` and `gettext` respect those ?  so if you have a shell script 
that either directly supports translated messages (e.g. bash's $"..."), or 
indirectly (e.g. manually calling `gettext`), and it doesn't lock down the 
TEXTDOMAINDIR envvar properly, you could get them to load untrusted data and 
crash due to the omitted range checks in glibc ?

i'm not really familiar with how much gettext relies on glibc though or if it 
just entirely uses its own copy of code.

using Debian's code search [1], it looks like git provides GIT_TEXTDOMAINDIR to 
override the default TEXTDOMAINDIR.  i stopped at page ~6 ;).
-mike

[1] http://codesearch.debian.net/perpackage-results/TEXTDOMAINDIR/
  
Daiki Ueno March 11, 2015, 7:31 a.m. UTC | #3
Mike Frysinger <vapier@gentoo.org> writes:

>> What strong technical reasons do you have for propsing these additional
>> checks?
>
> i thought you could control things via $TEXTDOMAIN/$TEXTDOMAINDIR, but it looks 
> like just `bash` and `gettext` respect those ?  so if you have a shell script 
> that either directly supports translated messages (e.g. bash's $"..."), or 
> indirectly (e.g. manually calling `gettext`), and it doesn't lock down the 
> TEXTDOMAINDIR envvar properly, you could get them to load untrusted data and 
> crash due to the omitted range checks in glibc ?

bindtextdomain is the only place to configure the location, and it
seems to be the design:
http://thread.gmane.org/gmane.comp.lib.glibc.alpha/575

However, I too observed a few programs which use the location obtained
from environment variable.  Perhaps it would be nice to suggest using
the fixed location in the documentation.

Regards,
--
Daiki Ueno
  
Bruno Haible March 12, 2015, 1:04 a.m. UTC | #4
Carlos O'Donell wrote:
> The MO files are writable only by root, so it's not a security issue
> because if you could write to them you'd be root, and you'd have
> full access to the system anyway.

Your argument is similar to Ulrich Drepper's argument: MO files are part
of the distribution of a package, like executables and shared libraries.
glibc does not check against invalid offsets in shared libraries either,
and the kernel does not check against illegal instructions that happen
to exist in executables and shared libraries.

But these arguments don't consider the LANGUAGE variable. The original
intent of LANGUAGE was that it contains colon-separated language or locale
identifiers. But in fact, you can specify relative files names that start
with "../", and thus you can make the _nl_load_domain function in glibc
access files anywhere in the file system. For example:

$ LANGUAGE=../../../../../../../../../../../../../../tmp/hack/crashing-mos strace cp . .
...
open("/usr/share/locale/../../../../../../../../../../../../../../tmp/hack/crashing-mos/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/../../../../../../../../../../../../../../tmp/hack/crashing-mos/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
...
If I had put a hacked .mo file at /tmp/hack/crashing-mos/LC_MESSAGES/coreutils.mo
I would have crashed the 'cp' program from coreutils. Likewise with
any program from any package.

Bruno
  
Florian Weimer March 13, 2015, 2:29 p.m. UTC | #5
On 03/12/2015 02:04 AM, Bruno Haible wrote:

> But these arguments don't consider the LANGUAGE variable. The original
> intent of LANGUAGE was that it contains colon-separated language or locale
> identifiers. But in fact, you can specify relative files names that start
> with "../", and thus you can make the _nl_load_domain function in glibc
> access files anywhere in the file system.

Yes, this is bug 17142.
  
Carlos O'Donell March 13, 2015, 2:31 p.m. UTC | #6
On 03/13/2015 10:29 AM, Florian Weimer wrote:
> On 03/12/2015 02:04 AM, Bruno Haible wrote:
> 
>> But these arguments don't consider the LANGUAGE variable. The original
>> intent of LANGUAGE was that it contains colon-separated language or locale
>> identifiers. But in fact, you can specify relative files names that start
>> with "../", and thus you can make the _nl_load_domain function in glibc
>> access files anywhere in the file system.
> 
> Yes, this is bug 17142.

In my opinion we need to restrict LANGUAGE just like we restricted all
other the other variables in CVE-2014-0475.

Cheers,
Carlos.
  
Daiki Ueno March 20, 2015, 1:06 a.m. UTC | #7
"Carlos O'Donell" <carlos@redhat.com> writes:

> On 03/13/2015 10:29 AM, Florian Weimer wrote:
>> On 03/12/2015 02:04 AM, Bruno Haible wrote:
>> 
>>> But these arguments don't consider the LANGUAGE variable. The original
>>> intent of LANGUAGE was that it contains colon-separated language or locale
>>> identifiers. But in fact, you can specify relative files names that start
>>> with "../", and thus you can make the _nl_load_domain function in glibc
>>> access files anywhere in the file system.
>> 
>> Yes, this is bug 17142.
>
> In my opinion we need to restrict LANGUAGE just like we restricted all
> other the other variables in CVE-2014-0475.

I agree.  Now that intl/ is almost[1] synchronized with gettext, what's
blocking this?  I'm happy to include the patch in the upcoming gettext
release so non-glibc consumers also benefit from it.

Footnotes: 
[1]  except a minor adjustment for non-glibc systems:
     https://lists.gnu.org/archive/html/bug-gettext/2015-01/msg00029.html
     and absolute pathname handling, which I guess could be reverted because
     of not much use:
     http://git.savannah.gnu.org/cgit/gettext.git/commit/?id=279b57fc

Regards,
  
Florian Weimer March 20, 2015, 9:10 a.m. UTC | #8
On 03/20/2015 02:06 AM, Daiki Ueno wrote:

> I agree.  Now that intl/ is almost[1] synchronized with gettext, what's
> blocking this?  I'm happy to include the patch in the upcoming gettext
> release so non-glibc consumers also benefit from it.

The patch will use getauxval(AT_SECURE) or __libc_enable_secure (or
issetuugid on other systems, but which I cannot test).  It is not going
to be very portable.
  
Daiki Ueno March 21, 2015, 3:17 a.m. UTC | #9
Florian Weimer <fweimer@redhat.com> writes:

> The patch will use getauxval(AT_SECURE) or __libc_enable_secure (or
> issetuugid on other systems, but which I cannot test).  It is not going
> to be very portable.

I see (though I'm a bit confused that you removed the use of
__libc_enable_secure in CVE-2014-0475).  Can't you use secure_getenv,
which Gnulib provides a replacement, compare the result with
the normal getenv, and apply the pathname check if needed?

Regards,
  
Florian Weimer March 23, 2015, 2:14 p.m. UTC | #10
On 03/21/2015 04:17 AM, Daiki Ueno wrote:
> Florian Weimer <fweimer@redhat.com> writes:
> 
>> The patch will use getauxval(AT_SECURE) or __libc_enable_secure (or
>> issetuugid on other systems, but which I cannot test).  It is not going
>> to be very portable.
> 
> I see (though I'm a bit confused that you removed the use of
> __libc_enable_secure in CVE-2014-0475).  Can't you use secure_getenv,
> which Gnulib provides a replacement, compare the result with
> the normal getenv, and apply the pathname check if needed?

Hmm, I was under the impression that absolute paths for LANGUAGE were a
supported feature.  If that's not the case, we can just reject directory
traversal and confine lookups to the system locale directory, like we
did for the other locale files.
  

Patch

From 8909a24b692e38472d7c05911f4b65eb24292c66 Mon Sep 17 00:00:00 2001
From: Daiki Ueno <ueno@gnu.org>
Date: Wed, 11 Mar 2015 14:43:34 +0900
Subject: [PATCH] intl: Proof against invalid offset/length

* gettext-runtime/intl/loadmsgcat.c (_nl_load_domain): Check the upper
bound of offset/length values read from MO file.

* gettext-tools/tests/gettext-9: New file.
* gettext-tools/tests/Makefile.am (TESTS): Add new test.
---
 gettext-runtime/intl/ChangeLog    |   6 ++
 gettext-runtime/intl/loadmsgcat.c | 132 ++++++++++++++++++++++++++++++--------
 gettext-tools/tests/ChangeLog     |   5 ++
 gettext-tools/tests/Makefile.am   |   2 +-
 gettext-tools/tests/gettext-9     |  24 +++++++
 5 files changed, 142 insertions(+), 27 deletions(-)
 create mode 100755 gettext-tools/tests/gettext-9

diff --git a/gettext-runtime/intl/ChangeLog b/gettext-runtime/intl/ChangeLog
index d32774c..ed0a14d 100644
--- a/gettext-runtime/intl/ChangeLog
+++ b/gettext-runtime/intl/ChangeLog
@@ -1,3 +1,9 @@ 
+2015-03-11  Daiki Ueno  <ueno@gnu.org>
+
+	intl: Proof against invalid offset/length
+	* loadmsgcat.c (_nl_load_domain): Check the upper bound of
+	offset/length values read from MO file.
+
 2015-01-22  Daiki Ueno  <ueno@gnu.org>
 
 	intl: Update from gnulib
diff --git a/gettext-runtime/intl/loadmsgcat.c b/gettext-runtime/intl/loadmsgcat.c
index 8eb77d8..b97b316 100644
--- a/gettext-runtime/intl/loadmsgcat.c
+++ b/gettext-runtime/intl/loadmsgcat.c
@@ -799,6 +799,8 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
   int revision;
   const char *nullentry;
   size_t nullentrylen;
+  size_t orig_tab_offset;
+  size_t trans_tab_offset;
 
   __libc_lock_lock_recursive (lock);
   if (domain_file->decided != 0)
@@ -921,6 +923,7 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
   domain->mmap_size = size;
   domain->must_swap = data->magic != _MAGIC;
   domain->malloced = NULL;
+  domain->hash_tab = NULL;
 
   /* Fill in the information about the available tables.  */
   revision = W (domain->must_swap, data->revision);
@@ -930,16 +933,32 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
     case 0:
     case 1:
       domain->nstrings = W (domain->must_swap, data->nstrings);
+      orig_tab_offset = W (domain->must_swap, data->orig_tab_offset);
+      if (__builtin_expect (orig_tab_offset >= size, 0))
+	goto invalid;
       domain->orig_tab = (const struct string_desc *)
-	((char *) data + W (domain->must_swap, data->orig_tab_offset));
+	((char *) data + orig_tab_offset);
+      trans_tab_offset = W (domain->must_swap, data->trans_tab_offset);
+      if (__builtin_expect (trans_tab_offset >= size, 0))
+	goto invalid;
       domain->trans_tab = (const struct string_desc *)
-	((char *) data + W (domain->must_swap, data->trans_tab_offset));
+	((char *) data + trans_tab_offset);
       domain->hash_size = W (domain->must_swap, data->hash_tab_size);
-      domain->hash_tab =
-	(domain->hash_size > 2
-	 ? (const nls_uint32 *)
-	   ((char *) data + W (domain->must_swap, data->hash_tab_offset))
-	 : NULL);
+      if (__builtin_expect (domain->hash_size
+			    >= SIZE_MAX / sizeof (nls_uint32), 0))
+	goto invalid;
+      if (domain->hash_size > 2)
+	{
+	  size_t hash_tab_byte_size = domain->hash_size * sizeof (nls_uint32);
+	  size_t hash_tab_offset = W (domain->must_swap, data->hash_tab_offset);
+	  if (__builtin_expect (hash_tab_offset
+				>= SIZE_MAX - hash_tab_byte_size, 0)
+	      || __builtin_expect (hash_tab_offset + hash_tab_byte_size
+				   >= size, 0))
+	    goto invalid;
+	  domain->hash_tab = (const nls_uint32 *)
+	    ((char *) data + hash_tab_offset);
+	}
       domain->must_swap_hash_tab = domain->must_swap;
 
       /* Now dispatch on the minor revision.  */
@@ -968,6 +987,9 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
 		const char **sysdep_segment_values;
 		const nls_uint32 *orig_sysdep_tab;
 		const nls_uint32 *trans_sysdep_tab;
+		size_t sysdep_segments_offset;
+		size_t orig_sysdep_tab_offset;
+		size_t trans_sysdep_tab_offset;
 		nls_uint32 n_inmem_sysdep_strings;
 		size_t memneed;
 		char *mem;
@@ -979,20 +1001,30 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
 		/* Get the values of the system dependent segments.  */
 		n_sysdep_segments =
 		  W (domain->must_swap, data->n_sysdep_segments);
+		sysdep_segments_offset =
+		  W (domain->must_swap, data->sysdep_segments_offset);
+		if (__builtin_expect (sysdep_segments_offset >= size, 0))
+		  goto invalid;
 		sysdep_segments = (const struct sysdep_segment *)
-		  ((char *) data
-		   + W (domain->must_swap, data->sysdep_segments_offset));
+		  ((char *) data + sysdep_segments_offset);
 		sysdep_segment_values =
 		  (const char **)
 		  alloca (n_sysdep_segments * sizeof (const char *));
 		for (i = 0; i < n_sysdep_segments; i++)
 		  {
-		    const char *name =
-		      (char *) data
-		      + W (domain->must_swap, sysdep_segments[i].offset);
+		    const char *name;
+		    size_t nameoff =
+		      W (domain->must_swap, sysdep_segments[i].offset);
 		    nls_uint32 namelen =
 		      W (domain->must_swap, sysdep_segments[i].length);
+		    if (__builtin_expect (namelen >= SIZE_MAX - nameoff, 0)
+			|| __builtin_expect (nameoff + namelen >= size, 0))
+		      {
+			freea (sysdep_segment_values);
+			goto invalid;
+		      }
 
+		    name = (char *) data + nameoff;
 		    if (!(namelen > 0 && name[namelen - 1] == '\0'))
 		      {
 			freea (sysdep_segment_values);
@@ -1002,12 +1034,24 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
 		    sysdep_segment_values[i] = get_sysdep_segment_value (name);
 		  }
 
+		orig_sysdep_tab_offset =
+		  W (domain->must_swap, data->orig_sysdep_tab_offset);
+		if (__builtin_expect (orig_sysdep_tab_offset >= size, 0))
+		  {
+		    freea (sysdep_segment_values);
+		    goto invalid;
+		  }
 		orig_sysdep_tab = (const nls_uint32 *)
-		  ((char *) data
-		   + W (domain->must_swap, data->orig_sysdep_tab_offset));
+		  ((char *) data + orig_sysdep_tab_offset);
+		trans_sysdep_tab_offset =
+		  W (domain->must_swap, data->trans_sysdep_tab_offset);
+		if (__builtin_expect (trans_sysdep_tab_offset >= size, 0))
+		  {
+		    freea (sysdep_segment_values);
+		    goto invalid;
+		  }
 		trans_sysdep_tab = (const nls_uint32 *)
-		  ((char *) data
-		   + W (domain->must_swap, data->trans_sysdep_tab_offset));
+		  ((char *) data + trans_sysdep_tab_offset);
 
 		/* Compute the amount of additional memory needed for the
 		   system dependent strings and the augmented hash table.
@@ -1022,22 +1066,52 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
 
 		    for (j = 0; j < 2; j++)
 		      {
-			const struct sysdep_string *sysdep_string =
-			  (const struct sysdep_string *)
-			  ((char *) data
-			   + W (domain->must_swap,
-				j == 0
-				? orig_sysdep_tab[i]
-				: trans_sysdep_tab[i]));
+			size_t sysdep_string_offset;
+			const struct sysdep_string *sysdep_string;
+			size_t offset;
 			size_t need = 0;
-			const struct segment_pair *p = sysdep_string->segments;
+			const struct segment_pair *p;
 
+			sysdep_string_offset = W (domain->must_swap,
+						  j == 0
+						  ? orig_sysdep_tab[i]
+						  : trans_sysdep_tab[i]);
+			if (__builtin_expect (sysdep_string_offset >= size, 0))
+			  {
+			    freea (sysdep_segment_values);
+			    goto invalid;
+			  }
+			sysdep_string = (const struct sysdep_string *)
+			  ((char *) data + sysdep_string_offset);
+
+			offset = W (domain->must_swap, sysdep_string->offset);
+			if (__builtin_expect (offset >= size, 0))
+			  {
+			    freea (sysdep_segment_values);
+			    goto invalid;
+			  }
+
+			p = sysdep_string->segments;
 			if (W (domain->must_swap, p->sysdepref) != SEGMENTS_END)
 			  for (p = sysdep_string->segments;; p++)
 			    {
+			      nls_uint32 segsize = W (domain->must_swap,
+						      p->segsize);
 			      nls_uint32 sysdepref;
+			      size_t n;
 
-			      need += W (domain->must_swap, p->segsize);
+                              if (__builtin_expect (segsize
+                                                    >= SIZE_MAX - offset, 0)
+                                  || __builtin_expect (offset + segsize
+                                                       >= size, 0)
+                                  || __builtin_expect (segsize
+                                                       >= SIZE_MAX - need, 0))
+                                {
+                                  freea (sysdep_segment_values);
+                                  goto invalid;
+                                }
+
+			      need += segsize;
 
 			      sysdepref = W (domain->must_swap, p->sysdepref);
 			      if (sysdepref == SEGMENTS_END)
@@ -1057,7 +1131,13 @@  _nl_load_domain (struct loaded_l10nfile *domain_file,
 				  break;
 				}
 
-			      need += strlen (sysdep_segment_values[sysdepref]);
+			      n = strlen (sysdep_segment_values[sysdepref]);
+			      if (__builtin_expect (n >= SIZE_MAX - need, 0))
+				{
+				  freea (sysdep_segment_values);
+				  goto invalid;
+				}
+			      need += n;
 			    }
 
 			needs[j] = need;
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog
index 72a46d4..9983a14 100644
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,8 @@ 
+2015-03-11  Daiki Ueno  <ueno@gnu.org>
+
+	* gettext-9: New file.
+	* Makefile.am (TESTS): Add new test.
+
 2015-03-05  Daiki Ueno  <ueno@gnu.org>
 
 	* format-kde-kuit-1: New file.
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index a4decaf..7c995e3 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -21,7 +21,7 @@  EXTRA_DIST =
 MOSTLYCLEANFILES = core *.stackdump
 
 TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \
-	gettext-8 \
+	gettext-8 gettext-9 \
 	msgattrib-1 msgattrib-2 msgattrib-3 msgattrib-4 msgattrib-5 \
 	msgattrib-6 msgattrib-7 msgattrib-8 msgattrib-9 msgattrib-10 \
 	msgattrib-11 msgattrib-12 msgattrib-13 msgattrib-14 msgattrib-15 \
diff --git a/gettext-tools/tests/gettext-9 b/gettext-tools/tests/gettext-9
new file mode 100755
index 0000000..7d6d4c8
--- /dev/null
+++ b/gettext-tools/tests/gettext-9
@@ -0,0 +1,24 @@ 
+#! /bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test invalid or incomplete input
+
+: ${LOCALE_FR=fr_FR}
+{ test $LOCALE_FR != none && LC_ALL=$LOCALE_FR ../testlocale; } || {
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no traditional french locale is installed"
+  else
+    echo "Skipping test: no traditional french locale is supported"
+  fi
+  exit 77
+}
+
+: ${GETTEXT=gettext}
+
+test -d fr || mkdir fr
+test -d fr/LC_MESSAGES || mkdir fr/LC_MESSAGES
+
+for n in 1 2 3 4 5 6; do
+  cp "$abs_srcdir"/overflow-$n.mo fr/LC_MESSAGES
+  echo LANGUAGE= LC_ALL=${LOCALE_FR} TEXTDOMAINDIR=. ${GETTEXT} -d overflow-$n foo
+done
-- 
2.1.0