From patchwork Thu Mar 19 21:01:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 38585 Return-Path: X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by sourceware.org (Postfix) with ESMTPS id 236B33861C30 for ; Thu, 19 Mar 2020 21:02:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 236B33861C30 Received: by mail-qt1-x82f.google.com with SMTP id l13so3165634qtv.10 for ; Thu, 19 Mar 2020 14:02:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=c4n1Jwz1q6cVtn/LlL3eIUXwtUApMZw0goktN4lcBG4=; b=RSAY7X6BuDNKrbPxOcofJeXuasTNhbtNjLZpiULGepe5IGnTWom6FTgx5qTmFz1UoA dOt2I+Z2p35cC0gnSe0k3o5Lh0vLMgbHE8s7yD3GY41j3O8fC8FzlubH4G/+EpabsyRX XwhQhCHYrads1NLFMb+TviIU19RSSnjbgh1QVKbbB1MIfLi1uvBLlnJGrJFlyHTmlbki vMCzVg2HzfUK/TXmIgsjTu/O0MA2YgiL5DOsMwLhseiHDkEAjIPrF0Z7bodnwJgAhX/2 JkKxQFkHqG5W6KG2TWErDDbxvexmx8rVfqLd1AtJYobcfbM/8c5m9/NNpL9qWgvPq63c fARg== X-Gm-Message-State: ANhLgQ3DoPi6YhlY4kBibKJztAp6vCdIh7swc+h01dmOMlyayBwnBgkG U2vpjIJRk78zfMvcc8CPtuYhcP7YHbs= X-Google-Smtp-Source: ADFU+vtFR9ZnbGdFDfvKThKe6FWAtj+aaTbGjDYjAYiaak3A5Arse1/hzspfpLastkQUDulE88mHmg== X-Received: by 2002:ac8:43d2:: with SMTP id w18mr5050090qtn.192.1584651720686; Thu, 19 Mar 2020 14:02:00 -0700 (PDT) Received: from localhost.localdomain ([177.194.48.209]) by smtp.googlemail.com with ESMTPSA id b7sm2357126qkh.0.2020.03.19.14.01.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Mar 2020 14:02:00 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org Subject: [PATCH] stdio: Remove memory leak from multibyte convertion (BZ#25691) Date: Thu, 19 Mar 2020 18:01:55 -0300 Message-Id: <20200319210155.31625-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.17.1 X-Spam-Status: No, score=-25.2 required=5.0 tests=DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Mar 2020 21:02:04 -0000 From: Florian Weimer This is an updated version of a previous patch [1] with the following changes: - Use compiler overflow builtins on done_add_func function. - Define the scratch +utstring_converted_wide_string using CHAR_T. - Added a testcase and mention the bug report. Both default and wide printf functions might leak memory when manipulate multibyte characters conversion depending of the size of the input (whether __libc_use_alloca trigger or not the fallback heap allocation). This patch fixes it by removing the extra memory allocation on string formatting with conversion parts. The testcase uses input argument size that trigger memory leaks on unpatched code (using a scratch buffer the threashold to use heap allocation is lower). Checked on x86_64-linux-gnu and i686-linux-gnu. Co-authored-by: Adhemerval Zanella [1] https://sourceware.org/pipermail/libc-alpha/2017-June/082098.html --- stdio-common/Makefile | 9 +- stdio-common/tst-printf-bz25691.c | 110 ++++++++++ stdio-common/vfprintf-internal.c | 323 +++++++++++++++++------------- 3 files changed, 296 insertions(+), 146 deletions(-) create mode 100644 stdio-common/tst-printf-bz25691.c diff --git a/stdio-common/Makefile b/stdio-common/Makefile index 95af0c12d7..0a8d66b846 100644 --- a/stdio-common/Makefile +++ b/stdio-common/Makefile @@ -66,6 +66,7 @@ tests := tstscanf test_rdwr test-popen tstgetln test-fseek \ tst-scanf-round \ tst-renameat2 tst-bz11319 tst-bz11319-fortify2 \ scanf14a scanf16a \ + tst-printf-bz25691 test-srcs = tst-unbputc tst-printf tst-printfsz-islongdouble @@ -75,10 +76,12 @@ tests-special += $(objpfx)tst-unbputc.out $(objpfx)tst-printf.out \ $(objpfx)tst-printf-bz18872-mem.out \ $(objpfx)tst-setvbuf1-cmp.out \ $(objpfx)tst-vfprintf-width-prec-mem.out \ - $(objpfx)tst-printfsz-islongdouble.out + $(objpfx)tst-printfsz-islongdouble.out \ + $(objpfx)tst-printf-bz25691-mem.out generated += tst-printf-bz18872.c tst-printf-bz18872.mtrace \ tst-printf-bz18872-mem.out \ - tst-vfprintf-width-prec.mtrace tst-vfprintf-width-prec-mem.out + tst-vfprintf-width-prec.mtrace tst-vfprintf-width-prec-mem.out \ + tst-printf-bz25691.mtrace tst-printf-bz25691-mem.out endif include ../Rules @@ -100,6 +103,8 @@ endif tst-printf-bz18872-ENV = MALLOC_TRACE=$(objpfx)tst-printf-bz18872.mtrace tst-vfprintf-width-prec-ENV = \ MALLOC_TRACE=$(objpfx)tst-vfprintf-width-prec.mtrace +tst-printf-bz25691-ENV = \ + MALLOC_TRACE=$(objpfx)tst-printf-bz25691.mtrace $(objpfx)tst-unbputc.out: tst-unbputc.sh $(objpfx)tst-unbputc $(SHELL) $< $(common-objpfx) '$(test-program-prefix)' > $@; \ diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c new file mode 100644 index 0000000000..4ebb91fb44 --- /dev/null +++ b/stdio-common/tst-printf-bz25691.c @@ -0,0 +1,110 @@ +/* Test for memory leak with large width (BZ#25691). + Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +static int +do_test (void) +{ + mtrace (); + + /* For 's' conversion specifier with 'l' modifier the array must be + converted to multibyte characters up to the precision specific + value. This might require a heap allocated temporary buffer for + large precisions. */ + { + /* The input size value is to force a heap allocation on temporary + buffer. */ + const size_t winputsize = 64 * 1024 + 1; + wchar_t *winput = xmalloc (winputsize * sizeof (wchar_t)); + wmemset (winput, L'a', winputsize - 1); + winput[winputsize - 1] = L'\0'; + + char result[9]; + const char expected[] = "aaaaaaaa"; + int ret; + + ret = snprintf (result, sizeof (result), "%.65537ls", winput); + TEST_COMPARE (ret, winputsize - 1); + TEST_COMPARE_BLOB (result, sizeof (result), expected, sizeof (expected)); + + ret = snprintf (result, sizeof (result), "%ls", winput); + TEST_COMPARE (ret, winputsize - 1); + TEST_COMPARE_BLOB (result, sizeof (result), expected, sizeof (expected)); + + free (winput); + } + + /* For 's' converstion specifier the array is interpreted as a multibyte + character sequence and converted to wide characters up to the precision + specific value. This might require a heap allocated temporary buffer + for large precisions. */ + { + /* The input size value is to force a heap allocation on temporary + buffer. */ + const size_t mbssize = 32 * 1024; + char *mbs = xmalloc (mbssize); + memset (mbs, 'a', mbssize - 1); + mbs[mbssize - 1] = '\0'; + + const size_t expectedsize = 32 * 1024; + wchar_t *expected = xmalloc (expectedsize * sizeof (wchar_t)); + wmemset (expected, L'a', expectedsize - 1); + expected[expectedsize-1] = L'\0'; + + const size_t resultsize = mbssize * sizeof (wchar_t); + wchar_t *result = xmalloc (resultsize); + int ret; + + ret = swprintf (result, resultsize, L"%.65537s", mbs); + TEST_COMPARE (ret, mbssize - 1); + TEST_COMPARE_BLOB (result, (ret + 1) * sizeof (wchar_t), + expected, expectedsize * sizeof (wchar_t)); + + ret = swprintf (result, resultsize, L"%1$.65537s", mbs); + TEST_COMPARE (ret, mbssize - 1); + TEST_COMPARE_BLOB (result, (ret + 1) * sizeof (wchar_t), + expected, expectedsize * sizeof (wchar_t)); + + /* Same test, but with an invalid multibyte sequence. */ + mbs[mbssize - 2] = 0xff; + + ret = swprintf (result, resultsize, L"%.65537s", mbs); + TEST_COMPARE (ret, -1); + + ret = swprintf (result, resultsize, L"%1$.65537s", mbs); + TEST_COMPARE (ret, -1); + + free (mbs); + free (result); + free (expected); + } + + return 0; +} + +#include diff --git a/stdio-common/vfprintf-internal.c b/stdio-common/vfprintf-internal.c index 3be92d4b6e..d34e31746c 100644 --- a/stdio-common/vfprintf-internal.c +++ b/stdio-common/vfprintf-internal.c @@ -119,22 +119,38 @@ while (0) #endif -#define done_add(val) \ - do { \ - unsigned int _val = val; \ - assert ((unsigned int) done < (unsigned int) INT_MAX); \ - if (__glibc_unlikely (INT_MAX - done < _val)) \ - { \ - done = -1; \ - __set_errno (EOVERFLOW); \ - goto all_done; \ - } \ - done += _val; \ - } while (0) +/* Add LENGTH to DONE. Return the new value of DONE, or -1 on + overflow (and set errno accordingly). */ +static inline int +done_add_func (size_t length, int done) +{ + if (done < 0) + return done; + int ret; + if (__builtin_add_overflow (done, length, &ret)) + { + __set_errno (EOVERFLOW); + return -1; + } + return ret; +} + +#define done_add(val) \ + do \ + { \ + /* Ensure that VAL has a type similar to int. */ \ + _Static_assert (sizeof (val) == sizeof (int), "value int size"); \ + _Static_assert ((__typeof__ (val)) -1 < 0, "value signed"); \ + done = done_add_func ((val), done); \ + if (done < 0) \ + goto all_done; \ + } \ + while (0) #ifndef COMPILE_WPRINTF # define vfprintf __vfprintf_internal # define CHAR_T char +# define OTHER_CHAR_T wchar_t # define UCHAR_T unsigned char # define INT_T int typedef const char *THOUSANDS_SEP_T; @@ -143,25 +159,14 @@ typedef const char *THOUSANDS_SEP_T; # define STR_LEN(Str) strlen (Str) # define PUT(F, S, N) _IO_sputn ((F), (S), (N)) -# define PAD(Padchar) \ - do { \ - if (width > 0) \ - { \ - ssize_t written = _IO_padn (s, (Padchar), width); \ - if (__glibc_unlikely (written != width)) \ - { \ - done = -1; \ - goto all_done; \ - } \ - done_add (written); \ - } \ - } while (0) # define PUTC(C, F) _IO_putc_unlocked (C, F) # define ORIENT if (_IO_vtable_offset (s) == 0 && _IO_fwide (s, -1) != -1)\ return -1 +# define CONVERT_FROM_OTHER_STRING __wcsrtombs #else # define vfprintf __vfwprintf_internal # define CHAR_T wchar_t +# define OTHER_CHAR_T char /* This is a hack!!! There should be a type uwchar_t. */ # define UCHAR_T unsigned int /* uwchar_t */ # define INT_T wint_t @@ -173,21 +178,9 @@ typedef wchar_t THOUSANDS_SEP_T; # include <_itowa.h> # define PUT(F, S, N) _IO_sputn ((F), (S), (N)) -# define PAD(Padchar) \ - do { \ - if (width > 0) \ - { \ - ssize_t written = _IO_wpadn (s, (Padchar), width); \ - if (__glibc_unlikely (written != width)) \ - { \ - done = -1; \ - goto all_done; \ - } \ - done_add (written); \ - } \ - } while (0) # define PUTC(C, F) _IO_putwc_unlocked (C, F) # define ORIENT if (_IO_fwide (s, 1) != 1) return -1 +# define CONVERT_FROM_OTHER_STRING __mbsrtowcs # undef _itoa # define _itoa(Val, Buf, Base, Case) _itowa (Val, Buf, Base, Case) @@ -196,6 +189,33 @@ typedef wchar_t THOUSANDS_SEP_T; # define EOF WEOF #endif +static inline int +pad_func (FILE *s, CHAR_T padchar, int width, int done) +{ + if (width > 0) + { + ssize_t written; +#ifndef COMPILE_WPRINTF + written = _IO_padn (s, padchar, width); +#else + written = _IO_wpadn (s, padchar, width); +#endif + if (__glibc_unlikely (written != width)) + return -1; + return done_add_func (width, done); + } + return done; +} + +#define PAD(Padchar) \ + do \ + { \ + done = pad_func (s, (Padchar), width, done); \ + if (done < 0) \ + goto all_done; \ + } \ + while (0) + #include "_i18n_number.h" /* Include the shared code for parsing the format string. */ @@ -215,24 +235,115 @@ typedef wchar_t THOUSANDS_SEP_T; } \ while (0) -#define outstring(String, Len) \ - do \ - { \ - assert ((size_t) done <= (size_t) INT_MAX); \ - if ((size_t) PUT (s, (String), (Len)) != (size_t) (Len)) \ - { \ - done = -1; \ - goto all_done; \ - } \ - if (__glibc_unlikely (INT_MAX - done < (Len))) \ - { \ - done = -1; \ - __set_errno (EOVERFLOW); \ - goto all_done; \ - } \ - done += (Len); \ - } \ - while (0) +static inline int +outstring_func (FILE *s, const UCHAR_T *string, size_t length, int done) +{ + assert ((size_t) done <= (size_t) INT_MAX); + if ((size_t) PUT (s, string, length) != (size_t) (length)) + return -1; + return done_add_func (length, done); +} + +#define outstring(String, Len) \ + do \ + { \ + const void *string_ = (String); \ + done = outstring_func (s, string_, (Len), done); \ + if (done < 0) \ + goto all_done; \ + } \ + while (0) + +/* Write the string SRC to S. If PREC is non-negative, write at most + PREC bytes. If LEFT is true, perform left justification. */ +static int +outstring_converted_wide_string (FILE *s, const OTHER_CHAR_T *src, int prec, + int width, bool left, int done) +{ + /* Use a small buffer to combine processing of multiple characters. + CONVERT_FROM_OTHER_STRING expects the buffer size in (wide) + characters, and buf_length counts that. */ + enum { buf_length = 256 / sizeof (CHAR_T) }; + CHAR_T buf[buf_length]; + _Static_assert (sizeof (buf) > MB_LEN_MAX, + "buffer is large enough for a single multi-byte character"); + + /* Add the initial padding if needed. */ + if (width > 0 && !left) + { + /* Make a first pass to find the output width, so that we can + add the required padding. */ + mbstate_t mbstate = { 0 }; + const OTHER_CHAR_T *src_copy = src; + size_t total_written; + if (prec < 0) + total_written = CONVERT_FROM_OTHER_STRING + (NULL, &src_copy, 0, &mbstate); + else + { + /* The source might not be null-terminated. Enforce the + limit manually, based on the output length. */ + total_written = 0; + size_t limit = prec; + while (limit > 0 && src_copy != NULL) + { + size_t write_limit = buf_length; + if (write_limit > limit) + write_limit = limit; + size_t written = CONVERT_FROM_OTHER_STRING + (buf, &src_copy, write_limit, &mbstate); + if (written == (size_t) -1) + return -1; + if (written == 0) + break; + total_written += written; + limit -= written; + } + } + + /* Output initial padding. */ + if (total_written < width) + { + done = pad_func (s, L_(' '), width - total_written, done); + if (done < 0) + return done; + } + } + + /* Convert the input string, piece by piece. */ + size_t total_written = 0; + { + mbstate_t mbstate = { 0 }; + /* If prec is negative, remaining is not decremented, otherwise, + it serves as the write limit. */ + size_t remaining = -1; + if (prec >= 0) + remaining = prec; + while (remaining > 0 && src != NULL) + { + size_t write_limit = buf_length; + if (remaining < write_limit) + write_limit = remaining; + size_t written = CONVERT_FROM_OTHER_STRING + (buf, &src, write_limit, &mbstate); + if (written == (size_t) -1) + return -1; + if (written == 0) + break; + done = outstring_func (s, (const UCHAR_T *) buf, written, done); + if (done < 0) + return done; + total_written += written; + if (prec >= 0) + remaining -= written; + } + } + + /* Add final padding. */ + if (width > 0 && left && total_written < width) + return pad_func (s, L_(' '), width - total_written, done); + return done; +} /* For handling long_double and longlong we use the same flag. If `long' and `long long' are effectively the same type define it to @@ -1022,7 +1133,6 @@ static const uint8_t jump_table[] = LABEL (form_string): \ { \ size_t len; \ - int string_malloced; \ \ /* The string argument could in fact be `char *' or `wchar_t *'. \ But this should not make a difference here. */ \ @@ -1034,7 +1144,6 @@ static const uint8_t jump_table[] = /* Entry point for printing other strings. */ \ LABEL (print_string): \ \ - string_malloced = 0; \ if (string == NULL) \ { \ /* Write "(null)" if there's space. */ \ @@ -1051,41 +1160,12 @@ static const uint8_t jump_table[] = } \ else if (!is_long && spec != L_('S')) \ { \ - /* This is complicated. We have to transform the multibyte \ - string into a wide character string. */ \ - const char *mbs = (const char *) string; \ - mbstate_t mbstate; \ - \ - len = prec != -1 ? __strnlen (mbs, (size_t) prec) : strlen (mbs); \ - \ - /* Allocate dynamically an array which definitely is long \ - enough for the wide character version. Each byte in the \ - multi-byte string can produce at most one wide character. */ \ - if (__glibc_unlikely (len > SIZE_MAX / sizeof (wchar_t))) \ - { \ - __set_errno (EOVERFLOW); \ - done = -1; \ - goto all_done; \ - } \ - else if (__libc_use_alloca (len * sizeof (wchar_t))) \ - string = (CHAR_T *) alloca (len * sizeof (wchar_t)); \ - else if ((string = (CHAR_T *) malloc (len * sizeof (wchar_t))) \ - == NULL) \ - { \ - done = -1; \ - goto all_done; \ - } \ - else \ - string_malloced = 1; \ - \ - memset (&mbstate, '\0', sizeof (mbstate_t)); \ - len = __mbsrtowcs (string, &mbs, len, &mbstate); \ - if (len == (size_t) -1) \ - { \ - /* Illegal multibyte character. */ \ - done = -1; \ - goto all_done; \ - } \ + done = outstring_converted_wide_string \ + (s, (const char *) string, prec, width, left, done); \ + if (done < 0) \ + goto all_done; \ + /* The padding has already been written. */ \ + break; \ } \ else \ { \ @@ -1108,8 +1188,6 @@ static const uint8_t jump_table[] = outstring (string, len); \ if (left) \ PAD (L' '); \ - if (__glibc_unlikely (string_malloced)) \ - free (string); \ } \ break; #else @@ -1158,7 +1236,6 @@ static const uint8_t jump_table[] = LABEL (form_string): \ { \ size_t len; \ - int string_malloced; \ \ /* The string argument could in fact be `char *' or `wchar_t *'. \ But this should not make a difference here. */ \ @@ -1170,7 +1247,6 @@ static const uint8_t jump_table[] = /* Entry point for printing other strings. */ \ LABEL (print_string): \ \ - string_malloced = 0; \ if (string == NULL) \ { \ /* Write "(null)" if there's space. */ \ @@ -1196,51 +1272,12 @@ static const uint8_t jump_table[] = } \ else \ { \ - const wchar_t *s2 = (const wchar_t *) string; \ - mbstate_t mbstate; \ - \ - memset (&mbstate, '\0', sizeof (mbstate_t)); \ - \ - if (prec >= 0) \ - { \ - /* The string `s2' might not be NUL terminated. */ \ - if (__libc_use_alloca (prec)) \ - string = (char *) alloca (prec); \ - else if ((string = (char *) malloc (prec)) == NULL) \ - { \ - done = -1; \ - goto all_done; \ - } \ - else \ - string_malloced = 1; \ - len = __wcsrtombs (string, &s2, prec, &mbstate); \ - } \ - else \ - { \ - len = __wcsrtombs (NULL, &s2, 0, &mbstate); \ - if (len != (size_t) -1) \ - { \ - assert (__mbsinit (&mbstate)); \ - s2 = (const wchar_t *) string; \ - if (__libc_use_alloca (len + 1)) \ - string = (char *) alloca (len + 1); \ - else if ((string = (char *) malloc (len + 1)) == NULL) \ - { \ - done = -1; \ - goto all_done; \ - } \ - else \ - string_malloced = 1; \ - (void) __wcsrtombs (string, &s2, len + 1, &mbstate); \ - } \ - } \ - \ - if (len == (size_t) -1) \ - { \ - /* Illegal wide-character string. */ \ - done = -1; \ - goto all_done; \ - } \ + done = outstring_converted_wide_string \ + (s, (const wchar_t *) string, prec, width, left, done); \ + if (done < 0) \ + goto all_done; \ + /* The padding has already been written. */ \ + break; \ } \ \ if ((width -= len) < 0) \ @@ -1254,8 +1291,6 @@ static const uint8_t jump_table[] = outstring (string, len); \ if (left) \ PAD (' '); \ - if (__glibc_unlikely (string_malloced)) \ - free (string); \ } \ break; #endif