From patchwork Tue Feb 2 13:08:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 41894 X-Patchwork-Delegate: azanella@linux.vnet.ibm.com Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EDB7E3987C11; Tue, 2 Feb 2021 13:08:12 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EDB7E3987C11 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1612271293; bh=aSH7KU9NurB3ny6Oq7RRks+FEPeUlZxtEgvPAbXSXII=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=F6zffjuxs8q2yFDnCJdeBGpHepVYXbPDoRIdUQrzyuExEA3sZHhES3CBQIUpEirxo zVQfZUmm2o3sZUa33JrOKTdlpv+JpZymzTV7cyahZhS7fc8Z+Y1w9TqQH5U9mC4lUA uGgFTyqI7dCSz23VAs7NNrQS0BXkfCRPwFdFfJBY= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com [IPv6:2607:f8b0:4864:20::f34]) by sourceware.org (Postfix) with ESMTPS id 2DA92385801A for ; Tue, 2 Feb 2021 13:08:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2DA92385801A Received: by mail-qv1-xf34.google.com with SMTP id r13so5400071qvm.11 for ; Tue, 02 Feb 2021 05:08:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=aSH7KU9NurB3ny6Oq7RRks+FEPeUlZxtEgvPAbXSXII=; b=m7gb4ZlYWriR06z1338RaY388agAj1PLuW0T5vr6BSGVUG914AqBWJoZdZH18gnE+K 4ALd8JvOP/ZG8iRSLrBjFlaSsuYqfzNlHKrOEe0z18pKwRzd/u6lLtZWoXsST4xxM3Ln P2U56yVb1tNMO7cPmutbgVLJk6LmSboOalWco/5dOKe2s8v3t8zJjhADVXa0Bt3lOM7I ZKiZIBRE2M2WyXiLaTkDip3MMFSYKAez2dOcu2lPza9dZogh7gF6538ZwIlfcHJTX/1v QGGhFXq4DnVTc5JK6M0GZVxEnrdY4YGwy5rMmkSPTuODVnAdOt0yVh2wVhVyuT/uE6SL CGMA== X-Gm-Message-State: AOAM531iGEcoFrnGSm6vmqnmLGYyVvcUHCV7MnAdkXzbioP3ncij3ySx h0v54aYNO4/4E+sQ/QrVldGsCjznmPHacg== X-Google-Smtp-Source: ABdhPJwqGaOFivYQj5Phf3pKf/iCuDgnheRIm3A25ohekOuPJkMBGa4k9Jpv6O7z4aiywl8KVG0TvA== X-Received: by 2002:a05:6214:14e2:: with SMTP id k2mr19946893qvw.24.1612271289435; Tue, 02 Feb 2021 05:08:09 -0800 (PST) Received: from localhost.localdomain ([177.194.48.209]) by smtp.googlemail.com with ESMTPSA id h63sm15305510qtd.14.2021.02.02.05.08.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Feb 2021 05:08:08 -0800 (PST) To: libc-alpha@sourceware.org Subject: [PATCH 1/2] posix: Falling back to non wide mode in case of encoding error [BZ #14185] Date: Tue, 2 Feb 2021 10:08:03 -0300 Message-Id: <20210202130804.1920933-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Cc: bug-gnulib@gnu.org Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" Gnulib has added the proposed fix with aed23714d60 (done in 2005), but recently with a glibc merge with 67306f6 (done in 2020 with sync back) it has fallback to old semantic to return -1 on in case of failure. From gnulib developer feedback it was an oversight. Although the full fix for BZ #14185 would require to rewrite fnmatch implementation to use mbrtowc instead of mbsrtowcs on the full input, this mitigate the issue and it has been used by gnulib for a long time. This patch also removes the alloca usage on the string convertion to wide characters before calling the internal function. Checked on x86_64-linux-gnu. --- posix/fnmatch.c | 160 ++++++++++++++-------------------------- posix/tst-fnmatch.input | 2 + 2 files changed, 58 insertions(+), 104 deletions(-) diff --git a/posix/fnmatch.c b/posix/fnmatch.c index b8a71f164d..a66c9196c7 100644 --- a/posix/fnmatch.c +++ b/posix/fnmatch.c @@ -75,6 +75,7 @@ extern int fnmatch (const char *pattern, const char *string, int flags); #include #include +#include #ifdef _LIBC typedef ptrdiff_t idx_t; @@ -231,121 +232,72 @@ is_char_class (const wchar_t *wcs) #include "fnmatch_loop.c" +static int +fnmatch_convert_to_wide (const char *str, struct scratch_buffer *buf, + size_t *n) +{ + mbstate_t ps; + memset (&ps, '\0', sizeof (ps)); + + size_t nw = buf->length / sizeof (wchar_t); + *n = strnlen (str, nw - 1); + if (__glibc_likely (*n < nw)) + { + const char *p = str; + *n = mbsrtowcs (buf->data, &p, *n + 1, &ps); + if (__glibc_unlikely (*n == (size_t) -1)) + /* Something wrong. + XXX Do we have to set 'errno' to something which mbsrtows hasn't + already done? */ + return -1; + if (p == NULL) + return 0; + memset (&ps, '\0', sizeof (ps)); + } + + *n = mbsrtowcs (NULL, &str, 0, &ps); + if (__glibc_unlikely (*n == (size_t) -1)) + return -1; + if (!scratch_buffer_set_array_size (buf, *n + 1, sizeof (wchar_t))) + { + __set_errno (ENOMEM); + return -2; + } + assert (mbsinit (&ps)); + mbsrtowcs (buf->data, &str, *n + 1, &ps); + return 0; +} int fnmatch (const char *pattern, const char *string, int flags) { if (__glibc_unlikely (MB_CUR_MAX != 1)) { - mbstate_t ps; size_t n; - const char *p; - wchar_t *wpattern_malloc = NULL; - wchar_t *wpattern; - wchar_t *wstring_malloc = NULL; - wchar_t *wstring; - size_t alloca_used = 0; - - /* Convert the strings into wide characters. */ - memset (&ps, '\0', sizeof (ps)); - p = pattern; - n = strnlen (pattern, 1024); - if (__glibc_likely (n < 1024)) + struct scratch_buffer wpattern; + scratch_buffer_init (&wpattern); + struct scratch_buffer wstring; + scratch_buffer_init (&wstring); + int r; + + /* Convert the strings into wide characters. Any conversion issue + fallback to the ascii version. */ + r = fnmatch_convert_to_wide (pattern, &wpattern, &n); + if (r == 0) { - wpattern = (wchar_t *) alloca_account ((n + 1) * sizeof (wchar_t), - alloca_used); - n = mbsrtowcs (wpattern, &p, n + 1, &ps); - if (__glibc_unlikely (n == (size_t) -1)) - /* Something wrong. - XXX Do we have to set 'errno' to something which mbsrtows hasn't - already done? */ - return -1; - if (p) - { - memset (&ps, '\0', sizeof (ps)); - goto prepare_wpattern; - } - } - else - { - prepare_wpattern: - n = mbsrtowcs (NULL, &pattern, 0, &ps); - if (__glibc_unlikely (n == (size_t) -1)) - /* Something wrong. - XXX Do we have to set 'errno' to something which mbsrtows hasn't - already done? */ - return -1; - if (__glibc_unlikely (n >= (size_t) -1 / sizeof (wchar_t))) - { - __set_errno (ENOMEM); - return -2; - } - wpattern_malloc = wpattern - = (wchar_t *) malloc ((n + 1) * sizeof (wchar_t)); - assert (mbsinit (&ps)); - if (wpattern == NULL) - return -2; - (void) mbsrtowcs (wpattern, &pattern, n + 1, &ps); - } - - assert (mbsinit (&ps)); - n = strnlen (string, 1024); - p = string; - if (__glibc_likely (n < 1024)) - { - wstring = (wchar_t *) alloca_account ((n + 1) * sizeof (wchar_t), - alloca_used); - n = mbsrtowcs (wstring, &p, n + 1, &ps); - if (__glibc_unlikely (n == (size_t) -1)) - { - /* Something wrong. - XXX Do we have to set 'errno' to something which - mbsrtows hasn't already done? */ - free_return: - free (wpattern_malloc); - return -1; - } - if (p) - { - memset (&ps, '\0', sizeof (ps)); - goto prepare_wstring; - } - } - else - { - prepare_wstring: - n = mbsrtowcs (NULL, &string, 0, &ps); - if (__glibc_unlikely (n == (size_t) -1)) - /* Something wrong. - XXX Do we have to set 'errno' to something which mbsrtows hasn't - already done? */ - goto free_return; - if (__glibc_unlikely (n >= (size_t) -1 / sizeof (wchar_t))) - { - free (wpattern_malloc); - __set_errno (ENOMEM); - return -2; - } - - wstring_malloc = wstring - = (wchar_t *) malloc ((n + 1) * sizeof (wchar_t)); - if (wstring == NULL) - { - free (wpattern_malloc); - return -2; - } - assert (mbsinit (&ps)); - (void) mbsrtowcs (wstring, &string, n + 1, &ps); - } - - int res = internal_fnwmatch (wpattern, wstring, wstring + n, + r = fnmatch_convert_to_wide (string, &wstring, &n); + if (r == 0) + r = internal_fnwmatch (wpattern.data, wstring.data, + (wchar_t *) wstring.data + n, flags & FNM_PERIOD, flags, NULL, - alloca_used); + false); + } - free (wstring_malloc); - free (wpattern_malloc); + scratch_buffer_free (&wstring); + scratch_buffer_free (&wpattern); - return res; + if (r == -2 || r == 0) + return r; } return internal_fnmatch (pattern, string, string + strlen (string), diff --git a/posix/tst-fnmatch.input b/posix/tst-fnmatch.input index 4fef4ee829..67aac5aada 100644 --- a/posix/tst-fnmatch.input +++ b/posix/tst-fnmatch.input @@ -676,6 +676,8 @@ C "x" "*" 0 PATHNAME|LEADING_DIR C "x/y" "*" 0 PATHNAME|LEADING_DIR C "x/y/z" "*" 0 PATHNAME|LEADING_DIR C "x" "*x" 0 PATHNAME|LEADING_DIR + +en_US.UTF-8 "\366.csv" "*.csv" 0 C "x/y" "*x" 0 PATHNAME|LEADING_DIR C "x/y/z" "*x" 0 PATHNAME|LEADING_DIR C "x" "x*" 0 PATHNAME|LEADING_DIR