From patchwork Wed Jun 30 08:56:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Siddhesh Poyarekar
X-Patchwork-Id: 44051
To: libc-alpha@sourceware.org
Subject: [PATCH] setlocale: Fail if iconv module for charset is not present [BZ #27996]
Date: Wed, 30 Jun 2021 14:26:42 +0530
Message-Id: <20210630085642.2661589-1-siddhesh@sourceware.org>
X-Mailer: git-send-email 2.31.1
From: Siddhesh Poyarekar
Cc: fweimer@redhat.com

setlocale currently succeeds even if the requested locale uses a charset
that does not have a converter module
installed.  Check that a converter module exists for the charset (either
the one requested through the input name or the one needed by the
selected locale file) and fail if it does not.

For testing, the unused test6.c has been recycled: it is removed, and the
new tst-invalid-charset.c reuses the test5 and test6 locales.  The test
verifies that loading the test5 and test6 locales fails because both
locales use charsets that have no converter, viz. test5 and test6
respectively.
---
 iconv/gconv_db.c                 |  25 ++++++
 iconv/gconv_int.h                |   2 +
 locale/findlocale.c              |  62 +++++++++-----
 localedata/Makefile              |   8 +-
 localedata/tests/test6.c         | 137 -------------------------------
 localedata/tst-invalid-charset.c |  31 +++++++
 6 files changed, 103 insertions(+), 162 deletions(-)
 delete mode 100644 localedata/tests/test6.c
 create mode 100644 localedata/tst-invalid-charset.c

diff --git a/iconv/gconv_db.c b/iconv/gconv_db.c
index af868e8023..547a6d0a44 100644
--- a/iconv/gconv_db.c
+++ b/iconv/gconv_db.c
@@ -698,6 +698,31 @@ do_lookup_alias (const char *name)
   return found != NULL ? (*found)->toname : NULL;
 }
 
+bool
+__gconv_module_exists (const char *name)
+{
+  /* Ensure that the configuration data is read.  */
+  __gconv_load_conf ();
+
+  name = do_lookup_alias (name) ?: name;
+
+  struct gconv_module *root = __gconv_modules_db;
+
+  while (root != NULL)
+    {
+      int cmpres;
+
+      cmpres = strcmp (name, root->from_string);
+      if (cmpres == 0)
+        return true;
+      else if (cmpres < 0)
+        root = root->left;
+      else
+        root = root->right;
+    }
+
+  return false;
+}
 
 int
 __gconv_compare_alias (const char *name1, const char *name2)
diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h
index 30a9286be2..6a52275700 100644
--- a/iconv/gconv_int.h
+++ b/iconv/gconv_int.h
@@ -269,6 +269,8 @@ libc_hidden_proto (__gconv_transliterate)
 extern int __gconv_compare_alias (const char *name1, const char *name2)
      attribute_hidden;
 
+/* Return true if NAME exists either as a module or an alias.  */
+extern bool __gconv_module_exists (const char *name) attribute_hidden;
 
 /* Builtin transformations.  */
 #ifdef _LIBC
diff --git a/locale/findlocale.c b/locale/findlocale.c
index ab09122b0c..10dfe2aef3 100644
--- a/locale/findlocale.c
+++ b/locale/findlocale.c
@@ -98,6 +98,15 @@ valid_locale_name (const char *name)
   return 1;
 }
 
+static bool
+codeset_has_module (const char *codeset)
+{
+  char *ccodeset = (char *) alloca (strlen (codeset) + 3);
+  strip (ccodeset, codeset);
+
+  return __gconv_module_exists (ccodeset);
+}
+
 struct __locale_data *
 _nl_find_locale (const char *locale_path, size_t locale_path_len,
                  int category, const char **name)
@@ -200,6 +209,10 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
     /* Memory allocate problem.  */
     return NULL;
 
+  /* The requested codeset does not have a converter, don't use it.  */
+  if (codeset != NULL && !codeset_has_module (codeset))
+    return NULL;
+
   /* If exactly this locale was already asked for we have an entry with
      the complete name.  */
   locale_file = _nl_make_l10nflist (&_nl_locale_file_list[category],
@@ -248,6 +261,33 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
         return NULL;
     }
 
+  /* Get the codeset information from the locale file.  */
+  static const int codeset_idx[] =
+    {
+      [__LC_CTYPE] = _NL_ITEM_INDEX (CODESET),
+      [__LC_NUMERIC] = _NL_ITEM_INDEX (_NL_NUMERIC_CODESET),
+      [__LC_TIME] = _NL_ITEM_INDEX (_NL_TIME_CODESET),
+      [__LC_COLLATE] = _NL_ITEM_INDEX (_NL_COLLATE_CODESET),
+      [__LC_MONETARY] = _NL_ITEM_INDEX (_NL_MONETARY_CODESET),
+      [__LC_MESSAGES] = _NL_ITEM_INDEX (_NL_MESSAGES_CODESET),
+      [__LC_PAPER] = _NL_ITEM_INDEX (_NL_PAPER_CODESET),
+      [__LC_NAME] = _NL_ITEM_INDEX (_NL_NAME_CODESET),
+      [__LC_ADDRESS] = _NL_ITEM_INDEX (_NL_ADDRESS_CODESET),
+      [__LC_TELEPHONE] = _NL_ITEM_INDEX (_NL_TELEPHONE_CODESET),
+      [__LC_MEASUREMENT] = _NL_ITEM_INDEX (_NL_MEASUREMENT_CODESET),
+      [__LC_IDENTIFICATION] = _NL_ITEM_INDEX (_NL_IDENTIFICATION_CODESET)
+    };
+  const struct __locale_data *data;
+  const char *locale_codeset;
+
+  data = (const struct __locale_data *) locale_file->data;
+  locale_codeset = (const char *) data->values[codeset_idx[category]].string;
+  assert (locale_codeset != NULL);
+
+  /* The locale codeset does not have a converter, don't use it.  */
+  if (locale_codeset[0] != '\0' && !codeset_has_module (locale_codeset))
+    return NULL;
+
   /* The LC_CTYPE category allows to check whether a locale is really
      usable.  If the locale name contains a charset name and the charset
      name used in the locale (present in the LC_CTYPE data) is
@@ -256,31 +296,9 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
      in the locale name.  */
   if (codeset != NULL)
     {
-      /* Get the codeset information from the locale file.  */
-      static const int codeset_idx[] =
-        {
-          [__LC_CTYPE] = _NL_ITEM_INDEX (CODESET),
-          [__LC_NUMERIC] = _NL_ITEM_INDEX (_NL_NUMERIC_CODESET),
-          [__LC_TIME] = _NL_ITEM_INDEX (_NL_TIME_CODESET),
-          [__LC_COLLATE] = _NL_ITEM_INDEX (_NL_COLLATE_CODESET),
-          [__LC_MONETARY] = _NL_ITEM_INDEX (_NL_MONETARY_CODESET),
-          [__LC_MESSAGES] = _NL_ITEM_INDEX (_NL_MESSAGES_CODESET),
-          [__LC_PAPER] = _NL_ITEM_INDEX (_NL_PAPER_CODESET),
-          [__LC_NAME] = _NL_ITEM_INDEX (_NL_NAME_CODESET),
-          [__LC_ADDRESS] = _NL_ITEM_INDEX (_NL_ADDRESS_CODESET),
-          [__LC_TELEPHONE] = _NL_ITEM_INDEX (_NL_TELEPHONE_CODESET),
-          [__LC_MEASUREMENT] = _NL_ITEM_INDEX (_NL_MEASUREMENT_CODESET),
-          [__LC_IDENTIFICATION] = _NL_ITEM_INDEX (_NL_IDENTIFICATION_CODESET)
-        };
-      const struct __locale_data *data;
-      const char *locale_codeset;
       char *clocale_codeset;
       char *ccodeset;
 
-      data = (const struct __locale_data *) locale_file->data;
-      locale_codeset =
-        (const char *) data->values[codeset_idx[category]].string;
-      assert (locale_codeset != NULL);
       /* Note the length of the allocated memory: +3 for up to two slashes
          and the NUL byte.  */
       clocale_codeset = (char *) alloca (strlen (locale_codeset) + 3);
diff --git a/localedata/Makefile b/localedata/Makefile
index 14e04cd3c5..797523b250 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -128,7 +128,7 @@ ld-test-names := test1 test2 test3 test4 test5 test6 test7
 ld-test-srcs := $(addprefix tests/,$(addsuffix .cm,$(ld-test-names)) \
                                    $(addsuffix .def,$(ld-test-names)) \
                                    $(addsuffix .ds,test5 test6) \
-                                   test6.c trans.def)
+                                   trans.def)
 
 fmon-tests = n01y12 n02n40 n10y31 n11y41 n12y11 n20n32 n30y20 n41n00 \
              y01y10 y02n22 y22n42 y30y21 y32n31 y40y00 y42n21
@@ -158,7 +158,7 @@ tests = $(locale_test_suite) tst-digits tst-setlocale bug-iconv-trans \
         tst-leaks tst-mbswcs1 tst-mbswcs2 tst-mbswcs3 tst-mbswcs4 tst-mbswcs5 \
         tst-mbswcs6 tst-xlocale1 tst-xlocale2 bug-usesetlocale \
         tst-strfmon1 tst-sscanf bug-setlocale1 tst-setlocale2 tst-setlocale3 \
-        tst-wctype tst-iconv-math-trans
+        tst-wctype tst-iconv-math-trans tst-invalid-charset
 tests-static = bug-setlocale1-static
 tests += $(tests-static)
 ifeq (yes,$(build-shared))
@@ -402,6 +402,7 @@ $(objpfx)tst-langinfo-setlocale-static.out: tst-langinfo.sh \
 	$(evaluate-test)
 
 $(objpfx)tst-digits.out: $(objpfx)tst-locale.out
+$(objpfx)tst-invalid-charset.out: $(objpfx)tst-locale.out
 $(objpfx)tst-mbswcs6.out: $(addprefix $(objpfx),$(CTYPE_FILES))
 endif
 
@@ -461,7 +462,8 @@ $(objpfx)mtrace-tst-leaks.out: $(objpfx)tst-leaks.out
 	$(common-objpfx)malloc/mtrace $(objpfx)tst-leaks.mtrace > $@; \
 	$(evaluate-test)
 
-bug-setlocale1-ENV-only = LOCPATH=$(objpfx) LC_CTYPE=de_DE.UTF-8
+bug-setlocale1-ENV-only = GCONV_PATH=$(common-objpfx)iconvdata \
+                          LOCPATH=$(objpfx) LC_CTYPE=de_DE.UTF-8
 bug-setlocale1-static-ENV-only = $(bug-setlocale1-ENV-only)
 
 $(objdir)/iconvdata/gconv-modules:
diff --git a/localedata/tests/test6.c b/localedata/tests/test6.c
deleted file mode 100644
index edb5fe4a5f..0000000000
--- a/localedata/tests/test6.c
+++ /dev/null
@@ -1,137 +0,0 @@
-/* Test program for character classes and mappings.
-   Copyright (C) 1999-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper , 1999.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   .  */
-
-#include
-#include
-#include
-
-
-int
-main (void)
-{
-  const char lower[] = "abcdefghijklmnopqrstuvwxyz";
-  const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
-#define LEN (sizeof (upper) - 1)
-  const wchar_t wlower[] = L"abcdefghijklmnopqrstuvwxyz";
-  const wchar_t wupper[] = L"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
-  int i;
-  int result = 0;
-
-  setlocale (LC_ALL, "test6");
-
-  for (i = 0; i < LEN; ++i)
-    {
-      /* Test basic table handling (basic == not more than 256 characters).
-         The charmaps swaps the normal lower-upper case meaning of the
-         ASCII characters used in the source code while the Unicode mapping
-         in the repertoire map has the normal correspondents.  This test
-         shows the independence of the tables for `char' and `wchar_t'
-         characters.  */
-
-      if (islower (lower[i]))
-        {
-          printf ("islower ('%c') false\n", lower[i]);
-          result = 1;
-        }
-      if (! isupper (lower[i]))
-        {
-          printf ("isupper ('%c') false\n", lower[i]);
-          result = 1;
-        }
-
-      if (! islower (upper[i]))
-        {
-          printf ("islower ('%c') false\n", upper[i]);
-          result = 1;
-        }
-      if (isupper (upper[i]))
-        {
-          printf ("isupper ('%c') false\n", upper[i]);
-          result = 1;
-        }
-
-      if (toupper (lower[i]) != lower[i])
-        {
-          printf ("toupper ('%c') false\n", lower[i]);
-          result = 1;
-        }
-      if (tolower (lower[i]) != upper[i])
-        {
-          printf ("tolower ('%c') false\n", lower[i]);
-          result = 1;
-        }
-
-      if (tolower (upper[i]) != upper[i])
-        {
-          printf ("tolower ('%c') false\n", upper[i]);
-          result = 1;
-        }
-      if (toupper (upper[i]) != lower[i])
-        {
-          printf ("toupper ('%c') false\n", upper[i]);
-          result = 1;
-        }
-
-      if (iswlower (wupper[i]))
-        {
-          printf ("iswlower (L'%c') false\n", upper[i]);
-          result = 1;
-        }
-      if (! iswupper (wupper[i]))
-        {
-          printf ("iswupper (L'%c') false\n", upper[i]);
-          result = 1;
-        }
-
-      if (iswupper (wlower[i]))
-        {
-          printf ("iswupper (L'%c') false\n", lower[i]);
-          result = 1;
-        }
-      if (! iswlower (wlower[i]))
-        {
-          printf ("iswlower (L'%c') false\n", lower[i]);
-          result = 1;
-        }
-
-      if (towupper (wlower[i]) != wupper[i])
-        {
-          printf ("towupper ('%c') false\n", lower[i]);
-          result = 1;
-        }
-      if (towlower (wlower[i]) != wlower[i])
-        {
-          printf ("towlower ('%c') false\n", lower[i]);
-          result = 1;
-        }
-
-      if (towlower (wupper[i]) != wlower[i])
-        {
-          printf ("towlower ('%c') false\n", upper[i]);
-          result = 1;
-        }
-      if (towupper (wupper[i]) != wupper[i])
-        {
-          printf ("towupper ('%c') false\n", upper[i]);
-          result = 1;
-        }
-    }
-
-  return result;
-}
diff --git a/localedata/tst-invalid-charset.c b/localedata/tst-invalid-charset.c
new file mode 100644
index 0000000000..46a5198c66
--- /dev/null
+++ b/localedata/tst-invalid-charset.c
@@ -0,0 +1,31 @@
+/* Test program to verify that setlocale fails for charsets that do not have a
+   converter.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+
+
+int
+main (void)
+{
+  /* Fail if setlocale succeeds for any of these locales.  */
+  return (setlocale (LC_ALL, "test5") != NULL
+          || setlocale (LC_ALL, "test6") != NULL);
+}
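__gconv_module_exists is declared attribute_hidden, so applications cannot call it. As a rough user-space analogue of the check this patch adds, the sketch below combines the public setlocale, nl_langinfo (CODESET) and iconv_open interfaces. The helper names charset_is_convertible and locale_is_usable are hypothetical, and the sketch assumes that "a conversion from the codeset to UTF-8 can be opened" is a reasonable proxy for "a converter module exists". Unlike the patch, it only inspects the LC_CTYPE codeset and asks about one specific conversion pair, so it approximates the new behavior rather than reimplementing it.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper: return true if iconv can open a conversion from
   CODESET to UTF-8.  This is a public-API stand-in for the internal
   __gconv_module_exists lookup, not the same test.  */
static bool
charset_is_convertible (const char *codeset)
{
  assert (codeset != NULL);

  iconv_t cd = iconv_open ("UTF-8", codeset);
  if (cd == (iconv_t) -1)
    return false;   /* Unsupported conversion; errno is set (EINVAL).  */

  iconv_close (cd);
  return true;
}

/* Hypothetical helper: switch the whole process to LOCALE and report
   whether setlocale succeeded and the LC_CTYPE codeset it reports has a
   converter.  If only the codeset check fails, the process is left in
   LOCALE; callers that care should save and restore the old locale.  */
static bool
locale_is_usable (const char *locale)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return false;

  return charset_is_convertible (nl_langinfo (CODESET));
}
```

Converting to UTF-8 is only a convenient choice for the sketch; a program that needs a different target encoding would pass that name to iconv_open instead.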