From patchwork Fri Sep 25 12:09:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arjun Shankar X-Patchwork-Id: 40504 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A5E463987C09; Fri, 25 Sep 2020 12:09:12 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from aloka.lostca.se (aloka.lostca.se [IPv6:2a01:4f8:120:624c::2]) by sourceware.org (Postfix) with ESMTPS id 071F13894C2C; Fri, 25 Sep 2020 12:09:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 071F13894C2C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=lostca.se Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arjun.is@lostca.se Received: from aloka.lostca.se (aloka [127.0.0.1]) by aloka.lostca.se (Postfix) with ESMTP id E98A41D611; Fri, 25 Sep 2020 12:09:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=lostca.se; h=date:from:to :cc:subject:message-id:mime-version:content-type; s=howrah; bh=g LZ1KU/+HCrEqybFTbMwTsRpko8=; b=GvdXoPxT+SyFUeysAYU21SwGrOb+/ha9R 5Q4JUgaPGvgZq5vqCQ2wtgeizG9l23rUuYQ/fsMVEkAepaELu7UBzP2E6uLiafG6 St6jTSOtFRiEHzHq7iz4HY7x1FM8/gNtxdizKI54sUFEL4EAh+F2P5uKPykyS72P pSGolXXeHQ= Received: from localhost (unknown [IPv6:2a01:4f8:120:624c::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aloka.lostca.se (Postfix) with ESMTPSA id ADA701D610; Fri, 25 Sep 2020 12:09:06 +0000 (UTC) Date: Fri, 25 Sep 2020 12:09:05 +0000 From: Arjun Shankar To: libc-alpha@sourceware.org Subject: [PATCH] intl: Handle translation output codesets with TRANSLIT or IGNORE suffixes Message-ID: <20200925120904.GA55310@aloka.lostca.se> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-8.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NORMAL_HTTP_TO_IP, NUMERIC_HTTP_ADDR, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Florian Weimer , Siddhesh Poyarekar Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" From: Arjun Shankar Commit 91927b7c7643 (Rewrite iconv option parsing [BZ #19519]) did not handle cases where the output codeset for translations (via the `gettext' family of functions) might have a caller specified encoding suffix such as TRANSLIT or IGNORE. This led to a regression where translations did not work when the codeset had a suffix. This commit fixes the above issue by parsing any suffixes passed to __dcigettext and adds two new test-cases to intl/tst-codeset.c to verify correct behaviour. The iconv-internal function __gconv_create_spec and the static iconv-internal function gconv_destroy_spec are now visible internally within glibc and used in intl/dcigettext.c. --- iconv/Versions | 4 +++- iconv/gconv_charset.c | 10 ++++++++++ iconv/gconv_charset.h | 27 --------------------------- iconv/gconv_int.h | 21 +++++++++++++++++++++ iconv/iconv_open.c | 2 +- iconv/iconv_prog.c | 2 +- intl/dcigettext.c | 17 ++++++++++------- intl/tst-codeset.c | 34 ++++++++++++++-------------------- support/.check.h.swp | Bin 0 -> 16384 bytes 9 files changed, 60 insertions(+), 57 deletions(-) create mode 100644 support/.check.h.swp GIT binary patch literal 16384 zcmeHNON=8|6)gw@BqV@fM(1J*eHHj}_N7yxyy67oWMUzkKmq+Pxn?tQkSyaPJyE4=s9~ zteJ-2@!Un%@Yv@Qfu+DVOFhT#n57^Z4n03CnSEj2E%uj*$i_1UG6r5Z0}p9e*Vk5g z`pTt?^#1qlzivs{#EgNAfsBESfsBESfsBESfsBESfj1um;qX4~3o!ewvFS-g@Rs;{ z^8VlWwHg0@Fg1Q({5>DPe|u^?{+hjI3}g&s3}g&s3}g&s3}g&s3}g&s3}g&s3}g)a z9~dyYn)WW}eUb}(JpWJX{}1NhI|HSHI`3Gf*3uM3*? zQ(y=bfjp1{9sw=`ue?puegceu9pFLWMKnFW0UQ7y0$xHBSq(f?MYI zvq>{WtuqlF*@4}01ce@jeL+LR7j8&H-!nxJkl}VI6hTN`-#!w4k*M9b1F|C53~l_~ z0s;nN;Q1pO7{j6M_6iiRAR{Epu${;k0r`UeULm??B;4Aa_4YqLU%k*5(WWpM{B2I>WUOV#aF2#Q?(V0XY4BFO6 z7bkY8+k>G)IU8IDp%F$w0S=x@uxv*ZKv+nQ?TUi7M16XyRK77^yHcEu4S9P5~)oTs; zX+^F7+<>_gbG$e(=T66fk|3NwEV^pZSYpG*qKg)eDG9bkIqB}gYiAF#gxr&}$Ikcb z9c`s6ENo9|S6glUpxUf$9qQF5YVGqHb!YCeB@bQ>R)UY!C-#~ojtnQl2^sK7Z5d!6 zN86RsL{!XLQ?=cfMM}1&YRzY^>3Nq^ijYjE)@R<0iE_Br@Oc*6ZX`&Y2s6URUVi3^ z^D-x^JL71(c5U`ypz8-^z1-Yx>3V(~ty|?uSF4keJd3Vf z%d7Z;{d;`8kMF|+6_A8ZYtHmdcTF2#%r*bfo>#9T-nGBT_ zjbm}vUF@}y(X6&p3(5Irtl|xvj#KgI*L4BLtwXZ&tT!4gF@dQg7kUhqh}v zQCHCQU=);09OOZ_hLf89YI1C;ym5z^(i#<0g zdVa6u*rsp;5tMF>*RZaBBE#|Ao?@1%-frwuh3a<4H~i5CFC|h^n9M_AG zHusGq0d`D*RGjl%(;JR>uO>q32xBb!ws3`SIJ6gaU<5B3%^ZPBX(Fd_gdAQ_SA+*Xfx?Kk&!TU9vP5-Np< zPb`FwxZAb%&hCDj%8f(1U2ZlZ^Ke6YkEOwFO}NqlzE9b3>)7yp!wuOnme8Iihy%Q9 z2XU${iwgF??+sZqvQgYKk8NbA9YIkbtjH-y3XppTs90#dxR!7h`(I$+a+o>tBA>h= z7dCkuZzFPWzx64eDOpJJ!yO&km}S8FaA7OM?nwHxdNnTCrvIc16UVLOyO{Er?+s1gq626Ksm zFH9>RN22t%f)MlVULQ$*B~RgoN4#X`^ z-55|BMox%eo6fe3gU_&9I@_#D?^8hb${5HP$QZ~N$QZ~NcoQ?QB3xu=gydx^%a?r?U9UE7CgnrAlZT8P zedp6wULZdrFU)aAQniocN)u_G&)I#dL{;7B&_@pdW3q98JbcKeAX7rZdfXSt5@o80 zv{Kdp9l=R6I?CE)n|I0*WLN@;TWkf~s&NC|MQM=q?>R$8osNz@_xL#(O-f=VicPS5 zV<3(_|E@9`3FLIw^jH{o3!J#SdgQW1UFCM%37#a19|R;srJLV*&KS>HXBrL{w{d@;tLcdqk)D?Msp1a1mr2jHgtGZlCpV?NPYK&EOZ=wFH>aBLO z*4WOarswC{B|BrjFapmtI?jlzT9h6gVH$`Tj`V>Y;BkW4FHpzhZW6*>_7_BVTtXWp zpFRvd#1;00d_0Ay#pU?Mtq-}k+W1NFiDyNscJ8R>OnotuatVb7xR|KZF(9nCu^IGJ^5>F$7_=}y$j}sNIZZhqTO?`^JSxikB~75y|8~? z-Eyy8Q8A4Tu=sUWW#X>XIDg-E9Iun?TBT8LhGlDfromcode); + free (conv_spec->tocode); + return; +} +libc_hidden_def (__gconv_destroy_spec) diff --git a/iconv/gconv_charset.h b/iconv/gconv_charset.h index b39b09aea1..e9c122cf7e 100644 --- a/iconv/gconv_charset.h +++ b/iconv/gconv_charset.h @@ -48,33 +48,6 @@ #define GCONV_IGNORE_ERRORS_SUFFIX "IGNORE" -/* This function accepts the charset names of the source and destination of the - conversion and populates *conv_spec with an equivalent conversion - specification that may later be used by __gconv_open. The charset names - might contain options in the form of suffixes that alter the conversion, - e.g. "ISO-10646/UTF-8/TRANSLIT". It processes the charset names, ignoring - and truncating any suffix options in fromcode, and processing and truncating - any suffix options in tocode. Supported suffix options ("TRANSLIT" or - "IGNORE") when found in tocode lead to the corresponding flag in *conv_spec - to be set to true. Unrecognized suffix options are silently discarded. If - the function succeeds, it returns conv_spec back to the caller. It returns - NULL upon failure. */ -struct gconv_spec * -__gconv_create_spec (struct gconv_spec *conv_spec, const char *fromcode, - const char *tocode); -libc_hidden_proto (__gconv_create_spec) - - -/* This function frees all heap memory allocated by __gconv_create_spec. */ -static void __attribute__ ((unused)) -gconv_destroy_spec (struct gconv_spec *conv_spec) -{ - free (conv_spec->fromcode); - free (conv_spec->tocode); - return; -} - - /* This function copies in-order, characters from the source 's' that are either alpha-numeric or one in one of these: "_-.,:/" - into the destination 'wp' while dropping all other characters. In the process, it converts all diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h index e86938dae7..f721ce30ff 100644 --- a/iconv/gconv_int.h +++ b/iconv/gconv_int.h @@ -152,6 +152,27 @@ extern int __gconv_open (struct gconv_spec *conv_spec, __gconv_t *handle, int flags); libc_hidden_proto (__gconv_open) +/* This function accepts the charset names of the source and destination of the + conversion and populates *conv_spec with an equivalent conversion + specification that may later be used by __gconv_open. The charset names + might contain options in the form of suffixes that alter the conversion, + e.g. "ISO-10646/UTF-8/TRANSLIT". It processes the charset names, ignoring + and truncating any suffix options in fromcode, and processing and truncating + any suffix options in tocode. Supported suffix options ("TRANSLIT" or + "IGNORE") when found in tocode lead to the corresponding flag in *conv_spec + to be set to true. Unrecognized suffix options are silently discarded. If + the function succeeds, it returns conv_spec back to the caller. It returns + NULL upon failure. */ +extern struct gconv_spec * +__gconv_create_spec (struct gconv_spec *conv_spec, const char *fromcode, + const char *tocode); +libc_hidden_proto (__gconv_create_spec) + +/* This function frees all heap memory allocated by __gconv_create_spec. */ +extern void +__gconv_destroy_spec (struct gconv_spec *conv_spec); +libc_hidden_proto (__gconv_destroy_spec) + /* Free resources associated with transformation descriptor CD. */ extern int __gconv_close (__gconv_t cd) attribute_hidden; diff --git a/iconv/iconv_open.c b/iconv/iconv_open.c index dd54bc12e0..5b30055c04 100644 --- a/iconv/iconv_open.c +++ b/iconv/iconv_open.c @@ -39,7 +39,7 @@ iconv_open (const char *tocode, const char *fromcode) int res = __gconv_open (&conv_spec, &cd, 0); - gconv_destroy_spec (&conv_spec); + __gconv_destroy_spec (&conv_spec); if (__builtin_expect (res, __GCONV_OK) != __GCONV_OK) { diff --git a/iconv/iconv_prog.c b/iconv/iconv_prog.c index b4334faa57..d59979759c 100644 --- a/iconv/iconv_prog.c +++ b/iconv/iconv_prog.c @@ -184,7 +184,7 @@ main (int argc, char *argv[]) /* Let's see whether we have these coded character sets. */ res = __gconv_open (&conv_spec, &cd, 0); - gconv_destroy_spec (&conv_spec); + __gconv_destroy_spec (&conv_spec); if (res != __GCONV_OK) { diff --git a/intl/dcigettext.c b/intl/dcigettext.c index 2e7c662bc7..bd332e71da 100644 --- a/intl/dcigettext.c +++ b/intl/dcigettext.c @@ -1120,15 +1120,18 @@ _nl_find_msg (struct loaded_l10nfile *domain_file, # ifdef _LIBC - struct gconv_spec conv_spec - = { .fromcode = norm_add_slashes (charset, ""), - .tocode = norm_add_slashes (outcharset, ""), - /* We always want to use transliteration. */ - .translit = true, - .ignore = false - }; + struct gconv_spec conv_spec; + + __gconv_create_spec (&conv_spec, charset, outcharset); + + /* We always want to use transliteration. */ + conv_spec.translit = true; + int r = __gconv_open (&conv_spec, &convd->conv, GCONV_AVOID_NOCONV); + + __gconv_destroy_spec (&conv_spec); + if (__builtin_expect (r != __GCONV_OK, 0)) { /* If the output encoding is the same there is diff --git a/intl/tst-codeset.c b/intl/tst-codeset.c index fd70432eca..c13f791641 100644 --- a/intl/tst-codeset.c +++ b/intl/tst-codeset.c @@ -22,13 +22,11 @@ #include #include #include +#include static int do_test (void) { - char *s; - int result = 0; - unsetenv ("LANGUAGE"); unsetenv ("OUTPUT_CHARSET"); setlocale (LC_ALL, "de_DE.ISO-8859-1"); @@ -36,25 +34,21 @@ do_test (void) bindtextdomain ("codeset", OBJPFX "domaindir"); /* Here we expect output in ISO-8859-1. */ - s = gettext ("cheese"); - if (strcmp (s, "K\344se")) - { - printf ("call 1 returned: %s\n", s); - result = 1; - } + TEST_COMPARE_STRING (gettext ("cheese"), "K\344se"); + /* Here we expect output in UTF-8. */ bind_textdomain_codeset ("codeset", "UTF-8"); + TEST_COMPARE_STRING (gettext ("cheese"), "K\303\244se"); - /* Here we expect output in UTF-8. */ - s = gettext ("cheese"); - if (strcmp (s, "K\303\244se")) - { - printf ("call 2 returned: %s\n", s); - result = 1; - } - - return result; + /* The accented `a' is transliterated. */ + bind_textdomain_codeset ("codeset", "ASCII//TRANSLIT"); + TEST_COMPARE_STRING (gettext ("cheese"), "Kaese"); + + /* Transliteration also works by default even if not set. */ + bind_textdomain_codeset ("codeset", "ASCII"); + TEST_COMPARE_STRING (gettext ("cheese"), "Kaese"); + + return 0; } -#define TEST_FUNCTION do_test () -#include "../test-skeleton.c" +#include diff --git a/support/.check.h.swp b/support/.check.h.swp new file mode 100644 index 0000000000000000000000000000000000000000..bec356c03cf91a9c00655cbf5b29d40164ea58d7