From patchwork Mon Jan 25 09:02:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Gautier X-Patchwork-Id: 41802 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D3A4B39450F9; Mon, 25 Jan 2021 09:02:58 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D3A4B39450F9 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1611565378; bh=5IPzPYn46PD/hF+ZDMEBmI3yu8imQS0xpPoWzVSSgVo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=qVr6kn3go2CVzNyO6xtKMZz0CavwTbpi+J5Olxu8qJjwY/x4oqGKQQS5o2Q/XFHM0 g1us7uTQds/OgdBXYzvoElgFBdfurtGseMaTx1tSp19QCtkOcfs/hBxXuabe16Q/5v UFi7OTQ2DDY2ghcCU+0Wv9R2QCEv9JAp1JbdB9ew= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171]) by sourceware.org (Postfix) with ESMTPS id 1023A3858C27 for ; Mon, 25 Jan 2021 09:02:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 1023A3858C27 Received: from smtp1.mailbox.org (smtp1.mailbox.org [IPv6:2001:67c:2050:105:465:1:1:0]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4DPP5n5whZzQlYY; Mon, 25 Jan 2021 10:02:53 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp1.mailbox.org ([80.241.60.240]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id 6HXtoYIN9oju; Mon, 25 Jan 2021 10:02:48 +0100 (CET) To: libc-alpha@sourceware.org Subject: [PATCH v3 
1/5] Copy utf-7 module to modified-utf-7 Date: Mon, 25 Jan 2021 10:02:22 +0100 Message-Id: <20210125090226.39967-2-mg@max.gautier.name> In-Reply-To: <20210125090226.39967-1-mg@max.gautier.name> References: <87y2m9agmm.fsf@mid.deneb.enyo.de> <20210125090226.39967-1-mg@max.gautier.name> MIME-Version: 1.0 X-MBO-SPAM-Probability: X-Rspamd-Score: -6.16 / 15.00 / 15.00 X-Rspamd-Queue-Id: 28AB6186C X-Rspamd-UID: f79406 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Max Gautier via Libc-alpha From: Max Gautier Reply-To: Max Gautier Cc: Max Gautier Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" --- iconvdata/Makefile | 2 +- iconvdata/modified-utf-7.c | 531 +++++++++++++++++++++++++++++++++++++ 2 files changed, 532 insertions(+), 1 deletion(-) create mode 100644 iconvdata/modified-utf-7.c diff --git a/iconvdata/Makefile b/iconvdata/Makefile index c8c532a3e4..7f932e10ed 100644 --- a/iconvdata/Makefile +++ b/iconvdata/Makefile @@ -61,7 +61,7 @@ modules := ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4 ISO8859-5 \ IBM5347 IBM9030 IBM9066 IBM9448 IBM12712 IBM16804 \ IBM1364 IBM1371 IBM1388 IBM1390 IBM1399 ISO_11548-1 MIK BRF \ MAC-CENTRALEUROPE KOI8-RU ISO8859-9E \ - CP770 CP771 CP772 CP773 CP774 + CP770 CP771 CP772 CP773 CP774 MODIFIED-UTF-7 # If lazy binding is disabled, use BIND_NOW for the gconv modules. 
ifeq ($(bind-now),yes) diff --git a/iconvdata/modified-utf-7.c b/iconvdata/modified-utf-7.c new file mode 100644 index 0000000000..fc6a8dfcfd --- /dev/null +++ b/iconvdata/modified-utf-7.c @@ -0,0 +1,531 @@ +/* Conversion module for UTF-7. + Copyright (C) 2000-2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* UTF-7 is a legacy encoding used for transmitting Unicode within the + ASCII character set, used primarily by mail agents. New programs + are encouraged to use UTF-8 instead. + + UTF-7 is specified in RFC 2152 (and old RFC 1641, RFC 1642). The + original Base64 encoding is defined in RFC 2045. */ + +#include +#include +#include +#include + + +/* Define this to 1 if you want the so-called "optional direct" characters + ! " # $ % & * ; < = > @ [ ] ^ _ ` { | } + to be encoded. Define to 0 if you want them to be passed straight + through, like the so-called "direct" characters. + We set this to 1 because it's safer. + */ +#define UTF7_ENCODE_OPTIONAL_CHARS 1 + + +/* The set of "direct characters": + A-Z a-z 0-9 ' ( ) , - . / : ? 
space tab lf cr +*/ + +static const unsigned char direct_tab[128 / 8] = + { + 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87, + 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07 + }; + +static int +isdirect (uint32_t ch) +{ + return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1)); +} + + +/* The set of "direct and optional direct characters": + A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr + ! " # $ % & * ; < = > @ [ ] ^ _ ` { | } +*/ + +static const unsigned char xdirect_tab[128 / 8] = + { + 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f + }; + +static int +isxdirect (uint32_t ch) +{ + return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1)); +} + + +/* The set of "extended base64 characters": + A-Z a-z 0-9 + / - +*/ + +static const unsigned char xbase64_tab[128 / 8] = + { + 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03, + 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07 + }; + +static int +isxbase64 (uint32_t ch) +{ + return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1)); +} + + +/* Converts a value in the range 0..63 to a base64 encoded char. */ +static unsigned char +base64 (unsigned int i) +{ + if (i < 26) + return i + 'A'; + else if (i < 52) + return i - 26 + 'a'; + else if (i < 62) + return i - 52 + '0'; + else if (i == 62) + return '+'; + else if (i == 63) + return '/'; + else + abort (); +} + + +/* Definitions used in the body of the `gconv' function. */ +#define CHARSET_NAME "UTF-7//" +#define DEFINE_INIT 1 +#define DEFINE_FINI 1 +#define FROM_LOOP from_utf7_loop +#define TO_LOOP to_utf7_loop +#define MIN_NEEDED_FROM 1 +#define MAX_NEEDED_FROM 6 +#define MIN_NEEDED_TO 4 +#define MAX_NEEDED_TO 4 +#define ONE_DIRECTION 0 +#define PREPARE_LOOP \ + mbstate_t saved_state; \ + mbstate_t *statep = data->__statep; +#define EXTRA_LOOP_ARGS , statep + + +/* Since we might have to reset input pointer we must be able to save + and restore the state. 
*/ +#define SAVE_RESET_STATE(Save) \ + if (Save) \ + saved_state = *statep; \ + else \ + *statep = saved_state + + +/* First define the conversion function from UTF-7 to UCS4. + The state is structured as follows: + __count bit 2..0: zero + __count bit 8..3: shift + __wch: data + Precise meaning: + shift data + 0 -- not inside base64 encoding + 1..32 XX..XX00..00 inside base64, (32 - shift) bits pending + This state layout is simpler than relying on STORE_REST/UNPACK_BYTES. + + When shift = 0, __wch needs to store at most one lookahead byte (see + __GCONV_INCOMPLETE_INPUT below). +*/ +#define MIN_NEEDED_INPUT MIN_NEEDED_FROM +#define MAX_NEEDED_INPUT MAX_NEEDED_FROM +#define MIN_NEEDED_OUTPUT MIN_NEEDED_TO +#define MAX_NEEDED_OUTPUT MAX_NEEDED_TO +#define LOOPFCT FROM_LOOP +#define BODY \ + { \ + uint_fast8_t ch = *inptr; \ + \ + if ((statep->__count >> 3) == 0) \ + { \ + /* base64 encoding inactive. */ \ + if (isxdirect (ch)) \ + { \ + inptr++; \ + put32 (outptr, ch); \ + outptr += 4; \ + } \ + else if (__glibc_likely (ch == '+')) \ + { \ + if (__glibc_unlikely (inptr + 2 > inend)) \ + { \ + /* Not enough input available. */ \ + result = __GCONV_INCOMPLETE_INPUT; \ + break; \ + } \ + if (inptr[1] == '-') \ + { \ + inptr += 2; \ + put32 (outptr, ch); \ + outptr += 4; \ + } \ + else \ + { \ + /* Switch into base64 mode. */ \ + inptr++; \ + statep->__count = (32 << 3); \ + statep->__value.__wch = 0; \ + } \ + } \ + else \ + { \ + /* The input is invalid. */ \ + STANDARD_FROM_LOOP_ERR_HANDLER (1); \ + } \ + } \ + else \ + { \ + /* base64 encoding active. */ \ + uint32_t i; \ + int shift; \ + \ + if (ch >= 'A' && ch <= 'Z') \ + i = ch - 'A'; \ + else if (ch >= 'a' && ch <= 'z') \ + i = ch - 'a' + 26; \ + else if (ch >= '0' && ch <= '9') \ + i = ch - '0' + 52; \ + else if (ch == '+') \ + i = 62; \ + else if (ch == '/') \ + i = 63; \ + else \ + { \ + /* Terminate base64 encoding. */ \ + \ + /* If accumulated data is nonzero, the input is invalid. 
*/ \ + /* Also, partial UTF-16 characters are invalid. */ \ + if (__builtin_expect (statep->__value.__wch != 0, 0) \ + || __builtin_expect ((statep->__count >> 3) <= 26, 0)) \ + { \ + STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \ + } \ + \ + if (ch == '-') \ + inptr++; \ + \ + statep->__count = 0; \ + continue; \ + } \ + \ + /* Concatenate the base64 integer i to the accumulator. */ \ + shift = (statep->__count >> 3); \ + if (shift > 6) \ + { \ + uint32_t wch; \ + \ + shift -= 6; \ + wch = statep->__value.__wch | (i << shift); \ + \ + if (shift <= 16 && shift > 10) \ + { \ + /* An UTF-16 character has just been completed. */ \ + uint32_t wc1 = wch >> 16; \ + \ + /* UTF-16: When we see a High Surrogate, we must also decode \ + the following Low Surrogate. */ \ + if (!(wc1 >= 0xd800 && wc1 < 0xdc00)) \ + { \ + wch = wch << 16; \ + shift += 16; \ + put32 (outptr, wc1); \ + outptr += 4; \ + } \ + } \ + else if (shift <= 10 && shift > 4) \ + { \ + /* After a High Surrogate, verify that the next 16 bit \ + indeed form a Low Surrogate. */ \ + uint32_t wc2 = wch & 0xffff; \ + \ + if (! __builtin_expect (wc2 >= 0xdc00 && wc2 < 0xe000, 1)) \ + { \ + STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1));\ + } \ + } \ + \ + statep->__value.__wch = wch; \ + } \ + else \ + { \ + /* An UTF-16 surrogate pair has just been completed. */ \ + uint32_t wc1 = (uint32_t) statep->__value.__wch >> 16; \ + uint32_t wc2 = ((uint32_t) statep->__value.__wch & 0xffff) \ + | (i >> (6 - shift)); \ + \ + statep->__value.__wch = (i << shift) << 26; \ + shift += 26; \ + \ + assert (wc1 >= 0xd800 && wc1 < 0xdc00); \ + assert (wc2 >= 0xdc00 && wc2 < 0xe000); \ + put32 (outptr, \ + 0x10000 + ((wc1 - 0xd800) << 10) + (wc2 - 0xdc00)); \ + outptr += 4; \ + } \ + \ + statep->__count = shift << 3; \ + \ + /* Now that we digested the input increment the input pointer. 
*/ \ + inptr++; \ + } \ + } +#define LOOP_NEED_FLAGS +#define EXTRA_LOOP_DECLS , mbstate_t *statep +#include + + +/* Next, define the conversion from UCS4 to UTF-7. + The state is structured as follows: + __count bit 2..0: zero + __count bit 4..3: shift + __count bit 8..5: data + Precise meaning: + shift data + 0 0 not inside base64 encoding + 1 0 inside base64, no pending bits + 2 XX00 inside base64, 2 bits known for next byte + 3 XXXX inside base64, 4 bits known for next byte + + __count bit 2..0 and __wch are always zero, because this direction + never returns __GCONV_INCOMPLETE_INPUT. +*/ +#define MIN_NEEDED_INPUT MIN_NEEDED_TO +#define MAX_NEEDED_INPUT MAX_NEEDED_TO +#define MIN_NEEDED_OUTPUT MIN_NEEDED_FROM +#define MAX_NEEDED_OUTPUT MAX_NEEDED_FROM +#define LOOPFCT TO_LOOP +#define BODY \ + { \ + uint32_t ch = get32 (inptr); \ + \ + if ((statep->__count & 0x18) == 0) \ + { \ + /* base64 encoding inactive */ \ + if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \ + { \ + *outptr++ = (unsigned char) ch; \ + } \ + else \ + { \ + size_t count; \ + \ + if (ch == '+') \ + count = 2; \ + else if (ch < 0x10000) \ + count = 3; \ + else if (ch < 0x110000) \ + count = 6; \ + else \ + STANDARD_TO_LOOP_ERR_HANDLER (4); \ + \ + if (__glibc_unlikely (outptr + count > outend)) \ + { \ + result = __GCONV_FULL_OUTPUT; \ + break; \ + } \ + \ + *outptr++ = '+'; \ + if (ch == '+') \ + *outptr++ = '-'; \ + else if (ch < 0x10000) \ + { \ + *outptr++ = base64 (ch >> 10); \ + *outptr++ = base64 ((ch >> 4) & 0x3f); \ + statep->__count = ((ch & 15) << 5) | (3 << 3); \ + } \ + else if (ch < 0x110000) \ + { \ + uint32_t ch1 = 0xd800 + ((ch - 0x10000) >> 10); \ + uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \ + \ + ch = (ch1 << 16) | ch2; \ + *outptr++ = base64 (ch >> 26); \ + *outptr++ = base64 ((ch >> 20) & 0x3f); \ + *outptr++ = base64 ((ch >> 14) & 0x3f); \ + *outptr++ = base64 ((ch >> 8) & 0x3f); \ + *outptr++ = base64 ((ch >> 2) & 0x3f); \ + statep->__count = 
((ch & 3) << 7) | (2 << 3); \ + } \ + else \ + abort (); \ + } \ + } \ + else \ + { \ + /* base64 encoding active */ \ + if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \ + { \ + /* deactivate base64 encoding */ \ + size_t count; \ + \ + count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \ + if (__glibc_unlikely (outptr + count > outend)) \ + { \ + result = __GCONV_FULL_OUTPUT; \ + break; \ + } \ + \ + if ((statep->__count & 0x18) >= 0x10) \ + *outptr++ = base64 ((statep->__count >> 3) & ~3); \ + if (isxbase64 (ch)) \ + *outptr++ = '-'; \ + *outptr++ = (unsigned char) ch; \ + statep->__count = 0; \ + } \ + else \ + { \ + size_t count; \ + \ + if (ch < 0x10000) \ + count = ((statep->__count & 0x18) >= 0x10 ? 3 : 2); \ + else if (ch < 0x110000) \ + count = ((statep->__count & 0x18) >= 0x18 ? 6 : 5); \ + else \ + STANDARD_TO_LOOP_ERR_HANDLER (4); \ + \ + if (__glibc_unlikely (outptr + count > outend)) \ + { \ + result = __GCONV_FULL_OUTPUT; \ + break; \ + } \ + \ + if (ch < 0x10000) \ + { \ + switch ((statep->__count >> 3) & 3) \ + { \ + case 1: \ + *outptr++ = base64 (ch >> 10); \ + *outptr++ = base64 ((ch >> 4) & 0x3f); \ + statep->__count = ((ch & 15) << 5) | (3 << 3); \ + break; \ + case 2: \ + *outptr++ = \ + base64 (((statep->__count >> 3) & ~3) | (ch >> 12)); \ + *outptr++ = base64 ((ch >> 6) & 0x3f); \ + *outptr++ = base64 (ch & 0x3f); \ + statep->__count = (1 << 3); \ + break; \ + case 3: \ + *outptr++ = \ + base64 (((statep->__count >> 3) & ~3) | (ch >> 14)); \ + *outptr++ = base64 ((ch >> 8) & 0x3f); \ + *outptr++ = base64 ((ch >> 2) & 0x3f); \ + statep->__count = ((ch & 3) << 7) | (2 << 3); \ + break; \ + default: \ + abort (); \ + } \ + } \ + else if (ch < 0x110000) \ + { \ + uint32_t ch1 = 0xd800 + ((ch - 0x10000) >> 10); \ + uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \ + \ + ch = (ch1 << 16) | ch2; \ + switch ((statep->__count >> 3) & 3) \ + { \ + case 1: \ + *outptr++ = base64 (ch >> 26); \ + *outptr++ = base64 
((ch >> 20) & 0x3f); \ + *outptr++ = base64 ((ch >> 14) & 0x3f); \ + *outptr++ = base64 ((ch >> 8) & 0x3f); \ + *outptr++ = base64 ((ch >> 2) & 0x3f); \ + statep->__count = ((ch & 3) << 7) | (2 << 3); \ + break; \ + case 2: \ + *outptr++ = \ + base64 (((statep->__count >> 3) & ~3) | (ch >> 28)); \ + *outptr++ = base64 ((ch >> 22) & 0x3f); \ + *outptr++ = base64 ((ch >> 16) & 0x3f); \ + *outptr++ = base64 ((ch >> 10) & 0x3f); \ + *outptr++ = base64 ((ch >> 4) & 0x3f); \ + statep->__count = ((ch & 15) << 5) | (3 << 3); \ + break; \ + case 3: \ + *outptr++ = \ + base64 (((statep->__count >> 3) & ~3) | (ch >> 30)); \ + *outptr++ = base64 ((ch >> 24) & 0x3f); \ + *outptr++ = base64 ((ch >> 18) & 0x3f); \ + *outptr++ = base64 ((ch >> 12) & 0x3f); \ + *outptr++ = base64 ((ch >> 6) & 0x3f); \ + *outptr++ = base64 (ch & 0x3f); \ + statep->__count = (1 << 3); \ + break; \ + default: \ + abort (); \ + } \ + } \ + else \ + abort (); \ + } \ + } \ + \ + /* Now that we wrote the output increment the input pointer. */ \ + inptr += 4; \ + } +#define LOOP_NEED_FLAGS +#define EXTRA_LOOP_DECLS , mbstate_t *statep +#include + + +/* Since this is a stateful encoding we have to provide code which resets + the output state to the initial state. This has to be done during the + flushing. */ +#define EMIT_SHIFT_TO_INIT \ + if (FROM_DIRECTION) \ + /* Nothing to emit. */ \ + memset (data->__statep, '\0', sizeof (mbstate_t)); \ + else \ + { \ + /* The "to UTF-7" direction. Flush the remaining bits and terminate \ + with a '-' byte. This will guarantee correct decoding if more \ + UTF-7 encoded text is added afterwards. */ \ + int state = data->__statep->__count; \ + \ + if (state & 0x18) \ + { \ + /* Deactivate base64 encoding. */ \ + size_t count = ((state & 0x18) >= 0x10) + 1; \ + \ + if (__glibc_unlikely (outbuf + count > outend)) \ + /* We don't have enough room in the output buffer. */ \ + status = __GCONV_FULL_OUTPUT; \ + else \ + { \ + /* Write out the shift sequence. 
*/ \ + if ((state & 0x18) >= 0x10) \ + *outbuf++ = base64 ((state >> 3) & ~3); \ + *outbuf++ = '-'; \ + \ + data->__statep->__count = 0; \ + } \ + } \ + else \ + data->__statep->__count = 0; \ + } + + +/* Now define the toplevel functions. */ +#include From patchwork Mon Jan 25 09:02:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Gautier X-Patchwork-Id: 41801 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 40C2739450F1; Mon, 25 Jan 2021 09:02:58 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 40C2739450F1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1611565378; bh=5pdZALaAEZbmUgVJoV0lhbcxLUEd6wcKKBTF46HBBAA=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=PwxXu5y4ve5VmS1fElxGCWa2DjaUX89mIslXXrFDR1u2T9LXKYX/PZI5WwhGgCbuY uCwNXhWXovCKUBoH5BxQY6hxf5fM9QVhSZllR0Rzeny2EBtrgJ1MHtX1GSp83t7FIZ s4aYRX7KaaCmK1jJZCgpl5Cm/7TaK1L1sk5nXp8o= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [IPv6:2001:67c:2050::465:201]) by sourceware.org (Postfix) with ESMTPS id 3FE213840C1E for ; Mon, 25 Jan 2021 09:02:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 3FE213840C1E Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4DPP5p2L2gzQlYt; Mon, 25 Jan 2021 10:02:54 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: 
from smtp2.mailbox.org ([80.241.60.241]) by spamfilter02.heinlein-hosting.de (spamfilter02.heinlein-hosting.de [80.241.56.116]) (amavisd-new, port 10030) with ESMTP id 3MbSCM-MTcAk; Mon, 25 Jan 2021 10:02:52 +0100 (CET) To: libc-alpha@sourceware.org Subject: [PATCH v3 2/5] Update gconv-modules file Date: Mon, 25 Jan 2021 10:02:23 +0100 Message-Id: <20210125090226.39967-3-mg@max.gautier.name> In-Reply-To: <20210125090226.39967-1-mg@max.gautier.name> References: <87y2m9agmm.fsf@mid.deneb.enyo.de> <20210125090226.39967-1-mg@max.gautier.name> MIME-Version: 1.0 X-MBO-SPAM-Probability: X-Rspamd-Score: -4.51 / 15.00 / 15.00 X-Rspamd-Queue-Id: 76FDA17CE X-Rspamd-UID: ea4a25 X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Max Gautier via Libc-alpha From: Max Gautier Reply-To: Max Gautier Cc: Max Gautier Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha"

---
 iconvdata/gconv-modules | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/iconvdata/gconv-modules b/iconvdata/gconv-modules
index 8540225b1c..da107a1a56 100644
--- a/iconvdata/gconv-modules
+++ b/iconvdata/gconv-modules
@@ -1534,6 +1534,10 @@ alias UTF7// UTF-7//
 module  UTF-7//         INTERNAL        UTF-7           1
 module  INTERNAL        UTF-7//         UTF-7           1
 
+#       from            to              module          cost
+module  IMAP-UTF-7//    INTERNAL        IMAP-UTF-7      1
+module  INTERNAL        IMAP-UTF-7//    IMAP-UTF-7      1
+
 #       from            to              module          cost
 module  GB18030//       INTERNAL        GB18030         1
 module  INTERNAL        GB18030//       GB18030         1

From patchwork Mon Jan 25 09:02:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Gautier X-Patchwork-Id: 41803 To: libc-alpha@sourceware.org Subject: [PATCH v3 3/5] Transform UTF-7 to IMAP-UTF-7 Date: Mon, 25 Jan 2021 10:02:24 +0100 Message-Id: 
<20210125090226.39967-4-mg@max.gautier.name> In-Reply-To: <20210125090226.39967-1-mg@max.gautier.name> References: <87y2m9agmm.fsf@mid.deneb.enyo.de> <20210125090226.39967-1-mg@max.gautier.name> MIME-Version: 1.0 X-MBO-SPAM-Probability: X-Rspamd-Score: -5.76 / 15.00 / 15.00 X-Rspamd-Queue-Id: 0977A186E X-Rspamd-UID: 504a05 X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Max Gautier via Libc-alpha From: Max Gautier Reply-To: Max Gautier Cc: Max Gautier Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha"

* The shift character is '&' instead of '+'.
* There is no "optional direct characters" set.
* The base64 character set is modified (',' replaces '/').
* Direct comparisons replace the lookup arrays and bitwise operations
  (if there is a good reason to use those, please let me know).
---
 iconvdata/Makefile                           |  2 +-
 iconvdata/{modified-utf-7.c => imap-utf-7.c} | 97 +++++++-------------
 2 files changed, 32 insertions(+), 67 deletions(-)
 rename iconvdata/{modified-utf-7.c => imap-utf-7.c} (87%)

diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index 7f932e10ed..c9c1a3a006 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -61,7 +61,7 @@ modules := ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4 ISO8859-5 \
            IBM5347 IBM9030 IBM9066 IBM9448 IBM12712 IBM16804 \
            IBM1364 IBM1371 IBM1388 IBM1390 IBM1399 ISO_11548-1 MIK BRF \
            MAC-CENTRALEUROPE KOI8-RU ISO8859-9E \
-           CP770 CP771 CP772 CP773 CP774 MODIFIED-UTF-7
+           CP770 CP771 CP772 CP773 CP774 IMAP-UTF-7
 
 # If lazy binding is disabled, use BIND_NOW for the gconv modules.
ifeq ($(bind-now),yes) diff --git a/iconvdata/modified-utf-7.c b/iconvdata/imap-utf-7.c similarity index 87% rename from iconvdata/modified-utf-7.c rename to iconvdata/imap-utf-7.c index fc6a8dfcfd..ebd66d3388 100644 --- a/iconvdata/modified-utf-7.c +++ b/iconvdata/imap-utf-7.c @@ -1,4 +1,4 @@ -/* Conversion module for UTF-7. +/* Conversion module for IMAP-UTF-7. Copyright (C) 2000-2020 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,12 +16,12 @@ License along with the GNU C Library; if not, see . */ -/* UTF-7 is a legacy encoding used for transmitting Unicode within the - ASCII character set, used primarily by mail agents. New programs - are encouraged to use UTF-8 instead. +/* IMAP-UTF-7 is a legacy encoding used for transmitting Unicode within the + ASCII character set, used primarily by IMAP server and clients agents. + New programs are encouraged to use UTF-8 instead. - UTF-7 is specified in RFC 2152 (and old RFC 1641, RFC 1642). The - original Base64 encoding is defined in RFC 2045. */ + IMAP-UTF-7 is specified in RFC 3501 as part of the IMAPv4 specification. + The original Base64 encoding is defined in RFC 2045. */ #include #include @@ -29,64 +29,29 @@ #include -/* Define this to 1 if you want the so-called "optional direct" characters - ! " # $ % & * ; < = > @ [ ] ^ _ ` { | } - to be encoded. Define to 0 if you want them to be passed straight - through, like the so-called "direct" characters. - We set this to 1 because it's safer. - */ -#define UTF7_ENCODE_OPTIONAL_CHARS 1 - - /* The set of "direct characters": A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr + ! 
" # $ % + * ; < = > @ [ ] ^ _ ` { | } */ -static const unsigned char direct_tab[128 / 8] = - { - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87, - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07 - }; - static int isdirect (uint32_t ch) { - return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1)); -} - - -/* The set of "direct and optional direct characters": - A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr - ! " # $ % & * ; < = > @ [ ] ^ _ ` { | } -*/ - -static const unsigned char xdirect_tab[128 / 8] = - { - 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff, - 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f - }; - -static int -isxdirect (uint32_t ch) -{ - return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1)); + return ((ch == '\n' || ch == '\t' || ch == '\r') + || (ch >= 0x20 && ch <= 0x7e && ch != '&')); } - -/* The set of "extended base64 characters": - A-Z a-z 0-9 + / - +/* The set of "modified base64 characters": + A-Z a-z 0-9 + , - */ -static const unsigned char xbase64_tab[128 / 8] = - { - 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03, - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07 - }; - static int -isxbase64 (uint32_t ch) +ismbase64 (uint32_t ch) { - return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1)); + return ((ch >= 'a' && ch <= 'z') + || (ch >= 'A' && ch <= 'Z') + || (ch >= '0' && ch <= '9') + || (ch == '+' || ch == ',')); } @@ -103,18 +68,18 @@ base64 (unsigned int i) else if (i == 62) return '+'; else if (i == 63) - return '/'; + return ','; else abort (); } /* Definitions used in the body of the `gconv' function. 
 */
-#define CHARSET_NAME "UTF-7//"
+#define CHARSET_NAME "IMAP-UTF-7//"
 #define DEFINE_INIT 1
 #define DEFINE_FINI 1
-#define FROM_LOOP from_utf7_loop
-#define TO_LOOP to_utf7_loop
+#define FROM_LOOP from_imap_utf7_loop
+#define TO_LOOP to_imap_utf7_loop
 #define MIN_NEEDED_FROM 1
 #define MAX_NEEDED_FROM 6
 #define MIN_NEEDED_TO 4
@@ -161,13 +126,13 @@ base64 (unsigned int i)
     if ((statep->__count >> 3) == 0) \
       { \
         /* base64 encoding inactive. */ \
-        if (isxdirect (ch)) \
+        if (isdirect (ch)) \
           { \
             inptr++; \
             put32 (outptr, ch); \
             outptr += 4; \
           } \
-        else if (__glibc_likely (ch == '+')) \
+        else if (__glibc_likely (ch == '&')) \
           { \
             if (__glibc_unlikely (inptr + 2 > inend)) \
               { \
@@ -209,7 +174,7 @@ base64 (unsigned int i)
               i = ch - '0' + 52; \
             else if (ch == '+') \
               i = 62; \
-            else if (ch == '/') \
+            else if (ch == ',') \
               i = 63; \
             else \
               { \
@@ -323,7 +288,7 @@ base64 (unsigned int i)
     if ((statep->__count & 0x18) == 0) \
       { \
         /* base64 encoding inactive */ \
-        if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
+        if (isdirect (ch)) \
           { \
             *outptr++ = (unsigned char) ch; \
           } \
@@ -331,7 +296,7 @@ base64 (unsigned int i)
           { \
             size_t count; \
 \
-            if (ch == '+') \
+            if (ch == '&') \
               count = 2; \
             else if (ch < 0x10000) \
               count = 3; \
@@ -346,8 +311,8 @@ base64 (unsigned int i)
                 break; \
               } \
 \
-            *outptr++ = '+'; \
-            if (ch == '+') \
+            *outptr++ = '&'; \
+            if (ch == '&') \
               *outptr++ = '-'; \
             else if (ch < 0x10000) \
               { \
@@ -375,12 +340,12 @@ base64 (unsigned int i)
     else \
       { \
         /* base64 encoding active */ \
-        if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
+        if (isdirect (ch)) \
           { \
             /* deactivate base64 encoding */ \
             size_t count; \
 \
-            count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
+            count = ((statep->__count & 0x18) >= 0x10) + ismbase64 (ch) + 1; \
             if (__glibc_unlikely (outptr + count > outend)) \
               { \
                 result = __GCONV_FULL_OUTPUT; \
@@ -389,7 +354,7 @@ base64 (unsigned int i)
 \
             if ((statep->__count & 0x18) >= 0x10) \
               *outptr++ = base64 ((statep->__count >> 3) & ~3); \
-            if (isxbase64 (ch)) \
+            if (ismbase64 (ch)) \
               *outptr++ = '-'; \
             *outptr++ = (unsigned char) ch; \
             statep->__count = 0; \
@@ -499,7 +464,7 @@ base64 (unsigned int i)
       memset (data->__statep, '\0', sizeof (mbstate_t)); \
     else \
       { \
-        /* The "to UTF-7" direction. Flush the remaining bits and terminate \
+        /* The "to M-UTF-7" direction. Flush the remaining bits and terminate \
            with a '-' byte. This will guarantee correct decoding if more \
            UTF-7 encoded text is added afterwards. */ \
         int state = data->__statep->__count; \

From patchwork Mon Jan 25 09:02:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Gautier
X-Patchwork-Id: 41804
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8DE183945C0B; Mon, 25 Jan 2021 09:03:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8DE183945C0B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1611565385; bh=A9sIqJ5IPCZXqH5nSsFNggWQMoZO4ytUwm7WF5y2cLY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=CZW75vZ6vR/0mac6CkW/foGJmfK6iBJhs2PRUY0OLkxy62HcX7/g4t6ZBrQI3Dkc5 6uY9jZaRlk1wuF33XmqS/nYGkzYZ+XsgiPKTLS9GBhgIWw8IttkrLz2ERkY95wMLDA zmxkVBe2HG7sIerjgKm14v6e38fh4A9YKFm1L9CU=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171]) by sourceware.org (Postfix) with ESMTPS id 6BD3439450FD for ; Mon, 25 Jan 2021 09:03:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6BD3439450FD
Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4DPP5w357GzQlYY; Mon, 25 Jan 2021 10:03:00 +0100 (CET)
X-Virus-Scanned: amavisd-new at heinlein-support.de
Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id LTDPcmsR9ANv; Mon, 25 Jan 2021 10:02:57 +0100 (CET)
To: libc-alpha@sourceware.org
Subject: [PATCH v3 4/5] Make terminating base64 sequences mandatory
Date: Mon, 25 Jan 2021 10:02:25 +0100
Message-Id: <20210125090226.39967-5-mg@max.gautier.name>
In-Reply-To: <20210125090226.39967-1-mg@max.gautier.name>
References: <87y2m9agmm.fsf@mid.deneb.enyo.de> <20210125090226.39967-1-mg@max.gautier.name>
MIME-Version: 1.0
X-MBO-SPAM-Probability:
X-Rspamd-Score: -4.11 / 15.00 / 15.00
X-Rspamd-Queue-Id: 7036A17D7
X-Rspamd-UID: f378e0
X-Spam-Status: No, score=-13.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: Max Gautier via Libc-alpha
From: Max Gautier
Reply-To: Max Gautier
Cc: Max Gautier
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha"

In the modified UTF-7 encoding, unlike in UTF-7, one MUST terminate all
base64 sequences with the '-' character.

MODIFIED-UTF-7 -> INTERNAL : make unterminated sequences illegal
INTERNAL -> MODIFIED-UTF-7 : always terminate the sequences
---
 iconvdata/imap-utf-7.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/iconvdata/imap-utf-7.c b/iconvdata/imap-utf-7.c
index ebd66d3388..45629a48f8 100644
--- a/iconvdata/imap-utf-7.c
+++ b/iconvdata/imap-utf-7.c
@@ -176,7 +176,7 @@ base64 (unsigned int i)
               i = 62; \
             else if (ch == ',') \
               i = 63; \
-            else \
+            else if (ch == '-') \
               { \
                 /* Terminate base64 encoding. */ \
 \
@@ -188,12 +188,14 @@ base64 (unsigned int i)
                     STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
                   } \
 \
-                if (ch == '-') \
-                  inptr++; \
+                inptr++; \
 \
                 statep->__count = 0; \
                 continue; \
               } \
+            else \
+              STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
+            /* Terminating '-' is required */ \
 \
             /* Concatenate the base64 integer i to the accumulator. */ \
             shift = (statep->__count >> 3); \
@@ -354,8 +356,7 @@ base64 (unsigned int i)
 \
             if ((statep->__count & 0x18) >= 0x10) \
               *outptr++ = base64 ((statep->__count >> 3) & ~3); \
-            if (ismbase64 (ch)) \
-              *outptr++ = '-'; \
+            *outptr++ = '-'; \
             *outptr++ = (unsigned char) ch; \
             statep->__count = 0; \
           } \

From patchwork Tue Mar 16 14:39:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Siddhesh Poyarekar
X-Patchwork-Id: 42588
Return-Path:
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5F353385043E; Tue, 16 Mar 2021 14:40:01 +0000 (GMT)
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from bumble.maple.relay.mailchannels.net (bumble.maple.relay.mailchannels.net [23.83.214.25]) by sourceware.org (Postfix) with ESMTPS id BC42B385043E for ; Tue, 16 Mar 2021 14:39:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org BC42B385043E
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gotplt.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=siddhesh@gotplt.org
X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org
Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id 5E7457E3504 for ; Tue, 16 Mar 2021 14:39:57 +0000 (UTC)
Received: from pdx1-sub0-mail-a59.g.dreamhost.com (100-96-16-31.trex.outbound.svc.cluster.local [100.96.16.31]) (Authenticated sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id E36467E36AB for ; Tue, 16 Mar 2021 14:39:56 +0000 (UTC)
X-Sender-Id: dreamhost|x-authsender|siddhesh@gotplt.org
Received: from pdx1-sub0-mail-a59.g.dreamhost.com (pop.dreamhost.com [64.90.62.162]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384) by 100.96.16.31 (trex/6.1.1); Tue, 16 Mar 2021 14:39:57 +0000
X-MC-Relay: Neutral
X-MailChannels-SenderId: dreamhost|x-authsender|siddhesh@gotplt.org
X-MailChannels-Auth-Id: dreamhost
X-Grain-Rock: 746065fc63ede409_1615905597243_2368290666
X-MC-Loop-Signature: 1615905597243:2158554724
X-MC-Ingress-Time: 1615905597243
Received: from pdx1-sub0-mail-a59.g.dreamhost.com (localhost [127.0.0.1]) by pdx1-sub0-mail-a59.g.dreamhost.com (Postfix) with ESMTP id 22E688A6A4 for ; Tue, 16 Mar 2021 07:39:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gotplt.org; h=from:subject :to:references:message-id:date:mime-version:in-reply-to :content-type:content-transfer-encoding; s=gotplt.org; bh=HfEKUO hHN7ZYGD9ZwFi+uvU+CxQ=; b=AnzNj/JXY9rGJ8/lsvdOj69urTGU4inykAgn4B SCxMMvbATfTA+XdR4VdhjMG3Z6smfWoQLQHQOqhXGpkA4aMrn2kntEAfjDRaF2Zg d/lEtrJgvV6IDYIO9+h08uihJnquA+AF7N9NEs0XjMFGmgski00vp/IrRFHHeIEQ BxZao=
Received: from [192.168.1.111] (unknown [1.186.101.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a59.g.dreamhost.com (Postfix) with ESMTPSA id 14A928A6A2 for ; Tue, 16 Mar 2021 07:39:54 -0700 (PDT)
X-DH-BACKEND: pdx1-sub0-mail-a59
From: Siddhesh Poyarekar
Subject: [PATCH v3 5/5][pw utf test] Add test case for IMAP-UTF-7
To: libc-alpha@sourceware.org
References: <87y2m9agmm.fsf@mid.deneb.enyo.de> <20210125090226.39967-1-mg@max.gautier.name>
Message-ID:
Date: Tue, 16 Mar 2021 20:09:50 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0
MIME-Version: 1.0
In-Reply-To: <20210125090226.39967-1-mg@max.gautier.name>
Content-Language: en-US
X-Spam-Status: No, score=-3034.5 required=5.0 tests=BAYES_05, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha"

---
 iconvdata/TESTS | 1 +
 iconvdata/testdata/IMAP-UTF-7 | 25 +++++++++++++++++++++++++
 iconvdata/testdata/IMAP-UTF-7..UTF8 | 25 +++++++++++++++++++++++++
 3 files changed, 51 insertions(+)
 create mode 100644 iconvdata/testdata/IMAP-UTF-7
 create mode 100644 iconvdata/testdata/IMAP-UTF-7..UTF8

diff --git a/iconvdata/TESTS b/iconvdata/TESTS
index 74b82f1409..96d425219f 100644
--- a/iconvdata/TESTS
+++ b/iconvdata/TESTS
@@ -95,6 +95,7 @@ EUC-TW EUC-TW Y UTF8
 GBK GBK Y UTF8
 BIG5HKSCS BIG5HKSCS Y UTF8
 UTF-7 UTF-7 N UTF8
+IMAP-UTF-7 IMAP-UTF-7 N UTF8
 IBM856 IBM856 N UTF8
 IBM922 IBM922 Y UTF8
 IBM930 IBM930 N UTF8
diff --git a/iconvdata/testdata/IMAP-UTF-7 b/iconvdata/testdata/IMAP-UTF-7
new file mode 100644
index 0000000000..4b03e4ae57
--- /dev/null
+++ b/iconvdata/testdata/IMAP-UTF-7
@@ -0,0 +1,25 @@
+&EqASGxItEps- Amharic
+&AQ0-esky Czech
+Dansk Danish
+English English
+Suomi Finnish
+Fran&AOc-ais French
+Deutsch German
+&A5UDuwO7A7cDvQO5A7oDrA- Greek
+&BeIF0QXoBdkF6g- Hebrew
+Italiano Italian
+Norsk Norwegian
+&BCAEQwRBBEEEOgQ4BDk- Russian
+Espa&APE-ol Spanish
+Svenska Swedish
+&DiAOMg4pDjIORA4XDiI- Thai
+T&APw-rk&AOc-e Turkish
+Ti&Hr8-ng Vi&Hsc-t Vietnamese
+&ZeVnLIqe- Japanese
+&Ti1lhw- Chinese
+&1VyuAA- Korean
+
+// The last line of this file is missing the end-of-line terminator
+// on purpose, in order to test that the conversion empties the bit buffer
+// and shifts back to the initial state at the end of the conversion.
+A&ImIDkQ-
\ No newline at end of file
diff --git a/iconvdata/testdata/IMAP-UTF-7..UTF8 b/iconvdata/testdata/IMAP-UTF-7..UTF8
new file mode 100644
index 0000000000..3b362e578c
--- /dev/null
+++ b/iconvdata/testdata/IMAP-UTF-7..UTF8
@@ -0,0 +1,25 @@
+አማርኛ Amharic
+česky Czech
+Dansk Danish
+English English
+Suomi Finnish
+Français French
+Deutsch German
+Ελληνικά Greek
+עברית Hebrew
+Italiano Italian
+Norsk Norwegian
+Русский Russian
+Español Spanish
+Svenska Swedish
+ภาษาไทย Thai
+Türkçe Turkish
+Tiếng Việt Vietnamese
+日本語 Japanese
+中文 Chinese
+한글 Korean
+
+// The last line of this file is missing the end-of-line terminator
+// on purpose, in order to test that the conversion empties the bit buffer
+// and shifts back to the initial state at the end of the conversion.
+A≢Α
\ No newline at end of file
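
For readers following the series, the decoding rules it establishes ('&' as
the shift character, ',' instead of '/' in the base64 alphabet, "&-" for a
literal '&', and the mandatory '-' terminator from patch 4/5) can be sketched
as a small standalone decoder. This is only an illustrative sketch, not the
gconv module from the patches: the names mbase64_value and mutf7_decode are
hypothetical, the output is raw UTF-16 code units rather than glibc's internal
UCS-4, and the direct-character set is assumed to be printable ASCII, which
may differ from what isdirect () accepts in imap-utf-7.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of CH in the modified base64 alphabet
   (',' takes the place of '/'), or -1 if CH does not belong to it.  */
static int
mbase64_value (unsigned char ch)
{
  if (ch >= 'A' && ch <= 'Z')
    return ch - 'A';
  if (ch >= 'a' && ch <= 'z')
    return ch - 'a' + 26;
  if (ch >= '0' && ch <= '9')
    return ch - '0' + 52;
  if (ch == '+')
    return 62;
  if (ch == ',')
    return 63;
  return -1;
}

/* Hypothetical decoder: convert the NUL-terminated modified UTF-7 string IN
   to UTF-16 code units in OUT (room for OUTLEN units).  Returns the number
   of units written, or -1 on malformed input or insufficient space.  */
static long
mutf7_decode (const char *in, uint16_t *out, size_t outlen)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  while (*in != '\0')
    {
      unsigned char ch = (unsigned char) *in++;

      if (ch != '&')
        {
          /* Direct character; assumed here to be printable ASCII only.  */
          if (ch < 0x20 || ch > 0x7e || n == outlen)
            return -1;
          out[n++] = ch;
          continue;
        }
      if (*in == '-')
        {
          /* "&-" stands for a literal '&'.  */
          in++;
          if (n == outlen)
            return -1;
          out[n++] = '&';
          continue;
        }

      /* base64 run: collect 6 bits per character, emit 16-bit units.  */
      uint32_t acc = 0;
      int bits = 0;
      int v;
      while ((v = mbase64_value ((unsigned char) *in)) >= 0)
        {
          in++;
          acc = (acc << 6) | (uint32_t) v;
          bits += 6;
          if (bits >= 16)
            {
              bits -= 16;
              if (n == outlen)
                return -1;
              out[n++] = (uint16_t) (acc >> bits);
              acc &= (1u << bits) - 1;
            }
        }

      /* The terminating '-' is required, and leftover bits must be zero.  */
      if (*in != '-' || acc != 0)
        return -1;
      in++;
    }
  return (long) n;
}
```

With this sketch, "Fran&AOc-ais" from the test data yields eight code units
with U+00E7 in the fifth position, while the unterminated "&AOc" is rejected.
Unlike the test files, the sketch does not accept newlines as direct
characters.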