From patchwork Wed Aug 19 23:07:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Gautier <mg@max.gautier.name>
X-Patchwork-Id: 40310
To: libc-alpha@sourceware.org
Subject: [PATCH 3/5] Transform UTF-7 to MODIFIED-UTF-7
Date: Thu, 20 Aug 2020 01:07:00 +0200
Message-Id: <20200819230702.229822-4-mg@max.gautier.name>
In-Reply-To: <20200819230702.229822-1-mg@max.gautier.name>
References: <20200819230702.229822-1-mg@max.gautier.name>
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
X-Patchwork-Original-From: Max Gautier via Libc-alpha
 <libc-alpha@sourceware.org>
From: Max Gautier <mg@max.gautier.name>
Reply-To: Max Gautier <mg@max.gautier.name>
Cc: Max Gautier <mg@max.gautier.name>
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org>

* The shift character is '&' instead of '+'.
* There is no "optional direct characters" set.
* The modified base64 alphabet uses ',' instead of '/'.
* Character classes are tested with direct comparisons instead of
  bitmap arrays and bitwise operations.
---
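For reviewers, the first three items can be illustrated with a small
standalone encoder.  This is only a sketch and not code from this patch:
`mutf7_encode' and `mb64' are hypothetical names, the input is assumed to be
UTF-16 code units, and the behaviour follows my reading of RFC 3501 section
5.1.3.  In particular it always closes a base64 run with '-' and treats only
0x20-0x7e as direct, whereas isdirect in this patch also passes tab, LF and
CR through.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modified base64 alphabet: ',' replaces '/' at index 63.  */
static const char mb64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Hypothetical helper: encode N UTF-16 code units from IN into OUT as
   Modified UTF-7.  Returns the number of bytes written (excluding the
   terminating NUL), or (size_t) -1 if OUT is too small.  */
static size_t
mutf7_encode (const uint16_t *in, size_t n, char *out, size_t outsize)
{
  size_t o = 0;
  uint32_t bits = 0;	/* Pending bits not yet emitted, right-aligned.  */
  int nbits = 0;	/* Number of pending bits, always < 6 between units.  */
  int shifted = 0;	/* Nonzero while inside an "&...-" run.  */

  /* Append one byte, always keeping room for the final NUL.  */
#define PUT(c) \
  do { if (o + 1 >= outsize) return (size_t) -1; out[o++] = (c); } while (0)

  for (size_t i = 0; i < n; i++)
    {
      uint16_t u = in[i];

      if (u >= 0x20 && u <= 0x7e)
	{
	  /* Printable ASCII is direct; leave base64 mode first.  */
	  if (shifted)
	    {
	      if (nbits > 0)
		PUT (mb64[(bits << (6 - nbits)) & 0x3f]);
	      PUT ('-');
	      shifted = 0;
	      bits = 0;
	      nbits = 0;
	    }
	  PUT ((char) u);
	  /* The shift character itself is written as "&-".  */
	  if (u == '&')
	    PUT ('-');
	}
      else
	{
	  /* Everything else goes through base64 of the UTF-16 value.  */
	  if (!shifted)
	    {
	      PUT ('&');
	      shifted = 1;
	    }
	  bits = (bits << 16) | u;
	  nbits += 16;
	  while (nbits >= 6)
	    {
	      nbits -= 6;
	      PUT (mb64[(bits >> nbits) & 0x3f]);
	    }
	  bits &= (1u << nbits) - 1;
	}
    }

  /* Flush any pending bits and close an open run.  */
  if (shifted)
    {
      if (nbits > 0)
	PUT (mb64[(bits << (6 - nbits)) & 0x3f]);
      PUT ('-');
    }
#undef PUT

  out[o] = '\0';
  return o;
}
```

With this sketch, the code units 'a', '&', 'b', U+53F0, U+5317, '/' come out
as "a&-b&U,BTFw-/", which shows the '&' shift, the "&-" escape and the ','
in the alphabet in one string.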
Regarding the fourth item, if there are reasons to prefer the bitwise
approach, please let me know.
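To make the question concrete, here is a sketch that puts the two styles side
by side for the set named in the "modified base64 characters" comment
(A-Z a-z 0-9 + , -).  The table `mbase64_tab' is hypothetical: I derived it
from the removed xbase64_tab by clearing the bit for '/' and setting the bit
for ','.  The function names and `count_mismatches' are likewise made up for
this comparison and are not part of the patch.

```c
#include <stdint.h>

/* Bitmap style: bit (ch & 7) of byte (ch >> 3) is set for members.
   Hypothetical table for A-Z a-z 0-9 + , -  */
static const unsigned char mbase64_tab[128 / 8] =
  {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x38, 0xff, 0x03,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };

static int
ismbase64_bitmap (uint32_t ch)
{
  return (ch < 128 && ((mbase64_tab[ch >> 3] >> (ch & 7)) & 1));
}

/* Direct comparison style, same set.  */
static int
ismbase64_compare (uint32_t ch)
{
  return ((ch >= 'a' && ch <= 'z')
	  || (ch >= 'A' && ch <= 'Z')
	  || (ch >= '0' && ch <= '9')
	  || ch == '+' || ch == ',' || ch == '-');
}

/* Count the inputs in [0, 256) on which the two predicates disagree.  */
static int
count_mismatches (void)
{
  int n = 0;

  for (uint32_t ch = 0; ch < 256; ch++)
    if (!!ismbase64_bitmap (ch) != !!ismbase64_compare (ch))
      n++;
  return n;
}
```

If count_mismatches returns 0, the two predicates are interchangeable as far
as results go, so the remaining question is only about readability versus
whatever performance benefit the table lookup may have.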
 iconvdata/modified-utf-7.c | 97 ++++++++++++--------------------------
 1 file changed, 31 insertions(+), 66 deletions(-)

diff --git a/iconvdata/modified-utf-7.c b/iconvdata/modified-utf-7.c
index fc6a8dfcfd..e6eb784891 100644
--- a/iconvdata/modified-utf-7.c
+++ b/iconvdata/modified-utf-7.c
@@ -1,4 +1,4 @@
-/* Conversion module for UTF-7.
+/* Conversion module for Modified UTF-7.
    Copyright (C) 2000-2020 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,12 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-/* UTF-7 is a legacy encoding used for transmitting Unicode within the
-   ASCII character set, used primarily by mail agents.  New programs
-   are encouraged to use UTF-8 instead.
+/* Modified UTF-7 is a legacy encoding used for transmitting Unicode within the
+   ASCII character set, used primarily by IMAP servers and clients.
+   New programs are encouraged to use UTF-8 instead.
 
-   UTF-7 is specified in RFC 2152 (and old RFC 1641, RFC 1642).  The
-   original Base64 encoding is defined in RFC 2045.  */
+   Modified UTF-7 is specified in RFC 3501 as part of the IMAPv4 specification.
+   The original Base64 encoding is defined in RFC 2045.  */
 
 #include <dlfcn.h>
 #include <gconv.h>
@@ -29,64 +29,29 @@
 #include <stdlib.h>
 
 
-/* Define this to 1 if you want the so-called "optional direct" characters
-      ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
-   to be encoded. Define to 0 if you want them to be passed straight
-   through, like the so-called "direct" characters.
-   We set this to 1 because it's safer.
- */
-#define UTF7_ENCODE_OPTIONAL_CHARS 1
-
-
 /* The set of "direct characters":
    A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
+   ! " # $ % + * ; < = > @ [ \ ] ^ _ ` { | } ~
 */
 
-static const unsigned char direct_tab[128 / 8] =
-  {
-    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
-    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
-  };
-
 static int
 isdirect (uint32_t ch)
 {
-  return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
-}
-
-
-/* The set of "direct and optional direct characters":
-   A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
-   ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
-*/
-
-static const unsigned char xdirect_tab[128 / 8] =
-  {
-    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
-    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
-  };
-
-static int
-isxdirect (uint32_t ch)
-{
-  return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
+  return ((ch == '\n' || ch == '\t' || ch == '\r')
+	  || (ch >= 0x20 && ch <= 0x7e && ch != '&'));
 }
 
-
-/* The set of "extended base64 characters":
-   A-Z a-z 0-9 + / -
+/* The set of "modified base64 characters":
+   A-Z a-z 0-9 + , -
 */
 
-static const unsigned char xbase64_tab[128 / 8] =
-  {
-    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
-    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
-  };
-
 static int
-isxbase64 (uint32_t ch)
+ismbase64 (uint32_t ch)
 {
-  return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
+  return ((ch >= 'a' && ch <= 'z')
+	  || (ch >= 'A' && ch <= 'Z')
+	  || (ch >= '0' && ch <= '9')
+	  || ch == '+' || ch == ',' || ch == '-');
 }
 
 
@@ -103,18 +68,18 @@ base64 (unsigned int i)
   else if (i == 62)
     return '+';
   else if (i == 63)
-    return '/';
+    return ',';
   else
     abort ();
 }
 
 
 /* Definitions used in the body of the `gconv' function.  */
-#define CHARSET_NAME		"UTF-7//"
+#define CHARSET_NAME		"MODIFIED-UTF-7//"
 #define DEFINE_INIT		1
 #define DEFINE_FINI		1
-#define FROM_LOOP		from_utf7_loop
-#define TO_LOOP			to_utf7_loop
+#define FROM_LOOP		from_m_utf7_loop
+#define TO_LOOP			to_m_utf7_loop
 #define MIN_NEEDED_FROM		1
 #define MAX_NEEDED_FROM		6
 #define MIN_NEEDED_TO		4
@@ -161,13 +126,13 @@ base64 (unsigned int i)
     if ((statep->__count >> 3) == 0)					      \
       {									      \
 	/* base64 encoding inactive.  */				      \
-	if (isxdirect (ch))						      \
+	if (isdirect (ch))						      \
 	  {								      \
 	    inptr++;							      \
 	    put32 (outptr, ch);						      \
 	    outptr += 4;						      \
 	  }								      \
-	else if (__glibc_likely (ch == '+'))				      \
+	else if (__glibc_likely (ch == '&'))				      \
 	  {								      \
 	    if (__glibc_unlikely (inptr + 2 > inend))			      \
 	      {								      \
@@ -209,7 +174,7 @@ base64 (unsigned int i)
 	  i = ch - '0' + 52;						      \
 	else if (ch == '+')						      \
 	  i = 62;							      \
-	else if (ch == '/')						      \
+	else if (ch == ',')						      \
 	  i = 63;							      \
 	else								      \
 	  {								      \
@@ -323,7 +288,7 @@ base64 (unsigned int i)
     if ((statep->__count & 0x18) == 0)					      \
       {									      \
 	/* base64 encoding inactive */					      \
-	if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch))      \
+	if (isdirect (ch))						      \
 	  {								      \
 	    *outptr++ = (unsigned char) ch;				      \
 	  }								      \
@@ -331,7 +296,7 @@ base64 (unsigned int i)
 	  {								      \
 	    size_t count;						      \
 									      \
-	    if (ch == '+')						      \
+	    if (ch == '&')						      \
 	      count = 2;						      \
 	    else if (ch < 0x10000)					      \
 	      count = 3;						      \
@@ -346,8 +311,8 @@ base64 (unsigned int i)
 		break;							      \
 	      }								      \
 									      \
-	    *outptr++ = '+';						      \
-	    if (ch == '+')						      \
+	    *outptr++ = '&';						      \
+	    if (ch == '&')						      \
 	      *outptr++ = '-';						      \
 	    else if (ch < 0x10000)					      \
 	      {								      \
@@ -375,12 +340,12 @@ base64 (unsigned int i)
     else								      \
       {									      \
 	/* base64 encoding active */					      \
-	if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch))      \
+	if (isdirect (ch))						      \
 	  {								      \
 	    /* deactivate base64 encoding */				      \
 	    size_t count;						      \
 									      \
-	    count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1;  \
+	    count = ((statep->__count & 0x18) >= 0x10) + ismbase64 (ch) + 1;  \
 	    if (__glibc_unlikely (outptr + count > outend))		      \
 	      {								      \
 		result = __GCONV_FULL_OUTPUT;				      \
@@ -389,7 +354,7 @@ base64 (unsigned int i)
 									      \
 	    if ((statep->__count & 0x18) >= 0x10)			      \
 	      *outptr++ = base64 ((statep->__count >> 3) & ~3);		      \
-	    if (isxbase64 (ch))						      \
+	    if (ismbase64 (ch))						      \
 	      *outptr++ = '-';						      \
 	    *outptr++ = (unsigned char) ch;				      \
 	    statep->__count = 0;					      \
@@ -499,7 +464,7 @@ base64 (unsigned int i)
     memset (data->__statep, '\0', sizeof (mbstate_t));			      \
   else									      \
     {									      \
-      /* The "to UTF-7" direction.  Flush the remaining bits and terminate    \
+      /* The "to M-UTF-7" direction.  Flush the remaining bits and terminate  \
 	 with a '-' byte.  This will guarantee correct decoding if more	      \
 	 UTF-7 encoded text is added afterwards.  */			      \
       int state = data->__statep->__count;				      \