From patchwork Thu Dec 9 09:31:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Gautier
X-Patchwork-Id: 48706
To: libc-alpha@sourceware.org
Subject: [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7
Date: Thu, 9 Dec 2021 10:31:50 +0100
Message-Id: <20211209093152.313872-3-mg@max.gautier.name>
In-Reply-To: <20211209093152.313872-1-mg@max.gautier.name>
References: <87blcw9ptq.fsf@oldenburg.str.redhat.com>
 <20211209093152.313872-1-mg@max.gautier.name>
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Max Gautier via Libc-alpha
From: Max Gautier
Reply-To: Max Gautier
Cc: Max Gautier

- Direct use of characters instead of arcane arrays
- isxbase64 is not the Modified BASE64 alphabet, but the set of characters
  that need to trigger an explicit shift back to US-ASCII. Rename it to
  needs_explicit_shift to make that clearer.

Signed-off-by: Max Gautier
Reviewed-by: Adhemerval Zanella
---
 iconvdata/utf-7.c | 56 +++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 9ba0974959..ac7d78141a 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -30,20 +30,27 @@
 
 
 
+static int
+between(uint32_t const ch,
+        uint32_t const lower_bound, uint32_t const upper_bound)
+{
+  return (ch >= lower_bound && ch <= upper_bound);
+}
+
 /* The set of "direct characters":
    A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
 */
 
-static const unsigned char direct_tab[128 / 8] =
-  {
-    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
-    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
-  };
-
 static int
 isdirect (uint32_t ch)
 {
-  return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
+  return (between(ch, 'A', 'Z')
+          || between(ch, 'a', 'z')
+          || between(ch, '0', '9')
+          || ch == '\'' || ch == '(' || ch == ')'
+          || between(ch, ',', '/')
+          || ch == ':' || ch == '?'
+          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
 }
 
 
@@ -52,33 +59,33 @@ isdirect (uint32_t ch)
    ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
 */
 
-static const unsigned char xdirect_tab[128 / 8] =
-  {
-    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
-    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
-  };
 
 static int
 isxdirect (uint32_t ch)
 {
-  return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
+  return (ch == '\t'
+          || ch == '\n'
+          || ch == '\r'
+          || (between(ch, ' ','}')
+              && ch != '+' && ch != '\\')
+         );
 }
 
 
-/* The set of "extended base64 characters":
+/* Characters which need to trigger an explicit shift back to US-ASCII (UTF-7
+   only): Modified base64 + '-' (shift back character)
    A-Z a-z 0-9 + / -
 */
 
-static const unsigned char xbase64_tab[128 / 8] =
-  {
-    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
-    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
-  };
-
 static int
-isxbase64 (uint32_t ch)
+needs_explicit_shift (uint32_t ch)
 {
-  return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
+  return (between(ch, 'A', 'Z')
+          || between(ch, 'a', 'z')
+          || between(ch, '/', '9')
+          || ch == '+'
+          || ch == '-'
+         );
 }
 
 
@@ -372,7 +379,8 @@ base64 (unsigned int i)
             /* deactivate base64 encoding */                                  \
             size_t count;                                                     \
                                                                               \
-            count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1;  \
+            count = ((statep->__count & 0x18) >= 0x10)                        \
+              + needs_explicit_shift (ch) + 1;                                \
             if (__glibc_unlikely (outptr + count > outend))                   \
               {                                                               \
                 result = __GCONV_FULL_OUTPUT;                                 \
@@ -381,7 +389,7 @@ base64 (unsigned int i)
                                                                               \
             if ((statep->__count & 0x18) >= 0x10)                             \
               *outptr++ = base64 ((statep->__count >> 3) & ~3);               \
-            if (isxbase64 (ch))                                               \
+            if (needs_explicit_shift (ch))                                    \
               *outptr++ = '-';                                                \
             *outptr++ = (unsigned char) ch;                                   \
             statep->__count = 0;                                              \
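
Since the patch replaces three bitmap tables with explicit comparisons, a reviewer may want to confirm that the new predicates accept exactly the same characters as the old lookups. The following is a minimal sketch of such a check and is not part of the patch. The tables are copied from the removed lines and the predicates from the added lines. The helper names tab_lookup, count_mismatches and assert_equivalent are hypothetical and introduced here only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap tables removed by the patch, copied from the '-' lines.  */
static const unsigned char direct_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };
static const unsigned char xdirect_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
  };
static const unsigned char xbase64_tab[128 / 8] =
  {
    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };

/* Hypothetical helper: the lookup expression shared by the three
   removed predicates, parameterized on the table.  */
static int
tab_lookup (const unsigned char *tab, uint32_t ch)
{
  return (ch < 128 && ((tab[ch >> 3] >> (ch & 7)) & 1));
}

/* Predicates as added by the patch.  */
static int
between (uint32_t const ch,
         uint32_t const lower_bound, uint32_t const upper_bound)
{
  return (ch >= lower_bound && ch <= upper_bound);
}

static int
isdirect (uint32_t ch)
{
  return (between (ch, 'A', 'Z')
          || between (ch, 'a', 'z')
          || between (ch, '0', '9')
          || ch == '\'' || ch == '(' || ch == ')'
          || between (ch, ',', '/')
          || ch == ':' || ch == '?'
          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}

static int
isxdirect (uint32_t ch)
{
  return (ch == '\t' || ch == '\n' || ch == '\r'
          || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
}

static int
needs_explicit_shift (uint32_t ch)
{
  return (between (ch, 'A', 'Z')
          || between (ch, 'a', 'z')
          || between (ch, '/', '9')
          || ch == '+' || ch == '-');
}

/* Hypothetical helper: count the disagreements between a new predicate
   and the table it replaces, over the code points in [0, limit).  */
int
count_mismatches (uint32_t limit)
{
  int mismatches = 0;
  for (uint32_t ch = 0; ch < limit; ch++)
    {
      mismatches += (isdirect (ch) != tab_lookup (direct_tab, ch));
      mismatches += (isxdirect (ch) != tab_lookup (xdirect_tab, ch));
      mismatches += (needs_explicit_shift (ch)
                     != tab_lookup (xbase64_tab, ch));
    }
  return mismatches;
}

/* Hypothetical helper: abort if any code point below limit differs.  */
void
assert_equivalent (uint32_t limit)
{
  assert (count_mismatches (limit) == 0);
}
```

Calling assert_equivalent (0x110000) from a scratch main covers every Unicode code point. The comparison with != relies on both the old and the new predicates returning exactly 0 or 1, which is also what the count computation in the deactivation path depends on when it adds needs_explicit_shift (ch) to the byte count.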