From patchwork Tue Nov 2 09:06:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?0J3QuNC60LjRgtCwINCf0L7Qv9C+0LI=?=
X-Patchwork-Id: 46947
Date: Tue, 2 Nov 2021 14:06:58 +0500
Subject: [PATCH] gconv: Do not emit spurious NUL character in ISO-2022-JP-3
 (bug 28524)
To: libc-alpha@sourceware.org
From: =?utf-8?b?0J3QuNC60LjRgtCwINCf0L7Qv9C+0LI=?=

Hello,

I'm submitting a proposed patch for bug 28524.

From 2fa94ed223424fe62d1e3ef02a4b562e0e164eac Mon Sep 17 00:00:00 2001
From: Nikita Popov
Date: Tue, 2 Nov 2021 13:21:42 +0500
Subject: [PATCH] gconv: Do not emit spurious NUL character in ISO-2022-JP-3
 (bug 28524)

The fix for bug 27256 introduced another issue: when converting from
the ISO-2022-JP-3 encoding, iconv can be forced to emit an extra NUL
character on internal state reset.  To trigger this, it is sufficient
to feed iconv an escape sequence that switches the active character set.
The simplified check 'data->__statep->__count != ASCII_set' introduced
by that fix also matches this case and behaves as if a '\0' character
had been queued, and therefore emits it.

To eliminate this issue, the following steps are taken:

* Restore the original condition
  '(data->__statep->__count & ~7) != ASCII_set'.  This is necessary
  because bits 0-2 may contain the number of buffered input characters.
* Check that the queued character is not NUL.  A similar check is
  performed in the main conversion loop.

The bundled test case works as follows:

* Convert an ISO-2022-JP-3 escape sequence that switches the active
  character set.
* Reset the internal state by passing NULL as the input buffer.
* Verify that no output has been produced.

Signed-off-by: Nikita Popov
---
 iconvdata/Makefile        |  4 ++-
 iconvdata/bug-iconv15.c   | 55 +++++++++++++++++++++++++++++++++++++++
 iconvdata/iso-2022-jp-3.c | 27 +++++++++++++------
 3 files changed, 77 insertions(+), 9 deletions(-)
 create mode 100644 iconvdata/bug-iconv15.c

diff --git a/iconvdata/Makefile b/iconvdata/Makefile
index c216f959df..f7888de29c 100644
--- a/iconvdata/Makefile
+++ b/iconvdata/Makefile
@@ -74,7 +74,7 @@ ifeq (yes,$(build-shared))
 tests = bug-iconv1 bug-iconv2 tst-loading tst-e2big tst-iconv4 bug-iconv4 \
         tst-iconv6 bug-iconv5 bug-iconv6 tst-iconv7 bug-iconv8 bug-iconv9 \
         bug-iconv10 bug-iconv11 bug-iconv12 tst-iconv-big5-hkscs-to-2ucs4 \
-        bug-iconv13 bug-iconv14
+        bug-iconv13 bug-iconv14 bug-iconv15
 ifeq ($(have-thread-library),yes)
 tests += bug-iconv3
 endif
@@ -327,6 +327,8 @@ $(objpfx)bug-iconv12.out: $(addprefix $(objpfx), $(gconv-modules)) \
                           $(addprefix $(objpfx),$(modules.so))
 $(objpfx)bug-iconv14.out: $(addprefix $(objpfx), $(gconv-modules)) \
                           $(addprefix $(objpfx),$(modules.so))
+$(objpfx)bug-iconv15.out: $(addprefix $(objpfx), $(gconv-modules)) \
+                          $(addprefix $(objpfx),$(modules.so))
 
 $(objpfx)iconv-test.out: run-iconv-test.sh \
                          $(addprefix $(objpfx), $(gconv-modules)) \
diff --git a/iconvdata/bug-iconv15.c b/iconvdata/bug-iconv15.c
new file mode 100644
index 0000000000..4037e131ff
--- /dev/null
+++ b/iconvdata/bug-iconv15.c
@@ -0,0 +1,55 @@
+/* Bug 28524: Conversion from ISO-2022-JP-3 with iconv
+   may emit spurious NUL character on state reset.
+   Copyright (C) The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <iconv.h>
+#include <support/check.h>
+
+static int
+do_test (void)
+{
+  char in[] = "\x1b(I";
+  char *inbuf = in;
+  size_t inleft = sizeof (in) - 1;
+  char out[1];
+  char *outbuf = out;
+  size_t outleft = sizeof (out);
+  iconv_t cd;
+
+  cd = iconv_open ("UTF8", "ISO-2022-JP-3");
+  TEST_VERIFY_EXIT (cd != (iconv_t) -1);
+
+  /* First call to iconv should alter internal state.
+     Now, JISX0201_Kana_set is selected and
+     state value != ASCII_set */
+  TEST_VERIFY (iconv (cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t) -1);
+
+  /* Second call shall emit spurious NUL character in unpatched glibc.  */
+  TEST_VERIFY (iconv (cd, NULL, NULL, &outbuf, &outleft) != (size_t) -1);
+
+  /* No characters are expected to be produced.  */
+  TEST_VERIFY (outbuf == out);
+  TEST_VERIFY (outleft == sizeof (out));
+
+  TEST_VERIFY_EXIT (iconv_close (cd) != -1);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/iconvdata/iso-2022-jp-3.c b/iconvdata/iso-2022-jp-3.c
index 70b28ace7f..5e66d686f1 100644
--- a/iconvdata/iso-2022-jp-3.c
+++ b/iconvdata/iso-2022-jp-3.c
@@ -79,20 +79,31 @@ enum
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != ASCII_set) \
+  if ((data->__statep->__count & ~7) != ASCII_set) \
     { \
       if (FROM_DIRECTION) \
         { \
-          if (__glibc_likely (outbuf + 4 <= outend)) \
+          uint32_t ch = data->__statep->__count >> 6; \
+ \
+          if (__glibc_unlikely (ch != 0)) \
             { \
-              /* Write out the last character.  */ \
-              *((uint32_t *) outbuf) = data->__statep->__count >> 6; \
-              outbuf += sizeof (uint32_t); \
-              data->__statep->__count = ASCII_set; \
+              if (__glibc_likely (outbuf + 4 <= outend)) \
+                { \
+                  /* Write out the last character.  */ \
+                  put32u (outbuf, ch); \
+                  outbuf += 4; \
+                  data->__statep->__count &= 7; \
+                  data->__statep->__count |= ASCII_set; \
+                } \
+              else \
+                /* We don't have enough room in the output buffer.  */ \
+                status = __GCONV_FULL_OUTPUT; \
             } \
           else \
-            /* We don't have enough room in the output buffer.  */ \
-            status = __GCONV_FULL_OUTPUT; \
+            { \
+              data->__statep->__count &= 7; \
+              data->__statep->__count |= ASCII_set; \
+            } \
         } \
       else \
         { \
-- 
2.17.1