Message ID | 619cade7e73dc33184bf4247b739d54cd9d7d8b3.1652994079.git.fweimer@redhat.com |
---|---|
State | Committed |
Commit | 19d494445981a09503e4a0175732745c39dd7c21 |
Headers | To: libc-alpha@sourceware.org; Subject: [PATCH 2/5] locale: Fix signed char bug in lr_getc; From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>; Reply-To: Florian Weimer <fweimer@redhat.com>; Date: Thu, 19 May 2022 23:06:34 +0200; In-Reply-To: <cover.1652994079.git.fweimer@redhat.com> |
Series | Assume UTF-8 encoding for localedef input files |
Checks
Context | Check | Description |
---|---|---|
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent |
Commit Message
Florian Weimer
May 19, 2022, 9:06 p.m. UTC
The array lr->buf contains characters, which can be signed. A 0xff byte in the input could be incorrectly reported as EOF. More importantly, get_string in linereader.c converts a signed input byte to a Unicode code point using ADDWC ((uint32_t) ch), under the assumption that this decodes the ISO-8859-1 input encoding. If char is signed, this does not give the correct result. This means that ISO-8859-1 input files for localedef are not actually supported, contrary to the comment in get_string. This is a happy accident because we can therefore change the file encoding to UTF-8 without impacting backwards compatibility.

While at it, remove the \32 check for MS-DOS end-of-file character (^Z).

---
 locale/programs/linereader.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
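To make the sign-extension problem described in the commit message concrete, here is a minimal sketch; it is not glibc code. The helper names `byte_old`, `byte_new`, and `to_code_point` are hypothetical. `to_code_point` only stands in for the `(uint32_t) ch` conversion mentioned in the message, and the buffer is declared `signed char` (an assumption made for illustration) so that the result does not depend on whether plain `char` is signed on the build platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old return expression (without the ^Z
   check): the element is promoted to int unchanged, so a byte with the
   high bit set becomes a negative value.  */
static int
byte_old (const signed char *buf, size_t idx)
{
  assert (buf != NULL);
  return buf[idx];
}

/* Hypothetical stand-in for the patched expression: masking with 0xff
   always yields a value in the range 0..255.  */
static int
byte_new (const signed char *buf, size_t idx)
{
  assert (buf != NULL);
  return buf[idx] & 0xff;
}

/* Stand-in for the (uint32_t) ch conversion: a negative int wraps
   around modulo 2^32 instead of giving the ISO-8859-1 code point.  */
static uint32_t
to_code_point (int ch)
{
  return (uint32_t) ch;
}
```

For the ISO-8859-1 byte 0xe9 (é), stored as -23 in a `signed char`, `to_code_point (byte_old (buf, i))` yields 0xffffffe9, while `to_code_point (byte_new (buf, i))` yields 0xe9, that is, U+00E9.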
Comments
On 5/19/22 17:06, Florian Weimer via Libc-alpha wrote:
> The array lr->buf contains characters, which can be signed.  A 0xff
> byte in the input could be incorrectly reported as EOF.  More
> importantly, get_string in linereader.c converts a signed input byte
> to a Unicode code point using ADDWC ((uint32_t) ch), under the
> assumption that this decodes the ISO-8859-1 input encoding.  If char
> is signed, this does not give the correct result.  This means that
> ISO-8859-1 input files for localedef are not actually supported,
> contrary to the comment in get_string.  This is a happy accident because
> we can therefore change the file encoding to UTF-8 without impacting
> backwards compatibility.

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

>
> While at it, remove the \32 check for MS-DOS end-of-file character (^Z).

OK. We don't need this, files should have the correct EOF.

> ---
>  locale/programs/linereader.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h
> index 0fb10ec833..653a71d2d1 100644
> --- a/locale/programs/linereader.h
> +++ b/locale/programs/linereader.h
> @@ -134,7 +134,7 @@ lr_getc (struct linereader *lr)
>        return EOF;
>      }
>
> -  return lr->buf[lr->idx] == '\32' ? EOF : lr->buf[lr->idx++];
> +  return lr->buf[lr->idx++] & 0xff;

OK. Agreed, this should not be sign extended. It's a byte in the buffer not EOF. With the original MS-DOS checking it might have been *needed* to return -1.

>  }
>
>
    diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h
    index 0fb10ec833..653a71d2d1 100644
    --- a/locale/programs/linereader.h
    +++ b/locale/programs/linereader.h
    @@ -134,7 +134,7 @@ lr_getc (struct linereader *lr)
           return EOF;
         }
     
    -  return lr->buf[lr->idx] == '\32' ? EOF : lr->buf[lr->idx++];
    +  return lr->buf[lr->idx++] & 0xff;
     }
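The effect of the one-line change on a read loop can be sketched as follows. This is an illustration under stated assumptions, not the glibc implementation: `struct mini_reader`, `mini_getc_old`, `mini_getc_new`, and `count_bytes` are hypothetical names, the real `struct linereader` has more fields and refills its buffer, and the buffer is typed `signed char` here so the pre-patch behaviour shows up regardless of the platform's plain `char` signedness. The old variant is expected to stop early only where `EOF` is -1, as it is in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical miniature reader; field names are illustrative only.  */
struct mini_reader
{
  const signed char *buf;  /* Input bytes.  */
  size_t len;              /* Number of valid bytes in buf.  */
  size_t idx;              /* Index of the next byte to return.  */
};

/* Pre-patch expression: ^Z (octal 32) ends the input, and a 0xff byte
   is sign-extended to -1, which the caller cannot tell apart from EOF.  */
static int
mini_getc_old (struct mini_reader *r)
{
  if (r->idx == r->len)
    return EOF;
  return r->buf[r->idx] == '\32' ? EOF : r->buf[r->idx++];
}

/* Patched expression: every byte is returned as a value in 0..255, so
   EOF is reported only when the buffer is exhausted.  */
static int
mini_getc_new (struct mini_reader *r)
{
  if (r->idx == r->len)
    return EOF;
  return r->buf[r->idx++] & 0xff;
}

/* Count how many bytes the given getter delivers before reporting EOF.  */
static size_t
count_bytes (struct mini_reader *r, int (*getter) (struct mini_reader *))
{
  size_t n = 0;
  assert (r != NULL && getter != NULL);
  while (getter (r) != EOF)
    ++n;
  return n;
}
```

With the five input bytes `'a'`, 0xff, `'b'`, 0x1a, `'c'`, the patched getter delivers all five, whereas the old getter stops after `'a'` because the 0xff byte is mistaken for `EOF`.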