From patchwork Thu May 19 21:06:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 54237 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D91EF3839C53 for ; Thu, 19 May 2022 21:07:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D91EF3839C53 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652994461; bh=FQaulRYmKyjpa15eJC7qBQ6MpOQSQdSJNBftVO3Iu0M=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=oqkHYgxuSXBOoxhI9Ghj7pwL6S3vrVQqv99n9c1dsi7JcJt1YVxa77GOLcmHjMNFT b+0ueq62hXyvZo6kToYVBMRZarTIrqU6iUgWKE6avcSmWnbLR3mra2/L1lGY6d4EYk ZuHbV9S1O4bOgYpqQiQ3WETk4jb3VYM/atGGCoIc= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id D6C703839C4C for ; Thu, 19 May 2022 21:06:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D6C703839C4C Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-550-hYViZnZVNVKnDJYLGg9NYg-1; Thu, 19 May 2022 17:06:33 -0400 X-MC-Unique: hYViZnZVNVKnDJYLGg9NYg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2AC31101A52C for ; Thu, 19 May 2022 21:06:33 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.192.58]) by 
smtp.corp.redhat.com (Postfix) with ESMTPS id E1E2B7B64 for ; Thu, 19 May 2022 21:06:31 +0000 (UTC) To: libc-alpha@sourceware.org Subject: [PATCH 1/5] locale: Turn ADDC and ADDS into functions in linereader.c In-Reply-To: References: X-From-Line: 6e108376cb28f366ead22cab2347245bf43e4060 Mon Sep 17 00:00:00 2001 Message-Id: <6e108376cb28f366ead22cab2347245bf43e4060.1652994079.git.fweimer@redhat.com> Date: Thu, 19 May 2022 23:06:30 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" And introduce struct lr_buffer. The functions addc and adds can be called from functions, enabling subsequent refactoring. Reviewed-by: Carlos O'Donell Tested-by: Carlos O'Donell --- locale/programs/linereader.c | 203 ++++++++++++++++++----------------- 1 file changed, 104 insertions(+), 99 deletions(-) diff --git a/locale/programs/linereader.c b/locale/programs/linereader.c index 146d32e5e2..d5367e0a1e 100644 --- a/locale/programs/linereader.c +++ b/locale/programs/linereader.c @@ -416,36 +416,60 @@ get_toplvl_escape (struct linereader *lr) return &lr->token; } +/* Multibyte string buffer. 
*/ +struct lr_buffer +{ + size_t act; + size_t max; + char *buf; +}; -#define ADDC(ch) \ - do \ - { \ - if (bufact == bufmax) \ - { \ - bufmax *= 2; \ - buf = xrealloc (buf, bufmax); \ - } \ - buf[bufact++] = (ch); \ - } \ - while (0) +/* Initialize *LRB with a default-sized buffer. */ +static void +lr_buffer_init (struct lr_buffer *lrb) +{ + lrb->act = 0; + lrb->max = 56; + lrb->buf = xmalloc (lrb->max); +} +/* Transfers the buffer string from *LRB to LR->token.mbstr. */ +static void +lr_buffer_to_token (struct lr_buffer *lrb, struct linereader *lr) +{ + lr->token.val.str.startmb = xrealloc (lrb->buf, lrb->act + 1); + lr->token.val.str.startmb[lrb->act] = '\0'; + lr->token.val.str.lenmb = lrb->act; +} -#define ADDS(s, l) \ - do \ - { \ - size_t _l = (l); \ - if (bufact + _l > bufmax) \ - { \ - if (bufact < _l) \ - bufact = _l; \ - bufmax *= 2; \ - buf = xrealloc (buf, bufmax); \ - } \ - memcpy (&buf[bufact], s, _l); \ - bufact += _l; \ - } \ - while (0) +/* Adds CH to *LRB. */ +static void +addc (struct lr_buffer *lrb, char ch) +{ + if (lrb->act == lrb->max) + { + lrb->max *= 2; + lrb->buf = xrealloc (lrb->buf, lrb->max); + } + lrb->buf[lrb->act++] = ch; +} +/* Adds L bytes at S to *LRB. */ +static void +adds (struct lr_buffer *lrb, const unsigned char *s, size_t l) +{ + if (lrb->max - lrb->act < l) + { + size_t required_size = lrb->act + l; + size_t new_max = 2 * lrb->max; + if (new_max < required_size) + new_max = required_size; + lrb->buf = xrealloc (lrb->buf, new_max); + lrb->max = new_max; + } + memcpy (lrb->buf + lrb->act, s, l); + lrb->act += l; +} #define ADDWC(ch) \ do \ @@ -467,13 +491,11 @@ get_symname (struct linereader *lr) 1. reserved words 2. ISO 10646 position values 3. all other. 
*/ - char *buf; - size_t bufact = 0; - size_t bufmax = 56; const struct keyword_t *kw; int ch; + struct lr_buffer lrb; - buf = (char *) xmalloc (bufmax); + lr_buffer_init (&lrb); do { @@ -481,13 +503,13 @@ get_symname (struct linereader *lr) if (ch == lr->escape_char) { int c2 = lr_getc (lr); - ADDC (c2); + addc (&lrb, c2); if (c2 == '\n') ch = '\n'; } else - ADDC (ch); + addc (&lrb, ch); } while (ch != '>' && ch != '\n'); @@ -495,39 +517,35 @@ get_symname (struct linereader *lr) lr_error (lr, _("unterminated symbolic name")); /* Test for ISO 10646 position value. */ - if (buf[0] == 'U' && (bufact == 6 || bufact == 10)) + if (lrb.buf[0] == 'U' && (lrb.act == 6 || lrb.act == 10)) { - char *cp = buf + 1; - while (cp < &buf[bufact - 1] && isxdigit (*cp)) + char *cp = lrb.buf + 1; + while (cp < &lrb.buf[lrb.act - 1] && isxdigit (*cp)) ++cp; - if (cp == &buf[bufact - 1]) + if (cp == &lrb.buf[lrb.act - 1]) { /* Yes, it is. */ lr->token.tok = tok_ucs4; - lr->token.val.ucs4 = strtoul (buf + 1, NULL, 16); + lr->token.val.ucs4 = strtoul (lrb.buf + 1, NULL, 16); return &lr->token; } } /* It is a symbolic name. Test for reserved words. */ - kw = lr->hash_fct (buf, bufact - 1); + kw = lr->hash_fct (lrb.buf, lrb.act - 1); if (kw != NULL && kw->symname_or_ident == 1) { lr->token.tok = kw->token; - free (buf); + free (lrb.buf); } else { lr->token.tok = tok_bsymbol; - - buf = xrealloc (buf, bufact + 1); - buf[bufact] = '\0'; - - lr->token.val.str.startmb = buf; - lr->token.val.str.lenmb = bufact - 1; + lr_buffer_to_token (&lrb, lr); + --lr->token.val.str.lenmb; /* Hide the trailing '>'. 
*/ } return &lr->token; @@ -537,16 +555,13 @@ get_symname (struct linereader *lr) static struct token * get_ident (struct linereader *lr) { - char *buf; - size_t bufact; - size_t bufmax = 56; const struct keyword_t *kw; int ch; + struct lr_buffer lrb; - buf = xmalloc (bufmax); - bufact = 0; + lr_buffer_init (&lrb); - ADDC (lr->buf[lr->idx - 1]); + addc (&lrb, lr->buf[lr->idx - 1]); while (!isspace ((ch = lr_getc (lr))) && ch != '"' && ch != ';' && ch != '<' && ch != ',' && ch != EOF) @@ -560,27 +575,22 @@ get_ident (struct linereader *lr) break; } } - ADDC (ch); + addc (&lrb, ch); } lr_ungetc (lr, ch); - kw = lr->hash_fct (buf, bufact); + kw = lr->hash_fct (lrb.buf, lrb.act); if (kw != NULL && kw->symname_or_ident == 0) { lr->token.tok = kw->token; - free (buf); + free (lrb.buf); } else { lr->token.tok = tok_ident; - - buf = xrealloc (buf, bufact + 1); - buf[bufact] = '\0'; - - lr->token.val.str.startmb = buf; - lr->token.val.str.lenmb = bufact; + lr_buffer_to_token (&lrb, lr); } return &lr->token; @@ -593,14 +603,10 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, int verbose) { int return_widestr = lr->return_widestr; - char *buf; + struct lr_buffer lrb; wchar_t *buf2 = NULL; - size_t bufact; - size_t bufmax = 56; - /* We must return two different strings. */ - buf = xmalloc (bufmax); - bufact = 0; + lr_buffer_init (&lrb); /* We know it'll be a string. */ lr->token.tok = tok_string; @@ -613,19 +619,19 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, buf2 = NULL; while ((ch = lr_getc (lr)) != '"' && ch != '\n' && ch != EOF) - ADDC (ch); + addc (&lrb, ch); /* Catch errors with trailing escape character. 
*/ - if (bufact > 0 && buf[bufact - 1] == lr->escape_char - && (bufact == 1 || buf[bufact - 2] != lr->escape_char)) + if (lrb.act > 0 && lrb.buf[lrb.act - 1] == lr->escape_char + && (lrb.act == 1 || lrb.buf[lrb.act - 2] != lr->escape_char)) { lr_error (lr, _("illegal escape sequence at end of string")); - --bufact; + --lrb.act; } else if (ch == '\n' || ch == EOF) lr_error (lr, _("unterminated string")); - ADDC ('\0'); + addc (&lrb, '\0'); } else { @@ -662,7 +668,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, break; } - ADDC (ch); + addc (&lrb, ch); if (return_widestr) ADDWC ((uint32_t) ch); @@ -671,7 +677,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, /* Now we have to search for the end of the symbolic name, i.e., the closing '>'. */ - startidx = bufact; + startidx = lrb.act; while ((ch = lr_getc (lr)) != '>' && ch != '\n' && ch != EOF) { if (ch == lr->escape_char) @@ -680,12 +686,12 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (ch == '\n' || ch == EOF) break; } - ADDC (ch); + addc (&lrb, ch); } if (ch == '\n' || ch == EOF) /* Not a correct string. */ break; - if (bufact == startidx) + if (lrb.act == startidx) { /* <> is no correct name. Ignore it and also signal an error. */ @@ -694,23 +700,23 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, } /* It might be a Uxxxx symbol. */ - if (buf[startidx] == 'U' - && (bufact - startidx == 5 || bufact - startidx == 9)) + if (lrb.buf[startidx] == 'U' + && (lrb.act - startidx == 5 || lrb.act - startidx == 9)) { - char *cp = buf + startidx + 1; - while (cp < &buf[bufact] && isxdigit (*cp)) + char *cp = lrb.buf + startidx + 1; + while (cp < &lrb.buf[lrb.act] && isxdigit (*cp)) ++cp; - if (cp == &buf[bufact]) + if (cp == &lrb.buf[lrb.act]) { char utmp[10]; /* Yes, it is. 
*/ - ADDC ('\0'); - wch = strtoul (buf + startidx + 1, NULL, 16); + addc (&lrb, '\0'); + wch = strtoul (lrb.buf + startidx + 1, NULL, 16); /* Now forget about the name we just added. */ - bufact = startidx; + lrb.act = startidx; if (return_widestr) ADDWC (wch); @@ -774,7 +780,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, seq = charmap_find_value (charmap, utmp, 9); assert (seq != NULL); - ADDS (seq->bytes, seq->nbytes); + adds (&lrb, seq->bytes, seq->nbytes); } continue; @@ -788,24 +794,24 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, } if (seq != NULL) - ADDS (seq->bytes, seq->nbytes); + adds (&lrb, seq->bytes, seq->nbytes); continue; } } - /* We now have the symbolic name in buf[startidx] to - buf[bufact-1]. Now find out the value for this character + /* We now have the symbolic name in lrb.buf[startidx] to + lrb.buf[lrb.act-1]. Now find out the value for this character in the charmap as well as in the repertoire map (in this order). */ - seq = charmap_find_value (charmap, &buf[startidx], - bufact - startidx); + seq = charmap_find_value (charmap, &lrb.buf[startidx], + lrb.act - startidx); if (seq == NULL) { /* This name is not in the charmap. */ lr_error (lr, _("symbol `%.*s' not in charmap"), - (int) (bufact - startidx), &buf[startidx]); + (int) (lrb.act - startidx), &lrb.buf[startidx]); illegal_string = 1; } @@ -816,8 +822,8 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, wch = seq->ucs4; else { - wch = repertoire_find_value (repertoire, &buf[startidx], - bufact - startidx); + wch = repertoire_find_value (repertoire, &lrb.buf[startidx], + lrb.act - startidx); if (seq != NULL) seq->ucs4 = wch; } @@ -826,7 +832,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, { /* This name is not in the repertoire map. 
*/ lr_error (lr, _("symbol `%.*s' not in repertoire map"), - (int) (bufact - startidx), &buf[startidx]); + (int) (lrb.act - startidx), &lrb.buf[startidx]); illegal_string = 1; } else @@ -834,11 +840,11 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, } /* Now forget about the name we just added. */ - bufact = startidx; + lrb.act = startidx; /* And copy the bytes. */ if (seq != NULL) - ADDS (seq->bytes, seq->nbytes); + adds (&lrb, seq->bytes, seq->nbytes); } if (ch == '\n' || ch == EOF) @@ -849,7 +855,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (illegal_string) { - free (buf); + free (lrb.buf); free (buf2); lr->token.val.str.startmb = NULL; lr->token.val.str.lenmb = 0; @@ -859,7 +865,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, return &lr->token; } - ADDC ('\0'); + addc (&lrb, '\0'); if (return_widestr) { @@ -870,8 +876,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, } } - lr->token.val.str.startmb = xrealloc (buf, bufact); - lr->token.val.str.lenmb = bufact; + lr_buffer_to_token (&lrb, lr); return &lr->token; } From patchwork Thu May 19 21:06:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 54238 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CE86B3839C57 for ; Thu, 19 May 2022 21:08:23 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CE86B3839C57 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652994503; bh=59fmDdnrmni+rTtZFpeOD5bSjHA6KBFoo5j5IvL5o3Q=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=UGUS1lwvCdnPlLLzaA8eZ1pCnO5LPodj8rc2e0j+9TE/6dqSB1GNVrTvSqQ6XG/3G 
4OqU5wPM+Tp4APSHweb620VIBuuA7ag+4/QHrXZFnoC5g9T8minzClKhI6piVyl8EI 2a+7Yu4jxYOADxY3cQC9bk+BFH0/MqNj5BoGdHZ0= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 15FED3839C6A for ; Thu, 19 May 2022 21:06:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 15FED3839C6A Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-601-Xzycb5DsNnqrhNPs6bQcnQ-1; Thu, 19 May 2022 17:06:37 -0400 X-MC-Unique: Xzycb5DsNnqrhNPs6bQcnQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 476B92A59541 for ; Thu, 19 May 2022 21:06:37 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.192.58]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B39D17AD5 for ; Thu, 19 May 2022 21:06:36 +0000 (UTC) To: libc-alpha@sourceware.org Subject: [PATCH 2/5] locale: Fix signed char bug in lr_getc In-Reply-To: References: X-From-Line: 619cade7e73dc33184bf4247b739d54cd9d7d8b3 Mon Sep 17 00:00:00 2001 Message-Id: <619cade7e73dc33184bf4247b739d54cd9d7d8b3.1652994079.git.fweimer@redhat.com> Date: Thu, 19 May 2022 23:06:34 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The array lr->buf contains characters, which can be signed. A 0xff byte in the input could be incorrectly reported as EOF. More importantly, get_string in linereader.c converts a signed input byte to a Unicode code point using ADDWC ((uint32_t) ch), under the assumption that this decodes the ISO-8859-1 input encoding. If char is signed, this does not give the correct result. This means that ISO-8859-1 input files for localedef are not actually supported, contrary to the comment in get_string. This is a happy accident because we can therefore change the file encoding to UTF-8 without impacting backwards compatibility. While at it, remove the \32 check for MS-DOS end-of-file character (^Z). Reviewed-by: Carlos O'Donell Tested-by: Carlos O'Donell --- locale/programs/linereader.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h index 0fb10ec833..653a71d2d1 100644 --- a/locale/programs/linereader.h +++ b/locale/programs/linereader.h @@ -134,7 +134,7 @@ lr_getc (struct linereader *lr) return EOF; } - return lr->buf[lr->idx] == '\32' ? 
EOF : lr->buf[lr->idx++]; + return lr->buf[lr->idx++] & 0xff; } From patchwork Thu May 19 21:06:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 54239 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A57633839C56 for ; Thu, 19 May 2022 21:09:05 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A57633839C56 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652994545; bh=tW04Sw0bXDg4sisNTPpepyJXeCIXHy04YOJ4alZvy08=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=g31iVk/E6oQpKac+CyDZkn9Fk/5o08R31mVmpXEThVPjL+evildTcyn1tODI8a6Dj mKPy/g0OJVNAzvDBKW8xc2CVIjDDwdTu/3vFz+9KZZs1HZZ7FUNLh8T1IUWMsjbAxa 1Veoyu/MwJ7YdxFrXJW7ErngybCViX5kk2HLgSXw= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id B48E03839C61 for ; Thu, 19 May 2022 21:06:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B48E03839C61 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-96-5Taz4fEuMNuikclkZ8dz4g-1; Thu, 19 May 2022 17:06:41 -0400 X-MC-Unique: 5Taz4fEuMNuikclkZ8dz4g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E94D68015BA for ; Thu, 19 May 2022 21:06:40 +0000 (UTC) Received: from 
oldenburg.str.redhat.com (unknown [10.39.192.58]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1C93E40C1421 for ; Thu, 19 May 2022 21:06:39 +0000 (UTC) To: libc-alpha@sourceware.org Subject: [PATCH 3/5] locale: Introduce translate_unicode_codepoint into linereader.c In-Reply-To: References: X-From-Line: a89cee054d28d43cf8f7e5f171e876326e4af96e Mon Sep 17 00:00:00 2001 Message-Id: Date: Thu, 19 May 2022 23:06:38 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.11.54.2 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" This will permit reusing the Unicode character processing for different character encodings, not just the current encoding. Reviewed-by: Carlos O'Donell Tested-by: Carlos O'Donell --- locale/programs/linereader.c | 167 ++++++++++++++++++----------------- 1 file changed, 85 insertions(+), 82 deletions(-) diff --git a/locale/programs/linereader.c b/locale/programs/linereader.c index d5367e0a1e..f7292f0102 100644 --- a/locale/programs/linereader.c +++ b/locale/programs/linereader.c @@ -596,6 +596,83 @@ get_ident (struct linereader *lr) return &lr->token; } +/* Process a decoded Unicode codepoint WCH in a string, placing the + multibyte sequence into LRB. 
Return false if the character is not + found in CHARMAP/REPERTOIRE. */ +static bool +translate_unicode_codepoint (struct localedef_t *locale, + const struct charmap_t *charmap, + const struct repertoire_t *repertoire, + uint32_t wch, struct lr_buffer *lrb) +{ + /* See whether the charmap contains the Uxxxxxxxx names. */ + char utmp[10]; + snprintf (utmp, sizeof (utmp), "U%08X", wch); + struct charseq *seq = charmap_find_value (charmap, utmp, 9); + + if (seq == NULL) + { + /* No, this isn't the case. Now determine from + the repertoire the name of the character and + find it in the charmap. */ + if (repertoire != NULL) + { + const char *symbol = repertoire_find_symbol (repertoire, wch); + if (symbol != NULL) + seq = charmap_find_value (charmap, symbol, strlen (symbol)); + } + + if (seq == NULL) + { +#ifndef NO_TRANSLITERATION + /* Transliterate if possible. */ + if (locale != NULL) + { + if ((locale->avail & CTYPE_LOCALE) == 0) + { + /* Load the CTYPE data now. */ + int old_needed = locale->needed; + + locale->needed = 0; + locale = load_locale (LC_CTYPE, locale->name, + locale->repertoire_name, + charmap, locale); + locale->needed = old_needed; + } + + uint32_t *translit; + if ((locale->avail & CTYPE_LOCALE) != 0 + && ((translit = find_translit (locale, charmap, wch)) + != NULL)) + /* The CTYPE data contains a matching + transliteration. */ + { + for (int i = 0; translit[i] != 0; ++i) + { + snprintf (utmp, sizeof (utmp), "U%08X", translit[i]); + seq = charmap_find_value (charmap, utmp, 9); + assert (seq != NULL); + adds (lrb, seq->bytes, seq->nbytes); + } + return true; + } + } +#endif /* NO_TRANSLITERATION */ + + /* Not a known name. 
*/ + return false; + } + } + + if (seq != NULL) + { + adds (lrb, seq->bytes, seq->nbytes); + return true; + } + else + return false; +} + static struct token * get_string (struct linereader *lr, const struct charmap_t *charmap, @@ -635,7 +712,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, } else { - int illegal_string = 0; + bool illegal_string = false; size_t buf2act = 0; size_t buf2max = 56 * sizeof (uint32_t); int ch; @@ -695,7 +772,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, { /* <> is no correct name. Ignore it and also signal an error. */ - illegal_string = 1; + illegal_string = true; continue; } @@ -709,8 +786,6 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (cp == &lrb.buf[lrb.act]) { - char utmp[10]; - /* Yes, it is. */ addc (&lrb, '\0'); wch = strtoul (lrb.buf + startidx + 1, NULL, 16); @@ -721,81 +796,9 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (return_widestr) ADDWC (wch); - /* See whether the charmap contains the Uxxxxxxxx names. */ - snprintf (utmp, sizeof (utmp), "U%08X", wch); - seq = charmap_find_value (charmap, utmp, 9); - - if (seq == NULL) - { - /* No, this isn't the case. Now determine from - the repertoire the name of the character and - find it in the charmap. */ - if (repertoire != NULL) - { - const char *symbol; - - symbol = repertoire_find_symbol (repertoire, wch); - - if (symbol != NULL) - seq = charmap_find_value (charmap, symbol, - strlen (symbol)); - } - - if (seq == NULL) - { -#ifndef NO_TRANSLITERATION - /* Transliterate if possible. */ - if (locale != NULL) - { - uint32_t *translit; - - if ((locale->avail & CTYPE_LOCALE) == 0) - { - /* Load the CTYPE data now. 
*/ - int old_needed = locale->needed; - - locale->needed = 0; - locale = load_locale (LC_CTYPE, - locale->name, - locale->repertoire_name, - charmap, locale); - locale->needed = old_needed; - } - - if ((locale->avail & CTYPE_LOCALE) != 0 - && ((translit = find_translit (locale, - charmap, wch)) - != NULL)) - /* The CTYPE data contains a matching - transliteration. */ - { - int i; - - for (i = 0; translit[i] != 0; ++i) - { - char utmp[10]; - - snprintf (utmp, sizeof (utmp), "U%08X", - translit[i]); - seq = charmap_find_value (charmap, utmp, - 9); - assert (seq != NULL); - adds (&lrb, seq->bytes, seq->nbytes); - } - - continue; - } - } -#endif /* NO_TRANSLITERATION */ - - /* Not a known name. */ - illegal_string = 1; - } - } - - if (seq != NULL) - adds (&lrb, seq->bytes, seq->nbytes); - + if (!translate_unicode_codepoint (locale, charmap, + repertoire, wch, &lrb)) + illegal_string = true; continue; } } @@ -812,7 +815,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, /* This name is not in the charmap. */ lr_error (lr, _("symbol `%.*s' not in charmap"), (int) (lrb.act - startidx), &lrb.buf[startidx]); - illegal_string = 1; + illegal_string = true; } if (return_widestr) @@ -833,7 +836,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, /* This name is not in the repertoire map. 
*/ lr_error (lr, _("symbol `%.*s' not in repertoire map"), (int) (lrb.act - startidx), &lrb.buf[startidx]); - illegal_string = 1; + illegal_string = true; } else ADDWC (wch); @@ -850,7 +853,7 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (ch == '\n' || ch == EOF) { lr_error (lr, _("unterminated string")); - illegal_string = 1; + illegal_string = true; } if (illegal_string) From patchwork Thu May 19 21:06:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 54240 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A7BA53839C61 for ; Thu, 19 May 2022 21:09:47 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A7BA53839C61 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652994587; bh=PUNq2ekzrPeqJp5H71x7Ojjl2tkXiLVRtFNNEu54ALY=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=jXwHcvwzzVJYM/9pxRwMDG0XAtEPz4xuOCWtivwiHoA7fnHaPKtWEW1BtiPX9PIy2 lguxdZU0SLUsUzXzuIantJjP7G8BzhbWAi6LgTT4ytj2I8Ok7I/3rWn+4o7lmX31CT AIwl9tCpdK9Ng0cSE4p4xItXh8Y2RSFXtO1LkJ/A= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 2BEBA3839C6A for ; Thu, 19 May 2022 21:06:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2BEBA3839C6A Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-421-aRGsRqfWOeiJfBgqiXcbnQ-1; Thu, 19 May 2022 17:06:45 -0400 X-MC-Unique: 
aRGsRqfWOeiJfBgqiXcbnQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5E49E2949BA1 for ; Thu, 19 May 2022 21:06:45 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.192.58]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 97DC640C1421 for ; Thu, 19 May 2022 21:06:44 +0000 (UTC) To: libc-alpha@sourceware.org Subject: [PATCH 4/5] locale: localedef input files are now encoded in UTF-8 In-Reply-To: References: X-From-Line: bab1c8587126515188cb6104cf6eba85d2e813e5 Mon Sep 17 00:00:00 2001 Message-Id: Date: Thu, 19 May 2022 23:06:42 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.11.54.2 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha"

Previously, localedef input files were assumed to be encoded in ISO-8859-1, and the output charset was assumed to overlap with ISO-8859-1 for the characters actually used. However, this did not work as intended on many architectures, even for an ISO-8859-1 output encoding, because of the char signedness bug in lr_getc.
Therefore, this commit switches to UTF-8 without making provisions for
backwards compatibility.

The following Elisp code can be used to convert locale definition files
to UTF-8:

  (defun glibc/convert-localedef (from to)
    (interactive "r")
    (save-excursion
      (save-restriction
        (narrow-to-region from to)
        (goto-char (point-min))
        (save-match-data
          ;; Assumed pattern: matches <Uxxxx> symbols and captures the
          ;; hexadecimal digits as group 1.
          (while (re-search-forward "<U\\([0-9A-F]+\\)>" nil t)
            (let* ((codepoint (string-to-number (match-string 1) 16))
                   (converted
                    (cond
                     ((memq codepoint '(?/ ?\ ?< ?>))
                      (string ?/ codepoint))
                     ;; Assumed: a double quote stays in symbolic form,
                     ;; since a literal quote would end the string.
                     ((= codepoint ?\") "<U0022>")
                     (t (string codepoint)))))
              (replace-match converted t)))))))

Reviewed-by: Carlos O'Donell
Tested-by: Carlos O'Donell
---
 NEWS                         |   4 +
 locale/programs/linereader.c | 144 ++++++++++++++++++++++++++++++++---
 2 files changed, 137 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index ad0c08d8ca..7ce0d8a135 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,10 @@ Major new features:
   have been added. The pidfd functionality provides access to a process
   while avoiding the issue of PID reuse on tranditional Unix systems.
 
+* localedef now accepts locale definition files encoded in UTF-8.
+  Previously, input bytes not within the ASCII range resulted in
+  unpredictable output.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * Support for prelink will be removed in the next release; this includes
diff --git a/locale/programs/linereader.c b/locale/programs/linereader.c
index f7292f0102..b484327969 100644
--- a/locale/programs/linereader.c
+++ b/locale/programs/linereader.c
@@ -42,6 +42,7 @@ static struct token *get_string (struct linereader *lr,
 				 struct localedef_t *locale,
 				 const struct repertoire_t *repertoire,
 				 int verbose);
+static bool utf8_decode (struct linereader *lr, uint8_t ch1, uint32_t *wch);
 
 
 struct linereader *
@@ -327,6 +328,17 @@ lr_token (struct linereader *lr, const struct charmap_t *charmap,
 	}
       lr_ungetn (lr, 2);
       break;
+
+    case 0x80 ... 0xff: /* UTF-8 sequence. 
*/ + uint32_t wch; + if (!utf8_decode (lr, ch, &wch)) + { + lr->token.tok = tok_error; + return &lr->token; + } + lr->token.tok = tok_ucs4; + lr->token.val.ucs4 = wch; + return &lr->token; } return get_ident (lr); @@ -673,6 +685,87 @@ translate_unicode_codepoint (struct localedef_t *locale, return false; } +/* Returns true if ch is not EOF (that is, non-negative) and a valid + UTF-8 trailing byte. */ +static bool +utf8_valid_trailing (int ch) +{ + return ch >= 0 && (ch & 0xc0) == 0x80; +} + +/* Reports an error for a broken UTF-8 sequence. CH2 to CH4 may be + EOF. Always returns false. */ +static bool +utf8_sequence_error (struct linereader *lr, uint8_t ch1, int ch2, int ch3, + int ch4) +{ + char buf[30]; + + if (ch2 < 0) + snprintf (buf, sizeof (buf), "0x%02x", ch1); + else if (ch3 < 0) + snprintf (buf, sizeof (buf), "0x%02x 0x%02x", ch1, ch2); + else if (ch4 < 0) + snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x", ch1, ch2, ch3); + else + snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x", + ch1, ch2, ch3, ch4); + + lr_error (lr, _("invalid UTF-8 sequence %s"), buf); + return false; +} + +/* Reads a UTF-8 sequence from LR, with the leading byte CH1, and + stores the decoded codepoint in *WCH. Returns false on failure and + reports an error. */ +static bool +utf8_decode (struct linereader *lr, uint8_t ch1, uint32_t *wch) +{ + /* See RFC 3629 section 4 and __gconv_transform_utf8_internal. 
*/ + if (ch1 < 0xc2) + return utf8_sequence_error (lr, ch1, -1, -1, -1); + + int ch2 = lr_getc (lr); + if (!utf8_valid_trailing (ch2)) + return utf8_sequence_error (lr, ch1, ch2, -1, -1); + + if (ch1 <= 0xdf) + { + uint32_t result = ((ch1 & 0x1f) << 6) | (ch2 & 0x3f); + if (result < 0x80) + return utf8_sequence_error (lr, ch1, ch2, -1, -1); + *wch = result; + return true; + } + + int ch3 = lr_getc (lr); + if (!utf8_valid_trailing (ch3) || ch1 < 0xe0) + return utf8_sequence_error (lr, ch1, ch2, ch3, -1); + + if (ch1 <= 0xef) + { + uint32_t result = (((ch1 & 0x0f) << 12) + | ((ch2 & 0x3f) << 6) + | (ch3 & 0x3f)); + if (result < 0x800) + return utf8_sequence_error (lr, ch1, ch2, ch3, -1); + *wch = result; + return true; + } + + int ch4 = lr_getc (lr); + if (!utf8_valid_trailing (ch4) || ch1 < 0xf0 || ch1 > 0xf4) + return utf8_sequence_error (lr, ch1, ch2, ch3, ch4); + + uint32_t result = (((ch1 & 0x07) << 18) + | ((ch2 & 0x3f) << 12) + | ((ch3 & 0x3f) << 6) + | (ch4 & 0x3f)); + if (result < 0x10000) + return utf8_sequence_error (lr, ch1, ch2, ch3, ch4); + *wch = result; + return true; +} static struct token * get_string (struct linereader *lr, const struct charmap_t *charmap, @@ -696,7 +789,11 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, buf2 = NULL; while ((ch = lr_getc (lr)) != '"' && ch != '\n' && ch != EOF) - addc (&lrb, ch); + { + if (ch >= 0x80) + lr_error (lr, _("illegal 8-bit character in untranslated string")); + addc (&lrb, ch); + } /* Catch errors with trailing escape character. */ if (lrb.act > 0 && lrb.buf[lrb.act - 1] == lr->escape_char @@ -730,24 +827,49 @@ get_string (struct linereader *lr, const struct charmap_t *charmap, if (ch != '<') { - /* The standards leave it up to the implementation to decide - what to do with character which stand for themself. 
We - could jump through hoops to find out the value relative to - the charmap and the repertoire map, but instead we leave - it up to the locale definition author to write a better - definition. We assume here that every character which - stands for itself is encoded using ISO 8859-1. Using the - escape character is allowed. */ + /* The standards leave it up to the implementation to + decide what to do with characters which stand for + themselves. This implementation treats the input + file as encoded in UTF-8. */ if (ch == lr->escape_char) { ch = lr_getc (lr); + if (ch >= 0x80) + { + lr_error (lr, _("illegal 8-bit escape sequence")); + illegal_string = true; + break; + } if (ch == '\n' || ch == EOF) break; + addc (&lrb, ch); + wch = ch; + } + else if (ch < 0x80) + { + wch = ch; + addc (&lrb, ch); + } + else /* UTF-8 sequence. */ + { + if (!utf8_decode (lr, ch, &wch)) + { + illegal_string = true; + break; + } + if (!translate_unicode_codepoint (locale, charmap, + repertoire, wch, &lrb)) + { + /* Ignore the rest of the string. Callers may + skip this string because it cannot be encoded + in the output character set. 
*/ + illegal_string = true; + continue; + } } - addc (&lrb, ch); if (return_widestr) - ADDWC ((uint32_t) ch); + ADDWC (wch); continue; } From patchwork Thu May 19 21:06:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 54241 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7F2573839C6A for ; Thu, 19 May 2022 21:10:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7F2573839C6A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652994634; bh=FVEI4s8FQz2Ry1PLC+7PC9sLePwO9R30R6ChiV/MyPY=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=A6uLg2kdfz0DQvrehOklKyMibtZSig66qsOHY/NobepU9AnHZt2xbPNOwU/Sbf3Go EcNOeGwJjzXG0Omm+nH0uEa2qh5gLOKq9EuvHc2DZYY9T1tqGh7as65CW/anaJC9up s7YNX/tn2793kMeCu7Edd6vXReaDMONCZXIdU76I= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id A59F33839C5D for ; Thu, 19 May 2022 21:06:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A59F33839C5D Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-62-2-zBeyBbP5WZAMQ57sO_OA-1; Thu, 19 May 2022 17:06:49 -0400 X-MC-Unique: 2-zBeyBbP5WZAMQ57sO_OA-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 
0BBAB38349AA for ; Thu, 19 May 2022 21:06:49 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.192.58]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 74ED7492CA3 for ; Thu, 19 May 2022 21:06:48 +0000 (UTC) To: libc-alpha@sourceware.org Subject: [PATCH 5/5] de_DE: Convert to UTF-8 In-Reply-To: References: X-From-Line: 8faf1d5dc7508a17bd14005b54f89593667aeecb Mon Sep 17 00:00:00 2001 Message-Id: <8faf1d5dc7508a17bd14005b54f89593667aeecb.1652994079.git.fweimer@redhat.com> Date: Thu, 19 May 2022 23:06:46 +0200 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.85 on 10.11.54.9 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_05, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Florian Weimer via Libc-alpha From: Florian Weimer Reply-To: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" --- localedata/locales/de_DE | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) Reviewed-by: Carlos O'Donell Tested-by: Carlos O'Donell diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE index 49a421fff4..52767808f7 100644 --- a/localedata/locales/de_DE +++ b/localedata/locales/de_DE @@ -46,36 +46,36 @@ include "translit_combining";"" % German umlauts. % LATIN CAPITAL LETTER A WITH DIAERESIS. - "A";"AE" +Ä "Ä";"AE" % LATIN CAPITAL LETTER O WITH DIAERESIS. 
- "O";"OE"
+Ö "Ö";"OE"
 % LATIN CAPITAL LETTER U WITH DIAERESIS.
- "U";"UE"
+Ü "Ü";"UE"
 % LATIN SMALL LETTER A WITH DIAERESIS.
- "a";"ae"
+ä "ä";"ae"
 % LATIN SMALL LETTER O WITH DIAERESIS.
- "o";"oe"
+ö "ö";"oe"
 % LATIN SMALL LETTER U WITH DIAERESIS.
- "u";"ue"
+ü "ü";"ue"
 
 % Danish.
 % LATIN CAPITAL LETTER A WITH RING ABOVE.
- "A";"AA"
+Å "Å";"AA"
 % LATIN SMALL LETTER A WITH RING ABOVE.
- "a";"aa"
+å "å";"aa"
 
 % The following strange first-level transliteration derive from the use
 % U201E and U201C as "correct" quoting characters.  These two characters
 % do not really belong together.  The result is that somebody who uses
 % U201C and U201D will get the incorrect U00AB / U00BB sequences.
 % LEFT DOUBLE QUOTATION MARK
- ;
+“ «;
 % RIGHT DOUBLE QUOTATION MARK
- ;
+” »;
 % DOUBLE LOW-9 QUOTATION MARK
- ;""
+„ »;",,"
 % DOUBLE HIGH-REVERSED-9 QUOTATION MARK
- ;
+‟ «;
 
 translit_end
 
@@ -90,7 +90,7 @@ END LC_COLLATE
 
 LC_MONETARY
 int_curr_symbol      "EUR "
-currency_symbol      ""
+currency_symbol      "€"
 mon_decimal_point    ","
 mon_thousands_sep    "."
 mon_grouping         3;3
@@ -126,14 +126,14 @@ day "Sonntag";/
      "Freitag";/
      "Samstag"
 abmon "Jan";"Feb";/
-      "Mr";"Apr";/
+      "Mär";"Apr";/
       "Mai";"Jun";/
       "Jul";"Aug";/
       "Sep";"Okt";/
       "Nov";"Dez"
 mon "Januar";/
     "Februar";/
-    "Mrz";/
+    "März";/
     "April";/
     "Mai";/
     "Juni";/
@@ -172,7 +172,7 @@ END LC_PAPER
 
 LC_NAME
 name_fmt  "%d%t%g%t%m%t%f"
-name_miss "Frulein"
+name_miss "Fräulein"
 name_mr   "Herr"
 name_mrs  "Frau"
 name_ms   "Frau"
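
For readers who want to try the decoding rules from patch 4/5 outside of localedef, here is a minimal standalone sketch. It follows the same lead-byte ranges and overlong checks as utf8_decode in the diff, but it reads from a byte buffer instead of a struct linereader, and it returns a byte count instead of reporting through lr_error. The names sketch_is_trailing and sketch_utf8_decode are hypothetical and do not exist in glibc. This is an illustration under those assumptions, not a complete UTF-8 validator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in glibc): true if B is a UTF-8
   continuation byte of the form 10xxxxxx.  */
bool
sketch_is_trailing (uint8_t b)
{
  return (b & 0xc0) == 0x80;
}

/* Hypothetical standalone decoder (not in glibc).  Decodes one
   sequence at the start of BUF, which holds LEN bytes.  On success,
   stores the code point in *WCH and returns the number of bytes
   consumed (1 to 4).  Returns 0 for an invalid or truncated
   sequence.  */
size_t
sketch_utf8_decode (const uint8_t *buf, size_t len, uint32_t *wch)
{
  if (len == 0)
    return 0;

  uint8_t ch1 = buf[0];
  if (ch1 < 0x80)
    {
      /* Plain ASCII.  */
      *wch = ch1;
      return 1;
    }

  /* 0x80..0xbf are trailing bytes; 0xc0 and 0xc1 could only start
     overlong two-byte encodings.  */
  if (ch1 < 0xc2)
    return 0;

  if (len < 2 || !sketch_is_trailing (buf[1]))
    return 0;
  if (ch1 <= 0xdf)
    {
      /* ch1 >= 0xc2 already guarantees a result of at least 0x80.  */
      *wch = ((uint32_t) (ch1 & 0x1f) << 6) | (buf[1] & 0x3f);
      return 2;
    }

  if (len < 3 || !sketch_is_trailing (buf[2]))
    return 0;
  if (ch1 <= 0xef)
    {
      uint32_t result = (((uint32_t) (ch1 & 0x0f) << 12)
                         | ((uint32_t) (buf[1] & 0x3f) << 6)
                         | (buf[2] & 0x3f));
      if (result < 0x800)       /* Overlong three-byte form.  */
        return 0;
      *wch = result;
      return 3;
    }

  /* Lead bytes above 0xf4 are not valid in UTF-8.  */
  if (ch1 > 0xf4)
    return 0;
  if (len < 4 || !sketch_is_trailing (buf[3]))
    return 0;

  uint32_t result = (((uint32_t) (ch1 & 0x07) << 18)
                     | ((uint32_t) (buf[1] & 0x3f) << 12)
                     | ((uint32_t) (buf[2] & 0x3f) << 6)
                     | (buf[3] & 0x3f));
  if (result < 0x10000)         /* Overlong four-byte form.  */
    return 0;
  *wch = result;
  return 4;
}
```

With this sketch, the two bytes 0xc3 0xa4 (the "ä" used in the de_DE conversion) decode to U+00E4, and the three bytes 0xe2 0x82 0xac (the "€" in currency_symbol) decode to U+20AC. The overlong pair 0xc0 0x80 is rejected.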