
[5/5] de_DE: Convert to UTF-8

Message ID	8faf1d5dc7508a17bd14005b54f89593667aeecb.1652994079.git.fweimer@redhat.com
State	Committed
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A59F33839C5D
	To: libc-alpha@sourceware.org
	Subject: [PATCH 5/5] de_DE: Convert to UTF-8
	In-Reply-To: <cover.1652994079.git.fweimer@redhat.com>
	References: <cover.1652994079.git.fweimer@redhat.com>
	Message-Id: <8faf1d5dc7508a17bd14005b54f89593667aeecb.1652994079.git.fweimer@redhat.com>
	Date: Thu, 19 May 2022 23:06:46 +0200
	User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
	MIME-Version: 1.0
	Content-Type: text/plain; charset=utf-8
	Content-Transfer-Encoding: quoted-printable
	Precedence: list
	From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
	Reply-To: Florian Weimer <fweimer@redhat.com>
	Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
	Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
Series	Assume UTF-8 encoding for localedef input files
	[0/5] Assume UTF-8 encoding for localedef input files
	[1/5] locale: Turn ADDC and ADDS into functions in linereader.c
	[2/5] locale: Fix signed char bug in lr_getc
	[3/5] locale: Introduce translate_unicode_codepoint into linereader.c
	[4/5] locale: localdef input files are now encoded in UTF-8
	[5/5] de_DE: Convert to UTF-8

Checks

Context	Check	Description
dj/TryBot-apply_patch	success	Patch applied to master at the time it was sent
dj/TryBot-32bit	success	Build for i686

Commit Message

Florian Weimer May 19, 2022, 9:06 p.m. UTC

  ---
 localedata/locales/de_DE | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
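
The commit message does not say how the conversion was done. As a rough illustration only, the rewrite of `<UXXXX>` symbolic code points into literal UTF-8 characters could be sketched in Python as below. The names `UCS_ESCAPE`, `KEEP_ESCAPED`, and `expand_ucs_escapes` are hypothetical, and keeping `<U0022>` escaped is an assumption drawn from the fact that the patch leaves it untouched (a literal double quote would presumably collide with the string delimiters).

```python
import re

# Matches symbolic code points such as <U00E4> as used in locale sources.
UCS_ESCAPE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")

# Assumption: code points left in escaped form. The patch keeps <U0022>
# (QUOTATION MARK) as-is, so this sketch does the same.
KEEP_ESCAPED = {0x22}

def expand_ucs_escapes(line: str) -> str:
    """Replace <UXXXX> escapes with literal characters, except KEEP_ESCAPED."""
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if code_point in KEEP_ESCAPED:
            return match.group(0)  # leave the escape untouched
        return chr(code_point)
    return UCS_ESCAPE.sub(replace, line)

print(expand_ucs_escapes('name_miss   "Fr<U00E4>ulein"'))
print(expand_ucs_escapes('<U201C> <U00AB>;<U0022>'))
```

Note that the sketch performs no Unicode normalization: a base letter followed by an escaped combining mark, as in `"A<U0308>"`, stays a two-code-point sequence after conversion.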

Comments

Carlos O'Donell July 4, 2022, 7:54 p.m. UTC | #1

On 5/19/22 17:06, Florian Weimer via Libc-alpha wrote:
> ---

I appreciate that you did a test conversion!

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>


>  localedata/locales/de_DE | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE
> index 49a421fff4..52767808f7 100644
> --- a/localedata/locales/de_DE
> +++ b/localedata/locales/de_DE
> @@ -46,36 +46,36 @@ include "translit_combining";""
>  
>  % German umlauts.
>  % LATIN CAPITAL LETTER A WITH DIAERESIS.
> -<U00C4> "A<U0308>";"AE"
> +Ä "Ä";"AE"
>  % LATIN CAPITAL LETTER O WITH DIAERESIS.
> -<U00D6> "O<U0308>";"OE"
> +Ö "Ö";"OE"
>  % LATIN CAPITAL LETTER U WITH DIAERESIS.
> -<U00DC> "U<U0308>";"UE"
> +Ü "Ü";"UE"
>  % LATIN SMALL LETTER A WITH DIAERESIS.
> -<U00E4> "a<U0308>";"ae"
> +ä "ä";"ae"
>  % LATIN SMALL LETTER O WITH DIAERESIS.
> -<U00F6> "o<U0308>";"oe"
> +ö "ö";"oe"
>  % LATIN SMALL LETTER U WITH DIAERESIS.
> -<U00FC> "u<U0308>";"ue"
> +ü "ü";"ue"
>  
>  % Danish.
>  % LATIN CAPITAL LETTER A WITH RING ABOVE.
> -<U00C5> "A<U030A>";"AA"
> +Å "Å";"AA"
>  % LATIN SMALL LETTER A WITH RING ABOVE.
> -<U00E5> "a<U030A>";"aa"
> +å "å";"aa"
>  
>  % The following strange first-level transliteration derive from the use
>  % U201E and U201C as "correct" quoting characters.  These two characters
>  % do not really belong together.  The result is that somebody who uses
>  % U201C and U201D will get the incorrect U00AB / U00BB sequences.
>  % LEFT DOUBLE QUOTATION MARK
> -<U201C> <U00AB>;<U0022>
> +“ «;<U0022>
>  % RIGHT DOUBLE QUOTATION MARK
> -<U201D> <U00BB>;<U0022>
> +” »;<U0022>
>  % DOUBLE LOW-9 QUOTATION MARK
> -<U201E> <U00BB>;"<U002C><U002C>"
> +„ »;",,"
>  % DOUBLE HIGH-REVERSED-9 QUOTATION MARK
> -<U201F> <U00AB>;<U0022>
> +‟ «;<U0022>
>  
>  translit_end
>  
> @@ -90,7 +90,7 @@ END LC_COLLATE
>  
>  LC_MONETARY
>  int_curr_symbol     "EUR "
> -currency_symbol     "<U20AC>"
> +currency_symbol     "€"
>  mon_decimal_point   ","
>  mon_thousands_sep   "."
>  mon_grouping        3;3
> @@ -126,14 +126,14 @@ day	"Sonntag";/
>  	"Freitag";/
>  	"Samstag"
>  abmon	"Jan";"Feb";/
> -	"M<U00E4>r";"Apr";/
> +	"Mär";"Apr";/
>  	"Mai";"Jun";/
>  	"Jul";"Aug";/
>  	"Sep";"Okt";/
>  	"Nov";"Dez"
>  mon	"Januar";/
>  	"Februar";/
> -	"M<U00E4>rz";/
> +	"März";/
>  	"April";/
>  	"Mai";/
>  	"Juni";/
> @@ -172,7 +172,7 @@ END LC_PAPER
>  
>  LC_NAME
>  name_fmt    "%d%t%g%t%m%t%f"
> -name_miss   "Fr<U00E4>ulein"
> +name_miss   "Fräulein"
>  name_mr     "Herr"
>  name_mrs    "Frau"
>  name_ms     "Frau"

Andreas Schwab July 5, 2022, 9:27 a.m. UTC | #2

On May 19 2022, Florian Weimer via Libc-alpha wrote:

> diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE
> index 49a421fff4..52767808f7 100644
> --- a/localedata/locales/de_DE
> +++ b/localedata/locales/de_DE
> @@ -46,36 +46,36 @@ include "translit_combining";""
>  
>  % German umlauts.
>  % LATIN CAPITAL LETTER A WITH DIAERESIS.
> -<U00C4> "A<U0308>";"AE"
> +Ä "Ä";"AE"
>  % LATIN CAPITAL LETTER O WITH DIAERESIS.
> -<U00D6> "O<U0308>";"OE"
> +Ö "Ö";"OE"
>  % LATIN CAPITAL LETTER U WITH DIAERESIS.
> -<U00DC> "U<U0308>";"UE"
> +Ü "Ü";"UE"
>  % LATIN SMALL LETTER A WITH DIAERESIS.
> -<U00E4> "a<U0308>";"ae"
> +ä "ä";"ae"
>  % LATIN SMALL LETTER O WITH DIAERESIS.
> -<U00F6> "o<U0308>";"oe"
> +ö "ö";"oe"
>  % LATIN SMALL LETTER U WITH DIAERESIS.
> -<U00FC> "u<U0308>";"ue"
> +ü "ü";"ue"
>  
>  % Danish.
>  % LATIN CAPITAL LETTER A WITH RING ABOVE.
> -<U00C5> "A<U030A>";"AA"
> +Å "Å";"AA"
>  % LATIN SMALL LETTER A WITH RING ABOVE.
> -<U00E5> "a<U030A>";"aa"
> +å "å";"aa"

This has the danger of an overeager editor converting between NFC and
NFD.
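
To make the concern concrete, here is a minimal Python sketch using the standard `unicodedata` module. The sample line mirrors the converted Danish rule under the assumption that the left side is the precomposed character and the quoted replacement is the base letter plus a combining mark, as in the original `<U00E5> "a<U030A>"` form. The helper name `to_nfc` is hypothetical.

```python
import unicodedata

precomposed = "\u00e5"   # LATIN SMALL LETTER A WITH RING ABOVE (NFC form)
decomposed = "a\u030a"   # 'a' followed by COMBINING RING ABOVE (NFD form)

def to_nfc(text: str) -> str:
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)

# The two forms usually render identically but are different strings.
print(precomposed == decomposed)          # False
print(to_nfc(decomposed) == precomposed)  # True

# A transliteration rule that maps the precomposed form to the decomposed one.
rule = precomposed + ' "' + decomposed + '";"aa"'

# An editor that normalizes the whole file to NFC collapses both sides
# into the same character, so the first replacement becomes a no-op.
print(to_nfc(rule) == precomposed + ' "' + precomposed + '";"aa"')  # True
```

With the `<UXXXX>` notation the two sides are spelled differently in ASCII, so an editor's normalization pass cannot merge them.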

Florian Weimer July 5, 2022, 9:55 a.m. UTC | #3

* Andreas Schwab:

>> -<U00E5> "a<U030A>";"aa"
>> +å "å";"aa"
>
> This has the danger of an overeager editor converting between NFC and
> NFD.

Wouldn't we catch this during patch review, similar to a spurious
whitespace change?

(And I think that ideally, normalization of combining characters would
be handled by the collation framework, and not within the data.)

Thanks,
Florian

Andreas Schwab July 5, 2022, 10:38 a.m. UTC | #4

On Jul 05 2022, Florian Weimer wrote:

> * Andreas Schwab:
>
>>> -<U00E5> "a<U030A>";"aa"
>>> +å "å";"aa"
>>
>> This has the danger of an overeager editor converting between NFC and
>> NFD.
>
> Wouldn't we catch this during patch review, similar to a spurious
> whitespace change?

If all reviewers are careful enough to use diff-mode.
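
One way to reduce the reliance on reviewers' tooling would be a small check that flags changed lines differing only in normalization and prints them with non-ASCII code points spelled out. The following Python sketch is an illustration under that assumption, not part of the glibc tree; `differs_only_by_normalization` and `show_code_points` are hypothetical names.

```python
import unicodedata

def differs_only_by_normalization(old: str, new: str) -> bool:
    """True if old and new differ, but only in Unicode normalization form."""
    if old == new:
        return False
    return unicodedata.normalize("NFC", old) == unicodedata.normalize("NFC", new)

def show_code_points(line: str) -> str:
    """Render every non-ASCII character as <UXXXX> so differences are visible."""
    return "".join(ch if ord(ch) < 128 else f"<U{ord(ch):04X}>" for ch in line)

old_line = '\u00e5 "a\u030a";"aa"'   # replacement in decomposed form
new_line = '\u00e5 "\u00e5";"aa"'    # replacement silently recomposed

if differs_only_by_normalization(old_line, new_line):
    print("-" + show_code_points(old_line))
    print("+" + show_code_points(new_line))
```

For the sample pair, the two printed lines differ visibly in ASCII even though the raw lines look the same in most terminals and editors.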


Patch

diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE
index 49a421fff4..52767808f7 100644
--- a/localedata/locales/de_DE
+++ b/localedata/locales/de_DE
@@ -46,36 +46,36 @@  include "translit_combining";""
 
 % German umlauts.
 % LATIN CAPITAL LETTER A WITH DIAERESIS.
-<U00C4> "A<U0308>";"AE"
+Ä "Ä";"AE"
 % LATIN CAPITAL LETTER O WITH DIAERESIS.
-<U00D6> "O<U0308>";"OE"
+Ö "Ö";"OE"
 % LATIN CAPITAL LETTER U WITH DIAERESIS.
-<U00DC> "U<U0308>";"UE"
+Ü "Ü";"UE"
 % LATIN SMALL LETTER A WITH DIAERESIS.
-<U00E4> "a<U0308>";"ae"
+ä "ä";"ae"
 % LATIN SMALL LETTER O WITH DIAERESIS.
-<U00F6> "o<U0308>";"oe"
+ö "ö";"oe"
 % LATIN SMALL LETTER U WITH DIAERESIS.
-<U00FC> "u<U0308>";"ue"
+ü "ü";"ue"
 
 % Danish.
 % LATIN CAPITAL LETTER A WITH RING ABOVE.
-<U00C5> "A<U030A>";"AA"
+Å "Å";"AA"
 % LATIN SMALL LETTER A WITH RING ABOVE.
-<U00E5> "a<U030A>";"aa"
+å "å";"aa"
 
 % The following strange first-level transliteration derive from the use
 % U201E and U201C as "correct" quoting characters.  These two characters
 % do not really belong together.  The result is that somebody who uses
 % U201C and U201D will get the incorrect U00AB / U00BB sequences.
 % LEFT DOUBLE QUOTATION MARK
-<U201C> <U00AB>;<U0022>
+“ «;<U0022>
 % RIGHT DOUBLE QUOTATION MARK
-<U201D> <U00BB>;<U0022>
+” »;<U0022>
 % DOUBLE LOW-9 QUOTATION MARK
-<U201E> <U00BB>;"<U002C><U002C>"
+„ »;",,"
 % DOUBLE HIGH-REVERSED-9 QUOTATION MARK
-<U201F> <U00AB>;<U0022>
+‟ «;<U0022>
 
 translit_end
 
@@ -90,7 +90,7 @@  END LC_COLLATE
 
 LC_MONETARY
 int_curr_symbol     "EUR "
-currency_symbol     "<U20AC>"
+currency_symbol     "€"
 mon_decimal_point   ","
 mon_thousands_sep   "."
 mon_grouping        3;3
@@ -126,14 +126,14 @@  day	"Sonntag";/
 	"Freitag";/
 	"Samstag"
 abmon	"Jan";"Feb";/
-	"M<U00E4>r";"Apr";/
+	"Mär";"Apr";/
 	"Mai";"Jun";/
 	"Jul";"Aug";/
 	"Sep";"Okt";/
 	"Nov";"Dez"
 mon	"Januar";/
 	"Februar";/
-	"M<U00E4>rz";/
+	"März";/
 	"April";/
 	"Mai";/
 	"Juni";/
@@ -172,7 +172,7 @@  END LC_PAPER
 
 LC_NAME
 name_fmt    "%d%t%g%t%m%t%f"
-name_miss   "Fr<U00E4>ulein"
+name_miss   "Fräulein"
 name_mr     "Herr"
 name_mrs    "Frau"
 name_ms     "Frau"