From patchwork Mon Dec 15 21:52:38 2014
X-Patchwork-Submitter: Alexandre Oliva
X-Patchwork-Id: 4264
From: Alexandre Oliva
To: libc-alpha@sourceware.org
Subject: [PATCH] sort diacritics left-to-right except in fr_CA locale
Date: Mon, 15 Dec 2014 19:52:38 -0200

This fixes a long-standing collation bug in glibc, affecting all locales
but de_DE, lb_LU and fr_CA.  This led me to write a separate NEWS entry
for this bug; do we want a bug report in the database regardless?

Tested on x86_64-linux-gnu.  Ok to install?

for  ChangeLog

	* localedata/Makefile (test-input): Add fr_CA.UTF-8.
	(LOCALES): Likewise.
	* localedata/fr_CA.in: Copied and adjusted from...
	* localedata/fr_FR.in: ... this.  Adjusted too.
	* localedata/locales/de_DE (DIACRIT_FORWARD): Do not define.
	* localedata/locales/lb_LU (DIACRIT_FORWARD): Likewise.
	* localedata/locales/fr_CA (DIACRIT_BACKWARD): Define.
	* localedata/locales/iso14651_t1_common (DIACRIT_FORWARD): Make it
	the new default, overridable with DIACRIT_BACKWARD.
	* NEWS: Note behavior change.
---
 NEWS                                  |    9 +++
 localedata/Makefile                   |    4 +
 localedata/fr_CA.in                   |   96 +++++++++++++++++++++++++++++++++
 localedata/fr_FR.in                   |   22 ++++----
 localedata/locales/de_DE              |    2 -
 localedata/locales/fr_CA              |    2 +
 localedata/locales/iso14651_t1_common |    6 +-
 localedata/locales/lb_LU              |    2 -
 8 files changed, 123 insertions(+), 20 deletions(-)
 create mode 100644 localedata/fr_CA.in

diff --git a/NEWS b/NEWS
index a324c10..1a78cda 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,15 @@ Version 2.21
 * Merged gettext 0.19.3 into the intl subdirectory.  This fixes building
   with newer versions of bison.
+
+* Collation (sorting) general rules regarding diacritics have been fixed to
+  match those in Unicode CLDR, namely, whether diacritic tie-breaking takes
+  place in a forward or backward pass over the strings or wstrings.  The
+  only locale that sorts diacritics with a backward pass is now fr_CA; it
+  already sorted «cote < côte < coté < côté» before.  All other locales now
+  use a forward pass, so that they sort «cote < coté < côte < côté», which
+  only de_DE and lb_LU did before.
+
 Version 2.20

diff --git a/localedata/Makefile b/localedata/Makefile
index 0826b36..4fc523e 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \
 	tst-ctype tst-langinfo tst-langinfo-static tst-numeric
 test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
 	hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
-	si_LK.UTF-8
+	si_LK.UTF-8 fr_CA.UTF-8
 test-input-data = $(addsuffix .in, $(basename $(test-input)))
 test-output := $(foreach s, .out .xout, \
 	$(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
 	hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
 	nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
 	zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
-	tr_TR.ISO-8859-9 en_GB.UTF-8
+	tr_TR.ISO-8859-9 en_GB.UTF-8 fr_CA.UTF-8
 LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
 CHARMAPS := $(shell echo "$(LOCALES)" | \
 	sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/localedata/fr_CA.in b/localedata/fr_CA.in
new file mode 100644
index 0000000..1c05d69
--- /dev/null
+++ b/localedata/fr_CA.in
@@ -0,0 +1,96 @@
+@@@@@
+0000
+9999
+Aalborg
+aide
+aïeul
+air
+@@@air
+air@@@
+Ålborg
+août
+bohème
+Bohême
+Bohémien
+caennais
+cæsium
+çà et là
+C.A.F.
+Canon
+cañon
+casanier
+cølibat
+colon
+côlon
+COOP
+CO-OP
+coop
+co-op
+Copenhagen
+COTE
+cote
+CÔTE
+côte
+COTÉ
+coté
+CÔTÉ
+côté
+du
+dû
+élève
+élevé
+gène
+gêne
+gêné
+Größe
+Grossist
+haie
+haïe
+île
+Île d'Orléans
+lame
+l'âme
+lamé
+les
+LÈS
+lèse
+lésé
+L'Haÿ-les-Roses
+MÂCON
+maçon
+McArthur
+Mc Arthur
+Mc Mahon
+MODÈLE
+modelé
+NOËL
+Noël
+notre
+nôtre
+ode
+œil
+ou
+OÙ
+ovoïde
+pèche
+pêche
+PÉCHÉ
+péché
+pêché
+pécher
+pêcher
+pechère
+péchère
+relève
+relevé
+resume
+resumé
+résumé
+révèle
+révélé
+vice-president
+vice-président
+vice-president's offices
+vice-presidents' offices
+VICE-VERSA
+vice versa
diff --git a/localedata/fr_FR.in b/localedata/fr_FR.in
index dd5c533..070eb4dc 100644
--- a/localedata/fr_FR.in
+++ b/localedata/fr_FR.in
@@ -29,16 +29,16 @@ CO-OP
 Copenhagen
 cote
 COTE
-côte
-CÔTE
 coté
 COTÉ
+côte
+CÔTE
 côté
 CÔTÉ
 du
 dû
-élève
 élevé
+élève
 gène
 gêne
 gêné
@@ -49,20 +49,20 @@ haïe
 île
 Île d'Orléans
 lame
-l'âme
 lamé
+l'âme
 les
 LÈS
-lèse
 lésé
+lèse
 L'Haÿ-les-Roses
-MÂCON
 maçon
+MÂCON
 McArthur
 Mc Arthur
 Mc Mahon
-MODÈLE
 modelé
+MODÈLE
 Noël
 NOËL
 notre
@@ -72,22 +72,22 @@ ode
 ou
 OÙ
 ovoïde
-pèche
-pêche
 péché
 PÉCHÉ
+pèche
+pêche
 pêché
 pécher
 pêcher
 pechère
 péchère
-relève
 relevé
+relève
 resume
 resumé
 résumé
-révèle
 révélé
+révèle
 vice-president
 vice-président
 vice-president's offices
diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE
index e2704a7..2c3510a 100644
--- a/localedata/locales/de_DE
+++ b/localedata/locales/de_DE
@@ -76,8 +76,6 @@ END LC_CTYPE
 LC_COLLATE
-define DIACRIT_FORWARD
-
 % Copy the template from ISO/IEC 14651
 copy "iso14651_t1"
diff --git a/localedata/locales/fr_CA b/localedata/locales/fr_CA
index 5e2c5a1..878539b 100644
--- a/localedata/locales/fr_CA
+++ b/localedata/locales/fr_CA
@@ -51,6 +51,8 @@ copy "fr_FR"
 END LC_CTYPE
 LC_COLLATE
+define DIACRIT_BACKWARD
+
 copy "en_CA"
 END LC_COLLATE
diff --git a/localedata/locales/iso14651_t1_common b/localedata/locales/iso14651_t1_common
index e0c3eaa..1fc214f 100644
--- a/localedata/locales/iso14651_t1_common
+++ b/localedata/locales/iso14651_t1_common
@@ -5060,10 +5060,10 @@ order_start ;forward;backward;forward;forward,position
 IGNORE;IGNORE;IGNORE;
 IGNORE;IGNORE;IGNORE;
-ifdef DIACRIT_FORWARD
-order_start ;forward;forward;forward;forward,position
-else
+ifdef DIACRIT_BACKWARD
 order_start ;forward;backward;forward;forward,position
+else
+order_start ;forward;forward;forward;forward,position
 endif
 # ;;;IGNORE # 170
diff --git a/localedata/locales/lb_LU b/localedata/locales/lb_LU
index a74e162..c8616fd 100644
--- a/localedata/locales/lb_LU
+++ b/localedata/locales/lb_LU
@@ -77,8 +77,6 @@ END LC_CTYPE
 LC_COLLATE
-define DIACRIT_FORWARD
-
 % Copy the template from ISO/IEC 14651
 copy "iso14651_t1"
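
The forward versus backward diacritic pass described in the NEWS entry can be illustrated with a small Python sketch. This is not how glibc implements collation; it assumes a simplified two-level comparison in which the first level compares base letters and the second level compares the diacritics attached to each letter, scanned either left-to-right (forward) or right-to-left (backward). The function name collation_key and its backward parameter are hypothetical, and case and punctuation levels are ignored.

```python
import unicodedata


def collation_key(word, backward=False):
    """Build a simplified two-level sort key (hypothetical helper).

    Level 1: the base letters, lowercased.
    Level 2: the combining marks attached to each base letter, read
             left-to-right (forward) or right-to-left (backward).
    """
    bases = []
    marks = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the combining mark to the preceding base letter.
            if marks:
                marks[-1] += (ord(ch),)
        else:
            bases.append(ch.lower())
            marks.append(())
    secondary = tuple(reversed(marks)) if backward else tuple(marks)
    return (tuple(bases), secondary)


words = ["côté", "cote", "coté", "côte"]

# Forward pass: the first accent difference from the left decides.
forward_order = sorted(words, key=collation_key)
# Backward pass: the first accent difference from the right decides.
backward_order = sorted(words, key=lambda w: collation_key(w, backward=True))

print(" < ".join(forward_order))   # cote < coté < côte < côté
print(" < ".join(backward_order))  # cote < côte < coté < côté
```

With the forward key, the accent on the "o" outweighs the accent on the final "e", which gives the order the patch makes the default. Reversing the second level lets the final "é" dominate, which is the order kept for fr_CA.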