sort diacritics left-to-right except in fr_CA locale
Commit Message
On Dec 17, 2014, Roland McGrath <roland@hack.frob.com> wrote:
>> Noted, thanks. Any other comments on the patch, before I post a revised
>> version mentioning the yet-to-be-filed bug report?
> I am pretty useless in that area of the code, sorry.
Ping? (as in, anyone else? :-)
Here's a revised patch that adds a reference to the newly-filed bug
report.
for ChangeLog
[BZ #17750]
* localedata/Makefile (test-input): Add fr_CA.UTF-8.
(LOCALES): Likewise.
* localedata/fr_CA.in: Copied and adjusted from...
* localedata/fr_FR.in: ... this. Adjusted too.
* localedata/locales/de_DE (DIACRIT_FORWARD): Do not define.
* localedata/locales/lb_LU (DIACRIT_FORWARD): Likewise.
* localedata/locales/fr_CA (DIACRIT_BACKWARD): Define.
* localedata/locales/iso14651_t1_common (DIACRIT_FORWARD):
Make it the new default, overridable with DIACRIT_BACKWARD.
* NEWS: Note behavior change.
---
NEWS | 11 +++-
localedata/Makefile | 4 +
localedata/fr_CA.in | 96 +++++++++++++++++++++++++++++++++
localedata/fr_FR.in | 22 ++++----
localedata/locales/de_DE | 2 -
localedata/locales/fr_CA | 2 +
localedata/locales/iso14651_t1_common | 6 +-
localedata/locales/lb_LU | 2 -
8 files changed, 124 insertions(+), 21 deletions(-)
create mode 100644 localedata/fr_CA.in
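The change only affects how diacritics break ties between strings whose base letters already compare equal. The second collation level is scanned either left to right (forward) or right to left (backward, the traditional Canadian French rule, which DIACRIT_BACKWARD selects). The C sketch below models this with hypothetical two-level weights: a base letter, then one accent weight per letter. It handles only ASCII plus é and ô. The names `sort_words` and `compare_words` and the weight values are illustrative assumptions. This is not glibc's collation code, which uses the four-level ISO 14651 tables.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LETTERS 32

/* Hypothetical weights: level 1 is the base letter, level 2 the diacritic
   on that letter (0 = none).  Real ISO 14651 tables have more levels. */
enum accent { NO_ACCENT = 0, ACUTE = 1, CIRCUMFLEX = 2 };

struct weights
{
  char base[MAX_LETTERS];   /* level-1 weights */
  int accent[MAX_LETTERS];  /* level-2 weights, one per letter */
  size_t len;
};

/* Decode a tiny UTF-8 subset: ASCII plus U+00E9 (é) and U+00F4 (ô). */
static struct weights
get_weights (const char *s)
{
  struct weights w = { .len = 0 };
  const unsigned char *p = (const unsigned char *) s;
  while (*p != '\0')
    {
      assert (w.len < MAX_LETTERS);
      if (p[0] == 0xC3 && p[1] == 0xA9)
        { w.base[w.len] = 'e'; w.accent[w.len] = ACUTE; p += 2; }
      else if (p[0] == 0xC3 && p[1] == 0xB4)
        { w.base[w.len] = 'o'; w.accent[w.len] = CIRCUMFLEX; p += 2; }
      else
        { w.base[w.len] = (char) p[0]; w.accent[w.len] = NO_ACCENT; p += 1; }
      w.len++;
    }
  return w;
}

/* Direction of the diacritic pass: 0 = forward (the new default),
   1 = backward (what DIACRIT_BACKWARD selects for fr_CA). */
static int backward_diacritics;

static int
compare_words (const void *a, const void *b)
{
  struct weights x = get_weights (*(const char *const *) a);
  struct weights y = get_weights (*(const char *const *) b);

  /* Level 1: base letters, always compared left to right. */
  size_t n = x.len < y.len ? x.len : y.len;
  for (size_t i = 0; i < n; i++)
    if (x.base[i] != y.base[i])
      return (unsigned char) x.base[i] - (unsigned char) y.base[i];
  if (x.len != y.len)
    return x.len < y.len ? -1 : 1;

  /* Level 2: diacritics break the tie, scanned in the configured direction. */
  for (size_t k = 0; k < x.len; k++)
    {
      size_t i = backward_diacritics ? x.len - 1 - k : k;
      if (x.accent[i] != y.accent[i])
        return x.accent[i] - y.accent[i];
    }
  return 0;
}

/* Hypothetical helper: sort N UTF-8 words with the chosen diacritic pass. */
void
sort_words (const char **words, size_t n, int backward)
{
  backward_diacritics = backward;
  qsort (words, n, sizeof *words, compare_words);
}
```

If `backward` is 0, sorting {"côté", "coté", "côte", "cote"} yields cote < coté < côte < côté. If it is 1, the result is cote < côte < coté < côté. These are the two orders quoted in the NEWS entry.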
Comments
On Dec 23, 2014, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Dec 17, 2014, Roland McGrath <roland@hack.frob.com> wrote:
>>> Noted, thanks. Any other comments on the patch, before I post a revised
>>> version mentioning the yet-to-be-filed bug report?
>> I am pretty useless in that area of the code, sorry.
> Ping? (as in, anyone else? :-)
> Here's a revised patch that adds a reference to the newly-filed bug
> report.
> for ChangeLog
> [BZ #17750]
> * localedata/Makefile (test-input): Add fr_CA.UTF-8.
> (LOCALES): Likewise.
> * localedata/fr_CA.in: Copied and adjusted from...
> * localedata/fr_FR.in: ... this. Adjusted too.
> * localedata/locales/de_DE (DIACRIT_FORWARD): Do not define.
> * localedata/locales/lb_LU (DIACRIT_FORWARD): Likewise.
> * localedata/locales/fr_CA (DIACRIT_BACKWARD): Define.
> * localedata/locales/iso14651_t1_common (DIACRIT_FORWARD):
> Make it the new default, overridable with DIACRIT_BACKWARD.
> * NEWS: Note behavior change.
Ping?
On 23 Dec 2014 02:47, Alexandre Oliva wrote:
> On Dec 17, 2014, Roland McGrath <roland@hack.frob.com> wrote:
>
> >> Noted, thanks. Any other comments on the patch, before I post a revised
> >> version mentioning the yet-to-be-filed bug report?
>
> > I am pretty useless in that area of the code, sorry.
>
> Ping? (as in, anyone else? :-)
>
> Here's a revised patch that adds a reference to the newly-filed bug
> report.
ok
-mike
On 05 Mar 2015 13:05, Mike Frysinger wrote:
> On 23 Dec 2014 02:47, Alexandre Oliva wrote:
> > On Dec 17, 2014, Roland McGrath <roland@hack.frob.com> wrote:
> >
> > >> Noted, thanks. Any other comments on the patch, before I post a revised
> > >> version mentioning the yet-to-be-filed bug report?
> >
> > > I am pretty useless in that area of the code, sorry.
> >
> > Ping? (as in, anyone else? :-)
> >
> > Here's a revised patch that adds a reference to the newly-filed bug
> > report.
>
> ok
were you going to merge this ?
-mike
On Tue, Apr 12, 2016 at 03:49:03AM -0400, Mike Frysinger wrote:
> On 05 Mar 2015 13:05, Mike Frysinger wrote:
> > On 23 Dec 2014 02:47, Alexandre Oliva wrote:
> > > On Dec 17, 2014, Roland McGrath <roland@hack.frob.com> wrote:
> > >
> > > >> Noted, thanks. Any other comments on the patch, before I post a revised
> > > >> version mentioning the yet-to-be-filed bug report?
> > >
> > > > I am pretty useless in that area of the code, sorry.
> > >
> > > Ping? (as in, anyone else? :-)
> > >
> > > Here's a revised patch that adds a reference to the newly-filed bug
> > > report.
> >
> > ok
>
> were you going to merge this ?
Well, a number of locales where French is influential should still have
the reversed accent sorting. This should include fr_BE, fr_CH, da_DK,
fr_CA and a number of locales in Africa that use French as a business language.
For da_DK, we say that where this matters, it is most likely because the words or names
originate from French. The same reasoning could also apply to nb_NO, nn_NO and sv_SE.
Best regards
keld
Patch
diff --git a/NEWS b/NEWS
--- a/NEWS
+++ b/NEWS
@@ -15,7 +15,7 @@ Version 2.21
17522, 17555, 17570, 17571, 17572, 17573, 17574, 17581, 17582, 17583,
17584, 17585, 17589, 17594, 17601, 17608, 17616, 17625, 17630, 17633,
17634, 17647, 17653, 17657, 17664, 17665, 17668, 17682, 17717, 17719,
- 17722, 17724, 17725, 17733, 17744, 17745, 17746, 17747.
+ 17722, 17724, 17725, 17733, 17744, 17745, 17746, 17747, 17750.
* CVE-2104-7817 The wordexp function could ignore the WRDE_NOCMD flag
under certain input conditions resulting in the execution of a shell for
@@ -46,6 +46,15 @@ Version 2.21
* Merged gettext 0.19.3 into the intl subdirectory. This fixes building
with newer versions of bison.
+
+* Collation (sorting) general rules regarding diacritics have been fixed to
+ match those in Unicode CLDR, namely, whether diacritic tie-breaking takes
+ place in a forward or backward pass over the strings or wide strings. The
+ only locale that sorts diacritics with a backward pass is now fr_CA; it
+ already sorted «cote < côte < coté < côté» before. All other locales now
+ use a forward pass, so that they sort «cote < coté < côte < côté», which
+ only de_DE and lb_LU did before. (Bugzilla #17750)
+
Version 2.20
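diff --git a/localedata/Makefile b/localedata/Makefile
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -37,7 +37,7 @@ test-srcs := collate-test xfrm-test tst-fmon tst-rpmatch tst-trans \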
tst-ctype tst-langinfo tst-langinfo-static tst-numeric
test-input := de_DE.ISO-8859-1 en_US.ISO-8859-1 da_DK.ISO-8859-1 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 tr_TR.UTF-8 fr_FR.UTF-8 \
- si_LK.UTF-8
+ si_LK.UTF-8 fr_CA.UTF-8
test-input-data = $(addsuffix .in, $(basename $(test-input)))
test-output := $(foreach s, .out .xout, \
$(addsuffix $s, $(basename $(test-input))))
@@ -106,7 +106,7 @@ LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 \
hr_HR.ISO-8859-2 sv_SE.ISO-8859-1 ja_JP.SJIS fr_FR.ISO-8859-1 \
nb_NO.ISO-8859-1 nn_NO.ISO-8859-1 tr_TR.UTF-8 cs_CZ.UTF-8 \
zh_TW.EUC-TW fa_IR.UTF-8 fr_FR.UTF-8 ja_JP.UTF-8 si_LK.UTF-8 \
- tr_TR.ISO-8859-9 en_GB.UTF-8
+ tr_TR.ISO-8859-9 en_GB.UTF-8 fr_CA.UTF-8
LOCALE_SRCS := $(shell echo "$(LOCALES)"|sed 's/\([^ .]*\)[^ ]*/\1/g')
CHARMAPS := $(shell echo "$(LOCALES)" | \
sed -e 's/[^ .]*[.]\([^ ]*\)/\1/g' -e s/SJIS/SHIFT_JIS/g)
diff --git a/localedata/fr_CA.in b/localedata/fr_CA.in
new file mode 100644
--- /dev/null
+++ b/localedata/fr_CA.in
@@ -0,0 +1,96 @@
+@@@@@
+0000
+9999
+Aalborg
+aide
+aïeul
+air
+@@@air
+air@@@
+Ålborg
+août
+bohème
+Bohême
+Bohémien
+caennais
+cæsium
+çà et là
+C.A.F.
+Canon
+cañon
+casanier
+cølibat
+colon
+côlon
+COOP
+CO-OP
+coop
+co-op
+Copenhagen
+COTE
+cote
+CÔTE
+côte
+COTÉ
+coté
+CÔTÉ
+côté
+du
+dû
+élève
+élevé
+gène
+gêne
+gêné
+Größe
+Grossist
+haie
+haïe
+île
+Île d'Orléans
+lame
+l'âme
+lamé
+les
+LÈS
+lèse
+lésé
+L'Haÿ-les-Roses
+MÂCON
+maçon
+McArthur
+Mc Arthur
+Mc Mahon
+MODÈLE
+modelé
+NOËL
+Noël
+notre
+nôtre
+ode
+œil
+ou
+OÙ
+ovoïde
+pèche
+pêche
+PÉCHÉ
+péché
+pêché
+pécher
+pêcher
+pechère
+péchère
+relève
+relevé
+resume
+resumé
+résumé
+révèle
+révélé
+vice-president
+vice-président
+vice-president's offices
+vice-presidents' offices
+VICE-VERSA
+vice versa
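diff --git a/localedata/fr_FR.in b/localedata/fr_FR.in
--- a/localedata/fr_FR.in
+++ b/localedata/fr_FR.in
@@ -29,16 +29,16 @@ CO-OP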
Copenhagen
cote
COTE
-côte
-CÔTE
coté
COTÉ
+côte
+CÔTE
côté
CÔTÉ
du
dû
-élève
élevé
+élève
gène
gêne
gêné
@@ -49,20 +49,20 @@ haïe
île
Île d'Orléans
lame
-l'âme
lamé
+l'âme
les
LÈS
-lèse
lésé
+lèse
L'Haÿ-les-Roses
-MÂCON
maçon
+MÂCON
McArthur
Mc Arthur
Mc Mahon
-MODÈLE
modelé
+MODÈLE
Noël
NOËL
notre
@@ -72,22 +72,22 @@ ode
ou
OÙ
ovoïde
-pèche
-pêche
péché
PÉCHÉ
+pèche
+pêche
pêché
pécher
pêcher
pechère
péchère
-relève
relevé
+relève
resume
resumé
résumé
-révèle
révélé
+révèle
vice-president
vice-président
vice-president's offices
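diff --git a/localedata/locales/de_DE b/localedata/locales/de_DE
--- a/localedata/locales/de_DE
+++ b/localedata/locales/de_DE
@@ -76,8 +76,6 @@ END LC_CTYPE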
LC_COLLATE
-define DIACRIT_FORWARD
-
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"
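diff --git a/localedata/locales/fr_CA b/localedata/locales/fr_CA
--- a/localedata/locales/fr_CA
+++ b/localedata/locales/fr_CA
@@ -51,6 +51,8 @@ copy "fr_FR"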
END LC_CTYPE
LC_COLLATE
+define DIACRIT_BACKWARD
+
copy "en_CA"
END LC_COLLATE
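diff --git a/localedata/locales/iso14651_t1_common b/localedata/locales/iso14651_t1_common
--- a/localedata/locales/iso14651_t1_common
+++ b/localedata/locales/iso14651_t1_common
@@ -5060,10 +5060,10 @@ order_start <SPECIAL>;forward;backward;forward;forward,position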
<U009E> IGNORE;IGNORE;IGNORE;<U009E>
<U009F> IGNORE;IGNORE;IGNORE;<U009F>
-ifdef DIACRIT_FORWARD
-order_start <LATIN>;forward;forward;forward;forward,position
-else
+ifdef DIACRIT_BACKWARD
order_start <LATIN>;forward;backward;forward;forward,position
+else
+order_start <LATIN>;forward;forward;forward;forward,position
endif
#
<U00A0> <U0020>;<BAS>;<MIN>;IGNORE # 170<NBSP>
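diff --git a/localedata/locales/lb_LU b/localedata/locales/lb_LU
--- a/localedata/locales/lb_LU
+++ b/localedata/locales/lb_LU
@@ -77,8 +77,6 @@ END LC_CTYPE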
LC_COLLATE
-define DIACRIT_FORWARD
-
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"
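To check the behavior on a real system, a small program can select a locale with setlocale(3) and then sort words with strcoll(3). The helper `sort_in_locale` below is a hypothetical sketch. Its results depend on which locales have been generated and on the installed glibc version.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator that defers to the current LC_COLLATE rules. */
static int
coll_cmp (const void *a, const void *b)
{
  return strcoll (*(const char *const *) a, *(const char *const *) b);
}

/* Hypothetical helper: sort WORDS under LOCALE's collation rules.
   Returns 0 on success, or -1 if setlocale rejects LOCALE (for example,
   because it has not been generated).  The previous LC_COLLATE setting
   is restored afterwards. */
int
sort_in_locale (const char *locale, const char **words, size_t n)
{
  assert (words != NULL || n == 0);

  char saved[256] = "";
  const char *cur = setlocale (LC_COLLATE, NULL);
  if (cur != NULL)
    /* Copy the name: setlocale may overwrite the buffer it returned. */
    snprintf (saved, sizeof saved, "%s", cur);

  if (setlocale (LC_COLLATE, locale) == NULL)
    return -1;
  qsort (words, n, sizeof *words, coll_cmp);
  if (saved[0] != '\0')
    setlocale (LC_COLLATE, saved);
  return 0;
}
```

In the "C" locale, strcoll compares bytes, so the UTF-8 words sort as cote < coté < côte < côté. According to the NEWS entry, a glibc carrying this patch should produce cote < côte < coté < côté under fr_CA.UTF-8 and cote < coté < côte < côté under fr_FR.UTF-8, provided both locales are installed.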