Update sv_SE to treat 'W' as a distinct character (Bug 25036)

Message ID 20210319014318.2565491-1-carlos@redhat.com
State Committed
Commit ebde2baeb535661019b8f774a906d6abd332f3b8

Commit Message

Carlos O'Donell March 19, 2021, 1:43 a.m. UTC
  From: Sebastian Rasmussen <sebras@gmail.com>

The 13th edition of Svenska Akademiens ordlista lists 'W' as a
distinct letter that sorts after 'V'. We adjust the sv_SE locale
(and tests) to match this updated and "reformed" language change.
This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
the letter 'W'.

No regressions on x86_64, and locale sorting tests all pass.

Co-authored-by: Carlos O'Donell <carlos@redhat.com>
---
 localedata/locales/sv_SE       | 26 +++++++++-----------------
 localedata/sv_SE.ISO-8859-1.in |  4 ++--
 localedata/sv_SE.UTF-8.in      |  4 ++--
 3 files changed, 13 insertions(+), 21 deletions(-)
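
To make the effect of the change concrete, here is a minimal Python sketch of the two orderings described in the commit message. It is only an illustrative model, not glibc's collation algorithm: the helper `make_key` and its `w_is_distinct` flag are hypothetical names, the three-level key (letter, variant, case) is a simplification of the real weight tables, and only plain ASCII letters are handled. The word list is taken from the entries that move in the two test files.

```python
# Illustrative model only: this is NOT glibc's collation algorithm or its
# real weights. make_key and w_is_distinct are hypothetical names.

def make_key(w_is_distinct):
    """Build a sort key that models Swedish ordering of 'v' and 'w'.

    w_is_distinct=False models the old rule: 'w' is a variant of 'v'.
    w_is_distinct=True models the reformed rule: 'w' is its own letter.
    """
    def key(word):
        primary, secondary, tertiary = [], [], []
        for ch in word:
            base = ch.lower()
            if base == "w" and not w_is_distinct:
                # Old rule: same primary weight as 'v', told apart only
                # at the secondary (variant) level.
                primary.append("v")
                secondary.append(1)
            else:
                # Plain letters: primary weight is the letter itself.
                primary.append(base)
                secondary.append(0)
            # Tertiary level: lowercase sorts before uppercase.
            tertiary.append(1 if ch.isupper() else 0)
        return (tuple(primary), tuple(secondary), tuple(tertiary))
    return key


words = ["W", "Va", "v", "w", "va", "V"]
old_order = sorted(words, key=make_key(w_is_distinct=False))
new_order = sorted(words, key=make_key(w_is_distinct=True))

print(old_order)  # ['v', 'V', 'w', 'W', 'va', 'Va']
print(new_order)  # ['v', 'V', 'va', 'Va', 'w', 'W']
```

Under the old rule every level-one comparison sees 'w' as 'v', so "w" lands before "va"; under the reformed rule 'w' gets its own primary weight and therefore sorts after every word beginning with 'v'. That is the same move the patch makes in the two `.in` test files.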
  

Comments

Carlos O'Donell April 6, 2021, 2:23 p.m. UTC | #1
On 3/18/21 9:43 PM, Carlos O'Donell wrote:
> From: Sebastian Rasmussen <sebras@gmail.com>
> 
> The 13th edition of Svenska Akademiens ordlista lists 'W' as a
> distinct letter that sorts after 'V'. We adjust the sv_SE locale
> (and tests) to match this updated and "reformed" language change.
> This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
> the letter 'W'.

I will be committing this patch shortly to resolve this issue.

I haven't seen any objections and the general consensus is to
harmonize with CLDR which has already made these changes.

General feedback from native speakers is that this is the correct
way forward for the sv_SE locale.
 
> No regressions on x86_64, and locale sorting tests all pass.
> 
> Co-authored-by: Carlos O'Donell <carlos@redhat.com>
> ---
>  localedata/locales/sv_SE       | 26 +++++++++-----------------
>  localedata/sv_SE.ISO-8859-1.in |  4 ++--
>  localedata/sv_SE.UTF-8.in      |  4 ++--
>  3 files changed, 13 insertions(+), 21 deletions(-)
> 
> diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE
> index b0901726db..f54c73226d 100644
> --- a/localedata/locales/sv_SE
> +++ b/localedata/locales/sv_SE
> @@ -61,22 +61,25 @@ LC_COLLATE
>  copy "iso14651_t1"
>  
>  % CLDR collation rules for Swedish:
> -% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/sv.xml)
> +% (https://github.com/unicode-org/cldr/blob/master/common/collation/sv.xml)
>  %
> -% <collation type="standard">
> +% We use the new "reformed" rules from the 13th edition of Svenska Akademiens
> +% ordlista where 'W' is considered a distinct character sorting after 'V'.
> +% This matches CLDR 1.5.0 released in 2007.
> +%
> +% <defaultCollation>reformed</defaultCollation>
> +% <collation type="reformed">
>  %   <cr><![CDATA[
>  %     &D<<đ<<<Đ<<ð<<<Ð
>  %     &t<<<þ/h
>  %     &T<<<Þ/H
> -%     &v<<<V<<w<<<W
>  %     &Y<<ü<<<Ü<<ű<<<Ű
>  %     &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<<ô<<<Ô
>  %   ]]></cr>
>  % </collation>
>  %
> -% And CLDR also lists the following
> -% index characters:
> -% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/sv.xml)
> +% And CLDR also lists the following index characters:
> +% (https://github.com/unicode-org/cldr/blob/master/common/main/sv.xml)
>  %
>  % <exemplarCharacters type="index">[A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö]</exemplarCharacters>
>  %
> @@ -103,17 +106,6 @@ reorder-after <AFTER-Z>
>  <U00DE> "<S0074><S0068>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";IGNORE % Þ
>  <U00FE> "<S0074><S0068>";"<BASE><BASE>";"<COMPAT><COMPAT>";IGNORE % þ
>  
> -% The letter w is normally not present in the Swedish alphabet. It
> -% exists in some names in Swedish and foreign words, but is accounted
> -% for as a variant of 'v'.  Words and names with 'w' are in Swedish
> -% ordered alphabetically among the words and names with 'v'. If two
> -% words or names are only to be distinguished by 'v' or % 'w', 'v' is
> -% placed before 'w'.
> -
> -% &v<<<V<<w<<<W
> -<U0057> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";IGNORE % W
> -<U0077> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";IGNORE % w
> -
>  % &Y<<ü<<<Ü<<ű<<<Ű
>  <U00DC> <S0079>;"<BASE><TREMA>";"<CAP><MIN>";IGNORE % Ü
>  <U00FC> <S0079>;"<BASE><TREMA>";"<MIN><MIN>";IGNORE % ü
> diff --git a/localedata/sv_SE.ISO-8859-1.in b/localedata/sv_SE.ISO-8859-1.in
> index 967c761370..94552ea80a 100644
> --- a/localedata/sv_SE.ISO-8859-1.in
> +++ b/localedata/sv_SE.ISO-8859-1.in
> @@ -42,10 +42,10 @@ u
>  U
>  v
>  V
> -w
> -W
>  va
>  Va
> +w
> +W
>  x
>  X
>  y
> diff --git a/localedata/sv_SE.UTF-8.in b/localedata/sv_SE.UTF-8.in
> index 6db46e6271..80a093e709 100644
> --- a/localedata/sv_SE.UTF-8.in
> +++ b/localedata/sv_SE.UTF-8.in
> @@ -65,10 +65,10 @@ U
>  Ů
>  v
>  V
> -w
> -W
>  va
>  Va
> +w
> +W
>  x
>  X
>  y
>
  
Carlos O'Donell April 6, 2021, 4:55 p.m. UTC | #2
On 4/6/21 10:23 AM, Carlos O'Donell wrote:
> On 3/18/21 9:43 PM, Carlos O'Donell wrote:
>> From: Sebastian Rasmussen <sebras@gmail.com>
>>
>> The 13th edition of Svenska Akademiens ordlista lists 'W' as a
>> distinct letter that sorts after 'V'. We adjust the sv_SE locale
>> (and tests) to match this updated and "reformed" language change.
>> This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
>> the letter 'W'.
> 
> I will be committing this patch shortly to resolve this issue.
> 
> I haven't seen any objections and the general consensus is to
> harmonize with CLDR which has already made these changes.
> 
> General feedback from native speakers is that this is the correct
> way forward for the sv_SE locale.

Pushed, and so will be fixed in 2.34.
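
For anyone who wants to check an installed system, the following is a rough Python sketch that sorts the affected test entries with whatever `sv_SE.UTF-8` locale the C library provides. The function name `sort_with_locale` is hypothetical, and the locale name is an assumption about what is installed; the function returns `None` if the locale is not available. On a glibc that includes this commit one would expect the reformed order (v, V, va, Va, w, W), and on older releases the previous order, but the result depends entirely on the installed locale data.

```python
import locale


def sort_with_locale(words, name="sv_SE.UTF-8"):
    """Sort words using the C library's collation for the named locale.

    Returns None if the locale is not installed. The default name is an
    assumption about the system; adjust it as needed.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        # strxfrm produces keys that compare the way strcoll would.
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, saved)


result = sort_with_locale(["W", "Va", "v", "w", "va", "V"])
print(result)
```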
  

Patch

diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE
index b0901726db..f54c73226d 100644
--- a/localedata/locales/sv_SE
+++ b/localedata/locales/sv_SE
@@ -61,22 +61,25 @@  LC_COLLATE
 copy "iso14651_t1"
 
 % CLDR collation rules for Swedish:
-% (see: https://unicode.org/cldr/trac/browser/trunk/common/collation/sv.xml)
+% (https://github.com/unicode-org/cldr/blob/master/common/collation/sv.xml)
 %
-% <collation type="standard">
+% We use the new "reformed" rules from the 13th edition of Svenska Akademiens
+% ordlista where 'W' is considered a distinct character sorting after 'V'.
+% This matches CLDR 1.5.0 released in 2007.
+%
+% <defaultCollation>reformed</defaultCollation>
+% <collation type="reformed">
 %   <cr><![CDATA[
 %     &D<<đ<<<Đ<<ð<<<Ð
 %     &t<<<þ/h
 %     &T<<<Þ/H
-%     &v<<<V<<w<<<W
 %     &Y<<ü<<<Ü<<ű<<<Ű
 %     &[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<<ô<<<Ô
 %   ]]></cr>
 % </collation>
 %
-% And CLDR also lists the following
-% index characters:
-% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/sv.xml)
+% And CLDR also lists the following index characters:
+% (https://github.com/unicode-org/cldr/blob/master/common/main/sv.xml)
 %
 % <exemplarCharacters type="index">[A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Å Ä Ö]</exemplarCharacters>
 %
@@ -103,17 +106,6 @@  reorder-after <AFTER-Z>
 <U00DE> "<S0074><S0068>";"<BASE><BASE>";"<COMPATCAP><COMPATCAP>";IGNORE % Þ
 <U00FE> "<S0074><S0068>";"<BASE><BASE>";"<COMPAT><COMPAT>";IGNORE % þ
 
-% The letter w is normally not present in the Swedish alphabet. It
-% exists in some names in Swedish and foreign words, but is accounted
-% for as a variant of 'v'.  Words and names with 'w' are in Swedish
-% ordered alphabetically among the words and names with 'v'. If two
-% words or names are only to be distinguished by 'v' or % 'w', 'v' is
-% placed before 'w'.
-
-% &v<<<V<<w<<<W
-<U0057> <S0076>;"<BASE><VRNT1>";"<CAP><MIN>";IGNORE % W
-<U0077> <S0076>;"<BASE><VRNT1>";"<MIN><MIN>";IGNORE % w
-
 % &Y<<ü<<<Ü<<ű<<<Ű
 <U00DC> <S0079>;"<BASE><TREMA>";"<CAP><MIN>";IGNORE % Ü
 <U00FC> <S0079>;"<BASE><TREMA>";"<MIN><MIN>";IGNORE % ü
diff --git a/localedata/sv_SE.ISO-8859-1.in b/localedata/sv_SE.ISO-8859-1.in
index 967c761370..94552ea80a 100644
--- a/localedata/sv_SE.ISO-8859-1.in
+++ b/localedata/sv_SE.ISO-8859-1.in
@@ -42,10 +42,10 @@  u
 U
 v
 V
-w
-W
 va
 Va
+w
+W
 x
 X
 y
diff --git a/localedata/sv_SE.UTF-8.in b/localedata/sv_SE.UTF-8.in
index 6db46e6271..80a093e709 100644
--- a/localedata/sv_SE.UTF-8.in
+++ b/localedata/sv_SE.UTF-8.in
@@ -65,10 +65,10 @@  U
 Ů
 v
 V
-w
-W
 va
 Va
+w
+W
 x
 X
 y