localedata: add new locales scn_IT and scn_US
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
Commit Message
Hello,
please consider merging the following patch, which adds two new locales, scn_IT
and scn_US.
This is part of the ongoing effort to have the Sicilian language (ISO 639 code
scn) officially recognized as a minority language in Italy. The _US locale is
included because the US is currently home to the majority of second- and
third-generation descendants of Sicilians. There are also large communities in
South America and Australia, but those can wait for another day.
Thank you for considering,
David Paleino
President of Cademia Siciliana
---
localedata/SUPPORTED | 2 +
localedata/locales/scn_IT | 138 ++++++++++++++++++++++++++++++++++++++
localedata/locales/scn_US | 106 +++++++++++++++++++++++++++++
3 files changed, 246 insertions(+)
create mode 100644 localedata/locales/scn_IT
create mode 100644 localedata/locales/scn_US
Comments
* David Paleino:
> +translit_start
> +<U1E0C><U1E0C> "<U0044><U0044><U0048>"
> +<U1E0D><U1E0D> "<U0064><U0064><U0068>"
> +<U1E0C><U1E0D> "<U0044><U0064><U0068>"
> +translit_end
Please use UTF-8 for new locale definitions.
Is adding scn_US really necessary? A similar argument could be made
about most languages.
Thanks,
Florian
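For reference, the requested conversion is mechanical: each <UXXXX> token names one Unicode code point and can be replaced by the character itself. A minimal Python sketch follows. The function name symbolic_to_utf8 is made up for this example, and the sketch assumes that no token stands for a character that would need escaping inside a locale string (such as a double quote or the escape character).

```python
import re

# Matches glibc-style symbolic code points such as <U1E0C> or <U0001F600>.
_UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def symbolic_to_utf8(line: str) -> str:
    """Replace every <UXXXX> token in a line with the character it names."""
    return _UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), line)


if __name__ == "__main__":
    # One transliteration entry and one day name from the patch.
    print(symbolic_to_utf8('<U1E0C><U1E0C> "<U0044><U0044><U0048>"'))
    print(symbolic_to_utf8('"dum<U00EC>nica"'))
```

Lines without any <UXXXX> token pass through unchanged, so the function can be applied to a whole locale file line by line.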
Hello,
On Mon, 29 Apr 2024 15:17:56 +0200, Florian Weimer wrote:
> * David Paleino:
>
> > +translit_start
> > +<U1E0C><U1E0C> "<U0044><U0044><U0048>"
> > +<U1E0D><U1E0D> "<U0064><U0064><U0068>"
> > +<U1E0C><U1E0D> "<U0044><U0064><U0068>"
> > +translit_end
>
> Please use UTF-8 for new locale definitions.
Please find attached the revised patch.
> Is adding scn_US really necessary? A similar argument could be made
> about most languages.
Currently, the United States is probably the country hosting the largest
community of Sicilian expats and their descendants, who might find it useful to
have a separate locale. We, as an association, have had actual demand for the
locale to be implemented. Second place probably goes to Latin America, but I'm
only proposing scn_US here because we already have a keyboard layout for that
particular combination.
I definitely understand that going down the rabbit hole of adding
<minority_language>_* can quickly become a nightmare, so if you prefer that
scn_US be dropped from the patch, please find attached a second patch that only
adds scn_IT.
Thank you,
David
David, I completed the UTF-8 conversion. Would you please double-check
that it's correct and resubmit as appropriate?
Thanks,
Florian
comment_char %
escape_char /
% This file is part of the GNU C Library and contains locale data.
% The Free Software Foundation does not claim any copyright interest
% in the locale data contained in this file. The foregoing does not
% affect the license of the GNU C Library as a whole. It does not
% exempt you from the conditions of the license if your use would
% otherwise be governed by that license.
% Sicilian Language Locale for Italy
% Source: Cademia Siciliana
% Address: Via Convento S.F. di Paola, 73
% 91100 Trapani, Italy
% Contact: David Paleino
% Email: david@cademiasiciliana.org
% Tel:
% Fax:
% Language: scn
% Territory: IT
% Revision: 1.0
% Date: 2024-04-27
% Users: general
LC_IDENTIFICATION
title "Sicilian locale for Italy"
source "Cademia Siciliana"
address "Via Convento S.F. di Paola, 73, 91100 Trapani, Italy"
contact ""
email "tech@cademiasiciliana.org"
tel ""
fax ""
language "Sicilian"
territory "Italy"
revision "1.0"
date "2024-04-27"
category "i18n:2012";LC_IDENTIFICATION
category "i18n:2012";LC_CTYPE
category "i18n:2012";LC_COLLATE
category "i18n:2012";LC_TIME
category "i18n:2012";LC_NUMERIC
category "i18n:2012";LC_MONETARY
category "i18n:2012";LC_MESSAGES
category "i18n:2012";LC_PAPER
category "i18n:2012";LC_NAME
category "i18n:2012";LC_ADDRESS
category "i18n:2012";LC_TELEPHONE
category "i18n:2012";LC_MEASUREMENT
END LC_IDENTIFICATION
LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE
LC_CTYPE
copy "it_IT"
translit_start
ḌḌ "DDH"
ḍḍ "ddh"
Ḍḍ "Ddh"
translit_end
END LC_CTYPE
LC_MESSAGES
yesexpr "^[+1sSyY]"
noexpr "^[-0nN]"
yesstr "se"
nostr "no"
END LC_MESSAGES
LC_MONETARY
copy "it_IT"
END LC_MONETARY
LC_NUMERIC
copy "it_IT"
END LC_NUMERIC
LC_TIME
copy "it_IT"
abday "dum";"lun";/
"mar";"mer";/
"jov";"ven";/
"sab"
day "dumìnica";/
"lunnidìa";/
"martidìa";/
"mercuridìa";/
"jovidìa";/
"vènniri";/
"sàbbatu"
abmon "jin";"fri";/
"mar";"apr";/
"maj";"giu";/
"gnt";"agu";/
"sit";"utt";/
"nuv";"dic"
mon "jinnaru";/
"frivaru";/
"marzu";/
"aprili";/
"maju";/
"giugnu";/
"giugnettu";/
"agustu";/
"sittèmmiru";/
"uttùviru";/
"novèmmiru";/
"dicèmmiru"
END LC_TIME
LC_PAPER
copy "it_IT"
END LC_PAPER
LC_TELEPHONE
copy "it_IT"
END LC_TELEPHONE
LC_MEASUREMENT
copy "it_IT"
END LC_MEASUREMENT
LC_NAME
copy "it_IT"
END LC_NAME
LC_ADDRESS
copy "it_IT"
lang_name "sicilianu"
lang_ab ""
lang_term "scn"
lang_lib "scn"
END LC_ADDRESS
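The yesexpr and noexpr values in LC_MESSAGES above are regular expressions matched against the start of a user's answer. As a rough illustration, the sketch below uses Python's re module in place of the POSIX extended regular expressions these keywords are defined in terms of; for bracket expressions this simple the two behave the same. The helper name classify is hypothetical.

```python
import re

# Values copied from the LC_MESSAGES section of the locale file.
YESEXPR = r"^[+1sSyY]"
NOEXPR = r"^[-0nN]"


def classify(answer: str) -> str:
    """Return "yes", "no" or "unknown" for a typed answer."""
    if re.search(YESEXPR, answer):
        return "yes"
    if re.search(NOEXPR, answer):
        return "no"
    return "unknown"


if __name__ == "__main__":
    for answer in ("se", "S", "yes", "no", "N", "forsi"):
        print(answer, "->", classify(answer))
```

Because only the first character is tested, both the Sicilian "se" and an English "yes" count as affirmative.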
On Mon, 13 May 2024 15:23:59 +0200, Florian Weimer wrote:
> David, I completed the UTF-8 conversion. Would you please double-check
> that it's correct and resubmit as appropriate?
Sorry, somehow I completely missed all the other conversions. Meh.
Final patch attached, thank you!
David
On Tue, 14 May 2024 22:58:15 +0200, David Paleino wrote:
> [..]
> Final patch attached, thank you!
Meh, I see that in the online archives it was attached as a binary blob(?!)
Resending it as plain text, sorry for the noise.
David
From f6ac8098264dcc4d1666b80bcb96eeda7b7084cd Mon Sep 17 00:00:00 2001
From: David Paleino <dapal@debian.org>
Date: Sat, 27 Apr 2024 23:22:01 +0200
Subject: [PATCH] localedata: add new locale scn_IT
Signed-off-by: David Paleino <dapal@debian.org>
---
localedata/SUPPORTED | 1 +
localedata/locales/scn_IT | 138 ++++++++++++++++++++++++++++++++++++++
2 files changed, 139 insertions(+)
create mode 100644 localedata/locales/scn_IT
diff --git a/localedata/SUPPORTED b/localedata/SUPPORTED
index 759895cc3a..96ff43f8fd 100644
--- a/localedata/SUPPORTED
+++ b/localedata/SUPPORTED
@@ -394,6 +394,7 @@ sa_IN/UTF-8 \
sah_RU/UTF-8 \
sat_IN/UTF-8 \
sc_IT/UTF-8 \
+scn_IT/UTF-8 \
sd_IN/UTF-8 \
sd_IN@devanagari/UTF-8 \
se_NO/UTF-8 \
diff --git a/localedata/locales/scn_IT b/localedata/locales/scn_IT
new file mode 100644
index 0000000000..abf9b1e49f
--- /dev/null
+++ b/localedata/locales/scn_IT
@@ -0,0 +1,138 @@
+comment_char %
+escape_char /
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Sicilian Language Locale for Italy
+% Source: Cademia Siciliana
+% Address: Via Convento S.F. di Paola, 73
+% 91100 Trapani, Italy
+% Contact: David Paleino
+% Email: david@cademiasiciliana.org
+% Tel:
+% Fax:
+% Language: scn
+% Territory: IT
+% Revision: 1.0
+% Date: 2024-04-27
+% Users: general
+
+LC_IDENTIFICATION
+title "Sicilian locale for Italy"
+source "Cademia Siciliana"
+address "Via Convento S.F. di Paola, 73, 91100 Trapani, Italy"
+contact ""
+email "tech@cademiasiciliana.org"
+tel ""
+fax ""
+language "Sicilian"
+territory "Italy"
+revision "1.0"
+date "2024-04-27"
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
+END LC_IDENTIFICATION
+
+LC_COLLATE
+copy "iso14651_t1"
+END LC_COLLATE
+
+LC_CTYPE
+copy "it_IT"
+
+translit_start
+ḌḌ "DDH"
+ḍḍ "ddh"
+Ḍḍ "Ddh"
+translit_end
+END LC_CTYPE
+
+LC_MESSAGES
+yesexpr "^[+1sSyY]"
+noexpr "^[-0nN]"
+yesstr "se"
+nostr "no"
+END LC_MESSAGES
+
+LC_MONETARY
+copy "it_IT"
+END LC_MONETARY
+
+LC_NUMERIC
+copy "it_IT"
+END LC_NUMERIC
+
+LC_TIME
+copy "it_IT"
+
+abday "dum";"lun";/
+ "mar";"mer";/
+ "jov";"ven";/
+ "sab"
+day "dumìnica";/
+ "lunnidìa";/
+ "martidìa";/
+ "mercuridìa";/
+ "jovidìa";/
+ "venniridìa";/
+ "sàbbatu"
+abmon "jin";"fri";/
+ "mar";"apr";/
+ "maj";"giu";/
+ "gnt";"agu";/
+ "sit";"utt";/
+ "nuv";"dic"
+mon "jinnaru";/
+ "frivaru";/
+ "marzu";/
+ "aprili";/
+ "maju";/
+ "giugnu";/
+ "giugnettu";/
+ "agustu";/
+ "sittèmmiru";/
+ "uttùviru";/
+ "nuvèmmiru";/
+ "dicèmmiru"
+END LC_TIME
+
+LC_PAPER
+copy "it_IT"
+END LC_PAPER
+
+LC_TELEPHONE
+copy "it_IT"
+END LC_TELEPHONE
+
+LC_MEASUREMENT
+copy "it_IT"
+END LC_MEASUREMENT
+
+LC_NAME
+copy "it_IT"
+END LC_NAME
+
+LC_ADDRESS
+copy "it_IT"
+
+lang_name "sicilianu"
+lang_ab ""
+lang_term "scn"
+lang_lib "scn"
+END LC_ADDRESS
* David Paleino:
> On Tue, 14 May 2024 22:58:15 +0200, David Paleino wrote:
>
>> [..]
>> Final patch attached, thank you!
>
> Meh, I see that in the online archives it was attached as a binary blob(?!)
It shows up in the alternative archives:
<https://inbox.sourceware.org/libc-alpha/20240514230847.20b64f52@betelgeuse.hanskalabs.net/T/#mfccd4453d0e6706770cfa26e04b8fa7ec2b2995a>
Thanks,
Florian
David Paleino <dapal@debian.org> wrote:
> +LC_TIME
> +copy "it_IT" <- problem here
> +
> +abday "dum";"lun";/
> + "mar";"mer";/
> + "jov";"ven";/
> + "sab"
> +day "dumìnica";/
> + "lunnidìa";/
> + "martidìa";/
> + "mercuridìa";/
> + "jovidìa";/
> + "venniridìa";/
> + "sàbbatu"
> +abmon "jin";"fri";/
> + "mar";"apr";/
> + "maj";"giu";/
> + "gnt";"agu";/
> + "sit";"utt";/
> + "nuv";"dic"
> +mon "jinnaru";/
> + "frivaru";/
> + "marzu";/
> + "aprili";/
> + "maju";/
> + "giugnu";/
> + "giugnettu";/
> + "agustu";/
> + "sittèmmiru";/
> + "uttùviru";/
> + "nuvèmmiru";/
> + "dicèmmiru"
> +END LC_TIME
> +LC_ADDRESS
> +copy "it_IT" <- problem here
> +
> +lang_name "sicilianu"
> +lang_ab ""
> +lang_term "scn"
> +lang_lib "scn"
> +END LC_ADDRESS
$ localedef -f UTF-8 -i scn_IT /tmp/sci_IT.UTF-8
scn_IT:83: no other keyword shall be specified when `copy' is used
scn_IT:133: no other keyword shall be specified when `copy' is used
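The two errors come from LC_TIME and LC_ADDRESS, where `copy` is followed by further keywords. LC_CTYPE, where `copy` is followed only by a translit_start block, is not reported. A rough pre-submission check along those lines could look like the Python sketch below. It is not how localedef parses files: the function name find_copy_conflicts is hypothetical, the exemption for translit blocks is inferred from the output above, and the line numbers it reports are those of the offending keywords, so they need not coincide with localedef's.

```python
import re

# A category starts with a line consisting only of its name, e.g. "LC_TIME".
CATEGORY = re.compile(r"^(LC_[A-Z]+)\s*$")


def find_copy_conflicts(text, comment_char="%", escape_char="/"):
    """Return (line_number, category, keyword) for keywords that follow `copy`."""
    conflicts = []
    category = None      # category currently being read, if any
    has_copy = False     # True once `copy` was seen in that category
    in_translit = False  # True between translit_start and translit_end
    continued = False    # True if the previous line ended with escape_char
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if continued:
            # Continuation of a multi-line value, not a new keyword.
            continued = line.endswith(escape_char)
            continue
        if not line or line.startswith(comment_char):
            continue
        continued = line.endswith(escape_char)
        if category is None:
            match = CATEGORY.match(line)
            if match:
                category, has_copy, in_translit = match.group(1), False, False
            continue
        keyword = line.split()[0]
        if keyword == "END":
            category = None
        elif keyword == "translit_start":
            in_translit = True
        elif keyword == "translit_end":
            in_translit = False
        elif in_translit:
            continue
        elif keyword == "copy":
            has_copy = True
        elif has_copy:
            conflicts.append((number, category, keyword))
    return conflicts
```

Called on the text of a locale file, it returns an empty list when every category either copies another locale or defines its own keywords, but not both.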
On Wed, 15 May 2024 17:17:19 +0200, Mike FABIAN wrote:
> [..]
>
> $ localedef -f UTF-8 -i scn_IT /tmp/sci_IT.UTF-8
> scn_IT:83: no other keyword shall be specified when `copy' is used
> scn_IT:133: no other keyword shall be specified when `copy' is used
Fixed, thank you.
David
From d5d51e4a162fe3e0057a03f0412e910ab15c0522 Mon Sep 17 00:00:00 2001
From: David Paleino <dapal@debian.org>
Date: Sat, 27 Apr 2024 23:22:01 +0200
Subject: [PATCH] localedata: add new locale scn_IT
Signed-off-by: David Paleino <dapal@debian.org>
---
localedata/SUPPORTED | 1 +
localedata/locales/scn_IT | 150 ++++++++++++++++++++++++++++++++++++++
2 files changed, 151 insertions(+)
create mode 100644 localedata/locales/scn_IT
diff --git a/localedata/SUPPORTED b/localedata/SUPPORTED
index 759895cc3a..96ff43f8fd 100644
--- a/localedata/SUPPORTED
+++ b/localedata/SUPPORTED
@@ -394,6 +394,7 @@ sa_IN/UTF-8 \
sah_RU/UTF-8 \
sat_IN/UTF-8 \
sc_IT/UTF-8 \
+scn_IT/UTF-8 \
sd_IN/UTF-8 \
sd_IN@devanagari/UTF-8 \
se_NO/UTF-8 \
diff --git a/localedata/locales/scn_IT b/localedata/locales/scn_IT
new file mode 100644
index 0000000000..6161c529fb
--- /dev/null
+++ b/localedata/locales/scn_IT
@@ -0,0 +1,150 @@
+comment_char %
+escape_char /
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Sicilian Language Locale for Italy
+% Source: Cademia Siciliana
+% Address: Via Convento S.F. di Paola, 73
+% 91100 Trapani, Italy
+% Contact: David Paleino
+% Email: david@cademiasiciliana.org
+% Tel:
+% Fax:
+% Language: scn
+% Territory: IT
+% Revision: 1.0
+% Date: 2024-04-27
+% Users: general
+
+LC_IDENTIFICATION
+title "Sicilian locale for Italy"
+source "Cademia Siciliana"
+address "Via Convento S.F. di Paola, 73, 91100 Trapani, Italy"
+contact ""
+email "tech@cademiasiciliana.org"
+tel ""
+fax ""
+language "Sicilian"
+territory "Italy"
+revision "1.0"
+date "2024-04-27"
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
+END LC_IDENTIFICATION
+
+LC_COLLATE
+copy "iso14651_t1"
+END LC_COLLATE
+
+LC_CTYPE
+copy "it_IT"
+
+translit_start
+ḌḌ "DDH"
+ḍḍ "ddh"
+Ḍḍ "Ddh"
+translit_end
+END LC_CTYPE
+
+LC_MESSAGES
+yesexpr "^[+1sSyY]"
+noexpr "^[-0nN]"
+yesstr "se"
+nostr "no"
+END LC_MESSAGES
+
+LC_MONETARY
+copy "it_IT"
+END LC_MONETARY
+
+LC_NUMERIC
+copy "it_IT"
+END LC_NUMERIC
+
+LC_TIME
+abday "dum";"lun";/
+ "mar";"mer";/
+ "jov";"ven";/
+ "sab"
+day "dumìnica";/
+ "lunnidìa";/
+ "martidìa";/
+ "mercuridìa";/
+ "jovidìa";/
+ "venniridìa";/
+ "sàbbatu"
+abmon "jin";"fri";/
+ "mar";"apr";/
+ "maj";"giu";/
+ "gnt";"agu";/
+ "sit";"utt";/
+ "nuv";"dic"
+mon "jinnaru";/
+ "frivaru";/
+ "marzu";/
+ "aprili";/
+ "maju";/
+ "giugnu";/
+ "giugnettu";/
+ "agustu";/
+ "sittèmmiru";/
+ "uttùviru";/
+ "nuvèmmiru";/
+ "dicèmmiru"
+d_t_fmt "%a %-d %b %Y, %T"
+d_fmt "%d//%m//%Y"
+t_fmt "%T"
+am_pm "";""
+t_fmt_ampm ""
+date_fmt "%a %-d %b %Y, %T, %Z"
+week 7;19971130;4
+first_weekday 2
+END LC_TIME
+
+LC_PAPER
+copy "it_IT"
+END LC_PAPER
+
+LC_TELEPHONE
+copy "it_IT"
+END LC_TELEPHONE
+
+LC_MEASUREMENT
+copy "it_IT"
+END LC_MEASUREMENT
+
+LC_NAME
+copy "it_IT"
+END LC_NAME
+
+LC_ADDRESS
+postal_fmt "%f%N%a%N%d%N%b%N%s %h %e %r%N%z %T%N%c%N"
+country_name "Italia"
+country_ab2 "IT"
+country_ab3 "ITA"
+country_num 380
+country_isbn "978-88,979-12"
+country_car "I"
+
+lang_name "sicilianu"
+lang_ab ""
+lang_term "scn"
+lang_lib "scn"
+END LC_ADDRESS
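To see what the new LC_TIME keywords produce, the following Python sketch expands d_t_fmt and d_fmt by hand using the abbreviated names from the patch. It only handles the conversions that appear in those two format strings, and the helper names unescape and render are made up for the example. Note that with escape_char set to "/", the "//" in d_fmt stands for a single literal slash.

```python
from datetime import datetime

# Abbreviated names from the patch; the day list starts on Sunday.
ABDAY = ["dum", "lun", "mar", "mer", "jov", "ven", "sab"]
ABMON = ["jin", "fri", "mar", "apr", "maj", "giu",
         "gnt", "agu", "sit", "utt", "nuv", "dic"]


def unescape(value, escape_char="/"):
    """Collapse doubled escape characters: "%d//%m//%Y" becomes "%d/%m/%Y"."""
    return value.replace(escape_char * 2, escape_char)


def render(fmt, when):
    """Expand the few conversions used by d_t_fmt and d_fmt in the patch."""
    fields = {
        "%a": ABDAY[(when.weekday() + 1) % 7],  # weekday(): Monday == 0
        "%-d": str(when.day),                   # day without padding
        "%d": f"{when.day:02d}",
        "%b": ABMON[when.month - 1],
        "%m": f"{when.month:02d}",
        "%Y": str(when.year),
        "%T": when.strftime("%H:%M:%S"),
    }
    for conversion, text in fields.items():
        fmt = fmt.replace(conversion, text)
    return fmt


if __name__ == "__main__":
    when = datetime(2024, 5, 16, 9, 30, 0)        # a Thursday
    print(render("%a %-d %b %Y, %T", when))       # jov 16 maj 2024, 09:30:00
    print(render(unescape("%d//%m//%Y"), when))   # 16/05/2024
```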
David Paleino <dapal@debian.org> wrote:
> +LC_CTYPE
> +copy "it_IT"
> +
> +translit_start
> +ḌḌ "DDH"
> +ḍḍ "ddh"
> +Ḍḍ "Ddh"
> +translit_end
> +END LC_CTYPE
I am sorry for not testing that earlier, but that translit part does not
seem to work:
bash-5.2# export LC_ALL=scn_IT.UTF-8
bash-5.2# echo 'ḌḌ' | iconv -f UTF-8 -t ASCII//translit
??
bash-5.2# echo 'ß' | iconv -f UTF-8 -t ASCII//translit
ss
bash-5.2#
Mike FABIAN <mfabian@redhat.com> wrote:
> David Paleino <dapal@debian.org> wrote:
>
>> +LC_CTYPE
>> +copy "it_IT"
>> +
>> +translit_start
>> +ḌḌ "DDH"
>> +ḍḍ "ddh"
>> +Ḍḍ "Ddh"
>> +translit_end
>> +END LC_CTYPE
>
> I am sorry for not testing that earlier, but that translit part does not
> seem to work:
>
> bash-5.2# export LC_ALL=scn_IT.UTF-8
> bash-5.2# echo 'ḌḌ' | iconv -f UTF-8 -t ASCII//translit
> ??
> bash-5.2# echo 'ß' | iconv -f UTF-8 -t ASCII//translit
> ss
> bash-5.2#
With single input characters the transliteration works, i.e. something
like this works:
LC_CTYPE
copy "it_IT"
translit_start
Ḍ "D"
ḍ "d"
translit_end
END LC_CTYPE
bash-5.2# export LC_ALL=scn_IT.UTF-8
bash-5.2# echo 'Ḍ' | iconv -f UTF-8 -t ASCII//translit
D
bash-5.2#
I think glibc can only transliterate single input characters into an
output string, it most likely cannot transliterate a multi-character
input string into something at the moment.
On May 16 2024, Mike FABIAN wrote:
> I think glibc can only transliterate single input characters into an
> output string, it most likely cannot transliterate a multi-character
> input string into something at the moment.
AFAICT, it should work with multi-character transliterations.
* Andreas Schwab:
> On May 16 2024, Mike FABIAN wrote:
>
>> I think glibc can only transliterate single input characters into an
>> output string, it most likely cannot transliterate a multi-character
>> input string into something at the moment.
>
> AFAICT, it should work with multi-character transliterations.
How does this work reliably if there is an inconvenient iconv buffer
boundary?
(Not saying this is the case here.)
Thanks,
Florian
On May 16 2024, Mike FABIAN wrote:
> David Paleino <dapal@debian.org> wrote:
>
>> +LC_CTYPE
>> +copy "it_IT"
>> +
>> +translit_start
>> +ḌḌ "DDH"
>> +ḍḍ "ddh"
>> +Ḍḍ "Ddh"
>> +translit_end
>> +END LC_CTYPE
>
> I am sorry for not testing that earlier, but that translit part does not
> seem to work:
>
> bash-5.2# export LC_ALL=scn_IT.UTF-8
> bash-5.2# echo 'ḌḌ' | iconv -f UTF-8 -t ASCII//translit
> ??
There is already a transliteration for Ḍ which takes precedence.
On May 16 2024, Florian Weimer wrote:
> * Andreas Schwab:
>
>> On May 16 2024, Mike FABIAN wrote:
>>
>>> I think glibc can only transliterate single input characters into an
>>> output string, it most likely cannot transliterate a multi-character
>>> input string into something at the moment.
>>
>> AFAICT, it should work with multi-character transliterations.
>
> How does this work reliably if there is an inconvenient iconv buffer
> boundary?
__gconv_transliterate returns __GCONV_INCOMPLETE_INPUT.
On May 16 2024, Andreas Schwab wrote:
> On May 16 2024, Mike FABIAN wrote:
>
>> David Paleino <dapal@debian.org> wrote:
>>
>>> +LC_CTYPE
>>> +copy "it_IT"
>>> +
>>> +translit_start
>>> +ḌḌ "DDH"
>>> +ḍḍ "ddh"
>>> +Ḍḍ "Ddh"
>>> +translit_end
>>> +END LC_CTYPE
>>
>> I am sorry for not testing that earlier, but that translit part does not
>> seem to work:
>>
>> bash-5.2# export LC_ALL=scn_IT.UTF-8
>> bash-5.2# echo 'ḌḌ' | iconv -f UTF-8 -t ASCII//translit
>> ??
>
> There is already a transliteration for Ḍ which takes precedence.
Actually, the entries above replace the ones from translit_combining,
but they are interpreted as "Ḍ" -> "DDH" and "ḍ" -> "ḍddh" (the third
entry is ignored). The proper syntax would be
translit_start
"ḌḌ" "DDH"
"ḍḍ" "ddh"
"Ḍḍ" "Ddh"
translit_end
but depending on how the binary search goes on, either these or the
shorter matches will win.
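The effect of match order can be shown with a toy table that holds both the single-character entries and the quoted multi-character ones. This Python sketch is not glibc's lookup (which, per the above, is a binary search over the compiled table); it simply tries candidate keys in a fixed order to show how the result changes depending on whether the longer or the shorter entry is found first. The function name and the longest_first flag are invented for the example.

```python
# Single-character entries plus the multi-character entries proposed
# in the patch.
TABLE = {
    "Ḍ": "D",
    "ḍ": "d",
    "ḌḌ": "DDH",
    "ḍḍ": "ddh",
    "Ḍḍ": "Ddh",
}


def transliterate(text, table, longest_first=True):
    """Replace table keys found in text, trying keys in length order."""
    keys = sorted(table, key=len, reverse=longest_first)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No entry matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("beḍḍu", TABLE))                       # beddhu
    print(transliterate("beḍḍu", TABLE, longest_first=False))  # beddu
```

When the shorter entries are found first, the multi-character entries are never reached, which matches the "shorter matches win" outcome described above.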
Andreas Schwab <schwab@suse.de> wrote:
> On May 16 2024, Andreas Schwab wrote:
>
>> On May 16 2024, Mike FABIAN wrote:
>>
>>> David Paleino <dapal@debian.org> wrote:
>>>
>>>> +LC_CTYPE
>>>> +copy "it_IT"
>>>> +
>>>> +translit_start
>>>> +ḌḌ "DDH"
>>>> +ḍḍ "ddh"
>>>> +Ḍḍ "Ddh"
>>>> +translit_end
>>>> +END LC_CTYPE
>>>
>>> I am sorry for not testing that earlier, but that translit part does not
>>> seem to work:
>>>
>>> bash-5.2# export LC_ALL=scn_IT.UTF-8
>>> bash-5.2# echo 'ḌḌ' | iconv -f UTF-8 -t ASCII//translit
>>> ??
>>
>> There is already a transliteration for Ḍ which takes precedence.
>
> Actually, the entries above replace the ones from translit_combining,
> but they are interpreted as "Ḍ" -> "DDH" and "ḍ" -> "ḍddh" (the third
> entry is ignored). The proper syntax would be
>
> translit_start
> "ḌḌ" "DDH"
> "ḍḍ" "ddh"
> "Ḍḍ" "Ddh"
> translit_end
Thank you! I have tried with that syntax now but could not make it work.
> but depending on how the binary search goes on, either these or the
> shorter matches will win.
Does it depend on the exact input how the binary search goes on?
I tried several inputs, and for me the shorter matches always won.
Then I commented out the shorter matches in translit_combining like this:
diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining
index ce2f19eee1..6f879d9caf 100644
--- a/localedata/locales/translit_combining
+++ b/localedata/locales/translit_combining
@@ -2486,9 +2486,9 @@ translit_start
% LATIN SMALL LETTER D WITH DOT ABOVE
<U1E0B> <U0064>
% LATIN CAPITAL LETTER D WITH DOT BELOW
-<U1E0C> <U0044>
+%<U1E0C> <U0044>
% LATIN SMALL LETTER D WITH DOT BELOW
-<U1E0D> <U0064>
+%<U1E0D> <U0064>
% LATIN CAPITAL LETTER D WITH LINE BELOW
<U1E0E> <U0044>
% LATIN SMALL LETTER D WITH LINE BELOW
and after doing that,
bash-5.2# echo 'ḌḌ'|iconv -f UTF-8 -t ASCII//translit
^C
bash-5.2#
uses 100% CPU and never stops until I stop it with Control-C.
On May 16 2024, Mike FABIAN wrote:
> Does it depend on the exact input how the binary search goes on?
It depends on the translit data, how the midway point moves through it.
> bash-5.2# echo 'ḌḌ'|iconv -f UTF-8 -t ASCII//translit
> ^C
> bash-5.2#
>
> uses 100% CPU and never stops until I stop it with Control-C.
else if (cnt > 0)
/* This means that the input buffer contents matches a prefix of
an entry. Since we cannot match it unless we get more input,
we will tell the caller about it. */
return __GCONV_INCOMPLETE_INPUT;
This should only return when the end of the input string is reached,
otherwise it's a non-match and it should go on to try other translit
patterns.
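The distinction drawn here, between a prefix match that runs out of input and a mismatch in the middle of the buffer, can be sketched in a few lines of Python. This is only an illustration of the three outcomes, not a rendering of the glibc code; the names Result and match_entry are hypothetical.

```python
from enum import Enum


class Result(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    INCOMPLETE = "incomplete input"


def match_entry(pattern, buffer, pos):
    """Compare one table entry against the buffer contents starting at pos."""
    remaining = buffer[pos:]
    if remaining.startswith(pattern):
        return Result.MATCH
    if remaining and pattern.startswith(remaining):
        # Everything left in the buffer matches, but the buffer ended
        # before the pattern did: more input could still complete it.
        return Result.INCOMPLETE
    # Either nothing is left or a character differs before the end of the
    # buffer: the caller should move on to other entries.
    return Result.NO_MATCH


if __name__ == "__main__":
    print(match_entry("ḌḌ", "ḌḌx", 0))  # Result.MATCH
    print(match_entry("ḌḌ", "Ḍ", 0))    # Result.INCOMPLETE
    print(match_entry("ḌḌ", "Ḍa", 0))   # Result.NO_MATCH
```

Only the second case justifies asking the caller for more input; in the third case the buffer already contains a character that rules the entry out.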
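diff --git a/localedata/SUPPORTED b/localedata/SUPPORTED
--- a/localedata/SUPPORTED
+++ b/localedata/SUPPORTED
@@ -393,6 +393,8 @@ sa_IN/UTF-8 \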
sah_RU/UTF-8 \
sat_IN/UTF-8 \
sc_IT/UTF-8 \
+scn_IT/UTF-8 \
+scn_US/UTF-8 \
sd_IN/UTF-8 \
sd_IN@devanagari/UTF-8 \
se_NO/UTF-8 \
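diff --git a/localedata/locales/scn_IT b/localedata/locales/scn_IT
new file mode 100644
--- /dev/null
+++ b/localedata/locales/scn_IT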
@@ -0,0 +1,138 @@
+comment_char %
+escape_char /
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Sicilian Language Locale for Italy
+% Source: Cademia Siciliana
+% Address: Via Convento S.F. di Paola, 73
+% 91100 Trapani, Italy
+% Contact: David Paleino
+% Email: david@cademiasiciliana.org
+% Tel:
+% Fax:
+% Language: scn
+% Territory: IT
+% Revision: 1.0
+% Date: 2024-04-27
+% Users: general
+
+LC_IDENTIFICATION
+title "Sicilian locale for Italy"
+source "Cademia Siciliana"
+address "Via Convento S.F. di Paola, 73, 91100 Trapani, Italy"
+contact ""
+email "tech@cademiasiciliana.org"
+tel ""
+fax ""
+language "Sicilian"
+territory "Italy"
+revision "1.0"
+date "2024-04-27"
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
+END LC_IDENTIFICATION
+
+LC_COLLATE
+copy "iso14651_t1"
+END LC_COLLATE
+
+LC_CTYPE
+copy "it_IT"
+
+translit_start
+<U1E0C><U1E0C> "<U0044><U0044><U0048>"
+<U1E0D><U1E0D> "<U0064><U0064><U0068>"
+<U1E0C><U1E0D> "<U0044><U0064><U0068>"
+translit_end
+END LC_CTYPE
+
+LC_MESSAGES
+yesexpr "^[+1sSyY]"
+noexpr "^[-0nN]"
+yesstr "se"
+nostr "no"
+END LC_MESSAGES
+
+LC_MONETARY
+copy "it_IT"
+END LC_MONETARY
+
+LC_NUMERIC
+copy "it_IT"
+END LC_NUMERIC
+
+LC_TIME
+copy "it_IT"
+
+abday "dum";"lun";/
+ "mar";"mer";/
+ "jov";"ven";/
+ "sab"
+day "dum<U00EC>nica";/
+ "lunnid<U00EC>a";/
+ "martid<U00EC>a";/
+ "mercurid<U00EC>a";/
+ "jovid<U00EC>a";/
+ "v<U00E8>nniri";/
+ "s<U00E0>bbatu"
+abmon "jin";"fri";/
+ "mar";"apr";/
+ "maj";"giu";/
+ "gnt";"agu";/
+ "sit";"utt";/
+ "nuv";"dic"
+mon "jinnaru";/
+ "frivaru";/
+ "marzu";/
+ "aprili";/
+ "maju";/
+ "giugnu";/
+ "giugnettu";/
+ "agustu";/
+ "sitt<U00E8>mmiru";/
+ "utt<U00F9>viru";/
+ "nov<U00E8>mmiru";/
+ "dic<U00E8>mmiru"
+END LC_TIME
+
+LC_PAPER
+copy "it_IT"
+END LC_PAPER
+
+LC_TELEPHONE
+copy "it_IT"
+END LC_TELEPHONE
+
+LC_MEASUREMENT
+copy "it_IT"
+END LC_MEASUREMENT
+
+LC_NAME
+copy "it_IT"
+END LC_NAME
+
+LC_ADDRESS
+copy "it_IT"
+
+lang_name "sicilianu"
+lang_ab ""
+lang_term "scn"
+lang_lib "scn"
+END LC_ADDRESS
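diff --git a/localedata/locales/scn_US b/localedata/locales/scn_US
new file mode 100644
--- /dev/null
+++ b/localedata/locales/scn_US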
@@ -0,0 +1,106 @@
+comment_char %
+escape_char /
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file. The foregoing does not
+% affect the license of the GNU C Library as a whole. It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Sicilian Language Locale for the USA
+% Source: Cademia Siciliana
+% Address: Via Convento S.F. di Paola, 73
+% 91100 Trapani, Italy
+% Contact: David Paleino
+% Email: david@cademiasiciliana.org
+% Tel:
+% Fax:
+% Language: scn
+% Territory: USA
+% Revision: 1.0
+% Date: 2024-04-27
+% Users: general
+
+LC_IDENTIFICATION
+title "Sicilian locale for the United States"
+source "Cademia Siciliana"
+address "Via Convento S.F. di Paola, 73, 91100 Trapani, Italy"
+contact ""
+email "tech@cademiasiciliana.org"
+tel ""
+fax ""
+language "Sicilian"
+territory "United States"
+revision "1.0"
+date "2024-04-27"
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
+END LC_IDENTIFICATION
+
+LC_COLLATE
+copy "iso14651_t1"
+END LC_COLLATE
+
+LC_CTYPE
+copy "scn_IT"
+END LC_CTYPE
+
+LC_MESSAGES
+copy "scn_IT"
+END LC_MESSAGES
+
+LC_MONETARY
+copy "en_US"
+END LC_MONETARY
+
+LC_NUMERIC
+copy "en_US"
+END LC_NUMERIC
+
+LC_TIME
+copy "scn_IT"
+
+week 7;19971130;1
+d_t_fmt "%a %d %b %Y %r %Z"
+d_fmt "%m//%d//%Y"
+t_fmt "%r"
+t_fmt_ampm "%I:%M:%S %p"
+date_fmt "%a %b %e %r %Z %Y"
+am_pm "AM";"PM"
+END LC_TIME
+
+LC_PAPER
+copy "en_US"
+END LC_PAPER
+
+LC_TELEPHONE
+copy "en_US"
+END LC_TELEPHONE
+
+LC_MEASUREMENT
+copy "en_US"
+END LC_MEASUREMENT
+
+LC_NAME
+copy "en_US"
+END LC_NAME
+
+LC_ADDRESS
+copy "en_US"
+lang_name "sicilianu"
+lang_ab ""
+lang_term "scn"
+lang_lib "scn"
+END LC_ADDRESS