From patchwork Mon Jun 15 16:04:46 2015
X-Patchwork-Submitter: Mike FABIAN
X-Patchwork-Id: 7184
From: Mike FABIAN
To: libc-alpha@sourceware.org
Cc: Pravin Satpute
Subject: [PATCHv2] Update the localedata/locales/translit_* files to Unicode 7.0.0
Date: Mon, 15 Jun 2015 18:04:46 +0200

This is an update to my earlier patches:

https://sourceware.org/ml/libc-alpha/2015-04/msg00361.html

Updates:

- Transliteration rules added for the da, nb, nn, and sv locales, to
  transliterate for example "ö" to "oe" in these locales. These rules
  have to be locale-specific because the "neutral" transliteration
  should be "ö" to "o". (For example in English, "coöperation" as used in
  http://www.newyorker.com/humor/borowitz-report/obama-putin-agree-never-to-speak-to-each-other-again
  should be transliterated to "cooperation", not "cooeperation".)
  This should fix [BZ #89].

- Lots of stuff added to translit_neutral.

- Some more tweaks to the script generating the translit files
  which are generated from Unicode.

I tested the patches on Fedora 22.

Can somebody review this please?
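To make the difference between the "neutral" and the locale-specific rules concrete, here is a minimal Python sketch. It is not how glibc implements transliteration; the tables and the function name `transliterate` are hypothetical and only model the lookup order (locale rules first, neutral rules as fallback).

```python
# Hypothetical tables for illustration only. The real rules live in the
# glibc locale sources, not in Python dictionaries.
NEUTRAL = {"ö": "o", "Ö": "O", "ä": "a", "Ä": "A", "å": "a", "Å": "A"}

# Assumed shared rule set for the Nordic locales named in the patch.
NORDIC = {
    "ä": "ae", "Ä": "AE",
    "å": "aa", "Å": "AA",
    "æ": "ae", "Æ": "AE",
    "ö": "oe", "Ö": "OE",
    "ø": "oe", "Ø": "OE",
}
LOCALE_RULES = {"da": NORDIC, "nb": NORDIC, "nn": NORDIC, "sv": NORDIC}


def transliterate(text, locale=None):
    """Replace each character using locale rules first, then neutral rules."""
    rules = LOCALE_RULES.get(locale, {})
    out = []
    for ch in text:
        if ch in rules:
            out.append(rules[ch])
        elif ch in NEUTRAL:
            out.append(NEUTRAL[ch])
        else:
            out.append(ch)  # no rule known: keep the character unchanged
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("coöperation"))        # neutral rules only
    print(transliterate("coöperation", "sv"))  # Swedish rules take priority
```

With only the neutral table, "coöperation" becomes "cooperation"; with the Swedish table in front of it, the same input becomes "cooeperation", which is why the "oe" rules should not be in the neutral file.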
----------------------------------------------------------------------

The attached file updates these translit files to Unicode 7.0.0:

    locales/translit_circle
    locales/translit_cjk_compat
    locales/translit_combining
    locales/translit_compat
    locales/translit_font
    locales/translit_fraction

It also contains lots of manual updates to locales/translit_neutral,
many of them taken from

    http://unicode.org/cldr/trac/browser/trunk/common/transforms/Latin-ASCII.xml

It does *not* update these translit files:

    locales/translit_cjk_variants
    locales/translit_hangul
    locales/translit_narrow
    locales/translit_small
    locales/translit_wide

translit_cjk_variants is not updated because it is apparently not
generated from Unicode data. The other files, translit_hangul,
translit_narrow, translit_small, and translit_wide, are generated, but
they would not change when using Unicode 7.0.0 data; nothing seems to
have changed in Unicode affecting these files. I could add scripts to
generate these as well, but they would just reproduce the current
files. Maybe I should do that nevertheless, just to be able to see if
something changes in future (quite unlikely, I think).

Some code was duplicated in utf8_gen.py and utf8_compatibility.py;
Alexandre Oliva had already suggested splitting it out into an extra
file. As the new generator scripts added by this patch needed this
code again, I saw that Alexandre was right and put the reusable code
into an extra file, unicode_utils.py.

Not everything in the generated translit_* files could be reproduced
exactly from Unicode data. There were some manual additions in the
files which were not mentioned in the comments on top of these files:
the “grep” and “sed” expressions mentioned in the comments reproduce
most of the contents of these files, but not everything. Where the
manual additions seemed to make sense, I added manual hacks to the new
generator scripts gen_translit_*.py to reproduce these manual
additions as well.
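As a rough illustration of what "generated from Unicode data" means here, the following Python sketch derives a rule line from a character's tagged compatibility decomposition. It is not the code of gen_translit_*.py: the function names are hypothetical, it reads the Unicode database bundled with the Python interpreter rather than a UnicodeData.txt for 7.0.0, and the exact output layout of the real files (including any manual additions) is only assumed.

```python
import unicodedata


def format_code_point(code_point):
    """Return a code point in <Uxxxx> notation (assumed locale-source style)."""
    if code_point < 0x10000:
        return "<U{:04X}>".format(code_point)
    return "<U{:08X}>".format(code_point)


def tagged_decomposition(char, tag):
    """Return the decomposition code points if char has a <tag> decomposition."""
    fields = unicodedata.decomposition(char).split()
    if not fields or fields[0] != "<" + tag + ">":
        return None
    return [int(field, 16) for field in fields[1:]]


def rule_line(char, tag):
    """Build one illustrative rule line, or None if the tag does not apply."""
    target = tagged_decomposition(char, tag)
    if target is None:
        return None
    replacement = "".join(format_code_point(cp) for cp in target)
    return '{} "{}"'.format(format_code_point(ord(char)), replacement)


if __name__ == "__main__":
    # U+2460 CIRCLED DIGIT ONE and U+00BD VULGAR FRACTION ONE HALF
    print(rule_line("\u2460", "circle"))
    print(rule_line("\u00bd", "fraction"))
```

A generator built this way only reproduces what the Unicode data says; anything beyond the raw decomposition would have to be added by hand in the script, which is the kind of manual hack described above.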
From ef2a1022224d32989891f7a12f2170a1b3a7e7f9 Mon Sep 17 00:00:00 2001
From: Mike FABIAN
Date: Wed, 20 May 2015 11:16:30 +0200
Subject: [PATCH 4/4] Add transliteration rules for da, nb, nn, and sv locales.

for localedata/Changelog

	[BZ #89]
	* locales/da_DK add more transliteration rules
	* locales/nb_NO add transliteration rules
	* locales/sv_SE add transliteration rules
---
 localedata/locales/da_DK | 21 ++++++++++++++++++---
 localedata/locales/nb_NO | 22 ++++++++++++++++++++++
 localedata/locales/sv_SE | 22 ++++++++++++++++++++++
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/localedata/locales/da_DK b/localedata/locales/da_DK
index c5024a4..d1d4087 100644
--- a/localedata/locales/da_DK
+++ b/localedata/locales/da_DK
@@ -137,11 +137,26 @@ translit_start
 include "translit_combining";""
-% Danish.
-% LATIN CAPITAL LETTER A WITH RING ABOVE.
+% LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE"
+ "";""
+% LATIN CAPITAL LETTER A WITH RING ABOVE -> "AA"
  "";""
-% LATIN SMALL LETTER A WITH RING ABOVE.
+% LATIN CAPITAL LETTER AE -> "AE"
+ ""
+% LATIN CAPITAL LETTER O WITH DIAERESIS -> "OE"
+ "";""
+% LATIN CAPITAL LETTER O WITH STROKE -> "OE"
+ "";""
+% LATIN SMALL LETTER A WITH DIAERESIS -> "ae"
+ "";""
+% LATIN SMALL LETTER A WITH RING ABOVE -> "aa"
  "";""
+% LATIN SMALL LETTER AE -> "ae"
+ ""
+% LATIN SMALL LETTER O WITH DIAERESIS -> "oe"
+ "";""
+% LATIN SMALL LETTER O WITH STROKE -> "oe"
+ "";""
 translit_end
diff --git a/localedata/locales/nb_NO b/localedata/locales/nb_NO
index 513d50c..332092a 100644
--- a/localedata/locales/nb_NO
+++ b/localedata/locales/nb_NO
@@ -127,6 +127,28 @@ copy "i18n"
 translit_start
 include "translit_combining";""
+
+% LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE"
+ "";""
+% LATIN CAPITAL LETTER A WITH RING ABOVE -> "AA"
+ "";""
+% LATIN CAPITAL LETTER AE -> "AE"
+ ""
+% LATIN CAPITAL LETTER O WITH DIAERESIS -> "OE"
+ "";""
+% LATIN CAPITAL LETTER O WITH STROKE -> "OE"
+ "";""
+% LATIN SMALL LETTER A WITH DIAERESIS -> "ae"
+ "";""
+% LATIN SMALL LETTER A WITH RING ABOVE -> "aa"
+ "";""
+% LATIN SMALL LETTER AE -> "ae"
+ ""
+% LATIN SMALL LETTER O WITH DIAERESIS -> "oe"
+ "";""
+% LATIN SMALL LETTER O WITH STROKE -> "oe"
+ "";""
+
 translit_end
 END LC_CTYPE
diff --git a/localedata/locales/sv_SE b/localedata/locales/sv_SE
index ecf7858..92358b9 100644
--- a/localedata/locales/sv_SE
+++ b/localedata/locales/sv_SE
@@ -112,6 +112,28 @@ copy "i18n"
 translit_start
 include "translit_combining";""
+
+% LATIN CAPITAL LETTER A WITH DIAERESIS -> "AE"
+ "";""
+% LATIN CAPITAL LETTER A WITH RING ABOVE -> "AA"
+ "";""
+% LATIN CAPITAL LETTER AE -> "AE"
+ ""
+% LATIN CAPITAL LETTER O WITH DIAERESIS -> "OE"
+ "";""
+% LATIN CAPITAL LETTER O WITH STROKE -> "OE"
+ "";""
+% LATIN SMALL LETTER A WITH DIAERESIS -> "ae"
+ "";""
+% LATIN SMALL LETTER A WITH RING ABOVE -> "aa"
+ "";""
+% LATIN SMALL LETTER AE -> "ae"
+ ""
+% LATIN SMALL LETTER O WITH DIAERESIS -> "oe"
+ "";""
+% LATIN SMALL LETTER O WITH STROKE -> "oe"
+ "";""
+
 translit_end
 END LC_CTYPE
-- 
2.4.2
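
For anyone reviewing rules of this kind, a small Python sketch for decoding a translit rule line may help. It assumes the usual locale-source conventions: characters written as <Uxxxx> symbols, and one or more quoted replacement alternatives separated by ";". The function names are hypothetical, and the sample lines in the demo are my own guesses chosen to match the comments in the patch (for example A WITH DIAERESIS -> "AE"), not text copied from it.

```python
import re

# Matches one <Uxxxx> symbol with 4 to 8 hexadecimal digits.
_UCS_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_symbols(text):
    """Turn a run of <Uxxxx> symbols into the string they denote."""
    return "".join(chr(int(hexval, 16)) for hexval in _UCS_SYMBOL.findall(text))


def parse_rule(line):
    """Split a rule line into (source character, list of alternatives)."""
    source, _, rest = line.strip().partition(" ")
    alternatives = [decode_symbols(alt) for alt in re.findall(r'"([^"]*)"', rest)]
    return decode_symbols(source), alternatives


if __name__ == "__main__":
    # Assumed sample rules, not copied from the patch.
    samples = [
        '<U00C4> "<U0041><U0308>";"<U0041><U0045>"',
        '<U00C6> "<U0041><U0045>"',
    ]
    for sample in samples:
        print(parse_rule(sample))
```

In the first sample the alternatives are a decomposed form (base letter plus combining diaeresis) followed by the plain ASCII fallback; the second sample has a single ASCII alternative.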