From patchwork Thu Nov 14 13:14:14 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Egor Kobylkin
X-Patchwork-Id: 35875
Received: (qmail 50318 invoked by alias); 14 Nov 2019 13:14:32 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Subscribe:
List-Archive:
List-Post:
List-Help:
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 50237 invoked by uid 89); 14 Nov 2019 13:14:31 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-13.6 required=5.0 tests=AWL, BAYES_00,
 BODY_8BITS, GARBLED_BODY, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2,
 GIT_PATCH_3, KAM_SHORT, MIME_QP_LONG_LINE, RCVD_IN_DNSWL_LOW,
 SPF_HELO_PASS, SPF_PASS, UPPERCASE_50_75 autolearn=ham version=3.3.1
 spammy=thursday, Thursday
X-HELO: mail4.protonmail.ch
Date: Thu, 14 Nov 2019 13:14:14 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kobylkin.com;
 s=protonmail; t=1573737264;
 bh=xkozJJIS4SOBZeYEGG/y7egC4S7cCUAfObMnUhjV1Os=;
 h=Date:To:From:Cc:Reply-To:Subject:Feedback-ID:From;
 b=r9UZ1dO75d6zT8Bsbb5Q+D7flxBEhKYqyj8U646hXbhNvPm+IcqySUIilTiAWaCoG
 Oa8hcnABKFEYJFOcxRjrZoRYinLLHALY/ZOX4DB4Aip7Uh6SaIzwsliKBe8akBCfWz
 1zrYx1FtL3gph7RSysRfgNd6SaWVyVVDSr+kAzOc=
To: "libc-locales@sourceware.org", "libc-alpha@sourceware.org"
From: "Diego (Egor) Kobylkin"
Cc: Florian Weimer, Marko Myllynen
Reply-To: "Diego (Egor) Kobylkin"
Subject: [PATCHv3] locale/C-translit.h.in: Greek -> ASCII transliteration
 table [BZ #12031]
Message-ID:
MIME-Version: 1.0

Changelog:

v3
* spurious change (https/http) removed

v2
* ETA WITH TONOS is now transliterated as I/i to be consistent throughout
  the table. Ancient Greek calls for E/e and modern for I/i which we are
  taking here.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 14, 2019 1:29 PM, Diego (Egor) Kobylkin wrote:

> Changelog:
>
> v2
>
> - ETA WITH TONOS is now transliterated as I/i to be consistent throughout
>   the table. Ancient Greek calls for E/e and modern for I/i which we are
>   taking here.
>
> Thanks Florian for the feedback on this!
> Egor
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, September 4, 2019 9:31 AM, Diego (Egor) Kobylkin egor@kobylkin.com wrote:
>
> > Dear locale maintainers,
> >
> > This patch fixes glibc bug 12031 "iconv -t ascii//translit with Greek
> > characters" [1] by adding Greek transliteration rows to
> > locale/C-translit.h.in.
> >
> > This work follows on the heels of the successfully committed patch for
> > virtually the same bug [BZ #2872], which concerned Cyrillic
> > characters. [2]
> >
> > AFAIK there are many versions of transcription tables for Greek to ASCII
> > transcription. Given that the current iconv logic can only transliterate
> > one symbol to many, but not many symbols to many, we take the "Standard"
> > part of the Romanization_of_Greek#Modern_Greek table [3] and keep only
> > the single-letter Greek graphemes. That "standard" seems to be close to
> > ELOT 743, but not identical.
> >
> > So we omit things like M and Μπ being transliterated as M and B
> > respectively. Rather, Μπ will be treated as two separate graphemes and
> > transliterated as Mp.
> >
> > Here is the list of some standards I have collected so far. There
> > doesn't seem to be a way to harmonize them all into one. But if anyone
> > wants to propose a solution, please do.
> >
> > - ΕΛΟΤ 743 https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm Passports.
> >
> > - ISO 843 https://en.wikipedia.org/wiki/ISO_843
> >
> > - ALA-LC https://www.loc.gov/catdir/cpso/romanization/greek.pdf Book titles.
> >
> > - BGN/PCGN http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
> >
> > - http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf Geographical names.
> >
> > Furthermore, to cover the whole U+0370-U+03FF Greek/Coptic Unicode
> > range, I have asked around and made a best-effort transliteration for
> > the remaining characters not covered by the above standards.
> >
> > Should you have better sources for the actual translit entries, please
> > make sure to send your feedback!
> >
> > The patch is attached.
> >
> > Best regards,
> > Egor Kobylkin
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=12031 [1]
> > https://sourceware.org/ml/libc-alpha/2019-07/msg00477.html [2]
> > https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek [3]
> >

From 5a45778dabe368d4a8fce9ca69dbd9894fda0006 Mon Sep 17 00:00:00 2001
From: Egor Kobylkin
Date: Thu, 14 Nov 2019 13:59:39 +0100
Subject: [PATCH] Locales: Greek -> ASCII transliteration table [BZ #12031]

	[BZ #12031]
	* locale/C-translit.h.in: Add Greeklish transliteration.
---
 locale/C-translit.h.in | 135 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index 12cbcd35be..5a3cf482e0 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -56,6 +56,141 @@
 "\x02cd" "_" # MODIFIER LETTER LOW MACRON
 "\x02d0" ":" # MODIFIER LETTER TRIANGULAR COLON
 "\x02dc" "~" # SMALL TILDE
+"\x0370" "H" # GREEK CAPITAL LETTER HETA
+"\x0371" "h" # GREEK SMALL LETTER HETA
+"\x0372" "SS" # GREEK CAPITAL LETTER ARCHAIC SAMPI
+"\x0373" "ss" # GREEK SMALL LETTER ARCHAIC SAMPI
+"\x0374" "#" # GREEK NUMERAL SIGN
+"\x0375" "#`" # GREEK LOWER NUMERAL SIGN
+"\x0376" "W" # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
+"\x0377" "w" # GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
+"\x037a" "i" # GREEK YPOGEGRAMMENI
+"\x037b" "s" # GREEK SMALL REVERSED LUNATE SIGMA SYMBOL
+"\x037c" "s" # GREEK SMALL DOTTED LUNATE SIGMA SYMBOL
+"\x037d" "s" # GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL
+"\x037e" "?" # GREEK QUESTION MARK
+"\x037f" "J" # GREEK CAPITAL LETTER YOT
+"\x0384" "`" # GREEK TONOS
+"\x0385" "`" # GREEK DIALYTIKA TONOS
+"\x0386" "A" # GREEK CAPITAL LETTER ALPHA WITH TONOS
+"\x0387" ";" # GREEK ANO TELEIA
+"\x0388" "E" # GREEK CAPITAL LETTER EPSILON WITH TONOS
+"\x0389" "I" # GREEK CAPITAL LETTER ETA WITH TONOS
+"\x038a" "I" # GREEK CAPITAL LETTER IOTA WITH TONOS
+"\x038c" "O" # GREEK CAPITAL LETTER OMICRON WITH TONOS
+"\x038e" "Y" # GREEK CAPITAL LETTER UPSILON WITH TONOS
+"\x038f" "O" # GREEK CAPITAL LETTER OMEGA WITH TONOS
+"\x0390" "i" # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+"\x0391" "A" # GREEK CAPITAL LETTER ALPHA
+"\x0392" "V" # GREEK CAPITAL LETTER BETA
+"\x0393" "G" # GREEK CAPITAL LETTER GAMMA
+"\x0394" "D" # GREEK CAPITAL LETTER DELTA
+"\x0395" "E" # GREEK CAPITAL LETTER EPSILON
+"\x0396" "Z" # GREEK CAPITAL LETTER ZETA
+"\x0397" "I" # GREEK CAPITAL LETTER ETA
+"\x0398" "TH" # GREEK CAPITAL LETTER THETA
+"\x0399" "I" # GREEK CAPITAL LETTER IOTA
+"\x039a" "K" # GREEK CAPITAL LETTER KAPPA
+"\x039b" "L" # GREEK CAPITAL LETTER LAMDA
+"\x039c" "M" # GREEK CAPITAL LETTER MU
+"\x039d" "N" # GREEK CAPITAL LETTER NU
+"\x039e" "X" # GREEK CAPITAL LETTER XI
+"\x039f" "O" # GREEK CAPITAL LETTER OMICRON
+"\x03a0" "P" # GREEK CAPITAL LETTER PI
+"\x03a1" "R" # GREEK CAPITAL LETTER RHO
+"\x03a3" "S" # GREEK CAPITAL LETTER SIGMA
+"\x03a4" "T" # GREEK CAPITAL LETTER TAU
+"\x03a5" "Y" # GREEK CAPITAL LETTER UPSILON
+"\x03a6" "F" # GREEK CAPITAL LETTER PHI
+"\x03a7" "CH" # GREEK CAPITAL LETTER CHI
+"\x03a8" "PS" # GREEK CAPITAL LETTER PSI
+"\x03a9" "O" # GREEK CAPITAL LETTER OMEGA
+"\x03aa" "I" # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+"\x03ab" "Y" # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+"\x03ac" "a" # GREEK SMALL LETTER ALPHA WITH TONOS
+"\x03ad" "e" # GREEK SMALL LETTER EPSILON WITH TONOS
+"\x03ae" "i" # GREEK SMALL LETTER ETA WITH TONOS
+"\x03af" "i" # GREEK SMALL LETTER IOTA WITH TONOS
+"\x03b0" "y" # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+"\x03b1" "a" # GREEK SMALL LETTER ALPHA
+"\x03b2" "v" # GREEK SMALL LETTER BETA
+"\x03b3" "g" # GREEK SMALL LETTER GAMMA
+"\x03b4" "d" # GREEK SMALL LETTER DELTA
+"\x03b5" "e" # GREEK SMALL LETTER EPSILON
+"\x03b6" "z" # GREEK SMALL LETTER ZETA
+"\x03b7" "i" # GREEK SMALL LETTER ETA
+"\x03b8" "th" # GREEK SMALL LETTER THETA
+"\x03b9" "i" # GREEK SMALL LETTER IOTA
+"\x03ba" "k" # GREEK SMALL LETTER KAPPA
+"\x03bb" "l" # GREEK SMALL LETTER LAMDA
+"\x03bc" "m" # GREEK SMALL LETTER MU
+"\x03bd" "n" # GREEK SMALL LETTER NU
+"\x03be" "x" # GREEK SMALL LETTER XI
+"\x03bf" "o" # GREEK SMALL LETTER OMICRON
+"\x03c0" "p" # GREEK SMALL LETTER PI
+"\x03c1" "r" # GREEK SMALL LETTER RHO
+"\x03c2" "s" # GREEK SMALL LETTER FINAL SIGMA
+"\x03c3" "s" # GREEK SMALL LETTER SIGMA
+"\x03c4" "t" # GREEK SMALL LETTER TAU
+"\x03c5" "y" # GREEK SMALL LETTER UPSILON
+"\x03c6" "f" # GREEK SMALL LETTER PHI
+"\x03c7" "ch" # GREEK SMALL LETTER CHI
+"\x03c8" "ps" # GREEK SMALL LETTER PSI
+"\x03c9" "o" # GREEK SMALL LETTER OMEGA
+"\x03ca" "i" # GREEK SMALL LETTER IOTA WITH DIALYTIKA
+"\x03cb" "y" # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+"\x03cc" "o" # GREEK SMALL LETTER OMICRON WITH TONOS
+"\x03cd" "y" # GREEK SMALL LETTER UPSILON WITH TONOS
+"\x03ce" "o" # GREEK SMALL LETTER OMEGA WITH TONOS
+"\x03cf" "&" # GREEK CAPITAL KAI SYMBOL
+"\x03d0" "b" # GREEK BETA SYMBOL
+"\x03d1" "th" # GREEK THETA SYMBOL
+"\x03d2" "Y`" # GREEK UPSILON WITH HOOK SYMBOL
+"\x03d3" "Y`" # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
+"\x03d4" "Y`" # GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
+"\x03d5" "f" # GREEK PHI SYMBOL
+"\x03d6" "p" # GREEK PI SYMBOL
+"\x03d7" "&" # GREEK KAI SYMBOL
+"\x03d8" "Q" # GREEK LETTER ARCHAIC KOPPA
+"\x03d9" "q" # GREEK SMALL LETTER ARCHAIC KOPPA
+"\x03da" "6" # GREEK LETTER STIGMA
+"\x03db" "6" # GREEK SMALL LETTER STIGMA
+"\x03dc" "W" # GREEK LETTER DIGAMMA
+"\x03dd" "w" # GREEK SMALL LETTER DIGAMMA
+"\x03de" "90" # GREEK LETTER KOPPA
+"\x03df" "90" # GREEK SMALL LETTER KOPPA
+"\x03e0" "900" # GREEK LETTER SAMPI
+"\x03e1" "900" # GREEK SMALL LETTER SAMPI
+"\x03e2" "SH" # COPTIC CAPITAL LETTER SHEI
+"\x03e3" "sh" # COPTIC SMALL LETTER SHEI
+"\x03e4" "F" # COPTIC CAPITAL LETTER FEI
+"\x03e5" "f" # COPTIC SMALL LETTER FEI
+"\x03e6" "KH" # COPTIC CAPITAL LETTER KHEI
+"\x03e7" "kh" # COPTIC SMALL LETTER KHEI
+"\x03e8" "H" # COPTIC CAPITAL LETTER HORI
+"\x03e9" "h" # COPTIC SMALL LETTER HORI
+"\x03ea" "DJ" # COPTIC CAPITAL LETTER GANGIA
+"\x03eb" "dj" # COPTIC SMALL LETTER GANGIA
+"\x03ec" "GJ" # COPTIC CAPITAL LETTER SHIMA
+"\x03ed" "gj" # COPTIC SMALL LETTER SHIMA
+"\x03ee" "TI" # COPTIC CAPITAL LETTER DEI
+"\x03ef" "ti" # COPTIC SMALL LETTER DEI
+"\x03f0" "k" # GREEK KAPPA SYMBOL
+"\x03f1" "r" # GREEK RHO SYMBOL
+"\x03f2" "s" # GREEK LUNATE SIGMA SYMBOL
+"\x03f3" "j" # GREEK LETTER YOT
+"\x03f4" "TH" # GREEK CAPITAL THETA SYMBOL
+"\x03f5" "e" # GREEK LUNATE EPSILON SYMBOL
+"\x03f6" "e" # GREEK REVERSED LUNATE EPSILON SYMBOL
+"\x03f7" "SH" # GREEK CAPITAL LETTER SHO
+"\x03f8" "sh" # GREEK SMALL LETTER SHO
+"\x03f9" "S" # GREEK CAPITAL LUNATE SIGMA SYMBOL
+"\x03fa" "S" # GREEK CAPITAL LETTER SAN
+"\x03fb" "s" # GREEK SMALL LETTER SAN
+"\x03fc" "r" # GREEK RHO WITH STROKE SYMBOL
+"\x03fd" "S" # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
+"\x03fe" "S" # GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL
+"\x03ff" "S" # GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
 "\x0401" "YO" # CYRILLIC CAPITAL LETTER IO
 "\x0402" "DJ" # CYRILLIC CAPITAL LETTER DJE
 "\x0403" "G`" # CYRILLIC CAPITAL LETTER GJE
-- 
2.17.1
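
For reviewers who want to see what the one-to-many restriction described in the cover letter means in practice, here is a minimal sketch in Python. It is not glibc code: the names `GREEK_TO_ASCII` and `translit` are hypothetical, the dictionary is a small hand-copied excerpt of the rows in this patch, and the "?" fallback for unmapped characters is an assumption of the sketch. Each input code point is looked up on its own, so a digraph such as Μπ cannot be mapped to "B" as a unit.

```python
# Hypothetical excerpt of the rows added by this patch:
# one Greek code point -> one ASCII string (possibly several characters).
GREEK_TO_ASCII = {
    "\u039c": "M",   # GREEK CAPITAL LETTER MU
    "\u03a8": "PS",  # GREEK CAPITAL LETTER PSI
    "\u03ae": "i",   # GREEK SMALL LETTER ETA WITH TONOS
    "\u03b5": "e",   # GREEK SMALL LETTER EPSILON
    "\u03b7": "i",   # GREEK SMALL LETTER ETA
    "\u03b8": "th",  # GREEK SMALL LETTER THETA
    "\u03c0": "p",   # GREEK SMALL LETTER PI
    "\u03c2": "s",   # GREEK SMALL LETTER FINAL SIGMA
    "\u03cc": "o",   # GREEK SMALL LETTER OMICRON WITH TONOS
}


def translit(text, table=GREEK_TO_ASCII, fallback="?"):
    """Transliterate text one code point at a time (one-to-many only)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # ASCII passes through unchanged.
            out.append(ch)
        else:
            # No lookahead: each character is mapped independently.
            # The fallback for unmapped characters is an assumption here.
            out.append(table.get(ch, fallback))
    return "".join(out)


# Mu followed by pi: two independent lookups, so the result is "Mp", not "B".
print(translit("\u039c\u03c0"))
# theta, epsilon, omicron with tonos, final sigma
print(translit("\u03b8\u03b5\u03cc\u03c2"))
```

A many-to-many scheme such as ELOT 743 would need lookahead across characters, which is why the table above stays with single-grapheme rows.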