Update Macintosh character maps (bug 23073)
Commit Message
This updates the Macintosh character maps from the tables available from
<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/> and
<http://www.evertype.com/standards/se/mac-sami.html>. Note that this
makes MAC-CYRILLIC identical to MAC-UK (it was identical to CP10007
before).
* localedata/charmaps/MACINTOSH: Update from
<http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>.
* localedata/charmaps/MAC-CYRILLIC: Likewise.
* localedata/charmaps/MAC-IS: Likewise.
* localedata/charmaps/MAC-UK: Likewise.
* localedata/charmaps/MAC-SAMI: Update from
<http://www.evertype.com/standards/se/mac-sami.html>.
* iconvdata/mac-is.c (HAS_HOLES): Define to 0.
---
iconvdata/mac-is.c | 2 +-
localedata/charmaps/MAC-CYRILLIC | 5 ++---
localedata/charmaps/MAC-IS | 22 ++++++++++++----------
localedata/charmaps/MAC-SAMI | 12 ++++++------
localedata/charmaps/MAC-UK | 2 +-
localedata/charmaps/MACINTOSH | 4 ++--
6 files changed, 24 insertions(+), 23 deletions(-)
Comments
On 04/18/2018 02:46 PM, Andreas Schwab wrote:
> This updates the macintosh character maps from the tables available from
> <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/> and
> <http://www.evertype.com/standards/se/mac-sami.html>. Note that this
> makes MAC-CYRILLIC identical to MAC-UK (it was identical to CP10007
> before).
Have you checked these mappings match what MacOS does?
If not, I can do so later this week (I hope).
Thanks,
Florian
On Apr 18 2018, Florian Weimer <fweimer@redhat.com> wrote:
> On 04/18/2018 02:46 PM, Andreas Schwab wrote:
>> This updates the macintosh character maps from the tables available from
>> <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/> and
>> <http://www.evertype.com/standards/se/mac-sami.html>. Note that this
>> makes MAC-CYRILLIC identical to MAC-UK (it was identical to CP10007
>> before).
>
> Have you checked these mappings match what MacOS does?
I don't have access to a MacOS installation.
Andreas.
On 04/18/2018 04:52 PM, Andreas Schwab wrote:
> On Apr 18 2018, Florian Weimer <fweimer@redhat.com> wrote:
>
>> On 04/18/2018 02:46 PM, Andreas Schwab wrote:
>>> This updates the macintosh character maps from the tables available from
>>> <http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/> and
>>> <http://www.evertype.com/standards/se/mac-sami.html>. Note that this
>>> makes MAC-CYRILLIC identical to MAC-UK (it was identical to CP10007
>>> before).
>>
>> Have you checked these mappings match what MacOS does?
>
> I don't have access to a MacOS installation.
We have a test machine somewhere in the office. I will run this tool on it:
https://pagure.io/glibc/iconv-enumerate
and report back the results (assuming that it works as expected).
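The general idea of enumerating a single-byte character set can be
sketched as follows. This is a minimal Python sketch and makes no claim
about how iconv-enumerate itself works; the function names are
hypothetical, and it reads Python's own codec tables rather than calling
iconv.

```python
def enumerate_single_byte(codec_name):
    """Map each byte value 0..255 to a code point, or None if unmapped."""
    table = {}
    for byte in range(256):
        try:
            text = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            table[byte] = None  # hole: the codec has no mapping for this byte
            continue
        # A single-byte charset yields exactly one character per byte.
        table[byte] = ord(text) if len(text) == 1 else None
    return table


def format_charmap(table):
    """Render mapped bytes in the '<Uxxxx> /xNN' style used by charmap files."""
    lines = []
    for byte, code_point in sorted(table.items()):
        if code_point is not None:
            lines.append("<U%04X> /x%02x" % (code_point, byte))
    return lines


if __name__ == "__main__":
    # "latin-1" maps all 256 bytes; "ascii" has holes from 0x80 upwards.
    for name in ("latin-1", "ascii"):
        mapped = [b for b, cp in enumerate_single_byte(name).items() if cp is not None]
        print(name, len(mapped), "bytes mapped")
```

Python also ships codecs named mac_roman, mac_iceland and mac_cyrillic,
so enumerate_single_byte("mac_iceland") would dump Python's bundled
table. That is only a third data point and not a substitute for the
check on MacOS.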
Thanks,
Florian
On 04/18/2018 02:46 PM, Andreas Schwab wrote:
> diff --git a/localedata/charmaps/MAC-UK b/localedata/charmaps/MAC-UK
> index f9da2dda35..a3d278011c 100644
> --- a/localedata/charmaps/MAC-UK
> +++ b/localedata/charmaps/MAC-UK
> @@ -259,5 +259,5 @@ CHARMAP
> <U044C> /xfc CYRILLIC SMALL LETTER SOFT SIGN
> <U044D> /xfd CYRILLIC SMALL LETTER E
> <U044E> /xfe CYRILLIC SMALL LETTER YU
> -<U00A4> /xff CURRENCY SIGN
> +<U20AC> /xff EURO SIGN
> END CHARMAP
I'm checking these changes against Darwin 17.5.0 (from March 2018).
The MacOS equivalent is called MacUkraine. It matches our current
mapping (CURRENCY SIGN at /xff), so the change above is incorrect. The
change also appears to be semantically wrong.
> diff --git a/localedata/charmaps/MACINTOSH b/localedata/charmaps/MACINTOSH
> index b826dc944f..86d732fb63 100644
> --- a/localedata/charmaps/MACINTOSH
> +++ b/localedata/charmaps/MACINTOSH
> @@ -204,7 +204,7 @@ CHARMAP
> <U221A> /xc3 SQUARE ROOT
> <U0192> /xc4 LATIN SMALL LETTER F WITH HOOK
> <U2248> /xc5 ALMOST EQUAL TO
> -<U0394> /xc6 GREEK CAPITAL LETTER DELTA
> +<U2206> /xc6 INCREMENT
> <U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
> <U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
> <U2026> /xc9 HORIZONTAL ELLIPSIS
> @@ -246,7 +246,7 @@ CHARMAP
> <U00CC> /xed LATIN CAPITAL LETTER I WITH GRAVE
> <U00D3> /xee LATIN CAPITAL LETTER O WITH ACUTE
> <U00D4> /xef LATIN CAPITAL LETTER O WITH CIRCUMFLEX
> -<UE01E> /xf0 APPLE LOGO (Macintosh_F0)
> +<UF8FF> /xf0 APPLE LOGO (Macintosh_F0)
> <U00D2> /xf1 LATIN CAPITAL LETTER O WITH GRAVE
> <U00DA> /xf2 LATIN CAPITAL LETTER U WITH ACUTE
> <U00DB> /xf3 LATIN CAPITAL LETTER U WITH CIRCUMFLEX
This change aligns our table with the one in MacOS (called Macintosh
there), so you can commit it.
I'm still checking the other changes.
Thanks,
Florian
On 04/18/2018 02:46 PM, Andreas Schwab wrote:
> diff --git a/localedata/charmaps/MAC-IS b/localedata/charmaps/MAC-IS
> index 835a49c664..b3c308a9c0 100644
> --- a/localedata/charmaps/MAC-IS
> +++ b/localedata/charmaps/MAC-IS
> @@ -163,7 +163,7 @@ CHARMAP
> <U00F9> /x9d LATIN SMALL LETTER U WITH GRAVE
> <U00FB> /x9e LATIN SMALL LETTER U WITH CIRCUMFLEX
> <U00FC> /x9f LATIN SMALL LETTER U WITH DIAERESIS
> -<U2020> /xa0 DAGGER
> +<U00DD> /xa0 LATIN CAPITAL LETTER Y WITH ACUTE
I'm comparing against MacIceland (on Darwin 17.5.0). This change is good.
> @@ -201,7 +201,7 @@ CHARMAP
> <U221A> /xc3 SQUARE ROOT
> <U0192> /xc4 LATIN SMALL LETTER F WITH HOOK
> <U2248> /xc5 ALMOST EQUAL TO
> -<U0394> /xc6 GREEK CAPITAL LETTER DELTA
> +<U2206> /xc6 INCREMENT
This one too.
> @@ -211,23 +211,23 @@ CHARMAP
> <U00D5> /xcd LATIN CAPITAL LETTER O WITH TILDE
> <U0152> /xce LATIN CAPITAL LIGATURE OE
> <U0153> /xcf LATIN SMALL LIGATURE OE
> -<U2014> /xd0 EM DASH
> -<U2013> /xd1 EN DASH
> +<U2013> /xd0 EN DASH
> +<U2014> /xd1 EM DASH
Good as well.
> -<U25C6> /xd7 BLACK DIAMOND
> +<U25CA> /xd7 LOZENGE
Likewise.
> -<U00A4> /xdb CURRENCY SIGN
> -<U0110> /xdc LATIN CAPITAL LETTER D WITH STROKE
> -<U0111> /xdd LATIN SMALL LETTER D WITH STROKE
> +<U20AC> /xdb EURO SIGN
> +<U00D0> /xdc LATIN CAPITAL LETTER ETH
> +<U00F0> /xdd LATIN SMALL LETTER ETH
The Mac keeps the international currency sign for /xdb, so we should as
well. The other changes are good.
> <U00DE> /xde LATIN CAPITAL LETTER THORN (Icelandic)
> <U00FE> /xdf LATIN SMALL LETTER THORN (Icelandic)
> -<U2021> /xe0 DOUBLE DAGGER
> +<U00FD> /xe0 LATIN SMALL LETTER Y WITH ACUTE
Okay.
> @@ -243,12 +243,14 @@ CHARMAP
> <U00CC> /xed LATIN CAPITAL LETTER I WITH GRAVE
> <U00D3> /xee LATIN CAPITAL LETTER O WITH ACUTE
> <U00D4> /xef LATIN CAPITAL LETTER O WITH CIRCUMFLEX
> -<UE01E> /xf0 APPLE LOGO (Macintosh_F0)
> +<UF8FF> /xf0 APPLE LOGO (Macintosh_F0)
MacOS has no mapping for /xf0 and returns EILSEQ under iconv. I think
we should do the same. This may require reverting the mac-is.c change, too.
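Why an unmapped /xf0 interacts with HAS_HOLES can be illustrated with a
toy decoder. This is a Python sketch, not the 8bit-gap.c code; it
assumes HAS_HOLES means what the comment in mac-is.c says (not all 256
byte values are defined). TOY_TABLE, decode_byte and has_holes are
hypothetical names, and the table holds only two entries taken from the
hunk above.

```python
import errno

# Two neighbouring MAC-IS entries from the hunk above; /xf0 is left out
# on purpose to model the MacOS behaviour.
TOY_TABLE = {
    0xEF: 0x00D4,  # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
    0xF1: 0x00D2,  # LATIN CAPITAL LETTER O WITH GRAVE
}


def decode_byte(table, byte):
    """Return the character for one byte, or fail with EILSEQ for a hole."""
    try:
        return chr(table[byte])
    except KeyError:
        raise OSError(errno.EILSEQ, "no mapping for byte 0x%02x" % byte) from None


def has_holes(table):
    """True if at least one of the 256 byte values has no mapping."""
    return any(byte not in table for byte in range(256))


if __name__ == "__main__":
    print(decode_byte(TOY_TABLE, 0xEF))
    try:
        decode_byte(TOY_TABLE, 0xF0)
    except OSError as exc:
        print("0xf0 rejected, EILSEQ:", exc.errno == errno.EILSEQ)
    print("table has holes:", has_holes(TOY_TABLE))
```

Under that assumption, a table that leaves /xf0 undefined has a hole
again, which is why the HAS_HOLES change would have to be reverted
together with the /xf0 entry.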
> +<U02C6> /xf6 MODIFIER LETTER CIRCUMFLEX ACCENT
> +<U02DC> /xf7 SMALL TILDE
Again okay.
In addition to the two discrepancies, I see this difference with your
patch applied:
-<U03A9> /xbd
+<U2126> /xbd
That is, we use GREEK CAPITAL LETTER OMEGA, and MacOS uses OHM SIGN. I
think we should make the switch to align with MacOS.
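A comparison like the one above can be reproduced by parsing two charmap
excerpts and printing the entries that differ. The following Python
sketch assumes single-byte entries of the form "<Uxxxx> /xNN NAME";
parse_charmap and diff_tables are hypothetical helpers and not part of
glibc or of any tool mentioned in this thread.

```python
import re

# One single-byte charmap entry: "<U03A9> /xbd GREEK CAPITAL LETTER OMEGA".
# The lookahead rejects multi-byte entries such as "/x81/x40".
ENTRY = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s+/x([0-9A-Fa-f]{2})(?=\s|$)")


def parse_charmap(text):
    """Return {byte: code point} for every single-byte entry in text."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            table[int(match.group(2), 16)] = int(match.group(1), 16)
    return table


def diff_tables(ours, theirs):
    """List differing entries as '-<U...> /xNN' and '+<U...> /xNN' lines."""
    out = []
    for byte in sorted(set(ours) | set(theirs)):
        a, b = ours.get(byte), theirs.get(byte)
        if a == b:
            continue
        if a is not None:
            out.append("-<U%04X> /x%02x" % (a, byte))
        if b is not None:
            out.append("+<U%04X> /x%02x" % (b, byte))
    return out


if __name__ == "__main__":
    ours = "<U03A9> /xbd GREEK CAPITAL LETTER OMEGA\n<U2248> /xc5 ALMOST EQUAL TO"
    theirs = "<U2126> /xbd OHM SIGN\n<U2248> /xc5 ALMOST EQUAL TO"
    for line in diff_tables(parse_charmap(ours), parse_charmap(theirs)):
        print(line)
```

A byte that is mapped on one side only shows up as a lone "-" or "+"
line, which is how a hole such as /xf0 would appear.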
NB: testrun.sh does not seem to force use of the non-installed gconv
modules in the source tree.
Thanks,
Florian
On 04/18/2018 02:46 PM, Andreas Schwab wrote:
> diff --git a/localedata/charmaps/MAC-CYRILLIC b/localedata/charmaps/MAC-CYRILLIC
> index 482cfd4691..32463f2be4 100644
> --- a/localedata/charmaps/MAC-CYRILLIC
> +++ b/localedata/charmaps/MAC-CYRILLIC
> @@ -2,7 +2,6 @@
> <comment_char> %
> <escape_char> /
>
> -% alias CP10007
> CHARMAP
> <U0000> /x00 NULL
> <U0001> /x01 START OF HEADING
> @@ -166,7 +165,7 @@ CHARMAP
> <U042F> /x9f CYRILLIC CAPITAL LETTER YA
> <U2020> /xa0 DAGGER
> <U00B0> /xa1 DEGREE SIGN
> -<U00A2> /xa2 CENT SIGN
> +<U0490> /xa2 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
MacCyrillic on MacOS (Darwin 17.5.0) keeps the cent sign.
> @@ -259,5 +258,5 @@ CHARMAP
> <U044C> /xfc CYRILLIC SMALL LETTER SOFT SIGN
> <U044D> /xfd CYRILLIC SMALL LETTER E
> <U044E> /xfe CYRILLIC SMALL LETTER YU
> -<U00A4> /xff CURRENCY SIGN
> +<U20AC> /xff EURO SIGN
It also keeps the international currency sign.
There is a further discrepancy:
-<U0491> /xb6
+<U2202> /xb6
We use CYRILLIC SMALL LETTER GHE WITH UPTURN, they have PARTIAL
DIFFERENTIAL. This is related to the first difference.
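The relation between the two slots is that the code points on the glibc
side are the capital and small forms of the same letter. A short Python
check, using only the code points quoted in this message (SLOTS and
describe are hypothetical names):

```python
import unicodedata

# (byte, code point reported for MacCyrillic on MacOS,
#  code point on the glibc side as discussed in this message)
SLOTS = [
    (0xA2, 0x00A2, 0x0490),
    (0xB6, 0x2202, 0x0491),
]


def describe(code_point):
    """Format a code point together with its Unicode character name."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


if __name__ == "__main__":
    for byte, macos_cp, glibc_cp in SLOTS:
        print("/x%02x  MacOS: %s | glibc: %s"
              % (byte, describe(macos_cp), describe(glibc_cp)))
    # The two glibc-side code points form one upper/lower case pair.
    print("case pair:", chr(0x0490).lower() == chr(0x0491))
```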
With your patch, MAC-CYRILLIC and MAC-UK are identical. I think we
should keep the separation, as MacOS does.
Thanks,
Florian
On 04/18/2018 02:46 PM, Andreas Schwab wrote:
> * localedata/charmaps/MAC-SAMI: Update from
> <http://www.evertype.com/standards/se/mac-sami.html>.
MacOS doesn't have a Sami locale or iconv character set, as far as I can
see. I don't know whether the above table has ever been used on Macs, or
whether it was just a font encoding that third-party vendors agreed upon.
Thanks,
Florian
diff --git a/iconvdata/mac-is.c b/iconvdata/mac-is.c
--- a/iconvdata/mac-is.c
+++ b/iconvdata/mac-is.c
@@ -23,6 +23,6 @@
#define TABLES <mac-is.h>
#define CHARSET_NAME "MAC-IS//"
-#define HAS_HOLES 1 /* Not all 256 character are defined. */
+#define HAS_HOLES 0
#include <8bit-gap.c>
diff --git a/localedata/charmaps/MAC-CYRILLIC b/localedata/charmaps/MAC-CYRILLIC
index 482cfd4691..32463f2be4 100644
--- a/localedata/charmaps/MAC-CYRILLIC
+++ b/localedata/charmaps/MAC-CYRILLIC
@@ -2,7 +2,6 @@
<comment_char> %
<escape_char> /
-% alias CP10007
CHARMAP
<U0000> /x00 NULL
<U0001> /x01 START OF HEADING
@@ -166,7 +165,7 @@ CHARMAP
<U042F> /x9f CYRILLIC CAPITAL LETTER YA
<U2020> /xa0 DAGGER
<U00B0> /xa1 DEGREE SIGN
-<U00A2> /xa2 CENT SIGN
+<U0490> /xa2 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
<U00A3> /xa3 POUND SIGN
<U00A7> /xa4 SECTION SIGN
<U2022> /xa5 BULLET
@@ -259,5 +258,5 @@ CHARMAP
<U044C> /xfc CYRILLIC SMALL LETTER SOFT SIGN
<U044D> /xfd CYRILLIC SMALL LETTER E
<U044E> /xfe CYRILLIC SMALL LETTER YU
-<U00A4> /xff CURRENCY SIGN
+<U20AC> /xff EURO SIGN
END CHARMAP
diff --git a/localedata/charmaps/MAC-IS b/localedata/charmaps/MAC-IS
index 835a49c664..b3c308a9c0 100644
--- a/localedata/charmaps/MAC-IS
+++ b/localedata/charmaps/MAC-IS
@@ -163,7 +163,7 @@ CHARMAP
<U00F9> /x9d LATIN SMALL LETTER U WITH GRAVE
<U00FB> /x9e LATIN SMALL LETTER U WITH CIRCUMFLEX
<U00FC> /x9f LATIN SMALL LETTER U WITH DIAERESIS
-<U2020> /xa0 DAGGER
+<U00DD> /xa0 LATIN CAPITAL LETTER Y WITH ACUTE
<U00B0> /xa1 DEGREE SIGN
<U00A2> /xa2 CENT SIGN
<U00A3> /xa3 POUND SIGN
@@ -201,7 +201,7 @@ CHARMAP
<U221A> /xc3 SQUARE ROOT
<U0192> /xc4 LATIN SMALL LETTER F WITH HOOK
<U2248> /xc5 ALMOST EQUAL TO
-<U0394> /xc6 GREEK CAPITAL LETTER DELTA
+<U2206> /xc6 INCREMENT
<U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2026> /xc9 HORIZONTAL ELLIPSIS
@@ -211,23 +211,23 @@ CHARMAP
<U00D5> /xcd LATIN CAPITAL LETTER O WITH TILDE
<U0152> /xce LATIN CAPITAL LIGATURE OE
<U0153> /xcf LATIN SMALL LIGATURE OE
-<U2014> /xd0 EM DASH
-<U2013> /xd1 EN DASH
+<U2013> /xd0 EN DASH
+<U2014> /xd1 EM DASH
<U201C> /xd2 LEFT DOUBLE QUOTATION MARK
<U201D> /xd3 RIGHT DOUBLE QUOTATION MARK
<U2018> /xd4 LEFT SINGLE QUOTATION MARK
<U2019> /xd5 RIGHT SINGLE QUOTATION MARK
<U00F7> /xd6 DIVISION SIGN
-<U25C6> /xd7 BLACK DIAMOND
+<U25CA> /xd7 LOZENGE
<U00FF> /xd8 LATIN SMALL LETTER Y WITH DIAERESIS
<U0178> /xd9 LATIN CAPITAL LETTER Y WITH DIAERESIS
<U2044> /xda FRACTION SLASH
-<U00A4> /xdb CURRENCY SIGN
-<U0110> /xdc LATIN CAPITAL LETTER D WITH STROKE
-<U0111> /xdd LATIN SMALL LETTER D WITH STROKE
+<U20AC> /xdb EURO SIGN
+<U00D0> /xdc LATIN CAPITAL LETTER ETH
+<U00F0> /xdd LATIN SMALL LETTER ETH
<U00DE> /xde LATIN CAPITAL LETTER THORN (Icelandic)
<U00FE> /xdf LATIN SMALL LETTER THORN (Icelandic)
-<U2021> /xe0 DOUBLE DAGGER
+<U00FD> /xe0 LATIN SMALL LETTER Y WITH ACUTE
<U00B7> /xe1 MIDDLE DOT
<U201A> /xe2 SINGLE LOW-9 QUOTATION MARK
<U201E> /xe3 DOUBLE LOW-9 QUOTATION MARK
@@ -243,12 +243,14 @@ CHARMAP
<U00CC> /xed LATIN CAPITAL LETTER I WITH GRAVE
<U00D3> /xee LATIN CAPITAL LETTER O WITH ACUTE
<U00D4> /xef LATIN CAPITAL LETTER O WITH CIRCUMFLEX
-<UE01E> /xf0 APPLE LOGO (Macintosh_F0)
+<UF8FF> /xf0 APPLE LOGO (Macintosh_F0)
<U00D2> /xf1 LATIN CAPITAL LETTER O WITH GRAVE
<U00DA> /xf2 LATIN CAPITAL LETTER U WITH ACUTE
<U00DB> /xf3 LATIN CAPITAL LETTER U WITH CIRCUMFLEX
<U00D9> /xf4 LATIN CAPITAL LETTER U WITH GRAVE
<U0131> /xf5 LATIN SMALL LETTER DOTLESS I
+<U02C6> /xf6 MODIFIER LETTER CIRCUMFLEX ACCENT
+<U02DC> /xf7 SMALL TILDE
<U00AF> /xf8 MACRON
<U02D8> /xf9 BREVE
<U02D9> /xfa DOT ABOVE (Mandarin Chinese light tone)
diff --git a/localedata/charmaps/MAC-SAMI b/localedata/charmaps/MAC-SAMI
--- a/localedata/charmaps/MAC-SAMI
+++ b/localedata/charmaps/MAC-SAMI
@@ -1,7 +1,7 @@
<code_set_name> MAC-SAMI
<comment_char> %
<escape_char> /
-% source: http://www.indigo.ie/egt/standards/se/mac-sami.html
+% source: http://www.evertype.com/standards/se/mac-sami.html
% author: Regnor Jernsletten <Regnor.Jernsletten@sami.uit.no>
% date: 2001-07-31
% comment: Macintosh sami standard.
@@ -185,8 +185,8 @@ CHARMAP
<U00D8> /xaf LATIN CAPITAL LETTER O WITH STROKE
<U0110> /xb0 LATIN CAPITAL LETTER D WITH STROKE
<U014A> /xb1 LATIN CAPITAL LETTER ENG
-<U021E> /xb2 LATIN CAPITAL LETTER H WITH CARON
-<U021F> /xb3 LATIN SMALL LETTER H WITH CARON
+<U2264> /xb2 LESS-THAN OR EQUAL TO
+<U2265> /xb3 GREATER-THAN OR EQUAL TO
<U0160> /xb4 LATIN CAPITAL LETTER S WITH CARON
<U0166> /xb5 LATIN CAPITAL LETTER T WITH STROKE
<U2202> /xb6 PARTIAL DIFFERENTIAL
@@ -205,7 +205,7 @@ CHARMAP
<U221A> /xc3 SQUARE ROOT
<U0192> /xc4 LATIN SMALL LETTER F WITH HOOK
<U2248> /xc5 ALMOST EQUAL TO
-<U2206> /xc6 INCREMENT
+<U021E> /xc6 LATIN CAPITAL LETTER H WITH CARON
<U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2026> /xc9 HORIZONTAL ELLIPSIS
@@ -222,11 +222,11 @@ CHARMAP
<U2018> /xd4 LEFT SINGLE QUOTATION MARK
<U2019> /xd5 RIGHT SINGLE QUOTATION MARK
<U00F7> /xd6 DIVISION SIGN
-<U25CA> /xd7 LOZENGE
+<U021F> /xd7 LATIN SMALL LETTER H WITH CARON
<U00FF> /xd8 LATIN SMALL LETTER Y WITH DIAERESIS
<U0178> /xd9 LATIN CAPITAL LETTER Y WITH DIAERESIS
<U2044> /xda FRACTION SLASH
-<U00A4> /xdb CURRENCY SIGN
+<U20AC> /xdb EURO SIGN
<U00D0> /xdc LATIN CAPITAL LETTER ETH
<U00F0> /xdd LATIN SMALL LETTER ETH
<U00DE> /xde LATIN CAPITAL LETTER THORN
diff --git a/localedata/charmaps/MAC-UK b/localedata/charmaps/MAC-UK
index f9da2dda35..a3d278011c 100644
--- a/localedata/charmaps/MAC-UK
+++ b/localedata/charmaps/MAC-UK
@@ -259,5 +259,5 @@ CHARMAP
<U044C> /xfc CYRILLIC SMALL LETTER SOFT SIGN
<U044D> /xfd CYRILLIC SMALL LETTER E
<U044E> /xfe CYRILLIC SMALL LETTER YU
-<U00A4> /xff CURRENCY SIGN
+<U20AC> /xff EURO SIGN
END CHARMAP
diff --git a/localedata/charmaps/MACINTOSH b/localedata/charmaps/MACINTOSH
index b826dc944f..86d732fb63 100644
--- a/localedata/charmaps/MACINTOSH
+++ b/localedata/charmaps/MACINTOSH
@@ -204,7 +204,7 @@ CHARMAP
<U221A> /xc3 SQUARE ROOT
<U0192> /xc4 LATIN SMALL LETTER F WITH HOOK
<U2248> /xc5 ALMOST EQUAL TO
-<U0394> /xc6 GREEK CAPITAL LETTER DELTA
+<U2206> /xc6 INCREMENT
<U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2026> /xc9 HORIZONTAL ELLIPSIS
@@ -246,7 +246,7 @@ CHARMAP
<U00CC> /xed LATIN CAPITAL LETTER I WITH GRAVE
<U00D3> /xee LATIN CAPITAL LETTER O WITH ACUTE
<U00D4> /xef LATIN CAPITAL LETTER O WITH CIRCUMFLEX
-<UE01E> /xf0 APPLE LOGO (Macintosh_F0)
+<UF8FF> /xf0 APPLE LOGO (Macintosh_F0)
<U00D2> /xf1 LATIN CAPITAL LETTER O WITH GRAVE
<U00DA> /xf2 LATIN CAPITAL LETTER U WITH ACUTE
<U00DB> /xf3 LATIN CAPITAL LETTER U WITH CIRCUMFLEX