[BZ,17588,13064] Update UTF-8 charmap and width to Unicode 7.0.0

  On Feb 16, 2015, "Carlos O'Donell" <carlos@redhat.com> wrote:

> On 02/12/2015 05:18 AM, Alexandre Oliva wrote:
>>> Regression tested on x86_64-linux-gnu.  Ok to install?

> Yes, this version is OK to install if you fix all the nits.

Thanks.

> Despite complaints that a change in the generator would create
> a smaller diff, that doesn't matter to me.

The script changes were small and I figured it wouldn't hurt to merge
them and reduce the diff, so I did.  So I'll wait for another ACK before
checking this in.

I also added the downloaded files to the tree, so that binary
distributors don't risk running afoul of the LGPL for lack of the .txt
files.  It's not clear that they would be required, but it doesn't hurt
to put them in.  I also added unicode-license.txt, copied from other
packages that ship it.  I couldn't find the text file for download from
unicode.org, though I admittedly didn't search very thoroughly.

> Nit: ChangeLog needs [BZ #xxx] etc.

*check*.  Heh, I didn't realize there were open bugs about this, in
spite of the mention in the Subject.  Doh!

> Nit: This covers bugs 17588, 13064, *AND* 14094.

*check*

> Nit: Needs a NEWS entry describing this in full glory :-)

* Character encoding and ctype tables were updated to Unicode 7.0.0, using
  new generator scripts contributed by Pravin Satpute and Mike FABIAN (Red
  Hat).  These updates cause user visible changes, such as the fix for bug
  17998.

> Some might argue it fits better under "scripts" e.g. scripts/unicode-gen,
> but I don't care. We can move it later if we think it should move at all.

*nod*

>>> * unicode-gen/gen_unicode_ctype.py: New generator.

> Nit: Wrong copyright year e.g. 2014 -> 2015.

*check*.  I added ", 2015" after 2014 in the scripts.

> Nit: We don't use "Contributed by" statements, they are instead part of what
>      git records as Author or in the git commit message.

*check*.  I removed them from the scripts, and added them as "from" in
the ChangeLog and in NEWS.

I also removed the links that pointed to github as upstream, since I
understand the GNU libc repository is going to hold the master copy, and
the repository that was linked to is thus obsolescent.

>>> * tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
>>> true for ordinal indicators.

> Nit: This needs a specific new BZ for the fix to user-visible behaviour.

*check*: [BZ #17998]
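
To illustrate why islower now holds for the ordinal indicators, here is a
minimal sketch, not the logic of gen_unicode_ctype.py itself.  It assumes
that lowercase status is read from the Lowercase entries of
DerivedCoreProperties.txt; the SAMPLE lines are an illustrative excerpt in
that file's format, and parse_properties and is_lower are hypothetical
helper names.

```python
import re

# Illustrative excerpt in the DerivedCoreProperties.txt line format
# (an assumption for this sketch, not read from the shipped file).
SAMPLE = """\
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
00B5          ; Lowercase # L&       MICRO SIGN
00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR
"""

# A code point or a "first..last" range, then ';', then the property name.
LINE_RE = re.compile(r'^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)')


def parse_properties(text):
    """Map each property name to the set of code points that carry it."""
    props = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and comments
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        props.setdefault(match.group(3), set()).update(range(first, last + 1))
    return props


def is_lower(code_point, props):
    """Hypothetical check: lowercase if the code point has Lowercase."""
    return code_point in props.get('Lowercase', set())


PROPS = parse_properties(SAMPLE)
```

With this data, is_lower(0xAA, PROPS) and is_lower(0xBA, PROPS) are both
true even though the general category of both characters is Lo rather
than Ll.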

Here's the header of the patch and the incremental changes to the
scripts, from the previously posted version.

The entire patch can be found in the lzip-compressed attachment.

for  localedata/ChangeLog

	[BZ #17588]
	[BZ #13064]
	[BZ #14094]
	[BZ #17998]
	* unicode-gen/Makefile: New.
	* unicode-gen/unicode-license.txt: New, from Unicode.
	* unicode-gen/UnicodeData.txt: New, from Unicode.
	* unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
	* unicode-gen/EastAsianWidth.txt: New, from Unicode.
	* unicode-gen/gen_unicode_ctype.py: New generator, from Mike
	FABIAN <mfabian@redhat.com>.
	* unicode-gen/ctype_compatibility.py: New verifier, from
	Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
	* unicode-gen/ctype_compatibility_test_cases.py: New verifier
	module, from Mike FABIAN.
	* unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
	and Mike FABIAN.
	* unicode-gen/utf8_compatibility.py: New verifier, from Pravin
	Satpute and Mike FABIAN.
	* charmaps/UTF-8: Update.
	* locales/i18n: Update.
	* gen-unicode-ctype.c: Remove.
	* tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
	true for ordinal indicators.
---
 NEWS                                               |   11 
 localedata/charmaps/UTF-8                          |11946 ++++++---
 localedata/gen-unicode-ctype.c                     |  784 -
 localedata/locales/i18n                            | 2652 +-
 localedata/tst-ctype-de_DE.ISO-8859-1.in           |    2 
 localedata/unicode-gen/DerivedCoreProperties.txt   |10794 ++++++++
 localedata/unicode-gen/EastAsianWidth.txt          | 2121 ++
 localedata/unicode-gen/Makefile                    |   99 
 localedata/unicode-gen/UnicodeData.txt             |27268 ++++++++++++++++++++
 localedata/unicode-gen/ctype_compatibility.py      |  546 
 .../unicode-gen/ctype_compatibility_test_cases.py  |  951 +
 localedata/unicode-gen/gen_unicode_ctype.py        |  751 +
 localedata/unicode-gen/unicode-license.txt         |   50 
 localedata/unicode-gen/utf8_compatibility.py       |  399 
 localedata/unicode-gen/utf8_gen.py                 |  286 
 15 files changed, 53278 insertions(+), 5382 deletions(-)
 delete mode 100644 localedata/gen-unicode-ctype.c
 create mode 100644 localedata/unicode-gen/DerivedCoreProperties.txt
 create mode 100644 localedata/unicode-gen/EastAsianWidth.txt
 create mode 100644 localedata/unicode-gen/Makefile
 create mode 100644 localedata/unicode-gen/UnicodeData.txt
 create mode 100755 localedata/unicode-gen/ctype_compatibility.py
 create mode 100644 localedata/unicode-gen/ctype_compatibility_test_cases.py
 create mode 100755 localedata/unicode-gen/gen_unicode_ctype.py
 create mode 100644 localedata/unicode-gen/unicode-license.txt
 create mode 100755 localedata/unicode-gen/utf8_compatibility.py
 create mode 100755 localedata/unicode-gen/utf8_gen.py

Message ID	ortwyig5xa.fsf@livre.home (mailing list archive)
State	Committed
Delegated to:	Carlos O'Donell
Headers
	Received: (qmail 27007 invoked by alias); 18 Feb 2015 23:24:15 -0000
	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
	Precedence: bulk
	List-Id: <libc-alpha.sourceware.org>
	List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
	List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
	List-Archive: <http://sourceware.org/ml/libc-alpha/>
	List-Post: <mailto:libc-alpha@sourceware.org>
	List-Help: <mailto:libc-alpha-help@sourceware.org>,
	 <http://sourceware.org/ml/#faqs>
	Sender: libc-alpha-owner@sourceware.org
	Delivered-To: mailing list libc-alpha@sourceware.org
	Received: (qmail 26990 invoked by uid 89); 18 Feb 2015 23:24:14 -0000
	Authentication-Results: sourceware.org; auth=none
	X-Virus-Found: No
	X-Spam-SWARE-Status: No, score=-0.5 required=5.0 tests=AWL, BAYES_50,
	 LIKELY_SPAM_SUBJECT, SPF_HELO_PASS, SPF_PASS, T_RP_MATCHES_RCVD
	 autolearn=no version=3.3.2
	X-HELO: mx1.redhat.com
	From: Alexandre Oliva <aoliva@redhat.com>
	To: "Carlos O'Donell" <carlos@redhat.com>
	Cc: Pravin Satpute <psatpute@redhat.com>,
	 Siddhesh Poyarekar <siddhesh@redhat.com>,
	 Mike FABIAN <mfabian@redhat.com>, libc-alpha@sourceware.org,
	 Jens Petersen <petersen@redhat.com>
	Subject: Re: [PATCH] [BZ 17588 13064] Update UTF-8 charmap and width to
	 Unicode 7.0.0
	References: <573624784.8871393.1416848051220.JavaMail.zimbra@redhat.com>
	 <orzjb3o7yf.fsf@free.home> <s9dy4qir6fu.fsf@ari.site>
	 <orfvce7y90.fsf@free.home> <s9d388duu5r.fsf@ari.site>
	 <orioh35mbq.fsf@free.home> <20141223111038.GA5172@spoyarek.pnq.redhat.com>
	 <119234933.5523688.1422972847328.JavaMail.zimbra@redhat.com>
	 <or7fvnlbeo.fsf@livre.home> <orwq3njuvc.fsf@livre.home>
	 <54E23EC9.5020400@redhat.com>
	Date: Wed, 18 Feb 2015 21:23:45 -0200
	In-Reply-To: <54E23EC9.5020400@redhat.com> (Carlos O'Donell's message of
	 "Mon, 16 Feb 2015 14:02:33 -0500")
	Message-ID: <ortwyig5xa.fsf@livre.home>
	User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
	MIME-Version: 1.0
	Content-Type: multipart/mixed; boundary="=-=-="
