[v4,0/3] C.UTF-8

Message ID 20210729063515.1541388-1-carlos@redhat.com
Carlos O'Donell July 29, 2021, 6:35 a.m. UTC
The following changes implement a minimally sized C.UTF-8 locale.
First we implement the 'strcmp_collation' directive.
Then we implement C.UTF-8 with an LC_COLLATE that uses the
'strcmp_collation' directive so that collation is done with
strcmp, i.e. strings sort in code point order. The final C.UTF-8
is only ~396KiB, of which the largest part, ~346KiB, is LC_CTYPE
covering all of Unicode.
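
As an illustration of why strcmp is sufficient for code point sorting:
UTF-8 is designed so that byte-wise comparison of two valid strings
gives the same result as comparing their code points. The sketch below
is not code from this series; the function names are hypothetical and
it only demonstrates the ordering property with qsort and strcmp.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two C strings byte by byte with strcmp.
   strcmp compares bytes as unsigned char, so for valid UTF-8 the
   result agrees with Unicode code point order.  */
int
compare_bytes (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Hypothetical helper: sort COUNT UTF-8 strings in code point order.  */
void
sort_code_point_order (const char **strings, size_t count)
{
  qsort (strings, count, sizeof strings[0], compare_bytes);
}
```

Sorting "z", "A", U+00E9, U+20AC and U+1F600 with this helper should
yield them in increasing code point order, with no collation tables
involved.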

This v4 fixes the regressions detected in Fedora Rawhide:
https://bugzilla.redhat.com/show_bug.cgi?id=1986421
Additional test coverage is provided for fnmatch, regcomp,
and regexec; these tests would have caught the regression.
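
For readers who want a quick local check of the interfaces named above,
here is a minimal sketch of a bracket-range smoke test. It is not one
of the tests added by this series; the function name and the patterns
are made up for illustration, and it assumes that C.UTF-8 may not be
installed, in which case matching runs in the locale already in effect.

```c
#include <fnmatch.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if WORD starts with a character in
   [a-z] according to fnmatch and consists only of [a-z] according to
   regexec; return 0 otherwise.  */
int
range_matches (const char *word)
{
  /* Assumption: if C.UTF-8 is not installed, setlocale fails and the
     current locale ("C" at program startup) stays in effect.  */
  setlocale (LC_ALL, "C.UTF-8");

  int glob_ok = fnmatch ("[a-z]*", word, 0) == 0;

  regex_t re;
  if (regcomp (&re, "^[a-z]+$", REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  int regex_ok = regexec (&re, word, 0, NULL, 0) == 0;
  regfree (&re);

  return glob_ok && regex_ok;
}
```

With code point ordering, range_matches ("hello") is expected to return
1 and range_matches ("HELLO") 0, since 'H' lies outside the range a-z.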

Carlos O'Donell (3):
  Add support for locales with zero collation rules.
  Add 'strcmp_collation' support for LC_COLLATE.
  Add generic C.UTF-8 locale (Bug 17318)

 iconv/Makefile                   |  22 +-
 iconv/tst-iconv9.c               |  87 +++++
 locale/programs/ld-collate.c     |  24 +-
 locale/programs/locfile-kw.gperf |   1 +
 locale/programs/locfile-kw.h     | 306 ++++++++---------
 locale/programs/locfile-token.h  |   1 +
 localedata/C.UTF-8.in            | 157 +++++++++
 localedata/Makefile              |   2 +
 localedata/SUPPORTED             |   1 +
 localedata/locales/C             | 194 +++++++++++
 posix/bug-regex1.c               |  20 ++
 posix/bug-regex19.c              |  22 +-
 posix/bug-regex4.c               |  25 ++
 posix/bug-regex6.c               |   2 +-
 posix/fnmatch_loop.c             |  95 ++++--
 posix/regcomp.c                  |  12 +-
 posix/regexec.c                  |  85 +++--
 posix/transbug.c                 |  22 +-
 posix/tst-fnmatch.input          | 549 ++++++++++++++++++++++++++++++-
 posix/tst-regcomp-truncated.c    |   1 +
 posix/tst-regex.c                |  25 +-
 21 files changed, 1385 insertions(+), 268 deletions(-)
 create mode 100644 iconv/tst-iconv9.c
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C