Make Unicode generation reproducible.

Message ID 20210429172742.3301414-1-carlos@redhat.com
State Changes Requested, archived
Series Make Unicode generation reproducible.

Commit Message

Carlos O'Donell April 29, 2021, 5:27 p.m. UTC
  The following changes make Unicode generation reproducible.

First we create a UnicodeRelease.txt file with metadata about the
release. This metadata contains the release date for the Unicode
version that we imported into glibc. Then we add APIs to
unicode_utils.py to access the release metadata. Then we refactor
all of the code to use the release metadata, which includes using
the consistent date of the Unicode release for the required
LC_IDENTIFICATION dates.  If the existing files like i18n_ctype
or tr_TR have newer dates, then we keep those; otherwise we use the
newer date from the Unicode release.
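
For example, with these APIs a generator script no longer needs the
Unicode version or data file names on its command line; everything
comes from the metadata. A minimal sketch using the unicode_utils.py
helpers added in this patch (argument parsing elided):

import unicode_utils

DATA_DIR = '.'  # directory holding UnicodeRelease.txt and the UCD files

version = unicode_utils.release_version(DATA_DIR)      # e.g. "13.0.0"
date = unicode_utils.release_date(DATA_DIR)            # stable release date
data_file = unicode_utils.release_data_file(DATA_DIR)  # "./UnicodeData.txt"

# Load character attributes from the file named in the metadata and use
# the fixed release date instead of time.strftime('%Y-%m-%d').
unicode_utils.fill_attributes(data_file)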

All data files are regenerated with:
cd localedata/unicode-gen
make
make install

Subsequent regeneration will not alter any file dates, which makes
the Unicode generation reproducible.

Tested on x86_64 and i686 without regression.
---
 localedata/locales/i18n_ctype                 |  4 +-
 localedata/locales/tr_TR                      |  2 +-
 localedata/locales/translit_circle            |  2 +-
 localedata/locales/translit_cjk_compat        |  2 +-
 localedata/locales/translit_combining         |  2 +-
 localedata/locales/translit_compat            |  2 +-
 localedata/locales/translit_font              |  2 +-
 localedata/locales/translit_fraction          |  2 +-
 localedata/unicode-gen/Makefile               | 66 ++++++++-----------
 localedata/unicode-gen/UnicodeRelease.txt     |  8 +++
 localedata/unicode-gen/gen_translit_circle.py | 20 +++---
 .../unicode-gen/gen_translit_cjk_compat.py    | 20 +++---
 .../unicode-gen/gen_translit_combining.py     | 20 +++---
 localedata/unicode-gen/gen_translit_compat.py | 20 +++---
 localedata/unicode-gen/gen_translit_font.py   | 20 +++---
 .../unicode-gen/gen_translit_fraction.py      | 20 +++---
 localedata/unicode-gen/gen_unicode_ctype.py   | 50 ++++++--------
 localedata/unicode-gen/unicode_utils.py       | 38 +++++++++++
 localedata/unicode-gen/utf8_compatibility.py  | 27 ++++----
 localedata/unicode-gen/utf8_gen.py            | 61 +++++++----------
 20 files changed, 189 insertions(+), 199 deletions(-)
 create mode 100644 localedata/unicode-gen/UnicodeRelease.txt
  

Comments

Florian Weimer April 30, 2021, 9:26 a.m. UTC | #1
* Carlos O'Donell:

> diff --git a/localedata/unicode-gen/UnicodeRelease.txt b/localedata/unicode-gen/UnicodeRelease.txt
> new file mode 100644
> index 0000000000..bd9cc14ae0
> --- /dev/null
> +++ b/localedata/unicode-gen/UnicodeRelease.txt
> @@ -0,0 +1,8 @@
> +% This metadata is used by glibc and updated by the developer(s)
> +% carrying out the Unicode update.
> +Version,13.0.0
> +ReleaseDate,2021-03-10
> +Data,UnicodeData.txt
> +DcpData,DerivedCoreProperties.txt
> +EawData,EastAsianWidth.txt
> +PlData,PropList.txt

I suggest using <https://www.unicode.org/Public/13.0.0/ucd/ReadMe.txt>
instead.  It would give 2020-03-06 as the date.  2021-03-10 is
definitely wrong; it should be 2020-03-10.

Perhaps it's time to move the *.txt files into their own directory and
also include <https://www.unicode.org/copyright.html> and
<https://www.unicode.org/license.html>.

Thanks,
Florian
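
For illustration, a sketch of deriving the date from the UCD
ReadMe.txt as suggested above, assuming the file carries a
"# Date: YYYY-MM-DD" header line like the other UCD data files;
the function name is hypothetical, not part of the posted patch:

import re

def readme_release_date(path='ReadMe.txt'):
    '''Extract the release date from a UCD ReadMe.txt header line.'''
    with open(path) as readme:
        for line in readme:
            # UCD headers contain a line such as "# Date: 2020-03-06".
            match = re.search(r'Date:\s*(\d{4}-\d{2}-\d{2})', line)
            if match:
                return match.group(1)
    raise ValueError('no Date header found in ' + path)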
  
Florian Weimer May 4, 2021, 9:46 p.m. UTC | #2
* Florian Weimer:

> * Carlos O'Donell:
>
>> diff --git a/localedata/unicode-gen/UnicodeRelease.txt b/localedata/unicode-gen/UnicodeRelease.txt
>> new file mode 100644
>> index 0000000000..bd9cc14ae0
>> --- /dev/null
>> +++ b/localedata/unicode-gen/UnicodeRelease.txt
>> @@ -0,0 +1,8 @@
>> +% This metadata is used by glibc and updated by the developer(s)
>> +% carrying out the Unicode update.
>> +Version,13.0.0
>> +ReleaseDate,2021-03-10
>> +Data,UnicodeData.txt
>> +DcpData,DerivedCoreProperties.txt
>> +EawData,EastAsianWidth.txt
>> +PlData,PropList.txt
>
> I suggest using <https://www.unicode.org/Public/13.0.0/ucd/ReadMe.txt>
> instead.  It would give 2020-03-06 as the date.  2021-03-10 is
> definitely wrong; it should be 2020-03-10.
>
> Perhaps it's time to move the *.txt files into their own directory and
> also include <https://www.unicode.org/copyright.html> and
> <https://www.unicode.org/license.html>.

Hmm, maybe this is asking for too much, given that this started out as
something else entirely.

Maybe put variables into the generator script along with a comment for
now?  I think the custom descriptor file is probably a bit overdesigned.

Thanks,
Florian
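
A sketch of that simpler alternative, keeping the release metadata as
module-level variables in the generator code; the names below are
hypothetical, not from the posted patch:

# Update these values on each Unicode upgrade.  The official release
# date is in https://www.unicode.org/Public/<version>/ucd/ReadMe.txt.
UNICODE_VERSION = '13.0.0'
UNICODE_RELEASE_DATE = '2020-03-06'
UNICODE_DATA_FILE = 'UnicodeData.txt'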
  

Patch

diff --git a/localedata/locales/i18n_ctype b/localedata/locales/i18n_ctype
index c63e0790fc..f5063fe743 100644
--- a/localedata/locales/i18n_ctype
+++ b/localedata/locales/i18n_ctype
@@ -13,7 +13,7 @@  comment_char %
 % information, but with different transliterations, can include it
 % directly.
 
-% Generated automatically by gen_unicode_ctype.py for Unicode 12.1.0.
+% Generated automatically by gen_unicode_ctype.py.
 
 LC_IDENTIFICATION
 title     "Unicode 13.0.0 FDCC-set"
@@ -26,7 +26,7 @@  fax       ""
 language  ""
 territory "Earth"
 revision  "13.0.0"
-date      "2020-06-25"
+date      "2021-03-10"
 category  "i18n:2012";LC_CTYPE
 END LC_IDENTIFICATION
 
diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR
index 7dbb923228..ff8b315b7b 100644
--- a/localedata/locales/tr_TR
+++ b/localedata/locales/tr_TR
@@ -43,7 +43,7 @@  fax        ""
 language   "Turkish"
 territory  "Turkey"
 revision   "1.0"
-date       "2020-06-25"
+date       "2021-03-10"
 
 category "i18n:2012";LC_IDENTIFICATION
 category "i18n:2012";LC_CTYPE
diff --git a/localedata/locales/translit_circle b/localedata/locales/translit_circle
index 5c07b44532..f2ef558e2d 100644
--- a/localedata/locales/translit_circle
+++ b/localedata/locales/translit_circle
@@ -9,7 +9,7 @@  comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of encircled characters.
-% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_circle.py for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_cjk_compat b/localedata/locales/translit_cjk_compat
index ee0d7f83c6..2696445dbf 100644
--- a/localedata/locales/translit_cjk_compat
+++ b/localedata/locales/translit_cjk_compat
@@ -9,7 +9,7 @@  comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of CJK compatibility characters.
-% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining
index 36128f097a..b8e6b7efbd 100644
--- a/localedata/locales/translit_combining
+++ b/localedata/locales/translit_combining
@@ -10,7 +10,7 @@  comment_char %
 
 % Transliterations that remove all combining characters (accents,
 % pronounciation marks, etc.).
-% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_combining.py for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_compat b/localedata/locales/translit_compat
index ac24c4e938..61cdcccbc9 100644
--- a/localedata/locales/translit_compat
+++ b/localedata/locales/translit_compat
@@ -9,7 +9,7 @@  comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of compatibility characters and ligatures.
-% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_compat.py for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_font b/localedata/locales/translit_font
index 680c4ed426..c3d7b44772 100644
--- a/localedata/locales/translit_font
+++ b/localedata/locales/translit_font
@@ -9,7 +9,7 @@  comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of font equivalents.
-% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_font.py for Unicode 13.0.0.
 
 LC_CTYPE
 
diff --git a/localedata/locales/translit_fraction b/localedata/locales/translit_fraction
index b52244969e..292fe3e806 100644
--- a/localedata/locales/translit_fraction
+++ b/localedata/locales/translit_fraction
@@ -9,7 +9,7 @@  comment_char %
 % otherwise be governed by that license.
 
 % Transliterations of fractions.
-% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2020-06-25 for Unicode 13.0.0.
+% Generated automatically from UnicodeData.txt by gen_translit_fraction.py for Unicode 13.0.0.
 % The replacements have been surrounded with spaces, because fractions are
 % often preceded by a decimal number and followed by a unit or a math symbol.
 
diff --git a/localedata/unicode-gen/Makefile b/localedata/unicode-gen/Makefile
index d0dd1b78a5..b5c9c5517b 100644
--- a/localedata/unicode-gen/Makefile
+++ b/localedata/unicode-gen/Makefile
@@ -18,11 +18,10 @@ 
 # Makefile for generating and updating Unicode-extracted files.
 
 
-# This Makefile is NOT used as part of the GNU libc build.  It needs
-# to be run manually, within the source tree, at Unicode upgrades
-# (change UNICODE_VERSION below), to update ../locales/i18n_ctype ctype
-# information (part of the file is preserved, so don't wipe it all
-# out), and ../charmaps/UTF-8.
+# This Makefile is NOT used as part of the GNU libc build.  It needs to
+# be run manually, within the source tree, at Unicode upgrades, to
+# update ../locales/i18n_ctype ctype information (part of the file is
+# preserved, so don't wipe it all out), and ../charmaps/UTF-8.
 
 # Use make all to generate the files used in the glibc build out of
 # the original Unicode files; make check to verify that they are what
@@ -33,13 +32,14 @@ 
 # running afoul of the LGPL corresponding sources requirements, even
 # though it's not clear that they are preferred over the generated
 # files for making modifications.
-
-
-UNICODE_VERSION = 13.0.0
+#
+# The UnicodeRelease.txt file must be updated manually to include the
+# information about the downloaded Unicode release.
 
 PYTHON3 = python3
 WGET = wget
 
+RELEASEDATA = UnicodeRelease.txt
 DOWNLOADS = UnicodeData.txt DerivedCoreProperties.txt EastAsianWidth.txt PropList.txt
 GENERATED = i18n_ctype tr_TR UTF-8 translit_combining translit_compat translit_circle translit_cjk_compat translit_font translit_fraction
 REPORTS = i18n_ctype-report UTF-8-report
@@ -66,12 +66,10 @@  mostlyclean:
 
 .PHONY: all check clean mostlyclean install
 
-i18n_ctype: UnicodeData.txt DerivedCoreProperties.txt
+i18n_ctype: UnicodeData.txt DerivedCoreProperties.txt $(RELEASEDATA)
 i18n_ctype: ../locales/i18n_ctype # Preserve non-ctype information.
 i18n_ctype: gen_unicode_ctype.py
-	$(PYTHON3) gen_unicode_ctype.py -u UnicodeData.txt \
-	  -d DerivedCoreProperties.txt -i ../locales/i18n_ctype -o $@ \
-	  --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) gen_unicode_ctype.py -i ../locales/i18n_ctype -o $@
 
 i18n_ctype-report: i18n_ctype ../locales/i18n_ctype
 i18n_ctype-report: ctype_compatibility.py ctype_compatibility_test_cases.py
@@ -86,55 +84,45 @@  check-i18n_ctype: i18n_ctype-report
 tr_TR: UnicodeData.txt DerivedCoreProperties.txt
 tr_TR: ../locales/tr_TR # Preserve non-ctype information.
 tr_TR: gen_unicode_ctype.py
-	$(PYTHON3) gen_unicode_ctype.py -u UnicodeData.txt \
-	  -d DerivedCoreProperties.txt -i ../locales/tr_TR -o $@ \
-	  --unicode_version $(UNICODE_VERSION) --turkish
+	$(PYTHON3) gen_unicode_ctype.py -i ../locales/tr_TR -o $@ \
+	  --turkish
 
-UTF-8: UnicodeData.txt EastAsianWidth.txt
+UTF-8: UnicodeData.txt EastAsianWidth.txt $(RELEASEDATA)
 UTF-8: utf8_gen.py
-	$(PYTHON3) utf8_gen.py -u UnicodeData.txt \
-	-e EastAsianWidth.txt -p PropList.txt \
-	--unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) utf8_gen.py
 
 UTF-8-report: UTF-8 ../charmaps/UTF-8
 UTF-8-report: utf8_compatibility.py
-	$(PYTHON3) ./utf8_compatibility.py -u UnicodeData.txt \
-	-e EastAsianWidth.txt -o ../charmaps/UTF-8 \
+	$(PYTHON3) ./utf8_compatibility.py -o ../charmaps/UTF-8 \
 	-n UTF-8 -a -m -c > $@
 
 check-UTF-8: UTF-8-report
 	@if grep '^Total.*: [^0]' UTF-8-report; \
 	then echo manual verification required; false; else true; fi
 
-translit_combining: UnicodeData.txt
+translit_combining: UnicodeData.txt $(RELEASEDATA)
 translit_combining: gen_translit_combining.py
-	$(PYTHON3) ./gen_translit_combining.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_combining.py -o $@
 
-translit_compat: UnicodeData.txt
+translit_compat: UnicodeData.txt $(RELEASEDATA)
 translit_compat: gen_translit_compat.py
-	$(PYTHON3) ./gen_translit_compat.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_compat.py -o $@
 
-translit_circle: UnicodeData.txt
+translit_circle: UnicodeData.txt $(RELEASEDATA)
 translit_circle: gen_translit_circle.py
-	$(PYTHON3) ./gen_translit_circle.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_circle.py -o $@
 
-translit_cjk_compat: UnicodeData.txt
+translit_cjk_compat: UnicodeData.txt $(RELEASEDATA)
 translit_cjk_compat: gen_translit_cjk_compat.py
-	$(PYTHON3) ./gen_translit_cjk_compat.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_cjk_compat.py -o $@
 
-translit_font: UnicodeData.txt
+translit_font: UnicodeData.txt $(RELEASEDATA)
 translit_font: gen_translit_font.py
-	$(PYTHON3) ./gen_translit_font.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_font.py -o $@
 
-translit_fraction: UnicodeData.txt
+translit_fraction: UnicodeData.txt $(RELEASEDATA)
 translit_fraction: gen_translit_fraction.py
-	$(PYTHON3) ./gen_translit_fraction.py -u UnicodeData.txt \
-	-o $@ --unicode_version $(UNICODE_VERSION)
+	$(PYTHON3) ./gen_translit_fraction.py -o $@
 
 .PHONY: downloads clean-downloads
 downloads: $(DOWNLOADS)
diff --git a/localedata/unicode-gen/UnicodeRelease.txt b/localedata/unicode-gen/UnicodeRelease.txt
new file mode 100644
index 0000000000..bd9cc14ae0
--- /dev/null
+++ b/localedata/unicode-gen/UnicodeRelease.txt
@@ -0,0 +1,8 @@ 
+% This metadata is used by glibc and updated by the developer(s)
+% carrying out the Unicode update.
+Version,13.0.0
+ReleaseDate,2021-03-10
+Data,UnicodeData.txt
+DcpData,DerivedCoreProperties.txt
+EawData,EastAsianWidth.txt
+PlData,PropList.txt
diff --git a/localedata/unicode-gen/gen_translit_circle.py b/localedata/unicode-gen/gen_translit_circle.py
index a83dccc163..cc897b2f5f 100644
--- a/localedata/unicode-gen/gen_translit_circle.py
+++ b/localedata/unicode-gen/gen_translit_circle.py
@@ -67,7 +67,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('% Transliterations of encircled characters.\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_circle.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('\n')
         translit_file.write('LC_CTYPE\n')
@@ -110,11 +109,11 @@  if __name__ == "__main__":
         Generate a translit_circle file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -133,19 +132,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_translit_cjk_compat.py b/localedata/unicode-gen/gen_translit_cjk_compat.py
index a040511d06..ac127a8e21 100644
--- a/localedata/unicode-gen/gen_translit_cjk_compat.py
+++ b/localedata/unicode-gen/gen_translit_cjk_compat.py
@@ -69,7 +69,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('characters.\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_cjk_compat.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('\n')
         translit_file.write('LC_CTYPE\n')
@@ -180,11 +179,11 @@  if __name__ == "__main__":
         Generate a translit_cjk_compat file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -203,19 +202,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_translit_combining.py b/localedata/unicode-gen/gen_translit_combining.py
index 88be8f4b8a..082c0da92c 100644
--- a/localedata/unicode-gen/gen_translit_combining.py
+++ b/localedata/unicode-gen/gen_translit_combining.py
@@ -69,7 +69,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('% pronounciation marks, etc.).\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_combining.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('\n')
         translit_file.write('LC_CTYPE\n')
@@ -404,11 +403,11 @@  if __name__ == "__main__":
         Generate a translit_combining file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -427,19 +426,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_translit_compat.py b/localedata/unicode-gen/gen_translit_compat.py
index c8c63b23af..ba144e9bee 100644
--- a/localedata/unicode-gen/gen_translit_compat.py
+++ b/localedata/unicode-gen/gen_translit_compat.py
@@ -68,7 +68,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('and ligatures.\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_compat.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('\n')
         translit_file.write('LC_CTYPE\n')
@@ -286,11 +285,11 @@  if __name__ == "__main__":
         Generate a translit_compat file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -309,19 +308,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_translit_font.py b/localedata/unicode-gen/gen_translit_font.py
index db41b47fab..93b2f128fa 100644
--- a/localedata/unicode-gen/gen_translit_font.py
+++ b/localedata/unicode-gen/gen_translit_font.py
@@ -67,7 +67,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('% Transliterations of font equivalents.\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_font.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('\n')
         translit_file.write('LC_CTYPE\n')
@@ -116,11 +115,11 @@  if __name__ == "__main__":
         Generate a translit_font file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -139,19 +138,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_translit_fraction.py b/localedata/unicode-gen/gen_translit_fraction.py
index c3c1513eb9..097cb04ea0 100644
--- a/localedata/unicode-gen/gen_translit_fraction.py
+++ b/localedata/unicode-gen/gen_translit_fraction.py
@@ -67,7 +67,6 @@  def output_head(translit_file, unicode_version, head=''):
         translit_file.write('% Transliterations of fractions.\n')
         translit_file.write('% Generated automatically from UnicodeData.txt '
                             + 'by gen_translit_fraction.py '
-                            + 'on {:s} '.format(time.strftime('%Y-%m-%d'))
                             + 'for Unicode {:s}.\n'.format(unicode_version))
         translit_file.write('% The replacements have been surrounded ')
         translit_file.write('with spaces, because fractions are\n')
@@ -157,11 +156,11 @@  if __name__ == "__main__":
         Generate a translit_cjk_compat file from UnicodeData.txt.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -180,19 +179,16 @@  if __name__ == "__main__":
         “translit_start” line and the tail from the “translit_end”
         line to the end of the file will be copied unchanged into the
         output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
     HEAD = TAIL = ''
     if ARGS.input_file:
         (HEAD, TAIL) = read_input_file(ARGS.input_file)
     with open(ARGS.output_file, mode='w') as TRANSLIT_FILE:
-        output_head(TRANSLIT_FILE, ARGS.unicode_version, head=HEAD)
+        output_head(TRANSLIT_FILE, unicode_version, head=HEAD)
         output_transliteration(TRANSLIT_FILE)
         output_tail(TRANSLIT_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/gen_unicode_ctype.py b/localedata/unicode-gen/gen_unicode_ctype.py
index 7548961df1..41760567cf 100755
--- a/localedata/unicode-gen/gen_unicode_ctype.py
+++ b/localedata/unicode-gen/gen_unicode_ctype.py
@@ -32,6 +32,7 @@  To see how this script is used, call it with the “-h” option:
 import argparse
 import time
 import re
+import datetime
 import unicode_utils
 
 def code_point_ranges(is_class_function):
@@ -123,7 +124,7 @@  def output_charmap(i18n_file, map_name, map_function):
         i18n_file.write(line+'\n')
     i18n_file.write('\n')
 
-def read_input_file(filename):
+def read_input_file(filename, unicode_release_date):
     '''Reads the original glibc i18n file to get the original head
     and tail.
 
@@ -140,8 +141,13 @@  def read_input_file(filename):
                 r'^(?P<key>date\s+)(?P<value>"[0-9]{4}-[0-9]{2}-[0-9]{2}")',
                 line)
             if match:
-                line = match.group('key') \
-                       + '"{:s}"\n'.format(time.strftime('%Y-%m-%d'))
+                # Update the file date if the Unicode standard date
+                # is newer.
+                orig_date = datetime.date.fromisoformat(match.group('value').strip('"'))
+                new_date = datetime.date.fromisoformat(unicode_release_date)
+                if new_date > orig_date:
+                    line = match.group('key') \
+                           + '"{:s}"\n'.format(unicode_release_date)
             head = head + line
             if line.startswith('LC_CTYPE'):
                 break
@@ -153,7 +159,7 @@  def read_input_file(filename):
             tail = tail + line
     return (head, tail)
 
-def output_head(i18n_file, unicode_version, head=''):
+def output_head(i18n_file, unicode_version, unicode_release_date, head=''):
     '''Write the header of the output file, i.e. the part of the file
     before the “LC_CTYPE” line.
     '''
@@ -180,8 +186,7 @@  def output_head(i18n_file, unicode_version, head=''):
         i18n_file.write('language  ""\n')
         i18n_file.write('territory "Earth"\n')
         i18n_file.write('revision  "{:s}"\n'.format(unicode_version))
-        i18n_file.write('date      "{:s}"\n'.format(
-            time.strftime('%Y-%m-%d')))
+        i18n_file.write('date      "{:s}"\n'.format(unicode_release_date))
         i18n_file.write('category  "i18n:2012";LC_CTYPE\n')
         i18n_file.write('END LC_IDENTIFICATION\n')
         i18n_file.write('\n')
@@ -267,18 +272,11 @@  if __name__ == "__main__":
         UnicodeData.txt and DerivedCoreProperties.txt files.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
-              + 'default: %(default)s'))
-    PARSER.add_argument(
-        '-d', '--derived_core_properties_file',
-        nargs='?',
-        type=str,
-        default='DerivedCoreProperties.txt',
-        help=('The DerivedCoreProperties.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
     PARSER.add_argument(
         '-i', '--input_file',
@@ -298,27 +296,21 @@  if __name__ == "__main__":
         classes and the date stamp in
         LC_IDENTIFICATION will be copied unchanged
         into the output file.  ''')
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     PARSER.add_argument(
         '--turkish',
         action='store_true',
         help='Use Turkish case conversions.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(
-        ARGS.unicode_data_file)
-    unicode_utils.fill_derived_core_properties(
-        ARGS.derived_core_properties_file)
+    unicode_version = unicode_utils.release_version (ARGS.unicode_data_dir)
+    unicode_release_date = unicode_utils.release_date (ARGS.unicode_data_dir)
+    unicode_utils.fill_attributes(unicode_utils.release_data_file(ARGS.unicode_data_dir))
+    unicode_utils.fill_derived_core_properties(unicode_utils.release_dcp_file(ARGS.unicode_data_dir))
     unicode_utils.verifications()
     HEAD = TAIL = ''
     if ARGS.input_file:
-        (HEAD, TAIL) = read_input_file(ARGS.input_file)
+        (HEAD, TAIL) = read_input_file(ARGS.input_file, unicode_release_date)
     with open(ARGS.output_file, mode='w') as I18N_FILE:
-        output_head(I18N_FILE, ARGS.unicode_version, head=HEAD)
-        output_tables(I18N_FILE, ARGS.unicode_version, ARGS.turkish)
+        output_head(I18N_FILE, unicode_version, unicode_release_date, head=HEAD)
+        output_tables(I18N_FILE, unicode_version, ARGS.turkish)
         output_tail(I18N_FILE, tail=TAIL)
diff --git a/localedata/unicode-gen/unicode_utils.py b/localedata/unicode-gen/unicode_utils.py
index 3263f4510b..2b7c6aaa45 100644
--- a/localedata/unicode-gen/unicode_utils.py
+++ b/localedata/unicode-gen/unicode_utils.py
@@ -525,3 +525,41 @@  def verifications():
             and (is_graph(code_point) or code_point == 0x0020)):
             sys.stderr.write('%(sym)s is graph|<space> but not print\n' %{
                 'sym': unicode_utils.ucs_symbol(code_point)})
+
+def release_metadata(data_dir, parameter):
+    ''' Parse the UnicodeRelease.txt metadata and return the value for
+    the specified parameter.'''
+    value = ""
+    with open(data_dir + '/' + "UnicodeRelease.txt", "r") as f:
+        for line in f:
+            if line.strip()[0] == '%':
+                continue
+            fields = line.strip().split(",")
+            if fields[0] == parameter:
+                value = fields[1].strip()
+    assert value != ""
+    return value
+
+def release_version(data_dir):
+    ''' Return the Unicode version of the data in use.'''
+    return release_metadata(data_dir, "Version")
+
+def release_date(data_dir):
+    ''' Release the release date for the Unicode version of the data.'''
+    return release_metadata(data_dir, "ReleaseDate")
+
+def release_data_file(data_dir):
+    ''' The name of the primary data file.'''
+    return data_dir + '/' + release_metadata(data_dir, 'Data')
+
+def release_dcp_file(data_dir):
+    ''' The name of the derived core properties data file.'''
+    return data_dir + '/' + release_metadata(data_dir, 'DcpData')
+
+def release_eaw_file(data_dir):
+    ''' The name of the East Asian width data file.'''
+    return data_dir + '/' + release_metadata(data_dir, 'EawData')
+
+def release_pl_file(data_dir):
+    ''' The name of the properties list data file.'''
+    return data_dir + '/' + release_metadata(data_dir, 'PlData')
diff --git a/localedata/unicode-gen/utf8_compatibility.py b/localedata/unicode-gen/utf8_compatibility.py
index eca2e8cddc..7e485ba759 100755
--- a/localedata/unicode-gen/utf8_compatibility.py
+++ b/localedata/unicode-gen/utf8_compatibility.py
@@ -216,6 +216,13 @@  if __name__ == "__main__":
         description='''
         Compare the contents of LC_CTYPE in two files and check for errors.
         ''')
+    PARSER.add_argument(
+        '-u', '--unicode_data_dir',
+        nargs='?',
+        type=str,
+        default='.',
+        help=('The directory containing Unicode data to read, '
+              + 'default: %(default)s'))
     PARSER.add_argument(
         '-o', '--old_utf8_file',
         nargs='?',
@@ -228,16 +235,6 @@  if __name__ == "__main__":
         required=True,
         type=str,
         help='The new UTF-8 file.')
-    PARSER.add_argument(
-        '-u', '--unicode_data_file',
-        nargs='?',
-        type=str,
-        help='The UnicodeData.txt file to read.')
-    PARSER.add_argument(
-        '-e', '--east_asian_width_file',
-        nargs='?',
-        type=str,
-        help='The EastAsianWidth.txt file to read.')
     PARSER.add_argument(
         '-a', '--show_added_characters',
         action='store_true',
@@ -252,9 +249,11 @@  if __name__ == "__main__":
         help='Show characters whose width was changed in detail.')
     ARGS = PARSER.parse_args()
 
-    if ARGS.unicode_data_file:
-        unicode_utils.fill_attributes(ARGS.unicode_data_file)
-    if ARGS.east_asian_width_file:
-        unicode_utils.fill_east_asian_widths(ARGS.east_asian_width_file)
+    unicode_data_file = unicode_utils.release_data_file (ARGS.unicode_data_dir)
+    east_asian_width_file = unicode_utils.release_eaw_file (ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
+    unicode_utils.fill_east_asian_widths(east_asian_width_file)
+
     check_charmap(ARGS.old_utf8_file, ARGS.new_utf8_file)
     check_width(ARGS.old_utf8_file, ARGS.new_utf8_file)
diff --git a/localedata/unicode-gen/utf8_gen.py b/localedata/unicode-gen/utf8_gen.py
index 899840923a..4fc3038fe0 100755
--- a/localedata/unicode-gen/utf8_gen.py
+++ b/localedata/unicode-gen/utf8_gen.py
@@ -22,7 +22,7 @@ 
 This script generates a glibc/localedata/charmaps/UTF-8 file
 from Unicode data.
 
-Usage: python3 utf8_gen.py UnicodeData.txt EastAsianWidth.txt
+Usage: python3 utf8_gen.py
 
 It will output UTF-8 file
 '''
@@ -198,23 +198,27 @@  def write_header_charmap(outfile):
     outfile.write("% alias ISO-10646/UTF-8\n")
     outfile.write("CHARMAP\n")
 
-def write_header_width(outfile, unicode_version):
+def write_header_width(outfile, unicode_data_dir):
     '''Writes the header on top of the WIDTH section to the output file'''
+    unicode_version = unicode_utils.release_version(unicode_data_dir)
+    unicode_data = unicode_utils.release_metadata(unicode_data_dir, 'Data')
+    eaw_data = unicode_utils.release_metadata(unicode_data_dir, 'EawData')
+    pl_data = unicode_utils.release_metadata(unicode_data_dir, 'PlData')
     outfile.write('% Character width according to Unicode '
                   + '{:s}.\n'.format(unicode_version))
     outfile.write('% - Default width is 1.\n')
     outfile.write('% - Double-width characters have width 2; generated from\n')
-    outfile.write('%        "grep \'^[^;]*;[WF]\' EastAsianWidth.txt"\n')
+    outfile.write('%        "grep \'^[^;]*;[WF]\' ' + eaw_data + '"\n')
     outfile.write('% - Non-spacing characters have width 0; '
-                  + 'generated from PropList.txt or\n')
+                  + 'generated from ' + pl_data + ' or\n')
     outfile.write('%   "grep \'^[^;]*;[^;]*;[^;]*;[^;]*;NSM;\' '
-                  + 'UnicodeData.txt"\n')
+                  + unicode_data  + '"\n')
     outfile.write('% - Format control characters have width 0; '
                   + 'generated from\n')
-    outfile.write("%   \"grep '^[^;]*;[^;]*;Cf;' UnicodeData.txt\"\n")
+    outfile.write("%   \"grep '^[^;]*;[^;]*;Cf;' " + unicode_data + "\"\n")
 #   Not needed covered by Cf
 #    outfile.write("% - Zero width characters have width 0; generated from\n")
-#    outfile.write("%   \"grep '^[^;]*;ZERO WIDTH ' UnicodeData.txt\"\n")
+#    outfile.write("%   \"grep '^[^;]*;ZERO WIDTH ' " + unicode_data + "\"\n")
     outfile.write("WIDTH\n")
 
 def process_width(outfile, ulines, elines, plines):
@@ -302,41 +306,26 @@  def process_width(outfile, ulines, elines, plines):
 if __name__ == "__main__":
     PARSER = argparse.ArgumentParser(
         description='''
-        Generate a UTF-8 file from UnicodeData.txt, EastAsianWidth.txt, and PropList.txt.
+        Generate a UTF-8 file from the Unicode release data files.
         ''')
     PARSER.add_argument(
-        '-u', '--unicode_data_file',
+        '-u', '--unicode_data_dir',
         nargs='?',
         type=str,
-        default='UnicodeData.txt',
-        help=('The UnicodeData.txt file to read, '
+        default='.',
+        help=('The directory containing Unicode data to read, '
               + 'default: %(default)s'))
-    PARSER.add_argument(
-        '-e', '--east_asian_with_file',
-        nargs='?',
-        type=str,
-        default='EastAsianWidth.txt',
-        help=('The EastAsianWidth.txt file to read, '
-              + 'default: %(default)s'))
-    PARSER.add_argument(
-        '-p', '--prop_list_file',
-        nargs='?',
-        type=str,
-        default='PropList.txt',
-        help=('The PropList.txt file to read, '
-              + 'default: %(default)s'))
-    PARSER.add_argument(
-        '--unicode_version',
-        nargs='?',
-        required=True,
-        type=str,
-        help='The Unicode version of the input files used.')
     ARGS = PARSER.parse_args()
 
-    unicode_utils.fill_attributes(ARGS.unicode_data_file)
-    with open(ARGS.unicode_data_file, mode='r') as UNIDATA_FILE:
+    unicode_version = unicode_utils.release_version(ARGS.unicode_data_dir)
+    unicode_data_file = unicode_utils.release_data_file(ARGS.unicode_data_dir)
+    east_asian_width_file = unicode_utils.release_eaw_file(ARGS.unicode_data_dir)
+    prop_list_file = unicode_utils.release_pl_file(ARGS.unicode_data_dir)
+
+    unicode_utils.fill_attributes(unicode_data_file)
+    with open(unicode_data_file, mode='r') as UNIDATA_FILE:
         UNICODE_DATA_LINES = UNIDATA_FILE.readlines()
-    with open(ARGS.east_asian_with_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
+    with open(east_asian_width_file, mode='r') as EAST_ASIAN_WIDTH_FILE:
         EAST_ASIAN_WIDTH_LINES = []
         for LINE in EAST_ASIAN_WIDTH_FILE:
             # If characters from EastAasianWidth.txt which are from
@@ -352,7 +341,7 @@  if __name__ == "__main__":
                 continue
             if re.match(r'^[^;]*;[WF]', LINE):
                 EAST_ASIAN_WIDTH_LINES.append(LINE.strip())
-    with open(ARGS.prop_list_file, mode='r') as PROP_LIST_FILE:
+    with open(prop_list_file, mode='r') as PROP_LIST_FILE:
         PROP_LIST_LINES = []
         for LINE in PROP_LIST_FILE:
             if re.match(r'^[^;]*;[\s]*Prepended_Concatenation_Mark', LINE):
@@ -363,7 +352,7 @@  if __name__ == "__main__":
         process_charmap(UNICODE_DATA_LINES, OUTFILE)
         OUTFILE.write("END CHARMAP\n\n")
         # Processing EastAsianWidth.txt and write WIDTH to UTF-8 file
-        write_header_width(OUTFILE, ARGS.unicode_version)
+        write_header_width(OUTFILE, ARGS.unicode_data_dir)
         process_width(OUTFILE,
                       UNICODE_DATA_LINES,
                       EAST_ASIAN_WIDTH_LINES,