[PATCHv2] Add --no-hard-links option to localedef (Bug 23923)

Message ID abeea93e-ef10-6b8f-1a51-8cc49b708c5c@redhat.com
State New, archived
Headers

Commit Message

Carlos O'Donell Nov. 26, 2018, 7:25 p.m. UTC
  On 11/26/18 11:58 AM, Joseph Myers wrote:
> On Mon, 26 Nov 2018, Carlos O'Donell wrote:
> 
>> If the downstream distro wants to post-process the hardlinks then
>> the time spent in localedef looking up sibling directories and
>> processing hardlinks is wasted effort.
> 
> Do you have figures for the time involved?

On a fast SSD the -j1 locale build and install of SUPPORTED locales
is 3% faster, or for me roughly ~10s on a:
"make localedata/install-locale-files DESTDIR=..." iteration.

The option itself ensures that no hard links are present and that may
make the logic of the post-install hardlink creation phase simpler.
So there is more than just performance to consider.

>> +     other locales, and see if we can reuse it, to save disk space.
>> +     If the user specified --no-hard-link to localedef then hard_links is
>> +     false, other_paths remains NULL and we skip the optimization below.
>> +     The use of --no-hard-link is distribution specific since some distros
>> +     have post-processing hard-link steps and so doing this here is a waste
>> +     of time.  */
> 
> This should refer to --no-hard-links, not --no-hard-link (twice).

Fixed.

> This is not a review of other aspects of the patch (but I approve of 
> improving determinism of the locale build, and also have such a hard-link 
> post-processing step at Mentor).

v2
- Fix comments to mention '--no-hard-links'
- Update convenience target to use '--no-hard-links'

[PATCH] Add --no-hard-links option to localedef (bug 23923)

Downstream distributions need consistent sets of hardlinks in
order for rpm to operate effectively. This means that even if
locales are built with a high level of parallelism that the
resulting files need to have consistent hardlink counts. The only
way to achieve this is with a post-install hardlink pass using a
program like 'hardklink' (shipped in Fedora).

If the downstream distro wants to post-process the hardlinks then
the time spent in localedef looking up sibling directories and
processing hardlinks is wasted effort.

To optimize the build and install pass we add a --no-hard-links
option to localedef to avoid doing the hardlink optimziation for
size.

Tested on x86_64 with 'make localedata/install-locale-files'
before and after. Without the patch we have files with 100+
hardlink counts. After the patch and running with --no-hard-links
all link counts are 1. This patch also alters the convenience
target 'make localedata/install-locale-files' to use the new
option.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
---
 ChangeLog                   | 10 ++++++++++
 locale/programs/localedef.c | 10 ++++++++++
 locale/programs/localedef.h |  1 +
 locale/programs/locfile.c   | 16 +++++++++++++---
 localedata/Makefile         |  2 +-
 5 files changed, 35 insertions(+), 4 deletions(-)
  

Comments

Florian Weimer Nov. 26, 2018, 7:26 p.m. UTC | #1
* Carlos O'Donell:

> On 11/26/18 11:58 AM, Joseph Myers wrote:
>> On Mon, 26 Nov 2018, Carlos O'Donell wrote:
>> 
>>> If the downstream distro wants to post-process the hardlinks then
>>> the time spent in localedef looking up sibling directories and
>>> processing hardlinks is wasted effort.
>> 
>> Do you have figures for the time involved?
>
> On a fast SSD the -j1 locale build and install of SUPPORTED locales
> is 3% faster, or for me roughly ~10s on a:
> "make localedata/install-locale-files DESTDIR=..." iteration.
>
> The option itself ensures that no hard links are present and that may
> make the logic of the post-install hardlink creation phase simpler.
> So there is more than just performance to consider.

The real performance win comes from the ability to use parallel make and
still get fully deterministic results.

Thanks,
Florian
  
Carlos O'Donell Nov. 26, 2018, 7:33 p.m. UTC | #2
On 11/26/18 2:26 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>> On 11/26/18 11:58 AM, Joseph Myers wrote:
>>> On Mon, 26 Nov 2018, Carlos O'Donell wrote:
>>>
>>>> If the downstream distro wants to post-process the hardlinks then
>>>> the time spent in localedef looking up sibling directories and
>>>> processing hardlinks is wasted effort.
>>>
>>> Do you have figures for the time involved?
>>
>> On a fast SSD the -j1 locale build and install of SUPPORTED locales
>> is 3% faster, or for me roughly ~10s on a:
>> "make localedata/install-locale-files DESTDIR=..." iteration.
>>
>> The option itself ensures that no hard links are present and that may
>> make the logic of the post-install hardlink creation phase simpler.
>> So there is more than just performance to consider.
> 
> The real performance win comes from the ability to use parallel make and
> still get fully deterministic results.

If you are OK with v2 then I'll give it a few days for others to comment
and then commit. In v2 I switch the 'make localedata/install-locale-files'
to use --no-hard-links as the default distribution way to install the
files deterministically regardless of build paralleism.
  
Carlos O'Donell Dec. 3, 2018, 3:22 p.m. UTC | #3
On 11/26/18 2:33 PM, Carlos O'Donell wrote:
> On 11/26/18 2:26 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> On 11/26/18 11:58 AM, Joseph Myers wrote:
>>>> On Mon, 26 Nov 2018, Carlos O'Donell wrote:
>>>>
>>>>> If the downstream distro wants to post-process the hardlinks then
>>>>> the time spent in localedef looking up sibling directories and
>>>>> processing hardlinks is wasted effort.
>>>>
>>>> Do you have figures for the time involved?
>>>
>>> On a fast SSD the -j1 locale build and install of SUPPORTED locales
>>> is 3% faster, or for me roughly ~10s on a:
>>> "make localedata/install-locale-files DESTDIR=..." iteration.
>>>
>>> The option itself ensures that no hard links are present and that may
>>> make the logic of the post-install hardlink creation phase simpler.
>>> So there is more than just performance to consider.
>>
>> The real performance win comes from the ability to use parallel make and
>> still get fully deterministic results.
> 
> If you are OK with v2 then I'll give it a few days for others to comment
> and then commit. In v2 I switch the 'make localedata/install-locale-files'
> to use --no-hard-links as the default distribution way to install the
> files deterministically regardless of build paralleism.
> 

OK, I've pushed v2, and this alters 'make localedata/install-locale-files'

I have also added a note in the "Packaging Changes" for distribution maintainers
to read regarding the change in the convenience target:

https://sourceware.org/glibc/wiki/Release/2.29#Localedata_installation_without_hard_links
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 03887874a2..b9bce8bc8e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@ 
+2018-11-26  Carlos O'Donell  <carlos@redhat.com>
+
+	[BZ #23923]
+	* locale/programs/localedef.c: Declare boolean hard_links default true.
+	(options): Add --no-hard-links option.
+	(parse_opt): Add OPT_NO_HARD_LINKS case and set hard_links to false.
+	* locale/programs/localedef.h: Declare prototype for hard_links.
+	* locale/programs/locfile.c (write_locale_data): Don't use hard
+	links if hard_links is false.
+
 2018-11-26  Carlos O'Donell  <carlos@redhat.com>
 
 	* scripts/abilist.awk: Handle .tdata. Error for unknown combinations.
diff --git a/locale/programs/localedef.c b/locale/programs/localedef.c
index d718d2e9f4..6c4936be6b 100644
--- a/locale/programs/localedef.c
+++ b/locale/programs/localedef.c
@@ -85,6 +85,9 @@  static bool replace_archive;
 /* If true list archive content.  */
 static bool list_archive;
 
+/* If true create hard links to other locales (default).  */
+bool hard_links = true;
+
 /* Maximum number of retries when opening the locale archive.  */
 int max_locarchive_open_retry = 10;
 
@@ -105,6 +108,7 @@  void (*argp_program_version_hook) (FILE *, struct argp_state *) = print_version;
 #define OPT_BIG_ENDIAN 401
 #define OPT_NO_WARN 402
 #define OPT_WARN 403
+#define OPT_NO_HARD_LINKS 404
 
 /* Definitions of arguments for argp functions.  */
 static const struct argp_option options[] =
@@ -120,6 +124,8 @@  static const struct argp_option options[] =
   { NULL, 0, NULL, 0, N_("Output control:") },
   { "force", 'c', NULL, 0,
     N_("Create output even if warning messages were issued") },
+  { "no-hard-links", OPT_NO_HARD_LINKS, NULL, 0,
+    N_("Do not create hard links between installed locales") },
   { "prefix", OPT_PREFIX, N_("PATH"), 0, N_("Optional output file prefix") },
   { "posix", OPT_POSIX, NULL, 0, N_("Strictly conform to POSIX") },
   { "quiet", OPT_QUIET, NULL, 0,
@@ -389,6 +395,10 @@  parse_opt (int key, char *arg, struct argp_state *state)
       /* Enable the warnings.  */
       set_warnings (arg, true);
       break;
+    case OPT_NO_HARD_LINKS:
+      /* Do not hard link to other locales.  */
+      hard_links = false;
+      break;
     case 'c':
       force_output = 1;
       break;
diff --git a/locale/programs/localedef.h b/locale/programs/localedef.h
index 0083faceab..e2b39e78f3 100644
--- a/locale/programs/localedef.h
+++ b/locale/programs/localedef.h
@@ -118,6 +118,7 @@  extern const char *repertoire_global;
 extern int max_locarchive_open_retry;
 extern bool no_archive;
 extern const char *alias_file;
+extern bool hard_links;
 
 
 /* Prototypes for a few program-wide used functions.  */
diff --git a/locale/programs/locfile.c b/locale/programs/locfile.c
index 32e5f761f2..c75eabc886 100644
--- a/locale/programs/locfile.c
+++ b/locale/programs/locfile.c
@@ -702,7 +702,7 @@  write_locale_data (const char *output_path, int catidx, const char *category,
   size_t cnt, step, maxiov;
   int fd;
   char *fname;
-  const char **other_paths;
+  const char **other_paths = NULL;
   uint32_t header[2];
   size_t n_elem;
   struct iovec vec[3];
@@ -828,8 +828,18 @@  failure while writing data for category `%s'"), category);
   close (fd);
 
   /* Compare the file with the locale data files for the same category in
-     other locales, and see if we can reuse it, to save disk space.  */
-  other_paths = siblings (output_path);
+     other locales, and see if we can reuse it, to save disk space.
+     If the user specified --no-hard-links to localedef then hard_links is
+     false, other_paths remains NULL and we skip the optimization below.
+     The use of --no-hard-links is distribution specific since some distros
+     have post-processing hard-link steps and so doing this here is a waste
+     of time.  */
+  if (hard_links)
+    other_paths = siblings (output_path);
+
+  /* If there are other paths, then walk the sibling paths looking for
+     files with the same content so we can hard link and reduce disk
+     space usage.  */
   if (other_paths != NULL)
     {
       struct stat64 fname_stat;
diff --git a/localedata/Makefile b/localedata/Makefile
index 0eea396ad8..b245f0ff54 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -423,7 +423,7 @@  $(INSTALL-SUPPORTED-LOCALE-ARCHIVE): install-locales-dir
 	$(build-one-locale)
 
 $(INSTALL-SUPPORTED-LOCALE-FILES): install-locales-dir
-	@flags="-c --no-archive"; \
+	@flags="-c --no-archive --no-hard-links"; \
 	$(build-one-locale)
 
 tst-setlocale-ENV = LC_ALL=ja_JP.EUC-JP