[v2] ldconfig: set LC_COLLATE to C [BZ #22505]

Message ID 20171130224358.14578-1-aurelien@aurel32.net
State New, archived
Headers

Commit Message

Aurelien Jarno Nov. 30, 2017, 10:43 p.m. UTC
  ldconfig supports `include' directives and use the glob function to
process them. The glob function sort entries according to the LC_COLLATE
category. When using a standard "include /etc/ld.so.conf.d/*.conf" entry
in /etc/ld.so.conf, the order therefore depends on the locale used to
run ldconfig. A few examples of locale specific order that might be
disturbing in that context compared to the C locale:
- The cs_CZ and sk_SK locales sort the digits after the letters.
- The et_EE locale sorts the 'z' between 's' and 't'.

This patch fixes that by setting LC_COLLATE to C in order to process
files in deterministic order, independently of the locale used to launch
ldconfig.

NOTE: This should NOT be backported to older release branches.

Changelog:
	[BZ #22505]
	* elf/ldconfig.c (main): Call setlocale to force LC_COLLATE to C.
---
 ChangeLog      | 5 +++++
 NEWS           | 4 ++++
 elf/ldconfig.c | 4 ++++
 3 files changed, 13 insertions(+)


It seems the discussion about doing the change in two steps hasn't
really converged, so I have implemented the single step version. I also
noted that it requires a packaging change doc entry on the Wiki. I'll
copy the NEWS entry there once the commit has been applied to master.
  

Comments

Rafal Luzynski Dec. 1, 2017, 10:46 p.m. UTC | #1
30.11.2017 23:43 Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> [...]
> diff --git a/NEWS b/NEWS
> index f3fdf9aec5..df699c6997 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -43,6 +43,10 @@ Major new features:
>
> * glibc now implements the memfd_create and mlock2 functions on Linux.
>
> +* The ldconfig utility now process `include' directives using the C/POSIX

Did you mean "processes"? ----^^^^^^^

Otherwise the patch is trivial.  I have not tested but I guess that
nothing wrong could happen here.

Seems like I was late to discuss in the previous thread so here are my
humble opinions.  The packagers (distributors, etc.) name the files in
/etc/ld.so.conf.d after the package (application, library, etc.) name
which is pretty random.  This makes me think that the order of loading
those config files does not matter.  Otherwise there would be some
mechanism enforcing the correct order of the files.  However, if it
ever happens that the order does matter then enforcing a consistent
locale will help to track down a potential bug.  Otherwise the
reproducibility of a bug would depend on the locale.  Therefore this
change is good.  But why not C.UTF-8?  If it is not yet ready then
why not wait a little until it is?

Few more details:

27.11.2017 09:47 Rical Jasan <ricaljasan@pacific.net> wrote:
> [...] If a strict ordering is needed, I would expect numerical
> prefixing to be used.

That's exactly what I mean.

27.11.2017 17:36 Carlos O'Donell <carlos@redhat.com> wrote:
> [...]
> This patch, while being the correct thing to do, has the capability to
> completely break a users system if the administrator used locale-specific
> ordering to find paths that were needed for say critical identity management
> plugins for ldap or other subsystems required for login (krb5), of which
> I know at least one with such changes (but no locale-specific ordering on
> the configuration files).

Unless an administrator has developed his/her own config files instead of
using standard packages I think that means these packages are broken already.
If that particular combination of packages (and their config files) works
correctly in that particular computer it means it works only in that
particular locale.  Forcing C locale only reveals a bug which should be
fixed anyway.

> A quick brainstorm:
>
> * Do nothing. A minimal number of users have broken systems because of
> collation
> in their locale breaks certain packages adding known to sort incorrectly
> entries
> from their distribution for ld.so.conf.d/.

Doing nothing is another good solution IMHO.  Do we know about any actual
combination of packages/config files/locales which is broken?

> [...] We really should switch to C.UTF-8, but we don't have
> this upstream yet (working on it).

I agree with this.  Carlos, if you are working on C.UTF-8 is it likely that
you finish it during this development cycle?  Can we wait until you finish?

27.11.2017 18:02 Florian Weimer <fweimer@redhat.com> wrote:
> [...]
> We can also enumerate twice and warn if there are any differences.

I think it's likely (very low probability but nonzero) that a difference
exists but very unlikely that it does actually matter.  What should
an administrator do when he/she sees such a warning?  Rename the files?
What should a casual user do?  An owner of a consumer device?

Regards,

Rafal
  
Aurelien Jarno Dec. 1, 2017, 11:40 p.m. UTC | #2
On 2017-12-01 23:46, Rafal Luzynski wrote:
> 30.11.2017 23:43 Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > [...]
> > diff --git a/NEWS b/NEWS
> > index f3fdf9aec5..df699c6997 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -43,6 +43,10 @@ Major new features:
> >
> > * glibc now implements the memfd_create and mlock2 functions on Linux.
> >
> > +* The ldconfig utility now process `include' directives using the C/POSIX
> 
> Did you mean "processes"? ----^^^^^^^

Thanks, good catch, I'll fix that.

> Otherwise the patch is trivial.  I have not tested but I guess that
> nothing wrong could happen here.
> 
> Seems like I was late to discuss in the previous thread so here are my
> humble opinions.  The packagers (distributors, etc.) name the files in
> /etc/ld.so.conf.d after the package (application, library, etc.) name
> which is pretty random.  This makes me think that the order of loading
> those config files does not matter.  Otherwise there would be some
> mechanism enforcing the correct order of the files.  However, if it
> ever happens that the order does matter then enforcing a consistent

We have the problem in Debian is that we allow multiple libc to be
installed at the same time through, following the multiarch scheme. In
some complex situations involving multilib and multiarch, we end-up with
two libcs for the same architecture installed on a system. We want to
make sure the system one always has higher priority than the other
installed one.

Maybe we haven't chosen our names correctly, the system libc
installs x86_64-linux-gnu.conf while the multilib libc installs
zz_amd64-biarch-compat.conf. While we can change the names, I am afraid
that it's just a workaround and that this problem might surface another
day in slightly different form.

> locale will help to track down a potential bug.  Otherwise the
> reproducibility of a bug would depend on the locale.  Therefore this
> change is good.  But why not C.UTF-8?  If it is not yet ready then
> why not wait a little until it is?

From the glibc upstream side we might indeed want to wait for C.UTF-8.
From the Debian side we'll apply this patch asap, as it already causes
some issues on some systems.

> Few more details:
> 
> 27.11.2017 09:47 Rical Jasan <ricaljasan@pacific.net> wrote:
> > [...] If a strict ordering is needed, I would expect numerical
> > prefixing to be used.
> 
> That's exactly what I mean.

That might work from a distribution point of view, where all the files
provided by packages are prefixed by number. Then if in addition to that
a user drop files starting with letters, the order become undefined as
letter might be sorted before or after numbers depending on the locale.

> 27.11.2017 17:36 Carlos O'Donell <carlos@redhat.com> wrote:
> > [...]
> > This patch, while being the correct thing to do, has the capability to
> > completely break a users system if the administrator used locale-specific
> > ordering to find paths that were needed for say critical identity management
> > plugins for ldap or other subsystems required for login (krb5), of which
> > I know at least one with such changes (but no locale-specific ordering on
> > the configuration files).
> 
> Unless an administrator has developed his/her own config files instead of
> using standard packages I think that means these packages are broken already.
> If that particular combination of packages (and their config files) works
> correctly in that particular computer it means it works only in that
> particular locale.  Forcing C locale only reveals a bug which should be
> fixed anyway.

I am even more worried that on a given system ldconfig is ran with
different locales, both the default system one and the C/POSIX one. The
latter can be forced in a Makefile or a script that the user run to
install some new library or in some system tools.

> > A quick brainstorm:
> >
> > * Do nothing. A minimal number of users have broken systems because of
> > collation
> > in their locale breaks certain packages adding known to sort incorrectly
> > entries
> > from their distribution for ld.so.conf.d/.
> 
> Doing nothing is another good solution IMHO.  Do we know about any actual
> combination of packages/config files/locales which is broken?
> 
> > [...] We really should switch to C.UTF-8, but we don't have
> > this upstream yet (working on it).
> 
> I agree with this.  Carlos, if you are working on C.UTF-8 is it likely that
> you finish it during this development cycle?  Can we wait until you finish?
> 
> 27.11.2017 18:02 Florian Weimer <fweimer@redhat.com> wrote:
> > [...]
> > We can also enumerate twice and warn if there are any differences.
> 
> I think it's likely (very low probability but nonzero) that a difference
> exists but very unlikely that it does actually matter.  What should
> an administrator do when he/she sees such a warning?  Rename the files?
> What should a casual user do?  An owner of a consumer device?

I am even afraid that it won't be visible from an administrator point of 
as ldconfig is not that often run manually.

Regards,
Aurelien
  
Aurelien Jarno Dec. 13, 2017, 8:57 p.m. UTC | #3
On 2017-12-02 00:40, Aurelien Jarno wrote:
> On 2017-12-01 23:46, Rafal Luzynski wrote:
> > 30.11.2017 23:43 Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > A quick brainstorm:
> > >
> > > * Do nothing. A minimal number of users have broken systems because of
> > > collation
> > > in their locale breaks certain packages adding known to sort incorrectly
> > > entries
> > > from their distribution for ld.so.conf.d/.
> > 
> > Doing nothing is another good solution IMHO.  Do we know about any actual
> > combination of packages/config files/locales which is broken?
> > 
> > > [...] We really should switch to C.UTF-8, but we don't have
> > > this upstream yet (working on it).
> > 
> > I agree with this.  Carlos, if you are working on C.UTF-8 is it likely that
> > you finish it during this development cycle?  Can we wait until you finish?
> > 
> > 27.11.2017 18:02 Florian Weimer <fweimer@redhat.com> wrote:
> > > [...]
> > > We can also enumerate twice and warn if there are any differences.
> > 
> > I think it's likely (very low probability but nonzero) that a difference
> > exists but very unlikely that it does actually matter.  What should
> > an administrator do when he/she sees such a warning?  Rename the files?
> > What should a casual user do?  An owner of a consumer device?
> 
> I am even afraid that it won't be visible from an administrator point of 
> as ldconfig is not that often run manually.

Looking at all the good work done on locales for the last few versions
and also planned for 2.27, there have already been and there will be
disruptive changes to collation in locales that cause the same kind of
issues than switching the collation to C in ldconfig. We haven't emitted
any warning about that in ldconfig. Usually people are actually rather
happy that the collation finally matches the rules they have learned
about their language.

If there are no objections, I would therefore like to commit this patch
(with the English fixes suggested by Rafal) for 2.27. I still consider
adding an entry to NEWS and packaging change doc worthwhile.
  
Rafal Luzynski Dec. 13, 2017, 11:10 p.m. UTC | #4
13.12.2017 21:57 Aurelien Jarno <aurelien@aurel32.net> wrote:
> [...]
> If there are no objections, I would therefore like to commit this patch
> (with the English fixes suggested by Rafal) for 2.27. I still consider
> adding an entry to NEWS and packaging change doc worthwhile.

Actually I suggested switching to C.UTF-8 if it's going to be available
in this development cycle, otherwise indeed please switch to C and then
we'll switch to C.UTF-8 when it is available ever in the future.

No other objections from me.

Regards,

Rafal
  
Dmitry V. Levin Dec. 13, 2017, 11:29 p.m. UTC | #5
On Wed, Dec 13, 2017 at 09:57:27PM +0100, Aurelien Jarno wrote:
[...]
> If there are no objections, I would therefore like to commit this patch
> (with the English fixes suggested by Rafal) for 2.27. I still consider
> adding an entry to NEWS and packaging change doc worthwhile.

The patch is fine, please commit.
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 8c042b8be6..2899c77fda 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@ 
+2017-11-30  Aurelien Jarno  <aurelien@aurel32.net>
+
+	[BZ #22505]
+	* elf/ldconfig.c (main): Call setlocale to force LC_COLLATE to C.
+
 2017-11-30  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/m68k/m680x0/fpu/s_llrint.c: Include
diff --git a/NEWS b/NEWS
index f3fdf9aec5..df699c6997 100644
--- a/NEWS
+++ b/NEWS
@@ -43,6 +43,10 @@  Major new features:
 
 * glibc now implements the memfd_create and mlock2 functions on Linux.
 
+* The ldconfig utility now process `include' directives using the C/POSIX
+  collation ordering.  Previous glibc versions used locale-specific
+  ordering, the change might break systems that relied on that.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * On GNU/Linux, the obsolete Linux constant PTRACE_SEIZE_DEVEL is no longer
diff --git a/elf/ldconfig.c b/elf/ldconfig.c
index 89042351f8..2c01ab567b 100644
--- a/elf/ldconfig.c
+++ b/elf/ldconfig.c
@@ -1259,6 +1259,10 @@  main (int argc, char **argv)
   /* Set locale via LC_ALL.  */
   setlocale (LC_ALL, "");
 
+  /* But keep the C collation.  That way `include' directives using
+     globbing patterns are processed in a locale-independent order.  */
+  setlocale (LC_COLLATE, "C");
+
   /* Set the text message domain.  */
   textdomain (_libc_intl_domainname);