intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
Commit Message
The previous commit was incomplete: gettext() still returns a translation
if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
prohibits the translation also in this case.
* gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
like the C locale.
---
intl/dcigettext.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
Comments
* Bruno Haible:
> The previous commit was incomplete: gettext() still returns a translation
> if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
> prohibits the translation also in this case.
>
> * gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
> like the C locale.
> ---
> intl/dcigettext.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/intl/dcigettext.c b/intl/dcigettext.c
> index 27063886d2..fb69bbf94b 100644
> --- a/intl/dcigettext.c
> +++ b/intl/dcigettext.c
> @@ -691,9 +691,10 @@ DCIGETTEXT (const char *domainname, const char *msgid1, const char *msgid2,
> continue;
> }
>
> - /* If the current locale value is C (or POSIX) we don't load a
> - domain. Return the MSGID. */
> - if (strcmp (single_locale, "C") == 0
> + /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
> + we don't load a domain. Return the MSGID. */
> + if ((single_locale[0] == 'C'
> + && (single_locale[1] == '\0' || single_locale[1] == '.'))
> || strcmp (single_locale, "POSIX") == 0)
> break;
I wasn't sure if this is a bug. The implementation does not fall back to
translation, it just uses C as a message catalog. Do you consider this
a problem?
Thanks,
Florian
Florian Weimer wrote:
> > The previous commit was incomplete: gettext() still returns a translation
> > if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
> > prohibits the translation also in this case.
>
> I wasn't sure if this is a bug. The implementation does not fall back to
> translation, it just uses C as a message catalog. Do you consider this
> a problem?
Yes, I consider this a bug, for two reasons:
* The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 states
  "It shall be the C locale but with UTF-8 encodings."
  and
  "These will be the same as C... LC_MESSAGES"
  The C locale has the property that gettext() returns the msgid in all cases,
  regardless of what files are on disk and regardless of the values of any
  environment variables.
  If the C.UTF-8 locale has the property that gettext() returns the msgid only
  if there is no translation catalog at
  /usr/share/locale/C/LC_MESSAGES/<domain>.mo, it is *not* the same as
  "the C locale but with UTF-8 encodings".
* We have the rule that gettext() returns the msgid when the locale is the
  "C" locale, because
  - the POSIX standard specifies the precise output of some programs (e.g.
    'diff') in the C locale, and
  - from the beginning in 1995, we wanted gettext() to be usable in
    the source code of these programs without an explicit check for the
    locale.
  It is possible that, in the long run, POSIX adopts the C.UTF-8 locale,
  since several platforms already have it: glibc, musl libc, FreeBSD, NetBSD,
  OpenBSD, Cygwin, Android.
  When this happens, we want the maintainers of 'diff' etc. to be able to
  continue using gettext() without introducing an explicit check for the locale.
Bruno
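The property discussed in this thread (gettext() returns the msgid unchanged in the C locale) can be observed with a small helper. This is only a sketch: the name lookup_in_locale and the domain name used in the usage note are hypothetical and not part of the patch, and it assumes a libc that provides <libintl.h>.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: switch the whole process
   to LOCALE_NAME and look up MSGID in DOMAIN.  Returns NULL if the
   locale cannot be selected.  */
const char *
lookup_in_locale (const char *locale_name, const char *domain,
                  const char *msgid)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return NULL;

  /* In the "C" and "POSIX" locales no catalog is consulted, so the
     result is MSGID itself.  */
  return dgettext (domain, msgid);
}
```

Calling lookup_in_locale ("C", "some-domain", "Hello") is expected to yield "Hello" whatever is installed on disk. What the same call returns for "C.UTF-8" when a catalog exists at /usr/share/locale/C/LC_MESSAGES/<domain>.mo is exactly the point this patch changes, so it depends on whether the libc in use includes the fix.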
Patch
@@ -691,9 +691,10 @@ DCIGETTEXT (const char *domainname, const char *msgid1, const char *msgid2,
continue;
}
- /* If the current locale value is C (or POSIX) we don't load a
- domain. Return the MSGID. */
- if (strcmp (single_locale, "C") == 0
+ /* If the current locale value is "C" or "C.<encoding>" or "POSIX",
+ we don't load a domain. Return the MSGID. */
+ if ((single_locale[0] == 'C'
+ && (single_locale[1] == '\0' || single_locale[1] == '.'))
|| strcmp (single_locale, "POSIX") == 0)
break;
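
To make the new condition easier to reason about, it can be isolated as a standalone predicate. This is a minimal sketch: the function name is_untranslated_locale is hypothetical, and the real code keeps the test inline inside DCIGETTEXT rather than in a helper.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the condition in the patch: true for
   "C", for "C.<encoding>" (for example "C.UTF-8"), and for "POSIX".  */
bool
is_untranslated_locale (const char *single_locale)
{
  /* "C" alone, or "C" immediately followed by '.'.  The second
     character is only read when the first one is 'C', so the empty
     string is handled safely.  */
  if (single_locale[0] == 'C'
      && (single_locale[1] == '\0' || single_locale[1] == '.'))
    return true;

  return strcmp (single_locale, "POSIX") == 0;
}
```

Under this sketch, "C", "C.UTF-8" and "POSIX" are treated as untranslated, while names such as "CA" or "de_DE.UTF-8" are not. The check for "POSIX" remains an exact match, as in the patch, so a name like "POSIX.UTF-8" is not covered.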