intl: Treat C.UTF-8 locale like C locale (BZ# 16621)

Message ID 20221115005623.3774099-1-bruno@clisp.org
State Committed
Commit 2897b231a6b71ee17d47d3d63f1112b2641a476c
Delegated to: Florian Weimer
Series intl: Treat C.UTF-8 locale like C locale (BZ# 16621)

Checks

Context                 Check    Description
dj/TryBot-apply_patch   success  Patch applied to master at the time it was sent
dj/TryBot-32bit         success  Build for i686

Commit Message

Bruno Haible Nov. 15, 2022, 12:56 a.m. UTC
  The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
says that "Setting LC_ALL=C.UTF-8 will ignore LANGUAGE just like it
does with LC_ALL=C." This patch implements it.

* intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
like the C locale.
---
 intl/dcigettext.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer Sept. 4, 2023, 1:34 p.m. UTC | #1
* Bruno Haible:

> The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
> says that "Setting LC_ALL=C.UTF-8 will ignore LANGUAGE just like it
> does with LC_ALL=C." This patch implements it.
>
> * intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
> like the C locale.
> ---
>  intl/dcigettext.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/intl/dcigettext.c b/intl/dcigettext.c
> index 1fc074a414..6a3c248e68 100644
> --- a/intl/dcigettext.c
> +++ b/intl/dcigettext.c
> @@ -1564,8 +1564,12 @@ guess_category_value (int category, const char *categoryname)
>       2. The precise output of some programs in the "C" locale is specified
>  	by POSIX and should not depend on environment variables like
>  	"LANGUAGE" or system-dependent information.  We allow such programs
> -        to use gettext().  */
> -  if (strcmp (locale, "C") == 0)
> +        to use gettext().
> +     Ignore LANGUAGE and its system-dependent analogon also if the locale is
> +     set to "C.UTF-8" or, more generally, to "C.<encoding>", because that's
> +     the by-design behaviour for glibc, see
> +     <https://sourceware.org/glibc/wiki/Proposals/C.UTF-8>.  */
> +  if (locale[0] == 'C' && (locale[1] == '\0' || locale[1] == '.'))
>      return locale;
>  
>    /* The highest priority value is the value of the 'LANGUAGE' environment

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Fix pushed.  I've posted my test case as well:

  [PATCH] intl: Add test case for bug 16621
  <https://inbox.sourceware.org/libc-alpha/87o7iiukpt.fsf@oldenburg3.str.redhat.com/T/#u>

Thanks,
Florian
  
Bruno Haible Sept. 10, 2023, 7:10 p.m. UTC | #2
Florian Weimer wrote:
> > * intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
> > like the C locale.
> 
> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Fix pushed.

Thanks!

> I've posted my test case as well:
> 
>   [PATCH] intl: Add test case for bug 16621
>   <https://inbox.sourceware.org/libc-alpha/87o7iiukpt.fsf@oldenburg3.str.redhat.com/T/#u>

Now that the main patch is in glibc, I have also added it to GNU gettext, together with a
unit test. My unit test [1][2] happens to be stricter than what I tested manually in Dec. 2022:
it adds a .mo file at <LOCALEDIR>/C/LC_MESSAGES/<domain>.mo, and the test fails. A
second patch is needed, basically the same change at a different place in dcigettext.c.
I'm posting it separately.

Bruno

[1] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-tools/tests/intl-0;h=9977cfe2e5d645c3a20fbfe891974720aacb488d;hb=HEAD
[2] https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-tools/tests/intl-1-prg.c;h=cda076140b4d60d2a9535d4fa1d769f26c580c20;hb=HEAD
  

Patch

diff --git a/intl/dcigettext.c b/intl/dcigettext.c
index 1fc074a414..6a3c248e68 100644
--- a/intl/dcigettext.c
+++ b/intl/dcigettext.c
@@ -1564,8 +1564,12 @@  guess_category_value (int category, const char *categoryname)
      2. The precise output of some programs in the "C" locale is specified
 	by POSIX and should not depend on environment variables like
 	"LANGUAGE" or system-dependent information.  We allow such programs
-        to use gettext().  */
-  if (strcmp (locale, "C") == 0)
+        to use gettext().
+     Ignore LANGUAGE and its system-dependent analogon also if the locale is
+     set to "C.UTF-8" or, more generally, to "C.<encoding>", because that's
+     the by-design behaviour for glibc, see
+     <https://sourceware.org/glibc/wiki/Proposals/C.UTF-8>.  */
+  if (locale[0] == 'C' && (locale[1] == '\0' || locale[1] == '.'))
     return locale;
 
   /* The highest priority value is the value of the 'LANGUAGE' environment
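
For illustration only (this is not part of the patch): a minimal C sketch
that compares the old and the new condition from the hunk above on a few
locale names. The helper names is_c_locale_old, is_c_locale_new and
check_examples are hypothetical; in glibc the condition sits inline in
guess_category_value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the condition before the patch,
   an exact match on "C".  */
static bool
is_c_locale_old (const char *locale)
{
  return strcmp (locale, "C") == 0;
}

/* Hypothetical helper: the condition after the patch,
   "C" itself or "C.<encoding>".  */
static bool
is_c_locale_new (const char *locale)
{
  return locale[0] == 'C' && (locale[1] == '\0' || locale[1] == '.');
}

/* Hypothetical self-check, not taken from glibc or gettext.  */
static void
check_examples (void)
{
  /* Plain "C" is accepted by both conditions.  */
  assert (is_c_locale_old ("C") && is_c_locale_new ("C"));

  /* "C.UTF-8" is the case the patch adds.  */
  assert (!is_c_locale_old ("C.UTF-8") && is_c_locale_new ("C.UTF-8"));

  /* Any "C.<encoding>" is accepted; the encoding part is not validated.  */
  assert (is_c_locale_new ("C.ISO-8859-1"));

  /* Names that merely start with 'C', other locales, and the empty
     string are rejected.  */
  assert (!is_c_locale_new ("Cfoo"));
  assert (!is_c_locale_new ("de_DE.UTF-8"));
  assert (!is_c_locale_new ("POSIX"));
  assert (!is_c_locale_new (""));
}
```

Calling check_examples () from a test program's main should pass without an
assertion failure. The new condition only inspects the first two characters,
so it never reads past the terminating NUL of an empty or one-character
locale name.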