[ping] Different outputs affected by locale

Message ID 538FE412.1050806@codesourcery.com
State Superseded

Commit Message

Yao Qi June 5, 2014, 3:29 a.m. UTC
  On 06/05/2014 04:23 AM, Pedro Alves wrote:
>> > I am not really a great standards lawyer but my first reaction is that
>> > mingw's C locale is not conforming.  At least from:
>> > 
>> >     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>> > 
>> > .. it seems to me that \242 is not defined as a 'print' character in the
>> > LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>> > trust my own reading of that text.
> I wonder whether this is really a mingw issue, or whether this is a
> remote host testing issue.  That is, aren't we setting LC_CTYPE
> on the _build_ (where expect runs), not on the host (mingw, through

This is neither a mingw issue nor a remote host testing issue.  If
LC_CTYPE isn't set properly on the host, these tests will fail, even in
native testing.

> ssh)?  Is LC_CTYPE really being propagated to the host?

No, setting env variables on host or target in dejagnu isn't trivial to
me.

> Does testing GDB manually directly on a Windows console show the same
> issue?

Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
However, I didn't investigate why 'ó' is printed.

(gdb) p repeat
$1 = L"A", 'ó' <repeats 21 times>, "B\000\xffff\200\000\x1370\500\xfe0c\"\x300\x
7ffe\xfe98\"\xe115\x771b\x67c9\x42c8\xfffe\xffff\x6d91\x7726\x1ae0@\xeb0:\x300\x
7ffe\xea8:\200\000Ω\000\xf480\x7594\000:\000\000\xf489\x7594\017\000\004\000Ω\00
0\xfe9c\"\x6094\x771e\xa2ac\x771f\xffff\xffff$\000\xfe98\"\004\000\000\000\x559\
xc000\xfea8\"\xf600\x7594\000\000\000\000\000\000\xfebc\"\xa442\x7594\x2a8\x759e
\xfefc\"\xf4d2\x7594\b\000\x118e\x7595\x1162\x7595\x8ccb\x3e13\000\000\000\000\0
00\000\x1ae0@\xfed0\"\x8fe3\x759b\xffc4"

Here is the updated patch, which matches either \242 or the cent sign.
  

Comments

Pedro Alves June 5, 2014, 8:58 a.m. UTC | #1
On 06/05/2014 04:29 AM, Yao Qi wrote:
> On 06/05/2014 04:23 AM, Pedro Alves wrote:
>>>> I am not really a great standards lawyer but my first reaction is that
>>>> mingw's C locale is not conforming.  At least from:
>>>>
>>>>     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>>>>
>>>> .. it seems to me that \242 is not defined as a 'print' character in the
>>>> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>>>> trust my own reading of that text.
>> I wonder whether this is really a mingw issue, or whether this is a
>> remote host testing issue.  That is, aren't we setting LC_CTYPE
>> on the _build_ (where expect runs), not on the host (mingw, through
> 
> This is neither a mingw issue nor a remote host testing issue.  

But that's a conflicting answer.  It's a remote host testing issue
if this only triggers with remote host testing.

> If the
> LC_CTYPE isn't set properly on host, these tests will fail, even in the
> native testing.

Sure, but it's supposed to be set, and then tests can assume so.
If not set in some circumstance, then it's a bug in the test
infrastructure, not the test.  For native testing, those are
set by gdb.exp:gdb_init.

> 
>> ssh)?  Is LC_CTYPE really being propagated to the host?
> 
> No, setting env variables on host or target in dejagnu isn't trivial to
> me.

They need to be passed down explicitly in the ssh command line:

$ ssh localhost "FOO=1 env | grep FOO"
FOO=1

> 
>> Does testing GDB manually directly on a Windows console show the same
>> issue?
> 
> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
> However, I didn't investigate why 'ó' is printed.

But was that with LC_CTYPE set to C?
  
Yao Qi June 5, 2014, 9:56 a.m. UTC | #2
On 06/05/2014 04:58 PM, Pedro Alves wrote:
>> This is neither a mingw issue nor a remote host testing issue.  
> But that's a conflicting answer.  It's a remote host testing issue
> if this only triggers with remote host testing.
> 

OK, it is a remote host testing issue, since LC_CTYPE is set on build
only.

>> > If the
>> > LC_CTYPE isn't set properly on host, these tests will fail, even in the
>> > native testing.
> Sure, but it's supposed to be set, and then tests can assume so.
> If not set in some circumstance, then it's a bug in the test
> infrastructure, not the test.  For native testing, those are
> set by gdb.exp:gdb_init.
> 
>> > 
>>> >> ssh)?  Is LC_CTYPE really being propagated to the host?
>> > 
>> > No, setting env variables on host or target in dejagnu isn't trivial to
>> > me.
> They need to be passed down explicitly in the ssh command line:
> 
> $ ssh localhost "FOO=1 env | grep FOO"
> FOO=1
> 

Yes, it is simple to pass an env variable through ssh, but it isn't trivial
to pass one to the host or target in dejagnu, because:

 - ssh is not the only connection dejagnu supports.  How about telnet?
 - env variables should be bound to the board.  Host and target can have
   different env vars.

I saw Jie's patch to set env vars on the target
http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
but we need to do more than that, IMO.  That is the reason I am inclined to
fix the test case instead of the infrastructure (dejagnu).

>> > 
>>> >> Does testing GDB manually directly on a Windows console show the same
>>> >> issue?
>> > 
>> > Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>> > However, I didn't investigate why 'ó' is printed.
> But was that with LC_CTYPE set to C?

I don't know how to check LC_CTYPE on Windows. :(
  
Pedro Alves June 5, 2014, 10:12 a.m. UTC | #3
On 06/05/2014 10:56 AM, Yao Qi wrote:
> On 06/05/2014 04:58 PM, Pedro Alves wrote:

>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>> issue?
>>>>
>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>> However, I didn't investigate why 'ó' is printed.
>> But was that with LC_CTYPE set to C?
> 
> I don't know how to check LC_CTYPE on Windows. :(

Try "set", and "set /?".
  
Pedro Alves June 5, 2014, 10:27 a.m. UTC | #4
On 06/05/2014 10:56 AM, Yao Qi wrote:

> Yes, it is simple to pass env variable through ssh, but isn't trivial to
> pass env variable to host or target in dejagnu, because,
> 
>  - ssh is not the only connection dejagnu supports, how about telnet?

Well, nobody really uses that for _host_ connections.

>  - env variable should bind to board.  host and target can have
> different env vars.
> 
> I saw Jie's patch to set env var on target
> http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
> but we need do more than that, IMO.  That is the reason I am inclined to
> fix the test case instead of the infrastructure (dejagnu).

In practice, all real host board files will have a ${board}_spawn
override anyway.  We can set GDB's vars in a gdb_env array, similar
to Jie's patch, and then the ${board}_spawn routine can pass them
to $RSH.  When/if Jie's patch is extended to bind to board, and
accepted upstream, we just set the appropriate new board var to $gdb_env.
  
Eli Zaretskii June 5, 2014, 2:47 p.m. UTC | #5
> Date: Thu, 5 Jun 2014 11:29:22 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org>
> 
> However, I didn't investigate why 'ó' is printed.

'ó' is 243 decimal (0363 octal), not 242.
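
For reference, the \242 in the test pattern is an octal escape, so the
two numbers above are in different bases.  Octal 242 is 162 decimal
(0xA2, the cent sign in Latin-1 and CP1252), while 'ó' is octal 363.
A minimal sketch of the conversion, where octal_escape_value is a
hypothetical helper and not anything in GDB:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert the digits of an octal escape, e.g. "242" taken from "\242",
   to the character code they denote.  Hypothetical helper, for
   illustration only.  */
unsigned long
octal_escape_value (const char *digits)
{
  assert (digits != NULL);
  return strtoul (digits, NULL, 8);
}
```

One possible explanation for the 'ó', not confirmed anywhere in this
thread: in the OEM code pages 437 and 850 that Windows consoles commonly
use, byte 0xA2 is displayed as 'ó', so a cent sign written as the single
byte 0xA2 would show up that way.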
  
Eli Zaretskii June 5, 2014, 3:04 p.m. UTC | #6
> Date: Thu, 05 Jun 2014 11:12:50 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> > I don't know how to check LC_CTYPE on Windows. :(
> 
> Try "set", and "set /?".

Typing "set LC_CTYPE" will either display its value or say that it is
not defined.
  
Yao Qi June 9, 2014, 8:35 a.m. UTC | #7
On 06/05/2014 06:12 PM, Pedro Alves wrote:
> On 06/05/2014 10:56 AM, Yao Qi wrote:
>> On 06/05/2014 04:58 PM, Pedro Alves wrote:
> 
>>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>>> issue?
>>>>>
>>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>>> However, I didn't investigate why 'ó' is printed.
>>> But was that with LC_CTYPE set to C?
>>
>> I don't know how to check LC_CTYPE on Windows. :(
> 
> Try "set", and "set /?".
> 

LC_CTYPE isn't set on the Windows machine I am using.  I ran
'set LC_CTYPE=C', but the output is unchanged.

I dug into the locale code and found something more: in
main.c:captured_main, gdb does

#if defined (HAVE_SETLOCALE)
  setlocale (LC_CTYPE, "");
#endif

the man page of setlocale says

If locale is "", each part of the locale that should be modified is set
according to the environment variables.

That is why we can pass env var to change gdb's locale.

However, it looks like setlocale on Windows behaves differently when
locale is "".  The MSDN page on setlocale
<http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
points to an empty string, the locale is the implementation-defined
native environment.", but it doesn't say much about the
"implementation-defined native environment".  The following example
on the same page gives me some hints,

setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code page
obtained from the operating system.

As far as I can see, Windows doesn't consider any env var with
setlocale(FOO, "").  If I am correct, we can't set gdb's locale by
setting an env var; instead, we have to match all the possibilities in
the testcase.  WDYT?
  
Pedro Alves June 9, 2014, 10:11 a.m. UTC | #8
On 06/09/2014 09:35 AM, Yao Qi wrote:

> LC_CTYPE isn't set on the Windows machine I am using.  set LC_CTYPE=C,
> but the output is unchanged.
> 
> I dive into locale stuff, and find something more, in
> main.c:captured_main, gdb does
> 
> #if defined (HAVE_SETLOCALE)
>   setlocale (LC_CTYPE, "");
> #endif
> 
> the man page of setlocale says
> 
> If locale is "", each part of the locale that should be modified is set
> according to the environment variables.
> 
> That is why we can pass env var to change gdb's locale.
> 
> However, looks setlocale on Windows behaves differently when locale is
> "".  The msdn about setlocale
> <http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
> points to an empty string, the locale is the implementation-defined
> native environment.", but it doesn't say much on the
> "implementation-defined native environment".  The following example
> in the same page gives me some hints,
> 
> setlocale( LC_ALL, "" );
> Sets the locale to the default, which is the user-default ANSI code page
> obtained from the operating system.
> 
> As far as I can see, windows doesn't consider any env var with
> setlocale(FOO, "").  

Correct.

> If I am correct, we can't set gdb's locale by means
> of setting env var, 

Not true.  It just means that GDB should be doing more
on native Windows, instead of assuming setlocale on Windows
behaves like the POSIX counterpart.  See e.g.,
src/intl/localename.c  (gettext):

...
  /* Let the user override the system settings through environment
     variables, as on POSIX systems.  */
  retval = getenv ("LC_ALL");
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv (categoryname);
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv ("LANG");
  if (retval != NULL && retval[0] != '\0')
    return retval;

  /* Use native Win32 API locale ID.  */
  lcid = GetThreadLocale ();
...

etc.

But that code has evolved upstream, and we have the solution
already in gnulib.  See:

http://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00154.html

Newer versions of intl/gettext override setlocale like that too:

 http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/setlocale.c

> instead, we have to match all the possibilities in
> the testcase.  WDYT?

I think the test caught a real GDB bug on Windows, and we
should fix GDB to make it look at the environment variables,
as is expected of GNU programs.  And that the best way
to handle this is to import the gnulib setlocale module.
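
The idea behind such a setlocale override can be sketched as below.
This only illustrates the lookup order from the localename.c excerpt
above; it is not the gnulib implementation, which as far as I know also
has to translate POSIX-style locale names into names the Windows
runtime accepts.  pick_locale_name and locale_name_from_env are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return nonzero if S is a non-empty string.  */
static int
is_nonempty (const char *s)
{
  return s != NULL && s[0] != '\0';
}

/* Given the values of LC_ALL, the category variable and LANG (any of
   which may be NULL or empty), return the one that takes precedence
   under the POSIX rules, or NULL if none is usable.  */
const char *
pick_locale_name (const char *lc_all, const char *category,
                  const char *lang)
{
  if (is_nonempty (lc_all))
    return lc_all;
  if (is_nonempty (category))
    return category;
  if (is_nonempty (lang))
    return lang;
  return NULL;
}

/* Look the three variables up in the environment.  CATEGORY_VAR is
   e.g. "LC_CTYPE".  This assumes, as holds on common hosts, that an
   earlier getenv result stays valid across later getenv calls.  */
const char *
locale_name_from_env (const char *category_var)
{
  assert (category_var != NULL);
  return pick_locale_name (getenv ("LC_ALL"), getenv (category_var),
                           getenv ("LANG"));
}
```

A caller could then pass the returned name to setlocale (LC_CTYPE, name)
when it is non-NULL, and fall back to setlocale (LC_CTYPE, "") otherwise.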
  
Yao Qi June 11, 2014, 2:20 a.m. UTC | #9
On 06/09/2014 06:11 PM, Pedro Alves wrote:
> I think the test caught a real GDB bug on Windows, and we
> should fix GDB to make it look at the environment variables,
> as is expected of GNU programs.  And that the best way
> to handle this is to import the gnulib setlocale module.

I've started importing the setlocale module, but while working on it I did
some experiments, and the results confuse me.

We import setlocale so that we can set the locale through env vars, assuming
that different locales affect the return value of iswprint (0xa2).
However, this assumption isn't true on Windows :(

I wrote the following program to check the return value of iswprint
under different locales.

On Linux, the output is reasonable
$ ./iswprint
4
C: 0
en_US.UTF-8: 1
C: 0

On Windows, iswprint always returns true!
C:\>iswprint.win.exe
2
C: 16
English_United States.1252: 16
C: 16

iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
iswprint (0xa2) behaves differently on Windows and Linux.
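
The program itself is not part of the archived message.  A probe along
the following lines would produce comparable results; it is a sketch
under my own assumptions, not the program that was actually run.
printable_in_locale is a hypothetical name, and the leading 4 and 2 in
the outputs above look like sizeof (wchar_t), which is also a guess.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/* Switch LC_CTYPE to the locale NAME and report whether WC is
   printable there: 1 if printable, 0 if not, -1 if the locale cannot
   be selected on this host.  LC_CTYPE is left set to NAME.  */
int
printable_in_locale (const char *name, wint_t wc)
{
  assert (name != NULL);
  if (setlocale (LC_CTYPE, name) == NULL)
    return -1;
  /* iswprint only promises zero or nonzero (the Windows run above
     printed 16), so normalize the result to 0 or 1.  */
  return iswprint (wc) != 0;
}
```

Calling it with "C", then a UTF-8 locale, then "C" again for 0xa2 and
printing each result would mirror the three lines of output shown.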
  
Eli Zaretskii June 11, 2014, 4:22 p.m. UTC | #10
> Date: Wed, 11 Jun 2014 10:20:28 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,	<gdb-patches@sourceware.org>
> 
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(
> 
> I write the following program to check the return value of iswprint
> under different locales.
> 
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
> 
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16
> C: 16
> 
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.

Why do you need 0xa2 to be unprintable?
  
Yao Qi June 12, 2014, 12:46 a.m. UTC | #11
On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> Why do you need 0xa2 to be unprintable?

The test in gdb.base/wchar.exp expects 0xa2 to be unprintable.

set cent "\\\\242"
gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"

but it is printable on mingw, which causes several fails in wchar.exp.  At
first I thought this was caused by the locale, but the later experiment
shows that setting the locale doesn't change anything (so the subject has
become misleading).

I should change the subject to "Different output affected by host", and
probably go back to the patch that relaxes the pattern to

set cent "(\\\\242|\u00A2)"
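
What the relaxed pattern accepts can be tried outside the testsuite with
POSIX regular expressions.  This is only a sketch: the real test is Tcl
run under DejaGnu, matches_repeat_output is a hypothetical name, and the
sketch assumes the cent sign arrives UTF-8 encoded (bytes 0xC2 0xA2),
which depends on the host and may not hold everywhere.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains the repeated element printed either as the
   octal escape \242 or as a UTF-8 cent sign, 0 if it does not, and -1
   if the pattern fails to compile.  */
int
matches_repeat_output (const char *line)
{
  /* ERE alternation: the four characters \242, or bytes 0xC2 0xA2.  */
  static const char pattern[] =
    "'(\\\\242|\xc2\xa2)' <repeats 21 times>";
  regex_t re;
  int status;

  assert (line != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  status = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

With this pattern the 'ó' output from the Windows console earlier in the
thread would still not match.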
  
Eli Zaretskii June 12, 2014, 2:46 a.m. UTC | #12
> Date: Thu, 12 Jun 2014 08:46:23 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
> 	<gdb-patches@sourceware.org>
> 
> On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> > Why do you need 0xa2 to be unprintable?
> 
> Test in gdb.base/wchar.exp expects 0xa2 being unprintable.

So you need _any_ character for which iswprint returns zero?  If so,
does the character have to be a single byte?
  
Yao Qi June 12, 2014, 7:02 a.m. UTC | #13
On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> So you need _any_ character for which iswprint returns zero?  If so,
> does the character have to be a single byte?

Finding a character for which iswprint returns zero isn't the point, IMO.
The problem is that wchar.exp expects "\242" but GDB prints the cent sign on
mingw.  Instead of changing to another character, isn't it better to match
both (\242 and the cent sign) in the regexp pattern?
  
Pedro Alves June 12, 2014, 11:36 a.m. UTC | #14
On 06/11/2014 03:20 AM, Yao Qi wrote:
> On 06/09/2014 06:11 PM, Pedro Alves wrote:
>> I think the test caught a real GDB bug on Windows, and we
>> should fix GDB to make it look at the environment variables,
>> as is expected of GNU programs.  And that the best way
>> to handle this is to import the gnulib setlocale module.
> 
> I've started setlocale module import, but during the work, I did some
> experiments and the result is confusing me.
> 
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(

Well, it actually is.

> 
> I write the following program to check the return value of iswprint
> under different locales.
> 
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
> 
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16

This shows that what happens is that on Windows the LC_CTYPE=C picks
up the CP-1252 Windows code page (Latin 1), an extended ASCII
code page.  And in that code page, 162 is printable.

> C: 16
> 
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.
> 

The difference is really in what locale/code page LC_CTYPE=C picks
up.

What does "show host-charset" show on Windows, before and after
you make GDB pick LC_CTYPE=C from the environment (with the
setlocale gnulib module)?

(Ideally, the wchar tests would actually iterate testing GDB
behaves as expected with different values of LC_CTYPE, etc. set
in the environment.  With all other tests assuming ASCII as set
by default by the testsuite framework.)
  
Yao Qi June 12, 2014, 2:37 p.m. UTC | #15
On 06/12/2014 07:36 PM, Pedro Alves wrote:
> What does "show host-charset" show on Windows, before and after
> you make GDB pick LC_CTYPE=C from the environment (with the
> setlocale gnulib module)?

GDB on Windows gets host charset from GetACP(), in
charset.c:_initialize_charset ().

#elif defined (USE_WIN32API)
  {
    /* "CP" + x<=5 digits + paranoia.  */
    static char w32_host_default_charset[16];

    snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
	      "CP%d", GetACP());
    auto_host_charset_name = w32_host_default_charset;
    auto_target_charset_name = auto_host_charset_name;
  }
#endif

GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
host-charset in GDB.  However, I do this:

  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "");
  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "C");
  printf ("%d\n", GetACP());

On my Windows machine, 1252 is printed three times.
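
The charset name that "show host-charset" reports on Windows is just the
code page number with a "CP" prefix, as in the charset.c excerpt above.
Since GetACP exists only on Windows, the sketch below takes the number
as a parameter; format_codepage_name is a hypothetical helper, not a GDB
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the Windows code page number CODEPAGE as a charset name such
   as "CP1252" into BUF, which holds SIZE bytes.  Return 0 on success,
   -1 if the name does not fit.  */
int
format_codepage_name (char *buf, size_t size, unsigned int codepage)
{
  int n;

  assert (buf != NULL && size > 0);
  n = snprintf (buf, size, "CP%u", codepage);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

For a GetACP result of 1252 this yields "CP1252" regardless of what
setlocale was called with, which matches the three identical values
printed above.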

> 
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

On the condition that we know or enumerate the expected output for
wchars under each LC_CTYPE on different host (or OS).  Test like this
is out of the scope of GDB (or debugger) testing, IMO.
  
Eli Zaretskii June 12, 2014, 5:02 p.m. UTC | #16
> Date: Thu, 12 Jun 2014 15:02:57 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
> 	<gdb-patches@sourceware.org>
> 
> On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> > So you need _any_ character for which iswprint returns zero?  If so,
> > does the character have to be a single byte?
> 
> Find a character for which iswprint returns zero isn't the point, IMO.
> The problem is wchar.exp expects "\242" but GDB prints cent sign on
> mingw.  Instead of changing to another character, isn't better to match
> both (\242 and cent sign) in regexp pattern?

Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
almost nothing about the test suite.)
  
Eli Zaretskii June 12, 2014, 5:07 p.m. UTC | #17
> Date: Thu, 12 Jun 2014 22:37:38 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,	<gdb-patches@sourceware.org>
> 
> GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
> host-charset in GDB.

Indeed, it doesn't.

> However, I do this:
> 
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
> 
> On my Windows machine, 1252 is printed three times.

As expected: GetACP returns the _default_ codepage, and the default
does not change when you change a locale.  And the iswprint function
doesn't consult the default codepage.  So I don't think this issue
with GetACP is at all relevant.
  
Eli Zaretskii June 12, 2014, 5:08 p.m. UTC | #18
> Date: Thu, 12 Jun 2014 12:36:29 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

What do you mean by "behaves as expected"?  And why is LC_CTYPE
important here?
  
Pedro Alves June 12, 2014, 5:23 p.m. UTC | #19
On 06/12/2014 03:37 PM, Yao Qi wrote:
> On 06/12/2014 07:36 PM, Pedro Alves wrote:
>> What does "show host-charset" show on Windows, before and after
>> you make GDB pick LC_CTYPE=C from the environment (with the
>> setlocale gnulib module)?
> 
> GDB on Windows gets host charset from GetACP(), in
> charset.c:_initialize_charset ().
> 
> #elif defined (USE_WIN32API)
>   {
>     /* "CP" + x<=5 digits + paranoia.  */
>     static char w32_host_default_charset[16];
> 
>     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
> 	      "CP%d", GetACP());
>     auto_host_charset_name = w32_host_default_charset;
>     auto_target_charset_name = auto_host_charset_name;
>   }
> #endif
> 

I note gnulib's nl_langinfo replacement actually does
the same thing.

> GetACP doesn't depend on locale, 

Yeah, it's a mess, and those are really different
things.  The former is the system locale, while the latter is
the user locale.  MSDN is confusing, but there are lots of blogs around
explaining this.

> so I don't think LC_CTYPE=C affects the
> host-charset in GDB.  However, I do this:
> 
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
> 
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
> 
> On my Windows machine, 1252 is printed three times.

So what I'm thinking is indeed going with making the test
accept the cent, but conditioned, like:

# Fall back to assuming 7-bit ASCII.  Tests are run under LC_CTYPE=C.

set cent "\\\\242"

set test "show host-charset"
gdb_test_multiple $test $test {
   -re "CP1252\r\n$gdb_prompt $" {
        # With Windows code page 1252 (Latin 1), the cent
        # is printable.
	set cent "\u00A2"
	pass $test
   }
   -re "$gdb_prompt $" {
	pass $test
   }
}

> 
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
> 
> On the condition that we know or enumerate the expected output for
> wchars under each LC_CTYPE on different host (or OS).  Test like this
> is out of the scope of GDB (or debugger) testing, IMO.

Not an exhaustive test, and not by host, but just by picking a couple
charsets/locales.  So that we at least ensure that the framework is
all in sync.  That is, check:

$ unset LC_CTYPE; gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=XXX gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=en_US gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
$ LC_CTYPE=en_US.UTF-8 gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
  
Pedro Alves June 12, 2014, 5:26 p.m. UTC | #20
On 06/12/2014 06:08 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 12:36:29 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
> 
> What do you mean by "behaves as expected"?  And why is LC_CTYPE
> important here?

I think I've answered this in my response to Yao.
  
Eli Zaretskii June 12, 2014, 5:48 p.m. UTC | #21
> Date: Thu, 12 Jun 2014 18:23:34 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>,        gdb-patches@sourceware.org
> 
> On 06/12/2014 03:37 PM, Yao Qi wrote:
> > On 06/12/2014 07:36 PM, Pedro Alves wrote:
> >> What does "show host-charset" show on Windows, before and after
> >> you make GDB pick LC_CTYPE=C from the environment (with the
> >> setlocale gnulib module)?
> > 
> > GDB on Windows gets host charset from GetACP(), in
> > charset.c:_initialize_charset ().
> > 
> > #elif defined (USE_WIN32API)
> >   {
> >     /* "CP" + x<=5 digits + paranoia.  */
> >     static char w32_host_default_charset[16];
> > 
> >     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
> > 	      "CP%d", GetACP());
> >     auto_host_charset_name = w32_host_default_charset;
> >     auto_target_charset_name = auto_host_charset_name;
> >   }
> > #endif
> > 
> 
> I note gnulib's nl_langinfo replacement actually does
> the same thing.

And gnulib's nl_langinfo is wrong, btw, because one can use
'setlocale' to change the codeset, without any relation whatsoever to
the console encoding.  (I sent a fix for that to gnulib's list just
yesterday.)

> > GetACP doesn't depend on locale, 
> 
> Yeah, it's a mess, and those are really different
> things.  The former is the system locale, while the latter
> the user locale.

That's true, but that's not the important issue here.  The important
issue here is the fundamental difference between the Windows console
encoding and the current locale's codeset.  The former affects how
Windows writes to the console, and in most cases changing the console
codepage (e.g., with SetConsoleCP or SetConsoleOutputCP) is a futile
exercise, because all it does is cause garbled display.  The latter is
an important feature when you are dealing with programs that don't
intend using the codeset to display text to the user, but, for
example, to change the behavior of iswprint.

Using the console codepage when really the locale's codeset is needed
is only going to work for the default setup, not when you want or
need to change the locale.
  
Eli Zaretskii June 12, 2014, 5:49 p.m. UTC | #22
> Date: Thu, 12 Jun 2014 18:26:47 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,        gdb-patches@sourceware.org
> 
> > What do you mean by "behaves as expected"?  And why is LC_CTYPE
> > important here?
> 
> I think I've answered this in my response to Yao.

Not really, but you don't have to explain as long as the original
problem is solved.
  
Pedro Alves June 12, 2014, 6:05 p.m. UTC | #23
On 06/12/2014 06:49 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 18:26:47 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,        gdb-patches@sourceware.org
>>
>>> What do you mean by "behaves as expected"?  And why is LC_CTYPE
>>> important here?
>>
>> I think I've answered this in my response to Yao.
> 
> Not really, but you don't have to explain as long as the original
> problem is solved.

Trying again then.

The testsuite framework does, in gdb.exp:gdb_init:

    # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
    # messages as expected.
    setenv LC_ALL C
    setenv LC_CTYPE C
    setenv LANG C

... so that output is stable for everyone.

And if we do that, we miss making sure GDB works correctly
with locales/charsets other than C/ASCII on most hosts.

So I was just saying that IMO ideally we'd have tests that
make sure GDB prints what we think it should print when
LC_CTYPE (etc.) is set to something else, like e.g.,
en_US.UTF-8.

Does that answer the question?
  
Eli Zaretskii June 12, 2014, 6:34 p.m. UTC | #24
> Date: Thu, 12 Jun 2014 19:05:52 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>         gdb-patches@sourceware.org
> 
> Trying again then.

Thanks.

> The testsuite framework does, in gdb.exp:gdb_init:
> 
>     # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
>     # messages as expected.
>     setenv LC_ALL C
>     setenv LC_CTYPE C
>     setenv LANG C
> 
> ... so that output is stable for everyone.

With you so far.  But note that on Windows, even the above does not
guarantee "stable output", because the console codepage is not changed
by 'setlocale', and moreover, the Windows 'setlocale' doesn't pay
attention to environment variables.  So on Windows, these tests run in
the default system locale (because we call 'setlocale' with the 2nd
argument an empty string).

> And if we do that, we miss making sure GDB works correctly
> with locales/charsets other than C/ASCII on most hosts.

And here, "works correctly" means what? sets host-charset? or
something else?  Assuming the former below.

> So I was just saying that IMO ideally we'd have tests that
> make sure GDB prints what we think it should print when
> LC_CTYPE (etc.) is set to something else, like e.g.,
> en_US.UTF-8.

You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
(although there is a UTF-8 codepage, and Windows does support it in
general).  More importantly, since 'setlocale' on Windows disregards
the environment variables, you cannot change the host charset by
setting environment variables.  You must do that by a GDB command that
sets host-charset.
  
Pedro Alves June 16, 2014, 1:57 p.m. UTC | #25
On 06/12/2014 07:34 PM, Eli Zaretskii wrote:

> With you so far.  But note that on Windows, even the above does not
> guarantee "stable output", because the console codepage is not changed
> by 'setlocale', 

I guess the harness could run gdb under chcp 65001 or some such.

> You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> (although there is a UTF-8 codepage, and Windows does support it in
> general).  More importantly, since 'setlocale' on Windows disregards
> the environment variables, you cannot change the host charset by
> setting environment variables.  You must do that by a GDB command that
> sets host-charset.

See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .
  
Eli Zaretskii June 16, 2014, 3:40 p.m. UTC | #26
> Date: Mon, 16 Jun 2014 14:57:59 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>         gdb-patches@sourceware.org
> 
> On 06/12/2014 07:34 PM, Eli Zaretskii wrote:
> 
> > With you so far.  But note that on Windows, even the above does not
> > guarantee "stable output", because the console codepage is not changed
> > by 'setlocale', 
> 
> > I guess the harness could run gdb under chcp 65001 or some such.

You could, but it won't help, really.  It's a long story, but support
for UTF-8 on a Windows console is pathetic.  With enough trouble
(which will need source changes in GDB and in Readline), you might
have European characters displayed correctly, if you also change the
console font to Lucida Console.  But anything beyond European
characters simply cannot be displayed, because the font doesn't have
them.

> > You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> > (although there is a UTF-8 codepage, and Windows does support it in
> > general).  More importantly, since 'setlocale' on Windows disregards
> > the environment variables, you cannot change the host charset by
> > setting environment variables.  You must do that by a GDB command that
> > sets host-charset.
> 
> See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .

If you mean the last 2 sentences, then yes, using setlocale from
gnulib will fix that.  But the problem with UTF-8 as the charset isn't
(and AFAIK cannot be) solved by gnulib, because Windows simply does
not support codepage 65001 in its setlocale implementation (this is
documented in MSDN).
  
Pedro Alves June 16, 2014, 4:23 p.m. UTC | #27
On 06/16/2014 04:40 PM, Eli Zaretskii wrote:

>>> With you so far.  But note that on Windows, even the above does not
>>> guarantee "stable output", because the console codepage is not changed
>>> by 'setlocale', 
>>
>> I guess the harness could run gdb under chcp 65001 or some such.
> 
> You could, but it won't help, really.  It's a long story, but support
> for UTF-8 on a Windows console is pathetic.  With enough trouble
> (which will need source changes in GDB and in Readline), you might
> have European characters displayed correctly, if you also change the
> console font to Lucida Console.  But anything beyond European
> characters simply cannot be displayed, because the font doesn't have
> them.

OK, I was focusing more on the "stable output" aspect than
the specific codepage.
  
Yao Qi June 17, 2014, 1:01 a.m. UTC | #28
On 06/13/2014 01:02 AM, Eli Zaretskii wrote:
> Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
> almost nothing about the test suite.)

Eli,
This test was added by the following patch, which fixes the
incorrect placement of the comma in repeated characters:

 [RFA] gdb/14288
 https://sourceware.org/ml/gdb-patches/2012-08/msg00780.html

The test wasn't about the \242-or-cent-sign-printing we discussed here.
  

Patch

diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..aa19d92 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -36,7 +36,10 @@  gdb_test "print simple\[2\]" "= 99 L'c'"
 
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
-set cent "\\\\242"
+# The contents in 'repeat' are shown differently under different
+# locale.  We match all the possible outputs here, '\242' or cent sign.
+set cent "(\\\\242|\u00A2)"
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex