Message ID | 538FE412.1050806@codesourcery.com |
---|---|
State | Superseded |
Headers | Message-ID: <538FE412.1050806@codesourcery.com> Date: Thu, 5 Jun 2014 11:29:22 +0800 From: Yao Qi <yao@codesourcery.com> To: Pedro Alves <palves@redhat.com>, Tom Tromey <tromey@redhat.com> CC: Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org> Subject: Re: [ping] [PATCH] Different outputs affected by locale In-Reply-To: <538F803A.9020007@redhat.com> |
Commit Message
Yao Qi
June 5, 2014, 3:29 a.m. UTC
On 06/05/2014 04:23 AM, Pedro Alves wrote:
>>> I am not really a great standards lawyer but my first reaction is that
>>> mingw's C locale is not conforming.  At least from:
>>>
>>> http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>>>
>>> .. it seems to me that \242 is not defined as a 'print' character in the
>>> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>>> trust my own reading of that text.
> I wonder whether this is really a mingw issue, or whether this is a
> remote host testing issue.  That is, aren't we setting LC_CTYPE
> on the _build_ (where expect runs), not on the host (mingw, through

This is neither a mingw issue nor a remote host testing issue.  If
LC_CTYPE isn't set properly on the host, these tests will fail, even in
native testing.

> ssh)?  Is LC_CTYPE really being propagated to the host?

No, setting env variables on host or target in dejagnu isn't trivial to
me.

> Does testing GDB manually directly on a Windows console show the same
> issue?

Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
However, I didn't investigate why 'ó' is printed.

(gdb) p repeat
$1 = L"A", 'ó' <repeats 21 times>, "B\000\xffff\200\000\x1370\500\xfe0c\"\x300\x
7ffe\xfe98\"\xe115\x771b\x67c9\x42c8\xfffe\xffff\x6d91\x7726\x1ae0@\xeb0:\x300\x
7ffe\xea8:\200\000Ω\000\xf480\x7594\000:\000\000\xf489\x7594\017\000\004\000Ω\00
0\xfe9c\"\x6094\x771e\xa2ac\x771f\xffff\xffff$\000\xfe98\"\004\000\000\000\x559\
xc000\xfea8\"\xf600\x7594\000\000\000\000\000\000\xfebc\"\xa442\x7594\x2a8\x759e
\xfefc\"\xf4d2\x7594\b\000\x118e\x7595\x1162\x7595\x8ccb\x3e13\000\000\000\000\0
00\000\x1ae0@\xfed0\"\x8fe3\x759b\xffc4"

Here is the updated patch to match either \242 or the cent sign.
Comments
On 06/05/2014 04:29 AM, Yao Qi wrote:
> On 06/05/2014 04:23 AM, Pedro Alves wrote:
>>>> I am not really a great standards lawyer but my first reaction is that
>>>> mingw's C locale is not conforming.  At least from:
>>>>
>>>> http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
>>>>
>>>> .. it seems to me that \242 is not defined as a 'print' character in the
>>>> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
>>>> trust my own reading of that text.
>> I wonder whether this is really a mingw issue, or whether this is a
>> remote host testing issue.  That is, aren't we setting LC_CTYPE
>> on the _build_ (where expect runs), not on the host (mingw, through
>
> This is a not a mingw issue nor a remote host testing issue.

But that's a conflicting answer.  It's a remote host testing issue
if this only triggers with remote host testing.

> If the
> LC_CTYPE isn't set properly on host, these tests will fail, even in the
> native testing.

Sure, but it's supposed to be set, and then tests can assume so.
If not set in some circumstance, then it's a bug in the test
infrastructure, not the test.  For native testing, those are
set by gdb.exp:gdb_init.

>
>> ssh)?  Is LC_CTYPE really being propagated to the host?
>
> No, setting env variables on host or target in dejagnu isn't trivial to
> me.

They need to be passed down explicitly in the ssh command line:

 $ ssh localhost "FOO=1 env | grep FOO"
 FOO=1

>
>> Does testing GDB manually directly on a Windows console show the same
>> issue?
>
> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
> However, I didn't investigate why 'ó' is printed.

But was that with LC_CTYPE set to C?
On 06/05/2014 04:58 PM, Pedro Alves wrote:
>> This is a not a mingw issue nor a remote host testing issue.
> But that's a conflicting answer.  It's a remote host testing
> if this only triggers with remote host testing.
>

OK, it is a remote host testing issue, since LC_CTYPE is set on build only.

>> If the
>> LC_CTYPE isn't set properly on host, these tests will fail, even in the
>> native testing.
> Sure, but it's supposed to be set, and then tests can assume so.
> If not set in some circumstance, then it's a bug in the test
> infrastructure, not the test.  For native testing, those are
> set by gdb.exp:gdb_init.
>
>>
>>> ssh)?  Is LC_CTYPE really being propagated to the host?
>>
>> No, setting env variables on host or target in dejagnu isn't trivial to
>> me.
> They need to be passed down explicitly in the ssh command line:
>
>  $ ssh localhost "FOO=1 env | grep FOO"
>  FOO=1
>

Yes, it is simple to pass an env variable through ssh, but it isn't
trivial to pass an env variable to host or target in dejagnu, because:

 - ssh is not the only connection dejagnu supports; how about telnet?
 - env variables should bind to the board.  Host and target can have
   different env vars.

I saw Jie's patch to set env var on target
http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
but we need to do more than that, IMO.  That is the reason I am inclined to
fix the test case instead of the infrastructure (dejagnu).

>>
>>> Does testing GDB manually directly on a Windows console show the same
>>> issue?
>>
>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>> However, I didn't investigate why 'ó' is printed.
> But was that with LC_CTYPE set to C?

I don't know how to check LC_CTYPE on Windows. :(
On 06/05/2014 10:56 AM, Yao Qi wrote:
> On 06/05/2014 04:58 PM, Pedro Alves wrote:
>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>> issue?
>>>>
>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>> However, I didn't investigate why 'ó' is printed.
>> But was that with LC_CTYPE set to C?
>
> I don't know how check LC_CTYPE on Windows. :(

Try "set", and "set /?".
On 06/05/2014 10:56 AM, Yao Qi wrote:
> Yes, it is simple to pass env variable through ssh, but isn't trivial to
> pass env variable to host or target in dejagnu, because,
>
> - ssh is not the only connection dejagnu supports, how about telnet?

Well, nobody really uses that for _host_ connections.

> - env variable should bind to board.  host and target can have
>   different env vars.
>
> I saw Jie's patch to set env var on target
> http://lists.gnu.org/archive/html/dejagnu/2008-07/msg00000.html
> but we need do more than that, IMO.  That is the reason I am inclined to
> fix the test case instead of the infrastructure (dejagnu).

In practice, all real host board files will have a ${board}_spawn
override anyway.  We can set GDB's vars in a gdb_env array, similar
to Jie's patch, and then the ${board}_spawn routine can pass them to $RSH.
When/if Jie's patch is extended to bind to board, and accepted
upstream, we just set the appropriate new board var to $gdb_env.
> Date: Thu, 5 Jun 2014 11:29:22 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org>
>
> However, I didn't investigate why 'ó' is printed.

'ó' is 243 decimal (0363 octal), not 242.
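The two numbers compared here are written in different bases, which is easy to miss. The `\242` in the test pattern is an octal escape, that is 162 decimal or 0xa2 (U+00A2, the cent sign), while 'ó' is U+00F3, that is 243 decimal or `\363` octal. The small C sketch below prints one code point in all three notations so the values can be compared directly. The helper name `describe_code_point` is made up for this illustration and is not part of GDB. One possible explanation for the 'ó', which nobody in the thread confirms, is that the Windows console renders byte 0xa2 through an OEM code page such as CP437 or CP850, where that byte is displayed as 'ó'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GDB code: format CODE into BUF as
   "\<octal> = <decimal> = 0x<hex>".  Returns what snprintf returns,
   i.e. the number of characters the full text needs.  */
int
describe_code_point (unsigned int code, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "\\%o = %u = 0x%x", code, code, code);
}
```

With a 64-byte buffer, `describe_code_point (0xa2, buf, sizeof buf)` produces `\242 = 162 = 0xa2`, and the same call with `0xf3` produces `\363 = 243 = 0xf3`.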
> Date: Thu, 05 Jun 2014 11:12:50 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>
> > I don't know how check LC_CTYPE on Windows. :(
>
> Try "set", and "set /?".

Typing "set LC_CTYPE" will either display its value or say that it is
not defined.
On 06/05/2014 06:12 PM, Pedro Alves wrote:
> On 06/05/2014 10:56 AM, Yao Qi wrote:
>> On 06/05/2014 04:58 PM, Pedro Alves wrote:
>
>>>>>>> Does testing GDB manually directly on a Windows console show the same
>>>>>>> issue?
>>>>>
>>>>> Yes, here is the output I got on Windows 7 (running gdb.exe in Windows console).
>>>>> However, I didn't investigate why 'ó' is printed.
>>> But was that with LC_CTYPE set to C?
>>
>> I don't know how check LC_CTYPE on Windows. :(
>
> Try "set", and "set /?".
>

LC_CTYPE isn't set on the Windows machine I am using.  I set LC_CTYPE=C,
but the output is unchanged.

I dug into the locale stuff and found something more.  In
main.c:captured_main, gdb does

#if defined (HAVE_SETLOCALE)
  setlocale (LC_CTYPE, "");
#endif

and the man page of setlocale says

  If locale is "", each part of the locale that should be modified is set
  according to the environment variables.

That is why we can pass an env var to change gdb's locale.

However, it looks like setlocale on Windows behaves differently when
locale is "".  The MSDN page about setlocale
<http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
points to an empty string, the locale is the implementation-defined
native environment.", but it doesn't say much on the
"implementation-defined native environment".  The following example
on the same page gives me some hints,

  setlocale( LC_ALL, "" );
  Sets the locale to the default, which is the user-default ANSI code page
  obtained from the operating system.

As far as I can see, Windows doesn't consider any env var with
setlocale(FOO, "").  If I am correct, we can't set gdb's locale by means
of setting an env var; instead, we have to match all the possibilities in
the testcase.  WDYT?
On 06/09/2014 09:35 AM, Yao Qi wrote:
> LC_CTYPE isn't set on the Windows machine I am using.  set LC_CTYPE=C,
> but the output is unchanged.
>
> I dive into locale stuff, and find something more, in
> main.c:captured_main, gdb does
>
> #if defined (HAVE_SETLOCALE)
>   setlocale (LC_CTYPE, "");
> #endif
>
> the man page of setlocale says
>
>   If locale is "", each part of the locale that should be modified is set
>   according to the environment variables.
>
> That is why we can pass env var to change gdb's locale.
>
> However, looks setlocale on Windows behaves differently when locale is
> "".  The msdn about setlocale
> <http://msdn.microsoft.com/en-us/library/x99tb11d.aspx> says "If locale
> points to an empty string, the locale is the implementation-defined
> native environment.", but it doesn't say much on the
> "implementation-defined native environment".  The following example
> in the same page gives me some hints,
>
>   setlocale( LC_ALL, "" );
>   Sets the locale to the default, which is the user-default ANSI code page
>   obtained from the operating system.
>
> As far as I can see, windows doesn't consider any env var with
> setlocale(FOO, "").

Correct.

> If I am correct, we can't set gdb's locale by means
> of setting env var,

Not true.  It just means that GDB should be doing more on
native Windows, instead of assuming setlocale on Windows
behaves like the POSIX counterpart.

See e.g., src/intl/localename.c (gettext):

  ...
  /* Let the user override the system settings through environment
     variables, as on POSIX systems.  */
  retval = getenv ("LC_ALL");
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv (categoryname);
  if (retval != NULL && retval[0] != '\0')
    return retval;
  retval = getenv ("LANG");
  if (retval != NULL && retval[0] != '\0')
    return retval;

  /* Use native Win32 API locale ID.  */
  lcid = GetThreadLocale ();
  ...

etc.

But that code has evolved upstream, and we have the solution
already in gnulib.  See:

  http://lists.gnu.org/archive/html/bug-gnulib/2011-02/msg00154.html

Newer versions of intl/gettext override setlocale like that too:

  http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/setlocale.c

> instead, we have to match all the possibilities in
> the testcase.  WDYT?

I think the test caught a real GDB bug on Windows, and we
should fix GDB to make it look at the environment variables,
as is expected of GNU programs.  And that the best way
to handle this is to import the gnulib setlocale module.
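The lookup order in the localename.c excerpt (LC_ALL first, then the category's own variable, then LANG, skipping empty values) can be sketched in portable C as follows. This is a simplified illustration under stated assumptions, not the gnulib module: the names `nonempty_env`, `locale_name_from_env` and `setlocale_from_env` are hypothetical, and the real gnulib replacement also has to turn POSIX-style locale names into names the Windows runtime accepts, which this sketch does not attempt.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the value of environment variable NAME, or NULL when it is
   unset or set to the empty string.  */
const char *
nonempty_env (const char *name)
{
  const char *value = getenv (name);

  return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Hypothetical helper: return the locale name requested through the
   environment for one category, using the precedence
   LC_ALL > CATEGORY_NAME > LANG.  CATEGORY_NAME is the variable name,
   e.g. "LC_CTYPE".  Returns NULL when none of the three is set.  */
const char *
locale_name_from_env (const char *category_name)
{
  const char *value;

  assert (category_name != NULL);

  value = nonempty_env ("LC_ALL");
  if (value != NULL)
    return value;
  value = nonempty_env (category_name);
  if (value != NULL)
    return value;
  return nonempty_env ("LANG");
}

/* Hypothetical wrapper: like setlocale (CATEGORY, ""), but consult the
   environment explicitly instead of relying on the C runtime to do so.
   Falls back to the runtime's native default when nothing is set.  */
char *
setlocale_from_env (int category, const char *category_name)
{
  const char *name = locale_name_from_env (category_name);

  return setlocale (category, name != NULL ? name : "");
}
```

A caller would replace `setlocale (LC_CTYPE, "")` with `setlocale_from_env (LC_CTYPE, "LC_CTYPE")`. As with plain setlocale, a NULL return means the requested name was rejected and the locale was left unchanged.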
On 06/09/2014 06:11 PM, Pedro Alves wrote:
> I think the test caught a real GDB bug on Windows, and we
> should fix GDB to make it look at the environment variables,
> as is expected of GNU programs.  And that the best way
> to handle this is to import the gnulib setlocale module.

I've started the setlocale module import, but during the work, I did some
experiments and the result is confusing me.

We import setlocale so that we can set the locale through an env var,
assuming that different locales affect the return value of iswprint (0xa2).
However, this assumption isn't true on Windows :(

I wrote the following program to check the return value of iswprint
under different locales.

On Linux, the output is reasonable
$ ./iswprint
4
C: 0
en_US.UTF-8: 1
C: 0

On Windows, iswprint always returns true!
C:\>iswprint.win.exe
2
C: 16
English_United States.1252: 16
C: 16

The iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
iswprint (0xa2) behaves differently on Windows and Linux.
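The test program itself is not reproduced in this message. A minimal sketch of such an experiment might look like the code below. It assumes that the first output line is `sizeof (wchar_t)` and that each following line is a locale name with the raw iswprint return value; both are guesses from the output shown, not facts from the thread. The function names are hypothetical, and the locale names to pass differ per platform (for example `en_US.UTF-8` on glibc and `English_United States.1252` on Windows).

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: switch LC_CTYPE to LOCALE_NAME, store the raw
   return value of iswprint (WC) in *RESULT, then restore the previous
   LC_CTYPE locale.  Returns 1 on success, 0 if LOCALE_NAME could not
   be selected or memory ran out.  */
int
iswprint_in_locale (const char *locale_name, wint_t wc, int *result)
{
  const char *current;
  char *saved;
  int ok = 0;

  assert (locale_name != NULL && result != NULL);

  /* A later setlocale call may overwrite the returned string, so keep
     a private copy of the current locale name.  */
  current = setlocale (LC_CTYPE, NULL);
  if (current == NULL)
    return 0;
  saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return 0;
  strcpy (saved, current);

  if (setlocale (LC_CTYPE, locale_name) != NULL)
    {
      *result = iswprint (wc);
      ok = 1;
    }

  setlocale (LC_CTYPE, saved);
  free (saved);
  return ok;
}

/* Print sizeof (wchar_t), then one "name: value" line per locale.  */
void
report_iswprint (FILE *out, wint_t wc, const char *const *names,
                 size_t count)
{
  size_t i;

  fprintf (out, "%u\n", (unsigned int) sizeof (wchar_t));
  for (i = 0; i < count; i++)
    {
      int value;

      if (iswprint_in_locale (names[i], wc, &value))
        fprintf (out, "%s: %d\n", names[i], value);
      else
        fprintf (out, "%s: (locale not available)\n", names[i]);
    }
}
```

Calling `report_iswprint (stdout, 0xa2, names, 3)` with `names` set to `"C"`, `"en_US.UTF-8"`, `"C"` would correspond to the Linux run above. The C standard only promises a nonzero value for printable characters, so values such as 1 and 16 both mean "printable".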
> Date: Wed, 11 Jun 2014 10:20:28 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org>
>
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(
>
> I write the following program to check the return value of iswprint
> under different locales.
>
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
>
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16
> C: 16
>
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.

Why do you need 0xa2 to be unprintable?
On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> Why do you need 0xa2 to be unprintable?
Test in gdb.base/wchar.exp expects 0xa2 being unprintable.
set cent "\\\\242"
gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
but it is printable on mingw and causes several fails in wchar.exp.  At
the beginning, I thought this was caused by the locale, but the later
experiment shows that setting the locale doesn't change anything (so the
subject has become misleading).
I should change the subject to "Different output affected by host", and
probably go back to the patch that relaxes the pattern to
set cent "(\\\\242|\u00A2)"
> Date: Thu, 12 Jun 2014 08:46:23 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
>     <gdb-patches@sourceware.org>
>
> On 06/12/2014 12:22 AM, Eli Zaretskii wrote:
> > Why do you need 0xa2 to be unprintable?
>
> Test in gdb.base/wchar.exp expects 0xa2 being unprintable.

So you need _any_ character for which iswprint returns zero?  If so,
does the character have to be a single byte?
On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> So you need _any_ character for which iswprint returns zero?  If so,
> does the character have to be a single byte?

Finding a character for which iswprint returns zero isn't the point, IMO.
The problem is that wchar.exp expects "\242" but GDB prints the cent sign
on mingw.  Instead of changing to another character, isn't it better to
match both (\242 and the cent sign) in the regexp pattern?
On 06/11/2014 03:20 AM, Yao Qi wrote:
> On 06/09/2014 06:11 PM, Pedro Alves wrote:
>> I think the test caught a real GDB bug on Windows, and we
>> should fix GDB to make it look at the environment variables,
>> as is expected of GNU programs.  And that the best way
>> to handle this is to import the gnulib setlocale module.
>
> I've started setlocale module import, but during the work, I did some
> experiments and the result is confusing me.
>
> We import setlocale so that we can set locale through env var, assuming
> that different locales affect the return value of iswprint (0xa2).
> However, this assumption isn't true on Windows :(

Well, it actually is.

>
> I write the following program to check the return value of iswprint
> under different locales.
>
> On Linux, the output is reasonable
> $ ./iswprint
> 4
> C: 0
> en_US.UTF-8: 1
> C: 0
>
> On Windows, iswprint always return true!
> C:\>iswprint.win.exe
> 2
> C: 16
> English_United States.1252: 16

This shows that what happens is that on Windows the LC_CTYPE=C
picks up the CP-1252 Windows code page (Latin 1), an extended ASCII code
page.  And in that code page, 162 is printable.

> C: 16
>
> iswprint return value depends on LC_CTYPE, but under LC_CTYPE=C,
> iswprint (0xa2) behaves differently on Windows and Linux.
>

The difference is really in what locale/code page LC_CTYPE=C picks up.

What does "show host-charset" show on Windows, before and after
you make GDB pick LC_CTYPE=C from the environment (with the
setlocale gnulib module)?

(Ideally, the wchar tests would actually iterate testing GDB
behaves as expected with different values of LC_CTYPE, etc. set
in the environment.  With all other tests assuming ASCII as set
by default by the testsuite framework.)
On 06/12/2014 07:36 PM, Pedro Alves wrote:
> What does "show host-charset" show on Windows, before and after
> you make GDB pick LC_CTYPE=C from the environment (with the
> setlocale gnulib module)?

GDB on Windows gets the host charset from GetACP(), in
charset.c:_initialize_charset ().

#elif defined (USE_WIN32API)
  {
    /* "CP" + x<=5 digits + paranoia.  */
    static char w32_host_default_charset[16];

    snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
              "CP%d", GetACP());
    auto_host_charset_name = w32_host_default_charset;
    auto_target_charset_name = auto_host_charset_name;
  }
#endif

GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
host-charset in GDB.  However, I did this:

  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "");
  printf ("%d\n", GetACP());

  setlocale (LC_CTYPE, "C");
  printf ("%d\n", GetACP());

On my Windows machine, 1252 is printed three times.

>
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

That works only on the condition that we know or can enumerate the
expected output for wchars under each LC_CTYPE on each host (or OS).
A test like this is out of the scope of GDB (or debugger) testing, IMO.
> Date: Thu, 12 Jun 2014 15:02:57 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: <palves@redhat.com>, <tromey@redhat.com>, <brobecker@adacore.com>,
>     <gdb-patches@sourceware.org>
>
> On 06/12/2014 10:46 AM, Eli Zaretskii wrote:
> > So you need _any_ character for which iswprint returns zero?  If so,
> > does the character have to be a single byte?
>
> Find a character for which iswprint returns zero isn't the point, IMO.
> The problem is wchar.exp expects "\242" but GDB prints cent sign on
> mingw.  Instead of changing to another character, isn't better to match
> both (\242 and cent sign) in regexp pattern?

Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
almost nothing about the test suite.)
> Date: Thu, 12 Jun 2014 22:37:38 +0800
> From: Yao Qi <yao@codesourcery.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, <gdb-patches@sourceware.org>
>
> GetACP doesn't depend on locale, so I don't think LC_CTYPE=C affects the
> host-charset in GDB.

Indeed, it doesn't.

> However, I do this:
>
>   printf ("%d\n", GetACP());
>
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
>
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
>
> On my Windows machine, 1252 is printed three times.

As expected: GetACP returns the _default_ codepage, and the default
does not change when you change a locale.  And the iswprint function
doesn't consult the default codepage.  So I don't think this issue
with GetACP is at all relevant.
> Date: Thu, 12 Jun 2014 12:36:29 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>
> (Ideally, the wchar tests would actually iterate testing GDB
> behaves as expected with different values of LC_CTYPE, etc. set
> in the environment.  With all other tests assuming ASCII as set
> by default by the testsuite framework.)

What do you mean by "behaves as expected"?  And why is LC_CTYPE
important here?
On 06/12/2014 03:37 PM, Yao Qi wrote:
> On 06/12/2014 07:36 PM, Pedro Alves wrote:
>> What does "show host-charset" show on Windows, before and after
>> you make GDB pick LC_CTYPE=C from the environment (with the
>> setlocale gnulib module)?
>
> GDB on Windows gets host charset from GetACP(), in
> charset.c:_initialize_charset ().
>
> #elif defined (USE_WIN32API)
>   {
>     /* "CP" + x<=5 digits + paranoia.  */
>     static char w32_host_default_charset[16];
>
>     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
>               "CP%d", GetACP());
>     auto_host_charset_name = w32_host_default_charset;
>     auto_target_charset_name = auto_host_charset_name;
>   }
> #endif
>

I note gnulib's nl_langinfo replacement actually does
the same thing.

> GetACP doesn't depend on locale,

Yeah, it's a mess, and those are really different
things.  The former is the system locale, while the latter
the user locale.  MSDN is confusing, but there are lots of blogs
around explaining this.

> so I don't think LC_CTYPE=C affects the
> host-charset in GDB.  However, I do this:
>
>   printf ("%d\n", GetACP());
>
>   setlocale (LC_CTYPE, "");
>   printf ("%d\n", GetACP());
>
>   setlocale (LC_CTYPE, "C");
>   printf ("%d\n", GetACP());
>
> On my Windows machine, 1252 is printed three times.

So what I'm thinking is indeed going with making the test
accept the cent, but conditioned, like:

 # Fall back to assuming 7-bit ASCII.  Tests are run under LC_CTYPE=C.
 set cent "\\\\242"
 set test "show host-charset"
 gdb_test_multiple $test $test {
     -re "CP1252\r\n$gdb_prompt $" {
         # With Windows code page 1252 (Latin 1), the cent
         # is printable.
         set cent "\u00A2"
         pass $test
     }
     -re "$gdb_prompt $" {
         pass $test
     }
 }

>
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
>
> On the condition that we know or enumerate the expected output for
> wchars under each LC_CTYPE on different host (or OS).  Test like this
> is out of the scope of GDB (or debugger) testing, IMO.

Not an exhaustive test, and not by host, but just by picking
a couple of charsets/locales.  So that we at least ensure that the
framework is all in sync.  That is, check:

 $ unset LC_CTYPE; gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
 $ LC_CTYPE=XXX gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
 $ LC_CTYPE=en_US gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
 $ LC_CTYPE=en_US.UTF-8 gdb -ex "show host-charset" -ex ' p "\u00A2"' --batch
On 06/12/2014 06:08 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 12:36:29 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>>
>> (Ideally, the wchar tests would actually iterate testing GDB
>> behaves as expected with different values of LC_CTYPE, etc. set
>> in the environment.  With all other tests assuming ASCII as set
>> by default by the testsuite framework.)
>
> What do you mean by "behaves as expected"?  And why is LC_TYPE
> important here?

I think I've answered this in my response to Yao.
> Date: Thu, 12 Jun 2014 18:23:34 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: Tom Tromey <tromey@redhat.com>, Joel Brobecker <brobecker@adacore.com>, gdb-patches@sourceware.org
>
> On 06/12/2014 03:37 PM, Yao Qi wrote:
> > On 06/12/2014 07:36 PM, Pedro Alves wrote:
> >> What does "show host-charset" show on Windows, before and after
> >> you make GDB pick LC_CTYPE=C from the environment (with the
> >> setlocale gnulib module)?
> >
> > GDB on Windows gets host charset from GetACP(), in
> > charset.c:_initialize_charset ().
> >
> > #elif defined (USE_WIN32API)
> >   {
> >     /* "CP" + x<=5 digits + paranoia.  */
> >     static char w32_host_default_charset[16];
> >
> >     snprintf (w32_host_default_charset, sizeof w32_host_default_charset,
> >               "CP%d", GetACP());
> >     auto_host_charset_name = w32_host_default_charset;
> >     auto_target_charset_name = auto_host_charset_name;
> >   }
> > #endif
> >
>
> I note gnulib's nl_langinfo replacement actually does
> the same thing.

And gnulib's nl_langinfo is wrong, btw, because one can use
'setlocale' to change the codeset, without any relation whatsoever to
the console encoding.  (I sent a fix for that to gnulib's list just
yesterday.)

> > GetACP doesn't depend on locale,
>
> Yeah, it's a mess, and those are really different
> things.  The former is the system locale, while the latter
> the user locale.

That's true, but that's not the important issue here.  The important
issue here is the fundamental difference between the Windows console
encoding and the current locale's codeset.  The former affects how
Windows writes to the console, and in most cases changing the console
codepage (e.g., with SetConsoleCP or SetConsoleOutputCP) is a futile
exercise, because all it does is cause garbled display.  The latter is
an important feature when you are dealing with programs that don't
intend using the codeset to display text to the user, but, for
example, to change the behavior of iswprint.

Using the console codepage when really the locale's codeset is needed
is only going to work for the default setup, not when you want or
need to change the locale.
> Date: Thu, 12 Jun 2014 18:26:47 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com, gdb-patches@sourceware.org
>
> > What do you mean by "behaves as expected"?  And why is LC_TYPE
> > important here?
>
> I think I've answered this in my response to Yao.

Not really, but you don't have to explain as long as the original
problem is solved.
On 06/12/2014 06:49 PM, Eli Zaretskii wrote:
>> Date: Thu, 12 Jun 2014 18:26:47 +0100
>> From: Pedro Alves <palves@redhat.com>
>> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com, gdb-patches@sourceware.org
>>
>>> What do you mean by "behaves as expected"?  And why is LC_TYPE
>>> important here?
>>
>> I think I've answered this in my response to Yao.
>
> Not really, but you don't have to explain as long as the original
> problem is solved.

Trying again then.

The testsuite framework does, in gdb.exp:gdb_init:

    # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
    # messages as expected.
    setenv LC_ALL C
    setenv LC_CTYPE C
    setenv LANG C

... so that output is stable for everyone.

And if we do that, we miss making sure GDB works correctly
with locales/charsets other than C/ASCII on most hosts.

So I was just saying that IMO ideally we'd have tests that
make sure GDB prints what we think it should print when
LC_CTYPE (etc.) is set to something else, like e.g.,
en_US.UTF-8.

Does that answer the question?
> Date: Thu, 12 Jun 2014 19:05:52 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>     gdb-patches@sourceware.org
>
> Trying again then.

Thanks.

> The testsuite framework does, in gdb.exp:gdb_init:
>
>     # We set LC_ALL, LC_CTYPE, and LANG to C so that we get the same
>     # messages as expected.
>     setenv LC_ALL C
>     setenv LC_CTYPE C
>     setenv LANG C
>
> ... so that output is stable for everyone.

With you so far.  But note that on Windows, even the above does not
guarantee "stable output", because the console codepage is not changed
by 'setlocale', and moreover, the Windows 'setlocale' doesn't pay
attention to environment variables.  So on Windows, these tests run in
the default system locale (because we call 'setlocale' with the 2nd
argument an empty string).

> And if we do that, we miss making sure GDB works correctly
> with locales/charsets other than C/ASCII on most hosts.

And here, "works correctly" means what? sets host-charset? or
something else?  Assuming the former below.

> So I was just saying that IMO ideally we'd have tests that
> make sure GDB prints what we think it should print when
> LC_CTYPE (etc.) is set to something else, like e.g.,
> en_US.UTF-8.

You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
(although there is a UTF-8 codepage, and Windows does support it in
general).  More importantly, since 'setlocale' on Windows disregards
the environment variables, you cannot change the host charset by
setting environment variables.  You must do that by a GDB command that
sets host-charset.
On 06/12/2014 07:34 PM, Eli Zaretskii wrote:
> With you so far.  But note that on Windows, even the above does not
> guarantee "stable output", because the console codepage is not changed
> by 'setlocale',

I guess the harness could run gdb under chcp 65001 or some such.

> You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> (although there is a UTF-8 codepage, and Windows does support it in
> general).  More importantly, since 'setlocale' on Windows disregards
> the environment variables, you cannot change the host charset by
> setting environment variables.  You must do that by a GDB command that
> sets host-charset.

See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .
> Date: Mon, 16 Jun 2014 14:57:59 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: yao@codesourcery.com, tromey@redhat.com, brobecker@adacore.com,
>     gdb-patches@sourceware.org
>
> On 06/12/2014 07:34 PM, Eli Zaretskii wrote:
>
> > With you so far.  But note that on Windows, even the above does not
> > guarantee "stable output", because the console codepage is not changed
> > by 'setlocale',
>
> I guess the harness could run gdb under chcp 65001 or some such.

You could, but it won't help, really.  It's a long story, but support
for UTF-8 on a Windows console is pathetic.  With enough trouble
(which will need source changes in GDB and in Readline), you might
have European characters displayed correctly, if you also change the
console font to Lucida Console.  But anything beyond European
characters simply cannot be displayed, because the font doesn't have
them.

> > You cannot ask the Windows 'setlocale' to use UTF-8 as the codeset
> > (although there is a UTF-8 codepage, and Windows does support it in
> > general).  More importantly, since 'setlocale' on Windows disregards
> > the environment variables, you cannot change the host charset by
> > setting environment variables.  You must do that by a GDB command that
> > sets host-charset.
>
> See https://sourceware.org/ml/gdb-patches/2014-06/msg00364.html .

If you mean the last 2 sentences, then yes, using setlocale from
gnulib will fix that.  But the problem with UTF-8 as the charset isn't
(and AFAIK cannot be) solved by gnulib, because Windows simply does
not support codepage 65001 in its setlocale implementation (this is
documented in MSDN).
On 06/16/2014 04:40 PM, Eli Zaretskii wrote:
>>> With you so far.  But note that on Windows, even the above does not
>>> guarantee "stable output", because the console codepage is not changed
>>> by 'setlocale',
>>
>> I guess the harness could run gdb under chcp 65001 or some such.
>
> You could, but it won't help, really.  It's a long story, but support
> for UTF-8 on a Windows console is pathetic.  With enough trouble
> (which will need source changes in GDB and in Readline), you might
> have European characters displayed correctly, if you also change the
> console font to Lucida Console.  But anything beyond European
> characters simply cannot be displayed, because the font doesn't have
> them.

OK, I was focusing more on the "stable output" aspect than
the specific codepage.
On 06/13/2014 01:02 AM, Eli Zaretskii wrote:
> Maybe.  Can you tell what is the purpose of the test?  (Sorry, I know
> almost nothing about the test suite.)

Eli,
This test was added by the following patch, which fixes the
incorrect placement of the comma in repeated characters,

  [RFA] gdb/14288
  https://sourceware.org/ml/gdb-patches/2012-08/msg00780.html

The test wasn't about the \242-or-cent-sign printing we discussed here.
diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..aa19d92 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -36,7 +36,10 @@ gdb_test "print simple\[2\]" "= 99 L'c'"
 
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
-set cent "\\\\242"
+# The contents in 'repeat' are shown differently under different
+# locale.  We match all the possible outputs here, '\242' or cent sign.
+set cent "(\\\\242|\u00A2)"
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex