Different outputs affected by locale

Message ID 1401192650-29688-1-git-send-email-yao@codesourcery.com
State Superseded

Commit Message

Yao Qi May 27, 2014, 12:10 p.m. UTC
We see the following failures in the gdb testsuite on a mingw host.

FAIL: gdb.base/wchar.exp: print repeat
FAIL: gdb.base/wchar.exp: print repeat_p
FAIL: gdb.base/wchar.exp: print repeat (print null on)
FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)

print repeat^M
$7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
(gdb) FAIL: gdb.base/wchar.exp: print repeat

The test expects \242, but the cent sign is displayed.

In valprint.c:print_wchar, wchar_printable is called to determine
whether a wchar is printable.  wchar_printable calls iswprint, but
iswprint's return value depends on the locale's LC_CTYPE setting [1, 2],
so the output may vary with different locale settings.  I noticed that
gdb.exp:gdb_init sets LC_CTYPE to C.  If I remove that line, the tests
fail on native testing too.
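
To illustrate the mechanism, here is a minimal sketch in C.  It is not
GDB code: render_wchar and render_in_locale are hypothetical helpers
that only mimic the decision described above (print the character
as-is when iswprint accepts it under the current LC_CTYPE, otherwise
fall back to an octal escape).  How a given C library classifies 0xA2
in a given locale is exactly the part that varies, so no output for
that value is assumed here.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not GDB code.  Format WC into BUF: as the
   character itself when iswprint accepts it under the current
   LC_CTYPE, otherwise as a three-digit octal escape such as \242.
   Returns the snprintf result, which is negative if the character
   cannot be converted to a multibyte sequence in this locale.  */
int
render_wchar (wchar_t wc, char *buf, size_t size)
{
  if (iswprint ((wint_t) wc))
    return snprintf (buf, size, "%lc", (wint_t) wc);
  return snprintf (buf, size, "\\%03lo", (unsigned long) wc);
}

/* Hypothetical helper.  Switch LC_CTYPE to LOCALE_NAME, then render
   WC.  Returns -1 when the requested locale is not available.  */
int
render_in_locale (const char *locale_name, wchar_t wc,
                  char *buf, size_t size)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  return render_wchar (wc, buf, size);
}
```

Calling render_in_locale ("C", 0xA2, buf, sizeof buf) and then the
same with "" (the environment's locale) on both the native and the
mingw host should show whether the two C libraries classify the
character differently.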

IMO, either \242 or '¢' (cent sign) is a correct output; which one
appears is affected by the locale and is not related to gdb at all.

[1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
[2] msdn.microsoft.com/en-us/library/ewx8s4kw.aspx

This patch adds code that runs 'p repeat[1]' to extract the cent
character first, and then uses it to match in the following tests.

gdb/testsuite:

2014-05-27  Yao Qi  <yao@codesourcery.com>

	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
	cent from the output.
---
 gdb/testsuite/gdb.base/wchar.exp | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
  

Comments

Yao Qi June 4, 2014, 5:30 a.m. UTC | #1
On 05/27/2014 08:10 PM, Yao Qi wrote:
> We see the following failures in the gdb testsuite on a mingw host.
> 
> FAIL: gdb.base/wchar.exp: print repeat
> FAIL: gdb.base/wchar.exp: print repeat_p
> FAIL: gdb.base/wchar.exp: print repeat (print null on)
> FAIL: gdb.base/wchar.exp: print repeat (print elements 3)
> FAIL: gdb.base/wchar.exp: print repeat_p (print elements 3)
> 
> print repeat^M
> $7 = L"A", '¢' <repeats 21 times>, "B", '\000' <repeats 104 times>^M
> (gdb) FAIL: gdb.base/wchar.exp: print repeat
> 
> The test expects \242, but the cent sign is displayed.
> 
> In valprint.c:print_wchar, wchar_printable is called to determine
> whether a wchar is printable.  wchar_printable calls iswprint, but
> iswprint's return value depends on the locale's LC_CTYPE setting [1, 2],
> so the output may vary with different locale settings.  I noticed that
> gdb.exp:gdb_init sets LC_CTYPE to C.  If I remove that line, the tests
> fail on native testing too.
> 
> IMO, either \242 or '¢' (cent sign) is a correct output; which one
> appears is affected by the locale and is not related to gdb at all.
> 
> [1] http://pubs.opengroup.org/onlinepubs/009604499/functions/iswprint.html
> [2] msdn.microsoft.com/en-us/library/ewx8s4kw.aspx
> 
> This patch adds code that runs 'p repeat[1]' to extract the cent
> character first, and then uses it to match in the following tests.
> 
> gdb/testsuite:
> 
> 2014-05-27  Yao Qi  <yao@codesourcery.com>
> 
> 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
> 	cent from the output.

Ping.
  
Joel Brobecker June 4, 2014, 12:47 p.m. UTC | #2
> > 2014-05-27  Yao Qi  <yao@codesourcery.com>
> > 
> > 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
> > 	cent from the output.

This is a patch that I felt would be better reviewed by Tom, but
we'd have to wait for him to be back. When I read your patch,
I thought that the approach you took was weakening the test a little,
because if GDB started printing the character incorrectly, you would
not notice it anymore.
  
Yao Qi June 4, 2014, 1:18 p.m. UTC | #3
On 06/04/2014 08:47 PM, Joel Brobecker wrote:
>>> 2014-05-27  Yao Qi  <yao@codesourcery.com>
>>>
>>> 	* gdb.base/wchar.exp: Execute command 'p repeat[1]' and extract
>>> 	cent from the output.
> 
> This is a patch that I felt would be better reviewed by Tom, but
> we'd have to wait for him to be back. When I read your patch,
> I thought that the approach you took was weakening the test a little,
> because if GDB started printing the character incorrectly, you would
> not notice it anymore.
> 

The character printed by GDB in this case is out of the control of GDB,
IMO.  IOW, we can't tell which printed character is correct and which is
incorrect.  Or we can relax the pattern to match either \242 or '¢'
(cent sign) in the test.  WDYT?
  
Joel Brobecker June 4, 2014, 1:52 p.m. UTC | #4
> The character printed by GDB in this case is out of the control of GDB,
> IMO.

IIRC, it is a little bit, by way of how it decodes multibyte characters?

> IOW, we can't tell which printed character is correct and which is
> incorrect.  Or we can relax the pattern to match either \242 or '¢'
> (cent sign) in the test.  WDYT?

That would have been my first approach, but I would prefer it if
someone who knows better about encodings commented on that. I could
be wrong!
  
Tom Tromey June 4, 2014, 8:15 p.m. UTC | #5
>>>>> "Yao" == Yao Qi <yao@codesourcery.com> writes:

Yao> The character printed by GDB in this case is out of the control of GDB,
Yao> IMO.  IOW, we can't tell which printed character is correct and which is
Yao> incorrect.  Or we can relax the pattern to match either \242 or '¢'
Yao> (cent sign) in the test.  WDYT?

I think that would be preferable.  It is more conservative for the
reason Joel pointed out; and should we encounter a system that emits
something else, it is easy to update the test at that time.

I am not really a great standards lawyer but my first reaction is that
mingw's C locale is not conforming.  At least from:

    http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html

.. it seems to me that \242 is not defined as a 'print' character in the
LC_CTYPE section.  Though I'd like to reiterate that I don't actually
trust my own reading of that text.

Tom
  
Pedro Alves June 4, 2014, 8:23 p.m. UTC | #6
On 06/04/2014 09:15 PM, Tom Tromey wrote:

> I am not really a great standards lawyer but my first reaction is that
> mingw's C locale is not conforming.  At least from:
> 
>     http://pubs.opengroup.org/onlinepubs/009604499/basedefs/xbd_chap07.html
> 
> .. it seems to me that \242 is not defined as a 'print' character in the
> LC_CTYPE section.  Though I'd like to reiterate that I don't actually
> trust my own reading of that text.

I wonder whether this is really a mingw issue, or whether this is a
remote host testing issue.  That is, aren't we setting LC_CTYPE
on the _build_ (where expect runs), not on the host (mingw, through
ssh)?  Is LC_CTYPE really being propagated to the host?
Does testing GDB manually directly on a Windows console show the same
issue?
  

Patch

diff --git a/gdb/testsuite/gdb.base/wchar.exp b/gdb/testsuite/gdb.base/wchar.exp
index 4290478..215d2f4 100644
--- a/gdb/testsuite/gdb.base/wchar.exp
+++ b/gdb/testsuite/gdb.base/wchar.exp
@@ -36,7 +36,23 @@  gdb_test "print simple\[2\]" "= 99 L'c'"
 
 gdb_test "print difficile\[2\]" "= 65261 L'\\\\xfeed'"
 
-set cent "\\\\242"
+# The contents of 'repeat' are shown differently under different
+# locales.  Instead of hard-coding the cent sign in variable 'cent',
+# extract it from the output of 'print repeat[1]', and use it to
+# match the output in the following tests.
+set cent ""
+set test "get cent"
+gdb_test_multiple "p repeat\[1\]" $test {
+    -re " = 162 L'(.*)'.*\r\n$gdb_prompt $" {
+	set cent [string_to_regexp $expect_out(1,string)]
+	pass $test
+    }
+    -re ".*$gdb_prompt $" {
+	fail $test
+	return
+    }
+}
+
 gdb_test "print repeat" "= L\"A\", '$cent' <repeats 21 times>, \"B.*"
 
 global hex