Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8.

Message ID 6b0df162-d17a-ec1d-0fc0-728dc3a8ef9b@linux.ibm.com
State Superseded

Commit Message

Stefan Liebler Aug. 26, 2019, 2:41 p.m. UTC
  Hi,

the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
Unfortunately the test posix/tst-regex assumes the former encoding.

This patch just changes the encoding back to ISO-8859.
Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
which leads to two further matches in the first call to test_expr.

Bye
Stefan

ChangeLog:

	* ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.
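
As an illustration of why the expected count moves from 2 to 4 (not part of the patch): the sketch below counts bytes that belong to a small set, as a byte-level stand-in for the bracket expression in the first test_expr call. It assumes the single-byte encoding is ISO-8859-1, where 'ì' is 0xEC and 'ö' is 0xF6. The helper name count_set_bytes and the sample strings are hypothetical, and strchr is swapped in for regexec to keep the example independent of installed locales.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the bytes of TEXT that occur in SET.
   This mimics a single-byte bracket expression such as "[ìö]".  */
static size_t
count_set_bytes (const char *text, const char *set)
{
  assert (text != NULL && set != NULL);
  size_t n = 0;
  for (const char *p = text; *p != '\0'; ++p)
    if (strchr (set, *p) != NULL)
      ++n;
  return n;
}

/* Two members of the bracket expression, encoded as ISO-8859-1
   (assumed): 'ì' = 0xEC, 'ö' = 0xF6.  */
static const char latin1_set[] = "\xec\xf6";

/* Old ASCII spelling: the grave accent is a separate character.  */
static const char old_spelling[] = "Francesco Potorti`";
/* New spelling with a real 'ì' in ISO-8859-1.  */
static const char new_latin1[] = "Francesco Potort\xec";
/* The same name in UTF-8: 'ì' becomes the two bytes 0xC3 0xAC.  */
static const char new_utf8[] = "Francesco Potort\xc3\xac";
/* A name that matched before the change as well.  */
static const char goran_latin1[] = "G\xf6ran Uddeborg";
```

With these inputs, old_spelling contributes no match, new_latin1 contributes one, and new_utf8 contributes none when its bytes are read as ISO-8859-1. Two lines with the new spelling therefore add two matches to the two that the 'ö' lines already produced.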
  

Comments

Zack Weinberg Aug. 26, 2019, 3:06 p.m. UTC | #1
On Mon, Aug 26, 2019 at 10:41 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
> Unfortunately the test posix/tst-regex assumes the former encoding.
>
> This patch just changes the encoding back to ISO-8859.
> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
> which leads to two further matches in the first call to test_expr.

It would be better to fix the test to assume UTF-8 or, even better, not
to rely on ChangeLog files for test data.

zw
  
Carlos O'Donell Aug. 27, 2019, 6:41 p.m. UTC | #2
On 8/26/19 10:41 AM, Stefan Liebler wrote:
> Hi,
> 
> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
> Unfortunately the test posix/tst-regex assumes the former encoding.
> 
> This patch just changes the encoding back to ISO-8859.
> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
> which leads to two further matches in the first call to test_expr.
> 
> Bye
> Stefan
> 
> ChangeLog:
> 
>     * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.

Stefan, thank you for trying to fix this. I think we should handle
it slightly differently and set a precedent here.

Please copy ChangeLog.8 to tst-regex.in and use a test-specific input
file. I find it silly that we've been using ChangeLog.8 for anything.

To be clear I think we should:
* Leave the ChangeLog in UTF-8. -- This is for attribution only.
* Add a new tst-regex.in  -- This is for testing only.

Likewise if you find anything else like this I'd say we split out the
inputs into test-specific inputs.

Any time we mix attribution and testing, it is bound to go wrong.
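
To illustrate why a test-specific input with a fixed encoding is more robust, here is a minimal sketch. It assumes the test input is stored as ISO-8859-1 and that the multi-byte variant is derived from it by conversion. The function latin1_to_utf8 is a hypothetical hand-written converter, not code from tst-regex.c, and it stands in for whatever conversion the test actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter: translate LEN bytes of ISO-8859-1 in SRC
   to UTF-8 in DST and return the number of bytes written.  DST must
   have room for 2 * LEN bytes.  */
static size_t
latin1_to_utf8 (const unsigned char *src, size_t len, unsigned char *dst)
{
  assert (src != NULL && dst != NULL);
  size_t out = 0;
  for (size_t i = 0; i < len; ++i)
    {
      unsigned char b = src[i];
      if (b < 0x80)
        /* ASCII is identical in both encodings.  */
        dst[out++] = b;
      else
        {
          /* U+0080..U+00FF become a two-byte UTF-8 sequence.  */
          dst[out++] = (unsigned char) (0xC0 | (b >> 6));
          dst[out++] = (unsigned char) (0x80 | (b & 0x3F));
        }
    }
  return out;
}
```

Feeding the single ISO-8859-1 byte 0xEC ('ì') through this function yields 0xC3 0xAC. Feeding it input that is already UTF-8 (0xC3 0xAC) yields the four bytes 0xC3 0x83 0xC2 0xAC, which no longer spell 'ì'. This is the kind of silent breakage that can occur when a file maintained for attribution changes encoding underneath a test that assumes the old one.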
  

Patch

commit 695be39994979a372e7643eabf90578de6246e20
Author: Stefan Liebler <stli@linux.ibm.com>
Date:   Mon Aug 26 13:47:21 2019 +0200

    Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8
    
    The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
    changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
    Unfortunately the test posix/tst-regex assumes the former encoding.
    
    This patch just changes the encoding back to ISO-8859.
    Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
    which leads to two further matches in the first call to test_expr.
    
    ChangeLog:
    
            * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.

diff --git a/ChangeLog.old/ChangeLog.8 b/ChangeLog.old/ChangeLog.8
index c48660d23a..12988d982a 100644
--- a/ChangeLog.old/ChangeLog.8
+++ b/ChangeLog.old/ChangeLog.8
@@ -6025,7 +6025,7 @@ 
 	(Host Address Functions): Use uint32_t consequently and add a
 	number of clarifications for IPv4/IPv6, classless addresses.
 	(Internet Namespace): Added some paragraphs about IPv6.
-	Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
+	Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
 
 1998-04-05  Philip Blundell  <Philip.Blundell@pobox.com>
 
@@ -6565,7 +6565,7 @@ 
 	* manual/examples/mkfsock.c (make_named_socket): Removed blank
 	lines for clarification.
 	(make_named_socket): Use strncpy instead of strcpy.
-	Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
+	Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
 
 1998-03-30 13:28  Ulrich Drepper  <drepper@cygnus.com>
 
@@ -8314,7 +8314,7 @@ 
 1998-02-27  Ulrich Drepper  <drepper@cygnus.com>
 
 	* misc/efgcvt_r.c (APPEND): Handle printing of 0.0 correctly.
-	Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
+	Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
 
 	* misc/tst-efgcvt.c (ecvt_tests): Add new test case for reported
 	bug.
@@ -8322,7 +8322,7 @@ 
 1998-02-25  Andreas Jaeger  <aj@arthur.rhein-neckar.de>
 
 	* manual/arith.texi (Old-style number conversion): Correct
-	typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
+	typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
 
 1998-02-27  Ulrich Drepper  <drepper@cygnus.com>
 
diff --git a/posix/tst-regex.c b/posix/tst-regex.c
index fe05da9b63..995ad38c7f 100644
--- a/posix/tst-regex.c
+++ b/posix/tst-regex.c
@@ -124,7 +124,7 @@  do_test (void)
 
   /* Run the actual tests.  All tests are run in a single-byte and a
      multi-byte locale.  */
-  result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 2, 2);
+  result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 4, 4);
   result |= test_expr ("G.ran", 2, 3);
   result |= test_expr ("G.\\{1\\}ran", 2, 3);
   result |= test_expr ("G.*ran", 3, 44);