RFC: locale-source validation script

Message ID s9dwp6v6x3h.fsf@redhat.com
State Not applicable

Commit Message

Mike FABIAN July 26, 2017, 3:23 p.m. UTC
  Zack Weinberg <zackw@panix.com> wrote:

> On Wed, Jul 26, 2017 at 8:44 AM, Mike FABIAN <mfabian@redhat.com> wrote:
>> Zack Weinberg <zackw@panix.com> wrote:
>>
>>> - The complaints about "inappropriate character '\t'" are all caused
>>> by _unintentional_ tabs inside strings.  If you write
>>>
>>> message "xyz/
>>>          abc"
>>>
>>> the whitespace on the second line gets included in the string, which
>>> is not what you want.
>>
>> Yes, at the moment we get for example:
>>
>> $ LC_ALL=et_EE.UTF-8 locale -k postal_fmt
>> postal_fmt="%a%N %f%N %d%N %b%N %s%t%h%t%e%t%r%N %C-%z %T%N %c%N"
>>
>> I’ll fix it like this; this is far more readable as well:
>
> Note that there's probably a bunch of similar cases where the
> undesirable whitespace is just space characters, no tabs - my script
> won't catch that.  (I won't be working on it today, but this is on my
> list of things to fix.)
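Here is why the indentation ends up in the value. The following is a minimal Python sketch of how a quoted string split with a line continuation might be read. It assumes `/` is the escape character (glibc locale sources declare `escape_char /`) and that an escape character at the end of a physical line is simply dropped, with everything else kept verbatim. `join_continued_string` is a hypothetical helper. It ignores other escape sequences and is not localedef's actual reader.

```python
def join_continued_string(physical_lines, escape_char="/"):
    """Join the physical lines of a quoted locale string into one value.

    Hypothetical sketch, not localedef's real reader: an escape character
    at the very end of a physical line is treated as a line continuation
    and removed together with the newline. Every other character is kept
    verbatim, including any indentation at the start of the next line.
    """
    parts = []
    for line in physical_lines:
        if line.endswith(escape_char):
            # Continuation: drop only the trailing escape character.
            parts.append(line[:-1])
        else:
            parts.append(line)
    text = "".join(parts)
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("expected a double-quoted string")
    # Remove the surrounding quotes of the string literal.
    return text[1:-1]


# Second line indented to line up with the first: the spaces become data.
indented = join_continued_string(['"xyz/', " " * 9 + 'abc"'])
# Continuation line starting in column 0: no stray whitespace.
flush = join_continued_string(['"xyz/', 'abc"'])

print(repr(indented))
print(repr(flush))  # 'xyzabc'
```

Under these assumptions, the only way to keep a continued string free of stray whitespace is to start the continuation line at column 0, or to split the string where a space is actually wanted.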

As a quick hack, I added the check in the patch below to your script.
It reports sequences of two or more spaces in strings.

It found only three such sequences. All of them appeared to be
errors, and I pushed a fix.
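The two whitespace checks discussed in this thread (stray tabs and runs of spaces) can be tried in isolation on an already-decoded string value. This is a hypothetical standalone sketch, not code from check-localedef.py. `find_suspicious_whitespace` and its `(index, message)` return format are made up for illustration. `c.isprintable() and not c.isspace()` stands in for the script's `isgraph()` helper, whose definition isn't shown here. For ASCII this matches C `isgraph()`, but it may differ for other characters.

```python
def find_suspicious_whitespace(value):
    """Return (index, message) pairs for questionable characters in value.

    Hypothetical sketch: flags non-graphic characters other than a plain
    space (for example '\\t'), and every space that directly follows
    another space.
    """
    problems = []
    prev = ""
    for i, c in enumerate(value):
        # Approximation of C isgraph(): printable and not whitespace.
        is_graph = c.isprintable() and not c.isspace()
        if c != " " and not is_graph:
            problems.append((i, "inappropriate character {!r}".format(c)))
        elif c == " " and prev == " ":
            # Second (or later) space of a run: likely unintended.
            problems.append((i, "suspicious sequence of spaces"))
        prev = c
    return problems


print(find_suspicious_whitespace("%s%t%h%t%e"))  # []
print(find_suspicious_whitespace("xyz   abc"))
print(find_suspicious_whitespace("a\tb"))
```

As in the patch, each extra space in a run is reported separately. Reporting a run only once would need an extra check against the character two positions back.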

Patch

--- check-localedef.py.~1~    2017-07-26 08:14:27.052046435 +0200
+++ check-localedef.py        2017-07-26 16:32:50.625185637 +0200
@@ -369,6 +369,9 @@ 
                 if c != ' ' and not isgraph(c):
                     log.error(fp.lineno, "inappropriate character {!r} in {}",
                               c, "string" if end_char == '"' else "symbol")
+                elif c == ' ' and tbuf[-1:] == [' ']:
+                    log.error(fp.lineno, "suspicious sequence of spaces in {!r}", "".join(tbuf))
+                    tbuf.append(c)
                 else:
                     tbuf.append(c)