Mark non-UTF-8 localedata test input files as binary

Message ID mvmcyzxitrj.fsf@suse.de
State New

Checks

Context                                            Check    Description
redhat-pt-bot/TryBot-apply_patch                   success  Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-arm       success  Testing passed
redhat-pt-bot/TryBot-32bit                         success  Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64   success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm       success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64   success  Testing passed

Commit Message

Andreas Schwab Aug. 8, 2023, 12:49 p.m. UTC
Non-UTF-8 files can cause problems, because a mixture of encodings in the
same patch is hard to handle.  Mark the non-UTF-8 files as binary to
avoid that.
---
 .gitattributes | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

Florian Weimer Aug. 8, 2023, 2:11 p.m. UTC | #1
* Andreas Schwab via Libc-alpha:

> Non-UTF-8 files can cause problems, because a mixture of encodings in the
> same patch is hard to handle.  Mark the non-UTF-8 files as binary to
> avoid that.
> ---
>  .gitattributes | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/.gitattributes b/.gitattributes
> index 06b553db80..1492f6c84d 100644
> --- a/.gitattributes
> +++ b/.gitattributes
> @@ -1,2 +1,6 @@
>  ChangeLog    merge=merge-changelog
>  timezone/* -whitespace
> +localedata/*.ISO-8859-*.in binary
> +localedata/cs_CZ.in binary
> +localedata/th_TH.in binary
> +localedata/tst-langinfo.sh binary

What about the other files?

# Print every tracked file that does not decode as UTF-8.
git ls-files |
  while IFS= read -r p
  do iconv -f UTF-8 -t UTF-8 "$p" > /dev/null 2>&1 || echo "$p"
  done

The *.S files we should probably simply convert to UTF-8.  But even
under localedata/, there are more persistent exceptions.

Thanks,
Florian
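
The iconv round-trip in the message above can be tried on two small sample files. This is only a sketch: the helper name is_utf8 and the temporary file names are made up for illustration and do not come from the glibc tree.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# (Hypothetical helper; it wraps the same iconv round-trip.)
is_utf8() {
  iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

tmp=$(mktemp -d)
printf 'caf\303\251\n' > "$tmp/utf8.txt"    # e-acute as UTF-8 (0xC3 0xA9)
printf 'caf\351\n'     > "$tmp/latin1.txt"  # e-acute as ISO-8859-1 (0xE9)

# Report the files that fail the round-trip.
for f in "$tmp"/*.txt; do
  is_utf8 "$f" || echo "not UTF-8: ${f##*/}"
done
rm -rf "$tmp"
```

Only latin1.txt should be reported, because a lone 0xE9 byte is not a valid UTF-8 sequence.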
  
Andreas Schwab Aug. 8, 2023, 2:32 p.m. UTC | #2
On Aug 08 2023, Florian Weimer wrote:

> * Andreas Schwab via Libc-alpha:
>
>> Non-UTF-8 files can cause problems, because a mixture of encodings in the
>> same patch is hard to handle.  Mark the non-UTF-8 files as binary to
>> avoid that.
>> ---
>>  .gitattributes | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/.gitattributes b/.gitattributes
>> index 06b553db80..1492f6c84d 100644
>> --- a/.gitattributes
>> +++ b/.gitattributes
>> @@ -1,2 +1,6 @@
>>  ChangeLog    merge=merge-changelog
>>  timezone/* -whitespace
>> +localedata/*.ISO-8859-*.in binary
>> +localedata/cs_CZ.in binary
>> +localedata/th_TH.in binary
>> +localedata/tst-langinfo.sh binary
>
> What about the other files?

git's builtin binary detection already works for files with null bytes.
For the others, they should probably be marked binary as well.

> The *.S files we should probably simply convert to UTF-8.

Yes.
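
To illustrate why the NUL-byte heuristic does not catch the ISO-8859 files: Git treats a file as binary when a NUL byte shows up near its start (the first 8000 bytes, as far as I know), and Latin-1 text contains none. The sketch below only approximates that check in shell; has_nul is a hypothetical helper, not a Git command.

```shell
#!/bin/sh
# has_nul FILE: succeed if a NUL byte occurs in the first 8000 bytes.
# This approximates Git's content-based binary detection; it is not Git's code.
has_nul() {
  total=$(head -c 8000 < "$1" | wc -c)
  kept=$(head -c 8000 < "$1" | tr -d '\000' | wc -c)
  [ "$total" -ne "$kept" ]
}

tmp=$(mktemp -d)
printf 'caf\351\n' > "$tmp/latin1.in"  # ISO-8859-1 text, no NUL byte
printf 'a\000b\n'  > "$tmp/nul.bin"    # contains a NUL byte

for f in "$tmp"/*; do
  if has_nul "$f"; then
    echo "${f##*/}: detected as binary"
  else
    echo "${f##*/}: treated as text unless .gitattributes says otherwise"
  fi
done
rm -rf "$tmp"
```

Files like latin1.in are the ones that need an explicit binary attribute.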
  
Mike FABIAN Aug. 29, 2023, 5:31 p.m. UTC | #3
Andreas Schwab via Libc-alpha <libc-alpha@sourceware.org> wrote:

> Non-UTF-8 files can cause problems, because a mixture of encodings in the
> same patch is hard to handle.  Mark the non-UTF-8 files as binary to
> avoid that.
> ---
>  .gitattributes | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/.gitattributes b/.gitattributes
> index 06b553db80..1492f6c84d 100644
> --- a/.gitattributes
> +++ b/.gitattributes
> @@ -1,2 +1,6 @@
>  ChangeLog    merge=merge-changelog
>  timezone/* -whitespace
> +localedata/*.ISO-8859-*.in binary
> +localedata/cs_CZ.in binary
> +localedata/th_TH.in binary
> +localedata/tst-langinfo.sh binary
> -- 
>
> 2.41.0

Does that help with applying this Thai collation patch?:

https://patchwork.sourceware.org/project/glibc/patch/20230725164811.1961209-1-mfabian@redhat.com/
  
Andreas Schwab Aug. 30, 2023, 7:28 a.m. UTC | #4
On Aug 29 2023, Mike FABIAN wrote:

> Does that help with applying this Thai collation patch?:

It helps to generate a usable patch.
  
Mike FABIAN Aug. 30, 2023, 8:05 a.m. UTC | #5
Andreas Schwab <schwab@suse.de> wrote:

> On Aug 29 2023, Mike FABIAN wrote:
>
>> Does that help with applying this Thai collation patch?:
>
> It helps to generate a usable patch.

Like this:

$ cat 0002-Remove-unused-localedata-th_TH.in.patch
From 59b6e249452b2a3ad402b4307d88c13285ea0a80 Mon Sep 17 00:00:00 2001
From: Mike FABIAN <mfabian@redhat.com>
Date: Tue, 25 Jul 2023 16:00:28 +0200
Subject: [PATCH 2/2] Remove unused localedata/th_TH.in

---
 localedata/th_TH.in | Bin 1081 -> 0 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 localedata/th_TH.in

diff --git a/localedata/th_TH.in b/localedata/th_TH.in
deleted file mode 100644
index cc93d1f264dec688157d4435ec255f2739499d36..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 1081
zcmZuwO-~b16y2X+(Vemw0!w!^1XdDESQ-~&olp|fk^$Kmf6N<eMr&%s7GbC&)9E;3
zQ)nL@EV~x@n1Pz;Kk&RM1s6_w&pr3t``)|vCY_q%hY5a|oJc)gS~ZOulDbRD?`(Ku
z|44s+-%2WnYeAK$I60KuGd4>7PkK0d?a@uu^O5ud_xK?V9cc}^hV(de{_a3i$|TZE
zk&P!hGieGcVoW;X_ZcPP0*Ri<8o4B%Nn~Su=;;(m&nysTRl%ak%d<T3^l{!?gue8|
zAW&Q~)v9PJn&!>3nAUUo9H!^zA0U&@FOz8r)fCS!8@a5$$Y<v=X>&FE%p^(OvN=7=
z!)r?!Lt8eD6~2>OO~TZgVXo-I%#aEJ;dyQX18Bl)^=pyMP;NuCdZN5OfM~NQmI1^e
zM8|;$BZ#9bk}mDIf}V(Nf*(tTVJ%RK;<7~uqeKWVYbyHD6zB_^*20N!XdZ9+$-_7m
z5zU7cJFpx!U`ZQRq#(YA^Mta?iewE{>qr%o6araG$a$)8JS^+kY*yv^98mct-|krs
z?LtA6unPLt1^cl$cHiP43amgTvekBy%+M<ywD2;hpx}ACT7m<-YQmN-UU%IcpZJUT
zMhX<!&|6MXl4ZOLg~WUIye=Eu!iR|4DSN0m8(xSFUsmwZaj<#p54SiX2R9NGPTg>$
zri(2{7NiG<D|X^TIAM(KZmA>{Hhfz-P5G)sl@C=}SG}ZN(bxhtLQLpaIqs)+1)n7j
zcFG6XDR+vs0lskCyK*mt%cHfZX~Uz{H@7+i)G61`2tNOl5?{}X5`n-s>0{58_%6CW
r8eV6&TPn6|Xw)|qvYCFvjtFYW4aZf`2o&F@xoEPReTmck6CZy8+MXrU
  

Patch

diff --git a/.gitattributes b/.gitattributes
index 06b553db80..1492f6c84d 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1,6 @@ 
 ChangeLog    merge=merge-changelog
 timezone/* -whitespace
+localedata/*.ISO-8859-*.in binary
+localedata/cs_CZ.in binary
+localedata/th_TH.in binary
+localedata/tst-langinfo.sh binary
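
One way to check which paths the new patterns cover is git check-attr. The sketch below uses a scratch repository instead of a glibc checkout; the helper attr_of and the sample path de_DE.ISO-8859-1.in are assumptions chosen for illustration.

```shell
#!/bin/sh
# Scratch repository holding only the four patterns from the patch.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"

cat > "$repo/.gitattributes" <<'EOF'
localedata/*.ISO-8859-*.in binary
localedata/cs_CZ.in binary
localedata/th_TH.in binary
localedata/tst-langinfo.sh binary
EOF

# attr_of ATTR PATH: print the value of ATTR for PATH (hypothetical helper).
# check-attr prints "<path>: <attr>: <value>"; keep only the value.
attr_of() {
  git -C "$repo" check-attr "$1" -- "$2" | sed 's/.*: //'
}

# "binary" is a built-in macro for "-diff -merge -text",
# so the diff attribute should come out as "unset" for matching paths.
for p in localedata/th_TH.in localedata/de_DE.ISO-8859-1.in localedata/SUPPORTED; do
  echo "$p: binary=$(attr_of binary "$p") diff=$(attr_of diff "$p")"
done
```

Paths that match no pattern, such as localedata/SUPPORTED here, report "unspecified" and keep Git's default content-based detection.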