[RFC] newlib/libc/include/langinfo.h: nl_langinfo enum off by one error causing pointer overwrite
Message ID | d1a0b9da87e90420bbaf192ba01a141386f07966.1724283807.git.Brian.Inglis@SystematicSW.ab.ca |
---|---|
State | New |
Headers |
Return-Path: <newlib-bounces~patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 03146385E45D for <patchwork@sourceware.org>; Thu, 22 Aug 2024 00:00:07 +0000 (GMT) X-Original-To: newlib@sourceware.org Delivered-To: newlib@sourceware.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by sourceware.org (Postfix) with ESMTPS id 325BD3858435 for <newlib@sourceware.org>; Wed, 21 Aug 2024 23:59:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 325BD3858435 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=SystematicSW.ab.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=SystematicSW.ab.ca ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 325BD3858435 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=216.40.44.12 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724284792; cv=none; b=DWKyRA0qhk6AvrgZKM30Bei37AZmJmv302zxtazbN+VmY++xZ74vu30SDbb0PqKYkO4vUqQ+LooBwvPHGlPO+6f2DJpunFumu5fUNRuuuQ/IJ0H8BoHsmrgp8YXkX7op6tGTpCqB6gRmjUSYxVWfAujN9ECTdTFMmxC1Meivfag= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724284792; c=relaxed/simple; bh=wzw0Cy+UNMP2VBF5Gbv5Fz4WNN74A30LVkhG5cQ3awo=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=tvT5dlyXf8PdSfFR1rk6ep6btFA8roN6xNp3n0cSxsatbca6HL8WTOF4yiOXjloKHv0rSbPyCMKHnh+Oswc/V03Hn/somi9BuI0PSPXptnBN8PR/y2SOM21ThFHce7TRk2XE6rWMFWyFDOxNGqzdr7H5SxnZEskcsJM2leIrboY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from omf03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id AA49D1212AD; Wed, 21 Aug 2024 23:59:24 +0000 (UTC) Received: from [HIDDEN] (Authenticated sender: Brian.Inglis@SystematicSW.ab.ca) by omf03.hostedemail.com (Postfix) with ESMTPA id 4390E6000D; Wed, 21 Aug 2024 23:59:23 +0000 (UTC) From: Brian Inglis <Brian.Inglis@SystematicSW.ab.ca> To: newlib@sourceware.org (Newlib) Subject: [RFC] newlib/libc/include/langinfo.h: nl_langinfo enum off by one error causing pointer overwrite Date: Wed, 21 Aug 2024 17:58:46 -0600 Message-ID: <d1a0b9da87e90420bbaf192ba01a141386f07966.1724283807.git.Brian.Inglis@SystematicSW.ab.ca> X-Mailer: git-send-email 2.45.1 MIME-Version: 1.0 Organization: Systematic Software Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 4390E6000D X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Stat-Signature: hs7nzydgtktyehk5g9zh8crbuhdh8ugd X-Rspamd-Server: rspamout08 X-Session-Marker: 427269616E2E496E676C69734053797374656D6174696353572E61622E6361 X-Session-ID: U2FsdGVkX18A1wbjFuUqMRcHsnW8YgPP3J2WPhwZo+c= X-HE-Tag: 1724284763-910363 X-HE-Meta: U2FsdGVkX1+guhF9uV2KmsIE0vvPJR/TnW6wQZvZH6CxAKOWKS1fPXVwIOYv15AnHNm36xNa8AKjNef4KNV5N52nE9pRXCdFkYoepxaInceMPRkdABTxVQVktWu5VHheafS57lz1zr1Ah8nrQLTamLOoq+yJ/6z18ldzEZW+G1v55uaCQydZSePv/WGK7Deus6hLxDjq8mmAaqBvgIP3fI6ZiZQ+4YJMdIY9XiTzww0jnesX6Pl3LrvfN6RMWvU/p4/eG5BuAvHozn+9UcuSxAcZLpEdAZdI9xFcYf8/fDrH4p0xAVl1QVMk+sVFtDSSU09iTv7Qepo= X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: newlib@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Newlib mailing list <newlib.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/newlib>, <mailto:newlib-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/newlib/> List-Post: <mailto:newlib@sourceware.org> List-Help: <mailto:newlib-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/newlib>, <mailto:newlib-request@sourceware.org?subject=subscribe> Errors-To: newlib-bounces~patchwork=sourceware.org@sourceware.org |
Series |
[RFC] newlib/libc/include/langinfo.h: nl_langinfo enum off by one error causing pointer overwrite
|
|
Commit Message
Brian Inglis
Aug. 21, 2024, 11:58 p.m. UTC
if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined
instead of _NL_CTYPE_CODESET after _NL_CTYPE_MB_CUR_MAX, and
_NL_MESSAGES_CODESET is omitted before _NL_MESSAGES_WYESEXPR, resulting
in _NL_MESSAGES_WNOSTR pointer being overwritten by _NL_COLLATE_CODESET
pointer
demonstration of pointer overwrite by nl_langinfo dump program generated
with additional langinfo.h hack after fix:
$ diff -4 /usr{/src/newlib-cygwin/newlib/libc,}/include/langinfo.h
--- /usr/src/newlib-cygwin/newlib/libc/include/langinfo.h 2024-08-21 17:11:00.852069900 -0600
+++ /usr/include/langinfo.h 2024-08-21 16:59:59.609998900 -0600
@@ -301,9 +301,9 @@ enum
_NL_MESSAGES_CODESET,
_NL_MESSAGES_WYESEXPR,
_NL_MESSAGES_WNOEXPR,
_NL_MESSAGES_WYESSTR,
- _NL_MESSAGES_WNOSTR,
+#define _NL_MESSAGES_WNOSTR _NL_MESSAGES_WYESSTR
_NL_COLLATE_CODESET,
MESSAGES_CODESET UTF-8
MESSAGES_WYESEXPR ^[+1yY]
MESSAGES_WNOEXPR ^[-0nN]
MESSAGES_WYESSTR yes
79 y 65 e 73 s 0 ^@ 6e n 6f o 0 ^@ 0 ^@ # %hx %C dump
COLLATE_CODESET UTF-8
if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is
also not defined, so it is unclear if _NL_MESSAGES_CODESET should be
defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor
__HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field
depending on those definitions
Signed-off-by: Brian Inglis <Brian.Inglis@SystematicSW.ab.ca>
---
newlib/libc/include/langinfo.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
Hi Brian, On Aug 21 17:58, Brian Inglis wrote: > if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined > instead of _NL_CTYPE_CODESET There is no _NL_CTYPE_CODESET, only CODESET, and the value of _NL_MESSAGES_CODESET is what it is for backward compatibility. The values must not change. > demonstration of pointer overwrite by nl_langinfo dump program generated > with additional langinfo.h hack after fix: Can you please provide your STC? > if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field > depending on those definitions As I wrote above, the values must not change. And given _NL_CTYPE_CODESET doesn't really exist (but CODESET does), I'm not sure I understand the problem here. As on Linux we have a value _NL_CTYPE_CODESET_NAME which is equivalent to CODESET. Corinna
On 2024-08-22 02:07, Corinna Vinschen wrote: > Hi Brian, > > On Aug 21 17:58, Brian Inglis wrote: >> if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined >> instead of _NL_CTYPE_CODESET > > There is no _NL_CTYPE_CODESET, only CODESET, and the value of > _NL_MESSAGES_CODESET is what it is for backward compatibility. > > The values must not change. What about the pointer positions in the array? >> demonstration of pointer overwrite by nl_langinfo dump program generated >> with additional langinfo.h hack after fix: > > Can you please provide your STC? Attached, with log, assuming current langinfo.h. Also attached is my original program, for use with appropriately fixed and/or hacked up langinfo.h, with my nl_items label array adjusted to match. >> if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is >> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be >> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor >> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field >> depending on those definitions > > As I wrote above, the values must not change. And given _NL_CTYPE_CODESET > doesn't really exist (but CODESET does), I'm not sure I understand the > problem here. As on Linux we have a value _NL_CTYPE_CODESET_NAME which > is equivalent to CODESET. An instance of a CODESET item and pointer exists, before the wide messages fields, and not accounted for in the enum defined in langinfo.h, used to retrieve the values by nl_langinfo(3). What is shown in the enum as the _NL_MESSAGE_CODESET item appears to be generated from the LC_CTYPE locale entry, and added to the buffer and pointer array after the _NL_CTYPE_MB_CUR_MAX pointer, before the _NL_CTYPE_OUTDIGIT?_MB item pointers. What is not shown in the enum is the CODESET item generated from the LC_MESSAGES locale entry, and added to the buffer and pointer array after the _NL_MONETARY_WNEGATIVE_SIGN pointer before the _NL_MESSAGES_WYESEXPR pointer. As a result, the underlying pointers in the pointer array are off by one in the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what was stored in the _NL_MESSAGES_WNOSTR position in the pointer array. This is shown below from the STC log attached: you can see in the dump that the underlying data from the current WNOSTR item, has L"yes" followed by L"no" L"", and the following COLLATE_CODESET item looks normal: MESSAGES_WYESEXPR 呕ⵆ8孞ㄫ她佯]孞〭乮]敹s潮帀嬀⬀礀夀漀伀崀 5455 '呕' 2d46 'ⵆ' 0038 '8' 5b5e '孞' 312b 'ㄫ' 5979 '她' 4f6f '佯' 005d ']' 55 'U' 54 'T' 46 'F' 2d '-' 38 '8' 00 '' 5e '^' 5b '[' 2b '+' 31 '1' 79 'y' 59 'Y' 6f 'o' 4f 'O' 5d ']' 00 '' MESSAGES_WNOEXPR ^[+1yYoO] 005e '^' 005b '[' 002b '+' 0031 '1' 0079 'y' 0059 'Y' 006f 'o' 004f 'O' 5e '^' 00 '' 5b '[' 00 '' 2b '+' 00 '' 31 '1' 00 '' 79 'y' 00 '' 59 'Y' 00 '' 6f 'o' 00 '' 4f 'O' 00 '' MESSAGES_WYESSTR ^[-0nN] 005e '^' 005b '[' 002d '-' 0030 '0' 006e 'n' 004e 'N' 005d ']' 0000 '' 5e '^' 00 '' 5b '[' 00 '' 2d '-' 00 '' 30 '0' 00 '' 6e 'n' 00 '' 4e 'N' 00 '' 5d ']' 00 '' 00 '' 00 '' MESSAGES_WNOSTR yes 0079 'y' 0065 'e' 0073 's' 0000 '' 006e 'n' 006f 'o' 0000 '' 0000 '' 79 'y' 00 '' 65 'e' 00 '' 73 's' 00 '' 00 '' 00 '' 6e 'n' 00 '' 6f 'o' 00 '' 00 '' 00 '' 00 '' 00 '' COLLATE_CODESET UTF-8 5455 '呕' 2d46 'ⵆ' 0038 '8' 0000 '' 0000 '' 0000 '' 0000 '' 0000 '' 55 'U' 54 'T' 46 'F' 2d '-' 38 '8' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' 00 '' So the generated locale data contains a (message) codeset preceding the wide message fields, and this is missing from the enum, so that the following item pointers are off by one, and the collate codeset pointer and item look correct, as the pointer clobbered that for WNOSTR. Additional langinfo.h hack after fix to show existing problem: $ diff -4 /usr{/src/newlib-cygwin/newlib/libc,}/include/langinfo.h --- /usr/src/newlib-cygwin/newlib/libc/include/langinfo.h 2024-08-21 17:11:00.852069900 -0600 +++ /usr/include/langinfo.h 2024-08-21 16:59:59.609998900 -0600 @@ -301,9 +301,9 @@ enum _NL_MESSAGES_CODESET, _NL_MESSAGES_WYESEXPR, _NL_MESSAGES_WNOEXPR, _NL_MESSAGES_WYESSTR, - _NL_MESSAGES_WNOSTR, +#define _NL_MESSAGES_WNOSTR _NL_MESSAGES_WYESSTR _NL_COLLATE_CODESET, Demonstration of pointer overwrite by nl_langinfo dump program run with patch and above hack: MESSAGES_CODESET UTF-8 MESSAGES_WYESEXPR ^[+1yY] MESSAGES_WNOEXPR ^[-0nN] MESSAGES_WYESSTR yes 79 y 65 e 73 s 0 ^@ 6e n 6f o 0 ^@ 0 ^@ # %hx %C dump COLLATE_CODESET UTF-8 If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is also not defined, so it is unclear if _NL_MESSAGES_CODESET should be defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field depending on those definitions?
On Aug 22 11:14, Brian Inglis wrote: > On 2024-08-22 02:07, Corinna Vinschen wrote: > > Hi Brian, > > > > On Aug 21 17:58, Brian Inglis wrote: > > > if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined > > > instead of _NL_CTYPE_CODESET > > > > There is no _NL_CTYPE_CODESET, only CODESET, and the value of > > _NL_MESSAGES_CODESET is what it is for backward compatibility. > > > > The values must not change. > > What about the pointer positions in the array? > > > > demonstration of pointer overwrite by nl_langinfo dump program generated > > > with additional langinfo.h hack after fix: > > > > Can you please provide your STC? > > Attached, with log, assuming current langinfo.h. > Also attached is my original program, for use with appropriately fixed > and/or hacked up langinfo.h, with my nl_items label array adjusted to match. > > > > if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is > > > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be > > > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor > > > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field > > > depending on those definitions > > > > As I wrote above, the values must not change. And given _NL_CTYPE_CODESET > > doesn't really exist (but CODESET does), I'm not sure I understand the > > problem here. As on Linux we have a value _NL_CTYPE_CODESET_NAME which > > is equivalent to CODESET. > > An instance of a CODESET item and pointer exists, before the wide messages > fields, and not accounted for in the enum defined in langinfo.h, used to > retrieve the values by nl_langinfo(3). I don't understand this sentence. There is a _NL_MESSAGES_CODESET at this point, and the value is *not* supposed to be a pointer into an array or struct. Whatever is wrong, it's not the values defined in langinfo.h > What is shown in the enum as the _NL_MESSAGE_CODESET item appears to be > generated from the LC_CTYPE locale entry, and added to the buffer and > pointer array after the _NL_CTYPE_MB_CUR_MAX pointer, before the > _NL_CTYPE_OUTDIGIT?_MB item pointers. > > What is not shown in the enum is the CODESET item generated from the > LC_MESSAGES locale entry, and added to the buffer and pointer array after > the _NL_MONETARY_WNEGATIVE_SIGN pointer before the _NL_MESSAGES_WYESEXPR > pointer. > This is a wrong assumption. Compare with newlib/libc/locale/nl_langinfo.c, line 183. > As a result, the underlying pointers in the pointer array are off by one in > the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what > was stored in the _NL_MESSAGES_WNOSTR position in the pointer array. Yes, youre' right, there's an off by one. But this can't be fixed by chaning nl_item values. The culprit is actually in nl_langinfo.c. The internal nl_ext array erronously contains an entry for lc_monetary_T::codeset which is not part of the _NL_LOCALE_EXTENDED entries. This moves the wide char yes/no strings accidentally by one. The patch is simple: diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c index c34a7d131376..4477d833bec1 100644 --- a/newlib/libc/locale/nl_langinfo.c +++ b/newlib/libc/locale/nl_langinfo.c @@ -160,7 +160,6 @@ static struct _nl_item_t _NLITEM (monetary, wmon_thousands_sep), _NLITEM (monetary, wpositive_sign), _NLITEM (monetary, wnegative_sign), - _NLITEM (messages, codeset), _NLITEM (messages, wyesexpr), _NLITEM (messages, wnoexpr), _NLITEM (messages, wyesstr), > If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field > depending on those definitions? No, see above. There is no definition of _NL_CTYPE_CODESET, only _NL_CTYPE_CODESET_NAME == CODESET. _NL_MESSAGES_CODESET is exactly where it has to be. If you start moving _NL_* values around, you *will* break backward compatibility with existing executables. However, there *is* another problem. _NL_MESSAGES_CODESET is defined if __HAVE_LOCALE_INFO__ is defined. OTOH, the internal datatype struct lc_messages_T (see libc/locale/setlocale.h) defines the codeset member only if __HAVE_LOCALE_INFO_EXTENDED__ is defined. This is no problem at all for Cygwin, which defines __HAVE_LOCALE_INFO_EXTENDED__ anyway, but a target only defining __HAVE_LOCALE_INFO__ would probably see an error during compilation because lc_messages_T::codeset will be undefined. This is... weird. Does no other target except Cygwin define __HAVE_LOCALE_INFO__??? Anyway, I'll fix it. Thanks, Corinna
On 2024-08-22 13:13, Corinna Vinschen wrote: > On Aug 22 11:14, Brian Inglis wrote: >> On 2024-08-22 02:07, Corinna Vinschen wrote: >>> Hi Brian, >>> >>> On Aug 21 17:58, Brian Inglis wrote: >>>> if __HAVE_LOCALE_INFO__ is defined, then _NL_MESSAGES_CODESET is defined >>>> instead of _NL_CTYPE_CODESET >>> >>> There is no _NL_CTYPE_CODESET, only CODESET, and the value of >>> _NL_MESSAGES_CODESET is what it is for backward compatibility. >>> >>> The values must not change. >> >> What about the pointer positions in the array? >> >>>> demonstration of pointer overwrite by nl_langinfo dump program generated >>>> with additional langinfo.h hack after fix: >>> >>> Can you please provide your STC? >> >> Attached, with log, assuming current langinfo.h. >> Also attached is my original program, for use with appropriately fixed >> and/or hacked up langinfo.h, with my nl_items label array adjusted to match. >> >>>> if __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is >>>> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be >>>> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor >>>> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field >>>> depending on those definitions >>> >>> As I wrote above, the values must not change. And given _NL_CTYPE_CODESET >>> doesn't really exist (but CODESET does), I'm not sure I understand the >>> problem here. As on Linux we have a value _NL_CTYPE_CODESET_NAME which >>> is equivalent to CODESET. >> >> An instance of a CODESET item and pointer exists, before the wide messages >> fields, and not accounted for in the enum defined in langinfo.h, used to >> retrieve the values by nl_langinfo(3). > > I don't understand this sentence. There is a _NL_MESSAGES_CODESET at this > point, and the value is *not* supposed to be a pointer into an array or > struct. Whatever is wrong, it's not the values defined in langinfo.h > >> What is shown in the enum as the _NL_MESSAGE_CODESET item appears to be >> generated from the LC_CTYPE locale entry, and added to the buffer and >> pointer array after the _NL_CTYPE_MB_CUR_MAX pointer, before the >> _NL_CTYPE_OUTDIGIT?_MB item pointers. >> >> What is not shown in the enum is the CODESET item generated from the >> LC_MESSAGES locale entry, and added to the buffer and pointer array after >> the _NL_MONETARY_WNEGATIVE_SIGN pointer before the _NL_MESSAGES_WYESEXPR >> pointer. >> > > This is a wrong assumption. Compare with newlib/libc/locale/nl_langinfo.c, > line 183. > >> As a result, the underlying pointers in the pointer array are off by one in >> the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what >> was stored in the _NL_MESSAGES_WNOSTR position in the pointer array. > > Yes, youre' right, there's an off by one. But this can't be fixed by > chaning nl_item values. > > The culprit is actually in nl_langinfo.c. The internal nl_ext array > erronously contains an entry for lc_monetary_T::codeset which is not > part of the _NL_LOCALE_EXTENDED entries. This moves the wide char > yes/no strings accidentally by one. The patch is simple: > > diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c > index c34a7d131376..4477d833bec1 100644 > --- a/newlib/libc/locale/nl_langinfo.c > +++ b/newlib/libc/locale/nl_langinfo.c > @@ -160,7 +160,6 @@ static struct _nl_item_t > _NLITEM (monetary, wmon_thousands_sep), > _NLITEM (monetary, wpositive_sign), > _NLITEM (monetary, wnegative_sign), > - _NLITEM (messages, codeset), > _NLITEM (messages, wyesexpr), > _NLITEM (messages, wnoexpr), > _NLITEM (messages, wyesstr), > >> If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is >> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be >> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor >> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field >> depending on those definitions? > > No, see above. There is no definition of _NL_CTYPE_CODESET, only > _NL_CTYPE_CODESET_NAME == CODESET. _NL_MESSAGES_CODESET is exactly > where it has to be. If you start moving _NL_* values around, you *will* > break backward compatibility with existing executables. > > However, there *is* another problem. _NL_MESSAGES_CODESET is defined if > __HAVE_LOCALE_INFO__ is defined. OTOH, the internal datatype struct > lc_messages_T (see libc/locale/setlocale.h) defines the codeset member > only if __HAVE_LOCALE_INFO_EXTENDED__ is defined. This is no problem at > all for Cygwin, which defines __HAVE_LOCALE_INFO_EXTENDED__ anyway, but > a target only defining __HAVE_LOCALE_INFO__ would probably see an > error during compilation because lc_messages_T::codeset will be > undefined. > > This is... weird. > > Does no other target except Cygwin define __HAVE_LOCALE_INFO__??? > > Anyway, I'll fix it. But aren't those entries populated from Windows, where the codeset is copied in winsup/cygwin/nlsfuncs.cc(__set_lc_messages_from_win) https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/nlsfuncs.cc#l1114 into newlib/libc/locale/setlocale.h(lc_messages_t): https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/locale/setlocale.h#l166 which includes codeset?
On Aug 22 15:25, Brian Inglis wrote: > On 2024-08-22 13:13, Corinna Vinschen wrote: > > On Aug 22 11:14, Brian Inglis wrote: > > > As a result, the underlying pointers in the pointer array are off by one in > > > the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what > > > was stored in the _NL_MESSAGES_WNOSTR position in the pointer array. > > > > Yes, youre' right, there's an off by one. But this can't be fixed by > > chaning nl_item values. > > > > The culprit is actually in nl_langinfo.c. The internal nl_ext array > > erronously contains an entry for lc_monetary_T::codeset which is not > > part of the _NL_LOCALE_EXTENDED entries. This moves the wide char > > yes/no strings accidentally by one. The patch is simple: > > > > diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c > > index c34a7d131376..4477d833bec1 100644 > > --- a/newlib/libc/locale/nl_langinfo.c > > +++ b/newlib/libc/locale/nl_langinfo.c > > @@ -160,7 +160,6 @@ static struct _nl_item_t > > _NLITEM (monetary, wmon_thousands_sep), > > _NLITEM (monetary, wpositive_sign), > > _NLITEM (monetary, wnegative_sign), > > - _NLITEM (messages, codeset), > > _NLITEM (messages, wyesexpr), > > _NLITEM (messages, wnoexpr), > > _NLITEM (messages, wyesstr), > > > > > If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is > > > also not defined, so it is unclear if _NL_MESSAGES_CODESET should be > > > defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor > > > __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field > > > depending on those definitions? > > > > No, see above. There is no definition of _NL_CTYPE_CODESET, only > > _NL_CTYPE_CODESET_NAME == CODESET. _NL_MESSAGES_CODESET is exactly > > where it has to be. If you start moving _NL_* values around, you *will* > > break backward compatibility with existing executables. > > [...] > > But aren't those entries populated from Windows, where the codeset is copied in > winsup/cygwin/nlsfuncs.cc(__set_lc_messages_from_win) > > https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/nlsfuncs.cc#l1114 > > into newlib/libc/locale/setlocale.h(lc_messages_t): > > https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/locale/setlocale.h#l166 > > which includes codeset? No. First of all, apart from fetching LANG from the windows locale at startup (see /etc/profile.d/lang.*), the codeset is maintained by Cygwin/newlib alone, via LC_* environment and setlocale() calls in the application. Also, Windows has no GetLocaleInfo call for fetching locale-specific yes/no strings. I. e., there's no equivalent for LC_MESSAGES. Therefore Cygwin fetches the information from static information maintained in winsup/cygwin/local_includes/lc_msg.h. If you look closely into __set_lc_messages_from_win(), you'll notice that the __get_rfc5646_from_locale() function is called from __set_lc_messages_from_win() *only* to check if a matching Windows locale exists. This is a pure sanity check. After that, any further information is taken from the lc_msg struct defined in lc_msg.h, which in turn has been fetched from Linux. See winsup/cygwin/linux-locale-helpers. Does that make sense? Thanks, Corinna
On 2024-08-23 02:39, Corinna Vinschen wrote: > On Aug 22 15:25, Brian Inglis wrote: >> On 2024-08-22 13:13, Corinna Vinschen wrote: >>> On Aug 22 11:14, Brian Inglis wrote: >>>> As a result, the underlying pointers in the pointer array are off by one in >>>> the wide messages fields, and the _NL_COLLATE_CODESET pointer clobbers what >>>> was stored in the _NL_MESSAGES_WNOSTR position in the pointer array. >>> >>> Yes, youre' right, there's an off by one. But this can't be fixed by >>> chaning nl_item values. >>> >>> The culprit is actually in nl_langinfo.c. The internal nl_ext array >>> erronously contains an entry for lc_monetary_T::codeset which is not >>> part of the _NL_LOCALE_EXTENDED entries. This moves the wide char >>> yes/no strings accidentally by one. The patch is simple: >>> >>> diff --git a/newlib/libc/locale/nl_langinfo.c b/newlib/libc/locale/nl_langinfo.c >>> index c34a7d131376..4477d833bec1 100644 >>> --- a/newlib/libc/locale/nl_langinfo.c >>> +++ b/newlib/libc/locale/nl_langinfo.c >>> @@ -160,7 +160,6 @@ static struct _nl_item_t >>> _NLITEM (monetary, wmon_thousands_sep), >>> _NLITEM (monetary, wpositive_sign), >>> _NLITEM (monetary, wnegative_sign), >>> - _NLITEM (messages, codeset), >>> _NLITEM (messages, wyesexpr), >>> _NLITEM (messages, wnoexpr), >>> _NLITEM (messages, wyesstr), >>> >>>> If __HAVE_LOCALE_INFO__ is not defined, then _NL_MESSAGES_CODESET is >>>> also not defined, so it is unclear if _NL_MESSAGES_CODESET should be >>>> defined to _NL_CTYPE_CODESET if neither __HAVE_LOCALE_INFO__ nor >>>> __HAVE_LOCALE_INFO_EXTENDED__ are defined, or added as another field >>>> depending on those definitions? >>> >>> No, see above. There is no definition of _NL_CTYPE_CODESET, only >>> _NL_CTYPE_CODESET_NAME == CODESET. _NL_MESSAGES_CODESET is exactly >>> where it has to be. If you start moving _NL_* values around, you *will* >>> break backward compatibility with existing executables. >>> [...] >> >> But aren't those entries populated from Windows, where the codeset is copied in >> winsup/cygwin/nlsfuncs.cc(__set_lc_messages_from_win) >> >> https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/nlsfuncs.cc#l1114 >> >> into newlib/libc/locale/setlocale.h(lc_messages_t): >> >> https://cygwin.com/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/locale/setlocale.h#l166 >> >> which includes codeset? > > No. First of all, apart from fetching LANG from the windows locale > at startup (see /etc/profile.d/lang.*), the codeset is maintained by > Cygwin/newlib alone, via LC_* environment and setlocale() calls in > the application. > > Also, Windows has no GetLocaleInfo call for fetching locale-specific > yes/no strings. I. e., there's no equivalent for LC_MESSAGES. > > Therefore Cygwin fetches the information from static information > maintained in winsup/cygwin/local_includes/lc_msg.h. > > If you look closely into __set_lc_messages_from_win(), you'll > notice that the __get_rfc5646_from_locale() function is called > from __set_lc_messages_from_win() *only* to check if a matching > Windows locale exists. This is a pure sanity check. After that, > any further information is taken from the lc_msg struct defined > in lc_msg.h, which in turn has been fetched from Linux. See > winsup/cygwin/linux-locale-helpers. > > Does that make sense? Yes - I dug into all those sources and that is not any issue. But __set_lc_messages_from_win() then adds the codeset field before the wide character fields, as with the other Cygwin functions, so would it not be more consistent and simpler to just add an enum entry for the codeset, called something unique including ...CODESET..., keeping the entry and realigning tha last few indexes with the items, as nobody else has noticed any issue in the last 15 years ;^>
On Aug 24 10:54, Brian Inglis wrote: > On 2024-08-23 02:39, Corinna Vinschen wrote: > > No. First of all, apart from fetching LANG from the windows locale > > at startup (see /etc/profile.d/lang.*), the codeset is maintained by > > Cygwin/newlib alone, via LC_* environment and setlocale() calls in > > the application. > > > > Also, Windows has no GetLocaleInfo call for fetching locale-specific > > yes/no strings. I. e., there's no equivalent for LC_MESSAGES. > > > > Therefore Cygwin fetches the information from static information > > maintained in winsup/cygwin/local_includes/lc_msg.h. > > > > If you look closely into __set_lc_messages_from_win(), you'll > > notice that the __get_rfc5646_from_locale() function is called > > from __set_lc_messages_from_win() *only* to check if a matching > > Windows locale exists. This is a pure sanity check. After that, > > any further information is taken from the lc_msg struct defined > > in lc_msg.h, which in turn has been fetched from Linux. See > > winsup/cygwin/linux-locale-helpers. > > > > Does that make sense? > > Yes - I dug into all those sources and that is not any issue. > > But __set_lc_messages_from_win() then adds the codeset field before the wide > character fields, as with the other Cygwin functions, so would it not be > more consistent and simpler to just add an enum entry for the codeset, > called something unique including ...CODESET..., keeping the entry and > realigning tha last few indexes with the items, as nobody else has noticed > any issue in the last 15 years ;^> Again: The values MUST NOT change for backward compat. The order in the nl_item enum doesn't reflect a struct order anyway, because the first items are the standarized ones, and only then the extensions follow with the preference of values already defined with __HAVE_LOCALE_INFO__. Apart from that, the actual numeric values are meaningless. They could have been chosen by throwing a dice. If you feel really strongly about, we can add a comment, like this: diff --git a/newlib/libc/include/langinfo.h b/newlib/libc/include/langinfo.h index 41d090d3710e..509e8fb0e493 100644 --- a/newlib/libc/include/langinfo.h +++ b/newlib/libc/include/langinfo.h @@ -298,6 +298,7 @@ enum _NL_MONETARY_WPOSITIVE_SIGN, _NL_MONETARY_WNEGATIVE_SIGN, + /* _NL_MESSAGES_CODESET already defined earlier for compatibility */ _NL_MESSAGES_WYESEXPR, _NL_MESSAGES_WNOEXPR, _NL_MESSAGES_WYESSTR, Corinna
diff --git a/newlib/libc/include/langinfo.h b/newlib/libc/include/langinfo.h index 41d090d3710e..e383dc59d83f 100644 --- a/newlib/libc/include/langinfo.h +++ b/newlib/libc/include/langinfo.h @@ -179,7 +179,7 @@ enum #ifdef __HAVE_LOCALE_INFO__ _NL_CTYPE_MB_CUR_MAX, - _NL_MESSAGES_CODESET, + _NL_CTYPE_CODESET, #ifdef __HAVE_LOCALE_INFO_EXTENDED__ @@ -298,6 +298,7 @@ enum _NL_MONETARY_WPOSITIVE_SIGN, _NL_MONETARY_WNEGATIVE_SIGN, + _NL_MESSAGES_CODESET, _NL_MESSAGES_WYESEXPR, _NL_MESSAGES_WNOEXPR, _NL_MESSAGES_WYESSTR,