| Message ID | 9cea6e5e-796d-4e61-9c49-5bd050d80081@towo.net |
|---|---|
| State | New |
| Headers | Message-ID: <9cea6e5e-796d-4e61-9c49-5bd050d80081@towo.net>; Date: Mon, 2 Feb 2026 14:21:57 +0100; From: Thomas Wolff <towo@towo.net>; To: newlib@sourceware.org; Subject: [PATCH] handle casing of i/I in Turkic languages; List-Id: Newlib mailing list <newlib.sourceware.org> |
| Series |
handle casing of i/I in Turkic languages
|
|
Commit Message
Thomas Wolff
Feb. 2, 2026, 1:21 p.m. UTC
From 99df2722cb39dcf0290dfcd14d7e81ec87853e15 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Mon, 2 Feb 2026 00:00:00 +0000
Subject: [PATCH] towupper/towlower: handle special casing of "i"/"I" for
Turkic languages
---
newlib/libc/ctype/towctrans.c | 2 +-
newlib/libc/ctype/towctrans_l.c | 23 +++++++++++++++++++----
2 files changed, 20 insertions(+), 5 deletions(-)
Comments
Hi Thomas,

thanks for the patch. Just one point:

On Feb 2 14:21, Thomas Wolff wrote:
> From 99df2722cb39dcf0290dfcd14d7e81ec87853e15 Mon Sep 17 00:00:00 2001
> From: Thomas Wolff <towo@towo.net>
> Date: Mon, 2 Feb 2026 00:00:00 +0000
> Subject: [PATCH] towupper/towlower: handle special casing of "i"/"I" for
> Turkic languages

This is a non-trivial patch, so it would be nice to have a non-trivial
commit message explaining what you're doing and why. To the uninformed
reader, the explicit exception for the Turkic languages looks arbitrary.

Care to add a bit of text?

Thanks,
Corinna
On 02.02.2026 at 14:45, Corinna Vinschen wrote:
> This is a non-trivial patch, so it would be nice to have a non-trivial
> commit message explaining what you're doing and why. To the uninformed
> reader, the explicit exception for the Turkic languages looks arbitrary.
>
> Care to add a bit of text?

Quite a bit, very well, as attached.
Thomas

From 99df2722cb39dcf0290dfcd14d7e81ec87853e15 Mon Sep 17 00:00:00 2001
From: Thomas Wolff <towo@towo.net>
Date: Mon, 2 Feb 2026 00:00:00 +0000
Subject: [PATCH] towupper/towlower: handle Turkic language special casing

For case conversion, Unicode has a standard mapping and a separate list
of mapping rules for special cases (file SpecialCasing.txt), some of
which are also language-dependent (as configured via locale).
However, most of these rules are context-dependent, e.g. Greek capital
Sigma is lowered to two different small sigmas, depending on whether it
is at the end of a word.
The POSIX API functions towupper and towlower cannot consider context
as they work only on one character at a time. String casing functions
are unfortunately not available.
The only special case conversions that apply to a single character
are i and I in Turkish and Azerbaijani, where i keeps the dot when
capitalised (U+0130) and I stays dotless when converted to lowercase
(U+0131).
The patch handles these special cases, based on locale consideration.
---
 newlib/libc/ctype/towctrans.c   |  2 +-
 newlib/libc/ctype/towctrans_l.c | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/newlib/libc/ctype/towctrans.c b/newlib/libc/ctype/towctrans.c
index 176aa3d9d..2cd34184e 100644
--- a/newlib/libc/ctype/towctrans.c
+++ b/newlib/libc/ctype/towctrans.c
@@ -81,7 +81,7 @@ _towctrans_r (struct _reent *r,
               wctrans_t w)
 {
   if (w == WCT_TOLOWER || w == WCT_TOUPPER)
-    return towctrans_l (c, w, 0);
+    return towctrans_l (c, w, LC_GLOBAL_LOCALE);
   else
     {
       // skipping this because it was causing trouble (cygwin crash)
diff --git a/newlib/libc/ctype/towctrans_l.c b/newlib/libc/ctype/towctrans_l.c
index e94d6f492..2b843f302 100644
--- a/newlib/libc/ctype/towctrans_l.c
+++ b/newlib/libc/ctype/towctrans_l.c
@@ -72,9 +72,21 @@ bisearch (wint_t ucs, const struct caseconv_entry *table, int max)
   return 0;
 }
 
+static int
+isturk (struct __locale_t *locale)
+{
+  const char * loc = getlocalename_l (LC_CTYPE, locale);
+  if (!loc)
+    return 0;
+  return 0 == strncmp (loc, "tr", 2) || 0 == strncmp (loc, "az", 2);
+}
+
 static wint_t
-toulower (wint_t c)
+toulower (wint_t c, struct __locale_t *locale)
 {
+  if (c == 'I' && isturk (locale))
+    return 0x131; // LATIN SMALL LETTER DOTLESS I
+
   const struct caseconv_entry * cce =
     bisearch(c, caseconv_table,
              sizeof(caseconv_table) / sizeof(*caseconv_table) - 1);
@@ -108,8 +120,11 @@ toulower (wint_t c)
 }
 
 static wint_t
-touupper (wint_t c)
+touupper (wint_t c, struct __locale_t *locale)
 {
+  if (c == 'i' && isturk (locale))
+    return 0x130; // LATIN CAPITAL LETTER I WITH DOT ABOVE
+
   const struct caseconv_entry * cce =
     bisearch(c, caseconv_table,
              sizeof(caseconv_table) / sizeof(*caseconv_table) - 1);
@@ -151,9 +166,9 @@ towctrans_l (wint_t c, wctrans_t w, struct __locale_t *locale)
   wint_t u = _jp2uc_l (c, locale);
   wint_t res;
   if (w == WCT_TOLOWER)
-    res = toulower (u);
+    res = toulower (u, locale);
   else if (w == WCT_TOUPPER)
-    res = touupper (u);
+    res = touupper (u, locale);
   else
     {
       // skipping the errno setting that was previously involved
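The commit message's point about context dependence can be illustrated with a small sketch. It is not part of the patch or of newlib: `is_greek_letter` and `lower_sigma_at` are hypothetical names, and the word-boundary test is a deliberate simplification of the Final_Sigma condition in SpecialCasing.txt. It shows that choosing between the two small sigmas needs the neighbouring characters, which a single-character function such as towlower never sees.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, simplified: treats anything in the Greek and
   Greek Extended blocks as a letter. */
static int
is_greek_letter (wchar_t c)
{
  return (c >= 0x370 && c <= 0x3FF) || (c >= 0x1F00 && c <= 0x1FFF);
}

/* Hypothetical string-level lowering of GREEK CAPITAL LETTER SIGMA
   (U+03A3) found at s[i] in a string of len characters.
   Returns U+03C2 (final sigma) when the sigma ends a word, that is,
   when it is preceded by a letter and not followed by one;
   returns U+03C3 (ordinary small sigma) otherwise. */
static wchar_t
lower_sigma_at (const wchar_t *s, size_t len, size_t i)
{
  int preceded = i > 0 && is_greek_letter (s[i - 1]);
  int followed = i + 1 < len && is_greek_letter (s[i + 1]);

  return (preceded && !followed) ? 0x3C2 : 0x3C3;
}
```

For the three-character word U+03A3 U+039F U+03A3, this sketch lowers the first sigma to U+03C3 and the last one to U+03C2. The decision depends on `s` and `i` together, so it cannot be expressed through an interface that receives only one `wint_t`.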
On Feb 2 18:05, Thomas Wolff wrote:
> On 02.02.2026 at 14:45, Corinna Vinschen wrote:
> > This is a non-trivial patch, so it would be nice to have a non-trivial
> > commit message explaining what you're doing and why. To the uninformed
> > reader, the explicit exception for the Turkic languages looks arbitrary.
> >
> > Care to add a bit of text?
> Quite a bit, very well, as attached.

Looks great. Pushed.

Thanks,
Corinna
Hi Thomas,

Is upper casing with accents in fr-CA vs without in fr-FR handled?

On 2026-02-02 10:05, Thomas Wolff wrote:
> On 02.02.2026 at 14:45, Corinna Vinschen wrote:
>> Hi Thomas,
>> thanks for the patch. Just one point:
>> This is a non-trivial patch, so it would be nice to have a non-trivial
>> commit message explaining what you're doing and why. To the uninformed
>> reader, the explicit exception for the Turkic languages looks arbitrary.
>> Care to add a bit of text?
> Quite a bit, very well, as attached.
[IDK if Thomas is subscribed. CC'ing him here...]

On Feb 2 23:08, Brian Inglis wrote:
> Hi Thomas,
>
> Is upper casing with accents in fr-CA vs without in fr-FR handled?
>
> --
> Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada
>
> La perfection est atteinte                   Perfection is achieved
> non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
> mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
>                                 -- Antoine de Saint-Exupéry
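The single-character rule that the patch adds can be sketched outside newlib as follows. This is only an illustration under stated assumptions: `is_turkic_name`, `sketch_towupper` and `sketch_towlower` are hypothetical names, the locale is passed as a plain name string rather than a locale object, and the standard `towupper`/`towlower` stand in for newlib's internal table lookup. The name check is also slightly stricter than the two-character prefix comparison in the patch: it requires the language code to end after `tr` or `az`, so that a longer code such as `trv` would not match.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: nonzero if a locale name such as "tr_TR.UTF-8"
   or "az" selects Turkish or Azerbaijani.  Assumption: the language
   code is followed by the end of the string or by one of "_.@-". */
static int
is_turkic_name (const char *name)
{
  if (!name)
    return 0;
  if (strncmp (name, "tr", 2) != 0 && strncmp (name, "az", 2) != 0)
    return 0;
  return name[2] == '\0' || strchr ("_.@-", name[2]) != NULL;
}

/* Upper-case one character; in a Turkic locale, 'i' keeps its dot. */
static wint_t
sketch_towupper (wint_t c, const char *locale_name)
{
  if (c == L'i' && is_turkic_name (locale_name))
    return 0x130; /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
  return towupper (c);
}

/* Lower-case one character; in a Turkic locale, 'I' stays dotless. */
static wint_t
sketch_towlower (wint_t c, const char *locale_name)
{
  if (c == L'I' && is_turkic_name (locale_name))
    return 0x131; /* LATIN SMALL LETTER DOTLESS I */
  return towlower (c);
}
```

With this sketch, `sketch_towupper (L'i', "tr_TR.UTF-8")` yields U+0130 and `sketch_towlower (L'I', "az_AZ")` yields U+0131, while any other locale name falls through to the ordinary mapping.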