From patchwork Sun Oct 2 14:00:10 2022
X-Patchwork-Submitter: Patrick Monnerat
X-Patchwork-Id: 58270
From: Patrick Monnerat
To: gdb-patches@sourceware.org
Subject: [PATCH] gdb: add UTF16/UTF32 target charsets in phony_iconv
Date: Sun, 2 Oct 2022 16:00:10 +0200
Message-Id: <20221002140010.106238-1-patrick@monnerat.net>

The phony_iconv function is substituted for the system-supplied iconv on
platforms where the latter is deficient. It implements too few
conversions for current gdb requirements. In particular, Ada support in
gdb needs to convert strings to UTF-32, which phony_iconv does not
currently support: as this is used to determine the language, a warning
is issued in all cases.

Conditional statements decide when the substitution occurs. They
currently enable it for mingw (where wchar_t is not UTF-32) even when
the system-supplied iconv is suitable for gdb use.

This patch extends the phony_iconv_open and phony_iconv functions to
support any conversion from/to the host encoding, wchar_t, UTF-16 and
UTF-32, with endianness alternatives. The value returned by
phony_iconv_open is an integer token representing the size and
endianness of both character encodings involved.
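
As an illustration of the token layout described above, here is a minimal standalone sketch. It is not part of the patch: the TOKEN_* constants mirror the macros the patch adds to gdb/charset.c, while Endian, Encoding, make_token, pack_tokens, split_token and max_value are hypothetical names introduced only for this example (the patch itself uses enum bfd_endian and reference parameters). The sketch assumes each encoding is described by its code unit size minus one in the low two bits plus an endianness bit, and that the destination token is shifted above the source token.

```cpp
#include <cassert>
#include <cstddef>

// Constants mirroring the macros the patch adds to gdb/charset.c.
constexpr int TOKEN_BITS = 3;
constexpr int TOKEN_MASK = (1 << TOKEN_BITS) - 1;       // 0b111
constexpr int TOKEN_BIGENDIAN = 1 << (TOKEN_BITS - 1);  // 0b100

// Hypothetical stand-in for gdb's enum bfd_endian.
enum class Endian { Little, Big };

// Hypothetical result type; the patch uses two reference parameters.
struct Encoding
{
  std::size_t size;  // Bytes per code unit: 1, 2 or 4.
  Endian endian;
};

// Build a per-encoding token the way the patch's table does:
// (size - 1) in the low two bits, endianness in the third bit.
constexpr int
make_token (std::size_t size, Endian endian)
{
  return static_cast<int> (size - 1)
         | (endian == Endian::Big ? TOKEN_BIGENDIAN : 0);
}

// Combine destination and source tokens as phony_iconv_open does.
constexpr int
pack_tokens (int totok, int fromtok)
{
  return (totok << TOKEN_BITS) | fromtok;
}

// Inverse of make_token, following phony_split_token in the patch.
constexpr Encoding
split_token (int token)
{
  return { static_cast<std::size_t> ((token & TOKEN_MASK & ~TOKEN_BIGENDIAN) + 1),
           (token & TOKEN_BIGENDIAN) ? Endian::Big : Endian::Little };
}

// Largest value that fits in TOSIZE bytes, i.e. 2^(8 * tosize) - 1.
// The shift is split in two (7 * tosize, then tosize) so that the shift
// count never reaches the width of a 32-bit unsigned long.
constexpr unsigned long
max_value (std::size_t tosize)
{
  return ((1UL << (7 * tosize)) << tosize) - 1;
}

// Host charset (1 byte) to UTF-32BE (4 bytes, big-endian).
constexpr int host_to_utf32be
  = pack_tokens (make_token (4, Endian::Big), make_token (1, Endian::Little));

static_assert (make_token (4, Endian::Big) == 7, "UTF-32BE token");
static_assert (host_to_utf32be == 56, "destination token sits in bits 3..5");
static_assert (split_token (host_to_utf32be).size == 1, "source size");
static_assert (split_token (host_to_utf32be >> TOKEN_BITS).size == 4,
               "destination size");
static_assert (max_value (1) == 0xffUL, "one-byte destination limit");
```

With this layout, a conversion to a one-byte destination rejects any code point above 255, which matches the `c >= 256` check in the code being replaced.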
---
 gdb/charset.c | 132 +++++++++++++++++++++++++++++---------------------
 1 file changed, 76 insertions(+), 56 deletions(-)

diff --git a/gdb/charset.c b/gdb/charset.c
index a6261fc505c..c0528bfb9b5 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -98,27 +98,63 @@
 #undef ICONV_CONST
 #define ICONV_CONST const
 
-/* We allow conversions from UTF-32, wchar_t, and the host charset.
-   We allow conversions to wchar_t and the host charset.
-   Return 1 if we are converting from UTF-32BE, 2 if from UTF32-LE,
-   0 otherwise.  This is used as a flag in calls to iconv.  */
+/* We allow conversions from/to UTF-16, UTF-32, wchar_t, and the host charset.
+   Return a token representing the conversion or -1 if error.  The token
+   is unpacked in iconv.  */
 
-static iconv_t
-phony_iconv_open (const char *to, const char *from)
+#define TOKEN_BITS 3
+#define TOKEN_MASK ((1 << TOKEN_BITS) - 1)
+#define TOKEN_BIGENDIAN (1 << (TOKEN_BITS - 1))
+
+static int
+phony_iconv_token (const char *encoding)
 {
-  if (strcmp (to, "wchar_t") && strcmp (to, GDB_DEFAULT_HOST_CHARSET))
-    return -1;
+  static struct
+  {
+    const char *name;
+    int token;
+  } const encodings[] =
+  {
+    { "UTF-16", (2 - 1) | TOKEN_BIGENDIAN },
+    { "UTF-16BE", (2 - 1) | TOKEN_BIGENDIAN },
+    { "UTF-16LE", (2 - 1) },
+    { "UTF-32", (4 - 1) | TOKEN_BIGENDIAN },
+    { "UTF-32BE", (4 - 1) | TOKEN_BIGENDIAN },
+    { "UTF-32LE", (4 - 1) },
+    { GDB_DEFAULT_HOST_CHARSET, (1 - 1) },
+#if WORDS_BIGENDIAN
+    { "wchar_t", (sizeof (gdb_wchar_t) - 1) | TOKEN_BIGENDIAN },
+#else
+    { "wchar_t", (sizeof (gdb_wchar_t) - 1) },
+#endif
+    { NULL, -1 }
+  };
 
-  if (!strcmp (from, "UTF-32BE") || !strcmp (from, "UTF-32"))
-    return 1;
+  for (auto p = encodings; p->name; p++)
+    if (strcmp (encoding, p->name) == 0)
+      return p->token;
 
-  if (!strcmp (from, "UTF-32LE"))
-    return 2;
+  return -1;
+}
 
-  if (strcmp (from, "wchar_t") && strcmp (from, GDB_DEFAULT_HOST_CHARSET))
+static void
+phony_split_token (int token, size_t &size, enum bfd_endian &endian)
+{
+  /* Extract parameter values from the token.  */
+  endian = token & TOKEN_BIGENDIAN ? BFD_ENDIAN_BIG : BFD_ENDIAN_LITTLE;
+  size = (token & TOKEN_MASK & ~TOKEN_BIGENDIAN) + 1;
+}
+
+static iconv_t
+phony_iconv_open (const char *to, const char *from)
+{
+  int totok = phony_iconv_token (to);
+  int fromtok = phony_iconv_token (from);
+
+  if (totok < 0 || fromtok < 0)
     return -1;
 
-  return 0;
+  return (totok << TOKEN_BITS) | fromtok;
 }
 
 static int
@@ -128,60 +164,44 @@ phony_iconv_close (iconv_t arg)
 }
 
 static size_t
-phony_iconv (iconv_t utf_flag, const char **inbuf, size_t *inbytesleft,
+phony_iconv (iconv_t token, const char **inbuf, size_t *inbytesleft,
              char **outbuf, size_t *outbytesleft)
 {
-  if (utf_flag)
+  enum bfd_endian toendian, fromendian;
+  size_t tosize, fromsize;
+  unsigned long maxval;
+
+  if(token & ~((TOKEN_MASK << TOKEN_BITS) | TOKEN_MASK))
     {
-      enum bfd_endian endian
-        = utf_flag == 1 ? BFD_ENDIAN_BIG : BFD_ENDIAN_LITTLE;
-      while (*inbytesleft >= 4)
-        {
-          unsigned long c
-            = extract_unsigned_integer ((const gdb_byte *)*inbuf, 4, endian);
+      errno = EBADF;
+      return (size_t) -1;
+    }
 
-          if (c >= 256)
-            {
-              errno = EILSEQ;
-              return -1;
-            }
-          if (*outbytesleft < 1)
-            {
-              errno = E2BIG;
-              return -1;
-            }
-          **outbuf = c & 0xff;
-          ++*outbuf;
-          --*outbytesleft;
+  phony_split_token (token, fromsize, fromendian);
+  phony_split_token (token >> TOKEN_BITS, tosize, toendian);
+  maxval = 1UL << (7 * tosize); /* Split shift to avoid count overflow.  */
+  maxval = (maxval << tosize) - 1;
 
-          *inbuf += 4;
-          *inbytesleft -= 4;
-        }
-      if (*inbytesleft)
+  while (*inbytesleft >= fromsize)
+    {
+      unsigned long c = extract_unsigned_integer ((const gdb_byte *) *inbuf,
+                                                  fromsize, fromendian);
+
+      if (c > maxval)
         {
-          /* Partial sequence on input.  */
-          errno = EINVAL;
+          errno = EILSEQ;
           return -1;
         }
-    }
-  else
-    {
-      /* In all other cases we simply copy input bytes to the
-         output.  */
-      size_t amt = *inbytesleft;
-
-      if (amt > *outbytesleft)
-        amt = *outbytesleft;
-      memcpy (*outbuf, *inbuf, amt);
-      *inbuf += amt;
-      *outbuf += amt;
-      *inbytesleft -= amt;
-      *outbytesleft -= amt;
-      if (*inbytesleft)
+      if (*outbytesleft < tosize)
         {
           errno = E2BIG;
           return -1;
         }
+      store_unsigned_integer ((gdb_byte *) *outbuf, tosize, toendian, c);
+      *inbuf += fromsize;
+      *inbytesleft -= fromsize;
+      *outbuf += tosize;
+      *outbytesleft -= tosize;
     }
 
   /* The number of non-reversible conversions -- but they were all