From patchwork Thu Dec 31 16:48:07 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sandra Loosemore
X-Patchwork-Id: 10187
Received: (qmail 9095 invoked by alias); 31 Dec 2015 16:48:53 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
Delivered-To: mailing list gdb-patches@sourceware.org
Received: (qmail 9076 invoked by uid 89); 31 Dec 2015 16:48:52 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2
 spammy=iso88591, iso-8859-1, H*r:Thu, Match
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Thu, 31 Dec 2015 16:48:50 +0000
Received: from svr-orw-fem-03.mgc.mentorg.com ([147.34.97.39])
 by relay1.mentorg.com with esmtp id 1aEgP8-0006jw-RH
 from Sandra_Loosemore@mentor.com for gdb-patches@sourceware.org;
 Thu, 31 Dec 2015 08:48:46 -0800
Received: from [IPv6:::1] (147.34.91.1) by svr-orw-fem-03.mgc.mentorg.com
 (147.34.97.39) with Microsoft SMTP Server id 14.3.224.2;
 Thu, 31 Dec 2015 08:48:46 -0800
To: gdb-patches
From: Sandra Loosemore
Subject: [patch] fix phony_iconv wide character support
Message-ID: <56855C47.7090004@codesourcery.com>
Date: Thu, 31 Dec 2015 09:48:07 -0700
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0

A while back I submitted a patch to skip gdb.base/wchar.exp tests when
GDB is built with PHONY_ICONV enabled:

https://sourceware.org/ml/gdb-patches/2015-09/msg00388.html

The consensus from the discussion seemed to be that it would be better
to fix the PHONY_ICONV support instead, so that's what this patch does.
I found a whole bunch of bugs in the existing code....

(1) phony_iconv_open had code to recognize "UTF-32BE" as a wide
character set, but GDB_DEFAULT_TARGET_WIDE_CHARSET was set to
"ISO-8859-1" instead, so the decoding wasn't getting triggered.

(2) phony_iconv wasn't advancing *inbuf properly after processing each
4-byte group, plus it wasn't checking for output overflow.  I don't see
how the old code could ever have worked even if you explicitly chose
UTF-32BE.

(3) I was doing experiments on a little-endian target
(nios2-linux-gnu), so I had to extend the conversion code to handle
both endiannesses.

(4) On Windows host, the system default (narrow) charset is CP1252
rather than ISO-8859-1; this affects the behavior of functions like
isprint, and in particular what is printed for the character constant
cent ('\242') used in the testcase.  The testcase already checks for
CP1252 but GDB wasn't reporting the default correctly.

So, my patch fixes all these things and the gdb.base/wchar.exp tests
now all PASS in this configuration.  OK to commit?

-Sandra

diff --git a/gdb/charset.c b/gdb/charset.c
index ee1ae20..82e5644 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -77,9 +77,13 @@
    arrange for there to be a single available character set.  */
 
 #undef GDB_DEFAULT_HOST_CHARSET
+#ifdef USE_WIN32API
+#define GDB_DEFAULT_HOST_CHARSET "CP1252"
+#else
 #define GDB_DEFAULT_HOST_CHARSET "ISO-8859-1"
-#define GDB_DEFAULT_TARGET_CHARSET "ISO-8859-1"
-#define GDB_DEFAULT_TARGET_WIDE_CHARSET "ISO-8859-1"
+#endif
+#define GDB_DEFAULT_TARGET_CHARSET GDB_DEFAULT_HOST_CHARSET
+#define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
 #undef DEFAULT_CHARSET_NAMES
 #define DEFAULT_CHARSET_NAMES GDB_DEFAULT_HOST_CHARSET ,
 
@@ -95,20 +99,27 @@
 #undef ICONV_CONST
 #define ICONV_CONST const
 
+/* We allow conversions from UTF-32, wchar_t, and the host charset.
+   We allow conversions to wchar_t and the host charset
+   Return 1 if we are converting from UTF-32BE, 2 if from UTF32-LE,
+   0 otherwise.  This is used as a flag in calls to iconv.  */
+
 static iconv_t
 phony_iconv_open (const char *to, const char *from)
 {
-  /* We allow conversions from UTF-32BE, wchar_t, and the host charset.
-     We allow conversions to wchar_t and the host charset.  */
-  if (strcmp (from, "UTF-32BE") && strcmp (from, "wchar_t")
-      && strcmp (from, GDB_DEFAULT_HOST_CHARSET))
-    return -1;
   if (strcmp (to, "wchar_t") && strcmp (to, GDB_DEFAULT_HOST_CHARSET))
     return -1;
 
-  /* Return 1 if we are converting from UTF-32BE, 0 otherwise.  This is
-     used as a flag in calls to iconv.  */
-  return !strcmp (from, "UTF-32BE");
+  if (!strcmp (from, "UTF-32BE") || !strcmp (from, "UTF-32"))
+    return 1;
+
+  if (!strcmp (from, "UTF-32LE"))
+    return 2;
+
+  if (strcmp (from, "wchar_t") && strcmp (from, GDB_DEFAULT_HOST_CHARSET))
+    return -1;
+
+  return 0;
 }
 
 static int
@@ -130,8 +141,17 @@ phony_iconv (iconv_t utf_flag, const char **inbuf, size_t *inbytesleft,
 
           for (j = 0; j < 4; ++j)
             {
-              c <<= 8;
-              c += (*inbuf)[j] & 0xff;
+              if (utf_flag == 1)
+                {
+                  /* Big-endian.  */
+                  c <<= 8;
+                  c += (*inbuf)[j] & 0xff;
+                }
+              else
+                {
+                  /* Little-endian.  */
+                  c += ((*inbuf)[j] & 0xff) << (8 * j);
+                }
             }
 
           if (c >= 256)
@@ -139,15 +159,21 @@ phony_iconv (iconv_t utf_flag, const char **inbuf, size_t *inbytesleft,
               errno = EILSEQ;
               return -1;
             }
+          if (*outbytesleft < 1)
+            {
+              errno = E2BIG;
+              return -1;
+            }
           **outbuf = c & 0xff;
           ++*outbuf;
           --*outbytesleft;
 
-          ++*inbuf;
+          *inbuf += 4;
           *inbytesleft -= 4;
         }
-      if (*inbytesleft < 4)
+      if (*inbytesleft)
         {
+          /* Partial sequence on input.  */
           errno = EINVAL;
           return -1;
         }
@@ -165,12 +191,11 @@ phony_iconv (iconv_t utf_flag, const char **inbuf, size_t *inbytesleft,
       *outbuf += amt;
       *inbytesleft -= amt;
       *outbytesleft -= amt;
-    }
-
-  if (*inbytesleft)
-    {
-      errno = E2BIG;
-      return -1;
+      if (*inbytesleft)
+        {
+          errno = E2BIG;
+          return -1;
+        }
     }
 
   /* The number of non-reversible conversions -- but they were all
@@ -290,6 +315,10 @@ set_be_le_names (struct gdbarch *gdbarch)
     return;
   be_le_arch = gdbarch;
 
+#ifdef PHONY_ICONV
+  target_wide_charset_le_name = "UTF-32LE";
+  target_wide_charset_be_name = "UTF-32BE";
+#else
   target_wide_charset_le_name = NULL;
   target_wide_charset_be_name = NULL;
 
@@ -313,6 +342,7 @@ set_be_le_names (struct gdbarch *gdbarch)
             target_wide_charset_le_name = charset_enum[i];
         }
     }
+# endif  /* PHONY_ICONV */
 }
 
 /* 'Set charset', 'set host-charset', 'set target-charset', 'set
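
For anyone who wants to poke at the decoding logic outside of GDB, here
is a rough standalone sketch of what the UTF-32 branch is meant to do
after items (2) and (3): assemble each 4-byte group in either byte
order, reject anything that does not fit in one byte, check for output
space, and advance the input by 4.  The function name utf32_to_latin1
and its signature are made up for illustration and are not part of
charset.c; it takes plain buffers and lengths rather than the iconv-style
pointer-to-pointer arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper (not GDB code): convert UTF-32 input to
   single-byte output.  BIG_ENDIAN_INPUT is nonzero for UTF-32BE and
   zero for UTF-32LE.  Returns the number of bytes written, or
   (size_t) -1 with errno set to EILSEQ, E2BIG, or EINVAL.  */
size_t
utf32_to_latin1 (const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t out_len,
                 int big_endian_input)
{
  size_t written = 0;

  assert (in != NULL && out != NULL);

  while (in_len >= 4)
    {
      unsigned long c = 0;
      int j;

      for (j = 0; j < 4; ++j)
        {
          if (big_endian_input)
            /* Big-endian: most significant byte comes first.  */
            c = (c << 8) | in[j];
          else
            /* Little-endian: byte J carries bits 8*J .. 8*J+7.  */
            c |= (unsigned long) in[j] << (8 * j);
        }

      if (c >= 256)
        {
          /* Code point cannot be represented in one byte.  */
          errno = EILSEQ;
          return (size_t) -1;
        }
      if (written == out_len)
        {
          /* No room left in the output buffer.  */
          errno = E2BIG;
          return (size_t) -1;
        }

      out[written++] = (unsigned char) c;
      in += 4;          /* Advance past the whole 4-byte group.  */
      in_len -= 4;
    }

  if (in_len != 0)
    {
      /* Partial sequence on input.  */
      errno = EINVAL;
      return (size_t) -1;
    }

  return written;
}
```

With this sketch, the big-endian bytes 00 00 00 A2 and the little-endian
bytes A2 00 00 00 both decode to the single byte 0xA2, which is the cent
character the testcase uses.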