From patchwork Mon Jul 17 20:47:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom Tromey
X-Patchwork-Id: 72812
From: Tom Tromey
To: gdb-patches@sourceware.org
Cc: Tom Tromey
Subject: [RFC] Filter invalid encodings from Linux thread names
Date: Mon, 17 Jul 2023 14:47:56 -0600
Message-ID: <20230717204756.1880945-1-tom@tromey.com>
X-Mailer: git-send-email 2.41.0
List-Id: Gdb-patches mailing list

On Linux, a thread name can only be 16 bytes (including the trailing
\0).  A user sent in a test case where this limit truncates a UTF-8
sequence, which causes gdbserver to create invalid XML.

I went back and forth about different ways to solve this, and in the
end decided to fix it in gdbserver, because it seems important to
generate correct XML for the response.

I am not totally sure whether the call to setlocale could have
unplanned consequences.  The call is needed, though, for nl_langinfo
to return the correct result.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618
---
 gdbserver/linux-low.cc | 59 ++++++++++++++++++++++++++++++++++++++++--
 gdbserver/server.cc    |  1 +
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 651f219b738..98bb345b415 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -38,14 +38,16 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
 #include
+#include
+#include
 #include "gdbsupport/filestuff.h"
+#include "gdbsupport/gdb-safe-ctype.h"
 #include "tracepoint.h"
 #include
 #include "gdbsupport/common-inferior.h"
@@ -6898,10 +6900,63 @@ current_lwp_ptid (void)
   return ptid_of (current_thread);
 }
 
+/* A helper function that copies NAME to DEST, replacing non-printable
+   characters with '?'.  Returns DEST as a convenience.  */
+
+static const char *
+replace_non_ascii (char *dest, const char *name)
+{
+  while (*name != '\0')
+    {
+      if (!ISPRINT (*name))
+        *dest++ = '?';
+      else
+        *dest++ = *name;
+      ++name;
+    }
+  return dest;
+}
+
 const char *
 linux_process_target::thread_name (ptid_t thread)
 {
-  return linux_proc_tid_get_name (thread);
+  static char dest[100];
+
+  const char *name = linux_proc_tid_get_name (thread);
+  if (name == nullptr)
+    return nullptr;
+
+  /* Linux limits the comm file to 16 bytes (including the trailing
+     \0).  If the program or thread name is set when using a multi-byte
+     encoding, this might cause it to be truncated mid-character.  In
+     this situation, sending the truncated form in an XML
+     response will cause a parse error in gdb.  So, instead convert
+     from the locale's encoding (we can't be sure this is the correct
+     encoding, but it's as good a guess as we have) to UTF-8, but in a
+     way that ignores any encoding errors.  See PR remote/30618.  */
+  const char *cset = nl_langinfo (CODESET);
+  iconv_t handle = iconv_open ("UTF-8//IGNORE", cset);
+  if (handle == (iconv_t) -1)
+    return replace_non_ascii (dest, name);
+
+  size_t inbytes = strlen (name);
+  char *inbuf = const_cast<char *> (name);
+  size_t outbytes = sizeof (dest);
+  char *outbuf = dest;
+  size_t result = iconv (handle, &inbuf, &inbytes, &outbuf, &outbytes);
+
+  if (result == (size_t) -1)
+    {
+      if (errno == E2BIG)
+        outbuf = &dest[sizeof (dest) - 1];
+      else if ((errno == EILSEQ || errno == EINVAL)
+               && outbuf < &dest[sizeof (dest) - 2])
+        *outbuf++ = '?';
+      *outbuf = '\0';
+    }
+
+  iconv_close (handle);
+  return *dest == '\0' ? nullptr : dest;
 }
 
 #if USE_THREAD_DB
diff --git a/gdbserver/server.cc b/gdbserver/server.cc
index c57270175b4..f6eb01af204 100644
--- a/gdbserver/server.cc
+++ b/gdbserver/server.cc
@@ -4056,6 +4056,7 @@ captured_main (int argc, char *argv[])
 int
 main (int argc, char *argv[])
 {
+  setlocale (LC_CTYPE, "");
 
   try
     {
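
For illustration only, here is a small standalone sketch of the
truncation problem and of the ASCII fallback path.  It is not part of
the patch.  The names comm_size, truncate_like_comm,
replace_non_printable and self_check are hypothetical, and the sketch
uses std::isprint rather than ISPRINT from gdb-safe-ctype.h.  It also
differs from replace_non_ascii as posted: it bounds the copy,
NUL-terminates DEST, and returns the start of DEST rather than the
advanced pointer, which is probably what a caller wants.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: models the 16-byte comm limit described above
// (15 bytes of name plus the trailing NUL).
constexpr std::size_t comm_size = 16;

// Mimic the truncation: keep at most comm_size - 1 bytes, even if
// that cuts a multi-byte UTF-8 sequence in half.
static std::string
truncate_like_comm (const std::string &name)
{
  return name.substr (0, comm_size - 1);
}

// Copy NAME to DEST, replacing every byte that is not printable ASCII
// with '?'.  At most DEST_SIZE - 1 bytes are copied, DEST is always
// NUL-terminated (when DEST_SIZE > 0), and the start of DEST is
// returned.
static const char *
replace_non_printable (char *dest, std::size_t dest_size, const char *name)
{
  if (dest_size == 0)
    return dest;

  char *out = dest;
  char *end = dest + dest_size - 1;
  for (; *name != '\0' && out < end; ++name)
    {
      unsigned char c = static_cast<unsigned char> (*name);
      *out++ = (c < 0x80 && std::isprint (c)) ? *name : '?';
    }
  *out = '\0';
  return dest;
}

// Hypothetical self-check: "threads-" followed by four two-byte
// characters is 16 bytes, so the truncation splits the last one.
static void
self_check ()
{
  std::string t
    = truncate_like_comm ("threads-\xc3\xa9\xc3\xa9\xc3\xa9\xc3\xa9");
  assert (t.size () == 15);
  // The final byte is a lead byte with no continuation byte.
  assert (static_cast<unsigned char> (t.back ()) == 0xc3);

  char buf[100];
  const char *clean = replace_non_printable (buf, sizeof (buf), t.c_str ());
  assert (std::strcmp (clean, "threads-???????") == 0);
}
```

The fallback is lossy, since every non-ASCII byte becomes '?', which
is why the patch only uses it when iconv_open fails.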