Message ID | 20221117194241.1776125-8-simon.marchi@efficios.com |
---|---|
State | New |
Headers |
Subject: [PATCH 7/8] gdbserver: switch to right process in find_one_thread
Date: Thu, 17 Nov 2022 14:42:40 -0500
From: Simon Marchi via Gdb-patches <gdb-patches@sourceware.org>
Reply-To: Simon Marchi <simon.marchi@efficios.com>
To: gdb-patches@sourceware.org
In-Reply-To: <20221117194241.1776125-1-simon.marchi@efficios.com> |
Series |
Fix some commit_resumed_state assertion failures (PR 28275)
|
|
Commit Message
Simon Marchi
Nov. 17, 2022, 7:42 p.m. UTC
When I do this:

    $ ./gdb -nx --data-directory=data-directory -q \
        /bin/sleep \
        -ex "maint set target-non-stop on" \
        -ex "tar ext :1234" \
        -ex "set remote exec-file /bin/sleep" \
        -ex "run 1231 &" \
        -ex add-inferior \
        -ex "inferior 2"
    Reading symbols from /bin/sleep...
    (No debugging symbols found in /bin/sleep)
    Remote debugging using :1234
    Starting program: /bin/sleep 1231
    Reading /lib64/ld-linux-x86-64.so.2 from remote target...
    warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
    Reading /lib64/ld-linux-x86-64.so.2 from remote target...
    Reading /usr/lib/debug/.build-id/a6/7a1408f18db3576757eea210d07ba3fc560dff.debug from remote target...
    [New inferior 2]
    Added inferior 2 on connection 1 (extended-remote :1234)
    [Switching to inferior 2 [<null>] (<noexec>)]
    (gdb) Reading /lib/x86_64-linux-gnu/libc.so.6 from remote target...
    attach 3659848
    Attaching to process 3659848
    /home/smarchi/src/binutils-gdb/gdb/thread.c:85: internal-error: inferior_thread: Assertion `current_thread_ != nullptr' failed.

The internal error of GDB is actually caused by GDBserver crashing, and
the error recovery of GDB is not on point. This patch aims to fix just
the GDBserver crash, not the GDB problem.

GDBserver crashes with a segfault here:

    (gdb) bt
    #0  0x00005555557fb3f4 in find_one_thread (ptid=...) at /home/smarchi/src/binutils-gdb/gdbserver/thread-db.cc:177
    #1  0x00005555557fd5cf in thread_db_thread_handle (ptid=<error reading variable: Cannot access memory at address 0xffffffffffffffa0>, handle=0x7fffffffc400, handle_len=0x7fffffffc3f0) at /home/smarchi/src/binutils-gdb/gdbserver/thread-db.cc:461
    #2  0x000055555578a0b6 in linux_process_target::thread_handle (this=0x5555558a64c0 <the_x86_target>, ptid=<error reading variable: Cannot access memory at address 0xffffffffffffffa0>, handle=0x7fffffffc400, handle_len=0x7fffffffc3f0) at /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:6905
    #3  0x00005555556dfcc6 in handle_qxfer_threads_worker (thread=0x60b000000510, buffer=0x7fffffffc8a0) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:1645
    #4  0x00005555556e00e6 in operator() (__closure=0x7fffffffc5e0, thread=0x60b000000510) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:1696
    #5  0x00005555556f54be in for_each_thread<handle_qxfer_threads_proper(buffer*)::<lambda(thread_info*)> >(struct {...}) (func=...) at /home/smarchi/src/binutils-gdb/gdbserver/gdbthread.h:159
    #6  0x00005555556e0242 in handle_qxfer_threads_proper (buffer=0x7fffffffc8a0) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:1694
    #7  0x00005555556e04ba in handle_qxfer_threads (annex=0x629000000213 "", readbuf=0x621000019100 '\276' <repeats 200 times>..., writebuf=0x0, offset=0, len=4097) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:1732
    #8  0x00005555556e1989 in handle_qxfer (own_buf=0x629000000200 "qXfer:threads", packet_len=26, new_packet_len_p=0x7fffffffd630) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:2045
    #9  0x00005555556e720a in handle_query (own_buf=0x629000000200 "qXfer:threads", packet_len=26, new_packet_len_p=0x7fffffffd630) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:2685
    #10 0x00005555556f1a01 in process_serial_event () at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4176
    #11 0x00005555556f4457 in handle_serial_event (err=0, client_data=0x0) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4514
    #12 0x0000555555820f56 in handle_file_event (file_ptr=0x607000000250, ready_mask=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573
    #13 0x0000555555821895 in gdb_wait_for_event (block=1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694
    #14 0x000055555581f533 in gdb_do_one_event (mstimeout=-1) at /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:264
    #15 0x00005555556ec9fb in start_event_loop () at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3512
    #16 0x00005555556f0769 in captured_main (argc=4, argv=0x7fffffffe0d8) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3992
    #17 0x00005555556f0e3f in main (argc=4, argv=0x7fffffffe0d8) at /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4078

The reason is a wrong current process when find_one_thread is called.
The current process is the 2nd one, which was just attached. It does
not yet have thread_db data (proc->priv->thread_db is nullptr). As we
iterate over all threads of all processes to fulfill the
qxfer:threads:read request, we get to a thread of process 1 for which we
haven't read thread_db information yet (lwp_info::thread_known is
false), so we get into find_one_thread. find_one_thread uses
`current_process ()->priv->thread_db`, assuming the current process
matches the ptid passed as a parameter, which is wrong. A segfault
happens when trying to dereference that thread_db pointer.

Fix this by making find_one_thread not assume what the current process /
current thread is. If it needs to call into libthread_db, which we know
will try to read memory from the current process, then temporarily set
the current process.

In the case where the thread is already known and we return early, we
don't need to switch process.

I hit this case when running the test included with the following patch,
"gdb: disable commit resumed in target_kill", so the fix is exercised by
that test.

Change-Id: I09b00883e8b73b7e5f89d0f47cb4e9c0f3d6caaa
---
 gdbserver/thread-db.cc | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
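The fix described above can be illustrated outside of GDBserver with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `thread_db_state`, `process_info`, `current_proc`, `all_processes`, `scoped_restore_current_process` and `find_one_thread_sketch` are simplified models written for this illustration, not GDBserver's real types or helpers. It only shows the idea of looking up the process that owns the requested pid, making it current for the duration of the lookup, and restoring the previous current process afterwards.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical stand-in for the per-process libthread_db state.
struct thread_db_state
{
  int lookups = 0;
};

// Hypothetical stand-in for a debugged process.
struct process_info
{
  int pid;
  thread_db_state *thread_db; // nullptr until libthread_db is set up
};

// Model of the global "current process" and of the process table.
process_info *current_proc = nullptr;
std::map<int, process_info *> all_processes;

// RAII helper (assumption): restores the current process on scope exit,
// including when an exception propagates.
struct scoped_restore_current_process
{
  scoped_restore_current_process () : m_saved (current_proc) {}
  ~scoped_restore_current_process () { current_proc = m_saved; }

  scoped_restore_current_process
    (const scoped_restore_current_process &) = delete;
  scoped_restore_current_process &operator=
    (const scoped_restore_current_process &) = delete;

private:
  process_info *m_saved;
};

// Use the process that owns PID rather than whatever happens to be
// current, and only switch for the duration of the lookup.
int
find_one_thread_sketch (int pid)
{
  auto it = all_processes.find (pid);
  assert (it != all_processes.end ());
  process_info *proc = it->second;

  scoped_restore_current_process restore;
  current_proc = proc;

  if (proc->thread_db == nullptr)
    throw std::runtime_error ("process has no thread_db state");

  proc->thread_db->lookups++; // stands in for the libthread_db calls
  return 1;
}
```

With process 2 current and lacking thread_db state, a lookup for process 1 in this model succeeds and leaves process 2 current afterwards, which is the behaviour the patch aims for.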
Comments
Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

> I hit this case when running the test included with the following patch,
> "gdb: disable commit resumed in target_kill", so the fix is exercised by
> that test.

I'm not able to reproduce the failure you describe. I've applied this
series except for this patch, and run the test from patch #8 with
native-extended-gdbserver board, and it passes just fine.

Actually, I do see an issue with the test, but that issue doesn't seem
to be related to this problem, and is present with and without this
patch - I'll reply to patch #8 describing that issue.

Is this bug intermittent? Or is it likely to depend on certain parts of
the environment? I got the impression from the description that if
we did the steps in the right order then we'd get a nullptr dereference,
which didn't feel like an intermittent issue, but maybe I don't
understand correctly.

That said, the change itself looks reasonable - but it would be nice to
know why I can't reproduce the failure.

Thanks,
Andrew
On 11/18/22 08:19, Andrew Burgess via Gdb-patches wrote:
> I'm not able to reproduce the failure you describe. I've applied this
> series except for this patch, and run the test from patch #8 with
> native-extended-gdbserver board, and it passes just fine.
>
> Actually, I do see an issue with the test, but that issue doesn't seem
> to be related to this problem, and is present with and without this
> patch - I'll reply to patch #8 describing that issue.
>
> Is this bug intermittent? Or is it likely to depend on certain parts of
> the environment? I got the impression from the description that if
> we did the steps in the right order then we'd get a nullptr dereference,
> which didn't feel like an intermittent issue, but maybe I don't
> understand correctly.

The crash reproduces all the time for me with the instructions provided
in the commit message, I think it's deterministic. However, I got
confused by my own instructions :P. There's an "attach" command buried
in there that is easy to miss. It's on that attach that things fail.
If I put that attach on the command line, I don't get the crash (more on
this later).

I looked into it a bit more, it takes a relatively precise context for
this to reproduce:

 - The first process must be far enough to have loaded its libc or
   libpthread (whatever triggers the loading of libthread_db), such that
   its proc->priv->thread_db is not nullptr

 - However, its lwp must still be in the `!lwp->thread_known` state,
   meaning GDBserver hasn't asked libthread_db to compute the thread
   handle yet. That means, GDB must not have refreshed the thread list
   yet, since that would cause the thread handles to be computed. That
   means, no stopping on a breakpoint, since that causes a thread list
   update. That's why the first inferior needs to be started with
   "run &". It hits some internal breakpoints when shared libraries are
   loaded, GDBserver asks for the symbols necessary to load
   libthread_db, but GDB never asks for a full thread list.

 - The attach then causes GDB to ask for a thread list update here:

       #18 0x000055ac22438062 in remote_target::update_thread_list (this=0x61700003d800) at /home/simark/src/binutils-gdb/gdb/remote.c:3946
       #19 0x000055ac224445df in extended_remote_target::attach (this=0x61700003d800, args=0x602000077950 "936019", from_tty=1) at /home/simark/src/binutils-gdb/gdb/remote.c:6166
       #20 0x000055ac21e1ffe2 in attach_command (args=0x602000077950 "936019", from_tty=1) at /home/simark/src/binutils-gdb/gdb/infcmd.c:2644

   At this point, the current process is the second one (the one we
   attach to). Since we are early in the attach process, this process'
   proc->priv->thread_db is still nullptr. In thread_db_thread_handle,
   while handling a thread from the first process, we determined
   (correctly) that this thread's process uses thread_db but this
   thread's handle is not known yet. But then in find_one_thread, we use
   the current process, the second process, and try to dereference its
   nullptr proc->priv->thread_db.

If I put the attach on the command line, GDB doesn't go back to the
event loop between the "run &" and the "attach", so it won't handle
events for the first process, nor do the qSymbol dance, so the first
process' thread_db is still nullptr when doing the attach. And we don't
see the bug because thread_db_thread_handle will return early and not
call find_one_thread.

Ah, I know now why you don't see the crash when running the test without
this patch applied. It's because I later modified the test to grab the
first process' pid. To do so, it runs to some spot (after a getpid
call), reads a variable and then does "continue &". The breakpoint stop
causes a thread list update, and breaks the second condition listed
above.

I will make a dedicated test for this specific bug then and include it
in a v2 for this patch.

> That said, the change itself looks reasonable - but it would be nice to
> know why I can't reproduce the failure.

You're right to ask, it made me look at it more and understand the
conditions more clearly.

Simon
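The three conditions listed in the reply above can be condensed into a small predicate. This is only a model for reasoning about the reproducer, assuming the pre-patch control flow is as described in the thread; `proc_model`, `lwp_model` and `pre_patch_would_crash` are hypothetical names that do not exist in GDBserver.

```cpp
#include <cassert>

// Hypothetical model: does the process have thread_db state loaded?
struct proc_model
{
  bool has_thread_db;
};

// Hypothetical model: has the thread handle already been computed?
struct lwp_model
{
  bool thread_known;
};

// Returns true when the pre-patch code path described in the thread
// would dereference a null thread_db pointer.
bool
pre_patch_would_crash (const proc_model &thread_proc,
                       const lwp_model &lwp,
                       const proc_model &current_proc)
{
  // The thread's own process has no thread_db yet: the caller returns
  // early and find_one_thread is never reached.
  if (!thread_proc.has_thread_db)
    return false;

  // Handle already computed (e.g. after a thread list refresh):
  // find_one_thread returns before touching thread_db.
  if (lwp.thread_known)
    return false;

  // Pre-patch, thread_db was taken from the current process, which may
  // differ from the thread's process and have no thread_db yet.
  return !current_proc.has_thread_db;
}
```

Under this model, the "run &" followed by a later "attach" corresponds to the one combination that returns true, while a breakpoint stop in the first process (which sets `thread_known`) or an attach issued before the first process loads libthread_db both fall into the early-return cases.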
diff --git a/gdbserver/thread-db.cc b/gdbserver/thread-db.cc
index 6e0e2228a5f..bf98ca9557a 100644
--- a/gdbserver/thread-db.cc
+++ b/gdbserver/thread-db.cc
@@ -155,30 +155,35 @@ thread_db_state_str (td_thr_state_e state)
 }
 #endif
 
-/* Get thread info about PTID, accessing memory via the current
-   thread.  */
+/* Get thread info about PTID.  */
 
 static int
 find_one_thread (ptid_t ptid)
 {
-  td_thrhandle_t th;
-  td_thrinfo_t ti;
-  td_err_e err;
-  struct lwp_info *lwp;
-  struct thread_db *thread_db = current_process ()->priv->thread_db;
-  int lwpid = ptid.lwp ();
-
   thread_info *thread = find_thread_ptid (ptid);
-  lwp = get_thread_lwp (thread);
+  lwp_info *lwp = get_thread_lwp (thread);
   if (lwp->thread_known)
     return 1;
 
-  /* Get information about this thread.  */
-  err = thread_db->td_ta_map_lwp2thr_p (thread_db->thread_agent, lwpid, &th);
+  /* Get information about this thread.  libthread_db will need to read some
+     memory, which will be done on the current process, so make PTID's process
+     the current one.  */
+  process_info *proc = find_process_pid (ptid.pid ());
+  gdb_assert (proc != nullptr);
+
+  scoped_restore_current_thread restore_thread;
+  switch_to_process (proc);
+
+  thread_db *thread_db = proc->priv->thread_db;
+  td_thrhandle_t th;
+  int lwpid = ptid.lwp ();
+  td_err_e err = thread_db->td_ta_map_lwp2thr_p (thread_db->thread_agent, lwpid,
+                                                 &th);
   if (err != TD_OK)
     error ("Cannot get thread handle for LWP %d: %s",
            lwpid, thread_db_err_str (err));
 
+  td_thrinfo_t ti;
   err = thread_db->td_thr_get_info_p (&th, &ti);
   if (err != TD_OK)
     error ("Cannot get thread info for LWP %d: %s",