gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step
Message ID | 20230712032540.3110113-1-zhiyong.yan@windriver.com |
---|---|
State | New |
Headers |
From: <zhiyong.yan@windriver.com> To: <gdb-patches@sourceware.org> CC: <luis.machado@arm.com>, <kevinb@redhat.com>, <tom@tromey.com>, <zhiyong.yan@windriver.com> Subject: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step Date: Wed, 12 Jul 2023 11:25:40 +0800 Message-ID: <20230712032540.3110113-1-zhiyong.yan@windriver.com> |
Series |
gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step
|
|
Commit Message
Yan, Zhiyong
July 12, 2023, 3:25 a.m. UTC
From: Zhiyong Yan <zhiyong.yan@windriver.com>

GDB should not assume that a pending thread always generates "a non-gdbserver trap event"; for example, a "Signal 17" event can happen instead. Because resume_stopped_resumed_lwps() -> maybe_hw_step() assumes that the single-step breakpoint already exists, resume_one_thread() should ensure that the software single-step breakpoint is installed even though the thread has a pending status.

Signed-off-by: Zhiyong Yan <zhiyong.yan@windriver.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
---
gdbserver/linux-low.cc | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
Comments
Hi Zhiyong,

I set up a Raspberry Pi running a recent 32-bit Raspberry Pi OS so that I could test your patch. I was able to build and run your test case, but I could not reproduce the bug on the Pi.

I tested gdb.threads/*.exp using --target_board=native-gdbserver both with and without your patch. Some of these tests are racy, but my conclusion from just looking at the PASSes and FAILs (after many test runs) is that there are no regressions.

But then I remembered to enable core dumps on the Pi and after running gdb.threads/pending-fork-event-detach/pending-fork-event-detach-main-vfork by itself, I saw that it left a core file...

    $ make check RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.threads/pending-fork-event-detach.exp
    ...
    === gdb Summary ===

    # of unexpected core files 1
    # of expected passes 240

The core file was from the running test case, not gdbserver, nor gdb.

Looking at the core file in GDB shows...

    Program terminated with signal SIGTRAP, Trace/breakpoint trap.
    #0 0x00010624 in break_here () at /mesquite2/sourceware-git/rpi-gdbserver/bld/../../worktree-gdbserver/gdb/testsuite/gdb.threads/pending-fork-event-detach.c:29
    29 x++;
    [Current thread is 1 (Thread 0xf7e10440 (LWP 4835))]
    (gdb) x/i $pc
    => 0x10624 <break_here+12>: udf #16
    (gdb) x/x $pc
    0x10624 <break_here+12>: 0xe7f001f0

...and in gdbserver/linux-aarch32-low.cc:

    #define arm_eabi_breakpoint 0xe7f001f0UL

I think what's happened here is that the breakpoint added by your patch is left in place when GDB detaches the test case. When it starts running again, it hits the software single step breakpoint and, since it's no longer under GDB control, it dies with a SIGTRAP.

This core file is not created when I run the test using a gdbserver without your patch.

I'm suspicious of the assert in linux_process_target::maybe_hw_step. Currently, it looks like this:

    bool
    linux_process_target::maybe_hw_step (thread_info *thread)
    {
      if (supports_hardware_single_step ())
        return true;
      else
        {
          /* GDBserver must insert single-step breakpoint for software
             single step.  */
          gdb_assert (has_single_step_breakpoints (thread));
          return false;
        }
    }

But, when Yao Qi introduced it back in June, 2016, it looked like this:

    static int
    maybe_hw_step (struct thread_info *thread)
    {
      if (can_hardware_single_step ())
        return 1;
      else
        {
          struct process_info *proc = get_thread_process (thread);

          /* GDBserver must insert reinsert breakpoint for software
             single step.  */
          gdb_assert (has_reinsert_breakpoints (proc));
          return 0;
        }
    }

So, back in 2016, when it was introduced, it's clear that the assert was referring to breakpoints which needed to be reinserted. Now, that's not at all obvious.

Also, back in 2016, maybe_hw_step() was only called from two locations; in each case it was in a block in which the condition lwp->bp_reinsert != 0 was true. But now there are two other calls; in one case, the software single step breakpoints have just been inserted, so that should be okay, but for the other case, in linux_process_target::resume_stopped_resumed_lwps, I'm less certain.

In any case, could you comment out (or delete) the assert in a version of the source without your patch and let me know what happens?

Also, if possible, I'd like to see a backtrace from where the assert occurs so that I can see which call to maybe_hw_step is responsible for triggering the failing assert.

Kevin
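To make the invariant under discussion concrete, here is a minimal toy model in C++. It is not gdbserver code: ToyThread, ToyTarget, resume_unpatched, resume_patched and left_behind_on_detach are hypothetical names invented for illustration, and the "unpatched" path is only an assumption based on the commit message (a thread with a pending status is resumed for stepping without a single-step breakpoint being installed). An exception stands in for gdb_assert so that the failure can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stand-in for per-thread state; NOT gdbserver's thread_info.
struct ToyThread {
    bool status_pending = false;               // an event is already queued
    bool wants_step = false;                   // models last_resume_kind == resume_step
    std::vector<std::uint32_t> step_bp_addrs;  // installed single-step breakpoints
};

// Toy stand-in for a target that lacks hardware single-step.
struct ToyTarget {
    bool hw_single_step = false;

    // Same shape as maybe_hw_step: true means "use hardware stepping".
    // The real function uses gdb_assert; throwing keeps the sketch testable.
    bool maybe_hw_step(const ToyThread &t) const {
        if (hw_single_step)
            return true;
        if (t.step_bp_addrs.empty())
            throw std::logic_error("has_single_step_breakpoints (thread) failed");
        return false;
    }

    // Assumed pre-patch behaviour: a pending thread gets no breakpoint.
    void resume_unpatched(ToyThread &t, std::uint32_t next_pc) const {
        t.wants_step = true;
        if (t.status_pending)
            return;
        if (!hw_single_step)
            t.step_bp_addrs.push_back(next_pc);
    }

    // Assumed post-patch behaviour: install the breakpoint even when pending.
    void resume_patched(ToyThread &t, std::uint32_t next_pc) const {
        t.wants_step = true;
        if (!hw_single_step)
            t.step_bp_addrs.push_back(next_pc);
    }
};

// Breakpoint instructions that would stay in the inferior if detach did not
// remove them (the hypothesis offered above for the SIGTRAP core file).
std::size_t left_behind_on_detach(const ToyThread &t) {
    return t.step_bp_addrs.size();
}
```

In this model, calling maybe_hw_step on a pending thread after resume_unpatched throws, which corresponds to the reported assertion failure. After resume_patched it returns false, but left_behind_on_detach then reports one breakpoint, which is the situation suspected of producing the SIGTRAP after detach.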
Hi Kevin,
The call stack of the assert is attached.
Please see the attached gdbx2, which adds more 'n' commands. On an ARM platform, repeatedly executing the 'n' command with this test case can trigger the assertion failure.
Today I didn't finish setting up the test environment on the Raspberry Pi 4. Previously, I reproduced this issue on a Xilinx ARM platform.
Best Regards,
Zhiyong
maybe_hw_step: Assertion `has_single_step_breakpoints (thread)' failed.
Aborted (core dumped)
EC@mpr3.1> ls -l
total 7464
-rw------- 1 root root 630784 Jun 28 12:14 core-gdbserver-11475-1656418477
-rwxr-xr-x 1 root root 7265572 Jun 28 12:12 gdbserver
EC@mpr3.1> gdb ./gdbserver ./core-gdbserver-11475-1656418477
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-wrs-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./gdbserver...
[New LWP 11475]
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `./gdbserver --once --attach :1234 325'.
Program terminated with signal SIGABRT, Aborted.
#0 0x76ca6216 in ?? () from /lib/libc.so.6
(gdb) bt
#0 0x76ca6216 in ?? () from /lib/libc.so.6
#1 0x76cb49d4 in raise () from /lib/libc.so.6
#2 0x76ca5ca0 in abort () from /lib/libc.so.6
#3 0x004a8280 in abort_or_exit () at ../../gdb-13.0.50.20221021/gdbserver/utils.cc:39
#4 internal_verror (file=<optimized out>, line=line@entry=2448, fmt=0x0, fmt@entry=0x7ee8c2bc "\210\316N", args=...,
args@entry=...) at ../../gdb-13.0.50.20221021/gdbserver/utils.cc:108
#5 0x004d83ce in internal_error_loc (file=<optimized out>, line=line@entry=2448, fmt=0x4e4bc8 "%s: Assertion `%s' failed.")
at ../../gdb-13.0.50.20221021/gdbsupport/errors.cc:58
#6 0x004c5c28 in linux_process_target::maybe_hw_step (this=<optimized out>, thread=<optimized out>)
at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2448
#7 linux_process_target::maybe_hw_step (this=<optimized out>, thread=<optimized out>)
at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2440
#8 0x004c660a in linux_process_target::resume_stopped_resumed_lwps (this=this@entry=0x50a3cc <the_arm_target>, thread=0x209fb28)
at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2466
#9 0x004c7614 in <lambda(thread_info*)>::operator() (__closure=<synthetic pointer>, thread=<optimized out>)
at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2606
#10 for_each_thread<linux_process_target::wait_for_event_filtered(ptid_t, ptid_t, int*, int)::<lambda(thread_info*)> > (func=...)
at ../../gdb-13.0.50.20221021/gdbserver/gdbthread.h:159
#11 linux_process_target::wait_for_event_filtered (this=this@entry=0x50a3cc <the_arm_target>, wait_ptid=..., filter_ptid=...,
wstatp=0x7ee8c544, options=1073741824) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2604
#12 0x004c9426 in linux_process_target::wait_for_event (options=1079026852, wstatp=0x7ee8c544, ptid=...,
this=0x50a3cc <the_arm_target>) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2676
#13 linux_process_target::wait_1 (this=this@entry=0x50a3cc <the_arm_target>, ptid=...,
ourstatus=ourstatus@entry=0x50c9f8 <g_client_state+1288>, target_options=..., target_options@entry=...)
at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2970
#14 0x004cab6c in linux_process_target::wait (this=0x50a3cc <the_arm_target>, ptid=..., ourstatus=0x50c9f8 <g_client_state+1288>,
target_options=...) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:3624
#15 0x004b9778 in target_wait (options=..., status=0x50c9f8 <g_client_state+1288>, ptid=...)
at ../../gdb-13.0.50.20221021/gdbserver/target.cc:197
#16 mywait (ptid=..., ourstatus=ourstatus@entry=0x50c9f8 <g_client_state+1288>, options=...,
connected_wait=connected_wait@entry=1) at ../../gdb-13.0.50.20221021/gdbserver/target.cc:142
#17 0x004b3428 in resume (actions=<optimized out>, num_actions=<optimized out>)
at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2916
#18 resume (actions=0x20a6778, num_actions=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2888
#19 0x004b3ed0 in handle_v_cont (own_buf=0x20a6778 "E\001") at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2875
#20 handle_v_requests (own_buf=own_buf@entry=0x208b0a8 "vCont;r45b444,45b44c:p145.18a;c:p145.-1", packet_len=packet_len@entry=39, new_packet_len=new_packet_len@entry=0x7ee8c8ac) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3135
#21 0x004b6b76 in process_serial_event () at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4481
#22 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4513
#23 0x004d8904 in gdb_wait_for_event (block=block@entry=1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:694
#24 0x004d9034 in gdb_wait_for_event (block=1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:593
#25 gdb_do_one_event (mstimeout=mstimeout@entry=-1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:264
#26 0x004a8a6c in start_event_loop () at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3511
#27 captured_main (argv=<optimized out>, argc=5) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3991
#28 main (argc=5, argv=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4077
(gdb)
(gdb)
(gdb) info thread
Id Target Id Frame
* 1 LWP 11475 0x76ca6216 in ?? () from /lib/libc.so.6
(gdb)
Hi Zhiyong,

I looked at the backtrace that you provided and see that maybe_hw_step() is being called from linux_process_target::resume_stopped_resumed_lwps, which is the one location where I wasn't able to convince myself that the assert should hold.

I was running your test case executable (osm) as an unprivileged user, so neither the syslog calls nor the sudo were working. (Sudo could perhaps work, but it wanted to prompt for a password and stdin and stdout were closed.) I've since modified it so that sudo isn't used and I'm using 'fprintf(stderr, ...)' instead of syslog - which is how I discovered that sudo wasn't working. I've tried next'ing quite a lot, but so far I haven't reproduced the bug. (Hopefully, the sudo isn't required to reproduce the problem.)

If you manage to reproduce the bug on a Raspberry Pi 4 (and tell me how to do it), that'd be great!

So, what I'm doing, using three separate terminals, in an attempt to reproduce the bug is:

1) Run osm in terminal 1. (I didn't want to mess with systemd.) Once I start running it, I see a bunch of messages from the dd command.

2) In terminal 2, I run:

    /path/to/gdbserver --debug --debug-format=all --remote-debug --event-loop-debug --once --attach :1234 $(pgrep osm)

3) In terminal 3, I run:

    /path/to/gdb osm -x ./gdbx2

   (I've changed the target remote command in gdbx2 to refer to localhost.)

I'm also attaching my hacked lupdated.c. If you see anything wrong with what I'm trying, please let me know.

Kevin

On Mon, 24 Jul 2023 13:36:24 +0000
"Yan, Zhiyong" <Zhiyong.Yan@windriver.com> wrote:

> Hi Kevin,
> The callstack of assert is attached.
> Please see attached gdbx2 which add more 'n' commands, on arm platform, keep execute 'n' command, this test case can trigger assert error.
>
> Today, I didn't finish setting up test environments on RaspBerry Pi4. Before I produced this issue on Xilinx arm platform.
>
> Best Regards.
> Zhiyong
Hi Zhiyong,

One problem that I encountered on my Pi, which may explain the behavior that you're seeing, is that recent 32-bit versions of the Raspberry Pi OS are running a 64-bit/aarch64 kernel, but the userland is 32-bit.

    root@rpi4-2:~# /usr/bin/uname -m
    aarch64
    root@rpi4-2:~# file /usr/bin/ls
    /usr/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=81004d065160807541b79235b23eea0e00a2d44e, for GNU/Linux 3.2.0, stripped

Note that uname -m returns aarch64, but that "ls" and other executables are "ELF 32-bit ...".

The binutils-gdb configury uses uname -m to figure out for what gdbserver host/target to build. (host and target must be the same, otherwise gdbserver won't build.) So it may be the case that you built an aarch64 gdbserver instead of an arm gdbserver. I think you can check this as follows:

    kev@rpi4-2:/mesquite2/sourceware-git/rpi-master/bld/gdbserver$ ls linux-{arm,aarch}*
    linux-aarch32-low.o linux-aarch32-tdesc.o linux-arm-low.o linux-arm-tdesc.o

If you also/instead see linux-aarch64-low.o and linux-aarch64-tdesc.o in that list, then you probably have an aarch64 gdbserver.

When I tried my build with uname -m returning aarch64, the build errored out because (I think) I was missing certain aarch64 header files. But I knew that I didn't want to build for aarch64, so I abandoned that build.

What I ended up doing was making a wrapper for uname which substituted 'arm' for 'aarch64'. I put it in /usr/local/bin, and /usr/local/bin is early in my PATH, so the configury finds it first...

    root@rpi4-2:~# uname -m
    arm

Here's my /usr/local/bin/uname script:

- - - -
root@rpi4-2:~# cat /usr/local/bin/uname
#!/bin/bash
/usr/bin/uname $* | sed -e s/aarch64/arm/
- - - -

[ Yes, this is a hack, but I couldn't think of a cleaner way to do it. I tried a configure line with "--host=arm-linux --target=host-linux", but that didn't work because something in the build wanted arm-linux-ar to exist and it didn't. I could have made some symlinks, e.g. "ln -s /usr/bin/ar /usr/local/bin/arm-linux-ar", with similar symlinks for gcc, g++, ln, ranlib, etc, but that seemed like more work than my uname wrapper hack.]

I just checked my gdbserver build. It's definitely getting into arm_target::supports_hardware_single_step:

    Breakpoint 1, linux_process_target::maybe_hw_step (
        this=0x8645c <the_arm_target>, thread=0x9df38)
        at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-low.cc:2442
    2442 if (supports_hardware_single_step ())
    (gdb) s
    arm_target::supports_hardware_single_step (this=0x8645c <the_arm_target>)
        at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-arm-low.cc:1042
    1042 return false;

Kevin

On Tue, 25 Jul 2023 04:21:00 +0000
"Yan, Zhiyong" <Zhiyong.Yan@windriver.com> wrote:

> Hi Kevin,
> I test gdb11 on RaspBerry Pi4.
> As you said, I can't produce this assert issue.
> The direct reason is because supports_hardware_single_step () returns on RaspBerry Pi4, not like xilinux-zynq.
> Please see attached pictures, we can see arm_target::supports_hardware_single_step () is never entered.
> This assert only happens when supports_hardware_single_step () returns 'false'. On Raspberry Pi4, when I hardcoded supports_hardware_single_step () returns 'false', then assert happened.
> For more information about " This assert only happens when supports_hardware_single_step () returns 'false'".
> You can check https://sourceware.org/bugzilla/show_bug.cgi?id=30387
>
> So, the new question is why arm_target::supports_hardware_single_step () is never entered on Raspberry Pi4.
>
> Best Regards.
> Zhiyong
Hi Kevin,
Below is my PI's info:
root@bcm-2xxx-rpi4:~# uname -a
Linux bcm-2xxx-rpi4 5.15.110-yocto-standard #1 SMP PREEMPT Wed May 3 01:43:11 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
root@bcm-2xxx-rpi4:~# file gdbserver
gdbserver: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-aarch64.so.1, BuildID[sha1]=1e6ee58be0809d620fbdf3f86c8c4541f01ad9e9, for GNU/Linux 3.14.0, with debug_info, not stripped
root@bcm-2xxx-rpi4:~# file osm
osm: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=da7ee15aec73080ae3954d803b542bd9e8185c44, for GNU/Linux 3.14.0, with debug_info, not stripped
root@bcm-2xxx-rpi4:~#
Both user app and OS are aarch64.
[zyan1] On this, gdbserver supports hardware single step, assert doesn't happen.
-----------------------------------------------------------
Below is my Xilinx-zynq info:
Last login: Wed Jul 26 03:03:09 2023
root@xilinx-zynq:~# uname -a
Linux xilinx-zynq 5.15.106-yocto-standard #1 SMP PREEMPT Tue Apr 11 03:06:10 UTC 2023 armv7l armv7l armv7l GNU/Linux
root@xilinx-zynq:~# which ls
/bin/ls
root@xilinx-zynq:~# file /bin/ls
/bin/ls: symbolic link to /bin/ls.coreutils
root@xilinx-zynq:~# file /bin/ls.coreutils
/bin/ls.coreutils: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=f8bf6bdad65965d53cdd9fd1ebfd00d191f4cbbc, for GNU/Linux 3.2.0, stripped
root@xilinx-zynq:~#
[zyan1] On this, gdbserver doesn't support hardware single step, assert can happen.
Best Regards.
-----Original Message-----
From: Kevin Buettner <kevinb@redhat.com>
Sent: Tuesday, July 25, 2023 2:32 PM
To: Yan, Zhiyong <Zhiyong.Yan@windriver.com>
Cc: gdb-patches@sourceware.org; luis.machado@arm.com; tom@tromey.com
Subject: Re: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step
CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi Zhiyong,
One problem that I encountered on my Pi, which may explain the behavior that you're seeing, is that recent 32-bit versions of the Raspberry Pi OS are running a 64-bit/aarch64 kernel, but the userland is 32-bit.
root@rpi4-2:~# /usr/bin/uname -m
aarch64
root@rpi4-2:~# file /usr/bin/ls
/usr/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=81004d065160807541b79235b23eea0e00a2d44e, for GNU/Linux 3.2.0, stripped
Note that uname -m returns aarch64, but that "ls" and other executables are "ELF 32-bit ...".
The binutils-gdb configury uses uname -m to figure out for what gdbserver host/target to build. (host and target must be the same, otherwise gdbserver won't build.) So it may be the case that you built an aarch64 gdbserver instead of an arm gdbserver. I think you can check this as follows:
kev@rpi4-2:/mesquite2/sourceware-git/rpi-master/bld/gdbserver$ ls linux-{arm,aarch}*
linux-aarch32-low.o linux-aarch32-tdesc.o linux-arm-low.o linux-arm-tdesc.o
If you also/instead see linux-aarch64-low.o and linux-aarch64-tdesc.o in that list, then you probably have an aarch64 gdbserver.
When I tried my build with uname -m returning aarch64, the build errored out because (I think) I was missing certain aarch64 header files. But I knew that I didn't want to build for aarch64, so I abandoned that build.
What I ended up doing was making a wrapper for uname which substituted 'arm' for 'aarch64'. I put it in /usr/local/bin, and /usr/local/bin is early in my PATH, so the configury finds it first...
root@rpi4-2:~# uname -m
arm
Here's my /usr/local/bin/uname script:
- - - -
root@rpi4-2:~# cat /usr/local/bin/uname
#!/bin/bash
/usr/bin/uname $* | sed -e s/aarch64/arm/
- - - -
[ Yes, this is a hack, but I couldn't think of a cleaner way to do it. I tried a configure line with "--host=arm-linux --target=host-linux", but that didn't work because something in the build wanted arm-linux-ar to exist and it didn't. I could have made some symlinks, e.g. "ln -s /usr/bin/ar /usr/local/bin/arm-linux-ar", with similar symlinks for gcc, g++, ln, ranlib, etc, but that seemed like more work than my uname wrapper hack.]
I just checked my gdbserver build. It's definitely getting into arm_target::supports_hardware_single_step:
Breakpoint 1, linux_process_target::maybe_hw_step (
    this=0x8645c <the_arm_target>, thread=0x9df38)
    at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-low.cc:2442
2442      if (supports_hardware_single_step ())
(gdb) s
arm_target::supports_hardware_single_step (this=0x8645c <the_arm_target>)
    at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-arm-low.cc:1042
1042      return false;
Kevin
On Tue, 25 Jul 2023 04:21:00 +0000
"Yan, Zhiyong" <Zhiyong.Yan@windriver.com> wrote:
> Hi Kevin,
> I test gdb11 on RaspBerry Pi4.
> As you said, I can't produce this assert issue.
> The direct reason is because supports_hardware_single_step () returns on RaspBerry Pi4, not like xilinux-zynq.
> Please see attached pictures, we can see arm_target::supports_hardware_single_step () is never entered.
> This assert only happens when supports_hardware_single_step () returns 'false'. On Raspberry Pi4, when I hardcoded supports_hardware_single_step () returns 'false', then assert happened.
> For more information about " This assert only happens when supports_hardware_single_step () returns 'false'".
> You can check
> https://sourceware.org/bugzilla/show_bug.cgi?id=30387
>
> So, the new question is why arm_target::supports_hardware_single_step () is never entered on Raspberry Pi4.
>
> Best Regards.
> Zhiyong
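The kernel/userland mismatch described in the message above (uname -m reporting aarch64 while the binaries are 32-bit ARM) can also be checked from inside a program. The following is only a sketch under stated assumptions: kernel_machine, userland_bits and kernel_userland_mismatch are hypothetical helper names invented for this illustration and are not part of gdbserver or of the binutils-gdb configury. The only real API used is POSIX uname(2), whose machine field is what `uname -m` prints.

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>

// Kernel architecture as reported by uname(2); this is the same
// "machine" field that `uname -m` prints.  Returns an empty string if
// the call fails.
std::string kernel_machine() {
  utsname u{};
  if (uname(&u) != 0)
    return "";
  return u.machine;
}

// Pointer width of the running process, in bits.  This reflects how the
// program itself was compiled (the userland), not the kernel.
int userland_bits() { return static_cast<int>(sizeof(void *) * 8); }

// Hypothetical helper: true when a 64-bit ARM kernel is hosting a
// 32-bit process, which is the situation where a build system that
// trusts `uname -m` may pick the wrong gdbserver target.
bool kernel_userland_mismatch(const std::string &machine, int bits) {
  return machine == "aarch64" && bits == 32;
}
```

Calling kernel_userland_mismatch(kernel_machine(), userland_bits()) from a binary built with the system compiler would, under these assumptions, flag the 32-bit Raspberry Pi OS setup and stay quiet on a pure armv7l or pure aarch64 system.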
Hi Kevin,
Possibly the conclusion is Raspberry with aarch64 supports gdb 'hardware single step', you need turn to another arm platform which doesn't supports gdb 'hardware single step' to produce this assert error.
Best Regards.
Zhiyong
Hi Zhiyong,
I've finally been able to reproduce the bug on a Raspberry Pi. On a different SD card, I installed 32-bit Ubuntu server 20.04.5 LTS. It seems to have both a 32-bit (arm) kernel + 32-bit userland. I.e...
kev@rpi4-3:~/Downloads/bz30387$ uname -m
armv7l
kev@rpi4-3:~/Downloads/bz30387$ file ./osm
./osm: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=81e7e2b5dfba0fe35f1f1a6af2ee558efbdafa7f, for GNU/Linux 3.2.0, with debug_info, not stripped
(Compare the above output to what you reported on your Pi; on your Pi, uname -m reported aarch64 and the 'file' command showed 64-bit aarch64 binaries.)
The internal error appears to be the same as that described in your bug report as well as on the gdb-patches list:
/mesquite2/sourceware-git/rpi-arm-master/bld/../../worktree-master/gdbserver/linux-low.cc:2448: A problem internal to GDBserver has been detected.
maybe_hw_step: Assertion `has_single_step_breakpoints (thread)' failed.
Now that I've reproduced it, I want to retest gdb.threads/*.exp to see if any of those tests show the same failure. If not, I'll try to adapt your test case into one suitable for the gdb test suite.
I have an alternate patch in mind, which I'll try out too. If it works out, I'll ask you to test it on your hardware...
Kevin
Hi Kevin,
Thanks for your effort.
Best Regards.
Zhiyong
diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index e6a39202a98..d29881174db 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4671,7 +4671,16 @@ linux_process_target::resume_one_thread (thread_info *thread,
       proceed_one_lwp (thread, NULL);
     }
   else
-    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+    {
+      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+      if (thread->last_resume_kind == resume_step)
+        {
+          /* If resume_step is required by GDB,
+             install single-step breakpoint.  */
+          if (supports_software_single_step ())
+            install_software_single_step_breakpoints (lwp);
+        }
+    }
 
   thread->last_status.set_ignore ();
   lwp->resume = NULL;
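
To make the intent of the hunk above easier to follow, here is a small toy model of the situation discussed in this thread. It is a sketch only, not gdbserver code: ToyThread, ToyTarget, leave_stopped and step_precondition_holds are hypothetical names, and the real patch tests supports_software_single_step () where the model simply tests for the absence of hardware single step. The model also says nothing about removing the breakpoints again on detach, which is the regression reported earlier in the thread.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical stand-ins, not gdbserver's
// real thread_info / linux_process_target classes.
enum class ResumeKind { Continue, Step, Stop };

struct ToyThread {
  ResumeKind last_resume_kind = ResumeKind::Continue;
  bool single_step_breakpoints = false;  // models has_single_step_breakpoints
};

struct ToyTarget {
  bool hardware_single_step;  // true models aarch64, false models 32-bit ARM

  // Models the "leaving LWP stopped" branch of resume_one_thread.  With
  // apply_patch set, a thread that was asked to step gets its software
  // single-step breakpoints installed before it is left stopped.
  void leave_stopped(ToyThread &t, bool apply_patch) const {
    if (apply_patch && t.last_resume_kind == ResumeKind::Step
        && !hardware_single_step)
      t.single_step_breakpoints = true;
  }

  // Models the check in maybe_hw_step: returns false in the case where
  // the real code would fail its gdb_assert.
  bool step_precondition_holds(const ToyThread &t) const {
    return hardware_single_step || t.single_step_breakpoints;
  }
};
```

In this model, a thread with last_resume_kind set to Step on a target without hardware single step fails the precondition when it is left stopped without the patch and satisfies it with the patch, while a target with hardware single step never trips the check. That matches the reports above: the assertion was seen on armv7l but not on aarch64.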