gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

Message ID 20230712032540.3110113-1-zhiyong.yan@windriver.com
State New

Commit Message

Yan, Zhiyong July 12, 2023, 3:25 a.m. UTC
  From: Zhiyong Yan <zhiyong.yan@windriver.com>

GDB should not assume that a pending thread always generates “a non-gdbserver trap event”; for example, a “Signal 17” event can occur instead. Because resume_stopped_resumed_lwps() -> maybe_hw_step() assumes that the breakpoint already exists, resume_one_thread() should ensure that the software single-step breakpoint is installed even when the thread has a pending status.

Signed-off-by: Zhiyong Yan <zhiyong.yan@windriver.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
---
 gdbserver/linux-low.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
  

Comments

Kevin Buettner July 21, 2023, 8:49 p.m. UTC | #1
Hi Zhiyong,

I set up a Raspberry Pi running a recent 32-bit Raspberry Pi OS so
that I could test your patch.  I was able to build and run your test
case, but I could not reproduce the bug on the Pi.

I tested gdb.threads/*.exp using --target_board=native-gdbserver both
with and without your patch.  Some of these tests are racy, but my
conclusion from just looking at the PASSes and FAILs (after many test
runs) is that there are no regressions.

But then I remembered to enable core dumps on the Pi and after running
gdb.threads/pending-fork-event-detach/pending-fork-event-detach-main-vfork
by itself, I saw that it left a core file...

$ make check RUNTESTFLAGS="--target_board=native-gdbserver" TESTS=gdb.threads/pending-fork-event-detach.exp
...
		=== gdb Summary ===

# of unexpected core files	1
# of expected passes		240

The core file was from the running test case, not gdbserver, nor gdb.

Looking at the core file in GDB shows...

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0x00010624 in break_here () at /mesquite2/sourceware-git/rpi-gdbserver/bld/../../worktree-gdbserver/gdb/testsuite/gdb.threads/pending-fork-event-detach.c:29
29	  x++;
[Current thread is 1 (Thread 0xf7e10440 (LWP 4835))]
(gdb) x/i $pc
=> 0x10624 <break_here+12>:	udf	#16
(gdb) x/x $pc
0x10624 <break_here+12>:	0xe7f001f0

...and in gdbserver/linux-aarch32-low.cc:

#define arm_eabi_breakpoint 0xe7f001f0UL

I think what's happened here is that the breakpoint added by your
patch is left in place when GDB detaches the test case.  When it
starts running again, it hits the software single step breakpoint
and, since it's no longer under GDB control, it dies with a SIGTRAP.
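
[Editor's sketch, not code from gdbserver: a toy model of the failure
described above, assuming a hypothetical toy_inferior type whose
"memory" is a map of instruction words.  It only illustrates why a
single-step breakpoint has to be removed before detaching; the names
other than arm_eabi_breakpoint are made up for this example.]

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Breakpoint instruction value quoted from gdbserver/linux-aarch32-low.cc.
constexpr std::uint32_t arm_eabi_breakpoint = 0xe7f001f0u;

// Hypothetical stand-in for an inferior; not a gdbserver type.
struct toy_inferior
{
  std::map<std::uint32_t, std::uint32_t> memory;  // address -> instruction word
  std::map<std::uint32_t, std::uint32_t> shadow;  // address -> saved original word

  // Overwrite the word at ADDR with the breakpoint, remembering the original.
  void insert_single_step_breakpoint (std::uint32_t addr)
  {
    assert (shadow.count (addr) == 0);
    shadow[addr] = memory[addr];
    memory[addr] = arm_eabi_breakpoint;
  }

  // Restore every saved word.  A detach path would need to do this first.
  void remove_single_step_breakpoints ()
  {
    for (const auto &entry : shadow)
      memory[entry.first] = entry.second;
    shadow.clear ();
  }

  // True if running the word at ADDR would raise a trap.
  bool would_trap_at (std::uint32_t addr) const
  {
    auto it = memory.find (addr);
    return it != memory.end () && it->second == arm_eabi_breakpoint;
  }
};
```

[In this model, detaching while would_trap_at () is still true for the
resume address corresponds to the SIGTRAP core file seen above.]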

This core file is not created when I run the test using a gdbserver
without your patch.

I'm suspicious of the assert in linux_process_target::maybe_hw_step. 
Currently, it looks like this:

bool
linux_process_target::maybe_hw_step (thread_info *thread)
{
  if (supports_hardware_single_step ())
    return true;
  else
    {
      /* GDBserver must insert single-step breakpoint for software
	 single step.  */
      gdb_assert (has_single_step_breakpoints (thread));
      return false;
    }
}

But, when Yao Qi introduced it back in June, 2016, it looked like
this:

static int
maybe_hw_step (struct thread_info *thread)
{
  if (can_hardware_single_step ())
    return 1;
  else
    {
      struct process_info *proc = get_thread_process (thread);

      /* GDBserver must insert reinsert breakpoint for software
     single step.  */
      gdb_assert (has_reinsert_breakpoints (proc));
      return 0;
    }
}

So, back in 2016, when it was introduced, it's clear that the assert
was referring to breakpoints which needed to be reinserted.  Now,
that's not at all obvious.

Also, back in 2016, maybe_hw_step() was only called from two
locations; in each case it was in a block in which the condition
lwp->bp_reinsert != 0 was true.  But now there are two other
calls; in one case, the software single step breakpoints have
just been inserted, so that should be okay, but for the other
case, in linux_process_target::resume_stopped_resumed_lwps, I'm
less certain.
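
[Editor's sketch, not gdbserver code: a self-contained model of the
invariant under discussion.  toy_thread, assert_would_hold and
needs_single_step_breakpoints are hypothetical names; the real
maybe_hw_step takes a thread_info and queries the target.  The sketch
assumes the hardware-single-step capability is passed in as a flag.]

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for gdbserver's thread state.
enum class resume_kind { resume_continue, resume_step, resume_stop };

struct toy_thread
{
  resume_kind last_resume_kind = resume_kind::resume_continue;
  bool status_pending = false;
  std::vector<unsigned long> single_step_breakpoints;  // installed addresses
};

// The condition the assert in maybe_hw_step relies on.
bool
assert_would_hold (const toy_thread &thread, bool hw_single_step)
{
  return hw_single_step || !thread.single_step_breakpoints.empty ();
}

// Same shape as maybe_hw_step: true means "use hardware single-step".
bool
toy_maybe_hw_step (const toy_thread &thread, bool hw_single_step)
{
  if (hw_single_step)
    return true;
  assert (!thread.single_step_breakpoints.empty ());
  return false;
}

// The case named in the patch title: a thread last resumed with
// resume_step on a software-single-step target, with nothing installed.
bool
needs_single_step_breakpoints (const toy_thread &thread, bool hw_single_step)
{
  return !hw_single_step
	 && thread.last_resume_kind == resume_kind::resume_step
	 && thread.single_step_breakpoints.empty ();
}
```

[A caller that cannot prove assert_would_hold () for its thread, as may
be the case for resume_stopped_resumed_lwps, would abort in
toy_maybe_hw_step on a target without hardware single-step.]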

In any case, could you comment out (or delete) the assert in a
version of the source without your patch and let me know what
happens?

Also, if possible, I'd like to see a backtrace from where the
assert occurs so that I can see which call to maybe_hw_step
is responsible for triggering the failing assert.

Kevin
  
Yan, Zhiyong July 24, 2023, 1:36 p.m. UTC | #2
Hi Kevin,
    The call stack of the assert is attached.
    Please see the attached gdbx2, which adds more 'n' commands. On an ARM platform, if you keep executing the 'n' command, this test case can trigger the assertion failure.

    I didn't finish setting up the test environment on the Raspberry Pi 4 today. I previously reproduced this issue on a Xilinx ARM platform.

Best Regards.
Zhiyong

maybe_hw_step: Assertion `has_single_step_breakpoints (thread)' failed.
Aborted (core dumped)
EC@mpr3.1> ls -l
total 7464
-rw------- 1 root root  630784 Jun 28 12:14 core-gdbserver-11475-1656418477
-rwxr-xr-x 1 root root 7265572 Jun 28 12:12 gdbserver
EC@mpr3.1> gdb ./gdbserver ./core-gdbserver-11475-1656418477
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-wrs-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./gdbserver...
[New LWP 11475]

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `./gdbserver --once --attach :1234 325'.
Program terminated with signal SIGABRT, Aborted.
#0  0x76ca6216 in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x76ca6216 in ?? () from /lib/libc.so.6
#1  0x76cb49d4 in raise () from /lib/libc.so.6
#2  0x76ca5ca0 in abort () from /lib/libc.so.6
#3  0x004a8280 in abort_or_exit () at ../../gdb-13.0.50.20221021/gdbserver/utils.cc:39
#4  internal_verror (file=<optimized out>, line=line@entry=2448, fmt=0x0, fmt@entry=0x7ee8c2bc "\210\316N", args=...,
    args@entry=...) at ../../gdb-13.0.50.20221021/gdbserver/utils.cc:108
#5  0x004d83ce in internal_error_loc (file=<optimized out>, line=line@entry=2448, fmt=0x4e4bc8 "%s: Assertion `%s' failed.")
    at ../../gdb-13.0.50.20221021/gdbsupport/errors.cc:58
#6  0x004c5c28 in linux_process_target::maybe_hw_step (this=<optimized out>, thread=<optimized out>)
    at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2448
#7  linux_process_target::maybe_hw_step (this=<optimized out>, thread=<optimized out>)
    at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2440
#8  0x004c660a in linux_process_target::resume_stopped_resumed_lwps (this=this@entry=0x50a3cc <the_arm_target>, thread=0x209fb28)
    at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2466
#9  0x004c7614 in <lambda(thread_info*)>::operator() (__closure=<synthetic pointer>, thread=<optimized out>)
    at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2606
#10 for_each_thread<linux_process_target::wait_for_event_filtered(ptid_t, ptid_t, int*, int)::<lambda(thread_info*)> > (func=...)
    at ../../gdb-13.0.50.20221021/gdbserver/gdbthread.h:159
#11 linux_process_target::wait_for_event_filtered (this=this@entry=0x50a3cc <the_arm_target>, wait_ptid=..., filter_ptid=...,
    wstatp=0x7ee8c544, options=1073741824) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2604
#12 0x004c9426 in linux_process_target::wait_for_event (options=1079026852, wstatp=0x7ee8c544, ptid=...,
    this=0x50a3cc <the_arm_target>) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2676
#13 linux_process_target::wait_1 (this=this@entry=0x50a3cc <the_arm_target>, ptid=...,
    ourstatus=ourstatus@entry=0x50c9f8 <g_client_state+1288>, target_options=..., target_options@entry=...)
    at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:2970
#14 0x004cab6c in linux_process_target::wait (this=0x50a3cc <the_arm_target>, ptid=..., ourstatus=0x50c9f8 <g_client_state+1288>,
    target_options=...) at ../../gdb-13.0.50.20221021/gdbserver/linux-low.cc:3624
#15 0x004b9778 in target_wait (options=..., status=0x50c9f8 <g_client_state+1288>, ptid=...)
    at ../../gdb-13.0.50.20221021/gdbserver/target.cc:197
#16 mywait (ptid=..., ourstatus=ourstatus@entry=0x50c9f8 <g_client_state+1288>, options=...,
    connected_wait=connected_wait@entry=1) at ../../gdb-13.0.50.20221021/gdbserver/target.cc:142
#17 0x004b3428 in resume (actions=<optimized out>, num_actions=<optimized out>)
    at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2916
#18 resume (actions=0x20a6778, num_actions=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2888
#19 0x004b3ed0 in handle_v_cont (own_buf=0x20a6778 "E\001") at ../../gdb-13.0.50.20221021/gdbserver/server.cc:2875
#20 handle_v_requests (own_buf=own_buf@entry=0x208b0a8 "vCont;r45b444,45b44c:p145.18a;c:p145.-1", packet_len=packet_len@entry=39, new_packet_len=new_packet_len@entry=0x7ee8c8ac) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3135
#21 0x004b6b76 in process_serial_event () at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4481
#22 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4513
#23 0x004d8904 in gdb_wait_for_event (block=block@entry=1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:694
#24 0x004d9034 in gdb_wait_for_event (block=1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:593
#25 gdb_do_one_event (mstimeout=mstimeout@entry=-1) at ../../gdb-13.0.50.20221021/gdbsupport/event-loop.cc:264
#26 0x004a8a6c in start_event_loop () at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3511
#27 captured_main (argv=<optimized out>, argc=5) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:3991
#28 main (argc=5, argv=<optimized out>) at ../../gdb-13.0.50.20221021/gdbserver/server.cc:4077
(gdb)
(gdb)
(gdb) info thread
  Id   Target Id         Frame
* 1    LWP 11475         0x76ca6216 in ?? () from /lib/libc.so.6
(gdb)
  
Kevin Buettner July 25, 2023, 3:36 a.m. UTC | #3
Hi Zhiyong,

I looked at the backtrace that you provided and see that maybe_hw_step()
is being called from linux_process_target::resume_stopped_resumed_lwps,
which is the one location where I wasn't able to convince myself that
the assert should hold.

I was running your test case executable (osm) as an unprivileged user,
so neither the syslog calls nor the sudo were working.  (Sudo could
perhaps work, but it wanted to prompt for a password and stdin and
stdout were closed.)  I've since modified it so that sudo isn't used
and I'm using 'fprintf(stderr, ...)' instead of syslog - which is how
I discovered that sudo wasn't working.  I've tried next'ing quite a
lot, but so far I haven't reproduced the bug.  (Hopefully, the sudo
isn't required to reproduce the problem.)

If you manage to reproduce the bug on a Raspberry Pi 4 (and tell me
how to do it), that'd be great!

So, what I'm doing, using three separate terminals, in an attempt to
reproduce the bug is:

1) Run osm in terminal 1.  (I didn't want to mess with systemd.)  Once
I start running it, I see a bunch of messages from the dd command.

2) In terminal 2, I run:

   /path/to/gdbserver --debug --debug-format=all --remote-debug --event-loop-debug --once --attach :1234 $(pgrep osm)

3) In terminal 3, I run:

   /path/to/gdb osm -x ./gdbx2

(I've changed the target remote command in gdbx2 to refer to localhost.)

I'm also attaching my hacked lupdated.c.  If you see anything wrong
with what I'm trying, please let me know.

Kevin

  
Kevin Buettner July 25, 2023, 6:32 a.m. UTC | #4
Hi Zhiyong,

One problem that I encountered on my Pi, which may explain the
behavior that you're seeing, is that recent 32-bit versions of
the Raspberry Pi OS are running a 64-bit/aarch64 kernel, but the
userland is 32-bit.

root@rpi4-2:~# /usr/bin/uname -m
aarch64
root@rpi4-2:~# file /usr/bin/ls
/usr/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3,
BuildID[sha1]=81004d065160807541b79235b23eea0e00a2d44e, for GNU/Linux
3.2.0, stripped

Note that uname -m returns aarch64, but that "ls" and other executables
are "ELF 32-bit ...".
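
[Editor's sketch: one way to detect this mixed setup from inside a
program, assuming a Linux host.  uname(2) and sizeof are standard;
is_mixed_mode, kernel_machine and userland_pointer_bits are
hypothetical helper names, and the check only covers the
aarch64-kernel / 32-bit-userland case described here.]

```cpp
#include <string>
#include <sys/utsname.h>

// True when the kernel reports aarch64 but the calling binary uses
// 32-bit pointers, i.e. a 64-bit kernel with a 32-bit userland.
bool
is_mixed_mode (const std::string &kernel_machine, unsigned pointer_bits)
{
  return kernel_machine == "aarch64" && pointer_bits == 32;
}

// The machine string that "uname -m" prints, or "" on failure.
std::string
kernel_machine ()
{
  struct utsname u {};
  if (uname (&u) != 0)
    return "";
  return u.machine;
}

// Pointer width of the binary this code is compiled into.
unsigned
userland_pointer_bits ()
{
  return static_cast<unsigned> (sizeof (void *) * 8);
}
```

[Calling is_mixed_mode (kernel_machine (), userland_pointer_bits ())
from a binary built with the system compiler would be expected to
return true on a system like the one shown above.]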

The binutils-gdb configury uses uname -m to figure out for what
gdbserver host/target to build.  (host and target must be the same,
otherwise gdbserver won't build.) So it may be the case that you built
an aarch64 gdbserver instead of an arm gdbserver.  I think you can
check this as follows:

kev@rpi4-2:/mesquite2/sourceware-git/rpi-master/bld/gdbserver$ ls linux-{arm,aarch}*
linux-aarch32-low.o  linux-aarch32-tdesc.o  linux-arm-low.o  linux-arm-tdesc.o

If you also/instead see linux-aarch64-low.o and linux-aarch64-tdesc.o
in that list, then you probably have an aarch64 gdbserver.

When I tried my build with uname -m returning aarch64, the build
errored out because (I think) I was missing certain aarch64 header
files.  But I knew that I didn't want to build for aarch64, so I
abandoned that build.  What I ended up doing was making a wrapper for
uname which substituted 'arm' for 'aarch64'.  I put it in
/usr/local/bin, and /usr/local/bin is early in my PATH, so the
configury finds it first...

root@rpi4-2:~# uname -m
arm

Here's my /usr/local/bin/uname script:
- - - -
root@rpi4-2:~# cat /usr/local/bin/uname
#!/bin/bash

/usr/bin/uname $* | sed -e s/aarch64/arm/
- - - -

[ Yes, this is a hack, but I couldn't think of a cleaner way to do it. 
  I tried a configure line with "--host=arm-linux --target=host-linux",
  but that didn't work because something in the build wanted
  arm-linux-ar to exist and it didn't.  I could have made some
  symlinks, e.g. "ln -s /usr/bin/ar /usr/local/bin/arm-linux-ar", with
  similar symlinks for gcc, g++, ln, ranlib, etc, but that seemed like
  more work than my uname wrapper hack.]

I just checked my gdbserver build.  It's definitely getting into
arm_target::supports_hardware_single_step:

Breakpoint 1, linux_process_target::maybe_hw_step (
    this=0x8645c <the_arm_target>, thread=0x9df38)
    at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-low.cc:2442
2442	  if (supports_hardware_single_step ())
(gdb) s
arm_target::supports_hardware_single_step (this=0x8645c <the_arm_target>)
    at /mesquite2/sourceware-git/rpi-master/bld/../../worktree-gdbserver/gdbserver/linux-arm-low.cc:1042
1042	  return false;

Kevin

On Tue, 25 Jul 2023 04:21:00 +0000
"Yan, Zhiyong" <Zhiyong.Yan@windriver.com> wrote:

> Hi Kevin,
>      I tested gdb11 on a Raspberry Pi 4.
>      As you said, I can't reproduce this assert issue there.
>      The direct reason is that supports_hardware_single_step () returns 'true' on the Raspberry Pi 4, unlike on xilinx-zynq.
>      Please see the attached pictures; they show that arm_target::supports_hardware_single_step () is never entered.
>      This assert only happens when supports_hardware_single_step () returns 'false'. On the Raspberry Pi 4, when I hardcoded supports_hardware_single_step () to return 'false', the assert happened.
>      For more information about "This assert only happens when supports_hardware_single_step () returns 'false'",
>      see https://sourceware.org/bugzilla/show_bug.cgi?id=30387
> 
>      So, the new question is why arm_target::supports_hardware_single_step () is never entered on the Raspberry Pi 4.
> 
> Best Regards.
> Zhiyong
> 
> 
  
Yan, Zhiyong July 25, 2023, 6:50 a.m. UTC | #5
Hi Kevin,
     Below is my Pi's info:

root@bcm-2xxx-rpi4:~# uname -a
Linux bcm-2xxx-rpi4 5.15.110-yocto-standard #1 SMP PREEMPT Wed May 3 01:43:11 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

root@bcm-2xxx-rpi4:~# file gdbserver
gdbserver: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-aarch64.so.1, BuildID[sha1]=1e6ee58be0809d620fbdf3f86c8c4541f01ad9e9, for GNU/Linux 3.14.0, with debug_info, not stripped

root@bcm-2xxx-rpi4:~# file osm
osm: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=da7ee15aec73080ae3954d803b542bd9e8185c44, for GNU/Linux 3.14.0, with debug_info, not stripped
root@bcm-2xxx-rpi4:~#

     Both the user app and the OS are aarch64.
[zyan1] On this board, gdbserver supports hardware single-step, so the assert doesn't happen.
-----------------------------------------------------------
Below is my Xilinx-zynq info: 
Last login: Wed Jul 26 03:03:09 2023
root@xilinx-zynq:~# uname -a
Linux xilinx-zynq 5.15.106-yocto-standard #1 SMP PREEMPT Tue Apr 11 03:06:10 UTC 2023 armv7l armv7l armv7l GNU/Linux
root@xilinx-zynq:~# which ls
/bin/ls
root@xilinx-zynq:~# file /bin/ls
/bin/ls: symbolic link to /bin/ls.coreutils
root@xilinx-zynq:~# file /bin/ls.coreutils
/bin/ls.coreutils: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=f8bf6bdad65965d53cdd9fd1ebfd00d191f4cbbc, for GNU/Linux 3.2.0, stripped
root@xilinx-zynq:~#

[zyan1] On this board, gdbserver doesn't support hardware single-step, so the assert can happen.
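
The comparison between the two boards comes down to which ELF class the binaries were built for. Below is a minimal sketch of that check, assuming the `file` descriptions look like the ones pasted above. The function name `elf_arch_from_file_output` is hypothetical, and it only pattern-matches the description text; it does not query gdbserver itself.

```shell
#!/bin/sh
# Hypothetical helper: classify a `file` description string.
# Prints "aarch64", "arm", or "unknown" based only on the text passed in.
elf_arch_from_file_output() {
  case "$1" in
    *"ELF 64-bit"*"ARM aarch64"*) echo aarch64 ;;  # 64-bit build, as on the Pi above
    *"ELF 32-bit"*"ARM, EABI"*)   echo arm ;;      # 32-bit build, as on the Zynq above
    *)                            echo unknown ;;
  esac
}
```

For example, `elf_arch_from_file_output "$(file -b ./gdbserver)"` would print `aarch64` for the Pi build shown above. In this thread, the `arm` case is the one where gdbserver falls back to software single-step.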

Best Regards.

  
Yan, Zhiyong July 25, 2023, 7:05 a.m. UTC | #6
Hi Kevin,
     The likely conclusion is that a Raspberry Pi running aarch64 supports hardware single-step in gdbserver. To reproduce this assert error, you would need to switch to another Arm platform that doesn't support hardware single-step.

Best Regards.
Zhiyong

  
Kevin Buettner July 26, 2023, 3:58 a.m. UTC | #7
Hi Zhiyong,

I've finally been able to reproduce the bug on a Raspberry Pi. On a
different SD card, I installed 32-bit Ubuntu Server 20.04.5 LTS.  It
seems to have both a 32-bit (arm) kernel and a 32-bit userland.  I.e...

kev@rpi4-3:~/Downloads/bz30387$ uname -m
armv7l
kev@rpi4-3:~/Downloads/bz30387$ file ./osm
./osm: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=81e7e2b5dfba0fe35f1f1a6af2ee558efbdafa7f, for GNU/Linux 3.2.0, with debug_info, not stripped

(Compare the above output to what you reported for your Pi; on your
Pi, uname -m reported aarch64 and the 'file' command showed 64-bit
aarch64 binaries.)

The internal error appears to be the same as that described in your
bug report as well as on the gdb-patches list:

/mesquite2/sourceware-git/rpi-arm-master/bld/../../worktree-master/gdbserver/linux-low.cc:2448: A problem internal to GDBserver has been detected.
maybe_hw_step: Assertion `has_single_step_breakpoints (thread)' failed.
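
When checking many test runs for this failure, a small helper can scan saved output for the assertion text. This is a minimal sketch: the function name `log_has_single_step_assert` is hypothetical, and which file actually captures gdbserver's output in a given setup is an assumption left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given file contains
# the maybe_hw_step assertion failure quoted above.
# -F matches the text literally; -q suppresses output.
log_has_single_step_assert() {
  grep -qF "maybe_hw_step: Assertion \`has_single_step_breakpoints (thread)' failed." "$1"
}
```

Usage could look like `for f in *.log; do log_has_single_step_assert "$f" && echo "$f"; done`, which prints only the files that contain the failure.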

Now that I've reproduced it, I want to retest gdb.threads/*.exp to see
if any of those tests show the same failure.  If not, I'll try to adapt
your test case into one suitable for the gdb test suite.

I have an alternate patch in mind, which I'll try out too.  If it
works out, I'll ask you to test it on your hardware...

Kevin
  
Yan, Zhiyong July 26, 2023, 6:30 a.m. UTC | #8
Hi Kevin,
     Thanks for your effort.

Best Regards.
Zhiyong

  

Patch

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index e6a39202a98..d29881174db 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4671,7 +4671,16 @@  linux_process_target::resume_one_thread (thread_info *thread,
       proceed_one_lwp (thread, NULL);
     }
   else
-    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+    {
+      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+      if (thread->last_resume_kind == resume_step)
+	{
+	  /* If resume_step is required by GDB, 
+	     install single-step breakpoint.  */
+	  if (supports_software_single_step ())
+	    install_software_single_step_breakpoints (lwp);
+	}
+    }
 
   thread->last_status.set_ignore ();
   lwp->resume = NULL;