gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

Message ID 20230703030458.1192525-1-zhiyong.yan@windriver.com
State New
Headers
Series gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step |

Commit Message

Yan, Zhiyong July 3, 2023, 3:04 a.m. UTC
  From: Zhiyong Yan <zhiyong.yan@windriver.com>

Gdb should not assume pending threads always generate “a non-gdbserver trap event”, for example “Signal 17” event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.

Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
---
 gdbserver/linux-low.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
  

Comments

Alexandra Petlanova Hajkova July 10, 2023, 12:46 p.m. UTC | #1
On Mon, Jul 3, 2023 at 5:05 AM <zhiyong.yan@windriver.com> wrote:

> From: Zhiyong Yan <zhiyong.yan@windriver.com>
>
> Gdb should not assume pending threads always generate “a non-gdbserver
> trap event”, for example “Signal 17” event could happen. Now that
> resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point
> must already exist, resume_one_thread() should ensure the software breaking
> point is installed although the thread is pending.
>
> Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387


I can confirm this patch causes no regression Fedora-Rawhide, ppc64le.
  
Yan, Zhiyong July 11, 2023, 1:49 a.m. UTC | #2
Hi Alexandra,
     Thanks for your feedback.

Best Regards.
Zhiyong

From: Alexandra Petlanova Hajkova <ahajkova@redhat.com>
Sent: Monday, July 10, 2023 8:46 PM
To: Yan, Zhiyong <Zhiyong.Yan@windriver.com>
Cc: gdb-patches@sourceware.org; tom@tromey.com
Subject: Re: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know the content is safe.


On Mon, Jul 3, 2023 at 5:05 AM <zhiyong.yan@windriver.com<mailto:zhiyong.yan@windriver.com>> wrote:
From: Zhiyong Yan <zhiyong.yan@windriver.com<mailto:zhiyong.yan@windriver.com>>

Gdb should not assume pending threads always generate “a non-gdbserver trap event”, for example “Signal 17” event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.

Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com<mailto:zhiyong.yan@windriver.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387<https://urldefense.com/v3/__https:/sourceware.org/bugzilla/show_bug.cgi?id=30387__;!!AjveYdw8EvQ!fkg3Ae-1jPyI75siqABZuT57DvfkxmNtWeqO1A1Jly9PEvrBravJqc2Eou3b6H0Qfc31GRg9fixINWkyyZW8IwY$>

I can confirm this patch causes no regression Fedora-Rawhide, ppc64le.
  
Kevin Buettner July 11, 2023, 6:42 p.m. UTC | #3
Hi Zhiyong,

See my comments below; there's still one formatting nit, but I also
have a question about whether this is the right place to fix the bug
that you're seeing.

Also, I've added Luis to the CC list since he knows a lot more about
the ARM architecture than I do.

On Mon, 3 Jul 2023 11:04:58 +0800
<zhiyong.yan@windriver.com> wrote:

> From: Zhiyong Yan <zhiyong.yan@windriver.com>
> 
> Gdb should not assume pending threads always generate "a non-gdbserver trap event", for example "Signal 17" event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.
> 
> Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
> ---
>  gdbserver/linux-low.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
> index e6a39202a98..7d8bbf71ddc 100644
> --- a/gdbserver/linux-low.cc
> +++ b/gdbserver/linux-low.cc
> @@ -4671,7 +4671,16 @@ linux_process_target::resume_one_thread (thread_info *thread,
>        proceed_one_lwp (thread, NULL);
>      }
>    else
> -    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +    {
> +      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +      if (thread->last_resume_kind == resume_step)
> +	{
> +	  /* If resume_step is required by GDB, 
> +	     install single-step breakpoint.  */
> +	  if (supports_software_single_step())

Formatting nit: you need a space between 'supports_software_single_step'
and the left paren.

> +	    install_software_single_step_breakpoints (lwp);
> +	}
> +    }
>  
>    thread->last_status.set_ignore ();
>    lwp->resume = NULL;

With regard to the change itself...

If the debugging printf is accurate, the LWP is being left in a
stopped state.  If that's the case, then why are software single step
breakpoints being inserted?  It seems to me that you'd only want
to insert these breakpoints when the thread is actually about to
resume.

That said, I have no doubt that this change works for you, so clearly
there's something going on which I do not understand.  I'd like to
understand the situation better before approving it.  Also, once you
have an explanation, please update the code comments and/or commit log
message as appropriate.  My personal preference is for a concise
explanation to be placed in the code comments with any additional,
longer winded explanations or examples being placed in the commit log
message.

Kevin
  
Yan, Zhiyong July 12, 2023, 2:33 a.m. UTC | #4
Hi Kevin,
     My last mail's attachment is a little big, I am not sure if you can receive it. I zipped the log, and send again.

     -  For "format nit", I will work on it again.

     -  For your concern "I'd like to understand the situation better before approving it.",
        Please check attached " step5-assert.txt".
        Attached step5-assert.txt is a debug log which contains several "gdb step next" output.
        For example, at line 29, resume_one_thread() doesn't call process_one_lwp() because  thread LWP 316.316 is pending, as a result the software breaking point is not installed. Later, if this pending thread makes "wait_1: Hit a non-gdbserver trap event." happen, stop_all_lwps() -> "prepare_resume_reply: Writing resume reply for" ->...
        In such case, "gdb N" can finish without assert error. 
        But in step5-assert.txt, from line 14923 to 14943, we can see the pending thread make below happen,
         "
              wait_for_event_filtered: Got a pending child 316
              362.994099   [threads] wait_for_event_filtered: Got an event from pending child 316 (117f)
              362.994379   [threads] wait_1: Ignored signal 17 for LWP 316
         "
         In this case "resume_stopped_resumed_lwps" will resume every 'stopped-resumed' thread, but a thread (previously pending) has no software break point installed, then resume_stopped_resumed_lwps() asserts failed in maybe_hw_step().
     

Best Regards.
Zhiyong

-----Original Message-----
From: Kevin Buettner <kevinb@redhat.com> 
Sent: Wednesday, July 12, 2023 2:43 AM
To: Yan, Zhiyong <Zhiyong.Yan@windriver.com>
Cc: gdb-patches@sourceware.org; tom@tromey.com; luis.machado@arm.com
Subject: Re: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi Zhiyong,

See my comments below; there's still one formatting nit, but I also have a question about whether this is the right place to fix the bug that you're seeing.

Also, I've added Luis to the CC list since he knows a lot more about the ARM architecture than I do.

On Mon, 3 Jul 2023 11:04:58 +0800
<zhiyong.yan@windriver.com> wrote:

> From: Zhiyong Yan <zhiyong.yan@windriver.com>
>
> Gdb should not assume pending threads always generate "a non-gdbserver trap event", for example "Signal 17" event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.
>
> Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
> ---
>  gdbserver/linux-low.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc index 
> e6a39202a98..7d8bbf71ddc 100644
> --- a/gdbserver/linux-low.cc
> +++ b/gdbserver/linux-low.cc
> @@ -4671,7 +4671,16 @@ linux_process_target::resume_one_thread (thread_info *thread,
>        proceed_one_lwp (thread, NULL);
>      }
>    else
> -    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +    {
> +      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +      if (thread->last_resume_kind == resume_step)
> +     {
> +       /* If resume_step is required by GDB,
> +          install single-step breakpoint.  */
> +       if (supports_software_single_step())

Formatting nit: you need a space between 'supports_software_single_step'
and the left paren.

> +         install_software_single_step_breakpoints (lwp);
> +     }
> +    }
>
>    thread->last_status.set_ignore ();
>    lwp->resume = NULL;

With regard to the change itself...

If the debugging printf is accurate, the LWP is being left in a stopped state.  If that's the case, then why are software single step breakpoints being inserted?  It seems to me that you'd only want to insert these breakpoints when the thread is actually about to resume.

That said, I have no doubt that this change works for you, so clearly there's something going on which I do not understand.  I'd like to understand the situation better before approving it.  Also, once you have an explanation, please update the code comments and/or commit log message as appropriate.  My personal preference is for a concise explanation to be placed in the code comments with any additional, longer winded explanations or examples being placed in the commit log message.

Kevin
  
Yan, Zhiyong July 12, 2023, 3:29 a.m. UTC | #5
Hi all,
     For "format nit", I have sent new patch again. Because add more mail CC, I send patch triple today, they same. Sorry for bothering you.

Best Regards.
Zhiyong

-----Original Message-----
From: Yan, Zhiyong 
Sent: Wednesday, July 12, 2023 10:34 AM
To: Kevin Buettner <kevinb@redhat.com>
Cc: gdb-patches@sourceware.org; tom@tromey.com; luis.machado@arm.com
Subject: RE: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

Hi Kevin,
     My last mail's attachment is a little big, I am not sure if you can receive it. I zipped the log, and send again.

     -  For "format nit", I will work on it again.

     -  For your concern "I'd like to understand the situation better before approving it.",
        Please check attached " step5-assert.txt".
        Attached step5-assert.txt is a debug log which contains several "gdb step next" output.
        For example, at line 29, resume_one_thread() doesn't call process_one_lwp() because  thread LWP 316.316 is pending, as a result the software breaking point is not installed. Later, if this pending thread makes "wait_1: Hit a non-gdbserver trap event." happen, stop_all_lwps() -> "prepare_resume_reply: Writing resume reply for" ->...
        In such case, "gdb N" can finish without assert error. 
        But in step5-assert.txt, from line 14923 to 14943, we can see the pending thread make below happen,
         "
              wait_for_event_filtered: Got a pending child 316
              362.994099   [threads] wait_for_event_filtered: Got an event from pending child 316 (117f)
              362.994379   [threads] wait_1: Ignored signal 17 for LWP 316
         "
         In this case "resume_stopped_resumed_lwps" will resume every 'stopped-resumed' thread, but a thread (previously pending) has no software break point installed, then resume_stopped_resumed_lwps() asserts failed in maybe_hw_step().
     

Best Regards.
Zhiyong

-----Original Message-----
From: Kevin Buettner <kevinb@redhat.com>
Sent: Wednesday, July 12, 2023 2:43 AM
To: Yan, Zhiyong <Zhiyong.Yan@windriver.com>
Cc: gdb-patches@sourceware.org; tom@tromey.com; luis.machado@arm.com
Subject: Re: [PATCH] gdbserver: Install single-step breakpoint for a pending thread whose last_resume_kind is resume_step

CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi Zhiyong,

See my comments below; there's still one formatting nit, but I also have a question about whether this is the right place to fix the bug that you're seeing.

Also, I've added Luis to the CC list since he knows a lot more about the ARM architecture than I do.

On Mon, 3 Jul 2023 11:04:58 +0800
<zhiyong.yan@windriver.com> wrote:

> From: Zhiyong Yan <zhiyong.yan@windriver.com>
>
> Gdb should not assume pending threads always generate "a non-gdbserver trap event", for example "Signal 17" event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.
>
> Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
> ---
>  gdbserver/linux-low.cc | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc index 
> e6a39202a98..7d8bbf71ddc 100644
> --- a/gdbserver/linux-low.cc
> +++ b/gdbserver/linux-low.cc
> @@ -4671,7 +4671,16 @@ linux_process_target::resume_one_thread (thread_info *thread,
>        proceed_one_lwp (thread, NULL);
>      }
>    else
> -    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +    {
> +      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
> +      if (thread->last_resume_kind == resume_step)
> +     {
> +       /* If resume_step is required by GDB,
> +          install single-step breakpoint.  */
> +       if (supports_software_single_step())

Formatting nit: you need a space between 'supports_software_single_step'
and the left paren.

> +         install_software_single_step_breakpoints (lwp);
> +     }
> +    }
>
>    thread->last_status.set_ignore ();
>    lwp->resume = NULL;

With regard to the change itself...

If the debugging printf is accurate, the LWP is being left in a stopped state.  If that's the case, then why are software single step breakpoints being inserted?  It seems to me that you'd only want to insert these breakpoints when the thread is actually about to resume.

That said, I have no doubt that this change works for you, so clearly there's something going on which I do not understand.  I'd like to understand the situation better before approving it.  Also, once you have an explanation, please update the code comments and/or commit log message as appropriate.  My personal preference is for a concise explanation to be placed in the code comments with any additional, longer winded explanations or examples being placed in the commit log message.

Kevin
  
Luis Machado July 17, 2023, 9:35 a.m. UTC | #6
Hi,

On 7/11/23 19:42, Kevin Buettner wrote:
> Hi Zhiyong,
> 
> See my comments below; there's still one formatting nit, but I also
> have a question about whether this is the right place to fix the bug
> that you're seeing.
> 
> Also, I've added Luis to the CC list since he knows a lot more about
> the ARM architecture than I do.

I was aware of this one, but since it is generic code, I thought I'd defer it to the
global maintainers. It doesn't look like something that can/should be handled in arch-specific
layers. But I can revisit it if people think otherwise.

> 
> On Mon, 3 Jul 2023 11:04:58 +0800
> <zhiyong.yan@windriver.com> wrote:
> 
>> From: Zhiyong Yan <zhiyong.yan@windriver.com>
>>
>> Gdb should not assume pending threads always generate "a non-gdbserver trap event", for example "Signal 17" event could happen. Now that resume_stopped_resumed_lwps() -> may_hw_step() assumes that the break point must already exist, resume_one_thread() should ensure the software breaking point is installed although the thread is pending.
>>
>> Signed-off-by: Zhiyong Yan zhiyong.yan@windriver.com
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30387
>> ---
>>  gdbserver/linux-low.cc | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
>> index e6a39202a98..7d8bbf71ddc 100644
>> --- a/gdbserver/linux-low.cc
>> +++ b/gdbserver/linux-low.cc
>> @@ -4671,7 +4671,16 @@ linux_process_target::resume_one_thread (thread_info *thread,
>>        proceed_one_lwp (thread, NULL);
>>      }
>>    else
>> -    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
>> +    {
>> +      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
>> +      if (thread->last_resume_kind == resume_step)
>> +	{
>> +	  /* If resume_step is required by GDB, 
>> +	     install single-step breakpoint.  */
>> +	  if (supports_software_single_step())
> 
> Formatting nit: you need a space between 'supports_software_single_step'
> and the left paren.
> 
>> +	    install_software_single_step_breakpoints (lwp);
>> +	}
>> +    }
>>  
>>    thread->last_status.set_ignore ();
>>    lwp->resume = NULL;
> 
> With regard to the change itself...
> 
> If the debugging printf is accurate, the LWP is being left in a
> stopped state.  If that's the case, then why are software single step
> breakpoints being inserted?  It seems to me that you'd only want
> to insert these breakpoints when the thread is actually about to
> resume.
> 
> That said, I have no doubt that this change works for you, so clearly
> there's something going on which I do not understand.  I'd like to
> understand the situation better before approving it.  Also, once you
> have an explanation, please update the code comments and/or commit log
> message as appropriate.  My personal preference is for a concise
> explanation to be placed in the code comments with any additional,
> longer winded explanations or examples being placed in the commit log
> message.
> 
> Kevin
>
  

Patch

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index e6a39202a98..7d8bbf71ddc 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -4671,7 +4671,16 @@  linux_process_target::resume_one_thread (thread_info *thread,
       proceed_one_lwp (thread, NULL);
     }
   else
-    threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+    {
+      threads_debug_printf ("leaving LWP %ld stopped", lwpid_of (thread));
+      if (thread->last_resume_kind == resume_step)
+	{
+	  /* If resume_step is required by GDB, 
+	     install single-step breakpoint.  */
+	  if (supports_software_single_step())
+	    install_software_single_step_breakpoints (lwp);
+	}
+    }
 
   thread->last_status.set_ignore ();
   lwp->resume = NULL;