[8/8] Special-case wildcard requests in ravenscar-thread.c

Message ID 20190207094016.368-9-tom@tromey.com
State New, archived
Headers

Commit Message

Tom Tromey Feb. 7, 2019, 9:40 a.m. UTC
  From: Tom Tromey <tromey@adacore.com>

ravenscar-thread.c intercepts resume and wait target requests and
replaces the requested ptid with the ptid of the underlying CPU.
However, this is incorrect when a request is made with a wildcard
ptid.

This patch adds a special case to ravenscar-thread.c for
minus_one_ptid.  I don't believe a special case for process wildcards
is necessary, so I have not added that.

gdb/ChangeLog
2019-02-07  Tom Tromey  <tromey@adacore.com>

	* ravenscar-thread.c (ravenscar_thread_target::resume)
	(ravenscar_thread_target::wait): Special case wildcard requests.
---
 gdb/ChangeLog          |  5 +++++
 gdb/ravenscar-thread.c | 20 ++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)
  

Comments

Pedro Alves Feb. 7, 2019, 5:23 p.m. UTC | #1
On 02/07/2019 09:40 AM, Tom Tromey wrote:
> From: Tom Tromey <tromey@adacore.com>
> 
> ravenscar-thread.c intercepts resume and wait target requests and
> replaces the requested ptid with the ptid of the underlying CPU.
> However, this is incorrect when a request is made with a wildcard
> ptid.

Can you clarify a bit more what went wrong?
IIRC, the base target always has one thread/cpu only, anyway?  What
difference does the patch make in practice?

The reason I'm wondering is because the patch makes me wonder about
a step request with no sched-lock, which is the default (*).
In that case, you'll have:

 - the thread to step is in inferior_ptid
 - ptid == minus_one_ptid (indicating everything resumes / no schedlock)

So seems like after this patch the lower layer might get a request to step
an unknown inferior_ptid?  Might not happen by luck/accident.
I'd suspect that you'd want to do instead:

  inferior_ptid = m_base_ptid;
  /* If we see a wildcard resume, we simply pass that on.  Otherwise,
     arrange to resume the base ptid.  */
  if (ptid != minus_one_ptid)
    ptid = m_base_ptid;
  beneath ()->resume (ptid, step, siggnal);

I.e., always flip inferior_ptid.

(*) since ravenscar-thread.c doesn't know how to interact with
    the ravenscar runtime to specify which threads are resumable,
    "schedlock on" most certainly doesn't work properly.  E.g.,
      "task 1 stops; set scheduler-locking on; task 2; step"
    seemingly will step task 1 instead of 2, AFAICT.

Thanks,
Pedro Alves
  
Joel Brobecker Feb. 9, 2019, 4:41 a.m. UTC | #2
> > ravenscar-thread.c intercepts resume and wait target requests and
> > replaces the requested ptid with the ptid of the underlying CPU.
> > However, this is incorrect when a request is made with a wildcard
> > ptid.
> 
> Can you clarify a bit more what went wrong?
> IIRC, the base target always has one thread/cpu only, anyway?  What
> difference does the patch make in practice?

This happens when debugging application on a multi-CPU target,
using QEMU, where some tasks are allocated to one CPU, and others
are allocated to the other.

When stopping, QEMU tells us about one thread per CPU, so we start
with one possible base_ptid per CPU.  Previously, sending a resume
order for one thread actually resumed execution on all the CPUs.
But during an upgrade, which behavior was changed so that sending
a resume order for one thread only wakes the corresponding CPU up.

At the user level, we noticed the issue because we had a test were
we insert a breakpoint one some code which is only run from, say,
CPU #2, whereas we unfortunately resumed the execution after having
stopped somewhere in CPU #1. As a result, we sent an order to resume
CPU #1, which starves CPU #2 forever, because the code in CPU #1
waits for some of the Ada tasks allocated to CPU #2 (and we never
reach our breakpoint either).

> The reason I'm wondering is because the patch makes me wonder about
> a step request with no sched-lock, which is the default (*).
> In that case, you'll have:
> 
>  - the thread to step is in inferior_ptid
>  - ptid == minus_one_ptid (indicating everything resumes / no schedlock)
> 
> So seems like after this patch the lower layer might get a request to step
> an unknown inferior_ptid?  Might not happen by luck/accident.
> I'd suspect that you'd want to do instead:
> 
>   inferior_ptid = m_base_ptid;
>   /* If we see a wildcard resume, we simply pass that on.  Otherwise,
>      arrange to resume the base ptid.  */
>   if (ptid != minus_one_ptid)
>     ptid = m_base_ptid;
>   beneath ()->resume (ptid, step, siggnal);
> 
> I.e., always flip inferior_ptid.
> 
> (*) since ravenscar-thread.c doesn't know how to interact with
>     the ravenscar runtime to specify which threads are resumable,
>     "schedlock on" most certainly doesn't work properly.  E.g.,
>       "task 1 stops; set scheduler-locking on; task 2; step"
>     seemingly will step task 1 instead of 2, AFAICT.

That's correct (unless if you are in the particular situation where
you have one task per CPU).
  
Pedro Alves Feb. 14, 2019, 1:35 p.m. UTC | #3
On 02/09/2019 04:41 AM, Joel Brobecker wrote:
>>> ravenscar-thread.c intercepts resume and wait target requests and
>>> replaces the requested ptid with the ptid of the underlying CPU.
>>> However, this is incorrect when a request is made with a wildcard
>>> ptid.
>>
>> Can you clarify a bit more what went wrong?
>> IIRC, the base target always has one thread/cpu only, anyway?  What
>> difference does the patch make in practice?
> 
> This happens when debugging application on a multi-CPU target,
> using QEMU, where some tasks are allocated to one CPU, and others
> are allocated to the other.
> 
> When stopping, QEMU tells us about one thread per CPU, so we start
> with one possible base_ptid per CPU.  Previously, sending a resume
> order for one thread actually resumed execution on all the CPUs.
> But during an upgrade, which behavior was changed so that sending
> a resume order for one thread only wakes the corresponding CPU up.

Interesting -- that clarifies the "why changed that made this
noticeable now" question I had in my mind.  It's the sort of thing that
I think is worth it to include in the commit log.

> 
> At the user level, we noticed the issue because we had a test were
> we insert a breakpoint one some code which is only run from, say,
> CPU #2, whereas we unfortunately resumed the execution after having
> stopped somewhere in CPU #1. As a result, we sent an order to resume
> CPU #1, which starves CPU #2 forever, because the code in CPU #1
> waits for some of the Ada tasks allocated to CPU #2 (and we never
> reach our breakpoint either).

I see.  For some reason I recalled that Ravenscar didn't support SMP.

> 
>> The reason I'm wondering is because the patch makes me wonder about
>> a step request with no sched-lock, which is the default (*).
>> In that case, you'll have:
>>
>>  - the thread to step is in inferior_ptid
>>  - ptid == minus_one_ptid (indicating everything resumes / no schedlock)
>>
>> So seems like after this patch the lower layer might get a request to step
>> an unknown inferior_ptid?  Might not happen by luck/accident.
>> I'd suspect that you'd want to do instead:
>>
>>   inferior_ptid = m_base_ptid;
>>   /* If we see a wildcard resume, we simply pass that on.  Otherwise,
>>      arrange to resume the base ptid.  */
>>   if (ptid != minus_one_ptid)
>>     ptid = m_base_ptid;
>>   beneath ()->resume (ptid, step, siggnal);
>>
>> I.e., always flip inferior_ptid.

This comment still applies, I think.

BTW, the main reason I asked is because I have a change in the multi-target
branch that makes infrun switch inferior_ptid to null_ptid before
handling events, which exposed a number of bad assumptions that could lead
to accessing the wrong target.  Ravenscar may end up affected, but I'm
not certain.

>>
>> (*) since ravenscar-thread.c doesn't know how to interact with
>>     the ravenscar runtime to specify which threads are resumable,
>>     "schedlock on" most certainly doesn't work properly.  E.g.,
>>       "task 1 stops; set scheduler-locking on; task 2; step"
>>     seemingly will step task 1 instead of 2, AFAICT.
> 
> That's correct (unless if you are in the particular situation where
> you have one task per CPU).
Thanks,
Pedro Alves
  
Tom Tromey Feb. 15, 2019, 8:39 p.m. UTC | #4
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

Pedro> The reason I'm wondering is because the patch makes me wonder about
Pedro> a step request with no sched-lock, which is the default (*).
Pedro> In that case, you'll have:

I think the default is "replay", which is normally like "off" -- but
there's a Fedora patch to change the default to "step"?

Anyway, not super important...

Pedro> So seems like after this patch the lower layer might get a request to step
Pedro> an unknown inferior_ptid?  Might not happen by luck/accident.
Pedro> I'd suspect that you'd want to do instead:
[...]

Yes, I agree.  Thanks for pointing this out.  I've made this change.

Tom
  
Tom Tromey Feb. 15, 2019, 8:43 p.m. UTC | #5
>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:

Pedro> Interesting -- that clarifies the "why changed that made this
Pedro> noticeable now" question I had in my mind.  It's the sort of thing that
Pedro> I think is worth it to include in the commit log.

I'm copying Joel's text into the commit message :)

Pedro> BTW, the main reason I asked is because I have a change in the multi-target
Pedro> branch that makes infrun switch inferior_ptid to null_ptid before
Pedro> handling events, which exposed a number of bad assumptions that could lead
Pedro> to accessing the wrong target.  Ravenscar may end up affected, but I'm
Pedro> not certain.

FWIW, ravenscar-thread.c doesn't seem to intrinsically care about
inferior_ptid -- it is just setting it for the benefit of any lower
layers on the target stack.  So, if the rest of the stack is changed so
as not to rely on this, then ravenscar-thread can follow.

I'm re-testing this series using the AdaCore internal test suite, which
exercises ravenscar-thread; I'll push them if this goes well.

thanks,
Tom
  
Pedro Alves Feb. 20, 2019, 7:23 p.m. UTC | #6
On 02/15/2019 08:39 PM, Tom Tromey wrote:
>>>>>> "Pedro" == Pedro Alves <palves@redhat.com> writes:
> 
> Pedro> The reason I'm wondering is because the patch makes me wonder about
> Pedro> a step request with no sched-lock, which is the default (*).
> Pedro> In that case, you'll have:
> 
> I think the default is "replay", which is normally like "off"

Yes, replay is basically "off", except when you're replaying it behaves
like "step".  What I mean is that by default when you step all threads
run unlocked.

> -- but
> there's a Fedora patch to change the default to "step"?

Yeah.  On my list of patches to drop from Fedora, actually.

> 
> Anyway, not super important...
> 
> Pedro> So seems like after this patch the lower layer might get a request to step
> Pedro> an unknown inferior_ptid?  Might not happen by luck/accident.
> Pedro> I'd suspect that you'd want to do instead:
> [...]
> 
> Yes, I agree.  Thanks for pointing this out.  I've made this change.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/ravenscar-thread.c b/gdb/ravenscar-thread.c
index 2e8d76d4ae8..4b07b00aae2 100644
--- a/gdb/ravenscar-thread.c
+++ b/gdb/ravenscar-thread.c
@@ -323,8 +323,14 @@  void
 ravenscar_thread_target::resume (ptid_t ptid, int step,
 				 enum gdb_signal siggnal)
 {
-  inferior_ptid = m_base_ptid;
-  beneath ()->resume (m_base_ptid, step, siggnal);
+  /* If we see a wildcard resume, we simply pass that on.  Otherwise,
+     arrange to resume the base ptid.  */
+  if (ptid != minus_one_ptid)
+    {
+      inferior_ptid = m_base_ptid;
+      ptid = m_base_ptid;
+    }
+  beneath ()->resume (ptid, step, siggnal);
 }
 
 ptid_t
@@ -334,8 +340,12 @@  ravenscar_thread_target::wait (ptid_t ptid,
 {
   ptid_t event_ptid;
 
-  inferior_ptid = m_base_ptid;
-  event_ptid = beneath ()->wait (m_base_ptid, status, 0);
+  if (ptid != minus_one_ptid)
+    {
+      inferior_ptid = m_base_ptid;
+      ptid = m_base_ptid;
+    }
+  event_ptid = beneath ()->wait (ptid, status, 0);
   /* Find any new threads that might have been created, and update
      inferior_ptid to the active thread.
 
@@ -350,6 +360,8 @@  ravenscar_thread_target::wait (ptid_t ptid,
       this->update_thread_list ();
       this->update_inferior_ptid ();
     }
+  else
+    inferior_ptid = m_base_ptid;
   return inferior_ptid;
 }