[RFA] Fix internal error with 'set debug infrun 1' under high load

Message ID 1553461783.1504.3.camel@skynet.be
State New, archived

Commit Message

Philippe Waroquiers March 24, 2019, 9:09 p.m. UTC
  On Sun, 2019-03-24 at 13:50 -0700, Kevin Buettner wrote:
> Hi Philippe,
> 
> There is definitely a bug in this section of code from infrun.c:
> 
> 	  else if (ws.kind == TARGET_WAITKIND_THREAD_EXITED
> 		   || ws.kind == TARGET_WAITKIND_EXITED
> 		   || ws.kind == TARGET_WAITKIND_SIGNALLED)
> 	    {
> 	      if (debug_infrun)
> 		{
> 		  ptid_t ptid = ptid_t (ws.value.integer);
> 
> 		  fprintf_unfiltered (gdb_stdlog,
> 				      "infrun: %s exited while "
> 				      "stopping threads\n",
> 				      target_pid_to_str (ptid).c_str ());
> 		}
> 	    }
> 
> This line...
> 
> 		  ptid_t ptid = ptid_t (ws.value.integer);
> 
> ...doesn't make sense to me since ws.value.integer is supposed to
> be the exit status for TARGET_WAITKIND_THREAD_EXITED and
> TARGET_WAITKIND_EXITED.
> 
> However, for TARGET_WAITKIND_SIGNALLED, the signal number is in
> ws.value.sig (which, due to being part of a union, occupies some
> of the same bytes as ws.value.integer).
> 
> So trying to find the ptid in that manner makes no sense at all.
> 
> I'm guessing that the ptid values are bogus when it does work.
> 
> Does it work when you use 
> 
> 		  ptid_t ptid = ptid_t (event_pid);
> 
> instead?
I guess you mean to only print event_ptid.

Yes, that is working (the proposed patch was printing both
event_ptid and the ptid derived from ws.value.integer, assuming
that sometimes ws.value.integer was something relevant).
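
To make the problem concrete, here is a minimal standalone sketch of why a
ptid built from the wait status payload cannot identify the thread.  The
types and helpers below (mock_ptid, mock_waitstatus, ptid_from_payload,
describe) are hypothetical, simplified stand-ins written for illustration;
they are not GDB's real definitions.  The sketch assumes only what is said
above: the payload holds an exit status for the "exited" kinds and a signal
number for the "signalled" kind.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; not GDB's real ptid_t or
// target_waitstatus.
struct mock_ptid
{
  int pid;
  long lwp;
};

enum mock_waitkind { MOCK_THREAD_EXITED, MOCK_EXITED, MOCK_SIGNALLED };

struct mock_waitstatus
{
  mock_waitkind kind;
  union
  {
    int integer;  // exit status for the two "exited" kinds
    int sig;      // signal number for the "signalled" kind
  } value;
};

// What the old debug code effectively did: treat the payload as a pid.
mock_ptid
ptid_from_payload (const mock_waitstatus &ws)
{
  return mock_ptid { ws.value.integer, 0 };
}

// Render a mock ptid as "pid.lwp" for display.
std::string
describe (const mock_ptid &p)
{
  return std::to_string (p.pid) + "." + std::to_string (p.lwp);
}
```

With an exit status of 0, ptid_from_payload yields a ptid whose pid is 0,
which has nothing to do with the thread that reported the event.  The ptid
returned alongside the wait status (event_ptid in stop_all_threads) is the
value that names the thread.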

Here is the trace I obtain after a few trials under high load:
infrun: stop_all_threads, pass=0, iterations=0
infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
infrun:   Thread 0x7ffff7310700 (LWP 3632) executing, need stop
[Thread 0x7ffff7310700 (LWP 3632) exited]
infrun: target_wait (-1.0.0, status) =
infrun:   3587.3632.0 [LWP 3632],
infrun:   status->kind = thread exited, status = 0
infrun: LWP 3632 exited while stopping threads
infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
infrun: stop_all_threads, pass=1, iterations=1
infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
infrun: stop_all_threads done

The above is obtained with the patch:
  

Comments

Kevin Buettner March 24, 2019, 9:35 p.m. UTC | #1
On Sun, 24 Mar 2019 22:09:43 +0100
Philippe Waroquiers <philippe.waroquiers@skynet.be> wrote:

> On Sun, 2019-03-24 at 13:50 -0700, Kevin Buettner wrote:
> > Hi Philippe,
> > 
> > There is definitely a bug in this section of code from infrun.c:
> > 
> > 	  else if (ws.kind == TARGET_WAITKIND_THREAD_EXITED
> > 		   || ws.kind == TARGET_WAITKIND_EXITED
> > 		   || ws.kind == TARGET_WAITKIND_SIGNALLED)
> > 	    {
> > 	      if (debug_infrun)
> > 		{
> > 		  ptid_t ptid = ptid_t (ws.value.integer);
> > 
> > 		  fprintf_unfiltered (gdb_stdlog,
> > 				      "infrun: %s exited while "
> > 				      "stopping threads\n",
> > 				      target_pid_to_str (ptid).c_str ());
> > 		}
> > 	    }
> > 
> > This line...
> > 
> > 		  ptid_t ptid = ptid_t (ws.value.integer);
> > 
> > ...doesn't make sense to me since ws.value.integer is supposed to
> > be the exit status for TARGET_WAITKIND_THREAD_EXITED and
> > TARGET_WAITKIND_EXITED.
> > 
> > However, for TARGET_WAITKIND_SIGNALLED, the signal number is in
> > ws.value.sig (which, due to being part of a union occupies some
> > of the same bytes as ws.value.integer).
> > 
> > So trying to find the ptid in that manner makes no sense at all.
> > 
> > I'm guessing that the ptid values are bogus when it does work.
> > 
> > Does it work when you use 
> > 
> > 		  ptid_t ptid = ptid_t (event_pid);
> > 
> > instead?  
> I guess you mean to only print event_ptid.
> 
> Yes, that is working (the proposed patch was printing both
> event_ptid and the ptid derived from ws.value.integer, assuming
> that sometimes ws.value.integer was something relevant).
> 
> Here is the trace I obtain after a few trials under high load:
> infrun: stop_all_threads, pass=0, iterations=0
> infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
> infrun:   Thread 0x7ffff7310700 (LWP 3632) executing, need stop
> [Thread 0x7ffff7310700 (LWP 3632) exited]
> infrun: target_wait (-1.0.0, status) =
> infrun:   3587.3632.0 [LWP 3632],
> infrun:   status->kind = thread exited, status = 0
> infrun: LWP 3632 exited while stopping threads
> infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
> infrun: stop_all_threads, pass=1, iterations=1
> infrun:   Thread 0x7ffff7fcfb40 (LWP 3587) not executing
> infrun: stop_all_threads done
> 
> The above is obtained with the patch:
> 
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index ad7892105a..7f1339a917 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -4365,12 +4365,10 @@ stop_all_threads (void)
>             {
>               if (debug_infrun)
>                 {
> -                 ptid_t ptid = ptid_t (ws.value.integer);
> -
>                   fprintf_unfiltered (gdb_stdlog,
>                                       "infrun: %s exited while "
>                                       "stopping threads\n",
> -                                     target_pid_to_str (ptid).c_str ());
> +                                     target_pid_to_str (event_ptid).c_str ());
>                 }
>             }
>           else
> 

You make a good point about trying to make use of ws.value.integer.

So, here are my suggestions:

1) Move TARGET_WAITKIND_SIGNALLED into another "else if" clause.  It
doesn't make sense for the debug message to indicate that the process
has exited when it's actually been signalled.

2) Make the TARGET_WAITKIND_THREAD_EXITED / TARGET_WAITKIND_EXITED
case print the exit status and make the TARGET_WAITKIND_SIGNALLED case
print the signal.  These are available (respectively) in ws.value.integer and
ws.value.sig.

Kevin
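
For readers following the thread, the two suggestions above could be
sketched roughly as follows.  This is an illustration under stated
assumptions, not the patch that was eventually written: sketch_waitkind,
sketch_waitstatus and exit_debug_message are hypothetical names, the
message wording is invented, and the real code would print through
fprintf_unfiltered on gdb_stdlog rather than return a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; not GDB's real definitions.
enum sketch_waitkind
{
  SKETCH_THREAD_EXITED,
  SKETCH_EXITED,
  SKETCH_SIGNALLED,
  SKETCH_STOPPED
};

struct sketch_waitstatus
{
  sketch_waitkind kind;
  union
  {
    int integer;  // exit status (exited kinds)
    int sig;      // signal number (signalled kind)
  } value;
};

// Build the debug line for an event seen while stopping threads.
// WHO is the already formatted thread name, e.g. "LWP 3632".
std::string
exit_debug_message (const std::string &who, const sketch_waitstatus &ws)
{
  if (ws.kind == SKETCH_THREAD_EXITED || ws.kind == SKETCH_EXITED)
    {
      // Suggestion 2, first half: report the exit status.
      return ("infrun: " + who + " exited while stopping threads, "
              "status = " + std::to_string (ws.value.integer) + "\n");
    }
  else if (ws.kind == SKETCH_SIGNALLED)
    {
      // Suggestion 1: separate clause, so the message no longer claims
      // an exit.  Suggestion 2, second half: report the signal.
      return ("infrun: " + who + " signalled while stopping threads, "
              "signal = " + std::to_string (ws.value.sig) + "\n");
    }

  // Other kinds are handled elsewhere; no message here.
  return "";
}
```

Each branch reads only the union member that is meaningful for its kind,
and the thread name comes from the caller instead of being derived from
the payload.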
  
Philippe Waroquiers March 24, 2019, 9:40 p.m. UTC | #2
On Sun, 2019-03-24 at 14:35 -0700, Kevin Buettner wrote:
> You make a good point about trying to make use of ws.value.integer.
> 
> So, here are my suggestions:
> 
> 1) Move TARGET_WAITKIND_SIGNALLED into another "else if" clause.  It
> doesn't make sense for the debug message to indicate that the process
> has exited when it's actually been signalled.
> 
> 2) Make the TARGET_WAITKIND_THREAD_EXITED / TARGET_WAITKIND_EXITED
> case print the exit status and make the TARGET_WAITKIND_SIGNALLED case
> print the signal.  These are available (respectively) in ws.value.integer and
> ws.value.sig.
Thanks for the feedback, I will (some time this week) prepare a patch
based on these suggestions.

Philippe
  

Patch

diff --git a/gdb/infrun.c b/gdb/infrun.c
index ad7892105a..7f1339a917 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -4365,12 +4365,10 @@  stop_all_threads (void)
            {
              if (debug_infrun)
                {
-                 ptid_t ptid = ptid_t (ws.value.integer);
-
                  fprintf_unfiltered (gdb_stdlog,
                                      "infrun: %s exited while "
                                      "stopping threads\n",
-                                     target_pid_to_str (ptid).c_str ());
+                                     target_pid_to_str (event_ptid).c_str ());
                }
            }
          else