Fix gdb.mi/mi-nsmoribund.exp timeouts (Re: [PATCH] Cleanup num_found usage in gdb_wait_for_event)

Message ID 554A5375.4040405@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves May 6, 2015, 5:46 p.m. UTC
  On 04/29/2015 08:40 PM, Simon Marchi wrote:

> Oh well, I was investigating the exact same test failure (mi-nsmoribund.exp). It's
> failing maybe 90% of the time on our x86 Jenkins boxes. I was actually writing a
> patch for that at the moment.
> 
> For poll, I keep a static variable around that tells which pollfd structure to start
> iterating from. So the first time we check 0, 1, 2, next time we'll check 1, 2, 0, then
> 2, 0, 1, etc. It's really simple. The static variable should be a field in gdb_notifier
> though.

That's _exactly_ what my patch does too.  :-)  That was this bit:

@@ -751,8 +794,20 @@ gdb_wait_for_event (int block)
   if (use_poll)
     {
 #ifdef HAVE_POLL
-      for (i = 0; (i < gdb_notifier.num_fds) && (num_found > 0); i++)
+      /* To level the fairness across event sources, we poll them in a
+        round-robin-like fashion.  The number and order of the polled
+        file descriptors may change between invocations, but this is
+        good enough.  */
+      static int last_notifier = -1;
+
+      while (num_found > 0)
        {
+         last_notifier++;
+         if (last_notifier >= gdb_notifier.num_fds)
+           last_notifier = 0;
+
+         i = last_notifier;
+
          if ((gdb_notifier.poll_fds + i)->revents)
            num_found--;
          else


> 
> Here is my patch in raw form:
> https://github.com/simark/binutils-gdb/commit/ec49aaec6963f0aa974d183b8fca669f6261274d
> 
> In any case, I don't see why interacting with num_found here is useful, given that we only
> handle one event. If we did handle all the found events, then I would understand, since you
> want to know when you are done.

The only reason I think is to avoid looking at the pollfds if poll
returned no events.  So, how about we combine and simplify things,
like this?

Thanks,
Pedro Alves

From 1dfb4a7a57179f633b18feb054211257050a22c2 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 6 May 2015 18:34:32 +0100
Subject: [PATCH] Fix gdb.mi/mi-nsmoribund.exp timeouts

The PPC64 buildbot has been showing timeouts in mi-nsmoribund.exp,
like this:

 (...)
 -thread-info
 FAIL: gdb.mi/mi-nsmoribund.exp: thread state: all running except the breakpoint thread (timeout)

... and I can reproduce this on gcc110 (PPC64) on the gcc compile
farm.

That is, the test sends "-thread-info" to GDB, but GDB never replies
back.

The problem is that these machines are too fast for gdb.  :-)

That test has a few threads running the same tight loop, and
constantly hitting a thread-specific breakpoint that needs to be
stepped over.  If threads trip on breakpoints fast enough that
linux-nat.c's event pipe associated with SIGCHLD is constantly being
written to, even if the stdin file descriptor also has an event to
handle, gdb never gets to it. because linux-nat.c's pipe comes first
in the set of descriptors served by the poll/select code in the event
loop.

Fix this by having the event loop serve file event sources in
round-robin-like fashion, similarly to how its done in
gdb_do_one_event.

Unfortunately, the poll and the select variants each need their own
fixing.

Tested on x86_64 Fedora 20 (poll and select variants), and PPC64
Fedora 18.  Fixes the timeout in the PPC64 machine in the compile farm
that times out without this, and I won't be surprised if it fixes
other random timeouts in other tests.

(gdbserver's copy of the event-loop doesn't need this (yet), as it
still pushes all ready events to an event queue.  That is, it hasn't
had 70b66289 merged yet.  We should really merge both event-loop.c
copies into a single shared file, but that's for another day.)

gdb/ChangeLog:
2015-05-06  Pedro Alves  <palves@redhat.com>
	    Simon Marchi  <simon.marchi@ericsson.com>

	* event-loop.c (gdb_notifier) <next_file_handler,
	next_poll_fds_index>: New fields.
	(get_next_file_handler_to_handle_and_advance): New function.
	(delete_file_handler): If deleting the next file handler to
	handle, advance to the next file handler.
	(gdb_wait_for_event): Bail early if no event fired.  Poll file
	handlers in round-robin fashion.
---
 gdb/event-loop.c | 120 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 88 insertions(+), 32 deletions(-)
  

Comments

Pedro Alves May 15, 2015, 3:33 p.m. UTC | #1
On 05/06/2015 06:46 PM, Pedro Alves wrote:

> gdb/ChangeLog:
> 2015-05-06  Pedro Alves  <palves@redhat.com>
> 	    Simon Marchi  <simon.marchi@ericsson.com>
> 
> 	* event-loop.c (gdb_notifier) <next_file_handler,
> 	next_poll_fds_index>: New fields.
> 	(get_next_file_handler_to_handle_and_advance): New function.
> 	(delete_file_handler): If deleting the next file handler to
> 	handle, advance to the next file handler.
> 	(gdb_wait_for_event): Bail early if no event fired.  Poll file
> 	handlers in round-robin fashion.

I pushed this in now.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/event-loop.c b/gdb/event-loop.c
index 79e41fd..9ac4908 100644
--- a/gdb/event-loop.c
+++ b/gdb/event-loop.c
@@ -165,10 +165,24 @@  static struct
     /* Ptr to head of file handler list.  */
     file_handler *first_file_handler;
 
+    /* Next file handler to handle, for the select variant.  To level
+       the fairness across event sources, we serve file handlers in a
+       round-robin-like fashion.  The number and order of the polled
+       file handlers may change between invocations, but this is good
+       enough.  */
+    file_handler *next_file_handler;
+
 #ifdef HAVE_POLL
     /* Ptr to array of pollfd structures.  */
     struct pollfd *poll_fds;
 
+    /* Next file descriptor to handle, for the poll variant.  To level
+       the fairness across event sources, we poll the file descriptors
+       in a round-robin-like fashion.  The number and order of the
+       polled file descriptors may change between invocations, but
+       this is good enough.  */
+    int next_poll_fds_index;
+
     /* Timeout in milliseconds for calls to poll().  */
     int poll_timeout;
 #endif
@@ -494,6 +508,31 @@  create_file_handler (int fd, int mask, handler_func * proc,
   file_ptr->mask = mask;
 }
 
+/* Return the next file handler to handle, and advance to the next
+   file handler, wrapping around if the end of the list is
+   reached.  */
+
+static file_handler *
+get_next_file_handler_to_handle_and_advance (void)
+{
+  file_handler *curr_next;
+
+  /* The first time around, this is still NULL.  */
+  if (gdb_notifier.next_file_handler == NULL)
+    gdb_notifier.next_file_handler = gdb_notifier.first_file_handler;
+
+  curr_next = gdb_notifier.next_file_handler;
+  gdb_assert (curr_next != NULL);
+
+  /* Advance.  */
+  gdb_notifier.next_file_handler = curr_next->next_file;
+  /* Wrap around, if necessary.  */
+  if (gdb_notifier.next_file_handler == NULL)
+    gdb_notifier.next_file_handler = gdb_notifier.first_file_handler;
+
+  return curr_next;
+}
+
 /* Remove the file descriptor FD from the list of monitored fd's: 
    i.e. we don't care anymore about events on the FD.  */
 void
@@ -576,6 +615,17 @@  delete_file_handler (int fd)
 
   file_ptr->mask = 0;
 
+  /* If this file handler was going to be the next one to be handled,
+     advance to the next's next, if any.  */
+  if (gdb_notifier.next_file_handler == file_ptr)
+    {
+      if (file_ptr->next_file == NULL
+	  && file_ptr == gdb_notifier.first_file_handler)
+	gdb_notifier.next_file_handler = NULL;
+      else
+	get_next_file_handler_to_handle_and_advance ();
+    }
+
   /* Get rid of the file handler in the file handler list.  */
   if (file_ptr == gdb_notifier.first_file_handler)
     gdb_notifier.first_file_handler = file_ptr->next_file;
@@ -672,7 +722,6 @@  gdb_wait_for_event (int block)
 {
   file_handler *file_ptr;
   int num_found = 0;
-  int i;
 
   /* Make sure all output is done before getting another event.  */
   gdb_flush (gdb_stdout);
@@ -743,37 +792,47 @@  gdb_wait_for_event (int block)
 	}
     }
 
+  /* Avoid looking at poll_fds[i]->revents if no event fired.  */
+  if (num_found <= 0)
+    return 0;
+
   /* Run event handlers.  We always run just one handler and go back
      to polling, in case a handler changes the notifier list.  Since
      events for sources we haven't consumed yet wake poll/select
      immediately, no event is lost.  */
 
+  /* To level the fairness across event descriptors, we handle them in
+     a round-robin-like fashion.  The number and order of descriptors
+     may change between invocations, but this is good enough.  */
   if (use_poll)
     {
 #ifdef HAVE_POLL
-      for (i = 0; (i < gdb_notifier.num_fds) && (num_found > 0); i++)
-	{
-	  if ((gdb_notifier.poll_fds + i)->revents)
-	    num_found--;
-	  else
-	    continue;
+      int i;
+      int mask;
 
-	  for (file_ptr = gdb_notifier.first_file_handler;
-	       file_ptr != NULL;
-	       file_ptr = file_ptr->next_file)
-	    {
-	      if (file_ptr->fd == (gdb_notifier.poll_fds + i)->fd)
-		break;
-	    }
+      while (1)
+	{
+	  if (gdb_notifier.next_poll_fds_index >= gdb_notifier.num_fds)
+	    gdb_notifier.next_poll_fds_index = 0;
+	  i = gdb_notifier.next_poll_fds_index++;
 
-	  if (file_ptr)
-	    {
-	      int mask = (gdb_notifier.poll_fds + i)->revents;
+	  gdb_assert (i < gdb_notifier.num_fds);
+	  if ((gdb_notifier.poll_fds + i)->revents)
+	    break;
+	}
 
-	      handle_file_event (file_ptr, mask);
-	      return 1;
-	    }
+      for (file_ptr = gdb_notifier.first_file_handler;
+	   file_ptr != NULL;
+	   file_ptr = file_ptr->next_file)
+	{
+	  if (file_ptr->fd == (gdb_notifier.poll_fds + i)->fd)
+	    break;
 	}
+      gdb_assert (file_ptr != NULL);
+
+      mask = (gdb_notifier.poll_fds + i)->revents;
+      handle_file_event (file_ptr, mask);
+      return 1;
 #else
       internal_error (__FILE__, __LINE__,
 		      _("use_poll without HAVE_POLL"));
@@ -781,11 +840,12 @@  gdb_wait_for_event (int block)
     }
   else
     {
-      for (file_ptr = gdb_notifier.first_file_handler;
-	   (file_ptr != NULL) && (num_found > 0);
-	   file_ptr = file_ptr->next_file)
+      /* See comment about even source fairness above.  */
+      int mask = 0;
+
+      do
 	{
-	  int mask = 0;
+	  file_ptr = get_next_file_handler_to_handle_and_advance ();
 
 	  if (FD_ISSET (file_ptr->fd, &gdb_notifier.ready_masks[0]))
 	    mask |= GDB_READABLE;
@@ -793,15 +853,11 @@  gdb_wait_for_event (int block)
 	    mask |= GDB_WRITABLE;
 	  if (FD_ISSET (file_ptr->fd, &gdb_notifier.ready_masks[2]))
 	    mask |= GDB_EXCEPTION;
-
-	  if (!mask)
-	    continue;
-	  else
-	    num_found--;
-
-	  handle_file_event (file_ptr, mask);
-	  return 1;
 	}
+      while (mask == 0);
+
+      handle_file_event (file_ptr, mask);
+      return 1;
     }
   return 0;
 }