Setting high test timeouts makes some tests take longer.

Message ID 20160226174207.GN19841@vapier.lan
State New, archived
Delegated to: Mike Frysinger
Headers

Commit Message

Mike Frysinger Feb. 26, 2016, 5:42 p.m. UTC
  On 26 Feb 2016 03:50, Carlos O'Donell wrote:
> I noticed nptl/tst-exit2 taking longer and longer the more
> I increased my timeouts.
> 
> If you look at the test it actually relies on the test to
> timeout for it to complete! Yikes!
> 
> It uses pthread_exit to exit the main thread, and then it
> proves the created thread is still running by timing out
> the test.
> 
> These tests that have "EXPECTED_SIGNAL SIGALRM" (not all
> of them, but several) need to be rewritten to use alarm()
> to trigger the signal at some reasonable time instead of
> the full duration of the test timeout.
> 
> So with your patch to increase timeouts, it makes
> these tests take longer :-(

looks like most of the tests that rely on a lower timeout actually set
the timeout to be low.  only a few were letting it float higher.  this
is good though as it also means my patch to delete TIMEOUT from all the
tests should be respun.

we should land a quick fix to get back to the status quo by setting
TIMEOUT in these tests to a low value.  i'll post a patch to do that.

> Do you think using alarm() is a good solution? Seems
> easiest.

if we explicitly set TIMEOUT low for these tests (and drop a comment in
to keep people from "cleaning" them up), this still allows for the factor
to scale it up pretty high, which is both good & bad.  good in that for
truly slow devices, we give it a chance to pass, but for ones that would
complete in time, we sit around waiting a very long time.

looking closer, i think any & all use of "EXPECTED_SIGNAL SIGALRM" is a
bad idea.  if an earlier part of the test setup hangs, then it will still
be detected as a pass because it timedout in the end (which the skeleton
expected).

so if we added a call to alarm(1) in these tests, we still wouldn't catch
bad/early flakes.  i think we want to grow the test harness a bit here so
it can differentiate between the two.  PoC below.
-mike
  

Patch

--- a/nptl/tst-exit2.c
+++ b/nptl/tst-exit2.c
@@ -1,13 +1,17 @@ 
+#include <atomic.h>
 #include <pthread.h>
 #include <signal.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
+static int reached;
 
 static void *
 tf (void *arg)
 {
+  atomic_increment (&reached);
   while (1)
     sleep (100);
 
@@ -15,6 +19,11 @@  tf (void *arg)
   return NULL;
 }
 
+static int
+timeout_check (void)
+{
+  return (reached == 2) ? 0 : 1;
+}
 
 static int
 do_test (void)
@@ -28,6 +37,9 @@  do_test (void)
       return 1;
     }
 
+  atomic_increment (&reached);
+  alarm (3);
+
   /* Terminate only this thread.  */
   pthread_exit (NULL);
 
@@ -35,6 +47,6 @@  do_test (void)
   return 1;
 }
 
-#define EXPECTED_SIGNAL SIGALRM
 #define TEST_FUNCTION do_test ()
+#define TIMEOUT_CHECK timeout_check()
 #include "../test-skeleton.c"
--- a/test-skeleton.c
+++ b/test-skeleton.c
@@ -219,6 +219,20 @@  signal_handler (int sig __attribute__ ((unused)))
   exit (1);
 }
 
+#ifdef TIMEOUT_CHECK
+static void
+__attribute__ ((noreturn))
+child_timeout_handler (int sig __attribute__ ((unused)))
+{
+  if (TIMEOUT_CHECK == 0)
+    exit (0);
+
+  /* If the check returns, use a generic diagnosis.  */
+  puts ("Timeout check failed");
+  exit (1);
+}
+#endif
+
 /* Avoid all the buffer overflow messages on stderr.  */
 static void
 __attribute__ ((unused))
@@ -451,6 +465,10 @@  main (int argc, char *argv[])
       if (setpgid (0, 0) != 0)
 	printf ("Failed to set the process group ID: %m\n");
 
+#ifdef TIMEOUT_CHECK
+      signal (SIGALRM, child_timeout_handler);
+#endif
+
       /* Execute the test function and exit with the return value.   */
       exit (TEST_FUNCTION);
     }