increasing default timeout significantly

Message ID 20160119211840.GJ14840@vapier.lan
State Committed
Delegated to: Mike Frysinger
Headers

Commit Message

Mike Frysinger Jan. 19, 2016, 9:18 p.m. UTC
  is there a compelling reason to keep the default timeout so low ?
the vast majority of timeouts i've seen w/glibc tests are due to:
 - slow system (e.g. <1 GHz cpu)
 - loaded system (e.g. lots of parallelism)
even then, i've seen timeouts on system i don't generally consider
slow, or even loaded, and considering TIMEOUT is set to <=10 in ~60
tests (and <=20 in ~75 tests), it seems i'm not alone.  i've just
gotten in the habit of doing `export TIMEOUTFACTOR=10` on all my
setups.

in the edge case where there is a bug in the test and the timeout is
hit, i think we all agree that's either a problem with the test or a
real bug in the library somewhere.  in either case, the incident rate
should be low, so catering to that seems like the wrong trade-off.
-mike
  

Comments

Andreas Schwab Jan. 19, 2016, 10:04 p.m. UTC | #1
Mike Frysinger <vapier@gentoo.org> writes:

> --- a/test-skeleton.c
> +++ b/test-skeleton.c
> @@ -46,8 +46,9 @@
>  #endif
>  
>  #ifndef TIMEOUT
> -  /* Default timeout is two seconds.  */
> -# define TIMEOUT 2
> +  /* Default timeout is twenty seconds.  Tests should normally complete faster
> +     than this, but if they don't, that's abnormal (a bug) anyways.  */
> +# define TIMEOUT 20
>  #endif

You need to review all tests that override this value.

Andreas.
  
Mike Frysinger Jan. 19, 2016, 10:26 p.m. UTC | #2
On 19 Jan 2016 23:04, Andreas Schwab wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > --- a/test-skeleton.c
> > +++ b/test-skeleton.c
> > @@ -46,8 +46,9 @@
> >  #endif
> >  
> >  #ifndef TIMEOUT
> > -  /* Default timeout is two seconds.  */
> > -# define TIMEOUT 2
> > +  /* Default timeout is twenty seconds.  Tests should normally complete faster
> > +     than this, but if they don't, that's abnormal (a bug) anyways.  */
> > +# define TIMEOUT 20
> >  #endif
> 
> You need to review all tests that override this value.

in what way ?  there are really only three states:
 - test doesn't set a timeout
   - a passing test doesn't care if you timeout in 2 sec or 1 year
   - a failing test would take longer to timeout now, but that test
     should already be logged, and i don't think this should affect
     the default to the detriment of the common case & valid systems
 - test sets a higher timeout
   - the custom timeout is still used
   - behavior is unchanged
 - test sets a lower timeout
   - the custom timeout is still used
   - behavior is unchanged

now, if you mean "you should delete the #define TIMEOUT from all tests
whose value is <=20", then yes, that's a cleanup that i'd probably do.
but i wasn't going to bother updating >=50 files if we didn't want to
accept the fundamental change i posted above.
-mike
  
Andreas Schwab Jan. 19, 2016, 10:45 p.m. UTC | #3
Mike Frysinger <vapier@gentoo.org> writes:

> in what way ?  there are really only three states:
>  - test doesn't set a timeout
>    - a passing test doesn't care if you timeout in 2 sec or 1 year
>    - a failing test would take longer to timeout now, but that test
>      should already be logged, and i don't think this should affect
>      the default to the detriment of the common case & valid systems
>  - test sets a higher timeout
>    - the custom timeout is still used
>    - behavior is unchanged
>  - test sets a lower timeout
>    - the custom timeout is still used
>    - behavior is unchanged
>
> now, if you mean "you should delete the #define TIMEOUT from all tests
> whose value is <=20", then yes, that's a cleanup that i'd probably do.
> but i wasn't going to bother updating >=50 files if we didn't want to
> accept the fundamental change i posted above.

That should have been part of the original submission.

Andreas.
  
Mike Frysinger Jan. 19, 2016, 10:50 p.m. UTC | #4
On 19 Jan 2016 23:45, Andreas Schwab wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > in what way ?  there are really only three states:
> >  - test doesn't set a timeout
> >    - a passing test doesn't care if you timeout in 2 sec or 1 year
> >    - a failing test would take longer to timeout now, but that test
> >      should already be logged, and i don't think this should affect
> >      the default to the detriment of the common case & valid systems
> >  - test sets a higher timeout
> >    - the custom timeout is still used
> >    - behavior is unchanged
> >  - test sets a lower timeout
> >    - the custom timeout is still used
> >    - behavior is unchanged
> >
> > now, if you mean "you should delete the #define TIMEOUT from all tests
> > whose value is <=20", then yes, that's a cleanup that i'd probably do.
> > but i wasn't going to bother updating >=50 files if we didn't want to
> > accept the fundamental change i posted above.
> 
> That should have been part of the original submission.

i disagree.  if people don't think the default TIMEOUT should be
increased, updating 50+ files is a complete waste of my time.  i
don't like wasting my time..
-mike
  
Paul E. Murphy Jan. 19, 2016, 11:28 p.m. UTC | #5
I think the idea is reasonable.

I've been looking at a number of nptl test
failures on ppc caused by the small timeout.

This would greatly reduce the noise.

On 01/19/2016 04:50 PM, Mike Frysinger wrote:
> On 19 Jan 2016 23:45, Andreas Schwab wrote:
>> Mike Frysinger <vapier@gentoo.org> writes:
>>> in what way ?  there are really only three states:
>>>  - test doesn't set a timeout
>>>    - a passing test doesn't care if you timeout in 2 sec or 1 year
>>>    - a failing test would take longer to timeout now, but that test
>>>      should already be logged, and i don't think this should affect
>>>      the default to the detriment of the common case & valid systems
>>>  - test sets a higher timeout
>>>    - the custom timeout is still used
>>>    - behavior is unchanged
>>>  - test sets a lower timeout
>>>    - the custom timeout is still used
>>>    - behavior is unchanged
>>>
>>> now, if you mean "you should delete the #define TIMEOUT from all tests
>>> whose value is <=20", then yes, that's a cleanup that i'd probably do.
>>> but i wasn't going to bother updating >=50 files if we didn't want to
>>> accept the fundamental change i posted above.
>>
>> That should have been part of the original submission.
> 
> i disagree.  if people don't think the default TIMEOUT should be
> increased, updating 50+ files is a complete waste of my time.  i
> don't like wasting my time..
> -mike
>
  
Carlos O'Donell Jan. 20, 2016, 5:33 a.m. UTC | #6
On 01/19/2016 05:50 PM, Mike Frysinger wrote:
> On 19 Jan 2016 23:45, Andreas Schwab wrote:
>> Mike Frysinger <vapier@gentoo.org> writes:
>>> in what way ?  there are really only three states:
>>>  - test doesn't set a timeout
>>>    - a passing test doesn't care if you timeout in 2 sec or 1 year
>>>    - a failing test would take longer to timeout now, but that test
>>>      should already be logged, and i don't think this should affect
>>>      the default to the detriment of the common case & valid systems
>>>  - test sets a higher timeout
>>>    - the custom timeout is still used
>>>    - behavior is unchanged
>>>  - test sets a lower timeout
>>>    - the custom timeout is still used
>>>    - behavior is unchanged
>>>
>>> now, if you mean "you should delete the #define TIMEOUT from all tests
>>> whose value is <=20", then yes, that's a cleanup that i'd probably do.
>>> but i wasn't going to bother updating >=50 files if we didn't want to
>>> accept the fundamental change i posted above.
>>
>> That should have been part of the original submission.
> 
> i disagree.  if people don't think the default TIMEOUT should be
> increased, updating 50+ files is a complete waste of my time.  i
> don't like wasting my time..

I think updating TIMEOUT in all tests that define TIMEOUT is an orthogonal
cleanup that should not block increasing the default timeout for tests that
don't define TIMEOUT.

To put it another way: I think Mike's change to increase timeout to 20 is
fine. In practice I run with TIMEOUTFACTOR=999999 on my development system
to catch hung tests and debug them.

Cheers,
Carlos.
  
Andreas Schwab Jan. 20, 2016, 8:55 a.m. UTC | #7
Mike Frysinger <vapier@gentoo.org> writes:

> On 19 Jan 2016 23:45, Andreas Schwab wrote:
>> Mike Frysinger <vapier@gentoo.org> writes:
>> > in what way ?  there are really only three states:
>> >  - test doesn't set a timeout
>> >    - a passing test doesn't care if you timeout in 2 sec or 1 year
>> >    - a failing test would take longer to timeout now, but that test
>> >      should already be logged, and i don't think this should affect
>> >      the default to the detriment of the common case & valid systems
>> >  - test sets a higher timeout
>> >    - the custom timeout is still used
>> >    - behavior is unchanged
>> >  - test sets a lower timeout
>> >    - the custom timeout is still used
>> >    - behavior is unchanged
>> >
>> > now, if you mean "you should delete the #define TIMEOUT from all tests
>> > whose value is <=20", then yes, that's a cleanup that i'd probably do.
>> > but i wasn't going to bother updating >=50 files if we didn't want to
>> > accept the fundamental change i posted above.
>> 
>> That should have been part of the original submission.
>
> i disagree.

Without it we cannot be sure that you thought about the implications of
your patch.  That is important to know.

Andreas.
  
Chris Metcalf Jan. 20, 2016, 5:48 p.m. UTC | #8
On 01/19/2016 06:28 PM, Paul E. Murphy wrote:
> I think the idea is reasonable.
>
> I've been looking at a number of nptl test
> failures on ppc caused by the small timeout.
>
> This would greatly reduce the noise.

I also agree that a larger timeout makes sense.
  
Mike Frysinger Jan. 20, 2016, 7:35 p.m. UTC | #9
On 20 Jan 2016 09:55, Andreas Schwab wrote:
> Mike Frysinger <vapier@gentoo.org> writes:
> > On 19 Jan 2016 23:45, Andreas Schwab wrote:
> >> Mike Frysinger <vapier@gentoo.org> writes:
> >> > in what way ?  there are really only three states:
> >> >  - test doesn't set a timeout
> >> >    - a passing test doesn't care if you timeout in 2 sec or 1 year
> >> >    - a failing test would take longer to timeout now, but that test
> >> >      should already be logged, and i don't think this should affect
> >> >      the default to the detriment of the common case & valid systems
> >> >  - test sets a higher timeout
> >> >    - the custom timeout is still used
> >> >    - behavior is unchanged
> >> >  - test sets a lower timeout
> >> >    - the custom timeout is still used
> >> >    - behavior is unchanged
> >> >
> >> > now, if you mean "you should delete the #define TIMEOUT from all tests
> >> > whose value is <=20", then yes, that's a cleanup that i'd probably do.
> >> > but i wasn't going to bother updating >=50 files if we didn't want to
> >> > accept the fundamental change i posted above.
> >> 
> >> That should have been part of the original submission.
> >
> > i disagree.
> 
> Without it we cannot be sure that you thought about the implications of
> your patch.  That is important to know.

sorry, but this just sounds hand wavey.  with my changes, the test results are
the same.  as others have noted, they're already using very large timeout vars
which means in practice, we're setting the timeout to minutes/hours/longer.
-mike
  
Carlos O'Donell Jan. 20, 2016, 8:14 p.m. UTC | #10
On 01/20/2016 02:35 PM, Mike Frysinger wrote:
> On 20 Jan 2016 09:55, Andreas Schwab wrote:
>> Mike Frysinger <vapier@gentoo.org> writes:
>>> On 19 Jan 2016 23:45, Andreas Schwab wrote:
>>>> Mike Frysinger <vapier@gentoo.org> writes:
>>>>> now, if you mean "you should delete the #define TIMEOUT from all tests
>>>>> whose value is <=20", then yes, that's a cleanup that i'd probably do.
>>>>> but i wasn't going to bother updating >=50 files if we didn't want to
>>>>> accept the fundamental change i posted above.
>>>>
>>>> That should have been part of the original submission.
>>>
>>> i disagree.
>>
>> Without it we cannot be sure that you thought about the implications of
>> your patch.  That is important to know.
> 
> sorry, but this just sounds hand wavey.  with my changes, the test results are
> the same.  as others have noted, they're already using very large timeout vars
> which means in practice, we're setting the timeout to minutes/hours/longer.

I agree with Mike here. I see no technical reason why you can't raise the
default timeout. Review of existing tests that set the timeout to a specific
value is an orthogonal fix.

Cheers,
Carlos.
  
Andreas Schwab Jan. 20, 2016, 10:20 p.m. UTC | #11
It doesn't make sense for some tests to reduce the timeout from the
default.

Andreas.
  
Florian Weimer Jan. 22, 2016, 5:10 p.m. UTC | #12
On 01/19/2016 10:18 PM, Mike Frysinger wrote:
> is there a compelling reason to keep the default timeout so low ?

Some tests use TIMEOUT not just as a safety net to detect hangs, but to
adjust loop counts etc. for exercising races.

Florian
  
Mike Frysinger Jan. 22, 2016, 7:53 p.m. UTC | #13
On 22 Jan 2016 18:10, Florian Weimer wrote:
> On 01/19/2016 10:18 PM, Mike Frysinger wrote:
> > is there a compelling reason to keep the default timeout so low ?
> 
> Some tests use TIMEOUT not just as a safety net to detect hangs, but to
> adjust loop counts etc. for exercising races.

the only files i see using TIMEOUT directly:
- malloc/tst-malloc-thread-exit.c:
  - defines TIMEOUT to 7
  - runs the test for (TIMEOUT - 2) seconds
- nptl/tst-rwlock9.c
  - defines TIMEOUT to 1000000
  - uses define to set some structs
  - undefines TIMEOUT and resets it to 30 before including test-skeleton

the first test is the only one i can see where we don't want to use a
higher timeout, so i'll add a comment above the define and leave it be.
the second one looks like bogus re-use of the TIMEOUT name.

what other tests were you looking at ?
-mike
  
Florian Weimer Jan. 25, 2016, 8:03 p.m. UTC | #14
* Mike Frysinger:

> On 22 Jan 2016 18:10, Florian Weimer wrote:
>> On 01/19/2016 10:18 PM, Mike Frysinger wrote:
>> > is there a compelling reason to keep the default timeout so low ?
>> 
>> Some tests use TIMEOUT not just as a safety net to detect hangs, but to
>> adjust loop counts etc. for exercising races.
>
> the only files i see using TIMEOUT directly:
> - malloc/tst-malloc-thread-exit.c:
>   - defines TIMEOUT to 7
>   - runs the test for (TIMEOUT - 2) seconds
> - nptl/tst-rwlock9.c
>   - defines TIMEOUT to 1000000
>   - uses define to set some structs
>   - undefines TIMEOUT and resets it to 30 before including test-skeleton
>
> the first test is the only one i can see where we don't want to use a
> higher timeout, so i'll add a comment above the define and leave it be.
> the second one looks like bogus re-use of the TIMEOUT name.
>
> what other tests were you looking at ?

Some things that have been submitted (as patches or in bug reports),
but apparently not yet comitted. :)

In other words, if you think it's okay to have the occasional outlier
like tst-malloc-thread-exit, I'm fine with increasing the timeout.
  

Patch

--- a/test-skeleton.c
+++ b/test-skeleton.c
@@ -46,8 +46,9 @@ 
 #endif
 
 #ifndef TIMEOUT
-  /* Default timeout is two seconds.  */
-# define TIMEOUT 2
+  /* Default timeout is twenty seconds.  Tests should normally complete faster
+     than this, but if they don't, that's abnormal (a bug) anyways.  */
+# define TIMEOUT 20
 #endif
 
 #define OPT_DIRECT 1000