diff mbox

Fix time/tst-cpuclock1 intermitent failures

Message ID 20200128170229.11180-1-lamm@linux.ibm.com
State Superseded
Delegated to: Carlos O'Donell
Headers show

Commit Message

Lucas A. M. Magalhaes Jan. 28, 2020, 5:02 p.m. UTC
This test fails intermittently in systems with heavy load as
CLOCK_PROCESS_CPUTIME_ID is subject to scheduler pressure.  Thus the
test boundaries where relaxed to keep it from fail on this systems.

--

Hi,

I tried to implement the solution suggested by Adhemerval and it worked
fine on my tests.

The curious thing I had much more problems with nanosleep returning a
lot earlier then expected.  On the 1s sleep it many times returned in
0.2s for a heavy loaded system.

 time/tst-cpuclock1.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Carlos O'Donell Jan. 29, 2020, 9:54 p.m. UTC | #1
On 1/28/20 12:02 PM, Lucas A. M. Magalhaes wrote:
> This test fails intermittently in systems with heavy load as
> CLOCK_PROCESS_CPUTIME_ID is subject to scheduler pressure.  Thus the
> test boundaries where relaxed to keep it from fail on this systems.
> 
> --
> 
> Hi,
> 
> I tried to implement the solution suggested by Adhemerval and it worked
> fine on my tests.
> 
> The curious thing I had much more problems with nanosleep returning a
> lot earlier then expected.  On the 1s sleep it many times returned in
> 0.2s for a heavy loaded system.

(a) Idea.

I like the idea behind this patch because it makes the test more robust
while still keeping the intent of the test which is to catch gross mistakes.

At the high level I think we need a slight refactoring and we can fix two
tests with one change.

We see the same failures from rt/tst-cpuclock2.c and I'm wondering if we
couldn't refactor a function into support/ to help with both tests?
Like percent_diff_check(time1, time2, percent_difference_allowed)?

Then both tests can be adjusted to just use the new function.

We could also do with a time_sub() function to compute time1 - time2, and
time_add() function to compute time1 + time2.

(b) Implementation.

You adjust the places which need adjusting for time difference calculation
and that's correct.

(c) Details.

There are a few nit-picky

>  time/tst-cpuclock1.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
OK.

> diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
> index 0120906f23..4968038207 100644
> --- a/time/tst-cpuclock1.c
> +++ b/time/tst-cpuclock1.c
> @@ -157,14 +157,17 @@ do_test (void)
>  
>    struct timespec diff = { .tv_sec = after.tv_sec - before.tv_sec,
>  			   .tv_nsec = after.tv_nsec - before.tv_nsec };
> +  /* In ideal scheduler pressure this diff should be closer to 0.5s.  But in
> +     a heavy loaded system the scheduler pressure can make this times to be
> +     uncertain.  That's why the upper bound is 0.7s and there is no lower bound
> +   */

Suggest:

/* Under ideal scheduler pressure this difference should be closer to 0.5s.
   Under heavy load the scheduler pressure makes the timing uncertain.
   Given the uncertainty we set the upper bound to 0.7s and omit the lower bound.  */

Note:
- Trailing comment close follows on same line after 2 spaces.


>    if (diff.tv_nsec < 0)
>      {
>        --diff.tv_sec;
>        diff.tv_nsec += 1000000000;
>      }
>    if (diff.tv_sec != 0
> -      || diff.tv_nsec > 600000000
> -      || diff.tv_nsec < 100000000)
> +      || diff.tv_nsec > 700000000)

Could we rewrite this as a percent difference function check?

% diff = | e1 - e2 | / (0.5 * (e1 + e2)) * 100

Then subtract 0.5s from the "after" time.

Then do a percent_diff_check (time1, time2, 10%)

And consider them equal within 10% or 5% or whatever.

At least this way we quantify the jitter as a percentage.

>      {
>        printf ("before - after %ju.%.9ju outside reasonable range\n",
>  	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
> @@ -196,13 +199,15 @@ do_test (void)
>  	{
>  	  struct timespec d = { .tv_sec = afterns.tv_sec - after.tv_sec,
>  				.tv_nsec = afterns.tv_nsec - after.tv_nsec };
> +	  /* scheduler pressure may affect sleep time so this test have relaxed
> +	     time restrictions.  */
>  	  if (d.tv_nsec < 0)
>  	    {
>  	      --d.tv_sec;
>  	      d.tv_nsec += 1000000000;
>  	    }
>  	  if (d.tv_sec > 0
> -	      || d.tv_nsec < sleeptime.tv_nsec
> +	      || d.tv_nsec < 100000000
>  	      || d.tv_nsec > sleeptime.tv_nsec * 2)

Likewise this also becomes a percent difference test.

>  	    {
>  	      printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
>
Lucas A. M. Magalhaes Jan. 30, 2020, 1:12 p.m. UTC | #2
Quoting Carlos O'Donell (2020-01-29 18:54:19)
> On 1/28/20 12:02 PM, Lucas A. M. Magalhaes wrote:
> > This test fails intermittently in systems with heavy load as
> > CLOCK_PROCESS_CPUTIME_ID is subject to scheduler pressure.  Thus the
> > test boundaries where relaxed to keep it from fail on this systems.
> > 
> > --
> > 
> > Hi,
> > 
> > I tried to implement the solution suggested by Adhemerval and it worked
> > fine on my tests.
> > 
> > The curious thing I had much more problems with nanosleep returning a
> > lot earlier then expected.  On the 1s sleep it many times returned in
> > 0.2s for a heavy loaded system.
> 
> (a) Idea.
> 
> I like the idea behind this patch because it makes the test more robust
> while still keeping the intent of the test which is to catch gross mistakes.
> 
> At the high level I think we need a slight refactoring and we can fix two
> tests with one change.
> 
> We see the same failures from rt/tst-cpuclock2.c and I'm wondering if we
> couldn't refactor a function into support/ to help with both tests?
> Like percent_diff_check(time1, time2, percent_difference_allowed)?
> 

Thanks Carlos for this review.  I also like the idea of this refactor and will
work on this.

> Then both tests can be adjusted to just use the new function.
> 
> We could also do with a time_sub() function to compute time1 - time2, and
> time_add() function to compute time1 + time2.
> 
> (b) Implementation.
> 
> You adjust the places which need adjusting for time difference calculation
> and that's correct.
> 
> (c) Details.
> 
> There are a few nit-picky
> 
> >  time/tst-cpuclock1.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> OK.
> 
> > diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
> > index 0120906f23..4968038207 100644
> > --- a/time/tst-cpuclock1.c
> > +++ b/time/tst-cpuclock1.c
> > @@ -157,14 +157,17 @@ do_test (void)
> >  
> >    struct timespec diff = { .tv_sec = after.tv_sec - before.tv_sec,
> >                          .tv_nsec = after.tv_nsec - before.tv_nsec };
> > +  /* In ideal scheduler pressure this diff should be closer to 0.5s.  But in
> > +     a heavy loaded system the scheduler pressure can make this times to be
> > +     uncertain.  That's why the upper bound is 0.7s and there is no lower bound
> > +   */
> 
> Suggest:
> 
> /* Under ideal scheduler pressure this difference should be closer to 0.5s.
>    Under heavy load the scheduler pressure makes the timing uncertain.
>    Given the uncertainty we set the upper bound to 0.7s and omit the lower bound.  */
> 
> Note:
> - Trailing comment close follows on same line after 2 spaces.
> 
> 

Ok, I will rewrite this. It will need another description after the refactor.

> >    if (diff.tv_nsec < 0)
> >      {
> >        --diff.tv_sec;
> >        diff.tv_nsec += 1000000000;
> >      }
> >    if (diff.tv_sec != 0
> > -      || diff.tv_nsec > 600000000
> > -      || diff.tv_nsec < 100000000)
> > +      || diff.tv_nsec > 700000000)
> 
> Could we rewrite this as a percent difference function check?
> 
> % diff = | e1 - e2 | / (0.5 * (e1 + e2)) * 100
> 
> Then subtract 0.5s from the "after" time.
> 
> Then do a percent_diff_check (time1, time2, 10%)
> 
> And consider them equal within 10% or 5% or whatever.
> 
> At least this way we quantify the jitter as a percentage.
> 

Yes. I agree.
diff mbox

Patch

diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index 0120906f23..4968038207 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -157,14 +157,17 @@  do_test (void)
 
   struct timespec diff = { .tv_sec = after.tv_sec - before.tv_sec,
 			   .tv_nsec = after.tv_nsec - before.tv_nsec };
+  /* In ideal scheduler pressure this diff should be closer to 0.5s.  But in
+     a heavy loaded system the scheduler pressure can make this times to be
+     uncertain.  That's why the upper bound is 0.7s and there is no lower bound
+   */
   if (diff.tv_nsec < 0)
     {
       --diff.tv_sec;
       diff.tv_nsec += 1000000000;
     }
   if (diff.tv_sec != 0
-      || diff.tv_nsec > 600000000
-      || diff.tv_nsec < 100000000)
+      || diff.tv_nsec > 700000000)
     {
       printf ("before - after %ju.%.9ju outside reasonable range\n",
 	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
@@ -196,13 +199,15 @@  do_test (void)
 	{
 	  struct timespec d = { .tv_sec = afterns.tv_sec - after.tv_sec,
 				.tv_nsec = afterns.tv_nsec - after.tv_nsec };
+	  /* scheduler pressure may affect sleep time so this test have relaxed
+	     time restrictions.  */
 	  if (d.tv_nsec < 0)
 	    {
 	      --d.tv_sec;
 	      d.tv_nsec += 1000000000;
 	    }
 	  if (d.tv_sec > 0
-	      || d.tv_nsec < sleeptime.tv_nsec
+	      || d.tv_nsec < 100000000
 	      || d.tv_nsec > sleeptime.tv_nsec * 2)
 	    {
 	      printf ("nanosleep time %ju.%.9ju outside reasonable range\n",