sunrpc/clnt_udp.c: always respect timeout(s)

Message ID 1464945979-10503-1-git-send-email-praiskup@redhat.com
State New, archived
Headers

Commit Message

Pavel Raiskup June 3, 2016, 9:26 a.m. UTC
  Previously, when the port we were listening on was really busy,
clntudp_call () ended up processing the UDP queue without properly
defined time-limit (clntudp_call could keep processing forever
without returning back to caller).

The fix incorporates using of clock_gettime () to measure the real
timeout, instead of simply counting the number of poll()'s.  We
should not face some huge slowdown because we call clock_gettime
only twice in most cases *and* never in busy loop.

* sunrpc/clnt_udp.c (clntudp_call): Count everything related to
timeouts in milliseconds.  Use clock_gettime ().  Still keep
checking for timeout in post-poll() time because the for(;;) loop
is torn by goto statement that requires checking for timeout
anyway.
(clntudp_call_timeouted): New helper checking whether we still
should re-send the udp request.
(timeval_to_ms): New helper macro.
---
 sunrpc/clnt_udp.c | 66 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 28 deletions(-)
  

Comments

Florian Weimer June 15, 2016, 8 a.m. UTC | #1
On 06/03/2016 11:26 AM, Pavel Raiskup wrote:

> Previously, when the port we were listening on was really busy,
> clntudp_call () ended up processing the UDP queue without properly
> defined time-limit (clntudp_call could keep processing forever
> without returning back to caller).
>
> The fix incorporates using of clock_gettime () to measure the real
> timeout, instead of simply counting the number of poll()'s.  We
> should not face some huge slowdown because we call clock_gettime
> only twice in most cases *and* never in busy loop.

Thanks.

I field a bug to track this:

   https://sourceware.org/bugzilla/show_bug.cgi?id=20257

The commit message and ChangeLog need to reference this (as “[BZ #2057]”).

> diff --git a/sunrpc/clnt_udp.c b/sunrpc/clnt_udp.c
> index 6ffa5f2..2cab4c2 100644
> --- a/sunrpc/clnt_udp.c
> +++ b/sunrpc/clnt_udp.c
> @@ -1,6 +1,8 @@
>  /*
>   * clnt_udp.c, Implements a UDP/IP based, client side RPC.
>   *
> + * Copyright (c) 2016, Free Software Foundation, Inc.
> + *
>   * Copyright (c) 2010, Oracle America, Inc.

Please do not change the copyright notice in this way.  Either add the 
full LGPL header, or leave it unchanged.

> +static bool_t
> +clntudp_call_timeouted (const struct timespec *start, int timeout, int *resend_timeout)
> +{
> +  struct timespec now;
> +  long remains, waited;
> +
> +  __clock_gettime (CLOCK_MONOTONIC, &now);

I was mistaken about the availability of CLOCK_MONOTONIC.  It seems to 
be missing on Hurd.  CLOCK_REALTIME is probably missing there as well, 
so we need fallback to gettimeofday.

We have similar code in resolv/res_send.c.  It needs to use 
CLOCK_MONOTONIC if available, but currently does not.  So this should 
really move to generic libc functionality (for separate testing, too).

I can take care of this if you want.  Unless you want to learn a bit 
about glibc symbol management.

> +  /* count everything in milliseconds */
> +  waited = (now.tv_sec  - start->tv_sec  - 1)          * 1000
> +         + (now.tv_nsec - start->tv_nsec + 1000000000) / 1000000;
> +  remains = timeout - waited;
> +  if (*resend_timeout > remains)
> +    *resend_timeout = remains;
> +  return (*resend_timeout <= 0);
> +}

This should perform a bit of overflow checking, or use unsigned long 
long at least.  It probably won't matter in practice because 2**31 
milliseconds is a long time, but let's cover this properly.

 From an interface perspective, it seems preferable to compute an 
explicit deadline, and count the milliseconds towards that, rather than 
recomputing the absolute deadline from the millisecond value.

Thanks,
Florian
  
Pavel Raiskup June 15, 2016, 12:39 p.m. UTC | #2
On Wednesday, June 15, 2016 10:00:34 AM CEST Florian Weimer wrote:
> On 06/03/2016 11:26 AM, Pavel Raiskup wrote:
>
> > Previously, when the port we were listening on was really busy,
> > clntudp_call () ended up processing the UDP queue without properly
> > defined time-limit (clntudp_call could keep processing forever
> > without returning back to caller).
> >
> > The fix incorporates using of clock_gettime () to measure the real
> > timeout, instead of simply counting the number of poll()'s.  We
> > should not face some huge slowdown because we call clock_gettime
> > only twice in most cases *and* never in busy loop.
>
> Thanks.
>
> I field a bug to track this:
>
>    https://sourceware.org/bugzilla/show_bug.cgi?id=20257
>
> The commit message and ChangeLog need to reference this (as “[BZ #2057]”).
>
> > diff --git a/sunrpc/clnt_udp.c b/sunrpc/clnt_udp.c
> > index 6ffa5f2..2cab4c2 100644
> > --- a/sunrpc/clnt_udp.c
> > +++ b/sunrpc/clnt_udp.c
> > @@ -1,6 +1,8 @@
> >  /*
> >   * clnt_udp.c, Implements a UDP/IP based, client side RPC.
> >   *
> > + * Copyright (c) 2016, Free Software Foundation, Inc.
> > + *
> >   * Copyright (c) 2010, Oracle America, Inc.
>
> Please do not change the copyright notice in this way.  Either add the
> full LGPL header, or leave it unchanged.

Those two I'm able to fix easily, however ..

> > +static bool_t
> > +clntudp_call_timeouted (const struct timespec *start, int timeout, int *resend_timeout)
> > +{
> > +  struct timespec now;
> > +  long remains, waited;
> > +
> > +  __clock_gettime (CLOCK_MONOTONIC, &now);
>
> I was mistaken about the availability of CLOCK_MONOTONIC.  It seems to
> be missing on Hurd.  CLOCK_REALTIME is probably missing there as well,
> so we need fallback to gettimeofday.
>
> We have similar code in resolv/res_send.c.  It needs to use
> CLOCK_MONOTONIC if available, but currently does not.  So this should
> really move to generic libc functionality (for separate testing, too).
>
> I can take care of this if you want.  Unless you want to learn a bit
> about glibc symbol management.

... I was afraid about portability here.  Are we talking about some new
glibc API, or just internal symbol?  Something like doing best-effort
CLOCK_MONOTONIC and falling back to gettimeofday when not possible?
Agreed here that it is better approach, and I can definitely learn about
glibc symbol management from your patches .. thanks for looking at it,
btw.

> > +  /* count everything in milliseconds */
> > +  waited = (now.tv_sec  - start->tv_sec  - 1)          * 1000
> > +         + (now.tv_nsec - start->tv_nsec + 1000000000) / 1000000;
> > +  remains = timeout - waited;
> > +  if (*resend_timeout > remains)
> > +    *resend_timeout = remains;
> > +  return (*resend_timeout <= 0);
> > +}
>
> This should perform a bit of overflow checking, or use unsigned long
> long at least.  It probably won't matter in practice because 2**31
> milliseconds is a long time, but let's cover this properly.

Ack.  Especially if it is going to be used in different scope, too.

>  From an interface perspective, it seems preferable to compute an
> explicit deadline, and count the milliseconds towards that, rather than
> recomputing the absolute deadline from the millisecond value.

I'm still not sure what you mean by deadline.  It is not time-period, but
rather point in time..  feel free to change that approach as I don't see
an issue there, too.

What I just tried to avoid is to measure the delta between subsequent
__clock_gettime() calls and then subtracting this from "remaining
milliseconds" variable.  That sounds like the delta could be measured
inaccurately, and the whole timeout could be inaccurate.  Then, we should
probably count still in "absolute" milliseconds against the deadline.

Thanks, Pavel
  

Patch

diff --git a/sunrpc/clnt_udp.c b/sunrpc/clnt_udp.c
index 6ffa5f2..2cab4c2 100644
--- a/sunrpc/clnt_udp.c
+++ b/sunrpc/clnt_udp.c
@@ -1,6 +1,8 @@ 
 /*
  * clnt_udp.c, Implements a UDP/IP based, client side RPC.
  *
+ * Copyright (c) 2016, Free Software Foundation, Inc.
+ *
  * Copyright (c) 2010, Oracle America, Inc.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -294,6 +296,24 @@  is_network_up (int sock)
   return run != NULL;
 }
 
+static bool_t
+clntudp_call_timeouted (const struct timespec *start, int timeout, int *resend_timeout)
+{
+  struct timespec now;
+  long remains, waited;
+
+  __clock_gettime (CLOCK_MONOTONIC, &now);
+  /* count everything in milliseconds */
+  waited = (now.tv_sec  - start->tv_sec  - 1)          * 1000
+         + (now.tv_nsec - start->tv_nsec + 1000000000) / 1000000;
+  remains = timeout - waited;
+  if (*resend_timeout > remains)
+    *resend_timeout = remains;
+  return (*resend_timeout <= 0);
+}
+
+#define timeval_to_ms(t) ((t).tv_sec * 1000 + (t).tv_usec / 1000)
+
 static enum clnt_stat
 clntudp_call (cl, proc, xargs, argsp, xresults, resultsp, utimeout)
      CLIENT *cl;	/* client handle */
@@ -310,28 +330,27 @@  clntudp_call (cl, proc, xargs, argsp, xresults, resultsp, utimeout)
   int inlen;
   socklen_t fromlen;
   struct pollfd fd;
-  int milliseconds = (cu->cu_wait.tv_sec * 1000) +
-    (cu->cu_wait.tv_usec / 1000);
+  int milliseconds = timeval_to_ms (cu->cu_wait);
   struct sockaddr_in from;
   struct rpc_msg reply_msg;
   XDR reply_xdrs;
-  struct timeval time_waited;
   bool_t ok;
   int nrefreshes = 2;		/* number of times to refresh cred */
-  struct timeval timeout;
+  int timeout;			/* in ms */
   int anyup;			/* any network interface up */
+  struct timespec start;
+
+  __clock_gettime (CLOCK_MONOTONIC, &start);
 
   if (cu->cu_total.tv_usec == -1)
-    {
-      timeout = utimeout;	/* use supplied timeout */
-    }
+    timeout = timeval_to_ms (utimeout); /* use supplied timeout */
   else
-    {
-      timeout = cu->cu_total;	/* use default timeout */
-    }
+    timeout = timeval_to_ms (cu->cu_total); /* use default timeout */
+
+  if (timeout < milliseconds)
+    /* probably user's mistake, but no reason to misbehave */
+    milliseconds = timeout;
 
-  time_waited.tv_sec = 0;
-  time_waited.tv_usec = 0;
 call_again:
   xdrs = &(cu->cu_outxdrs);
   if (xargs == NULL)
@@ -360,7 +379,7 @@  send_again:
   /*
    * Hack to provide rpc-based message passing
    */
-  if (timeout.tv_sec == 0 && timeout.tv_usec == 0)
+  if (timeout == 0)
     {
       return (cu->cu_error.re_status = RPC_TIMEDOUT);
     }
@@ -389,29 +408,20 @@  send_again:
 		return (cu->cu_error.re_status = RPC_CANTRECV);
 	    }
 
-	  time_waited.tv_sec += cu->cu_wait.tv_sec;
-	  time_waited.tv_usec += cu->cu_wait.tv_usec;
-	  while (time_waited.tv_usec >= 1000000)
-	    {
-	      time_waited.tv_sec++;
-	      time_waited.tv_usec -= 1000000;
-	    }
-	  if ((time_waited.tv_sec < timeout.tv_sec) ||
-	      ((time_waited.tv_sec == timeout.tv_sec) &&
-	       (time_waited.tv_usec < timeout.tv_usec)))
-	    goto send_again;
+          if (!clntudp_call_timeouted (&start, timeout, &milliseconds))
+            goto send_again;
 	  return (cu->cu_error.re_status = RPC_TIMEDOUT);
 
-	  /*
-	   * buggy in other cases because time_waited is not being
-	   * updated.
-	   */
 	case -1:
 	  if (errno == EINTR)
 	    continue;
 	  cu->cu_error.re_errno = errno;
 	  return (cu->cu_error.re_status = RPC_CANTRECV);
 	}
+
+      if (clntudp_call_timeouted (&start, timeout, &milliseconds))
+        return (cu->cu_error.re_status = RPC_TIMEDOUT);
+
 #ifdef IP_RECVERR
       if (fd.revents & POLLERR)
 	{