[RFC] sunrpc: Properly cleanup if tst-udp-timeout fails

Message ID 20200205210517.21850-1-msc@linux.ibm.com
State Superseded
Delegated to: Florian Weimer
Headers

Commit Message

Matheus Castanho Feb. 5, 2020, 9:05 p.m. UTC
  Hi,

I recently noted some orphan processes left on a host after failed
sunrpc/tst-udp-timeout.c runs.  More info on the commit message.

Even though the test infra has support for a cleanup_function, it
is only called when the test times out.  The same happens for the code
to kill the whole process group (both on the signal handler in
support/support_test_main.c).  So I ended up adding a cleanup function
to be called right before exit() to terminate the child.

I also considered using a function with __attribute__((destructor))
to perform the cleanup, but adding a new TEST_VERIFY_* macro seemed
to be more reusable by other tests if needed.

Is this the best way to handle this issue?

----8<----

The macro TEST_VERIFY_EXIT is used several times on
sunrpc/tst-udp-timeout to exit the test if a condition evaluates to
false. The side effect is that the code to terminate the RPC server
process is not executed when the program calls exit(), so that
sub-process stays alive.

This commit adds a new TEST_VERIFY_CLEANUP macro to allow specifying
a cleanup function to be called if the check fails. TEST_VERIFY_EXIT
calls in tst-udp-timeout that may leave the orphan process behind
are replaced by this macro so a cleanup function is run before
exiting to guarantee the server is properly terminated in case of
a test failure.
---
 sunrpc/tst-udp-timeout.c | 29 ++++++++++++++++++++---------
 support/check.h          | 12 ++++++++++++
 2 files changed, 32 insertions(+), 9 deletions(-)

--
2.21.1
  

Comments

Lucas A. M. Magalhaes Feb. 7, 2020, 11:55 a.m. UTC | #1
Seems good to me.

Lucas A. M. Magalhaes
  
Florian Weimer Feb. 7, 2020, 8:31 p.m. UTC | #2
* Matheus Castanho:

> I recently noted some orphan processes left on a host after failed
> sunrpc/tst-udp-timeout.c runs.  More info on the commit message.
>
> Even though the test infra has support for a cleanup_function, it
> is only called when the test times out.  The same happens for the code
> to kill the whole process group (both on the signal handler in
> support/support_test_main.c).  So I ended up adding a cleanup function
> to be called right before exit() to terminate the child.
>
> I also considered using a function with __attribute__((destructor))
> to perform the cleanup, but adding a new TEST_VERIFY_* macro seemed
> to be more reusable by other tests if needed.
>
> Is this the best way to handle this issue?

I'm not sure.  I think other processes have the same issue, and having
a generic solution would be nice.

We could perhaps add a fork wrapper that uses prctl with
PR_SET_PDEATHSIG, at least on Linux.  It's not possible to transfer
the cleanup to the outmost monitor process because the test failure is
only noticed after waitpid returns, at which point the process group
no longer exists.  Using a PID namespace would probably work better,
but it requires privileges (either root or at last user namespace
creation, as usual).

For the concrete patch, I think TEST_VERIFY_CLEANUP is confusing
because it does not perform the cleanup in both cases, and also the
exit function call is now obscured.  It may be better to replace
TEST_VERIFY_EXIT with TEST_VERIFY.  Or isolate the client in a
subprocess, too, so that the kill/waitpid sequence for the server
subprocess runs even if the client process failed (but that makes it
harder to debug the client code).  Just installing an atexit handler
may also be an option and acceptable in this case.

Thanks,
Florian
  

Patch

diff --git a/sunrpc/tst-udp-timeout.c b/sunrpc/tst-udp-timeout.c
index 8d45365b23..a936afd56c 100644
--- a/sunrpc/tst-udp-timeout.c
+++ b/sunrpc/tst-udp-timeout.c
@@ -30,6 +30,8 @@ 
 #include <time.h>
 #include <unistd.h>

+pid_t pid;
+
 /* Test data serialization and deserialization.   */

 struct test_query
@@ -66,6 +68,15 @@  xdr_test_response (XDR *xdrs, void *data, ...)
     && xdr_uint32_t (xdrs, &p->sum);
 }

+/* Cleanup function to be called in case of failure.  */
+static void
+cleanup (void)
+{
+  /* Kill the child process running the RPC server.  */
+  kill(pid, SIGTERM);
+  exit(1);
+}
+
 /* Implementation of the test server.  */

 enum
@@ -188,11 +199,11 @@  test_call (CLIENT *clnt, int proc, struct test_query query,
             proc, (unsigned long) timeout.tv_sec,
             (unsigned long) timeout.tv_usec);
   struct test_response response;
-  TEST_VERIFY_EXIT (clnt_call (clnt, proc,
+  TEST_VERIFY_CLEANUP (clnt_call (clnt, proc,
                                xdr_test_query, (void *) &query,
                                xdr_test_response, (void *) &response,
                                timeout)
-                    == RPC_SUCCESS);
+                    == RPC_SUCCESS, cleanup ());
   return response;
 }

@@ -218,11 +229,11 @@  test_call_flush (CLIENT *clnt)
      requested via the timeout_ms field.  */
   if (test_verbose)
     printf ("info: flushing pending queries\n");
-  TEST_VERIFY_EXIT (clnt_call (clnt, PROC_RESET_SEQ,
+  TEST_VERIFY_CLEANUP (clnt_call (clnt, PROC_RESET_SEQ,
                                (xdrproc_t) xdr_void, NULL,
                                (xdrproc_t) xdr_void, NULL,
                                ((struct timeval) { 5, 0 }))
-                    == RPC_SUCCESS);
+                    == RPC_SUCCESS, cleanup ());
 }

 /* Return the number seconds since an arbitrary point in time.  */
@@ -236,7 +247,7 @@  get_ticks (void)
   }
   {
     struct timeval tv;
-    TEST_VERIFY_EXIT (gettimeofday (&tv, NULL) == 0);
+    TEST_VERIFY_CLEANUP (gettimeofday (&tv, NULL) == 0, cleanup ());
     return tv.tv_sec + tv.tv_usec * 1e-6;
   }
 }
@@ -259,7 +270,7 @@  test_udp_server (int port)
   CLIENT *clnt = clntudp_create (&sin, PROGNUM, VERSNUM,
                                  (struct timeval) { 1, 500 * 1000 },
                                  &sock);
-  TEST_VERIFY_EXIT (clnt != NULL);
+  TEST_VERIFY_CLEANUP (clnt != NULL, cleanup ());

   /* Basic call/response test.  */
   struct test_response response = test_call
@@ -363,11 +374,11 @@  test_udp_server (int port)
       test_call_flush (clnt);
     }

-  TEST_VERIFY_EXIT (clnt_call (clnt, PROC_EXIT,
+  TEST_VERIFY_CLEANUP (clnt_call (clnt, PROC_EXIT,
                                (xdrproc_t) xdr_void, NULL,
                                (xdrproc_t) xdr_void, NULL,
                                ((struct timeval) { 5, 0 }))
-                    == RPC_SUCCESS);
+                    == RPC_SUCCESS, cleanup ());
   clnt_destroy (clnt);
 }

@@ -381,7 +392,7 @@  do_test (void)
   TEST_VERIFY_EXIT (transport != NULL);
   TEST_VERIFY (svc_register (transport, PROGNUM, VERSNUM, server_dispatch, 0));

-  pid_t pid = xfork ();
+  pid = xfork ();
   if (pid == 0)
     {
       svc_run ();
diff --git a/support/check.h b/support/check.h
index 77d1d1e14d..5f090dae3e 100644
--- a/support/check.h
+++ b/support/check.h
@@ -64,6 +64,18 @@  __BEGIN_DECLS
         (1, __FILE__, __LINE__, #expr);                         \
   })

+/* Record a test failure and run CLEANUP if EXPR evaluates to false.  */
+#define TEST_VERIFY_CLEANUP(expr, cleanup)                      \
+  ({                                                            \
+    if (expr)                                                   \
+      ;                                                         \
+    else                                                        \
+      {                                                         \
+        support_test_verify_impl (__FILE__, __LINE__, #expr);   \
+        cleanup;                                                \
+      }                                                         \
+  })
+


 int support_print_failure_impl (const char *file, int line,