[v2] test-skeleton: Kill any child process's offspring

Message ID alpine.DEB.1.10.1406230100010.25395@tp.orcam.me.uk
State Superseded
Headers

Commit Message

Maciej W. Rozycki June 23, 2014, 12:25 a.m. UTC
  On Mon, 10 Feb 2014, Ondřej Bílka wrote:

> friendly ping
> On Fri, Oct 11, 2013 at 06:33:32PM +0200, Ondřej Bílka wrote:
> > On Mon, Sep 23, 2013 at 09:24:36PM +0100, Maciej W. Rozycki wrote:
> > > Hi,
> > > 
> > >  Some of the test cases driven by our test-skeleton create subprocesses.  
> > > In a test scenario involving what looks to me like a Linux kernel bug I 
> > > have observed stray processes of this kind left behind where their parent 
> > > was killed by test-skeleton after a timeout however the process in 
> > > question was not.  This leads to a clutter in the environment our test 
> > > suite is run within and on weaker systems may also affect test results 
> > > where the processing power consumed by such a leftover causes the lack of 
> > > same for subsequent test cases and consequently timeouts.
> > > 
> > >  The cause of leaving such processes behind is signal_handler that calls 
> > > kill to send SIGKILL to test-skeleton's immediate child only and not any 
> > > further descendants.  Conveniently as the child starts it places itself in 
> > > a separate process group that is then inherited by any further children.  
> > > Therefore a simple if not obvious fix to this problem is to send SIGKILL 
> > > to the child's process group instead which this change implements.  There 
> > > is no need to wait for any additional descendants left as the exit status 
> > > of the immediate child is enough for our purposes and init(8) will take 
> > > care of the rest in the unlikely case this change addresses.
> > > 
> > >  The problem itself was observed with nptl/tst-mutexpi9 and that this fix 
> > > makes go away -- no stray processes after the test suite has terminated 
> > > anymore (nptl/tst-mutexpi9 has its own problem of course on this target, 
> > > but that's another matter; I think there's a value in making the test 
> > > suite itself more robust).
> > > 
> > >  OK to apply?
> > > 
> > What is status of this patch?

 Here's a version updated according to concerns expressed.  Code now 
checks for an error from setpgid and in the unlikely event of one it 
retains the current behaviour and kills itself only rather than the 
process group.

 The issue with nptl/tst-mutexpi9 has since gone for one reason or 
another, so this has been only lightly tested, making sure that another 
test case observed to time out kills itself.

 OK to apply?

2014-06-23  Maciej W. Rozycki  <macro@codesourcery.com>

	* test-skeleton.c (signal_handler): Undo any PID negation made 
	after setpgid.
	(main): Negate the PID if setpgid succeeded, otherwise report an
	error.

  Maciej

glibc-test-skeleton-kill-pgid.diff
  

Comments

Roland McGrath June 23, 2014, 5:34 p.m. UTC | #1
I don't know what you did to test that change.  I can't see how it could
possibly work as intended.  You've changed the value of PID in the child,
but it's only actually used in the parent (where it won't have changed).
Am I missing something?

Without more futzing than is worthwhile, there is no way that the child can
communicate back to the parent whether setpgid succeeded or failed.  The
error message is nice to have, but I don't think the parent actually needs
to care whether setpgid worked or not.  If it didn't, there won't be any
pgrp with pgid -pid.

So I think all you really need is to make the handler do:

  assert (pid > 1);
  /* Kill the whole process group.  */
  kill (-pid, SIGKILL);
  /* In case, setpgid failed in the child, kill it individually too.  */
  kill (pid, SIGKILL);


Thanks,
Roland
  

Patch

Index: glibc-fsf-trunk-quilt/test-skeleton.c
===================================================================
--- glibc-fsf-trunk-quilt.orig/test-skeleton.c	2014-06-17 16:03:16.621929397 +0100
+++ glibc-fsf-trunk-quilt/test-skeleton.c	2014-06-21 00:03:23.141622346 +0100
@@ -138,8 +138,9 @@  signal_handler (int sig __attribute__ ((
   int killed;
   int status;
 
-  /* Send signal.  */
+  /* Send signal.  Undo any PID negation made after setpgid.  */
   kill (pid, SIGKILL);
+  pid = abs (pid);
 
   /* Wait for it to terminate.  */
   int i;
@@ -341,8 +342,14 @@  main (int argc, char *argv[])
 #endif
 
       /* We put the test process in its own pgrp so that if it bogusly
-	 generates any job control signals, they won't hit the whole build.  */
-      setpgid (0, 0);
+	 generates any job control signals, they won't hit the whole build.
+	 Negate the PID if successful so that kill targets the PGID, to
+	 reach any child process's offspring.  */
+      status = setpgid (0, 0);
+      if (status == 0)
+	pid = -pid;
+      else
+	printf ("Failed to set the process group ID: %m\n");
 
       /* Execute the test function and exit with the return value.   */
       exit (TEST_FUNCTION);