From patchwork Mon Jun 23 00:25:34 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 1653 Received: (qmail 24747 invoked by alias); 23 Jun 2014 00:25:48 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 24736 invoked by uid 89); 23 Jun 2014 00:25:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.4 required=5.0 tests=AWL, BAYES_50 autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Date: Mon, 23 Jun 2014 01:25:34 +0100 From: "Maciej W. Rozycki" To: CC: =?ISO-8859-2?Q?Ond=F8ej_B=EDlka?= , Roland McGrath Subject: [PATCH v2] test-skeleton: Kill any child process's offspring In-Reply-To: <20140210171709.GA27843@domone> Message-ID: References: <20131011163332.GA20086@domone.podge> <20140210171709.GA27843@domone> User-Agent: Alpine 1.10 (DEB 962 2008-03-14) MIME-Version: 1.0 On Mon, 10 Feb 2014, Ondřej Bílka wrote: > friendly ping > On Fri, Oct 11, 2013 at 06:33:32PM +0200, Ondřej Bílka wrote: > > On Mon, Sep 23, 2013 at 09:24:36PM +0100, Maciej W. Rozycki wrote: > > > Hi, > > > > > > Some of the test cases driven by our test-skeleton create subprocesses. > > > In a test scenario involving what looks to me like a Linux kernel bug I > > > have observed stray processes of this kind left behind where their parent > > > was killed by test-skeleton after a timeout however the process in > > > question was not. This leads to a clutter in the environment our test > > > suite is run within and on weaker systems may also affect test results > > > where the processing power consumed by such a leftover causes the lack of > > > same for subsequent test cases and consequently timeouts. > > > > > > The cause of leaving such processes behind is signal_handler that calls > > > kill to send SIGKILL to test-skeleton's immediate child only and not any > > > further descendants. Conveniently as the child starts it places itself in > > > a separate process group that is then inherited by any further children. > > > Therefore a simple if not obvious fix to this problem is to send SIGKILL > > > to the child's process group instead which this change implements. There > > > is no need to wait for any additional descendants left as the exit status > > > of the immediate child is enough for our purposes and init(8) will take > > > care of the rest in the unlikely case this change addresses. > > > > > > The problem itself was observed with nptl/tst-mutexpi9 and that this fix > > > makes go away -- no stray processes after the test suite has terminated > > > anymore (nptl/tst-mutexpi9 has its own problem of course on this target, > > > but that's another matter; I think there's a value in making the test > > > suite itself more robust). > > > > > > OK to apply? > > > > > What is status of this patch? Here's a version updated according to concerns expressed. Code now checks for an error from setpgid and in the unlikely event of one it retains the current behaviour and kills itself only rather than the process group. The issue with nptl/tst-mutexpi9 has since gone for one reason or another, so this has been only lightly tested, making sure that another test case observed to time out kills itself. OK to apply? 2014-06-23 Maciej W. Rozycki * test-skeleton.c (signal_handler): Undo any PID negation made after setpgid. (main): Negate the PID if setpgid succeeded, otherwise report an error. Maciej glibc-test-skeleton-kill-pgid.diff Index: glibc-fsf-trunk-quilt/test-skeleton.c =================================================================== --- glibc-fsf-trunk-quilt.orig/test-skeleton.c 2014-06-17 16:03:16.621929397 +0100 +++ glibc-fsf-trunk-quilt/test-skeleton.c 2014-06-21 00:03:23.141622346 +0100 @@ -138,8 +138,9 @@ signal_handler (int sig __attribute__ (( int killed; int status; - /* Send signal. */ + /* Send signal. Undo any PID negation made after setpgid. */ kill (pid, SIGKILL); + pid = abs (pid); /* Wait for it to terminate. */ int i; @@ -341,8 +342,14 @@ main (int argc, char *argv[]) #endif /* We put the test process in its own pgrp so that if it bogusly - generates any job control signals, they won't hit the whole build. */ - setpgid (0, 0); + generates any job control signals, they won't hit the whole build. + Negate the PID if successful so that kill targets the PGID, to + reach any child process's offspring. */ + status = setpgid (0, 0); + if (status == 0) + pid = -pid; + else + printf ("Failed to set the process group ID: %m\n"); /* Execute the test function and exit with the return value. */ exit (TEST_FUNCTION);