support: Kill process group for test failure

Message ID 20200220143406.4768-1-adhemerval.zanella@linaro.org
State Rejected
Delegated to: Carlos O'Donell
Headers

Commit Message

Adhemerval Zanella Netto Feb. 20, 2020, 2:34 p.m. UTC
  Some testcases that create multiple subprocesses might abort or exit
prior waiting for their children.  In such case, support_test_main
does not try to kill the spawned test process group (as in the
test timeout case).

On example that we are observing in internal tests is when
malloc/tst-mallocfork2 fails to fork in the signal handling (due
either maximum number of process or other non expected failure).

This patch kill the process group in the case of failed execution,
similar on how it is done on timeout.

Checked on x86_64-linux-gnu.
---
 support/support_test_main.c | 7 +++++++
 1 file changed, 7 insertions(+)
  

Comments

Carlos O'Donell Feb. 20, 2020, 2:53 p.m. UTC | #1
On 2/20/20 9:34 AM, Adhemerval Zanella wrote:
> Some testcases that create multiple subprocesses might abort or exit
> prior waiting for their children.  In such case, support_test_main
> does not try to kill the spawned test process group (as in the
> test timeout case).
> 
> On example that we are observing in internal tests is when
> malloc/tst-mallocfork2 fails to fork in the signal handling (due
> either maximum number of process or other non expected failure).
> 
> This patch kill the process group in the case of failed execution,
> similar on how it is done on timeout.

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
 
> Checked on x86_64-linux-gnu.
> ---
>  support/support_test_main.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/support/support_test_main.c b/support/support_test_main.c
> index e3f0bf15f2..ac9f710fb7 100644
> --- a/support/support_test_main.c
> +++ b/support/support_test_main.c
> @@ -459,6 +459,9 @@ support_test_main (int argc, char **argv, const struct test_config *config)
>    /* Process terminated normaly without timeout etc.  */
>    if (WIFEXITED (status))
>      {
> +      /* It is expected that a successful test execution handles all its
> +	 children.  */

OK.

> +
>        if (config->expected_status == 0)
>          {
>            if (config->expected_signal == 0)
> @@ -486,6 +489,10 @@ support_test_main (int argc, char **argv, const struct test_config *config)
>    /* Process was killed by timer or other signal.  */
>    else
>      {
> +      /* Kill the whole process group if test process aborts or exits prior
> +	 waiting for them.  */
> +      kill (-test_pid, SIGKILL);

OK. Send negation of process group id to kill the whole process group. The
process group id is the same as the id of the process that created the group
so test_pid is the right value.

Notes:
- Should we be using test_pgid to make this clear?
- Should this cleanup be refactored a bit to avoid duplication from
  signal_handler() and support_test_main() e.g. kill_process_group ()
  which runs kill looks for errors prints diagnostic etc.

> +
>        if (config->expected_signal == 0)
>          {
>            printf ("Didn't expect signal from child: got `%s'\n",
>
  
Florian Weimer Feb. 20, 2020, 3:04 p.m. UTC | #2
* Adhemerval Zanella:

> Some testcases that create multiple subprocesses might abort or exit
> prior waiting for their children.  In such case, support_test_main
> does not try to kill the spawned test process group (as in the
> test timeout case).

Does this actually work?  Is the process group preserved if a process is
reparented to init?

Thanks,
Florian
  
Carlos O'Donell Feb. 20, 2020, 3:12 p.m. UTC | #3
On Thu, Feb 20, 2020 at 10:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Adhemerval Zanella:
>
> > Some testcases that create multiple subprocesses might abort or exit
> > prior waiting for their children.  In such case, support_test_main
> > does not try to kill the spawned test process group (as in the
> > test timeout case).
>
> Does this actually work?  Is the process group preserved if a process is
> reparented to init?

No, you are right.

There is a race too which I didn't see.

Once you waitpid the pid and pgid might be free for reuse and we can't
guarantee this will work.

Cheers,
Carlos.
  
Carlos O'Donell Feb. 20, 2020, 3:14 p.m. UTC | #4
On Thu, Feb 20, 2020 at 9:53 AM Carlos O'Donell <codonell@redhat.com> wrote:
>
> On 2/20/20 9:34 AM, Adhemerval Zanella wrote:
> > Some testcases that create multiple subprocesses might abort or exit
> > prior waiting for their children.  In such case, support_test_main
> > does not try to kill the spawned test process group (as in the
> > test timeout case).
> >
> > On example that we are observing in internal tests is when
> > malloc/tst-mallocfork2 fails to fork in the signal handling (due
> > either maximum number of process or other non expected failure).
> >
> > This patch kill the process group in the case of failed execution,
> > similar on how it is done on timeout.
>
> LGTM.
>
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>

I'm withdrawing my reviewed-by here, since there is a race.

Florian highlighted that the children are all going to be reparented
to init and that therefore we can't catch them anymore.

The only plausible solution here is to use the controlling terminal to
kill the orphan children.

> > Checked on x86_64-linux-gnu.
> > ---
> >  support/support_test_main.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/support/support_test_main.c b/support/support_test_main.c
> > index e3f0bf15f2..ac9f710fb7 100644
> > --- a/support/support_test_main.c
> > +++ b/support/support_test_main.c
> > @@ -459,6 +459,9 @@ support_test_main (int argc, char **argv, const struct test_config *config)
> >    /* Process terminated normaly without timeout etc.  */
> >    if (WIFEXITED (status))
> >      {
> > +      /* It is expected that a successful test execution handles all its
> > +      children.  */
>
> OK.
>
> > +
> >        if (config->expected_status == 0)
> >          {
> >            if (config->expected_signal == 0)
> > @@ -486,6 +489,10 @@ support_test_main (int argc, char **argv, const struct test_config *config)
> >    /* Process was killed by timer or other signal.  */
> >    else
> >      {
> > +      /* Kill the whole process group if test process aborts or exits prior
> > +      waiting for them.  */
> > +      kill (-test_pid, SIGKILL);
>
> OK. Send negation of process group id to kill the whole process group. The
> process group id is the same as the id of the process that created the group
> so test_pid is the right value.
>
> Notes:
> - Should we be using test_pgid to make this clear?
> - Should this cleanup be refactored a bit to avoid duplication from
>   signal_handler() and support_test_main() e.g. kill_process_group ()
>   which runs kill looks for errors prints diagnostic etc.
>
> > +
> >        if (config->expected_signal == 0)
> >          {
> >            printf ("Didn't expect signal from child: got `%s'\n",
> >
>
>
> --
> Cheers,
> Carlos.
  

Patch

diff --git a/support/support_test_main.c b/support/support_test_main.c
index e3f0bf15f2..ac9f710fb7 100644
--- a/support/support_test_main.c
+++ b/support/support_test_main.c
@@ -459,6 +459,9 @@  support_test_main (int argc, char **argv, const struct test_config *config)
   /* Process terminated normaly without timeout etc.  */
   if (WIFEXITED (status))
     {
+      /* It is expected that a successful test execution handles all its
+	 children.  */
+
       if (config->expected_status == 0)
         {
           if (config->expected_signal == 0)
@@ -486,6 +489,10 @@  support_test_main (int argc, char **argv, const struct test_config *config)
   /* Process was killed by timer or other signal.  */
   else
     {
+      /* Kill the whole process group if test process aborts or exits prior
+	 waiting for them.  */
+      kill (-test_pid, SIGKILL);
+
       if (config->expected_signal == 0)
         {
           printf ("Didn't expect signal from child: got `%s'\n",