Remove signal handling for nanosleep (bug 16364)

Message ID 5640A89D.80804@linaro.org
State Superseded
Headers

Commit Message

Adhemerval Zanella Nov. 9, 2015, 2:07 p.m. UTC
  On 09-11-2015 10:45, Florian Weimer wrote:
> On 11/09/2015 01:18 PM, Adhemerval Zanella wrote:
>>
>>
>> On 09-11-2015 05:53, Florian Weimer wrote:
>>> On 11/08/2015 11:52 PM, Adhemerval Zanella wrote:
>>>> Linux 2.6.32 and forward do not show the issue regarding SysV SIGCHLD
>>>> vs. SIG_IGN for nanosleep which make it feasible to use it for sleep
>>>> implementation without requiring any hacking to handle the spurious
>>>> wake up.  This patch simplifies the sleep code to call nanosleep
>>>> directly.
>>>>
>>>> Checked on x86_64, ppc64le, and aarch64.
>>>
>>> Do you know the kernel commit which fixed this?
>>
>> I tried to track down the specific commit that fixed it, but I was unable
>> to pin it down.  uClibc developers also seemed to do same [1], but also
>> they could not find out the exact commit (which I think might be the reason
>> they are using glibc strategy still). musl uses plain nanosleep as the
>> patch.
>>
>> [1] http://git.uclibc.org/uClibc/tree/libc/unistd/sleep.c
> 
> Could you write a glibc test case for this?  I don't feel comfortable
> with removing the workaround without a test, and I can't find an
> existing one.

What about:

	[BZ #16364]
	* sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use
	nanosleep without requiring to handle spurious wakeups.
	* posix/tst-nanosleep-signal.c: New file.
	* posix/Makefile (test): Add tst-nanosleep-signal.

--
  

Comments

Florian Weimer Nov. 9, 2015, 2:28 p.m. UTC | #1
On 11/09/2015 03:07 PM, Adhemerval Zanella wrote:

>> Could you write a glibc test case for this?  I don't feel comfortable
>> with removing the workaround without a test, and I can't find an
>> existing one.
> 
> What about:
> 
> 	[BZ #16364]
> 	* sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use
> 	nanosleep without requiring to handle spurious wakeups.
> 	* posix/tst-nanosleep-signal.c: New file.
> 	* posix/Makefile (test): Add tst-nanosleep-signal.

Thanks, but see below for details.

> +  int ret;
> +  pid_t f = fork ();
> +  if (f == 0)
> +    {
> +      // child
> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL };
> +      nanosleep (&tv, &tv);
> +      exit (0);


This needs to be “_exit (0);”, to avoid running termination handlers twice.

> +    }
> +  else
> +    {
> +      // parent
> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL };
> +      ret = nanosleep (&tv, &tv);
> +    }
> +  return ret;
> +}

The parent needs to check for fork errors.

> +
> +void
> +check_nanosleep_sigdfl(void)

Space before paren.


> +{
> +  /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD.  */
> +  signal(SIGCHLD, SIG_DFL);

Space before paren.

> +  if (check_nanosleep_base ())
> +    {
> +      puts ("error: nanosleep was interrupted with SIG_DFL");
> +      errors = 1;
> +    }
> +}

This should check that errno is EINTR.

> +
> +void
> +check_nanosleep_dummy(void)

Space before paren.

> +{
> +  /* With a dummy handler nanosleep should be interrupted.  */
> +  signal(SIGCHLD, dummy);

Space before paren.

> +  int ret = check_nanosleep_base ();
> +  if (ret == 0)
> +    {
> +      puts ("error: nanosleep was not interrupted with dummy handler");
> +      errors = 1;
> +    }
> +}

Missing EINTR check.

> +
> +void
> +check_nanosleep_sigign(void)

Space before paren.

> +{
> +  /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD.  */
> +  signal(SIGCHLD, SIG_IGN);

Space before paren.

The test is racy in two ways: the child could exit before nanosleep in
the parent starts, or the child exit could be delayed after nanosleep in
the parent ends.  I'm not sure if there is way to make this more reliable.

The larger question is whether the EINTR check is sufficient, or if a
time-based check is needed as well.  That is, if the kernel bug
consistent of silent early termination of nanosleep.

Florian
  
Joseph Myers Nov. 9, 2015, 4:19 p.m. UTC | #2
<https://sourceware.org/ml/libc-hacker/1998-11/msg00210.html> has what was 
claimed to be a test for the kernel bug (albeit one with "I believe", not 
asserted to have actually been tested).

I found some claims that nanosleep restarting in some cases was fixed in 
2.6(.0), e.g. <https://lkml.org/lkml/2004/11/25/5> 
<https://lkml.org/lkml/2003/11/8/50>, so before the git history.
  
Adhemerval Zanella Nov. 9, 2015, 4:30 p.m. UTC | #3
On 09-11-2015 12:28, Florian Weimer wrote:
> On 11/09/2015 03:07 PM, Adhemerval Zanella wrote:
> 
>>> Could you write a glibc test case for this?  I don't feel comfortable
>>> with removing the workaround without a test, and I can't find an
>>> existing one.
>>
>> What about:
>>
>> 	[BZ #16364]
>> 	* sysdeps/unix/sysv/linux/sleep.c (__sleep): Simplify code to use
>> 	nanosleep without requiring to handle spurious wakeups.
>> 	* posix/tst-nanosleep-signal.c: New file.
>> 	* posix/Makefile (test): Add tst-nanosleep-signal.
> 
> Thanks, but see below for details.
> 
>> +  int ret;
>> +  pid_t f = fork ();
>> +  if (f == 0)
>> +    {
>> +      // child
>> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL };
>> +      nanosleep (&tv, &tv);
>> +      exit (0);
> 
> 
> This needs to be “_exit (0);”, to avoid running termination handlers twice.
> 

I will change it.

>> +    }
>> +  else
>> +    {
>> +      // parent
>> +      struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL };
>> +      ret = nanosleep (&tv, &tv);
>> +    }
>> +  return ret;
>> +}
> 
> The parent needs to check for fork errors.
> 

Right, I will change it.

>> +
>> +void
>> +check_nanosleep_sigdfl(void)
> 
> Space before paren.
> 
> 
>> +{
>> +  /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD.  */
>> +  signal(SIGCHLD, SIG_DFL);
> 
> Space before paren.
> 
>> +  if (check_nanosleep_base ())
>> +    {
>> +      puts ("error: nanosleep was interrupted with SIG_DFL");
>> +      errors = 1;
>> +    }
>> +}
> 
> This should check that errno is EINTR.

I do not think this check is strictly required: the other failures (EFAULT and
EINVAL) should also indicate something very flaky on the system.

> 
>> +
>> +void
>> +check_nanosleep_dummy(void)
> 
> Space before paren.
> 
>> +{
>> +  /* With a dummy handler nanosleep should be interrupted.  */
>> +  signal(SIGCHLD, dummy);
> 
> Space before paren.
> 
>> +  int ret = check_nanosleep_base ();
>> +  if (ret == 0)
>> +    {
>> +      puts ("error: nanosleep was not interrupted with dummy handler");
>> +      errors = 1;
>> +    }
>> +}
> 
> Missing EINTR check.

Also the idea here is to indicate any failure for nanosleep (same as before).

> 
>> +
>> +void
>> +check_nanosleep_sigign(void)
> 
> Space before paren.
> 
>> +{
>> +  /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD.  */
>> +  signal(SIGCHLD, SIG_IGN);
> 
> Space before paren.
> 
> The test is racy in two ways: the child could exit before nanosleep in
> the parent starts, or the child exit could be delayed after nanosleep in
> the parent ends.  I'm not sure if there is way to make this more reliable.

I am aware of that, that's why I try to mitigate this by making the time on
parent twice for the child and some magnitude higher than the syscalls itself.
However I also can't see a way to make this test entirely reliable: even by
doing some synchronization (either by shared semaphores or pthread barrier)
and/or sending the SIGCHLD directly using signal the two race scenarios you 
describe will still have a small window to occur.  I am not sure which 
strategy will be better and I think we should not rely or add hacks to try
to mitigate for such kernel failures.

> 
> The larger question is whether the EINTR check is sufficient, or if a
> time-based check is needed as well.  That is, if the kernel bug
> consistent of silent early termination of nanosleep.

My understanding is on old kernels the nanosleep calls was not restarted
in a nanosleep call (with the restart_syscall), so it nanosleep will
early terminate.

> 
> Florian
>
  
Florian Weimer Nov. 9, 2015, 5:18 p.m. UTC | #4
On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:

> I do not think this check is strictly required: the other failures (EFAULT and
> EINVAL) should also indicate something very flaky on the system.

We should always check for unexpected errors.  We might have been
overlooking something.

>> The test is racy in two ways: the child could exit before nanosleep in
>> the parent starts, or the child exit could be delayed after nanosleep in
>> the parent ends.  I'm not sure if there is way to make this more reliable.
> 
> I am aware of that, that's why I try to mitigate this by making the time on
> parent twice for the child and some magnitude higher than the syscalls itself.
> However I also can't see a way to make this test entirely reliable: even by
> doing some synchronization (either by shared semaphores or pthread barrier)
> and/or sending the SIGCHLD directly using signal the two race scenarios you 
> describe will still have a small window to occur.  I am not sure which 
> strategy will be better and I think we should not rely or add hacks to try
> to mitigate for such kernel failures.

I think you could make it more likely to hit the window if you forked
several child processes.

>> The larger question is whether the EINTR check is sufficient, or if a
>> time-based check is needed as well.  That is, if the kernel bug
>> consistent of silent early termination of nanosleep.
> 
> My understanding is on old kernels the nanosleep calls was not restarted
> in a nanosleep call (with the restart_syscall), so it nanosleep will
> early terminate.

To be sure, you should check how much time has elapsed in the nanosleep
calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
-1/EINTR case and the return value 0 case.  The existing code does not
look at EINTR, so if there was a bug, it was the return value 0 scenario.

Florian
  
Adhemerval Zanella Nov. 9, 2015, 5:43 p.m. UTC | #5
On 09-11-2015 15:18, Florian Weimer wrote:
> On 11/09/2015 05:30 PM, Adhemerval Zanella wrote:
> 
>> I do not think this check is strictly required: the other failures (EFAULT and
>> EINVAL) should also indicate something very flaky on the system.
> 
> We should always check for unexpected errors.  We might have been
> overlooking something.
> 

Fair enough, I will change that.

>>> The test is racy in two ways: the child could exit before nanosleep in
>>> the parent starts, or the child exit could be delayed after nanosleep in
>>> the parent ends.  I'm not sure if there is way to make this more reliable.
>>
>> I am aware of that, that's why I try to mitigate this by making the time on
>> parent twice for the child and some magnitude higher than the syscalls itself.
>> However I also can't see a way to make this test entirely reliable: even by
>> doing some synchronization (either by shared semaphores or pthread barrier)
>> and/or sending the SIGCHLD directly using signal the two race scenarios you 
>> describe will still have a small window to occur.  I am not sure which 
>> strategy will be better and I think we should not rely or add hacks to try
>> to mitigate for such kernel failures.
> 
> I think you could make it more likely to hit the window if you forked
> several child processes.

Or if the system load is high enough.  My question is if we should push or
not this test even with this racy condition and if so which strategy would
be better to mitigate it.

> 
>>> The larger question is whether the EINTR check is sufficient, or if a
>>> time-based check is needed as well.  That is, if the kernel bug
>>> consistent of silent early termination of nanosleep.
>>
>> My understanding is on old kernels the nanosleep calls was not restarted
>> in a nanosleep call (with the restart_syscall), so it nanosleep will
>> early terminate.
> 
> To be sure, you should check how much time has elapsed in the nanosleep
> calls, with clock_gettime(CLOCK_MONOTONIC).  This would cover both the
> -1/EINTR case and the return value 0 case.  The existing code does not
> look at EINTR, so if there was a bug, it was the return value 0 scenario.

For nanosleep check itself, glibc already have posix/tst-nanosleep.c 
(which uses gettimeofday.c).

However my main questioning and the reason of this bikeshedding regarding
is if GLIBC should continue to provide a hack for non-compliant kernels
and if we should really check for this issue with a possible racy testcase.

> 
> Florian
> 
>
  
Adhemerval Zanella Nov. 10, 2015, 12:36 p.m. UTC | #6
On 09-11-2015 14:19, Joseph Myers wrote:
> <https://sourceware.org/ml/libc-hacker/1998-11/msg00210.html> has what was 
> claimed to be a test for the kernel bug (albeit one with "I believe", not 
> asserted to have actually been tested).

This test is essentially what I have added, it just spawn more children
with different sleep periods.

> 
> I found some claims that nanosleep restarting in some cases was fixed in 
> 2.6(.0), e.g. <https://lkml.org/lkml/2004/11/25/5> 
> <https://lkml.org/lkml/2003/11/8/50>, so before the git history.
> 

Thanks, I tried to track down the kernel source which handles the nanosleep
(kernel/posix-cpu-timers.c), but found it has been fixed also before git
history.

I will resend the patch with this information.
  

Patch

diff --git a/posix/Makefile b/posix/Makefile
index aeb9890..cec06a6 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -74,8 +74,8 @@  tests		:= tstgetopt testfnm runtests runptests	     \
 		   bug-regex21 bug-regex22 bug-regex23 bug-regex24 \
 		   bug-regex25 bug-regex26 bug-regex27 bug-regex28 \
 		   bug-regex29 bug-regex30 bug-regex31 bug-regex32 \
-		   bug-regex33 tst-nice tst-nanosleep tst-regex2 \
-		   transbug tst-rxspencer tst-pcre tst-boost \
+		   bug-regex33 tst-nice tst-nanosleep tst-nanosleep-signal \
+		   tst-regex2 transbug tst-rxspencer tst-pcre tst-boost \
 		   bug-ga1 tst-vfork1 tst-vfork2 tst-vfork3 tst-waitid \
 		   tst-getaddrinfo2 bug-glob1 bug-glob2 bug-glob3 tst-sysconf \
 		   tst-execvp1 tst-execvp2 tst-execlp1 tst-execlp2 \
diff --git a/posix/tst-nanosleep-signal.c b/posix/tst-nanosleep-signal.c
new file mode 100644
index 0000000..e6abab4
--- /dev/null
+++ b/posix/tst-nanosleep-signal.c
@@ -0,0 +1,105 @@ 
+/* Check if nanosleep handles correctly SIGCHLD.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <signal.h>
+#include <sys/time.h>
+
+static int errors = 0;
+
+static void 
+dummy (int sig)
+{
+}
+
+/* Fork and terminate the child in 0.25s while parent waits on a nanosleep
+   for 0.5s.  */
+static int
+check_nanosleep_base (void)
+{
+  int ret;
+  pid_t f = fork ();
+  if (f == 0)
+    {
+      // child
+      struct timespec tv = { .tv_sec = 0, .tv_nsec = 250000000UL };
+      nanosleep (&tv, &tv);
+      exit (0);
+    }
+  else
+    {
+      // parent
+      struct timespec tv = { .tv_sec = 0, .tv_nsec = 500000000UL };
+      ret = nanosleep (&tv, &tv);
+    }
+  return ret;
+}
+
+void
+check_nanosleep_sigdfl(void)
+{
+  /* For SIG_DFL nanosleep should not be interrupted by SIGCHLD.  */
+  signal(SIGCHLD, SIG_DFL);
+  if (check_nanosleep_base ())
+    {
+      puts ("error: nanosleep was interrupted with SIG_DFL");
+      errors = 1;
+    }
+}
+
+void
+check_nanosleep_dummy(void)
+{
+  /* With a dummy handler nanosleep should be interrupted.  */
+  signal(SIGCHLD, dummy);
+  int ret = check_nanosleep_base ();
+  if (ret == 0)
+    {
+      puts ("error: nanosleep was not interrupted with dummy handler");
+      errors = 1;
+    }
+}
+
+void
+check_nanosleep_sigign(void)
+{
+  /* For SIG_IGN nanosleep should not be interrupted by SIGCHLD.  */
+  signal(SIGCHLD, SIG_IGN);
+  if (check_nanosleep_base ())
+    {
+      puts ("error: nanosleep was interrupted with SIG_IGN");
+      errors = 1;
+    }
+}
+
+
+static int
+do_test (void)
+{
+  check_nanosleep_sigdfl ();
+  check_nanosleep_dummy ();
+  check_nanosleep_sigign ();
+
+  return errors;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
diff --git a/sysdeps/unix/sysv/linux/sleep.c b/sysdeps/unix/sysv/linux/sleep.c
index 2a06d5a..3885a34 100644
--- a/sysdeps/unix/sysv/linux/sleep.c
+++ b/sysdeps/unix/sysv/linux/sleep.c
@@ -17,133 +17,16 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <errno.h>
 #include <time.h>
-#include <signal.h>
-#include <string.h>	/* For the real memset prototype.  */
 #include <unistd.h>
-#include <sys/param.h>
-#include <nptl/pthreadP.h>
+#include <sysdep-cancel.h>
 
-
-#if 0
-static void
-cl (void *arg)
-{
-  (void) __sigprocmask (SIG_SETMASK, arg, (sigset_t *) NULL);
-}
-#endif
-
-
-/* We are going to use the `nanosleep' syscall of the kernel.  But the
-   kernel does not implement the stupid SysV SIGCHLD vs. SIG_IGN
-   behaviour for this syscall.  Therefore we have to emulate it here.  */
 unsigned int
 __sleep (unsigned int seconds)
 {
-  const unsigned int max
-    = (unsigned int) (((unsigned long int) (~((time_t) 0))) >> 1);
-  struct timespec ts;
-  sigset_t set, oset;
-  unsigned int result;
-
-  /* This is not necessary but some buggy programs depend on this.  */
-  if (__glibc_unlikely (seconds == 0))
-    {
-#ifdef CANCELLATION_P
-      CANCELLATION_P (THREAD_SELF);
-#endif
-      return 0;
-    }
-
-  ts.tv_sec = 0;
-  ts.tv_nsec = 0;
- again:
-  if (sizeof (ts.tv_sec) <= sizeof (seconds))
-    {
-      /* Since SECONDS is unsigned assigning the value to .tv_sec can
-	 overflow it.  In this case we have to wait in steps.  */
-      ts.tv_sec += MIN (seconds, max);
-      seconds -= (unsigned int) ts.tv_sec;
-    }
-  else
-    {
-      ts.tv_sec = (time_t) seconds;
-      seconds = 0;
-    }
-
-  /* Linux will wake up the system call, nanosleep, when SIGCHLD
-     arrives even if SIGCHLD is ignored.  We have to deal with it
-     in libc.  We block SIGCHLD first.  */
-  __sigemptyset (&set);
-  __sigaddset (&set, SIGCHLD);
-  if (__sigprocmask (SIG_BLOCK, &set, &oset))
-    return -1;
-
-  /* If SIGCHLD is already blocked, we don't have to do anything.  */
-  if (!__sigismember (&oset, SIGCHLD))
-    {
-      int saved_errno;
-      struct sigaction oact;
-
-      __sigemptyset (&set);
-      __sigaddset (&set, SIGCHLD);
-
-      /* We get the signal handler for SIGCHLD.  */
-      if (__sigaction (SIGCHLD, (struct sigaction *) NULL, &oact) < 0)
-	{
-	  saved_errno = errno;
-	  /* Restore the original signal mask.  */
-	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
-	  __set_errno (saved_errno);
-	  return -1;
-	}
-
-      /* Note the sleep() is a cancellation point.  But since we call
-	 nanosleep() which itself is a cancellation point we do not
-	 have to do anything here.  */
-      if (oact.sa_handler == SIG_IGN)
-	{
-	  //__libc_cleanup_push (cl, &oset);
-
-	  /* We should leave SIGCHLD blocked.  */
-	  while (1)
-	    {
-	      result = __nanosleep (&ts, &ts);
-
-	      if (result != 0 || seconds == 0)
-		break;
-
-	      if (sizeof (ts.tv_sec) <= sizeof (seconds))
-		{
-		  ts.tv_sec = MIN (seconds, max);
-		  seconds -= (unsigned int) ts.tv_nsec;
-		}
-	    }
-
-	  //__libc_cleanup_pop (0);
-
-	  saved_errno = errno;
-	  /* Restore the original signal mask.  */
-	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
-	  __set_errno (saved_errno);
-
-	  goto out;
-	}
-
-      /* We should unblock SIGCHLD.  Restore the original signal mask.  */
-      (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
-    }
-
-  result = __nanosleep (&ts, &ts);
-  if (result == 0 && seconds != 0)
-    goto again;
-
- out:
-  if (result != 0)
-    /* Round remaining time.  */
-    result = seconds + (unsigned int) ts.tv_sec + (ts.tv_nsec >= 500000000L);
-
-  return result;
+  struct timespec ts = { .tv_sec = seconds, .tv_nsec = 0 };
+  if (__nanosleep (&ts, &ts))
+    return ts.tv_sec;
+  return 0;
 }
 weak_alias (__sleep, sleep)