question about which sleep is noted in manual

Message ID orfvcd7a7a.fsf@free.home
State Superseded
Headers

Commit Message

Alexandre Oliva Dec. 18, 2014, 6:18 a.m. UTC
  [copying the list]

On Dec 10, 2014, MaShimiao <mashimiao.fnst@cn.fujitsu.com> wrote:

> So, I can't understand why sleep() marked with MT-Unsafe.

Ok.  Let's start by ruling out the unsafety inheritance from
sigprocmask: the apparent race in sigprocmask was BSD-only, and the
annotation in sleep is completely different, so I think we can rule out
sigprocmask as the source of the problem.  I may have made another
mistake WRT the thread-local nature of the signal mask here, and the
comments under sleep seem to support this, but there is a race in there
nevertheless: we test what the signal handler for SIGCHLD is, to decide
whether or not we have to keep SIGCHLD blocked, as if no other thread
could modify the handler after we tested it.  That's clearly a race, and
it is the very sort of situation that @mtascusig refers to.  So, I'm
proposing to keep the annotations unchanged, and expanding the
rationale.

Unfortunately, there is another lurking problem in there: nanosleep may
be cancelled, causing handlers to run with SIGCHLD blocked.  There is
code in place to install a cleanup handler, but it's inexplicably
commented out.  This makes sleep unsafe to call even without
asynchronous cancellation.  In order to make it C-Safe (but not
AC-Safe), the first patch below enables the cleanup handlers required to
that end.  This might not sound like such a big deal, but the third
patch below elaborates on the reasons why it could be.

Before discussion the second patch, is the first patch ok to install?

for ChangeLog

	* sysdeps/unix/sysv/linux/sleep.c (__sleep): Enable cleanup
	handler to restore the signal mask at least on synchronous
	cancellation.
	* intro/time.texi (sleep): Expand unsafety rationale.
---
 manual/time.texi                |    8 ++++++--
 sysdeps/unix/sysv/linux/sleep.c |    6 ++----
 2 files changed, 8 insertions(+), 6 deletions(-)
  

Comments

Ma Shimiao Dec. 22, 2014, 2:53 a.m. UTC | #1
Hi Alexandre,

On 12/18/2014 02:18 PM, Alexandre Oliva wrote:
> /* We are going to use the `nanosleep' syscall of the kernel.  But the
> @@ -104,7 +102,7 @@ __sleep (unsigned int seconds)
>  	 have to do anything here.  */
>        if (oact.sa_handler == SIG_IGN)
>  	{
> -	  //__libc_cleanup_push (cl, &oset);
> +	  __libc_cleanup_push (cl, &oset);
>  
>  	  /* We should leave SIGCHLD blocked.  */
>  	  while (1)
> @@ -121,7 +119,7 @@ __sleep (unsigned int seconds)
>  		}
>  	    }
>  
> -	  //__libc_cleanup_pop (0);
> +	  __libc_cleanup_pop (0);
I think we do not need this change.
From the original code, I get the following logic:
1. if function __sleep doesn't block SIGCHLD,then it does nothing.
2. if funciont __sleep blocks SIGCHLD, no matter nanosleep() is cancelled or
not, __seleep will restore the original signal mask before return. detail as following:
         
	 //__libc_cleanup_pop (0);

          saved_errno = errno;
          /* Restore the original signal mask.  */
          (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
          __set_errno (saved_errno);

          goto out;
        }
So, I think we don't need this change.
Did I miss something?


Best regards
  
Alexandre Oliva Dec. 22, 2014, 6:24 p.m. UTC | #2
On Dec 22, 2014, MaShimiao <mashimiao.fnst@cn.fujitsu.com> wrote:

> 2. if funciont __sleep blocks SIGCHLD, no matter nanosleep() is cancelled or
> not, __seleep will restore the original signal mask before return.

If the thread is canceled, we won't take the normal execution path after
the nanosleep call; we'll only run cleanup handlers set up earlier in
the call chain, until the thread is terminated.

You may want to try it yourself: install a cleanup handler that probes
SIGCHLD in the signal mask, in a thread that calls sleep, and then
cancel the thread while sleep is running.  You shouldn't see SIGCHLD
blocked in your cleanup handler, but without the proposed patch, you
will.
  
Ma Shimiao Dec. 23, 2014, 3:24 a.m. UTC | #3
On 12/18/2014 02:18 PM, Alexandre Oliva wrote:
> AC-Safe), the first patch below enables the cleanup handlers required to
> that end.  This might not sound like such a big deal, but the third
> patch below elaborates on the reasons why it could be.
> 
> Before discussion the second patch, is the first patch ok to install?

I tested with this path and it did work.
So, this patch looks good to me.

> implies AC-Unsafe, and I didn't think it would be worth introducing
> @mtasusig just for this one case, though I might if we choose to go with
> this.
> 
> Ok to install on top of the first patch?

I have a comment on the second patch.

> -	  __libc_cleanup_pop (0);
> +  /* Linux will wake up the system call, nanosleep, when SIGCHLD
> +     arrives even if SIGCHLD is ignored.  We have to deal with it in
> +     libc, so we block SIGCHLD while sleeping if SIGCHLD shouldn't
> +     wake us up.  */
> +  if (__sigprocmask (SIG_BLOCK, &set, &set))
> +    return -1;
>  
> -	  saved_errno = errno;
> -	  /* Restore the original signal mask.  */
> -	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
> -	  __set_errno (saved_errno);
> +  if (!__sigismember (arg, SIGCHLD))

arg is not a defined variable.


Best regards
  
Ma Shimiao Dec. 23, 2014, 4:31 a.m. UTC | #4
On 12/23/2014 11:24 AM, MaShimiao wrote:
> On 12/18/2014 02:18 PM, Alexandre Oliva wrote:
>> AC-Safe), the first patch below enables the cleanup handlers required to
>> that end.  This might not sound like such a big deal, but the third
>> patch below elaborates on the reasons why it could be.
>>
>> Before discussion the second patch, is the first patch ok to install?
> 
> I tested with this path and it did work.
> So, this patch looks good to me.

Sorry, I forgot to note when testing I added bits/libc-lock.h to sleep.c.
If not, there will be a warning.
> 
>> implies AC-Unsafe, and I didn't think it would be worth introducing
>> @mtasusig just for this one case, though I might if we choose to go with
>> this.
>>
>> Ok to install on top of the first patch?
> 
> I have a comment on the second patch.
> 
>> -	  __libc_cleanup_pop (0);
>> +  /* Linux will wake up the system call, nanosleep, when SIGCHLD
>> +     arrives even if SIGCHLD is ignored.  We have to deal with it in
>> +     libc, so we block SIGCHLD while sleeping if SIGCHLD shouldn't
>> +     wake us up.  */
>> +  if (__sigprocmask (SIG_BLOCK, &set, &set))
>> +    return -1;
>>  
>> -	  saved_errno = errno;
>> -	  /* Restore the original signal mask.  */
>> -	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
>> -	  __set_errno (saved_errno);
>> +  if (!__sigismember (arg, SIGCHLD))
> 
> arg is not a defined variable.
> 
> 
> Best regards
>
  

Patch

diff --git a/manual/time.texi b/manual/time.texi
index 8a5f94e..8fe97dc 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -2828,8 +2828,12 @@  any descriptors to wait for.
 @deftypefun {unsigned int} sleep (unsigned int @var{seconds})
 @safety{@prelim{}@mtunsafe{@mtascusig{:SIGCHLD/linux}}@asunsafe{}@acunsafe{}}
 @c On Mach, it uses ports and calls time.  On generic posix, it calls
-@c nanosleep.  On Linux, it temporarily blocks SIGCHLD, which is MT- and
-@c AS-Unsafe, and in a way that makes it AC-Unsafe (C-unsafe, even!).
+@c nanosleep.  On GNU/Linux, it may temporarily block SIGCHLD while
+@c calling nanosleep, restoring it in a cleanup handler that does not
+@c cover the entire asynchronous region, and other threads might modify
+@c the signal handler after we test whether it is SIG_IGN, causing the
+@c function to wake up when it shouldn't or to block a signal that
+@c should wake it up.
 The @code{sleep} function waits for @var{seconds} or until a signal
 is delivered, whichever happens first.
 
diff --git a/sysdeps/unix/sysv/linux/sleep.c b/sysdeps/unix/sysv/linux/sleep.c
index 5411fd5..8440981 100644
--- a/sysdeps/unix/sysv/linux/sleep.c
+++ b/sysdeps/unix/sysv/linux/sleep.c
@@ -26,13 +26,11 @@ 
 #include <nptl/pthreadP.h>
 
 
-#if 0
 static void
 cl (void *arg)
 {
   (void) __sigprocmask (SIG_SETMASK, arg, (sigset_t *) NULL);
 }
-#endif
 
 
 /* We are going to use the `nanosleep' syscall of the kernel.  But the
@@ -104,7 +102,7 @@  __sleep (unsigned int seconds)
 	 have to do anything here.  */
       if (oact.sa_handler == SIG_IGN)
 	{
-	  //__libc_cleanup_push (cl, &oset);
+	  __libc_cleanup_push (cl, &oset);
 
 	  /* We should leave SIGCHLD blocked.  */
 	  while (1)
@@ -121,7 +119,7 @@  __sleep (unsigned int seconds)
 		}
 	    }
 
-	  //__libc_cleanup_pop (0);
+	  __libc_cleanup_pop (0);
 
 	  saved_errno = errno;
 	  /* Restore the original signal mask.  */



On to the second patch.  Here, I largely revamp sleep so as to factor
out and simplify various pieces of code, and extending the range of the
cleanup handler so that, even if the thread is cancelled before we call
sigprocmask, we get the correct signal mask restored before other
cleanup handlers run.  The manual is then adjusted to drop this caveat.
It doesn't affect the safety notes proper, though, because @mtascusig
implies AC-Unsafe, and I didn't think it would be worth introducing
@mtasusig just for this one case, though I might if we choose to go with
this.

Ok to install on top of the first patch?


for ChangeLog

	* sysdeps/unix/sysv/linux/sleep.c (restore_sigmask): Rename
	from cl.
	(sleep_loop): Factor out of...
	(__sleep): ... this.  Revamp.  Extend cleanup range over the
	entire range so that it is properly restored even in case of
	asynchronous cancellation.
	* manual/time.texi (sleep): Drop note about the async cleanup
	failure fixed above.
---
 manual/time.texi                |    8 +-
 sysdeps/unix/sysv/linux/sleep.c |  151 ++++++++++++++++++++-------------------
 2 files changed, 80 insertions(+), 79 deletions(-)

diff --git a/manual/time.texi b/manual/time.texi
index 8fe97dc..1aebdd0 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -2829,11 +2829,9 @@  any descriptors to wait for.
 @safety{@prelim{}@mtunsafe{@mtascusig{:SIGCHLD/linux}}@asunsafe{}@acunsafe{}}
 @c On Mach, it uses ports and calls time.  On generic posix, it calls
 @c nanosleep.  On GNU/Linux, it may temporarily block SIGCHLD while
-@c calling nanosleep, restoring it in a cleanup handler that does not
-@c cover the entire asynchronous region, and other threads might modify
-@c the signal handler after we test whether it is SIG_IGN, causing the
-@c function to wake up when it shouldn't or to block a signal that
-@c should wake it up.
+@c calling nanosleep, and other threads might modify the signal handler
+@c after we test whether it is SIG_IGN, causing the function to wake up
+@c when it shouldn't or to block a signal that should wake it up.
 The @code{sleep} function waits for @var{seconds} or until a signal
 is delivered, whichever happens first.
 
diff --git a/sysdeps/unix/sysv/linux/sleep.c b/sysdeps/unix/sysv/linux/sleep.c
index 8440981..f7f88cd 100644
--- a/sysdeps/unix/sysv/linux/sleep.c
+++ b/sysdeps/unix/sysv/linux/sleep.c
@@ -24,39 +24,30 @@ 
 #include <unistd.h>
 #include <sys/param.h>
 #include <nptl/pthreadP.h>
-
+#include <bits/libc-lock.h>
 
 static void
-cl (void *arg)
+restore_sigmask (void *arg)
 {
-  (void) __sigprocmask (SIG_SETMASK, arg, (sigset_t *) NULL);
+  if (!__sigismember (arg, SIGCHLD))
+    {
+      int saved_errno = errno;
+      /* Restore the original signal mask.  */
+      (void) __sigprocmask (SIG_SETMASK, arg, (sigset_t *) NULL);
+      __set_errno (saved_errno);
+    }
 }
 
-
-/* We are going to use the `nanosleep' syscall of the kernel.  But the
-   kernel does not implement the stupid SysV SIGCHLD vs. SIG_IGN
-   behaviour for this syscall.  Therefore we have to emulate it here.  */
-unsigned int
-__sleep (unsigned int seconds)
+static unsigned int
+sleep_loop (unsigned int seconds)
 {
   const unsigned int max
     = (unsigned int) (((unsigned long int) (~((time_t) 0))) >> 1);
   struct timespec ts;
-  sigset_t set, oset;
   unsigned int result;
 
-  /* This is not necessary but some buggy programs depend on this.  */
-  if (__glibc_unlikely (seconds == 0))
-    {
-#ifdef CANCELLATION_P
-      CANCELLATION_P (THREAD_SELF);
-#endif
-      return 0;
-    }
-
   ts.tv_sec = 0;
   ts.tv_nsec = 0;
- again:
   if (sizeof (ts.tv_sec) <= sizeof (seconds))
     {
       /* Since SECONDS is unsigned assigning the value to .tv_sec can
@@ -70,77 +61,89 @@  __sleep (unsigned int seconds)
       seconds = 0;
     }
 
-  /* Linux will wake up the system call, nanosleep, when SIGCHLD
-     arrives even if SIGCHLD is ignored.  We have to deal with it
-     in libc.  We block SIGCHLD first.  */
-  __sigemptyset (&set);
-  __sigaddset (&set, SIGCHLD);
-  if (__sigprocmask (SIG_BLOCK, &set, &oset))
-    return -1;
-
-  /* If SIGCHLD is already blocked, we don't have to do anything.  */
-  if (!__sigismember (&oset, SIGCHLD))
+  while (1)
     {
-      int saved_errno;
-      struct sigaction oact;
+      result = __nanosleep (&ts, &ts);
 
-      __sigemptyset (&set);
-      __sigaddset (&set, SIGCHLD);
+      if (result != 0 || seconds == 0)
+	break;
 
-      /* We get the signal handler for SIGCHLD.  */
-      if (__sigaction (SIGCHLD, (struct sigaction *) NULL, &oact) < 0)
+      if (sizeof (ts.tv_sec) <= sizeof (seconds))
 	{
-	  saved_errno = errno;
-	  /* Restore the original signal mask.  */
-	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
-	  __set_errno (saved_errno);
-	  return -1;
+	  ts.tv_sec = MIN (seconds, max);
+	  seconds -= (unsigned int) ts.tv_nsec;
 	}
+    }
 
-      /* Note the sleep() is a cancellation point.  But since we call
-	 nanosleep() which itself is a cancellation point we do not
-	 have to do anything here.  */
-      if (oact.sa_handler == SIG_IGN)
-	{
-	  __libc_cleanup_push (cl, &oset);
+  if (result != 0)
+    /* Round remaining time.  */
+    result = seconds + (unsigned int) ts.tv_sec + (ts.tv_nsec >= 500000000L);
+
+  return result;
+}
+
+/* We are going to use the `nanosleep' syscall of the kernel.  But the
+   kernel does not implement the stupid SysV SIGCHLD vs. SIG_IGN
+   behaviour for this syscall.  Therefore we have to emulate it here.  */
+unsigned int
+__sleep (unsigned int seconds)
+{
+  sigset_t set;
+  unsigned int result;
 
-	  /* We should leave SIGCHLD blocked.  */
-	  while (1)
-	    {
-	      result = __nanosleep (&ts, &ts);
+  /* This is not necessary but some buggy programs depend on this.  */
+  if (__glibc_unlikely (seconds == 0))
+    {
+#ifdef CANCELLATION_P
+      CANCELLATION_P (THREAD_SELF);
+#endif
+      return 0;
+    }
 
-	      if (result != 0 || seconds == 0)
-		break;
+  __sigemptyset (&set);
+  __sigaddset (&set, SIGCHLD);
 
-	      if (sizeof (ts.tv_sec) <= sizeof (seconds))
-		{
-		  ts.tv_sec = MIN (seconds, max);
-		  seconds -= (unsigned int) ts.tv_nsec;
-		}
-	    }
+  /* If we're cancelled before blocking SIGCHLD, the handler will
+     still be the sigset above, so SIGCHLD will be set, and the signal
+     mask won't be restored.  */
+  __libc_cleanup_push (restore_sigmask, &set);
 
-	  __libc_cleanup_pop (0);
+  /* Linux will wake up the system call, nanosleep, when SIGCHLD
+     arrives even if SIGCHLD is ignored.  We have to deal with it in
+     libc, so we block SIGCHLD while sleeping if SIGCHLD shouldn't
+     wake us up.  */
+  if (__sigprocmask (SIG_BLOCK, &set, &set))
+    return -1;
 
-	  saved_errno = errno;
-	  /* Restore the original signal mask.  */
-	  (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
-	  __set_errno (saved_errno);
+  if (!__sigismember (arg, SIGCHLD))
+    {
+      struct sigaction oact;
 
-	  goto out;
+      if (__sigaction (SIGCHLD, (struct sigaction *) NULL, &oact) < 0)
+	{
+	  restore_sigmask (&set);
+	  return -1;
 	}
 
-      /* We should unblock SIGCHLD.  Restore the original signal mask.  */
-      (void) __sigprocmask (SIG_SETMASK, &oset, (sigset_t *) NULL);
+      /* If there is a handler, unblock SIGCHLD, since it's ok if we
+	 wake up.  ??? This test is racy: another thread might change
+	 the handler after the sigaction call above, and then we might
+	 wake up with SIG_IGN, or block a waking-up signal.  */
+      if (oact.sa_handler != SIG_IGN)
+	{
+	  restore_sigmask (&set);
+	  /* Stop the cleanup handler from restoring the mask
+	     again.  */
+	  __sigaddset (&set, SIGCHLD);
+	}
     }
 
-  result = __nanosleep (&ts, &ts);
-  if (result == 0 && seconds != 0)
-    goto again;
+  /* Although sleep() is a cancellation point, so is nanosleep(), so
+     we do not have to do anything here.  */
+  result = sleep_loop (seconds);
 
- out:
-  if (result != 0)
-    /* Round remaining time.  */
-    result = seconds + (unsigned int) ts.tv_sec + (ts.tv_nsec >= 500000000L);
+  /* Restore the original signal mask.  */
+  __libc_cleanup_pop (1);
 
   return result;
 }


Finally, here's sort of an alternative to the second patch, that takes a
different direction: it doesn't attempt to address any of the safety
issues in sleep, just to document the AC-Safety issue that would remain
even if we decided to disregard the signal handler race.

I post it not because I think it's the way to go, but so that the issue
that hides behind the "stronger" @mtascusig annotation can be brought to
light and discussed.  We might even decide that we want to put the new
@acusig note in, maybe along with @mtasusig (not proposed yet), and have
both annotations in sleep, if the second patch doesn't go in to fix the
AC-Safety problem.

Comments?


for  ChangeLog

	* manual/intro.texi (sig): Add new meaning as AC-Safety note
	only.
	* manual/macros.texi (acusig): New macro.
	* manual/time.texi (sleep): Disregard SIGCHLD handler race,
	but note the AC-Unsafe failure to unblock it.
---
 manual/intro.texi  |   12 ++++++++++++
 manual/macros.texi |    5 +++++
 manual/time.texi   |   10 ++++++++--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/manual/intro.texi b/manual/intro.texi
index 6cd5776..9ef4eb3 100644
--- a/manual/intro.texi
+++ b/manual/intro.texi
@@ -652,6 +652,18 @@  asynchronous cancellation @emph{and} installing a cleanup handler to
 restore the signal to the desired state and to release the mutex are
 recommended.
 
+@c Functions marked with @code{sig} as an AC-Safety issue only may
+@c temporarily block a signal, and the temporary setting might be left in
+@c effect in case of cancellation, which may affect cleanup handlers
+@c executed until the cancelled thread terminates.  Since the thread's
+@c signal mask will cease to exist when the thread terminates, it might
+@c seem unlikely that this temporary blocking in a cancelling thread might
+@c cause any problem.  However, if a cleanup handler waits for the delivery
+@c of an expected signal, and all other threads have the signal blocked,
+@c the effects of the temporary mask might be noticeable and detrimental to
+@c the application.  Functions exhibiting this behavior will therefore be
+@c marked as AC-Unsafe.
+
 
 @item @code{term}
 @cindex term
diff --git a/manual/macros.texi b/manual/macros.texi
index f32c86dc2..f2bbe10 100644
--- a/manual/macros.texi
+++ b/manual/macros.texi
@@ -204,6 +204,11 @@  mem\comments\
 @macro mtascusig {comments}
 sig\comments\
 @end macro
+@c Function is AC-unsafe due to temporarily overriding or blocking a
+@c signal handler.
+@macro acusig {comments}
+sig\comments\
+@end macro
 @c Function is MT- and AS-Unsafe due to temporarily changing attributes
 @c of the controlling terminal.
 @macro mtasuterm {comments}
diff --git a/manual/time.texi b/manual/time.texi
index 1aebdd0..7fba49d 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -2826,12 +2826,18 @@  any descriptors to wait for.
 @comment unistd.h
 @comment POSIX.1
 @deftypefun {unsigned int} sleep (unsigned int @var{seconds})
-@safety{@prelim{}@mtunsafe{@mtascusig{:SIGCHLD/linux}}@asunsafe{}@acunsafe{}}
+@safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acusig{:SIGCHLD/linux}}}
 @c On Mach, it uses ports and calls time.  On generic posix, it calls
 @c nanosleep.  On GNU/Linux, it may temporarily block SIGCHLD while
 @c calling nanosleep, and other threads might modify the signal handler
 @c after we test whether it is SIG_IGN, causing the function to wake up
-@c when it shouldn't or to block a signal that should wake it up.
+@c when it shouldn't or to block a signal that should wake it up, but we
+@c do not regard it as enough of a deviation from the specification to
+@c make it MT- or AS-Unsafe.  However, the cleanup handler for thread
+@c cancellation, that should restore the signal mask, does not cover the
+@c entire range in which the mask might be modified, so asynchronous
+@c cancellation might leave SIGCHLD blocked while running subsequent
+@c cleanups.
 The @code{sleep} function waits for @var{seconds} or until a signal
 is delivered, whichever happens first.