diff mbox

Add futex wrappers with error checking v2

Message ID 1420801519.29798.80.camel@triegel.csb
State Dropped
Headers show

Commit Message

Torvald Riegel Jan. 9, 2015, 11:05 a.m. UTC
This patch is a revision of
https://sourceware.org/ml/libc-alpha/2014-12/msg00134.html

This adds new functions for futex operations, starting with wait,
abstimed_wait, reltimed_wait, wake.  They add documentation and error
checking according to the outcomes of this thread:
https://sourceware.org/ml/libc-alpha/2014-09/msg00381.html

Compared to the previous patch, this splits futex_timed_wait into two
variants with relative and absolute timeouts, respectively.  This allows
for removing a few cases of code duplication in pthreads code, which
uses absolute timeouts; also, it allows us to put platform-specific code
to go from an absolute to a relative timeout into the platform-specific
futex abstractions.
The latter is done by adding lll_futex_abstimed_wait.  I expect that we
will refactor this later on, depending on how we do the lll_ parts.

I have transformed the lll_futex_* uses that I saw except:
* Anything related to lowlevellock or mutexes.
* Some sparc-specific files, because the machine maintainer is
travelling until Monday.
* pthread condvar: I'm still unsure whether I can get the replacement
condvar posted before the freeze, whatever it will be.
* pthread barrier, because I need to look closer at what it does with
the private flag.
* All tls.h files.
* pthread semaphore, which I'll do by in a revision of my semaphore
patch.

From this list, we should be able to deal with all of them except
lowlevellock, which might require more care (or maybe not) and, more
importantly, requires all futex functions to come from the
lowlevellock-futex.h files, not lowlevellock.h.  The latter is true
except on the archs whose maintainers haven't reviewed my patches for
that yet (ia64, hppa, s390).

Also note that this patch relies in some places on LLL_PRIVATE being
equal in value to FUTEX_PRIVATE.  This is something we'll have to sort
out later in the lowlevellock refactoring;  we can't easily split these
currently because the same flag (eg, on the rwlock) is used to control
private/shared for both lowlevellocks and futexes unrelated to locks.

Using the wrappers actually catched a bug in sysdeps/nptl/fork.c: The
previous code didn't provide an address but the value, so this might
have led to lost wake-ups due to the futex_wake not targeting the actual
futex.


Tested on x86_64 gnu linux.  Not tested on i386 gnu linux because I get
a build failure that doesn't seem to be related to my patch (multiple
definitions of `_dl_addr_inside_object' and `_itoa').  I'll investigate
the latter, and report if I see regressions due to my patch.

Comments

Torvald Riegel Jan. 9, 2015, 6:46 p.m. UTC | #1
On Fri, 2015-01-09 at 12:05 +0100, Torvald Riegel wrote:
> Compared to the previous patch, this splits futex_timed_wait into two
> variants with relative and absolute timeouts, respectively.  This allows
> for removing a few cases of code duplication in pthreads code, which
> uses absolute timeouts; also, it allows us to put platform-specific code
> to go from an absolute to a relative timeout into the platform-specific
> futex abstractions.
> The latter is done by adding lll_futex_abstimed_wait.  I expect that we
> will refactor this later on, depending on how we do the lll_ parts.

This is missing a fixup I hadn't merged.  The code works as-is for the
current uses, but the docs for futed_*timed_wait claim that NULL can be
passed as timeout arg, which is not the case with this patch.  I'll fix
this with a follow-up.

> Tested on x86_64 gnu linux.  Not tested on i386 gnu linux because I get
> a build failure that doesn't seem to be related to my patch (multiple
> definitions of `_dl_addr_inside_object' and `_itoa').  I'll investigate
> the latter, and report if I see regressions due to my patch.

This seems to be due to the changes in nptl/cancellation.c.  I don't yet
know why, but will continue looking.  If anyone has a suggestion, please
let me know.
diff mbox

Patch

commit a3ef8c7b886f55a829ba0ff9e6d3ce2f2ae46cc5
Author: Torvald Riegel <triegel@redhat.com>
Date:   Thu Dec 4 14:12:23 2014 +0100

    Add wrappers for futex operations for glibc-internal use.

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 3c8e046..8895004 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -29,6 +29,8 @@ 
 #include <tls.h>
 #include <list.h>
 #include <lowlevellock.h>
+#include <libc-internal.h>
+#include <futex-internal.h>
 #include <kernel-features.h>
 #include <stack-aliasing.h>
 
@@ -987,7 +989,8 @@  setxid_mark_thread (struct xid_command *cmdp, struct pthread *t)
   if (t->setxid_futex == -1
       && ! atomic_compare_and_exchange_bool_acq (&t->setxid_futex, -2, -1))
     do
-      lll_futex_wait (&t->setxid_futex, -2, LLL_PRIVATE);
+      ignore_value (futex_wait ((unsigned int *) &t->setxid_futex, -2,
+				FUTEX_PRIVATE));
     while (t->setxid_futex == -2);
 
   /* Don't let the thread exit before the setxid handler runs.  */
@@ -1005,7 +1008,7 @@  setxid_mark_thread (struct xid_command *cmdp, struct pthread *t)
 	  if ((ch & SETXID_BITMASK) == 0)
 	    {
 	      t->setxid_futex = 1;
-	      lll_futex_wake (&t->setxid_futex, 1, LLL_PRIVATE);
+	      futex_wake ((unsigned int *) &t->setxid_futex, 1, FUTEX_PRIVATE);
 	    }
 	  return;
 	}
@@ -1032,7 +1035,7 @@  setxid_unmark_thread (struct xid_command *cmdp, struct pthread *t)
 
   /* Release the futex just in case.  */
   t->setxid_futex = 1;
-  lll_futex_wake (&t->setxid_futex, 1, LLL_PRIVATE);
+  futex_wake ((unsigned int *) &t->setxid_futex, 1, FUTEX_PRIVATE);
 }
 
 
@@ -1141,7 +1144,8 @@  __nptl_setxid (struct xid_command *cmdp)
       int cur = cmdp->cntr;
       while (cur != 0)
 	{
-	  lll_futex_wait (&cmdp->cntr, cur, LLL_PRIVATE);
+	  ignore_value (futex_wait ((unsigned int *) &cmdp->cntr, cur,
+				    FUTEX_PRIVATE));
 	  cur = cmdp->cntr;
 	}
     }
@@ -1254,7 +1258,8 @@  __wait_lookup_done (void)
 	continue;
 
       do
-	lll_futex_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE);
+	ignore_value (futex_wait ((unsigned int *) gscope_flagp,
+				  THREAD_GSCOPE_FLAG_WAIT, FUTEX_PRIVATE));
       while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT);
     }
 
@@ -1276,7 +1281,8 @@  __wait_lookup_done (void)
 	continue;
 
       do
-	lll_futex_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE);
+	ignore_value (futex_wait ((unsigned int *) gscope_flagp,
+				  THREAD_GSCOPE_FLAG_WAIT, FUTEX_PRIVATE));
       while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT);
     }
 
diff --git a/nptl/cancellation.c b/nptl/cancellation.c
index deac1eb..931acd4 100644
--- a/nptl/cancellation.c
+++ b/nptl/cancellation.c
@@ -18,6 +18,8 @@ 
 
 #include <setjmp.h>
 #include <stdlib.h>
+#include <libc-internal.h>
+#include <futex-internal.h>
 #include "pthreadP.h"
 
 
@@ -93,7 +95,8 @@  __pthread_disable_asynccancel (int oldtype)
   while (__builtin_expect ((newval & (CANCELING_BITMASK | CANCELED_BITMASK))
 			   == CANCELING_BITMASK, 0))
     {
-      lll_futex_wait (&self->cancelhandling, newval, LLL_PRIVATE);
+      ignore_value (futex_wait ((unsigned int *) &self->cancelhandling,
+				newval, FUTEX_PRIVATE));
       newval = THREAD_GETMEM (self, cancelhandling);
     }
 }
diff --git a/nptl/futex-internal.h b/nptl/futex-internal.h
new file mode 100644
index 0000000..5048219
--- /dev/null
+++ b/nptl/futex-internal.h
@@ -0,0 +1,226 @@ 
+/* futex operations for glibc-internal use.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef FUTEX_INTERNAL_H
+#define FUTEX_INTERNAL_H
+
+#include <errno.h>
+/* XXX include only lowlevellock-futex.h once transition is complete.  */
+#include <lowlevellock.h>
+#include <sys/time.h>
+#include <stdlib.h>
+
+/* This file defines futex operations used internally in glibc.  They are
+   wrappers for the syscalls and add glibc-specific error checking of the
+   syscall return value.  We abort on error codes that are caused by bugs in
+   glibc or in the calling application, or when an error code is not known.
+   We return error codes that can arise in correct executions to the caller.
+   Each operation calls out exactly the return values that callers need to
+   handle.
+
+   The private flag must be either FUTEX_PRIVATE or FUTEX_SHARED.
+
+   We expect callers to only use these operations if futexes are supported.
+
+   Due to POSIX requirements on when synchronization data structures such
+   as mutexes or semaphores can be destroyed and due to the futex design
+   having separate fast/slow paths for wake-ups, we need to consider that
+   futex_wake calls might effectively target a data structure that has been
+   destroyed and reused for another object, or unmapped; thus, some
+   errors or spurious wake-ups can happen in correct executions that would
+   not be possible in a program using just a single futex whose lifetime
+   does not end before the program terminates.  For background, see:
+   https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
+   https://lkml.org/lkml/2014/11/27/472  */
+
+#define FUTEX_PRIVATE LLL_PRIVATE
+#define FUTEX_SHARED  LLL_SHARED
+
+/* Returns FUTEX_PRIVATE if pshared is zero and private futexes are supported;
+   returns FUTEX_SHARED otherwise.  */
+static __always_inline int
+futex_private_if_supported (int pshared)
+{
+  if (pshared != 0)
+    return FUTEX_SHARED;
+#ifdef __ASSUME_PRIVATE_FUTEX
+  return FUTEX_PRIVATE;
+#else
+  return THREAD_GETMEM (THREAD_SELF, header.private_futex)
+      ^ FUTEX_PRIVATE_FLAG;
+#endif
+}
+
+
+/* Atomically wrt other futex operations, this blocks iff the value at
+   *FUTEX matches the expected value.  This is semantically equivalent to:
+     l = <get lock associated with futex> (FUTEX);
+     wait_flag = <get wait_flag associated with futex> (FUTEX);
+     lock (l);
+     val = atomic_load_relaxed (FUTEX);
+     if (val != expected) { unlock (l); return EAGAIN; }
+     atomic_store_relaxed (wait_flag, 1);
+     unlock (l);
+     // Now block; can time out in futex_time_wait (see below)
+     while (atomic_load_relaxed(wait_flag));
+
+   Note that no guarantee of a happens-before relation between a woken
+   futex_wait and a futex_wake is documented; however, this does not matter
+   in practice because we have to consider spurious wake-ups (see below),
+   and thus would not be able to reason which futex_wake woke us anyway.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns EAGAIN if the futex' value did not match the expected value.
+   Returns EINTR if signals or other spurious wake-ups happened.
+
+   Note that some previous code in glibc assumed the underlying futex
+   operation (e.g., syscall) to start with or include the equivalent of a
+   seq_cst fence; this allows one to avoid an explicit seq_cst fence before
+   a futex_wait call when synchronizing similar to Dekker synchronization.
+   However, we make no such guarantee here.
+   */
+static __always_inline int
+futex_wait (unsigned int *futex, unsigned int expected, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, NULL, private);
+  switch (err)
+    {
+    case 0:
+    case -EAGAIN:
+    case -EINTR:
+      return -err;
+
+    case -ETIMEDOUT: /* Cannot have happened as we provided no timeout.  */
+    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
+    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
+		     being normalized.  Must have been caused by a glibc or
+		     application bug.  */
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+/* Like futex_wait, but will eventually time out (i.e., stop being
+   blocked) after the duration of time provided (i.e., RELTIME) has
+   passed.  The caller must provide a normalized RELTIME.  RELTIME can also
+   equal NULL, in which case this function behaves equivalent to futex_wait.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns EAGAIN if the futex' value did not match the expected value.
+   Returns EINTR if signals or other spurious wake-ups happened.
+   Returns ETIMEDOUT if the timeout expired.
+   */
+static __always_inline int
+futex_reltimed_wait (unsigned int* futex, unsigned int expected,
+		     const struct timespec* reltime, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, reltime, private);
+  switch (err)
+    {
+    case 0:
+    case -EAGAIN:
+    case -EINTR:
+    case -ETIMEDOUT:
+      return -err;
+
+    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
+    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
+		     being normalized.  Must have been caused by a glibc or
+		     application bug.  */
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+/* Like futex_reltimed_wait, but the provided timeout (ABSTIME) is an
+   absolute point in time; a call will time out after this point in time.  */
+static __always_inline int
+futex_abstimed_wait (unsigned int* futex, unsigned int expected,
+		     const struct timespec* abstime, int private)
+{
+  int err = lll_futex_abstimed_wait (futex, expected, abstime, private);
+  switch (err)
+    {
+    case 0:
+    case -EAGAIN:
+    case -EINTR:
+    case -ETIMEDOUT:
+      return -err;
+
+    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
+    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
+		     being normalized.  Must have been caused by a glibc or
+		     application bug.  */
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+/* Atomically wrt other futex operations, this unblocks the specified
+   number of processes, or all processes blocked on this futex if there are
+   fewer than the specified number.  Semantically, this is equivalent to:
+     l = <get lock associated with futex> (futex);
+     lock (l);
+     for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
+       if (<no process blocked on futex>) break;
+       wf = <get wait_flag of a process blocked on futex> (futex);
+       // No happens-before guarantee with woken futex_wait (see above)
+       atomic_store_relaxed (wf, 0);
+     }
+     return res;
+
+   Note that we need to support futex_wake calls to past futexes whose memory
+   has potentially been reused due to POSIX' requirements on synchronization
+   object destruction (see above); therefore, we must not report or abort
+   on most errors.  */
+static __always_inline void
+futex_wake (unsigned int* futex, int processes_to_wake, int private)
+{
+  int res = lll_futex_wake (futex, processes_to_wake, private);
+  /* No error.  Ignore the number of woken processes.  */
+  if (res >= 0)
+    return;
+  switch (res)
+    {
+    case -EFAULT: /* Could have happened due to memory reuse.  */
+    case -EINVAL: /* Could be either due to incorrect alignment (a bug in
+		     glibc or in the application) or due to memory being
+		     reused for a PI futex.  We cannot distinguish between the
+		     two causes, and one of them is correct use, so we do not
+		     act in this case.  */
+      return;
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+#endif  /* futex-internal.h */
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 1ad41c5..622ac90 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -34,6 +34,7 @@ 
 #include <shlib-compat.h>
 #include <smp.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
 #include <kernel-features.h>
 #include <libc-internal.h>
 
@@ -278,10 +279,10 @@  sighandler_setxid (int sig, siginfo_t *si, void *ctx)
 
   /* And release the futex.  */
   self->setxid_futex = 1;
-  lll_futex_wake (&self->setxid_futex, 1, LLL_PRIVATE);
+  futex_wake ((unsigned int *) &self->setxid_futex, 1, FUTEX_PRIVATE);
 
   if (atomic_decrement_val (&__xidcmd->cntr) == 0)
-    lll_futex_wake (&__xidcmd->cntr, 1, LLL_PRIVATE);
+    futex_wake ((unsigned int *) &__xidcmd->cntr, 1, FUTEX_PRIVATE);
 }
 #endif
 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 71a5619..564a63a 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -31,6 +31,7 @@ 
 #include <kernel-features.h>
 #include <exit-thread.h>
 #include <default-sched.h>
+#include <futex-internal.h>
 
 #include <shlib-compat.h>
 
@@ -269,7 +270,7 @@  START_THREAD_DEFN
 
   /* Allow setxid from now onwards.  */
   if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
-    lll_futex_wake (&pd->setxid_futex, 1, LLL_PRIVATE);
+    futex_wake ((unsigned int *) &pd->setxid_futex, 1, FUTEX_PRIVATE);
 
 #ifdef __NR_set_robust_list
 # ifndef __ASSUME_SET_ROBUST_LIST
@@ -414,7 +415,7 @@  START_THREAD_DEFN
 	  this->__list.__next = NULL;
 
 	  atomic_or (&this->__lock, FUTEX_OWNER_DIED);
-	  lll_futex_wake (&this->__lock, 1, /* XYZ */ LLL_SHARED);
+	  futex_wake (&this->__lock, 1, /* XYZ */ FUTEX_SHARED);
 	}
       while (robust != (void *) &pd->robust_head);
     }
@@ -442,7 +443,8 @@  START_THREAD_DEFN
       /* Some other thread might call any of the setXid functions and expect
 	 us to reply.  In this case wait until we did that.  */
       do
-	lll_futex_wait (&pd->setxid_futex, 0, LLL_PRIVATE);
+	ignore_value (futex_wait ((unsigned int *) &pd->setxid_futex, 0,
+				  FUTEX_PRIVATE));
       while (pd->cancelhandling & SETXID_BITMASK);
 
       /* Reset the value so that the stack can be reused.  */
@@ -683,7 +685,7 @@  __pthread_create_2_1 (newthread, attr, start_routine, arg)
 	     stillborn thread.  */
 	  if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0)
 				== -2))
-	    lll_futex_wake (&pd->setxid_futex, 1, LLL_PRIVATE);
+	    futex_wake ((unsigned int *) &pd->setxid_futex, 1, FUTEX_PRIVATE);
 
 	  /* Free the resources.  */
 	  __deallocate_stack (pd);
diff --git a/nptl/pthread_once.c b/nptl/pthread_once.c
index fe6d923..e48646f 100644
--- a/nptl/pthread_once.c
+++ b/nptl/pthread_once.c
@@ -17,8 +17,9 @@ 
    <http://www.gnu.org/licenses/>.  */
 
 #include "pthreadP.h"
-#include <lowlevellock.h>
+#include <futex-internal.h>
 #include <atomic.h>
+#include <libc-internal.h>
 
 
 unsigned long int __fork_generation attribute_hidden;
@@ -35,7 +36,7 @@  clear_once_control (void *arg)
      get interrupted (see __pthread_once), so all we need to relay to other
      threads is the state being reset again.  */
   atomic_store_relaxed (once_control, 0);
-  lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
+  futex_wake ((unsigned int *) once_control, INT_MAX, FUTEX_PRIVATE);
 }
 
 
@@ -100,8 +101,11 @@  __pthread_once_slow (pthread_once_t *once_control, void (*init_routine) (void))
 	     is set and __PTHREAD_ONCE_DONE is not.  */
 	  if (val == newval)
 	    {
-	      /* Same generation, some other thread was faster. Wait.  */
-	      lll_futex_wait (once_control, newval, LLL_PRIVATE);
+	      /* Same generation, some other thread was faster.  Wait and
+		 retry.  Ignore the return value because all possible
+		 values (0, EAGAIN, EINTR) need to be handled the same.  */
+	      ignore_value (futex_wait ((unsigned int *)once_control,
+					(unsigned int) newval, FUTEX_PRIVATE));
 	      continue;
 	    }
 	}
@@ -122,7 +126,7 @@  __pthread_once_slow (pthread_once_t *once_control, void (*init_routine) (void))
       atomic_store_release (once_control, __PTHREAD_ONCE_DONE);
 
       /* Wake up all other threads.  */
-      lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
+      futex_wake ((unsigned int *) once_control, INT_MAX, FUTEX_PRIVATE);
       break;
     }
 
diff --git a/nptl/pthread_rwlock_rdlock.c b/nptl/pthread_rwlock_rdlock.c
index 0edca65..b3879bf 100644
--- a/nptl/pthread_rwlock_rdlock.c
+++ b/nptl/pthread_rwlock_rdlock.c
@@ -19,6 +19,8 @@ 
 #include <errno.h>
 #include <sysdep.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
+#include <libc-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <stap-probe.h>
@@ -58,9 +60,10 @@  __pthread_rwlock_rdlock_slow (pthread_rwlock_t *rwlock)
       /* Free the lock.  */
       lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
 
-      /* Wait for the writer to finish.  */
-      lll_futex_wait (&rwlock->__data.__readers_wakeup, waitval,
-		      rwlock->__data.__shared);
+      /* Wait for the writer to finish.  We do not check the return value
+	 because decide how to continue based on the state of the rwlock.  */
+      ignore_value (futex_wait (&rwlock->__data.__readers_wakeup, waitval,
+				rwlock->__data.__shared));
 
       /* Get the lock.  */
       lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
diff --git a/nptl/pthread_rwlock_timedrdlock.c b/nptl/pthread_rwlock_timedrdlock.c
index 0212b49..d17b6de 100644
--- a/nptl/pthread_rwlock_timedrdlock.c
+++ b/nptl/pthread_rwlock_timedrdlock.c
@@ -19,10 +19,10 @@ 
 #include <errno.h>
 #include <sysdep.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <sys/time.h>
-#include <kernel-features.h>
 
 
 /* Try to acquire read lock for RWLOCK or return after specfied time.  */
@@ -78,38 +78,6 @@  pthread_rwlock_timedrdlock (rwlock, abstime)
 	  break;
 	}
 
-      /* Work around the fact that the kernel rejects negative timeout values
-	 despite them being valid.  */
-      if (__glibc_unlikely (abstime->tv_sec < 0))
-	{
-	  result = ETIMEDOUT;
-	  break;
-	}
-
-#if (!defined __ASSUME_FUTEX_CLOCK_REALTIME \
-     || !defined lll_futex_timed_wait_bitset)
-      /* Get the current time.  So far we support only one clock.  */
-      struct timeval tv;
-      (void) __gettimeofday (&tv, NULL);
-
-      /* Convert the absolute timeout value to a relative timeout.  */
-      struct timespec rt;
-      rt.tv_sec = abstime->tv_sec - tv.tv_sec;
-      rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000;
-      if (rt.tv_nsec < 0)
-	{
-	  rt.tv_nsec += 1000000000;
-	  --rt.tv_sec;
-	}
-      /* Did we already time out?  */
-      if (rt.tv_sec < 0)
-	{
-	  /* Yep, return with an appropriate error.  */
-	  result = ETIMEDOUT;
-	  break;
-	}
-#endif
-
       /* Remember that we are a reader.  */
       if (++rwlock->__data.__nr_readers_queued == 0)
 	{
@@ -124,17 +92,11 @@  pthread_rwlock_timedrdlock (rwlock, abstime)
       /* Free the lock.  */
       lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
 
-      /* Wait for the writer to finish.  */
-#if (!defined __ASSUME_FUTEX_CLOCK_REALTIME \
-     || !defined lll_futex_timed_wait_bitset)
-      err = lll_futex_timed_wait (&rwlock->__data.__readers_wakeup,
-				  waitval, &rt, rwlock->__data.__shared);
-#else
-      err = lll_futex_timed_wait_bitset (&rwlock->__data.__readers_wakeup,
-					 waitval, abstime,
-					 FUTEX_CLOCK_REALTIME,
-					 rwlock->__data.__shared);
-#endif
+      /* Wait for the writer to finish.  We handle ETIMEDOUT below; on other
+	 return values, we decide how to continue based on the state of the
+	 rwlock.  */
+      err = futex_abstimed_wait (&rwlock->__data.__readers_wakeup, waitval,
+				 abstime, rwlock->__data.__shared);
 
       /* Get the lock.  */
       lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
@@ -142,7 +104,7 @@  pthread_rwlock_timedrdlock (rwlock, abstime)
       --rwlock->__data.__nr_readers_queued;
 
       /* Did the futex call time out?  */
-      if (err == -ETIMEDOUT)
+      if (err == ETIMEDOUT)
 	{
 	  /* Yep, report it.  */
 	  result = ETIMEDOUT;
diff --git a/nptl/pthread_rwlock_timedwrlock.c b/nptl/pthread_rwlock_timedwrlock.c
index 416f6e2..4c1d322 100644
--- a/nptl/pthread_rwlock_timedwrlock.c
+++ b/nptl/pthread_rwlock_timedwrlock.c
@@ -19,10 +19,10 @@ 
 #include <errno.h>
 #include <sysdep.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <sys/time.h>
-#include <kernel-features.h>
 
 
 /* Try to acquire write lock for RWLOCK or return after specfied time.	*/
@@ -69,37 +69,6 @@  pthread_rwlock_timedwrlock (rwlock, abstime)
 	  break;
 	}
 
-      /* Work around the fact that the kernel rejects negative timeout values
-	 despite them being valid.  */
-      if (__glibc_unlikely (abstime->tv_sec < 0))
-	{
-	  result = ETIMEDOUT;
-	  break;
-	}
-
-#if (!defined __ASSUME_FUTEX_CLOCK_REALTIME \
-     || !defined lll_futex_timed_wait_bitset)
-      /* Get the current time.  So far we support only one clock.  */
-      struct timeval tv;
-      (void) __gettimeofday (&tv, NULL);
-
-      /* Convert the absolute timeout value to a relative timeout.  */
-      struct timespec rt;
-      rt.tv_sec = abstime->tv_sec - tv.tv_sec;
-      rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000;
-      if (rt.tv_nsec < 0)
-	{
-	  rt.tv_nsec += 1000000000;
-	  --rt.tv_sec;
-	}
-      /* Did we already time out?  */
-      if (rt.tv_sec < 0)
-	{
-	  result = ETIMEDOUT;
-	  break;
-	}
-#endif
-
       /* Remember that we are a writer.  */
       if (++rwlock->__data.__nr_writers_queued == 0)
 	{
@@ -114,17 +83,11 @@  pthread_rwlock_timedwrlock (rwlock, abstime)
       /* Free the lock.  */
       lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
 
-      /* Wait for the writer or reader(s) to finish.  */
-#if (!defined __ASSUME_FUTEX_CLOCK_REALTIME \
-     || !defined lll_futex_timed_wait_bitset)
-      err = lll_futex_timed_wait (&rwlock->__data.__writer_wakeup,
-				  waitval, &rt, rwlock->__data.__shared);
-#else
-      err = lll_futex_timed_wait_bitset (&rwlock->__data.__writer_wakeup,
-					 waitval, abstime,
-					 FUTEX_CLOCK_REALTIME,
-					 rwlock->__data.__shared);
-#endif
+      /* Wait for the writer or reader(s) to finish.  We handle ETIMEDOUT
+	 below; on other return values, we decide how to continue based on
+	 the state of the rwlock.  */
+      err = futex_abstimed_wait (&rwlock->__data.__writer_wakeup, waitval,
+				 abstime, rwlock->__data.__shared);
 
       /* Get the lock.  */
       lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
@@ -133,7 +96,7 @@  pthread_rwlock_timedwrlock (rwlock, abstime)
       --rwlock->__data.__nr_writers_queued;
 
       /* Did the futex call time out?  */
-      if (err == -ETIMEDOUT)
+      if (err == ETIMEDOUT)
 	{
 	  result = ETIMEDOUT;
 	  break;
diff --git a/nptl/pthread_rwlock_unlock.c b/nptl/pthread_rwlock_unlock.c
index b8336f6..6513849 100644
--- a/nptl/pthread_rwlock_unlock.c
+++ b/nptl/pthread_rwlock_unlock.c
@@ -19,6 +19,8 @@ 
 #include <errno.h>
 #include <sysdep.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
+#include <libc-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <stap-probe.h>
@@ -46,16 +48,16 @@  __pthread_rwlock_unlock (pthread_rwlock_t *rwlock)
 	{
 	  ++rwlock->__data.__writer_wakeup;
 	  lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
-	  lll_futex_wake (&rwlock->__data.__writer_wakeup, 1,
-			  rwlock->__data.__shared);
+	  futex_wake (&rwlock->__data.__writer_wakeup, 1,
+		      rwlock->__data.__shared);
 	  return 0;
 	}
       else if (rwlock->__data.__nr_readers_queued)
 	{
 	  ++rwlock->__data.__readers_wakeup;
 	  lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
-	  lll_futex_wake (&rwlock->__data.__readers_wakeup, INT_MAX,
-			  rwlock->__data.__shared);
+	  futex_wake (&rwlock->__data.__readers_wakeup, INT_MAX,
+		      rwlock->__data.__shared);
 	  return 0;
 	}
     }
diff --git a/nptl/pthread_rwlock_wrlock.c b/nptl/pthread_rwlock_wrlock.c
index 835a62f..fe6c488 100644
--- a/nptl/pthread_rwlock_wrlock.c
+++ b/nptl/pthread_rwlock_wrlock.c
@@ -19,6 +19,8 @@ 
 #include <errno.h>
 #include <sysdep.h>
 #include <lowlevellock.h>
+#include <futex-internal.h>
+#include <libc-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <stap-probe.h>
@@ -58,9 +60,11 @@  __pthread_rwlock_wrlock_slow (pthread_rwlock_t *rwlock)
       /* Free the lock.  */
       lll_unlock (rwlock->__data.__lock, rwlock->__data.__shared);
 
-      /* Wait for the writer or reader(s) to finish.  */
-      lll_futex_wait (&rwlock->__data.__writer_wakeup, waitval,
-		      rwlock->__data.__shared);
+      /* Wait for the writer or reader(s) to finish.  We do not check the
+	 return value because decide how to continue based on the state of
+	 the rwlock.  */
+      ignore_value (futex_wait (&rwlock->__data.__writer_wakeup, waitval,
+				rwlock->__data.__shared));
 
       /* Get the lock.  */
       lll_lock (rwlock->__data.__lock, rwlock->__data.__shared);
diff --git a/nptl/unregister-atfork.c b/nptl/unregister-atfork.c
index 3838cb7..94156bd 100644
--- a/nptl/unregister-atfork.c
+++ b/nptl/unregister-atfork.c
@@ -20,6 +20,8 @@ 
 #include <stdlib.h>
 #include <fork.h>
 #include <atomic.h>
+#include <libc-internal.h>
+#include <futex-internal.h>
 
 
 void
@@ -114,7 +116,8 @@  __unregister_atfork (dso_handle)
       atomic_decrement (&deleted->handler->refcntr);
       unsigned int val;
       while ((val = deleted->handler->refcntr) != 0)
-	lll_futex_wait (&deleted->handler->refcntr, val, LLL_PRIVATE);
+	ignore_value (futex_wait (&deleted->handler->refcntr, val,
+		      FUTEX_PRIVATE));
 
       deleted = deleted->next;
     }
diff --git a/sysdeps/nptl/aio_misc.h b/sysdeps/nptl/aio_misc.h
index fb69b0f..672c2ad 100644
--- a/sysdeps/nptl/aio_misc.h
+++ b/sysdeps/nptl/aio_misc.h
@@ -22,14 +22,14 @@ 
 
 #include <assert.h>
 #include <nptl/pthreadP.h>
-#include <lowlevellock.h>
+#include <nptl/futex-internal.h>
 
 #define DONT_NEED_AIO_MISC_COND	1
 
 #define AIO_MISC_NOTIFY(waitlist) \
   do {									      \
     if (*waitlist->counterp > 0 && --*waitlist->counterp == 0)		      \
-      lll_futex_wake (waitlist->counterp, 1, LLL_PRIVATE);		      \
+      futex_wake ((unsigned int *) waitlist->counterp, 1, FUTEX_PRIVATE);     \
   } while (0)
 
 #define AIO_MISC_WAIT(result, futex, timeout, cancel)			      \
@@ -48,9 +48,9 @@ 
 	int status;							      \
 	do								      \
 	  {								      \
-	    status = lll_futex_timed_wait (futexaddr, oldval, timeout,	      \
-					   LLL_PRIVATE);		      \
-	    if (status != -EWOULDBLOCK)					      \
+	    status = futex_reltimed_wait ((unsigned int *) futexaddr,	      \
+					  oldval, timeout, FUTEX_PRIVATE);    \
+	    if (status != EAGAIN)					      \
 	      break;							      \
 									      \
 	    oldval = *futexaddr;					      \
@@ -60,12 +60,12 @@ 
 	if (cancel)							      \
 	  LIBC_CANCEL_RESET (oldtype);					      \
 									      \
-	if (status == -EINTR)						      \
+	if (status == EINTR)						      \
 	  result = EINTR;						      \
-	else if (status == -ETIMEDOUT)					      \
+	else if (status == ETIMEDOUT)					      \
 	  result = EAGAIN;						      \
 	else								      \
-	  assert (status == 0 || status == -EWOULDBLOCK);		      \
+	  assert (status == 0 || status == EAGAIN);			      \
 									      \
 	pthread_mutex_lock (&__aio_requests_mutex);			      \
       }									      \
diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index 5cffd82..f540bde 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -30,6 +30,7 @@ 
 #include <nptl/pthreadP.h>
 #include <fork.h>
 #include <arch-fork.h>
+#include <nptl/futex-internal.h>
 
 
 static void
@@ -219,7 +220,7 @@  __libc_fork (void)
 
 	  if (atomic_decrement_and_test (&allp->handler->refcntr)
 	      && allp->handler->need_signal)
-	    lll_futex_wake (allp->handler->refcntr, 1, LLL_PRIVATE);
+	    futex_wake (&allp->handler->refcntr, 1, FUTEX_PRIVATE);
 
 	  allp = allp->next;
 	}
diff --git a/sysdeps/nptl/gai_misc.h b/sysdeps/nptl/gai_misc.h
index bb83dca..f8af189 100644
--- a/sysdeps/nptl/gai_misc.h
+++ b/sysdeps/nptl/gai_misc.h
@@ -23,14 +23,14 @@ 
 #include <assert.h>
 #include <signal.h>
 #include <nptl/pthreadP.h>
-#include <lowlevellock.h>
+#include <nptl/futex-internal.h>
 
 #define DONT_NEED_GAI_MISC_COND	1
 
 #define GAI_MISC_NOTIFY(waitlist) \
   do {									      \
     if (*waitlist->counterp > 0 && --*waitlist->counterp == 0)		      \
-      lll_futex_wake (waitlist->counterp, 1, LLL_PRIVATE);		      \
+      futex_wake ((unsigned int *) waitlist->counterp, 1, FUTEX_PRIVATE);     \
   } while (0)
 
 #define GAI_MISC_WAIT(result, futex, timeout, cancel) \
@@ -49,9 +49,9 @@ 
 	int status;							      \
 	do								      \
 	  {								      \
-	    status = lll_futex_timed_wait (futexaddr, oldval, timeout,	      \
-					   LLL_PRIVATE);		      \
-	    if (status != -EWOULDBLOCK)					      \
+	    status = futex_reltimed_wait ((unsigned int *)futexaddr,	      \
+					  oldval, timeout, FUTEX_PRIVATE);    \
+	    if (status != EAGAIN)					      \
 	      break;							      \
 									      \
 	    oldval = *futexaddr;					      \
@@ -61,12 +61,12 @@ 
 	if (cancel)							      \
 	  LIBC_CANCEL_RESET (oldtype);					      \
 									      \
-	if (status == -EINTR)						      \
+	if (status == EINTR)						      \
 	  result = EINTR;						      \
-	else if (status == -ETIMEDOUT)					      \
+	else if (status == ETIMEDOUT)					      \
 	  result = EAGAIN;						      \
 	else								      \
-	  assert (status == 0 || status == -EWOULDBLOCK);		      \
+	  assert (status == 0 || status == EAGAIN);			      \
 									      \
 	pthread_mutex_lock (&__gai_requests_mutex);			      \
       }									      \
diff --git a/sysdeps/nptl/lowlevellock-futex.h b/sysdeps/nptl/lowlevellock-futex.h
index a095ad9..883f144 100644
--- a/sysdeps/nptl/lowlevellock-futex.h
+++ b/sysdeps/nptl/lowlevellock-futex.h
@@ -43,6 +43,11 @@ 
 #define lll_futex_timed_wait(futexp, val, timeout, private)             \
   -ENOSYS
 
+/* Wait until a lll_futex_wake call on FUTEXP, or ABSTIME, a point in time
+   counted by CLOCK_REALTIME, has passed.  ABSTIME must be normalized.  */
+#define lll_futex_abstimed_wait(futexp, val, abstime, private) \
+  -ENOSYS
+
 /* This macro should be defined only if FUTEX_CLOCK_REALTIME is also defined.
    If CLOCKBIT is zero, this is identical to lll_futex_timed_wait.
    If CLOCKBIT has FUTEX_CLOCK_REALTIME set, then it's the same but
diff --git a/sysdeps/unix/sysv/linux/i386/lowlevellock-futex.h b/sysdeps/unix/sysv/linux/i386/lowlevellock-futex.h
index f08b5b8..f290085 100644
--- a/sysdeps/unix/sysv/linux/i386/lowlevellock-futex.h
+++ b/sysdeps/unix/sysv/linux/i386/lowlevellock-futex.h
@@ -19,6 +19,10 @@ 
 #ifndef _LOWLEVELLOCK_FUTEX_H
 #define _LOWLEVELLOCK_FUTEX_H	1
 
+#ifndef __ASSEMBLER__
+#include <sys/time.h>
+#endif
+
 #define FUTEX_WAIT		0
 #define FUTEX_WAKE		1
 #define FUTEX_CMP_REQUEUE	4
@@ -114,6 +118,31 @@ 
   })
 
 
+#define lll_futex_abstimed_wait(futexp, val, abstime, private)		\
+  ({									\
+    /* Get the current time.  So far we support only one clock.  */	\
+    struct timeval tv;							\
+    (void) __gettimeofday (&tv, NULL);					\
+    /* Convert the absolute timeout value to a relative timeout.  */	\
+    struct timespec rt;							\
+    rt.tv_sec = (abstime)->tv_sec - tv.tv_sec;				\
+    rt.tv_nsec = (abstime)->tv_nsec - tv.tv_usec * 1000;		\
+    if (rt.tv_nsec < 0)							\
+      {									\
+	rt.tv_nsec += 1000000000;					\
+	--rt.tv_sec;							\
+      }									\
+    /* Did we already time out?  */					\
+    int result;								\
+    if (rt.tv_sec < 0)							\
+      /* Yep, return with an appropriate error.  */			\
+      result = -ETIMEDOUT;						\
+    else								\
+      result = lll_futex_timed_wait (futexp, val, &rt, private);	\
+    result;								\
+  })
+
+
 #define lll_futex_wake(futex, nr, private) \
   ({									      \
     int __status;							      \
diff --git a/sysdeps/unix/sysv/linux/lowlevellock-futex.h b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
index 59f6627..570b99c 100644
--- a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
+++ b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
@@ -22,6 +22,7 @@ 
 #ifndef __ASSEMBLER__
 #include <sysdep.h>
 #include <tls.h>
+#include <sys/time.h>
 #include <kernel-features.h>
 #endif
 
@@ -92,6 +93,19 @@ 
 		     __lll_private_flag (FUTEX_WAIT, private),  \
 		     val, timeout)
 
+#define lll_futex_abstimed_wait(futexp, val, abstime, private)		    \
+  ({									    \
+    /* Work around the fact that the kernel rejects negative timeout values \
+       despite them being valid.  */					    \
+    int ret;								    \
+    if (__glibc_unlikely ((abstime)->tv_sec < 0))			    \
+      ret = -ETIMEDOUT;							    \
+    else								    \
+      ret = lll_futex_timed_wait_bitset (futexp, val, abstime,		    \
+					 FUTEX_CLOCK_REALTIME, private);    \
+    ret;								    \
+  })
+
 #define lll_futex_timed_wait_bitset(futexp, val, timeout, clockbit, private) \
   lll_futex_syscall (6, futexp,                                         \
 		     __lll_private_flag (FUTEX_WAIT_BITSET | (clockbit), \