[1/2] Add futex wrappers with error checking

Message ID 1417804656.22797.107.camel@triegel.csb
State Dropped
Headers

Commit Message

Torvald Riegel Dec. 5, 2014, 6:37 p.m. UTC
  On Thu, 2014-12-04 at 16:33 -0800, Roland McGrath wrote:
> I'm not entirely clear on why this is separate from the lll_futex_*
> layer rather than replacing it.  I understand the benefits of
> incremental change, of course.  Is that the only reason?

Incremental change is one reason.

The second is that I haven't looked through all the lowlevellock cases
yet, so didn't want to touch that; it seemed moving lll_futex* callers
over to futex* callers wouldn't be an issue later on.

Third, some of the lll_futex* definitions are in headers that are also
used from asm files; I guess that would mean I'd need to use macros
instead of C functions.

Fourth, I need some way to get to the arch-specific futex syscalls.  I
didn't know whether sysdeps/unix/sysv/linux/lowlevellock-futex.h would
work on every arch, so I just used what works for the locks.

> I don't
> think we want to have both layers as such in the long run,

Maybe not.  If we want to expose our own futex abstraction to users,
we'd need a separate version that does less of the error checking we do,
as there may be cases where certain errors would need to be handled
differently.  You point out something similar below; checking that the
kernel (or whatever below provides the futex functionality) didn't
return errors we haven't specified in our futex abstraction.

> and it's
> not clear to me what the end state of this cleanup will be in your
> vision.

I didn't think about clean-up as much.  What I wanted is something we
can use today to get the futex error handling correct in pthread_once
and the the semaphores I'm about to submit, for example.

> Perhaps we should do some more thorough design of the final
> internal API we want to have, and then figure out the incremental path
> to get there with bite-sized changes.

Fine with me.  From my perspective, it seemed best to start with an
abstraction with well-defined semantics (that's what I tried to do in
futex-internal.h at least), so that all clients of it are taken care of.
Any clean-up under the hood of it could be then done independently.

I think I have a pretty good understanding for what the futex semantics
of the abstraction that we use internally should be.  I don't have a
good feel for how to best clean up all the existing code we have related
to that.

> This is dovetailing (or colliding, to be pessimistic) with more
> cleanup and refactoring that I'm starting to do as part of my Native
> Client port.  So I'll start by just throwing out there all the issues
> I'm aware of off hand, in hopes that solving each might naturally be
> folded into what you're doing.
> 
> The internal API of lowlevellock-futex.h needs to be cleaned up and
> specified more thoroughly in a few ways.

Agreed.  That was my goal for futex-internal.h too.

> * FUTEX_PRIVATE_FLAG is not part of the stated API (as described in
>   the stub file, sysdeps/nptl/lowlevellock-futex.h).  But it is used
>   in "generic" NPTL code, with implicit assumptions about its
>   relationship to LLL_PRIVATE and LLL_SHARED.  What I'd like to see is
>   both use of FUTEX_PRIVATE_FLAG and __ASSUME_PRIVATE_FUTEX disappear
>   from generic code.

Agreed.  That's why I introduce FUTEX_PRIVATE and FUTEX_SHARED as
constants in my patch, and have the futex_private_if_supported helper.
The current code is often hard to follow, including when to xor
FUTEX_PRIVATE_FLAG and when not to.  Using just two constants everywhere
seems better.  And all the futex calls are on the slow path anyway.

> Instead, we can have some sysdeps-defined
>   inlines akin to your futex_private_if_supported, but both sides of
>   the coin: one for the cases like pthread_barrier_init and one for
>   the cases like pthread_barrier_wait.

I'm not sure we actually need something for pthread_barrier_wait.  All
uses of the private field are xor'd with FUTEX_PRIVATE_FLAG.  It seemed
to me that just having a properly set up value for the private field in
the first place would suffice.

> Ideally, this would also cover
>   detection of support in the opposite direction from what we've ever
>   dealt with before: when shared is the unavailable option, so that
>   pthread_barrierattr_setpshared et al can fail with ENOTSUP for any
>   request for pshared semantics.
> 
>   Today we actually do not have any configurations where the answer is
>   dynamic, though I'd like to support them for the future.  On Linux,
>   we always set __ASSUME_PRIVATE_FUTEX.  (Older Linux kernels did not
>   always support private, but shared can always stand in for private
>   so it's not a user-visible distinction.  Nowadays our minimum
>   supported kernel version is one that supports private.)  On NaCl,
>   all futexes are private and shared is just not available.  Hence I'd
>   like to make setpshared fail properly rather than lie.  But it's not
>   unlikely that at some point in the future, NaCl will support shared
>   and then it would be a dynamic check to determine whether it's
>   available or not.

Interesting.  I haven't thought about the case where shared is not
supported.

I suppose "dynamic" would still mean that this is stable throughout the
lifetime of the process?

>   I'm sure this can be done in a way that does not change the compiled
>   code at all for Linux.

Probably, but personally I wouldn't worry about that.  When we do the
transition to the new internal futex API, we'd change code anyway
because of adding more error checking and such.

Or are you worried about having to potentially test versioned code?

Also, I have the new semaphores I'm about to submit, have a new condvar
implementation that is also close to being finished, and rwlocks on my
todo list.  They would all be C11 atomics based, and use a futex API
with proper error checking.  So, there would be a lot of code change in
the major uses of futexes anyway (when ignoring mutexes).

> I'm even sure it can be done in a way that
>   would not change the compiled code for Linux cases without
>   __ASSUME_PRIVATE_FUTEX, if there still were such.  But I haven't
>   thought up the right API for that off hand.  And frankly, I get a
>   bit dizzy every time I try to think through all the XORs and which
>   places store the bit in which sense.

Yes, me too :)
I think the patch is correct for (the new) semaphores and pthread_once
at least -- but I haven't gone through all the other clients yet.

> * We haven't properly specified the exact types of pointer arguments
>   in lll_futex_* calls.  In NaCl these are implemented by eventually
>   calling actual C functions with similar signatures, as opposed to
>   many layers of macro turning into asm with operands that just have
>   to be pointer-sized.  Our uses are actually inconsistent about
>   whether it's an 'int *' or an 'unsigned int *' and about whether
>   it's volatile or not.

I also though about int vs. unsigned int for a while.  The Linux kernel
has int, but when I look at the synchronization code I'm using futexes
for, in most of the cases it's an unsigned you want to work with.
Therefore, I picked unsigned.

> So my build has lots of volatileness and
>   pointee signedness warnings that are not easily vanquished without
>   resorting to casts that could mask real bugs.

Nothing needs to be volatile there, IMO.  If anything, this should be an
atomic type.

> The Linux
>   implementations indirectly have casts that could indeed mask real
>   bugs.  It would be far better to have inlines with specific types
>   and clean up all our usage.

Agreed.

> * I really want to completely excise the inane "negated errno" return
>   value convention from all our internal APIs.  That is not even a
>   true Linuxism, it's a style copied from Linux kernel internals that
>   does not even map to what the user ABI for syscall errors is on all
>   machines.  All new or cleaned-up interfaces should just use the
>   straightforward POSIXy "errno or zero" convention instead.  (That's
>   what the underlying OS interface C functions in NaCl do, so today my
>   macros all do -function(...) and I get oodles of -Wunused-value
>   warnings for all the places we lack error checking today.)

Fine with me.
  
>   The only reason not to use that convention is if you needed some
>   content in the return value other than error indication.  The Linux
>   futex syscall interface does return such values (FUTEX_WAKE), but we
>   do not actually use them anywhere at all.

And in particular in FUTEX_WAKE, I see no error that we actually need to
return given the way we use futexes.  With things like the mutex
destruction issue, FUTEX_WAKE can really hit anything, reused memory,
unmapped memory, etc.
That's probably something to watch out for in the NaCl implementation
too.

> If we ever did need them,
>   I'd advocate for using out parameters in our internal APIs instead,
>   even if the Linux implementation transmutes part of the return value
>   space into the out parameter.  (Such layers will be all inlines
>   anyway, so it shouldn't even make a microoptimization difference.)

Fine with me.

> > It is implemented on top of lll_futex_*, so that we can expose the raw
> > futex to users via a syscall wrapper (or external futex_wait,...
> > functions), should we want to do that in the future.
> 
> I'm not sure how that potential future relates to the layering
> choices.  At any rate, I don't think we should choose our internal
> layering based on speculation about such future uses if it results in
> deciding on more complex internals (extra layers and the like) now.
> We can always refactor in the future when everything is clearer.
> 
> > 	* nptl/futex-internal.h: New file.
> 
> Just as a procedural matter, I'm inclined to say that a new file like
> this should come in the same commit as the first use of it.

Okay.  Do you want me to merge the two patches?

> Otherwise, even total build-breaking errors in the file wouldn't be
> noticed until the next change.  Likewise, it seems best to leave out
> things (e.g. futex_private_if_supported) until the commit that
> actually introduces a use.
> 
> > +#include <lowlevellock.h>
> 
> Include only what you need: lowlevellock-futex.h here.  That changes
> which code you're getting today, because all the machine-specific
> lowlevellock.h files still need to be removed.  But we should be
> finishing that cleanup this cycle anyway (though everyone seems to
> have forgotten).

I tried that now, but that doesn't work because it redefines lll_futex*,
and it's hard to avoid including lowlevellock.h through some other
header.  Therefore, I left this unchanged for now.

> > +  abort();
> 
> Space before paren.  Also, need #include <stdlib.h> to use abort.
> 
> So, now I'm seeing a potential reason to have this layer exist
> distinct from the OS-encapsulation layer.  Perhaps we should have the
> checks for expected errno values be in an OS-independent layer rather
> than just saying in the specification of the OS-encapsulation layer
> that it must yield only the particular set.

I'm not sure I can quite follow you.  I could see why the
OS-encapsulation layer would want to check that the set of return values
is only those we support in higher layers, but that's not what you're
after, or is it?

Updated patch is attached.  Is this one okay, or do you want to see
further changes to it and/or more of the full problem being addressed?
  

Comments

Roland McGrath Dec. 12, 2014, 1:18 a.m. UTC | #1
> The second is that I haven't looked through all the lowlevellock cases
> yet, so didn't want to touch that; it seemed moving lll_futex* callers
> over to futex* callers wouldn't be an issue later on.

Fair enough.  I'm happy as long as the end state has no duplication of
code/logic (one internal futex interface to rule them all) and you commit
to driving all the corners of the cleanup so we get to that end state in
this cycle.

> Third, some of the lll_futex* definitions are in headers that are also
> used from asm files; I guess that would mean I'd need to use macros
> instead of C functions.

#ifdef __ASSEMBLY__ if need be.  But we can also just clean things up so
that's no longer the case.  There's no reason I can see why assembly code
should want lowlevellock-futex.h.

> Fourth, I need some way to get to the arch-specific futex syscalls.  I
> didn't know whether sysdeps/unix/sysv/linux/lowlevellock-futex.h would
> work on every arch, so I just used what works for the locks.

What's an arch-specific futex syscall?  AFAIK
sysdeps/unix/sysv/linux/lowlevellock-futex.h should be fine for all
machines.  Indeed sysdeps/unix/sysv/linux/lowlevellock.h should be fine for
all machines too, and the only reason we still have any machine-specific
files is conservatism about making sure that removing each one doesn't
degrade any performance or semantics (so someone just needs to look at the
generated code and compare for each machine).

> > I don't
> > think we want to have both layers as such in the long run,
> 
> Maybe not.  If we want to expose our own futex abstraction to users,
> we'd need a separate version that does less of the error checking we do,
> as there may be cases where certain errors would need to be handled
> differently.  You point out something similar below; checking that the
> kernel (or whatever below provides the futex functionality) didn't
> return errors we haven't specified in our futex abstraction.

I think the best approach for now is not to think about any new user API.
Just do all the cleanup of our internal futex use thoroughly so we think
it's very good and very maintainable.  When/if we come up with a new user
API later, we can refactor as needed to implement it.

> I didn't think about clean-up as much.  What I wanted is something we
> can use today to get the futex error handling correct in pthread_once
> and the the semaphores I'm about to submit, for example.

I'm going to insist on cleanup so we aren't growing redundant internal
APIs.

> Fine with me.  From my perspective, it seemed best to start with an
> abstraction with well-defined semantics (that's what I tried to do in
> futex-internal.h at least), so that all clients of it are taken care of.
> Any clean-up under the hood of it could be then done independently.

Makes sense.

> I think I have a pretty good understanding for what the futex semantics
> of the abstraction that we use internally should be.  I don't have a
> good feel for how to best clean up all the existing code we have related
> to that.

If you start with a strawman proposal for the complete new internal API,
then we can all work together to figure out how to clean up existing code.

> I'm not sure we actually need something for pthread_barrier_wait.  All
> uses of the private field are xor'd with FUTEX_PRIVATE_FLAG.  It seemed
> to me that just having a properly set up value for the private field in
> the first place would suffice.

That might well be what makes most sense with all this cleaned up.  I can
only barely guess what motivated the arrangement with inverting the bit.

> Interesting.  I haven't thought about the case where shared is not
> supported.

I think that's the main wrinkle that introduced all this complexity.
But it's all so fuzzy to me that this is only a guess.

> I suppose "dynamic" would still mean that this is stable throughout the
> lifetime of the process?

Correct.  It's just runtime detection of what the kernel you're running on
supports, like we do for various syscalls and flags.  For private futex
support, the existing code does a query at startup (nptl-init.c).

> >   I'm sure this can be done in a way that does not change the compiled
> >   code at all for Linux.
> 
> Probably, but personally I wouldn't worry about that.  When we do the
> transition to the new internal futex API, we'd change code anyway
> because of adding more error checking and such.

I really just meant that dynamically using the private flag or not should
be no more costly than the current code's XOR tricks.

> I also though about int vs. unsigned int for a while.  The Linux kernel
> has int, but when I look at the synchronization code I'm using futexes
> for, in most of the cases it's an unsigned you want to work with.
> Therefore, I picked unsigned.

Seems sensible.  We just may need some cleanups once everything is going
through type-checking inlines.

> Nothing needs to be volatile there, IMO.  If anything, this should be an
> atomic type.

OK.  The NaCl futex facility has a prototype using 'volatile int *', but
that might not have been a sensible choice.

> > > 	* nptl/futex-internal.h: New file.
> > 
> > Just as a procedural matter, I'm inclined to say that a new file like
> > this should come in the same commit as the first use of it.
> 
> Okay.  Do you want me to merge the two patches?

That seems right.

> > > +#include <lowlevellock.h>
> > 
> > Include only what you need: lowlevellock-futex.h here.  That changes
> > which code you're getting today, because all the machine-specific
> > lowlevellock.h files still need to be removed.  But we should be
> > finishing that cleanup this cycle anyway (though everyone seems to
> > have forgotten).
> 
> I tried that now, but that doesn't work because it redefines lll_futex*,
> and it's hard to avoid including lowlevellock.h through some other
> header.  Therefore, I left this unchanged for now.

OK.  Perhaps you'd like to take on eliminating at least the x86 versions of
lowlevellock.h?  (I think we'll really need to eliminate all of them before
all futex-related cleanup is done.)

> > So, now I'm seeing a potential reason to have this layer exist
> > distinct from the OS-encapsulation layer.  Perhaps we should have the
> > checks for expected errno values be in an OS-independent layer rather
> > than just saying in the specification of the OS-encapsulation layer
> > that it must yield only the particular set.
> 
> I'm not sure I can quite follow you.  I could see why the
> OS-encapsulation layer would want to check that the set of return values
> is only those we support in higher layers, but that's not what you're
> after, or is it?

If the generic (higher) layers require a certain protocol about errno
values and we want code to enforce/sanity-check the underlying OS calls for
that (which seems to be the consensus for Linux), then it is duplicative
and error-prone for each OS-specific layer to repeat that checking logic.

> Updated patch is attached.  Is this one okay, or do you want to see
> further changes to it and/or more of the full problem being addressed?

I guess I'd like to be closer to a full plan for cleaning it all up--that
is, at least a more full sense of what the complete end state will look
like, if not all the details about how to reach it--before we start
committing.


Thanks,
Roland
  

Patch

commit 2034e912594b2e46911bb6b6e130be049f9d271a
Author: Torvald Riegel <triegel@redhat.com>
Date:   Thu Dec 4 14:12:23 2014 +0100

    Add wrappers for futex operations for glibc-internal use.

diff --git a/nptl/futex-internal.h b/nptl/futex-internal.h
new file mode 100644
index 0000000..67a8b52
--- /dev/null
+++ b/nptl/futex-internal.h
@@ -0,0 +1,198 @@ 
+/* futex operations for glibc-internal use.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef FUTEX_INTERNAL_H
+#define FUTEX_INTERNAL_H
+
+#include <errno.h>
+#include <lowlevellock.h>
+#include <sys/time.h>
+#include <stdlib.h>
+
+/* This file defines futex operations used internally in glibc.  They are
+   wrappers for the syscalls and add glibc-specific error checking of the
+   syscall return value.  We abort on error codes that are caused by bugs in
+   glibc or in the calling application, or when an error code is not known.
+   We return error codes that can arise in correct executions to the caller.
+   Each operation calls out exactly the return values that callers need to
+   handle.
+
+   The private flag must be either FUTEX_PRIVATE or FUTEX_SHARED.
+
+   We expect callers to only use these operations if futexes are supported.
+
+   Due to POSIX requirements on when synchronization data structures such
+   as mutexes or semaphores can be destroyed and due to the futex design
+   having separate fast/slow paths for wake-ups, we need to consider that
+   futex_wake calls might effectively target a data structure that has been
+   destroyed and reused for another object, or unmapped; thus, some
+   errors or spurious wake-ups can happen in correct executions that would
+   not be possible in a program using just a single futex whose lifetime
+   does not end before the program terminates.  For background, see:
+   https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
+   https://lkml.org/lkml/2014/11/27/472  */
+
+#define FUTEX_PRIVATE LLL_PRIVATE
+#define FUTEX_SHARED  LLL_SHARED
+
+/* Returns FUTEX_PRIVATE if pshared is zero and private futexes are supported;
+   returns FUTEX_SHARED otherwise.  */
+static inline int
+futex_private_if_supported (int pshared)
+{
+  if (pshared != 0)
+    return FUTEX_SHARED;
+#ifdef __ASSUME_PRIVATE_FUTEX
+  return FUTEX_PRIVATE;
+#else
+  return THREAD_GETMEM (THREAD_SELF, header.private_futex)
+      ^ FUTEX_PRIVATE_FLAG;
+#endif
+}
+
+
+/* Atomically wrt other futex operations, this blocks iff the value at
+   *FUTEX matches the expected value.  This is semantically equivalent to:
+     l = <get lock associated with futex> (FUTEX);
+     wait_flag = <get wait_flag associated with futex> (FUTEX);
+     lock (l);
+     val = atomic_load_relaxed (FUTEX);
+     if (val != expected) { unlock (l); return EAGAIN; }
+     atomic_store_relaxed (wait_flag, 1);
+     unlock (l);
+     // Now block; can time out in futex_time_wait (see below)
+     while (atomic_load_relaxed(wait_flag));
+
+   Note that no guarantee of a happens-before relation between a woken
+   futex_wait and a futex_wake is documented; however, this does not matter
+   in practice because we have to consider spurious wake-ups (see below),
+   and thus would not be able to reason which futex_wake woke us anyway.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns EAGAIN if the futex' value did not match the expected value.
+   Returns EINTR if signals or other spurious wake-ups happened.
+
+   Note that some previous code in glibc assumed the underlying futex
+   operation (e.g., syscall) to start with or include the equivalent of a
+   seq_cst fence; this allows one to avoid an explicit seq_cst fence before
+   a futex_wait call when synchronizing similar to Dekker synchronization.
+   However, we make no such guarantee here.
+   */
+static inline int
+futex_wait (unsigned int *futex, unsigned int expected, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, NULL, private);
+  switch (err)
+    {
+    case 0:
+    case -EAGAIN:
+    case -EINTR:
+      return -err;
+
+    case -ETIMEDOUT: /* Cannot have happened as we provided no timeout.  */
+    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
+    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
+		     being normalized.  Must have been caused by a glibc or
+		     application bug.  */
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+/* Like futex_timed_wait, but will eventually time out (i.e., stop being
+   blocked) after the absolute point in time provided (i.e., ABSTIME) has
+   passed.  We require the caller to provide a normalized ABSTIME.
+
+   Returns 0 if woken by a futex operation or spuriously.  (Note that due to
+   the POSIX requirements mentioned above, we need to conservatively assume
+   that unrelated futex_wake operations could wake this futex; it is easiest
+   to just be prepared for spurious wake-ups.)
+   Returns EWOULDBLOCK if the futex' value did not match the expected value.
+   Returns EINTR if signals or other spurious wake-ups happened.
+   Returns ETIMEDOUT if the timeout expired.
+   */
+static inline int
+futex_timed_wait (unsigned int* futex, unsigned int expected,
+		  const struct timespec* abstime, int private)
+{
+  int err = lll_futex_timed_wait (futex, expected, abstime, private);
+  switch (err)
+    {
+    case 0:
+    case -EAGAIN:
+    case -EINTR:
+    case -ETIMEDOUT:
+      return -err;
+
+    case -EFAULT: /* Must have been caused by a glibc or application bug.  */
+    case -EINVAL: /* Either due to wrong alignment or due to the timeout not
+		     being normalized.  Must have been caused by a glibc or
+		     application bug.  */
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+/* Atomically wrt other futex operations, this unblocks the specified
+   number of processes, or all processes blocked on this futex if there are
+   fewer than the specified number.  Semantically, this is equivalent to:
+     l = <get lock associated with futex> (futex);
+     lock (l);
+     for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
+       if (<no process blocked on futex>) break;
+       wf = <get wait_flag of a process blocked on futex> (futex);
+       // No happens-before guarantee with woken futex_wait (see above)
+       atomic_store_relaxed (wf, 0);
+     }
+     return res;
+
+   Note that we need to support futex_wake calls to past futexes whose memory
+   has potentially been reused due to POSIX' requirements on synchronization
+   object destruction (see above); therefore, we must not report or abort
+   on most errors.  */
+static inline void
+futex_wake (unsigned int* futex, int processes_to_wake, int private)
+{
+  int res = lll_futex_wake (futex, processes_to_wake, private);
+  /* No error.  Ignore the number of woken processes.  */
+  if (res >= 0)
+    return;
+  switch (res)
+    {
+    case -EFAULT: /* Could have happened due to memory reuse.  */
+    case -EINVAL: /* Could be either due to incorrect alignment (a bug in
+		     glibc or in the application) or due to memory being
+		     reused for a PI futex.  We cannot distinguish between the
+		     two causes, and one of them is correct use, so we do not
+		     act in this case.  */
+      return;
+    case -ENOSYS: /* Must have been caused by a glibc bug.  */
+    /* No other errors are documented at this time.  */
+    default:
+      abort ();
+    }
+}
+
+#endif  /* futex-internal.h */