Patchwork [v3,02/21] nptl: Fix Race conditions in pthread cancellation (BZ#12683)

login
register
mail settings
Submitter Adhemerval Zanella Netto
Date Oct. 14, 2019, 8:56 p.m.
Message ID <20191014205656.29834-3-adhemerval.zanella@linaro.org>
Download mbox | patch
Permalink /patch/34963/
State New
Headers show

Comments

Adhemerval Zanella Netto - Oct. 14, 2019, 8:56 p.m.
This patch is the initial fix for race conditions in NPTL cancellation codei
by redefining how cancellable syscalls are defined and handled.  Current
buggy approach is to enable asynchronous cancellation prior to making the
syscall and restore the previous cancellation type once the syscall returns.

As decribed in BZ#12683, this approach shows 2 important problems:

  1. Cancellation can act after the syscall has returned from kernel, but
     before userspace saves the return value.  It might result in a resource
     leak if the syscall allocated a resource or a side effect (partial
     read/write), and there is no way to program handle it with cancellation
     handlers.

  2. If a signal is handled while the thread is blocked at a cancellable
     syscall, the entire signal handler runs with asynchronous cancellation
     enabled.  This can lead to issues if the signal handler call functions
     which are async-signal-safe but not async-cancel-safe.

For cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:

  1. Before the final "testcancel" and before the syscall is made.
  2. Between the "testcancel" and the syscall.
  3. While the syscall is blocked and no side effects have yet taken place.
  4. While the syscall is blocked but with some side effects already having
     taken place (e.g. a partial read or write).
  5. After the syscall has returned.

And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
4 or 5.  The proposed solution follows:

  * Handling case 1 is trivial: do a conditional branch based on whether the
    thread has received a cancellation request;

  * Case 2 can be caught by the signal handler determining that the saved
    program counter (from the ucontext_t) is in some address range beginning
    just before the "testcancel" and ending with the syscall instruction.

  * In this case, except for certain syscalls that ALWAYS fail with EINTR
    even for non-interrupting signals, the kernel will reset the program
    counter to point at the syscall instruction during signal handling, so
    that the syscall is restarted when the signal handler returns. So, from
    the signal handler's standpoint, this looks the same as case 2, and thus
    it's taken care of.

  * In this case, the kernel cannot restart the syscall; when it's
    interrupted by a signal, the kernel must cause the syscall to return
    with whatever partial result it obtained (e.g. partial read or write).

  * In this case, the saved program counter points just after the syscall
    instruction, so the signal handler won't act on cancellation.
    This one is equal to 4. since the program counter is past the syscall
    instruction already.

Another case that needs handling is syscalls that fail with EINTR even
when the signal handler is non-interrupting. In this case, the syscall
wrapper code can just check the cancellation flag when the errno result
is EINTR, and act on cancellation if it's set.

The proposed GLIBC adjustments are:

  1. Remove the enable_asynccancel/disable_asynccancel function usage in
     syscall definition and instead make them call a common symbol that will
     check if cancellation is enabled (__syscall_cancel at
     nptl/libc-cancellation.c), call the arch-specific cancellable
     entry-point (__syscall_cancel_arch) and cancel the thread when required.

  2. Provide a arch-specific symbol that contains global markers. These
     markers will be used in SIGCANCEL handler to check if the interruption
     has been called in a valid syscall and if the syscalls has been
     completed or not.

     A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is
     provided.  However the markers may not be set on correct expected places
     depeding of how INTERNAL_SYSCALL_NCS is implemented by the underlying
     architecture, and it is uses compiler-speficic construct (asm volatile)
     to place the required markers.  It is expected that all architectures
     implement an arch-specific.

  3. Rewrite SIGCANCEL asynchronous handler to check for both cancelling type
     and if current IP from signal handler falls between the global markes
     and act accordingly (sigcancel_handler at nptl/nptl-init.c).

  4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
     directly. This avoid synchronization issues when updating the
     cancellation status and also focus the logic on signal handler and
     cancellation syscall code.

  5. Adjust pthread code to replace CANCEL_ASYNC/CANCEL_RESET calls to
     appropriated cancelable futex syscalls.

  6. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
     appropriated cancelable syscalls.

  7. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide
     cancelable futex calls (used in libpthread code).

This patch adds the proposed changes to NPTL common code and following patches
add the requires arch-specific bits.  The build for ia64-linux-gnu, mips-*,
and x86_64-* are broken without the arch-specific patches.
---
 manual/llio.texi                              |   4 +-
 nptl/Makefile                                 |  12 +-
 nptl/Versions                                 |   3 +
 nptl/cancellation.c                           | 101 ---------------
 nptl/descr.h                                  |  15 +--
 nptl/libc-cancellation.c                      |  45 ++++++-
 nptl/nptl-init.c                              |  88 +++++++-------
 nptl/pthreadP.h                               |  39 +++++-
 nptl/pthread_cancel.c                         |  69 ++---------
 nptl/pthread_create.c                         |   7 +-
 nptl/pthread_exit.c                           |   5 +-
 nptl/pthread_join_common.c                    |   2 +-
 nptl/pthread_kill.c                           |   7 +-
 .../pthread_kill_internal.c                   |  16 +--
 nptl/pthread_setcanceltype.c                  |   2 +-
 nptl/thrd_sleep.c                             |   7 +-
 nptl/tst-cancel28.c                           | 100 +++++++++++++++
 rt/Makefile                                   |   2 +-
 sysdeps/generic/sysdep-cancel.h               |   2 -
 sysdeps/htl/pthreadP.h                        |   1 +
 sysdeps/nptl/Makefile                         |   3 +-
 sysdeps/nptl/cancellation-pc-check.h          |  40 ++++++
 sysdeps/nptl/cancellation-sigmask.h           |  30 +++++
 sysdeps/unix/sysdep.h                         | 115 ++++++++++++++----
 sysdeps/unix/sysv/linux/clock_nanosleep.c     |   6 +-
 sysdeps/unix/sysv/linux/futex-internal.h      |  18 +--
 sysdeps/unix/sysv/linux/lowlevellock-futex.h  |  45 +++++--
 ...pthread_kill.c => pthread_kill_internal.c} |  22 +---
 sysdeps/unix/sysv/linux/socketcall.h          |  40 ++++--
 sysdeps/unix/sysv/linux/syscall_cancel.c      |  62 ++++++++++
 sysdeps/unix/sysv/linux/sysdep.h              |  20 +++
 31 files changed, 587 insertions(+), 341 deletions(-)
 delete mode 100644 nptl/cancellation.c
 rename sysdeps/nptl/librt-cancellation.c => nptl/pthread_kill_internal.c (70%)
 create mode 100644 nptl/tst-cancel28.c
 create mode 100644 sysdeps/nptl/cancellation-pc-check.h
 create mode 100644 sysdeps/nptl/cancellation-sigmask.h
 rename sysdeps/unix/sysv/linux/{pthread_kill.c => pthread_kill_internal.c} (75%)
 create mode 100644 sysdeps/unix/sysv/linux/syscall_cancel.c
Florian Weimer - Oct. 15, 2019, 10:56 a.m.
* Adhemerval Zanella:

> This patch is the initial fix for race conditions in NPTL cancellation codei
> by redefining how cancellable syscalls are defined and handled.  Current
> buggy approach is to enable asynchronous cancellation prior to making the
> syscall and restore the previous cancellation type once the syscall returns.

First sentence appears to be garbled.

> As decribed in BZ#12683, this approach shows 2 important problems:
>
>   1. Cancellation can act after the syscall has returned from kernel, but
>      before userspace saves the return value.  It might result in a resource
>      leak if the syscall allocated a resource or a side effect (partial
>      read/write), and there is no way to program handle it with cancellation
>      handlers.
>
>   2. If a signal is handled while the thread is blocked at a cancellable
>      syscall, the entire signal handler runs with asynchronous cancellation
>      enabled.  This can lead to issues if the signal handler call functions
>      which are async-signal-safe but not async-cancel-safe.

Do you mean “the cancellation signal handler runs” or “any signal
handler runs”?  I think the latter is the much larger issue.

(I think it is customary to wrapper commit messages at column 72 or
something like that.)

> For cancellation to work correctly, there are 5 points at which the
> cancellation signal could arrive:
>
>   1. Before the final "testcancel" and before the syscall is made.
>   2. Between the "testcancel" and the syscall.
>   3. While the syscall is blocked and no side effects have yet taken place.
>   4. While the syscall is blocked but with some side effects already having
>      taken place (e.g. a partial read or write).
>   5. After the syscall has returned.

For clarity, I'd suggest to say whether we want unblocking to take place
(i.e., the cancellation should happen eventually without any further
external event, such as the arrival of more data).

> And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
> 4 or 5.  The proposed solution follows:
>
>   * Handling case 1 is trivial: do a conditional branch based on whether the
>     thread has received a cancellation request;
>
>   * Case 2 can be caught by the signal handler determining that the saved
>     program counter (from the ucontext_t) is in some address range beginning
>     just before the "testcancel" and ending with the syscall instruction.
>
>   * In this case, except for certain syscalls that ALWAYS fail with EINTR
>     even for non-interrupting signals, the kernel will reset the program
>     counter to point at the syscall instruction during signal handling, so
>     that the syscall is restarted when the signal handler returns. So, from
>     the signal handler's standpoint, this looks the same as case 2, and thus
>     it's taken care of.
>
>   * In this case, the kernel cannot restart the syscall; when it's
>     interrupted by a signal, the kernel must cause the syscall to return
>     with whatever partial result it obtained (e.g. partial read or write).
>
>   * In this case, the saved program counter points just after the syscall
>     instruction, so the signal handler won't act on cancellation.
>     This one is equal to 4. since the program counter is past the syscall
>     instruction already.

I'd suggest to match these items to the numbers explicitly.

> Another case that needs handling is syscalls that fail with EINTR even
> when the signal handler is non-interrupting. In this case, the syscall
> wrapper code can just check the cancellation flag when the errno result
> is EINTR, and act on cancellation if it's set.
>
> The proposed GLIBC adjustments are:
>
>   1. Remove the enable_asynccancel/disable_asynccancel function usage in
>      syscall definition and instead make them call a common symbol that will
>      check if cancellation is enabled (__syscall_cancel at
>      nptl/libc-cancellation.c), call the arch-specific cancellable
>      entry-point (__syscall_cancel_arch) and cancel the thread when required.
>
>   2. Provide a arch-specific symbol that contains global markers. These
>      markers will be used in SIGCANCEL handler to check if the interruption
>      has been called in a valid syscall and if the syscalls has been
>      completed or not.

An arch-specific generic system call wrapper function?

>      A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is
>      provided.  However the markers may not be set on correct expected places
>      depeding of how INTERNAL_SYSCALL_NCS is implemented by the underlying
>      architecture, and it is uses compiler-speficic construct (asm volatile)
>      to place the required markers.  It is expected that all architectures
>      implement an arch-specific.

Truncated sentence.

>   3. Rewrite SIGCANCEL asynchronous handler to check for both cancelling type
>      and if current IP from signal handler falls between the global markes
>      and act accordingly (sigcancel_handler at nptl/nptl-init.c).

s/markes/markers/

> diff --git a/nptl/libc-cancellation.c b/nptl/libc-cancellation.c
> index 37654cfcfe..430e0b962e 100644
> --- a/nptl/libc-cancellation.c
> +++ b/nptl/libc-cancellation.c
> @@ -18,7 +18,46 @@
>  
>  #include "pthreadP.h"
>  
> +/* Cancellation function called by all cancellable syscalls.  */
> +long int
> +__syscall_cancel (__syscall_arg_t nr, __syscall_arg_t a1,
> +		  __syscall_arg_t a2, __syscall_arg_t a3,
> +		  __syscall_arg_t a4, __syscall_arg_t a5,
> +		  __syscall_arg_t a6)
> +{

long int as the return value may not be quite right on x32.

> +  pthread_t self = (pthread_t) THREAD_SELF;
> +  struct pthread *pd = (struct pthread *) self;
> +  long int result;

I would use inline definitions for new code, but that's your choice.

> @@ -286,38 +277,49 @@ __pthread_initialize_minimal_internal (void)
>    THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
>  
>  #if defined SIGCANCEL || defined SIGSETXID
> -  struct sigaction sa;
> -  __sigemptyset (&sa.sa_mask);
>  
>  # ifdef SIGCANCEL
>    /* Install the cancellation signal handler.  If for some reason we
>       cannot install the handler we do not abort.  Maybe we should, but
>       it is only asynchronous cancellation which is affected.  */
> -  sa.sa_sigaction = sigcancel_handler;
> -  sa.sa_flags = SA_SIGINFO;
> -  (void) __libc_sigaction (SIGCANCEL, &sa, NULL);
> +  {
> +    struct sigaction sa;
> +    sa.sa_sigaction = sigcancel_handler;
> +    /* The signal handle should be non-interruptible to avoid the risk of
> +       spurious EINTR caused by SIGCANCEL sent to process or if pthread_cancel
> +       is called while cancellation is disabled in the target thread.  */
> +    sa.sa_flags = SA_SIGINFO | SA_RESTART;
> +    sa.sa_mask = SIGALL_SET;
> +    __libc_sigaction (SIGCANCEL, &sa, NULL);
> +  }
>  # endif

I'm a little bit concerned that adding SA_RESTART here adds more case
where we do not act on cancellation promptly.  But that does not happen
because if we observe the cancellation conditions to apply in the
SIGCANCEL handler, we unwind straight from the signal handler, right?

> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 070b3afa8d..cafd14f5af 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -296,20 +296,46 @@ extern void __nptl_unwind_freeres (void) attribute_hidden;
>  #endif
>  
>  
> -/* Called when a thread reacts on a cancellation request.  */
>  static inline void
>  __attribute ((noreturn, always_inline))
> -__do_cancel (void)
> +__do_cancel_with_result (void *result)
>  {
>    struct pthread *self = THREAD_SELF;
>  
> -  /* Make sure we get no more cancellations.  */
> -  THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
> +  /* Make sure we get no more cancellations by clearing the cancel
> +     state.  */
> +  int oldval = THREAD_GETMEM (self, cancelhandling);
> +  while (1)
> +    {
> +      int newval = oldval | CANCELSTATE_BITMASK | EXITING_BITMASK;
> +      if (oldval == newval)
> +	break;
> +
> +      oldval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
> +					  oldval);
> +    }
> +
> +  THREAD_SETMEM (self, result, result);
>  
>    __pthread_unwind ((__pthread_unwind_buf_t *)
>  		    THREAD_GETMEM (self, cleanup_jmp_buf));
>  }

The function name should be more generic because it's for unwinding, not
cancellation.

> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
> index 64ac12e60a..b20254bdba 100644
> --- a/nptl/pthread_cancel.c
> +++ b/nptl/pthread_cancel.c
> @@ -37,67 +37,24 @@ __pthread_cancel (pthread_t th)

> +  /* Avoid signaling when thread attempts cancel itself (pthread_kill
> +     is expensive).  */
> +  if (pd == THREAD_SELF
> +      && (THREAD_GETMEM (pd, cancelhandling) & CANCELTYPE_BITMASK) == 0)
> +    return 0;

I'm not sure if I understand this check.  CANCELTYPE_BITMASK is a bit
misnamed (it's a single-bit masking).  As far as I understand it, it's
set if async-cancel is enabled.  Shouldn't we call __do_cancel directly
here in case of an async self-cancel?  And otherwise defer cancellation?

> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 130937c3c4..8cab7f970c 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -406,7 +406,7 @@ START_THREAD_DEFN
>    /* If the parent was running cancellation handlers while creating
>       the thread the new thread inherited the signal mask.  Reset the
>       cancellation signal mask.  */
> -  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELING_BITMASK))
> +  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELED_BITMASK))
>      {
>        INTERNAL_SYSCALL_DECL (err);
>        sigset_t mask;

This conflicts with the patch I posted for bug 25098.  My change resets
the SIGCANCEL mask unconditionally, so the check is gone completely and
no longer needs adjustment.

> @@ -449,7 +449,8 @@ START_THREAD_DEFN
>  	 have ownership (see CONCURRENCY NOTES above).  */
>        if (__glibc_unlikely (pd->stopped_start))
>  	{
> -	  int oldtype = CANCEL_ASYNC ();
> +	  int ct;
> +	  __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
>  
>  	  /* Get the lock the parent locked to force synchronization.  */
>  	  lll_lock (pd->lock, LLL_PRIVATE);
> @@ -459,7 +460,7 @@ START_THREAD_DEFN
>  	  /* And give it up right away.  */
>  	  lll_unlock (pd->lock, LLL_PRIVATE);
>  
> -	  CANCEL_RESET (oldtype);
> +	  __pthread_setcanceltype (ct, NULL);
>  	}

I think htis is wrong.  Either you need to use a cancellable lll_lock
here.  Or maybe it is not even necessary to enable cancellation at all
because the new tread is eventually bound to make progress and will wake
up the futuex.

> diff --git a/sysdeps/nptl/librt-cancellation.c b/nptl/pthread_kill_internal.c
> similarity index 70%
> rename from sysdeps/nptl/librt-cancellation.c
> rename to nptl/pthread_kill_internal.c
> index 93ebe4aa71..b4f4f6dc78 100644
> --- a/sysdeps/nptl/librt-cancellation.c
> +++ b/nptl/pthread_kill_internal.c
> @@ -1,6 +1,6 @@
> -/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
> +/* Send a signal to a specific pthread.  Internal version.
> +   Copyright (C) 2002-2019 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
> -   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
>  
>     The GNU C Library is free software; you can redistribute it and/or
>     modify it under the terms of the GNU Lesser General Public
> @@ -16,9 +16,11 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <nptl/pthreadP.h>
> +#include <pthreadP.h>
>  
> -
> -#define __pthread_enable_asynccancel __librt_enable_asynccancel
> -#define __pthread_disable_asynccancel __librt_disable_asynccancel
> -#include <nptl/cancellation.c>
> +int
> +__pthread_kill_internal (pthread_t threadid, int signo)
> +{
> +  return ENOSYS;
> +}
> +hidden_def (__pthread_kill_internal)

I don't think we need this stub.

> diff --git a/nptl/pthread_setcanceltype.c b/nptl/pthread_setcanceltype.c
> index d771c31e46..84cc657aed 100644
> --- a/nptl/pthread_setcanceltype.c
> +++ b/nptl/pthread_setcanceltype.c
> @@ -73,4 +73,4 @@ __pthread_setcanceltype (int type, int *oldtype)
>  
>    return 0;
>  }
> -strong_alias (__pthread_setcanceltype, pthread_setcanceltype)
> +weak_alias (__pthread_setcanceltype, pthread_setcanceltype)

Why is this change necessary?  pthread_setcanceltype is the only
function defined in this file, so there are no issues with static
linking.  I'm concerned that this hides a problem elsewhere.

> diff --git a/nptl/thrd_sleep.c b/nptl/thrd_sleep.c
> index 2e185dd748..75c0d53f3e 100644
> --- a/nptl/thrd_sleep.c
> +++ b/nptl/thrd_sleep.c
> @@ -24,13 +24,12 @@
>  int
>  thrd_sleep (const struct timespec* time_point, struct timespec* remaining)
>  {
> -  INTERNAL_SYSCALL_DECL (err);
> -  int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, err, time_point, remaining);
> -  if (INTERNAL_SYSCALL_ERROR_P (ret, err))
> +  long int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, time_point, remaining);

The return type may need adjustment if the return type for the
cancellation wrapper changes.

> diff --git a/nptl/tst-cancel28.c b/nptl/tst-cancel28.c
> new file mode 100644
> index 0000000000..e27f5a0776
> --- /dev/null
> +++ b/nptl/tst-cancel28.c

> +static void *
> +writeopener (void *arg)
> +{
> +  int fd;
> +  for (;;)
> +    {
> +      fd = open (arg, O_WRONLY);
> +      close (fd);
> +    }
> +  return NULL;
> +}
> +
> +static void *
> +leaker (void *arg)
> +{
> +  int fd = open (arg, O_RDONLY);
> +  TEST_VERIFY_EXIT (fd > 0);
> +  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, 0);
> +  close (fd);
> +  return NULL;
> +}

I think these two functions *really* should check the return value of
close (or use xclose).

> +#define ITER_COUNT 1000
> +#define MAX_FILENO 1024

MAX_FILENO appears to be unused.

> +#define TIMEOUT 10
> +#include <support/test-driver.c>

TIMEOUT is smaller than the default test timeout.

> diff --git a/sysdeps/htl/pthreadP.h b/sysdeps/htl/pthreadP.h
> index e745dacac1..5628791de8 100644
> --- a/sysdeps/htl/pthreadP.h
> +++ b/sysdeps/htl/pthreadP.h
> @@ -25,6 +25,7 @@
>  
>  extern pthread_t __pthread_self (void);
>  extern int __pthread_kill (pthread_t threadid, int signo);
> +extern int __pthread_kill_internal (pthread_t threadid, int signo);
>  extern struct __pthread **__pthread_threads;
>  
>  extern int _pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);

This change appears unnecessary because __pthread_kill_internal does not
seem to be used outside of nptl.

> diff --git a/sysdeps/nptl/cancellation-pc-check.h b/sysdeps/nptl/cancellation-pc-check.h
> new file mode 100644
> index 0000000000..8b26c4ec4e
> --- /dev/null
> +++ b/sysdeps/nptl/cancellation-pc-check.h

> +/* Check if the program counter (PC) from ucontext CTX is within the start and
> +   then end boundary from the __syscall_cancel_arch bridge.  Return TRUE if
> +   the PC is within the boundary, meaning the syscall does not have any side
> +   effects; or FALSE otherwise.  */
> +static bool
> +ucontext_check_pc_boundary (void *ctx)
> +{
> +  /* Both are defined in syscall_cancel.S.  */
> +  extern const char __syscall_cancel_arch_start[1];
> +  extern const char __syscall_cancel_arch_end[1];
> +
> +  uintptr_t pc = sigcontext_get_pc (ctx);
> +  return pc >= (uintptr_t) __syscall_cancel_arch_start
> +	 && pc < (uintptr_t) __syscall_cancel_arch_end;
> +}

The name of this function should reflect that it is related to
cancellation.

Please add a comment explaining the kernel behavior regarding PC
(non-interrupted vs interrupted system calls), as found in the commit
message.

> +#endif
> diff --git a/sysdeps/nptl/cancellation-sigmask.h b/sysdeps/nptl/cancellation-sigmask.h
> new file mode 100644
> index 0000000000..77702c135e
> --- /dev/null
> +++ b/sysdeps/nptl/cancellation-sigmask.h

> +/* Add the SIGCANCEL signal on sigmask set at the ucontext CTX obtained from
> +   the sigaction handler.  */
> +static void
> +ucontext_add_cancel (void *ctx)
> +{
> +  __sigaddset (&((ucontext_t*) ctx)->uc_sigmask, SIGCANCEL);
> +}

I find the name of the function rather unclear.  Maybe
ucontext_block_sigcancel or something like that.

> diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
> index 10468c7129..04f30b8cbc 100644
> --- a/sysdeps/unix/sysdep.h
> +++ b/sysdeps/unix/sysdep.h

> +#define __SYSCALL_CANCEL0(name) \
> +  (__syscall_cancel)(__NR_##name, 0, 0, 0, 0, 0, 0)

There are a bunch of formatting nits in the changes to this file.
“)(” should be “) (”.  Likewise for “__SSC(”.

> diff --git a/sysdeps/unix/sysv/linux/clock_nanosleep.c b/sysdeps/unix/sysv/linux/clock_nanosleep.c
> index 1f240b8720..ddcdddcd5d 100644
> --- a/sysdeps/unix/sysv/linux/clock_nanosleep.c
> +++ b/sysdeps/unix/sysv/linux/clock_nanosleep.c
> @@ -36,11 +36,9 @@ __clock_nanosleep (clockid_t clock_id, int flags, const struct timespec *req,
>  
>    /* If the call is interrupted by a signal handler or encounters an error,
>       it returns a positive value similar to errno.  */
> -  INTERNAL_SYSCALL_DECL (err);
> -  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, err, clock_id, flags,
> +  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, clock_id, flags,
>  				   req, rem);
> -  return (INTERNAL_SYSCALL_ERROR_P (r, err)
> -	  ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
> +  return SYSCALL_CANCEL_ERROR (r) ? -r : 0;
>  }

Shouldn't r change type here?

> diff --git a/sysdeps/unix/sysv/linux/lowlevellock-futex.h b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
> index b423673ed4..bd6d3b7cf7 100644
> --- a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
> +++ b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
> @@ -74,6 +74,12 @@
>       ? -INTERNAL_SYSCALL_ERRNO (__ret, __err) : 0);                     \
>    })
>  
> +#define lll_futex_syscall_cp(...)					\
> +  ({                                                                    \
> +    long int __ret = INTERNAL_SYSCALL_CANCEL (futex, __VA_ARGS__);	\
> +    __ret;								\
> +  })
> +

The statement expression appears unnecessary here.

> +#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	\
> +  ({									\
> +    long int __ret;							\
> +    int __op = FUTEX_WAIT;						\
> +    __ret = lll_futex_syscall_cp (futexp,				\
> +				  __lll_private_flag (__op, private),	\
> +				  val, timeout);			\
> +    __ret;								\
>    })

Is the statement expression needed?  Should this be an inline function?

>  
> -#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	   \
> -  ({									   \
> -    int __oldtype = CANCEL_ASYNC ();				       	   \
> -    long int __err = lll_futex_timed_wait (futexp, val, timeout, private); \
> -    CANCEL_RESET (__oldtype);						   \
> -    __err;								   \
> +#define lll_futex_clock_wait_bitset_cancel(futexp, val, clockid,	\
> +					   timeout, private)		\
> +  ({									\
> +    long int __ret;							\
> +    if (lll_futex_supported_clockid (clockid))                          \
> +      {                                                                 \
> +        const unsigned int clockbit =                                   \
> +          (clockid == CLOCK_REALTIME) ? FUTEX_CLOCK_REALTIME : 0;       \
> +        const int op =                                                  \
> +          __lll_private_flag (FUTEX_WAIT_BITSET | clockbit, private);   \
> +									\
> +	__ret = lll_futex_syscall_cp (futexp, op, val,			\
> +                                      timeout, NULL /* Unused.  */,	\
> +                                      FUTEX_BITSET_MATCH_ANY);		\
> +      }									\
> +    else								\
> +      __ret = -EINVAL;							\
> +    __ret;								\
>    })

I do think this should be an inline function.

> --- a/sysdeps/unix/sysv/linux/pthread_kill.c
> +++ b/sysdeps/unix/sysv/linux/pthread_kill_internal.c
> @@ -16,24 +16,15 @@

> -strong_alias (__pthread_kill, pthread_kill)
> +hidden_def (__pthread_kill_internal)

Why hidden_def?  I think you should use attribute_hidden on the
declaration and drop the libc_hidden_proto instead.

> diff --git a/sysdeps/unix/sysv/linux/syscall_cancel.c b/sysdeps/unix/sysv/linux/syscall_cancel.c
> new file mode 100644
> index 0000000000..79d66ec4f4
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/syscall_cancel.c

> +long int
> +__syscall_cancel_arch (volatile int *ch, __syscall_arg_t nr,
> +		       __syscall_arg_t a1, __syscall_arg_t a2,
> +		       __syscall_arg_t a3, __syscall_arg_t a4,
> +		       __syscall_arg_t a5, __syscall_arg_t a6)
> +{
> +#define ADD_LABEL(__label)		\
> +  asm volatile (			\
> +    ".global " __label "\t\n"		\
> +    ".type " __label ",\%function\t\n" 	\
> +    __label ":\n");
> +
> +  ADD_LABEL ("__syscall_cancel_arch_start");
> +  if (__glibc_unlikely (*ch & CANCELED_BITMASK))
> +    __syscall_do_cancel();
> +
> +  INTERNAL_SYSCALL_DECL(err);
> +  long int result = INTERNAL_SYSCALL_NCS (nr, err, 6, a1, a2, a3, a4, a5, a6);
> +  ADD_LABEL ("__syscall_cancel_arch_end");
> +  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (result, err)))
> +    return -INTERNAL_SYSCALL_ERRNO (result, err);
> +  return result;
> +}
> +libc_hidden_def (__syscall_cancel_arch)

For bootstrapping, this is good enough, but I think we really, *really*
should remove this function, so that new ports do not use it by
accident.

> diff --git a/sysdeps/unix/sysv/linux/sysdep.h b/sysdeps/unix/sysv/linux/sysdep.h
> index fc9af51456..317bb7f973 100644
> --- a/sysdeps/unix/sysv/linux/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/sysdep.h
> @@ -27,6 +27,26 @@
>      -1l;					\
>    })
>  
> +/* Check error from cancellable syscall and set errno accordingly.
> +   Linux uses a negative return value to indicate syscall errors
> +   and since version 2.1 the return value of a system call might be
> +   negative even if the call succeeded (e.g., the `lseek' system call
> +   might return a large offset).
> +   Current contract is kernel make sure the no syscall returns a value
> +   in -1 .. -4095 as a valid result so we can savely test with -4095.  */
> +#define SYSCALL_CANCEL_ERROR(__ret)		\
> +  (__ret > -4096UL)

This doesn't look particularly type-safe.  Does it have to be a macro?
If it is a macro, can we add type checks?

> +#define SYSCALL_CANCEL_RET(__ret)		\
> +  ({						\
> +    if (SYSCALL_CANCEL_ERROR(__ret))		\
> +      {						\
> +	__set_errno (-__ret);			\
> +	__ret = -1;				\
> +      }						\
> +    __ret;					\
> +   })
> +
>  /* Provide a dummy argument that can be used to force register
>     alignment for register pairs if required by the syscall ABI.  */
>  #ifdef __ASSUME_ALIGNED_REGISTER_PAIRS

I initially found this split from sysdeps/unix/sysdep.h a bit strange,
but I guess it does reflect the current layering.

Additional notes: {LIBC_,}CANCEL_{ASYNC,RESET} should be removed
everywhere as soon as possible.  This may have a knockpm-effect where
__pthread_disable_asynccancel etc. can go as well.

Thanks,
Florian
Adhemerval Zanella Netto - Oct. 16, 2019, 8:42 p.m.
On 15/10/2019 07:56, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> This patch is the initial fix for race conditions in NPTL cancellation codei
>> by redefining how cancellable syscalls are defined and handled.  Current
>> buggy approach is to enable asynchronous cancellation prior to making the
>> syscall and restore the previous cancellation type once the syscall returns.
> 
> First sentence appears to be garbled.

Ack, I also reformatted to fix it to wrap at 72 column.

> 
>> As decribed in BZ#12683, this approach shows 2 important problems:
>>
>>   1. Cancellation can act after the syscall has returned from kernel, but
>>      before userspace saves the return value.  It might result in a resource
>>      leak if the syscall allocated a resource or a side effect (partial
>>      read/write), and there is no way to program handle it with cancellation
>>      handlers.
>>
>>   2. If a signal is handled while the thread is blocked at a cancellable
>>      syscall, the entire signal handler runs with asynchronous cancellation
>>      enabled.  This can lead to issues if the signal handler call functions
>>      which are async-signal-safe but not async-cancel-safe.
> 
> Do you mean “the cancellation signal handler runs” or “any signal
> handler runs”?  I think the latter is the much larger issue.

Any signal handler since __pthread_enable_asynccancel explicit enables
async cancellation on all cancellable entrypoints. 

> 
> (I think it is customary to wrapper commit messages at column 72 or
> something like that.)

Ack.

> 
>> For cancellation to work correctly, there are 5 points at which the
>> cancellation signal could arrive:
>>
>>   1. Before the final "testcancel" and before the syscall is made.
>>   2. Between the "testcancel" and the syscall.
>>   3. While the syscall is blocked and no side effects have yet taken place.
>>   4. While the syscall is blocked but with some side effects already having
>>      taken place (e.g. a partial read or write).
>>   5. After the syscall has returned.
> 
> For clarity, I'd suggest to say whether we want unblocking to take place
> (i.e., the cancellation should happen eventually without any further
> external event, such as the arrival of more data).

[...]

> 
>> And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
>> 4 or 5.  The proposed solution follows:

What about adding:

  And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
  4 or 5.  For the 4 and 5 cases, the cancellation will eventually happen in
  the next cancellable entrypoint without any further external event.

  The proposed solution follows:

>>
>>   * Handling case 1 is trivial: do a conditional branch based on whether the
>>     thread has received a cancellation request;
>>
>>   * Case 2 can be caught by the signal handler determining that the saved
>>     program counter (from the ucontext_t) is in some address range beginning
>>     just before the "testcancel" and ending with the syscall instruction.
>>
>>   * In this case, except for certain syscalls that ALWAYS fail with EINTR
>>     even for non-interrupting signals, the kernel will reset the program
>>     counter to point at the syscall instruction during signal handling, so
>>     that the syscall is restarted when the signal handler returns. So, from
>>     the signal handler's standpoint, this looks the same as case 2, and thus
>>     it's taken care of.
>>
>>   * In this case, the kernel cannot restart the syscall; when it's
>>     interrupted by a signal, the kernel must cause the syscall to return
>>     with whatever partial result it obtained (e.g. partial read or write).
>>
>>   * In this case, the saved program counter points just after the syscall
>>     instruction, so the signal handler won't act on cancellation.
>>     This one is equal to 4. since the program counter is past the syscall
>>     instruction already.
> 
> I'd suggest to match these items to the numbers explicitly.

Ack, I changed to:

The proposed solution follows for each case:

  1. Do a conditional branch based on whether the thread has received
     a cancellation request;

  2. It can caught by the signal handler determining that the saved
     program counter (from the ucontext_t) is in some address range
     beginning just before the "testcancel" and ending with the
     syscall instruction.

  3. In this case, except for certain syscalls that ALWAYS fail with
     EINTR even for non-interrupting signals, the kernel will reset
     the program counter to point at the syscall instruction during
     signal handling, so that the syscall is restarted when the signal
     handler returns.  So, from the signal handler's standpoint, this
     looks the same as case 2, and thus it's taken care of.

  4. For syscals with side-effects, the kernel cannot restart the
     syscall; when it's interrupted by a signal, the kernel must cause 
     the syscall to return with whatever partial result it obtained 
     (e.g. partial read or write).

  5. In this case, the saved program counter points just after the
     syscall instruction, so the signal handler won't act on
     cancellation.  This is similar to 4. since the program counter 
     is past the syscall instruction.

> 
>> Another case that needs handling is syscalls that fail with EINTR even
>> when the signal handler is non-interrupting. In this case, the syscall
>> wrapper code can just check the cancellation flag when the errno result
>> is EINTR, and act on cancellation if it's set.
>>
>> The proposed GLIBC adjustments are:
>>
>>   1. Remove the enable_asynccancel/disable_asynccancel function usage in
>>      syscall definition and instead make them call a common symbol that will
>>      check if cancellation is enabled (__syscall_cancel at
>>      nptl/libc-cancellation.c), call the arch-specific cancellable
>>      entry-point (__syscall_cancel_arch) and cancel the thread when required.
>>
>>   2. Provide a arch-specific symbol that contains global markers. These
>>      markers will be used in SIGCANCEL handler to check if the interruption
>>      has been called in a valid syscall and if the syscalls has been
>>      completed or not.
> 
> An arch-specific generic system call wrapper function?

Ack.

> 
>>      A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is
>>      provided.  However the markers may not be set on correct expected places
>>      depeding of how INTERNAL_SYSCALL_NCS is implemented by the underlying
>>      architecture, and it is uses compiler-speficic construct (asm volatile)
>>      to place the required markers.  It is expected that all architectures
>>      implement an arch-specific.
> 
> Truncated sentence.

Ack.

> 
>>   3. Rewrite SIGCANCEL asynchronous handler to check for both cancelling type
>>      and if current IP from signal handler falls between the global markes
>>      and act accordingly (sigcancel_handler at nptl/nptl-init.c).
> 
> s/markes/markers/

Ack.

> 
>> diff --git a/nptl/libc-cancellation.c b/nptl/libc-cancellation.c
>> index 37654cfcfe..430e0b962e 100644
>> --- a/nptl/libc-cancellation.c
>> +++ b/nptl/libc-cancellation.c
>> @@ -18,7 +18,46 @@
>>  
>>  #include "pthreadP.h"
>>  
>> +/* Cancellation function called by all cancellable syscalls.  */
>> +long int
>> +__syscall_cancel (__syscall_arg_t nr, __syscall_arg_t a1,
>> +		  __syscall_arg_t a2, __syscall_arg_t a3,
>> +		  __syscall_arg_t a4, __syscall_arg_t a5,
>> +		  __syscall_arg_t a6)
>> +{
> 
> long int as the return value may not be quite right on x32.

My understanding is it should be safe since the current required
syscalls that might returns a value larger than 'long int' are
already handled by arch-specific implementations (lseek and lseek64).

And I followed that x86_64 sysdep.h already is doing for 
{INTERNAL,INLINE}_SYSCALL, since it does expect the return value
to a unsigned long int and then it cast the return to long int.

It shouldn't work for syscalls that might return a 64-bit value,
but currently there is none that is a cancellation entrypoint.
Do you think we should also parametrize the return type to possible
handle such cases on x32?

> 
>> +  pthread_t self = (pthread_t) THREAD_SELF;
>> +  struct pthread *pd = (struct pthread *) self;
>> +  long int result;
> 
> I would use inline definitions for new code, but that's your choice.

Ack, I will check it.

> 
>> @@ -286,38 +277,49 @@ __pthread_initialize_minimal_internal (void)
>>    THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
>>  
>>  #if defined SIGCANCEL || defined SIGSETXID
>> -  struct sigaction sa;
>> -  __sigemptyset (&sa.sa_mask);
>>  
>>  # ifdef SIGCANCEL
>>    /* Install the cancellation signal handler.  If for some reason we
>>       cannot install the handler we do not abort.  Maybe we should, but
>>       it is only asynchronous cancellation which is affected.  */
>> -  sa.sa_sigaction = sigcancel_handler;
>> -  sa.sa_flags = SA_SIGINFO;
>> -  (void) __libc_sigaction (SIGCANCEL, &sa, NULL);
>> +  {
>> +    struct sigaction sa;
>> +    sa.sa_sigaction = sigcancel_handler;
>> +    /* The signal handle should be non-interruptible to avoid the risk of
>> +       spurious EINTR caused by SIGCANCEL sent to process or if pthread_cancel
>> +       is called while cancellation is disabled in the target thread.  */
>> +    sa.sa_flags = SA_SIGINFO | SA_RESTART;
>> +    sa.sa_mask = SIGALL_SET;
>> +    __libc_sigaction (SIGCANCEL, &sa, NULL);
>> +  }
>>  # endif
> 
> I'm a little bit concerned that adding SA_RESTART here adds more case
> where we do not act on cancellation promptly.  But that does not happen
> because if we observe the cancellation conditions to apply in the
> SIGCANCEL handler, we unwind straight from the signal handler, right?

Correct, is the cancel type is set to asynchronous *or* if the program
counter is within the permitted range (between the start marker and
just before the syscall instruction) the cancellation/unwind is done
on signal handler.

> 
>> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
>> index 070b3afa8d..cafd14f5af 100644
>> --- a/nptl/pthreadP.h
>> +++ b/nptl/pthreadP.h
>> @@ -296,20 +296,46 @@ extern void __nptl_unwind_freeres (void) attribute_hidden;
>>  #endif
>>  
>>  
>> -/* Called when a thread reacts on a cancellation request.  */
>>  static inline void
>>  __attribute ((noreturn, always_inline))
>> -__do_cancel (void)
>> +__do_cancel_with_result (void *result)
>>  {
>>    struct pthread *self = THREAD_SELF;
>>  
>> -  /* Make sure we get no more cancellations.  */
>> -  THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
>> +  /* Make sure we get no more cancellations by clearing the cancel
>> +     state.  */
>> +  int oldval = THREAD_GETMEM (self, cancelhandling);
>> +  while (1)
>> +    {
>> +      int newval = oldval | CANCELSTATE_BITMASK | EXITING_BITMASK;
>> +      if (oldval == newval)
>> +	break;
>> +
>> +      oldval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
>> +					  oldval);
>> +    }
>> +
>> +  THREAD_SETMEM (self, result, result);
>>  
>>    __pthread_unwind ((__pthread_unwind_buf_t *)
>>  		    THREAD_GETMEM (self, cleanup_jmp_buf));
>>  }
> 
> The function name should be more generic because it's for unwinding, not
> cancellation.

Unwind in glibc is an implementation detail; the idea is indeed to
cancel thread execution.  It is used on pthread_exit, cancellable
syscalls (through __syscall_cancel function), SIGCANCEL handler, and
pthread_setcancel{state,type}. I don't think adding unwind on its name
really signify what the functions aims to do.

> 
>> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
>> index 64ac12e60a..b20254bdba 100644
>> --- a/nptl/pthread_cancel.c
>> +++ b/nptl/pthread_cancel.c
>> @@ -37,67 +37,24 @@ __pthread_cancel (pthread_t th)
> 
>> +  /* Avoid signaling when thread attempts cancel itself (pthread_kill
>> +     is expensive).  */
>> +  if (pd == THREAD_SELF
>> +      && (THREAD_GETMEM (pd, cancelhandling) & CANCELTYPE_BITMASK) == 0)
>> +    return 0;
> 
> I'm not sure if I understand this check.  CANCELTYPE_BITMASK is a bit
> misnamed (it's a single-bit masking).  As far as I understand it, it's
> set if async-cancel is enabled.  Shouldn't we call __do_cancel directly
> here in case of an async self-cancel?  And otherwise defer cancellation?

The 'cancelhandling' is indeed needless convoluted to hold not only
cancellation internal state, but also thread cancellation mode
(PTHREAD_CANCEL_ENABLE, PTHREAD_CANCEL_DISABLE) and type
(PTHREAD_CANCEL_DEFERRED, PTHREAD_CANCEL_ASYNCHRONOUS).  I have a
following patch that aims to simplify it a bit.

The check '(THREAD_GETMEM (pd, cancelhandling) & CANCELTYPE_BITMASK) == 0)' 
is basically check if the cancellation type is PTHREAD_CANCEL_DEFERRED.
In this case, instead of signalling to make the signal handler cancel
the thread, we just set the cancellation to act on next cancellation
entrypoint.

> 
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index 130937c3c4..8cab7f970c 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -406,7 +406,7 @@ START_THREAD_DEFN
>>    /* If the parent was running cancellation handlers while creating
>>       the thread the new thread inherited the signal mask.  Reset the
>>       cancellation signal mask.  */
>> -  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELING_BITMASK))
>> +  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELED_BITMASK))
>>      {
>>        INTERNAL_SYSCALL_DECL (err);
>>        sigset_t mask;
> 
> This conflicts with the patch I posted for bug 25098.  My change resets
> the SIGCANCEL mask unconditionally, so the check is gone completely and
> no longer needs adjustment.

Ack, I will rebase once it is pushed upstream.

> 
>> @@ -449,7 +449,8 @@ START_THREAD_DEFN
>>  	 have ownership (see CONCURRENCY NOTES above).  */
>>        if (__glibc_unlikely (pd->stopped_start))
>>  	{
>> -	  int oldtype = CANCEL_ASYNC ();
>> +	  int ct;
>> +	  __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
>>  
>>  	  /* Get the lock the parent locked to force synchronization.  */
>>  	  lll_lock (pd->lock, LLL_PRIVATE);
>> @@ -459,7 +460,7 @@ START_THREAD_DEFN
>>  	  /* And give it up right away.  */
>>  	  lll_unlock (pd->lock, LLL_PRIVATE);
>>  
>> -	  CANCEL_RESET (oldtype);
>> +	  __pthread_setcanceltype (ct, NULL);
>>  	}
> 
> I think htis is wrong.  Either you need to use a cancellable lll_lock
> here.  Or maybe it is not even necessary to enable cancellation at all
> because the new tread is eventually bound to make progress and will wake
> up the futuex.

This is essentially a mechanical change, since CANCEL_ASYNC/CANCEL_RESET
is removed. I think we should track this potential issue in a separated
discussion.

> 
>> diff --git a/sysdeps/nptl/librt-cancellation.c b/nptl/pthread_kill_internal.c
>> similarity index 70%
>> rename from sysdeps/nptl/librt-cancellation.c
>> rename to nptl/pthread_kill_internal.c
>> index 93ebe4aa71..b4f4f6dc78 100644
>> --- a/sysdeps/nptl/librt-cancellation.c
>> +++ b/nptl/pthread_kill_internal.c
>> @@ -1,6 +1,6 @@
>> -/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
>> +/* Send a signal to a specific pthread.  Internal version.
>> +   Copyright (C) 2002-2019 Free Software Foundation, Inc.
>>     This file is part of the GNU C Library.
>> -   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
>>  
>>     The GNU C Library is free software; you can redistribute it and/or
>>     modify it under the terms of the GNU Lesser General Public
>> @@ -16,9 +16,11 @@
>>     License along with the GNU C Library; if not, see
>>     <https://www.gnu.org/licenses/>.  */
>>  
>> -#include <nptl/pthreadP.h>
>> +#include <pthreadP.h>
>>  
>> -
>> -#define __pthread_enable_asynccancel __librt_enable_asynccancel
>> -#define __pthread_disable_asynccancel __librt_disable_asynccancel
>> -#include <nptl/cancellation.c>
>> +int
>> +__pthread_kill_internal (pthread_t threadid, int signo)
>> +{
>> +  return ENOSYS;
>> +}
>> +hidden_def (__pthread_kill_internal)
> 
> I don't think we need this stub.

Indeed nptl is already Linux specific that there is no gain in
adding another compat layer. I did:

git mv sysdeps/unix/sysv/linux/pthread_kill_internal.c nptl/pthread_kill_internal.c -f

> 
>> diff --git a/nptl/pthread_setcanceltype.c b/nptl/pthread_setcanceltype.c
>> index d771c31e46..84cc657aed 100644
>> --- a/nptl/pthread_setcanceltype.c
>> +++ b/nptl/pthread_setcanceltype.c
>> @@ -73,4 +73,4 @@ __pthread_setcanceltype (int type, int *oldtype)
>>  
>>    return 0;
>>  }
>> -strong_alias (__pthread_setcanceltype, pthread_setcanceltype)
>> +weak_alias (__pthread_setcanceltype, pthread_setcanceltype)
> 
> Why is this change necessary?  pthread_setcanceltype is the only
> function defined in this file, so there are no issues with static
> linking.  I'm concerned that this hides a problem elsewhere.

This is to avoid the linknamespace issue:

$ cat conform/ISO11/threads.h/linknamespace.out 
[initial] thrd_create -> [libpthread.a(thrd_create.o)] __pthread_create_2_1 -> [libpthread.a(pthread_create.o)] __pthread_setcanceltype -> [libpthread.a(pthread_setcanceltype.o)] pthread_setcanceltype

> 
>> diff --git a/nptl/thrd_sleep.c b/nptl/thrd_sleep.c
>> index 2e185dd748..75c0d53f3e 100644
>> --- a/nptl/thrd_sleep.c
>> +++ b/nptl/thrd_sleep.c
>> @@ -24,13 +24,12 @@
>>  int
>>  thrd_sleep (const struct timespec* time_point, struct timespec* remaining)
>>  {
>> -  INTERNAL_SYSCALL_DECL (err);
>> -  int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, err, time_point, remaining);
>> -  if (INTERNAL_SYSCALL_ERROR_P (ret, err))
>> +  long int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, time_point, remaining);
> 
> The return type may need adjustment if the return type for the
> cancellation wrapper changes.

Do you mean for the x32 case?

> 
>> diff --git a/nptl/tst-cancel28.c b/nptl/tst-cancel28.c
>> new file mode 100644
>> index 0000000000..e27f5a0776
>> --- /dev/null
>> +++ b/nptl/tst-cancel28.c
> 
>> +static void *
>> +writeopener (void *arg)
>> +{
>> +  int fd;
>> +  for (;;)
>> +    {
>> +      fd = open (arg, O_WRONLY);
>> +      close (fd);
>> +    }
>> +  return NULL;
>> +}
>> +
>> +static void *
>> +leaker (void *arg)
>> +{
>> +  int fd = open (arg, O_RDONLY);
>> +  TEST_VERIFY_EXIT (fd > 0);
>> +  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, 0);
>> +  close (fd);
>> +  return NULL;
>> +}
> 
> I think these two functions *really* should check the return value of
> close (or use xclose).

Ack.

> 
>> +#define ITER_COUNT 1000
>> +#define MAX_FILENO 1024
> 
> MAX_FILENO appears to be unused.
> 
>> +#define TIMEOUT 10
>> +#include <support/test-driver.c>
> 
> TIMEOUT is smaller than the default test timeout.

Ack, I removed it.

> 
>> diff --git a/sysdeps/htl/pthreadP.h b/sysdeps/htl/pthreadP.h
>> index e745dacac1..5628791de8 100644
>> --- a/sysdeps/htl/pthreadP.h
>> +++ b/sysdeps/htl/pthreadP.h
>> @@ -25,6 +25,7 @@
>>  
>>  extern pthread_t __pthread_self (void);
>>  extern int __pthread_kill (pthread_t threadid, int signo);
>> +extern int __pthread_kill_internal (pthread_t threadid, int signo);
>>  extern struct __pthread **__pthread_threads;
>>  
>>  extern int _pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
> 
> This change appears unnecessary because __pthread_kill_internal does not
> seem to be used outside of nptl.

Ack.

> 
>> diff --git a/sysdeps/nptl/cancellation-pc-check.h b/sysdeps/nptl/cancellation-pc-check.h
>> new file mode 100644
>> index 0000000000..8b26c4ec4e
>> --- /dev/null
>> +++ b/sysdeps/nptl/cancellation-pc-check.h
> 
>> +/* Check if the program counter (PC) from ucontext CTX is within the start and
>> +   then end boundary from the __syscall_cancel_arch bridge.  Return TRUE if
>> +   the PC is within the boundary, meaning the syscall does not have any side
>> +   effects; or FALSE otherwise.  */
>> +static bool
>> +ucontext_check_pc_boundary (void *ctx)
>> +{
>> +  /* Both are defined in syscall_cancel.S.  */
>> +  extern const char __syscall_cancel_arch_start[1];
>> +  extern const char __syscall_cancel_arch_end[1];
>> +
>> +  uintptr_t pc = sigcontext_get_pc (ctx);
>> +  return pc >= (uintptr_t) __syscall_cancel_arch_start
>> +	 && pc < (uintptr_t) __syscall_cancel_arch_end;
>> +}
> 
> The name of this function should reflect that it is related to
> cancellation.
> 
> Please add a comment explaining the kernel behavior regarding PC
> (non-interrupted vs interrupted system calls), as found in the commit
> message.

Ack, I have changed the comment to:

/* For syscalls with side-effects, the kernel cannot restart the syscall; when
   it is interrupted by a signal, the kernel must cause the syscall to return
   with whatever partial result is obtained (e.g. partial read or write).  In
   this case, the saved program counter points just after the syscall
   instruction, so the SIGCANCEL handler should not act on cancellation.

   The __syscall_cancel_arch function, used for all cancellable syscalls,
   contains two extra markers, __syscall_cancel_arch_start and
   __syscall_cancel_arch_end.  The former points to just before the initial
   conditional branch that checks if the thread has received a cancellation
   request, while former points to the instruction after the one responsible
   to issue the syscall.

   The function check if the program counter (PC) from ucontext_t CTX is
   within the start and then end boundary from the __syscall_cancel_arch
   bridge.  Return TRUE if the PC is within the boundary, meaning the
   syscall does not have any side effects; or FALSE otherwise.  */

> 
>> +#endif
>> diff --git a/sysdeps/nptl/cancellation-sigmask.h b/sysdeps/nptl/cancellation-sigmask.h
>> new file mode 100644
>> index 0000000000..77702c135e
>> --- /dev/null
>> +++ b/sysdeps/nptl/cancellation-sigmask.h
> 
>> +/* Add the SIGCANCEL signal on sigmask set at the ucontext CTX obtained from
>> +   the sigaction handler.  */
>> +static void
>> +ucontext_add_cancel (void *ctx)
>> +{
>> +  __sigaddset (&((ucontext_t*) ctx)->uc_sigmask, SIGCANCEL);
>> +}
> 
> I find the name of the function rather unclear.  Maybe
> ucontext_block_sigcancel or something like that.

Ack, I changed to ucontext_block_sigcancel.

> 
>> diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
>> index 10468c7129..04f30b8cbc 100644
>> --- a/sysdeps/unix/sysdep.h
>> +++ b/sysdeps/unix/sysdep.h
> 
>> +#define __SYSCALL_CANCEL0(name) \
>> +  (__syscall_cancel)(__NR_##name, 0, 0, 0, 0, 0, 0)
> 
> There are a bunch of formatting nits in the changes to this file.
> “)(” should be “) (”.  Likewise for “__SSC(”.

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/clock_nanosleep.c b/sysdeps/unix/sysv/linux/clock_nanosleep.c
>> index 1f240b8720..ddcdddcd5d 100644
>> --- a/sysdeps/unix/sysv/linux/clock_nanosleep.c
>> +++ b/sysdeps/unix/sysv/linux/clock_nanosleep.c
>> @@ -36,11 +36,9 @@ __clock_nanosleep (clockid_t clock_id, int flags, const struct timespec *req,
>>  
>>    /* If the call is interrupted by a signal handler or encounters an error,
>>       it returns a positive value similar to errno.  */
>> -  INTERNAL_SYSCALL_DECL (err);
>> -  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, err, clock_id, flags,
>> +  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, clock_id, flags,
>>  				   req, rem);
>> -  return (INTERNAL_SYSCALL_ERROR_P (r, err)
>> -	  ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
>> +  return SYSCALL_CANCEL_ERROR (r) ? -r : 0;
>>  }
> 
> Shouldn't r change type here?

Not strickly required, but I agree that it would be better to change
to long int.

> 
>> diff --git a/sysdeps/unix/sysv/linux/lowlevellock-futex.h b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
>> index b423673ed4..bd6d3b7cf7 100644
>> --- a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
>> +++ b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
>> @@ -74,6 +74,12 @@
>>       ? -INTERNAL_SYSCALL_ERRNO (__ret, __err) : 0);                     \
>>    })
>>  
>> +#define lll_futex_syscall_cp(...)					\
>> +  ({                                                                    \
>> +    long int __ret = INTERNAL_SYSCALL_CANCEL (futex, __VA_ARGS__);	\
>> +    __ret;								\
>> +  })
>> +
> 
> The statement expression appears unnecessary here.
> 
>> +#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	\
>> +  ({									\
>> +    long int __ret;							\
>> +    int __op = FUTEX_WAIT;						\
>> +    __ret = lll_futex_syscall_cp (futexp,				\
>> +				  __lll_private_flag (__op, private),	\
>> +				  val, timeout);			\
>> +    __ret;								\
>>    })
> 
> Is the statement expression needed?  Should this be an inline function?
> 
>>  
>> -#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	   \
>> -  ({									   \
>> -    int __oldtype = CANCEL_ASYNC ();				       	   \
>> -    long int __err = lll_futex_timed_wait (futexp, val, timeout, private); \
>> -    CANCEL_RESET (__oldtype);						   \
>> -    __err;								   \
>> +#define lll_futex_clock_wait_bitset_cancel(futexp, val, clockid,	\
>> +					   timeout, private)		\
>> +  ({									\
>> +    long int __ret;							\
>> +    if (lll_futex_supported_clockid (clockid))                          \
>> +      {                                                                 \
>> +        const unsigned int clockbit =                                   \
>> +          (clockid == CLOCK_REALTIME) ? FUTEX_CLOCK_REALTIME : 0;       \
>> +        const int op =                                                  \
>> +          __lll_private_flag (FUTEX_WAIT_BITSET | clockbit, private);   \
>> +									\
>> +	__ret = lll_futex_syscall_cp (futexp, op, val,			\
>> +                                      timeout, NULL /* Unused.  */,	\
>> +                                      FUTEX_BITSET_MATCH_ANY);		\
>> +      }									\
>> +    else								\
>> +      __ret = -EINVAL;							\
>> +    __ret;								\
>>    })
> 
> I do think this should be an inline function.

I followed the current practice for this file, but I agree that we should
move to inline function in this cases.  I adjusted to use the types defined
on sysdeps/unix/sysv/linux/futex-internal.h (unsigned int * for futex
address) which requires some extra casts on nptl/pthread_join_common.c.

> 
>> --- a/sysdeps/unix/sysv/linux/pthread_kill.c
>> +++ b/sysdeps/unix/sysv/linux/pthread_kill_internal.c
>> @@ -16,24 +16,15 @@
> 
>> -strong_alias (__pthread_kill, pthread_kill)
>> +hidden_def (__pthread_kill_internal)
> 
> Why hidden_def?  I think you should use attribute_hidden on the
> declaration and drop the libc_hidden_proto instead.

Ack.

> 
>> diff --git a/sysdeps/unix/sysv/linux/syscall_cancel.c b/sysdeps/unix/sysv/linux/syscall_cancel.c
>> new file mode 100644
>> index 0000000000..79d66ec4f4
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/syscall_cancel.c
> 
>> +long int
>> +__syscall_cancel_arch (volatile int *ch, __syscall_arg_t nr,
>> +		       __syscall_arg_t a1, __syscall_arg_t a2,
>> +		       __syscall_arg_t a3, __syscall_arg_t a4,
>> +		       __syscall_arg_t a5, __syscall_arg_t a6)
>> +{
>> +#define ADD_LABEL(__label)		\
>> +  asm volatile (			\
>> +    ".global " __label "\t\n"		\
>> +    ".type " __label ",\%function\t\n" 	\
>> +    __label ":\n");
>> +
>> +  ADD_LABEL ("__syscall_cancel_arch_start");
>> +  if (__glibc_unlikely (*ch & CANCELED_BITMASK))
>> +    __syscall_do_cancel();
>> +
>> +  INTERNAL_SYSCALL_DECL(err);
>> +  long int result = INTERNAL_SYSCALL_NCS (nr, err, 6, a1, a2, a3, a4, a5, a6);
>> +  ADD_LABEL ("__syscall_cancel_arch_end");
>> +  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (result, err)))
>> +    return -INTERNAL_SYSCALL_ERRNO (result, err);
>> +  return result;
>> +}
>> +libc_hidden_def (__syscall_cancel_arch)
> 
> For bootstrapping, this is good enough, but I think we really, *really*
> should remove this function, so that new ports do not use it by
> accident.
> 

What about adding a #warning to state this file should not be use?

#warning "This implementation should be use just as reference or for bootstrapping"

>> diff --git a/sysdeps/unix/sysv/linux/sysdep.h b/sysdeps/unix/sysv/linux/sysdep.h
>> index fc9af51456..317bb7f973 100644
>> --- a/sysdeps/unix/sysv/linux/sysdep.h
>> +++ b/sysdeps/unix/sysv/linux/sysdep.h
>> @@ -27,6 +27,26 @@
>>      -1l;					\
>>    })
>>  
>> +/* Check error from cancellable syscall and set errno accordingly.
>> +   Linux uses a negative return value to indicate syscall errors
>> +   and since version 2.1 the return value of a system call might be
>> +   negative even if the call succeeded (e.g., the `lseek' system call
>> +   might return a large offset).
>> +   Current contract is kernel make sure the no syscall returns a value
>> +   in -1 .. -4095 as a valid result so we can savely test with -4095.  */
>> +#define SYSCALL_CANCEL_ERROR(__ret)		\
>> +  (__ret > -4096UL)
> 
> This doesn't look particularly type-safe.  Does it have to be a macro?
> If it is a macro, can we add type checks?

__set_errno(ev) ((errno) = (ev))

> 
>> +#define SYSCALL_CANCEL_RET(__ret)		\
>> +  ({						\
>> +    if (SYSCALL_CANCEL_ERROR(__ret))		\
>> +      {						\
>> +	__set_errno (-__ret);			\
>> +	__ret = -1;				\
>> +      }						\
>> +    __ret;					\
>> +   })
>> +
>>  /* Provide a dummy argument that can be used to force register
>>     alignment for register pairs if required by the syscall ABI.  */
>>  #ifdef __ASSUME_ALIGNED_REGISTER_PAIRS> 
> I initially found this split from sysdeps/unix/sysdep.h a bit strange,
> but I guess it does reflect the current layering.
> 
> Additional notes: {LIBC_,}CANCEL_{ASYNC,RESET} should be removed
> everywhere as soon as possible.  This may have a knockpm-effect where
> __pthread_disable_asynccancel etc. can go as well.

The __pthread_{enable,disable}_asynccancel are completely removed in
this patch.  The only missing {LIBC_,}CANCEL_{ASYNC,RESET} reference
I forgot to remove is at manual/process.texi.
Florian Weimer - Oct. 18, 2019, 12:37 p.m.
* Adhemerval Zanella:

> -  sa.sa_sigaction = sigcancel_handler;
> -  sa.sa_flags = SA_SIGINFO;
> -  (void) __libc_sigaction (SIGCANCEL, &sa, NULL);
> +  {
> +    struct sigaction sa;
> +    sa.sa_sigaction = sigcancel_handler;
> +    /* The signal handle should be non-interruptible to avoid the risk of
> +       spurious EINTR caused by SIGCANCEL sent to process or if pthread_cancel
> +       is called while cancellation is disabled in the target thread.  */
> +    sa.sa_flags = SA_SIGINFO | SA_RESTART;
> +    sa.sa_mask = SIGALL_SET;
> +    __libc_sigaction (SIGCANCEL, &sa, NULL);
> +  }

Since SIGCANCEL and SIGTIMER are the same, I wondered whether this
change impacts timer_create.

Here is what i found: timer_create arranges for SIGCANCEL/SIGTIMER to be
sent to the internal helper thread, which does this:

      /* sigwaitinfo cannot be used here, since it deletes
         SIGCANCEL == SIGTIMER from the set.  */

      /* XXX The size argument hopefully will have to be changed to the
         real size of the user-level sigset_t.  */
      int result = SYSCALL_CANCEL (rt_sigtimedwait, &ss, &si, NULL, _NSIG / 8);

      if (result > 0)
        {
          if (si.si_code == SI_TIMER)
            {
              struct timer *tk = (struct timer *) si.si_ptr;
…
          else if (si.si_code == SI_TKILL)
            /* The thread is canceled.  */
            pthread_exit (NULL);
        }

This suggests to me that the helper thread does NOT depend on EINTR
being generated for SIGCANCEL/SIGTIMER, and it should be fine to use
SA_RESTART for that signal as far as timer_create is concerned.

If you agree, it probably makes sense to add this bit of information to
the commit message.

Thanks,
Florian
Adhemerval Zanella Netto - Oct. 21, 2019, 1:29 p.m.
On 18/10/2019 09:37, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> -  sa.sa_sigaction = sigcancel_handler;
>> -  sa.sa_flags = SA_SIGINFO;
>> -  (void) __libc_sigaction (SIGCANCEL, &sa, NULL);
>> +  {
>> +    struct sigaction sa;
>> +    sa.sa_sigaction = sigcancel_handler;
>> +    /* The signal handle should be non-interruptible to avoid the risk of
>> +       spurious EINTR caused by SIGCANCEL sent to process or if pthread_cancel
>> +       is called while cancellation is disabled in the target thread.  */
>> +    sa.sa_flags = SA_SIGINFO | SA_RESTART;
>> +    sa.sa_mask = SIGALL_SET;
>> +    __libc_sigaction (SIGCANCEL, &sa, NULL);
>> +  }
> 
> Since SIGCANCEL and SIGTIMER are the same, I wondered whether this
> change impacts timer_create.
> 
> Here is what i found: timer_create arranges for SIGCANCEL/SIGTIMER to be
> sent to the internal helper thread, which does this:
> 
>       /* sigwaitinfo cannot be used here, since it deletes
>          SIGCANCEL == SIGTIMER from the set.  */
> 
>       /* XXX The size argument hopefully will have to be changed to the
>          real size of the user-level sigset_t.  */
>       int result = SYSCALL_CANCEL (rt_sigtimedwait, &ss, &si, NULL, _NSIG / 8);
> 
>       if (result > 0)
>         {
>           if (si.si_code == SI_TIMER)
>             {
>               struct timer *tk = (struct timer *) si.si_ptr;
> …
>           else if (si.si_code == SI_TKILL)
>             /* The thread is canceled.  */
>             pthread_exit (NULL);
>         }
> 
> This suggests to me that the helper thread does NOT depend on EINTR
> being generated for SIGCANCEL/SIGTIMER, and it should be fine to use
> SA_RESTART for that signal as far as timer_create is concerned.
> 
> If you agree, it probably makes sense to add this bit of information to
> the commit message.

Ack, this is what I have added:

--

As a side note regarding SIGCANCEL and SIGTIMER being the the same,
it should not impact timer_create functionality.  It arranges for
SIGCANCEL/SIGTIMER to be sent to the internal helper thread, which
in turn check if the si.si_code is SI_TIMER and call pthread_exit
otherwise (sysdeps/unix/sysv/linux/timer_routines.c:129).

This suggests that the helper thread does NOT depend on EINTR
being generated for SIGCANCEL/SIGTIMER, and it should be fine to use
SA_RESTART for that signal as far as timer_create is concerned.

Patch

diff --git a/manual/llio.texi b/manual/llio.texi
index 447126b7eb..ecf175341d 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -2534,13 +2534,13 @@  aiocb64}, since the LFS transparently replaces the old interface.
 @c     sigemptyset ok
 @c     sigaddset ok
 @c     setjmp ok
-@c     CANCEL_ASYNC -> pthread_enable_asynccancel ok
+@c     __pthread_setcanceltype ok
 @c      do_cancel ok
 @c       pthread_unwind ok
 @c        Unwind_ForcedUnwind or longjmp ok [@ascuheap @acsmem?]
 @c     lll_lock @asulock @aculock
 @c     lll_unlock @asulock @aculock
-@c     CANCEL_RESET -> pthread_disable_asynccancel ok
+@c     __pthread_setcanceltype ok
 @c      lll_futex_wait ok
 @c     ->start_routine ok -----
 @c     call_tls_dtors @asulock @ascuheap @aculock @acsmem
diff --git a/nptl/Makefile b/nptl/Makefile
index 1129fd4516..02dd05f1aa 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -34,7 +34,8 @@  routines = alloca_cutoff forward libc-lowlevellock libc-cancellation \
 	   pthread_attr_destroy pthread_attr_init pthread_attr_getdetachstate \
 	   pthread_attr_setdetachstate pthread_attr_getinheritsched \
 	   pthread_attr_setinheritsched pthread_attr_getschedparam \
-	   pthread_attr_setschedparam
+	   pthread_attr_setschedparam \
+	   syscall_cancel
 shared-only-routines = forward
 static-only-routines = pthread_atfork
 
@@ -103,7 +104,8 @@  libpthread-routines = nptl-init nptlfreeres vars events version pt-interp \
 		      pthread_barrierattr_setpshared \
 		      pthread_key_create pthread_key_delete \
 		      pthread_getspecific pthread_setspecific \
-		      pthread_sigmask pthread_kill pthread_sigqueue \
+		      pthread_sigmask pthread_kill pthread_kill_internal \
+		      pthread_sigqueue \
 		      pthread_cancel pthread_testcancel \
 		      pthread_setcancelstate pthread_setcanceltype \
 		      pthread_once \
@@ -117,7 +119,6 @@  libpthread-routines = nptl-init nptlfreeres vars events version pt-interp \
 		      cleanup cleanup_defer cleanup_compat \
 		      cleanup_defer_compat unwind \
 		      pt-longjmp pt-cleanup\
-		      cancellation \
 		      lowlevellock \
 		      lll_timedlock_wait \
 		      pt-fork pt-fcntl \
@@ -171,8 +172,7 @@  CFLAGS-pthread_setcanceltype.c += -fexceptions -fasynchronous-unwind-tables
 
 # These are internal functions which similar functionality as setcancelstate
 # and setcanceltype.
-CFLAGS-cancellation.c += -fasynchronous-unwind-tables
-CFLAGS-libc-cancellation.c += -fasynchronous-unwind-tables
+CFLAGS-libc-cancellation.c += -fexceptions -fasynchronous-unwind-tables
 
 # Calling pthread_exit() must cause the registered cancel handlers to
 # be executed.  Therefore exceptions have to be thrown through this
@@ -286,7 +286,7 @@  tests = tst-attr1 tst-attr2 tst-attr3 tst-default-attr \
 	tst-cancel11 tst-cancel12 tst-cancel13 tst-cancel14 tst-cancel15 \
 	tst-cancel16 tst-cancel17 tst-cancel18 tst-cancel19 tst-cancel20 \
 	tst-cancel21 tst-cancel22 tst-cancel23 tst-cancel24 tst-cancel25 \
-	tst-cancel26 tst-cancel27 \
+	tst-cancel26 tst-cancel27 tst-cancel28 \
 	tst-cancel-self tst-cancel-self-cancelstate \
 	tst-cancel-self-canceltype tst-cancel-self-testcancel \
 	tst-cleanup0 tst-cleanup1 tst-cleanup2 tst-cleanup3 tst-cleanup4 \
diff --git a/nptl/Versions b/nptl/Versions
index be7e810875..afdb448ba9 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -39,6 +39,9 @@  libc {
     __libc_pthread_init;
     __libc_current_sigrtmin_private; __libc_current_sigrtmax_private;
     __libc_allocate_rtsig_private;
+    __syscall_cancel;
+    __syscall_cancel_arch_start;
+    __syscall_cancel_arch_end;
   }
 }
 
diff --git a/nptl/cancellation.c b/nptl/cancellation.c
deleted file mode 100644
index 7712845561..0000000000
--- a/nptl/cancellation.c
+++ /dev/null
@@ -1,101 +0,0 @@ 
-/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <setjmp.h>
-#include <stdlib.h>
-#include "pthreadP.h"
-#include <futex-internal.h>
-
-
-/* The next two functions are similar to pthread_setcanceltype() but
-   more specialized for the use in the cancelable functions like write().
-   They do not need to check parameters etc.  */
-int
-attribute_hidden
-__pthread_enable_asynccancel (void)
-{
-  struct pthread *self = THREAD_SELF;
-  int oldval = THREAD_GETMEM (self, cancelhandling);
-
-  while (1)
-    {
-      int newval = oldval | CANCELTYPE_BITMASK;
-
-      if (newval == oldval)
-	break;
-
-      int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
-					      oldval);
-      if (__glibc_likely (curval == oldval))
-	{
-	  if (CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS (newval))
-	    {
-	      THREAD_SETMEM (self, result, PTHREAD_CANCELED);
-	      __do_cancel ();
-	    }
-
-	  break;
-	}
-
-      /* Prepare the next round.  */
-      oldval = curval;
-    }
-
-  return oldval;
-}
-
-
-void
-attribute_hidden
-__pthread_disable_asynccancel (int oldtype)
-{
-  /* If asynchronous cancellation was enabled before we do not have
-     anything to do.  */
-  if (oldtype & CANCELTYPE_BITMASK)
-    return;
-
-  struct pthread *self = THREAD_SELF;
-  int newval;
-
-  int oldval = THREAD_GETMEM (self, cancelhandling);
-
-  while (1)
-    {
-      newval = oldval & ~CANCELTYPE_BITMASK;
-
-      int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
-					      oldval);
-      if (__glibc_likely (curval == oldval))
-	break;
-
-      /* Prepare the next round.  */
-      oldval = curval;
-    }
-
-  /* We cannot return when we are being canceled.  Upon return the
-     thread might be things which would have to be undone.  The
-     following loop should loop until the cancellation signal is
-     delivered.  */
-  while (__builtin_expect ((newval & (CANCELING_BITMASK | CANCELED_BITMASK))
-			   == CANCELING_BITMASK, 0))
-    {
-      futex_wait_simple ((unsigned int *) &self->cancelhandling, newval,
-			 FUTEX_PRIVATE);
-      newval = THREAD_GETMEM (self, cancelhandling);
-    }
-}
diff --git a/nptl/descr.h b/nptl/descr.h
index d3f863aa18..a53f3326e6 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -275,23 +275,20 @@  struct pthread
   /* Bit set if asynchronous cancellation mode is selected.  */
 #define CANCELTYPE_BIT		1
 #define CANCELTYPE_BITMASK	(0x01 << CANCELTYPE_BIT)
-  /* Bit set if canceling has been initiated.  */
-#define CANCELING_BIT		2
-#define CANCELING_BITMASK	(0x01 << CANCELING_BIT)
-  /* Bit set if canceled.  */
-#define CANCELED_BIT		3
+  /* Bit set if thread is canceled.  */
+#define CANCELED_BIT		2
 #define CANCELED_BITMASK	(0x01 << CANCELED_BIT)
   /* Bit set if thread is exiting.  */
-#define EXITING_BIT		4
+#define EXITING_BIT		3
 #define EXITING_BITMASK		(0x01 << EXITING_BIT)
   /* Bit set if thread terminated and TCB is freed.  */
-#define TERMINATED_BIT		5
+#define TERMINATED_BIT		4
 #define TERMINATED_BITMASK	(0x01 << TERMINATED_BIT)
   /* Bit set if thread is supposed to change XID.  */
-#define SETXID_BIT		6
+#define SETXID_BIT		5
 #define SETXID_BITMASK		(0x01 << SETXID_BIT)
   /* Mask for the rest.  Helps the compiler to optimize.  */
-#define CANCEL_RESTMASK		0xffffff80
+#define CANCEL_RESTMASK		0xffffffc0
 
 #define CANCEL_ENABLED_AND_CANCELED(value) \
   (((value) & (CANCELSTATE_BITMASK | CANCELED_BITMASK | EXITING_BITMASK	      \
diff --git a/nptl/libc-cancellation.c b/nptl/libc-cancellation.c
index 37654cfcfe..430e0b962e 100644
--- a/nptl/libc-cancellation.c
+++ b/nptl/libc-cancellation.c
@@ -18,7 +18,46 @@ 
 
 #include "pthreadP.h"
 
+/* Cancellation function called by all cancellable syscalls.  */
+long int
+__syscall_cancel (__syscall_arg_t nr, __syscall_arg_t a1,
+		  __syscall_arg_t a2, __syscall_arg_t a3,
+		  __syscall_arg_t a4, __syscall_arg_t a5,
+		  __syscall_arg_t a6)
+{
+  pthread_t self = (pthread_t) THREAD_SELF;
+  struct pthread *pd = (struct pthread *) self;
+  long int result;
 
-#define __pthread_enable_asynccancel __libc_enable_asynccancel
-#define __pthread_disable_asynccancel __libc_disable_asynccancel
-#include <nptl/cancellation.c>
+  /* If cancellation is not enabled, call the syscall directly.  */
+  if (pd->cancelhandling & CANCELSTATE_BITMASK)
+    {
+      INTERNAL_SYSCALL_DECL (err);
+      result = INTERNAL_SYSCALL_NCS_CALL (nr, err, a1, a2, a3, a4, a5, a6);
+      if (INTERNAL_SYSCALL_ERROR_P (result, err))
+	return -INTERNAL_SYSCALL_ERRNO (result, err);
+      return result;
+    }
+
+  /* Call the arch-specific entry points that contains the globals markers
+     to be checked by SIGCANCEL handler.  */
+  result = __syscall_cancel_arch (&pd->cancelhandling, nr, a1, a2, a3, a4, a5,
+			          a6);
+
+  if ((result == -EINTR)
+      && (pd->cancelhandling & CANCELED_BITMASK)
+      && !(pd->cancelhandling & CANCELSTATE_BITMASK))
+    __do_cancel ();
+
+  return result;
+}
+libc_hidden_def (__syscall_cancel)
+
+/* Since __do_cancel is a always inline function, this creates a symbol the
+   arch-specific symbol can call to cancel the thread.  */
+_Noreturn void
+attribute_hidden
+__syscall_do_cancel (void)
+{
+  __do_cancel ();
+}
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index ea91b9e138..09bfe16d45 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -39,6 +39,9 @@ 
 #include <libc-pointer-arith.h>
 #include <pthread-pids.h>
 #include <pthread_mutex_conf.h>
+#include <sigcontextinfo.h>
+#include <cancellation-sigmask.h>
+#include <cancellation-pc-check.h>
 
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
 /* Pointer to the corresponding variable in libc.  */
@@ -155,35 +158,23 @@  sigcancel_handler (int sig, siginfo_t *si, void *ctx)
 
   struct pthread *self = THREAD_SELF;
 
-  int oldval = THREAD_GETMEM (self, cancelhandling);
-  while (1)
-    {
-      /* We are canceled now.  When canceled by another thread this flag
-	 is already set but if the signal is directly send (internally or
-	 from another process) is has to be done here.  */
-      int newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK;
-
-      if (oldval == newval || (oldval & EXITING_BITMASK) != 0)
-	/* Already canceled or exiting.  */
-	break;
-
-      int curval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
-					      oldval);
-      if (curval == oldval)
-	{
-	  /* Set the return value.  */
-	  THREAD_SETMEM (self, result, PTHREAD_CANCELED);
-
-	  /* Make sure asynchronous cancellation is still enabled.  */
-	  if ((newval & CANCELTYPE_BITMASK) != 0)
-	    /* Run the registered destructors and terminate the thread.  */
-	    __do_cancel ();
-
-	  break;
-	}
-
-      oldval = curval;
-    }
+  if (((self->cancelhandling & (CANCELSTATE_BITMASK)) != 0)
+      || ((self->cancelhandling & CANCELED_BITMASK) == 0))
+    return;
+
+  /* Add SIGCANCEL on ignored sigmask to avoid the handler to be called
+     again.  */
+  ucontext_add_cancel (ctx);
+
+  /* Check if asynchronous cancellation mode is set or if interrupted
+     instruction pointer falls within the cancellable syscall bridge.  For
+     interruptable syscalls that might generate external side-effects (partial
+     reads or writes, for instance), the kernel will set the IP to after
+     '__syscall_cancel_arch_end', thus disabling the cancellation and allowing
+     the process to handle such conditions.  */
+  if (self->cancelhandling & CANCELTYPE_BITMASK
+      || ucontext_check_pc_boundary (ctx))
+    __do_cancel ();
 }
 #endif
 
@@ -286,38 +277,49 @@  __pthread_initialize_minimal_internal (void)
   THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
 
 #if defined SIGCANCEL || defined SIGSETXID
-  struct sigaction sa;
-  __sigemptyset (&sa.sa_mask);
 
 # ifdef SIGCANCEL
   /* Install the cancellation signal handler.  If for some reason we
      cannot install the handler we do not abort.  Maybe we should, but
      it is only asynchronous cancellation which is affected.  */
-  sa.sa_sigaction = sigcancel_handler;
-  sa.sa_flags = SA_SIGINFO;
-  (void) __libc_sigaction (SIGCANCEL, &sa, NULL);
+  {
+    struct sigaction sa;
+    sa.sa_sigaction = sigcancel_handler;
+    /* The signal handle should be non-interruptible to avoid the risk of
+       spurious EINTR caused by SIGCANCEL sent to process or if pthread_cancel
+       is called while cancellation is disabled in the target thread.  */
+    sa.sa_flags = SA_SIGINFO | SA_RESTART;
+    sa.sa_mask = SIGALL_SET;
+    __libc_sigaction (SIGCANCEL, &sa, NULL);
+  }
 # endif
 
 # ifdef SIGSETXID
-  /* Install the handle to change the threads' uid/gid.  */
-  sa.sa_sigaction = sighandler_setxid;
-  sa.sa_flags = SA_SIGINFO | SA_RESTART;
-  (void) __libc_sigaction (SIGSETXID, &sa, NULL);
+  {
+    /* Install the handle to change the threads' uid/gid.  */
+    struct sigaction sa;
+    __sigemptyset (&sa.sa_mask);
+    sa.sa_sigaction = sighandler_setxid;
+    sa.sa_flags = SA_SIGINFO | SA_RESTART;
+    __libc_sigaction (SIGSETXID, &sa, NULL);
+  }
 # endif
 
   /* The parent process might have left the signals blocked.  Just in
      case, unblock it.  We reuse the signal mask in the sigaction
      structure.  It is already cleared.  */
+  {
+    struct sigaction sa;
+    __sigemptyset (&sa.sa_mask);
 # ifdef SIGCANCEL
-  __sigaddset (&sa.sa_mask, SIGCANCEL);
+    __sigaddset (&sa.sa_mask, SIGCANCEL);
 # endif
 # ifdef SIGSETXID
-  __sigaddset (&sa.sa_mask, SIGSETXID);
+    __sigaddset (&sa.sa_mask, SIGSETXID);
 # endif
-  {
     INTERNAL_SYSCALL_DECL (err);
-    (void) INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_UNBLOCK, &sa.sa_mask,
-			     NULL, _NSIG / 8);
+    INTERNAL_SYSCALL_CALL (rt_sigprocmask, err, SIG_UNBLOCK, &sa.sa_mask,
+			   NULL, _NSIG / 8);
   }
 #endif
 
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 070b3afa8d..cafd14f5af 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -296,20 +296,46 @@  extern void __nptl_unwind_freeres (void) attribute_hidden;
 #endif
 
 
-/* Called when a thread reacts on a cancellation request.  */
 static inline void
 __attribute ((noreturn, always_inline))
-__do_cancel (void)
+__do_cancel_with_result (void *result)
 {
   struct pthread *self = THREAD_SELF;
 
-  /* Make sure we get no more cancellations.  */
-  THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
+  /* Make sure we get no more cancellations by clearing the cancel
+     state.  */
+  int oldval = THREAD_GETMEM (self, cancelhandling);
+  while (1)
+    {
+      int newval = oldval | CANCELSTATE_BITMASK | EXITING_BITMASK;
+      if (oldval == newval)
+	break;
+
+      oldval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling, newval,
+					  oldval);
+    }
+
+  THREAD_SETMEM (self, result, result);
 
   __pthread_unwind ((__pthread_unwind_buf_t *)
 		    THREAD_GETMEM (self, cleanup_jmp_buf));
 }
 
+/* Called when a thread reacts on a cancellation request.  */
+static inline void
+__attribute ((noreturn, always_inline))
+__do_cancel (void)
+{
+  __do_cancel_with_result (PTHREAD_CANCELED);
+}
+
+extern long int __syscall_cancel_arch (volatile int *, __syscall_arg_t nr,
+     __syscall_arg_t arg1, __syscall_arg_t arg2, __syscall_arg_t arg3,
+     __syscall_arg_t arg4, __syscall_arg_t arg5, __syscall_arg_t arg6);
+libc_hidden_proto (__syscall_cancel_arch);
+
+extern _Noreturn void __syscall_do_cancel (void)
+     attribute_hidden;
 
 /* Internal prototypes.  */
 
@@ -469,11 +495,11 @@  extern int __pthread_equal (pthread_t thread1, pthread_t thread2);
 extern int __pthread_detach (pthread_t th);
 extern int __pthread_cancel (pthread_t th);
 extern int __pthread_kill (pthread_t threadid, int signo);
+extern int __pthread_kill_internal (pthread_t threadid, int signo)
+  attribute_hidden;
 extern void __pthread_exit (void *value) __attribute__ ((__noreturn__));
 extern int __pthread_join (pthread_t threadid, void **thread_return);
 extern int __pthread_setcanceltype (int type, int *oldtype);
-extern int __pthread_enable_asynccancel (void) attribute_hidden;
-extern void __pthread_disable_asynccancel (int oldtype) attribute_hidden;
 extern void __pthread_testcancel (void);
 extern int __pthread_timedjoin_ex (pthread_t, void **, const struct timespec *,
 				   bool);
@@ -496,6 +522,7 @@  hidden_proto (__pthread_testcancel)
 hidden_proto (__pthread_mutexattr_init)
 hidden_proto (__pthread_mutexattr_settype)
 hidden_proto (__pthread_timedjoin_ex)
+hidden_proto (__pthread_kill_internal)
 #endif
 
 extern int __pthread_cond_broadcast_2_0 (pthread_cond_2_0_t *cond);
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index 64ac12e60a..b20254bdba 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -37,67 +37,24 @@  __pthread_cancel (pthread_t th)
 #ifdef SHARED
   pthread_cancel_init ();
 #endif
-  int result = 0;
-  int oldval;
-  int newval;
-  do
-    {
-    again:
-      oldval = pd->cancelhandling;
-      newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK;
 
-      /* Avoid doing unnecessary work.  The atomic operation can
-	 potentially be expensive if the bug has to be locked and
-	 remote cache lines have to be invalidated.  */
-      if (oldval == newval)
-	break;
+  THREAD_ATOMIC_BIT_SET (pd, cancelhandling, CANCELED_BIT);
 
-      /* If the cancellation is handled asynchronously just send a
-	 signal.  We avoid this if possible since it's more
-	 expensive.  */
-      if (CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS (newval))
-	{
-	  /* Mark the cancellation as "in progress".  */
-	  if (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling,
-						    oldval | CANCELING_BITMASK,
-						    oldval))
-	    goto again;
-
-#ifdef SIGCANCEL
-	  /* The cancellation handler will take care of marking the
-	     thread as canceled.  */
-	  pid_t pid = __getpid ();
-
-	  INTERNAL_SYSCALL_DECL (err);
-	  int val = INTERNAL_SYSCALL_CALL (tgkill, err, pid, pd->tid,
-					   SIGCANCEL);
-	  if (INTERNAL_SYSCALL_ERROR_P (val, err))
-	    result = INTERNAL_SYSCALL_ERRNO (val, err);
-#else
-          /* It should be impossible to get here at all, since
-             pthread_setcanceltype should never have allowed
-             PTHREAD_CANCEL_ASYNCHRONOUS to be set.  */
-          abort ();
-#endif
-
-	  break;
-	}
-
-	/* A single-threaded process should be able to kill itself, since
-	   there is nothing in the POSIX specification that says that it
-	   cannot.  So we set multiple_threads to true so that cancellation
-	   points get executed.  */
-	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
+  /* A single-threaded process should be able to kill itself, since there is
+     nothing in the POSIX specification that says that it cannot.  So we set
+     multiple_threads to true so that cancellation points get executed.  */
+  THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	__pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
+  __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
 #endif
-    }
-  /* Mark the thread as canceled.  This has to be done
-     atomically since other bits could be modified as well.  */
-  while (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling, newval,
-					       oldval));
 
-  return result;
+  /* Avoid signaling when thread attempts cancel itself (pthread_kill
+     is expensive).  */
+  if (pd == THREAD_SELF
+      && (THREAD_GETMEM (pd, cancelhandling) & CANCELTYPE_BITMASK) == 0)
+    return 0;
+
+  return __pthread_kill_internal (th, SIGCANCEL);
 }
 weak_alias (__pthread_cancel, pthread_cancel)
 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 130937c3c4..8cab7f970c 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -406,7 +406,7 @@  START_THREAD_DEFN
   /* If the parent was running cancellation handlers while creating
      the thread the new thread inherited the signal mask.  Reset the
      cancellation signal mask.  */
-  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELING_BITMASK))
+  if (__glibc_unlikely (pd->parent_cancelhandling & CANCELED_BITMASK))
     {
       INTERNAL_SYSCALL_DECL (err);
       sigset_t mask;
@@ -449,7 +449,8 @@  START_THREAD_DEFN
 	 have ownership (see CONCURRENCY NOTES above).  */
       if (__glibc_unlikely (pd->stopped_start))
 	{
-	  int oldtype = CANCEL_ASYNC ();
+	  int ct;
+	  __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
 
 	  /* Get the lock the parent locked to force synchronization.  */
 	  lll_lock (pd->lock, LLL_PRIVATE);
@@ -459,7 +460,7 @@  START_THREAD_DEFN
 	  /* And give it up right away.  */
 	  lll_unlock (pd->lock, LLL_PRIVATE);
 
-	  CANCEL_RESET (oldtype);
+	  __pthread_setcanceltype (ct, NULL);
 	}
 
       LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);
diff --git a/nptl/pthread_exit.c b/nptl/pthread_exit.c
index 643c85bd6e..6c59cdf652 100644
--- a/nptl/pthread_exit.c
+++ b/nptl/pthread_exit.c
@@ -16,16 +16,13 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <stdlib.h>
 #include "pthreadP.h"
 
 
 void
 __pthread_exit (void *value)
 {
-  THREAD_SETMEM (THREAD_SELF, result, value);
-
-  __do_cancel ();
+  __do_cancel_with_result (value);
 }
 weak_alias (__pthread_exit, pthread_exit)
 
diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
index 9545ae4bd3..43cdb2c7bc 100644
--- a/nptl/pthread_join_common.c
+++ b/nptl/pthread_join_common.c
@@ -100,7 +100,7 @@  __pthread_timedjoin_ex (pthread_t threadid, void **thread_return,
   if ((pd == self
        || (self->joinid == pd
 	   && (pd->cancelhandling
-	       & (CANCELING_BITMASK | CANCELED_BITMASK | EXITING_BITMASK
+	       & (CANCELED_BITMASK | EXITING_BITMASK
 		  | TERMINATED_BITMASK)) == 0))
       && !CANCEL_ENABLED_AND_CANCELED (self->cancelhandling))
     /* This is a deadlock situation.  The threads are waiting for each
diff --git a/nptl/pthread_kill.c b/nptl/pthread_kill.c
index 2805c722a6..4c2672ae84 100644
--- a/nptl/pthread_kill.c
+++ b/nptl/pthread_kill.c
@@ -31,8 +31,9 @@  __pthread_kill (pthread_t threadid, int signo)
     /* Not a valid thread handle.  */
     return ESRCH;
 
-  return ENOSYS;
+  if (__is_internal_signal (signo))
+    return EINVAL;
+
+  return __pthread_kill_internal (threadid, signo);
 }
 strong_alias (__pthread_kill, pthread_kill)
-
-stub_warning (pthread_kill)
diff --git a/sysdeps/nptl/librt-cancellation.c b/nptl/pthread_kill_internal.c
similarity index 70%
rename from sysdeps/nptl/librt-cancellation.c
rename to nptl/pthread_kill_internal.c
index 93ebe4aa71..b4f4f6dc78 100644
--- a/sysdeps/nptl/librt-cancellation.c
+++ b/nptl/pthread_kill_internal.c
@@ -1,6 +1,6 @@ 
-/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
+/* Send a signal to a specific pthread.  Internal version.
+   Copyright (C) 2002-2019 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
 
    The GNU C Library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
@@ -16,9 +16,11 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <nptl/pthreadP.h>
+#include <pthreadP.h>
 
-
-#define __pthread_enable_asynccancel __librt_enable_asynccancel
-#define __pthread_disable_asynccancel __librt_disable_asynccancel
-#include <nptl/cancellation.c>
+int
+__pthread_kill_internal (pthread_t threadid, int signo)
+{
+  return ENOSYS;
+}
+hidden_def (__pthread_kill_internal)
diff --git a/nptl/pthread_setcanceltype.c b/nptl/pthread_setcanceltype.c
index d771c31e46..84cc657aed 100644
--- a/nptl/pthread_setcanceltype.c
+++ b/nptl/pthread_setcanceltype.c
@@ -73,4 +73,4 @@  __pthread_setcanceltype (int type, int *oldtype)
 
   return 0;
 }
-strong_alias (__pthread_setcanceltype, pthread_setcanceltype)
+weak_alias (__pthread_setcanceltype, pthread_setcanceltype)
diff --git a/nptl/thrd_sleep.c b/nptl/thrd_sleep.c
index 2e185dd748..75c0d53f3e 100644
--- a/nptl/thrd_sleep.c
+++ b/nptl/thrd_sleep.c
@@ -24,13 +24,12 @@ 
 int
 thrd_sleep (const struct timespec* time_point, struct timespec* remaining)
 {
-  INTERNAL_SYSCALL_DECL (err);
-  int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, err, time_point, remaining);
-  if (INTERNAL_SYSCALL_ERROR_P (ret, err))
+  long int ret = INTERNAL_SYSCALL_CANCEL (nanosleep, time_point, remaining);
+  if (SYSCALL_CANCEL_ERROR (ret))
     {
       /* C11 states thrd_sleep function returns -1 if it has been interrupted
 	 by a signal, or a negative value if it fails.  */
-      ret = INTERNAL_SYSCALL_ERRNO (ret, err);
+      ret = -ret;
       if (ret == EINTR)
 	return -1;
       return -2;
diff --git a/nptl/tst-cancel28.c b/nptl/tst-cancel28.c
new file mode 100644
index 0000000000..e27f5a0776
--- /dev/null
+++ b/nptl/tst-cancel28.c
@@ -0,0 +1,100 @@ 
+/* Check side-effect act for cancellable syscalls (BZ #12683).
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This testcase checks if there is resource leakage if the syscall has
+   returned from kernelspace, but before userspace saves the return
+   value.  The 'leaker' thread should be able to close the file descriptor
+   if the resource is already allocated, meaning that if the cancellation
+   signal arrives *after* the open syscal return from kernel, the
+   side-effect should be visible to application.  */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+#include <support/xthread.h>
+#include <support/check.h>
+#include <support/temp_file.h>
+#include <support/support.h>
+#include <support/descriptors.h>
+
+static void *
+writeopener (void *arg)
+{
+  int fd;
+  for (;;)
+    {
+      fd = open (arg, O_WRONLY);
+      close (fd);
+    }
+  return NULL;
+}
+
+static void *
+leaker (void *arg)
+{
+  int fd = open (arg, O_RDONLY);
+  TEST_VERIFY_EXIT (fd > 0);
+  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, 0);
+  close (fd);
+  return NULL;
+}
+
+
+#define ITER_COUNT 1000
+#define MAX_FILENO 1024
+
+static int
+do_test (void)
+{
+  char *dir = support_create_temp_directory ("tst-cancel28");
+  char *name = xasprintf ("%s/fifo", dir);
+  TEST_COMPARE (mkfifo (name, 0600), 0);
+  add_temp_file (name);
+
+  struct support_descriptors *descrs = support_descriptors_list ();
+
+  srand (1);
+
+  xpthread_create (NULL, writeopener, name);
+  for (int i = 0; i < ITER_COUNT; i++)
+    {
+      pthread_t td = xpthread_create (NULL, leaker, name);
+      struct timespec ts =
+	{ .tv_nsec = rand () % 100000, .tv_sec = 0 };
+      nanosleep (&ts, NULL);
+      /* Ignore pthread_cancel result because it might be the
+	 case when pthread_cancel is called when thread is already
+	 exited.  */
+      pthread_cancel (td);
+      xpthread_join (td);
+    }
+
+  support_descriptors_check (descrs);
+
+  support_descriptors_free (descrs);
+
+  free (name);
+
+  return 0;
+}
+
+#define TIMEOUT 10
+#include <support/test-driver.c>
diff --git a/rt/Makefile b/rt/Makefile
index 6c8365e0c0..a7a9ec4cfc 100644
--- a/rt/Makefile
+++ b/rt/Makefile
@@ -56,7 +56,7 @@  include ../Rules
 CFLAGS-aio_suspend.c += -fexceptions
 CFLAGS-mq_timedreceive.c += -fexceptions -fasynchronous-unwind-tables
 CFLAGS-mq_timedsend.c += -fexceptions -fasynchronous-unwind-tables
-CFLAGS-librt-cancellation.c += -fasynchronous-unwind-tables
+CFLAGS-clock_nanosleep.c += -fexceptions -fasynchronous-unwind-tables
 
 LDFLAGS-rt.so = -Wl,--enable-new-dtags,-z,nodelete
 
diff --git a/sysdeps/generic/sysdep-cancel.h b/sysdeps/generic/sysdep-cancel.h
index d22a786536..5c84b4499a 100644
--- a/sysdeps/generic/sysdep-cancel.h
+++ b/sysdeps/generic/sysdep-cancel.h
@@ -3,5 +3,3 @@ 
 /* No multi-thread handling enabled.  */
 #define SINGLE_THREAD_P (1)
 #define RTLD_SINGLE_THREAD_P (1)
-#define LIBC_CANCEL_ASYNC()	0 /* Just a dummy value.  */
-#define LIBC_CANCEL_RESET(val)	((void)(val)) /* Nothing, but evaluate it.  */
diff --git a/sysdeps/htl/pthreadP.h b/sysdeps/htl/pthreadP.h
index e745dacac1..5628791de8 100644
--- a/sysdeps/htl/pthreadP.h
+++ b/sysdeps/htl/pthreadP.h
@@ -25,6 +25,7 @@ 
 
 extern pthread_t __pthread_self (void);
 extern int __pthread_kill (pthread_t threadid, int signo);
+extern int __pthread_kill_internal (pthread_t threadid, int signo);
 extern struct __pthread **__pthread_threads;
 
 extern int _pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
diff --git a/sysdeps/nptl/Makefile b/sysdeps/nptl/Makefile
index fbb9800a5c..19293a7b3e 100644
--- a/sysdeps/nptl/Makefile
+++ b/sysdeps/nptl/Makefile
@@ -21,8 +21,7 @@  libpthread-sysdep_routines += errno-loc
 endif
 
 ifeq ($(subdir),rt)
-librt-sysdep_routines += timer_routines librt-cancellation
-CFLAGS-librt-cancellation.c += -fexceptions -fasynchronous-unwind-tables
+librt-sysdep_routines += timer_routines
 
 tests += tst-mqueue8x
 CFLAGS-tst-mqueue8x.c += -fexceptions
diff --git a/sysdeps/nptl/cancellation-pc-check.h b/sysdeps/nptl/cancellation-pc-check.h
new file mode 100644
index 0000000000..8b26c4ec4e
--- /dev/null
+++ b/sysdeps/nptl/cancellation-pc-check.h
@@ -0,0 +1,40 @@ 
+/* Architecture specific code for pthread cancellation handling.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _NPTL_CANCELLATION_PC_CHECK
+#define _NPTL_CANCELLATION_PC_CHECK
+
+#include <sigcontextinfo.h>
+
+/* Check if the program counter (PC) from ucontext CTX is within the start and
+   then end boundary from the __syscall_cancel_arch bridge.  Return TRUE if
+   the PC is within the boundary, meaning the syscall does not have any side
+   effects; or FALSE otherwise.  */
+static bool
+ucontext_check_pc_boundary (void *ctx)
+{
+  /* Both are defined in syscall_cancel.S.  */
+  extern const char __syscall_cancel_arch_start[1];
+  extern const char __syscall_cancel_arch_end[1];
+
+  uintptr_t pc = sigcontext_get_pc (ctx);
+  return pc >= (uintptr_t) __syscall_cancel_arch_start
+	 && pc < (uintptr_t) __syscall_cancel_arch_end;
+}
+
+#endif
diff --git a/sysdeps/nptl/cancellation-sigmask.h b/sysdeps/nptl/cancellation-sigmask.h
new file mode 100644
index 0000000000..77702c135e
--- /dev/null
+++ b/sysdeps/nptl/cancellation-sigmask.h
@@ -0,0 +1,30 @@ 
+/* Architecture specific code for pthread cancellation handling.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _NPTL_CANCELLATION_SIGMASK_H
+#define _NPTL_CANCELLATION_SIGMASK_H
+
+/* Add the SIGCANCEL signal on sigmask set at the ucontext CTX obtained from
+   the sigaction handler.  */
+static void
+ucontext_add_cancel (void *ctx)
+{
+  __sigaddset (&((ucontext_t*) ctx)->uc_sigmask, SIGCANCEL);
+}
+
+#endif
diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
index 10468c7129..04f30b8cbc 100644
--- a/sysdeps/unix/sysdep.h
+++ b/sysdeps/unix/sysdep.h
@@ -24,6 +24,9 @@ 
 #define	SYSCALL__(name, args)	PSEUDO (__##name, name, args)
 #define	SYSCALL(name, args)	PSEUDO (name, name, args)
 
+#ifndef __ASSEMBLER__
+# include <errno.h>
+
 #define __SYSCALL_CONCAT_X(a,b)     a##b
 #define __SYSCALL_CONCAT(a,b)       __SYSCALL_CONCAT_X (a, b)
 
@@ -57,6 +60,29 @@ 
 #define INTERNAL_SYSCALL_CALL(...) \
   __INTERNAL_SYSCALL_DISP (__INTERNAL_SYSCALL, __VA_ARGS__)
 
+#define __INTERNAL_SYSCALL_NCS0(name, err) \
+  INTERNAL_SYSCALL_NCS (name, err, 0)
+#define __INTERNAL_SYSCALL_NCS1(name, err, a1) \
+  INTERNAL_SYSCALL_NCS (name, err, 1, a1)
+#define __INTERNAL_SYSCALL_NCS2(name, err, a1, a2) \
+  INTERNAL_SYSCALL_NCS (name, err, 2, a1, a2)
+#define __INTERNAL_SYSCALL_NCS3(name, err, a1, a2, a3) \
+  INTERNAL_SYSCALL_NCS (name, err, 3, a1, a2, a3)
+#define __INTERNAL_SYSCALL_NCS4(name, err, a1, a2, a3, a4) \
+  INTERNAL_SYSCALL_NCS (name, err, 4, a1, a2, a3, a4)
+#define __INTERNAL_SYSCALL_NCS5(name, err, a1, a2, a3, a4, a5) \
+  INTERNAL_SYSCALL_NCS (name, err, 5, a1, a2, a3, a4, a5)
+#define __INTERNAL_SYSCALL_NCS6(name, err, a1, a2, a3, a4, a5, a6) \
+  INTERNAL_SYSCALL_NCS (name, err, 6, a1, a2, a3, a4, a5, a6)
+#define __INTERNAL_SYSCALL_NCS7(name, err, a1, a2, a3, a4, a5, a6, a7) \
+  INTERNAL_SYSCALL_NCS (name, err, 7, a1, a2, a3, a4, a5, a6, a7)
+
+/* Issue a syscall defined by syscall number plus any other argument required.
+   It is similar to INTERNAL_SYSCALL_NCS macro, but without the need to pass
+   the expected argument number as third parameter.  */
+#define INTERNAL_SYSCALL_NCS_CALL(...) \
+  __INTERNAL_SYSCALL_DISP (__INTERNAL_SYSCALL_NCS, __VA_ARGS__)
+
 #define __INLINE_SYSCALL0(name) \
   INLINE_SYSCALL (name, 0)
 #define __INLINE_SYSCALL1(name, a1) \
@@ -88,35 +114,70 @@ 
 #define INLINE_SYSCALL_CALL(...) \
   __INLINE_SYSCALL_DISP (__INLINE_SYSCALL, __VA_ARGS__)
 
-#define SYSCALL_CANCEL(...) \
-  ({									     \
-    long int sc_ret;							     \
-    if (SINGLE_THREAD_P) 						     \
-      sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); 			     \
-    else								     \
-      {									     \
-	int sc_cancel_oldtype = LIBC_CANCEL_ASYNC ();			     \
-	sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__);			     \
-        LIBC_CANCEL_RESET (sc_cancel_oldtype);				     \
-      }									     \
-    sc_ret;								     \
-  })
 
-/* Issue a syscall defined by syscall number plus any other argument
-   required.  Any error will be returned unmodified (including errno).  */
-#define INTERNAL_SYSCALL_CANCEL(...) \
-  ({									     \
-    long int sc_ret;							     \
-    if (SINGLE_THREAD_P) 						     \
-      sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__); 			     \
-    else								     \
-      {									     \
-	int sc_cancel_oldtype = LIBC_CANCEL_ASYNC ();			     \
-	sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__);			     \
-        LIBC_CANCEL_RESET (sc_cancel_oldtype);				     \
-      }									     \
-    sc_ret;								     \
+/* Cancellation macros.  */
+#ifndef __SSC
+typedef long int __syscall_arg_t;
+# define __SSC(__x) ((__syscall_arg_t) (__x))
+#endif
+
+long int __syscall_cancel (__syscall_arg_t nr, __syscall_arg_t arg1,
+			   __syscall_arg_t arg2, __syscall_arg_t arg3,
+			   __syscall_arg_t arg4, __syscall_arg_t arg5,
+			   __syscall_arg_t arg6);
+libc_hidden_proto (__syscall_cancel);
+
+#define __SYSCALL_CANCEL0(name) \
+  (__syscall_cancel)(__NR_##name, 0, 0, 0, 0, 0, 0)
+#define __SYSCALL_CANCEL1(name, a1) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), 0, 0, 0, 0, 0)
+#define __SYSCALL_CANCEL2(name, a1, a2) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), __SSC(a2), 0, 0, 0, 0)
+#define __SYSCALL_CANCEL3(name, a1, a2, a3) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), __SSC(a2), __SSC(a3), 0, 0, 0)
+#define __SYSCALL_CANCEL4(name, a1, a2, a3, a4) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), __SSC(a2), __SSC(a3), \
+		     __SSC(a4), 0, 0)
+#define __SYSCALL_CANCEL5(name, a1, a2, a3, a4, a5) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), __SSC(a2), __SSC(a3), \
+		     __SSC(a4), __SSC(a5), 0)
+#define __SYSCALL_CANCEL6(name, a1, a2, a3, a4, a5, a6) \
+  (__syscall_cancel)(__NR_##name, __SSC(a1), __SSC(a2), __SSC(a3), \
+		     __SSC(a4), __SSC(a5), __SSC(a6))
+
+#define __SYSCALL_CANCEL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
+#define __SYSCALL_CANCEL_NARGS(...) \
+  __SYSCALL_CANCEL_NARGS_X (__VA_ARGS__,7,6,5,4,3,2,1,0,)
+#define __SYSCALL_CANCEL_CONCAT_X(a,b)     a##b
+#define __SYSCALL_CANCEL_CONCAT(a,b)       __SYSCALL_CANCEL_CONCAT_X (a, b)
+#define __SYSCALL_CANCEL_DISP(b,...) \
+  __SYSCALL_CANCEL_CONCAT (b,__SYSCALL_CANCEL_NARGS(__VA_ARGS__))(__VA_ARGS__)
+
+#define __SYSCALL_CANCEL_CALL(...) \
+  __SYSCALL_CANCEL_DISP (__SYSCALL_CANCEL, __VA_ARGS__)
+
+/* Issue a cancellable syscall defined by syscall number NAME plus any other
+   argument required.  If an error occurs its value is returned as an negative
+   number unmodified and errno is not set.  */
+#define INTERNAL_SYSCALL_CANCEL(name, args...) \
+  __SYSCALL_CANCEL_CALL (name, args)
+
+/* Issue a cancellable syscall defined first argument plus any other argument
+   required.  If and error occurs its value, the macro returns -1 and sets
+   errno accordingly.  */
+#if IS_IN (rtld)
+/* The loader does not need to handle thread cancellation, use direct
+   syscall instead.  */
+# define SYSCALL_CANCEL(...) INLINE_SYSCALL_CALL (__VA_ARGS__)
+#else
+# define SYSCALL_CANCEL(...) \
+  ({									\
+    long int sc_ret = __SYSCALL_CANCEL_CALL (__VA_ARGS__);		\
+    SYSCALL_CANCEL_RET ((sc_ret));					\
   })
+#endif
+
+#endif /* __ASSEMBLER__  */
 
 /* Machine-dependent sysdep.h files are expected to define the macro
    PSEUDO (function_name, syscall_name) to emit assembly code to define the
diff --git a/sysdeps/unix/sysv/linux/clock_nanosleep.c b/sysdeps/unix/sysv/linux/clock_nanosleep.c
index 1f240b8720..ddcdddcd5d 100644
--- a/sysdeps/unix/sysv/linux/clock_nanosleep.c
+++ b/sysdeps/unix/sysv/linux/clock_nanosleep.c
@@ -36,11 +36,9 @@  __clock_nanosleep (clockid_t clock_id, int flags, const struct timespec *req,
 
   /* If the call is interrupted by a signal handler or encounters an error,
      it returns a positive value similar to errno.  */
-  INTERNAL_SYSCALL_DECL (err);
-  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, err, clock_id, flags,
+  int r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep, clock_id, flags,
 				   req, rem);
-  return (INTERNAL_SYSCALL_ERROR_P (r, err)
-	  ? INTERNAL_SYSCALL_ERRNO (r, err) : 0);
+  return SYSCALL_CANCEL_ERROR (r) ? -r : 0;
 }
 
 versioned_symbol (libc, __clock_nanosleep, clock_nanosleep, GLIBC_2_17);
diff --git a/sysdeps/unix/sysv/linux/futex-internal.h b/sysdeps/unix/sysv/linux/futex-internal.h
index 5a4f4ff818..b41a4d9919 100644
--- a/sysdeps/unix/sysv/linux/futex-internal.h
+++ b/sysdeps/unix/sysv/linux/futex-internal.h
@@ -75,10 +75,7 @@  static __always_inline int
 futex_wait_cancelable (unsigned int *futex_word, unsigned int expected,
 		       int private)
 {
-  int oldtype;
-  oldtype = __pthread_enable_asynccancel ();
-  int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
-  __pthread_disable_asynccancel (oldtype);
+  int err = lll_futex_timed_wait_cancel (futex_word, expected, NULL, private);
   switch (err)
     {
     case 0:
@@ -129,10 +126,7 @@  futex_reltimed_wait_cancelable (unsigned int *futex_word,
 				unsigned int expected,
 			        const struct timespec *reltime, int private)
 {
-  int oldtype;
-  oldtype = LIBC_CANCEL_ASYNC ();
-  int err = lll_futex_timed_wait (futex_word, expected, reltime, private);
-  LIBC_CANCEL_RESET (oldtype);
+  int err = lll_futex_timed_wait_cancel (futex_word, expected, reltime, private);
   switch (err)
     {
     case 0:
@@ -203,12 +197,8 @@  futex_abstimed_wait_cancelable (unsigned int *futex_word,
      despite them being valid.  */
   if (__glibc_unlikely ((abstime != NULL) && (abstime->tv_sec < 0)))
     return ETIMEDOUT;
-  int oldtype;
-  oldtype = __pthread_enable_asynccancel ();
-  int err = lll_futex_clock_wait_bitset (futex_word, expected,
-					clockid, abstime,
-					private);
-  __pthread_disable_asynccancel (oldtype);
+  int err = lll_futex_clock_wait_bitset_cancel (futex_word, expected, clockid,
+						abstime, private);
   switch (err)
     {
     case 0:
diff --git a/sysdeps/unix/sysv/linux/lowlevellock-futex.h b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
index b423673ed4..bd6d3b7cf7 100644
--- a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
+++ b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
@@ -74,6 +74,12 @@ 
      ? -INTERNAL_SYSCALL_ERRNO (__ret, __err) : 0);                     \
   })
 
+#define lll_futex_syscall_cp(...)					\
+  ({                                                                    \
+    long int __ret = INTERNAL_SYSCALL_CANCEL (futex, __VA_ARGS__);	\
+    __ret;								\
+  })
+
 #define lll_futex_wait(futexp, val, private) \
   lll_futex_timed_wait (futexp, val, NULL, private)
 
@@ -148,19 +154,36 @@ 
 
 /* Cancellable futex macros.  */
 #define lll_futex_wait_cancel(futexp, val, private) \
-  ({                                                                   \
-    int __oldtype = CANCEL_ASYNC ();				       \
-    long int __err = lll_futex_wait (futexp, val, LLL_SHARED);	       \
-    CANCEL_RESET (__oldtype);					       \
-    __err;							       \
+  lll_futex_timed_wait_cancel (futexp, val, NULL, private)
+
+#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	\
+  ({									\
+    long int __ret;							\
+    int __op = FUTEX_WAIT;						\
+    __ret = lll_futex_syscall_cp (futexp,				\
+				  __lll_private_flag (__op, private),	\
+				  val, timeout);			\
+    __ret;								\
   })
 
-#define lll_futex_timed_wait_cancel(futexp, val, timeout, private)	   \
-  ({									   \
-    int __oldtype = CANCEL_ASYNC ();				       	   \
-    long int __err = lll_futex_timed_wait (futexp, val, timeout, private); \
-    CANCEL_RESET (__oldtype);						   \
-    __err;								   \
+#define lll_futex_clock_wait_bitset_cancel(futexp, val, clockid,	\
+					   timeout, private)		\
+  ({									\
+    long int __ret;							\
+    if (lll_futex_supported_clockid (clockid))                          \
+      {                                                                 \
+        const unsigned int clockbit =                                   \
+          (clockid == CLOCK_REALTIME) ? FUTEX_CLOCK_REALTIME : 0;       \
+        const int op =                                                  \
+          __lll_private_flag (FUTEX_WAIT_BITSET | clockbit, private);   \
+									\
+	__ret = lll_futex_syscall_cp (futexp, op, val,			\
+                                      timeout, NULL /* Unused.  */,	\
+                                      FUTEX_BITSET_MATCH_ANY);		\
+      }									\
+    else								\
+      __ret = -EINVAL;							\
+    __ret;								\
   })
 
 #endif  /* !__ASSEMBLER__  */
diff --git a/sysdeps/unix/sysv/linux/pthread_kill.c b/sysdeps/unix/sysv/linux/pthread_kill_internal.c
similarity index 75%
rename from sysdeps/unix/sysv/linux/pthread_kill.c
rename to sysdeps/unix/sysv/linux/pthread_kill_internal.c
index 71305b90c6..31920deae4 100644
--- a/sysdeps/unix/sysv/linux/pthread_kill.c
+++ b/sysdeps/unix/sysv/linux/pthread_kill_internal.c
@@ -16,24 +16,15 @@ 
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <errno.h>
-#include <signal.h>
-#include <pthreadP.h>
-#include <tls.h>
-#include <sysdep.h>
 #include <unistd.h>
+#include <pthreadP.h>
 
-
+/* Used internally by pthread_cancel, so we can't filter SIGCANCEL.  */
 int
-__pthread_kill (pthread_t threadid, int signo)
+__pthread_kill_internal (pthread_t threadid, int signo)
 {
   struct pthread *pd = (struct pthread *) threadid;
 
-  /* Make sure the descriptor is valid.  */
-  if (DEBUGGING_P && INVALID_TD_P (pd))
-    /* Not a valid thread handle.  */
-    return ESRCH;
-
   /* Force load of pd->tid into local variable or register.  Otherwise
      if a thread exits between ESRCH test and tgkill, we might return
      EINVAL, because pd->tid would be cleared by the kernel.  */
@@ -42,11 +33,6 @@  __pthread_kill (pthread_t threadid, int signo)
     /* Not a valid thread handle.  */
     return ESRCH;
 
-  /* Disallow sending the signal we use for cancellation, timers,
-     for the setxid implementation.  */
-  if (signo == SIGCANCEL || signo == SIGTIMER || signo == SIGSETXID)
-    return EINVAL;
-
   /* We have a special syscall to do the work.  */
   INTERNAL_SYSCALL_DECL (err);
 
@@ -56,4 +42,4 @@  __pthread_kill (pthread_t threadid, int signo)
   return (INTERNAL_SYSCALL_ERROR_P (val, err)
 	  ? INTERNAL_SYSCALL_ERRNO (val, err) : 0);
 }
-strong_alias (__pthread_kill, pthread_kill)
+hidden_def (__pthread_kill_internal)
diff --git a/sysdeps/unix/sysv/linux/socketcall.h b/sysdeps/unix/sysv/linux/socketcall.h
index 1e6387cdbf..38af55feb4 100644
--- a/sysdeps/unix/sysv/linux/socketcall.h
+++ b/sysdeps/unix/sysv/linux/socketcall.h
@@ -87,18 +87,32 @@ 
   })
 
 
-#if IS_IN (libc)
-# define __pthread_enable_asynccancel  __libc_enable_asynccancel
-# define __pthread_disable_asynccancel __libc_disable_asynccancel
-#endif
-
-#define SOCKETCALL_CANCEL(name, args...)				\
-  ({									\
-    int oldtype = LIBC_CANCEL_ASYNC ();					\
-    long int sc_ret = __SOCKETCALL (SOCKOP_##name, args);		\
-    LIBC_CANCEL_RESET (oldtype);					\
-    sc_ret;								\
-  })
-
+#define __SOCKETCALL_CANCEL1(__name, __a1) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [1]) { (long int) __a1 }))
+#define __SOCKETCALL_CANCEL2(__name, __a1, __a2) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [2]) { (long int) __a1, (long int) __a2 }))
+#define __SOCKETCALL_CANCEL3(__name, __a1, __a2, __a3) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [3]) { (long int) __a1, (long int) __a2, (long int) __a3 }))
+#define __SOCKETCALL_CANCEL4(__name, __a1, __a2, __a3, __a4) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [4]) { (long int) __a1, (long int) __a2, (long int) __a3, \
+                       (long int) __a4 }))
+#define __SOCKETCALL_CANCEL5(__name, __a1, __a2, __a3, __a4, __a5) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [5]) { (long int) __a1, (long int) __a2, (long int) __a3, \
+                       (long int) __a4, (long int) __a5 }))
+#define __SOCKETCALL_CANCEL6(__name, __a1, __a2, __a3, __a4, __a5, __a6) \
+  SYSCALL_CANCEL (socketcall, __name, \
+     ((long int [6]) { (long int) __a1, (long int) __a2, (long int) __a3, \
+                       (long int) __a4, (long int) __a5, (long int) __a6 }))
+
+#define __SOCKETCALL_CANCEL(...) __SOCKETCALL_DISP (__SOCKETCALL_CANCEL,\
+						    __VA_ARGS__)
+
+#define SOCKETCALL_CANCEL(name, args...) \
+   __SOCKETCALL_CANCEL (SOCKOP_##name, args)
 
 #endif /* sys/socketcall.h */
diff --git a/sysdeps/unix/sysv/linux/syscall_cancel.c b/sysdeps/unix/sysv/linux/syscall_cancel.c
new file mode 100644
index 0000000000..79d66ec4f4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/syscall_cancel.c
@@ -0,0 +1,62 @@ 
+/* Default cancellation syscall bridge.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <pthreadP.h>
+
+/* This is the generic version of the cancellable syscall code which
+   adds the label guards (__syscall_cancel_arch_{start,end}) used
+   on SIGCANCEL sigcancel_handler (nptl-init.c) to check if the cancelled
+   syscall have side-effects that need to be signaled to program.
+
+   This implementation should be used a reference one to document the
+   implementation constraints: the __syscall_cancel_arch_end should point
+   to the immediate next instruction after the syscall one.  This is because
+   kernel will signal interrupted syscall with side effects by setting
+   the signal frame program counter (on the ucontext_t third argument from
+   SA_SIGINFO signal handler) right after the syscall instruction.
+
+   If the INTERNAL_SYSCALL_NCS macro use more instructions to get the
+   error condition from kernel (as for powerpc and sparc), uses an
+   out of the line helper (as for ARM thumb), or uses a kernel helper
+   gate (as for i686 or ia64) the architecture should adjust the
+   macro or provide a custom __syscall_cancel_arch implementation.   */
+long int
+__syscall_cancel_arch (volatile int *ch, __syscall_arg_t nr,
+		       __syscall_arg_t a1, __syscall_arg_t a2,
+		       __syscall_arg_t a3, __syscall_arg_t a4,
+		       __syscall_arg_t a5, __syscall_arg_t a6)
+{
+#define ADD_LABEL(__label)		\
+  asm volatile (			\
+    ".global " __label "\t\n"		\
+    ".type " __label ",\%function\t\n" 	\
+    __label ":\n");
+
+  ADD_LABEL ("__syscall_cancel_arch_start");
+  if (__glibc_unlikely (*ch & CANCELED_BITMASK))
+    __syscall_do_cancel();
+
+  INTERNAL_SYSCALL_DECL(err);
+  long int result = INTERNAL_SYSCALL_NCS (nr, err, 6, a1, a2, a3, a4, a5, a6);
+  ADD_LABEL ("__syscall_cancel_arch_end");
+  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (result, err)))
+    return -INTERNAL_SYSCALL_ERRNO (result, err);
+  return result;
+}
+libc_hidden_def (__syscall_cancel_arch)
diff --git a/sysdeps/unix/sysv/linux/sysdep.h b/sysdeps/unix/sysv/linux/sysdep.h
index fc9af51456..317bb7f973 100644
--- a/sysdeps/unix/sysv/linux/sysdep.h
+++ b/sysdeps/unix/sysv/linux/sysdep.h
@@ -27,6 +27,26 @@ 
     -1l;					\
   })
 
+/* Check error from cancellable syscall and set errno accordingly.
+   Linux uses a negative return value to indicate syscall errors
+   and since version 2.1 the return value of a system call might be
+   negative even if the call succeeded (e.g., the `lseek' system call
+   might return a large offset).
+   Current contract is kernel make sure the no syscall returns a value
+   in -1 .. -4095 as a valid result so we can savely test with -4095.  */
+#define SYSCALL_CANCEL_ERROR(__ret)		\
+  (__ret > -4096UL)
+
+#define SYSCALL_CANCEL_RET(__ret)		\
+  ({						\
+    if (SYSCALL_CANCEL_ERROR(__ret))		\
+      {						\
+	__set_errno (-__ret);			\
+	__ret = -1;				\
+      }						\
+    __ret;					\
+   })
+
 /* Provide a dummy argument that can be used to force register
    alignment for register pairs if required by the syscall ABI.  */
 #ifdef __ASSUME_ALIGNED_REGISTER_PAIRS