[v2,10/19] nptl: Use tidlock when accessing TID on pthread_getaffinity_np

Message ID 20210823195047.543237-11-adhemerval.zanella@linaro.org
State Superseded
Headers
Series Fix various NPTL synchronization issues |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Aug. 23, 2021, 7:50 p.m. UTC
  Checked on x86_64-linux-gnu.
---
 nptl/pthread_getaffinity.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
  

Comments

Florian Weimer Aug. 26, 2021, 2:24 p.m. UTC | #1
* Adhemerval Zanella:

> Checked on x86_64-linux-gnu.
> ---
>  nptl/pthread_getaffinity.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/nptl/pthread_getaffinity.c b/nptl/pthread_getaffinity.c
> index 18261ddae0..5268d86e6e 100644
> --- a/nptl/pthread_getaffinity.c
> +++ b/nptl/pthread_getaffinity.c
> @@ -29,12 +29,24 @@
>  int
>  __pthread_getaffinity_np (pthread_t th, size_t cpusetsize, cpu_set_t *cpuset)
>  {
> -  const struct pthread *pd = (const struct pthread *) th;
> +  struct pthread *pd = (struct pthread *) th;
>  
> -  int res = INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
> -				   MIN (INT_MAX, cpusetsize), cpuset);
> -  if (INTERNAL_SYSCALL_ERROR_P (res))
> -    return INTERNAL_SYSCALL_ERRNO (res);
> +  /* Block all signal, since the lock is recursive and used on pthread_cancel
> +     (which should be async-signal-safe).  */

The pthread_cancel reference looks like a cut-and-paste-bug.

> +  sigset_t oldmask;
> +  __libc_signal_block_all (&oldmask);
> +  lll_lock (pd->tidlock, LLL_PRIVATE);
> +
> +  int res = pd->tid == 0
> +	    ? -ESRCH
> +	    : INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
> +				     MIN (INT_MAX, cpusetsize), cpuset);
> +
> +  lll_unlock (pd->tidlock, LLL_PRIVATE);
> +  __libc_signal_restore_set (&oldmask);
> +
> +  if (res < 0)
> +    return -res;

ESRCH doesn't look like the right error code here.  Should we return an
affinity mask without any bits set?

Thanks,
Florian
  
Adhemerval Zanella Aug. 26, 2021, 5:29 p.m. UTC | #2
On 26/08/2021 11:24, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> Checked on x86_64-linux-gnu.
>> ---
>>  nptl/pthread_getaffinity.c | 22 +++++++++++++++++-----
>>  1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/nptl/pthread_getaffinity.c b/nptl/pthread_getaffinity.c
>> index 18261ddae0..5268d86e6e 100644
>> --- a/nptl/pthread_getaffinity.c
>> +++ b/nptl/pthread_getaffinity.c
>> @@ -29,12 +29,24 @@
>>  int
>>  __pthread_getaffinity_np (pthread_t th, size_t cpusetsize, cpu_set_t *cpuset)
>>  {
>> -  const struct pthread *pd = (const struct pthread *) th;
>> +  struct pthread *pd = (struct pthread *) th;
>>  
>> -  int res = INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
>> -				   MIN (INT_MAX, cpusetsize), cpuset);
>> -  if (INTERNAL_SYSCALL_ERROR_P (res))
>> -    return INTERNAL_SYSCALL_ERRNO (res);
>> +  /* Block all signal, since the lock is recursive and used on pthread_cancel
>> +     (which should be async-signal-safe).  */
> 
> The pthread_cancel reference looks like a cut-and-paste-bug.

It is, but I think it is applicable. Maybe:

  /* Block all signal, since the lock is not recursive and used on                                                                                             
     async-signal-safe functions. */   

> 
>> +  sigset_t oldmask;
>> +  __libc_signal_block_all (&oldmask);
>> +  lll_lock (pd->tidlock, LLL_PRIVATE);
>> +
>> +  int res = pd->tid == 0
>> +	    ? -ESRCH
>> +	    : INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
>> +				     MIN (INT_MAX, cpusetsize), cpuset);
>> +
>> +  lll_unlock (pd->tidlock, LLL_PRIVATE);
>> +  __libc_signal_restore_set (&oldmask);
>> +
>> +  if (res < 0)
>> +    return -res;
> 
> ESRCH doesn't look like the right error code here.  Should we return an
> affinity mask without any bits set?

Why not? Returning anything but an error does not improve things here
since the information won't make much sense.  Also it follows what other
symbols is already doing (such as pthread_cancel()).

> 
> Thanks,
> Florian
>
  
Florian Weimer Aug. 30, 2021, 9:30 a.m. UTC | #3
* Adhemerval Zanella:

>> The pthread_cancel reference looks like a cut-and-paste-bug.
>
> It is, but I think it is applicable. Maybe:
>
>   /* Block all signal, since the lock is not recursive and used on                                                                                             
>      async-signal-safe functions. */

Just mention async-signal-safe, please.  I do not think recursive locks
can achieve async-signal-safety.

>>> +  sigset_t oldmask;
>>> +  __libc_signal_block_all (&oldmask);
>>> +  lll_lock (pd->tidlock, LLL_PRIVATE);
>>> +
>>> +  int res = pd->tid == 0
>>> +	    ? -ESRCH
>>> +	    : INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
>>> +				     MIN (INT_MAX, cpusetsize), cpuset);
>>> +
>>> +  lll_unlock (pd->tidlock, LLL_PRIVATE);
>>> +  __libc_signal_restore_set (&oldmask);
>>> +
>>> +  if (res < 0)
>>> +    return -res;
>> 
>> ESRCH doesn't look like the right error code here.  Should we return an
>> affinity mask without any bits set?
>
> Why not? Returning anything but an error does not improve things here
> since the information won't make much sense.  Also it follows what other
> symbols is already doing (such as pthread_cancel()).

POSIX reserves ESRCH for using a pthread_t thread ID outside of the
thread lifetime.  In glibc, this is always undefined.  In contrast, the
case above involves a valid thread ID of a thread that has exited.
Returning ESRCH suggests that the thread lifetime has ended, which is
not true.

Thanks,
Florian
  

Patch

diff --git a/nptl/pthread_getaffinity.c b/nptl/pthread_getaffinity.c
index 18261ddae0..5268d86e6e 100644
--- a/nptl/pthread_getaffinity.c
+++ b/nptl/pthread_getaffinity.c
@@ -29,12 +29,24 @@ 
 int
 __pthread_getaffinity_np (pthread_t th, size_t cpusetsize, cpu_set_t *cpuset)
 {
-  const struct pthread *pd = (const struct pthread *) th;
+  struct pthread *pd = (struct pthread *) th;
 
-  int res = INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
-				   MIN (INT_MAX, cpusetsize), cpuset);
-  if (INTERNAL_SYSCALL_ERROR_P (res))
-    return INTERNAL_SYSCALL_ERRNO (res);
+  /* Block all signal, since the lock is recursive and used on pthread_cancel
+     (which should be async-signal-safe).  */
+  sigset_t oldmask;
+  __libc_signal_block_all (&oldmask);
+  lll_lock (pd->tidlock, LLL_PRIVATE);
+
+  int res = pd->tid == 0
+	    ? -ESRCH
+	    : INTERNAL_SYSCALL_CALL (sched_getaffinity, pd->tid,
+				     MIN (INT_MAX, cpusetsize), cpuset);
+
+  lll_unlock (pd->tidlock, LLL_PRIVATE);
+  __libc_signal_restore_set (&oldmask);
+
+  if (res < 0)
+    return -res;
 
   /* Clean the rest of the memory the kernel didn't do.  */
   memset ((char *) cpuset + res, '\0', cpusetsize - res);