[3/4] tst-pidfd.c: Test is UNSUPPORTED without PTRACE_MODE_ATTACH_REALCREDS

Message ID 20220626205915.33201-4-mark@klomp.org
State Superseded
Headers
Series [1/4] time/tst-clock2.c: clock_settime CLOCK_MONOTONIC might return EPERM |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Mark Wielaard June 26, 2022, 8:59 p.m. UTC
  pidfd_getfd will fail with errno EPERM if the calling process did not
have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
in that case.
---
 sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer June 26, 2022, 9:20 p.m. UTC | #1
* Mark Wielaard:

> pidfd_getfd will fail with errno EPERM if the calling process did not
> have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> in that case.
> ---
>  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
> index d93b6faa6f..28349b2f91 100644
> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> @@ -95,8 +95,10 @@ do_test (void)
>         kernel has pidfd support that we can test.  */
>      int r = pidfd_getfd (0, 0, 1);
>      TEST_VERIFY_EXIT (r == -1);
> -    if (errno == ENOSYS)
> -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
> +    if (errno == ENOSYS || errno == EPERM)
> +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> +			" or we don't have PTRACE_MODE_ATTACH_REALCREDS"
> +			" permissions, skipping test");
>    }

This also hints towards a broken seccomp filter.
  
Mark Wielaard June 27, 2022, 10:01 a.m. UTC | #2
Hi Florian,

On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > pidfd_getfd will fail with errno EPERM if the calling process did
> > not
> > have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> > in that case.
> > ---
> >  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
> > b/sysdeps/unix/sysv/linux/tst-pidfd.c
> > index d93b6faa6f..28349b2f91 100644
> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> > @@ -95,8 +95,10 @@ do_test (void)
> >         kernel has pidfd support that we can test.  */
> >      int r = pidfd_getfd (0, 0, 1);
> >      TEST_VERIFY_EXIT (r == -1);
> > -    if (errno == ENOSYS)
> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
> > skipping test");
> > +    if (errno == ENOSYS || errno == EPERM)
> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> > +			" or we don't have
> > PTRACE_MODE_ATTACH_REALCREDS"
> > +			" permissions, skipping test");
> >    }
> 
> This also hints towards a broken seccomp filter.

pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
(which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
doesn't have. If the process doesn't then pidfd_getfd is defined as
failing and setting errno to EPERM.

But it is confusing I test for errno == ENOSYS || errno == EPERM. I'll
turn that into separate tests and FAIL_UNSUPPORTED messages, so it is
clear which is which in v2.

Thanks,

Mark
  
Florian Weimer June 27, 2022, 11:14 a.m. UTC | #3
* Mark Wielaard:

> Hi Florian,
>
> On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
>> * Mark Wielaard:
>> 
>> > pidfd_getfd will fail with errno EPERM if the calling process did
>> > not
>> > have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
>> > in that case.
>> > ---
>> >  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
>> >  1 file changed, 4 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
>> > b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> > index d93b6faa6f..28349b2f91 100644
>> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
>> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> > @@ -95,8 +95,10 @@ do_test (void)
>> >         kernel has pidfd support that we can test.  */
>> >      int r = pidfd_getfd (0, 0, 1);
>> >      TEST_VERIFY_EXIT (r == -1);
>> > -    if (errno == ENOSYS)
>> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
>> > skipping test");
>> > +    if (errno == ENOSYS || errno == EPERM)
>> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
>> > +			" or we don't have
>> > PTRACE_MODE_ATTACH_REALCREDS"
>> > +			" permissions, skipping test");
>> >    }
>> 
>> This also hints towards a broken seccomp filter.
>
> pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> doesn't have. If the process doesn't then pidfd_getfd is defined as
> failing and setting errno to EPERM.

But what does it mean for a process to have PTRACE_MODE_REALCREDS?
Don't we have to acquire that in some way, and can we check if *that*
fails?

Thanks,
Florian
  
Christian Brauner June 27, 2022, 11:51 a.m. UTC | #4
On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
> * Mark Wielaard:
> 
> > Hi Florian,
> >
> > On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
> >> * Mark Wielaard:
> >> 
> >> > pidfd_getfd will fail with errno EPERM if the calling process did
> >> > not
> >> > have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> >> > in that case.
> >> > ---
> >> >  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
> >> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> > b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> > index d93b6faa6f..28349b2f91 100644
> >> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> > @@ -95,8 +95,10 @@ do_test (void)
> >> >         kernel has pidfd support that we can test.  */
> >> >      int r = pidfd_getfd (0, 0, 1);
> >> >      TEST_VERIFY_EXIT (r == -1);
> >> > -    if (errno == ENOSYS)
> >> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
> >> > skipping test");
> >> > +    if (errno == ENOSYS || errno == EPERM)
> >> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> >> > +			" or we don't have
> >> > PTRACE_MODE_ATTACH_REALCREDS"
> >> > +			" permissions, skipping test");
> >> >    }
> >> 
> >> This also hints towards a broken seccomp filter.
> >
> > pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> > syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> > (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> > doesn't have. If the process doesn't then pidfd_getfd is defined as
> > failing and setting errno to EPERM.
> 
> But what does it mean for a process to have PTRACE_MODE_REALCREDS?

#define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)

PTRACE_MODE_ATTACH_REALCREDS means
* PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
  for permission checking:

So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
the target task's (ptracee's) effective, save, and real [g,u]id then
you'passed the first stage of permission checking.

If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
That will also get you past the first state of permission checking.

If both don't apply the request is denied with -EPERM.

The second stage of permission checking is:
(1) If the task isn't dumpable then you'll need CAP_SYS_PTRACE in the
    ptracee's mm user namespace. That userns may differ from the
    ptracee's current userns if the ptracee's userns wasn't privileged
    over the inode of the file it exec'ed and thus could be an ancestor
    userns (see would_dump()).
(2) The LSMs might deny your request.

Christian
  
Mark Wielaard June 27, 2022, 2:17 p.m. UTC | #5
Hi,

On Mon, Jun 27, 2022 at 01:51:41PM +0200, Christian Brauner wrote:
> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
> > * Mark Wielaard:
> > > pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> > > syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> > > (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> > > doesn't have. If the process doesn't then pidfd_getfd is defined as
> > > failing and setting errno to EPERM.
> > 
> > But what does it mean for a process to have PTRACE_MODE_REALCREDS?
> 
> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
> 
> PTRACE_MODE_ATTACH_REALCREDS means
> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
>   for permission checking:
> 
> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
> the target task's (ptracee's) effective, save, and real [g,u]id then
> you'passed the first stage of permission checking.
> 
> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
> That will also get you past the first state of permission checking.
> 
> If both don't apply the request is denied with -EPERM.
> 
> The second stage of permission checking is:
> (1) If the task isn't dumpable then you'll need CAP_SYS_PTRACE in the
>     ptracee's mm user namespace. That userns may differ from the
>     ptracee's current userns if the ptracee's userns wasn't privileged
>     over the inode of the file it exec'ed and thus could be an ancestor
>     userns (see would_dump()).
> (2) The LSMs might deny your request.

Right, it is pretty difficult to determine why you got an EPERM.  But
failing with EPERM is documented behaviour for pidfd_getfd, so IMHO if
we get EPERM we should simply skip the test whatever the "real" reason
is.

The ptrace manual page lists the whole (6 step) algorithm under
"Ptrace access mode checking":
https://man7.org/linux/man-pages/man2/ptrace.2.html

It also depends on (not) having CAP_SYS_PTRACE and the permission
might be dependend on /proc/sys/kernel/yama/ptrace_scope

Cheers,

Mark
  
Adhemerval Zanella Netto June 27, 2022, 2:21 p.m. UTC | #6
> On 27 Jun 2022, at 11:17, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi,
> 
> On Mon, Jun 27, 2022 at 01:51:41PM +0200, Christian Brauner wrote:
>> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
>>> * Mark Wielaard:
>>>> pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
>>>> syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
>>>> (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
>>>> doesn't have. If the process doesn't then pidfd_getfd is defined as
>>>> failing and setting errno to EPERM.
>>> 
>>> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
>> 
>> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
>> 
>> PTRACE_MODE_ATTACH_REALCREDS means
>> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
>>  for permission checking:
>> 
>> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
>> the target task's (ptracee's) effective, save, and real [g,u]id then
>> you'passed the first stage of permission checking.
>> 
>> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
>> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
>> That will also get you past the first state of permission checking.
>> 
>> If both don't apply the request is denied with -EPERM.
>> 
>> The second stage of permission checking is:
>> (1) If the task isn't dumpable then you'll need CAP_SYS_PTRACE in the
>>    ptracee's mm user namespace. That userns may differ from the
>>    ptracee's current userns if the ptracee's userns wasn't privileged
>>    over the inode of the file it exec'ed and thus could be an ancestor
>>    userns (see would_dump()).
>> (2) The LSMs might deny your request.
> 
> Right, it is pretty difficult to determine why you got an EPERM.  But
> failing with EPERM is documented behaviour for pidfd_getfd, so IMHO if
> we get EPERM we should simply skip the test whatever the "real" reason
> is.

I agree, this is possible return code for the syscall so we should
handle it on testcase.

> 
> The ptrace manual page lists the whole (6 step) algorithm under
> "Ptrace access mode checking":
> https://man7.org/linux/man-pages/man2/ptrace.2.html
> 
> It also depends on (not) having CAP_SYS_PTRACE and the permission
> might be dependend on /proc/sys/kernel/yama/ptrace_scope
> 
> Cheers,
> 
> Mark
>
  
Adhemerval Zanella Netto June 27, 2022, 2:23 p.m. UTC | #7
> On 26 Jun 2022, at 17:59, Mark Wielaard <mark@klomp.org> wrote:
> 
> pidfd_getfd will fail with errno EPERM if the calling process did not
> have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> in that case.

From the discussion this is a possible error for the syscall.  However
I think it would be good print different errors for ENOSYS and EPERM.


> ---
> sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
> index d93b6faa6f..28349b2f91 100644
> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> @@ -95,8 +95,10 @@ do_test (void)
>        kernel has pidfd support that we can test.  */
>     int r = pidfd_getfd (0, 0, 1);
>     TEST_VERIFY_EXIT (r == -1);
> -    if (errno == ENOSYS)
> -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
> +    if (errno == ENOSYS || errno == EPERM)
> +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> +			" or we don't have PTRACE_MODE_ATTACH_REALCREDS"
> +			" permissions, skipping test");

Keep the ENOSYS error message and add EPERM.

>   }
> 
>   ppid = getpid ();
> -- 
> 2.30.2
>
  
Christian Brauner June 27, 2022, 2:25 p.m. UTC | #8
On Mon, Jun 27, 2022 at 04:17:17PM +0200, Mark Wielaard wrote:
> Hi,
> 
> On Mon, Jun 27, 2022 at 01:51:41PM +0200, Christian Brauner wrote:
> > On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
> > > * Mark Wielaard:
> > > > pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> > > > syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> > > > (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> > > > doesn't have. If the process doesn't then pidfd_getfd is defined as
> > > > failing and setting errno to EPERM.
> > > 
> > > But what does it mean for a process to have PTRACE_MODE_REALCREDS?
> > 
> > #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
> > 
> > PTRACE_MODE_ATTACH_REALCREDS means
> > * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
> >   for permission checking:
> > 
> > So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
> > the target task's (ptracee's) effective, save, and real [g,u]id then
> > you'passed the first stage of permission checking.
> > 
> > If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
> > real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
> > That will also get you past the first state of permission checking.
> > 
> > If both don't apply the request is denied with -EPERM.
> > 
> > The second stage of permission checking is:
> > (1) If the task isn't dumpable then you'll need CAP_SYS_PTRACE in the
> >     ptracee's mm user namespace. That userns may differ from the
> >     ptracee's current userns if the ptracee's userns wasn't privileged
> >     over the inode of the file it exec'ed and thus could be an ancestor
> >     userns (see would_dump()).
> > (2) The LSMs might deny your request.
> 
> Right, it is pretty difficult to determine why you got an EPERM.  But
> failing with EPERM is documented behaviour for pidfd_getfd, so IMHO if
> we get EPERM we should simply skip the test whatever the "real" reason
> is.
> 
> The ptrace manual page lists the whole (6 step) algorithm under
> "Ptrace access mode checking":
> https://man7.org/linux/man-pages/man2/ptrace.2.html
> 
> It also depends on (not) having CAP_SYS_PTRACE and the permission
> might be dependend on /proc/sys/kernel/yama/ptrace_scope

Yeah, to be frank, I consider ptrace_may_access() to be the worst piece
of permission checking code in the kernel. It just takes and shakes so
many core concepts that it's hard to understand and difficult to figure
out what exactly caused it to fail.
  
Florian Weimer June 27, 2022, 2:42 p.m. UTC | #9
* Christian Brauner:

> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
>> * Mark Wielaard:
>> 
>> > Hi Florian,
>> >
>> > On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
>> >> * Mark Wielaard:
>> >> 
>> >> > pidfd_getfd will fail with errno EPERM if the calling process did
>> >> > not
>> >> > have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
>> >> > in that case.
>> >> > ---
>> >> >  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
>> >> >  1 file changed, 4 insertions(+), 2 deletions(-)
>> >> > 
>> >> > diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
>> >> > b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> >> > index d93b6faa6f..28349b2f91 100644
>> >> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
>> >> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
>> >> > @@ -95,8 +95,10 @@ do_test (void)
>> >> >         kernel has pidfd support that we can test.  */
>> >> >      int r = pidfd_getfd (0, 0, 1);
>> >> >      TEST_VERIFY_EXIT (r == -1);
>> >> > -    if (errno == ENOSYS)
>> >> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
>> >> > skipping test");
>> >> > +    if (errno == ENOSYS || errno == EPERM)
>> >> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
>> >> > +			" or we don't have
>> >> > PTRACE_MODE_ATTACH_REALCREDS"
>> >> > +			" permissions, skipping test");
>> >> >    }
>> >> 
>> >> This also hints towards a broken seccomp filter.
>> >
>> > pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
>> > syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
>> > (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
>> > doesn't have. If the process doesn't then pidfd_getfd is defined as
>> > failing and setting errno to EPERM.
>> 
>> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
>
> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
>
> PTRACE_MODE_ATTACH_REALCREDS means
> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
>   for permission checking:
>
> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
> the target task's (ptracee's) effective, save, and real [g,u]id then
> you'passed the first stage of permission checking.
>
> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
> That will also get you past the first state of permission checking.
>
> If both don't apply the request is denied with -EPERM.

I think this doesn't apply to the

  pidfd_getfd (0, 0, 1)

call because the arguments are invalid and we never get to the kernel
permission check.  I thought this might be a self-get call, but that's
clearly not the case (0 does not refer to a process file descriptor).

So the comment should not mention PTRACE_MODE_ATTACH_REALCREDS at all
because it's really not relevant to the situation.  This is just another
case of broken seccomp filters.  So the usual discussion whether we
should paper over that or not in the test suite applies here as well.

Thanks,
Florian
  
Adhemerval Zanella Netto June 27, 2022, 2:57 p.m. UTC | #10
> On 27 Jun 2022, at 11:42, Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> * Christian Brauner:
> 
>> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
>>> * Mark Wielaard:
>>> 
>>>> Hi Florian,
>>>> 
>>>> On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
>>>>> * Mark Wielaard:
>>>>> 
>>>>>> pidfd_getfd will fail with errno EPERM if the calling process did
>>>>>> not
>>>>>> have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
>>>>>> in that case.
>>>>>> ---
>>>>>> sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
>>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>> b/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>> index d93b6faa6f..28349b2f91 100644
>>>>>> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>> @@ -95,8 +95,10 @@ do_test (void)
>>>>>> kernel has pidfd support that we can test. */
>>>>>> int r = pidfd_getfd (0, 0, 1);
>>>>>> TEST_VERIFY_EXIT (r == -1);
>>>>>> - if (errno == ENOSYS)
>>>>>> - FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
>>>>>> skipping test");
>>>>>> + if (errno == ENOSYS || errno == EPERM)
>>>>>> + FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
>>>>>> +			" or we don't have
>>>>>> PTRACE_MODE_ATTACH_REALCREDS"
>>>>>> +			" permissions, skipping test");
>>>>>> }
>>>>> 
>>>>> This also hints towards a broken seccomp filter.
>>>> 
>>>> pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
>>>> syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
>>>> (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
>>>> doesn't have. If the process doesn't then pidfd_getfd is defined as
>>>> failing and setting errno to EPERM.
>>> 
>>> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
>> 
>> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
>> 
>> PTRACE_MODE_ATTACH_REALCREDS means
>> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
>> for permission checking:
>> 
>> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
>> the target task's (ptracee's) effective, save, and real [g,u]id then
>> you'passed the first stage of permission checking.
>> 
>> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
>> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
>> That will also get you past the first state of permission checking.
>> 
>> If both don't apply the request is denied with -EPERM.
> 
> I think this doesn't apply to the
> 
> pidfd_getfd (0, 0, 1)

The init has PPID 0, which leads to the idea pid argument 0 is valid.
So fdget won’t fail and EPERM will be returned by pidfd_pid.  Maybe
use pidfd_getfd (TYPE_MAXIMUM (pid_t), 0, 1) instead to trigger a
EBADF.

> 
> call because the arguments are invalid and we never get to the kernel
> permission check. I thought this might be a self-get call, but that's
> clearly not the case (0 does not refer to a process file descriptor).
> 
> So the comment should not mention PTRACE_MODE_ATTACH_REALCREDS at all
> because it's really not relevant to the situation. This is just another
> case of broken seccomp filters. So the usual discussion whether we
> should paper over that or not in the test suite applies here as well.
> 
> Thanks,
> Florian
  
Christian Brauner June 27, 2022, 3:03 p.m. UTC | #11
On Mon, Jun 27, 2022 at 04:42:32PM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
> >> * Mark Wielaard:
> >> 
> >> > Hi Florian,
> >> >
> >> > On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
> >> >> * Mark Wielaard:
> >> >> 
> >> >> > pidfd_getfd will fail with errno EPERM if the calling process did
> >> >> > not
> >> >> > have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> >> >> > in that case.
> >> >> > ---
> >> >> >  sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
> >> >> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >> >> > 
> >> >> > diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> >> > b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> >> > index d93b6faa6f..28349b2f91 100644
> >> >> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> >> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >> >> > @@ -95,8 +95,10 @@ do_test (void)
> >> >> >         kernel has pidfd support that we can test.  */
> >> >> >      int r = pidfd_getfd (0, 0, 1);
> >> >> >      TEST_VERIFY_EXIT (r == -1);
> >> >> > -    if (errno == ENOSYS)
> >> >> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
> >> >> > skipping test");
> >> >> > +    if (errno == ENOSYS || errno == EPERM)
> >> >> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> >> >> > +			" or we don't have
> >> >> > PTRACE_MODE_ATTACH_REALCREDS"
> >> >> > +			" permissions, skipping test");
> >> >> >    }
> >> >> 
> >> >> This also hints towards a broken seccomp filter.
> >> >
> >> > pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> >> > syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> >> > (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> >> > doesn't have. If the process doesn't then pidfd_getfd is defined as
> >> > failing and setting errno to EPERM.
> >> 
> >> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
> >
> > #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
> >
> > PTRACE_MODE_ATTACH_REALCREDS means
> > * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
> >   for permission checking:
> >
> > So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
> > the target task's (ptracee's) effective, save, and real [g,u]id then
> > you'passed the first stage of permission checking.
> >
> > If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
> > real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
> > That will also get you past the first state of permission checking.
> >
> > If both don't apply the request is denied with -EPERM.
> 
> I think this doesn't apply to the
> 
>   pidfd_getfd (0, 0, 1)
> 
> call because the arguments are invalid and we never get to the kernel
> permission check.  I thought this might be a self-get call, but that's
> clearly not the case (0 does not refer to a process file descriptor).
> 
> So the comment should not mention PTRACE_MODE_ATTACH_REALCREDS at all
> because it's really not relevant to the situation.  This is just another
> case of broken seccomp filters.  So the usual discussion whether we
> should paper over that or not in the test suite applies here as well.

Yeah, that absolutely is a seccomp filter issue. Your example should
give you EINVAL.

Btw, I think you want to use pidfd_get(-EBADF, 0, 0). That isn't as
error prone as passing 1 in the flags argument.

And you should be able to always expect -EBADF for this. I consider this
to be API in all honesty. IOW, if additional arguments are sane and you
pass an invalid fd to any pidfd call then they should give you -EBADF
back.

So based on this assumption you could say that if you don't get -ENOSYS
and don't get -EBADF back then this must be seccomp or something else...

Christian
  
Christian Brauner June 27, 2022, 3:08 p.m. UTC | #12
On Mon, Jun 27, 2022 at 11:57:02AM -0300, Adhemerval Zanella wrote:
> 
> 
> > On 27 Jun 2022, at 11:42, Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> wrote:
> > 
> > * Christian Brauner:
> > 
> >> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
> >>> * Mark Wielaard:
> >>> 
> >>>> Hi Florian,
> >>>> 
> >>>> On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
> >>>>> * Mark Wielaard:
> >>>>> 
> >>>>>> pidfd_getfd will fail with errno EPERM if the calling process did
> >>>>>> not
> >>>>>> have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
> >>>>>> in that case.
> >>>>>> ---
> >>>>>> sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
> >>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >>>>>> b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >>>>>> index d93b6faa6f..28349b2f91 100644
> >>>>>> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> >>>>>> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> >>>>>> @@ -95,8 +95,10 @@ do_test (void)
> >>>>>> kernel has pidfd support that we can test. */
> >>>>>> int r = pidfd_getfd (0, 0, 1);
> >>>>>> TEST_VERIFY_EXIT (r == -1);
> >>>>>> - if (errno == ENOSYS)
> >>>>>> - FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
> >>>>>> skipping test");
> >>>>>> + if (errno == ENOSYS || errno == EPERM)
> >>>>>> + FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> >>>>>> +			" or we don't have
> >>>>>> PTRACE_MODE_ATTACH_REALCREDS"
> >>>>>> +			" permissions, skipping test");
> >>>>>> }
> >>>>> 
> >>>>> This also hints towards a broken seccomp filter.
> >>>> 
> >>>> pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
> >>>> syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
> >>>> (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
> >>>> doesn't have. If the process doesn't then pidfd_getfd is defined as
> >>>> failing and setting errno to EPERM.
> >>> 
> >>> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
> >> 
> >> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
> >> 
> >> PTRACE_MODE_ATTACH_REALCREDS means
> >> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
> >> for permission checking:
> >> 
> >> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
> >> the target task's (ptracee's) effective, save, and real [g,u]id then
> >> you'passed the first stage of permission checking.
> >> 
> >> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
> >> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
> >> That will also get you past the first state of permission checking.
> >> 
> >> If both don't apply the request is denied with -EPERM.
> > 
> > I think this doesn't apply to the
> > 
> > pidfd_getfd (0, 0, 1)
> 
> The init has PPID 0, which leads to the idea pid argument 0 is valid.
> So fdget won’t fail and EPERM will be returned by pidfd_pid.  Maybe
> use pidfd_getfd (TYPE_MAXIMUM (pid_t), 0, 1) instead to trigger a
> EBADF.

No, I don't think that's the case.
EPERM can only ever be returned from ptrace_may_access() in
pidfd_getfd() or if there's something like seccomp in the mix.

You should see EINVAL, I think.

SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd, unsigned int, flags)
                                 0          0                 1
{
	struct pid *pid;
	struct fd f;
	int ret;

	/* flags is currently unused - make sure it's unset */
	if (flags) // that's 1 so you should see -EINVAL
		return -EINVAL;

	f = fdget(pidfd); // That'll get you stdin most likely.
	if (!f.file)
		return -EBADF;

	pid = pidfd_pid(f.file); // That would give you -EBADF see [1]
	if (IS_ERR(pid))
		ret = PTR_ERR(pid);
	else
		ret = pidfd_getfd(pid, fd);

	fdput(f);
	return ret;
}

[1]:

struct pid *pidfd_pid(const struct file *file)
{
	if (file->f_op == &pidfd_fops) // stdin is no pidfd and thus doesn't have pidfd_fops
		return file->private_data;

	return ERR_PTR(-EBADF); // that's what you'd see for stdin
}
  
Adhemerval Zanella Netto June 27, 2022, 3:14 p.m. UTC | #13
> On 27 Jun 2022, at 12:08, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Mon, Jun 27, 2022 at 11:57:02AM -0300, Adhemerval Zanella wrote:
>> 
>> 
>>> On 27 Jun 2022, at 11:42, Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>> 
>>> * Christian Brauner:
>>> 
>>>> On Mon, Jun 27, 2022 at 01:14:06PM +0200, Florian Weimer via Libc-alpha wrote:
>>>>> * Mark Wielaard:
>>>>> 
>>>>>> Hi Florian,
>>>>>> 
>>>>>> On Sun, 2022-06-26 at 23:20 +0200, Florian Weimer wrote:
>>>>>>> * Mark Wielaard:
>>>>>>> 
>>>>>>>> pidfd_getfd will fail with errno EPERM if the calling process did
>>>>>>>> not
>>>>>>>> have PTRACE_MODE_ATTACH_REALCREDS permissions. Use FAIL_UNSUPPORTED
>>>>>>>> in that case.
>>>>>>>> ---
>>>>>>>> sysdeps/unix/sysv/linux/tst-pidfd.c | 6 ++++--
>>>>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>>>> b/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>>>> index d93b6faa6f..28349b2f91 100644
>>>>>>>> --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>>>> +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
>>>>>>>> @@ -95,8 +95,10 @@ do_test (void)
>>>>>>>> kernel has pidfd support that we can test. */
>>>>>>>> int r = pidfd_getfd (0, 0, 1);
>>>>>>>> TEST_VERIFY_EXIT (r == -1);
>>>>>>>> - if (errno == ENOSYS)
>>>>>>>> - FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,
>>>>>>>> skipping test");
>>>>>>>> + if (errno == ENOSYS || errno == EPERM)
>>>>>>>> + FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
>>>>>>>> +			" or we don't have
>>>>>>>> PTRACE_MODE_ATTACH_REALCREDS"
>>>>>>>> +			" permissions, skipping test");
>>>>>>>> }
>>>>>>> 
>>>>>>> This also hints towards a broken seccomp filter.
>>>>>> 
>>>>>> pidfd_getfd is mentioned (and allowed) by the seccomp filter, but the
>>>>>> syscall also needs the process to have PTRACE_MODE_ATTACH_REALCREDS
>>>>>> (which is really PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS). Which it
>>>>>> doesn't have. If the process doesn't then pidfd_getfd is defined as
>>>>>> failing and setting errno to EPERM.
>>>>> 
>>>>> But what does it mean for a process to have PTRACE_MODE_REALCREDS?
>>>> 
>>>> #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
>>>> 
>>>> PTRACE_MODE_ATTACH_REALCREDS means
>>>> * PTRACE_MODE_REALCREDS which means cred->{g,u}id of the task are used
>>>> for permission checking:
>>>> 
>>>> So it's a bit nasty here but roughly if the ptracer's real {g,u}id match
>>>> the target task's (ptracee's) effective, save, and real [g,u]id then
>>>> you'passed the first stage of permission checking.
>>>> 
>>>> If ptracer's [g,u]id aren't matching the ptracee's effective, saved, and
>>>> real [g,u]id the ptracer needs CAP_SYS_PTRACE in the ptracee's userns.
>>>> That will also get you past the first state of permission checking.
>>>> 
>>>> If both don't apply the request is denied with -EPERM.
>>> 
>>> I think this doesn't apply to the
>>> 
>>> pidfd_getfd (0, 0, 1)
>> 
>> The init has PPID 0, which leads to the idea pid argument 0 is valid.
>> So fdget won’t fail and EPERM will be returned by pidfd_pid. Maybe
>> use pidfd_getfd (TYPE_MAXIMUM (pid_t), 0, 1) instead to trigger a
>> EBADF.
> 
> No, I don't think that's the case.
> EPERM can only ever be returned from ptrace_may_access() in
> pidfd_getfd() or if there's something like seccomp in the mix.
> 
> You should see EINVAL, I think.

Yeah, I forgot about the invalid flag argument.  So it does seems a
seccomp issue and I am not sure sure if we should paper over it on glibc
test.

> 
> SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd, unsigned int, flags)
> 0 0 1
> {
> 	struct pid *pid;
> 	struct fd f;
> 	int ret;
> 
> 	/* flags is currently unused - make sure it's unset */
> 	if (flags) // that's 1 so you should see -EINVAL
> 		return -EINVAL;
> 
> 	f = fdget(pidfd); // That'll get you stdin most likely.
> 	if (!f.file)
> 		return -EBADF;
> 
> 	pid = pidfd_pid(f.file); // That would give you -EBADF see [1]
> 	if (IS_ERR(pid))
> 		ret = PTR_ERR(pid);
> 	else
> 		ret = pidfd_getfd(pid, fd);
> 
> 	fdput(f);
> 	return ret;
> }
> 
> [1]:
> 
> struct pid *pidfd_pid(const struct file *file)
> {
> 	if (file->f_op == &pidfd_fops) // stdin is no pidfd and thus doesn't have pidfd_fops
> 		return file->private_data;
> 
> 	return ERR_PTR(-EBADF); // that's what you'd see for stdin
> }
  
Mark Wielaard June 27, 2022, 4:36 p.m. UTC | #14
Hi Adhemerval,

On Mon, 2022-06-27 at 11:23 -0300, Adhemerval Zanella wrote:
> I think it would be good print different errors for ENOSYS and EPERM.
> 
> > --- a/sysdeps/unix/sysv/linux/tst-pidfd.c
> > +++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
> > @@ -95,8 +95,10 @@ do_test (void)
> >        kernel has pidfd support that we can test.  */
> >     int r = pidfd_getfd (0, 0, 1);
> >     TEST_VERIFY_EXIT (r == -1);
> > -    if (errno == ENOSYS)
> > -      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
> > +    if (errno == ENOSYS || errno == EPERM)
> > +      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
> > +			" or we don't have PTRACE_MODE_ATTACH_REALCREDS"
> > +			" permissions, skipping test");
> 
> Keep the ENOSYS error message and add EPERM.

I posted a v2 that does that:
https://sourceware.org/pipermail/libc-alpha/2022-June/140054.html

Cheers,

Mark
  
Mark Wielaard June 27, 2022, 4:48 p.m. UTC | #15
Hi Adhemerval,

On Mon, 2022-06-27 at 12:14 -0300, Adhemerval Zanella wrote:
> > You should see EINVAL, I think.
> 
> Yeah, I forgot about the invalid flag argument.  So it does seems a
> seccomp issue and I am not sure sure if we should paper over it on
> glibc test.

I don't think it is papering over the issue. We do detect the EPERM and
record it as an environment that is UNSUPPORTED. Which is imho a better
state than still trying to test and then FAILing the testcase.

Note that in the buildbot the test results and out files are put into
the bunsendb so that you can easily see why a particular test was
UNSUPPORTED (or why it FAILED).

We might not like that there are execution environments where some
syscalls return EPERM, but (sadly) they exist and handling and
recording that in the test (instead of simply failing) seems the best
way to handle that.

It also makes sure we have a zero-FAIL testsuite, which is really
useful.

Cheers,

Mark
  
Adhemerval Zanella Netto June 27, 2022, 5:03 p.m. UTC | #16
> On 27 Jun 2022, at 13:48, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi Adhemerval,
> 
> On Mon, 2022-06-27 at 12:14 -0300, Adhemerval Zanella wrote:
>>> You should see EINVAL, I think.
>> 
>> Yeah, I forgot about the invalid flag argument.  So it does seems a
>> seccomp issue and I am not sure sure if we should paper over it on
>> glibc test.
> 
> I don't think it is papering over the issue. We do detect the EPERM and
> record it as an environment that is UNSUPPORTED. Which is imho a better
> state than still trying to test and then FAILing the testcase.

At first I though it was the default kernel syscall code path (without
any security hardening or filtering) returning EPERM, but for this 
specific case it seems that an additional module that is return EPERM.

In this specific environment, how can one test that the syscall is not
implemented (ENOSYS) if passing invalid parameters always result in
EPERM? The different semantic whether filtering is present or not is
troublesome and at least on libc.so we do not try fallback. 

although it has caused a lot of issue on older containers (for instance
when glibc started to use clone3), I still think the correct interface 
here is indeed returning ENOSYS and a value different than this is
a potential failure.

What I would expect is to actually handle EPERM on the pidfd_open
at line 118, where it is valid call that might be filtered out.

> 
> Note that in the buildbot the test results and out files are put into
> the bunsendb so that you can easily see why a particular test was
> UNSUPPORTED (or why it FAILED).
> 
> We might not like that there are execution environments where some
> syscalls return EPERM, but (sadly) they exist and handling and
> recording that in the test (instead of simply failing) seems the best
> way to handle that.
> 
> It also makes sure we have a zero-FAIL testsuite, which is really
> useful.
> 
> Cheers,
> 
> Mark
  
Mark Wielaard July 1, 2022, 10:38 a.m. UTC | #17
Hi Adhemerval,

On Mon, 2022-06-27 at 14:03 -0300, Adhemerval Zanella wrote:
> > On 27 Jun 2022, at 13:48, Mark Wielaard <mark@klomp.org> wrote:
> > I don't think it is papering over the issue. We do detect the EPERM and
> > record it as an environment that is UNSUPPORTED. Which is imho a better
> > state than still trying to test and then FAILing the testcase.
> 
> At first I though it was the default kernel syscall code path (without
> any security hardening or filtering) returning EPERM, but for this 
> specific case it seems that an additional module that is return EPERM.
> 
> In this specific environment, how can one test that the syscall is not
> implemented (ENOSYS) if passing invalid parameters always result in
> EPERM? The different semantic whether filtering is present or not is
> troublesome and at least on libc.so we do not try fallback. 
> 
> although it has caused a lot of issue on older containers (for instance
> when glibc started to use clone3), I still think the correct interface 
> here is indeed returning ENOSYS and a value different than this is
> a potential failure.

Completely agreed. If you cannot make a difference between ENOSYS or
EOERM errors it is a pain/impossible to create (or test) the usage of
such a system call. That is why the pathc simply detects the issue,
flags it and doesn't try to run the testcase because that is impossible
to do correctly in that environment.

> What I would expect is to actually handle EPERM on the pidfd_open
> at line 118, where it is valid call that might be filtered out.

Good point. I will add that in a v3.

Thanks,

Mark
  

Patch

diff --git a/sysdeps/unix/sysv/linux/tst-pidfd.c b/sysdeps/unix/sysv/linux/tst-pidfd.c
index d93b6faa6f..28349b2f91 100644
--- a/sysdeps/unix/sysv/linux/tst-pidfd.c
+++ b/sysdeps/unix/sysv/linux/tst-pidfd.c
@@ -95,8 +95,10 @@  do_test (void)
        kernel has pidfd support that we can test.  */
     int r = pidfd_getfd (0, 0, 1);
     TEST_VERIFY_EXIT (r == -1);
-    if (errno == ENOSYS)
-      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd, skipping test");
+    if (errno == ENOSYS || errno == EPERM)
+      FAIL_UNSUPPORTED ("kernel does not support pidfd_getfd,"
+			" or we don't have PTRACE_MODE_ATTACH_REALCREDS"
+			" permissions, skipping test");
   }
 
   ppid = getpid ();