fix create thread failed in unprivileged process [BZ #28287]

Message ID 20210829132954.18148-1-hongxu.jia@windriver.com
State Not applicable
Series fix create thread failed in unprivileged process [BZ #28287]

Checks

Context                 Check    Description
dj/TryBot-apply_patch   success  Patch applied to master at the time it was sent
dj/TryBot-32bit         success  Build for i686

Commit Message

Hongxu Jia Aug. 29, 2021, 1:29 p.m. UTC
Since commit d8ea0d0168 ("Add an internal wrapper for clone, clone2 and
clone3") was applied, creating a thread fails inside an unprivileged
container (docker run without --privileged).

That commit calls __clone3 if HAVE_CLONE3_WRAPPER is defined.  If
__clone3 returns -1 with ENOSYS, it falls back to clone or clone2.

According to [1], the clone family of calls fails with EPERM if
CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or
CLONE_NEWUTS was specified by an unprivileged process (a process without
CAP_SYS_ADMIN).

[1] https://man7.org/linux/man-pages/man2/clone3.2.html

So falling back to clone or clone2 when __clone3 returns -1 with EPERM
could fix the issue. Here are the test steps:

1) Prepare test code
cat > conftest.c <<ENDOF
 #include <pthread.h>
 #include <stdio.h>

int check_me = 0;
void* func(void* data) {check_me = 42; printf("start thread: check_me %d\n", check_me); return &check_me;}
int main()
{
  pthread_t t;
  void *ret;
  pthread_create (&t, 0, func, 0);
  pthread_join (t, &ret);
  printf("check_me %d, p %p\n", check_me, &ret);
  return (check_me != 42 || ret != &check_me);
}

ENDOF

2) Compile
gcc -o conftest -pthread conftest.c

3) Start a container with glibc 2.34 installed
[skip details]
docker run -it <container-image-name> bash

4) Run conftest without this patch
$ ./conftest
check_me 0, p 0x7ffd91ccd400

5) Run conftest with this patch
$ ./conftest
start thread: check_me 42
check_me 42, p 0x7ffe253c6f20

Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
---
 sysdeps/unix/sysv/linux/clone-internal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

H.J. Lu Aug. 29, 2021, 1:47 p.m. UTC | #1
On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> Since commit d8ea0d0168 ("Add an internal wrapper for clone, clone2 and
> clone3") was applied, creating a thread fails inside an unprivileged
> container (docker run without --privileged).
>
> That commit calls __clone3 if HAVE_CLONE3_WRAPPER is defined.  If
> __clone3 returns -1 with ENOSYS, it falls back to clone or clone2.
>
> According to [1], the clone family of calls fails with EPERM if
> CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or
> CLONE_NEWUTS was specified by an unprivileged process (a process without
> CAP_SYS_ADMIN).

I don't think the description is accurate.  In your test, none
of the mentioned flags are used directly.  The real bug is
that the container you used blocks the normal clone3 and
sets errno to EPERM.  The question is if/how glibc should
work around the clone3 bug in containers.  We want to add
a public clone3 wrapper to glibc in the future.  But before we
do that, all these containers should be changed to return ENOSYS
when clone3 is blocked.

  
Hongxu Jia Aug. 29, 2021, 2:12 p.m. UTC | #2
On 8/29/21 9:47 PM, H.J. Lu wrote:
> I don't think the description is accurate.  In your test, none
> of the mentioned flags are used directly.  The real bug is
> that the container you used blocks the normal clone3 and
> sets errno to EPERM.  The question is if/how glibc should
> work around the clone3 bug in containers.  We want to add
> a public clone3 wrapper to glibc in the future.  But before we
> do that, all these containers should be changed to return ENOSYS
> when clone3 is blocked.

You mean I should fix the container runtime (Docker, in my case) to return
ENOSYS instead of EPERM in this situation?  But with released/older Docker
versions, pthread_create still does not work with glibc 2.34 in
unprivileged mode.

In other words, should the new glibc consider backward compatibility with
others?

//Hongxu

  
H.J. Lu Aug. 29, 2021, 2:43 p.m. UTC | #3
On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> You mean I should fix the container runtime (Docker, in my case) to return
> ENOSYS instead of EPERM in this situation?  But with released/older Docker
> versions, pthread_create still does not work with glibc 2.34 in
> unprivileged mode.
>
> In other words, should the new glibc consider backward compatibility with
> others?

I don't think we should hide the container bug in glibc.   Will a glibc tunable
to disable the clone3 wrapper work here?
  
Hongxu Jia Aug. 29, 2021, 2:46 p.m. UTC | #4
A quick search shows that the newest Docker has already corrected the issue:

https://github.com/moby/moby/commit/9f6b562dd12ef7b1f9e2f8e6f2ab6477790a6594

but the commit has only been applied to master, not to any released version.

//Hongxu

  
Hongxu Jia Aug. 29, 2021, 2:50 p.m. UTC | #5
On 8/29/21 10:43 PM, H.J. Lu wrote:
> I don't think we should hide the container bug in glibc.   Will a glibc tunable
> to disable the clone3 wrapper work here?

Yes, that's my plan B: disable it by removing the macro definition of
HAVE_CLONE3_WRAPPER in our Yocto glibc.

//Hongxu

  
H.J. Lu Aug. 29, 2021, 3:20 p.m. UTC | #6
On Sun, Aug 29, 2021 at 7:50 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> Yes, that's my plan B: disable it by removing the macro definition of
> HAVE_CLONE3_WRAPPER in our Yocto glibc.
>

This is an option.  But this is not what I meant.  We can add

$ export GLIBC_TUNABLES=glibc.syscall=disable_clone3

to disable the clone3 wrapper.
  
Hongxu Jia Aug. 29, 2021, 4:03 p.m. UTC | #7
On 8/29/21 11:20 PM, H.J. Lu wrote:
> This is an option.  But this is not what I meant.  We can add
>
> $ export GLIBC_TUNABLES=glibc.syscall=disable_clone3
>
> to disable the clone3 wrapper.

Thank you very much.  Setting an environment variable is better than
applying a patch to the sources.

Unfortunately, I set 'export GLIBC_TUNABLES=glibc.syscall=disable_clone3'
in my glibc build environment, but it does not seem to work; the issue
still exists.  I also set it in my runtime container, and it does not work
there either.

My build environment is a Yocto project that supports cross compiling, and
I am not familiar with GLIBC_TUNABLES settings.  A simple search of the
glibc sources turns up nothing about glibc.syscall=disable_clone3.

//Hongxu

  
H.J. Lu Aug. 29, 2021, 4:57 p.m. UTC | #8
On Sun, Aug 29, 2021 at 9:03 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> Thank you very much.  Setting an environment variable is better than
> applying a patch to the sources.
>
> Unfortunately, I set 'export GLIBC_TUNABLES=glibc.syscall=disable_clone3'
> in my glibc build environment, but it does not seem to work; the issue
> still exists.  I also set it in my runtime container, and it does not work
> there either.
>
> My build environment is a Yocto project that supports cross compiling, and
> I am not familiar with GLIBC_TUNABLES settings.  A simple search of the
> glibc sources turns up nothing about glibc.syscall=disable_clone3.
>
>

Someone needs to add it.
  

Patch

diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
index 979f7880be..97101994e8 100644
--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -52,7 +52,7 @@  __clone_internal (struct clone_args *cl_args,
   /* Try clone3 first.  */
   int saved_errno = errno;
   ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
+  if (ret != -1 || (errno != ENOSYS && errno != EPERM))
     return ret;
 
   /* NB: Restore errno since errno may be checked against non-zero