fix create thread failed in unprivileged process [BZ #28287]
Checks
Context | Check | Description
dj/TryBot-apply_patch | success | Patch applied to master at the time it was sent
dj/TryBot-32bit | success | Build for i686
Commit Message
Since commit d8ea0d0168 ("Add an internal wrapper for clone, clone2 and clone3")
was applied, creating a thread fails inside an unprivileged container
(docker run without --privileged).
Commit d8ea0d0168 calls __clone3 if HAVE_CLONE3_WRAPPER is defined. If
__clone3 returns -1 with ENOSYS, it falls back to clone or clone2.
According to [1], the clone family of calls fails with EPERM if CLONE_NEWCGROUP,
CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
was specified by an unprivileged process (a process without CAP_SYS_ADMIN).
[1] https://man7.org/linux/man-pages/man2/clone3.2.html
So falling back to clone or clone2 when __clone3 returns -1 with EPERM
fixes the issue. Here are the test steps:
1) Prepare test code
cat > conftest.c <<ENDOF
#include <pthread.h>
#include <stdio.h>

int check_me = 0;

void *func(void *data)
{
  check_me = 42;
  printf("start thread: check_me %d\n", check_me);
  return &check_me;
}

int main()
{
  pthread_t t;
  void *ret;
  pthread_create (&t, 0, func, 0);
  pthread_join (t, &ret);
  printf("check_me %d, p %p\n", check_me, &ret);
  return (check_me != 42 || ret != &check_me);
}
ENDOF
2) Compile
gcc -o conftest -pthread conftest.c
3) Start a container with glibc 2.34 installed
[skip details]
docker run -it <container-image-name> bash
4) Run conftest without this patch
$ ./conftest
check_me 0, p 0x7ffd91ccd400
5) Run conftest with this patch
$ ./conftest
start thread: check_me 42
check_me 42, p 0x7ffe253c6f20
Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
---
sysdeps/unix/sysv/linux/clone-internal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
> applied, start a unprivileged container (docker run without --privileged),
> it creates a thread failed in container.
>
> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>
> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
I don't think the description is accurate. In your test, none
of the mentioned flags are used directly. The real bug is
that the container you used blocks the normal clone3 and
sets errno to EPERM. The question is if/how glibc should
work around the clone3 bug in containers. We want to add
a public clone3 wrapper to glibc in the future. But before we
do that, all these containers should be changed to return ENOSYS
if clone3 is blocked.
> [1] https://man7.org/linux/man-pages/man2/clone3.2.html
>
> So if __clone3 returns -1 with EPERM, fall back to clone or clone2 could
> fix the issue. Here are the test steps:
>
> 1) Prepare test code
> cat > conftest.c <<ENDOF
> #include <pthread.h>
> #include <stdio.h>
>
> int check_me = 0;
> void* func(void* data) {check_me = 42; printf("start thread: check_me %d\n", check_me); return &check_me;}
> int main()
> {
> pthread_t t;
> void *ret;
> pthread_create (&t, 0, func, 0);
> pthread_join (t, &ret);
> printf("check_me %d, p %p\n", check_me, &ret);
> return (check_me != 42 || ret != &check_me);
> }
>
> ENDOF
>
> 2) Compile
> gcc -o conftest -pthread conftest.c
>
> 3) Start a container with glibc 2.34 installed
> [skip details]
> docker run -it <container-image-name> bash
>
> 4) Run conftest without this patch
> $ ./conftest
> check_me 0, p 0x7ffd91ccd400
>
> 5) Run conftest with this patch
> $ ./conftest
> start thread: check_me 42
> check_me 42, p 0x7ffe253c6f20
>
> Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
> ---
> sysdeps/unix/sysv/linux/clone-internal.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
> index 979f7880be..97101994e8 100644
> --- a/sysdeps/unix/sysv/linux/clone-internal.c
> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
> @@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
> /* Try clone3 first. */
> int saved_errno = errno;
> ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
> - if (ret != -1 || errno != ENOSYS)
> + if (ret != -1 || (errno != ENOSYS && errno != EPERM))
> return ret;
>
> /* NB: Restore errno since errno may be checked against non-zero
> --
> 2.30.2
>
On 8/29/21 9:47 PM, H.J. Lu wrote:
> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
>> applied, start a unprivileged container (docker run without --privileged),
>> it creates a thread failed in container.
>>
>> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
>> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>>
>> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
>> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
>> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
> I don't think the description is accurate. In your test, none
> of the mentioned flags are used directly. The real bug is
> that the container you used blocks the normal clone3 and
> sets errno to EPERM. The question is if/how glibc should
> work arounds the clone3 bug in containers. We want to add
> a public clone3 wrapper to glibc in the future. But before we
> do that, all these containers should be changed to ENOSYS
> if clone3 is blocked.
You mean I should fix the container (Docker, in my case) so that it returns
ENOSYS instead of EPERM in this situation. But with released/older Docker
versions, pthread_create still does not work with glibc 2.34 in unprivileged mode.
In other words, should the new glibc consider backward compatibility with
other software?
//Hongxu
>> [1] https://man7.org/linux/man-pages/man2/clone3.2.html
>>
>> So if __clone3 returns -1 with EPERM, fall back to clone or clone2 could
>> fix the issue. Here are the test steps:
>>
>> 1) Prepare test code
>> cat > conftest.c <<ENDOF
>> #include <pthread.h>
>> #include <stdio.h>
>>
>> int check_me = 0;
>> void* func(void* data) {check_me = 42; printf("start thread: check_me %d\n", check_me); return &check_me;}
>> int main()
>> {
>> pthread_t t;
>> void *ret;
>> pthread_create (&t, 0, func, 0);
>> pthread_join (t, &ret);
>> printf("check_me %d, p %p\n", check_me, &ret);
>> return (check_me != 42 || ret != &check_me);
>> }
>>
>> ENDOF
>>
>> 2) Compile
>> gcc -o conftest -pthread conftest.c
>>
>> 3) Start a container with glibc 2.34 installed
>> [skip details]
>> docker run -it <container-image-name> bash
>>
>> 4) Run conftest without this patch
>> $ ./conftest
>> check_me 0, p 0x7ffd91ccd400
>>
>> 5) Run conftest with this patch
>> $ ./conftest
>> start thread: check_me 42
>> check_me 42, p 0x7ffe253c6f20
>>
>> Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
>> ---
>> sysdeps/unix/sysv/linux/clone-internal.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c b/sysdeps/unix/sysv/linux/clone-internal.c
>> index 979f7880be..97101994e8 100644
>> --- a/sysdeps/unix/sysv/linux/clone-internal.c
>> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
>> @@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
>> /* Try clone3 first. */
>> int saved_errno = errno;
>> ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
>> - if (ret != -1 || errno != ENOSYS)
>> + if (ret != -1 || (errno != ENOSYS && errno != EPERM))
>> return ret;
>>
>> /* NB: Restore errno since errno may be checked against non-zero
>> --
>> 2.30.2
>>
>
> --
> H.J.
On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> On 8/29/21 9:47 PM, H.J. Lu wrote:
> > On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> >> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
> >> applied, start a unprivileged container (docker run without --privileged),
> >> it creates a thread failed in container.
> >>
> >> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
> >> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
> >>
> >> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
> >> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
> >> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
> > I don't think the description is accurate. In your test, none
> > of the mentioned flags are used directly. The real bug is
> > that the container you used blocks the normal clone3 and
> > sets errno to EPERM. The question is if/how glibc should
> > work arounds the clone3 bug in containers. We want to add
> > a public clone3 wrapper to glibc in the future. But before we
> > do that, all these containers should be changed to ENOSYS
> > if clone3 is blocked.
>
> You mean I should fix the container (here is the docker I used) to correct
> EPERM to ENOSYS in this situation, but for the released/old docker,
> the pthread_create still does not work with glibc 2.34 in unprivileged mode.
>
> In other word, should the new glibc consider backward compatibility with
> others?
I don't think we should hide the container bug in glibc. Will a glibc tunable
to disable the clone3 wrapper work here?
A quick search shows that the newest Docker has corrected the issue:
https://github.com/moby/moby/commit/9f6b562dd12ef7b1f9e2f8e6f2ab6477790a6594
However, the commit has only been applied to master, not to any released version.
//Hongxu
On 8/29/21 10:12 PM, Hongxu Jia wrote:
> On 8/29/21 9:47 PM, H.J. Lu wrote:
>> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com>
>> wrote:
>>> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2
>>> and clone3]
>>> applied, start a unprivileged container (docker run without
>>> --privileged),
>>> it creates a thread failed in container.
>>>
>>> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is
>>> defined. If
>>> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>>>
>>> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
>>> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
>>> was specified by an unprivileged process (process without
>>> CAP_SYS_ADMIN)
>> I don't think the description is accurate. In your test, none
>> of the mentioned flags are used directly. The real bug is
>> that the container you used blocks the normal clone3 and
>> sets errno to EPERM. The question is if/how glibc should
>> work arounds the clone3 bug in containers. We want to add
>> a public clone3 wrapper to glibc in the future. But before we
>> do that, all these containers should be changed to ENOSYS
>> if clone3 is blocked.
>
> You mean I should fix the container (here is the docker I used) to
> correct
>
> EPERM to ENOSYS in this situation, but for the released/old docker,
>
> the pthread_create still does not work with glibc 2.34 in unprivileged
> mode.
>
> In other word, should the new glibc consider backward compatibility
> with others?
>
> //Hongxu
>
>>> [1] https://man7.org/linux/man-pages/man2/clone3.2.html
>>>
>>> So if __clone3 returns -1 with EPERM, fall back to clone or clone2
>>> could
>>> fix the issue. Here are the test steps:
>>>
>>> 1) Prepare test code
>>> cat > conftest.c <<ENDOF
>>> #include <pthread.h>
>>> #include <stdio.h>
>>>
>>> int check_me = 0;
>>> void* func(void* data) {check_me = 42; printf("start thread:
>>> check_me %d\n", check_me); return &check_me;}
>>> int main()
>>> {
>>> pthread_t t;
>>> void *ret;
>>> pthread_create (&t, 0, func, 0);
>>> pthread_join (t, &ret);
>>> printf("check_me %d, p %p\n", check_me, &ret);
>>> return (check_me != 42 || ret != &check_me);
>>> }
>>>
>>> ENDOF
>>>
>>> 2) Compile
>>> gcc -o conftest -pthread conftest.c
>>>
>>> 3) Start a container with glibc 2.34 installed
>>> [skip details]
>>> docker run -it <container-image-name> bash
>>>
>>> 4) Run conftest without this patch
>>> $ ./conftest
>>> check_me 0, p 0x7ffd91ccd400
>>>
>>> 5) Run conftest with this patch
>>> $ ./conftest
>>> start thread: check_me 42
>>> check_me 42, p 0x7ffe253c6f20
>>>
>>> Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
>>> ---
>>> sysdeps/unix/sysv/linux/clone-internal.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/sysdeps/unix/sysv/linux/clone-internal.c
>>> b/sysdeps/unix/sysv/linux/clone-internal.c
>>> index 979f7880be..97101994e8 100644
>>> --- a/sysdeps/unix/sysv/linux/clone-internal.c
>>> +++ b/sysdeps/unix/sysv/linux/clone-internal.c
>>> @@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
>>> /* Try clone3 first. */
>>> int saved_errno = errno;
>>> ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
>>> - if (ret != -1 || errno != ENOSYS)
>>> + if (ret != -1 || (errno != ENOSYS && errno != EPERM))
>>> return ret;
>>>
>>> /* NB: Restore errno since errno may be checked against non-zero
>>> --
>>> 2.30.2
>>>
>>
>> --
>> H.J.
>
>
On 8/29/21 10:43 PM, H.J. Lu wrote:
> On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>> On 8/29/21 9:47 PM, H.J. Lu wrote:
>>> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>>>> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
>>>> applied, start a unprivileged container (docker run without --privileged),
>>>> it creates a thread failed in container.
>>>>
>>>> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
>>>> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>>>>
>>>> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
>>>> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
>>>> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
>>> I don't think the description is accurate. In your test, none
>>> of the mentioned flags are used directly. The real bug is
>>> that the container you used blocks the normal clone3 and
>>> sets errno to EPERM. The question is if/how glibc should
>>> work arounds the clone3 bug in containers. We want to add
>>> a public clone3 wrapper to glibc in the future. But before we
>>> do that, all these containers should be changed to ENOSYS
>>> if clone3 is blocked.
>> You mean I should fix the container (here is the docker I used) to correct
>> EPERM to ENOSYS in this situation, but for the released/old docker,
>> the pthread_create still does not work with glibc 2.34 in unprivileged mode.
>>
>> In other word, should the new glibc consider backward compatibility with
>> others?
> I don't think we should hide the container bug in glibc. Will a glibc tunable
> to disable the clone3 wrapper work here?
Yes, that's my plan B: disable it by removing the macro definition of
HAVE_CLONE3_WRAPPER in our Yocto glibc.
//Hongxu
> --
> H.J.
On Sun, Aug 29, 2021 at 7:50 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> On 8/29/21 10:43 PM, H.J. Lu wrote:
> > On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> >> On 8/29/21 9:47 PM, H.J. Lu wrote:
> >>> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
> >>>> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
> >>>> applied, start a unprivileged container (docker run without --privileged),
> >>>> it creates a thread failed in container.
> >>>>
> >>>> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
> >>>> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
> >>>>
> >>>> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
> >>>> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
> >>>> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
> >>> I don't think the description is accurate. In your test, none
> >>> of the mentioned flags are used directly. The real bug is
> >>> that the container you used blocks the normal clone3 and
> >>> sets errno to EPERM. The question is if/how glibc should
> >>> work arounds the clone3 bug in containers. We want to add
> >>> a public clone3 wrapper to glibc in the future. But before we
> >>> do that, all these containers should be changed to ENOSYS
> >>> if clone3 is blocked.
> >> You mean I should fix the container (here is the docker I used) to correct
> >> EPERM to ENOSYS in this situation, but for the released/old docker,
> >> the pthread_create still does not work with glibc 2.34 in unprivileged mode.
> >>
> >> In other word, should the new glibc consider backward compatibility with
> >> others?
> > I don't think we should hide the container bug in glibc. Will a glibc tunable
> > to disable the clone3 wrapper work here?
>
> Yes, that's my plan B, disable it by removing the macro definition of
> HAVE_CLONE3_WRAPPER in our Yocto's glibc
>
This is an option. But this is not what I meant. We can add
$ export GLIBC_TUNABLES=glibc.syscall=disable_clone3
to disable the clone3 wrapper.
On 8/29/21 11:20 PM, H.J. Lu wrote:
> On Sun, Aug 29, 2021 at 7:50 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>> On 8/29/21 10:43 PM, H.J. Lu wrote:
>>> On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>>>> On 8/29/21 9:47 PM, H.J. Lu wrote:
>>>>> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>>>>>> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
>>>>>> applied, start a unprivileged container (docker run without --privileged),
>>>>>> it creates a thread failed in container.
>>>>>>
>>>>>> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
>>>>>> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>>>>>>
>>>>>> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
>>>>>> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
>>>>>> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
>>>>> I don't think the description is accurate. In your test, none
>>>>> of the mentioned flags are used directly. The real bug is
>>>>> that the container you used blocks the normal clone3 and
>>>>> sets errno to EPERM. The question is if/how glibc should
>>>>> work arounds the clone3 bug in containers. We want to add
>>>>> a public clone3 wrapper to glibc in the future. But before we
>>>>> do that, all these containers should be changed to ENOSYS
>>>>> if clone3 is blocked.
>>>> You mean I should fix the container (here is the docker I used) to correct
>>>> EPERM to ENOSYS in this situation, but for the released/old docker,
>>>> the pthread_create still does not work with glibc 2.34 in unprivileged mode.
>>>>
>>>> In other word, should the new glibc consider backward compatibility with
>>>> others?
>>> I don't think we should hide the container bug in glibc. Will a glibc tunable
>>> to disable the clone3 wrapper work here?
>> Yes, that's my plan B, disable it by removing the macro definition of
>> HAVE_CLONE3_WRAPPER in our Yocto's glibc
>>
> This is an option. But this is not what I meant. We can add
>
> $ export GLIBC_TUNABLES=glibc.syscall=disable_clone3
>
> to disable the clone3 wrapper.
Thank you very much. Setting an environment variable is better than applying a
patch to the sources.
Unfortunately, I set 'export GLIBC_TUNABLES=glibc.syscall=disable_clone3' in my
glibc build environment, but it does not seem to work; the issue still exists.
I also applied it in my runtime container, and it does not work there either.
My build environment is a Yocto project that supports cross compiling, and I
am not familiar with the GLIBC_TUNABLES setting. A quick search of the glibc
sources turned up no trace of glibc.syscall=disable_clone3.
//Hongxu
>
> --
> H.J.
On Sun, Aug 29, 2021 at 9:03 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> On 8/29/21 11:20 PM, H.J. Lu wrote:
>
> On Sun, Aug 29, 2021 at 7:50 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> On 8/29/21 10:43 PM, H.J. Lu wrote:
>
> On Sun, Aug 29, 2021 at 7:12 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> On 8/29/21 9:47 PM, H.J. Lu wrote:
>
> On Sun, Aug 29, 2021 at 6:29 AM Hongxu Jia <hongxu.jia@windriver.com> wrote:
>
> Since commit [d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3]
> applied, start a unprivileged container (docker run without --privileged),
> it creates a thread failed in container.
>
> In commit d8ea0d0168, it calls __clone3 if HAVE_CLONE3_WAPPER is defined. If
> __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
>
> As known from [1], cloneXXX fails with EPERM if CLONE_NEWCGROUP,
> CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID, or CLONE_NEWUTS
> was specified by an unprivileged process (process without CAP_SYS_ADMIN)
>
> I don't think the description is accurate. In your test, none
> of the mentioned flags are used directly. The real bug is
> that the container you used blocks the normal clone3 and
> sets errno to EPERM. The question is if/how glibc should
> work arounds the clone3 bug in containers. We want to add
> a public clone3 wrapper to glibc in the future. But before we
> do that, all these containers should be changed to ENOSYS
> if clone3 is blocked.
>
> You mean I should fix the container (here is the docker I used) to correct
> EPERM to ENOSYS in this situation, but for the released/old docker,
> the pthread_create still does not work with glibc 2.34 in unprivileged mode.
>
> In other word, should the new glibc consider backward compatibility with
> others?
>
> I don't think we should hide the container bug in glibc. Will a glibc tunable
> to disable the clone3 wrapper work here?
>
> Yes, that's my plan B, disable it by removing the macro definition of
> HAVE_CLONE3_WRAPPER in our Yocto's glibc
>
> This is an option. But this is not what I meant. We can add
>
> $ export GLIBC_TUNABLES=glibc.syscall=disable_clone3
>
> to disable the clone3 wrapper.
>
> Thank you very much, setting an environment is better than applying an patch to sources
>
> but unfortunately, I set 'export GLIBC_TUNABLES=glibc.syscall=disable_clone3' in my glibc build environment, but it seems not work,
>
> the issue still exists. I also apply it in my runtime container, it does not work neither.
>
> My build environment is a Yocto project that supports cross compiling, I am not familiar with GLIBC_TUNABLES setting,
>
> with a simple search in glibc sources, I do not find clues about glibc.syscall=disable_clone3
>
>
Someone needs to add it.
@@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
/* Try clone3 first. */
int saved_errno = errno;
ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
- if (ret != -1 || errno != ENOSYS)
+ if (ret != -1 || (errno != ENOSYS && errno != EPERM))
return ret;
/* NB: Restore errno since errno may be checked against non-zero