[RFC] Linux: Use fixed rseq_len value for rseq registration

Message ID 20200713193434.30440-1-mathieu.desnoyers@efficios.com
State Dropped
Headers
Series [RFC] Linux: Use fixed rseq_len value for rseq registration |

Commit Message

Mathieu Desnoyers July 13, 2020, 7:34 p.m. UTC
  The rseq registration system call expects a fixed-size argument of 32
for the rseq_len.  We are currently discussing schemes to extend
struct rseq beyond that size, and those involve using fields within
struct rseq, without any changes to the rseq_len argument.

Building a glibc with an updated, larger, struct rseq in the Linux
kernel UAPI headers should not break registration.  Therefore, use
a fixed-size of 32 as rseq_len parameter.

See struct rseq extension discussion:

https://lore.kernel.org/lkml/1305865358.10354.1594665620975.JavaMail.zimbra@efficios.com/
---
 sysdeps/unix/sysv/linux/rseq-internal.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
  

Comments

Florian Weimer July 14, 2020, 8:51 a.m. UTC | #1
* Mathieu Desnoyers:

> +  /* The rseq_len parameter does not allow extending struct rseq.  Fix its
> +     value to 32 as expected by the Linux kernel.  */
> +  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG);

If the layout of struct rseq can change in the kernel headers, than far
more significant changes are needed.  glibc cannot change its ABI
depending on the version of the kernel headers it is compiled against.

Thanks,
Florian
  
Mathieu Desnoyers July 14, 2020, 12:31 p.m. UTC | #2
----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> +  /* The rseq_len parameter does not allow extending struct rseq.  Fix its
>> +     value to 32 as expected by the Linux kernel.  */
>> +  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG);
> 
> If the layout of struct rseq can change in the kernel headers, than far
> more significant changes are needed.  glibc cannot change its ABI
> depending on the version of the kernel headers it is compiled against.

What I have in mind for struct rseq is that the current structure layout
is guaranteed as a "minimum" requirement. It can then be extended by the
kernel UAPI in a backward compatible way: if user-space provides enough
room for extension, and the kernel supports the additional features, then
user-space can use those additional features.

Adding extra room in the __rseq_abi can be done though:

- Upgrading glibc with knowledge of the extra layout,
- Preloading an interposition library which defines __rseq_abi with the
  extended layout,
- Defining __rseq_abi with extended layout in the application binary.

What I suspect is your concern here is that two glibc-2.32 builds against
different versions of kernel headers could lead to different __rseq_abi
size, which is unexpected.

Would using the internal struct rseq all the time in glibc (rather than
uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate
extended layouts on version bump (e.g. 2.33).

Thanks,

Mathieu
  
Florian Weimer July 14, 2020, 1:28 p.m. UTC | #3
* Mathieu Desnoyers:

> ----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>> 
>>> +  /* The rseq_len parameter does not allow extending struct rseq.  Fix its
>>> +     value to 32 as expected by the Linux kernel.  */
>>> +  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG);
>> 
>> If the layout of struct rseq can change in the kernel headers, than far
>> more significant changes are needed.  glibc cannot change its ABI
>> depending on the version of the kernel headers it is compiled against.
>
> What I have in mind for struct rseq is that the current structure layout
> is guaranteed as a "minimum" requirement. It can then be extended by the
> kernel UAPI in a backward compatible way: if user-space provides enough
> room for extension, and the kernel supports the additional features, then
> user-space can use those additional features.
>
> Adding extra room in the __rseq_abi can be done though:
>
> - Upgrading glibc with knowledge of the extra layout,
> - Preloading an interposition library which defines __rseq_abi with the
>   extended layout,
> - Defining __rseq_abi with extended layout in the application binary.
>
> What I suspect is your concern here is that two glibc-2.32 builds against
> different versions of kernel headers could lead to different __rseq_abi
> size, which is unexpected.

Indeed.

> Would using the internal struct rseq all the time in glibc (rather than
> uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate
> extended layouts on version bump (e.g. 2.33).

It works reliably as long as glibc only ever uses the minimum rseq size.
And since glibc monopolizes the rseq registration, applications cannot
register a larger area.  So there is no way to make use of any future
kernel extensions.

Thanks,
Florian
  
Christian Brauner July 14, 2020, 1:33 p.m. UTC | #4
On Tue, Jul 14, 2020 at 03:28:47PM +0200, Florian Weimer via Libc-alpha wrote:
> * Mathieu Desnoyers:
> 
> > ----- On Jul 14, 2020, at 4:51 AM, Florian Weimer fweimer@redhat.com wrote:
> >
> >> * Mathieu Desnoyers:
> >> 
> >>> +  /* The rseq_len parameter does not allow extending struct rseq.  Fix its
> >>> +     value to 32 as expected by the Linux kernel.  */
> >>> +  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG);
> >> 
> >> If the layout of struct rseq can change in the kernel headers, than far
> >> more significant changes are needed.  glibc cannot change its ABI
> >> depending on the version of the kernel headers it is compiled against.
> >
> > What I have in mind for struct rseq is that the current structure layout
> > is guaranteed as a "minimum" requirement. It can then be extended by the
> > kernel UAPI in a backward compatible way: if user-space provides enough
> > room for extension, and the kernel supports the additional features, then
> > user-space can use those additional features.
> >
> > Adding extra room in the __rseq_abi can be done though:
> >
> > - Upgrading glibc with knowledge of the extra layout,
> > - Preloading an interposition library which defines __rseq_abi with the
> >   extended layout,
> > - Defining __rseq_abi with extended layout in the application binary.
> >
> > What I suspect is your concern here is that two glibc-2.32 builds against
> > different versions of kernel headers could lead to different __rseq_abi
> > size, which is unexpected.
> 
> Indeed.
> 
> > Would using the internal struct rseq all the time in glibc (rather than
> > uapi linux/rseq.h) resolve your concern ? This way, glibc could integrate
> > extended layouts on version bump (e.g. 2.33).
> 
> It works reliably as long as glibc only ever uses the minimum rseq size.
> And since glibc monopolizes the rseq registration, applications cannot
> register a larger area.  So there is no way to make use of any future
> kernel extensions.

But when you bump ABI in glibc you can switch to a new rseq size, no?

Christian
  
Florian Weimer July 14, 2020, 1:54 p.m. UTC | #5
* Christian Brauner:

>> It works reliably as long as glibc only ever uses the minimum rseq size.
>> And since glibc monopolizes the rseq registration, applications cannot
>> register a larger area.  So there is no way to make use of any future
>> kernel extensions.
>
> But when you bump ABI in glibc you can switch to a new rseq size, no?

We are expected to support interposers with their own definition
__rseq_abi, which could be smaller.

Thanks,
Florian
  
Mathieu Desnoyers July 14, 2020, 2:04 p.m. UTC | #6
----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote:

> * Christian Brauner:
> 
>>> It works reliably as long as glibc only ever uses the minimum rseq size.
>>> And since glibc monopolizes the rseq registration, applications cannot
>>> register a larger area.  So there is no way to make use of any future
>>> kernel extensions.
>>
>> But when you bump ABI in glibc you can switch to a new rseq size, no?
> 
> We are expected to support interposers with their own definition
> __rseq_abi, which could be smaller.

In this scenario, the interposers would have to follow the UAPI rules:
The size should at least cover the original struct rseq fields,
up to and including the flags field.

This just means glibc would internally abide by the same rules as the
other users if it want to use an extended rseq structure, and not expect
to be the only possible library defining the __rseq_abi symbol. If glibc
wants to use the extension feature, it would have to validate that the
RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate
that __rseq_abi.kernel_size covers the feature it needs before accessing
the feature field.

It would have to do something similar anyway to make sure the kernel
supports those extensions.

Thanks,

Mathieu
  
Florian Weimer July 14, 2020, 3:07 p.m. UTC | #7
* Mathieu Desnoyers:

> ----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Christian Brauner:
>> 
>>>> It works reliably as long as glibc only ever uses the minimum rseq size.
>>>> And since glibc monopolizes the rseq registration, applications cannot
>>>> register a larger area.  So there is no way to make use of any future
>>>> kernel extensions.
>>>
>>> But when you bump ABI in glibc you can switch to a new rseq size, no?
>> 
>> We are expected to support interposers with their own definition
>> __rseq_abi, which could be smaller.
>
> In this scenario, the interposers would have to follow the UAPI rules:
> The size should at least cover the original struct rseq fields,
> up to and including the flags field.

If you use an ELF symbol, you also have to follow rules related to ELF
linking (and C language requirements).

> This just means glibc would internally abide by the same rules as the
> other users if it want to use an extended rseq structure, and not expect
> to be the only possible library defining the __rseq_abi symbol. If glibc
> wants to use the extension feature, it would have to validate that the
> RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate
> that __rseq_abi.kernel_size covers the feature it needs before accessing
> the feature field.

This still does not give us a way to perform the rseq registration with
the size expected by an interposing definition of __rseq_abi.

I think we are looking at this from the wrong perspective.  It's not
userspace that is setting the size here, it's the kernel based on the
features it supports.  So the kernel should put the size into the
auxiliary vector, and the registration should use that size.  But that
doesn't align well with the use of an ELF TLS symbol.

Thanks,
Florian
  
Mathieu Desnoyers July 14, 2020, 3:30 p.m. UTC | #8
----- On Jul 14, 2020, at 11:07 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Jul 14, 2020, at 9:54 AM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Christian Brauner:
>>> 
>>>>> It works reliably as long as glibc only ever uses the minimum rseq size.
>>>>> And since glibc monopolizes the rseq registration, applications cannot
>>>>> register a larger area.  So there is no way to make use of any future
>>>>> kernel extensions.
>>>>
>>>> But when you bump ABI in glibc you can switch to a new rseq size, no?
>>> 
>>> We are expected to support interposers with their own definition
>>> __rseq_abi, which could be smaller.
>>
>> In this scenario, the interposers would have to follow the UAPI rules:
>> The size should at least cover the original struct rseq fields,
>> up to and including the flags field.
> 
> If you use an ELF symbol, you also have to follow rules related to ELF
> linking (and C language requirements).

Indeed.

> 
>> This just means glibc would internally abide by the same rules as the
>> other users if it want to use an extended rseq structure, and not expect
>> to be the only possible library defining the __rseq_abi symbol. If glibc
>> wants to use the extension feature, it would have to validate that the
>> RSEQ_TLS_FLAG_SIZE flag is set in the __rseq_abi.flags, and then validate
>> that __rseq_abi.kernel_size covers the feature it needs before accessing
>> the feature field.
> 
> This still does not give us a way to perform the rseq registration with
> the size expected by an interposing definition of __rseq_abi.

Yes it does. The interposing definition of __rseq_abi would have the
__rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, and would have
__rseq_abi.user_size = offsetof(struct rseq, end) (or something similar)
from its own perspective. That would be passed to the kernel on
registration. The "rseq_len" system call parameter would always stay
at value 32 though.

> I think we are looking at this from the wrong perspective.  It's not
> userspace that is setting the size here, it's the kernel based on the
> features it supports.  So the kernel should put the size into the
> auxiliary vector, and the registration should use that size.  But that
> doesn't align well with the use of an ELF TLS symbol.

We have a few possible ways to do things here:

1) Kernel exports supported size, incompatible with ELF TLS symbol,

2) Userspace dictates supported size, compatible with ELF TLS symbol,
   triggers failure if the kernel supports a smaller size,

3) Userspace lets kernel know how much space is available for struct rseq
   (through user_size field), and the kernel lets user-space know how much
   of that structure is being filled (through kernel_size field). This
   would also be compatible with ELF TLS symbol AFAIU, and would allow
   extending struct rseq.

Option (3) would allow us to have the speed gains that come with using a
TLS from the fast-path, while allowing extension of struct rseq.

Or is there anything in that scheme that breaks ELF rules or C language
requirements ?

Thanks,

Mathieu
  
Szabolcs Nagy July 14, 2020, 4:01 p.m. UTC | #9
The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote:
> > I think we are looking at this from the wrong perspective.  It's not
> > userspace that is setting the size here, it's the kernel based on the
> > features it supports.  So the kernel should put the size into the
> > auxiliary vector, and the registration should use that size.  But that
> > doesn't align well with the use of an ELF TLS symbol.

this is why it is better to use a function that returns
a pointer than a tls symbol as public abi.

> We have a few possible ways to do things here:
> 
> 1) Kernel exports supported size, incompatible with ELF TLS symbol,
> 
> 2) Userspace dictates supported size, compatible with ELF TLS symbol,
>    triggers failure if the kernel supports a smaller size,
> 
> 3) Userspace lets kernel know how much space is available for struct rseq
>    (through user_size field), and the kernel lets user-space know how much
>    of that structure is being filled (through kernel_size field). This
>    would also be compatible with ELF TLS symbol AFAIU, and would allow
>    extending struct rseq.
> 
> Option (3) would allow us to have the speed gains that come with using a
> TLS from the fast-path, while allowing extension of struct rseq.
> 
> Or is there anything in that scheme that breaks ELF rules or C language
> requirements ?

how would users access those extension fields?

how would the public struct definition change?
  
Mathieu Desnoyers July 14, 2020, 4:12 p.m. UTC | #10
----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote:

> The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote:
>> > I think we are looking at this from the wrong perspective.  It's not
>> > userspace that is setting the size here, it's the kernel based on the
>> > features it supports.  So the kernel should put the size into the
>> > auxiliary vector, and the registration should use that size.  But that
>> > doesn't align well with the use of an ELF TLS symbol.
> 
> this is why it is better to use a function that returns
> a pointer than a tls symbol as public abi.
> 
>> We have a few possible ways to do things here:
>> 
>> 1) Kernel exports supported size, incompatible with ELF TLS symbol,
>> 
>> 2) Userspace dictates supported size, compatible with ELF TLS symbol,
>>    triggers failure if the kernel supports a smaller size,
>> 
>> 3) Userspace lets kernel know how much space is available for struct rseq
>>    (through user_size field), and the kernel lets user-space know how much
>>    of that structure is being filled (through kernel_size field). This
>>    would also be compatible with ELF TLS symbol AFAIU, and would allow
>>    extending struct rseq.
>> 
>> Option (3) would allow us to have the speed gains that come with using a
>> TLS from the fast-path, while allowing extension of struct rseq.
>> 
>> Or is there anything in that scheme that breaks ELF rules or C language
>> requirements ?
> 
> how would users access those extension fields?

if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) {
  /* Allowed to access user_size and kernel_size. */
  if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct rseq *)NULL)->myfield)) {
    /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in a static variable. */
  }
}

Once the user program/library has confirmed that it can indeed access the field,
it can directly use the field with __rseq_abi.myfield.

> how would the public struct definition change?

Additional fields would be added at the end of struct rseq.
If we use the approach with the end[] flexible array member, then it
would move down to stay at the very end of the structure. If we don't
(because it is not c++ compliant), then we would update an
"rseq_last_field" macro to point to the last field in struct rseq.

Adding additional fields to struct rseq would increase its size due to
added fields, and eventually may change its alignment to a larger value.

Thanks,

Mathieu
  
Florian Weimer July 14, 2020, 5:03 p.m. UTC | #11
* Mathieu Desnoyers:

> ----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote:
>
>> The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote:
>>> > I think we are looking at this from the wrong perspective.  It's not
>>> > userspace that is setting the size here, it's the kernel based on the
>>> > features it supports.  So the kernel should put the size into the
>>> > auxiliary vector, and the registration should use that size.  But that
>>> > doesn't align well with the use of an ELF TLS symbol.
>> 
>> this is why it is better to use a function that returns
>> a pointer than a tls symbol as public abi.
>> 
>>> We have a few possible ways to do things here:
>>> 
>>> 1) Kernel exports supported size, incompatible with ELF TLS symbol,
>>> 
>>> 2) Userspace dictates supported size, compatible with ELF TLS symbol,
>>>    triggers failure if the kernel supports a smaller size,
>>> 
>>> 3) Userspace lets kernel know how much space is available for struct rseq
>>>    (through user_size field), and the kernel lets user-space know how much
>>>    of that structure is being filled (through kernel_size field). This
>>>    would also be compatible with ELF TLS symbol AFAIU, and would allow
>>>    extending struct rseq.
>>> 
>>> Option (3) would allow us to have the speed gains that come with using a
>>> TLS from the fast-path, while allowing extension of struct rseq.
>>> 
>>> Or is there anything in that scheme that breaks ELF rules or C language
>>> requirements ?
>> 
>> how would users access those extension fields?
>
> if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) {
>   /* Allowed to access user_size and kernel_size. */
>   if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct rseq *)NULL)->myfield)) {
>     /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in a static variable. */
>   }
> }

Technically, this needs a compiler extension, so that the allowed
accesses to the __rseq_abi.myfield field are not moved before the flags
and size checks.

Thanks,
Florian
  
Mathieu Desnoyers July 14, 2020, 5:12 p.m. UTC | #12
----- On Jul 14, 2020, at 1:03 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Jul 14, 2020, at 12:01 PM, Szabolcs Nagy szabolcs.nagy@arm.com wrote:
>>
>>> The 07/14/2020 11:30, Mathieu Desnoyers via Libc-alpha wrote:
>>>> > I think we are looking at this from the wrong perspective.  It's not
>>>> > userspace that is setting the size here, it's the kernel based on the
>>>> > features it supports.  So the kernel should put the size into the
>>>> > auxiliary vector, and the registration should use that size.  But that
>>>> > doesn't align well with the use of an ELF TLS symbol.
>>> 
>>> this is why it is better to use a function that returns
>>> a pointer than a tls symbol as public abi.
>>> 
>>>> We have a few possible ways to do things here:
>>>> 
>>>> 1) Kernel exports supported size, incompatible with ELF TLS symbol,
>>>> 
>>>> 2) Userspace dictates supported size, compatible with ELF TLS symbol,
>>>>    triggers failure if the kernel supports a smaller size,
>>>> 
>>>> 3) Userspace lets kernel know how much space is available for struct rseq
>>>>    (through user_size field), and the kernel lets user-space know how much
>>>>    of that structure is being filled (through kernel_size field). This
>>>>    would also be compatible with ELF TLS symbol AFAIU, and would allow
>>>>    extending struct rseq.
>>>> 
>>>> Option (3) would allow us to have the speed gains that come with using a
>>>> TLS from the fast-path, while allowing extension of struct rseq.
>>>> 
>>>> Or is there anything in that scheme that breaks ELF rules or C language
>>>> requirements ?
>>> 
>>> how would users access those extension fields?
>>
>> if (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) {
>>   /* Allowed to access user_size and kernel_size. */
>>   if (__rseq_abi.kernel_size >= offsetof(struct rseq, myfield) + sizeof(((struct
>>   rseq *)NULL)->myfield)) {
>>     /* Allowed to access __rseq_abi.myfield. User code should remember this, e.g. in
>>     a static variable. */
>>   }
>> }
> 
> Technically, this needs a compiler extension, so that the allowed
> accesses to the __rseq_abi.myfield field are not moved before the flags
> and size checks.

If those accesses are performed as volatile loads, or with atomic relaxed
semantic, the compiler should not be allowed to change the order. Likewise
if we use asm volatile (::: "memory") clobber between the loads.

Thanks,

Mathieu
  

Patch

diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 8f6772ca1d..3522668f3a 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -33,8 +33,9 @@  rseq_register_current_thread (void)
   if (__rseq_abi.cpu_id != RSEQ_CPU_ID_UNINITIALIZED)
     __libc_fatal ("glibc fatal error: "
                   "rseq already initialized for this thread\n");
-  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, sizeof (struct rseq),
-                               0, RSEQ_SIG);
+  /* The rseq_len parameter does not allow extending struct rseq.  Fix its
+     value to 32 as expected by the Linux kernel.  */
+  ret = INTERNAL_SYSCALL_CALL (rseq, &__rseq_abi, 32, 0, RSEQ_SIG);
   if (INTERNAL_SYSCALL_ERROR_P (ret))
     {
       const char *msg = NULL;