[09/15] sh: rework sync_file_range ABI
Checks
Commit Message
From: Arnd Bergmann <arnd@arndb.de>
The unusual function calling conventions on superh ended up causing
sync_file_range to have the wrong argument order, with the 'flags'
argument getting sorted before 'nbytes' by the compiler.
In userspace, I found that musl, glibc, uclibc and strace all expect the
normal calling conventions with 'nbytes' last, so changing the kernel
to match them should make all of those work.
In order to be able to also fix libc implementations to work with existing
kernels, they need to be able to tell which ABI is used. An easy way
to do this is to add yet another system call using the sync_file_range2
ABI that works the same on all architectures.
Old user binaries can now work on new kernels, and new binaries can
try the new sync_file_range2() to work with new kernels or fall back
to the old sync_file_range() version if that doesn't exist.
Cc: stable@vger.kernel.org
Fixes: 75c92acdd5b1 ("sh: Wire up new syscalls.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/sh/kernel/sys_sh32.c | 11 +++++++++++
arch/sh/kernel/syscalls/syscall.tbl | 3 ++-
2 files changed, 13 insertions(+), 1 deletion(-)
Comments
Hi Arnd,
thanks for your patch!
On Thu, 2024-06-20 at 18:23 +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The unusual function calling conventions on superh ended up causing
^^^^^^
It's spelled SuperH
> sync_file_range to have the wrong argument order, with the 'flags'
> argument getting sorted before 'nbytes' by the compiler.
>
> In userspace, I found that musl, glibc, uclibc and strace all expect the
> normal calling conventions with 'nbytes' last, so changing the kernel
> to match them should make all of those work.
>
> In order to be able to also fix libc implementations to work with existing
> kernels, they need to be able to tell which ABI is used. An easy way
> to do this is to add yet another system call using the sync_file_range2
> ABI that works the same on all architectures.
>
> Old user binaries can now work on new kernels, and new binaries can
> try the new sync_file_range2() to work with new kernels or fall back
> to the old sync_file_range() version if that doesn't exist.
>
> Cc: stable@vger.kernel.org
> Fixes: 75c92acdd5b1 ("sh: Wire up new syscalls.")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> arch/sh/kernel/sys_sh32.c | 11 +++++++++++
> arch/sh/kernel/syscalls/syscall.tbl | 3 ++-
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
> index 9dca568509a5..d5a4f7c697d8 100644
> --- a/arch/sh/kernel/sys_sh32.c
> +++ b/arch/sh/kernel/sys_sh32.c
> @@ -59,3 +59,14 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
> (u64)len0 << 32 | len1, advice);
> #endif
> }
> +
> +/*
> + * swap the arguments the way that libc wants it instead of
I think "swap the arguments to the order that libc wants them" would
be easier to understand here.
> + * moving flags ahead of the 64-bit nbytes argument
> + */
> +SYSCALL_DEFINE6(sh_sync_file_range6, int, fd, SC_ARG64(offset),
> + SC_ARG64(nbytes), unsigned int, flags)
> +{
> + return ksys_sync_file_range(fd, SC_VAL64(loff_t, offset),
> + SC_VAL64(loff_t, nbytes), flags);
> +}
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index bbf83a2db986..c55fd7696d40 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -321,7 +321,7 @@
> 311 common set_robust_list sys_set_robust_list
> 312 common get_robust_list sys_get_robust_list
> 313 common splice sys_splice
> -314 common sync_file_range sys_sync_file_range
> +314 common sync_file_range sys_sh_sync_file_range6
^^^^^^ Why the suffix 6 here?
> 315 common tee sys_tee
> 316 common vmsplice sys_vmsplice
> 317 common move_pages sys_move_pages
> @@ -395,6 +395,7 @@
> 385 common pkey_alloc sys_pkey_alloc
> 386 common pkey_free sys_pkey_free
> 387 common rseq sys_rseq
> +388 common sync_file_range2 sys_sync_file_range2
> # room for arch specific syscalls
> 393 common semget sys_semget
> 394 common semctl sys_semctl
I wonder how you discovered this bug. Did you look up the calling convention on SuperH
and compare the argument order for the sys_sync_file_range system call documented there
with the order in the kernel?
Did you also check what order libc uses? I would expect libc on SuperH misordering the
arguments as well unless I am missing something. Or do we know that the code is actually
currently broken?
Thanks,
Adrian
On Fri, Jun 21, 2024, at 10:44, John Paul Adrian Glaubitz wrote:
> On Thu, 2024-06-20 at 18:23 +0200, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The unusual function calling conventions on superh ended up causing
> ^^^^^^
> It's spelled SuperH
Fixed now.
>> diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
>> index 9dca568509a5..d5a4f7c697d8 100644
>> --- a/arch/sh/kernel/sys_sh32.c
>> +++ b/arch/sh/kernel/sys_sh32.c
>> @@ -59,3 +59,14 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
>> (u64)len0 << 32 | len1, advice);
>> #endif
>> }
>> +
>> +/*
>> + * swap the arguments the way that libc wants it instead of
>
> I think "swap the arguments to the order that libc wants them" would
> be easier to understand here.
Done
>> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
>> index bbf83a2db986..c55fd7696d40 100644
>> --- a/arch/sh/kernel/syscalls/syscall.tbl
>> +++ b/arch/sh/kernel/syscalls/syscall.tbl
>> @@ -321,7 +321,7 @@
>> 311 common set_robust_list sys_set_robust_list
>> 312 common get_robust_list sys_get_robust_list
>> 313 common splice sys_splice
>> -314 common sync_file_range sys_sync_file_range
>> +314 common sync_file_range sys_sh_sync_file_range6
> ^^^^^^
> Why the suffix 6 here?
In a later part of my cleanup, I'm consolidating all the
copies of this function (arm64, mips, parisc, powerpc,
s390, sh, sparc, x86) and picked the name
sys_sync_file_range6() for common implementation.
I end up with four entry points here, so the naming is a bit
confusing:
- sys_sync_file_range() is only used on 64-bit architectures,
on x32 and on mips-n32. This uses four arguments, including
two 64-bit wide ones.
- sys_sync_file_range2() continues to be used on arm, powerpc,
xtensa and now on sh, hexagon and csky. I change the
implementation to take six 32-bit arguments, but the ABI
remains the same as before, with the flags before offset.
- sys_sync_file_range6() is used for most other 32-bit ABIs:
arc, m68k, microblaze, nios2, openrisc, parisc, s390, sh, sparc
and x86. This also has six 32-bit arguments but in the
default order (fd, offset, nbytes, flags).
- sys_sync_file_range7() is exclusive to mips-o32, this one
has an unused argument and is otherwise the same as
sys_sync_file_range6().
My plan is to then have some infrastructure to ensure
userspace tools (libc, strace, qemu, rust, ...) use the
same calling conventions as the kernel. I'm doing the
same thing for all other syscalls that have architecture
specific calling conventions, so far I'm using
fadvise64_64_7
fanotify_mark6
truncate3
truncate4
ftruncate3
ftruncate4
fallocate6
pread5
pread6
pwrite5
pwrite6
preadv5
preadv6
pwritev5
pwritev6
sync_file_range6
fadvise64_64_2
fadvise64_64_6
fadvise64_5
fadvise64_6
readahead4
readahead5
The last number here is usually the number of 32-bit
arguments, except for fadvise64_64_2 that uses the
same argument reordering trick as sync_file_range2.
I'm not too happy with the naming but couldn't come up with
anything clearer either, so let me know if you have any
ideas there.
>> 315 common tee sys_tee
>> 316 common vmsplice sys_vmsplice
>> 317 common move_pages sys_move_pages
>> @@ -395,6 +395,7 @@
>> 385 common pkey_alloc sys_pkey_alloc
>> 386 common pkey_free sys_pkey_free
>> 387 common rseq sys_rseq
>> +388 common sync_file_range2 sys_sync_file_range2
>> # room for arch specific syscalls
>> 393 common semget sys_semget
>> 394 common semctl sys_semctl
>
> I wonder how you discovered this bug. Did you look up the calling
> convention on SuperH
> and compare the argument order for the sys_sync_file_range system call
> documented there
> with the order in the kernel?
I had to categorize all architectures based on their calling
conventions to see if 64-bit arguments need aligned pairs or
not, so I wrote a set of simple C files that I compiled for
all architectures to see in which cases they insert unused
arguments or swap the order of the upper and lower halves.
SuperH, parisc and s390 are each slightly different from all the
others here, so I ended up reading the ELF psABI docs and/or
the compiler sources to be sure.
I also a lot of git history.
> Did you also check what order libc uses? I would expect libc on SuperH
> misordering the
> arguments as well unless I am missing something. Or do we know that the
> code is actually
> currently broken?
Yes, I checked glibc, musl and uclibc-ng for all the cases in
which the ABI made no sense, as well as to check that my analysis
of the kernel sources matches the expectations of the libc.
Arnd
On Fri, Jun 21, 2024 at 10:44:39AM +0200, John Paul Adrian Glaubitz wrote:
> Hi Arnd,
>
> thanks for your patch!
>
> On Thu, 2024-06-20 at 18:23 +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > The unusual function calling conventions on superh ended up causing
> ^^^^^^
> It's spelled SuperH
>
> > sync_file_range to have the wrong argument order, with the 'flags'
> > argument getting sorted before 'nbytes' by the compiler.
> >
> > In userspace, I found that musl, glibc, uclibc and strace all expect the
> > normal calling conventions with 'nbytes' last, so changing the kernel
> > to match them should make all of those work.
> >
> > In order to be able to also fix libc implementations to work with existing
> > kernels, they need to be able to tell which ABI is used. An easy way
> > to do this is to add yet another system call using the sync_file_range2
> > ABI that works the same on all architectures.
> >
> > Old user binaries can now work on new kernels, and new binaries can
> > try the new sync_file_range2() to work with new kernels or fall back
> > to the old sync_file_range() version if that doesn't exist.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 75c92acdd5b1 ("sh: Wire up new syscalls.")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> > arch/sh/kernel/sys_sh32.c | 11 +++++++++++
> > arch/sh/kernel/syscalls/syscall.tbl | 3 ++-
> > 2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
> > index 9dca568509a5..d5a4f7c697d8 100644
> > --- a/arch/sh/kernel/sys_sh32.c
> > +++ b/arch/sh/kernel/sys_sh32.c
> > @@ -59,3 +59,14 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
> > (u64)len0 << 32 | len1, advice);
> > #endif
> > }
> > +
> > +/*
> > + * swap the arguments the way that libc wants it instead of
>
> I think "swap the arguments to the order that libc wants them" would
> be easier to understand here.
>
> > + * moving flags ahead of the 64-bit nbytes argument
> > + */
> > +SYSCALL_DEFINE6(sh_sync_file_range6, int, fd, SC_ARG64(offset),
> > + SC_ARG64(nbytes), unsigned int, flags)
> > +{
> > + return ksys_sync_file_range(fd, SC_VAL64(loff_t, offset),
> > + SC_VAL64(loff_t, nbytes), flags);
> > +}
> > diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> > index bbf83a2db986..c55fd7696d40 100644
> > --- a/arch/sh/kernel/syscalls/syscall.tbl
> > +++ b/arch/sh/kernel/syscalls/syscall.tbl
> > @@ -321,7 +321,7 @@
> > 311 common set_robust_list sys_set_robust_list
> > 312 common get_robust_list sys_get_robust_list
> > 313 common splice sys_splice
> > -314 common sync_file_range sys_sync_file_range
> > +314 common sync_file_range sys_sh_sync_file_range6
> ^^^^^^ Why the suffix 6 here?
>
> > 315 common tee sys_tee
> > 316 common vmsplice sys_vmsplice
> > 317 common move_pages sys_move_pages
> > @@ -395,6 +395,7 @@
> > 385 common pkey_alloc sys_pkey_alloc
> > 386 common pkey_free sys_pkey_free
> > 387 common rseq sys_rseq
> > +388 common sync_file_range2 sys_sync_file_range2
> > # room for arch specific syscalls
> > 393 common semget sys_semget
> > 394 common semctl sys_semctl
>
> I wonder how you discovered this bug. Did you look up the calling convention on SuperH
> and compare the argument order for the sys_sync_file_range system call documented there
> with the order in the kernel?
>
> Did you also check what order libc uses? I would expect libc on SuperH misordering the
> arguments as well unless I am missing something. Or do we know that the code is actually
> currently broken?
No, there's no reason libc would misorder them because syscalls aren't
function calls, and aren't subject to function call ABI. We have to
explicitly bind the arguments to registers and make a syscall
instruction.
The only reason this bug happened on the kernel side is that someone
thought it would be a smart idea to save maybe 10 instructions by
treating the register state on entry as directly suitable to jump from
asm to a C function rather than explicitly marshalling the arguments
out of the user-kernel syscall ABI positions into actual arguments to
a C function call.
Rich
Hi Arnd,
On Fri, 2024-06-21 at 11:41 +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2024, at 10:44, John Paul Adrian Glaubitz wrote:
> > On Thu, 2024-06-20 at 18:23 +0200, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > The unusual function calling conventions on superh ended up causing
> > ^^^^^^
> > It's spelled SuperH
>
> Fixed now.
>
> > > diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
> > > index 9dca568509a5..d5a4f7c697d8 100644
> > > --- a/arch/sh/kernel/sys_sh32.c
> > > +++ b/arch/sh/kernel/sys_sh32.c
> > > @@ -59,3 +59,14 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
> > > (u64)len0 << 32 | len1, advice);
> > > #endif
> > > }
> > > +
> > > +/*
> > > + * swap the arguments the way that libc wants it instead of
> >
> > I think "swap the arguments to the order that libc wants them" would
> > be easier to understand here.
>
> Done
Thanks for the two improvements!
> > > diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> > > index bbf83a2db986..c55fd7696d40 100644
> > > --- a/arch/sh/kernel/syscalls/syscall.tbl
> > > +++ b/arch/sh/kernel/syscalls/syscall.tbl
> > > @@ -321,7 +321,7 @@
> > > 311 common set_robust_list sys_set_robust_list
> > > 312 common get_robust_list sys_get_robust_list
> > > 313 common splice sys_splice
> > > -314 common sync_file_range sys_sync_file_range
> > > +314 common sync_file_range sys_sh_sync_file_range6
> > ^^^^^^
> > Why the suffix 6 here?
>
> In a later part of my cleanup, I'm consolidating all the
> copies of this function (arm64, mips, parisc, powerpc,
> s390, sh, sparc, x86) and picked the name
> sys_sync_file_range6() for common implementation.
>
> I end up with four entry points here, so the naming is a bit
> confusing:
>
> - sys_sync_file_range() is only used on 64-bit architectures,
> on x32 and on mips-n32. This uses four arguments, including
> two 64-bit wide ones.
>
> - sys_sync_file_range2() continues to be used on arm, powerpc,
> xtensa and now on sh, hexagon and csky. I change the
> implementation to take six 32-bit arguments, but the ABI
> remains the same as before, with the flags before offset.
>
> - sys_sync_file_range6() is used for most other 32-bit ABIs:
> arc, m68k, microblaze, nios2, openrisc, parisc, s390, sh, sparc
> and x86. This also has six 32-bit arguments but in the
> default order (fd, offset, nbytes, flags).
>
> - sys_sync_file_range7() is exclusive to mips-o32, this one
> has an unused argument and is otherwise the same as
> sys_sync_file_range6().
>
> My plan is to then have some infrastructure to ensure
> userspace tools (libc, strace, qemu, rust, ...) use the
> same calling conventions as the kernel. I'm doing the
> same thing for all other syscalls that have architecture
> specific calling conventions, so far I'm using
>
> fadvise64_64_7
> fanotify_mark6
> truncate3
> truncate4
> ftruncate3
> ftruncate4
> fallocate6
> pread5
> pread6
> pwrite5
> pwrite6
> preadv5
> preadv6
> pwritev5
> pwritev6
> sync_file_range6
> fadvise64_64_2
> fadvise64_64_6
> fadvise64_5
> fadvise64_6
> readahead4
> readahead5
>
> The last number here is usually the number of 32-bit
> arguments, except for fadvise64_64_2 that uses the
> same argument reordering trick as sync_file_range2.
>
> I'm not too happy with the naming but couldn't come up with
> anything clearer either, so let me know if you have any
> ideas there.
OK, gotcha. I thought the 6 suffix was for SH only. I'm fine
with the naming scheme.
> > > 315 common tee sys_tee
> > > 316 common vmsplice sys_vmsplice
> > > 317 common move_pages sys_move_pages
> > > @@ -395,6 +395,7 @@
> > > 385 common pkey_alloc sys_pkey_alloc
> > > 386 common pkey_free sys_pkey_free
> > > 387 common rseq sys_rseq
> > > +388 common sync_file_range2 sys_sync_file_range2
> > > # room for arch specific syscalls
> > > 393 common semget sys_semget
> > > 394 common semctl sys_semctl
> >
> > I wonder how you discovered this bug. Did you look up the calling
> > convention on SuperH
> > and compare the argument order for the sys_sync_file_range system call
> > documented there
> > with the order in the kernel?
>
> I had to categorize all architectures based on their calling
> conventions to see if 64-bit arguments need aligned pairs or
> not, so I wrote a set of simple C files that I compiled for
> all architectures to see in which cases they insert unused
> arguments or swap the order of the upper and lower halves.
>
> SuperH, parisc and s390 are each slightly different from all the
> others here, so I ended up reading the ELF psABI docs and/or
> the compiler sources to be sure.
> I also a lot of git history.
Great job, thanks for doing the extra work to verify the ABI.
> > Did you also check what order libc uses? I would expect libc on SuperH
> > misordering the
> > arguments as well unless I am missing something. Or do we know that the
> > code is actually
> > currently broken?
>
> Yes, I checked glibc, musl and uclibc-ng for all the cases in
> which the ABI made no sense, as well as to check that my analysis
> of the kernel sources matches the expectations of the libc.
OK, awesome.
Will you send a v2 so I can ack the updated version of the patch?
I'm also fine with the patch going through your tree, as I would
like to start with the changes for v6.11 this week.
Thanks,
Adrian
On Mon, Jun 24, 2024, at 08:14, John Paul Adrian Glaubitz wrote:
> On Fri, 2024-06-21 at 11:41 +0200, Arnd Bergmann wrote:
>> On Fri, Jun 21, 2024, at 10:44, John Paul Adrian Glaubitz wrote:
>> > Did you also check what order libc uses? I would expect libc on SuperH
>> > misordering the
>> > arguments as well unless I am missing something. Or do we know that the
>> > code is actually
>> > currently broken?
>>
>> Yes, I checked glibc, musl and uclibc-ng for all the cases in
>> which the ABI made no sense, as well as to check that my analysis
>> of the kernel sources matches the expectations of the libc.
>
> OK, awesome.
>
> Will you send a v2 so I can ack the updated version of the patch?
>
> I'm also fine with the patch going through your tree, as I would
> like to start with the changes for v6.11 this week.
I should be able to get a v2 out today and apply that to my
asm-generic tree to have in linux-next before I send the
pull request.
Arnd
@@ -59,3 +59,14 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
(u64)len0 << 32 | len1, advice);
#endif
}
+
+/*
+ * swap the arguments the way that libc wants it instead of
+ * moving flags ahead of the 64-bit nbytes argument
+ */
+SYSCALL_DEFINE6(sh_sync_file_range6, int, fd, SC_ARG64(offset),
+ SC_ARG64(nbytes), unsigned int, flags)
+{
+ return ksys_sync_file_range(fd, SC_VAL64(loff_t, offset),
+ SC_VAL64(loff_t, nbytes), flags);
+}
@@ -321,7 +321,7 @@
311 common set_robust_list sys_set_robust_list
312 common get_robust_list sys_get_robust_list
313 common splice sys_splice
-314 common sync_file_range sys_sync_file_range
+314 common sync_file_range sys_sh_sync_file_range6
315 common tee sys_tee
316 common vmsplice sys_vmsplice
317 common move_pages sys_move_pages
@@ -395,6 +395,7 @@
385 common pkey_alloc sys_pkey_alloc
386 common pkey_free sys_pkey_free
387 common rseq sys_rseq
+388 common sync_file_range2 sys_sync_file_range2
# room for arch specific syscalls
393 common semget sys_semget
394 common semctl sys_semctl