Use SYSCALL_LL[64] to pass 64-bit value [BZ #20349]

Message ID 20160711192653.GA4457@intel.com
State New, archived
Headers

Commit Message

Lu, Hongjiu July 11, 2016, 7:26 p.m. UTC
  SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
calls.

Tested on x86-64 and i686.  OK for master?


H.J.
--
	[BZ #20349]
	* sysdeps/unix/sysv/linux/preadv.c (preadv): Replace LO_HI_LONG
	with __ALIGNMENT_ARG SYSCALL_LL.
	* sysdeps/unix/sysv/linux/pwritev.c (pwritev): Likewise.
	* sysdeps/unix/sysv/linux/preadv64.c (preadv64): Replace
	LO_HI_LONG with __ALIGNMENT_ARG SYSCALL_LL64.
	* sysdeps/unix/sysv/linux/pwritev64.c (pwritev64): Likewise.
	* sysdeps/unix/sysv/linux/sysdep.h (LO_HI_LONG): Removed.
---
 sysdeps/unix/sysv/linux/preadv.c    | 5 +++--
 sysdeps/unix/sysv/linux/preadv64.c  | 5 +++--
 sysdeps/unix/sysv/linux/pwritev.c   | 5 +++--
 sysdeps/unix/sysv/linux/pwritev64.c | 5 +++--
 sysdeps/unix/sysv/linux/sysdep.h    | 5 -----
 5 files changed, 12 insertions(+), 13 deletions(-)
  

Comments

Chris Metcalf July 11, 2016, 8:36 p.m. UTC | #1
On 7/11/2016 3:26 PM, H.J. Lu wrote:
> SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
> calls.

What problem are you trying to solve?

SYSCALL_LL and LO_HI_LONG are different on big-endian systems.  In this
case, LO_HI_LONG is correct, since the kernel API is "unsigned long pos_l,
unsigned long pos_h".  SYSCALL_LL won't do the right thing.

__ALIGNMENT_ARG introduces an extra dummy arguments to be inserted before
a 64-bit value split into two 32-bit registers, if required by the platform.
Since preadv/pwritev explicitly use split arguments to construct a 64-bit
loff_t, __ALIGNMENT_ARG will just add a random inappropriate dummy arg
to an API that doesn't need one.

I reviewed the casting for LO_HI_LONG and it looks OK; I initially was
wondering whether losing the "val >> 31, val" semantics from SYSCALL_LL()
might have been problematic, but it seems like LO_HI_LONG should generate
the same results.
  
H.J. Lu July 11, 2016, 9:04 p.m. UTC | #2
On Mon, Jul 11, 2016 at 1:36 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
> On 7/11/2016 3:26 PM, H.J. Lu wrote:
>>
>> SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
>> calls.
>
>
> What problem are you trying to solve?
>
> SYSCALL_LL and LO_HI_LONG are different on big-endian systems.  In this
> case, LO_HI_LONG is correct, since the kernel API is "unsigned long pos_l,
> unsigned long pos_h".  SYSCALL_LL won't do the right thing.
>
> __ALIGNMENT_ARG introduces an extra dummy arguments to be inserted before
> a 64-bit value split into two 32-bit registers, if required by the platform.
> Since preadv/pwritev explicitly use split arguments to construct a 64-bit
> loff_t, __ALIGNMENT_ARG will just add a random inappropriate dummy arg
> to an API that doesn't need one.
>
> I reviewed the casting for LO_HI_LONG and it looks OK; I initially was
> wondering whether losing the "val >> 31, val" semantics from SYSCALL_LL()
> might have been problematic, but it seems like LO_HI_LONG should generate
> the same results.
>

/* Provide a macro to pass the off{64}_t argument on p{readv,writev}{64}.  */
#define LO_HI_LONG(val) \
 (long) (val), \
 (long) (((uint64_t) (val)) >> 32)

is wrong for __ASSUME_WORDSIZE64_ILP32 platform.
  
Chris Metcalf July 13, 2016, 8:46 p.m. UTC | #3
On 7/11/2016 5:04 PM, H.J. Lu wrote:
> On Mon, Jul 11, 2016 at 1:36 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
>> On 7/11/2016 3:26 PM, H.J. Lu wrote:
>>> SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
>>> calls.
>>
>> What problem are you trying to solve?
>>
>> SYSCALL_LL and LO_HI_LONG are different on big-endian systems.  In this
>> case, LO_HI_LONG is correct, since the kernel API is "unsigned long pos_l,
>> unsigned long pos_h".  SYSCALL_LL won't do the right thing.
>>
>> __ALIGNMENT_ARG introduces an extra dummy arguments to be inserted before
>> a 64-bit value split into two 32-bit registers, if required by the platform.
>> Since preadv/pwritev explicitly use split arguments to construct a 64-bit
>> loff_t, __ALIGNMENT_ARG will just add a random inappropriate dummy arg
>> to an API that doesn't need one.
>>
>> I reviewed the casting for LO_HI_LONG and it looks OK; I initially was
>> wondering whether losing the "val >> 31, val" semantics from SYSCALL_LL()
>> might have been problematic, but it seems like LO_HI_LONG should generate
>> the same results.
>>
> /* Provide a macro to pass the off{64}_t argument on p{readv,writev}{64}.  */
> #define LO_HI_LONG(val) \
>   (long) (val), \
>   (long) (((uint64_t) (val)) >> 32)
>
> is wrong for __ASSUME_WORDSIZE64_ILP32 platform.

On 64-bit platforms, the default definition ends up generating a garbage
32-bit shifted "hi" arg that the kernel then discards.  So I think we
should just use a simple #ifdef _LP64 to avoid the shift, which would
be helpful for all the 64-bit platforms.

But while we're at it, I suspect we should actually pass a constant zero as
the "hi" value.  The kernel discards it, but it's not immediately obvious,
and specifying a zero here seems like good conservative programming.  In
addition, if at some point we want to use the preadv2 syscall ABI, I can
pretty much guarantee we will have to re-discover all this again because
someone will just write "LO_HI_LONG(val), flags", and be baffled as to why
the flags word is ignored on some platforms.  (And honestly, something
called LO_HI_LONG *not* producing a lo and a hi value is inherently odd.)

So I think this makes the most sense:

#ifdef _LP64
# define LO_HI_LONG(val) (val), 0
#else
# define LO_HI_LONG(val) \
  (long) (val), \
  (long) (((uint64_t) (val)) >> 32)
#endif

And then x32-like platforms that want to pass a single 64-bit loff_t
register value can override, as HJ has already done for x86_64.  It's
reasonable to consider this as the exceptional case since the kernel has
the standard version of the syscall as taking two arguments, and x32
is the only kernel architecture using the compat_sys_preadv64
implementation that only takes one loff_t argument.

This does point out that the __ASSUME_WORDSIZE64_ILP32 symbol really
isn't as general as one might like, since it really depends on which
32-bit ABI compat functions the kernel is using for each syscall.
  
Adhemerval Zanella Netto July 14, 2016, 10:50 a.m. UTC | #4
On 13/07/2016 21:46, Chris Metcalf wrote:
> On 7/11/2016 5:04 PM, H.J. Lu wrote:
>> On Mon, Jul 11, 2016 at 1:36 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
>>> On 7/11/2016 3:26 PM, H.J. Lu wrote:
>>>> SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
>>>> calls.
>>>
>>> What problem are you trying to solve?
>>>
>>> SYSCALL_LL and LO_HI_LONG are different on big-endian systems.  In this
>>> case, LO_HI_LONG is correct, since the kernel API is "unsigned long pos_l,
>>> unsigned long pos_h".  SYSCALL_LL won't do the right thing.
>>>
>>> __ALIGNMENT_ARG introduces an extra dummy arguments to be inserted before
>>> a 64-bit value split into two 32-bit registers, if required by the platform.
>>> Since preadv/pwritev explicitly use split arguments to construct a 64-bit
>>> loff_t, __ALIGNMENT_ARG will just add a random inappropriate dummy arg
>>> to an API that doesn't need one.
>>>
>>> I reviewed the casting for LO_HI_LONG and it looks OK; I initially was
>>> wondering whether losing the "val >> 31, val" semantics from SYSCALL_LL()
>>> might have been problematic, but it seems like LO_HI_LONG should generate
>>> the same results.
>>>
>> /* Provide a macro to pass the off{64}_t argument on p{readv,writev}{64}.  */
>> #define LO_HI_LONG(val) \
>>   (long) (val), \
>>   (long) (((uint64_t) (val)) >> 32)
>>
>> is wrong for __ASSUME_WORDSIZE64_ILP32 platform.
> 
> On 64-bit platforms, the default definition ends up generating a garbage
> 32-bit shifted "hi" arg that the kernel then discards.  So I think we
> should just use a simple #ifdef _LP64 to avoid the shift, which would
> be helpful for all the 64-bit platforms.

This also seems the right thing to do imho.

> 
> But while we're at it, I suspect we should actually pass a constant zero as
> the "hi" value.  The kernel discards it, but it's not immediately obvious,
> and specifying a zero here seems like good conservative programming.  In
> addition, if at some point we want to use the preadv2 syscall ABI, I can
> pretty much guarantee we will have to re-discover all this again because
> someone will just write "LO_HI_LONG(val), flags", and be baffled as to why
> the flags word is ignored on some platforms.  (And honestly, something
> called LO_HI_LONG *not* producing a lo and a hi value is inherently odd.)
> 
> So I think this makes the most sense:
> 
> #ifdef _LP64
> # define LO_HI_LONG(val) (val), 0
> #else
> # define LO_HI_LONG(val) \
>  (long) (val), \
>  (long) (((uint64_t) (val)) >> 32)
> #endif

Yes, this is what I am thinking to do to fix the Linux implementation.

> 
> And then x32-like platforms that want to pass a single 64-bit loff_t
> register value can override, as HJ has already done for x86_64.  It's
> reasonable to consider this as the exceptional case since the kernel has
> the standard version of the syscall as taking two arguments, and x32
> is the only kernel architecture using the compat_sys_preadv64
> implementation that only takes one loff_t argument.
> 
> This does point out that the __ASSUME_WORDSIZE64_ILP32 symbol really
> isn't as general as one might like, since it really depends on which
> 32-bit ABI compat functions the kernel is using for each syscall.
> 

At the time I suggested the __ASSUME_WORDSIZE64_ILP32 I was not area of 
all variant usages for supported ports.  I think your LP64 suggestion
does make sense and although my initial approach would to avoid arch
specific overrides, x32 does make sense in this case.  I think that
if possible futures ILP32 architecture also decide to follow x32
argument passing for 64-bit loff_t, we can make it generic by a
arch specific define.
  
H.J. Lu July 14, 2016, 3:21 p.m. UTC | #5
On Thu, Jul 14, 2016 at 3:50 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 13/07/2016 21:46, Chris Metcalf wrote:
>> On 7/11/2016 5:04 PM, H.J. Lu wrote:
>>> On Mon, Jul 11, 2016 at 1:36 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
>>>> On 7/11/2016 3:26 PM, H.J. Lu wrote:
>>>>> SYSCALL_LL/SYSCALL_LL64 should be used to pass 64-bit value to system
>>>>> calls.
>>>>
>>>> What problem are you trying to solve?
>>>>
>>>> SYSCALL_LL and LO_HI_LONG are different on big-endian systems.  In this
>>>> case, LO_HI_LONG is correct, since the kernel API is "unsigned long pos_l,
>>>> unsigned long pos_h".  SYSCALL_LL won't do the right thing.
>>>>
>>>> __ALIGNMENT_ARG introduces an extra dummy arguments to be inserted before
>>>> a 64-bit value split into two 32-bit registers, if required by the platform.
>>>> Since preadv/pwritev explicitly use split arguments to construct a 64-bit
>>>> loff_t, __ALIGNMENT_ARG will just add a random inappropriate dummy arg
>>>> to an API that doesn't need one.
>>>>
>>>> I reviewed the casting for LO_HI_LONG and it looks OK; I initially was
>>>> wondering whether losing the "val >> 31, val" semantics from SYSCALL_LL()
>>>> might have been problematic, but it seems like LO_HI_LONG should generate
>>>> the same results.
>>>>
>>> /* Provide a macro to pass the off{64}_t argument on p{readv,writev}{64}.  */
>>> #define LO_HI_LONG(val) \
>>>   (long) (val), \
>>>   (long) (((uint64_t) (val)) >> 32)
>>>
>>> is wrong for __ASSUME_WORDSIZE64_ILP32 platform.
>>
>> On 64-bit platforms, the default definition ends up generating a garbage
>> 32-bit shifted "hi" arg that the kernel then discards.  So I think we
>> should just use a simple #ifdef _LP64 to avoid the shift, which would
>> be helpful for all the 64-bit platforms.
>
> This also seems the right thing to do imho.
>
>>
>> But while we're at it, I suspect we should actually pass a constant zero as
>> the "hi" value.  The kernel discards it, but it's not immediately obvious,
>> and specifying a zero here seems like good conservative programming.  In
>> addition, if at some point we want to use the preadv2 syscall ABI, I can
>> pretty much guarantee we will have to re-discover all this again because
>> someone will just write "LO_HI_LONG(val), flags", and be baffled as to why
>> the flags word is ignored on some platforms.  (And honestly, something
>> called LO_HI_LONG *not* producing a lo and a hi value is inherently odd.)
>>
>> So I think this makes the most sense:
>>
>> #ifdef _LP64
>> # define LO_HI_LONG(val) (val), 0
>> #else
>> # define LO_HI_LONG(val) \
>>  (long) (val), \
>>  (long) (((uint64_t) (val)) >> 32)
>> #endif
>
> Yes, this is what I am thinking to do to fix the Linux implementation.

I think it is a mistake to put LO_HI_LONG in sysdep.h.  It is
specific to preadv/pwritev and it should be defined in a common
file for preadv/pwritev.

>> And then x32-like platforms that want to pass a single 64-bit loff_t
>> register value can override, as HJ has already done for x86_64.  It's
>> reasonable to consider this as the exceptional case since the kernel has
>> the standard version of the syscall as taking two arguments, and x32
>> is the only kernel architecture using the compat_sys_preadv64
>> implementation that only takes one loff_t argument.
>>
>> This does point out that the __ASSUME_WORDSIZE64_ILP32 symbol really
>> isn't as general as one might like, since it really depends on which
>> 32-bit ABI compat functions the kernel is using for each syscall.
>>
>
> At the time I suggested the __ASSUME_WORDSIZE64_ILP32 I was not area of
> all variant usages for supported ports.  I think your LP64 suggestion
> does make sense and although my initial approach would to avoid arch
> specific overrides, x32 does make sense in this case.  I think that
> if possible futures ILP32 architecture also decide to follow x32
> argument passing for 64-bit loff_t, we can make it generic by a
> arch specific define.

It is similar to llseek vs lseek.  ILP32 architecture can choose llseek over
lseek.  X32 always passes 64-bit offset in a 640-bit register.I
  

Patch

diff --git a/sysdeps/unix/sysv/linux/preadv.c b/sysdeps/unix/sysv/linux/preadv.c
index 107cb81..3db6912 100644
--- a/sysdeps/unix/sysv/linux/preadv.c
+++ b/sysdeps/unix/sysv/linux/preadv.c
@@ -29,7 +29,8 @@ 
 ssize_t
 preadv (int fd, const struct iovec *vector, int count, off_t offset)
 {
-  return SYSCALL_CANCEL (preadv, fd, vector, count, LO_HI_LONG (offset));
+  return SYSCALL_CANCEL (preadv, fd, vector, count,
+			  __ALIGNMENT_ARG SYSCALL_LL (offset));
 }
 # else
 static ssize_t __atomic_preadv_replacement (int, const struct iovec *,
@@ -39,7 +40,7 @@  preadv (int fd, const struct iovec *vector, int count, off_t offset)
 {
 #  ifdef __NR_preadv
   ssize_t result = SYSCALL_CANCEL (preadv, fd, vector, count,
-				   LO_HI_LONG (offset));
+				   __ALIGNMENT_ARG SYSCALL_LL (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #  endif
diff --git a/sysdeps/unix/sysv/linux/preadv64.c b/sysdeps/unix/sysv/linux/preadv64.c
index 66afd4c..27cd04f 100644
--- a/sysdeps/unix/sysv/linux/preadv64.c
+++ b/sysdeps/unix/sysv/linux/preadv64.c
@@ -27,7 +27,8 @@ 
 ssize_t
 preadv64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
-  return SYSCALL_CANCEL (preadv64, fd, vector, count, LO_HI_LONG (offset));
+  return SYSCALL_CANCEL (preadv64, fd, vector, count,
+			 __ALIGNMENT_ARG SYSCALL_LL64 (offset));
 }
 #else
 static ssize_t __atomic_preadv64_replacement (int, const struct iovec *,
@@ -37,7 +38,7 @@  preadv64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
 #ifdef __NR_preadv64
   ssize_t result = SYSCALL_CANCEL (preadv64, fd, vector, count,
-				   LO_HI_LONG (offset));
+				   __ALIGNMENT_ARG SYSCALL_LL64 (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #endif
diff --git a/sysdeps/unix/sysv/linux/pwritev.c b/sysdeps/unix/sysv/linux/pwritev.c
index 6747f42..7c98bfd 100644
--- a/sysdeps/unix/sysv/linux/pwritev.c
+++ b/sysdeps/unix/sysv/linux/pwritev.c
@@ -29,7 +29,8 @@ 
 ssize_t
 pwritev (int fd, const struct iovec *vector, int count, off_t offset)
 {
-  return SYSCALL_CANCEL (pwritev, fd, vector, count, LO_HI_LONG (offset));
+  return SYSCALL_CANCEL (pwritev, fd, vector, count,
+			 __ALIGNMENT_ARG SYSCALL_LL (offset));
 }
 # else
 static ssize_t __atomic_pwritev_replacement (int, const struct iovec *,
@@ -39,7 +40,7 @@  pwritev (int fd, const struct iovec *vector, int count, off_t offset)
 {
 #  ifdef __NR_pwritev
   ssize_t result = SYSCALL_CANCEL (pwritev, fd, vector, count,
-				   LO_HI_LONG (offset));
+				   __ALIGNMENT_ARG SYSCALL_LL (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #  endif
diff --git a/sysdeps/unix/sysv/linux/pwritev64.c b/sysdeps/unix/sysv/linux/pwritev64.c
index e162948..5d5d43f 100644
--- a/sysdeps/unix/sysv/linux/pwritev64.c
+++ b/sysdeps/unix/sysv/linux/pwritev64.c
@@ -27,7 +27,8 @@ 
 ssize_t
 pwritev64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
-  return SYSCALL_CANCEL (pwritev64, fd, vector, count, LO_HI_LONG (offset));
+  return SYSCALL_CANCEL (pwritev64, fd, vector, count,
+			 __ALIGNMENT_ARG SYSCALL_LL64 (offset));
 }
 #else
 static ssize_t __atomic_pwritev64_replacement (int, const struct iovec *,
@@ -37,7 +38,7 @@  pwritev64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
 #ifdef __NR_pwritev64
   ssize_t result = SYSCALL_CANCEL (pwritev64, fd, vector, count,
-				   LO_HI_LONG (offset));
+				   __ALIGNMENT_ARG SYSCALL_LL64 (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #endif
diff --git a/sysdeps/unix/sysv/linux/sysdep.h b/sysdeps/unix/sysv/linux/sysdep.h
index a469f57..f2d7e05 100644
--- a/sysdeps/unix/sysv/linux/sysdep.h
+++ b/sysdeps/unix/sysv/linux/sysdep.h
@@ -47,8 +47,3 @@ 
 #define SYSCALL_LL64(val) \
   __LONG_LONG_PAIR ((long) ((val) >> 32), (long) ((val) & 0xffffffff))
 #endif
-
-/* Provide a macro to pass the off{64}_t argument on p{readv,writev}{64}.  */
-#define LO_HI_LONG(val) \
- (long) (val), \
- (long) (((uint64_t) (val)) >> 32)