[RFC,1/1] Optimizing the rand function on arm64 platforms

Message ID 20230724023241.6182-1-tiantao6@hisilicon.com
State New
Headers
Series [RFC,1/1] Optimizing the rand function on arm64 platforms |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 fail Testing failed

Commit Message

tiantao \(H\) July 24, 2023, 2:32 a.m. UTC
  Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
obtained from the cpu. This optimization is therefore proposed for the
arm64 platform.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
---
 stdlib/rand.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
  

Comments

Xi Ruoyao July 24, 2023, 3:49 a.m. UTC | #1
On Mon, 2023-07-24 at 10:32 +0800, Tian Tao via Libc-alpha wrote:
> Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
> obtained from the cpu. This optimization is therefore proposed for the
> arm64 platform.

NAK.

Glibc documentation explicitly says:

   You can obtain repeatable sequences of numbers on a particular
   machine type by specifying the same initial seed value for the random
   number generator.
   
This rand() implementation obviously does not satisfy this requirement.

> 
> Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
> ---
>  stdlib/rand.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/stdlib/rand.c b/stdlib/rand.c
> index c250c065e4..f199dbd65f 100644
> --- a/stdlib/rand.c
> +++ b/stdlib/rand.c
> @@ -19,10 +19,42 @@
>  
>  #undef rand
>  
> +#ifdef __aarch64__
> +// A global variable to indicate whether rndr instruction is
> supported
> +static int rndr_supported = 0;
> +// A function prototype to check rndr support
> +static void check_rndr_support (void) __attribute__ ((constructor));
> +#endif
>  
>  /* Return a random integer between 0 and RAND_MAX.  */
>  int
>  rand (void)
>  {
> +  // Use armv8.5 rndr instruction if supported
> +#ifdef __aarch64__
> +  if (rndr_supported) {
> +    unsigned int r;
> +    asm volatile("mrs %0, s3_3_c2_c4_0" : "=r" (r));
> +    // Discard the least random bit
> +    r >>=1;
> +    return (int) r;
> +  }
> +#endif
>    return (int) __random ();
>  }
> +
> +#ifdef __aarch64__
> +// A function to check rndr support and set the global variable
> accordingly
> +static void
> +check_rndr_support (void)
> +{
> +  unsigned long isar0;
> +  // Read the ID_AA64ISAR0_EL1 register to check the rndr feature
> +  asm volatile("mrs %0, id_aa64isar0_el1" : "=r" (isar0));
> +  // Check the bits [63:60] for the rndr feature
> +  if ((isar0 >> 60) & 0xf) {
> +    // Set the global variable to indicate rndr support
> +    rndr_supported = 1;
> +  }
> +}
> +#endif
  
tiantao \(H\) July 24, 2023, 4:01 a.m. UTC | #2
Thanks for the reply.

Using FEAT_RNG can be a very big performance boost, any suggestions for applying the FEAT_RNG feature to glibc?

-----邮件原件-----
发件人: Xi Ruoyao <xry111@xry111.site> 
发送时间: 2023年7月24日 11:50
收件人: tiantao (H) <tiantao6@hisilicon.com>; libc-alpha@sourceware.org
抄送: kernel@openeuler.org
主题: Re: [RFC 1/1] Optimizing the rand function on arm64 platforms

On Mon, 2023-07-24 at 10:32 +0800, Tian Tao via Libc-alpha wrote:
> Armv8.5 adds the FEAT_RNG feature so that true random numbers can be 
> obtained from the cpu. This optimization is therefore proposed for the
> arm64 platform.

NAK.

Glibc documentation explicitly says:

   You can obtain repeatable sequences of numbers on a particular
   machine type by specifying the same initial seed value for the random
   number generator.
   
This rand() implementation obviously does not satisfy this requirement.

> 
> Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
> ---
>  stdlib/rand.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/stdlib/rand.c b/stdlib/rand.c index 
> c250c065e4..f199dbd65f 100644
> --- a/stdlib/rand.c
> +++ b/stdlib/rand.c
> @@ -19,10 +19,42 @@
>  
>  #undef rand
>  
> +#ifdef __aarch64__
> +// A global variable to indicate whether rndr instruction is
> supported
> +static int rndr_supported = 0;
> +// A function prototype to check rndr support static void 
> +check_rndr_support (void) __attribute__ ((constructor)); #endif
>  
>  /* Return a random integer between 0 and RAND_MAX.  */
>  int
>  rand (void)
>  {
> +  // Use armv8.5 rndr instruction if supported #ifdef __aarch64__
> +  if (rndr_supported) {
> +    unsigned int r;
> +    asm volatile("mrs %0, s3_3_c2_c4_0" : "=r" (r));
> +    // Discard the least random bit
> +    r >>=1;
> +    return (int) r;
> +  }
> +#endif
>    return (int) __random ();
>  }
> +
> +#ifdef __aarch64__
> +// A function to check rndr support and set the global variable
> accordingly
> +static void
> +check_rndr_support (void)
> +{
> +  unsigned long isar0;
> +  // Read the ID_AA64ISAR0_EL1 register to check the rndr feature
> +  asm volatile("mrs %0, id_aa64isar0_el1" : "=r" (isar0));
> +  // Check the bits [63:60] for the rndr feature
> +  if ((isar0 >> 60) & 0xf) {
> +    // Set the global variable to indicate rndr support
> +    rndr_supported = 1;
> +  }
> +}
> +#endif

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
  
Xi Ruoyao July 24, 2023, 4:01 a.m. UTC | #3
On Mon, 2023-07-24 at 11:49 +0800, Xi Ruoyao via Libc-alpha wrote:
> On Mon, 2023-07-24 at 10:32 +0800, Tian Tao via Libc-alpha wrote:
> > Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
> > obtained from the cpu. This optimization is therefore proposed for the
> > arm64 platform.
> 
> NAK.
> 
> Glibc documentation explicitly says:
> 
>    You can obtain repeatable sequences of numbers on a particular
>    machine type by specifying the same initial seed value for the random
>    number generator.

And this is actually a standard requirement, so it's still NAK even if
we allow to change the Glibc documentation and make a new symver for
rand().

Quote from C23 7.24.2.2p2:

   The srand function uses the argument as a seed for a new sequence of
   pseudo-random numbers to be returned by subsequent calls to rand. If
   srand is then called with the same seed value, the sequence of pseudo-
   random numbers shall be repeated.
   
> This rand() implementation obviously does not satisfy this requirement.
> 
> > 
> > Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
> > ---
> >  stdlib/rand.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/stdlib/rand.c b/stdlib/rand.c
> > index c250c065e4..f199dbd65f 100644
> > --- a/stdlib/rand.c
> > +++ b/stdlib/rand.c
> > @@ -19,10 +19,42 @@
> >  
> >  #undef rand
> >  
> > +#ifdef __aarch64__
> > +// A global variable to indicate whether rndr instruction is
> > supported
> > +static int rndr_supported = 0;
> > +// A function prototype to check rndr support
> > +static void check_rndr_support (void) __attribute__ ((constructor));
> > +#endif
> >  
> >  /* Return a random integer between 0 and RAND_MAX.  */
> >  int
> >  rand (void)
> >  {
> > +  // Use armv8.5 rndr instruction if supported
> > +#ifdef __aarch64__
> > +  if (rndr_supported) {
> > +    unsigned int r;
> > +    asm volatile("mrs %0, s3_3_c2_c4_0" : "=r" (r));
> > +    // Discard the least random bit
> > +    r >>=1;
> > +    return (int) r;
> > +  }
> > +#endif
> >    return (int) __random ();
> >  }
> > +
> > +#ifdef __aarch64__
> > +// A function to check rndr support and set the global variable
> > accordingly
> > +static void
> > +check_rndr_support (void)
> > +{
> > +  unsigned long isar0;
> > +  // Read the ID_AA64ISAR0_EL1 register to check the rndr feature
> > +  asm volatile("mrs %0, id_aa64isar0_el1" : "=r" (isar0));
> > +  // Check the bits [63:60] for the rndr feature
> > +  if ((isar0 >> 60) & 0xf) {
> > +    // Set the global variable to indicate rndr support
> > +    rndr_supported = 1;
> > +  }
> > +}
> > +#endif
>
  
Xi Ruoyao July 24, 2023, 4:49 a.m. UTC | #4
On Mon, 2023-07-24 at 04:01 +0000, tiantao (H) wrote:
> Thanks for the reply.
> 
> Using FEAT_RNG can be a very big performance boost

Hmm, I think this may be untrue.  I've attempted using inline asm to
invoke x86 rdrand instead of rand() for some competitive programming
code, but the result is *slower*.

The spec of this instruction [1] even says:

   If the instruction cannot return a genuine random number in a reasonable
   period of time, PSTATE.NZCV is set to 0b0100 and the data value returned
   is 0.
   
So at least for some implementations the "reasonable period of time" may
be long.  And we need to check PSTATE.NZCV and loop (or fallback to a
software-based PRNG), then it will be even slower...

[1]:https://developer.arm.com/documentation/ddi0601/2023-06/AArch64-Registers/RNDR--Random-Number

Generally a hardware RNG is for cryptography use, so the design of HWRNG
is often focused on security, instead of speed.  Maybe it is fast on
*your* implementation, and it may never fail on *your* implementation,
but other implementations may do things differently.
   
> any suggestions for applying the FEAT_RNG feature to glibc?

The suggestion is "don't do that".

A feature is not necessarily useful for Glibc only because it exists.
  
tiantao \(H\) July 24, 2023, 9:13 a.m. UTC | #5
在 2023/7/24 12:49, Xi Ruoyao 写道:
> On Mon, 2023-07-24 at 04:01 +0000, tiantao (H) wrote:
>> Thanks for the reply.
>>
>> Using FEAT_RNG can be a very big performance boost
> Hmm, I think this may be untrue.  I've attempted using inline asm to
> invoke x86 rdrand instead of rand() for some competitive programming
> code, but the result is *slower*.

Thanks for the reply, I tested the performance on an ARM64 server very 
well, especially in multithreaded scenarios


> The spec of this instruction [1] even says:
>
>     If the instruction cannot return a genuine random number in a reasonable
>     period of time, PSTATE.NZCV is set to 0b0100 and the data value returned
>     is 0.
>     
> So at least for some implementations the "reasonable period of time" may
> be long.  And we need to check PSTATE.NZCV and loop (or fallback to a
> software-based PRNG), then it will be even slower...
>
> [1]:https://developer.arm.com/documentation/ddi0601/2023-06/AArch64-Registers/RNDR--Random-Number
>
> Generally a hardware RNG is for cryptography use, so the design of HWRNG
> is often focused on security, instead of speed.  Maybe it is fast on
> *your* implementation, and it may never fail on *your* implementation,
> but other implementations may do things differently.
>     
>> any suggestions for applying the FEAT_RNG feature to glibc?
> The suggestion is "don't do that".
>
> A feature is not necessarily useful for Glibc only because it exists.
>
  
Xi Ruoyao July 24, 2023, 9:48 a.m. UTC | #6
On Mon, 2023-07-24 at 17:13 +0800, tiantao (H) wrote:
> 
> 在 2023/7/24 12:49, Xi Ruoyao 写道:
> > On Mon, 2023-07-24 at 04:01 +0000, tiantao (H) wrote:
> > > Thanks for the reply.
> > > 
> > > Using FEAT_RNG can be a very big performance boost
> > Hmm, I think this may be untrue.  I've attempted using inline asm to
> > invoke x86 rdrand instead of rand() for some competitive programming
> > code, but the result is *slower*.
> 
> Thanks for the reply, I tested the performance on an ARM64 server very
> well, especially in multithreaded scenarios

It's wrong to intensively call rand() in multiple threads.  C23 7.1.4p5
mandates to guard the global random state with locking.  So by intensively
calling rand() in multiple threads you are just creating a parallelism
bottleneck.

Please stop trying to misoptimize rand(), fix the code abusing it
instead.  For example, by using lrand48_r with each thread maintaining
its own drand48_data struct.
  
Adhemerval Zanella Netto July 24, 2023, 12:59 p.m. UTC | #7
On 24/07/23 01:01, Xi Ruoyao via Libc-alpha wrote:
> On Mon, 2023-07-24 at 11:49 +0800, Xi Ruoyao via Libc-alpha wrote:
>> On Mon, 2023-07-24 at 10:32 +0800, Tian Tao via Libc-alpha wrote:
>>> Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
>>> obtained from the cpu. This optimization is therefore proposed for the
>>> arm64 platform.
>>
>> NAK.
>>
>> Glibc documentation explicitly says:
>>
>>    You can obtain repeatable sequences of numbers on a particular
>>    machine type by specifying the same initial seed value for the random
>>    number generator.
> 
> And this is actually a standard requirement, so it's still NAK even if
> we allow to change the Glibc documentation and make a new symver for
> rand().
> 
> Quote from C23 7.24.2.2p2:
> 
>    The srand function uses the argument as a seed for a new sequence of
>    pseudo-random numbers to be returned by subsequent calls to rand. If
>    srand is then called with the same seed value, the sequence of pseudo-
>    random numbers shall be repeated.
>    
>> This rand() implementation obviously does not satisfy this requirement.

The rand family functions (including the rand48 ones) is *really* a bad
interface and it seems that future POSIX realized it and plan to obsolete 
it in next revision [1].

And I don't think it would be worth to improve this interface, as OpenBSD
did by making them them a arc4random wrapper instead (and adding additional
interfaces to return deterministic values as the standard defines).

And I am not sure if FEAT_RNG or RDRAND would be useful for glibc, on
arc4random addition we have a very long discussion where in the end
we replaced the userland chacha20 implementation with a wrapper for the
getrandom syscall because of the reseed issues in userland (it seems
that there is no safe way to know where the process needs to reseed
without a new kernel interface).

It might still be useful for internal hardening (for like stack cookies,
malloc internal data structures, internal randomness); but even for that
I think using a platform neutral like either arc4random or even a 
PRNG would be more useful (although we might do use hardware instruction
it it were the case).

[1] https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00104.html
  
Cristian Rodríguez July 24, 2023, 2:46 p.m. UTC | #8
On Sun, Jul 23, 2023 at 10:35 PM Tian Tao via Libc-alpha <
libc-alpha@sourceware.org> wrote:

> Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
> obtained from the cpu. This optimization is therefore proposed for the
> arm64 platform.
>


Unfortunately standards preclude you to fix this terrible interfaces, it is
highly suggested you move your efforts into adding feat_rng support to the
linux kernel so it uses as a source of entropy and to
the efforts of providing a really fast user space interface to reduce the
syscall overhead.. the VDSO approach failed to gain approval of kernel
developers so maybe  a "rng_state" accessible via rseq can work..

These interfaces are broken beyond repair and you should look elsewhere for
random numbers.
  
Adhemerval Zanella Netto July 24, 2023, 5:44 p.m. UTC | #9
On 24/07/23 11:46, Cristian Rodríguez via Libc-alpha wrote:
> On Sun, Jul 23, 2023 at 10:35 PM Tian Tao via Libc-alpha <
> libc-alpha@sourceware.org> wrote:
> 
>> Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
>> obtained from the cpu. This optimization is therefore proposed for the
>> arm64 platform.
>>
> 
> 
> Unfortunately standards preclude you to fix this terrible interfaces, it is
> highly suggested you move your efforts into adding feat_rng support to the
> linux kernel so it uses as a source of entropy and to
> the efforts of providing a really fast user space interface to reduce the
> syscall overhead.. the VDSO approach failed to gain approval of kernel
> developers so maybe  a "rng_state" accessible via rseq can work..

On kernel side, it already has proper support for kASLR and CRNG interfaces
for both getrandom and /dev/random (check arch/arm64/include/asm/archrandom.h
history).

If we even add back the PRNG with userland state, I would expect that entropy
would continue to be obtained from kernel for the state init/reset.  So even
in this scenario I don't see why use FEAT_RNG directly with kernel proving
a better entropy source.

> 
> These interfaces are broken beyond repair and you should look elsewhere for
> random numbers.
  
Xi Ruoyao July 24, 2023, 6 p.m. UTC | #10
On Mon, 2023-07-24 at 10:46 -0400, Cristian Rodríguez via Libc-alpha
wrote:
> On Sun, Jul 23, 2023 at 10:35 PM Tian Tao via Libc-alpha <
> libc-alpha@sourceware.org> wrote:
> 
> > Armv8.5 adds the FEAT_RNG feature so that true random numbers can be
> > obtained from the cpu. This optimization is therefore proposed for
> > the
> > arm64 platform.
> > 
> 
> 
> Unfortunately standards preclude you to fix this terrible interfaces, it is
> highly suggested you move your efforts into adding feat_rng support to the
> linux kernel so it uses as a source of entropy

It already does:

https://git.kernel.org/linus/1a50ec0b3b2e9a83f1b1245ea37a853aac2f741c

> and to the efforts of providing a really fast user space interface to reduce the
> syscall overhead.. the VDSO approach failed to gain approval of kernel
> developers so maybe  a "rng_state" accessible via rseq can work..

I think it's a different topic.  The vDSO approach is for CSPRNG, but we
don't actually need all PRNGs to be cryptography secure. 

> These interfaces are broken beyond repair and you should look
> elsewhere for random numbers.
  
Xi Ruoyao July 24, 2023, 6:03 p.m. UTC | #11
On Mon, 2023-07-24 at 09:59 -0300, Adhemerval Zanella Netto wrote:
> 
> The rand family functions (including the rand48 ones) is *really* a bad
> interface and it seems that future POSIX realized it and plan to obsolete 
> it in next revision [1].
> 
> [1] https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00104.html

Hmm, what is the problem with the rand48 ones?  This link does not
mention the rand48 functions anyway.

AFAIK rand_r is broken because the state space is too small and rand48
functions do not share the issue.
  
Adhemerval Zanella Netto July 24, 2023, 6:14 p.m. UTC | #12
On 24/07/23 15:03, Xi Ruoyao wrote:
> On Mon, 2023-07-24 at 09:59 -0300, Adhemerval Zanella Netto wrote:
>>
>> The rand family functions (including the rand48 ones) is *really* a bad
>> interface and it seems that future POSIX realized it and plan to obsolete 
>> it in next revision [1].
>>
>> [1] https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00104.html
> 
> Hmm, what is the problem with the rand48 ones?  This link does not
> mention the rand48 functions anyway.
> 
> AFAIK rand_r is broken because the state space is too small and rand48
> functions do not share the issue.

lrand48 and nrand48 only returns over an interval of [0, 2^31), which is
tricky to work with unsigned results, sizes larger than long (either
int128 or int64 for 32 bits), or buffers.   And it is still a linear 
congruential formula, so might be not suitable for all scenarios.
  

Patch

diff --git a/stdlib/rand.c b/stdlib/rand.c
index c250c065e4..f199dbd65f 100644
--- a/stdlib/rand.c
+++ b/stdlib/rand.c
@@ -19,10 +19,42 @@ 
 
 #undef	rand
 
+#ifdef __aarch64__
+// A global variable to indicate whether rndr instruction is supported
+static int rndr_supported = 0;
+// A function prototype to check rndr support
+static void check_rndr_support (void) __attribute__ ((constructor));
+#endif
 
 /* Return a random integer between 0 and RAND_MAX.  */
 int
 rand (void)
 {
+  // Use armv8.5 rndr instruction if supported
+#ifdef __aarch64__
+  if (rndr_supported) {
+    unsigned int r;
+    asm volatile("mrs %0, s3_3_c2_c4_0" : "=r" (r));
+    // Discard the least random bit
+    r >>=1;
+    return (int) r;
+  }
+#endif
   return (int) __random ();
 }
+
+#ifdef __aarch64__
+// A function to check rndr support and set the global variable accordingly
+static void
+check_rndr_support (void)
+{
+  unsigned long isar0;
+  // Read the ID_AA64ISAR0_EL1 register to check the rndr feature
+  asm volatile("mrs %0, id_aa64isar0_el1" : "=r" (isar0));
+  // Check the bits [63:60] for the rndr feature
+  if ((isar0 >> 60) & 0xf) {
+    // Set the global variable to indicate rndr support
+    rndr_supported = 1;
+  }
+}
+#endif