[v1,1/2] random-bits: Factor out entropy generating function

Message ID 20220328220936.2724834-1-goldstein.w.n@gmail.com
State Dropped
Headers
Series [v1,1/2] random-bits: Factor out entropy generating function |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Noah Goldstein March 28, 2022, 10:09 p.m. UTC
  On some architectures `clock_gettime` is undesirable as
it may use a syscall or there may be a faster alternative.
Future architecture specific functions can be added in
sysdeps/<arch>/random-bits-entropy.h to provide a version of
'random_bits_entropy' that doesn't use 'clock_gettime'.
---
 include/random-bits.h                 | 16 ++++++--------
 sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 10 deletions(-)
 create mode 100644 sysdeps/generic/random-bits-entropy.h
  

Comments

Adhemerval Zanella Netto March 29, 2022, 7:51 p.m. UTC | #1
On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> On some architectures `clock_gettime` is undesirable as
> it may use a syscall or there may be a faster alternative.
> Future architecture specific functions can be added in
> sysdeps/<arch>/random-bits-entropy.h to provide a version of
> 'random_bits_entropy' that doesn't use 'clock_gettime'.
> ---
>  include/random-bits.h                 | 16 ++++++--------
>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+), 10 deletions(-)
>  create mode 100644 sysdeps/generic/random-bits-entropy.h
> 
> diff --git a/include/random-bits.h b/include/random-bits.h
> index 17665b479a..016b87576c 100644
> --- a/include/random-bits.h
> +++ b/include/random-bits.h
> @@ -19,21 +19,17 @@
>  #ifndef _RANDOM_BITS_H
>  # define _RANDOM_BITS_H
>  
> -#include <time.h>
> -#include <stdint.h>
> +# include <random-bits-entropy.h>
> +# include <stdint.h>
>  
> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> -   starting time, nano-second accuracy, its randomness is significantly better
> -   than gettimeofday, and for mostly architectures it is implemented through
> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> -   bits will have less entropy. */
> +/* Provides fast pseudo-random bits through architecture specific
> +   random_bits_entropy.  Expectation is source is some timing function so
> +   the upper bits have less entropy.  */
>  static inline uint32_t
>  random_bits (void)
>  {
> -  struct __timespec64 tv;
> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> +  uint32_t ret = random_bits_entropy ();
>    /* Shuffle the lower bits to minimize the clock bias.  */
> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
>    ret ^= (ret << 24) | (ret >> 8);
>    return ret;
>  }

We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
generic interface (and other high precision timing on other architectures).
So I think a better way would be to:

  static inline uint32_t
  random_bits (void)
  { 
    hp_timing_t hp; 
    HP_TIMING_NOW (hp);
    /* Shuffle the lower bits to minimize the clock bias.  */
    uint32_t ret = hp >> 32 ^ (uint32_t) hp;
    ret ^= (ret << 24) | (ret >> 8);
    return ret;
  }

And keep the XOR on with higher bits to keep the clock bias.

> diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
> new file mode 100644
> index 0000000000..53290c7f7a
> --- /dev/null
> +++ b/sysdeps/generic/random-bits-entropy.h
> @@ -0,0 +1,31 @@
> +/* Fast function for generating entropy of random_bits.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <stdint.h>
> +#include <time.h>
> +
> +/* Generically use clock_gettime. It has unspecified starting time, nano-second
> +   accuracy, its randomness is significantly better than gettimeofday, and for
> +   mostly architectures it is implemented through vDSO instead of a syscall.  */
> +static inline uint32_t
> +random_bits_entropy (void)
> +{
> +  struct __timespec64 tv;
> +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> +  return tv.tv_nsec ^ tv.tv_sec;
> +}
  
Noah Goldstein March 29, 2022, 7:56 p.m. UTC | #2
On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> > On some architectures `clock_gettime` is undesirable as
> > it may use a syscall or there may be a faster alternative.
> > Future architecture specific functions can be added in
> > sysdeps/<arch>/random-bits-entropy.h to provide a version of
> > 'random_bits_entropy' that doesn't use 'clock_gettime'.
> > ---
> >  include/random-bits.h                 | 16 ++++++--------
> >  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> >  2 files changed, 37 insertions(+), 10 deletions(-)
> >  create mode 100644 sysdeps/generic/random-bits-entropy.h
> >
> > diff --git a/include/random-bits.h b/include/random-bits.h
> > index 17665b479a..016b87576c 100644
> > --- a/include/random-bits.h
> > +++ b/include/random-bits.h
> > @@ -19,21 +19,17 @@
> >  #ifndef _RANDOM_BITS_H
> >  # define _RANDOM_BITS_H
> >
> > -#include <time.h>
> > -#include <stdint.h>
> > +# include <random-bits-entropy.h>
> > +# include <stdint.h>
> >
> > -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> > -   starting time, nano-second accuracy, its randomness is significantly better
> > -   than gettimeofday, and for mostly architectures it is implemented through
> > -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> > -   bits will have less entropy. */
> > +/* Provides fast pseudo-random bits through architecture specific
> > +   random_bits_entropy.  Expectation is source is some timing function so
> > +   the upper bits have less entropy.  */
> >  static inline uint32_t
> >  random_bits (void)
> >  {
> > -  struct __timespec64 tv;
> > -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > +  uint32_t ret = random_bits_entropy ();
> >    /* Shuffle the lower bits to minimize the clock bias.  */
> > -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> >    ret ^= (ret << 24) | (ret >> 8);
> >    return ret;
> >  }
>
> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> generic interface (and other high precision timing on other architectures).
> So I think a better way would be to:

For x86/generic that works but other architectures also have hp-timing
implementations that might not be suitable for this (i.e there might be
an entropy regression).

>
>   static inline uint32_t
>   random_bits (void)
>   {
>     hp_timing_t hp;
>     HP_TIMING_NOW (hp);
>     /* Shuffle the lower bits to minimize the clock bias.  */
>     uint32_t ret = hp >> 32 ^ (uint32_t) hp;
>     ret ^= (ret << 24) | (ret >> 8);
>     return ret;
>   }
>
> And keep the XOR on with higher bits to keep the clock bias.
>
> > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
> > new file mode 100644
> > index 0000000000..53290c7f7a
> > --- /dev/null
> > +++ b/sysdeps/generic/random-bits-entropy.h
> > @@ -0,0 +1,31 @@
> > +/* Fast function for generating entropy of random_bits.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <stdint.h>
> > +#include <time.h>
> > +
> > +/* Generically use clock_gettime. It has unspecified starting time, nano-second
> > +   accuracy, its randomness is significantly better than gettimeofday, and for
> > +   mostly architectures it is implemented through vDSO instead of a syscall.  */
> > +static inline uint32_t
> > +random_bits_entropy (void)
> > +{
> > +  struct __timespec64 tv;
> > +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > +  return tv.tv_nsec ^ tv.tv_sec;
> > +}
  
Noah Goldstein March 29, 2022, 8:04 p.m. UTC | #3
On Tue, Mar 29, 2022 at 2:56 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> >
> >
> >
> > On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> > > On some architectures `clock_gettime` is undesirable as
> > > it may use a syscall or there may be a faster alternative.
> > > Future architecture specific functions can be added in
> > > sysdeps/<arch>/random-bits-entropy.h to provide a version of
> > > 'random_bits_entropy' that doesn't use 'clock_gettime'.
> > > ---
> > >  include/random-bits.h                 | 16 ++++++--------
> > >  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+), 10 deletions(-)
> > >  create mode 100644 sysdeps/generic/random-bits-entropy.h
> > >
> > > diff --git a/include/random-bits.h b/include/random-bits.h
> > > index 17665b479a..016b87576c 100644
> > > --- a/include/random-bits.h
> > > +++ b/include/random-bits.h
> > > @@ -19,21 +19,17 @@
> > >  #ifndef _RANDOM_BITS_H
> > >  # define _RANDOM_BITS_H
> > >
> > > -#include <time.h>
> > > -#include <stdint.h>
> > > +# include <random-bits-entropy.h>
> > > +# include <stdint.h>
> > >
> > > -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> > > -   starting time, nano-second accuracy, its randomness is significantly better
> > > -   than gettimeofday, and for mostly architectures it is implemented through
> > > -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> > > -   bits will have less entropy. */
> > > +/* Provides fast pseudo-random bits through architecture specific
> > > +   random_bits_entropy.  Expectation is source is some timing function so
> > > +   the upper bits have less entropy.  */
> > >  static inline uint32_t
> > >  random_bits (void)
> > >  {
> > > -  struct __timespec64 tv;
> > > -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > > +  uint32_t ret = random_bits_entropy ();
> > >    /* Shuffle the lower bits to minimize the clock bias.  */
> > > -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> > >    ret ^= (ret << 24) | (ret >> 8);
> > >    return ret;
> > >  }
> >
> > We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> > generic interface (and other high precision timing on other architectures).
> > So I think a better way would be to:
>
> For x86/generic that works but other architectures also have hp-timing
> implementations that might not be suitable for this (i.e there might be
> an entropy regression).
>
> >
> >   static inline uint32_t
> >   random_bits (void)
> >   {
> >     hp_timing_t hp;
> >     HP_TIMING_NOW (hp);

Also not HP_TIMING_NOW will be slightly slower (without reason)
as instead of xoring ns and seconds it does ns + (second * 10^9).

Seems to generate the same amount of entropy so it's just an extra
multiply on the critical path.

> >     /* Shuffle the lower bits to minimize the clock bias.  */
> >     uint32_t ret = hp >> 32 ^ (uint32_t) hp;
> >     ret ^= (ret << 24) | (ret >> 8);
> >     return ret;
> >   }
> >
> > And keep the XOR on with higher bits to keep the clock bias.
> >
> > > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
> > > new file mode 100644
> > > index 0000000000..53290c7f7a
> > > --- /dev/null
> > > +++ b/sysdeps/generic/random-bits-entropy.h
> > > @@ -0,0 +1,31 @@
> > > +/* Fast function for generating entropy of random_bits.
> > > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <stdint.h>
> > > +#include <time.h>
> > > +
> > > +/* Generically use clock_gettime. It has unspecified starting time, nano-second
> > > +   accuracy, its randomness is significantly better than gettimeofday, and for
> > > +   mostly architectures it is implemented through vDSO instead of a syscall.  */
> > > +static inline uint32_t
> > > +random_bits_entropy (void)
> > > +{
> > > +  struct __timespec64 tv;
> > > +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > > +  return tv.tv_nsec ^ tv.tv_sec;
> > > +}
  
H.J. Lu March 29, 2022, 8:14 p.m. UTC | #4
On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> >
> >
> >
> > On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> > > On some architectures `clock_gettime` is undesirable as
> > > it may use a syscall or there may be a faster alternative.
> > > Future architecture specific functions can be added in
> > > sysdeps/<arch>/random-bits-entropy.h to provide a version of
> > > 'random_bits_entropy' that doesn't use 'clock_gettime'.
> > > ---
> > >  include/random-bits.h                 | 16 ++++++--------
> > >  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+), 10 deletions(-)
> > >  create mode 100644 sysdeps/generic/random-bits-entropy.h
> > >
> > > diff --git a/include/random-bits.h b/include/random-bits.h
> > > index 17665b479a..016b87576c 100644
> > > --- a/include/random-bits.h
> > > +++ b/include/random-bits.h
> > > @@ -19,21 +19,17 @@
> > >  #ifndef _RANDOM_BITS_H
> > >  # define _RANDOM_BITS_H
> > >
> > > -#include <time.h>
> > > -#include <stdint.h>
> > > +# include <random-bits-entropy.h>
> > > +# include <stdint.h>
> > >
> > > -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> > > -   starting time, nano-second accuracy, its randomness is significantly better
> > > -   than gettimeofday, and for mostly architectures it is implemented through
> > > -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> > > -   bits will have less entropy. */
> > > +/* Provides fast pseudo-random bits through architecture specific
> > > +   random_bits_entropy.  Expectation is source is some timing function so
> > > +   the upper bits have less entropy.  */
> > >  static inline uint32_t
> > >  random_bits (void)
> > >  {
> > > -  struct __timespec64 tv;
> > > -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > > +  uint32_t ret = random_bits_entropy ();
> > >    /* Shuffle the lower bits to minimize the clock bias.  */
> > > -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> > >    ret ^= (ret << 24) | (ret >> 8);
> > >    return ret;
> > >  }
> >
> > We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> > generic interface (and other high precision timing on other architectures).
> > So I think a better way would be to:
>
> For x86/generic that works but other architectures also have hp-timing
> implementations that might not be suitable for this (i.e there might be
> an entropy regression).

The default hp-timing.h has

# define HP_TIMING_NOW(var) \
({ \
  struct __timespec64 tv; \
  __clock_gettime64 (CLOCK_MONOTONIC, &tv); \
  (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \
})

It isn't the same as the current include/random-bits.h.

> >
> >   static inline uint32_t
> >   random_bits (void)
> >   {
> >     hp_timing_t hp;
> >     HP_TIMING_NOW (hp);
> >     /* Shuffle the lower bits to minimize the clock bias.  */
> >     uint32_t ret = hp >> 32 ^ (uint32_t) hp;
> >     ret ^= (ret << 24) | (ret >> 8);
> >     return ret;
> >   }
> >
> > And keep the XOR on with higher bits to keep the clock bias.
> >
> > > diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
> > > new file mode 100644
> > > index 0000000000..53290c7f7a
> > > --- /dev/null
> > > +++ b/sysdeps/generic/random-bits-entropy.h
> > > @@ -0,0 +1,31 @@
> > > +/* Fast function for generating entropy of random_bits.
> > > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <stdint.h>
> > > +#include <time.h>
> > > +
> > > +/* Generically use clock_gettime. It has unspecified starting time, nano-second
> > > +   accuracy, its randomness is significantly better than gettimeofday, and for
> > > +   mostly architectures it is implemented through vDSO instead of a syscall.  */
> > > +static inline uint32_t
> > > +random_bits_entropy (void)
> > > +{
> > > +  struct __timespec64 tv;
> > > +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> > > +  return tv.tv_nsec ^ tv.tv_sec;
> > > +}
  
Adhemerval Zanella Netto March 29, 2022, 8:37 p.m. UTC | #5
On 29/03/2022 16:56, Noah Goldstein wrote:
> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
>>> On some architectures `clock_gettime` is undesirable as
>>> it may use a syscall or there may be a faster alternative.
>>> Future architecture specific functions can be added in
>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
>>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
>>> ---
>>>  include/random-bits.h                 | 16 ++++++--------
>>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
>>>  2 files changed, 37 insertions(+), 10 deletions(-)
>>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
>>>
>>> diff --git a/include/random-bits.h b/include/random-bits.h
>>> index 17665b479a..016b87576c 100644
>>> --- a/include/random-bits.h
>>> +++ b/include/random-bits.h
>>> @@ -19,21 +19,17 @@
>>>  #ifndef _RANDOM_BITS_H
>>>  # define _RANDOM_BITS_H
>>>
>>> -#include <time.h>
>>> -#include <stdint.h>
>>> +# include <random-bits-entropy.h>
>>> +# include <stdint.h>
>>>
>>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
>>> -   starting time, nano-second accuracy, its randomness is significantly better
>>> -   than gettimeofday, and for mostly architectures it is implemented through
>>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
>>> -   bits will have less entropy. */
>>> +/* Provides fast pseudo-random bits through architecture specific
>>> +   random_bits_entropy.  Expectation is source is some timing function so
>>> +   the upper bits have less entropy.  */
>>>  static inline uint32_t
>>>  random_bits (void)
>>>  {
>>> -  struct __timespec64 tv;
>>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
>>> +  uint32_t ret = random_bits_entropy ();
>>>    /* Shuffle the lower bits to minimize the clock bias.  */
>>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
>>>    ret ^= (ret << 24) | (ret >> 8);
>>>    return ret;
>>>  }
>>
>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
>> generic interface (and other high precision timing on other architectures).
>> So I think a better way would be to:
> 
> For x86/generic that works but other architectures also have hp-timing
> implementations that might not be suitable for this (i.e there might be
> an entropy regression).

I would expect that the entropy of the hp-timing.h instruction would be similar
to the ones from system clock (which exception of legacy architecture like alpha),
but I haven't checked yet.

> 
>>
>>   static inline uint32_t
>>   random_bits (void)
>>   {
>>     hp_timing_t hp;
>>     HP_TIMING_NOW (hp);
>>     /* Shuffle the lower bits to minimize the clock bias.  */
>>     uint32_t ret = hp >> 32 ^ (uint32_t) hp;
>>     ret ^= (ret << 24) | (ret >> 8);
>>     return ret;
>>   }
>>
>> And keep the XOR on with higher bits to keep the clock bias.
>>
>>> diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
>>> new file mode 100644
>>> index 0000000000..53290c7f7a
>>> --- /dev/null
>>> +++ b/sysdeps/generic/random-bits-entropy.h
>>> @@ -0,0 +1,31 @@
>>> +/* Fast function for generating entropy of random_bits.
>>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>>> +   This file is part of the GNU C Library.
>>> +
>>> +   The GNU C Library is free software; you can redistribute it and/or
>>> +   modify it under the terms of the GNU Lesser General Public
>>> +   License as published by the Free Software Foundation; either
>>> +   version 2.1 of the License, or (at your option) any later version.
>>> +
>>> +   The GNU C Library is distributed in the hope that it will be useful,
>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> +   Lesser General Public License for more details.
>>> +
>>> +   You should have received a copy of the GNU Lesser General Public
>>> +   License along with the GNU C Library; if not, see
>>> +   <https://www.gnu.org/licenses/>.  */
>>> +
>>> +#include <stdint.h>
>>> +#include <time.h>
>>> +
>>> +/* Generically use clock_gettime. It has unspecified starting time, nano-second
>>> +   accuracy, its randomness is significantly better than gettimeofday, and for
>>> +   mostly architectures it is implemented through vDSO instead of a syscall.  */
>>> +static inline uint32_t
>>> +random_bits_entropy (void)
>>> +{
>>> +  struct __timespec64 tv;
>>> +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
>>> +  return tv.tv_nsec ^ tv.tv_sec;
>>> +}
  
Noah Goldstein March 29, 2022, 8:44 p.m. UTC | #6
On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 29/03/2022 16:56, Noah Goldstein wrote:
> > On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> >>> On some architectures `clock_gettime` is undesirable as
> >>> it may use a syscall or there may be a faster alternative.
> >>> Future architecture specific functions can be added in
> >>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
> >>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
> >>> ---
> >>>  include/random-bits.h                 | 16 ++++++--------
> >>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> >>>  2 files changed, 37 insertions(+), 10 deletions(-)
> >>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
> >>>
> >>> diff --git a/include/random-bits.h b/include/random-bits.h
> >>> index 17665b479a..016b87576c 100644
> >>> --- a/include/random-bits.h
> >>> +++ b/include/random-bits.h
> >>> @@ -19,21 +19,17 @@
> >>>  #ifndef _RANDOM_BITS_H
> >>>  # define _RANDOM_BITS_H
> >>>
> >>> -#include <time.h>
> >>> -#include <stdint.h>
> >>> +# include <random-bits-entropy.h>
> >>> +# include <stdint.h>
> >>>
> >>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> >>> -   starting time, nano-second accuracy, its randomness is significantly better
> >>> -   than gettimeofday, and for mostly architectures it is implemented through
> >>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> >>> -   bits will have less entropy. */
> >>> +/* Provides fast pseudo-random bits through architecture specific
> >>> +   random_bits_entropy.  Expectation is source is some timing function so
> >>> +   the upper bits have less entropy.  */
> >>>  static inline uint32_t
> >>>  random_bits (void)
> >>>  {
> >>> -  struct __timespec64 tv;
> >>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> >>> +  uint32_t ret = random_bits_entropy ();
> >>>    /* Shuffle the lower bits to minimize the clock bias.  */
> >>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> >>>    ret ^= (ret << 24) | (ret >> 8);
> >>>    return ret;
> >>>  }
> >>
> >> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> >> generic interface (and other high precision timing on other architectures).
> >> So I think a better way would be to:
> >
> > For x86/generic that works but other architectures also have hp-timing
> > implementations that might not be suitable for this (i.e there might be
> > an entropy regression).
>
> I would expect that the entropy of the hp-timing.h instruction would be similar
> to the ones from system clock (which exception of legacy architecture like alpha),
> but I haven't checked yet.

Would expect the same, but think it will probably take a test on a
per-arch basis.

Also there are optimizations we can make since we only need the lower
32-bits and
not a true timestamp.

I.e no multiply for generic. Also on x86 we can skip combining the
results of rdtsc.

>
> >
> >>
> >>   static inline uint32_t
> >>   random_bits (void)
> >>   {
> >>     hp_timing_t hp;
> >>     HP_TIMING_NOW (hp);
> >>     /* Shuffle the lower bits to minimize the clock bias.  */
> >>     uint32_t ret = hp >> 32 ^ (uint32_t) hp;
> >>     ret ^= (ret << 24) | (ret >> 8);
> >>     return ret;
> >>   }
> >>
> >> And keep the XOR on with higher bits to keep the clock bias.
> >>
> >>> diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
> >>> new file mode 100644
> >>> index 0000000000..53290c7f7a
> >>> --- /dev/null
> >>> +++ b/sysdeps/generic/random-bits-entropy.h
> >>> @@ -0,0 +1,31 @@
> >>> +/* Fast function for generating entropy of random_bits.
> >>> +   Copyright (C) 2022 Free Software Foundation, Inc.
> >>> +   This file is part of the GNU C Library.
> >>> +
> >>> +   The GNU C Library is free software; you can redistribute it and/or
> >>> +   modify it under the terms of the GNU Lesser General Public
> >>> +   License as published by the Free Software Foundation; either
> >>> +   version 2.1 of the License, or (at your option) any later version.
> >>> +
> >>> +   The GNU C Library is distributed in the hope that it will be useful,
> >>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >>> +   Lesser General Public License for more details.
> >>> +
> >>> +   You should have received a copy of the GNU Lesser General Public
> >>> +   License along with the GNU C Library; if not, see
> >>> +   <https://www.gnu.org/licenses/>.  */
> >>> +
> >>> +#include <stdint.h>
> >>> +#include <time.h>
> >>> +
> >>> +/* Generically use clock_gettime. It has unspecified starting time, nano-second
> >>> +   accuracy, its randomness is significantly better than gettimeofday, and for
> >>> +   mostly architectures it is implemented through vDSO instead of a syscall.  */
> >>> +static inline uint32_t
> >>> +random_bits_entropy (void)
> >>> +{
> >>> +  struct __timespec64 tv;
> >>> +  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> >>> +  return tv.tv_nsec ^ tv.tv_sec;
> >>> +}
  
Adhemerval Zanella Netto March 29, 2022, 8:44 p.m. UTC | #7
On 29/03/2022 17:14, H.J. Lu wrote:
> On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>>
>>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
>>>> On some architectures `clock_gettime` is undesirable as
>>>> it may use a syscall or there may be a faster alternative.
>>>> Future architecture specific functions can be added in
>>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
>>>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
>>>> ---
>>>>  include/random-bits.h                 | 16 ++++++--------
>>>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
>>>>  2 files changed, 37 insertions(+), 10 deletions(-)
>>>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
>>>>
>>>> diff --git a/include/random-bits.h b/include/random-bits.h
>>>> index 17665b479a..016b87576c 100644
>>>> --- a/include/random-bits.h
>>>> +++ b/include/random-bits.h
>>>> @@ -19,21 +19,17 @@
>>>>  #ifndef _RANDOM_BITS_H
>>>>  # define _RANDOM_BITS_H
>>>>
>>>> -#include <time.h>
>>>> -#include <stdint.h>
>>>> +# include <random-bits-entropy.h>
>>>> +# include <stdint.h>
>>>>
>>>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
>>>> -   starting time, nano-second accuracy, its randomness is significantly better
>>>> -   than gettimeofday, and for mostly architectures it is implemented through
>>>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
>>>> -   bits will have less entropy. */
>>>> +/* Provides fast pseudo-random bits through architecture specific
>>>> +   random_bits_entropy.  Expectation is source is some timing function so
>>>> +   the upper bits have less entropy.  */
>>>>  static inline uint32_t
>>>>  random_bits (void)
>>>>  {
>>>> -  struct __timespec64 tv;
>>>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
>>>> +  uint32_t ret = random_bits_entropy ();
>>>>    /* Shuffle the lower bits to minimize the clock bias.  */
>>>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
>>>>    ret ^= (ret << 24) | (ret >> 8);
>>>>    return ret;
>>>>  }
>>>
>>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
>>> generic interface (and other high precision timing on other architectures).
>>> So I think a better way would be to:
>>
>> For x86/generic that works but other architectures also have hp-timing
>> implementations that might not be suitable for this (i.e there might be
>> an entropy regression).
> 
> The default hp-timing.h has
> 
> # define HP_TIMING_NOW(var) \
> ({ \
>   struct __timespec64 tv; \
>   __clock_gettime64 (CLOCK_MONOTONIC, &tv); \
>   (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \
> })
> 
> It isn't the same as the current include/random-bits.h.

Maybe refactor hp-timing.h to add a routine to get the system clock without
any adjustments?  I don't have a strong preference here, but I take that
you are not aiming to use RDRAND or similar instruction, since you are
optimizing to latency.  So I see that using hp-timing.h seems the best
approach, since it should work similar on different architectures (and
each one might disable if the entropy is not large enough).
  
Noah Goldstein March 29, 2022, 8:52 p.m. UTC | #8
On Tue, Mar 29, 2022 at 3:44 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 29/03/2022 17:14, H.J. Lu wrote:
> > On Tue, Mar 29, 2022 at 12:56 PM Noah Goldstein via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> >> <adhemerval.zanella@linaro.org> wrote:
> >>>
> >>>
> >>>
> >>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> >>>> On some architectures `clock_gettime` is undesirable as
> >>>> it may use a syscall or there may be a faster alternative.
> >>>> Future architecture specific functions can be added in
> >>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
> >>>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
> >>>> ---
> >>>>  include/random-bits.h                 | 16 ++++++--------
> >>>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> >>>>  2 files changed, 37 insertions(+), 10 deletions(-)
> >>>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
> >>>>
> >>>> diff --git a/include/random-bits.h b/include/random-bits.h
> >>>> index 17665b479a..016b87576c 100644
> >>>> --- a/include/random-bits.h
> >>>> +++ b/include/random-bits.h
> >>>> @@ -19,21 +19,17 @@
> >>>>  #ifndef _RANDOM_BITS_H
> >>>>  # define _RANDOM_BITS_H
> >>>>
> >>>> -#include <time.h>
> >>>> -#include <stdint.h>
> >>>> +# include <random-bits-entropy.h>
> >>>> +# include <stdint.h>
> >>>>
> >>>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> >>>> -   starting time, nano-second accuracy, its randomness is significantly better
> >>>> -   than gettimeofday, and for mostly architectures it is implemented through
> >>>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> >>>> -   bits will have less entropy. */
> >>>> +/* Provides fast pseudo-random bits through architecture specific
> >>>> +   random_bits_entropy.  Expectation is source is some timing function so
> >>>> +   the upper bits have less entropy.  */
> >>>>  static inline uint32_t
> >>>>  random_bits (void)
> >>>>  {
> >>>> -  struct __timespec64 tv;
> >>>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> >>>> +  uint32_t ret = random_bits_entropy ();
> >>>>    /* Shuffle the lower bits to minimize the clock bias.  */
> >>>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> >>>>    ret ^= (ret << 24) | (ret >> 8);
> >>>>    return ret;
> >>>>  }
> >>>
> >>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> >>> generic interface (and other high precision timing on other architectures).
> >>> So I think a better way would be to:
> >>
> >> For x86/generic that works but other architectures also have hp-timing
> >> implementations that might not be suitable for this (i.e there might be
> >> an entropy regression).
> >
> > The default hp-timing.h has
> >
> > # define HP_TIMING_NOW(var) \
> > ({ \
> >   struct __timespec64 tv; \
> >   __clock_gettime64 (CLOCK_MONOTONIC, &tv); \
> >   (var) = (tv.tv_nsec + UINT64_C(1000000000) * tv.tv_sec); \
> > })
> >
> > It isn't the same as the current include/random-bits.h.
>
> Maybe refactor hp-timing.h to add a routine to get the system clock without
> any adjustments?  I don't have a strong preference here, but I take that
> you are not aiming to use RDRAND or similar instruction, since you are
> optimizing to latency.  So I see that using hp-timing.h seems the best
> approach, since it should work similar on different architectures (and
> each one might disable if the entropy is not large enough).

That would work. But we would still need to hardcode the random-bits needs
in some cases.

For example in generic we would still need to combine seconds and nanoseconds
and ideally wouldn't use multiply for that.

At that point it seems we are doing something logically different
enough it would
make more sense to just include <hp-timing.h> in <random-bits-entropy.h> if
it was appropriate for the architecture.
  
Adhemerval Zanella Netto March 30, 2022, 3:37 p.m. UTC | #9
On 29/03/2022 17:44, Noah Goldstein wrote:
> On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 29/03/2022 16:56, Noah Goldstein wrote:
>>> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
>>>>> On some architectures `clock_gettime` is undesirable as
>>>>> it may use a syscall or there may be a faster alternative.
>>>>> Future architecture specific functions can be added in
>>>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
>>>>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
>>>>> ---
>>>>>  include/random-bits.h                 | 16 ++++++--------
>>>>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
>>>>>  2 files changed, 37 insertions(+), 10 deletions(-)
>>>>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
>>>>>
>>>>> diff --git a/include/random-bits.h b/include/random-bits.h
>>>>> index 17665b479a..016b87576c 100644
>>>>> --- a/include/random-bits.h
>>>>> +++ b/include/random-bits.h
>>>>> @@ -19,21 +19,17 @@
>>>>>  #ifndef _RANDOM_BITS_H
>>>>>  # define _RANDOM_BITS_H
>>>>>
>>>>> -#include <time.h>
>>>>> -#include <stdint.h>
>>>>> +# include <random-bits-entropy.h>
>>>>> +# include <stdint.h>
>>>>>
>>>>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
>>>>> -   starting time, nano-second accuracy, its randomness is significantly better
>>>>> -   than gettimeofday, and for mostly architectures it is implemented through
>>>>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
>>>>> -   bits will have less entropy. */
>>>>> +/* Provides fast pseudo-random bits through architecture specific
>>>>> +   random_bits_entropy.  Expectation is source is some timing function so
>>>>> +   the upper bits have less entropy.  */
>>>>>  static inline uint32_t
>>>>>  random_bits (void)
>>>>>  {
>>>>> -  struct __timespec64 tv;
>>>>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
>>>>> +  uint32_t ret = random_bits_entropy ();
>>>>>    /* Shuffle the lower bits to minimize the clock bias.  */
>>>>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
>>>>>    ret ^= (ret << 24) | (ret >> 8);
>>>>>    return ret;
>>>>>  }
>>>>
>>>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
>>>> generic interface (and other high precision timing on other architectures).
>>>> So I think a better way would be to:
>>>
>>> For x86/generic that works but other architectures also have hp-timing
>>> implementations that might not be suitable for this (i.e there might be
>>> an entropy regression).
>>
>> I would expect that the entropy of the hp-timing.h instruction would be similar
>> to the ones from system clock (which exception of legacy architecture like alpha),
>> but I haven't checked yet.
> 
> Would expect the same, but think it will probably take a test on a
> per-arch basis.
> 
> Also there are optimizations we can make since we only need the lower
> 32-bits and
> not a true timestamp.
> 
> I.e no multiply for generic. Also on x86 we can skip combining the
> results of rdtsc.

I tested the entropy on some different architectures:

aarch64:
$ ent gettime-random.txt 
Entropy = 7.293634 bits per byte.
$ ent hptiming-random.txt 
Entropy = 6.451314 bits per byte.

ia64:
$ ent gettime-random.txt 
Entropy = 7.613066 bits per byte.
$ ent hptiming-random.txt 
Entropy = 7.458615 bits per byte.

powerpc64le:
$ ent gettime-random.txt 
Entropy = 7.413584 bits per byte.
$ ent hptiming-random.txt 
Entropy = 7.243894 bits per byte.

sparc64
$ ent gettime-random.txt 
Entropy = 7.388590 bits per byte.
$ ent hptiming-random.txt 
Entropy = 7.602368 bits per byte.


So it seems that only aarch64 is really losing some entropy when using
hp-timing.h (not sure why).
  
Noah Goldstein March 30, 2022, 4:30 p.m. UTC | #10
On Wed, Mar 30, 2022 at 10:37 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 29/03/2022 17:44, Noah Goldstein wrote:
> > On Tue, Mar 29, 2022 at 3:37 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 29/03/2022 16:56, Noah Goldstein wrote:
> >>> On Tue, Mar 29, 2022 at 2:51 PM Adhemerval Zanella
> >>> <adhemerval.zanella@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 28/03/2022 19:09, Noah Goldstein via Libc-alpha wrote:
> >>>>> On some architectures `clock_gettime` is undesirable as
> >>>>> it may use a syscall or there may be a faster alternative.
> >>>>> Future architecture specific functions can be added in
> >>>>> sysdeps/<arch>/random-bits-entropy.h to provide a version of
> >>>>> 'random_bits_entropy' that doesn't use 'clock_gettime'.
> >>>>> ---
> >>>>>  include/random-bits.h                 | 16 ++++++--------
> >>>>>  sysdeps/generic/random-bits-entropy.h | 31 +++++++++++++++++++++++++++
> >>>>>  2 files changed, 37 insertions(+), 10 deletions(-)
> >>>>>  create mode 100644 sysdeps/generic/random-bits-entropy.h
> >>>>>
> >>>>> diff --git a/include/random-bits.h b/include/random-bits.h
> >>>>> index 17665b479a..016b87576c 100644
> >>>>> --- a/include/random-bits.h
> >>>>> +++ b/include/random-bits.h
> >>>>> @@ -19,21 +19,17 @@
> >>>>>  #ifndef _RANDOM_BITS_H
> >>>>>  # define _RANDOM_BITS_H
> >>>>>
> >>>>> -#include <time.h>
> >>>>> -#include <stdint.h>
> >>>>> +# include <random-bits-entropy.h>
> >>>>> +# include <stdint.h>
> >>>>>
> >>>>> -/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
> >>>>> -   starting time, nano-second accuracy, its randomness is significantly better
> >>>>> -   than gettimeofday, and for mostly architectures it is implemented through
> >>>>> -   vDSO instead of a syscall.  Since the source is a system clock, the upper
> >>>>> -   bits will have less entropy. */
> >>>>> +/* Provides fast pseudo-random bits through architecture specific
> >>>>> +   random_bits_entropy.  Expectation is source is some timing function so
> >>>>> +   the upper bits have less entropy.  */
> >>>>>  static inline uint32_t
> >>>>>  random_bits (void)
> >>>>>  {
> >>>>> -  struct __timespec64 tv;
> >>>>> -  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
> >>>>> +  uint32_t ret = random_bits_entropy ();
> >>>>>    /* Shuffle the lower bits to minimize the clock bias.  */
> >>>>> -  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
> >>>>>    ret ^= (ret << 24) | (ret >> 8);
> >>>>>    return ret;
> >>>>>  }
> >>>>
> >>>> We already provide hp-timing.h, which uses rdtsc on x86 and clock_gettime on
> >>>> generic interface (and other high precision timing on other architectures).
> >>>> So I think a better way would be to:
> >>>
> >>> For x86/generic that works but other architectures also have hp-timing
> >>> implementations that might not be suitable for this (i.e there might be
> >>> an entropy regression).
> >>
> >> I would expect that the entropy of the hp-timing.h instruction would be similar
> >> to the ones from system clock (which exception of legacy architecture like alpha),
> >> but I haven't checked yet.
> >
> > Would expect the same, but think it will probably take a test on a
> > per-arch basis.
> >
> > Also there are optimizations we can make since we only need the lower
> > 32-bits and
> > not a true timestamp.
> >
> > I.e no multiply for generic. Also on x86 we can skip combining the
> > results of rdtsc.
>
> I tested the entropy on some different architectures:
>
> aarch64:
> $ ent gettime-random.txt
> Entropy = 7.293634 bits per byte.
> $ ent hptiming-random.txt
> Entropy = 6.451314 bits per byte.
>
> ia64:
> $ ent gettime-random.txt
> Entropy = 7.613066 bits per byte.
> $ ent hptiming-random.txt
> Entropy = 7.458615 bits per byte.
>
> powerpc64le:
> $ ent gettime-random.txt
> Entropy = 7.413584 bits per byte.
> $ ent hptiming-random.txt
> Entropy = 7.243894 bits per byte.
>
> sparc64
> $ ent gettime-random.txt
> Entropy = 7.388590 bits per byte.
> $ ent hptiming-random.txt
> Entropy = 7.602368 bits per byte.
>
>
> So it seems that only aarch64 is really losing some entropy when using
> hp-timing.h (not sure why).

Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that
since it takes testing and there are optimization cases that we may
want to make it's still best to have a seperate file.
  
Cristian Rodríguez March 30, 2022, 7:38 p.m. UTC | #11
On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:

> Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that
> since it takes testing and there are optimization cases that we mayf
> want to make it's still best to have a seperate file.

What is really needed is to get some sort of randomness function
exposed through the VDSO, so there is no syscall overhead and all this
magic can be dropped. I'm not even sure if that is possible.  that or
a reliable prng usable even by the dynamic linker.
  
Jason A. Donenfeld March 31, 2022, 4:45 a.m. UTC | #12
Hi Christian,

Thanks for looping me into this thread.

On Wed, Mar 30, 2022 at 3:38 PM Cristian Rodríguez
<crrodriguez@opensuse.org> wrote:
>
> On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>
> > Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that
> > since it takes testing and there are optimization cases that we mayf
> > want to make it's still best to have a seperate file.
>
> What is really needed is to get some sort of randomness function
> exposed through the VDSO, so there is no syscall overhead and all this
> magic can be dropped. I'm not even sure if that is possible.  that or
> a reliable prng usable even by the dynamic linker.

Can someone provide some context on this? I've actually looked into
having some megafast userspace RNG component in the vDSO, and I'm not
fundamentally opposed to the idea. I think there's interesting
potential there and something worth considering. But what's the
context of you asking for this now? Under what circumstances are you
finding that calling getrandom(0) or similar is too high overhead or
otherwise problematic?

Jason
  
Cristian Rodríguez March 31, 2022, 10:08 a.m. UTC | #13
On Thu, Mar 31, 2022 at 1:45 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Can someone provide some context on this? I've actually looked into
> having some megafast userspace RNG component in the vDSO, and I'm not
> fundamentally opposed to the idea. I think there's interesting
> potential there and something worth considering. But what's the
> context of you asking for this now? Under what circumstances are you
> finding that calling getrandom(0) or similar is too high overhead or
> otherwise problematic?

I'm not sure in what scenario the syscall overhead is too big, Maybe
it is if called in a loop..but I guess the argument here is that
getrandom(0) may block  or that in a non-bleeding edge system it might
be too slow. (Im aware of the recent massive speedups)

I believe what is needed is a PRNG, no need to guarantee to be
cryptographically secure, that can be used without having to seed it
from userspace, mega fast, never blocks and can be used by the dynamic
linker for example. Now it will be awesome if it is all of that and
CSPRNG.
  
Adhemerval Zanella Netto March 31, 2022, 11:17 a.m. UTC | #14
On 31/03/2022 07:08, Cristian Rodríguez wrote:
> On Thu, Mar 31, 2022 at 1:45 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
>> Can someone provide some context on this? I've actually looked into
>> having some megafast userspace RNG component in the vDSO, and I'm not
>> fundamentally opposed to the idea. I think there's interesting
>> potential there and something worth considering. But what's the
>> context of you asking for this now? Under what circumstances are you
>> finding that calling getrandom(0) or similar is too high overhead or
>> otherwise problematic?
> 
> I'm not sure in what scenario the syscall overhead is too big, Maybe
> it is if called in a loop..but I guess the argument here is that
> getrandom(0) may block  or that in a non-bleeding edge system it might
> be too slow. (Im aware of the recent massive speedups)

Current internal glibc usage where some entropy is required is not performant
critical, so calling a syscall is not really a problem.  However, we are
evaluating tuning the mutex fastpath for PTHREAD_MUTEX_ADAPTIVE_NP, where
a syscall most likely defeats the optimization.

They might be other entropy usages where a userspace source would be 
nice thing to have, for instance on malloc hardening.

> 
> I believe what is needed is a PRNG, no need to guarantee to be
> cryptographically secure, that can be used without having to seed it
> from userspace, mega fast, never blocks and can be used by the dynamic
> linker for example. Now it will be awesome if it is all of that and
> CSPRNG.

Florian has sent a arc4random proposal somes years ago [1], the idea was
pretty much what you summarized: get some entropy from kernel, initialize
a PNRG, and keep a internal buffer to avoid too much overhead by the calls.
It was suggested to use AES (mainly to use hardware instructions on some
architecture), but I think we can use to any algorithm currently used.

There are still some discussion on which is the best strategy to handle
fork resets and if we need per thread buffers to optimize multithread
access.

An I am not sure if a vDSO would be the best interface here, it would
most likely require some complicate synchronization to handle multithread
access like io_ring besides limiting access if kernel provides it.  I
think such interface would be better to be handle by glibc itself.

[1] https://sourceware.org/pipermail/libc-alpha/2018-March/092081.html
  
Cristian Rodríguez March 31, 2022, 11:25 a.m. UTC | #15
On Thu, Mar 31, 2022 at 8:17 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:

> There are still some discussion on which is the best strategy to handle
> fork resets and if we need per thread buffers to optimize multithread
> access.

What about giving up on this and making the state small and use thread
local storage for it ?
I believe all this complexity to handle that cases is not worth it.
  
Adhemerval Zanella Netto March 31, 2022, 11:48 a.m. UTC | #16
On 31/03/2022 08:25, Cristian Rodríguez wrote:
> On Thu, Mar 31, 2022 at 8:17 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> 
>> There are still some discussion on which is the best strategy to handle
>> fork resets and if we need per thread buffers to optimize multithread
>> access.
> 
> What about giving up on this and making the state small and use thread
> local storage for it ?
> I believe all this complexity to handle that cases is not worth it.

That what the per-thread buffer is about, we will need to define how much
buffer would ideal, how to allocate (extending the TCB or using TLS), if
the size would be tunable, which cipher to use, how to handle failures
internally if the source of entropy is not available (Florian patch [1]
used getrandom or /dev/random, and using AT_RANDOM might limit other
usages like guard and stack pointer hardening).

The AES implementation proposed uses a somewhat large state (about 300
bytes), which might add some overhead it would a per-thread buffer. And
afaik other cyphers like Chacha20 have a even larger state (512 bytes).
Florian patch tries to implement some lock-free access to the common
buffer to decrease the state size, it might be a better option indeed.

But if you check Florian's patch, most of the complexity come from
fork detections where even with a per-thread buffer we will need
to handle (for the forked thread).


[1] https://sourceware.org/legacy-ml/libc-alpha/2018-05/msg00891.html
  
Cristian Rodríguez March 31, 2022, 12:14 p.m. UTC | #17
On Thu, Mar 31, 2022 at 8:48 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:

> The AES implementation proposed uses a somewhat large state (about 300
> bytes), which might add some overhead it would a per-thread buffer. And
> afaik other cyphers like Chacha20 have a even larger state (512 bytes).

Apparently chacha8 is enough for this purpose.. but I'm probably
missing something.
  
Yann Droneaud March 31, 2022, 1:12 p.m. UTC | #18
Le 31/03/2022 à 14:14, Cristian Rodríguez via Libc-alpha a écrit :
> On Thu, Mar 31, 2022 at 8:48 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org>  wrote:
>
>> The AES implementation proposed uses a somewhat large state (about 300
>> bytes), which might add some overhead it would a per-thread buffer. And
>> afaik other cyphers like Chacha20 have a even larger state (512 bytes).
> Apparently chacha8 is enough for this purpose.. but I'm probably
> missing something.

Chacha8 should have the same state as Chacha20 which is 64 bytes wide.
Chacha8 use a a reduced round number (8) compared to Chacha20, making it faster.

I've noticed for Chacha20 to be efficient on x86_64, its cost per byte is
lowered by generating more than 64 bytes at a time, hence, the 512 bytes RNG state.

I've generated some graph here (I'm the author of this patch):

https://github.com/Parrot-Developers/libfutils/commit/0fdff8ee31f67988b68774ff04a58d0dd1d94d03

Regards.
  
Jason A. Donenfeld March 31, 2022, 3:31 p.m. UTC | #19
Just so we're on the same page here, is this a discussion about
optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
?

You just need a super fast random uint32_t for some future pthread changes?

If so, I can send a patch to return moderately secure integers here really fast.

Jason
  
Noah Goldstein March 31, 2022, 6:16 p.m. UTC | #20
On Thu, Mar 31, 2022 at 10:32 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Just so we're on the same page here, is this a discussion about
> optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
> ?
>
> You just need a super fast random uint32_t for some future pthread changes?

On architectures with some fast non-syscall timer like x86 (`rdtsc`)
will this ever
be worth it?

>
> If so, I can send a patch to return moderately secure integers here really fast.
>
> Jason
  
Cristian Rodríguez March 31, 2022, 9:57 p.m. UTC | #21
On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Just so we're on the same page here, is this a discussion about
> optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
> ?
>
> You just need a super fast random uint32_t for some future pthread changes?
>
> If so, I can send a patch to return moderately secure integers here really fast.
>
> Jason

That would be great if is better than the current solution.
  
Noah Goldstein March 31, 2022, 10:33 p.m. UTC | #22
On Thu, Mar 31, 2022 at 4:57 PM Cristian Rodríguez
<crrodriguez@opensuse.org> wrote:
>
> On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Just so we're on the same page here, is this a discussion about
> > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
> > ?
> >
> > You just need a super fast random uint32_t for some future pthread changes?
> >
> > If so, I can send a patch to return moderately secure integers here really fast.
> >
> > Jason
>
> That would be great if is better than the current solution.

Well v2 of this patch does that for x86. Just curious if I should go forward
with the patchset and we can fixup the generic random-bits-entropy.h
later or if the solution will be universally optimal.
  
Jason A. Donenfeld March 31, 2022, 10:51 p.m. UTC | #23
Hi Cristian,

On Thu, Mar 31, 2022 at 5:57 PM Cristian Rodríguez
<crrodriguez@opensuse.org> wrote:
>
> On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Just so we're on the same page here, is this a discussion about
> > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
> > ?
> >
> > You just need a super fast random uint32_t for some future pthread changes?
> >
> > If so, I can send a patch to return moderately secure integers here really fast.
> >
> > Jason
>
> That would be great if is better than the current solution.

Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c

This is somewhat "secure", and maybe overkill.

I can adjust this to have additional security properties (for example,
preventing that counter from running backwards). Or I can adjust it to
be even faster and have fewer security properties. Or maybe bench this
and you'll find that it's fine.

I suspect the only thing we really care about here is not leaking the
AT_RANDOM value somehow, and siphash usually does a good job at that.

Thoughts?

Jason
  
Noah Goldstein March 31, 2022, 11:05 p.m. UTC | #24
On Thu, Mar 31, 2022 at 5:51 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Cristian,
>
> On Thu, Mar 31, 2022 at 5:57 PM Cristian Rodríguez
> <crrodriguez@opensuse.org> wrote:
> >
> > On Thu, Mar 31, 2022 at 12:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > >
> > > Just so we're on the same page here, is this a discussion about
> > > optimizing https://code.woboq.org/userspace/glibc/include/random-bits.h.html
> > > ?
> > >
> > > You just need a super fast random uint32_t for some future pthread changes?
> > >
> > > If so, I can send a patch to return moderately secure integers here really fast.
> > >
> > > Jason
> >
> > That would be great if is better than the current solution.
>
> Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c
>
> This is somewhat "secure", and maybe overkill.

AFAIK our goal is entropy more so than security. For example
if this is used to generate jiffies to stagger threads its not a security
issue in any sense, it's just not ideal for performance.

>
> I can adjust this to have additional security properties (for example,
> preventing that counter from running backwards). Or I can adjust it to
> be even faster and have fewer security properties. Or maybe bench this
> and you'll find that it's fine.
>
> I suspect the only thing we really care about here is not leaking the
> AT_RANDOM value somehow, and siphash usually does a good job at that.
>
> Thoughts?
>
> Jason
  
Jason A. Donenfeld March 31, 2022, 11:25 p.m. UTC | #25
Hi Noah,

On Thu, Mar 31, 2022 at 7:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > Alright, well, here's something: https://xn--4db.cc/YQOxDP6Z/c
> >
> > This is somewhat "secure", and maybe overkill.
>
> AFAIK our goal is entropy more so than security. For example
> if this is used to generate jiffies to stagger threads its not a security
> issue in any sense, it's just not ideal for performance.

Oh, okay, well just go with something linear and silly like this:
https://xn--4db.cc/xrAIhCvy/c (an xorshift generator) No security, but
statistically pretty good, and performs well.

Jason
  
Cristian Rodríguez April 1, 2022, 6:01 p.m. UTC | #26
On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:

> AFAIK our goal is entropy more so than security. For example
> if this is used to generate jiffies to stagger threads its not a security
> issue in any sense, it's just not ideal for performance.

In any case this should be more than fast enough for the other use
cases of random_bits() .. maybe one new random_bits_fast() function
foe edge cases where even this is too slow?
  
Florian Weimer April 4, 2022, 2:51 p.m. UTC | #27
* Cristian Rodríguez:

> On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>
>> Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that
>> since it takes testing and there are optimization cases that we mayf
>> want to make it's still best to have a seperate file.
>
> What is really needed is to get some sort of randomness function
> exposed through the VDSO, so there is no syscall overhead and all this
> magic can be dropped. I'm not even sure if that is possible.  that or
> a reliable prng usable even by the dynamic linker.

We'd need to allocate some per-thread data, register it with the kernel
(so that the kernel can apply fork protection), and pass it to the vDSO
function.  It's fairly involved.

And we'd still need to implement fallback if we want to expose a
mostly-non-failing arc4random to applications.

Thanks,
Florian
  
Jason A. Donenfeld April 4, 2022, 2:54 p.m. UTC | #28
Hi Florian,

On Mon, Apr 4, 2022 at 4:51 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Cristian Rodríguez:
>
> > On Wed, Mar 30, 2022 at 1:30 PM Noah Goldstein via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >
> >> Thanks, I can add patches for sparc/powerpc64le/ia64. Still feel that
> >> since it takes testing and there are optimization cases that we mayf
> >> want to make it's still best to have a seperate file.
> >
> > What is really needed is to get some sort of randomness function
> > exposed through the VDSO, so there is no syscall overhead and all this
> > magic can be dropped. I'm not even sure if that is possible.  that or
> > a reliable prng usable even by the dynamic linker.
>
> We'd need to allocate some per-thread data, register it with the kernel
> (so that the kernel can apply fork protection), and pass it to the vDSO
> function.  It's fairly involved.
>
> And we'd still need to implement fallback if we want to expose a
> mostly-non-failing arc4random to applications.

Indeed it's involved. But not impossible and something I'd consider working on.

_However_, based on what people have said in this thread, this all
seems highly unnecessary, since you just need some boring
statistically uniform randomness without any crypto or security
requirements of any kind, and it simply needs to be fast and dumb. If
that's the wrong set of requirements for this problem (I still have no
idea what the bigger picture here is), please pipe up. Assuming it's
true, however, some dumb xorshift thing like https://א.cc/xrAIhCvy/c
seems probably fine?

Jason
  
Florian Weimer April 4, 2022, 3 p.m. UTC | #29
* Jason A. Donenfeld via Libc-alpha:

> _However_, based on what people have said in this thread, this all
> seems highly unnecessary, since you just need some boring
> statistically uniform randomness without any crypto or security
> requirements of any kind, and it simply needs to be fast and dumb. If
> that's the wrong set of requirements for this problem (I still have no
> idea what the bigger picture here is), please pipe up.

If we can make a cryptographically secure generator fast enough, it
would relieve programmers of the need to choose between that and another
generator that just gives some random-looking bits fast.  If programmers
don't have to make a choice, they can't choose incorrectly (introducing
performance bugs or security bugs).

For example, the system call overhead could be prohibitive for things
like hash map keys if these hash maps are not used much before they are
discarded again.

Thanks,
Florian
  
Noah Goldstein April 4, 2022, 4:51 p.m. UTC | #30
On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> * Jason A. Donenfeld via Libc-alpha:
>
> > _However_, based on what people have said in this thread, this all
> > seems highly unnecessary, since you just need some boring
> > statistically uniform randomness without any crypto or security
> > requirements of any kind, and it simply needs to be fast and dumb. If
> > that's the wrong set of requirements for this problem (I still have no
> > idea what the bigger picture here is), please pipe up.
>
> If we can make a cryptographically secure generator fast enough, it
> would relieve programmers of the need to choose between that and another
> generator that just gives some random-looking bits fast.  If programmers
> don't have to make a choice, they can't choose incorrectly (introducing
> performance bugs or security bugs).

It sounds like you're talking about creating a user facing API. Since
random_bits
is internal do we really need so much ease of use at the cost of performance?
>
> For example, the system call overhead could be prohibitive for things
> like hash map keys if these hash maps are not used much before they are
> discarded again.
>
> Thanks,
> Florian
>
  
Adhemerval Zanella Netto April 4, 2022, 5:22 p.m. UTC | #31
On 04/04/2022 13:51, Noah Goldstein via Libc-alpha wrote:
> On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> * Jason A. Donenfeld via Libc-alpha:
>>
>>> _However_, based on what people have said in this thread, this all
>>> seems highly unnecessary, since you just need some boring
>>> statistically uniform randomness without any crypto or security
>>> requirements of any kind, and it simply needs to be fast and dumb. If
>>> that's the wrong set of requirements for this problem (I still have no
>>> idea what the bigger picture here is), please pipe up.
>>
>> If we can make a cryptographically secure generator fast enough, it
>> would relieve programmers of the need to choose between that and another
>> generator that just gives some random-looking bits fast.  If programmers
>> don't have to make a choice, they can't choose incorrectly (introducing
>> performance bugs or security bugs).
> 
> It sounds like you're talking about creating a user facing API. Since
> random_bits
> is internal do we really need so much ease of use at the cost of performance?

For a user exported ABI I think it would be better to rehearse the arc4random 
proposal, which I might spend some time to come with a simpler solution based
on Florian proposal.  For this specif proposal, I see that just factoring the
code to make only x86 use the rdtsc should be suffice.

Ideally, if we eventually get an arc4random the idea would be use it instead.
Florian initial proposal aimed to use AES so we can leverage hardware assisted
optimization presented in some chips.
  
Adhemerval Zanella Netto April 4, 2022, 5:42 p.m. UTC | #32
On 01/04/2022 15:01, Cristian Rodríguez wrote:
> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> 
>> AFAIK our goal is entropy more so than security. For example
>> if this is used to generate jiffies to stagger threads its not a security
>> issue in any sense, it's just not ideal for performance.
> 
> In any case this should be more than fast enough for the other use
> cases of random_bits() .. maybe one new random_bits_fast() function
> foe edge cases where even this is too slow?

I think we are bike-shedding in the same issue OpenBSD guys stumbled 
and which they have solved 10 years ago [1].  Essentially, we need to 
come  up with a internal PRNG interface that can be used internally and
externally, instead of reinventing cleaver ways to use the timer as 
entropy source.

The issue is not really the cypher used, ideally it could be replace if
we find out that it does not fit.  The main issue is to glue together 
all the requirements to have a concise internal interface, taking in
consideration the glibc constrains to work with multiple kernel version
and environments (where we can't assume we have access to a source or
reliable entropy like getrandom syscall).

My plan to rehearse Florian arc4random proposal to have some simpler
to where we might improve upon (a simpler fork detection for kernels
without MADV_WIPEONFORK that just issue a atfork handle, maybe using
ChaCha20 as virtually all other systems do, no per-thread state).

[1] https://www.youtube.com/watch?v=gp_90-3R0pE
  
Noah Goldstein April 4, 2022, 6:23 p.m. UTC | #33
On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 01/04/2022 15:01, Cristian Rodríguez wrote:
> > On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> >> AFAIK our goal is entropy more so than security. For example
> >> if this is used to generate jiffies to stagger threads its not a security
> >> issue in any sense, it's just not ideal for performance.
> >
> > In any case this should be more than fast enough for the other use
> > cases of random_bits() .. maybe one new random_bits_fast() function
> > foe edge cases where even this is too slow?
>
> I think we are bike-shedding in the same issue OpenBSD guys stumbled
> and which they have solved 10 years ago [1].  Essentially, we need to
> come  up with a internal PRNG interface that can be used internally and
> externally, instead of reinventing cleaver ways to use the timer as
> entropy source.
>
> The issue is not really the cypher used, ideally it could be replace if
> we find out that it does not fit.  The main issue is to glue together
> all the requirements to have a concise internal interface, taking in
> consideration the glibc constrains to work with multiple kernel version
> and environments (where we can't assume we have access to a source or
> reliable entropy like getrandom syscall).
>
> My plan to rehearse Florian arc4random proposal to have some simpler
> to where we might improve upon (a simpler fork detection for kernels
> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
> ChaCha20 as virtually all other systems do, no per-thread state).

Why no per-thread / per-cpu? It seems otherwise there will need to be
some explicit synchronization on the stream.


Regarding this patch, do we want to skip it and just wait on arc4random
interface in kernel/glibc or should I go forward with it and some arch
specific entropy functions in the mean-time?

>
> [1] https://www.youtube.com/watch?v=gp_90-3R0pE
  
Jason A. Donenfeld April 4, 2022, 6:28 p.m. UTC | #34
Hi Florian,

On Mon, Apr 4, 2022 at 5:00 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jason A. Donenfeld via Libc-alpha:
>
> > _However_, based on what people have said in this thread, this all
> > seems highly unnecessary, since you just need some boring
> > statistically uniform randomness without any crypto or security
> > requirements of any kind, and it simply needs to be fast and dumb. If
> > that's the wrong set of requirements for this problem (I still have no
> > idea what the bigger picture here is), please pipe up.
>
> If we can make a cryptographically secure generator fast enough, it
> would relieve programmers of the need to choose between that and another
> generator that just gives some random-looking bits fast.  If programmers
> don't have to make a choice, they can't choose incorrectly (introducing
> performance bugs or security bugs).
>
> For example, the system call overhead could be prohibitive for things
> like hash map keys if these hash maps are not used much before they are
> discarded again.

Yes, if we want a general interface it's something we should do well
and have it be cryptographically secure with no downsides bla bla bla.
As mentioned, we can design something fancy with the vDSO if you think
there's a compelling use case for this.

Until your comments, though, everybody on this thread told me this was
just some dinky internal interface only used for some sort of
threading semantic. In that case, just name it
"dinky_threading_id_thinger()" and call it a day? We are all talking
about the same thing, right? Could you confirm/clarify the focus here?
And if you've misunderstood what this is about, please say so too. I
see the subsequent messages are now veering off into lala land since
yours, so I'd like to make sure we're all on the same page, lest we
waste time over nothing.

Thanks,
Jason
  
Jason A. Donenfeld April 4, 2022, 6:32 p.m. UTC | #35
Hi Noah,

On Mon, Apr 4, 2022 at 6:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > * Jason A. Donenfeld via Libc-alpha:
> >
> > > _However_, based on what people have said in this thread, this all
> > > seems highly unnecessary, since you just need some boring
> > > statistically uniform randomness without any crypto or security
> > > requirements of any kind, and it simply needs to be fast and dumb. If
> > > that's the wrong set of requirements for this problem (I still have no
> > > idea what the bigger picture here is), please pipe up.
> >
> > If we can make a cryptographically secure generator fast enough, it
> > would relieve programmers of the need to choose between that and another
> > generator that just gives some random-looking bits fast.  If programmers
> > don't have to make a choice, they can't choose incorrectly (introducing
> > performance bugs or security bugs).
>
> It sounds like you're talking about creating a user facing API. Since
> random_bits
> is internal do we really need so much ease of use at the cost of performance?

Right, my thoughts exactly. If you just need some statistically
uniform bytes for whatever it is whomever told me above something
about threading something about no security something about really
doesn't matter, then I fail to see why https://א.cc/xrAIhCvy/c is
insufficient, and I also don't know why this thread is chasing its
tail about rdtsc.

I'd appreciate it if somebody in a leadership position could let me
know what is asked of me here, since somebody CC'd me into this
thread. I've already made two RNGs here for various spitballed
objectives. I'd like to refrain from spending time on a third until
there's some clarity on what it is you all want. Probably it makes
sense for me to leave this thread alone for a bit while you guys work
that out.

Jason
  
Adhemerval Zanella Netto April 4, 2022, 6:38 p.m. UTC | #36
On 04/04/2022 15:23, Noah Goldstein wrote:
> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 01/04/2022 15:01, Cristian Rodríguez wrote:
>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>>
>>>> AFAIK our goal is entropy more so than security. For example
>>>> if this is used to generate jiffies to stagger threads its not a security
>>>> issue in any sense, it's just not ideal for performance.
>>>
>>> In any case this should be more than fast enough for the other use
>>> cases of random_bits() .. maybe one new random_bits_fast() function
>>> foe edge cases where even this is too slow?
>>
>> I think we are bike-shedding in the same issue OpenBSD guys stumbled
>> and which they have solved 10 years ago [1].  Essentially, we need to
>> come  up with a internal PRNG interface that can be used internally and
>> externally, instead of reinventing cleaver ways to use the timer as
>> entropy source.
>>
>> The issue is not really the cypher used, ideally it could be replace if
>> we find out that it does not fit.  The main issue is to glue together
>> all the requirements to have a concise internal interface, taking in
>> consideration the glibc constrains to work with multiple kernel version
>> and environments (where we can't assume we have access to a source or
>> reliable entropy like getrandom syscall).
>>
>> My plan to rehearse Florian arc4random proposal to have some simpler
>> to where we might improve upon (a simpler fork detection for kernels
>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
>> ChaCha20 as virtually all other systems do, no per-thread state).
> 
> Why no per-thread / per-cpu? It seems otherwise there will need to be
> some explicit synchronization on the stream.

Mainly because it simplifies a lot the *initial* implementation. I would
prefer to incremental add per-thread optimization than dump a large patch
so we can review in integrate the code more easier.

> 
> 
> Regarding this patch, do we want to skip it and just wait on arc4random
> interface in kernel/glibc or should I go forward with it and some arch
> specific entropy functions in the mean-time?

I don't have a strong opinion on this patch, it does improve x86_64
latency on random_bits although currently internal usage are far from
latency sensitive.  It is really a microoptimization without much real
work gain for current code.

> 
>>
>> [1] https://www.youtube.com/watch?v=gp_90-3R0pE
  
Noah Goldstein April 4, 2022, 6:52 p.m. UTC | #37
On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 04/04/2022 15:23, Noah Goldstein wrote:
> > On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 01/04/2022 15:01, Cristian Rodríguez wrote:
> >>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >>>
> >>>> AFAIK our goal is entropy more so than security. For example
> >>>> if this is used to generate jiffies to stagger threads its not a security
> >>>> issue in any sense, it's just not ideal for performance.
> >>>
> >>> In any case this should be more than fast enough for the other use
> >>> cases of random_bits() .. maybe one new random_bits_fast() function
> >>> foe edge cases where even this is too slow?
> >>
> >> I think we are bike-shedding in the same issue OpenBSD guys stumbled
> >> and which they have solved 10 years ago [1].  Essentially, we need to
> >> come  up with a internal PRNG interface that can be used internally and
> >> externally, instead of reinventing cleaver ways to use the timer as
> >> entropy source.
> >>
> >> The issue is not really the cypher used, ideally it could be replace if
> >> we find out that it does not fit.  The main issue is to glue together
> >> all the requirements to have a concise internal interface, taking in
> >> consideration the glibc constrains to work with multiple kernel version
> >> and environments (where we can't assume we have access to a source or
> >> reliable entropy like getrandom syscall).
> >>
> >> My plan to rehearse Florian arc4random proposal to have some simpler
> >> to where we might improve upon (a simpler fork detection for kernels
> >> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
> >> ChaCha20 as virtually all other systems do, no per-thread state).
> >
> > Why no per-thread / per-cpu? It seems otherwise there will need to be
> > some explicit synchronization on the stream.
>
> Mainly because it simplifies a lot the *initial* implementation. I would
> prefer to incremental add per-thread optimization than dump a large patch
> so we can review in integrate the code more easier.
>
> >
> >
> > Regarding this patch, do we want to skip it and just wait on arc4random
> > interface in kernel/glibc or should I go forward with it and some arch
> > specific entropy functions in the mean-time?
>
> I don't have a strong opinion on this patch, it does improve x86_64
> latency on random_bits although currently internal usage are far from
> latency sensitive.  It is really a microoptimization without much real
> work gain for current code.

That's fair. The motivation is so random_bits can be used for lightweight
jitter i.e in cases like:

[v2] nptl: Add backoff mechanism to spinlock loop

The x86 backend with `rdtsc` is just because it's a simple improvement,
other arch will hopefully be able to get off syscall if they don't have vdso
gettime.
But agree the improvement gains a minimal so if people don't feel its worth
the added complexity we can wait on a strong arch-random interface.


>
> >
> >>
> >> [1] https://www.youtube.com/watch?v=gp_90-3R0pE
  
Noah Goldstein April 4, 2022, 7:16 p.m. UTC | #38
On Mon, Apr 4, 2022 at 1:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Noah,
>
> On Mon, Apr 4, 2022 at 6:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> > >
> > > * Jason A. Donenfeld via Libc-alpha:
> > >
> > > > _However_, based on what people have said in this thread, this all
> > > > seems highly unnecessary, since you just need some boring
> > > > statistically uniform randomness without any crypto or security
> > > > requirements of any kind, and it simply needs to be fast and dumb. If
> > > > that's the wrong set of requirements for this problem (I still have no
> > > > idea what the bigger picture here is), please pipe up.
> > >
> > > If we can make a cryptographically secure generator fast enough, it
> > > would relieve programmers of the need to choose between that and another
> > > generator that just gives some random-looking bits fast.  If programmers
> > > don't have to make a choice, they can't choose incorrectly (introducing
> > > performance bugs or security bugs).
> >
> > It sounds like you're talking about creating a user facing API. Since
> > random_bits
> > is internal do we really need so much ease of use at the cost of performance?
>
> Right, my thoughts exactly. If you just need some statistically
> uniform bytes for whatever it is whomever told me above something
> about threading something about no security something about really
> doesn't matter, then I fail to see why https://א.cc/xrAIhCvy/c is
> insufficient, and I also don't know why this thread is chasing its
> tail about rdtsc.
>
> I'd appreciate it if somebody in a leadership position could let me
> know what is asked of me here, since somebody CC'd me into this
> thread. I've already made two RNGs here for various spitballed
> objectives. I'd like to refrain from spending time on a third until
> there's some clarity on what it is you all want. Probably it makes
> sense for me to leave this thread alone for a bit while you guys work
> that out.
>

Sorry it seems this thread has diverged in several directions.

Think a seperate thread for the arc4random stuff would be better
as its really its own thing.

Regarding this patch which aims at making it easier to provide
a lighter-weight random_bits function.

Your code seems to make sense for systems without VDSO
or some arch alternative that can be used to avoid syscalls.
Generally my thought it saving the static storage (TLS really)
is worth it so I'd prefer to keep the per-arch interface and make
your code the generic solution but thats up for debate. Particularly
arch specific is harder to maintain and this function is not
particularly hot.

So I see four possibilities:

1. Drop it.
2. Generic solution only (using code based on Jason's solution)
3. This patch and update generic solution with Jason's solution
4. This patch and keep gettime for the generic solution

Anyone have opinions of 1/2/3/4?

If not I'm partial to 3.

> Jason
  
Adhemerval Zanella Netto April 4, 2022, 7:20 p.m. UTC | #39
On 04/04/2022 15:52, Noah Goldstein wrote:
> On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 04/04/2022 15:23, Noah Goldstein wrote:
>>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 01/04/2022 15:01, Cristian Rodríguez wrote:
>>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>>>>
>>>>>> AFAIK our goal is entropy more so than security. For example
>>>>>> if this is used to generate jiffies to stagger threads its not a security
>>>>>> issue in any sense, it's just not ideal for performance.
>>>>>
>>>>> In any case this should be more than fast enough for the other use
>>>>> cases of random_bits() .. maybe one new random_bits_fast() function
>>>>> foe edge cases where even this is too slow?
>>>>
>>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled
>>>> and which they have solved 10 years ago [1].  Essentially, we need to
>>>> come  up with a internal PRNG interface that can be used internally and
>>>> externally, instead of reinventing cleaver ways to use the timer as
>>>> entropy source.
>>>>
>>>> The issue is not really the cypher used, ideally it could be replace if
>>>> we find out that it does not fit.  The main issue is to glue together
>>>> all the requirements to have a concise internal interface, taking in
>>>> consideration the glibc constrains to work with multiple kernel version
>>>> and environments (where we can't assume we have access to a source or
>>>> reliable entropy like getrandom syscall).
>>>>
>>>> My plan to rehearse Florian arc4random proposal to have some simpler
>>>> to where we might improve upon (a simpler fork detection for kernels
>>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
>>>> ChaCha20 as virtually all other systems do, no per-thread state).
>>>
>>> Why no per-thread / per-cpu? It seems otherwise there will need to be
>>> some explicit synchronization on the stream.
>>
>> Mainly because it simplifies a lot the *initial* implementation. I would
>> prefer to incremental add per-thread optimization than dump a large patch
>> so we can review in integrate the code more easier.
>>
>>>
>>>
>>> Regarding this patch, do we want to skip it and just wait on arc4random
>>> interface in kernel/glibc or should I go forward with it and some arch
>>> specific entropy functions in the mean-time?
>>
>> I don't have a strong opinion on this patch, it does improve x86_64
>> latency on random_bits although currently internal usage are far from
>> latency sensitive.  It is really a microoptimization without much real
>> work gain for current code.
> 
> That's fair. The motivation is so random_bits can be used for lightweight
> jitter i.e in cases like:
> 
> [v2] nptl: Add backoff mechanism to spinlock loop
> 
> The x86 backend with `rdtsc` is just because it's a simple improvement,
> other arch will hopefully be able to get off syscall if they don't have vdso
> gettime.
> But agree the improvement gains a minimal so if people don't feel its worth
> the added complexity we can wait on a strong arch-random interface.

I think for mutex optimization it would be better to just add a arch-specific
jitter code and use the backoff optimization iff the arch-specific code is
used. And maybe not tied to random_bits(), since I am not sure if an interface
like arc4random would be good to use in such scenario (since a possible state
reschedule might call getentropy with my add unexpected latency).
  
Noah Goldstein April 4, 2022, 7:48 p.m. UTC | #40
On Mon, Apr 4, 2022 at 2:20 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 04/04/2022 15:52, Noah Goldstein wrote:
> > On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 04/04/2022 15:23, Noah Goldstein wrote:
> >>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
> >>> <adhemerval.zanella@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 01/04/2022 15:01, Cristian Rodríguez wrote:
> >>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >>>>>
> >>>>>> AFAIK our goal is entropy more so than security. For example
> >>>>>> if this is used to generate jiffies to stagger threads its not a security
> >>>>>> issue in any sense, it's just not ideal for performance.
> >>>>>
> >>>>> In any case this should be more than fast enough for the other use
> >>>>> cases of random_bits() .. maybe one new random_bits_fast() function
> >>>>> foe edge cases where even this is too slow?
> >>>>
> >>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled
> >>>> and which they have solved 10 years ago [1].  Essentially, we need to
> >>>> come  up with a internal PRNG interface that can be used internally and
> >>>> externally, instead of reinventing cleaver ways to use the timer as
> >>>> entropy source.
> >>>>
> >>>> The issue is not really the cypher used, ideally it could be replace if
> >>>> we find out that it does not fit.  The main issue is to glue together
> >>>> all the requirements to have a concise internal interface, taking in
> >>>> consideration the glibc constrains to work with multiple kernel version
> >>>> and environments (where we can't assume we have access to a source or
> >>>> reliable entropy like getrandom syscall).
> >>>>
> >>>> My plan to rehearse Florian arc4random proposal to have some simpler
> >>>> to where we might improve upon (a simpler fork detection for kernels
> >>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
> >>>> ChaCha20 as virtually all other systems do, no per-thread state).
> >>>
> >>> Why no per-thread / per-cpu? It seems otherwise there will need to be
> >>> some explicit synchronization on the stream.
> >>
> >> Mainly because it simplifies a lot the *initial* implementation. I would
> >> prefer to incremental add per-thread optimization than dump a large patch
> >> so we can review in integrate the code more easier.
> >>
> >>>
> >>>
> >>> Regarding this patch, do we want to skip it and just wait on arc4random
> >>> interface in kernel/glibc or should I go forward with it and some arch
> >>> specific entropy functions in the mean-time?
> >>
> >> I don't have a strong opinion on this patch, it does improve x86_64
> >> latency on random_bits although currently internal usage are far from
> >> latency sensitive.  It is really a microoptimization without much real
> >> work gain for current code.
> >
> > That's fair. The motivation is so random_bits can be used for lightweight
> > jitter i.e in cases like:
> >
> > [v2] nptl: Add backoff mechanism to spinlock loop
> >
> > The x86 backend with `rdtsc` is just because it's a simple improvement,
> > other arch will hopefully be able to get off syscall if they don't have vdso
> > gettime.
> > But agree the improvement gains a minimal so if people don't feel its worth
> > the added complexity we can wait on a strong arch-random interface.
>
> I think for mutex optimization it would be better to just add a arch-specific
> jitter code and use the backoff optimization iff the arch-specific code is
> used. And maybe not tied to random_bits(), since I am not sure if an interface
> like arc4random would be good to use in such scenario (since a possible state
> reschedule might call getentropy with my add unexpected latency).

So you propose drop this patch in favor of some new internal interface like
get_fast_jitter() and a define such as 'HAS_FAST_JITTER'?
  
Adhemerval Zanella Netto April 4, 2022, 7:57 p.m. UTC | #41
On 04/04/2022 16:48, Noah Goldstein wrote:
> On Mon, Apr 4, 2022 at 2:20 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 04/04/2022 15:52, Noah Goldstein wrote:
>>> On Mon, Apr 4, 2022 at 1:38 PM Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 04/04/2022 15:23, Noah Goldstein wrote:
>>>>> On Mon, Apr 4, 2022 at 12:42 PM Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 01/04/2022 15:01, Cristian Rodríguez wrote:
>>>>>>> On Thu, Mar 31, 2022 at 8:05 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>>>>>>
>>>>>>>> AFAIK our goal is entropy more so than security. For example
>>>>>>>> if this is used to generate jiffies to stagger threads its not a security
>>>>>>>> issue in any sense, it's just not ideal for performance.
>>>>>>>
>>>>>>> In any case this should be more than fast enough for the other use
>>>>>>> cases of random_bits() .. maybe one new random_bits_fast() function
>>>>>>> foe edge cases where even this is too slow?
>>>>>>
>>>>>> I think we are bike-shedding in the same issue OpenBSD guys stumbled
>>>>>> and which they have solved 10 years ago [1].  Essentially, we need to
>>>>>> come  up with a internal PRNG interface that can be used internally and
>>>>>> externally, instead of reinventing cleaver ways to use the timer as
>>>>>> entropy source.
>>>>>>
>>>>>> The issue is not really the cypher used, ideally it could be replace if
>>>>>> we find out that it does not fit.  The main issue is to glue together
>>>>>> all the requirements to have a concise internal interface, taking in
>>>>>> consideration the glibc constrains to work with multiple kernel version
>>>>>> and environments (where we can't assume we have access to a source or
>>>>>> reliable entropy like getrandom syscall).
>>>>>>
>>>>>> My plan to rehearse Florian arc4random proposal to have some simpler
>>>>>> to where we might improve upon (a simpler fork detection for kernels
>>>>>> without MADV_WIPEONFORK that just issue a atfork handle, maybe using
>>>>>> ChaCha20 as virtually all other systems do, no per-thread state).
>>>>>
>>>>> Why no per-thread / per-cpu? It seems otherwise there will need to be
>>>>> some explicit synchronization on the stream.
>>>>
>>>> Mainly because it simplifies a lot the *initial* implementation. I would
>>>> prefer to incremental add per-thread optimization than dump a large patch
>>>> so we can review in integrate the code more easier.
>>>>
>>>>>
>>>>>
>>>>> Regarding this patch, do we want to skip it and just wait on arc4random
>>>>> interface in kernel/glibc or should I go forward with it and some arch
>>>>> specific entropy functions in the mean-time?
>>>>
>>>> I don't have a strong opinion on this patch, it does improve x86_64
>>>> latency on random_bits although currently internal usage are far from
>>>> latency sensitive.  It is really a microoptimization without much real
>>>> work gain for current code.
>>>
>>> That's fair. The motivation is so random_bits can be used for lightweight
>>> jitter i.e in cases like:
>>>
>>> [v2] nptl: Add backoff mechanism to spinlock loop
>>>
>>> The x86 backend with `rdtsc` is just because it's a simple improvement,
>>> other arch will hopefully be able to get off syscall if they don't have vdso
>>> gettime.
>>> But agree the improvement gains a minimal so if people don't feel its worth
>>> the added complexity we can wait on a strong arch-random interface.
>>
>> I think for mutex optimization it would be better to just add a arch-specific
>> jitter code and use the backoff optimization iff the arch-specific code is
>> used. And maybe not tied to random_bits(), since I am not sure if an interface
>> like arc4random would be good to use in such scenario (since a possible state
>> reschedule might call getentropy with my add unexpected latency).
> 
> So you propose drop this patch in favor of some new internal interface like
> get_fast_jitter() and a define such as 'HAS_FAST_JITTER'?

Yeah, this would be better than try to improve random_bits.  Also, recently
we are trying to avoid internal HAS/HAVE defines, it is better to add a
generic interface that does either nothing or return a default value
(so default get_fast_jitter() would return 1 or 0, so the mutex code will
se it just do the current spinning without backoff).
  
Cristian Rodríguez April 5, 2022, 12:10 a.m. UTC | #42
On Mon, Apr 4, 2022 at 2:32 PM Jason A. Donenfeld via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>re, since somebody CC'd me into this
> thread. I've already made two RNGs here for various spitballed
> objectives.

I added you to this conversation because I wanted to hear the opinion
from someone with good knowledge on the subject.  in particular to
evaluate the possibility of adding a better, fast PRNG for non-crypto
purposes for internal use of the C library, since the existing code
isn't quite "random",

This revived once again the topic of introducing BSD's arc4random API
to glibc, a subject that has been circulating for quite some time,
  
Jason A. Donenfeld April 5, 2022, 12:18 a.m. UTC | #43
Hi Cristian,

On Tue, Apr 5, 2022 at 2:10 AM Cristian Rodríguez
<crrodriguez@opensuse.org> wrote:
> I added you to this conversation because I wanted to hear the opinion
> from someone with good knowledge on the subject.  in particular to
> evaluate the possibility of adding a better, fast PRNG for non-crypto
> purposes for internal use of the C library, since the existing code
> isn't quite "random",
>
> This revived once again the topic of introducing BSD's arc4random API
> to glibc, a subject that has been circulating for quite some time,

Ahh, okay, thanks for clarifying. So the arc4random stuff isn't really
part of this discussion, but the faster non-crypto prng thing is. In
that case, just go with https://א.cc/xrAIhCvy/c and be done with it?
Any objections to that approach?

Jason
  
Florian Weimer April 5, 2022, 9:20 a.m. UTC | #44
* Jason A. Donenfeld:

>> > _However_, based on what people have said in this thread, this all
>> > seems highly unnecessary, since you just need some boring
>> > statistically uniform randomness without any crypto or security
>> > requirements of any kind, and it simply needs to be fast and dumb. If
>> > that's the wrong set of requirements for this problem (I still have no
>> > idea what the bigger picture here is), please pipe up.
>>
>> If we can make a cryptographically secure generator fast enough, it
>> would relieve programmers of the need to choose between that and another
>> generator that just gives some random-looking bits fast.  If programmers
>> don't have to make a choice, they can't choose incorrectly (introducing
>> performance bugs or security bugs).
>>
>> For example, the system call overhead could be prohibitive for things
>> like hash map keys if these hash maps are not used much before they are
>> discarded again.
>
> Yes, if we want a general interface it's something we should do well
> and have it be cryptographically secure with no downsides bla bla bla.
> As mentioned, we can design something fancy with the vDSO if you think
> there's a compelling use case for this.
>
> Until your comments, though, everybody on this thread told me this was
> just some dinky internal interface only used for some sort of
> threading semantic. In that case, just name it
> "dinky_threading_id_thinger()" and call it a day? We are all talking
> about the same thing, right? Could you confirm/clarify the focus here?
> And if you've misunderstood what this is about, please say so too. I
> see the subsequent messages are now veering off into lala land since
> yours, so I'd like to make sure we're all on the same page, lest we
> waste time over nothing.

We have multiple internal users of random_bits in the glibc source tree.
Some of these users should switch to something stronger because they
are used for security hardening.  So practical unpredictability matters,
although we do not need an approved generator.

Thanks,
Florian
  
Florian Weimer April 5, 2022, 9:22 a.m. UTC | #45
* Noah Goldstein:

> On Mon, Apr 4, 2022 at 10:00 AM Florian Weimer via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> * Jason A. Donenfeld via Libc-alpha:
>>
>> > _However_, based on what people have said in this thread, this all
>> > seems highly unnecessary, since you just need some boring
>> > statistically uniform randomness without any crypto or security
>> > requirements of any kind, and it simply needs to be fast and dumb. If
>> > that's the wrong set of requirements for this problem (I still have no
>> > idea what the bigger picture here is), please pipe up.
>>
>> If we can make a cryptographically secure generator fast enough, it
>> would relieve programmers of the need to choose between that and another
>> generator that just gives some random-looking bits fast.  If programmers
>> don't have to make a choice, they can't choose incorrectly (introducing
>> performance bugs or security bugs).
>
> It sounds like you're talking about creating a user facing API. Since
> random_bits is internal do we really need so much ease of use at the
> cost of performance?

We already have uses of random_bits which are somewhat dodgy, for
example in __res_context_mkquery.

The mutex use case, where speed is strictly more important than
unpredictability, is probably an outlier.

Thanks,
Florian
  
Cristian Rodríguez April 5, 2022, 1:45 p.m. UTC | #46
On Mon, Apr 4, 2022 at 8:18 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Any objections to that approach?

I personally would prefer the siphash-based approach.. which is
probably the best approach for most cases not on a hot codepath..
  

Patch

diff --git a/include/random-bits.h b/include/random-bits.h
index 17665b479a..016b87576c 100644
--- a/include/random-bits.h
+++ b/include/random-bits.h
@@ -19,21 +19,17 @@ 
 #ifndef _RANDOM_BITS_H
 # define _RANDOM_BITS_H
 
-#include <time.h>
-#include <stdint.h>
+# include <random-bits-entropy.h>
+# include <stdint.h>
 
-/* Provides fast pseudo-random bits through clock_gettime.  It has unspecified
-   starting time, nano-second accuracy, its randomness is significantly better
-   than gettimeofday, and for mostly architectures it is implemented through
-   vDSO instead of a syscall.  Since the source is a system clock, the upper
-   bits will have less entropy. */
+/* Provides fast pseudo-random bits through architecture specific
+   random_bits_entropy.  Expectation is source is some timing function so
+   the upper bits have less entropy.  */
 static inline uint32_t
 random_bits (void)
 {
-  struct __timespec64 tv;
-  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
+  uint32_t ret = random_bits_entropy ();
   /* Shuffle the lower bits to minimize the clock bias.  */
-  uint32_t ret = tv.tv_nsec ^ tv.tv_sec;
   ret ^= (ret << 24) | (ret >> 8);
   return ret;
 }
diff --git a/sysdeps/generic/random-bits-entropy.h b/sysdeps/generic/random-bits-entropy.h
new file mode 100644
index 0000000000..53290c7f7a
--- /dev/null
+++ b/sysdeps/generic/random-bits-entropy.h
@@ -0,0 +1,31 @@ 
+/* Fast function for generating entropy of random_bits.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <time.h>
+
+/* Generically use clock_gettime. It has unspecified starting time, nano-second
+   accuracy, its randomness is significantly better than gettimeofday, and for
+   mostly architectures it is implemented through vDSO instead of a syscall.  */
+static inline uint32_t
+random_bits_entropy (void)
+{
+  struct __timespec64 tv;
+  __clock_gettime64 (CLOCK_MONOTONIC, &tv);
+  return tv.tv_nsec ^ tv.tv_sec;
+}