[v8,9/9] manual: Add documentation for arc4random functions

Message ID 20220629213428.3065430-10-adhemerval.zanella@linaro.org
State Superseded
Series Add arc4random support

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Adhemerval Zanella June 29, 2022, 9:34 p.m. UTC
  ---
 manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
  

Comments

Noah Goldstein June 29, 2022, 9:45 p.m. UTC | #1
On Wed, Jun 29, 2022 at 2:36 PM Adhemerval Zanella via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> ---
>  manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/manual/math.texi b/manual/math.texi
> index 477a18b6d1..ab96726e57 100644
> --- a/manual/math.texi
> +++ b/manual/math.texi
> @@ -1447,6 +1447,7 @@ systems.
>  * ISO Random::                  @code{rand} and friends.
>  * BSD Random::                  @code{random} and friends.
>  * SVID Random::                 @code{drand48} and friends.
> +* High Quality Random::         @code{arc4random} and friends.
>  @end menu
>
>  @node ISO Random
> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
>  programs.
>  @end deftypefun
>
> +@node High Quality Random
> +@subsection High Quality Random Number Functions
> +
> +This section describes the random number functions provided as a GNU
> +extension, based on OpenBSD interfaces.
> +
> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
> +internal state.  A per-thread data pool is used, which allows fast output
> +generation.
> +

Are we committing to per-thread data pools? I thought there were ideas to
use rseq.


> +Although these functions provide higher random quality than ISO, BSD, and
> +SVID functions, these still use a Pseudo-Random generator and should not
> +be used in cryptographic contexts.
> +
> +The internal state is cleared and reseed with kernel entropy on @code{fork}
> +and @code{_Fork}.  It is not cleared for either direct @code{clone} syscall
> +or when using @theglibc{} @code{syscall} function.
> +
> +The prototypes for these functions are in @file{stdlib.h}.
> +@pindex stdlib.h
> +
> +@deftypefun int32_t arc4random (void)
> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function returns a single 32-bit value in the range of @code{0} to
> +@code{2^32−1} (inclusive), which is twice the range of @code{rand} and
> +@code{random}.
> +@end deftypefun
> +
> +@deftypefun void arc4random (void *@var{buffer}, size_t @var{length})
> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function fills the region @var{buffer} of @var{length} with random data.
> +@end deftypefun
> +
> +@deftypefun uint32_t arc4random_uniform (uint32_t @var{upper_bound})
> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function returns a single 32-bit value, uniformly distributed but
> +less than the @var{upper_bound}.  It avoids the @w{modulo bias} when the
> +upper bound is not a power of two.
> +@end deftypefun
> +
>  @node FP Function Optimizations
>  @section Is Fast Code or Small Code preferred?
>  @cindex Optimization
> --
> 2.34.1
>
  
Adhemerval Zanella June 29, 2022, 9:53 p.m. UTC | #2
> On 29 Jun 2022, at 18:45, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> 
> On Wed, Jun 29, 2022 at 2:36 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>> 
>> ---
>> manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 45 insertions(+)
>> 
>> diff --git a/manual/math.texi b/manual/math.texi
>> index 477a18b6d1..ab96726e57 100644
>> --- a/manual/math.texi
>> +++ b/manual/math.texi
>> @@ -1447,6 +1447,7 @@ systems.
>> * ISO Random:: @code{rand} and friends.
>> * BSD Random:: @code{random} and friends.
>> * SVID Random:: @code{drand48} and friends.
>> +* High Quality Random:: @code{arc4random} and friends.
>> @end menu
>> 
>> @node ISO Random
>> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
>> programs.
>> @end deftypefun
>> 
>> +@node High Quality Random
>> +@subsection High Quality Random Number Functions
>> +
>> +This section describes the random number functions provided as a GNU
>> +extension, based on OpenBSD interfaces.
>> +
>> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
>> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
>> +internal state. A per-thread data pool is used, which allows fast output
>> +generation.
>> +
> 
> Are we committing to per-thread data pools? I thought there were ideas to
> use rseq.

For this version yes, since it works on all supported kernels (even the
ones without getentropy support) and on all architectures.  I do not know how
feasible it would be to implement per-cpu caches along with rseq, and it would
require a fallback for older kernels (most likely a per-thread cache as in this
version), although it might be a future improvement.
  
Adhemerval Zanella June 29, 2022, 9:55 p.m. UTC | #3
> On 29 Jun 2022, at 18:34, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
> ---
> manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
> 
> diff --git a/manual/math.texi b/manual/math.texi
> index 477a18b6d1..ab96726e57 100644
> --- a/manual/math.texi
> +++ b/manual/math.texi
> @@ -1447,6 +1447,7 @@ systems.
> * ISO Random::                  @code{rand} and friends.
> * BSD Random::                  @code{random} and friends.
> * SVID Random::                 @code{drand48} and friends.
> +* High Quality Random::         @code{arc4random} and friends.
> @end menu
> 
> @node ISO Random
> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
> programs.
> @end deftypefun
> 
> +@node High Quality Random
> +@subsection High Quality Random Number Functions
> +
> +This section describes the random number functions provided as a GNU
> +extension, based on OpenBSD interfaces.
> +
> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
> +internal state.  A per-thread data pool is used, which allows fast output
> +generation.
> +
> +Although these functions provide higher random quality than ISO, BSD, and
> +SVID functions, these still use a Pseudo-Random generator and should not
> +be used in cryptographic contexts.
> +
> +The internal state is cleared and reseed with kernel entropy on @code{fork}
> +and @code{_Fork}.  It is not cleared for either direct @code{clone} syscall
> +or when using @theglibc{} @code{syscall} function.
> +
> +The prototypes for these functions are in @file{stdlib.h}.
> +@pindex stdlib.h
> +
> +@deftypefun int32_t arc4random (void)
> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function returns a single 32-bit value in the range of @code{0} to
> +@code{2^32−1} (inclusive), which is twice the range of @code{rand} and
> +@code{random}.
> +@end deftypefun
> +
> +@deftypefun void arc4random (void *@var{buffer}, size_t @var{length})

And this should be arc4random_buf, I have fixed it locally.

> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function fills the region @var{buffer} of @var{length} with random data.
> +@end deftypefun
> +
> +@deftypefun uint32_t arc4random_uniform (uint32_t @var{upper_bound})
> +@standards{BSD, stdlib.h}
> +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
> +This function returns a single 32-bit value, uniformly distributed but
> +less than the @var{upper_bound}.  It avoids the @w{modulo bias} when the
> +upper bound is not a power of two.
> +@end deftypefun
> +
> @node FP Function Optimizations
> @section Is Fast Code or Small Code preferred?
> @cindex Optimization
> -- 
> 2.34.1
>
  
Noah Goldstein June 29, 2022, 10:05 p.m. UTC | #4
On Wed, Jun 29, 2022 at 2:53 PM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> > On 29 Jun 2022, at 18:45, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Wed, Jun 29, 2022 at 2:36 PM Adhemerval Zanella via Libc-alpha
> > <libc-alpha@sourceware.org> wrote:
> >>
> >> ---
> >> manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 45 insertions(+)
> >>
> >> diff --git a/manual/math.texi b/manual/math.texi
> >> index 477a18b6d1..ab96726e57 100644
> >> --- a/manual/math.texi
> >> +++ b/manual/math.texi
> >> @@ -1447,6 +1447,7 @@ systems.
> >> * ISO Random:: @code{rand} and friends.
> >> * BSD Random:: @code{random} and friends.
> >> * SVID Random:: @code{drand48} and friends.
> >> +* High Quality Random:: @code{arc4random} and friends.
> >> @end menu
> >>
> >> @node ISO Random
> >> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
> >> programs.
> >> @end deftypefun
> >>
> >> +@node High Quality Random
> >> +@subsection High Quality Random Number Functions
> >> +
> >> +This section describes the random number functions provided as a GNU
> >> +extension, based on OpenBSD interfaces.
> >> +
> >> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
> >> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
> >> +internal state. A per-thread data pool is used, which allows fast output
> >> +generation.
> >> +
> >
> > Are we committing to per-thread data pools? I thought there were ideas to
> > use rseq.
>
> For this version yes, since it works on all supported kernels (even for the
> ones without getentropy support) and on all architectures.  I do not know how
> feasible it would be to implement per-cpu caches along with rseq and it would
> require a fallback for older kernel (most likely a per-thread cache as this
> version), although it might be future improvement.

I guess the question is whether we want to explicitly say per-thread buffer
if we may want to experiment with something else?

Just seems like the kind of thing that might make it impossible to re-implement
another way.

What about something like:

"The data-pool is implemented to minimize cross-core contention
allowing fast output generation"?
  
Yann Droneaud June 30, 2022, 7:58 a.m. UTC | #5
Hi,

Le 30/06/2022 à 00:05, Noah Goldstein via Libc-alpha a écrit :
> On Wed, Jun 29, 2022 at 2:53 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>> On 29 Jun 2022, at 18:45, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>>
>>> On Wed, Jun 29, 2022 at 2:36 PM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>> ---
>>>> manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 45 insertions(+)
>>>>
>>>> diff --git a/manual/math.texi b/manual/math.texi
>>>> index 477a18b6d1..ab96726e57 100644
>>>> --- a/manual/math.texi
>>>> +++ b/manual/math.texi
>>>> @@ -1447,6 +1447,7 @@ systems.
>>>> * ISO Random:: @code{rand} and friends.
>>>> * BSD Random:: @code{random} and friends.
>>>> * SVID Random:: @code{drand48} and friends.
>>>> +* High Quality Random:: @code{arc4random} and friends.
>>>> @end menu
>>>>
>>>> @node ISO Random
>>>> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
>>>> programs.
>>>> @end deftypefun
>>>>
>>>> +@node High Quality Random
>>>> +@subsection High Quality Random Number Functions
>>>> +
>>>> +This section describes the random number functions provided as a GNU
>>>> +extension, based on OpenBSD interfaces.
>>>> +
>>>> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
>>>> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
>>>> +internal state. A per-thread data pool is used, which allows fast output
>>>> +generation.
>>>> +
>>> Are we committing to per-thread data pools? I thought there were ideas to
>>> use rseq.
>> For this version yes, since it works on all supported kernels (even for the
>> ones without getentropy support) and on all architectures.  I do not know how
>> feasible it would be to implement per-cpu caches along with rseq and it would
>> require a fallback for older kernel (most likely a per-thread cache as this
>> version), although it might be future improvement.
> I guess do we want to explicitly say per-thread buffer if we may want
> to experiment
> with something else?
>
> Just seems like the kind of thing that might make it impossible to re-implement
> another way.
>
> What about something like:
>
> "The data-pool is implemented to minimize cross-core contention
> allowing fast output generation"?


"Each thread has its own independent random stream"
  
Adhemerval Zanella June 30, 2022, 7:37 p.m. UTC | #6
> On 29 Jun 2022, at 19:05, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> 
> On Wed, Jun 29, 2022 at 2:53 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>> 
>> 
>> 
>>> On 29 Jun 2022, at 18:45, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>> 
>>> On Wed, Jun 29, 2022 at 2:36 PM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>> 
>>>> ---
>>>> manual/math.texi | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 45 insertions(+)
>>>> 
>>>> diff --git a/manual/math.texi b/manual/math.texi
>>>> index 477a18b6d1..ab96726e57 100644
>>>> --- a/manual/math.texi
>>>> +++ b/manual/math.texi
>>>> @@ -1447,6 +1447,7 @@ systems.
>>>> * ISO Random:: @code{rand} and friends.
>>>> * BSD Random:: @code{random} and friends.
>>>> * SVID Random:: @code{drand48} and friends.
>>>> +* High Quality Random:: @code{arc4random} and friends.
>>>> @end menu
>>>> 
>>>> @node ISO Random
>>>> @@ -1985,6 +1986,50 @@ This function is a GNU extension and should not be used in portable
>>>> programs.
>>>> @end deftypefun
>>>> 
>>>> +@node High Quality Random
>>>> +@subsection High Quality Random Number Functions
>>>> +
>>>> +This section describes the random number functions provided as a GNU
>>>> +extension, based on OpenBSD interfaces.
>>>> +
>>>> +@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
>>>> +or by reading @file{/dev/urandom} to seed and periodically re-seed the
>>>> +internal state. A per-thread data pool is used, which allows fast output
>>>> +generation.
>>>> +
>>> 
>>> Are we committing to per-thread data pools? I thought there were ideas to
>>> use rseq.
>> 
>> For this version yes, since it works on all supported kernels (even for the
>> ones without getentropy support) and on all architectures.  I do not know how
>> feasible it would be to implement per-cpu caches along with rseq and it would
>> require a fallback for older kernel (most likely a per-thread cache as this
>> version), although it might be future improvement.
> 
> I guess do we want to explicitly say per-thread buffer if we may want
> to experiment
> with something else?
> 
> Just seems like the kind of thing that might make it impossible to re-implement
> another way.
> 
> What about something like:
> 
> "The data-pool is implemented to minimize cross-core contention
> allowing fast output generation"?

I take it the idea is to avoid adding too much implementation detail to the
manual, although neither the documentation nor the arc4random specification
defines performance or complexity details (as is done for some STL containers,
for instance). If we ever change the arc4random implementation, I would expect
us to update this description in the documentation.

But it does make sense to define expectations, so I will update it with your
suggestion (and Yann's remarks).
  

Patch

diff --git a/manual/math.texi b/manual/math.texi
index 477a18b6d1..ab96726e57 100644
--- a/manual/math.texi
+++ b/manual/math.texi
@@ -1447,6 +1447,7 @@  systems.
 * ISO Random::                  @code{rand} and friends.
 * BSD Random::                  @code{random} and friends.
 * SVID Random::                 @code{drand48} and friends.
+* High Quality Random::         @code{arc4random} and friends.
 @end menu
 
 @node ISO Random
@@ -1985,6 +1986,50 @@  This function is a GNU extension and should not be used in portable
 programs.
 @end deftypefun
 
+@node High Quality Random
+@subsection High Quality Random Number Functions
+
+This section describes the random number functions provided as a GNU
+extension, based on OpenBSD interfaces.
+
+@Theglibc{} uses kernel entropy obtained either through @code{getrandom}
+or by reading @file{/dev/urandom} to seed and periodically re-seed the
+internal state.  A per-thread data pool is used, which allows fast output
+generation.
+
+Although these functions provide higher random quality than ISO, BSD, and
+SVID functions, these still use a Pseudo-Random generator and should not
+be used in cryptographic contexts.
+
+The internal state is cleared and reseeded with kernel entropy on @code{fork}
+and @code{_Fork}.  It is not cleared on a direct @code{clone} syscall or
+when using @theglibc{} @code{syscall} function.
+
+The prototypes for these functions are in @file{stdlib.h}.
+@pindex stdlib.h
+
+@deftypefun uint32_t arc4random (void)
+@standards{BSD, stdlib.h}
+@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
+This function returns a single 32-bit value in the range of @code{0} to
+@code{2^32−1} (inclusive), which is twice the range of @code{rand} and
+@code{random}.
+@end deftypefun
+
+@deftypefun void arc4random_buf (void *@var{buffer}, size_t @var{length})
+@standards{BSD, stdlib.h}
+@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
+This function fills the region @var{buffer} of @var{length} with random data.
+@end deftypefun
+
+@deftypefun uint32_t arc4random_uniform (uint32_t @var{upper_bound})
+@standards{BSD, stdlib.h}
+@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}}
+This function returns a single 32-bit value, uniformly distributed but
+less than the @var{upper_bound}.  It avoids the @w{modulo bias} when the
+upper bound is not a power of two.
+@end deftypefun
+
 @node FP Function Optimizations
 @section Is Fast Code or Small Code preferred?
 @cindex Optimization