Give a useful meaning to arc4random_uniform(0);

Message ID 20221231023653.41877-1-alx@kernel.org
State Not applicable
Headers
Series Give a useful meaning to arc4random_uniform(0); |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Alejandro Colomar Dec. 31, 2022, 2:36 a.m. UTC
  Special-casing it in the implementation to return 0 was useless.
Instead, considering 0 as the value after UINT32_MAX has the property
that it allows implementing the following function without any
special cases:

uint32_t
arc4random_range(uint32_t min, uint32_t max)
{
	return arc4random_uniform(max - min + 1) + min;
}

This works for any values of min and max (as long as min <= max, of
course), even for (0, UINT32_MAX).

Oh, and the implementation of arc4random_uniform(3) is now 2 lines
simpler. :)

This will work with the current implementation because powerof2(3) will
also consider 0 as a power of 2.  See powerof2(3):

 SYNOPSIS
       #include <sys/param.h>

       int powerof2(x);

 DESCRIPTION
       This  macro  returns true if x is a power of 2, and false
       otherwise.

       0 is considered a power of 2.  This can make  sense  con‐
       sidering wrapping of unsigned integers, and has interest‐
       ing properties.

Cc: Theo de Raadt <deraadt@theos.com>
Cc: Todd C. Miller <Todd.Miller@sudo.ws>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Hi,

I CCd Theo and Todd, because theirs is the original implementation, and
while this is a useful feature (IMO), it wouldn't make sense to do it
without consensus with other implementations, and especially with the
original implementation.

I found this useful for shadow, where the existing code had a function
that produced a "random" value within a range, but due to the bogus
implementation, it had bias for higher values.  Implementing a *_range()
variant in terms of *_uniform() made it really simple, but the
*_uniform() function needed to do something useful for 0 for that to
work.

Cheers,

Alex

 stdlib/arc4random_uniform.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
  

Comments

Alejandro Colomar Dec. 31, 2022, 2:48 a.m. UTC | #1
On 12/31/22 03:36, Alejandro Colomar wrote:
> Special-casing it in the implementation to return 0 was useless.
> Instead, considering 0 as the value after UINT32_MAX has the property
> that it allows implementing the following function without any
> special cases:
> 
> uint32_t
> arc4random_range(uint32_t min, uint32_t max)
> {
> 	return arc4random_uniform(max - min + 1) + min;
> }
> 
> This works for any values of min and max (as long as min <= max, of
> course), even for (0, UINT32_MAX).
> 
> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
> simpler. :)
> 
> This will work with the current implementation because powerof2(3) will
> also consider 0 as a power of 2.  See powerof2(3):
> 
>   SYNOPSIS
>         #include <sys/param.h>
> 
>         int powerof2(x);
> 
>   DESCRIPTION
>         This  macro  returns true if x is a power of 2, and false
>         otherwise.
> 
>         0 is considered a power of 2.  This can make  sense  con‐
>         sidering wrapping of unsigned integers, and has interest‐
>         ing properties.
> 
> Cc: Theo de Raadt <deraadt@theos.com>
> Cc: Todd C. Miller <Todd.Miller@sudo.ws>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: Yann Droneaud <ydroneaud@opteya.com>
> Cc: Joseph Myers <joseph@codesourcery.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> 
> Hi,
> 
> I CCd Theo and Todd, because theirs is the original implementation, and
> while this is a useful feature (IMO), it wouldn't make sense to do it
> without consensus with other implementations, and especially with the
> original implementation.
> 
> I found this useful for shadow, where the existing code had a function
> that produced a "random" value within a range, but due to the bogus
> implementation, it had bias for higher values.  Implementing a *_range()
> variant in terms of *_uniform() made it really simple, but the
> *_uniform() function needed to do something useful for 0 for that to
> work.

I forgot to link to the situation mentioned above:

<https://github.com/shadow-maint/shadow/pull/624>

> 
> Cheers,
> 
> Alex
> 
>   stdlib/arc4random_uniform.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> index 5aa98d1c13..1cd52c0d1c 100644
> --- a/stdlib/arc4random_uniform.c
> +++ b/stdlib/arc4random_uniform.c
> @@ -29,15 +29,13 @@
>      the asked range after range adjustment.
>   
>      The algorithm avoids modulo and divide operations, which might be costly
> -   depending on the architecture.  */
> +   depending on the architecture.
> +
> +   0 is treated as if it were UINT32_MAX + 1, and so arc4random_uniform(0)
> +   is equivalent to arc4random().  */
>   uint32_t
>   __arc4random_uniform (uint32_t n)
>   {
> -  if (n <= 1)
> -    /* There is no valid return value for a zero limit, and 0 is the
> -       only possible result for limit 1.  */
> -    return 0;
> -
>     /* Powers of two are easy.  */
>     if (powerof2 (n))
>       return __arc4random () & (n - 1);

-- 
<http://www.alejandro-colomar.es/>
  
Jason A. Donenfeld Dec. 31, 2022, 2:57 a.m. UTC | #2
Did you happen to CC me here because you noticed I did exactly the
same thing in the Linux kernel recently? ;-)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f576b2593a978451416424e75f69ad1e3ae4efe
  
Theo de Raadt Dec. 31, 2022, 8:50 a.m. UTC | #3
Your proposal failed to update the manual page.  That was my first
hint that somethign is amiss.

Right now the manual it is super clear that it returns a value "less
than upper_bound".  Which by prototype is uint32_t.

How would you rewrite the manual page to explain this behaviour?

Since your change choose a strange condition where it will return
a value greater than upper_bound.

Also, right now an (incorrect?) call of arc4random_uniform(0)
will return 0, but with your proposal it will return a non-zero
number.  Have you audited the entire universe of software to
ensure that your change doesn't introduce a bug in some other
piece of software?  I doubt you did that.  Very unprofessional
of you to not study the impact and just wave the issue away.

I think Special-casing the value of 0 to mean something new
and undocumented behaviour makes no sense.  It is even potentially
undocumentable.

The decision to return 0 was made very carefully.  It results programs
passing mathematically calculated erroneous values not going further off
the rails in a compliant fashion.  If they are using the value to index
into an array, they will tend to access in-range, rather than out of
range.  Do you like introducing memory safety bugs into software you
did not review but which will be affected by your proposal to change
a standard well-known API which has been in use since 2008.

I do not like your proposal at all.  A function like arc4random_range()
is even more likely to be used wrong by passing backwards points
and you seem to have a lot of hubris to add a range check to it.
Is this really all just "everyone else's big code will be perfect like
my 12 lines of code"?

Also, you have not cc'd all authors of this layer.  I am adding them.  


Alejandro Colomar <alx.manpages@gmail.com> wrote:

> Special-casing it in the implementation to return 0 was useless.
> Instead, considering 0 as the value after UINT32_MAX has the property
> that it allows implementing the following function without any
> special cases:
> 
> uint32_t
> arc4random_range(uint32_t min, uint32_t max)
> {
> 	return arc4random_uniform(max - min + 1) + min;
> }
> 
> This works for any values of min and max (as long as min <= max, of
> course), even for (0, UINT32_MAX).
> 
> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
> simpler. :)
> 
> This will work with the current implementation because powerof2(3) will
> also consider 0 as a power of 2.  See powerof2(3):
> 
>  SYNOPSIS
>        #include <sys/param.h>
> 
>        int powerof2(x);
> 
>  DESCRIPTION
>        This  macro  returns true if x is a power of 2, and false
>        otherwise.
> 
>        0 is considered a power of 2.  This can make  sense  con‐
>        sidering wrapping of unsigned integers, and has interest‐
>        ing properties.
> 
> Cc: Theo de Raadt <deraadt@theos.com>
> Cc: Todd C. Miller <Todd.Miller@sudo.ws>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: Yann Droneaud <ydroneaud@opteya.com>
> Cc: Joseph Myers <joseph@codesourcery.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> 
> Hi,
> 
> I CCd Theo and Todd, because theirs is the original implementation, and
> while this is a useful feature (IMO), it wouldn't make sense to do it
> without consensus with other implementations, and especially with the
> original implementation.
> 
> I found this useful for shadow, where the existing code had a function
> that produced a "random" value within a range, but due to the bogus
> implementation, it had bias for higher values.  Implementing a *_range()
> variant in terms of *_uniform() made it really simple, but the
> *_uniform() function needed to do something useful for 0 for that to
> work.
> 
> Cheers,
> 
> Alex
> 
>  stdlib/arc4random_uniform.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> index 5aa98d1c13..1cd52c0d1c 100644
> --- a/stdlib/arc4random_uniform.c
> +++ b/stdlib/arc4random_uniform.c
> @@ -29,15 +29,13 @@
>     the asked range after range adjustment.
>  
>     The algorithm avoids modulo and divide operations, which might be costly
> -   depending on the architecture.  */
> +   depending on the architecture.
> +
> +   0 is treated as if it were UINT32_MAX + 1, and so arc4random_uniform(0)
> +   is equivalent to arc4random().  */
>  uint32_t
>  __arc4random_uniform (uint32_t n)
>  {
> -  if (n <= 1)
> -    /* There is no valid return value for a zero limit, and 0 is the
> -       only possible result for limit 1.  */
> -    return 0;
> -
>    /* Powers of two are easy.  */
>    if (powerof2 (n))
>      return __arc4random () & (n - 1);
> -- 
> 2.39.0
>
  
Theo de Raadt Dec. 31, 2022, 8:51 a.m. UTC | #4
Your proposal failed to update the manual page.  That was my first
hint that somethign is amiss.

Right now the manual it is super clear that it returns a value "less
than upper_bound".  Which by prototype is uint32_t.

How would you rewrite the manual page to explain this behaviour?

Since your change choose a strange condition where it will return
a value greater than upper_bound.

Also, right now an (incorrect?) call of arc4random_uniform(0)
will return 0, but with your proposal it will return a non-zero
number.  Have you audited the entire universe of software to
ensure that your change doesn't introduce a bug in some other
piece of software?  I doubt you did that.  Very unprofessional
of you to not study the impact and just wave the issue away.

I think Special-casing the value of 0 to mean something new
and undocumented behaviour makes no sense.  It is even potentially
undocumentable.

The decision to return 0 was made very carefully.  It results programs
passing mathematically calculated erroneous values not going further off
the rails in a compliant fashion.  If they are using the value to index
into an array, they will tend to access in-range, rather than out of
range.  Do you like introducing memory safety bugs into software you
did not review but which will be affected by your proposal to change
a standard well-known API which has been in use since 2008.

I do not like your proposal at all.  A function like arc4random_range()
is even more likely to be used wrong by passing backwards points
and you seem to have a lot of hubris to add a range check to it.
Is this really all just "everyone else's big code will be perfect like
my 12 lines of code"?

Also, you have not cc'd all authors of this layer.  I am adding them.  


Alejandro Colomar <alx.manpages@gmail.com> wrote:

> Special-casing it in the implementation to return 0 was useless.
> Instead, considering 0 as the value after UINT32_MAX has the property
> that it allows implementing the following function without any
> special cases:
> 
> uint32_t
> arc4random_range(uint32_t min, uint32_t max)
> {
> 	return arc4random_uniform(max - min + 1) + min;
> }
> 
> This works for any values of min and max (as long as min <= max, of
> course), even for (0, UINT32_MAX).
> 
> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
> simpler. :)
> 
> This will work with the current implementation because powerof2(3) will
> also consider 0 as a power of 2.  See powerof2(3):
> 
>  SYNOPSIS
>        #include <sys/param.h>
> 
>        int powerof2(x);
> 
>  DESCRIPTION
>        This  macro  returns true if x is a power of 2, and false
>        otherwise.
> 
>        0 is considered a power of 2.  This can make  sense  con‐
>        sidering wrapping of unsigned integers, and has interest‐
>        ing properties.
> 
> Cc: Theo de Raadt <deraadt@theos.com>
> Cc: Todd C. Miller <Todd.Miller@sudo.ws>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: Yann Droneaud <ydroneaud@opteya.com>
> Cc: Joseph Myers <joseph@codesourcery.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> 
> Hi,
> 
> I CCd Theo and Todd, because theirs is the original implementation, and
> while this is a useful feature (IMO), it wouldn't make sense to do it
> without consensus with other implementations, and especially with the
> original implementation.
> 
> I found this useful for shadow, where the existing code had a function
> that produced a "random" value within a range, but due to the bogus
> implementation, it had bias for higher values.  Implementing a *_range()
> variant in terms of *_uniform() made it really simple, but the
> *_uniform() function needed to do something useful for 0 for that to
> work.
> 
> Cheers,
> 
> Alex
> 
>  stdlib/arc4random_uniform.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
> index 5aa98d1c13..1cd52c0d1c 100644
> --- a/stdlib/arc4random_uniform.c
> +++ b/stdlib/arc4random_uniform.c
> @@ -29,15 +29,13 @@
>     the asked range after range adjustment.
>  
>     The algorithm avoids modulo and divide operations, which might be costly
> -   depending on the architecture.  */
> +   depending on the architecture.
> +
> +   0 is treated as if it were UINT32_MAX + 1, and so arc4random_uniform(0)
> +   is equivalent to arc4random().  */
>  uint32_t
>  __arc4random_uniform (uint32_t n)
>  {
> -  if (n <= 1)
> -    /* There is no valid return value for a zero limit, and 0 is the
> -       only possible result for limit 1.  */
> -    return 0;
> -
>    /* Powers of two are easy.  */
>    if (powerof2 (n))
>      return __arc4random () & (n - 1);
> -- 
> 2.39.0
>
  
Alejandro Colomar Dec. 31, 2022, 1:39 p.m. UTC | #5
Hey Jason!

On 12/31/22 03:57, Jason A. Donenfeld wrote:
> Did you happen to CC me here because you noticed I did exactly the
> same thing in the Linux kernel recently? ;-)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f576b2593a978451416424e75f69ad1e3ae4efe

Ahh, I didn't know that.  I CCd you because of the lengthy discussion in the 
libc-alpha list about the introduction of arc4random(3).  I first CCd you in 
patch to shadow where I prefered getrandom(2) over arc4random(3) for problems 
with libbsd.

However, it's interesting that we came to the same interfaces with completely 
independent development.  That gives more strength to my feeling that it's a 
good interface.  I'll detail more in a reply to Theo that I'll write next :)

Cheers,

Alex

P.S.: Thanks for the review in the shadow PR!

-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 31, 2022, 2:56 p.m. UTC | #6
Hello Theo,

I'm going to state it in case it wasn't obvious:

This patch was for glibc, which can be inferred from the fact that the email was 
sent TO libc-alpha@, and all other addresses were just CCd.

Oh, and sorry about the strong words if any.  It's not my intention to attack 
you personally, but it's already been several times that you attacked me 
personally, and I expect that my words will not be the kindest that I'd like to 
write, even if I will try to.  Please, understand.

On 12/31/22 09:51, Theo de Raadt wrote:
> Your proposal failed to update the manual page.

There's still no manual page for arc4random(3); so there's no way I could have 
updated it.  Since the function was added only a few months ago to glibc, it 
makes sense that it's not documented; although, I would have preferred that the 
manual page had been developed together with the function itself.  Since I'm 
subscribed to libc-alpha@, I read some emails discussing the inclussion of the 
function, but I didn't read it entirely, and I don't follow closely the glibc 
development, so I didn't know when the function was officially added and later 
released.  I rely considerably on contributions form glibc developers that send 
new manual pages, because I wouldn't be able to document by myself everything 
that goes new into Linux and glibc.

So, I was waiting for someone to add a manual page for arc4random(3) into the 
Linux man-pages.  However, now that I've investigated myself enough on this 
topic, I feel ready to document it myself.

>  That was my first
> hint that somethign is amiss.
> 
> Right now the manual it is super clear that it returns a value "less
> than upper_bound".  Which by prototype is uint32_t.

Is it super clear?  That's a lie.  For an input of 0, it returns 0.  In which 
world is that a value "less that"?  In C, (0 < 0) always evaluates to false.

> 
> How would you rewrite the manual page to explain this behaviour?

I had struggles yesterday to come up with a description of the function, but 
overnight I came up with an idea that would even solve the lie in the current 
OpenBSD page.

"
This function returns a value in the range [0, upper_bound - 1].
"

That wording is valid for current non-0 input to arc4random_uniform(3), and 
thanks to unsigned integers wrapping around, has the property that for an input 
of 0, it is effectively [0, UINT32_MAX].

> 
> Since your change choose a strange condition where it will return
> a value greater than upper_bound.

0 isn't less than 0 either.  At least, my proposal is consistent with 
wrap-around of unsigned integers.

> 
> Also, right now an (incorrect?) call of arc4random_uniform(0)

The arc4random_uniform(3) manual page in OpenBSD doesn't seem to declare this 
call to be incorrect.

RETURN VALUES
        These functions are always successful, and no return value is  reserved
        to indicate an error.


And there's no ERRORS or CAVEATS sections.  You should specify 0 as an error 
code for the invalid input 0 (in your implementation).

> will return 0, but with your proposal it will return a non-zero
> number.  Have you audited the entire universe of software to
> ensure that your change doesn't introduce a bug in some other
> piece of software?  I doubt you did that.  Very unprofessional
> of you to not study the impact and just wave the issue away.

Since it's impossible, I didn't.  Considering that this was a patch to glibc, 
which only added this function a few months ago, there probably isn't any 
software that relies on this under glibc.  However, it would be odd to add a 
different behavior to glibc than in OpenBSD, since it would surprise porters.

> 
> I think Special-casing the value of 0 to mean something new
> and undocumented behaviour makes no sense.  It is even potentially
> undocumentable.

Actually, you're already special-casing 0 to mean something different than "this 
returns less than upper_bound".  And it's undocumented.  Returning 42 would be 
as bad as 0 with your documentation.  Therefore, I'd call this a fix, rather 
than a regression.  After this patch, it can be documented.

> 
> The decision to return 0 was made very carefully.  It results programs
> passing mathematically calculated erroneous values not going further off
> the rails in a compliant fashion.  If they are using the value to index
> into an array, they will tend to access in-range, rather than out of
> range.

I'll try to turn your own design decisions against you.  Let's bring strlcpy(3) 
into the table, which I think has some nice features.

strlcpy(3) crashes on invalid input, by accessing (reading) the input array out 
of bounds, if the input was invalid.  This allegedly helped find several cases 
of invalid input, didn't it?  IIRC, you told me that.

arc4random_uniform(3), instead, when the input is invalid, it silently returns a 
value that is likely to be confused with a valid index (it's really off-by-one, 
since it's one greater than the maximum valid index), and will allow programs to 
continue with invalid behavior, instead of just crashing.  If it returned a 
random 32-bit integer, the chances are high that any access will just crash, and 
thus it would help uncover erroneous calculations.

>  Do you like introducing memory safety bugs into software you
> did not review but which will be affected by your proposal to change
> a standard well-known API which has been in use since 2008.

Of course not.  I think this was unnecessary of you to ask, and feels to me like 
a personal attack.  At least you didn't mention ego this time.  Please, come 
back to the technical discussion.

> 
> I do not like your proposal at all.  A function like arc4random_range()
> is even more likely to be used wrong by passing backwards points
> and you seem to have a lot of hubris to add a range check to it.

Oh, I just checked hubris in the dictionary and it seems you did mention ego. 
I'll try to rebate you with something useful.

If you run `grep -rn 'arc4random_uniform('` in the OpenBSD tree, there will be 
many cases where you'd really benefit from this.  I'll just pick a few:


sys/net/pf_lb.c:224:
			cut = arc4random_uniform(1 + high - low) + low;
better as:
			cut = arc4random_range(low, high);


sys/kern/kern_fork.c:648:
		pid = 2 + arc4random_uniform(PID_MAX - 1);
better as:
		pid = arc4random_range(2, PID_MAX);


usr.bin/nc/netcat.c:1501:
				cp = arc4random_uniform(x + 1);
better as:
				cp = arc4random_range(0, x);


> Is this really all just "everyone else's big code will be perfect like
> my 12 lines of code"?
> 
> Also, you have not cc'd all authors of this layer.  I am adding them.

Thanks.  I didn't know all of the authors.

Cheers, honestly,
Alex

> 
> 
> Alejandro Colomar <alx.manpages@gmail.com> wrote:
> 
>> Special-casing it in the implementation to return 0 was useless.
>> Instead, considering 0 as the value after UINT32_MAX has the property
>> that it allows implementing the following function without any
>> special cases:
>>
>> uint32_t
>> arc4random_range(uint32_t min, uint32_t max)
>> {
>> 	return arc4random_uniform(max - min + 1) + min;
>> }
>>
>> This works for any values of min and max (as long as min <= max, of
>> course), even for (0, UINT32_MAX).
>>
>> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
>> simpler. :)
>>
>> This will work with the current implementation because powerof2(3) will
>> also consider 0 as a power of 2.  See powerof2(3):
>>
>>   SYNOPSIS
>>         #include <sys/param.h>
>>
>>         int powerof2(x);
>>
>>   DESCRIPTION
>>         This  macro  returns true if x is a power of 2, and false
>>         otherwise.
>>
>>         0 is considered a power of 2.  This can make  sense  con‐
>>         sidering wrapping of unsigned integers, and has interest‐
>>         ing properties.
>>
>> Cc: Theo de Raadt <deraadt@theos.com>
>> Cc: Todd C. Miller <Todd.Miller@sudo.ws>
>> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
>> Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
>> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>> Cc: Yann Droneaud <ydroneaud@opteya.com>
>> Cc: Joseph Myers <joseph@codesourcery.com>
>> Signed-off-by: Alejandro Colomar <alx@kernel.org>
>> ---
>>
>> Hi,
>>
>> I CCd Theo and Todd, because theirs is the original implementation, and
>> while this is a useful feature (IMO), it wouldn't make sense to do it
>> without consensus with other implementations, and especially with the
>> original implementation.
>>
>> I found this useful for shadow, where the existing code had a function
>> that produced a "random" value within a range, but due to the bogus
>> implementation, it had bias for higher values.  Implementing a *_range()
>> variant in terms of *_uniform() made it really simple, but the
>> *_uniform() function needed to do something useful for 0 for that to
>> work.
>>
>> Cheers,
>>
>> Alex
>>
>>   stdlib/arc4random_uniform.c | 10 ++++------
>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
>> index 5aa98d1c13..1cd52c0d1c 100644
>> --- a/stdlib/arc4random_uniform.c
>> +++ b/stdlib/arc4random_uniform.c
>> @@ -29,15 +29,13 @@
>>      the asked range after range adjustment.
>>   
>>      The algorithm avoids modulo and divide operations, which might be costly
>> -   depending on the architecture.  */
>> +   depending on the architecture.
>> +
>> +   0 is treated as if it were UINT32_MAX + 1, and so arc4random_uniform(0)
>> +   is equivalent to arc4random().  */
>>   uint32_t
>>   __arc4random_uniform (uint32_t n)
>>   {
>> -  if (n <= 1)
>> -    /* There is no valid return value for a zero limit, and 0 is the
>> -       only possible result for limit 1.  */
>> -    return 0;
>> -
>>     /* Powers of two are easy.  */
>>     if (powerof2 (n))
>>       return __arc4random () & (n - 1);
>> -- 
>> 2.39.0
>>

-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 31, 2022, 3:13 p.m. UTC | #7
Hi Theo,

On 12/31/22 15:56, Alejandro Colomar wrote:
>>
>> I do not like your proposal at all.  A function like arc4random_range()
>> is even more likely to be used wrong by passing backwards points
>> and you seem to have a lot of hubris to add a range check to it.

I didn't understand the entire sentence, since I'm not a native English speaker. 
  Sorry for that.  About adding a range check, I'm not against it.  But what to 
do in that case?  abort()?  I don't see anything significantly better?  In the 
Linux kernel, the used something BUILD_BUG, but I don't know those macros very much.

I'm really open to discussion about what would the the best behavior when max < min.

Cheers,

Alex

> 
> Oh, I just checked hubris in the dictionary and it seems you did mention ego. 
> I'll try to rebate you with something useful.
> 
> If you run `grep -rn 'arc4random_uniform('` in the OpenBSD tree, there will be 
> many cases where you'd really benefit from this.  I'll just pick a few:
> 
> 
> sys/net/pf_lb.c:224:
>              cut = arc4random_uniform(1 + high - low) + low;
> better as:
>              cut = arc4random_range(low, high);
> 
> 
> sys/kern/kern_fork.c:648:
>          pid = 2 + arc4random_uniform(PID_MAX - 1);
> better as:
>          pid = arc4random_range(2, PID_MAX);
> 
> 
> usr.bin/nc/netcat.c:1501:
>                  cp = arc4random_uniform(x + 1);
> better as:
>                  cp = arc4random_range(0, x);
> 
> 
-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 31, 2022, 3:17 p.m. UTC | #8
On 12/31/22 16:13, Alejandro Colomar wrote:
> Hi Theo,
> 
> On 12/31/22 15:56, Alejandro Colomar wrote:
>>>
>>> I do not like your proposal at all.  A function like arc4random_range()
>>> is even more likely to be used wrong by passing backwards points
>>> and you seem to have a lot of hubris to add a range check to it.
> 
> I didn't understand the entire sentence, since I'm not a native English speaker. 
>   Sorry for that.  About adding a range check, I'm not against it.  But what to 
> do in that case?  abort()?  I don't see anything significantly better?  In the 

s/better?/better./

> Linux kernel, the used something BUILD_BUG, but I don't know those macros very 

s/the used/they used/

Sorry for the typos.

> much.
> 
> I'm really open to discussion about what would the the best behavior when max < 
> min.
> 
> Cheers,
> 
> Alex
> 
>>
>> Oh, I just checked hubris in the dictionary and it seems you did mention ego. 
>> I'll try to rebate you with something useful.
>>
>> If you run `grep -rn 'arc4random_uniform('` in the OpenBSD tree, there will be 
>> many cases where you'd really benefit from this.  I'll just pick a few:
>>
>>
>> sys/net/pf_lb.c:224:
>>              cut = arc4random_uniform(1 + high - low) + low;
>> better as:
>>              cut = arc4random_range(low, high);
>>
>>
>> sys/kern/kern_fork.c:648:
>>          pid = 2 + arc4random_uniform(PID_MAX - 1);
>> better as:
>>          pid = arc4random_range(2, PID_MAX);
>>
>>
>> usr.bin/nc/netcat.c:1501:
>>                  cp = arc4random_uniform(x + 1);
>> better as:
>>                  cp = arc4random_range(0, x);
>>
>>

-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 31, 2022, 3:59 p.m. UTC | #9
Hi Theo,

On 12/31/22 16:13, Alejandro Colomar wrote:
> Hi Theo,
> 
> On 12/31/22 15:56, Alejandro Colomar wrote:
>>>
>>> I do not like your proposal at all.  A function like arc4random_range()
>>> is even more likely to be used wrong by passing backwards points
>>> and you seem to have a lot of hubris to add a range check to it.
> 
> I didn't understand the entire sentence, since I'm not a native English speaker. 
>   Sorry for that.  About adding a range check, I'm not against it.  But what to 
> do in that case?  abort()?  I don't see anything significantly better?  In the 
> Linux kernel, the used something BUILD_BUG, but I don't know those macros very 
> much.
> 
> I'm really open to discussion about what would the the best behavior when max < 
> min.

Since there's no obvious thing to do when the bounds are reversed (some may want 
to abort(); others may prefer to set errno?; others may just want to ignore the 
possibility...

I tried to understand what it does with the obvious implementation:


 > Alejandro Colomar <alx.manpages@gmail.com> wrote:
 >> uint32_t
 >> arc4random_range(uint32_t min, uint32_t max)
 >> {
 >> 	return arc4random_uniform(max - min + 1) + min;
 >> }

Well, let's substitute with some actual values:

	arc4random_range(7, 4);

This will result in:

	arc4random_uniform(4 - 7 + 1) + 7;

which evaluates to:

	arc4random_uniform(-2) + 7;

and is equivalent to:

	arc4random_uniform(UINT32_MAX - 1) + 7;

Let's first ignore the +7.  The arc4random_uniform(UINT32_MAX - 1) call can 
generate 2^32 - 2 random numbers.  By offsetting to 7, the 2 value that are 
excluded are 6 and 5.

So it seems that a reversed call to arc4random_range(min, max) has an 
interesting property: it generates random numbers outside of the non-inclusive 
range (min, max), that is the compementary set of numbers that arc4random(max, 
min) would produce.

If properly documented, it's not a bad behavior.

So, I'd rather not check bounds, and instead document the behavior when the 
ranges are reversed.  Then, anyone is free to add their own wrapper that adds 
checks and performs their favourite action on reversed input, which might differ 
from one programmer to another.

Cheers,

Alex

> 
> Cheers,
> 
> Alex
> 
>>
>> Oh, I just checked hubris in the dictionary and it seems you did mention ego. 
>> I'll try to rebate you with something useful.
>>
>> If you run `grep -rn 'arc4random_uniform('` in the OpenBSD tree, there will be 
>> many cases where you'd really benefit from this.  I'll just pick a few:
>>
>>
>> sys/net/pf_lb.c:224:
>>              cut = arc4random_uniform(1 + high - low) + low;
>> better as:
>>              cut = arc4random_range(low, high);
>>
>>
>> sys/kern/kern_fork.c:648:
>>          pid = 2 + arc4random_uniform(PID_MAX - 1);
>> better as:
>>          pid = arc4random_range(2, PID_MAX);
>>
>>
>> usr.bin/nc/netcat.c:1501:
>>                  cp = arc4random_uniform(x + 1);
>> better as:
>>                  cp = arc4random_range(0, x);
>>
>>

-- 
<http://www.alejandro-colomar.es/>
  
Alejandro Colomar Dec. 31, 2022, 4:03 p.m. UTC | #10
On 12/31/22 16:59, Alejandro Colomar wrote:
> Hi Theo,
> 
> On 12/31/22 16:13, Alejandro Colomar wrote:
>> Hi Theo,
>>
>> On 12/31/22 15:56, Alejandro Colomar wrote:
>>>>
>>>> I do not like your proposal at all.  A function like arc4random_range()
>>>> is even more likely to be used wrong by passing backwards points
>>>> and you seem to have a lot of hubris to add a range check to it.
>>
>> I didn't understand the entire sentence, since I'm not a native English 
>> speaker.   Sorry for that.  About adding a range check, I'm not against it.  
>> But what to do in that case?  abort()?  I don't see anything significantly 
>> better?  In the Linux kernel, the used something BUILD_BUG, but I don't know 
>> those macros very much.
>>
>> I'm really open to discussion about what would the the best behavior when max 
>> < min.
> 
> Since there's no obvious thing to do when the bounds are reversed (some may want 
> to abort(); others may prefer to set errno?; others may just want to ignore the 
> possibility...
> 
> I tried to understand what it does with the obvious implementation:
> 
> 
>  > Alejandro Colomar <alx.manpages@gmail.com> wrote:
>  >> uint32_t
>  >> arc4random_range(uint32_t min, uint32_t max)
>  >> {
>  >>     return arc4random_uniform(max - min + 1) + min;
>  >> }
> 
> Well, let's substitute with some actual values:
> 
>      arc4random_range(7, 4);
> 
> This will result in:
> 
>      arc4random_uniform(4 - 7 + 1) + 7;
> 
> which evaluates to:
> 
>      arc4random_uniform(-2) + 7;
> 
> and is equivalent to:
> 
>      arc4random_uniform(UINT32_MAX - 1) + 7;
> 
> Let's first ignore the +7.  The arc4random_uniform(UINT32_MAX - 1) call can 
> generate 2^32 - 2 random numbers.  By offsetting to 7, the 2 value that are 
> excluded are 6 and 5.
> 
> So it seems that a reversed call to arc4random_range(min, max) has an 
> interesting property: it generates random numbers outside of the non-inclusive 
> range (min, max),

should have been (max, min).

> that is the compementary set of numbers that arc4random(max, min) would produce.
> 
> If properly documented, it's not a bad behavior.
> 
> So, I'd rather not check bounds, and instead document the behavior when the 
> ranges are reversed.  Then, anyone is free to add their own wrapper that adds 
> checks and performs their favourite action on reversed input, which might differ 
> from one programmer to another.
> 
> Cheers,
> 
> Alex
> 
>>
>> Cheers,
>>
>> Alex
>>
>>>
>>> Oh, I just checked hubris in the dictionary and it seems you did mention ego. 
>>> I'll try to rebate you with something useful.
>>>
>>> If you run `grep -rn 'arc4random_uniform('` in the OpenBSD tree, there will 
>>> be many cases where you'd really benefit from this.  I'll just pick a few:
>>>
>>>
>>> sys/net/pf_lb.c:224:
>>>              cut = arc4random_uniform(1 + high - low) + low;
>>> better as:
>>>              cut = arc4random_range(low, high);
>>>
>>>
>>> sys/kern/kern_fork.c:648:
>>>          pid = 2 + arc4random_uniform(PID_MAX - 1);
>>> better as:
>>>          pid = arc4random_range(2, PID_MAX);
>>>
>>>
>>> usr.bin/nc/netcat.c:1501:
>>>                  cp = arc4random_uniform(x + 1);
>>> better as:
>>>                  cp = arc4random_range(0, x);
>>>
>>>
> 

-- 
<http://www.alejandro-colomar.es/>
  
Damien Miller Dec. 31, 2022, 11:07 p.m. UTC | #11
On Sat, 31 Dec 2022, Theo de Raadt wrote:

> Also, right now an (incorrect?) call of arc4random_uniform(0)
> will return 0, but with your proposal it will return a non-zero
> number.  Have you audited the entire universe of software to
> ensure that your change doesn't introduce a bug in some other
> piece of software?  I doubt you did that.  Very unprofessional
> of you to not study the impact and just wave the issue away.
> 
> I think Special-casing the value of 0 to mean something new
> and undocumented behaviour makes no sense.  It is even potentially
> undocumentable.

I agree - specifying a zero upper-bound is numerically nonsensical,
and could often be the result of a bug in the caller.

Changing it is likely to break code like this in a plausibly exploitable
way:

elem_t *random_elem(elem_t **elems, size_t nelems) {
	return elems[arc4random_uniform(nelems)];
}

Therefore IMO the only safe return from arc4random_uniform(0) is 0.

That changing make it fractionally simpler to implement one particular
wrapper doesn't IMO justify it.

-d
  
Alejandro Colomar Dec. 31, 2022, 11:58 p.m. UTC | #12
Hello Damien,

On 1/1/23 00:07, Damien Miller wrote:
> On Sat, 31 Dec 2022, Theo de Raadt wrote:
> 
>> Also, right now an (incorrect?) call of arc4random_uniform(0)
>> will return 0, but with your proposal it will return a non-zero
>> number.  Have you audited the entire universe of software to
>> ensure that your change doesn't introduce a bug in some other
>> piece of software?  I doubt you did that.  Very unprofessional
>> of you to not study the impact and just wave the issue away.
>>
>> I think Special-casing the value of 0 to mean something new
>> and undocumented behaviour makes no sense.  It is even potentially
>> undocumentable.
> 
> I agree - specifying a zero upper-bound is numerically nonsensical,
> and could often be the result of a bug in the caller.
> 
> Changing it is likely to break code like this in a plausibly exploitable
> way:
> 
> elem_t *random_elem(elem_t **elems, size_t nelems) {
> 	return elems[arc4random_uniform(nelems)];
> }

The above code is already broken.  In case nelems is 0, the array has exactly 0 
elements, so the pointer &elems[0] is a pointer to one-past-the-last element. 
It is legal to hold such a pointer, but not to dereference it (I guess I don't 
need to quote the standard here).

Such a pointer dereference *is dangerous*, and *is very-likely exploitable*.

Having a random 32-bit number instead is likely to be a pointer addressing an 
invalid memory address, and result in a crash.  And crashes are good, right?


Changing the behavior of arc4random_uniform() wouldn't make this code more 
broken, but rather uncover the bug in it.

> 
> Therefore IMO the only safe return from arc4random_uniform(0) is 0.

I'd argue it's the opposite.  0 is the most unsafe value it can return in such a 
case, since it's the least likely to result in a crash.  The Undefined Behavior 
is invoked, and in a way that is likely to modify memory that is available to 
the process.

42 would be a better value.
An even better value would be UINT32_MAX, which would almost-certainly guarantee 
a crash everywhere.
However, it makes more sense to just let it be an unbounded random value, which 
will likely result in the same crashes as with UINT32_MAX, but would be more 
useful for other purposes.

> 
> That changing make it fractionally simpler to implement one particular
> wrapper doesn't IMO justify it.
> 
> -d

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>
  
Ariadne Conill Jan. 1, 2023, 7:48 a.m. UTC | #13
Hi,

On Sun, 1 Jan 2023, Alejandro Colomar via Libc-alpha wrote:

> Hello Damien,
>
> On 1/1/23 00:07, Damien Miller wrote:
>> On Sat, 31 Dec 2022, Theo de Raadt wrote:
>> 
>>> Also, right now an (incorrect?) call of arc4random_uniform(0)
>>> will return 0, but with your proposal it will return a non-zero
>>> number.  Have you audited the entire universe of software to
>>> ensure that your change doesn't introduce a bug in some other
>>> piece of software?  I doubt you did that.  Very unprofessional
>>> of you to not study the impact and just wave the issue away.
>>> 
>>> I think Special-casing the value of 0 to mean something new
>>> and undocumented behaviour makes no sense.  It is even potentially
>>> undocumentable.
>> 
>> I agree - specifying a zero upper-bound is numerically nonsensical,
>> and could often be the result of a bug in the caller.
>> 
>> Changing it is likely to break code like this in a plausibly exploitable
>> way:
>> 
>> elem_t *random_elem(elem_t **elems, size_t nelems) {
>> 	return elems[arc4random_uniform(nelems)];
>> }
>
> The above code is already broken.  In case nelems is 0, the array has exactly 
> 0 elements, so the pointer &elems[0] is a pointer to one-past-the-last 
> element. It is legal to hold such a pointer, but not to dereference it (I 
> guess I don't need to quote the standard here).
>
> Such a pointer dereference *is dangerous*, and *is very-likely exploitable*.
>
> Having a random 32-bit number instead is likely to be a pointer addressing an 
> invalid memory address, and result in a crash.  And crashes are good, right?

In scenarios where available address space is constrained, the likelihood 
of a crash verses state corruption elsewhere in a program is reduced 
considerably.  I think we should avoid defining interfaces based on the 
assumption that some minimal amount of address space is available.

> Changing the behavior of arc4random_uniform() wouldn't make this code more 
> broken, but rather uncover the bug in it.

The better approach would be for random_elem to check that there is at 
least one element available.  Making arc4random_uniform(0) do something 
unexpected will probably break legitimate code.  I can think of many 
situations where arc4random_uniform(0) would be legitimately called.

>> Therefore IMO the only safe return from arc4random_uniform(0) is 0.
>
> I'd argue it's the opposite.  0 is the most unsafe value it can return in 
> such a case, since it's the least likely to result in a crash.  The Undefined 
> Behavior is invoked, and in a way that is likely to modify memory that is 
> available to the process.

If you are using arc4random_uniform to blindly pick elements out of an 
array without doing the bounds checking yourself, you're already setting 
yourself up for failure.  In an ideal world, we would add an additional 
API which is designed for picking elements and which did this type of 
bounds check and then failed safely.  Something like:

inline void *
arc4random_pickelem(void **elems, size_t elemsize, size_t nelems)
{
 	if (__builtin_object_size(elems, 0) < (nelems * elemsize)) {
 		errno = EINVAL;
 		return NULL;
 	}

 	ptrdiff_t diff = arc4random_uniform(nelems) * elemsize;
 	return (char *)(elems + diff);
}

Changing arc4random_uniform(0) to return non-zero will probably break 
monte carlo simulations and such which reasonably depend on the behavior 
that arc4random_uniform(0) == 0.

>
> 42 would be a better value.
> An even better value would be UINT32_MAX, which would almost-certainly 
> guarantee a crash everywhere.
> However, it makes more sense to just let it be an unbounded random value, 
> which will likely result in the same crashes as with UINT32_MAX, but would be 
> more useful for other purposes.

What purpose do you envision where arc4random_uniform(0) being non-zero 
would be considered useful?  If you want to safely pick elements from 
arrays at random, then we should build an interface for doing this safely, 
rather then changing the behavior of pre-existing ones.

Ariadne
  
Theo de Raadt Jan. 1, 2023, 8:34 a.m. UTC | #14
Alejandro,

Your arguments are childishly wrong.

Your API will not be incorporated, and the existing API will
not be changed.

Please stop.

Alejandro Colomar <alx.manpages@gmail.com> wrote:

> Hello Damien,
> 
> On 1/1/23 00:07, Damien Miller wrote:
> > On Sat, 31 Dec 2022, Theo de Raadt wrote:
> > 
> >> Also, right now an (incorrect?) call of arc4random_uniform(0)
> >> will return 0, but with your proposal it will return a non-zero
> >> number.  Have you audited the entire universe of software to
> >> ensure that your change doesn't introduce a bug in some other
> >> piece of software?  I doubt you did that.  Very unprofessional
> >> of you to not study the impact and just wave the issue away.
> >>
> >> I think Special-casing the value of 0 to mean something new
> >> and undocumented behaviour makes no sense.  It is even potentially
> >> undocumentable.
> > I agree - specifying a zero upper-bound is numerically nonsensical,
> > and could often be the result of a bug in the caller.
> > Changing it is likely to break code like this in a plausibly
> > exploitable
> > way:
> > elem_t *random_elem(elem_t **elems, size_t nelems) {
> > 	return elems[arc4random_uniform(nelems)];
> > }
> 
> The above code is already broken.  In case nelems is 0, the array has
> exactly 0 elements, so the pointer &elems[0] is a pointer to
> one-past-the-last element. It is legal to hold such a pointer, but not
> to dereference it (I guess I don't need to quote the standard here).
> 
> Such a pointer dereference *is dangerous*, and *is very-likely exploitable*.
> 
> Having a random 32-bit number instead is likely to be a pointer
> addressing an invalid memory address, and result in a crash.  And
> crashes are good, right?
> 
> 
> Changing the behavior of arc4random_uniform() wouldn't make this code
> more broken, but rather uncover the bug in it.
> 
> > Therefore IMO the only safe return from arc4random_uniform(0) is 0.
> 
> I'd argue it's the opposite.  0 is the most unsafe value it can return
> in such a case, since it's the least likely to result in a crash.  The
> Undefined Behavior is invoked, and in a way that is likely to modify
> memory that is available to the process.
> 
> 42 would be a better value.
> An even better value would be UINT32_MAX, which would almost-certainly
> guarantee a crash everywhere.
> However, it makes more sense to just let it be an unbounded random
> value, which will likely result in the same crashes as with
> UINT32_MAX, but would be more useful for other purposes.
> 
> > That changing make it fractionally simpler to implement one
> > particular
> > wrapper doesn't IMO justify it.
> > -d
> 
> Cheers,
> 
> Alex
> 
> -- 
> <http://www.alejandro-colomar.es/>
  
Theo de Raadt Jan. 1, 2023, 8:41 a.m. UTC | #15
Alejandro Colomar <alx.manpages@gmail.com> wrote:

> Hi Theo,
> 
> On 12/31/22 16:13, Alejandro Colomar wrote:
> > Hi Theo,
> > On 12/31/22 15:56, Alejandro Colomar wrote:
> >>>
> >>> I do not like your proposal at all.  A function like arc4random_range()
> >>> is even more likely to be used wrong by passing backwards points
> >>> and you seem to have a lot of hubris to add a range check to it.
> > I didn't understand the entire sentence, since I'm not a native
> > English speaker.   Sorry for that.  About adding a range check, I'm
> > not against it.  But what to do in that case?  abort()?  I don't see
> > anything significantly better?  In the Linux kernel, the used
> > something BUILD_BUG, but I don't know those macros very much.
> > I'm really open to discussion about what would the the best behavior
> > when max < min.
> 
> Since there's no obvious thing to do when the bounds are reversed
> (some may want to abort(); others may prefer to set errno?; others may
> just want to ignore the possibility...
> 
> I tried to understand what it does with the obvious implementation:
> 
> 
> > Alejandro Colomar <alx.manpages@gmail.com> wrote:
> >> uint32_t
> >> arc4random_range(uint32_t min, uint32_t max)
> >> {
> >> 	return arc4random_uniform(max - min + 1) + min;
> >> }
> 
> Well, let's substitute with some actual values:
> 
> 	arc4random_range(7, 4);
> 
> This will result in:
> 
> 	arc4random_uniform(4 - 7 + 1) + 7;
> 
> which evaluates to:
> 
> 	arc4random_uniform(-2) + 7;
> 
> and is equivalent to:
> 
> 	arc4random_uniform(UINT32_MAX - 1) + 7;
> 
> Let's first ignore the +7.  The arc4random_uniform(UINT32_MAX - 1)
> call can generate 2^32 - 2 random numbers.  By offsetting to 7, the 2
> value that are excluded are 6 and 5.
> 
> So it seems that a reversed call to arc4random_range(min, max) has an
> interesting property: it generates random numbers outside of the
> non-inclusive range (min, max), that is the compementary set of
> numbers that arc4random(max, min) would produce.
> 
> If properly documented, it's not a bad behavior.

With that quality of reasoning, there are only two possible conclusions
about why this email thread is happening.  I though these conclusions
were original, but others privately made these suggestions also: 

1. you are trolling us

2. you are not sufficiently intelligent to be permitted near a C compiler.

I cannot think of a third option.

I am going to give you the benefit of the doubt and select option 1, and
trolls are deal with calling them trolls, and thennever replying again.  Bye.
  
Otto Moerbeek Jan. 1, 2023, 9:21 a.m. UTC | #16
On Sun, Jan 01, 2023 at 01:48:28AM -0600, Ariadne Conill wrote:

> Hi,
> 
> On Sun, 1 Jan 2023, Alejandro Colomar via Libc-alpha wrote:
> 
> > Hello Damien,
> > 
> > On 1/1/23 00:07, Damien Miller wrote:
> > > On Sat, 31 Dec 2022, Theo de Raadt wrote:
> > > 
> > > > Also, right now an (incorrect?) call of arc4random_uniform(0)
> > > > will return 0, but with your proposal it will return a non-zero
> > > > number.  Have you audited the entire universe of software to
> > > > ensure that your change doesn't introduce a bug in some other
> > > > piece of software?  I doubt you did that.  Very unprofessional
> > > > of you to not study the impact and just wave the issue away.
> > > > 
> > > > I think Special-casing the value of 0 to mean something new
> > > > and undocumented behaviour makes no sense.  It is even potentially
> > > > undocumentable.
> > > 
> > > I agree - specifying a zero upper-bound is numerically nonsensical,
> > > and could often be the result of a bug in the caller.
> > > 
> > > Changing it is likely to break code like this in a plausibly exploitable
> > > way:
> > > 
> > > elem_t *random_elem(elem_t **elems, size_t nelems) {
> > > 	return elems[arc4random_uniform(nelems)];
> > > }
> > 
> > The above code is already broken.  In case nelems is 0, the array has
> > exactly 0 elements, so the pointer &elems[0] is a pointer to
> > one-past-the-last element. It is legal to hold such a pointer, but not
> > to dereference it (I guess I don't need to quote the standard here).
> > 
> > Such a pointer dereference *is dangerous*, and *is very-likely exploitable*.
> > 
> > Having a random 32-bit number instead is likely to be a pointer
> > addressing an invalid memory address, and result in a crash.  And
> > crashes are good, right?
> 
> In scenarios where available address space is constrained, the likelihood of
> a crash verses state corruption elsewhere in a program is reduced
> considerably.  I think we should avoid defining interfaces based on the
> assumption that some minimal amount of address space is available.
> 
> > Changing the behavior of arc4random_uniform() wouldn't make this code
> > more broken, but rather uncover the bug in it.
> 
> The better approach would be for random_elem to check that there is at least
> one element available.  Making arc4random_uniform(0) do something unexpected
> will probably break legitimate code.  I can think of many situations where
> arc4random_uniform(0) would be legitimately called.
> 
> > > Therefore IMO the only safe return from arc4random_uniform(0) is 0.
> > 
> > I'd argue it's the opposite.  0 is the most unsafe value it can return
> > in such a case, since it's the least likely to result in a crash.  The
> > Undefined Behavior is invoked, and in a way that is likely to modify
> > memory that is available to the process.
> 
> If you are using arc4random_uniform to blindly pick elements out of an array
> without doing the bounds checking yourself, you're already setting yourself
> up for failure.  In an ideal world, we would add an additional API which is
> designed for picking elements and which did this type of bounds check and
> then failed safely.  Something like:
> 
> inline void *
> arc4random_pickelem(void **elems, size_t elemsize, size_t nelems)
> {
> 	if (__builtin_object_size(elems, 0) < (nelems * elemsize)) {
> 		errno = EINVAL;
> 		return NULL;
> 	}
> 
> 	ptrdiff_t diff = arc4random_uniform(nelems) * elemsize;
> 	return (char *)(elems + diff);
> }
> 
> Changing arc4random_uniform(0) to return non-zero will probably break monte
> carlo simulations and such which reasonably depend on the behavior that
> arc4random_uniform(0) == 0.
> 
> > 
> > 42 would be a better value.
> > An even better value would be UINT32_MAX, which would almost-certainly
> > guarantee a crash everywhere.
> > However, it makes more sense to just let it be an unbounded random
> > value, which will likely result in the same crashes as with UINT32_MAX,
> > but would be more useful for other purposes.
> 
> What purpose do you envision where arc4random_uniform(0) being non-zero
> would be considered useful?  If you want to safely pick elements from arrays
> at random, then we should build an interface for doing this safely, rather
> then changing the behavior of pre-existing ones.
> 
> Ariadne

arc4random() is a well established API. Changing it, indepedent if you
consdider the 0 case "right" or "wrong", will introduce a portability
nightmare, for old code as well as for new code. You'll never know if
your runtime has the old or new behaviour, so in practice it will
cause many headaches and introduce bugs. 

	-Otto
  
Alejandro Colomar Jan. 1, 2023, 2:05 p.m. UTC | #17
Hello Ariadne,

On 1/1/23 08:48, Ariadne Conill wrote:
> On Sun, 1 Jan 2023, Alejandro Colomar via Libc-alpha wrote:
>> On 1/1/23 00:07, Damien Miller wrote:
>> Having a random 32-bit number instead is likely to be a pointer addressing an 
>> invalid memory address, and result in a crash.  And crashes are good, right?
> 
> In scenarios where available address space is constrained, the likelihood of a 
> crash verses state corruption elsewhere in a program is reduced considerably.  I 
> think we should avoid defining interfaces based on the assumption that some 
> minimal amount of address space is available.
> 
>> Changing the behavior of arc4random_uniform() wouldn't make this code more 
>> broken, but rather uncover the bug in it.
> 
> The better approach would be for random_elem to check that there is at least one 
> element available.

Agree.

>  Making arc4random_uniform(0) do something unexpected will 
> probably break legitimate code.  I can think of many situations where 
> arc4random_uniform(0) would be legitimately called.

Please show (and/or link to) some examples, if possible.  I'm really curious.

> 
>>> Therefore IMO the only safe return from arc4random_uniform(0) is 0.
>>
>> I'd argue it's the opposite.  0 is the most unsafe value it can return in such 
>> a case, since it's the least likely to result in a crash.  The Undefined 
>> Behavior is invoked, and in a way that is likely to modify memory that is 
>> available to the process.
> 
> If you are using arc4random_uniform to blindly pick elements out of an array 
> without doing the bounds checking yourself, you're already setting yourself up 
> for failure.  In an ideal world, we would add an additional API which is 
> designed for picking elements and which did this type of bounds check and then 
> failed safely.  Something like:
> 
> inline void *
> arc4random_pickelem(void **elems, size_t elemsize, size_t nelems)
> {
>      if (__builtin_object_size(elems, 0) < (nelems * elemsize)) {
>          errno = EINVAL;
>          return NULL;
>      }
> 
>      ptrdiff_t diff = arc4random_uniform(nelems) * elemsize;
>      return (char *)(elems + diff);
> }
> 
> Changing arc4random_uniform(0) to return non-zero will probably break monte 
> carlo simulations and such which reasonably depend on the behavior that 
> arc4random_uniform(0) == 0.

I don't know that code.  Please show it or link to it.

> 
>>
>> 42 would be a better value.
>> An even better value would be UINT32_MAX, which would almost-certainly 
>> guarantee a crash everywhere.
>> However, it makes more sense to just let it be an unbounded random value, 
>> which will likely result in the same crashes as with UINT32_MAX, but would be 
>> more useful for other purposes.
> 
> What purpose do you envision where arc4random_uniform(0) being non-zero would be 
> considered useful?  If you want to safely pick elements from arrays at random, 
> then we should build an interface for doing this safely, rather then changing 
> the behavior of pre-existing ones.

So far, the only code I've touched myself that needed this is the _range() 
variant that I wrote for selecting a value within a [min, max] range.  I don't 
even need to modify arc4random_uniform(3) for this, because it's not implemented 
in terms of arc4random_uniform(3), but rather in terms of 
arc4random(3)/getrandom(2), and I had to implement the _uniform() API myself. 
Since I didn't look at any existing implementation, I wrote the behavior for (0) 
that was most relevant for my use case, which is to allow the _range() function 
to work well for any input.

I will probably rename my _uniform() implementation to _uniform_wrap(), to avoid 
confusing users.

The manual page doesn't help either, since it doesn't document what happens on 
(0).  Maybe that's something you could add to CAVEATS in you manual page.

You can see the code here:
<https://github.com/shadow-maint/shadow/pull/624/files>

Maybe Jason has more examples to show you from Linux internals, since he 
implemented the same behavior for Linux recently.

> 
> Ariadne

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
  
Arsen Arsenović Jan. 1, 2023, 9:37 p.m. UTC | #18
Hi,

Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> writes:

> Special-casing it in the implementation to return 0 was useless.
> Instead, considering 0 as the value after UINT32_MAX has the property
> that it allows implementing the following function without any
> special cases:
>
> uint32_t
> arc4random_range(uint32_t min, uint32_t max)
> {
> 	return arc4random_uniform(max - min + 1) + min;
> }
>
> This works for any values of min and max (as long as min <= max, of
> course), even for (0, UINT32_MAX).
>
> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
> simpler. :)

While I think this is better than the original meaning, the range [n, n)
cannot produce a (N_0) value.  For this reason, I believe the best
behavior for this would be to abort.

Zero is wrong because it's not in [0, 0), which goes also for all values
of [0, 2**32), but not for [0, -1] in the unsigned number space (which
is why I'm giving it more credit than just returning zero), though
specifying that doesn't sit easy with me, since [x, n) and [x, n-1]
aren't necessarily equivalent.

Furthermore, should the need to rely on "return zero for empty range"
behavior arise, an Autoconf test for arc4random_uniform (0) behavior
would be simpler if the test program aborted, as arc4random_uniform (n)
== 0 is quite valid, and easily could happen at configure time.

Should one needs to produce a uniformly distributed value over the full
range of [0, 2**32), they should invoke arc4random ().  Moving the
special case into a wrapper function is similarly trivial to achieve
either of the discussed behaviors.

Of course, changing existing APIs is never easy...  At least aborts are
easy to spot.

Have a great night.
  
Alejandro Colomar Jan. 1, 2023, 11:50 p.m. UTC | #19
[CC -= Theo]

Hi Arsen,

On 1/1/23 22:37, Arsen Arsenović wrote:
> Hi,
> 
> Alejandro Colomar via Libc-alpha <libc-alpha@sourceware.org> writes:
> 
>> Special-casing it in the implementation to return 0 was useless.
>> Instead, considering 0 as the value after UINT32_MAX has the property
>> that it allows implementing the following function without any
>> special cases:
>>
>> uint32_t
>> arc4random_range(uint32_t min, uint32_t max)
>> {
>> 	return arc4random_uniform(max - min + 1) + min;
>> }
>>
>> This works for any values of min and max (as long as min <= max, of
>> course), even for (0, UINT32_MAX).
>>
>> Oh, and the implementation of arc4random_uniform(3) is now 2 lines
>> simpler. :)
> 
> While I think this is better than the original meaning, the range [n, n)
> cannot produce a (N_0) value.

I'm not sure I follow.

To clarify, the arc4random_range() function suggested above is not defined to 
produce a number in the range [min, max), but rather a number in the range [min, 
max].  So, if you call it as arc4random_range(n, n), the range requested would 
be [n, n], which has exactly one possible value.

>  For this reason, I believe the best
> behavior for this would be to abort.
> 
> Zero is wrong because it's not in [0, 0), which goes also for all values
> of [0, 2**32), but not for [0, -1] in the unsigned number space (which
> is why I'm giving it more credit than just returning zero), though
> specifying that doesn't sit easy with me, since [x, n) and [x, n-1]
> aren't necessarily equivalent.

In the unsigned number space, [x, y) behaves equivalently to [x, y-1] for the 
cases where the former is defined.  However, the former is not defined for x==y 
(and an abort would probably be the best thing to do).  The latter does work for 
x==y, due to wrap around, since it can be reinterpreted as [x, y-1+2^32].

The current definition of arc4random_uniform(3) is the former, and the 
documentation didn't even cover the fact that is has no defined behavior for 0. 
The implementation chose a special value for the undefined behavior, which 
doesn't conform to any mathematical definition of the function.

Mathematically, we could redefine the function to use the latter definition, and 
it would be legal, since it is compatible in every defined case, plus has the 
benefit of having no undefined behavior.

However, due to concerns of existing code, we should be cautious with that, and 
an abort(3) might be better.  I'd like to see some existing code if possible 
before deciding.

And if we can't make the current API behave with the latter, more generous, 
definition, we could add a new function:

uint32_t
arc4random_uniform_wrap(uint32_t upper_bound)
{
	if (upper_bound == 0)
		return arc4random();

	return arc4random_uniform(upper_bound);
}

> 
> Furthermore, should the need to rely on "return zero for empty range"
> behavior arise, an Autoconf test for arc4random_uniform (0) behavior
> would be simpler if the test program aborted, as arc4random_uniform (n)
> == 0 is quite valid, and easily could happen at configure time.
> 
> Should one needs to produce a uniformly distributed value over the full
> range of [0, 2**32), they should invoke arc4random ().

Except that the value may be calculated.

>  Moving the
> special case into a wrapper function is similarly trivial to achieve
> either of the discussed behaviors.

Yes, it would be trivial to do:

uint32_t
arc4random_range(uint32_t min, uint32_t max)
{
	if (max == min - 1)
		return arc4random();

  	return arc4random_uniform(max - min + 1) + min;
}


Although that feels to me as a sign that something's wrong in the other API, and 
that we need yet another one which we can call arc4random_below() or 
arc4random_uniform_wrap() that behaves like _uniform() but returns [0, lim-1] 
instead of [0, lim).

> 
> Of course, changing existing APIs is never easy...  At least aborts are
> easy to spot.

Yes, I think an abort(3) is probably better than returning 0.
The 0 is just an off-by-one waiting to bite someone.

> 
> Have a great night.

Have a great night too.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
  
Arsen Arsenović Jan. 2, 2023, 12:02 a.m. UTC | #20
Hey,

Alejandro Colomar <alx.manpages@gmail.com> writes:

>> While I think this is better than the original meaning, the range [n, n)
>> cannot produce a (N_0) value.
>
> I'm not sure I follow.
>
> To clarify, the arc4random_range() function suggested above is not defined to
> produce a number in the range [min, max), but rather a number in the range
> [min, max].  So, if you call it as arc4random_range(n, n), the range requested
> would be [n, n], which has exactly one possible value.

Heh, excuse my overgeneralizing, I was talking about _uniform, which
produces values in the range of [0, n), when n==0, that's an empty
range, which is also the same range as every [x,x), which is why I used
the general case in my message.

>>  For this reason, I believe the best
>> behavior for this would be to abort.
>> Zero is wrong because it's not in [0, 0), which goes also for all values
>> of [0, 2**32), but not for [0, -1] in the unsigned number space (which
>> is why I'm giving it more credit than just returning zero), though
>> specifying that doesn't sit easy with me, since [x, n) and [x, n-1]
>> aren't necessarily equivalent.
>
> In the unsigned number space, [x, y) behaves equivalently to [x, y-1] for the
> cases where the former is defined.  However, the former is not defined for x==y
> (and an abort would probably be the best thing to do).  The latter does work
> for x==y, due to wrap around, since it can be reinterpreted as [x, y-1+2^32].
>
> The current definition of arc4random_uniform(3) is the former, and the
> documentation didn't even cover the fact that is has no defined behavior for
> 0. The implementation chose a special value for the undefined behavior, which
> doesn't conform to any mathematical definition of the function.
>
> Mathematically, we could redefine the function to use the latter definition,
> and it would be legal, since it is compatible in every defined case, plus has
> the benefit of having no undefined behavior.

Heh, indeed, the original (libbsd) definition is not very maths-y, but I
interpreted "... but less than upper_bound" as [0, n), in which case,
n==0 is invalid as you said, so that's why I suggest aborting.

Of course, that doesn't happen if the definition is changed to be
[0,n-1] in the unsigned space, yes.

Both before and after such a change, "return 0" is odd ;)

> However, due to concerns of existing code, we should be cautious with that, and
> an abort(3) might be better.  I'd like to see some existing code if possible
> before deciding.
>
> And if we can't make the current API behave with the latter, more generous,
> definition, we could add a new function:
>
> uint32_t
> arc4random_uniform_wrap(uint32_t upper_bound)
> {
> 	if (upper_bound == 0)
> 		return arc4random();
>
> 	return arc4random_uniform(upper_bound);
> }
>
>> Furthermore, should the need to rely on "return zero for empty range"
>> behavior arise, an Autoconf test for arc4random_uniform (0) behavior
>> would be simpler if the test program aborted, as arc4random_uniform (n)
>> == 0 is quite valid, and easily could happen at configure time.
>> Should one needs to produce a uniformly distributed value over the full
>> range of [0, 2**32), they should invoke arc4random ().
>
> Except that the value may be calculated.

Right, but I see this case as one of two:

1) Zero is unintentional, abort () is desirable,
2) Zero is intentional and should mean XYZ, user should explicitly check
   for zero to do XYZ.

After all, just based on what value the expression computes to, we can't
be sure that it's what the user wanted.

>>  Moving the
>> special case into a wrapper function is similarly trivial to achieve
>> either of the discussed behaviors.
>
> Yes, it would be trivial to do:
>
> uint32_t
> arc4random_range(uint32_t min, uint32_t max)
> {
> 	if (max == min - 1)
> 		return arc4random();
>
>  	return arc4random_uniform(max - min + 1) + min;
> }
>
>
> Although that feels to me as a sign that something's wrong in the other API,
> and that we need yet another one which we can call arc4random_below() or
> arc4random_uniform_wrap() that behaves like _uniform() but returns [0, lim-1]
> instead of [0, lim).

Indeed, including a wrapper like you suggested to accompany the
arc4random APIs would be nice.

>> Of course, changing existing APIs is never easy...  At least aborts are
>> easy to spot.
>
> Yes, I think an abort(3) is probably better than returning 0.
> The 0 is just an off-by-one waiting to bite someone.
>
>> Have a great night.
>
> Have a great night too.
>
> Cheers,
> Alex

Thanks,
  
Alejandro Colomar Jan. 2, 2023, 11:24 a.m. UTC | #21
Hey Arsen,

On 1/2/23 01:02, Arsen Arsenović wrote:
> Hey,
> 
> Alejandro Colomar <alx.manpages@gmail.com> writes:
> 
>>> While I think this is better than the original meaning, the range [n, n)
>>> cannot produce a (N_0) value.
>>
>> I'm not sure I follow.
>>
>> To clarify, the arc4random_range() function suggested above is not defined to
>> produce a number in the range [min, max), but rather a number in the range
>> [min, max].  So, if you call it as arc4random_range(n, n), the range requested
>> would be [n, n], which has exactly one possible value.
> 
> Heh, excuse my overgeneralizing, I was talking about _uniform, which
> produces values in the range of [0, n), when n==0, that's an empty
> range, which is also the same range as every [x,x), which is why I used
> the general case in my message.

No problem :)
It makes sense to me now.

> 
>>>   For this reason, I believe the best
>>> behavior for this would be to abort.
>>> Zero is wrong because it's not in [0, 0), which goes also for all values
>>> of [0, 2**32), but not for [0, -1] in the unsigned number space (which
>>> is why I'm giving it more credit than just returning zero), though
>>> specifying that doesn't sit easy with me, since [x, n) and [x, n-1]
>>> aren't necessarily equivalent.
>>
>> In the unsigned number space, [x, y) behaves equivalently to [x, y-1] for the
>> cases where the former is defined.  However, the former is not defined for x==y
>> (and an abort would probably be the best thing to do).  The latter does work
>> for x==y, due to wrap around, since it can be reinterpreted as [x, y-1+2^32].
>>
>> The current definition of arc4random_uniform(3) is the former, and the
>> documentation didn't even cover the fact that is has no defined behavior for
>> 0. The implementation chose a special value for the undefined behavior, which
>> doesn't conform to any mathematical definition of the function.
>>
>> Mathematically, we could redefine the function to use the latter definition,
>> and it would be legal, since it is compatible in every defined case, plus has
>> the benefit of having no undefined behavior.
> 
> Heh, indeed, the original (libbsd) definition is not very maths-y, but I
> interpreted "... but less than upper_bound" as [0, n), in which case,
> n==0 is invalid as you said, so that's why I suggest aborting.

Yeah, I guess libbsd just copied OpenBSD's behavior (and probably every 
implementation has done so).

And yes, they all have that non-maths-y behavior.

Aborting might be a good thing to do, yes.

> 
> Of course, that doesn't happen if the definition is changed to be
> [0,n-1] in the unsigned space, yes.
> 
> Both before and after such a change, "return 0" is odd ;)

Yup, we agree on that.

> 
>> However, due to concerns of existing code, we should be cautious with that, and
>> an abort(3) might be better.  I'd like to see some existing code if possible
>> before deciding.
>>
>> And if we can't make the current API behave with the latter, more generous,
>> definition, we could add a new function:
>>
>> uint32_t
>> arc4random_uniform_wrap(uint32_t upper_bound)
>> {
>> 	if (upper_bound == 0)
>> 		return arc4random();
>>
>> 	return arc4random_uniform(upper_bound);
>> }
>>
>>> Furthermore, should the need to rely on "return zero for empty range"
>>> behavior arise, an Autoconf test for arc4random_uniform (0) behavior
>>> would be simpler if the test program aborted, as arc4random_uniform (n)
>>> == 0 is quite valid, and easily could happen at configure time.
>>> Should one needs to produce a uniformly distributed value over the full
>>> range of [0, 2**32), they should invoke arc4random ().
>>
>> Except that the value may be calculated.
> 
> Right, but I see this case as one of two:
> 
> 1) Zero is unintentional, abort () is desirable,

It might be, but it's uncommon for libc to abort(3) a program on undefined 
behavior.  I don't oppose it, though.  Maybe others can speak up.

> 2) Zero is intentional and should mean XYZ, user should explicitly check
>     for zero to do XYZ.

I wonder if a correct program can expect a meaning different than [x, y-1].  I 
can't imagine any code that legitimately expects 0.

> 
> After all, just based on what value the expression computes to, we can't
> be sure that it's what the user wanted.
> 
>>>   Moving the
>>> special case into a wrapper function is similarly trivial to achieve
>>> either of the discussed behaviors.
>>
>> Yes, it would be trivial to do:
>>
>> uint32_t
>> arc4random_range(uint32_t min, uint32_t max)
>> {
>> 	if (max == min - 1)
>> 		return arc4random();
>>
>>   	return arc4random_uniform(max - min + 1) + min;
>> }
>>
>>
>> Although that feels to me as a sign that something's wrong in the other API,
>> and that we need yet another one which we can call arc4random_below() or
>> arc4random_uniform_wrap() that behaves like _uniform() but returns [0, lim-1]
>> instead of [0, lim).
> 
> Indeed, including a wrapper like you suggested to accompany the
> arc4random APIs would be nice.

I'll send a patch if there's agreement that it's what we want.

> 
>>> Of course, changing existing APIs is never easy...  At least aborts are
>>> easy to spot.
>>
>> Yes, I think an abort(3) is probably better than returning 0.
>> The 0 is just an off-by-one waiting to bite someone.
>>

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
  

Patch

diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
index 5aa98d1c13..1cd52c0d1c 100644
--- a/stdlib/arc4random_uniform.c
+++ b/stdlib/arc4random_uniform.c
@@ -29,15 +29,13 @@ 
    the asked range after range adjustment.
 
    The algorithm avoids modulo and divide operations, which might be costly
-   depending on the architecture.  */
+   depending on the architecture.
+
+   0 is treated as if it were UINT32_MAX + 1, and so arc4random_uniform(0)
+   is equivalent to arc4random().  */
 uint32_t
 __arc4random_uniform (uint32_t n)
 {
-  if (n <= 1)
-    /* There is no valid return value for a zero limit, and 0 is the
-       only possible result for limit 1.  */
-    return 0;
-
   /* Powers of two are easy.  */
   if (powerof2 (n))
     return __arc4random () & (n - 1);