[v5,00/10] Add arc4random support

Message ID 20220504135921.3679585-1-adhemerval.zanella@linaro.org

Adhemerval Zanella Netto May 4, 2022, 1:59 p.m. UTC
This patch set adds the arc4random, arc4random_buf, and
arc4random_uniform functions, along with optimized ChaCha20
implementations for x86_64, aarch64, powerpc64, and s390x.
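
arc4random_uniform has to return a value in [0, upper_bound) without
modulo bias.  A minimal sketch of the classic rejection technique (the
one OpenBSD uses; the patch's stdlib/arc4random_uniform.c may use a
different method) is shown below.  The function-pointer source and the
names uniform_sketch and counter_source are hypothetical stand-ins for
arc4random itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source of uniformly distributed 32-bit values; in the
   patch this role would be played by the ChaCha20-backed arc4random.  */
typedef uint32_t (*rand32_fn) (void *ctx);

/* Return a value uniformly distributed in [0, upper_bound).  Values
   below 2^32 % upper_bound are rejected, so the accepted range has a
   size that is an exact multiple of upper_bound and the final modulo
   introduces no bias.  */
static uint32_t
uniform_sketch (rand32_fn next, void *ctx, uint32_t upper_bound)
{
  if (upper_bound < 2)
    return 0;

  /* (2^32 - upper_bound) % upper_bound == 2^32 % upper_bound, computed
     without 64-bit arithmetic.  */
  uint32_t min = -upper_bound % upper_bound;

  for (;;)
    {
      uint32_t r = next (ctx);
      if (r >= min)
        return r % upper_bound;
    }
}

/* Deterministic toy source, for illustration only: returns 0, 1, 2...  */
static uint32_t
counter_source (void *ctx)
{
  uint32_t *c = ctx;
  return (*c)++;
}
```

The expected number of calls to the source is below 2 for any bound,
since fewer than half of the 2^32 values can ever be rejected.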

The generic implementation is based on scalar ChaCha20, with a
per-thread state cache allocated in the TCB.  The internal state keeps
a 512-byte buffer (8 ChaCha20 blocks of 64 bytes each) plus the cipher
state, which allows better use of the vectorized implementations.  It
would be possible to use just 128 bytes, but that would require
rewriting the AVX2 optimization (and it would possibly lower
performance slightly).
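
A sketch of how a per-thread keystream buffer could be consumed,
assuming 8 ChaCha20 blocks of 64 bytes (RFC 8439), i.e. 512 bytes,
regenerated in a single refill call so that a vectorized implementation
can produce all blocks at once.  The struct layout, take_bytes, and the
toy refill hook are hypothetical and are not taken from the patch's
arc4random.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHACHA20_BLOCK_SIZE 64   /* bytes per ChaCha20 block (RFC 8439) */
#define BUFFER_BLOCKS 8          /* blocks generated per refill */
#define BUFFER_SIZE (BUFFER_BLOCKS * CHACHA20_BLOCK_SIZE)

_Static_assert (BUFFER_SIZE == 512, "8 ChaCha20 blocks are 512 bytes");

/* Hypothetical per-thread state: cipher input plus buffered keystream.  */
struct rng_state
{
  uint32_t cipher[16];      /* ChaCha20 input: constants, key, counter, nonce */
  uint8_t buf[BUFFER_SIZE]; /* keystream from the last refill */
  size_t avail;             /* unread bytes, taken from the end of buf */
};

/* Hypothetical refill hook: a real one would run (vectorized) ChaCha20
   over st->cipher to regenerate all of st->buf in one call.  */
typedef void (*refill_fn) (struct rng_state *st);

/* Copy N bytes of keystream to OUT, refilling when the buffer runs dry.
   This sketch also wipes every byte it hands out, so the value cannot be
   read back from the state afterwards.  */
static void
take_bytes (struct rng_state *st, refill_fn refill, void *out, size_t n)
{
  uint8_t *p = out;

  while (n > 0)
    {
      assert (st->avail <= BUFFER_SIZE);
      if (st->avail == 0)
        {
          refill (st);
          st->avail = BUFFER_SIZE;
        }
      size_t chunk = n < st->avail ? n : st->avail;
      uint8_t *src = st->buf + st->avail - chunk;
      memcpy (p, src, chunk);
      memset (src, 0, chunk);
      st->avail -= chunk;
      p += chunk;
      n -= chunk;
    }
}

/* Toy refill for illustration only: fills the buffer with the block
   counter value instead of real ChaCha20 output.  */
static void
toy_refill (struct rng_state *st)
{
  st->cipher[12]++;
  memset (st->buf, (int) (st->cipher[12] & 0xff), sizeof st->buf);
}
```

Taking bytes from the end of the buffer means a single counter (avail)
tracks both how much is left and where to read next.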

The initial seeding and every reseed use getrandom, with /dev/urandom
as a fallback; the internal state is reseeded after every 16 MB of
output consumed.
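
Seeding and the reseed policy can be sketched as follows, assuming a
Linux system whose C library declares getrandom in <sys/random.h>:
try getrandom first, fall back to /dev/urandom only when the kernel
reports ENOSYS, and count output bytes so that a reseed is requested
every 16 MB.  seed_bytes, must_reseed, and struct reseed_state are
hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/random.h>
#include <unistd.h>

#define RESEED_BYTES ((size_t) 16 * 1024 * 1024) /* reseed every 16 MB */

/* Fill BUF with LEN bytes of kernel entropy.  Returns true on success.  */
static bool
seed_bytes (void *buf, size_t len)
{
  unsigned char *p = buf;
  size_t done = 0;

  while (done < len)
    {
      ssize_t r = getrandom (p + done, len - done, 0);
      if (r > 0)
        done += (size_t) r;
      else if (r < 0 && errno == EINTR)
        continue;
      else if (r < 0 && errno == ENOSYS)
        break;                  /* old kernel: use /dev/urandom below */
      else
        return false;
    }
  if (done == len)
    return true;

  int fd = open ("/dev/urandom", O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return false;
  while (done < len)
    {
      ssize_t r = read (fd, p + done, len - done);
      if (r > 0)
        done += (size_t) r;
      else if (r < 0 && errno == EINTR)
        continue;
      else
        break;
    }
  close (fd);
  return done == len;
}

/* Hypothetical per-thread accounting of output since the last seed.  */
struct reseed_state
{
  size_t since_seed;
};

/* Account for N more output bytes; return true when a reseed is due.  */
static bool
must_reseed (struct reseed_state *rs, size_t n)
{
  rs->since_seed += n;
  if (rs->since_seed < RESEED_BYTES)
    return false;
  rs->since_seed = 0;
  return true;
}
```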

There is no fork detection; the internal state is reset only in the
atfork handler.  Direct clone calls, vfork, and _Fork are not handled.
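
An atfork-based reset can be sketched with pthread_atfork: the child
handler wipes the inherited state of the forking thread (the only
thread in the child), so the child reseeds instead of replaying the
parent's stream.  Handlers registered this way do not run for vfork,
_Fork, or a direct clone, which is the limitation described above.
The state layout and function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-thread generator state.  */
struct gen_state
{
  unsigned char key[32];
  int seeded;              /* 0: reseed from the kernel before next use */
};

static _Thread_local struct gen_state state;

/* Child-side atfork handler: runs in the single thread of the new
   process and discards the state inherited from the parent.  */
static void
reset_in_child (void)
{
  memset (&state, 0, sizeof state);
}

/* Register the handler once, e.g. at library initialization.  */
static int
install_fork_handler (void)
{
  return pthread_atfork (NULL, NULL, reset_in_child);
}
```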

Although it is lock-free, arc4random is still not async-signal-safe
(the per-thread state is not updated atomically).

The generic ChaCha20 implementation is based on RFC 8439 [1], without
the final XOR step.  Since the input stream will be either zero bytes
(initial state) or the PRNG output itself, this step does not add any
extra entropy.
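
A minimal scalar ChaCha20 block function following RFC 8439, written
as an illustration rather than a copy of the patch's stdlib/chacha20.c.
It returns the raw keystream; a cipher would XOR this with its input,
which is the step dropped here.  The helper names are hypothetical; the
code can be checked against the test vectors in RFC 8439 sections
2.1.1 (quarter round) and 2.3.2 (block function).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static inline uint32_t
rotl32 (uint32_t v, int n)
{
  return (v << n) | (v >> (32 - n));
}

static inline uint32_t
load32_le (const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

static inline void
store32_le (uint8_t *p, uint32_t v)
{
  p[0] = (uint8_t) v;
  p[1] = (uint8_t) (v >> 8);
  p[2] = (uint8_t) (v >> 16);
  p[3] = (uint8_t) (v >> 24);
}

/* ChaCha20 quarter round on words a, b, c, d of x.  */
static void
quarter_round (uint32_t x[16], int a, int b, int c, int d)
{
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 16);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 12);
  x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32 (x[d], 8);
  x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32 (x[b], 7);
}

/* Initial state: "expand 32-byte k" constants, 256-bit key, 32-bit
   block counter, 96-bit nonce.  */
static void
chacha20_init (uint32_t s[16], const uint8_t key[32], uint32_t counter,
               const uint8_t nonce[12])
{
  s[0] = 0x61707865; s[1] = 0x3320646e;
  s[2] = 0x79622d32; s[3] = 0x6b206574;
  for (int i = 0; i < 8; i++)
    s[4 + i] = load32_le (key + 4 * i);
  s[12] = counter;
  for (int i = 0; i < 3; i++)
    s[13 + i] = load32_le (nonce + 4 * i);
}

/* Produce one 64-byte keystream block (20 rounds, then add the input
   state).  No XOR with a message is performed.  */
static void
chacha20_block (const uint32_t in[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, in, sizeof x);
  for (int i = 0; i < 10; i++)
    {
      quarter_round (x, 0, 4, 8, 12);   /* column rounds */
      quarter_round (x, 1, 5, 9, 13);
      quarter_round (x, 2, 6, 10, 14);
      quarter_round (x, 3, 7, 11, 15);
      quarter_round (x, 0, 5, 10, 15);  /* diagonal rounds */
      quarter_round (x, 1, 6, 11, 12);
      quarter_round (x, 2, 7, 8, 13);
      quarter_round (x, 3, 4, 9, 14);
    }
  for (int i = 0; i < 16; i++)
    store32_le (out + 4 * i, x[i] + in[i]);
}
```

A PRNG built on this would increment s[12] after each block (or run
several blocks at once in a vectorized variant).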

The optimized ChaCha20 implementations for x86_64, aarch64, powerpc64,
and s390x use vectorized instructions and are based on libgcrypt code.

ChaCha20 was chosen because it is the standard cipher used by other
arc4random implementations (the BSDs, macOS) and, more recently, by the
Linux random subsystem.  It also offers a very cheap rekey, which
periodically uses kernel entropy to improve randomness.  It is also
simpler than AES and performs better when no specialized instructions
are present.

[1] https://sourceware.org/pipermail/libc-alpha/2018-June/094879.html

v5:
* Added documentation.
* Fixed typos.

v4:
* Fixed typos and expanded comments.
* Fixed powerpc multi-arch organization.

v3:
* Added a per-thread cache to remove the lock usage.  It should improve
  both performance and scalability.
* Improved benchmark precision.
* Fixed Hurd test build.

v2:
* Removed the last XOR operation from the ChaCha20 implementation (it
  does not add much for arc4random usage).
* Added tst-arc4random-chacha20.c and refactored it to check against
  the expected implementation.
* Fixed the aarch64 implementation (a last-minute change that made the
  symbols hidden did not update the relocations to use them as well).
* Refactored x86 SSSE3 to SSE2.
* Fixed the powerpc64 implementation on BE (use the correct macro to
  check for endianness instead of the ones from libgcrypt).
* Add s390x optimized ChaCha20 implementation.

Adhemerval Zanella (10):
  stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ
    #4417)
  stdlib: Add arc4random tests
  benchtests: Add arc4random benchtest
  aarch64: Add optimized chacha20
  x86: Add SSE2 optimized chacha20
  x86: Add AVX2 optimized chacha20
  powerpc64: Add optimized chacha20
  s390x: Add optimized chacha20
  stdlib: Add TLS optimization to arc4random
  manual: Add documentation for arc4random functions

 LICENSES                                      |  22 +
 NEWS                                          |   4 +
 benchtests/Makefile                           |   5 +-
 benchtests/bench-arc4random.c                 | 224 +++++++
 include/stdlib.h                              |  13 +
 malloc/thread-freeres.c                       |   2 +-
 manual/math.texi                              |  45 ++
 nptl/allocatestack.c                          |   5 +-
 stdlib/Makefile                               |   9 +
 stdlib/Versions                               |   5 +
 stdlib/arc4random.c                           | 177 ++++++
 stdlib/arc4random.h                           |  45 ++
 stdlib/arc4random_uniform.c                   | 140 +++++
 stdlib/chacha20.c                             | 166 ++++++
 stdlib/stdlib.h                               |  14 +
 stdlib/tst-arc4random-chacha20.c              | 166 ++++++
 stdlib/tst-arc4random-fork.c                  | 174 ++++++
 stdlib/tst-arc4random-stats.c                 | 146 +++++
 stdlib/tst-arc4random-thread.c                | 278 +++++++++
 sysdeps/aarch64/Makefile                      |   4 +
 sysdeps/aarch64/chacha20-neon.S               | 323 ++++++++++
 sysdeps/aarch64/chacha20_arch.h               |  40 ++
 sysdeps/generic/chacha20_arch.h               |  24 +
 sysdeps/generic/not-cancel.h                  |   2 +
 sysdeps/generic/tls-internal-struct.h         |   3 +
 sysdeps/generic/tls-internal.c                |  13 +
 sysdeps/generic/tls-internal.h                |   7 +-
 sysdeps/mach/hurd/_Fork.c                     |   2 +
 sysdeps/mach/hurd/i386/libc.abilist           |   3 +
 sysdeps/mach/hurd/not-cancel.h                |   3 +
 sysdeps/nptl/_Fork.c                          |   2 +
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 +
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 +
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 ++
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 +
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 236 ++++++++
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 ++
 sysdeps/s390/s390-64/Makefile                 |   4 +
 sysdeps/s390/s390-64/chacha20-vx.S            | 564 ++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 ++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
 .../sysv/linux/microblaze/be/libc.abilist     |   3 +
 .../sysv/linux/microblaze/le/libc.abilist     |   3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/tls-internal.c        |  33 +-
 sysdeps/unix/sysv/linux/tls-internal.h        |  20 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
 sysdeps/x86_64/Makefile                       |   7 +
 sysdeps/x86_64/chacha20-avx2.S                | 313 ++++++++++
 sysdeps/x86_64/chacha20-sse2.S                | 311 ++++++++++
 sysdeps/x86_64/chacha20_arch.h                |  48 ++
 80 files changed, 3826 insertions(+), 16 deletions(-)
 create mode 100644 benchtests/bench-arc4random.c
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random.h
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c
 create mode 100644 stdlib/tst-arc4random-chacha20.c
 create mode 100644 stdlib/tst-arc4random-fork.c
 create mode 100644 stdlib/tst-arc4random-stats.c
 create mode 100644 stdlib/tst-arc4random-thread.c
 create mode 100644 sysdeps/aarch64/chacha20-neon.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 create mode 100644 sysdeps/s390/s390-64/chacha20-vx.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 create mode 100644 sysdeps/x86_64/chacha20-avx2.S
 create mode 100644 sysdeps/x86_64/chacha20-sse2.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h