From patchwork Thu Jul 14 11:28:36 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 55070
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Florian Weimer
Subject: [PATCH v10 0/9] Add arc4random support
Date: Thu, 14 Jul 2022 08:28:36 -0300
Message-Id: <20220714112845.704678-1-adhemerval.zanella@linaro.org>

This patch series adds the arc4random, arc4random_buf, and
arc4random_uniform functions along with optimized versions for x86_64
(sse2 and avx2), aarch64, powerpc64 (power8), and s390x (vx).

The generic implementation is based on scalar ChaCha20, with a per-thread
state cache allocated lazily.
The internal state keeps a 256-byte buffer (8 ChaCha20 blocks) plus the
cipher state, which allows better use of the vectorized implementations.
It would be possible to use just 128 bytes, but that would require
rewriting the AVX2 optimization (and would possibly lower performance
slightly).

The initial seed and each reseed use getrandom, with /dev/urandom as a
fallback; the internal state is reseeded after every 16MB of consumed
output.  There is no fork detection as such: the internal state is reset
only by fork and _Fork calls.  Direct clone calls and vfork are not
handled.  Although it is lock-free, arc4random is still not
async-signal-safe (the per-thread state is not updated atomically).  It
is, however, async-cancel-safe.

The generic ChaCha20 implementation is based on RFC 8439 [1] without the
last XOR step.  Since the input stream is either zero bytes (initial
state) or the PRNG output itself, this step does not add any extra
entropy.

The optimized ChaCha20 implementations for x86_64, aarch64, powerpc64,
and s390x use vector instructions and are based on libgcrypt code.

ChaCha20 is used because it is the standard cipher in other arc4random
implementations (BSDs, macOS) and, more recently, in the Linux random
subsystem.  It also offers a very cheap rekey, which periodically uses
kernel entropy to improve randomness; it is also simpler than AES and
performs better when no specialized instructions are present.

[1] https://sourceware.org/pipermail/libc-alpha/2018-June/094879.html

v10:
  * Fixed x86_64 building with different minimum ISA levels.
  * Fixed documentation.

v9:
  * Reword NEWS entry, internal comments, and style.
  * Use explicit_bzero in more places.
  * Do not include bits/stdint-uintn.h on stdint.h.
  * Fixed documentation.

v8:
  * Remove final register state clearing from optimized routines.

v7:
  * Merged the lock-free TCV optimization on first patch.
  * Added the original Copyright headers from libgcrypt on imported
    implementations.
  * Fixed typos and wording.
  * Use DO_NOT_OPTIMIZE_OUT from hash benchmark.

v6:
  * Replace array usage with variables and let the compiler add
    hardening, if required, to clean up any internal state.  It also
    shows slightly better performance.
  * Add tests for arc4random and arc4random_uniform on thread and fork.
  * Fixed documentation to state the functions as async-signal-unsafe.

v5:
  * Added documentation.
  * Fixed typos.

v4:
  * Fixed typos and expanded comments.
  * Fixed powerpc multi-arch organization.

v3:
  * Add per-thread cache to remove the lock usage.  It should improve
    both performance and scalability.
  * Improve benchmark precision.
  * Fixed Hurd test build.

v2:
  * Removed the last XOR operation on ChaCha20 implementation (it does
    not do much for arc4random usage).
  * Add tst-arc4random-chacha20.c and refactor to check against the
    expected implementation.
  * Fixed aarch64 implementation (a last change to move symbols to
    hidden did not change the relocation to use it as well).
  * Refactor x86 SSSE3 to SSE2.
  * Fixed powerpc64 implementation on BE (use the correct macro to check
    for endianness instead of the ones from libgcrypt).
  * Add s390x optimized ChaCha20 implementation.
Adhemerval Zanella Netto (9):
  stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ
    #4417)
  stdlib: Add arc4random tests
  benchtests: Add arc4random benchtest
  aarch64: Add optimized chacha20
  x86: Add SSE2 optimized chacha20
  x86: Add AVX2 optimized chacha20
  powerpc64: Add optimized chacha20
  s390x: Add optimized chacha20
  manual: Add documentation for arc4random functions

 LICENSES                                      |  23 +
 NEWS                                          |   4 +
 benchtests/Makefile                           |   5 +-
 benchtests/bench-arc4random.c                 | 218 +++++++
 benchtests/bench-hash-funcs-kernel.h          |   1 +
 benchtests/bench-hash-funcs.c                 |   2 -
 benchtests/bench-util.h                       |   7 +
 include/stdlib.h                              |  12 +
 malloc/thread-freeres.c                       |   2 +-
 manual/math.texi                              |  46 ++
 nptl/allocatestack.c                          |   3 +-
 stdlib/Makefile                               |   9 +
 stdlib/Versions                               |   5 +
 stdlib/arc4random.c                           | 208 +++++++
 stdlib/arc4random.h                           |  48 ++
 stdlib/arc4random_uniform.c                   | 140 +++++
 stdlib/chacha20.c                             | 191 ++++++
 stdlib/stdlib.h                               |  13 +
 stdlib/tst-arc4random-chacha20.c              | 167 +++++
 stdlib/tst-arc4random-fork.c                  | 198 ++++++
 stdlib/tst-arc4random-stats.c                 | 147 +++++
 stdlib/tst-arc4random-thread.c                | 341 +++++++++++
 sysdeps/aarch64/Makefile                      |   4 +
 sysdeps/aarch64/chacha20-aarch64.S            | 314 ++++++++++
 sysdeps/aarch64/chacha20_arch.h               |  40 ++
 sysdeps/generic/chacha20_arch.h               |  24 +
 sysdeps/generic/not-cancel.h                  |   2 +
 sysdeps/generic/tls-internal-struct.h         |   1 +
 sysdeps/generic/tls-internal.c                |  18 +
 sysdeps/generic/tls-internal.h                |   7 +-
 sysdeps/mach/hurd/_Fork.c                     |   2 +
 sysdeps/mach/hurd/i386/libc.abilist           |   3 +
 sysdeps/mach/hurd/not-cancel.h                |   3 +
 sysdeps/nptl/_Fork.c                          |   2 +
 .../powerpc/powerpc64/be/multiarch/Makefile   |   4 +
 .../powerpc64/be/multiarch/chacha20-ppc.c     |   1 +
 .../powerpc64/be/multiarch/chacha20_arch.h    |  42 ++
 sysdeps/powerpc/powerpc64/power8/Makefile     |   5 +
 .../powerpc/powerpc64/power8/chacha20-ppc.c   | 256 ++++++++
 .../powerpc/powerpc64/power8/chacha20_arch.h  |  37 ++
 sysdeps/s390/s390-64/Makefile                 |   6 +
 sysdeps/s390/s390-64/chacha20-s390x.S         | 573 ++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h          |  45 ++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   3 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   3 +
 .../sysv/linux/microblaze/be/libc.abilist     |   3 +
 .../sysv/linux/microblaze/le/libc.abilist     |   3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/not-cancel.h          |   7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   3 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   3 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   3 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   3 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   3 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   3 +
 sysdeps/unix/sysv/linux/tls-internal.c        |  39 +-
 sysdeps/unix/sysv/linux/tls-internal.h        |   8 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   3 +
 sysdeps/x86_64/Makefile                       |   7 +
 sysdeps/x86_64/chacha20-amd64-avx2.S          | 328 ++++++++++
 sysdeps/x86_64/chacha20-amd64-sse2.S          | 311 ++++++++++
 sysdeps/x86_64/chacha20_arch.h                |  55 ++
 83 files changed, 4015 insertions(+), 18 deletions(-)
 create mode 100644 benchtests/bench-arc4random.c
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random.h
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c
 create mode 100644 stdlib/tst-arc4random-chacha20.c
 create mode 100644 stdlib/tst-arc4random-fork.c
 create mode 100644 stdlib/tst-arc4random-stats.c
 create mode 100644 stdlib/tst-arc4random-thread.c
 create mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h
 create mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 create mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S
 create mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h
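
As a rough illustration of what "RFC 8439 ChaCha20 without the last XOR
step" means for the generic code in stdlib/chacha20.c, the sketch below
runs the standard 20-round block function and writes the resulting
keystream straight into the output buffer instead of XORing it with an
input stream.  It is not the code from this series: the sketch_* names
are hypothetical, and the 16-word state is assumed to use the RFC 8439
layout (4 constant words, 8 key words, 1 counter word, 3 nonce words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by N bits, 0 < N < 32.  */
static inline uint32_t
sketch_rotl32 (uint32_t x, unsigned int n)
{
  return (x << n) | (x >> (32 - n));
}

/* RFC 8439 quarter round on four words of the state.  */
void
sketch_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = sketch_rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = sketch_rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = sketch_rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = sketch_rotl32 (*b, 7);
}

/* Produce one 64-byte keystream block from STATE.  The state itself is
   not modified.  Words are serialized in little-endian order.  */
void
sketch_chacha20_block (const uint32_t state[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, state, sizeof x);

  /* 10 double rounds: 4 column rounds, then 4 diagonal rounds.  */
  for (int i = 0; i < 10; i++)
    {
      sketch_quarter_round (&x[0], &x[4], &x[8], &x[12]);
      sketch_quarter_round (&x[1], &x[5], &x[9], &x[13]);
      sketch_quarter_round (&x[2], &x[6], &x[10], &x[14]);
      sketch_quarter_round (&x[3], &x[7], &x[11], &x[15]);
      sketch_quarter_round (&x[0], &x[5], &x[10], &x[15]);
      sketch_quarter_round (&x[1], &x[6], &x[11], &x[12]);
      sketch_quarter_round (&x[2], &x[7], &x[8], &x[13]);
      sketch_quarter_round (&x[3], &x[4], &x[9], &x[14]);
    }

  for (int i = 0; i < 16; i++)
    {
      uint32_t v = x[i] + state[i];
      out[4 * i + 0] = (uint8_t) (v & 0xff);
      out[4 * i + 1] = (uint8_t) ((v >> 8) & 0xff);
      out[4 * i + 2] = (uint8_t) ((v >> 16) & 0xff);
      out[4 * i + 3] = (uint8_t) ((v >> 24) & 0xff);
    }
}

/* Fill DST with NBLOCKS keystream blocks and advance the block counter
   in state[12].  There is no XOR with an input stream: the keystream is
   the output.  */
void
sketch_chacha20_keystream (uint32_t state[16], uint8_t *dst, size_t nblocks)
{
  assert (dst != NULL || nblocks == 0);
  for (size_t i = 0; i < nblocks; i++)
    {
      sketch_chacha20_block (state, dst + 64 * i);
      state[12]++;
    }
}
```

Called with nblocks set to the number of blocks in the internal buffer,
sketch_chacha20_keystream would refill it in one pass, which is the kind
of batch the vectorized variants are meant to speed up.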