From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/9] Add arc4random support
Date: Tue, 19 Apr 2022 18:28:03 -0300
Message-Id: <20220419212812.2688764-1-adhemerval.zanella@linaro.org>

This patch series adds arc4random, arc4random_buf, and arc4random_uniform, along with optimized versions for x86_64, aarch64, powerpc64, and s390x. The generic implementation is based on scalar ChaCha20, with a per-thread state cache allocated in the TCB.
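This cover letter does not say how arc4random_uniform avoids modulo bias. One well-known technique, the one used by OpenBSD's arc4random_uniform, is rejection sampling: discard raw 32-bit values below 2**32 % upper_bound, so the accepted range has a size that is an exact multiple of upper_bound. The sketch below assumes that technique; stdlib/arc4random_uniform.c in this series may use a different method. The names `uniform_sketch`, `u32_source`, and the `replay` demonstration source are hypothetical, and in a real implementation the 32-bit source would be arc4random itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source of uniformly distributed 32-bit values.  */
typedef uint32_t (*u32_source) (void *ctx);

/* Return a value uniformly distributed in [0, upper_bound) using rejection
   sampling.  The accepted range [min, 2**32) has a size that is a multiple
   of upper_bound, so the final modulo introduces no bias.  */
static uint32_t
uniform_sketch (u32_source next, void *ctx, uint32_t upper_bound)
{
  if (upper_bound < 2)
    return 0;                   /* Only one possible result.  */

  /* (2**32 - upper_bound) % upper_bound == 2**32 % upper_bound.  */
  uint32_t min = (0u - upper_bound) % upper_bound;

  for (;;)
    {
      uint32_t r = next (ctx);
      if (r >= min)
        return r % upper_bound;
    }
}

/* Demonstration source: replays a fixed list of values.  */
struct replay
{
  const uint32_t *vals;
  size_t len;
  size_t pos;
};

static uint32_t
replay_next (void *ctx)
{
  struct replay *r = ctx;
  assert (r->pos < r->len);     /* The demo list must not run out.  */
  return r->vals[r->pos++];
}
```

For example, with upper_bound = 3 the threshold is 2**32 % 3 = 1, so a raw value of 0 is rejected and the next value is reduced modulo 3.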
The internal state keeps a 256-byte buffer (8 ChaCha20 blocks) plus the cipher state, which allows better use of the vectorized implementations. It would be possible to use just 128 bytes, but that would require rewriting the AVX2 optimization (and would possibly lower performance slightly). The initial seeding and each reseed use getrandom, with /dev/urandom as a fallback, and the internal state is reseeded after every 16MB of consumed entropy.

There is no fork detection; the internal state is reset only in the atfork handler. Direct clone calls, vfork, and _Fork are not handled. Although it is lock-free, arc4random is still not async-signal-safe (the per-thread state is not updated atomically).

The generic ChaCha20 implementation is based on RFC8439 [1], without the last XOR step. Since the input stream will be either zero bytes (initial state) or the PRNG output itself, this step does not add any extra entropy.

The optimized ChaCha20 implementations for x86_64, aarch64, powerpc64, and s390x use vectorized instructions and are based on libgcrypt code.

ChaCha20 is used because it is the standard cipher in other arc4random implementations (the BSDs, Mac OS X) and, more recently, in the Linux random subsystem. It also offers a very cheap rekey, which periodically uses kernel entropy to improve randomness. It is also simpler than AES and performs better when no specialized instructions are available.

[1] https://sourceware.org/pipermail/libc-alpha/2018-June/094879.html

v3:
* Add per-thread cache to remove the lock usage. It should improve both performance and scalability.
* Improve benchmark precision.
* Fixed Hurd test build.

v2:
* Removed the last XOR operation from the ChaCha20 implementation (it does not add much for arc4random usage).
* Add tst-arc4random-chacha20.c and refactor to check against the expected implementation.
* Fixed aarch64 implementation (a last change to move symbols to hidden did not change the relocation to use it as well).
* Refactor x86 SSSE3 to SSE2.
* Fixed powerpc64 implementation on BE (use the correct macro to check for endianness instead of the ones from libgcrypt).
* Add s390x optimized ChaCha20 implementation.

Adhemerval Zanella (9):
  stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
  stdlib: Add arc4random tests
  benchtests: Add arc4random benchtest
  aarch64: Add optimized chacha20
  x86: Add SSE2 optimized chacha20
  x86: Add AVX2 optimized chacha20
  powerpc64: Add optimized chacha20
  s390x: Add optimized chacha20
  stdlib: Add TLS optimization to arc4random

 LICENSES | 22 +
 NEWS | 4 +-
 benchtests/Makefile | 6 +-
 benchtests/bench-arc4random.c | 224 +++++++
 include/stdlib.h | 13 +
 nptl/allocatestack.c | 5 +-
 posix/fork.c | 2 +
 stdlib/Makefile | 9 +
 stdlib/Versions | 5 +
 stdlib/arc4random.c | 178 ++++++
 stdlib/arc4random.h | 45 ++
 stdlib/arc4random_uniform.c | 148 +++++
 stdlib/chacha20.c | 164 +++++
 stdlib/stdlib.h | 14 +
 stdlib/tst-arc4random-chacha20.c | 166 ++++++
 stdlib/tst-arc4random-fork.c | 174 ++++++
 stdlib/tst-arc4random-stats.c | 146 +++++
 stdlib/tst-arc4random-thread.c | 278 +++++++++
 sysdeps/aarch64/Makefile | 4 +
 sysdeps/aarch64/chacha20-neon.S | 323 ++++++++++
 sysdeps/aarch64/chacha20_arch.h | 40 ++
 sysdeps/generic/chacha20_arch.h | 24 +
 sysdeps/generic/not-cancel.h | 2 +
 sysdeps/generic/tls-internal-struct.h | 3 +
 sysdeps/mach/hurd/i386/libc.abilist | 3 +
 sysdeps/mach/hurd/not-cancel.h | 3 +
 sysdeps/powerpc/powerpc64/Makefile | 3 +
 sysdeps/powerpc/powerpc64/chacha20-ppc.c | 236 ++++++++
 sysdeps/powerpc/powerpc64/chacha20_arch.h | 47 ++
 sysdeps/s390/s390-64/Makefile | 4 +
 sysdeps/s390/s390-64/chacha20-vx.S | 564 ++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h | 45 ++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 3 +
 .../sysv/linux/m68k/coldfire/libc.abilist | 3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist | 3 +
 .../sysv/linux/microblaze/be/libc.abilist | 3 +
 .../sysv/linux/microblaze/le/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/not-cancel.h | 7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist | 3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist | 3 +
 .../powerpc/powerpc32/nofpu/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/be/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/le/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc32/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/tls-internal.h | 27 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist | 3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist | 3 +
 sysdeps/x86_64/Makefile | 7 +
 sysdeps/x86_64/chacha20-avx2.S | 313 ++++++++++
 sysdeps/x86_64/chacha20-sse2.S | 311 ++++++++++
 sysdeps/x86_64/chacha20_arch.h | 48 ++
 71 files changed, 3711 insertions(+), 5 deletions(-)
 create mode 100644 benchtests/bench-arc4random.c
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random.h
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c
 create mode 100644 stdlib/tst-arc4random-chacha20.c
 create mode 100644 stdlib/tst-arc4random-fork.c
 create mode 100644 stdlib/tst-arc4random-stats.c
 create mode 100644 stdlib/tst-arc4random-thread.c
 create mode 100644 sysdeps/aarch64/chacha20-neon.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20_arch.h
 create mode 100644 sysdeps/s390/s390-64/chacha20-vx.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h
 create mode 100644 sysdeps/x86_64/chacha20-avx2.S
 create mode 100644 sysdeps/x86_64/chacha20-sse2.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h
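To illustrate the generic ChaCha20 path described earlier, here is a minimal scalar sketch of the RFC 8439 block function. It is not the code from stdlib/chacha20.c; the names `rotl32`, `chacha20_qr`, `chacha20_block`, and `chacha20_qr_selftest` are hypothetical. Each call yields one 64-byte block (16 words), so the 256-byte buffer corresponds to eight calls with consecutive counters. Because XOR with an all-zero input is the identity, the sketch returns the keystream directly, which matches the reasoning for dropping the final XOR step.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32).  */
static inline uint32_t
rotl32 (uint32_t v, int n)
{
  return (v << n) | (v >> (32 - n));
}

/* The ChaCha20 quarter round (RFC 8439, section 2.1).  */
static void
chacha20_qr (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Compute one keystream block as 16 words, with the state laid out as in
   RFC 8439, section 2.3: constants, 8 key words, counter, 3 nonce words.
   No plaintext XOR: the keystream itself is the output.  */
static void
chacha20_block (const uint32_t key[8], uint32_t counter,
                const uint32_t nonce[3], uint32_t out[16])
{
  const uint32_t in[16] = {
    0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, /* "expand 32-byte k" */
    key[0], key[1], key[2], key[3], key[4], key[5], key[6], key[7],
    counter, nonce[0], nonce[1], nonce[2]
  };
  uint32_t x[16];
  for (int i = 0; i < 16; i++)
    x[i] = in[i];

  for (int i = 0; i < 10; i++)  /* 20 rounds = 10 double rounds.  */
    {
      /* Column rounds.  */
      chacha20_qr (&x[0], &x[4], &x[8], &x[12]);
      chacha20_qr (&x[1], &x[5], &x[9], &x[13]);
      chacha20_qr (&x[2], &x[6], &x[10], &x[14]);
      chacha20_qr (&x[3], &x[7], &x[11], &x[15]);
      /* Diagonal rounds.  */
      chacha20_qr (&x[0], &x[5], &x[10], &x[15]);
      chacha20_qr (&x[1], &x[6], &x[11], &x[12]);
      chacha20_qr (&x[2], &x[7], &x[8], &x[13]);
      chacha20_qr (&x[3], &x[4], &x[9], &x[14]);
    }

  /* Feed-forward: add the input state to the permuted state.  */
  for (int i = 0; i < 16; i++)
    out[i] = x[i] + in[i];
}

/* Check the quarter round against RFC 8439, section 2.1.1.  */
static void
chacha20_qr_selftest (void)
{
  uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;
  chacha20_qr (&a, &b, &c, &d);
  assert (a == 0xea2a92f4 && b == 0xcb1cf8ce);
  assert (c == 0x4581472e && d == 0x5881c4bb);
}
```

The vectorized versions compute several such blocks in parallel, which is why a multi-block buffer suits them better than a single block.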
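The buffering and reseeding policy described earlier (a 256-byte buffer of 8 blocks, reseeded after 16MB of output) can be sketched as below, taking 16MB as 16 * 1024 * 1024 bytes. All names (`struct rng_state`, `rng_fill`, `rng_reseed`, `rng_refill`, `rng_u32`, `thread_rng`) are hypothetical, and the refill and reseed functions are deliberate placeholders: the refill writes a byte counter instead of running ChaCha20, and the reseed does not call getrandom. The sketch only shows the bookkeeping and is not a usable random number generator. The series keeps this state in the TCB; C11 `_Thread_local` is used here as the closest portable analogue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum
{
  BLOCK_SIZE = 64,                          /* One ChaCha20 block.  */
  BUFFER_BLOCKS = 8,
  BUFFER_SIZE = BLOCK_SIZE * BUFFER_BLOCKS, /* 256 bytes.  */
  RESEED_LIMIT = 16 * 1024 * 1024           /* Output allowed per seed.  */
};

/* A zero-initialized state means "not yet seeded".  */
struct rng_state
{
  uint8_t buf[BUFFER_SIZE];
  size_t avail;          /* Unread bytes at the end of buf.  */
  size_t until_reseed;   /* Bytes that may be output before reseeding.  */
  uint8_t placeholder;   /* Stand-in generator state (NOT a cipher).  */
  unsigned reseeds;      /* Counts reseeds, for illustration.  */
};

/* PLACEHOLDER: a real version would rekey from getrandom here.  */
static void
rng_reseed (struct rng_state *s)
{
  s->reseeds++;
  s->until_reseed = RESEED_LIMIT;
  s->avail = 0;          /* Drop output produced under the old seed.  */
}

/* PLACEHOLDER: a real version would generate 8 ChaCha20 blocks here.  */
static void
rng_refill (struct rng_state *s)
{
  for (size_t i = 0; i < BUFFER_SIZE; i++)
    s->buf[i] = s->placeholder++;
  s->avail = BUFFER_SIZE;
}

/* Copy n bytes of output into out, refilling and reseeding as needed.  */
static void
rng_fill (struct rng_state *s, void *out, size_t n)
{
  uint8_t *p = out;
  while (n > 0)
    {
      if (s->until_reseed == 0)
        rng_reseed (s);
      if (s->avail == 0)
        rng_refill (s);
      assert (s->avail <= BUFFER_SIZE);

      size_t take = n;
      if (take > s->avail)
        take = s->avail;
      if (take > s->until_reseed)
        take = s->until_reseed;

      memcpy (p, s->buf + (BUFFER_SIZE - s->avail), take);
      s->avail -= take;
      s->until_reseed -= take;
      p += take;
      n -= take;
    }
}

/* One state per thread, so no lock is needed.  */
static _Thread_local struct rng_state thread_rng;

static uint32_t
rng_u32 (void)
{
  uint32_t v;
  rng_fill (&thread_rng, &v, sizeof v);
  return v;
}
```

Per-thread state is what makes this path lock-free, and it is also why such a function is not async-signal-safe: a signal handler that calls it on the same thread can interrupt `rng_fill` between updating the buffer and updating the counters.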
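Because there is no fork detection, the cached state has to be reset when the process forks; otherwise parent and child would hand out the same buffered bytes. The series does this in libc's own atfork handling. Application code that keeps a similar per-thread cache can get a comparable effect with the public pthread_atfork interface, as in this sketch (`struct gen_state`, `thread_gen`, `gen_atfork_child`, and `gen_install_fork_handler` are hypothetical). As the text notes, this approach does not cover raw clone calls, vfork, or _Fork, none of which run atfork handlers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-thread generator cache.  */
struct gen_state
{
  bool seeded;
  unsigned char buf[256];
  size_t avail;
};

static _Thread_local struct gen_state thread_gen;

/* Runs in the child after fork, on the thread that called fork (the only
   thread in the child).  Clearing the state forces a reseed, so the child
   never replays bytes the parent may also return.  */
static void
gen_atfork_child (void)
{
  memset (&thread_gen, 0, sizeof thread_gen);
}

/* Register the child handler; returns 0 on success, like pthread_atfork.  */
static int
gen_install_fork_handler (void)
{
  return pthread_atfork (NULL, NULL, gen_atfork_child);
}
```

The parent's state is left untouched, since the handler is registered only for the child side.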