From patchwork Wed Apr 13 20:23:55 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 52876
To: libc-alpha@sourceware.org
Subject: [PATCH 1/7] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
Date: Wed, 13 Apr 2022 17:23:55 -0300
Message-Id: <20220413202401.408267-2-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella
Cc: Florian Weimer

The implementation is based on scalar Chacha20, with global cache and locking.
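For readers unfamiliar with the cipher core, here is a minimal standalone sketch of the ChaCha20 quarter round from RFC 8439 (section 2.1), the scalar step that such an implementation repeats over its 16-word state. It is not code taken from this patch: the names rotl32, quarter_round, and check_quarter_round, and their argument order, are illustrative assumptions. The check uses the test vector from RFC 8439 section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by SHIFT bits (0 < SHIFT < 32).  */
static inline uint32_t
rotl32 (uint32_t word, unsigned int shift)
{
  return (word << shift) | (word >> (32 - shift));
}

/* One ChaCha20 quarter round on four state words (RFC 8439, section 2.1).  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d = rotl32 (*d ^ *a, 16);
  *c += *d; *b = rotl32 (*b ^ *c, 12);
  *a += *b; *d = rotl32 (*d ^ *a, 8);
  *c += *d; *b = rotl32 (*b ^ *c, 7);
}

/* Hypothetical self-check (not part of the patch): run the quarter round
   on the input words of the RFC 8439 section 2.1.1 test vector and
   compare against the output words listed there.  */
static void
check_quarter_round (void)
{
  uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;
  quarter_round (&a, &b, &c, &d);
  assert (a == 0xea2a92f4);
  assert (b == 0xcb1cf8ce);
  assert (c == 0x4581472e);
  assert (d == 0x5881c4bb);
}
```

Calling check_quarter_round () from a small test program should return without tripping an assertion. A full ChaCha20 block applies this step to the four columns and then the four diagonals of the state, for ten such double rounds.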
It uses getrandom, with /dev/urandom as a fallback, to get the initial
entropy, and reseeds the internal state after every 16 MB of consumed
buffer.  It maintains an internal buffer that consumes at most one page
on most systems (assuming a minimum page size of 4k).  The internal
buffer optimizes the cipher encrypt calls by amortizing arc4random calls
(where both the function call and lock costs are the dominating factors).

The ChaCha20 implementation is based on RFC 8439 [1], with a simple
memcpy-with-xor implementation.

The arc4random_uniform is based on previous work by Florian Weimer.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer

[1] https://datatracker.ietf.org/doc/html/rfc8439
---
 NEWS | 4 +-
 include/stdlib.h | 13 +
 posix/fork.c | 2 +
 stdlib/Makefile | 2 +
 stdlib/Versions | 5 +
 stdlib/arc4random.c | 242 ++++++++++++++++++
 stdlib/arc4random_uniform.c | 152 +++++++++++
 stdlib/chacha20.c | 211 +++++++++++++++
 stdlib/stdlib.h | 14 +
 sysdeps/generic/not-cancel.h | 2 +
 sysdeps/mach/hurd/i386/libc.abilist | 3 +
 sysdeps/mach/hurd/not-cancel.h | 3 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 3 +
 .../sysv/linux/m68k/coldfire/libc.abilist | 3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist | 3 +
 .../sysv/linux/microblaze/be/libc.abilist | 3 +
 .../sysv/linux/microblaze/le/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/not-cancel.h | 7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist | 3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist | 3 +
 .../powerpc/powerpc32/nofpu/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/be/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/le/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc32/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc64/libc.abilist | 3 +
 .../unix/sysv/linux/x86_64/64/libc.abilist | 3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist | 3 +
 46 files changed, 758 insertions(+), 1 deletion(-)
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c

diff --git a/NEWS b/NEWS
index 4b6d9de2b5..4d9d95b35b 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,9 @@ Version 2.36
 
 Major new features:
 
-  [Add new features here]
+* The functions arc4random, arc4random_buf, arc4random_uniform have been
+  added.  The functions use a cryptographic pseudo-random number generator
+  based on ChaCha20 initialized with entropy from the kernel.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/include/stdlib.h b/include/stdlib.h
index 1c6f70b082..055f9d2965 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -144,6 +144,19 @@ libc_hidden_proto (__ptsname_r)
 libc_hidden_proto (grantpt)
 libc_hidden_proto (unlockpt)
 
+__typeof (arc4random) __arc4random;
+libc_hidden_proto (__arc4random);
+__typeof (arc4random_buf) __arc4random_buf;
+libc_hidden_proto (__arc4random_buf);
+__typeof (arc4random_uniform) __arc4random_uniform;
+libc_hidden_proto (__arc4random_uniform);
+extern void __arc4random_buf_internal (void *buffer, size_t len)
+     attribute_hidden;
+/* Called from the fork function to reinitialize the internal lock in the
+   child process.  This avoids deadlocks if fork is called in multi-threaded
+   processes.  */
+extern void __arc4random_fork_subprocess (void) attribute_hidden;
+
 extern double __strtod_internal (const char *__restrict __nptr,
                                  char **__restrict __endptr, int __group)
      __THROW __nonnull ((1)) __wur;
diff --git a/posix/fork.c b/posix/fork.c
index 6b50c091f9..87d8329b46 100644
--- a/posix/fork.c
+++ b/posix/fork.c
@@ -96,6 +96,8 @@ __libc_fork (void)
                                   &nss_database_data);
         }
 
+      call_function_static_weak (__arc4random_fork_subprocess);
+
       /* Reset the lock the dynamic loader uses to protect its data.
        */
       __rtld_lock_initialize (GL(dl_load_lock));
 
diff --git a/stdlib/Makefile b/stdlib/Makefile
index 60fc59c12c..9f9cc1bd7f 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -53,6 +53,8 @@ routines := \
   a64l \
   abort \
   abs \
+  arc4random \
+  arc4random_uniform \
   at_quick_exit \
   atof \
   atoi \
diff --git a/stdlib/Versions b/stdlib/Versions
index 5e9099a153..d09a308fb5 100644
--- a/stdlib/Versions
+++ b/stdlib/Versions
@@ -136,6 +136,11 @@ libc {
     strtof32; strtof64; strtof32x;
     strtof32_l; strtof64_l; strtof32x_l;
   }
+  GLIBC_2.36 {
+    arc4random;
+    arc4random_buf;
+    arc4random_uniform;
+  }
   GLIBC_PRIVATE {
     # functions which have an additional interface since they are
     # are cancelable.
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
new file mode 100644
index 0000000000..6653986cc4
--- /dev/null
+++ b/stdlib/arc4random.c
@@ -0,0 +1,242 @@
+/* Pseudo Random Number Generator based on ChaCha20.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+/* Besides the cipher state 'ctx', it keeps two counters: 'have' is the number
+   of valid bytes not yet consumed in 'buf', while 'count' is the maximum
+   number of bytes until a reseed.
+
+   Both the initial seed and the reseed try to obtain entropy from the kernel
+   and abort the process if none could be obtained.
+
+   The state 'buf' improves the usage of the cipher call, allowing optimized
+   implementations to be called (if the architecture provides them) and
+   optimizing arc4random calls (the next block is encrypted only after
+   multiple calls).  */
+
+struct arc4random_state
+{
+  struct chacha20_state ctx;
+  size_t have;
+  size_t count;
+  uint8_t buf[CHACHA20_BUFSIZE];
+} *state;
+
+/* Indicate that MADV_WIPEONFORK is supported by the kernel and thus
+   the internal state does not need to be cleared on fork.  */
+static bool __arc4random_wipeonfork = false;
+
+__libc_lock_define_initialized (, arc4random_lock);
+
+/* Maximum number of bytes until reseed (16 MB).  */
+#define CHACHE_RESEED_SIZE (16 * 1024 * 1024)
+
+/* Called from the fork function to reset the state if MADV_WIPEONFORK is
+   not supported and to reinit the internal lock.  */
+void
+__arc4random_fork_subprocess (void)
+{
+  if (!__arc4random_wipeonfork && state != NULL)
+    memset (state, 0, sizeof (struct arc4random_state));
+
+  __libc_lock_init (arc4random_lock);
+}
+
+static void
+arc4random_allocate_failure (void)
+{
+  __libc_fatal ("Fatal glibc error: Cannot allocate memory for arc4random\n");
+}
+
+static void
+arc4random_getrandom_failure (void)
+{
+  __libc_fatal ("Fatal glibc error: Cannot get entropy for arc4random\n");
+}
+
+/* Fork detection is done by checking if MADV_WIPEONFORK is supported.  If not
+   the fork callback will reset the state on the fork call.  It does not
+   handle direct clone calls, nor vfork or _Fork (arc4random is not
+   async-signal-safe due to the internal lock usage).
+   */
+static void
+arc4random_init (uint8_t *buf, size_t len)
+{
+  state = __mmap (NULL, sizeof (struct arc4random_state),
+                  PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+  if (state == MAP_FAILED)
+    arc4random_allocate_failure ();
+
+#ifdef MADV_WIPEONFORK
+  int r = __madvise (state, sizeof (struct arc4random_state), MADV_WIPEONFORK);
+  if (r == 0)
+    __arc4random_wipeonfork = true;
+  else if (errno != EINVAL)
+    arc4random_allocate_failure ();
+#endif
+
+  chacha20_init (&state->ctx, buf, buf + CHACHA20_KEY_SIZE);
+}
+
+#define min(x,y) (((x) > (y)) ? (y) : (x))
+
+static void
+arc4random_rekey (uint8_t *rnd, size_t rndlen)
+{
+  memset (state->buf, 0, sizeof state->buf);
+  chacha20_crypt (&state->ctx, state->buf, state->buf, sizeof state->buf);
+
+  /* Mix some extra entropy if provided.  */
+  if (rnd != NULL)
+    {
+      size_t m = min (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+      for (size_t i = 0; i < m; i++)
+        state->buf[i] ^= rnd[i];
+    }
+
+  /* Immediately reinit for backtracking resistance.  */
+  chacha20_init (&state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
+  memset (state->buf, 0, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+}
+
+static void
+arc4random_getentropy (uint8_t *rnd, size_t len)
+{
+  if (__getrandomn_nocancel (rnd, len, GRND_NONBLOCK) == len)
+    return;
+
+  int fd = __open64_nocancel ("/dev/urandom", O_RDONLY);
+  if (fd != -1)
+    {
+      unsigned char *p = rnd;
+      unsigned char *end = p + len;
+      do
+        {
+          ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
+          if (ret <= 0)
+            arc4random_getrandom_failure ();
+          p += ret;
+        }
+      while (p < end);
+
+      /* The buffer was filled; succeed once the descriptor is closed.  */
+      if (__close_nocancel (fd) == 0)
+        return;
+    }
+  arc4random_getrandom_failure ();
+}
+
+/* Either allocates the state buffer or reinit it by reseeding the cipher
+   state with kernel entropy.
*/ +static void +arc4random_stir (void) +{ + uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE]; + arc4random_getentropy (rnd, sizeof rnd); + + if (state == NULL) + arc4random_init (rnd, sizeof rnd); + else + arc4random_rekey (rnd, sizeof rnd); + + explicit_bzero (rnd, sizeof rnd); + + state->have = 0; + memset (state->buf, 0, sizeof state->buf); + state->count = CHACHE_RESEED_SIZE; +} + +static void +arc4random_check_stir (size_t len) +{ + if (state == NULL || state->count < len) + arc4random_stir (); + if (state->count <= len) + state->count = 0; + else + state->count -= len; +} + +void +__arc4random_buf_internal (void *buffer, size_t len) +{ + arc4random_check_stir (len); + + while (len > 0) + { + if (state->have > 0) + { + size_t m = min (len, state->have); + uint8_t *ks = state->buf + sizeof (state->buf) - state->have; + memcpy (buffer, ks, m); + memset (ks, 0, m); + buffer += m; + len -= m; + state->have -= m; + } + if (state->have == 0) + arc4random_rekey (NULL, 0); + } +} + +void +__arc4random_buf (void *buffer, size_t len) +{ + __libc_lock_lock (arc4random_lock); + __arc4random_buf_internal (buffer, len); + __libc_lock_unlock (arc4random_lock); +} +libc_hidden_def (__arc4random_buf) +weak_alias (__arc4random_buf, arc4random_buf) + + +static uint32_t +__arc4random_internal (void) +{ + uint32_t r; + + arc4random_check_stir (sizeof (uint32_t)); + if (state->have < sizeof (uint32_t)) + arc4random_rekey (NULL, 0); + uint8_t *ks = state->buf + sizeof (state->buf) - state->have; + memcpy (&r, ks, sizeof (uint32_t)); + memset (ks, 0, sizeof (uint32_t)); + state->have -= sizeof (uint32_t); + + return r; +} + +uint32_t +__arc4random (void) +{ + uint32_t r; + __libc_lock_lock (arc4random_lock); + r = __arc4random_internal (); + __libc_lock_unlock (arc4random_lock); + return r; +} +libc_hidden_def (__arc4random) +weak_alias (__arc4random, arc4random) diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c new file mode 100644 index 0000000000..0cc919d8e1 
--- /dev/null
+++ b/stdlib/arc4random_uniform.c
@@ -0,0 +1,152 @@
+/* Pseudo-random numbers between 0 and 2**32-1 (inclusive), uniformly
+   distributed but with an upper_bound.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+
+/* Return the number of bytes which cover values up to the limit.  */
+__attribute__ ((const))
+static uint32_t
+byte_count (uint32_t n)
+{
+  if (n <= (1U << 8))
+    return 1;
+  else if (n <= (1U << 16))
+    return 2;
+  else if (n <= (1U << 24))
+    return 3;
+  else
+    return 4;
+}
+
+/* Fill the lower bits of the result with randomness, according to the
+   number of bytes requested.  */
+static void
+random_bytes (uint32_t *result, uint32_t byte_count)
+{
+  *result = 0;
+  unsigned char *ptr = (unsigned char *) result;
+  if (__BYTE_ORDER == __BIG_ENDIAN)
+    ptr += 4 - byte_count;
+  __arc4random_buf_internal (ptr, byte_count);
+}
+
+static uint32_t
+compute_uniform (uint32_t n)
+{
+  if (n <= 1)
+    /* There is no valid return value for a zero limit, and 0 is the
+       only possible result for limit 1.  */
+    return 0;
+
+  /* The bits variable serves as a source for bits.  Prefetch the
+     minimum number of bytes needed.
+     */
+  unsigned count = byte_count (n);
+  uint32_t bits_length = count * CHAR_BIT;
+  uint32_t bits;
+  random_bytes (&bits, count);
+
+  /* Powers of two are easy.  */
+  if (powerof2 (n))
+    return bits & (n - 1);
+
+  /* The general case.  This algorithm follows Jérémie Lumbroso,
+     Optimal Discrete Uniform Generation from Coin Flips, and
+     Applications (2013), who credits Donald E. Knuth and Andrew
+     C. Yao, The complexity of nonuniform random number generation
+     (1976), for solving the general case.
+
+     The implementation below unrolls the initialization stage of the
+     loop, where v is less than n.  */
+
+  /* Use 64-bit variables even though the intermediate results are
+     never larger than 33 bits.  This makes the code easier to
+     compile on 64-bit architectures.  */
+  uint64_t v;
+  uint64_t c;
+
+  /* Initialize v and c.  v is the smallest power of 2 which is larger
+     than n.  */
+  {
+    uint32_t log2p1 = 32 - __builtin_clz (n);
+    v = 1ULL << log2p1;
+    c = bits & (v - 1);
+    bits >>= log2p1;
+    bits_length -= log2p1;
+  }
+
+  /* At the start of the loop, c is uniformly distributed within the
+     half-open interval [0, v), and v < 2n < 2**33.  */
+  while (true)
+    {
+      if (v >= n)
+        {
+          /* If the candidate is less than n, accept it.  */
+          if (c < n)
+            /* c is uniformly distributed on [0, n).  */
+            return c;
+          else
+            {
+              /* c is uniformly distributed on [n, v).  */
+              v -= n;
+              c -= n;
+              /* The distribution was shifted, so c is uniformly
+                 distributed on [0, v) again.  */
+            }
+        }
+      /* v < n here.  */
+
+      /* Replenish the bit source if necessary.  */
+      if (bits_length == 0)
+        {
+          /* Overwrite the least significant byte.  */
+          random_bytes (&bits, 1);
+          bits_length = CHAR_BIT;
+        }
+
+      /* Double the range.  No overflow because v < n < 2**32.  */
+      v *= 2;
+      /* v < 2n here.  */
+
+      /* Extract a bit and append it to c.  c remains less than v and
+         thus 2**33.
*/ + c = (c << 1) | (bits & 1); + bits >>= 1; + --bits_length; + + /* At this point, c is uniformly distributed on [0, v) again, + and v < 2n < 2**33. */ + } +} + +__libc_lock_define (extern , arc4random_lock attribute_hidden) + +uint32_t +__arc4random_uniform (uint32_t upper_bound) +{ + uint32_t r; + __libc_lock_lock (arc4random_lock); + r = compute_uniform (upper_bound); + __libc_lock_unlock (arc4random_lock); + return r; +} +libc_hidden_def (__arc4random_uniform) +weak_alias (__arc4random_uniform, arc4random_uniform) diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c new file mode 100644 index 0000000000..dbd87bd942 --- /dev/null +++ b/stdlib/chacha20.c @@ -0,0 +1,211 @@ +/* Generic ChaCha20 implementation (used on arc4random). + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include + +/* 32-bit stream position, then 96-bit nonce. 
*/ +#define CHACHA20_IV_SIZE 16 +#define CHACHA20_KEY_SIZE 32 +#define CHACHA20_KEY_WORDS (CHACHA20_KEY_SIZE / sizeof (uint32_t)) + +#define CHACHA20_BLOCK_SIZE 64 +#define CHACHA20_BUFSIZE (16 * CHACHA20_BLOCK_SIZE) + +#define CHACHA20_STATE_LEN 16 + +enum chacha20_constants +{ + CHACHA20_CONSTANT_EXPA = 0x61707865U, + CHACHA20_CONSTANT_ND_3 = 0x3320646eU, + CHACHA20_CONSTANT_2_BY = 0x79622d32U, + CHACHA20_CONSTANT_TE_K = 0x6b206574U +}; + +struct chacha20_state +{ + uint32_t ctx[CHACHA20_STATE_LEN]; +}; + +#define READ_UNALIGNED_FUNC(type) \ + static inline uint##type##_t \ + read_unaligned_##type (const uint8_t *p) \ + { \ + uint##type##_t r; \ + memcpy (&r, p, sizeof (r)); \ + return r; \ + } +READ_UNALIGNED_FUNC(16) +READ_UNALIGNED_FUNC(32) +READ_UNALIGNED_FUNC(64) + +#define WRITE_UNALIGNED_FUNC(type) \ + static inline void \ + write_unaligned_##type (uint8_t *p, uint##type##_t v) \ + { \ + memcpy (p, &v, sizeof (v)); \ + } +WRITE_UNALIGNED_FUNC(16) +WRITE_UNALIGNED_FUNC(32) +WRITE_UNALIGNED_FUNC(64) + +static inline uint32_t +read_unaligned_le32 (const uint8_t *p) +{ + uint32_t v = read_unaligned_32 (p); +#if __BYTE_ORDER == __BIG_ENDIAN + return __builtin_bswap32 (v); +#else + return v; +#endif +} + +static inline void +write_unaligned_le32 (uint8_t *p, uint32_t v) +{ +#if __BYTE_ORDER == __BIG_ENDIAN + v = __builtin_bswap32 (v); +#endif + write_unaligned_32 (p, v); +} + +static inline void +chacha20_init (struct chacha20_state *s, const uint8_t *key, const uint8_t *iv) +{ + s->ctx[0] = CHACHA20_CONSTANT_EXPA; + s->ctx[1] = CHACHA20_CONSTANT_ND_3; + s->ctx[2] = CHACHA20_CONSTANT_2_BY; + s->ctx[3] = CHACHA20_CONSTANT_TE_K; + + s->ctx[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t)); + s->ctx[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t)); + s->ctx[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t)); + s->ctx[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t)); + s->ctx[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t)); + s->ctx[9] = 
+    read_unaligned_le32 (key + 5 * sizeof (uint32_t));
+  s->ctx[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
+  s->ctx[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
+
+  s->ctx[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
+  s->ctx[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
+  s->ctx[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
+  s->ctx[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
+}
+
+static inline uint32_t
+rotl32 (unsigned int shift, uint32_t word)
+{
+  return (word << (shift & 31)) | (word >> ((-shift) & 31));
+}
+
+#define QROUND(x0, x1, x2, x3) \
+  do { \
+   x0 = x0 + x1; x3 = rotl32 (16, (x0 ^ x3)); \
+   x2 = x2 + x3; x1 = rotl32 (12, (x1 ^ x2)); \
+   x0 = x0 + x1; x3 = rotl32 (8, (x0 ^ x3)); \
+   x2 = x2 + x3; x1 = rotl32 (7, (x1 ^ x2)); \
+  } while(0)
+
+static inline void
+chacha20_block (uint32_t *state, uint8_t *stream)
+{
+  uint32_t x[CHACHA20_STATE_LEN];
+  memcpy (x, state, sizeof x);
+
+  for (int i = 0; i < 20; i += 2)
+    {
+      QROUND(x[0], x[4], x[8], x[12]);
+      QROUND(x[1], x[5], x[9], x[13]);
+      QROUND(x[2], x[6], x[10], x[14]);
+      QROUND(x[3], x[7], x[11], x[15]);
+
+      QROUND(x[0], x[5], x[10], x[15]);
+      QROUND(x[1], x[6], x[11], x[12]);
+      QROUND(x[2], x[7], x[8], x[13]);
+      QROUND(x[3], x[4], x[9], x[14]);
+    }
+
+  for (int i = 0; i < CHACHA20_STATE_LEN; i++)
+    {
+      uint32_t v = x[i] + state[i];
+      write_unaligned_le32 (&stream[i * sizeof (uint32_t)], v);
+    }
+
+  state[12]++;
+}
+
+static void
+memxorcpy (uint8_t *dst, const uint8_t *src1, const uint8_t *src2, size_t len)
+{
+  while (len >= 8)
+    {
+      uint64_t l = read_unaligned_64 (src1) ^ read_unaligned_64 (src2);
+      write_unaligned_64 (dst, l);
+      dst += 8;
+      src1 += 8;
+      src2 += 8;
+      len -= 8;
+    }
+
+  if (len >= 4)
+    {
+      uint32_t l = read_unaligned_32 (src1) ^ read_unaligned_32 (src2);
+      write_unaligned_32 (dst, l);
+      dst += 4;
+      src1 += 4;
+      src2 += 4;
+      len -= 4;
+    }
+
+  if (len >= 2)
+    {
+      uint16_t l = read_unaligned_16 (src1) ^ read_unaligned_16 (src2);
+      write_unaligned_16 (dst, l);
+      dst += 2;
+      src1 += 2;
+      src2 += 2;
+      len -= 2;
+    }
+
+  if (len >= 1)
+    *dst++ = *src1++ ^ *src2++;
+}
+
+static void
+chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  uint8_t stream[CHACHA20_BLOCK_SIZE];
+
+  while (bytes >= CHACHA20_BLOCK_SIZE)
+    {
+      chacha20_block (state->ctx, stream);
+      memxorcpy (dst, src, stream, CHACHA20_BLOCK_SIZE);
+      bytes -= CHACHA20_BLOCK_SIZE;
+      dst += CHACHA20_BLOCK_SIZE;
+      src += CHACHA20_BLOCK_SIZE;
+    }
+  if (bytes != 0)
+    {
+      chacha20_block (state->ctx, stream);
+      memxorcpy (dst, src, stream, bytes);
+    }
+}
diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h
index bf7cd438e1..f2b0c83c12 100644
--- a/stdlib/stdlib.h
+++ b/stdlib/stdlib.h
@@ -485,6 +485,7 @@ extern unsigned short int *seed48 (unsigned short int __seed16v[3])
 extern void lcong48 (unsigned short int __param[7]) __THROW __nonnull ((1));
 
 # ifdef __USE_MISC
+# include
 /* Data structure for communication with thread safe versions.  This
    type is to be regarded as opaque.  It's only exported because users
    have to allocate objects of this type.  */
@@ -533,6 +534,19 @@ extern int seed48_r (unsigned short int __seed16v[3],
 extern int lcong48_r (unsigned short int __param[7],
                       struct drand48_data *__buffer)
      __THROW __nonnull ((1, 2));
+
+/* Return a random integer between zero and 2**32-1 (inclusive).  */
+extern uint32_t arc4random (void)
+     __THROW __wur;
+
+/* Fill the buffer with random data.  */
+extern void arc4random_buf (void *__buf, size_t __size)
+     __THROW __nonnull ((1));
+
+/* Return a random number between zero (inclusive) and the specified
+   limit (exclusive).  */
+extern uint32_t arc4random_uniform (uint32_t __upper_bound)
+     __THROW __wur;
 # endif /* Use misc.  */
 #endif /* Use misc or X/Open.
*/ diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h index 2104efeb54..f4882a9ffd 100644 --- a/sysdeps/generic/not-cancel.h +++ b/sysdeps/generic/not-cancel.h @@ -48,5 +48,7 @@ (void) __writev (fd, iov, n) #define __fcntl64_nocancel(fd, cmd, ...) \ __fcntl64 (fd, cmd, __VA_ARGS__) +#define __getrandomn_nocancel(buf, size, flags) \ + __getrandom (buf, size, flags) #endif /* NOT_CANCEL_H */ diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist index 4dc87e9061..7bd565103b 100644 --- a/sysdeps/mach/hurd/i386/libc.abilist +++ b/sysdeps/mach/hurd/i386/libc.abilist @@ -2289,6 +2289,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 close_range F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h index 6ec92ced84..39edfe76b6 100644 --- a/sysdeps/mach/hurd/not-cancel.h +++ b/sysdeps/mach/hurd/not-cancel.h @@ -74,6 +74,9 @@ __typeof (__fcntl) __fcntl_nocancel; #define __fcntl64_nocancel(...) 
\ __fcntl_nocancel (__VA_ARGS__) +#define __getrandomn_nocancel(buf, size, flags) \ + __getrandom (buf, size, flags) + #if IS_IN (libc) hidden_proto (__close_nocancel) hidden_proto (__close_nocancel_nostatus) diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist index 1b63d9e447..f8f38bb205 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist @@ -2616,3 +2616,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist index e7e4cf7d2a..9de1726de0 100644 --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist @@ -2713,6 +2713,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist index bc3d228e31..16e2532838 100644 --- a/sysdeps/unix/sysv/linux/arc/libc.abilist +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist @@ -2377,3 +2377,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist index db7039c4ab..ae9e465088 100644 --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist @@ -496,6 +496,9 @@ GLIBC_2.35 __memcmpeq F 
GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist index d2add4fb49..b669f43194 100644 --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist @@ -493,6 +493,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist index 355d72a30c..42daa90248 100644 --- a/sysdeps/unix/sysv/linux/csky/libc.abilist +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist @@ -2652,3 +2652,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist index 3df39bb28c..090be20f53 100644 --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist @@ -2601,6 +2601,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist index c4da358f80..6b7cf064bb 
100644 --- a/sysdeps/unix/sysv/linux/i386/libc.abilist +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist @@ -2785,6 +2785,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist index 241bac70ea..3e766f64dd 100644 --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist @@ -2551,6 +2551,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist index 78bf372b72..c0b99199a8 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist @@ -497,6 +497,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0x98 GLIBC_2.4 _IO_2_1_stdin_ D 0x98 diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist index 00df5c901f..4d0be7c86d 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist @@ -2728,6 +2728,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 
arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist index e8118569c3..b944680ede 100644 --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist @@ -2701,3 +2701,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist index c0d2373e64..28f7d19983 100644 --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist @@ -2698,3 +2698,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist index 2d0fd04f54..3da7cdaca5 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist @@ -2693,6 +2693,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist index e39ccfb312..9fe87f15be 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist 
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist @@ -2691,6 +2691,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist index 1e900f86e4..c14fca2111 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist @@ -2699,6 +2699,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist index 9145ba7931..a363830226 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist @@ -2602,6 +2602,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist index e95d60d926..89b6f98667 100644 --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist @@ -2740,3 +2740,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random 
F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 75b9e0ee1e..be5df35927 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -67,6 +67,13 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) INTERNAL_SYSCALL_CALL (writev, fd, iov, iovcnt); } +static inline int +__getrandomn_nocancel (void *buf, size_t buflen, unsigned int flags) +{ + return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); +} + + /* Uncancelable fcntl. */ __typeof (__fcntl) __fcntl64_nocancel; diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist index ca934e374b..94c0ff9526 100644 --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist @@ -2123,3 +2123,6 @@ GLIBC_2.35 wprintf F GLIBC_2.35 write F GLIBC_2.35 writev F GLIBC_2.35 wscanf F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist index 3820b9f235..d6188de00b 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist @@ -2755,6 +2755,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist index 464dc27fcd..8201230059 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist @@ -2788,6 
+2788,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist index 2f7e58747f..623505d783 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist @@ -2510,6 +2510,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist index 4f3043d913..23b0d83408 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist @@ -2812,3 +2812,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist index 84b6ac815a..a72e8ed9cc 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist @@ -2379,3 +2379,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git 
a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist index 4d5c19c56a..f3faecc2ae 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist @@ -2579,3 +2579,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist index 7c5ee8d569..105e5a9231 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist @@ -2753,6 +2753,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist index 50de0b46cf..c08c6c8301 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist @@ -2547,6 +2547,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist index 66fba013ca..8ec1005644 100644 --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist @@ -2608,6 +2608,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 
posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist index 38703f8aa0..5d776576f9 100644 --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist @@ -2605,6 +2605,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist index 6df55eb765..f5f07f612e 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist @@ -2748,6 +2748,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 _IO_fprintf F GLIBC_2.4 _IO_printf F GLIBC_2.4 _IO_sprintf F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist index b90569d881..be687ebe02 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist @@ -2574,6 +2574,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git 
a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist index e88b0f101f..7f456fbb55 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist @@ -2525,6 +2525,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.4 __confstr_chk F GLIBC_2.4 __fgets_chk F GLIBC_2.4 __fgets_unlocked_chk F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist index e0755272eb..c737201248 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist @@ -2631,3 +2631,6 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F From patchwork Wed Apr 13 20:23:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 52877 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 42DE7385801E for ; Wed, 13 Apr 2022 20:26:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 42DE7385801E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1649881560; bh=/Aer3o162IVmRNJgcgrGwPrt2jRaYdBnvvExZcx4yyY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=JueCVt/K0uLm9JY4SDf0C9SGWomvhGNWknJ1FQl6dKTzt17UzS1plkBSpvZ2neYad 
J5dPmq6+1ACNgJ/+Dv55vkaGBiXONIAX5W8yEMT0/SdqaV/zjhJNs2HQPC8dnKjqgG ZKdz8hM9XDyHpr7oaH5rqzv8GyynUCwqT8B7wTiY= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oa1-x2f.google.com (mail-oa1-x2f.google.com [IPv6:2001:4860:4864:20::2f]) by sourceware.org (Postfix) with ESMTPS id C32EC3857836 for ; Wed, 13 Apr 2022 20:24:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C32EC3857836 Received: by mail-oa1-x2f.google.com with SMTP id 586e51a60fabf-dacc470e03so3212158fac.5 for ; Wed, 13 Apr 2022 13:24:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/Aer3o162IVmRNJgcgrGwPrt2jRaYdBnvvExZcx4yyY=; b=nGT29S5892bllbLoliU/IoyZlGKHb+T4Pt4sP9iL7cM823uIaTCiz9ktlm/5rrrhkj 9fAJgBN0prOk/mEh4D9PZKLhOlATShZZDEHOYXRZd8cjVPksrcuIpO8VLS456YdM0BKY JdwM9t3H6CHrG6jIRQChqHFhCGhajnNAz/2bLkRxcNEBxPmJjbb5F0ylgjpb3Tvt8qkR k1Roz73XqM1RofC7Ij26boW70jvU9uKEXA4rthdnztotklEoaW8ot0urnZ76hOVy8Kmg jXAeGY6MxFP9y07tYV1R6ruQrtV5DjcuHaRrAs5l6SX5WHb0m1AoF2EOhuJl+NRNGtzO vpaA== X-Gm-Message-State: AOAM532zkc8yXIwN4zlghJXu8LhFnsTZYe0ce6IZeVT4WLyNYR3JZ2ww R6J3hEVBTzZxku+xSLcyl57Lw4NYRKhJrw== X-Google-Smtp-Source: ABdhPJwsU1ttUnkfMJW3uXw7rZeUMaieZbfdPbDObILptJHop36qUOzNFh2Ve97eU1qItb9TWz5KcQ== X-Received: by 2002:a05:6871:b06:b0:e2:ddbc:cf05 with SMTP id fq6-20020a0568710b0600b000e2ddbccf05mr203761oab.177.1649881453984; Wed, 13 Apr 2022 13:24:13 -0700 (PDT) Received: from birita.. 
([2804:431:c7ca:431f:889f:8960:cca1:4a60]) by smtp.gmail.com with ESMTPSA id o8-20020a05680803c800b00321034c99a6sm26562oie.3.2022.04.13.13.24.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Apr 2022 13:24:13 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH 2/7] stdlib: Add arc4random tests Date: Wed, 13 Apr 2022 17:23:56 -0300 Message-Id: <20220413202401.408267-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org> References: <20220413202401.408267-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Cc: Florian Weimer Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" The basic tst-arc4random-chacha20.c checks that the output of the ChaCha20 implementation matches the reference test vectors from RFC8439. The tst-arc4random-fork.c checks that subprocesses generate distinct streams of randomness (that is, fork handling is done correctly). The tst-arc4random-stats.c is a statistical test of the randomness of arc4random, arc4random_buf, and arc4random_uniform. The tst-arc4random-thread.c checks that threads generate distinct streams of randomness (that is, the functions are thread-safe). Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
Co-authored-by: Florian Weimer --- stdlib/Makefile | 4 + stdlib/tst-arc4random-chacha20.c | 225 +++++++++++++++++++++++++ stdlib/tst-arc4random-fork.c | 174 +++++++++++++++++++ stdlib/tst-arc4random-stats.c | 146 ++++++++++++++++ stdlib/tst-arc4random-thread.c | 278 +++++++++++++++++++++++++++++++ 5 files changed, 827 insertions(+) create mode 100644 stdlib/tst-arc4random-chacha20.c create mode 100644 stdlib/tst-arc4random-fork.c create mode 100644 stdlib/tst-arc4random-stats.c create mode 100644 stdlib/tst-arc4random-thread.c diff --git a/stdlib/Makefile b/stdlib/Makefile index 9f9cc1bd7f..4862d008ab 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -183,6 +183,9 @@ tests := \ testmb2 \ testrand \ testsort \ + tst-arc4random-fork \ + tst-arc4random-stats \ + tst-arc4random-thread \ tst-at_quick_exit \ tst-atexit \ tst-atof1 \ @@ -252,6 +255,7 @@ tests-internal := \ # tests-internal tests-static := \ + tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c new file mode 100644 index 0000000000..c5876d3f3b --- /dev/null +++ b/stdlib/tst-arc4random-chacha20.c @@ -0,0 +1,225 @@ +/* Basic tests for chacha20 cypher used in arc4random. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include + +static int +do_test (void) +{ + /* Reference ChaCha20 encryption test vectors from RFC8439. */ + + /* Test vector #1. */ + { + struct chacha20_state state; + + uint8_t key[CHACHA20_KEY_SIZE] = + { + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + }; + uint8_t iv[CHACHA20_IV_SIZE] = + { + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + }; + const uint8_t plaintext[CHACHA20_BLOCK_SIZE] = { 0 }; + uint8_t ciphertext[CHACHA20_BLOCK_SIZE]; + + chacha20_init (&state, key, iv); + chacha20_crypt (&state, ciphertext, plaintext, sizeof plaintext); + + const uint8_t expected[] = + { + 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, + 0x40, 0x5d, 0x6a, 0xe5, 0x53, 0x86, 0xbd, 0x28, + 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, 0xed, 0x1a, + 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, + 0xda, 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, + 0x77, 0x24, 0xe0, 0x3f, 0xb8, 0xd8, 0x4a, 0x37, + 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, 0x1c, + 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86 + }; + TEST_COMPARE_BLOB (ciphertext, sizeof ciphertext, + expected, sizeof expected); + } + + /* Test vector #2. 
*/ + { + struct chacha20_state state; + + uint8_t key[CHACHA20_KEY_SIZE] = + { + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, + }; + uint8_t iv[CHACHA20_IV_SIZE] = + { + 0x1, 0x0, 0x0, 0x0, /* Block counter is a LE uint32_t */ + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2 + }; + const uint8_t plaintext[] = + { + 0x41, 0x6e, 0x79, 0x20, 0x73, 0x75, 0x62, 0x6d, 0x69, 0x73, 0x73, + 0x69, 0x6f, 0x6e, 0x20, 0x74, 0x6f, 0x20, 0x74, 0x68, 0x65, 0x20, + 0x49, 0x45, 0x54, 0x46, 0x20, 0x69, 0x6e, 0x74, 0x65, 0x6e, 0x64, + 0x65, 0x64, 0x20, 0x62, 0x79, 0x20, 0x74, 0x68, 0x65, 0x20, 0x43, + 0x6f, 0x6e, 0x74, 0x72, 0x69, 0x62, 0x75, 0x74, 0x6f, 0x72, 0x20, + 0x66, 0x6f, 0x72, 0x20, 0x70, 0x75, 0x62, 0x6c, 0x69, 0x63, 0x61, + 0x74, 0x69, 0x6f, 0x6e, 0x20, 0x61, 0x73, 0x20, 0x61, 0x6c, 0x6c, + 0x20, 0x6f, 0x72, 0x20, 0x70, 0x61, 0x72, 0x74, 0x20, 0x6f, 0x66, + 0x20, 0x61, 0x6e, 0x20, 0x49, 0x45, 0x54, 0x46, 0x20, 0x49, 0x6e, + 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61, 0x66, + 0x74, 0x20, 0x6f, 0x72, 0x20, 0x52, 0x46, 0x43, 0x20, 0x61, 0x6e, + 0x64, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x73, 0x74, 0x61, 0x74, 0x65, + 0x6d, 0x65, 0x6e, 0x74, 0x20, 0x6d, 0x61, 0x64, 0x65, 0x20, 0x77, + 0x69, 0x74, 0x68, 0x69, 0x6e, 0x20, 0x74, 0x68, 0x65, 0x20, 0x63, + 0x6f, 0x6e, 0x74, 0x65, 0x78, 0x74, 0x20, 0x6f, 0x66, 0x20, 0x61, + 0x6e, 0x20, 0x49, 0x45, 0x54, 0x46, 0x20, 0x61, 0x63, 0x74, 0x69, + 0x76, 0x69, 0x74, 0x79, 0x20, 0x69, 0x73, 0x20, 0x63, 0x6f, 0x6e, + 0x73, 0x69, 0x64, 0x65, 0x72, 0x65, 0x64, 0x20, 0x61, 0x6e, 0x20, + 0x22, 0x49, 0x45, 0x54, 0x46, 0x20, 0x43, 0x6f, 0x6e, 0x74, 0x72, + 0x69, 0x62, 0x75, 0x74, 0x69, 0x6f, 0x6e, 0x22, 0x2e, 0x20, 0x53, + 0x75, 0x63, 0x68, 0x20, 0x73, 0x74, 0x61, 0x74, 0x65, 0x6d, 0x65, + 0x6e, 0x74, 0x73, 0x20, 0x69, 0x6e, 0x63, 0x6c, 0x75, 0x64, 0x65, + 0x20, 0x6f, 0x72, 0x61, 0x6c, 0x20, 0x73, 0x74, 0x61, 
0x74, 0x65, + 0x6d, 0x65, 0x6e, 0x74, 0x73, 0x20, 0x69, 0x6e, 0x20, 0x49, 0x45, + 0x54, 0x46, 0x20, 0x73, 0x65, 0x73, 0x73, 0x69, 0x6f, 0x6e, 0x73, + 0x2c, 0x20, 0x61, 0x73, 0x20, 0x77, 0x65, 0x6c, 0x6c, 0x20, 0x61, + 0x73, 0x20, 0x77, 0x72, 0x69, 0x74, 0x74, 0x65, 0x6e, 0x20, 0x61, + 0x6e, 0x64, 0x20, 0x65, 0x6c, 0x65, 0x63, 0x74, 0x72, 0x6f, 0x6e, + 0x69, 0x63, 0x20, 0x63, 0x6f, 0x6d, 0x6d, 0x75, 0x6e, 0x69, 0x63, + 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x20, 0x6d, 0x61, 0x64, 0x65, + 0x20, 0x61, 0x74, 0x20, 0x61, 0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, + 0x65, 0x20, 0x6f, 0x72, 0x20, 0x70, 0x6c, 0x61, 0x63, 0x65, 0x2c, + 0x20, 0x77, 0x68, 0x69, 0x63, 0x68, 0x20, 0x61, 0x72, 0x65, 0x20, + 0x61, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x65, 0x64, 0x20, 0x74, + 0x6f, + }; + uint8_t ciphertext[sizeof plaintext]; + + chacha20_init (&state, key, iv); + chacha20_crypt (&state, ciphertext, plaintext, sizeof plaintext); + + const uint8_t expected[] = + { + 0xa3, 0xfb, 0xf0, 0x7d, 0xf3, 0xfa, 0x2f, 0xde, 0x4f, 0x37, 0x6c, + 0xa2, 0x3e, 0x82, 0x73, 0x70, 0x41, 0x60, 0x5d, 0x9f, 0x4f, 0x4f, + 0x57, 0xbd, 0x8c, 0xff, 0x2c, 0x1d, 0x4b, 0x79, 0x55, 0xec, 0x2a, + 0x97, 0x94, 0x8b, 0xd3, 0x72, 0x29, 0x15, 0xc8, 0xf3, 0xd3, 0x37, + 0xf7, 0xd3, 0x70, 0x05, 0x0e, 0x9e, 0x96, 0xd6, 0x47, 0xb7, 0xc3, + 0x9f, 0x56, 0xe0, 0x31, 0xca, 0x5e, 0xb6, 0x25, 0x0d, 0x40, 0x42, + 0xe0, 0x27, 0x85, 0xec, 0xec, 0xfa, 0x4b, 0x4b, 0xb5, 0xe8, 0xea, + 0xd0, 0x44, 0x0e, 0x20, 0xb6, 0xe8, 0xdb, 0x09, 0xd8, 0x81, 0xa7, + 0xc6, 0x13, 0x2f, 0x42, 0x0e, 0x52, 0x79, 0x50, 0x42, 0xbd, 0xfa, + 0x77, 0x73, 0xd8, 0xa9, 0x05, 0x14, 0x47, 0xb3, 0x29, 0x1c, 0xe1, + 0x41, 0x1c, 0x68, 0x04, 0x65, 0x55, 0x2a, 0xa6, 0xc4, 0x05, 0xb7, + 0x76, 0x4d, 0x5e, 0x87, 0xbe, 0xa8, 0x5a, 0xd0, 0x0f, 0x84, 0x49, + 0xed, 0x8f, 0x72, 0xd0, 0xd6, 0x62, 0xab, 0x05, 0x26, 0x91, 0xca, + 0x66, 0x42, 0x4b, 0xc8, 0x6d, 0x2d, 0xf8, 0x0e, 0xa4, 0x1f, 0x43, + 0xab, 0xf9, 0x37, 0xd3, 0x25, 0x9d, 0xc4, 0xb2, 0xd0, 0xdf, 0xb4, + 0x8a, 0x6c, 0x91, 0x39, 
0xdd, 0xd7, 0xf7, 0x69, 0x66, 0xe9, 0x28, + 0xe6, 0x35, 0x55, 0x3b, 0xa7, 0x6c, 0x5c, 0x87, 0x9d, 0x7b, 0x35, + 0xd4, 0x9e, 0xb2, 0xe6, 0x2b, 0x08, 0x71, 0xcd, 0xac, 0x63, 0x89, + 0x39, 0xe2, 0x5e, 0x8a, 0x1e, 0x0e, 0xf9, 0xd5, 0x28, 0x0f, 0xa8, + 0xca, 0x32, 0x8b, 0x35, 0x1c, 0x3c, 0x76, 0x59, 0x89, 0xcb, 0xcf, + 0x3d, 0xaa, 0x8b, 0x6c, 0xcc, 0x3a, 0xaf, 0x9f, 0x39, 0x79, 0xc9, + 0x2b, 0x37, 0x20, 0xfc, 0x88, 0xdc, 0x95, 0xed, 0x84, 0xa1, 0xbe, + 0x05, 0x9c, 0x64, 0x99, 0xb9, 0xfd, 0xa2, 0x36, 0xe7, 0xe8, 0x18, + 0xb0, 0x4b, 0x0b, 0xc3, 0x9c, 0x1e, 0x87, 0x6b, 0x19, 0x3b, 0xfe, + 0x55, 0x69, 0x75, 0x3f, 0x88, 0x12, 0x8c, 0xc0, 0x8a, 0xaa, 0x9b, + 0x63, 0xd1, 0xa1, 0x6f, 0x80, 0xef, 0x25, 0x54, 0xd7, 0x18, 0x9c, + 0x41, 0x1f, 0x58, 0x69, 0xca, 0x52, 0xc5, 0xb8, 0x3f, 0xa3, 0x6f, + 0xf2, 0x16, 0xb9, 0xc1, 0xd3, 0x00, 0x62, 0xbe, 0xbc, 0xfd, 0x2d, + 0xc5, 0xbc, 0xe0, 0x91, 0x19, 0x34, 0xfd, 0xa7, 0x9a, 0x86, 0xf6, + 0xe6, 0x98, 0xce, 0xd7, 0x59, 0xc3, 0xff, 0x9b, 0x64, 0x77, 0x33, + 0x8f, 0x3d, 0xa4, 0xf9, 0xcd, 0x85, 0x14, 0xea, 0x99, 0x82, 0xcc, + 0xaf, 0xb3, 0x41, 0xb2, 0x38, 0x4d, 0xd9, 0x02, 0xf3, 0xd1, 0xab, + 0x7a, 0xc6, 0x1d, 0xd2, 0x9c, 0x6f, 0x21, 0xba, 0x5b, 0x86, 0x2f, + 0x37, 0x30, 0xe3, 0x7c, 0xfd, 0xc4, 0xfd, 0x80, 0x6c, 0x22, 0xf2, + 0x21, + }; + TEST_COMPARE_BLOB (ciphertext, sizeof ciphertext, + expected, sizeof expected); + } + + /* Test vector #3. 
*/ + { + struct chacha20_state state; + + uint8_t key[CHACHA20_KEY_SIZE] = + { + 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a, + 0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0, + 0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09, + 0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0 + }; + uint8_t iv[CHACHA20_IV_SIZE] = + { + 0x2a, 0x0, 0x0, 0x0, /* Block counter is a LE uint32_t */ + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2 + }; + + uint8_t plaintext[] = + { + 0x27, 0x54, 0x77, 0x61, 0x73, 0x20, 0x62, 0x72, 0x69, 0x6c, 0x6c, + 0x69, 0x67, 0x2c, 0x20, 0x61, 0x6e, 0x64, 0x20, 0x74, 0x68, 0x65, + 0x20, 0x73, 0x6c, 0x69, 0x74, 0x68, 0x79, 0x20, 0x74, 0x6f, 0x76, + 0x65, 0x73, 0x0a, 0x44, 0x69, 0x64, 0x20, 0x67, 0x79, 0x72, 0x65, + 0x20, 0x61, 0x6e, 0x64, 0x20, 0x67, 0x69, 0x6d, 0x62, 0x6c, 0x65, + 0x20, 0x69, 0x6e, 0x20, 0x74, 0x68, 0x65, 0x20, 0x77, 0x61, 0x62, + 0x65, 0x3a, 0x0a, 0x41, 0x6c, 0x6c, 0x20, 0x6d, 0x69, 0x6d, 0x73, + 0x79, 0x20, 0x77, 0x65, 0x72, 0x65, 0x20, 0x74, 0x68, 0x65, 0x20, + 0x62, 0x6f, 0x72, 0x6f, 0x67, 0x6f, 0x76, 0x65, 0x73, 0x2c, 0x0a, + 0x41, 0x6e, 0x64, 0x20, 0x74, 0x68, 0x65, 0x20, 0x6d, 0x6f, 0x6d, + 0x65, 0x20, 0x72, 0x61, 0x74, 0x68, 0x73, 0x20, 0x6f, 0x75, 0x74, + 0x67, 0x72, 0x61, 0x62, 0x65, 0x2e, + }; + uint8_t ciphertext[sizeof plaintext]; + + chacha20_init (&state, key, iv); + chacha20_crypt (&state, ciphertext, plaintext, sizeof plaintext); + + const uint8_t expected[] = + { + 0x62, 0xe6, 0x34, 0x7f, 0x95, 0xed, 0x87, 0xa4, 0x5f, 0xfa, 0xe7, + 0x42, 0x6f, 0x27, 0xa1, 0xdf, 0x5f, 0xb6, 0x91, 0x10, 0x04, 0x4c, + 0x0d, 0x73, 0x11, 0x8e, 0xff, 0xa9, 0x5b, 0x01, 0xe5, 0xcf, 0x16, + 0x6d, 0x3d, 0xf2, 0xd7, 0x21, 0xca, 0xf9, 0xb2, 0x1e, 0x5f, 0xb1, + 0x4c, 0x61, 0x68, 0x71, 0xfd, 0x84, 0xc5, 0x4f, 0x9d, 0x65, 0xb2, + 0x83, 0x19, 0x6c, 0x7f, 0xe4, 0xf6, 0x05, 0x53, 0xeb, 0xf3, 0x9c, + 0x64, 0x02, 0xc4, 0x22, 0x34, 0xe3, 0x2a, 0x35, 0x6b, 0x3e, 0x76, + 0x43, 0x12, 0xa6, 0x1a, 0x55, 0x32, 0x05, 0x57, 0x16, 0xea, 0xd6, + 0x96, 
0x25, 0x68, 0xf8, 0x7d, 0x3f, 0x3f, 0x77, 0x04, 0xc6, 0xa8, + 0xd1, 0xbc, 0xd1, 0xbf, 0x4d, 0x50, 0xd6, 0x15, 0x4b, 0x6d, 0xa7, + 0x31, 0xb1, 0x87, 0xb5, 0x8d, 0xfd, 0x72, 0x8a, 0xfa, 0x36, 0x75, + 0x7a, 0x79, 0x7a, 0xc1, 0x88, 0xd1, + }; + + TEST_COMPARE_BLOB (ciphertext, sizeof ciphertext, + expected, sizeof expected); + } + + return 0; +} + +#include diff --git a/stdlib/tst-arc4random-fork.c b/stdlib/tst-arc4random-fork.c new file mode 100644 index 0000000000..cd8852c8d3 --- /dev/null +++ b/stdlib/tst-arc4random-fork.c @@ -0,0 +1,174 @@ +/* Test that subprocesses generate distinct streams of randomness. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Collect random data from subprocesses and check that all the + results are unique. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Perform multiple runs. The subsequent runs start with an + already-initialized random number generator. (The number 1500 was + seen to reproduce failures reliably in case of a race condition in + the fork detection code.) */ +enum { runs = 1500 }; + +/* Fifty processes in total (49 subprocesses plus the parent). This should be high enough to + expose any issues, but low enough not to tax the overall system too + much.
*/ +enum { subprocesses = 49 }; + +/* The total number of processes. */ +enum { processes = subprocesses + 1 }; + +/* Number of bytes of randomness to generate per process. Large + enough to make false positive duplicates extremely unlikely. */ +enum { random_size = 16 }; + +/* Generated bytes of randomness. */ +struct result +{ + unsigned char bytes[random_size]; +}; + +/* Shared across all processes. */ +static struct shared_data +{ + pthread_barrier_t barrier; + struct result results[runs][processes]; +} *shared_data; + +/* Invoked to collect data from a subprocess. */ +static void +subprocess (int run, int process_index) +{ + xpthread_barrier_wait (&shared_data->barrier); + arc4random_buf (shared_data->results[run][process_index].bytes, random_size); +} + +/* Used to sort the results. */ +struct index +{ + int run; + int process_index; +}; + +/* Used to sort an array of struct index values. */ +static int +index_compare (const void *left1, const void *right1) +{ + const struct index *left = left1; + const struct index *right = right1; + + return memcmp (shared_data->results[left->run][left->process_index].bytes, + shared_data->results[right->run][right->process_index].bytes, + random_size); +} + +static int +do_test (void) +{ + shared_data = support_shared_allocate (sizeof (*shared_data)); + { + pthread_barrierattr_t attr; + xpthread_barrierattr_init (&attr); + xpthread_barrierattr_setpshared (&attr, PTHREAD_PROCESS_SHARED); + xpthread_barrier_init (&shared_data->barrier, &attr, processes); + xpthread_barrierattr_destroy (&attr); + } + + /* Collect random data. */ + for (int run = 0; run < runs; ++run) + { +#if 0 + if (run == runs / 2) + { + /* In the middle, desynchronize the block cache by consuming + an odd number of bytes. 
*/ + char buf; + arc4random_buf (&buf, 1); + } +#endif + + pid_t pids[subprocesses]; + for (int process_index = 0; process_index < subprocesses; + ++process_index) + { + pids[process_index] = xfork (); + if (pids[process_index] == 0) + { + subprocess (run, process_index); + _exit (0); + } + } + + /* Trigger all subprocesses. Also add data from the parent + process. */ + subprocess (run, subprocesses); + + for (int process_index = 0; process_index < subprocesses; + ++process_index) + { + int status; + xwaitpid (pids[process_index], &status, 0); + if (status != 0) + FAIL_EXIT1 ("subprocess index %d (PID %d) exit status %d\n", + process_index, (int) pids[process_index], status); + } + } + + /* Check for duplicates. */ + struct index indexes[runs * processes]; + for (int run = 0; run < runs; ++run) + for (int process_index = 0; process_index < processes; ++process_index) + indexes[run * processes + process_index] + = (struct index) { .run = run, .process_index = process_index }; + qsort (indexes, array_length (indexes), sizeof (indexes[0]), index_compare); + for (size_t i = 1; i < array_length (indexes); ++i) + { + if (index_compare (indexes + i - 1, indexes + i) == 0) + { + support_record_failure (); + unsigned char *bytes + = shared_data->results[indexes[i].run] + [indexes[i].process_index].bytes; + char *quoted = support_quote_blob (bytes, random_size); + printf ("error: duplicate randomness data: \"%s\"\n" + " run %d, subprocess %d\n" + " run %d, subprocess %d\n", + quoted, indexes[i - 1].run, indexes[i - 1].process_index, + indexes[i].run, indexes[i].process_index); + free (quoted); + } + } + + xpthread_barrier_destroy (&shared_data->barrier); + support_shared_free (shared_data); + shared_data = NULL; + + return 0; +} + +#include diff --git a/stdlib/tst-arc4random-stats.c b/stdlib/tst-arc4random-stats.c new file mode 100644 index 0000000000..9747180c99 --- /dev/null +++ b/stdlib/tst-arc4random-stats.c @@ -0,0 +1,146 @@ +/* Statistical tests for arc4random-related 
functions. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include + +enum +{ + arc4random_key_size = 32 +}; + +struct key +{ + unsigned char data[arc4random_key_size]; +}; + +/* With 12,000 keys, the probability that a byte in a predetermined + position does not have a predetermined value in all generated keys + is about 4e-21. The probability that this happens with any of the + 32 * 256 possible byte position/values is about 3.3e-17. This + results in an acceptably low false-positive rate. */ +enum { key_count = 12000 }; + +static struct key keys[key_count]; + +/* Used to perform the distribution check. */ +static int byte_counts[arc4random_key_size][256]; + +/* Bail out after this many failures.
*/ +enum { failure_limit = 100 }; + +static void +find_stuck_bytes (bool (*func) (unsigned char *key)) +{ + memset (&keys, 0xcc, sizeof (keys)); + + int failures = 0; + for (int key = 0; key < key_count; ++key) + { + while (true) + { + if (func (keys[key].data)) + break; + ++failures; + if (failures >= failure_limit) + { + printf ("warning: bailing out after %d failures\n", failures); + return; + } + } + } + printf ("info: key generation finished with %d failures\n", failures); + + memset (&byte_counts, 0, sizeof (byte_counts)); + for (int key = 0; key < key_count; ++key) + for (int pos = 0; pos < arc4random_key_size; ++pos) + ++byte_counts[pos][keys[key].data[pos]]; + + for (int pos = 0; pos < arc4random_key_size; ++pos) + for (int byte = 0; byte < 256; ++byte) + if (byte_counts[pos][byte] == 0) + { + support_record_failure (); + printf ("error: byte %d never appeared at position %d\n", byte, pos); + } +} + +/* Test adapter for arc4random. */ +static bool +generate_arc4random (unsigned char *key) +{ + uint32_t words[arc4random_key_size / 4]; + _Static_assert (sizeof (words) == arc4random_key_size, "sizeof (words)"); + + for (int i = 0; i < array_length (words); ++i) + words[i] = arc4random (); + memcpy (key, &words, arc4random_key_size); + return true; +} + +/* Test adapter for arc4random_buf. */ +static bool +generate_arc4random_buf (unsigned char *key) +{ + arc4random_buf (key, arc4random_key_size); + return true; +} + +/* Test adapter for arc4random_uniform. */ +static bool +generate_arc4random_uniform (unsigned char *key) +{ + for (int i = 0; i < arc4random_key_size; ++i) + key[i] = arc4random_uniform (256); + return true; +} + +/* Test adapter for arc4random_uniform with argument 257. 
This means + that byte 0 happens more often, but we do not perform such a + statistical check, so the test will still pass. */ +static bool +generate_arc4random_uniform_257 (unsigned char *key) +{ + for (int i = 0; i < arc4random_key_size; ++i) + key[i] = arc4random_uniform (257); + return true; +} + +static int +do_test (void) +{ + puts ("info: arc4random implementation test"); + find_stuck_bytes (generate_arc4random); + + puts ("info: arc4random_buf implementation test"); + find_stuck_bytes (generate_arc4random_buf); + + puts ("info: arc4random_uniform implementation test"); + find_stuck_bytes (generate_arc4random_uniform); + + puts ("info: arc4random_uniform implementation test (257 variant)"); + find_stuck_bytes (generate_arc4random_uniform_257); + + return 0; +} + +#include diff --git a/stdlib/tst-arc4random-thread.c b/stdlib/tst-arc4random-thread.c new file mode 100644 index 0000000000..b122eaa826 --- /dev/null +++ b/stdlib/tst-arc4random-thread.c @@ -0,0 +1,278 @@ +/* Test that threads generate distinct streams of randomness. + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include +#include +#include +#include +#include + +/* Number of arc4random_buf calls per thread. */ +enum { count_per_thread = 5000 }; + +/* Number of threads computing randomness.
*/ +enum { inner_threads = 5 }; + +/* Number of threads launching other threads. Chosen so as not to + overload the system. */ +enum { outer_threads = 7 }; + +/* Number of launching rounds performed by the outer threads. */ +enum { outer_rounds = 10 }; + +/* Maximum number of bytes generated in an arc4random call. */ +enum { max_size = 32 }; + +/* Sizes generated by threads. Must be long enough to be unique with + high probability. */ +static const int sizes[] = { 12, 15, 16, 17, 24, 31, max_size }; + +/* Data structure to capture randomness results. */ +struct blob +{ + unsigned int size; + int thread_id; + unsigned int index; + unsigned char bytes[max_size]; +}; + +#define DYNARRAY_STRUCT dynarray_blob +#define DYNARRAY_ELEMENT struct blob +#define DYNARRAY_PREFIX dynarray_blob_ +#include + +/* Sort blob elements by length first, then by comparing the data + member. */ +static int +compare_blob (const void *left1, const void *right1) +{ + const struct blob *left = left1; + const struct blob *right = right1; + + if (left->size != right->size) + /* No overflow due to limited range. */ + return left->size - right->size; + return memcmp (left->bytes, right->bytes, left->size); +} + +/* Used to store the global result. */ +static pthread_mutex_t global_result_lock = PTHREAD_MUTEX_INITIALIZER; +static struct dynarray_blob global_result; + +/* Copy data to the global result, with locking. */ +static void +copy_result_to_global (struct dynarray_blob *result) +{ + xpthread_mutex_lock (&global_result_lock); + size_t old_size = dynarray_blob_size (&global_result); + TEST_VERIFY_EXIT + (dynarray_blob_resize (&global_result, + old_size + dynarray_blob_size (result))); + memcpy (dynarray_blob_begin (&global_result) + old_size, + dynarray_blob_begin (result), + dynarray_blob_size (result) * sizeof (struct blob)); + xpthread_mutex_unlock (&global_result_lock); +} + +/* Used to assign unique thread IDs. Accessed atomically.
*/ +static int next_thread_id; + +static void * +inner_thread (void *unused) +{ + /* Use local result to avoid global lock contention while generating + randomness. */ + struct dynarray_blob result; + dynarray_blob_init (&result); + + int thread_id = __atomic_fetch_add (&next_thread_id, 1, __ATOMIC_RELAXED); + + /* Determine the sizes to be used by this thread. */ + int size_slot = thread_id % (array_length (sizes) + 1); + bool switch_sizes = size_slot == array_length (sizes); + if (switch_sizes) + size_slot = 0; + + /* Compute the random blobs. */ + for (int i = 0; i < count_per_thread; ++i) + { + struct blob *place = dynarray_blob_emplace (&result); + TEST_VERIFY_EXIT (place != NULL); + place->size = sizes[size_slot]; + place->thread_id = thread_id; + place->index = i; + arc4random_buf (place->bytes, place->size); + + if (switch_sizes) + size_slot = (size_slot + 1) % array_length (sizes); + } + + /* Store the blobs in the global result structure. */ + copy_result_to_global (&result); + + dynarray_blob_free (&result); + + return NULL; +} + +/* Launch the inner threads and wait for their termination. */ +static void * +outer_thread (void *unused) +{ + for (int round = 0; round < outer_rounds; ++round) + { + pthread_t threads[inner_threads]; + + for (int i = 0; i < inner_threads; ++i) + threads[i] = xpthread_create (NULL, inner_thread, NULL); + + for (int i = 0; i < inner_threads; ++i) + xpthread_join (threads[i]); + } + + return NULL; +} + +static bool termination_requested; + +/* Call arc4random_buf to fill one blob with 16 bytes. */ +static void * +get_one_blob_thread (void *closure) +{ + struct blob *result = closure; + result->size = 16; + arc4random_buf (result->bytes, result->size); + return NULL; +} + +/* Invoked from fork_thread to actually obtain randomness data. 
*/ +static void +fork_thread_subprocess (void *closure) +{ + struct blob *shared_result = closure; + + pthread_t thr1 = xpthread_create + (NULL, get_one_blob_thread, shared_result + 1); + pthread_t thr2 = xpthread_create + (NULL, get_one_blob_thread, shared_result + 2); + get_one_blob_thread (shared_result); + xpthread_join (thr1); + xpthread_join (thr2); +} + +/* Continuously fork subprocesses to obtain a little bit of + randomness. */ +static void * +fork_thread (void *unused) +{ + struct dynarray_blob result; + dynarray_blob_init (&result); + + /* Three blobs from each subprocess. */ + struct blob *shared_result + = support_shared_allocate (3 * sizeof (*shared_result)); + + while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED)) + { + /* Obtain the results from a subprocess. */ + support_isolate_in_subprocess (fork_thread_subprocess, shared_result); + + for (int i = 0; i < 3; ++i) + { + struct blob *place = dynarray_blob_emplace (&result); + TEST_VERIFY_EXIT (place != NULL); + place->size = shared_result[i].size; + place->thread_id = -1; + place->index = i; + memcpy (place->bytes, shared_result[i].bytes, place->size); + } + } + + support_shared_free (shared_result); + + copy_result_to_global (&result); + dynarray_blob_free (&result); + + return NULL; +} + +/* Launch the outer threads and wait for their termination. */ +static void +run_outer_threads (void) +{ + /* Special thread that continuously calls fork. 
*/ + pthread_t fork_thread_id = xpthread_create (NULL, fork_thread, NULL); + + pthread_t threads[outer_threads]; + for (int i = 0; i < outer_threads; ++i) + threads[i] = xpthread_create (NULL, outer_thread, NULL); + + for (int i = 0; i < outer_threads; ++i) + xpthread_join (threads[i]); + + __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED); + xpthread_join (fork_thread_id); +} + +static int +do_test (void) +{ + dynarray_blob_init (&global_result); + int expected_blobs + = count_per_thread * inner_threads * outer_threads * outer_rounds; + printf ("info: minimum of %d blob results expected\n", expected_blobs); + + run_outer_threads (); + + /* The forking thread delivers a non-deterministic number of + results, which is why expected_blobs is only a minimum number of + results. */ + printf ("info: %zu blob results observed\n", + dynarray_blob_size (&global_result)); + TEST_VERIFY (dynarray_blob_size (&global_result) >= expected_blobs); + + /* Verify that there are no duplicates.
*/ + qsort (dynarray_blob_begin (&global_result), + dynarray_blob_size (&global_result), + sizeof (struct blob), compare_blob); + struct blob *end = dynarray_blob_end (&global_result); + for (struct blob *p = dynarray_blob_begin (&global_result) + 1; + p < end; ++p) + { + if (compare_blob (p - 1, p) == 0) + { + support_record_failure (); + char *quoted = support_quote_blob (p->bytes, p->size); + printf ("error: duplicate blob: \"%s\" (%d bytes)\n", + quoted, (int) p->size); + printf (" first source: thread %d, index %u\n", + p[-1].thread_id, p[-1].index); + printf (" second source: thread %d, index %u\n", + p[0].thread_id, p[0].index); + free (quoted); + } + } + + dynarray_blob_free (&global_result); + + return 0; +} + +#include From patchwork Wed Apr 13 20:23:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 52878 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9B0153857C48 for ; Wed, 13 Apr 2022 20:26:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9B0153857C48 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1649881602; bh=iSbAKIcGtx/2J856C77z1bzJsodqjwtahxNw56laPaw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=lRE+9b4XbTzorH4Or/FB++GYPPz+TX42PhnZ+qF9cI8pdcvIvP+3uj5YyQbEjGMhk S7jwUey8HqVLZl+bVJMF5lmVcL08o3eJuLDXHgAM/NPpItjWd0lx2RlXq6WlmDU4CS KX0GhD/uxxq3x4gDuZTPM64u3UnF9Hx6X2aFJJ3I= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oa1-x2c.google.com (mail-oa1-x2c.google.com [IPv6:2001:4860:4864:20::2c]) by sourceware.org (Postfix) with ESMTPS id 371B73857405 for ; Wed, 13 Apr 2022 20:24:18 +0000 (GMT) 
To: libc-alpha@sourceware.org
Subject: [PATCH 3/7] benchtests: Add arc4random benchtest
Date: Wed, 13 Apr 2022 17:23:57 -0300
Message-Id: <20220413202401.408267-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto

The benchmark reports both throughput (total bytes obtained over the test duration) and latency for arc4random and for arc4random_buf with different sizes.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
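For readers who want to see the throughput measurement in isolation, here is a minimal sketch of the idea: keep calling the generator until a timer signal sets a flag, then divide the bytes produced by the length of the window. It is not the benchmark code from this patch. The names fill_stub and measure_throughput are hypothetical, fill_stub is a stand-in for arc4random_buf so that the sketch builds on systems without that function, and plain setitimer is assumed in place of the support timer helpers used by the patch.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Set by the SIGALRM handler once the measurement window has elapsed.  */
static volatile sig_atomic_t window_elapsed;

static void
on_alarm (int sig)
{
  (void) sig;
  window_elapsed = 1;
}

/* Hypothetical stand-in for arc4random_buf so the sketch builds on any
   POSIX system; it is NOT a random number generator.  */
static void
fill_stub (void *buf, size_t len)
{
  memset (buf, 0xa5, len);
}

/* Call FILL with LEN-byte requests for roughly USEC microseconds and
   return the observed bytes per second, or -1.0 if the timer could not
   be set up.  */
static double
measure_throughput (void (*fill) (void *, size_t), size_t len, long usec)
{
  assert (len > 0 && len <= 128 && usec > 0 && usec < 1000000);

  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGALRM, &sa, NULL) != 0)
    return -1.0;

  /* One-shot timer: SIGALRM is delivered once after USEC microseconds.  */
  window_elapsed = 0;
  struct itimerval window = { .it_value = { .tv_sec = 0, .tv_usec = usec } };
  if (setitimer (ITIMER_REAL, &window, NULL) != 0)
    return -1.0;

  unsigned char buf[128];
  uint64_t calls = 0;
  do
    {
      fill (buf, len);
      ++calls;
    }
  while (!window_elapsed);

  /* Bytes produced divided by the window length in seconds.  */
  return (double) (calls * len) / ((double) usec / 1e6);
}
```

A call such as measure_throughput (fill_stub, 16, 20000) runs for about 20 ms and returns bytes per second; passing a thin wrapper around arc4random_buf instead of fill_stub would measure the real function on a libc that provides it.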
--- benchtests/Makefile | 6 +- benchtests/bench-arc4random.c | 243 ++++++++++++++++++++++++++++++++++ 2 files changed, 248 insertions(+), 1 deletion(-) create mode 100644 benchtests/bench-arc4random.c diff --git a/benchtests/Makefile b/benchtests/Makefile index 8dfca592fd..50b96dd71f 100644 --- a/benchtests/Makefile +++ b/benchtests/Makefile @@ -111,8 +111,12 @@ bench-string := \ ffsll \ # bench-string +bench-stdlib := \ + arc4random \ +# bench-stdlib + ifeq (${BENCHSET},) -bench := $(bench-math) $(bench-pthread) $(bench-string) +bench := $(bench-math) $(bench-pthread) $(bench-string) $(bench-stdlib) else bench := $(foreach B,$(filter bench-%,${BENCHSET}), ${${B}}) endif diff --git a/benchtests/bench-arc4random.c b/benchtests/bench-arc4random.c new file mode 100644 index 0000000000..9e2ba9ba34 --- /dev/null +++ b/benchtests/bench-arc4random.c @@ -0,0 +1,243 @@ +/* arc4random benchmarks. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include "bench-timing.h" +#include "json-lib.h" +#include +#include +#include +#include +#include +#include +#include +#include + +static volatile uint32_t r; +static volatile sig_atomic_t timer_finished; + +static void timer_callback (int unused) +{ + timer_finished = 1; +} + +static const uint32_t sizes[] = { 0, 16, 32, 64, 128 }; + +static double +bench_arc4random_throughput (void) +{ + /* Run for approximately DURATION seconds. It does not matter which + thread receives the signal, so there is no need to mask it on the + main thread. */ + timer_finished = 0; + timer_t timer = support_create_timer (DURATION, 0, false, timer_callback); + + uint64_t n = 0; + + while (1) + { + r = arc4random (); + n++; + + if (timer_finished == 1) + break; + } + + support_delete_timer (timer); + + return (double) (n * sizeof (r)) / (double) DURATION; +} + +static double +bench_arc4random_latency (void) +{ + timing_t start, stop, cur; + const size_t iters = 1024; + + TIMING_NOW (start); + for (size_t i = 0; i < iters; i++) + r = arc4random (); + TIMING_NOW (stop); + + TIMING_DIFF (cur, start, stop); + + return (double) (cur) / (double) iters; +} + +static double +bench_arc4random_buf_throughput (size_t len) +{ + timer_finished = 0; + timer_t timer = support_create_timer (DURATION, 0, false, timer_callback); + + uint8_t buf[len]; + + uint64_t n = 0; + + while (1) + { + arc4random_buf (buf, len); + n++; + + if (timer_finished == 1) + break; + } + + support_delete_timer (timer); + + uint64_t total = (n * len); + return (double) (total) / (double) DURATION; +} + +static double +bench_arc4random_buf_latency (size_t len) +{ + timing_t start, stop, cur; + const size_t iters = 1024; + + uint8_t buf[len]; + + TIMING_NOW (start); + for (size_t i = 0; i < iters; i++) + arc4random_buf (buf, len); + TIMING_NOW (stop); + + TIMING_DIFF (cur, start, stop); + + return (double) (cur) / (double) iters; +} + +static void +bench_singlethread (json_ctx_t *json_ctx) +{ + json_element_object_begin (json_ctx); + +
json_array_begin (json_ctx, "throughput"); + for (int i = 0; i < array_length (sizes); i++) + if (sizes[i] == 0) + json_element_double (json_ctx, bench_arc4random_throughput ()); + else + json_element_double (json_ctx, bench_arc4random_buf_throughput (sizes[i])); + json_array_end (json_ctx); + + json_array_begin (json_ctx, "latency"); + for (int i = 0; i < array_length (sizes); i++) + if (sizes[i] == 0) + json_element_double (json_ctx, bench_arc4random_latency ()); + else + json_element_double (json_ctx, bench_arc4random_buf_latency (sizes[i])); + json_array_end (json_ctx); + + json_element_object_end (json_ctx); +} + +struct thr_arc4random_arg +{ + double ret; + uint32_t val; +}; + +static void * +thr_arc4random_throughput (void *closure) +{ + struct thr_arc4random_arg *arg = closure; + arg->ret = arg->val == 0 ? bench_arc4random_throughput () + : bench_arc4random_buf_throughput (arg->val); + return NULL; +} + +static void * +thr_arc4random_latency (void *closure) +{ + struct thr_arc4random_arg *arg = closure; + arg->ret = arg->val == 0 ? 
bench_arc4random_latency () + : bench_arc4random_buf_latency (arg->val); + return NULL; +} + +static void +bench_threaded (json_ctx_t *json_ctx) +{ + json_element_object_begin (json_ctx); + + json_array_begin (json_ctx, "throughput"); + for (int i = 0; i < array_length (sizes); i++) + { + struct thr_arc4random_arg arg = { .val = sizes[i] }; + pthread_t thr = xpthread_create (NULL, thr_arc4random_throughput, &arg); + xpthread_join (thr); + json_element_double (json_ctx, arg.ret); + } + json_array_end (json_ctx); + + json_array_begin (json_ctx, "latency"); + for (int i = 0; i < array_length (sizes); i++) + { + struct thr_arc4random_arg arg = { .val = sizes[i] }; + pthread_t thr = xpthread_create (NULL, thr_arc4random_latency, &arg); + xpthread_join (thr); + json_element_double (json_ctx, arg.ret); + } + json_array_end (json_ctx); + + json_element_object_end (json_ctx); +} + +static void +run_bench (json_ctx_t *json_ctx, const char *name, + char *const*fnames, size_t fnameslen, + void (*bench)(json_ctx_t *ctx)) +{ + json_attr_object_begin (json_ctx, name); + json_array_begin (json_ctx, "functions"); + for (int i = 0; i < fnameslen; i++) + json_element_string (json_ctx, fnames[i]); + json_array_end (json_ctx); + + json_array_begin (json_ctx, "results"); + bench (json_ctx); + json_array_end (json_ctx); + json_attr_object_end (json_ctx); +} + +static int +do_test (void) +{ + char *fnames[array_length (sizes) + 1]; + fnames[0] = (char *) "arc4random"; + for (int i = 0; i < array_length (sizes); i++) + fnames[i+1] = xasprintf ("arc4random_buf(%u)", sizes[i]); + + json_ctx_t json_ctx; + json_init (&json_ctx, 0, stdout); + + json_document_begin (&json_ctx); + json_attr_string (&json_ctx, "timing_type", TIMING_TYPE); + + run_bench (&json_ctx, "single-thread", fnames, array_length (fnames), + bench_singlethread); + run_bench (&json_ctx, "multi-thread", fnames, array_length (fnames), + bench_threaded); + + json_document_end (&json_ctx); + + for (int i = 0; i < array_length 
(sizes); i++) + free (fnames[i+1]); + + return 0; +} + +#include From patchwork Wed Apr 13 20:23:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 52879 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4BC643857827 for ; Wed, 13 Apr 2022 20:27:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4BC643857827 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1649881650; bh=xIgQVXP/gGwpIyyKYmGRQaUdZ7ZEdTN9xFvo9szU2SI=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=dKL6ngHra1/KLlUxOdSmr2ke2X/N5LJQ4iXKazB3BIRric/adKuWrI6Gt4q/iBWvS p/qJLV1z5uRc0sGthCkQcbJH7LJ8rqOx6icNzutVFIOjh9cSkElYlGpmg0ETHQkMHR B2hHv6JMP8lS7Y+VzN2DX8H3mAXSZOhMYRKuSzbA= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-ot1-x336.google.com (mail-ot1-x336.google.com [IPv6:2607:f8b0:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id 4F45C385781A for ; Wed, 13 Apr 2022 20:24:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4F45C385781A Received: by mail-ot1-x336.google.com with SMTP id e25-20020a0568301e5900b005b236d5d74fso1974326otj.0 for ; Wed, 13 Apr 2022 13:24:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xIgQVXP/gGwpIyyKYmGRQaUdZ7ZEdTN9xFvo9szU2SI=; b=NSRJhDhMdEtNRzdvsEmB7SLXPqpY9UfhlBu3xUIknczxM6S2yNtXiurAKr+x4gvHke OujkBUpV6OvulS2heVDgydTVSCMXOOf5AXMifk0z1c+IS6KlTrE7oOcNeVu90Xbmnh1S HD/ed7x9cncrsIVCoYGd8fV9HZmNdZqIV9UTPRtRhZ4HFkgG+O6sACGAUKA9r91k+/r/ 
To: libc-alpha@sourceware.org
Subject: [PATCH 4/7] x86: Add SSSE3 optimized chacha20
Date: Wed, 13 Apr 2022 17:23:58 -0300
Message-Id: <20220413202401.408267-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto

It adds a
vectorized ChaCha20 implementation based on libgcrypt's cipher/chacha20-amd64-ssse3.S. It is used only if SSSE3 is supported and enabled by the architecture.

On a Ryzen 9 5900X it shows the following improvements (using formatted bench-arc4random data):

GENERIC:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               375.06
arc4random_buf(0) [single-thread]        498.50
arc4random_buf(16) [single-thread]       576.86
arc4random_buf(32) [single-thread]       615.76
arc4random_buf(64) [single-thread]       633.97
--------------------------------------------------
arc4random [multi-thread]                359.86
arc4random_buf(0) [multi-thread]         479.27
arc4random_buf(16) [multi-thread]        543.65
arc4random_buf(32) [multi-thread]        581.98
arc4random_buf(64) [multi-thread]        603.01
--------------------------------------------------

SSSE3:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               576.55
arc4random_buf(0) [single-thread]        961.77
arc4random_buf(16) [single-thread]       1309.38
arc4random_buf(32) [single-thread]       1558.69
arc4random_buf(64) [single-thread]       1728.54
--------------------------------------------------
arc4random [multi-thread]                589.52
arc4random_buf(0) [multi-thread]         967.39
arc4random_buf(16) [multi-thread]        1319.27
arc4random_buf(32) [multi-thread]        1552.96
arc4random_buf(64) [multi-thread]        1734.27
--------------------------------------------------

Checked on x86_64-linux-gnu.
---
 LICENSES                        |  20 ++
 sysdeps/generic/chacha20_arch.h |  24 +++
 sysdeps/x86_64/Makefile         |   6 +
 sysdeps/x86_64/chacha20-ssse3.S | 330 ++++++++++++++++++++++++++++++++
 sysdeps/x86_64/chacha20_arch.h  |  42 ++++
 5 files changed, 422 insertions(+)
 create mode 100644 sysdeps/generic/chacha20_arch.h
 create mode 100644 sysdeps/x86_64/chacha20-ssse3.S
 create mode 100644 sysdeps/x86_64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 530893b1dc..2563abd9e2 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,3 +389,23 @@ Copyright 2001 by Stephen L.
Moshier You should have received a copy of the GNU Lesser General Public License along with this library; if not, see . */ + +sysdeps/x86_64/chacha20-ssse3.S imports code from libgcrypt, with the +following notices: + +Copyright (C) 2017-2019 Jussi Kivilinna + +This file is part of Libgcrypt. + +Libgcrypt is free software; you can redistribute it and/or modify +it under the terms of the GNU Lesser General Public License as +published by the Free Software Foundation; either version 2.1 of +the License, or (at your option) any later version. + +Libgcrypt is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Lesser General Public License for more details. + +You should have received a copy of the GNU Lesser General Public +License along with this program; if not, see . diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h new file mode 100644 index 0000000000..d7200ac583 --- /dev/null +++ b/sysdeps/generic/chacha20_arch.h @@ -0,0 +1,24 @@ +/* Chacha20 implementation, generic interface. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + .
*/
+
+static inline void
+chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  chacha20_crypt_generic (state, dst, src, bytes);
+}
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 79365aff2a..f43b6a1180 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -5,6 +5,12 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += link-defines.sym
 endif
 
+ifeq ($(subdir),stdlib)
+sysdep_routines += \
+  chacha20-ssse3 \
+  # sysdep_routines
+endif
+
 ifeq ($(subdir),gmon)
 sysdep_routines += _mcount
 # We cannot compile _mcount.S with -pg because that would create
diff --git a/sysdeps/x86_64/chacha20-ssse3.S b/sysdeps/x86_64/chacha20-ssse3.S
new file mode 100644
index 0000000000..f221daf634
--- /dev/null
+++ b/sysdeps/x86_64/chacha20-ssse3.S
@@ -0,0 +1,330 @@
+/* Optimized SSSE3 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.
*/
+
+#include
+
+#ifdef PIC
+# define rRIP (%rip)
+#else
+# define rRIP
+#endif
+
+/* register macros */
+#define INPUT %rdi
+#define DST %rsi
+#define SRC %rdx
+#define NBLKS %rcx
+#define ROUND %eax
+
+/* stack structure */
+#define STACK_VEC_X12 (16)
+#define STACK_VEC_X13 (16 + STACK_VEC_X12)
+#define STACK_TMP (16 + STACK_VEC_X13)
+#define STACK_TMP1 (16 + STACK_TMP)
+#define STACK_TMP2 (16 + STACK_TMP1)
+
+#define STACK_MAX (16 + STACK_TMP2)
+
+/* vector registers */
+#define X0 %xmm0
+#define X1 %xmm1
+#define X2 %xmm2
+#define X3 %xmm3
+#define X4 %xmm4
+#define X5 %xmm5
+#define X6 %xmm6
+#define X7 %xmm7
+#define X8 %xmm8
+#define X9 %xmm9
+#define X10 %xmm10
+#define X11 %xmm11
+#define X12 %xmm12
+#define X13 %xmm13
+#define X14 %xmm14
+#define X15 %xmm15
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
+        movdqa x0, t2; \
+        punpckhdq x1, t2; \
+        punpckldq x1, x0; \
+        \
+        movdqa x2, t1; \
+        punpckldq x3, t1; \
+        punpckhdq x3, x2; \
+        \
+        movdqa x0, x1; \
+        punpckhqdq t1, x1; \
+        punpcklqdq t1, x0; \
+        \
+        movdqa t2, x3; \
+        punpckhqdq x2, x3; \
+        punpcklqdq x2, t2; \
+        movdqa t2, x2;
+
+/* fill xmm register with 32-bit value from memory */
+#define pbroadcastd(mem32, xreg) \
+        movd mem32, xreg; \
+        pshufd $0, xreg, xreg;
+
+/* xor with unaligned memory operand */
+#define pxor_u(umem128, xreg, t) \
+        movdqu umem128, t; \
+        pxor t, xreg;
+
+/* xor register with unaligned src and save to unaligned dst */
+#define xor_src_dst(dst, src, offset, xreg, t) \
+        pxor_u(offset(src), xreg, t); \
+        movdqu xreg, offset(dst);
+
+#define clear(x) pxor x,x;
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define ROTATE2(v1,v2,c,tmp1,tmp2) \
+        movdqa v1, tmp1; \
+        movdqa v2, tmp2; \
+        psrld $(32 - (c)), v1; \
+        pslld $(c), tmp1; \
+        paddb tmp1, v1; \
+        psrld $(32 - (c)), v2; \
+        pslld $(c), tmp2; \
+        paddb tmp2, v2;
+
+#define ROTATE_SHUF_2(v1,v2,shuf) \
+        pshufb shuf, v1; \
+        pshufb shuf, v2;
+
+#define XOR(ds,s) \
+        pxor s, ds;
+
+#define PLUS(ds,s) \
+        paddd s, ds;
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2,\
+                      interleave_op1,interleave_op2) \
+        movdqa L(shuf_rol16) rRIP, tmp1; \
+        interleave_op1; \
+        PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+        ROTATE_SHUF_2(d1, d2, tmp1); \
+        PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+        ROTATE2(b1, b2, 12, tmp1, tmp2); \
+        movdqa L(shuf_rol8) rRIP, tmp1; \
+        interleave_op2; \
+        PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+        ROTATE_SHUF_2(d1, d2, tmp1); \
+        PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+        ROTATE2(b1, b2, 7, tmp1, tmp2);
+
+        .text
+
+chacha20_data:
+        .align 16
+L(shuf_rol16):
+        .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
+L(shuf_rol8):
+        .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
+L(counter1):
+        .long 1,0,0,0
+L(inc_counter):
+        .long 0,1,2,3
+L(unsigned_cmp):
+        .long 0x80000000,0x80000000,0x80000000,0x80000000
+
+ENTRY (__chacha20_ssse3_blocks8)
+        /* input:
+         *      %rdi: input
+         *      %rsi: dst
+         *      %rdx: src
+         *      %rcx: nblks (multiple of 4)
+         */
+
+        pushq %rbp;
+        cfi_adjust_cfa_offset(8);
+        cfi_rel_offset(rbp, 0)
+        movq %rsp, %rbp;
+        cfi_def_cfa_register(%rbp);
+
+        subq $STACK_MAX, %rsp;
+        andq $~15, %rsp;
+
+L(loop4):
+        mov $20, ROUND;
+
+        /* Construct counter vectors X12 and X13 */
+        movdqa L(inc_counter) rRIP, X0;
+        movdqa L(unsigned_cmp) rRIP, X2;
+        pbroadcastd((12 * 4)(INPUT), X12);
+        pbroadcastd((13 * 4)(INPUT), X13);
+        paddd X0, X12;
+        movdqa X12, X1;
+        pxor X2, X0;
+        pxor X2, X1;
+        pcmpgtd X1, X0;
+        psubd X0, X13;
+        movdqa X12, (STACK_VEC_X12)(%rsp);
+        movdqa X13, (STACK_VEC_X13)(%rsp);
+
+        /* Load vectors */
+        pbroadcastd((0 * 4)(INPUT), X0);
+        pbroadcastd((1 * 4)(INPUT), X1);
+        pbroadcastd((2 *
4)(INPUT), X2); + pbroadcastd((3 * 4)(INPUT), X3); + pbroadcastd((4 * 4)(INPUT), X4); + pbroadcastd((5 * 4)(INPUT), X5); + pbroadcastd((6 * 4)(INPUT), X6); + pbroadcastd((7 * 4)(INPUT), X7); + pbroadcastd((8 * 4)(INPUT), X8); + pbroadcastd((9 * 4)(INPUT), X9); + pbroadcastd((10 * 4)(INPUT), X10); + pbroadcastd((11 * 4)(INPUT), X11); + pbroadcastd((14 * 4)(INPUT), X14); + pbroadcastd((15 * 4)(INPUT), X15); + movdqa X11, (STACK_TMP)(%rsp); + movdqa X15, (STACK_TMP1)(%rsp); + +L(round2_4): + QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15,,) + movdqa (STACK_TMP)(%rsp), X11; + movdqa (STACK_TMP1)(%rsp), X15; + movdqa X8, (STACK_TMP)(%rsp); + movdqa X9, (STACK_TMP1)(%rsp); + QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9,,) + QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9,,) + movdqa (STACK_TMP)(%rsp), X8; + movdqa (STACK_TMP1)(%rsp), X9; + movdqa X11, (STACK_TMP)(%rsp); + movdqa X15, (STACK_TMP1)(%rsp); + QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15,,) + sub $2, ROUND; + jnz .Lround2_4; + + /* tmp := X15 */ + movdqa (STACK_TMP)(%rsp), X11; + pbroadcastd((0 * 4)(INPUT), X15); + PLUS(X0, X15); + pbroadcastd((1 * 4)(INPUT), X15); + PLUS(X1, X15); + pbroadcastd((2 * 4)(INPUT), X15); + PLUS(X2, X15); + pbroadcastd((3 * 4)(INPUT), X15); + PLUS(X3, X15); + pbroadcastd((4 * 4)(INPUT), X15); + PLUS(X4, X15); + pbroadcastd((5 * 4)(INPUT), X15); + PLUS(X5, X15); + pbroadcastd((6 * 4)(INPUT), X15); + PLUS(X6, X15); + pbroadcastd((7 * 4)(INPUT), X15); + PLUS(X7, X15); + pbroadcastd((8 * 4)(INPUT), X15); + PLUS(X8, X15); + pbroadcastd((9 * 4)(INPUT), X15); + PLUS(X9, X15); + pbroadcastd((10 * 4)(INPUT), X15); + PLUS(X10, X15); + pbroadcastd((11 * 4)(INPUT), X15); + PLUS(X11, X15); + movdqa (STACK_VEC_X12)(%rsp), X15; + PLUS(X12, X15); + movdqa (STACK_VEC_X13)(%rsp), X15; + PLUS(X13, X15); + movdqa X13, (STACK_TMP)(%rsp); + pbroadcastd((14 * 4)(INPUT), X15); + PLUS(X14, X15); + movdqa (STACK_TMP1)(%rsp), X15; + movdqa X14, 
(STACK_TMP1)(%rsp);
+        pbroadcastd((15 * 4)(INPUT), X13);
+        PLUS(X15, X13);
+        movdqa X15, (STACK_TMP2)(%rsp);
+
+        /* Update counter */
+        addq $4, (12 * 4)(INPUT);
+
+        transpose_4x4(X0, X1, X2, X3, X13, X14, X15);
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 0), X0, X15);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 0), X1, X15);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 0), X2, X15);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 0), X3, X15);
+        transpose_4x4(X4, X5, X6, X7, X0, X1, X2);
+        movdqa (STACK_TMP)(%rsp), X13;
+        movdqa (STACK_TMP1)(%rsp), X14;
+        movdqa (STACK_TMP2)(%rsp), X15;
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 1), X4, X0);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 1), X5, X0);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 1), X6, X0);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 1), X7, X0);
+        transpose_4x4(X8, X9, X10, X11, X0, X1, X2);
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 2), X8, X0);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 2), X9, X0);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 2), X10, X0);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 2), X11, X0);
+        transpose_4x4(X12, X13, X14, X15, X0, X1, X2);
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 3), X12, X0);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 3), X13, X0);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 3), X14, X0);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 3), X15, X0);
+
+        sub $4, NBLKS;
+        lea (4 * 64)(DST), DST;
+        lea (4 * 64)(SRC), SRC;
+        jnz L(loop4);
+
+        /* clear the used vector registers and stack */
+        clear(X0);
+        movdqa X0, (STACK_VEC_X12)(%rsp);
+        movdqa X0, (STACK_VEC_X13)(%rsp);
+        movdqa X0, (STACK_TMP)(%rsp);
+        movdqa X0, (STACK_TMP1)(%rsp);
+        movdqa X0, (STACK_TMP2)(%rsp);
+        clear(X1);
+        clear(X2);
+        clear(X3);
+        clear(X4);
+        clear(X5);
+        clear(X6);
+        clear(X7);
+        clear(X8);
+        clear(X9);
+        clear(X10);
+        clear(X11);
+        clear(X12);
+        clear(X13);
+        clear(X14);
+        clear(X15);
+
+        /* eax zeroed by round loop.
*/
+        leave;
+        cfi_adjust_cfa_offset(-8)
+        cfi_def_cfa_register(%rsp);
+        ret;
+        int3;
+END (__chacha20_ssse3_blocks8)
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
new file mode 100644
index 0000000000..37a4fdfb1f
--- /dev/null
+++ b/sysdeps/x86_64/chacha20_arch.h
@@ -0,0 +1,42 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .
*/
+
+#include
+#include
+#include
+
+unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
+                                       const uint8_t *src, size_t nblks);
+
+static inline void
+chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
+                size_t bytes)
+{
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
+    {
+      size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
+      nblocks -= nblocks % 4;
+      __chacha20_ssse3_blocks8 (state->ctx, dst, src, nblocks);
+      bytes -= nblocks * CHACHA20_BLOCK_SIZE;
+      dst += nblocks * CHACHA20_BLOCK_SIZE;
+      src += nblocks * CHACHA20_BLOCK_SIZE;
+    }
+
+  if (bytes > 0)
+    chacha20_crypt_generic (state, dst, src, bytes);
+}

From patchwork Wed Apr 13 20:23:59 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 52880
To: libc-alpha@sourceware.org
Subject: [PATCH 5/7] x86: Add AVX2 optimized chacha20
Date: Wed, 13 Apr 2022 17:23:59 -0300
Message-Id: <20220413202401.408267-6-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

It adds a vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.
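
As a rough illustration of that runtime selection, the following
self-contained C sketch splits a request into the bytes handled 8 blocks
at a time (AVX2), the bytes handled 4 blocks at a time (SSSE3), and the
remainder left to the generic code.  The struct, the function names, and
the two boolean flags are hypothetical stand-ins for the
CPU_FEATURE_USABLE_P checks; only the 64-byte block size and the
multiples of 8 and 4 are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ChaCha20 block size in bytes (CHACHA20_BLOCK_SIZE in the patch).  */
#define BLOCK_SIZE 64

/* Hypothetical result type: how many bytes each code path would get.  */
struct split
{
  size_t avx2_bytes;
  size_t ssse3_bytes;
  size_t generic_bytes;
};

/* Round BYTES down to a whole number of groups of WAY blocks.  */
static size_t
whole_groups (size_t bytes, size_t way)
{
  size_t nblocks = bytes / BLOCK_SIZE;
  nblocks -= nblocks % way;
  return nblocks * BLOCK_SIZE;
}

/* Hypothetical helper that follows the order of checks in
   chacha20_crypt: AVX2 first, then SSSE3, then the generic fallback
   for whatever is left.  */
static struct split
split_request (size_t bytes, bool have_avx2, bool have_ssse3)
{
  struct split s = { 0, 0, 0 };

  if (have_avx2 && bytes >= BLOCK_SIZE * 8)
    {
      s.avx2_bytes = whole_groups (bytes, 8);
      bytes -= s.avx2_bytes;
    }
  if (have_ssse3 && bytes >= BLOCK_SIZE * 4)
    {
      s.ssse3_bytes = whole_groups (bytes, 4);
      bytes -= s.ssse3_bytes;
    }
  s.generic_bytes = bytes;

  /* The vector paths only ever see whole blocks.  */
  assert (s.avx2_bytes % BLOCK_SIZE == 0);
  assert (s.ssse3_bytes % BLOCK_SIZE == 0);
  return s;
}
```

With both features usable, a 1000-byte request would be split as 512
bytes for the 8-way path, 256 bytes for the 4-way path, and 232 bytes
for the generic code.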

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

SSSE3:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               576.55
arc4random_buf(0) [single-thread]        961.77
arc4random_buf(16) [single-thread]      1309.38
arc4random_buf(32) [single-thread]      1558.69
arc4random_buf(64) [single-thread]      1728.54
--------------------------------------------------
arc4random [multi-thread]                589.52
arc4random_buf(0) [multi-thread]         967.39
arc4random_buf(16) [multi-thread]       1319.27
arc4random_buf(32) [multi-thread]       1552.96
arc4random_buf(64) [multi-thread]       1734.27
--------------------------------------------------

AVX2:
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               672.49
arc4random_buf(0) [single-thread]       1234.85
arc4random_buf(16) [single-thread]      1892.67
arc4random_buf(32) [single-thread]      2491.10
arc4random_buf(64) [single-thread]      2696.27
--------------------------------------------------
arc4random [multi-thread]                661.25
arc4random_buf(0) [multi-thread]        1214.65
arc4random_buf(16) [multi-thread]       1867.98
arc4random_buf(32) [multi-thread]       2474.70
arc4random_buf(64) [multi-thread]       2893.21
--------------------------------------------------

Checked on x86_64-linux-gnu.
---
 LICENSES                       |   4 +-
 stdlib/chacha20.c              |   7 +-
 sysdeps/x86_64/Makefile        |   1 +
 sysdeps/x86_64/chacha20-avx2.S | 317 +++++++++++++++++++++++++++++++++
 sysdeps/x86_64/chacha20_arch.h |  14 ++
 5 files changed, 339 insertions(+), 4 deletions(-)
 create mode 100644 sysdeps/x86_64/chacha20-avx2.S

diff --git a/LICENSES b/LICENSES
index 2563abd9e2..8ef0f023d7 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,8 +390,8 @@ Copyright 2001 by Stephen L. Moshier
    License along with this library; if not, see
    .
*/
 
-sysdeps/x86_64/chacha20-ssse3.S import code from libgcrypt, with the
-following notices:
+sysdeps/x86_64/chacha20-ssse3.S and sysdeps/x86_64/chacha20-avx2.S
+import code from libgcrypt, with the following notices:
 
 Copyright (C) 2017-2019 Jussi Kivilinna
 
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
index dbd87bd942..8569e1e78d 100644
--- a/stdlib/chacha20.c
+++ b/stdlib/chacha20.c
@@ -190,8 +190,8 @@ memxorcpy (uint8_t *dst, const uint8_t *src1, const uint8_t *src2, size_t len)
 }
 
 static void
-chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
-                const uint8_t *src, size_t bytes)
+chacha20_crypt_generic (struct chacha20_state *state, uint8_t *dst,
+                        const uint8_t *src, size_t bytes)
 {
   uint8_t stream[CHACHA20_BLOCK_SIZE];
 
@@ -209,3 +209,6 @@ chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
       memxorcpy (dst, src, stream, bytes);
     }
 }
+
+/* Get the arch-optimized implementation, if any. */
+#include
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index f43b6a1180..afb4d173e8 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -7,6 +7,7 @@ endif
 
 ifeq ($(subdir),stdlib)
 sysdep_routines += \
+  chacha20-avx2 \
   chacha20-ssse3 \
   # sysdep_routines
 endif
diff --git a/sysdeps/x86_64/chacha20-avx2.S b/sysdeps/x86_64/chacha20-avx2.S
new file mode 100644
index 0000000000..96174c0e40
--- /dev/null
+++ b/sysdeps/x86_64/chacha20-avx2.S
@@ -0,0 +1,317 @@
+/* Optimized AVX2 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain. */
+
+#ifdef PIC
+# define rRIP (%rip)
+#else
+# define rRIP
+#endif
+
+/* register macros */
+#define INPUT %rdi
+#define DST %rsi
+#define SRC %rdx
+#define NBLKS %rcx
+#define ROUND %eax
+
+/* stack structure */
+#define STACK_VEC_X12 (32)
+#define STACK_VEC_X13 (32 + STACK_VEC_X12)
+#define STACK_TMP (32 + STACK_VEC_X13)
+#define STACK_TMP1 (32 + STACK_TMP)
+
+#define STACK_MAX (32 + STACK_TMP1)
+
+/* vector registers */
+#define X0 %ymm0
+#define X1 %ymm1
+#define X2 %ymm2
+#define X3 %ymm3
+#define X4 %ymm4
+#define X5 %ymm5
+#define X6 %ymm6
+#define X7 %ymm7
+#define X8 %ymm8
+#define X9 %ymm9
+#define X10 %ymm10
+#define X11 %ymm11
+#define X12 %ymm12
+#define X13 %ymm13
+#define X14 %ymm14
+#define X15 %ymm15
+
+#define X0h %xmm0
+#define X1h %xmm1
+#define X2h %xmm2
+#define X3h %xmm3
+#define X4h %xmm4
+#define X5h %xmm5
+#define X6h %xmm6
+#define X7h %xmm7
+#define X8h %xmm8
+#define X9h %xmm9
+#define X10h %xmm10
+#define X11h %xmm11
+#define X12h %xmm12
+#define X13h %xmm13
+#define X14h %xmm14
+#define X15h %xmm15
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0,x1,x2,x3,t1,t2) \
+        vpunpckhdq x1, x0, t2; \
+        vpunpckldq x1, x0, x0; \
+        \
+        vpunpckldq x3, x2, t1; \
+        vpunpckhdq x3, x2, x2; \
+        \
+        vpunpckhqdq t1, x0, x1; \
+        vpunpcklqdq t1, x0, x0; \
+        \
+        vpunpckhqdq x2, t2, x3; \
+        vpunpcklqdq x2, t2, x2;
+
+/* 2x2 128-bit matrix transpose */
+#define transpose_16byte_2x2(x0,x1,t1) \
+        vmovdqa x0, t1; \
+        vperm2i128 $0x20, x1, x0, x0; \
+        vperm2i128 $0x31, x1, t1, x1;
+
+/* xor register with unaligned src and save to unaligned dst */
+#define xor_src_dst(dst, src, offset, xreg) \
+        vpxor offset(src), xreg, xreg; \
+        vmovdqu xreg, offset(dst);
+
+/**********************************************************************
+  8-way chacha20
+ **********************************************************************/
+
+#define ROTATE2(v1,v2,c,tmp) \
+        vpsrld $(32 - (c)), v1, tmp; \
+        vpslld $(c), v1, v1; \
+        vpaddb tmp, v1, v1; \
+        vpsrld $(32 - (c)), v2, tmp; \
+        vpslld $(c), v2, v2; \
+        vpaddb tmp, v2, v2;
+
+#define ROTATE_SHUF_2(v1,v2,shuf) \
+        vpshufb shuf, v1, v1; \
+        vpshufb shuf, v2, v2;
+
+#define XOR(ds,s) \
+        vpxor s, ds, ds;
+
+#define PLUS(ds,s) \
+        vpaddd s, ds, ds;
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\
+                      interleave_op1,interleave_op2,\
+                      interleave_op3,interleave_op4) \
+        vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \
+        interleave_op1; \
+        PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+        ROTATE_SHUF_2(d1, d2, tmp1); \
+        interleave_op2; \
+        PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+        ROTATE2(b1, b2, 12, tmp1); \
+        vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \
+        interleave_op3; \
+        PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+        ROTATE_SHUF_2(d1, d2, tmp1); \
+        interleave_op4; \
+        PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+        ROTATE2(b1, b2, 7, tmp1);
+
+        .text
+        .align 32
+chacha20_data:
+L(shuf_rol16):
+        .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13
+L(shuf_rol8):
+        .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14
+L(inc_counter):
+        .byte 0,1,2,3,4,5,6,7
+L(unsigned_cmp):
+        .long 0x80000000
+
+ENTRY (__chacha20_avx2_blocks8)
+        /* input:
+         *      %rdi: input
+         *      %rsi: dst
+         *      %rdx: src
+         *      %rcx: nblks (multiple of 8)
+         */
+        vzeroupper;
+
+        pushq
%rbp;
+        cfi_adjust_cfa_offset(8);
+        cfi_rel_offset(rbp, 0)
+        movq %rsp, %rbp;
+        cfi_def_cfa_register(rbp);
+
+        subq $STACK_MAX, %rsp;
+        andq $~31, %rsp;
+
+L(loop8):
+        mov $20, ROUND;
+
+        /* Construct counter vectors X12 and X13 */
+        vpmovzxbd L(inc_counter) rRIP, X0;
+        vpbroadcastd L(unsigned_cmp) rRIP, X2;
+        vpbroadcastd (12 * 4)(INPUT), X12;
+        vpbroadcastd (13 * 4)(INPUT), X13;
+        vpaddd X0, X12, X12;
+        vpxor X2, X0, X0;
+        vpxor X2, X12, X1;
+        vpcmpgtd X1, X0, X0;
+        vpsubd X0, X13, X13;
+        vmovdqa X12, (STACK_VEC_X12)(%rsp);
+        vmovdqa X13, (STACK_VEC_X13)(%rsp);
+
+        /* Load vectors */
+        vpbroadcastd (0 * 4)(INPUT), X0;
+        vpbroadcastd (1 * 4)(INPUT), X1;
+        vpbroadcastd (2 * 4)(INPUT), X2;
+        vpbroadcastd (3 * 4)(INPUT), X3;
+        vpbroadcastd (4 * 4)(INPUT), X4;
+        vpbroadcastd (5 * 4)(INPUT), X5;
+        vpbroadcastd (6 * 4)(INPUT), X6;
+        vpbroadcastd (7 * 4)(INPUT), X7;
+        vpbroadcastd (8 * 4)(INPUT), X8;
+        vpbroadcastd (9 * 4)(INPUT), X9;
+        vpbroadcastd (10 * 4)(INPUT), X10;
+        vpbroadcastd (11 * 4)(INPUT), X11;
+        vpbroadcastd (14 * 4)(INPUT), X14;
+        vpbroadcastd (15 * 4)(INPUT), X15;
+        vmovdqa X15, (STACK_TMP)(%rsp);
+
+L(round2):
+        QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,)
+        vmovdqa (STACK_TMP)(%rsp), X15;
+        vmovdqa X8, (STACK_TMP)(%rsp);
+        QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,)
+        QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,)
+        vmovdqa (STACK_TMP)(%rsp), X8;
+        vmovdqa X15, (STACK_TMP)(%rsp);
+        QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,)
+        sub $2, ROUND;
+        jnz L(round2);
+
+        vmovdqa X8, (STACK_TMP1)(%rsp);
+
+        /* tmp := X15 */
+        vpbroadcastd (0 * 4)(INPUT), X15;
+        PLUS(X0, X15);
+        vpbroadcastd (1 * 4)(INPUT), X15;
+        PLUS(X1, X15);
+        vpbroadcastd (2 * 4)(INPUT), X15;
+        PLUS(X2, X15);
+        vpbroadcastd (3 * 4)(INPUT), X15;
+        PLUS(X3, X15);
+        vpbroadcastd (4 * 4)(INPUT), X15;
+        PLUS(X4, X15);
+        vpbroadcastd (5 * 4)(INPUT), X15;
+        PLUS(X5, X15);
+        vpbroadcastd (6 * 4)(INPUT), X15;
+        PLUS(X6, X15);
+
vpbroadcastd (7 * 4)(INPUT), X15;
+        PLUS(X7, X15);
+        transpose_4x4(X0, X1, X2, X3, X8, X15);
+        transpose_4x4(X4, X5, X6, X7, X8, X15);
+        vmovdqa (STACK_TMP1)(%rsp), X8;
+        transpose_16byte_2x2(X0, X4, X15);
+        transpose_16byte_2x2(X1, X5, X15);
+        transpose_16byte_2x2(X2, X6, X15);
+        transpose_16byte_2x2(X3, X7, X15);
+        vmovdqa (STACK_TMP)(%rsp), X15;
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 0), X0);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 0), X1);
+        vpbroadcastd (8 * 4)(INPUT), X0;
+        PLUS(X8, X0);
+        vpbroadcastd (9 * 4)(INPUT), X0;
+        PLUS(X9, X0);
+        vpbroadcastd (10 * 4)(INPUT), X0;
+        PLUS(X10, X0);
+        vpbroadcastd (11 * 4)(INPUT), X0;
+        PLUS(X11, X0);
+        vmovdqa (STACK_VEC_X12)(%rsp), X0;
+        PLUS(X12, X0);
+        vmovdqa (STACK_VEC_X13)(%rsp), X0;
+        PLUS(X13, X0);
+        vpbroadcastd (14 * 4)(INPUT), X0;
+        PLUS(X14, X0);
+        vpbroadcastd (15 * 4)(INPUT), X0;
+        PLUS(X15, X0);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 0), X2);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 0), X3);
+
+        /* Update counter */
+        addq $8, (12 * 4)(INPUT);
+
+        transpose_4x4(X8, X9, X10, X11, X0, X1);
+        transpose_4x4(X12, X13, X14, X15, X0, X1);
+        xor_src_dst(DST, SRC, (64 * 4 + 16 * 0), X4);
+        xor_src_dst(DST, SRC, (64 * 5 + 16 * 0), X5);
+        transpose_16byte_2x2(X8, X12, X0);
+        transpose_16byte_2x2(X9, X13, X0);
+        transpose_16byte_2x2(X10, X14, X0);
+        transpose_16byte_2x2(X11, X15, X0);
+        xor_src_dst(DST, SRC, (64 * 6 + 16 * 0), X6);
+        xor_src_dst(DST, SRC, (64 * 7 + 16 * 0), X7);
+        xor_src_dst(DST, SRC, (64 * 0 + 16 * 2), X8);
+        xor_src_dst(DST, SRC, (64 * 1 + 16 * 2), X9);
+        xor_src_dst(DST, SRC, (64 * 2 + 16 * 2), X10);
+        xor_src_dst(DST, SRC, (64 * 3 + 16 * 2), X11);
+        xor_src_dst(DST, SRC, (64 * 4 + 16 * 2), X12);
+        xor_src_dst(DST, SRC, (64 * 5 + 16 * 2), X13);
+        xor_src_dst(DST, SRC, (64 * 6 + 16 * 2), X14);
+        xor_src_dst(DST, SRC, (64 * 7 + 16 * 2), X15);
+
+        sub $8, NBLKS;
+        lea (8 * 64)(DST), DST;
+        lea (8 * 64)(SRC), SRC;
+        jnz L(loop8);
+
+        /* clear the used vector registers and stack */
+        vpxor X0, X0, X0;
+        vmovdqa X0, (STACK_VEC_X12)(%rsp);
+        vmovdqa X0, (STACK_VEC_X13)(%rsp);
+        vmovdqa X0, (STACK_TMP)(%rsp);
+        vmovdqa X0, (STACK_TMP1)(%rsp);
+        vzeroall;
+
+        /* eax zeroed by round loop. */
+        leave;
+        cfi_adjust_cfa_offset(-8)
+        cfi_def_cfa_register(%rsp);
+        ret;
+        int3;
+END(__chacha20_avx2_blocks8)
diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h
index 37a4fdfb1f..7e9e7755f3 100644
--- a/sysdeps/x86_64/chacha20_arch.h
+++ b/sysdeps/x86_64/chacha20_arch.h
@@ -22,11 +22,25 @@
 
 unsigned int __chacha20_ssse3_blocks8 (uint32_t *state, uint8_t *dst,
                                        const uint8_t *src, size_t nblks);
+unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst,
+                                      const uint8_t *src, size_t nblks);
 
 static inline void
 chacha20_crypt (struct chacha20_state *state, uint8_t *dst, const uint8_t *src,
                 size_t bytes)
 {
+  const struct cpu_features* cpu_features = __get_cpu_features ();
+
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) && bytes >= CHACHA20_BLOCK_SIZE * 8)
+    {
+      size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
+      nblocks -= nblocks % 8;
+      __chacha20_avx2_blocks8 (state->ctx, dst, src, nblocks);
+      bytes -= nblocks * CHACHA20_BLOCK_SIZE;
+      dst += nblocks * CHACHA20_BLOCK_SIZE;
+      src += nblocks * CHACHA20_BLOCK_SIZE;
+    }
+
   if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3) && bytes >= CHACHA20_BLOCK_SIZE * 4)
     {
       size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;

From patchwork Wed Apr 13 20:24:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 52881
To: libc-alpha@sourceware.org
Subject: [PATCH 6/7] aarch64: Add optimized chacha20
Date: Wed, 13 Apr 2022 17:24:00 -0300
Message-Id: <20220413202401.408267-7-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

It adds a vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used by default, and only
little-endian is supported (big-endian uses the generic fallback code).
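
A minimal sketch of what the little-endian restriction amounts to at
build time, assuming the selection is made purely with the
__AARCH64EL__ macro that the new assembly file tests.  The helper names
below are hypothetical.  The second helper shows the byte order in
which ChaCha20 serializes its 32-bit words; that this is the reason for
limiting the vector code to LE is an assumption here, not something the
patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the build targets AArch64
   little-endian, i.e. when the assembly guarded by __AARCH64EL__
   would be compiled in.  On any other target it reports false,
   which corresponds to using the generic fallback.  */
static bool
vector_path_built (void)
{
#ifdef __AARCH64EL__
  return true;
#else
  return false;
#endif
}

/* Hypothetical helper: store WORD in DST in little-endian byte order,
   the order ChaCha20 uses for its keystream words, independently of
   the host byte order.  */
static void
store32_le (uint8_t *dst, uint32_t word)
{
  assert (dst != NULL);
  dst[0] = (uint8_t) (word & 0xff);
  dst[1] = (uint8_t) ((word >> 8) & 0xff);
  dst[2] = (uint8_t) ((word >> 16) & 0xff);
  dst[3] = (uint8_t) ((word >> 24) & 0xff);
}
```

For example, storing the first ChaCha20 constant word 0x61707865 this
way yields the bytes 'e', 'x', 'p', 'a'.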
On a Neoverse-N1 it shows the following improvements (using
formatted bench-arc4random data):

GENERIC
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               129.96
arc4random_buf(0) [single-thread]        245.83
arc4random_buf(16) [single-thread]       312.38
arc4random_buf(32) [single-thread]       353.77
arc4random_buf(64) [single-thread]       380.53
--------------------------------------------------
arc4random [multi-thread]                129.63
arc4random_buf(0) [multi-thread]         245.54
arc4random_buf(16) [multi-thread]        309.15
arc4random_buf(32) [multi-thread]        356.40
arc4random_buf(64) [multi-thread]        381.94
--------------------------------------------------

OPTIMIZED
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               153.76
arc4random_buf(0) [single-thread]        349.12
arc4random_buf(16) [single-thread]       498.68
arc4random_buf(32) [single-thread]       619.87
arc4random_buf(64) [single-thread]       706.69
--------------------------------------------------
arc4random [multi-thread]                154.25
arc4random_buf(0) [multi-thread]         349.08
arc4random_buf(16) [multi-thread]        494.77
arc4random_buf(32) [multi-thread]        623.87
arc4random_buf(64) [multi-thread]        706.63
--------------------------------------------------

Checked on aarch64-linux-gnu.
---
 LICENSES                        |   5 +-
 sysdeps/aarch64/Makefile        |   4 +
 sysdeps/aarch64/chacha20.S      | 357 ++++++++++++++++++++++++++++++++
 sysdeps/aarch64/chacha20_arch.h |  43 ++++
 4 files changed, 407 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/aarch64/chacha20.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 8ef0f023d7..b0c43495cb 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,8 +390,9 @@ Copyright 2001 by Stephen L. Moshier
    License along with this library; if not, see
    . */

-sysdeps/x86_64/chacha20-ssse3.S and sysdeps/x86_64/chacha20-avx2.S
-import code from libgcrypt, with the following notices:
+sysdeps/x86_64/chacha20-ssse3.S, sysdeps/x86_64/chacha20-avx2.S, and
+sysdeps/aarch64/chacha20.S import code from libgcrypt, with the
+following notices:

 Copyright (C) 2017-2019 Jussi Kivilinna

diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 7183895d04..173665e306 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -50,6 +50,10 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif

+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20
+endif
+
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20.S b/sysdeps/aarch64/chacha20.S
new file mode 100644
index 0000000000..730b9a14b9
--- /dev/null
+++ b/sysdeps/aarch64/chacha20.S
@@ -0,0 +1,357 @@
+/* Optimized AArch64 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+
+/* Only LE is supported. */
+#ifdef __AARCH64EL__
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain. */
+
+#define GET_DATA_POINTER(reg, name) \
+	adrp reg, :got:name ; \
+	ldr reg, [reg, #:got_lo12:name] ;
+
+/* 'ret' instruction replacement for straight-line speculation mitigation */
+#define ret_spec_stop \
+	ret; dsb sy; isb;
+
+.cpu generic+simd
+
+.text
+
+/* register macros */
+#define INPUT x0
+#define DST x1
+#define SRC x2
+#define NBLKS x3
+#define ROUND x4
+#define INPUT_CTR x5
+#define INPUT_POS x6
+#define CTR x7
+
+/* vector registers */
+#define X0 v16
+#define X1 v17
+#define X2 v18
+#define X3 v19
+#define X4 v20
+#define X5 v21
+#define X6 v22
+#define X7 v23
+#define X8 v24
+#define X9 v25
+#define X10 v26
+#define X11 v27
+#define X12 v28
+#define X13 v29
+#define X14 v30
+#define X15 v31
+
+#define VCTR v0
+#define VTMP0 v1
+#define VTMP1 v2
+#define VTMP2 v3
+#define VTMP3 v4
+#define X12_TMP v5
+#define X13_TMP v6
+#define ROT8 v7
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+#define _(...) __VA_ARGS__
+
+#define vpunpckldq(s1, s2, dst) \
+	zip1 dst.4s, s2.4s, s1.4s;
+
+#define vpunpckhdq(s1, s2, dst) \
+	zip2 dst.4s, s2.4s, s1.4s;
+
+#define vpunpcklqdq(s1, s2, dst) \
+	zip1 dst.2d, s2.2d, s1.2d;
+
+#define vpunpckhqdq(s1, s2, dst) \
+	zip2 dst.2d, s2.2d, s1.2d;
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
+	vpunpckhdq(x1, x0, t2); \
+	vpunpckldq(x1, x0, x0); \
+	\
+	vpunpckldq(x3, x2, t1); \
+	vpunpckhdq(x3, x2, x2); \
+	\
+	vpunpckhqdq(t1, x0, x1); \
+	vpunpcklqdq(t1, x0, x0); \
+	\
+	vpunpckhqdq(x2, t2, x3); \
+	vpunpcklqdq(x2, t2, x2);
+
+#define clear(x) \
+	movi x.16b, #0;
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define XOR(d,s1,s2) \
+	eor d.16b, s2.16b, s1.16b;
+
+#define PLUS(ds,s) \
+	add ds.4s, ds.4s, s.4s;
+
+#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4,iop1,iop2,iop3) \
+	shl dst1.4s, src1.4s, #(c); \
+	shl dst2.4s, src2.4s, #(c); \
+	iop1; \
+	shl dst3.4s, src3.4s, #(c); \
+	shl dst4.4s, src4.4s, #(c); \
+	iop2; \
+	sri dst1.4s, src1.4s, #(32 - (c)); \
+	sri dst2.4s, src2.4s, #(32 - (c)); \
+	iop3; \
+	sri dst3.4s, src3.4s, #(32 - (c)); \
+	sri dst4.4s, src4.4s, #(32 - (c));
+
+#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4,iop1,iop2,iop3) \
+	tbl dst1.16b, {src1.16b}, ROT8.16b; \
+	iop1; \
+	tbl dst2.16b, {src2.16b}, ROT8.16b; \
+	iop2; \
+	tbl dst3.16b, {src3.16b}, ROT8.16b; \
+	iop3; \
+	tbl dst4.16b, {src4.16b}, ROT8.16b;
+
+#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4,iop1) \
+	rev32 dst1.8h, src1.8h; \
+	rev32 dst2.8h, src2.8h; \
+	iop1; \
+	rev32 dst3.8h, src3.8h; \
+	rev32 dst4.8h, src4.8h;
+
+#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4,\
+		iop1,iop2,iop3,iop4,iop5,iop6,iop7,iop8,iop9,iop10,iop11,iop12,iop13,iop14,\
+		iop15,iop16,iop17,iop18,iop19,iop20,iop21,iop22,iop23,iop24,iop25,iop26,\
+		iop27,iop28,iop29) \
+	PLUS(a1,b1); PLUS(a2,b2); iop1; \
+	PLUS(a3,b3); PLUS(a4,b4); iop2; \
+	XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); iop3; \
+	XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); iop4; \
+	ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4, _(iop5)); \
+	iop6; \
+	PLUS(c1,d1); PLUS(c2,d2); iop7; \
+	PLUS(c3,d3); PLUS(c4,d4); iop8; \
+	XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); iop9; \
+	XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); iop10; \
+	ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4, \
+		_(iop11), _(iop12), _(iop13)); iop14; \
+	PLUS(a1,b1); PLUS(a2,b2); iop15; \
+	PLUS(a3,b3); PLUS(a4,b4); iop16; \
+	XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); iop17; \
+	XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); iop18; \
+	ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4, \
+		_(iop19), _(iop20), _(iop21)); iop22; \
+	PLUS(c1,d1); PLUS(c2,d2); iop23; \
+	PLUS(c3,d3); PLUS(c4,d4); iop24; \
+	XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); iop25; \
+	XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); iop26; \
+	ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4, \
+		_(iop27), _(iop28), _(iop29));
+
+.align 4
+.hidden __chacha20_blocks4_data_inc_counter
+__chacha20_blocks4_data_inc_counter:
+	.long 0,1,2,3
+
+.align 4
+.hidden __chacha20_blocks4_data_rot8
+__chacha20_blocks4_data_rot8:
+	.byte 3,0,1,2
+	.byte 7,4,5,6
+	.byte 11,8,9,10
+	.byte 15,12,13,14
+
+ENTRY (__chacha20_neon_blocks4)
+	/* input:
+	 *	x0: input
+	 *	x1: dst
+	 *	x2: src
+	 *	x3: nblks (multiple of 4)
+	 */
+
+	GET_DATA_POINTER(CTR, __chacha20_blocks4_data_rot8);
+	add INPUT_CTR, INPUT, #(12*4);
+	ld1 {ROT8.16b}, [CTR];
+	GET_DATA_POINTER(CTR, __chacha20_blocks4_data_inc_counter);
+	mov INPUT_POS, INPUT;
+	ld1 {VCTR.16b}, [CTR];
+
+L(loop4):
+	/* Construct counter vectors X12 and X13 */
+
+	ld1 {X15.16b}, [INPUT_CTR];
+	mov ROUND, #20;
+	ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
+
+	dup X12.4s, X15.s[0];
+	dup X13.4s, X15.s[1];
+	ldr CTR, [INPUT_CTR];
+	add X12.4s, X12.4s, VCTR.4s;
+	dup X0.4s, VTMP1.s[0];
+	dup X1.4s, VTMP1.s[1];
+	dup X2.4s, VTMP1.s[2];
+	dup X3.4s, VTMP1.s[3];
+	dup X14.4s, X15.s[2];
+	cmhi VTMP0.4s, VCTR.4s, X12.4s;
+	dup X15.4s, X15.s[3];
+	add CTR, CTR, #4; /* Update counter */
+	dup X4.4s, VTMP2.s[0];
+	dup X5.4s, VTMP2.s[1];
+	dup X6.4s, VTMP2.s[2];
+	dup X7.4s, VTMP2.s[3];
+	sub X13.4s, X13.4s, VTMP0.4s;
+	dup X8.4s, VTMP3.s[0];
+	dup X9.4s, VTMP3.s[1];
+	dup X10.4s, VTMP3.s[2];
+	dup X11.4s, VTMP3.s[3];
+	mov X12_TMP.16b, X12.16b;
+	mov X13_TMP.16b, X13.16b;
+	str CTR, [INPUT_CTR];
+
+L(round2):
+	subs ROUND, ROUND, #2
+	QUARTERROUND4(X0, X4, X8, X12, X1, X5, X9, X13,
+		      X2, X6, X10, X14, X3, X7, X11, X15,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3,
+		      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
+	QUARTERROUND4(X0, X5, X10, X15, X1, X6, X11, X12,
+		      X2, X7, X8, X13, X3, X4, X9, X14,
+		      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3,
+		      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
+	b.ne L(round2);
+
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
+
+	PLUS(X12, X12_TMP); /* INPUT + 12 * 4 + counter */
+	PLUS(X13, X13_TMP); /* INPUT + 13 * 4 + counter */
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
+	PLUS(X0, VTMP2);
+	PLUS(X1, VTMP3);
+	PLUS(X2, X12_TMP);
+	PLUS(X3, X13_TMP);
+
+	dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
+	dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
+	dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
+	dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
+	ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
+	mov INPUT_POS, INPUT;
+	PLUS(X4, VTMP2);
+	PLUS(X5, VTMP3);
+	PLUS(X6, X12_TMP);
+	PLUS(X7, X13_TMP);
+
+	dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
+	dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
+	dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
+	dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
+	dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
+	dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
+	PLUS(X8, VTMP2);
+	PLUS(X9, VTMP3);
+	PLUS(X10, X12_TMP);
+	PLUS(X11, X13_TMP);
+	PLUS(X14, VTMP0);
+	PLUS(X15, VTMP1);
+
+	transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
+	transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
+
+	subs NBLKS, NBLKS, #4;
+
+	ld1 {VTMP0.16b-VTMP3.16b}, [SRC], #64;
+	ld1 {X12_TMP.16b-X13_TMP.16b}, [SRC], #32;
+	eor VTMP0.16b, X0.16b, VTMP0.16b;
+	eor VTMP1.16b, X4.16b, VTMP1.16b;
+	eor VTMP2.16b, X8.16b, VTMP2.16b;
+	eor VTMP3.16b, X12.16b, VTMP3.16b;
+	eor X12_TMP.16b, X1.16b, X12_TMP.16b;
+	eor X13_TMP.16b, X5.16b, X13_TMP.16b;
+	st1 {VTMP0.16b-VTMP3.16b}, [DST], #64;
+	ld1 {VTMP0.16b-VTMP3.16b}, [SRC], #64;
+	st1 {X12_TMP.16b-X13_TMP.16b}, [DST], #32;
+	ld1 {X12_TMP.16b-X13_TMP.16b}, [SRC], #32;
+	eor VTMP0.16b, X9.16b, VTMP0.16b;
+	eor VTMP1.16b, X13.16b, VTMP1.16b;
+	eor VTMP2.16b, X2.16b, VTMP2.16b;
+	eor VTMP3.16b, X6.16b, VTMP3.16b;
+	eor X12_TMP.16b, X10.16b, X12_TMP.16b;
+	eor X13_TMP.16b, X14.16b, X13_TMP.16b;
+	st1 {VTMP0.16b-VTMP3.16b}, [DST], #64;
+	ld1 {VTMP0.16b-VTMP3.16b}, [SRC], #64;
+	st1 {X12_TMP.16b-X13_TMP.16b}, [DST], #32;
+	eor VTMP0.16b, X3.16b, VTMP0.16b;
+	eor VTMP1.16b, X7.16b, VTMP1.16b;
+	eor VTMP2.16b, X11.16b, VTMP2.16b;
+	eor VTMP3.16b, X15.16b, VTMP3.16b;
+	st1 {VTMP0.16b-VTMP3.16b}, [DST], #64;
+
+	b.ne L(loop4);
+
+	/* clear the used vector registers and stack */
+	clear(VTMP0);
+	clear(VTMP1);
+	clear(VTMP2);
+	clear(VTMP3);
+	clear(X12_TMP);
+	clear(X13_TMP);
+	clear(X0);
+	clear(X1);
+	clear(X2);
+	clear(X3);
+	clear(X4);
+	clear(X5);
+	clear(X6);
+	clear(X7);
+	clear(X8);
+	clear(X9);
+	clear(X10);
+	clear(X11);
+	clear(X12);
+	clear(X13);
+	clear(X14);
+	clear(X15);
+
+	eor x0, x0, x0
+	ret_spec_stop
+END (__chacha20_neon_blocks4)
+
+#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
new file mode 100644
index 0000000000..f7b9462793
--- /dev/null
+++ b/sysdeps/aarch64/chacha20_arch.h
@@ -0,0 +1,43 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+#include
+
+unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
+                                      const uint8_t *src, size_t nblks);
+
+static void
+chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+#ifdef __AARCH64EL__
+  if (bytes >= CHACHA20_BLOCK_SIZE * 4)
+    {
+      size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
+      nblocks -= nblocks % 4;
+      __chacha20_neon_blocks4 (state->ctx, dst, src, nblocks);
+      bytes -= nblocks * CHACHA20_BLOCK_SIZE;
+      dst += nblocks * CHACHA20_BLOCK_SIZE;
+      src += nblocks * CHACHA20_BLOCK_SIZE;
+    }
+#endif
+
+  if (bytes > 0)
+    chacha20_crypt_generic (state, dst, src, bytes);
+}

From patchwork Wed Apr 13 20:24:01 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 52882
To: libc-alpha@sourceware.org
Subject: [PATCH 7/7] powerpc64: Add optimized chacha20
Date: Wed, 13 Apr 2022 17:24:01 -0300
Message-Id: <20220413202401.408267-8-adhemerval.zanella@linaro.org>
In-Reply-To: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
References: <20220413202401.408267-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

This adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-ppc.c.  It targets POWER8 and is used by default
for LE.

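Because the 4-way routine produces four consecutive blocks per
iteration, each vector lane needs its own 64-bit block counter (state
words 12 and 13).  The diff builds these with a vector add followed by
a compare that propagates the carry into the high word.  A scalar
sketch of that step, assuming nothing beyond standard C (the name
lane_counters is hypothetical and does not appear in the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: derive the low and high counter words for four
   consecutive blocks from a 64-bit counter held as two 32-bit words.
   Lane i gets counter + i; a wrap of the low word carries into the
   high word.  */
static void
lane_counters (uint32_t lo, uint32_t hi,
               uint32_t out_lo[4], uint32_t out_hi[4])
{
  assert (out_lo && out_hi);
  for (uint32_t i = 0; i < 4; i++)
    {
      out_lo[i] = lo + i;               /* Wraps modulo 2^32.  */
      out_hi[i] = hi + (out_lo[i] < i); /* Add 1 if the low word wrapped.  */
    }
}
```

In the vector code the comparison yields an all-ones lane (that is, -1)
where the low word wrapped, so subtracting the comparison result from
the high word has the same effect as the `+ (out_lo[i] < i)` above.
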
On a POWER8 it shows the following improvements (using
formatted bench-arc4random data):

GENERIC (powerpc64-linux-gnu)
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               70.05
arc4random_buf(0) [single-thread]        143.62
arc4random_buf(16) [single-thread]       200.85
arc4random_buf(32) [single-thread]       247.87
arc4random_buf(64) [single-thread]       277.19
--------------------------------------------------
arc4random [multi-thread]                69.99
arc4random_buf(0) [multi-thread]         143.52
arc4random_buf(16) [multi-thread]        200.31
arc4random_buf(32) [multi-thread]        248.63
arc4random_buf(64) [multi-thread]        279.66
--------------------------------------------------

POWER8
Function                                 MB/s
--------------------------------------------------
arc4random [single-thread]               86.91
arc4random_buf(0) [single-thread]        212.20
arc4random_buf(16) [single-thread]       373.42
arc4random_buf(32) [single-thread]       572.93
arc4random_buf(64) [single-thread]       772.87
--------------------------------------------------
arc4random [multi-thread]                84.43
arc4random_buf(0) [multi-thread]         211.93
arc4random_buf(16) [multi-thread]        373.58
arc4random_buf(32) [multi-thread]        573.80
arc4random_buf(64) [multi-thread]        772.96
--------------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
---
 LICENSES                                  |   4 +-
 sysdeps/powerpc/powerpc64/Makefile        |   3 +
 sysdeps/powerpc/powerpc64/chacha-ppc.c    | 254 ++++++++++++++++++++++
 sysdeps/powerpc/powerpc64/chacha20_arch.h |  53 +++++
 4 files changed, 312 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/chacha-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index b0c43495cb..f7dc51c3a9 100644
--- a/LICENSES
+++ b/LICENSES
@@ -391,8 +391,8 @@ Copyright 2001 by Stephen L. Moshier
    . */

 sysdeps/x86_64/chacha20-ssse3.S, sysdeps/x86_64/chacha20-avx2.S, and
-sysdeps/aarch64/chacha20.S import code from libgcrypt, with the
-following notices:
+sysdeps/aarch64/chacha20.S, and sysdeps/powerpc/powerpc64/chacha-ppc.c
+import code from libgcrypt, with the following notices:

 Copyright (C) 2017-2019 Jussi Kivilinna

diff --git a/sysdeps/powerpc/powerpc64/Makefile b/sysdeps/powerpc/powerpc64/Makefile
index 679d5e49ba..d213d23dc4 100644
--- a/sysdeps/powerpc/powerpc64/Makefile
+++ b/sysdeps/powerpc/powerpc64/Makefile
@@ -66,6 +66,9 @@ tst-setjmp-bug21895-static-ENV = \
 endif

 ifeq ($(subdir),stdlib)
+sysdep_routines += chacha-ppc
+CFLAGS-chacha-ppc.c += -mcpu=power8
+
 CFLAGS-tst-ucontext-ppc64-vscr.c += -maltivec
 tests += tst-ucontext-ppc64-vscr
 endif
diff --git a/sysdeps/powerpc/powerpc64/chacha-ppc.c b/sysdeps/powerpc/powerpc64/chacha-ppc.c
new file mode 100644
index 0000000000..db87aa5823
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/chacha-ppc.c
@@ -0,0 +1,254 @@
+/* Optimized PowerPC implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+#include
+#include
+
+typedef vector unsigned char vector16x_u8;
+typedef vector unsigned int vector4x_u32;
+typedef vector unsigned long long vector2x_u64;
+
+#ifdef WORDS_BIGENDIAN
+static const vector16x_u8 le_bswap_const =
+  { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+#endif
+
+static inline vector4x_u32
+vec_rol_elems (vector4x_u32 v, unsigned int idx)
+{
+#ifndef WORDS_BIGENDIAN
+  return vec_sld (v, v, (16 - (4 * idx)) & 15);
+#else
+  return vec_sld (v, v, (4 * idx) & 15);
+#endif
+}
+
+static inline vector4x_u32
+vec_load_le (unsigned long offset, const unsigned char *ptr)
+{
+  vector4x_u32 vec;
+  vec = vec_vsx_ld (offset, (const uint32_t *)ptr);
+#ifdef WORDS_BIGENDIAN
+  vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec,
+                                 le_bswap_const);
+#endif
+  return vec;
+}
+
+static inline void
+vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr)
+{
+#ifdef WORDS_BIGENDIAN
+  vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec,
+                               le_bswap_const);
+#endif
+  vec_vsx_st (vec, offset, (uint32_t *)ptr);
+}
+
+
+static inline vector4x_u32
+vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a)
+{
+#ifdef WORDS_BIGENDIAN
+  static const vector16x_u8 swap32 =
+    { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 };
+  vector2x_u64 vec, add, sum;
+
+  vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32);
+  add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32);
+  sum = vec + add;
+  return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32);
+#else
+  return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a));
+#endif
+}
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define ROTATE(v1,rolv) \
+	__asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" (rolv))
+
+#define PLUS(ds,s) \
+	((ds) += (s))
+
+#define XOR(ds,s) \
+	((ds) ^= (s))
+
+#define ADD_U64(v,a) \
+	(v = vec_add_ctr_u64(v, a))
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3) ({ \
+	vector4x_u32 t1 = vec_mergeh(x0, x2); \
+	vector4x_u32 t2 = vec_mergel(x0, x2); \
+	vector4x_u32 t3 = vec_mergeh(x1, x3); \
+	x3 = vec_mergel(x1, x3); \
+	x0 = vec_mergeh(t1, t3); \
+	x1 = vec_mergel(t1, t3); \
+	x2 = vec_mergeh(t2, x3); \
+	x3 = vec_mergel(t2, x3); \
+      })
+
+#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+	    ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+	    ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \
+	PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \
+	    ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \
+	PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \
+	    ROTATE(b1, rotate_7); ROTATE(b2, rotate_7);
+
+unsigned int
+__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src,
+                           size_t nblks)
+{
+  vector4x_u32 counters_0123 = { 0, 1, 2, 3 };
+  vector4x_u32 counter_4 = { 4, 0, 0, 0 };
+  vector4x_u32 rotate_16 = { 16, 16, 16, 16 };
+  vector4x_u32 rotate_12 = { 12, 12, 12, 12 };
+  vector4x_u32 rotate_8 = { 8, 8, 8, 8 };
+  vector4x_u32 rotate_7 = { 7, 7, 7, 7 };
+  vector4x_u32 state0, state1, state2, state3;
+  vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7;
+  vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15;
+  vector4x_u32 tmp;
+  int i;
+
+  /* Force preload of constants to vector registers. */
+  __asm__ ("": "+v" (counters_0123) :: "memory");
+  __asm__ ("": "+v" (counter_4) :: "memory");
+  __asm__ ("": "+v" (rotate_16) :: "memory");
+  __asm__ ("": "+v" (rotate_12) :: "memory");
+  __asm__ ("": "+v" (rotate_8) :: "memory");
+  __asm__ ("": "+v" (rotate_7) :: "memory");
+
+  state0 = vec_vsx_ld (0 * 16, state);
+  state1 = vec_vsx_ld (1 * 16, state);
+  state2 = vec_vsx_ld (2 * 16, state);
+  state3 = vec_vsx_ld (3 * 16, state);
+
+  do
+    {
+      v0 = vec_splat (state0, 0);
+      v1 = vec_splat (state0, 1);
+      v2 = vec_splat (state0, 2);
+      v3 = vec_splat (state0, 3);
+      v4 = vec_splat (state1, 0);
+      v5 = vec_splat (state1, 1);
+      v6 = vec_splat (state1, 2);
+      v7 = vec_splat (state1, 3);
+      v8 = vec_splat (state2, 0);
+      v9 = vec_splat (state2, 1);
+      v10 = vec_splat (state2, 2);
+      v11 = vec_splat (state2, 3);
+      v12 = vec_splat (state3, 0);
+      v13 = vec_splat (state3, 1);
+      v14 = vec_splat (state3, 2);
+      v15 = vec_splat (state3, 3);
+
+      v12 += counters_0123;
+      v13 -= vec_cmplt (v12, counters_0123);
+
+      for (i = 20; i > 0; i -= 2)
+        {
+          QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13)
+          QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15)
+          QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12)
+          QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14)
+        }
+
+      v0 += vec_splat (state0, 0);
+      v1 += vec_splat (state0, 1);
+      v2 += vec_splat (state0, 2);
+      v3 += vec_splat (state0, 3);
+      v4 += vec_splat (state1, 0);
+      v5 += vec_splat (state1, 1);
+      v6 += vec_splat (state1, 2);
+      v7 += vec_splat (state1, 3);
+      v8 += vec_splat (state2, 0);
+      v9 += vec_splat (state2, 1);
+      v10 += vec_splat (state2, 2);
+      v11 += vec_splat (state2, 3);
+      tmp = vec_splat( state3, 0);
+      tmp += counters_0123;
+      v12 += tmp;
+      v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123);
+      v14 += vec_splat (state3, 2);
+      v15 += vec_splat (state3, 3);
+      ADD_U64 (state3, counter_4);
+
+      transpose_4x4 (v0, v1, v2, v3);
+      transpose_4x4 (v4, v5, v6, v7);
+      transpose_4x4 (v8, v9, v10, v11);
+      transpose_4x4 (v12, v13, v14, v15);
+
+      v0 ^= vec_load_le ((64 * 0 + 16 * 0), src);
+      v1 ^= vec_load_le ((64 * 1 + 16 * 0), src);
+      v2 ^= vec_load_le ((64 * 2 + 16 * 0), src);
+      v3 ^= vec_load_le ((64 * 3 + 16 * 0), src);
+
+      v4 ^= vec_load_le ((64 * 0 + 16 * 1), src);
+      v5 ^= vec_load_le ((64 * 1 + 16 * 1), src);
+      v6 ^= vec_load_le ((64 * 2 + 16 * 1), src);
+      v7 ^= vec_load_le ((64 * 3 + 16 * 1), src);
+
+      v8 ^= vec_load_le ((64 * 0 + 16 * 2), src);
+      v9 ^= vec_load_le ((64 * 1 + 16 * 2), src);
+      v10 ^= vec_load_le ((64 * 2 + 16 * 2), src);
+      v11 ^= vec_load_le ((64 * 3 + 16 * 2), src);
+
+      v12 ^= vec_load_le ((64 * 0 + 16 * 3), src);
+      v13 ^= vec_load_le ((64 * 1 + 16 * 3), src);
+      v14 ^= vec_load_le ((64 * 2 + 16 * 3), src);
+      v15 ^= vec_load_le ((64 * 3 + 16 * 3), src);
+
+      vec_store_le (v0, (64 * 0 + 16 * 0), dst);
+      vec_store_le (v1, (64 * 1 + 16 * 0), dst);
+      vec_store_le (v2, (64 * 2 + 16 * 0), dst);
+      vec_store_le (v3, (64 * 3 + 16 * 0), dst);
+
+      vec_store_le (v4, (64 * 0 + 16 * 1), dst);
+      vec_store_le (v5, (64 * 1 + 16 * 1), dst);
+      vec_store_le (v6, (64 * 2 + 16 * 1), dst);
+      vec_store_le (v7, (64 * 3 + 16 * 1), dst);
+
+      vec_store_le (v8, (64 * 0 + 16 * 2), dst);
+      vec_store_le (v9, (64 * 1 + 16 * 2), dst);
+      vec_store_le (v10, (64 * 2 + 16 * 2), dst);
+      vec_store_le (v11, (64 * 3 + 16 * 2), dst);
+
+      vec_store_le (v12, (64 * 0 + 16 * 3), dst);
+      vec_store_le (v13, (64 * 1 + 16 * 3), dst);
+      vec_store_le (v14, (64 * 2 + 16 * 3), dst);
+      vec_store_le (v15, (64 * 3 + 16 * 3), dst);
+
+      src += 4*64;
+      dst += 4*64;
+
+      nblks -= 4;
+    }
+  while (nblks);
+
+  vec_vsx_st (state3, 3 * 16, state);
+
+  return 0;
+}
diff --git a/sysdeps/powerpc/powerpc64/chacha20_arch.h b/sysdeps/powerpc/powerpc64/chacha20_arch.h
new file mode 100644
index 0000000000..e958c73b3c
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/chacha20_arch.h
@@ -0,0 +1,53 @@
+/* PowerPC optimization for ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+#include
+
+unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst,
+                                        const uint8_t *src, size_t nblks);
+
+static inline bool
+is_power8 (void)
+{
+#ifdef __LITTLE_ENDIAN__
+  return true;
+#else
+  unsigned long int hwcap = GLRO(dl_hwcap);
+  unsigned long int hwcap2 = GLRO(dl_hwcap2);
+  return hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC;
+#endif
+}
+
+static void
+chacha20_crypt (struct chacha20_state *state, uint8_t *dst,
+                const uint8_t *src, size_t bytes)
+{
+  if (is_power8 () && bytes >= CHACHA20_BLOCK_SIZE * 4)
+    {
+      size_t nblocks = bytes / CHACHA20_BLOCK_SIZE;
+      nblocks -= nblocks % 4;
+      __chacha20_power8_blocks4 (state->ctx, dst, src, nblocks);
+      bytes -= nblocks * CHACHA20_BLOCK_SIZE;
+      dst += nblocks * CHACHA20_BLOCK_SIZE;
+      src += nblocks * CHACHA20_BLOCK_SIZE;
+    }
+
+  if (bytes > 0)
+    chacha20_crypt_generic (state, dst, src, bytes);
+}
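
Note on the dispatch arithmetic: both chacha20_crypt wrappers hand the
largest multiple of four blocks to the vector routine and leave the
remainder to chacha20_crypt_generic.  The split can be sketched in
isolation as below.  This is only an illustration: split_blocks4 and
BLOCK_SIZE are hypothetical names, and the 64-byte block size is
assumed here rather than taken from the CHACHA20_BLOCK_SIZE definition,
which lives in an earlier patch of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one ChaCha20 block is 64 bytes.  */
#define BLOCK_SIZE 64

/* Illustrative only: return how many bytes the 4-block vector routine
   would process for a request of BYTES bytes, and store in *TAIL what
   is left for the generic code.  */
static size_t
split_blocks4 (size_t bytes, size_t *tail)
{
  size_t vec_bytes = 0;

  if (bytes >= BLOCK_SIZE * 4)
    {
      size_t nblocks = bytes / BLOCK_SIZE;
      nblocks -= nblocks % 4;   /* Round down to a multiple of 4.  */
      vec_bytes = nblocks * BLOCK_SIZE;
    }

  *tail = bytes - vec_bytes;
  /* The vector part is always a whole number of 4-block groups.  */
  assert (vec_bytes % (BLOCK_SIZE * 4) == 0);
  return vec_bytes;
}
```

For a 1000-byte request this gives 768 bytes (12 blocks) to the vector
routine and 232 bytes to the generic code; requests shorter than 256
bytes never reach the vector routine at all.
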