From patchwork Mon Oct  2 19:33:06 2023
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 76984
From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Noah Goldstein, Paul Eggert, Florian Weimer
Subject: [PATCH v8 2/7] stdlib: Optimization qsort{_r} swap implementation
Date: Mon,  2 Oct 2023 16:33:06 -0300
Message-Id: <20231002193311.3985890-3-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231002193311.3985890-1-adhemerval.zanella@linaro.org>
References: <20231002193311.3985890-1-adhemerval.zanella@linaro.org>

The optimization takes into consideration both that the most common
element sizes are either 32 or 64 bits and that inputs are aligned to
the word boundary.  This is similar to what msort does.  For larger
buffers the swap operation uses memcpy/mempcpy with a small fixed-size
buffer (so the compiler may inline the operations).

Checked on x86_64-linux-gnu.
---
 stdlib/qsort.c | 94 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 18 deletions(-)

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index 728a0ed370..bba9783191 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -23,20 +23,70 @@
 #include <limits.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdbool.h>
 
-/* Byte-wise swap two items of size SIZE.  */
-#define SWAP(a, b, size)						      \
-  do									      \
-    {									      \
-      size_t __size = (size);						      \
-      char *__a = (a), *__b = (b);					      \
-      do								      \
-	{								      \
-	  char __tmp = *__a;						      \
-	  *__a++ = *__b;						      \
-	  *__b++ = __tmp;						      \
-	} while (--__size > 0);						      \
-    } while (0)
+/* Swap SIZE bytes between addresses A and B.  These helpers are provided
+   alongside the generic one as an optimization.  */
+
+enum swap_type_t
+  {
+    SWAP_WORDS_64,
+    SWAP_WORDS_32,
+    SWAP_BYTES
+  };
+
+/* If this function returns true, elements can be safely copied using word
+   loads and stores.  Otherwise, it might not be safe.  BASE (as an integer)
+   must be a multiple of the word alignment.  SIZE must be a multiple of
+   WORDSIZE.  Since WORDSIZE must be a multiple of the word alignment, and
+   WORDSIZE is a power of two on all supported platforms, this function for
+   speed merely checks that BASE and SIZE are both multiples of the word
+   size.  */
+static inline bool
+is_aligned (const void *base, size_t size, size_t wordsize)
+{
+  return (((uintptr_t) base | size) & (wordsize - 1)) == 0;
+}
+
+static inline void
+swap_words_64 (void * restrict a, void * restrict b, size_t n)
+{
+  typedef uint64_t __attribute__ ((__may_alias__)) u64_alias_t;
+  do
+    {
+      n -= 8;
+      u64_alias_t t = *(u64_alias_t *)(a + n);
+      *(u64_alias_t *)(a + n) = *(u64_alias_t *)(b + n);
+      *(u64_alias_t *)(b + n) = t;
+    } while (n);
+}
+
+static inline void
+swap_words_32 (void * restrict a, void * restrict b, size_t n)
+{
+  typedef uint32_t __attribute__ ((__may_alias__)) u32_alias_t;
+  do
+    {
+      n -= 4;
+      u32_alias_t t = *(u32_alias_t *)(a + n);
+      *(u32_alias_t *)(a + n) = *(u32_alias_t *)(b + n);
+      *(u32_alias_t *)(b + n) = t;
+    } while (n);
+}
+
+/* Replace the indirect call with a series of if statements.  It should help
+   the branch predictor.  */
+static void
+do_swap (void * restrict a, void * restrict b, size_t size,
+	 enum swap_type_t swap_type)
+{
+  if (swap_type == SWAP_WORDS_64)
+    swap_words_64 (a, b, size);
+  else if (swap_type == SWAP_WORDS_32)
+    swap_words_32 (a, b, size);
+  else
+    __memswap (a, b, size);
+}
 
 /* Discontinue quicksort algorithm when partition gets below this size.
    This particular magic number was chosen to work best on a Sun 4/260.  */
@@ -96,6 +146,14 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
     /* Avoid lossage with unsigned arithmetic below.  */
     return;
 
+  enum swap_type_t swap_type;
+  if (is_aligned (pbase, size, 8))
+    swap_type = SWAP_WORDS_64;
+  else if (is_aligned (pbase, size, 4))
+    swap_type = SWAP_WORDS_32;
+  else
+    swap_type = SWAP_BYTES;
+
   if (total_elems > MAX_THRESH)
     {
       char *lo = base_ptr;
@@ -119,13 +177,13 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
 	  char *mid = lo + size * ((hi - lo) / size >> 1);
 
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    do_swap (mid, lo, size, swap_type);
 	  if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)
-	    SWAP (mid, hi, size);
+	    do_swap (mid, hi, size, swap_type);
 	  else
 	    goto jump_over;
 	  if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)
-	    SWAP (mid, lo, size);
+	    do_swap (mid, lo, size, swap_type);
 	jump_over:;
 
 	  left_ptr  = lo + size;
@@ -144,7 +202,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
 
 	  if (left_ptr < right_ptr)
 	    {
-	      SWAP (left_ptr, right_ptr, size);
+	      do_swap (left_ptr, right_ptr, size, swap_type);
 	      if (mid == left_ptr)
 		mid = right_ptr;
 	      else if (mid == right_ptr)
@@ -216,7 +274,7 @@ _quicksort (void *const pbase, size_t total_elems, size_t size,
 	tmp_ptr = run_ptr;
 
     if (tmp_ptr != base_ptr)
-      SWAP (tmp_ptr, base_ptr, size);
+      do_swap (tmp_ptr, base_ptr, size, swap_type);
 
     /* Insertion sort, running from left-hand-side up to right-hand-side.  */
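
As a standalone illustration of the selection logic above (not part of the
patch: the helper name select_swap_type and the demo driver below are
invented for the example), the following program mirrors how _quicksort
classifies an input once per call and then reuses the result for every swap:

/* Demo of the qsort swap-type selection.  Build: gcc -O2 demo.c -o demo  */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum swap_type_t { SWAP_WORDS_64, SWAP_WORDS_32, SWAP_BYTES };

/* Same check as the patch: BASE and SIZE must both be multiples of
   WORDSIZE (a power of two), so a single AND covers both tests.  */
static bool
is_aligned (const void *base, size_t size, size_t wordsize)
{
  return (((uintptr_t) base | size) & (wordsize - 1)) == 0;
}

/* Decide once per sort, as _quicksort does before the main loop.  */
static enum swap_type_t
select_swap_type (const void *base, size_t size)
{
  if (is_aligned (base, size, 8))
    return SWAP_WORDS_64;
  if (is_aligned (base, size, 4))
    return SWAP_WORDS_32;
  return SWAP_BYTES;
}

int
main (void)
{
  static const char *const names[] =
    { "SWAP_WORDS_64", "SWAP_WORDS_32", "SWAP_BYTES" };

  int64_t i64[4];                /* 8-byte elements, 8-byte aligned.  */
  int32_t i32[4];                /* 4-byte elements, 4-byte aligned.  */
  struct { char c[3]; } odd[4];  /* 3-byte elements: byte fallback.  */

  printf ("int64_t: %s\n", names[select_swap_type (i64, sizeof i64[0])]);
  printf ("int32_t: %s\n", names[select_swap_type (i32, sizeof i32[0])]);
  printf ("3-byte : %s\n", names[select_swap_type (odd, sizeof odd[0])]);
  return 0;
}

On a typical LP64 target this prints SWAP_WORDS_64, SWAP_WORDS_32 and
SWAP_BYTES respectively; the point is that the classification costs one OR,
one AND and a compare per qsort call rather than per swap, while the common
32- and 64-bit element cases get word-sized loads and stores.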