From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/7] Use introsort for qsort
Date: Fri, 3 Sep 2021 14:11:37 -0300
Message-Id: <20210903171144.952737-1-adhemerval.zanella@linaro.org>

This is a respin of my previous attempt to fix and improve qsort() [1].
The main motivation for using introsort is to make qsort fully AS-Safe
and AC-Safe with a bounded stack size requirement, to remove the use
of malloc (along with the system memory checks), and to keep the
worst-case performance at O(n*ln(n)) (instead of the potentially
quadratic behavior of quicksort).  The implementation does not aim to
be a state-of-the-art sorting algorithm; rather, I used a
well-understood and simple one (introsort, as used in libstdc++) and
leveraged the current quicksort implementation along with a heapsort
taken from the Linux kernel.  The resulting implementation has a
bounded stack requirement (about 1.8k on x86_64) and results in
simpler and smaller code:

# master
$ size stdlib/qsort.o stdlib/msort.o
   text    data     bss     dec     hex filename
   1316       0       0    1316     524 stdlib/qsort.o
   2045       0      12    2057     809 stdlib/msort.o

# patched
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   2373       0       0    2373     945 stdlib/qsort.os
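For reference, the overall scheme looks like the minimal,
self-contained sketch below.  This is an illustration, not the patch
itself: the actual code keeps glibc's median-of-three quicksort
partitioning and its own tuned thresholds, while the sketch uses a
plain Lomuto partition and an arbitrary cutoff.

#include <stddef.h>

#define SMALL_THRESHOLD 8  /* Illustrative cutoff, not glibc's value.  */

typedef int (*cmp_t) (const void *, const void *);

static void
swap_bytes (char *a, char *b, size_t size)
{
  while (size-- != 0)
    { char t = a[size]; a[size] = b[size]; b[size] = t; }
}

/* Small partitions are finished with insertion sort.  */
static void
insertion_sort (char *b, size_t n, size_t s, cmp_t cmp)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = i; j > 0 && cmp (b + j * s, b + (j - 1) * s) < 0; j--)
      swap_bytes (b + j * s, b + (j - 1) * s, s);
}

/* Heapsort: O(n*ln(n)) worst case with O(1) extra space.  */
static void
siftdown (char *b, size_t s, size_t root, size_t n, cmp_t cmp)
{
  for (size_t child; (child = 2 * root + 1) < n; root = child)
    {
      if (child + 1 < n && cmp (b + child * s, b + (child + 1) * s) < 0)
        child++;
      if (cmp (b + root * s, b + child * s) >= 0)
        break;
      swap_bytes (b + root * s, b + child * s, s);
    }
}

static void
heap_sort (char *b, size_t n, size_t s, cmp_t cmp)
{
  if (n < 2)
    return;
  for (size_t i = n / 2; i-- > 0; )
    siftdown (b, s, i, n, cmp);
  for (size_t i = n - 1; i > 0; i--)
    {
      swap_bytes (b, b + i * s, s);
      siftdown (b, s, 0, i, cmp);
    }
}

/* Lomuto partition around the last element; returns the pivot's
   final position.  */
static size_t
partition (char *b, size_t n, size_t s, cmp_t cmp)
{
  char *pivot = b + (n - 1) * s;
  size_t i = 0;
  for (size_t j = 0; j + 1 < n; j++)
    if (cmp (b + j * s, pivot) < 0)
      swap_bytes (b + i++ * s, b + j * s, s);
  swap_bytes (b + i * s, pivot, s);
  return i;
}

static void
introsort_loop (char *b, size_t n, size_t s, cmp_t cmp, int depth)
{
  while (n > SMALL_THRESHOLD)
    {
      /* Partitioning is degenerating: fall back to heapsort so the
         worst case stays O(n*ln(n)).  */
      if (depth-- == 0)
        { heap_sort (b, n, s, cmp); return; }
      size_t p = partition (b, n, s, cmp);
      /* Recurse into the smaller side and iterate on the larger one,
         so the recursion is at most ~log2(n) frames deep.  */
      if (p < n - p - 1)
        { introsort_loop (b, p, s, cmp, depth); b += (p + 1) * s; n -= p + 1; }
      else
        { introsort_loop (b + (p + 1) * s, n - p - 1, s, cmp, depth); n = p; }
    }
  insertion_sort (b, n, s, cmp);
}

void
qsort_sketch (void *base, size_t n, size_t size, cmp_t cmp)
{
  /* Depth budget of 2*floor(log2(n)), as in libstdc++'s introsort.  */
  int depth = 0;
  for (size_t i = n; i > 1; i >>= 1)
    depth += 2;
  introsort_loop ((char *) base, n, size, cmp, depth);
}

Because the quicksort phase recurses only into the smaller partition,
the recursion never goes deeper than about log2(n) frames, which is
what bounds the stack usage.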
The performance difference with the provided benchmark clearly shows
the tradeoff of using an introsort over a mergesort.  The results
below are from a Ryzen 9 with GCC 11, run with
--nelem 49152:786432:8388608 (sized to match the L1/L2/L3 caches of
the CPU used):

SORTED
elements=   49152 (size=  4): %52.47% (master=3654381.0 patches=5571774.0)
elements=  786432 (size=  4): %57.39% (master=73067211.0 patches=115003964.0)
elements= 8388608 (size=  4): %49.71% (master=1005357374.0 patches=1505144332.0)
elements=   49152 (size=  8): %48.98% (master=3796489.0 patches=5656023.0)
elements=  786432 (size=  8): %52.34% (master=75809561.0 patches=115491584.0)
elements= 8388608 (size=  8): %49.97% (master=1004201712.0 patches=1506017512.0)
elements=   49152 (size= 32): %-2.69% (master=11191649.0 patches=10891084.0)
elements=  786432 (size= 32): %-12.16% (master=238870746.0 patches=209822358.0)
elements= 8388608 (size= 32): %-23.33% (master=3323248933.0 patches=2547848399.0)

MOSTLYSORTED
elements=   49152 (size=  4): %66.47% (master=6915979.0 patches=11512945.0)
elements=  786432 (size=  4): %66.48% (master=143188170.0 patches=238379284.0)
elements= 8388608 (size=  4): %70.90% (master=1805703842.0 patches=3086011963.0)
elements=   49152 (size=  8): %78.11% (master=6836231.0 patches=12176330.0)
elements=  786432 (size=  8): %77.79% (master=138851879.0 patches=246867806.0)
elements= 8388608 (size=  8): %77.70% (master=1773884617.0 patches=3152112348.0)
elements=   49152 (size= 32): %-6.44% (master=13766992.0 patches=12880888.0)
elements=  786432 (size= 32): %-15.19% (master=287590899.0 patches=243919593.0)
elements= 8388608 (size= 32): %-25.63% (master=3852042869.0 patches=2864641676.0)

RANDOM
elements=   49152 (size=  4): %9.54% (master=15078260.0 patches=16516457.0)
elements=  786432 (size=  4): %5.64% (master=305805587.0 patches=323043450.0)
elements= 8388608 (size=  4): %2.78% (master=3838340458.0 patches=3945202718.0)
elements=   49152 (size=  8): %13.75% (master=14676586.0 patches=16694499.0)
elements=  786432 (size=  8): %10.71% (master=295583560.0 patches=327252646.0)
elements= 8388608 (size=  8): %7.67% (master=3722946692.0 patches=4008563091.0)
elements=   49152 (size= 32): %-8.34% (master=20944462.0 patches=19196696.0)
elements=  786432 (size= 32): %-14.31% (master=425822838.0 patches=364896701.0)
elements= 8388608 (size= 32): %-19.86% (master=5534217212.0 patches=4435276362.0)

REPEATED
elements=   49152 (size=  4): %9.18% (master=14051610.0 patches=15341610.0)
elements=  786432 (size=  4): %6.01% (master=282787457.0 patches=299777700.0)
elements= 8388608 (size=  4): %4.19% (master=3539279291.0 patches=3687517777.0)
elements=   49152 (size=  8): %12.37% (master=13659340.0 patches=15349336.0)
elements=  786432 (size=  8): %11.11% (master=273070595.0 patches=303413116.0)
elements= 8388608 (size=  8): %8.46% (master=3424739041.0 patches=3714521094.0)
elements=   49152 (size= 32): %-11.21% (master=19826230.0 patches=17603795.0)
elements=  786432 (size= 32): %-16.73% (master=405322605.0 patches=337501658.0)
elements= 8388608 (size= 32): %-21.93% (master=5261140329.0 patches=4107591456.0)

BITONIC
elements=   49152 (size=  4): %405.43% (master=4632011.0 patches=23411611.0)
elements=  786432 (size=  4): %493.28% (master=92649410.0 patches=549674361.0)
elements= 8388608 (size=  4): %609.03% (master=1095591416.0 patches=7768102251.0)
elements=   49152 (size=  8): %402.05% (master=4672695.0 patches=23459484.0)
elements=  786432 (size=  8): %480.60% (master=94419035.0 patches=548197066.0)
elements= 8388608 (size=  8): %596.41% (master=1116631167.0 patches=7776333333.0)
elements=   49152 (size= 32): %-6.92% (master=11545720.0 patches=10747100.0)
elements=  786432 (size= 32): %-14.97% (master=245037969.0 patches=208357373.0)
elements= 8388608 (size= 32): %-24.14% (master=3357712042.0 patches=2547221807.0)

The bitonic and sorted inputs take the biggest hit; even so, I think
the tradeoff is acceptable.  The improvements for element sizes larger
than 8 are mostly due to the optimization done on the swap operations
(a sketch of the idea is appended after the diffstat below).

[1] https://sourceware.org/pipermail/libc-alpha/2018-August/096981.html

Adhemerval Zanella (7):
  benchtests: Add bench-qsort
  support: Fix getopt_long with CMDLINE_OPTIONS
  stdlib: Optimization qsort{_r} swap implementation (BZ #19305)
  stdlib: Move insertion sort out qsort
  stdlib: qsort: Move some macros to inline function
  stdlib: Implement introsort with qsort
  stdlib: Remove use of mergesort on qsort (BZ #21719)

 benchtests/Makefile         |   2 +-
 benchtests/bench-qsort.c    | 195 +++++++++++++++++++++
 manual/argp.texi            |   2 +-
 manual/locale.texi          |   3 +-
 manual/search.texi          |   7 +-
 stdlib/Makefile             |   3 +-
 stdlib/msort.c              | 310 ---------------------------------
 stdlib/qsort.c              | 333 ++++++++++++++++++++++++++++--------
 support/support_test_main.c |   1 -
 support/test-driver.h       |   1 +
 10 files changed, 460 insertions(+), 397 deletions(-)
 create mode 100644 benchtests/bench-qsort.c
 delete mode 100644 stdlib/msort.c
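Appendix: a sketch of the swap optimization mentioned above.  This is
only an illustration under assumed names (select_swap_kind and do_swap
are hypothetical, and the chunk size is arbitrary), not the code from
the "stdlib: Optimization qsort{_r} swap implementation" patch itself.
The idea is to classify the swap once per qsort call, based on element
size and alignment, instead of swapping byte by byte inside the hot
loop:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum swap_kind { SWAP_U64, SWAP_U32, SWAP_GENERIC };

/* Classify once per qsort call: direct word swaps when the element
   size and the base alignment allow it, chunked copies otherwise.
   Since size equals the word size in the word cases, every element
   is aligned if the base is.  */
static enum swap_kind
select_swap_kind (const void *base, size_t size)
{
  if (size == sizeof (uint64_t)
      && (uintptr_t) base % _Alignof (uint64_t) == 0)
    return SWAP_U64;
  if (size == sizeof (uint32_t)
      && (uintptr_t) base % _Alignof (uint32_t) == 0)
    return SWAP_U32;
  return SWAP_GENERIC;
}

static void
do_swap (void *a, void *b, size_t size, enum swap_kind kind)
{
  if (kind == SWAP_U64)
    {
      uint64_t t = *(uint64_t *) a;
      *(uint64_t *) a = *(uint64_t *) b;
      *(uint64_t *) b = t;
    }
  else if (kind == SWAP_U32)
    {
      uint32_t t = *(uint32_t *) a;
      *(uint32_t *) a = *(uint32_t *) b;
      *(uint32_t *) b = t;
    }
  else
    {
      /* Generic case: swap through a small buffer in chunks, letting
         memcpy use wide loads/stores.  This is what helps the
         size=32 numbers above.  */
      unsigned char tmp[64];  /* arbitrary chunk size */
      unsigned char *pa = a, *pb = b;
      while (size != 0)
        {
          size_t n = size < sizeof tmp ? size : sizeof tmp;
          memcpy (tmp, pa, n);
          memcpy (pa, pb, n);
          memcpy (pb, tmp, n);
          pa += n; pb += n; size -= n;
        }
    }
}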