Message ID | 20210903171144.952737-1-adhemerval.zanella@linaro.org |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3F7353848438 for <patchwork@sourceware.org>; Fri, 3 Sep 2021 17:12:11 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3F7353848438 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1630689131; bh=YxNRwSeqB1dZEihn8zBJ/NryWsCxKKFon5s0R81OHWY=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=tRkRKRBwLNmsKXhbojid61JXkaMBogaRfsBI0wMYg+mVKZGvFOLe9XmRzTvsFbxiX 5D1vnPcv2KnkEnSvdBVLz6MlVzebCeOhkEac3dis02RFngww5UhK7pibY6AR/3Bf/O hsUFDxtGJ5XpIc3xubYOcnxYPi5YfpeWefGoO26I= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qt1-x833.google.com (mail-qt1-x833.google.com [IPv6:2607:f8b0:4864:20::833]) by sourceware.org (Postfix) with ESMTPS id 8F773384B13A for <libc-alpha@sourceware.org>; Fri, 3 Sep 2021 17:11:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8F773384B13A Received: by mail-qt1-x833.google.com with SMTP id s32so5070955qtc.12 for <libc-alpha@sourceware.org>; Fri, 03 Sep 2021 10:11:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=YxNRwSeqB1dZEihn8zBJ/NryWsCxKKFon5s0R81OHWY=; b=SHZ6BOcJv000MUm3wpqFOxQr95tLO8EQ80FTWoZgZqjfCnV6ZPC/QAEwsAWPXOPkFi BsXvr+6L9mNpf472sC9QIol4q9cBzdy1Wr7Q7nw0K1CEcn9k5gdcny1cwEx7TCNumvV6 Lx8BRcew3QjOrE5N8PB46ncXm+a2/MkuZSdmj9KK0DBLjkXV9JCBdiWjfsEdFrrZ9j9U sg9HG6jegyr4fDIsKhHixMh5FhaviWqasOCSrGnBngR8caNFgtAP8161Ak5sQrIa6lKj Cn1Kah64tO8u4Ia5VcSuJ16OiGpfEAtmBHvp0zKeSIphRgNOIqNSJAcFji2C0Zlfg+cp WMcw== X-Gm-Message-State: AOAM530Oys+KhX12mEE60qcamRKwl6OR/PQmuZovUA5X5supBCgAh19y et4qqxqG/OshTe7L4esU/aVmdxkIt/5VAQ== X-Google-Smtp-Source: ABdhPJw8nICwT6F3sxLipCwNxNeSgH/DfomYH126yykud2ulimTnC+Q8uyI2Uiqoqxpoe4go4vEJKw== X-Received: by 2002:ac8:7491:: with SMTP id v17mr4738672qtq.291.1630689108735; Fri, 03 Sep 2021 10:11:48 -0700 (PDT) Received: from birita.. ([2804:431:c7cb:733d:fff8:7487:556e:e293]) by smtp.gmail.com with ESMTPSA id r4sm3207071qtw.5.2021.09.03.10.11.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Sep 2021 10:11:48 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v3 0/7] Use introsort for qsort Date: Fri, 3 Sep 2021 14:11:37 -0300 Message-Id: <20210903171144.952737-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
Use introsort for qsort
|
|
Message
Adhemerval Zanella Netto
Sept. 3, 2021, 5:11 p.m. UTC
This is respin from my previous attempt to fix and improve qsort() [1]. The main motivation to use introsort is to make it fully AS-Safe and AC-Safe, with a limited stack size requirement, to remove the use of malloc (along with the system memory checks), and keeping Worst-case performance on O(n*ln(n)) (instead of potentially quadradic as for the quicksort). The implementation does not aim to be the state-of-the-art sort algorithm, rather I used a well understood and simple on ((introsort, used on libstdc++) and leverage the current quicksort implementation along with a heapsort one from Linux kernel. The resulting implementation has bounded stack requirement (about 1.8k on x86_64), and results in an simplified code and binary: # master $ size stdlib/qsort.o stdlib/msort.o text data bss dec hex filename 1316 0 0 1316 524 stdlib/qsort.o 2045 0 12 2057 809 stdlib/msort.o # patched $ size stdlib/qsort.os text data bss dec hex filename 2373 0 0 2373 945 stdlib/qsort.os The performance difference with the provided benchmark clearly show the tradeoff of using a instrosort over a mergesort. The results are on a Ryzen 9 with gcc 11 ran with --nelem 49152:786432:8388608 (to simulate the L1/L2/L3 of this CPU used): SORTED elements= 49152 (size= 4): %52.47% (master=3654381.0 patches=5571774.0) elements= 786432 (size= 4): %57.39% (master=73067211.0 patches=115003964.0) elements= 8388608 (size= 4): %49.71% (master=1005357374.0 patches=1505144332.0) elements= 49152 (size= 8): %48.98% (master=3796489.0 patches=5656023.0) elements= 786432 (size= 8): %52.34% (master=75809561.0 patches=115491584.0) elements= 8388608 (size= 8): %49.97% (master=1004201712.0 patches=1506017512.0) elements= 49152 (size=32): %-2.69% (master=11191649.0 patches=10891084.0) elements= 786432 (size=32): %-12.16% (master=238870746.0 patches=209822358.0) elements= 8388608 (size=32): %-23.33% (master=3323248933.0 patches=2547848399.0) MOSTLYSORTED elements= 49152 (size= 4): %66.47% (master=6915979.0 patches=11512945.0) elements= 786432 (size= 4): %66.48% (master=143188170.0 patches=238379284.0) elements= 8388608 (size= 4): %70.90% (master=1805703842.0 patches=3086011963.0) elements= 49152 (size= 8): %78.11% (master=6836231.0 patches=12176330.0) elements= 786432 (size= 8): %77.79% (master=138851879.0 patches=246867806.0) elements= 8388608 (size= 8): %77.70% (master=1773884617.0 patches=3152112348.0) elements= 49152 (size=32): %-6.44% (master=13766992.0 patches=12880888.0) elements= 786432 (size=32): %-15.19% (master=287590899.0 patches=243919593.0) elements= 8388608 (size=32): %-25.63% (master=3852042869.0 patches=2864641676.0) RANDOM elements= 49152 (size= 4): %9.54% (master=15078260.0 patches=16516457.0) elements= 786432 (size= 4): %5.64% (master=305805587.0 patches=323043450.0) elements= 8388608 (size= 4): %2.78% (master=3838340458.0 patches=3945202718.0) elements= 49152 (size= 8): %13.75% (master=14676586.0 patches=16694499.0) elements= 786432 (size= 8): %10.71% (master=295583560.0 patches=327252646.0) elements= 8388608 (size= 8): %7.67% (master=3722946692.0 patches=4008563091.0) elements= 49152 (size=32): %-8.34% (master=20944462.0 patches=19196696.0) elements= 786432 (size=32): %-14.31% (master=425822838.0 patches=364896701.0) elements= 8388608 (size=32): %-19.86% (master=5534217212.0 patches=4435276362.0) REPEATED elements= 49152 (size= 4): %9.18% (master=14051610.0 patches=15341610.0) elements= 786432 (size= 4): %6.01% (master=282787457.0 patches=299777700.0) elements= 8388608 (size= 4): %4.19% (master=3539279291.0 patches=3687517777.0) elements= 49152 (size= 8): %12.37% (master=13659340.0 patches=15349336.0) elements= 786432 (size= 8): %11.11% (master=273070595.0 patches=303413116.0) elements= 8388608 (size= 8): %8.46% (master=3424739041.0 patches=3714521094.0) elements= 49152 (size=32): %-11.21% (master=19826230.0 patches=17603795.0) elements= 786432 (size=32): %-16.73% (master=405322605.0 patches=337501658.0) elements= 8388608 (size=32): %-21.93% (master=5261140329.0 patches=4107591456.0) BITONIC elements= 49152 (size= 4): %405.43% (master=4632011.0 patches=23411611.0) elements= 786432 (size= 4): %493.28% (master=92649410.0 patches=549674361.0) elements= 8388608 (size= 4): %609.03% (master=1095591416.0 patches=7768102251.0) elements= 49152 (size= 8): %402.05% (master=4672695.0 patches=23459484.0) elements= 786432 (size= 8): %480.60% (master=94419035.0 patches=548197066.0) elements= 8388608 (size= 8): %596.41% (master=1116631167.0 patches=7776333333.0) elements= 49152 (size=32): %-6.92% (master=11545720.0 patches=10747100.0) elements= 786432 (size=32): %-14.97% (master=245037969.0 patches=208357373.0) elements= 8388608 (size=32): %-24.14% (master=3357712042.0 patches=2547221807.0) With the bitonic and sorted showing the biggest hit. Even though, I think the tradeoff is acceptable. The improvements over sizes larger than 8 is mostly due the optimization done on swap operations. [1] https://sourceware.org/pipermail/libc-alpha/2018-August/096981.html Adhemerval Zanella (7): benchtests: Add bench-qsort support: Fix getopt_long with CMDLINE_OPTIONS stdlib: Optimization qsort{_r} swap implementation (BZ #19305) stdlib: Move insertion sort out qsort stdlib: qsort: Move some macros to inline function stdlib: Implement introsort with qsort stdlib: Remove use of mergesort on qsort (BZ #21719) benchtests/Makefile | 2 +- benchtests/bench-qsort.c | 195 +++++++++++++++++++++ manual/argp.texi | 2 +- manual/locale.texi | 3 +- manual/search.texi | 7 +- stdlib/Makefile | 3 +- stdlib/msort.c | 310 --------------------------------- stdlib/qsort.c | 333 ++++++++++++++++++++++++++++-------- support/support_test_main.c | 1 - support/test-driver.h | 1 + 10 files changed, 460 insertions(+), 397 deletions(-) create mode 100644 benchtests/bench-qsort.c delete mode 100644 stdlib/msort.c
Comments
On 9/3/21 10:11 AM, Adhemerval Zanella wrote: > The performance difference with the provided benchmark clearly show the > tradeoff of using a instrosort over a mergesort. Zhang, Meng and Liang <https://arxiv.org/pdf/1609.04471> report that introsort is particularly bad with reverse-sorted input; that case should be added to the benchmarks. The proposed change would appear to hurt performance significantly in the common case of sorting pointers. My guess is that typical glibc users would prefer speed to hardening against rare signal-handling or low-memory cases, since portable code will have to assume that qsort isn't hardened anyway. If hardening is a goal, perhaps we should consider an alternate API instead of changing qsort's implementation. If we do that, we can improve the API as well as changing the function's name. The new API could promise to not call malloc, or more generally to be async-signal-safe if the comparison function is. The comparison function could take a third void* argument so it can in effect be a closure (this is qsort's greatest failing), the sorting function could take a flag argument for options, and it could return an error indication (in case it detects a comparison function gone haywire, say). That sort of thing. > The improvements over sizes larger > than 8 is mostly due the optimization done on swap operations. A similar optimization could be done to the existing code independently, no? I expect that an apples-to-apples performance comparison would be even less favorable to the proposed change.
On 9/3/21 3:18 PM, Paul Eggert wrote: > On 9/3/21 10:11 AM, Adhemerval Zanella wrote: > >> The performance difference with the provided benchmark clearly show >> the tradeoff of using a instrosort over a mergesort. > > Zhang, Meng and Liang <https://arxiv.org/pdf/1609.04471> report that > introsort is particularly bad with reverse-sorted input; that case > should be added to the benchmarks. It should, and it would be useful, however, I think that Adhemerval's changes to make qsort AS-safe are actually relevant and important. All existing ruby implementations I can see use qsort in an AS-safe-requiring context (process.c) and this would help those scenarios. In general I think glibc should provide a qsort that is: - Safe. - Does not have quadratic performance behaviour. If we want better performance we need to: - Add new APIs mirroring what BSD is doing. - New APIs explicitly call out the *type* of sort algorithm. - Encourage developers and application authors to contribute those new implementations and use them in their applications. There is never going to be a qsort that meets all of the workload demands and so we should aim for a good implementation that is as safe as we can make it (limit memory usage, avoid malloc, avoid quadratic performance behaviour). > The proposed change would appear to hurt performance significantly in > the common case of sorting pointers. My guess is that typical glibc > users would prefer speed to hardening against rare signal-handling or > low-memory cases, since portable code will have to assume that qsort > isn't hardened anyway. I disagree. I think users do not expect qsort to be high performance, they expect it to be OK performance, OK memory usage, and lastly safe. My experience with core python developers has been that they expect glibc to provide functions that are safe in as many contexts as possible in order to have them avoid rolling their own for the implementation of the language runtime. Application developers may not care about this AS-safety, but they also have the flexiblity to bring in additional dependencies. We can satisfy those requirements by adding more algorithms under new symbols. > If hardening is a goal, perhaps we should consider an alternate API > instead of changing qsort's implementation. If we do that, we can > improve the API as well as changing the function's name. The new API > could promise to not call malloc, or more generally to be > async-signal-safe if the comparison function is. The comparison > function could take a third void* argument so it can in effect be a > closure (this is qsort's greatest failing), the sorting function > could take a flag argument for options, and it could return an error > indication (in case it detects a comparison function gone haywire, > say). That sort of thing. I'd argue the opposite. Make the current functions safe. Add new high performance versions with explicit algorithms and explicit tradeoffs. >> The improvements over sizes larger than 8 is mostly due the >> optimization done on swap operations. > > A similar optimization could be done to the existing code > independently, no? I expect that an apples-to-apples performance > comparison would be even less favorable to the proposed change. Size is nice, but safety is nicer IMO.
On Mon, Sep 6, 2021, at 10:13 AM, Carlos O'Donell via Libc-alpha wrote: > > In general I think glibc should provide a qsort that is: > > - Safe. > > - Does not have quadratic performance behaviour. > > If we want better performance we need to: > > - Add new APIs mirroring what BSD is doing. > > - New APIs explicitly call out the *type* of sort algorithm. > > - Encourage developers and application authors to contribute those > new implementations and use them in their applications. Do you have a link or other reference for these new APIs being developed by the BSDs? (asking out of curiosity; I haven't been following this discussion closely enough to have an opinion) zw
On 06/09/2021 14:03, Zack Weinberg via Libc-alpha wrote: > On Mon, Sep 6, 2021, at 10:13 AM, Carlos O'Donell via Libc-alpha wrote: >> >> In general I think glibc should provide a qsort that is: >> >> - Safe. >> >> - Does not have quadratic performance behaviour. >> >> If we want better performance we need to: >> >> - Add new APIs mirroring what BSD is doing. >> >> - New APIs explicitly call out the *type* of sort algorithm. >> >> - Encourage developers and application authors to contribute those >> new implementations and use them in their applications. > > Do you have a link or other reference for these new APIs being developed by the BSDs? > > (asking out of curiosity; I haven't been following this discussion closely enough to have an opinion) > The FreeBSD provides qsort, which implements a quicksort, heapsort, and mergesort [1]. [1] https://www.freebsd.org/cgi/man.cgi?query=qsort&sektion=3&manpath=freebsd-release
On 9/6/21 7:13 AM, Carlos O'Donell wrote: > All existing ruby implementations I can see use qsort in an > AS-safe-requiring context (process.c) and this would help those scenarios. I don't see how those scenarios are practical. Assuming glibc and the standard Ruby implementation, any async signal problems could occur only if Ruby's spawn method has more than 32 redirects, as this is the minimum number that would cause qsort to call malloc on practical platforms. In practice Ruby spawns don't have that many redirects, so Ruby programs won't run into a problem unless they're contrived to illustrate this portability bug in the Ruby implementation. > If we want better performance we need to: > > - Add new APIs mirroring what BSD is doing. BSD's APIs are not good, partly because they rely on nonportable C extensions (blocks) that I doubt whether glibc can insist on, partly for other reasons discussed below. An API like what I suggested in my previous email would be better, if we want to add new APIs. > - New APIs explicitly call out the *type* of sort algorithm. "Type" should not mean "heapsort" vs "quicksort" vs "mergesort" as in BSD. Users need performance, async-signal-safety, and stability; the BSD "type" approach is merely a means to that end and any new API should focus on their needs rather than on "type". > we should aim for a good implementation that is as safe as we > can make it (limit memory usage, avoid malloc, avoid quadratic performance > behaviour). If such an approach doesn't significantly hurt typical performance, I'm all for it. I'm skeptical that it's doable, though. Let's not underestimate ordinary users' desires for performance here. qsort is used in other core apps (e.g., Emacs) without running afoul of async-signal-safety bugs. Also, this is not merely about core apps like Emacs and Ruby; it's also about user one-off apps. I often see Glibc qsort in benchmarks, and if the benchmarks results become significantly worse we'll be more likely to scare users away from Glibc and toward other platforms. Unless we can fix the significant performance issues in the proposed patch, the benefit of working around an only-theoretical bug in Ruby's implementation doesn't seem worth this cost.
On 06/09/2021 21:14, Paul Eggert wrote: > On 9/6/21 7:13 AM, Carlos O'Donell wrote: > >> All existing ruby implementations I can see use qsort in an >> AS-safe-requiring context (process.c) and this would help those scenarios. > > I don't see how those scenarios are practical. Assuming glibc and the standard Ruby implementation, any async signal problems could occur only if Ruby's spawn method has more than 32 redirects, as this is the minimum number that would cause qsort to call malloc on practical platforms. In practice Ruby spawns don't have that many redirects, so Ruby programs won't run into a problem unless they're contrived to illustrate this portability bug in the Ruby implementation. I consider this a QoI to move toward provide memory bounded and async-safe interfaces even though the interface contract does not really specify such constraint. I give you that it might lead to non-portable code such the ruby case, but I prefer to actually move the interface to be async-safe instead of either trying break the caller expectations (that only works due an optimization) or changing the implementation and let the users handle any potential breakage. Eventually what might happen when some breakage occurs is users like ruby to roll out their own qsort() implementation to fulfill its need. We already have some concession about symbols that are not support to work in some scenarios (such as malloc() after fork()). > >> If we want better performance we need to: >> >> - Add new APIs mirroring what BSD is doing. > > BSD's APIs are not good, partly because they rely on nonportable C extensions (blocks) that I doubt whether glibc can insist on, partly for other reasons discussed below. An API like what I suggested in my previous email would be better, if we want to add new APIs. The blocks usage is only for *_b symbols, the BSD do provide default C compliant mergesort and heapsort. Also, FreeBSD really specifies the worst-case space complexity and its quicksort has constant-size (although it uses recursion). > >> - New APIs explicitly call out the *type* of sort algorithm. > > "Type" should not mean "heapsort" vs "quicksort" vs "mergesort" as in BSD. Users need performance, async-signal-safety, and stability; the BSD "type" approach is merely a means to that end and any new API should focus on their needs rather than on "type". > >> we should aim for a good implementation that is as safe as we >> can make it (limit memory usage, avoid malloc, avoid quadratic performance >> behaviour). > > If such an approach doesn't significantly hurt typical performance, I'm all for it. I'm skeptical that it's doable, though. As I put, it is always a tradeoff. Making async-safe means that we need to either use a different algorithm with constant worst-case space, use a hybrid approach, or use a different mechanism to allocate memory. > > Let's not underestimate ordinary users' desires for performance here. qsort is used in other core apps (e.g., Emacs) without running afoul of async-signal-safety bugs. > > Also, this is not merely about core apps like Emacs and Ruby; it's also about user one-off apps. I often see Glibc qsort in benchmarks, and if the benchmarks results become significantly worse we'll be more likely to scare users away from Glibc and toward other platforms. Unless we can fix the significant performance issues in the proposed patch, the benefit of working around an only-theoretical bug in Ruby's implementation doesn't seem worth this cost. I really don't think providing a async-safe with constant worst-case space and be *clear* about it will 'scare away' uses, specially because we are making the interface behaving in a more constraint way which imho is good way forward. Users will roll out their implementation due different needs and tradeoff, such as gcc or python. What I think really upsets users it finding out that if they use a different element size or a large number its qsort now starts to call mallocs, or starts to opens system files, or when memory is low it fallbacks to a completely different implementation (which might have quadratic performance in some cases). One possibility is to keep the mergesort using the static array if it fits and fallback to another sorting (possible heapsort or at least the intrasort I am suggesting). I still prefer to keep it simple and with a predictable performance.
On 9/7/21 7:32 AM, Adhemerval Zanella wrote: > I consider this a QoI to move toward provide memory bounded and async-safe > interfaces This is of course fine for new APIs. However, qsort users are accustomed to the longstanding behavior, and good performance is part of their expectations. > Eventually what might happen when some breakage occurs is users like ruby > to roll out their own qsort() implementation to fulfill its need. Ruby already has its own qsort implementation. Ruby's portability bug (which occurs only in theory) is that at a delicate point during a fork Ruby uses the system qsort rather than its own qsort. For what it's worth I submitted a patch for this bug <https://bugs.ruby-lang.org/issues/18152>. Regardless of whether that patch is accepted, this Ruby example is not a good argument for changing glibc qsort behavior, since the bug is not triggered in practical programs. > We already have some concession about symbols that are not support to work > in some scenarios (such as malloc() after fork()). Yes, often it make sense to go beyond what the standard requires. However, each case is different needs to be decided on its own merits. > The blocks usage is only for *_b symbols, the BSD do provide default > C compliant mergesort and heapsort. The blocks usage in macOS is essential for doing closures (the equivalent of qsort_r). Since we don't want to require blocks, we need a better API than what macOS supplies for non-qsort. Would you like to work together to come up with a new API that addresses both of our concerns? > I really don't think providing a async-safe with constant worst-case space > and be *clear* about it will 'scare away' uses, specially because we are > making the interface behaving in a more constraint way which imho is good > way forward. Users will roll out their implementation due different needs > and tradeoff, such as gcc or python. If we were creating the qsort implementation for the first time, the async-signal-safety arguments could be compelling. However, it is an entirely different thing to ask a significant number of users to change longstanding uses; this would be more work for them simply because the libc maintainers changed their mind about a particular performance-safety tradeoff. > What I think really upsets users it finding out that if they use a different > element size or a large number its qsort now starts to call mallocs Users can be upset by many things. They can be upset if qsort calls malloc. They can be upset if qsort is slow. They can be especially upset if we make changes to qsort that force them to rewrite their code. No matter what we do, users can be upset. It's our job to manage these tradeoffs, and to do so in a stable and responsible way.
On 07/09/2021 14:39, Paul Eggert wrote: > On 9/7/21 7:32 AM, Adhemerval Zanella wrote: > >> I consider this a QoI to move toward provide memory bounded and async-safe >> interfaces > > This is of course fine for new APIs. However, qsort users are accustomed to the longstanding behavior, and good performance is part of their expectations. > > >> Eventually what might happen when some breakage occurs is users like ruby >> to roll out their own qsort() implementation to fulfill its need. > > Ruby already has its own qsort implementation. Ruby's portability bug (which occurs only in theory) is that at a delicate point during a fork Ruby uses the system qsort rather than its own qsort. > > For what it's worth I submitted a patch for this bug <https://bugs.ruby-lang.org/issues/18152>. Regardless of whether that patch is accepted, this Ruby example is not a good argument for changing glibc qsort behavior, since the bug is not triggered in practical programs. I disagree, and stating: "This is not a practical problem with glibc, since glibc qsort is async-signal-safe with small sorts and in practice Ruby's use of qsort is invariably small enough" is even more problematic because it states that we are providing async-safeness depending of the input arguments (which is really a bad interface). And your fix is exactly what is not the really intended: user need to roll out ad-hoc sorting implementation to handle libc limitations. > > >> We already have some concession about symbols that are not support to work >> in some scenarios (such as malloc() after fork()). > > Yes, often it make sense to go beyond what the standard requires. However, each case is different needs to be decided on its own merits. > > >> The blocks usage is only for *_b symbols, the BSD do provide default >> C compliant mergesort and heapsort. > > The blocks usage in macOS is essential for doing closures (the equivalent of qsort_r). Since we don't want to require blocks, we need a better API than what macOS supplies for non-qsort. > My point is blocks neither closures are essential to provide different sort algorithms with different strategies and constraints. > Would you like to work together to come up with a new API that addresses both of our concerns? I am not against your suggestion, but I think this is orthogonal to providing a better semantic and guarantees to the current qsort. > > >> I really don't think providing a async-safe with constant worst-case space >> and be *clear* about it will 'scare away' uses, specially because we are >> making the interface behaving in a more constraint way which imho is good >> way forward. Users will roll out their implementation due different needs >> and tradeoff, such as gcc or python. > > If we were creating the qsort implementation for the first time, the async-signal-safety arguments could be compelling. However, it is an entirely different thing to ask a significant number of users to change longstanding uses; this would be more work for them simply because the libc maintainers changed their mind about a particular performance-safety tradeoff. > > >> What I think really upsets users it finding out that if they use a different >> element size or a large number its qsort now starts to call mallocs > > Users can be upset by many things. They can be upset if qsort calls malloc. They can be upset if qsort is slow. They can be especially upset if we make changes to qsort that force them to rewrite their code. No matter what we do, users can be upset. It's our job to manage these tradeoffs, and to do so in a stable and responsible way. I agree that performance degradation is really a potential problem, that's why I used an known algorithm used as a default sort algorithm for different library (stdlibc++). And I tend to see async-signal-safeness, along with bounded memory utilization and simple algorithm a better move forward to *the system C library* than providing the state of the art sort algorithm. From my experience we have much more bug report in the are of broken semantic than from missing performance. As before, users will most likely move to other sorting algorithm if qsort is a hotspot. And if performance is paramount we can either provide it as an alternative API as you suggest or thought a tunable.
On 9/7/21 11:07 AM, Adhemerval Zanella wrote: > we are providing > async-safeness depending of the input arguments (which is really a > bad interface). Many functions are async-signal-safe for some arguments but not others. The 'free' function is just one example. qsort is not at all unusual in this department. This sort of interface is "really bad" only in the sense that every function that is not async-signal-safe is "really bad". > My point is blocks neither closures are essential to provide different > sort algorithms with different strategies and constraints. If that's the point, then I disagree. qsort_r is there for a reason, and this reason applies to any sorting algorithm. Some callers can limp along without qsort_r, by writing code that is not thread-safe or whatever; but many callers really need qsort_r and a similar need would apply to any sorting algorithm. >> > Would you like to work together to come up with a new API that addresses both of our concerns? > I am not against your suggestion That's a good way forward, then. Do you have an API in mind? If not, I can propose one.
On 07/09/2021 16:28, Paul Eggert wrote: > On 9/7/21 11:07 AM, Adhemerval Zanella wrote: >> we are providing >> async-safeness depending of the input arguments (which is really a >> bad interface). > > Many functions are async-signal-safe for some arguments but not others. The 'free' function is just one example. qsort is not at all unusual in this department. This sort of interface is "really bad" only in the sense that every function that is not async-signal-safe is "really bad". My point it is confusing and leads to bad assumption from users, as ruby shows. > >> My point is blocks neither closures are essential to provide different >> sort algorithms with different strategies and constraints. > > If that's the point, then I disagree. qsort_r is there for a reason, and this reason applies to any sorting algorithm. Some callers can limp along without qsort_r, by writing code that is not thread-safe or whatever; but many callers really need qsort_r and a similar need would apply to any sorting algorithm. I think you missed my point here, I was not advertise to not provide a qsort_r like, but rather that the blocks extensions from BSD is just an extension. > >>> > Would you like to work together to come up with a new API that addresses both of our concerns? >> I am not against your suggestion > > That's a good way forward, then. Do you have an API in mind? If not, I can propose one. Sure, but this is orthogonal to the change I am proposing here. May we move forward in this direction first?
On 9/8/21 4:56 AM, Adhemerval Zanella wrote: > it is confusing and leads to bad assumption from users, as ruby > shows. Again, Ruby is not a good example of why a change is needed, as Ruby works fine in practice. >>>>> Would you like to work together to come up with a new API that addresses both of our concerns? >>> I am not against your suggestion >> >> That's a good way forward, then. Do you have an API in mind? If not, I can propose one. > > Sure, but this is orthogonal to the change I am proposing here. It's not orthogonal, because if we provide a better API that will be a solution to this (so far only-theoretical) problem. Although a better API should also address other problems, that doesn't mean it's orthogonal to this one.