[v3,0/7] Use introsort for qsort

Message ID	20210903171144.952737-1-adhemerval.zanella@linaro.org
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8F773384B13A To: libc-alpha@sourceware.org Subject: [PATCH v3 0/7] Use introsort for qsort Date: Fri, 3 Sep 2021 14:11:37 -0300 Message-Id: <20210903171144.952737-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
Series	Use introsort for qsort \| [v3,0/7] Use introsort for qsort [v3,1/7] benchtests: Add bench-qsort [v3,2/7] support: Fix getopt_long with CMDLINE_OPTIONS [v3,3/7] stdlib: Optimization qsort{_r} swap implementation (BZ #19305) [v3,4/7] stdlib: Move insertion sort out qsort [v3,5/7] stdlib: qsort: Move some macros to inline function [v3,6/7] stdlib: Implement introsort with qsort [v3,7/7] stdlib: Remove use of mergesort on qsort (BZ #21719)

Message ID

20210903171144.952737-1-adhemerval.zanella@linaro.org

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8F773384B13A
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/7] Use introsort for qsort
Date: Fri,  3 Sep 2021 14:11:37 -0300
Message-Id: <20210903171144.952737-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

Series

Use introsort for qsort |

Message

Adhemerval Zanella Sept. 3, 2021, 5:11 p.m. UTC

  This is respin from my previous attempt to fix and improve qsort() [1].
The main motivation to use introsort is to make it fully AS-Safe and
AC-Safe, with a limited stack size requirement, to remove the use
of malloc (along with the system memory checks), and keeping Worst-case
performance on O(n*ln(n)) (instead of potentially quadradic as for the
quicksort).

The implementation does not aim to be the state-of-the-art sort
algorithm, rather I used a well understood and simple on ((introsort,
used on libstdc++) and leverage the current quicksort implementation
along with a heapsort one from Linux kernel.

The resulting implementation has bounded stack requirement (about 1.8k
on x86_64), and results in an simplified code and binary:

# master
$ size stdlib/qsort.o stdlib/msort.o
   text    data     bss     dec     hex filename
   1316       0       0    1316     524 stdlib/qsort.o
   2045       0      12    2057     809 stdlib/msort.o

# patched
$ size stdlib/qsort.os
   text    data     bss     dec     hex filename
   2373       0       0    2373     945 stdlib/qsort.os

The performance difference with the provided benchmark clearly show the
tradeoff of using a instrosort over a mergesort.  The results are on
a Ryzen 9 with gcc 11 ran with --nelem 49152:786432:8388608 (to simulate
the L1/L2/L3 of this CPU used):

SORTED
  elements=    49152 (size= 4): %52.47% (master=3654381.0 patches=5571774.0)
  elements=   786432 (size= 4): %57.39% (master=73067211.0 patches=115003964.0)
  elements=  8388608 (size= 4): %49.71% (master=1005357374.0 patches=1505144332.0)
  elements=    49152 (size= 8): %48.98% (master=3796489.0 patches=5656023.0)
  elements=   786432 (size= 8): %52.34% (master=75809561.0 patches=115491584.0)
  elements=  8388608 (size= 8): %49.97% (master=1004201712.0 patches=1506017512.0)
  elements=    49152 (size=32): %-2.69% (master=11191649.0 patches=10891084.0)
  elements=   786432 (size=32): %-12.16% (master=238870746.0 patches=209822358.0)
  elements=  8388608 (size=32): %-23.33% (master=3323248933.0 patches=2547848399.0)
MOSTLYSORTED
  elements=    49152 (size= 4): %66.47% (master=6915979.0 patches=11512945.0)
  elements=   786432 (size= 4): %66.48% (master=143188170.0 patches=238379284.0)
  elements=  8388608 (size= 4): %70.90% (master=1805703842.0 patches=3086011963.0)
  elements=    49152 (size= 8): %78.11% (master=6836231.0 patches=12176330.0)
  elements=   786432 (size= 8): %77.79% (master=138851879.0 patches=246867806.0)
  elements=  8388608 (size= 8): %77.70% (master=1773884617.0 patches=3152112348.0)
  elements=    49152 (size=32): %-6.44% (master=13766992.0 patches=12880888.0)
  elements=   786432 (size=32): %-15.19% (master=287590899.0 patches=243919593.0)
  elements=  8388608 (size=32): %-25.63% (master=3852042869.0 patches=2864641676.0)
RANDOM
  elements=    49152 (size= 4): %9.54% (master=15078260.0 patches=16516457.0)
  elements=   786432 (size= 4): %5.64% (master=305805587.0 patches=323043450.0)
  elements=  8388608 (size= 4): %2.78% (master=3838340458.0 patches=3945202718.0)
  elements=    49152 (size= 8): %13.75% (master=14676586.0 patches=16694499.0)
  elements=   786432 (size= 8): %10.71% (master=295583560.0 patches=327252646.0)
  elements=  8388608 (size= 8): %7.67% (master=3722946692.0 patches=4008563091.0)
  elements=    49152 (size=32): %-8.34% (master=20944462.0 patches=19196696.0)
  elements=   786432 (size=32): %-14.31% (master=425822838.0 patches=364896701.0)
  elements=  8388608 (size=32): %-19.86% (master=5534217212.0 patches=4435276362.0)
REPEATED
  elements=    49152 (size= 4): %9.18% (master=14051610.0 patches=15341610.0)
  elements=   786432 (size= 4): %6.01% (master=282787457.0 patches=299777700.0)
  elements=  8388608 (size= 4): %4.19% (master=3539279291.0 patches=3687517777.0)
  elements=    49152 (size= 8): %12.37% (master=13659340.0 patches=15349336.0)
  elements=   786432 (size= 8): %11.11% (master=273070595.0 patches=303413116.0)
  elements=  8388608 (size= 8): %8.46% (master=3424739041.0 patches=3714521094.0)
  elements=    49152 (size=32): %-11.21% (master=19826230.0 patches=17603795.0)
  elements=   786432 (size=32): %-16.73% (master=405322605.0 patches=337501658.0)
  elements=  8388608 (size=32): %-21.93% (master=5261140329.0 patches=4107591456.0)
BITONIC
  elements=    49152 (size= 4): %405.43% (master=4632011.0 patches=23411611.0)
  elements=   786432 (size= 4): %493.28% (master=92649410.0 patches=549674361.0)
  elements=  8388608 (size= 4): %609.03% (master=1095591416.0 patches=7768102251.0)
  elements=    49152 (size= 8): %402.05% (master=4672695.0 patches=23459484.0)
  elements=   786432 (size= 8): %480.60% (master=94419035.0 patches=548197066.0)
  elements=  8388608 (size= 8): %596.41% (master=1116631167.0 patches=7776333333.0)
  elements=    49152 (size=32): %-6.92% (master=11545720.0 patches=10747100.0)
  elements=   786432 (size=32): %-14.97% (master=245037969.0 patches=208357373.0)
  elements=  8388608 (size=32): %-24.14% (master=3357712042.0 patches=2547221807.0)

With the bitonic and sorted showing the biggest hit.  Even though, I
think the tradeoff is acceptable.  The improvements over sizes larger
than 8 is mostly due the optimization done on swap operations.

[1] https://sourceware.org/pipermail/libc-alpha/2018-August/096981.html

Adhemerval Zanella (7):
  benchtests: Add bench-qsort
  support: Fix getopt_long with CMDLINE_OPTIONS
  stdlib: Optimization qsort{_r} swap implementation (BZ #19305)
  stdlib: Move insertion sort out qsort
  stdlib: qsort: Move some macros to inline function
  stdlib: Implement introsort with qsort
  stdlib: Remove use of mergesort on qsort (BZ #21719)

 benchtests/Makefile         |   2 +-
 benchtests/bench-qsort.c    | 195 +++++++++++++++++++++
 manual/argp.texi            |   2 +-
 manual/locale.texi          |   3 +-
 manual/search.texi          |   7 +-
 stdlib/Makefile             |   3 +-
 stdlib/msort.c              | 310 ---------------------------------
 stdlib/qsort.c              | 333 ++++++++++++++++++++++++++++--------
 support/support_test_main.c |   1 -
 support/test-driver.h       |   1 +
 10 files changed, 460 insertions(+), 397 deletions(-)
 create mode 100644 benchtests/bench-qsort.c
 delete mode 100644 stdlib/msort.c

Comments

Paul Eggert Sept. 3, 2021, 7:18 p.m. UTC | #1

On 9/3/21 10:11 AM, Adhemerval Zanella wrote:

> The performance difference with the provided benchmark clearly show the
> tradeoff of using a instrosort over a mergesort.

Zhang, Meng and Liang <https://arxiv.org/pdf/1609.04471> report that 
introsort is particularly bad with reverse-sorted input; that case 
should be added to the benchmarks.

The proposed change would appear to hurt performance significantly in 
the common case of sorting pointers. My guess is that typical glibc 
users would prefer speed to hardening against rare signal-handling or 
low-memory cases, since portable code will have to assume that qsort 
isn't hardened anyway.

If hardening is a goal, perhaps we should consider an alternate API 
instead of changing qsort's implementation. If we do that, we can 
improve the API as well as changing the function's name. The new API 
could promise to not call malloc, or more generally to be 
async-signal-safe if the comparison function is. The comparison function 
could take a third void* argument so it can in effect be a closure (this 
is qsort's greatest failing), the sorting function could take a flag 
argument for options, and it could return an error indication (in case 
it detects a comparison function gone haywire, say). That sort of thing.

> The improvements over sizes larger
> than 8 is mostly due the optimization done on swap operations.

A similar optimization could be done to the existing code independently, 
no? I expect that an apples-to-apples performance comparison would be 
even less favorable to the proposed change.

Carlos O'Donell Sept. 6, 2021, 2:13 p.m. UTC | #2

On 9/3/21 3:18 PM, Paul Eggert wrote:
> On 9/3/21 10:11 AM, Adhemerval Zanella wrote:
> 
>> The performance difference with the provided benchmark clearly show
>> the tradeoff of using a instrosort over a mergesort.
> 
> Zhang, Meng and Liang <https://arxiv.org/pdf/1609.04471> report that
> introsort is particularly bad with reverse-sorted input; that case
> should be added to the benchmarks.

It should, and it would be useful, however, I think that Adhemerval's
changes to make qsort AS-safe are actually relevant and important.

All existing ruby implementations I can see use qsort in an
AS-safe-requiring context (process.c) and this would help those scenarios.

In general I think glibc should provide a qsort that is:

- Safe.

- Does not have quadratic performance behaviour.

If we want better performance we need to:

- Add new APIs mirroring what BSD is doing.

- New APIs explicitly call out the *type* of sort algorithm.

- Encourage developers and application authors to contribute those
  new implementations and use them in their applications.

There is never going to be a qsort that meets all of the workload demands
and so we should aim for a good implementation that is as safe as we
can make it (limit memory usage, avoid malloc, avoid quadratic performance
behaviour).

> The proposed change would appear to hurt performance significantly in
> the common case of sorting pointers. My guess is that typical glibc
> users would prefer speed to hardening against rare signal-handling or
> low-memory cases, since portable code will have to assume that qsort
> isn't hardened anyway.

I disagree. I think users do not expect qsort to be high performance,
they expect it to be OK performance, OK memory usage, and lastly safe.

My experience with core python developers has been that they expect
glibc to provide functions that are safe in as many contexts as possible
in order to have them avoid rolling their own for the implementation
of the language runtime.

Application developers may not care about this AS-safety, but they also
have the flexiblity to bring in additional dependencies. We can satisfy
those requirements by adding more algorithms under new symbols.

> If hardening is a goal, perhaps we should consider an alternate API
> instead of changing qsort's implementation. If we do that, we can
> improve the API as well as changing the function's name. The new API
> could promise to not call malloc, or more generally to be
> async-signal-safe if the comparison function is. The comparison
> function could take a third void* argument so it can in effect be a
> closure (this is qsort's greatest failing), the sorting function
> could take a flag argument for options, and it could return an error
> indication (in case it detects a comparison function gone haywire,
> say). That sort of thing.

I'd argue the opposite. Make the current functions safe. Add new high
performance versions with explicit algorithms and explicit tradeoffs.

>> The improvements over sizes larger than 8 is mostly due the
>> optimization done on swap operations.
> 
> A similar optimization could be done to the existing code
> independently, no? I expect that an apples-to-apples performance
> comparison would be even less favorable to the proposed change.

Size is nice, but safety is nicer IMO.

Zack Weinberg Sept. 6, 2021, 5:03 p.m. UTC | #3

On Mon, Sep 6, 2021, at 10:13 AM, Carlos O'Donell via Libc-alpha wrote:
> 
> In general I think glibc should provide a qsort that is:
> 
> - Safe.
> 
> - Does not have quadratic performance behaviour.
> 
> If we want better performance we need to:
> 
> - Add new APIs mirroring what BSD is doing.
> 
> - New APIs explicitly call out the *type* of sort algorithm.
> 
> - Encourage developers and application authors to contribute those
>   new implementations and use them in their applications.

Do you have a link or other reference for these new APIs being developed by the BSDs?

(asking out of curiosity; I haven't been following this discussion closely enough to have an opinion)

zw

Adhemerval Zanella Sept. 6, 2021, 6:19 p.m. UTC | #4

On 06/09/2021 14:03, Zack Weinberg via Libc-alpha wrote:
> On Mon, Sep 6, 2021, at 10:13 AM, Carlos O'Donell via Libc-alpha wrote:
>>
>> In general I think glibc should provide a qsort that is:
>>
>> - Safe.
>>
>> - Does not have quadratic performance behaviour.
>>
>> If we want better performance we need to:
>>
>> - Add new APIs mirroring what BSD is doing.
>>
>> - New APIs explicitly call out the *type* of sort algorithm.
>>
>> - Encourage developers and application authors to contribute those
>>   new implementations and use them in their applications.
> 
> Do you have a link or other reference for these new APIs being developed by the BSDs?
> 
> (asking out of curiosity; I haven't been following this discussion closely enough to have an opinion)
> 

The FreeBSD provides qsort, which implements a quicksort, heapsort,
and mergesort [1].

[1] https://www.freebsd.org/cgi/man.cgi?query=qsort&sektion=3&manpath=freebsd-release

Paul Eggert Sept. 7, 2021, 12:14 a.m. UTC | #5

On 9/6/21 7:13 AM, Carlos O'Donell wrote:

> All existing ruby implementations I can see use qsort in an
> AS-safe-requiring context (process.c) and this would help those scenarios.

I don't see how those scenarios are practical. Assuming glibc and the 
standard Ruby implementation, any async signal problems could occur only 
if Ruby's spawn method has more than 32 redirects, as this is the 
minimum number that would cause qsort to call malloc on practical 
platforms. In practice Ruby spawns don't have that many redirects, so 
Ruby programs won't run into a problem unless they're contrived to 
illustrate this portability bug in the Ruby implementation.

> If we want better performance we need to:
> 
> - Add new APIs mirroring what BSD is doing.

BSD's APIs are not good, partly because they rely on nonportable C 
extensions (blocks) that I doubt whether glibc can insist on, partly for 
other reasons discussed below. An API like what I suggested in my 
previous email would be better, if we want to add new APIs.

> - New APIs explicitly call out the *type* of sort algorithm.

"Type" should not mean "heapsort" vs "quicksort" vs "mergesort" as in 
BSD. Users need performance, async-signal-safety, and stability; the BSD 
"type" approach is merely a means to that end and any new API should 
focus on their needs rather than on "type".

> we should aim for a good implementation that is as safe as we
> can make it (limit memory usage, avoid malloc, avoid quadratic performance
> behaviour).

If such an approach doesn't significantly hurt typical performance, I'm 
all for it. I'm skeptical that it's doable, though.

Let's not underestimate ordinary users' desires for performance here. 
qsort is used in other core apps (e.g., Emacs) without running afoul of 
async-signal-safety bugs.

Also, this is not merely about core apps like Emacs and Ruby; it's also 
about user one-off apps.  I often see Glibc qsort in benchmarks, and if 
the benchmarks results become significantly worse we'll be more likely 
to scare users away from Glibc and toward other platforms. Unless we can 
fix the significant performance issues in the proposed patch, the 
benefit of working around an only-theoretical bug in Ruby's 
implementation doesn't seem worth this cost.

Adhemerval Zanella Sept. 7, 2021, 2:32 p.m. UTC | #6

On 06/09/2021 21:14, Paul Eggert wrote:
> On 9/6/21 7:13 AM, Carlos O'Donell wrote:
> 
>> All existing ruby implementations I can see use qsort in an
>> AS-safe-requiring context (process.c) and this would help those scenarios.
> 
> I don't see how those scenarios are practical. Assuming glibc and the standard Ruby implementation, any async signal problems could occur only if Ruby's spawn method has more than 32 redirects, as this is the minimum number that would cause qsort to call malloc on practical platforms. In practice Ruby spawns don't have that many redirects, so Ruby programs won't run into a problem unless they're contrived to illustrate this portability bug in the Ruby implementation.

I consider this a QoI to move toward provide memory bounded and async-safe
interfaces even though the interface contract does not really specify such
constraint.  I give you that it might lead to non-portable code such the
ruby case, but I prefer to actually move the interface to be async-safe
instead of either trying break the caller expectations (that only works due
an optimization) or changing the implementation and let the users handle 
any potential breakage.

Eventually what might happen when some breakage occurs is users like ruby 
to roll out their own qsort() implementation to fulfill its need.

We already have some concession about symbols that are not support to work
in some scenarios (such as malloc() after fork()).

> 
>> If we want better performance we need to:
>>
>> - Add new APIs mirroring what BSD is doing.
> 
> BSD's APIs are not good, partly because they rely on nonportable C extensions (blocks) that I doubt whether glibc can insist on, partly for other reasons discussed below. An API like what I suggested in my previous email would be better, if we want to add new APIs.

The blocks usage is only for *_b symbols, the BSD do provide default
C compliant mergesort and heapsort.  Also, FreeBSD really specifies
the worst-case space complexity and its quicksort has constant-size
(although it uses recursion).

> 
>> - New APIs explicitly call out the *type* of sort algorithm.
> 
> "Type" should not mean "heapsort" vs "quicksort" vs "mergesort" as in BSD. Users need performance, async-signal-safety, and stability; the BSD "type" approach is merely a means to that end and any new API should focus on their needs rather than on "type".
> 
>> we should aim for a good implementation that is as safe as we
>> can make it (limit memory usage, avoid malloc, avoid quadratic performance
>> behaviour).
> 
> If such an approach doesn't significantly hurt typical performance, I'm all for it. I'm skeptical that it's doable, though.

As I put, it is always a tradeoff.  Making async-safe means that we need to
either use a different algorithm with constant worst-case space, use a
hybrid approach, or use a different mechanism to allocate memory.

> 
> Let's not underestimate ordinary users' desires for performance here. qsort is used in other core apps (e.g., Emacs) without running afoul of async-signal-safety bugs.
> 
> Also, this is not merely about core apps like Emacs and Ruby; it's also about user one-off apps.  I often see Glibc qsort in benchmarks, and if the benchmarks results become significantly worse we'll be more likely to scare users away from Glibc and toward other platforms. Unless we can fix the significant performance issues in the proposed patch, the benefit of working around an only-theoretical bug in Ruby's implementation doesn't seem worth this cost.

I really don't think providing a async-safe with constant worst-case space
and be *clear* about it will 'scare away' uses, specially because we are
making the interface behaving in a more constraint way which imho is good
way forward.  Users will roll out their implementation due different needs
and tradeoff, such as gcc or python.

What I think really upsets users it finding out that if they use a different
element size or a large number its qsort now starts to call mallocs, or starts
to opens system files, or when memory is low it fallbacks to a completely
different implementation (which might have quadratic performance in some cases).

One possibility is to keep the mergesort using the static array if it
fits and fallback to another sorting (possible heapsort or at least
the intrasort I am suggesting).  I still prefer to keep it simple and
with a predictable performance.

Paul Eggert Sept. 7, 2021, 5:39 p.m. UTC | #7

On 9/7/21 7:32 AM, Adhemerval Zanella wrote:

> I consider this a QoI to move toward provide memory bounded and async-safe
> interfaces

This is of course fine for new APIs. However, qsort users are accustomed 
to the longstanding behavior, and good performance is part of their 
expectations.

> Eventually what might happen when some breakage occurs is users like ruby
> to roll out their own qsort() implementation to fulfill its need.

Ruby already has its own qsort implementation. Ruby's portability bug 
(which occurs only in theory) is that at a delicate point during a fork 
Ruby uses the system qsort rather than its own qsort.

For what it's worth I submitted a patch for this bug 
<https://bugs.ruby-lang.org/issues/18152>. Regardless of whether that 
patch is accepted, this Ruby example is not a good argument for changing 
glibc qsort behavior, since the bug is not triggered in practical programs.

> We already have some concession about symbols that are not support to work
> in some scenarios (such as malloc() after fork()).

Yes, often it make sense to go beyond what the standard requires. 
However, each case is different needs to be decided on its own merits.

> The blocks usage is only for *_b symbols, the BSD do provide default
> C compliant mergesort and heapsort.

The blocks usage in macOS is essential for doing closures (the 
equivalent of qsort_r). Since we don't want to require blocks, we need a 
better API than what macOS supplies for non-qsort.

Would you like to work together to come up with a new API that addresses 
both of our concerns?

> I really don't think providing a async-safe with constant worst-case space
> and be *clear* about it will 'scare away' uses, specially because we are
> making the interface behaving in a more constraint way which imho is good
> way forward.  Users will roll out their implementation due different needs
> and tradeoff, such as gcc or python.

If we were creating the qsort implementation for the first time, the 
async-signal-safety arguments could be compelling. However, it is an 
entirely different thing to ask a significant number of users to change 
longstanding uses; this would be more work for them simply because the 
libc maintainers changed their mind about a particular 
performance-safety tradeoff.

> What I think really upsets users it finding out that if they use a different
> element size or a large number its qsort now starts to call mallocs

Users can be upset by many things. They can be upset if qsort calls 
malloc. They can be upset if qsort is slow. They can be especially upset 
if we make changes to qsort that force them to rewrite their code. No 
matter what we do, users can be upset. It's our job to manage these 
tradeoffs, and to do so in a stable and responsible way.

Adhemerval Zanella Sept. 7, 2021, 6:07 p.m. UTC | #8

On 07/09/2021 14:39, Paul Eggert wrote:
> On 9/7/21 7:32 AM, Adhemerval Zanella wrote:
> 
>> I consider this a QoI to move toward provide memory bounded and async-safe
>> interfaces
> 
> This is of course fine for new APIs. However, qsort users are accustomed to the longstanding behavior, and good performance is part of their expectations.
> 
> 
>> Eventually what might happen when some breakage occurs is users like ruby
>> to roll out their own qsort() implementation to fulfill its need.
> 
> Ruby already has its own qsort implementation. Ruby's portability bug (which occurs only in theory) is that at a delicate point during a fork Ruby uses the system qsort rather than its own qsort.
> 
> For what it's worth I submitted a patch for this bug <https://bugs.ruby-lang.org/issues/18152>. Regardless of whether that patch is accepted, this Ruby example is not a good argument for changing glibc qsort behavior, since the bug is not triggered in practical programs.

I disagree, and stating:

"This is not a practical problem with glibc, since glibc qsort is 
async-signal-safe with small sorts and in practice Ruby's use of qsort 
is invariably small enough"

is even more problematic because it states that we are providing
async-safeness depending of the input arguments (which is really a
bad interface).

And your fix is exactly what is not the really intended: user need to
roll out ad-hoc sorting implementation to handle libc limitations.

> 
> 
>> We already have some concession about symbols that are not support to work
>> in some scenarios (such as malloc() after fork()).
> 
> Yes, often it make sense to go beyond what the standard requires. However, each case is different needs to be decided on its own merits.
> 
> 
>> The blocks usage is only for *_b symbols, the BSD do provide default
>> C compliant mergesort and heapsort.
> 
> The blocks usage in macOS is essential for doing closures (the equivalent of qsort_r). Since we don't want to require blocks, we need a better API than what macOS supplies for non-qsort.
> 

My point is blocks neither closures are essential to provide different
sort algorithms with different strategies and constraints.

> Would you like to work together to come up with a new API that addresses both of our concerns?
I am not against your suggestion, but I think this is orthogonal to 
providing a better semantic and guarantees to the current qsort.

> 
> 
>> I really don't think providing a async-safe with constant worst-case space
>> and be *clear* about it will 'scare away' uses, specially because we are
>> making the interface behaving in a more constraint way which imho is good
>> way forward.  Users will roll out their implementation due different needs
>> and tradeoff, such as gcc or python.
> 
> If we were creating the qsort implementation for the first time, the async-signal-safety arguments could be compelling. However, it is an entirely different thing to ask a significant number of users to change longstanding uses; this would be more work for them simply because the libc maintainers changed their mind about a particular performance-safety tradeoff.
> 
> 
>> What I think really upsets users it finding out that if they use a different
>> element size or a large number its qsort now starts to call mallocs
> 
> Users can be upset by many things. They can be upset if qsort calls malloc. They can be upset if qsort is slow. They can be especially upset if we make changes to qsort that force them to rewrite their code. No matter what we do, users can be upset. It's our job to manage these tradeoffs, and to do so in a stable and responsible way.

I agree that performance degradation is really a potential problem,
that's why I used an known algorithm used as a default sort algorithm
for different library (stdlibc++). 

And I tend to see async-signal-safeness, along with bounded memory
utilization and simple algorithm a better move forward to 
*the system C library* than providing the state of the art sort algorithm.
From my experience we have much more bug report in the are of broken
semantic than from missing performance. As before, users will most likely
move to other sorting algorithm if qsort is a hotspot.

And if performance is paramount we can either provide it as an alternative 
API as you suggest or thought a tunable.

Paul Eggert Sept. 7, 2021, 7:28 p.m. UTC | #9

On 9/7/21 11:07 AM, Adhemerval Zanella wrote:
> we are providing
> async-safeness depending of the input arguments (which is really a
> bad interface).

Many functions are async-signal-safe for some arguments but not others. 
The 'free' function is just one example. qsort is not at all unusual in 
this department. This sort of interface is "really bad" only in the 
sense that every function that is not async-signal-safe is "really bad".

> My point is blocks neither closures are essential to provide different
> sort algorithms with different strategies and constraints.

If that's the point, then I disagree. qsort_r is there for a reason, and 
this reason applies to any sorting algorithm. Some callers can limp 
along without qsort_r, by writing code that is not thread-safe or 
whatever; but many callers really need qsort_r and a similar need would 
apply to any sorting algorithm.

>> > Would you like to work together to come up with a new API that addresses both of our concerns?
> I am not against your suggestion

That's a good way forward, then. Do you have an API in mind? If not, I 
can propose one.

Adhemerval Zanella Sept. 8, 2021, 11:56 a.m. UTC | #10

On 07/09/2021 16:28, Paul Eggert wrote:
> On 9/7/21 11:07 AM, Adhemerval Zanella wrote:
>> we are providing
>> async-safeness depending of the input arguments (which is really a
>> bad interface).
> 
> Many functions are async-signal-safe for some arguments but not others. The 'free' function is just one example. qsort is not at all unusual in this department. This sort of interface is "really bad" only in the sense that every function that is not async-signal-safe is "really bad".

My point it is confusing and leads to bad assumption from users, as ruby
shows.

> 
>> My point is blocks neither closures are essential to provide different
>> sort algorithms with different strategies and constraints.
> 
> If that's the point, then I disagree. qsort_r is there for a reason, and this reason applies to any sorting algorithm. Some callers can limp along without qsort_r, by writing code that is not thread-safe or whatever; but many callers really need qsort_r and a similar need would apply to any sorting algorithm.

I think you missed my point here, I was not advertise to not provide
a qsort_r like, but rather that the blocks extensions from BSD is
just an extension.

> 
>>> > Would you like to work together to come up with a new API that addresses both of our concerns?
>> I am not against your suggestion
> 
> That's a good way forward, then. Do you have an API in mind? If not, I can propose one.

Sure, but this is orthogonal to the change I am proposing here. May
we move forward in this direction first?

Paul Eggert Sept. 9, 2021, 12:39 a.m. UTC | #11

On 9/8/21 4:56 AM, Adhemerval Zanella wrote:

> it is confusing and leads to bad assumption from users, as ruby
> shows.

Again, Ruby is not a good example of why a change is needed, as Ruby 
works fine in practice.

>>>>> Would you like to work together to come up with a new API that addresses both of our concerns?
>>> I am not against your suggestion
>>
>> That's a good way forward, then. Do you have an API in mind? If not, I can propose one.
> 
> Sure, but this is orthogonal to the change I am proposing here.

It's not orthogonal, because if we provide a better API that will be a 
solution to this (so far only-theoretical) problem. Although a better 
API should also address other problems, that doesn't mean it's 
orthogonal to this one.