stdlib: Optimize number of calls to comparison function
Checks

Context                                          | Check   | Description
redhat-pt-bot/TryBot-apply_patch                 | success | Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                       | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm     | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm     | success | Testing passed
Commit Message
The current heapsort implementation in the siftdown function follows
the standard textbook version, necessitating two comparisons at each
level. Transitioning to the bottom-up heapsort version allows us to
decrease the required comparisons to just one per level. On average,
this modification significantly reduces the comparison count from
2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
Refs:
BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
QUICKSORT (if n is not very small)
Ingo Wegener
Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
https://doi.org/10.1016/0304-3975(93)90364-Y
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>

 stdlib/qsort.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)
Comments
* Kuan-Wei Chiu:
> The current heapsort implementation in the siftdown function follows
> the standard textbook version, necessitating two comparisons at each
> level. Transitioning to the bottom-up heapsort version allows us to
> decrease the required comparisons to just one per level. On average,
> this modification significantly reduces the comparison count from
> 2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
I think the factor in stdlib/tst-qsort5.c needs to be adjusted:

  /* This is an arbitrary factor which is true for the current
     implementation across a wide range of sizes.  */
  TEST_VERIFY (factor <= 4.5);
On the other hand, the heapsort code is not really expected to run at
all in the current implementation because median-of-three quicksort
usually avoids degenerating behavior. If this heapsort variant uses a
number of comparisons that is competitive with quicksort, maybe we should
use it instead? And use insertion sort for short arrays only.
Thanks,
Florian
On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>
>   /* This is an arbitrary factor which is true for the current
>      implementation across a wide range of sizes.  */
>   TEST_VERIFY (factor <= 4.5);
>
It seems that the factor can be adjusted to around 3.5. I can send
another patch for this adjustment or resend it as a patch series.
> On the other hand, the heapsort code is not really expected to run at
> all in the current implementation because median-of-three quicksort
> usually avoids degenerating behavior. If this heapsort variant uses a
> number of comparisons that is competitive with quicksort, maybe we should
> use it instead? And use insertion sort for short arrays only.
>
This patch currently reduces the comparison count only in scenarios
where deep recursion results in a fallback to heapsort. According to
the referenced paper, beyond n >= 16000, the comparison count of
bottom-up heapsort is lower than that of median-of-three quicksort.
However, my concern is that quicksort typically exhibits better
locality of reference, making it more cache-friendly, while heapsort
lacks this advantage.
Best regards,
Kuan-Wei Chiu
* Kuan-Wei Chiu:
> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>
>> /* This is an arbitrary factor which is true for the current
>> implementation across a wide range of sizes. */
>> TEST_VERIFY (factor <= 4.5);
>>
>
> It seems that the factor can be adjusted to around 3.5. I can send
> another patch for this adjustment or resend it as a patch series.
Based on the paper, I would have expected a greater reduction.
>> On the other hand, the heapsort code is not really expected to run at
>> all in the current implementation because median-of-three quicksort
>> usually avoids degenerating behavior. If this heapsort variant uses a
>> number of comparisons that is competitive with quicksort, maybe we should
>> use it instead? And use insertion sort for short arrays only.
>>
>
> This patch currently reduces the comparison count only in scenarios
> where deep recursion results in a fallback to heapsort. According to
> the referenced paper, beyond n >= 16000, the comparison count of
> bottom-up heapsort is lower than that of median-of-three quicksort.
> However, my concern is that quicksort typically exhibits better
> locality of reference, making it more cache-friendly, while heapsort
> lacks this advantage.
Yes, the performance numbers I posted suggest that something like that
is at play here. Even though heapsort with your patch performs fewer
comparisons than most other implementations we have floating around
(except the historic merge sort), it is still slower than most.
Thanks,
Florian
On Tue, Dec 05, 2023 at 11:44:03AM +0100, Florian Weimer wrote:
> * Kuan-Wei Chiu:
>
> > On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
> >> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
> >>
> >> /* This is an arbitrary factor which is true for the current
> >> implementation across a wide range of sizes. */
> >> TEST_VERIFY (factor <= 4.5);
> >>
> >
> > It seems that the factor can be adjusted to around 3.5. I can send
> > another patch for this adjustment or resend it as a patch series.
>
> Based on the paper, I would have expected a greater reduction.
>
The adjustment factor of 3.5 is derived from data obtained using the
tst-qsort5 test. Upon examining the number of elements sorted by
heapsort during the test and the corresponding comparison counts, the
comparison count is approximately n*log2(n) + 0.83*n. This is higher
than the analysis in the paper, which suggests n*log2(n) + d(n)*n, where
d(n) belongs to the range [0.34, 0.39]. I believe this discrepancy is
due to the paper's focus on the average comparison count in random
scenarios, whereas here, we are intentionally creating adversarial
permutations, making it not purely random.
Additionally, in the performance numbers you posted, d(n) is
approximately equal to 0.34, aligning with the analysis in the paper.
Here is the test data:
heapsort: number of comparisons, number of elements
Before applying the patch (commit 8e755f5bc8):
info: length 100: 1832 comparisons ~ 2.438387 * n * log (n) heapsort: 662, 76
info: length 1000: 34020 comparisons ~ 3.260275 * n * log (n) heapsort: 15324, 964
info: length 10000: 494661 comparisons ~ 3.772689 * n * log (n) heapsort: 225309, 9948
info: length 20000: 1071780 comparisons ~ 3.859536 * n * log (n) heapsort: 492534, 19944
info: length 30000: 1644872 comparisons ~ 3.830672 * n * log (n) heapsort: 775626, 29944
info: length 40000: 2305025 comparisons ~ 3.933328 * n * log (n) heapsort: 1065893, 39940
info: length 50000: 2912911 comparisons ~ 3.913976 * n * log (n) heapsort: 1363779, 49940
info: length 60000: 3529897 comparisons ~ 3.902134 * n * log (n) heapsort: 1670765, 59940
info: length 70000: 4285743 comparisons ~ 4.009278 * n * log (n) heapsort: 1976733, 69936
info: length 80000: 4929859 comparisons ~ 3.998699 * n * log (n) heapsort: 2290849, 79936
info: length 90000: 5575320 comparisons ~ 3.987933 * n * log (n) heapsort: 2606310, 89936
info: length 100000: 6223572 comparisons ~ 3.978286 * n * log (n) heapsort: 2924562, 99936
info: length 110000: 6881296 comparisons ~ 3.973322 * n * log (n) heapsort: 3252286, 109936
info: length 120000: 7537050 comparisons ~ 3.966365 * n * log (n) heapsort: 3578040, 119936
info: length 130000: 8191258 comparisons ~ 3.958248 * n * log (n) heapsort: 3902248, 129936
info: length 140000: 9138135 comparisons ~ 4.072406 * n * log (n) heapsort: 4239255, 139932
info: length 150000: 9822309 comparisons ~ 4.067167 * n * log (n) heapsort: 4573429, 149932
After applying the patch:
info: length 100: 1690 comparisons ~ 2.273802 * n * log (n) heapsort: 520, 76
info: length 1000: 28993 comparisons ~ 2.821755 * n * log (n) heapsort: 10297, 964
info: length 10000: 409589 comparisons ~ 3.169480 * n * log (n) heapsort: 140237, 9948
info: length 20000: 879212 comparisons ~ 3.211902 * n * log (n) heapsort: 299966, 19944
info: length 30000: 1336536 comparisons ~ 3.158407 * n * log (n) heapsort: 467290, 29944
info: length 40000: 1882150 comparisons ~ 3.256782 * n * log (n) heapsort: 643018, 39940
info: length 50000: 2369603 comparisons ~ 3.228731 * n * log (n) heapsort: 820471, 49940
info: length 60000: 2859511 comparisons ~ 3.205839 * n * log (n) heapsort: 1000379, 59940
info: length 70000: 3492146 comparisons ~ 3.311278 * n * log (n) heapsort: 1183136, 69936
info: length 80000: 4006690 comparisons ~ 3.294222 * n * log (n) heapsort: 1367680, 79936
info: length 90000: 4523504 comparisons ~ 3.279727 * n * log (n) heapsort: 1554494, 89936
info: length 100000: 5041697 comparisons ~ 3.266775 * n * log (n) heapsort: 1742687, 99936
info: length 110000: 5562597 comparisons ~ 3.255888 * n * log (n) heapsort: 1933587, 109936
info: length 120000: 6083545 comparisons ~ 3.245368 * n * log (n) heapsort: 2124535, 119936
info: length 130000: 6604920 comparisons ~ 3.235434 * n * log (n) heapsort: 2315910, 129936
info: length 140000: 7407274 comparisons ~ 3.344872 * n * log (n) heapsort: 2508394, 139932
info: length 150000: 7951873 comparisons ~ 3.336444 * n * log (n) heapsort: 2702993, 149932
Best regards,
Kuan-Wei Chiu
>
On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>
>> /* This is an arbitrary factor which is true for the current
>> implementation across a wide range of sizes. */
>> TEST_VERIFY (factor <= 4.5);
>
> It seems that the factor can be adjusted to around 3.5. I can send
> another patch for this adjustment or resend it as a patch series.
Before you go any further with this patch series I think we need to
decide whether our backward compatibility constraints mean our qsort
has to be a stable sort. If so, we should make it *always* be a
stable sort and write that into the documentation, and that would
mean junking the entire heapsort implementation.
I do not have a strong opinion either way, but I do think it should
either *always* or *never* be stable, i.e. not the previous state of
"whether you get a stable sort depends on the size of the input and
various other undocumented factors." Also, I think that the
behavior should not depend on which version of glibc a program
was compiled against.
zw
* Zack Weinberg:
> On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
>> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>>
>>> /* This is an arbitrary factor which is true for the current
>>> implementation across a wide range of sizes. */
>>> TEST_VERIFY (factor <= 4.5);
>>
>> It seems that the factor can be adjusted to around 3.5. I can send
>> another patch for this adjustment or resend it as a patch series.
>
> Before you go any further with this patch series I think we need to
> decide whether our backward compatibility constraints mean our qsort
> has to be a stable sort. If so, we should make it *always* be a
> stable sort and write that into the documentation, and that would
> mean junking the entire heapsort implementation.
That makes sense, although there might not exist an in-place sorting
algorithm that takes constant extra space regardless of element count
and element size and has reasonable performance. Maybe we could say,
“stable if element size is less than 1024 bytes” or something like that.
What I'm concerned about is that with the current implementation, we
take a performance hit *and* have compatibility problems. The
compatibility problems would be easier to justify if we actually made
things faster. Not calling malloc internally is unlikely to be
compelling to programmers.
Thanks,
Florian
On 06/12/23 07:21, Florian Weimer wrote:
> * Zack Weinberg:
>
>> On Mon, Dec 4, 2023, at 1:31 PM, Kuan-Wei Chiu wrote:
>>> On Mon, Dec 04, 2023 at 09:20:50AM +0100, Florian Weimer wrote:
>>>> I think the factor in stdlib/tst-qsort5.c needs to be adjusted:
>>>>
>>>> /* This is an arbitrary factor which is true for the current
>>>> implementation across a wide range of sizes. */
>>>> TEST_VERIFY (factor <= 4.5);
>>>
>>> It seems that the factor can be adjusted to around 3.5. I can send
>>> another patch for this adjustment or resend it as a patch series.
>>
>> Before you go any further with this patch series I think we need to
>> decide whether our backward compatibility constraints mean our qsort
>> has to be a stable sort. If so, we should make it *always* be a
>> stable sort and write that into the documentation, and that would
>> mean junking the entire heapsort implementation.
>
> That makes sense, although there might not exist an in-place sorting
> algorithm that takes constant extra space regardless of element count
> and element size and has reasonable performance. Maybe we could say,
> “stable if element size is less than 1024 bytes” or something like that.
>
> What I'm concerned about is that with the current implementation, we
> take a performance hit *and* have compatibility problems. The
> compatibility problems would be easier to justify if we actually made
> things faster. Not calling malloc internally is unlikely to be
> compelling to programmers.
My initial intention was to remove the internal multiple sorting strategies
that have different semantics and that might eventually trigger corner-case
issues (the stable default mergesort versus the fallback quicksort),
and to fix the quicksort fallback's long-standing issue of O(n^2)
behavior on adversarial inputs.
However, it seems that glibc qsort has now become an instance of Hyrum's
Law, where it does not really matter that neither POSIX nor the C standard
promised sorting stability.
So I presume it would be better to keep mergesort for all input sizes,
along with the limited stack buffer (so there is no failure point up to a
certain input number/size), and fall back to heapsort on memory allocation
failure. I will prepare a patch to restore it.
Other systems provide mergesort as a different symbol [1], which also
returns a failure in case memory cannot be allocated. But I don't think
that would be an improvement in this situation (no one will really adapt
their code to use mergesort if a stable sort is required). I also
thought about adding a tunable to enable mergesort in qsort, but it would
end up only adding more complexity and requiring more testing.
[1] https://man.freebsd.org/cgi/man.cgi?query=mergesort&sektion=3
On Wed, Dec 6, 2023, at 7:51 AM, Adhemerval Zanella Netto wrote:
> On 06/12/23 07:21, Florian Weimer wrote:
>> * Zack Weinberg:
>>>
>>> Before you go any further with this patch series I think we need to
>>> decide whether our backward compatibility constraints mean our qsort
>>> has to be a stable sort. If so, we should make it *always* be a
>>> stable sort and write that into the documentation, and that would
>>> mean junking the entire heapsort implementation.
>>
>> That makes sense, although there might not exist an in-place sorting
>> algorithm that takes constant extra space regardless of element
>> count and element size and has reasonable performance. Maybe we
>> could say, “stable if element size is less than 1024 bytes” or
>> something like that.
It occurred to me late last night that *any* in-place sorting algorithm
that operates on an array can be made stable with no additional storage
cost, by using each element's address as a tiebreaker for the comparison
function. It *doesn't* need to be the element's original address; all
that matters is that for every pair of elements that compare equal, the
sorting algorithm preserves the < relation for their addresses across
all swaps, which should happen naturally if address-< is used as a
tiebreaker for comparisons.
(That said, I still think we should give that "block exchange mergesort"
algorithm I dug up the other week a try. I might have time early in
January to put a patch set together.)
zw
* Zack Weinberg:
> It occurred to me late last night that *any* inplace sorting algorithm
> that operates on an array, can be made stable with no additional storage
> cost, by using each element's address as a tiebreaker for the comparison
> function. It *doesn't* need to be the element's original address  all
> that matters is that for every pair of elements that compare equal, the
> sorting algorithm preserves the < relation for their addresses across
> all swaps, which should happen naturally if address< is used as a
> tiebreaker for comparisons.
It's not as simple as that because we might have A < B, so we can move A
to a lower address than B, but at the same time, we must make sure
that we do not move it past an element C with A == C, otherwise the
result of the tiebreaking address comparison changes. I haven't really
thought about it, but I suspect it's possible to do for some of the
quadratic algorithms, but quicksort and heapsort, not so much.
Thanks,
Florian
On Sun, Dec 17, 2023 at 10:42:17AM -0500, Zack Weinberg wrote:
> It occurred to me late last night that *any* in-place sorting algorithm
> that operates on an array can be made stable with no additional storage
> cost, by using each element's address as a tiebreaker for the comparison
> function. It *doesn't* need to be the element's original address; all
> that matters is that for every pair of elements that compare equal, the
> sorting algorithm preserves the < relation for their addresses across
> all swaps, which should happen naturally if address-< is used as a
> tiebreaker for comparisons.
>
> (That said, I still think we should give that "block exchange mergesort"
> algorithm I dug up the other week a try. I might have time early in
> January to put a patch set together.)
Additionally, if we're open to exploring non-in-place options, I think
Timsort might be worth considering. It's widely adopted, including in
Python and Android, serving as a stable hybrid sorting algorithm.
Timsort is an adaptive sort algorithm that operates in O(n) time when
the input is already sorted and maintains a worstcase time complexity
of O(n log n), similar to mergesort. I'd be happy to prepare a patch
for Timsort next month if needed.
Link: https://github.com/python/cpython/blob/main/Objects/listsort.txt
Best regards,
Kuan-Wei Chiu
* Kuan-Wei Chiu:
> On Sun, Dec 17, 2023 at 10:42:17AM -0500, Zack Weinberg wrote:
>> It occurred to me late last night that *any* in-place sorting algorithm
>> that operates on an array can be made stable with no additional storage
>> cost, by using each element's address as a tiebreaker for the comparison
>> function. It *doesn't* need to be the element's original address; all
>> that matters is that for every pair of elements that compare equal, the
>> sorting algorithm preserves the < relation for their addresses across
>> all swaps, which should happen naturally if address-< is used as a
>> tiebreaker for comparisons.
>>
>> (That said, I still think we should give that "block exchange mergesort"
>> algorithm I dug up the other week a try. I might have time early in
>> January to put a patch set together.)
>
> Additionally, if we're open to exploring non-in-place options, I think
> Timsort might be worth considering. It's widely adopted, including in
> Python and Android, serving as a stable hybrid sorting algorithm.
> Timsort is an adaptive sort algorithm that operates in O(n) time when
> the input is already sorted and maintains a worstcase time complexity
> of O(n log n), similar to mergesort. I'd be happy to prepare a patch
> for Timsort next month if needed.
I think we really want to get rid of that malloc in qsort. This means
that we need something in-place-ish: we can throw a fair amount of stack
space at the problem (probably quite a bit more than the circa 1024 bytes
we use today), but not something that scales linearly with the size of
the input array. We cannot even assume space for just one
temporary object on the stack, I think.
Thanks,
Florian
On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> The current heapsort implementation in the siftdown function follows
> the standard textbook version, necessitating two comparisons at each
> level. Transitioning to the bottom-up heapsort version allows us to
> decrease the required comparisons to just one per level. On average,
> this modification significantly reduces the comparison count from
> 2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
>
> Refs:
> BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> QUICKSORT (if n is not very small)
> Ingo Wegener
> Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> https://doi.org/10.1016/0304-3975(93)90364-Y
>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> index be01fb5598..f5babef150 100644
> --- a/stdlib/qsort.c
> +++ b/stdlib/qsort.c
> @@ -132,26 +132,26 @@ static inline void
>  siftdown (void *base, size_t size, size_t k, size_t n,
>            enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
>  {
> -  /* There can only be a heap condition violation if there are
> -     children.  */
> -  while (2 * k + 1 <= n)
> -    {
> -      /* Left child.  */
> -      size_t j = 2 * k + 1;
> -      /* If the right child is larger, use it.  */
> -      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
> -        j++;
> -
> -      /* If k is already >= to its children, we are done.  */
> -      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
> -        break;
> +  size_t i, j;
> +
> +  /* Find the sift-down path all the way to the leaves.  */
> +  for (i = k; j = 2 * i + 1, j + 1 <= n;)
> +    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);
>
> -      /* Heal the violation.  */
> -      do_swap (base + (size * j), base + (k * size), size, swap_type);
> +  /* Special case for the last leaf with no sibling.  */
> +  if (j == n)
> +    i = j;
>
> -      /* Swapping with j may have introduced a violation at j.  Fix
> -         it in the next loop iteration.  */
> -      k = j;
> +  /* Backtrack to the correct location.  */
> +  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
> +    i = (i - 1) / 2;
> +
> +  /* Shift the element into its correct place.  */
> +  j = i;
> +  while (i != k)
> +    {
> +      i = (i - 1) / 2;
> +      do_swap (base + i * size, base + j * size, size, swap_type);
> +    }
>  }
>
> --
> 2.25.1
>
>
Since we still retain heapsort as a fallback for mergesort, should we
reconsider applying this patch?
Thanks,
Kuan-Wei
On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
>> The current heapsort implementation in the siftdown function follows
>> the standard textbook version, necessitating two comparisons at each
>> level. Transitioning to the bottom-up heapsort version allows us to
>> decrease the required comparisons to just one per level. On average,
>> this modification significantly reduces the comparison count from
>> 2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
>>
>> Refs:
>> BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>> QUICKSORT (if n is not very small)
>> Ingo Wegener
>> Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>> https://doi.org/10.1016/0304-3975(93)90364-Y
>>
>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>> ---
>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>
>> [...]
>>
>
> Since we still retain heapsort as a fallback for mergesort, should we
> reconsider applying this patch?
>
> Thanks,
> Kuan-Wei
Yes, I think it works and it looks great to me.
On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
>
>
> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
> > On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
> >> The current heapsort implementation in the siftdown function follows
> >> the standard textbook version, necessitating two comparisons at each
> >> level. Transitioning to the bottom-up heapsort version allows us to
> >> decrease the required comparisons to just one per level. On average,
> >> this modification significantly reduces the comparison count from
> >> 2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
> >>
> >> Refs:
> >> BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
> >> QUICKSORT (if n is not very small)
> >> Ingo Wegener
> >> Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
> >> https://doi.org/10.1016/0304-3975(93)90364-Y
> >>
> >> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> >> ---
> >>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
> >>  1 file changed, 18 insertions(+), 18 deletions(-)
> >>
> >> [...]
> >>
> >
> > Since we still retain heapsort as a fallback for mergesort, should we
> > reconsider applying this patch?
> >
> > Thanks,
> > Kuan-Wei
>
> Yes, I think it works and it looks great to me.
Should I resend a new patch, or can you directly apply this one?
Thanks,
Kuan-Wei
On 27/03/24 16:59, Kuan-Wei Chiu wrote:
> On Wed, Mar 27, 2024 at 04:45:35PM -0300, Adhemerval Zanella Netto wrote:
>>
>>
>> On 16/02/24 04:08, Kuan-Wei Chiu wrote:
>>> On Sun, Dec 03, 2023 at 05:48:39AM +0800, Kuan-Wei Chiu wrote:
>>>> The current heapsort implementation in the siftdown function follows
>>>> the standard textbook version, necessitating two comparisons at each
>>>> level. Transitioning to the bottom-up heapsort version allows us to
>>>> decrease the required comparisons to just one per level. On average,
>>>> this modification significantly reduces the comparison count from
>>>> 2n*log2(n) - 3n + o(n) to n*log2(n) + 0.37*n + o(n).
>>>>
>>>> Refs:
>>>> BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
>>>> QUICKSORT (if n is not very small)
>>>> Ingo Wegener
>>>> Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
>>>> https://doi.org/10.1016/0304-3975(93)90364-Y
>>>>
>>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>>>> ---
>>>>  stdlib/qsort.c | 36 ++++++++++++++++++------------------
>>>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>>>
>>>> [...]
>>>>
>>>
>>> Since we still retain heapsort as a fallback for mergesort, should we
>>> reconsider applying this patch?
>>>
>>> Thanks,
>>> Kuan-Wei
>>
>> Yes, I think it works and it looks great to me.
>
> Should I resend a new patch, or can you directly apply this one?
It applies as-is, so it should be ok. I will run some tests and apply, thanks.
@@ -132,26 +132,26 @@ static inline void
 siftdown (void *base, size_t size, size_t k, size_t n,
           enum swap_type_t swap_type, __compar_d_fn_t cmp, void *arg)
 {
-  /* There can only be a heap condition violation if there are
-     children.  */
-  while (2 * k + 1 <= n)
-    {
-      /* Left child.  */
-      size_t j = 2 * k + 1;
-      /* If the right child is larger, use it.  */
-      if (j < n && cmp (base + (j * size), base + ((j + 1) * size), arg) < 0)
-        j++;
-
-      /* If k is already >= to its children, we are done.  */
-      if (j == k || cmp (base + (k * size), base + (j * size), arg) >= 0)
-        break;
+  size_t i, j;
+
+  /* Find the sift-down path all the way to the leaves.  */
+  for (i = k; j = 2 * i + 1, j + 1 <= n;)
+    i = cmp (base + j * size, base + (j + 1) * size, arg) >= 0 ? j : (j + 1);

-      /* Heal the violation.  */
-      do_swap (base + (size * j), base + (k * size), size, swap_type);
+  /* Special case for the last leaf with no sibling.  */
+  if (j == n)
+    i = j;

-      /* Swapping with j may have introduced a violation at j.  Fix
-         it in the next loop iteration.  */
-      k = j;
+  /* Backtrack to the correct location.  */
+  while (i != k && cmp (base + k * size, base + i * size, arg) > 0)
+    i = (i - 1) / 2;
+
+  /* Shift the element into its correct place.  */
+  j = i;
+  while (i != k)
+    {
+      i = (i - 1) / 2;
+      do_swap (base + i * size, base + j * size, size, swap_type);
+    }
 }