Halving the number of recursive calls

Message ID CAKs8_O+FRyB3kpzxQZPVrQGwVzRzwViK17kVpWzQAEzQtHi9VQ@mail.gmail.com
State New
Series Halving the number of recursive calls

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch fail Patch failed to apply to master at the time it was sent
redhat-pt-bot/TryBot-32bit fail Patch series failed to apply

Commit Message

Viktor Reznov May 13, 2024, 2:40 a.m. UTC
  From b4813bd1d7f48014e24f8a8749da49f7749c4f37 Mon Sep 17 00:00:00 2001
From: Reznov <yann.collet.is.not.a.perfectionist@gmail.com>
Date: Mon, 13 May 2024 05:27:05 +0300
Subject: [PATCH] Halving the number of recursive calls

---
 stdlib/qsort.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Carlos O'Donell May 13, 2024, 2:04 p.m. UTC | #1
On 5/12/24 10:40 PM, Viktor Reznov wrote:
> From b4813bd1d7f48014e24f8a8749da49f7749c4f37 Mon Sep 17 00:00:00 2001
> From: Reznov <yann.collet.is.not.a.perfectionist@gmail.com>
> Date: Mon, 13 May 2024 05:27:05 +0300
> Subject: [PATCH] Halving the number of recursive calls

Thank you for contributing to glibc! :-)

Please note that this patch fails to pass pre-commit CI:
https://patchwork.sourceware.org/project/glibc/patch/CAKs8_O+FRyB3kpzxQZPVrQGwVzRzwViK17kVpWzQAEzQtHi9VQ@mail.gmail.com/
- Patch fails to apply due to line wrapping.

Please have a look at the contribution checklist here:
https://sourceware.org/glibc/wiki/Contribution%20checklist

In particular:

- You can use `git send-email` to send the message to the list to avoid wrapping
  issues with mail clients.

- You must choose one of the processes for copyright for your patch, either
  disclaim, assign, or developer certificate of origin.


Lastly, how did you test the performance gains?

Does this change show up visibly in a microbenchmark or workload?


> ---
>  stdlib/qsort.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/stdlib/qsort.c b/stdlib/qsort.c
> index be47aebbe0..30ba492869 100644
> --- a/stdlib/qsort.c
> +++ b/stdlib/qsort.c
> @@ -198,16 +198,15 @@ msort_with_tmp (const struct msort_param *p,
> void *b, size_t n)
>    char *b1, *b2;
>    size_t n1, n2;
> 
> -  if (n <= 1)
> -    return;
> -
>    n1 = n / 2;
>    n2 = n - n1;
>    b1 = b;
>    b2 = (char *) b + (n1 * p->s);
> 
> -  msort_with_tmp (p, b1, n1);
> -  msort_with_tmp (p, b2, n2);
> +  if (n1 > 1)
> +    msort_with_tmp (p, b1, n1);
> +  if (n2 > 1)
> +    msort_with_tmp (p, b2, n2);
> 
>    char *tmp = p->t;
>    const size_t s = p->s;
  
Viktor Reznov May 14, 2024, 5:27 a.m. UTC | #2
On Mon, May 13, 2024 at 5:14 PM Carlos O'Donell <carlos@redhat.com> wrote:
> Lastly, How did you test the performance gains?
> Does this change show up visibly in a microbenchmark or workload?

Even if a compiler is capable of doing "partial inlining", it is
better to do this inlining manually.
  
Xi Ruoyao May 14, 2024, 6:08 a.m. UTC | #3
On Tue, 2024-05-14 at 08:27 +0300, Viktor Reznov wrote:
> On Mon, May 13, 2024 at 5:14 PM Carlos O'Donell <carlos@redhat.com>
> wrote:
> > Lastly, How did you test the performance gains?
> > Does this change show up visibly in a microbenchmark or workload?
> 
> Even if a compiler is capable of doing "partial inlining", it is
> better to do this inlining manually.

No, in glibc, if we are pretty sure the compiler will optimize it, we
don't optimize it manually.

It's not really "better" unless a not-so-old GCC or Clang fails to
optimize it, *and* the optimization really makes it faster for
microbenchmark or workload.
  
Adhemerval Zanella May 14, 2024, 9:14 a.m. UTC | #4
On 14/05/24 08:08, Xi Ruoyao wrote:
> On Tue, 2024-05-14 at 08:27 +0300, Viktor Reznov wrote:
>> On Mon, May 13, 2024 at 5:14 PM Carlos O'Donell <carlos@redhat.com>
>> wrote:
>>> Lastly, How did you test the performance gains?
>>> Does this change show up visibly in a microbenchmark or workload?
>>
>> Even if a compiler is capable of doing "partial inlining", it is
>> better to do this inlining manually.
> 
> No, in Glibc if we are pretty sure the compiler will optimize it, we
> don't optimize it manually.
> 
> It's not really "better" unless a not-so-old GCC or Clang fails to
> optimize it, *and* the optimization really makes it faster for
> microbenchmark or workload.
> 

Kuan-Wei sent a similar optimization some time ago [1], which I
completely forgot to install.  My plan is to re-run some tests and
apply it, so how does this patch compare to Kuan's change?

[1] https://sourceware.org/pipermail/libc-alpha/2024-March/155648.html
  

Patch

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index be47aebbe0..30ba492869 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -198,16 +198,15 @@ msort_with_tmp (const struct msort_param *p, void *b, size_t n)
   char *b1, *b2;
   size_t n1, n2;

-  if (n <= 1)
-    return;
-
   n1 = n / 2;
   n2 = n - n1;
   b1 = b;
   b2 = (char *) b + (n1 * p->s);

-  msort_with_tmp (p, b1, n1);
-  msort_with_tmp (p, b2, n2);
+  if (n1 > 1)
+    msort_with_tmp (p, b1, n1);
+  if (n2 > 1)
+    msort_with_tmp (p, b2, n2);

   char *tmp = p->t;