[v2,2/2] x86: Add additional benchmarks for strchr

Message ID 20210201003014.785099-2-goldstein.w.n@gmail.com
State Superseded
Delegated to: H.J. Lu
Headers
Series [v2,1/2] x86: Refactor and improve performance of strchr-avx2.S |

Commit Message

Noah Goldstein Feb. 1, 2021, 12:30 a.m. UTC
  This patch adds additional benchmarks for string size of 4096 and
several benchmarks for string size 256 with different alignments.

Signed-off-by: noah <goldstein.w.n@gmail.com>
---
Added 2 additional benchmark sizes:

4096: Just feels like a natural "large" size to test
    
256 with multiple alignments: This essentially is to test how
expensive the initial work prior to the 4x loop is depending on
different alignments.

results from bench-strchr: All times are in seconds and the medium of
100 runs.  Old is current strchr-avx2.S implementation. New is this
patch.

Summary: New is definetly faster for medium -> large sizes. Once the
4x loop is hit there is a 10%+ speedup and New always wins out. For
smaller sizes there is more variance as to which is faster and the
differences are small. Generally it seems the New version wins
out. This is likely because 0 - 31 sized strings are the fast path for
new (no jmp).

Benchmarking CPU:
Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

size, algn, Old T , New T  -------- Win  Dif
0   , 0   , 2.54  , 2.52   -------- New  -0.02
1   , 0   , 2.57  , 2.52   -------- New  -0.05
2   , 0   , 2.56  , 2.52   -------- New  -0.04
3   , 0   , 2.58  , 2.54   -------- New  -0.04
4   , 0   , 2.61  , 2.55   -------- New  -0.06
5   , 0   , 2.65  , 2.62   -------- New  -0.03
6   , 0   , 2.73  , 2.74   -------- Old  -0.01
7   , 0   , 2.75  , 2.74   -------- New  -0.01
8   , 0   , 2.62  , 2.6    -------- New  -0.02
9   , 0   , 2.73  , 2.75   -------- Old  -0.02
10  , 0   , 2.74  , 2.74   -------- Eq    N/A
11  , 0   , 2.76  , 2.72   -------- New  -0.04
12  , 0   , 2.74  , 2.72   -------- New  -0.02
13  , 0   , 2.75  , 2.72   -------- New  -0.03
14  , 0   , 2.74  , 2.73   -------- New  -0.01
15  , 0   , 2.74  , 2.73   -------- New  -0.01
16  , 0   , 2.74  , 2.73   -------- New  -0.01
17  , 0   , 2.74  , 2.74   -------- Eq    N/A
18  , 0   , 2.73  , 2.73   -------- Eq    N/A
19  , 0   , 2.73  , 2.73   -------- Eq    N/A
20  , 0   , 2.73  , 2.73   -------- Eq    N/A
21  , 0   , 2.73  , 2.72   -------- New  -0.01
22  , 0   , 2.71  , 2.74   -------- Old  -0.03
23  , 0   , 2.71  , 2.69   -------- New  -0.02
24  , 0   , 2.68  , 2.67   -------- New  -0.01
25  , 0   , 2.66  , 2.62   -------- New  -0.04
26  , 0   , 2.64  , 2.62   -------- New  -0.02
27  , 0   , 2.71  , 2.64   -------- New  -0.07
28  , 0   , 2.67  , 2.69   -------- Old  -0.02
29  , 0   , 2.72  , 2.72   -------- Eq    N/A
30  , 0   , 2.68  , 2.69   -------- Old  -0.01
31  , 0   , 2.68  , 2.68   -------- Eq    N/A
32  , 0   , 3.51  , 3.52   -------- Old  -0.01
32  , 1   , 3.52  , 3.51   -------- New  -0.01
64  , 0   , 3.97  , 3.93   -------- New  -0.04
64  , 2   , 3.95  , 3.9    -------- New  -0.05
64  , 1   , 4.0   , 3.93   -------- New  -0.07
64  , 3   , 3.97  , 3.88   -------- New  -0.09
64  , 4   , 3.95  , 3.89   -------- New  -0.06
64  , 5   , 3.94  , 3.9    -------- New  -0.04
64  , 6   , 3.97  , 3.9    -------- New  -0.07
64  , 7   , 3.97  , 3.91   -------- New  -0.06
96  , 0   , 4.74  , 4.52   -------- New  -0.22
128 , 0   , 5.29  , 5.19   -------- New  -0.1
128 , 2   , 5.29  , 5.15   -------- New  -0.14
128 , 3   , 5.31  , 5.22   -------- New  -0.09
256 , 0   , 11.19 , 9.81   -------- New  -1.38
256 , 3   , 11.19 , 9.84   -------- New  -1.35
256 , 4   , 11.2  , 9.88   -------- New  -1.32
256 , 16  , 11.21 , 9.79   -------- New  -1.42
256 , 32  , 11.39 , 10.34  -------- New  -1.05
256 , 48  , 11.88 , 10.56  -------- New  -1.32
256 , 64  , 11.82 , 10.83  -------- New  -0.99
256 , 80  , 11.85 , 10.86  -------- New  -0.99
256 , 96  , 9.56  , 8.76   -------- New  -0.8 
256 , 112 , 9.55  , 8.9    -------- New  -0.65
512 , 0   , 15.76 , 13.72  -------- New  -2.04
512 , 4   , 15.72 , 13.74  -------- New  -1.98
512 , 5   , 15.73 , 13.74  -------- New  -1.99
1024, 0   , 24.85 , 21.33  -------- New  -3.52
1024, 5   , 24.86 , 21.27  -------- New  -3.59
1024, 6   , 24.87 , 21.32  -------- New  -3.55
2048, 0   , 45.75 , 36.7   -------- New  -9.05
2048, 6   , 43.91 , 35.42  -------- New  -8.49
2048, 7   , 44.43 , 36.37  -------- New  -8.06
4096, 0   , 96.94 , 81.34  -------- New  -15.6
4096, 7   , 97.01 , 81.32  -------- New  -15.69


    
 benchtests/bench-strchr.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
  

Comments

H.J. Lu Feb. 1, 2021, 5:10 p.m. UTC | #1
On Sun, Jan 31, 2021 at 4:30 PM noah <goldstein.w.n@gmail.com> wrote:
>
> This patch adds additional benchmarks for string size of 4096 and
> several benchmarks for string size 256 with different alignments.
>
> Signed-off-by: noah <goldstein.w.n@gmail.com>
> ---
> Added 2 additional benchmark sizes:
>
> 4096: Just feels like a natural "large" size to test
>
> 256 with multiple alignments: This essentially is to test how
> expensive the initial work prior to the 4x loop is depending on
> different alignments.
>
> results from bench-strchr: All times are in seconds and the medium of
> 100 runs.  Old is current strchr-avx2.S implementation. New is this
> patch.
>
> Summary: New is definetly faster for medium -> large sizes. Once the
> 4x loop is hit there is a 10%+ speedup and New always wins out. For
> smaller sizes there is more variance as to which is faster and the
> differences are small. Generally it seems the New version wins
> out. This is likely because 0 - 31 sized strings are the fast path for
> new (no jmp).
>
> Benchmarking CPU:
> Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
>
> size, algn, Old T , New T  -------- Win  Dif
> 0   , 0   , 2.54  , 2.52   -------- New  -0.02
> 1   , 0   , 2.57  , 2.52   -------- New  -0.05
> 2   , 0   , 2.56  , 2.52   -------- New  -0.04
> 3   , 0   , 2.58  , 2.54   -------- New  -0.04
> 4   , 0   , 2.61  , 2.55   -------- New  -0.06
> 5   , 0   , 2.65  , 2.62   -------- New  -0.03
> 6   , 0   , 2.73  , 2.74   -------- Old  -0.01
> 7   , 0   , 2.75  , 2.74   -------- New  -0.01
> 8   , 0   , 2.62  , 2.6    -------- New  -0.02
> 9   , 0   , 2.73  , 2.75   -------- Old  -0.02
> 10  , 0   , 2.74  , 2.74   -------- Eq    N/A
> 11  , 0   , 2.76  , 2.72   -------- New  -0.04
> 12  , 0   , 2.74  , 2.72   -------- New  -0.02
> 13  , 0   , 2.75  , 2.72   -------- New  -0.03
> 14  , 0   , 2.74  , 2.73   -------- New  -0.01
> 15  , 0   , 2.74  , 2.73   -------- New  -0.01
> 16  , 0   , 2.74  , 2.73   -------- New  -0.01
> 17  , 0   , 2.74  , 2.74   -------- Eq    N/A
> 18  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 19  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 20  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 21  , 0   , 2.73  , 2.72   -------- New  -0.01
> 22  , 0   , 2.71  , 2.74   -------- Old  -0.03
> 23  , 0   , 2.71  , 2.69   -------- New  -0.02
> 24  , 0   , 2.68  , 2.67   -------- New  -0.01
> 25  , 0   , 2.66  , 2.62   -------- New  -0.04
> 26  , 0   , 2.64  , 2.62   -------- New  -0.02
> 27  , 0   , 2.71  , 2.64   -------- New  -0.07
> 28  , 0   , 2.67  , 2.69   -------- Old  -0.02
> 29  , 0   , 2.72  , 2.72   -------- Eq    N/A
> 30  , 0   , 2.68  , 2.69   -------- Old  -0.01
> 31  , 0   , 2.68  , 2.68   -------- Eq    N/A
> 32  , 0   , 3.51  , 3.52   -------- Old  -0.01
> 32  , 1   , 3.52  , 3.51   -------- New  -0.01
> 64  , 0   , 3.97  , 3.93   -------- New  -0.04
> 64  , 2   , 3.95  , 3.9    -------- New  -0.05
> 64  , 1   , 4.0   , 3.93   -------- New  -0.07
> 64  , 3   , 3.97  , 3.88   -------- New  -0.09
> 64  , 4   , 3.95  , 3.89   -------- New  -0.06
> 64  , 5   , 3.94  , 3.9    -------- New  -0.04
> 64  , 6   , 3.97  , 3.9    -------- New  -0.07
> 64  , 7   , 3.97  , 3.91   -------- New  -0.06
> 96  , 0   , 4.74  , 4.52   -------- New  -0.22
> 128 , 0   , 5.29  , 5.19   -------- New  -0.1
> 128 , 2   , 5.29  , 5.15   -------- New  -0.14
> 128 , 3   , 5.31  , 5.22   -------- New  -0.09
> 256 , 0   , 11.19 , 9.81   -------- New  -1.38
> 256 , 3   , 11.19 , 9.84   -------- New  -1.35
> 256 , 4   , 11.2  , 9.88   -------- New  -1.32
> 256 , 16  , 11.21 , 9.79   -------- New  -1.42
> 256 , 32  , 11.39 , 10.34  -------- New  -1.05
> 256 , 48  , 11.88 , 10.56  -------- New  -1.32
> 256 , 64  , 11.82 , 10.83  -------- New  -0.99
> 256 , 80  , 11.85 , 10.86  -------- New  -0.99
> 256 , 96  , 9.56  , 8.76   -------- New  -0.8
> 256 , 112 , 9.55  , 8.9    -------- New  -0.65
> 512 , 0   , 15.76 , 13.72  -------- New  -2.04
> 512 , 4   , 15.72 , 13.74  -------- New  -1.98
> 512 , 5   , 15.73 , 13.74  -------- New  -1.99
> 1024, 0   , 24.85 , 21.33  -------- New  -3.52
> 1024, 5   , 24.86 , 21.27  -------- New  -3.59
> 1024, 6   , 24.87 , 21.32  -------- New  -3.55
> 2048, 0   , 45.75 , 36.7   -------- New  -9.05
> 2048, 6   , 43.91 , 35.42  -------- New  -8.49
> 2048, 7   , 44.43 , 36.37  -------- New  -8.06
> 4096, 0   , 96.94 , 81.34  -------- New  -15.6
> 4096, 7   , 97.01 , 81.32  -------- New  -15.69
>
>
>
>  benchtests/bench-strchr.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> index bf493fe458..5fd98a5d43 100644
> --- a/benchtests/bench-strchr.c
> +++ b/benchtests/bench-strchr.c
> @@ -100,9 +100,13 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
>    size_t i;
>    CHAR *result;
>    CHAR *buf = (CHAR *) buf1;
> -  align &= 15;
> +
> +  align &= 127;
>    if ((align + len) * sizeof (CHAR) >= page_size)
> -    return;
> +    {
> +      return;
> +    }
> +
>
>    for (i = 0; i < len; ++i)
>      {
> @@ -151,12 +155,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
>        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> @@ -169,12 +185,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, 0, MIDDLE_CHAR);
>        do_test (i, 64, 256, 0, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> --
> 2.29.2

Please make the similar changes in string/test-strchr.c.

Thanks.
  

Patch

diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
index bf493fe458..5fd98a5d43 100644
--- a/benchtests/bench-strchr.c
+++ b/benchtests/bench-strchr.c
@@ -100,9 +100,13 @@  do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
-    return;
+    {
+      return;                
+    }
+
 
   for (i = 0; i < len; ++i)
     {
@@ -151,12 +155,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -169,12 +185,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);