[v4,2/2] x86: Add additional benchmarks and tests for strchr

Message ID 20210203053900.4125403-2-goldstein.w.n@gmail.com
State Committed
Headers
Series [v4,1/2] x86: Refactor and improve performance of strchr-avx2.S |

Commit Message

develop--- via Libc-alpha Feb. 3, 2021, 5:39 a.m. UTC
  From: noah <goldstein.w.n@gmail.com>

This patch adds additional benchmarks and tests for string size of
4096 and several benchmarks for string size 256 with different
alignments.

Signed-off-by: noah <goldstein.w.n@gmail.com>
---
Added 2 additional benchmark and test sizes:

4096: Just feels like a natural "large" size to test
    
256 with multiple alignments: This essentially is to test how
expensive the initial work prior to the 4x loop is depending on
different alignments.

results from bench-strchr: All times are in seconds and the medium of
100 runs.  Old is current strchr-avx2.S implementation. New is this
patch.

Summary: New is definetly faster for medium -> large sizes. Once the
4x loop is hit there is a 10%+ speedup and New always wins out. For
smaller sizes there is more variance as to which is faster and the
differences are small. Generally it seems the New version wins
out. This is likely because 0 - 31 sized strings are the fast path for
new (no jmp). Also something that is neat is the significant
performance improved for alignment 96 and 112. This is because the 5x
vectors before 4x loop really favor that alignment.

Benchmarking CPU:
Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

size, algn, Old T , New T  -------- Win  Dif
0   , 0   , 2.54  , 2.52   -------- New  -0.02
1   , 0   , 2.57  , 2.52   -------- New  -0.05
2   , 0   , 2.56  , 2.52   -------- New  -0.04
3   , 0   , 2.58  , 2.54   -------- New  -0.04
4   , 0   , 2.61  , 2.55   -------- New  -0.06
5   , 0   , 2.65  , 2.62   -------- New  -0.03
6   , 0   , 2.73  , 2.74   -------- Old  -0.01
7   , 0   , 2.75  , 2.74   -------- New  -0.01
8   , 0   , 2.62  , 2.6    -------- New  -0.02
9   , 0   , 2.73  , 2.75   -------- Old  -0.02
10  , 0   , 2.74  , 2.74   -------- Eq    N/A
11  , 0   , 2.76  , 2.72   -------- New  -0.04
12  , 0   , 2.74  , 2.72   -------- New  -0.02
13  , 0   , 2.75  , 2.72   -------- New  -0.03
14  , 0   , 2.74  , 2.73   -------- New  -0.01
15  , 0   , 2.74  , 2.73   -------- New  -0.01
16  , 0   , 2.74  , 2.73   -------- New  -0.01
17  , 0   , 2.74  , 2.74   -------- Eq    N/A
18  , 0   , 2.73  , 2.73   -------- Eq    N/A
19  , 0   , 2.73  , 2.73   -------- Eq    N/A
20  , 0   , 2.73  , 2.73   -------- Eq    N/A
21  , 0   , 2.73  , 2.72   -------- New  -0.01
22  , 0   , 2.71  , 2.74   -------- Old  -0.03
23  , 0   , 2.71  , 2.69   -------- New  -0.02
24  , 0   , 2.68  , 2.67   -------- New  -0.01
25  , 0   , 2.66  , 2.62   -------- New  -0.04
26  , 0   , 2.64  , 2.62   -------- New  -0.02
27  , 0   , 2.71  , 2.64   -------- New  -0.07
28  , 0   , 2.67  , 2.69   -------- Old  -0.02
29  , 0   , 2.72  , 2.72   -------- Eq    N/A
30  , 0   , 2.68  , 2.69   -------- Old  -0.01
31  , 0   , 2.68  , 2.68   -------- Eq    N/A
32  , 0   , 3.51  , 3.52   -------- Old  -0.01
32  , 1   , 3.52  , 3.51   -------- New  -0.01
64  , 0   , 3.97  , 3.93   -------- New  -0.04
64  , 2   , 3.95  , 3.9    -------- New  -0.05
64  , 1   , 4.0   , 3.93   -------- New  -0.07
64  , 3   , 3.97  , 3.88   -------- New  -0.09
64  , 4   , 3.95  , 3.89   -------- New  -0.06
64  , 5   , 3.94  , 3.9    -------- New  -0.04
64  , 6   , 3.97  , 3.9    -------- New  -0.07
64  , 7   , 3.97  , 3.91   -------- New  -0.06
96  , 0   , 4.74  , 4.52   -------- New  -0.22
128 , 0   , 5.29  , 5.19   -------- New  -0.1
128 , 2   , 5.29  , 5.15   -------- New  -0.14
128 , 3   , 5.31  , 5.22   -------- New  -0.09
256 , 0   , 11.19 , 9.81   -------- New  -1.38
256 , 3   , 11.19 , 9.84   -------- New  -1.35
256 , 4   , 11.2  , 9.88   -------- New  -1.32
256 , 16  , 11.21 , 9.79   -------- New  -1.42
256 , 32  , 11.39 , 10.34  -------- New  -1.05
256 , 48  , 11.88 , 10.56  -------- New  -1.32
256 , 64  , 11.82 , 10.83  -------- New  -0.99
256 , 80  , 11.85 , 10.86  -------- New  -0.99
256 , 96  , 9.56  , 8.76   -------- New  -0.8 
256 , 112 , 9.55  , 8.9    -------- New  -0.65
512 , 0   , 15.76 , 13.72  -------- New  -2.04
512 , 4   , 15.72 , 13.74  -------- New  -1.98
512 , 5   , 15.73 , 13.74  -------- New  -1.99
1024, 0   , 24.85 , 21.33  -------- New  -3.52
1024, 5   , 24.86 , 21.27  -------- New  -3.59
1024, 6   , 24.87 , 21.32  -------- New  -3.55
2048, 0   , 45.75 , 36.7   -------- New  -9.05
2048, 6   , 43.91 , 35.42  -------- New  -8.49
2048, 7   , 44.43 , 36.37  -------- New  -8.06
4096, 0   , 96.94 , 81.34  -------- New  -15.6
4096, 7   , 97.01 , 81.32  -------- New  -15.69

 benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
 string/test-strchr.c      | 26 +++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)
  

Comments

H.J. Lu Feb. 8, 2021, 2:08 p.m. UTC | #1
On Tue, Feb 2, 2021 at 9:39 PM <goldstein.w.n@gmail.com> wrote:
>
> From: noah <goldstein.w.n@gmail.com>
>
> This patch adds additional benchmarks and tests for string size of
> 4096 and several benchmarks for string size 256 with different
> alignments.
>
> Signed-off-by: noah <goldstein.w.n@gmail.com>
> ---
> Added 2 additional benchmark and test sizes:
>
> 4096: Just feels like a natural "large" size to test
>
> 256 with multiple alignments: This essentially is to test how
> expensive the initial work prior to the 4x loop is depending on
> different alignments.
>
> results from bench-strchr: All times are in seconds and the medium of
> 100 runs.  Old is current strchr-avx2.S implementation. New is this
> patch.
>
> Summary: New is definetly faster for medium -> large sizes. Once the
> 4x loop is hit there is a 10%+ speedup and New always wins out. For
> smaller sizes there is more variance as to which is faster and the
> differences are small. Generally it seems the New version wins
> out. This is likely because 0 - 31 sized strings are the fast path for
> new (no jmp). Also something that is neat is the significant
> performance improved for alignment 96 and 112. This is because the 5x
> vectors before 4x loop really favor that alignment.
>
> Benchmarking CPU:
> Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
>
> size, algn, Old T , New T  -------- Win  Dif
> 0   , 0   , 2.54  , 2.52   -------- New  -0.02
> 1   , 0   , 2.57  , 2.52   -------- New  -0.05
> 2   , 0   , 2.56  , 2.52   -------- New  -0.04
> 3   , 0   , 2.58  , 2.54   -------- New  -0.04
> 4   , 0   , 2.61  , 2.55   -------- New  -0.06
> 5   , 0   , 2.65  , 2.62   -------- New  -0.03
> 6   , 0   , 2.73  , 2.74   -------- Old  -0.01
> 7   , 0   , 2.75  , 2.74   -------- New  -0.01
> 8   , 0   , 2.62  , 2.6    -------- New  -0.02
> 9   , 0   , 2.73  , 2.75   -------- Old  -0.02
> 10  , 0   , 2.74  , 2.74   -------- Eq    N/A
> 11  , 0   , 2.76  , 2.72   -------- New  -0.04
> 12  , 0   , 2.74  , 2.72   -------- New  -0.02
> 13  , 0   , 2.75  , 2.72   -------- New  -0.03
> 14  , 0   , 2.74  , 2.73   -------- New  -0.01
> 15  , 0   , 2.74  , 2.73   -------- New  -0.01
> 16  , 0   , 2.74  , 2.73   -------- New  -0.01
> 17  , 0   , 2.74  , 2.74   -------- Eq    N/A
> 18  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 19  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 20  , 0   , 2.73  , 2.73   -------- Eq    N/A
> 21  , 0   , 2.73  , 2.72   -------- New  -0.01
> 22  , 0   , 2.71  , 2.74   -------- Old  -0.03
> 23  , 0   , 2.71  , 2.69   -------- New  -0.02
> 24  , 0   , 2.68  , 2.67   -------- New  -0.01
> 25  , 0   , 2.66  , 2.62   -------- New  -0.04
> 26  , 0   , 2.64  , 2.62   -------- New  -0.02
> 27  , 0   , 2.71  , 2.64   -------- New  -0.07
> 28  , 0   , 2.67  , 2.69   -------- Old  -0.02
> 29  , 0   , 2.72  , 2.72   -------- Eq    N/A
> 30  , 0   , 2.68  , 2.69   -------- Old  -0.01
> 31  , 0   , 2.68  , 2.68   -------- Eq    N/A
> 32  , 0   , 3.51  , 3.52   -------- Old  -0.01
> 32  , 1   , 3.52  , 3.51   -------- New  -0.01
> 64  , 0   , 3.97  , 3.93   -------- New  -0.04
> 64  , 2   , 3.95  , 3.9    -------- New  -0.05
> 64  , 1   , 4.0   , 3.93   -------- New  -0.07
> 64  , 3   , 3.97  , 3.88   -------- New  -0.09
> 64  , 4   , 3.95  , 3.89   -------- New  -0.06
> 64  , 5   , 3.94  , 3.9    -------- New  -0.04
> 64  , 6   , 3.97  , 3.9    -------- New  -0.07
> 64  , 7   , 3.97  , 3.91   -------- New  -0.06
> 96  , 0   , 4.74  , 4.52   -------- New  -0.22
> 128 , 0   , 5.29  , 5.19   -------- New  -0.1
> 128 , 2   , 5.29  , 5.15   -------- New  -0.14
> 128 , 3   , 5.31  , 5.22   -------- New  -0.09
> 256 , 0   , 11.19 , 9.81   -------- New  -1.38
> 256 , 3   , 11.19 , 9.84   -------- New  -1.35
> 256 , 4   , 11.2  , 9.88   -------- New  -1.32
> 256 , 16  , 11.21 , 9.79   -------- New  -1.42
> 256 , 32  , 11.39 , 10.34  -------- New  -1.05
> 256 , 48  , 11.88 , 10.56  -------- New  -1.32
> 256 , 64  , 11.82 , 10.83  -------- New  -0.99
> 256 , 80  , 11.85 , 10.86  -------- New  -0.99
> 256 , 96  , 9.56  , 8.76   -------- New  -0.8
> 256 , 112 , 9.55  , 8.9    -------- New  -0.65
> 512 , 0   , 15.76 , 13.72  -------- New  -2.04
> 512 , 4   , 15.72 , 13.74  -------- New  -1.98
> 512 , 5   , 15.73 , 13.74  -------- New  -1.99
> 1024, 0   , 24.85 , 21.33  -------- New  -3.52
> 1024, 5   , 24.86 , 21.27  -------- New  -3.59
> 1024, 6   , 24.87 , 21.32  -------- New  -3.55
> 2048, 0   , 45.75 , 36.7   -------- New  -9.05
> 2048, 6   , 43.91 , 35.42  -------- New  -8.49
> 2048, 7   , 44.43 , 36.37  -------- New  -8.06
> 4096, 0   , 96.94 , 81.34  -------- New  -15.6
> 4096, 7   , 97.01 , 81.32  -------- New  -15.69
>
>  benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
>  string/test-strchr.c      | 26 +++++++++++++++++++++++++-
>  2 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> index bf493fe458..4ce2369d9b 100644
> --- a/benchtests/bench-strchr.c
> +++ b/benchtests/bench-strchr.c
> @@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
>    size_t i;
>    CHAR *result;
>    CHAR *buf = (CHAR *) buf1;
> -  align &= 15;
> +  align &= 127;
>    if ((align + len) * sizeof (CHAR) >= page_size)
>      return;
>
> @@ -151,12 +151,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
>        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> @@ -169,12 +181,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, 0, MIDDLE_CHAR);
>        do_test (i, 64, 256, 0, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> diff --git a/string/test-strchr.c b/string/test-strchr.c
> index 5b6022746c..2cf4ea2add 100644
> --- a/string/test-strchr.c
> +++ b/string/test-strchr.c
> @@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
>    size_t i;
>    CHAR *result;
>    CHAR *buf = (CHAR *) buf1;
> -  align &= 15;
> +  align &= 127;
>    if ((align + len) * sizeof (CHAR) >= page_size)
>      return;
>
> @@ -259,12 +259,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
>        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> @@ -277,12 +289,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, 0, MIDDLE_CHAR);
>        do_test (i, 64, 256, 0, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> --
> 2.29.2
>

LGTM.

Thanks.
  
H.J. Lu Feb. 8, 2021, 7:34 p.m. UTC | #2
On Mon, Feb 8, 2021 at 6:08 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 9:39 PM <goldstein.w.n@gmail.com> wrote:
> >
> > From: noah <goldstein.w.n@gmail.com>
> >
> > This patch adds additional benchmarks and tests for string size of
> > 4096 and several benchmarks for string size 256 with different
> > alignments.
> >
> > Signed-off-by: noah <goldstein.w.n@gmail.com>
> > ---
> > Added 2 additional benchmark and test sizes:
> >
> > 4096: Just feels like a natural "large" size to test
> >
> > 256 with multiple alignments: This essentially is to test how
> > expensive the initial work prior to the 4x loop is depending on
> > different alignments.
> >
> > results from bench-strchr: All times are in seconds and the medium of
> > 100 runs.  Old is current strchr-avx2.S implementation. New is this
> > patch.
> >
> > Summary: New is definetly faster for medium -> large sizes. Once the
> > 4x loop is hit there is a 10%+ speedup and New always wins out. For
> > smaller sizes there is more variance as to which is faster and the
> > differences are small. Generally it seems the New version wins
> > out. This is likely because 0 - 31 sized strings are the fast path for
> > new (no jmp). Also something that is neat is the significant
> > performance improved for alignment 96 and 112. This is because the 5x
> > vectors before 4x loop really favor that alignment.
> >
> > Benchmarking CPU:
> > Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
> >
> > size, algn, Old T , New T  -------- Win  Dif
> > 0   , 0   , 2.54  , 2.52   -------- New  -0.02
> > 1   , 0   , 2.57  , 2.52   -------- New  -0.05
> > 2   , 0   , 2.56  , 2.52   -------- New  -0.04
> > 3   , 0   , 2.58  , 2.54   -------- New  -0.04
> > 4   , 0   , 2.61  , 2.55   -------- New  -0.06
> > 5   , 0   , 2.65  , 2.62   -------- New  -0.03
> > 6   , 0   , 2.73  , 2.74   -------- Old  -0.01
> > 7   , 0   , 2.75  , 2.74   -------- New  -0.01
> > 8   , 0   , 2.62  , 2.6    -------- New  -0.02
> > 9   , 0   , 2.73  , 2.75   -------- Old  -0.02
> > 10  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > 11  , 0   , 2.76  , 2.72   -------- New  -0.04
> > 12  , 0   , 2.74  , 2.72   -------- New  -0.02
> > 13  , 0   , 2.75  , 2.72   -------- New  -0.03
> > 14  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 15  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 16  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 17  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > 18  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 19  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 20  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 21  , 0   , 2.73  , 2.72   -------- New  -0.01
> > 22  , 0   , 2.71  , 2.74   -------- Old  -0.03
> > 23  , 0   , 2.71  , 2.69   -------- New  -0.02
> > 24  , 0   , 2.68  , 2.67   -------- New  -0.01
> > 25  , 0   , 2.66  , 2.62   -------- New  -0.04
> > 26  , 0   , 2.64  , 2.62   -------- New  -0.02
> > 27  , 0   , 2.71  , 2.64   -------- New  -0.07
> > 28  , 0   , 2.67  , 2.69   -------- Old  -0.02
> > 29  , 0   , 2.72  , 2.72   -------- Eq    N/A
> > 30  , 0   , 2.68  , 2.69   -------- Old  -0.01
> > 31  , 0   , 2.68  , 2.68   -------- Eq    N/A
> > 32  , 0   , 3.51  , 3.52   -------- Old  -0.01
> > 32  , 1   , 3.52  , 3.51   -------- New  -0.01
> > 64  , 0   , 3.97  , 3.93   -------- New  -0.04
> > 64  , 2   , 3.95  , 3.9    -------- New  -0.05
> > 64  , 1   , 4.0   , 3.93   -------- New  -0.07
> > 64  , 3   , 3.97  , 3.88   -------- New  -0.09
> > 64  , 4   , 3.95  , 3.89   -------- New  -0.06
> > 64  , 5   , 3.94  , 3.9    -------- New  -0.04
> > 64  , 6   , 3.97  , 3.9    -------- New  -0.07
> > 64  , 7   , 3.97  , 3.91   -------- New  -0.06
> > 96  , 0   , 4.74  , 4.52   -------- New  -0.22
> > 128 , 0   , 5.29  , 5.19   -------- New  -0.1
> > 128 , 2   , 5.29  , 5.15   -------- New  -0.14
> > 128 , 3   , 5.31  , 5.22   -------- New  -0.09
> > 256 , 0   , 11.19 , 9.81   -------- New  -1.38
> > 256 , 3   , 11.19 , 9.84   -------- New  -1.35
> > 256 , 4   , 11.2  , 9.88   -------- New  -1.32
> > 256 , 16  , 11.21 , 9.79   -------- New  -1.42
> > 256 , 32  , 11.39 , 10.34  -------- New  -1.05
> > 256 , 48  , 11.88 , 10.56  -------- New  -1.32
> > 256 , 64  , 11.82 , 10.83  -------- New  -0.99
> > 256 , 80  , 11.85 , 10.86  -------- New  -0.99
> > 256 , 96  , 9.56  , 8.76   -------- New  -0.8
> > 256 , 112 , 9.55  , 8.9    -------- New  -0.65
> > 512 , 0   , 15.76 , 13.72  -------- New  -2.04
> > 512 , 4   , 15.72 , 13.74  -------- New  -1.98
> > 512 , 5   , 15.73 , 13.74  -------- New  -1.99
> > 1024, 0   , 24.85 , 21.33  -------- New  -3.52
> > 1024, 5   , 24.86 , 21.27  -------- New  -3.59
> > 1024, 6   , 24.87 , 21.32  -------- New  -3.55
> > 2048, 0   , 45.75 , 36.7   -------- New  -9.05
> > 2048, 6   , 43.91 , 35.42  -------- New  -8.49
> > 2048, 7   , 44.43 , 36.37  -------- New  -8.06
> > 4096, 0   , 96.94 , 81.34  -------- New  -15.6
> > 4096, 7   , 97.01 , 81.32  -------- New  -15.69
> >
> >  benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
> >  string/test-strchr.c      | 26 +++++++++++++++++++++++++-
> >  2 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> > index bf493fe458..4ce2369d9b 100644
> > --- a/benchtests/bench-strchr.c
> > +++ b/benchtests/bench-strchr.c
> > @@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> >    size_t i;
> >    CHAR *result;
> >    CHAR *buf = (CHAR *) buf1;
> > -  align &= 15;
> > +  align &= 127;
> >    if ((align + len) * sizeof (CHAR) >= page_size)
> >      return;
> >
> > @@ -151,12 +151,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > @@ -169,12 +181,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> >        do_test (i, 64, 256, 0, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > diff --git a/string/test-strchr.c b/string/test-strchr.c
> > index 5b6022746c..2cf4ea2add 100644
> > --- a/string/test-strchr.c
> > +++ b/string/test-strchr.c
> > @@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> >    size_t i;
> >    CHAR *result;
> >    CHAR *buf = (CHAR *) buf1;
> > -  align &= 15;
> > +  align &= 127;
> >    if ((align + len) * sizeof (CHAR) >= page_size)
> >      return;
> >
> > @@ -259,12 +259,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > @@ -277,12 +289,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> >        do_test (i, 64, 256, 0, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > --
> > 2.29.2
> >
>
> LGTM.
>
> Thanks.
>
> --
> H.J.

This is the updated patch with extra white spaces fixed I am checking in.
  
Noah Goldstein Feb. 8, 2021, 7:49 p.m. UTC | #3
On Mon, Feb 8, 2021 at 2:35 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 6:08 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Tue, Feb 2, 2021 at 9:39 PM <goldstein.w.n@gmail.com> wrote:
> > >
> > > From: noah <goldstein.w.n@gmail.com>
> > >
> > > This patch adds additional benchmarks and tests for string size of
> > > 4096 and several benchmarks for string size 256 with different
> > > alignments.
> > >
> > > Signed-off-by: noah <goldstein.w.n@gmail.com>
> > > ---
> > > Added 2 additional benchmark and test sizes:
> > >
> > > 4096: Just feels like a natural "large" size to test
> > >
> > > 256 with multiple alignments: This essentially is to test how
> > > expensive the initial work prior to the 4x loop is depending on
> > > different alignments.
> > >
> > > results from bench-strchr: All times are in seconds and the medium of
> > > 100 runs.  Old is current strchr-avx2.S implementation. New is this
> > > patch.
> > >
> > > Summary: New is definetly faster for medium -> large sizes. Once the
> > > 4x loop is hit there is a 10%+ speedup and New always wins out. For
> > > smaller sizes there is more variance as to which is faster and the
> > > differences are small. Generally it seems the New version wins
> > > out. This is likely because 0 - 31 sized strings are the fast path for
> > > new (no jmp). Also something that is neat is the significant
> > > performance improved for alignment 96 and 112. This is because the 5x
> > > vectors before 4x loop really favor that alignment.
> > >
> > > Benchmarking CPU:
> > > Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
> > >
> > > size, algn, Old T , New T  -------- Win  Dif
> > > 0   , 0   , 2.54  , 2.52   -------- New  -0.02
> > > 1   , 0   , 2.57  , 2.52   -------- New  -0.05
> > > 2   , 0   , 2.56  , 2.52   -------- New  -0.04
> > > 3   , 0   , 2.58  , 2.54   -------- New  -0.04
> > > 4   , 0   , 2.61  , 2.55   -------- New  -0.06
> > > 5   , 0   , 2.65  , 2.62   -------- New  -0.03
> > > 6   , 0   , 2.73  , 2.74   -------- Old  -0.01
> > > 7   , 0   , 2.75  , 2.74   -------- New  -0.01
> > > 8   , 0   , 2.62  , 2.6    -------- New  -0.02
> > > 9   , 0   , 2.73  , 2.75   -------- Old  -0.02
> > > 10  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > > 11  , 0   , 2.76  , 2.72   -------- New  -0.04
> > > 12  , 0   , 2.74  , 2.72   -------- New  -0.02
> > > 13  , 0   , 2.75  , 2.72   -------- New  -0.03
> > > 14  , 0   , 2.74  , 2.73   -------- New  -0.01
> > > 15  , 0   , 2.74  , 2.73   -------- New  -0.01
> > > 16  , 0   , 2.74  , 2.73   -------- New  -0.01
> > > 17  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > > 18  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > > 19  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > > 20  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > > 21  , 0   , 2.73  , 2.72   -------- New  -0.01
> > > 22  , 0   , 2.71  , 2.74   -------- Old  -0.03
> > > 23  , 0   , 2.71  , 2.69   -------- New  -0.02
> > > 24  , 0   , 2.68  , 2.67   -------- New  -0.01
> > > 25  , 0   , 2.66  , 2.62   -------- New  -0.04
> > > 26  , 0   , 2.64  , 2.62   -------- New  -0.02
> > > 27  , 0   , 2.71  , 2.64   -------- New  -0.07
> > > 28  , 0   , 2.67  , 2.69   -------- Old  -0.02
> > > 29  , 0   , 2.72  , 2.72   -------- Eq    N/A
> > > 30  , 0   , 2.68  , 2.69   -------- Old  -0.01
> > > 31  , 0   , 2.68  , 2.68   -------- Eq    N/A
> > > 32  , 0   , 3.51  , 3.52   -------- Old  -0.01
> > > 32  , 1   , 3.52  , 3.51   -------- New  -0.01
> > > 64  , 0   , 3.97  , 3.93   -------- New  -0.04
> > > 64  , 2   , 3.95  , 3.9    -------- New  -0.05
> > > 64  , 1   , 4.0   , 3.93   -------- New  -0.07
> > > 64  , 3   , 3.97  , 3.88   -------- New  -0.09
> > > 64  , 4   , 3.95  , 3.89   -------- New  -0.06
> > > 64  , 5   , 3.94  , 3.9    -------- New  -0.04
> > > 64  , 6   , 3.97  , 3.9    -------- New  -0.07
> > > 64  , 7   , 3.97  , 3.91   -------- New  -0.06
> > > 96  , 0   , 4.74  , 4.52   -------- New  -0.22
> > > 128 , 0   , 5.29  , 5.19   -------- New  -0.1
> > > 128 , 2   , 5.29  , 5.15   -------- New  -0.14
> > > 128 , 3   , 5.31  , 5.22   -------- New  -0.09
> > > 256 , 0   , 11.19 , 9.81   -------- New  -1.38
> > > 256 , 3   , 11.19 , 9.84   -------- New  -1.35
> > > 256 , 4   , 11.2  , 9.88   -------- New  -1.32
> > > 256 , 16  , 11.21 , 9.79   -------- New  -1.42
> > > 256 , 32  , 11.39 , 10.34  -------- New  -1.05
> > > 256 , 48  , 11.88 , 10.56  -------- New  -1.32
> > > 256 , 64  , 11.82 , 10.83  -------- New  -0.99
> > > 256 , 80  , 11.85 , 10.86  -------- New  -0.99
> > > 256 , 96  , 9.56  , 8.76   -------- New  -0.8
> > > 256 , 112 , 9.55  , 8.9    -------- New  -0.65
> > > 512 , 0   , 15.76 , 13.72  -------- New  -2.04
> > > 512 , 4   , 15.72 , 13.74  -------- New  -1.98
> > > 512 , 5   , 15.73 , 13.74  -------- New  -1.99
> > > 1024, 0   , 24.85 , 21.33  -------- New  -3.52
> > > 1024, 5   , 24.86 , 21.27  -------- New  -3.59
> > > 1024, 6   , 24.87 , 21.32  -------- New  -3.55
> > > 2048, 0   , 45.75 , 36.7   -------- New  -9.05
> > > 2048, 6   , 43.91 , 35.42  -------- New  -8.49
> > > 2048, 7   , 44.43 , 36.37  -------- New  -8.06
> > > 4096, 0   , 96.94 , 81.34  -------- New  -15.6
> > > 4096, 7   , 97.01 , 81.32  -------- New  -15.69
> > >
> > >  benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
> > >  string/test-strchr.c      | 26 +++++++++++++++++++++++++-
> > >  2 files changed, 50 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> > > index bf493fe458..4ce2369d9b 100644
> > > --- a/benchtests/bench-strchr.c
> > > +++ b/benchtests/bench-strchr.c
> > > @@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> > >    size_t i;
> > >    CHAR *result;
> > >    CHAR *buf = (CHAR *) buf1;
> > > -  align &= 15;
> > > +  align &= 127;
> > >    if ((align + len) * sizeof (CHAR) >= page_size)
> > >      return;
> > >
> > > @@ -151,12 +151,24 @@ test_main (void)
> > >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> > >      }
> > >
> > > +  for (i = 1; i < 8; ++i)
> > > +    {
> > > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > +    }
> > > +
> > >    for (i = 1; i < 8; ++i)
> > >      {
> > >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> > >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> > >      }
> > >
> > > +  for (i = 0; i < 8; ++i)
> > > +    {
> > > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > > +    }
> > > +
> > >    for (i = 0; i < 32; ++i)
> > >      {
> > >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > > @@ -169,12 +181,24 @@ test_main (void)
> > >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> > >      }
> > >
> > > +  for (i = 1; i < 8; ++i)
> > > +    {
> > > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > +    }
> > > +
> > >    for (i = 1; i < 8; ++i)
> > >      {
> > >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> > >        do_test (i, 64, 256, 0, BIG_CHAR);
> > >      }
> > >
> > > +  for (i = 0; i < 8; ++i)
> > > +    {
> > > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > > +    }
> > > +
> > >    for (i = 0; i < 32; ++i)
> > >      {
> > >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > > diff --git a/string/test-strchr.c b/string/test-strchr.c
> > > index 5b6022746c..2cf4ea2add 100644
> > > --- a/string/test-strchr.c
> > > +++ b/string/test-strchr.c
> > > @@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> > >    size_t i;
> > >    CHAR *result;
> > >    CHAR *buf = (CHAR *) buf1;
> > > -  align &= 15;
> > > +  align &= 127;
> > >    if ((align + len) * sizeof (CHAR) >= page_size)
> > >      return;
> > >
> > > @@ -259,12 +259,24 @@ test_main (void)
> > >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> > >      }
> > >
> > > +  for (i = 1; i < 8; ++i)
> > > +    {
> > > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > +    }
> > > +
> > >    for (i = 1; i < 8; ++i)
> > >      {
> > >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> > >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> > >      }
> > >
> > > +  for (i = 0; i < 8; ++i)
> > > +    {
> > > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > > +    }
> > > +
> > >    for (i = 0; i < 32; ++i)
> > >      {
> > >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > > @@ -277,12 +289,24 @@ test_main (void)
> > >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> > >      }
> > >
> > > +  for (i = 1; i < 8; ++i)
> > > +    {
> > > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > +    }
> > > +
> > >    for (i = 1; i < 8; ++i)
> > >      {
> > >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> > >        do_test (i, 64, 256, 0, BIG_CHAR);
> > >      }
> > >
> > > +  for (i = 0; i < 8; ++i)
> > > +    {
> > > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > > +    }
> > > +
> > >    for (i = 0; i < 32; ++i)
> > >      {
> > >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > > --
> > > 2.29.2
> > >
> >
> > LGTM.
> >
> > Thanks.
> >
> > --
> > H.J.
>
> This is the updated patch with extra white spaces fixed I am checking in.
>
>
> --
> H.J.

Awesome! Thanks!

N.G.
  

Patch

diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
index bf493fe458..4ce2369d9b 100644
--- a/benchtests/bench-strchr.c
+++ b/benchtests/bench-strchr.c
@@ -100,7 +100,7 @@  do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -151,12 +151,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -169,12 +181,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
diff --git a/string/test-strchr.c b/string/test-strchr.c
index 5b6022746c..2cf4ea2add 100644
--- a/string/test-strchr.c
+++ b/string/test-strchr.c
@@ -130,7 +130,7 @@  do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -259,12 +259,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+  
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+  
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -277,12 +289,24 @@  test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+  
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+  
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);