aarch64: update ampere1 vectorization cost

Message ID 20230327074654.1126912-1-philipp.tomsich@vrull.eu
State Committed
Commit ff1f2f2412bda118f7ddc10e69bd4284d9b24b9e
Headers
Series aarch64: update ampere1 vectorization cost |

Commit Message

Philipp Tomsich March 27, 2023, 7:46 a.m. UTC
  The original submission of AmpereOne (-mcpu=ampere1) costs occurred
prior to exhaustive testing of vectorizable workloads against
hardware.

Adjust the vector costs to achieve the best results and more closely
match the underlying hardware.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc: Update vector costs for ampere1.

Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---
We would like to get this into GCC 13 to avoid having to backport at
the start of the next cycle.

OK for backports?

 gcc/config/aarch64/aarch64.cc | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
  

Comments

Kyrylo Tkachov March 27, 2023, 8:44 a.m. UTC | #1
Hi Philipp,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> Tomsich
> Sent: Monday, March 27, 2023 8:47 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> <Tamar.Christina@arm.com>; Philipp Tomsich <philipp.tomsich@vrull.eu>;
> Manolis Tsamis <manolis.tsamis@vrull.eu>
> Subject: [PATCH] aarch64: update ampere1 vectorization cost
> 
> The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> prior to exhaustive testing of vectorizable workloads against
> hardware.
> 
> Adjust the vector costs to achieve the best results and more closely
> match the underlying hardware.
> 
> gcc/ChangeLog:
> 
> 	* config/aarch64/aarch64.cc: Update vector costs for ampere1.
> 
> Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> 
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
> We would like to get this into GCC 13 to avoid having to backport at
> the start of the next cycle.
> 

Given this affects only the ampere1 costs that sounds fine to me and fairly low risk, you are being trusted that these costs are actually desirable and properly validated on the hardware involved.

> OK for backports?

This is ok for trunk (GCC 13). Do you also want to backport this to other branches?
Thanks,
Kyrill

> 
>  gcc/config/aarch64/aarch64.cc | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index b27f4354031..661fff65cea 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> thunderx3t110_vector_cost =
> 
>  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
>  {
> -  3, /* int_stmt_cost  */
> +  1, /* int_stmt_cost  */
>    3, /* fp_stmt_cost  */
>    0, /* ld2_st2_permute_cost  */
>    0, /* ld3_st3_permute_cost  */
> @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> ampere1_advsimd_vector_cost =
>    8, /* store_elt_extra_cost  */
>    6, /* vec_to_scalar_cost  */
>    7, /* scalar_to_vec_cost  */
> -  5, /* align_load_cost  */
> -  5, /* unalign_load_cost  */
> -  2, /* unalign_store_cost  */
> -  2  /* store_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
>  };
> 
>  /* Ampere-1 costs for vector insn classes.  */
>  static const struct cpu_vector_cost ampere1_vector_cost =
>  {
>    1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> +  3, /* scalar_fp_stmt_cost  */
>    4, /* scalar_load_cost  */
>    1, /* scalar_store_cost  */
>    1, /* cond_taken_branch_cost  */
> --
> 2.34.1
  
Philipp Tomsich March 27, 2023, 8:49 a.m. UTC | #2
On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
> Hi Philipp,
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> > <Tamar.Christina@arm.com>; Philipp Tomsich <philipp.tomsich@vrull.eu>;
> > Manolis Tsamis <manolis.tsamis@vrull.eu>
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >       * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly low risk, you are being trusted that these costs are actually desirable and properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other branches?

Ampere1 (with the older vector costs) are in GCC12 and GCC11.
I would like to backport to those as well.

Thanks,
Philipp.

> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >    3, /* fp_stmt_cost  */
> >    0, /* ld2_st2_permute_cost  */
> >    0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >    8, /* store_elt_extra_cost  */
> >    6, /* vec_to_scalar_cost  */
> >    7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >    1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >    4, /* scalar_load_cost  */
> >    1, /* scalar_store_cost  */
> >    1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>
  
Philipp Tomsich March 27, 2023, 9:13 a.m. UTC | #3
Applied to master, thanks!
Philipp.

On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
> Hi Philipp,
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-
> > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> > <Tamar.Christina@arm.com>; Philipp Tomsich <philipp.tomsich@vrull.eu>;
> > Manolis Tsamis <manolis.tsamis@vrull.eu>
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >       * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly low risk, you are being trusted that these costs are actually desirable and properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other branches?
> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >    3, /* fp_stmt_cost  */
> >    0, /* ld2_st2_permute_cost  */
> >    0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >    8, /* store_elt_extra_cost  */
> >    6, /* vec_to_scalar_cost  */
> >    7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >    1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >    4, /* scalar_load_cost  */
> >    1, /* scalar_store_cost  */
> >    1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>
  
Kyrylo Tkachov March 27, 2023, 9:23 a.m. UTC | #4
> -----Original Message-----
> From: Philipp Tomsich <philipp.tomsich@vrull.eu>
> Sent: Monday, March 27, 2023 9:50 AM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Tamar Christina
> <Tamar.Christina@arm.com>; Manolis Tsamis <manolis.tsamis@vrull.eu>
> Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> 
> On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> wrote:
> >
> > Hi Philipp,
> >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-
> > > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> > > Tomsich
> > > Sent: Monday, March 27, 2023 8:47 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> > > <Tamar.Christina@arm.com>; Philipp Tomsich
> <philipp.tomsich@vrull.eu>;
> > > Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > >
> > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > prior to exhaustive testing of vectorizable workloads against
> > > hardware.
> > >
> > > Adjust the vector costs to achieve the best results and more closely
> > > match the underlying hardware.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > >
> > > Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > ---
> > > We would like to get this into GCC 13 to avoid having to backport at
> > > the start of the next cycle.
> > >
> >
> > Given this affects only the ampere1 costs that sounds fine to me and fairly
> low risk, you are being trusted that these costs are actually desirable and
> properly validated on the hardware involved.
> >
> > > OK for backports?
> >
> > This is ok for trunk (GCC 13). Do you also want to backport this to other
> branches?
> 
> Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> I would like to backport to those as well.

Ok then, though you may want to run the benchmarks on the branches as well to make sure the costs give the expected benefit there as well.
Thanks,
Kyrill

> 
> Thanks,
> Philipp.
> 
> > Thanks,
> > Kyrill
> >
> > >
> > >  gcc/config/aarch64/aarch64.cc | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > > index b27f4354031..661fff65cea 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > thunderx3t110_vector_cost =
> > >
> > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > >  {
> > > -  3, /* int_stmt_cost  */
> > > +  1, /* int_stmt_cost  */
> > >    3, /* fp_stmt_cost  */
> > >    0, /* ld2_st2_permute_cost  */
> > >    0, /* ld3_st3_permute_cost  */
> > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > ampere1_advsimd_vector_cost =
> > >    8, /* store_elt_extra_cost  */
> > >    6, /* vec_to_scalar_cost  */
> > >    7, /* scalar_to_vec_cost  */
> > > -  5, /* align_load_cost  */
> > > -  5, /* unalign_load_cost  */
> > > -  2, /* unalign_store_cost  */
> > > -  2  /* store_cost  */
> > > +  4, /* align_load_cost  */
> > > +  4, /* unalign_load_cost  */
> > > +  1, /* unalign_store_cost  */
> > > +  1  /* store_cost  */
> > >  };
> > >
> > >  /* Ampere-1 costs for vector insn classes.  */
> > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > >  {
> > >    1, /* scalar_int_stmt_cost  */
> > > -  1, /* scalar_fp_stmt_cost  */
> > > +  3, /* scalar_fp_stmt_cost  */
> > >    4, /* scalar_load_cost  */
> > >    1, /* scalar_store_cost  */
> > >    1, /* cond_taken_branch_cost  */
> > > --
> > > 2.34.1
> >
  
Philipp Tomsich April 3, 2023, 11:45 a.m. UTC | #5
Kyrill,

We reran on GCC12 and GCC11, reproducing the same improvements (e.g.,
on fotonik3d) that prompted the changes.
I'll apply the backports later this week, unless you have any further concerns…

Thanks,
Philipp.


On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > Sent: Monday, March 27, 2023 9:50 AM
> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > <Richard.Sandiford@arm.com>; Tamar Christina
> > <Tamar.Christina@arm.com>; Manolis Tsamis <manolis.tsamis@vrull.eu>
> > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > wrote:
> > >
> > > Hi Philipp,
> > >
> > > > -----Original Message-----
> > > > From: Gcc-patches <gcc-patches-
> > > > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> > > > Tomsich
> > > > Sent: Monday, March 27, 2023 8:47 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> > > > <Tamar.Christina@arm.com>; Philipp Tomsich
> > <philipp.tomsich@vrull.eu>;
> > > > Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > > >
> > > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > > prior to exhaustive testing of vectorizable workloads against
> > > > hardware.
> > > >
> > > > Adjust the vector costs to achieve the best results and more closely
> > > > match the underlying hardware.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > > >
> > > > Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > >
> > > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > > ---
> > > > We would like to get this into GCC 13 to avoid having to backport at
> > > > the start of the next cycle.
> > > >
> > >
> > > Given this affects only the ampere1 costs that sounds fine to me and fairly
> > low risk, you are being trusted that these costs are actually desirable and
> > properly validated on the hardware involved.
> > >
> > > > OK for backports?
> > >
> > > This is ok for trunk (GCC 13). Do you also want to backport this to other
> > branches?
> >
> > Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> > I would like to backport to those as well.
>
> Ok then, though you may want to run the benchmarks on the branches as well to make sure the costs give the expected benefit there as well.
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Philipp.
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >
> > > >  gcc/config/aarch64/aarch64.cc | 12 ++++++------
> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc
> > > > index b27f4354031..661fff65cea 100644
> > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > > thunderx3t110_vector_cost =
> > > >
> > > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > > >  {
> > > > -  3, /* int_stmt_cost  */
> > > > +  1, /* int_stmt_cost  */
> > > >    3, /* fp_stmt_cost  */
> > > >    0, /* ld2_st2_permute_cost  */
> > > >    0, /* ld3_st3_permute_cost  */
> > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > > ampere1_advsimd_vector_cost =
> > > >    8, /* store_elt_extra_cost  */
> > > >    6, /* vec_to_scalar_cost  */
> > > >    7, /* scalar_to_vec_cost  */
> > > > -  5, /* align_load_cost  */
> > > > -  5, /* unalign_load_cost  */
> > > > -  2, /* unalign_store_cost  */
> > > > -  2  /* store_cost  */
> > > > +  4, /* align_load_cost  */
> > > > +  4, /* unalign_load_cost  */
> > > > +  1, /* unalign_store_cost  */
> > > > +  1  /* store_cost  */
> > > >  };
> > > >
> > > >  /* Ampere-1 costs for vector insn classes.  */
> > > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > > >  {
> > > >    1, /* scalar_int_stmt_cost  */
> > > > -  1, /* scalar_fp_stmt_cost  */
> > > > +  3, /* scalar_fp_stmt_cost  */
> > > >    4, /* scalar_load_cost  */
> > > >    1, /* scalar_store_cost  */
> > > >    1, /* cond_taken_branch_cost  */
> > > > --
> > > > 2.34.1
> > >
  
Kyrylo Tkachov April 4, 2023, 12:49 p.m. UTC | #6
Hi Philipp,

> -----Original Message-----
> From: Philipp Tomsich <philipp.tomsich@vrull.eu>
> Sent: Monday, April 3, 2023 12:46 PM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> <Richard.Sandiford@arm.com>; Tamar Christina
> <Tamar.Christina@arm.com>; Manolis Tsamis <manolis.tsamis@vrull.eu>
> Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> 
> Kyrill,
> 
> We reran on GCC12 and GCC11, reproducing the same improvements (e.g.,
> on fotonik3d) that prompted the changes.
> I'll apply the backports later this week, unless you have any further
> concerns…

Ok, thanks for checking.
Kyrill

> 
> Thanks,
> Philipp.
> 
> 
> On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > Sent: Monday, March 27, 2023 9:50 AM
> > > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > > <Richard.Sandiford@arm.com>; Tamar Christina
> > > <Tamar.Christina@arm.com>; Manolis Tsamis
> <manolis.tsamis@vrull.eu>
> > > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> > >
> > > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> > > wrote:
> > > >
> > > > Hi Philipp,
> > > >
> > > > > -----Original Message-----
> > > > > From: Gcc-patches <gcc-patches-
> > > > > bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Philipp
> > > > > Tomsich
> > > > > Sent: Monday, March 27, 2023 8:47 AM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar
> Christina
> > > > > <Tamar.Christina@arm.com>; Philipp Tomsich
> > > <philipp.tomsich@vrull.eu>;
> > > > > Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > > > >
> > > > > The original submission of AmpereOne (-mcpu=ampere1) costs
> occurred
> > > > > prior to exhaustive testing of vectorizable workloads against
> > > > > hardware.
> > > > >
> > > > > Adjust the vector costs to achieve the best results and more closely
> > > > > match the underlying hardware.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >       * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > > > >
> > > > > Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > > > >
> > > > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > > > ---
> > > > > We would like to get this into GCC 13 to avoid having to backport at
> > > > > the start of the next cycle.
> > > > >
> > > >
> > > > Given this affects only the ampere1 costs that sounds fine to me and
> fairly
> > > low risk, you are being trusted that these costs are actually desirable and
> > > properly validated on the hardware involved.
> > > >
> > > > > OK for backports?
> > > >
> > > > This is ok for trunk (GCC 13). Do you also want to backport this to other
> > > branches?
> > >
> > > Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> > > I would like to backport to those as well.
> >
> > Ok then, though you may want to run the benchmarks on the branches as
> well to make sure the costs give the expected benefit there as well.
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Philipp.
> > >
> > > > Thanks,
> > > > Kyrill
> > > >
> > > > >
> > > > >  gcc/config/aarch64/aarch64.cc | 12 ++++++------
> > > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > > b/gcc/config/aarch64/aarch64.cc
> > > > > index b27f4354031..661fff65cea 100644
> > > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > > > thunderx3t110_vector_cost =
> > > > >
> > > > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > > > >  {
> > > > > -  3, /* int_stmt_cost  */
> > > > > +  1, /* int_stmt_cost  */
> > > > >    3, /* fp_stmt_cost  */
> > > > >    0, /* ld2_st2_permute_cost  */
> > > > >    0, /* ld3_st3_permute_cost  */
> > > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > > > ampere1_advsimd_vector_cost =
> > > > >    8, /* store_elt_extra_cost  */
> > > > >    6, /* vec_to_scalar_cost  */
> > > > >    7, /* scalar_to_vec_cost  */
> > > > > -  5, /* align_load_cost  */
> > > > > -  5, /* unalign_load_cost  */
> > > > > -  2, /* unalign_store_cost  */
> > > > > -  2  /* store_cost  */
> > > > > +  4, /* align_load_cost  */
> > > > > +  4, /* unalign_load_cost  */
> > > > > +  1, /* unalign_store_cost  */
> > > > > +  1  /* store_cost  */
> > > > >  };
> > > > >
> > > > >  /* Ampere-1 costs for vector insn classes.  */
> > > > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > > > >  {
> > > > >    1, /* scalar_int_stmt_cost  */
> > > > > -  1, /* scalar_fp_stmt_cost  */
> > > > > +  3, /* scalar_fp_stmt_cost  */
> > > > >    4, /* scalar_load_cost  */
> > > > >    1, /* scalar_store_cost  */
> > > > >    1, /* cond_taken_branch_cost  */
> > > > > --
> > > > > 2.34.1
> > > >
  

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b27f4354031..661fff65cea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1132,7 +1132,7 @@  static const struct cpu_vector_cost thunderx3t110_vector_cost =
 
 static const advsimd_vec_cost ampere1_advsimd_vector_cost =
 {
-  3, /* int_stmt_cost  */
+  1, /* int_stmt_cost  */
   3, /* fp_stmt_cost  */
   0, /* ld2_st2_permute_cost  */
   0, /* ld3_st3_permute_cost  */
@@ -1148,17 +1148,17 @@  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
   8, /* store_elt_extra_cost  */
   6, /* vec_to_scalar_cost  */
   7, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  2, /* unalign_store_cost  */
-  2  /* store_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
 };
 
 /* Ampere-1 costs for vector insn classes.  */
 static const struct cpu_vector_cost ampere1_vector_cost =
 {
   1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
+  3, /* scalar_fp_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   1, /* cond_taken_branch_cost  */