x86: Update -mtune=alderlake

Message ID 20211110090931.17050-1-lili.cui@intel.com
State Committed
Commit 4f442a3bcb99205e0fe7863c3a6dde970134a64a
Series x86: Update -mtune=alderlake

Commit Message

Li, Pan2 via Gcc-patches Nov. 10, 2021, 9:09 a.m. UTC
  Hi Uros,

This patch updates -mtune for Alder Lake.

Bootstrap is OK, and there are no regressions in the i386/x86-64 testsuite.

OK for master?

Update -mtune for Alder Lake. Alder Lake Intel Hybrid Technology will not support
Intel® AVX-512; ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and
UMONITOR/UMWAIT/TPAUSE are supported.
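
As an illustrative aside (not part of the patch): the ISA split above is
visible through GCC's predefined macros when compiling with -march=alderlake,
assuming a GCC new enough to accept that -march value. A minimal check:

/* Illustrative only.  Compile with: gcc -march=alderlake check.c
   (the file name is arbitrary).  Output reflects the Alder Lake ISA
   set described above: AVX2 available, AVX-512 absent.  */
#include <stdio.h>

int
main (void)
{
#ifdef __AVX2__
  puts ("__AVX2__ defined");          /* expected */
#endif
#ifdef __AVX512F__
  puts ("__AVX512F__ defined");       /* not expected: no AVX-512 */
#else
  puts ("__AVX512F__ not defined");
#endif
  return 0;
}

Note that -mtune=alderlake alone does not change the enabled ISA; it only
changes tuning, which is what this patch adjusts.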

gcc/ChangeLog

	* config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake
	from m_CORE_AVX2.
	(processor_cost_table): Use alderlake_cost for Alderlake.
	* config/i386/i386.c (ix86_sched_init_global): Handle Alderlake.
	* config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake
	cost.
	* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake
	issue rate to 4.
	(ix86_adjust_cost): Handle Alderlake.
	* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake.
	(X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
	(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
	(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
	(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
	(X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
	(X86_TUNE_USE_LEAVE): Likewise.
	(X86_TUNE_PUSH_MEMORY): Likewise.
	(X86_TUNE_USE_INCDEC): Likewise.
	(X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
	(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
	(X86_TUNE_USE_SAHF): Likewise.
	(X86_TUNE_USE_BT): Likewise.
	(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
	(X86_TUNE_ONE_IF_CONV_INSN): Likewise.
	(X86_TUNE_AVOID_MFENCE): Likewise.
	(X86_TUNE_USE_SIMODE_FIOP): Likewise.
	(X86_TUNE_EXT_80387_CONSTANTS): Likewise.
	(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
	(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
	(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
	(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
	(X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise.
	(X86_TUNE_USE_GATHER): Disable for Alderlake.
	(X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise.
	(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
---
 gcc/config/i386/i386-options.c   |   4 +-
 gcc/config/i386/i386.c           |   1 +
 gcc/config/i386/x86-tune-costs.h | 120 +++++++++++++++++++++++++++++++
 gcc/config/i386/x86-tune-sched.c |   2 +
 gcc/config/i386/x86-tune.def     |  58 +++++++--------
 5 files changed, 155 insertions(+), 30 deletions(-)
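
As background for the x86-tune.def changes below: each DEF_TUNE entry is a
bitmask over processors, and a tuning knob is active when the bit of the CPU
chosen by -mtune is set in that mask. Here is a simplified, standalone sketch
of that mechanism (names condensed; an approximation, not the verbatim
i386-options.c logic):

/* Simplified sketch of the m_* mask mechanism in x86-tune.def; treat
   it as an approximation, not GCC's actual code.  */
#include <stdio.h>

enum processor { PROCESSOR_SKYLAKE, PROCESSOR_ALDERLAKE, PROCESSOR_GENERIC };

#define M(cpu) (1ULL << (cpu))
#define m_SKYLAKE   M (PROCESSOR_SKYLAKE)
#define m_ALDERLAKE M (PROCESSOR_ALDERLAKE)
#define m_GENERIC   M (PROCESSOR_GENERIC)

/* After this patch, X86_TUNE_USE_GATHER means "all CPUs except
   znver1/2, generic and alderlake"; only the last two exist here.  */
static const unsigned long long use_gather_mask = ~(m_GENERIC | m_ALDERLAKE);

int
main (void)
{
  enum processor tune = PROCESSOR_ALDERLAKE;     /* -mtune=alderlake */
  unsigned long long tune_mask = 1ULL << tune;

  /* Prints 0: gather instructions are not used when tuning for
     Alder Lake, matching the X86_TUNE_USE_GATHER change below.  */
  printf ("use_gather: %d\n", (int) !!(use_gather_mask & tune_mask));
  return 0;
}

This also shows why removing m_ALDERLAKE from m_CORE_AVX2 forces the explicit
"| m_ALDERLAKE" additions throughout x86-tune.def: knobs that previously
covered Alder Lake via m_CORE_AVX2 (through m_CORE_ALL and friends) now have
to name it directly.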
  

Comments

Uros Bizjak Nov. 10, 2021, 12:25 p.m. UTC | #1
On Wed, Nov 10, 2021 at 10:09 AM Cui,Lili <lili.cui@intel.com> wrote:
>
> Hi Uros,
>
> This patch updates -mtune for Alder Lake.
>
> Bootstrap is OK, and there are no regressions in the i386/x86-64 testsuite.
>
> OK for master?

OK.

Thanks,
Uros.


Patch

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index e7a3bd4aaea..a8cc0664f11 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -131,7 +131,7 @@  along with GCC; see the file COPYING3.  If not see
 		       | m_ICELAKE_CLIENT | m_ICELAKE_SERVER | m_CASCADELAKE \
 		       | m_TIGERLAKE | m_COOPERLAKE | m_SAPPHIRERAPIDS \
 		       | m_ROCKETLAKE)
-#define m_CORE_AVX2 (m_HASWELL | m_SKYLAKE | m_ALDERLAKE | m_CORE_AVX512)
+#define m_CORE_AVX2 (m_HASWELL | m_SKYLAKE | m_CORE_AVX512)
 #define m_CORE_ALL (m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_CORE_AVX2)
 #define m_GOLDMONT (HOST_WIDE_INT_1U<<PROCESSOR_GOLDMONT)
 #define m_GOLDMONT_PLUS (HOST_WIDE_INT_1U<<PROCESSOR_GOLDMONT_PLUS)
@@ -736,7 +736,7 @@  static const struct processor_costs *processor_cost_table[] =
   &icelake_cost,
   &skylake_cost,
   &icelake_cost,
-  &icelake_cost,
+  &alderlake_cost,
   &icelake_cost,
   &intel_cost,
   &geode_cost,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e94efdf39fb..73c4d5115bb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17014,6 +17014,7 @@  ix86_sched_init_global (FILE *, int, int)
     case PROCESSOR_SANDYBRIDGE:
     case PROCESSOR_HASWELL:
     case PROCESSOR_TREMONT:
+    case PROCESSOR_ALDERLAKE:
     case PROCESSOR_GENERIC:
       /* Do not perform multipass scheduling for pre-reload schedule
          to save compile time.  */
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 93644be9cb3..dd5563d2e64 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2070,6 +2070,126 @@  struct processor_costs icelake_cost = {
   "16",					/* Func alignment.  */
 };
 
+/* alderlake_cost should produce code tuned for alderlake family of CPUs.  */
+static stringop_algs alderlake_memcpy[2] = {
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+    {256, loop, false},
+    {-1, libcall, false}}},
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+    {256, loop, false},
+    {-1, libcall, false}}}};
+static stringop_algs alderlake_memset[2] = {
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+    {256, loop, false},
+    {-1, libcall, false}}},
+  {libcall,
+   {{256, rep_prefix_1_byte, true},
+    {256, loop, false},
+    {-1, libcall, false}}}};
+static const
+struct processor_costs alderlake_cost = {
+  {
+  /* Start of register allocator costs.  integer->integer move cost is 2.  */
+  6,				     /* cost for loading QImode using movzbl */
+  {6, 6, 6},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {6, 6, 6},				/* cost of storing integer registers */
+  4,					/* cost of reg,reg fld/fst */
+  {6, 6, 12},				/* cost of loading fp registers
+					   in SFmode, DFmode and XFmode */
+  {6, 6, 12},				/* cost of storing fp registers
+					   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {6, 6},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {6, 6},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2, 3, 4,				/* cost of moving XMM,YMM,ZMM register */
+  {6, 6, 6, 10, 15},			/* cost of loading SSE registers
+					   in 32,64,128,256 and 512-bit */
+  {6, 6, 6, 10, 15},			/* cost of storing SSE registers
+					   in 32,64,128,256 and 512-bit */
+  6, 6,				/* SSE->integer and integer->SSE moves */
+  6, 6,				/* mask->integer and integer->mask moves */
+  {6, 6, 6},				/* cost of loading mask register
+					   in QImode, HImode, SImode.  */
+  {6, 6, 6},			/* cost if storing mask register
+					   in QImode, HImode, SImode.  */
+  2,					/* cost of moving mask register.  */
+  /* End of register allocator costs.  */
+  },
+
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (1) + 1,		/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (3),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (3),			/*				 SI */
+   COSTS_N_INSNS (4),			/*				 DI */
+   COSTS_N_INSNS (4)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (16),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (22),			/*			    HI */
+   COSTS_N_INSNS (30),			/*			    SI */
+   COSTS_N_INSNS (74),			/*			    DI */
+   COSTS_N_INSNS (74)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  17,					/* MOVE_RATIO */
+  17,					/* CLEAR_RATIO */
+  {6, 6, 6},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {6, 6, 6},				/* cost of storing integer registers */
+  {6, 6, 6, 10, 15},			/* cost of loading SSE register
+					   in 32bit, 64bit, 128bit, 256bit and 512bit */
+  {6, 6, 6, 10, 15},			/* cost of storing SSE register
+					   in 32bit, 64bit, 128bit, 256bit and 512bit */
+  {6, 6, 6, 10, 15},			/* cost of unaligned loads.  */
+  {6, 6, 6, 10, 15},			/* cost of unaligned storess.  */
+  2, 3, 4,				/* cost of moving XMM,YMM,ZMM register */
+  6,					/* cost of moving SSE register to integer.  */
+  18, 6,				/* Gather load static, per_elt.  */
+  18, 6,				/* Gather store static, per_elt.  */
+  32,					/* size of l1 cache.  */
+  512,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  6,					/* number of parallel prefetches */
+  3,					/* Branch cost */
+  COSTS_N_INSNS (3),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (17),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (14),			/* cost of FSQRT instruction.  */
+
+  COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
+  COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
+  COSTS_N_INSNS (4),			/* cost of MULSS instruction.  */
+  COSTS_N_INSNS (5),			/* cost of MULSD instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SS instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SD instruction.  */
+  COSTS_N_INSNS (13),			/* cost of DIVSS instruction.  */
+  COSTS_N_INSNS (17),			/* cost of DIVSD instruction.  */
+  COSTS_N_INSNS (14),			/* cost of SQRTSS instruction.  */
+  COSTS_N_INSNS (18),			/* cost of SQRTSD instruction.  */
+  1, 4, 3, 3,				/* reassoc int, fp, vec_int, vec_fp.  */
+  alderlake_memcpy,
+  alderlake_memset,
+  COSTS_N_INSNS (4),			/* cond_taken_branch_cost.  */
+  COSTS_N_INSNS (2),			/* cond_not_taken_branch_cost.  */
+  "16:11:8",				/* Loop alignment.  */
+  "16:11:8",				/* Jump alignment.  */
+  "0:0:8",				/* Label alignment.  */
+  "16",					/* Func alignment.  */
+};
+
   /* BTVER1 has optimized REP instruction for medium sized blocks, but for
      very small blocks it is better to use loop. For large blocks, libcall can
      do nontemporary accesses and beat inline considerably.  */
diff --git a/gcc/config/i386/x86-tune-sched.c b/gcc/config/i386/x86-tune-sched.c
index 56ada99a450..0c149a09531 100644
--- a/gcc/config/i386/x86-tune-sched.c
+++ b/gcc/config/i386/x86-tune-sched.c
@@ -72,6 +72,7 @@  ix86_issue_rate (void)
     case PROCESSOR_SANDYBRIDGE:
     case PROCESSOR_HASWELL:
     case PROCESSOR_TREMONT:
+    case PROCESSOR_ALDERLAKE:
     case PROCESSOR_GENERIC:
       return 4;
 
@@ -431,6 +432,7 @@  ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
     case PROCESSOR_SANDYBRIDGE:
     case PROCESSOR_HASWELL:
     case PROCESSOR_TREMONT:
+    case PROCESSOR_ALDERLAKE:
     case PROCESSOR_GENERIC:
       /* Stack engine allows to execute push&pop instructions in parall.  */
       if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 58e8ead56b4..4ae0b569841 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -42,7 +42,7 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
           m_PENT | m_LAKEMONT | m_PPRO | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	  | m_INTEL | m_KNL | m_KNM | m_K6_GEODE | m_AMD_MULTIPLE | m_GOLDMONT
-	  | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)
+	  | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE |m_GENERIC)
 
 /* X86_TUNE_PARTIAL_REG_DEPENDENCY: Enable more register renaming
    on modern chips.  Preffer stores affecting whole integer register
@@ -51,7 +51,7 @@  DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
 DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
           m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_CORE_AVX2
 	  | m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_INTEL
-	  | m_KNL | m_KNM | m_AMD_MULTIPLE | m_TREMONT
+	  | m_KNL | m_KNM | m_AMD_MULTIPLE | m_TREMONT | m_ALDERLAKE
 	  | m_GENERIC)
 
 /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: This knob promotes all store
@@ -62,7 +62,7 @@  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
    that can be partly masked by careful scheduling of moves.  */
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-	  | m_BDVER | m_ZNVER | m_TREMONT | m_GENERIC)
+	  | m_BDVER | m_ZNVER | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY: This knob avoids
    partial write to the destination in scalar SSE conversion from FP
@@ -70,14 +70,14 @@  DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency",
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY,
 	  "sse_partial_reg_fp_converts_dependency",
 	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-	  | m_BDVER | m_ZNVER | m_GENERIC)
+	  | m_BDVER | m_ZNVER | m_ALDERLAKE| m_GENERIC)
 
 /* X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY: This knob avoids partial
    write to the destination in scalar SSE conversion from integer to FP.  */
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY,
 	  "sse_partial_reg_converts_dependency",
 	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-	  | m_BDVER | m_ZNVER | m_GENERIC)
+	  | m_BDVER | m_ZNVER | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
    are resolved on SSE register parts instead of whole registers, so we may
@@ -103,14 +103,14 @@  DEF_TUNE (X86_TUNE_MOVX, "movx",
           m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
 	  | m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_KNL | m_KNM | m_INTEL
 	  | m_GOLDMONT_PLUS | m_GEODE | m_AMD_MULTIPLE
-	  | m_CORE_AVX2 | m_TREMONT | m_GENERIC)
+	  | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
    full sized loads.  */
 DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
           m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL
 	  | m_KNL | m_KNM | m_GOLDMONT | m_GOLDMONT_PLUS | m_AMD_MULTIPLE
-	  | m_TREMONT | m_GENERIC)
+	  | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_32: Fuse compare with a subsequent
    conditional jump instruction for 32 bit TARGET.  */
@@ -166,14 +166,14 @@  DEF_TUNE (X86_TUNE_EPILOGUE_USING_MOVE, "epilogue_using_move",
 /* X86_TUNE_USE_LEAVE: Use "leave" instruction in epilogues where it fits.  */
 DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
 	  m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_TREMONT
-	  | m_GENERIC)
+	  | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_PUSH_MEMORY: Enable generation of "push mem" instructions.
    Some chips, like 486 and Pentium works faster with separate load
    and push instructions.  */
 DEF_TUNE (X86_TUNE_PUSH_MEMORY, "push_memory",
           m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE
-          | m_TREMONT | m_GENERIC)
+	  | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_SINGLE_PUSH: Enable if single push insn is preferred
    over esp subtraction.  */
@@ -243,14 +243,14 @@  DEF_TUNE (X86_TUNE_READ_MODIFY, "read_modify", ~(m_PENT | m_LAKEMONT | m_PPRO))
 DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec",
           ~(m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
 	    | m_BONNELL | m_SILVERMONT | m_INTEL |  m_KNL | m_KNM | m_GOLDMONT
-	    | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC))
+	    | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC))
 
 /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
    for DFmode copies */
 DEF_TUNE (X86_TUNE_INTEGER_DFMODE_MOVES, "integer_dfmode_moves",
           ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	    | m_KNL | m_KNM | m_INTEL | m_GEODE | m_AMD_MULTIPLE | m_GOLDMONT
-	    | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC))
+	    | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC))
 
 /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag
    will impact LEA instruction selection. */
@@ -298,14 +298,14 @@  DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB,
 DEF_TUNE (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES,
 	  "misaligned_move_string_pro_epilogues",
 	  m_386 | m_486 | m_CORE_ALL | m_AMD_MULTIPLE | m_TREMONT
-	  | m_GENERIC)
+	  | m_ALDERLAKE |m_GENERIC)
 
 /* X86_TUNE_USE_SAHF: Controls use of SAHF.  */
 DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	  | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER
 	  | m_BTVER | m_ZNVER | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
-	  | m_GENERIC)
+	  | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO instructions.  */
 DEF_TUNE (X86_TUNE_USE_CLTD, "use_cltd",
@@ -316,12 +316,12 @@  DEF_TUNE (X86_TUNE_USE_CLTD, "use_cltd",
 DEF_TUNE (X86_TUNE_USE_BT, "use_bt",
           m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
 	  | m_LAKEMONT | m_AMD_MULTIPLE | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_GENERIC)
+	  | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
    for bit-manipulation instructions.  */
 DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
-	  m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_GENERIC)
+	  m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
    on hardware capabilities. Bdver3 hardware has a loop buffer which makes
@@ -333,11 +333,11 @@  DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
    if-converted sequence to one.  */
 DEF_TUNE (X86_TUNE_ONE_IF_CONV_INSN, "one_if_conv_insn",
 	  m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_CORE_ALL | m_GOLDMONT
-	  | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)
+	  | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_AVOID_MFENCE: Use lock prefixed instructions instead of mfence.  */
 DEF_TUNE (X86_TUNE_AVOID_MFENCE, "avoid_mfence",
-	 m_CORE_ALL | m_BDVER | m_ZNVER | m_TREMONT | m_GENERIC)
+	 m_CORE_ALL | m_BDVER | m_ZNVER | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_EXPAND_ABS: This enables a new abs pattern by
    generating instructions for abs (x) = (((signed) x >> (W-1) ^ x) -
@@ -361,7 +361,8 @@  DEF_TUNE (X86_TUNE_USE_HIMODE_FIOP, "use_himode_fiop",
 DEF_TUNE (X86_TUNE_USE_SIMODE_FIOP, "use_simode_fiop",
           ~(m_PENT | m_LAKEMONT | m_PPRO | m_CORE_ALL | m_BONNELL
 	    | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_AMD_MULTIPLE
-	    | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC))
+	    | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE
+	    | m_GENERIC))
 
 /* X86_TUNE_USE_FFREEP: Use freep instruction instead of fstp.  */
 DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE)
@@ -370,7 +371,7 @@  DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE)
 DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_constants",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	  | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_ATHLON_K8 | m_GOLDMONT
-	  | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)
+	  | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /*****************************************************************************/
 /* SSE instruction selection tuning                                          */
@@ -385,15 +386,15 @@  DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill",
    of a sequence loading registers by parts.  */
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
-	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC)
+	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE
+	  | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL: Use movups for misaligned stores
    instead of a sequence loading registers by parts.  */
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
 	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_BDVER | m_ZNVER | m_GENERIC)
+	  | m_TREMONT | m_ALDERLAKE | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL: Use packed single
    precision 128bit instructions instead of double where possible.   */
@@ -402,13 +403,13 @@  DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optim
 
 /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores.   */
 DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
-	  m_AMD_MULTIPLE | m_CORE_ALL | m_TREMONT | m_GENERIC)
+	  m_AMD_MULTIPLE | m_CORE_ALL | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_SSE_LOAD0_BY_PXOR: Always use pxor to load0 as opposed to
    xorps/xorpd and other variants.  */
 DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
 	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER
-	  | m_TREMONT | m_GENERIC)
+	  | m_TREMONT | m_ALDERLAKE | m_GENERIC)
 
 /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC: Enable moves in from integer
    to SSE registers.  If disabled, the moves will be done by storing
@@ -454,11 +455,12 @@  DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
 
 /* X86_TUNE_AVOID_4BYTE_PREFIXES: Avoid instructions requiring 4+ bytes of prefixes.  */
 DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
-	  m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_INTEL)
+	  m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE
+	  | m_INTEL)
 
 /* X86_TUNE_USE_GATHER: Use gather instructions.  */
 DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
-	  ~(m_ZNVER1 | m_ZNVER2 | m_GENERIC))
+	  ~(m_ZNVER1 | m_ZNVER2 | m_GENERIC | m_ALDERLAKE))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
    smaller FMA chain.  */
@@ -503,12 +505,12 @@  DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512)
 /* X86_TUNE_AVX256_MOVE_BY_PIECES: Optimize move_by_pieces with 256-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_MOVE_BY_PIECES, "avx256_move_by_pieces",
-	  m_ALDERLAKE | m_CORE_AVX512)
+	  m_CORE_AVX512)
 
 /* X86_TUNE_AVX256_STORE_BY_PIECES: Optimize store_by_pieces with 256-bit
    AVX instructions.  */
 DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces",
-	  m_ALDERLAKE | m_CORE_AVX512)
+	  m_CORE_AVX512)
 
 /*****************************************************************************/
 /*****************************************************************************/