From patchwork Mon Nov 1 16:36:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Philipp Tomsich X-Patchwork-Id: 46913 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A0703385781D for ; Mon, 1 Nov 2021 16:37:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lj1-x22d.google.com (mail-lj1-x22d.google.com [IPv6:2a00:1450:4864:20::22d]) by sourceware.org (Postfix) with ESMTPS id D940D3858432 for ; Mon, 1 Nov 2021 16:36:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D940D3858432 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-lj1-x22d.google.com with SMTP id h11so30382355ljk.1 for ; Mon, 01 Nov 2021 09:36:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull-eu.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=iq//qNAuu6Mc0EV77SJTUaLmztByb/DvSn8JR0YEOco=; b=Zz7tNVsT3q7+Qv08xzZRRfAy/0aEfqlTCOCAppD7LIfx3LOHzfWbVVHPoRQp92O6Zk 5rRtEUj7RUPWXWQQwv6tsZT4lwueIRLpvB8BVgFwn+n5F3/b4O5hF4mXqoGA+1q7jI+F 3+lGEjkigc9IcfRCfJR15Etk1Mb2HWfZ6+agq8g2Mi501bOumwJOz0tRmFNGGgeeWp9E LiRVF3S4XFknr4m0sab95ZYiPzfYYge5MQYVaBmHGppjYfchcml6IyY3DAei0cznldHQ RNoDa5JwAYozMXppIyhK98wRxLKTmgoUSYSkIf7lFZmFoDhiycjBjoRQIh4WTm2jzfU1 xN4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=iq//qNAuu6Mc0EV77SJTUaLmztByb/DvSn8JR0YEOco=; b=0Q1wHpzW/WNR6421eI9k4hhJzzqDQ57NjtsmvjRcIL5lmIQeLwcGcuzDxQ6ML/2iBv O1NHwrbxLizz2xTOIsy2zUi/pRMx1AsMaDf8GM42vlhyza9vXyLZc4pF8uMvHGdOSRdB ogBqrZcLf4PexNG0a0MLYWMMWQGQ/uDTslTS0Ly9awqjlFwtOAkz9CuzQTw8xXWJACpQ Xeeyqz1sY4Q6MH85PJiXQj1VJBrF+tbDblFRyYeJZSaRIDlpyuBvmFc8eg7dkoy+F689 laEw2mlTEA8w1eoGDGVPJa7XneuC2Cii9mrJCuQe8JnLT+NyN8BCH6rrTi5OZN8SX83j /mfQ== X-Gm-Message-State: AOAM533dkHXtmGjMS5qS95wXSN3vDdc/kHuMuyTddI5KFu01bFu/Q7ZP 9SXk33IfbjCQWrtrP28U/dPApmO2Ii5qIoAI X-Google-Smtp-Source: ABdhPJzDVnThBbRX8jJvQYtJYPnbG1FDfD/b6fv8HMYfG6eqifAeD59S88j80LWjntMG8YMqMcg5mA== X-Received: by 2002:a2e:b550:: with SMTP id a16mr31327670ljn.268.1635784603414; Mon, 01 Nov 2021 09:36:43 -0700 (PDT) Received: from ubuntu-focal.. ([2a01:4f9:3a:1e26::2]) by smtp.gmail.com with ESMTPSA id o7sm1576471ljj.41.2021.11.01.09.36.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Nov 2021 09:36:43 -0700 (PDT) From: Philipp Tomsich To: gcc-patches@gcc.gnu.org Subject: [PATCH v1] aarch64: enable Ampere-1 CPU Date: Mon, 1 Nov 2021 17:36:25 +0100 Message-Id: <20211101163625.6795-1-philipp.tomsich@vrull.eu> X-Mailer: git-send-email 2.32.0 MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Smith , JiangNing Liu , Philipp Tomsich Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" This adds support and a basic turning model for the Ampere Computing "Ampere-1" CPU. The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is modelled as a 4-wide issue (as with all modern micro-architectures, the chosen issue rate is a compromise between the maximum dispatch rate and the maximum rate of uops issued to the scheduler). This adds the -mcpu=ampere1 command-line option and the relevant cost information/tuning tables for the Ampere-1. gcc/ChangeLog: * config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1 core. * config/aarch64/aarch64-tune.md: Regenerate. * config/aarch64/aarch64-cost-tables.h: Add extra costs for Ampere-1. * config/aarch64/aarch64.c: Add tuning structures for Ampere-1. --- gcc/config/aarch64/aarch64-cores.def | 3 +- gcc/config/aarch64/aarch64-cost-tables.h | 107 +++++++++++++++++++++++ gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/config/aarch64/aarch64.c | 78 +++++++++++++++++ gcc/doc/invoke.texi | 2 +- 5 files changed, 189 insertions(+), 3 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 77da31084de..617cde42fba 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -68,7 +68,8 @@ AARCH64_CORE("octeontx83", octeontxt83, thunderx, 8A, AARCH64_FL_FOR_ARCH AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a2, -1) AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a3, -1) -/* Ampere Computing cores. */ +/* Ampere Computing ('\xC0') cores. */ +AARCH64_CORE("ampere1", ampere1, cortexa57, 8_6A, AARCH64_FL_FOR_ARCH8_6, ampere1, 0xC0, 0xac3, -1) /* Do not swap around "emag" and "xgene1", this order is required to handle variant correctly. */ AARCH64_CORE("emag", emag, xgene1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, emag, 0x50, 0x000, 3) diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index bb499a1eae6..e6ded65b67d 100644 --- a/gcc/config/aarch64/aarch64-cost-tables.h +++ b/gcc/config/aarch64/aarch64-cost-tables.h @@ -668,4 +668,111 @@ const struct cpu_cost_table a64fx_extra_costs = } }; +const struct cpu_cost_table ampere1_extra_costs = +{ + /* ALU */ + { + 0, /* arith. */ + 0, /* logical. */ + 0, /* shift. */ + COSTS_N_INSNS (1), /* shift_reg. */ + 0, /* arith_shift. */ + COSTS_N_INSNS (1), /* arith_shift_reg. */ + 0, /* log_shift. */ + COSTS_N_INSNS (1), /* log_shift_reg. */ + 0, /* extend. */ + COSTS_N_INSNS (1), /* extend_arith. */ + 0, /* bfi. */ + 0, /* bfx. */ + 0, /* clz. */ + 0, /* rev. */ + 0, /* non_exec. */ + true /* non_exec_costs_exec. */ + }, + { + /* MULT SImode */ + { + COSTS_N_INSNS (3), /* simple. */ + COSTS_N_INSNS (3), /* flag_setting. */ + COSTS_N_INSNS (3), /* extend. */ + COSTS_N_INSNS (4), /* add. */ + COSTS_N_INSNS (4), /* extend_add. */ + COSTS_N_INSNS (18) /* idiv. */ + }, + /* MULT DImode */ + { + COSTS_N_INSNS (3), /* simple. */ + 0, /* flag_setting (N/A). */ + COSTS_N_INSNS (3), /* extend. */ + COSTS_N_INSNS (4), /* add. */ + COSTS_N_INSNS (4), /* extend_add. */ + COSTS_N_INSNS (34) /* idiv. */ + } + }, + /* LD/ST */ + { + COSTS_N_INSNS (4), /* load. */ + COSTS_N_INSNS (4), /* load_sign_extend. */ + 0, /* ldrd (n/a). */ + 0, /* ldm_1st. */ + 0, /* ldm_regs_per_insn_1st. */ + 0, /* ldm_regs_per_insn_subsequent. */ + COSTS_N_INSNS (5), /* loadf. */ + COSTS_N_INSNS (5), /* loadd. */ + COSTS_N_INSNS (5), /* load_unaligned. */ + 0, /* store. */ + 0, /* strd. */ + 0, /* stm_1st. */ + 0, /* stm_regs_per_insn_1st. */ + 0, /* stm_regs_per_insn_subsequent. */ + COSTS_N_INSNS (2), /* storef. */ + COSTS_N_INSNS (2), /* stored. */ + COSTS_N_INSNS (2), /* store_unaligned. */ + COSTS_N_INSNS (3), /* loadv. */ + COSTS_N_INSNS (3) /* storev. */ + }, + { + /* FP SFmode */ + { + COSTS_N_INSNS (25), /* div. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (4), /* mult_addsub. */ + COSTS_N_INSNS (4), /* fma. */ + COSTS_N_INSNS (4), /* addsub. */ + COSTS_N_INSNS (2), /* fpconst. */ + COSTS_N_INSNS (4), /* neg. */ + COSTS_N_INSNS (4), /* compare. */ + COSTS_N_INSNS (4), /* widen. */ + COSTS_N_INSNS (4), /* narrow. */ + COSTS_N_INSNS (4), /* toint. */ + COSTS_N_INSNS (4), /* fromint. */ + COSTS_N_INSNS (4) /* roundint. */ + }, + /* FP DFmode */ + { + COSTS_N_INSNS (34), /* div. */ + COSTS_N_INSNS (5), /* mult. */ + COSTS_N_INSNS (5), /* mult_addsub. */ + COSTS_N_INSNS (5), /* fma. */ + COSTS_N_INSNS (5), /* addsub. */ + COSTS_N_INSNS (2), /* fpconst. */ + COSTS_N_INSNS (5), /* neg. */ + COSTS_N_INSNS (5), /* compare. */ + COSTS_N_INSNS (5), /* widen. */ + COSTS_N_INSNS (5), /* narrow. */ + COSTS_N_INSNS (6), /* toint. */ + COSTS_N_INSNS (6), /* fromint. */ + COSTS_N_INSNS (5) /* roundint. */ + } + }, + /* Vector */ + { + COSTS_N_INSNS (3), /* alu. */ + COSTS_N_INSNS (3), /* mult. */ + COSTS_N_INSNS (2), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ + } +}; + #endif diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index 12be9134bab..3eed700c063 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2" + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fd9249c62b3..c5415999984 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1067,6 +1067,43 @@ static const struct cpu_vector_cost thunderx3t110_vector_cost = nullptr /* issue_info */ }; +static const advsimd_vec_cost ampere1_advsimd_vector_cost = +{ + 3, /* int_stmt_cost */ + 3, /* fp_stmt_cost */ + 0, /* ld2_st2_permute_cost */ + 0, /* ld3_st3_permute_cost */ + 0, /* ld4_st4_permute_cost */ + 2, /* permute_cost */ + 12, /* reduc_i8_cost */ + 9, /* reduc_i16_cost */ + 6, /* reduc_i32_cost */ + 5, /* reduc_i64_cost */ + 9, /* reduc_f16_cost */ + 6, /* reduc_f32_cost */ + 5, /* reduc_f64_cost */ + 8, /* store_elt_extra_cost */ + 6, /* vec_to_scalar_cost */ + 7, /* scalar_to_vec_cost */ + 5, /* align_load_cost */ + 5, /* unalign_load_cost */ + 2, /* unalign_store_cost */ + 2 /* store_cost */ +}; + +/* Ampere-1 costs for vector insn classes. */ +static const struct cpu_vector_cost ampere1_vector_cost = +{ + 1, /* scalar_int_stmt_cost */ + 1, /* scalar_fp_stmt_cost */ + 4, /* scalar_load_cost */ + 1, /* scalar_store_cost */ + 1, /* cond_taken_branch_cost */ + 1, /* cond_not_taken_branch_cost */ + &ere1_advsimd_vector_cost, /* advsimd */ + nullptr, /* sve */ + nullptr /* issue_info */ +}; /* Generic costs for branch instructions. */ static const struct cpu_branch_cost generic_branch_cost = @@ -1210,6 +1247,17 @@ static const cpu_prefetch_tune a64fx_prefetch_tune = -1 /* default_opt_level */ }; +static const cpu_prefetch_tune ampere1_prefetch_tune = +{ + 0, /* num_slots */ + 64, /* l1_cache_size */ + 64, /* l1_cache_line_size */ + 2048, /* l2_cache_size */ + true, /* prefetch_dynamic_strides */ + -1, /* minimum_stride */ + -1 /* default_opt_level */ +}; + static const struct tune_params generic_tunings = { &cortexa57_extra_costs, @@ -1670,6 +1718,36 @@ static const struct tune_params neoversen1_tunings = &generic_prefetch_tune }; +static const struct tune_params ampere1_tunings = +{ + &ere1_extra_costs, + &generic_addrcost_table, + &generic_regmove_cost, + &ere1_vector_cost, + &generic_branch_cost, + &generic_approx_modes, + SVE_NOT_IMPLEMENTED, /* sve_width */ + 4, /* memmov_cost */ + 4, /* issue_rate */ + (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC | + AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK | + AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ | + AARCH64_FUSE_CMP_BRANCH), + /* fusible_ops */ + "32", /* function_align. */ + "4", /* jump_align. */ + "32:16", /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 2, /* vec_reassoc_width. */ + 2, /* min_div_recip_mul_sf. */ + 2, /* min_div_recip_mul_df. */ + 0, /* max_case_values. */ + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ + (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + &ere1_prefetch_tune +}; + static const advsimd_vec_cost neoversev1_advsimd_vector_cost = { 2, /* int_stmt_cost */ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c5730228821..9fb74d34920 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -18737,7 +18737,7 @@ performance of the code. Permissible values for this option are: @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53}, @samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55}, @samp{cortex-r82}, @samp{cortex-x1}, @samp{cortex-x2}, -@samp{cortex-a510}, @samp{cortex-a710}, @samp{native}. +@samp{cortex-a510}, @samp{cortex-a710}, @samp{ampere1}, @samp{native}. The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}, @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},