From patchwork Thu Nov 30 08:27:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Di Zhao OS
X-Patchwork-Id: 81003
From: Di Zhao OS
To: "gcc-patches@gcc.gnu.org"
CC: Philipp Tomsich
Subject: [PATCH] aarch64: modify Ampere CPU tunings on reassociation/FMA
Date: Thu, 30 Nov 2023 08:27:33 +0000

This patch modifies the tunings for ampere1/ampere1a/ampere1b to:

1. Allow reassociation of FP additions.
2. Avoid generating loop-dependent FMA chains.  A tuning option is
   added for this.

Bootstrapped and tested.  Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNING_OPTION): New tuning option to avoid
	cross-loop FMA.
	* config/aarch64/aarch64.cc
	(aarch64_override_options_internal): Set
	param_avoid_fma_max_bits according to tuning option.
	* config/aarch64/tuning_models/ampere1.h: Modify tunings
	related to FMA.
	* config/aarch64/tuning_models/ampere1a.h: Modify tunings
	related to FMA.
	* config/aarch64/tuning_models/ampere1b.h: Modify tunings
	related to FMA.
---
 gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
 gcc/config/aarch64/aarch64.cc               | 6 ++++++
 gcc/config/aarch64/tuning_models/ampere1.h  | 2 +-
 gcc/config/aarch64/tuning_models/ampere1a.h | 4 ++--
 gcc/config/aarch64/tuning_models/ampere1b.h | 5 +++--
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 774568e9106..f28a73839a6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -47,4 +47,6 @@ AARCH64_EXTRA_TUNING_OPTION ("use_new_vector_costs", USE_NEW_VECTOR_COSTS)
 
 AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput", MATCHED_VECTOR_THROUGHPUT)
 
+AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 64684258b7b..28bc70a787f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16083,6 +16083,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
       && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
     opts->x_flag_prefetch_loop_arrays = 1;
 
+  /* Avoid loop-dependent FMA chains. */
+  if (aarch64_tune_params.extra_tuning_flags
+      & AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA)
+    SET_OPTION_IF_UNSET (opts, &global_options_set, param_avoid_fma_max_bits,
+                         512);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
index 8d2a1c69610..a144e8f94b3 100644
--- a/gcc/config/aarch64/tuning_models/ampere1.h
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -104,7 +104,7 @@ static const struct tune_params ampere1_tunings =
   2, /* min_div_recip_mul_df. */
   0, /* max_case_values. */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
   &ampere1_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
   AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
index c419ffb3c1a..f688ed08a79 100644
--- a/gcc/config/aarch64/tuning_models/ampere1a.h
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -50,13 +50,13 @@ static const struct tune_params ampere1a_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
-  1, /* fma_reassoc_width. */
+  4, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
   0, /* max_case_values. */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
   &ampere1_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
   AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h b/gcc/config/aarch64/tuning_models/ampere1b.h
index c4928f50d29..a98b6a980f7 100644
--- a/gcc/config/aarch64/tuning_models/ampere1b.h
+++ b/gcc/config/aarch64/tuning_models/ampere1b.h
@@ -99,13 +99,14 @@ static const struct tune_params ampere1b_tunings =
   "32:16", /* loop_align. */
   2, /* int_reassoc_width. */
   4, /* fp_reassoc_width. */
-  1, /* fma_reassoc_width. */
+  4, /* fma_reassoc_width. */
   2, /* vec_reassoc_width. */
   2, /* min_div_recip_mul_sf. */
   2, /* min_div_recip_mul_df. */
   0, /* max_case_values. */
   tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */
-  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND |
+   AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */
   &ampere1b_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */
   AARCH64_LDP_STP_POLICY_ALIGNED /* stp_policy_model. */
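
For readers unfamiliar with the two effects named in the description, here is a minimal C sketch of the kind of loop they concern. It is an illustration only and not part of the patch or its testsuite; the function names dot_serial and dot_tree, and the assumption that n is a multiple of 4, are hypothetical. The first function writes the additions as one serial chain that runs through the loop-carried accumulator, so if every multiply-add is contracted into an FMA, each FMA waits on the previous one. The second writes the reassociated shape by hand: the products are summed as a tree and only one addition touches the accumulator per iteration. A compiler may only make this kind of rewrite for floating point when reassociation is permitted (for example under -ffast-math), because it changes rounding order.

```c
#include <stddef.h>

/* Serial form: acc is threaded through all four additions, so every
   addition in the iteration sits on the loop-carried dependency.
   Assumes n is a multiple of 4 (illustrative simplification).  */
double dot_serial(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i += 4)
        acc = (((acc + a[i] * b[i])
                     + a[i + 1] * b[i + 1])
                     + a[i + 2] * b[i + 2])
                     + a[i + 3] * b[i + 3];
    return acc;
}

/* Reassociated form, written by hand: the four products are combined
   independently of acc, and only the final addition depends on the
   previous iteration.  Assumes n is a multiple of 4.  */
double dot_tree(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i += 4) {
        double lo = a[i] * b[i] + a[i + 1] * b[i + 1];
        double hi = a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
        acc = acc + (lo + hi);
    }
    return acc;
}
```

With small integer-valued inputs both functions return the same value, since every intermediate result is exact; with general inputs the results can differ in the last bits, which is the usual cost of FP reassociation.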