| Message ID | PAWPR08MB89824348B31E96B4F6432A3C833E9@PAWPR08MB8982.eurprd08.prod.outlook.com |
|---|---|
| State | Committed |
| Headers |
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [PATCH] AArch64: Add fma_reassoc_width [PR107413]
Thread-Topic: [PATCH] AArch64: Add fma_reassoc_width [PR107413]
Thread-Index: AQHY9DgezXT8hXnv2EGnjnt4gJ97GQ==
Date: Wed, 9 Nov 2022 12:40:05 +0000
Message-ID:
<PAWPR08MB89824348B31E96B4F6432A3C833E9@PAWPR08MB8982.eurprd08.prod.outlook.com>
From: Wilco Dijkstra via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
| Series |
AArch64: Add fma_reassoc_width [PR107413]
|
|
Commit Message
Wilco Dijkstra
Nov. 9, 2022, 12:40 p.m. UTC
Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%.
Passes regress/bootstrap, OK for commit?
gcc/
PR 107413
* config/aarch64/aarch64.cc (struct tune_params): Add
fma_reassoc_width to all CPU tuning structures.
* config/aarch64/aarch64-protos.h (struct tune_params): Add
fma_reassoc_width.
---
Comments
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
> existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/
>         PR 107413
>         * config/aarch64/aarch64.cc (struct tune_params): Add
>         fma_reassoc_width to all CPU tuning structures.
>         * config/aarch64/aarch64-protos.h (struct tune_params): Add
>         fma_reassoc_width.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a73bfa20acb9b92ae0475794c3f11c67d22feb97..71365a446007c26b906b61ca8b2a68ee06c83037 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -540,6 +540,7 @@ struct tune_params
>    const char *loop_align;
>    int int_reassoc_width;
>    int fp_reassoc_width;
> +  int fma_reassoc_width;
>    int vec_reassoc_width;
>    int min_div_recip_mul_sf;
>    int min_div_recip_mul_df;
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 798363bcc449c414de5bbb4f26b8e1c64a0cf71a..643162cdecd6a8fe5587164cb2d0d62b709a491d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings =
>    "4", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings =
>    "8", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings =
>    "16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings =
>    "16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings =
>    "16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings =
>    "16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings =
>    "16", /* loop_align. */
>    3, /* int_reassoc_width. */
>    2, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings =
>    "16", /* loop_align. */
>    3, /* int_reassoc_width. */
>    2, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings =
>    "32:16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings =
>    "32:16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -2089,6 +2107,7 @@ static const struct tune_params neoversev1_tunings =
>    "32:16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  4, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -2226,6 +2245,7 @@ static const struct tune_params neoverse512tvb_tunings =
>    "32:16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  4, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -2414,6 +2434,7 @@ static const struct tune_params neoversen2_tunings =
>    "32:16", /* loop_align. */
>    2, /* int_reassoc_width. */
>    4, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -2603,6 +2624,7 @@ static const struct tune_params neoversev2_tunings =
>    "32:16", /* loop_align. */
>    3, /* int_reassoc_width. */
>    6, /* fp_reassoc_width. */
> +  4, /* fma_reassoc_width. */
>    3, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -2638,6 +2660,7 @@ static const struct tune_params a64fx_tunings =
>    "32", /* loop_align. */
>    4, /* int_reassoc_width. */
>    2, /* fp_reassoc_width. */
> +  1, /* fma_reassoc_width. */
>    2, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -3350,9 +3373,10 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode)
>      return aarch64_tune_params.vec_reassoc_width;
>    if (INTEGRAL_MODE_P (mode))
>      return aarch64_tune_params.int_reassoc_width;
> -  /* Avoid reassociating floating point addition so we emit more FMAs. */
> -  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
> -    return aarch64_tune_params.fp_reassoc_width;
> +  /* FMA's can have a different reassociation width. */
> +  if (FLOAT_MODE_P (mode))
> +    return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width
> +                            : aarch64_tune_params.fp_reassoc_width;
>    return 1;
>  }

I guess an obvious question is: if 1 (rather than 2) was the right value
for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
pipes? It would be good to clarify how, conceptually, the core property
should map to the fma_reassoc_width value.

It sounds from the existing comment like the main motivation for returning 1
was to encourage more FMAs to be formed, rather than to prevent FMAs from
being reassociated. Is that no longer an issue? Or is the point that,
with more FMA pipes, lower FMA formation is a price worth paying for
the better parallelism we get when FMAs can be formed?

Does this code ever see opc == FMA?

Thanks,
Richard
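For readers following the last hunk, here is a minimal standalone sketch of the decision order the patched hook appears to use. `Opcode`, `ModeClass`, `TuneParams` and `reassociation_width` are simplified, hypothetical stand-ins for GCC's tree codes, machine modes, `tune_params` and `aarch64_reassociation_width`; they are not GCC's real types. The vector check is an assumption, since its condition lies outside the hunk context. The sample values are the ones the patch assigns in the `generic_tunings` and `neoversev1_tunings` hunks.

```cpp
// Hypothetical, simplified stand-ins for GCC's tree codes and machine modes.
enum class Opcode { Plus, Mult, Other };
enum class ModeClass { Vector, Integer, Float, Other };

// Only the four reassociation fields of tune_params, in the patch's order.
struct TuneParams {
  int int_reassoc_width;
  int fp_reassoc_width;
  int fma_reassoc_width;
  int vec_reassoc_width;
};

// Values copied from the generic_tunings and neoversev1_tunings hunks.
const TuneParams kGeneric{2, 4, 1, 1};
const TuneParams kNeoverseV1{2, 4, 4, 2};

// Same decision order as the patched hook: vector, integer, then float,
// where a floating-point PLUS uses the FMA width and every other
// floating-point operation uses the general FP width.
int reassociation_width(const TuneParams &tune, Opcode opc, ModeClass mode) {
  if (mode == ModeClass::Vector)  // assumed condition, not shown in the hunk
    return tune.vec_reassoc_width;
  if (mode == ModeClass::Integer)
    return tune.int_reassoc_width;
  if (mode == ModeClass::Float)
    return opc == Opcode::Plus ? tune.fma_reassoc_width
                               : tune.fp_reassoc_width;
  return 1;
}
```

With these values, a floating-point addition gets width 1 under the generic tuning, which matches the pre-patch behaviour of skipping `PLUS_EXPR`, and width 4 under the Neoverse V1 tuning.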
On Tue, Nov 22, 2022 at 8:59 AM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> > Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
> > existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> > FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%.
> >
> > Passes regress/bootstrap, OK for commit?
> >
> > gcc/
> >         PR 107413
> >         * config/aarch64/aarch64.cc (struct tune_params): Add
> >         fma_reassoc_width to all CPU tuning structures.
> >         * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >         fma_reassoc_width.
> >
> > ---
> >
> > [...]
>
> I guess an obvious question is: if 1 (rather than 2) was the right value
> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
> pipes? It would be good to clarify how, conceptually, the core property
> should map to the fma_reassoc_width value.
>
> It sounds from the existing comment like the main motivation for returning 1
> was to encourage more FMAs to be formed, rather than to prevent FMAs from
> being reassociated. Is that no longer an issue? Or is the point that,
> with more FMA pipes, lower FMA formation is a price worth paying for
> the better parallelism we get when FMAs can be formed?

Would integrating FMA formation with reassoc when reassoc_width > 1
make sense? I think we should be able to add a special oplist optimization
phase to more optimally distribute FMA pairs in the formed tree?

Richard.

> Does this code ever see opc == FMA?
>
> Thanks,
> Richard
Hi Richard,

> I guess an obvious question is: if 1 (rather than 2) was the right value
> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
> pipes?  It would be good to clarify how, conceptually, the core property
> should map to the fma_reassoc_width value.

1 turns off reassociation so that FMAs get properly formed. After reassociation far
fewer FMAs get formed so we end up with more FLOPS which means slower execution.
It's a significant slowdown on cores that are not wide, have only 1 or 2 FP pipes and
may have high FP latencies. So we turn it off by default on all older cores.

> It sounds from the existing comment like the main motivation for returning 1
> was to encourage more FMAs to be formed, rather than to prevent FMAs from
> being reassociated.  Is that no longer an issue?  Or is the point that,
> with more FMA pipes, lower FMA formation is a price worth paying for
> the better parallelism we get when FMAs can be formed?

Exactly. A wide CPU can deal with the extra instructions, so the loss from fewer
FMAs ends up lower than the speedup from the extra parallelism. Having more FMAs
will be even faster of course.

> Does this code ever see opc == FMA?

No, that's the problem, reassociation ignores the fact that we actually want FMAs.

A smart reassociation pass could form more FMAs while also increasing
parallelism, but the way it currently works always results in fewer FMAs.

Cheers,
Wilco
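To make the trade-off described above concrete, here is a small C sketch (not taken from the patch) that evaluates a four-term dot product in two association orders. The function names are hypothetical, and the operation counts in the comments assume the compiler contracts each `x * y + acc` into a single FMA, which depends on the target and on the floating-point contraction setting.

```c
/* Illustrative only: dot4_serial and dot4_width2 are hypothetical names,
   not GCC code.  Operation counts assume each "x * y + acc" is contracted
   into one FMA, which depends on the target and compiler options.  */

/* Reassociation width 1: a single dependency chain.
   1 FMUL + 3 FMA = 4 FP operations, critical path of 4 operations.  */
double
dot4_serial (const double *a, const double *b)
{
  double s = a[0] * b[0];   /* FMUL */
  s = a[1] * b[1] + s;      /* FMA  */
  s = a[2] * b[2] + s;      /* FMA  */
  s = a[3] * b[3] + s;      /* FMA  */
  return s;
}

/* Reassociation width 2: two independent chains joined by a final add.
   2 FMUL + 2 FMA + 1 FADD = 5 FP operations, critical path of 3.  */
double
dot4_width2 (const double *a, const double *b)
{
  double s0 = a[0] * b[0];  /* FMUL */
  double s1 = a[2] * b[2];  /* FMUL, independent of s0 */
  s0 = a[1] * b[1] + s0;    /* FMA  */
  s1 = a[3] * b[3] + s1;    /* FMA, independent of s0 */
  return s0 + s1;           /* FADD */
}
```

The second form issues one more instruction and forms one fewer FMA, but its chains can run in parallel. Whether that is a net win depends on how many FP pipes the core has, which is the property the per-CPU width is meant to capture.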
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> I guess an obvious question is: if 1 (rather than 2) was the right value
>> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
>> pipes?  It would be good to clarify how, conceptually, the core property
>> should map to the fma_reassoc_width value.
>
> 1 turns off reassociation so that FMAs get properly formed. After reassociation far
> fewer FMAs get formed so we end up with more FLOPS which means slower execution.
> It's a significant slowdown on cores that are not wide, have only 1 or 2 FP pipes and
> may have high FP latencies. So we turn it off by default on all older cores.
>
>> It sounds from the existing comment like the main motivation for returning 1
>> was to encourage more FMAs to be formed, rather than to prevent FMAs from
>> being reassociated.  Is that no longer an issue?  Or is the point that,
>> with more FMA pipes, lower FMA formation is a price worth paying for
>> the better parallelism we get when FMAs can be formed?
>
> Exactly. A wide CPU can deal with the extra instructions, so the loss from fewer
> FMAs ends up lower than the speedup from the extra parallelism. Having more FMAs
> will be even faster of course.

Thanks.  It would be good to put this in a comment somewhere, perhaps
above the fma_reassoc_width field.  It isn't obvious from the patch as
posted, and changing the existing comment drops the previous hint about
what was going on.

>
>> Does this code ever see opc == FMA?
>
> No, that's the problem, reassociation ignores the fact that we actually want FMAs.

Yeah, but I was wondering if later code would sometimes query this
hook for existing FMAs, even if that code wasn't the focus of the patch.
Once we add the distinction between FMAs and other ops, it seemed natural
to test for existing FMAs.

But of course, FMA is an rtl code rather than a tree code (oops),
so that was never going to happen.

> A smart reassociation pass could form more FMAs while also increasing
> parallelism, but the way it currently works always results in fewer FMAs.

Yeah, as Richard said, that seems the right long-term fix.
It would also avoid the hack of treating PLUS_EXPR as a signal
of an FMA, which has the drawback of assuming (for 2-FMA cores)
that plain addition never benefits from reassociation in its own right.

Still, I guess the hackiness is pre-existing and the patch is removing
the hackiness for some cores, so from that point of view it's a strict
improvement over the status quo.  And it's too late in the GCC 13
cycle to do FMA reassociation properly.  So I'm OK with the patch
in principle, but could you post an update with more commentary?

Thanks,
Richard
Hi Richard,

>> A smart reassociation pass could form more FMAs while also increasing
>> parallelism, but the way it currently works always results in fewer FMAs.
>
> Yeah, as Richard said, that seems the right long-term fix.
> It would also avoid the hack of treating PLUS_EXPR as a signal
> of an FMA, which has the drawback of assuming (for 2-FMA cores)
> that plain addition never benefits from reassociation in its own right.

True but it's hard to separate them. You will have a mix of FADD and FMAs
to reassociate (since FMA still counts as an add), and the ratio between
them as well as the number of operations may affect the best reassociation
width.

> Still, I guess the hackiness is pre-existing and the patch is removing
> the hackiness for some cores, so from that point of view it's a strict
> improvement over the status quo.  And it's too late in the GCC 13
> cycle to do FMA reassociation properly.  So I'm OK with the patch
> in principle, but could you post an update with more commentary?

Sure, here is an update with a longer comment in aarch64_reassociation_width:


Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
FMA pipes.  This improves SPECFP2017 on Neoverse V1 by ~1.5%.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
	PR 107413
	* config/aarch64/aarch64.cc (struct tune_params): Add
	fma_reassoc_width to all CPU tuning structures.
	(aarch64_reassociation_width): Use fma_reassoc_width.
	* config/aarch64/aarch64-protos.h (struct tune_params): Add
	fma_reassoc_width.
--- diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 238820581c5ee7617f8eed1df2cf5418b1127e19..4be93c93c26e091f878bc8e4cf06e90888405fb2 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -540,6 +540,7 @@ struct tune_params const char *loop_align; int int_reassoc_width; int fp_reassoc_width; + int fma_reassoc_width; int vec_reassoc_width; int min_div_recip_mul_sf; int min_div_recip_mul_df; diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index c91df6f5006c257690aafb75398933d628a970e1..15d478c77ceb2d6c52a70b6ffd8fdadcfa8deba0 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. 
*/ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings = "4", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. 
*/ @@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings = "16", /* loop_align. */ 3, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings = "16", /* loop_align. */ 3, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1949,6 +1967,7 @@ static const struct tune_params ampere1a_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. 
*/ @@ -2126,6 +2145,7 @@ static const struct tune_params neoversev1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2263,6 +2283,7 @@ static const struct tune_params neoverse512tvb_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2451,6 +2472,7 @@ static const struct tune_params neoversen2_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2640,6 +2662,7 @@ static const struct tune_params neoversev2_tunings = "32:16", /* loop_align. */ 3, /* int_reassoc_width. */ 6, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 3, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2675,6 +2698,7 @@ static const struct tune_params a64fx_tunings = "32", /* loop_align. */ 4, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -3387,9 +3411,15 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode) return aarch64_tune_params.vec_reassoc_width; if (INTEGRAL_MODE_P (mode)) return aarch64_tune_params.int_reassoc_width; - /* Avoid reassociating floating point addition so we emit more FMAs. */ - if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR) - return aarch64_tune_params.fp_reassoc_width; + /* Reassociation reduces the number of FMAs which may result in worse + performance. 
Use a per-CPU setting for FMA reassociation which allows + narrow CPUs with few FP pipes to switch it off (value of 1), and wider + CPUs with many FP pipes to enable reassociation. + Since the reassociation pass doesn't understand FMA at all, assume + that any FP addition might turn into FMA. */ + if (FLOAT_MODE_P (mode)) + return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width + : aarch64_tune_params.fp_reassoc_width; return 1; }
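The selection logic in the final hunk above can be modelled outside GCC as a small standalone C sketch. The enums, the struct and the function name below are simplified stand-ins invented for illustration, not the real GCC declarations; the condition guarding the vector case is not visible in the hunk and is assumed here. The two tuning tables copy the values shown for neoversen1 and neoversev1 in the diff.

```c
/* Standalone model of the width selection; all types and names here are
   hypothetical stand-ins, not the real GCC declarations.  */
enum op_code { OP_PLUS, OP_MULT };   /* stand-in for tree codes */
enum mode_class { MODE_VECTOR, MODE_INT, MODE_FLOAT, MODE_OTHER };

struct reassoc_tuning
{
  int int_reassoc_width;
  int fp_reassoc_width;
  int fma_reassoc_width;
  int vec_reassoc_width;
};

/* Values copied from the neoversen1 and neoversev1 hunks of the patch.  */
static const struct reassoc_tuning n1_like = { 2, 4, 1, 2 };
static const struct reassoc_tuning v1_like = { 2, 4, 4, 2 };

/* Mirrors the order of checks in the patched hook.  The vector test is
   an assumption, since its condition is outside the quoted hunk.  */
int
model_reassociation_width (const struct reassoc_tuning *t,
                           enum op_code opc, enum mode_class mc)
{
  if (mc == MODE_VECTOR)
    return t->vec_reassoc_width;
  if (mc == MODE_INT)
    return t->int_reassoc_width;
  /* Any FP addition might later become an FMA, so it uses the FMA width.  */
  if (mc == MODE_FLOAT)
    return opc == OP_PLUS ? t->fma_reassoc_width : t->fp_reassoc_width;
  return 1;
}
```

With these tables, a floating-point addition gets width 1 (reassociation off) on the N1-like core and width 4 on the V1-like core, while floating-point multiplication gets width 4 on both.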
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>>> A smart reassociation pass could form more FMAs while also increasing
>>> parallelism, but the way it currently works always results in fewer FMAs.
>>
>> Yeah, as Richard said, that seems the right long-term fix.
>> It would also avoid the hack of treating PLUS_EXPR as a signal
>> of an FMA, which has the drawback of assuming (for 2-FMA cores)
>> that plain addition never benefits from reassociation in its own right.
>
> True but it's hard to separate them. You will have a mix of FADD and FMAs
> to reassociate (since FMA still counts as an add), and the ratio between
> them as well as the number of operations may affect the best reassociation
> width.
>
>> Still, I guess the hackiness is pre-existing and the patch is removing
>> the hackiness for some cores, so from that point of view it's a strict
>> improvement over the status quo.  And it's too late in the GCC 13
>> cycle to do FMA reassociation properly.  So I'm OK with the patch
>> in principle, but could you post an update with more commentary?
>
> Sure, here is an update with a longer comment in aarch64_reassociation_width:
>
>
> Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
> existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> FMA pipes.  This improves SPECFP2017 on Neoverse V1 by ~1.5%.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog/
> 	PR 107413
> 	* config/aarch64/aarch64.cc (struct tune_params): Add
> 	fma_reassoc_width to all CPU tuning structures.
> 	(aarch64_reassociation_width): Use fma_reassoc_width.
> 	* config/aarch64/aarch64-protos.h (struct tune_params): Add
> 	fma_reassoc_width.

OK, thanks.

Richard > --- > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index 238820581c5ee7617f8eed1df2cf5418b1127e19..4be93c93c26e091f878bc8e4cf06e90888405fb2 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -540,6 +540,7 @@ struct tune_params > const char *loop_align; > int int_reassoc_width; > int fp_reassoc_width; > + int fma_reassoc_width; > int vec_reassoc_width; > int min_div_recip_mul_sf; > int min_div_recip_mul_df; > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index c91df6f5006c257690aafb75398933d628a970e1..15d478c77ceb2d6c52a70b6ffd8fdadcfa8deba0 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. 
*/ > @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings = > "4", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings = > "8", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings = > "16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. 
*/ > 2, /* min_div_recip_mul_df. */ > @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings = > "16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings = > "16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings = > "16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 1, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings = > "16", /* loop_align. */ > 3, /* int_reassoc_width. */ > 2, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings = > "16", /* loop_align. */ > 3, /* int_reassoc_width. */ > 2, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. 
*/ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -1949,6 +1967,7 @@ static const struct tune_params ampere1a_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -2126,6 +2145,7 @@ static const struct tune_params neoversev1_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 4, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -2263,6 +2283,7 @@ static const struct tune_params neoverse512tvb_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 4, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -2451,6 +2472,7 @@ static const struct tune_params neoversen2_tunings = > "32:16", /* loop_align. */ > 2, /* int_reassoc_width. */ > 4, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -2640,6 +2662,7 @@ static const struct tune_params neoversev2_tunings = > "32:16", /* loop_align. */ > 3, /* int_reassoc_width. */ > 6, /* fp_reassoc_width. */ > + 4, /* fma_reassoc_width. */ > 3, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. */ > @@ -2675,6 +2698,7 @@ static const struct tune_params a64fx_tunings = > "32", /* loop_align. */ > 4, /* int_reassoc_width. */ > 2, /* fp_reassoc_width. */ > + 1, /* fma_reassoc_width. */ > 2, /* vec_reassoc_width. */ > 2, /* min_div_recip_mul_sf. */ > 2, /* min_div_recip_mul_df. 
*/ > @@ -3387,9 +3411,15 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode) > return aarch64_tune_params.vec_reassoc_width; > if (INTEGRAL_MODE_P (mode)) > return aarch64_tune_params.int_reassoc_width; > - /* Avoid reassociating floating point addition so we emit more FMAs. */ > - if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR) > - return aarch64_tune_params.fp_reassoc_width; > + /* Reassociation reduces the number of FMAs which may result in worse > + performance. Use a per-CPU setting for FMA reassociation which allows > + narrow CPUs with few FP pipes to switch it off (value of 1), and wider > + CPUs with many FP pipes to enable reassociation. > + Since the reassociation pass doesn't understand FMA at all, assume > + that any FP addition might turn into FMA. */ > + if (FLOAT_MODE_P (mode)) > + return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width > + : aarch64_tune_params.fp_reassoc_width; > return 1; > }
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index a73bfa20acb9b92ae0475794c3f11c67d22feb97..71365a446007c26b906b61ca8b2a68ee06c83037 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -540,6 +540,7 @@ struct tune_params const char *loop_align; int int_reassoc_width; int fp_reassoc_width; + int fma_reassoc_width; int vec_reassoc_width; int min_div_recip_mul_sf; int min_div_recip_mul_df; diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 798363bcc449c414de5bbb4f26b8e1c64a0cf71a..643162cdecd6a8fe5587164cb2d0d62b709a491d 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. 
*/ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings = "4", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings = "8", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. 
*/ @@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings = "16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 1, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings = "16", /* loop_align. */ 3, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings = "16", /* loop_align. */ 3, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2089,6 +2107,7 @@ static const struct tune_params neoversev1_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. 
*/ @@ -2226,6 +2245,7 @@ static const struct tune_params neoverse512tvb_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2414,6 +2434,7 @@ static const struct tune_params neoversen2_tunings = "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2603,6 +2624,7 @@ static const struct tune_params neoversev2_tunings = "32:16", /* loop_align. */ 3, /* int_reassoc_width. */ 6, /* fp_reassoc_width. */ + 4, /* fma_reassoc_width. */ 3, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -2638,6 +2660,7 @@ static const struct tune_params a64fx_tunings = "32", /* loop_align. */ 4, /* int_reassoc_width. */ 2, /* fp_reassoc_width. */ + 1, /* fma_reassoc_width. */ 2, /* vec_reassoc_width. */ 2, /* min_div_recip_mul_sf. */ 2, /* min_div_recip_mul_df. */ @@ -3350,9 +3373,10 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode) return aarch64_tune_params.vec_reassoc_width; if (INTEGRAL_MODE_P (mode)) return aarch64_tune_params.int_reassoc_width; - /* Avoid reassociating floating point addition so we emit more FMAs. */ - if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR) - return aarch64_tune_params.fp_reassoc_width; + /* FMA's can have a different reassociation width. */ + if (FLOAT_MODE_P (mode)) + return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width + : aarch64_tune_params.fp_reassoc_width; return 1; }