From patchwork Sun Jul 21 09:14:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 94286
From: Feng Xue OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, Tamar Christina, Richard Sandiford
Subject: [RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns
Date: Sun, 21 Jul 2024 09:14:59 +0000

The work for the RFC
(https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html)
involves quite a lot of code change, so I have split it into several batches
of patches.  This and the following patches constitute the first batch.

Since a pattern statement coexists with normal statements without being linked
into the function body, we should not invoke utility procedures that depend on
the def/use graph on a pattern statement, such as counting the uses of a pseudo
value defined by a pattern statement.  This patch fixes a bug of this type in
vect pattern formation.

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
	single_imm_use if statement is not generated by pattern recognition.
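
The reasoning above can be illustrated with a toy model.  This is only a
sketch under stated assumptions: Stmt, count_uses, has_single_use_in_body and
check_toy_model are hypothetical names invented for illustration, not GCC data
structures or APIs.  It shows why a use count derived from the function body
says nothing about a value defined by a statement that is not linked into that
body.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a statement: it defines one value and uses some others.
// These types are hypothetical and are not GCC data structures.
struct Stmt
{
  int def;
  std::vector<int> uses;
};

// Count the uses of VALUE, looking only at statements linked into BODY.
int
count_uses (const std::vector<Stmt> &body, int value)
{
  int n = 0;
  for (const Stmt &s : body)
    for (int u : s.uses)
      if (u == value)
        ++n;
  return n;
}

// Hypothetical guard in the spirit of the fix: trust the use count only
// when the defining statement is part of the body.
bool
has_single_use_in_body (const std::vector<Stmt> &body, const Stmt &def_stmt,
                        bool is_pattern_stmt)
{
  return !is_pattern_stmt && count_uses (body, def_stmt.def) == 1;
}

// Self-check: value 100 is defined by one pattern statement and used by
// another, yet the body-based count sees no use at all.
void
check_toy_model ()
{
  std::vector<Stmt> body = { Stmt{1, {}}, Stmt{2, {1}} };
  Stmt pattern_def = {100, {1}};   // not linked into body
  Stmt pattern_use = {101, {100}}; // not linked into body either
  (void) pattern_use;
  assert (count_uses (body, 1) == 1);
  assert (count_uses (body, 100) == 0);
  assert (!has_single_use_in_body (body, pattern_def, true));
  assert (has_single_use_in_body (body, body[0], false));
}
```

In this model the count for value 100 is zero even though a pattern statement
uses it, so any decision based on that count would be unreliable; the guard
simply refuses to answer for pattern statements.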
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

From 52e1725339fc7e4552eb7916570790c4ab7f133d Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Fri, 14 Jun 2024 15:49:23 +0800
Subject: [PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

Since a pattern statement coexists with normal statements without being linked
into the function body, we should not invoke utility procedures that depend on
the def/use graph on a pattern statement, such as counting the uses of a pseudo
value defined by a pattern statement.  This patch fixes a bug of this type in
vect pattern formation.

2024-06-14  Feng Xue

gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
	single_imm_use if statement is not generated by pattern recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+      && single_imm_use (lhs, &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
           && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
-- 
2.17.1

From patchwork Sun Jul 21 09:15:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 94289
From: Feng Xue OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, Tamar Christina, Richard Sandiford
Subject: [RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog
Date: Sun, 21 Jul 2024 09:15:33 +0000

For a sum-based loop reduction, its affine closure is composed of the
statements whose results and derived computations only end up in the
reduction, and are not used in any non-linear transform operation.  The concept
underlies the generalized lane-reducing pattern recognition in the coming
patches.  As has been mathematically proved, it is legitimate to optimize the
evaluation of a value with a lane-reducing pattern only if its definition
statement lies in the affine closure.
That is to say, the canonicalized representation of a loop reduction can be
written in the following affine form, in which "opX" denotes an operation for
a lane-reducing pattern, and h(i) represents the remaining operations that are
irrelevant to those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of the property during pattern
matching.  Since a pattern hit would replace the original statement with new
pattern statements, we resort to a postprocessing step after recognition to
parse the semantics of the new statements, and either incrementally update the
affine closure or roll back the pattern change if it would break the
completeness of the existing closure.  Thus, inside the affine closure, the
recog framework can universally handle both lane-reducing and normal patterns.
Also, with this patch, we are able to add more complicated logic to enhance
lane-reducing patterns.

Thanks,
Feng
---
gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
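
To make the affine form concrete, here is a small standalone sketch.  It is
not taken from the patch: the function names and the 4-element folding step
are assumptions chosen only to mimic a dot-product-like lane-reducing
operation.  It shows that a product which only feeds linear operations may be
folded first and scaled afterwards, while a product that feeds a non-linear
operation (a right shift here) may not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<int8_t>;
using words = std::vector<int32_t>;

// Scalar reference for: for (i) sum += 3 * (a[i] * b[i]) + c[i];
int64_t
reduce_scalar (const bytes &a, const bytes &b, const words &c)
{
  assert (a.size () == b.size () && b.size () == c.size ());
  int64_t sum = 0;
  for (size_t i = 0; i < a.size (); ++i)
    sum += 3 * (int64_t (a[i]) * b[i]) + c[i];
  return sum;
}

// Same reduction, but products of 4 adjacent elements are folded into one
// accumulator, so per-element products no longer exist.  The product only
// feeds linear operations, so scaling the folded total is still correct.
int64_t
reduce_lane_reducing (const bytes &a, const bytes &b, const words &c)
{
  assert (a.size () == b.size () && b.size () == c.size ());
  int64_t dot = 0, rest = 0;
  size_t i = 0;
  for (; i + 4 <= a.size (); i += 4)
    for (size_t j = i; j < i + 4; ++j)
      {
        dot += int64_t (a[j]) * b[j];   // one folded 4-element step
        rest += c[j];                   // h(i): unrelated addend
      }
  for (; i < a.size (); ++i)            // scalar tail
    {
      dot += int64_t (a[i]) * b[i];
      rest += c[i];
    }
  return 3 * dot + rest;
}

// Non-affine use: for (i) sum += (a[i] * b[i]) >> 1;
int64_t
halved_scalar (const bytes &a, const bytes &b)
{
  int64_t sum = 0;
  for (size_t i = 0; i < a.size (); ++i)
    sum += (int64_t (a[i]) * b[i]) >> 1;
  return sum;
}

// Folding first and shifting afterwards computes a different function.
int64_t
halved_after_folding (const bytes &a, const bytes &b)
{
  int64_t dot = 0;
  for (size_t i = 0; i < a.size (); ++i)
    dot += int64_t (a[i]) * b[i];
  return dot >> 1;
}
```

With a = {1, 3, 5, 7, 1}, b = {1, 1, 1, 1, 1} and c = {10, 20, 30, 40, 50}, the
two affine variants agree, whereas the shifted variants differ, which is why
the shifted product would have to stay outside the affine closure.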
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

From 737e7ea35dff9d85f5dbd5ec908e8b8229a6631d Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Mon, 8 Apr 2024 10:57:54 +0800
Subject: [PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog

For a sum-based loop reduction, its affine closure is composed of the
statements whose results and derived computations only end up in the
reduction, and are not used in any non-linear transform operation.  The concept
underlies the generalized lane-reducing pattern recognition in the coming
patches.  As has been mathematically proved, it is legitimate to optimize the
evaluation of a value with a lane-reducing pattern only if its definition
statement lies in the affine closure.  That is to say, the canonicalized
representation of a loop reduction can be written in the following affine
form, in which "opX" denotes an operation for a lane-reducing pattern, and
h(i) represents the remaining operations that are irrelevant to those
patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of the property during pattern
matching.  Since a pattern hit would replace the original statement with new
pattern statements, we resort to a postprocessing step after recognition to
parse the semantics of the new statements, and either incrementally update the
affine closure or roll back the pattern change if it would break the
completeness of the existing closure.  Thus, inside the affine closure, the
recog framework can universally handle both lane-reducing and normal patterns.
Also, with this patch, we are able to add more complicated logic to enhance
lane-reducing patterns.

2024-04-08  Feng Xue

gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ca8809e7cfd..02f6b942026 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -750,7 +750,6 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
           gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
           gsi_insert_before_without_update (&gsi, stmt1, GSI_SAME_STMT);
         }
-      return true;
     }
   else
     {
@@ -783,9 +782,35 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
           dump_printf_loc (MSG_NOTE, vect_location, "and: %G",
                            (gimple *) new_stmt2);
         }
+    }
 
-      return true;
+  /* Since this function would change existing conversion statement no matter
+     the pattern is finally applied or not, we should check whether affine
+     closure of loop reduction need to be adjusted for impacted statements.  */
+  unsigned int status = stmt2_info->reduc_pattern_status;
+
+  if (status != rpatt_none)
+    {
+      tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt1));
+      tree new_rhs_type = TREE_TYPE (new_rhs);
+
+      /* The new statement generated by splitting is a nature widening
+         conversion.  */
+      gcc_assert (TYPE_PRECISION (rhs_type) < TYPE_PRECISION (new_rhs_type));
+      gcc_assert (TYPE_UNSIGNED (rhs_type) || !TYPE_UNSIGNED (new_rhs_type));
+
+      /* The new statement would not break transform invariance of lane-
+         reducing operation, if the original conversion depends on the one
+         formed previously.  For the case, it should also be marked with
+         rpatt_formed status.  */
+      if (status & rpatt_formed)
+        vinfo->lookup_stmt (stmt1)->reduc_pattern_status = rpatt_formed;
+
+      if (!is_pattern_stmt_p (stmt2_info))
+        STMT_VINFO_RELATED_STMT (stmt2_info)->reduc_pattern_status = status;
     }
+
+  return true;
 }
 
 /* Look for the following pattern
@@ -890,7 +915,10 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
     return wide_int_to_tree (type, wi::to_widest (unprom->op));
 
   tree input = unprom->op;
-  if (unprom->caster)
+
+  /* We should not reuse conversion, if it is just the statement under pattern
+     recognition.  */
+  if (unprom->caster && unprom->caster != stmt_info)
     {
       tree lhs = gimple_get_lhs (unprom->caster->stmt);
       tree lhs_type = TREE_TYPE (lhs);
@@ -1018,6 +1046,11 @@ vect_reassociating_reduction_p (vec_info *vinfo,
   if (!loop_info)
     return false;
 
+  /* As a candidate of lane-reducing pattern matching, the statement must
+     be inside affine closure of loop reduction.  */
+  if (!(stmt_info->reduc_pattern_status & rpatt_allow))
+    return false;
+
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign || gimple_assign_rhs_code (assign) != code)
     return false;
@@ -7201,6 +7234,672 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
 
+/* Check if EXPR is invariant regarding to vectorization region VINFO.  */
+
+static bool
+vect_pattern_expr_invariant_p (vec_info *vinfo, tree expr)
+{
+  enum vect_def_type dt;
+
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      if (SSA_NAME_IS_DEFAULT_DEF (expr))
+        return true;
+
+      /* This is a value that is defined by a pattern statement that has not
+         been bounded with its original statement.  */
+      if (!gimple_bb (SSA_NAME_DEF_STMT (expr)))
+        return false;
+    }
+
+  if (!vect_is_simple_use (expr, vinfo, &dt))
+    return false;
+
+  if (dt == vect_external_def || dt == vect_constant_def)
+    return true;
+
+  return false;
+}
+
+/* If OP is a linear transform operation, return index bit mask of all possible
+   variant operands, otherwise, return 0.  */
+
+static int
+vect_get_affine_operands_mask (vec_info *vinfo, const gimple_match_op &op)
+{
+  switch (op.code.safe_as_tree_code ())
+    {
+    CASE_CONVERT:
+      if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
+        break;
+      /* FALLTHRU */
+
+    case SSA_NAME:
+    case NEGATE_EXPR:
+    case BIT_NOT_EXPR:
+      return 1 << 0;
+
+    case PLUS_EXPR:
+    case MINUS_EXPR:
+      return (1 << 0) | (1 << 1);
+
+    case MULT_EXPR:
+      if (vect_pattern_expr_invariant_p (vinfo, op.ops[0]))
+        return 1 << 1;
+      /* FALLTHRU */
+
+    case LSHIFT_EXPR:
+      if (vect_pattern_expr_invariant_p (vinfo, op.ops[1]))
+        return 1 << 0;
+      break;
+
+    default:
+      if (lane_reducing_op_p (op.code))
+        {
+          /* The last operand of lane-reducing op is for reduction.  */
+          gcc_assert (op.num_ops > 1);
+          return 1 << (op.num_ops - 1);
+        }
+      break;
+    }
+
+  return 0;
+}
+
+/* Mark all statements in affine closure whose computation leads to START that
+   is non-reduction addend of a loop reduction statement.  The corresponding
+   reduction PHI is represented by REDUC_INFO.  For ssa name defined by marked
+   statement, we record the count of uses that have not been marked so far,
+   into hash map USE_CNT_MAP.  This function is to be called for all reduction
+   statements in the loop.  */
+
+static void
+vect_mark_reduction_affine_closure (loop_vec_info loop_vinfo,
+                                    tree start, stmt_vec_info reduc_info,
+                                    hash_map &use_cnt_map)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  auto_vec worklist;
+
+  worklist.safe_push (start);
+
+  do
+    {
+      tree value = worklist.pop ();
+      stmt_vec_info stmt_info = loop_vinfo->lookup_def (value);
+
+      if (!stmt_info || STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+        continue;
+
+      if (!has_single_use (value))
+        {
+          bool exist;
+          auto &use_cnt = use_cnt_map.get_or_insert (value, &exist);
+
+          if (!exist)
+            use_cnt = num_imm_uses (value);
+
+          gcc_checking_assert (use_cnt > 0);
+
+          /* As long as value is not referred by statement outside of reduction
+             affine closure, we are free to apply lane-reducing patterns to it
+             without duplication, no matter whether the value is single used
+             or not, thus even sharing a lane-reducing operation among multiple
+             loop reductions could be possible.  */
+          if (--use_cnt)
+            continue;
+        }
+
+      gimple *stmt = stmt_info->stmt;
+      gimple_match_op op;
+
+      /* Skip reduction PHI statement and leaf statement like "x = const".  */
+      if (!gimple_extract_op (stmt, &op))
+        continue;
+
+      if (needs_fold_left_reduction_p (op.type, op.code)
+          || gimple_bb (stmt)->loop_father != loop)
+        continue;
+
+      stmt_info->reduc_pattern_status = rpatt_allow;
+
+      /* Vectorizable analysis and transform on lane-reducing operation needs
+         some information in the associated reduction PHI statement.
*/ + STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info; + + if (auto mask = vect_get_affine_operands_mask (loop_vinfo, op)) + { + /* Try to expand affine closure to dependant affine operands. */ + for (unsigned i = 0; i < op.num_ops; i++) + { + if (mask & (1 << i)) + worklist.safe_push (op.ops[i]); + } + } + } while (!worklist.is_empty ()); +} + +/* The prerequisite to optimize evaluation of a value with lane-reducing + pattern is that its definition statement must locate in affine closure of + non-reduction addend of loop reduction statements. To be specific, the + value and all its derived computation only end up in loop reductions, and + are not used in any non-linear transform operation. That is to say, if + such kind of patterns are matched, final pattern statements for loop + reduction could be canonicalized to the following affine form, in which + "opX" denotes a lane-reducing operation, h(i) represents other operations + irrelvant to those patterns. + + for (i) + sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i); + + This function traverses all loop reductions to discover affine closures + and mark all statements inside them. 
*/ + +static void +vect_mark_stmts_for_reduction_pattern_recog (loop_vec_info loop_vinfo) +{ + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + const edge latch = loop_latch_edge (loop); + basic_block header = loop->header; + hash_map use_cnt_map; + + DUMP_VECT_SCOPE ("vect_mark_stmts_for_reduction_pattern_recog"); + + for (auto gsi = gsi_start_phis (header); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gphi *phi = gsi.phi (); + stmt_vec_info reduc_info = loop_vinfo->lookup_stmt (phi); + + if (!reduc_info + || STMT_VINFO_DEF_TYPE (reduc_info) != vect_reduction_def + || STMT_VINFO_REDUC_CODE (reduc_info) != PLUS_EXPR + || STMT_VINFO_REDUC_TYPE (reduc_info) != TREE_CODE_REDUCTION) + continue; + + tree start_def = PHI_RESULT (phi); + tree reduc_def = PHI_ARG_DEF_FROM_EDGE (phi, latch); + auto_vec reduc_stmts; + auto_vec addends; + + while (reduc_def != start_def) + { + gimple *stmt = SSA_NAME_DEF_STMT (reduc_def); + gimple_match_op op; + + /* Dot not step into inner loop. */ + if (gimple_bb (stmt)->loop_father != loop) + break; + + if (!gimple_extract_op (stmt, &op)) + { + gcc_assert (gimple_code (stmt) == GIMPLE_PHI); + break; + } + + stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt); + int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info); + + gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops); + + if (op.code == PLUS_EXPR || op.code == MINUS_EXPR) + { + if (needs_fold_left_reduction_p (op.type, op.code)) + break; + + /* Record non-reduction addend. */ + addends.safe_push (op.ops[reduc_idx ? 0 : 1]); + } + else + { + gcc_assert (CONVERT_EXPR_CODE_P (op.code)); + if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0]))) + break; + } + + reduc_stmts.safe_push (stmt_info); + reduc_def = op.ops[reduc_idx]; + } + + if (reduc_def == start_def) + { + /* Mark reduction PHI statement although it would not be matched + against any pattern. 
*/ + reduc_info->reduc_pattern_status = rpatt_allow; + + for (auto stmt_info : reduc_stmts) + { + /* Mark reduction statement itself. */ + stmt_info->reduc_pattern_status = rpatt_allow; + + /* Vectorizable analysis and transform on lane-reducing operation + needs some information in the associated reduction PHI + statement. */ + STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info; + } + + /* Mark statements that participate in loop reduction indirectly + through non-reduction addends. */ + for (auto addend : addends) + vect_mark_reduction_affine_closure (loop_vinfo, addend, + reduc_info, use_cnt_map); + } + } +} + +/* For a reduction statement STMT_INFO, which could also be the reduction PHI, + return the previous reduction statement that it depends on. */ + +static stmt_vec_info +vect_get_prev_reduction_stmt (loop_vec_info loop_vinfo, + stmt_vec_info stmt_info) +{ + gimple *stmt = stmt_info->stmt; + tree prev_def; + + if (is_a (stmt)) + { + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + const edge latch = loop_latch_edge (loop); + + gcc_assert (STMT_VINFO_REDUC_DEF (stmt_info)); + gcc_assert (loop == gimple_bb (stmt)->loop_father); + prev_def = PHI_ARG_DEF_FROM_EDGE (stmt, latch); + } + else + { + int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info); + gimple_match_op op; + + if (!gimple_extract_op (stmt, &op)) + gcc_unreachable (); + + gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops); + prev_def = op.ops[reduc_idx]; + } + + return vect_stmt_to_vectorize (loop_vinfo->lookup_def (prev_def)); +} + +/* Given pattern statement sequence for ORIG_STMT_INFO (including PATTERN_STMT + and STMT_VINFO_PATTERN_DEF_SEQ), a subset of it represented by FORMED_STMTS + are known to depend on (or just be) lane-reducing operations. In this + function, the subset would be marked with rpatt_formed at first, then the + status is forward propagated to every dependent pattern statement along + paths that contribute to PATTERN_STMT, other statements remain unchanged. 
+ FORMED_STMTS is reset to empty upon completion. */ + +static void +vect_mark_reduction_pattern_sequence_formed (loop_vec_info loop_vinfo, + stmt_vec_info orig_stmt_info, + gimple *pattern_stmt, + vec &formed_stmts) +{ + stmt_vec_info last_stmt = formed_stmts.last (); + stmt_vec_info related_stmt = STMT_VINFO_RELATED_STMT (last_stmt); + gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info); + hash_map> use_map; + + /* Due to lack of a mechanism to quickly get immedidate uses for a pattern + def, we have to build a simple def-use graph out of pattern statement + sequence. */ + for (auto seq : { pattern_stmt, pattern_seq }) + for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi)) + { + stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (gsi)); + gimple_match_op op; + + gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == related_stmt); + + /* Since elements are placed to FORMED_STMTS in the way that the nearer + distance to PATTERN_STMT, the first order, pattern statements after + the last one in the set must not depend on lane-reducing operation, + no need to process them. */ + if (stmt_info == last_stmt) + goto out; + + if (!gimple_extract_op (stmt_info->stmt, &op)) + continue; + + for (unsigned i = 0; i < op.num_ops; i++) + { + if (TREE_CODE (op.ops[i]) == SSA_NAME) + use_map.get_or_insert (op.ops[i]).safe_push (stmt_info); + } + } +out: + + basic_block bb = gimple_bb (pattern_stmt); + + do + { + stmt_vec_info stmt_info = formed_stmts.pop (); + gimple *stmt = stmt_info->stmt; + + gcc_assert (gimple_bb (stmt) == bb); + + /* A statement may be reached from more than one lane-reducing + operations, suppose a case in which two dot-products are added + together. */ + if (stmt_info->reduc_pattern_status & rpatt_formed) + continue; + + stmt_info->reduc_pattern_status |= rpatt_formed; + + /* Do not propagate status outside of pattern statement sequence. 
*/ + if (stmt != pattern_stmt) + { + auto *uses = use_map.get (gimple_get_lhs (stmt)); + + gcc_assert (uses); + for (auto use : *uses) + formed_stmts.safe_push (use); + } + } while (!formed_stmts.is_empty ()); +} + +/* A successful pattern recognition would replace matched statement with new + pattern statements, which might cause loop reduction affine closure being + changed. On the one hand, new linear-transform-like pattern statement could + be pulled into closure, for example, this could happen with a pattern that + decomposes a mult-by-constant to a series of additions and shifts. On the + other hand, some statements that are originally in closure have to be kicked + out if linearity of a relay statement linking into the closure would be + broken, such as, due to introduction of a non-trivial conversion. However, + this would get us into a conflict situation when impacted statement connects + lane-reducing and loop reduction statement, in that lane-reducing pattern + could not be reverted once it has been formed. Only alternative is to + invalidate the other pattern in process. + + Therefore, after a pattern is recognized on ORIG_STMT_INFO, this function + is called to parse semantics of all new pattern statements (including + PATTERN_STMT), and check if possible resultant adjustment on affine closure + of loop reduction would conflict with existing lane-reducing statements, if + not, return true, otherwise, return false. */ + +static bool +vect_check_pattern_stmts_for_reduction (loop_vec_info loop_vinfo, + stmt_vec_info orig_stmt_info, + gimple *pattern_stmt) +{ + unsigned status = orig_stmt_info->reduc_pattern_status; + + /* Nothing to be done if original statement is reduction irrelevant. */ + if (status == rpatt_none) + return true; + + /* Degraded lane-reducing statement is not in reduction affine closure. + Pattern recognition on such statement should be very rare. Do not allow + it for simplicity. 
*/ + if (!(status & rpatt_allow)) + return false; + + auto_vec non_reduc_stmts; + auto_vec rpatt_formed_stmts; + gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info); + gimple_match_op op; + + for (auto seq : { pattern_stmt, pattern_seq }) + for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt); + + if (!stmt_info) + stmt_info = loop_vinfo->add_stmt (stmt); + + /* Replacement of original statement by pattern statement sequence has + not been committed yet, so basic block is not set. This fact could + be used to distinguish these pending pattern statements from + existing ones. */ + gcc_assert (!gimple_bb (stmt)); + + /* Initially mark pattern statement as in affine closure, and this + status might be changed later according to def/use relationship + among all pattern statements. */ + stmt_info->reduc_pattern_status = rpatt_allow; + } + + /* Traverse statements in the order that use precedes def. */ + for (auto seq : { pattern_stmt, pattern_seq }) + for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + /* Need not do any further for leaf statement like "x = const". */ + if (!gimple_extract_op (stmt, &op)) + continue; + + stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt); + int affine_oprnds_mask = 0; + + if (needs_fold_left_reduction_p (op.type, op.code)) + stmt_info->reduc_pattern_status = rpatt_none; + else if (stmt_info->reduc_pattern_status == rpatt_allow) + affine_oprnds_mask = vect_get_affine_operands_mask (loop_vinfo, op); + + /* Record lane-reducing statement into a set from which forward + propagation of rpatt_formed status would start. 
*/ + if (lane_reducing_op_p (op.code)) + rpatt_formed_stmts.safe_push (stmt_info); + + for (unsigned i = 0; i < op.num_ops; i++) + { + tree oprnd = op.ops[i]; + stmt_vec_info oprnd_info = loop_vinfo->lookup_def (oprnd); + + if (oprnd_info) + oprnd_info = vect_stmt_to_vectorize (oprnd_info); + else if (vect_pattern_expr_invariant_p (loop_vinfo, oprnd)) + continue; + else + { + /* Pattern statement contains unvectorizable operand, simply + bail out. */ + return false; + } + + if (!(affine_oprnds_mask & (1 << i))) + { + /* It is expected that this operand would not be in affine + closure. */ + + if (!gimple_bb (oprnd_info->stmt)) + { + /* The operand is defined by another uncommitted pattern + statement, whose status should be changed to + rpatt_none. */ + oprnd_info->reduc_pattern_status = rpatt_none; + } + else if (oprnd_info->reduc_pattern_status & rpatt_formed) + { + /* Conflict with an existing lane-reducing pattern + statement, so fail the check. TODO: Allow pattern + statement that uses value defined by degraded lane- + reducing statement. */ + return false; + } + else if (oprnd_info->reduc_pattern_status & rpatt_allow) + { + /* This statement has to be removed from affine closure. + Here only record it into a set, and the actual removal + action will be recursively performed later on it and + all statements that are linked to the closure through + it. */ + non_reduc_stmts.safe_push (oprnd_info); + } + } + else if (oprnd_info->reduc_pattern_status & rpatt_formed) + { + /* There must be a path from the original statement to some + lane-reducing statement. */ + gcc_assert (status & rpatt_formed); + + /* The operand definition statement should not be uncommited + pattern statement, for which propagation of rpatt_formed + status has not been started. */ + gcc_assert (gimple_bb (oprnd_info->stmt)); + + /* The operand definition statement should be in reduction + affine closure. 
*/ + gcc_assert (oprnd_info->reduc_pattern_status & rpatt_allow); + + /* This uncommitted pattern statement is a boundary point to + which rpatt_formed status would be propagated from other + exisiting statement. */ + rpatt_formed_stmts.safe_push (stmt_info); + } + } + } + + /* Forward propagate rpatt_formed status inside uncommitted pattern statement + sequence. */ + if (!rpatt_formed_stmts.is_empty ()) + vect_mark_reduction_pattern_sequence_formed (loop_vinfo, orig_stmt_info, + pattern_stmt, + rpatt_formed_stmts); + + stmt_vec_info pattern_stmt_info = loop_vinfo->lookup_stmt (pattern_stmt); + unsigned pattern_status = pattern_stmt_info->reduc_pattern_status; + + /* Overriding formed lane-reducing operation by another new normal pattern + matching is not allowed. */ + if ((status & rpatt_formed) && !(pattern_status & rpatt_formed)) + return false; + + if (pattern_status == rpatt_none && vect_is_reduction (orig_stmt_info)) + { + auto prev = vect_get_prev_reduction_stmt (loop_vinfo, orig_stmt_info); + + /* Since statements in a reduction chain are cyclically dependent, we + have to exclude the whole chain from affine closure if one reduction + statement does not meet lane-reducing prerequisite. Then prepare for + propagating rpatt_none status to the previous reduction statement. */ + non_reduc_stmts.safe_push (prev); + + /* Terminate propagation when rotating back to the original + statement. */ + orig_stmt_info->reduc_pattern_status = rpatt_none; + } + + /* Backward propagate rpatt_none status to existing statements. 
*/ + while (!non_reduc_stmts.is_empty ()) + { + stmt_vec_info stmt_info = non_reduc_stmts.pop (); + gimple *stmt = stmt_info->stmt; + + gcc_assert (gimple_bb (stmt)); + + if (stmt_info->reduc_pattern_status == rpatt_none) + continue; + + gcc_assert (!(stmt_info->reduc_pattern_status & rpatt_formed)); + stmt_info->reduc_pattern_status = rpatt_none; + + if (is_a (stmt)) + { + auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info); + + /* For reduction PHI, propagation shoule be confined inside the loop, + so only through latch edge. */ + non_reduc_stmts.safe_push (prev); + continue; + } + + if (!gimple_extract_op (stmt, &op)) + continue; + + gcc_assert (!lane_reducing_op_p (op.code)); + + for (unsigned i = 0; i < op.num_ops; i++) + { + if (auto oprnd_info = loop_vinfo->lookup_def (op.ops[i])) + non_reduc_stmts.safe_push (vect_stmt_to_vectorize (oprnd_info)); + } + } + + if ((status & rpatt_formed) || !(pattern_status & rpatt_formed)) + { + /* If lane-reducing statement has already existed on other path to the + original statement, no need to propagate rpatt_formed status again. + Or no lane-reducing statement is generated, nothing to do. */ + return true; + } + + rpatt_formed_stmts.safe_push (orig_stmt_info); + + /* Forward propagate rpatt_formed status inside existing pattern statement + sequence. */ + if (is_pattern_stmt_p (orig_stmt_info)) + { + stmt_vec_info root_orig_info = vect_orig_stmt (orig_stmt_info); + stmt_vec_info root_pattern = vect_stmt_to_vectorize (root_orig_info); + + vect_mark_reduction_pattern_sequence_formed (loop_vinfo, root_orig_info, + root_pattern->stmt, + rpatt_formed_stmts); + + gcc_assert (root_pattern->reduc_pattern_status & rpatt_formed); + rpatt_formed_stmts.safe_push (root_orig_info); + } + + /* Forward propagate rpatt_formed status to existing statements that haven + not been processed for pattern recognition. 
*/ + do + { + stmt_vec_info stmt_info = rpatt_formed_stmts.pop (); + gimple *stmt = stmt_info->stmt; + + gcc_assert (gimple_bb (stmt)); + + if (stmt_info->reduc_pattern_status & rpatt_formed) + continue; + + gcc_assert (stmt_info->reduc_pattern_status & rpatt_allow); + stmt_info->reduc_pattern_status |= rpatt_formed; + + if (vect_is_reduction (stmt_info) || is_a (stmt)) + { + auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info); + + /* We must consider statements in reduction chain as a whole in order + to ensure legality of lane-reducing operations, for which all + reduction statements should be marked with rpatt_formed status. + As a special handling, here we traverse reduction statements + "backforward", in that some of them might be created by pattern, + and currently, there is no straightforward way to obtain + immedidate uses for a value defined by pattern statement. Since + reduction statements are in a cycle chain, the approach would lead + to the same marking as forward progation. */ + rpatt_formed_stmts.safe_push (prev); + continue; + } + + tree lhs = gimple_get_lhs (stmt); + imm_use_iterator iter; + gimple *use_stmt; + + /* The statement has not been processed yet, so we could walk def/use + chain by normal means. */ + gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info)); + gcc_assert (!is_pattern_stmt_p (stmt_info)); + + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) + { + if (is_gimple_debug (use_stmt)) + continue; + + stmt_vec_info use_stmt_info = loop_vinfo->lookup_stmt (use_stmt); + + /* Because the statement is not reduction statement or PHI, it + should not have any use outside of the loop. */ + gcc_assert (gimple_has_lhs (use_stmt) && use_stmt_info); + rpatt_formed_stmts.safe_push (use_stmt_info); + } + } while (!rpatt_formed_stmts.is_empty ()); + + return true; +} + /* Mark statements that are involved in a pattern. 
*/ void @@ -7383,6 +8082,19 @@ vect_pattern_recog_1 (vec_info *vinfo, return; } + loop_vec_info loop_vinfo = dyn_cast (vinfo); + + /* Check if the pattern would break existing lane-reducing pattern + statements. */ + if (loop_vinfo + && !vect_check_pattern_stmts_for_reduction (loop_vinfo, stmt_info, + pattern_stmt)) + { + /* Invalidate the pattern when detecting conflict. */ + STMT_VINFO_PATTERN_DEF_SEQ (stmt_info) = NULL; + return; + } + /* Found a vectorizable pattern. */ if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -7481,6 +8193,10 @@ vect_pattern_recog (vec_info *vinfo) DUMP_VECT_SCOPE ("vect_pattern_recog"); + /* Mark loop reduction affine closure for lane-reducing patterns. */ + if (auto loop_vinfo = dyn_cast (vinfo)) + vect_mark_stmts_for_reduction_pattern_recog (loop_vinfo); + /* Scan through the stmts in the region, applying the pattern recognition functions starting at each stmt visited. */ for (unsigned i = 0; i < nbbs; i++) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index df6c8ada2f7..52793ee87e9 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1185,6 +1185,25 @@ enum slp_vect_type { hybrid }; +/* The status of statement for lane-reducing patterns matching. */ +enum vect_reduc_pattern_status { + /* Statement is not in loop reduction affine closure. */ + rpatt_none = 0, + + /* Statement is part of loop reduction affine closure, so it is candidate of + lane-reducing patterns. */ + rpatt_allow = 1, + + /* Statement is or depends on lane-reducing pattern statement, once being + marked, the status could not be changed. In most situations, the + statement also has the status of rpatt_allow. One exceptional case + is that when lane-reducing to a given result type is not supported by + target, we could settle for second best by creating a degraded lane- + reducing statement with a smaller intermediate result type. Such + statement is not in affine closure. 
*/ + rpatt_formed = 2 +}; + /* Says whether a statement is a load, a store of a vectorized statement result, or a store of an invariant value. */ enum vec_load_store_type { @@ -1431,6 +1450,10 @@ public: /* Whether on this stmt reduction meta is recorded. */ bool is_reduc_info; + /* Describe how the statement would be handled when performing lane-reducing + pattern matching. */ + unsigned int reduc_pattern_status; + /* If nonzero, the lhs of the statement could be truncated to this many bits without affecting any users of the result. */ unsigned int min_output_precision; -- 2.17.1 From patchwork Sun Jul 21 09:15:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Xue OS X-Patchwork-Id: 94287 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B205B385DDFB for ; Sun, 21 Jul 2024 09:16:38 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from CY4PR05CU001.outbound.protection.outlook.com (mail-westcentralusazlp170100000.outbound.protection.outlook.com [IPv6:2a01:111:f403:c112::]) by sourceware.org (Postfix) with ESMTPS id 40E193858C39 for ; Sun, 21 Jul 2024 09:15:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 40E193858C39 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 40E193858C39 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:c112:: ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553348; cv=pass; 
b=ec6VHfYfroLLe2fzgpGTRn3icN9VCvpJhLHQb2zRUyfNbtrLNSSX51idXE29hfeIvzEO11o8u5upwAURobj8Q4gtwT4ZxcoEfFQLu7X2K1tCubI4lyxNEs7iNIaWHWq9BD7LsTOI13NTnQxvbQ0PnUJJxXFYhJryETx5cxFa8Vs= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553348; c=relaxed/simple; bh=qINA8VTV7GbdLAwUheK5t4/gJI4n4VSpALbFfy+FEBE=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=VxHwEX2fOGfc0AwVz1eJuAbhJKOxEQ8QKPpUnWdZIQw1VmhN2J4hmXtrHkh9EGGD+TXgvagAkirTTP6tazS+7AdpVncqlpvxKK+wIBRBXk2ccDBUaBcfSq0Y1TDpckN6Jmyo2AG5p9RymR4KyKXQGQfGNru4i4POhelyZtB6GZA= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=bCzdy72zVj41bKq1wXMFxe1xIgjOgDnFQ3APshTj8dh9+rytkEOqhSbYyB19Jjxan/UnhxcMcc67dQlbL6lACnlG16uiZnNQ+wKtc2l9y9fkJus81I9lE47/0xZcffxM1cMLFcM20TyFw/76bn9Xb90RzFEy8ePPId7MgiAZxd+FgQQIbVDquvCJ10trwjkwoP+mPEEefmytQWY9SF+sS8EY1E2N15QSrnDTMMvm7UGDnNGztVF9xb14OpAQV6dKHIOwadocCqhTkp/gK24cROCNWtlZa2r7t3oLtT/FCkrhPMbOL7mryjbmFEU5ikBocc2/LLaw6eqhAt63QyOHXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=9NhSv7QuLcWK8WqLvXcbBxU2qbpciavLBN+hPP0giwI=; b=zDq/8O3ldUMO+7gvKZEUXQWEMDvsze4kXzRFrwtVcANpetD9HIH4WH5RwBc3dz/8N1gf5dxHZ0k90/BEv2DqVppD9e4po55n+StRC/E510hHFubMCq6p9dDqlsUtT/2RsYY1E9TBK+AdpJaepr0LEM0iiBv0yYEf3GzbWhXLSSQAMj5A27O5brLLHa10x3Xapl0OdIY8WlvTnBSH0HWVfSlr9hX+4dN8hbf2JB/q7ezk5SobsA1S/n8aFaQ7+Hj0UL0SQKekNSd+fVcdhJMp7E+FirxNb5o7UQHBWwWO/4XV1o22QdfCzwVQVF/LwCZezn/unzocfb1O+0e3M89X1g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=9NhSv7QuLcWK8WqLvXcbBxU2qbpciavLBN+hPP0giwI=; b=v4Ukireg6W7uBC3iA/HFH3vYB7Hr/GvOXlr8emWjBwC6/b2p29QchH4p3HdAy4g6JuZojT9BGAkyFBsmJw1nYOn3tzwASJkSwrTzZRHCq09QeOhOwbeQNmweMwhuOtLfpCqdMZlGkeWUVcG7fxGsbyidRfGrKTpqNcHiLWqH/z0= Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7784.14; Sun, 21 Jul 2024 09:15:43 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024 09:15:43 +0000 From: Feng Xue OS To: "gcc-patches@gcc.gnu.org" CC: Richard Biener , Tamar Christina , Richard Sandiford Subject: [RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement Thread-Topic: [RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement Thread-Index: AQHa20rQBvebQu0gaEOqQV2X1ti+tg== Date: Sun, 21 Jul 2024 09:15:43 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:15:43.184Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: 
LV2PR01MB7839:EE_|DS7PR01MB7855:EE_ x-ms-office365-filtering-correlation-id: a7db6a83-8ff5-4172-d5d9-08dca965b2b5 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230040|376014|366016|1800799024|38070700018; x-microsoft-antispam-message-info: =?iso-8859-1?q?3O3/2JVP+sAbT8My7pfXMuCC+9?= =?iso-8859-1?q?255JWh4kMAoD+a3rWgm6JYkFZzudi2GD0Buz/rc7aw663aRjAaQmI4JpXUXx?= =?iso-8859-1?q?HzqSKCzjdd1UlP1DtSftPc3oNuSOBHY9GtpChYNRB0+JydS9dPKeMKt7+LFV?= =?iso-8859-1?q?rWX459H67kTpjhflECRoHRkaZmrVxo1PqGmljjrc7PS/XQ9TpvDU4JD+EUZD?= =?iso-8859-1?q?0vmHEKh+2x7DpKAAORW/DeGBy29QYu0xpBU5ek63Zd/lKevT8l5HDV/i5k5d?= =?iso-8859-1?q?4qXpDn3orPRECGnmVXU3i2YL6wGHDg5CviPt9p1Nq9gp4RKe0XlHwUUyeBJx?= =?iso-8859-1?q?P7JY4UvvhPekX3zkmuK0n0Dmkv2+jPakHII0V7nQxe3RXbxMgUMaPB4bh1XH?= =?iso-8859-1?q?P8cOGsSt7GyJxHmfjpHIs2vIHOFYM2zfw1hW4PX8PUJQfOvZJ8wMcsQk7Ni9?= =?iso-8859-1?q?Z8eKe2kpUuBuEpFdj4ekIhqa/HQSLVSULoS+HYRhbxzr8u4/G0QnvFOJSa1F?= =?iso-8859-1?q?fGxHRKJ3S0zRzutuEma/KRbyCP8Vgv9Q8OBoDTrgC5FSHKWJHMIhghFZWl2s?= =?iso-8859-1?q?WbnyygLGN05v6PuZah5ZQf46R+m/d/FKpxSLp5+heB6x6JUnmzQWfqFcQYS8?= =?iso-8859-1?q?yYsckWayBwMfUblxBBONBthkTbz5QD8uN5wi8cRBH+DhJm5QK1Ba9pER8bGh?= =?iso-8859-1?q?lqhwX8KPfdp7ieYhHY7KsjSV+RMOY/yQTw/4Thxfx6Ek1HcGIPE1MPBL9k+o?= =?iso-8859-1?q?dP3hbHGtwWUcG212moIRMOaIHtJc0Bqdees4p86KPvBWYhayvwI8DWO9esZN?= =?iso-8859-1?q?t/N6S4OXFT1HfMjXeIfwme16eKNZVi0+DOiNrhbfnLgQFX/YtrjXhnq6sY7o?= =?iso-8859-1?q?VQThkRG2oSzZS/+Me4FUQJ8MF0JDeGJQgMvzKgA5v0g357ZzfAEHMS1Xyj8e?= =?iso-8859-1?q?pwxMvjewmfWH9+ymevFJhNTFwpkwgzZM/NxjNvaozDTAHnactGxibLFUr5/8?= =?iso-8859-1?q?C9zvzEeZuSEILOhzIh1aQ/x18upTUIPWao3p/ELNcogn2uxMd830ZJRB4BXD?= =?iso-8859-1?q?CbHKTAxXJCqcPEWZ+v8flPDDUm4b4Ut1w/zMsDbkuybcynuzrMh4Lubnn9T/?= =?iso-8859-1?q?h7wYho8jx/X34CBUErgFa6K5qvqBD/Jcup2EwjQ3FSboPnr1MpquaXhw1Zmg?= =?iso-8859-1?q?dmhp+1hCdKj4VUVU2uwSwUvC+gWAwrvWS9Qa9R5h7TFMqsf7KqhU391D4Dof?= =?iso-8859-1?q?vptahBzQSyDUqbUXrDXnCTHpDeVkN3ojdC0OuQfInnKGHt/7sDbGdZcNTGYm?= 
=?iso-8859-1?q?pF17FhHldoUwm4rR3dyxu2ZZi7rvOdpSH0xkdGmqukUQGvKNSCVFeycBJ9ff?= =?iso-8859-1?q?kVTqgfuY4d+JZNFagUafpWx4xVAxb1CrZO4c6plgV1Bw2/3eWmb7IEHUnO0A?= =?iso-8859-1?q?fow8y+drUsmyAl+dQGCHJQgA=3D=3D?= x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE; SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?q?VhYtfiJQrJcP8Bw8CvpUC1q?= =?iso-8859-1?q?Wb5NdPZAMkAfTnkiOD9D5Dwc1xaDDgF0T9CPRgLQwFyhOtOJoN+KAPh78lmY?= =?iso-8859-1?q?Z5eehMIORWE69YmY+jbkKOxuXe+ZGFO/kN1fOgtRznV2N+pNEOkMXdSBDvEk?= =?iso-8859-1?q?Up+l1vpbE00eaZIi1ByK26Va/se/rD//G9Ei54l5onxTq2kL6nob8fj97P8V?= =?iso-8859-1?q?28PR5LtWNfze81MMGQhf1LlZlRIZVLZVVYFE6/ufSHC29Foru7lULahIMJ8q?= =?iso-8859-1?q?saApw6Excao/fUINisscrUkJHkvd6N9tRsF7nRJ3vJGRYZXPwCrecop1f7Q5?= =?iso-8859-1?q?M8Wosxwbb7c/4iQU2sx/9mnaG08cCCsuSRPLLY0/bylQTocoGjXOfHH2mysl?= =?iso-8859-1?q?qAk+vjqvrSYX5X74ec0uDo0BsE6QOqDTIB6h2TbfwSQVRtxBtdzCMKebfkd8?= =?iso-8859-1?q?i6G8A33G8iPhPyoCLDIuRrPKHNjivbZitd2Z8k4my1P4UDMC7nzRRvEjQWBt?= =?iso-8859-1?q?mEMN8vRND7R3hFNSiWQya6avFm5WdIkQeRtLtuyKnZZWGueWY1uRfq5LKRw+?= =?iso-8859-1?q?95icO7olylG0NM98VyyJvbisGruKdGmtaqGi5younJ+ojyXJ+XC4+4MyeJZX?= =?iso-8859-1?q?V7Tlnt0znSlFzFphKa21ZKa1Y0+/voemW5xCcmrjJYsOTDnm3ysDtTkz49v+?= =?iso-8859-1?q?Hq66tBUZtpuIgVY+IWjgc9bE+YoMiw4ohKDoymKN8GXJAtiUgRmTe9cLQR74?= =?iso-8859-1?q?4DfDNtuLVdsLu21Of7nAaNaMQqGgS/43OoPzv9LKSHTRbrd9yI6ypPwqvVMU?= =?iso-8859-1?q?X+oa8Ah0RMr/XiM1HWH686mTLsfuPH70UDK3IWiruTt2GBih5mM9E+VVyuzR?= =?iso-8859-1?q?akS8An2A04hS+jNFApolouvpyn/Wx/9xb9NUEGgwa43HRTKZ9lMr4oqgWwz0?= =?iso-8859-1?q?DFqsXDJznojUqCR2B4ULDmcJQYWy/gZQhSzgHxkRbE3qhCuLTe0OIqlSmh6I?= =?iso-8859-1?q?jdS8cW2wJQnRQ0rFVvw52O4OZVyk9O6oZmOVSwOb6J3pE05YuT5ammmm0qcc?= =?iso-8859-1?q?wQxEkHKOEegMA7OeJRHZlTSWe5xtcy9D8hlrxu8gHfBm6NJxuDvHomXWDy14?= 

This patch extends the existing vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly.  The operation itself is not a reduction statement, but its value
is eventually accumulated into the reduction result.

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect
	lane-reducing operation.
	(vect_transform_reduction): Extend transform for indirect
	lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 48 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

From 5e65c65786d9594c172b58a6cd1af50c67efb927 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 24 Apr 2024 16:46:49 +0800
Subject: [PATCH 3/5] vect: Enable lane-reducing operation that is not loop
 reduction statement

This patch extends the existing vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly.  The operation itself is not a reduction statement, but its value
is eventually accumulated into the reduction result.

2024-04-24  Feng Xue

gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect
	lane-reducing operation.
	(vect_transform_reduction): Extend transform for indirect
	lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 48 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,

   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));

-  /* TODO: Support lane-reducing operation that does not directly participate
-     in loop reduction.  */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
     return false;

   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def);
   gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION);

-  for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++)
+  int sum_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+  int num_ops = (int) gimple_num_ops (stmt) - 1;
+
+  /* Participate in loop reduction either directly or indirectly.  */
+  if (sum_idx >= 0)
+    gcc_assert (sum_idx == num_ops - 1);
+  else
+    sum_idx = num_ops - 1;
+
+  for (int i = 0; i < num_ops; i++)
     {
       stmt_vec_info def_stmt_info;
       slp_tree slp_op;
@@ -7573,7 +7580,24 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,

   tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);

-  gcc_assert (vectype_in);
+  if (!vectype_in)
+    {
+      enum vect_def_type dt;
+      tree rhs1 = gimple_assign_rhs1 (stmt);
+
+      if (!vect_is_simple_use (rhs1, loop_vinfo, &dt, &vectype_in))
+        return false;
+
+      if (!vectype_in)
+        {
+          vectype_in = get_vectype_for_scalar_type (loop_vinfo,
+                                                    TREE_TYPE (rhs1));
+          if (!vectype_in)
+            return false;
+        }
+
+      STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+    }

   /* Compute number of effective vector statements for costing.  */
   unsigned int ncopies_for_cost = vect_get_num_copies (loop_vinfo, slp_node,
@@ -8750,9 +8774,17 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (single_defuse_cycle || lane_reducing);

   if (lane_reducing)
-    {
-      /* The last operand of lane-reducing op is for reduction.  */
-      gcc_assert (reduc_index == (int) op.num_ops - 1);
+    {
+      if (reduc_index < 0)
+        {
+          reduc_index = (int) op.num_ops - 1;
+          single_defuse_cycle = false;
+        }
+      else
+        {
+          /* The last operand of lane-reducing op is for reduction.  */
+          gcc_assert (reduc_index == (int) op.num_ops - 1);
+        }
     }

   /* Create the destination vector  */
-- 
2.17.1

From patchwork Sun Jul 21 09:15:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 94288
From: Feng Xue OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, Tamar Christina, Richard Sandiford
Subject: [RFC][PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction statement
Date: Sun, 21 Jul 2024 09:15:50 +0000

Previously, only the simple lane-reducing case was supported, in which one
loop reduction statement forms one pattern match:

  char *d0, *d1, *s0, *s1, *w;
  for (i) {
    sum += d0[i] * d1[i];      // sum = DOT_PROD(d0, d1, sum);
    sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
    sum += w[i];               // sum = WIDEN_SUM(w, sum);
  }

This patch removes this limitation of the current lane-reducing matching
strategy and extends the candidate scope to the whole affine closure of the
loop reduction.  Thus, we can optimize a reduction with as many lane-reducing
operations as possible, which ends up with generalized pattern recognition of
the form ("opX" denotes an operation for a lane-reducing pattern):

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

A lane-reducing operation has two aspects: the main primitive operation and
the accompanying result accumulation.  The original design matches the
compound semantics in a single pattern, but that approach is not suitable for
an operation that does not directly participate in loop reduction.  This patch
focuses only on the primitive operation and leaves the rest to another patch.
An example with dot-product:

  sum = DOT_PROD(d0, d1, sum);       // original
  sum = DOT_PROD(d0, d1, 0) + sum;   // now

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_reassociating_reduction_p): Remove the
	function.
	(vect_recog_dot_prod_pattern): Relax check to allow any statement in
	reduction affine closure.
	(vect_recog_sad_pattern): Likewise.
	(vect_recog_widen_sum_pattern): Likewise.  Use dot-product if
	widen-sum is not supported.
	(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the top.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-affine-1.c: New test.
	* gcc.dg/vect/vect-reduc-affine-2.c: New test.
	* gcc.dg/vect/vect-reduc-affine-slp-1.c: New test.
---
 .../gcc.dg/vect/vect-reduc-affine-1.c         | 112 ++++++
 .../gcc.dg/vect/vect-reduc-affine-2.c         |  81 +++++
 .../gcc.dg/vect/vect-reduc-affine-slp-1.c     |  74 ++++
 gcc/tree-vect-patterns.cc                     | 321 ++++++------------
 4 files changed, 372 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c

From 548026f343a3291a38cdf06575046be5d85fe33d Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Fri, 14 Jun 2024 15:45:26 +0800
Subject: [PATCH 4/5] vect: Extend lane-reducing patterns to
 non-loop-reduction statement

Previously, only the simple lane-reducing case was supported, in which one
loop reduction statement forms one pattern match:

  char *d0, *d1, *s0, *s1, *w;
  for (i) {
    sum += d0[i] * d1[i];      // sum = DOT_PROD(d0, d1, sum);
    sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
    sum += w[i];               // sum = WIDEN_SUM(w, sum);
  }

This patch removes this limitation of the current lane-reducing matching
strategy and extends the candidate scope to the whole affine closure of the
loop reduction.  Thus, we can optimize a reduction with as many lane-reducing
operations as possible, which ends up with generalized pattern recognition of
the form ("opX" denotes an operation for a lane-reducing pattern):

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

A lane-reducing operation has two aspects: the main primitive operation and
the accompanying result accumulation.  The original design matches the
compound semantics in a single pattern, but that approach is not suitable for
an operation that does not directly participate in loop reduction.  This patch
focuses only on the primitive operation and leaves the rest to another patch.
An example with dot-product:

  sum = DOT_PROD(d0, d1, sum);       // original
  sum = DOT_PROD(d0, d1, 0) + sum;   // now

2024-06-14  Feng Xue

gcc/
	* tree-vect-patterns.cc (vect_reassociating_reduction_p): Remove the
	function.
	(vect_recog_dot_prod_pattern): Relax check to allow any statement in
	reduction affine closure.
	(vect_recog_sad_pattern): Likewise.
	(vect_recog_widen_sum_pattern): Likewise.  Use dot-product if
	widen-sum is not supported.
	(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the top.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-affine-1.c: New test.
	* gcc.dg/vect/vect-reduc-affine-2.c: New test.
	* gcc.dg/vect/vect-reduc-affine-slp-1.c: New test.
---
 .../gcc.dg/vect/vect-reduc-affine-1.c         | 112 ++++++
 .../gcc.dg/vect/vect-reduc-affine-2.c         |  81 +++++
 .../gcc.dg/vect/vect-reduc-affine-slp-1.c     |  74 ++++
 gcc/tree-vect-patterns.cc                     | 321 ++++++------------
 4 files changed, 372 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
new file mode 100644
index 00000000000..a5e99ce703b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
@@ -0,0 +1,112 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2) \
+S1 int __attribute__ ((noipa)) \
+name (S1 int res, \
+      S2 char *restrict a, \
+      S2 char *restrict b, \
+      S2 int *restrict c, \
+      S2 int cst1, \
+      S2 int cst2, \
+      int shift) \
+{ \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] + 16; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] + cst1; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] + c[i]; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] * 23; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] << 6; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] * cst2; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += a[i] * b[i] << shift; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += cst1 * 5 - a[i] * b[i]; \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    res += ~(((a[i] * b[i] + 3) << shift) - c[i]); \
+  \
+  asm volatile ("" ::: "memory"); \
+  for (int i = 0; i < N; i++) \
+    { \
+      S2 int t = a[i] * b[i]; \
+      res += (t * cst2) + ~((t - cst1) << 3); \
+    } \
+  \
+  asm volatile ("" ::: "memory"); \
+  S1 int res1 = 1; \
+  S1 int res2 = 2; \
+  for (int i = 0; i < N; i++) \
+    { \
+      S2 int t = a[i] * b[i]; \
+      res1 += (t * cst2) + 18; \
+      res2 += (t - cst1) << shift; \
+    } \
+  res += res1 ^ res2; \
+  return res; \
+}
+
+FN(f1_vec_s, signed, signed)
+FN(f1_vec_u, unsigned, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec_s, signed, signed)
+FN(f1_novec_u, unsigned, signed)
+#pragma GCC pop_options
+
+#define BASE ((int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  int c[N];
+
+  #pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      c[i] = i;
+    }
+
+  if (f1_vec_s (0x12345, a, b, c, -5, 17, 3) != f1_novec_s (0x12345, a, b, c, -5, 17, 3))
+    __builtin_abort ();
+
+  if (f1_vec_u (0x12345, a, b, c, -5, 17, 3) != f1_novec_u (0x12345, a, b, c, -5, 17, 3))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 20 "vect" { target vect_sdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
new file mode 100644
index 00000000000..a160bc72082
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
@@ -0,0 +1,81 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 signed
+#endif
+
+#define FN(name, S1, S2, S3, S4) \
+S1 int __attribute__ ((noipa)) \
+name (S1 int res, \
+      S2 char *restrict a, \
+      S2 char *restrict b, \
+      S3 char *restrict c, \
+      S3 char *restrict d, \
+      S4 short *restrict e, \
+      S4 short *restrict f, \
+      S1 int *restrict g, \
+      S1 int cst1) \
+{ \
+  for (int i = 0; i < N; ++i) \
+    { \
+      short diff = a[i] - b[i]; \
+      S2 short abs = diff < 0 ? -diff : diff; \
+      res += ((abs + i) << 3) - (c[i] + 1) * cst1 + d[i] * 3 + e[i] - g[i]; \
+    } \
+  \
+  return res; \
+}
+
+FN(f1_vec, signed, unsigned, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, unsigned, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((unsigned int) -1 < 0 ? -126 : 4)
+#define BASE3 ((signed int) -1 < 0 ? -126 : 4)
+#define BASE4 ((signed int) -1 < 0 ? -1026 : 373)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned char a[N], b[N];
+  signed char c[N], d[N];
+  signed short e[N], f[N];
+  signed int g[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE3 + i * 2;
+      d[i] = BASE3 + OFFSET + i * 3;
+      e[i] = BASE4 + i * 6;
+      f[i] = BASE4 + OFFSET + i * 5;
+      g[i] = i;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d, e, f, g, 17) != f1_novec (0x12345, a, b, c, d, e, f, g, 17))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_udot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_hi } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c
new file mode 100644
index 00000000000..0e76536925e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c
@@ -0,0 +1,74 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 100
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 signed
+#endif
+
+#define FN(name, S1, S2) \
+S1 int __attribute__ ((noipa)) \
+name (S1 int res, \
+      S2 char *restrict a, \
+      S2 char *restrict b, \
+      S2 short *restrict c, \
+      S2 int *restrict d, \
+      S1 int cst1, \
+      S1 int cst2) \
+{ \
+  for (int i = 0; i < N / 2; ++i) \
+    { \
+      res += ~((a[2 * i + 0] * b[2 * i + 0] + 1) << 3) \
+             - (c[2 * i + 0] + cst1) * cst2 + d[2 * i + 0]; \
+      res += ~((a[2 * i + 1] * b[2 * i + 1] + 1) << 3) \
+             - (c[2 * i + 1] + cst1) * cst2 + d[2 * i + 1]; \
+    } \
+  \
+  return res; \
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define BASE3 ((signed int) -1 < 0 ? -1026 : 373)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed short c[N];
+  signed int d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE3 + i * 6;
+      d[i] = i;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d, -5, 17) != f1_novec (0x12345, a, b, c, d, -5, 17))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_hi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 02f6b942026..bb037af0b68 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1029,54 +1029,6 @@ vect_convert_output (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
   return pattern_stmt;
 }

-/* Return true if STMT_VINFO describes a reduction for which reassociation
-   is allowed.  If STMT_INFO is part of a group, assume that it's part of
-   a reduction chain and optimistically assume that all statements
-   except the last allow reassociation.
-   Also require it to have code CODE and to be a reduction
-   in the outermost loop.  When returning true, store the operands in
-   *OP0_OUT and *OP1_OUT.  */
-
-static bool
-vect_reassociating_reduction_p (vec_info *vinfo,
-                                stmt_vec_info stmt_info, tree_code code,
-                                tree *op0_out, tree *op1_out)
-{
-  loop_vec_info loop_info = dyn_cast <loop_vec_info> (vinfo);
-  if (!loop_info)
-    return false;
-
-  /* As a candidate of lane-reducing pattern matching, the statement must
-     be inside affine closure of loop reduction.  */
-  if (!(stmt_info->reduc_pattern_status & rpatt_allow))
-    return false;
-
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign || gimple_assign_rhs_code (assign) != code)
-    return false;
-
-  /* We don't allow changing the order of the computation in the inner-loop
-     when doing outer-loop vectorization.  */
-  class loop *loop = LOOP_VINFO_LOOP (loop_info);
-  if (loop && nested_in_vect_loop_p (loop, stmt_info))
-    return false;
-
-  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
-    {
-      if (needs_fold_left_reduction_p (TREE_TYPE (gimple_assign_lhs (assign)),
-                                       code))
-        return false;
-    }
-  else if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) == NULL)
-    return false;
-
-  *op0_out = gimple_assign_rhs1 (assign);
-  *op1_out = gimple_assign_rhs2 (assign);
-  if (commutative_tree_code (code) && STMT_VINFO_REDUC_IDX (stmt_info) == 0)
-    std::swap (*op0_out, *op1_out);
-  return true;
-}
-
 /* match.pd function to match
    (cond (cmp@3 a b) (convert@1 c) (convert@2 d)) with conditions:
@@ -1189,96 +1141,60 @@ vect_recog_cond_expr_convert_pattern (vec_info *vinfo,
      S3  x_T = (TYPE1) x_t;
      S4  y_T = (TYPE1) y_t;
      S5  prod = x_T * y_T;
-     [S6  prod = (TYPE2) prod;  #optional]
-     S7  sum_1 = prod + sum_0;
+     [S6+ value = affine_fn (prod, ...);  #optional]
+     S7  sum_1 = value + sum_0;

-   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
-   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
-   'type1a' and 'type1b' can differ.
+   There exists a natural widening conversion from both 'type1a' and 'type1b'
+   to 'TYPE1'.  The function 'affine_fn' represents a linear transform in
+   concept of math, and may be composed by a series of statements.

    Input:

    * STMT_VINFO: The stmt from which the pattern search begins.  In the
-   example, when this function is called with S7, the pattern {S3,S4,S5,S6,S7}
-   will be detected.
+   example, when this function is called with S5, the pattern {S3,S4,S5} will
+   be detected if S5 is known to be in affine closure of reduction for 'sum'.
 
    Output:
 
-   * TYPE_OUT: The type of the output of this pattern.
+   * TYPE_OUT: The type of the output of this pattern.
 
    * Return value: A new stmt that will be used to replace the sequence of
    stmts that constitute the pattern.  In this case it will be:
-        WIDEN_DOT_PRODUCT
+        DOT_PROD_EXPR
 
    Note: The dot-prod idiom is a widening reduction pattern that is
-         vectorized without preserving all the intermediate results.  It
-         produces only N/2 (widened) results (by summing up pairs of
-         intermediate results) rather than all N results.  Therefore, we
-         cannot allow this pattern when we want to get all the results and in
-         the correct order (as is the case when this computation is in an
-         inner-loop nested in an outer-loop that us being vectorized).  */
+   vectorized without preserving all the intermediate results.  It
+   produces fewer than N (widened) results (by summing up pairs of
+   intermediate results) rather than all N results.  Therefore, we
+   cannot allow this pattern when we want to get all the results and in
+   the correct order (as is the case when this computation is in an
+   inner-loop nested in an outer-loop that is being vectorized).  */
 
 static gimple *
 vect_recog_dot_prod_pattern (vec_info *vinfo,
                              stmt_vec_info stmt_vinfo, tree *type_out)
 {
-  tree oprnd0, oprnd1;
-  gimple *last_stmt = stmt_vinfo->stmt;
-  tree type, half_type;
-  gimple *pattern_stmt;
-  tree var;
-
-  /* Look for the following pattern
-          DX = (TYPE1) X;
-          DY = (TYPE1) Y;
-          DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
-          sum_1 = DDPROD + sum_0;
-     In which
-     - DX is double the size of X
-     - DY is double the size of Y
-     - DX, DY, DPROD all have the same type but the sign
-       between X, Y and DPROD can differ.
-     - sum is the same size of DPROD or bigger
-     - sum has been recognized as a reduction variable.
-
-     This is equivalent to:
-       DPROD = X w* Y;          #widen mult
-       sum_1 = DPROD w+ sum_0;  #widen summation
-     or
-       DPROD = X w* Y;          #widen mult
-       sum_1 = DPROD + sum_0;   #summation
-   */
-
-  /* Starting from LAST_STMT, follow the defs of its uses in search
-     of the above pattern.  */
-
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-                                       &oprnd0, &oprnd1))
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
     return NULL;
 
-  type = TREE_TYPE (gimple_get_lhs (last_stmt));
-
+  gimple *last_stmt = stmt_vinfo->stmt;
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
+  tree half_type;
   vect_unpromoted_value unprom_mult;
-  oprnd0 = vect_look_through_possible_promotion (vinfo, oprnd0, &unprom_mult);
 
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that oprnd1 is the reduction variable (defined by a loop-header
-     phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Left to check that oprnd0 is defined by a (widen_)mult_expr  */
-  if (!oprnd0)
+  value = vect_look_through_possible_promotion (vinfo, value, &unprom_mult);
+  if (!value)
     return NULL;
 
-  stmt_vec_info mult_vinfo = vect_get_internal_def (vinfo, oprnd0);
+  stmt_vec_info mult_vinfo = vect_get_internal_def (vinfo, value);
   if (!mult_vinfo)
     return NULL;
 
-  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
-     inside the loop (in case we are analyzing an outer-loop).  */
-  vect_unpromoted_value unprom0[2];
+  vect_unpromoted_value unprom[2];
   enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-                             false, 2, unprom0, &half_type, &subtype))
+                             false, 2, unprom, &half_type, &subtype))
     return NULL;
 
   /* If there are two widening operations, make sure they agree on the sign
@@ -1318,16 +1234,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-                       unprom0, half_vectype, subtype);
-
-  var = vect_recog_temp_ssa_var (type, NULL);
-  pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-                                      mult_oprnd[0], mult_oprnd[1], oprnd1);
+                       unprom, half_vectype, subtype);
 
+  tree var = vect_recog_temp_ssa_var (type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
+                                              mult_oprnd[0], mult_oprnd[1],
+                                              build_zero_cst (type));
   return pattern_stmt;
 }
 
-
 /* Function vect_recog_sad_pattern
 
    Try to find the following Sum of Absolute Difference (SAD) pattern:
@@ -1343,18 +1258,20 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      S4  y_T = (TYPE1) y_t;
      S5  diff = x_T - y_T;
      S6  abs_diff = ABS_EXPR <diff>;
-     [S7  abs_diff = (TYPE2) abs_diff;  #optional]
-     S8  sum_1 = abs_diff + sum_0;
+     [S7+ value = affine_fn (abs_diff, ...);  #optional]
+     S8  sum_1 = value + sum_0;
 
    where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger.  This is a special case of a reduction
-   computation.
+   same size of 'TYPE1' or bigger.  The function 'affine_fn' represents a
+   linear transform in the mathematical sense, and may be composed of a series
+   of statements.  This is a special case of a reduction computation.
 
    Input:
 
    * STMT_VINFO: The stmt from which the pattern search begins.  In the
-   example, when this function is called with S8, the pattern
-   {S3,S4,S5,S6,S7,S8} will be detected.
+   example, when this function is called with S6, the pattern {S3,S4,S5,S6}
+   will be detected if S6 is known to be in affine closure of reduction for
+   'sum'.
 
    Output:
 
@@ -1362,49 +1279,24 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
    * Return value: A new stmt that will be used to replace the sequence of
    stmts that constitute the pattern.  In this case it will be:
-        SAD_EXPR
+        SAD_EXPR
   */
 
 static gimple *
 vect_recog_sad_pattern (vec_info *vinfo,
                         stmt_vec_info stmt_vinfo, tree *type_out)
 {
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
+    return NULL;
+
   gimple *last_stmt = stmt_vinfo->stmt;
   tree half_type;
 
-  /* Look for the following pattern
-          DX = (TYPE1) X;
-          DY = (TYPE1) Y;
-          DDIFF = DX - DY;
-          DAD = ABS_EXPR <DDIFF>;
-          DDPROD = (TYPE2) DPROD;
-          sum_1 = DAD + sum_0;
-     In which
-     - DX is at least double the size of X
-     - DY is at least double the size of Y
-     - DX, DY, DDIFF, DAD all have the same type
-     - sum is the same size of DAD or bigger
-     - sum has been recognized as a reduction variable.
-
-     This is equivalent to:
-       DDIFF = X w- Y;          #widen sub
-       DAD = ABS_EXPR <DDIFF>;
-       sum_1 = DAD w+ sum_0;    #widen summation
-     or
-       DDIFF = X w- Y;          #widen sub
-       DAD = ABS_EXPR <DDIFF>;
-       sum_1 = DAD + sum_0;     #summation
-   */
-
   /* Starting from LAST_STMT, follow the defs of its uses in search
      of the above pattern.  */
 
-  tree plus_oprnd0, plus_oprnd1;
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-                                       &plus_oprnd0, &plus_oprnd1))
-    return NULL;
-
-  tree sum_type = TREE_TYPE (gimple_get_lhs (last_stmt));
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
 
   /* Any non-truncating sequence of conversions is OK here, since
      with a successful match, the result of the ABS(U) is known to fit
@@ -1412,23 +1304,15 @@ vect_recog_sad_pattern (vec_info *vinfo,
      negative of the minimum signed value due to the range of the widening
      MINUS_EXPR.)  */
   vect_unpromoted_value unprom_abs;
-  plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
-                                                      &unprom_abs);
-
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that plus_oprnd1 is the reduction variable (defined by a loop-header
-     phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Then check that plus_oprnd0 is defined by an abs_expr.  */
 
-  if (!plus_oprnd0)
+  value = vect_look_through_possible_promotion (vinfo, value, &unprom_abs);
+  if (!value)
     return NULL;
 
-  stmt_vec_info abs_stmt_vinfo = vect_get_internal_def (vinfo, plus_oprnd0);
+  stmt_vec_info abs_stmt_vinfo = vect_get_internal_def (vinfo, value);
   if (!abs_stmt_vinfo)
     return NULL;
 
-  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
-     inside the loop (in case we are analyzing an outer-loop).  */
   gassign *abs_stmt = dyn_cast <gassign *> (abs_stmt_vinfo->stmt);
   vect_unpromoted_value unprom[2];
 
@@ -1467,22 +1351,22 @@ vect_recog_sad_pattern (vec_info *vinfo,
                              unprom, NULL))
     return NULL;
 
-  vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
-
   tree half_vectype;
-  if (!vect_supportable_direct_optab_p (vinfo, sum_type, SAD_EXPR, half_type,
+  if (!vect_supportable_direct_optab_p (vinfo, type, SAD_EXPR, half_type,
                                         type_out, &half_vectype))
     return NULL;
 
+  vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
+
   /* Get the inputs to the SAD_EXPR in the appropriate types.  */
   tree sad_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, sad_oprnd, half_type,
                        unprom, half_vectype);
 
-  tree var = vect_recog_temp_ssa_var (sum_type, NULL);
+  tree var = vect_recog_temp_ssa_var (type, NULL);
   gimple *pattern_stmt = gimple_build_assign (var, SAD_EXPR, sad_oprnd[0],
-                                              sad_oprnd[1], plus_oprnd1);
-
+                                              sad_oprnd[1],
+                                              build_zero_cst (type));
   return pattern_stmt;
 }
 
@@ -2492,30 +2376,35 @@ vect_recog_pow_pattern (vec_info *vinfo,
      TYPE x_T, sum = init;
    loop:
      sum_0 = phi <init, sum_1>
-     S1  x_t = *p;
+     S1  x_t = ...;
      S2  x_T = (TYPE) x_t;
-     S3  sum_1 = x_T + sum_0;
+     [S3+ value = affine_fn (x_T, ...);  #optional]
+     S4  sum_1 = value + sum_0;
 
    where type 'TYPE' is at least double the size of type 'type', i.e - we're
-   summing elements of type 'type' into an accumulator of type 'TYPE'. This is
-   a special case of a reduction computation.
+   summing elements of type 'type' into an accumulator of type 'TYPE'.  The
+   function 'affine_fn' represents a linear transform in the mathematical
+   sense, and may be composed of a series of statements.  This is a special
+   case of a reduction computation.
 
    Input:
 
    * STMT_VINFO: The stmt from which the pattern search begins. In the example,
-   when this function is called with S3, the pattern {S2,S3} will be detected.
+   when this function is called with S2, the pattern {S2} will be detected if
+   S2 is known to be in affine closure of reduction for 'sum'.
 
    Output:
 
    * TYPE_OUT: The type of the output of this pattern.
 
    * Return value: A new stmt that will be used to replace the sequence of
-   stmts that constitute the pattern. In this case it will be:
-        WIDEN_SUM
+   stmts that constitute the pattern.  In this case it will be
+   WIDEN_SUM_EXPR if the operation is supported by the target, otherwise,
+   DOT_PROD_EXPR if dot-product could be used.
 
    Note: The widening-sum idiom is a widening reduction pattern that is
          vectorized without preserving all the intermediate results. It
-         produces only N/2 (widened) results (by summing up pairs of
+         produces fewer than N (widened) results (by summing up pairs of
          intermediate results) rather than all N results.  Therefore, we
          cannot allow this pattern when we want to get all the results and in
          the correct order (as is the case when this computation is in an
@@ -2525,49 +2414,42 @@ static gimple *
 vect_recog_widen_sum_pattern (vec_info *vinfo,
                               stmt_vec_info stmt_vinfo, tree *type_out)
 {
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
+    return NULL;
+
   gimple *last_stmt = stmt_vinfo->stmt;
-  tree oprnd0, oprnd1;
-  tree type;
-  gimple *pattern_stmt;
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
+  gimple *pattern_stmt = NULL;
+  vect_unpromoted_value unprom;
   tree var;
 
-  /* Look for the following pattern
-          DX = (TYPE) X;
-          sum_1 = DX + sum_0;
-     In which DX is at least double the size of X, and sum_1 has been
-     recognized as a reduction variable.
-   */
-
-  /* Starting from LAST_STMT, follow the defs of its uses in search
-     of the above pattern.  */
-
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-                                       &oprnd0, &oprnd1)
-      || TREE_CODE (oprnd0) != SSA_NAME
-      || !vinfo->lookup_def (oprnd0))
+  /* Check that value is defined by a widening cast.  */
+  if (!vect_look_through_possible_promotion (vinfo, value, &unprom)
+      || TYPE_PRECISION (unprom.type) * 2 > TYPE_PRECISION (type))
     return NULL;
 
-  type = TREE_TYPE (gimple_get_lhs (last_stmt));
-
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that oprnd1 is the reduction variable (defined by a loop-header
-     phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Left to check that oprnd0 is defined by a cast from type 'type' to type
-     'TYPE'.  */
-
-  vect_unpromoted_value unprom0;
-  if (!vect_look_through_possible_promotion (vinfo, oprnd0, &unprom0)
-      || TYPE_PRECISION (unprom0.type) * 2 > TYPE_PRECISION (type))
+  /* TODO: Support widening-sum on boolean value.  */
+  if (TREE_CODE (unprom.type) != INTEGER_TYPE)
     return NULL;
 
-  vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
-
-  if (!vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
-                                        unprom0.type, type_out))
-    return NULL;
+  if (vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
+                                       unprom.type, type_out))
+    {
+      var = vect_recog_temp_ssa_var (type, NULL);
+      pattern_stmt = gimple_build_assign (var, WIDEN_SUM_EXPR, unprom.op,
+                                          build_zero_cst (type));
+    }
+  else if (vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
+                                            unprom.type, type_out))
+    {
+      var = vect_recog_temp_ssa_var (type, NULL);
+      pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR, unprom.op,
+                                          build_one_cst (unprom.type),
+                                          build_zero_cst (type));
+    }
 
-  var = vect_recog_temp_ssa_var (type, NULL);
-  pattern_stmt = gimple_build_assign (var, WIDEN_SUM_EXPR, unprom0.op, oprnd1);
+  vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
 
   return pattern_stmt;
 }
@@ -7191,8 +7073,18 @@ struct vect_recog_func
 
 /* Note that ordering matters - the first pattern matching on a stmt is
    taken which means usually the more complex one needs to preceed the
-   less comples onex (widen_sum only after dot_prod or sad for example).  */
+   less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+
+  /* Lane-reducing patterns (dot_prod/sad/widen_sum) are not local
+     statement-based patterns, in that they require knowledge of the loop
+     structure.  These patterns are expected to benefit loop vectorization
+     much more than peephole-like patterns, so give lane-reducing patterns
+     overriding priority.  */
+  { vect_recog_dot_prod_pattern, "dot_prod" },
+  { vect_recog_sad_pattern, "sad" },
+  { vect_recog_widen_sum_pattern, "widen_sum" },
+
   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
   { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_abd_pattern, "abd" },
@@ -7204,9 +7096,6 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mulhs_pattern, "mult_high" },
   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
   { vect_recog_widen_mult_pattern, "widen_mult" },
-  { vect_recog_dot_prod_pattern, "dot_prod" },
-  { vect_recog_sad_pattern, "sad" },
-  { vect_recog_widen_sum_pattern, "widen_sum" },
   { vect_recog_pow_pattern, "pow" },
   { vect_recog_popcount_clz_ctz_ffs_pattern, "popcount_clz_ctz_ffs" },
   { vect_recog_ctz_ffs_pattern, "ctz_ffs" },
-- 
2.17.1

From patchwork Sun Jul 21 09:15:56 2024
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 94290
From: Feng Xue OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, Tamar Christina, Richard Sandiford
Subject: [RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation
Date: Sun, 21 Jul 2024 09:15:56 +0000

This patch adds a pattern that folds a summation into the last operand of a
lane-reducing operation when appropriate. It supplements the
operation-specific patterns for dot-prod/sad/widen-sum.

    sum = lane-reducing-op(..., 0) + value;
=>
    sum = lane-reducing-op(..., value);

Thanks,
Feng
---
gcc/
        * tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern): New
        pattern function.
        (vect_vect_recog_func_ptrs): Add the new pattern function.
        * params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
        * gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

From 94d34da8de2fd479c81e8398544466e6ffe7fdfc Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 22 May 2024 17:08:32 +0800
Subject: [PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing
 operation

This patch adds a pattern that folds a summation into the last operand of a
lane-reducing operation when appropriate. It supplements the
operation-specific patterns for dot-prod/sad/widen-sum.

    sum = lane-reducing-op(..., 0) + value;
=>
    sum = lane-reducing-op(..., value);

2024-05-22  Feng Xue

gcc/
        * tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern): New
        pattern function.
        (vect_vect_recog_func_ptrs): Add the new pattern function.
        * params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
        * gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

diff --git a/gcc/params.opt b/gcc/params.opt
index c17ba17b91b..b94bdc26cbd 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-lane-reducing-accum-pattern=
+Common Joined UInteger Var(param_vect_lane_reducing_accum_pattern) Init(2) IntegerRange(0, 2) Param Optimization
+Control the pattern that folds a plus into a lane-reducing operation.  If the value is 2, allow this for all statements; if 1, only for reduction statements; otherwise, disable it.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
new file mode 100644
index 00000000000..80a2c4f047e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
@@ -0,0 +1,61 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)                        \
+S1 int __attribute__ ((noipa))                  \
+name (S1 int res,                               \
+      S2 char *restrict a,                      \
+      S2 char *restrict b,                      \
+      S2 char *restrict c,                      \
+      S2 char *restrict d)                      \
+{                                               \
+  for (int i = 0; i < N; i++)                   \
+    res += a[i] * b[i];                         \
+                                                \
+  asm volatile ("" ::: "memory");               \
+  for (int i = 0; i < N; ++i)                   \
+    res += (a[i] * b[i] + c[i] * d[i]) << 3;    \
+                                                \
+  return res;                                   \
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed char c[N], d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE2 + i * 6;
+      d[i] = BASE2 + OFFSET + i * 5;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d) != f1_novec (0x12345, a, b, c, d))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_lane_reducing_accum_pattern: detected" "vect" { target { vect_sdot_qi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index bb037af0b68..9a6b16532e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1490,6 +1490,111 @@ vect_recog_abd_pattern (vec_info *vinfo,
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
 }
 
+/* Function vect_recog_lane_reducing_accum_pattern
+
+   Try to fold a summation into the last operand of lane-reducing operation.
+
+     sum = lane-reducing-op(..., 0) + value;
+
+   A lane-reducing operation has two aspects: the main primitive operation
+   and the accompanying result accumulation.  Pattern matching for the former
+   is handled by the specific patterns for dot-prod/sad/widen-sum; this
+   function is in charge of the latter.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern, that is:
+     sum = lane-reducing-op(..., value);
+*/
+
+static gimple *
+vect_recog_lane_reducing_accum_pattern (vec_info *vinfo,
+                                        stmt_vec_info stmt_vinfo,
+                                        tree *type_out)
+{
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_formed))
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 0)
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 1)
+    {
+      /* Only allow combining for loop reduction statement.  */
+      if (STMT_VINFO_REDUC_IDX (stmt_vinfo) < 0)
+        return NULL;
+    }
+
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt)
+      || gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
+    return NULL;
+
+  gimple *lane_reducing_stmt = NULL;
+  tree sum_oprnd = NULL_TREE;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      tree oprnd = gimple_op (last_stmt, i + 1);
+      vect_unpromoted_value unprom;
+      bool single_use_p = true;
+
+      if (!vect_look_through_possible_promotion (vinfo, oprnd, &unprom,
+                                                 &single_use_p)
+          || !single_use_p)
+        continue;
+
+      stmt_vec_info oprnd_vinfo = vect_get_internal_def (vinfo, unprom.op);
+
+      if (!oprnd_vinfo)
+        continue;
+
+      gimple *stmt = oprnd_vinfo->stmt;
+
+      if (lane_reducing_stmt_p (stmt)
+          && integer_zerop (gimple_op (stmt, gimple_num_ops (stmt) - 1)))
+        {
+          lane_reducing_stmt = stmt;
+          sum_oprnd = gimple_op (last_stmt, 2 - i);
+          break;
+        }
+    }
+
+  if (!lane_reducing_stmt)
+    return NULL;
+
+  tree type = TREE_TYPE (gimple_get_lhs (last_stmt));
+
+  *type_out = get_vectype_for_scalar_type (vinfo, type);
+  if (!*type_out)
+    return NULL;
+
+  vect_pattern_detected ("vect_recog_lane_reducing_accum_pattern", last_stmt);
+
+  tree var = vect_recog_temp_ssa_var (type, NULL);
+  enum tree_code code = gimple_assign_rhs_code (lane_reducing_stmt);
+  gimple *pattern_stmt;
+
+  if (code == WIDEN_SUM_EXPR)
+    pattern_stmt = gimple_build_assign (var, code,
+                                        gimple_op (lane_reducing_stmt, 1),
+                                        sum_oprnd);
+  else
+    pattern_stmt = gimple_build_assign (var, code,
+                                        gimple_op (lane_reducing_stmt, 1),
+                                        gimple_op (lane_reducing_stmt, 2),
+                                        sum_oprnd);
+  return pattern_stmt;
+}
+
 /* Recognize an operation that performs ORIG_CODE on widened inputs,
    so that it can be treated as though it had the form:
 
@@ -7084,6 +7189,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_dot_prod_pattern, "dot_prod" },
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
+  { vect_recog_lane_reducing_accum_pattern, "lane_reducing_accum" },
 
   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
   { vect_recog_bit_insert_pattern, "bit_insert" },
-- 
2.17.1