From patchwork Thu Jun 9 04:39:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 54971
Date: Thu, 9 Jun 2022 05:39:18 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Tamar Christina via Gcc-patches
From: Tamar Christina
Reply-To: Tamar Christina
Cc: richard.sandiford@arm.com, nd@arm.com, rguenther@suse.de

Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds an optab to allow us to emit an optimized sequence when doing
an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y) / 2) - 1)

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* internal-fn.def (DIV_POW2_BITMASK): New.
	* optabs.def (udiv_pow2_bitmask_optab): New.
	* doc/md.texi: Document it.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-1.c: New test.
	* gcc.dg/vect/vect-div-bitmask-2.c: New test.
	* gcc.dg/vect/vect-div-bitmask-3.c: New test.
	* gcc.dg/vect/vect-div-bitmask.h: New file.

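For readers who want a concrete picture of the division described above, here is a minimal sketch in plain C. It is not part of the patch. The helper names (`is_half_width_bitmask`, `div_by_0xff`, `self_check`) are hypothetical, and the shift-and-add identity is a commonly used rewrite for division by 255, shown here on the assumption that a target might choose something along these lines; the patch itself does not say which sequence a target emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true when DIVISOR equals 2^(PREC / 2) - 1, i.e. the
   all-ones mask covering the low half of a PREC-bit value.  This mirrors the
   shape of divisor the patch description talks about.  */
int
is_half_width_bitmask (uint64_t divisor, unsigned prec)
{
  unsigned half = prec / 2;
  return half > 0 && half < 64 && divisor == (UINT64_C (1) << half) - 1;
}

/* Hypothetical helper: divide a 16-bit value by 0xff using only additions
   and shifts.  The sums are done in 32 bits so that no carry is lost.  */
uint16_t
div_by_0xff (uint16_t x)
{
  uint32_t wide = x;
  uint32_t t = (wide + 257) >> 8;	/* Either x / 255 or x / 255 + 1.  */
  return (uint16_t) ((wide + t) >> 8);
}

/* Compare the sketch against an ordinary division for every 16-bit input.  */
void
self_check (void)
{
  assert (is_half_width_bitmask (0xff, 16));
  assert (is_half_width_bitmask (0xffff, 32));
  assert (!is_half_width_bitmask (0xff, 32));

  for (uint32_t x = 0; x <= UINT16_MAX; ++x)
    assert (div_by_0xff ((uint16_t) x) == (uint16_t) (x / 0xffu));
}
```

Calling `self_check ()` from a test driver exercises all 65536 inputs; the identity holds because the correction term differs from the true quotient by at most one and the remainder never carries into the next byte.
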
--- inline copy of patch --
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7b4f6e945866c38 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5588,6 +5588,18 @@ signed op0, op1;
 op0 = op1 / (1 << imm);
 @end smallexample
 
+@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
+@item @samp{udiv_pow2_bitmask@var{m2}}
+@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
+@itemx @samp{udiv_pow2_bitmask@var{m2}}
+Unsigned vector division by an immediate that is equivalent to
+@samp{2^(bitsize(m) / 2) - 1}.
+@smallexample
+unsigned short op0; op1;
+@dots{}
+op0 = op1 / 0xffU;
+@end smallexample
+
 @cindex @code{vec_shl_insert_@var{m}} instruction pattern
 @item @samp{vec_shl_insert_@var{m}}
 Shift the elements in vector input operand 1 left one element (i.e.@:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index d2d550d358606022b1cb44fa842f06e0be507bc3..a3e3cc1520f77683ebf6256898f916ed45de475f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -159,6 +159,8 @@ DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
 		       vec_shl_insert, binary)
 
 DEF_INTERNAL_OPTAB_FN (DIV_POW2, ECF_CONST | ECF_NOTHROW, sdiv_pow2, binary)
+DEF_INTERNAL_OPTAB_FN (DIV_POW2_BITMASK, ECF_CONST | ECF_NOTHROW,
+		       udiv_pow2_bitmask, unary)
 
 DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 801310ebaa7d469520809bb7efed6820f8eb866b..3f0ac05ef5ad5aed8d6ca391f4eed71b0494e17f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -372,6 +372,7 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
 OPTAB_D (umulhs_optab, "umulhs$a3")
 OPTAB_D (umulhrs_optab, "umulhrs$a3")
 OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
+OPTAB_D (udiv_pow2_bitmask_optab, "udiv_pow2_bitmask$a2")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a7ea3cce4764239c5d281a8f0bead1f6a452de3f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint8_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..009e16e1b36497e5724410d9843f1ce122b26dda
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint16_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..bf35a0bda8333c418e692d94220df849cc47930b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
new file mode 100644
index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -0,0 +1,43 @@
+#include <stdio.h>
+
+#ifndef N
+#define N 65
+#endif
+
+#ifndef TYPE
+#define TYPE uint32_t
+#endif
+
+#ifndef DEBUG
+#define DEBUG 0
+#endif
+
+#define BASE ((TYPE) -1 < 0 ? -126 : 4)
+
+int main ()
+{
+  TYPE a[N];
+  TYPE b[N];
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 13;
+      b[i] = BASE + i * 13;
+      if (DEBUG)
+        printf ("%d: 0x%x\n", i, a[i]);
+    }
+
+  fun1 (a, N / 2, N);
+  fun2 (b, N / 2, N);
+
+  for (int i = 0; i < N; ++i)
+    {
+      if (DEBUG)
+        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
+
+      if (a[i] != b[i])
+        __builtin_abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 217bdfd7045a22578a35bb891a4318d741071872..a738558cb8d12296bff462d716310ca8d82957b5 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3558,6 +3558,33 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 
           return pattern_stmt;
         }
+      else if ((TYPE_UNSIGNED (itype) || tree_int_cst_sgn (oprnd1) != 1)
+               && rhs_code != TRUNC_MOD_EXPR)
+        {
+          wide_int icst = wi::to_wide (oprnd1);
+          wide_int val = wi::add (icst, 1);
+          int pow = wi::exact_log2 (val);
+          if (pow == (prec / 2))
+            {
+              /* Pattern detected.  */
+              vect_pattern_detected ("vect_recog_divmod_pattern", last_stmt);
+
+              *type_out = vectype;
+
+              /* Check if the target supports this internal function.  */
+              internal_fn ifn = IFN_DIV_POW2_BITMASK;
+              if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+                {
+                  tree var_div = vect_recog_temp_ssa_var (itype, NULL);
+                  gimple *div_stmt = gimple_build_call_internal (ifn, 1, oprnd0);
+                  gimple_call_set_lhs (div_stmt, var_div);
+
+                  gimple_set_location (div_stmt, gimple_location (last_stmt));
+
+                  return div_stmt;
+                }
+            }
+        }
 
   if (prec > HOST_BITS_PER_WIDE_INT
       || integer_zerop (oprnd1))

From patchwork Thu Jun 9 04:40:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 54972
Date: Thu, 9 Jun 2022 05:40:43 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/2]AArch64 aarch64: Add implementation for pow2 bitmask division.
X-MS-Exchange-Transport-CrossTenantHeadersStamped: 
VI1PR0802MB2143 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT035.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: f7061311-71a8-4486-5d2e-08da49d237f5 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 22XCzJ6//+CRRW/EwgcBwmBFLEtH5JjuBbkUV07mLY7WWrvqQvhzIDqF8fCN7oTupzsLZwejkk2qXcxPz8yPXA6TD6D2VjaMjgqMy/XKPeCoCWHx/zvhP7hXrzYleROMJOAMAh8teicUsm8m9BacqxCX4yMsXMaOOrxYJNNWiIwJWvvt7R9IkBx/B5n3l12sjihSEJruWhi26DE43jlfZVdpd3O9/GyXVryWaj4ihECX/RhQStzhhZEwi3Y4KkjiOeL8UKXB+IlHH+/xQ55Z4FTWxlk2q2+vVy3VTU2F01lQE0Hrx/m6RvwRlVPEP9olCfvLnXily5xz8Gy+YRz42OtCtia3C8+sMvG1GQ9r7p5ZS73WbB9T6GpICkQ9Kyv3sX399KBIPaKs+xNBXJpfMhpSvaK37V1rHfw1QQv6c8eSSnrj6mX5yKJKlse7zekCOwnobLZGdg/wEGrRhMoipsZBVw0arAXPGpXwBMNqKi2MgBjgJLZZwjtmFUYbuqlqm3/FsgO2YpH8791hVQpPg3r9cCcXGkkVNGobM2dJEh3Van5Eq6I6tyXT68xc2naHVzof5kRnuxP+rK74dmyTuY6R14oXYzX57qGGgIhFugga68a7H5reVRuv0ITrc3hWnHgFV1Anh5k0YBAoEWwGoHtcsM0vA2mvHHoe6zqieSTz+l+MTiyN5A8H0PHzIvKvsWLObYCGQksqZwSTFmEDq+XOC02UwlsaZpNyB2xWlrVieAcwK/4UtFSZH/dgkaCMAKKVG5Kk89wPYa5AJ9w8iL3Rz8oF8uKljmjsDwc/B0M7YB4QpYKpV/yP5xXAVO25 X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230001)(4636009)(36840700001)(46966006)(40470700004)(36756003)(70206006)(70586007)(356005)(84970400001)(82310400005)(6916009)(8676002)(36860700001)(4326008)(81166007)(316002)(8936002)(5660300002)(4743002)(235185007)(44144004)(83380400001)(6512007)(6506007)(26005)(86362001)(44832011)(6486002)(336012)(186003)(33964004)(2616005)(40460700003)(508600001)(2906002)(47076005)(67856001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Jun 2022 
04:41:03.2864 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 1ef572d8-4732-43ff-42d2-08da49d24284 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT035.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB7PR08MB3321 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches"

Hi All,

This adds an implementation for the new optab for unsigned pow2 bitmask division on AArch64.

The implementation rewrites:

   x = y / (2 ^ (bitsize (y)/2) - 1)

into e.g. (for bytes)

   (x + ((x + 257) >> 8)) >> 8

where it's required that the additions be done in double the precision of x such that we don't lose any bits during an overflow.

Essentially the sequence decomposes the division into two smaller divisions, one each for the top and bottom parts of the number, and adds the results back together.
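
As a quick sanity check of the rewrite above, here is a minimal C sketch that compares the shift-and-add form against a plain division by 255 for every 16-bit input. The function names are made up for this note and are not part of the patch; the arithmetic is done in 32 bits on the assumption that this is enough to stand in for "double the precision" of a 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add form quoted in the patch description.  Evaluated in
   32 bits so that no intermediate sum can wrap (an assumption of this
   sketch; the name is hypothetical).  */
uint32_t div255_shift_add (uint32_t x)
{
  return (x + ((x + 257) >> 8)) >> 8;
}

/* Count the 16-bit inputs for which the shift-and-add form disagrees
   with a real division by 255.  */
unsigned count_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t x = 0; x <= 0xffff; x++)
    if (div255_shift_add (x) != x / 255)
      mismatches++;
  return mismatches;
}

/* Scalar model of one iteration of the pixel-scaling loop: widen,
   multiply, then divide by 255 without a division instruction.  */
uint8_t scale_pixel (uint8_t pixel, uint8_t level)
{
  return (uint8_t) div255_shift_add ((uint32_t) pixel * level);
}

/* Self-check: aborts if the two forms ever disagree.  */
void self_check (void)
{
  assert (count_mismatches () == 0);
  assert (scale_pixel (255, 255) == 255);
}
```

Calling self_check () from a test harness exercises the whole 16-bit range; it says nothing about how the vector instructions themselves behave, only that the scalar identity holds.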
To account for the fact that a shift by 8 would be a division by 256, we add 1 to both parts of x so that when x is 255 we still get 1 as the answer.

Because the amount we shift by is half the original datatype, we can use the halving instructions the ISA provides to do the operation instead of using actual shifts.

For AArch64 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        movi    v3.16b, 0x1
        umull2  v1.8h, v0.16b, v2.16b
        umull   v0.8h, v0.8b, v2.8b
        addhn   v5.8b, v1.8h, v3.8h
        addhn   v4.8b, v0.8h, v3.8h
        uaddw   v1.8h, v1.8h, v5.8b
        uaddw   v0.8h, v0.8h, v4.8b
        uzp2    v0.16b, v0.16b, v1.16b

instead of:

        umull   v2.8h, v1.8b, v5.8b
        umull2  v1.8h, v1.16b, v5.16b
        umull   v0.4s, v2.4h, v3.4h
        umull2  v2.4s, v2.8h, v3.8h
        umull   v4.4s, v1.4h, v3.4h
        umull2  v1.4s, v1.8h, v3.8h
        uzp2    v0.8h, v0.8h, v2.8h
        uzp2    v1.8h, v4.8h, v1.8h
        shrn    v0.8b, v0.8h, 7
        shrn2   v0.16b, v1.8h, 7

which results in significantly faster code.

Thanks to Wilco for the concept.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (udiv_pow2_bitmask2): New.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/div-by-bitmask.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 18733428f3fb91d937346aa360f6d1fe13ca1eae..6b0405924a03a243949a6741f4c0e989d9ca2869 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4845,6 +4845,57 @@ (define_expand "aarch64_hn2"
   }
 )

+;; div optimizations using narrowings
+;; we can do the division e.g. shorts by 255 faster by calculating it as
+;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+;; double the precision of x.
+;;
+;; If we imagine a short as being composed of two blocks of bytes then
+;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
+;; adding 1 to each sub component:
+;;
+;;      short value of 16-bits
+;; ┌──────────────┬────────────────┐
+;; │              │                │
+;; └──────────────┴────────────────┘
+;;   8-bit part1 ▲  8-bit part2   ▲
+;;               │                │
+;;               │                │
+;;              +1               +1
+;;
+;; after the first addition, we have to shift right by 8, and narrow the
+;; results back to a byte.  Remember that the addition must be done in
+;; double the precision of the input.  Since 8 is half the size of a short
+;; we can use a narrowing halving instruction in AArch64, addhn which also
+;; does the addition in a wider precision and narrows back to a byte.  The
+;; shift itself is implicit in the operation as it writes back only the top
+;; half of the result, i.e. bits 2*esize-1:esize.
+;;
+;; Since we have narrowed the result of the first part back to a byte, for
+;; the second addition we can use a widening addition, uaddw.
+;;
+;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8
+;; to perform the shift.
+;;
+;; The shift is later optimized by combine to a uzp2 with movi #0.
+(define_expand "udiv_pow2_bitmask2"
+  [(match_operand:VQN 0 "register_operand")
+   (match_operand:VQN 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx addend = gen_reg_rtx (mode);
+  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
+  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
+  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
+  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize);
+  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
+  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector));
+  DONE;
+})
+
 ;; pmul.
 (define_insn "aarch64_pmul"
diff --git a/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c
new file mode 100644
index 0000000000000000000000000000000000000000..c03aee695ef834fbe3533a21d54a218160b0007d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99 -fdump-tree-vect -save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+**	umull2	v[0-9]+.8h, v[0-9]+.16b, v[0-9]+.16b
+**	umull	v[0-9]+.8h, v[0-9]+.8b, v[0-9]+.8b
+**	addhn	v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h
+**	addhn	v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h
+**	uaddw	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b
+**	uaddw	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b
+**	uzp2	v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+**	umull2	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
+**	umull	v[0-9]+.4s, v[0-9]+.4h, v[0-9]+.4h
+**	addhn	v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s
+**	addhn	v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s
+**	uaddw	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h
+**	uaddw	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h
+**	uzp2	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+**	umull2	v[0-9]+.2d, v[0-9]+.4s, v[0-9]+.4s
+**	umull	v[0-9]+.2d, v[0-9]+.2s, v[0-9]+.2s
+**	addhn	v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d
+**	addhn	v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d
+**	uaddw	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s
+**	uaddw	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s
+**	uzp2	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+/* Costing for long vectorization seems off, so disable
+   the cost model to test the codegen.  */
+__attribute__ ((optimize("-fno-vect-cost-model")))
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+/* { dg-final { scan-tree-dump-times "\.DIV_POW2_BITMASK" 6 "vect" } } */

From patchwork Fri Sep 23 09:33:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 57954 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 522963856DED for ; Fri, 23 Sep 2022 09:34:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 522963856DED DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1663925674; bh=psNkkfNslKA9dEjO2rJiz6R4YxwjlFbT+xhfF+PHVfg=; h=Date:To:Subject:In-Reply-To:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:From; b=oZnmgOe2snzGLD45H15UwjsqDLbtd5aW56bMMZdalJPoqMqCLJX0MX+zaqcpzLUSb 1sYoc2hlxdWXqVREE3d5H9ujZy/24IEfoeT/A9m1//Bd9hf03S/T1anAq3KgLb9c1N wp00v7xxpUnnO8V98N+obqPkXEJqhemK4U0+LSf4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2063.outbound.protection.outlook.com [40.107.21.63]) by sourceware.org (Postfix) with ESMTPS id 935F13857817 for ; Fri, 23 Sep 2022 09:34:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 935F13857817 ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=Hm7ftjkUV2LTwPj9yQ9S9Vl77wrnlcln5klO7ETHa8mOsuWV4YwySle02avekKQW9E0CldmM+5lcIE80iI4S1wzHwxcTK33KPwE2ghs9Mp+/dzwCd0nFbCTE+ARF6vfdyAo30bk5ahJ/KEpLSW49fxLnPPd4pBcaZX/yJh6UC7xxgNqzphlEq3YVJTc7MJyMj0ZivU/DutGGBGheP0e8RWUFkrZ/01lYCIdrr7hldb3g2i+7uPurkP/UfPLie0pTsZ8Li5lsy9VHc9NQbf118MOHvcQlfkrEM2PWZr0cs3QINUjyy+esYyQ3v9zyxX5buYwAKdJSQ/rCNYAlyNwC8w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=psNkkfNslKA9dEjO2rJiz6R4YxwjlFbT+xhfF+PHVfg=; b=QaDzm4B7DpckwAIyVurJKsnf27rNTBdVn3q5ZQr5WK2jpHEQiPdD52ZDFvRS7En+sZ9WMHaKD1fWSJL849WbnQUvMxsKOzm9RCU0o5zgwIwlWz1MX7ajuvQlV8H9E1eExNKzVtapedEiyzz5BOSaplGP6735CzfzjbU6VyqDlZoc9j3spfwvIj4ZwgfoH0fDp0tr1X3rl8WSFKG+qyxW7lRlFwKmOsN563id8M0YPsH3O+0qjtUfNlPymZavouJQOsjJ8tWvOVmCValgTDlnoPolAcVVJZIZhtxRz8wr+SdYQxVky8Ssvk7HKACaU6NMbGuP4a4Z9aff5T+2Pm6s+w== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) Received: from DB6PR0301CA0005.eurprd03.prod.outlook.com (2603:10a6:4:3e::15) by AS2PR08MB9366.eurprd08.prod.outlook.com (2603:10a6:20b:596::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.16; Fri, 23 Sep 2022 09:33:59 +0000 Received: from DBAEUR03FT015.eop-EUR03.prod.protection.outlook.com (2603:10a6:4:3e:cafe::36) by DB6PR0301CA0005.outlook.office365.com (2603:10a6:4:3e::15) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.20 via Frontend Transport; Fri, 23 Sep 2022 09:33:59 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT015.mail.protection.outlook.com (100.127.142.112) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.14 via Frontend Transport; Fri, 23 Sep 2022 09:33:59 +0000 Received: ("Tessian outbound fc2405f9ecaf:v124"); Fri, 23 Sep 2022 09:33:58 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: abddd346efa8326b X-CR-MTA-TID: 64aa7808 Received: from e46854059a70.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 92EEE0B4-DF30-4A24-B10E-8C65458DD3FC.1; Fri, 23 Sep 2022 09:33:51 +0000 Received: from EUR03-DBA-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id e46854059a70.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 23 Sep 2022 09:33:51 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TI66yvTUcVIcNoK5jKWX4tujCoD8lgjfDmJoIICh4G5Iei0iTMPcK/NA5hGSM31CSS5MzqMvLwj3mckkRYSekbEJuoWfHsTFth/k+GGJqaU3xwKkVnP/QGU5EYqBfXiEHb5yHmgirU4gcDv+7PDLHACJBv9h5oi8x+AoMA7ydHNfciLqZ6wX0dtZAPIOyf8olRSbS5WXQSC1xoYl0p0/RMNI/XWEhwlaV+DI1DEzspH809akclstLEPEKx6VczgSCw8RJnU/du3RCnBkX4QxLVG2t7EZGCKUqS7tKQ+dpq6f+WC96hkVz3qH/Yycx9NDWr5bufEs+1H64FjLmckY+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=psNkkfNslKA9dEjO2rJiz6R4YxwjlFbT+xhfF+PHVfg=; b=PMfZTUkNMJjABAu54h3fTYSP3g0IFJ30rOD1AieBn8gQGfnaltk2pVPDQw+A71mIvsrh510Zos6AOfOi3oyDi8Qb0Spjf5IHOKMm76QN16wgq/a94U3ZXx5LTbhfdynHarsOGVGjtgPG/n6SUNswAdE4e+TltKE3Y2RCgmlIjqKAGK+dQ0dEjPvJ8Z2GgeziVKDWWa1+ODyHq9ZJVn+uujaOUJ4Oxl0SwgkDQPgrvgPjLzM7rSnkEizCJbZuefraWrxttqzThNEsreH7+UTH1teR6FCntIWNhBuO+Fem1qSEPAPGmn5ABBwzww2EeOaCYVBWT+pXk3wyD/QNFr2bjA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by DB9PR08MB8360.eurprd08.prod.outlook.com (2603:10a6:10:3d8::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.20; Fri, 23 Sep 2022 09:33:50 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::6529:66e5:e7d4:1a40]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::6529:66e5:e7d4:1a40%4]) with mapi id 15.20.5632.021; Fri, 23 Sep 2022 09:33:50 +0000 Date: Fri, 23 Sep 2022 10:33:46 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO2P265CA0511.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:13b::18) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|DB9PR08MB8360:EE_|DBAEUR03FT015:EE_|AS2PR08MB9366:EE_ X-MS-Office365-Filtering-Correlation-Id: 7728b3b0-2f0c-4ae3-fdb8-08da9d46be32 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 
X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 9mebUCfGEA2CSGwQMQ2y/ig0Df6Rew68L/cs+zH/hXY/Lm1MTIBxyvnEixXc3qRcve1W0COprgDJbfw9nDiYLlBzq++BQ5SqsziCcAWBRLTMflYxZ9TiiSXqkNXXO9zIRRJAf2CDKSHdB6bxmZAZUf/07nqwpJwmShLkqNEXbJ5nbafaTvCyVX1aH5e8dvkPOh2pb3z35/b+PaLmAVKoqTiLwd1kZYtPvfCcuoUBYwAMN2xpo53Tlt3uCFYlpevzfE27kV1UqJPC4NuUl7PX4I7Z0LfgVZ2PYE5lbsAtOW9CfQJXQCZNUFpaAOdGkbaT6zYaugT8nxFr6GbVh9BXb4y1KoTWM5J7KAYe5nyfuLDD3Qx8efW1+yhWX1loTPvoy9GjW8G2rJP5MDZt7FMbGLxXrcSOb0umxxF92n3kpJr7lDm1VaZvoaL0FOmWVUMjsA+oEiN/sv0zkp98Ga6eYcIB0AEHaKVVSP3+Ffkq49J1bTK2rMutJz0XDpYo4DkxfswbYhRZzeAjEYLBNbgxqyPueqQIMy60F7lYW1D/N0fxXy0FZNvbDtWxUqk0JGSUpPZr8+BDRDObGo7d770pR7823w3xCUPigDknVH5C3Gusq2bMSVovWHVlWFL34zDH+6rUQ6KKAuP+PFXsu45EHAf4zRIN956nBi9it0zs4CRtIWZH20KJsjhVZxFRHVdvXh5aLZ+nqLktB1sGOYDaJrcTLfjtHo0EueI9rGN2ffci25+63FjETw0ozpQI08JsbW3UpFP2SEpXhXzp1ljV4amz/NMma1mTBhQ8IOtY8GJdfn3LL0r7nCz8hcfEySim X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230022)(4636009)(136003)(376002)(346002)(396003)(39860400002)(366004)(451199015)(8936002)(4326008)(66556008)(66476007)(66946007)(38100700002)(36756003)(84970400001)(86362001)(44832011)(5660300002)(2906002)(235185007)(2616005)(186003)(6506007)(4743002)(6512007)(6486002)(33964004)(6666004)(478600001)(41300700001)(8676002)(44144004)(26005)(6916009)(316002)(2700100001)(357404004); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB9PR08MB8360 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT015.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 76b854ae-4073-4cf6-4917-08da9d46b8e6 X-Microsoft-Antispam: BCL:0; 
X-Microsoft-Antispam-Message-Info: GbaeV5CPR5MdwMthocaLH6YZuGcKMfYpE45nU2EA7lRcD7nh9M92UV9AXSSq7UB96x84wWjC0WAGpbBUTXSYn0v2mHj8Pi9tXupaO/FrBDbgOKiFlkTauUSUCbvpVnQ/ZUGvjZXN8itRsRsBwvcp7Efw4PvyOvmAWhZ6iooRLe9Kah9r5AnyMRLCRidrQvEE5ElQB+uubIqIz1I3QFZq8HMAHz+a2E6Mdijmf8qqSn1xNNl3GVrFlOgTMlp5HtEeGh8yqk71RSEjBVcyWJuKXRBw6Tu2wnU473E4OxrxtjixIELrPrgJUR8O5heLSnzjh4GDfSxNVNW/QWp2+QkAwoElu7VO0M7p6P8lI2msD/NRuWhwFx3eGR6YckQuDFQEwqH5wXeWB25VI1SkJJrWuTrne81/BOt0XiAw4X/9M20HzGMJeZqFJWK4lzS2Qtt10XHYZOqjZ1pZxhgm9QbEUo+n6gqu1hyY9tJhWj/fY1/S6DOYEVvT2CETxMkR+HC+dDNXd4YFadLBGsvR+LDQm77Wx3HJ4zJ6wSI/yJFXaw8zRQSFNovC9T5j6pNjlZahz2is+2WD+HPIjAFzcWgSFuJEjAOSVwxQNu5M/Z3QjjbzUwe69pTt3lOR+kH4FSo87vhA4Dq/qSR7ee5v75W/2p2hJZIoNVRAV/u8O1owWWRlqIxxG+Bk8iLVffAKuX/Q/csmBiBaBRB1iRgA98qlxkDU90bBtUBionHdtr3iuwF1jJg37R6sEhv/td4Dm2QqQrQlVgc3+1mDCHDa0qBJJ8yGva1a305CAbdz8CX5G4dudo3lOToF0PdrViFvWPAuX17Vvc/CacUYlC83jrcy3uu0MD51xn3iwVpvW2g+w1U= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230022)(4636009)(136003)(346002)(376002)(396003)(39860400002)(451199015)(40470700004)(36840700001)(46966006)(478600001)(6486002)(4326008)(41300700001)(36756003)(316002)(5660300002)(82310400005)(70586007)(70206006)(8936002)(356005)(36860700001)(81166007)(8676002)(235185007)(4743002)(6916009)(186003)(40480700001)(82740400003)(6512007)(44144004)(2616005)(33964004)(6666004)(40460700003)(26005)(6506007)(336012)(47076005)(86362001)(2906002)(84970400001)(44832011)(2700100001)(357404004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Sep 2022 09:33:59.0474 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7728b3b0-2f0c-4ae3-fdb8-08da9d46be32 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; 
Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT015.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9366 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi All, In plenty of image and video processing code it's common to modify pixel values by a widening operation and then scale them back into range by dividing by 255. 
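
As a rough illustration of that divide-by-255 step, the C sketch below models a pair of narrowing high-half adds on a single 16-bit lane. The helper names are invented for this note and are not GCC or ACLE functions. The model assumes the narrowing add wraps at 16 bits and keeps only the top byte, and that the input is a product of two bytes (at most 255 * 255), so neither sum can overflow the lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane model of a narrowing high-half add on 16-bit
   lanes: a wrapping 16-bit add whose top 8 bits are kept.  */
uint16_t addhn_lane (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint16_t) (a + b) >> 8);
}

/* Divide a widened 8-bit x 8-bit product by 255 using two narrowing
   adds.  0x0101 is the 16-bit view of two bytes that each hold 1.  */
uint8_t div255_two_addhn (uint16_t product)
{
  uint16_t first = addhn_lane (product, 0x0101);
  return (uint8_t) addhn_lane (product, first);
}

/* Exhaustively compare against a real division for every pair of
   bytes; aborts on the first disagreement.  */
void check_all_byte_products (void)
{
  for (unsigned pixel = 0; pixel <= 255; pixel++)
    for (unsigned level = 0; level <= 255; level++)
      assert (div255_two_addhn ((uint16_t) (pixel * level))
              == (pixel * level) / 255);
}
```

The restriction to byte products matters in this model: an arbitrary 16-bit input such as 0xffff would make the first add wrap, so the sketch should not be read as valid for the full 16-bit range.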

This patch adds a named function to allow us to emit an optimized sequence when doing an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y)/2)-1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        mov     z3.b, #1
.L3:
        ld1b    z0.h, p0/z, [x0, x3]
        mul     z0.h, p1/m, z0.h, z2.h
        addhnb  z1.b, z0.h, z3.h
        addhnb  z0.b, z0.h, z1.h
        st1b    z0.h, p0, [x0, x3]
        inch    x3
        whilelo p0.h, w3, w2
        b.any   .L3

instead of:

.L3:
        ld1b    z0.h, p1/z, [x0, x3]
        mul     z0.h, p0/m, z0.h, z1.h
        umulh   z0.h, p0/m, z0.h, z2.h
        lsr     z0.h, z0.h, #7
        st1b    z0.h, p1, [x0, x3]
        inch    x3
        whilelo p1.h, w3, w2
        b.any   .L3

which results in significantly faster code.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv3): New.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index f138f4be4bcf74c1a4a6d5847ed831435246737f..4d097f7c405cc68a1d6cda5c234a1023a6eba0d1 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -71,6 +71,7 @@
 ;; ---- [INT] Reciprocal approximation
 ;; ---- [INT<-FP] Base-2 logarithm
 ;; ---- [INT] Polynomial multiplication
+;; ---- [INT] Misc optab implementations
 ;;
 ;; == Permutation
 ;; ---- [INT,FP] General permutes
@@ -2312,6 +2313,47 @@ (define_insn "@aarch64_sve_"
   "\t%0., %1., %2."
) +;; ------------------------------------------------------------------------- +;; ---- [INT] Misc optab implementations +;; ------------------------------------------------------------------------- +;; Includes: +;; - aarch64_bitmask_udiv +;; ------------------------------------------------------------------------- + +;; div optimizations using narrowings +;; we can do the division e.g. shorts by 255 faster by calculating it as +;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in +;; double the precision of x. +;; +;; See aarch64-simd.md for bigger explanation. +(define_expand "@aarch64_bitmask_udiv3" + [(match_operand:SVE_FULL_HSDI 0 "register_operand") + (match_operand:SVE_FULL_HSDI 1 "register_operand") + (match_operand:SVE_FULL_HSDI 2 "immediate_operand")] + "TARGET_SVE2" +{ + unsigned HOST_WIDE_INT size + = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1; + if (!CONST_VECTOR_P (operands[2]) + || const_vector_encoded_nelts (operands[2]) != 1 + || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0))) + FAIL; + + rtx addend = gen_reg_rtx (mode); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + rtx val = aarch64_simd_gen_const_vector_dup (mode, 1); + emit_move_insn (addend, lowpart_subreg (mode, val, mode)); + emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp1, operands[1], + addend)); + emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp2, operands[1], + lowpart_subreg (mode, tmp1, + mode))); + emit_move_insn (operands[0], + lowpart_subreg (mode, tmp2, mode)); + DONE; +}) + ;; ========================================================================= ;; == Permutation ;; ========================================================================= diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c new file mode 100644 index 0000000000000000000000000000000000000000..e6f5098c30f4e2eb8ed1af153c0bb0d204cda6d9 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c @@ -0,0 +1,53 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O2 -std=c99" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include + +/* +** draw_bitmap1: +** ... +** mul z[0-9]+.h, p[0-9]+/m, z[0-9]+.h, z[0-9]+.h +** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h +** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h +** ... +*/ +void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xfe; +} + +/* +** draw_bitmap3: +** ... +** mul z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s +** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s +** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s +** ... +*/ +void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +/* +** draw_bitmap4: +** ... +** mul z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d +** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d +** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d +** ... +*/ +void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +}
From patchwork Fri Sep 23 09:34:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 57955 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C0448385700C for ; Fri, 23 Sep 2022 09:34:57 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C0448385700C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1663925697; bh=4f1g8URBut6lHOwfegV091KG5mbVWa6mY1k4aYskdEM=; h=Date:To:Subject:In-Reply-To:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:From; b=WBO97CPedFC9IrnC/s04Nd4YkNkJoxll2zJ01aFvhacoINIadUym9qLHKirOL0XMT rfy21r/g3355ZGwXR7jy+0e38f4GEV4R3J93FaxkQqqXZN7pd6mZ3bNUor0xIq5r5P JdkY9ZHHavHDneE5rPQZG8VfSY1/aTXG2JaGjsJA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-eopbgr80042.outbound.protection.outlook.com [40.107.8.42]) by sourceware.org (Postfix) with ESMTPS id BD445385734F for ; Fri, 23 Sep 2022 09:34:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BD445385734F ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=IyqxRZuLCFXI6wy0LruWDXRbAsobCGoWPSrFIFuoni+2kW37t6eD7i7HaYfm4TfInxt+MJDRx/cd/Hh+UCnYEoZgMNBoBDYbbnj5qchK82Qu9veNrlcGjSNtH7g3iiTlfbsRhlN0vUxy+m2x9KuhkXOnZXEKbR6StGhGnvyRAKVQgpGRWNvfASSTCwFeeraj58euVBvn8NBGo9fndwIfhGuht8rmhVahLLzayzAOrT6VllzsmskvcPhFKQMmeKMI54h/Wb5KSDVuAqBtFERlM9mSjEeUHLJIt7mm6kFJXQtoyFz6LK5iSv8jDQXD+Qex2BV18O4EqYdqOpUvlgYrEw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4f1g8URBut6lHOwfegV091KG5mbVWa6mY1k4aYskdEM=; b=eqU+sOpoqLwl4KfLjDjZG1CiArBhBCJpQd7r5GxMVE8Xg638NLWcEVX7KeJMxF5Hl2kpIUpqSPmVNDB6AOkrZpbIl59lqNsV7KKKkucfTDxjOxHroFT0TlR522/oyCpxM6pFsOl/vAZx4hNXr3hbkubEWy8ip50kaqZmyPR/9UYLW/eHm1Oe9uUqLvznyFmDYkJ+Xm8DC8G9In1WLskXHq2gDyRQ1Cgpe3j2LpyU/Q3NWYDssX+MFuab7HZ3O/XxCSSXtXeEUdXA1nf80dTKg7ECLw04iVztyJpBrVNoqQFqK9A6HoyzdJfdw6+kM3VdAox02UZQa8s/mD91pAi88Q== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) Received: from FR3P281CA0012.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:1d::15) by DBAPR08MB5752.eurprd08.prod.outlook.com (2603:10a6:10:1ac::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.18; Fri, 23 Sep 2022 09:34:21 +0000 Received: from VE1EUR03FT026.eop-EUR03.prod.protection.outlook.com (2603:10a6:d10:1d:cafe::2d) by FR3P281CA0012.outlook.office365.com (2603:10a6:d10:1d::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5676.9 via Frontend Transport; Fri, 23 Sep 2022 09:34:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; 
helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT026.mail.protection.outlook.com (10.152.18.148) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.14 via Frontend Transport; Fri, 23 Sep 2022 09:34:21 +0000 Received: ("Tessian outbound 8ec96648b960:v124"); Fri, 23 Sep 2022 09:34:20 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 70e7534a792decd4 X-CR-MTA-TID: 64aa7808 Received: from 29dfe5c51192.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 728758E5-2CF6-4CD9-AEDA-DC679913B7A6.1; Fri, 23 Sep 2022 09:34:13 +0000 Received: from EUR03-AM7-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 29dfe5c51192.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 23 Sep 2022 09:34:13 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=fR27bwQ0ZKkcdelQYrqx8g3dch8CYvdgVXrkPzEuedRozvKVqvLCVAhfjlo69mbkfBGBQSe4L+gthC+2Q9uUC6hc8qkP3lnpWRjc4AQz8pXj06+9r3D5h84NDuA0k+QY7zZMQGab81o0EPQm6jieOl4BApPxy8tujma281Zrjq3LKlXtV9+OeJvrnOsKhO3zImF/978mEm8qEw3rJP6vqSFqGOZsayoUR+CAdO0rjIs3JT3a8DLizDJV2T8XjtE4IKag2lYUCUIMqjXj4MBIn6KvqIOA/1ExKebxmjNoUnl4WHl/4q7QrSwa9v+pMHnC7Q4HbklSHATwt3DEOZS73Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=4f1g8URBut6lHOwfegV091KG5mbVWa6mY1k4aYskdEM=; b=dIX6MRj39gRJ8ewkFsuclawZf9g64ljNRiWRZ12sHghbItkJZh4ATDyWJYro1Q65rKN9XWmSXbjMzRTMcvYP6WGh8qThnjTv6Og0GBxTKQ9ARkPdB5oHdm78quiL1EaSz4PTRdx7+EHyJeQ0jYKSnjvfDix9Kb0bLnz+2pfX9G/Dfc3ANKQAnnoiufzWgekrUlAHaibdrYTK6oCpYByTw6TTppqUDcAVgo0ij8wNNFs6pFvVMFHzbqAgjzUVQHEmxZUUqoUApkfIHtInE8rQmgZa/zC39y9PyaCTSzY9/979OOl6HgsPltBZnHSi+DiEm1QB4EFmPbO0E1qHuRB1TA== 
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by PAXPR08MB6414.eurprd08.prod.outlook.com (2603:10a6:102:12e::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5654.16; Fri, 23 Sep 2022 09:34:11 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::6529:66e5:e7d4:1a40]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::6529:66e5:e7d4:1a40%4]) with mapi id 15.20.5632.021; Fri, 23 Sep 2022 09:34:11 +0000 Date: Fri, 23 Sep 2022 10:34:09 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO4P123CA0530.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:2c5::13) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|PAXPR08MB6414:EE_|VE1EUR03FT026:EE_|DBAPR08MB5752:EE_ X-MS-Office365-Filtering-Correlation-Id: 0ff91629-e379-46aa-752b-08da9d46cb7f x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 
xqf3GNN5Q4m9KxQhbIK4iOUufhcPfOw4fZ0JDXnfL/ntrBeXk8THezsLr9dwPzXVfqWDYn6SMGcn/2/1UV/YE1KqkfJpcFsFO/zpTQ9fsiM26kHgy8fvdCZwr0kAiSXi9vST3goyjDpmRrRiKtAGDE83wFVoBZ9t1Vzw1uj/UPLM/NaDSyvwOtv8tRxuvl9bfjQiQDsan5Mx+JJcuSkvg2R6V2C2OE+SdUM1EToTvb3xXfwAOfj+WvkXzZMj0sVW2uNV7INs1Yi8YepWWM3Q9FQZd4vzHQb5PL/pOF+AlkUajhw+NNyjo5t5snigf7pTA3NBXIUPrqWboO6Drv3HM02Ipl/Lz+Z7nZO4PE94jsYUVzxY3aDtjajW6QCtol2zBKrxnjuoWQ3V5HCJsScsivy/VkRIuCoO2tXmBmfb6J23OV1vAzz7JhhavPfLFWeTTsbruTi27pGeynvkIp3m//zIlBmH92bbB5tPJ5qNBNgML7uYsywuTEfp7MH8RpHy4MPPTcPJa2eL6wRs8KROtdvNPVDHLzWYJ0L+EQk/vVg5qFfvlmWtLsMhFoq9/FIM9qT2ib7IaVEWA/OmBM0L4a5v2rGlWUZUidd9egUoUICh2wDbCI7Qm+MgFUdWszm8v2WYRU5iENH9haIoFgTgMHlgmmcoBk9P6IZmTTzKCDVawRKblGgwrT49a5bd3+ZOgBkdZzb1Vy5A7mXYM1Dnn/A1pOmpWFfjOQtOJp8JgDV089tD16CzFekpYQaGQfR2jaGWpiu3imLvHx50hY5wRljg/v6lTqi91WBjLS+bCI12v8TVxzqQJk0eOJbrWFywDfI/COKWGbN/pcpaIkJb+g== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230022)(4636009)(39860400002)(396003)(366004)(346002)(376002)(136003)(451199015)(2616005)(83380400001)(4743002)(186003)(8676002)(235185007)(26005)(44832011)(41300700001)(38100700002)(5660300002)(6512007)(6486002)(33964004)(478600001)(44144004)(6506007)(8936002)(66946007)(84970400001)(4326008)(86362001)(66476007)(66556008)(2906002)(6916009)(316002)(36756003)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR08MB6414 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT026.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 104b1ba2-c9fa-4323-7813-08da9d46c5a2 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
XhFmF2O/VIbQh/OQgBMDD0vH3TPJZRE/NyJb3IBIfT169BO4/+koLwSi0tYXUxtO2zNylsP5z+glU1nhj97KZkh+vwDHXPjkUYcmr89HeIbiqXzxwwWf+csElZxa90z/bEfzE7630KJldV1RKNZdVXjjEuO61uTBejmYii8kIT1YXxktUlUsAxYDXgOYlPPyGKD3prG0meLqXDtlx3ZoOCoFhEEiGctedUMfxLxBE8sNRrgEftAVKfGlB1CnXtkGMGCC/mmsHE6Bs4w/oOwtvwHFcPIFEgzNQFkZVGiXG3Ys5F06M9OctID+p6nt4q3bKJe0MlPivvTgVtd3l5kRqB3S1/kuEX3XsLm2rKnxcsh86aGg913f1YeLMH5FvzaA7YQ/HAaQq5qGd6lTHQPkcmavM0fQi9Wu/isM9AdNfh1Bu8nLbGNQ8bfivaxXLqYfg3rDx3SnM8Rh+MBqsjoY5sTU62DyVjLtc9yZXWszCoqAJSCtNV0XG3jxio0pOhKX541sm30aR7HxZbclFg09ym8uECksFHmj4OoAwB0/SIg5K3H+kgG5zBtJpjl5JXcRzktcn8Dg0E14hS3Ikg+E9xQYxGSSNFkoZXuDHNUZmxD1INxGW9sDHcTVLNQZyGmIi+d2/Cw+4tV4IQUyjkqDC+sQw2gV6gKi0kgrWdux31G5QrWjYYMGcBmtE9CCBf1BEfXc0OsH3sd4QivOQdBYCwh83vIpmRenABssssyUItNrAoATfq+JfszECqfPRpS5nxQF8gpyd+y22VvxndIfaL60o+62V4/BE+4c/F8YHfUKwZvHYlkteKgTsw2mWg4/2Ygl3TqTDMy3nWBeRsNOu0tiPOOvMQ4mn/wsMHRUxHOaUjVVDcmMwMcvF1KPRmUk X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230022)(4636009)(376002)(346002)(39860400002)(396003)(136003)(451199015)(36840700001)(46966006)(40470700004)(81166007)(82310400005)(82740400003)(235185007)(84970400001)(6512007)(8676002)(5660300002)(186003)(44832011)(316002)(70206006)(26005)(70586007)(4326008)(6916009)(6486002)(36860700001)(478600001)(33964004)(83380400001)(36756003)(44144004)(40480700001)(356005)(8936002)(2616005)(4743002)(2906002)(6506007)(40460700003)(336012)(86362001)(47076005)(41300700001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Sep 2022 09:34:21.2183 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0ff91629-e379-46aa-752b-08da9d46cb7f X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; 
Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT026.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DBAPR08MB5752 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches"

Hi All,

This adds an RTL pattern for when two NARROWB instructions are being combined
with a PACK.  The second NARROWB is then transformed into a NARROWT.

For the example:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] += (pixel[i] * level) / 0xff;
}

we generate:

	addhnb	z6.b, z0.h, z4.h
	addhnb	z5.b, z1.h, z4.h
	addhnb	z0.b, z0.h, z6.h
	addhnt	z0.b, z1.h, z5.h
	add	z0.b, z0.b, z2.b

instead of:

	addhnb	z6.b, z1.h, z4.h
	addhnb	z5.b, z0.h, z4.h
	addhnb	z1.b, z1.h, z6.h
	addhnb	z0.b, z0.h, z5.h
	uzp1	z0.b, z0.b, z1.b
	add	z0.b, z0.b, z2.b

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_): New.
	* config/aarch64/iterators.md (binary_top): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
	* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_" "\t%0., %2., %3." ) +(define_insn_and_split "*aarch64_sve_pack_" + [(set (match_operand: 0 "register_operand" "=w") + (unspec: + [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w") + (subreg:SVE_FULL_HSDI (unspec: + [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w") + (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")] + SVE2_INT_BINARY_NARROWB) 0)] + UNSPEC_PACK))] + "TARGET_SVE2" + "#" + "&& true" + [(const_int 0)] +{ + rtx tmp = lowpart_subreg (mode, operands[1], mode); + emit_insn (gen_aarch64_sve (, mode, + operands[0], tmp, operands[2], operands[3])); +}) + ;; ------------------------------------------------------------------------- ;; ---- [INT] Narrowing right shifts ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB "b") (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst") (UNSPEC_PNEXT "pnext")]) +(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT") + (UNSPEC_RADDHNB "UNSPEC_RADDHNT") + (UNSPEC_RSUBHNB "UNSPEC_RSUBHNT") + (UNSPEC_SUBHNB
"UNSPEC_SUBHNT")]) + (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb") (UNSPEC_ADCLT "adclt") (UNSPEC_ADDHNB "addhnb") diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c new file mode 100644 index 0000000000000000000000000000000000000000..0df08bda6fd3e33280307ea15c82dd9726897cfd --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint32_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c new file mode 100644 index 0000000000000000000000000000000000000000..cddcebdf15ecaa9dc515f58cdbced36c8038db1b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c @@ -0,0 +1,56 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O2 -std=c99" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include + +/* +** draw_bitmap1: +** ... +** addhnb z6.b, z0.h, z4.h +** addhnb z5.b, z1.h, z4.h +** addhnb z0.b, z0.h, z6.h +** addhnt z0.b, z1.h, z5.h +** ... 
+*/ +void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] += (pixel[i] * level) / 0xff; +} + +void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] += (pixel[i] * level) / 0xfe; +} + +/* +** draw_bitmap3: +** ... +** addhnb z6.h, z0.s, z4.s +** addhnb z5.h, z1.s, z4.s +** addhnb z0.h, z0.s, z6.s +** addhnt z0.h, z1.s, z5.s +** ... +*/ +void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] += (pixel[i] * level) / 0xffffU; +} + +/* +** draw_bitmap4: +** ... +** addhnb z6.s, z0.d, z4.d +** addhnb z5.s, z1.d, z4.d +** addhnb z0.s, z0.d, z6.d +** addhnt z0.s, z1.d, z5.d +** ... +*/ +void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL; +}
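
As a reading aid for the NARROWB/NARROWT rewrite described in the cover letter
of patch 4/4, here is a toy scalar model of where ADDHNB, ADDHNT and UZP1 place
their results, following the per-lane descriptions in the Arm instruction
documentation.  The fixed 64-bit vector size and all function names are
assumptions made for illustration (real SVE vectors are a multiple of 128
bits), and the sketch does not attempt to model the RTL UNSPEC_PACK pattern
itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy vector: 4 halfword lanes viewed as 8 byte lanes (an assumption).  */
enum { WIDE_LANES = 4, NARROW_LANES = 8 };

/* ADDHNB model: the high half of each sum goes to the even-numbered
   narrow lanes; the odd-numbered narrow lanes are set to zero.  */
static void addhnb(uint8_t dst[NARROW_LANES],
                   const uint16_t a[WIDE_LANES], const uint16_t b[WIDE_LANES])
{
    for (int i = 0; i < WIDE_LANES; i++) {
        dst[2 * i] = (uint8_t)((uint16_t)(a[i] + b[i]) >> 8);
        dst[2 * i + 1] = 0;
    }
}

/* ADDHNT model: the high half of each sum goes to the odd-numbered
   narrow lanes; the even-numbered narrow lanes are left unchanged.  */
static void addhnt(uint8_t dst[NARROW_LANES],
                   const uint16_t a[WIDE_LANES], const uint16_t b[WIDE_LANES])
{
    for (int i = 0; i < WIDE_LANES; i++)
        dst[2 * i + 1] = (uint8_t)((uint16_t)(a[i] + b[i]) >> 8);
}

/* UZP1 model on byte lanes: even lanes of x, then even lanes of y.  */
static void uzp1(uint8_t dst[NARROW_LANES],
                 const uint8_t x[NARROW_LANES], const uint8_t y[NARROW_LANES])
{
    uint8_t tmp[NARROW_LANES];
    for (int i = 0; i < WIDE_LANES; i++) {
        tmp[i] = x[2 * i];
        tmp[WIDE_LANES + i] = y[2 * i];
    }
    memcpy(dst, tmp, sizeof tmp);
}

/* Run both instruction forms on the same inputs and check the lane
   layout each one produces in this model.  */
static void check_lane_layouts(void)
{
    const uint16_t a[WIDE_LANES] = {0x1100, 0x2200, 0x3300, 0x4400};
    const uint16_t b[WIDE_LANES] = {0x5500, 0x6600, 0x7700, 0x8800};
    const uint16_t zero[WIDE_LANES] = {0, 0, 0, 0};
    uint8_t bt[NARROW_LANES], lo[NARROW_LANES], hi[NARROW_LANES];
    uint8_t packed[NARROW_LANES];

    addhnb(bt, a, zero);        /* NARROWB followed by ...  */
    addhnt(bt, b, zero);        /* ... NARROWT into the same register.  */

    addhnb(lo, a, zero);        /* NARROWB + NARROWB + UZP1.  */
    addhnb(hi, b, zero);
    uzp1(packed, lo, hi);

    const uint8_t interleaved[NARROW_LANES] =
        {0x11, 0x55, 0x22, 0x66, 0x33, 0x77, 0x44, 0x88};
    const uint8_t concatenated[NARROW_LANES] =
        {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
    assert(memcmp(bt, interleaved, sizeof bt) == 0);
    assert(memcmp(packed, concatenated, sizeof packed) == 0);
}
```

In this model both forms hold the same narrowed values, interleaved for
ADDHNB + ADDHNT and concatenated for ADDHNB + ADDHNB + UZP1, so the lane order
of the inputs matters when comparing the two sequences.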