From patchwork Thu Jun 9 04:39:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 54971
Date: Thu, 9 Jun 2022 05:39:18 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
List-Id: Gcc-patches mailing list
From: Tamar Christina
Reply-To: Tamar Christina
Cc: richard.sandiford@arm.com, nd@arm.com, rguenther@suse.de

Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds an optab to allow us to emit an optimized sequence when doing
an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y) / 2) - 1)

where ^ denotes exponentiation, e.g. a division by 0xff when y is 16 bits wide.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* internal-fn.def (DIV_POW2_BITMASK): New.
	* optabs.def (udiv_pow2_bitmask_optab): New.
	* doc/md.texi: Document it.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Recognize pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-1.c: New test.
	* gcc.dg/vect/vect-div-bitmask-2.c: New test.
	* gcc.dg/vect/vect-div-bitmask-3.c: New test.
	* gcc.dg/vect/vect-div-bitmask.h: New file.

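For readers wondering why a division by 0xff is worth a dedicated optab: one well-known way to compute it without a divide is a pair of shift-and-add steps done in twice the element width. The sketch below is only an illustration of that identity for 16-bit values and is not taken from this patch. The names div255_shift_add, count_mismatches and check_div255 are hypothetical, and the sequence a target actually emits for the optab may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: divide a 16-bit value by
   0xff using shifts and adds only.  The intermediate arithmetic is done
   in 32 bits so that the additions cannot wrap.  */
uint16_t
div255_shift_add (uint16_t x)
{
  uint32_t wide = x;
  uint32_t t = (wide + 257u) >> 8;
  return (uint16_t) ((wide + t) >> 8);
}

/* Count the 16-bit inputs for which the shift/add form disagrees with
   a plain unsigned division by 0xff.  */
unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (uint32_t x = 0; x <= UINT16_MAX; ++x)
    if (div255_shift_add ((uint16_t) x) != x / 0xffu)
      ++bad;
  return bad;
}

/* Exhaustive self-check over all 65536 inputs.  */
void
check_div255 (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_div255 () from a small driver exercises every 16-bit input. The check is exhaustive rather than sampled because the identity depends on the carry behaviour near multiples of 255.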
--- inline copy of patch --
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f3619c505c025f158c2bc64756531877378b22e1..784c49d7d24cef7619e4d613f7b4f6e945866c38 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5588,6 +5588,18 @@ signed op0, op1;
 op0 = op1 / (1 << imm);
 @end smallexample
 
+@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
+@item @samp{udiv_pow2_bitmask@var{m2}}
+@cindex @code{udiv_pow2_bitmask@var{m2}} instruction pattern
+@itemx @samp{udiv_pow2_bitmask@var{m2}}
+Unsigned vector division by an immediate that is equivalent to
+@samp{2^(bitsize(m) / 2) - 1}.
+@smallexample
+unsigned short op0, op1;
+@dots{}
+op0 = op1 / 0xffU;
+@end smallexample
+
 @cindex @code{vec_shl_insert_@var{m}} instruction pattern
 @item @samp{vec_shl_insert_@var{m}}
 Shift the elements in vector input operand 1 left one element (i.e.@:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index d2d550d358606022b1cb44fa842f06e0be507bc3..a3e3cc1520f77683ebf6256898f916ed45de475f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -159,6 +159,8 @@ DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
                        vec_shl_insert, binary)
 
 DEF_INTERNAL_OPTAB_FN (DIV_POW2, ECF_CONST | ECF_NOTHROW, sdiv_pow2, binary)
+DEF_INTERNAL_OPTAB_FN (DIV_POW2_BITMASK, ECF_CONST | ECF_NOTHROW,
+                       udiv_pow2_bitmask, unary)
 
 DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 801310ebaa7d469520809bb7efed6820f8eb866b..3f0ac05ef5ad5aed8d6ca391f4eed71b0494e17f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -372,6 +372,7 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
 OPTAB_D (umulhs_optab, "umulhs$a3")
 OPTAB_D (umulhrs_optab, "umulhrs$a3")
 OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
+OPTAB_D (udiv_pow2_bitmask_optab, "udiv_pow2_bitmask$a2")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a7ea3cce4764239c5d281a8f0bead1f6a452de3f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint8_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..009e16e1b36497e5724410d9843f1ce122b26dda
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint16_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..bf35a0bda8333c418e692d94220df849cc47930b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
new file mode 100644
index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -0,0 +1,43 @@
+#include <stdio.h>
+
+#ifndef N
+#define N 65
+#endif
+
+#ifndef TYPE
+#define TYPE uint32_t
+#endif
+
+#ifndef DEBUG
+#define DEBUG 0
+#endif
+
+#define BASE ((TYPE) -1 < 0 ? -126 : 4)
+
+int main ()
+{
+  TYPE a[N];
+  TYPE b[N];
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 13;
+      b[i] = BASE + i * 13;
+      if (DEBUG)
+        printf ("%d: 0x%x\n", i, a[i]);
+    }
+
+  fun1 (a, N / 2, N);
+  fun2 (b, N / 2, N);
+
+  for (int i = 0; i < N; ++i)
+    {
+      if (DEBUG)
+        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
+
+      if (a[i] != b[i])
+        __builtin_abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 217bdfd7045a22578a35bb891a4318d741071872..a738558cb8d12296bff462d716310ca8d82957b5 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3558,6 +3558,33 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
+  else if ((TYPE_UNSIGNED (itype) || tree_int_cst_sgn (oprnd1) != 1)
+           && rhs_code != TRUNC_MOD_EXPR)
+    {
+      wide_int icst = wi::to_wide (oprnd1);
+      wide_int val = wi::add (icst, 1);
+      int pow = wi::exact_log2 (val);
+      if (pow == (prec / 2))
+        {
+          /* Pattern detected.  */
+          vect_pattern_detected ("vect_recog_divmod_pattern", last_stmt);
+
+          *type_out = vectype;
+
+          /* Check if the target supports this internal function.  */
+          internal_fn ifn = IFN_DIV_POW2_BITMASK;
+          if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+            {
+              tree var_div = vect_recog_temp_ssa_var (itype, NULL);
+              gimple *div_stmt = gimple_build_call_internal (ifn, 1, oprnd0);
+              gimple_call_set_lhs (div_stmt, var_div);
+
+              gimple_set_location (div_stmt, gimple_location (last_stmt));
+
+              return div_stmt;
+            }
+        }
+    }
 
   if (prec > HOST_BITS_PER_WIDE_INT
       || integer_zerop (oprnd1))
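As a reading aid for the tree-vect-patterns.cc hunk, the divisor test it performs (add one to the constant, take the exact log2, compare against half the precision) can be sketched in plain C on 64-bit integers. This is an assumption-laden stand-in, not GCC code: is_half_width_bitmask is a hypothetical name, and it models only the constant check, not the signedness or TRUNC_MOD_EXPR conditions or the target support query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wide_int check in the hunk above.
   Returns true when DIVISOR + 1 is exactly 2^(PREC / 2), i.e. DIVISOR
   is a mask of the low PREC / 2 bits.  PREC is the element precision
   in bits and is assumed to be between 2 and 64.  */
bool
is_half_width_bitmask (uint64_t divisor, unsigned prec)
{
  if (prec < 2 || prec > 64)
    return false;

  unsigned half = prec / 2;
  return divisor == (UINT64_C (1) << half) - 1;
}
```

Under this model 0xff qualifies for 16-bit elements, 0xffff for 32-bit and 0xffffffff for 64-bit, which lines up with the divisors used in the three new tests. A divisor such as 0xff at 32-bit precision does not qualify.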