From patchwork Thu Aug 1 09:14:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saurabh@sourceware.org, Jha@sourceware.org X-Patchwork-Id: 94937 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 85B5F385842C for ; Thu, 1 Aug 2024 09:15:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 85B5F385842C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1722503748; bh=GUS5B+ROOy91G9ADERpy+uPA18zc96IP5dDlLn/kWU4=; h=From:To:CC:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From; b=GR6rEGzuj1wa/A96YyIuoV3NiXTj9vwzox1JXwwjvb89MypZhc845up6oaIFCdydY mfwyC69tB7ZikXKPcsQ79LHe3VCNiqsJ1Ovab64k5OlEWExrRZNbxukIrh55XLTTkv CUob3A2m/mGvD4/22mFsGG2kdAo+XSOqp8t254EU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR03-VI1-obe.outbound.protection.outlook.com (mail-vi1eur03hn2239.outbound.protection.outlook.com [52.100.15.239]) by sourceware.org (Postfix) with ESMTPS id 048B23858C78 for ; Thu, 1 Aug 2024 09:14:26 +0000 (GMT) ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 048B23858C78 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=52.100.15.239 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1722503671; cv=pass; b=gXbsJEHxT3g7Jr1OrC1KXdUlPNt8STI2o3GbqJohI5VQQRkjscTs7+Vh5n3UOV/2Iqc3gYQCcMDa/WI0q6KiRLK7rXbRuy+KmtMY8KkzVbBSZ63ulL45EhjxNRcTfKfZrfTvXkHYkGjY0G4ZD+SkIo2XyA85RcDh4WSbnFHuLm0= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1722503671; c=relaxed/simple; bh=Hnlur6Jrk2x+7fgkC5qIak8/VAyDy+TFpKwkQd81ZOs=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=eV6Xu/om/cejjx0mvMoPiR5nCTncQSBJ4kyUXLay6hHwEB5Q5TV8ITrvhWUoHdbCGG3zv3ox7spFbo0kXC61Pfm//VOVmc5BHJiH0ACkx5WzpuddS7CUkQX2ZivCJy0X0DhlhzhwxWrQ7KoUseAvv4WVLediOKEFZZjHpmEOAeM= ARC-Authentication-Results: i=2; server2.sourceware.org Authentication-Results: sourceware.org; dkim=permerror (bad message/signature format) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Bo+gsgYEYkCncOTeS01WwTw0c1U5XXPTPkdLzurgrk5lnF9oihIv4GGGRr8MR8QzMKM56aIGAMIgbdCKXSxEGmI1ddNIriWp9B631sntaWdUYdcvHPKj1uvBwm/2j7XQGMOijBTlo3WzYBnw//6q5MDBNeaIbdSbR7f9QpwRkoCUQxIOo2dpzBFTShmehdgUOqa83tT8zjp4PngHL669ROnWRAVVjFf0XQ2dxVyC6GKBQie1AAs5qh6K2FI263ulYgTP+Idljdx1WjhjE22ZvvEPteTRC4WtTk4GK7iFyzJvLVhoAh7nGIs3fY6liebMryQ093FDPLcwSXaiTb4ibA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=GUS5B+ROOy91G9ADERpy+uPA18zc96IP5dDlLn/kWU4=; b=hChjgkxoBYysnEF0vsrDSDnOvuXhHLWEzyYlPwvesoFUwCNjsyMLzG81xH+xNDhIY65wGQyMItVsqMS4ipSA1KGjbQmI68WEb8vhzn9YPGVbvyzoEv28l1RzftGOMfis7FECNbY0CUYzZgEkrpcaLdoKQpdnvx7la7sf1aKWTutzYIJdMw8tI6vWBzYdiCAD5CfecKNIXP+MAGFu7XjPr7D06jDrJygm37qBjFS+69xxVa1jLXnqqqRp8+6dDrunol7pR/mmAqsNIJ5gWBuphncQZH/t6i/07NM+7zIUy+sxPOb8v1Mf3Q+KnNdL5oBlh7Zkwr7PGOspGx6b0XvLeQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none (sender ip is 40.67.248.234) smtp.rcpttodomain=gcc.gnu.org smtp.helo=nebula.arm.com; dmarc=none; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=GUS5B+ROOy91G9ADERpy+uPA18zc96IP5dDlLn/kWU4=; b=QvK9EOZVcq1bFYyabr3vQapIRAQTokHzjdQTwUep15a+0N7M/7ZUruGXesJrdao5wgYsyuE2k5Pw4Az2NJrupsaBuMzTjDe0oT893dzjOaO279ayrbq2niz+PPxizb4MVguhz2crqWmcEzmEuI6LbhxaXTRpfUNX8yxIg4wN6jY= Received: from DU6P191CA0025.EURP191.PROD.OUTLOOK.COM (2603:10a6:10:53f::25) by PAXPR08MB7669.eurprd08.prod.outlook.com (2603:10a6:102:243::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7828.22; Thu, 1 Aug 2024 09:14:20 +0000 Received: from DB5PEPF00014B99.eurprd02.prod.outlook.com (2603:10a6:10:53f:cafe::60) by DU6P191CA0025.outlook.office365.com (2603:10a6:10:53f::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7828.22 via Frontend Transport; Thu, 1 Aug 2024 09:14:20 +0000 X-MS-Exchange-Authentication-Results: spf=none (sender IP is 40.67.248.234) smtp.helo=nebula.arm.com; dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=; Received-SPF: None (protection.outlook.com: nebula.arm.com does not designate permitted sender hosts) Received: from nebula.arm.com (40.67.248.234) by DB5PEPF00014B99.mail.protection.outlook.com (10.167.8.166) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7828.19 via Frontend Transport; Thu, 1 Aug 2024 09:14:19 +0000 Received: from AZ-NEU-EX03.Arm.com (10.251.24.31) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Thu, 1 Aug 2024 09:14:15 +0000 Received: from e130340.cambridge.arm.com (10.2.80.47) by mail.arm.com (10.251.24.31) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Thu, 1 Aug 2024 09:14:15 +0000 From: Saurabh@sourceware.org, Jha@sourceware.org To: CC: , , Saurabh Jha Subject: [PATCH] aarch64: Add support for AdvSIMD faminmax Date: Thu, 1 Aug 2024 10:14:10 +0100 Message-ID: <20240801091410.2466996-1-saurabh.jha@arm.com> X-Mailer: git-send-email 2.43.2 MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DB5PEPF00014B99:EE_|PAXPR08MB7669:EE_ X-MS-Office365-Filtering-Correlation-Id: e13585e6-ff11-4483-40e6-08dcb20a5337 NoDisclaimer: true X-MS-Exchange-SenderADCheck: 2 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230040|5005620100009|61400799027|35950700016|376014|82310400026|1557600090; X-Microsoft-Antispam-Message-Info: =?utf-8?q?ncqJ2NnhrK+qp1WLBsb5CbPOJN23pT+?= =?utf-8?q?Qh78zdUvTxoMWclgGrU41WwOkN1Syv7AU0tno8e74qLqTJ93VEner4M5gDt4kRWwB?= =?utf-8?q?G6SLRXUIohf71AO4s4faU8XI+qoukvASiLEnhtwtamHePZLDVBzEd4UQ2joME4Qcb?= =?utf-8?q?1CQjyg+m00X84cyrM5M/04JWLwODJb7SB5y2wudWWrJmucQvtI65mLlcaG5N3O+Je?= =?utf-8?q?D/7qniLvI8aI85hCS8/IRJ+BD/TE19HGrX6T8p2zmEdKSioK5OpM01YAYbiwGt8p2?= =?utf-8?q?OWA2x+4+u2ZjXdz8n9VQQPEd05MmSOA6I7YyRNgtNX8wDWh1gx4f7+RZXycdY2/45?= =?utf-8?q?8JYsKHX4FapB6Mos4MDraB7MBVMI6eSzC3TJP5nfAAbbPFpGHacWwB0Q8EjlZHyn1?= =?utf-8?q?cWr2Z+Rf2Nz6Tz8TWnQh2LBgTBNIcWimd+o3+WEGbuojEVUdo3fdLwRvlYj/7Zf/B?= =?utf-8?q?1xgVF73CHfO/Ws3jCOW+CimTuyJU0aEOjr2b8l3NtIU88mvIrJ2igN6jHElCg6fLy?= =?utf-8?q?43Q/gtgdOkn01J3T8ZdRxyVfZYRlBaGQ59g76i2iQHQ+2K/Q/mt/3y8R4IPF5jb/A?= =?utf-8?q?vtV94lQc7C6K6jk/74kqWGQruVZf7mcBtHpurtC66xrU67wRaZRXu0Mx2gHL5ZyX0?= =?utf-8?q?b8HlJTXTDknk7oXO89+Dt8w2S4FjUSLvGPIAcYARuW377FmLLBF7OGjCp9U6p0DIu?= =?utf-8?q?lCTEdqryN8ar25JGKShkTw6Uu+ooYUZV6258/EZ7XGEXNQKFK+pm5tMZ7qNMmuL7S?= =?utf-8?q?6HaR8MVcKHE6do03bX4mk84flcOUnYC9SRy8nbPNkDP183OMXKgaPoQ9iQupMTSoc?= =?utf-8?q?Qhr2LkhmSMw+86PRdgDQprT/JqelNlUztjARTzzzjpUIxemKg+NfE9xvgTDbc9Ryi?= =?utf-8?q?bEzjMdW/rXEc9vrZx6rz+EF173SnfBYfOJtSRFR2rE6Bp+ZyKvVUZlDDdqgnpcZCz?= =?utf-8?q?AacEUzS+97rEVJhL0ZQQIUSInThUGV0pD80uxnjNO13n+Vd+RmUwOulEMntTGq2gE?= =?utf-8?q?zBvzugJHKLK9KTIYm1lhEPBxcZRnM0bxZzf7Kpn/D4Hyp1Dso3plhTOmDJD4KuErl?= =?utf-8?q?s6ZHn9zEU+EguBJSR/25quPogesT0FY9SwrfphXXXn0ssfdvfAtXq+eZ93uWuPcxG?= =?utf-8?q?lXtSSdYMlQzV0R7lnWcU+AOH352xT5kTUsOPBuMCywgIDQOK4zbrkdAyrE3fMeyXx?= =?utf-8?q?tuOo9Wpt1vFh1Or8auLT922Yb8yx/4aY7qOyMRT6s/BaONtxpT2mO1FSfHNn6wzrp?= =?utf-8?q?vXWtUsQUBKXrvOskNCuvyCPeh0+8BlWDhV93TP4yf4bTTD7nE7/T3o4p6Bs1Vh6Dt?= =?utf-8?q?kX+rBLITzsAemozdbnUZG3p6xlv/b2+lhLlBRrn9olcqe+hNro3VXzS4Xc28Z1x5A?= =?utf-8?q?GXv+pxpJPgzx9pMyK8dj3TKWlKmi7ZEO9naBRi9qM60iwCKWAyaBibNt1RQFGPPuS?= =?utf-8?q?TToorZyvqhdTCYoWUT+WdSYcIf714d/0OSGTqe98SFty4/RSTqoXs+4UQVFDLspBX?= =?utf-8?q?MoqF4MSvALRMAcSacVkAYGGCIV0j60jQeMPI0hw9ieFIyQBVoNh4I9eaDf5jN2CzQ?= =?utf-8?q?+F4o2aNugTAwOckceSHU/6QwL0ralRk8VLlXkD6+6uLF8BN5qnX1Kglkkybfs26vl?= =?utf-8?q?HoO6+VPPUp0ct0hV0v/dWJdUUTgL8XSWIAj/PXQNCYUy2lhP5T9TrH4fdsdRHHFBE?= =?utf-8?q?BsZ8WjPv07O06ZNyOhlJbaJEJzQhELFyUyi2oZwvRtM4wea6QZUbMRXigRsN1CMc3?= =?utf-8?q?xmcRPCqSg8hbe+pG+yu3Kv+oJ8g=3D=3D?= X-Forefront-Antispam-Report: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:6; SRV:; IPV:NLI; SFV:SPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:OSPM; SFS:(13230040)(5005620100009)(61400799027)(35950700016)(376014)(82310400026)(1557600090); DIR:OUT; SFP:1501; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Aug 2024 09:14:19.5002 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e13585e6-ff11-4483-40e6-08dcb20a5337 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[40.67.248.234]; Helo=[nebula.arm.com] X-MS-Exchange-CrossTenant-AuthSource: DB5PEPF00014B99.eurprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR08MB7669 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and mandatory from Armv9.5-a. It introduces instructions for computing the floating point absolute maximum and minimum of the two vectors element-wise. This patch does three things: 1. Introduces AdvSIMD faminmax intrinsics. 2. Adds code generation support for famax and famin in terms of the existing operators. 3. Move report_missing_extension and reported_missing_extension_p to make it more usable. The intrinsics of this extension are implemented as the following builtin functions: * vamax_f16 * vamaxq_f16 * vamax_f32 * vamaxq_f32 * vamaxq_f64 * vamin_f16 * vaminq_f16 * vamin_f32 * vaminq_f32 * vaminq_f64 For code generation, famax/famin is equivalent to first taking fabs of the operands and then taking fmax/fmin of the results of fabs. famax/famin (a, b) = fmax/fmin (fabs (a), fabs (b)) This is correct because NaN/Inf handling of famax/famin and fmax/fmin are same. We cannot use fmaxnm/fminnm here as Nan/Inf are handled differently in them. We moved the definition of `report_missing_extension` from gcc/config/aarch64/aarch64-sve-builtins.cc to gcc/config/aarch64/aarch64-builtins.cc and its declaration to gcc/config/aarch64/aarch64-builtins.h. We also moved the declaration of `reported_missing_extension_p` from gcc/config/aarch64/aarch64-sve-builtins.cc to gcc/config/aarch64/aarch64-builtins.cc, closer to the definition of `report_missing_extension`. In the exsiting code structure, this leads to `report_missing_extension` being usable from both normal builtins and sve builtins. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): New enum values for faminmax builtins. (aarch64_init_faminmax_builtins): New function to declare new builtins. (handle_arm_neon_h): Modify to call aarch64_init_faminmax_builtins. (aarch64_general_check_builtin_call): Modify to check whether +faminmax flag is being used and printing error message if not being used. (aarch64_expand_builtin_faminmax): New function to emit instructions of this extension. (aarch64_general_expand_builtin): Modify to call aarch64_expand_builtin_faminmax. (report_missing_extension): Move from config/aarch64/aarch64-sve-builtins.cc. * config/aarch64/aarch64-builtins.h (report_missing_extension): Declaration for this function so that it can be used wherever this header is included. (reported_missing_extension_p): Move from config/aarch64/aarch64-sve-builtins.cc * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Introduce new flag for this extension. * config/aarch64/aarch64-simd.md (aarch64_): Introduce instruction pattern for this extension. * config/aarch64/aarch64-sve-builtins.cc (reported_missing_extension_p): Move to config/aarch64/aarch64-builtins.cc (report_missing_extension): Move to config/aarch64/aarch64-builtins.cc. * config/aarch64/aarch64.h (TARGET_FAMINMAX): Introduce new flag for this extension. * config/aarch64/iterators.md: Introduce new iterators for this extension. * config/arm/types.md: Introduce neon_fp_aminmax attributes. * doc/invoke.texi: Document extension in AArch64 Options. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test. * gcc.target/aarch64/simd/faminmax-builtins.c: New test. * gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test. * gcc.target/aarch64/simd/faminmax-codegen.c: New test. --- Hi, Regression tested for aarch64-none-linux-gnu and found no regressions. This patch is a revised version of an earlier patch https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657914.html but has more scope than that. That's why I didn't add "v2" in the subject line. Ok for master? I don't have commit access so can someone please commit on my behalf? Regards, Saurabh --- gcc/config/aarch64/aarch64-builtins.cc | 173 +++++++++++++++++- gcc/config/aarch64/aarch64-builtins.h | 5 +- .../aarch64/aarch64-option-extensions.def | 2 + gcc/config/aarch64/aarch64-simd.md | 12 ++ gcc/config/aarch64/aarch64-sve-builtins.cc | 22 --- gcc/config/aarch64/aarch64.h | 4 + gcc/config/aarch64/iterators.md | 8 + gcc/config/arm/types.md | 6 + gcc/doc/invoke.texi | 2 + .../aarch64/simd/faminmax-builtins-no-flag.c | 10 + .../aarch64/simd/faminmax-builtins.c | 75 ++++++++ .../aarch64/simd/faminmax-codegen-no-flag.c | 54 ++++++ .../aarch64/simd/faminmax-codegen.c | 104 +++++++++++ 13 files changed, 445 insertions(+), 32 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 30669f8aa18..cd590186f22 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -829,6 +829,17 @@ enum aarch64_builtins AARCH64_RBIT, AARCH64_RBITL, AARCH64_RBITLL, + /* FAMINMAX builtins. */ + AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + AARCH64_FAMINMAX_BUILTIN_FAMIN2D, /* System register builtins. */ AARCH64_RSR, AARCH64_RSRP, @@ -1547,6 +1558,66 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma) } } +/* Initialize the absolute maximum/minimum (FAMINMAX) builtins. */ + +typedef struct +{ + const char *name; + unsigned int code; + tree eltype; + machine_mode mode; +} faminmax_builtins_data; + +static void +aarch64_init_faminmax_builtins () +{ + faminmax_builtins_data data[] = { + /* Absolute maximum. */ + {"vamax_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vamaxq_f16", AARCH64_FAMINMAX_BUILTIN_FAMAX8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamax_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vamaxq_f32", AARCH64_FAMINMAX_BUILTIN_FAMAX4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vamaxq_f64", AARCH64_FAMINMAX_BUILTIN_FAMAX2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + /* Absolute minimum. */ + {"vamin_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN4H, + aarch64_simd_types[Float16x4_t].eltype, + aarch64_simd_types[Float16x4_t].mode}, + {"vaminq_f16", AARCH64_FAMINMAX_BUILTIN_FAMIN8H, + aarch64_simd_types[Float16x8_t].eltype, + aarch64_simd_types[Float16x8_t].mode}, + {"vamin_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN2S, + aarch64_simd_types[Float32x2_t].eltype, + aarch64_simd_types[Float32x2_t].mode}, + {"vaminq_f32", AARCH64_FAMINMAX_BUILTIN_FAMIN4S, + aarch64_simd_types[Float32x4_t].eltype, + aarch64_simd_types[Float32x4_t].mode}, + {"vaminq_f64", AARCH64_FAMINMAX_BUILTIN_FAMIN2D, + aarch64_simd_types[Float64x2_t].eltype, + aarch64_simd_types[Float64x2_t].mode}, + }; + + for (size_t i = 0; i < ARRAY_SIZE (data); ++i) + { + tree type + = build_vector_type (data[i].eltype, GET_MODE_NUNITS (data[i].mode)); + tree fntype = build_function_type_list (type, type, type, NULL_TREE); + unsigned int code = data[i].code; + const char *name = data[i].name; + aarch64_builtin_decls[code] + = aarch64_general_simulate_builtin (name, fntype, code); + } +} + /* Register the tuple type that contains NUM_VECTORS of the AdvSIMD type indexed by TYPE_INDEX. */ static void @@ -1640,6 +1711,7 @@ handle_arm_neon_h (void) aarch64_init_simd_builtin_functions (true); aarch64_init_simd_intrinsics (); + aarch64_init_faminmax_builtins (); } static void @@ -2197,15 +2269,35 @@ aarch64_general_check_builtin_call (location_t location, vec, case AARCH64_WSR64: case AARCH64_WSRF: case AARCH64_WSRF64: - tree addr = STRIP_NOPS (args[0]); - if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE - || TREE_CODE (addr) != ADDR_EXPR - || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST) - { - error_at (location, "first argument to %qD must be a string literal", - fndecl); - return false; - } + { + tree addr = STRIP_NOPS (args[0]); + if (TREE_CODE (TREE_TYPE (addr)) != POINTER_TYPE + || TREE_CODE (addr) != ADDR_EXPR + || TREE_CODE (TREE_OPERAND (addr, 0)) != STRING_CST) + { + error_at (location, + "first argument to %qD must be a string literal", + fndecl); + return false; + } + } + case AARCH64_FAMINMAX_BUILTIN_FAMAX4H: + case AARCH64_FAMINMAX_BUILTIN_FAMAX8H: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX4S: + case AARCH64_FAMINMAX_BUILTIN_FAMAX2D: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN8H: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN4S: + case AARCH64_FAMINMAX_BUILTIN_FAMIN2D: + { + if (!TARGET_FAMINMAX) + { + report_missing_extension (location, fndecl, "faminmax"); + return false; + } + } } /* Default behavior. */ return true; @@ -3071,6 +3163,44 @@ aarch64_expand_builtin_data_intrinsic (unsigned int fcode, tree exp, rtx target) return ops[0].value; } +static rtx +aarch64_expand_builtin_faminmax (unsigned int fcode, tree exp, rtx target) +{ + machine_mode mode = TYPE_MODE (TREE_TYPE (exp)); + rtx op0 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 0))); + rtx op1 = force_reg (mode, expand_normal (CALL_EXPR_ARG (exp, 1))); + + enum insn_code icode; + if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4H) + icode = CODE_FOR_aarch64_famaxv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX8H) + icode = CODE_FOR_aarch64_famaxv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2S) + icode = CODE_FOR_aarch64_famaxv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX4S) + icode = CODE_FOR_aarch64_famaxv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMAX2D) + icode = CODE_FOR_aarch64_famaxv2df; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4H) + icode = CODE_FOR_aarch64_faminv4hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN8H) + icode = CODE_FOR_aarch64_faminv8hf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2S) + icode = CODE_FOR_aarch64_faminv2sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN4S) + icode = CODE_FOR_aarch64_faminv4sf; + else if (fcode == AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + icode = CODE_FOR_aarch64_faminv2df; + else + gcc_unreachable (); + + rtx pat = GEN_FCN (icode) (target, op0, op1); + + emit_insn (pat); + + return target; +} + /* Expand an expression EXP as fpsr or fpcr setter (depending on UNSPEC) using MODE. */ static void @@ -3250,6 +3380,9 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target, if (fcode >= AARCH64_REV16 && fcode <= AARCH64_RBITLL) return aarch64_expand_builtin_data_intrinsic (fcode, exp, target); + if (fcode >= AARCH64_FAMINMAX_BUILTIN_FAMAX4H + && fcode <= AARCH64_FAMINMAX_BUILTIN_FAMIN2D) + return aarch64_expand_builtin_faminmax (fcode, exp, target); gcc_unreachable (); } @@ -3794,6 +3927,28 @@ aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) reload_fenv, restore_fnenv), update_call); } +/* True if we've already complained about attempts to use functions + when the required extension is disabled. */ +static bool reported_missing_extension_p; + +/* Report an error against LOCATION that the user has tried to use + function FNDECL when extension EXTENSION is disabled. */ +void +report_missing_extension (location_t location, tree fndecl, + const char *extension) +{ + /* Avoid reporting a slew of messages for a single oversight. */ + if (reported_missing_extension_p) + return; + + error_at (location, "ACLE function %qD requires ISA extension %qs", + fndecl, extension); + inform (location, "you can enable %qs using the command-line" + " option %<-march%>, or by using the %" + " attribute or pragma", extension); + reported_missing_extension_p = true; +} + /* Resolve overloaded MEMTAG build-in functions. */ #define AARCH64_BUILTIN_SUBCODE(F) \ (DECL_MD_FUNCTION_CODE (F) >> AARCH64_BUILTIN_SHIFT) diff --git a/gcc/config/aarch64/aarch64-builtins.h b/gcc/config/aarch64/aarch64-builtins.h index e326fe66676..93e31a30ec6 100644 --- a/gcc/config/aarch64/aarch64-builtins.h +++ b/gcc/config/aarch64/aarch64-builtins.h @@ -96,4 +96,7 @@ struct GTY(()) aarch64_simd_type_info extern aarch64_simd_type_info aarch64_simd_types[]; -#endif \ No newline at end of file +void report_missing_extension (location_t location, tree fndecl, + const char *extension); + +#endif diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 42ec0eec31e..e95bd70893a 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the") AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs") +AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax") + #undef AARCH64_OPT_FMV_EXTENSION #undef AARCH64_OPT_EXTENSION #undef AARCH64_FMV_FEATURE diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bbeee221f37..6fab2f5a976 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -9881,3 +9881,15 @@ "shl\\t%d0, %d1, #16" [(set_attr "type" "neon_shift_imm")] ) + +;; faminmax +(define_insn "aarch64_" + [(set (match_operand:VHSDF 0 "register_operand" "=w") + (unspec:VHSDF + [(abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")) + (abs:VHSDF (match_operand:VHSDF 2 "register_operand" "w"))] + FMAXMIN_ONLY_UNS))] + "TARGET_FAMINMAX" + "\t%0., %1., %2." + [(set_attr "type" "neon_fp_aminmax")] +) diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index f3983a123e3..58c780b9464 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -947,10 +947,6 @@ static hash_table *function_table; are IDENTIFIER_NODEs. */ static GTY(()) hash_map *overload_names[2]; -/* True if we've already complained about attempts to use functions - when the required extension is disabled. */ -static bool reported_missing_extension_p; - /* True if we've already complained about attempts to use functions which require registers that are missing. */ static bool reported_missing_registers_p; @@ -1076,24 +1072,6 @@ lookup_fndecl (tree fndecl) return &(*registered_functions)[subcode]->instance; } -/* Report an error against LOCATION that the user has tried to use - function FNDECL when extension EXTENSION is disabled. */ -static void -report_missing_extension (location_t location, tree fndecl, - const char *extension) -{ - /* Avoid reporting a slew of messages for a single oversight. */ - if (reported_missing_extension_p) - return; - - error_at (location, "ACLE function %qD requires ISA extension %qs", - fndecl, extension); - inform (location, "you can enable %qs using the command-line" - " option %<-march%>, or by using the %" - " attribute or pragma", extension); - reported_missing_extension_p = true; -} - /* Check whether the registers required by SVE function fndecl are available. Report an error against LOCATION and return false if not. */ static bool diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 8056c337957..c6773f64745 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -456,6 +456,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED enabled through +gcs. */ #define TARGET_GCS AARCH64_HAVE_ISA (GCS) +/* Floating Point Absolute Maximum/Minimum extension instructions are + enabled through +faminmax. */ +#define TARGET_FAMINMAX AARCH64_HAVE_ISA (FAMINMAX) + /* Prefer different predicate registers for the output of a predicated operation over re-using an existing input predicate. */ #define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \ diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 95fe8f070f4..8e144c8ee4e 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -4457,3 +4457,11 @@ (UNSPECV_SET_FPCR "fpcr")]) (define_int_attr bits_etype [(8 "b") (16 "h") (32 "s") (64 "d")]) + +;; Iterators and attributes for faminmax + +(define_int_iterator FMAXMIN_ONLY_UNS [UNSPEC_FMAX UNSPEC_FMIN]) +(define_int_attr faminmax + [(UNSPEC_FMAX "famax") (UNSPEC_FMIN "famin")]) + + diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 9527bdb9e87..d8de9dbc9d1 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -492,6 +492,8 @@ ; neon_fp_reduc_minmax_s_q ; neon_fp_reduc_minmax_d ; neon_fp_reduc_minmax_d_q +; neon_fp_aminmax +; neon_fp_aminmax_q ; neon_fp_cvt_narrow_s_q ; neon_fp_cvt_narrow_d_q ; neon_fp_cvt_widen_h @@ -1044,6 +1046,8 @@ neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ \ + neon_fp_aminmax,\ + neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q,\ neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h,\ @@ -1264,6 +1268,8 @@ neon_fp_reduc_add_d_q, neon_fp_reduc_minmax_s, neon_fp_reduc_minmax_s_q, neon_fp_reduc_minmax_d,\ neon_fp_reduc_minmax_d_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ + neon_fp_aminmax, neon_fp_aminmax_q,\ neon_fp_cvt_narrow_s_q, neon_fp_cvt_narrow_d_q,\ neon_fp_cvt_widen_h, neon_fp_cvt_widen_s, neon_fp_to_int_s,\ neon_fp_to_int_s_q, neon_int_to_fp_s, neon_int_to_fp_s_q,\ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 4850c7379bf..d48516f4f60 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21777,6 +21777,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension. Enable support for Armv8.9-a/9.4-a translation hardening extension. @item rcpc3 Enable the RCpc3 (Release Consistency) extension. +@item faminmax +Enable the Floating Point Absolute Maximum/Minimum extension. @end table diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c new file mode 100644 index 00000000000..63ed1508c23 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c @@ -0,0 +1,10 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a" } */ + +#include "arm_neon.h" + +void +test (float32x4_t a, float32x4_t b) +{ + vamaxq_f32 (a, b); /* { dg-error {ACLE function 'vamaxq_f32' requires ISA extension 'faminmax'} } */ +} diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c new file mode 100644 index 00000000000..f2b5bafb81c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c @@ -0,0 +1,75 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a+faminmax" } */ + +#include "arm_neon.h" + +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vamax_f16 (a, b); +} + +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vamaxq_f16 (a, b); +} + +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vamax_f32 (a, b); +} + +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vamaxq_f32 (a, b); +} + +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vamaxq_f64 (a, b); +} + +float16x4_t +test_vamin_f16 (float16x4_t a, float16x4_t b) +{ + return vamin_f16 (a, b); +} + +float16x8_t +test_vaminq_f16 (float16x8_t a, float16x8_t b) +{ + return vaminq_f16 (a, b); +} + +float32x2_t +test_vamin_f32 (float32x2_t a, float32x2_t b) +{ + return vamin_f32 (a, b); +} + +float32x4_t +test_vaminq_f32 (float32x4_t a, float32x4_t b) +{ + return vaminq_f32 (a, b); +} + +float64x2_t +test_vaminq_f64 (float64x2_t a, float64x2_t b) +{ + return vaminq_f64 (a, b); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c new file mode 100644 index 00000000000..545a9468fdc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c @@ -0,0 +1,54 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-O -march=armv9-a" } */ + +#include "arm_neon.h" + +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vmax_f16 (vabs_f16 (a), vabs_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4h, v[0-9]+.4h} 2 } } */ + +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vmaxq_f16 (vabsq_f16 (a), vabsq_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.8h, v[0-9]+.8h} 2 } } */ + +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vmax_f32 (vabs_f32 (a), vabs_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2s, v[0-9]+.2s} 2 } } */ + +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vmaxq_f32 (vabsq_f32 (a), vabsq_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4s, v[0-9]+.4s} 2 } } */ + +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vmaxq_f64 (vabsq_f64 (a), vabsq_f64 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 0 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2d, v[0-9]+.2d} 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c new file mode 100644 index 00000000000..e4e079a6f9e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c @@ -0,0 +1,104 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-O -march=armv9-a+faminmax" } */ + +#include "arm_neon.h" + +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vmax_f16 (vabs_f16 (a), vabs_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4h, v[0-9]+.4h} 0 } } */ + +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vmaxq_f16 (vabsq_f16 (a), vabsq_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.8h, v[0-9]+.8h} 0 } } */ + +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vmax_f32 (vabs_f32 (a), vabs_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2s, v[0-9]+.2s} 0 } } */ + +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vmaxq_f32 (vabsq_f32 (a), vabsq_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4s, v[0-9]+.4s} 0 } } */ + +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vmaxq_f64 (vabsq_f64 (a), vabsq_f64 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2d, v[0-9]+.2d} 0 } } */ + +float16x4_t +test_vamin_f16 (float16x4_t a, float16x4_t b) +{ + return vmin_f16 (vabs_f16 (a), vabs_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmin\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4h, v[0-9]+.4h} 0 } } */ + +float16x8_t +test_vaminq_f16 (float16x8_t a, float16x8_t b) +{ + return vminq_f16 (vabsq_f16 (a), vabsq_f16 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmin\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.8h, v[0-9]+.8h} 0 } } */ + +float32x2_t +test_vamin_f32 (float32x2_t a, float32x2_t b) +{ + return vmin_f32 (vabs_f32 (a), vabs_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmin\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2s, v[0-9]+.2s} 0 } } */ + +float32x4_t +test_vaminq_f32 (float32x4_t a, float32x4_t b) +{ + return vminq_f32 (vabsq_f32 (a), vabsq_f32 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmin\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.4s, v[0-9]+.4s} 0 } } */ + +float64x2_t +test_vaminq_f64 (float64x2_t a, float64x2_t b) +{ + return vminq_f64 (vabsq_f64 (a), vabsq_f64 (b)); +} + +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmin\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 0 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tv[0-9]+.2d, v[0-9]+.2d} 0 } } */