Message ID | 20240919131204.3865854-1-mmalcomson@nvidia.com |
---|---|
Headers |
Return-Path: <gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9EF423857011 for <patchwork@sourceware.org>; Thu, 19 Sep 2024 13:14:46 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from NAM02-DM3-obe.outbound.protection.outlook.com (mail-dm3nam02on20613.outbound.protection.outlook.com [IPv6:2a01:111:f403:2405::613]) by sourceware.org (Postfix) with ESMTPS id 799F03858D29 for <gcc-patches@gcc.gnu.org>; Thu, 19 Sep 2024 13:12:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 799F03858D29 Authentication-Results: sourceware.org; dmarc=fail (p=reject dis=none) header.from=nvidia.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=nvidia.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 799F03858D29 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2405::613 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1726751565; cv=pass; b=n4SRz+rWyMz8MSuJDLLFwF08gHWqD/iNJVTl+XIb7z9helhL+hD4ZLFOgAGJaagJDkOLVo6X6pW+//8KZ/afgmUpd8A8ZpUeV3rYx9NRhD1qLi4HjWof7r3rB1jr+21VSgIvl/ktRGVjFz+ljDhMIv/zSgkl2nD8l005ULHzHuo= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1726751565; c=relaxed/simple; bh=DeTRvYCOE2RsQpbIEvGLSFF98bKvT/cv8V7di2HUmXQ=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=K/7RNNqzcImYyz2RHMZRB2x6rBWW6i2XW+0p9WG2mFmiUUn0/mlELWwMq1wcZD68JqvpmvnIkzp2LcZOvppmCsPkweOB7lC3zw7mfxbjiX8R/HcYwvo/Ctgmc04Zb0yBGRHFajEbUCwlETLcJFyxBv4+KAzxRMRN5akVD+uv/gA= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=vX26MhXLi6VVPkTAOtCvhkEAQSYDEWSLdD71O8buMRjGX7RtfiECQPppjbG+oUq4Ub+tzSgRxRATFAYaNZhOTwxPUMQ6/8hKjILV08vrC83sKKskZn4wIMrlXBA9G/KDw1EXdBdjma5cBkEIObhMORqPcgmppU9GEvMylJQKHVX8tD6miqenNZbyaLw1griubx+53hXSEx1kmDQeQPyxAQw/m4GBaKJjuAsQb4TVoWvXB6JmNh70L6vi+cEG0YmGerz7P0g/ZTFThN9ijMkDek7qQITEmfyexsSZff3fCZzyqy/Cr4wAQRhz/6MEtaYukNp5pNENS0TJhRoQxErltg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/IdU2YAic3h8oXQPXMG6YQUG8CgPqFmmvfKAbenjddk=; b=HlVOK6TdAz53I12gMtMPSH30gLMl4WDfIk2JJcIjpe7kEbHqRNY3nGM/TQsjKBIIf2z/ulxIfo2dv4Qsk2EuCfEPz8eLvR2A7bSP7TXIv0p4JhXooatgMhETu7mdPqYjyLYFzkTX3Qczia4WNBfJByO+TqOQowQVUGCRfQOzkKSpP3tMjfLWgZCD512k9UctZ08U9zO0q9blHId+FACc9RXm3jJHnK38rIAK+u2JaCpinYBYeyHZt8cGshuHl8RUXvm7O+1cTGqaHQBsz4NDzl3tqSUmf5N08X8qb5KGAYCq2td6xEoYoRVm1rq6yIVmiXXR14YPuVkfzmxsssqqsQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=/IdU2YAic3h8oXQPXMG6YQUG8CgPqFmmvfKAbenjddk=; b=gJQkMbcp0p7xUzZXWt0bGQ9KplkUYs/a/eOMngidEUfOLsT/rnvSpZKaZ/6fqjiSZxPt4hjkn5WbXsJodj7olUjGM3J57SFV43HYTmEEd+eyfrdktiGnsRylPfTIVr1VyoIIFy18r/cRcI7xJ/q+KwUB5vSHtS2Xnax6soySEy57zHL6YM+pYQQ2OdAyTFgJRHpcuPXFYYcyLHS4kWtAImmLvTf8tYmDQOdwfnUJV1J3rgeeJDSVMgag4mBCcPIhQOqQCr/OxDcz7DtGeYLx50EZBpELIArIMecY39y7K8jgRhkaZEz5vnS16VI/UQ3YSuOlLS35kktHqnLpgFTEgg== Received: from BY5PR13CA0029.namprd13.prod.outlook.com (2603:10b6:a03:180::42) by MW4PR12MB7287.namprd12.prod.outlook.com (2603:10b6:303:22c::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7962.23; Thu, 19 Sep 2024 13:12:37 +0000 Received: from CO1PEPF000066EB.namprd05.prod.outlook.com (2603:10b6:a03:180:cafe::be) by BY5PR13CA0029.outlook.office365.com (2603:10b6:a03:180::42) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7939.24 via Frontend Transport; Thu, 19 Sep 2024 13:12:36 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by CO1PEPF000066EB.mail.protection.outlook.com (10.167.249.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7918.13 via Frontend Transport; Thu, 19 Sep 2024 13:12:35 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Thu, 19 Sep 2024 06:12:23 -0700 Received: from f2784bb-lcelt.nvidia.com (10.126.230.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.4; Thu, 19 Sep 2024 06:12:21 -0700 From: <mmalcomson@nvidia.com> To: <gcc-patches@gcc.gnu.org> CC: Jonathan Wakely <jwakely@redhat.com>, Joseph Myers <josmyers@redhat.com>, Richard Biener <rguenther@suse.de>, Matthew Malcomson <mmalcomson@nvidia.com> Subject: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins Date: Thu, 19 Sep 2024 14:11:56 +0100 Message-ID: <20240919131204.3865854-1-mmalcomson@nvidia.com> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.126.230.35] X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000066EB:EE_|MW4PR12MB7287:EE_ X-MS-Office365-Filtering-Correlation-Id: 2b91ce58-cbf2-4337-fc29-08dcd8acbaa7 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230040|1800799024|36860700013|376014|82310400026; X-Microsoft-Antispam-Message-Info: OIfjbKiVLsQP762SggR35flQUH23QNsK4Ix6lVsPz9OsO1KWaJhIomO1vvQbvIxzAV9+9BS1iA3Yu2KAfit8ACNkPoYN7LoG5XLb+CA0JxU3OCNTD9UuxZPiu6VKggBRTlfEuLu6YrPf38xHQflV6oqLnvEK1bL9VOjtbXi6SdKD/oqClhpmZolX2jmbe4jv58LY1ynmKcwFP8/wd2DvLRaXAg+cPSaZQeX8qsvArCN1FmHEcJ9GLSCB5w+rA1vd+56QSLENpdmYeLXHQQ4t2nnRPmP5kjUUEtB2OUhNaYtg/JUIFsJ3Ep8BTQpV4Oo7LY3O7aOTswrz7myqRg5uMGMoxDzpf1cI0AzI719JDD4KhWKFb0JW0DqmXaFBUPTUCN+5uvbMfVvc54KyBXegdX/DI40UfoQDmzU/toZeUkqXG33Hodow8BUCE1qTyPjrqAllsqskcnTAybUhIt8ZybMN6yN66c3tkJiwvJtsWNoBZSxWte76COq5AflVjCVHAi+kSXjEuxPd9RE1SmcaTrl1Pg0tNwoWuBNqTvh6NVm00JpSrEVQ4AGXEmnZ93dcBPbxL1UK1r+PLp7LfK4yHsSl9GBSV0p+EYShn4I15Sld3OVtTtjlZ5ugBhzSnLvc412c26IPJSa9X1a98SmXYYe6BYmc/JihU5zJGIDcrwiMRnIN3UmVeSPor45ot2exdyw1feZOAS6Jtzzz3aVdrgO+EprgXo2q8YG1spxIYjzZHWqhXgcg15l3amu+dJKL5K6KDPMymePVE7M0y+Bo5nQjORi5Vp/OZXg7nvtzPuyR9zBPH5aDpZs6LHnU4juCGG16QAymndxrzyzcF3aRIsF6MVSOHmmVgBIaA/i17BZw3IEz3Ddh2w9xPZarXbGEcUgd9vnoDfDJdwL45Ee7ZJVuY34foiTVu+DSMmNGUCCqnHoaYxWLo49wuiahZz3fBy81/rxq0u+T9mZ/cnb9v4wenFlCM4ioCaWCXOmHlEpGpfyMQhtdo7W8C4p/5YCoaySU3NA2VZIiWfaeb6lNVzfTLgsrsnG5IBOjvkLShGB0NC7TPxcmaJf31dZz8ZZeJbK/tfI1AKvNOPTC7IQuULpTcZnDh0Gff0Qdx9OnQLFtSnNd5NDdL4YttEICaJ7DNEuO5ZBwhqf6w5hWxgUBXqtJK3OzIKeDFRWxOhbu66D6ts7t48xxnWSILUbIZ6Ra2mffFmrMIKIyC1yriqz+zrxybNGqDu9zQrR2GiPH+8D/8B1pU7X+QVOWujG39h/qHiqqzTdjLGFB2URQaX2zyLdrV+ZYlO7gK6uEpa2fVLbA85Ekg6RXAIMTsJ1AesPjq8dbmmIF8s/CJfnwde5tBvTkcL0TcIyrrgpqV7sXNfku6w+6xykG3WwKTKQ+WXj2YqECcaQHxTiDarnuU9JNOc4L2rMDtko+6nid8oPeVPkw4T6Zz+Vaz+LgxX6XR45Y X-Forefront-Antispam-Report: CIP:216.228.117.160; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge1.nvidia.com; CAT:NONE; SFS:(13230040)(1800799024)(36860700013)(376014)(82310400026); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Sep 2024 13:12:35.6376 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 2b91ce58-cbf2-4337-fc29-08dcd8acbaa7 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.160]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000066EB.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB7287 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org |
Series |
Introduce floating point fetch_add builtins
|
|
Message
Matthew Malcomson
Sept. 19, 2024, 1:11 p.m. UTC
From: Matthew Malcomson <mmalcomson@nvidia.com>
Hello, this is an RFC for adding an atomic floating point fetch_add builtin
(and variants) to GCC. The atomic fetch_add operation is defined to work
on the base floating point types in the C++20 standard chapter 31.7.3, and
extended to work for all cv-unqualified floating point types in C++23
chapter 33.5.7.4.
Honestly not sure who to Cc, please do point me to someone else if that's
better.
This is nowhere near complete (for one thing even the tests I've added
don't fully pass), but I think I have a complete enough idea that it's
worth checking if this is something that could be agreed on.
As it stands no target except the nvptx backend would natively support
these operations.
Main questions that I'm looking to resolve with this RFC:
1) Would GCC be OK accepting this implementation even though no backend
would be implementing these yet?
- AIUI only the nvptx backend could theoretically implement this.
- Even without a backend implementing it natively, the ability to use
this in code (especially libstdc++) enables other compilers to
generate better code for GPU's using standard C++.
2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
(i.e. a check on the resolved builtin rather than the more user-facing
one) in order to determine whether floating point atomic fetch_add is
available.
- N.b. this builtin is actually the builtin working on the "double"
type, one would have to rely on any compilers implementing that
particular resolved builtin to also implement the other floating point
atomic fetch_add builtins that they would want to support in libstdc++
`atomic<[floating_point_type]>::fetch_add`.
More specific questions about the choice of which builtins to implement and
whether the types are OK:
1) Is it OK to not implement the `__sync_*` versions?
Since these are deprecated and the `__atomic_*` versions are there to
match the C/C++ code atomic operations (which is a large part of the
reason for the new floating point operations).
2) Is it OK to not implement all the `fetch_*` operations?
None of the bitwise operations are specified for C++ and bitwise
operations are (AIUI) rarely used on floating point values.
3) Wanting to be able to farm out to libatomic meant that we need constant names
for the specialised functions.
- This led to the naming convention based on floating point type.
- That naming convention *could* be updated to include the special backend
floating point types if needed. I have not done this mostly because I
thought it would not add much, though I have not looked into this very
closely.
4) Wanting to name the functions based on floating point type rather than size
meant that the mapping from type passed to the overloaded version to
specialised builtin was less direct than for the integral versions.
- Ended up with a hard-coded table in the source to check this.
- Would very much like some better approach, not certain what better approach
I could find.
- Will eventually at least share the hard-coded tables (which isn't
happening yet because this is at RFC level).
5) Are there any other types that I should use?
Similarly are there any types that I'm trying to use that I shouldn't?
I *believe* I've implemented all the types that make sense and are
general builtin types. Could easily have missed some (e.g. left
`_Float128x` alone because AIUI it's larger than 128bits which means we
don't have any other atomic operations on such data), could also easily
be misunderstanding the mention in the C++ standards of "extended" types
(which I believe is the `_Float*` and `bfloat16` types).
6) Anything special about floating point maths that I'm tripping up on?
(Especially around CAS loop where we expand the operation directly,
sometimes convert a PLUS into a MINUS of a negative value ...).
Don't know of anything myself, but also a bit wary of floating point
maths.
N.b. I know that there's a reasonable amount of work left in:
1) Ensuring that all the places which use atomic types are updated
(e.g. asan),
2) Ensuring that all frontends can use these to the level that they could
use the integral atomics.
3) Ensuring the major backends can still compile libatomic (I had to do
something in AArch64 libatomic backend to make it compile, seems like
there might be more to do for others).
Matthew Malcomson (8):
Define new floating point builtin fetch_add functions
Add FP types for atomic builtin overload resolution
Tie the new atomic builtins to the backend
Have libatomic working as first draft
Use new builtins in libstdc++
First attempt at testsuite
Mention floating point atomic fetch_add etc in docs
Add demo implementation of one of the operations
gcc/builtin-types.def | 20 +
gcc/builtins.cc | 153 ++-
gcc/c-family/c-common.cc | 88 +-
gcc/config/aarch64/aarch64.h | 2 +
gcc/config/aarch64/aarch64.opt | 5 +
gcc/config/aarch64/atomics.md | 15 +
gcc/doc/extend.texi | 12 +
gcc/fortran/types.def | 48 +
gcc/optabs.cc | 101 +-
gcc/optabs.def | 6 +-
gcc/optabs.h | 2 +-
gcc/sync-builtins.def | 40 +
gcc/testsuite/gcc.dg/atomic-op-fp.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf128.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf16.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf16b.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf32.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf32x.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf64.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpf64x.c | 204 +++
gcc/testsuite/gcc.dg/atomic-op-fpl.c | 204 +++
gcc/testsuite/lib/target-supports.exp | 463 ++++++-
libatomic/Makefile.am | 6 +-
libatomic/Makefile.in | 12 +-
libatomic/acinclude.m4 | 49 +
libatomic/auto-config.h.in | 84 +-
libatomic/config/linux/aarch64/host-config.h | 2 +
libatomic/configure | 1153 ++++++++++++++++-
libatomic/configure.ac | 4 +
libatomic/fadd_n.c | 23 +
libatomic/fop_n.c | 5 +-
libatomic/fsub_n.c | 23 +
libatomic/libatomic.map | 44 +
libatomic/libatomic_i.h | 58 +
libatomic/testsuite/Makefile.in | 1 +
.../testsuite/libatomic.c/atomic-op-fp.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf128.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf16.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf16b.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf32.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf32x.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf64.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpf64x.c | 203 +++
.../testsuite/libatomic.c/atomic-op-fpl.c | 203 +++
libstdc++-v3/include/bits/atomic_base.h | 16 +
47 files changed, 6393 insertions(+), 112 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c
create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c
create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c
Comments
On Thu, 19 Sept 2024 at 14:12, <mmalcomson@nvidia.com> wrote: > > From: Matthew Malcomson <mmalcomson@nvidia.com> > > Hello, this is an RFC for adding an atomic floating point fetch_add builtin > (and variants) to GCC. The atomic fetch_add operation is defined to work > on the base floating point types in the C++20 standard chapter 31.7.3, and > extended to work for all cv-unqualified floating point types in C++23 > chapter 33.5.7.4. > > Honestly not sure who to Cc, please do point me to someone else if that's > better. > > This is nowhere near complete (for one thing even the tests I've added > don't fully pass), but I think I have a complete enough idea that it's > worth checking if this is something that could be agreed on. > > As it stands no target except the nvptx backend would natively support > these operations. > > Main questions that I'm looking to resolve with this RFC: > 1) Would GCC be OK accepting this implementation even though no backend > would be implementing these yet? > - AIUI only the nvptx backend could theoretically implement this. > - Even without a backend implementing it natively, the ability to use > this in code (especially libstdc++) enables other compilers to > generate better code for GPU's using standard C++. > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)` > (i.e. a check on the resolved builtin rather than the more user-facing > one) in order to determine whether floating point atomic fetch_add is > available. Yes, if that name is what other compilers will also use (have you discussed this with Clang?) It looks like PATCH 5/8 only uses the _fp name for fetch_add though, and just uses fetch_sub etc. for the other functions, is that a mistake? > - N.b. this builtin is actually the builtin working on the "double" OK, so the library code just calls the generic __atomic_fetch_add that accepts any types, but then that gets expanded to a more specific form for float, double etc.? And the more specific form has to exist at some level, because we need an extern symbol from libatomic, so either we include the type as an explicit suffix on the name, or we use some kind of name mangling like _Z18__atomic_fetch_addPdS_S_, which is obviously nasty. > type, one would have to rely on any compilers implementing that > particular resolved builtin to also implement the other floating point > atomic fetch_add builtins that they would want to support in libstdc++ > `atomic<[floating_point_type]>::fetch_add`. This seems a bit concerning. I can imagine somebody implementing these for float and double first, but leaving long double, _Float64, _Float32, _Float128 etc. for later. In that case, libstdc++ would not work if somebody tries to use std::atomic<long double>, or whichever types aren't supported yet. It's OK if we can be *sure* that won't happen i.e. that Clang will either implement the new built-in for *all* FP types, or none. > > More specific questions about the choice of which builtins to implement and > whether the types are OK: > 1) Is it OK to not implement the `__sync_*` versions? > Since these are deprecated and the `__atomic_*` versions are there to > match the C/C++ code atomic operations (which is a large part of the > reason for the new floating point operations). > 2) Is it OK to not implement all the `fetch_*` operations? > None of the bitwise operations are specified for C++ and bitwise > operations are (AIUI) rarely used on floating point values. That seems OK (entirely correct even) to me. > 3) Wanting to be able to farm out to libatomic meant that we need constant names > for the specialised functions. > - This led to the naming convention based on floating point type. > - That naming convention *could* be updated to include the special backend > floating point types if needed. I have not done this mostly because I > thought it would not add much, though I have not looked into this very > closely. > 4) Wanting to name the functions based on floating point type rather than size > meant that the mapping from type passed to the overloaded version to > specialised builtin was less direct than for the integral versions. > - Ended up with a hard-coded table in the source to check this. > - Would very much like some better approach, not certain what better approach > I could find. > - Will eventually at least share the hard-coded tables (which isn't > happening yet because this is at RFC level). > 5) Are there any other types that I should use? > Similarly are there any types that I'm trying to use that I shouldn't? > I *believe* I've implemented all the types that make sense and are > general builtin types. Could easily have missed some (e.g. left > `_Float128x` alone because AIUI it's larger than 128bits which means we > don't have any other atomic operations on such data), could also easily That seems like a problem though - it means that GCC could be in exactly the concerning position described above: there could be a FP type (_Float128x) for which we have no __atomic_fetch_add built-in, but the __has_builtin(__atomic_fetch_add) check would work, because the built-in for double exists. So std::atomic<_Float128x> would not work. I don't think that's a problem today, because we don't have a _Float128x type AFAIK. But that suggests to me that the rule for whether to define the built-in for a FP type needs to be based on whether the type exists at all, not whether we have other atomic ops for it. Maybe the libstdc++ code should have a separate __has_builtin check for each type, e.g. bool __use_builtin = false; +#if __has_builtin(__atomic_fetch_add_fp) + if constexpr (is_same_v<_Tp, double>) + __use_builtin = true; +##endif +#if __has_builtin(__atomic_fetch_add_fpf) + if constexpr (is_same_v<_Tp, float>) + __use_builtin = true; +##endif +#if __has_builtin(__atomic_fetch_add_fpl) + if constexpr (is_same_v<_Tp, long double>) + __use_builtin = true; +##endif // ... if (__use_builtin) return __atomic_fatch_add(...); // .. current CAS-based implementation And so on with __has_builtin checks for the function for each type. This would be a lot more tedious, but would handle the case where the new built-in is only supported for some types. > be misunderstanding the mention in the C++ standards of "extended" types > (which I believe is the `_Float*` and `bfloat16` types). Yes. The C++ standard says there's a typedef std::float32_t which denotes some implementation-defined type, which is _Float32 in our implementation. But the _Float32 name is not in the C++ standard. > 6) Anything special about floating point maths that I'm tripping up on? > (Especially around CAS loop where we expand the operation directly, > sometimes convert a PLUS into a MINUS of a negative value ...). > Don't know of anything myself, but also a bit wary of floating point > maths. > > N.b. I know that there's a reasonable amount of work left in: > 1) Ensuring that all the places which use atomic types are updated > (e.g. asan), > 2) Ensuring that all frontends can use these to the level that they could > use the integral atomics. > 3) Ensuring the major backends can still compile libatomic (I had to do > something in AArch64 libatomic backend to make it compile, seems like > there might be more to do for others). > > > Matthew Malcomson (8): > Define new floating point builtin fetch_add functions > Add FP types for atomic builtin overload resolution > Tie the new atomic builtins to the backend > Have libatomic working as first draft > Use new builtins in libstdc++ > First attempt at testsuite > Mention floating point atomic fetch_add etc in docs > Add demo implementation of one of the operations > > gcc/builtin-types.def | 20 + > gcc/builtins.cc | 153 ++- > gcc/c-family/c-common.cc | 88 +- > gcc/config/aarch64/aarch64.h | 2 + > gcc/config/aarch64/aarch64.opt | 5 + > gcc/config/aarch64/atomics.md | 15 + > gcc/doc/extend.texi | 12 + > gcc/fortran/types.def | 48 + > gcc/optabs.cc | 101 +- > gcc/optabs.def | 6 +- > gcc/optabs.h | 2 +- > gcc/sync-builtins.def | 40 + > gcc/testsuite/gcc.dg/atomic-op-fp.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf128.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf16.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf16b.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf32.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf32x.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf64.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpf64x.c | 204 +++ > gcc/testsuite/gcc.dg/atomic-op-fpl.c | 204 +++ > gcc/testsuite/lib/target-supports.exp | 463 ++++++- > libatomic/Makefile.am | 6 +- > libatomic/Makefile.in | 12 +- > libatomic/acinclude.m4 | 49 + > libatomic/auto-config.h.in | 84 +- > libatomic/config/linux/aarch64/host-config.h | 2 + > libatomic/configure | 1153 ++++++++++++++++- > libatomic/configure.ac | 4 + > libatomic/fadd_n.c | 23 + > libatomic/fop_n.c | 5 +- > libatomic/fsub_n.c | 23 + > libatomic/libatomic.map | 44 + > libatomic/libatomic_i.h | 58 + > libatomic/testsuite/Makefile.in | 1 + > .../testsuite/libatomic.c/atomic-op-fp.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf128.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf16.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf16b.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf32.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf32x.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf64.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpf64x.c | 203 +++ > .../testsuite/libatomic.c/atomic-op-fpl.c | 203 +++ > libstdc++-v3/include/bits/atomic_base.h | 16 + > 47 files changed, 6393 insertions(+), 112 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c > > -- > 2.43.0 >
On Thu, 19 Sep 2024, mmalcomson@nvidia.com wrote:
> 6) Anything special about floating point maths that I'm tripping up on?
Correct atomic operations with floating-point operands should ensure that
exceptions raised exactly correspond to the operands for which the
operation succeeded, and not to the operands for any previous attempts
where the compare-exchange failed. There is a lengthy note in the C
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses
appropriate code sequences to achieve this. In GCC the implementation of
this is in c-typeck.cc:build_atomic_assign, which in turn calls
targetm.atomic_assign_expand_fenv (note that we have the complication for
C of not introducing libm dependencies in code that only uses standard
language features and not <math.h>, <fenv.h> or <complex.h>, so direct use
of <fenv.h> functions is inappropriate here).
I would expect such built-in functions to follow the same semantics for
floating-point exceptions as _Atomic compound assignment does. (Note that
_Atomic compound assignment is more general in the allowed operands,
because compound assignment is a heterogeneous operation - for example,
the special floating-point logic in build_atomic_assign includes the case
where the LHS of the compound assignment is of atomic integer type but the
RHS is of floating type. However, built-in functions allow memory orders
other than seq_cst to be used, whereas _Atomic compound assignment is
limited to the seq_cst case.)
So it would seem appropriate for the implementation of such built-in
functions to make use of targetm.atomic_assign_expand_fenv for
floating-point environment handling, and for testcases to include tests
analogous to c11-atomic-exec-5.c that exceptions are being handled
correctly.
Cf. N2329 which suggested such operations for C in <stdatomic.h> (but
tried to do to many things in one paper to be accepted into C); it didn't
go into the floating-point exceptions semantics but simple correctness
would indicate avoiding spurious exceptions from discarded computations.
Thanks Jonathan, I agree with your point that having just the check against one of the overloaded versions is not very robust and having multiple checks against different versions would be better. Unfortunately - while asking the clang folk about this I realised that clang doesn't expose the resolved versions (e.g. the existing versions like `__atomic_load_2` etc) to the user. Instead they allow using SFINAE on these overloaded builtins. https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461 I spent some time looking at this and it seems that enabling SFINAE in GCC for these builtins is not too problematic (idea being to pass in a `error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the context in the C++ frontend, then only emit errors if that boolean is set). To Jonathon: * Would you be OK with using SFINAE to choose whether to use the __atomic_fetch_add builtin for floating point types in libstdc++? At C++ frontend maintainers I Cc'd in: * Are you happy with the idea of enabling SFINAE on overloaded builtins resolved via resolve_overloaded_builtin? To global maintainers I Cc'd in: * Is there any reason you know of not to enable SFINAE on the overloaded builtins? * Would it be OK to enable SFINAE on the generic overloaded builtins and add the parameter so that targets can do the same for their target-specific builtins (i.e. without changing the behaviour of the existing target specific builtins)?
On Wed, 2 Oct 2024 at 17:48, Matthew Malcomson <mmalcomson@nvidia.com> wrote: > > Thanks Jonathan, > > I agree with your point that having just the check against one of the overloaded versions is not very robust and having multiple checks against different versions would be better. > > Unfortunately — while asking the clang folk about this I realised that clang doesn't expose the resolved versions (e.g. the existing versions like `__atomic_load_2` etc) to the user. > Instead they allow using SFINAE on these overloaded builtins. > https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461 > > I spent some time looking at this and it seems that enabling SFINAE in GCC for these builtins is not too problematic (idea being to pass in a `error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the context in the C++ frontend, then only emit errors if that boolean is set). > > To Jonathon: > > Would you be OK with using SFINAE to choose whether to use the __atomic_fetch_add builtin for floating point types in libstdc++? It's not great for compile times, but no objection otherwise. > > > At C++ frontend maintainers I Cc'd in: > > Are you happy with the idea of enabling SFINAE on overloaded builtins resolved via resolve_overloaded_builtin? > > To global maintainers I Cc'd in: > > Is there any reason you know of not to enable SFINAE on the overloaded builtins? > Would it be OK to enable SFINAE on the generic overloaded builtins and add the parameter so that targets can do the same for their target-specific builtins (i.e. without changing the behaviour of the existing target specific builtins)? > > > ________________________________ > From: Jonathan Wakely <jwakely@redhat.com> > Sent: 19 September 2024 3:47 PM > To: Matthew Malcomson <mmalcomson@nvidia.com> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Joseph Myers <josmyers@redhat.com>; Richard Biener <rguenther@suse.de> > Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins > > External email: Use caution opening links or attachments > > > On Thu, 19 Sept 2024 at 14:12, <mmalcomson@nvidia.com> wrote: > > > > From: Matthew Malcomson <mmalcomson@nvidia.com> > > > > Hello, this is an RFC for adding an atomic floating point fetch_add builtin > > (and variants) to GCC. The atomic fetch_add operation is defined to work > > on the base floating point types in the C++20 standard chapter 31.7.3, and > > extended to work for all cv-unqualified floating point types in C++23 > > chapter 33.5.7.4. > > > > Honestly not sure who to Cc, please do point me to someone else if that's > > better. > > > > This is nowhere near complete (for one thing even the tests I've added > > don't fully pass), but I think I have a complete enough idea that it's > > worth checking if this is something that could be agreed on. > > > > As it stands no target except the nvptx backend would natively support > > these operations. > > > > Main questions that I'm looking to resolve with this RFC: > > 1) Would GCC be OK accepting this implementation even though no backend > > would be implementing these yet? > > - AIUI only the nvptx backend could theoretically implement this. > > - Even without a backend implementing it natively, the ability to use > > this in code (especially libstdc++) enables other compilers to > > generate better code for GPU's using standard C++. > > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)` > > (i.e. a check on the resolved builtin rather than the more user-facing > > one) in order to determine whether floating point atomic fetch_add is > > available. > > Yes, if that name is what other compilers will also use (have you > discussed this with Clang?) > > It looks like PATCH 5/8 only uses the _fp name for fetch_add though, > and just uses fetch_sub etc. for the other functions, is that a > mistake? > > > - N.b. this builtin is actually the builtin working on the "double" > > OK, so the library code just calls the generic __atomic_fetch_add that > accepts any types, but then that gets expanded to a more specific form > for float, double etc.? > And the more specific form has to exist at some level, because we need > an extern symbol from libatomic, so either we include the type as an > explicit suffix on the name, or we use some kind of name mangling like > _Z18__atomic_fetch_addPdS_S_, which is obviously nasty. > > > type, one would have to rely on any compilers implementing that > > particular resolved builtin to also implement the other floating point > > atomic fetch_add builtins that they would want to support in libstdc++ > > `atomic<[floating_point_type]>::fetch_add`. > > This seems a bit concerning. I can imagine somebody implementing these > for float and double first, but leaving long double, _Float64, > _Float32, _Float128 etc. for later. In that case, libstdc++ would not > work if somebody tries to use std::atomic<long double>, or whichever > types aren't supported yet. It's OK if we can be *sure* that won't > happen i.e. that Clang will either implement the new built-in for > *all* FP types, or none. > > > > > More specific questions about the choice of which builtins to implement and > > whether the types are OK: > > 1) Is it OK to not implement the `__sync_*` versions? > > Since these are deprecated and the `__atomic_*` versions are there to > > match the C/C++ code atomic operations (which is a large part of the > > reason for the new floating point operations). > > 2) Is it OK to not implement all the `fetch_*` operations? > > None of the bitwise operations are specified for C++ and bitwise > > operations are (AIUI) rarely used on floating point values. > > That seems OK (entirely correct even) to me. > > > > 3) Wanting to be able to farm out to libatomic meant that we need constant names > > for the specialised functions. > > - This led to the naming convention based on floating point type. > > - That naming convention *could* be updated to include the special backend > > floating point types if needed. I have not done this mostly because I > > thought it would not add much, though I have not looked into this very > > closely. > > 4) Wanting to name the functions based on floating point type rather than size > > meant that the mapping from type passed to the overloaded version to > > specialised builtin was less direct than for the integral versions. > > - Ended up with a hard-coded table in the source to check this. > > - Would very much like some better approach, not certain what better approach > > I could find. > > - Will eventually at least share the hard-coded tables (which isn't > > happening yet because this is at RFC level). > > 5) Are there any other types that I should use? > > Similarly are there any types that I'm trying to use that I shouldn't? > > I *believe* I've implemented all the types that make sense and are > > general builtin types. Could easily have missed some (e.g. left > > `_Float128x` alone because AIUI it's larger than 128bits which means we > > don't have any other atomic operations on such data), could also easily > > That seems like a problem though - it means that GCC could be in > exactly the concerning position described above: there could be a FP > type (_Float128x) for which we have no __atomic_fetch_add built-in, > but the __has_builtin(__atomic_fetch_add) check would work, because > the built-in for double exists. So std::atomic<_Float128x> would not > work. > > I don't think that's a problem today, because we don't have a > _Float128x type AFAIK. But that suggests to me that the rule for > whether to define the built-in for a FP type needs to be based on > whether the type exists at all, not whether we have other atomic ops > for it. > > Maybe the libstdc++ code should have a separate __has_builtin check > for each type, e.g. > > bool __use_builtin = false; > +#if __has_builtin(__atomic_fetch_add_fp) > + if constexpr (is_same_v<_Tp, double>) > + __use_builtin = true; > +##endif > +#if __has_builtin(__atomic_fetch_add_fpf) > + if constexpr (is_same_v<_Tp, float>) > + __use_builtin = true; > +##endif > +#if __has_builtin(__atomic_fetch_add_fpl) > + if constexpr (is_same_v<_Tp, long double>) > + __use_builtin = true; > +##endif > // ... > > if (__use_builtin) > return __atomic_fatch_add(...); > // .. current CAS-based implementation > > And so on with __has_builtin checks for the function for each type. > This would be a lot more tedious, but would handle the case where the > new built-in is only supported for some types. > > > > be misunderstanding the mention in the C++ standards of "extended" types > > (which I believe is the `_Float*` and `bfloat16` types). > > Yes. The C++ standard says there's a typedef std::float32_t which > denotes some implementation-defined type, which is _Float32 in our > implementation. But the _Float32 name is not in the C++ standard. > > > > 6) Anything special about floating point maths that I'm tripping up on? > > (Especially around CAS loop where we expand the operation directly, > > sometimes convert a PLUS into a MINUS of a negative value ...). > > Don't know of anything myself, but also a bit wary of floating point > > maths. > > > > N.b. I know that there's a reasonable amount of work left in: > > 1) Ensuring that all the places which use atomic types are updated > > (e.g. asan), > > 2) Ensuring that all frontends can use these to the level that they could > > use the integral atomics. > > 3) Ensuring the major backends can still compile libatomic (I had to do > > something in AArch64 libatomic backend to make it compile, seems like > > there might be more to do for others). > > > > > > Matthew Malcomson (8): > > Define new floating point builtin fetch_add functions > > Add FP types for atomic builtin overload resolution > > Tie the new atomic builtins to the backend > > Have libatomic working as first draft > > Use new builtins in libstdc++ > > First attempt at testsuite > > Mention floating point atomic fetch_add etc in docs > > Add demo implementation of one of the operations > > > > gcc/builtin-types.def | 20 + > > gcc/builtins.cc | 153 ++- > > gcc/c-family/c-common.cc | 88 +- > > gcc/config/aarch64/aarch64.h | 2 + > > gcc/config/aarch64/aarch64.opt | 5 + > > gcc/config/aarch64/atomics.md | 15 + > > gcc/doc/extend.texi | 12 + > > gcc/fortran/types.def | 48 + > > gcc/optabs.cc | 101 +- > > gcc/optabs.def | 6 +- > > gcc/optabs.h | 2 +- > > gcc/sync-builtins.def | 40 + > > gcc/testsuite/gcc.dg/atomic-op-fp.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf128.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf16.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf16b.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf32.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf32x.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf64.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpf64x.c | 204 +++ > > gcc/testsuite/gcc.dg/atomic-op-fpl.c | 204 +++ > > gcc/testsuite/lib/target-supports.exp | 463 ++++++- > > libatomic/Makefile.am | 6 +- > > libatomic/Makefile.in | 12 +- > > libatomic/acinclude.m4 | 49 + > > libatomic/auto-config.h.in | 84 +- > > libatomic/config/linux/aarch64/host-config.h | 2 + > > libatomic/configure | 1153 ++++++++++++++++- > > libatomic/configure.ac | 4 + > > libatomic/fadd_n.c | 23 + > > libatomic/fop_n.c | 5 +- > > libatomic/fsub_n.c | 23 + > > libatomic/libatomic.map | 44 + > > libatomic/libatomic_i.h | 58 + > > libatomic/testsuite/Makefile.in | 1 + > > .../testsuite/libatomic.c/atomic-op-fp.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf128.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf16.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf16b.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf32.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf32x.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf64.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpf64x.c | 203 +++ > > .../testsuite/libatomic.c/atomic-op-fpl.c | 203 +++ > > libstdc++-v3/include/bits/atomic_base.h | 16 + > > 47 files changed, 6393 insertions(+), 112 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c > > create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c > > create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c > > > > -- > > 2.43.0 > > >
After getting GCC to handle floating point exceptions I'm now noticing that the `targetm.atomic_assign_expand_fenv` hook tends to use the library function `__atomic_feraiseexcept`. This requires adding -latomic to the command line where it wasn't needed before. I also notice that the inline CAS loop for libstdc++ `atomic<T>::fetch_add` on floating point types doesn't handle these floating point exceptions. For context, clang (which has already implemented floating point fetch_add for floating point types) is not emitting anything to handle the floating point environment. I expect that the eventual goal should be (please do correct if you disagree) 1. Update the existing libstdc++ CAS loop to handle floating point exceptions. 2. Suggest to clang that the fetch_add builtin should handle floating point exceptions. 3. GCC builtin handles floating point exceptions. 4. __atomic_feraiseexcept should be a builtin to avoid previously unnecessary requirement to link libatomic. However I'm uncertain we could sort all that out for GCC 15. Would it be reasonable to require linking with libatomic for the GCC 15 release? Alternatively is there some other intermediate step that people would prefer? Similarly — maybe there's something I'm missing — maybe about differences between C/C++? Another option is to aim for implementing the functionality of __atomic_feraiseexcept inline in the fetch_add GCC 15 (a bit tight on time), to use it in the new fetch_add builtin overload, but not use it in the CAS loop. That way no compiler & floating point type combination would need linking libatomic, current clang would not handle floating point exception information, while current gcc would handle that information. What are your thoughts? Regards, Matthew
N.B. please CC the libstdc++ list for libstdc++ topics. On Mon, 14 Oct 2024 at 14:00, Matthew Malcomson <mmalcomson@nvidia.com> wrote: > > After getting GCC to handle floating point exceptions I'm now noticing that the `targetm.atomic_assign_expand_fenv` hook tends to use the library function `__atomic_feraiseexcept`. > This requires adding -latomic to the command line where it wasn't needed before. > I also notice that the inline CAS loop for libstdc++ `atomic<T>::fetch_add` on floating point types doesn't handle these floating point exceptions. Right, Joseph made the C front-end handle them for _Atomic but I didn't do that for std::atomic. > For context, clang (which has already implemented floating point fetch_add for floating point types) is not emitting anything to handle the floating point environment. > > I expect that the eventual goal should be (please do correct if you disagree) > > Update the existing libstdc++ CAS loop to handle floating point exceptions. > Suggest to clang that the fetch_add builtin should handle floating point exceptions. > GCC builtin handles floating point exceptions. > __atomic_feraiseexcept should be a builtin to avoid previously unnecessary requirement to link libatomic. > > However I'm uncertain we could sort all that out for GCC 15. > Would it be reasonable to require linking with libatomic for the GCC 15 release? That would be a regression for anybody using std::atomic<floating-point-type> today. > Alternatively is there some other intermediate step that people would prefer? > Similarly — maybe there's something I'm missing — maybe about differences between C/C++? > > Another option is to aim for implementing the functionality of __atomic_feraiseexcept inline in the fetch_add GCC 15 (a bit tight on time), to use it in the new fetch_add builtin overload, but not use it in the CAS loop. > That way no compiler & floating point type combination would need linking libatomic, current clang would not handle floating point exception information, while current gcc would handle that information. > > What are your thoughts? If making it Just Work without a new libatomic requirement is possible, I'd prefer that. > Regards, > Matthew > > ________________________________ > From: Joseph Myers <josmyers@redhat.com> > Sent: 19 September 2024 10:38 PM > To: Matthew Malcomson <mmalcomson@nvidia.com> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Jonathan Wakely <jwakely@redhat.com>; Richard Biener <rguenther@suse.de> > Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins > > External email: Use caution opening links or attachments > > > On Thu, 19 Sep 2024, mmalcomson@nvidia.com wrote: > > > 6) Anything special about floating point maths that I'm tripping up on? > > Correct atomic operations with floating-point operands should ensure that > exceptions raised exactly correspond to the operands for which the > operation succeeded, and not to the operands for any previous attempts > where the compare-exchange failed. There is a lengthy note in the C > standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in > 6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses > appropriate code sequences to achieve this. In GCC the implementation of > this is in c-typeck.cc:build_atomic_assign, which in turn calls > targetm.atomic_assign_expand_fenv (note that we have the complication for > C of not introducing libm dependencies in code that only uses standard > language features and not <math.h>, <fenv.h> or <complex.h>, so direct use > of <fenv.h> functions is inappropriate here). > > I would expect such built-in functions to follow the same semantics for > floating-point exceptions as _Atomic compound assignment does. (Note that > _Atomic compound assignment is more general in the allowed operands, > because compound assignment is a heterogeneous operation - for example, > the special floating-point logic in build_atomic_assign includes the case > where the LHS of the compound assignment is of atomic integer type but the > RHS is of floating type. However, built-in functions allow memory orders > other than seq_cst to be used, whereas _Atomic compound assignment is > limited to the seq_cst case.) > > So it would seem appropriate for the implementation of such built-in > functions to make use of targetm.atomic_assign_expand_fenv for > floating-point environment handling, and for testcases to include tests > analogous to c11-atomic-exec-5.c that exceptions are being handled > correctly. > > Cf. N2329 which suggested such operations for C in <stdatomic.h> (but > tried to do to many things in one paper to be accepted into C); it didn't > go into the floating-point exceptions semantics but simple correctness > would indicate avoiding spurious exceptions from discarded computations. > > -- > Joseph S. Myers > josmyers@redhat.com >
On Mon, 14 Oct 2024 at 14:35, Jonathan Wakely <jwakely@redhat.com> wrote: > > N.B. please CC the libstdc++ list for libstdc++ topics. > > On Mon, 14 Oct 2024 at 14:00, Matthew Malcomson <mmalcomson@nvidia.com> wrote: > > > > After getting GCC to handle floating point exceptions I'm now noticing that the `targetm.atomic_assign_expand_fenv` hook tends to use the library function `__atomic_feraiseexcept`. > > This requires adding -latomic to the command line where it wasn't needed before. > > I also notice that the inline CAS loop for libstdc++ `atomic<T>::fetch_add` on floating point types doesn't handle these floating point exceptions. > > Right, Joseph made the C front-end handle them for _Atomic but I > didn't do that for std::atomic. > > > For context, clang (which has already implemented floating point fetch_add for floating point types) is not emitting anything to handle the floating point environment. I'm not sure if C++ users really care about fenv. I think most don't. > > > > I expect that the eventual goal should be (please do correct if you disagree) > > > > Update the existing libstdc++ CAS loop to handle floating point exceptions. > > Suggest to clang that the fetch_add builtin should handle floating point exceptions. > > GCC builtin handles floating point exceptions. > > __atomic_feraiseexcept should be a builtin to avoid previously unnecessary requirement to link libatomic. > > > > However I'm uncertain we could sort all that out for GCC 15. > > Would it be reasonable to require linking with libatomic for the GCC 15 release? > > That would be a regression for anybody using > std::atomic<floating-point-type> today. > > > Alternatively is there some other intermediate step that people would prefer? > > Similarly — maybe there's something I'm missing — maybe about differences between C/C++? > > > > Another option is to aim for implementing the functionality of __atomic_feraiseexcept inline in the fetch_add GCC 15 (a bit tight on time), to use it in the new fetch_add builtin overload, but not use it in the CAS loop. > > That way no compiler & floating point type combination would need linking libatomic, current clang would not handle floating point exception information, while current gcc would handle that information. > > > > What are your thoughts? > > If making it Just Work without a new libatomic requirement is > possible, I'd prefer that. > > > Regards, > > Matthew > > > > ________________________________ > > From: Joseph Myers <josmyers@redhat.com> > > Sent: 19 September 2024 10:38 PM > > To: Matthew Malcomson <mmalcomson@nvidia.com> > > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Jonathan Wakely <jwakely@redhat.com>; Richard Biener <rguenther@suse.de> > > Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins > > > > External email: Use caution opening links or attachments > > > > > > On Thu, 19 Sep 2024, mmalcomson@nvidia.com wrote: > > > > > 6) Anything special about floating point maths that I'm tripping up on? > > > > Correct atomic operations with floating-point operands should ensure that > > exceptions raised exactly correspond to the operands for which the > > operation succeeded, and not to the operands for any previous attempts > > where the compare-exchange failed. There is a lengthy note in the C > > standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in > > 6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses > > appropriate code sequences to achieve this. In GCC the implementation of > > this is in c-typeck.cc:build_atomic_assign, which in turn calls > > targetm.atomic_assign_expand_fenv (note that we have the complication for > > C of not introducing libm dependencies in code that only uses standard > > language features and not <math.h>, <fenv.h> or <complex.h>, so direct use > > of <fenv.h> functions is inappropriate here). > > > > I would expect such built-in functions to follow the same semantics for > > floating-point exceptions as _Atomic compound assignment does. (Note that > > _Atomic compound assignment is more general in the allowed operands, > > because compound assignment is a heterogeneous operation - for example, > > the special floating-point logic in build_atomic_assign includes the case > > where the LHS of the compound assignment is of atomic integer type but the > > RHS is of floating type. However, built-in functions allow memory orders > > other than seq_cst to be used, whereas _Atomic compound assignment is > > limited to the seq_cst case.) > > > > So it would seem appropriate for the implementation of such built-in > > functions to make use of targetm.atomic_assign_expand_fenv for > > floating-point environment handling, and for testcases to include tests > > analogous to c11-atomic-exec-5.c that exceptions are being handled > > correctly. > > > > Cf. N2329 which suggested such operations for C in <stdatomic.h> (but > > tried to do to many things in one paper to be accepted into C); it didn't > > go into the floating-point exceptions semantics but simple correctness > > would indicate avoiding spurious exceptions from discarded computations. > > > > -- > > Joseph S. Myers > > josmyers@redhat.com > >
Just to double-check: If C++ users don't really care about fenv, then: Would a feasible shortcut be to add a #pragma to avoid caring about about this for a given region (and use that pragma in libstdc++ in order to avoid the regression of needing to link against libatomic). The reason I'm looking to see what shortcuts are possible is that I'm quite dubious of implementing `__atomic_feraiseexcept` as a builtin correctly in the GCC 15 timeframe (on top of finishing off this patch). Especially given that there seem to be two specialist implementations in backends I'm unfamiliar with.
On Mon, 14 Oct 2024, Matthew Malcomson wrote: > 4. __atomic_feraiseexcept should be a builtin to avoid previously > unnecessary requirement to link libatomic. libatomic should be linked by default (with --as-needed); see bug 81358. But if your concern is e.g. libstdc++.so having DT_NEEDED for libatomic, that might not suffice. __atomic_feraiseexcept is a bit long to expand inline all the time (it's supposed to ensure that exceptions are raised including taking any enabled traps, so it's not just manipulating bits in the floating-point status register). On the other hand, if you're only concerned with one operation, only a subset of the checks in __atomic_feraiseexcept are relevant (for example, addition and subtraction can't raise FE_DIVBYZERO; and except for the exact underflow case which __atomic_feraiseexcept doesn't handle anyway, they can't raise FE_UNDERFLOW).
Thanks for the pointer — always linking with libatomic by default using--as-needed sounds quite promising from my end. I am not certain, but suspect that needing libatomic for atomic<float>::fetch_{add,sub} would not mean libstdc++.so would get a DT_NEEDED for libatomic. The place where the new builtin would go is in a header that the user code would include (and would be inlined completely). From a quick grep it appears this functionality in this header is not currently used in the libstdc++.so library itself. Not 100% confident on that yet — but enough to feel this approach is promising. I guess if that is the case, then using the same hypothetical driver flag suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81358#c9 for linking libstdc++ seems like it could avoid adding the requirement without realising. From reading the comments in that bugzilla it doesn't look like there's any objection to implementing that. Seems that the testsuite and build system are the main things to watch out for here right? Does that seem reasonable to others?
On Tue, 15 Oct 2024, Matthew Malcomson wrote: > Seems that the testsuite and build system are the main things to watch > out for here right? Yes, as illustrated by the draft patch linked from bug 81358 comment 11.