[8/8,RFC] Add demo implementation of one of the operations

Message ID 20240919131204.3865854-9-mmalcomson@nvidia.com
State New
Series Introduce floating point fetch_add builtins

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Build failed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Test failed

Commit Message

Matthew Malcomson Sept. 19, 2024, 1:12 p.m. UTC
  From: Matthew Malcomson <mmalcomson@nvidia.com>

Add the demo implementation to AArch64, since that's the backend I'm
most familiar with.

Nothing much else to say -- nice to see that the demo implementation
seems to work as expected (being used for fetch_add, add_fetch and
sub_fetch even though it's only defined for fetch_sub).

The demo implementation ensures that I can run some execution tests.

The demo is added behind a flag so that the testsuite can be run in
different variants (with the flag and without), checking that the
functionality works both when this optab is implemented and when the
fallback is used (also checking the two different fallbacks of either
calling libatomic or inlining a CAS loop).

In order to run with both this and the fallback implementation I use the
following flag in RUNTESTFLAGS:
    --target_board='unix {unix/-mtesting-fp-atomics}'

Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
---
 gcc/config/aarch64/aarch64.h   |  2 ++
 gcc/config/aarch64/aarch64.opt |  5 +++++
 gcc/config/aarch64/atomics.md  | 15 +++++++++++++++
 3 files changed, 22 insertions(+)
  

Patch

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index fac1882bcb3..c2f37545cd7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -119,6 +119,8 @@ 
    of LSE instructions.  */
 #define TARGET_OUTLINE_ATOMICS (aarch64_flag_outline_atomics)
 
+#define TARGET_TESTING_FP_ATOMICS (aarch64_flag_testing_fp_atomics)
+
 /* Align definitions of arrays, unions and structures so that
    initializations and copies can be made more efficient.  This is not
    ABI-changing, so it only affects places where we can see the
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6356c419399..ed031258575 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -332,6 +332,11 @@  moutline-atomics
 Target Var(aarch64_flag_outline_atomics) Init(2) Save
 Generate local calls to out-of-line atomic operations.
 
+mtesting-fp-atomics
+Target Var(aarch64_flag_testing_fp_atomics) Init(0) Save
+Use the demonstration implementation of atomic_fetch_sub_<mode> for floating
+point modes.
+
 -param=aarch64-vect-compare-costs=
 Target Joined UInteger Var(aarch64_vect_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing, consider using multiple different approaches and use
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 32a0a723732..ee8fbcd6c58 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -368,6 +368,21 @@ 
 ;; However we also implement the acquire memory barrier with DMB LD,
 ;; and so the ST<OP> is not blocked by the barrier.
 
+(define_insn "atomic_fetch_sub<mode>"
+  [(set (match_operand:GPF 0 "register_operand" "=&w")
+        (match_operand:GPF 1 "aarch64_sync_memory_operand" "+Q"))
+    (set (match_dup 1)
+        (unspec_volatile:GPF
+            [(minus:GPF (match_dup 1)
+                       (match_operand:GPF 2 "register_operand" "w"))
+             (match_operand:SI 3 "const_int_operand")]
+         UNSPECV_ATOMIC_LDOP_PLUS))
+    (clobber (match_scratch:GPF 4 "=w"))]
+    "TARGET_TESTING_FP_ATOMICS"
+    "// Here's your sandwich.\;ldr %<s>0, %1\;fsub %<s>4, %<s>0, %<s>2\;str %<s>4, %1\;// END"
+)
+
+
 (define_insn "aarch64_atomic_<atomic_ldoptab><mode>_lse"
   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
 	(unspec_volatile:ALLI