From patchwork Sat Nov 13 02:34:09 2021
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 47583
To: ubizjak@gmail.com
Subject: [PATCH] PR target/103069: Relax cmpxchg loop for x86 target
Date: Sat, 13 Nov 2021 10:34:09 +0800
Message-Id: <20211113023409.49472-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.18.1
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Hongyu Wang via Gcc-patches
From: Hongyu Wang
Reply-To: Hongyu Wang
Cc: hongtao.liu@intel.com, gcc-patches@gcc.gnu.org, hjl.tools@intel.com

Hi,

From the CPU's point of view, getting a cache line for writing is more
expensive than getting it for reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

A full compare and swap grabs the cache line in exclusive state and
causes excessive cache line bouncing.

The atomic_fetch_{or,xor,and,nand} builtins generate a cmpxchg loop
under -march=x86-64 like:

        movl    (%rdi), %eax
.L2:
        movl    %eax, %edx
        movl    %eax, %r8d
        orl     %esi, %edx
        lock cmpxchgl   %edx, (%rdi)
        jne     .L2
        movl    %r8d, %eax
        ret

To relax the above loop, GCC should first emit a normal load, then check
the loaded value and jump to .L2 if cmpxchgl may fail.
Before jumping to .L2, PAUSE should be inserted to yield the CPU to
another hyperthread and to save power, so the code looks like:

        movl    (%rdi), %eax
.L4:
        movl    (%rdi), %ecx
        movl    %eax, %edx
        orl     %esi, %edx
        cmpl    %eax, %ecx
        jne     .L2
        lock cmpxchgl   %edx, (%rdi)
        jne     .L4
.L2:
        rep nop
        jmp     .L4

This patch adds the corresponding atomic_fetch_op expanders to insert
the load/compare and pause for all the atomic logic fetch builtins, and
adds the flag -mrelax-cmpxchg-loop to control whether the relaxed loop
is generated.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.

Ok for master?

gcc/ChangeLog:

	PR target/103069
	* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
	New expand function.
	* config/i386/i386-options.c (ix86_target_string): Add
	-mrelax-cmpxchg-loop flag.
	(ix86_valid_target_attribute_inner_p): Likewise.
	* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
	New expand function prototype.
	* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
	* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
	for SI,HI,QI modes.
	(atomic_<logic>_fetch<mode>): Likewise.
	(atomic_fetch_nand<mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
	(atomic_<logic>_fetch<mode>): Likewise.
	(atomic_fetch_nand<mode>): Likewise.
	(atomic_nand_fetch<mode>): Likewise.
	* doc/invoke.texi: Document -mrelax-cmpxchg-loop.

gcc/testsuite/ChangeLog:

	PR target/103069
	* gcc.target/i386/pr103069-1.c: New test.
	* gcc.target/i386/pr103069-2.c: Ditto.
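
For readers who prefer C to assembly, the relaxed loop described above
can be approximated by hand roughly as follows.  This is only a sketch
under stated assumptions, not code from the patch: relaxed_fetch_or and
cpu_relax are hypothetical names, only the 32-bit "or" case is shown,
and refreshing the expected value before pausing is an assumption made
here so that the retry loop always makes progress.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): spin-wait hint.
   On x86 this emits PAUSE, which assembles as "rep nop".  */
static inline void cpu_relax (void)
{
#if defined (__x86_64__) || defined (__i386__)
  __builtin_ia32_pause ();
#endif
}

/* Hypothetical hand-written fetch_or using a load-and-compare
   before the locked cmpxchg.  Returns the value seen before the OR.  */
static uint32_t relaxed_fetch_or (uint32_t *mem, uint32_t val)
{
  assert (mem != 0);

  /* Initial load of the expected value.  */
  uint32_t expected = __atomic_load_n (mem, __ATOMIC_RELAXED);

  for (;;)
    {
      /* Plain load first: a read does not need the cache line in
         exclusive state.  */
      uint32_t cur = __atomic_load_n (mem, __ATOMIC_RELAXED);

      if (cur == expected)
        {
          /* The cmpxchg is likely to succeed, so attempt it.  On
             failure the builtin stores the current contents of *mem
             into 'expected'.  */
          if (__atomic_compare_exchange_n (mem, &expected, expected | val,
                                           0, __ATOMIC_SEQ_CST,
                                           __ATOMIC_RELAXED))
            return expected;
          continue;
        }

      /* Another thread changed *mem: refresh the expected value
         (assumption of this sketch), pause, and retry.  */
      expected = cur;
      cpu_relax ();
    }
}
```

With -mrelax-cmpxchg-loop the compiler is meant to produce this shape
for __atomic_fetch_or and friends, so user code would not normally need
to spell it out by hand.
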
---
 gcc/config/i386/i386-expand.c              |  77 ++++++++++++++
 gcc/config/i386/i386-options.c             |   7 +-
 gcc/config/i386/i386-protos.h              |   2 +
 gcc/config/i386/i386.opt                   |   4 +
 gcc/config/i386/sync.md                    | 117 +++++++++++++++++++++
 gcc/doc/invoke.texi                        |   9 +-
 gcc/testsuite/gcc.target/i386/pr103069-1.c |  35 ++++++
 gcc/testsuite/gcc.target/i386/pr103069-2.c |  70 ++++++++++++
 8 files changed, 319 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103069-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103069-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 088e6af2258..f8a61835d85 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -23138,4 +23138,81 @@ ix86_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
   *rem_p = rem;
 }
 
+void ix86_expand_atomic_fetch_op_loop (rtx target, rtx mem, rtx val,
+                                       enum rtx_code code, bool after,
+                                       bool doubleword)
+{
+  rtx old_reg, new_reg, old_mem, success, oldval, new_mem;
+  rtx_code_label *loop_label, *pause_label;
+  machine_mode mode = GET_MODE (target);
+
+  old_reg = gen_reg_rtx (mode);
+  new_reg = old_reg;
+  loop_label = gen_label_rtx ();
+  pause_label = gen_label_rtx ();
+  old_mem = copy_to_reg (mem);
+  emit_label (loop_label);
+  emit_move_insn (old_reg, old_mem);
+
+  /* return value for atomic_fetch_op.  */
+  if (!after)
+    emit_move_insn (target, old_reg);
+
+  if (code == NOT)
+    {
+      new_reg = expand_simple_binop (mode, AND, new_reg, val, NULL_RTX,
+                                     true, OPTAB_LIB_WIDEN);
+      new_reg = expand_simple_unop (mode, code, new_reg, NULL_RTX, true);
+    }
+  else
+    new_reg = expand_simple_binop (mode, code, new_reg, val, NULL_RTX,
+                                   true, OPTAB_LIB_WIDEN);
+
+  /* return value for atomic_op_fetch.  */
+  if (after)
+    emit_move_insn (target, new_reg);
+
+  /* Load memory again inside loop.  */
+  new_mem = copy_to_reg (mem);
+  /* Compare mem value with expected value.  */
+
+  if (doubleword)
+    {
+      machine_mode half_mode = (mode == DImode)? SImode : DImode;
+      rtx low_new_mem = gen_lowpart (half_mode, new_mem);
+      rtx low_old_mem = gen_lowpart (half_mode, old_mem);
+      rtx high_new_mem = gen_highpart (half_mode, new_mem);
+      rtx high_old_mem = gen_highpart (half_mode, old_mem);
+      emit_cmp_and_jump_insns (low_new_mem, low_old_mem, NE, NULL_RTX,
+                               half_mode, 1, pause_label,
+                               profile_probability::guessed_never ());
+      emit_cmp_and_jump_insns (high_new_mem, high_old_mem, NE, NULL_RTX,
+                               half_mode, 1, pause_label,
+                               profile_probability::guessed_never ());
+    }
+  else
+    emit_cmp_and_jump_insns (new_mem, old_mem, NE, NULL_RTX,
+                             GET_MODE (old_mem), 1, pause_label,
+                             profile_probability::guessed_never ());
+
+  success = NULL_RTX;
+  oldval = old_mem;
+  expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg,
+                                  new_reg, false, MEMMODEL_SYNC_SEQ_CST,
+                                  MEMMODEL_RELAXED);
+  if (oldval != old_mem)
+    emit_move_insn (old_mem, oldval);
+
+  emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx,
+                           GET_MODE (success), 1, loop_label,
+                           profile_probability::guessed_never ());
+
+  /* If mem is not expected, pause and loop back.  */
+  emit_label (pause_label);
+  emit_insn (gen_pause ());
+  emit_jump_insn (gen_jump (loop_label));
+  emit_barrier ();
+}
+
+
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index a8cc0664f11..feff2584f41 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -397,7 +397,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
     { "-mstv",                          MASK_STV },
     { "-mavx256-split-unaligned-load",  MASK_AVX256_SPLIT_UNALIGNED_LOAD },
     { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE },
-    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES }
+    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES },
+    { "-mrelax-cmpxchg-loop",           MASK_RELAX_CMPXCHG_LOOP }
   };
 
   /* Additional flag options.  */
@@ -1092,6 +1093,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_IX86_YES ("general-regs-only",
                         OPT_mgeneral_regs_only,
                         OPTION_MASK_GENERAL_REGS_ONLY),
+
+    IX86_ATTR_YES ("relax-cmpxchg-loop",
+                   OPT_mrelax_cmpxchg_loop,
+                   MASK_RELAX_CMPXCHG_LOOP),
   };
 
   location_t loc
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bd52450a148..7e05510c679 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -217,6 +217,8 @@ extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
 extern void ix86_split_mmx_punpck (rtx[], bool);
 extern void ix86_expand_avx_vzeroupper (void);
+extern void ix86_expand_atomic_fetch_op_loop (rtx, rtx, rtx, enum rtx_code,
+                                              bool, bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ad366974b5b..46fad3cc038 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -404,6 +404,10 @@ momit-leaf-frame-pointer
 Target Mask(OMIT_LEAF_FRAME_POINTER) Save
 Omit the frame pointer in leaf functions.
 
+mrelax-cmpxchg-loop
+Target Mask(RELAX_CMPXCHG_LOOP) Save
+Relax cmpxchg loop for atomic_fetch_{or,xor,and,nand} by adding load and cmp before cmpxchg, execute pause and loop back to load and compare if load value is not expected.
+
 mpc32
 Target RejectNegative
 Set 80387 floating-point precision to 32-bit.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 05a835256bb..46048425327 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -525,6 +525,123 @@
    (set (reg:CCZ FLAGS_REG)
         (unspec_volatile:CCZ [(const_int 0)] UNSPECV_CMPXCHG))])])
 
+(define_expand "atomic_fetch_<logic><mode>"
+  [(match_operand:SWI124 0 "register_operand")
+   (any_logic:SWI124
+    (match_operand:SWI124 1 "memory_operand")
+    (match_operand:SWI124 2 "register_operand"))
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], <CODE>, false,
+                                    false);
+  DONE;
+})
+
+(define_expand "atomic_<logic>_fetch<mode>"
+  [(match_operand:SWI124 0 "register_operand")
+   (any_logic:SWI124
+    (match_operand:SWI124 1 "memory_operand")
+    (match_operand:SWI124 2 "register_operand"))
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], <CODE>, true,
+                                    false);
+  DONE;
+})
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(match_operand:SWI124 0 "register_operand")
+   (match_operand:SWI124 1 "memory_operand")
+   (match_operand:SWI124 2 "register_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], NOT, false,
+                                    false);
+  DONE;
+})
+
+(define_expand "atomic_nand_fetch<mode>"
+  [(match_operand:SWI124 0 "register_operand")
+   (match_operand:SWI124 1 "memory_operand")
+   (match_operand:SWI124 2 "register_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], NOT, true,
+                                    false);
+  DONE;
+})
+
+(define_expand "atomic_fetch_<logic><mode>"
+  [(match_operand:CASMODE 0 "register_operand")
+   (any_logic:CASMODE
+    (match_operand:CASMODE 1 "memory_operand")
+    (match_operand:CASMODE 2 "register_operand"))
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  bool doubleword = (<MODE>mode == DImode && !TARGET_64BIT)
+                    || (<MODE>mode == TImode);
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], <CODE>, false,
+                                    doubleword);
+  DONE;
+})
+
+(define_expand "atomic_<logic>_fetch<mode>"
+  [(match_operand:CASMODE 0 "register_operand")
+   (any_logic:CASMODE
+    (match_operand:CASMODE 1 "memory_operand")
+    (match_operand:CASMODE 2 "register_operand"))
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  bool doubleword = (<MODE>mode == DImode && !TARGET_64BIT)
+                    || (<MODE>mode == TImode);
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], <CODE>, true,
+                                    doubleword);
+  DONE;
+})
+
+(define_expand "atomic_fetch_nand<mode>"
+  [(match_operand:CASMODE 0 "register_operand")
+   (match_operand:CASMODE 1 "memory_operand")
+   (match_operand:CASMODE 2 "register_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  bool doubleword = (<MODE>mode == DImode && !TARGET_64BIT)
+                    || (<MODE>mode == TImode);
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], NOT, false,
+                                    doubleword);
+  DONE;
+})
+
+(define_expand "atomic_nand_fetch<mode>"
+  [(match_operand:CASMODE 0 "register_operand")
+   (match_operand:CASMODE 1 "memory_operand")
+   (match_operand:CASMODE 2 "register_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP"
+{
+  bool doubleword = (<MODE>mode == DImode && !TARGET_64BIT)
+                    || (<MODE>mode == TImode);
+  ix86_expand_atomic_fetch_op_loop (operands[0], operands[1],
+                                    operands[2], NOT, true,
+                                    doubleword);
+  DONE;
+})
+
+
 ;; For operand 2 nonmemory_operand predicate is used instead of
 ;; register_operand to allow combiner to better optimize atomic
 ;; additions of constants.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2aba4c70b44..06ecf79bc0c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1419,7 +1419,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} @gol
--mgeneral-regs-only  -mcall-ms2sysv-xlogues @gol
+-mgeneral-regs-only  -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mneeded}
 
@@ -32259,6 +32259,13 @@ Generate code that uses only the general-purpose registers.  This
 prevents the compiler from using floating-point, vector, mask and bound
 registers.
 
+@item -mrelax-cmpxchg-loop
+@opindex mrelax-cmpxchg-loop
+Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
+execute pause if load value is not expected.  This reduces excessive
+cache line bouncing and works for all atomic logic fetch builtins
+that generate a compare and swap loop.
+
 @item -mindirect-branch=@var{choice}
 @opindex mindirect-branch
 Convert indirect call and jump with @var{choice}.  The default is
diff --git a/gcc/testsuite/gcc.target/i386/pr103069-1.c b/gcc/testsuite/gcc.target/i386/pr103069-1.c
new file mode 100644
index 00000000000..444485cbae9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103069-1.c
@@ -0,0 +1,35 @@
+/* PR target/103069 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -march=x86-64 -mtune=generic -mrelax-cmpxchg-loop" } */
+/* { dg-final { scan-assembler-times "rep nop" 32 } } */
+
+#include <stdint.h>
+
+#define FUNC_ATOMIC(TYPE, OP) \
+__attribute__ ((noinline, noclone)) \
+TYPE f_##TYPE##_##OP##_fetch (TYPE *a, TYPE b) \
+{ \
+  return __atomic_##OP##_fetch (a, b, __ATOMIC_RELAXED); \
+} \
+__attribute__ ((noinline, noclone)) \
+TYPE f_##TYPE##_fetch_##OP (TYPE *a, TYPE b) \
+{ \
+  return __atomic_fetch_##OP (a, b, __ATOMIC_RELAXED); \
+}
+
+FUNC_ATOMIC (int64_t, and)
+FUNC_ATOMIC (int64_t, nand)
+FUNC_ATOMIC (int64_t, or)
+FUNC_ATOMIC (int64_t, xor)
+FUNC_ATOMIC (int, and)
+FUNC_ATOMIC (int, nand)
+FUNC_ATOMIC (int, or)
+FUNC_ATOMIC (int, xor)
+FUNC_ATOMIC (short, and)
+FUNC_ATOMIC (short, nand)
+FUNC_ATOMIC (short, or)
+FUNC_ATOMIC (short, xor)
+FUNC_ATOMIC (char, and)
+FUNC_ATOMIC (char, nand)
+FUNC_ATOMIC (char, or)
+FUNC_ATOMIC (char, xor)
diff --git a/gcc/testsuite/gcc.target/i386/pr103069-2.c b/gcc/testsuite/gcc.target/i386/pr103069-2.c
new file mode 100644
index 00000000000..8ac824cc8e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103069-2.c
@@ -0,0 +1,70 @@
+/* PR target/103069 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -march=x86-64 -mtune=generic" } */
+
+#include <stdlib.h>
+#include "pr103069-1.c"
+
+#define FUNC_ATOMIC_RELAX(TYPE, OP) \
+__attribute__ ((noinline, noclone, target ("relax-cmpxchg-loop"))) \
+TYPE relax_##TYPE##_##OP##_fetch (TYPE *a, TYPE b) \
+{ \
+  return __atomic_##OP##_fetch (a, b, __ATOMIC_RELAXED); \
+} \
+__attribute__ ((noinline, noclone, target ("relax-cmpxchg-loop"))) \
+TYPE relax_##TYPE##_fetch_##OP (TYPE *a, TYPE b) \
+{ \
+  return __atomic_fetch_##OP (a, b, __ATOMIC_RELAXED); \
+}
+
+FUNC_ATOMIC_RELAX (int64_t, and)
+FUNC_ATOMIC_RELAX (int64_t, nand)
+FUNC_ATOMIC_RELAX (int64_t, or)
+FUNC_ATOMIC_RELAX (int64_t, xor)
+FUNC_ATOMIC_RELAX (int, and)
+FUNC_ATOMIC_RELAX (int, nand)
+FUNC_ATOMIC_RELAX (int, or)
+FUNC_ATOMIC_RELAX (int, xor)
+FUNC_ATOMIC_RELAX (short, and)
+FUNC_ATOMIC_RELAX (short, nand)
+FUNC_ATOMIC_RELAX (short, or)
+FUNC_ATOMIC_RELAX (short, xor)
+FUNC_ATOMIC_RELAX (char, and)
+FUNC_ATOMIC_RELAX (char, nand)
+FUNC_ATOMIC_RELAX (char, or)
+FUNC_ATOMIC_RELAX (char, xor)
+
+#define TEST_ATOMIC_FETCH_LOGIC(TYPE, OP) \
+{ \
+  TYPE a = 11, b = 101, res, exp; \
+  res = relax_##TYPE##_##OP##_fetch (&a, b); \
+  exp = f_##TYPE##_##OP##_fetch (&a, b); \
+  if (res != exp) \
+    abort (); \
+  a = 21, b = 92; \
+  res = relax_##TYPE##_fetch_##OP (&a, b); \
+  exp = f_##TYPE##_fetch_##OP (&a, b); \
+  if (res != exp) \
+    abort (); \
+}
+
+int main (void)
+{
+  TEST_ATOMIC_FETCH_LOGIC (int64_t, and)
+  TEST_ATOMIC_FETCH_LOGIC (int64_t, nand)
+  TEST_ATOMIC_FETCH_LOGIC (int64_t, or)
+  TEST_ATOMIC_FETCH_LOGIC (int64_t, xor)
+  TEST_ATOMIC_FETCH_LOGIC (int, and)
+  TEST_ATOMIC_FETCH_LOGIC (int, nand)
+  TEST_ATOMIC_FETCH_LOGIC (int, or)
+  TEST_ATOMIC_FETCH_LOGIC (int, xor)
+  TEST_ATOMIC_FETCH_LOGIC (short, and)
+  TEST_ATOMIC_FETCH_LOGIC (short, nand)
+  TEST_ATOMIC_FETCH_LOGIC (short, or)
+  TEST_ATOMIC_FETCH_LOGIC (short, xor)
+  TEST_ATOMIC_FETCH_LOGIC (char, and)
+  TEST_ATOMIC_FETCH_LOGIC (char, nand)
+  TEST_ATOMIC_FETCH_LOGIC (char, or)
+  TEST_ATOMIC_FETCH_LOGIC (char, xor)
+  return 0;
+}